This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Using Conda Channels and PyPI (pip)

Overview

Teaching: 20 min
Exercises: 10 min
Questions
  • What are Conda channels?

  • Why should I be explicit about which channels my research project uses?

  • What should I do if a Python package isn’t available via a Conda channel?

Objectives
  • Install a package from a specific channel.

What are Conda channels?

Conda packages are downloaded from remote channels, which are URLs to directories containing Conda packages. The conda command searches a standard set of channels, referred to as defaults. The defaults channels include repositories hosted by Anaconda under https://repo.anaconda.com/pkgs/, such as main and r.

Unless otherwise specified, packages installed using conda will be downloaded from the defaults channels.
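
To see which channels your own installation searches, you can ask conda for its current configuration. This is a minimal sketch: the variable name configured_channels is ours, and the fallback message exists only so that the snippet still prints something on a machine where conda is not installed.

```shell
# Ask conda which channels it is configured to search (highest priority first).
# If conda is missing, fall back to a short message instead of failing.
configured_channels="$(conda config --show channels 2>/dev/null \
    || echo 'conda is not available on PATH')"
echo "$configured_channels"
```

On a fresh installation the output typically lists only defaults.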

The conda-forge channel

In addition to the defaults channels that are managed by Anaconda Inc., there is another channel that also has a special status. The Conda-Forge project “is a community led collection of recipes, build infrastructure and distributions for the conda package manager.”

There are a few reasons that you may wish to use the conda-forge channel instead of the defaults channel maintained by Anaconda:

  1. Packages on conda-forge may be more up-to-date than those on the defaults channel.
  2. There are packages on the conda-forge channel that aren’t available from defaults.

My package isn’t available in the defaults channels! What should I do?

You may find that packages (or often more recent versions of packages!) that you need to install for your project are not available on the defaults channels. In this case you could try the following channels.

  1. conda-forge: the conda-forge channel contains a large number of community-curated Conda packages. Typically, the most recent versions of packages that are available via the defaults channels appear on conda-forge first.
  2. bioconda: the bioconda channel contains a large number of curated bioinformatics Conda packages. The bioconda channel is meant to be used together with conda-forge, so you should not worry about combining the two channels when installing your preferred packages.

For example, Kaggle publishes a Python 3 API that can be used to interact with Kaggle datasets, kernels and competition submissions. You can search for the package on the defaults channels but you will not find it!

$ conda search kaggle
Loading channels: done
No match found for: kaggle. Search: *kaggle*

PackagesNotFoundError: The following packages are not available from current channels:

  - kaggle

Current channels:

  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/free/osx-64
  - https://repo.anaconda.com/pkgs/free/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

Let’s check whether the package exists on the conda-forge channel. Note that the official installation instructions suggest a different way to install it.

$ conda search --channel conda-forge kaggle
Loading channels: done
# Name                       Version           Build  Channel
kaggle                         1.5.3          py27_1  conda-forge
kaggle                         1.5.3          py36_1  conda-forge
kaggle                         1.5.3          py37_1  conda-forge
kaggle                         1.5.4          py27_0  conda-forge
kaggle                         1.5.4          py36_0  conda-forge
kaggle                         1.5.4          py37_0  conda-forge
.
.
.
kaggle                        1.5.12  py38h578d9bd_1  conda-forge
kaggle                        1.5.12  py38h578d9bd_2  conda-forge
kaggle                        1.5.12  py39hf3d152e_0  conda-forge
kaggle                        1.5.12  py39hf3d152e_1  conda-forge
kaggle                        1.5.12  py39hf3d152e_2  conda-forge
kaggle                        1.5.12    pyhd8ed1ab_4  conda-forge

Or you can also check online at https://anaconda.org/conda-forge/kaggle.

Once we know that the kaggle package is available via conda-forge we can go ahead and install it.

$ conda create --name machine-learning-env python=3.10
$ conda activate machine-learning-env
$ conda install --channel conda-forge kaggle=1.5.12
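
After installing, you may want to confirm which channel the package actually came from. The sketch below assumes the machine-learning-env environment from the commands above exists. The variable name installed and the fallback message are ours, added so that the snippet degrades gracefully when conda or the environment is missing.

```shell
# List packages matching "kaggle" in the environment, with the source channel shown.
installed="$(conda list --name machine-learning-env --show-channel-urls kaggle 2>/dev/null \
    || echo 'conda or machine-learning-env is not available here')"
echo "$installed"
```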

Channel priority

You may specify multiple channels for installing packages by passing the --channel argument multiple times.

$ conda install scipy=1.10.0 --channel conda-forge --channel bioconda

Channel priority decreases from left to right: the first channel given has higher priority than the second. For reference, bioconda is a channel for the conda package manager specializing in bioinformatics software. For those interested in learning more about the Bioconda project, check out the project’s GitHub page.

Please note that in our example, adding the bioconda channel has no effect because scipy is no longer available on the bioconda channel.
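
If you pass the same channels to every command, it may be more convenient to record them once in a conda configuration file. The sketch below writes such a file to a temporary directory (the path and the variable name condarc_file are illustrative) and points conda at it through the CONDARC environment variable, on the assumption that your conda version reads that variable. Channels listed first have higher priority. With channel_priority: strict, conda ignores a package in a lower-priority channel whenever a package with the same name exists in a higher-priority channel.

```shell
# Write a throwaway configuration file instead of editing ~/.condarc.
condarc_file="$(mktemp -d)/.condarc"
cat > "$condarc_file" <<'EOF'
channels:
  - conda-forge
  - bioconda
  - defaults
channel_priority: strict
EOF

# Point conda at the file for this shell session only, then show its contents.
export CONDARC="$condarc_file"
cat "$CONDARC"
```

Run unset CONDARC to return to your usual configuration.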

Specifying channels when installing packages

polars is an alternative to pandas written in the Rust programming language, which often makes it faster than pandas.

Create a Python 3.10 environment called fast-analysis-project with the polars package. Also include the most recent versions of jupyterlab (so you have a nice UI) and matplotlib (so you can make plots) in your environment.

Solution

To create a new environment we use the conda create command. After creating and activating the environment, we check which versions of polars are available so that we can pin an explicit version if we need one. Finally, we install polars together with jupyterlab and matplotlib; because we do not state versions, conda picks the most recent compatible ones.

$ mkdir fast-analysis-project
$ cd fast-analysis-project/
$ conda create --name fast-analysis-project python=3.10
$ conda activate fast-analysis-project
$ conda search --channel conda-forge polars
$ conda install --channel conda-forge jupyterlab polars matplotlib

Hint: the --channel argument can also be shortened to -c. For more abbreviations, see the Conda command reference.

Alternative syntax for installing packages from specific channels

There is an alternative syntax for installing conda packages from specific channels that more explicitly links each package to the channel it is installed from. The package is installed into the currently active environment.

$ conda install conda-forge::polars

Repeat the previous exercise using this alternative syntax to install python, jupyterlab, and matplotlib from the defaults channels and polars from the conda-forge channel in an environment called my-final-project.

Solution

One way to do this is to create the environment my-final-project with an explicit version of Python, activate it, then install the packages jupyterlab and matplotlib without specifying a channel, while prefixing polars with the conda-forge:: channel.
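
The steps described above could look like the following sketch. It is wrapped in a shell function (the name make_final_project is our own, not a conda command) so that pasting the block creates nothing until you call the function. It assumes an interactive shell in which conda activate already works.

```shell
# Hypothetical helper: nothing runs until you call make_final_project.
make_final_project() {
    # No channel is given, so python comes from the configured default channels.
    conda create --name my-final-project python=3.10
    conda activate my-final-project
    # Only polars carries a channel prefix; jupyterlab and matplotlib do not.
    conda install jupyterlab matplotlib conda-forge::polars
}
```

Call make_final_project at the prompt to run the three steps in order.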

Using pip and Conda

You can use the default Python package manager pip to install packages from the Python Package Index (PyPI). However, there are a few potential issues that you should be aware of when using pip to install Python packages alongside Conda.

First, pip is sometimes installed by default on operating systems where it is used to manage any Python packages needed by your OS. You do not want to use /usr/bin/pip to install Python packages when using Conda environments.

(base) $ conda deactivate
$ which python
/usr/bin/python
$ which pip # sometimes installed as pip3
/usr/bin/pip

Windows users…

You can type where.exe in PowerShell and it does the same thing as which in bash.

Second, pip is also included in the Miniconda installer where it is used to install and manage OS-specific Python packages required to set up your base Conda environment. You do not want to use this ~/miniconda3/bin/pip to install Python packages when using Conda environments.

$ conda activate
(base) $ which python
~/miniconda3/bin/python
(base) $ which pip
~/miniconda3/bin/pip

Why should I avoid installing packages into the base Conda environment?

If your base Conda environment becomes cluttered with a mix of pip-installed and Conda-installed packages, it may no longer function. Creating separate Conda environments allows you to delete and recreate environments readily, so you don’t have to worry about risking your core Conda functionality when mixing packages installed with Conda and pip.

If you find yourself needing to install a Python package that is only available via PyPI, then you should use the copy of pip that is installed automatically when you create a Conda environment with Python. Using the pip installed in your Conda environment to install Python packages not available via Conda channels will help you avoid difficult-to-debug issues that frequently arise when using Python packages installed via a pip that was not installed inside your Conda environment.

Conda (+Pip)

Pitfalls of using Conda and pip together can be avoided by always ensuring your desired environment is active before installing anything using pip. This can be done by looking at the output of conda info.
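
A scripted version of that check might look like the sketch below. The function pip_in_active_env is a hypothetical helper of ours, not part of conda or pip. It relies on the CONDA_PREFIX and CONDA_DEFAULT_ENV variables that conda activate exports, and it assumes a bash-like shell on Linux or macOS.

```shell
# Hypothetical helper: succeed only when the first pip on PATH lives inside
# an active, non-base Conda environment.
pip_in_active_env() {
    # CONDA_PREFIX is unset when no environment is active.
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "no active Conda environment"
        return 1
    fi
    # The lesson advises against installing into base, so reject it as well.
    if [ "${CONDA_DEFAULT_ENV:-}" = "base" ]; then
        echo "base is active; activate a project environment first"
        return 1
    fi
    pip_path="$(command -v pip || true)"
    case "$pip_path" in
        "$CONDA_PREFIX"/*) echo "pip belongs to $CONDA_PREFIX" ;;
        *) echo "pip is outside $CONDA_PREFIX: ${pip_path:-not found}"; return 1 ;;
    esac
}

if pip_in_active_env; then
    echo "safe to run pip install"
fi
```

If the helper reports a pip outside the environment, installing pip into the environment with conda install pip is one way to fix it.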

Installing packages into Conda environments using pip

Combo is a comprehensive Python toolbox for combining machine learning models and scores. Model combination can be considered as a subtask of ensemble learning, and has been widely used in real-world tasks and data science competitions like Kaggle.

Activate the machine-learning-env you created in a previous challenge and use pip to install combo.

Solution

The following commands will activate the machine-learning-env and install combo.

$ conda activate machine-learning-env
$ pip install "combo==0.1.*"

For more details on using pip see the official documentation.

Key Points

  • A package is a compressed archive file containing system-level libraries, Python or other modules, executable programs and other components, and associated metadata.

  • A Conda channel is a URL to a directory containing Conda packages.

  • Conda and pip can be used together effectively.