Using Conda Channels and PyPI (pip)
Overview
Teaching: 20 min
Exercises: 10 min
Questions
What are Conda channels?
Why should I be explicit about which channels my research project uses?
What should I do if a Python package isn’t available via a Conda channel?
Objectives
Install a package from a specific channel.
What are Conda channels?
Conda packages are downloaded from remote channels, which are URLs to directories containing conda packages. The conda command searches a standard set of channels, referred to as defaults. The defaults channels include:
- main: The majority of all new Anaconda, Inc. package builds are hosted here. Included in defaults as the top-priority channel.
- r: Microsoft R Open conda packages and Anaconda, Inc.’s R conda packages.
Unless otherwise specified, packages installed using conda will be downloaded from the defaults channels.
The conda-forge channel
In addition to the defaults channels that are managed by Anaconda, Inc., there is another channel that also has a special status. The Conda-Forge project “is a community led collection of recipes, build infrastructure and distributions for the conda package manager.”
There are a few reasons that you may wish to use the conda-forge channel instead of the defaults channel maintained by Anaconda:
- Packages on conda-forge may be more up-to-date than those on the defaults channel.
- There are packages on the conda-forge channel that aren’t available from defaults.
My package isn’t available in the defaults channels! What should I do?
You may find that packages (or often more recent versions of packages!) that you need to install for your project are not available on the defaults channels. In this case you could try the following channels.
- conda-forge: the conda-forge channel contains a large number of community-curated Conda packages. Typically, the most recent versions of packages that are also available via the defaults channel appear on conda-forge first.
- bioconda: the bioconda channel contains a large number of curated bioinformatics conda packages. It is meant to be used together with conda-forge, so you need not worry about combining the two channels when installing your preferred packages.
For example, Kaggle publishes a Python 3 API that can be used to interact with Kaggle datasets, kernels and competition submissions. You can search for the package on the defaults channels, but you will not find it!
$ conda search kaggle
Loading channels: done
No match found for: kaggle. Search: *kaggle*
PackagesNotFoundError: The following packages are not available from current channels:
- kaggle
Current channels:
- https://repo.anaconda.com/pkgs/main/osx-64
- https://repo.anaconda.com/pkgs/main/noarch
- https://repo.anaconda.com/pkgs/free/osx-64
- https://repo.anaconda.com/pkgs/free/noarch
- https://repo.anaconda.com/pkgs/r/osx-64
- https://repo.anaconda.com/pkgs/r/noarch
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
Let’s check whether the package exists on the conda-forge channel. Note that the official Kaggle API installation instructions suggest a different way to install it (with pip).
$ conda search --channel conda-forge kaggle
Loading channels: done
# Name Version Build Channel
kaggle 1.5.3 py27_1 conda-forge
kaggle 1.5.3 py36_1 conda-forge
kaggle 1.5.3 py37_1 conda-forge
kaggle 1.5.4 py27_0 conda-forge
kaggle 1.5.4 py36_0 conda-forge
kaggle 1.5.4 py37_0 conda-forge
.
.
.
kaggle 1.5.12 py38h578d9bd_1 conda-forge
kaggle 1.5.12 py38h578d9bd_2 conda-forge
kaggle 1.5.12 py39hf3d152e_0 conda-forge
kaggle 1.5.12 py39hf3d152e_1 conda-forge
kaggle 1.5.12 py39hf3d152e_2 conda-forge
kaggle 1.5.12 pyhd8ed1ab_4 conda-forge
Or you can also check online at https://anaconda.org/conda-forge/kaggle.
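When you are not sure which channel provides a package, a small loop over candidate channels can save some typing. The helper below is hypothetical: find_package_channels is not part of conda. It is a minimal sketch that assumes conda is on your PATH. It also assumes that conda search exits with a non-zero status when a package is not found, as in the PackagesNotFoundError example above. The --override-channels flag limits each search to the single channel passed with --channel and ignores your configured channels.

```shell
# Hypothetical helper (not part of conda): report which channels provide a package.
# Usage: find_package_channels <package> <channel> [<channel> ...]
find_package_channels() {
  local pkg="$1"
  shift
  local channel
  for channel in "$@"; do
    # --override-channels restricts the search to the one channel given,
    # ignoring the channels configured in .condarc.
    if conda search --override-channels --channel "$channel" "$pkg" >/dev/null 2>&1; then
      echo "$channel: found $pkg"
    else
      echo "$channel: no $pkg"
    fi
  done
}
```

For example, find_package_channels kaggle defaults conda-forge should report that only conda-forge provides kaggle, matching the searches shown above.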
Once we know that the kaggle package is available via conda-forge, we can go ahead and install it.
$ conda create --name machine-learning-env python=3.10
$ conda activate machine-learning-env
$ conda install --channel conda-forge kaggle=1.5.12
Channel priority
You may specify multiple channels for installing packages by passing the --channel argument multiple times.
$ conda install scipy=1.10.0 --channel conda-forge --channel bioconda
Channel priority decreases from left to right: the first argument has higher priority than the second. For reference, bioconda is a channel for the conda package manager specializing in bioinformatics software. For those interested in learning more about the Bioconda project, check out the project’s GitHub page.
Note that in this example adding the bioconda channel makes no difference, because scipy is no longer available on the bioconda channel.
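Instead of repeating --channel on every command, you can record channel priority in a configuration file. The following is a minimal sketch that assumes conda is on your PATH and that your project environment is active. Under that assumption, conda config --env writes to the active environment's own .condarc rather than your user-wide one. The --add option puts a channel at the top of the list. Setting channel_priority to strict tells conda not to take a package from a lower-priority channel when a higher-priority channel provides a package with that name. The function name configure_project_channels is hypothetical. The setup resembles the configuration the Bioconda project suggests, but here it is scoped to a single environment.

```shell
# Hypothetical helper: record channel priority for the *active* environment.
# `conda config --add` prepends, so the channel added last ends up first
# (highest priority): conda-forge, then bioconda.
configure_project_channels() {
  conda config --env --add channels bioconda
  conda config --env --add channels conda-forge
  conda config --env --set channel_priority strict
}
```

Afterwards, conda config --show channels lets you check the resulting order.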
Specifying channels when installing packages
polars is a DataFrame library written in the Rust programming language and designed as a faster alternative to pandas. Create a Python 3.10 environment called fast-analysis-project with the polars package. Also include the most recent versions of jupyterlab (so you have a nice UI) and matplotlib (so you can make plots) in your environment.
Solution
To create a new environment, we use the conda create command. After creating and activating the environment, we check which versions of polars are available on conda-forge, so that we can pin an explicit version if we want one. Finally, we install polars along with the most recent versions of jupyterlab and matplotlib (since we do not explicitly state the versions of these).
$ mkdir fast-analysis-project
$ cd fast-analysis-project/
$ conda create --name fast-analysis-project python=3.10
$ conda activate fast-analysis-project
$ conda search --channel conda-forge polars
$ conda install --channel conda-forge jupyterlab polars matplotlib
Hint: the --channel argument can also be shortened to -c. For more abbreviations, see the Conda command reference.
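Writing the channels down in an environment file is one way to make them explicit for a project and let collaborators recreate it. This sketch is an optional extra rather than part of the exercise. It writes a minimal environment.yml for the fast-analysis-project environment, with conda-forge listed under channels. As in the solution, only the Python version is pinned. The name, channels and dependencies keys are the standard environment-file fields.

```shell
# Write a minimal environment file for the exercise above.
# Only Python is pinned; pin others (e.g. polars=<version>) once you have
# chosen a version with `conda search`.
cat > environment.yml <<'EOF'
name: fast-analysis-project
channels:
  - conda-forge
dependencies:
  - python=3.10
  - polars
  - jupyterlab
  - matplotlib
EOF
```

The environment can then be recreated elsewhere with conda env create --file environment.yml.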
Alternative syntax for installing packages from specific channels
There is an alternative syntax for installing conda packages from specific channels. It ties the channel explicitly to one particular package and installs that package into the currently active environment.
$ conda install conda-forge::polars
Repeat the previous exercise using this alternative syntax to install python, jupyterlab, and matplotlib from the defaults channel and polars from the conda-forge channel in an environment called my-final-project.
Solution
One way to do this is to create the environment my-final-project with an explicit version of Python and activate it. Then install jupyterlab and matplotlib without specifying a channel, while prefixing polars with conda-forge:: so that it comes from the conda-forge channel.
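The same result can be sketched as a single conda create command, because the channel::package syntax is accepted when creating an environment as well as when installing into one. In this sketch, create_final_project is a hypothetical name, and --yes only skips the confirmation prompt. The unprefixed packages come from your configured channels (defaults, unless you have changed them), while only polars is taken from conda-forge. This assumes that the dependencies of polars can also be resolved from those channels.

```shell
# Hypothetical one-step version of the solution (assumes conda is on PATH).
# Unprefixed packages come from the configured channels (usually defaults);
# the conda-forge:: prefix applies to polars only.
create_final_project() {
  conda create --yes --name my-final-project \
    python=3.10 jupyterlab matplotlib conda-forge::polars
}
```

After running create_final_project, activate the environment as usual with conda activate my-final-project.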
Using pip and Conda
You can use the default Python package manager, pip, to install packages from the Python Package Index (PyPI). However, there are a few potential issues that you should be aware of when using pip to install Python packages alongside Conda.
First, pip is sometimes installed by default on operating systems, where it is used to manage any Python packages needed by your OS. You do not want to use /usr/bin/pip to install Python packages when using Conda environments.
(base) $ conda deactivate
$ which python
/usr/bin/python
$ which pip # sometimes installed as pip3
/usr/bin/pip
Windows users…
You can type where.exe in PowerShell; it does the same thing as which in bash.
Second, pip is also included in the Miniconda installer, where it is used to install and manage OS-specific Python packages required to set up your base Conda environment. You do not want to use this ~/miniconda3/bin/pip to install Python packages when using Conda environments.
$ conda activate
(base) $ which python
~/miniconda3/bin/python
$ which pip
~/miniconda3/bin/pip
Why should I avoid installing packages into the base Conda environment?
If your base Conda environment becomes cluttered with a mix of pip- and Conda-installed packages, it may no longer function. Separate Conda environments can be deleted and recreated readily. You therefore don't risk breaking your core Conda installation when you mix packages installed with Conda and pip.
If you find yourself needing to install a Python package that is only available via PyPI, use the copy of pip that is installed automatically when you create a Conda environment with Python. Installing packages that are not available via Conda channels with this environment-specific pip will help you avoid difficult-to-debug issues. Those issues frequently arise when you use Python packages installed by a pip that lives outside your Conda environment.
Conda (+Pip)
You can avoid the pitfalls of using Conda and pip together by always making sure your desired environment is active before installing anything with pip. You can check which environment is active by looking at the output of conda info.
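Besides reading conda info, you can compare the location of pip with the CONDA_PREFIX variable, which conda activate sets to the active environment's directory. The helper below is hypothetical: check_pip_location is a minimal sketch for bash-like shells on Linux and macOS, and the paths differ on Windows. It also reads CONDA_DEFAULT_ENV, the name of the active environment, so that it can warn when base is active. It warns when the first pip on your PATH does not live inside the active environment. That is the situation you get, for example, if the environment was created without Python and pip silently falls back to another copy.

```shell
# Hypothetical helper: check that `pip` belongs to the active Conda environment.
check_pip_location() {
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "no Conda environment is active"
    return 1
  fi
  if [ "${CONDA_DEFAULT_ENV:-}" = "base" ]; then
    echo "warning: the base environment is active"
    return 1
  fi
  local pip_path
  pip_path="$(command -v pip || true)"
  # The quoted prefix is matched literally; only /* acts as a pattern.
  case "$pip_path" in
    "$CONDA_PREFIX"/*) echo "ok: $pip_path" ;;
    *)
      echo "warning: pip is '${pip_path:-not found}', outside $CONDA_PREFIX"
      return 1
      ;;
  esac
}
```

Run check_pip_location right after conda activate and before any pip install.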
Installing packages into Conda environments using pip
Combo is a comprehensive Python toolbox for combining machine learning models and scores. Model combination can be considered a subtask of ensemble learning, and has been widely used in real-world tasks and data science competitions like Kaggle.
Activate the machine-learning-env you created in a previous challenge and use pip to install combo.
Solution
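The following commands will activate the machine-learning-env and install combo. The version specifier is quoted so that the shell (zsh in particular) does not try to expand the * as a filename pattern.
$ conda activate machine-learning-env
$ pip install "combo==0.1.*"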
For more details on using pip, see the official documentation.
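A related habit is to call pip through the environment's interpreter with python -m pip. This runs the pip module that belongs to whichever python is first on your PATH, and after conda activate that is the environment's Python. The wrapper below is hypothetical: env_pip_install is a minimal sketch. It adds a guard so that nothing is installed unless a Conda environment other than base is active. It relies on the CONDA_PREFIX and CONDA_DEFAULT_ENV variables that conda activate sets.

```shell
# Hypothetical wrapper: install with the active environment's pip,
# refusing to run outside a project environment.
env_pip_install() {
  if [ -z "${CONDA_PREFIX:-}" ] || [ "${CONDA_DEFAULT_ENV:-}" = "base" ]; then
    echo "activate a project environment (not base) first" >&2
    return 1
  fi
  # python -m pip uses the pip that belongs to the active interpreter.
  python -m pip install "$@"
}
```

For the exercise above, this would be conda activate machine-learning-env followed by env_pip_install "combo==0.1.*".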
Key Points
A package is a compressed archive file containing system-level libraries, Python or other modules, executable programs and other components, and associated metadata.
A Conda channel is a URL to a directory containing one or more Conda packages.
Conda and Pip can be used together effectively.