This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Conda environments for effective and reproducible research: Additional Syntax

Key Points

Getting Started with Conda
  • Conda is a platform agnostic, open source package and environment management system.

  • Using a package and environment management tool facilitates portability and reproducibility of (data) science workflows.

  • Conda solves both the package and environment management problems and targets multiple programming languages. Other open source tools solve either one or the other, or target only a particular programming language.

  • Anaconda is not only for Python.

Working with Environments
  • A Conda environment is a directory that contains a specific collection of Conda packages that you have installed.

  • You create a new environment using the conda create command and remove an existing one using the conda remove command.

  • You activate (deactivate) an environment using the conda activate (conda deactivate) commands.

  • You install packages into environments using conda install.

  • Use the conda env list command to list existing environments and their respective locations.

  • Use the conda list command to list all of the packages installed in an environment.

  • Use conda [command] --help to get information on how to use conda or a specific command.

Using Conda Channels and PyPI (pip)
  • A package is a compressed archive file containing system-level libraries, Python or other modules, executable programs and other components, and associated metadata.

  • A Conda channel is a URL to a directory containing one or more Conda packages.

  • Conda and Pip can be used together effectively.

Sharing Environments

Additional Syntax

In this course we have deliberately used a single approach to creating and activating Conda environments and installing packages within them, following the pattern Create > Activate > Install. Other methods and syntax exist for combining these steps that achieve the same end result, but we kept to one pattern to avoid confusing and overwhelming participants with the multitude of options.

Creating and Installing packages in one command

It is possible to create a Conda environment and install specific Python packages within it in a single step, provided the packages come from Conda channels (pip cannot be used this way to install packages from PyPI). This removes the need to specify a version of Python to install (e.g. python=3.10), as Python will be pulled in as a dependency of the Python packages you request.

For example, to create the basic-scipy-env environment from Chapter 2 you could use the following.

$ conda create --name basic-scipy-env numba scikit-learn=1.2.0 ipython matplotlib=3.7 scipy=1.9.3
$ conda activate basic-scipy-env
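
Because no Python version was requested, it can be useful to check which one the solver pulled in. The following is a minimal sketch, assuming the basic-scipy-env environment above has already been created; the ENV_NAME variable and the guard around the conda call are illustrative additions, not part of the lesson.

```shell
# Name of the environment created above (assumed to exist already).
ENV_NAME="basic-scipy-env"

if command -v conda >/dev/null 2>&1; then
    # --full-name restricts the match to the package called exactly "python",
    # so ipython and similar names are not listed.
    conda list --name "$ENV_NAME" --full-name python \
        || echo "could not list packages in $ENV_NAME"
else
    # Illustrative fallback so the snippet is harmless without Conda.
    echo "conda is not on PATH; skipping check for $ENV_NAME"
fi
```

The version reported is whatever the solver selected to satisfy the other packages, so it may differ between machines and over time.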

Namespaces

An alternative method of specifying the channel from which you wish to install a specific package is known as “Namespaces”, where the package name is preceded by the channel name and separated from it by two colons (::).

For example, to install the polars package from the conda-forge channel within the my-final-project environment, as is done in Chapter 3, you could use the following.

$ conda create --name my-final-project python=3.10
$ conda activate my-final-project
$ conda install jupyterlab matplotlib conda-forge::polars

This can even be combined with the above example of creating and installing packages at the same time to give the following.

$ conda create --name my-final-project python=3.10 jupyterlab matplotlib conda-forge::polars
$ conda activate my-final-project
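
To make the namespace syntax easier to read, the sketch below splits a channel::package specification into its two parts using plain shell parameter expansion. It only illustrates how the syntax is laid out and is not how Conda itself parses specifications; the split_spec function name and the "configured-channels" placeholder are hypothetical.

```shell
# Split a "channel::package" spec into channel and package.
# Illustration only: Conda's own parser handles many more cases.
split_spec() {
    spec="$1"
    case "$spec" in
        *::*)
            channel="${spec%%::*}"   # text before the first "::"
            package="${spec#*::}"    # text after the first "::"
            ;;
        *)
            channel=""               # no namespace given
            package="$spec"
            ;;
    esac
    # Placeholder text marks specs that rely on the configured channels.
    echo "channel=${channel:-configured-channels} package=$package"
}

split_spec "conda-forge::polars"
split_spec "matplotlib"
```

In the commands above only polars carries a namespace, so only polars is tied to conda-forge; jupyterlab and matplotlib are resolved from whichever channels are configured.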

Glossary