18 Using Mamba/Conda
18.1 What is mamba/anaconda
Conda is an open-source, cross-platform package and environment management system that simplifies the installation, updating, and management of software packages and their dependencies. It allows users to create isolated environments for different projects, ensuring that conflicting package versions do not interfere with each other. As we have not purchased a license from Anaconda, we rely on the open-source mamba package from the conda-forge community channel, which serves as a faster, C++-based, and fully compatible drop-in replacement for the conda command.
18.2 Install Mamba
See Section 1.6
18.3 Channels
18.3.1 What is a channel
In conda, a channel is a URL to a repository where packages are stored. When you install a package, conda searches the configured channels to find the necessary files. The order of the channels in your configuration file, ~/.condarc, sets their priority: when the same package is available from more than one channel, the channel listed first is preferred.
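As an illustration of channel ordering, the sketch below writes an example configuration to a scratch file instead of touching your real ~/.condarc. The file name condarc.example is hypothetical, and the strict priority setting is an assumption rather than a site requirement; the channel order shown (conda-forge first, then bioconda) is the one the bioconda project recommends.

```shell
# Write an example configuration to a scratch file rather than ~/.condarc.
cat > condarc.example <<'EOF'
channels:
  - conda-forge
  - bioconda
channel_priority: strict
EOF

# Review the result before copying any of it into ~/.condarc by hand.
cat condarc.example
```

Channels listed earlier take precedence, so here conda-forge is consulted before bioconda.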
18.3.2 Open-source channels
Anaconda considers Company a business and requires that we have an enterprise license to use their channels. We have opted to use open-source channels instead. Businesses without an Anaconda license can use conda-forge, a community-driven channel with a vast collection of open-source software. This channel is not maintained by Anaconda and is free for all to use. Other community-maintained channels, such as bioconda for bioinformatics software, are also available and free to use. These channels are hosted on anaconda.org, but are not subject to the same licensing restrictions as Anaconda’s defaults channel.
Do not use the defaults or R channels, as these are not open-source channels and require a paid Anaconda license for businesses.
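One way to check an existing configuration for these channels is sketched below. The helper name check_channels is hypothetical, and the check is deliberately simple: it only looks for list entries named defaults or r in the file it is given, so it will not notice channels passed with -c on the command line or written as full URLs.

```shell
# Hypothetical helper: returns 0 when the given condarc-style file does NOT
# list the defaults or R channel, and 1 when it does.
check_channels() {
  local file="$1"
  # A missing file means nothing is configured there.
  [ -f "$file" ] || return 0
  # Match list entries such as "  - defaults" or "  - r" (case-insensitive).
  if grep -Eiq '^[[:space:]]*-[[:space:]]*(defaults|r)[[:space:]]*$' "$file"; then
    echo "WARNING: $file lists the defaults or R channel" >&2
    return 1
  fi
  echo "OK: $file does not list the defaults or R channel"
}

check_channels "$HOME/.condarc" || echo "Remove those entries before installing packages."
```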
18.4 Create a conda environment
Important Note About Channels
- Do not use the defaults or R channels in conda, as these require a paid Anaconda license for businesses.
Say I want to run the DNA variant analysis software Slivar. Since it's a bioinformatics tool, let's search the bioconda channel.
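A search of this kind might look like the sketch below. The underlying command is mamba search -c bioconda slivar; the wrapper function search_bioconda is a hypothetical convenience that only adds a check that mamba is on the PATH.

```shell
# Hypothetical wrapper: search the bioconda channel for a package by name.
search_bioconda() {
  local package="$1"
  if ! command -v mamba >/dev/null 2>&1; then
    echo "mamba is not on the PATH; install it first" >&2
    return 127
  fi
  # -c bioconda restricts the lookup to packages published on bioconda.
  mamba search -c bioconda "$package"
}

search_bioconda slivar || echo "search did not complete"
```

If the package exists, the output lists the versions and builds available in that channel.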
So, I have learned that a conda package exists for Slivar, and it is published in the bioconda channel. Now I can create a command to make a new environment with Slivar installed. The -n slivar means I want to give the environment the (arbitrary) name slivar, the -c bioconda means to look in the bioconda channel for packages, and the slivar at the end is the package that I want to install. The command might take a few minutes and will prompt me to type y to continue.
    mamba create -n slivar -c bioconda slivar
Now when I want to use Slivar I do
    mamba activate slivar
I can see (slivar) next to my bash prompt now, indicating I am in the slivar environment that I made. Now I can use Slivar.
    slivar --help
You can install more than one tool into an environment. For example, if I wanted to add BCFtools to my slivar environment, I would run the following command while the environment was active:
    mamba install -c bioconda bcftools
When I am done using or updating it, I can exit the environment.
    mamba deactivate
If I forget where I have put the environments that I have made, I can list them.
    mamba env list
For more details, the Managing environments page is very helpful. In particular, you might look at the --prefix option for saving an environment in a particular location (like with the project that it is used for), and the -f option for creating an environment from a yaml file (making it easy to share and reproduce). I recommend keeping separate environments for separate tasks as much as possible, since it is easy to switch between environments. However, if you have multiple programs that will be piped together, or just used together frequently, it makes sense to put them together in one environment.
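To make the yaml-file approach concrete, here is a minimal sketch that writes an environment file for the Slivar example. The file name environment.yml is the conventional one; the package list simply mirrors the tools mentioned above, and leaving the versions unpinned is an assumption made for brevity.

```shell
# Write a shareable description of the environment to environment.yml.
cat > environment.yml <<'EOF'
name: slivar
channels:
  - conda-forge
  - bioconda
dependencies:
  - slivar
  - bcftools
EOF

# Show the file so it can be reviewed or committed alongside the project.
cat environment.yml
```

With mamba available, the environment could then be created with mamba env create -f environment.yml, or placed next to a project with mamba env create --prefix ./env -f environment.yml and activated with mamba activate ./env (the ./env path is only an example).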
18.5 Using conda in a batch script
When you initialize Mamba (or Conda), it modifies your ~/.bashrc file to add necessary environment variables and functions that allow commands like mamba activate to work correctly. Slurm job scripts are usually run as non-interactive shells. By default, bash does not automatically source ~/.bashrc for non-interactive shells. By explicitly including source ~/.bashrc at the beginning of your Slurm job script, you ensure that the Mamba initialization code within your ~/.bashrc is executed, setting up the necessary environment for Mamba commands to function within the job.
See Section 11.5 for an example.
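As a rough sketch of the pattern described above, the snippet below writes a small job script and checks its syntax. The file name slivar_job.sh and the #SBATCH values are placeholders, not site recommendations, and the environment name slivar assumes the environment created earlier in this chapter. Note that some ~/.bashrc files return early for non-interactive shells; if yours does, the Mamba initialization block has to sit above that check for this to work.

```shell
# Write an example Slurm job script (resource values are placeholders).
cat > slivar_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=slivar-help
#SBATCH --time=00:10:00
#SBATCH --mem=1G

# Load the Mamba initialization code that a non-interactive shell skips.
source ~/.bashrc

# Activate the environment, then run the tool inside it.
mamba activate slivar
slivar --help
EOF

# Syntax-check the script without running it.
bash -n slivar_job.sh && echo "slivar_job.sh parses cleanly"
```

The job would then be submitted with sbatch slivar_job.sh.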
18.6 Additional help
RSC hosted a lunch-and-learn session that covers best practices for building and maintaining conda environments in more detail. We highly recommend that you review it: video - slide deck