Conda virtual environments on ErisXDL

Anaconda is a distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, etc.) that aims to simplify package management and deployment. Most data science and machine learning packages can be obtained from Anaconda.

A main component of Anaconda is the package management system conda, which downloads packages and ensures that their dependencies are met. Because Python packages can have complicated dependencies that may conflict with other applications, conda lets you create virtual environments. The purpose of a virtual environment is to provide everything a project needs in terms of dependencies and applications without interfering with other installations on the computer system. However, unlike a container, software in a virtual environment still has access to the overall system and system tools.

How to use Anaconda in any ERIS HPC environment

Anaconda is installed in multiple versions on the system. ERISXdl pulls from the existing ERISTwo modules and can use any of the Anaconda modules available there. Anaconda can be loaded for the first time via the module files:

$ module load anaconda/<version>

Conda can then be used. For versions >=4.4, you will also need to run the following command and then log out and back in to fully initialize conda:

$ conda init bash
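
After logging back in, a quick check along the following lines can confirm that conda is available in the current shell. This is a minimal sketch: check_conda is a hypothetical helper name, and it relies only on the standard conda --version command.

```shell
# Sketch: confirm that conda is usable in the current shell.
# check_conda is a hypothetical helper, not part of conda or the cluster tooling.
check_conda() {
    if command -v conda >/dev/null 2>&1; then
        # Print the conda version that this shell resolves to.
        conda --version
    else
        echo "conda not found: run 'module load anaconda/<version>' first" >&2
        return 1
    fi
}

check_conda || echo "conda is not available in this shell yet"
```

If the check fails right after running conda init bash, logging out and back in (as described above) is the first thing to try.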

To create an environment use:

$ conda create -n <env_name>

This will create an environment with the base settings. To include specific programs or versions, add them at this point:

$ conda create -n <env_name> python=3.8 numpy pandas

This would create an environment with Python version 3.8 and the pandas and numpy libraries installed. It will take some time to create the environment, and you might be asked to confirm the installation. The environment can then be activated by:

$ conda activate <env_name>
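
Before creating or activating an environment, it can help to check whether an environment of that name already exists. The sketch below parses the output of conda env list. The helper name env_exists is hypothetical, and the parsing assumes the usual layout in which the environment name is the first column.

```shell
# Sketch: test whether a named conda environment already exists.
# env_exists is a hypothetical helper; it assumes 'conda env list' prints
# one environment per line with the name in the first column.
env_exists() {
    conda env list | awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Only run the check when conda is actually available in this shell.
if command -v conda >/dev/null 2>&1; then
    if env_exists tf_xdl; then
        echo "environment tf_xdl already exists"
    else
        echo "environment tf_xdl does not exist yet"
    fi
fi
```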

Once the environment is activated, the installed packages will be taken from the environment, not from the general system. Other programs/packages can be installed as well:

(env_name)$ conda install <package>

For python programs, pip is available within the conda environment too:

(env_name)$ pip install <package>

Pip is also a package management system, mainly for Python. Conda and pip differ in how they handle dependency conflicts, which is why we recommend staying with conda where possible; however, some packages are only available through pip.
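
When both tools have been used in one environment, it is useful to know which packages came from pip. The following sketch filters the output of conda list. It assumes a reasonably recent conda version, which reports pip-installed packages with pypi in the Channel column. The helper name list_pip_packages is hypothetical.

```shell
# Sketch: list the packages in the active environment that were installed with pip.
# list_pip_packages is a hypothetical helper; it assumes 'conda list' marks
# pip-installed packages with "pypi" in the last (Channel) column.
list_pip_packages() {
    conda list | awk '!/^#/ && $NF == "pypi" { print $1 }'
}

# Only run the listing when conda is actually available in this shell.
if command -v conda >/dev/null 2>&1; then
    list_pip_packages
fi
```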

How to use conda environments in ERISXdl job submissions

Conda environments that are used on ERISXdl should be created on ERISXdl, because some optimizations are hardware / environment dependent. You should also make sure to install the GPU-enabled version of packages (if available) into the environment. To take TensorFlow as an example, the environment creation would be:

$ module load anaconda/<version>
$ conda create -n tf_xdl python=3.8
$ conda activate tf_xdl
(tf_xdl)$ conda install cuda
(tf_xdl)$ conda install matplotlib
(tf_xdl)$ conda install tensorflow-gpu
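
To confirm that the GPU-enabled build is in use, TensorFlow can be asked which GPUs it sees. The sketch below assumes TensorFlow 2.x, where tf.config.list_physical_devices is available, and is meant to be run inside the activated environment on a node that has GPUs; elsewhere the printed list would be expected to be empty. The helper name check_tf_gpu is hypothetical.

```shell
# Sketch: ask TensorFlow which GPUs it can see.
# check_tf_gpu is a hypothetical helper; it assumes TensorFlow 2.x.
check_tf_gpu() {
    if command -v python >/dev/null 2>&1 && python -c 'import tensorflow' >/dev/null 2>&1; then
        # Prints a list of detected GPU devices (an empty list if none are visible).
        python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
    else
        echo "TensorFlow is not importable here; activate the environment first"
    fi
}

check_tf_gpu
```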

Conda in job submissions

To use the conda environment in a SLURM batch job, one needs to activate the environment before using it. For more information on using and submitting jobs through the SLURM job scheduler on ERISXdl, please read the Using SLURM article. When submitting jobs, you may wish to run code within a Docker container. For more information on containers and how to use them, please read the Using Docker containers article.

Before running jobs, please make sure that you have prepared your conda environment and any code or containers necessary for your job. If there are specific packages that need to be installed within your conda environment, please do so on the login nodes by following the package installation instructions above. Users should not run any computational jobs on the ERISXdl login nodes, but should activate conda environments there to configure their packages correctly.

Using conda without containers

If you would like to activate an existing conda environment when running a job without a container, such that the code in the script will run directly on the ERISXdl host compute node, you will just need to include the module load and the conda activate command in the bash script.

-----example.sh-----

#!/bin/bash

#SBATCH --job-name=example_gpu
#SBATCH --output=/PHShome/abc123/output/file/that/must/exist.txt
#SBATCH --partition=Basic
#SBATCH --qos=Basic

#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=0:50:00
#SBATCH --mem-per-cpu=10000

# load in your preferred anaconda version
module use /apps/modulefiles/conversion
module load anaconda/<version>

# Initialize conda
# If you have run the conda init bash command previous to your job and have not modified your .bashrc file, you can omit the command from the script
conda init bash
source ~/.bashrc

# activate your environment

conda activate tf_xdl
python example.py

This example can then be submitted via:

$ sbatch -p <partition_name> example.sh
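
Because the location given in #SBATCH --output must exist before the job is submitted, a small pre-submission step can create it. The sketch below reads the first --output line from a job script and creates its parent directory. The helper name ensure_output_dir is hypothetical and not part of SLURM, and the sketch only handles the --output=<path> spelling used in the examples here.

```shell
# Sketch: create the directory for the job's output file before submitting.
# ensure_output_dir is a hypothetical helper, not a SLURM command.
ensure_output_dir() {
    job_script="$1"
    # Take the value of the first "#SBATCH --output=..." line in the script.
    output_path=$(sed -n 's/^#SBATCH[[:space:]][[:space:]]*--output=//p' "$job_script" | head -n 1)
    # Nothing to do if the script does not set --output.
    [ -n "$output_path" ] || return 0
    mkdir -p "$(dirname "$output_path")"
}

# Example usage on a login node:
#   ensure_output_dir example.sh && sbatch -p <partition_name> example.sh
```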

Using conda environments within containers

Users may also wish to run jobs with a conda environment activated in a Docker container. To do so, users will need to create two scripts: a script to be submitted to the job scheduler that specifies a container and code to run, and a script to run within the container that will load and activate conda. In the example below, those two scripts will be specified as job_script.sh and example-script-loading-conda.sh:

job_script.sh :

#!/bin/bash

#SBATCH --job-name=test-job-conda
#SBATCH --output=/PHShome/abc123/path/to/output.txt
#SBATCH --partition=Basic
#SBATCH --qos=Basic
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

## NOTE: The output location specified above MUST exist before submitting the job

## Set the docker container image to be used in the job runtime
export KUBE_IMAGE=erisxdl.partners.org/abc123/cuda

## Specify the script to be run within the container above - this MUST be a separate script
export KUBE_SCRIPT=/PHShome/abc123/path/to/example-script-loading-conda.sh

## Required wrapper script. This must be included at the end of the job submission script.
## This wrapper script provides cluster features within the running KUBE_IMAGE container, such as
## - mounting the /apps, /data, and your /PHShome directories into the container, allowing access to files
## - providing the 'module' command to load and use modules from ERISTwo
srun /data/erisxdl/kube-slurm/wrappers/kube-slurm-lmod-incontainer-job.sh

example-script-loading-conda.sh :

#!/bin/bash

## NOTE: In order to use the 'module' command made available in containers,
## the following two lines must be included to correctly initialize the module system setup
source /etc/profile.d/lmod.sh
module use /apps/modulefiles/conversion

# Load the desired anaconda module
module load anaconda/<version>
# Initialize conda
# If you have run the conda init bash command previous to your job and have not modified your .bashrc file, you can omit the command from the script
conda init bash
source ~/.bashrc

# Activate your existing conda environment
conda activate test_xdl
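
Whether the conda init bash line can be omitted from a job script depends on whether your .bashrc already contains the block that conda init writes. The sketch below looks for the marker comment that conda places at the start of that block. The helper name bashrc_has_conda_init is hypothetical, and the check does not verify that the block is otherwise unmodified.

```shell
# Sketch: check whether ~/.bashrc already contains the block written by 'conda init'.
# bashrc_has_conda_init is a hypothetical helper; it only looks for the
# ">>> conda initialize >>>" marker comment that conda init adds.
bashrc_has_conda_init() {
    rc_file="${1:-$HOME/.bashrc}"
    [ -f "$rc_file" ] && grep -q '>>> conda initialize >>>' "$rc_file"
}

if bashrc_has_conda_init; then
    echo "conda init block found in ~/.bashrc"
else
    echo "no conda init block found; keep 'conda init bash' in the script"
fi
```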

Submitting the job from the login nodes:

$ sbatch job_script.sh

Further reading

We recommend the following to familiarize yourself with Anaconda and conda environments:

https://www.anaconda.com/

https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html

https://docs.conda.io/projects/conda/en/latest/_downloads/843d9e0198f2a193a3484886fa28163c/conda-cheatsheet.pdf

Go to KB0038606 in the IS Service Desk
