March 2, 2023
Queuing system (Slurm)
Slurm (Simple Linux Utility for Resource Management) is a scheduler that allocates resources to submitted jobs; therefore, all jobs on ERISXdl should be submitted through the Slurm scheduler. For more information on using ERISXdl, see the Getting Started article.
Partitions
Slurm’s partitions are similar to ‘queues’ in other job schedulers, such as LSF on the Scientific Computing (SciC) Linux Clusters. Each partition has its own dedicated resources and limits, such as number of nodes, run time, GPUs, CPUs, and memory.
To view the list of available partitions, execute the command:
$ sinfo
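For example, a format string can be passed to sinfo to show a single partition with its time limit, node count, and GPU (GRES) configuration. The partition name below is taken from the table that follows; the choice of columns is only an illustration:
$ sinfo -p Short -o "%P %l %D %G"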
A summary of the partitions is given below. Please note that, except for the Basic partition, all partitions require group and fund number registration before jobs can be submitted to them.
Partition | GPU limit | Max time limit | Memory limit |
Basic (Free tier) | 1 GPU | 10 min | 30G |
Short | 1 GPU | 1 hour | 60G |
Medium | 2 GPU | 4 hours | 100G |
Long | 4 GPU | 10 hours | 100G |
Mammoth | 8 GPU | 2 weeks | 400G |
* NOTE: Please do not use the 'batch' partition for job submissions on ERISXdl.
For additional information on a specific partition, execute the command:
$ spart <partition_name>
Several GPU nodes accept jobs from all partitions; currently there are five nodes, dgx-1 through dgx-5. To view the GPU card status on a given node:
$ gpulist <node_name>
Submitting jobs
To submit a job, write a bash script with the SBATCH flags specified at the top of the file. Slurm accepts the following flags to request resources (a sketch of a script header combining several of them follows the table):
Job Name | #SBATCH --job-name=My-Job_Name |
Wall time limit | #SBATCH --time=24:00:00 or -t [days-hh:mm:ss] |
Number of nodes | #SBATCH --nodes=1 |
Number of tasks per node | #SBATCH --ntasks-per-node=24 |
Number of CPU cores per task | #SBATCH --cpus-per-task=24 |
Number of GPUs | #SBATCH --gpus=3 |
Send mail at end of the job | #SBATCH --mail-type=end |
User's email address | #SBATCH --mail-user=userid@mgb.edu |
Working Directory | #SBATCH --workdir=dir-name |
Job Restart | #SBATCH --requeue |
Share Nodes | #SBATCH --shared |
Dedicated nodes | #SBATCH --exclusive |
Memory size | #SBATCH --mem=[size][M|G|T] or --mem-per-cpu=[size][M|G|T] |
Account to Charge | #SBATCH --account=[account] |
Partition | #SBATCH --partition=[name] |
Quality of Service | #SBATCH --qos=[name] |
Job Arrays | #SBATCH --array=[array_spec] |
Use specific resource | #SBATCH --constraint="XXX" |
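As a minimal sketch of how several of these flags combine in a script header (the partition, account, resource values, and e-mail address below are placeholders rather than recommended settings):
#!/bin/bash
#SBATCH --job-name=my-analysis
#SBATCH --partition=Short
#SBATCH --account=<fund number>
#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
#SBATCH --mem=8G
#SBATCH --mail-type=end
#SBATCH --mail-user=userid@mgb.edu
## commands to run follow the #SBATCH directives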
To submit a job script:
$ sbatch <path-to-script>
After submitting a job, always check that it has been submitted successfully.
Check job status:
$ squeue
View more verbose job status:
$ sjob <job_ID>
Check job in detail:
$ scontrol show job <job_ID>
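For example, sbatch prints the ID assigned to the job, which can then be passed to the status commands above (the job ID shown here is illustrative, and -u restricts squeue to your own jobs):
$ sbatch job_script.sh
Submitted batch job 123456
$ squeue -u <username>
$ scontrol show job 123456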
Slurm job status codes
When you request status information for your job, you can get one of the following:
Status | Code | Explanation |
COMPLETED | CD | The job has completed successfully. |
COMPLETING | CG | The job is finishing but some processes are still active. |
FAILED | F | The job terminated with a non-zero exit code and failed to execute. |
PENDING | PD | The job is waiting for resource allocation. It will eventually run. |
PREEMPTED | PR | The job was terminated because of preemption by another job. |
RUNNING | R | The job is currently allocated to a node and is running. |
SUSPENDED | S | A running job has been stopped with its cores released to other jobs. |
STOPPED | ST | A running job has been stopped with its cores retained. |
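Note that squeue only lists jobs that are still pending or running. Assuming job accounting is enabled on ERISXdl (an assumption; the field list below is only an illustration), sacct can report the final state and exit code of a job after it has finished:
$ sacct -j <job_ID> --format=JobID,JobName,Partition,State,ExitCode,Elapsed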
To cancel or kill a job, execute the command:
$ scancel <jobID>
Common commands in Slurm vs. LSF
Slurm | LSF | Explanation |
sbatch | bsub | Submit job |
sinfo | bqueues | List queues |
spart <partition_name> | bqueues -l <queue_name> | View queue in detail |
squeue | bjobs -u all | List status of all jobs |
scontrol show job <jobid> | bjobs -l <jobID> | Check job in detail |
scancel | bkill | Cancel or kill job |
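As an illustration of this mapping (the queue name, core count, and script name are hypothetical), an LSF submission and a rough Slurm equivalent:
$ bsub -q medium -n 4 -o output.txt ./run.sh
$ sbatch --partition=Medium --ntasks=4 --output=output.txt run.sh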
Example SLURM Job Submissions
2023/03/02: The examples below are deprecated and the text will be amended in due course. While new text is being prepared, please request a demo case for ERISXdl from hpcsupport@partners.org.
To run computational jobs with containers, users should submit a job script to the SLURM job scheduler; computational jobs within containers should not be run on the login nodes. When submitting jobs that use containers, you will need to specify both the registry location of the image and the script that should be run within the container. Keep in mind that any containers stored locally on the cluster (for example, obtained with podman pull) will not be accessible to the GPU nodes. All container images must be available in the Harbor registry or another third-party registry.
For more information on containers and using Podman on ERISXdl, see the Using Docker Containers article.
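As a minimal sketch of getting a locally built image into Harbor so that the GPU nodes can pull it (the project path under your username follows the examples below; your group's actual Harbor project name may differ, see the Using Docker Containers article):
$ podman login erisxdl.partners.org
$ podman tag alpine erisxdl.partners.org/<username>/alpine
$ podman push erisxdl.partners.org/<username>/alpine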
Example 1: Using a public image when running a job
In the following example, the public CUDA image is used to run code in the Long ‘queue’/partition. The job script to be submitted is called job_script.sh, and the script to be run within the container during the job is called example-script.sh.
job_script.sh:
#!/bin/bash
#SBATCH --job-name=test-job
#SBATCH --output=/PHShome/<username>/path/to/output.txt
#SBATCH --partition=Long
#SBATCH --gpus=3
#SBATCH --ntasks=1
#SBATCH --time=50:00
#SBATCH --mem-per-cpu=100
## The output location specified above MUST exist before submitting the job
## Define working directory to use e.g.
export KUBE_DATA_VOLUME=/data/<group briefcase>
## Set the docker container image to be used in the job runtime.
## In this example, the registry location points to the public CUDA image in Harbor
export KUBE_IMAGE=erisxdl.partners.org/nvidia/cuda
## Set the script to be run within the specified container - this MUST be a separate script
export KUBE_SCRIPT=/PHShome/<username>/path/to/example-script.sh
## Required wrapper script. This must be included at the end of the job submission script.
## This wrapper script provides cluster features within the running KUBE_IMAGE container, such as
## - mounting the /apps, /data, and your /PHShome directories into the container so that they are accessible when running the KUBE_SCRIPT
## - providing the 'module' command to load ERISTwo modules
srun /data/erisxdl/kube-slurm/wrappers/kube-slurm-lmod-incontainer-job.sh
example-script.sh:
#!/bin/bash
...
# your code here that will be run in the specified KUBE_IMAGE container
...
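As a hypothetical illustration of what example-script.sh might contain for the CUDA image (assuming the NVIDIA runtime exposes nvidia-smi inside the container; the program name is a placeholder):
#!/bin/bash
# Runs inside the KUBE_IMAGE container on the allocated GPU node
nvidia-smi
# The group briefcase is mounted into the container by the wrapper script
cd /data/<group briefcase>
# Replace with your own code
./my_gpu_program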
Submitting the job from the login nodes:
$ sbatch job_script.sh
Example 2: Using a personal image when running a job
In the following example, the alpine image belonging to the hypothetical user ‘abc123’ is used to run code in the Short ‘queue’/partition.
job_script.sh:
#!/bin/bash
#SBATCH --job-name=test-job
#SBATCH --output=/PHShome/abc123/path/to/output.txt
#SBATCH --partition=Short
#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
## The output location specified above MUST exist before submitting the job
## Define working directory to use e.g.
export KUBE_DATA_VOLUME=/data/<group briefcase>
## Set the docker container image to be used in the job runtime
export KUBE_IMAGE=erisxdl.partners.org/abc123/alpine
## Specify the script to be run within the specified container - this MUST be a separate script
export KUBE_SCRIPT=/PHShome/abc123/path/to/example-script.sh
## Required wrapper script. This must be included at the end of the job submission script.
## This wrapper script provides cluster features within the running KUBE_IMAGE container, such as
## - mounting the /apps, /data, and your /PHShome directories into the container, allowing access to files
## - providing the 'module' command to load and use modules from ERISTwo
srun /data/erisxdl/kube-slurm/wrappers/kube-slurm-lmod-incontainer-job.sh
example-script.sh:
#!/bin/bash
...
# your code here that will be run in the specified KUBE_IMAGE container
...
Submitting the job from the login nodes:
$ sbatch job_script.sh
Example 3: Using modules within containers when running a job
In the following example, the CUDA image belonging to the hypothetical user ‘abc123’ is used to run code in the Basic ‘queue’/partition. The script run within the container uses the 'module' command to load Python 3.8.2 from the ERISTwo modules. For more information on loading and using modules, see the Loading Applications article.
job_script.sh:
#!/bin/bash
#SBATCH --job-name=test-job-kube
#SBATCH --output=/PHShome/abc123/path/to/output.txt
#SBATCH --partition=Basic
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
## The output location specified above MUST exist before submitting the job.
## Define working directory to use e.g.
export KUBE_DATA_VOLUME=/data/<group briefcase>
## Set the docker container image to be used in the job runtime
export KUBE_IMAGE=erisxdl.partners.org/abc123/cuda
## Specify the script to be run within the specified container - this MUST be a separate script
export KUBE_SCRIPT=/PHShome/abc123/path/to/example-script-with-modules.sh
## Required wrapper script. This must be included at the end of the job submission script.
## This wrapper script provides cluster features within the running KUBE_IMAGE container, such as
## - mounting the /apps, /data, and your /PHShome directories into the container, allowing access to files
## - providing the 'module' command to load and use modules from ERISTwo
srun /data/erisxdl/kube-slurm/wrappers/kube-slurm-lmod-incontainer-job.sh
example-script-with-modules.sh:
#!/bin/bash
## NOTE: In order to use the 'module' command made available in containers,
## the following two lines must be included to correctly initialize the module system setup
source /etc/profile.d/lmod.sh
module use /apps/modulefiles/conversion
module load python/3.8.2
python --version
...
# your code here that will be run in the specified KUBE_IMAGE container
...
Submitting the job from the login nodes:
$ sbatch job_script.sh
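After submission, progress and results can be checked from the login nodes; for example, using the paths from the script above:
$ squeue -u abc123
$ cat /PHShome/abc123/path/to/output.txt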