ERISXdl: Using SLURM Job Scheduler

Queuing system (Slurm)

Slurm (Simple Linux Utility for Resource Management) is a scheduler that allocates resources to submitted jobs; all jobs on ERISXdl must therefore be submitted through the Slurm scheduler. For more information on using ERISXdl, see the Getting Started article.

Partitions

Slurm’s partitions are similar to ‘queues’ in other job schedulers, such as LSF on ERISOne and ERISTwo. Each partition has its own dedicated resources: number of nodes, run time, GPUs, CPUs, memory, etc.

To view the list of available partitions, execute the command:

$ sinfo

A summary of the partitions

Please note that, except for the Basic partition, all partitions require group and fund-number registration before jobs can be submitted to them.

Partition           GPU limit   Max time limit   Memory limit
Basic (free tier)   1 GPU       10 min           30 GB
Short               1 GPU       1 hour           60 GB
Medium              2 GPUs      4 hours          100 GB
Long                4 GPUs      10 hours         100 GB
Mammoth             8 GPUs      2 weeks          400 GB

* NOTE : Please do not use the 'batch' partition for job submissions on ERISXdl

For additional information on a specific partition, execute the command:

$ spart <partition_name>

Several GPU nodes accept jobs from all partitions; currently there are five nodes, dgx-1 through dgx-5. To view the GPU card status on a node:

$ gpulist <node_name>
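The listing commands above also accept standard options for narrowing their output. For example, these are stock sinfo flags (not ERISXdl-specific):

```shell
$ sinfo -p Long     # show only the Long partition
$ sinfo -N -l       # long format, one line per node
```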

Submitting jobs

To submit a job, write a bash script with #SBATCH flags specified at the top of the file. Slurm accepts the following flags to request resources:

Job name                  #SBATCH --job-name=My-Job_Name
Wall time                 #SBATCH --time=24:0:0  (or -t [days-hh:mm:ss])
Number of nodes           #SBATCH --nodes=1
Tasks per node            #SBATCH --ntasks-per-node=24
Cores per task            #SBATCH --cpus-per-task=24
Number of GPUs            #SBATCH --gpus=3
Mail at end of job        #SBATCH --mail-type=end
Email address             #SBATCH --mail-user=userid@mgb.edu
Working directory         #SBATCH --workdir=dir-name
Job restart               #SBATCH --requeue
Shared nodes              #SBATCH --shared
Dedicated node            #SBATCH --exclusive
Memory size               #SBATCH --mem=[mem|M|G|T]  (or --mem-per-cpu)
Account to charge         #SBATCH --account=[account]
Partition                 #SBATCH --partition=[name]
Quality of service        #SBATCH --qos=[name]
Job arrays                #SBATCH --array=[array_spec]
Specific resource         #SBATCH --constraint="XXX"
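Putting a few of these flags together, a minimal job script might look like the following sketch (the partition, time, and memory values are illustrative; adjust them for your workload):

```shell
#!/bin/bash

#SBATCH --job-name=flag-demo
#SBATCH --partition=Short
#SBATCH --qos=Short
#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --time=30:00
#SBATCH --mem-per-cpu=100

# Commands to run once resources are allocated
echo "Job running on $(hostname)"
```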

When specifying a partition, please make sure that the --qos (quality of service) flag is set to the same name as the partition.

To submit a job script:

$ sbatch <path-to-script>

After submitting your jobs, always check that they were submitted successfully.

Check job status:

$ squeue
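One convenient pattern, using sbatch's stock --parsable flag, is to capture the job ID at submission time and then query just that job:

```shell
$ jobid=$(sbatch --parsable job_script.sh)   # prints only the job ID
$ squeue -j "$jobid"                         # status of that job alone
$ squeue -u $USER                            # or list all of your own jobs
```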

View more verbose job status:

$ sjob <job_ID>

Check job in detail:

$ scontrol show job <job_ID>
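Since `scontrol show job` prints KEY=VALUE pairs, its output is easy to filter in plain bash. The sketch below parses a sample line of that form (the sample text is illustrative, not captured from ERISXdl):

```shell
# Sample line in the KEY=VALUE format that `scontrol show job` emits
sample="JobState=RUNNING Reason=None Dependency=(null)"

# Pull out the value of the JobState field
state=$(printf '%s\n' "$sample" | grep -o 'JobState=[A-Z]*' | cut -d= -f2)
echo "$state"   # prints: RUNNING
```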

Slurm Job status, code, and explanation

When you request the status of your job, you may see one of the following states:

State        Code   Explanation
COMPLETED    CD     The job completed successfully.
COMPLETING   CG     The job is finishing, but some processes are still active.
FAILED       F      The job terminated with a non-zero exit code.
PENDING      PD     The job is waiting for resource allocation; it will eventually run.
PREEMPTED    PR     The job was terminated because it was preempted by another job.
RUNNING      R      The job is allocated to a node and is running.
SUSPENDED    S      A running job has been stopped, with its cores released to other jobs.
STOPPED      ST     A running job has been stopped, with its cores retained.
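For scripting around squeue output, the short codes above can be mapped back to their full state names with a small helper. This is a plain-bash sketch, not a Slurm-provided function:

```shell
# Map a short Slurm state code to its full state name
state_name() {
  case "$1" in
    CD) echo "COMPLETED"  ;;
    CG) echo "COMPLETING" ;;
    F)  echo "FAILED"     ;;
    PD) echo "PENDING"    ;;
    PR) echo "PREEMPTED"  ;;
    R)  echo "RUNNING"    ;;
    S)  echo "SUSPENDED"  ;;
    ST) echo "STOPPED"    ;;
    *)  echo "UNKNOWN"    ;;
  esac
}

state_name PD   # prints: PENDING
```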

To cancel or kill a job, execute the command:

$ scancel <jobID>

Common commands in Slurm vs. LSF

Slurm                       LSF                        Explanation
sbatch                      bsub                       Submit a job
sinfo                       bqueues                    List queues/partitions
spart <partition_name>      bqueues -l <queue_name>    View a queue in detail
squeue                      bjobs -u all               List the status of all jobs
scontrol show job <jobID>   bjobs -l <jobID>           Check a job in detail
scancel <jobID>             bkill <jobID>              Cancel or kill a job

Example SLURM Job Submissions

To run computational jobs with containers, submit a job script to the Slurm scheduler; computational jobs within containers should not be run on the login nodes. When submitting a job that uses a container, you must specify both the registry location of the image and the code to be run within the container. Keep in mind that containers stored locally on the cluster via podman pull are not accessible to the GPU nodes; all containers must be available in Harbor or another third-party registry.
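For instance, a personal image can be tagged and pushed to the Harbor registry before being referenced in a job script. The project path shown is illustrative; substitute your own username:

```shell
$ podman login erisxdl.partners.org
$ podman tag alpine erisxdl.partners.org/<username>/alpine
$ podman push erisxdl.partners.org/<username>/alpine
```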

For more information on containers and using Podman on ERISXdl, see the Using Docker Containers article.

Example 1: Using a public image when running a job

In the following example, the public CUDA image is used to run code in the Long ‘queue’/partition. The job script to be submitted is called job_script.sh, and the script to be run within the container during the job is called example-script.sh.

job_script.sh :

#!/bin/bash

#SBATCH --job-name=test-job
#SBATCH --output=/PHShome/<username>/path/to/output.txt
#SBATCH --partition=Long
#SBATCH --qos=Long
#SBATCH --gpus=3
#SBATCH --ntasks=1
#SBATCH --time=50:00
#SBATCH --mem-per-cpu=100

## The output location specified above MUST exist before submitting the job
## The partition and qos flags MUST be set to the same partition

## Set the docker container image to be used in the job runtime.
## In this example, the registry location points to the public CUDA image in Harbor
export KUBE_IMAGE=erisxdl.partners.org/nvidia/cuda

## Set the script to be run within the specified container - this MUST be a separate script
export KUBE_SCRIPT=/PHShome/<username>/path/to/example-script.sh

## Required wrapper script. This must be included at the end of the job submission script.
## This wrapper script provides cluster features within the running KUBE_IMAGE container, such as
## - mounting the /apps, /data, and your /PHShome directories into the container so that they are accessible when running the KUBE_SCRIPT
## - providing the 'module' command to load ERISTwo modules
srun /data/erisxdl/kube-slurm/wrappers/kube-slurm-lmod-incontainer-job.sh

example-script.sh

#!/bin/bash

...

# your code here that will be run in the specified KUBE_IMAGE container

...

Submitting the job from the login nodes:

$ sbatch job_script.sh

Example 2: Using a personal image when running a job

In the following example, the hypothetical user ‘abc123’’s alpine image is used to run code in the Short ‘queue’/partition.

job_script.sh :

#!/bin/bash

#SBATCH --job-name=test-job
#SBATCH --output=/PHShome/abc123/path/to/output.txt
#SBATCH --partition=Short
#SBATCH --qos=Short
#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

## The output location specified above MUST exist before submitting the job
## The partition and qos flags MUST be set to the same partition

## Set the docker container image to be used in the job runtime
export KUBE_IMAGE=erisxdl.partners.org/abc123/alpine

## Specify the script to be run within the specified container - this MUST be a separate script
export KUBE_SCRIPT=/PHShome/abc123/path/to/example-script.sh

## Required wrapper script. This must be included at the end of the job submission script.
## This wrapper script provides cluster features within the running KUBE_IMAGE container, such as
## - mounting the /apps, /data, and your /PHShome directories into the container, allowing access to files
## - providing the 'module' command to load and use modules from ERISTwo
srun /data/erisxdl/kube-slurm/wrappers/kube-slurm-lmod-incontainer-job.sh

example-script.sh

#!/bin/bash

...

# your code here that will be run in the specified KUBE_IMAGE container

...

Submitting the job from the login nodes:

$ sbatch job_script.sh

Example 3: Using modules within containers when running a job

In the following example, the hypothetical user ‘abc123’’s CUDA image is used to run code in the Basic ‘queue’/partition. The script run within the container uses the 'module' command to load Python 3.8.2 from the ERISTwo modules. For more information on loading and using modules, see the Loading Applications article.

job_script.sh :

#!/bin/bash

#SBATCH --job-name=test-job-kube
#SBATCH --output=/PHShome/abc123/path/to/output.txt
#SBATCH --partition=Basic
#SBATCH --qos=Basic
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

## The output location specified above MUST exist before submitting the job
## The partition and qos flags MUST be set to the same partition

## Set the docker container image to be used in the job runtime
export KUBE_IMAGE=erisxdl.partners.org/abc123/cuda

## Specify the script to be run within the specified container - this MUST be a separate script
export KUBE_SCRIPT=/PHShome/abc123/path/to/example-script-with-modules.sh

## Required wrapper script. This must be included at the end of the job submission script.
## This wrapper script provides cluster features within the running KUBE_IMAGE container, such as
## - mounting the /apps, /data, and your /PHShome directories into the container, allowing access to files
## - providing the 'module' command to load and use modules from ERISTwo
srun /data/erisxdl/kube-slurm/wrappers/kube-slurm-lmod-incontainer-job.sh

example-script-with-modules.sh :

#!/bin/bash

## NOTE: In order to use the 'module' command made available in containers,
## the following two lines must be included to correctly initialize the module system setup
source /etc/profile.d/lmod.sh
module use /apps/modulefiles/conversion

module load python/3.8.2
python --version

...

# your code here that will be run in the specified KUBE_IMAGE container

...

Submitting the job from the login nodes:

$ sbatch job_script.sh
Go to KB0038883 in the IS Service Desk