September 10, 2024
Getting Started with Slurm on ERISTwo
Queuing system (Slurm)
SLURM (Simple Linux Utility for Resource Management) is a scheduler that allocates resources to submitted compute jobs. As with LSF on ERIStwo, this is a free service available to all users.
Partitions
SLURM's partitions are similar to the 'queues' of the LSF job scheduler previously deployed on the Scientific Computing (SciC) Linux Clusters. Each partition has its own dedicated resources, such as number of nodes, maximum run time, GPUs, CPUs, memory, etc.
To view the list of available partitions, execute the command:
$ sinfo -s
Partition | Max Time | Max Cores | Memory |
---|---|---|---|
Short | 3 h | 80 | < 360 GB |
Normal | 120 h | 80 | < 360 GB |
Bigmem | 192 h | 88 | 360 GB - 1 TB |
Long | 384 h | 80 | < 360 GB |
Filemove | - | 8 | < 130 GB |
There are also group-specific partitions; you will only have visibility of the ones you are authorized to use. For additional information on a specific partition, execute the command:
$ sinfo -p <partition_name>
Submitting jobs
Most commonly, SLURM jobs are submitted in batch mode using the command 'sbatch' and a script file with #SBATCH directives specified at the top. SLURM jobs accept the following directives to request resources:
Job name | #SBATCH --job-name=My-Job_Name |
---|---|
Wall time | #SBATCH --time=24:00:00 or -t [days-hh:mm:ss] |
Number of nodes | #SBATCH --nodes=1 |
Number of tasks per node | #SBATCH --ntasks-per-node=24 |
Number of CPUs per task | #SBATCH --cpus-per-task=24 |
Number of GPUs | #SBATCH --gpus=3 |
Send mail at end of the job | #SBATCH --mail-type=end |
User's email address | #SBATCH --mail-user=<email> |
Working directory | #SBATCH --chdir=dir-name |
Memory size | #SBATCH --mem=<size>[M|G|T] or --mem-per-cpu=<size>[M|G|T] |
Partition | #SBATCH --partition=<name> |
Job arrays | #SBATCH --array=<array_spec> |
More details about each of these settings can be found in the official SLURM documentation. In SLURM, each task represents a collection of resources; for example, the user can specify the number of CPUs per task and the memory per task. The main constraint is that each task must fit on a single node (i.e. a task cannot claim more CPUs than are available on one node).
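The interplay between tasks and CPUs per task can be sketched in a job script. A minimal sketch (hypothetical script, not from the original document; outside of Slurm the #SBATCH lines are plain comments and the SLURM_* variables are unset, so the fallback values below stand in for the requested ones):

```shell
#!/bin/bash
# Hypothetical sketch: how --ntasks and --cpus-per-task combine.
# Under Slurm, the directives below request 2 tasks with 4 CPUs each.
# Run outside Slurm, they are ordinary comments and the SLURM_* variables
# are unset, so the fallbacks below mirror the requested values.
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1G

ntasks="${SLURM_NTASKS:-2}"
cpus_per_task="${SLURM_CPUS_PER_TASK:-4}"
echo "total CPUs requested: $(( ntasks * cpus_per_task ))"
```

The total CPU footprint of the job is the product of the two settings, which is why a single task's CPU count must fit on one node.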
To submit a job script:
$ sbatch <path-to-script>
After submitting your jobs, always check that your jobs have been submitted successfully.
Check job queue status:
$ squeue
Or for a list of recently submitted user jobs:
$ sacct
Check the job in detail:
$ scontrol show job <job_ID>
Slurm job states, codes, and explanations
When you request status information on your job, you can get one of the following:
Status | Code | Explanation |
---|---|---|
COMPLETED | CD | The job has completed successfully. |
COMPLETING | CG | The job is finishing but some processes are still active. |
FAILED | F | The job terminated with a non-zero exit code and failed to execute. |
PENDING | PD | The job is waiting for resource allocation. It will eventually run. |
PREEMPTED | PR | The job was terminated because of preemption by another job. |
RUNNING | R | The job is currently allocated to a node and is running. |
SUSPENDED | S | A running job has been stopped with its cores released to other jobs. |
STOPPED | ST | A running job has been stopped with its cores retained. |
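When post-processing `squeue` or `sacct` output in a script, the short codes can be expanded to their full names. A minimal sketch (`state_name` is a hypothetical helper, not a Slurm command; the mapping mirrors the table above):

```shell
#!/bin/bash
# Hypothetical helper: expand a Slurm short state code to its full name,
# mirroring the state table above.
state_name() {
  case "$1" in
    CD) echo "COMPLETED"  ;;
    CG) echo "COMPLETING" ;;
    F)  echo "FAILED"     ;;
    PD) echo "PENDING"    ;;
    PR) echo "PREEMPTED"  ;;
    R)  echo "RUNNING"    ;;
    S)  echo "SUSPENDED"  ;;
    ST) echo "STOPPED"    ;;
    *)  echo "UNKNOWN"    ;;
  esac
}

state_name PD   # prints PENDING
state_name R    # prints RUNNING
```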
To cancel or kill a job, execute the command:
$ scancel <jobID>
Handy LSF to Slurm Reference
The table below provides a convenient reference to assist users in the transition from LSF to SLURM. Please note that some specifications are not always 'one-to-one' (e.g. the specification of tasks as noted above).
Commands
LSF | Slurm | Description |
---|---|---|
bsub < script_file | sbatch script_file | Submit a job from script_file |
bkill 123 | scancel 123 | Cancel job 123 |
bjobs | squeue | List user's pending and running jobs |
bqueues | sinfo or sinfo -s | Cluster status with partition (queue) list; with '-s' a summarised partition list, which is shorter and simpler to interpret |
Job Specification
LSF | Slurm | Description |
---|---|---|
#BSUB | #SBATCH | Scheduler directive |
-q queue_name | -p queue_name | Queue to 'queue_name' |
-n 64 | -n 64 | Processor count of 64 |
-W [hh:mm:ss] | -t [minutes] or -t [days-hh:mm:ss] | Max wall run time |
-o file_name | -o file_name | STDOUT output file |
-e file_name | -e file_name | STDERR output file |
-J job_name | --job-name=job_name | Job name |
-x | --exclusive | Exclusive node usage for this job - i.e. no other jobs on same nodes |
-M 128 | --mem-per-cpu=128M or --mem-per-cpu=1G | Memory requirement |
-R "span[ptile=16]" | --ntasks-per-node=16 | Processes per node |
-P proj_code | --account=proj_code | Project account to charge job to |
-J "job_name[array_spec]" | --array=array_spec | Job array declaration |
Job Environment Variables
LSF | Slurm | Description |
---|---|---|
$LSB_JOBID | $SLURM_JOBID | Job ID |
$LSB_SUBCWD | $SLURM_SUBMIT_DIR | Submit directory |
$LSB_JOBID | $SLURM_ARRAY_JOB_ID | Job Array Parent |
$LSB_JOBINDEX | $SLURM_ARRAY_TASK_ID | Job Array Index |
$LSB_SUB_HOST | $SLURM_SUBMIT_HOST | Submission Host |
$LSB_HOSTS / $LSB_MCPU_HOST | $SLURM_JOB_NODELIST | Allocated compute nodes |
$LSB_DJOB_NUMPROC | $SLURM_NTASKS | Number of processors allocated (mpirun can pick this up automatically from Slurm; it does not need to be specified) |
$LSB_QUEUE | $SLURM_JOB_PARTITION | Queue (partition) |
Note that, similarly to LSF, %j can be used to insert the job ID into the names of the output and error log files (see Example 2 below).
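The variables in the table can be used directly inside a job script, for example to record the job's context at the top of its log. A minimal sketch (hypothetical script; run outside of Slurm the variables are unset, so the placeholder fallbacks are printed instead):

```shell
#!/bin/bash
#SBATCH --job-name=env_demo
#SBATCH --output=env-demo-%j.out
#SBATCH --error=env-demo-%j.err

# Hypothetical sketch: log where and how the job ran.
# Outside of Slurm these variables are unset, so fallbacks are printed.
echo "job id:     ${SLURM_JOBID:-none}"
echo "submit dir: ${SLURM_SUBMIT_DIR:-$PWD}"
echo "partition:  ${SLURM_JOB_PARTITION:-none}"
echo "node list:  ${SLURM_JOB_NODELIST:-localhost}"
echo "tasks:      ${SLURM_NTASKS:-1}"
```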
Types of SLURM Job Submissions
In addition to 'sbatch', SLURM offers other modes of job submission, which can be summarised as follows:
i) sbatch – Submit a script for later execution (batch mode)
ii) salloc – Create a job allocation and start a shell to use it (interactive mode)
iii) srun – Create a job allocation (if needed) and launch a job step, for example within an sbatch or interactive (salloc) job.
● srun can use a subset or all of the job's resources.
iv) sattach – Connect stdin/stdout/stderr to an existing job step
Example SLURM Job Submission
EXAMPLE 1: Interactive JOB
An interactive session on one of the compute nodes can most conveniently be initiated by invoking the following:
srun --pty -p interactive /bin/bash
Additional resources (e.g. memory, CPUs) can be specified using the settings indicated in the table above. Alternatively, an interactive session can be initiated by using salloc.
EXAMPLE 2: SINGLE CPU JOB
In the example below, a simple Python program is submitted as a SLURM job to the ‘short’ partition using ‘sbatch <file_name>’ where 1 CPU and 1GB of memory are requested.
#!/bin/bash
#SBATCH --job-name=single_cpu_example
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --output=log%j.out
#SBATCH --error=log%j.err
# Load conda with the module of miniforge3 distribution
module load miniforge3
# Run a demo python program on a single CPU
python hello.py
EXAMPLE 3: MULTIPLE CPU JOB
In the example below, a Python program is submitted as a SLURM job to the ‘normal’ partition using ‘sbatch <file_name>' where 4 CPUs and 1GB of memory per CPU are requested.
#!/bin/bash
#SBATCH --job-name=multi_cpu_example
#SBATCH --partition=normal
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --output=log%j.out
#SBATCH --error=log%j.err
# Load conda with the module of miniforge3 distribution
module load miniforge3
# Run a python program on 4 CPUs that splits the iterations of the loop among each CPU
python square.py
EXAMPLE 4: SINGULARITY JOB
In the example below, a Python program is run upon the startup of a singularity container and is submitted as a SLURM job to the ‘short’ partition using ‘sbatch <file_name>’.
#!/bin/bash
#SBATCH --job-name=singularity_example
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2G
#SBATCH --output=singularity-log%j.out
#SBATCH --error=singularity-log%j.err
# Check available versions of singularity with ‘module avail singularity’
module load singularity
# Run application using singularity
singularity exec conda-miniforge3.sif python3 pyTest.py
EXAMPLE 5: ARRAY JOB USING R
In the example below, an R script is submitted as a SLURM job to the ‘filemove’ partition using ‘sbatch <file_name>'.
#!/bin/bash
#SBATCH --job-name=hello-parallel-test
#SBATCH --partition=filemove
#SBATCH --array=1-5
#SBATCH --ntasks=2
#SBATCH --mem-per-cpu=1G
#SBATCH --output=hello-%j-%a.out
#SBATCH --error=hello-%j-%a.err
# Check available versions of R using ‘module avail R’
module load R/3.5.1-foss-2018b
# Run application using R passing in the array ID, corresponding to $SLURM_ARRAY_TASK_ID, as a command line argument
Rscript hello-parallel.R $SLURM_ARRAY_TASK_ID
The example below is the simple R script used in Example 5.
hello-parallel.R
#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
print(paste0('Hello! I am task number: ', args[1]))
vector = c(1, 10, 100, 1000, 10000)
multiply = function(x, y) {
return(x*y)
}
num = as.integer(args[1])
res = multiply(vector[num],num)
print(paste0('Result: ', res))
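To sanity-check the array logic without submitting to the scheduler, the task index can be supplied by hand, with a local loop standing in for the five array elements. A minimal sketch (a simulation only; the Rscript command is echoed rather than executed, with $i playing the role of $SLURM_ARRAY_TASK_ID):

```shell
#!/bin/bash
# Local simulation of the 5-element array: each loop iteration stands in
# for one array task, with $i playing the role of $SLURM_ARRAY_TASK_ID.
# (Rscript is not invoked here; the command is only echoed.)
for i in 1 2 3 4 5; do
  echo "task $i -> would run: Rscript hello-parallel.R $i"
done
```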