Getting Started with Slurm on ERISTwo


Queuing system (Slurm)

SLURM (Simple Linux Utility for Resource Management) is a scheduler that allocates resources for submitted compute jobs. As with LSF, this is a free service available to all users.

Partitions

SLURM's partitions are similar to the ‘queues’ of the LSF job scheduler previously deployed on the Scientific Computing (SciC) Linux Clusters. Each partition has its own resource limits, such as the number of nodes, maximum run time, GPUs, CPUs and memory.

To view the list of available partitions, execute the command:

$ sinfo -s
Partition   Max Time   Max Cores   Memory
Short       3 h        80          < 360 GB
Normal      120 h      80          < 360 GB
Bigmem      192 h      88          360 GB - 1 TB
Long        384 h      80          < 360 GB
Filemove    -          8           < 130 GB

 

There are also some group-specific partitions; you will only have visibility of the ones you are authorized to use. For additional information on a specific partition, execute the command:

$ sinfo -p <partition_name>
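
For example, to see a partition's time limit and the CPUs and memory available per node, the sinfo output format can be customised (a brief sketch; ‘short’ is used only as an illustrative partition name):

$ sinfo -p short -o "%P %l %c %m"   # partition, time limit, CPUs per node, memory per node (MB)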


Submitting jobs

Most commonly, SLURM jobs are submitted in batch mode using the command 'sbatch' together with a job script that has #SBATCH directives at the top of the file. The following flags can be used to request resources:

Job name                      #SBATCH --job-name=My-Job_Name
Wall time                     #SBATCH --time=24:00:00  or  --time=[days-hh:mm:ss]
Number of nodes               #SBATCH --nodes=1
Number of tasks per node      #SBATCH --ntasks-per-node=24
Number of CPUs per task       #SBATCH --cpus-per-task=24
Number of GPUs                #SBATCH --gpus=3
Send mail at end of the job   #SBATCH --mail-type=end
User's email address          #SBATCH --mail-user=userid@mgb.edu
Working directory             #SBATCH --workdir=dir-name
Memory size                   #SBATCH --mem=<size>[M|G|T]  or  --mem-per-cpu=<size>[M|G|T]
Partition                     #SBATCH --partition=<name>
Job arrays                    #SBATCH --array=<array_spec>

 

More details about each of these settings can be found in the Slurm sbatch documentation. In SLURM, each task represents a collection of resources; for example, the user can specify the number of CPUs per task and the memory per task. The main constraint is that all of a job's tasks must fit on a single node (e.g. the job cannot claim more CPUs than are available on a single node).
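
As a minimal sketch of how these directives fit together, the script below requests two tasks of four CPUs each on a single node of the 'normal' partition (the job name, resource values and final command are arbitrary choices for illustration):

#!/bin/bash
# Request 2 tasks x 4 CPUs (8 CPUs total) and 1 GB per CPU on one node
#SBATCH --job-name=resource_sketch
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1G
#SBATCH --time=01:00:00

# Report what Slurm actually allocated
echo "Node(s): $SLURM_JOB_NODELIST  Tasks: $SLURM_NTASKS  CPUs per task: $SLURM_CPUS_PER_TASK"

Because --nodes=1 is requested, the eight CPUs (2 tasks x 4 CPUs) must all fit on one node, which illustrates the constraint described above.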

 

To submit a job script:

$ sbatch <path-to-script>

After submitting a job, always check that it has been submitted successfully.

Check job status:

$ squeue

View more verbose job status:

$ sjob <job_ID>

Check the job in detail:

$ scontrol show job <job_ID>
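
To narrow the squeue output to your own jobs, or to one specific job, the following variations may be useful (the job ID 12345 is a placeholder):

$ squeue -u $USER     # only your own jobs
$ squeue -j 12345     # a single job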

 

Slurm Job status, code, and explanation

When you request status information for your job, you will see one of the following states:

State        Code   Explanation
COMPLETED    CD     The job has completed successfully.
COMPLETING   CG     The job is finishing, but some processes are still active.
FAILED       F      The job terminated with a non-zero exit code and failed to execute.
PENDING      PD     The job is waiting for resource allocation. It will eventually run.
PREEMPTED    PR     The job was terminated because of preemption by another job.
RUNNING      R      The job is currently allocated to a node and is running.
SUSPENDED    S      A running job has been stopped, with its cores released to other jobs.
STOPPED      ST     A running job has been stopped, with its cores retained.

To cancel or kill a job, execute the command:

$ scancel <jobID>
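
scancel also accepts selections by user, job state and array task, for example (job and task IDs are placeholders):

$ scancel -u $USER                   # cancel all of your own jobs
$ scancel --state=PENDING -u $USER   # cancel only your pending jobs
$ scancel 12345_4                    # cancel task 4 of array job 12345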

 

Handy LSF to Slurm Reference

The table below provides a convenient reference to assist users in the transition from LSF to SLURM. Please note that some specifications are not always 'one-to-one'  (e.g. the specification of tasks as noted above). 

 

Commands

LSF                  Slurm                Description
bsub < script_file   sbatch script_file   Submit a job from script_file
bkill 123            scancel 123          Cancel job 123
bjobs                squeue               List user's pending and running jobs
bqueues              sinfo or sinfo -s    Cluster status with partition (queue) list; with '-s', a summarised partition list which is shorter and simpler to interpret

Job Specification

LSF                         Slurm                                      Description
#BSUB                       #SBATCH                                    Scheduler directive
-q queue_name               -p queue_name                              Submit to queue/partition 'queue_name'
-n 64                       -n 64                                      Processor (task) count of 64
-W [hh:mm:ss]               -t [minutes]  or  -t [days-hh:mm:ss]       Max wall run time
-o file_name                -o file_name                               STDOUT output file
-e file_name                -e file_name                               STDERR output file
-J job_name                 --job-name=job_name                        Job name
-x                          --exclusive                                Exclusive node usage for this job, i.e. no other jobs on the same nodes
-M 128                      --mem-per-cpu=128M  or  --mem-per-cpu=1G   Memory requirement
-R "span[ptile=16]"         --ntasks-per-node=16                       Processes (tasks) per node
-P proj_code                --account=proj_code                        Project account to charge the job to
-J "job_name[array_spec]"   --array=array_spec                         Job array declaration
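
As an illustration of the mapping, a simple LSF job header and a roughly equivalent set of Slurm directives are sketched below (the queue/partition name, resources and file names are placeholders, not site recommendations):

# LSF (#BSUB) header
#BSUB -q normal
#BSUB -n 4
#BSUB -W 12:00
#BSUB -o out.%J
#BSUB -e err.%J
#BSUB -J my_job

# Equivalent Slurm (#SBATCH) header
#SBATCH --partition=normal
#SBATCH --ntasks=4
#SBATCH --time=12:00:00
#SBATCH --output=out.%j
#SBATCH --error=err.%j
#SBATCH --job-name=my_job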

 

Job Environment Variables

LSF                           Slurm                   Description
$LSB_JOBID                    $SLURM_JOBID            Job ID
$LSB_SUBCWD                   $SLURM_SUBMIT_DIR       Submit directory
$LSB_JOBID                    $SLURM_ARRAY_JOB_ID     Job array parent ID
$LSB_JOBINDEX                 $SLURM_ARRAY_TASK_ID    Job array index
$LSB_SUB_HOST                 $SLURM_SUBMIT_HOST      Submission host
$LSB_HOSTS / $LSB_MCPU_HOST   $SLURM_JOB_NODELIST     Allocated compute nodes
$LSB_DJOB_NUMPROC             $SLURM_NTASKS           Number of processors allocated (mpirun can pick this up from Slurm automatically; it does not need to be specified)
(no LSF equivalent)           $SLURM_JOB_PARTITION    Queue (partition)

 

Note that, similarly to LSF, %j (or %J) can be used to insert the job ID into the names of the output and error log files (see Example 2 below).
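
As a small sketch of how these variables and the %j pattern can be used inside a batch script (the job name and partition are arbitrary):

#!/bin/bash
#SBATCH --job-name=env_var_sketch
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --output=log%j.out
#SBATCH --error=log%j.err

# Print some of the variables Slurm sets for every job
echo "Job ID:           $SLURM_JOBID"
echo "Submit directory: $SLURM_SUBMIT_DIR"
echo "Submit host:      $SLURM_SUBMIT_HOST"
echo "Node list:        $SLURM_JOB_NODELIST"
echo "Number of tasks:  $SLURM_NTASKS"
echo "Partition:        $SLURM_JOB_PARTITION"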

 

Types of SLURM Job Submissions

In addition to 'sbatch', SLURM also offers other modes of job submission, which can be summarised as follows:

 

i) sbatch – Submit a script for later execution (batch mode)

ii) salloc – Create job allocation and start a shell to use it (interactive mode) 

iii) srun – Create a job allocation (if needed) and launch a job step, for example within an sbatch or interactive (salloc) job.

  • srun can use a subset of, or all of, the job's resources.

iv) sattach – Connect stdin/out/err for an existing job step
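
For example, an interactive allocation can be obtained and used as follows (a sketch; the partition, resources and the job/step IDs are placeholders):

$ salloc -p short -n 1 --cpus-per-task=2 --mem=2G -t 01:00:00   # interactive allocation
$ srun hostname                                                 # run a job step inside it
$ exit                                                          # release the allocation

$ sattach 12345.0   # attach stdin/out/err to step 0 of job 12345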

 

 

Example SLURM Job Submission

 

EXAMPLE 1: SINGLE CPU JOB 

 

In the example below, a simple Python program is submitted as a SLURM job to the ‘short’ partition using ‘sbatch <file_name>’ where 1 CPU and 1GB of memory are requested.  

 

#!/bin/bash
#SBATCH --job-name=single_cpu_example
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --output=log%J.out
#SBATCH --error=log%J.err

# Check what versions of anaconda are available with 'module avail anaconda'
module load anaconda/4.12.0

# Run a demo python program on a single CPU
python hello.py
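
Assuming the script above is saved as single_cpu_example.sh (an arbitrary file name), it can be submitted and its output inspected once the job finishes:

$ sbatch single_cpu_example.sh
$ squeue -u $USER       # check progress
$ cat log<jobID>.out    # STDOUT, captured via --output=log%J.out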

 

EXAMPLE 2: MULTIPLE CPU JOB 

 

In the example below, a Python program is submitted as a SLURM job to the ‘normal’ partition using ‘sbatch <file_name>' where 4 CPUs and 1GB of memory per CPU are requested.  

 

#!/bin/bash
#SBATCH --job-name=multi_cpu_example
#SBATCH --partition=normal
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --output=log%J.out
#SBATCH --error=log%J.err

# Check what versions of anaconda are available with 'module avail anaconda'
module load anaconda/4.12.0

# Run a python program on 4 CPUs that splits the iterations of the loop among each CPU
python square.py

 

EXAMPLE 3: SINGULARITY JOB 

 

In the example below, a Python program is run upon the startup of a singularity container and is submitted as a SLURM job to the ‘short’ partition using ‘sbatch <file_name>’.  

 

#!/bin/bash
#SBATCH --job-name=singularity_example
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2G
#SBATCH --output=singularity-log%J.out
#SBATCH --error=singularity-log%J.err

# Check available versions of singularity with 'module avail singularity'
module load singularity

# Run application using singularity
singularity exec conda-miniconda3.sif python3 pyTest.py
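
The script assumes the image conda-miniconda3.sif is already present in the working directory. An image like it could, for example, be pulled from a public registry first (the registry path below is an assumption for illustration, not part of the ERISTwo documentation):

$ module load singularity
$ singularity pull conda-miniconda3.sif docker://continuumio/miniconda3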

 

EXAMPLE 4: ARRAY JOB USING R 

 

In the example below, an R script is submitted as a SLURM job to the ‘filemove’ partition using ‘sbatch <file_name>'.  

 

#!/bin/bash
#SBATCH --job-name=hello-parallel-test
#SBATCH --partition=filemove
#SBATCH --array=1-5
#SBATCH --ntasks=2
#SBATCH --mem-per-cpu=1G
#SBATCH --output=hello-%j-%a.out
#SBATCH --error=hello-%j-%a.err

# Check available versions of R using 'module avail R'
module load R/3.5.1-foss-2018b

# Run application using R, passing in the array ID ($SLURM_ARRAY_TASK_ID) as a command line argument
Rscript hello-parallel.R $SLURM_ARRAY_TASK_ID

 

The example below is the simple R script used in example 4.

hello-parallel.R 

#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)

print(paste0('Hello! I am task number: ', args[1]))

vector = c(1, 10, 100, 1000, 10000)

multiply = function(x, y) {
  return(x*y)
}

num = as.integer(args[1])
res = multiply(vector[num], num)
print(paste0('Result: ', res))
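
Since the job is submitted with --array=1-5, Slurm runs five independent tasks, and each task writes its own output and error files according to the hello-%j-%a pattern (job ID and array index). After the job finishes they can be listed with, for example:

$ ls hello-*.out hello-*.err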

Go to KB0039912 in the IS Service Desk