Queues for Job Scheduling and Resource Reservation on ERISTwo

Introduction 

On the Scientific Computing (SciC) Linux Clusters, it is important to choose the correct queue so that your job is scheduled as quickly as possible and has access to the resources your application needs. This page lists the most common ERISTwo queues. They apply equally to general use and to research groups with access to dedicated nodes.

Working with Queues

Set the queue to use with the "-q" option to bsub:

bsub -q long < my_script.lsf

Or by putting the same option in the header of your script:

#BSUB -q long 

Specifying resource requirements

This example requests 4 CPU cores and 10GB of memory (the value is specified in MB):

bsub -q big-multi -n 4 -R 'rusage[mem=10000]' < my_script.lsf 

The same options given in an LSF script:

#BSUB -q big-multi
#BSUB -n 4
#BSUB -R rusage[mem=10000]

Another example showing both memory requirement and memory limit settings, which are both needed for reservations of more than 40GB - here 64GB is reserved:

bsub -q big-multi -M 64000 -R 'rusage[mem=64000]' < my_script.lsf 

The same options given in an LSF script:

#BSUB -q big-multi
#BSUB -M 64000
#BSUB -R rusage[mem=64000]
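
The rule above (a memory limit is also needed once the reservation exceeds 40GB) can be sketched as a small shell helper. This is only an illustration: the function name mem_opts is hypothetical, and it assumes 1GB is written as 1000MB, as in the examples above.

```shell
# Hypothetical helper: mem_opts GB prints the bsub memory options for a
# reservation of GB gigabytes (assumes 1GB = 1000MB, as in the examples above).
mem_opts() {
  gb="$1"
  mb=$(( gb * 1000 ))
  if [ "$gb" -gt 40 ]; then
    # Above 40GB the memory limit (-M) must be set equal to the reservation.
    printf '%s\n' "-M ${mb} -R rusage[mem=${mb}]"
  else
    # At 40GB or below only the reservation is needed.
    printf '%s\n' "-R rusage[mem=${mb}]"
  fi
}
```

The output could then be passed to bsub, for example: bsub -q big-multi $(mem_opts 64) < my_script.lsf. The expansion is left unquoted on purpose so that the shell splits it into separate options.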

Job arrays

In a job array, the requested CPU/memory allocation is applied to each job in the array. Requesting multiple job slots with "-n SLOTS" does not do this, except in the "mpi" queue.
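
A minimal sketch of a job array script follows. The array name "myarray", the choice of the "short" queue, and the input file naming are assumptions made for illustration. The "-J name[range]" syntax, the %J and %I output-file placeholders, and the LSB_JOBINDEX variable are standard LSF features.

```shell
#!/bin/bash
#BSUB -J "myarray[1-10]"        # hypothetical array name with 10 elements
#BSUB -q short                  # assumed queue; choose one that fits the job
#BSUB -n 1                      # each array element gets 1 job slot
#BSUB -R rusage[mem=2000]       # each array element reserves 2GB
#BSUB -o myarray.%J.%I.out      # %J = job ID, %I = array index

# LSF sets LSB_JOBINDEX to the index of the running array element.
# Default to 1 so the script can also be tried outside LSF.
index="${LSB_JOBINDEX:-1}"

# Hypothetical naming scheme: one input file per array element.
input="input_${index}.txt"
echo "array element ${index} would process ${input}"
```

Submit it in the same way as any other script, for example: bsub < my_array.lsf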

Queue script examples

Several script templates are placed in each home folder when the account is created. Look for them in:

ls ~/lsf/templates/bsub

To test an example, copy it to a different folder, for example ~/lsf, and then submit the job as described in the example. Read each example for more detailed information.

If you have deleted your ~/lsf folder, you can copy it from /lsf/copy/templates. 

Standard Queues on ERISTwo

The job scheduler offers several job queues to which you can submit your jobs.  Each queue is optimized for different types of job, based on:

  • run time
  • memory requirement
  • number of CPUs used in parallel

Summary

vshort

The "vshort" queue is a high priority queue for very short jobs requiring 1GB or less memory.

  • Default memory allocation is 1GB; requests should not exceed 4GB.
  • Maximum runtime is 15 minutes.

short

The "short" queue is a priority queue for short jobs taking less than 1 hour with modest memory requirements.

  • Default memory allocation is 2GB.
  • Minimum runtime is 10s.
  • Maximum runtime is 1 hour.
  • Memory requirement should be specified if more than 2GB and should not exceed 4GB.
  • Ideal for single threaded applications (-n 1)

medium

The "medium" queue is a priority queue for jobs taking less than 1 day with modest memory requirements.

  • Default memory allocation is 2GB.
  • Minimum runtime is 1min.
  • Maximum runtime is 24 hours.
  • Memory requirement should be specified if more than 2GB and should not exceed 8GB.
  • Ideal for applications using 4 or fewer CPU cores per job (-n 4 or less)

normal

The "normal" queue is a general queue for jobs taking less than 3 days.

  • Default memory allocation is 2GB.
  • Minimum runtime is 1min.
  • Maximum runtime is 3 days.
  • Memory requirement should be specified if more than 2GB and should not exceed 8GB.
  • Maximum CPU allocation is 6 CPU cores per job

long

The "long" queue is suitable for jobs taking less than 1 week with modest memory requirements.

  • Default memory allocation is 2GB.
  • Minimum runtime is 1min.
  • Maximum runtime is 1 week.
  • Memory requirement should be specified if more than 2GB and should not exceed 8GB.
  • Ideal for applications using 4 or fewer CPU cores per job (-n 4 or less).

vlong

The "vlong" queue is suitable for long running jobs with modest memory requirements.

  • Default memory allocation is 2GB.
  • Minimum runtime is 1min.
  • Default max runtime is 4 weeks.
  • Memory requirement should be specified if more than 2GB and should not exceed 4GB.
  • Email Scientific Computing if you require access to queues with longer run time.
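
Taken together, the maximum runtimes of the queues from "vshort" to "vlong" can be sketched as a lookup. The function name pick_queue is hypothetical. It considers only the maximum runtime, so the memory and CPU limits listed for each queue still need to be checked separately.

```shell
# Hypothetical helper: pick_queue MINUTES prints the first queue whose
# maximum runtime covers an estimated runtime given in minutes.
pick_queue() {
  if   [ "$1" -le 15 ];    then echo vshort   # up to 15 minutes
  elif [ "$1" -le 60 ];    then echo short    # up to 1 hour
  elif [ "$1" -le 1440 ];  then echo medium   # up to 24 hours
  elif [ "$1" -le 4320 ];  then echo normal   # up to 3 days
  elif [ "$1" -le 10080 ]; then echo long     # up to 1 week
  elif [ "$1" -le 40320 ]; then echo vlong    # up to 4 weeks
  else
    echo "no standard queue covers $1 minutes; contact Scientific Computing" >&2
    return 1
  fi
}
```

For example, a job expected to take about 90 minutes could be submitted with: bsub -q "$(pick_queue 90)" < my_script.lsf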

big

The "big" queue is suitable for single threaded, large memory jobs, 8GB or more.

  • Big single node jobs will be dispatched fastest from this queue.
  • Only 1-6 job slots (CPUs) can be allocated.
  • Minimum runtime is 1min.
  • Memory requirement should be specified if more than 16GB.
  • Memory limit must also be set equal to memory reservation if more than 40GB.
  • Maximum memory limit is 498GB.

big-multi

The "big-multi" queue is suitable for multi threaded, large memory jobs, 8GB or more.

  • Big single node multi-core jobs will be dispatched fastest from this queue.
  • Number of CPU cores required should be specified with the "-n THREADS" option.
  • Ideal for applications using between 4 and 12 (or 16) CPUs per job (-n 4 or more)
  • Minimum runtime is 1min.
  • Memory requirement should be specified if more than 8GB.
  • Memory limit must also be set equal to memory reservation if more than 40GB.
  • Maximum memory limit is 498GB.

mpi

The "mpi" queue is for jobs using an implementation of the Message Passing Interface to run jobs spanning several compute nodes. 

  • Job slots can be allocated on different hosts.
  • Maximum runtime is 4 weeks.
  • Memory requirement should be specified if more than 2GB and should not exceed 4GB per job slot.
  • Example submission script to follow in the test folder.

Additional Queues

Queue for an interactive command line session

To open a login session on a compute node use the following:

bsub -Is /bin/bash 

or, if X11 forwarding is required:

bsub -Is -XF /bin/bash 

Remember to request more than one job slot and additional memory if multi-threaded or large memory applications are to be run in the session (e.g. 10GB of memory and 4 concurrent CPUs):

bsub -Is -n 4 -R 'rusage[mem=10000]' /bin/bash

Rerunnable queue for access to more resources

Submit to the "rerunnable" queue to use idle time on "Adopt-a-Node" nodes that are not otherwise available to your jobs. If the "Adopt-a-Node" owner requires the node while your job is running on it (via the rerunnable queue), your job will be terminated and resubmitted to the top of the queue to run elsewhere.

  • Default memory allocation is 4GB.
  • Maximum runtime is 4 weeks.
  • Memory requirement should be specified if more than 4GB and should not exceed 12GB.
  • !! Job scripts require testing to ensure they work after being terminated and restarted !!
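
One way to make a job script safe to restart is to record progress in a checkpoint file and resume from it. The sketch below is an assumption-laden illustration, not a site-provided template: the function name run_from_checkpoint, the checkpoint file, and the step loop are all hypothetical placeholders for the real work.

```shell
#!/bin/bash
#BSUB -q rerunnable

# Hypothetical helper: run_from_checkpoint FILE TOTAL runs steps 1..TOTAL,
# skipping any steps already recorded as completed in FILE.
run_from_checkpoint() {
  checkpoint="$1"
  total_steps="$2"

  # Resume after the last completed step, or start at 1 on a first run.
  step=1
  if [ -f "$checkpoint" ]; then
    step=$(( $(cat "$checkpoint") + 1 ))
  fi

  while [ "$step" -le "$total_steps" ]; do
    echo "running step ${step}"     # placeholder for the real work
    echo "$step" > "$checkpoint"    # record progress only after the step finishes
    step=$(( step + 1 ))
  done
}
```

A job script would end with a call such as run_from_checkpoint progress.chk 5. If the job is terminated and rerun, the steps already recorded in the checkpoint file are not repeated.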

Job slot allocation

Only the "mpi" queue will allocate job slots on different hosts when multiple job slots are requested with "-n SLOTS". All other queues allocate all job slots on the same host. Job arrays are treated differently and will run jobs on different hosts in all queues.

Priority node allocation

Labs that have priority nodes through the "Adopt-a-node" program get priority access to those nodes by submitting to the following queues:

  • vlong
  • long
  • normal
  • medium
  • short
  • vshort
  • big
  • big-multi
  • interact
  • matlab
  • matlabdce

The "defaultlow" queue does not give access to priority nodes.

 

Other requirements

Please contact Scientific Computing if none of the above queues fits your requirements.

Go to KB0027902 in the IS Service Desk