November 11, 2024
What is ERISTwo?
ERISTwo is our next-generation CentOS 7 Linux cluster. It is currently in production to enable the transition off the legacy CentOS 6 cluster. ERISTwo includes a change of administrative tools, an updated job scheduler, and an updated application module system. If you require GPU machines, please use ERISXdl; there is no support for new GPU applications on ERISTwo. Storage from the legacy cluster, both home folders and Briefcase, is available on ERISTwo and ERISXdl.
Why a new environment?
The legacy cluster has long been managed with administration tools that were supported by CentOS 6 but are not compatible with CentOS 7. To ease the transition for admins as well as users, we decided to provide a separate environment where all cluster features can be thoroughly tested. All new nodes, including adopted nodes, are being deployed with CentOS 7, and all new applications are being implemented for the environment configuration of ERISTwo.
Why should I use it?
IMPORTANT: All users are urged to test ERISTwo with their workloads and should be able to migrate their work to ERISTwo. As users migrate their workloads, we will migrate nodes to increase the capacity of ERISTwo. Only applications that are not compatible with CentOS 7 will be maintained on the legacy cluster, and its computational resources will be reduced according to the number of people using each system. It is important that users let us know if any features are missing on ERISTwo.
See also the FAQ: ERISTwo Linux Cluster (beta)
Comparison Table for Users
Feature | Legacy cluster | ERISTwo |
OS | CentOS 6 | CentOS 7 |
Filesystem | Panasas Panfs | Panasas Panfs |
Scheduler | LSF 8.0.1 | LSF 9.1.3.0 |
Login nodes | Yes | Yes |
Applications | Legacy modules/easybuild | New modules |
Remote Desktops | Yes | Yes |
Filemovers | Yes | Yes |
RStudio Pro | No (has been moved) | Yes |
Jupyter | No (has been moved) | Yes |
Shiny | No (has been moved) | Yes |
How to log in
If you already have access to ERISTwo, you can log in via SSH. If not, you need to request an account by filling out the ERISTwo Account Request form.
ERISTwo can be accessed by ssh:
$ ssh <userID>@eristwo.partners.org
You will land on one of our two login nodes, eris2n4 or eris2n5. As before, no large jobs should be run on these nodes. All jobs must be submitted to the compute nodes.
General Usage
Overview
ERISTwo is a Linux cluster. Currently, it is only accessible via ssh from a command line interface (bash). Jobs should be submitted through the LSF job scheduler; this includes GPU jobs and file transfer jobs. Applications are loaded via the Lmod module system. Unlike on the legacy cluster, there is no need to activate Lmod, since it is loaded by default.
LSF 9
Platform Load Sharing Facility (or simply LSF) is a workload management platform and job scheduler for distributed high-performance computing (HPC). We have implemented LSF version 9.1.3.0. The general idea is that each computational job is submitted to the system, so that the system can distribute the jobs to the available nodes, providing each user with a fair share of the cluster and maximizing efficiency.
Do not run computational jobs on the login nodes. Computational jobs on the login nodes will be terminated, and we reserve the right to ban users who repeatedly violate this rule.
How to submit a job
The general syntax of a job submission is:
$ bsub [options] < script.lsf
Note: The “<” is important when you pass an LSF script, because bsub then reads the script, including any options embedded in it, from standard input.
The LSF script describes the job. An example LSF script can be found in each user's lsf folder: ~/lsf/test.lsf
Options can be specified either in the script or on the bsub command line. Some important options:
- -q <queue_name> : Specify the queue for the job
- -n <number of cores> : Request that number of cores for the job
- -R '<resource requirements>' : Resource requirements, mainly memory
Note that if you don’t specify an option, its default value is used.
Example:
$ bsub -q normal -R 'rusage[mem=64000]' < my_script.lsf
Depending on the job requirements, you need to choose the right queue for your job. Right now there are only five queues.
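As a rough illustration of the options above, the following sketch writes a small job script with the options embedded as #BSUB lines and then submits it. The job name, core count, memory value, and output file names are placeholders chosen for this example, not site defaults, and the memory value is assumed to be in MB. The submission step is skipped when bsub is not found.

```shell
#!/bin/bash
# Sketch only: job name, core count, memory value, and file names below are
# illustrative placeholders, not ERISTwo defaults.

# Write a job script; lines starting with "#BSUB" carry bsub options.
cat > my_script.lsf <<'EOF'
#!/bin/bash
#BSUB -J example_job
#BSUB -q normal
#BSUB -n 4
#BSUB -R "rusage[mem=8000]"
#BSUB -o example_job.%J.out
#BSUB -e example_job.%J.err

# LSB_JOBID is set by LSF when the script runs as a job.
echo "Job ${LSB_JOBID:-unknown} running on $(hostname)"
EOF

# Submit only where LSF is available (for example on a login node).
if command -v bsub >/dev/null 2>&1; then
    bsub < my_script.lsf
else
    echo "bsub not found; wrote my_script.lsf without submitting"
fi
```

Options given on the bsub command line can be combined with a script like this, so the same file can be reused with a different queue or memory request.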
How to start an interactive job
To start a regular interactive job, run for example:
$ bsub -Is -q interactive /bin/bash
To start an interactive job with a memory reservation:
$ bsub -Is -q interactive -R 'rusage[mem=64000]' /bin/bash
Queues
ERISTwo currently has the following queues:
Queue | Memory limits | Max run time | Job limit | PEND limit |
GPU | - | 4 days | 100 | 200 |
Normal | <32G | 15 days | 500 | 1000 |
Filemove | - | 5 days | 100 | 200 |
Bigmem | >32G | 5 days | 100 | 200 |
Interactive | - | 5 days | 5 | 0 |
- GPU: Limited GPU machines are available. Users must request/renew access at @email.
- Normal: Most cluster jobs fit the normal queue. Use it for all jobs that require less than 32G of memory.
- Filemove: Transfer files between ERISTwo and an external mount.
- Bigmem: Only for jobs that require more than 32G of memory.
- Interactive: To request an interactive session.
If you don’t specify a queue, the job will start in the normal queue, which is meant for average-size jobs. If you require GPU acceleration, you need to use the GPU queue. Filemove is meant for transferring files. Note that you should not run any large file transfers (> 100MB) on the login nodes.
The queues can be listed with the command bqueues.
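The choice between the normal and bigmem queues can be sketched as a small shell helper. The function name pick_queue is hypothetical, and the 32000 threshold assumes that memory is requested in MB and that "32G" corresponds to 32000; the text does not say which queue a job of exactly 32G belongs to, so the boundary handling here is a guess. The sketch only prints the resulting bsub command instead of running it.

```shell
#!/bin/bash
# Sketch only: pick_queue is a hypothetical helper, and the 32000 MB threshold
# is an assumed reading of the "32G" limit in the queue table.

# Print the queue name for a memory request given in MB.
pick_queue() {
    local mem_mb=$1
    if [ "$mem_mb" -gt 32000 ]; then
        echo bigmem
    else
        echo normal
    fi
}

mem_mb=64000
queue=$(pick_queue "$mem_mb")

# Show the command that would be used; nothing is submitted here.
echo "bsub -q $queue -R 'rusage[mem=$mem_mb]' < my_script.lsf"
```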
$ bqueues
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
filemove 1 Open:Active - - - - 0 0 0 0
normal 1 Open:Active - - - - 27069 27019 50 0
gpu 1 Open:Active - - - - 0 0 0 0
bigmem 1 Open:Active - - - - 0 0 0 0
interactive 1 Open:Active - - - - 0 0 0 0
This command also shows how many jobs are currently in each queue, which gives you an idea of how long it will take for new jobs to start.
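To read off only the pending counts, the bqueues listing can be filtered with awk. This is a minimal sketch: pending_by_queue is a hypothetical helper name, and it assumes the default column layout shown above, where PEND is the ninth column. When bqueues is not available, the sketch falls back to two rows copied from that listing.

```shell
#!/bin/bash
# Sketch only: pending_by_queue is a hypothetical helper name.

# Read bqueues output on stdin and print "<queue> <pending jobs>" per line.
# Assumes the default layout, where PEND is the 9th column.
pending_by_queue() {
    awk 'NR > 1 { print $1, $9 }'
}

if command -v bqueues >/dev/null 2>&1; then
    bqueues | pending_by_queue
else
    # Fall back to two rows copied from the listing in this article.
    pending_by_queue <<'EOF'
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
normal 1 Open:Active - - - - 27069 27019 50 0
gpu 1 Open:Active - - - - 0 0 0 0
EOF
fi
```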
Jobs
To see the current status of your jobs, use:
$ bjobs [options] [jobID]
When used without any options or arguments, it shows all your current jobs. To see jobs by all users, use “-u all”. To restrict the output to one queue, use “-q [queue_name]”.
Example:
$ bjobs -q gpu -u all
This shows the jobs of all users that are currently in the gpu queue.
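For a quick summary of how many jobs are running or pending, the bjobs listing can be counted by state. This is a sketch under assumptions: count_by_state is a hypothetical helper name, it relies on the job state (STAT) being the third column of the default bjobs output, and the fallback rows are made-up sample data used only when bjobs is not available.

```shell
#!/bin/bash
# Sketch only: count_by_state is a hypothetical helper name.

# Read bjobs output on stdin and print "<state> <count>" per line.
# Assumes the default layout, where STAT is the 3rd column.
count_by_state() {
    awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

if command -v bjobs >/dev/null 2>&1; then
    bjobs 2>/dev/null | count_by_state
else
    # Made-up sample rows, for illustration only.
    count_by_state <<'EOF'
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1001 user1 RUN normal login1 node1 job_a Nov 11 10:00
1002 user1 PEND normal login1 - job_b Nov 11 10:01
1003 user1 PEND normal login1 - job_c Nov 11 10:02
EOF
fi
```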
Modules
Environment modules are a great way to manage applications on a multi-user, multi-node system. The general problem is that users have different requirements for which software, and which version of it, they need. If multiple versions of the same software (especially libraries) are installed, the system needs rules to know which version to use. This is done by setting certain environment variables. Modules allow you to set those variables with a simple command. On the legacy cluster we used Tcl environment modules by default, while ERISTwo uses Lmod. Lmod is a more advanced module system that handles complex dependencies better and adds some user features. However, the basic usage and functionality are the same, and even old module files can be used (note that module files from the legacy cluster point to CentOS 6 applications and should not be loaded).
Basic usage of lmod
In order to use an application that is available via lmod, you need to load the corresponding module:
$ module load [modulename]/[version_number]
You don’t need to specify a version number; if you omit it, the default version is loaded. If no default is set, the highest version number is loaded.
To see the available modules, use:
$ module avail
This shows a long list of modules and the paths where the corresponding module files are located. If you are looking for a specific application, you can run:
$ module avail [application_name]
This lists all modules whose names contain application_name, regardless of capitalization. The list may therefore include other modules that merely contain the application name as part of their own name. Note that this search feature is unique to Lmod and will not work with the Tcl environment modules on the legacy cluster.
If you need more information about a module you can use the following:
$ module spider [module_name]
This shows the available versions and general information about the module.
To see which modules are currently loaded, use:
$ module list
Note that some modules load other modules as dependencies, so you may see several modules listed even though you loaded only one.
Example:
$ module load CMake
$ module list
Currently Loaded Modules:
1) GCCcore/7.3.0 2) ncurses/6.1-GCCcore-7.3.0 3) CMake/3.11.4-GCCcore-7.3.0
You can unload modules with:
$ module unload [module_name]
Important: When you use an application in a batch job, you need to add the “module load” command to the LSF script.
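A batch script that follows this rule might look like the sketch below. The job name and output file name are placeholders for illustration; the CMake module version is the one from the module list example above and may differ from what is currently installed. The submission step is skipped when bsub is not found.

```shell
#!/bin/bash
# Sketch only: the job name and output file are placeholders; the module
# version is taken from the "module list" example in this article.

cat > cmake_version.lsf <<'EOF'
#!/bin/bash
#BSUB -J cmake_version
#BSUB -q normal
#BSUB -o cmake_version.%J.out

# Load the module inside the job script before calling the application.
module load CMake/3.11.4-GCCcore-7.3.0
cmake --version
EOF

# Submit only where LSF is available (for example on a login node).
if command -v bsub >/dev/null 2>&1; then
    bsub < cmake_version.lsf
else
    echo "bsub not found; wrote cmake_version.lsf without submitting"
fi
```

Pinning the full module version in the script, rather than relying on the default, keeps the job reproducible if the default version changes later.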