March 28, 2022
What is ERISTwo?
ERISTwo is our next-generation CentOS 7 Linux cluster. It is currently a development/testing platform to enable the transition from ERISOne. ERISTwo includes a change of administrative tools, an updated job scheduler, and an updated application module system. ERISTwo will host the GPU nodes for machine learning and other GPU applications. The same home folders and storage content from ERISOne will be available on ERISTwo.
Why a new environment?
ERISOne has long been managed with administration tools that were supported on CentOS 6 but are not compatible with CentOS 7. To ease the transition for admins as well as users, we decided to provide a separate environment where all cluster features can be thoroughly tested. All new nodes, including adopted nodes, are being deployed with CentOS 7, and all new applications are being implemented for the environment configuration of ERISTwo.
Why should I use it?
IMPORTANT: All users are encouraged to test ERISTwo for their workloads and should be able to migrate their work to ERISTwo eventually. As users migrate their workloads we will migrate nodes to improve the capacity of ERISTwo. Only applications that are not compatible with CentOS 7 will be maintained on ERISOne and the computational resources will be reduced according to the number of people using each system. For this reason, it is important for users to let us know if there are any features missing on ERISTwo.
For more information, see the FAQ: ERISTwo Linux Cluster (beta)
|Comparison Table for Users|
|Feature||ERISOne||ERISTwo|
|OS||CentOS 6||CentOS 7|
|Filesystem||Panasas Panfs (Recently updated)||Panasas Panfs|
|Scheduler||LSF 8.0.1||LSF 18.104.22.168|
|Applications||Legacy modules/easybuild||New modules|
|RStudio Pro||No (has been moved)||Yes|
|Jupyter||No (has been moved)||Yes|
|Shiny||No (has been moved)||Yes|
How to log in
If you already have access to ERISOne, you can log in via SSH. If not, you need to request an account by filling out the ERISOne Account Request form.
ERISTwo can be accessed by ssh:
You will land on one of our two login nodes, eris2n4 or eris2n5. As before, no large jobs should be run on these nodes. All jobs must be submitted to the compute nodes.
ERISTwo is a Linux cluster. Currently it is only accessible via ssh, which gives you a command line interface (bash). Jobs should be submitted through the LSF job scheduler; this includes GPU jobs and file transfer jobs. Applications are loaded via the lmod module system. Unlike on ERISOne, there is no need to activate lmod since it is loaded by default.
Platform Load Sharing Facility (or simply LSF) is a workload management platform and job scheduler for distributed high-performance computing (HPC). We have implemented LSF version 22.214.171.124. The general idea is that each computational job is submitted to the system, which distributes the jobs to the available nodes, providing each user with a fair share of the cluster and maximizing efficiency.
Do not run computational jobs on the login nodes. Computational jobs on the login nodes will be terminated, and we reserve the right to ban users who repeatedly violate this rule.
How to submit a job
The general syntax of a job submission is:
$bsub [options] < script.lsf
Note: The “<” is important when you pass an lsf script.
The lsf script contains the description of the job. An example of an lsf script can be found in each user's lsf folder: ~/lsf/test.lsf
The options can be specified either in the script or on the bsub command line. Some important options:
- -q queue_name: Specify the queue for the job
- -n <number of cores> : Request that number of cores for the job
- -R ' ' : Requirements, mainly on memory
Note that if you don’t specify an option, its default value is used. For example:
$bsub -q normal -R 'rusage[mem=64000]' < my_script.lsf
You need to choose the right queue for your job, depending on the job's requirements (right now there are only five queues).
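As a rough illustration of how the options above can be placed in the script itself, the sketch below writes a small lsf script and submits it. The file name my_script.lsf, the job name, the memory value and the output file names are assumptions made for this example; the ~/lsf/test.lsf provided in your home folder may look different. Each #BSUB line carries an option you could otherwise pass to bsub on the command line.

```shell
#!/bin/bash
# Write a small LSF job script. File name, job name and values are hypothetical.
cat > my_script.lsf <<'EOF'
#!/bin/bash
# Job name
#BSUB -J example_job
# Queue to submit to
#BSUB -q normal
# Number of cores
#BSUB -n 1
# Memory reservation, in the same units as the examples above
#BSUB -R "rusage[mem=4000]"
# Files for standard output and standard error; %J expands to the job ID
#BSUB -o example_job.%J.out
#BSUB -e example_job.%J.err

echo "Running on $(hostname)"
EOF

# Submit only where LSF is available, so the sketch is harmless elsewhere.
if command -v bsub >/dev/null 2>&1; then
    bsub < my_script.lsf
else
    echo "bsub not found; my_script.lsf was written but not submitted"
fi
```

Options given on the bsub command line can be combined with the ones embedded in the script.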
How to start an interactive job
To start a regular interactive job, you can run, for example:
$bsub -Is -q interactive /bin/bash
To start an interactive job with a memory reservation:
$bsub -Is -q interactive -R 'rusage[mem=64000]' /bin/bash
ERISTwo currently has the following queues:
|Queue||Memory limits||Max run time||Job limit||PEND limit|
- GPU: A limited number of GPU machines are available. Users must request/renew access at firstname.lastname@example.org.
- Normal: Most cluster jobs fit the normal queue. Use it for all jobs that require less than 32G of memory.
- Filemove: Transfer files between ERISOne and an external mount.
- Bigmem: Only for jobs that require more than 32G of memory.
- Interactive: To request an interactive session.
If you don’t specify a queue, the job will start in the normal queue, which is meant for average-size jobs. If you require GPU acceleration, you need to use the GPU queue. Filemove is meant for transferring files. Note that you should not run any large file transfers (> 100MB) on the login node.
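The memory rule for the normal and bigmem queues can be sketched as a small shell function. pick_queue is a hypothetical helper, not an ERISTwo or LSF command, and it takes the memory requirement in whole gigabytes. The description above does not say where a job of exactly 32G belongs, so sending it to bigmem is an assumption.

```shell
#!/bin/bash
# pick_queue is a hypothetical helper: given a memory requirement in whole
# gigabytes, it prints the name of the queue suggested by the rule above.
pick_queue() {
    local mem_gb=$1
    if [ "$mem_gb" -lt 32 ]; then
        echo normal
    else
        # Assumption: exactly 32G is treated like the larger jobs.
        echo bigmem
    fi
}

pick_queue 8     # prints: normal
pick_queue 64    # prints: bigmem
```

The result could then be passed to bsub, for example as -q "$(pick_queue 8)".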
The queues can be seen by the command bqueues.
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
filemove 1 Open:Active - - - - 0 0 0 0
normal 1 Open:Active - - - - 27069 27019 50 0
gpu 1 Open:Active - - - - 0 0 0 0
bigmem 1 Open:Active - - - - 0 0 0 0
interactive 1 Open:Active - - - - 0 0 0 0
This command also shows how many jobs are currently in each queue, which gives you an idea of how long it will take for new jobs to start.
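If you only care about the number of waiting jobs, the listing can be reduced to the queue name and the PEND column. This is a minimal sketch that assumes the column layout shown above, where PEND is the ninth field; pending_per_queue is a hypothetical helper name.

```shell
#!/bin/bash
# pending_per_queue is a hypothetical helper: it reads bqueues output on
# standard input and prints each queue name with its number of pending jobs.
# It assumes the column layout shown above, where PEND is the ninth field.
pending_per_queue() {
    # Skip the header line, then print the first and ninth fields.
    awk 'NR > 1 { print $1, $9 }'
}

# Use live data where LSF is available.
if command -v bqueues >/dev/null 2>&1; then
    bqueues | pending_per_queue
fi
```

For the listing above, the line for the normal queue would read "normal 27019".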
To see the current status of the jobs use:
$bjobs [options] [jobID]
When used without any options or arguments, it shows all your current jobs. To see jobs by all users, use “-u all”. To select a queue, use “-q [queue_name]”. For example:
$bjobs -q gpu -u all
shows the jobs of all users currently in the gpu queue.
Environment modules are a great way to manage applications on a multi-user, multi-node system. The general problem is that users have different requirements for what software, and what version of that software, they need. If multiple versions of the same software (especially libraries) are installed, the system needs rules to know which version to use. This is done by setting certain environment variables. Modules allow you to set those variables easily with a simple command. ERISOne uses tcl environment modules by default, while ERISTwo uses lmod. Lmod is a more advanced module system that is better at handling complex dependencies and adds some user features. However, the basic usage and functionality are the same, and even old module files can be used (note that ERISOne module files point to CentOS 6 applications and should not be called).
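To make the idea of "setting variables" more concrete, the sketch below does by hand roughly what loading a module does for you. The install location /opt/example-app/1.0 is hypothetical, and a real module file may set more variables than the two shown here.

```shell
#!/bin/bash
# Hypothetical install location of one version of an application.
app_prefix=/opt/example-app/1.0

# Put this version's programs and libraries first in the search paths,
# which is roughly what loading a module does on your behalf.
export PATH="$app_prefix/bin:$PATH"
export LD_LIBRARY_PATH="$app_prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Show the first entry of PATH, the directory searched first for commands.
echo "${PATH%%:*}"
```

Unloading a module reverses such changes, which is tedious and error-prone to do manually.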
Basic usage of lmod
In order to use an application that is available via lmod, you need to load the corresponding module:
$module load [modulename]/[version_number]
The version number is optional. If you omit it, the default version is loaded. If no default is set, the highest version number is loaded.
In order to see the available modules, use:
$module avail
This will show a long list of modules and the path where the corresponding module file is located. If you are looking for a specific application you can do:
$module avail [application_name]
This will list all modules that contain the application_name, independent of capitalization. The results may also include other modules whose names contain the application name. Note that this search feature is unique to lmod and will not work with the tcl environment modules on ERISOne.
If you need more information about a module you can use:
$module spider [module_name]
This will gather information about different versions and general information about the module.
To see what modules are currently loaded, use:
$module list
Note that some modules load other modules as dependencies, so you may see several modules listed even if you have loaded only one. For example:
$module load CMake
$module list
Currently Loaded Modules:
1) GCCcore/7.3.0 2) ncurses/6.1-GCCcore-7.3.0 3) CMake/3.11.4-GCCcore-7.3.0
You can unload modules with:
$module unload [module_name]
Important: When you use an application in a batch job, you need to add the “module load” command to the lsf file.
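A minimal sketch of such an lsf file is shown below. The file name cmake_job.lsf is hypothetical, and CMake is used only because it appears in the example above. The last line checks the generated script for shell syntax errors without running it.

```shell
#!/bin/bash
# Write a job script that loads its module before using the application.
# The file name cmake_job.lsf is hypothetical.
cat > cmake_job.lsf <<'EOF'
#!/bin/bash
#BSUB -q normal

# Load the application inside the job so it is available on the compute node.
module load CMake
cmake --version
EOF

# Check the script for shell syntax errors without running it.
bash -n cmake_job.lsf && echo "syntax OK"
```

The script would then be submitted with bsub < cmake_job.lsf.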