ERISTwo Quick User Guide (Beta Test)

What is ERISTwo?

ERISTwo is our next-generation CentOS 7 Linux cluster. It is currently a development and testing platform that enables the transition from ERISOne. ERISTwo introduces new administrative tools, an updated job scheduler, and an updated application module system. ERISTwo will host the GPU nodes for machine learning and other GPU applications. The contents of home folders and storage on ERISOne will also be available on ERISTwo.

Why a new environment?

ERISOne has long been managed with administration tools that are supported on CentOS 6 but are not compatible with CentOS 7. To ease the transition for administrators as well as users, we decided to provide a separate environment where all cluster features can be thoroughly tested. All new nodes, including adopted nodes, are being deployed with CentOS 7, and all new applications are being built for the ERISTwo environment configuration.

Why should I use it?

IMPORTANT: All users are encouraged to test ERISTwo with their workloads and should eventually be able to migrate their work to ERISTwo. As users migrate their workloads, we will migrate nodes to increase the capacity of ERISTwo. Only applications that are not compatible with CentOS 7 will be maintained on ERISOne, and its computational resources will be reduced according to the number of people using each system. For this reason, it is important that users let us know if any features are missing on ERISTwo.

For more details, see FAQ: ERISTwo Linux Cluster (beta).

Comparison Table for Users

                  ERISOne                            ERISTwo
OS                CentOS 6                           CentOS 7
Filesystem        Panasas Panfs (Recently updated)   Panasas Panfs
Scheduler         LSF 8.0.1                          LSF 9.1.3.0
Login nodes       Yes                                Yes
Applications      Legacy modules/easybuild           New modules
Remote Desktops   Yes                                No
Filemovers        Yes                                Yes
RStudio Pro       No (Have been moved)               Yes
Jupyter           No (Have been moved)               Yes
Shiny             No (Have been moved)               Yes

 

How to log in

If you already have access to ERISOne, you can log in via SSH. If not, you need to request an account by filling out the ERISOne Account Request form.

ERISTwo can be accessed by ssh: 
$ssh <userID>@eristwo.partners.org 

You will land on one of our two login nodes, eris2n4 or eris2n5. As on ERISOne, no large job should be run on these nodes. All jobs must be submitted to the compute nodes.

 

General Usage 

Overview 

ERISTwo is a Linux cluster. Currently it is accessible only via ssh, which gives you a command-line interface (bash). All jobs, including GPU jobs and file transfer jobs, should be submitted through the LSF job scheduler. Applications are loaded via the lmod module system. Unlike on ERISOne, there is no need to activate lmod because it is loaded by default.

 

LSF 9 

Platform Load Sharing Facility (or simply LSF) is a workload management platform and job scheduler for distributed high-performance computing (HPC). We have implemented LSF version 9.1.3.0. The general idea is that every computational job is submitted to the scheduler, which distributes the jobs to the available nodes, giving each user a fair share of the cluster and maximizing efficiency.

Do not run computational jobs on the login nodes. Computational jobs on the login nodes will be terminated, and we reserve the right to ban users who repeatedly violate this rule.

 

How to submit a job 

The general syntax of a job submission is: 

$bsub [options] < script.lsf

Note: The “<” is important when you pass an LSF script. It feeds the script to bsub on standard input so that bsub can read the options embedded in it.

The LSF script describes the job. An example of an LSF script can be found in each user's lsf folder: ~/lsf/test.lsf

 

The options can be specified either in the script or on the bsub command line. Some important options:

  • -q <queue_name> : Specify the queue for the job
  • -n <number of cores> : Request that number of cores for the job
  • -R '<resource requirements>' : Resource requirements, mainly memory

If you omit an option, its default value is used.

Example: 

$bsub -q normal -R 'rusage[mem=64000]' < my_script.lsf  
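The same options can also be placed inside the script as #BSUB lines instead of on the command line. The following is a minimal sketch, not the contents of ~/lsf/test.lsf: the file name my_job.lsf, the job name, the output file names, and the core and memory values are hypothetical and only illustrative. -J, -o, and -e are the bsub options for the job name and the output and error files, and %J is replaced by the job ID.

```shell
# Write a small LSF script; bsub reads every "#BSUB" line as an option.
cat > my_job.lsf <<'EOF'
#!/bin/bash
# Job name (hypothetical)
#BSUB -J my_test_job
# Queue
#BSUB -q normal
# Number of cores
#BSUB -n 1
# Memory reservation (illustrative value)
#BSUB -R 'rusage[mem=8000]'
# Output and error files; %J is replaced by the job ID
#BSUB -o my_test_job.%J.out
#BSUB -e my_test_job.%J.err

echo "Job started on $(hostname)"
EOF

# Submit only where LSF is installed; the "<" feeds the script to bsub.
if command -v bsub >/dev/null 2>&1; then
    bsub < my_job.lsf
else
    echo "bsub not found; run this on an ERISTwo login node"
fi
```

Options given on the command line and options given in the script can be combined, so a script like this can still be submitted to a different queue with -q.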

Choose the right queue for your job based on its requirements (there are currently only five queues).

 

How to start an interactive job

To start a regular interactive job, run, for example:

$bsub -Is -q interactive /bin/bash

You can start an interactive job with memory reservation as:

$bsub -Is -q interactive -R 'rusage[mem=64000]' /bin/bash  

 

Queues 

ERISTwo currently has the following queues: 

  • GPU: Limited GPU machines are available. Users must request/renew access at hpcsupport@partners.org.
    • job limit per queue: 100 
    • PEND limit: 200   
    • Run-Time limit: 4 days 
  • Normal: Most cluster jobs fit the normal queue. For all jobs that require less than 32G of memory.
    • job limit per queue: 500 
    • PEND limit: 1000 
    • Run-Time limit: 15 days 
  • Filemove: For transferring files between ERISOne and an external mount.
    • job limit per queue: 100 
    • PEND limit: 200 
    • Run-Time limit: 5 days 
  • Bigmem: Only for jobs that require more than 32G of memory.
    • job limit per queue: 100
    • PEND limit: 200
    • Run-Time limit: 5 days 
  • Interactive:  To request an interactive session.
    • job limit per queue: 5 
    • PEND limit: 0 
    • Run-Time limit: 5 days 

If you don’t specify a queue, the job will start in the normal queue, which is meant for average-size jobs. If you require GPU acceleration, you need to use the GPU queue. Filemove is meant for transferring files. Note that you should not run any large file transfers (> 100MB) on the login nodes.
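The choice between the normal and bigmem queues can be scripted. The helper below is a hypothetical sketch, not a tool installed on ERISTwo. The function name pick_queue is made up, and it assumes that a job needing exactly 32G goes to bigmem, which the queue descriptions above do not specify.

```shell
# Print the queue name for a job, given its memory requirement in GB.
# Assumption: exactly 32 GB is sent to bigmem; the guide only covers
# "less than 32G" (normal) and "more than 32G" (bigmem).
pick_queue() {
    local mem_gb=$1
    if [ "$mem_gb" -lt 32 ]; then
        echo normal
    else
        echo bigmem
    fi
}

pick_queue 8     # prints: normal
pick_queue 64    # prints: bigmem
```

On the cluster, the result could be passed straight to bsub, for example: bsub -q "$(pick_queue 64)" < my_script.lsf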

The queues can be listed with the bqueues command:

$bqueues
QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
filemove          1  Open:Active       -    -    -    -     0     0     0     0
normal            1  Open:Active       -    -    -    - 27069 27019    50     0
gpu               1  Open:Active       -    -    -    -     0     0     0     0
bigmem            1  Open:Active       -    -    -    -     0     0     0     0
interactive       1  Open:Active       -    -    -    -     0     0     0     0

This command also shows how many jobs are currently in each queue, which gives you an idea of how long new jobs will take to start.
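To check the backlog quickly, the pending count can be pulled out of the bqueues output. This is a small sketch that assumes the column layout shown above, where QUEUE_NAME is column 1 and PEND is column 9. The function name pending_by_queue is hypothetical.

```shell
# Print "queue pending_jobs" for each queue, skipping the header line.
# Assumes the column layout shown above: QUEUE_NAME is column 1, PEND is column 9.
pending_by_queue() {
    awk 'NR > 1 { print $1, $9 }'
}

# Sample input copied from the output above. On the cluster, use:
#   bqueues | pending_by_queue
pending_by_queue <<'EOF'
QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
filemove          1  Open:Active       -    -    -    -     0     0     0     0
normal            1  Open:Active       -    -    -    - 27069 27019    50     0
EOF
```

With the sample input this prints one line per queue: "filemove 0" and "normal 27019".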

 

Jobs 

To see the current status of the jobs use:

$bjobs [options] [jobID] 

When used without any options or arguments, it shows all your current jobs. To see the jobs of all users, use “-u all”. To restrict the list to one queue, use “-q [queue_name]”.

 
Example: 

$bjobs -q gpu -u all 

This shows the jobs of all users currently in the gpu queue.

 

Modules 

Environment modules are a convenient way to manage applications on a multi-user, multi-node system. Users have different requirements for which software, and which version of it, they need. If multiple versions of the same software (especially libraries) are installed, the system needs rules to decide which version to use. This is done by setting certain environment variables, and modules let you set those variables with a simple command. ERISOne uses tcl environment modules by default, while ERISTwo uses lmod. Lmod is a more advanced module system that handles complex dependencies better and adds some user features. However, the basic usage and functionality are the same, and even old module files can be used. (Note: ERISOne module files point to CentOS 6 applications and should not be loaded on ERISTwo.)

 

Basic usage of lmod 

In order to use an application that is available via lmod, you need to load the corresponding module: 

$module load [modulename]/[version_number] 

The version number is optional. If you omit it, the default version is loaded. If no default is set, the highest version number is loaded.

In order to see available modules you use: 

$module avail 

This will show a long list of modules and the path where the corresponding module file is located. If you are looking for a specific application you can do: 

$module avail [application_name] 

This will list all modules whose names contain application_name, regardless of capitalization. The results may therefore include other modules that merely contain the search string in their name. Note that this search feature is unique to lmod and will not work with the tcl environment modules on ERISOne.

If you need more information about a module you can use: 

$module spider [module_name] 

This will gather information about different versions and general information about the module.  

To see which modules are currently loaded, use:

$module list 

Note that some modules load other modules as dependencies, so you may see several modules listed even though you loaded only one.

 

Example: 

$module load CMake 
$module list 

Currently Loaded Modules: 

  1) GCCcore/7.3.0   2) ncurses/6.1-GCCcore-7.3.0   3) CMake/3.11.4-GCCcore-7.3.0 

You can unload a module with:

$module unload [module_name]

  

Important: When you use an application in a batch job, you need to add the “module load” command to the LSF script.
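As an illustration, a batch job that uses CMake could load the module inside the script before calling the application. This is only a sketch: the file name cmake_job.lsf and the output file name are hypothetical, and CMake is used because it appears in the module example above.

```shell
# Write an LSF script that loads its module before using the application.
cat > cmake_job.lsf <<'EOF'
#!/bin/bash
#BSUB -q normal
#BSUB -o cmake_job.%J.out

# Load the application inside the job script itself.
module load CMake
cmake --version
EOF

# Submit only where LSF is installed.
if command -v bsub >/dev/null 2>&1; then
    bsub < cmake_job.lsf
else
    echo "bsub not found; run this on an ERISTwo login node"
fi
```

Specifying the full module name with its version (for example CMake/3.11.4-GCCcore-7.3.0) in the script keeps the job from changing behavior if the default version is later updated.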

 

 

