HPC Data Usage Report 2021

Scientific Computing Report 2021

ERIS Scientific Computing provides a range of computational platforms, resources and support for research and innovation across MGB.

This report provides an overview of the resources available to the Mass General Brigham research and innovation communities and how they are used. 

ERISOne/ERISTwo Linux Cluster 

Account creation history

The account creation form handles account creation for both ERISOne and ERISTwo, our CPU-based high-performance computing platforms. Figure 1 shows the total number of accounts created each year since 2015. New accounts peaked in 2019, and 2021 was a close second with 710 new users, a 4.5% increase over the prior year.

Figure 1 - Accounts opened per year
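Only the 2021 count and the percentage change are reported, so the 2020 figure can only be backed out approximately. The short Python sketch below shows the arithmetic; the helper names `pct_change` and `implied_previous` are illustrative and not part of any ERIS tool. Since 710 / 1.045 ≈ 679.4, 2020 saw roughly 680 new accounts, and the reported 4.5% is a rounded figure.

```python
def pct_change(old: float, new: float) -> float:
    """Year-over-year change from old to new, in percent."""
    return (new - old) / old * 100.0


def implied_previous(new: float, pct: float) -> float:
    """Back out the prior-year value from this year's value and a % change."""
    return new / (1.0 + pct / 100.0)


if __name__ == "__main__":
    prior = implied_previous(710, 4.5)  # 2021 new users, reported growth
    print(f"Implied 2020 new users: {prior:.1f}")
    # Re-check with whole-number counts on either side of the estimate.
    for candidate in (679, 680):
        print(candidate, f"{pct_change(candidate, 710):.2f}%")
```

Neither whole-number candidate gives exactly 4.5%, which is why the estimate is stated as "roughly 680".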

Of the 710 new users, the majority reported they were from Brigham and Women’s Hospital (42%), Massachusetts General Hospital (36%) or from Mass General Brigham (MGB Corporate) (11%). Figure 2 below shows the full breakdown of new accounts created in 2021 by institution.

Figure 2 - Distribution of new ERISOne 2021 accounts by institution

 

Currently, there are 2467 active Linux accounts and a total of 250 groups. 

 

ERISOne Access Methods 

Users were asked how they intend to use the cluster and which access method they are most likely to use (multiple answers were allowed). Figure 3 shows the answers from new users for each year. Most users work through the command-line interface, but there is a strong trend toward the web portals, Jupyter Notebooks and RStudio. Notably, while remote desktop use has remained steady, web portal use has increased nearly every year.

Figure 3 - ERISOne access methods per year
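Because respondents could pick more than one access method, the counts for a given year can add up to more than the number of new users who answered. A minimal Python sketch with invented responses (the data and the `tally_methods` helper are hypothetical) shows how such a multi-select question is typically tallied: each selected method counts once per respondent, so percentages are taken over respondents rather than over total selections.

```python
from collections import Counter

# Hypothetical survey responses: each new user may select several access methods.
responses = [
    {"command line", "JupyterHub"},
    {"command line"},
    {"RStudio", "remote desktop"},
    {"command line", "RStudio", "JupyterHub"},
]


def tally_methods(responses):
    """Count how many respondents selected each access method."""
    counts = Counter()
    for selected in responses:
        counts.update(selected)  # a set, so each method counts once per respondent
    return counts


counts = tally_methods(responses)
n = len(responses)
for method, c in counts.most_common():
    print(f"{method:15s} {c} ({c / n:.0%} of respondents)")
print("Total selections:", sum(counts.values()), "from", n, "respondents")
```

Here 4 respondents produce 8 selections, so the per-method percentages sum to well over 100%.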

 

Computational Capacity

Number of available nodes per cluster

ERISOne

Node type        Count
Total cluster    276
Login            2
General Compute  152
Filemovers       3
Special          6 (aristo, celeste, lm001, plato, seed1, socrates)
Remote Desktops  7

ERISTwo

Node type        Count
Total cluster    49
Login            2
General Compute  16
Filemovers       2
Adopted          10
Remote Desktops  13
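The per-category counts in these two tables do not add up to the cluster totals. A quick tally in Python makes the gap explicit: the listed categories cover 170 of the 276 ERISOne nodes and 43 of the 49 ERISTwo nodes. The remaining nodes presumably fall into categories the tables do not break out; which categories those are is not stated here.

```python
# Node counts copied from the ERISOne and ERISTwo tables.
NODE_COUNTS = {
    "ERISOne": {"total": 276, "Login": 2, "General Compute": 152,
                "Filemovers": 3, "Special": 6, "Remote Desktops": 7},
    "ERISTwo": {"total": 49, "Login": 2, "General Compute": 16,
                "Filemovers": 2, "Adopted": 10, "Remote Desktops": 13},
}


def listed_and_unlisted(counts: dict) -> tuple:
    """Return (sum of listed categories, nodes not covered by any listed category)."""
    listed = sum(n for name, n in counts.items() if name != "total")
    return listed, counts["total"] - listed


for cluster, counts in NODE_COUNTS.items():
    listed, unlisted = listed_and_unlisted(counts)
    print(f"{cluster}: {listed} of {counts['total']} nodes listed, "
          f"{unlisted} not broken out")
```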

Number of resources per cluster

The figures below cover only nodes that were online at the time of data collection; counts change as nodes go online or offline.

ERISOne

Total RAM (GB)   Total CPUs   Total Cores
40196            6160         69708

ERISTwo

Total RAM (GB)   Total CPUs   Total Cores
12438            1952         30912

 

Scientific Applications

We provide a wide range of applications for each platform. New installations on ERISOne are currently closed; new applications are installed only on ERISTwo, which runs the latest OS. A total of 574 modules have been built on ERISTwo, with more added each day.

Main Applications

  • BGEN/1.1.7
  • R/4.1.0
  • anaconda/4.10.1
  • bedtools/2.30.0
  • cellranger/6.1.1
  • freesurfer/7.2.0
  • igblast/1.17.1
  • samtools/1.11
  • FastQC/0.11.8-Java-1.8
  • GATK/3.8-0-Java-1.8.0_161
  • Plink/2.0
  • picard/2.22.0
  • Bowtie2/2.3.4.2-foss-2018b
  • OpenBLAS/0.3.1-GCC-7.3.0-2.30
  • MATLAB/2020b
  • QIIME2/2018.11
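Each entry above follows a `name/version` pattern, and some versions carry a toolchain suffix (for example `-foss-2018b` or `-GCC-7.3.0-2.30`, which resemble EasyBuild toolchain names). On an environment-modules system a user would typically request one with a command such as `module load R/4.1.0`. The small Python helper below, `parse_module` (a hypothetical name, not an ERIS tool), splits such strings at the first slash; it is a sketch for scripting over module lists, assuming every entry has exactly this form.

```python
from typing import NamedTuple


class Module(NamedTuple):
    name: str
    version: str  # may include a toolchain suffix, e.g. "2.3.4.2-foss-2018b"


def parse_module(spec: str) -> Module:
    """Split a 'name/version' module string at the first slash."""
    name, sep, version = spec.partition("/")
    if not sep or not name or not version:
        raise ValueError(f"expected 'name/version', got {spec!r}")
    return Module(name, version)


for spec in ["R/4.1.0", "GATK/3.8-0-Java-1.8.0_161", "Bowtie2/2.3.4.2-foss-2018b"]:
    m = parse_module(spec)
    print(f"{m.name:10s} {m.version}")
```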

 

Web Portal Usage

JupyterHub

JupyterHub is an open-source, multi-user server for Jupyter Notebooks, web documents that combine live code, equations, visualizations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

Figure 4 - JupyterHub daily users

Figure 4 shows open sessions over time. JupyterHub sessions stay open unless the user explicitly closes them, which explains the continual increase; the gaps represent reboots or outages. The service needs to be restarted periodically, which clears inactive sessions. The overall maximum, 198 users, occurred in February 2020; the 2021 maximum was 195 users, in March.

RStudio

RStudio is an integrated development environment for the R statistical computing language. Figure 5 shows the number of active users per day. Unlike JupyterHub, RStudio closes inactive sessions automatically, which explains the frequent drops and the relatively stable usage trend, in contrast to the spikes in JupyterHub usage. As with JupyterHub, there is a gap in the data from mid-April to mid-May 2019 due to a failure of the data collection database.

Figure 5 - RStudio daily users

Shiny Apps

Shiny applications, developed in R, are a valuable tool for our community. Shiny provides web interactivity and access to Briefcase data and ERISOne resources.

  • 19 public apps
  • 17 private (PAS login required) apps

This represents a doubling from 2020.

Remote Desktop

The NoMachine Remote Desktops are a group of nodes on ERISOne and ERISTwo that host graphical Linux desktop sessions, which users connect to remotely. Users can work in a full Linux GUI while accessing their cluster data and any cluster software module, which is especially helpful for graphical applications such as MATLAB. Inactive sessions are terminated after 2 weeks.
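The 2-week rule can be pictured as a periodic sweep over open sessions. The sketch below assumes "inactive" means time since the last recorded user activity; the session IDs, timestamps and the `sessions_to_terminate` function are hypothetical, and NoMachine's actual enforcement mechanism is not shown. The same kind of automatic cleanup is what keeps RStudio session counts from accumulating the way JupyterHub sessions do.

```python
from datetime import datetime, timedelta

# Assumption: "inactive" = time since the session's last recorded user activity.
IDLE_LIMIT = timedelta(weeks=2)


def sessions_to_terminate(last_activity: dict, now: datetime) -> list:
    """Return sorted IDs of sessions idle for longer than IDLE_LIMIT.

    last_activity maps a (hypothetical) session ID to the datetime of its
    last user activity.
    """
    return sorted(sid for sid, seen in last_activity.items()
                  if now - seen > IDLE_LIMIT)


now = datetime(2021, 12, 1)
sessions = {
    "session-a": datetime(2021, 11, 30),  # idle 1 day: kept
    "session-b": datetime(2021, 11, 10),  # idle 21 days: terminated
    "session-c": datetime(2021, 11, 17),  # idle exactly 14 days: not yet over the limit
}
print(sessions_to_terminate(sessions, now))  # ['session-b']
```

Whether the cutoff is strict (more than 14 days) or inclusive is a design choice; the sketch uses a strict comparison.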

Figure 6 shows remote desktop usage over time, specifically on the ERISOne rgs machines. In 2021 we worked to remove non-functioning graphical nodes, including several rgs nodes on ERISOne, which accounts for the slight decline over the year. In their place, we recently installed 8 new grx graphical nodes on ERISTwo for remote desktop use. Sessions on these ERISTwo grx nodes are not included in Figure 6.

Figure 6 - NoMachine Remote Desktop daily users

HPCWIN3/4 Windows Analytics Servers

Account creation

While new Linux accounts remain close to their 2019 peak, the number of new users on HPCWIN3/4 remained relatively stable from 2017 until 2020, when new users decreased by 30%. In 2021, new users decreased by a further 26%.

Figure 7 - New HPCWIN3/4 accounts per year
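Two successive declines compound rather than add. Assuming each percentage is relative to the previous year, a 30% drop followed by a 26% drop leaves 0.70 × 0.74 ≈ 0.52 of the 2019 level, an overall decrease of about 48% rather than 56%. A minimal Python sketch of the calculation (the `compound` helper is illustrative):

```python
def compound(*pct_changes: float) -> float:
    """Combine successive percentage changes into one overall change, in percent."""
    factor = 1.0
    for p in pct_changes:
        factor *= 1.0 + p / 100.0  # each change applies to the already-changed level
    return (factor - 1.0) * 100.0


# HPCWIN3/4 new users: -30% in 2020, then -26% in 2021.
print(f"Overall change, 2019 to 2021: {compound(-30, -26):.1f}%")
```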

As Figure 8 shows, new HPCWIN3/4 users are distributed across MGB institutions in roughly the same proportions as on the Linux cluster.

Figure 8 - Distribution of new HPCWIN3/4 2021 accounts by institution

 

ERISXdl

This past year, ERISXdl, our newest cluster, opened to users for its pilot phase. ERISXdl is currently free to use; its goal is to give researchers the resources they need to complete deep learning and other GPU-powered analyses. The cluster consists of 3 login nodes and 5 NVIDIA DGX compute nodes, each equipped with NVIDIA V100 GPUs. Data tracking for this cluster began in June 2021; usage has been limited because of the small size of the cluster and the limited scope of the pilot phase. Figure 9 shows the number of jobs submitted to the cluster daily, and Figure 10 shows the number of active users each month.

As Figures 9 and 10 show, July 2021 had the most active users, but job submissions peaked in November, a month with 13 active users. Sudden spikes in job submissions are usually caused by a few users submitting large batches of jobs; typical daily submissions remain below 50 jobs per day. Daily job counts are based on each job's start date, and jobs can run for up to 14 days on the cluster. So even when only a few new jobs start on a given day, the 5 DGX nodes may still be busy with longer jobs submitted several days earlier. The daily job count also includes both completed and incomplete (failed or canceled) jobs.

Figure 9 - Daily ERISXdl job submissions in 2021
Figure 10 - Active users per month on ERISXdl in 2021
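Counting jobs by start date, as the daily figures do, can differ sharply from how busy the cluster actually is on a given day. The sketch below uses made-up job records (the dates and both helper functions are hypothetical, and the real ERISXdl accounting data is not shown) to contrast the two counts: a day can show zero new jobs while a long job that started earlier is still running. Job status (completed, failed, canceled) is not modeled, matching the fact that the daily count includes all of them.

```python
from collections import Counter
from datetime import date

# Hypothetical job records: (start_date, end_date), end date inclusive.
jobs = [
    (date(2021, 11, 1), date(2021, 11, 10)),  # long job; jobs may run up to 14 days
    (date(2021, 11, 1), date(2021, 11, 1)),
    (date(2021, 11, 3), date(2021, 11, 4)),
]


def jobs_by_start_date(jobs) -> Counter:
    """Daily job count: each job is counted once, on the day it started."""
    return Counter(start for start, _ in jobs)


def jobs_running_on(jobs, day: date) -> int:
    """Number of jobs occupying the cluster on `day` (start <= day <= end)."""
    return sum(1 for start, end in jobs if start <= day <= end)


day = date(2021, 11, 5)
print("started:", jobs_by_start_date(jobs)[day])  # 0 new jobs that day
print("running:", jobs_running_on(jobs, day))     # but 1 job still running
```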

Current Projects

  • Deploy 60 additional nodes on ERISTwo.
  • Deploy 3 PB of additional storage on Briefcase.
  • Open ERISXdl for production with charge-back implementation.