HPC Data Usage Report 2020

Scientific Computing Report 2020

ERIS Scientific Computing provides a range of computational resources, platforms and support for research and innovation.

This report provides an overview of the resources available to the Mass General Brigham research and innovation communities and how they are used. 

ERISOne/ERISTwo Linux Cluster 

Account creation history

The account creation form creates an account on both ERISOne and ERISTwo. Fig. 1 shows the total number of accounts created in each year since 2015. In 2020 we had 679 new users. Given the freeze on new hires and the overall situation, this increase is still rather large and shows the continued growth of interest in computational resources.

Fig. 1 Accounts opened per year.

 

Currently, there are 2265 users on ERISOne. The distribution of new accounts is shown in Fig. 2. The new accounts are mainly split between Brigham (BWH) (41%) and Mass General (MGH) (45%), while PHS (5%) and McLean (4%) account for smaller additions to the user base.

Fig. 2 Distribution of accounts by institution

ERISOne Access Methods 

Users were asked how they intend to use the cluster and which access method they will most likely use (multiple answers were possible). Fig. 3 shows the answers of the new users for each year. Most users use a command line interface, while a significant portion use the remote desktop. There is also a strong trend toward the web portals (Jupyter Notebooks and RStudio). It is also notable that remote desktop access increased in popularity.
 

Fig. 3 ERISOne access method, per year

Computational Capacity

Number of available nodes per cluster including adopted nodes:

ERISOne

Node type        Nodes
Total cluster    329
Login            2
General Compute  286
Filemove         5
Special          6 (aristo, celeste, lm001, plato and seed1)
Unavailable      19 (failed, to be decommissioned)
Remote Desktops  11

ERISTwo

Node type        Nodes
Total cluster    43
Login            2
General Compute  13
Filemove         2
General GPU      3
Adopted          10
Remote Desktop   5 (8 to be added)
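
As a quick consistency check, the per-category node counts listed above can be summed and compared with the stated cluster totals. The following is a minimal sketch, not part of the original report: the dictionary and function names are illustrative, and it assumes the 8 remote desktops "to be added" on ERISTwo are counted separately from the 5 already in place.

```python
# Node counts copied from the ERISOne and ERISTwo listings above.
# All names in this sketch are illustrative assumptions.
ERISONE = {
    "Login": 2,
    "General Compute": 286,
    "Filemove": 5,
    "Special": 6,
    "Unavailable": 19,
    "Remote Desktops": 11,
}

ERISTWO = {
    "Login": 2,
    "General Compute": 13,
    "Filemove": 2,
    "General GPU": 3,
    "Adopted": 10,
    "Remote Desktop": 5,
}

# Assumption: the "8 to be added" remote desktops are not part of the 5 above.
ERISTWO_PLANNED_REMOTE_DESKTOPS = 8


def total_nodes(counts, planned=0):
    """Sum the per-category node counts, optionally adding planned nodes."""
    return sum(counts.values()) + planned


erisone_total = total_nodes(ERISONE)
eristwo_current = total_nodes(ERISTWO)
eristwo_with_planned = total_nodes(ERISTWO, ERISTWO_PLANNED_REMOTE_DESKTOPS)

print("ERISOne total:", erisone_total)
print("ERISTwo total (current):", eristwo_current)
print("ERISTwo total (with planned):", eristwo_with_planned)
```

Under these assumptions the ERISOne categories sum to the stated 329 nodes, while ERISTwo reaches the stated 43 only when the 8 planned remote desktops are included (35 without them).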

 

ERISOne Jobs Report

RTM historical information on ERISOne job submissions:

Fig. 4 ERISOne RTM yearly report

Web Portals Usage

Jupyter Hub

The Jupyter hub is an open-source web application that allows the user to create and share documents that contain live code, equations, visualizations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

Fig. 5 Jupyter Notebooks over time

Fig. 5 depicts the open sessions over time. Note that Jupyter hub sessions stay open unless the user explicitly closes them, which explains the almost constant increase. The gap between May and June 2019 is due to an outage of the database server that records the usage. The service needed restarting several times, which eliminated inactive sessions. As the figure shows, usage went back up within a few days and then kept climbing. The overall maximum is 198 users.

RStudio

RStudio is an integrated development environment for R statistical computing. Fig. 6 shows the number of active users for a given day. Unlike on the Jupyter hub, inactive sessions are canceled. Therefore, the daily usage is highly variable, with a median of 24 users and a standard deviation of 19.3. The maximum number of active users is 150. As for the Jupyter hub, there is a gap in the data recording from mid-April until mid-May.

 

Fig. 6 RStudio users
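
Summary figures like the median, standard deviation and maximum quoted for RStudio can be derived from a list of daily active-user counts. The sketch below uses Python's standard `statistics` module. The daily counts are made up for illustration and are not the real usage data; the function name is hypothetical, and the report does not state whether a sample or population standard deviation was used (the sample version is assumed here).

```python
import statistics


def summarize_daily_users(daily_counts):
    """Return the median, sample standard deviation and maximum of daily users."""
    return {
        "median": statistics.median(daily_counts),
        # Assumption: sample standard deviation; use statistics.pstdev
        # for the population version instead.
        "stdev": statistics.stdev(daily_counts),
        "max": max(daily_counts),
    }


# Hypothetical daily active-user counts, for illustration only.
example_counts = [10, 24, 31, 18, 55, 24, 7]

summary = summarize_daily_users(example_counts)
print(summary)
```

A standard deviation close to the median, as in the figures reported for RStudio, indicates that day-to-day usage swings widely.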

Shiny Apps

Shiny applications developed in R have been very successful at providing web interactivity and access to Briefcase data and ERISOne resources.

  • 11 public apps
  • 8 private (PAS login required) apps

User training

Due to the COVID-19 lockdown, all in-person training after March was canceled. However, several virtual trainings were still held:

Python Training:

  • COVID-19 Demo: Wednesday, July 15, 2020 (45 registered)
  • Intro to Python: Wednesday, February 25, 2020, 10:00 am to 4:00 pm (30 registered)
  • All Python training sessions filled up very quickly and were well attended.

Linux Training

  • January 14, 2020 1:00 pm to 4:00 pm 
  • February 26, 2020 1:00 pm to 4:00 pm

R Training 

  • ThuRsday R's Day: instead of a one-day, 8-hour training, small weekly meetings (about 1 hour each) were introduced
  • 20 to 50 attendees weekly
  • Each meeting is recorded and shared. So far, 11 videos are available, and the playing time is 112 hours and increasing.

HPCWIN3 Windows analytics server

Account creation

While the number of new Linux users is growing almost constantly, the number of new users on HPCWIN3 was relatively stable from 2017 to 2020. In 2020 we saw a decrease of about 29% in new users (from 176 to 125), as shown in Fig. 7.

Fig. 7 New HPCWIN3 users
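
The year-over-year change in new HPCWIN3 users can be computed directly from the two counts given in the text (176 and 125). A minimal sketch; the function name `percent_change` is hypothetical.

```python
def percent_change(old, new):
    """Return the relative change from old to new, in percent."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


# Counts of new HPCWIN3 users taken from the text: 176, then 125 in 2020.
change = percent_change(176, 125)
print(f"Change in new users: {change:.1f}%")
```

With these counts the change comes out at about -29%, that is, a decrease of just under 30%.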

As shown in Fig. 8, the distribution of users is very similar to that of the Linux cluster and roughly follows the size of the institutions.

Fig. 8 HPCWIN3 user distribution.

 

Other institutions  
DFCI 9
Dana Farber Cancer Institute 1
Harvard Medical School 1
MEE 2
Massachusetts Eye and Ear 1
Massachusetts Eye and Ear Infirmary 1
Spaulding Rehabilitation Hospital 2
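
Several rows in the list above appear to name the same institution under different labels (for example, "DFCI" and "Dana Farber Cancer Institute"). The sketch below shows one way such free-text entries could be merged before counting. The alias mapping is an assumption made for illustration, not something stated in the report, and the function name is hypothetical.

```python
from collections import Counter

# Rows copied from the "Other institutions" list above (spelling normalized).
ROWS = [
    ("DFCI", 9),
    ("Dana Farber Cancer Institute", 1),
    ("Harvard Medical School", 1),
    ("MEE", 2),
    ("Massachusetts Eye and Ear", 1),
    ("Massachusetts Eye and Ear Infirmary", 1),
    ("Spaulding Rehabilitation Hospital", 2),
]

# Assumed alias mapping (lowercase label -> canonical name). Treating these
# labels as the same institution is an assumption of this sketch.
ALIASES = {
    "dfci": "Dana Farber Cancer Institute",
    "mee": "Massachusetts Eye and Ear",
    "massachusetts eye and ear infirmary": "Massachusetts Eye and Ear",
}


def merge_institutions(rows, aliases):
    """Sum user counts per institution after mapping aliases to one name."""
    totals = Counter()
    for name, count in rows:
        cleaned = name.strip()
        canonical = aliases.get(cleaned.lower(), cleaned)
        totals[canonical] += count
    return dict(totals)


merged = merge_institutions(ROWS, ALIASES)
for institution, users in sorted(merged.items()):
    print(institution, users)
```

If the mapping holds, the seven rows collapse into four institutions; the overall number of users from other institutions is unchanged by the merge.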

Linux Applications

  • 68 new research applications
  • Over 800 bioinformatics packages/libraries in the R programming language
  • Updated to the latest R, Python, Perl, and C/C++ programming languages and associated toolchains
  • Designed an application conversion process for the ERISTwo cluster

Main Internal Projects

  • Set up an infrastructure inventory in NetBox.
  • Deploy new servers with Ansible.
  • Migrate the HPC team’s code management system from SVN to Git.
  • Set up and deploy the ERISXdl system.
  • Briefcase Bladeset 4 replacement installation and data migration.  
  • Archival automation of unused data on Briefcase.