Scientific Computing Report 2020
This report provides an overview of the resources available to the Mass General Brigham research and innovation communities and how they are used.
ERISOne/ERISTwo Linux Cluster
Account creation history
The account creation form creates an account on both ERISOne and ERISTwo. Fig. 1 shows the total number of accounts created each year since 2015. In 2020 there were 679 new users. Given the freeze on new hires and the overall situation, this growth is still substantial and shows the continued growth of interest in computational resources.
Fig.1 Accounts opened per year.
Currently, there are 2265 users on ERISOne. The distribution of new accounts is shown in Fig. 2. New accounts are mainly split between Brigham (BWH) (41%) and Mass General (MGH) (45%), while PHS (5%) and McLean (4%) account for smaller additions to the user base.
Fig.2 Distribution of accounts by institution
ERISOne Access Methods
Users were asked how they intend to use the cluster and which access method they will most likely use (multiple answers are possible). Fig. 3 shows the answers of the new users for each year. Most users use a command line interface, while a significant portion use the remote desktop. There is also a strong trend toward the web portals (Jupyter Notebooks and RStudio). It is also notable that remote desktop access increased in popularity.
Fig. 3 ERISOne access method, per year
Number of available nodes per cluster including adopted nodes:
|Special||6 (aristo, celeste, lm001, plato and seed1)|
|Unavailable||19 failed, to be decommissioned|
|Remote Desktop||5 (8 to be added)|
ERISOne Jobs Report
Historical RTM information on ERISOne job submissions:
Fig. 4 ERISOne RTM yearly report
Web Portals Usage
The Jupyter hub is an open-source web application that allows the user to create and share documents that contain live code, equations, visualizations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
Fig. 5 Jupyter Notebooks over time
Fig. 5 depicts the open sessions over time. Note that JupyterHub sessions stay open unless the user explicitly closes them, which explains the almost constant increase. The gap between May and June 2019 is due to an outage of the database server that records the usage. The service needed restarting several times, which eliminated inactive sessions. As one can see, usage went up again within a few days and then kept climbing. The overall maximum is 198 users.
RStudio is an integrated development environment for the R statistical computing language. Fig. 6 shows the number of active users on a given day. Unlike JupyterHub, inactive sessions are canceled. The daily usage is therefore highly variable, with a median of 24 users and a standard deviation of 19.3. The maximum number of active users is 150. As with JupyterHub, there is a gap in the data recording from mid-April until mid-May.
Fig. 6 RStudio users
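The summary statistics quoted for RStudio (median, standard deviation, maximum of daily active users) can be reproduced from a list of daily counts along the following lines. This is a minimal sketch, not the method used for the report: the function name `summarize_daily_users` and the sample counts are hypothetical, and the report does not say whether its standard deviation is the sample or the population variant (the sample variant is assumed here).

```python
import statistics


def summarize_daily_users(daily_counts):
    """Return median, sample standard deviation, and maximum of daily user counts."""
    return {
        "median": statistics.median(daily_counts),
        # Assumption: sample standard deviation; use statistics.pstdev
        # instead if the population variant is wanted.
        "stdev": statistics.stdev(daily_counts),
        "max": max(daily_counts),
    }


# Hypothetical daily counts, for illustration only (not the report's data).
example_counts = [10, 20, 24, 30, 150]
summary = summarize_daily_users(example_counts)
print(summary)
```

The median is reported alongside the standard deviation because a few very busy days (such as the single large value in the example) pull the mean upward while leaving the median largely unchanged.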
Shiny applications developed in R have been very successful in providing web interactivity and access to Briefcase data and ERISOne resources.
- 11 public apps
- 8 private (PAS login required) apps
Due to the COVID-19 lockdown, all in-person training after March was canceled. However, several virtual trainings were still held:
- COVID-19 Demo: Wednesday, July 15th, 2020 (45 registered)
- Intro to Python: Wednesday, February 25, 2020 (30 registered)
- February 25, 2020 10:00 am to 4:00 pm (30 registered)
- All Python trainings filled up very quickly and were well attended.
- January 14, 2020 1:00 pm to 4:00 pm
- February 26, 2020 1:00 pm to 4:00 pm
- ThuRsday R's Day: instead of a one-day, 8-hour training, I started small weekly meetings (about 1 hour each)
- 20 to 50 attendees weekly
- Each meeting is recorded and shared. 11 videos are now available, and the playing time so far is 112 hours and increasing.
HPCWIN3 Windows analytics server
While the number of new Linux users is growing almost constantly, the number of new users on HPCWIN3 was relatively stable from 2017 until 2020. In 2020 we saw a decrease of about 29% in new users (from 176 to 125), as shown in Fig. 7.
Fig. 7 New HPCwin3 Users
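The year-over-year decrease in new HPCWIN3 users can be checked directly from the two counts given above (176 and 125). The helper below is an illustrative sketch; the name `percent_change` is not taken from the report.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent (negative means a decrease)."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


# New HPCWIN3 users in the previous year and in 2020, as stated in the report.
change = percent_change(176, 125)
print(f"{change:.1f}%")  # prints "-29.0%"
```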
With respect to the user base, as shown in Fig. 8, the distribution of users is very similar to that of the Linux cluster and roughly follows the size of the institutions.
Fig. 8 HPCwin3 user distribution.
|Dana Farber Cancer Institute||1|
|Harvard Medical School||1|
|Massachusetts Eye and Ear||1|
|Massachusetts Eye and Ear Infirmary||1|
|Spaulding Rehabilitation Hospital||2|
- 68 new research applications
- Over 800 bioinformatics packages/libraries in the R programming language
- Updated to the latest R, Python, Perl, and C/C++ programming languages and associated toolchains
- Designed application conversion process for ERISTwo cluster
Main Internal Projects
- Set up an infrastructure inventory on NetBox.
- Deploying new servers with Ansible.
- Migrate HPC team’s code management system from SVN to Git.
- Set up and deploy the ERISXdl system.
- Briefcase Bladeset 4 replacement installation and data migration.
- Archival automation of unused data on Briefcase.