Scientific Computing Report 2020
ERIS Scientific Computing provides a range of computational resources, platforms and support for research and innovation.
This report provides an overview of the resources available to the Mass General Brigham research and innovation communities and how they are used.
ERISOne/ERISTwo Linux Cluster
Account creation history
The account creation form creates an account on both ERISOne and ERISTwo. Fig. 1 shows the total number of accounts created in each year since 2015. In 2020 there were 679 new users. Given the freeze on new hires and the overall situation, this growth is still rather large and shows the continued growth of interest in computational resources.
Currently, there are 2265 users on ERISOne. The distribution of new accounts is shown in Fig. 2A. The new accounts are mainly split between Brigham (BWH) (41%) and Mass General (MGH) (45%), while PHS (5%) and McLean (4%) contribute smaller additions to the user base.
ERISOne Access Methods
Users were asked how they intend to use the cluster and which access method they will most likely use (multiple answers are possible). Fig. 4A shows the answers of the new users for each year. Most users use a command line interface, while a significant portion use the remote desktop. There is also a strong trend toward the web portals (Jupyter notebooks and RStudio). It is also notable that remote desktop access increased in popularity.
Computational Capacity
Number of available nodes per cluster including adopted nodes:
ERISOne
Nodes | |
---|---|
Total cluster | 329 |
Login | 2 |
General Compute | 286 |
Filemove | 5 |
Special | 6 (aristo, celeste, lm001, plato and seed1) |
Unavailable | 19 failed, to be decommissioned |
Remote Desktops | 11 |
ERISTwo
Nodes | |
---|---|
Total cluster | 43 |
Login | 2 |
General Compute | 13 |
Filemove | 2 |
General GPU | 3 |
Adopted | 10 |
Remote Desktop | 5 (8 to be added) |
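As a quick consistency check on the two tables above, the per-category counts can be summed and compared with the stated totals. The Python sketch below transcribes the numbers from the tables; the dictionary and function names are illustrative only. It assumes that the 8 remote desktops listed as "to be added" on ERISTwo are already included in the stated total of 43, which is an inference from the arithmetic and is not stated in the report.

```python
# Node counts transcribed from the ERISOne and ERISTwo tables.
ERISONE = {
    "Login": 2,
    "General Compute": 286,
    "Filemove": 5,
    "Special": 6,
    "Unavailable": 19,
    "Remote Desktops": 11,
}
ERISTWO = {
    "Login": 2,
    "General Compute": 13,
    "Filemove": 2,
    "General GPU": 3,
    "Adopted": 10,
    "Remote Desktop": 5,
}
# Assumption: the pending remote desktops are counted in the ERISTwo total.
ERISTWO_PENDING_REMOTE_DESKTOPS = 8


def total_nodes(categories, pending=0):
    """Sum the per-category node counts, plus any nodes not yet installed."""
    return sum(categories.values()) + pending


erisone_total = total_nodes(ERISONE)
eristwo_installed = total_nodes(ERISTWO)
eristwo_with_pending = total_nodes(ERISTWO, ERISTWO_PENDING_REMOTE_DESKTOPS)

print("ERISOne total:", erisone_total)
print("ERISTwo installed:", eristwo_installed)
print("ERISTwo including pending remote desktops:", eristwo_with_pending)
```

Under that assumption, both sums agree with the "Total cluster" rows of the tables.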
ERISOne Jobs Report
RTM historical information on ERISOne job submissions:
Web Portals Usage
Jupyter Hub
The Jupyter hub is an open-source web application that allows the user to create and share documents that contain live code, equations, visualizations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
Fig. 5 depicts the open sessions over time. Note that Jupyter Hub sessions stay open unless the user explicitly closes them, which explains the almost constant increase. The gap between May and June 2019 is due to an outage of the database server that records the usage. The service needed restarting several times, which eliminated inactive sessions. As one can see, usage went back up within a few days and then kept climbing. The overall maximum is 198 users.
RStudio
RStudio is an integrated development environment for the R statistical computing language. Fig. 6 shows the number of active users for a given day. Unlike Jupyter Hub, inactive sessions are canceled. Therefore, the daily usage is highly variable, with a median of 24 users and a standard deviation of 19.3. The maximum number of active users is 150. As for Jupyter Hub, there is a gap in the data recording from mid-April till mid-May.
Shiny Apps
Shiny applications developed in R have been very successful in providing web interactivity and access to Briefcase data and ERISOne resources.
- 11 public apps
- 8 private (PAS login required) apps
User training
Due to the COVID-19 lockdown, all in-person training after March was canceled. However, several virtual trainings were still held:
Python Training:
- COVID-19 Demo: Wednesday, July 15, 2020 (45 registered)
- Intro to Python: Wednesday, February 25, 2020 (30 registered)
- February 25, 2020 10:00 am to 4:00 pm (30 registered)
- All Python trainings filled up very quickly and were well attended.
Linux Training
- January 14, 2020 1:00 pm to 4:00 pm
- February 26, 2020 1:00 pm to 4:00 pm
R Training
- ThuRsday R's Day: instead of a one-day, 8-hour training, small weekly meetings (about 1 hour each) were started
- 20 to 50 attendees weekly
- Each meeting is recorded and shared. So far, 11 videos are available, and the playing time is 112 hours and increasing.
HPCWIN3 Windows analytics server
Account creation
While the number of new Linux users is growing almost constantly, the number of new users on HPCWIN3 has been relatively stable from 2017 till 2020. In 2020 we saw a decrease of roughly 30% in new users (from 176 to 125), as shown in Fig. 7.
With respect to the user base, as shown in Fig. 8, the distribution of users is very similar to that of the Linux cluster and roughly follows the size of the institutions.
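The year-over-year change quoted above can be reproduced from the two counts. A minimal Python sketch follows (the function name is illustrative); the exact figure comes out closer to 29%, which the text rounds to 30%.

```python
def percent_change(old, new):
    """Return the relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# New HPCWIN3 accounts in the previous year and in 2020, as given in the text.
change = percent_change(176, 125)
print(f"Change in new HPCWIN3 users: {change:.1f}%")
```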
Other institutions | |
---|---|
DFCI | 9 |
Dana Farber Cancer Institute | 1 |
Harvard Medical School | 1 |
MEE | 2 |
Massachusetts Eye and Ear | 1 |
Massachusetts Eye and Ear Infirmary | 1 |
Spaulding Rehabilitation Hospital | 2 |
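The table above lists some institutions under more than one spelling, as entered by users. If one assumes that "DFCI" and "Dana Farber Cancer Institute" refer to the same institution, and likewise for the three Mass Eye and Ear variants, the counts can be consolidated as in the Python sketch below. The alias mapping is an assumption made for illustration, not something this report states.

```python
from collections import Counter

# Rows transcribed from the "Other institutions" table.
ROWS = [
    ("DFCI", 9),
    ("Dana Farber Cancer Institute", 1),
    ("Harvard Medical School", 1),
    ("MEE", 2),
    ("Massachusetts Eye and Ear", 1),
    ("Massachusetts Eye and Ear Infirmary", 1),
    ("Spaulding Rehabilitation Hospital", 2),
]

# Assumed mapping from variant spellings to one canonical name.
ALIASES = {
    "DFCI": "Dana Farber Cancer Institute",
    "MEE": "Massachusetts Eye and Ear",
    "Massachusetts Eye and Ear Infirmary": "Massachusetts Eye and Ear",
}


def consolidate(rows, aliases):
    """Add up user counts after mapping each name to its canonical form."""
    totals = Counter()
    for name, count in rows:
        totals[aliases.get(name, name)] += count
    return dict(totals)


consolidated = consolidate(ROWS, ALIASES)
for name, count in sorted(consolidated.items()):
    print(f"{name}: {count}")
```

Keeping the mapping in a separate dictionary makes it easy to review or correct the assumed aliases without touching the raw counts.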
Linux Applications
- 68 new research applications
- Over 800 bioinformatics packages/libraries in the R programming language
- Updated to the latest R, Python, Perl, and C/C++ programming languages and associated toolchains
- Designed application conversion process for ERISTwo cluster
Main Internal Projects
- Set up an infrastructure inventory on NetBox.
- Deploy new servers with Ansible.
- Migrate HPC team’s code management system from SVN to Git.
- Set up and deploy ERISXdl system.
- Briefcase Bladeset 4 replacement installation and data migration.
- Archival automation of unused data on Briefcase.