Scientific Computing Linux Clusters Storage Organization Best Practice

Introduction

This page is relevant to larger research labs using Scientific Computing (SciC) Linux Clusters as an integral part of their workflow. Labs wishing to implement the resilient storage organization outlined here should contact Scientific Computing.

In response to the 450TB filesystem outage that followed maintenance on 28th April, we are proposing changes to prevent interruptions to your workflow during maintenance or unavailability of large filesystems. This document is currently a work in progress; feedback and comments are welcome, again via Scientific Computing.

Background

Many labs have data storage in the 10 to 100 terabyte range (“big data”). Even on a high-performance filesystem it can take hours, days or weeks to move these volumes of data, restore from backup or carry out fileserver maintenance. ERIS is in the process of purchasing a large new-generation filesystem with improvements such as an object storage model and a solid-state disk cache that will significantly reduce maintenance time. However, operations on our 450TB filesystem will continue to take days, and if something should go wrong, a further period of troubleshooting will be needed.

All SciC Linux Clusters fileservers are configured for high availability, which means a second node will take over operation seamlessly should any one node suffer a hardware failure and go offline. The storage best practices outlined here are intended to ensure high resiliency of service during maintenance or unforeseen events that take an entire filesystem offline.

To ensure storage resiliency while acknowledging the challenge of “big data,” two options are outlined below:

1. Replicate all files: Maintain two copies of your "/data" folder on different filesystems.  This option is available now on request for smaller folders and will be available later for larger data folders.  The downside is a doubling of storage cost, which becomes significant above tens of terabytes.

2. Replicate essential files: By classifying data into categories such as applications, scratch and archive, and maintaining dual copies only of the data needed for day-to-day operation, replication becomes feasible at reasonable cost.

Replicate all files

Complete replication does not require configuration changes but doubles storage cost.

Replicate essential files - Storage organization

A lab folder can be organized into different classes of files: applications, common configuration files, new data, processed data and temporary files generated during active analysis. Reference data and databases are additional categories that may be too large for the shared database service.  Each category has different requirements for access speed, backup and availability.  By separating data into different folders by type, ERIS can offer replication and performance optimized for each type of data. The files can still “look” like one folder by using symbolic links or by building the actual paths into the workflow.
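As an illustration only, the sketch below shows how symbolic links could make the separate areas appear under the main data folder. The link names (apps, archive, refdata) are hypothetical choices rather than a required layout, and "lab_name" stands for your group's folder name.

    # Hypothetical sketch: present the separate storage areas as sub-folders of
    # the main data folder. The link names are examples only; replace "lab_name"
    # with your group's folder name.
    cd /data/lab_name
    ln -s /apps/labs/lab_name  apps       # lab applications and configuration
    ln -s /external/lab_name   archive    # MAD archive space
    ln -s /pub/lab/lab_name    refdata    # reference data
    ls -l apps archive refdata            # each entry points to its real location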

Applications and configuration

This folder should contain group-specific applications and configuration files.

These are essential for all work on the cluster.  SciC Linux Clusters-provided applications will be duplicated for resiliency.  Lab groups can also request an applications folder "/apps/labs/lab_name", which will likewise be duplicated, allowing work to continue during maintenance on the primary data folders.
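If the applications folder is paired with a lab modulefiles folder (as listed in the summary table below), software installed there could be picked up in a login session or job script along these lines. This assumes the cluster provides the standard Environment Modules / Lmod "module" command; "mytool/1.0" is a made-up modulefile name.

    # Assumes the Environment Modules / Lmod "module" command is available on
    # the cluster; "mytool/1.0" is a hypothetical lab-maintained modulefile.
    module use /apps/modulefiles/lab_name   # add the lab's modulefiles to the search path
    module avail                            # list modules now visible, including the lab's own
    module load mytool/1.0                  # typically prepends the tool's bin directory to PATH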

Main Data folder

The data folder should contain input data and final, processed results.

The primary data folder "/data/lab_name" contains data for processing and the final results, as well as intermediate files from processing that are retained longer term. 

Scratch space

Temporary files are those created while a job runs, not including the final output.

Scratch space is used by jobs that generate many temporary files and is placed on the highest-performance storage.  While temporary files do not require backing up, they currently slow down the process of backing up HPC data considerably because they change frequently.  By excluding this folder from backup, ERIS will be better able to provide backup for the essential data.  Scratch space is not duplicated, but a fallback scratch space on different servers is available to provide resiliency and high up-time.
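As a sketch of how a job could keep temporary files out of the backed-up area, the script body below writes its working files to scratch and copies only the final result back to the main data folder. Scheduler directives are omitted, and the project, input and command names are invented for illustration.

    #!/bin/bash
    # Hypothetical job script body: work in scratch, keep only the final output
    # in the main (backed-up) data folder.
    SCRATCH=/data/lab_name/scratch/$USER/job_$$     # per-user, per-job work area
    RESULTS=/data/lab_name/project1/results         # "project1" is a made-up example
    mkdir -p "$SCRATCH" "$RESULTS"

    cd "$SCRATCH"
    my_analysis --input /data/lab_name/project1/raw/sample1.dat \
                --workdir "$SCRATCH" --output final_result.txt   # placeholder command

    cp final_result.txt "$RESULTS"/                 # retain only the final output
    rm -rf "$SCRATCH"                               # remove the temporary files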

Archive

The archive folder should contain data for which processing is complete.     

Archive storage on the SciC Linux Clusters Massive Array of Disks (MAD) service costs less than other storage but is still accessible directly from the cluster.  It supports fast data transfers but not as many simultaneous read and write operations as the scratch and primary data spaces.
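For example, a project whose processing is complete could be copied to the archive and, once the copy has been verified, removed from the primary data folder. The project name below is a placeholder.

    # Hypothetical sketch: copy a finished project to the MAD archive space,
    # verify the copy, then free the space in the primary data folder.
    rsync -av /data/lab_name/project1/ /external/lab_name/project1/

    # Verification pass: with --checksum and --dry-run this should list no files.
    rsync -ai --checksum --dry-run /data/lab_name/project1/ /external/lab_name/project1/

    # Only after confirming the archive copy is complete:
    rm -rf /data/lab_name/project1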

Input Data

A location where new data is deposited from sources external to the cluster.

Having a dedicated folder for input data makes data transfer easier for people who may not have full SciC Linux Clusters accounts, and for devices such as sequencers, imaging instruments or servers.  This folder may be accessed via Microsoft Windows file-sharing.
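As one possible pattern, an instrument workstation or other external Linux server could push completed runs into this folder over SSH; Windows instruments can instead map the folder as a network drive using the share details provided by Scientific Computing. The hostname and run folder below are placeholders.

    # Hypothetical sketch: push a completed run from an external Linux server
    # into the lab's input folder. "cluster.example.org" and "run_folder" are
    # placeholders; substitute the real login node and run name.
    rsync -av --partial run_folder/ \
          username@cluster.example.org:/data/lab_name/sequencer/run_folder/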

Reference data

Reference data is public data used by many researchers.

SciC Linux Clusters hosts tens of terabytes of reference data, of which perhaps 10% is used on a regular basis.  Notify us which reference datasets are required for your workloads and we will duplicate those for resiliency.

Summary Table

Class                  | Path                                           | Cost / Performance | High Availability     | Tape Backup | Permissions
-----------------------|------------------------------------------------|--------------------|-----------------------|-------------|-----------------------
Application and config | /apps/lab/lab_name, /apps/modulefiles/lab_name |                    | Replication           | Yes         |
Scratch (temp)         | /data/lab_name/scratch                         | Fastest            | Alternate location    | No          | Group
Main                   | /data/lab_name                                 |                    |                       | On request  | Group
Input data             | /data/lab_name/sequencer                       |                    |                       | On request  | Group
Reference data         | /pub/lab/lab_name                              |                    | Selective replication | Yes         | Public
Database               | /dbase/lab_name                                | Fastest            |                       |             |
Archive                | /external/lab_name                             | Cheapest           | Replication           | No          | Single login for group

Go to KB0027899 in the IS Service Desk
