October 31, 2024
Redundancy levels
Traditional RAID protection distributes file and parity data across independent disk drives. BriefCASE instead distributes file and parity data across independent file servers, an approach termed "object-RAID".
BriefCASE redundancy levels can be set on a per-file or per-directory basis. By default, files under 64 kB are stored with object-RAID1 and files over 64 kB are stored with object-RAID5. In both cases a second, orthogonal layer of protection is applied, termed "vertical parity". Vertical parity protects against the loss of a disk sector, and its overhead is about 3% (1 in 32 sectors). Object-RAID5 + vertical parity provides a level of protection comparable to traditional RAID6, with the advantage that object-RAID rebuild times are much shorter. Object-RAID6 will be available in a future release.
The redundancy overhead of object-RAID5 + vertical parity for large files is 1.143570, meaning that 1 GB of data occupies 1.143570 GB of formatted disk space. The Linux "du" (disk usage) command by itself reports the size on disk; with the "--apparent-size" flag it reports the actual file size:
du -B 1000000 test.bam
195764 test.bam
du --apparent-size -B 1000000 test.bam
171208 test.bam
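The overhead for an individual file can be checked by dividing the size on disk by the apparent size. The following is a minimal sketch, assuming GNU "du" and "awk" are available; the function names overhead_ratio and file_overhead are illustrative and are not BriefCASE utilities.

```shell
# Ratio of size-on-disk to apparent size, given two numbers in the same unit.
overhead_ratio() {
    awk -v disk="$1" -v apparent="$2" 'BEGIN { printf "%.4f\n", disk / apparent }'
}

# Same ratio measured directly for one non-empty file (assumes GNU du).
file_overhead() {
    disk=$(du -B 1 "$1" | cut -f1)
    apparent=$(du --apparent-size -B 1 "$1" | cut -f1)
    overhead_ratio "$disk" "$apparent"
}

# Using the test.bam figures reported by du above:
overhead_ratio 195764 171208
```

For the test.bam figures this prints 1.1434, close to the quoted overhead of 1.143570.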
Folder usage
BriefCASE includes utilities optimized for that filesystem. "pan_du" calculates disk usage based on file size and the number of files in each folder. Note that the default of 20 parallel threads places a heavy load on the system and should only be used off-hours. At other times, limit the thread count, for example:
pan_du -t 2 /data/lab_folder
A usage report is automatically generated each week for /data folders on BriefCASE. Look for a file named "usage_report.txt". To quickly identify files and folders taking up the most space, use the command
sort -n -k 5 usage_report.txt | tail -n 100
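A variant of the same command lists the largest entries first, which can be easier to read. This is a sketch only: the function name top_usage is hypothetical, and the only assumption about usage_report.txt is the one the command above already makes, namely that field 5 holds a numeric size.

```shell
# Show the N largest entries of a usage report, biggest first.
# Assumes whitespace-separated columns with a numeric size in field 5.
top_usage() {
    # $1 = report file, $2 = number of rows to show (default 100)
    sort -n -r -k 5,5 "$1" | head -n "${2:-100}"
}

# Example: top_usage usage_report.txt 20
```

Here "-k 5,5" restricts the sort key to field 5 alone, and "-r" reverses the order so that "head" can replace "tail".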
Quotas
Each folder has an associated hard quota and soft quota. The hard quota is a fixed limit; it is the value shown in the "Size" column of the "df" command output:
cd /data/test
df -H .
Filesystem Size Used Avail Use% Mounted on
panfs://10.129.86.180/hpc/groups/test
64T 49T 16T 77% .
The "Used" column reported by "df" does not match the billed usage, so do not rely on this value. It may include the space occupied by snapshots (we're not sure).
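Because long filesystem names make "df" wrap its output across two lines, the "Size" value can be awkward to pick out in scripts. The sketch below reads only that column, assuming a "df" that supports the "-P" (single-line POSIX format) and "-H" options, as GNU df does; the function name hard_quota is illustrative.

```shell
# Print the "Size" column (the hard quota) for the filesystem holding a folder.
# -P keeps each filesystem on one line even when its name is long.
hard_quota() {
    df -P -H "${1:-.}" | awk 'NR == 2 { print $2 }'
}

# Run from inside the folder of interest, or pass its path as the argument.
hard_quota .
```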
Email Scientific Computing to request a quota expansion if you anticipate exceeding the current quota. There is also a soft quota, set at a lower value, which alerts ERIS HPC staff when a folder is close to its hard quota. You can request to be included on this email alert.
Extended Attributes
Extended attributes can be used to set the redundancy level, file locking settings, and extended permissions for a file or folder.