November 11, 2024
ERISXdl Containers
Docker containers and images on the ERISXdl login nodes can be conveniently managed with a tool called podman. The podman program provides access to a significant portion of the Docker container API without requiring root privileges to run its commands. Researchers can pull and use containers built and distributed by their colleagues or by Docker registries, such as Docker Hub (docker.io), for the purpose of running GPU-based analysis on the ERISXdl GPU nodes.
The GPU nodes have no direct connection to the internet, so they cannot run code that requires internet access. Researchers will need to prepare/update their containers and code before jobs are submitted to the GPU nodes for analysis. Computational jobs should not be run on the login nodes and should instead be submitted through the SLURM scheduler. For more information on SLURM and using containers in submitted jobs, see the Using SLURM Scheduler article.
Images to be run on the compute nodes need to be pushed to the Harbor registry, which is hosted at erisxdl.partners.org. Each research group is provided with its own project space in the Harbor registry service, whose name corresponds to the lowercase form of the group's briefcase PAS group name. Group members can log in to the website erisxdl.partners.org with their Partners user ID and password to examine their group's space. By default, each project is initially allocated 50GB of storage; this can be expanded by submitting a request to @email
For all ERISXdl/SLURM jobs, both the user's home directory and the group briefcase are mounted by default into the runtime Docker container, providing convenient access to research data. This is achieved by means of wrapper scripts, which are described in "Using SLURM Job Scheduler".
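To give a rough sense of how a container fits into a batch job, the sketch below selects a Harbor image and hands it to a wrapper script. The variable names, wrapper-script path and resource requests shown here are placeholders rather than the actual ERISXdl interface; consult the Using SLURM Job Scheduler article for the exact names used on the cluster.
#!/bin/bash
#SBATCH --job-name=container-demo
#SBATCH --gpus=1
#SBATCH --time=02:00:00
#SBATCH --output=%x-%j.out
# Placeholder variables; the real variable and wrapper-script names are documented
# in the "Using SLURM Job Scheduler" article.
export IMAGE=erisxdl.partners.org/<PAS Group Name in lowercase>/cuda:latest
export SCRIPT=/PHShome/abc123/run_analysis.sh      # command to run inside the container
/path/to/wrapper-script.sh    # mounts your home directory and group briefcase, then runs $SCRIPT in $IMAGE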
Containers Provided by ERIS HPC Team
Sample containers are provided in the Harbor Registry at the following locations:
- erisxdl.partners.org/library : containers providing GUI interactive sessions e.g. for JupyterHub and Cryosparc
- erisxdl.partners.org/nvidia : several curated NVIDIA NVCR containers such as TensorFlow and CUDA
In the future we hope to expand the range of containers offering GPU-powered GUI interactive sessions and are open to suggestions in this regard. In terms of implementation, each type of session (JupyterHub, Cryosparc, etc.) has a corresponding wrapper script that is invoked in a SLURM batch job. The wrapper script then generates a custom URL that is written to the job log file and remains 'live' for the duration of the SLURM job. The link should be accessible from most modern web browsers. More details can be found in the test cases examined in the ERISXdl/SLURM section.
Regarding the containers stored at erisxdl.partners.org/nvidia, these are generally old and intended for demonstration purposes only. Instead, users are advised to obtain the latest CUDA runtime images from NVIDIA's container registry, for which the ERIS HPC team has purchased a subscription. The images in this registry are optimized for use on ERISXdl's DGX compute nodes. These images are proprietary, however, and are not authorized to be distributed outside of the MGB network. Any attempt to do so could result in the termination of services for all the MGB research groups who rely upon these images for their work.
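For instance, a recent CUDA runtime image can be pulled from NVIDIA's registry at nvcr.io and re-tagged into your group's Harbor project. The tag below is only an illustration, so check the NVIDIA catalog for the version you need; if a login is required, the NGC username is $oauthtoken and the password is your NGC API key.
$ podman login nvcr.io
$ podman pull nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu22.04
$ podman tag nvcr.io/nvidia/cuda:12.4.1-runtime-ubuntu22.04 erisxdl.partners.org/<PAS Group Name in lowercase>/cuda:12.4.1-runtime
$ podman push erisxdl.partners.org/<PAS Group Name in lowercase>/cuda:12.4.1-runtime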
How to Manage Containers: Examples
Example 1: Pulling container images from outside registries
Users can pull container images from registries outside of ERISXdl/Harbor, such as DockerHub. Logging in may be necessary and may require an account for that registry. Once logged in, you will be able to pull a container image from the registry, tag the image as your own copy, and push that copy to your Harbor project. To view all the images currently available in your local storage, run the podman images command. Note that this does not list the container images you may have in your Harbor project.
For example, the steps below show how an alpine Linux container would be pulled from DockerHub and stored in the hypothetical 'abc123' username's Harbor project. Please bear in mind that since the end of the pilot phase there are no individual Harbor accounts, only group accounts of the form <PAS Group Name in lowercase>.
- Login to the registry/registries you are pulling from and pushing to
Note: your login credentials for the ERISXdl Harbor registry should be the same as your cluster credentials.
$ podman login docker.io
Username: abc123
Password: *************
Login Succeeded!
$ podman login erisxdl.partners.org
Username: abc123
Password: *************
Login Succeeded!
- Search for a container
$ podman search docker.io/alpine
INDEX NAME DESCRIPTION STARS OFFICIAL AUTOMATED
docker.io docker.io/library/alpine A minimal Docker image based on Alpine Linux... 7670 [OK]
- Pull the container image
$ podman pull docker.io/library/alpine
Trying to pull docker.io/library/alpine...
Getting image source signatures
Copying blob 5843afab3874 done
Copying config d4ff818577 done
Writing manifest to image destination
Storing signatures
d4ff818577bc193b309b355b02ebc9220427090057b54a59e73b79bdfe139b83
$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/library/alpine latest d4ff818577bc 4 weeks ago 5.87 MB
- Tag the container image*
For the alpine example, we are tagging the alpine image with demo-copy:
$ podman tag d4ff818577bc erisxdl.partners.org/abc123/alpine:demo-copy
and if we wish to tag with the PAS Group account we would use
$ podman tag d4ff818577bc erisxdl.partners.org/<PAS Group Name in lowercase>/alpine:demo-copy
for example
$ podman tag d4ff818577bc erisxdl.partners.org/phs-erisadm-g/alpine:demo-copy
Running podman images now shows both the original image and the tagged copy:
$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/library/alpine latest d4ff818577bc 4 weeks ago 5.87 MB
erisxdl.partners.org/abc123/alpine demo-copy d4ff818577bc 4 weeks ago 5.87 MB
*Ideally, tag your image to reflect its current version.
- Push the container image
$ podman push erisxdl.partners.org/abc123/alpine:demo-copy
or, for the PAS Group account
$ podman push erisxdl.partners.org/<PAS Group Name in lowercase>/alpine:demo-copy
Once it is successfully pushed to your Harbor project, you can pull your copy into your podman runtime at any time, as well as access it in scripts submitted to the job scheduler.
Optional (and at user's own risk): to confirm that it was pushed successfully, remove the locally stored image (this will not affect your Harbor project) and pull it again.
$ podman rmi -f d4ff818577bc
$ podman pull erisxdl.partners.org/abc123/alpine:demo-copy
Example 2: Pulling provided containers from Harbor
Once in full production, ERISXdl users will be able to choose from several curated, pre-built containers provided through Harbor. In the following example, the hypothetical ‘abc123’ username pulls the public CUDA image and stores a copy of it in their Harbor project. (Please note, this CUDA image is very old and intended for demo purposes only; newer versions of CUDA are available from the NVIDIA catalog.)
- Login to Harbor
Note: your login credentials for the ERISXdl Harbor registry should be the same as your cluster credentials.
$ podman login erisxdl.partners.org
Username: abc123
Password: *************
Login Succeeded!
- Pull the container image from Harbor
Note: depending on the size of the container, this step may take several minutes
$ podman pull erisxdl.partners.org/nvidia/cuda
$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
erisxdl.partners.org/nvidia/cuda latest 979cd1f9e2c8 2 weeks ago 4.24 GB
- Tag the container
$ podman tag 979cd1f9e2c8 erisxdl.partners.org/abc123/cuda:latest
and if we wish to tag with the PAS Group account we would use
$ podman tag 979cd1f9e2c8 erisxdl.partners.org/<PAS Group Name in lowercase>/cuda:latest
$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
erisxdl.partners.org/nvidia/cuda latest 979cd1f9e2c8 2 weeks ago 4.24 GB
erisxdl.partners.org/abc123/cuda latest 979cd1f9e2c8 2 weeks ago 4.24 GB
- Push the container
$ podman push erisxdl.partners.org/abc123/cuda:latest
or, for the PAS Group account
$ podman push erisxdl.partners.org/<PAS Group Name in lowercase>/cuda:latest
Example 3: Running and customizing containers
One of the key features of using containers is that the user who runs the container has root permissions inside the running image. This means that users can run package managers and make system changes freely within their container. To save changes made to a container, you will need to run the container image, make your modifications, and then commit those changes with podman before pushing the latest version to your Harbor project.
Note: some containers have extra security layers that prevent users from making certain changes even with root permissions. This may prevent users from using package managers and installing applications within the container.
In the following example, the hypothetical ‘abc123’ username (now superseded by <PAS Group Name in lowercase>) runs and updates their copy of the CUDA image and then stores this updated image in their Harbor project.
- Pull the container from Harbor
$ podman pull erisxdl.partners.org/abc123/cuda:latest
$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
erisxdl.partners.org/abc123/cuda latest 979cd1f9e2c8 2 weeks ago 4.24 GB
- Run the container and make any changes in the container, like installing additional packages *
$ podman run -it 979cd1f9e2c8 /bin/bash
# Or, if user abc123 wishes to mount both their home directory and group briefcase into the container
$ podman run -it -v /PHShome/abc123:/home/abc123 -v /data/briefcase:/data 979cd1f9e2c8 /bin/bash
## NOTE : once you run the container, you will have root privileges within the container's filesystem
## In this example, we refresh the package index and install the OpenGL runtime library (libgl1) with the package manager
root@54116e44f656:/# apt-get update
root@54116e44f656:/# apt-get install -y libgl1
root@54116e44f656:/# exit
* Container images can be run interactively as containers by using the podman run command. Users cannot run computational jobs on the ERISXdl login nodes, and should only run containers on login nodes when making modifications.
- Commit the changes made as a new container image
$ podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
58b3f6a7ede2 erisxdl.partners.org/abc123/cuda:latest /bin/bash About a minute ago Exited (130) 10 seconds ago
$ podman commit 58b3f6a7ede2 erisxdl.partners.org/abc123/cuda:with-opengl
- Push the modified container image to Harbor
$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
erisxdl.partners.org/abc123/cuda with-opengl a7932ec48e13 37 seconds ago 4.27 GB
erisxdl.partners.org/abc123/cuda latest 979cd1f9e2c8 2 weeks ago 4.24 GB
$ podman push erisxdl.partners.org/abc123/cuda:with-opengl
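As an alternative to the interactive run-and-commit workflow above, the same customization can be captured in a Containerfile and rebuilt reproducibly with podman build. The sketch below assumes the abc123 CUDA copy as the base image and installs the OpenGL runtime library; the base image and package name are illustrations, so adjust them to your own project.
# Containerfile
FROM erisxdl.partners.org/abc123/cuda:latest
RUN apt-get update && apt-get install -y --no-install-recommends libgl1 && rm -rf /var/lib/apt/lists/*
Then build and push the result:
$ podman build -t erisxdl.partners.org/abc123/cuda:with-opengl .
$ podman push erisxdl.partners.org/abc123/cuda:with-opengl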
Podman Settings
On ERISXdl there are three login nodes, erisxdl1, erisxdl2 and erisxdl3, each of which holds its own collection of locally stored images under
/erisxdl[1-3]/local/storage
To ensure that you have access to your images on a given node, locate the following file in your home directory
~/.config/containers/storage.conf
and make the following change using your favorite text editor (a sample storage.conf is sketched at the end of this section):
graphroot = "/erisxdl/local/storage/abc123"
where abc123 corresponds to your user ID. With this setting podman will operate normally on all three login nodes, even in the case of node failure; note, however, that the images available on each node will differ. The images stored in Harbor at
erisxdl.partners.org/<PAS Group Name in lowercase>
will of course always be available. If there is trouble, please submit a request to hpcsupport to run
sudo ./podman_reset.sh <userID> /erisxdl/local/storage
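For reference, a complete storage.conf usually contains a [storage] section along the following lines. This is only a sketch: the driver and runroot values are assumptions, so keep whatever your existing file already contains and change only the graphroot line.
[storage]
driver = "overlay"
runroot = "/run/user/<uid>"                      # assumed value; leave your existing entry unchanged
graphroot = "/erisxdl/local/storage/abc123"      # abc123 = your user ID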
Finally, please note that on each of the login nodes erisxdl[1-3] a user will have a quota of 50GB (as of 2024/01) for the storage of local images. The current consumption of storage under
/erisxdl[1-3]/local/storage/<userID>
can be displayed with the following terminal command:
quota -vs