October 26, 2021
TensorFlow with GPU support relies on the CUDA toolkit to accelerate machine learning. Each CUDA toolkit release requires a minimum NVIDIA driver version. However, the ml00X nodes only run driver version 396.37, which means that the latest version of tensorflow-gpu cannot run there. The best way to install tensorflow-gpu is therefore in an Anaconda virtual environment. To use other packages that offer GPU acceleration via CUDA, follow the same process described here, but install the corresponding package instead of tensorflow-gpu.
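The driver constraint can be illustrated with a short script. This is a minimal sketch, not part of the cluster setup. The table holds the minimum Linux driver versions for a few CUDA toolkit releases as given in NVIDIA's CUDA release notes, and should be checked against the current notes before you rely on it. The names MIN_DRIVER, parse_version, and supported_toolkits are hypothetical.

```python
# Minimum Linux driver version per CUDA toolkit release (subset only;
# values taken from the NVIDIA CUDA release notes, verify before use).
MIN_DRIVER = {
    "9.0": (384, 81),
    "9.2": (396, 26),
    "10.0": (410, 48),
}


def parse_version(text):
    """Turn a dotted version string such as '396.37' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def supported_toolkits(driver_version):
    """Return the toolkit releases from MIN_DRIVER that the driver can run."""
    driver = parse_version(driver_version)
    usable = [tk for tk, minimum in MIN_DRIVER.items() if driver >= minimum]
    return sorted(usable, key=parse_version)


# Driver version reported for the ml00X nodes in this article.
print(supported_toolkits("396.37"))  # ['9.0', '9.2']
```

With driver 396.37 the 10.0 toolkit is out of reach, which is consistent with installing cudatoolkit=9.0 later in this article.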
Set up a virtual environment
See KB0036593: Create a virtual environment to specify Python and R
Log in to ERISTwo
(local)$ ssh erisone
Then request an interactive session in the GPU queue:
$ bsub -q gpu -Is /bin/bash
If you already use virtual environments and Anaconda, you can skip the module load step and the addition to your ~/.bashrc.
Then load the Anaconda3 module:
(ml002)$ module load Anaconda3
Then create a tf-gpu environment:
(ml002)$ conda create --name tf-gpu python=3.6
Note that you can specify a different Python version here when required.
Activate the virtual environment:
(ml002)$ conda activate tf-gpu
You may see an error like:
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
If your shell is Bash or a Bourne variant, enable conda for the current user with
$ echo ". /apps/software-2.12/Anaconda3/5.2.0/etc/profile.d/conda.sh" >> ~/.bashrc
If you do not see that error and you have used Anaconda before, you can skip ahead to installing CUDA and tensorflow-gpu. Otherwise, copy the line beginning with echo from your own error message and run it. The line below is only an example, and the path in your message may differ:
$ echo ". /apps/software-2.12/Anaconda3/5.2.0/etc/profile.d/conda.sh" >> ~/.bashrc
Then source your ~/.bashrc and activate the environment again:
$ source ~/.bashrc
(ml002)$ conda activate tf-gpu
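Before installing packages, it can help to confirm that the tf-gpu environment is really the active one. The sketch below relies on the CONDA_DEFAULT_ENV environment variable, which conda sets when an environment is activated. The function name active_conda_env is hypothetical.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unset."""
    environ = os.environ if environ is None else environ
    # conda exports CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV") or None


env = active_conda_env()
if env == "tf-gpu":
    print("tf-gpu is active")
else:
    print("tf-gpu is not active (current environment: %s)" % env)
```

If the script reports that tf-gpu is not active, run the conda activate command again before continuing.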
Install cuda and tensorflow-gpu
To install the CUDA toolkit, use:
(tf-gpu)$ conda install -c anaconda cudatoolkit=9.0
This will take a while. Answer yes to the confirmation prompts. Then install tensorflow-gpu:
(tf-gpu)$ conda install tensorflow-gpu=1.10.0
Test
To test the installation, you can run a small Python script. To create the file, you can use the following echo commands:
(tf-gpu)$ echo "from tensorflow.python.client import device_lib" > test_tf_gpu.py
(tf-gpu)$ echo "print(device_lib.list_local_devices())" >> test_tf_gpu.py
Alternatively, open an editor and create a file test_tf_gpu.py containing the code above. Then run:
(tf-gpu)$ python test_tf_gpu.py
When everything works correctly, you will see a list with information about the GPUs on the system. Note that this is only a sanity check, not a full test.
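The device list also includes the CPU, so a slightly stricter variant of the check can filter for GPU entries only. This is a minimal sketch that assumes each entry returned by device_lib.list_local_devices() has name and device_type fields. The helper names gpu_device_names and main are hypothetical. This is still a sanity check rather than a full test.

```python
def gpu_device_names(devices):
    """Return the names of all entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def main():
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        print("tensorflow is not importable in this environment")
        return
    gpus = gpu_device_names(device_lib.list_local_devices())
    if gpus:
        print("GPUs visible to TensorFlow: " + ", ".join(gpus))
    else:
        print("No GPU visible to TensorFlow")


if __name__ == "__main__":
    main()
```

If no GPU is reported, check that you are in an interactive session on a GPU node and that the tf-gpu environment is active.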