TensorFlow GPU on ERISTwo

TensorFlow with GPU support relies on the CUDA toolkit to accelerate machine learning. Each CUDA toolkit release requires a minimum driver version. The ml00X nodes only run CUDA driver 396.37, which means we cannot run the latest version of tensorflow-gpu. The best way to install tensorflow-gpu is therefore in an Anaconda virtual environment. Other packages that offer GPU acceleration via CUDA can be installed by following the same process described here, installing the corresponding package instead of tensorflow-gpu.
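
As a rough illustration of the driver constraint, the sketch below lists which CUDA toolkit releases a given driver version can support. The minimum driver versions are the Linux values from NVIDIA's CUDA release notes and should be checked against the current table. The names MIN_DRIVER, parse_version and supported_toolkits are hypothetical helpers, not part of any library.

```python
# Minimum Linux driver version per CUDA toolkit release (from NVIDIA's
# CUDA release notes; verify against the current table before relying on it).
MIN_DRIVER = [
    ("9.0", (384, 81)),
    ("9.1", (390, 46)),
    ("9.2", (396, 26)),
    ("10.0", (410, 48)),
]


def parse_version(text):
    """Turn a driver string such as '396.37' into a comparable tuple (396, 37)."""
    return tuple(int(part) for part in text.split("."))


def supported_toolkits(driver_version):
    """Return the CUDA toolkit releases whose minimum driver is satisfied."""
    driver = parse_version(driver_version)
    return [cuda for cuda, minimum in MIN_DRIVER if driver >= minimum]


# Driver version reported for the ml00X nodes in this article.
print(supported_toolkits("396.37"))  # prints ['9.0', '9.1', '9.2']
```

With driver 396.37, CUDA 10.0 is out of reach, so any tensorflow-gpu build that needs CUDA 10.0 cannot be used on these nodes.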

Setup a virtual environment

See KB0036593: Create a virtual environment to specify Python and R

Log in to ERISTwo

(local)$ ssh erisone

Then request an interactive session in the GPU queue:

$ bsub -q gpu -Is /bin/bash

If you already use virtual environments and Anaconda, you can skip the module load and the addition to your ~/.bashrc.

Then load the Anaconda3 module:

(ml002)$ module load Anaconda3

Then create a tf-gpu environment:

(ml002)$ conda create --name tf-gpu python=3.6

Note that you can specify a different Python version here if required.

Activate the virtual environment:

(ml002)$ conda activate tf-gpu

You may see an error like:

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.

If your shell is Bash or a Bourne variant, enable conda for the current user with

$ echo ". /apps/software-2.12/Anaconda3/5.2.0/etc/profile.d/conda.sh" >> ~/.bashrc

If you don't see that error and you have used Anaconda before, you can skip ahead to "Install cuda and tensorflow-gpu". Otherwise, copy the line starting with echo from your own error message and run it (the line below is only an example):

$ echo ". /apps/software-2.12/Anaconda3/5.2.0/etc/profile.d/conda.sh" >> ~/.bashrc

Then source your ~/.bashrc:

$ source ~/.bashrc
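
Running the echo command more than once appends a duplicate line to ~/.bashrc each time. As a minimal sketch of an idempotent alternative, the helper below appends a line to a file only if that exact line is not already there. The function name ensure_line is hypothetical, and the path in the usage comment is the example path from above, which may differ on your system.

```python
import os


def ensure_line(path, line):
    """Append `line` to the file at `path` unless an identical line exists.

    Returns True if the line was appended, False if it was already present.
    """
    text = ""
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
    if line in text.splitlines():
        return False
    with open(path, "a") as handle:
        # Start on a fresh line if the file does not end with a newline.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True


# Example usage (not executed here; substitute the path from your own error message):
# ensure_line(os.path.expanduser("~/.bashrc"),
#             ". /apps/software-2.12/Anaconda3/5.2.0/etc/profile.d/conda.sh")
```

You still need to source ~/.bashrc (or open a new shell) afterwards for the change to take effect.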

Install cuda and tensorflow-gpu

To install the CUDA toolkit, use:

(tf-gpu)$ conda install -c anaconda cudatoolkit=9.0

This will take a while. Answer yes to the prompts. Then install tensorflow-gpu:

(tf-gpu)$ conda install tensorflow-gpu=1.10.0



To test the installation, you can run a small Python script. To create the file, you can use the following echo commands:

(tf-gpu)$ echo "from tensorflow.python.client import device_lib" > test_tf_gpu.py

(tf-gpu)$ echo "print(device_lib.list_local_devices())" >> test_tf_gpu.py

(Alternatively, open an editor and create a file test_tf_gpu.py containing the two lines of code above.) Then run:

(tf-gpu)$ python test_tf_gpu.py

When the installation works correctly, you will see a list with information about the GPUs on the system. Note that this is only a sanity check and not a full test.
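
The device list also includes the CPU, so a slightly stricter check is to keep only the entries whose device_type is "GPU". The sketch below is one way to do that. It assumes the entries returned by device_lib.list_local_devices() expose name and device_type attributes, as they do in TensorFlow 1.x; gpu_device_names and main are hypothetical helper names.

```python
def gpu_device_names(devices):
    """Return the names of all devices whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def main():
    # Import inside the function so a missing install gives a readable message.
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not importable; is the tf-gpu environment active?")
        return
    names = gpu_device_names(device_lib.list_local_devices())
    if names:
        print("GPU devices visible to TensorFlow: " + ", ".join(names))
    else:
        print("TensorFlow does not see any GPU devices")


if __name__ == "__main__":
    main()
```

This is still only a sanity check: it confirms that TensorFlow can see a GPU, not that a training job will run correctly on it.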
