How to Run Python on a GPU Accelerated Computing Linux VDI

Introduction

Python is one of the most popular programming languages for science, engineering, data analytics, and deep learning applications. However, as an interpreted language, it has long been considered too slow for high-performance computing. A GPU Accelerated Computing Linux VDI has just been added to the Data Enclave environment, which may address this need for high-performance computing.

GPUs have many more cores than CPUs, so for parallel computation over large data sets they can substantially outperform CPUs, even though GPUs have lower clock speeds and lack several of the core-management features found in CPUs.

Running a Python script on a GPU can therefore be considerably faster than running it on a CPU. NVIDIA’s CUDA Python provides a driver and runtime API for existing toolkits and libraries to simplify GPU-accelerated processing. This document provides the know-how and Python examples for using the NVIDIA CUDA Toolkit and cuDNN libraries on a Linux VDI instance.

Requirements or Prerequisites

To get started, the end user needs:

  1. Access to a Linux VDI instance running Ubuntu 20.04 with the following software installed:
    • NVIDIA vGPU Driver Version: 460.106.00
    • CUDA Version: 11.2
    • cuDNN: 8.1

To run CUDA Python, you’ll need the CUDA Toolkit installed on a system with CUDA-capable GPUs. You can confirm what is available on your instance as shown after this list.

  2. Numba installed: a Python compiler from Anaconda that can compile Python code for execution on CUDA®-capable GPUs (the installation steps below cover this).
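
To confirm the GPU, driver, and CUDA versions on your VDI instance before you begin, you can run the standard NVIDIA utility in a terminal; the versions it reports should match those listed above:

$ nvidia-smi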

Step-by-Step Instructions

The following steps install the required software and walk through running Python on the GPU.

  • Installation

You will need to install Anaconda if it is not already installed. You can add Anaconda to your PATH during installation.

Here is a Conda setup repo that Rod created: https://gitlab.partners.org/rd482/enclave-conda

The instructions are the same as in Confluence; copy and paste the command below and it should install everything for you.

$ curl -sfL https://gitlab.partners.org/rd482/enclave-conda/-/raw/main/conda-setup.sh | sh -s

This will install Anaconda3 in your $HOME folder.

Then update your ~/.bashrc file

$ echo "export PATH=${PATH}:${HOME}/anaconda3/bin" >> ~/.bashrc && source ~/.bashrc

  • Run the bash installer script to install Anaconda3 (alternative method)

If you downloaded the Anaconda installer script manually rather than using the setup script above, run the .sh installer with bash. Ensure you are in the directory where the installer script was downloaded:

$ ls

Anaconda3-2022.05-Linux-x86_64.sh

Run the installer script with bash.

$ bash Anaconda3-2022.05-Linux-x86_64.sh

Accept the License Agreement and allow Anaconda to be added to your PATH. With Anaconda on your PATH, the Anaconda distribution of Python will be used whenever you type python in a terminal.
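
To confirm that the Anaconda interpreter is the one being picked up, check which python resolves first on your PATH (the path below assumes the default installation location in your $HOME folder):

$ which python

/home/<username>/anaconda3/bin/python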

  • Create a Project Environment

Next, create a new environment and activate it.

$ conda create -n GPU_env -y

$ conda activate GPU_env

  • Install Python 3.8

$ conda install python=3.8
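
With the environment activated and Python installed, you can verify the interpreter version; it should report a 3.8.x release:

$ python --version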

  • Install numba cudatoolkit pyculib

To get started with Numba, the first step is to download and install the Anaconda Python distribution that includes many popular packages (Numpy, SciPy, Matplotlib, iPython, etc.) and “conda,” a powerful package manager.

Once you have Anaconda installed, install the required CUDA packages by typing

$ conda install numba cudatoolkit pyculib

If the installation fails (pyculib in particular may not be available from the default channel), you may need to install from an alternative channel or install the packages individually, for example:

$ conda install numba

Check the packages installed.

$ conda list


# packages in environment GPU_env:

#
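
Once the packages are installed, you can quickly confirm that Numba can see the vGPU. Numba’s built-in device detection prints the CUDA devices it finds and reports whether a usable GPU is present:

$ python -c "from numba import cuda; cuda.detect()"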

  • Develop the Python scripts

We will use the numba.jit decorator on the function we want to run on the GPU. The decorator takes several parameters, but here we only use the target_backend parameter.

target_backend tells the JIT compiler which backend to compile the code for; passing 'cuda' corresponds to the GPU.

If the CPU backend is used instead, the JIT compiler still optimizes the decorated function to run faster on the CPU, so it typically outperforms plain interpreted Python either way.

Create a Python script called test_GPU.py with the following code:

 

from numba import jit, cuda
import numpy as np

# to measure exec time
from timeit import default_timer as timer

# normal function to run on cpu
def func(a):
    for i in range(10000000):
        a[i] += 1

# function optimized to run on gpu
@jit(target_backend='cuda')
def func2(a):
    for i in range(10000000):
        a[i] += 1

if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)

    start = timer()
    func(a)
    print("without GPU:", timer() - start)

    start = timer()
    func2(a)
    print("with GPU:", timer() - start)

  • Run the Python script

    $ python3 test_GPU.py

    Output:

    without GPU: 3.448683829017682

    with GPU: 0.16991339999367483

  • Create the test_CUDA_example.py script as below

from __future__ import division
from numba import cuda
import numpy
import math

# CUDA kernel
@cuda.jit
def my_kernel(io_array):
    pos = cuda.grid(1)
    if pos < io_array.size:
        io_array[pos] *= 2  # do the computation

# Host code
data = numpy.ones(256)
threadsperblock = 256
blockspergrid = math.ceil(data.shape[0] / threadsperblock)
my_kernel[blockspergrid, threadsperblock](data)
print(data)

  • Run the test CUDA example script

$ python3 test_CUDA_example.py

Output:

$ python3 test_CUDA_example.py

[2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.

2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.

2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.

2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.

2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.

2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.

2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.

2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.

2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.

2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.

2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
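
In the example above, Numba copies the NumPy array to the GPU before the kernel launches and copies the result back afterwards. When a kernel runs repeatedly over the same data, those implicit transfers can dominate the run time (see also the Limitations section below). A minimal sketch of managing the transfers explicitly with Numba's cuda.to_device and copy_to_host:

from numba import cuda
import numpy
import math

# same kernel as in test_CUDA_example.py
@cuda.jit
def my_kernel(io_array):
    pos = cuda.grid(1)
    if pos < io_array.size:
        io_array[pos] *= 2

data = numpy.ones(256)
threadsperblock = 256
blockspergrid = math.ceil(data.shape[0] / threadsperblock)

# copy the array to the device once
d_data = cuda.to_device(data)

# launch the kernel several times against the device-resident copy
for _ in range(10):
    my_kernel[blockspergrid, threadsperblock](d_data)

# copy the result back to the host only when it is needed
result = d_data.copy_to_host()
print(result)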

Programming Numba for CUDA

GitHub - ContinuumIO/gtc2017-numba: Numba tutorial for GTC 2017 conference

https://github.com/ContinuumIO/gtc2017-numba.git

Deep Learning

Deep Learning | NVIDIA Developer

Deep learning differs from traditional machine learning techniques in that deep neural networks can automatically learn representations from data such as images, video, or text without hand-coded rules or human domain knowledge. Their highly flexible architectures can learn directly from raw data and can increase their predictive accuracy when provided with more data.

Deep learning is commonly used in computer vision, conversational AI, and recommendation systems.

For developers integrating deep neural networks into cloud-based or embedded applications, the NVIDIA Deep Learning SDK includes high-performance libraries that implement building-block APIs for adding training and inference directly to their apps. With a single programming model for all GPU platforms, from desktop to data center to embedded devices, developers can start development on the desktop, scale up in the cloud, and deploy to edge devices with minimal or no code changes.

NVIDIA provides optimized software stacks to accelerate training and inference phases of the deep learning workflow. Learn more on the links below. 

Deep Learning SDKs

PyTorch

https://developer.nvidia.com/deep-learning-frameworks#:~:text=PyTorch%20is%20a,Automatic%20mixed%20precision

PyTorch is a Python package that provides two high-level features:

  • Tensor computation (like numpy) with strong GPU acceleration.
  • Deep Neural Networks (DNNs) built on a tape-based autograd system.

Reuse your favorite Python packages, such as numpy, scipy and Cython, to extend PyTorch when needed.
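
PyTorch is not installed by the steps above. Assuming you have installed it into the GPU_env environment (for example with conda or pip), a minimal sketch that checks CUDA availability and runs a tensor computation on the GPU might look like this:

import torch

# confirm that PyTorch can see the CUDA device
print("CUDA available:", torch.cuda.is_available())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# create two tensors on the selected device and multiply them
a = torch.ones(1000, 1000, device=device)
b = torch.ones(1000, 1000, device=device)
c = a @ b
print("Result computed on:", c.device)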

Related links: PyTorch on NGC | Sample models | Automatic mixed precision

TensorFlow

TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code. For visualizing TensorFlow results, TensorFlow offers TensorBoard, a suite of visualization tools.
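
As with PyTorch, TensorFlow is not installed by the steps above. Assuming a GPU-enabled TensorFlow build is installed in the environment, a minimal sketch that lists the visible GPUs and pins a small computation to one of them:

import tensorflow as tf

# list the GPUs TensorFlow can see on this VDI instance
gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)

# run a small matrix multiplication on the first GPU if one is present
device = "/GPU:0" if gpus else "/CPU:0"
with tf.device(device):
    a = tf.ones((1000, 1000))
    b = tf.ones((1000, 1000))
    c = tf.matmul(a, b)
print("Result computed on:", c.device)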

Related links: TensorFlow on NGC | TensorFlow on GitHub | Sample models | Automatic mixed precision | TensorFlow for JetPack

Model Deployment

For high-performance inference deployment of TensorFlow-trained models, you can:

  1. Use the TensorFlow-TensorRT (TF-TRT) integration to optimize and deploy models from within TensorFlow (see the sketch after this list).
  2. Export the TensorFlow model to ONNX and then import, optimize, and deploy it with NVIDIA TensorRT, an SDK for high-performance deep learning inference.
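
A minimal sketch of option 1, assuming a GPU-enabled TensorFlow build with TensorRT support is installed in the environment and that a SavedModel already exists; the directory names here are placeholders:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# convert an existing SavedModel using the TensorFlow 2.x TF-TRT converter
converter = trt.TrtGraphConverterV2(input_saved_model_dir="my_saved_model")
converter.convert()

# write the TensorRT-optimized SavedModel out for deployment
converter.save("my_trt_saved_model")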

Limitations

  1. Only NVIDIA GPUs are supported for now, and only the models NVIDIA lists as CUDA-capable. If your graphics card has CUDA cores, you can proceed with the setup.
  2. Running a Python script on a GPU can be considerably faster than on a CPU. Note, however, that the data set must first be transferred to the GPU’s memory, which takes additional time, so for small data sets the CPU may perform better than the GPU.

Relevant References

Go to KB0040962 in the IS Service Desk