GPU Nodes

Upgraded: more GPUs added.

The Lengau cluster at the CHPC includes 9 GPU compute nodes with a total of 30 Nvidia V100 GPU devices. There are six gpu2* nodes with 3 GPUs each and three gpu4* nodes with 4 GPUs each.

GPU Node   CPU Cores   GPU Devices            Interface
gpu2001    36          3× Nvidia V100 16GB    PCIe
gpu2002    36          3× Nvidia V100 16GB    PCIe
gpu2003    36          3× Nvidia V100 16GB    PCIe
gpu2004    36          3× Nvidia V100 16GB    PCIe
gpu2005    36          3× Nvidia V100 32GB    PCIe
gpu2006    36          3× Nvidia V100 32GB    PCIe
gpu4001    40          4× Nvidia V100 16GB    NVLink
gpu4002    40          4× Nvidia V100 16GB    NVLink
gpu4003    40          4× Nvidia V100 16GB    NVLink
Jobs that require 1, 2 or 3 GPUs can be allocated to any GPU node and will share the node with other jobs if they do not use all of that node's GPU devices. Jobs that require 4 GPUs can only be allocated to the gpu4* nodes and will not share the node.

Policies

Access

Access to the GPU nodes is by PI application only, through the CHPC Helpdesk.

Allocation

Research programme allocations are depleted in proportion to the wallclock time used and the number of GPUs (1, 2, 3, or 4) requested by the job:

gpu_allocation_used = 40 * runtime * ngpus
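
As a worked example, and assuming runtime is the job's wallclock time in hours, a job that runs for 4 hours on 2 GPUs is charged 40 × 4 × 2 = 320 allocation units. The same calculation as a small shell sketch (the variable names are illustrative only):

RUNTIME_HOURS=4   # wallclock time of the job (assumption: runtime is measured in hours)
NGPUS=2           # number of GPU devices requested
echo $(( 40 * RUNTIME_HOURS * NGPUS ))   # prints 320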

Usage

GPU applications on Lengau

Some pre-built applications have automated scripts that you can use to launch them:

GPU Job Scripts

GPU Queues

There are four queues available in PBSPro which access the GPU nodes:

Queue name   Max. CPUs   Max. GPUs   PBSPro options                   Comments
gpu_1        10          1           -q gpu_1 -l ncpus=9:ngpus=1      Access one GPU device only.
gpu_2        20          2           -q gpu_2 -l ncpus=18:ngpus=2     Access two GPU devices.
gpu_3        36          3           -q gpu_3 -l ncpus=36:ngpus=3     Access three GPU devices.
gpu_4        40          4           -q gpu_4 -l ncpus=40:ngpus=4     Access four GPU devices on NVLink nodes.
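
For example, a job that needs two GPU devices would select the gpu_2 queue and request the matching resources in its PBS header (a minimal sketch; PRJT1234 is a placeholder project code):

#PBS -q gpu_2
#PBS -l ncpus=18:ngpus=2
#PBS -P PRJT1234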

GPU Queue Limits

The maximum wall clock time on all GPU queues is 12 hours.

#PBS -l walltime=12:00:00
If your code finishes in less time, specify a shorter walltime: this gives the scheduler a better chance of starting your job sooner.
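
For example, a job expected to finish comfortably within two hours could request:

#PBS -l walltime=2:00:00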

Interactive Job on a GPU Node

A single interactive session may be requested on a GPU node with:

qsub -I -q gpu_1 -P PRJT1234

NB: Replace PRJT1234 with your project number.

The default time for an interactive session is 1 hour.
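
If a longer session is needed, a walltime can be requested explicitly (a sketch; adjust the queue, walltime and project code to suit, within the 12-hour limit):

qsub -I -q gpu_1 -P PRJT1234 -l walltime=3:00:00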

Example Job Script

#!/bin/bash
#PBS -N nameyourjob
#PBS -q gpu_1
#PBS -l ncpus=10:ngpus=1
#PBS -P PRJT1234
#PBS -l walltime=4:00:00
#PBS -o /mnt/lustre/users/USERNAME/cuda_test/test1.out
#PBS -e /mnt/lustre/users/USERNAME/cuda_test/test1.err
#PBS -m abe
#PBS -M your.email@address

# Change to the working directory on the Lustre filesystem
cd /mnt/lustre/users/USERNAME/cuda_test

echo
echo "$(date): executing CUDA job on host ${HOSTNAME}"
echo

# Run program
./hello_cuda
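
Assuming the script above is saved as gpu_job.pbs (an illustrative filename), it is submitted in the usual way with:

qsub gpu_job.pbs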

Compiling GPU Code

The Nvidia V100 GPUs are programmed using the CUDA development tools.

To build CUDA code (a library or an application) for the GPU nodes, load the appropriate CUDA module before compiling. The CUDA runtime is already installed on all GPU nodes and does not need to be loaded explicitly unless, for some reason, you require a different version.

The V100 GPUs use Volta architecture cores. CUDA applications built using CUDA Toolkit versions 2.1 through 8.0 are compatible with Volta as long as they are built to include PTX versions of their kernels. To test that PTX JIT is working for your application:

1. Download and install the latest driver from http://www.nvidia.com/drivers.
2. Set the environment variable CUDA_FORCE_PTX_JIT=1.
3. Launch your application.

When a CUDA application is started for the first time with this environment variable set, the CUDA driver JIT-compiles the PTX for each CUDA kernel that is used into native cubin code.

If your program runs correctly with this environment variable set, you have successfully verified Volta compatibility.

Note: Be sure to unset the CUDA_FORCE_PTX_JIT environment variable when you are done testing.
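
The sketch below illustrates the build-and-test workflow for a simple kernel, assuming a source file hello_cuda.cu and a CUDA 9.0 or later toolkit (which supports compute capability 7.0); load the appropriate CUDA module first (see module avail for the installed versions):

# Compile for Volta: code=sm_70 produces native V100 machine code, and
# code=compute_70 also embeds PTX so that the JIT test below is meaningful.
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_70,code=compute_70 \
     -o hello_cuda hello_cuda.cu

# Force the driver to JIT-compile the embedded PTX, then run the program.
export CUDA_FORCE_PTX_JIT=1
./hello_cuda

# Unset the variable once the test is complete.
unset CUDA_FORCE_PTX_JIT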

Further Reading
