User Tools

Site Tools


guide:gpu

GPU Nodes

Upgraded: more GPUs added.

The Lengau cluster at the CHPC includes 9 GPU compute nodes with a total of 30 Nvidia V100 GPU devices. There are 6 gpu200n nodes with 3 GPUs in each, and 3 gpu400n nodes with 4 GPUs in each.

GPU Node CPU Cores GPU Devices Interface
gpu2001 36 3× Nvidia V100 16GB PCIe
gpu2002 36 3× Nvidia V100 16GB PCIe
gpu2003 36 3× Nvidia V100 16GB PCIe
gpu2004 36 3× Nvidia V100 16GB PCIe
gpu2005 36 3× Nvidia V100 32GB PCIe
gpu2006 36 3× Nvidia V100 32GB PCIe
gpu4001 40 4× Nvidia V100 16GB NVlink
gpu4002 40 4× Nvidia V100 16GB NVlink
gpu4003 40 4× Nvidia V100 16GB NVlink
Jobs that require 1, 2 or 3 GPUs can be allocated to any node, and will share the node if the job does not use all the GPU devices on that node. Jobs that require 4 GPUs can only be allocated to gpu4* nodes and will not be shared, obviously.

Policies

Access

Principle Investigators apply for GPU access for their Research Programme members through the CHPC Helpdesk. RP members may not apply directly. E-mailed applications will not be considered.

Allocation

Research programme allocations will be depleted by a factor of the wallclock time and the number of GPUs (1, 2, or 4) requested by the job.

gpu_allocation_used = 40 * runtime * ngpus

Usage

GPU applications on Lengau

Some pre-built applications have automated scripts that you can use to launch them:

GPU Job Scripts

GPU Queues

There are four queues available in PBSPro which access the GPU nodes:

Queue name Max. CPUs Max. GPUs PBSPro options Comments
gpu_1 9 1 -q gpu_1
-l select=1:ncpus=9:ngpus=1
Access one GPU device only per job.
gpu_2 18 2 -q gpu_2
-l select=1:ncpus=18:ngpus=2
Access two GPU devices per job.
gpu_3 36 3 -q gpu_3
-l select=1:ncpus=30:ngpus=3
Access three GPU devices per job.
gpu_4 40 4 -q gpu_4
-l select=1:ncpus=40:ngpus=4
Access four GPU devices on NVLink nodes.

Note the ncpus parameters above is the maximum that should be set to match the number of GPU devices you need.

GPU Queue Limits

The maximum wall clock time on all GPU queues is 12 hours.

#PBS -l walltime=12:00:00
It is better to specify a shorter walltime if your code executes in less time: this allows the scheduler a better chance of running your job sooner.

Interactive Job on a GPU Node

A single interactive session may be request on a GPU node by

qsub -I -q gpu_1 -P PRJT1234 -l select=1:ncpus=9:ngpus=1

NB: Replace PRJT1234 with your project number.

The default time for an interactive session is 1 hour.

Example Job Script

#!/bin/bash
#PBS -N nameyourjob
#PBS -q gpu_1
#PBS -l select=1:ncpus=4:ngpus=1
#PBS -P PRJT1234
#PBS -l walltime=4:00:00
#PBS -m abe
#PBS -M your.email@address
 
cd /mnt/lustre/users/USERNAME/cuda_test
 
echo
echo `date`: executing CUDA job on host ${HOSTNAME}
echo
echo Available GPU devices: $CUDA_VISIBLE_DEVICES
echo
 
# Run program
./hello_cuda

As usual, replace PRJT1234 with your group's project name, your.email@address with your email address, and USERNAME with your cluster user name.

Compiling GPU Code

The Nvidia V100 GPUs are programmed using the CUDA development tools.

To build a CUDA code (library or application) for the GPU nodes requires loading the appropriate CUDA module before compiling. The current CUDA modules are:

chpc/cuda/11.2/PCIe/11.2
chpc/cuda/11.2/SXM2/11.2
chpc/cuda/11.5.1/PCIe/11.5.1
chpc/cuda/11.6/PCIe/11.6
chpc/cuda/11.6/SXM2/11.6
chpc/cuda/12.0/12.0

with version 12.0 the most recent version.

Note that the 11.x version modules are available in two types:

  • the PCIe version is for PCIe bus nodes: gpu200x
  • the SXM2 version is for the SXM2 bus nodes: gpu400x

Further Reading

/app/dokuwiki/data/pages/guide/gpu.txt · Last modified: 2024/10/15 14:00 by ccrosby