UltraFluidX is a Lattice Boltzmann solver that is part of Altair's suite of CFD codes. The CHPC hosts a license on behalf of the Mechanical Engineering Department at Nelson Mandela University. This license is access-controlled and available to users from this department only. If you are interested in using any of the Altair fluid dynamics solvers, please contact Ernst Burger at Altair.
Altair's pre-processor Virtual Wind Tunnel (VWT) is not available on Linux. However, consider running VWT on your Windows workstation and accessing your storage on the cluster by mounting it on your workstation with sshfs.
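As a rough sketch on a Linux workstation (the login node address lengau.chpc.ac.za, the username jblogs and the local mount point are examples only; on Windows an SSHFS client such as SSHFS-Win provides the equivalent):
mkdir -p ~/lengau
sshfs jblogs@lengau.chpc.ac.za:/mnt/lustre/users/jblogs ~/lengau
Unmount with fusermount -u ~/lengau when you are done.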
Graphical post-processing can be done by means of Altair's tools, or with ParaView. The CHPC's ParaView page supplies instructions on how to use it on the system. Please note that although ParaView works well as a free-standing program, it is actually designed to be used in client-server mode, whereby you can make use of the power and memory capacity of a cluster compute node to handle the heavy processing while displaying the images on your own local workstation.
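The CHPC ParaView page remains the authoritative reference, but as a hedged outline the client-server pattern looks roughly like this (the port number and hostnames are illustrative, not CHPC-specific):
On the compute node: pvserver --server-port=11111
On your workstation: ssh -L 11111:<compute-node>:11111 jblogs@lengau.chpc.ac.za
Then connect the local ParaView GUI to localhost:11111 via File > Connect.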
The software is installed under the directory
/apps/chpc/compmech/altair/CFD2021.2
To set up the paths to the relevant binaries, libraries, scripts and license server, source the setCFD script as follows:
. /apps/chpc/compmech/altair/setCFD
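As a quick sanity check, verify that the solver command is now on your path; it should resolve to a location under the installation directory listed above:
which ufx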
If you want to test interactively on a GPU-enabled compute node, first get an interactive PBS session and then source the script. The source statement also needs to be included in your PBS script when you are running in the more typical batch mode. Obtain an interactive GPU node for 1 hour as follows (and please use your own project code!):
qsub -X -I -l select=1:ncpus=2:mpiprocs=2:ngpus=1 -q gpu_1 -P MECH1234 -l walltime=1:00:00
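Once the interactive session starts, a typical test sequence looks like the following (this uses the same example case directory and input file as the single-GPU batch script further down; substitute your own):
. /apps/chpc/compmech/altair/setCFD
cd /mnt/lustre/users/jblogs/UltraFluidX/cube
ufx -np 2 -inpFile cube.xml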
Submit a batch job by passing your PBS script to qsub:
qsub runUFX_1.qsub
Check the status of all your jobs (substitute your own username):
qstat -awu jblogs
Display the full details of a specific job:
qstat -f 4096301.sched01
Show the compute node(s) assigned to a job:
qstat -n1 4096301.sched01
Delete a job:
qdel 4096301.sched01
Force the deletion of a job that refuses to terminate:
qdel -W force 4096301.sched01
Inspect the UltraFluidX log file with less:
less uFX_log_2022-03-28_08-27-50.out
Scroll down with the arrow keys or the spacebar, and exit with the q key.
Follow the log file as the solver writes to it:
tail -f uFX_log_2022-03-28_08-27-50.out
To exit, press CTRL-C.
Monitor GPU usage on the compute node with nvtop:
/apps/chpc/compmech/nvtop/bin/nvtop
To exit, press the q key.
UltraFluidX requires one more MPI rank than the number of GPUs. Please refer to this table:
| Number of GPUs | ncpus | mpiprocs | ngpus | queue | Command line |
|---|---|---|---|---|---|
| 1 | 2 | 2 | 1 | gpu_1 | ufx -np 2 -inpFile … |
| 2 | 3 | 3 | 2 | gpu_2 | ufx -np 3 -inpFile … |
| 3 | 4 | 4 | 3 | gpu_3 | ufx -np 4 -inpFile … |
| 4 | 5 | 5 | 4 | gpu_4 | ufx -np 5 -inpFile … |
This is an example of a PBS script for running on a single GPU:
#!/bin/bash
#PBS -l select=1:ncpus=2:mpiprocs=2:ngpus=1
#PBS -l walltime=00:10:00
#PBS -q gpu_1
#PBS -P MECH1234
#PBS -o /mnt/lustre/users/jblogs/UltraFluidX/cube/cube.out
#PBS -e /mnt/lustre/users/jblogs/UltraFluidX/cube/cube.err
. /apps/chpc/compmech/altair/setCFD
cd /mnt/lustre/users/jblogs/UltraFluidX/cube
ufx -np 2 -inpFile cube.xml
This is an example of a PBS script for running on 3 GPUs:
#!/bin/bash
#PBS -l select=1:ncpus=4:mpiprocs=4:ngpus=3
#PBS -l walltime=04:00:00
#PBS -q gpu_3
#PBS -P MECH1234
#PBS -o /mnt/lustre/users/jblogs/UltraFluidX/roadster/roadster.out
#PBS -e /mnt/lustre/users/jblogs/UltraFluidX/roadster/roadster.err
. /apps/chpc/compmech/altair/setCFD
cd /mnt/lustre/users/jblogs/UltraFluidX/roadster
ufx -np 4 -inpFile roadster.xml
Running on a single V100 GPU card produces a performance of around 400 MNUPS. Running on 2 cards can deliver up to 690 MNUPS, and around 1050 MNUPS is possible on 3 cards. The current configuration of the scheduler does not make it straightforward to run over multiple compute nodes. Until further notice, do not attempt to use more than one node at a time, but it is worthwhile to use multiple cards if the model size justifies it. However, there is an important factor that must be considered. Please refer to the configuration of the GPU cluster as documented here. Only three of the nodes have their GPUs connected with the high-speed, low-latency NVLink fabric; in the remaining nodes the GPUs are connected through PCIe. For an application like UltraFluidX it appears to be essential to use NVLink for multi-GPU runs: running with more than one GPU on a PCIe-only node results in around 310 MNUPS, which is slower than using a single GPU. In order to obtain a full 4-GPU node with NVLink, the following type of submission can be used:
qsub -l select=1:ncpus=5:mpiprocs=5:ngpus=4 -q gpu_4 -P MECH1234 -l walltime=4:00:00
It gets more difficult if only two or three NVLink-connected GPUs are required. It is then necessary to nominate a particular NVLink-equipped compute node:
qsub -l select=1:ncpus=3:mpiprocs=3:ngpus=2:host=gpu4001 -q gpu_4 -P MECH1234 -l walltime=4:00:00
The scheduler is not currently configured in such a way that it is possible to select any available one of the three NVLink-equipped nodes. This may change in future.
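To confirm which interconnect your job has actually landed on, you can inspect the GPU topology from inside the job with the standard NVIDIA utility (this is generic NVIDIA tooling, not anything CHPC-specific):
nvidia-smi topo -m
Links reported as NV1, NV2 and so on between GPU pairs indicate NVLink connections, while PIX, PHB or SYS indicate PCIe paths.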