UltraFluidX is a Lattice Boltzmann solver that is part of Altair's suite of CFD codes. The CHPC hosts a license on behalf of the Mechanical Engineering Department at Nelson Mandela University. This license is access-controlled and available to users from this department only. If you are interested in using any of the Altair fluid dynamics solvers, please contact Ernst Burger at Altair.
Altair's pre-processor Virtual Wind Tunnel (VWT) is not available on Linux. However, consider running VWT on your Windows workstation and accessing your storage on the cluster by mounting it on your workstation with sshfs.
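As a rough sketch on a Linux workstation (the login node address lengau.chpc.ac.za, the username jblogs and the local mount point are examples only; on Windows an SSHFS client such as SSHFS-Win provides the equivalent):
mkdir -p ~/lengau
sshfs jblogs@lengau.chpc.ac.za:/mnt/lustre/users/jblogs ~/lengau
Unmount with fusermount -u ~/lengau when you are done.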
Graphical post-processing can be done by means of Altair's tools, or with ParaView. The CHPC's ParaView page supplies instructions on how to use it on the system. Please note that although ParaView works well as a free-standing program, it is actually designed to be used in client-server mode, whereby you can make use of the power and memory capacity of a cluster compute node to handle the heavy processing while displaying the images on your own local workstation.
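The CHPC ParaView page remains the authoritative reference, but as a hedged outline the client-server pattern looks roughly like this (the port number and hostnames are illustrative, not CHPC-specific):
On the compute node: pvserver --server-port=11111
On your workstation: ssh -L 11111:<compute-node>:11111 jblogs@lengau.chpc.ac.za
Then connect the local ParaView GUI to localhost:11111 via File > Connect.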
The software is installed under the directory
/apps/chpc/compmech/altair/CFD2021.2
To set up the paths to the relevant binaries, libraries, scripts and license server, source the setCFD script as follows:
. /apps/chpc/compmech/altair/setCFD
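As a quick sanity check, verify that the solver command is now on your path; it should resolve to a location under the installation directory listed above:
which ufx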
If you want to test interactively on a GPU-enabled compute node, first get an interactive PBS session and then source the script. The source statement also needs to be included in your PBS script when you are running in the more typical batch mode. Obtain an interactive GPU node for 1 hour as follows (and please use your own project code!):
qsub -X -I -l select=1:ncpus=2:mpiprocs=2:ngpus=1 -q gpu_1 -P MECH1234 -l walltime=1:00:00
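Once the interactive session starts, a typical test sequence looks like the following (this uses the same example case directory and input file as the single-GPU batch script further down; substitute your own):
. /apps/chpc/compmech/altair/setCFD
cd /mnt/lustre/users/jblogs/UltraFluidX/cube
ufx -np 2 -inpFile cube.xml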
Submit a batch job by passing your PBS script to qsub:
qsub runUFX_1.qsub
Check the status of all your jobs (substitute your own username):
qstat -awu jblogs
Display the full details of a specific job:
qstat -f 4096301.sched01
Show the compute node(s) assigned to a job:
qstat -n1 4096301.sched01
Delete a job:
qdel 4096301.sched01
Force the deletion of a job that refuses to terminate:
qdel -W force 4096301.sched01
Inspect the UltraFluidX log file with less:
less uFX_log_2022-03-28_08-27-50.out
Scroll down with the arrow keys or the spacebar, and exit with the q key.
Follow the log file as the solver writes to it:
tail -f uFX_log_2022-03-28_08-27-50.out
To exit, press CTRL-C.
Monitor GPU usage on the compute node with nvtop:
/apps/chpc/compmech/nvtop/bin/nvtop
To exit, press the q key.
UltraFluidX requires one more MPI rank than the number of GPUs. Please refer to this table:
| Number of GPUs | ncpus | mpiprocs | ngpus | queue | Command line |
|---|---|---|---|---|---|
| 1 | 2 | 2 | 1 | gpu_1 | ufx -np 2 -inpFile … |
| 2 | 3 | 3 | 2 | gpu_2 | ufx -np 3 -inpFile … |
| 3 | 4 | 4 | 3 | gpu_3 | ufx -np 4 -inpFile … |
| 4 | 5 | 5 | 4 | gpu_4 | ufx -np 5 -inpFile … |
This is an example of a PBS script for running on a single GPU:
#!/bin/bash
#PBS -l select=1:ncpus=2:mpiprocs=2:ngpus=1
#PBS -l walltime=00:10:00
#PBS -q gpu_1
#PBS -P MECH1234
#PBS -o /mnt/lustre/users/jblogs/UltraFluidX/cube/cube.out
#PBS -e /mnt/lustre/users/jblogs/UltraFluidX/cube/cube.err
. /apps/chpc/compmech/altair/setCFD
cd /mnt/lustre/users/jblogs/UltraFluidX/cube
ufx -np 2 -inpFile cube.xml
This is an example of a PBS script for running on 3 GPUs:
#!/bin/bash
#PBS -l select=1:ncpus=4:mpiprocs=4:ngpus=3
#PBS -l walltime=04:00:00
#PBS -q gpu_3
#PBS -P MECH1234
#PBS -o /mnt/lustre/users/jblogs/UltraFluidX/roadster/roadster.out
#PBS -e /mnt/lustre/users/jblogs/UltraFluidX/roadster/roadster.err
. /apps/chpc/compmech/altair/setCFD
cd /mnt/lustre/users/jblogs/UltraFluidX/roadster
ufx -np 4 -inpFile roadster.xml
Running on a single V100 GPU card produces a performance of around 400 MNUPS. Running on 2 cards can deliver up to 690 MNUPS, and around 1050 MNUPS is possible on 3 cards. The current configuration of the scheduler does not make it straightforward to run over multiple compute nodes. Until further notice, do not attempt to use more than one node at a time, but it is worthwhile to use multiple cards if the model size justifies it. However, there is an important factor that must be considered. Please refer to the configuration of the GPU cluster as documented here. Only three of the nodes have their GPUs connected with the high-speed, low-latency NVLink fabric; in the remaining nodes the GPUs are connected through PCIe. For an application like UltraFluidX it appears to be essential to use NVLink for multi-GPU runs: running with more than one GPU on a PCIe-only node results in around 310 MNUPS, which is slower than using a single GPU. In order to obtain a full 4-GPU node with NVLink, the following type of submission can be used:
qsub -l select=1:ncpus=5:mpiprocs=5:ngpus=4 -q gpu_4 -P MECH1234 -l walltime=4:00:00
It gets more difficult if only two or three NVLink-connected GPUs are required. It is then necessary to nominate a particular NVLink-equipped compute node:
qsub -l select=1:ncpus=3:mpiprocs=3:ngpus=2:host=gpu4001 -q gpu_4 -P MECH1234 -l walltime=4:00:00
The scheduler is not currently configured in such a way that it is possible to select any available one of the three NVLink-equipped nodes. This may change in future.
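To confirm which interconnect your job has actually landed on, you can inspect the GPU topology from inside the job with the standard NVIDIA utility (this is generic NVIDIA tooling, not anything CHPC-specific):
nvidia-smi topo -m
Links reported as NV1, NV2 and so on between GPU pairs indicate NVLink connections, while PIX, PHB or SYS indicate PCIe paths.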