New problem, as of 11 AM on 25 March 2023. It is not possible to start the Ansys license server on login1 due to a library compatibility problem following the emergency shutdown. For now, the only available Ansys license is on chpclic1, and this works only for pre-R21.1 versions of the software.
Please bear with us while we work with Ansys on sorting out some teething problems introduced by the license system changes.
With the release of Ansys version 21.1, the structure of the license pool has changed dramatically. The old license resources aa_r_cfd and aa_r_hpc are no longer relevant, and requesting them will prevent your job from running. It is necessary to change these statements to request the new license resources cfd_base and anshpc, as per the information given below.
We are in the process of retiring the existing license server chpclic1. The license has been moved to login1. Please change your .bashrc file and job scripts to point to the new license server, as per the example scripts below.
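For reference, the relevant lines to place in your .bashrc (and job scripts), as used in the example scripts further down this page, are:

export LM_LICENSE_FILE=1055@login1
export ANSYSLMD_LICENSE_FILE=1055@login1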
The CHPC has an installation of Ansys-CFD along with a limited license for academic use only. The license covers use of the Fluent and CFX solvers, as well as the IcemCFD meshing code. If you are a new Ansys user on Lengau, submit a helpdesk ticket, requesting access to the license.
If you are a full-time student or staff member at an academic institution, you may request access to use Ansys-CFD on the CHPC cluster. Please go to the CHPC user database to register and request resources. Commercial use of Ansys software at the CHPC is also possible, but software license resources need to be negotiated directly with Ansys or their local agents. Remote license check-out has not been ruled out by Ansys, but once again this needs to be negotiated with the software vendor.
Ansys software versions have been installed under /mnt/lustre/apps/CHPC/compmech/CFD/ansys_inc and /home/apps/chpc/compmech/ansys_inc, but all versions have been symbolically linked to /apps/chpc/compmech/CFD/ansys_inc, from where they may be accessed:
v160 -> /mnt/lustre/apps/chpc/compmech/CFD/ansys_inc/v160
v172 -> /mnt/lustre/apps/chpc/compmech/CFD/ansys_inc/v172
v180 -> /mnt/lustre/apps/chpc/compmech/CFD/ansys_inc/v180
v181 -> /mnt/lustre/apps/chpc/compmech/CFD/ansys_inc/v181
v182 -> /mnt/lustre/apps/chpc/compmech/CFD/ansys_inc/v182
v190 -> /mnt/lustre/apps/chpc/compmech/CFD/ansys_inc/v190
v191 -> /mnt/lustre/apps/chpc/compmech/CFD/ansys_inc/v191
v192 -> /mnt/lustre/apps/chpc/compmech/CFD/ansys_inc/v192
v194 -> /mnt/lustre/apps/chpc/compmech/CFD/ansys_inc/v194
v195 -> /home/apps/chpc/compmech/ansys_inc/v195
v212 -> /home/apps/chpc/compmech/ansys_inc/v212
v221 -> /home/apps/chpc/compmech/ansys_inc/v221
v222 -> /home/apps/chpc/compmech/ansys_inc/v222
v231 -> /home/apps/chpc/compmech/ansys_inc/v231
v232 -> /home/apps/chpc/compmech/ansys_inc/v232
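To select a particular version, add its fluent/bin directory to your PATH, as is also done in the example job scripts below. For example, for version 2023R1:

export PATH=/apps/chpc/compmech/CFD/ansys_inc/v231/fluent/bin:$PATH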
The CHPC has academic licenses for Ansys-CFD. If you are a new Ansys user on Lengau, submit a helpdesk ticket requesting access to the license. The pool covers 25 concurrent "solver" processes and 4096 "HPC" licenses. Please use the license resource management system to ensure that your job does not start if there are no licenses available for it.
If you request license resources (as in these example scripts), the scheduler will check for license availability before starting a job. If licenses are unavailable, the job will be held back until the necessary licenses have become available. Although use of the license resource request is not mandatory, it is strongly recommended: if you do not use the license resource requests, the job will fail if no licenses are available. A single cfd_base license is required to start the solver, and includes up to 4 HPC licenses. Therefore you should request ($nproc-4) anshpc licenses. Do not request more than you need, as this will delay the start of your job.
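As an illustration of the arithmetic, a 96-way parallel job (4 nodes of 24 cores each) needs 1 cfd_base license plus 96 - 4 = 92 anshpc licenses, hence the following resource requests in the example scripts below:

#PBS -l select=4:ncpus=24:mpiprocs=24
#PBS -l cfd_base=1
#PBS -l anshpc=92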
Update. We previously stated the following: "The Fluent licenses are in general highly utilised. The consequence is that jobs may be held back due to unavailability of licenses. It is possible for the CHPC to forcefully apply measures that will ensure fair use. However, in order to avoid this situation, please stick to the following guidelines." One of these guidelines was to submit any follow-on job with a dependency on the currently running one:
qsub -W depend=afterany:123456 thisjob.pbs
where 123456 should be replaced with the number of the previously submitted job, and thisjob.pbs is simply the name of the new script that you are submitting. The afterany directive will make sure that the dependent job gets launched regardless of whether the running job has finished normally, crashed or been killed.
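Should you still wish to chain jobs in this way, a simple sketch is given below; it captures the job identifier printed by qsub instead of typing it in by hand. The script names are illustrative only.

JOBID1=$(qsub run_part1.pbs)                   # qsub prints the job identifier of the first job
qsub -W depend=afterany:$JOBID1 run_part2.pbs  # second job starts only once the first has ended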
The above information is now (as of June 2022) no longer applicable. For the last several months, the usage level of the Ansys license pool has been well below maximum capacity. Users are therefore encouraged to make more aggressive use of the resources, until further notice.
On the CHPC cluster all simulations are submitted as jobs to the PBS Pro job scheduler, which will assign your job to the appropriate queue. Instructions and examples are given below, and are further elaborated upon here.
Example job script:
#!/bin/bash
##### The following line will request 4 (virtual) nodes, each with 24 cores running 24 MPI processes, for
##### a total of 96-way parallel. Specifying a memory requirement is unlikely to be necessary, as the
##### compute nodes have 128 GB each.
#PBS -l select=4:ncpus=24:mpiprocs=24:mem=32GB:nodetype=haswell_reg
#### Check for license availability. If insufficient licenses are available, the job will be held back until
#### licenses become available.
#PBS -l cfd_base=1
#PBS -l anshpc=92
## For your own benefit, try to estimate a realistic walltime request. Over-estimating the
## wallclock requirement interferes with efficient scheduling, will delay the launch of the job,
## and ties up more of your CPU-time allocation until the job has finished.
#PBS -q normal
#PBS -P myprojectcode
#PBS -l walltime=1:00:00
#PBS -o /mnt/lustre/users/username/FluentTesting/fluent.out
#PBS -e /mnt/lustre/users/username/FluentTesting/fluent.err
#PBS -m abe
#PBS -M username@email.co.za
##### Running commands
#### Put these commands in your .bashrc file as well, to ensure that the compute nodes
#### have the correct environment. Ensure that any OpenFOAM-related environment
#### settings have been removed.
####### PLEASE NOTE THAT THE LICENSE SERVER HAS NOW CHANGED, IT IS NOW login1
export LM_LICENSE_FILE=1055@login1
export ANSYSLMD_LICENSE_FILE=1055@login1
# Edit this next line to select the appropriate version.
export PATH=/apps/chpc/compmech/CFD/ansys_inc/v221/fluent/bin:$PATH
export FLUENT_ARCH=lnamd64
#### Explicitly set the working directory and change to it.
export PBS_JOBDIR=/mnt/lustre/users/username/FluentTesting
cd $PBS_JOBDIR
nproc=`cat $PBS_NODEFILE | wc -l`
exe=fluent
$exe 3d -t$nproc -pinfiniband -ssh -cnf=$PBS_NODEFILE -g < fileContainingTUIcommands > run.out
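For reference, a minimal sketch of what fileContainingTUIcommands might contain is given below. The case file name and iteration count are purely illustrative; adapt the commands to your own case.

; Minimal sketch of a Fluent TUI command file -- names and counts are illustrative
/file/read-case-data myCase.cas.gz
/solve/iterate 1000
/file/write-case-data myCaseFinal.cas.gz
exit
yes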
There are two methods which can be used to submit a series of instructions to Fluent. In the above example, a file containing so-called "TUI" commands is passed to Fluent, either with the "<" redirection symbol or with the "-i" command line option. There are two disadvantages to using this method: a journal file recorded in the GUI cannot be used directly, and images cannot be generated during the run.
The second method allows the use of a recorded journal file and also supports "on the fly" generation of images. We previously made use of the virtual frame buffer "Xvfb" to enable this. However, the frame buffer method has now been deprecated; simply add the command line options -gu -driver null
to enable the generation of images. The following is an example of a PBS job script using this method:
#!/bin/bash
##### The following line will request 4 (virtual) nodes, each with 24 cores running 24 MPI processes, for
##### a total of 96-way parallel.
#PBS -l select=4:ncpus=24:mpiprocs=24:mem=32GB:nodetype=haswell_reg
#### License resource request.
#PBS -l cfd_base=1
#PBS -l anshpc=92
## For your own benefit, try to estimate a realistic walltime request. Over-estimating the
## wallclock requirement interferes with efficient scheduling, will delay the launch of the job,
## and ties up more of your CPU-time allocation until the job has finished.
#PBS -q normal
#PBS -P myprojectcode
#PBS -l walltime=1:00:00
#PBS -o /mnt/lustre/users/username/FluentTesting/fluent.out
#PBS -e /mnt/lustre/users/username/FluentTesting/fluent.err
#PBS -m abe
#PBS -M username@email.co.za
##### Running commands
#### Put these commands in your .bashrc file as well, to ensure that the compute nodes
#### have the correct environment. Ensure that any OpenFOAM-related environment
#### settings have been removed.
####### PLEASE NOTE THAT THE LICENSE SERVER HAS NOW CHANGED, IT IS login1
export LM_LICENSE_FILE=1055@login1
export ANSYSLMD_LICENSE_FILE=1055@login1
# Edit this next line to select the appropriate version.
export PATH=/apps/chpc/compmech/CFD/ansys_inc/v221/fluent/bin:$PATH
export FLUENT_ARCH=lnamd64
#### Explicitly set the working directory and change to it.
export PBS_JOBDIR=/mnt/lustre/users/username/FluentTesting
cd $PBS_JOBDIR
nproc=`cat $PBS_NODEFILE | wc -l`
exe=fluent
$exe 3d -t$nproc -pinfiniband -ssh -cnf=$PBS_NODEFILE -gu -driver null -i journalFile.jou > run.out
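As a rough illustration, journalFile.jou could combine solver commands with picture output as sketched below. The file names, graphics object name and iteration count are illustrative only, and in practice a journal recorded in the GUI will usually be used instead.

; Illustrative journal file -- not taken from a real case
/file/read-case-data myCase.cas.gz
/solve/iterate 500
/display/objects/display contour-1       ; assumes a graphics object "contour-1" was defined in the case
/display/save-picture contour-0500.png
/file/write-case-data myCaseFinal.cas.gz
exit
yes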
Within fairly strict limitations, it is now possible to run Fluent on Nvidia GPUs instead of CPUs. We have good news and bad news for you about this. The good news is that the performance is spectacularly good and can be regarded as game-changing: a single V100 card has more or less the same performance as 8 Lengau compute nodes with 192 cores. The bad news items are set out in the notes below.
The most important thing to bear in mind is that there should be one MPI rank for each GPU: no more and no less. The GPU nodes have plenty of CPU cores, so you may assign 10 CPU cores per GPU or just 1; it does not matter, since the entire job runs on the GPUs, although each GPU requires one CPU core to control it. The resource request line in the job script needs to specify ngpus in addition to the usual ncpus and mpiprocs. At this stage, set anshpc to 20 per GPU.
Please note that the walltime limit on the GPU queues is just 12 hours. The GPUs have limited amounts of memory, so if your job mysteriously crashes, the most probable cause is inadequate memory. Most of the cards have only 16GB, although there are some with 32GB. For this reason, do not use double precision unless you really need it.
There are separate GPU queues for 1, 2, 3 and 4 GPU jobs. Ensure that you use the correct one.
Once the job is running, find the hostname of the node where your job is running with qstat -n1 followed by the job number. Then ssh into that node and monitor the GPU activity with nvidia-smi or, more usefully, with nvtop. There is a module for nvtop:
module load chpc/compmech/nvtop/1.2.2
Alternatively, just give the full path to nvtop or set up an alias for it:
/apps/chpc/compmech/nvtop/bin/nvtop
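Putting this together, a typical monitoring session might look like the following; the job number and node name are illustrative.

qstat -n1 1234567                         # shows the execution host, e.g. cnode0123
ssh cnode0123
module load chpc/compmech/nvtop/1.2.2     # or: alias nvtop='/apps/chpc/compmech/nvtop/bin/nvtop'
nvtop                                     # live GPU utilisation and memory; nvidia-smi also works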
#!/bin/bash
#PBS -l select=1:ncpus=10:mpiprocs=1:ngpus=1
#PBS -l cfd_base=1
#PBS -l anshpc=20
#PBS -q gpu_1
#PBS -P MECH1234
#PBS -l walltime=02:00:00
#PBS -o /mnt/lustre/users/jblogs/FluentGPUcase/fluent_1gpu.out
#PBS -e /mnt/lustre/users/jblogs/FluentGPUcase/fluent_1gpu.err
export LM_LICENSE_FILE=1055@login1
export ANSYSLMD_LICENSE_FILE=1055@login1
# Edit this next line to select the appropriate version.
export PATH=/apps/chpc/compmech/CFD/ansys_inc/v231/fluent/bin:$PATH
export FLUENT_ARCH=lnamd64
#### Explicitly set the working directory and change to it.
export PBS_JOBDIR=/mnt/lustre/users/jblogs/FluentGPUcase
cd $PBS_JOBDIR
fluent 3d -t1 -pinfiniband -ssh -cnf=$PBS_NODEFILE -gpuapp -gpgpu=1 -g < iterate.txt > run1gpu.out
#!/bin/bash
#PBS -l select=1:ncpus=10:mpiprocs=3:ngpus=3
#PBS -l cfd_base=1
#PBS -l anshpc=60
#PBS -q gpu_3
#PBS -P MECH1234
#PBS -l walltime=0:40:00
#PBS -o /mnt/lustre/users/jblogs/FluentGPUcase/fluent_3gpu.out
#PBS -e /mnt/lustre/users/jblogs/FluentGPUcase/fluent_3gpu.err
export LM_LICENSE_FILE=1055@login1
export ANSYSLMD_LICENSE_FILE=1055@login1
# Edit this next line to select the appropriate version.
export PATH=/apps/chpc/compmech/CFD/ansys_inc/v231/fluent/bin:$PATH
export FLUENT_ARCH=lnamd64
#### Explicitly set the working directory and change to it.
export PBS_JOBDIR=/mnt/lustre/users/jblogs/FluentGPUcase
cd $PBS_JOBDIR
fluent 3d -t3 -pinfiniband -ssh -cnf=$PBS_NODEFILE -gpuapp -gpgpu=3 -g < iterate.txt > run3gpu.out
Some tasks, such as setting up runs, meshing or post-processing may require a graphics-capable login. This is possible in a number of ways. Using a compute node for a task that requires graphics involves a little bit of trickery, but is really not that difficult.
Obtain exclusive use of a compute node by logging into Lengau in your usual way and requesting an interactive session:
qsub -I -l select=1:ncpus=24:mpiprocs=24 -q smp -P MECH1234 -l walltime=4:00:00
Obviously, replace MECH1234 with the short name of your particular Research Programme. Note down the name of the compute node that you have been given; let us use cnode0123 for this example. You can also use an interactive session like this to perform "service" tasks, such as archiving or compressing data files, which would be killed if attempted on the login node.
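For example, a data-compression task of this sort might be run from the interactive session as follows; the paths and file names are illustrative only.

cd /mnt/lustre/users/jblogs/FluentTesting
tar -czvf oldResults.tar.gz oldResults/    # archive and compress a finished results directory
rm -rf oldResults                          # optionally remove the originals once the archive has been verified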
There are three ways of doing this:
X-forwarding in two stages is really only a practical proposition if you are on a fast, low-latency connection into the SANReN network. Otherwise, get the VNC session first by following these instructions.
From an X-windows capable workstation (in other words, from a Linux terminal command prompt, or an emulator on Windows that includes an X-server, such as MobaXterm, or a VNC session on one of the visualization nodes), log in to Lengau:
ssh -X jblogs@lengau.chpc.ac.za
Once logged in, do a second X-forwarding login to your assigned compute node:
ssh -X cnode0123
Alternatively, you can also do an interactive PBS session with X-forwarding:
qsub -I -l select=1:ncpus=24:mpiprocs=24 -q smp -P MECH1234 -l walltime=4:00:00 -X
A normal broadband connection will probably be too slow to use the double X-forwarding method. In this case, first get the VNC desktop going, as described above, and open a terminal. From this terminal, log in to your assigned compute node:
ssh -X cnode0123
export LM_LICENSE_FILE=1055@login1
export ANSYSLMD_LICENSE_FILE=1055@login1
export PATH=/apps/chpc/compmech/CFD/ansys_inc/v221/fluent/bin:$PATH
export FLUENT_ARCH=lnamd64
You can now simply start the program in the usual way, with the command
fluent 3d -t24 -ssh
Thanks to the magic of software rendering, you have access to the GUI and graphics capability of the interface.
Starting with version 19.0 of the software, it is possible to use a GUI to connect to a Fluent process that is already running. The process requires that Fluent be started with access to an X-server, so use a run command that contains the parameters -gu -driver null
. Here is a minimalist example of such a script:
#!/bin/bash
#PBS -l select=5:ncpus=24:mpiprocs=24:nodetype=haswell_reg
#PBS -q normal
#PBS -P MECH1234
#PBS -l walltime=12:00:00
#PBS -o /mnt/lustre/users/username/FluentTest/fluent.out
#PBS -e /mnt/lustre/users/username/FluentTest/fluent.err
#PBS -l cfd_base=1
#PBS -l anshpc=116
export LM_LICENSE_FILE=1055@login1
export ANSYSLMD_LICENSE_FILE=1055@login1
export PATH=$PATH:/apps/chpc/compmech/CFD/ansys_inc/v221/fluent/bin
export FLUENT_ARCH=lnamd64
cd /mnt/lustre/users/username/FluentTest
nproc=`cat $PBS_NODEFILE | wc -l`
fluent 3ddp -t$nproc -pinfiniband -ssh -mpi=intel -cnf=$PBS_NODEFILE -gu -driver null -i runCommands.txt | tee fluentrun.out
It is critical that the file containing the run instructions, in this case called runCommands.txt, has the following line:
server/start-server server-info.txt
This will create a file called server-info.txt, which contains the hostname of the master node, as well as a port number which the remote client will need to connect to.
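A minimal run-instruction file could therefore look like the sketch below; the case file name and iteration count are illustrative only.

; runCommands.txt -- a minimal illustrative sketch
server/start-server server-info.txt
/file/read-case-data myCase.cas.gz
/solve/iterate 5000
/file/write-case-data myCaseFinal.cas.gz
exit
yes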
On the viz node (you have a TurboVNC session open, right?), get a terminal, change directory to where your Fluent run is, and issue the following command:
/opt/VirtualGL/bin/vglrun /apps/chpc/compmech/CFD/ansys_inc/v190/fluent/bin/flremote &
The Fluent Remote Visualization Client will start up. Provide the appropriate Server Info Filename and you will be able to connect to your Fluent process.
The “standard” process assumes that the user already has a local license for the software.
If your simulation files are too large, or your internet connection is too slow, consider transferring only geometry and script files. This will require careful scripting and testing, but is certainly practical.
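As a sketch, such a transfer could be done with rsync over ssh to the login address used elsewhere on this page; the file names and target directory below are illustrative only.

rsync -av geometry.igs journalFile.jou jblogs@lengau.chpc.ac.za:/mnt/lustre/users/jblogs/FluentTesting/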
set term dumb
to get funky 1970s-style ASCII graphics.