Please refer to the FDS home page for more information on FDS, an open-source code targeted at fire simulation. There are two installed versions. The older version of the code is 6.7.0, and has been installed in /apps/chpc/compmech/CFD/FDS. There is a module for FDS installed on Lengau; loading it with

module load chpc/compmech/FDS/6.7.0

will set up the appropriate environment.
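As a quick sanity check after loading the module (a minimal sketch; the exact output depends on the installation), you can confirm that the executable is on your PATH:

## Confirm that the 6.7.0 module has put the fds executable on the PATH
module load chpc/compmech/FDS/6.7.0
which fds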
The most recent version is FDS-6.7.9. This is installed in /home/apps/chpc/compmech/FDS/FDS-6.7.9, along with Smokeview-6.7.21. There is also a directory containing sample cases in /home/apps/chpc/compmech/FDS/FDS-6.7.9/Examples. To use these versions, load the following modules:

module load chpc/compmech/FDS/FDS-6.7.9
module load chpc/compmech/FDS/SMV-6.7.21
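For a first test it is convenient to copy one of the sample cases into your own Lustre directory. The sketch below assumes that the box_burn_away1.fds case used later on this page is present somewhere under the Examples tree, and uses the jblogs paths from the job scripts as placeholders:

## Set up a run directory on Lustre and copy a sample case into it
mkdir -p /mnt/lustre/users/jblogs/FDS_Runs
cd /mnt/lustre/users/jblogs/FDS_Runs
find /home/apps/chpc/compmech/FDS/FDS-6.7.9/Examples -name box_burn_away1.fds -exec cp {} . \;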
This will give you access to two fds executables, fds and fds_openmp, both of which are used in the examples below.
If you intend to run MPI parallel, it will be necessary to provide a machinefile for this version, which has not been compiled with one of the system MPI compilers. A typical command line will thus be:
mpirun -np 48 -machinefile $PBS_NODEFILE fds box_burn_away1.fds > fds.out
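The -np value must match the number of grids in the model. Each grid corresponds to an &MESH line in the FDS input file, so a quick check before picking the process count is (sketch):

## Count the grids (&MESH lines) in the input file; -np must equal this number
grep -c '&MESH' box_burn_away1.fds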
If you want to run Smokeview on a compute node, it will be necessary to load a module for Mesa software rendering:
module load chpc/compmech/mesa/20.2.2_swr
Example PBS job script for the FDS-6.7.9 installation (note the machinefile passed to mpirun):

#!/bin/bash
## Lines starting with the # symbol are comments, unless followed by ! or PBS,
## in which case they are directives
## The following PBS directive requests two 24 core compute nodes
## The $PBS_NODEFILE file will contain the hostnames for 16 MPI processes (2 X 8, as per mpiprocs)
## This is only meaningful if your model contains 16 grids
## The number of grids must match the number of MPI processes
#PBS -l select=2:ncpus=24:mpiprocs=8 -q normal
## Specify your own project shortcode here
#PBS -P MECH1234
## The walltime should be a small overestimate of the expected run time
## Requesting a very long walltime may delay the start of your job
## If the requested walltime is too short, the job will be killed before it is finished
#PBS -l walltime=6:00:00
## Obviously use your own paths here
#PBS -e /mnt/lustre/users/jblogs/FDS_Runs/stderr.txt
#PBS -o /mnt/lustre/users/jblogs/FDS_Runs/stdout.txt
export PBS_JOBDIR=/mnt/lustre/users/jblogs/FDS_Runs
cd $PBS_JOBDIR
module load chpc/compmech/FDS/FDS-6.7.9
## Assign a sensible value for OMP_NUM_THREADS
## If your number of MPI processes is a multiple of 24,
## it will be best to set it to 1, and not use OpenMP at all
## A value greater than 3 does not help
export OMP_NUM_THREADS=3
## The number of MPI processes is extracted from the length of the machinefile $PBS_NODEFILE
nproc=`cat $PBS_NODEFILE | wc -l`
mpirun -np $nproc -machinefile $PBS_NODEFILE fds_openmp FDS_inputFile.fds > fds.out
Example PBS job script for the older FDS 6.7.0 installation (no machinefile is needed here):

#!/bin/bash
## Lines starting with the # symbol are comments, unless followed by ! or PBS,
## in which case they are directives
## The following PBS directive requests two 24 core compute nodes
## The $PBS_NODEFILE file will contain the hostnames for 16 MPI processes (2 X 8, as per mpiprocs)
## This is only meaningful if your model contains 16 grids
## The number of grids must match the number of MPI processes
#PBS -l select=2:ncpus=24:mpiprocs=8 -q normal
## Specify your own project shortcode here
#PBS -P MECH1234
## The walltime should be a small overestimate of the expected run time
## Requesting a very long walltime may delay the start of your job
## If the requested walltime is too short, the job will be killed before it is finished
#PBS -l walltime=6:00:00
## Obviously use your own paths here
#PBS -e /mnt/lustre/users/jblogs/FDS_Runs/stderr.txt
#PBS -o /mnt/lustre/users/jblogs/FDS_Runs/stdout.txt
export PBS_JOBDIR=/mnt/lustre/users/jblogs/FDS_Runs
cd $PBS_JOBDIR
module load chpc/compmech/FDS/6.7.0
## Assign a sensible value for OMP_NUM_THREADS
## If your number of MPI processes is a multiple of 24,
## it will be best to set it to 1, and not use OpenMP at all
## A value greater than 3 does not help
export OMP_NUM_THREADS=3
## The number of MPI processes is extracted from the length of the machinefile $PBS_NODEFILE
nproc=`cat $PBS_NODEFILE | wc -l`
mpirun -np $nproc fds FDS_inputFile.fds > fds.out
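Either script can be saved to a file and submitted with qsub in the usual way; for example, assuming the FDS-6.7.9 script has been saved as fds_6.7.9.pbs (a placeholder name):

## Submit the job and keep an eye on it
qsub fds_6.7.9.pbs
qstat -u jblogs                                     # queue status for your own username
tail -f /mnt/lustre/users/jblogs/FDS_Runs/fds.out   # follow the solver output once the job starts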
Smokeview is also installed on the system, and can be accessed with the command smokeview or smv. This can be done on one of the visualisation nodes chpcviz1 or chpclic1. Please read the instructions on setting up a VNC connection, and run Smokeview with the VirtualGL wrapper:

/opt/VirtualGL/bin/vglrun smokeview
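A typical sequence inside the VNC session on chpcviz1 or chpclic1 might look like the sketch below; the .smv file name is assumed to follow from the completed box_burn_away1 example:

## Inside the VNC desktop on chpcviz1 or chpclic1
module load chpc/compmech/FDS/SMV-6.7.21
cd /mnt/lustre/users/jblogs/FDS_Runs
/opt/VirtualGL/bin/vglrun smokeview box_burn_away1.smv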
If you want to run Smokeview on a compute node, please read the instructions at https://wiki.chpc.ac.za/howto:remote_viz#getting_a_virtual_desktop_on_a_compute_node. You will need to load a mesa module to enable software rendering.
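On a compute node the sequence is similar, except that the Mesa module provides software rendering instead of VirtualGL (a sketch, assuming a virtual desktop has already been started as per the linked instructions):

## Inside a virtual desktop session on a compute node
module load chpc/compmech/mesa/20.2.2_swr
module load chpc/compmech/FDS/SMV-6.7.21
cd /mnt/lustre/users/jblogs/FDS_Runs
smokeview box_burn_away1.smv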
It may also be practical to use the very well-developed visualisation codes Paraview or VisIt. Please experiment and provide feedback.
FDS implements two forms of parallelisation: OpenMP threads and MPI-based domain decomposition.
OpenMP provides only a modest improvement in performance, but has the advantage of also working with a single-grid model. Going from 1 OpenMP thread to 2 gives a modest but helpful speedup, and a third thread adds a very small further gain. More than 3 OpenMP threads provide no additional benefit.
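For a single-grid model this means a small OpenMP-only run, for example inside a single-node job (a sketch; single_grid_case.fds is a hypothetical input file):

## Single grid: only one MPI process is possible, so use a couple of OpenMP threads
## e.g. inside a job requested with: #PBS -l select=1:ncpus=24:mpiprocs=1
module load chpc/compmech/FDS/FDS-6.7.9
export OMP_NUM_THREADS=2     # 2 threads give a modest speedup, 3 slightly more, >3 nothing
mpirun -np 1 -machinefile $PBS_NODEFILE fds_openmp single_grid_case.fds > fds.out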
MPI parallel will only work if the model has been set up in such a way that the number of grids is equal to the number of MPI processes. Somewhat confusingly, the code will still run if this condition is not satisfied, but not efficiently. If there are more MPI processes than grids, the extra MPI processes will start and consume CPU resources, but do no useful work. If there are more grids than MPI processes, the slowdown is quite dramatic. MPI parallel scaling is very good, provided that the number of grids matches the number of MPI processes and the grids are all similarly dimensioned. The compute nodes in the Lengau cluster have 24 cores each. Good MPI scaling and efficiency are therefore achieved by developing models where the number of grids is a multiple of 12 or 24. Underloading the compute nodes, by running say 12 MPI processes per node, each with two OpenMP threads, will achieve the best results, at the expense of occupying more nodes. This is a typical characteristic of any CFD code, whose performance is strongly constrained by memory bandwidth: maximum performance is achieved by engaging the largest possible number of memory channels.
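As a concrete illustration of that advice, a 24-grid model could be run in the underloaded style described above with a script along these lines (a sketch; only the resource selection and thread count differ from the full FDS-6.7.9 example above):

#!/bin/bash
## 24-grid model: 24 MPI processes spread over 2 nodes at 12 processes per node,
## with 2 OpenMP threads each, so all 24 cores per node are still used
#PBS -l select=2:ncpus=24:mpiprocs=12 -q normal
#PBS -P MECH1234
#PBS -l walltime=6:00:00
cd /mnt/lustre/users/jblogs/FDS_Runs
module load chpc/compmech/FDS/FDS-6.7.9
export OMP_NUM_THREADS=2
## $PBS_NODEFILE contains 24 entries, one per MPI process, matching the 24 grids
nproc=`cat $PBS_NODEFILE | wc -l`
mpirun -np $nproc -machinefile $PBS_NODEFILE fds_openmp FDS_inputFile.fds > fds.out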