Please refer to the FDS home page for more information on FDS, an open-source code targeted at fire simulation. There are two installed versions. The older version of the code is 6.7.0, and has been installed in /apps/chpc/compmech/CFD/FDS. There is a module for FDS installed on Lengau; loading it with

module load chpc/compmech/FDS/6.7.0

will set up the appropriate environment.
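As a quick sanity check after loading the module (a minimal sketch; the exact output depends on the installation), you can confirm that the executable is on your PATH:

## Confirm that the 6.7.0 module has put the fds executable on the PATH
module load chpc/compmech/FDS/6.7.0
which fds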
The most recent version is FDS-6.7.9. This is installed in /home/apps/chpc/compmech/FDS/FDS-6.7.9, along with Smokeview-6.7.21. There is also a directory containing sample cases in /home/apps/chpc/compmech/FDS/FDS-6.7.9/Examples. To use these versions, load the following modules:

module load chpc/compmech/FDS/FDS-6.7.9
module load chpc/compmech/FDS/SMV-6.7.21
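For a first test it is convenient to copy one of the sample cases into your own Lustre directory. The sketch below assumes that the box_burn_away1.fds case used later on this page is present somewhere under the Examples tree, and uses the jblogs paths from the job scripts as placeholders:

## Set up a run directory on Lustre and copy a sample case into it
mkdir -p /mnt/lustre/users/jblogs/FDS_Runs
cd /mnt/lustre/users/jblogs/FDS_Runs
find /home/apps/chpc/compmech/FDS/FDS-6.7.9/Examples -name box_burn_away1.fds -exec cp {} . \;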
This will give you access to two fds executables, fds and fds_openmp, both of which are used in the examples below.
If you intend to run MPI parallel, it will be necessary to provide a machinefile for this version, which has not been compiled with one of the system MPI compilers. A typical command line will thus be:
mpirun -np 48 -machinefile $PBS_NODEFILE fds box_burn_away1.fds > fds.out
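The -np value must match the number of grids in the model. Each grid corresponds to an &MESH line in the FDS input file, so a quick check before picking the process count is (sketch):

## Count the grids (&MESH lines) in the input file; -np must equal this number
grep -c '&MESH' box_burn_away1.fds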
If you want to run Smokeview on a compute node, it will be necessary to load a module for Mesa software rendering:
module load chpc/compmech/mesa/20.2.2_swr
Example PBS job script for the FDS-6.7.9 installation (note the machinefile passed to mpirun):

#!/bin/bash
## Lines starting with the # symbol are comments, unless followed by ! or PBS,
## in which case they are directives
## The following PBS directive requests two 24 core compute nodes
## The $PBS_NODEFILE file will contain the hostnames for 16 MPI processes (2 X 8, as per mpiprocs)
## This is only meaningful if your model contains 16 grids
## The number of grids must match the number of MPI processes
#PBS -l select=2:ncpus=24:mpiprocs=8 -q normal
## Specify your own project shortcode here
#PBS -P MECH1234
## The walltime should be a small overestimate of the expected run time
## Requesting a very long walltime may delay the start of your job
## If the requested walltime is too short, the job will be killed before it is finished
#PBS -l walltime=6:00:00
## Obviously use your own paths here
#PBS -e /mnt/lustre/users/jblogs/FDS_Runs/stderr.txt
#PBS -o /mnt/lustre/users/jblogs/FDS_Runs/stdout.txt
export PBS_JOBDIR=/mnt/lustre/users/jblogs/FDS_Runs
cd $PBS_JOBDIR
module load chpc/compmech/FDS/FDS-6.7.9
## Assign a sensible value for OMP_NUM_THREADS
## If your number of MPI processes is a multiple of 24,
## it will be best to set it to 1, and not use OpenMP at all
## A value greater than 3 does not help
export OMP_NUM_THREADS=3
## The number of MPI processes is extracted from the length of the machinefile $PBS_NODEFILE
nproc=`cat $PBS_NODEFILE | wc -l`
mpirun -np $nproc -machinefile $PBS_NODEFILE fds_openmp FDS_inputFile.fds > fds.out
Example PBS job script for the older FDS 6.7.0 installation (no machinefile is needed here):

#!/bin/bash
## Lines starting with the # symbol are comments, unless followed by ! or PBS,
## in which case they are directives
## The following PBS directive requests two 24 core compute nodes
## The $PBS_NODEFILE file will contain the hostnames for 16 MPI processes (2 X 8, as per mpiprocs)
## This is only meaningful if your model contains 16 grids
## The number of grids must match the number of MPI processes
#PBS -l select=2:ncpus=24:mpiprocs=8 -q normal
## Specify your own project shortcode here
#PBS -P MECH1234
## The walltime should be a small overestimate of the expected run time
## Requesting a very long walltime may delay the start of your job
## If the requested walltime is too short, the job will be killed before it is finished
#PBS -l walltime=6:00:00
## Obviously use your own paths here
#PBS -e /mnt/lustre/users/jblogs/FDS_Runs/stderr.txt
#PBS -o /mnt/lustre/users/jblogs/FDS_Runs/stdout.txt
export PBS_JOBDIR=/mnt/lustre/users/jblogs/FDS_Runs
cd $PBS_JOBDIR
module load chpc/compmech/FDS/6.7.0
## Assign a sensible value for OMP_NUM_THREADS
## If your number of MPI processes is a multiple of 24,
## it will be best to set it to 1, and not use OpenMP at all
## A value greater than 3 does not help
export OMP_NUM_THREADS=3
## The number of MPI processes is extracted from the length of the machinefile $PBS_NODEFILE
nproc=`cat $PBS_NODEFILE | wc -l`
mpirun -np $nproc fds FDS_inputFile.fds > fds.out
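Either script can be saved to a file and submitted with qsub in the usual way; for example, assuming the FDS-6.7.9 script has been saved as fds_6.7.9.pbs (a placeholder name):

## Submit the job and keep an eye on it
qsub fds_6.7.9.pbs
qstat -u jblogs                                     # queue status for your own username
tail -f /mnt/lustre/users/jblogs/FDS_Runs/fds.out   # follow the solver output once the job starts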
Smokeview is also installed on the system, and can be accessed with the command smokeview or smv. This can be done on one of the visualisation nodes chpcviz1 or chpclic1. Please read the instructions on setting up a VNC connection, and run Smokeview with the VirtualGL wrapper:

/opt/VirtualGL/bin/vglrun smokeview
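A typical sequence inside the VNC session on chpcviz1 or chpclic1 might look like the sketch below; the .smv file name is assumed to follow from the completed box_burn_away1 example:

## Inside the VNC desktop on chpcviz1 or chpclic1
module load chpc/compmech/FDS/SMV-6.7.21
cd /mnt/lustre/users/jblogs/FDS_Runs
/opt/VirtualGL/bin/vglrun smokeview box_burn_away1.smv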
If you want to run Smokeview on a compute node, please read the instructions at https://wiki.chpc.ac.za/howto:remote_viz#getting_a_virtual_desktop_on_a_compute_node. You will need to load a mesa module to enable software rendering.
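On a compute node the sequence is similar, except that the Mesa module provides software rendering instead of VirtualGL (a sketch, assuming a virtual desktop has already been started as per the linked instructions):

## Inside a virtual desktop session on a compute node
module load chpc/compmech/mesa/20.2.2_swr
module load chpc/compmech/FDS/SMV-6.7.21
cd /mnt/lustre/users/jblogs/FDS_Runs
smokeview box_burn_away1.smv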
It may also be practical to use the very well-developed visualisation codes Paraview or VisIt. Please experiment and provide feedback.
FDS implements two forms of parallelisation: OpenMP threads and MPI-based domain decomposition.
OpenMP provides only a modest improvement in performance, but has the advantage of also working with a single-grid model. Going from 1 OpenMP thread to 2 gives a modest but helpful speedup, and a third thread adds a very small further gain. More than 3 OpenMP threads provide no additional benefit.
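For a single-grid model this means a small OpenMP-only run, for example inside a single-node job (a sketch; single_grid_case.fds is a hypothetical input file):

## Single grid: only one MPI process is possible, so use a couple of OpenMP threads
## e.g. inside a job requested with: #PBS -l select=1:ncpus=24:mpiprocs=1
module load chpc/compmech/FDS/FDS-6.7.9
export OMP_NUM_THREADS=2     # 2 threads give a modest speedup, 3 slightly more, >3 nothing
mpirun -np 1 -machinefile $PBS_NODEFILE fds_openmp single_grid_case.fds > fds.out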
MPI parallel will only work if the model has been set up in such a way that the number of grids is equal to the number of MPI processes. Somewhat confusingly, the code will still run if this condition is not satisfied, but not efficiently. If there are more MPI processes than grids, the extra MPI processes will start and consume CPU resources, but do no useful work. If there are more grids than MPI processes, the slowdown is quite dramatic. MPI parallel scaling is very good, provided that the number of grids matches the number of MPI processes and the grids are all similarly dimensioned. The compute nodes in the Lengau cluster have 24 cores each. Good MPI scaling and efficiency are therefore achieved by developing models where the number of grids is a multiple of 12 or 24. Underloading the compute nodes, by running say 12 MPI processes per node, each with two OpenMP threads, will achieve the best results, at the expense of occupying more nodes. This is a typical characteristic of any CFD code, whose performance is strongly constrained by memory bandwidth: maximum performance is achieved by engaging the largest possible number of memory channels.
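As a concrete illustration of that advice, a 24-grid model could be run in the underloaded style described above with a script along these lines (a sketch; only the resource selection and thread count differ from the full FDS-6.7.9 example above):

#!/bin/bash
## 24-grid model: 24 MPI processes spread over 2 nodes at 12 processes per node,
## with 2 OpenMP threads each, so all 24 cores per node are still used
#PBS -l select=2:ncpus=24:mpiprocs=12 -q normal
#PBS -P MECH1234
#PBS -l walltime=6:00:00
cd /mnt/lustre/users/jblogs/FDS_Runs
module load chpc/compmech/FDS/FDS-6.7.9
export OMP_NUM_THREADS=2
## $PBS_NODEFILE contains 24 entries, one per MPI process, matching the 24 grids
nproc=`cat $PBS_NODEFILE | wc -l`
mpirun -np $nproc -machinefile $PBS_NODEFILE fds_openmp FDS_inputFile.fds > fds.out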