Please refer to the FDS home page for more information on FDS, an open-source code for fire simulation. The current version of the code is 6.7.0, and it has been installed in '/apps/chpc/compmech/CFD/FDS'. There is a module for FDS installed on Lengau. The command
module load chpc/compmech/FDS/6.7.0
sets up the appropriate environment.
#!/bin/bash
## Lines starting with the # symbol are comments, unless followed by ! or PBS,
## in which case they are directives
## The following PBS directive requests two 24 core compute nodes
## The $PBS_NODEFILE file will contain the hostnames for 16 MPI processes (2 X 8, as per mpiprocs)
## This is only meaningful if your model contains 16 grids
## The number of grids must match the number of MPI processes
#PBS -l select=2:ncpus=24:mpiprocs=8 -q normal
## Specify your own project shortcode here
#PBS -P MECH1234
## The walltime should be a small overestimate of the expected run time
## Requesting a very long walltime may delay the start of your job
## If the requested walltime is too short, the job will be killed before it is finished
#PBS -l walltime=6:00:00
## Obviously use your own paths here
#PBS -e /home/jblogs/lustre/FDS_Runs/stderr.txt
#PBS -o /home/jblogs/lustre/FDS_Runs/stdout.txt
## These two lines will send you an email on Abort, Begin and End of the job
## Obviously use your own real email address
#PBS -m abe
#PBS -M firstname.lastname@example.org
export PBS_JOBDIR=/home/jblogs/lustre/FDS_Runs
cd $PBS_JOBDIR
module load chpc/compmech/FDS/6.7.0
## Assign a sensible value for OMP_NUM_THREADS
## If your number of MPI processes is a multiple of
## 24, it will be best to set it to 1,
## and not use OpenMP at all
## A value greater than 3 does not help
export OMP_NUM_THREADS=3
## The number of MPI processes is extracted from the length of the machinefile $PBS_NODEFILE
nproc=`cat $PBS_NODEFILE | wc -l`
mpirun -np $nproc -machinefile $PBS_NODEFILE fds FDS_inputFile.fds > fds.out
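Because the job only runs efficiently when the number of grids equals the number of MPI processes, it can be worth checking this before mpirun is called. The following is a minimal sketch and not part of the installed FDS module: the function names count_meshes and check_mesh_count are hypothetical, and the check assumes that each grid is declared on its own line starting with the &MESH namelist group. Grids replicated through &MULT are not expanded, so the count will be too low for such models.

```shell
#!/bin/bash
# Hypothetical helper: count the lines that start an &MESH namelist group.
# Assumption: one &MESH per line; grids replicated with &MULT are NOT expanded.
count_meshes() {
    grep -c '^[[:space:]]*&MESH' "$1"
}

# Hypothetical helper: compare the grid count with the number of MPI processes.
# Returns 0 on a match, 1 on a mismatch, 2 if the input file cannot be read.
check_mesh_count() {
    local input_file=$1 nproc=$2 nmesh
    if [ ! -r "$input_file" ]; then
        echo "Cannot read $input_file" >&2
        return 2
    fi
    # grep -c exits non-zero when the count is 0, but still prints the count
    nmesh=$(count_meshes "$input_file") || true
    if [ "$nmesh" -ne "$nproc" ]; then
        echo "Mismatch: $nmesh grids in $input_file, but $nproc MPI processes" >&2
        return 1
    fi
    echo "OK: $nmesh grids, $nproc MPI processes"
}
```

In a job script, the check could be placed just before the mpirun line, for example as check_mesh_count FDS_inputFile.fds $nproc || exit 1, so that a mismatched job stops immediately instead of wasting its allocation.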
Smokeview is also installed on the system, and can be started with the command smv. However, this is only possible when using one of the visualisation nodes, chpcviz1 or chpclic1. Please read the instructions on setting up a VNC connection, and run Smokeview with the VirtualGL wrapper:
/opt/VirtualGL/bin/vglrun smokeview
It may also be practical to use the well-developed visualisation codes ParaView or VisIt. Please experiment and provide feedback.
FDS implements two forms of parallelisation: OpenMP threads and MPI-based domain decomposition.
OpenMP provides only a modest improvement in performance, but has the advantage of also working with a single-grid model. Going from 1 OpenMP thread to 2 gives a modest but helpful improvement, and going to 3 threads gives a further, very small improvement. Using more than 3 OpenMP threads gives no additional benefit.
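The thread count can be derived from the PBS resource request rather than hard-coded. The sketch below uses a hypothetical helper, omp_threads, which divides the cores per node by the MPI processes per node and caps the result at 3, in line with the observation that more threads give no further gain. It is an illustration under those assumptions, not a tuned recommendation.

```shell
#!/bin/bash
# Hypothetical helper: choose a value for OMP_NUM_THREADS.
# Arguments: cores per node (ncpus) and MPI processes per node (mpiprocs).
# The result is limited to the range 1..3.
omp_threads() {
    local ncpus=$1 mpiprocs=$2
    local threads=$(( ncpus / mpiprocs ))
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    if [ "$threads" -gt 3 ]; then
        threads=3
    fi
    echo "$threads"
}
```

For a request of ncpus=24:mpiprocs=8 this gives 3 threads, and for ncpus=24:mpiprocs=24 it gives 1, so a job script could use export OMP_NUM_THREADS=$(omp_threads 24 8).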
MPI parallel will only work properly if the model has been set up in such a way that the number of grids is equal to the number of MPI processes. Somewhat confusingly, the code will still run if this condition is not satisfied, but not efficiently. If there are more MPI processes than grids, the extra MPI processes will start and consume CPU resources, but not do any useful work. If there are more grids than MPI processes, the slowdown is quite dramatic. MPI parallel scaling is very good, provided that the number of grids matches the number of MPI processes, and that the grids are all similarly dimensioned.
The compute nodes in the Lengau cluster have 24 cores each. Good MPI scaling and efficiency are therefore achieved by developing models where the number of grids is a multiple of 12 or 24. Underloading the compute nodes, by running say 12 MPI processes per node, each with two OpenMP threads, will achieve the best results, at the expense of occupying more nodes. This is a typical characteristic of the performance of any CFD code, which is strongly constrained by memory bandwidth. Maximum performance is achieved by accessing the largest number of memory channels.
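The node arithmetic can be sketched with a small shell function. The helper name pbs_select is hypothetical. It takes the number of grids and the chosen number of MPI processes per node, and prints the matching select statement for Lengau's 24-core nodes. It refuses grid counts that do not divide evenly over the nodes, since that would leave the nodes unevenly loaded.

```shell
#!/bin/bash
# Hypothetical helper: print a PBS select statement for a given model.
# Arguments: number of grids, MPI processes per node (for example 12 or 24).
# Assumes 24-core compute nodes, as on Lengau.
pbs_select() {
    local ngrids=$1 per_node=$2
    if [ $(( ngrids % per_node )) -ne 0 ]; then
        echo "Number of grids ($ngrids) is not a multiple of $per_node" >&2
        return 1
    fi
    echo "select=$(( ngrids / per_node )):ncpus=24:mpiprocs=$per_node"
}
```

A model with 48 grids gives select=2:ncpus=24:mpiprocs=24 when the nodes are fully loaded, and select=4:ncpus=24:mpiprocs=12 when they are underloaded with two OpenMP threads per process. The second layout occupies twice as many nodes in exchange for more memory bandwidth per process.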