OpenMP provides only a modest improvement in performance, but has the advantage of also working with a single-grid model.  Going from 1 OpenMP thread to 2 provides a modest but helpful improvement, and going to 3 threads provides another very small improvement.  Using more than 3 OpenMP threads provides no further benefit.
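
For a single-mesh model the thread count is controlled with the OMP_NUM_THREADS environment variable.  The sketch below is illustrative only; the module name, executable name and input file are placeholders rather than the exact Lengau setup.

<code bash>
# Illustrative single-mesh FDS run using OpenMP only (no MPI).
# Module name, executable name and input file are placeholders.
module load chpc/fds          # hypothetical module name
export OMP_NUM_THREADS=2      # 2 threads give a modest but useful gain; >3 gives no further benefit
fds single_mesh_model.fds
</code>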
  
  
MPI parallel will only work if the model has been set up in such a way that the number of grids is equal to the number of MPI processes.  Somewhat confusingly, the code will still run if this condition is not satisfied, but not efficiently.  If there are more MPI processes than grids, the extra MPI processes will start and consume CPU resources, but not do any useful work.  If there are more grids than MPI processes, the slowdown is quite dramatic.  MPI parallel scaling is very good, provided that the number of grids matches the number of MPI processes and the grids are all similarly dimensioned.  The compute nodes in the Lengau cluster have 24 cores each.  Good MPI scaling and efficiency is therefore achieved by developing models where the number of grids is a multiple of 12 or 24.  Underloading the compute nodes, by running say 12 MPI processes per node, each with two OpenMP threads, will achieve the best results, at the expense of occupying more nodes.  This is a typical characteristic of the performance of any CFD code, which is strongly constrained by memory bandwidth.  Maximum performance is achieved by accessing the largest number of memory channels.
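
As an illustration of the node-loading advice above, the following PBS Pro job script sketch runs a 24-mesh model as 24 MPI processes placed 12 per node over two nodes, each process with 2 OpenMP threads.  The project code, queue name, module name, executable name and input file are assumptions to be adapted to your own environment.

<code bash>
#!/bin/bash
#PBS -P MYPROJECT                                   # project code (placeholder)
#PBS -q normal                                      # queue name (assumption)
#PBS -l select=2:ncpus=24:mpiprocs=12:ompthreads=2  # 2 nodes, 12 MPI ranks per node
#PBS -l walltime=12:00:00
# 24-mesh model: 24 MPI processes in total, 12 per node, 2 OpenMP threads each,
# so all 24 cores of a node are used while only 12 MPI ranks run on it.

cd $PBS_O_WORKDIR
module load chpc/fds                                # hypothetical module name
export OMP_NUM_THREADS=2
mpirun -np 24 fds model_24mesh.fds                  # executable name may differ per installation
</code>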
  
=== Scaling graphs ===