It is extremely important to verify that your code and problem will scale properly on the cluster before running big jobs. Here is a simple example of how one might perform such a check.
First, start with a problem that can be completed quickly, say one that takes about 10 minutes on a single node. The problem should not be trivially small, however, since a very small problem will not expose the parallelism properly. If possible, shorten the run by using fewer iterations or timesteps rather than by shrinking the problem itself.
Then submit the job on increasing numbers of nodes and check whether the runtime comes down as expected. As an example of how to go about this, here are two scripts: the first is the PBS job script, and the second is a simple shell script that submits the PBS script with different parameters.
```bash
#!/bin/bash
#PBS -l walltime=10:00:00
#PBS -q normal
#PBS -M YOUR@EMAIL.ADDRESS
#PBS -m be
#PBS -V
#PBS -o /mnt/lustre/users/USERNAME/scaling_test/test1.out
#PBS -e /mnt/lustre/users/USERNAME/scaling_test/test1.err

module add ### MODULES NEEDED

# One MPI rank per entry in the node file, no OpenMP threading
NP=`cat ${PBS_NODEFILE} | wc -l`
export OMP_NUM_THREADS=1

EXE="mdrun_mpi"
ARGS="-deffnm md -g md.${NP}.log"

cd /mnt/lustre/users/USERNAME/scaling_test
mpirun -np ${NP} -machinefile ${PBS_NODEFILE} ${EXE} ${ARGS}
```
and
```bash
#!/bin/bash
first=1
for i in 1 2 4 8 10
do
    select="select=${i}:ncpus=24:mpiprocs=24:nodetype=haswell_reg,place=excl"
    name="${i}_g_scale"
    if [ ${first} -eq 1 ]
    then
        # The first job has no dependency
        previous=`qsub -l ${select} -N ${name} scaling_test.qsub`
        first=0
    else
        # Each subsequent job starts only once the previous one has finished successfully
        current=`qsub -l ${select} -N ${name} -W depend=afterok:${previous} scaling_test.qsub`
        previous=${current}
    fi
done
```
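The PBS script already writes a GROMACS log for each run, but a simple, application-independent way to record the runtimes is to time the mpirun command inside the job script itself. The snippet below is only a sketch: the runtimes.dat file name is a hypothetical choice, and the elapsed time is measured with plain date timestamps.

```bash
# Possible replacement for the mpirun line in the job script:
# time the run and append "<nodes> <runtime in seconds>" to runtimes.dat (hypothetical file name).
NNODES=`sort -u ${PBS_NODEFILE} | wc -l`     # number of distinct nodes allocated to this job
start=$(date +%s)
mpirun -np ${NP} -machinefile ${PBS_NODEFILE} ${EXE} ${ARGS}
end=$(date +%s)
echo "${NNODES} $(( end - start ))" >> runtimes.dat
```

Each completed job then appends one row to this file.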
The (fake) results are presented in the following table:
| Number of nodes | Runtime (seconds) |
|---|---|
| 1 | 344 |
| 2 | 181 |
| 4 | 105 |
| 8 | 55 |
| 10 | 43 |
These results are best interpreted in the form of a graph:
This graph looks OK (the runtime comes down as the number of nodes goes up), but it is actually quite difficult to use for finding the optimum number of nodes. It is much more useful to plot the reciprocal of the runtime, for example the number of runs per hour:
This clearly shows that the application is scaling nearly linearly over this range. Adding more nodes will almost certainly make it run faster.
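If gnuplot happens to be available on the cluster, such a runs-per-hour plot can be generated straight from the collected data. The following is only a sketch, assuming the hypothetical runtimes.dat format used above (node count and runtime in seconds per line):

```bash
# Convert runtimes (seconds) to runs per hour, then plot nodes vs runs/hour.
awk '{ print $1, 3600 / $2 }' runtimes.dat > runs_per_hour.dat

gnuplot <<'EOF'
set terminal png
set output 'runs_per_hour.png'
set xlabel 'Number of nodes'
set ylabel 'Runs per hour'
plot 'runs_per_hour.dat' using 1:2 with linespoints notitle
EOF
```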
The following two graphs illustrate a case that does not scale well. The runtime graph does not look as good as in the first example, but it is not immediately clear where the scaling starts to break down. The “Runs per hour” graph provides much better insight.
In this case there is definitely no point in using more than 8 nodes, and the scaling already shows diminishing returns beyond 2 nodes. Here it may be more efficient to use fewer nodes per run and instead submit more simultaneous runs.
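The same point can be made numerically by computing speedup and parallel efficiency: for the first table above, 10 nodes give a speedup of 344/43, roughly 8, which is an efficiency of about 0.8. The sketch below again assumes the hypothetical runtimes.dat format, with the single-node run on the first line:

```bash
# Speedup = T(1 node) / T(N nodes); parallel efficiency = speedup / N.
# As a rough rule of thumb, an efficiency well below about 0.7 means the extra nodes are largely wasted.
awk 'NR == 1 { t1 = $2 }
     { printf "%3d nodes: speedup = %5.2f, efficiency = %4.2f\n", $1, t1 / $2, t1 / ($2 * $1) }' runtimes.dat
```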