Scaling

It is extremely important to verify that your code and your problem scale properly on the cluster before running big jobs. Here is a simple example of how one might perform such a check.

First, start with a relatively small problem – say one that would take 10 minutes or so on a single node (see footnote 1).

Then submit the same job on increasing numbers of nodes and check whether the runtime comes down as expected. As an example of how you might go about this, I have two scripts: the first is the PBS job script, and the second is a simple shell script that submits the PBS script with different parameters.

scaling_test.qsub
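#!/bin/bash
#PBS -l walltime=10:00:00
#PBS -q specialq
#PBS -M dkennedy1@csir.co.za
#PBS -m be
#PBS -V
#PBS -e /export/home/username/scratch5/scaling_test/stderr.txt
#PBS -o /export/home/username/scratch5/scaling_test/stdout.txt

# Make the GROMACS module available inside the job.
MODULEPATH=/opt/gridware/bioinformatics/modules:$MODULEPATH
source /etc/profile.d/modules.sh
module add gromacs/4.6.5_nehalem

# One MPI process per line of the node file, one OpenMP thread per process.
NP=$(wc -l < "${PBS_NODEFILE}")
export OMP_NUM_THREADS=1

EXE="mdrun_mpi"
# Tag the log file with the process count so runs do not overwrite each other.
ARGS="-deffnm md -g md.${NP}.log"

# Stop here if the working directory is missing rather than running elsewhere.
cd /export/home/username/scratch5/scaling_test || exit 1
mpirun -np "${NP}" -machinefile "${PBS_NODEFILE}" ${EXE} ${ARGS}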

and

submit_test.sh
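#!/bin/bash
# Submit scaling_test.qsub once per node count. Each job depends on the
# previous one (afterok), so the runs execute one at a time, in order.

first=1

for i in 1 2 4 8 10 16 20 32 40 64
do
  select="select=${i}:ncpus=8:mpiprocs=8:jobtype=nehalem,place=excl"
  name="${i}_g_scale"
  if [ "${first}" -eq 1 ]
  then
    # The first job has no dependency; keep its job ID for the next one.
    previous=$(qsub -l "${select}" -N "${name}" scaling_test.qsub)
    first=0
  else
    current=$(qsub -l "${select}" -N "${name}" -W depend=afterok:"${previous}" scaling_test.qsub)
    previous=${current}
  fi
done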
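
The page does not say how the runtimes were collected. One possible approach, sketched here under assumptions, is to time the command inside the job script with bash's built-in SECONDS counter and append the result to a file. The function name time_run and the file name runtimes.txt are hypothetical and not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper (not in the original scripts): run a command and
# append "<label> <elapsed whole seconds>" to a results file.
time_run() {
  local label="$1" results="$2"
  shift 2
  local start=${SECONDS}   # SECONDS counts whole seconds since the shell started
  local status=0
  "$@" || status=$?        # run the command, remembering its exit status
  echo "${label} $(( SECONDS - start ))" >> "${results}"
  return "${status}"
}

# Assumed usage inside scaling_test.qsub, in place of the bare mpirun line:
#   time_run "${NP}" runtimes.txt \
#     mpirun -np "${NP}" -machinefile "${PBS_NODEFILE}" ${EXE} ${ARGS}
```

Note that NP in the job script is the number of MPI processes, not the number of nodes; with mpiprocs=8 the node count is NP divided by 8.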

When I look at the results I see that the corresponding walltimes are:

Number of nodes   Runtime (seconds)
              1                 344
              2                 181
              4                 105
              8                  55
             10                  43
             16                  30
             20                  25
             32                  21
             40                  15
             64                  13
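
To judge whether the runtime "comes down as expected", it can help to turn the walltimes into speedup and parallel efficiency. The sketch below uses the common definitions, speedup = T(1) / T(N) and efficiency = speedup / N, with the first row taken as the baseline. The function name scaling_report is hypothetical; the input is the table above as "nodes seconds" pairs.

```shell
#!/bin/bash
# Hypothetical helper: read "nodes seconds" pairs on stdin and print
# speedup and parallel efficiency relative to the first row.
scaling_report() {
  awk '
    BEGIN { printf "%5s %8s %8s %8s\n", "nodes", "time(s)", "speedup", "eff(%)" }
    NR == 1 { base_n = $1; base_t = $2 }   # baseline run
    {
      speedup = base_t / $2
      efficiency = 100 * speedup * base_n / $1
      printf "%5d %8d %8.2f %8.1f\n", $1, $2, speedup, efficiency
    }'
}

scaling_report <<'EOF'
1 344
2 181
4 105
8 55
10 43
16 30
20 25
32 21
40 15
64 13
EOF
```

For these numbers the efficiency stays around 80% up to 10 nodes and falls to roughly 41% at 64 nodes, which is the kind of drop-off to look for before committing to a large allocation.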
1) This may not always be appropriate – sometimes a larger problem is needed before the benefit of parallelism can be seen properly.
/var/www/wiki/data/pages/scaling/start.txt · Last modified: 2014/12/11 17:12 by dane