A Guide to the Moab Job Scheduler at the CHPC

To submit jobs to run on CHPC systems, you must write a shell script that runs your job, and then use msub to put it on the queue. msub needs information such as which cluster to run your code on and where to write any errors or output from the job. This extra information can be embedded in the script, disguised from the shell as special comments, or given on the command line when submitting the job.
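
For example, the same resource request can be written inside the script as special comments:

#MSUB -l nodes=1:ppn=8
#MSUB -l walltime=1:00:00

or given on the command line when submitting (myjob.job is a hypothetical script name and the resource values are only illustrative):

msub -l nodes=1:ppn=8 -l walltime=1:00:00 myjob.job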

SUN HYBRID SYSTEM

PARTITIONS

The Sun system is divided into partitions where each partition represents a separate physical system:

  1. dell for the new Dell Intel Westmere cluster
  2. westmere for the Sun Intel Westmere cluster
  3. harpertown for the Intel Harpertown cluster
  4. nehalem for the Intel Nehalem cluster
  5. sparc for the Sparc M9000 SMP
  6. viz for the AMD Opteron visualization node
  7. test for the test (pre- and post-processing) nodes (also Intel Nehalem chipsets)

Users can specify different cluster partitions by adding a -l feature flag in their job submission script or when running the msub command. For example

msub -l feature=nehalem

You can choose more than one partition, provided of course that your code can run on the requested hardware. The syntax looks like:

msub -l "feature=nehalem|harpertown"

to use both the nehalem and harpertown clusters (the quotes are needed on the command line so that the shell does not treat | as a pipe).

Sun and Dell Westmere Clusters

The new Westmere clusters are now open for use, which we hope will reduce waiting times and the number of queued jobs. The new Westmere clusters have their own partitions, called “westmere” for the Sun Westmere cluster and “dell” for the Dell Westmere cluster.

NB: The new Dell cluster runs a different O/S from the Sun clusters. Please see the special instructions in the Dell section of this user guide.

Please note that each Westmere node has 12 cores, so change the processors-per-node value from ppn=8 to ppn=12, but only when you run on the westmere or dell partition.

To run on one of the Westmere clusters specify the westmere or dell partition as follows:

For example, in your script

#MSUB -l feature=westmere
#MSUB -l nodes=1:ppn=12 

Or on the command line

msub -l feature=westmere -l nodes=1:ppn=12 <scriptname>

See the Dell section of this user guide for information on using the new Dell Westmere cluster.

QUEUES

The Moab scheduler has several different queues on the Sun systems for different sized and priority jobs:

  1. small (min processors = 1, max processors = 8, max walltime = 336 hours)
  2. par32 (min processors = 9, max processors = 32, max walltime = 336 hours)
  3. par64 (min processors = 33, max processors = 64, max walltime = 336 hours)
  4. big (min processors = 65, max processors = 128, max walltime = 336 hours)
  5. test (min processors = 1, max processors = 8, max walltime = 10 minutes)
  6. interactive (max processors = 2)
  7. special (min processors = 129, max processors = 512, max walltime = 336 hours)
  8. priority (min processors = 129, max processors = 768, max walltime = 336 hours)

NOTE: Users need to log a call at http://www.chpc.ac.za/apply-additional/ two weeks in advance if they need to use the special or priority queues.

Note that it is not necessary for users to specify a queue: Moab automatically chooses the correct queue from the information supplied in your job script, i.e. the number of processors (the -l nodes=...:ppn=... request) and the walltime (the -l walltime=... request).
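
For example, a request for 4 nodes with 8 processors each (32 processors in total) falls within the par32 limits above and would be routed to that queue; the walltime value and script name here are only illustrative:

msub -l nodes=4:ppn=8 -l walltime=24:00:00 mpi.job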

Job script example:

MPI job

Edit the file with, for example, vi mpi.job

#!/bin/sh
#MSUB -l nodes=1:ppn=8
#MSUB -l feature=nehalem|harpertown
#MSUB -l walltime=3:00:00
#MSUB -m be
#MSUB -V
#MSUB -o /export/home/username/scratch/stdout
#MSUB -e /export/home/username/scratch/stderr
#MSUB -d /export/home/username/scratch
#MSUB -mb
##### Running commands
exe=/opt/gridware/usersexecutable
nproc=`cat $PBS_NODEFILE | wc -l`
mpirun -np $nproc -machinefile $PBS_NODEFILE $exe > users.out

To submit this job (the mpi.job file) use the msub command:

msub mpi.job

By default this job will be submitted to the small queue, since it only requests 8 processors.

NB: Your job MUST be located on the scratch partition, otherwise it will fail to run.
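
msub prints the ID of the submitted job. You can then monitor the job with the Moab commands summarised at the end of this guide, for example (username is a placeholder):

showq -u username
checkjob <jobid>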

Interactive job

You have to specify the -I flag when submitting your job, i.e.

msub -I

There will be a delay while the scheduler allocates and logs you into a free compute node. You can target a specific free node:

msub -I -l nodes=cnode-9-23
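
You can also combine -I with the usual resource requests; the feature and walltime values here are only illustrative:

msub -I -l feature=nehalem -l walltime=1:00:00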

Test jobs

Users can use the same script and just change the walltime to 10 minutes or less and then change

#MSUB -l feature=nehalem|harpertown

to

#MSUB -l feature=test

in the job script.
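
In other words, based on the MPI example above, the relevant directives of a test job might look like this:

#MSUB -l nodes=1:ppn=8
#MSUB -l feature=test
#MSUB -l walltime=00:10:00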

GPU CLUSTER

PARTITIONS

  1. C2070 (8 of these Nvidia Fermi GPU cards are available on two nodes)
  2. C1060 (12 of these Nvidia Tesla GPU cards are available on three nodes)

Note that the GPU cluster has 5 compute nodes, each with 16 processors and 4 GPUs. In addition, the head node (login node) has one C2070 and one C1060 GPU card.
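
GPUs are requested by combining the gpus attribute with the partition feature. For example, the following request asks for one node with four C1060 cards (gpu.job is a hypothetical script name, and feature=c1060 is assumed here by analogy with the feature=c2070 used in the example below):

msub -l nodes=1:ppn=16:gpus=4 -l feature=c1060 gpu.job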

Job script example:

MPI job over InfiniBand

Edit with vi mpi.job as before.

###These lines are for Moab
#MSUB -l nodes=2:ppn=16:gpus=4
#MSUB -l feature=c2070
#MSUB -l walltime=168:00:00
#MSUB -m be
#MSUB -V
#MSUB -o /GPU/home/username/stdout
#MSUB -e /GPU/home/username/stderr
#MSUB -d /GPU/home/username
#MSUB -mb
##### Running commands
echo "original machine file is:"
echo "++++++++++"
cat $PBS_NODEFILE
echo "++++++++++"
cat $PBS_NODEFILE|sed -e 's/.*/&-ib/'>$PBS_STEP_OUT.hostfile
echo "modified machine file is:"
echo "++++++++++"
cat $PBS_STEP_OUT.hostfile
exe=/path/to/executable
nproc=`cat $PBS_NODEFILE | wc -l`
cd /GPU/home/username/testjob/
# use the modified hostfile so that MPI communication goes over the InfiniBand interfaces
mpirun -np $nproc -machinefile $PBS_STEP_OUT.hostfile $exe > /GPU/home/username/testjob/OUTPUT

Type msub mpi.job at the Linux command line to submit your job.

In this example we have requested two nodes, each with four C2070 GPUs and 16 processors, i.e. eight GPUs and 32 processors in total.

(Obsolete) BlueGene/P CLUSTER

The BlueGene has been decommissioned and is no longer available.

Users can only use 32, 64, 128, 256 or 512 processors (ppn).

Job script example:

MPI job

The mpi.job file contains:

###These lines are for Moab
#MSUB -l ppn=128
#MSUB -l walltime=168:00:00
#MSUB -m be
#MSUB -V
#MSUB -o /CHPC/work/username/testjob/out
#MSUB -e /CHPC/work/username/testjob/err
#MSUB -d /CHPC/work/username/testjob/
#MSUB -mb
##### Running commands
/bgsys/drivers/ppcfloor/bin/mpirun -np 128 \
-mode VN -exe LOCATIONOFTHEEXECUTABLE

As before, use msub mpi.job to submit the job.

Moab Summary

SUBMITTING A JOB USING MOAB

Create a test job using a text editor: the job script file is a shell script and should be a plain ASCII text file with Unix-style line endings. The easiest way to create the file is to use one of the standard Linux text editors, for example vi or vim:

vi test.job

to create a file called test.job. Note that you do not have to use the extension .job for your file but it is a very useful convention that allows you to easily distinguish job script files from other script files (which end in .sh by convention).
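
If you edited the script on a Windows machine, the Windows line endings can prevent it from running; assuming the dos2unix utility is installed on the login node, you can convert the file first:

dos2unix test.job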

Include the following in your script; for more information check http://www.clusterresources.com. To get the definitions of the MSUB statements, and details of other flags, run the command man msub or see the online version.

###These lines are for Moab
#MSUB -l nodes=4:ppn=8
#MSUB -l walltime=2:00:00
#MSUB -m be
#MSUB -o /export/home/nmonama/scratch/test/dlpoly.3.07.out
#MSUB -e /export/home/nmonama/scratch/test/dlpoly.3.07.err
#MSUB -d /export/home/nmonama/scratch/test
#MSUB -mb
#MSUB -M nmonama@csir.co.za

##### Running commands
NP=`cat $PBS_NODEFILE | wc -l`
mpirun -np $NP -machinefile $PBS_NODEFILE DLPOLY_3.07.SPARC.Y

Details of each line of the job script:

SCRIPT               Description/Notes

#MSUB -a             Declares the time after which the job is eligible for execution. Syntax: [[[[CC]YY]MM]DD]hhmm[.SS] (brackets delimit optional items, with the default being the current date/time).
#MSUB -A account     Defines the account associated with the job.
#MSUB -e	       Defines the file name to be used for stderr.
#MSUB -d path	       Specifies the directory in which the job should begin executing.
#MSUB -h	       Put a user hold on the job at submission time.
#MSUB -j oe	       Combine stdout and stderr into the same output file
#MSUB -l string      Defines the resources that are required by the job
#MSUB -m options     Defines the set of conditions (a=abort,b=begin,e=end) when the server will send a mail message about the job to the user
#MSUB -N name	       Gives a user specified name to the job
#MSUB -o filename    Defines the file name to be used for stdout.
#MSUB -p priority    Assigns a user priority value to a job
#MSUB -q queue       Run the job in the specified queue (short.q,long.q,graphics.q,serial.q and interactive.q)
#MSUB -r y           Automatically rerun the job if there is a system failure
#MSUB -S path	       Specifies the shell which interprets the job script. The default is your login shell.
#MSUB -v list	       Specifically adds a list (comma separated) of environment variables that are exported to the job
#MSUB -V    	       Declares that all environment variables in the msub environment are exported to the batch job.
#MSUB -W	       This option has been deprecated and should be ignored.
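
As an illustration, the directives below combine several of these flags in one script; the job name, paths and e-mail address are hypothetical:

#MSUB -N my_test_run
#MSUB -l nodes=2:ppn=8
#MSUB -l walltime=12:00:00
#MSUB -j oe
#MSUB -o /export/home/username/scratch/my_test_run.log
#MSUB -d /export/home/username/scratch
#MSUB -m abe
#MSUB -M username@example.com
#MSUB -V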

To submit the test job script use the following on the Linux command line:

msub test.job

NB: Your job should always write its output to scratch. It will be faster if it reads its input from there as well.
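
For example, in the running-commands part of the script you might change to a scratch directory before launching your program (the path and program name are only illustrative):

cd /export/home/username/scratch/myrun
mpirun -np $NP -machinefile $PBS_NODEFILE ./myprogram > myrun.out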

There are various Moab (and LoadLeveler) commands that can help you to manage your jobs. Some of the Moab commands are as follows:

Command      Description
msub         Scheduler job submission
showq        Show queued jobs
canceljob    Cancel job
showres      Show existing reservations
showstart    Show estimate of when job can or will start
checkjob     Provide report for specified job
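
Typical usage looks like this; the job ID 12345 is purely illustrative:

showq
showstart 12345
checkjob 12345
canceljob 12345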