
Job submission using PBSPro

As of January 2014, CHPC uses a new scheduler: PBSPro.

The main user documentation can be found at http://resources.altair.com/pbs/documentation/support/PBSProUserGuide12.1.pdf

Below is a summary of how to submit jobs at CHPC.

Common PBS commands

  • qsub - submit a batch job
  • qstat - see all jobs and their status
  • qdel - delete a job
  • qalter - modify options on a pending job
  • qsig - send a signal to a job (e.g. to terminate it)

How to submit a batch job

Use the qsub command to submit a script file to the scheduler. PBS-Pro jobs can be controlled using command-line switches, or by directives in the job submission file prefixed with the #PBS keyword.

See the qsub man page for a complete list of options.

The first line of a script file may specify the interpreter; e.g., bash is selected by using #!/bin/bash as the first line of the job script file.

A script can include just about anything you could do during a terminal session, such as setting environment variables, changing directories, and moving files.
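For example, a minimal job script might look like this (the job name and resource values are placeholders, not CHPC recommendations):

#!/bin/bash
#PBS -N myjob              # job name (placeholder)
#PBS -l select=1:ncpus=1   # one core on one node
#PBS -l walltime=00:10:00
cd $PBS_O_WORKDIR          # run from the directory the job was submitted from
echo "Running on $(hostname)"

Save it as myscriptname.sub and submit it with: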

qsub myscriptname.sub

How to start an interactive job on a compute node

qsub -I -V

to start an interactive session on a compute node; the -V switch exports all environment variables from your current shell to the job.
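Resource requests can be added on the command line in the same way as #PBS directives; for example (the core count, walltime and queue here are placeholder values):

qsub -I -V -l select=1:ncpus=4 -l walltime=02:00:00 -q workq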

How to monitor job submissions to the queue

When a job is accepted by the scheduler, qsub returns its job number. Use this number to check on the progress of a specific job:

qstat -f <jobnum>

To inspect jobs that have already finished:

qstat -x -f <jobnum>

or

qstat -xf <jobnum>
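To list all of your own current jobs rather than one specific job, filter by user (a standard qstat option):

qstat -u $USER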

How to monitor job arrays

To see the status of an array job on the queue, append empty square brackets to the job number:

qstat -J <jobnum>[]

To look at one specific sub-job in the array, add -t to the qstat arguments:

qstat -xft <jobnum>[array-index]

e.g.

qstat -xft 967890[2]

PBS-Pro Job Exit Codes

Job Exit Codes Between 0 and 128 (or 256)

This is the exit value of the top process in the job, typically the shell. This may be the exit value of the last command executed in the shell or the .logout script if the user has such a script (csh).

Job Exit Codes >= 128 (or 256)

This means the job was killed by the PBS scheduler by sending it a signal.

The modulo (remainder) operator (see http://en.wikipedia.org/wiki/Modulo_operation) is used to interpret (or "decode") PBS job exit values.

The signal is given by the exit value modulo 128 (or 256). An exit value of 137 means the job's top process was killed with signal 9, because 137 mod 128 = 9.

Depending on the system (see man wait(2)), an exit value greater than 128 or 256 encodes the signal that PBS-Pro sent to kill the job, e.g. 265 mod 256 = 9, or 271 mod 256 = 15.

Exit code 271 means that the job exceeded a limit requested at submission, e.g. walltime or CPU usage, and was killed (signal 15, SIGTERM).

See man kill for a mapping of signal numbers to signal names on your operating system.
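As a quick check from the shell (standard bash arithmetic and the bash kill builtin):

echo $(( 271 % 256 ))   # prints 15
kill -l 15              # prints TERM, i.e. SIGTERM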

Terminate a job by signal

Send a signal to the job, e.g. SIGTERM to ask it to terminate cleanly:

qsig -s SIGTERM <jobnum>
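A job that ignores SIGTERM can be killed outright with SIGKILL (which cannot be caught or ignored), or simply deleted with qdel:

qsig -s SIGKILL <jobnum>
qdel <jobnum>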

Examples: PBS-Pro job submission scripts

Example 1: Gaussian jobs

Example 2: Amber jobs

Example 3: Quantum Espresso

Example 4: empty job

#! /bin/sh
#PBS -N <jobname>
#PBS -l select=<numberofnodes>:ncpus=<totalcores>:jobtype=<architecture>:place=free:group=nodetype
#PBS -l walltime=<hours>:<minutes>:<seconds>
#PBS -q <queue>
#PBS -m be
#PBS -o <path to your output file>
#PBS -e <path to your error file>
source /etc/profile.d/modules.sh   # so the module command works
cd $PBS_O_WORKDIR   # change to the directory the job was submitted from

mpirun -np <totalcores> <executable>

#Where
#<jobname> = name of the job
#<numberofnodes> = number of nodes (hosts) requested
#<totalcores> = total number of cores
#<architecture> = architecture where the jobs will run (nehalem, westmere, dell)
#<queue> = submission queue (workq, priorityq, specialq, intel_mic, spark, kepla_k20, accelrys)
#<executable> = binary to be run

Note that the M9000 SPARC queue is called 'spark'.

Example 5: MPI example

Syntax

...
#PBS -l walltime=$HOURS:$MINUTES:$SECONDS
#PBS -l select=${NO_OF_HOSTS}:ncpus=${CPUS_PER_HOST}:mpiprocs=${CPUS_PER_HOST}:jobtype=$FEATURE
#PBS -q $QUEUE
...
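A complete script following this pattern might look like the following sketch; the node count, core count, walltime, queue and binary name are placeholder assumptions, not CHPC-specific values:

#!/bin/bash
#PBS -N mpi_example
#PBS -l select=2:ncpus=12:mpiprocs=12:jobtype=nehalem
#PBS -l walltime=01:00:00
#PBS -q workq
source /etc/profile.d/modules.sh   # so the module command works
# module load <your MPI module>    # site-specific; check 'module avail'
cd $PBS_O_WORKDIR
mpirun -np 24 ./my_mpi_program     # 2 hosts x 12 cores = 24 MPI ranks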

Example 6: job array

See here for a longer explanation.

simple_job_array.pbs
#! /bin/bash
#PBS -l select=1:ncpus=1
#PBS -l walltime=00:01:00
#PBS -o /lustre/SCRATCH5/users/username/simple_jobarray.stdout
#PBS -e /lustre/SCRATCH5/users/username/simple_jobarray.stderr
#PBS -M youremail@address.com
#PBS -m abe
#PBS -N J_example
#PBS -J 1-24
 
 
# Make sure we're in the right working directory
cd /export/home/username/scratch5/
 
#sleep a random number of seconds (1-10)
sleep $(( ( RANDOM % 10 ) + 1 ))s
 
#see what happens when we write to a file...
echo "Wawaweewaa! This is sub-job ${PBS_ARRAY_INDEX} of job ${PBS_JOBID} running on ${HOSTNAME} run by ${USER} writing to stdout!"
echo "Wawaweewaa! This is sub-job ${PBS_ARRAY_INDEX} of job ${PBS_JOBID} running on ${HOSTNAME} run by ${USER} writing to stderr!" 1>&2
echo "Wawaweewaa! This is sub-job ${PBS_ARRAY_INDEX} of job ${PBS_JOBID} running on ${HOSTNAME} run by ${USER} appending to simple_jobarray.txt" >> simple_jobarray.txt

One thing to note: the stdout and stderr outputs do not all get joined together as they might with an MPI job.
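To try it out, submit the script and watch the sub-jobs with the commands from the job-array section above:

qsub simple_job_array.pbs
qstat -J <jobnum>[]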

Further examples

  • Various bioinformatics examples illustrate some further possibilities. Specifically, the BLAST examples illustrate job arrays and job dependencies.
  • This GROMACS example shows an interesting case where MPI, OpenMP and GPUs are used in a single job.