Tips and Tricks

Allowing or preventing rerunning

Under certain conditions, such as when recovering from system faults, PBS may rerun jobs that had previously been interrupted. Depending on your particular setup, this is not necessarily beneficial. If, for example, your software regularly writes restart data, but by default starts a run from scratch unless otherwise specified, you definitely want to suppress rerunning, because it would overwrite existing results. Add the following line to your PBS directives in the job script:

#PBS -r n

On the other hand, if your software is set up to resume automatically from the last data written, PBS should be permitted to rerun the process:

#PBS -r y
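In context, such a directive sits with the rest of the PBS header at the top of your job script. A minimal sketch, in which the project code, queue, and solver command are placeholders to be replaced with your own:

```shell
#!/bin/bash
#PBS -P MECH1234                       # hypothetical project short code
#PBS -q normal
#PBS -l select=1:ncpus=24:mpiprocs=24
#PBS -l walltime=2:00:00
#PBS -r n                              # suppress rerunning: a restart from scratch would overwrite results
cd $PBS_O_WORKDIR
./my_solver input.dat                  # hypothetical compute command
```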

Checking for zombie processes

The PBS scheduler cleans up each compute node on the completion of a job. However, under certain conditions, it is possible for the scheduler to be unaware of rogue processes left on compute nodes. These may interfere with other users' processes. If your job is running significantly slower than expected, it may be worth checking for the presence of rogue processes. This can be done quite easily by adding the following lines to your job script, preferably just before launching the actual compute process.

myname=`whoami`
for host in `sort -u $PBS_NODEFILE`
do
  echo $host `ssh $host ps hua -N -u root,dbus,nslcd,ntp,rpcuser,rpc,$myname`
done

This will produce a list of your compute nodes, together with any processes not belonging to yourself or the system. If you do find compute processes belonging to other users, you should log into the compute node concerned, and run top to see if your processes are suffering as a result of the rogue process. Also submit a helpdesk ticket to inform the system administrators.

Please slow down and work methodically

  • Do NOT bring a complicated script from your own system and insist on trying to run it as is on 10 cluster nodes for your first cluster run. It has NEVER worked for anybody else, and it is NOT going to work for you.
  • Start by obtaining an interactive session.
  • Work through your process in step-by-step fashion and fix your problems at each step before advancing to the next stage.
  • Do not attempt to solve your entire problem at the first attempt. It does not work. It WILL break, leaving you with a complicated mess to untangle, and it will unendear you to the CHPC staff tasked with sorting out YOUR mess.
  • Start with a smaller and simplified version of your problem, and satisfy yourself that it works on a single node.
  • Once you are happy that it works on a single node, try it on two nodes.
  • Do not move on before you have proved to yourself that it is working as expected, and is faster than a single node.
  • Add complexity and compute nodes only once you are totally satisfied that EVERYTHING is working properly.
  • Please bear in mind that you also need to prove to the CHPC that you are competent to run very large cases. You can only do this by starting with small cases and demonstrating that you can run them efficiently.
  • Now, and only now, may you start thinking of working on automating your process. It might just work.
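In practice, the scaling-up steps above amount to little more than changing the resource request in your job script. Assuming the standard 24-core Lengau compute nodes, the single-node and two-node test runs differ only in the select statement:

```shell
# Single-node test run:
#PBS -l select=1:ncpus=24:mpiprocs=24

# Two-node test run, only once the single-node case demonstrably works:
#PBS -l select=2:ncpus=24:mpiprocs=24
```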

The DOS vs Unix end of line character problem

If you have created an ASCII file on Windows, and transferred the file to the cluster, you may experience a subtle problem. The background to this is that DOS (and thus also Windows) terminates each line in a text file with both a carriage return and a linefeed character. Unix (and thus also Linux) uses a linefeed character only. Some Linux applications have a built-in way of handling this difference, but most don't. A PBS script that has not been corrected will fail with output that looks like “/bin/bash^M: bad interpreter: No such file or directory”. This problem is trivially easy to fix. Simply run the utility dos2unix on the offending file. For example:

 dos2unix runMyJob.pbs 

Running the utility on a file that has already been converted will do no damage to it, and attempting to run it on a binary file will result in a warning message. There is also a unix2dos utility that can convert a file back to Windows format. These utilities are available on the login and visualisation nodes, but not the compute nodes.

Most codes use ASCII input or run script files. These may or may not be affected by this problem, but if you get weird or unexpected behaviour, run dos2unix on all the ASCII files.
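If you are unsure whether a file is affected, you can check for the carriage-return characters directly. A small sketch, using a hypothetical file name, with sed as a stand-in conversion in case dos2unix is not on your path:

```shell
# Create a file with DOS line endings, the way Windows would (illustration only)
printf 'echo hello\r\n' > runMyJob.pbs

# Count lines containing a carriage return; a non-zero count means conversion is needed
grep -c $'\r' runMyJob.pbs

# Strip the carriage returns in place; this is what dos2unix does to the line endings
sed -i 's/\r$//' runMyJob.pbs
```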

Using directory and file names containing spaces

The Linux operating system can deal with directory and file names containing spaces. That does not mean that your command line instruction, script or application will handle them correctly. The simple answer is “DON'T”. Also do not expect any sympathy from CHPC staff if you have used spaces and cannot find the reason why your job is not working correctly. For that matter, don't use double dots either. If you are having difficulties, and we see something that looks like this

 My File..data   

you will be entitled to a severe reprimand.
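If you have already created such names, the safest cure is to rename the offending files. A minimal bash sketch that replaces every space with an underscore in the current directory (the underscore is an arbitrary choice):

```shell
# Rename every file in the current directory whose name contains a space,
# replacing each space with an underscore (bash parameter expansion)
for f in *' '*; do
  [ -e "$f" ] || continue       # skip if no file name actually matched the glob
  mv -- "$f" "${f// /_}"
done
```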

Keeping your ssh login sessions alive

By default, unused ssh sessions time out after about 20 minutes. You can keep your ssh session alive by following the instructions on How-To Geek. In summary, in your ~/.ssh directory on your workstation, create a file called config. This file should contain the following line:

 ServerAliveInterval 60  
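A complete ~/.ssh/config entry can also restrict the setting to the cluster and save you some typing at the same time. A sketch, with a hypothetical username:

```
Host lengau
    HostName lengau.chpc.ac.za
    User jblogs
    ServerAliveInterval 60
```

With this in place, typing ssh lengau is all that is needed to connect.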

Using an interactive PBS session

The CHPC follows a fairly strict policy on CPU usage on the login node. As a consequence, any moderately intensive task, such as unzipping a large file, will be killed very quickly. In order to get around this problem, use an interactive session for more or less everything. The syntax for obtaining an interactive session is:

 qsub -I -l select=1:ncpus=4:mpiprocs=4 -q serial -P MECH1234 -l walltime=3:00:00   

Take note of the following:

  • Obviously use your project short code instead of MECH1234.
  • Yes, it is tedious to type that in every time. Edit your .bashrc file, and define an alias as follows: alias qsubi="qsub -I -l select=1:ncpus=4:mpiprocs=4 -q serial -P MECH1234 -l walltime=3:00:00" . Now typing in the command qsubi will do the job for you.
  • Please customize the command to suit your requirements. If you are going to need it all day, use walltime=8:00:00 . In this example we are asking for 4 processes. You can ask for more or fewer, depending on what you need to do. If you want a full node, use ncpus=24:mpiprocs=24.
  • You can use -l select=2:ncpus=24:mpiprocs=24 -q normal, for example, which will give you two complete nodes. This way you can test in interactive mode if your program runs in distributed parallel mode. You will need to know which nodes you have been allocated: cat $PBS_NODEFILE will give you the contents of your machinefile.
  • Once you have an interactive session, you can also ssh into that node separately from the login node. This is very handy, because you can now get multiple sessions with different environments without having to exit and restart an interactive PBS session.

Running software with GUIs

The usual ssh-session and interactive PBS sessions do not by default support any graphics. If you need to run a software package with a GUI (many pre-processors, for example), you need a session with graphics capability. There are three ways of getting this:

  1. Use a VNC session to connect to one of the two visualization nodes, as per the instructions on Remote visualization. Keep in mind that it is possible and practical to get a VNC session directly on a compute node, without using one of the dedicated visualization nodes.
  2. Use X-forwarding. This is only a realistic option if you are on a fast connection to the CHPC. ssh -X into lengau, then ssh -X from there to your compute node that you already have interactive PBS session on (see above). Thanks to the wonders of Mesa and software rendering, quite sophisticated graphics processing may be done this way. Look for the Mesa modules if you need OpenGL-capable software to run in this manner.
  3. You can also get an X-capable interactive PBS session by appending -X to your qsub -I instruction. This will only work if your ssh-session into the login node has X-forwarding turned on, that is, ssh -X user@lengau.chpc.ac.za or ssh -Y user@lengau.chpc.ac.za

Windows ssh clients

PuTTY is widely used, and also has an easy to use interface for setting up ssh-tunnels. However, MobaXterm also works extremely well, and has a number of additional advantages, such as:

  • Multiple tabs
  • Remembering passwords
  • X-forwarding that (mostly) works
  • Convenient graphical interface for setting up ssh-tunnels
  • A file explorer for transferring files
  • Linux-style mouse-button bindings

Transferring files to the cluster

Command line scp and rsync are the usual methods for data transfer. However, it is easy to make mistakes, and you need to have the path right. MobaXterm (see above) has an easy to use “drag & drop” interface. FileZilla is fast, easy to use and runs on Linux, Windows and OSX. A different option is to use sshfs to mount your lengau directory directly on your workstation. There is a Windows sshfs client that sort-of works. Sometimes.
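As a sketch of the rsync syntax: the remote form uses a hypothetical username and Lustre path, and the flags can be tried out safely between two local directories first.

```shell
# Try the flags locally first: -a preserves attributes, -v lists what is copied
mkdir -p demo_src demo_dst
echo "result data" > demo_src/output.dat
rsync -av demo_src/ demo_dst/

# The remote form is the same, with a host prefix (hypothetical user and path):
# rsync -av ./results/ jblogs@lengau.chpc.ac.za:/mnt/lustre/users/jblogs/results/
```

Note the trailing slash on demo_src/: it copies the contents of the directory rather than the directory itself.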

How to qsub from a compute node

You may have difficulties submitting a PBS job from a compute node. However, it is possible to have an ssh command in a PBS script, so the obvious solution is to ssh to the login node in order to submit another PBS script. If you want to submit a second PBS script at the completion of the current one, insert a line like this at the end of your first script:

ssh login2 qsub /mnt/lustre/users/jblogs/scripts/PBS_Script1

Submit one script after completion of another

There are several situations where you may only want one job to run after completion of another. It may be to manage the load on a limited software license pool, or the first job may be a pre-processing step for the subsequent one, or the second job could be dependent on the data file written by the first one. One solution is to submit the second job from the PBS script of the first one, as described above. An alternative method is to use the depend option in PBS:

jobid=`qsub runJob01`
jobid=`echo $jobid | awk 'match($0,/[0-9]+/){print substr($0,RSTART,RLENGTH)}'`
qsub -W depend=afterany:$jobid runJob2
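The awk line above merely strips the numeric job ID out of the full identifier that qsub prints, which looks something like 123456.sched01. You can see what it does without submitting anything:

```shell
# qsub would normally print something like this (hypothetical job ID)
jobid="123456.sched01"

# Extract the leading run of digits, exactly as in the script above
jobnum=$(echo "$jobid" | awk 'match($0,/[0-9]+/){print substr($0,RSTART,RLENGTH)}')
echo "$jobnum"    # prints 123456
```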

Using the large queue

Add the following PBS directive to your submit-script:

#PBS -W group_list=largeq

How to run jobs that require very long wall times

It becomes difficult for the scheduler to fit in jobs that require very long wall times. It is instructive to think of the scheduler's task as a game of Tetris. It has to fit in contiguous blocks of varying length (time) and width (number of nodes). Very long narrow blocks are difficult to fit in without wasting a lot of space (computing resources). For these reasons, the CHPC's policies do not permit very long wall times. We prefer users to submit jobs that are effectively parallelized, and can therefore finish more quickly by using more simultaneous computing resources. If you do have a job that requires a very long wall time, use either a secondary qsub from your job script (see the paragraph “How to qsub from a compute node”) or alternatively a dependent qsub (see the paragraph “Submit one script after completion of another”). Both of these methods assume that your code can write restart files. If your code cannot write restart files, you have a serious problem which can be resolved in one of three ways:

  • If it is your own code, implement restart files immediately. What on earth are you trying to achieve by doing multi-day runs without a restart capability?
  • If it is an open-source code, take on the task of implementing a restart capability.
  • If it is a commercial code, inform the code developer that a restart capability is essential.
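Putting the two qsub-based methods together: a long run can be broken into a chain of shorter restartable jobs, each of which resubmits the script via the login node if more work remains. A sketch only, assuming that your solver writes restart files and uses a completion flag; the solver command, file names, and convergence test are all hypothetical and must be adapted to your own code:

```shell
#!/bin/bash
#PBS -P MECH1234
#PBS -q normal
#PBS -l select=1:ncpus=24:mpiprocs=24
#PBS -l walltime=12:00:00
cd $PBS_O_WORKDIR

# Hypothetical solver that resumes from the latest restart file if one exists
./my_solver --restart latest.rst

# If the run has not finished yet, resubmit this same script via the login node
if [ ! -f converged.flag ]; then
    ssh login2 qsub $PBS_O_WORKDIR/runMyJob.pbs
fi
```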

In order to improve the efficiency of very long single node jobs, a new queue has been introduced with effect from 13 December 2018. The seriallong queue has a walltime limit of 144 hours, but can only be used with fewer than 13 cores. Because it results in node-sharing, it is mandatory to provide a memory resource request. The relevant lines in a PBS script should look something like this:

#PBS -l select=1:ncpus=8:mpiprocs=8:mem=24gb 
#PBS -q seriallong
#PBS -l walltime=120:00:00

Check health of compute nodes before starting the run

An HPC cluster consists of a very large number of compute nodes, and statistics dictate that larger numbers of components result in more failures. When doing large multi-node runs in particular, your chances of encountering a faulty node are significant. Some software will immediately crash and terminate the job, but other software may simply hang. In the case of a faulty node, it is obviously better for the run to terminate immediately and send helpful diagnostics to the system administrators. This can be implemented easily by adding the following lines to your job script, immediately above the actual run command:

module add chpc/healthcheck/0.2
healthcheck -v || exit 1
module del chpc/healthcheck/0.2

The -v option will provide helpful diagnostics, but can be omitted if you want to avoid a substantial amount of additional output. You may also want to check for rogue or zombie processes which may slow down your calculations.

Dealing with zombies

Under certain unusual circumstances, a job can turn into a zombie job, which sits in the queue with “R” status, occupies resources, but does not actually run or produce meaningful output. Zombies resist killing by means of the qdel command, and need to be terminated with extreme prejudice. Use qdel -W force followed by the job number in order to accomplish this. The job will exit with status E, and, true to the tradition of zombies, linger on with this status in the visible queue for a while longer.

/var/www/wiki/data/attic/howto/tipsandtricks.1562227781.txt.gz · Last modified: 2019/07/04 10:09 by ccrosby