Parallel computing relies on passwordless ssh access between compute nodes. If passwordless ssh does not work, nothing else will. Therefore, if you find yourself with jobs that simply won't run, or that don't even provide helpful error messages, first check that your ssh keys are correct. You can do this very simply by testing whether you can ssh from the login node into another service node without supplying a password:
ssh login1
or
ssh chpcviz1
for example. If you cannot get in without a password, you have a problem which first has to be corrected. The first thing to do is to check the permissions of the files in your $HOME/.ssh
directory. There should be rw
access for you only. It can be corrected with the command
chmod 0600 ~/.ssh/*
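If that alone does not help, the permissions on the .ssh directory itself are also worth checking; ssh is fussy about a .ssh directory that other users can access. A minimal sketch of the standard settings (not CHPC-specific):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/*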
You may want to do yourself the favour of checking with the commercial cloud providers how much they charge for data storage. Once you have picked yourself up off the floor, set about managing your level of usage of the CHPC's free Lustre storage resource.
Remember that the CHPC Lustre is not intended for long-term storage - it is temporary workspace, is limited in size and has been designed to be fast rather than reliable. The CHPC's official policies allow us to remove data that has not been used in the preceding 90 days. Please be pro-active about managing your data before we do it for you, without first asking your permission. In order to get a list of files that have not been accessed in the last 90 days, use the find
command:
find -type f -atime +90 -printf "%h/%f, %s, %AD \n"
which will produce a csv table with the path, size in bytes and last time of access. To make it even easier for yourself, simply delete these files automagically with find:
find -type f -atime +90 -delete
If you are unfamiliar with the Linux command line, it can be painful to find your way around your files and directories. Consider using the Gnu Midnight Commander. Here is a short video which demonstrates its use on the cluster. It is available on Lengau by way of a module:
module load chpc/mc/4.8.17
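Once the module has been loaded, start it with:
mc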
It is tempting to put echo statements in your .bashrc file, so that you can get a handy heads-up when you log in that some environment variables are being set. Please do not do this. The reason is that it breaks scp, which expects to see its protocol data over the stdin/stdout channels. There are ways around this (you can use Google, right?), but unless you know what you are doing, just don't put echo statements in a .bashrc file.
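If you really do want such a message, one common workaround (a sketch, not the only approach) is to print it only when the shell is interactive, so that scp and other non-interactive sessions are unaffected:
# In ~/.bashrc: only print messages when the shell is interactive
if [[ $- == *i* ]]; then
    echo "Project environment variables have been set."
fi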
Under certain conditions, such as when recovering from system faults, PBS may rerun jobs that had previously been interrupted. Depending on your particular setup, this is not necessarily beneficial. If, for example, your software regularly writes restart data, but by default starts a run from scratch unless otherwise specified, you definitely want to suppress rerunning, because it would overwrite existing results. Add the following line to your PBS directives in the job script:
#PBS -r n
On the other hand, if your software is set up to resume automatically from the last data written, PBS should be permitted to rerun the process:
#PBS -r y
The PBS scheduler cleans up each compute node on the completion of a job. However, under certain conditions, it is possible for the scheduler to be unaware of rogue processes left on compute nodes. These may interfere with other users' processes. If your job is running significantly slower than expected, it may be worth checking for the presence of rogue processes. This can be done quite easily by adding the following lines to your job script, preferably just before launching the actual compute process.
myname=`whoami`
for host in `cat $PBS_NODEFILE | sort -u` ; do
  echo $host `ssh $host ps hua -N -u root,apache,dbus,nslcd,ntp,rpcuser,rpc,$myname`
done
This will produce a list of your compute nodes, together with any processes not belonging to yourself or the system. If you do find compute processes belonging to other users, you should log into the compute node concerned, and run top
to see if your processes are suffering as a result of the rogue process. Also submit a helpdesk ticket to inform the system administrators.
If you have created an ASCII file on Windows, and transferred the file to the cluster, you may experience a subtle problem. The background to this is that DOS (and thus also Windows) terminates each line in a text file with both a carriage return and a linefeed character. Unix (and thus also Linux) uses a linefeed character only. Some Linux applications have a built-in way of handling this difference, but most don't. A PBS script that has not been corrected will fail with an error that looks like /bin/bash^M: bad interpreter: No such file or directory.
This problem is trivially easy to fix. Simply run the utility dos2unix on the offending file. For example:
dos2unix runMyJob.pbs
Running the utility on a file that has already been converted will do no damage, and attempting to run it on a binary file will result in a warning message. There is also a unix2dos utility that can convert a file back to Windows format. These utilities are available by default on the login and visualisation nodes, but not on the compute nodes, where you will need to load a module instead:
module load chpc/compmech/dos2unix
Most codes use ASCII input or run script files. These may or may not be affected by this problem, but if you get weird or unexpected behaviour, run dos2unix on all the ASCII files.
If you have a lot of files in sub-directories, use the command
find . -type f -print0 | xargs -0 dos2unix
to recursively go through your directories and change all the files.
The Linux operating system can deal with directory and file names containing spaces. This does not mean that your command line instruction, script or application is going to handle them correctly. The simple answer is “DON'T”. Also do not expect any sympathy from CHPC staff if you have used spaces and cannot find the reason why your job is not working correctly. For that matter, don't use double dots either. If you are having difficulties, and we see something that looks like this
My File..data
you will be entitled to a severe reprimand.
By default, unused ssh sessions time out after about 20 minutes. You can keep your ssh session alive by following the instructions on How to geek. In summary, in your ~/.ssh directory on your workstation, create a file called config. This file should contain the following lines, note the indent on the second line:
Host *
    ServerAliveInterval 60
If you are using MobaXterm to access the cluster, follow the menus to “Settings - Configuration - SSH - SSH-Settings” to activate the SSH Keepalive option, as per this example.
The CHPC follows a fairly strict policy on CPU usage on the login node. As a consequence, any moderately intensive task, such as unzipping a large file, will be killed very quickly. In order to get around this problem, use an interactive session for more or less everything. The syntax for obtaining an interactive session is:
qsub -X -I -l select=1:ncpus=4:mpiprocs=4 -q serial -P MECH1234 -l walltime=3:00:00
The -X
option turns on X-forwarding.
To save yourself some typing, you can define an alias (in your ~/.bashrc file, for example):
alias qsubi="qsub -X -I -l select=1:ncpus=4:mpiprocs=4 -q serial -P MECH1234 -l walltime=3:00:00"
Now typing in the command qsubi
will do the job for you.
If you need more cores, you can request a full node with -l select=1:ncpus=24:mpiprocs=24 -q smp, or use -l select=2:ncpus=24:mpiprocs=24 -q normal, for example, which will give you two complete nodes. This way you can test in interactive mode whether your program runs in distributed parallel mode. You will need to know which nodes you have been allocated:
cat $PBS_NODEFILE
will give you the contents of your machinefile.
For an interactive session on a GPU node, the request looks like this:
qsub -X -I -l select=1:ncpus=8:mpiprocs=8:ngpus=1 -q gpu_1 -P MECH1234
The usual ssh-session and interactive PBS sessions do not by default support any graphics. If you need to run a software package with a GUI (many pre-processors, for example), you need a session with graphics capability. Here are some ways of getting this:
ssh -X
into lengau, then get an interactive PBS session with X-forwarding, as described in the previous section on Interactive PBS sessions. Thanks to the wonders of Mesa and software rendering, quite sophisticated graphics processing may be done this way. Look for the Mesa modules if you need OpenGL-capable software to run in this manner. PuTTY is widely used, and also has an easy to use interface for setting up ssh-tunnels. However, MobaXterm also works extremely well, and has a number of additional advantages.
There are also other options such as Cygwin, which give you Linux functionality on a Windows system. It is also fairly straightforward to set up the Windows Subsystem for Linux, which allows you to install and run a Linux distribution directly inside Windows. This has some similarities to running a Linux Virtual Machine on a Windows computer. These three methods all provide you with a useful environment to experiment with and test things in Linux, but are definitely overkill if you just need an ssh client.
Many of the CHPC's users work with WinSCP, which offers powerful file transfer and management options. However, we find that its ssh-client is rather cumbersome.
Large amounts of data should be transferred by means of Globus, which provides a GUI for managing your data transfers, although there is also an API which can be used to script file transfers. Globus is based on the GridFTP protocol, and is faster and more robust than scp-based methods.
SanRen has a facility for staging fairly large quantities of data. Please take a look at the FileSender web page.
Command line scp and rsync are the usual methods for transferring smaller files. Remember to transfer files to scp.chpc.ac.za rather than lengau.chpc.ac.za. However, it is easy to make mistakes, and you need to get the path right. MobaXterm (see above) has an easy to use “drag & drop” interface. FileZilla is fast, easy to use and runs on Linux, Windows and OSX.
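As an illustration (the username jblogs and the file name are hypothetical; substitute your own):
scp results.tar.gz jblogs@scp.chpc.ac.za:/mnt/lustre/users/jblogs/
rsync -avP results.tar.gz jblogs@scp.chpc.ac.za:/mnt/lustre/users/jblogs/
The rsync form can resume an interrupted transfer and reports progress as it goes.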
A different option is to use sshfs to mount your lengau directory directly on your workstation. Take a look at these instructions. In summary:
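A minimal sketch, assuming the sshfs client is installed on your workstation and using the hypothetical username jblogs:
mkdir -p ~/lengau
sshfs jblogs@scp.chpc.ac.za:/mnt/lustre/users/jblogs ~/lengau
When you are done, unmount with fusermount -u ~/lengau.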
You may have difficulties submitting a PBS job from a compute node, although as the cluster is currently configured, it generally does work. However, it is possible to have an ssh command in a PBS script, so the obvious solution if you experience difficulties is to ssh to the login node in order to submit the next PBS script. If you wanted to submit another PBS script at the completion of the current one, you could insert a line like this at the end of your first script:
ssh login2 qsub /mnt/lustre/users/jblogs/scripts/PBS_Script1
There are several situations where you may only want one job to run after completion of another. It may be to manage the load on a limited software license pool, or the first job may be a pre-processing step for the subsequent one, or the second job could be dependent on the data file written by the first one. One solution is to submit the second job from the PBS script of the first one, as described above. An alternative method is to use the depend option in PBS:
jobid=`qsub runJob01`
jobid=`echo $jobid | awk 'match($0,/[0-9]+/){print substr($0,RSTART,RLENGTH)}'`
qsub -W depend=afterany:$jobid runJob2
Add the following PBS directive to your submit-script:
#PBS -W group_list=largeq
It becomes difficult for the scheduler to fit in jobs that require very long wall times. It is instructive to think of the scheduler's task as a game of Tetris. It has to fit in contiguous blocks of varying length (time) and width (number of nodes). Very long narrow blocks are difficult to fit in without wasting a lot of space (computing resources). For these reasons, the CHPC's policies do not permit very long wall times. We prefer users to submit jobs that are effectively parallelized, and can therefore finish more quickly by using more simultaneous computing resources. If you do have a job that requires a very long wall time, use either a secondary qsub from your job script (see the paragraph “How to qsub from a compute node”) or alternatively a dependent qsub (see the paragraph “Submit one script after completion of another”). Both of these methods assume that your code can write restart files. If your code cannot write restart files, you have a serious problem which can be resolved in one of three ways:
In order to improve the efficiency of very long single node jobs, a new queue has been introduced with effect from 13 December 2018. The seriallong
queue has a walltime limit of 144 hours, but can only be used with fewer than 13 cores. Because it results in node-sharing, it is mandatory to provide a memory resource request. The relevant lines in a PBS script should look something like this:
#PBS -l select=1:ncpus=8:mpiprocs=8:mem=24gb
#PBS -q seriallong
#PBS -l walltime=120:00:00
An HPC cluster consists of a very large number of compute nodes, and statistics dictate that larger numbers of components result in more failures. When doing especially large multi-node runs, your chances of encountering a faulty node are significant. Some software will immediately crash and terminate the job, but other codes may simply hang. In the case of a faulty node, it is obviously better for the run to terminate immediately and send helpful diagnostics to the system administrators. This can be implemented easily by adding the following lines to your job script, immediately above the actual run
command:
module add chpc/healthcheck/0.2
healthcheck -v || exit 1
module del chpc/healthcheck/0.2
The -v
option will provide helpful diagnostics, but can be omitted if you want to avoid a substantial amount of additional output. You may also want to check for rogue or zombie processes which may slow down your calculations.
Under certain unusual circumstances, a job can turn into a zombie job, which sits in the queue with “R” status, occupies resources, but does not actually run or produce meaningful output. Zombies resist killing by means of the qdel
command, and need to be terminated with extreme prejudice. Use qdel -W force
followed by the job number in order to accomplish this. The job will exit with status E
, and, true to the tradition of zombies, linger on with this status in the visible queue for a while longer.
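For example, with a hypothetical job number:
qdel -W force 1234567.sched01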
Your job/s may be queued for various reasons. When the cluster is oversubscribed, such as when there are loadshedding cycles and we do not have sufficient generator capacity, a major reason is that there are a large number of users' jobs waiting in the queues. However, it is important to be aware that your job/s may be queued because your Research Programme (RP) allocation has run out and your Principal Investigator (PI) needs to provide six-monthly feedback and/or contact your CHPC support scientist. It is also possible that you have specified a job which can never run. Please check your jobs and queued jobs on the cluster using:
qstat -n1awu my_userid
qstat -f myqueued_jobid1 myqueued_jobid2 | grep comment
For example, for one of your queued jobs you may see something like:
qstat -f 5015940.sched01 | grep comment
    comment = Not Running: Server per-project limit reached on resource ncpus
This indicates that your job is queued because your RP allocation has expired or has run out of cpu hours and your PI needs to submit feedback.
Other job comment messages include:
comment = Not Running: Insufficient amount of resource: nodetype
comment = Not Running: Insufficient amount of resource: ncpus
comment = Not Running: Insufficient amount of resource: ngpus (R: 2 A: 1 T:
comment = Not Running: User has reached queue smp running job limit.
comment = Not Running: User has reached queue normal running job limit.
comment = Not Running: User has reached queue serial running job limit.
comment = Not Running: User has reached queue seriallong running job limit.
comment = Not Running: User has reached queue gpu_1 running job limit.
The first three messages indicate that there are not enough resources of the particular types. The last five messages indicate that the user (you, if these are your own job numbers) has other jobs in the specified queue, and has reached the limit on the number of running jobs per user in that queue.
If you see “Insufficient amount of resource”, then it is worth checking whether your jobs are requesting the resources correctly:
qstat -f 4123211.sched01 | grep "List.select"
    Resource_List.select = 1:ncpus=24:mpiprocs=24:mem=999gb
qstat -f 4123211.sched01 | grep "queue"
    queue = smp
This example indicates a job in the smp queue requiring one node, 24 CPU cores, 24 MPI processes, and 999 GB of memory. Note that the standard nodes on Lengau have either 64 GB or 128 GB of memory, so this memory request is inappropriate and the job will never run. In fact, on such nodes one should request at most 56gb or 120gb respectively, since each node needs some memory for system processes.
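A corrected request for one of the 128 GB nodes would therefore look something like this (a sketch; adjust the core count to suit your job):
#PBS -q smp
#PBS -l select=1:ncpus=24:mpiprocs=24:mem=120gb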
If you are running Materials Studio jobs (accelrys), you will likely need to first look up your job number on the cluster. Materials Studio does give you a job identification name, so, for example:
qstat | grep My_MS_jobname
qstat | grep MS_L2FXD
5046144.sched01    MS_L2FXD    accelrys    0  Q  accelrys
Then in this example the job number is 5046144.sched01, so thereafter please follow the procedure above.