You may want to do yourself the favour of checking with the commercial cloud providers how much they charge for data storage. Once you have picked yourself up off the floor, set about managing your level of usage of the CHPC's free Lustre storage resource.
Remember that the CHPC Lustre is not intended for long-term storage - it is temporary workspace, is limited, and has been designed to be fast rather than reliable. The CHPC's official policies allow us to remove data that has not been used in the preceding 90 days. Please be pro-active about managing your data before we do it for you, without first asking your permission. To get a list of files that have not been accessed in the last 90 days, use the find command:
find . -type f -atime +90 -printf "%h/%f, %s, %AD \n"
which will produce a csv table with the path, size in bytes and last time of access. To make it even easier for yourself, simply delete these files automagically with find:
find . -type f -atime +90 -delete
If you are unfamiliar with the Linux command line, it can be painful to find your way around your files and directories. Consider using GNU Midnight Commander. It is available on Lengau by way of a module:
module load chpc/mc/4.8.17
Under certain conditions, such as when recovering from system faults, PBS may rerun jobs that had previously been interrupted. Depending on your particular setup, this is not necessarily beneficial. If, for example, your software regularly writes restart data, but by default starts a run from scratch unless otherwise specified, you definitely want to suppress rerunning, because it would overwrite existing results. Add the following line to your PBS directives in the job script:
#PBS -r n
On the other hand, if your software is set up to resume automatically from the last data written, PBS should be permitted to rerun the process:
#PBS -r y
The PBS scheduler cleans up each compute node on the completion of a job. However, under certain conditions, it is possible for the scheduler to be unaware of rogue processes left on compute nodes. These may interfere with other users' processes. If your job is running significantly slower than expected, it may be worth checking for the presence of rogue processes. This can be done quite easily by adding the following lines to your job script, preferably just before launching the actual compute process.
myname=`whoami`
for host in `cat $PBS_NODEFILE | sort -u` ; do
  echo $host `ssh $host ps hua -N -u root,apache,dbus,nslcd,ntp,rpcuser,rpc,$myname`
done
This will produce a list of your compute nodes, together with any processes not belonging to yourself or the system. If you do find compute processes belonging to other users, you should log into the compute node concerned and run top to see if your processes are suffering as a result of the rogue processes. Also submit a helpdesk ticket to inform the system administrators.
If you have created an ASCII file on Windows, and transferred the file to the cluster, you may experience a subtle problem. The background to this is that DOS (and thus also Windows) terminates each line in a text file with both a carriage return and a linefeed character. Unix (and thus also Linux) uses a linefeed character only. Some Linux applications have a built-in way of handling this difference, but most don't. A PBS script that has not been corrected will produce output that looks like “/bin/bash^M: bad interpreter: No such file or directory”. This problem is trivially easy to fix. Simply run the utility dos2unix on the offending file, for example:
dos2unix runMyJob.pbs
Running the utility on a file that has already been converted will do no damage to it, and attempting to run it on a binary file will result in a warning message. There is also a unix2dos utility that can convert a file back to Windows format. These utilities are available by default on the login and visualisation nodes, but not on the compute nodes, where you will need to load a module instead:
module load chpc/compmech/dos2unix
Most codes use ASCII input or run script files. These may or may not be affected by this problem, but if you get weird or unexpected behaviour, run dos2unix on all the ASCII files.
If you have a lot of files in sub-directories, use the command
find . -type f -print0 | xargs -0 dos2unix
to recursively go through your directories and change all the files.
The Linux operating system can deal with directory and file names containing spaces. This does not mean that your command line instruction, script or application is going to handle it correctly. The simple answer is “DON'T”. Also do not expect any sympathy from CHPC staff if you have used spaces and cannot find the reason why your job is not working correctly. For that matter, don't use double dots either. If you are having difficulties, and we see something that looks like this
My File..data
you will be entitled to a severe reprimand.
By default, unused ssh sessions time out after about 20 minutes. You can keep your ssh session alive by following the instructions on How-To Geek. In summary, in your ~/.ssh directory on your workstation, create a file called config. This file should contain the following line:
ServerAliveInterval 60
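If you prefer to keep the setting specific to the CHPC, the same file can combine it with a host alias. The following is only a sketch; the alias and username are placeholders:
# ~/.ssh/config on your workstation
Host lengau
    HostName lengau.chpc.ac.za
    User jblogs
    ServerAliveInterval 60
With this in place, typing ssh lengau will log you in and keep the session alive.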
The CHPC follows a fairly strict policy on CPU usage on the login node. As a consequence, any moderately intensive task, such as unzipping a large file, will be killed very quickly. In order to get around this problem, use an interactive session for more or less everything. The syntax for obtaining an interactive session is:
qsub -X -I -l select=1:ncpus=4:mpiprocs=4 -q serial -P MECH1234 -l walltime=3:00:00
Take note that MECH1234 is a placeholder for your own project code, and that the resource request and walltime should be adjusted to your needs. You can save yourself some typing by adding an alias like this to your ~/.bashrc file:
alias qsubi="qsub -I -l select=1:ncpus=4:mpiprocs=4 -q serial -P MECH1234 -l walltime=3:00:00"
Now typing in the command qsubi will do the job for you. If you need more resources, change the resource request to -l select=2:ncpus=24:mpiprocs=24 -q normal, for example, which will give you two complete nodes. This way you can test in interactive mode if your program runs in distributed parallel mode. You will need to know which nodes you have got:
cat $PBS_NODEFILE
will give you the contents of your machinefile.
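As a quick sanity check from inside such an interactive session, you can launch a small MPI run across the allocated nodes. This is only a sketch: ./myProgram is a hypothetical executable, and you will first need to load the appropriate MPI module for your code:
# count the MPI ranks assigned by PBS
nproc=`cat $PBS_NODEFILE | wc -l`
# launch across all allocated cores; ./myProgram is a placeholder
mpirun -np $nproc -machinefile $PBS_NODEFILE ./myProgram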
The usual ssh-session and interactive PBS sessions do not by default support any graphics. If you need to run a software package with a GUI (many pre-processors, for example), you need a session with graphics capability. There are two ways of getting this. The first is to add the option
-X
to your qsub -I instruction. This will only work if your ssh-session into the login node has X-forwarding turned on, that is,
ssh -X user@lengau.chpc.ac.za
or
ssh -Y user@lengau.chpc.ac.za
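Before launching your real GUI application, it is worth confirming that X-forwarding is actually working from inside the interactive session. The check below is just an illustration; xclock may not be installed everywhere, in which case any other small X client will do:
# should print something like localhost:10.0 if forwarding is active
echo $DISPLAY
# a trivial X client to test the display
xclock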
PuTTY is widely used, and also has an easy-to-use interface for setting up ssh-tunnels. However, MobaXterm also works extremely well, and has a number of additional advantages.
Large amounts of data should be transferred by means of Globus, which provides a GUI for managing your data transfers. Globus is based on the GridFTP protocol, and is faster and more robust than scp-based methods.
SANReN has a facility for staging fairly large quantities of data. Please take a look at the FileSender web page.
Command line scp and rsync are the usual methods for data transfer of smaller files. Remember to transfer files to scp.chpc.ac.za rather than lengau.chpc.ac.za. However, it is easy to make mistakes, and you need to get the path right. MobaXterm (see above) has an easy to use “drag & drop” interface. FileZilla is fast, easy to use and runs on Linux, Windows and OSX.
A different option is to use sshfs to mount your Lengau directory directly on your workstation. Take a look at these instructions.
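As a rough sketch of what such a mount looks like (the username, Lustre path and local mount point below are placeholders, so adjust them to your own):
# create a local mount point and mount your Lustre directory over ssh
mkdir -p ~/lengau
sshfs jblogs@lengau.chpc.ac.za:/mnt/lustre/users/jblogs ~/lengau
# unmount again when you are done
fusermount -u ~/lengau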
You may have difficulties submitting a PBS job from a compute node, although as the cluster is currently configured, it generally does work. However, it is possible to have an ssh command in a PBS script, so the obvious solution if you experience difficulties is to ssh to the login node and submit the job from there. If you wanted to submit another PBS script at the completion of the current one, you could insert a line like this at the end of your first script:
ssh login2 qsub /mnt/lustre/users/jblogs/scripts/PBS_Script1
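It is usually sensible to guard such a resubmission, so that the chain stops if the current run did not get far enough to be worth continuing. A minimal sketch, in which restart.dat and PBS_Script2 are hypothetical names:
# at the end of the first job script: only submit the next job if a restart file exists
if [ -f restart.dat ] ; then
    ssh login2 qsub /mnt/lustre/users/jblogs/scripts/PBS_Script2
fi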
There are several situations where you may only want one job to run after completion of another. It may be to manage the load on a limited software license pool, or the first job may be a pre-processing step for the subsequent one, or the second job could be dependent on the data file written by the first one. One solution is to submit the second job from the PBS script of the first one, as described above. An alternative method is to use the depend option in PBS:
jobid=`qsub runJob01`
jobid=`echo $jobid | awk 'match($0,/[0-9]+/){print substr($0,RSTART,RLENGTH)}'`
qsub -W depend=afterany:$jobid runJob2
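The afterany dependency starts the second job regardless of how the first one ended. If the second job should only run when the first completes successfully, PBS also provides afterok; the sketch below simply swaps the dependency type in the same example:
jobid=`qsub runJob01`
jobid=`echo $jobid | awk 'match($0,/[0-9]+/){print substr($0,RSTART,RLENGTH)}'`
# afterok: only start runJob2 if runJob01 exits without errors
qsub -W depend=afterok:$jobid runJob2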
To use the large queue, add the following PBS directive to your submit-script:
#PBS -W group_list=largeq
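Purely as an illustration of where this directive fits in (the queue name, node count, project code and walltime below are placeholders and should be checked against the current CHPC documentation), a large-queue job header might look something like this:
#PBS -q large
#PBS -W group_list=largeq
#PBS -P MECH1234
#PBS -l select=20:ncpus=24:mpiprocs=24
#PBS -l walltime=12:00:00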
It becomes difficult for the scheduler to fit in jobs that require very long wall times. It is instructive to think of the scheduler's task as a game of Tetris. It has to fit in contiguous blocks of varying length (time) and width (number of nodes). Very long, narrow blocks are difficult to fit in without wasting a lot of space (computing resources). For these reasons, the CHPC's policies do not permit very long wall times. We prefer users to submit jobs that are effectively parallelized, and can therefore finish more quickly by using more simultaneous computing resources. If you do have a job that requires a very long wall time, use either a secondary qsub from your job script (see the paragraph “How to qsub from a compute node”) or alternatively a dependent qsub (see the paragraph “Submit one script after completion of another”). Both of these methods assume that your code can write restart files. If your code cannot write restart files, you have a serious problem.
In order to improve the efficiency of very long single node jobs, a new queue has been introduced with effect from 13 December 2018. The seriallong queue has a walltime limit of 144 hours, but can only be used with less than 13 cores. Because it results in node-sharing, it is mandatory to provide a memory resource request. The relevant lines in a PBS script should look something like this:
#PBS -l select=1:ncpus=8:mpiprocs=8:mem=24gb
#PBS -q seriallong
#PBS -l walltime=120:00:00
An HPC cluster consists of a very large number of compute nodes, and statistics dictate that larger numbers of components result in more failures. When doing especially large runs, your chances of encountering a faulty node are significant. Some software will immediately crash and terminate the job, but others may simply hang. In the case of a faulty node, it is obviously better for the run to terminate immediately and send helpful diagnostics to the system administrators. This can be implemented easily by adding the following lines to your job script, immediately above the actual run command:
module add chpc/healthcheck/0.2
healthcheck -v || exit 1
module del chpc/healthcheck/0.2
The -v option will provide helpful diagnostics, but can be omitted if you want to avoid a substantial amount of additional output. You may also want to check for rogue or zombie processes which may slow down your calculations.
Under certain unusual circumstances, a job can turn into a zombie job, which sits in the queue with “R” status, occupies resources, but does not actually run or produce meaningful output. Zombies resist killing by means of the qdel command, and need to be terminated with extreme prejudice. Use
qdel -W force
followed by the job number in order to accomplish this. The job will exit with status E, and, true to the tradition of zombies, linger on with this status in the visible queue for a while longer.
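For example, assuming a hypothetical job number of 123456:
qdel -W force 123456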