This guide is intended for experienced HPC users and provides a summary of the essential components of the systems available at the CHPC. For more detailed information on the subjects below see the full User Guide.
If you are a new user of the CHPC, please watch these videos:
The CHPC's Dell Linux cluster has been up and running since 2014.
The new system is a homogeneous cluster, comprising Intel 5th generation CPUs. As of March 2017 it has 1368 compute nodes, each with 24 cores and 128 GiB* of memory (360 nodes have only 64 GiB), and five large-memory “fat” nodes with 56 cores and 1 TiB* each, all interconnected using FDR 56 Gb/s InfiniBand and accessing 4 PB of shared storage over the Lustre file system.
* Maximum available memory on each type of node: mem=124gb (regular), mem=61gb (regular with only 64 GiB), and mem=1007gb (fat).
There are 9 compute nodes that contain a total of 30 Nvidia V100 GPUs. For more information see the GPU guide.
To connect to the new system, ssh to lengau.chpc.ac.za
and log in using the username and password sent to you by the CHPC:
ssh username@lengau.chpc.ac.za
The new system is running CentOS 7.3 and uses the Bash shell by default.
You should change your password after logging in the first time.
To change your password, use the passwd
command.
The rules are: at least 10 characters, including upper and lower case letters, numbers, and special characters. Use ssh keys wherever possible.
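Since ssh keys are recommended, here is a minimal sketch of setting up key-based login from a typical OpenSSH client on your own workstation (the key type and file locations shown are just the common defaults and may differ on your system):
ssh-keygen -t ed25519
ssh-copy-id username@lengau.chpc.ac.za
After this, ssh to lengau.chpc.ac.za should use the key (with its passphrase) instead of your CHPC password.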
Once you have logged in, give some consideration to how you will be using your session on the login node. If you are going to spend a long time logged in, doing a variety of tasks, it is best to get yourself an interactive PBS session to work in. This way, if you need to do something demanding, it will not conflict with other users logged into the login node.
Many users have their login blocked at some point. Usually this is because an incorrect password was entered more times than permitted (5 times). This restriction was put in place to prevent brute-force attacks by malicious individuals who want to gain access to your account.
There are two main protocols for transferring data to and from the CHPC:
Globus is a set of tools built on the GridFTP protocol. Instructions for transferring data with Globus are provided here.
To transfer data onto or off the CHPC cluster, use the scp, rsync or sftp commands and connect to or from the cluster's scp node: scp.chpc.ac.za
. Please do not transfer files directly to lengau.chpc.ac.za. It is a shared login node, and to avoid overloading it, file transfers should be done via the dedicated scp server. For the same reason, when copying large files from one directory to another, it is preferable not to do this on the login node. You can log into one of several other servers to do the copying:
From the command line on your Linux workstation:
scp filetocopy.tar.gz yourusername@scp.chpc.ac.za:/mnt/lustre/users/yourusername/run15/
transfers the file filetocopy.tar.gz from your disk on your computer to the Lustre file system on the CHPC cluster, under the run15/ subdirectory of your scratch directory /mnt/lustre/users/yourusername/ (where yourusername is replaced by your user name on the CHPC cluster).
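rsync is often more convenient than scp for large or repeated transfers, since it can show progress, resume interrupted copies and only transfer what has changed. A minimal sketch, reusing the same hypothetical file and directory names as the scp example above:
rsync -avP filetocopy.tar.gz yourusername@scp.chpc.ac.za:/mnt/lustre/users/yourusername/run15/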
You may need to download data from a server at another site. Do not do this on login2! Use scp.chpc.ac.za for this purpose. The easiest way of doing this is with the wget command:
wget http://someserver.someuni.ac.za/pub/somefile.tgz
Very large files may be transferred more quickly by using a multi-threaded downloader. The easiest of these is axel; see axel's GitHub page. The syntax is very simple:
module load chpc/compmech/axel/2.17.6
axel -n 4 -a http://someserver.someuni.ac.za/pub/somefile.tgz
The new cluster has both NFS and Lustre file systems, served over InfiniBand:
Mount point | File System | Size | Quota | Backup | Access |
---|---|---|---|---|---|
/home | NFS | 80 TB | 15 GB | NO [1] | Yes |
/mnt/lustre/users | Lustre | 4 PB | none | NO | Yes |
/apps | NFS | 20 TB | none | Yes | No [2] |
/mnt/lustre/groups | Lustre | 1 PB | 1 TB [3] | NO | On request only |
IMPORTANT NOTE: Files older than 90 days on /mnt/lustre/users
will be automatically deleted without any warning or advance notice.
Note [1] Unfortunately, at the moment the CHPC cannot guarantee any backup of the /home
file system owing to hardware limitations.
Note [2] Create a support ticket on Helpdesk if you would like to request us to install a new application, library or programming tool.
Note [3] Access to /mnt/lustre/groups
is by application only and a quota will be assigned to the programme, to be shared by all members of that group.
It is essential that all files written by your job script be on Lustre: apart from scheduler errors, you will lose performance because your home directory is on NFS, which is not a parallel file system. It is also recommended that all files your job script reads, especially if they are large or read more than once, be on Lustre for the same reason.
It is usually okay to keep binaries and libraries on home since they are read once and loaded into RAM when your executable launches. But you may notice improved performance if they are also on Lustre.
The /home
file system is managed by quotas and a strict limit of 15 GB (15 000 000 000 bytes) is applied to it. Please take care not to fill up your home directory. Use /mnt/lustre/users/yourusername
to store large files. If your project requires access to large files over a long duration (more than 60 days), then please submit a request to the helpdesk.
You can see how much you are currently using with the du
command:
du --si -s $HOME
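A similar check can be run against your Lustre scratch directory (path as in the examples above; substitute your own username). Note that on a very large directory tree du may take a while to complete:
du --si -s /mnt/lustre/users/yourusername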
Make sure that all jobs use a working directory on the Lustre file system. Do not use your home directory for the working directory of your job. Use the directory allocated to you on the fast Lustre parallel file system:
/mnt/lustre/users/USERNAME/
where USERNAME
is replaced by your user name on the CHPC cluster.
Always provide the full absolute path to your Lustre sub-directories. Do not rely on a symbolic link from your home directory.
For these and other general best practices, see http://www.nas.nasa.gov/hecc/support/kb/lustre-best-practices_226.html
Software resides in /apps
which is an NFS file system mounted on all nodes:
/apps/ … | Description | Comment |
---|---|---|
chpc/ | Application codes supported by CHPC | (See below) |
compilers/ | Compilers, other programming languages and development tools | |
libs/ | Libraries | |
scripts/ | Modules and other environment setup scripts | |
tools/ | Miscellaneous software tools | |
user/ | Code installed by a user research programme | Not supported by CHPC. |
/apps/chpc/ … | Scientific Domain |
---|---|
astro/ | Astrophysics & Cosmology |
bio/ | BioInformatics |
chem/ | Chemistry |
compmech/ | Mechanics |
cs/ | Computer Science |
earth/ | Earth |
image/ | Image Processing |
material | Material Science |
phys/ | Physics |
space/ | Space |
CHPC uses the GNU modules utility, which manipulates your environment, to provide access to the supported software in /apps/
.
Each of the major CHPC applications has a modulefile that sets, unsets, appends to, or prepends to environment variables such as $PATH, $LD_LIBRARY_PATH, $INCLUDE, $MANPATH for the specific application. Each modulefile also sets functions or aliases for use with the application. You need only to invoke a single command to configure the application/programming environment properly. The general format of this command is:
module load <module_name>
where <module_name> is the name of the module to load. It also supports Tab-key completion of command parameters.
For a list of available modules:
module avail
The module command may be abbreviated and can optionally be given a search term, e.g.:
module ava chpc/open
Or, more flexibly, you can pipe stderr to grep, and search for a phrase, such as mpi:
module avail 2>&1 | grep mpi
To see a synopsis of a particular modulefile's operations:
module help <module_name>
To see currently loaded modules:
module list
To remove a module:
module unload <module_name>
To remove all modules:
module purge
To search for a module name or part of a name:
module-search partname
After upgrades of software in /apps/
, new modulefiles are created to reflect the changes made to the environment variables.
Disclaimer: Codes in /apps/user/
are not supported by the CHPC and the TE for each research programme is required to create the appropriate module file or startup script.
Supported compilers for C, C++ and Fortran are found in /apps/compilers
along with interpreters for programming languages like Python.
For MPI programmes, the appropriate library and mpi*
compile scripts are also available.
The default gcc compiler is 6.1.0:
login2:~$ which gcc
/cm/local/apps/gcc/6.1.0/bin/gcc
login2:~$ gcc --version
gcc (GCC) 6.1.0
To use any other version of gcc you need to remove 6.1.0 from all paths with
module purge
before loading any other modules.
The recommended combination of compiler and MPI library is GCC 5.1.0 and OpenMPI 1.8.8 and is accessed by loading both modules:
module purge
module add gcc/5.1.0
module add chpc/openmpi/1.8.8/gcc-5.1.0
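Once these modules are loaded, the standard OpenMPI compiler wrappers (mpicc, mpicxx, mpif90) should be on your path. A minimal sketch of compiling an MPI program, where hello_mpi.c is a hypothetical source file of your own:
mpicc -O2 -o hello_mpi.x hello_mpi.c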
The module for the Intel compiler and Intel MPI is loaded with
module load chpc/parallel_studio_xe/64/16.0.1/2016.1.150
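With the Intel compiler and Intel MPI loaded, the usual Intel MPI wrappers (mpiicc for C, mpiifort for Fortran) can be used in the same way. A sketch, again with a hypothetical source file name:
mpiifort -O2 -o model.x model.f90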
The CHPC cluster uses PBSPro as its job scheduler. With the exception of interactive jobs, all jobs are submitted to a batch queuing system and only execute when the requested resources become available. All batch jobs are queued according to priority. A user's priority is not static: the CHPC uses the “Fairshare” facility of PBSPro to modify priority based on activity. This is done to ensure the finite resources of the CHPC cluster are shared fairly amongst all users.
workq
is no longer to be used.
The available queues with their nominal parameters are given in the following table. Please take note that these limits may be adjusted dynamically to manage the load on the system.
Queue Name | Max. cores per job | Min. cores per job | Max. jobs in queue | Max. jobs running | Max. time (hrs) | Notes | Access |
---|---|---|---|---|---|---|---|
serial | 23 | 1 | 24 | 10 | 48 | For single-node non-parallel jobs. | |
seriallong | 12 | 1 | 24 | 10 | 144 | For very long sub 1-node jobs. | |
smp | 24 | 24 | 20 | 10 | 96 | For single-node parallel jobs. | |
normal | 240 | 25 | 20 | 10 | 48 | The standard queue for parallel jobs | |
large | 2400 | 264 | 10 | 5 | 96 | For large parallel runs | Restricted |
xlarge | 6000 | 2424 | 2 | 1 | 96 | For extra-large parallel runs | Restricted |
express | 2400 | 25 | N/A | 100 total nodes | 96 | For paid commercial use only | Restricted |
bigmem | 280 | 28 | 4 | 1 | 48 | For the large memory (1TiB RAM) nodes. | Restricted |
vis | 12 | 1 | 1 | 1 | 3 | Visualisation node | |
test | 24 | 1 | 1 | 1 | 3 | Normal nodes, for testing only | |
gpu_1 | 10 | 1 | 2 | 12 | Up to 10 cpus, 1 GPU | ||
gpu_2 | 20 | 1 | 2 | 12 | Up to 20 cpus, 2 GPUs | ||
gpu_3 | 36 | 1 | 2 | 12 | Up to 36 cpus, 3 GPUs | ||
gpu_4 | 40 | 1 | 2 | 12 | Up to 40 cpus, 4 GPUs | ||
gpu_long | 20 | 1 | 1 | 24 | Up to 20 cpus, 1 or 2 GPUs | Restricted |
Use qstat -Qf to see what the current limits are.
Access to the large and bigmem queues is restricted and by special application only.
To gain access to the large queue, you will need to submit satisfactory scaling results which demonstrate that the use of more than 10 nodes is justified. This includes demonstrating that you are competent to run large cases, which can only be done by proving that you can run smaller cases efficiently.
Queue Name | Max. total simultaneous running cores |
---|---|
normal | 240 |
large | 2400 |
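For example, to inspect the currently configured limits of a particular queue (the queue name normal is used here purely for illustration):
qstat -Qf normal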
qstat | View queued jobs. |
qsub | Submit a job to the scheduler. |
qdel | Delete one of your jobs from queue. |
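Typical usage of these commands, for example to list your own jobs and then delete one of them (123456 is just a placeholder job ID):
qstat -u yourusername
qdel 123456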
Parameters for any job submission are specified as #PBS
comments in the job script file or as options to the qsub
command. The essential options for the CHPC cluster include:
-l select=10:ncpus=24:mpiprocs=24:mem=120gb
sets the size of the job in terms of nodes, cores, MPI ranks and memory:
select=N | number of nodes needed |
ncpus=N | number of cores per node |
mpiprocs=N | number of MPI ranks (processes) per node |
mem=Ngb | amount of RAM per node |
-l walltime=4:00:00
sets the total expected wall clock time in hours:minutes:seconds. Note the wall clock limits for each queue.
The job size and wall clock time must be within the limits imposed on the queue used:
-q normal
to specify the queue.
Each job will draw from the allocation of cpu-hours granted to your Research Programme:
-P PRJT1234
specifies the project identifier short name, which is needed to identify the Research Programme allocation you will draw from for this job. Ask your PI for the project short name and replace PRJT1234
with it.
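Putting these options together, a job can also be submitted with everything specified on the qsub command line rather than as #PBS directives in the script. A sketch, where myjob.sh and PRJT1234 are placeholders for your own script and project short name:
qsub -q normal -P PRJT1234 -l select=10:ncpus=24:mpiprocs=24:mem=120gb -l walltime=4:00:00 myjob.sh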
The large
and bigmem
queues are restricted to users who have need for them. If you are granted access to these queues then you should specify that you are a member of the largeq
or bigmemq
groups. For example:
#PBS -q large
#PBS -W group_list=largeq
or
#PBS -q bigmem
#PBS -W group_list=bigmemq
Using the smp
queue to run your own program called hello_mp.x
which is in the path:
#!/bin/bash
#PBS -l select=1:ncpus=24:mpiprocs=24
#PBS -P PRJT1234
#PBS -q smp
#PBS -l walltime=4:00:00
#PBS -o /mnt/lustre/users/USERNAME/OMP_test/test1.out
#PBS -e /mnt/lustre/users/USERNAME/OMP_test/test1.err
#PBS -m abe
#PBS -M youremail@ddress
ulimit -s unlimited
cd /mnt/lustre/users/USERNAME/OMP_test
nproc=`cat $PBS_NODEFILE | wc -l`
echo nproc is $nproc
cat $PBS_NODEFILE
# Run program
hello_mp.x
Using the normal
queue to run WRF:
#!/bin/bash
#PBS -l select=10:ncpus=24:mpiprocs=24:nodetype=haswell_reg
#PBS -P PRJT1234
#PBS -q normal
#PBS -l walltime=4:00:00
#PBS -o /mnt/lustre/users/USERNAME/WRF_Tests/WRFV3/run2km_100/wrf.out
#PBS -e /mnt/lustre/users/USERNAME/WRF_Tests/WRFV3/run2km_100/wrf.err
#PBS -m abe
#PBS -M youremail@ddress
ulimit -s unlimited
. /apps/chpc/earth/WRF-3.7-impi/setWRF
cd /mnt/lustre/users/USERNAME/WRF_Tests/WRFV3/run2km_100
rm wrfout* rsl*
nproc=`cat $PBS_NODEFILE | wc -l`
echo nproc is $nproc
cat $PBS_NODEFILE
time mpirun -np $nproc wrf.exe > runWRF.out
Assuming the above job script is saved as the text file example.job
, the command to submit it to the PBSPro scheduler is:
qsub example.job
No additional parameters are needed for the qsub command since all the PBS parameters are specified within the job script file.
Note that in the above job script example the working directory is on the Lustre file system. Do not use your home directory for the working directory of your job. Use the directory allocated to you on the fast Lustre parallel file system:
/mnt/lustre/users/USERNAME/
where USERNAME
is replaced by your user name on the CHPC cluster.
Always provide the full absolute path to your Lustre sub-directories. Do not rely on a symbolic link from your home directory.
For example, to request an MPI job on one node with 12 cores per MPI rank, so that each MPI process can launch 12 OpenMP threads, change the -l
parameters:
#PBS -l select=1:ncpus=24:mpiprocs=2:nodetype=haswell_reg
There are two MPI ranks, so mpirun -n 2
… .
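A minimal sketch of the corresponding launch step in the job script, assuming the number of OpenMP threads is controlled through the standard OMP_NUM_THREADS environment variable and that your_hybrid_program.x is a placeholder for your own executable:
export OMP_NUM_THREADS=12
mpirun -n 2 your_hybrid_program.x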
To request an interactive session on a single core, the full command for qsub is:
qsub -I -P PROJ0101 -q serial -l select=1:ncpus=1:mpiprocs=1:nodetype=haswell_reg
To request an interactive session on a full node, the full command for qsub is:
qsub -I -P PROJ0101 -q smp -l select=1:ncpus=24:mpiprocs=24:nodetype=haswell_reg
Note:
-I selects an interactive job.
-X requests X-forwarding.
Use the smp, serial or test queue.
select=1 (request a single node).
In the smp queue you can request several cores: ncpus=24 and mpiprocs=
If you find your interactive session timing out too soon then add -l walltime=4:0:0
to the above command line to request the maximum 4 hours.