This guide is intended for experienced HPC users and provides a summary of the essential components of the systems available at the CHPC. For more detailed information on the subjects below see the full User Guide.
NOTE: the new system is still under construction and information here and in the User Guide is incomplete and subject to sudden change.
The CHPC's brand new Dell Linux cluster is up and running.
The new system is a homogeneous cluster built on Intel 5th-generation CPUs. As of February 2016 it has 1008 compute nodes, each with 24 cores and 128 GiB of memory, and five large-memory “fat” nodes with 56 cores and 1 TiB of memory each, all interconnected with FDR (56 Gb/s) InfiniBand and accessing 4 PB of shared storage over the Lustre file system.
To connect to the new system, ssh to lengau.chpc.ac.za and log in using the username and password sent to you by the CHPC:
ssh username@lengau.chpc.ac.za
The new system is running CentOS 7.0 and uses the Bash shell by default.
You should change your password after logging in the first time.
To change your password, use the passwd command. The rules are: at least 10 characters, with at least one of each of the following character types: upper and lower case letters, numbers, and special characters. Use ssh keys wherever possible.
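For example, a minimal sketch of setting up key-based login from a Linux or Mac workstation using standard OpenSSH tools (replace username with the username sent to you):
ssh-keygen -t rsa -b 4096
ssh-copy-id username@lengau.chpc.ac.za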
The new cluster has both NFS and Lustre file systems, served over InfiniBand:
| Mount point | File system | Size | Quota | Backup | Access |
| --- | --- | --- | --- | --- | --- |
| /home | NFS | 80 TB | 15 GB | Yes | Yes |
| /mnt/lustre/users | Lustre | 4 PB | none | No | Yes |
| | Lustre | 1 PB | none | No | 2015 users only |
| | NFS | 20 TB | none | Yes | On request |
| | Lustre | 1 PB | none | No | On request only |
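As a quick, generic way to see how much of the 15 GB home-directory quota you are using, the standard Linux du command works on any of these file systems:
du -sh $HOME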
Software resides in /apps, which is an NFS file system mounted on all nodes:
| Directory | Contents | Notes |
| --- | --- | --- |
| /apps/chpc | Application codes supported by CHPC | (See below) |
| /apps/compilers | Compilers, other programming languages and development tools | |
| | Modules and other environment setup scripts | |
| | Miscellaneous software tools | |
| /apps/user | Code installed by a user research programme | Not supported by CHPC. |
| Scientific Domain |
| --- |
| Astrophysics & Cosmology |
| Computer Science |
| Image Processing |
| Material Science |
CHPC uses the GNU modules utility, which manipulates your environment, to provide access to the supported software in /apps.
Each of the major CHPC applications has a modulefile that sets, unsets, appends to, or prepends to environment variables such as $PATH, $LD_LIBRARY_PATH, $INCLUDE and $MANPATH for that application. Each modulefile also sets any functions or aliases the application needs. You need only invoke a single command to configure the application or programming environment properly. The general format of this command is:
module load <module_name>
where <module_name> is the name of the module to load. It also supports Tab-key completion of command parameters.
For a list of available modules:
module avail
The module command may be abbreviated and optionally given a search term, e.g.:
module ava chpc/open
To see a synopsis of a particular modulefile's operations:
module help <module_name>
To see currently loaded modules:
module list
To remove a module:
module unload <module_name>
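The standard module show subcommand is also useful: it displays exactly which environment variables a particular modulefile changes, for example:
module show chpc/openmpi/1.8.8/gcc-5.1.0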
After upgrades of software in /apps/, new modulefiles are created to reflect the changes made to the environment variables.
Disclaimer: Codes in /apps/user/ are not supported by the CHPC, and the TE for each research programme is required to create the appropriate modulefile or startup script.
Supported compilers for C, C++ and Fortran are found in /apps/compilers, along with interpreters for programming languages such as Python.
For MPI programmes, the appropriate library and mpi* compile scripts are also available.
The recommended combination of compiler and MPI library is GCC 5.1.0 with OpenMPI 1.8.8, which is accessed by loading both modules:
module add gcc/5.1.0
module add chpc/openmpi/1.8.8/gcc-5.1.0
The module for the Intel compiler and Intel MPI is loaded with
module load chpc/parallel_studio_xe/64/16.0.1/2016.1.150
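As an illustrative sketch only (hello.c and the output name hello_mpi are placeholders, not part of the CHPC software tree), compiling an MPI code with the recommended GCC/OpenMPI toolchain looks like:
module add gcc/5.1.0
module add chpc/openmpi/1.8.8/gcc-5.1.0
mpicc -O2 -o hello_mpi hello.c
Production runs of the resulting executable should be launched with mpirun inside a job submitted to the PBSPro scheduler described below.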
The CHPC cluster uses PBSPro as its job scheduler. With the exception of interactive jobs, all jobs are submitted to a batch queuing system and only execute when the requested resources become available. All batch jobs are queued according to priority. A user's priority is not static: the CHPC uses the “Fairshare” facility of PBSPro to modify priority based on activity. This is done to ensure the finite resources of the CHPC cluster are shared fairly amongst all users.
workq is no longer to be used.
The available queues are:
| Queue name | Max. cores per job | Min. cores per job | Max. jobs in queue | Max. jobs running | Max. time (hrs) | Notes | Access |
| --- | --- | --- | --- | --- | --- | --- | --- |
| serial | 24 | 1 | ??? | ??? | 48 | For single-node non-parallel jobs. | |
| smp | 24 | 1 | 20 | 10 | 96 | For single-node parallel jobs. | |
| normal | 240 | 48 | 20 | 10 | 48 | The standard queue for parallel jobs. | |
| large | 2400 | 264 | 10 | 5 | 48 | For large parallel runs. | Restricted |
| bigmem | 280 | 28 | 4 | 1 | 48 | For the large-memory (1 TiB RAM) nodes. | Restricted |
| test | 24 | 1 | 1 | 1 | 3 | Normal nodes, for testing only. | |
The large and bigmem queues are restricted and available by special application only.
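For example, since the normal queue requires at least 48 cores per job and each compute node has 24 cores, the smallest possible normal-queue request asks for two full nodes (an illustrative fragment, not a complete job script):
#PBS -q normal
#PBS -l select=2:ncpus=24:mpiprocs=24:nodetype=haswell_reg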
Each queue also has a limit on the maximum total number of simultaneously running cores.
The basic PBSPro scheduler commands are:

| Command | Description |
| --- | --- |
| qstat | View queued jobs. |
| qsub | Submit a job to the scheduler. |
| qdel | Delete one of your jobs from the queue. |
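For instance, to list only your own jobs you can use the standard qstat user filter (replace USERNAME with your CHPC user name):
qstat -u USERNAME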
Parameters for any job submission are specified as #PBS comments in the job script file or as options to the qsub command. The essential options for the CHPC cluster include:
-l select=N:ncpus=C:mpiprocs=M sets the size of the job in number of processors:

| Parameter | Meaning |
| --- | --- |
| select=N | number of nodes needed |
| ncpus=C | number of cores per node |
| mpiprocs=M | number of MPI ranks (processes) per node |
-l walltime=hh:mm:ss sets the total expected wall clock time in hours:minutes:seconds. Note the wall clock limits for each queue.
The job size and wall clock time must be within the limits imposed on the queue used; use -q <queue_name> to specify the queue.
Each job will draw from the allocation of cpu-hours granted to your Research Programme:
-P PRJT1234 specifies the project identifier short name, which is needed to identify the Research Programme allocation you will draw from for this job. Ask your PI for the project short name and replace PRJT1234 with it.
For example, the following job script uses the normal queue to run WRF:
#!/bin/bash
#PBS -l select=10:ncpus=24:mpiprocs=24:nodetype=haswell_reg
#PBS -P PRJT1234
#PBS -q normal
#PBS -l walltime=4:00:00
#PBS -o /mnt/lustre/users/USERNAME/WRF_Tests/WRFV3/run2km_100/wrf.out
#PBS -e /mnt/lustre/users/USERNAME/WRF_Tests/WRFV3/run2km_100/wrf.err
#PBS -m abe
#PBS -M your.email@address
ulimit -s unlimited
# Set up the WRF environment and move to the working directory on Lustre
. /apps/chpc/earth/WRF-3.7-impi/setWRF
cd /mnt/lustre/users/USERNAME/WRF_Tests/WRFV3/run2km_100
# Remove output from any previous run
rm wrfout* rsl*
# The number of MPI ranks is the number of lines in the PBS node file
nproc=`cat $PBS_NODEFILE | wc -l`
echo nproc is $nproc
cat $PBS_NODEFILE
time mpirun -np $nproc wrf.exe > runWRF.out
Assuming the above job script is saved as the text file example.job, the command to submit it to the PBSPro scheduler is:
qsub example.job
No additional parameters are needed for the qsub command since all the PBS parameters are specified within the job script file.
Note that in the above job script example the working directory is on the Lustre file system. Do not use your home directory as the working directory of your job. Use the directory allocated to you on the fast Lustre parallel file system:
/mnt/lustre/users/USERNAME
where USERNAME is replaced by your user name on the CHPC cluster.
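To illustrate, a job directory on Lustre could be created and entered before submitting (the directory name my_run is purely illustrative):
mkdir -p /mnt/lustre/users/USERNAME/my_run
cd /mnt/lustre/users/USERNAME/my_run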
For example, to request an MPI job on one node with 12 cores per MPI rank, so that each MPI process can launch 12 OpenMP threads, change the resource request line to:
#PBS -l select=1:ncpus=24:mpiprocs=2:nodetype=haswell_reg
There are two MPI ranks, so the job is launched with mpirun -n 2 … .
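A minimal sketch of the corresponding launch commands in the job script, assuming the executable (here the placeholder name my_hybrid_app) takes its thread count from OMP_NUM_THREADS:
export OMP_NUM_THREADS=12
mpirun -n 2 ./my_hybrid_app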
To request an interactive session on a single node, the full command for qsub is:
qsub -I -P PROJ0101 -q smp -l select=1:ncpus=24:mpiprocs=24:nodetype=haswell_reg
-I selects an interactive job.
-q smp submits to the smp queue, in which you can request several cores.
If you find your interactive session timing out too soon, add -l walltime=4:0:0 to the above command line to request the maximum of 4 hours.
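Putting these together, an interactive request for a few cores and the maximum walltime might look like this (the core count of 4 is illustrative):
qsub -I -P PROJ0101 -q smp -l select=1:ncpus=4:mpiprocs=4:nodetype=haswell_reg -l walltime=4:0:0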