This guide is intended for CHPC staff and special users.
The CMC cluster is built using Dell servers. The system consists of a PowerEdge M1000e chassis with 16 M6220 compute nodes. Each compute node has 12 cores and 36 GB of memory. There are 12 management nodes (a mix of PowerEdge R330, R430 and R630 servers), which include the NFS and Lustre management servers. All the servers are interconnected with QDR 40 Gb/s InfiniBand.
Domain names have not yet been configured on the CHPC network, so logging in is done using IP addresses:
login01 [10.128.24.153]
login02 [10.128.24.133]
ssh username@10.128.24.153
It is advisable to change the default password with the passwd command as soon as possible.
The new cluster has both NFS and Lustre file systems over InfiniBand:
| Mount point | File System | Size |
|---|---|---|
| /home | NFS | 2.5 TB |
| /mnt/lustre/users | Lustre | 9.8 TB |
| /apps | NFS | 1.5 TB |
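As a quick check, the availability and usage of these file systems can be inspected from a login node with standard tools, for example:

```
# Show size, usage and free space for the NFS and Lustre mounts
df -h /home /mnt/lustre/users /apps
```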
Software resides in /apps, which is an NFS file system mounted on all nodes:
| /apps/… | Description | Comment |
|---|---|---|
| chpc/ | Application codes supported by CHPC | (See below) |
| compilers/ | Compilers, other programming languages and development tools | |
| libs/ | Libraries | |
| scripts/ | Modules and other environment setup scripts | |
| tools/ | Miscellaneous software tools | |
| user/ | Code installed by a special user research programme | |
| /apps/chpc/… | Scientific Domain |
|---|---|
| astro/ | Astrophysics & Cosmology |
| bio/ | Bioinformatics |
| chem/ | Chemistry |
| compmech/ | Mechanics |
| cs/ | Computer Science |
| earth/ | Earth |
| image/ | Image Processing |
| material/ | Materials Science |
| phys/ | Physics |
| space/ | Space |
NB: Not all applications that exist on lengau exist on the CMC. Users are expected to install their codes in the appropriate directory when evaluating or testing a code.
CHPC uses the GNU modules utility, which manipulates your environment, to provide access to the supported software in /apps/.
Each of the major CHPC applications has a modulefile that sets, unsets, appends to, or prepends to environment variables such as $PATH, $LD_LIBRARY_PATH, $INCLUDE, $MANPATH for the specific application. Each modulefile also sets functions or aliases for use with the application. You need only to invoke a single command to configure the application/programming environment properly. The general format of this command is:
module load <module_name>
where <module_name> is the name of the module to load. The module command also supports Tab-key completion of its parameters.
For a list of available modules:
module avail
The module command may be abbreviated and may optionally be given a search term, e.g.:
module ava chpc/open
To see a synopsis of a particular modulefile's operations:
module help <module_name>
To see currently loaded modules:
module list
To remove a module:
module unload <module_name>
To remove all modules:
module purge
To search for a module name or part of a name:
module-search partname
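As an illustration, a typical module session might look like the following sketch (the module names are examples taken from this guide; use module avail to see what is actually installed):

```
# Start from a clean environment
module purge
# See which gcc and OpenMPI builds are available
module avail gcc
module avail chpc/openmpi
# Load a compiler and a matching MPI library (example versions)
module add gcc/5.1.0
module add chpc/openmpi/1.8.8/gcc-5.1.0
# Confirm what is currently loaded
module list
```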
After upgrades of software in /apps/, new modulefiles are created to reflect the changes made to the environment variables.
Disclaimer: Codes in /apps/user/ are not supported by the CHPC, and the TE for each research programme is required to create the appropriate module file or startup script.
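As a minimal sketch, a startup script for a code installed under /apps/user/ could simply export the relevant paths; the directory and variable names below are hypothetical and must be adapted to the actual installation:

```
#!/bin/bash
# Hypothetical environment setup for a code installed in /apps/user/
# Source this file (". setup_mycode.sh") before running the application.
MYCODE_HOME=/apps/user/myprogramme/mycode-1.0   # adjust to the real install path
export PATH=$MYCODE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$MYCODE_HOME/lib:$LD_LIBRARY_PATH
```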
Supported compilers for C, C++ and Fortran are found in /apps/compilers, along with interpreters for programming languages like Python.
For MPI programmes, the appropriate libraries and the mpi* compiler wrapper scripts are also available.
The default gcc compiler is 6.1.0:
login2:~$ which gcc
/cm/local/apps/gcc/6.1.0/bin/gcc
login2:~$ gcc --version
gcc (GCC) 6.1.0
To use any other version of gcc, you need to remove 6.1.0 from all paths with
module purge
before loading any other modules.
The recommended combination of compiler and MPI library is GCC 5.1.0 with OpenMPI 1.8.8, which is accessed by loading both modules:
module purge
module add gcc/5.1.0
module add chpc/openmpi/1.8.8/gcc-5.1.0
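With these modules loaded, the MPI compiler wrappers are on the PATH and a simple MPI program can be built, for example (source and binary names are illustrative):

```
# Compile MPI programs with the wrappers provided by the loaded OpenMPI module
mpicc  -O2 -o hello_mpi   hello_mpi.c     # C
mpif90 -O2 -o hello_mpif  hello_mpi.f90   # Fortran
```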
The CHPC cluster uses PBSPro as its job scheduler. With the exception of interactive jobs, all jobs are submitted to a batch queuing system and only execute when the requested resources become available. All batch jobs are queued according to priority. A user's priority is not static: the CHPC uses the “Fairshare” facility of PBSPro to modify priority based on activity. This is done to ensure the finite resources of the CHPC cluster are shared fairly amongst all users.
The workq queue is no longer to be used.
The available queues are:
| Queue Name | Max. cores per job | Min. cores | Max. jobs in queue | Max. jobs running | Max. time (hrs) | Notes | Access |
|---|---|---|---|---|---|---|---|
| serial | 12 | 1 | ??? | ??? | 48 | For single-node non-parallel jobs. | |
| smp | 12 | 1 | 20 | 10 | 96 | For single-node parallel jobs. | |
| normal | 48 | 24 | 20 | 10 | 48 | The standard queue for parallel jobs | |
| large | 72 | 48 | 10 | 2 | 96 | For large parallel runs | Restricted |
| bigmem | 72 | 48 | 10 | 2 | 48 | For large memory parallel runs | Restricted |
| test | 12 | 1 | 1 | 1 | 3 | Normal nodes, for testing only | |
In addition, the following limits apply to the maximum total number of simultaneously running cores:

| Queue Name | Max. total simultaneous running cores |
|---|---|
| normal | 48 |
| large | 72 |
The basic PBS commands are:

| Command | Description |
|---|---|
| qstat | View queued jobs. |
| qsub | Submit a job to the scheduler. |
| qdel | Delete one of your jobs from the queue. |
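A typical sequence is to submit a job script, check its status and, if necessary, remove it from the queue (the script name and job ID below are examples):

```
qsub myjob.pbs      # submit the job script; PBS prints the assigned job ID
qstat -u $USER      # list your own queued and running jobs
qdel 1234           # delete the job with ID 1234 from the queue
```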
Parameters for any job submission are specified as #PBS comments in the job script file or as options to the qsub command. The essential options for the CHPC cluster include:
-l select=10:ncpus=12:mpiprocs=12
sets the size of the job in number of processors:
| Parameter | Meaning |
|---|---|
| select=N | number of nodes needed |
| ncpus=N | number of cores per node |
| mpiprocs=N | number of MPI ranks (processes) per node |
-l walltime=4:00:00
sets the total expected wall clock time in hours:minutes:seconds. Note the wall clock limits for each queue.
The job size and wall clock time must be within the limits imposed on the queue used. Use
-q normal
to specify the queue.
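Putting these options together, a job script for the normal queue might look like the following sketch; the job name, paths and executable are examples and should be adapted, and the module versions are the recommended combination mentioned above:

```
#!/bin/bash
#PBS -q normal
#PBS -l select=2:ncpus=12:mpiprocs=12
#PBS -l walltime=4:00:00
#PBS -N example_job
#PBS -o /mnt/lustre/users/username/example_job.out
#PBS -e /mnt/lustre/users/username/example_job.err

# Set up the recommended compiler and MPI environment
module purge
module add gcc/5.1.0
module add chpc/openmpi/1.8.8/gcc-5.1.0

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# 2 nodes x 12 MPI ranks per node = 24 cores, within the normal queue limits
mpirun -np 24 ./my_mpi_program
```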
The large and bigmem queues are restricted to users who need them. If you are granted access to these queues, then you should specify that you are a member of the largeq or bigmemq group. For example:
#PBS -q large
#PBS -W group_list=largeq
or
#PBS -q bigmem
#PBS -W group_list=bigmemq