CMC User guide

This guide is intended for CHPC staff and special users.

Overview

The CMC cluster is built on Dell servers. The system consists of a PowerEdge M1000e chassis with 16 M6220 compute nodes. Each compute node has 12 cores and 36 GB of memory. There are 12 management nodes (a mix of PowerEdge R330s, R430s and R630s), which include the NFS and Lustre management servers. All servers are interconnected through QDR 40 Gb/s InfiniBand.

Logging In

Domain names are not yet configured on the CHPC network, so log in using the IP addresses of the login nodes:
login01 [10.128.24.153]
login02 [10.128.24.133]

ssh username@10.128.24.153

It is advisable to change the default password with the passwd command as soon as possible.
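
Because the login nodes are reachable only by IP address, it can be convenient to give them short names on your own machine. The following is a minimal sketch, not an official CHPC setup: the host aliases cmc-login01 and cmc-login02 and the CMC_USER variable are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Login node addresses, taken from the list above.
LOGIN01=10.128.24.153
LOGIN02=10.128.24.133

# Placeholder account name; set CMC_USER to your real username.
CMC_USER="${CMC_USER:-username}"

# Print OpenSSH client configuration stanzas for both login nodes.
# The aliases cmc-login01 and cmc-login02 are hypothetical.
cmc_ssh_config() {
    printf 'Host cmc-login01\n    HostName %s\n    User %s\n\n' "$LOGIN01" "$CMC_USER"
    printf 'Host cmc-login02\n    HostName %s\n    User %s\n' "$LOGIN02" "$CMC_USER"
}

cmc_ssh_config
```

Appending this output to ~/.ssh/config on your workstation (cmc_ssh_config >> ~/.ssh/config) would let you log in with ssh cmc-login01.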

Shared Filesystems

The new cluster provides both NFS and Lustre filesystems over InfiniBand:

Mount point          File System   Size
/home                NFS           2.5 TB
/mnt/lustre/users    Lustre        9.8 TB
/apps                NFS           1.5 TB
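
To see how much space is currently used on each shared filesystem, the standard df command can be pointed at the mount points listed above. This is a small sketch; the function name check_cmc_filesystems is made up for illustration, and the script simply reports when a mount point does not exist on the host where it is run.

```shell
#!/bin/sh
# Report size and usage of each shared filesystem from the table above.
check_cmc_filesystems() {
    for fs in /home /mnt/lustre/users /apps; do
        echo "== $fs =="
        if [ -d "$fs" ]; then
            df -h "$fs"
        else
            echo "not available on this host"
        fi
    done
}

check_cmc_filesystems
```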

Software

Software resides in /apps, which is an NFS file system mounted on all nodes:

/apps/        Description                                                     Comment
chpc/         Application codes supported by CHPC                             (See below)
compilers/    Compilers, other programming languages and development tools
libs/         Libraries
scripts/      Modules and other environment setup scripts
tools/        Miscellaneous software tools
user/         Code installed by a special user research programme

Application Codes Scientific Domains

/apps/chpc/   Scientific Domain
astro/        Astrophysics & Cosmology
bio/          Bioinformatics
chem/         Chemistry
compmech/     Mechanics
cs/           Computer Science
earth/        Earth
image/        Image Processing
material/     Material Science
phys/         Physics
space/        Space

NB: Not all applications available on Lengau are installed on the CMC. Users are expected to install their codes in the appropriate directory when evaluating or testing a code.

Modules

CHPC uses the GNU modules utility, which manipulates your environment, to provide access to the supported software in /apps/.

Each of the major CHPC applications has a modulefile that sets, unsets, appends to, or prepends to environment variables such as $PATH, $LD_LIBRARY_PATH, $INCLUDE, $MANPATH for the specific application. Each modulefile also sets functions or aliases for use with the application. You need only to invoke a single command to configure the application/programming environment properly. The general format of this command is:

module load <module_name>

where <module_name> is the name of the module to load. The module command also supports Tab-key completion of its parameters.

For a list of available modules:

module avail

The module command may be abbreviated and optionally given a search term, e.g.:

module ava chpc/open

To see a synopsis of a particular modulefile's operations:

module help <module_name>

To see currently loaded modules:

module list

To remove a module:

module unload <module_name>

To remove all modules:

module purge

To search for a module name or part of a name:

module-search partname

After upgrades of software in /apps/, new modulefiles are created to reflect the changes made to the environment variables.

Disclaimer: Codes in /apps/user/ are not supported by the CHPC and the TE for each research programme is required to create the appropriate module file or startup script.
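
For codes under /apps/user/ that have no modulefile, a startup script can make the same kind of environment changes a modulefile would. The following is a minimal sketch only, assuming a hypothetical installation prefix /apps/user/myprogramme/mycode/1.0 with the conventional bin and lib subdirectories; the file name mycode_env.sh and the MYCODE_ROOT variable are likewise invented for illustration.

```shell
# mycode_env.sh: source this file rather than executing it, e.g.
#   . /apps/user/myprogramme/mycode/1.0/mycode_env.sh
# MYCODE_ROOT is a hypothetical installation prefix; adjust it to your code.
MYCODE_ROOT=/apps/user/myprogramme/mycode/1.0
export MYCODE_ROOT

# Prepend the code's directories, as a modulefile would.
# The ${VAR:+:$VAR} form avoids a stray ':' when VAR is unset or empty.
export PATH="$MYCODE_ROOT/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$MYCODE_ROOT/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Because the script only prepends to the existing values, it can be sourced after any modules the code depends on have been loaded.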

Compilers

Supported compilers for C, C++ and Fortran are found in /apps/compilers along with interpreters for programming languages like Python.

For MPI programmes, the appropriate library and mpi* compile scripts are also available.

GNU Compiler Collection

The default gcc compiler is 6.1.0:

login2:~$ which gcc
/cm/local/apps/gcc/6.1.0/bin/gcc
login2:~$ gcc --version
gcc (GCC) 6.1.0

To use any other version of gcc you need to remove 6.1.0 from all paths with

module purge

before loading any other modules.

The recommended combination of compiler and MPI library is GCC 5.1.0 with OpenMPI 1.8.8, accessed by loading both modules:

module purge
module add gcc/5.1.0
module add chpc/openmpi/1.8.8/gcc-5.1.0
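
With these modules loaded, an MPI programme is compiled through the mpi* wrapper scripts. The sketch below writes a small test programme and builds it with mpicc; the file name hello_mpi.c is hypothetical, and the guards around module and mpicc are there only so the script degrades gracefully on a machine without the cluster environment. Run it from a login shell on the cluster so that the module command is defined.

```shell
#!/bin/bash
# Load the recommended toolchain when the module command is available
# (it is normally defined as a shell function on the cluster).
if command -v module >/dev/null 2>&1; then
    module purge
    module add gcc/5.1.0
    module add chpc/openmpi/1.8.8/gcc-5.1.0
fi

# Write a minimal MPI test programme; hello_mpi.c is a hypothetical file name.
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# Compile with the MPI wrapper, which calls the underlying C compiler.
if command -v mpicc >/dev/null 2>&1; then
    mpicc -O2 -o hello_mpi hello_mpi.c
else
    echo "mpicc not found: load the compiler and MPI modules first" >&2
fi
```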

Scheduler

The CHPC cluster uses PBS Pro as its job scheduler. With the exception of interactive jobs, all jobs are submitted to a batch queuing system and execute only when the requested resources become available. All batch jobs are queued according to priority. A user's priority is not static: the CHPC uses the “Fairshare” facility of PBS Pro to adjust priority based on activity. This ensures that the finite resources of the CHPC cluster are shared fairly amongst all users.

Queues

The workq queue should no longer be used.

The available queues are:

Queue Name   Max. cores   Min. cores   Max. jobs   Max. jobs   Max. time   Notes                                   Access
             per job      per job      in queue    running     (hrs)
serial       12           1            ???         ???         48          For single-node non-parallel jobs.
smp          12           1            20          10          96          For single-node parallel jobs.
normal       48           24           20          10          48          The standard queue for parallel jobs.
large        72           48           10          2           96          For large parallel runs.                Restricted
bigmem       72           48           10          2           48          For large memory parallel runs.         Restricted
test         12           1            1           1           3           Normal nodes, for testing only.

Notes:

  • A standard compute node has 12 cores and 36 GiB of memory (RAM).
  • Additional restrictions:

    Queue Name   Max. total simultaneous running cores
    normal       48
    large        72
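
Before submitting, it can help to check a planned request against the per-job limits in the queue table. The sketch below hard-codes the core and time limits from that table; the function names queue_limits and check_request are invented for illustration, and the scheduler's own configuration remains the authority if the limits change.

```shell
#!/bin/sh
# Print "max_cores min_cores max_hours" for a queue, copied from the table above.
queue_limits() {
    case "$1" in
        serial) echo "12 1 48" ;;
        smp)    echo "12 1 96" ;;
        normal) echo "48 24 48" ;;
        large)  echo "72 48 96" ;;
        bigmem) echo "72 48 48" ;;
        test)   echo "12 1 3" ;;
        *)      return 1 ;;
    esac
}

# check_request QUEUE CORES HOURS: succeed only if the request fits the queue.
check_request() {
    queue=$1 cores=$2 hours=$3
    limits=$(queue_limits "$queue") || { echo "unknown queue: $queue" >&2; return 1; }
    set -- $limits              # $1=max cores, $2=min cores, $3=max hours
    [ "$cores" -ge "$2" ] && [ "$cores" -le "$1" ] && [ "$hours" -le "$3" ]
}

check_request normal 48 24 && echo "normal: 48 cores for 24 h fits"
check_request normal 12 24 || echo "normal: 12 cores is below the minimum of 24"
check_request smp 12 96    && echo "smp: 12 cores for 96 h fits"
```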

PBS Pro commands

qstat   View queued jobs.
qsub    Submit a job to the scheduler.
qdel    Delete one of your jobs from the queue.

Job script parameters

Parameters for any job submission are specified as #PBS comments in the job script file or as options to the qsub command. The essential options for the CHPC cluster include:

 -l select=10:ncpus=12:mpiprocs=12

sets the size of the job in number of processors:

select=N     number of nodes needed
ncpus=N      number of cores per node
mpiprocs=N   number of MPI ranks (processes) per node

 -l walltime=4:00:00

sets the total expected wall clock time in hours:minutes:seconds. Note the wall clock limits for each queue.
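
The select statement can be derived from the total number of cores a job needs, given that a standard compute node has 12 cores. A minimal sketch, assuming whole nodes are always requested with one MPI rank per core; the function name pbs_select is hypothetical.

```shell
#!/bin/sh
CORES_PER_NODE=12    # cores in a standard compute node

# pbs_select TOTAL_CORES: print a select statement using whole nodes.
pbs_select() {
    total=$1
    # Round up to whole nodes with integer arithmetic.
    nodes=$(( (total + CORES_PER_NODE - 1) / CORES_PER_NODE ))
    echo "select=${nodes}:ncpus=${CORES_PER_NODE}:mpiprocs=${CORES_PER_NODE}"
}

pbs_select 48    # prints select=4:ncpus=12:mpiprocs=12
pbs_select 30    # prints select=3:ncpus=12:mpiprocs=12
```

Note that rounding up to whole nodes means a request for 30 cores is turned into 3 nodes, that is 36 cores.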

 -q normal

specifies the queue. The job size and wall clock time must be within the limits imposed on the queue used.
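
Putting these parameters together, a complete job script might look like the sketch below. It is an illustration rather than a CHPC-supplied template: the file name cmc_job.pbs, the job name and the executable ./hello_mpi are hypothetical, and the request of 4 nodes with 12 ranks each was chosen to fit the 48-core limit of the normal queue. The outer script writes the job file and submits it only where qsub is available.

```shell
#!/bin/sh
# Write an example job script; cmc_job.pbs and hello_mpi are hypothetical names.
cat > cmc_job.pbs <<'EOF'
#!/bin/bash
#PBS -N hello_mpi
#PBS -q normal
#PBS -l select=4:ncpus=12:mpiprocs=12
#PBS -l walltime=4:00:00

# PBS starts the job in the home directory; move to where qsub was run.
cd "$PBS_O_WORKDIR"

module purge
module add gcc/5.1.0
module add chpc/openmpi/1.8.8/gcc-5.1.0

# 4 nodes x 12 MPI ranks per node = 48 ranks
mpirun -np 48 ./hello_mpi
EOF

# Submit it when run on a login node; elsewhere just report the file.
if command -v qsub >/dev/null 2>&1; then
    qsub cmc_job.pbs
else
    echo "qsub not found; wrote cmc_job.pbs without submitting"
fi
```

On submission, qsub prints the job identifier, which can then be passed to qstat to check the job or to qdel to remove it.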

Restricted queues

The large and bigmem queues are restricted to users who need them. If you are granted access to these queues, you must specify that you are a member of the largeq or bigmemq group. For example:

#PBS -q large
#PBS -W group_list=largeq

or

#PBS -q bigmem
#PBS -W group_list=bigmemq
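
Before using a restricted queue, you may want to confirm that your account has actually been added to the corresponding group. The sketch below assumes that largeq and bigmemq are ordinary Unix groups visible on the login node, which this guide does not state explicitly; the function name in_group is hypothetical.

```shell
#!/bin/sh
# in_group NAME: succeed if the current user belongs to the Unix group NAME.
in_group() {
    id -Gn | tr ' ' '\n' | grep -qxF -- "$1"
}

# Assumption: the restricted-queue groups are regular Unix groups.
for grp in largeq bigmemq; do
    if in_group "$grp"; then
        echo "member of $grp"
    else
        echo "not a member of $grp"
    fi
done
```

If the check reports that you are not a member, contact CHPC staff rather than submitting to the restricted queue.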

Last modified: 2018/07/27 17:59 by smasoka