CHPC Hardware

In Week 2 of the Supercomputing MOOC you are introduced to ARCHER, the EPCC's supercomputer. In this section we describe the CHPC's supercomputer: Lengau.

Lengau

The main compute resource at the CHPC is the Lengau cluster supercomputer, which provides a total of 32 832 cores in Intel Xeon CPUs and over 140 TiB of RAM altogether.

As a cluster supercomputer, Lengau is a distributed-memory parallel computer comprising 1368 general compute nodes linked by an Infiniband network as its interconnect.

The components of a cluster are:

  • nodes
  • interconnect
  • storage
  • operating system
  • management software

Nodes

32 832 cores

The main body of the cluster comprises 1368 compute nodes, of which 1008 nodes have 128 GiB of RAM and 360 nodes have 64 GiB of RAM.

A compute node is the individual general purpose computer that is responsible for the computational work of the cluster. All the compute nodes are connected together by the interconnect: a high speed network. Working together by communicating over the interconnect, several compute nodes can be used to calculate a numerical solution to a problem too large for a single computer.

The compute nodes on Lengau are Dell server blades, which are packed 24 at a time into a chassis. Three chassis are stacked together in one rack. A rack is the large, fridge-like, free-standing cabinet that houses the equipment making up the cluster.

Multiplying it all out:

one rack = 3 × chassis = 3 × (24 blades) = 3 × 24 × (two 12-core CPUs) = 3 × 24 × 24 cores = 3 × 576 cores = 1728 cores.

There are 11 racks with 128 GiB of RAM and 8 racks with 64 GiB, giving the total:

19 × 1728 cores = 32 832 cores.
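The multiplication above can be checked with a little shell arithmetic (all figures are taken from the text):

```shell
# Rebuild the core count from the hardware figures quoted above.
cores_per_cpu=12
cpus_per_blade=2
blades_per_chassis=24
chassis_per_rack=3
racks=19

cores_per_rack=$(( cores_per_cpu * cpus_per_blade * blades_per_chassis * chassis_per_rack ))
total_cores=$(( cores_per_rack * racks ))

echo "cores per rack: $cores_per_rack"   # → cores per rack: 1728
echo "total cores:    $total_cores"      # → total cores:    32832
```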

Benchmark:

Lengau was commissioned in June 2016 and, before any users were added to the system, the HPL (High Performance LINPACK) benchmark used by the Top500 list was run. The result (using just the general compute nodes) was:

Rpeak = 1.307 PFlops
Rmax = 1.029 PFlops

Other nodes:

In addition to the general compute nodes, Lengau also has special-purpose nodes. These are divided into:

  • large memory compute nodes
  • GPU compute nodes
  • interactive login nodes
  • interactive visualisation nodes
  • interactive file transfer nodes
  • batch file transfer nodes
  • management nodes
  • storage nodes

These are described in separate notes on the course page.

Interconnect

Lengau actually has three networks connecting all the nodes together:

  • the main interconnect is Infiniband,
  • the second interconnect is an Ethernet network,
  • and there is a management network.

Infiniband

The Infiniband interconnect consists of a network of leaf switches connected to the main core switch. Each chassis connects its 24 blades to an Infiniband leaf switch. Thus there are three leaf switches in a rack. Each leaf switch has 12 connections to the cluster's core switch. Since each leaf switch has 24 links down to the blades but only 12 links up to the core, this results in a 2:1 oversubscribed tree network topology.
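The link counts above determine both the number of leaf switches and the oversubscription ratio; both follow directly from the figures in the text:

```shell
blades_per_chassis=24   # downlinks into each leaf switch
uplinks_per_leaf=12     # links from each leaf switch to the core switch
leaves_per_rack=3
racks=19

echo "leaf switches: $(( leaves_per_rack * racks ))"                       # → leaf switches: 57
echo "oversubscription: $(( blades_per_chassis / uplinks_per_leaf )):1"    # → oversubscription: 2:1
```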

Lengau uses FDR Infiniband running with a bandwidth of 56 Gbps and a latency of 5 μs.

Ethernet

All nodes are also connected to a separate Ethernet network running at 1 Gbps with latency of 50 μs.
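A rough way to see what these bandwidth and latency figures mean is to estimate the time to send a message as latency + size / bandwidth. A sketch, assuming a 1 MiB message and ignoring protocol overheads:

```shell
# Estimated time to send 1 MiB over each network:
#   time ≈ latency + message_size / bandwidth
awk 'BEGIN {
    bits = 1024 * 1024 * 8              # 1 MiB message, in bits
    ib  = 5e-6  + bits / 56e9           # Infiniband: 56 Gbps, 5 us latency
    eth = 50e-6 + bits / 1e9            # Ethernet:    1 Gbps, 50 us latency
    printf "Infiniband: %.1f us\n", ib  * 1e6   # → Infiniband: 154.8 us
    printf "Ethernet:   %.1f us\n", eth * 1e6   # → Ethernet:   8438.6 us
}'
```

For messages this size the bandwidth term dominates, so Infiniband is roughly 50 times faster; for very small messages the latency difference (5 μs vs 50 μs) dominates instead.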

Management

A special separate management network is used to control the hardware at a physical level: it is mostly used for power control and administration. This network is invisible to the users.

Storage

The third component of a parallel distributed memory super computer is its storage. Lengau has two storage systems that are connected over the two networks.

Lustre Parallel File System

is connected over the Infiniband network and provides high-speed parallel file I/O. Lustre comprises two Metadata Servers (MDS), 16 Object Storage Servers (OSS) and 96 Object Storage Targets (OST).

The MDS provides the file metadata; the OSSs store the actual file contents on the OSTs. Each OST is a disk array (using RAID).

In total there is 4 PiB of storage space in Lengau's Lustre file system.
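Dividing the quoted capacity over the OSTs gives a feel for the scale of each disk array (a rough average only; the actual usable capacity per OST will differ):

```shell
total_pib=4
osts=96

total_tib=$(( total_pib * 1024 ))
echo "total: ${total_tib} TiB"       # → total: 4096 TiB

# Average capacity per Object Storage Target:
awk -v t="$total_tib" -v n="$osts" 'BEGIN { printf "per OST: about %.1f TiB\n", t / n }'
# → per OST: about 42.7 TiB
```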

NFS

is used for home directories and for the software in /apps. It is much smaller than Lustre, and slower, since it is a serial (non-parallel) file system.

Operating System

All nodes run the CentOS Linux operating system. For this reason it is essential that you are familiar with the Linux shell and its command line interface (CLI). This is covered in the next section of this course.

Cluster Management Software

The scheduler on the Lengau cluster is Altair's PBS Professional (PBSPro). It will be a major part of the first week of this course.
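To give a flavour of what is coming, a minimal PBSPro batch script might look like the following. The queue name, project code, program name and core counts here are made-up placeholders for illustration, not actual CHPC settings:

```shell
#!/bin/bash
#PBS -N example_job                    # job name
#PBS -q normal                         # queue name (placeholder)
#PBS -P PROJ0101                       # project/account code (placeholder)
#PBS -l select=2:ncpus=24:mpiprocs=24  # 2 nodes, using all 24 cores on each
#PBS -l walltime=01:00:00              # request a 1 hour time limit

cd "$PBS_O_WORKDIR"                    # start in the directory the job was submitted from
mpirun -np 48 ./my_program             # my_program is a placeholder executable
```

Such a script is handed to the scheduler with qsub, which queues the job until the requested nodes become free.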

/app/dokuwiki/data/pages/workshops/hpcschool/chpchardware.txt · Last modified: 2023/03/01 12:21 by wikiadmin