This page refers to the discontinued SUN cluster. See the page Gromacs How To for the new cluster.
This page describes how to use Gromacs 4.6.5 on the GPU nodes of sun.chpc.ac.za.
To use Gromacs add the following lines to your PBS job script:
source /etc/profile.d/modules.sh
module add /opt/gridware/applications/gpu/modules/gromacs/4.6.5
In this section we describe how to run a simple Gromacs job. Run this test case to verify the installation is working correctly.
Retrieve the files from the installation directory:
cp -r /opt/gridware/applications/gpu/gromacs/4.6.5/chpc/test .
cd test
Optionally, execute the following sequence of commands to configure the test interactively on a GPU node:
qsub -I -q kepla_k20
cd test
grompp -f pme_verlet_vsites.mdp -c conf.gro -p topol.top -o adh_cubic_vsites_pme.tpr
logout
Below is the PBS job script for executing the job on two of the six available GPU nodes:
#!/bin/bash
#PBS -N test_gromacs
#PBS -l select=2:ncpus=24:mpiprocs=2
#PBS -l walltime=12:00:00
#PBS -j oe
#PBS -q kepla_k20

source /etc/profile.d/modules.sh
module add /opt/gridware/applications/gpu/modules/gromacs/4.6.5

cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=12
mpirun mdrun -s adh_cubic_vsites_pme.tpr -nb gpu_cpu -gpu_id 00
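The resource request in the job script can be sanity-checked with a little arithmetic. The sketch below is not part of the CHPC installation; the variable names are illustrative and the values are copied by hand from the script. It assumes that mpirun starts select × mpiprocs ranks in total, which agrees with the "Using 4 MPI processes" line in the sample output further down.

```shell
#!/bin/bash
# Values copied by hand from the PBS job script (variable names are illustrative).
nodes=2          # select=2
ncpus=24         # ncpus=24, cores requested per node
mpiprocs=2       # mpiprocs=2, MPI ranks per node
omp_threads=12   # OMP_NUM_THREADS=12, OpenMP threads per MPI rank

# Assumption: mpirun launches one rank per requested mpiproc on every node.
total_ranks=$(( nodes * mpiprocs ))
threads_per_node=$(( mpiprocs * omp_threads ))

echo "MPI ranks in total: $total_ranks"
echo "Threads per node:   $threads_per_node (cores requested: $ncpus)"

# Warn when the threads on a node would exceed the cores requested for it.
if [ "$threads_per_node" -gt "$ncpus" ]; then
    echo "Warning: more threads than requested cores per node" >&2
fi
```

With the values from the script, the 2 ranks × 12 threads on each node exactly fill the 24 requested cores.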
NOTE: The -s flag above specifies the input file to be run. The -nb flag specifies that the non-bonded force calculations should be performed on a hybrid GPU-CPU combination, which currently yields the best performance. You can optionally set this to gpu if you prefer to perform the non-bonded force calculations exclusively on the GPU or to cpu if you prefer that all calculations are run on the CPU. The parameter to the -gpu_id flag denotes the index of the GPU device each MPI process should use. In the above case, there is one GPU device per node and two MPI processes per node. So we specify that each MPI process use the first GPU device by setting the -gpu_id string to 00.
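The -gpu_id string holds one digit per MPI process on a node, so it can be built mechanically for other layouts. The following is a minimal sketch only: make_gpu_id is a hypothetical helper, not part of Gromacs or the cluster software. It assumes fewer than ten GPUs per node (so each id is a single digit) and places consecutive ranks on the same GPU, which is one reasonable mapping rather than a requirement.

```shell
#!/bin/bash
# make_gpu_id RANKS_PER_NODE GPUS_PER_NODE
# Hypothetical helper: prints a -gpu_id string with one digit per MPI rank
# on a node. Assumes GPUS_PER_NODE < 10 so every id is a single digit.
make_gpu_id() {
    local ranks_per_node=$1 gpus_per_node=$2
    local id="" rank
    for (( rank = 0; rank < ranks_per_node; rank++ )); do
        # Integer division spreads the ranks evenly over the GPUs,
        # keeping consecutive ranks on the same device.
        id+=$(( rank * gpus_per_node / ranks_per_node ))
    done
    echo "$id"
}

make_gpu_id 2 1   # the layout used on this page: prints 00
make_gpu_id 4 2   # four ranks sharing two GPUs: prints 0011
```

The result could then be passed to mdrun as, for example, -gpu_id "$(make_gpu_id 2 1)".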
Save the job script above as gromacs.pbs, then submit the job using the following command:
qsub gromacs.pbs
Your output should look similar to the following:
:-) G R O M A C S (-:

Gromacs Runs On Most of All Computer Systems

:-) VERSION 4.6.5 (-:

Contributions from Mark Abraham, Emile Apol, Rossen Apostolov,
Herman J.C. Berendsen, Aldert van Buuren, Pär Bjelkmar,
Rudi van Drunen, Anton Feenstra, Gerrit Groenhof, Christoph Junghans,
Peter Kasson, Carsten Kutzner, Per Larsson, Pieter Meulenhoff,
Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz,
Michael Shirts, Alfons Sijbers, Peter Tieleman,

Berk Hess, David van der Spoel, and Erik Lindahl.

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2012,2013, The GROMACS development team at
Uppsala University & The Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

:-) mdrun (-:

Option     Filename                  Type                Description
------------------------------------------------------------
-s         adh_cubic_vsites_pme.tpr  Input               Run input file: tpr tpb tpa
-o         traj.trr                  Output              Full precision trajectory: trr trj cpt
-x         traj.xtc                  Output, Opt.        Compressed trajectory (portable xdr format)
-cpi       state.cpt                 Input, Opt.         Checkpoint file
-cpo       state.cpt                 Output, Opt.        Checkpoint file
-c         confout.gro               Output              Structure file: gro g96 pdb etc.
-e         ener.edr                  Output              Energy file
-g         md.log                    Output              Log file
-dhdl      dhdl.xvg                  Output, Opt.        xvgr/xmgr file
-field     field.xvg                 Output, Opt.        xvgr/xmgr file
-table     table.xvg                 Input, Opt.         xvgr/xmgr file
-tabletf   tabletf.xvg               Input, Opt.         xvgr/xmgr file
-tablep    tablep.xvg                Input, Opt.         xvgr/xmgr file
-tableb    table.xvg                 Input, Opt.         xvgr/xmgr file
-rerun     rerun.xtc                 Input, Opt.         Trajectory: xtc trr trj gro g96 pdb cpt
-tpi       tpi.xvg                   Output, Opt.        xvgr/xmgr file
-tpid      tpidist.xvg               Output, Opt.        xvgr/xmgr file
-ei        sam.edi                   Input, Opt.         ED sampling input
-eo        edsam.xvg                 Output, Opt.        xvgr/xmgr file
-j         wham.gct                  Input, Opt.         General coupling stuff
-jo        bam.gct                   Output, Opt.        General coupling stuff
-ffout     gct.xvg                   Output, Opt.        xvgr/xmgr file
-devout    deviatie.xvg              Output, Opt.        xvgr/xmgr file
-runav     runaver.xvg               Output, Opt.        xvgr/xmgr file
-px        pullx.xvg                 Output, Opt.        xvgr/xmgr file
-pf        pullf.xvg                 Output, Opt.        xvgr/xmgr file
-ro        rotation.xvg              Output, Opt.        xvgr/xmgr file
-ra        rotangles.log             Output, Opt.        Log file
-rs        rotslabs.log              Output, Opt.        Log file
-rt        rottorque.log             Output, Opt.        Log file
-mtx       nm.mtx                    Output, Opt.        Hessian matrix
-dn        dipole.ndx                Output, Opt.        Index file
-multidir  rundir                    Input, Opt., Mult.  Run directory
-membed    membed.dat                Input, Opt.         Generic data file
-mp        membed.top                Input, Opt.         Topology file
-mn        membed.ndx                Input, Opt.         Index file

Option           Type    Value       Description
------------------------------------------------------
-[no]h           bool    no          Print help info and quit
-[no]version     bool    no          Print version info and quit
-nice            int     0           Set the nicelevel
-deffnm          string              Set the default filename for all file options
-xvg             enum    xmgrace     xvg plot formatting: xmgrace, xmgr or none
-[no]pd          bool    no          Use particle decompostion
-dd              vector  0 0 0       Domain decomposition grid, 0 is optimize
-ddorder         enum    interleave  DD node order: interleave, pp_pme or cartesian
-npme            int     -1          Number of separate nodes to be used for PME, -1 is guess
-nt              int     0           Total number of threads to start (0 is guess)
-ntmpi           int     0           Number of thread-MPI threads to start (0 is guess)
-ntomp           int     0           Number of OpenMP threads per MPI process/thread to start (0 is guess)
-ntomp_pme       int     0           Number of OpenMP threads per MPI process/thread to start (0 is -ntomp)
-pin             enum    auto        Fix threads (or processes) to specific cores: auto, on or off
-pinoffset       int     0           The starting logical core number for pinning to cores; used to avoid pinning threads from different mdrun instances to the same core
-pinstride       int     0           Pinning distance in logical cores for threads, use 0 to minimize the number of threads per physical core
-gpu_id          string  00          List of GPU device id-s to use, specifies the per-node PP rank to GPU mapping
-[no]ddcheck     bool    yes         Check for all bonded interactions with DD
-rdd             real    0           The maximum distance for bonded interactions with DD (nm), 0 is determine from initial coordinates
-rcon            real    0           Maximum distance for P-LINCS (nm), 0 is estimate
-dlb             enum    auto        Dynamic load balancing (with DD): auto, no or yes
-dds             real    0.8         Minimum allowed dlb scaling of the DD cell size
-gcom            int     -1          Global communication frequency
-nb              enum    gpu_cpu     Calculate non-bonded interactions on: auto, cpu, gpu or gpu_cpu
-[no]tunepme     bool    yes         Optimize PME load between PP/PME nodes or GPU/CPU
-[no]testverlet  bool    no          Test the Verlet non-bonded scheme
-[no]v           bool    no          Be loud and noisy
-[no]compact     bool    yes         Write a compact log file
-[no]seppot      bool    no          Write separate V and dVdl terms for each interaction type and node to the log file(s)
-pforce          real    -1          Print all forces larger than this (kJ/mol nm)
-[no]reprod      bool    no          Try to avoid optimizations that affect binary reproducibility
-cpt             real    15          Checkpoint interval (minutes)
-[no]cpnum       bool    no          Keep and number checkpoint files
-[no]append      bool    yes         Append to previous output files when continuing from checkpoint instead of adding the simulation part number to all file names
-nsteps          step    -2          Run this number of steps, overrides .mdp file option
-maxh            real    -1          Terminate after 0.99 times this time (hours)
-multi           int     0           Do multiple simulations in parallel
-replex          int     0           Attempt replica exchange periodically with this period (steps)
-nex             int     0           Number of random exchanges to carry out each exchange interval (N^3 is one suggestion). -nex zero or not specified gives neighbor replica exchange.
-reseed          int     -1          Seed for replica exchange, -1 is generate a seed
-[no]ionize      bool    no          Do a simulation including the effect of an X-Ray bombardment on your system

Reading file adh_cubic_vsites_pme.tpr, VERSION 4.6.5 (single precision)
Changing nstlist from 10 to 25, rlist from 0.935 to 1.024

The number of OpenMP threads was set by environment variable OMP_NUM_THREADS to 12
Using 4 MPI processes
Using 12 OpenMP threads per MPI process

1 GPU detected on host cnode-9-31:
  #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible

1 GPU user-selected for this run.
Mapping of GPUs to the 2 PP ranks in this node: #0, #0

NOTE: You assigned a GPU to multiple MPI processes.

starting mdrun 'NADP-DEPENDENT ALCOHOL DEHYDROGENASE in water'
10000 steps, 50.0 ps.

Writing final coordinates.

Average load imbalance: 14.1 %
Part of the total run time spent waiting due to load imbalance: 3.6 %

NOTE: The GPU has >20% more load than the CPU. This imbalance causes
performance loss, consider using a shorter cut-off and a finer PME grid.

               Core t (s)   Wall t (s)        (%)
       Time:     4564.670       95.654     4772.1
                 (ns/day)    (hour/ns)
Performance:       45.167        0.531

gcq#230: "She Needs Cash to Buy Aspirine For Her Pain" (LIVE)
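To compare runs, for example after switching -nb between gpu_cpu, gpu and cpu, it can help to pull the ns/day figure out of the output automatically. The sketch below assumes the output file contains a line whose first field is "Performance:" followed by the ns/day value, as in the sample output; ns_per_day is a hypothetical helper name, and the demonstration uses only the two performance lines shown above.

```shell
#!/bin/bash
# ns_per_day FILE
# Hypothetical helper: prints the ns/day value from mdrun output.
# Assumption: FILE has a line whose first field is "Performance:" and whose
# second field is the ns/day figure, as in the sample output.
ns_per_day() {
    awk '$1 == "Performance:" { print $2 }' "$1"
}

# Demonstration on the performance lines copied from the sample output.
sample=$(mktemp)
printf '%s\n' \
    '                 (ns/day)    (hour/ns)' \
    'Performance:       45.167        0.531' > "$sample"
ns_per_day "$sample"   # prints 45.167
rm -f "$sample"
```

On a real run the function would be pointed at the job's output file or at md.log, the default log file name listed in the mdrun option table.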