
Gromacs

This page describes how to use Gromacs 4.6.5 on the GPU nodes of sun.chpc.ac.za.

Basic Usage

To use Gromacs, add the following lines to your PBS job script:

source /etc/profile.d/modules.sh
module add /opt/gridware/applications/gpu/modules/gromacs/4.6.5
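After loading the module, it can be useful to confirm that the Gromacs programs are actually on your PATH. The following is a minimal sketch, not part of the installation: check_commands is a hypothetical helper name, and the program names grompp and mdrun are the ones used elsewhere on this page.

```shell
#!/bin/bash
# Hypothetical helper: report whether each named command is on PATH.
# Returns non-zero if any command is missing.
check_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# grompp and mdrun are the two Gromacs programs used on this page.
if ! check_commands grompp mdrun; then
    echo "Gromacs is not on PATH; load the module first."
fi
```

Run it in the same shell (or job script) in which the module was added, since module changes only affect the current environment.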

Test Case

This section describes how to run a simple Gromacs job. Run this test case to verify that the installation works correctly.

Download the test case

Copy the test files from the installation directory into your working directory:

cp -r /opt/gridware/applications/gpu/gromacs/4.6.5/chpc/test .
cd test
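Before continuing, you may want to check that the copy succeeded. The sketch below assumes the test directory contains the three input files named in the grompp command on this page (pme_verlet_vsites.mdp, conf.gro and topol.top); check_files is a hypothetical helper, not something shipped with the installation.

```shell
#!/bin/bash
# Hypothetical helper: print every given file that is missing or empty.
# Succeeds only if all files exist and are non-empty.
check_files() {
    local status=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            status=1
        fi
    done
    return "$status"
}

# File names taken from the grompp command on this page.
if check_files pme_verlet_vsites.mdp conf.gro topol.top; then
    echo "All test inputs are present."
else
    echo "Copy the test directory again before continuing."
fi
```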

Configuring the test case

Optionally, run the following commands to start an interactive session on a GPU node and generate the run input file (adh_cubic_vsites_pme.tpr) with grompp. The final logout ends the interactive session:

qsub -I -q kepla_k20
cd test
grompp -f pme_verlet_vsites.mdp -c conf.gro -p topol.top -o adh_cubic_vsites_pme.tpr
logout

PBS job script

Below is the PBS job script for executing the job on two of the six available GPU nodes:

gromacs.pbs
#!/bin/bash
#PBS -N test_gromacs
#PBS -l select=2:ncpus=24:mpiprocs=2
#PBS -l walltime=12:00:00
#PBS -j oe
#PBS -q kepla_k20
 
source /etc/profile.d/modules.sh
module add /opt/gridware/applications/gpu/modules/gromacs/4.6.5
 
cd $PBS_O_WORKDIR
 
export OMP_NUM_THREADS=12
 
mpirun mdrun -s adh_cubic_vsites_pme.tpr -nb gpu_cpu -gpu_id 00

NOTE: The -s flag specifies the run input file. The -nb flag selects where the non-bonded force calculations are performed: gpu_cpu uses a hybrid GPU-CPU combination, which currently yields the best performance; gpu performs them exclusively on the GPU; cpu runs all calculations on the CPU. The -gpu_id flag takes a string with one digit per MPI process on a node, and each digit is the index of the GPU device that process should use. Here each node has one GPU device (index 0) and runs two MPI processes, so the string is 00 and both processes use the first GPU device. OMP_NUM_THREADS is set to 12 so that the two MPI processes on a node together use all 24 cores.
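The arithmetic behind OMP_NUM_THREADS and the -gpu_id string can be sketched as follows. Both functions are hypothetical helpers written for illustration. The even spreading of processes over GPUs is an assumption about how you would want to map them, and the second call to make_gpu_id describes a hypothetical node with two GPUs, which is not the configuration of the nodes described here.

```shell
#!/bin/bash
# OpenMP threads per MPI process: cores on a node divided by
# the number of MPI processes on that node.
threads_per_rank() {
    local ncpus=$1 mpiprocs=$2
    echo $(( ncpus / mpiprocs ))
}

# Build a -gpu_id string: one digit per MPI process on a node,
# spreading processes evenly over the GPUs (assumption).
# Only valid for single-digit GPU indices.
make_gpu_id() {
    local ranks=$1 gpus=$2 id="" r
    for (( r = 0; r < ranks; r++ )); do
        id+=$(( r * gpus / ranks ))
    done
    echo "$id"
}

threads_per_rank 24 2   # values from the job script: ncpus=24, mpiprocs=2
make_gpu_id 2 1         # two processes sharing the single GPU on a node
make_gpu_id 4 2         # hypothetical node: four processes, two GPUs
```

With the values from the job script this prints 12 and 00, matching OMP_NUM_THREADS and the -gpu_id argument above; the hypothetical two-GPU case prints 0011.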

Submit the job

Submit the job using the following command:

qsub gromacs.pbs
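To follow the job after submission, you can capture the identifier that qsub prints and pass it to qstat. This is a sketch under assumptions: pbs_output_file is a hypothetical helper, it assumes the job identifier has the usual form of a sequence number followed by a dot and a server name, and it relies on the default PBS naming of the output file (job name, then .o, then the sequence number) together with the -N and -j oe settings in the job script.

```shell
#!/bin/bash
# Hypothetical helper: default name of the joined stdout/stderr file,
# <job name>.o<sequence number>, derived from a job identifier.
pbs_output_file() {
    local job_name=$1 job_id=$2
    echo "${job_name}.o${job_id%%.*}"   # strip everything from the first dot
}

if command -v qsub >/dev/null 2>&1; then
    job_id=$(qsub gromacs.pbs)          # qsub prints the job identifier
    qstat "$job_id"                     # show the job's current state
    echo "Output file: $(pbs_output_file test_gromacs "$job_id")"
else
    echo "qsub not found; run this on the cluster."
fi
```

The output file appears in the directory from which the job was submitted once the job has finished.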

Expected output

Your output should look similar to the following:

expected.out
                         :-)  G  R  O  M  A  C  S  (-:
 
                  Gromacs Runs On Most of All Computer Systems
 
                            :-)  VERSION 4.6.5  (-:
 
        Contributions from Mark Abraham, Emile Apol, Rossen Apostolov, 
           Herman J.C. Berendsen, Aldert van Buuren, Pär Bjelkmar,  
     Rudi van Drunen, Anton Feenstra, Gerrit Groenhof, Christoph Junghans, 
        Peter Kasson, Carsten Kutzner, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,
 
               Berk Hess, David van der Spoel, and Erik Lindahl.
 
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
         Copyright (c) 2001-2012,2013, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.
 
         This program is free software; you can redistribute it and/or
       modify it under the terms of the GNU Lesser General Public License
        as published by the Free Software Foundation; either version 2.1
             of the License, or (at your option) any later version.
 
                                :-)  mdrun  (-:
 
Option     Filename  Type         Description
------------------------------------------------------------
  -s adh_cubic_vsites_pme.tpr  Input        Run input file: tpr tpb tpa
  -o       traj.trr  Output       Full precision trajectory: trr trj cpt
  -x       traj.xtc  Output, Opt. Compressed trajectory (portable xdr format)
-cpi      state.cpt  Input, Opt.  Checkpoint file
-cpo      state.cpt  Output, Opt. Checkpoint file
  -c    confout.gro  Output       Structure file: gro g96 pdb etc.
  -e       ener.edr  Output       Energy file
  -g         md.log  Output       Log file
-dhdl      dhdl.xvg  Output, Opt. xvgr/xmgr file
-field    field.xvg  Output, Opt. xvgr/xmgr file
-table    table.xvg  Input, Opt.  xvgr/xmgr file
-tabletf    tabletf.xvg  Input, Opt.  xvgr/xmgr file
-tablep  tablep.xvg  Input, Opt.  xvgr/xmgr file
-tableb   table.xvg  Input, Opt.  xvgr/xmgr file
-rerun    rerun.xtc  Input, Opt.  Trajectory: xtc trr trj gro g96 pdb cpt
-tpi        tpi.xvg  Output, Opt. xvgr/xmgr file
-tpid   tpidist.xvg  Output, Opt. xvgr/xmgr file
 -ei        sam.edi  Input, Opt.  ED sampling input
 -eo      edsam.xvg  Output, Opt. xvgr/xmgr file
  -j       wham.gct  Input, Opt.  General coupling stuff
 -jo        bam.gct  Output, Opt. General coupling stuff
-ffout      gct.xvg  Output, Opt. xvgr/xmgr file
-devout   deviatie.xvg  Output, Opt. xvgr/xmgr file
-runav  runaver.xvg  Output, Opt. xvgr/xmgr file
 -px      pullx.xvg  Output, Opt. xvgr/xmgr file
 -pf      pullf.xvg  Output, Opt. xvgr/xmgr file
 -ro   rotation.xvg  Output, Opt. xvgr/xmgr file
 -ra  rotangles.log  Output, Opt. Log file
 -rs   rotslabs.log  Output, Opt. Log file
 -rt  rottorque.log  Output, Opt. Log file
-mtx         nm.mtx  Output, Opt. Hessian matrix
 -dn     dipole.ndx  Output, Opt. Index file
-multidir    rundir  Input, Opt., Mult. Run directory
-membed  membed.dat  Input, Opt.  Generic data file
 -mp     membed.top  Input, Opt.  Topology file
 -mn     membed.ndx  Input, Opt.  Index file
 
Option       Type   Value   Description
------------------------------------------------------
-[no]h       bool   no      Print help info and quit
-[no]version bool   no      Print version info and quit
-nice        int    0       Set the nicelevel
-deffnm      string         Set the default filename for all file options
-xvg         enum   xmgrace  xvg plot formatting: xmgrace, xmgr or none
-[no]pd      bool   no      Use particle decompostion
-dd          vector 0 0 0   Domain decomposition grid, 0 is optimize
-ddorder     enum   interleave  DD node order: interleave, pp_pme or cartesian
-npme        int    -1      Number of separate nodes to be used for PME, -1
                            is guess
-nt          int    0       Total number of threads to start (0 is guess)
-ntmpi       int    0       Number of thread-MPI threads to start (0 is guess)
-ntomp       int    0       Number of OpenMP threads per MPI process/thread
                            to start (0 is guess)
-ntomp_pme   int    0       Number of OpenMP threads per MPI process/thread
                            to start (0 is -ntomp)
-pin         enum   auto    Fix threads (or processes) to specific cores:
                            auto, on or off
-pinoffset   int    0       The starting logical core number for pinning to
                            cores; used to avoid pinning threads from
                            different mdrun instances to the same core
-pinstride   int    0       Pinning distance in logical cores for threads,
                            use 0 to minimize the number of threads per
                            physical core
-gpu_id      string 00      List of GPU device id-s to use, specifies the
                            per-node PP rank to GPU mapping
-[no]ddcheck bool   yes     Check for all bonded interactions with DD
-rdd         real   0       The maximum distance for bonded interactions with
                            DD (nm), 0 is determine from initial coordinates
-rcon        real   0       Maximum distance for P-LINCS (nm), 0 is estimate
-dlb         enum   auto    Dynamic load balancing (with DD): auto, no or yes
-dds         real   0.8     Minimum allowed dlb scaling of the DD cell size
-gcom        int    -1      Global communication frequency
-nb          enum   gpu_cpu  Calculate non-bonded interactions on: auto, cpu,
                            gpu or gpu_cpu
-[no]tunepme bool   yes     Optimize PME load between PP/PME nodes or GPU/CPU
-[no]testverlet bool   no      Test the Verlet non-bonded scheme
-[no]v       bool   no      Be loud and noisy
-[no]compact bool   yes     Write a compact log file
-[no]seppot  bool   no      Write separate V and dVdl terms for each
                            interaction type and node to the log file(s)
-pforce      real   -1      Print all forces larger than this (kJ/mol nm)
-[no]reprod  bool   no      Try to avoid optimizations that affect binary
                            reproducibility
-cpt         real   15      Checkpoint interval (minutes)
-[no]cpnum   bool   no      Keep and number checkpoint files
-[no]append  bool   yes     Append to previous output files when continuing
                            from checkpoint instead of adding the simulation
                            part number to all file names
-nsteps      step   -2      Run this number of steps, overrides .mdp file
                            option
-maxh        real   -1      Terminate after 0.99 times this time (hours)
-multi       int    0       Do multiple simulations in parallel
-replex      int    0       Attempt replica exchange periodically with this
                            period (steps)
-nex         int    0       Number of random exchanges to carry out each
                            exchange interval (N^3 is one suggestion).  -nex
                            zero or not specified gives neighbor replica
                            exchange.
-reseed      int    -1      Seed for replica exchange, -1 is generate a seed
-[no]ionize  bool   no      Do a simulation including the effect of an X-Ray
                            bombardment on your system
 
Reading file adh_cubic_vsites_pme.tpr, VERSION 4.6.5 (single precision)
Changing nstlist from 10 to 25, rlist from 0.935 to 1.024
 
The number of OpenMP threads was set by environment variable OMP_NUM_THREADS to 12
Using 4 MPI processes
Using 12 OpenMP threads per MPI process
 
1 GPU detected on host cnode-9-31:
  #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
 
1 GPU user-selected for this run.
Mapping of GPUs to the 2 PP ranks in this node: #0, #0
 
NOTE: You assigned a GPU to multiple MPI processes.
starting mdrun 'NADP-DEPENDENT ALCOHOL DEHYDROGENASE in water'
10000 steps,     50.0 ps.
 
Writing final coordinates.
 
 Average load imbalance: 14.1 %
 Part of the total run time spent waiting due to load imbalance: 3.6 %
 
 
 
NOTE: The GPU has >20% more load than the CPU. This imbalance causes
      performance loss, consider using a shorter cut-off and a finer PME grid.
 
               Core t (s)   Wall t (s)        (%)
       Time:     4564.670       95.654     4772.1
                 (ns/day)    (hour/ns)
Performance:       45.167        0.531
 
gcq#230: "She Needs Cash to Buy Aspirine For Her Pain" (LIVE)
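To compare your own run against the figures above, you can pull the ns/day value out of the job output or md.log. The sketch below is illustrative only: ns_per_day is a hypothetical helper, and the sample file simply repeats the timing lines from the expected output so that the example runs on its own.

```shell
#!/bin/bash
# Hypothetical helper: print the ns/day figure from the
# "Performance:" line of an mdrun log or job output file.
ns_per_day() {
    awk '$1 == "Performance:" { print $2 }' "$1"
}

# Sample lines copied from the expected output on this page.
sample=$(mktemp)
cat > "$sample" <<'EOF'
               Core t (s)   Wall t (s)        (%)
       Time:     4564.670       95.654     4772.1
                 (ns/day)    (hour/ns)
Performance:       45.167        0.531
EOF

rate=$(ns_per_day "$sample")
echo "ns/day:  $rate"
# hour/ns is 24 hours divided by ns/day.
awk -v r="$rate" 'BEGIN { printf "hour/ns: %.3f\n", 24 / r }'
rm -f "$sample"
```

For the sample this prints 45.167 ns/day and 0.531 hour/ns, the same two numbers reported on the Performance line. Replace the sample file with md.log or your job's output file to check a real run.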
Last modified: 2014/03/18 08:40 by swyngaard