General Info

LAMMPS (“Large-scale Atomic/Molecular Massively Parallel Simulator”) is a molecular dynamics program from Sandia National Laboratories. LAMMPS makes use of MPI for parallel communication and is free, open-source software, distributed under the terms of the GNU General Public License (source).

Installation Guide

Setup environment:

mkdir LAMMPS
mkdir tars
vim bashrc


# Script names below assume a standard Intel Parallel Studio install
source /opt/intel/bin/compilervars.sh intel64
source /opt/intel/mkl/bin/mklvars.sh intel64
source /opt/intel/impi/4.1.0/bin64/mpivars.sh
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/lib
export C_INCLUDE_PATH=/usr/local/cuda/include
source bashrc
cd tars
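Before continuing, it is worth confirming the toolchain is actually visible. A quick sketch (icc, mpiicpc and nvcc are the compilers the builds below rely on):

```shell
# Report whether each compiler needed by the builds below is on PATH.
for tool in icc mpiicpc nvcc; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: NOT on PATH - re-check the environment above"
    fi
done
```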
Building the benchmark

Download the LAMMPS source (15 May 2015) here: lammps-stable.tar.gz

tar -xf lammps-stable.tar.gz
mv lammps-15May15 ..
cd ../lammps-15May15/

Building x86 CPU Benchmark

To build the Intel compiled LAMMPS binary, first edit the Makefile:

cd src
cp MAKE/OPTIONS/Makefile.intel_cpu MAKE/Makefile.intel_cpu
vim MAKE/Makefile.intel_cpu



Build with:

make yes-user-intel
make yes-user-omp 
make intel_cpu

The lmp_intel_cpu binary should be produced.

cp lmp_intel_cpu ../bench
Building CUDA GPU Benchmark

Build this binary using the python script:

cd ~/LAMMPS/lammps-15May15/bench/KEPLER

Edit the build script. Set the lmp_dir variable on line 21:

lmp_dir = "/home/$USER/LAMMPS/lammps-15May15"

Correct the whitespace error in line 79:

cpu = opt = omp = 0

Next, edit the CUDA Makefile:

vim Makefile.cuda


CC =            mpiicpc
CCFLAGS =       -O3 -xHost
SHFLAGS =       -fPIC
DEPFLAGS =      -M
LINK =          mpiicpc
LINKFLAGS =     -O3 -xHost
LIB =           -lstdc++
SIZE =          size

Build the binary with:

python cuda
cp lmp_cuda ../


Running the Benchmark

The stock “3d Lennard-Jones melt” test problem is used as a benchmark for this code. A fixed number of particles per core is used for the problem size.

cd ~/LAMMPS/lammps-15May15/bench

A pre-configured input script is available here cpu.tar.gz

x86 CPU Benchmark

For x86 CPU benchmarks, this is 500K particles per core. Therefore, if you wish to run the benchmark on 24 x86 cores, a total of 12,000K (500K×24) particles is required.

To run the benchmark, use:

mpirun -np <N> -hostfile <HF> ./lmp_intel_cpu -sf intel -v x <X> -v y <Y> -v z <Z> -v t 100 < in.lj

where <N> is the number of cores, <HF> is the hostfile, and <X>, <Y> and <Z> are the problem scaling factors used to reach 500K particles per core. The benchmark input is pre-configured for 500K particles, so X, Y and Z are used to scale the particle count up for larger core counts. To run on more cores, simply choose X, Y and Z such that their product equals the number of cores desired.

For example, running on 4 nodes with 24 cores each, the run command would be:

mpirun -np 96 -hostfile hosts ./lmp_intel_cpu -sf intel -v x 6 -v y 4 -v z 4 -v t 100 < in.lj

This gives a total particle count of 500K * (6*4*4) = 48×10^6, which conforms to 500K particles per core (48×10^6 / 96 = 500K).
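The scaling arithmetic above can be checked with a short shell sketch (the 500K base count comes from the pre-configured input; X, Y, Z and the core count are the example values from the run above):

```shell
# Verify that the X*Y*Z scaling factors keep 500K particles per core.
X=6; Y=4; Z=4; CORES=96
TOTAL=$(( X * Y * Z * 500000 ))   # 48,000,000 particles
echo "total=$TOTAL particles, per_core=$(( TOTAL / CORES ))"
```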

CUDA GPU Benchmark

The GPU benchmark follows the same process as above, but with a larger problem size: 8M particles per GPU. Download the pre-configured input file here

mpirun -n <N> -hostfile <HF> ./lmp_cuda_mixed -c on -sf cuda -pk cuda 1 -v x 1 -v y 1 -v z 1 < in.lj
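As with the CPU runs, the x, y and z factors must keep the per-GPU particle count fixed. A shell sketch of that check, assuming the input is pre-configured for 8M particles at x = y = z = 1 (the GPU count and factors here are hypothetical examples, not values from the text):

```shell
# Hypothetical example: 4 GPUs, so pick X*Y*Z = 4 to keep 8M particles/GPU.
X=2; Y=2; Z=1; GPUS=4
TOTAL=$(( X * Y * Z * 8000000 ))
echo "total=$TOTAL particles, per_gpu=$(( TOTAL / GPUS ))"
```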

The results of the benchmark are reported as particle-timesteps per second. To calculate this value, take the number of particles in the simulation, multiply by the number of simulation timesteps and divide by the runtime. See below for an example:

500 000 [particles] * 300 [simulation timesteps] / 25.4 [seconds] = 5.9 x10^6 [particle-timesteps/second]
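The same arithmetic as a one-line awk sketch (the values are the example figures above, not output parsed from a LAMMPS log):

```shell
# particle-timesteps per second = particles * timesteps / runtime
awk 'BEGIN { printf "%.1f x10^6 particle-timesteps/second\n", 500000 * 300 / 25.4 / 1e6 }'
```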
/var/www/wiki/data/pages/acelab/lammps.txt · Last modified: 2015/09/02 11:55 by mcawood