Running ARW / WRF at the CHPC

WRF-3.8, compiled with the Intel compilers and Intel MPI, is installed in /apps/chpc/earth/WRF-3.8-impi. Tests have shown a very large performance benefit from using the Intel compiler and Intel MPI rather than the GNU compilers with OpenMPI or MPICH. Please note that it is essential to set an unlimited stack size for the Intel-compiled version, as is done in the script below. To set up an appropriate environment, “source” the setWRF file with the following command: . /apps/chpc/earth/WRF-3.8-impi/setWRF . This command should also be placed in the PBS-Pro job submission script.

Users need to develop their own workflows, but it is practical to run the pre-processing steps geogrid.exe, ungrib.exe, metgrid.exe and real.exe on a single node in an interactive session. To obtain an interactive session, give the command qsub -I -q smp -P <AAAA0000>, where <AAAA0000> should be replaced with your project code. Do not run these pre-processing steps from the login shell, as the shared login node cannot sustain a high workload. For large cases, real.exe may run into memory constraints; in that case, run real.exe in parallel over the requested number of nodes, but with only one MPI process per node, as shown in the example script.
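
A minimal sketch of such an interactive pre-processing session follows (the project code, WPS directory and GRIB data paths are examples only; substitute your own):

# Request a single-node interactive session (replace AAAA0000 with your project code)
qsub -I -q smp -P AAAA0000
# Once the session starts, set up the WRF-3.8 environment
. /apps/chpc/earth/WRF-3.8-impi/setWRF
ulimit -s unlimited
# Change to your own WPS working directory
cd /home/username/scratch/WPS_test
# Link your GRIB data and run the pre-processing steps in sequence
./link_grib.csh ../DATA_test/GFS_*
mpirun -np 1 geogrid.exe
mpirun -np 1 ungrib.exe
mpirun -np 1 metgrid.exe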

There is a gcc-compiled WRF-3.7 installed in /apps/chpc/earth/WRF-3.7-gcc-ompi. Source the setWRF file in this directory to set up a suitable environment. This version of WRF uses OpenMPI.

WRF, Parallel NetCDF and I/O Quilting

WRF-3.8 with parallel NetCDF is installed in /apps/chpc/earth/WRF-3.8-pnc-impi_hwl. Source the setWRF file in this directory to use this version. When a large number of CPU cores is used, WRF's run time can become dominated by the time taken to write the hourly output files. Appropriate use of parallel NetCDF can dramatically reduce this I/O time. There is also a WRF-3.7 version in /apps/chpc/earth/WRF-3.7-pnc-impi.

Making effective use of Parallel NetCDF with I/O quilting requires some changes to the namelist.input file as well as the PBS script.

  • It is recommended that the domain decomposition be specified manually by setting appropriate values for nproc_x, nproc_y, nio_tasks_per_group and nio_groups in namelist.input (an example fragment is given after this list)
  • There is conflicting advice on suitable values for these parameters. Our experimentation shows that nio_groups=2 works quite well, and nio_tasks_per_group should divide evenly into nproc_y. For example, if nproc_y=24, nio_tasks_per_group=12 should work acceptably well.
  • The nocolons flag should be set to .true.
  • Set the Lustre stripe count for the output directory (see the example script). Using multiples of 12 works well on the CHPC cluster.
  • Issue the mpirun command with the total number of processes nproc = (nproc_x * nproc_y) + (nio_tasks_per_group * nio_groups)
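
As an illustration only, the namelist.input entries relevant to quilting and parallel I/O could look like the fragment below. The numbers are example values for a 10-node, 240-core job and must be adapted to your own domain and resource request; the sections shown also contain many other settings, which are omitted here. Setting io_form_history = 11 selects parallel NetCDF output for the history files.

&time_control
 io_form_history     = 11,
 nocolons            = .true.,
/

&domains
 nproc_x             = 9,
 nproc_y             = 24,
/

&namelist_quilt
 nio_tasks_per_group = 12,
 nio_groups          = 2,
/

With these values the job needs (9 x 24) + (12 x 2) = 240 MPI processes, which matches a select=10:ncpus=24:mpiprocs=24 resource request.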

WRF-3.8 with the added chemistry module (WRF-Chem) and the Kinetic PreProcessor (KPP) is available in /apps/chpc/earth/WRFCHEM-3.8-pnc-impi. As with the other versions, source the setWRF script in that directory to set up a suitable environment. This version was compiled with the Intel compiler and Intel MPI, and also supports parallel NetCDF. There is also a WRF-3.7 version in /apps/chpc/earth/WRFCHEM-3.7-pnc-impi.

runWRF.qsub
#!/bin/bash 
#### For the distributed memory versions of the code that we use at CHPC, mpiprocs should be equal to ncpus
#### Here we have selected the maximum resources available to a regular CHPC user
####  Obviously provide your own project identifier
#### For your own benefit, try to estimate a realistic walltime request.  Over-estimating the
#### wallclock requirement interferes with efficient scheduling, delays the launch of the job,
#### and ties up more of your CPU-time allocation until the job has finished.
#PBS -l select=10:ncpus=24:mpiprocs=24 -q normal -P TEST1234
#PBS -l walltime=3:00:00
#PBS -o /home/username/scratch/WRFV3_test/run/stdout
#PBS -e /home/username/scratch/WRFV3_test/run/stderr
#PBS -m abe
#PBS -M username@unseenuniversity.ac.za
### Source the WRF-3.8 environment:
export WRFDIR=/apps/chpc/earth/WRF-3.8-impi_hwl
. $WRFDIR/setWRF
# Set the stack size unlimited for the intel compiler
ulimit -s unlimited
##### Running commands
# Set PBS_JOBDIR to where YOUR simulation will be run
export PBS_JOBDIR=/home/username/scratch/WRFV3_test/run
# First though, change to YOUR WPS directory
export WPS_DIR=/export/home/username/scratch/WPS_test
cd $WPS_DIR
# Clean the directory of old files (-f avoids errors if they do not exist yet)
rm -f FILE*
rm -f GRIB*
rm -f geo_em*
rm -f met_em*
# Link to the grib files, obviously use the location of YOUR grib files
./link_grib.csh ../DATA_test/GFS_* 
# Run geogrid.exe
mpirun -np 1 -machinefile $PBS_NODEFILE geogrid.exe &> geogrid.out
# Run ungrib.exe
mpirun -np 1 -machinefile $PBS_NODEFILE ungrib.exe &> ungrib.out
# Run metgrid.exe
mpirun -np 1 -machinefile $PBS_NODEFILE metgrid.exe &> metgrid.out
# Now change to the main job directory
cd $PBS_JOBDIR
# Link the met_em* data files into this directory
ln -s $WPS_DIR/met_em* ./
# Figure out how many processes to use for wrf.exe
nproc=`cat $PBS_NODEFILE | wc -l`
# Now figure out how many nodes are being used
cat $PBS_NODEFILE | sort -u > hosts
# Number of nodes to be used for real.exe
nnodes=`cat hosts | wc -l`
# Run real.exe with one process per node
exe=$WRFDIR/WRFV3/run/real.exe
mpirun -np $nnodes -machinefile hosts $exe &> real.out
# Run wrf.exe with the full number of processes
exe=$WRFDIR/WRFV3/run/wrf.exe
mpirun -np $nproc -machinefile $PBS_NODEFILE $exe &> wrf.out
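
Submit the script from the login node with qsub runWRF.qsub. The progress of the solver can be followed in the rsl.out.0000 and rsl.error.0000 files that wrf.exe writes in the run directory.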

The following script runs wrf.exe only, with Parallel NetCDF:

runWRF.qsub
#!/bin/bash 
#### For the distributed memory versions of the code that we use at CHPC, mpiprocs should be equal to ncpus
#### Here we have selected the maximum resources available to a regular CHPC user
####  Obviously provide your own project identifier
#### For your own benefit, try to estimate a realistic walltime request.  Over-estimating the
#### wallclock requirement interferes with efficient scheduling, delays the launch of the job,
#### and ties up more of your CPU-time allocation until the job has finished.
#PBS -l select=10:ncpus=24:mpiprocs=24 -q normal -P TEST1234
#PBS -l walltime=3:00:00
#PBS -o /home/username/scratch/WRFV3_test/run/stdout
#PBS -e /home/username/scratch/WRFV3_test/run/stderr
#PBS -m abe
#PBS -M username@unseenuniversity.ac.za
### Source the WRF-3.8 environment with parallel NetCDF:
export WRFDIR=/apps/chpc/earth/WRF-3.8-pnc-impi
. $WRFDIR/setWRF
# Set the stack size unlimited for the intel compiler
ulimit -s unlimited
##### Running commands
# Set PBS_JOBDIR to where YOUR simulation will be run
export PBS_JOBDIR=/home/username/scratch/WRFV3_test/run
cd $PBS_JOBDIR
exe=$WRFDIR/WRFV3/run/wrf.exe
# Clear and re-set the Lustre striping for the job directory.  For the Lustre configuration
# used by CHPC, a stripe count of 12 should work well.
lfs setstripe -d $PBS_JOBDIR
lfs setstripe -c 12 $PBS_JOBDIR
## For this example, assume that nproc_x=8, nproc_y=28, nio_tasks_per_group=4 and nio_groups=4, for a total
## of 16 I/O processes and 224 solver processes, therefore 240 MPI processes in total.
mpirun -np 240 -machinefile $PBS_NODEFILE $exe &> wrf.out
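
The namelist.input in the job directory must contain matching settings, in this case nproc_x = 8, nproc_y = 28, nio_tasks_per_group = 4 and nio_groups = 4, together with io_form_history = 11 for parallel NetCDF output. As a quick sanity check before submitting, the required process count can be recomputed from namelist.input; the snippet below is only a sketch and assumes each of the four parameters appears exactly once in the file:

# Extract the four decomposition parameters from namelist.input and print the
# total number of MPI processes that mpirun must start.
getval() { grep -m1 "^\s*$1" namelist.input | sed 's/.*=\s*//; s/[ ,].*//'; }
px=$(getval nproc_x); py=$(getval nproc_y)
nt=$(getval nio_tasks_per_group); ng=$(getval nio_groups)
echo "Total MPI processes: $(( px*py + nt*ng ))"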

For post-processing, ARWpost, NCL and GrADS have been installed, and the necessary paths and environment variables are set up by sourcing the setWRF file. In addition, ncview is also available in /apps/chpc/earth/ncview-2.1.7-gcc. Source the script file setNCView in that directory in order to set up a suitable environment, or simply use the binary /apps/chpc/earth/ncview-2.1.7-gcc/utils/bin/ncview directly. For graphics, refer to the Remote Visualization page for instructions on setting up a VNC session.
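
For example, to take a quick look at an output file (the file name below is only a placeholder for one of your own wrfout files):

# Set up the ncview environment and open an output file for inspection
. /apps/chpc/earth/ncview-2.1.7-gcc/setNCView
ncview wrfout_d01_2016-01-01_00:00:00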

Parallel scaling

The following graphs can be used to estimate performance scaling on the cluster. Please note that if you are using a large number of cores, writing hourly output files will significantly slow down the run. To overcome this, use a version of WRF compiled with parallel NetCDF support, together with the appropriate namelist.input settings. Check the rsl.out.0000 file to see how much time is being spent on writing output files: if it takes much more than 2 or 3 seconds to write an output file, use parallel NetCDF. Using all the cores on each node gives the best performance per node, but it is a case of diminishing returns, with very little advantage gained from the last few cores per node.
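
The write times are reported in rsl.out.0000 on lines starting with “Timing for Writing”; a quick way to list them from the run directory is:

# Show the time spent writing each output file, as reported by the master MPI rank
grep "Timing for Writing" rsl.out.0000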
