If you use a Jupyter notebook to write your Python scripts, first export the notebook as a .py file from Jupyter and then copy it onto the cluster.
Also ensure that the job is copied to /mnt/lustre/users/YOURUSERNAME or a subdirectory of it.
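The export step above is normally done from the Jupyter menu or with `jupyter nbconvert --to script`. Where neither is at hand, the following is a minimal sketch of the same idea using only the standard library. It assumes the notebook is in the usual nbformat 4 JSON layout, and it treats any line starting with % or ! as an IPython magic to be commented out. The function name and file names are illustrative only.

```python
import json
from pathlib import Path


def notebook_to_script(ipynb_path, py_path):
    """Write the code cells of a notebook to a plain .py file."""
    notebook = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores the source as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        # Assumption: lines starting with % or ! are IPython magics or
        # shell escapes, which plain Python cannot run, so comment them out
        lines = [
            "# " + line if line.lstrip().startswith(("%", "!")) else line
            for line in source.splitlines()
        ]
        chunks.append("\n".join(lines))
    Path(py_path).write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
    return py_path
```

For example, notebook_to_script("analysis.ipynb", "analysis.py") would write analysis.py next to the notebook, ready to be copied to the cluster.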
To test a job on a compute node, first get onto an interactive node with the following (replace YOURPROGRAMME with your research programme code, e.g. CSCI1234):
qsub -I -P YOURPROGRAMME -q smp -l select=1:ncpus=24
Once on an interactive node (cnodeNNNN) you need to load up the appropriate modules:
module purge
module load chpc/python/3.6.1_gcc-6.3.0
Then change to /mnt/lustre/users/YOURUSERNAME, or wherever you placed your .py file:
cd /mnt/lustre/users/YOURUSERNAME
Finally run
python nameofyourfile.py
If you import matplotlib in your Python script, you may end up with the following error:
ModuleNotFoundError: No module named '_tkinter'
If so, add the following to your Python script, before any import of matplotlib.pyplot, and then resubmit:
import matplotlib
matplotlib.use('agg')
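If the same script also runs on a desktop where interactive plots are wanted, the backend choice can be made at run time instead of being hard-coded. The sketch below is one possible approach, not part of the cluster setup: it assumes that a missing _tkinter module or an unset DISPLAY variable means no window can be opened, and the helper name choose_backend is hypothetical.

```python
import importlib.util
import os


def choose_backend():
    """Return 'agg' unless both Tk bindings and a display are available."""
    # find_spec returns None when the _tkinter extension module is missing
    has_tk = importlib.util.find_spec("_tkinter") is not None
    # Assumption: on Linux, an unset DISPLAY means there is no X server
    has_display = bool(os.environ.get("DISPLAY"))
    return "TkAgg" if (has_tk and has_display) else "agg"
```

The result would be passed to matplotlib.use(choose_backend()) before matplotlib.pyplot is imported. On a compute node this is expected to select 'agg', so figures have to be written to disk with savefig rather than shown.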
As with the CPU version, you can test your Python jobs on an interactive node (replace YOURPROGRAMME with your research programme code, e.g. CSCI1234):
qsub -I -P YOURPROGRAMME -q gpu_1 -l select=1:ncpus=10:ngpus=1
Once on an interactive node (gpuNNNN) you need to load up the appropriate modules:
module purge
module load chpc/cuda/10.0
module load chpc/python/anaconda/3
Then change to /mnt/lustre/users/YOURUSERNAME, or wherever you placed your .py file:
cd /mnt/lustre/users/YOURUSERNAME
When running on a single GPU, include the following (TensorFlow 1.x API) in your .py file so that TensorFlow does not consume all the CPUs on the node, which would result in your job being killed by the scheduler. The value 10 matches the ncpus=10 requested from qsub.
import tensorflow as tf
session_conf = tf.ConfigProto(intra_op_parallelism_threads=10, inter_op_parallelism_threads=10)
sess = tf.Session(config=session_conf)
Finally run
python nameofyourfile.py
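The thread limit in the GPU snippet above is hard-coded to 10. As a sketch only, the limit could instead be read from the job environment so that it follows the ncpus value actually requested. This assumes that PBS exports an NCPUS variable inside the job, which should be checked on the cluster; the helper name thread_limits is hypothetical, and the function falls back to 10 when the variable is missing or not a positive integer.

```python
import os


def thread_limits(default=10):
    """Return thread-pool settings capped at the CPUs granted to the job."""
    # Assumption: PBS sets NCPUS inside a job; otherwise use `default`
    raw = os.environ.get("NCPUS", "")
    ncpus = int(raw) if raw.isdigit() and int(raw) > 0 else default
    return {
        "intra_op_parallelism_threads": ncpus,
        "inter_op_parallelism_threads": ncpus,
    }
```

With TensorFlow 1.x the dictionary can be unpacked straight into the configuration object, for example tf.Session(config=tf.ConfigProto(**thread_limits())).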
If you wish to run jobs through the scheduler, there are scripts on the login node that can help you set up a PBS submission script.
Once you are on the login node, just type:
qtensorflow_cpu (CPU version of TensorFlow)
or
qtensorflow_gpu (GPU version of TensorFlow)
Examples of what is needed when running the above scripts are provided below:
EXAMPLE1
Enter research programme name
CSCI1234
Enter python script name (with .py extension)
test.py
Enter total walltime (hour:minute)
2:00
Enter email address
testing@gmail.com
Generated pbs file for test
Do you wish to submit job to cluster (y/n)
y

EXAMPLE2
Enter research programme name
CSCI1234
Enter python script name (with .py extension)
test.py
Enter total walltime (hour:minute)

Enter email address
testing@gmail.com
Generated pbs file for test
Do you wish to submit job to cluster (y/n)
y
Please take note of the blank entry after the walltime prompt in EXAMPLE2. It corresponds to pressing the Enter key without typing a value.
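The file that the helper scripts generate is not reproduced on this page. For orientation only, the sketch below shows how the answers from EXAMPLE1 might map onto a PBS submission script for the CPU case. The function name, the choice of directives, the queue and ncpus defaults, and the overall layout are assumptions based on the interactive commands above, not the actual output of qtensorflow_cpu.

```python
def make_pbs_script(programme, script, walltime, email, queue="smp", ncpus=24):
    """Build the text of a PBS submission script (hypothetical layout)."""
    # The prompt asks for hour:minute, while PBS expects HH:MM:SS
    hours, minutes = walltime.split(":")
    job_name = script.rsplit(".", 1)[0]
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -P {programme}",
        f"#PBS -q {queue}",
        f"#PBS -l select=1:ncpus={ncpus}",
        f"#PBS -l walltime={int(hours):02d}:{int(minutes):02d}:00",
        f"#PBS -M {email}",
        "#PBS -m abe",  # mail on abort, begin and end
        "",
        "module purge",
        "module load chpc/python/3.6.1_gcc-6.3.0",
        "cd $PBS_O_WORKDIR",  # directory the job was submitted from
        f"python {script}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_pbs_script("CSCI1234", "test.py", "2:00", "testing@gmail.com"))
```

A script written this way would be submitted with qsub from the directory that holds the .py file. Compare it against the file the helper actually generates before relying on it.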