research:ml

If you write your Python scripts in a Jupyter notebook, first export the notebook as a .py file from Jupyter, then copy that file onto the cluster.
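
Jupyter's own export (the "Download as" menu, or jupyter nbconvert --to script) is the usual way to produce the .py file. If neither is at hand, the following is a minimal sketch of the same idea using only the standard library. It assumes the notebook is ordinary .ipynb JSON; the function name notebook_to_script and the sample notebook are illustrative only, and IPython magics such as %matplotlib inline are not handled.

```python
import json


def notebook_to_script(notebook_json):
    """Concatenate the code cells of a notebook (given as JSON text) into one script."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        # Skip markdown and raw cells; only code cells belong in the script.
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # "source" may be stored as a single string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


# Hypothetical in-memory notebook, used here instead of a real .ipynb file.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Notes"]},
        {"cell_type": "code", "source": ["import math\n", "x = math.sqrt(16)"]},
        {"cell_type": "code", "source": "print(x)"},
    ]
})
script = notebook_to_script(example)
print(script)
```

For a real notebook, read the file contents with open(...).read() and write the returned text to a .py file before copying it to the cluster.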

Also make sure the job files are copied to /mnt/lustre/users/YOURUSERNAME or a subdirectory of it.

Running TensorFlow on CPU nodes

To test a job on a compute node, first start an interactive session, replacing YOURPROGRAMME with your research programme code (e.g. CSCI1234):

   qsub -I -P YOURPROGRAMME -q smp -l select=1:ncpus=24

Once on the interactive node (cnodeNNNN), load the appropriate modules:

   module purge
   module load chpc/python/3.6.1_gcc-6.3.0

Then change to the directory where you placed your .py file, for example:

   cd /mnt/lustre/users/YOURUSERNAME

Finally, run the script:

   python nameofyourfile.py

If your Python script imports matplotlib, you may get the following error:

   ModuleNotFoundError: No module named '_tkinter'

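If so, add the following to your Python script, before any import of matplotlib.pyplot, and run it again:

   import matplotlib
   # Select the non-interactive Agg backend, which does not need Tk
   matplotlib.use('agg')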

Running TensorFlow on GPU nodes

As with the CPU version, you can test your Python jobs in an interactive session, replacing YOURPROGRAMME with your research programme code (e.g. CSCI1234):

   qsub -I -P YOURPROGRAMME -q gpu_1 -l select=1:ncpus=10:ngpus=1
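
The CPU and GPU requests differ only in the queue name and the select string. The sketch below assembles both command lines from their parts to make that explicit. The helper name interactive_qsub is hypothetical; the queue names and core counts are the ones used on this page, and CSCI1234 is the example programme code.

```python
import shlex


def interactive_qsub(programme, queue, ncpus, ngpus=0):
    """Return the qsub command line for an interactive session on one node."""
    select = f"select=1:ncpus={ncpus}"
    if ngpus:
        # Only GPU requests carry the ngpus resource.
        select += f":ngpus={ngpus}"
    return shlex.join(["qsub", "-I", "-P", programme, "-q", queue, "-l", select])


cpu_cmd = interactive_qsub("CSCI1234", "smp", 24)
gpu_cmd = interactive_qsub("CSCI1234", "gpu_1", 10, ngpus=1)
print(cpu_cmd)
print(gpu_cmd)
```

The function only builds the text of the command; it does not submit anything to the scheduler.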

Once on the interactive node (gpuNNNN), load the appropriate modules:

   module purge
   module load chpc/cuda/10.0
   module load chpc/python/anaconda/3

Then change to the directory where you placed your .py file, for example:

   cd /mnt/lustre/users/YOURUSERNAME

When running on a single GPU, include the following in your .py file so that the job does not consume all the CPUs on the node, which would cause the scheduler to kill it:

    import tensorflow as tf
    # TensorFlow 1.x API: cap both thread pools at the 10 cores requested with ncpus=10
    session_conf = tf.ConfigProto(intra_op_parallelism_threads=10, inter_op_parallelism_threads=10)
    sess = tf.Session(config=session_conf)
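
The value 10 above matches the ncpus=10 requested from the scheduler. As a minimal sketch, the thread count could instead be read from the job environment so that it follows the request. This assumes the scheduler exports the core count in an NCPUS environment variable (check with echo $NCPUS inside a job); the helper name thread_count is hypothetical.

```python
import os


def thread_count(default=10):
    """Return the core count from NCPUS, or the default if it is unset or invalid."""
    value = os.environ.get("NCPUS", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    return default


n_threads = thread_count()
# In the TensorFlow 1.x script, n_threads would replace the hard-coded 10:
# session_conf = tf.ConfigProto(intra_op_parallelism_threads=n_threads,
#                               inter_op_parallelism_threads=n_threads)
print(n_threads)
```

Falling back to 10 keeps the behaviour identical to the snippet above when the variable is not set.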

Finally, run the script:

   python nameofyourfile.py

If you wish to run jobs through the scheduler, there are scripts on the login node that help you set up a PBS submission script.

Once you are on the login node, type

   qtensorflow_cpu

for the CPU version of TensorFlow, or

   qtensorflow_gpu

for the GPU version of TensorFlow.

Examples of the prompts shown by these scripts, and the input they expect, are provided below:

          EXAMPLE1
 Enter research programme name
 CSCI1234
 Enter python script name (with .py extension)
 test.py
 Enter total walltime (hour:minute)
 2:00
 Enter email address
 testing@gmail.com
 Generated pbs file for test
 Do you wish to submit job to cluster (y/n)
 y
          EXAMPLE2
 Enter research programme name
 CSCI1234
 Enter python script name (with .py extension)
 test.py
 Enter total walltime (hour:minute)
 
 Enter email address
 testing@gmail.com
 Generated pbs file for test
 Do you wish to submit job to cluster (y/n)
 y

Note the empty line after the walltime prompt in EXAMPLE2: it corresponds to pressing the Enter key without typing a value.
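
The contents of the generated PBS file are not shown on this page. For orientation only, the sketch below shows how the answers from EXAMPLE1 could map onto standard PBS directives for a CPU job. It is not the output of qtensorflow_cpu: the function names, the choice of directives, and the defaults (smp queue, 24 cores, the CPU Python module from above) are assumptions.

```python
def walltime_hms(hour_minute):
    """Convert 'hour:minute' (as typed at the prompt) to PBS 'HH:MM:SS'."""
    hours, minutes = hour_minute.split(":")
    return f"{int(hours):02d}:{int(minutes):02d}:00"


def build_pbs_script(programme, script, walltime, email, queue="smp", ncpus=24):
    """Return the text of a hypothetical PBS submission script for a CPU job."""
    lines = [
        "#!/bin/bash",
        f"#PBS -P {programme}",                       # research programme
        f"#PBS -q {queue}",                           # queue (assumed default: smp)
        f"#PBS -l select=1:ncpus={ncpus}",            # one node, ncpus cores
        f"#PBS -l walltime={walltime_hms(walltime)}",
        f"#PBS -M {email}",                           # address for job mail
        "#PBS -m abe",                                # mail on abort, begin, end
        "module purge",
        "module load chpc/python/3.6.1_gcc-6.3.0",
        "cd $PBS_O_WORKDIR",                          # directory qsub was run from
        f"python {script}",
    ]
    return "\n".join(lines) + "\n"


pbs_text = build_pbs_script("CSCI1234", "test.py", "2:00", "testing@gmail.com")
print(pbs_text)
```

A script like this would be saved to a file and submitted with qsub from the directory that holds test.py.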

/var/www/wiki/data/pages/research/ml.txt · Last modified: 2019/07/08 15:16 by kgovender