GPU clusters#
Submitting jobs#
To submit jobs to the joltik GPU cluster, where each node provides 4 NVIDIA V100 GPUs (each with 32GB of GPU memory), use:
module swap cluster/joltik
To submit to the accelgor GPU cluster, where each node provides 4 NVIDIA A100 GPUs (each with 80GB of GPU memory), use:
module swap cluster/accelgor
Then use the familiar qsub, qstat, etc. commands, taking into account the guidelines outlined in section Requesting (GPU) resources.
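For example, a minimal workflow to submit a job script to the accelgor cluster could look like the sketch below (the job script name jobscript.sh is just a placeholder):
# swap to the GPU cluster you want to submit to
module swap cluster/accelgor
# submit the job script and check the status of your jobs
qsub jobscript.sh
qstat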
Interactive jobs#
To interactively experiment with GPUs, you can submit an interactive job using qsub -I (and request one or more GPUs, see section Requesting (GPU) resources).
Note that, due to a bug in Slurm, you will currently not be able to interactively use MPI software that requires access to the GPUs. If you need this, please contact us via hpc@ugent.be.
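As an illustration, an interactive session with a single GPU could be requested as in the sketch below (the walltime and core count are just example values):
module swap cluster/joltik
qsub -I -l nodes=1:ppn=8:gpus=1 -l walltime=2:0:0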
Hardware#
See https://www.ugent.be/hpc/en/infrastructure.
Requesting (GPU) resources#
There are 2 main ways to ask for GPUs as part of a job:
- Either as a node property (similar to the number of cores per node specified via ppn) using -l nodes=X:ppn=Y:gpus=Z (where the ppn=Y is optional), or as a separate resource request (similar to the amount of memory) via -l gpus=Z. Both notations give exactly the same result. The -l gpus=Z notation is convenient if you only need one node and you are fine with the default number of cores per GPU. The -l nodes=...:gpus=Z notation is required if you want full control, or in multi-node cases like MPI jobs. If you do not specify the number of GPUs and just use -l gpus, you get 1 GPU by default (see the example submissions after this list).
- As a resource of its own, via --gpus X. In this case, however, you are not guaranteed that the GPUs are on the same node, so your script or code must be able to deal with this.
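For example, the -l notations could be used on the qsub command line (or as #PBS directives in the job script) as sketched below; the resource values and the job script name jobscript.sh are just placeholders:
# node property notation: 1 node with 8 cores and 2 GPUs on that node
qsub -l nodes=1:ppn=8:gpus=2 jobscript.sh
# separate resource request: 2 GPUs on a single node, with the default number of cores per GPU
qsub -l gpus=2 jobscript.sh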
Some background:
- The GPUs are constrained to the jobs (like the CPU cores), but do not run in so-called "exclusive" mode.
- The GPUs run with the so-called "persistence daemon", so the GPUs are not re-initialised between jobs.
Attention points#
Some important attention points:
- For MPI jobs, we recommend the (new) wrapper mypmirun from the vsc-mympirun module (pmi is the background mechanism used to start the MPI tasks, and is different from the usual mpirun that is used by the mympirun wrapper). At some later point, we might promote the mypmirun tool or rename it, to avoid confusion in the naming. A sketch of an MPI GPU job script using mypmirun is shown after this list.
- Sharing GPUs requires MPS. The Slurm built-in MPS does not really do what you want, so we will provide integration with mypmirun and wurker.
- For parallel work, we are working on a wurker wrapper from the vsc-mympirun module that supports GPU placement and MPS, without any limitations wrt the requested resources (i.e. it also supports the case where GPUs are spread heterogeneously over nodes when using the --gpus Z option).
- Both mypmirun and wurker will try to do the most optimised placement of cores and tasks, and will provide 1 (optimal) GPU per task/MPI rank, and set one so-called visible device (i.e. CUDA_VISIBLE_DEVICES only has 1 ID). The actual devices are not constrained to the ranks, so you can access all devices requested in the job. We know that at this moment this is not working properly, but we are working on it. We advise against trying to fix this yourself.
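As an illustration, a multi-node MPI GPU job script using mypmirun could look roughly as follows; the resource values and the application name my_mpi_gpu_app are hypothetical, and we assume mypmirun is invoked like mympirun:
#!/bin/bash
#PBS -l walltime=4:0:0
#PBS -l nodes=2:ppn=12:gpus=4
# load the vsc-mympirun module that provides the mypmirun wrapper,
# plus the (GPU-enabled, MPI-enabled) application module you need
module load vsc-mympirun
cd $PBS_O_WORKDIR
# start the MPI tasks via the mypmirun wrapper
mypmirun ./my_mpi_gpu_app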
Software with GPU support#
Use module avail to check for centrally installed software.
The subsections below only cover a couple of the installed software packages; more are available.
GROMACS#
Please consult module avail GROMACS for a list of installed versions.
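A minimal sketch of a single-GPU GROMACS job script is shown below; the resource values and the input file topol.tpr are hypothetical, and you should load a specific CUDA-enabled version from module avail GROMACS:
#!/bin/bash
#PBS -l walltime=12:0:0
#PBS -l nodes=1:ppn=quarter:gpus=1
# load a CUDA-enabled GROMACS module (pick a specific version)
module load GROMACS
cd $PBS_O_WORKDIR
# run mdrun with the non-bonded interactions offloaded to the GPU
gmx mdrun -nb gpu -s topol.tpr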
Horovod#
Horovod can be used for (multi-node) multi-GPU TensorFlow/PyTorch calculations.
Please consult module avail Horovod for a list of installed versions.
Horovod supports TensorFlow, Keras, PyTorch and MXNet (see https://github.com/horovod/horovod#id9), but should be run as an MPI application with mypmirun. (Horovod also provides its own wrapper horovodrun; we are not sure whether it handles placement and related aspects correctly.)
At least for simple TensorFlow benchmarks, Horovod seems to be a bit faster than TensorFlow's usual automatic multi-GPU support without Horovod, but this comes at the cost of the code modifications needed to use Horovod.
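As a sketch, a multi-node Horovod training job could be submitted with a script along the following lines; the resource values and the script name train_horovod.py are hypothetical, and you should load specific module versions from module avail:
#!/bin/bash
#PBS -l walltime=8:0:0
#PBS -l nodes=2:ppn=quarter:gpus=4
# load a Horovod module (built on top of TensorFlow or PyTorch) and the vsc-mympirun module
module load Horovod
module load vsc-mympirun
cd $PBS_O_WORKDIR
# run the Horovod training script as an MPI application via mypmirun
mypmirun python train_horovod.py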
PyTorch#
Please consult module avail PyTorch for a list of installed versions.
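A single-GPU PyTorch job could use a script similar to the TensorFlow example further below; the resource values and the script name example.py are placeholders, and you should load a specific CUDA-enabled version from module avail PyTorch:
#!/bin/bash
#PBS -l walltime=5:0:0
#PBS -l nodes=1:ppn=quarter:gpus=1
# load a CUDA-enabled PyTorch module (pick a specific version)
module load PyTorch
cd $PBS_O_WORKDIR
python example.py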
TensorFlow#
Please consult module avail TensorFlow for a list of installed versions.
Note: for running TensorFlow calculations on multiple GPUs and/or on more than one workernode, use Horovod, see section Horovod.
Example TensorFlow job script#
#!/bin/bash
# request 5 hours of walltime, a quarter of the cores of a workernode, and 1 GPU
#PBS -l walltime=5:0:0
#PBS -l nodes=1:ppn=quarter:gpus=1
# load a CUDA-enabled TensorFlow module
module load TensorFlow/2.6.0-foss-2021a-CUDA-11.3.1
# move to the directory the job was submitted from and run the script
cd $PBS_O_WORKDIR
python example.py
AlphaFold#
Please consult module avail AlphaFold for a list of installed versions.
For more information on using AlphaFold, we strongly recommend the VIB-UGent course available at https://elearning.bits.vib.be/courses/alphafold.
Getting help#
In case of questions or problems, please contact the HPC-UGent team via hpc@ugent.be, and clearly indicate that your question relates to the joltik cluster by adding [joltik] in the email subject.