MATLAB#
Note
To run a MATLAB program on the HPC-UGent infrastructure you must compile it first, because the MATLAB license server is not accessible from cluster workernodes (except for the interactive debug cluster).
Compiling MATLAB programs is only possible on the interactive debug cluster, not on the HPC-UGent login nodes where resource limits w.r.t. memory and max. number of progress are too strict.
Why is the MATLAB compiler required?#
The main reason behind this alternative way of using MATLAB is licensing: only a limited number of MATLAB sessions can be active at the same time. However, once the MATLAB program is compiled using the MATLAB compiler, the resulting stand-alone executable can be run without needing to contact the license server.
Note that a license is required for the MATLAB Compiler, see
https://nl.mathworks.com/help/compiler/index.html. If the mcc
command is provided by the MATLAB installation you are using, the MATLAB
compiler can be used as explained below.
Only a limited amount of MATLAB sessions can be active at the same time because there are only a limited amount of MATLAB research licenses available on the UGent MATLAB license server. If each job would need a license, licenses would quickly run out.
How to compile MATLAB code#
Compiling MATLAB code can only be done from the login nodes, because only login nodes can access the MATLAB license server, workernodes on clusters cannot.
To access the MATLAB compiler, the MATLAB
module should be loaded
first. Make sure you are using the same MATLAB
version to compile and
to run the compiled MATLAB program.
$ module avail MATLAB/
----------------------/apps/gent/RHEL8/zen2-ib/modules/all----------------------
MATLAB/2021b MATLAB/2022b-r5 (D)
$ module load MATLAB/2021b
After loading the MATLAB
module, the mcc
command can be used. To get
help on mcc
, you can run mcc -?
.
To compile a standalone application, the -m
flag is used (the -v
flag means verbose output). To show how mcc
can be used, we use the
magicsquare
example that comes with MATLAB.
First, we copy the magicsquare.m
example that comes with MATLAB to
example.m
:
cp $EBROOTMATLAB/extern/examples/compiler/magicsquare.m example.m
To compile a MATLAB program, use mcc -mv
:
mcc -mv example.m
Opening log file: /user/home/gent/vsc400/vsc40000/java.log.34090
Compiler version: 8.3 (R2021b)
Dependency analysis by REQUIREMENTS.
Parsing file "/user/home/gent/vsc400/vsc40000/example.m"
(Referenced from: "Compiler Command Line").
Deleting 0 temporary MEX authorization files.
Generating file "/user/home/gent/vsc400/vsc40000/readme.txt".
Generating file "run\_example.sh".
Libraries#
To compile a MATLAB program that needs a library, you can use the
-I library_path
flag. This will tell the compiler to also look for
files in library_path
.
It's also possible to use the -a path
flag. That will result in all
files under the path
getting added to the final executable.
For example, the command mcc -mv example.m -I examplelib -a datafiles
will compile example.m
with the MATLAB files in examplelib
, and will
include all files in the datafiles
directory in the binary it
produces.
Memory issues during compilation#
If you are seeing Java memory issues during the compilation of your
MATLAB program on the login nodes, consider tweaking the default maximum
heap size (128M) of Java using the _JAVA_OPTIONS
environment variable
with:
export _JAVA_OPTIONS="-Xmx64M"
The MATLAB compiler spawns multiple Java processes. Because of the default memory limits that are in effect on the login nodes, this might lead to a crash of the compiler if it's trying to create to many Java processes. If we lower the heap size, more Java processes will be able to fit in memory.
Another possible issue is that the heap size is too small. This could result in errors like:
Error: Out of memory
A possible solution to this is by setting the maximum heap size to be bigger:
export _JAVA_OPTIONS="-Xmx512M"
Multithreading#
MATLAB can only use the cores in a single workernode (unless the Distributed Computing toolbox is used, see https://nl.mathworks.com/products/distriben.html).
The amount of workers used by MATLAB for the parallel toolbox can be
controlled via the parpool
function: parpool(16)
will use 16
workers. It's best to specify the amount of workers, because otherwise
you might not harness the full compute power available (if you have too
few workers), or you might negatively impact performance (if you have
too many workers). By default, MATLAB uses a fixed number of workers
(12).
You should use a number of workers that is equal to the number of cores
you requested when submitting your job script (the ppn
value, see Generic resource requirements).
You can determine the right number of workers to use via the following
code snippet in your MATLAB program:
% specify the right number of workers (as many as there are cores available in the job) when creating the parpool
c = parcluster('local')
pool = parpool(c.NumWorkers)
See also the parpool documentation.
Java output logs#
Each time MATLAB is executed, it generates a Java log file in the users home directory. The output log directory can be changed using:
MATLAB_LOG_DIR=<OUTPUT_DIR>
where <OUTPUT_DIR>
is the name of the desired output directory. To
create and use a temporary directory for these logs:
# create unique temporary directory in $TMPDIR (or /tmp/$USER if
$TMPDIR is not defined)
# instruct MATLAB to use this directory for log files by setting $MATLAB_LOG_DIR
$ export MATLAB_LOG_DIR=$ (mktemp -d -p $TMPDIR:-/tmp/$USER)
You should remove the directory at the end of your job script:
rm -rf $MATLAB_LOG_DIR
Cache location#
When running, MATLAB will use a cache for performance reasons. This
location and size of this cache can be changed through the
MCR_CACHE_ROOT
and MCR_CACHE_SIZE
environment variables.
The snippet below would set the maximum cache size to 1024MB and the
location to /tmp/testdirectory
.
export MATLAB_CACHE_ROOT=/tmp/testdirectory
export MATLAB_CACHE_SIZE=1024M
So when MATLAB is running, it can fill up to 1024MB of cache in
/tmp/testdirectory
.
MATLAB job script#
All of the tweaks needed to get MATLAB working have been implemented in an example job script. This job script is also available on the HPC.
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:0:0
#
# Example (single-core) MATLAB job script
#
# make sure the MATLAB version matches with the one used to compile the MATLAB program!
module load MATLAB/2021b
# use temporary directory (not $HOME) for (mostly useless) MATLAB log files
# subdir in $TMPDIR (if defined, or /tmp otherwise)
export MATLAB_LOG_DIR=$(mktemp -d -p ${TMPDIR:-/tmp})
# configure MATLAB Compiler Runtime cache location & size (1GB)
# use a temporary directory in /dev/shm (i.e. in memory) for performance reasons
export MCR_CACHE_ROOT=$(mktemp -d -p /dev/shm)
export MCR_CACHE_SIZE=1024MB
# change to directory where job script was submitted from
cd $PBS_O_WORKDIR
# run compiled example MATLAB program 'example', provide '5' as input argument to the program
# $EBROOTMATLAB points to MATLAB installation directory
./run_example.sh $EBROOTMATLAB 5