Introduction to HPC#

What is HPC?#

"High Performance Computing" (HPC) is computing on a "Supercomputer", a computer with at the frontline of contemporary processing capacity -- particularly speed of calculation and available memory.

While the supercomputers in the early days (around 1970) used only a few processors, in the 1990s machines with thousands of processors began to appear and, by the end of the 20th century, massively parallel supercomputers with tens of thousands of "off-the-shelf" processors were the norm. A large number of dedicated processors are placed in close proximity to each other in a computer cluster.

A computer cluster consists of a set of loosely or tightly connected computers that work together so that in many respects they can be viewed as a single system.

The components of a cluster are usually connected to each other through fast local area networks ("LAN") with each node (computer used as a server) running its own instance of an operating system. Computer clusters emerged as a result of convergence of a number of computing trends including the availability of low cost microprocessors, high-speed networks, and software for high performance distributed computing.

Compute clusters are usually deployed to improve performance and availability over that of a single computer, while typically being more cost-effective than single computers of comparable speed or availability.

Supercomputers play an important role in the field of computational science, and are used for a wide range of computationally intensive tasks in various fields, including quantum mechanics, weather forecasting, climate research, oil and gas exploration, molecular modelling (computing the structures and properties of chemical compounds, biological macromolecules, polymers, and crystals), and physical simulations (such as simulations of the early moments of the universe, airplane and spacecraft aerodynamics, the detonation of nuclear weapons, and nuclear fusion). 1

What is the HPC-UGent infrastructure?#

The HPC is a collection of computers with AMD and/or Intel CPUs, running a Linux operating system, shaped like pizza boxes and stored above and next to each other in racks, interconnected with copper and fiber cables. Their number crunching power is (presently) measured in hundreds of billions of floating point operations per second (gigaflops) and even in teraflops.

The HPC-UGent infrastructure relies on parallel-processing technology to offer UGent researchers an extremely fast solution for all their data processing needs.

The HPC currently consists of:

a set of different compute clusters. For an up-to-date list of all clusters and their hardware, see https://vscdocumentation.readthedocs.io/en/latest/gent/tier2_hardware.html.

Job management and job scheduling are performed by Slurm with a Torque frontend. We advise users to adhere to the Torque commands mentioned in this document.
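As a sketch, a minimal Torque-style job script might look like the following (the job name, resource requests, and file name are illustrative; the `#PBS` lines are directives for the scheduler and plain comments to the shell):

```shell
# Create a minimal Torque-style job script (the file name "hello.sh" is illustrative).
cat > hello.sh <<'EOF'
#!/bin/bash
#PBS -N hello_job              # job name shown in the queue
#PBS -l nodes=1:ppn=1          # request 1 node with 1 core
#PBS -l walltime=00:05:00      # request at most 5 minutes of runtime
echo "Hello from $(hostname)"
EOF
# On the cluster you would then submit, monitor and (if needed) cancel it with:
#   qsub hello.sh    # submit; prints a job ID
#   qstat            # show the status of your jobs
#   qdel <jobid>     # cancel a job
bash hello.sh   # run locally just to show that the shell part executes
```

The `#PBS` directives are read only when the script is submitted with `qsub`; running the script directly simply executes the shell commands.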

What the HPC infrastructure is not#

The HPC infrastructure is not a magic computer that automatically:

  1. runs your PC-applications much faster for bigger problems;

  2. develops your applications;

  3. solves your bugs;

  4. does your thinking;

  5. ...

  6. allows you to play games even faster.

The HPC does not replace your desktop computer.

Is the HPC a solution for my computational needs?#

Batch or interactive mode?#

Typically, the strength of a supercomputer comes from its ability to run a huge number of programs (i.e., executables) in parallel without any user interaction in real time. This is what is called "running in batch mode".

It is also possible to run programs on the HPC that require user interaction (pushing buttons, entering input data, etc.). Although technically possible, the HPC might not always be the best or smartest option for running such interactive programs. Each time user interaction is needed, the computer waits for input, so the available computer resources (CPU, storage, network, etc.) might not be used optimally in those cases. A more in-depth analysis with the HPC staff can reveal whether the HPC is the right solution for running your interactive programs. Interactive mode is typically only useful for creating quick visualisations of your data without having to copy your data to your desktop and back.

What are cores, processors and nodes?#

In this manual, the terms core, processor and node will be frequently used, so it's useful to understand what they are.

Modern servers, also referred to as (worker)nodes in the context of HPC, include one or more sockets, each housing a multi-core processor (next to memory, disk(s), network cards, ...). A modern processor consists of multiple CPUs or cores that are used to execute computations.
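To see how this terminology maps onto an actual machine, standard Linux tools report the socket and core layout of the node you are logged in to (the output differs per node):

```shell
# Number of cores available to this shell:
nproc
# Sockets, cores per socket, and CPU model of this node:
lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket|Model name'
```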

Parallel or sequential programs?#

Parallel programs#

Parallel computing is a form of computation in which many calculations are carried out simultaneously. It is based on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").

Parallel computers can be roughly classified according to the level at which the hardware supports parallelism, with multicore computers having multiple processing elements within a single machine, while clusters use multiple computers to work on the same task. Parallel computing has become the dominant computer architecture, mainly in the form of multicore processors.

The two parallel programming paradigms most used in HPC are:

  • OpenMP for shared memory systems (multithreading): on multiple cores of a single node

  • MPI for distributed memory systems (multiprocessing): on multiple nodes
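As a sketch of how the two paradigms differ in practice (the program names are hypothetical): an OpenMP program is a single process whose thread count is controlled by an environment variable, while an MPI program is started as multiple processes through a launcher.

```shell
# OpenMP (shared memory): one process, several threads, all on one node.
export OMP_NUM_THREADS=4       # the OpenMP runtime reads this variable at startup
# ./my_openmp_program          # hypothetical binary; would run with 4 threads

# MPI (distributed memory): several processes, possibly spread over several nodes.
# mpirun -np 8 ./my_mpi_program   # hypothetical binary; started as 8 MPI processes
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```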

Parallel programs are more difficult to write than sequential ones, because concurrency introduces several new classes of potential software bugs, of which race conditions are the most common. Communication and synchronisation between the different subtasks are typically some of the greatest obstacles to getting good parallel program performance.

Sequential programs#

Sequential software does not do calculations in parallel, i.e., it only uses one single core of a single workernode. It does not become faster by just throwing more cores at it: it can only use one core.

It is perfectly possible to also run purely sequential programs on the HPC.

Running your sequential programs on the most modern and fastest computers in the HPC can save you a lot of time. But it also might be possible to run multiple instances of your program (e.g., with different input parameters) on the HPC, in order to solve one overall problem (e.g., to perform a parameter sweep). This is another form of running your sequential programs in parallel.
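A parameter sweep can be sketched as follows, where `simulate` is a stand-in for your own sequential program; on the cluster, each iteration would typically be submitted as its own job so the runs execute in parallel:

```shell
# Stand-in for a sequential program that takes one input parameter.
simulate () { echo "result for parameter $1"; }

# Run the same program for several parameter values; each run is independent,
# so the runs could be submitted as separate jobs and executed concurrently.
for p in 0.1 0.5 1.0; do
    simulate "$p" > "out_${p}.txt"
done
ls out_*.txt
```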

What programming languages can I use?#

You can use any programming language, any software package and any library provided it has a version that runs on Linux, specifically, on the version of Linux that is installed on the compute nodes, RHEL 8.8 (doduo, donphan, gallade) and RHEL 9.4 (skitty, shinx, joltik, accelgor).

For the most common programming languages, a compiler is available on RHEL 8.8 (doduo, donphan, gallade) and RHEL 9.4 (skitty, shinx, joltik, accelgor). Supported and common programming languages on the HPC are C/C++, FORTRAN, Java, Perl, Python, MATLAB, R, etc.

Supported and commonly used compilers are GCC and Intel.

Additional software can be installed "on demand". Please contact the HPC staff to see whether the HPC can handle your specific requirements.

What operating systems can I use?#

All nodes in the HPC cluster run under RHEL 8.8 (doduo, donphan, gallade) and RHEL 9.4 (skitty, shinx, joltik, accelgor), which are specific versions of Red Hat Enterprise Linux. This means that all programs (executables) should be compiled for RHEL 8.8 (doduo, donphan, gallade) and RHEL 9.4 (skitty, shinx, joltik, accelgor).

Users can connect from any computer in the UGent network to the HPC, regardless of the Operating System that they are using on their personal computer. Users can use any of the common Operating Systems (such as Windows, macOS or any version of Linux/Unix/BSD) and run and control their programs on the HPC.

A user does not need to have prior knowledge about Linux; all of the required knowledge is explained in this tutorial.

What does a typical workflow look like?#

A typical workflow looks like this:

  1. Connect to the login nodes with SSH (see First Time connection to the HPC infrastructure)

  2. Transfer your files to the cluster (see Transfer Files to/from the HPC)

  3. Optional: compile your code and test it (for compiling, see Compiling and testing your software on the HPC)

  4. Create a job script and submit your job (see Running batch jobs)

  5. Get some coffee and be patient:

    1. Your job gets into the queue

    2. Your job gets executed

    3. Your job finishes

  6. Study the results generated by your jobs, either on the cluster or after downloading them locally.
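The numbered steps above can be sketched as commands (the username vsc40000, the login host, and the file names are illustrative; replace them with your own):

```shell
# 1. Connect to a login node with SSH:
#      ssh vsc40000@login.hpc.ugent.be
# 2. Transfer your files to the cluster:
#      scp input.dat vsc40000@login.hpc.ugent.be:
# 4. Create a job script and submit it:
#      qsub job.sh
# 5. Wait while the job queues, runs and finishes:
#      qstat
# 6. Download the results afterwards:
#      scp vsc40000@login.hpc.ugent.be:result.dat .
```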

What is the next step?#

When you think that the HPC is a useful tool to support your computational needs, we encourage you to acquire a VSC-account (as explained in Getting a HPC Account), read Connecting to the HPC infrastructure and "Setting up the environment", and explore the chapters from Running interactive jobs to Fine-tuning Job Specifications, which will help you transfer and run your programs on the HPC cluster.

Do not hesitate to contact the HPC staff for any help.