Available GPUs

1. GPU Cluster Nautilus

1.1. gnode[1-4] GPU compute servers

1.1.1. Hardware configuration

gnode[1-4] are GPU compute servers, each equipped with 4 NVIDIA A100 cards.

The gnode[1-4] servers are Bullx X410-A6 4U1N2S nodes composed of:

  • 2x AMD EPYC Genoa 9474F 48-core CPUs

  • 768 GB of DDR5 memory @ 4800 MT/s

  • 1x 960 GB SSD

  • 4x NVIDIA A100 80 GB GPUs, passive, PCIe4 x16, no CEC

  • 2x 1 GbE RJ45 ports

  • 1x CNX4 25 GbE DP OCP3.0 PCIe3.0 x16 SFP28 Ethernet card

  • 1x InfiniBand ConnectX-6 SP HDR/EDR card, 100 Gb QSFP56, PCIe3 x16

Figure: front view of a gnode GPU server

1.2. visu[1-4] visualization servers

visu[1-4] are the 3D visualization servers, each equipped with 2 NVIDIA A40 cards.

1.2.1. Hardware configuration

  • 2x AMD EPYC Genoa 9474F 48-core CPUs

  • 768 GB of DDR5 memory @ 4800 MT/s

  • 2x 960 GB SSDs

  • 2x NVIDIA A40 48 GB GPUs, passive, PCIe4 x16, no CEC

  • 2x 1 GbE RJ45 ports

  • 1x CNX4 25 GbE DP OCP3.0 PCIe3.0 x16 SFP28 Ethernet card

  • 1x InfiniBand ConnectX-6 SP HDR/EDR card, 100 Gb QSFP56, PCIe3 x16

Figure: front view of a visu server

2. Slurm Constraints

To make it easy to launch jobs from any front end, we have implemented constraints describing particular configurations of clusters and nodes. These constraints allow you to target the desired nodes, especially if you are not on the target cluster.

To use them, add the Slurm --constraint=<constraint_name> option.

If you do not specify a partition, jobs start on the default partition; on Nautilus this is the "standard" partition.

Example: to request a feature/constraint that targets an A100 GPU node (a gnode on Nautilus), add the following line to your submit script:

#SBATCH --constraint="loc_ecn&gpu_a100_80"
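
For context, here is a minimal sketch of a complete submission script built around that constraint; the job name, time limit, and application command are illustrative placeholders, not site-mandated values:

#!/bin/bash
#SBATCH --job-name=a100-job                  # illustrative job name
#SBATCH --time=01:00:00                      # adjust to your workload
#SBATCH --nodes=1
#SBATCH --constraint="loc_ecn&gpu_a100_80"   # target an A100 gnode on Nautilus

# Replace with your actual GPU application.
srun ./my_gpu_app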

2.1. Slurm GRES

GRES (Generic RESource) scheduling in Slurm manages and allocates specialized resources such as GPUs. It ensures proper GPU allocation to jobs when you specify resources with #SBATCH --gres=gpu:<number>, which optimizes hardware usage and prevents resource conflicts.

Here is an example of using GRES to request GPUs in a Slurm job script:

#SBATCH --gres=gpu:2

This requests 2 GPUs for the job.
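
As a sketch, a GRES request can be combined with the constraint from the previous section; Slurm typically exports CUDA_VISIBLE_DEVICES so the job only sees its allocated GPUs (the job name is illustrative):

#!/bin/bash
#SBATCH --job-name=gres-demo                 # illustrative job name
#SBATCH --gres=gpu:2                         # request 2 GPUs on one node
#SBATCH --constraint="loc_ecn&gpu_a100_80"   # target an A100 gnode on Nautilus

# Slurm typically restricts the job to its allocated GPUs via CUDA_VISIBLE_DEVICES.
echo "Allocated GPUs: $CUDA_VISIBLE_DEVICES"
nvidia-smi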

3. CUDA

CUDA Compatibility Matrix

When using CUDA applications, as in this GLiCID tutorial, you must ensure that the NVIDIA driver, the CUDA toolkit, and the GPU hardware are compatible. NVIDIA publishes an official matrix to help match driver versions with CUDA releases.

Official NVIDIA CUDA Compatibility Guide: https://docs.nvidia.com/deploy/cuda-compatibility/
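
As a quick sanity check against the table below, you can query the versions already present on a node; note that nvcc is only available once a CUDA toolkit is in your environment:

# Driver version, and the maximum CUDA version it supports (top right of the output)
nvidia-smi

# Version of the CUDA toolkit currently in your environment
nvcc --version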

3.1. Quick Reference Table (extract)

CUDA Toolkit Version    Minimum Driver Version    Example Supported Driver Range

CUDA 12.2               >= 535.54.03              535.xx and newer
CUDA 12.0 / 12.1        >= 525.60.13              525.xx and newer
CUDA 11.8               >= 520.61.05              520.xx and newer
CUDA 11.4 – 11.7        >= 470.57.02              470.xx and newer
CUDA 11.0 – 11.3        >= 450.80.02              450.xx and newer
CUDA 10.2               >= 440.33                 440.xx and newer

3.2. Tips

  • Newer drivers are usually backward compatible with older CUDA versions.

  • Always check the official NVIDIA matrix when upgrading toolkits or drivers.

  • On shared HPC clusters, confirm supported versions with system administrators (see the sketch below).
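
On module-based systems such as GLiCID, one way to check which CUDA toolkits are installed is to list the available environment modules; the module names and version below are hypothetical and vary per site:

# List the CUDA toolkit modules available on the cluster
module avail cuda

# Load a specific version before compiling or running (hypothetical version string)
module load cuda/12.2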