GPU Features

The VACC has a heterogeneous GPU environment, and not all GPUs are created equal. The cluster contains a mix of GPU architectures spanning multiple generations, each with different strengths in floating-point precision, memory capacity, and compute capability. Understanding these differences is critical to getting accurate results and optimal performance from your jobs.

This guide explains the GPU feature tagging system, how to request GPU features, why floating-point precision matters, and how to select the right GPU for your jobs.

GPU Feature Constraints Are Required

All GPU jobs on the VACC must include a --constraint specifying a valid GPU feature. Jobs submitted without one will receive a scheduler warning.

Not sure which GPU you need?

  • Use --constraint=GPU_FP:FP64 to route your job to one of our 116 GPUs (A100, H100, H200) that handle both floating-point precisions efficiently.
  • Use --constraint=GPU_FP:FP32 or --constraint=GPU_ANY if you only need single precision — this opens up our full inventory of 160+ GPUs and can significantly reduce queue times.

To learn more about all available GPU features or see request examples, read on.


Why Floating-Point Precision Matters

Floating-point precision is essentially about how many significant digits a GPU tracks during a calculation. Double precision (FP64) tracks roughly 15–16 significant digits; single precision (FP32) tracks about 7. For many workloads the extra digits simply don't matter, but some workloads are sensitive to the loss of precision.

For example, teaching a neural network to recognize a cat in a photo works fine in FP32. The model is learning rough patterns from millions of examples, and a tiny rounding difference in any one calculation has no meaningful effect on the result. In contrast, running a large statistical analysis where small errors accumulate across millions of chained calculations requires the additional precision of FP64 to avoid significant rounding errors.
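The difference is easy to see without a GPU at all. The sketch below (NumPy on CPU, but the same arithmetic rules apply to GPU hardware) shows a small term vanishing below FP32 resolution, and rounding error accumulating across a long chain of additions:

```python
import numpy as np

# FP64 keeps ~15-16 significant digits; FP32 keeps only ~7.
print(np.float64(1.0) + np.float64(1e-8))  # the small term survives in FP64
print(np.float32(1.0) + np.float32(1e-8))  # prints 1.0 — lost below FP32 resolution

# Errors also accumulate across chained operations. Summing 0.1 a hundred
# thousand times should give exactly 10,000:
acc32, acc64 = np.float32(0.0), np.float64(0.0)
for _ in range(100_000):  # naive sequential accumulation
    acc32 += np.float32(0.1)
    acc64 += np.float64(0.1)

print(abs(float(acc32) - 10_000.0))  # FP32 error: visibly nonzero
print(abs(float(acc64) - 10_000.0))  # FP64 error: orders of magnitude smaller
```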

In the context of the VACC, all GPUs support FP32, but only the A100, H100, and H200 GPUs support FP64.


VACC GPU Inventory

The following GPUs are currently available on the VACC cluster:

| GPU                 | Generation | VRAM   | FP64 Rate | Compute Capability | Cards in Cluster | Best For |
| ------------------- | ---------- | ------ | --------- | ------------------ | ---------------- | -------- |
| NVIDIA A100         | Ampere     | 40 GB  | 1/2 FP32  | 8.0                | 4                | Scientific computing, ML training |
| NVIDIA H100         | Hopper     | 80 GB  | 1/2 FP32  | 9.0                | 8                | Scientific computing, ML training, large model training |
| NVIDIA H200         | Hopper     | 140 GB | 1/2 FP32  | 9.0                | 104              | Scientific computing, ML training, LLM training/inference, large datasets |
| NVIDIA RTX 6000 Pro | Blackwell  | 96 GB  | 1/64 FP32 | 12.0               | 48               | ML inference, large model inference, visualization, FP32 workloads |

RTX 6000 Pro and Double Precision

The RTX 6000 Pro has excellent FP32 performance and a large memory pool, but its FP64 throughput is approximately 1/64th of its FP32 rate. If your workload requires double precision, you should target an A100, H100, or H200 GPU or use the --constraint=GPU_FP:FP64 flag to request a GPU with double-precision support.

Multi-GPU Support

Choosing the right GPU for multi-GPU jobs depends on how well your code scales across multiple GPUs. Here's a breakdown of the multi-GPU capabilities of each GPU type on the VACC:

| GPU          | Interconnect                  | GPU-to-GPU Bandwidth | Multi-GPU | Description |
| ------------ | ----------------------------- | -------------------- | --------- | ----------- |
| A100         | PCIe Gen4                     | ~32 GB/s             | ✗         | Good for data-parallel workloads; not recommended for tightly coupled multi-GPU workloads |
| H100         | NVLink                        | ~300 GB/s            | ✓         | Great — low latency, high bandwidth between cards |
| H200 (SXM)   | NVLink + NVSwitch             | ~900 GB/s            | ✓         | Excellent — low latency, high bandwidth between cards |
| H200 (NVL)   | NVLink Bridge 4.0 + PCIe Gen5 | ~300 GB/s            | ✓         | Great — low latency, high bandwidth between cards |
| RTX 6000 Pro | PCIe Gen5                     | ~64 GB/s             | ✗         | Good for data-parallel workloads; not recommended for tightly coupled multi-GPU workloads |
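To get a feel for why interconnect bandwidth matters, here is a rough back-of-envelope cost model (an illustrative sketch, not a benchmark): in a ring all-reduce, each GPU moves roughly 2·(N−1)/N of the gradient buffer over its link, so per-step synchronization time scales inversely with link bandwidth. The model sizes and bandwidth figures below are illustrative approximations.

```python
def allreduce_seconds(nbytes: float, n_gpus: int, bw_bytes_per_s: float) -> float:
    """Ring all-reduce estimate: each GPU sends/receives ~2*(N-1)/N of the buffer."""
    return 2 * (n_gpus - 1) / n_gpus * nbytes / bw_bytes_per_s

# Gradients of a hypothetical 7B-parameter model in FP16 (2 bytes/param):
grad_bytes = 7e9 * 2

# Approximate per-direction bandwidths, roughly matching the table above:
for name, bw in [("A100 PCIe Gen4", 32e9),
                 ("H100 NVLink", 300e9),
                 ("H200 SXM NVSwitch", 900e9)]:
    ms = allreduce_seconds(grad_bytes, 4, bw) * 1e3
    print(f"{name}: ~{ms:.0f} ms per all-reduce across 4 GPUs")
```

The ~28x bandwidth gap between PCIe Gen4 and NVSwitch translates directly into per-step synchronization time, which is why tightly coupled multi-GPU training favors the NVLink-connected cards.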

Available GPU Features

All GPU nodes on the VACC are tagged with features that describe their capabilities. These features allow you to request specific hardware at job submission time using the --constraint flag.

Feature Reference

| Feature                   | Description |
| ------------------------- | ----------- |
| GPU_ANY                   | Any available GPU (no preference) |
| Floating-Point Capability | |
| GPU_FP:FP64               | GPU with efficient double-precision support (A100, H100, H200) |
| GPU_FP:FP32               | GPU with single-precision support (all GPUs) |
| GPU SKU                   | |
| GPU_SKU:A100              | NVIDIA A100 |
| GPU_SKU:H100              | NVIDIA H100 |
| GPU_SKU:H100_SXM          | NVIDIA H100 SXM variant |
| GPU_SKU:H200              | NVIDIA H200 |
| GPU_SKU:H200_NVL          | NVIDIA H200 NVLink variant |
| GPU_SKU:H200_SXM          | NVIDIA H200 SXM variant |
| GPU_SKU:RTX6000           | NVIDIA RTX 6000 Pro |
| GPU Generation            | |
| GPU_GEN:AMPERE            | NVIDIA Ampere architecture |
| GPU_GEN:HOPPER            | NVIDIA Hopper architecture |
| GPU_GEN:BLACKWELL         | NVIDIA Blackwell architecture |
| GPU Memory                | |
| GPU_MEM:40GB              | 40 GB VRAM |
| GPU_MEM:80GB              | 80 GB VRAM |
| GPU_MEM:96GB              | 96 GB VRAM |
| GPU_MEM:140GB             | 140 GB VRAM |
| Compute Capability        | |
| GPU_CC:8.0                | Compute Capability 8.0 (Ampere) |
| GPU_CC:9.0                | Compute Capability 9.0 (Hopper) |
| GPU_CC:12.0               | Compute Capability 12.0 (Blackwell) |

Feature Mapping by GPU

| Feature           | A100 | H100 | H200 | RTX 6000 Pro |
| ----------------- | ---- | ---- | ---- | ------------ |
| GPU_FP:FP64       | ✓    | ✓    | ✓    |              |
| GPU_FP:FP32       | ✓    | ✓    | ✓    | ✓            |
| GPU_GEN:AMPERE    | ✓    |      |      |              |
| GPU_GEN:HOPPER    |      | ✓    | ✓    |              |
| GPU_GEN:BLACKWELL |      |      |      | ✓            |
| GPU_MEM:40GB      | ✓    |      |      |              |
| GPU_MEM:80GB      |      | ✓    |      |              |
| GPU_MEM:96GB      |      |      |      | ✓            |
| GPU_MEM:140GB     |      |      | ✓    |              |
| GPU_CC:8.0        | ✓    |      |      |              |
| GPU_CC:9.0        |      | ✓    | ✓    |              |
| GPU_CC:12.0       |      |      |      | ✓            |

Requesting GPU Features

Basic Usage

Use the -C or --constraint flag to specify the GPU features your job requires:

# Request any GPU with double-precision support
sbatch --gpus=1 --constraint=GPU_FP:FP64 myjob.sh

# Request first available GPU (no preference)
sbatch --gpus=1 --constraint=GPU_ANY myjob.sh

# Request a specific SKU
sbatch --gpus=1 --constraint=GPU_SKU:H200 myjob.sh

# Request by memory capacity
sbatch --gpus=1 --constraint=GPU_MEM:140GB myjob.sh

Or equivalently in your batch script:

#!/bin/bash
#SBATCH --gpus=1
#SBATCH --constraint=GPU_FP:FP64
#SBATCH --time=04:00:00

python my_job.py

Combining Constraints

Multiple constraints can be combined using logical operators:

AND - require all features (use &):

# Hopper GPU with 140GB memory
#SBATCH --constraint="GPU_GEN:HOPPER&GPU_MEM:140GB"

OR — accept any matching feature (use |):

# Either an H100 or H200
#SBATCH --constraint="GPU_SKU:H100|GPU_SKU:H200"

Matching OR — all nodes in a multi-node job get the same feature (use []):

# Multi-node job where all nodes have the same GPU type
#SBATCH --nodes=4
#SBATCH --constraint="[GPU_SKU:H100|GPU_SKU:H200]"

Many node feature combinations are impossible to satisfy

Some feature combinations describe hardware that does not exist in the cluster, so no node can ever satisfy them. The scheduler is usually able to detect this and reject the job at submission time.

For instance, submitting a job requesting an A100 GPU with 96 GB of VRAM:

#SBATCH -C 'GPU_SKU:A100&GPU_MEM:96GB'

will result in the following error:

error: Job submit/allocate failed: Requested node configuration is not available

because the A100 is only available with 40 GB of VRAM.

Node features are text tags

Features have no numerical value and cannot be compared with greater-than or less-than operators. To request "at least 80GB of VRAM," you must enumerate the valid options:

#SBATCH --constraint="GPU_MEM:80GB|GPU_MEM:96GB|GPU_MEM:140GB"
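Because the tags are plain strings, any "at least N GB" request reduces to an OR over the tags that qualify. A small helper (hypothetical, not a VACC-provided tool) can build the constraint string from the known inventory:

```python
# GB sizes present in the cluster inventory (see the table above).
VRAM_TAGS = [40, 80, 96, 140]

def mem_constraint(min_gb: int) -> str:
    """Build a Slurm OR-constraint matching every memory tag >= min_gb."""
    opts = [f"GPU_MEM:{gb}GB" for gb in VRAM_TAGS if gb >= min_gb]
    return "|".join(opts)

print(mem_constraint(80))  # GPU_MEM:80GB|GPU_MEM:96GB|GPU_MEM:140GB
```

The resulting string can be passed directly to sbatch, e.g. --constraint="$(python -c ...)" or pasted into a batch script.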


Choosing the Right GPU for Your Workload

Quick Decision Guide

Does your job require FP64 (double precision)?
├── Yes -> Use --constraint=GPU_FP:FP64
│          (A100, H100, or H200)
├── No
│   ├── Do you need >80GB VRAM?
│   │   ├── Yes -> --constraint="GPU_MEM:96GB|GPU_MEM:140GB"
│   │   └── No  -> --constraint=GPU_ANY
│   └── Is this ML training with mixed precision?
│       └── Yes -> --constraint=GPU_ANY (all GPUs have Tensor Cores)
└── Unsure -> See workload examples below; when in doubt, request --constraint=GPU_FP:FP64

Common Workloads and Recommendations

This is not an exhaustive list of all possible workloads, but is provided as an example of how different jobs require different GPU features.

Tip

Check your software documentation for precision requirements. When in doubt, request GPU_FP:FP64 — the job may wait a bit longer in the queue but will run with reasonable performance on any assigned GPU. Reach out to VACC Help if you'd like help choosing the right GPU constraints for your workload.

Deep Learning: PyTorch and TensorFlow

Most deep learning training and inference is FP32 or mixed precision. Any GPU on the cluster will work, but consider memory requirements.

# Standard training — any GPU is fine
#SBATCH --gpus=1
#SBATCH --constraint=GPU_ANY

# Large language model fine-tuning — need high VRAM
#SBATCH --gpus=1
#SBATCH --constraint=GPU_MEM:140GB

# Multi-GPU training with NCCL — match GPU types
#SBATCH --gpus=4
#SBATCH --constraint=GPU_SKU:H200

PyTorch automatic mixed precision (torch.amp) and TensorFlow mixed precision (tf.keras.mixed_precision) will leverage Tensor Cores on all available GPUs. Hopper and Blackwell generation GPUs additionally support FP8 via libraries like Transformer Engine.

Ollama, vLLM and LLM Inference

LLM inference servers like vLLM are memory-bound. The primary concern is fitting the model in VRAM.

| Model Size | Minimum VRAM | Recommended Constraint |
| ---------- | ------------ | ---------------------- |
| 7B (FP16)  | ~14 GB       | GPU_ANY |
| 13B (FP16) | ~26 GB       | GPU_ANY |
| 70B (FP16) | ~140 GB      | GPU_MEM:140GB |
| 70B (INT8) | ~70 GB       | GPU_MEM:80GB\|GPU_MEM:96GB\|GPU_MEM:140GB |
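The "minimum VRAM" column is roughly parameter count times bytes per weight; real deployments need additional headroom for the KV cache and activations. A rough estimator (illustrative arithmetic, not a vLLM or Ollama API):

```python
# Bytes per weight for common serving precisions.
BYTES_PER_WEIGHT = {"FP32": 4, "FP16": 2, "INT8": 1}

def weight_vram_gb(params_billion: float, dtype: str) -> float:
    """VRAM needed just to hold the model weights, in GB (no KV-cache headroom)."""
    return params_billion * 1e9 * BYTES_PER_WEIGHT[dtype] / 1e9

print(weight_vram_gb(7, "FP16"))   # 14.0 — fits on any GPU in the cluster
print(weight_vram_gb(70, "FP16"))  # 140.0 — needs GPU_MEM:140GB
print(weight_vram_gb(70, "INT8"))  # 70.0 — fits on the 80/96/140 GB cards
```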

Molecular Dynamics: AMBER, LAMMPS

Molecular dynamics codes vary in their precision requirements. Many modern MD packages perform neighbor-list and non-bonded force calculations in FP32 on the GPU while accumulating energies and constraints in FP64 on the CPU. However, some configurations and force fields require GPU-side FP64.

# LAMMPS with GPU-side double precision (e.g., long-range Coulomb)
#SBATCH --gpus=1
#SBATCH --constraint=GPU_FP:FP64

Computational Fluid Dynamics: OpenFOAM

CFD solvers almost universally require FP64 for numerical stability. Pressure-velocity coupling, iterative linear solvers, and turbulence models all depend on double-precision arithmetic.

# CFD workloads — always request FP64
#SBATCH --gpus=1
#SBATCH --constraint=GPU_FP:FP64

Quantum Chemistry: Gaussian

Quantum chemistry calculations rely on FP64 throughout. There is no FP32 fallback for these workloads.

# Quantum chemistry — must be FP64
#SBATCH --gpus=1
#SBATCH --constraint=GPU_FP:FP64

Visualization and Rendering

Interactive visualization, 3D graphics rendering, and video processing are FP32 workloads. The RTX 6000 Pro's 96 GB of VRAM and ray-tracing cores make it a great choice.

# Visualization workloads
#SBATCH --gpus=1
#SBATCH --constraint=GPU_SKU:RTX6000

Listing Available Features

To see which features are available on nodes in a given partition, use sinfo:

# List all GPU features in the nvgpu partition
sinfo -p nvgpu -o "%N %f" | grep GPU

# Check features on a specific node
scontrol show node h2xnode01 | grep Features

Summary

| If your workload is...             | Use this constraint              | Why |
| ---------------------------------- | -------------------------------- | --- |
| ML training (PyTorch, TensorFlow)  | GPU_ANY                          | FP32/mixed precision; all GPUs work |
| LLM inference (vLLM, TGI)          | GPU_MEM:140GB or GPU_MEM:96GB    | Memory-bound; needs enough VRAM to fit the model |
| Molecular dynamics (GROMACS, NAMD) | GPU_ANY or GPU_FP:FP64           | Check your software documentation |
| CFD (OpenFOAM, Fluent)             | GPU_FP:FP64                      | Requires double precision |
| Quantum chemistry (Gaussian)       | GPU_FP:FP64                      | Requires double precision |
| Visualization / rendering          | GPU_SKU:RTX6000                  | FP32; benefits from RT cores |
| Don't know / don't care            | GPU_ANY                          | Fastest scheduling; any GPU |

Need help?

If you're unsure which GPU feature to request for your workload, we suggest starting with --constraint=GPU_FP:FP64. You can then send a question to the VACC support team at vacchelp@uvm.edu or attend our office hours to discuss your specific needs.

Choosing the wrong precision can lead to miserable performance or silently incorrect results that are difficult to detect.