GPU Features¶
The VACC has a heterogeneous GPU environment, and not all GPUs are created equal. The cluster contains a mix of GPU architectures spanning multiple generations, each with different strengths in floating-point precision, memory capacity, and compute capability. Understanding these differences is critical to getting accurate results and optimal performance from your jobs.
This guide explains the GPU feature tagging system, how to request GPU features, why floating-point precision matters, and how to select the right GPU for your jobs.
GPU Feature Constraints Are Required
All GPU jobs on the VACC must include a --constraint specifying a valid GPU feature. Jobs submitted without one will receive a scheduler warning.
Not sure which GPU you need?
- Use `--constraint=GPU_FP:FP64` to route your job to one of our 116 GPUs (A100, H100, H200) that handle both floating-point precisions efficiently.
- Use `--constraint=GPU_FP:FP32` or `--constraint=GPU_ANY` if you only need single precision — this opens up our full inventory of 160+ GPUs and can significantly reduce queue times.
To learn more about all available GPU features or see request examples, read on.
Why Floating-Point Precision Matters¶
Floating-point precision is essentially a question of how many significant digits a GPU carries through a calculation. Double precision (FP64) carries roughly 15–16 significant digits; single precision (FP32) carries about 7. For many workloads the extra digits simply don't matter, but some are sensitive to the loss of precision.
For example, teaching a neural network to recognize a cat in a photo works fine in FP32. The model is learning rough patterns from millions of examples, and a tiny rounding difference in any one calculation has no meaningful effect on the result. In contrast, running a large statistical analysis where small errors accumulate across millions of chained calculations requires the additional precision of FP64 to avoid significant rounding errors.
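The accumulation effect is easy to demonstrate without a GPU. The sketch below emulates single precision by round-tripping each partial sum through a 32-bit float (the variable names and the choice of summing 0.1 a million times are illustrative, not taken from any VACC workload):

```python
import struct

def fp32(x: float) -> float:
    """Round-trip through an IEEE 754 binary32 value to emulate FP32."""
    return struct.unpack("f", struct.pack("f", x))[0]

n = 1_000_000
total64 = 0.0  # native double precision
total32 = 0.0  # emulated single precision
for _ in range(n):
    total64 += 0.1
    total32 = fp32(total32 + fp32(0.1))

print(f"FP64 sum of 0.1 x {n}: {total64:.4f}")  # stays very close to 100000
print(f"FP32 sum of 0.1 x {n}: {total32:.4f}")  # drifts visibly from 100000
```

The FP64 sum is correct to well under a hundredth, while the FP32 sum drifts by hundreds: each tiny rounding error is carried into the next addition, which is exactly the failure mode of long chained calculations run at too low a precision.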
On the VACC, all GPUs support FP32, but only the A100, H100, and H200 GPUs offer full-rate FP64 support. (The RTX 6000 Pro can execute FP64, but only at a small fraction of its FP32 rate; see below.)
VACC GPU Inventory¶
The following GPUs are currently available on the VACC cluster:
| GPU | Generation | VRAM | FP64 Rate | Compute Capability | Cards in Cluster | Best For |
|---|---|---|---|---|---|---|
| NVIDIA A100 | Ampere | 40 GB | 1/2 FP32 | 8.0 | 4 | Scientific computing, ML training |
| NVIDIA H100 | Hopper | 80 GB | 1/2 FP32 | 9.0 | 8 | Scientific computing, ML training, Large model training |
| NVIDIA H200 | Hopper | 140 GB | 1/2 FP32 | 9.0 | 104 | Scientific computing, ML training, LLM training/inference, large datasets |
| NVIDIA RTX 6000 Pro | Blackwell | 96 GB | 1/64 FP32 | 12.0 | 48 | ML inference, large model inference, visualization, FP32 workloads |
RTX 6000 Pro and Double Precision
The RTX 6000 Pro has excellent FP32 performance and a large memory pool, but its FP64 throughput is approximately 1/64th of its FP32 rate. If your workload requires double precision, you should target an A100, H100, or H200 GPU or use the --constraint=GPU_FP:FP64 flag to request a GPU with double-precision support.
Multi-GPU Support¶
Choosing the right GPU for multi-GPU jobs depends on how well your job/code scales across multiple GPUs. Here's a breakdown of the multi-GPU capabilities of each GPU type on the VACC:
| GPU | Interconnect | GPU-to-GPU Bandwidth | Multi-GPU | Description |
|---|---|---|---|---|
| A100 | PCIe Gen4 | ~32 GB/s | ✗ | Workable for loosely coupled data-parallel jobs; not recommended for tightly coupled multi-GPU workloads |
| H100 | NVLink | ~300 GB/s | ✓ | Great — low latency, high bandwidth between cards |
| H200 (SXM) | NVLink + NVSwitch | ~900 GB/s | ✓ | Excellent — low latency, high bandwidth between cards |
| H200 (NVL) | NVLink Bridge 4.0 + PCIe Gen5 | ~300 GB/s | ✓ | Great — low latency, high bandwidth between cards |
| RTX 6000 Pro | PCIe Gen5 | ~64 GB/s | ✗ | Workable for loosely coupled data-parallel jobs; not recommended for tightly coupled multi-GPU workloads |
Available GPU Features¶
All GPU nodes on the VACC are tagged with features that describe their capabilities. These features allow you to request specific hardware at job submission time using the --constraint flag.
Feature Reference¶
| Feature | Description |
|---|---|
| `GPU_ANY` | Any available GPU (no preference) |
| **Floating-Point Capability** | |
| `GPU_FP:FP64` | GPU with full-rate double-precision support |
| `GPU_FP:FP32` | GPU with single-precision support (all GPUs) |
| **GPU SKU** | |
| `GPU_SKU:A100` | NVIDIA A100 |
| `GPU_SKU:H100` | NVIDIA H100 |
| `GPU_SKU:H200` | NVIDIA H200 |
| `GPU_SKU:H200_NVL` | NVIDIA H200 NVLink variant |
| `GPU_SKU:H200_SXM` | NVIDIA H200 SXM variant |
| `GPU_SKU:H100_SXM` | NVIDIA H100 SXM variant |
| `GPU_SKU:RTX6000` | NVIDIA RTX 6000 Pro |
| **GPU Generation** | |
| `GPU_GEN:AMPERE` | NVIDIA Ampere architecture |
| `GPU_GEN:HOPPER` | NVIDIA Hopper architecture |
| `GPU_GEN:BLACKWELL` | NVIDIA Blackwell architecture |
| **GPU Memory** | |
| `GPU_MEM:40GB` | 40 GB VRAM |
| `GPU_MEM:80GB` | 80 GB VRAM |
| `GPU_MEM:96GB` | 96 GB VRAM |
| `GPU_MEM:140GB` | 140 GB VRAM |
| **Compute Capability** | |
| `GPU_CC:8.0` | Compute Capability 8.0 (Ampere) |
| `GPU_CC:9.0` | Compute Capability 9.0 (Hopper) |
| `GPU_CC:12.0` | Compute Capability 12.0 (Blackwell) |
Feature Mapping by GPU¶
| Feature | A100 | H100 | H200 | RTX 6000 Pro |
|---|---|---|---|---|
| `GPU_FP:FP64` | ✓ | ✓ | ✓ | |
| `GPU_FP:FP32` | ✓ | ✓ | ✓ | ✓ |
| `GPU_GEN:AMPERE` | ✓ | | | |
| `GPU_GEN:HOPPER` | | ✓ | ✓ | |
| `GPU_GEN:BLACKWELL` | | | | ✓ |
| `GPU_MEM:40GB` | ✓ | | | |
| `GPU_MEM:80GB` | | ✓ | | |
| `GPU_MEM:96GB` | | | | ✓ |
| `GPU_MEM:140GB` | | | ✓ | |
| `GPU_CC:8.0` | ✓ | | | |
| `GPU_CC:9.0` | | ✓ | ✓ | |
| `GPU_CC:12.0` | | | | ✓ |
Requesting GPU Features¶
Basic Usage¶
Use the -C or --constraint flag to specify the GPU features your job requires:
# Request any GPU with double-precision support
sbatch --gpus=1 --constraint=GPU_FP:FP64 myjob.sh
# Request first available GPU (no preference)
sbatch --gpus=1 --constraint=GPU_ANY myjob.sh
# Request a specific SKU
sbatch --gpus=1 --constraint=GPU_SKU:H200 myjob.sh
# Request by memory capacity
sbatch --gpus=1 --constraint=GPU_MEM:140GB myjob.sh
Or equivalently in your batch script:
#!/bin/bash
#SBATCH --gpus=1
#SBATCH --constraint=GPU_FP:FP64
#SBATCH --time=04:00:00
python my_job.py
Combining Constraints¶
Multiple constraints can be combined using logical operators:
AND - require all features (use &):
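For instance, a sketch of a job that requires both a Hopper-generation card and 140 GB of VRAM (both tags taken from the feature table above; the script name is a placeholder):

```shell
# Require a Hopper-generation GPU AND 140 GB of VRAM (i.e., an H200)
sbatch --gpus=1 --constraint="GPU_GEN:HOPPER&GPU_MEM:140GB" myjob.sh
```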
OR — accept any matching feature (use |):
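For instance, a sketch of a job that will accept either of two SKUs (script name is a placeholder):

```shell
# Accept either an H100 or an H200, whichever is free first
sbatch --gpus=1 --constraint="GPU_SKU:H100|GPU_SKU:H200" myjob.sh
```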
Matching OR — all nodes in a multi-node job get the same feature (use []):
# Multi-node job where all nodes have the same GPU type
#SBATCH --nodes=4
#SBATCH --constraint="[GPU_SKU:H100|GPU_SKU:H200]"
Many node feature combinations are impossible to satisfy
Many combinations will result in impossible conditions, and will make jobs impossible to run on any node. The scheduler is usually able to detect this and reject the job at submission time.
For instance, submitting a job requesting an A100 GPU with 96GB of VRAM:
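A sketch of such a request (script name is a placeholder):

```shell
# Impossible: A100 nodes carry GPU_MEM:40GB, no node has both tags
sbatch --gpus=1 --constraint="GPU_SKU:A100&GPU_MEM:96GB" myjob.sh
```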
will be rejected at submission time, since the A100 is only available with 40 GB of VRAM.
Node features are text tags
Features have no numerical value and cannot be compared with greater-than or less-than operators. To request "at least 80GB of VRAM," you must enumerate the valid options:
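A sketch of that enumeration, using the memory tags from the feature table above:

```shell
# "At least 80 GB of VRAM" must be spelled out as an OR over the existing tags
#SBATCH --constraint="GPU_MEM:80GB|GPU_MEM:96GB|GPU_MEM:140GB"
```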
Choosing the Right GPU for Your Workload¶
Quick Decision Guide¶
Does your job require FP64 (double precision)?
├── Yes -> Use --constraint=GPU_FP:FP64
│          (A100, H100, or H200)
├── No
│   ├── Do you need >80GB VRAM?
│   │   ├── Yes -> --constraint="GPU_MEM:96GB|GPU_MEM:140GB"
│   │   └── No  -> --constraint=GPU_ANY
│   └── Is this ML training with mixed precision?
│       └── Yes -> --constraint=GPU_ANY (all GPUs have Tensor Cores)
└── Unsure -> See workload examples below; when in doubt, request --constraint=GPU_FP:FP64
Common Workloads and Recommendations¶
This is not an exhaustive list of all possible workloads, but is provided as an example of how different jobs require different GPU features.
Tip
Check your software documentation for precision requirements. When in doubt, request GPU_FP:FP64 — the job may wait a bit longer in the queue but will run with reasonable performance on any assigned GPU. Reach out to VACC Help if you'd like help choosing the right GPU constraints for your workload.
Deep Learning: PyTorch and TensorFlow¶
Most deep learning training and inference is FP32 or mixed precision. Any GPU on the cluster will work, but consider memory requirements.
# Standard training — any GPU is fine
#SBATCH --gpus=1
#SBATCH --constraint=GPU_ANY
# Large language model fine-tuning — need high VRAM
#SBATCH --gpus=1
#SBATCH --constraint=GPU_MEM:140GB
# Multi-GPU training with NCCL — match GPU types
#SBATCH --gpus=4
#SBATCH --constraint=GPU_SKU:H200
PyTorch automatic mixed precision (torch.amp) and TensorFlow mixed precision (tf.keras.mixed_precision) will leverage Tensor Cores on all available GPUs. Hopper and Blackwell generation GPUs additionally support FP8 via libraries like Transformer Engine.
Ollama, vLLM and LLM Inference¶
LLM inference servers like vLLM are memory-bound. The primary concern is fitting the model in VRAM.
| Model Size | Minimum VRAM | Recommended Constraint |
|---|---|---|
| 7B (FP16) | ~14 GB | GPU_ANY |
| 13B (FP16) | ~26 GB | GPU_ANY |
| 70B (FP16) | ~140 GB | GPU_MEM:140GB |
| 70B (INT8) | ~70 GB | `GPU_MEM:80GB\|GPU_MEM:96GB\|GPU_MEM:140GB` |
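As an illustrative sketch (the model name, port, and runtime setup are placeholders, not VACC-specific instructions), a batch script serving a 70B model might look like:

```shell
#!/bin/bash
#SBATCH --gpus=1
#SBATCH --constraint=GPU_MEM:140GB
#SBATCH --time=08:00:00

# Placeholder model and port; adjust for your environment
vllm serve meta-llama/Llama-3.3-70B-Instruct --port 8000
```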
Molecular Dynamics: AMBER, LAMMPS¶
Molecular dynamics codes vary in their precision requirements. Many modern MD packages perform neighbor-list and non-bonded force calculations in FP32 on the GPU while accumulating energies and constraints in FP64 on the CPU. However, some configurations and force fields require GPU-side FP64.
# LAMMPS with GPU-side double precision (e.g., long-range Coulomb)
#SBATCH --gpus=1
#SBATCH --constraint=GPU_FP:FP64
Computational Fluid Dynamics: OpenFOAM¶
CFD solvers almost universally require FP64 for numerical stability. Pressure-velocity coupling, iterative linear solvers, and turbulence models all depend on double-precision arithmetic.
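A minimal submission sketch for a double-precision CFD job (the solver invocation and process count are illustrative; adapt to your case setup):

```shell
#!/bin/bash
#SBATCH --gpus=1
#SBATCH --constraint=GPU_FP:FP64
#SBATCH --time=12:00:00

# Illustrative OpenFOAM solver invocation on a pre-decomposed case
mpirun -np 4 simpleFoam -parallel
```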
Quantum Chemistry: Gaussian¶
Quantum chemistry calculations rely on FP64 throughout. There is no FP32 fallback for these workloads.
Visualization and Rendering¶
Interactive visualization, 3D graphics rendering, and video processing are FP32 workloads. The RTX 6000 Pro's large 96GB VRAM and ray tracing cores make it a great choice.
Listing Available Features¶
To see which features are available on nodes in a given partition, use sinfo:
# List all GPU features in the nvgpu partition
sinfo -p nvgpu -o "%N %f" | grep GPU
# Check features on a specific node
scontrol show node h2xnode01 | grep Features
Summary¶
| If your workload is... | Use this constraint | Why |
|---|---|---|
| ML training (PyTorch, TensorFlow) | `GPU_ANY` | FP32/mixed precision; all GPUs work |
| LLM inference (vLLM, TGI) | `GPU_MEM:140GB` or `GPU_MEM:96GB` | Memory-bound; enough VRAM to fit the model |
| Molecular dynamics (GROMACS, NAMD) | `GPU_ANY` or `GPU_FP:FP64` | Check your software documentation |
| CFD (OpenFOAM, Fluent) | `GPU_FP:FP64` | Requires double precision |
| Quantum chemistry (Gaussian) | `GPU_FP:FP64` | Requires double precision |
| Visualization / rendering | `GPU_SKU:RTX6000` | FP32; benefits from RT cores |
| Don't know / don't care | `GPU_ANY` | Fastest scheduling; any GPU |
Need help?
If you're unsure which GPU feature to request for your workload, we suggest starting with --constraint=GPU_FP:FP64. You can then send a question to the VACC support team at vacchelp@uvm.edu or attend our office hours to discuss your specific needs.
Choosing the wrong precision can lead to miserable performance or silently incorrect results that are difficult to detect.