GPU Features¶
The VACC has a heterogeneous GPU environment, and not all GPUs are created equal. The cluster contains a mix of GPU architectures spanning multiple generations, each with different strengths in floating-point precision, memory capacity, and compute capability. Understanding these differences is critical to getting accurate results and optimal performance from your jobs.
This guide explains the GPU feature tagging system, how to request GPU features, why floating-point precision matters, and how to select the right GPU for your jobs.
GPU Feature Constraints Are Required
All GPU jobs on the VACC must include a --constraint specifying a valid GPU feature. Jobs submitted without one will receive a scheduler warning.
Not sure which GPU you need?
- Use `--constraint=GPU_FP:FP64` to route your job to one of our 116 GPUs (A100, H100, H200) that handle both floating-point precisions efficiently.
- Use `--constraint=GPU_FP:FP32` or `--constraint=GPU_ANY` if you only need single precision — this opens up our full inventory of 160+ GPUs and can significantly reduce queue times.
To learn more about all available GPU features or see request examples, read on.
Why Floating-Point Precision Matters¶
Floating-point precision is essentially a question of how many significant digits a GPU carries through a calculation. Double precision (FP64) carries roughly 15–16 significant digits; single precision (FP32) carries about 7. For many workloads the extra digits simply don't matter, but some are sensitive to the loss of precision.
For example, teaching a neural network to recognize a cat in a photo works fine in FP32. The model is learning rough patterns from millions of examples, and a tiny rounding difference in any one calculation has no meaningful effect on the result. In contrast, running a large statistical analysis where small errors accumulate across millions of chained calculations requires the additional precision of FP64 to avoid significant rounding errors.
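The accumulation effect is easy to demonstrate without a GPU. The sketch below emulates single precision by round-tripping each partial sum through a 32-bit float (the variable names and the choice of summing 0.1 a million times are illustrative, not taken from any VACC workload):

```python
import struct

def fp32(x: float) -> float:
    """Round-trip through an IEEE 754 binary32 value to emulate FP32."""
    return struct.unpack("f", struct.pack("f", x))[0]

n = 1_000_000
total64 = 0.0  # native double precision
total32 = 0.0  # emulated single precision
for _ in range(n):
    total64 += 0.1
    total32 = fp32(total32 + fp32(0.1))

print(f"FP64 sum of 0.1 x {n}: {total64:.4f}")  # stays very close to 100000
print(f"FP32 sum of 0.1 x {n}: {total32:.4f}")  # drifts visibly from 100000
```

The FP64 sum is correct to well under a hundredth, while the FP32 sum drifts by hundreds: each tiny rounding error is carried into the next addition, which is exactly the failure mode of long chained calculations run at too low a precision.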
On the VACC, all GPUs support FP32, but only the A100, H100, and H200 GPUs offer full-rate FP64 support. (The RTX 6000 Pro can execute FP64, but only at a small fraction of its FP32 rate; see below.)
VACC GPU Inventory¶
The following GPUs are currently available on the VACC cluster:
| GPU | Generation | VRAM | FP64 Rate | Compute Capability | Cards in Cluster | Best For |
|---|---|---|---|---|---|---|
| NVIDIA A100 | Ampere | 40 GB | 1/2 FP32 | 8.0 | 4 | Scientific computing, ML training |
| NVIDIA H100 | Hopper | 80 GB | 1/2 FP32 | 9.0 | 8 | Scientific computing, ML training, Large model training |
| NVIDIA H200 | Hopper | 140 GB | 1/2 FP32 | 9.0 | 104 | Scientific computing, ML training, LLM training/inference, large datasets |
| NVIDIA RTX 6000 Pro | Blackwell | 96 GB | 1/64 FP32 | 12.0 | 48 | ML inference, large model inference, visualization, FP32 workloads |
RTX 6000 Pro and Double Precision
The RTX 6000 Pro has excellent FP32 performance and a large memory pool, but its FP64 throughput is approximately 1/64th of its FP32 rate. If your workload requires double precision, you should target an A100, H100, or H200 GPU or use the --constraint=GPU_FP:FP64 flag to request a GPU with double-precision support.
Multi-GPU Support¶
Choosing the right GPU for multi-GPU jobs depends on how well your job/code scales across multiple GPUs. Here's a breakdown of the multi-GPU capabilities of each GPU type on the VACC:
| GPU | Interconnect | GPU-to-GPU Bandwidth | Multi-GPU | Description |
|---|---|---|---|---|
| A100 | PCIe Gen4 | ~32 GB/s | ✗ | Workable for loosely coupled data-parallel jobs; not recommended for tightly coupled multi-GPU workloads |
| H100 | NVLink | ~300 GB/s | ✓ | Great — low latency, high bandwidth between cards |
| H200 (SXM) | NVLink + NVSwitch | ~900 GB/s | ✓ | Excellent — low latency, high bandwidth between cards |
| H200 (NVL) | NVLink Bridge 4.0 + PCIe Gen5 | ~300 GB/s | ✓ | Great — low latency, high bandwidth between cards |
| RTX 6000 Pro | PCIe Gen5 | ~64 GB/s | ✗ | Workable for loosely coupled data-parallel jobs; not recommended for tightly coupled multi-GPU workloads |
Available GPU Features¶
All GPU nodes on the VACC are tagged with features that describe their capabilities. These features allow you to request specific hardware at job submission time using the --constraint flag.
Feature Reference¶
| Feature | Description |
|---|---|
| `GPU_ANY` | Any available GPU (no preference) |
| **Floating-Point Capability** | |
| `GPU_FP:FP64` | GPU with full-rate double-precision support |
| `GPU_FP:FP32` | GPU with single-precision support (all GPUs) |
| **GPU SKU** | |
| `GPU_SKU:A100` | NVIDIA A100 |
| `GPU_SKU:H100` | NVIDIA H100 |
| `GPU_SKU:H200` | NVIDIA H200 |
| `GPU_SKU:H200_NVL` | NVIDIA H200 NVLink variant |
| `GPU_SKU:H200_SXM` | NVIDIA H200 SXM variant |
| `GPU_SKU:H100_SXM` | NVIDIA H100 SXM variant |
| `GPU_SKU:RTX6000` | NVIDIA RTX 6000 Pro |
| **GPU Generation** | |
| `GPU_GEN:AMPERE` | NVIDIA Ampere architecture |
| `GPU_GEN:HOPPER` | NVIDIA Hopper architecture |
| `GPU_GEN:BLACKWELL` | NVIDIA Blackwell architecture |
| **GPU Memory** | |
| `GPU_MEM:40GB` | 40 GB VRAM |
| `GPU_MEM:80GB` | 80 GB VRAM |
| `GPU_MEM:96GB` | 96 GB VRAM |
| `GPU_MEM:140GB` | 140 GB VRAM |
| **Compute Capability** | |
| `GPU_CC:8.0` | Compute Capability 8.0 (Ampere) |
| `GPU_CC:9.0` | Compute Capability 9.0 (Hopper) |
| `GPU_CC:12.0` | Compute Capability 12.0 (Blackwell) |
Feature Mapping by GPU¶
| Feature | A100 | H100 | H200 | RTX 6000 Pro |
|---|---|---|---|---|
| `GPU_FP:FP64` | ✓ | ✓ | ✓ | |
| `GPU_FP:FP32` | ✓ | ✓ | ✓ | ✓ |
| `GPU_GEN:AMPERE` | ✓ | | | |
| `GPU_GEN:HOPPER` | | ✓ | ✓ | |
| `GPU_GEN:BLACKWELL` | | | | ✓ |
| `GPU_MEM:40GB` | ✓ | | | |
| `GPU_MEM:80GB` | | ✓ | | |
| `GPU_MEM:96GB` | | | | ✓ |
| `GPU_MEM:140GB` | | | ✓ | |
| `GPU_CC:8.0` | ✓ | | | |
| `GPU_CC:9.0` | | ✓ | ✓ | |
| `GPU_CC:12.0` | | | | ✓ |
Requesting GPU Features¶
Basic Usage¶
Use the -C or --constraint flag to specify the GPU features your job requires:
# Request any GPU with double-precision support
sbatch --gpus=1 --constraint=GPU_FP:FP64 myjob.sh
# Request first available GPU (no preference)
sbatch --gpus=1 --constraint=GPU_ANY myjob.sh
# Request a specific SKU
sbatch --gpus=1 --constraint=GPU_SKU:H200 myjob.sh
# Request by memory capacity
sbatch --gpus=1 --constraint=GPU_MEM:140GB myjob.sh
Or equivalently in your batch script:
#!/bin/bash
#SBATCH --gpus=1
#SBATCH --constraint=GPU_FP:FP64
#SBATCH --time=04:00:00
python my_job.py
Combining Constraints¶
Multiple constraints can be combined using logical operators:
AND - require all features (use &):
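For instance, a sketch of a job that requires both a Hopper-generation card and 140 GB of VRAM (both tags taken from the feature table above; the script name is a placeholder):

```shell
# Require a Hopper-generation GPU AND 140 GB of VRAM (i.e., an H200)
sbatch --gpus=1 --constraint="GPU_GEN:HOPPER&GPU_MEM:140GB" myjob.sh
```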
OR — accept any matching feature (use |):
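For instance, a sketch of a job that will accept either of two SKUs (script name is a placeholder):

```shell
# Accept either an H100 or an H200, whichever is free first
sbatch --gpus=1 --constraint="GPU_SKU:H100|GPU_SKU:H200" myjob.sh
```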
Matching OR — all nodes in a multi-node job get the same feature (use []):
# Multi-node job where all nodes have the same GPU type
#SBATCH --nodes=4
#SBATCH --constraint="[GPU_SKU:H100|GPU_SKU:H200]"
Many node feature combinations are impossible to satisfy
Many combinations will result in impossible conditions, and will make jobs impossible to run on any node. The scheduler is usually able to detect this and reject the job at submission time.
For instance, submitting a job requesting an A100 GPU with 96GB of VRAM:
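A sketch of such a request (script name is a placeholder):

```shell
# Impossible: A100 nodes carry GPU_MEM:40GB, no node has both tags
sbatch --gpus=1 --constraint="GPU_SKU:A100&GPU_MEM:96GB" myjob.sh
```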
will be rejected at submission time, since the A100 is only available with 40 GB of VRAM.
Node features are text tags
Features have no numerical value and cannot be compared with greater-than or less-than operators. To request "at least 80GB of VRAM," you must enumerate the valid options:
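A sketch of that enumeration, using the memory tags from the feature table above:

```shell
# "At least 80 GB of VRAM" must be spelled out as an OR over the existing tags
#SBATCH --constraint="GPU_MEM:80GB|GPU_MEM:96GB|GPU_MEM:140GB"
```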
Choosing the Right GPU for Your Workload¶
Quick Decision Guide¶
Does your job require FP64 (double precision)?
├── Yes -> Use --constraint=GPU_FP:FP64
│          (A100, H100, or H200)
├── No
│   ├── Do you need >80GB VRAM?
│   │   ├── Yes -> --constraint="GPU_MEM:96GB|GPU_MEM:140GB"
│   │   └── No  -> --constraint=GPU_ANY
│   └── Is this ML training with mixed precision?
│       └── Yes -> --constraint=GPU_ANY (all GPUs have Tensor Cores)
└── Unsure -> See workload examples below; when in doubt, request --constraint=GPU_FP:FP64
Common Workloads and Recommendations¶
This is not an exhaustive list of all possible workloads, but is provided as an example of how different jobs require different GPU features.
Tip
Check your software documentation for precision requirements. When in doubt, request GPU_FP:FP64 — the job may wait a bit longer in the queue but will run with reasonable performance on any assigned GPU. Reach out to VACC Help if you'd like help choosing the right GPU constraints for your workload.
Deep Learning: PyTorch and TensorFlow¶
Most deep learning training and inference is FP32 or mixed precision. Any GPU on the cluster will work, but consider memory requirements.
# Standard training — any GPU is fine
#SBATCH --gpus=1
#SBATCH --constraint=GPU_ANY
# Large language model fine-tuning — need high VRAM
#SBATCH --gpus=1
#SBATCH --constraint=GPU_MEM:140GB
# Multi-GPU training with NCCL — match GPU types
#SBATCH --gpus=4
#SBATCH --constraint=GPU_SKU:H200
PyTorch automatic mixed precision (torch.amp) and TensorFlow mixed precision (tf.keras.mixed_precision) will leverage Tensor Cores on all available GPUs. Hopper and Blackwell generation GPUs additionally support FP8 via libraries like Transformer Engine.
Ollama, vLLM and LLM Inference¶
LLM inference servers like vLLM are memory-bound. The primary concern is fitting the model in VRAM.
| Model Size | Minimum VRAM | Recommended Constraint |
|---|---|---|
| 7B (FP16) | ~14 GB | GPU_ANY |
| 13B (FP16) | ~26 GB | GPU_ANY |
| 70B (FP16) | ~140 GB | GPU_MEM:140GB |
| 70B (INT8) | ~70 GB | `GPU_MEM:80GB\|GPU_MEM:96GB\|GPU_MEM:140GB` |
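As an illustrative sketch (the model name, port, and runtime setup are placeholders, not VACC-specific instructions), a batch script serving a 70B model might look like:

```shell
#!/bin/bash
#SBATCH --gpus=1
#SBATCH --constraint=GPU_MEM:140GB
#SBATCH --time=08:00:00

# Placeholder model and port; adjust for your environment
vllm serve meta-llama/Llama-3.3-70B-Instruct --port 8000
```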
Molecular Dynamics: AMBER, LAMMPS¶
Molecular dynamics codes vary in their precision requirements. Many modern MD packages perform neighbor-list and non-bonded force calculations in FP32 on the GPU while accumulating energies and constraints in FP64 on the CPU. However, some configurations and force fields require GPU-side FP64.
# LAMMPS with GPU-side double precision (e.g., long-range Coulomb)
#SBATCH --gpus=1
#SBATCH --constraint=GPU_FP:FP64
Computational Fluid Dynamics: OpenFOAM¶
CFD solvers almost universally require FP64 for numerical stability. Pressure-velocity coupling, iterative linear solvers, and turbulence models all depend on double-precision arithmetic.
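A minimal submission sketch for a double-precision CFD job (the solver invocation and process count are illustrative; adapt to your case setup):

```shell
#!/bin/bash
#SBATCH --gpus=1
#SBATCH --constraint=GPU_FP:FP64
#SBATCH --time=12:00:00

# Illustrative OpenFOAM solver invocation on a pre-decomposed case
mpirun -np 4 simpleFoam -parallel
```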
Quantum Chemistry: Gaussian¶
Quantum chemistry calculations rely on FP64 throughout. There is no FP32 fallback for these workloads.
Visualization and Rendering¶
Interactive visualization, 3D graphics rendering, and video processing are FP32 workloads. The RTX 6000 Pro's large 96GB VRAM and ray tracing cores make it a great choice.
Listing Available Features¶
To see which features are available on nodes in a given partition, use sinfo:
# List all GPU features in the nvgpu partition
sinfo -p nvgpu -o "%N %f" | grep GPU
# Check features on a specific node
scontrol show node h2xnode01 | grep Features
Summary¶
| If your workload is... | Use this constraint | Why |
|---|---|---|
| ML training (PyTorch, TensorFlow) | `GPU_ANY` | FP32/mixed precision; all GPUs work |
| LLM inference (vLLM, TGI) | `GPU_MEM:140GB` or `GPU_MEM:96GB` | Memory-bound; enough VRAM to fit the model |
| Molecular dynamics (GROMACS, NAMD) | `GPU_ANY` or `GPU_FP:FP64` | Check your software documentation |
| CFD (OpenFOAM, Fluent) | `GPU_FP:FP64` | Requires double precision |
| Quantum chemistry (Gaussian) | `GPU_FP:FP64` | Requires double precision |
| Visualization / rendering | `GPU_SKU:RTX6000` | FP32; benefits from RT cores |
| Don't know / don't care | `GPU_ANY` | Fastest scheduling; any GPU |
Need help?
If you're unsure which GPU feature to request for your workload, we suggest starting with --constraint=GPU_FP:FP64. You can then send a question to the VACC support team at vacchelp@uvm.edu or attend our office hours to discuss your specific needs.
Choosing the wrong precision can lead to miserable performance or silently incorrect results that are difficult to detect.