The VACC includes three clusters: Bluemoon, DeepGreen, and BlackDiamond.

To download cluster specs, along with logos and the requirements for citing the VACC in publications, talks, or proceedings, see Assets for Grants & Publications.

Bluemoon

Bluemoon has 161 nodes, providing 8392 compute cores; 5120 of these cores are available via HDR InfiniBand. This cluster supports large-scale computation, low-latency networking for MPI workloads, large-memory systems, and high-performance parallel filesystems.

Hardware

  • 39 dual-processor, 128-core AMD EPYC 7763 PowerEdge R6525 nodes, with 1TB RAM each. Mixed use: HDR100 InfiniBand for both file access and MPI communication, plus 25Gb Ethernet
  • 2 dual-processor, 128-core AMD EPYC 7763 PowerEdge R7525 nodes, with 1TB RAM and 1 NVIDIA A100 GPU each
  • 1 dual-processor, 64-core AMD EPYC 7543 PowerEdge R7525 node, with 4TB RAM. HDR100 InfiniBand for file access, plus 10/25Gb Ethernet
  • 32 dual-processor, 12-core (Intel E5-2650 v4) Dell PowerEdge R430 nodes, with 64GB RAM each, 10Gb Ethernet-connected
  • 8 dual-processor, 12-core (Intel E5-2650 v4) Dell PowerEdge R430 nodes, with 256GB RAM each, 10Gb Ethernet-connected
  • 9 dual-processor, 20-core (Intel Xeon Gold 6230) PowerEdge R440 nodes, with 10GB RAM each, 10Gb Ethernet-connected
  • 3 dual-processor, 10-core (Intel E5-2650 v3) Dell PowerEdge R630 nodes, with 256GB RAM each, Ethernet-connected
  • 40 dual-processor, 10-core (Intel E5-2650 v3) Dell PowerEdge R630 nodes, with 64GB RAM each, InfiniBand 4X FDR (56Gb)-connected
  • 2 dual-processor, 12-core (Intel E5-2650 v4) Dell R730 nodes, with 1TB RAM each
  • 1 dual-processor, 8-core (Intel E7-8837) IBM System x3690 X5 node, with 512GB RAM
  • 2 dual-processor, 12-core (Intel E5-2650 v4) Dell R730 GPU nodes, each with 2 NVIDIA Tesla P100 GPUs
  • 2 I/O nodes (Dell R740xd) with 40GbE and 200Gb HDR InfiniBand, plus 2 I/O nodes (Dell R430, 10Gb Ethernet-connected), connected to:
    • 1 Dell MD3460 providing 287TB of storage to GPFS
    • 1 Dell ME4084 providing 751TB of spinning disk storage
    • 1 IBM FS7200 providing 187TB of NVMe-attached FlashCore Module storage

Software

  • Operating System: Red Hat Enterprise Linux 7.9 (64-bit) with the GNU compilers (gcc, f77)
  • Resource Manager: Slurm v20.11.8 (see the MPI example sketched after this list)
  • Package Manager: Spack v0.11
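
Bluemoon's interconnect is intended for MPI workloads, so a quick sanity check is a small MPI program built with the GNU toolchain and launched under Slurm, both listed above. The following is a minimal sketch, not an official VACC example: the MPI library is assumed to come from the Spack-managed software stack, and the wrapper name (mpicc) and launcher options may differ in practice.

    /* hello_mpi.c: minimal MPI sanity check (illustrative sketch).
     * Build with the MPI compiler wrapper, e.g.:  mpicc -O2 hello_mpi.c -o hello_mpi
     * Run inside a Slurm allocation, e.g.:        srun --ntasks=8 ./hello_mpi
     */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank = 0, size = 0, len = 0;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
        MPI_Get_processor_name(host, &len);     /* node this rank landed on */

        printf("rank %d of %d running on %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }

If the ranks report more than one hostname, the job is spanning nodes and MPI traffic is crossing the interconnect described in the hardware list above.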

DeepGreen

DeepGreen is a massively parallel cluster with 80 GPUs built on the NVIDIA Tesla V100 architecture, capable of over 8 petaflops of mixed-precision computation (each V100 delivers roughly 112-125 teraflops of tensor-core mixed-precision throughput, so the 80-GPU aggregate peaks near 9-10 petaflops). Its hybrid design can expedite high-throughput artificial intelligence and machine learning workflows, and its extreme parallelism supports new and transformative research pipelines. It is well suited to training and inference with neural models.

Hardware

  • 10 GPU nodes (Penguin Relion XE4118GTS), each with:
    • 2 Intel Xeon Gold 6130 CPUs @ 2.10GHz (16 cores each, 22MB cache)
    • 768GB RAM (256GB reserved for the GPFS pagepool)
    • 8 NVIDIA Tesla V100 GPUs with 32GB RAM each
    • 4 2-lane HDR (100Gb/s each; 400Gb/s per node) InfiniBand links to the QM8700 switch
  • 2 NVMe nodes, each with 88TB of NVMe devices (12x8TB), replicated to provide a dedicated 88TB NVMe-over-Fabrics filesystem
  • 1 Mellanox QM8700 switch running at HDR speeds

Software

  • Operating System: Red Hat Enterprise Linux 7.9 (64-bit) with the GNU compilers (gcc, f77)
  • Resource Manager: Slurm v20.11.8
  • Package Manager: Spack v0.11
  • CUDA 11.4 (see the device-query example sketched after this list)
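
Each DeepGreen node exposes its eight V100s through the CUDA 11.4 runtime listed above, so a small device query is a convenient way to confirm what a job can actually see. The sketch below is illustrative, not an official VACC example; it uses only the CUDA runtime API, so it can be compiled with gcc against the CUDA libraries (the $CUDA_HOME paths in the comment are assumptions that depend on how the CUDA environment is loaded).

    /* gpu_query.c: list the CUDA devices visible to this job (illustrative sketch).
     * One possible build line against the CUDA 11.4 runtime:
     *   gcc gpu_query.c -I"$CUDA_HOME/include" -L"$CUDA_HOME/lib64" -lcudart -o gpu_query
     */
    #include <stdio.h>
    #include <cuda_runtime_api.h>

    int main(void)
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("Visible CUDA devices: %d\n", count);   /* up to 8 V100s per node */

        for (int i = 0; i < count; ++i) {
            struct cudaDeviceProp prop;
            if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
                printf("  GPU %d: %s, %.0f GB\n", i, prop.name,
                       prop.totalGlobalMem / 1e9);
            }
        }
        return 0;
    }

Under Slurm, the count reported typically reflects the GPUs allocated to the job (for example via --gres) rather than all eight physical cards in the node.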


BlackDiamond

BlackDiamond is a high-performance computing cluster made possible by a gift from microchip manufacturer AMD. The cluster is built on 2nd Gen AMD EPYC processors, which push the boundaries of x86 performance, efficiency, security features, and overall system throughput.

Hardware

  • 6 GPU nodes, each with:
    • 1 AMD EPYC 7642 48-core processor
    • 8 AMD Radeon Instinct MI50 accelerators (32GB each)
    • 512GB DDR4-3200MHz RAM
    • HDR100 InfiniBand links to the QM8700 switch

Software

  • Operating System: Red Hat Enterprise Linux 7.9 (64-bit) with the GNU compilers (gcc, f77)
  • Resource Manager: Slurm v20.11.8
  • Package Manager: Spack v0.11
  • ROCm 4.3 (see the device-query example sketched after this list)
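
ROCm 4.3 exposes the MI50 accelerators through the HIP runtime, which closely mirrors the CUDA runtime API used in the DeepGreen sketch above. The following device query is an illustrative sketch, not an official VACC example: the build line in the comment (hipconfig flags, /opt/rocm paths) is an assumption that depends on the local ROCm installation.

    /* mi50_query.c: list the AMD GPUs visible through the HIP runtime (illustrative sketch).
     * hip_runtime_api.h is usable from plain C; one possible build line:
     *   gcc $(hipconfig --cpp_config) mi50_query.c -L/opt/rocm/lib -lamdhip64 -o mi50_query
     */
    #include <stdio.h>
    #include <hip/hip_runtime_api.h>

    int main(void)
    {
        int count = 0;
        hipError_t err = hipGetDeviceCount(&count);
        if (err != hipSuccess) {
            fprintf(stderr, "hipGetDeviceCount failed: %s\n", hipGetErrorString(err));
            return 1;
        }
        printf("Visible HIP devices: %d\n", count);   /* up to 8 MI50s per node */

        for (int i = 0; i < count; ++i) {
            hipDeviceProp_t prop;
            if (hipGetDeviceProperties(&prop, i) == hipSuccess) {
                printf("  GPU %d: %s, %.0f GB\n", i, prop.name,
                       prop.totalGlobalMem / 1e9);
            }
        }
        return 0;
    }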