While we consider research computing to be central to the University’s mission, the VACC’s high-performance computing cluster is not classified as a business-critical service. In the event of multiple failures of critical systems across campus, we will attend to business-critical services before fixing problems with the HPC cluster.
In addition, as in many other HPC environments, our cluster’s architecture has been chosen for performance rather than resiliency or reliability. If one or two nodes were to fail, it would probably be several days before they were repaired, because our priority is keeping the cluster as a whole running.