Transition to Red Hat 9¶
With the end of life of Red Hat Enterprise Linux 7 (RHEL 7), VACC and ETS are transitioning our HPC cluster to Red Hat Enterprise Linux 9 (RHEL 9). The transition also gives us an opportunity to review the VACC's structure and implement several updates and improvements.
Timeline¶
It does take some time to drain jobs from compute nodes (especially in the week-long queues), so please note that migrating nodes may become unavailable a few days before the dates below. They may also take a day or two afterward to become available on the RHEL 9 cluster.
- January 10: RHEL 9 cluster enters full production. RHEL 7 is deprecated but remains in service for some software. Significant compute resources will be moved from RHEL 7 to the RHEL 9 cluster. As the semester progresses, compute resources available to RHEL 7 will dwindle. As of the start of the semester, about 36% of the compute resources have moved to the RHEL 9 cluster.
- Week of January 29 (currently): About 50% of the compute resources will have moved to RHEL 9.
- Week of February 12: About 59% moved to RHEL 9.
- Week of March 5: About 71% moved to RHEL 9.
- Week of April 2: About 89% moved to RHEL 9.
- April 18: RHEL 7 will no longer be accessible.
Login Nodes¶
The RHEL 9 login nodes for the cluster are accessible via login.vacc.uvm.edu. Users will be connected to them in a round-robin system to balance usage across the nodes. It is still possible to connect to individual nodes directly if necessary.
For more information, see Connecting via SSH.
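For example, a connection from a terminal might look like the following, where yournetid is a placeholder for your username (UVM NetID):
$ ssh yournetid@login.vacc.uvm.edu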
Open OnDemand server changed¶
The Open OnDemand server for the Red Hat 9 cluster can be reached at:
https://ondemand.vacc.uvm.edu/
All applications should be available. The older, deprecated server will remain in service during the transition period. Please start using the new server and let us know of any problems you encounter. If you have installed your own R or Python packages, please be sure to test that your existing installations still work.
Slurm Changes¶
Renaming of Slurm Partitions¶
We are renaming and consolidating the Slurm partitions to better reflect the usage of the VACC. While the underlying hardware remains the same, we have retired the branding for Bluemoon, Deep Green, and Data Mountain as grant lifetimes have ended. Going forward, most jobs will be appropriate for the general or nvgpu partitions.
Please see Slurm Partitions for more information.
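As an illustration, a CPU-only batch job could be submitted to the general partition and a GPU job to the nvgpu partition. Here myjob.sh is a placeholder script name, and the exact GPU request options for your job may differ (see Submitting a Job):
$ sbatch --partition=general myjob.sh
$ sbatch --partition=nvgpu --gres=gpu:1 myjob.sh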
Using Constraints with Slurm¶
When a job has specific hardware requirements, you can use constraints to select the appropriate nodes. For example, to select a node with an Intel processor, use --constraint=intel. Please see Submitting a Job for a list of available constraints.
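For example, on the command line (myjob.sh is a placeholder script name):
$ sbatch --constraint=intel myjob.sh
The same request can be made inside the job script with the directive #SBATCH --constraint=intel.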
Default Walltime of 30 Minutes¶
Previously, when jobs were submitted without a time limit (--time), the default walltime (the time a job is allowed to run) was thirty hours, which was also the maximum. This has been reduced to 30 minutes to encourage users to explicitly define the required runtime when submitting jobs. Please see Submitting a Job to learn how to evaluate your jobs.
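For example, to request two hours of walltime for a placeholder script myjob.sh:
$ sbatch --time=02:00:00 myjob.sh
or, inside the script itself, add the directive #SBATCH --time=02:00:00.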
Module Differences¶
The module system has been upgraded from the TCL-based module system to Lmod. Modules are now structured hierarchically, allowing for better organization and reducing module conflicts. Please see the Module Guide to learn how the new module system works. Note that we are no longer using Spack to load packages. Except for some software packages provided as containers, most packages are newly installed. As a result, you will find that many of the versions have been updated. Please use the module commands to check current package versions and update your job scripts as needed.
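As a sketch of the basic Lmod workflow (R is used here only as an example module name; the modules available on the cluster may differ):
$ module avail
$ module spider R
$ module load R
$ module list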
AMD GPUs¶
AMD is ending software support for the GPUs that compose BlackDiamond (MI50s). As a result, BlackDiamond will be sunset with the transition to RHEL 9. We plan to purchase new NVIDIA GPU hardware in 2025 to replace this capacity and provide new functionality.
Apptainer is replacing Singularity¶
Apptainer has replaced Singularity for containerization. Please read our Apptainer documentation.
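As a minimal sketch (the container image here is only an example), Apptainer can pull and run a container much as Singularity did:
$ apptainer pull docker://ubuntu:22.04
$ apptainer exec ubuntu_22.04.sif cat /etc/os-release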
Transitioning R libraries¶
Many people have installed their own R libraries. Some R libraries contain only R code, but many, if not most, compile native code when they are installed. Those that do will likely need to be reinstalled on the RHEL 9 cluster.
R has a standard location for libraries that it will create for you the first time you install a library: ~/R/x86_64-pc-linux-gnu-library/M.N, where M.N is the first two components of the R version number. In this case, the version we are concerned with is 4.4.
Some R libraries can be tricky to get installed, so you may not want to make changes to the ones you have for RHEL 7 until you have working versions for RHEL 9. R provides a way to change the library location, the R_LIBS_USER environment variable, and we can use it to work with both clusters simply by pointing it at a directory appropriate for each.
One good way to do this is to rename the RHEL 7 default library directory and install the libraries you need into a new directory of the same name. So, for the current version 4.4, I would have ~/R/x86_64-pc-linux-gnu-library/4.4. I can rename that to 4.4-rhel7, and if I try to install any libraries while on the RHEL 9 cluster, a new 4.4 directory will be created for them. To accomplish this, you would run:
$ cd ~/R/x86_64-pc-linux-gnu-library
$ mv 4.4 4.4-rhel7
Running R and installing any package will recreate ~/R/x86_64-pc-linux-gnu-library/4.4 if you accept the default values when you install the first library.
If you find you need to run an R program on the older RHEL 7 cluster, set the variable R_LIBS_USER to ~/R/x86_64-pc-linux-gnu-library/4.4-rhel7 before you start R.
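A minimal sketch of that step on an RHEL 7 login node, using the renamed directory from above; the .libPaths() call inside R is optional and simply confirms which library directory R will use:
$ export R_LIBS_USER="$HOME/R/x86_64-pc-linux-gnu-library/4.4-rhel7"
$ module load R
$ R
> .libPaths()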
Example¶
Suppose I installed only the package 'netstat' while using RHEL 7. I want to start fresh with RHEL 9 and install the 'lme4' package in the now-available default directory. I would do the following, answering 'yes' to both questions.
$ cd ~/R/x86_64-pc-linux-gnu-library
$ mv 4.4 4.4-rhel7
$ module load R
$ R
[ . . . . output removed . . . . ]
> install.packages('lme4')
Warning in install.packages("lme4") :
'lib = "/gpfs1/sw/rh9/pkgs/stacks/gcc/13.3.0/R/4.4.1/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘/gpfs1/home/b/f/bfauber/R/x86_64-pc-linux-gnu-library/4.4’
to install packages into? (yes/No/cancel) y
--- Please select a CRAN mirror for use in this session ---
Secure CRAN mirrors
[ . . . . output removed . . . . ]
Selection: 65
[ . . . . output removed . . . . ]
* DONE (lme4)
I can run the R commands below to show the installed R packages.
> ip = as.data.frame(installed.packages()[,c(1,3:4)])
> ip = ip[is.na(ip$Priority),1:2,drop=FALSE]
> print(ip[2])
Version
lme4 1.1-36
minqa 1.2.8
nloptr 2.1.1
rbibutils 2.3
Rcpp 1.0.14
RcppEigen 0.3.4.0.2
Rdpack 2.6.2
reformulas 0.4.0
doParallel 1.0.17
foreach 1.5.2
iterators 1.0.14
If I go to the RHEL 7 cluster, set R_LIBS_USER to the 4.4-rhel7 directory, load R, and run the same commands, I will get:
$ export R_LIBS_USER="$HOME/R/x86_64-pc-linux-gnu-library/4.4-rhel7"
$ module load R
$ R
[ . . . . output removed . . . . ]
> ip = as.data.frame(installed.packages()[,c(1,3:4)])
> ip = ip[is.na(ip$Priority),1:2,drop=FALSE]
> print(ip[2])
Version
netstat 0.1.2
doParallel 1.0.17
foreach 1.5.2
iterators 1.0.14
This is one way to have a working set of R libraries for both clusters during the transition. Once all your libraries are working on RHEL 9, you can remove the ~/R/x86_64-pc-linux-gnu-library/4.4-rhel7 directory with
$ rm -rf ~/R/x86_64-pc-linux-gnu-library/4.4-rhel7
If you have questions about this, please contact us at vacchelp@uvm.edu.