
Submit a Job

The VACC clusters use Slurm as a batch job scheduler. Slurm manages queues of jobs and assigns jobs to computing hardware.

This article assumes you understand the basic concepts of a batch job scheduler on a computing cluster. If you haven’t worked with one before (or want a refresher), see our article Understanding the Batch Job System.

You can submit jobs to the VACC using Slurm by either of the following methods:

  • Creating a batch script: a file that specifies options for the resources you’d like to request and then executes the commands to run your job.
  • Requesting an interactive session: a login to a node on the cluster that lets you run command-line software interactively on the cluster’s nodes.

You can also use Open OnDemand to run Slurm jobs in a graphical interface – see Open OnDemand (OOD) for more information on that method.

Slurm Batch Jobs

Your batch script can be created using any text editor. If creating a batch script from the command line on the cluster, we recommend Nano, a user-friendly CLI text editor accessible from any of the cluster’s nodes with the command nano. If you’d like to create a batch script outside of the cluster using other tools before launching it on the cluster, see Transfer Files To/From the Cluster.
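
For example, to create or edit a batch script directly on the cluster with Nano (the filename myjob.sh here is purely illustrative):

nano myjob.sh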

Slurm expects these scripts to be shell scripts, so a file extension of .sh is typically used and the script starts with a shebang line:

#!/bin/sh

Slurm Directives

Slurm lets you specify options directly in a batch script, called Slurm “directives.” These directives can provide job setup information used by Slurm, including resource requests, email options, and more. This information is then followed by the commands to be executed to do the computational work of your job.

Slurm directives must precede the executable section in your script.

Slurm directives begin with #SBATCH. For example:

#SBATCH --partition=bluemoon

However, a hash sign (#) followed by a space marks a “comment.” Comments are helpful to include above commands as explanations of what follows. They are ignored by Slurm and not processed as commands. For example:

# specify a partition for the job to run on
#SBATCH --partition=bluemoon

Here is an example of the directives you might use for a job:

#SBATCH --partition=bluemoon
#SBATCH --nodes=1
#SBATCH --mem=14G
#SBATCH --time=5:00
#SBATCH --job-name=SbatchJob
#SBATCH --output=%x_%j.out
#SBATCH --mail-user=youremail@uvm.edu
#SBATCH --mail-type=ALL

In each directive, #SBATCH is followed by a space, then a “flag,” e.g. --partition. For a list of commonly used Slurm directives, see the following section “Requesting Resources” as well as the article Basic Commands: Slurm. For a complete list, check out Slurm’s documentation on the sbatch command.

Executable Section

Below the job script’s directives is the section of code that Slurm will execute. This section is equivalent to running a Bash script in the command line – it’ll go through and sequentially run each command that you include. When there are no more commands to run, the job will stop.

For example, the following executable section will print the hostname of the node that the job is running on, print the current date and time, then exit.

# print out the hostname of the current node
hostname
# print out the current date and time
date

As a more realistic example, the following executable section will move the working directory of the Slurm job to the home directory of example user “jshmoe,” then run the Python script “test.py” in that directory.

# go to jshmoe's home directory
cd /gpfs1/home/j/s/jshmoe
# in that directory, run test.py
python test.py

Submitting a Batch Script

Once your job script is written, you can submit it to the cluster. To submit your job, use the sbatch command with your filename. For example, where the filename is “myjob”:

sbatch myjob

When you submit your job, Slurm will respond with the job ID. For example, where the job ID is “123456,” Slurm will print:

Submitted batch job 123456

Note your job ID! The job ID is required to monitor and manage a job. If you don’t remember your job ID, you can request a list of all of your running jobs using squeue, as described in the above-linked article.
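
For example, the following uses squeue’s standard -u flag to list only your own jobs ($USER expands to your username):

squeue -u $USER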

As well as putting them at the start of your job script, you can pass Slurm directives as command-line options to sbatch. This is useful for experimenting with different resources for the same job, as directives given on the command line override the directives entered in the batch script. For example, the following command would override the CPU count and time limit set in the file “myjob”:

sbatch --cpus-per-task=4 --time=10:00 myjob

For additional information about Slurm directives, see Slurm’s documentation or use man sbatch on the command line.

Interactive Jobs

The srun command can be used to start an interactive job.

srun is used to initiate job steps in real time, by starting an interactive session on a compute node. srun takes the same directives as sbatch for resource requests and other options – see below for information on requesting resources.

The following example represents a request on BlackDiamond for 1 node, 2 GPUs, and 12GB of memory. --pty bash requests a pseudo-terminal shell running bash for interactive access.

# requests BlackDiamond partition, 1 node, 2 GPUs, 
# and 12 GB memory in a bash shell
srun -p bdgpu -N1 -G2 --mem=12G --pty bash
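
When you’re finished working interactively, exit the shell to end the job and release its resources:

exit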

For additional srun options, see Slurm’s documentation.

Requesting Resources

Each program and workflow has unique requirements, so we advise that you estimate what resources you need before submitting a job to the cluster.

Under/Over Estimating Resources

Whenever possible, it’s useful to estimate ahead of time what resources your job will use. When that estimate doesn’t match up with the resources the job will actually use, that job can run into issues:

  • A too-small request could lead to your job being cancelled midway through or taking much longer to complete. If the allocated CPU or GPU resources are too low, the job will run into bottlenecks in its processing and slow down. If the allocated memory or time is too low, the job will be cancelled once it tries to request more memory than allocated or exceed its time limit.
  • An overly large request of any resource may make it harder for Slurm to fit your job in alongside other jobs. As a result, it’ll take longer for the job to be allocated resources and start.

Once a job has run, you can analyze the performance of it using the command my_job_statistics. This command takes one argument, a Slurm job ID, and reports information on the job’s runtime, CPU utilization, and memory utilization, compared to the values of each that you requested. You can use this to better tune the requested resources in future runs of that job.
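
For example, assuming a completed job with ID 123456 (substitute your own job ID):

my_job_statistics 123456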

Partitions

A “partition” is Slurm’s way to refer to a distinct job queue. Each partition has different amounts and types of compute nodes assigned to it, as well as different restrictions on the resources that you can request.

The VACC clusters use the following partitions:

  • bluemoon
    • This partition is the default partition for Slurm, used if no partition is specified, and assigns jobs to most of the nodes included in the Bluemoon cluster
    • It has a 30 hour time limit and a 64GB per CPU memory limit
    • Most CPU-focused jobs should be submitted to this partition
  • week
    • This partition assigns jobs to a small amount of the nodes included in the Bluemoon cluster
    • It has a 7 day time limit and a 64GB per CPU memory limit
    • CPU-focused jobs which have a longer runtime than 30 hours should be submitted to this partition. However, they will take longer to start due to the smaller pool of nodes reserved for the partition
  • ib
    • This partition assigns jobs to a set of nodes in the Bluemoon cluster which include Infiniband connectivity
    • It has a 30 hour time limit
    • Jobs which need faster connectivity to other nodes over MPI or faster file access to Bluemoon’s GPFS file server should be submitted to this partition
  • bigmem
    • This partition assigns jobs to a small subset of the nodes included in the Bluemoon cluster which have large amounts of memory
    • It has a 30 hour time limit and no memory limit
    • Memory-focused jobs should be submitted to this partition. However, they will take longer to start due to the smaller pool of nodes reserved for the partition
  • bigmemwk
    • This partition assigns jobs to a small subset of the nodes included in the Bluemoon cluster which have large amounts of memory
    • It has a 7 day time limit and no memory limit
    • Memory-focused jobs which have a longer runtime than 30 hours should use this partition. However, they will take longer to start due to the smaller pool of nodes reserved for the partition
  • short
    • This partition assigns jobs to all of the nodes included in the Bluemoon cluster.
    • It has a 3 hour time limit and no memory limit
    • CPU or memory-focused jobs which will run for under 3 hours should use this partition to get some priority over longer jobs, and to access nodes otherwise dedicated to the large-memory partitions
  • bdgpu
    • This partition assigns jobs to the BlackDiamond cluster
    • It has a 48 hour time limit and no memory limit
    • GPU-focused jobs which use AMD GPUs should be submitted to this partition. To reserve space for GPU-focused jobs, do not submit jobs to this partition which do not use GPUs
  • dggpu
    • This partition assigns jobs to the DeepGreen cluster
    • It has a 48 hour time limit and no memory limit
    • GPU-focused jobs which use Nvidia GPUs should be submitted to this partition. To reserve space for GPU-focused jobs, do not submit jobs to this partition which do not use GPUs

The following partitions are available by request only. If interested, please send an email to vacchelp@uvm.edu detailing the work you’d like to do on these partitions!

  • hc
    • This partition includes two nodes with CPUs that run at a higher clock speed (hence the name “hc,” for “high clock”): 3.50 GHz, rather than the ~2.30 GHz typical of nodes elsewhere on the cluster
  • debug
    • This partition includes two nodes containing Nvidia A100 GPUs, used by research groups to pilot applications that need the higher processing power of this GPU model compared to the GPUs in DeepGreen
  • staging
    • This partition is used by VACC system administrators to test out new nodes
    • Occasionally, this partition is opened to interested users for testing prior to adoption on the cluster

Partitions are selected in Slurm with the flag --partition=<name>. As an example, if you wanted to explicitly select the partition bluemoon, you’d include the following in your batch script:

#SBATCH --partition=bluemoon

Nodes

A “node” is a physical compute node. The VACC clusters are made up of hundreds of individual compute nodes, each one a separate computer. You can request a specific number of nodes to use in a job with --nodes=<count>.

As an example, the following line requests one node:

# request one node
#SBATCH --nodes=1

Tasks

In Slurm, a “task” refers to the number of processes available to the Slurm job. Most jobs, even ones that run in parallel on multiple CPUs, will use one process. Some jobs can open multiple processes at a time, most notably jobs using MPI. The number of tasks can be requested with --ntasks=<count>; this count is the total across all allocated nodes. To request a fixed number of tasks on each node instead, use --ntasks-per-node=<count>.

As an example, the following line requests 1 task per node:

# request one task
#SBATCH --ntasks=1
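
As a further sketch, a multi-node MPI job might combine node and task requests like this (the counts are only examples):

# request 2 nodes, with 4 tasks on each node (8 tasks total)
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4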

CPUs

In Slurm, a “cpu” refers to a single core of a physical CPU on one of the nodes.

Some software only uses a single core, where each step of the software is strictly sequential. This type of software is often referred to as “serial” or “single-threaded.” If you were using software where this was the case, you would request a single core from Slurm.

Some software, conversely, is able to run many steps at the same time on different CPU cores. This software is often referred to as “parallel” or “multi-threaded.” If you were using software where this was the case, you would request multiple cores from Slurm.

Starting with a small number of cores – for example, 4 to 8 – is wise, as it allows your job to use a small amount of resources and therefore start faster. As discussed at the start of “Requesting Resources,” you can use my_job_statistics to analyze the CPU usage of a completed job. If you’re seeing low percentages for CPU efficiency, you’re likely requesting more cores than your job is able to use effectively. If you’re seeing 100% CPU efficiency, it’s possible your job’s performance is being limited by the number of CPUs you’re requesting. Try increasing the CPU count and see if the time it takes to complete your job improves!

CPUs in Slurm are typically requested per task, using the directive --cpus-per-task=<count>.

As an example, the following line requests 4 CPUs per task:

# request four cpus per task
#SBATCH --cpus-per-task=4

GPUs

In Slurm, GPUs are assigned as “Generic Resources” (GRES). For detailed info on GRES in Slurm, see Slurm’s documentation. From a user perspective, it’s still fairly simple to request a GPU. To request GPUs, you can use the directive --gres=gpu:<count>. If requesting multiple nodes, this requests “<count>” GPUs per node.

As described above in the section “Partitions,” only some nodes include GPUs, and each GPU partition includes different types of GPUs. Make sure you’re choosing one that includes the GPUs you want!

As an example, the following line requests one GPU per node:

# request one GPU
#SBATCH --gres=gpu:1

Memory

The requested RAM accessible in a job can be specified in a number of different ways in Slurm. By default, it’s specified as an amount of memory per CPU, as described above in the section “Partitions.”

If you need to specify a certain amount of memory per node, you can use the directive --mem=<size[K|M|G|T]>. Other options for requesting memory can be found in Slurm’s documentation.

As an example, the following line requests 40 gigabytes of memory per node:

# request 40 gigabytes of memory
#SBATCH --mem=40G

Walltime

Walltime is the maximum amount of time your job will run. The time limit for each partition is listed above in the section “Partitions.”

Walltime can be explicitly requested with #SBATCH --time=<dd-hh:mm:ss>, where “dd” refers to day(s), “hh” to hour(s), “mm” to minute(s) and “ss” to second(s). Acceptable formats are: mm, mm:ss, hh:mm:ss, dd-hh, dd-hh:mm, dd-hh:mm:ss.

For example:

# request 1 hour of walltime
#SBATCH --time=01:00:00
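
Walltimes of a day or more use the day-hour format. For example:

# request 2 days and 12 hours of walltime
#SBATCH --time=2-12:00:00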

Other Slurm Options

Mail Options

A job in Slurm can be set up to notify you by email of changes in its status.

Set Email Address

When setting up an email notification, you can specify the address to send to using the flag --mail-user. By default, the email address used by the VACC is your UVM email in the form: netid@uvm.edu.

Avoid Email Blocking

If you set a job to send notifications to an email address other than your UVM email and you are running many jobs, your email provider may treat the repeated notifications from Slurm as spam and block them.

For example, where the email address you want to assign is “usr1234@otheraddress.com”:

#SBATCH --mail-user=usr1234@otheraddress.com

Set Mail Type

You must set what types of emails you would like to receive. The options include: NONE (no emails), BEGIN (when your job begins), END (when your job ends), FAIL (if your job fails), or ALL.

If interested in email notifications, we recommend you set this to ALL so that all the information about the job is captured. For example:

#SBATCH --mail-type=ALL

Job Name

Specifying a job name is not required. If you don’t supply a job name, the job ID (generated by Slurm) is used as a name.

Job name is used as part of the name of the job log files. It also appears in lists of queued and running jobs.

To specify a job name, use the --job-name flag. For example, where your job name is “myjob”:

#SBATCH --job-name=myjob

Job Log Files

By default, Slurm combines the standard output stream (STDOUT) and the standard error stream (STDERR) into one file and it is named “slurm-<job_id>.out”. For example, for a job where the job id is “123456,” the job log file would be named slurm-123456.out.

Log File Name

The job log file name can be a static name, or it can be built using placeholders, referred to in Slurm’s documentation as “filename patterns.” Two of the more useful ones are %x and %j: %x is substituted with the job’s name, and %j with the job’s ID. For example, where your job name is “myjob” and the job ID assigned to your job is “123456,” the following directive would set the file name to “myjob_123456.out”:

#SBATCH --output=%x_%j.out

Sample Script

Here is a sample Slurm batch script, with comments.

#!/bin/sh
# Run on the main bluemoon partition
#SBATCH --partition=bluemoon
# Request a single node
#SBATCH --nodes=1
# Request 1 task
#SBATCH --ntasks=1
# Request 14 GB of memory per node
#SBATCH --mem=14G
# Run for five minutes
#SBATCH --time=5:00
# Name the job
#SBATCH --job-name=SbatchJob
# Name the output file
#SBATCH --output=%x_%j.out
# Set the email address
#SBATCH --mail-user=usr1234@uvm.edu
# Request email to be sent at both begin and end, and if job fails
#SBATCH --mail-type=ALL 
# change to the directory where you submitted this script
cd ${SLURM_SUBMIT_DIR}
# Executable section: echoing some Slurm data
echo "Starting sbatch script myscript.sh at:`date`"
echo "Running host:    ${SLURMD_NODENAME}"
echo "Assigned nodes:  ${SLURM_JOB_NODELIST}"
echo "Job ID:          ${SLURM_JOBID}"

What’s Next?

See Monitoring and Managing a Job for checking the status of a job in the queue, monitoring the execution of a running job, or cancelling a job.

Updated on April 4, 2024
