Introduction to SLURM and Job Submission¶
Now that you are connected to the cluster, it's time to learn about SLURM (Simple Linux Utility for Resource Management), which is the workload manager used on the cluster. SLURM handles job scheduling, resource allocation, and job monitoring.
Login vs. Compute Nodes¶
In general, cluster users are expected to submit most computations to the job scheduler to be run on the dedicated compute nodes. The login nodes are meant for tasks like editing source and command files and running short test programs that do not use much memory or time and need only one or two CPUs. As a rough guideline, a test should run for less than five minutes, use less than 5 GB of memory, and use no more than two CPUs.
Submitting a Job with sbatch¶
You can submit a job by writing a job script. It's a simple text file that contains both the resource requirements and the commands you want to execute.
Let's create our first job script (you can use the editor of your choice, e.g., emacs, joe, nano, vim, etc.):
$ nano test_job.sh
We need a shebang line at the beginning of the script to specify that the file is a shell script.
#!/bin/sh
Slurm lets you specify options directly in a batch script through lines called Slurm “directives.” These directives provide job setup information used by Slurm, including resource requests, email options, and more. This information is then followed by the commands to be executed to do the computational work of your job.
Slurm directives must precede the executable section in your script.
# Run on the general partition
#SBATCH --partition=general
# Request one node
#SBATCH --nodes=1
# Request one task
#SBATCH --ntasks=1
# Request 4GB of RAM
#SBATCH --mem=4G
# Run for a maximum of 5 minutes
#SBATCH --time=5:00
# Name of the job
#SBATCH --job-name=testjob
# Name the output file (%x is the job name, %j is the job ID)
#SBATCH --output=%x_%j.out
# Set email address for notifications
#SBATCH --mail-user=netid@uvm.edu
# Request email to be sent at both begin and end, and if job fails
#SBATCH --mail-type=ALL
Below the job script’s directives is the section of code that Slurm will execute. This section is equivalent to running a Bash script in the command line – it’ll go through and sequentially run each command that you include. When there are no more commands to run, the job will stop.
For example, these commands change to jshmoe's home directory and execute a Python program.
# go to jshmoe's home directory
cd /gpfs1/home/j/s/jshmoe
# in that directory, run test.py
python test.py
When you are done editing your file, save and exit.
To submit the job, we use the sbatch command.
$ sbatch test_job.sh
Submitted batch job 123456
Your job will be submitted and will run once the requested resources are available.
Jobs that request fewer resources generally start sooner: although jobs submitted before yours are further ahead in the queue, the Slurm scheduler backfills smaller jobs into the gaps between larger ones, as long as doing so does not delay them. Being conservative in your resource requests therefore helps your jobs start sooner.
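While you wait, you can check the state of your job with squeue, which lists queued and running jobs along with their job ID, partition, state, and elapsed time. For example, to show only your own jobs:
$ squeue -u $USER
If you need to cancel a job, pass its job ID to scancel, for example scancel 123456 for the job submitted above.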
Running an Interactive Job with srun¶
In addition to batch jobs, you can run interactive jobs on the cluster using SLURM. An interactive job gives you direct access to a compute node, allowing you to run commands interactively as if you were logged into that node. This is useful for tasks like debugging and testing code.
To start an interactive session, use the srun command. Here's an example:
$ srun --partition=general --nodes=1 --ntasks=1 --mem=4G --time=30:00 --pty bash
In this command:
--partition=general : Specifies the partition to run the interactive session on.
--nodes=1 : Requests one compute node.
--ntasks=1 : Requests one task.
--mem=4G : Allocates 4GB of RAM.
--time=30:00 : Sets a maximum runtime of 30 minutes.
--pty bash : Starts a Bash shell interactively on the allocated compute node.
Once this command is executed, you will be dropped into a Bash shell running on a compute node. From here, you can run commands, load modules, or execute scripts as needed.
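For example, once your prompt appears on the compute node, you might verify where you are and run a short test. The module name below is illustrative; use module avail to see what is actually installed on the cluster.
$ hostname
$ module load python3
$ python test.py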
To end the interactive session, simply type:
$ exit
Interactive jobs are ideal for real-time experimentation and testing, complementing the batch job process.
Job Constraints¶
When a job has specific hardware requirements, you can use constraints to select the appropriate nodes. For example, to limit your job to a node with an Intel processor, use --constraint=intel. Here is a table of common constraints:
| Constraint | Description |
|---|---|
| intel | Nodes with Intel processors |
| amd | Nodes with AMD processors |
| v100 | Nodes with V100 GPUs |
| a100 | Nodes with A100 GPUs |
| h100 | Nodes with H100 GPUs |
| noib | Nodes without InfiniBand |
| ib | InfiniBand nodes, all types |
| ib1 | InfiniBand nodes, switch 1 |
| ib2 | InfiniBand nodes, switch 2 |
| 10g | 10 Gigabit Ethernet nodes |
| hc | High clock-speed nodes |
| cascadelake | Nodes with Cascade Lake generation processors |
| broadwell | Nodes with Broadwell generation processors |
There are more constraints than are listed here. You can look at them with this command:
$ show_node_constraints
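As a quick sketch, a constraint can be added to your job script as another directive, for example to restrict the test job from earlier to nodes with Intel processors:
# Only run on nodes with Intel processors
#SBATCH --constraint=intel
The same option can also be passed to sbatch on the command line:
$ sbatch --constraint=intel test_job.sh
Multiple constraints can be combined with &, e.g. --constraint="intel&ib" requests Intel nodes attached to InfiniBand.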