Monitor / Manage a Job

There are several commands available that allow you to check the status of your job, monitor execution of a running job, and collect performance statistics for your job. You can also delete, alter or hold a job.

Monitor a Job

squeue

(PBS equivalent: qstat / showq)

Use the squeue command to check the status of a job.

If you use squeue without a flag, it will list all jobs in the system:

squeue

To list all jobs belonging to a particular user, use the -u flag followed by the user name. For example, where the user name is “usr1234”:

squeue -u usr1234
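If you check your own jobs often, a small shell function can combine the -u filter with squeue's state filter (-t) and a custom column layout (-o). The function name my_jobs is hypothetical and the column widths are only one reasonable choice. This is a sketch assuming a bash shell, where it could be added to ~/.bashrc.

```shell
# Hypothetical helper: list one user's jobs, optionally filtered by state.
# Usage: my_jobs [user] [state]   e.g. my_jobs usr1234 PENDING
# Format codes: %i job ID, %P partition, %j job name, %T state,
#               %M time used, %D node count, %R reason (pending) or node list (running)
my_jobs() {
  local user="${1:-$USER}"   # default to the account running the command
  local state="${2:-all}"    # PENDING, RUNNING, ... or "all"
  squeue -u "$user" -t "$state" -o "%.10i %.9P %.20j %.8T %.10M %.6D %R"
}
```

For example, my_jobs usr1234 PENDING shows only that user's queued jobs, and the last column explains why each one is still waiting.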

scontrol show job

(PBS equivalent: checkjob)

To list the status of a particular job, you can use scontrol show job followed by the job ID. For example, where the job ID is “123456”:

scontrol show job 123456
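The output of scontrol show job is a long list of Key=Value pairs. When you only need one field, such as JobState or Reason, a sketch like the following can pull it out. It uses scontrol's -o (one-liner) option. The function name job_field is hypothetical, and the sketch assumes the field's value contains no spaces, which holds for fields like JobState, Reason, and TimeLimit but not necessarily for file paths.

```shell
# Hypothetical helper: print a single field from `scontrol show job`.
# Usage: job_field <job_id> <FieldName>   e.g. job_field 123456 JobState
job_field() {
  local jobid="$1" field="$2"
  # -o puts the whole job record on one line; split it into one Key=Value per line,
  # then print the value of the first key that matches exactly
  scontrol -o show job "$jobid" \
    | tr ' ' '\n' \
    | awk -F= -v key="$field" '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}
```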

squeue --start -j

(PBS equivalent: showstart)

This command gives an estimate for the start time of a job. Unfortunately, these estimates are not accurate except for the highest priority job in the queue. The estimate may be off by a large amount in either direction.

To see the estimated start time for a particular job, for example where the job ID is “123456”:

squeue --start -j 123456
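The --start flag can also be combined with -u to list estimates for all of one user's pending jobs at once. The sketch below assumes that, and adds a custom -o layout in which %S is the expected start time and %R is the reason the job is still pending. The function name start_estimates is hypothetical. The same caveat applies: treat every estimate except the one for the highest-priority job as rough.

```shell
# Hypothetical helper: estimated start times for all of one user's pending jobs.
# Usage: start_estimates [user]
# Format codes: %i job ID, %j job name, %S expected start time, %R pending reason
start_estimates() {
  local user="${1:-$USER}"
  squeue --start -u "$user" -o "%.10i %.20j %.20S %R"
}
```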

Manage a Job

You can cancel, alter, or hold a job. Below are some examples for a user with user name “usr1234” and where the job ID is “123456.”

scancel

(PBS equivalent: qdel)

To delete a particular job, use the command scancel followed by the job ID. This command applies to both queued and running jobs. For example, where the job ID is “123456”:

scancel 123456

To delete all jobs belonging to a particular user, use the command scancel -u followed by the user name. For example, where the user name is “usr1234”:

scancel -u usr1234
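scancel -u cancels everything the user owns, queued and running alike. If you only want to clear jobs that have not started yet, or jobs with a particular name, scancel's -t (state) and -n (job name) options narrow the selection. The function names below are hypothetical; this is a sketch assuming a bash shell.

```shell
# Hypothetical helpers built on scancel's filtering options.

# Cancel only this user's jobs that are still waiting in the queue.
# Usage: cancel_pending [user]
cancel_pending() {
  local user="${1:-$USER}"
  scancel -u "$user" -t PENDING   # running jobs are left alone
}

# Cancel this user's jobs that share a job name (as set with --job-name).
# Usage: cancel_by_name <job_name> [user]
cancel_by_name() {
  local name="$1" user="${2:-$USER}"
  scancel -u "$user" -n "$name"
}
```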

If you are unable to delete one of your jobs, it may be because of a hardware problem or system software crash. In this case, you should contact us at vacc@uvm.edu.

scontrol update

(PBS equivalent: qalter)

You can alter certain attributes of your job while it is still pending in the queue.

When Not to Use

  • You cannot make any alterations to the executable portion of the script.
  • You cannot use scontrol update once the job starts running.


The syntax is: scontrol update jobid=<job_id> <desired modifications>. The desired modifications argument consists of one or more Slurm directives written as name=value pairs, i.e., like the corresponding command-line options but without the two leading dashes (--).

For example, to change the number of nodes to 2 and the number of tasks (ntasks) to 8 where the job ID is “123456”:

scontrol update jobid=123456 nodes=2 ntasks=8
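Another common change is the time limit, which scontrol update accepts as the TimeLimit field. In Slurm, regular users can generally only lower a job's time limit; raising it requires an administrator. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: change the time limit of a pending job.
# Usage: set_time_limit <job_id> <limit>   e.g. set_time_limit 123456 2:00:00
# Limits use Slurm's time format, e.g. minutes, hours:minutes:seconds,
# or days-hours:minutes:seconds.
set_time_limit() {
  local jobid="$1" limit="$2"
  scontrol update jobid="$jobid" TimeLimit="$limit"
}
```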

scontrol hold / scontrol release

(PBS equivalent: qhold / qrls)

If you want to prevent a job from running but want to leave it in the queue, you can place a hold on it using the scontrol hold command. The job will remain blocked until you release it with the scontrol release command. A hold can be useful if, for example, you need to modify the input file for a job but don’t want to lose your place in the queue.

For example:

# place hold on job where job ID is “123456”
scontrol hold 123456

# release hold on job where job ID is “123456”
scontrol release 123456
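To hold every one of your pending jobs at once, you can feed job IDs from squeue into scontrol hold. In the sketch below, -h suppresses the squeue header and -o "%i" prints only the job IDs; the function name is hypothetical. The jobs can be released afterwards with scontrol release in the same way.

```shell
# Hypothetical helper: place a hold on every pending job belonging to a user.
# Usage: hold_all_pending [user]
hold_all_pending() {
  local user="${1:-$USER}" id
  # -h drops the header line; -o "%i" prints just the job ID
  for id in $(squeue -u "$user" -t PENDING -h -o "%i"); do
    scontrol hold "$id"
  done
}
```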

Check Node Status

sinfo

Use the command sinfo to check the status of all nodes. To check a specific partition, use the -p flag and the name of the partition, for example, sinfo -p bluemoon. Partition names are:

For Bluemoon:

  • bluemoon
  • bigmem
  • bigmemwk
  • ib
  • short
  • week

For BlackDiamond:

  • bdgpu

For DeepGreen:

  • dggpu

For more information on sinfo, see the SchedMD sinfo page.
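For a compact view of a single partition, sinfo's -o option selects the columns. In the sketch below, %P is the partition, %a its availability, %l its time limit, %D the node count, %t the compact node state, and %N the node list. The function name is hypothetical.

```shell
# Hypothetical helper: compact node-state summary for one partition.
# Usage: partition_status <partition>   e.g. partition_status bluemoon
partition_status() {
  local part="$1"
  sinfo -p "$part" -o "%.10P %.5a %.10l %.6D %.6t %N"
}
```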

Updated on May 30, 2022
