Several commands let you check the status of your job, monitor a running job, and collect performance statistics for your job. You can also cancel, alter, or hold a job.
NAVIGATE:
- Monitor a Job
- squeue (status)
- scontrol show job (status)
- squeue --start -j (start time)
- Manage a Job
- scancel (cancel)
- scontrol update (alter)
- scontrol hold / scontrol release (hold / release)
- Check Node Status
Monitor a Job
squeue
(PBS equivalent: qstat / showq)
Use the squeue command to check the status of a job.
If you use squeue without a flag, it will list all jobs in the system:
squeue
To list all jobs belonging to a particular user, use the -u flag followed by the user name. For example, where the user name is “usr1234”:
squeue -u usr1234
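The per-user listing can also be narrowed by job state. The following is a minimal sketch, assuming the standard squeue options -u (user) and -t (states); the function name my_jobs is hypothetical and not part of Slurm.

```shell
# Hypothetical helper (not part of Slurm): list one user's jobs by state.
# Usage: my_jobs [user] [states]
my_jobs() {
    # $1: user name, defaulting to the current user
    # $2: comma-separated Slurm job states, defaulting to PENDING,RUNNING
    squeue -u "${1:-$USER}" -t "${2:-PENDING,RUNNING}"
}
```

For example, my_jobs usr1234 PENDING would show only the jobs of “usr1234” that are still waiting in the queue.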
scontrol show job
(PBS equivalent: checkjob)
To list the status of a particular job, you can use scontrol show job followed by the job ID. For example, where the job ID is “123456”:
scontrol show job 123456
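The output of scontrol show job is a long list of Key=Value fields. If only the job state is of interest, it can be filtered out of that list. This is a sketch that assumes the output contains a field of the form JobState=RUNNING; the function name job_state is hypothetical.

```shell
# Hypothetical helper (not part of Slurm): print only the state of a job,
# e.g. PENDING or RUNNING.
# Usage: job_state <job_id>
job_state() {
    # Keep only the value that follows "JobState=" in the scontrol output.
    scontrol show job "$1" | sed -n 's/.*JobState=\([A-Z_]*\).*/\1/p'
}
```

For example, job_state 123456 prints a single word describing the current state of job “123456”.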
squeue --start -j
(PBS equivalent: qstat / showq)
This command gives an estimate for the start time of a job. Unfortunately, these estimates are not accurate except for the highest priority job in the queue. The estimate may be off by a large amount in either direction.
To see the potential start time for a particular job where the job ID is “123456”:
squeue --start -j 123456
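Because the estimate can be far off, it may be more practical to poll the job's actual state and wait until it is no longer pending. The following is a minimal sketch, assuming the standard squeue options -h (no header), -j (job ID), and -o %T (print the state); the function name wait_for_start and the 60-second default interval are assumptions.

```shell
# Hypothetical helper (not part of Slurm): block until a job has left
# the PENDING state.
# Usage: wait_for_start <job_id> [poll_seconds]
wait_for_start() {
    # -h suppresses the header; -o %T prints the job state in long form.
    while [ "$(squeue -h -j "$1" -o %T 2>/dev/null)" = "PENDING" ]; do
        sleep "${2:-60}"   # wait before asking the scheduler again
    done
}
```

Note that the function also returns if the job was cancelled or has already finished, since in those cases the state is no longer PENDING. Keep the polling interval generous to avoid loading the scheduler.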
Manage a Job
You can cancel, alter, or hold a job. Below are some examples for a user with user name “usr1234” and where the job ID is “123456.”
scancel
(PBS equivalent: qdel)
To delete a particular job, use the command scancel followed by the job ID. This command applies to both queued and running jobs. For example, where the job ID is “123456”:
scancel 123456
To delete all jobs belonging to a particular user, use the command scancel -u followed by the user name. For example, where the user name is “usr1234”:
scancel -u usr1234
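Cancelling every job of a user also stops jobs that are already running. To cancel only the jobs that are still waiting, scancel accepts a state filter. This is a sketch assuming the standard scancel option --state; the function name cancel_pending is hypothetical.

```shell
# Hypothetical helper (not part of Slurm): cancel only the pending jobs
# of a user, leaving running jobs untouched.
# Usage: cancel_pending [user]
cancel_pending() {
    scancel -u "${1:-$USER}" --state=PENDING
}
```

For example, cancel_pending usr1234 removes the queued jobs of “usr1234” without interrupting work that has already started.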
If you are unable to delete one of your jobs, it may be because of a hardware problem or system software crash. In this case, you should contact us at vacc@uvm.edu.
scontrol update
(PBS equivalent: qalter)
You can alter certain attributes of your job while it’s in the queue.
The syntax is:
scontrol update jobid=<job_id> <desired modifications>
The desired modifications argument consists of one or more Slurm directives in the form of command-line options, i.e., without the two dashes (--) preceding the option name.
For example, to change the number of nodes to 2 and the number of tasks to 8 where the job ID is “123456”:
scontrol update jobid=123456 nodes=2 ntasks=8
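Another attribute that is commonly changed is the wall-time limit. The sketch below assumes the TimeLimit field name listed in the scontrol man page; the function name set_time_limit is hypothetical. On most Slurm installations, regular users can lower a job's time limit but only an administrator can raise it.

```shell
# Hypothetical helper (not part of Slurm): change the time limit of a job.
# Usage: set_time_limit <job_id> <time>
# <time> uses Slurm time syntax, e.g. 02:00:00 for two hours.
set_time_limit() {
    scontrol update jobid="$1" TimeLimit="$2"
}
```

For example, set_time_limit 123456 02:00:00 requests a two-hour limit for job “123456”. The change can be confirmed afterwards with scontrol show job 123456.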
scontrol hold / scontrol release
(PBS equivalent: qhold / qrls)
If you want to prevent a job from running but want to leave it in the queue, you can place a hold on it using the scontrol hold command. The job will remain blocked until you release it with the scontrol release command. A hold can be useful if you need to modify the input file for a job, for example, but you don’t want to lose your place in the queue.
For example:
# place hold on job where job ID is “123456”
scontrol hold 123456
# release hold on job where job ID is “123456”
scontrol release 123456
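The hold, modify, release sequence described above can be combined into one step. This is a sketch only; the function name with_job_held is hypothetical, and the command run while the job is held (an editor in the usage example) is an assumption.

```shell
# Hypothetical helper (not part of Slurm): hold a job, run a command
# while it is held, then release the job again.
# Usage: with_job_held <job_id> <command> [args...]
with_job_held() {
    held_job_id="$1"
    shift
    # Stop here if the hold could not be placed.
    scontrol hold "$held_job_id" || return 1
    # Run the given command, e.g. an editor that changes the input file.
    "$@"
    scontrol release "$held_job_id"
}
```

For example, with_job_held 123456 nano input.dat would hold job “123456”, open a hypothetical input file for editing, and release the job once the editor exits.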
Check Node Status
sinfo
Use the command sinfo to check the status of all nodes. To check a specific partition, use the -p flag and the name of the partition, for example, sinfo -p bluemoon. Partition names are:
For Bluemoon:
- bluemoon
- bigmem
- bigmemwk
- ib
- short
- week
For BlackDiamond:
- bdgpu
For DeepGreen:
- dggpu
For more information on sinfo, see the SchedMD sinfo page.
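To see which nodes in a partition are currently free, the sinfo output can be restricted by node state. The following is a minimal sketch, assuming the standard sinfo options -p (partition), -t (states), -N (one line per node), -h (no header), and -o %N (print node names); the function name idle_nodes is hypothetical.

```shell
# Hypothetical helper (not part of Slurm): list the idle nodes of a partition.
# Usage: idle_nodes <partition>
idle_nodes() {
    # -N prints one line per node; -h drops the header; %N is the node name.
    sinfo -p "$1" -t idle -N -h -o %N
}
```

For example, idle_nodes bluemoon prints the names of the bluemoon nodes that have no jobs running on them.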