This command allows manipulation of jobs and other features of SLURM.
Common usages are:
Command Line | Meaning |
---|---|
scontrol checkpoint CKPT_OP ID | Checkpoint a job, where ID is the job ID and CKPT_OP is an operation such as create, restart, or requeue. |
scontrol hold job_list | Hold a series of jobs. |
scontrol release job_list | Reverse of hold. |
scontrol requeue job_list | Requeue jobs. Useful if you know the application has a bug and you have fixed it. |
scontrol requeuehold job_list | Requeue jobs and leave them held. Useful if you know the application has a bug but have not fixed it yet. |
scontrol suspend job_list | Stop CPU time being used for a set of jobs without killing the jobs. Note it still uses time from your project budget. |
scontrol resume job_list | Reverse of suspend. |
scontrol show ENTITY=ID | Display detailed information about an entity (for example a job, node, or partition) with the given ID. |
scontrol top job_id | Move job_id to the top of your own queued jobs, i.e. reorder your jobs. |
scontrol show job job_id | Show details about a job. |
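
The hold and release commands in the table take a comma-separated job_list. As a minimal sketch, and not part of SLURM itself, the hypothetical helpers below (join_jobs, hold_jobs, release_jobs) build that list from separate job IDs. The SCONTROL variable is an assumption added here so that the commands can be previewed with SCONTROL=echo before running them for real.

```shell
#!/bin/sh
# Assumption: SCONTROL may be overridden (e.g. SCONTROL=echo) for a dry run.
SCONTROL="${SCONTROL:-scontrol}"

# Join separate job IDs into the comma-separated job_list form,
# e.g. "22401 22402" becomes "22401,22402".
# The body runs in a subshell so the IFS change does not leak out.
join_jobs() (
    IFS=,
    printf '%s\n' "$*"
)

# Hold every job given as an argument (hypothetical helper).
hold_jobs() {
    "$SCONTROL" hold "$(join_jobs "$@")"
}

# Release the same jobs again (hypothetical helper).
release_jobs() {
    "$SCONTROL" release "$(join_jobs "$@")"
}
```

For example, with SCONTROL=echo set, hold_jobs 22401 22402 only prints the command arguments it would pass on, which is a cheap way to check the job list before touching the queue.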
To show more details on a particular job use scontrol show job <JOBID>:
scontrol show job 22401
JobId=22401 Name=tuhs
UserId=tuhs(7890) GroupId=lboro_tu(5678)
Priority=1 Account=tuhs98 QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=05:54:34 TimeLimit=20:00:00 TimeMin=N/A
SubmitTime=2014-07-24T10:21:37 EligibleTime=2014-07-24T10:21:37
StartTime=2014-07-24T10:21:37 EndTime=2014-07-25T06:21:37
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=all AllocNode_Sid=athena10:3606
ReqNodeList=(null) ExcNodeList=(null)
NodeList=node[56-61]
BatchHost=node56
NumNodes=6 NumCPUs=96 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=16 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=0 Contiguous=0 Licenses=(null) Network=(null)
Command=/gpfs/home/loughborough/tu/tuhs/work/m77r80/submit.sh
WorkDir=/gpfs/home/loughborough/tu/tuhs/work/m77r80
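
The output above is a series of KEY=VALUE pairs, so a single field can be picked out in a script. The following is a rough sketch only: job_field is a hypothetical helper name, and it assumes that values contain no spaces, which holds for the fields shown here but would not hold for a Command line that includes arguments.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one KEY=VALUE field from
# `scontrol show job` output supplied on standard input.
# Assumption: the value itself contains no spaces or tabs.
job_field() {
    # Put each KEY=VALUE pair on its own line, then print the first match.
    tr -s ' \t' '\n\n' |
        awk -F= -v key="$1" '
            $1 == key && !seen {
                # Keep everything after "KEY=", including any later "=".
                print substr($0, length(key) + 2)
                seen = 1
            }'
}
```

A typical use would be scontrol show job 22401 | job_field JobState, which for the sample output above would print RUNNING.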