12  Monitoring Slurm Jobs

Author

Sean Taylor, Marc Carlson, Glenn Morton, Lindsay Clark, Neerja Katiyar

Published

May 7, 2026

Slurm provides several tools for monitoring, analyzing, and controlling your jobs: squeue, sacct, seff, scontrol, and scancel. Each command serves a different purpose.

12.1 Monitoring queued and running jobs

Once you have submitted a job, you can use the squeue command to check its status. This command provides a real-time snapshot of all jobs currently in the Slurm queue.

12.1.1 Example commands

  • To check the status of a specific job that is queued or running:
squeue -j <job_id>
  • You can view all jobs on the system by simply running:
squeue
  • To filter the results and see only your jobs, use the -u flag with your username:
squeue -u <your_username>

The output of squeue shows important information about your jobs, including JOBID, PARTITION, NAME, the job state (shown in the ST column as a short code, e.g., PD for PENDING, R for RUNNING, CG for COMPLETING), and NODELIST(REASON). For a pending job, the reason is particularly useful for debugging, as it can indicate which resource or condition the job is waiting on.
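When many jobs are queued, a per-state tally can be easier to read than the full listing. The following is a minimal sketch, not a site-provided tool: count_states is a hypothetical helper name, and it assumes the input is one state name per line, which is what squeue prints with the -h (no header) and -o "%T" (full state name) options.

```shell
# Hypothetical helper: tally job states read from stdin, one state per line.
# Prints lines of the form STATE=COUNT, sorted by state name.
count_states() {
  sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Intended use on a cluster (requires Slurm, so it is left commented out):
#   squeue -u "$USER" -h -o "%T" | count_states
```

Because the helper only reads text from stdin, it can be tried on saved output before pointing it at a live queue.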

12.2 Analyzing past jobs

sacct is for job accounting and historical data. It queries the Slurm accounting database and provides detailed information about completed, canceled, or failed jobs. Unlike squeue, which shows a live view, sacct gives you a permanent record of a job’s resource usage, start and end times, exit code, and more.

12.2.1 Example commands

  • To check the status of a completed job (e.g., with ID 12345):
sacct -j 12345
  • To see when a job started and ended:
sacct -j 12345 --format=JobID,Start,End
  • To see maximum memory used (MaxRSS) and elapsed time:
sacct -j 12345 --format=JobID,JobName,MaxRSS,Elapsed,State
  • To see details about the individual tasks in a job array:
sacct -j <job_id> --array
  • To see details about all of your jobs since a certain date:
sacct -u <your_username> -S 2024-01-15
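sacct typically reports MaxRSS with a unit suffix (for example 2048K), which makes values awkward to compare against a memory request given in megabytes or gigabytes. The sketch below converts such a value to megabytes. rss_to_mb is a hypothetical helper name, and the handling of values with no suffix (treated as bytes) is an assumption rather than documented sacct behavior.

```shell
# Hypothetical helper: convert a MaxRSS-style value (e.g. 2048K, 512M, 2G)
# to megabytes, printed with one decimal place.
rss_to_mb() {
  LC_ALL=C awk -v v="$1" 'BEGIN {
    n = v + 0                           # leading numeric part of the value
    u = toupper(substr(v, length(v)))   # trailing unit letter, if any
    if (u == "K")      n /= 1024
    else if (u == "G") n *= 1024
    else if (u == "T") n *= 1024 * 1024
    else if (u != "M") n /= 1024 * 1024 # assumption: no suffix means bytes
    printf "%.1f\n", n
  }'
}

# Intended use on a cluster (requires Slurm, so it is left commented out).
# -n drops the header and -P prints pipe-delimited fields:
#   sacct -j 12345 -n -P --format=JobID,MaxRSS |
#     while IFS='|' read -r id rss; do
#       echo "$id $(rss_to_mb "$rss") MB"
#     done
```

Rows without a MaxRSS value (such as the top-level allocation line) come out as 0.0 in this sketch.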

12.2.2 Additional parameters to use with sacct:

https://slurm.schedmd.com/sacct.html

12.2.3 Job states

Flags to learn about job status

Code  Status      Description
PD    PENDING     Job is awaiting resource allocation.
CG    COMPLETING  Job is done executing and has some ongoing processes that are being finalized.
CD    COMPLETED   Job has completed successfully.
R     RUNNING     Job has been allocated resources and is being processed by the compute node(s).
F     FAILED      Job terminated with a non-zero exit code and stopped executing.

Table courtesy of https://hpc.nmsu.edu/discovery/slurm/commands/#_the_squeue_command

12.3 Efficiency reports

seff is a convenient job efficiency report tool. It’s a script that uses data from the Slurm accounting database (similar to sacct) to provide a clean, easy-to-read summary of a completed job’s efficiency. It calculates metrics like CPU and memory efficiency by comparing requested resources to actual usage.

The output of seff may vary from system to system. Here are some key fields that are likely common:

  • Job ID: The unique identifier of the job.

  • State: Indicates whether the job completed successfully, failed, or is still running.

  • Nodes: The number of nodes allocated to the job.

  • Cores per node: The number of cores allocated per node.

  • CPU Utilized: The total CPU time used by the job.

  • CPU Efficiency: The percentage of allocated CPU time that was actually used.

  • Job Wall-clock time: The total time the job ran.

  • Memory Utilized: The amount of memory used by the job, often reported as a peak value.

  • Memory Efficiency: The percentage of allocated memory that was actually used. 
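The CPU efficiency figure can be reproduced by hand: it is the CPU time used divided by the CPU time that was available (wall-clock time multiplied by the number of allocated cores). The sketch below works through that arithmetic. to_seconds and cpu_efficiency are hypothetical helper names, and the [D-]HH:MM:SS time format is assumed to match what your site's seff prints.

```shell
# Hypothetical helper: convert [D-]HH:MM:SS to seconds.
to_seconds() {
  LC_ALL=C awk -v t="$1" 'BEGIN {
    days = 0
    if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }  # optional day prefix
    split(t, p, ":")
    print days * 86400 + p[1] * 3600 + p[2] * 60 + p[3]
  }'
}

# Hypothetical helper: CPU efficiency in percent.
# Arguments: CPU time used, wall-clock time, total allocated cores.
cpu_efficiency() {
  LC_ALL=C awk -v c="$(to_seconds "$1")" -v w="$(to_seconds "$2")" -v n="$3" \
    'BEGIN { printf "%.1f\n", 100 * c / (w * n) }'
}

# A job that used 1.5 CPU-hours during 1 hour on 4 cores:
cpu_efficiency 01:30:00 01:00:00 4   # prints 37.5
```

A value well below 100 suggests the job could not keep all of its requested cores busy.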

See Chapter 25 for more details on how to use this information to fine-tune your jobs.

12.3.1 Example commands

  • To get an efficiency report for a finished job showing the percentage of CPU and memory used, helping you determine if you over-requested or under-requested resources:
seff 12345

12.4 Detailed job information

scontrol is a versatile command for viewing and modifying Slurm state. It can be used by both users and administrators to get highly detailed information about jobs, nodes, and partitions. For a regular user, the command scontrol show job is particularly useful for inspecting the full details of a specific job, including all its requested parameters.

Important Notes:

  • scontrol show job primarily works for pending and running jobs.

  • Records of jobs that completed more than a certain time ago are removed from Slurm’s active memory. The interval depends on the cluster's configuration (the MinJobAge setting) and can be as short as a few minutes. In such cases, use the sacct command to retrieve historical job information from the Slurm accounting database.

12.4.1 Example commands

  • To get detailed output of a job’s parameters (e.g., for an active job), including its requested resources:
scontrol show job <job_id>
  • To show even more detailed information, including the job submission script (if available):
scontrol -dd show job <job_id>
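The output of scontrol show job is a long block of Key=Value pairs, so it is often handy to pull out a single field. The following is a minimal sketch: job_field is a hypothetical helper name, and it assumes the value you want contains no spaces (true for fields such as JobState or TimeLimit, but not necessarily for Command or paths).

```shell
# Hypothetical helper: print the value of one Key=Value field from
# `scontrol show job` output supplied on stdin.
job_field() {
  tr -s ' ' '\n' |
    awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Intended use on a cluster (requires Slurm, so it is left commented out):
#   scontrol show job <job_id> | job_field JobState
#   scontrol show job <job_id> | job_field TimeLimit
```

The helper splits on spaces first and then matches the key exactly, so asking for JobState does not accidentally match a longer key that merely starts with the same text.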

12.5 Canceling a job

  • To terminate a running or pending job, use the scancel command:
scancel <job_id>
  • To cancel one task within a job array:
scancel <job_id>_5
  • To cancel a range of tasks in a job array:
scancel <job_id>_[2-8]
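One practical caution with the bracket syntax: an unquoted argument such as 12345_[2-8] is also a shell glob pattern, so if a file with a matching name happens to exist in the current directory, the shell will substitute that file name before scancel runs. Quoting the argument avoids this. The sketch below builds the specification as a string. array_range is a hypothetical helper name and 12345 is a placeholder job ID.

```shell
# Hypothetical helper: build an array task range such as 12345_[2-8].
# Arguments: job ID, first task index, last task index.
array_range() {
  printf '%s_[%s-%s]\n' "$1" "$2" "$3"
}

# Intended use on a cluster (requires Slurm, so it is left commented out).
# The double quotes keep the shell from treating the brackets as a glob:
#   scancel "$(array_range 12345 2 8)"
```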