| Term | Description |
|---|---|
| partition | Collection of computers (nodes). Usually grouped by similar architectural properties (cpus/gpus). |
| account | Collection of users. Used for permitting access to parts of the system. |
| node | A computer in the cluster. Physical hardware at the data center. |
| job | Collection of steps, often just a configuration step and an execution step that runs on the cluster. |
| step | A subdivision of a job: a set of tasks that are executed together, in parallel or in series. |
| task | The smallest unit of execution in Slurm. Tasks are typically associated with a specific number of CPU cores. |
| cpu | A physical processor capable of executing a task. Some tools refer to this as a core or a thread; read the documentation of your tool to determine the specifics. |
| gpu | A powerful processor optimized for floating-point operations, typically useful for machine learning and other specialized pipelines/algorithms. Be sure to read the documentation of the tools you are using before requesting one of these limited resources. |
| mem | Memory (RAM). Used to allocate working resources for your task. |
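To see how these terms fit together, here is a minimal sketch of a batch script: the script as a whole is a job, each `srun` line inside it launches a step, and the `--ntasks`, `--cpus-per-task`, and `--mem` options size the work. The partition and account names are placeholders; substitute the ones you were assigned.

```shell
#!/bin/bash
#SBATCH --job-name=demo        # job: the whole script is one job
#SBATCH --partition=general    # partition: a group of similar nodes (placeholder name)
#SBATCH --account=my_lab       # account: grants access to compute resources (placeholder name)
#SBATCH --ntasks=2             # tasks: the smallest units of execution
#SBATCH --cpus-per-task=4      # cpus: cores allocated to each task
#SBATCH --mem=8G               # mem: RAM allocated to the job

srun hostname                  # step 1: runs on the allocated node(s)
srun ./my_analysis             # step 2: another step within the same job (hypothetical program)
```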
# 10 The Slurm Scheduler
## 10.1 Access
See Chapter 1 for instructions on getting access to the HPC. You will also want to request an association in order to get access to storage space and compute resources.
It is important to understand that the node you connect to when you first log in to the cluster is just a login node.
When running compute-intensive tasks on the Sasquatch HPC, it is crucial to use the worker nodes instead of the login node. Always launch your jobs using `srun` for interactive sessions or `sbatch` for batch scripts.

The login node is a shared resource that all users rely on to manage files, compile code, and submit jobs. If you run resource-intensive work on it, you can slow down or freeze the system, preventing others from performing essential tasks. In such cases, an administrator may terminate your processes to restore normal operations.
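As a sketch of the two submission styles, the commands below request a small interactive allocation and submit a batch script. The partition name and script name are placeholders; run `sinfo` to see the partitions actually available on your cluster.

```shell
# Interactive session: request 1 task with 4 CPUs and 8 GB of memory
# for one hour on a hypothetical partition named "general", then open a shell.
srun --partition=general --ntasks=1 --cpus-per-task=4 --mem=8G \
     --time=01:00:00 --pty bash

# Batch submission: hand a script (placeholder name) to the scheduler
# and return immediately; output is written to a slurm-<jobid>.out file.
sbatch my_job.sh
```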
## 10.2 Using Slurm on Sasquatch
- Submitting jobs (Chapter 11)
- Monitoring jobs (Chapter 12)
- Logging (Chapter 13)
- Scaling jobs (Chapter 14)
## 10.3 Common Slurm Terms
## 10.4 HPC Architecture
The overall architecture diagram of Sasquatch can be found in Section 5.2.
Information on the different clusters for Posit can be found in Chapter 22.
## 10.5 Learning resources: SLURM
- Workload Manager Rosetta Stone - a useful resource for users new to SLURM but familiar with other job schedulers (e.g., PBS).
- SLURM cheatsheet - a quick guide for reference.
- SLURM commands and options - man pages for all Slurm commands, including `srun`, `sbatch`, `sacct`, etc.
- Additional SLURM documentation - more complete documentation on features and applications (from SchedMD).