What is a SLURM Partition?
A partition in SLURM is a logical grouping of compute nodes that share attributes and scheduling policies. Think of partitions as queues: users submit jobs to a partition, and SLURM schedules them according to that partition’s rules.
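For example, users can list the available partitions with sinfo and target one explicitly at submission time (train.sh below is just a placeholder script name):
# Show each partition's state, time limit, and node count
sinfo
# Submit a job to a specific partition; -p is the short form of --partition
sbatch --partition=gpu train.sh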
Partitions allow you to:
- Separate hardware types (GPU nodes, high-memory nodes, standard compute)
- Set different time limits and priorities
- Control access for different user groups
- Apply different preemption and scheduling policies
- Track usage for billing and chargeback
Typical Production Partition Layout
A typical production cluster uses partitions structured by resource type and job priority:
# slurm.conf partition configuration
PartitionName=batch Nodes=node[001-100] Default=YES MaxTime=24:00:00 State=UP
PartitionName=short Nodes=node[001-100] MaxTime=1:00:00 Priority=100 State=UP
PartitionName=long Nodes=node[001-100] MaxTime=7-00:00:00 Priority=10 State=UP
PartitionName=gpu Nodes=gpu[01-16] MaxTime=24:00:00 State=UP
PartitionName=highmem Nodes=mem[01-08] MaxTime=2-00:00:00 State=UP
PartitionName=debug Nodes=node[001-004] MaxTime=00:30:00 Priority=200 State=UP
PartitionName=preempt Nodes=node[001-100] MaxTime=24:00:00 PreemptMode=REQUEUE State=UP
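One way to apply and sanity-check a layout like this, assuming you have administrative access on the controller:
# Ask the Slurm daemons to re-read slurm.conf (some changes still require a slurmctld restart)
scontrol reconfigure
# Inspect the resulting partition definition
scontrol show partition batch
# Summarised view of all partitions and their node states
sinfo -s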
Partition Definitions
batch
The batch partition is the default queue where most standard compute jobs land. It provides a balance between time limits and priority, suitable for the majority of production workloads. If a user submits a job without specifying a partition, it goes here.
short
The short partition is for quick jobs that need fast turnaround. Higher priority ensures these jobs start quickly, but strict time limits (typically 1 hour or less) prevent users from abusing it for long-running work. Ideal for pre-processing, quick analyses, and iterative development.
long
The long partition accommodates multi-day jobs such as climate simulations, molecular dynamics, or large-scale training runs. Lower priority prevents these jobs from blocking shorter work, but they get scheduled during quieter periods or through backfill.
gpu
The gpu partition contains nodes equipped with GPUs (NVIDIA A100s, H100s, etc.). Separating GPU resources ensures expensive accelerators aren’t wasted on CPU-only workloads and allows for GPU-specific scheduling policies and billing.
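Before any of this works, the GPU nodes themselves must expose their devices as generic resources (GRES). A minimal sketch, assuming four A100s per node and omitting the other node attributes:
# slurm.conf: enable the GPU GRES plugin and declare the devices on each node
GresTypes=gpu
NodeName=gpu[01-16] Gres=gpu:a100:4 State=UNKNOWN
# gres.conf on each GPU node: map the GRES entries to device files
Name=gpu Type=a100 File=/dev/nvidia[0-3]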
highmem
The highmem partition groups high-memory nodes (typically 1TB+ RAM) for memory-intensive workloads like genome assembly, large-scale data analysis, or in-memory databases. These nodes are expensive, so isolating them prevents standard jobs from occupying them unnecessarily.
debug
The debug partition provides rapid access for testing and development. Highest priority and very short time limits (15-30 minutes) ensure users can quickly validate their scripts before submitting large production jobs. Usually limited to a small subset of nodes.
preempt
The preempt partition offers opportunistic access to idle resources. Jobs here can be killed and requeued when higher-priority work arrives. Ideal for fault-tolerant workloads that checkpoint regularly. Users get free cycles in exchange for accepting interruption.
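A preempt partition only behaves this way if cluster-wide preemption is configured. A minimal sketch in slurm.conf, assuming priority-tier-based preemption (values are illustrative and overlap with the layout above):
# Preempt jobs based on partition priority tiers
PreemptType=preempt/partition_prio
# Default action for preempted jobs; GraceTime gives them time to exit cleanly
PreemptMode=REQUEUE
# Jobs in higher-tier partitions can preempt jobs in lower-tier ones
PartitionName=batch   Nodes=node[001-100] PriorityTier=2 State=UP
PartitionName=preempt Nodes=node[001-100] PriorityTier=1 PreemptMode=REQUEUE GraceTime=120 State=UP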
Job Script Templates
Below are production-ready job script templates for each partition type. Adjust resource requests to match your specific workload requirements.
Standard Batch Job
Use the batch partition for typical compute workloads with moderate runtime requirements.
#!/bin/bash
#SBATCH --job-name=simulation
#SBATCH --partition=batch
#SBATCH --nodes=2
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --mem=64G
#SBATCH --time=12:00:00
#SBATCH --output=%x_%j.out
module load openmpi/4.1.4
mpirun ./simulate --input data.in
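Submission and monitoring work the same way for every partition; for example (batch_job.sh is a placeholder name for the script above):
# Submit the script; SLURM prints the assigned job ID
sbatch batch_job.sh
# Check the job's state in the queue
squeue -u $USER
# After completion, review accounting information
sacct -j <jobid> --format=JobID,Partition,Elapsed,State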
Debug Job
Use the debug partition to quickly test job scripts before submitting large production runs. Keep it short — this partition is for validation, not real work.
#!/bin/bash
#SBATCH --job-name=test_run
#SBATCH --partition=debug
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:15:00
#SBATCH --output=%x_%j.out
# Quick sanity check before submitting big job
./app --test-mode
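For interactive debugging on the same partition, an srun session is often more convenient than a batch script:
# Request an interactive shell on a debug node for up to 15 minutes
srun --partition=debug --nodes=1 --ntasks=4 --time=00:15:00 --pty bash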
GPU Training Job
Use the gpu partition for machine learning training, rendering, or any GPU-accelerated workload. Request specific GPU counts and ensure CUDA environments are loaded.
#!/bin/bash
#SBATCH --job-name=train_model
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gpus=4
#SBATCH --mem=128G
#SBATCH --time=24:00:00
#SBATCH --output=%x_%j.out
module load cuda/12.2 python/3.11
# SLURM normally exports CUDA_VISIBLE_DEVICES for the GPUs it allocated when
# they are requested with --gpus, so avoid hard-coding device IDs here
python train.py --epochs 100
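A quick way to confirm the allocation before training starts is to print the visible devices, for example by adding these lines ahead of the training command:
# Verify which GPUs SLURM handed to the job
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
nvidia-smi --query-gpu=index,name,memory.total --format=csv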
High Memory Job
Use the highmem partition for memory-intensive workloads that exceed standard node capacity. Common use cases include genome assembly, large graph processing, and in-memory analytics.
#!/bin/bash
#SBATCH --job-name=genome_assembly
#SBATCH --partition=highmem
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=1T
#SBATCH --time=48:00:00
#SBATCH --output=%x_%j.out
module load assembler/2.1
assembler --threads 32 --memory 900G --input reads.fastq
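After the run, check how much memory the job actually used so the next request can be sized accordingly:
# Peak resident memory, requested memory, and runtime for a finished job
sacct -j <jobid> --format=JobID,MaxRSS,ReqMem,Elapsed,State
# If the seff utility is installed on your cluster, it gives a similar efficiency summary
seff <jobid>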
Long Running Job
Use the long partition for multi-day simulations. Always enable email notifications for job completion or failure, and implement checkpointing for fault tolerance.
#!/bin/bash
#SBATCH --job-name=climate_sim
#SBATCH --partition=long
#SBATCH --nodes=8
#SBATCH --ntasks=256
#SBATCH --time=7-00:00:00
#SBATCH --output=%x_%j.out
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@company.com
module load openmpi netcdf
mpirun ./climate_model --checkpoint-interval 6h
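The restart logic itself depends on the application. As a rough sketch (the --restart-from flag is hypothetical), the script can look for the most recent checkpoint and resume from it:
CKPT_DIR=/scratch/$USER/climate_ckpt
mkdir -p "$CKPT_DIR"
# Pick the newest checkpoint file, if any exist yet
LATEST=$(ls -t "$CKPT_DIR"/ckpt_* 2>/dev/null | head -n 1)
if [ -n "$LATEST" ]; then
    mpirun ./climate_model --checkpoint-interval 6h --restart-from "$LATEST"
else
    mpirun ./climate_model --checkpoint-interval 6h
fi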
Preemptible Backfill Job
Use the preempt partition for opportunistic workloads that can tolerate interruption. The --requeue flag ensures the job restarts if preempted. Your application must support checkpointing and resumption.
#!/bin/bash
#SBATCH --job-name=backfill_work
#SBATCH --partition=preempt
#SBATCH --nodes=4
#SBATCH --ntasks=64
#SBATCH --time=24:00:00
#SBATCH --requeue
#SBATCH --output=%x_%j.out
# Must handle being killed and restarted
./app --checkpoint-dir=/scratch/checkpoints --resume
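When a job here is preempted, SLURM signals it with SIGCONT and SIGTERM and then, after the partition's GraceTime (if one is configured), SIGKILL. Trapping SIGTERM gives the application a chance to write a final checkpoint; a minimal sketch of the job body, with the checkpoint behaviour left to the application:
# Run the application in the background so the shell can react to signals
./app --checkpoint-dir=/scratch/checkpoints --resume &
APP_PID=$!
# On SIGTERM (sent at preemption), forward it so the app can checkpoint and exit
trap 'kill -TERM "$APP_PID"; wait "$APP_PID"' TERM
wait "$APP_PID"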
SBATCH Directive Reference
Common SBATCH directives used across job scripts:
| Directive | Purpose | Example |
|---|---|---|
| --job-name | Job identifier in queue and logs | --job-name=my_simulation |
| --partition | Target partition/queue | --partition=gpu |
| --nodes | Number of nodes required | --nodes=4 |
| --ntasks | Total number of tasks (MPI ranks) | --ntasks=64 |
| --cpus-per-task | CPU cores per task (for threading) | --cpus-per-task=8 |
| --mem | Memory per node | --mem=128G |
| --gpus | Number of GPUs required | --gpus=4 |
| --time | Maximum wall time (D-HH:MM:SS) | --time=24:00:00 |
| --output | Standard output file (%x=job name, %j=job ID) | --output=%x_%j.out |
| --mail-type | Email notification triggers | --mail-type=END,FAIL |
| --requeue | Requeue job if preempted or failed | --requeue |
Partition Selection Guide
| Partition | Typical Use Case | Time Limit | Priority |
|---|---|---|---|
| debug | Testing scripts before production runs | 15-30 min | Highest |
| short | Quick jobs, preprocessing, iteration | 1 hour | High |
| batch | Standard compute workloads | 24 hours | Normal |
| gpu | ML training, rendering, GPU compute | 24 hours | Normal |
| highmem | Genomics, large datasets, in-memory work | 48 hours | Normal |
| long | Multi-day simulations | 7 days | Low |
| preempt | Opportunistic, fault-tolerant workloads | 24 hours | Lowest |
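When in doubt, SLURM can estimate where a job would start soonest without actually running it (job.sh is again a placeholder):
# Validate the script and report an estimated start time without submitting
sbatch --test-only --partition=short job.sh
# Expected start times for jobs already pending in the queue
squeue --start -u $USER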
My Thoughts
A well-structured partition layout is the foundation of effective HPC cluster management. By separating resources by type and priority, you ensure:
- Users get appropriate resources for their workloads
- Expensive hardware (GPUs, high-memory nodes) is used efficiently
- Short jobs don’t get stuck behind long-running simulations
- Testing and development get fast turnaround
- Usage can be tracked and billed accurately
Start with the templates above and adjust time limits, priorities, and access controls to match your organisation’s requirements. As your cluster grows, you can add specialised partitions for specific hardware or user groups.
