SLURM accounting tracks every job that runs on your cluster — who submitted it, what resources it used, how long it ran, and which account to bill. This data powers fairshare scheduling, resource limits, usage reports, and chargeback billing.
This post walks through setting up SLURM accounting from scratch in a production environment, with the database on a dedicated server separate from the controller.
Architecture Overview
In production, you separate the database from the controller for performance and reliability:
Controller Node        Database Node        Compute Nodes
───────────────        ─────────────        ─────────────
slurmctld              slurmdbd             slurmd
                       MariaDB/MySQL        slurmd
                                            slurmd
                                            ...
How it works:
- `slurmctld` (scheduler) sends job data to `slurmdbd`
- `slurmdbd` (database daemon) writes to MariaDB/MySQL
- Compute nodes (`slurmd`) just run jobs; no database access
The controller never talks directly to the database. slurmdbd is the middleman
that handles connection pooling, batches writes, and queues data if the database is temporarily
unavailable.
Prerequisites
Before starting, ensure you have:
- Working SLURM cluster (slurmctld on controller, slurmd on compute nodes)
- Dedicated database server (can be VM or physical)
- Network connectivity between controller and database server
- Consistent SLURM user/group (UID/GID must match across all nodes)
- Munge authentication working across all nodes
Step 1: Install MariaDB on Database Server
On your dedicated database server:
# Install MariaDB
sudo apt update
sudo apt install mariadb-server mariadb-client -y
# Start and enable
sudo systemctl start mariadb
sudo systemctl enable mariadb
# Secure installation
sudo mysql_secure_installation
During secure installation:
- Set root password
- Remove anonymous users — Yes
- Disallow root login remotely — Yes
- Remove test database — Yes
- Reload privilege tables — Yes
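If you're scripting the build rather than answering prompts, most of the hardening above has a SQL equivalent. A minimal sketch, assuming a recent MariaDB on Debian/Ubuntu where root authenticates via the unix socket (setting the root password is left to the interactive tool or your config management):

```shell
# Non-interactive equivalent of the mysql_secure_installation answers above.
# Assumes root connects via unix-socket auth (the MariaDB default on Debian/Ubuntu).
sudo mysql <<'SQL'
-- Remove anonymous users
DROP USER IF EXISTS ''@'localhost';
-- Remove the test database
DROP DATABASE IF EXISTS test;
FLUSH PRIVILEGES;
SQL
```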
Step 2: Create SLURM Database and User
Log into MariaDB and create the database:
sudo mysql -u root -p
-- Create database
CREATE DATABASE slurm_acct_db;
-- Create slurm user with access from controller node
CREATE USER 'slurm'@'controller.example.com' IDENTIFIED BY 'your_secure_password';
-- Grant privileges
GRANT ALL PRIVILEGES ON slurm_acct_db.* TO 'slurm'@'controller.example.com';
-- If slurmdbd runs on the database server itself (alternative setup)
-- CREATE USER 'slurm'@'localhost' IDENTIFIED BY 'your_secure_password';
-- GRANT ALL PRIVILEGES ON slurm_acct_db.* TO 'slurm'@'localhost';
FLUSH PRIVILEGES;
EXIT;
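Once the remote-access configuration from Step 3 is in place, it's worth confirming from the controller that the grant actually works. A quick check, using the placeholder hostname and database from above:

```shell
# Run on the controller node after Step 3 is complete.
# Prints a row containing "1" if the slurm user can reach the
# database over the network with the granted privileges.
mysql -h dbserver.example.com -u slurm -p slurm_acct_db -e 'SELECT 1;'
```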
Step 3: Configure MariaDB for Remote Access
Edit MariaDB configuration to allow connections from the controller:
sudo nano /etc/mysql/mariadb.conf.d/50-server.cnf
Find and modify the bind-address:
# Change from
bind-address = 127.0.0.1
# To (listen on all interfaces)
bind-address = 0.0.0.0
# Or specific IP
bind-address = 192.168.1.10
Add performance settings for SLURM workload:
[mysqld]
bind-address = 0.0.0.0
innodb_buffer_pool_size = 1G
innodb_log_file_size = 64M
innodb_lock_wait_timeout = 900
max_connections = 200
Restart MariaDB:
sudo systemctl restart mariadb
Open firewall if needed:
# UFW
sudo ufw allow from 192.168.1.0/24 to any port 3306
# Or firewalld
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port protocol="tcp" port="3306" accept'
sudo firewall-cmd --reload
Step 4: Install slurmdbd on Database Server
You can run slurmdbd on the database server or the controller. Running it on the
database server keeps database traffic local.
# On database server
sudo apt install slurmdbd -y
Step 5: Configure slurmdbd
Create the slurmdbd configuration file:
sudo nano /etc/slurm/slurmdbd.conf
# slurmdbd.conf - SLURM Database Daemon Configuration
# Daemon settings
DbdHost=dbserver.example.com
DbdPort=6819
SlurmUser=slurm
# Logging
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/run/slurmdbd.pid
DebugLevel=info
# Database connection
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePort=3306
StorageUser=slurm
StoragePass=your_secure_password
StorageLoc=slurm_acct_db
# Archive settings (optional)
#ArchiveEvents=yes
#ArchiveJobs=yes
#ArchiveResvs=yes
#ArchiveSteps=no
#ArchiveSuspend=no
#ArchiveTXN=no
#ArchiveUsage=no
#ArchiveScript=/usr/sbin/slurm.dbd.archive
# Purge old data (optional - keep 12 months)
#PurgeEventAfter=12months
#PurgeJobAfter=12months
#PurgeResvAfter=12months
#PurgeStepAfter=12months
#PurgeSuspendAfter=12months
#PurgeTXNAfter=12months
#PurgeUsageAfter=12months
Set proper permissions:
# slurmdbd.conf must be readable only by SlurmUser (contains password)
sudo chown slurm:slurm /etc/slurm/slurmdbd.conf
sudo chmod 600 /etc/slurm/slurmdbd.conf
# Create log directory
sudo mkdir -p /var/log/slurm
sudo chown slurm:slurm /var/log/slurm
Step 6: Start slurmdbd
Start the daemon and verify it connects to the database:
# Start slurmdbd
sudo systemctl start slurmdbd
sudo systemctl enable slurmdbd
# Check status
sudo systemctl status slurmdbd
# Check logs for errors
sudo tail -f /var/log/slurm/slurmdbd.log
Successful startup looks like:
slurmdbd: debug: slurmdbd version 23.02.4 started
slurmdbd: debug: Listening on 0.0.0.0:6819
slurmdbd: info: Registering cluster(s) with database
Step 7: Configure slurmctld to Use Accounting
On your controller node, edit slurm.conf:
sudo nano /etc/slurm/slurm.conf
Add accounting configuration:
# Accounting settings
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbserver.example.com
AccountingStoragePort=6819
AccountingStorageEnforce=associations,limits,qos,safe
# Job completion logging
JobCompType=jobcomp/none
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
# Process tracking (required for accurate accounting)
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup,task/affinity
AccountingStorageEnforce options:
- associations — Users must have valid account association to submit jobs
- limits — Enforce resource limits set on accounts/users
- qos — Enforce Quality of Service settings
- safe — Only allow jobs that can run within limits
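After the controller restart in Step 9, you can confirm these flags took effect. A quick check against the values configured above:

```shell
# Verify the enforcement flags are active on the controller
scontrol show config | grep AccountingStorageEnforce
# Expected: AccountingStorageEnforce = associations,limits,qos,safe
```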
Step 8: Open Firewall for slurmdbd
On the database server, allow connections from the controller:
# UFW
sudo ufw allow from 192.168.1.0/24 to any port 6819
# Or firewalld
sudo firewall-cmd --permanent --add-port=6819/tcp
sudo firewall-cmd --reload
Step 9: Restart slurmctld
On the controller:
sudo systemctl restart slurmctld
# Check it connected to slurmdbd
sudo tail -f /var/log/slurm/slurmctld.log
Look for:
slurmctld: accounting_storage/slurmdbd: init: AccountingStorageHost=dbserver.example.com:6819
slurmctld: accounting_storage/slurmdbd: init: Database connection established
Step 10: Create Cluster in Database
Register your cluster with the accounting database:
sudo sacctmgr add cluster mycluster
Verify:
sacctmgr show cluster
Cluster ControlHost ControlPort RPC Share GrpJobs GrpTRES GrpSubmit MaxJobs MaxTRES MaxSubmit MaxWall QOS Def QOS
---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- ---------
mycluster controller.ex. 6817 9728 1 normal
Step 11: Create Accounts
Create your account hierarchy:
# Create parent account (organisation)
sudo sacctmgr add account science Description="Science Division" Organization="MyOrg"
# Create department accounts under science
sudo sacctmgr add account physics Description="Physics Department" Organization="MyOrg" Parent=science
sudo sacctmgr add account chemistry Description="Chemistry Department" Organization="MyOrg" Parent=science
sudo sacctmgr add account biology Description="Biology Department" Organization="MyOrg" Parent=science
# Create standalone accounts
sudo sacctmgr add account ai Description="AI Research" Organization="MyOrg"
sudo sacctmgr add account engineering Description="Engineering" Organization="MyOrg"
View account hierarchy:
sacctmgr show account -s
Account Descr Org
---------- -------------------- --------------------
science Science Division MyOrg
physics Physics Department MyOrg
chemistry Chemistry Department MyOrg
biology Biology Department MyOrg
ai AI Research MyOrg
engineering Engineering MyOrg
Step 12: Add Users to Accounts
# Add users to accounts
sudo sacctmgr add user jsmith Account=physics
sudo sacctmgr add user kwilson Account=ai
sudo sacctmgr add user pjones Account=chemistry
# User can belong to multiple accounts
sudo sacctmgr add user jsmith Account=ai
# Set default account for user
sudo sacctmgr modify user jsmith set DefaultAccount=physics
View user associations:
sacctmgr show assoc format=Cluster,Account,User,Partition,Share,MaxJobs,MaxCPUs
Cluster Account User Partition Share MaxJobs MaxCPUs
---------- ---------- ---------- ---------- --------- -------- --------
mycluster physics jsmith 1
mycluster ai jsmith 1
mycluster ai kwilson 1
mycluster chemistry pjones 1
Step 13: Set Resource Limits
Apply limits at account or user level:
# Limit physics account to 500 CPUs max, 50 concurrent jobs
sudo sacctmgr modify account physics set MaxCPUs=500 MaxJobs=50
# Limit specific user
sudo sacctmgr modify user jsmith set MaxCPUs=100 MaxJobs=10
# Limit by partition
sudo sacctmgr modify user where name=jsmith partition=gpu set MaxCPUs=32 MaxJobs=2
View limits:
sacctmgr show assoc format=Cluster,Account,User,Partition,MaxJobs,MaxCPUs,MaxNodes
Cluster Account User Partition MaxJobs MaxCPUs MaxNodes
---------- ---------- ---------- ---------- -------- -------- --------
mycluster physics 50 500
mycluster physics jsmith 10 100
mycluster physics jsmith gpu 2 32
Step 14: Configure Fairshare
Fairshare adjusts job priority based on historical usage. Heavy users get lower priority.
# Set shares (relative weight) for accounts
sudo sacctmgr modify account physics set Fairshare=100
sudo sacctmgr modify account chemistry set Fairshare=100
sudo sacctmgr modify account ai set Fairshare=200 # AI gets double weight
Enable fairshare in slurm.conf on the controller:
# Priority settings
PriorityType=priority/multifactor
PriorityWeightFairshare=10000
PriorityWeightAge=1000
PriorityWeightPartition=1000
PriorityWeightJobSize=500
PriorityDecayHalfLife=7-0
PriorityUsageResetPeriod=MONTHLY
Restart slurmctld after changes:
sudo systemctl restart slurmctld
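With fairshare enabled, `sshare` shows how the configured shares translate into priority. A quick way to inspect it once jobs have accrued some usage:

```shell
# -a shows all users, -l adds the raw and effective usage columns
sshare -a -l
# The FairShare column (0.0-1.0) is the factor multiplied by
# PriorityWeightFairshare when computing job priority.
```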
Step 15: Verify Everything Works
Test job submission with accounting:
# Submit job with account
sbatch --account=physics --job-name=test --wrap="sleep 60"
# Check it's tracked
squeue
sacct -j JOBID
Check database connectivity:
# From controller
sacctmgr show cluster
sacctmgr show account
sacctmgr show assoc
Verify accounting is enforced:
# Try submitting without valid account (should fail if enforce=associations)
sbatch --account=nonexistent --wrap="hostname"
# Expected: error: Unable to allocate resources: Invalid account
Check usage reports:
sreport cluster utilization
sreport user top start=2026-01-01
sreport cluster AccountUtilizationByUser start=2026-01-01
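By default sreport shows usage in minutes; for billing it's often clearer in hours over a fixed window (the dates here are examples):

```shell
# Cluster utilization for January, reported in hours
sreport cluster utilization start=2026-01-01 end=2026-02-01 -t hours
```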
Useful sacctmgr Commands
| Command | Purpose |
|---|---|
| `sacctmgr show cluster` | List registered clusters |
| `sacctmgr show account` | List all accounts |
| `sacctmgr show account -s` | Show account hierarchy |
| `sacctmgr show user` | List all users |
| `sacctmgr show assoc` | Show all associations (user-account mappings) |
| `sacctmgr add account NAME` | Create new account |
| `sacctmgr add user NAME Account=X` | Add user to account |
| `sacctmgr modify account X set MaxCPUs=Y` | Set account limits |
| `sacctmgr modify user X set MaxJobs=Y` | Set user limits |
| `sacctmgr delete user NAME Account=X` | Remove user from account |
| `sacctmgr delete account NAME` | Delete account |
Troubleshooting
slurmdbd won’t start
# Check logs
sudo tail -100 /var/log/slurm/slurmdbd.log
# Common issues:
# - Wrong database credentials in slurmdbd.conf
# - MySQL not running
# - Permissions on slurmdbd.conf (must be 600, owned by slurm)
# - Munge not running
slurmctld can’t connect to slurmdbd
# Test connectivity
telnet dbserver.example.com 6819
# Check firewall
sudo ufw status
sudo firewall-cmd --list-all
# Verify slurmdbd is listening
ss -tlnp | grep 6819
Jobs not being tracked
# Verify accounting is enabled
scontrol show config | grep AccountingStorage
# Should show:
# AccountingStorageType = accounting_storage/slurmdbd
# Check association exists for user
sacctmgr show assoc user=jsmith
Database connection errors
# Test MySQL connection from slurmdbd host
mysql -h localhost -u slurm -p slurm_acct_db
# Check MySQL is accepting connections
sudo systemctl status mariadb
sudo tail -100 /var/log/mysql/error.log
My Thoughts
Setting up SLURM accounting properly from the start saves headaches later. Once it’s running, you get automatic tracking of every job, fair scheduling between groups, and the data you need for billing and capacity planning.
Key points to remember:
- Keep the database separate from the controller in production
- `slurmdbd` is the middleman; the controller never hits the database directly
- Compute nodes don't need database access, they just run jobs
- Set up your account hierarchy before adding users
- Use `AccountingStorageEnforce` to make accounting mandatory
- Fairshare prevents any single group from hogging the cluster
The database is your audit trail. It tracks everything, so when someone asks “why is my job slow” or “how much did we use last month”, you have the answers.
