Category: HPC
How to configure Slurm Controller Node on Ubuntu 22.04
How to set up an HPC Slurm Controller Node
Refer to Key Components for HPC Cluster Setup to determine which pieces you need to set up.
This guide provides step-by-step instructions for setting up the Slurm controller daemon (`slurmctld`) on Ubuntu 22.04. It also includes common errors encountered during the setup process and how to resolve them.
Step 1: Install Prerequisites
To begin, install the required dependencies for Slurm and its components:
sudo apt update && sudo apt upgrade -y
sudo apt install -y munge libmunge-dev libmunge2 build-essential man-db mariadb-server mariadb-client libmariadb-dev python3 python3-pip chrony
Step 2: Configure Munge (authentication for Slurm)
Munge is required for authentication within the Slurm cluster.
1. Generate a Munge key on the controller node:
sudo create-munge-key
2. Copy the key to all compute nodes:
scp /etc/munge/munge.key user@node:/etc/munge/
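Munge requires the key to be owned by the munge user and readable by it alone; after copying, set ownership and permissions on every node (paths assume the default Ubuntu munge package layout):
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key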
3. Start the Munge service:
sudo systemctl enable --now munge
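To confirm authentication works end to end, create and validate a credential locally, then against a compute node (replace node with one of your hosts):
munge -n | unmunge
munge -n | ssh node unmunge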
Step 3: Install Slurm
1. Download and compile Slurm:
wget https://download.schedmd.com/slurm/slurm-23.02.4.tar.bz2
tar -xvjf slurm-23.02.4.tar.bz2
cd slurm-23.02.4
./configure --prefix=/usr/local/slurm --sysconfdir=/etc/slurm
make -j$(nproc)
sudo make install
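Building from source does not install systemd units. After ./configure, the source tree's etc/ directory contains generated unit files you can copy into place (this assumes the build directory from step 1):
sudo cp etc/slurmctld.service /etc/systemd/system/
sudo systemctl daemon-reload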
2. Add the Slurm user:
sudo useradd -m slurm
3. Create the necessary directories and set permissions (the slurm user must exist before the chown):
sudo mkdir -p /etc/slurm /var/spool/slurm /var/log/slurm
sudo chown slurm: /var/spool/slurm /var/log/slurm
Step 4: Configure Slurm (for more complex configurations, contact Nick Tailor)
1. Generate a basic `slurm.conf` using the configurator tool at
https://slurm.schedmd.com/configurator.html. Save the configuration to `/etc/slurm/slurm.conf`.
# Basic Slurm Configuration
ClusterName=my_cluster
SlurmctldHost=slurmctld # Replace with your control node's hostname (SlurmctldHost replaces the deprecated ControlMachine)
# SlurmctldHost=backup-slurmctld # Uncomment and replace to add a backup controller
# Authentication
AuthType=auth/munge
CredType=cred/munge
# Logging
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmctldDebug=info
SlurmdDebug=info
# Slurm User
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm
SlurmdSpoolDir=/var/spool/slurmd
# Scheduler
SchedulerType=sched/backfill
SchedulerParameters=bf_continue
# Accounting
AccountingStorageType=accounting_storage/none
JobAcctGatherType=jobacct_gather/linux
# Compute Nodes
NodeName=node[1-2] CPUs=4 RealMemory=8192 State=UNKNOWN
PartitionName=debug Nodes=node[1-2] Default=YES MaxTime=INFINITE State=UP
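Tip: to get accurate CPUs and RealMemory values for each NodeName line, run slurmd -C on the compute node itself; it prints a node definition you can paste into slurm.conf:
slurmd -C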
2. Distribute `slurm.conf` to all compute nodes:
scp /etc/slurm/slurm.conf user@node:/etc/slurm/
3. Restart the Slurm services (slurmctld on the controller, slurmd on each compute node):
sudo systemctl restart slurmctld
sudo systemctl restart slurmd
Troubleshooting Common Errors
root@slrmcltd:~# tail /var/log/slurm/slurmctld.log
[2024-12-06T11:57:25.428] error: High latency for 1000 calls to gettimeofday(): 20012 microseconds
[2024-12-06T11:57:25.431] fatal: mkdir(/var/spool/slurm): Permission denied
[2024-12-06T11:58:34.862] error: High latency for 1000 calls to gettimeofday(): 20029 microseconds
[2024-12-06T11:58:34.864] fatal: mkdir(/var/spool/slurm): Permission denied
[2024-12-06T11:59:38.843] error: High latency for 1000 calls to gettimeofday(): 18842 microseconds
[2024-12-06T11:59:38.847] fatal: mkdir(/var/spool/slurm): Permission denied
Error: Permission Denied for /var/spool/slurm
This error occurs when the `slurm` user does not have the correct permissions to access the directory.
Fix:
sudo mkdir -p /var/spool/slurm
sudo chown -R slurm: /var/spool/slurm
sudo chmod -R 755 /var/spool/slurm
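A quick check that the slurm user can now write there (the test file name is arbitrary):
sudo -u slurm touch /var/spool/slurm/.write-test && sudo rm /var/spool/slurm/.write-test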
Error: Temporary Failure in Name Resolution
Slurm could not resolve the hostname `slurmctld`. This can be fixed by updating `/etc/hosts`:
1. Edit `/etc/hosts` and map the controller's real IP to its hostname (do not map it to 127.0.0.1, or compute nodes will resolve the controller to their own loopback):
192.168.20.8 slurmctld
2. Verify the hostname matches `SlurmctldHost` in `/etc/slurm/slurm.conf`.
3. Restart networking and test hostname resolution:
sudo systemctl restart systemd-networkd
ping slurmctld
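You can also check what the resolver actually returns for the hostname:
getent hosts slurmctld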
Error: High Latency for gettimeofday()
Dec 06 11:57:25 slrmcltd.home systemd[1]: Started Slurm controller daemon.
Dec 06 11:57:25 slrmcltd.home slurmctld[2619]: slurmctld: error: High latency for 1000 calls to gettimeofday(): 20012 microseconds
Dec 06 11:57:25 slrmcltd.home systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Dec 06 11:57:25 slrmcltd.home systemd[1]: slurmctld.service: Failed with result 'exit-code'.
This warning typically indicates timing issues in the system.
Fixes:
1. Install and configure `chrony` for time synchronization:
sudo apt install chrony
sudo systemctl enable --now chrony
chronyc tracking
timedatectl
2. For virtualized environments, optimize the clocksource:
# (plain "sudo echo tsc > ..." fails because the redirection runs unprivileged)
echo tsc | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
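If tsc is rejected, list the clocksources the kernel actually offers (availability varies by hypervisor):
cat /sys/devices/system/clocksource/clocksource0/available_clocksource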
3. Treat the latency message as a warning rather than the root cause: when slurmctld exits right after logging it, look for the accompanying fatal error in the log (here, the mkdir permission failure), fix that, and restart:
sudo systemctl restart slurmctld
Step 5: Verify and Test the Setup
1. Validate the configuration:
scontrol reconfigure
If the command returns without errors, the configuration is working. If it fails, check connectivity between the nodes and make sure every host appears in /etc/hosts on all machines in the cluster.
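You can also ask the controller directly whether it is up; scontrol ping reports the status of the primary (and any backup) slurmctld:
scontrol ping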
2. Check node and partition status:
sinfo
root@slrmcltd:/etc/slurm# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug* up infinite 1 idle* node1
3. Monitor logs for errors:
sudo tail -f /var/log/slurm/slurmctld.log
Written By: Nick Tailor
Deploying Lustre File System with RDMA, Node Maps, and ACLs
Lustre is the de facto parallel file system for high-performance computing (HPC) clusters, providing extreme scalability, high throughput, and low-latency access across thousands of nodes. This guide walks through a complete deployment of Lustre using RDMA over InfiniBand for performance, along with Node Maps for client access control and ACLs for fine-grained permissions.
1. Understanding the Lustre Architecture
Lustre separates metadata and data services into distinct roles:
- MGS (Management Server) – Manages Lustre configuration and coordinates cluster services.
- MDT (Metadata Target) – Stores file system metadata (names, permissions, directories).
- OST (Object Storage Target) – Stores file data blocks.
- Clients – Mount and access the Lustre file system for I/O.
The typical architecture looks like this:
+-------------+ +-------------+
| Client 1 | | Client 2 |
| /mnt/lustre | | /mnt/lustre |
+------+------+ +------+------+
| |
+--------o2ib RDMA-------+
|
+-------+-------+
| OSS/OST |
| (Data I/O) |
+-------+-------+
|
+-------+-------+
| MGS/MDT |
| (Metadata) |
+---------------+
2. Prerequisites and Environment
| Component | Requirements |
|---|---|
| OS | RHEL / Rocky / AlmaLinux 8.x or higher |
| Kernel | Built with Lustre and OFED RDMA modules |
| Network | InfiniBand fabric (Mellanox or compatible) |
| Lustre Version | 2.14 or later |
| Devices | Separate block devices for MDT, OST(s), and client mount |
3. Install Lustre Packages
On MGS, MDT, and OSS Nodes:
dnf install -y lustre kmod-lustre lustre-osd-ldiskfs
On Client Nodes:
dnf install -y lustre-client kmod-lustre-client
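Note that these packages are not in the stock distribution repositories; they come from the Whamcloud Lustre repository. A minimal repo file might look like the following (the exact baseurl depends on your Lustre version and EL release, so verify the path at downloads.whamcloud.com):
cat <<'EOF' > /etc/yum.repos.d/lustre.repo
[lustre-server]
name=Lustre Server
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el8/server/
gpgcheck=0
EOF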
4. Configure InfiniBand and RDMA (o2ib)
InfiniBand provides the lowest latency for Lustre communication via RDMA. Configure the o2ib network type for Lustre.
1. Install and verify InfiniBand stack
dnf install -y rdma-core infiniband-diags perftest libibverbs-utils
systemctl enable --now rdma
ibstat
2. Configure IB network
nmcli con add type infiniband ifname ib0 con-name ib0 ip4 10.0.0.1/24
nmcli con up ib0
3. Verify RDMA link
ibv_devinfo
ibv_rc_pingpong -d mlx5_0
4. Configure LNET for o2ib
Create /etc/modprobe.d/lustre.conf with:
options lnet networks="o2ib(ib0)"
modprobe lnet
lnetctl lnet configure
lnetctl net add --net o2ib --if ib0
lnetctl net show
Expected output:
net:
    - net type: o2ib
      interfaces:
          0: ib0
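Once LNET is configured on two nodes, verify connectivity over the IB fabric (10.0.0.2 is a placeholder for a peer's address on the subnet configured above):
lnetctl ping 10.0.0.2@o2ib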
5. Format and Mount Lustre Targets
Metadata Server (MGS + MDT)
mkfs.lustre --fsname=lustrefs --mgs --mdt --index=0 /dev/sdb
mount -t lustre /dev/sdb /mnt/mdt
Object Storage Server (OSS)
mkfs.lustre --fsname=lustrefs --ost --index=0 --mgsnode=<MGS>@o2ib /dev/sdc
mount -t lustre /dev/sdc /mnt/ost
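The client example later in this guide shows two OSTs; to add another, repeat the format and mount with an incremented index (the device and mount point here are assumptions):
mkfs.lustre --fsname=lustrefs --ost --index=1 --mgsnode=<MGS>@o2ib /dev/sdd
mount -t lustre /dev/sdd /mnt/ost1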
Client Node
sudo mkdir -p /mnt/lustre
sudo mount -t lustre <MGS>@o2ib:/lustrefs /mnt/lustre
For example, with the MGS at 172.16.0.10:
sudo mount -t lustre 172.16.0.10@o2ib:/lustrefs /mnt/lustre
Example without an InfiniBand network (TCP transport):
[root@vbox ~]# mount -t lustre 172.16.0.10@tcp:/lustre /mnt/lustre-client
[root@vbox ~]#
[root@vbox ~]# # Verify the mount worked
[root@vbox ~]# df -h /mnt/lustre-client
Filesystem Size Used Avail Use% Mounted on
172.16.0.10@tcp:/lustre 12G 2.5M 11G 1% /mnt/lustre-client
[root@vbox ~]# lfs df -h
UUID bytes Used Available Use% Mounted on
lustre-MDT0000_UUID 4.5G 1.9M 4.1G 1% /mnt/lustre-client[MDT:0]
lustre-OST0000_UUID 7.5G 1.2M 7.0G 1% /mnt/lustre-client[OST:0]
lustre-OST0001_UUID 3.9G 1.2M 3.7G 1% /mnt/lustre-client[OST:1]
filesystem_summary: 11.4G 2.4M 10.7G 1% /mnt/lustre-client
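A quick write test confirms data is flowing to the OSTs (the file name and size are arbitrary):
dd if=/dev/zero of=/mnt/lustre-client/testfile bs=1M count=100
lfs df -h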
6. Configuring Node Maps (Access Control)
Node maps allow administrators to restrict Lustre client access based on network or host identity.
1. View current node maps
lctl nodemap_list
2. Create a new node map for trusted clients
lctl nodemap_add trusted_clients
3. Add an allowed NID range; nodemap ranges use Lustre NID syntax rather than CIDR notation (run all nodemap commands on the MGS):
lctl nodemap_add_range --name trusted_clients --range 10.0.0.[1-254]@o2ib
4. Grant the map admin and trusted status:
lctl nodemap_modify --name trusted_clients --property admin --value 1
lctl nodemap_modify --name trusted_clients --property trusted --value 1
5. Activate nodemap enforcement and deny unknown clients via the default map:
lctl nodemap_activate 1
lctl nodemap_modify --name default --property deny_unknown --value 1
This ensures that only clients in the 10.0.0.0/24 InfiniBand subnet can mount and access the Lustre filesystem.
7. Configuring Access Control Lists (ACLs)
Lustre supports standard POSIX ACLs for fine-grained directory and file permissions.
1. Enable ACL support on mount
mount -t lustre -o acl <MGS>@o2ib:/lustrefs /mnt/lustre
2. Verify ACL support
mount | grep lustre
Should show:
<MGS>@o2ib:/lustrefs on /mnt/lustre type lustre (rw,acl)
3. Set ACLs on directories
setfacl -m u:researcher:rwx /mnt/lustre/projects
setfacl -m g:analysts:rx /mnt/lustre/reports
4. View ACLs
getfacl /mnt/lustre/projects
Sample output:
# file: projects
# owner: root
# group: root
user::rwx
user:researcher:rwx
group::r-x
group:analysts:r-x
mask::rwx
other::---
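To have new files and subdirectories inherit these permissions automatically, set a default ACL on the directory as well:
setfacl -d -m g:analysts:rx /mnt/lustre/reports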
8. Verifying Cluster Health
On all nodes:
lctl ping <MGS>@o2ib
lctl dl
lctl get_param -n net.*.state
Check RDMA performance:
lctl get_param -n o2iblnd.*.stats
Check file system mount from client:
df -h /mnt/lustre
Optional: Check node map enforcement
Try mounting from an unauthorized IP — it should fail:
mount -t lustre <MGS>@o2ib:/lustrefs /mnt/test
mount.lustre: mount <MGS>@o2ib:/lustrefs at /mnt/test failed: Permission denied
9. Common Issues and Troubleshooting
| Issue | Possible Cause | Resolution |
|---|---|---|
| Mount failed: no route to host | IB subnet mismatch or LNET not configured | Verify `lnetctl net show` and `ping -I ib0` between nodes. |
| Permission denied | Node map restriction active | Check `lctl nodemap_list` and ensure the client IP range is allowed. |
| Slow performance | RDMA disabled or fallback to TCP | Verify `lctl list_nids` shows the @o2ib transport. |
10. Final Validation Checklist
- InfiniBand RDMA verified with ibv_rc_pingpong
- LNET configured for o2ib(ib0)
- MGS, MDT, and OST mounted successfully
- Clients connected via @o2ib
- Node maps restricting unauthorized hosts
- ACLs correctly enforcing directory-level access
Summary
With RDMA transport, Lustre achieves near line-rate performance while node maps and ACLs enforce robust security and access control. This combination provides a scalable, high-performance, and policy-driven storage environment ideal for AI, HPC, and research workloads.