Category: HPC

How to configure Slurm Controller Node on Ubuntu 22.04

How to set up an HPC Slurm Controller Node

Refer to Key Components for HPC Cluster Setup to see which pieces you need to set up.

This guide provides step-by-step instructions for setting up the Slurm controller daemon (`slurmctld`) on Ubuntu 22.04. It also includes common errors encountered during the setup process and how to resolve them.

Step 1: Install Prerequisites

To begin, install the required dependencies for Slurm and its components:

sudo apt update && sudo apt upgrade -y
sudo apt install -y munge libmunge-dev libmunge2 build-essential man-db mariadb-server mariadb-client libmariadb-dev python3 python3-pip chrony

Step 2: Configure Munge (Authentication for Slurm)

Munge is required for authentication within the Slurm cluster.

1. Generate a Munge key on the controller node:
sudo create-munge-key

2. Copy the key to all compute nodes (on each node the key must be owned by the munge user with mode 0400):
scp /etc/munge/munge.key user@node:/etc/munge/

3. Start the Munge service:
sudo systemctl enable --now munge
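
Before moving on, it helps to confirm that Munge authentication actually works, both locally and against a compute node. A quick check, assuming passwordless SSH to a node named `node1` (the hostname is just a placeholder for this example):

# Encode and decode a credential locally
munge -n | unmunge

# Confirm a compute node accepts credentials generated on the controller
munge -n | ssh node1 unmunge

Both commands should report STATUS: Success; a decode failure on the remote node usually means the key was not copied correctly or the clocks are out of sync.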

Step 3: Install Slurm

1. Download and compile Slurm:
wget https://download.schedmd.com/slurm/slurm-23.02.4.tar.bz2
tar -xvjf slurm-23.02.4.tar.bz2
cd slurm-23.02.4
./configure --prefix=/usr/local/slurm --sysconfdir=/etc/slurm
make -j$(nproc)
sudo make install

2. Add the Slurm user:
sudo useradd -m slurm

3. Create the necessary directories and set permissions:
sudo mkdir -p /etc/slurm /var/spool/slurm /var/log/slurm
sudo chown slurm: /var/spool/slurm /var/log/slurm
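
Because Slurm was built from source, no systemd unit files are installed for you. Below is a minimal sketch of a slurmctld unit, assuming the --prefix=/usr/local/slurm used above (adjust the path if your build differs), saved as /etc/systemd/system/slurmctld.service:

[Unit]
Description=Slurm controller daemon
After=network.target munge.service
Requires=munge.service

[Service]
Type=simple
User=slurm
ExecStart=/usr/local/slurm/sbin/slurmctld -D
LimitNOFILE=65536
Restart=on-failure

[Install]
WantedBy=multi-user.target

After Step 4 puts slurm.conf in place, run `sudo systemctl daemon-reload` and `sudo systemctl enable --now slurmctld`. A similar unit pointing at /usr/local/slurm/sbin/slurmd (running as root) is needed on each compute node.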

Step 4: Configure Slurm (for more complex configurations, contact Nick Tailor)

1. Generate a basic `slurm.conf` using the configurator tool at
https://slurm.schedmd.com/configurator.html. Save the configuration to `/etc/slurm/slurm.conf`.

# Basic Slurm Configuration
ClusterName=my_cluster
ControlMachine=slurmctld              # Replace with your control node's hostname
# BackupController=backup-slurmctld   # Uncomment and replace if you have a backup controller

# Authentication
AuthType=auth/munge
CryptoType=crypto/munge

# Logging
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmctldDebug=info
SlurmdDebug=info

# Slurm User
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm
SlurmdSpoolDir=/var/spool/slurmd

# Scheduler
SchedulerType=sched/backfill
SchedulerParameters=bf_continue

# Accounting
AccountingStorageType=accounting_storage/none
JobAcctGatherType=jobacct_gather/linux

# Compute Nodes
NodeName=node[1-2] CPUs=4 RealMemory=8192 State=UNKNOWN
PartitionName=debug Nodes=node[1-2] Default=YES MaxTime=INFINITE State=UP
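
The CPUs and RealMemory values in the NodeName line must match the actual hardware of your compute nodes. One convenient way to get them is to run slurmd with the -C flag on each compute node, which prints a ready-made NodeName line (the path below assumes the --prefix used earlier):

# Run on a compute node; prints the node's hardware as a slurm.conf NodeName line
/usr/local/slurm/sbin/slurmd -C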

2. Distribute `slurm.conf` to all compute nodes:
scp /etc/slurm/slurm.conf user@node:/etc/slurm/

3. Restart the Slurm services (slurmctld on the controller node, slurmd on the compute nodes):
sudo systemctl restart slurmctld
sudo systemctl restart slurmd

Troubleshooting Common Errors


root@slrmcltd:~# tail /var/log/slurm/slurmctld.log
[2024-12-06T11:57:25.428] error: High latency for 1000 calls to gettimeofday(): 20012 microseconds
[2024-12-06T11:57:25.431] fatal: mkdir(/var/spool/slurm): Permission denied
[2024-12-06T11:58:34.862] error: High latency for 1000 calls to gettimeofday(): 20029 microseconds
[2024-12-06T11:58:34.864] fatal: mkdir(/var/spool/slurm): Permission denied
[2024-12-06T11:59:38.843] error: High latency for 1000 calls to gettimeofday(): 18842 microseconds
[2024-12-06T11:59:38.847] fatal: mkdir(/var/spool/slurm): Permission denied

Error: Permission Denied for /var/spool/slurm

This error occurs when the `slurm` user does not have the correct permissions to access the directory.

Fix:
sudo mkdir -p /var/spool/slurm
sudo chown -R slurm: /var/spool/slurm
sudo chmod -R 755 /var/spool/slurm
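
After fixing the ownership, you can confirm that slurmctld can now write its state by briefly running it in the foreground with verbose logging (the path assumes the --prefix used in Step 3; press Ctrl+C to stop once it starts cleanly):

# Run the controller in the foreground as the slurm user with verbose output
sudo -u slurm /usr/local/slurm/sbin/slurmctld -D -vvv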

Error: Temporary Failure in Name Resolution

Slurm could not resolve the hostname `slurmctld`. This can be fixed by updating `/etc/hosts`:

1. Edit `/etc/hosts` and add the following:
127.0.0.1      slurmctld
192.168.20.8   slurmctld

2. Verify the hostname matches `ControlMachine` in `/etc/slurm/slurm.conf`.

3. Restart networking and test hostname resolution:
sudo systemctl restart systemd-networkd
ping slurmctld

Error: High Latency for gettimeofday()

Dec 06 11:57:25 slrmcltd.home systemd[1]: Started Slurm controller daemon.
Dec 06 11:57:25 slrmcltd.home slurmctld[2619]: slurmctld: error: High latency for 1000 calls to gettimeofday(): 20012 microseconds
Dec 06 11:57:25 slrmcltd.home systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Dec 06 11:57:25 slrmcltd.home systemd[1]: slurmctld.service: Failed with result 'exit-code'.

This warning typically indicates clock or timer latency issues on the system, which are especially common in virtualized environments.

Fixes:
1. Install and configure `chrony` for time synchronization:
sudo apt install chrony
sudo systemctl enable --now chrony
chronyc tracking
timedatectl
2. For virtualized environments, optimize the clocksource (a plain `sudo echo` cannot write through the redirection, so use tee; see the persistence sketch below):
echo tsc | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
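
The change above lasts only until the next reboot. One way to make it persistent, assuming the system boots with GRUB, is to add clocksource=tsc to the kernel command line:

# Append clocksource=tsc to GRUB_CMDLINE_LINUX in /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX="... clocksource=tsc"
sudo sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 clocksource=tsc"/' /etc/default/grub
sudo update-grub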

3. Disable high-precision timing in `slurm.conf` (optional):
HighPrecisionTimer=NO
sudo systemctl restart slurmctld

Step 5: Verify and Test the Setup

1. Validate the configuration:
scontrol reconfigure
If the command returns without errors, the configuration is working. If it fails, check connectivity between the nodes and update /etc/hosts so that every host is listed consistently on all machines in the cluster.

2. Check node and partition status:
sinfo

root@slrmcltd:/etc/slurm# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1  idle* node1

3. Monitor logs for errors:
sudo tail -f /var/log/slurm/slurmctld.log
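
4. Run a quick test job. Once sinfo shows idle nodes, a simple end-to-end check is to run a trivial job across the cluster (this assumes both example nodes are up and that /usr/local/slurm/bin is on your PATH):

# Run hostname interactively on two nodes
srun -N2 hostname

# Submit a small batch job and watch the queue
sbatch --wrap="sleep 30 && hostname"
squeue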


Written By: Nick Tailor

Deploying Lustre File System with RDMA, Node Maps, and ACLs

Lustre is the de facto standard parallel file system for high-performance computing (HPC) clusters, providing extreme scalability, high throughput, and low-latency access across thousands of nodes. This guide walks through a complete deployment of Lustre using RDMA over InfiniBand for performance, along with Node Maps for client access control and ACLs for fine-grained permissions.


1. Understanding the Lustre Architecture

Lustre separates metadata and data services into distinct roles:

  • MGS (Management Server) – Manages Lustre configuration and coordinates cluster services.
  • MDT (Metadata Target) – Stores file system metadata (names, permissions, directories).
  • OST (Object Storage Target) – Stores file data blocks.
  • Clients – Mount and access the Lustre file system for I/O.

The typical architecture looks like this:

+-------------+        +-------------+
|   Client 1  |        |   Client 2  |
| /mnt/lustre |        | /mnt/lustre |
+------+------+        +------+------+
       |                      |
       +------o2ib RDMA------+
                  |
          +-------+-------+
          |    OSS/OST    |
          |   (Data I/O)  |
          +-------+-------+
                  |
          +-------+-------+
          |    MGS/MDT    |
          |  (Metadata)   |
          +---------------+

2. Prerequisites and Environment

Component        Requirements
OS               RHEL / Rocky / AlmaLinux 8.x or higher
Kernel           Built with Lustre and OFED RDMA modules
Network          InfiniBand fabric (Mellanox or compatible)
Lustre Version   2.14 or later
Devices          Separate block devices for MDT, OST(s), and client mount

3. Install Lustre Packages

On MGS, MDT, and OSS Nodes:

dnf install -y lustre kmod-lustre lustre-osd-ldiskfs

On Client Nodes:

dnf install -y lustre-client kmod-lustre-client
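
After installing the packages, it is worth confirming that the Lustre kernel modules load cleanly before configuring anything else; a quick check:

# Load the Lustre module stack and confirm it registered
modprobe lustre
lsmod | grep -E 'lustre|lnet'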

4. Configure InfiniBand and RDMA (o2ib)

InfiniBand provides the lowest latency for Lustre communication via RDMA. Configure the o2ib network type for Lustre.

1. Install and verify InfiniBand stack

dnf install -y rdma-core infiniband-diags perftest libibverbs-utils
systemctl enable --now rdma
ibstat

2. Configure IB network

nmcli con add type infiniband ifname ib0 con-name ib0 ip4 10.0.0.1/24
nmcli con up ib0

3. Verify RDMA link

ibv_devinfo
ibv_rc_pingpong -d mlx5_0

4. Configure LNET for o2ib

Create /etc/modprobe.d/lustre.conf with:

options lnet networks="o2ib(ib0)"

Then load and configure LNET:

modprobe lnet
lnetctl lnet configure
lnetctl net add --net o2ib --if ib0
lnetctl net show

Expected output:

net:
  - net type: o2ib
    interfaces:
      0: ib0
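
The lnetctl commands above configure LNET only for the running system. Depending on how your Lustre packages ship the lnet systemd service, the configuration can usually be persisted by exporting it to /etc/lnet.conf (treat the service name as an assumption and verify it exists on your distribution):

# Save the running LNET configuration so it is restored at boot
lnetctl export > /etc/lnet.conf
systemctl enable --now lnet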

5. Format and Mount Lustre Targets

Metadata Server (MGS + MDT)

mkfs.lustre --fsname=lustrefs --mgs --mdt --index=0 /dev/sdb
mount -t lustre /dev/sdb /mnt/mdt

Object Storage Server (OSS)

mkfs.lustre --fsname=lustrefs --ost --index=0 --mgsnode=<MGS>@o2ib /dev/sdc
mount -t lustre /dev/sdc /mnt/ost

Client Node

sudo mkdir -p /mnt/lustre
mount -t lustre <MGS>@o2ib:/lustrefs /mnt/lustre

Example with an explicit MGS address:

sudo mount -t lustre \
  172.16.0.10@o2ib:/lustrefs \
  /mnt/lustre

Example without an InfiniBand network (using the TCP LNET instead of o2ib):
[root@vbox ~]# mount -t lustre 172.16.0.10@tcp:/lustre /mnt/lustre-client
[root@vbox ~]# 
[root@vbox ~]# # Verify the mount worked
[root@vbox ~]# df -h /mnt/lustre-client
Filesystem                Size  Used Avail Use% Mounted on
172.16.0.10@tcp:/lustre   12G  2.5M   11G   1% /mnt/lustre-client
[root@vbox ~]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID         4.5G        1.9M        4.1G   1% /mnt/lustre-client[MDT:0]
lustre-OST0000_UUID         7.5G        1.2M        7.0G   1% /mnt/lustre-client[OST:0]
lustre-OST0001_UUID         3.9G        1.2M        3.7G   1% /mnt/lustre-client[OST:1]
filesystem_summary:        11.4G        2.4M       10.7G   1% /mnt/lustre-client
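
To have clients remount the file system automatically at boot, an /etc/fstab entry can be added. This is a sketch; the _netdev option delays the mount until networking (and LNET) is up, and <MGS> should be replaced with your MGS NID:

# /etc/fstab entry for the Lustre client
<MGS>@o2ib:/lustrefs  /mnt/lustre  lustre  defaults,_netdev  0  0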

6. Configuring Node Maps (Access Control)

Node maps allow administrators to restrict Lustre client access based on network or host identity.

1. View current node maps

lctl nodemap_list

2. Create a new node map for trusted clients

lctl nodemap_add trusted_clients

3. Add allowed network range or host

lctl nodemap_add_range trusted_clients 10.0.0.0/24

4. Enable enforcement

lctl set_param nodemap.trusted_clients.admin=1
lctl set_param nodemap.trusted_clients.trust_client_ids=1

5. Restrict default map

lctl set_param nodemap.default.reject_unauthenticated=1

This ensures only IPs in 10.0.0.0/24 can mount and access the Lustre filesystem.
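
Note that on many Lustre releases the node map feature must also be activated globally on the MGS before per-map settings take effect; a hedged example (verify against the documentation for your Lustre version):

# Activate node map enforcement cluster-wide (run on the MGS)
lctl nodemap_activate 1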


7. Configuring Access Control Lists (ACLs)

Lustre supports standard POSIX ACLs for fine-grained directory and file permissions.

1. Enable ACL support on mount

mount -t lustre -o acl <MGS>@o2ib:/lustrefs /mnt/lustre

2. Verify ACL support

mount | grep lustre

Should show:

<MGS>@o2ib:/lustrefs on /mnt/lustre type lustre (rw,acl)

3. Set ACLs on directories

setfacl -m u:researcher:rwx /mnt/lustre/projects
setfacl -m g:analysts:rx /mnt/lustre/reports
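
If files created under these directories should inherit the same permissions, default ACLs can be set as well (a sketch using the same example users and groups):

# Default ACLs so newly created files and directories inherit the permissions
setfacl -d -m u:researcher:rwx /mnt/lustre/projects
setfacl -d -m g:analysts:rx /mnt/lustre/reports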

4. View ACLs

getfacl /mnt/lustre/projects

Sample output:

# file: projects
# owner: root
# group: root
user::rwx
user:researcher:rwx
group::r-x
group:analysts:r-x
mask::rwx
other::---

8. Verifying Cluster Health

On all nodes:

lctl ping <MGS>@o2ib
lctl dl
lctl get_param -n net.*.state

Check RDMA performance:

lctl get_param -n o2iblnd.*.stats

Check file system mount from client:

df -h /mnt/lustre

Optional: Check node map enforcement

Try mounting from an unauthorized IP — it should fail:

mount -t lustre <MGS>@o2ib:/lustrefs /mnt/test
mount.lustre: mount <MGS>@o2ib:/lustrefs at /mnt/test failed: Permission denied

9. Common Issues and Troubleshooting

Issue: Mount failed: no route to host
  Possible cause: IB subnet mismatch or LNET not configured
  Resolution: Verify lnetctl net show and ping -I ib0 between nodes.

Issue: Permission denied
  Possible cause: Node map restriction active
  Resolution: Check lctl nodemap_list and ensure the client IP range is allowed.

Issue: Slow performance
  Possible cause: RDMA disabled or fallback to TCP
  Resolution: Verify that lctl list_nids shows the @o2ib transport.

10. Final Validation Checklist

  • InfiniBand RDMA verified with ibv_rc_pingpong
  • LNET configured for o2ib(ib0)
  • MGS, MDT, and OST mounted successfully
  • Clients connected via @o2ib
  • Node maps restricting unauthorized hosts
  • ACLs correctly enforcing directory-level access

Summary

With RDMA transport, Lustre achieves near line-rate performance while node maps and ACLs enforce robust security and access control. This combination provides a scalable, high-performance, and policy-driven storage environment ideal for AI, HPC, and research workloads.