SLURM single node install

SimStack requires a batch system such as Torque, SLURM or LSF for job execution. Of these, SLURM is the least problematic to run on a single node. It is also readily available on CentOS, RHEL and Ubuntu Server.

In this tutorial we explain how to install it on a fresh CentOS 7 system. Most of the commands are run as root. If you are currently logged in as a regular user, switch to root:

sudo -i
#or
su

Disabling SELinux

While it is entirely possible to install SLURM on a system with SELinux enabled, you are likely to run into file permission issues that way. For the first steps with SLURM, we recommend disabling SELinux. Edit /etc/selinux/config and change SELINUX to disabled:

SELINUX=disabled
# and then:
reboot
# and obtain superuser rights
sudo -i
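
After the reboot you can quickly check that SELinux is actually disabled:

getenforce
# should print "Disabled"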

Installing the OpenHPC repository

The most straightforward way to install Slurm is to use the packages from the OpenHPC repository. OpenHPC itself requires the EPEL repository. To install everything, run the following code block and accept the GPG keys along the way:

yum install wget epel-release unzip
yum update
wget "https://github.com/openhpc/ohpc/releases/download/v1.3.GA/ohpc-release-1.3-1.el7.x86_64.rpm"
rpm -ivh ohpc-release-1.3-1.el7.x86_64.rpm
yum update
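
If you want to confirm that the repository was registered, a quick check (the exact repository names may vary with the release):

yum repolist | grep -i openhpc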

Afterwards install the following packages:

yum install ohpc-slurm-server ohpc-slurm-client slurm-torque-ohpc slurm-slurmd-ohpc slurm-ctld-ohpc

Check that the file /etc/munge/munge.key was generated. Then start munge, check that it is running, and enable it:

systemctl start munge
systemctl status munge
# and if everything is ok:
systemctl enable munge

Munge is a daemon that encrypts and decrypts messages between cluster nodes. Here it is only required for communication within the node. Check that everything works:

munge -n | unmunge
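
If munge fails to start or the check above errors out, verify that the key from the earlier step exists and is only readable by the munge user (the ownership and mode shown here are the typical defaults, not a guarantee):

ls -l /etc/munge/munge.key
# typically owned by munge:munge with mode -r--------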

If your server machine has a graphical user environment, open the following page in a web browser: /usr/share/doc/slurm-18.08.8/html/configurator.easy.html . If you do not have a GUI, copy the file to your local machine over ssh (as shown below) and open the page there. You can find your hostname using:

hostname -f
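
For the GUI-less case, one way to fetch the configurator page from your local machine is scp (adjust the user and hostname to your setup):

scp root@hostname_of_slurm_server:/usr/share/doc/slurm-18.08.8/html/configurator.easy.html .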

You should provide these options (a sketch of the resulting configuration follows after the list):

  • SlurmctldHost: The info obtained using hostname -f
  • NodeName: Again the same info
  • ComputeNodeAddress: Leave empty
  • MaxTime: Leave infinity
  • PartitionName: batch <- do not use the name "default"
  • CPUs: Empty
  • Sockets: The number of physical CPU chips in your system, usually 1 on a single-node workstation.
  • CoresPerSocket: The number of physical cores per socket, not counting hyperthreading.
  • ThreadsPerCore: 2 if your PC has hyperthreading. 1 if your PC does not have hyperthreading.
  • RealMemory: The memory of your PC in megabytes.
  • SlurmUser: slurm (leave at default)
  • StateSaveLocation: /var/spool/slurm <- it is very important to change this on CentOS!
  • SlurmdSpoolDir: /var/spool/slurmd <- default
  • ReturnToService: 2 (upon registration with a valid configuration)
  • Scheduling: Backfill, default
  • FastSchedule: 0
  • InterConnect: None (default)
  • Default MPI: None (default)
  • Process Tracking: LinuxProc. This is also important: Cgroup would be the better choice, but its default cgroup.conf is broken on CentOS.
  • Resource Selection: Cons_res
  • SelectTypeParameters: CR_Core
  • Task Launch: None
  • Both EventLogging fields: Leave empty
  • JobAccountingGather: None
  • JobAccountingStorage: None
  • ClusterName: Be creative or leave it at its default: cluster
  • Process ID Logging: Leave both PID file fields at their default
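
For orientation, here is a minimal sketch of the kind of file the configurator produces with the choices above. The hostname and the hardware numbers are placeholders for your own values, and the generated file will contain additional lines; always use the configurator output, not this sketch, as your actual /etc/slurm/slurm.conf:

# sketch only - replace yourhostname and the hardware values with your own
ClusterName=cluster
SlurmctldHost=yourhostname
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm
SlurmdSpoolDir=/var/spool/slurmd
ReturnToService=2
SchedulerType=sched/backfill
FastSchedule=0
SelectType=select/cons_res
SelectTypeParameters=CR_Core
ProctrackType=proctrack/linuxproc
TaskPlugin=task/none
MpiDefault=none
JobAcctGatherType=jobacct_gather/none
AccountingStorageType=accounting_storage/none
NodeName=yourhostname Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=16000 State=UNKNOWN
PartitionName=batch Nodes=yourhostname Default=YES MaxTime=INFINITE State=UP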

Click Submit and copy and paste the resulting text into a new file at /etc/slurm/slurm.conf (overwriting the old file). Leave the browser open in case an error occurs in the next step and we need to regenerate this file.

Start slurmctld. It is advisable to open another shell as root at the same time and check

journalctl -f 

to see any error messages that might be printed. A common error is that Slurm complains about the CPUs, Sockets and CoresPerSocket options above, because the chosen values do not match what Slurm detects on the node. If Slurm complains about a missing state directory, this is normal after a first start, as this state directory has not been created yet.

systemctl start slurmctld
# Check its status
systemctl status slurmctld
# If everything is fine, enable it
systemctl enable slurmctld
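
Once slurmctld is running you can also ask the controller directly whether it responds:

scontrol ping
# should report that the primary controller is UP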

Do the same with slurmd

systemctl start slurmd
# Check its status
systemctl status slurmd
# and enable:
systemctl enable slurmd
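
With both daemons running, the node should show up in the batch partition:

sinfo
# the batch partition should list your node, ideally in state "idle"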

To finally test the Slurm installation, switch back to your user account (do not do the following step as root) and copy and paste the following script into a file called test.sbatch:

#!/bin/bash
#SBATCH -J test # Job name
#SBATCH -o job.%j.out # Name of stdout output file (%j expands to the job ID)
#SBATCH -N 1 # Total number of nodes requested
#SBATCH -n 2 # Total number of MPI tasks requested
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - 1.5 hours
# Print some test output and the generated node file
echo "Test output from Slurm Testjob"
NODEFILE=`generate_pbs_nodefile`
cat $NODEFILE
sleep 20

and submit it using

sbatch ./test.sbatch

Check that it was submitted using

squeue

Once it has run through, a job.X.out file containing the output of this job should appear. If it does, you are ready to install the Nanomatch software.
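
As a quick check you can print the job output (the number in the filename is the job ID and will differ on your system):

cat job.*.out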

Installing the Nanomatch software

The rest of the installation is not carried out as root. Please switch back to your local user or log in again. You can either install the software just for your local user or shared for everybody on the server. For a shared installation, add a dedicated user to your cluster and then add your own user (here: centos) to the new user's group:

# This step is not required for a single user install, it does require root again
sudo -i
useradd nanomatch
# The next step might tell you that the group already exists. Ignore the warning if it pops up
groupadd nanomatch 
gpasswd -a centos nanomatch
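
You can verify that the group membership was applied (note that the centos user may have to log out and back in before the new group shows up in its sessions):

id centos
# the output should list nanomatch among the groups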

We switch to the nanomatch user:

#This step is not required for a single user install
sudo -i
su nanomatch

Choose a directory on the server to install our software stack, e.g.

#Choose this inside your home directory for a single user install
/home/nanomatch

Upload all Nanomatch archives into this folder and unpack them using the following two commands:

for i in *.tar.gz; do tar xf $i; done
for i in *.zip; do unzip $i; done

Afterwards enter the directory and run the postinstaller:

cd /home/nanomatch/nanomatch/V2
./postinstall.sh

Finally, you have to edit the file customer.config:

cp customer.config.template customer.config
vi customer.config

Here you have to enter the IP or hostname of the license server (unused if it is a trial version), the node-local scratch directory and the path to the Turbomole installation. If you do not have a server with a dedicated scratch directory, enter:

export SCRATCH=$HOME/scratch

In this case SCRATCH will be created inside your home directory. This has performance implications, as dedicated scratch space is usually much faster. Also uncomment the HOSTFILE=.. export appropriate for your queueing system. In Slurm's case this is

export HOSTFILE=`generate_pbs_nodefile`

If you chose a shared installation, you should also change the read permissions to allow all users to access the directory.

#This step is not required for a single user install
chmod -R g+rX /home/nanomatch

Setting up your local user account for passwordless ssh

The rest of the commands are run as your user on your own machine and not the server. You can log out of the server now.

SimStack communicates with the server via ssh. You have to set up passwordless ssh to the server for SimStack to connect. If you do not already have an ssh key on your machine, run ssh-keygen like this:

#On most Linux distributions:
ssh-keygen # followed by return, return, return (no passphrases)

#On OSX post Mojave 10.14 (because the key format changed, but also possible on Linux)
ssh-keygen -m PEM -t rsa -b 4096 -C "your_email@example.com"

#Afterwards transfer it by invoking
ssh-copy-id username@hostname_of_slurm_server
#and logging in with your password once.

# Test whether passwordless login works, by logging in using
ssh username@hostname_of_slurm_server

Setting up the SimStack client

Unpack the simstack_linux.tar.gz or simstack_osx.tar.gz archive on your client machine and run:

tar xf simstack_linux.tar.gz
# or
tar xf simstack_osx.tar.gz
./run-simstack.command # on osx
./run-simstack.sh # on linux

Unpack simstack_wanos.zip into a different directory. Open the Configuration -> Paths pane, point the workflows directory to a new empty directory and the Wanos directory to the unpacked Wanos. Then open the Configuration -> Servers pane and enter:

* The hostname of the server we determined above
* Your ssh username on the server
* The ssh port
* The path to the unpacked nanomatch software on the server, e.g. /home/nanomatch/nanomatch
* A workspace directory, which does not yet exist on the server.

Once you finish, click Connect. If the symbol turns green, you are done.
