HPC Midlands Plus (Athena) Quick-start Guide
Accessing Athena
Log in to athena.hpc-midlands-plus.ac.uk (via ssh) using your SAFE username and the password issued to you via SAFE for this service.
Loughborough University users: please log in to athena.lboro.ac.uk.
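A typical login from a terminal on your own machine looks like the following, where username is a placeholder for your SAFE username (Loughborough users should substitute athena.lboro.ac.uk as the hostname):
ssh username@athena.hpc-midlands-plus.ac.uk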
Storage
Your home directory is of the form /gpfs/home/site/username, where site is the name of your site (e.g. aston, lboro) and username is your username.
If you have critical data please ensure it is copied to other locations. User areas are not backed up.
You will need to transfer your files to/from Athena via scp or sftp (e.g. use WinSCP on Windows).
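For example, to copy a file from your own machine to your Athena home directory with scp (the filename, username and site here are placeholders):
scp mydata.tar.gz username@athena.hpc-midlands-plus.ac.uk:/gpfs/home/site/username/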
Modules
Athena uses the module system, so to list the available modules type
module avail
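Modules are then loaded with module load. For example, to load one of the installed GCC versions (the exact module name may differ slightly from this sketch, so check the output of module avail first):
module load gcc/6.3.0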
Job Submission
Athena uses the SLURM queueing system.
To submit a job use the sbatch command:
sbatch hello.sh
which responds with something like:
Submitted batch job 22465
Here is an example SLURM submission script:
#!/bin/bash
#SBATCH --time=10:00:00
#SBATCH --job-name=myjobname
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=28
#SBATCH --account=a01
#SBATCH --mail-type=ALL
#SBATCH --mail-user=A.Bloggs@lboro.ac.uk
module purge
module load some_modules
mpirun ./hello_parallel
Please note that the x86 nodes on Athena all have 28 cores.
The working directory of the job defaults to the one in which you submitted the job.
Argument | Meaning |
---|---|
--time=HH:MM:SS | The walltime you are requesting, i.e. the maximum time the job will be allowed to run for. Specified in hours:minutes:seconds. |
--job-name=some_name | The name you have chosen to give the job. Defaults to the name of the job script. |
--output=some_filename | The name of the file in which the stdout output will be saved. Defaults to slurm-NNNNN.out, where NNNNN is the job ID. |
--error=some_filename | The name of the file in which the stderr output will be saved. If not set, stderr is written to the same file as stdout. |
--partition=partition_name | The partition your job should run in. For most users, please set this to compute. |
--nodes=some_number | The number of nodes you are requesting. |
--account=account_code | You should have been assigned an account code you can use here. Jobs will usually default to the correct account unless you are a member of multiple projects. |
--workdir=some_directory | Set the job working directory. Without this the working directory defaults to the one where the job was submitted from. |
--mail-type=specification | When the system should send you email. Options are BEGIN, END, FAIL, REQUEUE and ALL. |
--mail-user=thing@somewhere.com | Who to email the job status updates to. Set to your full email address. |
--requeue | Tells SLURM it may restart the job if it has been interrupted (e.g. due to a hardware fault). |
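These options can also be given on the sbatch command line, where they override the corresponding #SBATCH lines in the script. For example, to resubmit the script above with a shorter walltime:
sbatch --time=01:00:00 hello.sh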
Partitions
Jobs can be submitted to one of three partitions.
compute
- x86 nodes with 28 cores and 128 GB memory.
- There are 512 nodes in this partition.
- The maximum job walltime allowed is 100 hours.
- The maximum core count per job is 756 cores (= 27 nodes).
- The maximum number of cores per user (i.e. across all their running jobs) is 2000.
openpower
- OpenPower nodes with 20 cores and 1 TB memory.
- There are 4 nodes in this partition.
- The maximum job walltime allowed is 100 hours.
openpower-nvidia
- OpenPower node with 20 cores, 1 TB memory and two Nvidia P100 GPGPUs.
- There is one node in this partition.
- The maximum job walltime allowed is 100 hours.
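To run in one of the non-default partitions, set --partition (and a matching --ntasks-per-node) in your script header. As a sketch, the header for a single-node job on the openpower partition might look like this (the walltime here is illustrative):
#SBATCH --partition=openpower
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --time=10:00:00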
Job Monitoring and Control
The squeue command lists both queued and running jobs:
squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
22163       all   jobnnt    tumax PD       0:00      1 (AssociationJobLimit)
22433       all    small    txdfq  R      33:35      8 node[0146-0153]
If a job is running (R in the ST column) then squeue lists the nodes it is running on. If it is waiting to run it is usually shown as pending (PD in the ST column) and squeue lists the reason in the NODELIST(REASON) column.
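To list only your own jobs, pass your username to squeue with the -u option, e.g.:
squeue -u your_username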
To show more details on a particular job, use scontrol show job:
scontrol show job 22401
JobId=22401 Name=tuhs
UserId=tuhs(7890) GroupId=lboro_tu(5678)
Priority=1 Account=tuhs98 QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=05:54:34 TimeLimit=20:00:00 TimeMin=N/A
SubmitTime=2014-07-24T10:21:37 EligibleTime=2014-07-24T10:21:37
StartTime=2014-07-24T10:21:37 EndTime=2014-07-25T06:21:37
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=all AllocNode_Sid=athena10:3606
ReqNodeList=(null) ExcNodeList=(null)
NodeList=node[56-61]
BatchHost=node56
NumNodes=6 NumCPUs=96 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=16 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=0 Contiguous=0 Licenses=(null) Network=(null)
Command=/gpfs/home/loughborough/tu/tuhs/work/m77r80/submit.sh
WorkDir=/gpfs/home/loughborough/tu/tuhs/work/m77r80
If you need to kill a job or remove it from the queue, use the scancel command:
scancel 22465
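scancel also accepts a username if you need to remove all of your own jobs at once (substitute your own username):
scancel -u your_username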
Command Summary
Command | Meaning |
---|---|
sbatch | Submit a job. Equivalent of qsub. |
squeue | List the queue. Equivalent of qstat or showq. |
scancel | Kill a job. Equivalent of qdel. |
scontrol show job jobid | Show details of a job jobid. Equivalent to qstat -f. |
smap | Graphical display of where jobs are running. Equivalent to showstate. |
squeue --start | Show the expected start time for a job. Equivalent to showstart. |
srun --export=PATH --pty bash | Start an interactive job. Equivalent to qsub -I. |
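As a sketch, a one-hour interactive session on a single compute node might be requested as follows (the exact options recommended at your site may differ):
srun --partition=compute --nodes=1 --ntasks-per-node=28 --time=01:00:00 --export=PATH --pty bash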
Environment Variables
These environment variables may be useful in your submission scripts:
Variable | Meaning |
---|---|
SLURM_SUBMIT_DIR | The directory the job was submitted from. Equivalent to PBS_O_WORKDIR. |
SLURM_JOB_ID | The ID of the current job. Equivalent to PBS_JOBID. |
SLURM_NODELIST | The nodes assigned to the job. Equivalent to PBS_NODELIST. |
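As an illustration, these can be used inside a submission script to record where and what the job is running:
echo "Job $SLURM_JOB_ID submitted from $SLURM_SUBMIT_DIR"
echo "Running on nodes: $SLURM_NODELIST"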
Further Information
Further detail on many of the commands is available using the man command, e.g. type man squeue once you have logged in to Athena. For long pages type a space to get the next page, or q to quit.
These manual pages are also available on the web.
Getting Help
Users at each member site should contact their local support.
University | Support Contact / E-mail / Web site |
---|---|
Aston University | Support email |
University of Birmingham | Service Desk |
University of Leicester | Rcs.support@le.ac.uk, Service Desk |
Loughborough University | research-computing@lboro.ac.uk |
University of Nottingham | Service Desk |
Queen Mary, University of London | its-research-support@qmul.ac.uk |
University of Warwick | Bugzilla |
Installed Software
The cluster has a standard CentOS 7.3 Linux installation, with the Intel 2017 Compiler suite, MKL and MPI added.
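As a sketch, an MPI program such as the hello_parallel example above could be built with the Intel MPI compiler wrappers once the relevant modules are loaded (the module names below are assumptions; check module avail for the real ones):
module load intel intel-mpi
mpiicc hello_parallel.c -o hello_parallel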
The initial set of user software will be built and installed in tranches, based upon the priorities agreed by the HPC Midlands Plus working group to support the pilot service.
First tranche: compilers, libraries and other supporting packages.
Note that this is not an exhaustive list.
Package | Version(s) |
---|---|
perl | 5.24.1 |
python | 2.7.12, 3.6.0 |
arpack | 96 |
arpack++ | 1.2 |
blas (non-MKL) | 3.6.0 |
openblas | 0.2.18 |
boost | 1.61.0 |
cblas | 3.6.0 |
cuda | 8 |
fftw | 2.1.5, 3.3.5 |
hdf5 | 1.8.17 |
lapack (non-MKL) | 3.6.1 |
metis | 5.1.0 |
netcdf | 4.4.1 |
numexpr | 2.6.2 |
numpy | 1.11.2 |
openpyxl | 2.4.5 |
pandas | 0.19.2 |
parmetis | 4.0.3 |
patsy | 0.4.1 |
petsc | 3.7.6 |
pillow | 4.0.0 |
qhull | 2015.2 |
qt | 4.8.7 |
scalapack | 2.0.2 |
scipy | 0.18.1 |
suitesparse | 4.5.3 |
superlu | 5.2.1 |
git | 2.9.3 |
subversion | 1.9.5 |
gcc | 4.9.3, 6.3.0 |
glpk | 4.55 |
gsl | 1.16 |
jre | 1.8.0_121-b13 |
ant | 1.10.0 |
Second tranche
Package | Version(s) |
---|---|
castep | 17.2 |
DLPOLY Classic | 1.9 |
gromacs | 5.1.4 |
lammps | 17Nov16 |
R | 3.3.2 |
gdal | 2.1.2 |
namd | 2.12 |
Third tranche
Package | Version(s) |
---|---|
superlu | 5.2.1 |
metis-mt | 0.6.0 |
cblas | 3.6.0 |
amber | 16 |
DLPOLY | 4.08 |
cp2k | 4.1 |
gulp | 4.3.x |
ipython | 5.1 |
matlab | 2016b, 2017a |
Matlab toolboxes | |
OpenFOAM | V1612+, 2.4 |
paraview | 5.2.0 |
siesta | 3.2-pl-4 |
vtk | 8.0.0 |
cvodes | 2.9.0 |
ffmpeg | 2.8.6, 3.3.2 |
hypre | 2.11.1 |
ida | 2.9.0 |
idas | 1.3.0 |
kinsol | 2.9.0 |
cvode | 2.9.0 |
WRF | 3.6.1, 3.8.1 |