Arrays of Jobs

Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily; job arrays with millions of tasks can be submitted in milliseconds (subject to configured size limits). All jobs must have the same initial options (e.g. size, time limit, etc.); however, it is possible to change some of these options after the job has begun execution using the scontrol command.
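For example, the time limit of a single array element can be changed after submission; array elements are addressed as jobid_taskid (the job ID 1234 below is illustrative):

scontrol update JobId=1234_5 TimeLimit=01:00:00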

The particular power of job arrays is the ability to run SIMD (Single Instruction, Multiple Data) style workloads, in which each array element uses the SLURM environment variable that differs between elements to select its own data. The same variable may also be used to select other distinct parameters.

Usage Examples

Basic Usage

Arrays are specified as part of the sbatch command in the form:

sbatch --array=0-31 other_options job_script.sh parameters

which specifies the job array IDs to be 0, 1, 2, …, 31

or

sbatch --array=1,3,5,7 other_options job_script.sh parameters

which specifies the job array IDs to be 1, 3, 5, 7

or

sbatch --array=1-7:2 other_options job_script.sh parameters

which runs with IDs starting at 1 and stepping by 2 (1, 1+2, 1+2+2, …), i.e. 1, 3, 5, 7

Within the job script job_script.sh above, the array ID of each element is provided by the environment variable SLURM_ARRAY_TASK_ID, the number of elements by SLURM_ARRAY_TASK_COUNT, and the minimum and maximum ID values by SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.
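For example, a minimal job_script.sh might use the task ID to select a per-element input file (my_program and the input file naming are illustrative):

#!/bin/bash
# Typical SBATCH commands here

# Each array element receives a different SLURM_ARRAY_TASK_ID,
# used here to pick a distinct input file for that element.
echo "Element $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT"
./my_program input_${SLURM_ARRAY_TASK_ID}.dat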

Caveat for Single Core Array Job Elements

The above is not recommended if each array job element does not use an entire node.

Single Core Array Job Elements

Due to node exclusivity, an array job composed of single-core jobs will use one node per single-core job. This is unlikely to be what is intended.

The preferred method in this case is to split the job into two scripts, master.sh and slave.sh. The master.sh script is then submitted as an array job via:

sbatch --array=1-2 other_options master.sh parameters

This submits two array job elements, one with index 1 and the other with index 2.

master.sh is then defined as:

#!/bin/bash
# Typical SBATCH commands here

module purge
module load mpi
# Launch 28 copies of slave.sh, one per core on the node
mpirun -np 28 slave.sh

where module load mpi should be replaced by a module load of your preferred MPI version.

slave.sh is defined along the lines of:

#!/bin/bash
module purge
module load required_modules
# Map (array element, local rank) to a unique 1-based index
INDEX=$(( (SLURM_ARRAY_TASK_ID - 1) * SLURM_CPUS_ON_NODE + SLURM_LOCALID + 1 ))
# now use $INDEX as the index for the job as if it were an array job index

In slave.sh, INDEX will take the values 1..56 inclusive across the two nodes allocated to the two array elements (which may run concurrently or consecutively, depending on available resources): the first element (SLURM_ARRAY_TASK_ID=1) yields INDEX values 1..28, and the second yields 29..56.
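For illustration, slave.sh could then select its data exactly as a plain array job would (the program name and file naming are again hypothetical):

# Each of the 56 slave.sh instances processes one input file
./my_program input_${INDEX}.dat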

Caveat for Single Core Jobs

The above is only optimal if all the sub jobs that run on a node last approximately the same time; otherwise the node remains allocated until its slowest sub job finishes, and the accounted resources are wasted.

Effect on Accounting

In essence, in the example above you are running two jobs (the two master.sh array elements), not 56, and accounting records will reflect this.

Multiple Core Jobs, Each Using Fewer Than 28 Cores

This is an adaptation of the above.

In the example below, it is assumed that seven 4-core sub jobs are to be run on each node.

master.sh is defined as:

#!/bin/bash
# Typical SBATCH commands here
module purge
module load mpi
# Launch 7 copies of slave.sh, one per 4-core sub job
mpirun -np 7 slave.sh

slave.sh is defined along the lines of:

#!/bin/bash
module purge
module load required_modules
# Map (array element, local rank) to a unique 1-based index
INDEX=$(( (SLURM_ARRAY_TASK_ID - 1) * SLURM_CPUS_ON_NODE / SLURM_CPUS_PER_TASK + SLURM_LOCALID + 1 ))
# now use $INDEX as the index for the job as if it were an array job index
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK # If your job uses OpenMP, for example

Run with:

sbatch --array=1-2 --cpus-per-task=4 other_options master.sh parameters

This uses INDEX values of 1..14.
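To make the index arithmetic concrete, the mapping for the two array elements works out as follows (assuming 28-core nodes):

# SLURM_ARRAY_TASK_ID=1: (1-1)*28/4 + (0..6) + 1  ->  INDEX 1..7
# SLURM_ARRAY_TASK_ID=2: (2-1)*28/4 + (0..6) + 1  ->  INDEX 8..14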

Jobs Using Fewer Cores Than a Whole Node, of Uneven Run Lengths

Use the task farming pattern, in conjunction with array jobs. This is advanced usage.
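As a rough sketch of the pattern (all names and values below are illustrative, not site-specific), each slave process can repeatedly claim the next unit of work from a shared counter file, serialised with flock, so that faster processes simply pick up more tasks:

#!/bin/bash
# Illustrative task-farm worker: claim task numbers from a shared
# counter file until all work units have been handed out.
COUNTER=/shared/path/counter   # pre-seeded with 0 before submission
NTASKS=1000                    # total number of work units (example value)

while true; do
    # Atomically read and increment the shared counter.
    TASK=$(flock "$COUNTER" sh -c \
        "n=\$(cat '$COUNTER'); echo \$((n+1)) > '$COUNTER'; echo \$n")
    [ "$TASK" -ge "$NTASKS" ] && break
    ./my_program input_${TASK}.dat   # my_program is a placeholder
done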

Additional Documentation

SLURM webpages
