Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily; job arrays with millions of tasks can be submitted in milliseconds (subject to configured size limits). All jobs must have the same initial options (e.g. size, time limit); however, some of these options can be changed after the job has begun execution using the scontrol command.
The particular power of job arrays lies in running SIMD (Single Instruction, Multiple Data) workloads: each array element receives a Slurm environment variable whose value differs between elements, which the element can use to select its own data. The same variable can also be used to select other distinct parameters.
Arrays are specified as part of the sbatch command in the form:
sbatch --array=0-31 other_options job_script.sh parameters
which specifies the job array IDs to be 0, 1, 2, ..., 31
sbatch --array=1,3,5,7 other_options job_script.sh parameters
which specifies the job array IDs to be 1, 3, 5, 7
sbatch --array=1-7:2 other_options job_script.sh parameters
which uses a step size of 2, giving IDs 1, 1+2, 1+2+2, etc., i.e. 1, 3, 5, 7
Within the job script job_script.sh above, the array ID is provided by the environment variable SLURM_ARRAY_TASK_ID, the number of sub-jobs by SLURM_ARRAY_TASK_COUNT, and the minimum and maximum ID values by SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.
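As an illustration, a minimal job_script.sh might use SLURM_ARRAY_TASK_ID to pick a per-element input file. The input_N.dat naming scheme and my_program are hypothetical placeholders, not part of the pattern itself:

```shell
#!/bin/bash
#SBATCH --time=00:10:00

# Each array element selects its own data via SLURM_ARRAY_TASK_ID.
# input_${SLURM_ARRAY_TASK_ID}.dat and ./my_program are placeholders
# for your own naming scheme and executable.
INPUT="input_${SLURM_ARRAY_TASK_ID}.dat"
echo "Element ${SLURM_ARRAY_TASK_ID} of ${SLURM_ARRAY_TASK_COUNT} (IDs ${SLURM_ARRAY_TASK_MIN}-${SLURM_ARRAY_TASK_MAX}): processing ${INPUT}"
# ./my_program "$INPUT"
```

Submitted with, say, --array=0-31, this runs 32 elements, each reading a different input file.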
Caveat for Single Core Array Job Elements
The above is not recommended if each array job element does not use an entire node.
Single Core Array Job Elements
Due to node exclusivity, an array job composed of single-core jobs will use one node per single-core job. This is unlikely to be what is intended.
The preferred method in this case is to split the job into two scripts, master.sh and slave.sh. The master.sh script is then submitted as an array job via:
sbatch --array=1-2 other_options master.sh parameters
where the above submits two array job elements, one with an index of 1, and the second with an index of 2.
master.sh is then defined as:
#!/bin/bash
# Typical SBATCH commands here
module purge
module load mpi
mpirun -np 28 slave.sh
where module load mpi is replaced by a module load of your preferred MPI version.
slave.sh is defined along the lines:
#!/bin/bash
module purge
module load required_modules
INDEX=$(((SLURM_ARRAY_TASK_ID - 1) * SLURM_CPUS_ON_NODE + SLURM_LOCALID + 1))
# now use $INDEX as the index for the job as if it were an array job index
In slave.sh, INDEX will take the values 1..56 inclusive across the two nodes that will be allocated to the job (either concurrently or consecutively, depending on available resources).
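The mapping from array ID and node-local rank to a unique 1-based index can be checked without Slurm. This standalone sketch emulates two array elements of 28 ranks each, printing only the first and last rank per element:

```shell
#!/bin/bash
# Emulate two array elements (one node each, 28 ranks per node) and
# print the first and last INDEX each element produces; no Slurm needed.
CPUS_ON_NODE=28
for TASK_ID in 1 2; do          # stands in for SLURM_ARRAY_TASK_ID
  for LOCALID in 0 27; do       # first and last rank (SLURM_LOCALID)
    INDEX=$(((TASK_ID - 1) * CPUS_ON_NODE + LOCALID + 1))
    echo "element ${TASK_ID}, rank ${LOCALID}: INDEX=${INDEX}"
  done
done
# INDEX spans 1..28 on the first element and 29..56 on the second
```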
Caveat for Single Core Jobs
The above is only optimal if all the sub-jobs that run on a node take the same time; otherwise, allocated (and therefore accounted) resources are wasted.
Effect on Accounting
In the example above, you are actually running (and being accounted for) two jobs, i.e. the two master.sh array elements, not 56 jobs.
Multiple Core Jobs, Each Fewer Than 28 Cores
This is an adaptation of the above.
In the example below, it is assumed that each array element runs seven 4-core jobs (filling the 28 cores of a node).
master.sh is defined as:
#!/bin/bash
# Typical SBATCH commands here
module purge
module load mpi
mpirun -np 7 slave.sh
slave.sh is defined along the lines:
#!/bin/bash
module purge
module load required_modules
INDEX=$(((SLURM_ARRAY_TASK_ID - 1) * SLURM_CPUS_ON_NODE / SLURM_CPUS_PER_TASK + SLURM_LOCALID + 1))
# now use $INDEX as the index for the job as if it were an array job index
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK # If your job uses OpenMP, for example
The job is then submitted via:
sbatch --array=1-2 --cpus-per-task=4 other_options master.sh parameters
This uses INDEX values of 1..14.
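As before, the index arithmetic can be verified outside Slurm. With 28 cores per node and 4 cores per task there are 7 ranks per node, so two array elements should produce INDEX values 1..14:

```shell
#!/bin/bash
# Check the 4-cores-per-task index mapping: 7 ranks per 28-core node,
# two array elements, expected INDEX range 1..14.
CPUS_ON_NODE=28
CPUS_PER_TASK=4
RANKS_PER_NODE=$((CPUS_ON_NODE / CPUS_PER_TASK))   # 7
for TASK_ID in 1 2; do
  for LOCALID in 0 $((RANKS_PER_NODE - 1)); do     # first and last rank
    INDEX=$(((TASK_ID - 1) * CPUS_ON_NODE / CPUS_PER_TASK + LOCALID + 1))
    echo "element ${TASK_ID}, rank ${LOCALID}: INDEX=${INDEX}"
  done
done
```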
Jobs Using Fewer Cores than a Whole Node, with Uneven Run Lengths
Use the task farming pattern, in conjunction with array jobs. This is advanced usage.
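For reference, the core of the task-farming pattern can be sketched in plain bash: a fixed pool of workers repeatedly claims the next task number from a shared counter, so short and long tasks balance out automatically. This is an illustrative sketch only; the counter file, flock-based locking, and worker count are choices made for this example, not a production harness:

```shell
#!/bin/bash
# Illustrative task farm: N_WORKERS workers drain N_TASKS tasks of
# (potentially) uneven length from a shared counter.
N_TASKS=10
N_WORKERS=4
COUNTER=$(mktemp)
RESULTS=$(mktemp)
echo 1 > "$COUNTER"

next_task() {                    # atomically claim the next task number
  (
    flock -x 9
    n=$(cat "$COUNTER")
    echo $((n + 1)) > "$COUNTER"
    echo "$n"
  ) 9>"$COUNTER.lock"
}

worker() {
  while :; do
    t=$(next_task)
    [ "$t" -gt "$N_TASKS" ] && break
    echo "$t" >> "$RESULTS"      # replace with the real per-task work
  done
}

for w in $(seq "$N_WORKERS"); do worker "$w" & done
wait
echo "completed $(wc -l < "$RESULTS") tasks"
```

Combined with an array job, each array element would run one such pool on its own node.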