I am running around 200 MATLAB scripts on a Slurm cluster. The scripts are not explicitly parallelized, but they rely heavily on vectorized operations, so each one uses around 5-6 cores of processing power.
The sbatch script I am using is below:
#!/bin/bash
#SBATCH --job-name=sdmodel
#SBATCH --output=logs/out/%a
#SBATCH --error=logs/err/%a
#SBATCH --nodes=1
#SBATCH --partition=common
#SBATCH --exclusive
#SBATCH --mem=0
#SBATCH --array=1-225
module load Matlab/R2021a
matlab -nodisplay -r "run('main_cluster2.m'); exit"
The script above assigns one whole cluster node to each MATLAB task (225 such tasks in total). However, some cluster nodes have 20 or more cores, which means I could efficiently use one node to run 3 or 4 tasks simultaneously. Is there any way to modify the script above to do so?
CodePudding user response:
Provided the cluster is configured to allow node sharing, you can remove the line #SBATCH --exclusive, which requests that a full node be allocated to each job in the array, and replace it with
#SBATCH --cpus-per-task=5
to request 5 CPUs on the same node for each job in the array.
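On a 20-core node, Slurm will then be able to place 4 such jobs. Note that #SBATCH --mem=0 requests all of the memory on the node, so it should also be replaced with an explicit memory request (for example via --mem or --mem-per-cpu). Otherwise, on clusters that track memory as a consumable resource, a single job holding all of the node's memory will keep the other jobs off that node.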
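One related detail, sketched here under assumptions rather than taken from the question: MATLAB's implicit multithreading may size its thread pool from the cores it detects on the node rather than from the CPUs Slurm granted. A possible way to keep the two in line is to pass the value of SLURM_CPUS_PER_TASK (which Slurm sets when --cpus-per-task is given) to MATLAB's maxNumCompThreads. The helper name matlab_command is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build the string passed to `matlab -r`, capping MATLAB's
# computational threads at the number of CPUs Slurm allocated to this job.
# Falls back to 1 thread if SLURM_CPUS_PER_TASK is unset (e.g. outside a Slurm job).
matlab_command() {
    local threads="${SLURM_CPUS_PER_TASK:-1}"
    echo "maxNumCompThreads(${threads}); run('main_cluster2.m'); exit"
}

# Print the command string so it can be inspected before use.
matlab_command
```

In the job script, the last line would then become something like: matlab -nodisplay -r "$(matlab_command)"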
CodePudding user response:
If node sharing is not allowed, you should be able to use multiple srun commands in the script to subdivide the node. If you wanted to use 4 cores per task (on a 20-core node), your script would change to something like:
#!/bin/bash
#SBATCH --job-name=sdmodel
#SBATCH --output=logs/out/%a
#SBATCH --error=logs/err/%a
#SBATCH --nodes=1
#SBATCH --partition=common
#SBATCH --exclusive
#SBATCH --mem=0
#SBATCH --array=1-225
module load Matlab/R2021a
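# Launch 5 copies in the background, each as its own job step with 1 task and 4 CPUs.
# (--ntasks=4 would instead start 4 copies of MATLAB per srun.)
for i in $(seq 1 5)
do
    srun --ntasks=1 --cpus-per-task=4 --exact matlab -nodisplay -r "run('main_cluster2.m'); exit" &
done
# Wait for all background job steps to finish before the job script exits.
wait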
The "&" at the end of each srun command puts the command into the background, so the loop can move straight on to launching the next copy. The wait at the end makes sure the script waits for all backgrounded processes to finish before exiting.
Note that this may waste resources if the individual MATLAB runs take very different amounts of time: some runs will finish before others, leaving their cores idle until the whole job ends.
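One thing the loop leaves open is how each copy knows which piece of work to do, since all copies launched inside one array job see the same SLURM_ARRAY_TASK_ID. A minimal sketch of one possible mapping follows, assuming 5 copies per job and work items numbered from 1. The function name global_index is hypothetical, and how main_cluster2.m actually picks its work item is not shown in the question.

```shell
#!/bin/bash
# Hypothetical helper: map (array task id, slot within the node) to a global work index.
# per_node is the number of MATLAB copies launched by each array job.
global_index() {
    local array_id=$1 slot=$2 per_node=$3
    echo $(( (array_id - 1) * per_node + slot ))
}

# Example: array job 3 with 5 copies per node covers work items 11 to 15.
for slot in $(seq 1 5); do
    global_index 3 "$slot" 5
done
```

With 225 work items and 5 copies per job, the array would shrink to --array=1-45. Inside the loop, the index could be handed to each copy through an environment variable (the name is an assumption), for example by prefixing the srun command with TASK_INDEX=$(global_index "$SLURM_ARRAY_TASK_ID" "$i" 5), and read in MATLAB with getenv.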
