Is there a way to run scripts with dependencies via GNU parallel?
I wish to run the following scripts:
aa_00.sh # run time ~6 hr
aa_01.sh # dependent on aa_00.sh; ~6 hr
aa_02.sh # dependent on aa_00.sh; ~6 hr
aa_03.sh # dependent on aa_00.sh; ~6 hr
bb_00.sh # run time ~2 hr
bb_01.sh # dependent on bb_00.sh; ~2 hr
bb_02.sh # dependent on bb_00.sh; ~2 hr
bb_03.sh # dependent on bb_00.sh; ~2 hr
Scripts aa_01.sh, aa_02.sh, and aa_03.sh must not run until script aa_00.sh completes.
Scripts aa_01.sh, aa_02.sh, and aa_03.sh are completely independent of each other and can run in parallel.
Similarly, scripts bb_01.sh, bb_02.sh, and bb_03.sh must not run until script bb_00.sh completes.
Scripts bb_01.sh, bb_02.sh, and bb_03.sh are completely independent of each other and can run in parallel.
I have 4 CPUs [*].
[*] Actually, I am using GPUs so I am using:
'eval CUDA_VISIBLE_DEVICES={%} {}'
# I removed the "({%} - 1)" notation just for simplicity here
Is there a way to run these 8 scripts efficiently such that the dependencies on aa_00.sh and bb_00.sh are respected?
One idea I had was to release the subsequent aa_{01,02,03}.sh scripts via parallel when aa_00.sh completes, and likewise to release the bb_{01,02,03}.sh scripts via parallel when bb_00.sh completes. But because two separate runs of parallel are used, the bb_* run doesn't know that aa_* scripts are running (and vice versa), so both runs can hand out the same GPU at the same time:
cat commands_aa.txt
aa_01.sh
aa_02.sh
aa_03.sh
CUDA_VISIBLE_DEVICES=0 aa_00.sh
parallel -j4 -a commands_aa.txt 'eval CUDA_VISIBLE_DEVICES={%} {}'
cat commands_bb.txt
bb_01.sh
bb_02.sh
bb_03.sh
CUDA_VISIBLE_DEVICES=1 bb_00.sh
parallel -j4 -a commands_bb.txt 'eval CUDA_VISIBLE_DEVICES={%} {}'
Conceptually, I'd like to add inputs to an already running parallel command. I tried overwriting the -a commands.txt file when parallel was already running but that did not achieve what I wanted (I would have been shocked if that did work).
In actuality, I have more than just aa and bb scripts; I have as many as 8 or 10 families (i.e., aa, bb, ..., hh, ii, ...). And I have more than 3 scripts that run after the _00 script; I have 12 in total: _00, _01, ..., _11. All of them depend on their respective _00 script.
I was looking at the Python library luigi, too. luigi can handle dependencies, but I don't think it can handle parallelization. I also looked at joblib.Parallel() from the Python library joblib. Perhaps I need to combine luigi and joblib.Parallel().
Thank you.
Additional Thoughts
- I do think what I need is to have each _00 script add its dependents upon its completion.
- But I need to add these dependents to the list that parallel is already working on.
Something like this (conceptually):
commands.txt contains:
aa_00.sh
bb_00.sh
- run parallel:
parallel -j4 -a commands.txt 'eval CUDA_VISIBLE_DEVICES={%} {}'
CUDA_VISIBLE_DEVICES=1 <-- aa_00.sh
CUDA_VISIBLE_DEVICES=2 <-- bb_00.sh
When bb_00.sh completes, it appends its dependents to the bottom of commands.txt, like so:
commands.txt updated:
aa_00.sh # still running on GPU 1
bb_00.sh # this completed on GPU 2
bb_01.sh # these new scripts are
bb_02.sh # appended to
bb_03.sh # commands.txt
Somehow, parallel is magically okay with these new lines of input, and the new scripts are queued to GPUs 3, 4, and 2.
CUDA_VISIBLE_DEVICES=3 <-- bb_01.sh
CUDA_VISIBLE_DEVICES=4 <-- bb_02.sh
CUDA_VISIBLE_DEVICES=2 <-- bb_03.sh
CUDA_VISIBLE_DEVICES=1 <-- still running aa_00.sh
bb_01.sh completes on GPU 3; it has no dependents, so nothing is appended to commands.txt.
The joblog would look something like:
aa_00.sh GPU=1 running
bb_00.sh GPU=2 completed
bb_01.sh GPU=3 completed
bb_02.sh GPU=4 running
bb_03.sh GPU=2 running
Eventually aa_00.sh completes, so it appends its dependents to the bottom of commands.txt.
commands.txt updated:
aa_00.sh # completed on GPU 1
bb_00.sh # completed on GPU 2
bb_01.sh # completed on GPU 3
bb_02.sh # running on GPU 4
bb_03.sh # running on GPU 2
aa_01.sh # these new scripts are
aa_02.sh # appended to
aa_03.sh # commands.txt
Again, parallel is magically okay with these new lines of input, so it dishes out the new scripts to available GPUs.
CUDA_VISIBLE_DEVICES=3 <-- aa_01.sh
CUDA_VISIBLE_DEVICES=1 <-- aa_02.sh
Suppose bb_02.sh completes next, freeing up GPU 4.
CUDA_VISIBLE_DEVICES=4 <-- aa_03.sh
Now the joblog looks something like:
aa_00.sh GPU=1 completed
bb_00.sh GPU=2 completed
bb_01.sh GPU=3 completed
bb_02.sh GPU=4 completed
bb_03.sh GPU=2 completed
aa_01.sh GPU=3 running
aa_02.sh GPU=1 running
aa_03.sh GPU=4 running
(I may have mixed up the numbering and surely the timing isn't correct (aa runs 3x longer than bb), but hopefully I explained the ordering correctly.)
It's the "magical" part of parallel that I'm unsure of.
Answer:
Look at https://www.gnu.org/software/parallel/man.html#example-gnu-parallel-as-queue-system-batch-manager
So something like:
true >jobqueue; tail -n 0 -f jobqueue | parallel -j4 'eval CUDA_VISIBLE_DEVICES={%} {}'
echo "aa_00.sh; (echo aa_01.sh; echo aa_02.sh; echo aa_03.sh) >> jobqueue" >> jobqueue
echo "bb_00.sh; (echo bb_01.sh; echo bb_02.sh; echo bb_03.sh) >> jobqueue" >> jobqueue
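With 8 to 10 script families and 11 dependents each, typing those queue lines by hand gets tedious. Below is a minimal sketch of a generator, assuming the naming scheme from the question (<prefix>_00.sh followed by <prefix>_01.sh, <prefix>_02.sh, ...). queue_line is a hypothetical helper name, and using && where the lines above use ; is an assumption that the dependents should be skipped when the _00 script fails.

```shell
# Print one job-queue line for a script family.
# usage: queue_line <prefix> <last-index> <queue-file>
# (hypothetical helper; written for plain POSIX sh)
queue_line() (
  prefix=$1 last=$2 queue=$3
  i=1 deps=""
  while [ "$i" -le "$last" ]; do
    # Build "echo aa_01.sh; echo aa_02.sh; ..." with "; " between entries.
    deps="${deps}${deps:+; }$(printf 'echo %s_%02d.sh' "$prefix" "$i")"
    i=$((i + 1))
  done
  # Assumption: "&&" so the dependents are queued only if <prefix>_00.sh succeeds.
  printf '%s_00.sh && (%s) >> %s\n' "$prefix" "$deps" "$queue"
)

# Example use (not executed here), feeding the queue that parallel is tailing:
#   for p in aa bb; do queue_line "$p" 3 jobqueue; done >> jobqueue
```

For the real workload the loop would list all prefixes and pass 11 as the last index. Replace && with ; in the final printf to get exactly the behaviour of the hand-written lines.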
We are clearly in territory where there must be better tools: GNU Parallel does not have a dependency graph like make has.
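One alternative that needs no extra tool is to let each family wait on its own _00 script and draw GPU ids from a shared pool. This is not GNU Parallel and not something the linked manual describes; it is a rough sketch in plain sh under several assumptions: the function names, the FIFO path, and file descriptor 9 are all hypothetical choices, GPU ids are single characters (0 to 9) so that a one-byte read always yields a whole id, and opening a FIFO read/write without blocking relies on Linux behaviour.

```shell
# A FIFO used as a pool of free GPU ids: take one byte before a job starts,
# write it back when the job ends. All names here are hypothetical.

# usage: gpu_pool_init <fifo-path> <single-character gpu id>...
gpu_pool_init() {
  mkfifo "$1" || return
  exec 9<>"$1"            # read/write open, so the open does not block (Linux)
  shift
  printf '%s' "$@" >&9    # one byte per free GPU, e.g. "0123"
}

# usage: run_on_gpu <command> [args...]
run_on_gpu() (
  gpu=$(dd bs=1 count=1 <&9 2>/dev/null)   # blocks until a GPU id is free
  CUDA_VISIBLE_DEVICES=$gpu "$@"
  status=$?
  printf '%s' "$gpu" >&9                   # hand the GPU back to the pool
  exit "$status"
)

# usage: run_group <prefix> <last-index>
# Runs <prefix>_00.sh first; only if it succeeds, runs _01.._NN in parallel.
run_group() (
  prefix=$1 last=$2 i=1
  run_on_gpu "./${prefix}_00.sh" || exit
  while [ "$i" -le "$last" ]; do
    run_on_gpu "$(printf './%s_%02d.sh' "$prefix" "$i")" &
    i=$((i + 1))
  done
  wait    # note: failures of individual dependents are not propagated
)

# Example use (not executed here):
#   gpu_pool_init /tmp/gpu_pool 0 1 2 3
#   for p in aa bb; do run_group "$p" 3 & done
#   wait
```

Every job blocks until a GPU id is available, so the number of ids in the pool caps how many scripts run at once, across all families. Unlike the job-queue approach there is no joblog or retry handling; those would have to be added by hand.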
