GNU parallel with dependencies?

Is there a way to run scripts with dependencies via GNU parallel?

I wish to run the following scripts:

aa_00.sh     # run time ~6 hr
aa_01.sh     # dependent on aa_00.sh; ~6 hr
aa_02.sh     # dependent on aa_00.sh; ~6 hr
aa_03.sh     # dependent on aa_00.sh; ~6 hr

bb_00.sh     # run time ~2 hr
bb_01.sh     # dependent on bb_00.sh; ~2 hr
bb_02.sh     # dependent on bb_00.sh; ~2 hr
bb_03.sh     # dependent on bb_00.sh; ~2 hr

Scripts aa_01.sh, aa_02.sh, and aa_03.sh must not run until script aa_00.sh completes.

Scripts aa_01.sh, aa_02.sh, and aa_03.sh are completely independent of each other and can run in parallel.

Similarly, scripts bb_01.sh, bb_02.sh, and bb_03.sh must not run until script bb_00.sh completes.

Scripts bb_01.sh, bb_02.sh, and bb_03.sh are completely independent of each other and can run in parallel.

I have 4 CPUs [*].

[*] Actually, I am using GPUs so I am using:

'eval CUDA_VISIBLE_DEVICES={%} {}'
# I removed the "({%} - 1)" notation here just for simplicity

Is there a way to run these 8 scripts efficiently such that the dependencies on aa_00.sh and bb_00.sh are respected?

One idea I had was to release the aa_{01,02,03}.sh scripts via parallel once aa_00.sh completes, and likewise release the bb_{01,02,03}.sh scripts once bb_00.sh completes. But because two separate runs of parallel are used, the run handling the bb_* scripts doesn't know which GPUs the aa_* scripts are using (and vice versa):

cat commands_aa.txt
  aa_01.sh
  aa_02.sh
  aa_03.sh
CUDA_VISIBLE_DEVICES=0 aa_00.sh
parallel -j4 -a commands_aa.txt 'eval CUDA_VISIBLE_DEVICES={%} {}'

cat commands_bb.txt
  bb_01.sh
  bb_02.sh
  bb_03.sh
CUDA_VISIBLE_DEVICES=1 bb_00.sh
parallel -j4 -a commands_bb.txt 'eval CUDA_VISIBLE_DEVICES={%} {}'

Conceptually, I'd like to add inputs to an already running parallel command. I tried overwriting the -a commands.txt file when parallel was already running but that did not achieve what I wanted (I would have been shocked if that did work).

In actuality, I have more than just the aa and bb scripts; I have as many as 8 or 10 groups (i.e., aa, bb, ..., hh, ii, ...). And each group has more than 3 scripts that run after the _00 script; there are 12 scripts per group in total: _00, _01, ..., _11. All of them depend on their respective _00 script.

I was also looking at the Python library luigi. luigi can handle dependencies, but I don't think it can handle parallelization. I also looked at joblib.Parallel() in Python. Perhaps I need to combine luigi and joblib.Parallel().

Thank you.

Additional Thoughts

I do think what I need is to have each _00 script add its dependents upon its completion.
But I need to add these dependents to the list that parallel is already working on.

Something like this (conceptually):

commands.txt contains:

aa_00.sh
bb_00.sh

run parallel:

parallel -j4 -a commands.txt 'eval CUDA_VISIBLE_DEVICES={%} {}'

CUDA_VISIBLE_DEVICES=1 <-- aa_00.sh
CUDA_VISIBLE_DEVICES=2 <-- bb_00.sh
When bb_00.sh completes, it appends its dependents to the bottom of commands.txt, like so:
commands.txt updated:

aa_00.sh  # still running on GPU 1
bb_00.sh  # this completed on GPU 2
bb_01.sh  # these new scripts are
bb_02.sh  #   appended to
bb_03.sh  #   commands.txt

Somehow, parallel magically is okay with these new lines of input and these new scripts are queued to GPUs 3, 4, and 2.
CUDA_VISIBLE_DEVICES=3 <-- bb_01.sh
CUDA_VISIBLE_DEVICES=4 <-- bb_02.sh
CUDA_VISIBLE_DEVICES=2 <-- bb_03.sh
CUDA_VISIBLE_DEVICES=1 <-- still running aa_00.sh
bb_01.sh completes on GPU 3; it has no dependents, so nothing is appended to commands.txt

The joblog would look something like:

aa_00.sh  GPU=1  running
bb_00.sh  GPU=2  completed
bb_01.sh  GPU=3  completed
bb_02.sh  GPU=4  running
bb_03.sh  GPU=2  running

Eventually aa_00.sh completes, so it appends its dependents to the bottom of commands.txt.
commands.txt updated:

aa_00.sh  # completed on GPU 1
bb_00.sh  # completed on GPU 2
bb_01.sh  # completed on GPU 3
bb_02.sh  # running on GPU 4
bb_03.sh  # running on GPU 2
aa_01.sh  # these new scripts are
aa_02.sh  #   appended to
aa_03.sh  #   commands.txt

Again, parallel is magically okay with these new lines of input so it dishes out the new scripts to available GPUs.
CUDA_VISIBLE_DEVICES=3 <-- aa_01.sh
CUDA_VISIBLE_DEVICES=1 <-- aa_02.sh
Suppose bb_02.sh completes next, freeing up GPU 4.
CUDA_VISIBLE_DEVICES=4 <-- aa_03.sh

Now the joblog looks something like:

aa_00.sh  GPU=1  completed
bb_00.sh  GPU=2  completed
bb_01.sh  GPU=3  completed
bb_02.sh  GPU=4  completed
bb_03.sh  GPU=2  completed
aa_01.sh  GPU=3  running
aa_02.sh  GPU=1  running
aa_03.sh  GPU=4  running

(I may have mixed up the numbering and surely the timing isn't correct (aa runs 3x longer than bb), but hopefully I explained the ordering correctly.)

It's the "magical" part of parallel that I'm unsure of.

Answer:

Look at https://www.gnu.org/software/parallel/man.html#example-gnu-parallel-as-queue-system-batch-manager

So something like:

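# Terminal 1: start with an empty queue and let parallel run each line as it is appended.
true >jobqueue; tail -n 0 -f jobqueue | parallel -j4 'eval CUDA_VISIBLE_DEVICES={%} {}'
# Terminal 2 (tail -f never exits): each line runs a _00 script, then queues its dependents.
echo "aa_00.sh; (echo aa_01.sh; echo aa_02.sh; echo aa_03.sh) >> jobqueue" >> jobqueue
echo "bb_00.sh; (echo bb_01.sh; echo bb_02.sh; echo bb_03.sh) >> jobqueue" >> jobqueue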
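
With 8 or 10 groups and 11 dependents each, writing those echo lines by hand gets tedious. A minimal bash sketch that generates them could look like the following. The function name queue_line is made up for this example, and it uses && instead of ; on the assumption that dependents should not be queued when the _00 script fails:

```shell
#!/usr/bin/env bash
# queue_line PREFIX LAST QUEUEFILE   (hypothetical helper, not part of GNU parallel)
# Prints one queue line: run PREFIX_00.sh and, if it succeeds, append
# PREFIX_01.sh .. PREFIX_<LAST>.sh to QUEUEFILE.
queue_line() {
  local prefix=$1 last=$2 queue=$3 i name deps=""
  for ((i = 1; i <= last; i++)); do
    printf -v name '%s_%02d.sh' "$prefix" "$i"
    # Join the echo commands with "; " (no separator before the first one).
    deps+="${deps:+; }echo $name"
  done
  # && (an assumption of this sketch): skip the dependents if _00 fails.
  printf '%s\n' "${prefix}_00.sh && ($deps) >> $queue"
}

# Show the lines for the two groups from the question.
for p in aa bb; do
  queue_line "$p" 3 jobqueue
done
```

To submit a group for real, append the generated line to the queue file that tail -f is following, for example: queue_line aa 11 jobqueue >> jobqueue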

We are clearly in territory where there must be better tools: GNU Parallel does not have a dependency graph like make has.
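
For comparison, here is a rough sketch of how the same dependencies could be written down for make, generated from bash. The function name emit_makefile and the *.done stamp files are conventions invented for this example. Note that make -j4 only limits how many jobs run at once; as far as I know it has no counterpart to {%}, so choosing a GPU per job would still have to be handled separately.

```shell
#!/usr/bin/env bash
# emit_makefile LAST PREFIX...   (hypothetical helper for this sketch)
# Prints a Makefile in which PREFIX_01 .. PREFIX_<LAST> each depend on PREFIX_00.
# A target counts as finished once its stamp file PREFIX_NN.done exists.
emit_makefile() {
  local last=$1; shift
  local prefix i t all=""
  # Collect every stamp file so that "all" builds the whole set.
  for prefix in "$@"; do
    for ((i = 0; i <= last; i++)); do
      printf -v t '%s_%02d.done' "$prefix" "$i"
      all+=" $t"
    done
  done
  printf 'all:%s\n' "$all"
  printf '.PHONY: all\n'
  for prefix in "$@"; do
    # The _00 script has no prerequisites.
    printf '%s_00.done:\n\t%s_00.sh && touch $@\n' "$prefix" "$prefix"
    # Every other script waits for the _00 stamp file of its group.
    for ((i = 1; i <= last; i++)); do
      printf -v t '%s_%02d' "$prefix" "$i"
      printf '%s.done: %s_00.done\n\t%s.sh && touch $@\n' "$t" "$prefix" "$t"
    done
  done
}

# Print the Makefile for the two groups from the question.
emit_makefile 3 aa bb
```

Redirecting the output to a file named Makefile and running make -j4 would then start aa_00.sh and bb_00.sh first and release each group's dependents as soon as its _00 script finishes.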