How to store each occurrence of a multiline string in an array using bash regex

Given a text file test.txt with contents:

hello
someline1
someline2
...
world1

line that shouldn't match

hello
someline1
someline2
...
world2

How can I store both of these multiline matches in separate array indexes?

I'm currently trying to use regex="hello.*world[12]"

Unfortunately I can only use native Bash, so Perl etc. is off the table. Thanks.

CodePudding user response：

Bash's regex support has no equivalent of Python's findall() function, so we need to capture the matched substrings one at a time in a loop.

Would you please try the following:

#!/bin/bash

str=$(<test.txt)
regex="hello.world[12]"

while [[ $str =~ ($regex)(.*) ]]; do
    ary+=( "${BASH_REMATCH[1]}" )       # append the match to the array
    str="${BASH_REMATCH[2]}"            # remaining substring
done

for i in "${!ary[@]}"; do               # see the result
    echo "[$i] ${ary[$i]}"
done

Output:

[0] hello
world1
[1] hello
world2
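
To make the mechanics of that loop concrete, here is a small sketch of a single iteration. The one-line sample string is made up for illustration; the variable names first and rest are not part of the script above.

```bash
#!/bin/bash

str="hello world1 junk hello world2"   # made-up one-line sample
regex="hello.world[12]"

if [[ $str =~ ($regex)(.*) ]]; then
    first="${BASH_REMATCH[1]}"   # text matched by ($regex)
    rest="${BASH_REMATCH[2]}"    # everything after that match
    echo "first: [$first]"
    echo "rest:  [$rest]"
fi
```

Assigning rest back to str and repeating the test is what lets the loop walk through the matches one by one until nothing matches anymore.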

[Edit]
If there are lines between "hello" and "world", we need to change the approach, because bash regex does not support shortest (non-greedy) matching. Then how about:

regex1="hello"
regex2="world"

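while IFS= read -r line; do
    if [[ $line =~ $regex1 ]]; then
        str="$line"$'\n'                # start a new block
        f=1                             # flag: we are inside a block
    elif (( f )); then
        str+="$line"$'\n'               # append the line to the current block
        if [[ $line =~ $regex2 ]]; then
            ary+=("$str")               # block complete: store it
            f=0
        fi
    fi
done < test.txt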
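
To see why the single regex from the question is not enough, here is a minimal sketch run against an inline sample instead of test.txt. The sample text is an assumption that only mirrors the layout in the question. Because the match is greedy, hello.*world[12] runs from the first "hello" to the last "world2" and swallows the unrelated line in between.

```bash
#!/bin/bash

# Inline sample with two blocks and an unrelated line between them
# (made up here; it only mimics the structure of test.txt).
str=$'hello\nsomeline1\nworld1\n\nline that should not match\n\nhello\nsomeline1\nworld2'
regex="hello.*world[12]"

if [[ $str =~ $regex ]]; then
    match="${BASH_REMATCH[0]}"
    if [[ $match == *"should not match"* ]]; then
        echo "greedy: one match spans both blocks"
    fi
fi
```

That is the reason the loop above tracks the start and end lines with a flag rather than relying on one pattern.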

CodePudding user response：

I would use awk together with mapfile (the -d option needs bash 4.4 or later):

#!/bin/bash

mapfile -d '' arr < <(
    awk '/hello/{f=1} f; /world[12]/{f=0; printf "\000"}' test.txt
)

The resulting array is:

arr=([0]=$'hello\nsomeline1\nsomeline2\n...\nworld1\n' [1]=$'hello\nsomeline1\nsomeline2\n...\nworld2\n')

or for older bash:

#!/bin/bash
arr=()
while IFS='' read -r -d '' block
do
    arr+=( "$block" )
done < <(
    awk '/hello/{f=1} f; /world[12]/{f=0; printf "\000"}' test.txt
)
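
If the script has to run on both old and new bash, the two variants can be combined behind a version check. This is only a sketch: read_blocks is a hypothetical helper name, and the demo input comes from printf rather than awk so that it runs without test.txt.

```bash
#!/bin/bash

# Hypothetical helper (not from the answers above): read NUL-delimited
# records from stdin into the global array "arr".
read_blocks() {
    arr=()
    if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 4) )); then
        mapfile -d '' arr               # mapfile -d needs bash 4.4 or later
    else
        local block
        while IFS= read -r -d '' block; do
            arr+=( "$block" )           # fallback for older bash
        done
    fi
}

# Demo input: two NUL-terminated records, standing in for the awk output.
read_blocks < <(printf 'hello\nworld1\n\000hello\nworld2\n\000')

for i in "${!arr[@]}"; do
    printf '[%d] %q\n' "$i" "${arr[$i]}"
done
```

In real use the printf would be replaced by the awk command shown above, for example read_blocks < <(awk '...' test.txt).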