Given a text file test.txt with contents:
hello
someline1
someline2
...
world1
line that shouldn't match
hello
someline1
someline2
...
world2
How can I store both of these multiline matches in separate array indexes?
I'm currently trying to use regex="hello.*world[12]"
Unfortunately I can only use native Bash, so Perl etc is off the table. Thanks
CodePudding user response:
Bash's regex matching has no equivalent of Python's findall() function, so we need to capture the matched substrings one at a time in a loop.
Would you please try the following:
#!/bin/bash
str=$(<test.txt)
regex="hello.world[12]"
while [[ $str =~ ($regex)(.*) ]]; do
    ary+=( "${BASH_REMATCH[1]}" )    # store the match into an array
    str="${BASH_REMATCH[2]}"         # remaining substring
done
for i in "${!ary[@]}"; do            # see the result
    echo "[$i] ${ary[$i]}"
done
Output:
[0] hello
world1
[1] hello
world2
[Edit]
If there are lines between "hello" and "world", we need to change the approach, because bash regex does not support shortest (non-greedy) matching. Then how about reading the file line by line:
regex1="hello"
regex2="world"
while IFS= read -r line; do
    if [[ $line =~ $regex1 ]]; then
        str="$line"$'\n'             # start a new block
        f=1
    elif (( f )); then
        str+="$line"$'\n'            # append the line to the current block
        if [[ $line =~ $regex2 ]]; then
            ary+=("$str")            # block complete, store it
            f=0
        fi
    fi
done < test.txt
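The remark about the missing shortest match can be checked with a small sketch. The sample string below is made up for illustration and is not read from test.txt. Because `.` also matches a newline and `.*` takes the longest possible match, the single regex from the question swallows everything from the first "hello" to the last "world2", so both blocks end up in one match:

```bash
#!/bin/bash
# Illustrative sample data (an assumption, not the real test.txt).
str=$'hello\nsomeline1\nworld1\nline that should not match\nhello\nsomeline1\nworld2'
regex='hello.*world[12]'

match=
if [[ $str =~ $regex ]]; then
    match=${BASH_REMATCH[0]}         # the whole matched text
fi

# Split the match into lines to see how far it reached.
# A shortest match would stop at "world1" and contain only 3 lines.
mapfile -t match_lines <<< "$match"
echo "matched ${#match_lines[@]} lines"
```

This is why the loop above tracks the start and end lines itself instead of relying on one regex.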
CodePudding user response:
I would use awk and mapfile (the -d option requires bash >= 4.4):
#!/bin/bash
mapfile -d '' arr < <(
awk '/hello/{f=1} f; /world[12]/{f=0; printf "\000"}' test.txt
)
Result:
arr=([0]=$'hello\nsomeline1\nsomeline2\n...\nworld1\n' [1]=$'hello\nsomeline1\nsomeline2\n...\nworld2\n')
or for older bash:
#!/bin/bash
arr=()
while IFS='' read -r -d '' block
do
    arr+=( "$block" )                # each NUL-terminated block becomes one element
done < <(
    awk '/hello/{f=1} f; /world[12]/{f=0; printf "\000"}' test.txt
)
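To see what the `read -d ''` loop does on its own, here is a minimal sketch that replaces the awk command with a stand-in. The function `produce_blocks` and its sample data are hypothetical and exist only for this illustration; it emits two blocks, each terminated by a NUL byte, which is the same shape of output the awk script is meant to produce:

```bash
#!/bin/bash
# Hypothetical stand-in for the awk output: two NUL-terminated blocks.
produce_blocks() {
    printf 'hello\nsomeline1\nworld1\n\0'
    printf 'hello\nsomeline2\nworld2\n\0'
}

arr=()
# -d '' makes read stop at each NUL byte instead of at each newline,
# so embedded newlines stay inside the block.
while IFS= read -r -d '' block; do
    arr+=( "$block" )
done < <(produce_blocks)

# Show each element; the blocks keep their trailing newline.
for i in "${!arr[@]}"; do
    printf '[%d] %s' "$i" "${arr[$i]}"
done
```

NUL is a convenient delimiter here because it cannot occur inside a bash string, so it can never be confused with the block contents.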
