I need to extract the string between CAKE_FROSTING(" and ",. If the string spans multiple lines, the quotation marks at the line breaks must be removed and each newline replaced by a single space. I have a command (thanks, Stack Overflow) that does something in that direction, but not exactly. How can I fix it, and could you briefly explain the fixes? I am using bash on Linux.
sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s*?"([^,]*).*/\1/p;ba' filesToCheck/* > result.txt
filesToCheck/file.h:
something
CAKE_FROSTING(
"is supreme",
"[i][agree]") something else
something more
something else
CAKE_FROSTING(
"is."kinda" neat"
"in fact",
"[i][agree]") something else
something more
Current result.txt:
is supreme"
is."kinda" neat"
Desired result.txt:
is supreme
is."kinda" neat in fact
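The stray closing quote in the current output comes from the capture group ([^,]*) in the command above: it stops only at a comma, so the " just before the comma ends up in \1. A minimal single-line sketch, assuming GNU or BSD sed with -E; it deliberately leaves out the multi-line :a;N;ba loop, so it shows only the capture-group behaviour, not a complete fix (the variable names are illustrative):

```shell
# One line of the question's sample input, taken in isolation.
line='"is supreme",'

# Like the question's command: ([^,]*) runs up to the comma, so the
# closing quote in front of it ends up in \1.
greedy=$(printf '%s\n' "$line" | sed -E 's/^"([^,]*).*/\1/')

# Requiring a literal quote and comma after the group makes the engine back
# off by one character, so \1 stops before the closing quote.
fixed=$(printf '%s\n' "$line" | sed -E 's/^"([^,]*)",.*/\1/')

printf 'greedy: %s\n' "$greedy"   # greedy: is supreme"
printf 'fixed:  %s\n' "$fixed"    # fixed:  is supreme
```

Both answers rely on the same idea: the sed answer follows its group with ["], and the Perl answer follows ([^,]*) with a literal ".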
Edit: With help from @D_action, I now have:
sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s*?"([^,]*).*,/\1/p;ba' filesToCheck/* > result.txt
This produces almost the correct output, but there are unnecessary quotation marks and an extra newline in the output:
Current result.txt:
is supreme"
is."kinda" neat"
"in fact"
Answer:
Using GNU sed:
$ sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s"([^"]*[^\n,]*)["].*\n"([[:alpha:] ]+)?.*/\1 \2/p;ba' input_file
is supreme
is."kinda" neat in fact
Answer:
You can also use Perl here: match the string between CAKE_FROSTING( and the closing quote before the first comma, then, only inside each match, remove the double quotes at the start/end of lines and replace line breaks with spaces:
perl -0777 -ne 'while (/CAKE_FROSTING\(\s*"([^,]*)"/g) {$a=$1; $a =~ s/^"|"$|(\R+)/$1?" ":""/gme; print "$a\n"}' file
See the online demo. Note that -0777 slurps the whole file, so the regex engine can "see" the line breaks.
The CAKE_FROSTING\(\s*"([^,]*)" pattern matches CAKE_FROSTING(, zero or more whitespace characters (newlines included) and a ", then captures into Group 1 zero or more non-comma characters (which may also be newlines) up to the right-most " before the next comma.
The $a=$1; $a =~ s/^"|"$|(\R+)/$1?" ":""/gme; print "$a\n" part assigns the Group 1 value to the $a variable. In the substitution, ^"|"$|(\R+) matches a " at the start or end of a line, or captures one or more line breaks (\R+) into Group 1; if Group 1 matched, the replacement is a space, otherwise it is an empty string. Only the cleaned contents of $a are printed, one per line.
