I am trying to replace a pattern that spans two lines of a file.
Specifically, I would like to replace ,\n & with , &\n in multiple large files. This effectively moves the & symbol to the end of the previous line. This is very easy with Ctrl+H (find and replace in a text editor), but I find it difficult with sed.
So, the initial file is in the following form:
A ,
& B -,
& C ),
& D ,
& E (,
& F *,
# & G -,
& H ,
& I (,
& J ,
K ?,
The desired output is:
A , &
B -, &
C ), &
D , &
E (, &
F *, &
# & G -,
H , &
I (, &
J ,
K ?,
Following previously answered questions on Stack Overflow, I tried to convert it with the commands below:
sed ':a;N;$!ba;s/,\n &/&\n /g' file1.txt > file2.txt
sed -i -e '$!N;/&/b1' -e 'P;D' -e:1 -e 's/\n[[:space:]]*/ /' file2.txt
but they fail when the file contains a line with the symbol #.
Is there a simpler way to do this replacement, something like:
sed -i 's/,\n &/, &\n /g' file
Thank you in advance!
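For reference, here is a minimal sketch of that simpler replacement. It assumes GNU sed (the -z option and \n in the replacement are GNU extensions) and a file without NUL bytes. With -z the whole file is one pattern space, so \n can appear in the pattern. A bare & in the replacement stands for the whole match, so the literal ampersand has to be written \&. The sample inputs are made up. The second one shows that this version still attaches the & to the # line instead of skipping it.

```shell
#!/bin/sh
# Minimal sketch, assuming GNU sed: move a leading & (after optional blanks)
# to the end of the previous line when that line ends with a comma.
# \1 keeps the original indentation; a space takes the place of the moved &.
fix() { sed -z 's/,\n\([[:blank:]]*\)&/, \&\n\1 /g'; }

# Works when there are no # lines...
plain=$(printf '%s\n' 'A ,' '& B -,' '& C ),' 'K ?,' | fix)
# ...but attaches the & of the H line to the # line instead of to F.
with_hash=$(printf '%s\n' 'F *,' '# & G -,' '& H ,' | fix)

printf '%s\n\n%s\n' "$plain" "$with_hash"
```

Because the pattern only spans the comma, the newline and the &, consecutive lines are handled in a single pass of the g flag. No loop is needed here.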
Answer:
Using sed
$ sed ':a;N;s/\n \ \(&\) \(.*\)/ \1\n \2/;ba' input_file
A , &
B -, &
C ), &
D , &
E (, &
F *,
# & G -, &
H , &
I (, &
J ,
Answer:
If you use GNU sed and your file does not contain NUL characters (ASCII code 0), you can use its -z option to process the whole file as a single string:
$ sed -Ez ':a;s/((\`|\n)[^\n#]*,)((\n[^\n#]*#[^\n]*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /g;ta' file
A , &
B -, &
C ), &
D , &
E (, &
F *, &
# & G -,
H , &
I (, &
J ,
K ?,
This corresponds to your textual specification and to your desired output for the example you show. But it is a bit complicated. Instead of processing lines that end with a newline character, it processes substrings that begin with a newline character (or the beginning of the file) and end just before the next newline character. Let's call these "chunks".
We basically search for a sequence of chunks of the form AB*C, where A is a chunk (possibly the first one) that contains no # and ends with a comma, B* is any number (including zero) of chunks containing #, and C is a chunk starting with a newline, followed by optional blanks and an &.
A is matched by (\`|\n)[^\n#]*, which means beginning-of-file or newline, followed by any number of characters except newline and #, followed by a comma.
B is matched by \n[^\n#]*#[^\n]*, which means newline, followed by any number of characters except newline and #, followed by # and any number of characters except newline.
C is matched by \n[[:blank:]]* followed by &, which means newline, followed by any number of blanks and an &.
If we find such a sequence, we append a space and an & to the end of A, leave B* unchanged, and replace the leading & of C with a space.
And we repeat (the :a ... ta loop) until no such sequence is found. The loop is needed because each match consumes the newline that begins the next chunk, so within one pass of the g flag the line that just lost its & cannot serve as A for the following line.
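To make the correspondence between A, B*, C and the command explicit, here is a sketch that rebuilds the same GNU sed command from three shell variables and runs it on the sample input from the question. The variable names A, B and C and the helper sample are only for illustration. The group numbers in the replacement follow from the parentheses. A contributes groups 1 and 2, the B* wrapper is group 3 (with one B as group 4), and C is group 5. Because each moved & is replaced by a space, those lines gain a leading space in the output.

```shell
#!/bin/sh
# Sketch: the answer's command rebuilt from its parts. Requires GNU sed
# (-z, \` and \n inside bracket expressions are GNU extensions).

# A: start of buffer or newline, then a chunk without # ending in a comma
#    (groups 1 and 2)
A='((\`|\n)[^\n#]*,)'
# B: a single chunk that contains a # (group 4 once wrapped as (B*) below)
B='(\n[^\n#]*#[^\n]*)'
# C: newline plus optional blanks; the & itself follows in the pattern
#    (group 5)
C='(\n[[:blank:]]*)'

# Sample input from the question (hypothetical helper name)
sample() {
  printf '%s\n' 'A ,' '& B -,' '& C ),' '& D ,' '& E (,' '& F *,' \
    '# & G -,' '& H ,' '& I (,' '& J ,' 'K ?,'
}

# \1 = A, then " &"; \3 = B* unchanged; \5 = C, then a space instead of &.
# :a ... ta repeats the substitution until nothing matches.
out=$(sample | sed -Ez ":a;s/${A}(${B}*)${C}&/\1 \&\3\5 /g;ta")
printf '%s\n' "$out"
```

The expanded pattern is character for character the one in the command above. Only the way it is assembled differs.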
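A small demonstration of the loop, on a hypothetical three-line input, compares a single pass of the same substitution with the looped version. It assumes GNU sed. The variable name re and the helper sample are illustrative only.

```shell
#!/bin/sh
# Same regex as in the answer (GNU sed extensions: \` and \n in brackets).
re='((\`|\n)[^\n#]*,)((\n[^\n#]*#[^\n]*)*)(\n[[:blank:]]*)&'
sample() { printf '%s\n' 'a ,' '& b ,' '& c'; }

# One pass with the g flag: the first match consumes the newline before
# "& b ,", so "b ," cannot be matched as A in the same pass.
once=$(sample | sed -Ez "s/$re/\1 \&\3\5 /g")

# Branching back with t until no substitution succeeds fixes the skipped line.
looped=$(sample | sed -Ez ":a;s/$re/\1 \&\3\5 /g;ta")

printf 'once:\n%s\nlooped:\n%s\n' "$once" "$looped"
```

In the single-pass result the & of the c line is still at the start of its line. The looped result has it moved to the end of the b line.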
