I am trying to replace every 4th occurrence of "_" with "@" in multiple files with bash.
E.g.
foo_foo_foo_foo_foo_foo_foo_foo_foo_foo..
would become
foo_foo_foo_foo@foo_foo_foo_foo@foo_foo...
perl -pe 's{_}{++$n % 4 ? $& : "@"}ge' *.txt
I have tried perl, but the problem is that the count carries on from the previous file. So, for example, in some files the first _ is replaced, because the count does not restart at 0 for each new file.
I have tried:
awk '{for(i=1; i<=NF; i++) if($i=="_") if(++count%4==0) $i="@"}1' *.txt
but this also does not work.
Using sed, I cannot find a way to keep replacing every 4th occurrence, because each file has a different number of _. Some files have 20 _, some have 200. Therefore, I can't specify a range.
I am really lost as to what to do. Can anybody help?
Answer:
You just need to reset the counter in the Perl one-liner, using eof to tell when it has finished reading each file:
perl -pe 's{_}{++$n % 4 ? "_" : "@"}ge; $n = 0 if eof' *.txt
Answer:
This MAY be what you want, using GNU awk for RT:
$ awk -v RS='_' '{ORS=(FNR%4 || RT=="" ? RT : "@")} 1' file
foo_foo_foo_foo@foo_foo_foo_foo@foo_foo..
It only reads one _-separated string into memory at a time, so it should work no matter how large your input file is, assuming there are _s in it.
It assumes you want to replace every 4th _ across the whole file as opposed to within individual lines.
Answer:
A simple sed would handle this:
s='foo_foo_foo_foo_foo_foo_foo_foo_foo_foo'
sed -E 's/(([^_]+_){3}[^_]+)_/\1@/g' <<< "$s"
foo_foo_foo_foo@foo_foo_foo_foo@foo_foo
Explanation:
- (: Start capture group #1
- ([^_]+_){3}: Match 1+ non-_ characters followed by a _. Repeat this group 3 times to match 3 such words separated by _
- [^_]+: Match 1+ non-_ characters
- ): End capture group #1
- _: Match a _
- Replacement is \1@ to replace the 4th _ with a @
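A small sketch to see how this behaves, assuming any sed that supports -E (GNU or BSD). Because sed works line by line, the count restarts on every line. It also shows one consequence of [^_]+: it needs at least one character between underscores, so in a line with adjacent underscores (__) the 4th _ may not be matched at all, whereas the [^_]* variant (the one the GNU sed answer below uses) counts empty runs too:

```shell
#!/usr/bin/env bash
set -eu

# Regex from this answer: 1+ non-_ characters between underscores.
plus='s/(([^_]+_){3}[^_]+)_/\1@/g'
# Same idea with * instead of +, so empty runs between underscores count too.
star='s/(([^_]*_){3}[^_]*)_/\1@/g'

# sed handles each line on its own, so counting restarts on every line.
per_line=$(printf 'a_b_c_d_e\nf_g_h_i_j\n' | sed -E "$plus")

# "a__b_c_d" has 4 underscores, two of them adjacent.
with_plus=$(printf 'a__b_c_d\n' | sed -E "$plus")   # no match: unchanged
with_star=$(printf 'a__b_c_d\n' | sed -E "$star")   # 4th "_" replaced

printf '%s\n' "$per_line" "$with_plus" "$with_star"
```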
Answer:
With GNU awk
$ cat ip.txt
foo_foo_foo_foo_foo_foo_foo_foo_foo_foo
123_45678_90
_
$ awk -v RS='(_[^_]+){3}_' -v ORS= '{sub(/_$/, "@", RT); print $0 RT}' ip.txt
foo_foo_foo_foo@foo_foo_foo_foo@foo_foo
123_45678_90
@
- -v RS='(_[^_]+){3}_' sets the input record separator to cover a sequence of four _ (the text matched by this separator is available via RT)
- -v ORS= sets an empty output record separator
- sub(/_$/, "@", RT) changes the last _ to @
- Use -i inplace for in-place editing.
Answer:
If the count should reset for each line:
perl -pe's/(?:_[^_]*){3}\K_/\@/g'
$ cat a.txt
foo_foo_foo_foo_foo_foo_foo_foo_foo_foo
foo_foo_foo_foo_foo_foo_foo_foo_foo_foo
$ perl -pe's/(?:_[^_]*){3}\K_/\@/g' a.txt a.txt
foo_foo_foo_foo@foo_foo_foo_foo@foo_foo
foo_foo_foo_foo@foo_foo_foo_foo@foo_foo
foo_foo_foo_foo@foo_foo_foo_foo@foo_foo
foo_foo_foo_foo@foo_foo_foo_foo@foo_foo
If the count shouldn't reset for each line, but should reset for each file:
perl -0777pe's/(?:_[^_]*){3}\K_/\@/g'
The -0777 option causes the whole file to be treated as one line, so the count works properly across lines.
But since each file is still read as a separate record, matching starts afresh for each file, so the count is reset between files.
$ cat a.txt
foo_foo_foo_foo_foo_foo_foo_foo_foo_foo
foo_foo_foo_foo_foo_foo_foo_foo_foo_foo
$ perl -0777pe's/(?:_[^_]*){3}\K_/\@/g' a.txt a.txt
foo_foo_foo_foo@foo_foo_foo_foo@foo_foo
foo_foo_foo@foo_foo_foo_foo@foo_foo_foo
foo_foo_foo_foo@foo_foo_foo_foo@foo_foo
foo_foo_foo@foo_foo_foo_foo@foo_foo_foo
To avoid reading the entire file at once, you could keep using the counter-based approach from the question, but with the following added:
$n = 0 if eof;
Note that eof is not the same thing as eof()! See perldoc -f eof.
Answer:
With GNU sed:
sed -nsE ':a;${s/(([^_]*_){3}[^_]*)_/\1@/g;p};N;ba' *.txt
-n suppresses the automatic printing, -s processes each file separately, -E uses extended regular expressions.
The script is a loop between label a (:a) and the branch-to-label-a command (ba). Each iteration appends the next line of input to the pattern space (N). This way, after the last line has been read, the pattern space contains the whole file (*). During the last iteration, when the last line has been read ($), a substitute command (s) replaces every 4th _ in the pattern space with a @ (s/(([^_]*_){3}[^_]*)_/\1@/g) and prints (p) the result.
Once you are satisfied with the result, you can change the options:
sed -i -nE ':a;${s/(([^_]*_){3}[^_]*)_/\1@/g;p};N;ba' *.txt
to modify the files in-place, or:
sed -i.bkp -nE ':a;${s/(([^_]*_){3}[^_]*)_/\1@/g;p};N;ba' *.txt
to modify the files in-place, but keep a *.txt.bkp backup of each file.
(*) Note that if you have very large files this could cause memory overflows.
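A small check of the per-file behaviour, assuming GNU sed (the -s option is a GNU extension; with it, $ matches the last line of each file). The two files are hypothetical: one.txt spreads its 4 underscores over two lines, and two.txt is a single line whose count must start again from zero:

```shell
#!/usr/bin/env bash
set -eu

dir=$(mktemp -d)
printf 'a_b_c\nd_e_f\n' > "$dir/one.txt"   # the 4th "_" is on line 2
printf 'g_h_i_j_k\n'    > "$dir/two.txt"   # counting must restart here

# -s: each file is handled separately, so "$" is the last line of each
# file and the whole-file substitution runs once per file.
out=$(sed -nsE ':a;${s/(([^_]*_){3}[^_]*)_/\1@/g;p};N;ba' "$dir/one.txt" "$dir/two.txt")

printf '%s\n' "$out"
rm -r "$dir"
```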
Answer:
With your shown samples, please try the following awk program. I have created an awk variable named fieldNum and assigned 4 to it, since the OP needs to put @ after every 4th _; you can change it as needed.
awk -v fieldNum="4" '
BEGIN{ FS=OFS="_" }
{
  val=""
  for(i=1;i<=NF;i++){
    # "@" after every fieldNum-th field, "_" otherwise, nothing after the last field
    val=val $i (i<NF ? (i%fieldNum==0 ? "@" : OFS) : "")
  }
  print val
}
' Input_file
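A quick check of the same field-joining idea, assuming any POSIX awk (GNU awk is not required). It is wrapped in a hypothetical helper function, replace_nth, that takes fieldNum as its argument. The i<NF guard keeps a line whose field count is an exact multiple of fieldNum (such as a_b_c_d) from getting a stray @ appended at the end. Note that this counts per line, not per file:

```shell
#!/usr/bin/env bash
set -eu

# replace_nth (hypothetical helper): rejoin the "_"-separated fields of each
# line, using "@" instead of "_" after every fieldNum-th field, but never
# after the last field.
replace_nth() {
  awk -v fieldNum="$1" '
    BEGIN { FS = OFS = "_" }
    {
      val = ""
      for (i = 1; i <= NF; i++)
        val = val $i (i < NF ? (i % fieldNum == 0 ? "@" : OFS) : "")
      print val
    }'
}

out=$(printf 'foo_foo_foo_foo_foo_foo_foo_foo_foo_foo\na_b_c_d\n' | replace_nth 4)
printf '%s\n' "$out"
```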
