I have a string with 3 capture groups and would like to preserve the first and third but perform a substitution on the second. How do I express this in sed?
Concretely, I have an input string like:
top-level.subpath.one.subpath.two.subpath.forty-five
And I want to preserve the part before the first ., shorten the middle part to the first letter of every word, and preserve the part after the last .. The result should look like:
top-level.s.o.s.t.s.forty-five
For preserving the capture groups, I have:
sed -r 's/([^.]*)(.*)(\..*)/\1...\3/'
which gets me:
top-level....forty-five
For converting something like .subpath.one.subpath.two.subpath to only initials, I have:
sed -r 's/(\.[^.])[^\.]*/\1/g'
which gets me:
.s.o.s.t.s
I'd like to essentially apply that second sed expression to capture group 2. Is there some way I can chain sed substitutions to perform that second substitution on only the second capture group while retaining the first and third?
CodePudding user response:
You can use
sed -E ':a; s/^(.*\.[^.])[^.]+(\.)/\1\2/; ta' file > newfile # GNU sed
sed -E -e :a -e 's/^(.*\.[^.])[^.]+(\.)/\1\2/' -e ta file > newfile # FreeBSD sed
Details:
-E - enables POSIX ERE syntax (+ is now a one-or-more quantifier, (...) is parsed as a grouping construct)
:a - sets an a label
s/^(.*\.[^.])[^.]+(\.)/\1\2/ - finds zero or more chars, a . and then any single char other than a . (capturing this into Group 1), then one or more chars other than a ., and then matches and captures into Group 2 a dot char; the match is replaced with the concatenated Group 1 and Group 2 values
ta - goes to the a label upon successful replacement.
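To make the loop easier to follow, here is a rough sketch that runs the same substitution one pass at a time and prints every intermediate string. The trace_abbrev function is a hypothetical helper written for illustration, not part of the answer; it re-invokes sed per pass instead of using the :a / ta loop.

```shell
#!/bin/sh
# trace_abbrev: hypothetical helper (illustration only).
# Applies the substitution once per pass and prints each intermediate
# string, stopping when a pass no longer changes anything.
trace_abbrev() {
    s=$1
    while :; do
        printf '%s\n' "$s"
        next=$(printf '%s\n' "$s" | sed -E 's/^(.*\.[^.])[^.]+(\.)/\1\2/')
        [ "$next" = "$s" ] && break
        s=$next
    done
}

trace_abbrev 'top-level.subpath.one.subpath.two.subpath.forty-five'
# top-level.subpath.one.subpath.two.subpath.forty-five
# top-level.subpath.one.subpath.two.s.forty-five
# top-level.subpath.one.subpath.t.s.forty-five
# top-level.subpath.one.s.t.s.forty-five
# top-level.subpath.o.s.t.s.forty-five
# top-level.s.o.s.t.s.forty-five
```

Because the leading .* is greedy, each pass shortens the rightmost middle field that is still longer than one character, so the string is abbreviated from right to left.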
CodePudding user response:
A simple awk solution that will work with any version of awk, including the one shipped with macOS:
s='top-level.subpath.one.subpath.two.subpath.forty-five'
awk 'BEGIN{FS=OFS="."} {for(i=2;i<NF;i++) $i=substr($i,1,1)}1' <<< "$s"
top-level.s.o.s.t.s.forty-five
This awk command uses . as both the input and the output field separator. We loop from field 2 through field NF-1 and replace each field with its first character. The trailing 1 is an always-true pattern, so the (rebuilt) record is printed.
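As a small sketch of how the loop bounds behave on shorter inputs, the same awk program can be wrapped in a function and fed strings with fewer than three fields. The name abbreviate_middle is made up for this example; a pipe is used instead of <<< so that it also runs under plain sh.

```shell
#!/bin/sh
# abbreviate_middle: hypothetical wrapper name (illustration only).
# Fields 2..NF-1 are cut down to their first character; the first and
# last fields are never touched.
abbreviate_middle() {
    printf '%s\n' "$1" |
        awk 'BEGIN { FS = OFS = "." }
             { for (i = 2; i < NF; i++) $i = substr($i, 1, 1) } 1'
}

abbreviate_middle 'top-level.subpath.one.subpath.two.subpath.forty-five'
# top-level.s.o.s.t.s.forty-five
abbreviate_middle 'first.last'   # no middle fields, printed unchanged
abbreviate_middle 'single'       # no separator at all, printed unchanged
```

With one or two fields the loop body never runs, so the record is printed exactly as it came in.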
A BSD sed solution to do the same:
sed -E -e ':x' -e 's/(.+\..)[^.]+\./\1./; tx' <<< "$s"
top-level.s.o.s.t.s.forty-five
CodePudding user response:
This might work for you (GNU sed):
sed -E ':a;s/(\..*)\B.(.*\.)/\1\2/;ta' file
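Capture the first and last periods and hollow out the middle: on each pass, \B. deletes one character that is not at a word boundary (here, a letter directly following another letter), and the loop repeats until only the initials remain.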
Ameliorating @anubhava's sed answer:
sed -E 's/(\..)[^.]+\./\1./g;s//\1./g' file
Using the global flag and repeating the same substitution provides a two-command solution with no loop. The empty pattern in the second command reuses the previous regular expression.
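To see why a second pass is needed, here is a minimal sketch that prints the result after one pass and after two. The variable names s and re are just for this example, and the expression is spelled out twice instead of relying on the empty pattern.

```shell
#!/bin/sh
s='top-level.subpath.one.subpath.two.subpath.forty-five'
re='s/(\..)[^.]+\./\1./g'   # the substitution from the command above

# One pass: each match consumes its trailing dot, so the field that
# follows a match cannot be matched in the same pass.
printf '%s\n' "$s" | sed -E "$re"
# top-level.s.one.s.two.s.forty-five

# Two passes: the fields skipped the first time are picked up now.
printf '%s\n' "$s" | sed -E -e "$re" -e "$re"
# top-level.s.o.s.t.s.forty-five
```

A field is only skipped when the field right before it was matched, so after the first pass every leftover field sits next to an already shortened one and the second pass can reach it.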
