I am trying to convert this input from file.txt
a,b;c^d"e}
f;g,h!;i8j-
into this output
a,b,c,d,e,f,g,h,i,j
with awk
The best I have managed so far is
awk '$1=$1' FS="[!;^}8-]" OFS="," file.txt
- how can I stop " from being interpreted as a special character? Escaping it doesn't work
- how can I avoid duplicate ,, in the output and delete the last ,
CodePudding user response:
If you only want to replace non-letter characters with commas and squeeze repeated commas, tr is your friend:
tr -sc '[:alpha:]' ',' < file.txt
This will leave a trailing comma though. You could use sed to remove/replace it:
tr -sc '[:alpha:]' ',' < file.txt | sed 's/,$/\n/'
Another possibility is to split each "item" into its own line (with tr or grep -o), then use paste to combine the lines again:
tr -sc '[:alpha:]' '\n' < file.txt | paste -sd,
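To see both pipelines side by side, here is a small sketch that feeds them the two sample lines from the question. The variable names (input, out_sed, out_paste) are made up for the example, and the sed step here simply deletes the trailing comma instead of replacing it with \n, since \n in a sed replacement is not portable to every sed.

```shell
# Sample input from the question; single quotes keep the shell from touching " and !
input=$(printf '%s\n' 'a,b;c^d"e}' 'f;g,h!;i8j-')

# Variant 1: squeeze every run of non-letters into one comma, then drop the trailing comma
out_sed=$(printf '%s\n' "$input" | tr -sc '[:alpha:]' ',' | sed 's/,$//')

# Variant 2: put each letter on its own line, then join the lines with commas
out_paste=$(printf '%s\n' "$input" | tr -sc '[:alpha:]' '\n' | paste -sd, -)

printf '%s\n' "$out_sed" "$out_paste"
```

Both variants also turn the newline between the two input lines into a separator, which is why the result ends up on a single line. The explicit - after paste -sd, names standard input, which some paste implementations require.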
CodePudding user response:
I would use GNU AWK for this task in the following way. Let file.txt content be
a,b;c^d"e} f;g,h!;i8j-
then
awk 'BEGIN{FPAT="[a-z]";OFS=","}{$1=$1;print}' file.txt
gives output
a,b,c,d,e,f,g,h,i,j
Explanation: using FPAT, I tell GNU AWK that a field is a single lowercase ASCII letter, and I set the output field separator (OFS) to a comma. Then for each line I do $1=$1 to trigger a rebuild of the line and print it.
(tested in GNU Awk 5.0.1)
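FPAT is a GNU AWK extension. If it is not available, the same idea (treat every single lowercase letter as a field) can probably be approximated with match() in a loop. This is only a sketch under that assumption, not the tested command above; the names rest, out and result are made up for the example.

```shell
# Portable sketch: collect every lowercase letter of a line and join them with commas
result=$(printf '%s\n' 'a,b;c^d"e} f;g,h!;i8j-' |
awk '{
    out = ""
    rest = $0
    # match() sets RSTART/RLENGTH for the first lowercase letter left in rest
    while (match(rest, /[a-z]/)) {
        out = out (out == "" ? "" : ",") substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)   # continue after the matched letter
    }
    print out
}')

printf '%s\n' "$result"
```

Like the FPAT version, this works line by line, so an input spread over several lines produces one output line per input line.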
CodePudding user response:
$ awk -v RS="^$" '{        # read whole file in
    gsub(/[^a-z]+/, ",")   # replace each run of non-lowercase-letter characters with a comma
    sub(/,$/, "")          # remove trailing comma
}1' file                   # output
Output:
a,b,c,d,e,f,g,h,i,j
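RS="^$" depends on awk accepting a regular expression as the record separator, which POSIX does not specify. If that is a concern, one possible workaround is to collect the lines yourself and do the substitutions in END. A minimal sketch, with buf and result as made-up variable names and the question's two lines as input:

```shell
# Portable sketch: gather all lines in buf, then substitute once at the end
result=$(printf '%s\n' 'a,b;c^d"e}' 'f;g,h!;i8j-' |
awk '{ buf = buf $0 "\n" }        # append each line plus its newline
END {
    gsub(/[^a-z]+/, ",", buf)     # every run of non-lowercase characters becomes one comma
    sub(/,$/, "", buf)            # remove trailing comma
    print buf
}')

printf '%s\n' "$result"
```

Keeping the newline when appending means the line break between records is folded into the same run of non-letters, so no extra comma appears at the join.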
