I have a file s.csv:
a,b -.,c
aa,bb ().,c._c
I want to remove all special characters from the 2nd column (the file is comma-separated):
cat s.csv | tr -dc '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]'
The above code removes special characters from the 3rd column as well.
awk -F, '{print $2}' s.csv | tr -dc '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]'
This code prints only the 2nd column.
Any idea how I can remove special characters from the 2nd column and print all columns?
The required output is:
a,b,c
aa,bb,c._c
CodePudding user response:
Remove all characters that are not letters or digits from the second field:
[^A-Z - characters that are not upper case letters
a-z - or lower case letters
0-9] - or digits
$2 - from the second field
-F ',' - fields are separated by ","
OFS=FS - keep the separator in the output
$ awk -F ',' 'BEGIN{OFS=FS}{gsub(/[^A-Za-z0-9]/,"",$2); print}' s.csv
# test
$ awk -F ',' 'BEGIN{OFS=FS}{gsub(/[^A-Za-z0-9]/,"",$2); print}' <<<'aa,bb ().,c._c'
aa,bb,c._c
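If the column to clean varies, the same gsub() call can be wrapped in a small shell function. This is only a sketch: the function name clean_field and the awk variable col are hypothetical and not part of the answer above.

```shell
# clean_field COL [FILE...]: strip non-alphanumeric characters from one
# comma-separated column and print every column. Hypothetical helper.
clean_field() {
    col=$1
    shift
    awk -F ',' -v col="$col" '
        BEGIN { OFS = FS }        # keep "," as the output separator
        {
            gsub(/[^A-Za-z0-9]/, "", $col)   # delete everything except letters and digits
            print
        }
    ' "$@"
}

# Usage: reads stdin when no file is given
printf 'a,b -.,c\naa,bb ().,c._c\n' | clean_field 2
```

Passing the column number with -v keeps the awk program itself in single quotes, so no shell quoting tricks are needed.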
As @Léa Gris mentioned below:
Don't forget to set the locale to C, or [^A-Za-z0-9] is gonna be interpreted unexpectedly in non-western European alphabets. Prepend the awk invocation with LC_ALL=C.
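To illustrate the locale advice, the variable can be set for that single awk invocation only, so the rest of the shell session is unaffected. A minimal sketch; the function name strip_c_locale is hypothetical.

```shell
# Force the C locale for this awk call only, so that A-Z, a-z and 0-9
# mean exactly the ASCII ranges.
strip_c_locale() {
    LC_ALL=C awk -F ',' 'BEGIN{OFS=FS}{gsub(/[^A-Za-z0-9]/,"",$2); print}' "$@"
}

printf 'aa,bb ().,c._c\n' | strip_c_locale
# prints: aa,bb,c._c
```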
CodePudding user response:
Using awk, [[:punct:]] will match all punctuation characters, and [[:punct:] ] will match punctuation characters as well as spaces, so gsub() can remove both from the second field.
$ awk 'BEGIN{FS=OFS=","} {gsub(/[[:punct:] ]/,"",$2)} 1' input_file
a,b,c
aa,bb,c._c
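The difference between the two bracket expressions can be seen on the first sample line. A short sketch; the variable names are only for illustration.

```shell
line='a,b -.,c'

# [[:punct:]] alone deletes "-" and "." but leaves the space in field 2
only_punct=$(printf '%s\n' "$line" |
    awk 'BEGIN{FS=OFS=","} {gsub(/[[:punct:]]/,"",$2)} 1')

# adding a space inside the brackets deletes the space as well
punct_space=$(printf '%s\n' "$line" |
    awk 'BEGIN{FS=OFS=","} {gsub(/[[:punct:] ]/,"",$2)} 1')

printf '[%s]\n[%s]\n' "$only_punct" "$punct_space"
# prints:
# [a,b ,c]
# [a,b,c]
```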
Using sed (GNU sed syntax), keeping only the leading letters of the second field:
$ sed '/\([^,]*,[[:alpha:]]\+\)[^,]*/s//\1/' input_file
a,b,c
aa,bb,c._c
CodePudding user response:
You can use the [:alpha:] character class in awk: for the second field, gsub() removes the characters that are not alphabetic:
awk 'BEGIN{OFS=FS=","} {gsub(/[^[:alpha:]]+/, "", $2)} 1' file
a,b,c
aa,bb,c._c
If you need a different set of characters, see this answer by Ed Morton on "which characters are in which character classes": https://stackoverflow.com/questions/56481541/how-can-you-tell-which-characters-are-in-which-character-classes
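For example, [:alpha:] treats digits as characters to remove, while [:alnum:] keeps them. A sketch on a made-up input line (the sample data and variable names are assumptions, not from the question):

```shell
line='x1,b2 -.,c'

# [^[:alpha:]]+ also strips the digit from field 2
alpha_only=$(printf '%s\n' "$line" |
    awk 'BEGIN{OFS=FS=","} {gsub(/[^[:alpha:]]+/, "", $2)} 1')

# [^[:alnum:]]+ keeps letters and digits
alnum_kept=$(printf '%s\n' "$line" |
    awk 'BEGIN{OFS=FS=","} {gsub(/[^[:alnum:]]+/, "", $2)} 1')

printf '%s\n%s\n' "$alpha_only" "$alnum_kept"
# prints:
# x1,b,c
# x1,b2,c
```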
CodePudding user response:
Use this Perl one-liner:
perl -F',' -lane '$F[1] =~ s{[\W_]+}{}g; @F = map { lc } @F; print join ",", @F;' in_file > out_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace.
s{[\W_]+}{} : Replace 1 or more occurrences of \W (non-word character) or underscore with nothing.
The regex uses these modifiers:
/g : Match the pattern repeatedly.
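For comparison, a rough awk counterpart of the one-liner above, which also lowercases every field as the question's tr pipeline did. This is a sketch assuming ASCII input; the function name clean_and_lower is hypothetical, and [^[:alnum:]] is used here as the nearest equivalent of [\W_].

```shell
# Strip non-alphanumeric characters from field 2, then lowercase the whole line.
clean_and_lower() {
    awk 'BEGIN{FS=OFS=","} {gsub(/[^[:alnum:]]+/,"",$2); print tolower($0)}' "$@"
}

printf 'A,B -.,C\n' | clean_and_lower
# prints: a,b,c
```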
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
