Remove double quotes within the column value using Unix

I am processing a 90-column, semicolon-separated (;) CSV file. (Case can be ignored. I am aware the file format is a mess, but I have no control over it.)

Input rows:

"AAAAA";"ABABDBDA";"ASDASDA"asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd"asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"

Expected output:

"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"

(The stray double quote can be replaced by a space or removed.) Please note that even though this is a ';'-separated file, some rows have ';' inside the quoted data for a column.

Issue: in some rows I am getting an extra double quote within the quoted data.

Please advise me on how to handle this in Unix.

CodePudding user response：

One trick you can use is to replace every " that is not at a field boundary, that is, not next to a ; and not at the start or end of the line. A simple sed script can be

$ sed -E 's/([^;])"([^;])/\1 \2/g' file

Note that if you allow escaped quote marks in your fields, this is going to remove them as well.
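
A single pass with the g flag consumes the character on each side of the quote it replaces, so two stray quotes that are only one character apart (as in x"y"z) are not both caught. A minimal sketch of one way around that, assuming a stray quote never sits directly next to a ;, is to repeat the substitution until nothing changes. The function name is made up for illustration.

```shell
# fix_stray_quotes is a hypothetical helper name: rows in on stdin, repaired rows out.
fix_stray_quotes() {
    # ":again" / "t again" repeats the substitution until no stray quote is left,
    # so quotes that are one character apart (x"y"z) are both replaced.
    sed -E -e ':again' -e 's/([^;])"([^;])/\1 \2/' -e 't again'
}

printf '%s\n' '"AAAAA";"x"y"z";"123;2343;asa"dwd";"456"' | fix_stray_quotes
```

Quotes at the very start or end of a line are left alone because the pattern needs a character on both sides.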

CodePudding user response：

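What do you think of the following approach?

Replace every ";" with a placeholder that cannot occur in the data (a bare ; is unsafe here, because some fields contain ;)
Remove all remaining "
Replace the placeholder with ";"
Add a " at the beginning and at the end of every line

The whole thing can be done with sed, awk, or whatever tool you prefer (tr alone is not enough, since it only maps single characters).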
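
The steps above can be sketched with sed as follows. This is only an illustration under two assumptions: GNU sed (which accepts \n in the replacement part), and a newline as the temporary marker instead of a bare ;, since ; also occurs inside fields and a newline can never be part of a line sed is processing. The function name is hypothetical.

```shell
# strip_inner_quotes is a hypothetical helper name: rows in on stdin, repaired rows out.
strip_inner_quotes() {
    # 1. ";" -> newline marker   2. delete every remaining "
    # 3. marker -> ";"           4. re-add the outer quotes
    sed -e 's/";"/\n/g' \
        -e 's/"//g' \
        -e 's/\n/";"/g' \
        -e 's/^/"/' -e 's/$/"/'
}

printf '%s\n' '"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"' | strip_inner_quotes
```

This variant deletes the stray quote rather than replacing it with a space, which the question also allows.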

CodePudding user response：

mawk 'NF*(gsub(__," ",$!(NF=NF))^_  gsub(OFS,FS)  gsub("^ | $",__))' \
               __='\42'  FS='\42\73\42' OFS='\31\17'

"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"

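The one-liner above is dense, so here is a more readable sketch of the same idea in plain awk: split each row on the three-character separator ";", treat any quote still left inside a field as stray, and put the outer quotes back. It is an illustration rather than a drop-in equivalent, and the function name is hypothetical.

```shell
# requote_fields is a hypothetical helper name: rows in on stdin, repaired rows out.
requote_fields() {
    awk -F'";"' -v OFS='";"' '
    NF == 0 { print; next }          # pass empty lines through untouched
    {
        sub(/^"/, "", $1)            # drop the opening quote of the row
        sub(/"$/, "", $NF)           # drop the closing quote of the row
        for (i = 1; i <= NF; i++)
            gsub(/"/, " ", $i)       # a quote still inside a field is a stray one
        print "\"" $0 "\""           # fields are joined by OFS; restore outer quotes
    }'
}

printf '%s\n' '"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"' | requote_fields
```

Because the separator is the whole string ";", a ; inside a field does not split it.
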
CodePudding user response：

This transformation is easy with a tool that supports regular expressions with zero-length assertions (lookbehind and lookahead). Since you used the unix tag, there is a good chance you have the perl command available, so I propose the following solution. Let the content of file.txt be

"AAAAA";"ABABDBDA";"ASDASDA"asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd"asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"

then

perl -p -e 's/(?<=[[:alnum:]])"(?=[[:alnum:]])/ /g' file.txt

gives output

"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"

Explanation: -p -e tells perl to work sed-style, reading the input line by line and printing each line after the code has run. The substitution (s) replaces every " that comes after an alphanumeric character (letter or digit) and before an alphanumeric character with a space. The g flag applies this to all such " on a line.

Note: you can port this answer to any other tool that supports substitution with zero-length assertions.

(tested in perl 5, version 26, subversion 3)