I'm looking for a concise way of outputting TSV records that can be accurately read and unescaped with a POSIX shell (reading each line with IFS='' read -r and unescaping with printf %b).
The minimalistic escaping rules are:
\\ for backslash
\n for newline
\t for tab
\r for carriage return
But they can be extended to the full set supported by printf if there is an easy way of doing it in Ruby.
The code so far:
record = [ "\\", "\t", "\n", "\r", "\"" ]
rules = {
"\\" => "\\\\",
"\t" => "\\t",
"\r" => "\\r",
"\n" => "\\n"
};
regex = /#{ rules.keys.map{|c| Regexp.escape(c)}.join(?|) }/;
puts record.map { |field| field.gsub(regex) {|c| rules[c]} }.join("\t")
Output:
\\ \t \n \r "
The main problem is that the code is meant to be used in one-liners, so I would like to shorten it as much as possible. Any ideas?
CodePudding user response:
In Ruby, any program can be written as a one-liner, because line breaks are optional: every role a line break can play (separating expressions, introducing a syntactic block, etc.) can also be filled by a symbol (e.g. ; as an expression separator) or a keyword (e.g. then to introduce the consequence of a conditional or case expression, or do to introduce the body of a while or for loop).
For example, your code can be written as a one-liner like this:
record = [ "\\", "\t", "\n", "\r", "\"" ]; rules = { "\\" => "\\\\", "\t" => "\\t", "\r" => "\\r", "\n" => "\\n" }; regex = /#{ rules.keys.map{|c| Regexp.escape(c)}.join(?|) }/; puts record.map { |field| field.gsub(regex) {|c| rules[c]} }.join("\t")
However, there are some possible improvements we can make to the code.
Use Regexp::union to construct regex:
regex = Regexp.union(rules.keys)
Since you already went to the trouble of constructing a replacement Hash, you can use the form of String#gsub that takes a replacement Hash as its second argument:
puts record.map { |field| field.gsub(regex, rules) }.join("\t")
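Putting both suggestions together, the whole thing might shrink to something like the sketch below. The method name to_tsv is only an illustrative name and is not part of the original code; the escape table is the one from the question.

```ruby
# Hypothetical helper (name chosen for illustration): escape each field, then join with tabs.
def to_tsv(record)
  # Escape table from the question: raw character => printf %b escape sequence.
  rules = { "\\" => "\\\\", "\t" => "\\t", "\r" => "\\r", "\n" => "\\n" }
  # Regexp.union builds the alternation; gsub looks each match up in the Hash.
  record.map { |field| field.gsub(Regexp.union(rules.keys), rules) }.join("\t")
end

puts to_tsv(["\\", "\t", "\n", "\r", "\""])
```

For a real one-liner the method body can be inlined; the def is only there to make the sketch easy to try out.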
CodePudding user response:
I got one way:
puts record.map{ |s| s.gsub(/\\|\t|\r|\n/) { |c| c.dump[1,2] } }.join("\t")
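String#dump seems compatible with the POSIX shell printf for the characters targeted here, if you set aside the fact that it wraps the string in double quotes and also escapes ". Here I'm using it to escape each targeted character independently; it might be simpler to run it on the whole string and then undo the unwanted \". Be aware, though, that dump also emits escapes such as \e, \xNN and \u for other non-printable or non-ASCII characters, which POSIX printf %b does not define, so the whole-string approach is only safe for input that does not contain them.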
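To make the [1,2] slice less cryptic, here is a small sketch of what dump returns for each targeted character. The helper name dump_escape is made up for this illustration.

```ruby
# Hypothetical helper: the two-character escape that String#dump produces for one character.
def dump_escape(char)
  # dump wraps its result in double quotes, so index 0 is the opening quote
  # and the next two characters are the escape sequence itself.
  char.dump[1, 2]
end

# Print the dumped form next to the slice taken from it.
["\\", "\t", "\r", "\n"].each { |c| puts "#{c.dump} -> #{dump_escape(c)}" }
```

The slice length of 2 only fits these four characters, since each of them dumps to a backslash plus one more character.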
