Ruby one-liners: Writing a — POSIX shell compatible — TSV format

Time:01-16

I'm looking for a concise way of outputting TSV records that can be accurately read and unescaped by a POSIX shell (IFS='' read -r and printf %b).

The minimalistic escaping rules are:

  • \\ for backslash
  • \n for newline
  • \t for tab
  • \r for carriage return

But they can be extended to the full set of escapes that printf supports, if there is an easy way of doing that in Ruby.

The code so far:

record = [ "\\", "\t", "\n", "\r", "\"" ]

rules = {
  "\\" => "\\\\",
  "\t" => "\\t",
  "\r" => "\\r",
  "\n" => "\\n"
};

regex = /#{ rules.keys.map{|c| Regexp.escape(c)}.join(?|) }/;

puts record.map { |field| field.gsub(regex) {|c| rules[c]} }.join("\t")

Output:

\\  \t  \n  \r  "

The main problem is that the code is meant for one-liners, so I would like to shorten it as much as possible. Any ideas?

CodePudding user response:

In Ruby, every program can be written as a one-liner, since line breaks are optional: every role that a line break can play (separating expressions, introducing a syntactic block, etc.) can also be filled by a symbol (e.g. ; as an expression separator) or a keyword (e.g. then to introduce the consequence of a conditional or case expression, or do to introduce the body of a while or for loop).

For example, your code can be written as a one-liner like this:

record = [ "\\", "\t", "\n", "\r", "\"" ]; rules = { "\\" => "\\\\", "\t" => "\\t", "\r" => "\\r", "\n" => "\\n" }; regex = /#{ rules.keys.map{|c| Regexp.escape(c)}.join(?|) }/; puts record.map { |field| field.gsub(regex) {|c| rules[c]} }.join("\t")

However, there are some possible improvements we can make to the code.

Use Regexp::union to construct regex:

regex = Regexp.union(rules.keys)

Since you already went through the trouble of constructing a replacement Hash, why not use the form of String#gsub that takes a replacement Hash as an argument:

puts record.map { |field| field.gsub(regex, rules) }.join("\t")
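Putting the two suggestions together, the whole thing might collapse to something like the sketch below. The constant RULES and the helper name tsv_line are only illustrative and do not appear in the original code; the body of the helper is the actual one-liner.

```ruby
# Same escaping rules as in the question, kept in a constant for reuse.
RULES = { "\\" => "\\\\", "\t" => "\\t", "\r" => "\\r", "\n" => "\\n" }.freeze

# tsv_line is a hypothetical helper name, not part of the original code.
# Regexp.union escapes each key and joins the alternatives with |;
# gsub with a Hash replaces every match by the value stored under it.
def tsv_line(record)
  record.map { |field| field.gsub(Regexp.union(RULES.keys), RULES) }.join("\t")
end

puts tsv_line(["\\", "\t", "\n", "\r", "\""])
```

For the sample record this prints the same tab-separated line as the Output shown in the question.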

CodePudding user response:

I found one way:

puts record.map{ |s| s.gsub(/\\|\t|\r|\n/) { |c| c.dump[1,2] } }.join("\t")

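String#dump seems compatible with the POSIX shell printf for the characters targeted here, except for ", which it escapes as \" (and putting aside the fact that it wraps the string in double quotes). Be aware that dump also emits sequences such as \e, \xNN and \uNNNN for other control and non-ASCII characters, and POSIX does not define those for printf %b. Here I'm using it to escape each targeted character independently, but for plain ASCII text it would be wiser to run it on the whole string instead and then remove the enclosing quotes and the unwanted \".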
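A minimal sketch of that whole-string variant, assuming the fields only contain printable ASCII plus backslash, tab, newline and carriage return. The helper name escape_field is made up for this example. Other input would need extra handling, since dump writes things like \e or \xNN that POSIX printf %b does not define.

```ruby
# escape_field is a hypothetical helper name used only for this sketch.
# dump escapes the whole string and wraps it in double quotes;
# [1..-2] drops those quotes, and the gsub turns \" back into a bare ".
def escape_field(s)
  s.dump[1..-2].gsub('\"', '"')
end

record = ["\\", "\t", "\n", "\r", "\""]
puts record.map { |s| escape_field(s) }.join("\t")
```

The gsub is safe here because dump never leaves a bare " in its output, so every " it finds is preceded by the backslash that dump added for it.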
