What is the purpose of "r" before "..." in DataFrames.jl

Time:01-22

I noticed people using r"...". What is it for? Thanks.

CodePudding user response:

r"..." is Julia syntax for defining a regular expression, and is used throughout the language (not just in data frames) whenever a regexp is needed. You can find more information about this syntax by searching for r"" in the Julia REPL's built-in help:

help?> r""
  @r_str -> Regex

  Construct a regex, such as r"^[a-z]*$", without interpolation and unescaping (except
  for quotation mark " which still has to be escaped). The regex also accepts one or
  more flags, listed after the ending quote, to change its behaviour:

    •  i enables case-insensitive matching

    •  m treats the ^ and $ tokens as matching the start and end of individual
       lines, as opposed to the whole string.

    •  s allows the . modifier to match newlines.

    •  x enables "comment mode": whitespace is enabled except when escaped with \,
       and # is treated as starting a comment.

    •  a disables UCP mode (enables ASCII mode). By default \B, \b, \D, \d, \S, \s,
       \W, \w, etc. match based on Unicode character properties. With this option,
       these sequences only match ASCII characters.

  See Regex if interpolation is needed.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> match(r"a+.*b+.*?d$"ism, "Goodbye,\nOh, angry,\nBad world\n")
  RegexMatch("angry,\nBad world")

  This regex has the first three flags enabled.
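
As a small illustration of the flags and of the interpolation note above, here is a minimal sketch using only Base Julia. The variable name word is hypothetical and exists only for this example; the two-argument Regex(pattern, flags) constructor takes the flags as a string.

```julia
# The i flag makes matching case-insensitive.
occursin(r"julia", "I like Julia")    # false: matching is case-sensitive by default
occursin(r"julia"i, "I like Julia")   # true

# r"..." does not interpolate, so build the pattern with Regex(...) instead.
# `word` is a hypothetical variable used only for illustration.
word = "Julia"
pattern = Regex("\\b$(word)\\b", "i")  # backslashes must be doubled in an ordinary string
occursin(pattern, "i like julia")     # true
```

Note that inside an ordinary string "\b" would mean a backspace character, which is one reason r"..." is usually more convenient when no interpolation is needed.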

More broadly, a word or letter placed immediately before a quoted string is called a string macro (or non-standard string literal), and you can even define your own, as some packages do. The r"..." syntax is one that happens to be built in and is used specifically for defining regex objects that can later be applied to one or more strings with functions such as match and replace.
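
A minimal sketch of both ideas, using only Base Julia: applying a regex with match and replace, and defining a string macro of your own. The macro name upper is hypothetical and chosen only for this illustration; defining a macro called @upper_str is what enables the upper"..." syntax.

```julia
# match returns a RegexMatch, or nothing if the pattern is not found.
m = match(r"(\d+)-(\d+)", "pages 10-25")
m.match        # "10-25"
m.captures     # ["10", "25"]

# replace accepts a regex => replacement pair.
replace("b1, b2", r"\d" => "#")   # "b#, b#"

# A user-defined string macro (hypothetical name, for illustration only).
macro upper_str(s)
    return uppercase(s)
end

upper"hello"   # "HELLO"
```

The r prefix works the same way: r"..." calls the built-in @r_str macro shown in the help output above.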

CodePudding user response:

@cbk gave you a very good overview of the usages of the r"..." regular expressions in Julia.

In DataFrames.jl, regular expressions are commonly used as column selectors. Here are some examples where r"b" matches all columns that contain "b" somewhere in their name:

julia> using DataFrames

julia> df = DataFrame(a=1, b1=2, b2=3, c=4)
1×4 DataFrame
 Row │ a      b1     b2     c
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      2      3      4

julia> df[:, r"b"] # data frame indexing
1×2 DataFrame
 Row │ b1     b2
     │ Int64  Int64
─────┼──────────────
   1 │     2      3

julia> select(df, r"b") # selection operation
1×2 DataFrame
 Row │ b1     b2
     │ Int64  Int64
─────┼──────────────
   1 │     2      3

julia> combine(df, AsTable(r"b") => ByRow(sum)) # rowwise aggregation of selected columns
1×1 DataFrame
 Row │ b1_b2_sum
     │ Int64
─────┼───────────
   1 │         5
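
A few more selector forms, as a sketch that assumes a reasonably recent DataFrames.jl and reuses the same df as above. A regex can be passed to names, anchored like any other regex, and combined with Not and Cols:

```julia
using DataFrames

df = DataFrame(a=1, b1=2, b2=3, c=4)

names(df, r"b")             # names of the columns matching the regex: b1 and b2
names(df, r"^b1")           # anchored pattern: only b1
select(df, Not(r"b"))       # every column except the matching ones: a and c
select(df, Cols(:c, r"b"))  # c first, then b1 and b2
```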