I noticed people using r"...". What is it for? Thanks.
CodePudding user response:
r"..." is Julia syntax for defining a regular expression, and is used throughout the language (not just in data frames) whenever a regexp is needed. You can find more information about this syntax by searching for r"" in the Julia REPL's built-in help:
help?> r""
@r_str -> Regex
Construct a regex, such as r"^[a-z]*$", without interpolation and unescaping (except
for quotation mark " which still has to be escaped). The regex also accepts one or
more flags, listed after the ending quote, to change its behaviour:
• i enables case-insensitive matching
• m treats the ^ and $ tokens as matching the start and end of individual
lines, as opposed to the whole string.
• s allows the . modifier to match newlines.
• x enables "comment mode": whitespace is enabled except when escaped with \,
and # is treated as starting a comment.
• a disables UCP mode (enables ASCII mode). By default \B, \b, \D, \d, \S, \s,
\W, \w, etc. match based on Unicode character properties. With this option,
these sequences only match ASCII characters.
See Regex if interpolation is needed.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> match(r"a+.*b+.*?d$"ism, "Goodbye,\nOh, angry,\nBad world\n")
RegexMatch("angry,\nBad world")
This regex has the first three flags enabled.
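To make the flags and the "See Regex if interpolation is needed" remark concrete, here is a minimal sketch using only Base. The variable names and the sample strings are made up for illustration:

```julia
# Flags go right after the closing quote of the r"..." literal.
case_sensitive   = occursin(r"hello", "Hello, world")    # no flag: "hello" != "Hello"
case_insensitive = occursin(r"hello"i, "Hello, world")   # i flag: case is ignored

# r"..." does not interpolate, so build the pattern from a string with Regex(...).
word = "world"                          # hypothetical value, for illustration only
pattern = Regex("\\b$(word)\\b", "i")   # flags are passed as a second string argument
m = match(pattern, "Hello, World")

println(case_sensitive)     # false
println(case_insensitive)   # true
println(m.match)            # World
```

Note that in the Regex("...") form the backslashes have to be doubled, because an ordinary string literal processes escapes while r"..." does not.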
More broadly, a word or letter placed immediately before a quoted string is called a string macro (or non-standard string literal), and you can even define your own, as some packages do. The r"..." syntax just happens to be a built-in one, used specifically for defining regex objects that can later be applied to one or more strings with functions such as match and replace.
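As a rough sketch of both points, the snippet below defines a toy string macro and then reuses a regex object with match and replace. The macro name upper_str and the sample strings are hypothetical, chosen only for illustration:

```julia
# A macro named `upper_str` is what makes the upper"..." literal available.
# This one is hypothetical: it upper-cases the literal when the code is expanded.
macro upper_str(s)
    return uppercase(s)
end

shout = upper"hello"

# r"..." builds a Regex object once; it can then be passed to several functions.
number_re = r"\d+"
m = match(number_re, "order 42 shipped")                  # first match, or nothing
masked = replace("order 42 shipped", number_re => "#")    # substitute every match

println(shout)      # HELLO
println(m.match)    # 42
println(masked)     # order # shipped
```

In the same way, r"..." is just a call to the built-in @r_str macro shown in the help output above.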
CodePudding user response:
@cbk gave you a very good overview of the usages of the r"..." regular expressions in Julia.
In DataFrames.jl, regular expressions are commonly used as column selectors. Here are some examples where r"b" matches all columns that contain "b" somewhere in their name:
julia> using DataFrames
julia> df = DataFrame(a=1, b1=2, b2=3, c=4)
1×4 DataFrame
 Row │ a      b1     b2     c
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      2      3      4
julia> df[:, r"b"] # data frame indexing
1×2 DataFrame
 Row │ b1     b2
     │ Int64  Int64
─────┼──────────────
   1 │     2      3
julia> select(df, r"b") # selection operation
1×2 DataFrame
 Row │ b1     b2
     │ Int64  Int64
─────┼──────────────
   1 │     2      3
julia> combine(df, AsTable(r"b") => ByRow(sum)) # rowwise aggregation of selected columns
1×1 DataFrame
 Row │ b1_b2_sum
     │ Int64
─────┼───────────
   1 │         5
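If you want a feel for which columns a given regex would pick without loading DataFrames.jl, the selection can be approximated in plain Julia by testing each column name with occursin. This is only a sketch of the idea under that assumption, not the package's actual implementation, and the extra names in the second call are made up:

```julia
column_names = ["a", "b1", "b2", "c"]   # same names as in df above

# Keep the names in which the pattern matches anywhere.
selected = filter(name -> occursin(r"b", name), column_names)
println(selected)     # ["b1", "b2"]

# Anchors narrow the match: only names that are "b" followed by a single digit.
anchored = filter(name -> occursin(r"^b\d$", name), ["a", "b1", "b2", "ab", "c"])
println(anchored)     # ["b1", "b2"]
```

The second call shows why anchoring matters: an unanchored r"b" would also select a column called "ab".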
