I'm looking for a regex that decomposes a string containing the arguments of a function written in another language into a list of the form argName = value.
An instance of my string of arguments is:
args <- "arg1, arg2 = {{space}}, arg3 = TRUE, arg4 = {{plot, datG1, arg1 = TRUE}}, arg5 = ga, arg6 = {{bla bla {{plot, datG1, arg1 = TRUE}}}}"
where arg1 is an argument without a value (by convention, here, value = NA), arg2 takes the value "{{space}}", arg3 takes "TRUE", etc.
Each value should be returned as a string (or NA). The special form {{foo}} is the convention for either a function (as in {{space}}) or a text possibly containing functions (as in {{bla bla {{plot, datG1, arg1 = TRUE}}}}). I already have code that identifies functions and pure text. The only thing I need is to list the arguments of each function.
So here, the regex should let me decompose the string args into the list
list(
arg1 = NA,
arg2 = "{{space}}",
arg3 = "TRUE",
arg4 = "{{plot, datG1, arg1 = TRUE}}",
arg5 = "ga",
arg6 = "{{bla bla {{plot, datG1, arg1 = TRUE}}}}"
)
The regex I use to identify functions is "\\{\\{((?>[^\\{\\{\\}\\}]+|(?R))*)\\}\\}"
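For reference, here is a minimal sketch of what this function regex returns on the sample string. It assumes perl = TRUE, which R needs for the atomic group (?>...) and the (?R) recursion. Only the top-level {{...}} blocks come back, because a nested block is consumed by the recursion as part of its enclosing block.

```r
# Sample input from the question
args <- "arg1, arg2 = {{space}}, arg3 = TRUE, arg4 = {{plot, datG1, arg1 = TRUE}}, arg5 = ga, arg6 = {{bla bla {{plot, datG1, arg1 = TRUE}}}}"

# Function regex: {{, then runs of non-brace chars or a recursive
# match of the whole pattern (?R), then }}
fun_rx <- "\\{\\{((?>[^\\{\\{\\}\\}]+|(?R))*)\\}\\}"

# gregexpr() finds every non-overlapping top-level match
blocks <- regmatches(args, gregexpr(fun_rx, args, perl = TRUE))[[1]]
print(blocks)
# [1] "{{space}}"
# [2] "{{plot, datG1, arg1 = TRUE}}"
# [3] "{{bla bla {{plot, datG1, arg1 = TRUE}}}}"
```

This identifies the blocks but not which argument each one belongs to, which is why a separate name = value regex is needed.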
Answer:
You can use
args <- "arg1, arg2 = {{space}}, arg3 = TRUE, arg4 = {{plot, datG1, arg1 = TRUE}}, arg5 = ga, arg6 = {{bla bla {{plot, datG1, arg1 = TRUE}}}}"
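# Group 1 = argument name, Group 2 = optional value ({{...}} block or a word)
rx <- "(\\w+)(?:\\s*=\\s*((\\{\\{((?>(?!\\{\\{|}})(?s).|(?3))*)}})|\\w+))?"
# Row 1 of each matrix is the whole match, rows 2+ are the capture groups
matches <- regmatches(args, gregexec(rx, args, perl=TRUE))
keys <- matches[[1]][2,]     # argument names
values <- matches[[1]][3,]   # argument values ("" when there is no "= value")
values[values==""] <- NA
names(values) <- keys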
Now values is a named character vector holding your data (note that gregexec() requires R 4.1.0 or later). You may also put the data into a data frame with df <- data.frame(params=matches[[1]][2,], values=matches[[1]][3,]).
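Since the question asks for a list rather than a named vector, the steps above can be wrapped in a small function. This is a sketch only: parse_args() is a hypothetical name, and it assumes R 4.1.0 or later for gregexec().

```r
# parse_args() is a hypothetical helper wrapping the answer's regex.
# It returns a named list, with NA for arguments written without "= value".
parse_args <- function(s) {
  rx <- "(\\w+)(?:\\s*=\\s*((\\{\\{((?>(?!\\{\\{|}})(?s).|(?3))*)}})|\\w+))?"
  # No argument name at all: return an empty list instead of failing
  if (!grepl(rx, s, perl = TRUE)) return(list())
  m <- regmatches(s, gregexec(rx, s, perl = TRUE))[[1]]
  values <- m[3, ]             # Group 2: the values
  values[values == ""] <- NA   # bare arguments such as arg1
  res <- as.list(values)
  names(res) <- m[2, ]         # Group 1: the argument names
  res
}

args <- "arg1, arg2 = {{space}}, arg3 = TRUE, arg4 = {{plot, datG1, arg1 = TRUE}}, arg5 = ga, arg6 = {{bla bla {{plot, datG1, arg1 = TRUE}}}}"
parsed <- parse_args(args)
str(parsed)
```

The only difference from the requested list is that arg1 is NA_character_ rather than a logical NA, which keeps every element of the same type.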
Details:
- (\w+) - Group 1: one or more word chars
- (?:\s*=\s*((\{\{((?>(?!\{\{|}})(?s).|(?3))*)}})|\w+))? - an optional sequence of:
  - \s*=\s* - a = char enclosed with zero or more whitespaces
  - ((\{\{((?>(?!\{\{|}})(?s).|(?3))*)}})|\w+) - Group 2:
    - (\{\{((?>(?!\{\{|}})(?s).|(?3))*)}}) - Group 3 (used for recursion): a {{, then zero or more repetitions of either any char (the (?s) flag lets . match line breaks too) that does not start a {{ or }} char sequence, or the Group 3 pattern, and then a }} substring; Group 4 captures the text between the outer {{ and }}
    - | - or
    - \w+ - one or more word chars.
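Group 4, the text between the outermost {{ and }} of a value, is also returned by gregexec() even though the answer's code does not use it. A sketch, assuming you want to feed a function's inner text back into the same regex to list that function's own arguments (inner and m2 are illustrative names):

```r
args <- "arg1, arg2 = {{space}}, arg3 = TRUE, arg4 = {{plot, datG1, arg1 = TRUE}}, arg5 = ga, arg6 = {{bla bla {{plot, datG1, arg1 = TRUE}}}}"
rx <- "(\\w+)(?:\\s*=\\s*((\\{\\{((?>(?!\\{\\{|}})(?s).|(?3))*)}})|\\w+))?"
m <- regmatches(args, gregexec(rx, args, perl = TRUE))[[1]]

# Row 1 is the whole match; rows 2 to 5 hold Groups 1 to 4
inner <- m[5, ]
names(inner) <- m[2, ]
# Content of arg4 without its outer braces
inner["arg4"]

# Running the same regex on that content lists the nested call's arguments
m2 <- regmatches(inner["arg4"], gregexec(rx, inner["arg4"], perl = TRUE))[[1]]
m2[2, ]   # names
m2[3, ]   # values ("" for bare arguments)
```

For arg6, Group 4 still contains the nested {{...}} block, so deciding whether it is a function call or plain text is left to your existing code.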
