I have a data frame whose columns contain the coefficients of a regression model trained on different data sets. Each row corresponds to the model trained on a (possibly) different data set; in the example below, I used the same data set for all three rows. The real data frame has multiple columns for interaction terms, but only one such column is shown in the example below.
> models_t
                    (Intercept)      x1       x2         x3    x1:x3
model1.coefficients  -0.0231804 1.02417 1.024191 -0.0118544 1.001139
model2.coefficients  -0.0231804 1.02417 1.024191 -0.0118544 1.001139
model3.coefficients  -0.0231804 1.02417 1.024191 -0.0118544 1.001139
I build the filter condition as a string, like so:
cond = "x1:x3 > 0"
The goal is to keep only the models that satisfy a condition on the interaction effect. I apply the string with the dplyr and rlang libraries like so:
> models_t %>% dplyr::filter(!!rlang::parse_expr(cond))
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `x1:x3 > 0`.
x Input `..1` must be of size 3 or 1, not size 2.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: In x1:x3 : numerical expression has 3 elements: only the first used
2: In x1:x3 : numerical expression has 3 elements: only the first used
As can be seen, R seems to be interpreting the x1:x3 term as a range rather than as a column name. How does one perform such a filter operation using a string to refer to an interaction term?
User response:
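Wrap the column name in backticks. x1:x3 is not a syntactic R name, so without backticks the parser reads it as the sequence operator : applied to x1 and x3.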
cond = "`x1:x3` > 0"
You can then use it with base R subset() or with dplyr::filter():
subset(models_t, eval(parse(text = cond)))
models_t %>% dplyr::filter(!!rlang::parse_expr(cond))
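For anyone who wants to try this end to end, here is a minimal sketch. The data frame is a hypothetical reconstruction of the table in the question (the original construction code was not posted), and the names unquoted, quoted, kept_dplyr and kept_base are made up for illustration. It first compares how the two strings parse, then applies the backticked condition both ways.

```r
library(dplyr)  # filter() and the %>% pipe
library(rlang)  # parse_expr()

# Assumed reconstruction of the coefficient table from the question.
# check.names = FALSE keeps the non-syntactic names "(Intercept)" and "x1:x3".
models_t <- data.frame(
  `(Intercept)` = rep(-0.0231804, 3),
  x1            = rep(1.02417, 3),
  x2            = rep(1.024191, 3),
  x3            = rep(-0.0118544, 3),
  `x1:x3`       = rep(1.001139, 3),
  row.names     = paste0("model", 1:3, ".coefficients"),
  check.names   = FALSE
)

# Without backticks, the left side of `>` is a call to the `:` operator.
unquoted <- parse_expr("x1:x3 > 0")
is.call(unquoted[[2]])

# With backticks, the left side is a single symbol naming the column.
quoted <- parse_expr("`x1:x3` > 0")
is.symbol(quoted[[2]])

cond <- "`x1:x3` > 0"

# dplyr: unquote the parsed expression inside filter().
kept_dplyr <- models_t %>% filter(!!parse_expr(cond))

# Base R: evaluate the parsed string inside subset().
kept_base <- subset(models_t, eval(parse(text = cond)))

# Every row has x1:x3 = 1.001139, so both results keep all three rows.
nrow(kept_dplyr)
nrow(kept_base)
```

The same backtick rule should apply to any other column whose name is not a syntactic R name, such as (Intercept).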
