How do we refer to a dataframe column with an interaction term using a string variable?

Time:01-23

I have a data frame whose columns contain the coefficients of a regression model. Each row corresponds to the model trained on a (possibly) different data set; in the example below, I used the same data set for all three rows. The real data frame has multiple columns with interaction terms, but the example below shows only one.

> models_t
                    (Intercept)      x1       x2         x3    x1:x3
model1.coefficients  -0.0231804 1.02417 1.024191 -0.0118544 1.001139
model2.coefficients  -0.0231804 1.02417 1.024191 -0.0118544 1.001139
model3.coefficients  -0.0231804 1.02417 1.024191 -0.0118544 1.001139

We are using a string filter condition like so:

cond = "x1:x3 > 0"

in order to filter models that satisfy a condition on the interaction effect. We are using the dplyr and the rlang libraries like so:

> models_t %>% dplyr::filter(!!rlang::parse_expr(cond))
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `x1:x3 > 0`.
x Input `..1` must be of size 3 or 1, not size 2.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: In x1:x3 : numerical expression has 3 elements: only the first used
2: In x1:x3 : numerical expression has 3 elements: only the first used

As can be seen, R seems to be interpreting the x1:x3 term as a range (the `:` sequence operator applied to columns x1 and x3) rather than as a column name. How does one perform such a filter operation using a string to refer to an interaction term?

CodePudding user response:

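Wrap the column name in backticks. x1:x3 is not a syntactically valid R name, so without backticks the parser reads it as the expression x1:x3 instead of a single column.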

cond = "`x1:x3` > 0"

You can then use it in base R subset or dplyr::filter:

subset(models_t, eval(parse(text = cond)))

models_t %>% dplyr::filter(!!rlang::parse_expr(cond))
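
For a reproducible check, here is a minimal sketch that rebuilds a table like the one in the question and applies the backticked condition. The data frame is reconstructed by hand from the printed output (an assumption; in practice models_t would come from the fitted models), and the names kept, kept_base, kept_data and col are made up for illustration. The last step shows an alternative that I believe works when the column name itself is held in a string: dplyr's `.data` pronoun, which avoids parsing text altogether.

```r
library(dplyr)
library(rlang)

# Hand-built stand-in for the question's models_t (assumed values copied
# from the printed output). check.names = FALSE keeps the non-syntactic
# column names "(Intercept)" and "x1:x3" unchanged.
models_t <- data.frame(
  `(Intercept)` = rep(-0.0231804, 3),
  x1            = rep(1.02417, 3),
  x2            = rep(1.024191, 3),
  x3            = rep(-0.0118544, 3),
  `x1:x3`       = rep(1.001139, 3),
  row.names     = paste0("model", 1:3, ".coefficients"),
  check.names   = FALSE
)

# Backticks make the parser treat x1:x3 as one name, not as a range.
cond <- "`x1:x3` > 0"

# dplyr version: parse the string into an expression and unquote it.
kept <- models_t %>% dplyr::filter(!!rlang::parse_expr(cond))

# Base R version: evaluate the parsed string inside subset().
kept_base <- subset(models_t, eval(parse(text = cond)))

# Alternative when only the column name is a string (hypothetical
# variable col): index the .data pronoun, no parsing needed.
col <- "x1:x3"
kept_data <- models_t %>% dplyr::filter(.data[[col]] > 0)
```

Since all three rows have a positive x1:x3 coefficient, each of the three results should keep every row. The `.data[[col]]` form is worth considering if the threshold and the column name are stored separately, because it does not evaluate arbitrary text as code.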