I want to add a new column to a data frame based on a definition stored in a character vector.
In the example below, I want to add the column d defined in expr:
library(magrittr)
data <- tibble::tibble(
a = c(1, 2),
b = c(3, 4)
)
expr <- "d = a + b"
so that the result is the same as writing:
data %>%
dplyr::mutate(d = a + b)
# # A tibble: 2 x 3
# a b d
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
However, in the code below, the calculations themselves (i.e., the addition) work, but the names of the new columns are not what I expected.
data %>%
dplyr::mutate(!!rlang::parse_expr(expr))
# # A tibble: 2 x 3
# a b `d = a + b`
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
data %>%
dplyr::mutate(!!rlang::parse_quo(expr, env = rlang::global_env()))
# # A tibble: 2 x 3
# a b `d = a + b`
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
data %>%
dplyr::mutate(rlang::eval_tidy(rlang::parse_expr(expr)))
# # A tibble: 2 x 3
# a b `rlang::eval_tidy(rlang::parse_expr(expr))`
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
How can I properly use an expression in dplyr::mutate?
My question is similar to this, but in my example, the new variable (d) and its definition (a + b) are given together in a single character vector (expr).
CodePudding user response:
Let's first look at what kind of expressions dplyr::mutate takes to create named variables: it needs a named list of expressions, where each element's name becomes the name of the new column and the expression defines its values.
library(tidyverse)
data <- tibble::tibble(
a = c(1, 2),
b = c(3, 4)
)
expr <- "d = a + b"
# let's rewrite the string above as a named list containing an expression.
expr2 <- list(d = expr(a + b))
# this works as expected:
data %>%
mutate(!!! expr2)
#> # A tibble: 2 x 3
#> a b d
#> <dbl> <dbl> <dbl>
#> 1 1 3 4
#> 2 2 4 6
Now we simply need a function that transforms a string into a named list containing the expression of the right-hand side of the equation. The name needs to be the left-hand side of the equation. We can do this with regular string manipulations. Finally we need to transform the right-hand side of the equation from a string into an expression. We can use base R's str2lang here.
create_expr_ls <- function(str_expr) {
expr_nm <- str_extract(str_expr, "^\\w+")
expr_code <- str_replace_all(str_expr, "(^\\w+\\s?=\\s?)(.*)", "\\2")
set_names(list(str2lang(expr_code)), expr_nm)
}
expr3 <- create_expr_ls(expr)
data %>%
mutate(!!! expr3)
#> # A tibble: 2 x 3
#> a b d
#> <dbl> <dbl> <dbl>
#> 1 1 3 4
#> 2 2 4 6
Created on 2022-01-23 by the reprex package (v0.3.0)
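As a variation on the string-splitting idea above, the whole string can be parsed first and then taken apart as a call, which avoids writing a regular expression. This is only a sketch, not part of the answer above: it assumes expr always has the form "name = expression", and the names parsed, lhs, rhs, and result are made up for illustration.

```r
library(dplyr)
library(rlang)

data <- tibble(a = c(1, 2), b = c(3, 4))
expr <- "d = a + b"

# Parsing "d = a + b" yields a call to `=` with two arguments.
parsed <- parse_expr(expr)

# Element 2 is the left-hand side (the symbol d), element 3 the right-hand side (a + b).
lhs <- as_string(parsed[[2]])
rhs <- parsed[[3]]

# Use the extracted name on the left of := and the extracted expression on the right.
result <- data %>%
  mutate(!!lhs := !!rhs)
```

Because the parser does the splitting, spacing around = should not matter here, but a string without a name = part would need separate handling.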
CodePudding user response:
To get the desired name for the mutated column, you can still use the same syntax and assign the results to a column with the preferred name. To get this name you can use a regular expression to find what is before = and then remove any leading or trailing spaces that might exist.
expr <- "x = a * b"
col_name <- trimws(stringr::str_extract(expr, "[^=]+"))
data %>%
dplyr::mutate(!!col_name := !!rlang::parse_expr(expr))
# A tibble: 2 × 3
a b x
<dbl> <dbl> <dbl>
1 1 3 3
2 2 4 8
data %>%
dplyr::mutate(!!col_name := !!rlang::parse_quo(expr, env = rlang::global_env()))
# A tibble: 2 × 3
a b x
<dbl> <dbl> <dbl>
1 1 3 3
2 2 4 8
data %>%
dplyr::mutate(!!col_name := rlang::eval_tidy(rlang::parse_expr(expr)))
# A tibble: 2 × 3
a b x
<dbl> <dbl> <dbl>
1 1 3 3
2 2 4 8
CodePudding user response:
Any of these work. The second is similar to the first but does not require rlang to be on the search path. The third and fourth also work if the d = part is not present in expr, in which case default names are used. The last one uses only base R and is also the shortest.
data %>% mutate(within(., !!parse_expr(expr)))
data %>% mutate(within(., !!parse(text = expr)))
data %>% mutate(data, !!parse_expr(sprintf("tibble(%s)", expr)))
data %>% { eval_tidy(parse_expr(sprintf("mutate(., %s)", expr))) }
within(data, eval(parse(text = expr))) # base R
Note
Assume this preamble:
library(dplyr)
library(rlang)
# input
data <- tibble(a = c(1, 2), b = c(3, 4))
expr <- "d = a + b"
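To illustrate the remark about default names, here is a small sketch of the "build a mutate call as text" variant applied to a string with no name part. The variable names unnamed_expr and unnamed_result are hypothetical, and the call refers to data directly instead of using the pipe, so it is a simplified form of the fourth alternative rather than a copy of it.

```r
library(dplyr)
library(rlang)

data <- tibble(a = c(1, 2), b = c(3, 4))
unnamed_expr <- "a + b"  # hypothetical input without a `d =` part

# Build the text "mutate(data, a + b)", parse it, and evaluate it.
unnamed_result <- eval_tidy(parse_expr(sprintf("mutate(data, %s)", unnamed_expr)))

# The new column is named after the expression text rather than a chosen name.
names(unnamed_result)
```

If a specific column name is needed, the string has to carry it (as in "d = a + b") or the name has to be supplied separately.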
