I want to apply several different functions simultaneously to one dataframe, then put the results into a list of dataframes. So, for example, I could arrange by one column, then save the output as a new dataframe. Or I could filter some data, then save as another new dataframe (and so on). I feel like there must be an easy way to do this with purrr or apply, but am unsure how to proceed. So, I'm wondering if there is a way to give a list of functions, then return a list of dataframes. Here are some example functions that I apply to mtcars:
library(tidyverse)
filter_df <- function(x, word) {
x %>%
tibble::rownames_to_column("ID") %>%
filter(str_detect(ID, word))
}
a <- filter_df(mtcars, "Merc")
mean_n_df <- function(x, grp, mean2) {
x %>%
group_by({{grp}}) %>%
summarise(mean = mean({{mean2}}), n = n())
}
b <- mean_n_df(mtcars, grp = cyl, mean2 = wt)
rating <- function(x, a, b, c) {
x %>%
rowwise %>%
mutate(rating = ({{a}}*2) ({{b}}-5) * abs({{c}} - 30))
}
c <- rating(mtcars, a = cyl, b = drat, c = qsec)
pct <- function(data, var, round = 4){
var_expr <- rlang::enquo(var)
colnm_expr <- paste(rlang::get_expr(var_expr), "pct", sep = "_")
data %>%
mutate(!! colnm_expr := !!var_expr/sum(!!var_expr) %>%
round(round))
}
d <- pct(mtcars, mpg)
I know that I could run the code above, then just bind each dataframe into a list.
df_list <- list(mtcars, a, b, c, d)
str(df_list, 1)[[1]]
List of 5
$ :'data.frame': 32 obs. of 11 variables:
$ :'data.frame': 7 obs. of 12 variables:
$ : tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
$ : rowwise_df [32 × 12] (S3: rowwise_df/tbl_df/tbl/data.frame)
..- attr(*, "groups")= tibble [32 × 1] (S3: tbl_df/tbl/data.frame)
$ :'data.frame': 32 obs. of 12 variables:
CodePudding user response:
This seems a bit bespoke (since each function requires different parameters), but I'd use Map (or purrr::map2 or purrr::pmap), passing a function and the args for it:
filter_df <- function(x, word) {
x %>%
tibble::rownames_to_column("ID") %>%
filter(str_detect(ID, word))
}
mean_n_df <- function(x, grp, mean2) {
x %>%
group_by({{grp}}) %>%
summarise(mean = mean({{mean2}}), n = n())
}
rating <- function(x, a, b, c) {
x %>%
rowwise %>%
mutate(rating = ({{a}}*2) ({{b}}-5) * abs({{c}} - 30))
}
pct <- function(data, var, round = 4){
var_expr <- rlang::enquo(var)
colnm_expr <- paste(rlang::get_expr(var_expr), "pct", sep = "_")
data %>%
mutate(!! colnm_expr := !!var_expr/sum(!!var_expr) %>%
round(round))
}
The call:
out <- Map(
function(fun, args) do.call(fun, c(list(mtcars), args)),
list(filter_df, mean_n_df, rating, pct),
list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
list(quo(mpg)))
)
lapply(out, head, 3)
# [[1]]
# ID mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Merc 240D 24.4 4 146.7 62 3.69 3.19 20.0 1 0 4 2
# 2 Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2
# 3 Merc 280 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4
# [[2]]
# # A tibble: 3 x 3
# cyl mean n
# <dbl> <dbl> <int>
# 1 4 2.29 11
# 2 6 3.12 7
# 3 8 4.00 14
# [[3]]
# # A tibble: 3 x 12
# # Rowwise:
# mpg cyl disp hp drat wt qsec vs am gear carb rating
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 -2.89
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 -2.28
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 -5.10
# [[4]]
# mpg cyl disp hp drat wt qsec vs am gear carb mpg_pct
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0.03266449
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0.03266449
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 0.03546430
A few things:
Because you demonstrated using the unevaluated symbols (
grp=cyl), we have toquote them first, otherwise they would be evaluated before reaching the functions.You can general this out to arbitrary data by not hard-coding it in the
Mapanon-func, with:out <- Map( function(x, fun, args) do.call(fun, c(list(x), args)), list(mtcars), list(filter_df, mean_n_df, rating, pct), list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)), list(a = quo(cyl), b = quo(drat), c = quo(qsec)), list(quo(mpg))) )where the
list(.)aroundmtcarsis intentional: it appears as length-1 toMap, so it is recycled for the other args (length 4 each). Withoutlist, Map would fail because the first function would see the first column (as a vector), second function second column (and/or warning withlonger argument not a multiple of length of shorter... I really wish mis-aligned recycling in R would fail harder than that).This generalization allows this sequence of functions to be applied each to multiple datasets:
out2 <- lapply(list(mtcars[1:10,], mtcars[11:32,]), function(XYZ) { Map( function(x, fun, args) do.call(fun, c(list(x), args)), list(XYZ), list(filter_df, mean_n_df, rating, pct), list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)), list(a = quo(cyl), b = quo(drat), c = quo(qsec)), list(quo(mpg))) ) })Not sure if you're intending the inception of applying a list of functions to a list of datasets ...
CodePudding user response:
Using invoke with map2 from purrr
library(purrr)
df_list2 <- c(list(mtcars), map2(list(filter_df, mean_n_df, rating, pct),
list("Merc", expression(grp = cyl, mean2 = wt),
expression(a = cyl, b= drat, c = qsec), quote(mpg)),
~ invoke(.x, c(list(mtcars), as.list(.y)))))
-checking
all.equal(df_list2, df_list, check.attributes = FALSE)
[1] TRUE
