I'm working on an R Shiny app and am having trouble with dplyr's group_by() function. I have two functions:
gather_info: finds the category with the highest/lowest mean value
paste_info: calls gather_info and returns the corresponding category and value
The goal is to return a string that, given a data frame and a categorical variable, states the highest- and lowest-performing categories and their values.
Calling gather_info with the appropriate arguments works as expected. However, paste_info consistently returns:
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `grp.col` is not found.
Here's a reproducible example, where the desired output of paste_info is "Given your data, your best performing group is Cat1 scoring 90% and your worst performing group is Cat2 scoring 20%.":
library(dplyr)
library(stringr)  # for str_to_title()

gather_info <- function(df, grp.col, maxm) {
  df |>
    mutate_if(
      .predicate = function(x) is.character(x),
      .funs = function(x) str_to_title(x)
    ) |>
    group_by({{ grp.col }}) |>
    summarize(percentage = round(mean(value, na.rm=TRUE) * 100, 2)) |>
    arrange(desc(percentage)) %>% # magrittr pipe: the braces below rely on its `.` placeholder
    {if (maxm) head(., 1) else tail(., 1)}
}
paste_info <- function(df, grp.col) {
  high_df <- gather_info(df, grp.col, maxm=TRUE)
  low_df <- gather_info(df, grp.col, maxm=FALSE)
  paste0("Given your data, your best performing group is ",
         high_df |> pull(grp.col), " scoring ", high_df$percentage, "%",
         " and your worst performing group is ",
         low_df |> pull(grp.col), " scoring ", low_df$percentage, "%.")
}
df <- data.frame(
  category=c('cat1', 'cat1', 'cat2', 'cat2', 'cat2', 'cat3', 'cat3'),
  value=c(1,0.8,0.2,0.3,0.1,0.5,0.5)
)
# returns category, value with highest mean value
gather_info(df, category, maxm=TRUE)
# returns category, value with lowest mean value
gather_info(df, category, maxm=FALSE)
# does not work
paste_info(df, category)
Any help is much appreciated. Thank you!
Answer:
The issue is that inside paste_info you have to use {{ both when you pass the grouping column grp.col on to gather_info and when you call pull. This is for the same reason you have to use {{ in group_by inside gather_info.
In some sense, {{ translates e.g. gather_info(df, {{ grp.col }}, maxm = TRUE) to gather_info(df, category, maxm = TRUE), i.e. you pass category to gather_info. Without {{, the column name stored in grp.col will not be "injected" into the expression or function call. Hence, gather_info will take grp.col as is and interpret it as the name of the grouping column. But as there is no column named grp.col in your data, you get an error.
For more info on why {{ is needed, see the rlang documentation topic "What is data-masking and why do I need {{?".
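To see the difference in isolation, here is a minimal sketch with two layers of functions. The names inner_mean, outer_ok, outer_bad and the toy data frame are made up for illustration and are not part of the question's code:

```r
library(dplyr)

# Hypothetical helper: groups by a column that is passed unquoted.
inner_mean <- function(df, grp.col) {
  df |>
    group_by({{ grp.col }}) |>
    summarize(avg = mean(value), .groups = "drop")
}

# Forwards the caller's column with {{ }}, so `category` reaches group_by().
outer_ok <- function(df, grp.col) inner_mean(df, {{ grp.col }})

# Forwards the bare argument name, so group_by() is asked for `grp.col`.
outer_bad <- function(df, grp.col) inner_mean(df, grp.col)

toy <- data.frame(category = c("a", "a", "b"), value = c(1, 3, 5))

# Works: one row per category with its mean value.
ok <- outer_ok(toy, category)

# Fails inside group_by(); the error message is captured instead of stopping.
bad <- tryCatch(
  outer_bad(toy, category),
  error = function(e) conditionMessage(e)
)
```

Here ok holds the grouped means, while bad holds the text of the group_by() error, which should be the same kind of message as the one in the question.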
library(dplyr)
paste_info <- function(df, grp.col) {
  high_df <- gather_info(df, {{ grp.col }}, maxm = TRUE)
  low_df <- gather_info(df, {{ grp.col }}, maxm = FALSE)
  paste0(
    "Given your data, your best performing group is ",
    high_df |> pull({{ grp.col }}), " scoring ", high_df$percentage, "%",
    " and your worst performing group is ",
    low_df |> pull({{ grp.col }}), " scoring ", low_df$percentage, "%."
  )
}
paste_info(df, category)
#> [1] "Given your data, your best performing group is Cat1 scoring 90% and your worst performing group is Cat2 scoring 20%."
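Since this is for a Shiny app, the column may well arrive as a string (for example from a select input) rather than as a bare name. In that case {{ is not the right tool. One possible variant, sketched here under that assumption, selects the column with across(all_of(...)). The name gather_info_chr is hypothetical, and the title-casing step is left out to keep the sketch short:

```r
library(dplyr)

# Hypothetical string-based variant: `grp.col` is a column name such as "category".
gather_info_chr <- function(df, grp.col, maxm) {
  out <- df |>
    group_by(across(all_of(grp.col))) |>
    summarize(
      percentage = round(mean(value, na.rm = TRUE) * 100, 2),
      .groups = "drop"
    ) |>
    arrange(desc(percentage))
  # First row is the highest mean, last row the lowest.
  if (maxm) head(out, 1) else tail(out, 1)
}

toy <- data.frame(
  category = c("cat1", "cat1", "cat2", "cat2", "cat2", "cat3", "cat3"),
  value = c(1, 0.8, 0.2, 0.3, 0.1, 0.5, 0.5)
)

best <- gather_info_chr(toy, "category", maxm = TRUE)
worst <- gather_info_chr(toy, "category", maxm = FALSE)

# With a string column name, base `[[` extracts the group label.
best[["category"]]
worst[["category"]]
```

With a string in hand, no forwarding through {{ is needed: the same string can be passed down through any number of wrapper functions.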
