I'm sure I'm missing something about how grouping works. When I use my own function inside summarise (after group_by), I get the same result for every group, which is wrong. I also don't get any errors or warnings; it just silently gives me the wrong answer.
My goal is to get this custom function to play nice with group_by.
Here is the code:
library(dplyr)
#data
transect <- data.frame(acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE", "ABEESC", "ABIBAL", "AMMBRE"),
quad_id = c(1, 1, 1, 1, 2, 2, 2))
#scores
c_scores <- data.frame(acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE"),
c = c(5, 6, 6, 10))
#custom fun
my_fun <- function(data, scores){
join <- left_join(data, scores, by = "acronym")
mean <- mean(join$c)
return(mean)
}
#this works
my_fun(transect, c_scores)
#this also works
transect %>% my_fun(., c_scores)
#this doesn't...
transect %>%
group_by(quad_id) %>%
summarise(mean_c = my_fun(., scores = c_scores))
This is my result:
| quad_id | mean_c |
|---|---|
| 1 | 6.29 |
| 2 | 6.29 |
This is what I want:
| quad_id | mean_c |
|---|---|
| 1 | 6.75 |
| 2 | 5.67 |
CodePudding user response:
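Pass cur_data() to the function instead of the dot. Inside a magrittr pipe, . always refers to the object on the left-hand side of %>%, which here is the whole grouped data frame, so my_fun() sees all seven rows on every call. cur_data() returns only the rows of the current group (without the grouping columns).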
library(dplyr)
transect %>%
group_by(quad_id) %>%
summarise(mean_c = my_fun(cur_data(), scores = c_scores))
Output:
# A tibble: 2 × 2
quad_id mean_c
<dbl> <dbl>
1 1 6.75
2 2 5.67
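In recent dplyr releases (1.1.0 and later), cur_data() is deprecated in favour of pick(). A minimal sketch of the same per-group call, assuming dplyr >= 1.1.0 is installed; the data and my_fun() are copied from the question, with only the local variable inside my_fun() renamed to joined:

```r
library(dplyr)  # assumption: dplyr >= 1.1.0, where pick() is available

# Data and scores from the question
transect <- data.frame(
  acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE", "ABEESC", "ABIBAL", "AMMBRE"),
  quad_id = c(1, 1, 1, 1, 2, 2, 2)
)
c_scores <- data.frame(
  acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE"),
  c = c(5, 6, 6, 10)
)

# Same logic as the question's my_fun()
my_fun <- function(data, scores) {
  joined <- left_join(data, scores, by = "acronym")
  mean(joined$c)
}

# pick(everything()) returns the non-grouping columns of the current group
res <- transect %>%
  group_by(quad_id) %>%
  summarise(mean_c = my_fun(pick(everything()), scores = c_scores))
res
```

On older dplyr versions pick() does not exist, so cur_data() remains the option there.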
If we want the function to emit a message when it receives a grouped data frame, we can check with is_grouped_df:
my_fun2 <- function(data, scores)
{
if(dplyr::is_grouped_df(data))
{
message("data is grouped, so use cur_data() as data")
}
left_join(data, scores, by = "acronym") %>%
pull(c) %>%
mean
}
Testing:
> transect %>%
group_by(quad_id) %>%
summarise(mean_c = my_fun2(., scores = c_scores))
data is grouped, so use cur_data() as data
data is grouped, so use cur_data() as data
# A tibble: 2 × 2
quad_id mean_c
<dbl> <dbl>
1 1 6.29
2 2 6.29
> transect %>%
group_by(quad_id) %>%
summarise(mean_c = my_fun2(cur_data(), scores = c_scores))
# A tibble: 2 × 2
quad_id mean_c
<dbl> <dbl>
1 1 6.75
2 2 5.67
Note that the message is repeated because, inside summarise, the function is called once per group (here, twice). If we call it outside summarise, the message is printed only once:
> transect %>%
group_by(quad_id) %>%
my_fun2(., c_scores)
data is grouped, so use cur_data() as data
[1] 6.285714
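The results above suggest that . still refers to the whole data frame even inside summarise. A quick way to check this, sketched here with the question's data, is to compare nrow(.) with n(); the column names n_dot and n_group are only illustrative:

```r
library(dplyr)

# Data from the question
transect <- data.frame(
  acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE", "ABEESC", "ABIBAL", "AMMBRE"),
  quad_id = c(1, 1, 1, 1, 2, 2, 2)
)

row_counts <- transect %>%
  group_by(quad_id) %>%
  summarise(
    n_dot   = nrow(.),  # rows in `.`, i.e. the full data frame piped in (7 for both groups)
    n_group = n()       # rows in the current group (4 and 3)
  )
row_counts
```

Because n_dot is the same for every group while n_group differs, any function that receives . computes on all rows regardless of the grouping.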
If we want a single function that handles the grouping itself, we can join first and then group inside the function:
my_fun3 <- function(data, scores, grps = NULL)
{
data <- left_join(data, scores, by = "acronym")
if(!missing(grps))
{
data <- data %>%
group_by(across(all_of(grps)))
}
data %>%
summarise(mean_c = mean(c, na.rm = TRUE))
}
Testing:
> my_fun3(transect, c_scores, "quad_id")
# A tibble: 2 × 2
quad_id mean_c
<dbl> <dbl>
1 1 6.75
2 2 5.67
> my_fun3(transect, c_scores)
mean_c
1 6.285714
Or drop the if condition and missing() entirely by using any_of in group_by; when grps is NULL, nothing is selected and the data stay ungrouped:
my_fun3 <- function(data, scores, grps = NULL)
{
left_join(data, scores, by = "acronym") %>%
group_by(across(any_of(grps))) %>%
summarise(mean_c = mean(c, na.rm = TRUE))
}
Testing:
> my_fun3(transect, c_scores, "quad_id")
# A tibble: 2 × 2
quad_id mean_c
<dbl> <dbl>
1 1 6.75
2 2 5.67
> my_fun3(transect, c_scores)
# A tibble: 1 × 1
mean_c
<dbl>
1 6.29
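As a side note on the two selection helpers used in the versions of my_fun3 above: any_of() silently ignores names that are not in the data, whereas all_of() raises an error. A small sketch with select(), where "not_a_column" is a made-up name used only for illustration:

```r
library(dplyr)

# Data from the question
transect <- data.frame(
  acronym = c("ABEESC", "ABIBAL", "AMMBRE", "ANTELE", "ABEESC", "ABIBAL", "AMMBRE"),
  quad_id = c(1, 1, 1, 1, 2, 2, 2)
)

# any_of() keeps the names that exist and drops the rest
kept <- names(select(transect, any_of(c("quad_id", "not_a_column"))))

# all_of() is strict: a missing name is an error, caught here for demonstration
strict <- tryCatch(
  {
    select(transect, all_of(c("quad_id", "not_a_column")))
    "no error"
  },
  error = function(e) "error"
)

kept
strict
```

Which one to prefer depends on whether a misspelled grouping column should fail loudly (all_of) or be skipped (any_of).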
