I have several data frames that I want to join. Before joining them, I'm trying to write a function that handles duplicates in column 1 by grouping on that column and summing the values in column 2. The problem is that I want column 2 to keep its original name, and I can't figure out how to do that.
For example:
fruit_2015 <- data.frame(type = c("kiwi", "pineapple", "kiwi", "raspberry"), count_2015 = 1:4)
library(dplyr)
sum_duplicates <- function(df, x) {
  x <- enquo(x)
  df %>%
    group_by(type) %>%
    summarize(x = sum(!!x))
}
When I do this, the rows are aggregated successfully but the second column is named "x" instead of the original column name.
CodePudding user response:
You can get the desired result by using the := operator with !!x on the left-hand side, so that the captured column name is used as the output name:
library(dplyr)
sum_duplicates <- function(df, x) {
  x <- enquo(x)
  df %>%
    group_by(type) %>%
    summarize(!!x := sum(!!x))
}
sum_duplicates(fruit_2015, count_2015)
#> # A tibble: 3 × 2
#> type count_2015
#> <chr> <int>
#> 1 kiwi 4
#> 2 pineapple 2
#> 3 raspberry 4
As a second option, you could use the curly-curly operator {{ }} as a replacement for the enquo() plus !! pattern, together with glue-style string syntax on the left-hand side of := to build the column name:
sum_duplicates1 <- function(df, x) {
  df %>%
    group_by(type) %>%
    summarize("{{ x }}" := sum({{ x }}))
}
sum_duplicates1(fruit_2015, count_2015)
#> # A tibble: 3 × 2
#> type count_2015
#> <chr> <int>
#> 1 kiwi 4
#> 2 pineapple 2
#> 3 raspberry 4
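Since the name on the left-hand side is a glue string, it can also combine the column name with other text. The sketch below is only an illustration: the function name sum_duplicates_named and its suffix argument are hypothetical and not part of the answer above. It assumes a reasonably recent dplyr/rlang that supports glue strings with :=.

```r
library(dplyr)

fruit_2015 <- data.frame(
  type = c("kiwi", "pineapple", "kiwi", "raspberry"),
  count_2015 = 1:4
)

# Hypothetical variant: `suffix` is an illustrative argument, not from the answer.
# "{{ x }}" injects the name of the column passed as `x`;
# "{suffix}" interpolates an ordinary string from the function environment.
sum_duplicates_named <- function(df, x, suffix = "total") {
  df %>%
    group_by(type) %>%
    summarize("{{ x }}_{suffix}" := sum({{ x }}))
}

result <- sum_duplicates_named(fruit_2015, count_2015)
names(result)
```

Using plain "{{ x }}" keeps the original name, as asked in the question; a suffix like this is only worth adding if the summed columns need to be told apart from the originals.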
CodePudding user response:
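If you drop the `x =` assignment inside summarize(), dplyr generates a default column name from the expression itself. Note that the calls below pass the literal numbers 2 and 1 rather than a column, so every group simply gets that constant:
> sum_duplicates <- function(df, x) {
  x <- enquo(x)
  df %>%
    group_by(type) %>%
    summarize(sum(!!x))
}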
> sum_duplicates(fruit_2015,2)
# A tibble: 3 × 2
type `sum(2)`
<fct> <dbl>
1 kiwi 2
2 pineapple 2
3 raspberry 2
> sum_duplicates(fruit_2015,1)
# A tibble: 3 × 2
type `sum(1)`
<fct> <dbl>
1 kiwi 1
2 pineapple 1
3 raspberry 1
>
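For comparison, here is a minimal sketch of the same unnamed summarize() call with the column itself passed instead of a number. The function name sum_duplicates_unnamed is hypothetical, chosen only to avoid clashing with the definitions above.

```r
library(dplyr)

fruit_2015 <- data.frame(
  type = c("kiwi", "pineapple", "kiwi", "raspberry"),
  count_2015 = 1:4
)

# Same body as above: no name is given to the summary expression,
# so dplyr labels the column with the expression text.
sum_duplicates_unnamed <- function(df, x) {
  x <- enquo(x)
  df %>%
    group_by(type) %>%
    summarize(sum(!!x))
}

result <- sum_duplicates_unnamed(fruit_2015, count_2015)
names(result)
```

The generated name here is `sum(count_2015)`, which reflects the column but is still not the original name, so this approach does not by itself answer the question.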
Personally, I would refactor the function to make it more general. For example, instead of passing in a column, let the function iterate over the value columns and sum each of them.
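One way this refactoring might look is sketched below using across(). This is an assumption about what the answer has in mind, not its actual code: the function name sum_all_duplicates and the group_col argument are hypothetical, and the sketch assumes dplyr 1.0.0 or later.

```r
library(dplyr)

fruit_2015 <- data.frame(
  type = c("kiwi", "pineapple", "kiwi", "raspberry"),
  count_2015 = 1:4
)

# Hypothetical generalisation: group by one column and sum every
# numeric column. across() keeps the original column names by default.
sum_all_duplicates <- function(df, group_col) {
  df %>%
    group_by({{ group_col }}) %>%
    summarize(across(where(is.numeric), sum), .groups = "drop")
}

result <- sum_all_duplicates(fruit_2015, type)
names(result)
```

Because across() reuses the input names, the summed column stays count_2015, and the same function can be applied to each data frame before joining them.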
