I'm trying to summarize a table that has N number of date columns that will always exist at the end of the data set. Is there a way to summarize the last N columns instead of calling each column by name?
df <- df %>% group_by(Project) %>% summarise(???)
| Project | Sep-21 | Oct-21 | Nov-2021 |
|---|---|---|---|
| 1234 | 4 | 5 | 6 |
| 5678 | 10 | 11 | 5 |
| 1234 | 23 | 1 | 87 |
| 5678 | 8 | 16 | 8 |
Expected Output:
| Project | Sep-21 | Oct-21 | Nov-2021 |
|---|---|---|---|
| 1234 | 27 | 6 | 93 |
| 5678 | 18 | 27 | 13 |
CodePudding user response:
A base R solution.
df <- data.frame(
Project = c(1234,5678,1234,5678),
Sep = c(4,10,23,8),
Oct = c(5,11,1,16),
Nov = c(6,5,87,8)
)
t(sapply(split(df, df$Project), function(x) apply(x[,-1L], 2, sum)))
Output
Sep Oct Nov
1234 27 6 93
5678 18 27 13
CodePudding user response:
Indexing with summarise and across may be a bit buggy as the index have to consider the grouping column as well. A neater approach is to select the columns of interest first and then do a group_by and use everything() to select the remaining columns
library(dplyr)
df %>%
select(Project, 7:ncol(.)) %>%
group_by(Project) %>%
summarise(across(everything(), sum, na.rm = TRUE), .groups = 'drop')
If we want to directly do without select, subtract the column index from the number of columns used in the group_by for the range
df %>%
group_by(Project) %>%
summarise(across((7-1):(ncol(.)-1), sum, na.rm = TRUE), .groups = 'drop')
