Home > Back-end >  R Summarise Last N Columns with Dynamic Headers
R Summarise Last N Columns with Dynamic Headers

Time:01-25

I'm trying to summarize a table that has N number of date columns that will always exist at the end of the data set. Is there a way to summarize the last N columns instead of calling each column by name?

df <- df %>% group_by(Project) %>% summarise(???)
Project Sep-21 Oct-21 Nov-2021
1234 4 5 6
5678 10 11 5
1234 23 1 87
5678 8 16 8

Expected Output:

Project Sep-21 Oct-21 Nov-2021
1234 27 6 93
5678 18 27 13

CodePudding user response:

A base R solution.

df <- data.frame(
  Project = c(1234,5678,1234,5678),
  Sep = c(4,10,23,8),
  Oct = c(5,11,1,16),
  Nov = c(6,5,87,8)
)

t(sapply(split(df, df$Project), function(x) apply(x[,-1L], 2, sum)))

Output

     Sep Oct Nov
1234  27   6  93
5678  18  27  13

CodePudding user response:

Indexing with summarise and across may be a bit buggy as the index have to consider the grouping column as well. A neater approach is to select the columns of interest first and then do a group_by and use everything() to select the remaining columns

library(dplyr)
df %>% 
 select(Project, 7:ncol(.)) %>% 
 group_by(Project) %>% 
 summarise(across(everything(), sum, na.rm = TRUE), .groups = 'drop')

If we want to directly do without select, subtract the column index from the number of columns used in the group_by for the range

df %>%
    group_by(Project) %>%
    summarise(across((7-1):(ncol(.)-1), sum, na.rm = TRUE), .groups = 'drop')
  •  Tags:  
  • Related