I have a dataframe df which has some very large number of columns. Say one of those columns is named year, another is named col1, and another is named col2. If I am aggregating by year, how can I aggregate every column of df except col1 and col2 in one line of code using aggregate() (I wish to exclude col1 and col2 from the resulting data frame)? Can I use . notation to do this?
aggregate(. - col1 - col2 ~ year , data = df, FUN = mean, na.action= na.pass)
Something like the above line of code obviously doesn't work, but it may help illuminate what I mean. I want to somehow exclude col1 and col2 from the left hand side of ~ while still using . to capture all of the other variables (since there are many of them). Is this possible? I realize it may be more effective to simply remove these two variables from df before aggregating, but is there a way to achieve this result in just one line of code using only aggregate()?
CodePudding user response:
A tidyverse option would be
library(dplyr)
df %>%
group_by(year) %>%
summarise(across(-c(col1, col2), mean, na.rm = TRUE), .groups = 'drop')
With aggregate, we may also use subset with select option
aggregate(. ~ year, data = subset(df, select = -c(col1, col2)),
FUN = mean, na.rm = TRUE))
CodePudding user response:
aggregate(.~year, data=df[,colnames(df)[!colnames(df) %in% c("col1","col2")]], FUN=mean)
or data.table
library(data.table)
setDT(df)[,lapply(.SD, mean,na.rm=T),by=year,.SDcols=!c("col1","col2")]
