I have a dataframe with multiple columns and want to summarise it row-wise by taking the mean across columns whose names start with a given prefix. The result should contain a single column per prefix.
For example:
iris %>%
  aggregate(. ~ Species, data = ., sum) %>%
  group_by(Species) %>%
  mutate(summarise(across(starts_with(c('Sepal', 'Petal')), mean), .groups = "rowwise"))
produces:
# A tibble: 3 × 6
# Groups:   Species [3]
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>             <dbl>       <dbl>        <dbl>       <dbl> <rowwise_df[,0]>
1 setosa             250.        171.         73.1        12.3
2 versicolor         297.        138.        213          66.3
3 virginica          329.        149.        278.        101.
However, I was expecting a dataframe like the following:
Species Sepal Petal
1 setosa 210.5 41.5
..
..
CodePudding user response:
The code mixes tidyverse with base R (aggregate). This can be done entirely in tidyverse: after grouping by 'Species', get the column-wise sums with across, then take the rowMeans of the columns that share each prefix.
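library(dplyr)
iris %>%
  group_by(Species) %>%
  # column-wise sum of every non-grouping column within each Species
  summarise(across(everything(), sum), .groups = 'drop') %>%
  # row-wise mean over the columns sharing each prefix
  transmute(Species,
            Sepal = rowMeans(across(starts_with("Sepal"))),
            Petal = rowMeans(across(starts_with("Petal"))))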
-output
# A tibble: 3 × 3
  Species    Sepal Petal
  <fct>      <dbl> <dbl>
1 setosa      211.  42.7
2 versicolor  218. 140.
3 virginica   239. 189.
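As a quick sanity check on what this pipeline computes, the setosa row can be reproduced by hand. This is only a minimal sketch; the names `sums`, `setosa`, `sepal_setosa` and `petal_setosa` are introduced here for illustration and are not part of the answer above.

```r
library(dplyr)

# Per-species column sums, as in the first step of the pipeline
sums <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), sum), .groups = "drop")

# Pick the setosa row and average the two summed columns per prefix
setosa <- sums[sums$Species == "setosa", ]
sepal_setosa <- (setosa$Sepal.Length + setosa$Sepal.Width) / 2
petal_setosa <- (setosa$Petal.Length + setosa$Petal.Width) / 2
```

Here `sepal_setosa` is (250.3 + 171.4) / 2 = 210.85, which the tibble prints as 211., and `petal_setosa` is (73.1 + 12.3) / 2 = 42.7.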
If we want to use rowwise instead (note that rowwise is slower than the vectorized rowMeans), request rowwise grouping in summarise and use c_across:
iris %>%
  group_by(Species) %>%
  summarise(across(everything(), sum), .groups = 'rowwise') %>%
  transmute(Sepal = mean(c_across(starts_with("Sepal"))),
            Petal = mean(c_across(starts_with("Petal")))) %>%
  ungroup()
-output
# A tibble: 3 × 3
  Species    Sepal Petal
  <fct>      <dbl> <dbl>
1 setosa      211.  42.7
2 versicolor  218. 140.
3 virginica   239. 189.
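If there are many prefixes, writing one rowMeans line per prefix gets repetitive. One possible way to generalise, sketched here under the assumption that the prefixes are kept in a character vector (the names `prefixes`, `sums`, `means` and `result` are hypothetical), is to loop over the prefixes with sapply and select the matching columns with base R's startsWith:

```r
library(dplyr)

# Assumed: the prefixes of interest, one output column per entry
prefixes <- c("Sepal", "Petal")

# Per-species column sums, as in the answer above
sums <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), sum), .groups = "drop")

# For each prefix, take the row means of the columns whose names start with it;
# sapply() returns a matrix with one column per prefix, named after the prefix
means <- sapply(prefixes, function(p) rowMeans(sums[startsWith(names(sums), p)]))

result <- data.frame(Species = sums$Species, means)
```

`result` then has the columns Species, Sepal and Petal, with the same values as the outputs shown above.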
