I want to calculate the average of certain rows in a column based on another variable. dfin is the original df. I want to create a df like dfout
dfin <- data.frame(c1 = c("a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a3","a3","a3","a3","a3","a3"),
c2 = c("b1","b1","b2","b2","b3","b3","b4","b4","b5","b5","b6","b6","b7","b7","b8","b8","b9","b9"),
c3 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18))
dfout <- data.frame(c1 = c("a1","a1","a1","a2","a2","a2","a3","a3","a3"),
c2 = c("b1","b2","b3","b4","b5","b6","b7","b8","b9"),
c3 = c(1.5,3.5,5.5,7.5,9.5,11.5,13.5,15.5,17.5))
I want to calculate the average of rows in c3 based on c2 information. dfin has three columns c1, c2, and c3
c1 has a1, a2, a3, c2 has b1, b2, b3, to b9, and c3 comprises values.
As can be seen in dfout, I want to create a new df that has calculated average of values in c3 based on c2 groups. Also keeping c1 information.
Any help would be much appreciated.
CodePudding user response:
Do you want this?
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dfin <- data.frame(c1 = c("a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a3","a3","a3","a3","a3","a3"),
c2 = c("b1","b1","b2","b2","b3","b3","b4","b4","b5","b5","b6","b6","b7","b7","b8","b8","b9","b9"),
c3 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18))
dfin %>%
group_by(c1, c2) %>%
summarise(c3 = mean(c3), .groups = 'drop')
#> # A tibble: 9 x 3
#> c1 c2 c3
#> <chr> <chr> <dbl>
#> 1 a1 b1 1.5
#> 2 a1 b2 3.5
#> 3 a1 b3 5.5
#> 4 a2 b4 7.5
#> 5 a2 b5 9.5
#> 6 a2 b6 11.5
#> 7 a3 b7 13.5
#> 8 a3 b8 15.5
#> 9 a3 b9 17.5
Created on 2022-01-19 by the reprex package (v2.0.1)
