Could you help me solve the problem below: as you can see in the second part of the code I exclude the DR that have all columns that are equal to 0. However, in the third part of the code, I need to select D1 until the last column DR, for the sum to be done. But it gives an error, could you help me solve the problem?
library(dplyr)
df1 <- structure(
list(date1 = c("2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28",
"2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28"),
date2 = c("2021-04-02","2021-04-02","2021-04-03","2021-04-08","2021-04-09","2021-04-10","2021-07-01","2021-07-02","2021-07-03","2021-07-03"),
Week= c("Friday","Friday","Saturday","Thursday","Friday","Saturday","Thursday","Friday","Saturday","Saturday"),
D1 = c(2,3,4,4,6,3,4,5,6,2), DR01 = c(4,1,4,3,3,4,3,6,3,2), DR02= c(4,2,6,7,3,2,7,4,4,3),DR03= c(9,5,4,3,3,2,1,5,4,3),
DR04 = c(5,4,3,3,3,6,2,1,9,2),DR05 = c(5,4,5,3,6,2,1,9,3,4),
DR06 = c(2,4,4,3,3,5,6,7,8,3),DR07 = c(2,5,4,4,9,4,7,8,3,3),
DR08 = c(0,0,0,0,1,2,0,0,0,0),DR09 = c(0,0,0,0,0,0,0,0,0,0),DR010 = c(0,0,0,0,0,0,0,0,0,0),DR011 = c(0,4,0,0,0,0,0,0,0,0), DR012 = c(0,0,0,0,0,0,0,0,0,0)),
class = "data.frame", row.names = c(NA, -10L))
df1<-df1 %>%
select(!where(~ is.numeric(.) && all(. == 0)))
df1<-df1 %>%
group_by(date1,date2, Week) %>%
select(D1:DR012) %>%
summarise_all(sum)
CodePudding user response:
We can have the select before
library(dplyr)
df1 %>%
select(date1, date2, Week, matches("^D")) %>%
group_by(date1, date2, Week) %>%
summarise(across(everything(), sum), .groups = 'drop')
-output
# A tibble: 8 × 13
date1 date2 Week D1 DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR011
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-06-28 2021-04-02 Friday 5 5 6 14 9 9 6 7 0 4
2 2021-06-28 2021-04-03 Saturday 4 4 6 4 3 5 4 4 0 0
3 2021-06-28 2021-04-08 Thursday 4 3 7 3 3 3 3 4 0 0
4 2021-06-28 2021-04-09 Friday 6 3 3 3 3 6 3 9 1 0
5 2021-06-28 2021-04-10 Saturday 3 4 2 2 6 2 5 4 2 0
6 2021-06-28 2021-07-01 Thursday 4 3 7 1 2 1 6 7 0 0
7 2021-06-28 2021-07-02 Friday 5 6 4 5 1 9 7 8 0 0
8 2021-06-28 2021-07-03 Saturday 8 5 7 7 11 7 11 6 0 0
After we did the select, it is not clear why we have to select again. It is not really needed as summarise with across can be everything() other than the grouping columns
df1 %>%
select(!where(~ is.numeric(.) && all(. == 0))) %>%
group_by(across(date1:Week)) %>%
summarise(across(everything(), sum), .groups = 'drop')
# A tibble: 8 × 13
date1 date2 Week D1 DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR011
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-06-28 2021-04-02 Friday 5 5 6 14 9 9 6 7 0 4
2 2021-06-28 2021-04-03 Saturday 4 4 6 4 3 5 4 4 0 0
3 2021-06-28 2021-04-08 Thursday 4 3 7 3 3 3 3 4 0 0
4 2021-06-28 2021-04-09 Friday 6 3 3 3 3 6 3 9 1 0
5 2021-06-28 2021-04-10 Saturday 3 4 2 2 6 2 5 4 2 0
6 2021-06-28 2021-07-01 Thursday 4 3 7 1 2 1 6 7 0 0
7 2021-06-28 2021-07-02 Friday 5 6 4 5 1 9 7 8 0 0
8 2021-06-28 2021-07-03 Saturday 8 5 7 7 11 7 11 6 0 0
CodePudding user response:
We could use summarise with across:
library(dplyr)
df1 %>%
select(!where(~ is.numeric(.) && all(. == 0))) %>%
group_by(date1,date2, Week) %>%
summarise(across(where(is.numeric), sum))
date1 date2 Week D1 DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR011
1 2021-06-28 2021-04-02 Friday 2 4 4 9 5 5 2 2 0 0
2 2021-06-28 2021-04-02 Friday 3 1 2 5 4 4 4 5 0 4
3 2021-06-28 2021-04-03 Saturday 4 4 6 4 3 5 4 4 0 0
4 2021-06-28 2021-04-08 Thursday 4 3 7 3 3 3 3 4 0 0
5 2021-06-28 2021-04-09 Friday 6 3 3 3 3 6 3 9 1 0
6 2021-06-28 2021-04-10 Saturday 3 4 2 2 6 2 5 4 2 0
7 2021-06-28 2021-07-01 Thursday 4 3 7 1 2 1 6 7 0 0
8 2021-06-28 2021-07-02 Friday 5 6 4 5 1 9 7 8 0 0
9 2021-06-28 2021-07-03 Saturday 6 3 4 4 9 3 8 3 0 0
10 2021-06-28 2021-07-03 Saturday 2 2 3 3 2 4 3 3 0 0
CodePudding user response:
DR012 is filtered, so it does not exist anymore to select:
df1 %>%
select(!where(~ is.numeric(.) && all(. == 0))) %>%
names()
[1] "date1" "date2" "Week" "D1" "DR01" "DR02" "DR03" "DR04" "DR05"
[10] "DR06" "DR07" "DR08" "DR011"
Change your code to
df1 %>%
group_by(date1,date2, Week) %>%
select(D1:DR011) %>%
summarise_all(sum)
or
df1 %>%
group_by(date1,date2, Week) %>%
select(starts_with("D")) %>%
summarise_all(sum)
