How to adjust select of a function-CodePudding

Could you help me solve the problem below: as you can see in the second part of the code I exclude the DR that have all columns that are equal to 0. However, in the third part of the code, I need to select D1 until the last column DR, for the sum to be done. But it gives an error, could you help me solve the problem?

library(dplyr)

df1 <- structure(
  list(date1 = c("2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28",
                 "2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28"),
       date2 = c("2021-04-02","2021-04-02","2021-04-03","2021-04-08","2021-04-09","2021-04-10","2021-07-01","2021-07-02","2021-07-03","2021-07-03"),
       Week= c("Friday","Friday","Saturday","Thursday","Friday","Saturday","Thursday","Friday","Saturday","Saturday"),
       D1 = c(2,3,4,4,6,3,4,5,6,2), DR01 = c(4,1,4,3,3,4,3,6,3,2), DR02= c(4,2,6,7,3,2,7,4,4,3),DR03= c(9,5,4,3,3,2,1,5,4,3),
       DR04 = c(5,4,3,3,3,6,2,1,9,2),DR05 = c(5,4,5,3,6,2,1,9,3,4),
       DR06 = c(2,4,4,3,3,5,6,7,8,3),DR07 = c(2,5,4,4,9,4,7,8,3,3),
       DR08 = c(0,0,0,0,1,2,0,0,0,0),DR09 = c(0,0,0,0,0,0,0,0,0,0),DR010 = c(0,0,0,0,0,0,0,0,0,0),DR011 = c(0,4,0,0,0,0,0,0,0,0), DR012 = c(0,0,0,0,0,0,0,0,0,0)),
  class = "data.frame", row.names = c(NA, -10L))

df1<-df1 %>%
  select(!where(~ is.numeric(.) && all(. == 0)))

df1<-df1 %>%
  group_by(date1,date2, Week) %>%
  select(D1:DR012) %>%
  summarise_all(sum)

CodePudding user response：

We can have the select before

library(dplyr)
df1 %>%
   select(date1, date2, Week, matches("^D")) %>%
   group_by(date1, date2, Week) %>%
   summarise(across(everything(), sum), .groups = 'drop')

-output

# A tibble: 8 × 13
  date1      date2      Week        D1  DR01  DR02  DR03  DR04  DR05  DR06  DR07  DR08 DR011
  <chr>      <chr>      <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-06-28 2021-04-02 Friday       5     5     6    14     9     9     6     7     0     4
2 2021-06-28 2021-04-03 Saturday     4     4     6     4     3     5     4     4     0     0
3 2021-06-28 2021-04-08 Thursday     4     3     7     3     3     3     3     4     0     0
4 2021-06-28 2021-04-09 Friday       6     3     3     3     3     6     3     9     1     0
5 2021-06-28 2021-04-10 Saturday     3     4     2     2     6     2     5     4     2     0
6 2021-06-28 2021-07-01 Thursday     4     3     7     1     2     1     6     7     0     0
7 2021-06-28 2021-07-02 Friday       5     6     4     5     1     9     7     8     0     0
8 2021-06-28 2021-07-03 Saturday     8     5     7     7    11     7    11     6     0     0

After we did the select, it is not clear why we have to select again. It is not really needed as summarise with across can be everything() other than the grouping columns

df1 %>% 
    select(!where(~ is.numeric(.) && all(. == 0))) %>% 
    group_by(across(date1:Week)) %>%
    summarise(across(everything(), sum), .groups = 'drop')
# A tibble: 8 × 13
  date1      date2      Week        D1  DR01  DR02  DR03  DR04  DR05  DR06  DR07  DR08 DR011
  <chr>      <chr>      <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-06-28 2021-04-02 Friday       5     5     6    14     9     9     6     7     0     4
2 2021-06-28 2021-04-03 Saturday     4     4     6     4     3     5     4     4     0     0
3 2021-06-28 2021-04-08 Thursday     4     3     7     3     3     3     3     4     0     0
4 2021-06-28 2021-04-09 Friday       6     3     3     3     3     6     3     9     1     0
5 2021-06-28 2021-04-10 Saturday     3     4     2     2     6     2     5     4     2     0
6 2021-06-28 2021-07-01 Thursday     4     3     7     1     2     1     6     7     0     0
7 2021-06-28 2021-07-02 Friday       5     6     4     5     1     9     7     8     0     0
8 2021-06-28 2021-07-03 Saturday     8     5     7     7    11     7    11     6     0     0

CodePudding user response：

We could use summarise with across:

library(dplyr)
df1 %>%
  select(!where(~ is.numeric(.) && all(. == 0))) %>% 
  group_by(date1,date2, Week) %>%
  summarise(across(where(is.numeric), sum))

        date1      date2     Week D1 DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR011
1  2021-06-28 2021-04-02   Friday  2    4    4    9    5    5    2    2    0     0
2  2021-06-28 2021-04-02   Friday  3    1    2    5    4    4    4    5    0     4
3  2021-06-28 2021-04-03 Saturday  4    4    6    4    3    5    4    4    0     0
4  2021-06-28 2021-04-08 Thursday  4    3    7    3    3    3    3    4    0     0
5  2021-06-28 2021-04-09   Friday  6    3    3    3    3    6    3    9    1     0
6  2021-06-28 2021-04-10 Saturday  3    4    2    2    6    2    5    4    2     0
7  2021-06-28 2021-07-01 Thursday  4    3    7    1    2    1    6    7    0     0
8  2021-06-28 2021-07-02   Friday  5    6    4    5    1    9    7    8    0     0
9  2021-06-28 2021-07-03 Saturday  6    3    4    4    9    3    8    3    0     0
10 2021-06-28 2021-07-03 Saturday  2    2    3    3    2    4    3    3    0     0

CodePudding user response：

DR012 is filtered, so it does not exist anymore to select:

df1 %>% 
  select(!where(~ is.numeric(.) && all(. == 0))) %>% 
  names()

 [1] "date1" "date2" "Week"  "D1"    "DR01"  "DR02"  "DR03"  "DR04"  "DR05" 
[10] "DR06"  "DR07"  "DR08"  "DR011"

Change your code to

df1 %>%
  group_by(date1,date2, Week) %>%
  select(D1:DR011) %>%
  summarise_all(sum)

df1 %>%
  group_by(date1,date2, Week) %>%
  select(starts_with("D")) %>%
  summarise_all(sum)