do() is superseded! The alternative is to use across(), nest_by(), and summarise(), but how?

I'm doing something quite simple. Given a dataframe of start dates and end dates for specific periods, I want to expand each period into a full sequence binned by week (keeping the period label on each row), then output everything as a single long dataframe.

For instance:

library(tidyverse)
library(lubridate)

# Dataset
  start_dates = ymd_hms(c("2019-05-08 00:00:00",
                          "2020-01-17 00:00:00",
                          "2020-03-03 00:00:00",
                          "2020-05-28 00:00:00",
                          "2020-12-10 00:00:00",
                          "2021-05-07 00:00:00",
                          "2022-01-04 00:00:00"), tz = "UTC")
  
  end_dates = ymd_hms(c( "2019-10-24 00:00:00",
                         "2020-03-03 00:00:00", 
                         "2020-05-28 00:00:00",
                         "2020-12-10 00:00:00",
                         "2021-05-07 00:00:00",
                         "2022-01-04 00:00:00",
                         "2022-01-19 00:00:00"), tz = "UTC") 
  
  # paste0() has no sep argument, and seq(1:7) is just 1:7
  df1 = data.frame(studying = paste0("period", 1:7), start_dates, end_dates)

It was suggested to me to use do(), which currently works fine, but I'd rather not depend on a superseded function. I also have a way of doing it using map2(). The help file (https://dplyr.tidyverse.org/reference/do.html) suggests you can use nest_by(), across() and summarise() to do the same job as do(). How would I go about getting the same result? I've tried a lot of things but I just can't seem to get it.

# do() way to do it
df1 %>% 
  group_by(studying) %>% 
  do(data.frame(week=seq(.$start_dates,.$end_dates,by="1 week")))

# transmute() + map2() way to do it
# (the pipe must end a line, not start one, or R stops parsing early)
df1 %>% 
  transmute(weeks = map2(start_dates, end_dates, seq, by = "1 week"), studying) %>% 
  unnest(cols = c(weeks))
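For checking any of the answers, it helps to know how many rows to expect. Here is a rough base-R sketch (no tidyverse needed) that counts the weekly steps per period from the dates above. The names days_per_period, weeks_per_period, expected_rows and seq_lengths are my own, not part of the original code.

```r
# Same dates as in the question, built with base R instead of lubridate
start_dates <- as.POSIXct(c("2019-05-08", "2020-01-17", "2020-03-03", "2020-05-28",
                            "2020-12-10", "2021-05-07", "2022-01-04"), tz = "UTC")
end_dates   <- as.POSIXct(c("2019-10-24", "2020-03-03", "2020-05-28", "2020-12-10",
                            "2021-05-07", "2022-01-04", "2022-01-19"), tz = "UTC")

# A weekly sequence from start to end has floor(days / 7) + 1 elements
days_per_period  <- as.numeric(difftime(end_dates, start_dates, units = "days"))
weeks_per_period <- floor(days_per_period / 7) + 1
expected_rows    <- sum(weeks_per_period)

# Cross-check by actually building each sequence with seq()
seq_lengths <- vapply(
  seq_along(start_dates),
  function(i) length(seq(start_dates[i], end_dates[i], by = "1 week")),
  integer(1)
)

expected_rows      # 134
sum(seq_lengths)   # 134
```

Both counts come to 134, which matches the tibble headers printed in the answers below.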

CodePudding user response：

You can also use tidyr::complete:

df1 %>% 
  group_by(studying) %>% 
  complete(start_dates = seq(from = start_dates, to = end_dates, by = "1 week")) %>% 
  select(-end_dates, weeks = start_dates)

# A tibble: 134 x 2
# Groups:   studying [7]
   studying weeks              
   <chr>    <dttm>             
 1 period1  2019-05-08 00:00:00
 2 period1  2019-05-15 00:00:00
 3 period1  2019-05-22 00:00:00
 4 period1  2019-05-29 00:00:00
 5 period1  2019-06-05 00:00:00
 6 period1  2019-06-12 00:00:00
 7 period1  2019-06-19 00:00:00
 8 period1  2019-06-26 00:00:00
 9 period1  2019-07-03 00:00:00
10 period1  2019-07-10 00:00:00
# ... with 124 more rows

CodePudding user response：

Not sure if this is exactly what you are looking for, but here is my attempt with rowwise() and unnest():

df1 %>% 
  rowwise() %>% 
  mutate(week = list(seq(start_dates, end_dates, by = "1 week"))) %>% 
  select(studying, week) %>% 
  unnest(cols = c(week))

CodePudding user response：

As the documentation of ?do suggests, we can now use summarise and replace the . with across():

library(tidyverse)
library(lubridate)

df1 %>% 
  group_by(studying) %>% 
  summarise(data.frame(week = seq(across()$start_dates,
                                  across()$end_dates,
                                  by = "1 week")))
#> `summarise()` has grouped output by 'studying'. You can override using the
#> `.groups` argument.
#> # A tibble: 134 x 2
#> # Groups:   studying [7]
#>    studying week               
#>    <chr>    <dttm>             
#>  1 period1  2019-05-08 00:00:00
#>  2 period1  2019-05-15 00:00:00
#>  3 period1  2019-05-22 00:00:00
#>  4 period1  2019-05-29 00:00:00
#>  5 period1  2019-06-05 00:00:00
#>  6 period1  2019-06-12 00:00:00
#>  7 period1  2019-06-19 00:00:00
#>  8 period1  2019-06-26 00:00:00
#>  9 period1  2019-07-03 00:00:00
#> 10 period1  2019-07-10 00:00:00
#> # … with 124 more rows

Created on 2022-01-19 by the reprex package (v0.3.0)
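A note for readers on newer dplyr: as far as I know, from dplyr 1.1.0 onwards summarise() warns when a group returns more than one row, and calling across() with no arguments is deprecated as well. reframe() was added for exactly this multi-row case. A minimal sketch, assuming dplyr >= 1.1.0 and using a cut-down two-period stand-in for df1 (the shortened end dates are made up for illustration):

```r
library(dplyr)

# Two-period stand-in for df1; end dates shortened to keep the output small
df1 <- data.frame(
  studying    = c("period1", "period2"),
  start_dates = as.POSIXct(c("2019-05-08", "2020-01-17"), tz = "UTC"),
  end_dates   = as.POSIXct(c("2019-05-29", "2020-01-31"), tz = "UTC")
)

# reframe() allows any number of rows per group and returns an ungrouped tibble;
# columns are referenced by bare name, so across() is not needed here
weeks <- df1 %>%
  group_by(studying) %>%
  reframe(week = seq(start_dates, end_dates, by = "1 week"))

weeks
```

Here period1 yields 4 weekly rows and period2 yields 3. Since the result is already ungrouped, there is no `.groups` message to silence.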

CodePudding user response：

Although it is marked Experimental, the help file for group_modify does say that

‘group_modify()’ is an evolution of ‘do()’

and, in fact, the code for the example in the question using group_modify is nearly the same as with do.

# with group_modify
df2 <- df1 %>% 
  group_by(studying) %>% 
  group_modify(~ data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))

# with do
df0 <- df1 %>% 
  group_by(studying) %>% 
  do(data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))

identical(df2, df0)
## [1] TRUE

CodePudding user response：

Another approach:

library(tidyverse)

df1 %>%
    group_by(studying) %>%
    summarise(df = tibble(weeks = seq(start_dates, end_dates, by = 'week'))) %>%
    unnest(df)
#> `summarise()` has grouped output by 'studying'. You can override using the `.groups` argument.
#> # A tibble: 134 × 2
#> # Groups:   studying [7]
#>    studying weeks              
#>    <chr>    <dttm>             
#>  1 period1  2019-05-08 00:00:00
#>  2 period1  2019-05-15 00:00:00
#>  3 period1  2019-05-22 00:00:00
#>  4 period1  2019-05-29 00:00:00
#>  5 period1  2019-06-05 00:00:00
#>  6 period1  2019-06-12 00:00:00
#>  7 period1  2019-06-19 00:00:00
#>  8 period1  2019-06-26 00:00:00
#>  9 period1  2019-07-03 00:00:00
#> 10 period1  2019-07-10 00:00:00
#> # … with 124 more rows

Created on 2022-01-20 by the reprex package (v2.0.1)