Home > Blockchain >  Merge consecutive delivery date (with certain condition) into one merge ID in R
Merge consecutive delivery date (with certain condition) into one merge ID in R

Time:01-06

So I was looking to merge consecutive dates into one merge ID (grouped by SHIP_TO). But, if the consecutive date is longer than 3 days, the merge id will be different too. Here are the following code that I used

dhl_mutate <- dhl %>%
  arrange(SHIP_TO) %>%
  group_by(MERGE_ID= cumsum(c(TRUE, diff(DELIVERY_DATE) > 1))) %>%
  ungroup()

here are the following output that I got

SHIP_TO DELIVERY_DATE MERGE_ID
2006 2021-04-01 1
2006 2021-04-02 1
2006 2021-04-03 1
2006 2021-04-04 1
2006 2021-04-19 2
2006 2021-04-23 3
2006 2021-04-27 4
2010 2021-06-01 5
2010 2021-06-02 5
2010 2021-07-05 6
2010 2021-04-07 7

My expected output is that for every three consecutive dates, the system will generate a new merge id and the new delivery date will be the same for the last date within the same merge ID

SHIP_TO DELIVERY_DATE MERGE_ID NEW_DELIVERY_DATE
2006 2021-04-01 1 2021-04-03
2006 2021-04-02 1 2021-04-03
2006 2021-04-03 1 2021-04-03
2006 2021-04-04 2 2021-04-04
2006 2021-04-19 3 2021-04-19
2006 2021-04-23 4 2021-04-23
2006 2021-04-27 5 2021-04-27
2010 2021-06-01 6 2021-06-02
2010 2021-06-02 6 2021-06-02
2010 2021-07-05 7 2021-07-05
2010 2021-04-07 8 2021-04-07

I would be extremely grateful if someone could help me with this, thank you!

CodePudding user response:

Here is one approach using ceiling_date from lubridate -

library(dplyr)
library(lubridate)

df <- df %>%
  mutate(DELIVERY_DATE = as.Date(DELIVERY_DATE), 
         NEW_DELIVERY_DATE = ceiling_date(DELIVERY_DATE, '3 days'), 
         MERGE_ID = match(NEW_DELIVERY_DATE, unique(NEW_DELIVERY_DATE))) %>%
  group_by(MERGE_ID) %>%
  mutate(NEW_DELIVERY_DATE = max(DELIVERY_DATE)) %>%
  ungroup

df

#   SHIP_TO DELIVERY_DATE MERGE_ID NEW_DELIVERY_DATE
#     <int> <date>           <int> <date>           
# 1    2006 2021-04-01           1 2021-04-03       
# 2    2006 2021-04-02           1 2021-04-03       
# 3    2006 2021-04-03           1 2021-04-03       
# 4    2006 2021-04-04           2 2021-04-04       
# 5    2006 2021-04-19           3 2021-04-19       
# 6    2006 2021-04-23           4 2021-04-23       
# 7    2006 2021-04-27           5 2021-04-27       
# 8    2010 2021-06-01           6 2021-06-02       
# 9    2010 2021-06-02           6 2021-06-02       
#10    2010 2021-07-05           7 2021-07-05       
#11    2010 2021-04-07           8 2021-04-07       

ceiling_date rounds the date up every 3 days, using match and unique we get a unique MERGE_ID and for each MERGE_ID we select the max DELIVERY_DATE.

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(SHIP_TO = c(2006L, 2006L, 2006L, 2006L, 2006L, 
2006L, 2006L, 2010L, 2010L, 2010L, 2010L), DELIVERY_DATE = c("2021-04-01", 
"2021-04-02", "2021-04-03", "2021-04-04", "2021-04-19", "2021-04-23", 
"2021-04-27", "2021-06-01", "2021-06-02", "2021-07-05", "2021-04-07"
), MERGE_ID = c(1L, 1L, 1L, 1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L)), 
row.names = c(NA, -11L), class = "data.frame")
  •  Tags:  
  • Related