Home > database >  Calculating the difference between two dates by group with caveat
Calculating the difference between two dates by group with caveat

Time:01-20

Data looks like this:

df <- data.frame(
  id = c(283,994,294,294,1001,1001), 
  stint = c(1,1,1,2,1,2), 
  admit = c("2010-2-3","2011-2-4","2011-3-4","2012-4-1","2016-1-2","2017-2-3"),
  release = c("2011-2-3","2011-2-28","2011-4-1","2014-6-6","2017-2-1","2018-3-1")
)

okay so bear with me because I'm finding this kind of hard to articulate. I need to calculate the difference between the release date of the first stint and the admit date of the second stint by id. so that the difference, which I'm calling the "exposure" should look like this for the sample above

exposure=c(NA,NA,365,NA,2,NA)

So an NA will be returned if there is only 1 stint and if there are more than one stint the exposure period will be calculated using the previous release date and the current admit date. So exposure for stint three will be admit of stint 3 - the release of stint 2.

CodePudding user response:

Here is a dplyr approach. WE would find the value of admit (release) where stint is 2 (1), take the difference, and replace the first entry of exposure with that value for each group of id.

library(dplyr)

df %>% 
  mutate(
    across(c(admit, release), as.Date), 
    exposure = NA_integer_
  ) %>% 
  group_by(id) %>% 
  mutate(exposure = replace(
    exposure, 1L, 
    as.integer(admit[match(2, stint)] - release[match(1, stint)])
  ))
  

Output

# A tibble: 6 x 5
# Groups:   id [4]
     id stint admit      release    exposure
  <dbl> <dbl> <date>     <date>        <int>
1   283     1 2010-02-03 2011-02-03       NA
2   994     1 2011-02-04 2011-02-28       NA
3   294     1 2011-03-04 2011-04-01      366
4   294     2 2012-04-01 2014-06-06       NA
5  1001     1 2016-01-02 2017-02-01        2
6  1001     2 2017-02-03 2018-03-01       NA

CodePudding user response:

You want to calculate the exposure if stint == 2, otherwise return NA. That can be accomplished with ifelse. However, you want the release to be from the previous release date. That can be done with lag. But that will tie exposure values to the admit where exposure ==2, whereas you want exposure to be associated to the previous release used in the calculation. So, remove the first exposure value and add an NA at the end.

  df %>% 
    mutate(across(c(admit, release), as.Date), 
           exposure = c(ifelse(stint == 2, admit - lag(release), NA)[-1], NA))

Which yields

    id stint      admit    release exposure
1  283     1 2010-02-03 2011-02-03       NA
2  994     1 2011-02-04 2011-02-28       NA
3  294     1 2011-03-04 2011-04-01      366
4  294     2 2012-04-01 2014-06-06       NA
5 1001     1 2016-01-02 2017-02-01        2
6 1001     2 2017-02-03 2018-03-01       NA
  •  Tags:  
  • Related