Is there an R function that can calculate the days since a condition of another column?-CodePudding

I want to mutate the original df to add a column to calculate the days since the last math ("M") absence. I wanted the first occurrence of a student's math absence to be NA, and if there weren't any previous math absences, I wanted the value to be Inf.

I ordered the df by date and then wrote out this line of code:

df %>% groupby(Student_ID) %>% mutate(dayssinceM = ifelse(Subject == "M", c(NA, diff(Absent_Date)), Inf))

This worked until a student had a math absence later on in the data. I tried to add another ifelse statement: ifelse(lag(Subject == "M", c(NA, diff(Absent_Date)), Inf)), but it only worked if the math absence was directly previously. I want it to be if the student had any math absence previously. I was thinking maybe adding rollapply somehow. I would love to hear your thoughts and get some help please.

Original df:

 Studen_ID       Absent_Date       Subject        

    4567           08/30/2018          M
    4567           09/22/2019          M
    8345           09/01/2019          SS
    8345           03/30/2019          S         
    8345           07/18/2017          S
    8345           01/08/2019          M

This is the desired output:

 Student_ID       Absent_Date       Subject         dayssinceM            

    4567           08/30/2018          M                 NA
    4567           09/22/2019          M                 388
    8345           07/18/2017          S                 Inf
    8345           01/08/2019          M                 NA        
    8345           03/30/2019          S                 81
    8345           09/01/2019         SS                 236

CodePudding user response：

Probably something along the lines of the below will do it for you

Create an index whether it is M or not
Shove this forward so you have the index for the latest "M"
Use this index to extract the date on last "M" for each row
Calculate diff time

library(lubridate) # for as_date (more consistent than as.Date)
df %>% mutate(mIndex = cumsum(Subject == "M"), 
              lastMdate = Absent_Date[mIndex],
              DaysSinceM = as_date(Absent_Date) - as_date(lastMdate))

CodePudding user response：

Not a very graceful way to do it, but you could build up the answer using joins as follows:

      library(tidyverse)
      
      df <- data.frame(
        Student_ID = c(rep(4567,2), rep(8345,4)),
        Absent_Date = c("2018-08-30","2019-09-22","2019-09-01","2019-03-30","2017-07-18","2019-01-08"),
        Subject = c("M","M","SS","S","S","M")
      )
      
      df_m <- df %>% filter(Subject == "M") %>%
        mutate(dummy = 1) %>%
        rename(M_Date = Absent_Date,
               M_Subject = Subject)
      
      df_daysSinceM <- df %>%
        mutate(dummy = 1) %>%
        full_join(df_m, by=c("Student_ID","dummy")) %>%
        filter(M_Date < Absent_Date) %>%
        mutate(daysSinceM = floor(difftime(Absent_Date, M_Date, units="days"))) %>%
        select(Student_ID, Absent_Date, Subject, daysSinceM)
      
      result <- df %>%
        anti_join(df_daysSinceM, by=c("Student_ID","Absent_Date","Subject")) %>%
        bind_rows(df_daysSinceM)