Home > Software engineering >  Is there an R function that can calculate the days since a condition of another column?
Is there an R function that can calculate the days since a condition of another column?

Time:01-27

I want to mutate the original df to add a column to calculate the days since the last math ("M") absence. I wanted the first occurrence of a student's math absence to be NA, and if there weren't any previous math absences, I wanted the value to be Inf.

I ordered the df by date and then wrote out this line of code:

df %>% groupby(Student_ID) %>% mutate(dayssinceM = ifelse(Subject == "M", c(NA, diff(Absent_Date)), Inf))

This worked until a student had a math absence later on in the data. I tried to add another ifelse statement: ifelse(lag(Subject == "M", c(NA, diff(Absent_Date)), Inf)), but it only worked if the math absence was directly previously. I want it to be if the student had any math absence previously. I was thinking maybe adding rollapply somehow. I would love to hear your thoughts and get some help please.

Original df:

 Studen_ID       Absent_Date       Subject        

    4567           08/30/2018          M
    4567           09/22/2019          M
    8345           09/01/2019          SS
    8345           03/30/2019          S         
    8345           07/18/2017          S
    8345           01/08/2019          M

This is the desired output:

 Student_ID       Absent_Date       Subject         dayssinceM            

    4567           08/30/2018          M                 NA
    4567           09/22/2019          M                 388
    8345           07/18/2017          S                 Inf
    8345           01/08/2019          M                 NA        
    8345           03/30/2019          S                 81
    8345           09/01/2019         SS                 236

CodePudding user response:

Probably something along the lines of the below will do it for you

  1. Create an index whether it is M or not
  2. Shove this forward so you have the index for the latest "M"
  3. Use this index to extract the date on last "M" for each row
  4. Calculate diff time
library(lubridate) # for as_date (more consistent than as.Date)
df %>% mutate(mIndex = cumsum(Subject == "M"), 
              lastMdate = Absent_Date[mIndex],
              DaysSinceM = as_date(Absent_Date) - as_date(lastMdate))

CodePudding user response:

Not a very graceful way to do it, but you could build up the answer using joins as follows:

      library(tidyverse)
      
      df <- data.frame(
        Student_ID = c(rep(4567,2), rep(8345,4)),
        Absent_Date = c("2018-08-30","2019-09-22","2019-09-01","2019-03-30","2017-07-18","2019-01-08"),
        Subject = c("M","M","SS","S","S","M")
      )
      
      df_m <- df %>% filter(Subject == "M") %>%
        mutate(dummy = 1) %>%
        rename(M_Date = Absent_Date,
               M_Subject = Subject)
      
      df_daysSinceM <- df %>%
        mutate(dummy = 1) %>%
        full_join(df_m, by=c("Student_ID","dummy")) %>%
        filter(M_Date < Absent_Date) %>%
        mutate(daysSinceM = floor(difftime(Absent_Date, M_Date, units="days"))) %>%
        select(Student_ID, Absent_Date, Subject, daysSinceM)
      
      result <- df %>%
        anti_join(df_daysSinceM, by=c("Student_ID","Absent_Date","Subject")) %>%
        bind_rows(df_daysSinceM)
  •  Tags:  
  • Related