I want to mutate the original df to add a column to calculate the days since the last math ("M") absence. I wanted the first occurrence of a student's math absence to be NA, and if there weren't any previous math absences, I wanted the value to be Inf.
I ordered the df by date and then wrote out this line of code:
df %>% groupby(Student_ID) %>% mutate(dayssinceM = ifelse(Subject == "M", c(NA, diff(Absent_Date)), Inf))
This worked until a student had a math absence later on in the data. I tried to add another ifelse statement: ifelse(lag(Subject == "M", c(NA, diff(Absent_Date)), Inf)), but it only worked if the math absence was directly previously. I want it to be if the student had any math absence previously. I was thinking maybe adding rollapply somehow. I would love to hear your thoughts and get some help please.
Original df:
Studen_ID Absent_Date Subject
4567 08/30/2018 M
4567 09/22/2019 M
8345 09/01/2019 SS
8345 03/30/2019 S
8345 07/18/2017 S
8345 01/08/2019 M
This is the desired output:
Student_ID Absent_Date Subject dayssinceM
4567 08/30/2018 M NA
4567 09/22/2019 M 388
8345 07/18/2017 S Inf
8345 01/08/2019 M NA
8345 03/30/2019 S 81
8345 09/01/2019 SS 236
CodePudding user response:
Probably something along the lines of the below will do it for you
- Create an index whether it is M or not
- Shove this forward so you have the index for the latest "M"
- Use this index to extract the date on last "M" for each row
- Calculate diff time
library(lubridate) # for as_date (more consistent than as.Date)
df %>% mutate(mIndex = cumsum(Subject == "M"),
lastMdate = Absent_Date[mIndex],
DaysSinceM = as_date(Absent_Date) - as_date(lastMdate))
CodePudding user response:
Not a very graceful way to do it, but you could build up the answer using joins as follows:
library(tidyverse)
df <- data.frame(
Student_ID = c(rep(4567,2), rep(8345,4)),
Absent_Date = c("2018-08-30","2019-09-22","2019-09-01","2019-03-30","2017-07-18","2019-01-08"),
Subject = c("M","M","SS","S","S","M")
)
df_m <- df %>% filter(Subject == "M") %>%
mutate(dummy = 1) %>%
rename(M_Date = Absent_Date,
M_Subject = Subject)
df_daysSinceM <- df %>%
mutate(dummy = 1) %>%
full_join(df_m, by=c("Student_ID","dummy")) %>%
filter(M_Date < Absent_Date) %>%
mutate(daysSinceM = floor(difftime(Absent_Date, M_Date, units="days"))) %>%
select(Student_ID, Absent_Date, Subject, daysSinceM)
result <- df %>%
anti_join(df_daysSinceM, by=c("Student_ID","Absent_Date","Subject")) %>%
bind_rows(df_daysSinceM)
