Home > Mobile >  filtering duplicated rows with maximum values in a column with NAs
filtering duplicated rows with maximum values in a column with NAs

Time:01-22

I have a dataframe that kinda looks like this:

studentID Enrollment_date (class=Date)
56783536 2019-02-01
56783536 2019-02-12
56783406 NA
56783406 NA
56783522 2019-03-02
56783522 NA
56783599 NA
56783599 NA

I'm trying to filter the rows in order to 1) keep the row with the maximum date, 2) keep both duplicated rows when both enrollment dates are NA and 3) when one date is NA and the other isn't, keep the one row without NA. So it should output this:

studentID Enrollment_date (class=Date)
56783536 2019-02-12
56783406 NA
56783406 NA
56783522 2019-03-02
56783599 NA
56783599 NA

I tried this:

dup2_ENROLL_2020 <- dup_ENROLL_2020 %>%
  group_by(studentID) %>%
  filter(Enrollment_date==max(Enrollment_date))

And this:

dup2_ENROLL_2020 <- dup_ENROLL_2020 %>%
  group_by(studentID) %>%
  slice(which.max(Enrollment_date))

With a bunch of variations of !is.na() and na.omit() in them. None of them worked, somebody help me (sorry for the rookie problem)

CodePudding user response:

Try this:

dat %>%
  group_by(studentID) %>%
  filter(if (all(is.na(Enrollment_date))) TRUE else Enrollment_date == max(Enrollment_date, na.rm = TRUE)) %>%
  ungroup()
# # A tibble: 6 x 2
#   studentID Enrollment_date
#       <int> <date>         
# 1  56783536 2019-02-12     
# 2  56783406 NA             
# 3  56783406 NA             
# 4  56783522 2019-03-02     
# 5  56783599 NA             
# 6  56783599 NA             
  •  Tags:  
  • Related