Removing IDs based on months

Time:01-09

I have a data set ranging from the first of January 2010 to the end of December 2012. I would like to identify the IDs that are missing specific months, then remove those IDs from the data set entirely, while keeping every month for the remaining IDs.

For example: if ID "B" were missing Month 2 (not simulated in the data set below), I would like to remove ID "B" from the whole data set and keep A, C, and D, with all of the months in the data set for those IDs intact.

How would I do this?

library(lubridate)
library(tidyverse)
date <- rep_len(seq(dmy("01-01-2010"), dmy("31-12-2011"), by = "days"), 5000)
ID <-  rep(c("A","B","C"), 5000)
df <- data.frame(date = date,
                 x = runif(length(date), min = 60000, max = 80000),
                 y = runif(length(date), min = 800000, max = 900000),
                 ID)

df$jDate <- yday(as.Date(df$date))
df$Month <- month(df$date)
df$year <- year(df$date)

set.seed(1234)
drop_rows <- sapply(sample(1:nrow(df), 3), function(i) {
  return(i:(i + 100))
}, simplify = FALSE) %>% unlist()

df <- df[-c(drop_rows), ]

CodePudding user response:

We can group by 'year' and 'ID', count the distinct (n_distinct) 'Month' values in each group, and keep only the groups where that count equals 12 in filter. Any 'year'/'ID' combination that doesn't have 12 unique months will be dropped.

library(dplyr)
df %>%
   group_by(year, ID) %>% 
   filter(n_distinct(Month) == 12) %>% 
   ungroup()
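
Note that the filter above drops only the incomplete 'year'/'ID' combinations, so an ID can still survive in the years where it is complete. If the goal is to remove an ID everywhere as soon as it is missing a month in any year, one possible sketch is below. The toy data frame and the names toy, incomplete_ids, and toy_complete are made up for illustration and are not part of the original post.

```r
library(dplyr)

# Hypothetical toy data: every ID has all 12 months in both years,
# except ID "B", which has no rows for month 2 in 2010
toy <- expand.grid(ID = c("A", "B", "C"),
                   year = c(2010, 2011),
                   Month = 1:12,
                   stringsAsFactors = FALSE)
toy <- toy[!(toy$ID == "B" & toy$year == 2010 & toy$Month == 2), ]

# IDs with fewer than 12 distinct months in at least one year
incomplete_ids <- toy %>%
  group_by(ID, year) %>%
  summarise(n_months = n_distinct(Month), .groups = "drop") %>%
  filter(n_months < 12) %>%
  distinct(ID) %>%
  pull(ID)

# Drop those IDs entirely; all rows for the remaining IDs are kept
toy_complete <- toy %>%
  filter(!ID %in% incomplete_ids)

unique(toy_complete$ID)
```

This only checks the years in which an ID actually appears, so an ID with a whole year absent would not be flagged by it. For the simulated df, the same two steps should apply with df in place of toy, since it already has ID, year, and Month columns.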