I have a data set ranging from the first of January 2010 to the end of December 2012. I would like to identify IDs that are missing specific months, then remove those IDs from the data set entirely, while keeping all of the months for the remaining IDs.
For example:
If ID "B" was missing Month 2 (not simulated in the data set below), I would like to remove ID "B" from the whole data set and keep A, C, and D, with all of the months in the data set for those IDs intact.
How would I do this?
library(lubridate)
library(tidyverse)
date <- rep_len(seq(dmy("01-01-2010"), dmy("31-12-2011"), by = "days"), 5000)
ID <- rep(c("A","B","C"), 5000)
df <- data.frame(date = date,
                 x = runif(length(date), min = 60000, max = 80000),
                 y = runif(length(date), min = 800000, max = 900000),
                 ID)
df$jDate <- yday(as.Date(df$date))
df$Month <- month(df$date)
df$year <- year(df$date)
set.seed(1234)
drop_rows <- sapply(sample(1:nrow(df), 3), function(i) {
  return(i:(i + 100))
}, simplify = FALSE) %>% unlist()
df <- df[-c(drop_rows), ]
CodePudding user response:
We can group by 'year' and 'ID', count the distinct 'Month' values with n_distinct(), and keep only the groups where that count equals 12 in filter(). Any 'year'/'ID' combination that does not have 12 unique months is dropped.
library(dplyr)
df %>%
  group_by(year, ID) %>%
  filter(n_distinct(Month) == 12) %>%
  ungroup()
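
Note that filtering within each 'year'/'ID' group removes only the incomplete year for an ID, so an ID with one incomplete year would still keep its other years. If the goal is to drop such an ID from the whole data set, one possible approach is sketched below. It is a minimal sketch on a made-up toy data frame (the names toy, incomplete_ids, and complete are hypothetical, not from the question), and it assumes every year is expected to contain all 12 months:

```r
library(dplyr)

# Hypothetical toy data: three IDs, two years, all 12 months,
# except that ID "B" is missing Month 2 in 2010.
toy <- expand.grid(ID = c("A", "B", "C"),
                   year = 2010:2011,
                   Month = 1:12,
                   stringsAsFactors = FALSE) %>%
  filter(!(ID == "B" & year == 2010 & Month == 2))

# IDs that have fewer than 12 distinct months in at least one year
incomplete_ids <- toy %>%
  group_by(ID, year) %>%
  summarise(n_months = n_distinct(Month), .groups = "drop") %>%
  filter(n_months < 12) %>%
  distinct(ID) %>%
  pull(ID)

# Remove those IDs everywhere; the remaining IDs keep all of their rows
complete <- toy %>%
  filter(!ID %in% incomplete_ids)
```

With the question's data, the same two steps could be applied to df in place of toy. If the final year in the real data is only partially covered, every ID would count as incomplete under this assumption, so that year may need to be excluded before counting months.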
