I have a data.frame of dates and corresponding values.
date <- Sys.Date() sort(sample(1:26, 26))
values <- c(sample(400:415, 10, replace = TRUE),
420, 421, 422, 420, 419, 421,
sample(430:435, 10, replace = TRUE))
df <- data.frame(date, values)
I want to filter the data to rows where the range 420±2 is repeated at least five times when the date is in ascending order. The final version of df would only contain rows with values of 420, 421, 422, 420, 419, 421, as the sample() data would be removed.
I am looking for both dplyr and data.table solutions.
CodePudding user response:
Here is one option with dplyr
library(dplyr)
library(data.table)
df %>%
mutate(values2 = between(values, 420-2, 420 2)) %>%
group_by(grp = rleid(values2)) %>%
filter(n() > 5, all(values2)) %>%
ungroup %>%
select(-values2, -grp)
-output
# A tibble: 6 × 2
date values
<date> <dbl>
1 2022-09-04 420
2 2022-09-05 421
3 2022-09-06 422
4 2022-09-07 420
5 2022-09-08 419
6 2022-09-09 421
Or using base R with rle
subset(df, inverse.rle(within.list(rle(values >= (420-2) &
values <= (420 2)), {values[values & lengths < 5] <- FALSE})))
date values
11 2022-09-04 420
12 2022-09-05 421
13 2022-09-06 422
14 2022-09-07 420
15 2022-09-08 419
16 2022-09-09 421
