Create a rule condition based on count and dates

I'd like to create a rule condition based on counts and dates. Here is what I tried:

    # Package
    library(dplyr)

    # Open data set
    pred_avg<- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/cc_mean_CI.csv")
    str(pred_avg)
   
    # Classify each row: "bad" if canopycoverSDmin < covermin, otherwise "good"
    pred_avg$class<-ifelse(pred_avg$canopycoverSDmin<pred_avg$covermin,"bad","good")
    pred_avg <-pred_avg[,-c(1,3:8)]
    #'data.frame':  17 obs. of  2 variables:
    #$ DATE : chr  "2021-06-04" "2021-06-14" "2021-06-24" "2021-07-04" ...
    #$ class: chr  "good" "good" "bad" "bad" ...

    # Now I'd like to make a decision: "attack" if I have 3 "bad"s, otherwise "monitoring"
    pred_avg_final <- pred_avg %>% group_by(DATE) %>% mutate(class = factor(class)) %>%
            count(class, name = "occurencies", .drop = FALSE) %>%
            summarize(decision=ifelse(occurencies>=3,"attack","monitoring"))
    pred_avg_final
    # A tibble: 34 x 2
    # Groups:   DATE [17]
    #    DATE       decision  
    #    <chr>      <chr>     
    #  1 2021-06-04 monitoring
    #  2 2021-06-04 monitoring
    #  3 2021-06-14 monitoring
    #  4 2021-06-14 monitoring
    #  5 2021-06-24 monitoring
    #  6 2021-06-24 monitoring
    #  7 2021-07-04 monitoring
    #  8 2021-07-04 monitoring
    #  9 2021-07-09 monitoring
    # 10 2021-07-09 monitoring

But there is one problem I haven't managed to solve. I'd like to apply the condition ifelse(occurencies>=3,"attack","monitoring") only to consecutive (neighbouring) dates, not to non-consecutive ones. For example, I have "bad" on 2021-06-24, 2021-07-04 and 2021-07-09 (consecutive dates), so the decision on 2021-07-09 is "attack". For every other date, through the end of the data, the decision is "monitoring", because there is no other run of 3 "bad"s on consecutive dates.

My desired output is:

    #          DATE class   decision
    # 1  2021-06-04  good monitoring
    # 2  2021-06-14  good monitoring
    # 3  2021-06-24   bad monitoring
    # 4  2021-07-04   bad monitoring
    # 5  2021-07-09   bad     attack
    # 6  2021-07-19  good monitoring
    # 7  2021-07-24  good monitoring
    # 8  2021-08-03  good monitoring
    # 9  2021-08-08  good monitoring
    # 10 2021-08-13  good monitoring
    # 11 2021-08-23   bad monitoring
    # 12 2021-09-02  good monitoring
    # 13 2021-09-07  good monitoring
    # 14 2021-09-22   bad monitoring
    # 15 2021-10-22   bad monitoring
    # 16 2021-12-06  good monitoring
    # 17 2021-12-26  good monitoring

Any help, please?

CodePudding user response:

You can take a look at previous values with the function dplyr::lag(). Is this what you're looking for?

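    # Assumes rows are sorted by DATE; default = "good" avoids NA decisions in the first two rows
    pred_avg %>% 
      mutate(decision = ifelse(class == "bad" & lag(class, 1, default = "good") == "bad" & lag(class, 2, default = "good") == "bad", "attack", "monitoring"))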
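If the threshold might change later, chaining more lag() calls gets awkward. Here is a minimal base R sketch of one alternative: count the length of the current run of "bad" rows with rle() and sequence(), then compare it to a threshold. The data frame is retyped from the desired output in the question so the example runs on its own. The names n_bad and bad_streak are made up for illustration, and the sketch assumes "neighbouring dates" means consecutive rows after sorting by DATE.

```r
# Data retyped from the desired output in the question
pred_avg <- data.frame(
  DATE = c("2021-06-04", "2021-06-14", "2021-06-24", "2021-07-04", "2021-07-09",
           "2021-07-19", "2021-07-24", "2021-08-03", "2021-08-08", "2021-08-13",
           "2021-08-23", "2021-09-02", "2021-09-07", "2021-09-22", "2021-10-22",
           "2021-12-06", "2021-12-26"),
  class = c("good", "good", "bad", "bad", "bad", "good", "good", "good", "good",
            "good", "bad", "good", "good", "bad", "bad", "good", "good"),
  stringsAsFactors = FALSE
)

n_bad <- 3  # hypothetical threshold: consecutive "bad" rows needed for "attack"

# Make sure rows are in date order before looking at runs
pred_avg <- pred_avg[order(as.Date(pred_avg$DATE)), ]

is_bad <- pred_avg$class == "bad"

# sequence(rle(...)$lengths) numbers the rows inside each run (1, 2, 3, ...);
# multiplying by is_bad resets the counter to 0 on "good" rows
pred_avg$bad_streak <- sequence(rle(is_bad)$lengths) * is_bad

pred_avg$decision <- ifelse(pred_avg$bad_streak >= n_bad, "attack", "monitoring")
pred_avg
```

With this data only the 2021-07-09 row reaches a streak of 3. Like the lag() version, a fourth or fifth "bad" in a row would also be labelled "attack"; use `bad_streak == n_bad` instead if only the third one should trigger. The same two expressions should also work inside dplyr::mutate() if you prefer to stay in a pipe.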