I'm trying to mark all dates, which fall within several ranges in a different table.
The events table among other variables contains start_date and end_date of events:
events <- tibble(
name = c("Event A", "Event B"),
start_date = as.Date(c("2021-10-17", "2021-02-19")),
end_date = as.Date(c("2021-10-19", "2021-02-10"))
)
The date_info table contains date, statistic and value information in the long format for all days of the year:
date_info <- tibble(
date = as.Date(c("2021-10-16", "2021-10-16", "2021-10-17", "2021-10-17")),
statistic = c("var1", "var2", "var1", "var2"),
value = c(10, 54, 23, 34)
)
I need to make a new column in date_info to mark dates which fall within any date range of events.
I've tried the approach below, but it works only if there is one event in events
library(tidyverse)
date_info %>%
mutate(in_range = if_else(date < events$start_date | date > events$end_date, FALSE, TRUE))
I thought about creating a date_range vector in events such that code below can be used to mark the dates:
library(tidyverse)
date_info %>%
mutate(in_range = if_else(date %in% events$date_range, TRUE, FALSE))
However I'm not sure that this is the best approach. Additionally I'm not sure how to get such date range as seq() works on a single start/end date pair rather than a vector.
CodePudding user response:
This can be done as a range-based or non-equi join. Unfortunately, dplyr alone cannot do it, but one of the following should work fine.
The code below assigns the particular events$name to each row, not just an "in range" indicator. It's not hard to simplify that with in_range = !is.na(name) or similar.
fuzzyjoin
# library(fuzzyjoin)
date_info %>%
fuzzyjoin::fuzzy_left_join(events,
by = c(date = "start_date", date = "end_date"),
match_fun = list(`>=`, `<=`))
# # A tibble: 4 x 6
# date statistic value name start_date end_date
# <date> <chr> <dbl> <chr> <date> <date>
# 1 2021-10-16 var1 10 NA NA NA
# 2 2021-10-16 var2 54 NA NA NA
# 3 2021-10-17 var1 23 Event A 2021-10-17 2021-10-19
# 4 2021-10-17 var2 34 Event A 2021-10-17 2021-10-19
sqldf
# library(sqldf)
sqldf::sqldf("
select t1.*, t2.name
from date_info t1
left join events t2 on t1.date between t2.start_date and t2.end_date")
# date statistic value name
# 1 2021-10-16 var1 10 <NA>
# 2 2021-10-16 var2 54 <NA>
# 3 2021-10-17 var1 23 Event A
# 4 2021-10-17 var2 34 Event A
data.table
library(data.table)
date_info_DT <- as.data.table(date_info)
events_DT <- as.data.table(events)
date_info_DT[events_DT, name := i.name,
on = .(date >= start_date, date <= end_date)][]
# date statistic value name
# <Date> <char> <num> <char>
# 1: 2021-10-16 var1 10 <NA>
# 2: 2021-10-16 var2 54 <NA>
# 3: 2021-10-17 var1 23 Event A
# 4: 2021-10-17 var2 34 Event A
(There's also data.table::foverlaps, which requires the second data.table to be keyed.)
Another option, a bit simpler (not requiring class-changes):
date_info %>%
mutate(in_range = data.table::inrange(date, events$start_date, events$end_date))
# # A tibble: 4 x 4
# date statistic value in_range
# <date> <chr> <dbl> <lgl>
# 1 2021-10-16 var1 10 FALSE
# 2 2021-10-16 var2 54 FALSE
# 3 2021-10-17 var1 23 TRUE
# 4 2021-10-17 var2 34 TRUE
CodePudding user response:
Here's a solution using map from the purrr package that should work. It could be more concise but I made it very explicit so it's not overwhelming if you're not familiar with the syntax.
date_info |>
mutate(
in_range_n = map_dbl(date, .f = function(date){
filter(events, start_date <= date, end_date >= date) |>
nrow()
}),
in_range = in_range_n > 0
) |>
select(-in_range_n)
Output:
# A tibble: 4 x 4
date statistic value in_range
<date> <chr> <dbl> <lgl>
1 2021-10-16 var1 10 FALSE
2 2021-10-16 var2 54 FALSE
3 2021-10-17 var1 23 TRUE
4 2021-10-17 var2 34 TRUE
Let me know if I misunderstood the problem!
CodePudding user response:
Using base r
date_info$in_range <- sapply(date_info$date, function(date){
any(date >= events$start_date & date <= events$end_date)
})
gives
date statistic value in_range
<date> <chr> <dbl> <lgl>
1 2021-10-16 var1 10 FALSE
2 2021-10-16 var2 54 FALSE
3 2021-10-17 var1 23 TRUE
4 2021-10-17 var2 34 TRUE
