I have two dataframes. The first dataframe Observations shows the date/times that a surveyor recorded events of interest, and a unique ID number for each type of event:
library(lubridate)
Observations <- data.frame(Time = dmy_hms(paste(c("13-7-2022 10:01:01","13-7-2022 14:02:01","15-7-2022 10:01:01", "15-7-2022 16:01:01"))), ID = c(1,3,1))
The second dataframe Sites shows what date/time the surveyor started and stopped looking for events (i.e., this describes the possible times that events could have been observed). TimeStart is the time the surveyor began looking for an event, TimeEnd is when they stopped. Sites also contains the latitude and longitude where the surveyor was looking for events between TimeStart and TimeEnd.
Sites <- data.frame(TimeStart = dmy_hms(paste(c("13-7-2022 10:00:00","13-7-2022 14:00:00","15-7-2022 10:00:00", "15-7-2022 16:00:00"))),
TimeEnd = dmy_hms(paste(c("13-7-2022 10:05:00","13-7-2022 14:05:00","15-7-2022 10:05:00", "15-7-2022 16:05:00"))),
Latitude = c("11.1111", "11.2222", "11.1234", "11.1487"),
Longitude = c("99.1257", "99.3478", "99.6241", "99.6214"))
So the Time that events are recorded by the surveyor (i.e., recorded in Observations) falls within one of the time ranges shown in Sites$TimeStart and Sites$TimeEnd.
I would like to merge these two dataframes so that rows for each event (ID) recorded in Observations contains the Latitude and Longitude where the surveyor was searching during the corresponding Time, as well as when they started TimeStart and stopped TimeEnd searching for each period.
In the end, Observations would look like this:
Time ID Latitude Longitude TimeStart TimeEnd
2022-07-13 10:01:01 1 11.1111 99.1257 2022-07-13 10:00:00 2022-07-13 10:05:00
2022-07-13 14:02:01 3 11.2222 99.3478 2022-07-13 14:00:00 2022-07-13 14:05:00
2022-07-15 10:01:01 2 11.1234 99.6241 2022-07-15 10:00:00 2022-07-15 10:05:00
2022-07-15 16:01:01 1 11.1487 99.6214 2022-07-15 16:00:00 2022-07-15 16:05:00
How can we merge this data by times when Observations$Time falls within a "range of times" shown in Sites$TimeStart and Sites$TimeEnd?
CodePudding user response:
We can do this by retrieving the index (row number) in Sites that fulfills the time condition:
Observations$siteindex <- sapply(Observations$Time, function(x) which(x<=Sites$TimeEnd&x>=Sites$TimeStart)[1]) # first matching row into Sites
Sites$siteindex <- 1:nrow(Sites)
result <- merge(Observations, Sites, by="siteindex")
siteindex Time ID TimeStart TimeEnd Latitude Longitude
1 1 2022-07-13 10:01:01 1 2022-07-13 10:00:00 2022-07-13 10:05:00 11.1111 99.1257
2 2 2022-07-13 14:02:01 3 2022-07-13 14:00:00 2022-07-13 14:05:00 11.2222 99.3478
3 3 2022-07-15 10:01:01 1 2022-07-15 10:00:00 2022-07-15 10:05:00 11.1234 99.6241
4 4 2022-07-15 16:01:01 2 2022-07-15 16:00:00 2022-07-15 16:05:00 11.1487 99.6214
The data might be a bit different because your data.frame is incorrect: arguments imply differing number of rows: 4, 3
