I have two dataframes insitu and model:
dput(head(insitu,20))
structure(list(ID = c("AUR", "AUR", "AUR", "AUR", "AUR", "AUR",
"LAM", "LAM", "LAM", "LAM", "LAM", "LAM"), D_SOS = structure(c(16929,
17149, 17422, 17850, 18389, 18202, 17044, 16744, 17300, 17522,
18027, 18198), class = "Date"), D_EOS = structure(c(17067, 17353,
17712, 18082, 18516, 18360, 17123, 17002, 17414, 17722, 18148,
18446), class = "Date")), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"))
dput(head(model,20))
structure(list(ID = c("AUR", "AUR", "AUR", "AUR", "AUR", "AUR",
"AUR", "AUR", "LAM", "LAM", "LAM", "LAM", "LAM", "LAM", "LAM"
), EVI_SOS = structure(c(16934, 17137, 17378, 17605, 17862, 18003,
18192, 18395, 16744, 17134, 17278, 17518, 17725, 18004, 18200
), class = "Date"), EVI_EOS = structure(c(17074, 17361, 17591,
17798, 17994, 18096, 18376, 18594, 17106, 17252, 17431, 17705,
17862, 18173, 18549), class = "Date")), row.names = c(NA, -15L
), class = c("tbl_df", "tbl", "data.frame"))
What I want to do is the following:
I want to find the best matching dates between two dataframes on the selected columns and corresponding rows. In other words, in the dataframe insitu in the column D_SOS which intercept with the rows of AUR of the column ID which dates best match with the column EVI_SOS which intercept the rows AUR of the column ID from the dataframe model. The same will have to be made on the LAM rows.
The desired output example would be:
dput(head(output,20))
structure(list(ID = c("AUR", "AUR", "AUR", "AUR", "AUR", "AUR",
"LAM", "LAM", "LAM", "LAM", "LAM", "LAM"), D_SOS = structure(c(16929,
17149, 17422, 17850, 18389, 18202, 17044, 16744, 17300, 17522,
18027, 18198), class = "Date"), EVI_SOS = structure(c(16934,
17137, 17378, 17862, 18395, 18192, 17134, 16744, 17278, 17518,
18004, 18200), class = "Date"), D_EOS = structure(c(17067, 17353,
17712, 18082, 18516, 18360, 17123, 17002, 17414, 17722, 18148,
18446), class = "Date"), EVI_EOS = structure(c(17074, 17361,
17798, 18096, 18594, 18376, 17252, 17106, 17431, 17705, 18173,
18549), class = "Date")), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"))
It will look like this:
ID D_SOS EVI_SOS D_EOS EVI_EOS
1 AUR 2016-05-08 2016-05-13 2016-09-23 2016-09-30
2 AUR 2016-12-14 2016-12-02 2017-07-06 2017-07-14
3 AUR 2017-09-13 2017-07-31 2018-06-30 2018-09-24
4 AUR 2018-11-15 2018-11-27 2019-07-05 2019-07-19
5 AUR 2020-05-17 2020-05-13 2020-09-11 2020-11-28
6 AUR 2019-11-02 2019-10-23 2020-04-08 2020-04-24
7 LAM 2016-08-31 2016-11-29 2016-11-18 2017-03-27
8 LAM 2015-11-05 2015-11-05 2016-07-20 2016-11-01
9 LAM 2017-05-14 2017-04-22 2017-09-05 2017-09-22
10 LAM 2017-12-22 2017-12-18 2018-07-10 2018-06-23
11 LAM 2019-05-11 2019-04-18 2019-09-09 2019-10-04
12 LAM 2019-10-29 2019-10-31 2020-07-03 2020-10-14
Basically, over the 8 AUR dates of model dataframe 6 will match since insitu only has 6 dates in AUR. For LAM model dataframe has 7 dates but insitu has 6 dates in LAM which will be the number to match.
The output would have first the column of insitu, for example D_SOS and then the one of model, for example EVI_SOS which match with the correspondent dates.
Any help will be much appreciatted.
CodePudding user response:
library(data.table)
setDT(insitu)
setDT(model)
insitu[, key := D_SOS]
model[, key := EVI_SOS]
setkey(insitu, ID, key)
setkey(model, ID, key)
model[insitu, roll = "nearest"][, .(ID, D_SOS, EVI_SOS, D_EOS, EVI_EOS)]
# ID D_SOS EVI_SOS D_EOS EVI_EOS
# 1: AUR 2016-05-08 2016-05-13 2016-09-23 2016-09-30
# 2: AUR 2016-12-14 2016-12-02 2017-07-06 2017-07-14
# 3: AUR 2017-09-13 2017-07-31 2018-06-30 2018-03-01
# 4: AUR 2018-11-15 2018-11-27 2019-07-05 2019-04-08
# 5: AUR 2019-11-02 2019-10-23 2020-04-08 2020-04-24
# 6: AUR 2020-05-07 2020-05-13 2020-09-11 2020-11-28
# 7: LAM 2015-11-05 2015-11-05 2016-07-20 2016-11-01
# 8: LAM 2016-08-31 2016-11-29 2016-11-18 2017-03-27
# 9: LAM 2017-05-14 2017-04-22 2017-09-05 2017-09-22
# 10: LAM 2017-12-22 2017-12-18 2018-07-10 2018-06-23
# 11: LAM 2019-05-11 2019-04-18 2019-09-09 2019-10-04
# 12: LAM 2019-10-29 2019-10-31 2020-07-03 2020-10-14
