I have two data frames, a price dataset and a survey dataset, that I need to join on the closest date. The dates don't exactly match, and I need the price dataset to match on the closest date in the survey dataset. A previous question that I asked here on this was very helpful in giving me a an answer as to how to match on the closest date. However, I've since realized that I also need to join the data sets on region, as well as the closest date. I know how to join on closest date, I know how to join on region, but I unfortunately do not know how to join on the closest date and the exact same region.
Here's some example code and what I've been trying
#generating sample dataset
set.seed(1234)
price <- data.frame(region = sample(LETTERS[1:3],15,replace = TRUE), price = rnorm(15),price_date=sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 15))
survey <- data.frame(region = sample(LETTERS[1:3],15,replace = TRUE), survey_date=sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 15))
#this works to join on the closest date, however, the regions don't match, which is a problem
library(data.table)
setDT(price)
setDT(survey)
join <- price[survey,on=.(price_date=survey_date),roll="nearest"]
Basically what I want to do is match the date/region observation from price to the closest date, same region observation in survey, and I'm not really sure as to how to do this. Would appreciate any help.
CodePudding user response:
You could add region to on condition:
price[survey,.(region,price,x.price_date,survey_date),on=.(region,price_date=survey_date),roll="nearest"][]
region price x.price_date survey_date
<char> <num> <Date> <Date>
1: A 0.88010416 1999-10-26 1999-09-05
2: B 0.31026217 1999-12-25 1999-12-31
3: A -1.68732684 1999-04-27 1999-06-02
4: C 0.00500695 1999-08-09 1999-05-11
5: C 0.00500695 1999-08-09 1999-03-24
6: B -0.03763026 1999-08-12 1999-09-02
7: C -0.64701901 1999-12-24 2000-01-01
8: B -0.03763026 1999-08-12 1999-08-06
9: C 0.00500695 1999-08-09 1999-08-03
10: A 0.88010416 1999-10-26 1999-11-22
11: C 1.37001035 1999-09-15 1999-10-03
12: B 0.01831663 1999-07-01 1999-06-18
13: A -0.62743621 1999-03-20 1999-03-12
14: B 0.72397606 1999-02-18 1999-03-02
15: C -0.64701901 1999-12-24 1999-12-18
Note use of x. to display LHS date.
