Suppose one has two asynchronous time-series in data.table:
dt1 = setkey(data.table(k = 2*(1:4) 1, v = c(10, 15, 9, 7)), k)
dt2 = setkey(data.table(k = 2*(1:3), v = c(11, 13, 6)), k)
My desired output is to merge them on k column using LOCF and all values of k from dt1 and dt2, like so:
> out
k v i.v
2 NA 11
3 10 11
4 10 13
5 15 13
6 15 6
7 9 6
9 7 6
Currently, I am using a somewhat clumsy construction:
all_keys = setkey(data.table(k = sort(c(dt1[, k], dt2[, k])), k)
dt1[all_keys, roll = T, on = 'k'][
dt2[all_keys, roll = T, on = 'k']]
This works, but feels a little contrived. Note that a simple rolling dt1[dt2, roll = T] or dt2[dt1, roll = T] would ignore keys from either dt1 or dt2.
Is there a simpler way?
P.S. Side note - is there also a simpler way to create a keyed data.table from the beginning just using data.table function, without following up with a setkey?
CodePudding user response:
You could use merge with all=T and setnafill with type='locf':
setnafill(merge(dt1,dt2,all=T),type="locf")[]
Key: <k>
k v.x v.y
<num> <num> <num>
1: 2 NA 11
2: 3 10 11
3: 4 10 13
4: 5 15 13
5: 6 15 6
6: 7 9 6
7: 9 7 6
