Home > Mobile >  rolling join two time series in data.table
rolling join two time series in data.table

Time:01-07

Suppose one has two asynchronous time-series in data.table:

dt1 = setkey(data.table(k = 2*(1:4)   1, v = c(10, 15, 9, 7)), k)
dt2 = setkey(data.table(k = 2*(1:3), v = c(11, 13, 6)), k)

My desired output is to merge them on k column using LOCF and all values of k from dt1 and dt2, like so:

> out
k v i.v
2 NA 11
3 10 11
4 10 13
5 15 13
6 15 6
7 9  6
9 7  6

Currently, I am using a somewhat clumsy construction:

all_keys = setkey(data.table(k = sort(c(dt1[, k], dt2[, k])), k)
dt1[all_keys, roll = T, on = 'k'][
    dt2[all_keys, roll = T, on = 'k']]

This works, but feels a little contrived. Note that a simple rolling dt1[dt2, roll = T] or dt2[dt1, roll = T] would ignore keys from either dt1 or dt2. Is there a simpler way?

P.S. Side note - is there also a simpler way to create a keyed data.table from the beginning just using data.table function, without following up with a setkey?

CodePudding user response:

You could use merge with all=T and setnafill with type='locf':

setnafill(merge(dt1,dt2,all=T),type="locf")[]

Key: <k>
       k   v.x   v.y
   <num> <num> <num>
1:     2    NA    11
2:     3    10    11
3:     4    10    13
4:     5    15    13
5:     6    15     6
6:     7     9     6
7:     9     7     6
  •  Tags:  
  • Related