I have the following data structure, where I want to interpolate the data row-wise until a certain year:
require('data.table')
test_dt <- data.table(iso1 = c('BTN', 'IND', 'BGD'),
iso2 = c('AFG', 'AFG', 'AFG'),
year = c(2006, 2003, 2006))
I came up with the following function that works well for the single-row case, but does not work for the general case:
interpolate_rows <- function(dt, stop_year = 2008) {
year <- as.integer(dt[, .SD, .SDcols = 'year'])
# If year is less than stop year, fill in observations:
if (year < stop_year) {
time_delta <- seq(year, stop_year)
# Explode bilateral country observation:
dt <- dt[rep(dt[, .I], length(time_delta))]
# Replace year column w/ time_delta sequence:
dt <- dt[, year := time_delta]
}
return(dt)
}
## Output
bar <- interpolate_rows(test_dt[1])
bar
iso1 iso2 year
1: BTN AFG 2006
2: BTN AFG 2007
3: BTN AFG 2008
What I'd like to have is the following:
bar <- interpolate_rows(test_dt)
bar
iso1 iso2 year
1: BTN AFG 2006
2: BTN AFG 2007
3: BTN AFG 2008
6: IND AFG 2003
7: IND AFG 2004
8: IND AFG 2005
9: IND AFG 2006
10: IND AFG 2007
11: IND AFG 2008
14: BGD AFG 2006
14: BGD AFG 2007
14: BGD AFG 2008
I know the culprit is most likely this line
year <- as.integer(dt[, .SD, .SDcols = 'year']), but I got no clue how to substitute this for a working vector solution. I tried to nest an lapply() function within interpolate_rows() to extract the year for each unique group and played around with Map(), but none of these yielded working solutions.
Any help pointing me to feasible vector solution, would be greatly appreciated.
CodePudding user response:
What about simply using by:
test_dt[, .(year = min(year):stop_year), by = .(iso1, iso2)]
# iso1 iso2 year
# 1: BTN AFG 2006
# 2: BTN AFG 2007
# 3: BTN AFG 2008
# 4: IND AFG 2003
# 5: IND AFG 2004
# 6: IND AFG 2005
# 7: IND AFG 2006
# 8: IND AFG 2007
# 9: IND AFG 2008
# 10: BGD AFG 2006
# 11: BGD AFG 2007
# 12: BGD AFG 2008
CodePudding user response:
One way using dplyr and tidyr libraries.
library(dplyr)
library(tidyr)
interpolate_rows <- function(dt, stop_year = 2008) {
dt %>%
group_by(iso1, iso2) %>%
complete(year = year : stop_year) %>%
ungroup
}
interpolate_rows(test_dt)
# iso1 iso2 year
# <chr> <chr> <dbl>
# 1 BGD AFG 2006
# 2 BGD AFG 2007
# 3 BGD AFG 2008
# 4 BTN AFG 2006
# 5 BTN AFG 2007
# 6 BTN AFG 2008
# 7 IND AFG 2003
# 8 IND AFG 2004
# 9 IND AFG 2005
#10 IND AFG 2006
#11 IND AFG 2007
#12 IND AFG 2008
Another way -
library(data.table)
interpolate_rows <- function(dt, stop_year = 2008) {
vals <- seq(dt$year, stop_year)
dt[rep(1, length(vals))][, year := vals]
}
rbindlist(by(test_dt, seq(nrow(test_dt)), interpolate_rows))
