I'm working with a time series database that has no NAs. What I mean is that where a row should contain missing data (an NA), the time column simply jumps to the next value without recording the time stamp of the missing observation. For example:
time - value
1 - 50kg
2 - 60kg
4- 45kg
There is no explicit NA recorded; the missing data is only implicit in the gaps of the time column. Is there a good package to handle this kind of missing data? I've tried 'naniar', but it doesn't work if I don't have NAs.
I'm looking for a package that identifies these gaps and imputes the missing data.
Thank you!
Answer:
One way to deal with this is to create a data frame of "all times" (including the missing ones) and then merge it with your data.
dat <- data.frame(time = c(1L, 2L, 4L), value = c(50, 60, 45))
dat
#   time value
# 1    1    50
# 2    2    60
# 3    4    45
# every expected time point, including the missing time 3
times <- data.frame(time = seq(min(dat$time), max(dat$time), by = 1))
times
#   time
# 1    1
# 2    2
# 3    3
# 4    4
# full outer join: times with no observation get value NA
merge(dat, times, by = "time", all = TRUE)
#   time value
# 1    1    50
# 2    2    60
# 3    3    NA
# 4    4    45
I included a more verbose call to seq with by= in case your real data has a slightly different structure. For instance, if the times are POSIXct, you may want to change that to by = "1 day" or by = "1 hour". Either way, you control the size of the gaps there.
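As a sketch of the POSIXct case, assuming hourly readings (the data and the UTC time zone below are made up for illustration), the same merge works once seq() is given by = "1 hour":

```r
# Hypothetical hourly data with the 02:00 reading missing
dat <- data.frame(
  time  = as.POSIXct(c("2024-01-01 00:00", "2024-01-01 01:00",
                       "2024-01-01 03:00"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC"),
  value = c(50, 60, 45)
)

# Every expected hour between the first and last observation
times <- data.frame(time = seq(min(dat$time), max(dat$time), by = "1 hour"))

# Full outer join: the missing hour shows up as a row with value NA
full <- merge(dat, times, by = "time", all = TRUE)
full
```

The result has four rows, with NA at 02:00, which you can then fill with any of the interpolation approaches discussed in this thread.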
For more information about merges, see the questions "How to join (merge) data frames (inner, outer, left, right)" and "What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?".
(This gets a little more complicated if the interval between rows is inconsistent, or if the real time variables are not aligned perfectly on an integer-like component.)
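For the less regular case, one possible approach, assuming every time stamp still sits on some common grid, is to take the smallest positive difference between consecutive times as the step, flag the larger jumps as gaps, and then fill the expanded grid. This sketch uses base R's approx() for linear interpolation rather than a package, and the data are hypothetical. It gives wrong results if the true step never appears in the data or if times drift off the grid; with non-integer steps, the %in% comparison may also need rounding.

```r
# Hypothetical data; assumes times lie on a regular grid with some points missing
dat <- data.frame(time = c(1, 2, 4, 5, 8), value = c(50, 60, 45, 46, 40))
dat <- dat[order(dat$time), ]

# Assumption: the true sampling step is the smallest positive gap observed
steps <- diff(dat$time)
step  <- min(steps[steps > 0])

# Times that are followed by at least one missing point
gap_after <- dat$time[-nrow(dat)][steps > step]
gap_after

# Expand to the full grid and interpolate linearly with base R's approx()
grid <- seq(min(dat$time), max(dat$time), by = step)
filled <- data.frame(
  time    = grid,
  value   = approx(dat$time, dat$value, xout = grid)$y,
  imputed = !(grid %in% dat$time)  # TRUE for rows that were missing
)
filled
```

Keeping the imputed column makes it easy to tell observed values from filled-in ones later.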
Answer:
It isn't clear what you have: a text file with literally the text shown, a data frame, or something else. We will assume that we have Lines, copied verbatim from the question, as shown in the Note at the end, except that we added one more row.
Read it into a zoo series using read.zoo, since zoo objects can represent irregularly spaced series (read.zoo can also read files and data frames). Next, convert that to a ts series. A ts series can only represent regularly spaced series, so the conversion fills the empty spots with NAs. Then use na.locf (last observation carried forward), na.approx (linear interpolation) or na.spline (spline interpolation) to fill in the NAs. Leave the result as a ts series, or convert it back to a zoo series using as.zoo, or to a data frame using fortify.zoo.
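library(zoo)
# sep = "-" splits "1 - 50kg" into its two fields; comment.char = "k" makes
# the reader ignore everything from "k" onward, which drops the "kg" units
z <- read.zoo(text = Lines, header = TRUE, sep = "-", strip.white = TRUE,
  comment.char = "k")
tt <- as.ts(z)  # regular series: the missing time 3 becomes NA
na.approx(tt)   # fill the NA by linear interpolation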
## Time Series:
## Start = 1
## End = 5
## Frequency = 1
## [1] 50.0 60.0 52.5 45.0 46.0
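The other fill functions and the data-frame conversion mentioned above can be sketched like this. To run without Lines, it builds the same series directly with zoo() instead of read.zoo(text = ...); the intermediate names are only for illustration.

```r
library(zoo)

# Same observations as in Lines: time 3 is absent
z  <- zoo(c(50, 60, 45, 46), order.by = c(1L, 2L, 4L, 5L))
zr <- as.zoo(as.ts(z))       # regular zoo series with NA at time 3

locf   <- na.locf(zr)        # carry the last observation forward: time 3 gets 60
spline <- na.spline(zr)      # cubic spline interpolation
approx <- na.approx(zr)      # linear interpolation: time 3 gets 52.5

# Back to a data frame: first column is the index, second the values
df <- fortify.zoo(approx)
df
```

Which filler is appropriate depends on the data: carrying values forward suits step-like measurements, while interpolation assumes the quantity changes smoothly between observations.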
Note
Lines <- "time - value
1 - 50kg
2 - 60kg
4- 45kg
5 - 46kg"
