I have a data table that looks something like this:
library(data.table)
set.seed(1)
# Number of rows in the data table
obs <- 10^2
# Generate representative data
DT <- data.table(
  V1 = sample(x = 1:10, size = obs, replace = TRUE),
  V2 = sample(x = 11:20, size = obs, replace = TRUE),
  V3 = sample(x = 21:30, size = obs, replace = TRUE)
)
And a vectorized function fn_calibrate that calculates an output variable V4 based on an input variable opt:
fn_calibrate <- function(opt) {
  # Calculate some new value V4 that's dependent on opt
  DT[, V4 := V1 * sqrt(V2) / opt]
  # Calculate the residual sum of squares (RSS) between V4 and a target value V3
  DT[, rss := abs(V3 - V4)^2]
  # Return the RSS
  return(DT[, rss])
}
Now, I would like to perform a row-wise optimization using the optimize function, i.e., find the value of opt that minimizes the RSS for each row.
I was hoping to achieve this with data.table's by = syntax, for example:
# Run the optimizer row-wise
DT[, opt := optimize(f = fn_calibrate, interval = c(0.1, 1), tol = .0015)$minimum,
   by = seq_len(nrow(DT))]
The code fails with the error invalid function value in 'optimize' because fn_calibrate, as currently written (DT[, ...]), operates on the whole table and returns a vector of rss values of length nrow(DT) instead of a scalar for the current row.
My question is: is there a way to have fn_calibrate return a row-wise (scalar) result to the optimizer, so that each row is optimized separately?
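The constraint can be reproduced without data.table. optimize() (from base R's stats package) evaluates f at one trial value of its argument at a time and expects a single number back. A minimal sketch with two made-up objectives (the constants are illustrative only): the length-2 version should trigger the same error as above, while the scalar version for a single row works. The exact message text may vary with R's locale.

```r
# optimize() expects f(opt) to return a length-1 numeric value.
# Objective returning two squared residuals at once (like the whole-table version)
f_vector <- function(opt) c(21 - 9 * sqrt(13) / opt, 30 - 4 * sqrt(20) / opt)^2
# Objective for a single row only
f_scalar <- function(opt) (30 - 4 * sqrt(20) / opt)^2

# Capture the error message instead of stopping
msg <- tryCatch(optimize(f_vector, interval = c(0.1, 1))$minimum,
                error = conditionMessage)
print(msg)

# The scalar objective is accepted
res <- optimize(f_scalar, interval = c(0.1, 1), tol = 0.0015)
print(res$minimum)
```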
Edit
I realize a related question was asked and answered here in the context of a data frame, though the accepted answer uses a for loop, whereas I would rather use the efficient data.table by = syntax if possible. The reprex above is small (100 rows), but my actual data table is much larger (250K rows).
Answer:
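fn_calibrate doesn't need to be vectorized or to use data.table syntax.
You can pass V1, V2, V3 and opt as parameters and optimize over opt only. Inside j with by = seq_len(nrow(DT)), V1, V2 and V3 refer to the values of the current row, so the anonymous function passed to optimize returns a single number: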
fn_calibrate <- function(V1, V2, V3, opt) {
  # Calculate some new value V4 that's dependent on opt
  V4 = V1 * sqrt(V2) / opt
  # Calculate the residual sum of squares (RSS) between V4 and a target value V3
  rss = abs(V3 - V4)^2
  # Return the RSS
  return(rss)
}
DT[, opt := optimize(f = function(opt) fn_calibrate(V1, V2, V3, opt),
                     interval = c(0.1, 1), tol = .0015)$minimum,
   by = seq_len(nrow(DT))]
      V1    V2    V3       opt
   <int> <int> <int>     <num>
1:     9    13    21 0.9990479
2:     4    20    30 0.5962869
3:     7    13    24 0.9992591
4:     1    11    29 0.1142778
5:     2    16    29 0.2756422
6:     7    16    29 0.9656941
7:     2    14    29 0.2578275
8:     3    19    26 0.5028686
9:     1    15    26 0.1490109
...
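Two hedged additions, not part of the answer above. First, the same per-row fit can be written without grouping by mapping optimize() over the columns with base R's mapply(); this avoids creating one data.table group per row, which might help at 250K rows, but its speed is untested here. Second, for this particular toy objective the optimum has a closed form: with c = V1 * sqrt(V2) > 0, rss = (V3 - c/opt)^2 is zero at opt = c/V3, and since c/opt is monotone in opt, the minimizer on [0.1, 1] is that value clamped to the interval. That gives a cheap sanity check on the numerical results; it does not carry over to arbitrary calibration functions. The column names opt_mapply and opt_exact are illustrative.

```r
library(data.table)

set.seed(1)
obs <- 10^2
DT <- data.table(
  V1 = sample(x = 1:10, size = obs, replace = TRUE),
  V2 = sample(x = 11:20, size = obs, replace = TRUE),
  V3 = sample(x = 21:30, size = obs, replace = TRUE)
)

# Scalar objective from the answer (abs() dropped: squaring is already non-negative)
fn_calibrate <- function(V1, V2, V3, opt) {
  V4 <- V1 * sqrt(V2) / opt
  (V3 - V4)^2
}

# Answer's approach: one group per row
DT[, opt := optimize(f = function(opt) fn_calibrate(V1, V2, V3, opt),
                     interval = c(0.1, 1), tol = .0015)$minimum,
   by = seq_len(nrow(DT))]

# Alternative sketch: map optimize() over the columns without grouping
DT[, opt_mapply := mapply(function(v1, v2, v3) {
  optimize(f = function(opt) fn_calibrate(v1, v2, v3, opt),
           interval = c(0.1, 1), tol = .0015)$minimum
}, V1, V2, V3)]

# Closed-form optimum, valid for this toy objective only
DT[, opt_exact := pmin(pmax(V1 * sqrt(V2) / V3, 0.1), 1)]

# Largest deviation of each numerical result from the exact optimum
check <- DT[, .(max_dev_by = max(abs(opt - opt_exact)),
                max_dev_mapply = max(abs(opt_mapply - opt_exact)))]
print(check)
```

Both numerical columns should agree with each other and stay within roughly the optimize() tolerance of opt_exact.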
