Using the Weekly dataset from the ISLR package in R:
> head(Weekly)
Year Lag1 Lag2 Lag3 Lag4 Lag5 Volume Today Direction
1 1990 0.816 1.572 -3.936 -0.229 -3.484 0.1549760 -0.270 Down
2 1990 -0.270 0.816 1.572 -3.936 -0.229 0.1485740 -2.576 Down
3 1990 -2.576 -0.270 0.816 1.572 -3.936 0.1598375 3.514 Up
4 1990 3.514 -2.576 -0.270 0.816 1.572 0.1616300 0.712 Up
5 1990 0.712 3.514 -2.576 -0.270 0.816 0.1537280 1.178 Up
6 1990 1.178 0.712 3.514 -2.576 -0.270 0.1544440 -1.372 Down
I am trying to use logistic regression to regress Direction on *all Lag variables and Volume*, and I used the "all except" shortcut in R to exclude Year and Today:
logregall <- glm(Direction ~ . - Today - Year,
family=binomial(link='logit'), data = Weekly)
But when I try to use this fitted object to make predictions, R complains that Year is missing from the 'newdata' data frame, even though Year was excluded from the model:
dataforpred <- Weekly[,2:7]
preds <- predict(object = logregall, newdata = dataforpred, type = "response")
> preds <- predict(object = logregall, newdata = dataforpred, type = "response")
Error in eval(predvars, data, env) : object 'Year' not found
But when I fit the model by typing out all the variables manually, I get a fitted object that works with predict():
logregall2 <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
family=binomial(link='logit'), data = Weekly)
preds <- predict(object = logregall2, newdata = dataforpred, type = "response")
> head(preds)
1 2 3 4 5 6
0.6086249 0.6010314 0.5875699 0.4816416 0.6169013 0.5684190
Why is this the case?
CodePudding user response:
I don't have the package, but I can reproduce the error with the mtcars dataset. I believe the reason is that subtracting a column with - only removes the corresponding term from the model; the variable itself is still recorded in the fitted object's terms. predict() rebuilds the model frame from those terms, so it looks up every variable that appeared in the expanded formula, including the subtracted ones, and errors out when it cannot find them in newdata.
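To make this concrete, here is a small sketch that inspects the terms object for the same kind of formula without fitting anything. It uses the built-in mtcars data; the names tt, term_labels and needed_vars are just illustrative.

```r
# Expand the "all except" formula against mtcars without fitting a model
tt <- terms(vs ~ . - mpg - cyl, data = mtcars)

# Terms that actually enter the model: mpg and cyl are gone
term_labels <- attr(tt, "term.labels")
print(term_labels)

# Variables the model frame still asks for at predict time
# (response removed, as predict() does): mpg and cyl are still listed
needed_vars <- all.vars(delete.response(tt))
print(needed_vars)

# The variables that are required in newdata but unused by the model
print(setdiff(needed_vars, term_labels))
```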
Therefore, one workaround is to add the subtracted columns back to newdata with arbitrary values (NA works here, since those columns never enter the model matrix).
fit <- glm(vs~. -mpg-cyl,data=mtcars,
family=binomial(link='logit'))
dataforpred <- mtcars[,c(3:7,9:11)]
preds <- predict(object = fit, newdata = dataforpred, type = "response")
Error in eval(predvars, data, env) : object 'cyl' not found
# solution (%>% and mutate() come from dplyr)
library(dplyr)
dataforpred2 <- dataforpred %>%
  mutate(mpg = NA_real_,
         cyl = NA_real_)
preds2 <- predict(object = fit, newdata = dataforpred2, type = "response")
> preds2[1:5]
1 2 3 4 5
2.220446e-16 1.081386e-11 1.000000e+00 1.000000e+00 2.220446e-16
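Another option, if you would rather not pad newdata with placeholder columns, is to drop the unwanted columns from the training data instead of from the formula, so that . only expands to the predictors you want. This is only a sketch on mtcars; train, fit_sub, newdata_sub and preds_sub are made-up names, and glm() may warn about fitted probabilities of 0 or 1 on such a small dataset.

```r
# Remove the unwanted predictors from the data, not from the formula
train <- mtcars[, setdiff(names(mtcars), c("mpg", "cyl"))]

# `.` now expands only to the remaining columns, so the terms object
# never mentions mpg or cyl
fit_sub <- glm(vs ~ ., family = binomial(link = "logit"), data = train)

# newdata only needs the predictors that were actually used
newdata_sub <- mtcars[, c(3:7, 9:11)]
preds_sub <- predict(fit_sub, newdata = newdata_sub, type = "response")
head(preds_sub)
```

For the Weekly data the same idea would presumably be to fit on a data frame without Year and Today and keep Direction ~ . as the formula.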
