I have the data like this:
df <- tibble::tibble(
id = rep(c(1:50), each = 5),
y = runif(250,min = 0, max = 1),
x1 = rnorm(250, mean = 0, sd=1),
x2 = rnorm(250, mean = 0, sd=1),
x3 = rnorm(250, mean = 0, sd=1),
x4 = rnorm(250, mean = 0, sd=1),
x5 = rnorm(250, mean = 0, sd=1),
) %>%
group_by(id) %>%
mutate(year = rep(c(2001:2005)))
I would like to estimate the probit model for every year to get (1)coefficient estimates,and (2) predicted value of y, and (3) number of observations used to estimate the model:
probit_model <- function(df) {
glm (y ~ x1 x2 x3 x4 x5,
family = binomial(link = "probit"),
data = df)
}
Do you know how we can get the coefficient estimates, predicted value for every year and then combine them with the original data (that is df) here? I know what we can do with OLS model (by using map function to estimate for multiple models). But I do not know how to do with probit regression.
Thank you so much.
CodePudding user response:
I think you need to do this, I used this post as reference.
library(dplyr)
df <- tibble::tibble(
id = rep(c(1:50), each = 5),
y = runif(250,min = 0, max = 1),
x1 = rnorm(250, mean = 0, sd=1),
x2 = rnorm(250, mean = 0, sd=1),
x3 = rnorm(250, mean = 0, sd=1),
x4 = rnorm(250, mean = 0, sd=1),
x5 = rnorm(250, mean = 0, sd=1),
) %>%
group_by(id) %>%
mutate(year = rep(c(2001:2005)))
fitted_models = df %>% group_by(year) %>% do(model = glm(y ~ x1 x2 x3 x4 x5,
family = binomial(link = "probit"),
data = .))
#fitted_models$year
#fitted_models$model[1]
fitted_models %>% summarise(broom::tidy(model))
## A tibble: 30 x 5
#term estimate std.error statistic p.value
#<chr> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) -0.160 0.187 -0.856 0.392
#2 x1 0.0860 0.230 0.375 0.708
#3 x2 0.0657 0.187 0.351 0.725
#4 x3 0.0472 0.160 0.296 0.767
#5 x4 0.216 0.191 1.13 0.257
#6 x5 -0.159 0.263 -0.604 0.546
#7 (Intercept) -0.0792 0.182 -0.434 0.664
#8 x1 0.0314 0.170 0.185 0.853
#9 x2 -0.0320 0.194 -0.164 0.869
#10 x3 0.167 0.218 0.763 0.445
## ... with 20 more rows
fitted_models %>% summarise(broom::glance(model))
## A tibble: 5 x 8
#null.deviance df.null logLik AIC BIC deviance df.residual nobs
#<dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
# 1 21.7 49 -32.5 77.0 88.5 19.7 44 50
#2 16.4 49 -33.4 78.8 90.3 15.7 44 50
#3 15.5 49 -34.5 81.1 92.5 15.2 44 50
#4 16.6 49 -32.4 76.7 88.2 15.0 44 50
#5 19.6 49 -33.3 78.6 90.0 19.1 44 50
fitted_models %>% summarise(broom::augment(model, type.predict = "response"))
## A tibble: 250 x 12
#y x1 x2 x3 x4 x5 .fitted .resid .std.resid .hat .sigma .cooksd
#<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0.819 0.0246 0.0176 0.280 0.192 0.840 0.407 0.846 0.875 0.0665 0.664 0.00894
#2 0.0418 1.41 0.297 1.15 -1.41 0.347 0.372 -0.792 -0.853 0.137 0.665 0.0144
#3 0.119 -0.265 -0.158 -1.37 -2.48 -0.504 0.237 -0.300 -0.327 0.156 0.676 0.00284
#4 0.0282 -0.836 -0.442 -1.63 0.506 0.910 0.355 -0.808 -0.858 0.114 0.665 0.0112
#5 0.893 -0.481 -0.384 -0.974 0.897 -0.662 0.510 0.819 0.850 0.0703 0.665 0.00792
#6 0.865 0.417 -0.0233 0.841 -0.268 -0.140 0.451 0.865 0.883 0.0395 0.664 0.00494
#7 0.809 1.30 -0.469 1.01 -0.0913 -0.106 0.486 0.669 0.702 0.0921 0.669 0.00778
#8 0.0220 0.119 -0.580 -0.533 -1.09 0.0142 0.326 -0.780 -0.801 0.0522 0.666 0.00406
#9 0.722 0.194 -1.50 -0.395 1.65 -0.868 0.592 0.271 0.297 0.167 0.676 0.00281
#10 0.131 1.24 0.600 1.14 -1.17 0.370 0.392 -0.579 -0.618 0.122 0.671 0.00756
## ... with 240 more rows
CodePudding user response:
A similar answer to @cdcarrion's, from the same post, but using map (a slightly newer approach than do()):
fit the models
library(broom)
models <- (df
%>% group_by(year)
%>% nest()
%>% mutate(model = map(data, glm,
formula = y ~ x1 x2 x3 x4 x5,
family = binomial(link = "probit")))
)
get coefficients
coefs <- (models
%>% mutate(cc = map(model, tidy))
%>% select(year, cc)
%>% unnest(cols = cc)
)
get predictions
preds <- (models
%>% mutate(aug = map(model, augment, type.predict = "response"))
%>% select(year, aug)
%>% unnest(cols = aug)
%>% select(year:x5, .fitted)
)
