Home > Back-end >  Inconsistent pvalues and confidence intervals
Inconsistent pvalues and confidence intervals

Time:01-25

Have been trying to fit a multiple logistic regression model in a given dataset but I seem to find 'strange results' with some variables. The p-values and the confidence intervals seem not to be consistent. However, when I tried fitting the exact same model in Stata I get consistent results. Is there a way this can be handled? Is there an option I need to specify to cater for this?... How would I proceed?

library(tidyverse)

set.seed(2021)

testdata <- tibble(
  var1 = rbinom(1114, 1, 0.12),
  var2 = rbinom(1114, 1, 0.82),
  var3 = rbinom(1114, 1, 0.60),
  var4 = rbinom(1114, 1, 0.18),
  var5 = rbinom(1114, 1, 0.12),
  var6 = rbinom(1114, 1, 0.05),
  var7 = rbinom(1114, 1, 0.63),
  var8 = rbinom(1114, 1, 0.20),
  var9 = rbinom(1114, 1, 0.06),
  var10 = rbinom(1114, 1, 0.40),
  var11 = rbinom(1114, 1, 0.35),
  var12 = rbinom(1114, 1, 0.32),
  outcome = rbinom(1114, 1, 0.04)
) %>%
  mutate(across(.cols = everything(), 
                ~factor(., levels = c(0, 1),
                        labels = c("No", "Yes"))))



mvariate.regress <- function(outcome, covariates, mydata) {
  form <- paste(outcome, "~",
                paste(covariates, collapse = "   "))

  model1 <- glm(as.formula(form),
                data = mydata, family = binomial)

  model1

}


ipvars <- paste0("var", 1:12)

mlogitfit <- mvariate.regress("outcome", ipvars, testdata)

summary(mlogitfit)
confint(mlogitfit)

var1 and var2 pvalues and CIs seem not to agree.

Stata Output 
        Coef.      Std. Err.    z     P>|z|      [95% Conf. Interval]
var1   .7269858    .360992     2.01   0.044     .0194544    1.434517
var2   -.6520712   .3250667   -2.01   0.045     -1.28919   -.0149521

Seems Stata computes CIs using a different method

CodePudding user response:

Following suggestions from the comments, I found this.

The reason why R gives different confidence intervals (but same coefficients, standard errors, ecc.) is the way they are computed by confint(), i.e., by profiling the likelihood. The default method of Stata should be based on the Wald method, that is on normal approximation. Indeed, running confint.default() on R returns the same Stata's confidence intervals.

I just summarised the answer @chl's answer here, so please consider go and upvote him.

  •  Tags:  
  • Related