glm binomial regression: how can I check which outcome the output is predicting?

Stupid question, but I want to be sure. I fit this model:

glm(outcome ~ cva, family = "binomial", data = df, x = TRUE, y = TRUE)

Predictors  Odds    p
(Intercept) 0.16    <0.001
cvaTRUE     1.95    0.029

My 'outcome' variable is a factor with levels YES and NO.

How can I be sure this glm is giving the odds of YES and not NO? That is, I want to be confident that this is saying "if cva is TRUE, then the odds are 1.95 for outcome YES".

CodePudding user response：

With a binary response taking the value 0 or 1, the model estimates the odds that outcome equals 1. So if YES is coded as 1, you can be sure that the 1.95 refers to outcome YES: it is the odds ratio for YES when cva is TRUE versus FALSE.
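For a factor response, the 0/1 coding is implied by the level order rather than set by hand. Here is a minimal sketch that compares a factor fit with an explicit 0/1 fit. The data frame `toy`, its values, and the column `outcome01` are invented purely for illustration.

```r
## Toy data, invented for illustration only
toy <- data.frame(
  outcome = factor(c("NO", "YES", "NO", "YES", "YES", "NO", "NO", "YES", "NO", "NO"),
                   levels = c("NO", "YES")),
  cva = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

## Explicit 0/1 coding in which 1 means "YES"
toy$outcome01 <- as.integer(toy$outcome == "YES")

## Same model, once with the factor response and once with the 0/1 response
fit_factor <- glm(outcome ~ cva, family = binomial, data = toy)
fit_01 <- glm(outcome01 ~ cva, family = binomial, data = toy)

## If the factor fit models "YES", both sets of coefficients should agree
all.equal(coef(fit_factor), coef(fit_01))
```

If the comparison comes back TRUE, the factor fit is modelling the same event as the 0/1 column, namely outcome being "YES".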

CodePudding user response：

If outcome is a factor with levels "NO" and "YES" and cva is a logical vector, then

coef(glm(outcome ~ cva, family = binomial, data = df))

shows you (per unit changes in) log odds of "YES" rather than "NO" if and only if "NO" is the first element of levels(outcome). This requirement is documented in ?family:

For the binomial and quasibinomial families the response can be specified in one of three ways:

As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).
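To see which level counts as "success" under that rule, inspect the level order directly. The sketch below uses a small made-up factor as a stand-in for df$outcome; the last line mirrors the documented rule and is not meant as a copy of glm's internals.

```r
## Hypothetical stand-in for df$outcome; factor() sorts the levels by default
outcome <- factor(c("YES", "NO", "NO", "YES"))

## Levels in order; the first one is treated as "failure"
levels(outcome)

## With two levels, the second one is the level modelled as "success"
levels(outcome)[2L]

## Success indicator implied by the documented rule
success <- outcome != levels(outcome)[1L]
success
```

Running levels(df$outcome) on the real data is the quickest check: if "NO" is listed first, the coefficients describe "YES".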

If you find that outcome is not coded this way, then do

df$outcome <- relevel(df$outcome, "NO")

to replace outcome in df with a semantically equivalent factor whose first level is "NO".
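A short sketch of what relevel changes and what it leaves alone, using a made-up factor whose first level is "YES" (the names `outcome` and `releveled` are illustrative only):

```r
## Hypothetical factor whose first level is "YES"
outcome <- factor(c("YES", "NO", "NO", "YES"), levels = c("YES", "NO"))
levels(outcome)

## Make "NO" the reference (first) level
releveled <- relevel(outcome, ref = "NO")
levels(releveled)

## The observations themselves are unchanged; only the level order differs
identical(as.character(outcome), as.character(releveled))
```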

FWIW, here is one way to check that glm behaves as documented in your use case:

## Simulated data set
set.seed(1L)
n <- 100L
df <- data.frame(
  outcome = factor(sample(0:1, size = n, replace = TRUE), levels = 0:1, labels = c("NO", "YES")),
  cva = sample(c(FALSE, TRUE), size = n, replace = TRUE)
)

## Contingency table
tt <- table(df$outcome, df$cva)

## Sample odds ratio
r <- (tt["YES", "TRUE"] / tt["NO", "TRUE"]) / (tt["YES", "FALSE"] / tt["NO", "FALSE"])
## Estimated odds ratio when first level is "NO"
m0 <- glm(outcome ~ cva, family = binomial, data = df)
r0 <- exp(coef(m0))[[2L]]
## Reciprocal estimated odds ratio when first level is "YES"
m1 <- glm(relevel(outcome, "YES") ~ cva, family = binomial, data = df)
r1 <- exp(-coef(m1))[[2L]]

print(c(r, r0, r1), digits = 20, width = 30)

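[1] 0.85565476190476186247
[2] 0.85565476230266901414
[3] 0.85565476230266912516

All three values agree to about nine significant digits. The small differences are consistent with glm's iterative fitting stopping at its convergence tolerance, so exp(coef) is indeed the odds ratio for "YES" when "NO" is the first level.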
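Another possible check, sketched on the same simulated data: with a single binary predictor, the fitted probabilities from predict(..., type = "response") should match the observed proportion of the "success" level within each cva group. The names `p_hat` and `p_yes` are introduced here for illustration.

```r
## Same simulated data set as above
set.seed(1L)
n <- 100L
df <- data.frame(
  outcome = factor(sample(0:1, size = n, replace = TRUE), levels = 0:1, labels = c("NO", "YES")),
  cva = sample(c(FALSE, TRUE), size = n, replace = TRUE)
)
m0 <- glm(outcome ~ cva, family = binomial, data = df)

## Fitted probability of "success" for cva = FALSE and cva = TRUE
p_hat <- as.numeric(
  predict(m0, newdata = data.frame(cva = c(FALSE, TRUE)), type = "response")
)

## Observed proportion of "YES" within each cva group (FALSE first, then TRUE)
p_yes <- as.numeric(tapply(df$outcome == "YES", df$cva, mean))

## The two rows should agree if "YES" is the modelled level
print(rbind(fitted = p_hat, observed_yes = p_yes))
```

If the rows matched the proportion of "NO" instead, that would indicate that "YES" is the first level of the factor.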