Stupid question, but I want to be sure.
glm(outcome ~ cva, family = "binomial", data=df, x=TRUE, y=TRUE)
Predictors    Odds   p
(Intercept)   0.16   <0.001
cvaTRUE       1.95   0.029
My 'outcome' variable is a factor (converted with as.factor) with the values YES and NO.
How can I be sure this glm gives the odds of YES and not of NO? That is, I want to be confident that it is saying "if cva is TRUE, the odds of outcome YES are multiplied by 1.95."
Answer:
With a binary response taking the value 0 or 1, the model estimates the odds that outcome equals 1. So, if YES is coded as 1, you can be sure that the odds ratio of 1.95 refers to outcome YES.
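To see what "coded as 1" means for a factor response, here is a small sketch on made-up data (the vectors `outcome`, `cva`, and `y01` are hypothetical, not the asker's `df`). Assuming "NO" is the first level, fitting the model once with the factor and once with an explicit 0/1 indicator for YES should give the same coefficients.

```r
## Hypothetical toy data, not the asker's df
outcome <- factor(c("NO", "YES", "NO", "NO", "YES", "YES", "NO", "YES", "NO", "NO"))
cva     <- c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

levels(outcome)  # "NO" "YES": factor() sorts the levels alphabetically

## Explicit 0/1 response: 1 for YES, 0 for NO
y01 <- as.integer(outcome == "YES")

m_factor  <- glm(outcome ~ cva, family = binomial)
m_numeric <- glm(y01 ~ cva, family = binomial)

## Identical coefficients, so the factor fit is modelling the odds of YES
all.equal(coef(m_factor), coef(m_numeric))
exp(coef(m_factor))
```

In this toy data the odds of YES are 1/4 when cva is FALSE and 3/2 when cva is TRUE, so `exp(coef(m_factor))` should come out close to 0.25 and 6.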
Answer:
If outcome is a factor with levels "NO" and "YES" and cva is a logical vector, then
coef(glm(outcome ~ cva, family = binomial, data = df))
shows the log odds of "YES" rather than "NO" (the intercept), and the change in those log odds when cva is TRUE (the cvaTRUE coefficient), if and only if "NO" is the first element of levels(outcome). This requirement is documented in ?family:
For the binomial and quasibinomial families the response can be specified in one of three ways:
- As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).
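A minimal way to watch that rule in action, assuming a hypothetical factor with 7 NO and 3 YES values: an intercept-only model's fitted probability is the share of "successes", so it reveals which level glm is counting.

```r
## Hypothetical toy factor: 7 NO and 3 YES
outcome <- factor(rep(c("NO", "YES"), times = c(7, 3)))
levels(outcome)  # "NO" "YES"

## Intercept-only fit: plogis() turns the log odds back into a probability
p_yes <- plogis(coef(glm(outcome ~ 1, family = binomial)))[[1L]]
p_yes            # about 0.3, the share of YES (the non-first level)

## Moving "YES" to the first level flips what counts as "success"
p_no <- plogis(coef(glm(relevel(outcome, "YES") ~ 1, family = binomial)))[[1L]]
p_no             # about 0.7, the share of NO
```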
If you find that outcome is not coded this way, then do
df$outcome <- relevel(df$outcome, "NO")
to replace outcome in df with a semantically equivalent factor whose first level is "NO".
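As a quick illustration of what relevel() changes, here is a sketch with a hypothetical factor whose levels were set by hand in the "wrong" order. Only the level order moves; the values themselves stay the same.

```r
## Hypothetical factor whose first level is "YES" (e.g. levels set by hand)
outcome <- factor(c("YES", "NO", "NO", "YES"), levels = c("YES", "NO"))
levels(outcome)   # "YES" "NO": glm would model the odds of NO

outcome2 <- relevel(outcome, "NO")
levels(outcome2)  # "NO" "YES": glm now models the odds of YES

## Same values, only the level order changed
identical(as.character(outcome), as.character(outcome2))
```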
FWIW, here is one way to check that glm behaves as documented in your use case:
## Simulated data set
set.seed(1L)
n <- 100L
df <- data.frame(
outcome = factor(sample(0:1, size = n, replace = TRUE), levels = 0:1, labels = c("NO", "YES")),
cva = sample(c(FALSE, TRUE), size = n, replace = TRUE)
)
## Contingency table
tt <- table(df$outcome, df$cva)
## Sample odds ratio
r <- (tt["YES", "TRUE"] / tt["NO", "TRUE"]) / (tt["YES", "FALSE"] / tt["NO", "FALSE"])
## Estimated odds ratio when first level is "NO"
m0 <- glm(outcome ~ cva, family = binomial, data = df)
r0 <- exp(coef(m0))[[2L]]
## Reciprocal estimated odds ratio when first level is "YES"
m1 <- glm(relevel(outcome, "YES") ~ cva, family = binomial, data = df)
r1 <- exp(-coef(m1))[[2L]]
print(c(r, r0, r1), digits = 20, width = 30)
[1] 0.85565476190476186247
[2] 0.85565476230266901414
[3] 0.85565476230266912516
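Since the call in the question already passes y = TRUE, the fitted object also keeps the numeric response that glm actually used (component `y`). Cross-tabulating it against the factor is another direct check; the data frame below is a hypothetical stand-in for the asker's `df`. The `as.integer()` call is there so the table shows 0/1 labels regardless of how the stored response is typed.

```r
## Hypothetical toy data (not the asker's df)
df <- data.frame(
  outcome = factor(c("NO", "YES", "NO", "NO", "YES", "YES", "NO", "YES")),
  cva     = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

## Same call shape as in the question
m <- glm(outcome ~ cva, family = binomial, data = df, x = TRUE, y = TRUE)

## m$y is the response glm fitted; every YES should line up with 1
table(fitted_y = as.integer(m$y), outcome = df$outcome)
```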
