6  Module 4: Extensions and a Mini Case Study

7.1 Learning outcomes

  • Fit multivariable logistic models with categorical predictors.
  • Use interactions when effects differ by group.
  • Summarize results as predicted probabilities for realistic scenarios.

7.2 Multivariable logistic regression

fit_multi <- glm(am ~ wt + hp + factor(cyl), data = mtcars, family = binomial)
summary(fit_multi)

Call:
glm(formula = am ~ wt + hp + factor(cyl), family = binomial, 
    data = mtcars)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)  
(Intercept)   18.09383    9.46271   1.912   0.0559 .
wt           -10.67598    5.44182  -1.962   0.0498 *
hp             0.10321    0.09606   1.074   0.2826  
factor(cyl)6   2.76575    3.15679   0.876   0.3810  
factor(cyl)8  -8.38896   13.16709  -0.637   0.5240  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.2297  on 31  degrees of freedom
Residual deviance:  7.2554  on 27  degrees of freedom
AIC: 17.255

Number of Fisher Scoring iterations: 9
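
Key-point 6.1

Each coefficient is a partial effect on the log-odds scale. It is the change in the log-odds of a manual transmission (am = 1) for a one-unit increase in that predictor, holding the other predictors constant. Because wt is measured in 1000 lb, its coefficient describes a 1000 lb change. The factor(cyl) coefficients compare 6- and 8-cylinder cars with the reference level, 4 cylinders. Exponentiating a coefficient gives the corresponding odds ratio.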
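For reporting, the log-odds coefficients are usually converted to odds ratios. The sketch below is a minimal version using Wald intervals from confint.default(). The variable names or and or_ci are illustrative, and Wald intervals are a choice made here: confint() would give profile-likelihood intervals instead.

```r
# Refit the multivariable model so this block runs on its own
fit_multi <- glm(am ~ wt + hp + factor(cyl), data = mtcars, family = binomial)

# Exponentiate coefficients: log-odds effects -> odds ratios
or <- exp(coef(fit_multi))

# Wald 95% intervals built on the log-odds scale, then exponentiated
or_ci <- exp(confint.default(fit_multi))

# signif() rather than round() so very small odds ratios do not print as 0
signif(cbind(OR = or, or_ci), 3)
```

Because wt is in 1000 lb, its per-unit odds ratio is extreme. exp(0.1 * coef(fit_multi)[["wt"]]) gives the odds ratio per 100 lb instead, about 0.34 given the estimate of -10.676 above.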

7.3 Interactions for subgroup effects

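If the effect of a predictor differs across groups, add an interaction term. In an R formula, wt * factor(cyl) expands to wt + factor(cyl) + wt:factor(cyl), which gives each cylinder group its own slope for weight.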
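One way to see exactly which terms the * operator creates is to inspect the design matrix before fitting. This sketch uses only base R; the name X is illustrative.

```r
# Design matrix for the interaction formula (no model fit needed)
X <- model.matrix(~ wt * factor(cyl), data = mtcars)

# Columns: (Intercept), wt, factor(cyl)6, factor(cyl)8, wt:factor(cyl)6, wt:factor(cyl)8
colnames(X)

# Each interaction column is wt multiplied by the 0/1 indicator for that group
head(X[, c("wt", "factor(cyl)6", "wt:factor(cyl)6")])
```

These column names match the coefficient rows in the model summary that follows.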

fit_interaction <- glm(am ~ wt * factor(cyl), data = mtcars, family = binomial)
summary(fit_interaction)

Call:
glm(formula = am ~ wt * factor(cyl), family = binomial, data = mtcars)

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)  
(Intercept)        13.758      7.787   1.767   0.0773 .
wt                 -5.116      2.984  -1.714   0.0865 .
factor(cyl)6      313.415  71542.593   0.004   0.9965  
factor(cyl)8       14.544     22.810   0.638   0.5237  
wt:factor(cyl)6  -102.300  23338.031  -0.004   0.9965  
wt:factor(cyl)8    -3.340      6.854  -0.487   0.6261  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 12.593  on 26  degrees of freedom
AIC: 24.593

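Number of Fisher Scoring iterations: 19

The cyl = 6 rows are a warning sign. Estimates of 313.415 and -102.300, standard errors in the tens of thousands, and 19 Fisher scoring iterations all point to separation. Among the 7 six-cylinder cars, every manual car is lighter than every automatic one, so the likelihood keeps improving as those coefficients move toward infinity and no finite estimate exists for them. With only 32 cars, a full interaction model is easy to overfit.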
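Interaction coefficients are differences from the reference group (4 cylinders), so each group's weight slope is the wt coefficient plus its interaction term. For 8-cylinder cars that is about -5.116 + (-3.340), or roughly -8.46 on the log-odds scale. The sketch below computes the group slopes and checks whether weight alone separates manual from automatic cars within the 6-cylinder group, which would explain the very large standard errors on the cyl = 6 terms. The names b, slopes, and six are illustrative, and R may print a warning while fitting this model.

```r
fit_interaction <- glm(am ~ wt * factor(cyl), data = mtcars, family = binomial)
b <- coef(fit_interaction)

# Group-specific wt slopes on the log-odds scale (4 cylinders is the reference)
slopes <- c(
  cyl4 = unname(b["wt"]),
  cyl6 = unname(b["wt"] + b["wt:factor(cyl)6"]),  # not meaningful if separated
  cyl8 = unname(b["wt"] + b["wt:factor(cyl)8"])
)
round(slopes, 3)

# Separation check: weight range by transmission within the 6-cylinder group
six <- subset(mtcars, cyl == 6)
tapply(six$wt, six$am, range)

# TRUE means the heaviest manual car is lighter than the lightest automatic
max(six$wt[six$am == 1]) < min(six$wt[six$am == 0])
```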

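7.4 Communicating results with scenarios

Predicted probabilities are easier to communicate than log-odds or odds ratios. Build a small data frame of realistic cars, using the same predictor names as the model (wt is in 1000 lb), and call predict() with type = "response" to get probabilities instead of log-odds.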

new_cars <- data.frame(
  wt = c(2.4, 3.2),
  hp = c(100, 150),
  cyl = factor(c(4, 6))
)

predict(fit_multi, newdata = new_cars, type = "response")
        1         2 
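0.9422625 0.8982382 

Read as probabilities: the first car (2,400 lb, 100 hp, 4 cylinders) has an estimated 94% chance of a manual transmission, and the second (3,200 lb, 150 hp, 6 cylinders) about 90%.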
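A point prediction hides how uncertain it is. The following is a minimal sketch of one common approach, assuming a Wald-type interval is acceptable. It predicts on the log-odds (link) scale with se.fit = TRUE, builds the interval there, and maps both ends back to probabilities with plogis(). The names pr and ci are illustrative. Here cyl is supplied as plain numbers: the formula's factor() call converts it, and predict() matches it to the levels seen during fitting.

```r
fit_multi <- glm(am ~ wt + hp + factor(cyl), data = mtcars, family = binomial)

new_cars <- data.frame(
  wt  = c(2.4, 3.2),   # 1000 lb
  hp  = c(100, 150),
  cyl = c(4, 6)        # converted by factor() inside the model formula
)

# Predictions on the log-odds scale, with standard errors
pr <- predict(fit_multi, newdata = new_cars, type = "link", se.fit = TRUE)

# Approximate 95% interval on the log-odds scale, mapped back to probabilities
z <- qnorm(0.975)
ci <- data.frame(
  prob  = plogis(pr$fit),
  lower = plogis(pr$fit - z * pr$se.fit),
  upper = plogis(pr$fit + z * pr$se.fit)
)
ci
```

Building the interval on the log-odds scale keeps both bounds between 0 and 1. Given the large standard errors in this small model, expect wide intervals.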

7.5 Mini case study template (dataset to be added)

Use this checklist to structure the final case study once the dataset is chosen.

  1. Define the outcome variable and clarify what counts as an event.
  2. Summarize the baseline event rate and missing data.
  3. Choose a small set of predictors with clear interpretations.
  4. Fit a logistic model and report odds ratios and predicted probabilities.
  5. Evaluate the model with an appropriate threshold and a confusion matrix.
  6. Communicate results in plain language with scenario predictions.
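Until the research dataset is chosen, steps 2, 4, and 5 can be rehearsed on mtcars. This sketch uses am as a placeholder outcome and a 0.5 threshold. Both are assumptions for illustration, not choices for the final case study, and the names dat, threshold, and cm are hypothetical.

```r
# Placeholder data and outcome: am (1 = manual); swap in the real dataset later
dat <- mtcars

# Step 2: baseline event rate and missing values per column
event_rate <- mean(dat$am == 1)
missing_by_col <- colSums(is.na(dat))

# Step 4: fit the model
fit <- glm(am ~ wt + hp + factor(cyl), data = dat, family = binomial)

# Step 5: classify at an assumed threshold and build a confusion matrix
threshold <- 0.5
p_hat <- fitted(fit)
pred_class <- factor(as.integer(p_hat >= threshold), levels = c(0, 1))
actual <- factor(dat$am, levels = c(0, 1))
cm <- table(Predicted = pred_class, Actual = actual)
cm

# Sensitivity: share of true events flagged; specificity: share of non-events cleared
sensitivity <- cm["1", "1"] / sum(cm[, "1"])
specificity <- cm["0", "0"] / sum(cm[, "0"])
c(sensitivity = sensitivity, specificity = specificity)
```

These metrics are computed on the data used for fitting, so they are optimistic.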

Note 6.1

The final case study will be updated once the outcome variable is confirmed for the selected research dataset.

7.6 Reporting template

Use this outline to write up results in plain language.

  • Outcome definition, sample size, and event rate.
  • Model specification (predictors and reference levels).
  • Key effects as odds ratios with a short interpretation.
  • Predicted probabilities for 2 to 3 realistic scenarios.
  • Performance summary (threshold, confusion matrix metrics, AUC).
  • Limitations and next steps.
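AUC can be read as the probability that a randomly chosen event case receives a higher predicted probability than a randomly chosen non-event case, with ties counting one half. The sketch below computes it from ranks (the Mann-Whitney form) so no extra packages are needed. auc_rank is a hypothetical helper written for this example, not a library function.

```r
fit_multi <- glm(am ~ wt + hp + factor(cyl), data = mtcars, family = binomial)
p_hat <- fitted(fit_multi)
y <- mtcars$am

# Rank-based AUC: average ranks handle ties, matching the "ties count 1/2" rule
auc_rank <- function(score, y) {
  r  <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc <- auc_rank(p_hat, y)
auc
```

Dedicated packages (for example pROC) also report AUC with confidence intervals. As with the confusion matrix, an AUC computed on the fitting data is optimistic.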

Template paragraph:

“We modeled [outcome] using logistic regression with predictors [list]. The event rate was [rate]. A one-unit increase in [predictor] was associated with [odds ratio] times the odds of the event, holding other variables constant. For a typical case ([scenario]), the predicted probability was [probability]. Using a threshold of [threshold], the model achieved [accuracy/sensitivity/specificity] and an AUC of [AUC]. Key limitations include [limitations].”
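Filling the placeholders from the model object, rather than by hand, keeps the write-up in sync with the numbers. This partial sketch uses the mtcars model as a stand-in. Reporting the weight effect per 100 lb (instead of per 1000 lb) and treating 3,200 lb, 150 hp, 6 cylinders as the typical case are assumptions made here for readability. summary_text and or_wt_100lb are illustrative names.

```r
fit <- glm(am ~ wt + hp + factor(cyl), data = mtcars, family = binomial)

event_rate <- mean(mtcars$am)
# wt is in 1000 lb; scale the coefficient to a 100 lb change before exponentiating
or_wt_100lb <- exp(0.1 * coef(fit)[["wt"]])
p_typical <- predict(fit, newdata = data.frame(wt = 3.2, hp = 150, cyl = 6),
                     type = "response")

summary_text <- sprintf(paste(
  "We modeled manual transmission (am) using logistic regression with predictors",
  "weight, horsepower, and number of cylinders. The event rate was %.1f%%.",
  "A 100 lb increase in weight was associated with %.2f times the odds of a",
  "manual transmission, holding other variables constant. For a typical case",
  "(3,200 lb, 150 hp, 6 cylinders), the predicted probability was %.2f."
), 100 * event_rate, or_wt_100lb, p_typical)
summary_text
```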

7.7 Practice prompts

  • Fit the model with factor(gear) instead of factor(cyl). What changes?
  • Compare predictions from fit_multi and fit_interaction for the same scenario.
  • Draft a one-paragraph summary for a non-technical audience.
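A starter sketch for the first two prompts is below. fit_gear, scenario, and preds are hypothetical names. Note that fit_interaction ignores hp, and its predictions for 6-cylinder cars inherit the separation problem discussed above. R may print warnings while fitting either model.

```r
fit_multi <- glm(am ~ wt + hp + factor(cyl), data = mtcars, family = binomial)
fit_interaction <- glm(am ~ wt * factor(cyl), data = mtcars, family = binomial)

# Prompt 1 starter: swap cylinders for gears, then inspect the cross-tab
fit_gear <- glm(am ~ wt + hp + factor(gear), data = mtcars, family = binomial)
summary(fit_gear)
table(gear = mtcars$gear, am = mtcars$am)

# Prompt 2 starter: one scenario, two models
scenario <- data.frame(wt = 3.2, hp = 150, cyl = 6)
preds <- c(
  additive    = unname(predict(fit_multi, newdata = scenario, type = "response")),
  interaction = unname(predict(fit_interaction, newdata = scenario, type = "response"))
)
preds
```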