5 Assessing Model Fit

5.1 Deviance

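Deviance measures how far a fitted model is from the saturated model, which has one parameter per observation and reproduces the data exactly: D = 2 (log L_saturated - log L_model). Smaller deviance indicates a closer fit. For two nested models, the drop in deviance is the likelihood ratio statistic; under the smaller model it is approximately chi-square distributed, with degrees of freedom equal to the number of extra parameters. This is the test that anova() reports with test = "Chisq".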

fit_null <- glm(am ~ 1, data = mtcars, family = binomial)
fit_full <- glm(am ~ wt + hp + factor(cyl), data = mtcars, family = binomial)

anova(fit_null, fit_full, test = "Chisq")

Analysis of Deviance Table

Model 1: am ~ 1
Model 2: am ~ wt + hp + factor(cyl)
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1        31     43.230                          
2        27      7.255  4   35.974 2.929e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
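As a cross-check, the likelihood ratio test in the table can be computed by hand. This sketch assumes ungrouped 0/1 data, where the saturated model's log-likelihood is 0 and the deviance therefore equals -2 times the model log-likelihood; the object names `lr_stat`, `lr_df` and `p_value` are illustrative.

```r
# Fit the same two nested models as above
fit_null <- glm(am ~ 1, data = mtcars, family = binomial)
fit_full <- glm(am ~ wt + hp + factor(cyl), data = mtcars, family = binomial)

# With 0/1 responses the saturated log-likelihood is 0,
# so the residual deviance equals -2 * log-likelihood
dev_full <- deviance(fit_full)
ll_full <- as.numeric(logLik(fit_full))
all.equal(dev_full, -2 * ll_full)

# LR statistic: drop in deviance between the nested models
lr_stat <- deviance(fit_null) - deviance(fit_full)
# Degrees of freedom: number of extra parameters in the larger model
lr_df <- df.residual(fit_null) - df.residual(fit_full)
# Upper-tail chi-square probability
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p.value = p_value)
```

The three values should match the Deviance, Df and Pr(>Chi) columns of the anova() table.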

Key-point 4.1

Logistic regression has no R-squared with the same meaning as in linear regression, where R-squared is the proportion of variance explained by a least-squares fit. Model comparison is typically done with likelihood-based tools (deviance, likelihood ratio tests) and information criteria such as AIC.

5.2 Goodness-of-fit tests

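Some goodness-of-fit tests, such as Hosmer–Lemeshow, are not part of base R and require a choice of how to group observations, and their results can change with that choice. For ungrouped binary data such as am, the residual deviance by itself is not a usable absolute goodness-of-fit statistic, because it does not follow a chi-square distribution. In practice, combine several checks: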

Likelihood-based comparisons between nested models
Predictive performance (next chapter)
Residual and influence diagnostics for unusual points
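To show what the grouping choice involves, here is a minimal base-R sketch of the Hosmer–Lemeshow statistic, assuming observations are grouped by quantiles (by default deciles) of the fitted probabilities. `hosmer_lemeshow()` is a hypothetical helper written for illustration, not a function from base R or any package.

```r
# Hypothetical helper: Hosmer-Lemeshow statistic in base R
# (not a base R or package function)
hosmer_lemeshow <- function(fit, g = 10) {
  y <- fit$y            # 0/1 response used in the fit
  p <- fitted(fit)      # fitted probabilities
  # Group observations by quantiles of the fitted probabilities
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  grp <- droplevels(cut(p, breaks = breaks, include.lowest = TRUE))
  # Observed events, expected events and size of each group
  obs <- tapply(y, grp, sum)
  expected <- tapply(p, grp, sum)
  n <- tapply(y, grp, length)
  # Pearson-type statistic summed over groups
  stat <- sum((obs - expected)^2 / (expected * (1 - expected / n)))
  df <- length(n) - 2
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

fit_full <- glm(am ~ wt + hp + factor(cyl), data = mtcars, family = binomial)
hl <- hosmer_lemeshow(fit_full, g = 10)
hl
```

With 32 observations and 10 groups there are only about three cars per group, so the chi-square approximation is rough, and the result can shift with g. Packaged implementations, such as hoslem.test() in the ResourceSelection package, follow the same idea.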

5.3 Pseudo R-squared

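Pseudo R-squared measures are built from likelihoods rather than residual variance. McFadden’s version is 1 - log L_model / log L_null, which is what the code below computes. These measures are best used for rough comparisons between models fit to the same data, not as a direct analogue of the proportion of variance explained in a linear model.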

fit_null <- glm(am ~ 1, data = mtcars, family = binomial)
fit_full <- glm(am ~ wt + hp + factor(cyl), data = mtcars, family = binomial)

mcfadden_r2 <- 1 - as.numeric(logLik(fit_full) / logLik(fit_null))
mcfadden_r2

[1] 0.8321671
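Other likelihood-based pseudo R-squared measures use the same two log-likelihoods. The sketch below computes the Cox–Snell and Nagelkerke versions in base R (the variable names are illustrative) and checks that, for ungrouped 0/1 data, McFadden's value equals 1 minus the ratio of residual to null deviance.

```r
fit_null <- glm(am ~ 1, data = mtcars, family = binomial)
fit_full <- glm(am ~ wt + hp + factor(cyl), data = mtcars, family = binomial)

ll_null <- as.numeric(logLik(fit_null))
ll_full <- as.numeric(logLik(fit_full))
n <- nobs(fit_full)

# McFadden: 1 - ratio of log-likelihoods
# (equals 1 - residual deviance / null deviance for 0/1 data)
mcfadden <- 1 - ll_full / ll_null
mcfadden_dev <- 1 - deviance(fit_full) / deviance(fit_null)

# Cox-Snell: 1 - (L_null / L_full)^(2/n); its maximum is below 1
cox_snell <- 1 - exp((2 / n) * (ll_null - ll_full))

# Nagelkerke: Cox-Snell rescaled by its maximum possible value
nagelkerke <- cox_snell / (1 - exp((2 / n) * ll_null))

c(mcfadden = mcfadden, mcfadden_dev = mcfadden_dev,
  cox_snell = cox_snell, nagelkerke = nagelkerke)
```

Cox–Snell cannot reach 1 for binary outcomes, which is why Nagelkerke rescales it. The measures sit on different scales, so compare models using one measure at a time rather than comparing the measures with each other.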

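5.4 AIC

Akaike's information criterion is AIC = -2 log L + 2k, where k is the number of estimated parameters (the df column below). Lower AIC is preferred, and unlike a likelihood ratio test, AIC can also compare non-nested models fit to the same data.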

AIC(fit_null, fit_full)

         df      AIC
fit_null  1 45.22973
fit_full  5 17.25537
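To make the penalty explicit, AIC can be rebuilt from the log-likelihood and its parameter count. This sketch relies on the `df` attribute that `logLik()` attaches to its result; the helper name `aic_by_hand` is hypothetical.

```r
fit_null <- glm(am ~ 1, data = mtcars, family = binomial)
fit_full <- glm(am ~ wt + hp + factor(cyl), data = mtcars, family = binomial)

# Hypothetical helper: AIC = -2 * log-likelihood + 2 * (number of parameters)
aic_by_hand <- function(fit) {
  ll <- logLik(fit)
  k <- attr(ll, "df")   # number of estimated parameters
  -2 * as.numeric(ll) + 2 * k
}

aics <- c(null = aic_by_hand(fit_null), full = aic_by_hand(fit_full))
aics
# Difference relative to the best (lowest-AIC) model
aics - min(aics)
```

The values should agree with AIC() above. The full model's AIC is about 28 units lower, a large difference by common rules of thumb, so it is favored despite its four extra parameters.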