3  Fitting Logistic Regression Models

4.1 Maximum likelihood estimation

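Logistic regression parameters are typically estimated by maximum likelihood: the fitted coefficients are the values that maximize the log-likelihood of the observed 0/1 outcomes. In R, glm(..., family = binomial) finds them by Fisher scoring (iteratively reweighted least squares), and logLik() reports the maximized log-likelihood.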

fit_wt <- glm(am ~ wt, data = mtcars, family = binomial)
logLik(fit_wt)
'log Lik.' -9.588042 (df=2)
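
To make the link between glm() and maximum likelihood concrete, the sketch below maximizes the same Bernoulli log-likelihood directly with optim(). The helper names negloglik and neggrad are illustrative and do not come from the text above. This is a minimal check under those assumptions, not a description of how glm() works internally.

```r
# Design matrix (intercept + wt) and 0/1 response from mtcars
X <- cbind(1, mtcars$wt)
y <- mtcars$am

# Negative Bernoulli log-likelihood under the logit link
negloglik <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(y * eta - log1p(exp(eta)))
}

# Its gradient: -X'(y - p), where p are the fitted probabilities
neggrad <- function(beta) {
  p <- plogis(drop(X %*% beta))
  -drop(crossprod(X, y - p))
}

# General-purpose optimizer, started at zero (an arbitrary choice)
opt <- optim(c(0, 0), negloglik, gr = neggrad, method = "BFGS")
fit_wt <- glm(am ~ wt, data = mtcars, family = binomial)

opt$par      # should be close to coef(fit_wt)
-opt$value   # should be close to logLik(fit_wt)
```

If the two approaches agree, the maximized log-likelihood from optim() should match the logLik() value shown above up to optimizer tolerance.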

4.2 Fitting a model in R

fit_wt <- glm(am ~ wt, data = mtcars, family = binomial)
summary(fit_wt)

Call:
glm(formula = am ~ wt, family = binomial, data = mtcars)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)   12.040      4.510   2.670  0.00759 **
wt            -4.024      1.436  -2.801  0.00509 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 19.176  on 30  degrees of freedom
AIC: 23.176

Number of Fisher Scoring iterations: 6

Key-point 3.1

Coefficients are on the log-odds scale: each one is the change in the log-odds of the outcome for a one-unit increase in the predictor. To interpret them as odds ratios, exponentiate with exp().
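
As an illustration of the log-odds scale, the following sketch converts the linear predictor of fit_wt into odds and a probability for a hypothetical car with wt = 3. That value, and the object names log_odds, odds, prob and prob_pred, are assumptions chosen for the example.

```r
fit_wt <- glm(am ~ wt, data = mtcars, family = binomial)

# Linear predictor (log-odds) for a hypothetical car with wt = 3
b <- coef(fit_wt)
log_odds <- unname(b["(Intercept)"] + b["wt"] * 3)

# Convert log-odds to odds, then to a probability
odds <- exp(log_odds)
prob <- plogis(log_odds)   # same as odds / (1 + odds)

# predict() with type = "response" should return the same probability
prob_pred <- unname(predict(fit_wt, newdata = data.frame(wt = 3),
                            type = "response"))
c(log_odds = log_odds, odds = odds, prob = prob, prob_pred = prob_pred)
```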

4.3 Interpreting coefficients

fit_ex <- glm(am ~ wt + hp, data = mtcars, family = binomial)
coef(fit_ex)
(Intercept)          wt          hp 
 18.8662987  -8.0834752   0.0362556 

exp(coef(fit_ex))
 (Intercept)           wt           hp 
1.561455e+08 3.085967e-04 1.036921e+00 
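
A one-unit change is not always the most natural step. For hp, a 10-horsepower difference may be easier to read. The sketch below rescales the odds ratio accordingly; the step of 10 and the names or_hp_1 and or_hp_10 are assumptions made for illustration. Given the coefficient printed above, the result should be roughly 1.44.

```r
fit_ex <- glm(am ~ wt + hp, data = mtcars, family = binomial)
b_hp <- coef(fit_ex)["hp"]

or_hp_1  <- exp(b_hp)        # odds ratio per 1 hp, wt held fixed
or_hp_10 <- exp(10 * b_hp)   # odds ratio per 10 hp, wt held fixed

# Equivalent: raise the one-unit odds ratio to the 10th power
all.equal(or_hp_10, or_hp_1^10)
or_hp_10
```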

Exercise 3.1

Compute the odds ratio for wt and store it as or_wt.

or_wt <- exp(coef(fit_ex)["wt"])
or_wt

4.4 Inference in logistic regression

For quick inference, use the Wald z tests reported by summary() and the Wald confidence intervals from confint.default(). Both rely on the normal approximation: estimate ± z × standard error.

summary(fit_ex)

Call:
glm(formula = am ~ wt + hp, family = binomial, data = mtcars)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept) 18.86630    7.44356   2.535  0.01126 * 
wt          -8.08348    3.06868  -2.634  0.00843 **
hp           0.03626    0.01773   2.044  0.04091 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 10.059  on 29  degrees of freedom
AIC: 16.059

Number of Fisher Scoring iterations: 8

confint.default(fit_ex)
                    2.5 %     97.5 %
(Intercept)   4.277193002 33.4554044
wt          -14.097967884 -2.0689825
hp            0.001497294  0.0710139
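
The Wald quantities above can be reproduced by hand from the coefficient estimates and their standard errors. The sketch below does so and also moves the interval to the odds-ratio scale. The object names est, se, wald_ci, z_stat and p_val are illustrative.

```r
fit_ex <- glm(am ~ wt + hp, data = mtcars, family = binomial)

est <- coef(fit_ex)
se  <- sqrt(diag(vcov(fit_ex)))   # standard errors from the covariance matrix
z   <- qnorm(0.975)               # critical value for a 95% interval

# Wald interval: estimate +/- z * standard error
wald_ci <- cbind(lower = est - z * se, upper = est + z * se)
wald_ci
exp(wald_ci)   # the same interval on the odds-ratio scale

# z statistics and two-sided p-values, as reported by summary()
z_stat <- est / se
p_val  <- 2 * pnorm(-abs(z_stat))
cbind(z_stat, p_val)
```

Exponentiating the interval endpoints is valid because exp() is monotone, so the transformed interval keeps the same coverage on the odds-ratio scale.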

4.5 Categorical predictors

fit_cat <- glm(am ~ wt + factor(cyl), data = mtcars, family = binomial)
summary(fit_cat)

Call:
glm(formula = am ~ wt + factor(cyl), family = binomial, data = mtcars)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept)    20.853      8.032   2.596  0.00942 **
wt             -7.859      3.055  -2.573  0.01009 * 
factor(cyl)6    3.105      2.425   1.280  0.20042   
factor(cyl)8    5.379      3.201   1.681  0.09281 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 14.661  on 28  degrees of freedom
AIC: 22.661

Number of Fisher Scoring iterations: 7

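When a predictor is a factor, R estimates one coefficient for each non-reference level, and each coefficient compares that level to the reference level (here cyl = 4, the first level of factor(cyl)). Exponentiate a coefficient to interpret it as an odds ratio relative to that baseline, holding wt fixed.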
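
The choice of reference level changes which comparisons the coefficients express, but not the fitted model. The sketch below inspects the baseline and then switches it with relevel(). The data frame mt and the column names cyl_f and cyl_f8ref are hypothetical names introduced for this example.

```r
# Make the factor explicit so its levels can be inspected and reordered
mt <- transform(mtcars, cyl_f = factor(cyl))
levels(mt$cyl_f)   # the first level is the reference

fit_ref4 <- glm(am ~ wt + cyl_f, data = mt, family = binomial)
exp(coef(fit_ref4))[c("cyl_f6", "cyl_f8")]   # odds ratios vs. 4 cylinders

# Switch the baseline to 8 cylinders and refit
mt$cyl_f8ref <- relevel(mt$cyl_f, ref = "8")
fit_ref8 <- glm(am ~ wt + cyl_f8ref, data = mt, family = binomial)
coef(fit_ref8)

# Same fit, different contrasts: the deviance should be unchanged
c(deviance(fit_ref4), deviance(fit_ref8))
```

With 8 cylinders as the baseline, the coefficient for 4 cylinders should be the negative of the original coefficient for 8 cylinders, since both describe the same pairwise comparison in opposite directions.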