Logistic regression is a special case of a generalised linear model (GLM): a linear predictor \(\eta = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k\) is connected to the probability of success through the logit link.
The inverse of the logit is the logistic function, which maps any real number to a value between 0 and 1. In R, this is plogis().
```r
eta <- c(-4, -2, 0, 2, 4)
data.frame(eta = eta, p = plogis(eta))
```

```
  eta          p
1  -4 0.01798621
2  -2 0.11920292
3   0 0.50000000
4   2 0.88079708
5   4 0.98201379
```
For a single observation with probability of success \(p\), a binary outcome is:
\[ Y \sim \text{Bernoulli}(p) \]
More generally, for \(n\) trials:
\[ Y \sim \text{Binomial}(n, p) \]
```r
dbinom(x = 0:3, size = 3, prob = 0.4)
```

```
[1] 0.216 0.432 0.288 0.064
```
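The Bernoulli distribution is the Binomial with \(n = 1\), so rbinom() can simulate both. A minimal sketch (the seed and probability 0.4 are illustrative choices, not from the text above):

```r
# Bernoulli draws are Binomial draws with size = 1
set.seed(1)                            # assumed seed, for reproducibility
rbinom(n = 10, size = 1, prob = 0.4)   # ten Bernoulli(0.4) outcomes, each 0 or 1
rbinom(n = 1, size = 10, prob = 0.4)   # one Binomial(10, 0.4) count of successes
```

Note the distinction between n (how many draws) and size (how many trials per draw).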
If \(p\) is the probability of an event, the odds are
\[ \text{odds} = \frac{p}{1 - p} \]
The logit is the log of the odds:
\[ \text{logit}(p) = \log\left(\frac{p}{1 - p}\right) \]
```r
p <- 0.2
odds <- p / (1 - p)
logit <- log(odds)
c(p = p, odds = odds, logit = logit)
```

```
        p      odds     logit
 0.200000  0.250000 -1.386294
```
An increase of 1 unit on the logit scale multiplies the odds by about 2.72 (because \(e^1 \approx 2.72\)).
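This multiplicative effect is easy to check numerically; continuing with the \(p = 0.2\) example from above:

```r
p <- 0.2
odds <- p / (1 - p)
# move 1 unit up the logit scale, then convert back to odds
new_odds <- exp(log(odds) + 1)
new_odds / odds   # the odds ratio for a 1-unit logit change, exactly exp(1)
```

The ratio is exp(1) ≈ 2.718282 regardless of the starting probability.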
The model is linear on the logit scale, but S-shaped on the probability scale. This is why logistic regression keeps predicted probabilities in the 0 to 1 range.
```r
eta <- seq(-6, 6, length.out = 200)
plot(eta, plogis(eta), type = "l", lwd = 2,
     xlab = "Linear predictor (eta)", ylab = "Probability")
abline(h = c(0, 1), col = "grey80", lty = 3)
```
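Putting the pieces together, a logistic regression can be fitted with glm() and a binomial family. A minimal sketch on simulated data (the seed, sample size, and true coefficients -0.5 and 1.2 are hypothetical choices for illustration):

```r
# simulate binary outcomes from a known logistic model, then recover it with glm()
set.seed(42)                          # assumed seed, for reproducibility
x <- rnorm(200)
p <- plogis(-0.5 + 1.2 * x)           # true probabilities via the logistic function
y <- rbinom(200, size = 1, prob = p)  # Bernoulli outcomes
fit <- glm(y ~ x, family = binomial)  # logit link is the binomial default
coef(fit)                             # estimates are on the logit (log-odds) scale
```

The fitted coefficients live on the logit scale, so exp(coef(fit)) gives odds ratios and plogis(predict(fit)) gives predicted probabilities.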