Glossary

Logistic regression

A regression model for binary outcomes that models the log-odds of an event as a linear function of predictors.

Binary outcome

A response variable with two possible values, often coded as 0 and 1.

Probability

A number between 0 and 1 that represents the chance of an event.

Odds

The ratio of the probability of an event to the probability of it not occurring: p / (1 - p).

Logit

The log of the odds: log(p / (1 - p)).
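
The relationship between probability, odds, and logit can be checked numerically. The following is a minimal Python sketch; the function names (odds, logit, inv_logit) and the value p = 0.8 are chosen here for illustration only.

```python
import math

def odds(p):
    """Odds of an event with probability p, for 0 <= p < 1."""
    return p / (1 - p)

def logit(p):
    """Log-odds of an event with probability p, for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Map a log-odds value z back to a probability."""
    return 1 / (1 + math.exp(-z))

p = 0.8  # illustrative value
print(odds(p))              # about 4: the event is four times as likely as not
print(logit(p))             # about 1.386, the natural log of 4
print(inv_logit(logit(p)))  # recovers 0.8 up to floating-point rounding
```

The inverse logit is what turns a linear predictor back into a probability, which is why it appears whenever predicted probabilities are computed.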

Link function

A function that connects the mean of the response to the linear predictor; for logistic regression the link is the logit.

Linear predictor

The linear combination of predictors, for example beta0 + beta1 x.

Coefficient

A model parameter giving the change in the log-odds of the event for a one-unit increase in a predictor, holding other predictors constant.

Odds ratio

The multiplicative change in odds for a one-unit increase in a predictor, holding other predictors constant; computed as exp(coefficient).

Predicted probability

The model’s estimate of the probability of an event for a given set of predictors.
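
The linear predictor, odds ratio, and predicted probability entries above fit together as follows. This is a small Python sketch with hypothetical coefficients (beta0 = -1.5, beta1 = 0.7), not estimates from any real dataset. It checks that the ratio of the odds at x + 1 to the odds at x equals exp(beta1) regardless of x.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -1.5, 0.7

def predicted_probability(x):
    """Inverse logit of the linear predictor beta0 + beta1 * x."""
    eta = beta0 + beta1 * x
    return 1 / (1 + math.exp(-eta))

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1 - p)

odds_ratio = math.exp(beta1)

# Compare the odds at x + 1 with the odds at x for several values of x.
for x in (0.0, 2.0, 5.0):
    ratio = odds(predicted_probability(x + 1)) / odds(predicted_probability(x))
    print(x, round(ratio, 6), round(odds_ratio, 6))
```

The odds ratio is constant across x, but the change in predicted probability is not: the same one-unit increase moves the probability more near 0.5 than near 0 or 1.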

Likelihood

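The probability of the observed data under a model, viewed as a function of the model parameters. Logistic regression is usually fit by maximum likelihood, and for a given dataset a higher likelihood indicates a closer fit.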

Deviance

A goodness-of-fit measure equal to twice the difference in log-likelihood between a saturated model and the fitted model; lower values indicate a closer fit.

Akaike Information Criterion

A model comparison metric that balances fit and complexity; lower values indicate a preferred model among those compared.
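
Likelihood, deviance, and AIC are closely linked, as the sketch below suggests. It assumes ungrouped 0/1 outcomes, for which the saturated model has log-likelihood 0, so the deviance reduces to -2 times the log-likelihood. It uses the standard definition AIC = 2k - 2 log L, with k the number of estimated parameters. The outcomes and predicted probabilities are made up for illustration.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi)
               for yi, pi in zip(y, p))

def deviance(y, p):
    """Deviance for ungrouped 0/1 data, where the saturated log-likelihood is 0."""
    return -2 * log_likelihood(y, p)

def aic(y, p, k):
    """AIC = 2k - 2 * logL, where k is the number of estimated parameters."""
    return 2 * k - 2 * log_likelihood(y, p)

# Made-up outcomes and predicted probabilities, for illustration only.
y = [1, 0, 1, 1, 0]
p = [0.9, 0.2, 0.7, 0.6, 0.4]

print(log_likelihood(y, p))  # negative, because each term is the log of a probability
print(deviance(y, p))        # positive; smaller means a closer fit
print(aic(y, p, k=2))        # deviance plus a penalty of 2 per parameter
```

AIC values are only meaningful relative to other models fit to the same data.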

Confusion matrix

A table of predicted versus actual outcomes used to summarize classification performance.

Sensitivity

The proportion of actual positives that the model correctly identifies: TP / (TP + FN). Also called the true positive rate or recall.

Specificity

The proportion of actual negatives that the model correctly identifies: TN / (TN + FP). Also called the true negative rate.

Accuracy

The proportion of all predictions that are correct.
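
The confusion matrix, sensitivity, specificity, and accuracy can all be computed from the same four counts. The sketch below uses made-up outcomes and predicted probabilities, and assumes a classification threshold of 0.5; none of these values come from a real model.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn) for 0/1 actual and predicted labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn

# Made-up data, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.6, 0.3, 0.2, 0.7, 0.1, 0.4, 0.8]
threshold = 0.5  # assumed cut-off for classifying an observation as an event
predicted = [1 if q >= threshold else 0 for q in probs]

tp, fp, tn, fn = confusion_counts(actual, predicted)
sensitivity = tp / (tp + fn)              # share of actual positives found
specificity = tn / (tn + fp)              # share of actual negatives found
accuracy = (tp + tn) / len(actual)        # share of all predictions that are correct

print(tp, fp, tn, fn)                     # 3 1 3 1
print(sensitivity, specificity, accuracy) # 0.75 0.75 0.75
```

Changing the threshold changes all four counts, which is the trade-off that the ROC curve traces out.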

ROC curve

A plot of sensitivity (true positive rate) against 1 - specificity (false positive rate) as the classification threshold varies.

AUC

Area under the ROC curve; a threshold-free summary of how well the model ranks events above non-events, where 0.5 corresponds to chance and 1 to perfect discrimination.
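
One way to see what AUC measures is to compute it without drawing the curve. The sketch below uses the pairwise interpretation: the fraction of (event, non-event) pairs in which the event receives the higher predicted probability, with ties counted as one half. The data are made up, and the function name auc is chosen here for illustration.

```python
def auc(actual, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if ps > ns else 0.5 if ps == ns else 0.0
               for ps in pos for ns in neg)
    return wins / (len(pos) * len(neg))

# Made-up outcomes and predicted probabilities, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.6, 0.3, 0.2, 0.7, 0.1, 0.4, 0.8]

print(auc(actual, probs))  # 0.8125: 13 of the 16 pairs are ranked correctly
```

This pairwise loop is quadratic in the number of observations, so it is suited to small examples rather than large datasets.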

Calibration

How closely predicted probabilities match observed event rates.

Separation

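A situation where a predictor, or a combination of predictors, perfectly separates the two outcomes; the maximum likelihood estimates then do not exist as finite values, and software typically reports very large coefficients and standard errors.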

Class imbalance

When one outcome class is much more frequent than the other.

Interaction

A term that allows the effect of one predictor to depend on another.

Factor

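In R, the data type for a categorical variable, stored as a fixed set of discrete levels. By default, a factor used as a predictor enters the model as indicator variables comparing each level with a reference level.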