# Models for ordinal responses

A variable $y$ is said to be ordinal if it takes on values or categories which have a natural ordering but the distances between the categories are not known. Ordinal variables are very common: some examples include T-shirt sizes, the Likert scale used in surveys (“for each of the following statements, choose along the scale of strongly disagree to strongly agree”), income brackets and ranked variables.

In this post, I introduce 3 commonly used methods for fitting regression models with an ordinal response. In what follows, I assume that the response variable $y$ is ordinal, taking on category values $\{ 1, 2, \dots, c \}$. (As $y$ is ordinal, we should think of these values as category numbers instead of actual quantitative values.) We want to fit a model where we can predict $y$ based on a set of features or covariates $x \in \mathbb{R}^p$.

## Cumulative logit model with proportional odds

The cumulative logit model with proportional odds assumes the following relationship between $y$ and $x$:

\begin{aligned} \text{logit}[P(y \leq j)] = \log \dfrac{P(y \leq j)}{P(y > j)} = \alpha_j + \beta^T x, \qquad j = 1, \dots, c-1. \end{aligned}

This is a natural extension of logistic regression, which is the special case $c = 2$. Note also that for fixed $j$, the above looks like logistic regression where we have made the response binary (less than or equal to $j$ vs. greater than $j$).

Note that in this model, each value of $j$ gets its own intercept $\alpha_j$ but the other coefficients $\beta$ are fixed across $j$. Because of this, the model satisfies the proportional odds property: for all $j$,

\begin{aligned} \log \left[ \dfrac{P(y \leq j \mid x_1) / P(y > j \mid x_1)}{P(y \leq j \mid x_2) / P(y > j \mid x_2)} \right] = \beta^T (x_1 - x_2). \end{aligned}
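To make the proportional odds property concrete, here is a small numerical check. It is only a sketch: the parameter values `alpha`, `beta` and the feature vectors `x1`, `x2` are made up for illustration, and `cumulative_logits` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def cumulative_logits(alpha, beta, x):
    """Return logit P(y <= j | x) for j = 1, ..., c-1 under the model above."""
    return np.asarray(alpha, dtype=float) + np.dot(beta, x)

# Hypothetical parameter values, chosen only for illustration (c = 4 categories).
alpha = np.array([-1.0, 0.5, 2.0])
beta = np.array([0.8, -0.3])
x1 = np.array([1.0, 2.0])
x2 = np.array([0.0, -1.0])

# Log cumulative odds ratio between x1 and x2, one entry per cutpoint j.
log_odds_ratio = cumulative_logits(alpha, beta, x1) - cumulative_logits(alpha, beta, x2)

print(log_odds_ratio)      # the same value for every j ...
print(beta @ (x1 - x2))    # ... namely beta^T (x1 - x2)
```

The intercepts $\alpha_j$ cancel in the difference, which is why the log odds ratio does not depend on $j$.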

With this model, the estimated probability of being less than or equal to $j$ is

\begin{aligned} P(y \leq j) = \dfrac{\exp (\hat{\alpha}_j + \hat{\beta}^T x)}{1 + \exp (\hat{\alpha}_j + \hat{\beta}^T x)}. \end{aligned}

Hence, the estimated probability of being equal to $j$ is (with the conventions $P(y \leq 0) = 0$ and $P(y \leq c) = 1$ for the end categories)

\begin{aligned} P(y = j) &= P(y \leq j) - P(y \leq j - 1) \\ &= \dfrac{\exp (\hat{\alpha}_j + \hat{\beta}^T x)}{1 + \exp (\hat{\alpha}_j + \hat{\beta}^T x)} - \dfrac{\exp (\hat{\alpha}_{j-1} + \hat{\beta}^T x)}{1 + \exp (\hat{\alpha}_{j-1} + \hat{\beta}^T x)}. \end{aligned}
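As a minimal sketch of this computation, the function below turns fitted parameters into category probabilities by differencing the cumulative probabilities. The function name `cumulative_logit_probs` and the numeric inputs are assumptions made for illustration. The sketch also assumes the intercepts are increasing in $j$, which is what keeps every difference non-negative.

```python
import numpy as np

def cumulative_logit_probs(alpha, beta, x):
    """Category probabilities P(y = j | x), j = 1..c, for a cumulative logit model.

    alpha: increasing intercepts of length c-1; beta, x: vectors of length p.
    """
    eta = np.asarray(alpha, dtype=float) + np.dot(beta, x)
    cdf = 1.0 / (1.0 + np.exp(-eta))            # P(y <= j) for j = 1..c-1
    cdf = np.concatenate(([0.0], cdf, [1.0]))   # pad with P(y <= 0) = 0 and P(y <= c) = 1
    return np.diff(cdf)                         # P(y = j) = P(y <= j) - P(y <= j-1)

# Illustrative (made-up) parameter values with c = 4 categories.
probs = cumulative_logit_probs([-1.0, 0.5, 2.0], [0.8, -0.3], [1.0, 2.0])
print(probs, probs.sum())
```

Padding the cumulative probabilities with 0 and 1 handles the first and last categories without special cases.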

## Adjacent-category logit model

The adjacent-category logit model assumes the relationship

\begin{aligned} \log \dfrac{P(y = j)}{P(y = j + 1)} = \alpha_j + \beta^T x, \qquad j = 1, \dots, c-1. \end{aligned}

Here the odds ratio uses only adjacent categories, whereas in the cumulative logit model it uses the entire response scale. Hence, the interpretation of the model is in terms of local odds ratios as opposed to cumulative odds ratios.

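Having fit the model, we can recover the estimated category probabilities by chaining the adjacent logits: $\log [P(y = j) / P(y = c)] = \sum_{k=j}^{c-1} (\hat{\alpha}_k + \hat{\beta}^T x)$. Normalizing so that the probabilities sum to one, the estimated probability of being equal to $j$ is

\begin{aligned} P(y = j) = \dfrac{\exp \left( \sum_{k=j}^{c-1} (\hat{\alpha}_k + \hat{\beta}^T x) \right)}{1 + \sum_{m=1}^{c-1} \exp \left( \sum_{k=m}^{c-1} (\hat{\alpha}_k + \hat{\beta}^T x) \right)}, \qquad j = 1, \dots, c, \end{aligned}

where the empty sum for $j = c$ is taken to be zero.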

This is quite a different form from that for the cumulative logit model.
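The sketch below builds the category probabilities directly from the defining adjacent-category logits: since $\log [P(y = j) / P(y = j+1)] = \alpha_j + \beta^T x$, summing these from $j$ up to $c - 1$ gives $\log [P(y = j) / P(y = c)]$, and normalizing gives the probabilities. The function name `adjacent_category_probs` and the parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def adjacent_category_probs(alpha, beta, x):
    """Category probabilities P(y = j | x), j = 1..c, for an adjacent-category logit model."""
    # eta[j] = log[P(y = j) / P(y = j+1)] for j = 1..c-1
    eta = np.asarray(alpha, dtype=float) + np.dot(beta, x)
    # Reverse cumulative sum: log[P(y = j) / P(y = c)] = eta_j + ... + eta_{c-1}; it is 0 for j = c.
    log_ratio_to_last = np.concatenate((np.cumsum(eta[::-1])[::-1], [0.0]))
    # Subtract the maximum before exponentiating for numerical stability, then normalize.
    weights = np.exp(log_ratio_to_last - log_ratio_to_last.max())
    return weights / weights.sum()

# Illustrative (made-up) parameter values with c = 4 categories.
alpha = np.array([0.3, -0.2, 1.0])
beta = np.array([0.8, -0.3])
x = np.array([1.0, 2.0])
probs = adjacent_category_probs(alpha, beta, x)

# Sanity check: the adjacent log odds recover alpha_j + beta^T x.
print(np.log(probs[:-1] / probs[1:]), alpha + beta @ x)
```

Unlike the cumulative logit model, the intercepts here need not be increasing in $j$ for the probabilities to be valid.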

## Continuation-ratio logit model

The continuation-ratio logit model assumes the relationship

\begin{aligned} \text{logit} [P(y = j \mid y \geq j)] = \log \dfrac{P(y = j \mid y \geq j)}{P(y > j \mid y \geq j)} = \alpha_j + \beta^T x, \qquad j = 1, \dots, c-1. \end{aligned}

Having fit the model, we have the estimated conditional probabilities

\begin{aligned} P(y = j \mid y \geq j) &= \dfrac{\exp (\hat{\alpha}_j + \hat{\beta}^T x)}{1 + \exp (\hat{\alpha}_j + \hat{\beta}^T x)}. \end{aligned}

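With a bit of algebra we can convert them to unconditional probability estimates. Since $P(y = j) = P(y = j \mid y \geq j) \prod_{k=1}^{j-1} P(y > k \mid y \geq k)$, for $j = 1, \dots, c-1$ we get

\begin{aligned} P(y = j) = \dfrac{\exp (\hat{\alpha}_j + \hat{\beta}^T x)}{1 + \exp (\hat{\alpha}_j + \hat{\beta}^T x)} \prod_{k=1}^{j-1} \dfrac{1}{1 + \exp (\hat{\alpha}_k + \hat{\beta}^T x)}. \end{aligned}

The last category takes the remaining probability: $P(y = c) = \prod_{k=1}^{c-1} \dfrac{1}{1 + \exp (\hat{\alpha}_k + \hat{\beta}^T x)}$.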
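The conversion from conditional to unconditional probabilities can be sketched in a few lines: multiply the probability of stopping at category $j$ by the probability of having passed every earlier category. The function name `continuation_ratio_probs` and the parameter values are assumptions for illustration only.

```python
import numpy as np

def continuation_ratio_probs(alpha, beta, x):
    """Category probabilities P(y = j | x), j = 1..c, for a continuation-ratio logit model."""
    eta = np.asarray(alpha, dtype=float) + np.dot(beta, x)
    stop = 1.0 / (1.0 + np.exp(-eta))     # P(y = j | y >= j) for j = 1..c-1
    # P(y >= j) for j = 1..c: start at 1, then multiply by P(y > k | y >= k) = 1 - stop_k.
    reach = np.concatenate(([1.0], np.cumprod(1.0 - stop)))
    # In the last category the conditional probability P(y = c | y >= c) is 1.
    return reach * np.concatenate((stop, [1.0]))

# Illustrative (made-up) parameter values with c = 4 categories.
alpha = np.array([0.3, -0.2, 1.0])
beta = np.array([0.8, -0.3])
x = np.array([1.0, 2.0])
probs = continuation_ratio_probs(alpha, beta, x)
print(probs, probs.sum())
```

As with the adjacent-category model, no ordering constraint on the intercepts is needed here, because each conditional probability is a valid probability on its own.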

References:

1. Agresti, A. (2010). Modeling Ordinal Categorical Data.