# What is isotonic regression?

Isotonic regression is a method for obtaining a monotonic fit for 1-dimensional data. Let's say we have data $(x_1, y_1), \dots, (x_n, y_n) \in \mathbb{R}^2$ such that $x_1 < x_2 < \dots < x_n$. (We assume no ties among the $x_i$'s for simplicity.) Informally, isotonic regression looks for $\beta_1, \dots, \beta_n \in \mathbb{R}$ such that the $\beta_i$'s approximate the $y_i$'s well while being monotonically non-decreasing. Formally, the $\beta_i$'s are the solution to the optimization problem

\begin{aligned} \text{minimize}_{\beta_1, \dots, \beta_n} \quad& \sum_{i=1}^n (y_i - \beta_i)^2 \\ \text{subject to} \quad& \beta_1 \leq \dots \leq \beta_n. \end{aligned}

(Note: There is a corresponding version of the problem for a monotonically non-increasing fit. The problem above is sometimes referred to as linear ordering isotonic regression, with "isotonic regression" reserved for a more general class of problems; see Reference 1 for these generalizations.)
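As a tiny worked example (the numbers here are my own, not from the post): take $n = 3$ with $y = (3, 1, 2)$. Setting $\beta_i = y_i$ violates the constraint, since $3 > 1$. The isotonic solution is $\beta = (2, 2, 2)$, with objective value

$(3-2)^2 + (1-2)^2 + (2-2)^2 = 2.$

Nudging any $\beta_i$ while respecting the ordering constraint can only increase the sum of squares.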

Isotonic regression is useful for enforcing a monotonic fit to the data. Sometimes you know (e.g. from domain knowledge) that the underlying trend is monotonic, but the output from your model is not. In that situation, you can use isotonic regression as a monotonic smoother for the data, or as a post-processing step that forces your model's predictions to be monotonic.

A commonly used algorithm for computing the isotonic regression solution is the pool-adjacent-violators algorithm (PAVA), which runs in linear time and linear memory. At a high level it works like this: go from left to right and set $\beta_i = y_i$. If doing so violates monotonicity (i.e. $\beta_i = y_i < \beta_{i-1}$), pool $\beta_i$ and $\beta_{i-1}$, replacing both with their average. This pooling may create a violation further back (the new $\beta_{i-1}$ may now be less than $\beta_{i-2}$): if so, keep pooling backwards, replacing each offending run of values with its (weighted) average, until monotonicity is restored.
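To make the description above concrete, here is a minimal Python sketch of PAVA (the post's own code is in R; the function name `pava` and the block representation are my choices, not the post's). Each pooled block carries its running mean and size, so repeated back-merges compute the correct weighted average:

```python
def pava(y):
    """Pool-adjacent-violators algorithm for a non-decreasing
    least-squares fit to y (equal weights). Returns the fitted values."""
    blocks = []  # each block is [mean, size]
    for v in y:
        blocks.append([float(v), 1])
        # Pool backwards while the last block's mean is below the
        # previous block's mean (a monotonicity violation).
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            n = n1 + n2
            blocks.append([(m1 * n1 + m2 * n2) / n, n])
    # Expand block means back out to one fitted value per data point.
    fit = []
    for m, n in blocks:
        fit.extend([m] * n)
    return fit
```

For example, `pava([3, 1, 2])` pools the first two values into their mean, finds no further violation, and returns `[2.0, 2.0, 2.0]`.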

The animation below works through an example of PAVA in action. (Essentially I wrote a homebrew version of PAVA but kept a record of the intermediate fits.) Note that there are a number of ways to implement PAVA: what you see below may not be the most efficient. Click here for the R code; you can amend the dataset there to generate your own version of the animation.

If you want to do isotonic regression in R, DON'T use my homebrew version; use the `gpava` function in the `isotone` package instead.

References:

1. Mair, P., Hornik, K., and de Leeuw, J. (2009). Isotone optimization in R: Pool-adjacent-violators algorithm (PAVA) and active set methods. *Journal of Statistical Software*, 32(5), 1–24.
