What is isotonic regression?

Isotonic regression is a method for obtaining a monotonic fit for 1-dimensional data. Let's say we have data (x_1, y_1), \dots, (x_n, y_n) \in \mathbb{R}^2 such that x_1 < x_2 < \dots < x_n. (We assume no ties among the x_i's for simplicity.) Informally, isotonic regression looks for \beta_1, \dots, \beta_n \in \mathbb{R} such that the \beta_i's approximate the y_i's well while being monotonically non-decreasing. Formally, the \beta_i's are the solution to the optimization problem

\begin{aligned} \text{minimize}_{\beta_1, \dots, \beta_n} \quad& \sum_{i=1}^n (y_i - \beta_i)^2 \\ \text{subject to} \quad& \beta_1 \leq \dots \leq \beta_n. \end{aligned}
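As a small worked example (my own illustration, not from the references): take y = (1, 3, 2). The unconstrained minimizer \beta = y violates \beta_2 \leq \beta_3, so the constrained solution pools the last two values to their mean, giving \beta = (1, 2.5, 2.5) with objective value (3 - 2.5)^2 + (2 - 2.5)^2 = 0.5; no feasible \beta does better.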

(Note: There is an analogous formulation for a monotonically non-increasing fit. The problem above is sometimes called linear ordering isotonic regression, with "isotonic regression" reserved for more general versions of the problem; see Reference 1 for those.)

Isotonic regression is useful for enforcing a monotonic fit to data. You might know that the underlying trend is monotonic even though the output from your model is not. In that situation, you can use isotonic regression either as a smoother for the data or as a post-processing step that forces your model's predictions to be monotonic.

A commonly used algorithm for computing the isotonic regression solution is the pool-adjacent-violators algorithm (PAVA), which runs in linear time and linear memory. At a high level, it works like this: go from left to right and set \beta_i = y_i. If doing so violates monotonicity (i.e. \beta_i = y_i < \beta_{i-1}), pool \beta_i and \beta_{i-1}, replacing both with their mean. This pooling may create a violation further back (the new pooled value may be less than \beta_{i-2}): if that happens, pool again, replacing the whole block with its average, and keep backing up until monotonicity is restored.
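The pooling logic above can be sketched in a few lines of Python (a minimal homebrew sketch for illustration, not the R implementation this post links to):

```python
def pava(y):
    """Pool-adjacent-violators: nondecreasing least-squares fit to y.

    Each pooled block is stored as a (sum, count) pair; adjacent blocks
    are merged whenever their means would violate monotonicity, which
    replaces the whole block with its average.
    """
    blocks = []  # list of (block_sum, block_size)
    for value in y:
        blocks.append((value, 1))
        # Back up and merge while the last block's mean dips below the
        # previous block's mean.
        while len(blocks) > 1 and (
            blocks[-1][0] / blocks[-1][1] < blocks[-2][0] / blocks[-2][1]
        ):
            s1, n1 = blocks.pop()
            s2, n2 = blocks.pop()
            blocks.append((s1 + s2, n1 + n2))
    # Expand each block back into per-point fitted values.
    fit = []
    for s, n in blocks:
        fit.extend([s / n] * n)
    return fit

print(pava([1.0, 3.0, 2.0]))  # pools the last two points to their mean
```

Running this on y = (1, 3, 2) pools the last two values, returning [1.0, 2.5, 2.5]; already-monotone input is returned unchanged.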

The animation below works through an example of PAVA in action. (Essentially, I wrote a homebrew version of PAVA but kept a record of the intermediate fits.) Note that there are a number of ways to implement PAVA: what you see below may not be the most efficient. Click here for the R code; you can amend the dataset there to generate your own version of the animation below.

If you want to do isotonic regression in R, DON'T use my homebrew version; use the gpava function in the isotone package instead.

References:

  1. de Leeuw, J., Hornik, K., and Mair, P. (2009). Isotone optimization in R: Pool-adjacent-violators algorithm (PAVA) and active set methods. Journal of Statistical Software.

4 thoughts on "What is isotonic regression?"

  1. Dear Editor
    Please can you detail the steps of applying isotonic regression using R code?
    Best regards
    Salwa


  2. Hi Kjytay,
    The animation did a really good job of presenting isotonic regression. Thank you for this.
    I am interested in reading the first reference (stat wiki), but that link appears dead. Can you please update with a working link?


    • Hmm.. it looks like stat wiki has closed down. I didn’t keep a copy… If you are interested in learning more, I would recommend the second reference, as well as the works cited in its intro section.

      (Update: I’ve removed the stat wiki link.)

