Coordinate descent doesn’t always work for convex functions

In optimization, if you can convert your problem into minimizing a convex function, you are usually done. That’s because for convex functions, local minima are guaranteed to be global minima. Hence, you can use any of the several optimization algorithms (e.g. gradient descent) to reach a local minima: you are guaranteed to minimize your function there.

One of the common algorithms used to minimize convex functions is coordinate descent. Let’s say our function $f$ is a convex function in $d$ variables $x_1, \dots, x_d$. Coordinate descent cycles through the variables: for each variable $x_i$, we temporarily fix all other variables, and treat $f$ as a function in just $x_i$. We then find the value of $x_i$ which minimizes this single variable function and replace our old $x_i$ with this value. We cycle through the coordinates as many times as we need to reach convergence.

It turns out that coordinate descent can get stuck in places which are NOT global minima, even for simple convex functions. Here is an example: $f(x, y) = \max(x,y) + |x-y|$. Below is a representation of its graph: You can think of the function’s graph as a piece of paper which is folded away from you. From the diagram it’s obviously convex. (Another way to show it’s convex: the first term is the maximum of two convex functions, and hence is convex. The second term is a convex function of an affine function, and hence is convex. The sum of two convex functions is convex.)

You can see that the global minimum should be $-\infty$. However, here are the coordinate descent updates:

1. Fix $y$. The value of $x$ which minimizes $f$ is $x = y$. (When $x > y$, decreasing $x$ decreases both terms until we hit $x = y$. When $x < y$, increasing $x$ doesn’t change the first term but reduces the second term until we hit $x = y$.)
2. Fix $x$. Since $f$ is symmetric in $x$ and $y$, the argument above applies, meaning that $y$ stays at its same value.

The coordinate descent algorithm then terminates at the point $(x, x)$, which is clearly not a global minimum!