In optimization, if you can convert your problem into minimizing a convex function, you are usually done. That’s because for convex functions, local minima are guaranteed to be global minima. Hence, you can use any of several optimization algorithms (e.g. gradient descent) to reach a local minimum: you are guaranteed that your function is minimized there.

One of the common algorithms used to minimize convex functions is **coordinate descent**. Let’s say our function is a convex function f in variables x_1, ..., x_n. Coordinate descent cycles through the variables: for each variable x_i, we temporarily fix all other variables, and treat f as a function of just x_i. We then find the value of x_i which minimizes this single-variable function and replace our old x_i with this value. We cycle through the coordinates as many times as we need to reach convergence.
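As a rough illustration, here is a minimal Python sketch of the cycling procedure. The grid-based `argmin_1d` is a crude stand-in for an exact one-dimensional minimizer, and the quadratic test function at the bottom is just an illustration, not part of the example discussed later:

```python
def argmin_1d(g, lo=-10.0, hi=10.0, steps=2001):
    # Crude stand-in for an exact one-dimensional minimizer:
    # evaluate g on a grid and keep the best point.
    best_t, best_v = lo, g(lo)
    for k in range(1, steps):
        t = lo + (hi - lo) * k / (steps - 1)
        v = g(t)
        if v < best_v:
            best_t, best_v = t, v
    return best_t

def coordinate_descent(f, x0, num_cycles=50):
    # Cyclic coordinate descent: repeatedly minimize f in one
    # coordinate while all the others are held fixed.
    x = list(x0)
    for _ in range(num_cycles):
        for i in range(len(x)):
            x[i] = argmin_1d(lambda t: f(x[:i] + [t] + x[i + 1:]))
    return x

# Smooth convex example: f(x, y) = (x - 1)^2 + (y + 2)^2,
# minimized at (1, -2); here coordinate descent does find it.
f = lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2
print(coordinate_descent(f, [5.0, 5.0]))  # close to [1.0, -2.0]
```

On smooth convex functions like this one, each coordinate update can only decrease the function value, and the cycles converge to the global minimum.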

It turns out that coordinate descent can get stuck in places which are NOT global minima, even for simple convex functions. Here is an example: f(x, y) = max(x, y) + |x - y|. Below is a representation of its graph:

You can think of the function’s graph as a piece of paper which is folded away from you: on the half-plane x ≥ y the graph is the plane z = 2x - y, on the half-plane x ≤ y it is the plane z = 2y - x, and the two planes meet along the crease x = y. From the diagram it’s obviously convex. (Another way to show it’s convex: the first term, max(x, y), is the maximum of two convex functions, and hence is convex. The second term, |x - y|, is a convex function of an affine function, and hence is convex. The sum of two convex functions is convex.)
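The convexity claim can also be spot-checked numerically by testing Jensen’s inequality at random points, taking the example function to be f(x, y) = max(x, y) + |x - y|:

```python
import random

# Spot-check convexity of f(x, y) = max(x, y) + |x - y|:
# for random points u, v and weights t in [0, 1], convexity means
# f(t*u + (1-t)*v) <= t*f(u) + (1-t)*f(v).
def f(x, y):
    return max(x, y) + abs(x - y)

random.seed(0)
for _ in range(10000):
    ux, uy = random.uniform(-5, 5), random.uniform(-5, 5)
    vx, vy = random.uniform(-5, 5), random.uniform(-5, 5)
    t = random.random()
    lhs = f(t * ux + (1 - t) * vx, t * uy + (1 - t) * vy)
    rhs = t * f(ux, uy) + (1 - t) * f(vx, vy)
    assert lhs <= rhs + 1e-9  # small tolerance for float round-off
print("convexity check passed")
```

Of course, a random check is not a proof; the decomposition into a max term and an absolute-value term above is what actually establishes convexity.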

You can see that the global minimum should be at -∞: along the crease x = y the function equals f(t, t) = t, which decreases without bound. However, here are the coordinate descent updates:

- Fix y. The value of x which minimizes f(x, y) is x = y. (When x > y, decreasing x decreases both terms until we hit x = y. When x < y, increasing x doesn’t change the first term but reduces the second term until we hit x = y.)
- Fix x. Since f is symmetric in x and y, the argument above applies, meaning that y stays at its same value.

The coordinate descent algorithm then terminates at the point (y, y), where y is the starting value of the second coordinate, which is clearly not a global minimum!
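The stuck behavior is easy to simulate. Here is a small sketch, assuming the example function f(x, y) = max(x, y) + |x - y|, that hard-codes the coordinate-wise minimizers derived in the bullets above:

```python
# Coordinate descent on f(x, y) = max(x, y) + |x - y|.
# With the other coordinate fixed at a value c, the single-variable
# function is minimized at c itself (the argument in the bullets above),
# so each exact coordinate update just copies the other coordinate.
def f(x, y):
    return max(x, y) + abs(x - y)

x, y = 3.0, 1.0            # arbitrary starting point
for _ in range(100):       # cycle through the coordinates
    x = y                  # argmin over x with y fixed is x = y
    y = x                  # argmin over y with x fixed is y = x
print((x, y), f(x, y))     # stuck at (1.0, 1.0) with value 1.0
# ...yet f(t, t) = t decreases without bound as t -> -infinity.
```

No matter how many cycles we run, the iterate never leaves the crease point it lands on after the first update.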
