Let’s say we have a convex, differentiable function $f$ that we want to minimize. In a previous post, we saw that the gradient descent step at time $t$ (from the point $x^{(t-1)}$) can be viewed as the minimizer of a quadratic approximation of $f$ at $x^{(t-1)}$:

$$x^{(t)} = \underset{z}{\text{argmin}} \left\{ f(x^{(t-1)}) + \nabla f(x^{(t-1)})^T (z - x^{(t-1)}) + \frac{1}{2\eta_t} \|z - x^{(t-1)}\|_2^2 \right\},$$

where $\eta_t > 0$ is the step size.

*What if $f$ is not differentiable at all points?* If $f$ has some special structure, we are not totally hosed. Suppose $f$ can be decomposed as the sum of two functions, $f = g + h$, where $g$ is convex and differentiable, while $h$ is convex but **not** differentiable. Instead of doing the gradient update as above (i.e. minimizing the quadratic approximation of $f$ around $x^{(t-1)}$), we can try to minimize the sum of $h$ and the quadratic approximation of $g$ around $x^{(t-1)}$:

$$x^{(t)} = \underset{z}{\text{argmin}} \left\{ g(x^{(t-1)}) + \nabla g(x^{(t-1)})^T (z - x^{(t-1)}) + \frac{1}{2\eta_t} \|z - x^{(t-1)}\|_2^2 + h(z) \right\}.$$

This update is known as the *generalized gradient update*. (Why? This update encompasses a number of gradient descent algorithms, as we’ll see in the examples below.)

Let’s try to solve this new minimization problem. To do that, for a convex function $h$ and step size $\eta > 0$, we introduce the notion of a *proximal operator*, defined as

$$\text{prox}_{h, \eta}(x) = \underset{z}{\text{argmin}} \left\{ \frac{1}{2\eta} \|x - z\|_2^2 + h(z) \right\}.$$

With this notation (after completing the square in the objective above), the generalized gradient update can be written as

$$x^{(t)} = \text{prox}_{h, \eta_t}\left( x^{(t-1)} - \eta_t \nabla g(x^{(t-1)}) \right).$$

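To make the update concrete, here is a minimal NumPy-style sketch (my own illustration, not code from the post) of a single generalized gradient step. It assumes we are handed the gradient of $g$ and the proximal operator of $h$ as Python callables named `grad_g` and `prox_h`:

```python
def generalized_gradient_step(x, grad_g, prox_h, step):
    """One generalized (proximal) gradient update.

    x      : current iterate x^(t-1), a NumPy array
    grad_g : callable, grad_g(x) returns the gradient of the smooth part g
    prox_h : callable, prox_h(v, step) returns prox_{h, step}(v)
    step   : step size eta_t > 0
    """
    # Ordinary gradient step on g, followed by the proximal operator of h.
    return prox_h(x - step * grad_g(x), step)
```

Each of the three examples below corresponds to a particular choice of `prox_h`.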
*This reformulation is useful if the proximal operator can be computed cheaply.* We finish off this post with three examples where this is the case:

**Example 1:** $h(x) = 0$ for all $x$.

That is, $f = g$ itself is convex and differentiable. In this case, $\text{prox}_{h, \eta}(x) = x$ (the quadratic term is minimized by $z = x$), so $x^{(t)} = x^{(t-1)} - \eta_t \nabla g(x^{(t-1)})$, which is the usual gradient descent update.

**Example 2:** $h(x) = 0$ if $x \in C$, $h(x) = \infty$ otherwise, for some closed, convex set $C$.

Here, the proximal operator reduces to $\text{prox}_{h, \eta}(x) = \underset{z \in C}{\text{argmin}} \, \|x - z\|_2^2 = P_C(x)$, which is the usual Euclidean projection onto $C$. Hence, this case corresponds to **projected gradient descent**.
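As an illustration (the specific choice of $C$ is mine, not the post’s), if $C$ is the Euclidean ball of radius $r$, the projection has a simple closed form, and the resulting step is a projected gradient step:

```python
import numpy as np

def project_onto_ball(v, radius=1.0):
    """Euclidean projection onto C = {z : ||z||_2 <= radius}."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else (radius / norm) * v

def projected_gradient_step(x, grad_g, step, radius=1.0):
    # Gradient step on g, then project back onto the feasible set C.
    # (Note the projection does not depend on the step size.)
    return project_onto_ball(x - step * grad_g(x), radius)
```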

**Example 3:** $h(x) = \lambda \|x\|_1$ for some $\lambda \geq 0$.

Here, the proximal operator reduces to

$$\text{prox}_{h, \eta}(x) = \underset{z}{\text{argmin}} \left\{ \frac{1}{2\eta} \|x - z\|_2^2 + \lambda \|z\|_1 \right\} = \underset{z}{\text{argmin}} \left\{ \sum_{j} \left[ \frac{1}{2\eta} (x_j - z_j)^2 + \lambda |z_j| \right] \right\}.$$

The minimization problem can be solved independently for each coordinate $z_j$. The result is that $\left[\text{prox}_{h, \eta}(x)\right]_j = S_{\lambda \eta}(x_j)$, where $S_{\lambda \eta}$ is the soft-thresholding operator, $S_{\kappa}(u) = \operatorname{sign}(u) \max(|u| - \kappa, 0)$.
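In code, the soft-thresholding operator acts coordinate-wise and vectorizes directly; a minimal NumPy sketch:

```python
import numpy as np

def soft_threshold(v, threshold):
    """Apply S_threshold to each coordinate: shrink toward zero, zeroing small entries."""
    return np.sign(v) * np.maximum(np.abs(v) - threshold, 0.0)
```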

When applied to the LASSO minimization problem (i.e. $g(\beta) = \frac{1}{2} \|y - X\beta\|_2^2$ and $h(\beta) = \lambda \|\beta\|_1$), the proximal gradient update becomes

$$\beta^{(t)} = S_{\lambda \eta_t}\left( \beta^{(t-1)} + \eta_t X^T \left( y - X \beta^{(t-1)} \right) \right).$$

This method of solving the LASSO is sometimes called the *Iterative Soft-Thresholding Algorithm (ISTA)*.
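Putting the pieces together, here is a minimal ISTA sketch for the LASSO objective above (again my own illustration, not code from the references). The fixed step size $1 / \|X\|_2^2$, i.e. the inverse of the largest eigenvalue of $X^T X$, is a common choice, though step size selection is not discussed in this post.

```python
import numpy as np

def ista_lasso(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - X beta||_2^2 + lam * ||beta||_1 by iterative soft-thresholding."""
    _, p = X.shape
    beta = np.zeros(p)
    # A common fixed step size: 1 / lambda_max(X^T X) = 1 / ||X||_2^2.
    step = 1.0 / np.linalg.norm(X, ord=2) ** 2
    for _ in range(n_iter):
        # Gradient step on the smooth part g(beta) = 0.5 * ||y - X beta||_2^2 ...
        z = beta + step * (X.T @ (y - X @ beta))
        # ... followed by soft-thresholding at lam * step (the prox of lam * ||.||_1).
        beta = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return beta
```

For example, with an $n \times p$ design matrix `X` and response vector `y`, `ista_lasso(X, y, lam=0.1)` returns a (typically sparse) coefficient vector.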

References:

- Gordon, G. & Tibshirani, R. Generalized gradient descent.
- Hastie, T., Tibshirani, R., & Wainwright, M. Statistical learning with sparsity (Section 5.3).