Consider the unconstrained minimization problem

$$\min_x \; f(x),$$

where $f$ is twice differentiable and $\mathrm{dom}\, f = \mathbb{R}^n$. **Newton's method** is a second-order descent method for finding the minimum. Starting at some initial point $x_0$, at the $k$th iteration we update the candidate solution with the formula

$$x_{k+1} = x_k - t_k \left[ \nabla^2 f(x_k) \right]^{-1} \nabla f(x_k),$$

where $\nabla f$ and $\nabla^2 f$ are the gradient and Hessian of $f$ respectively, and $t_k$ is a step size chosen appropriately (e.g. with backtracking line search). Another way to write the update is

$$\begin{aligned} \nabla^2 f(x_k) \, p_k &= -\nabla f(x_k), \\ x_{k+1} &= x_k + t_k p_k. \end{aligned}$$

That is, we (i) compute the gradient and Hessian of $f$ at $x_k$, (ii) solve the first line for the direction $p_k$, then (iii) update $x_{k+1}$ according to the second line.
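The three steps above can be sketched in Python/NumPy as follows (the function and the quadratic test problem are my own illustration, not from the post):

```python
import numpy as np

def newton(f, grad, hess, x0, alpha=0.25, beta=0.5, tol=1e-8, max_iter=100):
    """Newton's method with backtracking line search
    (alpha, beta are the usual backtracking parameters)."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)                      # step (i): gradient at x_k
        if np.linalg.norm(g) < tol:
            break
        H = hess(x)                      # step (i): Hessian at x_k
        p = np.linalg.solve(H, -g)       # step (ii): solve H p = -g for p_k
        t = 1.0                          # backtracking line search for t_k
        while f(x + t * p) > f(x) + alpha * t * (g @ p):
            t *= beta
        x = x + t * p                    # step (iii): x_{k+1} = x_k + t_k p_k
    return x

# Toy example: a convex quadratic f(x) = 0.5 x^T A x - b^T x,
# whose minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = newton(lambda x: 0.5 * x @ A @ x - b @ x,
                lambda x: A @ x - b,
                lambda x: A,
                np.zeros(2))
```

On a quadratic the method converges in a single Newton step, since the second-order Taylor model is exact.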

A big drawback of Newton's method is that computing the Hessian can be very expensive. **Quasi-Newton methods** address this problem: update with

$$x_{k+1} = x_k - t_k B_k^{-1} \nabla f(x_k),$$

where $B_k$ is an approximation of $\nabla^2 f(x_k)$ which is easy to compute.

Quasi-Newton methods include SR1, DFP, BFGS and L-BFGS, which I hope to describe in future posts.

**The secant equation**

The question becomes how to generate the $B_k$'s. Writing the Taylor expansion of $\nabla f$ around $x_{k+1}$,

$$\nabla f(x_k) \approx \nabla f(x_{k+1}) + \nabla^2 f(x_{k+1}) (x_k - x_{k+1}).$$

It seems reasonable for $B_{k+1}$ to play the role of $\nabla^2 f(x_{k+1})$ above, i.e.

$$\nabla f(x_k) = \nabla f(x_{k+1}) + B_{k+1} (x_k - x_{k+1}),$$

or equivalently

$$B_{k+1} (x_{k+1} - x_k) = \nabla f(x_{k+1}) - \nabla f(x_k),$$

where we write $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$. This last equation is known as the **secant equation**, and is sometimes written as $B_{k+1} s_k = y_k$ for brevity. Note that it involves just $n$ equations in the $n^2$ entries of $B_{k+1}$, so there is a lot of freedom in how to choose the $B_k$'s. On top of satisfying the secant equation, we often require 3 other conditions:

- $B_{k+1}$ is symmetric.
- $B_{k+1}$ is “close” to $B_k$.
- $B_k$ being positive definite implies that $B_{k+1}$ is positive definite.
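As a quick sanity check of the secant equation: for a quadratic $f(x) = \tfrac{1}{2} x^T A x - b^T x$ the gradient is $Ax - b$, so $y_k = A s_k$ and the true Hessian $A$ satisfies $B s_k = y_k$ exactly. A small NumPy illustration (the matrix and points are arbitrary choices of mine):

```python
import numpy as np

# Quadratic f(x) = 0.5 x^T A x - b^T x, with gradient A x - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x_k  = np.array([0.0, 0.0])
x_k1 = np.array([1.0, -0.5])
s = x_k1 - x_k                 # s_k = x_{k+1} - x_k
y = grad(x_k1) - grad(x_k)     # y_k = grad f(x_{k+1}) - grad f(x_k)
assert np.allclose(A @ s, y)   # secant equation B s = y holds with B = A
```

For non-quadratic $f$ the relation is only approximate, which is why the secant equation is imposed as a condition on $B_{k+1}$ rather than derived.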

Each quasi-Newton iteration now looks like this:

- Solve for $p_k$ in $B_k p_k = -\nabla f(x_k)$.
- Update $x_{k+1} = x_k + t_k p_k$.
- Update $B_k$ to $B_{k+1}$ according to some procedure so that the secant equation (and other conditions) are satisfied.

The different quasi-Newton methods amount to different updates for the $B_k$'s.
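To make the loop concrete, here is a sketch using the SR1 update, $B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k}$, one standard choice from the list above (its derivation is not covered in this post, and the skip rule and line-search constants below are my own choices):

```python
import numpy as np

def quasi_newton_sr1(f, grad, x0, tol=1e-8, max_iter=200):
    """Quasi-Newton iteration with the SR1 update for B_k."""
    x = x0.astype(float)
    B = np.eye(len(x))                     # initial Hessian approximation
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(B, -g)         # solve B_k p_k = -grad f(x_k)
        if g @ p >= 0:                     # safeguard: fall back to steepest
            p = -g                         # descent if p_k is not a descent dir
        t = 1.0                            # backtracking line search for t_k
        while t > 1e-12 and f(x + t * p) > f(x) + 0.25 * t * (g @ p):
            t *= 0.5
        x_new = x + t * p
        s, y = x_new - x, grad(x_new) - g
        r = y - B @ s
        # SR1: symmetric rank-one correction enforcing B_{k+1} s = y.
        # Skip the update when the denominator is too small (standard safeguard).
        if abs(r @ s) > 1e-10 * np.linalg.norm(r) * np.linalg.norm(s):
            B = B + np.outer(r, r) / (r @ s)
        x = x_new
    return x

# Same toy quadratic as before: the minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_hat = quasi_newton_sr1(lambda x: 0.5 * x @ A @ x - b @ x,
                         lambda x: A @ x - b,
                         np.zeros(2))
```

Note that only gradient evaluations are needed: the Hessian is never formed, which is the point of the quasi-Newton approach.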

References:

- Peña, J. (2016). Quasi-Newton Methods. (CMU Convex Optimization: Fall 2016, Lecture on Nov 2.)
