Consider the unconstrained minimization problem

$$\min_x f(x),$$

where $f$ is twice differentiable and $x \in \mathbb{R}^n$. Newton's method is a second-order descent method for finding the minimum. Starting at some initial point $x_0$, at the $k$th iteration we update the candidate solution with the formula

$$x_{k+1} = x_k - t_k \left[ \nabla^2 f(x_k) \right]^{-1} \nabla f(x_k),$$

where $\nabla f(x_k)$ and $\nabla^2 f(x_k)$ are the gradient and Hessian of $f$ respectively, and $t_k$ is a step size chosen appropriately (e.g. with backtracking line search). Another way to write the update is

$$\nabla^2 f(x_k) \, \Delta x_k = -\nabla f(x_k),$$
$$x_{k+1} = x_k + t_k \Delta x_k.$$
That is, we (i) compute the gradient and Hessian of $f$ at $x_k$, (ii) solve the linear system for $\Delta x_k$, then (iii) update $x_{k+1}$ according to the second line.
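The three steps above can be sketched in code. Below is my own minimal Python illustration (not from any particular reference): a damped Newton iteration with backtracking line search, applied to a simple smooth test function.

```python
import numpy as np

def newton(f, grad, hess, x0, tol=1e-8, max_iter=100, alpha=0.25, beta=0.5):
    """Damped Newton's method with backtracking line search (a sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)                   # (i) gradient and Hessian at x_k
        if np.linalg.norm(g) < tol:
            break
        H = hess(x)
        dx = np.linalg.solve(H, -g)   # (ii) solve H dx = -g (never invert H)
        t = 1.0                       # backtracking line search for t_k
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx                # (iii) x_{k+1} = x_k + t_k dx
    return x

# Example: f(x) = x_1^4 + x_2^2, minimized at the origin.
f = lambda x: x[0]**4 + x[1]**2
grad = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
hess = lambda x: np.diag([12 * x[0]**2, 2.0])
x_star = newton(f, grad, hess, np.array([1.0, 1.0]))
```

Note that step (ii) solves the linear system rather than forming the inverse Hessian, which is both cheaper and numerically safer.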
A big drawback of Newton's method is that computing the Hessian can be very expensive. Quasi-Newton methods address this problem: update with

$$x_{k+1} = x_k - t_k B_k^{-1} \nabla f(x_k),$$

where $B_k$ is an approximation of the Hessian $\nabla^2 f(x_k)$ which is easy to compute.
Quasi-Newton methods include SR1, DFP, BFGS and L-BFGS, which I hope to describe in future posts.
The secant equation
The question becomes how to generate the $B_k$'s. Writing the Taylor expansion of $\nabla f$ around $x_{k+1}$,

$$\nabla f(x_k) \approx \nabla f(x_{k+1}) + \nabla^2 f(x_{k+1}) (x_k - x_{k+1}).$$

It seems reasonable for $B_{k+1}$ to play the role of $\nabla^2 f(x_{k+1})$ above, i.e.

$$B_{k+1} (x_{k+1} - x_k) = \nabla f(x_{k+1}) - \nabla f(x_k),$$

or equivalently $B_{k+1} s_k = y_k$, where $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$. This last equation is known as the secant equation, and is usually written in the brief form $B_{k+1} s_k = y_k$. Note that it involves only $n$ equations in the $n^2$ unknown entries of $B_{k+1}$, so there is a lot of freedom in how to choose the $B_k$'s. On top of satisfying the secant equation, we often require 3 other conditions:
- $B_{k+1}$ is symmetric.
- $B_{k+1}$ is "close" to $B_k$.
- $B_k$ being positive definite implies that $B_{k+1}$ is positive definite.
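To make these conditions concrete, here is a quick numerical check using the BFGS update as one specific update rule satisfying them (the matrix $B$ and the pair $(s, y)$ below are hypothetical random data, with $y$ constructed so that the curvature condition $y^\top s > 0$ holds):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A symmetric positive definite B_k (stand-in for the current approximation).
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)

s = rng.standard_normal(n)                # s_k = x_{k+1} - x_k
y = B @ s + 0.1 * rng.standard_normal(n)  # y_k = grad f(x_{k+1}) - grad f(x_k)
assert y @ s > 0                          # curvature condition

# BFGS update: B_{k+1} = B - (B s s^T B)/(s^T B s) + (y y^T)/(y^T s).
Bs = B @ s
B_next = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

print(np.allclose(B_next @ s, y))              # secant equation: True
print(np.allclose(B_next, B_next.T))           # symmetry: True
print(np.all(np.linalg.eigvalsh(B_next) > 0))  # positive definite: True
```

The secant and symmetry properties follow directly from the algebra of the update; positive definiteness additionally relies on the curvature condition $y_k^\top s_k > 0$.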
Each quasi-Newton iteration now looks like this:
- Solve for $\Delta x_k$ in $B_k \Delta x_k = -\nabla f(x_k)$.
- Update $x_{k+1} = x_k + t_k \Delta x_k$.
- Update $B_k$ to $B_{k+1}$ according to some procedure so that the secant equation (and other conditions) are satisfied.
The different quasi-Newton methods amount to different updates for $B_{k+1}$.
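Putting the pieces together, here is a sketch of a complete quasi-Newton loop (my own illustrative Python, assuming the BFGS update as the choice of procedure in the last step):

```python
import numpy as np

def quasi_newton(f, grad, x0, tol=1e-8, max_iter=500, alpha=1e-4, beta=0.5):
    """Generic quasi-Newton loop with the BFGS update for B_k (a sketch)."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                # B_0: crude initial Hessian approximation
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        dx = np.linalg.solve(B, -g)   # solve B_k dx = -grad f(x_k)
        t = 1.0                       # backtracking line search for t_k
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        s = t * dx                    # s_k = x_{k+1} - x_k
        y = grad(x + s) - g           # y_k = gradient difference
        x = x + s
        if y @ s > 1e-12:             # only update when curvature holds
            Bs = B @ s
            B += np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
    return x

# Example: the Rosenbrock function, minimized at (1, 1).
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
    200 * (x[1] - x[0]**2),
])
x_star = quasi_newton(f, grad, np.array([-1.2, 1.0]))
```

Note that only gradients of $f$ are ever evaluated: the exact Hessian is never formed, which is the whole point of the quasi-Newton idea.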