Consider the unconstrained minimization problem

$$\min_x \; f(x),$$

where $f: \mathbb{R}^n \to \mathbb{R}$ is twice differentiable and $\mathbf{dom}\, f = \mathbb{R}^n$. Newton's method is a second-order descent method for finding the minimum. Starting at some initial point $x^{(0)}$, at the $k$th iteration we update the candidate solution with the formula

$$x^{(k+1)} = x^{(k)} - t_k \left[ \nabla^2 f(x^{(k)}) \right]^{-1} \nabla f(x^{(k)}),$$

where $\nabla f(x^{(k)})$ and $\nabla^2 f(x^{(k)})$ are the gradient and Hessian of $f$ at $x^{(k)}$ respectively, and $t_k$ is a step size chosen appropriately (e.g. with backtracking line search). Another way to write the update is

$$\begin{aligned} \nabla^2 f(x^{(k)})\, p_k &= -\nabla f(x^{(k)}), \\ x^{(k+1)} &= x^{(k)} + t_k p_k. \end{aligned}$$

That is, we (i) compute the gradient and Hessian of $f$ at $x^{(k)}$, (ii) solve the linear system for $p_k$, then (iii) update $x^{(k+1)}$ according to the second line.
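To make the update concrete, here is a minimal sketch of a single Newton step in Python with NumPy. The quadratic example at the bottom is just an illustration I chose, and the default step size of 1 stands in for a proper backtracking line search:

```python
import numpy as np

def newton_step(x, grad_f, hess_f, t=1.0):
    """One Newton iteration: solve the Newton system, then take a step.

    grad_f and hess_f return the gradient vector and Hessian matrix of f
    at x; t is the step size (chosen by backtracking line search in practice).
    """
    g = grad_f(x)                   # gradient at the current iterate
    H = hess_f(x)                   # Hessian at the current iterate
    p = np.linalg.solve(H, -g)      # Newton direction: solve H p = -g
    return x + t * p

# Example: f(x) = x1^2 + 5*x2^2, whose minimum is at the origin.
grad_f = lambda x: np.array([2 * x[0], 10 * x[1]])
hess_f = lambda x: np.array([[2.0, 0.0], [0.0, 10.0]])
x_new = newton_step(np.array([3.0, -2.0]), grad_f, hess_f)  # lands at (0, 0) in one step
```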
A big drawback of Newton's method is that computing the Hessian can be very expensive. Quasi-Newton methods address this problem: update with

$$x^{(k+1)} = x^{(k)} - t_k B_k^{-1} \nabla f(x^{(k)}),$$

where $B_k$ is an approximation of $\nabla^2 f(x^{(k)})$ which is easy to compute.
Quasi-Newton methods include SR1, DFP, BFGS and L-BFGS, which I hope to describe in future posts.
The secant equation
The question becomes how to generate the $B_k$'s. Writing the Taylor expansion of $\nabla f$ around $x^{(k+1)}$,

$$\nabla f(x^{(k)}) \approx \nabla f(x^{(k+1)}) + \nabla^2 f(x^{(k+1)}) \left( x^{(k)} - x^{(k+1)} \right).$$

It seems reasonable for $B_{k+1}$ to play the role of $\nabla^2 f(x^{(k+1)})$ above, i.e.

$$\nabla f(x^{(k)}) = \nabla f(x^{(k+1)}) + B_{k+1} \left( x^{(k)} - x^{(k+1)} \right),$$

or equivalently

$$B_{k+1} s_k = y_k,$$

where $s_k = x^{(k+1)} - x^{(k)}$ and $y_k = \nabla f(x^{(k+1)}) - \nabla f(x^{(k)})$. This last equation is known as the secant equation, and is sometimes written as $B^+ s = y$ for brevity. Note that this equation involves $n$ equations in $n^2$ unknowns (the entries of $B_{k+1}$), so there is a lot of freedom in how to choose the $B_k$'s. On top of satisfying the secant equation, we often require 3 other conditions:
- $B^+$ is symmetric.
- $B^+$ is “close” to $B$.
- $B$ being positive definite implies that $B^+$ is positive definite.
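As a quick sanity check of the secant equation: for a quadratic $f$ the Hessian is constant, and it satisfies $B s_k = y_k$ exactly. A minimal numerical illustration with NumPy (the particular matrix and points below are arbitrary choices for the example):

```python
import numpy as np

# For f(x) = 0.5 * x^T A x, grad f(x) = A x and the Hessian is A everywhere.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
grad_f = lambda x: A @ x

x_k  = np.array([1.0, 2.0])
x_k1 = np.array([0.5, -1.0])

s_k = x_k1 - x_k                    # s_k = x^(k+1) - x^(k)
y_k = grad_f(x_k1) - grad_f(x_k)    # y_k = grad f(x^(k+1)) - grad f(x^(k))

# The true Hessian satisfies the secant equation B s_k = y_k exactly.
print(np.allclose(A @ s_k, y_k))    # True
```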
Each quasi-Newton iteration now looks like this:
- Solve for $p_k$ in $B_k p_k = -\nabla f(x^{(k)})$.
- Update $x^{(k+1)} = x^{(k)} + t_k p_k$.
- Update $B_{k+1}$ according to some procedure so that the secant equation (and other conditions) are satisfied.
The different quasi-Newton methods amount to different updates for $B_k$.
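For example, here is a minimal sketch of the full iteration in Python using the SR1 update for $B_k$ (just one of the possible updates; the fixed step size and the test problem below are simplifications I chose for illustration rather than a production implementation):

```python
import numpy as np

def sr1_quasi_newton(grad_f, x0, t=0.1, n_iter=50):
    """Quasi-Newton iteration with the SR1 update for B_k.

    Uses a fixed step size t for simplicity; in practice t would be
    chosen with a line search.
    """
    x = x0.copy()
    B = np.eye(len(x0))               # initial Hessian approximation
    for _ in range(n_iter):
        g = grad_f(x)
        p = np.linalg.solve(B, -g)    # step 1: solve B_k p_k = -grad f(x^(k))
        x_new = x + t * p             # step 2: x^(k+1) = x^(k) + t_k p_k
        s = x_new - x
        y = grad_f(x_new) - grad_f(x)
        r = y - B @ s
        if abs(r @ s) > 1e-8:         # step 3: SR1 update, so B_{k+1} s_k = y_k holds
            B = B + np.outer(r, r) / (r @ s)
        x = x_new
    return x

# Example: minimize f(x) = x1^2 + 5*x2^2, whose minimum is at the origin.
grad_f = lambda x: np.array([2 * x[0], 10 * x[1]])
x_star = sr1_quasi_newton(grad_f, np.array([3.0, -2.0]))  # approaches (0, 0)
```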
References:
- Peña, J. (2016). Quasi-Newton Methods. (CMU Convex Optimization: Fall 2016, Lecture on Nov 2.)