Quasi-Newton methods in optimization

Consider the unconstrained minimization problem

\begin{aligned} \min_x f(x) \end{aligned}

where f : \mathbb{R}^n \to \mathbb{R} is twice differentiable and \text{dom}(f) = \mathbb{R}^n. Newton’s method is a second-order descent method for finding the minimum. Starting at some initial point, at the kth iteration we update the candidate solution with the formula

x_{k+1} = x_k - t_k [\nabla^2 f(x_k)]^{-1} \nabla f(x_k),

where \nabla f and \nabla^2 f are the gradient and Hessian of f respectively, and t_k is a step size chosen appropriately (e.g. with backtracking line search). Another way to write the update is

\begin{aligned} \nabla^2 f(x_k) p_k &= - \nabla f(x_k), \\  x_{k+1} &= x_k + t_k p_k. \end{aligned}

That is, we (i) compute the gradient and Hessian of f at x_k, (ii) solve for p_k, then (iii) update x_{k+1} according to the second line.
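These three steps can be sketched in a few lines of code. (The quadratic-plus-quartic test function below is my own toy example, not from any particular source; a practical implementation would choose t_k by line search rather than fixing t = 1.)

```python
import numpy as np

def newton_step(grad, hess, x, t=1.0):
    """One damped Newton step: solve Hessian * p = -gradient, then move to x + t*p."""
    g = grad(x)                       # (i) gradient at x_k
    H = hess(x)                       # (i) Hessian at x_k
    p = np.linalg.solve(H, -g)        # (ii) solve for p_k (never form the inverse explicitly)
    return x + t * p                  # (iii) update x_{k+1}

# Toy example (an assumption for illustration): f(x) = x_0^4 + x_1^2, minimized at the origin.
grad = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
hess = lambda x: np.array([[12 * x[0]**2, 0.0], [0.0, 2.0]])

x = np.array([1.0, 1.0])
for _ in range(20):
    x = newton_step(grad, hess, x)    # fixed t = 1 here for simplicity
# x is now close to the minimizer at the origin.
```

Note that we solve the linear system rather than inverting the Hessian, which is both cheaper and more numerically stable.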

A major drawback of Newton’s method is that computing the Hessian can be very expensive. Quasi-Newton methods address this problem: update x_{k+1} with

\begin{aligned} B_k p_k &= - \nabla f(x_k), \\  x_{k+1} &= x_k + t_k p_k, \end{aligned}

where B_k is an approximation of \nabla^2 f(x_k) which is easy to compute.

Quasi-Newton methods include SR1, DFP, BFGS and L-BFGS, which I hope to describe in future posts.

The secant equation

The question becomes how to generate the B_k's. Writing the first-order Taylor expansion of \nabla f around x_k and evaluating it at x_{k+1},

\begin{aligned} \nabla f(x_{k+1}) \approx \nabla f(x_k) + \nabla^2 f(x_k) (x_{k+1} - x_k).  \end{aligned}

It seems reasonable for B_{k+1} to play the role of \nabla^2 f(x_k) above, i.e.

\begin{aligned} \nabla f(x_{k+1}) = \nabla f(x_k) + B_{k+1} (x_{k+1} - x_k),  \end{aligned}

or equivalently

\begin{aligned} B_{k+1} s_k = y_k, \end{aligned}

where s_k = x_{k+1} - x_k and y_k = \nabla f(x_{k+1}) - \nabla f(x_k). This last equation is known as the secant equation, and is sometimes written as B^+ s = y for brevity. Note that it provides only n equations for the n \times n entries of B_{k+1}, so there is a lot of freedom in how to choose the B_k's. On top of satisfying the secant equation, we often require three other conditions:

  1. B^+ is symmetric.
  2. B^+ is “close” to B.
  3. B being positive definite implies that B^+ is positive definite.
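As a concrete illustration, the symmetric rank-one (SR1) update mentioned earlier produces a symmetric B^+ that satisfies the secant equation. A quick numerical check (with made-up random vectors) confirms both properties:

```python
import numpy as np

def sr1_update(B, s, y):
    """SR1 update: B+ = B + r r^T / (r^T s), where r = y - B s."""
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)

rng = np.random.default_rng(0)
B = np.eye(3)                # current approximation (symmetric)
s = rng.standard_normal(3)   # plays the role of s_k = x_{k+1} - x_k
y = rng.standard_normal(3)   # plays the role of y_k = grad f(x_{k+1}) - grad f(x_k)

B_plus = sr1_update(B, s, y)
# B_plus is symmetric and satisfies the secant equation B_plus @ s == y.
```

Note that SR1, unlike BFGS, does not guarantee condition 3 (preservation of positive definiteness); the rank-one correction term can be indefinite.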

Each quasi-Newton iteration now looks like this:

  1. Solve for p_k in B_k p_k = - \nabla f(x_k).
  2. Update x with x_{k+1} = x_k + t_k p_k.
  3. Update B according to some procedure so that the secant equation and the other conditions are satisfied.

The different quasi-Newton methods amount to different updates for B.
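To see how the three steps fit together, here is a minimal sketch of the full loop using the BFGS update rule on a small convex quadratic. (The matrix A, the fixed step size, and the iteration count are my own choices for illustration; a practical implementation would pick t_k by line search.)

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS update: symmetric, satisfies B+ s = y, and preserves
    positive definiteness whenever y^T s > 0."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# Convex quadratic f(x) = 0.5 x^T A x - b^T x, minimized at x* = A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x = np.zeros(2)
B = np.eye(2)                          # initial Hessian approximation
for _ in range(50):
    p = np.linalg.solve(B, -grad(x))   # 1. solve B_k p_k = -grad f(x_k)
    x_new = x + 0.5 * p                # 2. update x (fixed t_k = 0.5 for simplicity)
    s, y = x_new - x, grad(x_new) - grad(x)
    if y @ s > 1e-12:                  # standard safeguard: only update when y^T s > 0
        B = bfgs_update(B, s, y)       # 3. update B
    x = x_new
# x approaches the true minimizer np.linalg.solve(A, b).
```

The safeguard on y^T s is what keeps B positive definite from one iteration to the next, so the linear solve in step 1 always yields a descent direction.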

