Quasi-Newton methods: SR1

Introduction

In a previous post, we described how quasi-Newton methods can be used to minimize a twice-differentiable function f whose domain is all of \mathbb{R}^n. At each iteration of the method, we take the following steps:

  1. Solve for p_k in B_k p_k = - \nabla f(x_k).
  2. Update x with x_{k+1} = x_k + t_k p_k.
  3. Update B according to some procedure.
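The three steps above can be sketched as a generic loop with the B-update left as a pluggable argument. Everything concrete here (the quadratic test function, the fixed step size t, the trivial identity "update", the function and variable names) is an illustrative assumption, not part of the post:

```python
import numpy as np

def quasi_newton(f_grad, x0, update_B, t=0.1, tol=1e-8, max_iter=300):
    """Generic quasi-Newton loop; the rule for updating B is passed in."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                 # initial Hessian approximation
    for _ in range(max_iter):
        g = f_grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(B, -g)     # step 1: solve B_k p_k = -grad f(x_k)
        x_new = x + t * p              # step 2: x_{k+1} = x_k + t_k p_k
        B = update_B(B, x_new - x, f_grad(x_new) - g)  # step 3: B update
        x = x_new
    return x

# Toy usage: minimize f(x) = (1/2) x^T A x; keeping B = I reduces the
# loop to gradient descent with a fixed step size.
A = np.diag([1.0, 10.0])
x_star = quasi_newton(lambda x: A @ x, np.array([1.0, 1.0]),
                      update_B=lambda B, s, y: B)
```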

Many methods update B so that it satisfies the secant equation:

\begin{aligned} B_{k+1} s_k = y_k, \end{aligned}

where s_k = x_{k+1} - x_k and y_k = \nabla f(x_{k+1}) - \nabla f(x_k). This is often written as B^+ s = y for brevity. Because this is a system of n equations in the n^2 unknown entries of B_{k+1}, it is underdetermined, and there are many ways to update B.

Symmetric rank-one update (SR1)

One way to update B easily is to assume that the next B is the current B plus a symmetric rank-one matrix, i.e.

B^+ = B + auu^\top

for some a \in \mathbb{R} and u \in \mathbb{R}^n. Plugging this update into the secant equation and rearranging, we get

a (u^\top s)\, u = y - Bs,

which implies that u must be a multiple of y - Bs. Plugging this into the equation and solving for a, we obtain the update

\begin{aligned} B^+ = B + \dfrac{(y - Bs)(y - Bs)^\top}{(y - Bs)^\top s}. \end{aligned}
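As a quick sanity check of this update (with made-up s and y, which is all that matters here since B^+ s = y must hold for any pair with (y - Bs)^\top s \neq 0):

```python
import numpy as np

# Hypothetical data, not from the post; any s, y with (y - Bs)^T s != 0 works.
B = np.eye(3)
s = np.array([1.0, -1.0, 2.0])
y = np.array([3.0, 0.0, 1.0])

r = y - B @ s                          # y - Bs
B_new = B + np.outer(r, r) / (r @ s)   # SR1 update

assert np.allclose(B_new @ s, y)       # secant equation B^+ s = y holds
assert np.allclose(B_new, B_new.T)     # the update preserves symmetry
```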

Recall from the quasi-Newton steps that we need to solve the equation B p = - \nabla f(x) for p, which means that we need the inverse H = B^{-1} to get p = - H \nabla f(x). Using the Sherman-Morrison-Woodbury formula, we can turn the update to B into a direct update for H:

\begin{aligned} H^+ = H + \dfrac{(s - Hy)(s - Hy)^\top}{(s - Hy)^\top y}. \end{aligned}
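A small numerical check (with made-up s and y, assumed only for illustration) confirms that this H update agrees with inverting the SR1-updated B directly, as the Sherman-Morrison-Woodbury formula guarantees:

```python
import numpy as np

# Made-up data for a sanity check; both denominators are nonzero here.
B = np.eye(4)
H = np.linalg.inv(B)                   # = I in this toy setting
s = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 0.0, 1.0, 1.0])

r = y - B @ s
B_new = B + np.outer(r, r) / (r @ s)   # SR1 update for B

q = s - H @ y
H_new = H + np.outer(q, q) / (q @ y)   # SR1 update for H

assert np.allclose(H_new @ B_new, np.eye(4))  # H^+ is exactly (B^+)^{-1}
assert np.allclose(H_new @ y, s)              # inverse secant equation
```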

This is known as the SR1 update. SR1 is simple to understand and implement, but it has two shortcomings. First, the denominator in the update for B, (y - Bs)^\top s, might be approximately zero, causing numerical issues; in practice, implementations simply skip the update on iterations where this denominator is too small. Second, the update does not preserve positive definiteness of B.
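The loss of positive definiteness is easy to see on a toy example (the numbers below are assumed for illustration; note y^\top s < 0 here, which cannot occur for a strictly convex f but can for a nonconvex one):

```python
import numpy as np

B = np.eye(2)                          # positive definite
s = np.array([1.0, 0.0])
y = np.array([-1.0, 0.0])              # negative curvature along s

r = y - B @ s                          # [-2, 0]
B_new = B + np.outer(r, r) / (r @ s)   # denominator (y - Bs)^T s = -2

print(np.linalg.eigvalsh(B_new))       # [-1.  1.] -> B_new is indefinite
```

The secant equation B^+ s = y still holds, but the updated matrix now has a negative eigenvalue, so the search direction it produces is no longer guaranteed to be a descent direction.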

References:

  1. Peña, J. (2016). Quasi-Newton Methods. (CMU Convex Optimization: Fall 2016, Lecture on Nov 2.)
