BFGS
In this previous post, we described how Quasi-Newton methods can be used to minimize a twice-differentiable function whose domain is all of
. BFGS is a popular quasi-Newton method. At each iteration, we take the following steps:
- Solve for
in
.
- Update
with
.
- Update
according to the equation
where and
.
Limited memory BFGS (L-BFGS)
The issue with BFGS is that we have to maintain the matrix
, which is too costly when
is large. The main idea behind limited memory BFGS (L-BFGS) is that we don’t really need
: what we really want is
. If we can compute
implicitly by maintaining a few
vectors, then we never have to deal with an
matrix.
Let be any
vector. We have
Thus, if we had an efficient way to compute , we have an efficient way to compute
involving just
vectors:
- Compute
.
- Compute
.
- Compute
efficiently.
- Compute
.
- Compute
.
But we do have an efficient way to computing : simply replace
with
and
with
above! By recursively unfolding Step 3, we get the quasi-Newton update for iteration
(computation of
):
.
- For
, set
, and
.
.
- For
, set
, and
.
- Return
.
Limited memory BFGS goes one step further: instead of doing 2 loops of length for the
th iteration, do 2 loops of length
instead, where
is usually set to 5 or 10. Here is the appropriately modified update:
.
- For
, set
, and
.
.
- For
, set
, and
.
- Return
.
A popular choice for is
.
References:
- Peña, J. (2016). Quasi-Newton Methods. (CMU Convex Optimization: Fall 2016, Lecture on Nov 2.)