Exact line search and backtracking line search

Assume we are trying to minimize some convex function $f: \mathbb{R}^n \mapsto \mathbb{R}$ which is differentiable. One way to do this is to use a descent method. A general descent method can be described as follows: Given a starting point $x$ in the domain of $f$, iterate over the following 3 steps until some stopping criterion is satisfied:

1. Choose a descent direction $\Delta x$.
2. Choose a step size $t > 0$.
3. Update $x \leftarrow x + t \Delta x$.

There are several ways to choose a descent direction (see this previous post on steepest descent for some examples). This post is about two ways to choose the step size. This step is called a line search because the step size determines where along the line $\{x + t \Delta x : t \geq 0 \}$ the next iterate will be.

Exact line search

Exact line search finds $t^\star$ such that $f(x + t \Delta x)$ attains its minimum value at $t = t^\star$, where we let $t$ range over non-negative values. The diagram below demonstrates this:

The curve is the value of the function along the ray $x + t \Delta x$ with $t \geq 0$. $t^*$ is the point where this function is minimized. It should be clear from the description that this choice will result in smaller values of the function in each iteration.

Since exact line search requires solving a minimization problem (albeit in one dimension), it can be expensively unless $t^*$ can be computed efficiently (or even analytically).

Backtracking line search

Backtracking line search uses a heuristic to quickly determine a step size which decreases the objective function value by a certain amount. The user needs to set two parameters, $0 < \alpha < 0.5$ and $0 < \beta < 1$. In each iteration, after the descent direction $\Delta x$ has been chosen, the step size is determine by the following:

1. Set $t = 1$.
2. While $f(x + t\Delta x) > f(x) + \alpha t \nabla f(x)^\top \Delta x$, set $t \leftarrow \beta t$.

In other words, we keep reducing the step size by the same multiplicative factor until the $f(x + t\Delta x) \leq f(x) + \alpha t \nabla f(x)^\top \Delta x$.

Let’s draw a picture to understand the intuition behind the inequality:

This is the same picture as that for exact line search, except we’ve added the line $y = f(x) + \alpha t \nabla f(x)^\top \Delta x$. This line always has negative slope: because $\Delta x$ is a descent direction, we must have $\nabla f(x)^\top \Delta x < 0$. The stopping condition, $f(x + t\Delta x) \leq f(x) + \alpha t \nabla f(x)^\top \Delta x$, holds for the blue segment $[0, t_0]$. The operation $t \leftarrow \beta t$ makes the step size smaller and smaller until $t$ falls in the blue segment, and by the way it’s set up, we know that the objective function will have a strictly smaller value.

How do we know that $t_0$ is always $> 0$ (i.e. the blue segment exists)? It’s because of how we chose the term $\alpha t \nabla f(x)^\top \Delta x$. It can be shown that the derivative of $f(x + t \Delta x)$ w.r.t. $t$ at the point $t = 0$ is $\nabla f(x)^\top \Delta x$. Hence, the line $y = f(x) + t \nabla f(x)^\top \Delta x$ is tangent to $y = f(x + t \nabla x)$ at $t = 0$. Multiplying the term $t \nabla f(x)^\top \Delta x$ by some $0 < \alpha < 1$ ensures that the stopping condition will hold for some non-trivial segment $[0, t_0]$.

(Note: The last diagram is taken from Reference 1, and the other diagrams are this base + my annotations.)

References:

1. Boyd, S., and Vandenberghe, L. (2004). Convex Optimization. Chapter 9.