What is Moreau’s Decomposition Theorem?

Let f: \mathbb{R}^n \mapsto \mathbb{R} be a convex function. Moreau’s Decomposition Theorem is the following result:

Theorem (Moreau Decomposition). For all x \in \mathbb{R}^n,

x = \textbf{prox}_{f} (x) + \textbf{prox}_{f^*} (x),

where \textbf{prox}_{f} is the proximal operator for f and f^* is the convex conjugate of f.

Here is the proof: Let z = \textbf{prox}_{f} (x). Then,

\begin{aligned} z = \textbf{prox}_{f} (x) \quad &\Leftrightarrow \quad x - z \in \partial f(z) \\  &\Leftrightarrow \quad z \in \partial f^*(x-z) \\  &\Leftrightarrow \quad x - (x-z) \in \partial f^*(x-z) \\  &\Leftrightarrow \quad x - z = \textbf{prox}_{f^*} (x) \\  &\Leftrightarrow \quad x = \textbf{prox}_{f} (x) + \textbf{prox}_{f^*} (x). \end{aligned}

The second equivalence is a standard result relating convex conjugates and subdifferentials (see this post for the statement and proof), while the fourth equivalence is a property of the proximal operator (see the note at the end of this post).
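As a concrete sanity check (my own sketch, not part of the original post), take f = \|\cdot\|_1. Its prox is the soft-thresholding operator, its conjugate f^* is the indicator of the unit \ell_\infty ball, and so \textbf{prox}_{f^*} is Euclidean projection onto [-1, 1]^n; the decomposition can then be verified numerically:

```python
import numpy as np

def prox_l1(x, t=1.0):
    # Proximal operator of t * ||.||_1: soft-thresholding at level t.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_l1_conjugate(x):
    # The conjugate of ||.||_1 is the indicator of the unit l_inf ball,
    # so its prox is projection onto [-1, 1]^n.
    return np.clip(x, -1.0, 1.0)

rng = np.random.default_rng(0)
x = rng.normal(size=5)

# Moreau decomposition: x = prox_f(x) + prox_{f*}(x).
recon = prox_l1(x) + prox_l1_conjugate(x)
assert np.allclose(recon, x)
```

The identity is easy to see coordinate-wise here: where |x_i| \leq 1 the soft-threshold gives 0 and the projection gives x_i; where |x_i| > 1 they give x_i \mp 1 and \pm 1, which again sum to x_i.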

Extended Moreau Decomposition

The extended version of Moreau’s Decomposition Theorem involves a scaling factor \lambda > 0. It states that for all x,

\begin{aligned} x = \textbf{prox}_{\lambda f} (x) + \lambda \, \textbf{prox}_{\lambda^{-1} f^*} (x / \lambda). \end{aligned}

Proof: Applying Moreau decomposition to the function \lambda f gives

x = \textbf{prox}_{\lambda f} (x) + \textbf{prox}_{(\lambda f)^*} (x).

Using the definitions of the proximal operator and the convex conjugate,

\begin{aligned} \textbf{prox}_{(\lambda f)^*} (x) &= \underset{u}{\text{argmin}} \left\{ \frac{1}{2} \| x - u \|_2^2 + (\lambda f)^* (u) \right\} \\  &= \underset{u}{\text{argmin}} \left\{ \frac{1}{2} \| x - u \|_2^2 + \sup_v \left[ u^\top v - \lambda f(v) \right] \right\} \\  &= \underset{u}{\text{argmin}} \left\{ \frac{1}{2} \|(x/\lambda) - (u / \lambda) \|_2^2 + \frac{1}{\lambda} \sup_v \left[ (u / \lambda)^\top v - f(v) \right] \right\} \\  &= \underset{u}{\text{argmin}} \left\{ \frac{1}{2} \|(x/\lambda) - (u / \lambda) \|_2^2 + \frac{1}{\lambda} f^*( u / \lambda) \right\} \\  &= \lambda \, \underset{w}{\text{argmin}} \left\{ \frac{1}{2} \| (x / \lambda) - w \|_2^2 + \lambda^{-1}f^* (w) \right\} \\  &= \lambda \, \textbf{prox}_{\lambda^{-1} f^*} (x / \lambda). \end{aligned}

(In the third equality the objective has been divided by \lambda^2 > 0, which does not change the argmin. In the fifth equality we substitute w = u / \lambda: the minimizer in u is \lambda times the minimizer in w, which is where the leading factor of \lambda comes from.)
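The same f = \|\cdot\|_1 example verifies the extended identity numerically (again my own sketch): \textbf{prox}_{\lambda f} is soft-thresholding at level \lambda, and since f^* is an indicator function, \lambda^{-1} f^* = f^*, so its prox is still projection onto [-1, 1]^n. Note the factor \lambda multiplying the second term:

```python
import numpy as np

def prox_scaled_l1(x, lam):
    # prox of lam * ||.||_1: soft-thresholding at level lam.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_linf_indicator(x):
    # f* of ||.||_1 is the indicator of the unit l_inf ball; scaling an
    # indicator by 1/lam > 0 leaves it unchanged (it is 0 or +inf),
    # so this prox is still projection onto [-1, 1]^n.
    return np.clip(x, -1.0, 1.0)

rng = np.random.default_rng(1)
x = rng.normal(size=5)
lam = 0.7

# Extended Moreau decomposition:
# x = prox_{lam f}(x) + lam * prox_{lam^{-1} f*}(x / lam).
recon = prox_scaled_l1(x, lam) + lam * prox_linf_indicator(x / lam)
assert np.allclose(recon, x)
```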


  1. Gu, Q. (2016). “SYS 6003: Optimization. Lecture 25.”

Relations, resolvents and the proximal operator

Relations and resolvents

A relation F on \mathbb{R}^n is a subset of \mathbb{R}^n \times \mathbb{R}^n. We use the notation F(x) to mean the set F(x) = \{ y: (x, y) \in F \}. You can think of F as an operator that maps vectors x \in \mathbb{R}^n to sets F(x) \subseteq \mathbb{R}^n. (Along this line of thinking, functions are a special kind of relation where every vector is mapped to a set consisting of exactly one element.)

The inverse relation is defined as F^{-1} = \{ (y, x): (x, y) \in F \}.

For a relation R and some parameter \lambda \in \mathbb{R}, the resolvent of R is defined as the relation

S = (I + \lambda R)^{-1}.

In other words,

\begin{aligned} S &= \{ (y, x): (x, y) \in I + \lambda R \} \\ &= \{ (x + \lambda y, x): (x, y) \in R \}. \end{aligned}

Connection with the proximal operator

Let f: \mathbb{R}^n \mapsto \mathbb{R} be some convex function. Recall that the proximal operator of f is defined by

\begin{aligned} \textbf{prox}_f (x) = \underset{u \in \mathbb{R}^n}{\text{argmin}} \left\{ \frac{1}{2} \| x - u \|_2^2 + f(u) \right\}. \end{aligned}

It turns out that the resolvent of the subdifferential operator is the proximal operator. Here is the proof: Let z = (I + \lambda \partial f)^{-1}(x) for some \lambda > 0. By definition of the resolvent,

\begin{aligned} z = (I + \lambda \partial f)^{-1}(x) \quad &\Leftrightarrow \quad x \in z + \lambda \partial f(z) \\ &\Leftrightarrow \quad 0 \in \lambda \partial f(z) + (z - x) \\ &\Leftrightarrow \quad 0 \in \partial \left[ \lambda f(z) + \frac{1}{2} \| z - x \|_2^2 \right] \\ &\Leftrightarrow \quad z = \underset{u}{\text{argmin}} \; \lambda f(u) + \frac{1}{2} \| u - x \|_2^2 \\ &\Leftrightarrow \quad z = \textbf{prox}_{\lambda f} (x). \end{aligned}

(Note: Setting \lambda = 1, the chain of reasoning above gives z = \textbf{prox}_{f} (x) \; \Leftrightarrow x - z \in \partial f(z), which is useful to know.)
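For a concrete illustration of resolvent = prox (my own example, not from the original post), take f(u) = \frac{1}{2}\|u\|_2^2, so \partial f(u) = \{u\} and the resolvent (I + \lambda \partial f)^{-1} maps x to x / (1 + \lambda). A brute-force minimization of the prox objective recovers the same point:

```python
import numpy as np

# f(u) = 0.5 * ||u||^2, so (I + lam * grad f)(z) = (1 + lam) z and the
# resolvent of lam * grad f sends x to x / (1 + lam).
lam = 0.5
x = np.array([2.0, -3.0, 1.0])
resolvent = x / (1.0 + lam)

# Compare with prox_{lam f}(x), found by brute-force minimization of
# 0.5 * ||x - u||^2 + lam * f(u); the problem separates across coordinates.
grid = np.linspace(-5, 5, 100001)
prox = np.array([grid[np.argmin(0.5 * (xi - grid) ** 2 + lam * 0.5 * grid ** 2)]
                 for xi in x])

assert np.allclose(prox, resolvent, atol=1e-3)
```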


  1. Pilanci, M. (2022). “Monotone Operators.”

Convex conjugate functions, Fenchel’s inequality, subdifferentials and optimality

Convex conjugate functions and Fenchel’s inequality

Let f: \mathbb{R}^n \mapsto \mathbb{R} be some function. The convex conjugate of f (also known as the Fenchel transform) is the function f^*: \mathbb{R}^n \mapsto \mathbb{R} defined as

f^*(y) = \underset{x}{\sup} \left\{ y^\top x - f(x) \right\}.

Fenchel’s inequality is the following statement:

Theorem (Fenchel’s inequality). For any x, y \in \mathbb{R}^n,

f(x) + f^*(y) \geq x^\top y.

The proof of Fenchel’s inequality follows directly from the definition of the convex conjugate:

\begin{aligned} f(x) + f^*(y) &= f(x) + \underset{u}{\sup} \left\{ y^\top u - f(u) \right\}  \\  &\geq f(x) + y^\top x - f(x) \\  &= x^\top y. \end{aligned}
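A quick numerical illustration (my own example): for f(x) = \frac{1}{2}\|x\|_2^2 the conjugate is f^*(y) = \frac{1}{2}\|y\|_2^2 (the supremum of y^\top x - \frac{1}{2}\|x\|_2^2 is attained at x = y), so Fenchel's inequality reduces to \frac{1}{2}\|x\|_2^2 + \frac{1}{2}\|y\|_2^2 \geq x^\top y, i.e. \frac{1}{2}\|x - y\|_2^2 \geq 0:

```python
import numpy as np

def f(x):
    # f(x) = 0.5 * ||x||^2.
    return 0.5 * np.dot(x, x)

def f_conj(y):
    # Conjugate of f: sup_x [ y^T x - 0.5 ||x||^2 ] = 0.5 ||y||^2,
    # attained at x = y.
    return 0.5 * np.dot(y, y)

rng = np.random.default_rng(2)
for _ in range(100):
    x, y = rng.normal(size=3), rng.normal(size=3)
    assert f(x) + f_conj(y) >= np.dot(x, y)  # Fenchel's inequality

# Equality holds when y is the gradient of f at x, i.e. y = x.
x = rng.normal(size=3)
assert np.isclose(f(x) + f_conj(x), np.dot(x, x))
```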

A direct application of Fenchel’s inequality shows that the conjugate of the conjugate (the biconjugate), denoted f^{**}, always satisfies

f^{**} \leq f.

To see this: Fenchel’s inequality says f(x) \geq x^\top y - f^*(y) for all x, y. Taking a supremum over all y, the inequality becomes

f(x) \geq \underset{y}{\sup} \left\{ x^\top y - f^*(y) \right\} = f^{**}(x).

It turns out that if f is closed and convex, then we actually have f^{**} = f. The proof is a bit cumbersome and so I’m not providing it in this post. (For a proof, see Reference 1 or Slides 5.13-5.14 of Reference 2.) More generally, the Fenchel-Moreau theorem gives necessary and sufficient conditions for f^{**} = f.
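The biconjugate can also be explored numerically (my own sketch, approximating the suprema on a finite grid): for the closed convex f(x) = x^2 the biconjugate recovers f, while for a nonconvex f it only gives a convex lower bound (the convex envelope):

```python
import numpy as np

grid = np.linspace(-3, 3, 601)

def conjugate(f_vals):
    # Approximate conjugate on a grid: f*(y) = sup_x [ y * x - f(x) ].
    return np.array([np.max(y * grid - f_vals) for y in grid])

# Closed convex example: f(x) = x^2, whose conjugate is y^2 / 4.
f_convex = grid ** 2
f_biconj = conjugate(conjugate(f_convex))
# On the interior of the grid (where the suprema are attained inside
# [-3, 3]), the biconjugate recovers f.
mask = np.abs(grid) <= 1.0
assert np.allclose(f_biconj[mask], f_convex[mask], atol=1e-2)

# Nonconvex example: f** is only a lower bound on f.
f_nonconvex = np.cos(3 * grid)
f_nc_biconj = conjugate(conjugate(f_nonconvex))
assert np.all(f_nc_biconj <= f_nonconvex + 1e-9)
```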

Subdifferentials and optimality

A vector g \in \mathbb{R}^n is a subgradient of f at point x \in \mathbb{R}^n if

f(y) \geq f(x) + g^\top (y - x) for all y \in \mathbb{R}^n.

The subdifferential of f at point x, denoted \partial f(x), is the set of all the subgradients of f at point x.

Convex conjugates appear in the following theorem concerning subdifferentials:

Theorem. If f is closed and convex, then for any x, y \in \mathbb{R}^n,

\begin{aligned} y \in \partial f(x) \; \Leftrightarrow \; x \in \partial f^*(y) \; \Leftrightarrow \; f(x) + f^*(y) = x^\top y. \end{aligned}

Proof (adapted from Slide 5.15 of Reference 2 and Reference 3): First we show that y \in \partial f(x) \; \Rightarrow \; x \in \partial f^*(y). Using the definitions of subgradients and convex conjugates,

\begin{aligned} y \in \partial f(x) &\Rightarrow \; f(u) \geq f(x) + y^\top (u - x) \quad \text{for all } u \\  &\Rightarrow \; y^\top u - f(u) \leq y^\top x - f(x) \quad \text{for all } u \\  &\Rightarrow f^*(y) = y^\top x - f(x). \end{aligned}

Thus, for any v,

\begin{aligned} f^*(v) &= \underset{u}{\sup} \left\{ v^\top u - f(u) \right\} \\  &\geq v^\top x - f(x) \\  &= y^\top x - f(x) + x^\top (v - y) \\  &= f^*(y) + x^\top (v - y), \end{aligned}

which implies that x \in \partial f^*(y).

To get the reverse implication x \in \partial f^*(y) \; \Rightarrow \; y \in \partial f(x), apply the same argument with f^* in place of f and use the fact that f^{**} = f for closed convex f.

Next, we show that y \in \partial f(x) \; \Rightarrow \; f(x) + f^*(y) = x^\top y. Using the definitions of subgradients,

\begin{aligned} y \in \partial f(x) &\Rightarrow \; f(u) \geq f(x) + y^\top (u - x) \quad \text{for all } u \\  &\Rightarrow \; x^\top y \geq f(x) + (y^\top u - f(u)) \quad \text{for all } u. \end{aligned}

Taking supremum over all u, the inequality becomes x^\top y \geq f(x) + f^*(y). Combining this with Fenchel’s inequality gives us equality x^\top y = f(x) + f^*(y).

Conversely, suppose x^\top y = f(x) + f^*(y). Since f^*(y) \geq y^\top u - f(u) for all u by definition of the conjugate,

\begin{aligned} x^\top y &= f(x) + f^*(y) \\  &\geq f(x) + y^\top u - f(u) \quad \text{for all } u. \end{aligned}

Rearranging gives f(u) \geq f(x) + y^\top (u - x) for all u, which is precisely y \in \partial f(x).
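To see the theorem in action (my own example, not from the original post), take f(x) = |x| on \mathbb{R}. Then f^*(y) is the indicator of [-1, 1], and \partial f(x) = \{\text{sign}(x)\} for x \neq 0 and [-1, 1] at x = 0. Checking on a grid, y \in \partial f(x) holds exactly when f(x) + f^*(y) = xy:

```python
import numpy as np

def in_subdiff_abs(x, y):
    # Subdifferential of f(x) = |x|: {sign(x)} for x != 0, [-1, 1] at x = 0.
    return bool(np.isclose(y, np.sign(x))) if x != 0 else abs(y) <= 1

def fenchel_equality(x, y):
    # f*(y) = 0 if |y| <= 1, +inf otherwise, so the equality
    # f(x) + f*(y) = x * y requires |y| <= 1 and |x| = x * y.
    return abs(y) <= 1 and bool(np.isclose(abs(x), x * y))

# The two conditions agree at every grid point.
for x in np.linspace(-2, 2, 9):
    for y in np.linspace(-1.5, 1.5, 13):
        assert in_subdiff_abs(x, y) == fenchel_equality(x, y)
```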


  1. StackExchange. “How to prove the conjugate of the conjugate function is itself?”
  2. Vandenberghe, L. (2022). “5. Conjugate functions.”
  3. StackExchange. “Proof about Conjugate and subgradient.”