Soft-thresholding and the sgn function

The soft thresholding operator $S$ with parameter $\lambda \geq 0$ acts element-wise on $x \in \mathbb{R}^p$ such that for each $i = 1, \dots, p$, $S(x, \lambda)_i = \begin{cases} x_i - \lambda &\text{if } x_i > \lambda, \\ 0 &\text{if } -\lambda \leq x_i \leq \lambda, \\ x_i + \lambda &\text{if } x_i < -\lambda. \end{cases}$

In other words, for each element in $x$, the soft-thresholding operator brings $x_i$ closer to 0 by the largest amount $\leq \lambda$, without having $x_i$ change its sign.

As a quick consequence of the definition, for any constant $c > 0$ we have $S(cx, c\lambda) = c S(x, \lambda)$.

The soft-thresholding operator appears frequently in optimization problems involving the $\ell_1$ penalty. This is because the subgradient of the $\ell_1$ penalty involves the $\text{sgn}$ function, defined as $\text{sgn}(x) = \begin{cases} +1 &\text{if } x > 0, \\ [-1, 1] &\text{if } x = 0, \\ -1 &\text{if } x < 0. \end{cases}$

The lemma below makes the connection between soft-thresholding and the $\text{sgn}$ function explicit:

Lemma: Consider the equation $ax - b + c \text{sgn}(x) = 0,$

where $a > 0$ and $c \geq 0$. The solution to this equation is $x = \dfrac{S(b, c)}{a}$.

Proof: We consider the following cases:

Case 1: $x > 0$. The equation becomes $ax - b + c = 0$, or $x = (b - c) / a$. This is only valid if $b - c > 0$, or $b > c$.

Case 2: $x < 0$. The equation becomes $ax - b - c = 0$, or $x = (b + c) / a$. This is only valid if $b + c < 0$, or $b < -c$.

Case 3: $x = 0$. This solution is only valid if $a(0) - b + c(-1) \leq 0 \leq a(0) - b + c(1)$, or $-c \leq b \leq c$.

Putting it altogether, we see that the solution is equivalent to $x = \dfrac{S(b, c)}{a}$.