Bounds/constraints on leverage in linear regression

In the previous post, we introduced the notion of leverage in linear regression. If we have a response vector y \in \mathbb{R}^n and a design matrix X \in \mathbb{R}^{n \times p} with full column rank (so that X^TX is invertible), the hat matrix is defined as H = X(X^TX)^{-1}X^T, and the leverage of data point i is the ith diagonal entry of H, which we denote by H_{ii}. It is so called because it measures the influence that y_i has on its own prediction \hat{y}_i: \text{Cov}(\hat{y}_i, y_i) = \sigma^2 H_{ii}. The higher the leverage, the more influence y_i has on its own prediction.
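
To make this concrete, here is a minimal numpy sketch (the design matrix and response are made up purely for illustration) that forms the hat matrix and reads off the leverages:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))   # toy design matrix with full column rank (made up for illustration)
y = rng.normal(size=n)        # toy response vector

# Hat matrix H = X (X^T X)^{-1} X^T; solve() avoids forming the inverse explicitly.
H = X @ np.linalg.solve(X.T @ X, X.T)

leverages = np.diag(H)        # H_ii, the leverage of data point i
y_hat = H @ y                 # fitted values: \hat{y} = H y
```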

It turns out that leverage must satisfy the following bounds:

0 \leq H_{ii} \leq 1 for all i = 1, \dots, n.

This is easy to prove using the following two facts:

  1. Note that H^2 = X(X^TX)^{-1}X^TX(X^TX)^{-1}X^T = X(X^TX)^{-1}X^T = H, i.e. H is idempotent.
  2. Note that H^T = [X(X^TX)^{-1}X^T]^T = X[(X^TX)^{-1}]^T X^T = X(X^TX)^{-1}X^T = H, since (X^TX)^{-1} is symmetric; i.e. H is symmetric.

Since H is idempotent, H_{ii} = (H^2)_{ii} = \displaystyle\sum_{j=1}^n H_{ij}H_{ji}; using the symmetry of H and separating out the j = i term, it follows that

H_{ii} = H_{ii}^2 + \displaystyle\sum_{j \neq i} H_{ij}^2.

  • Since the RHS is a sum of squares, H_{ii} \geq 0.
  • Since the second term on the RHS is a sum of squares, H_{ii} \geq H_{ii}^2, i.e. H_{ii}(1 - H_{ii}) \geq 0. Combined with H_{ii} \geq 0, this gives H_{ii} \leq 1.

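These bounds (and the two facts behind them) are easy to sanity-check numerically. A minimal sketch, again with a made-up full-rank design:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))             # any design matrix with full column rank
H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
h = np.diag(H)                           # leverages H_ii

assert np.allclose(H @ H, H)             # fact 1: H is idempotent
assert np.allclose(H, H.T)               # fact 2: H is symmetric
assert np.all((h >= 0) & (h <= 1))       # 0 <= H_ii <= 1 for every i
```
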
There is also a constraint on the sum of the leverages, which is easy to derive. By the cyclic property of the trace operator,

\begin{aligned} \sum_{i=1}^n H_{ii} &= \text{tr}(H) = \text{tr} [X(X^TX)^{-1}X^T] \\  &= \text{tr}[X^T X (X^T X)^{-1}] \\ &= \text{tr}(I_p) = p. \end{aligned}
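
This too is easy to verify numerically: the leverages of any full-column-rank design sum to the number of columns, so the average leverage is p/n. A quick sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 4
X = rng.normal(size=(n, p))              # made-up full-column-rank design
H = X @ np.linalg.solve(X.T @ X, X.T)

# Sum of leverages = tr(H) = p, so the average leverage is p / n.
assert np.isclose(np.diag(H).sum(), p)
print(np.diag(H).sum(), p / n)           # prints 4.0 (up to rounding) and 0.08
```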