# Generalized least squares (GLS) and weighted least squares (WLS)

Assume that we are in the standard regression setting where we have $n$ observations, responses $y = (y_1, \dots, y_n) \in \mathbb{R}^n$, and $p$ feature values $X \in \mathbb{R}^{n \times p}$, where $x_{ij}$ denotes the value of the $j$th feature for the $i$th observation. Assume that $X$ is fixed. In ordinary least squares (OLS), we assume that the true model is

\begin{aligned} y = X\beta + \varepsilon, \end{aligned}

where $\mathbb{E}[\varepsilon] = 0$ and $\text{Cov}(\varepsilon) = \sigma^2 I$ for some $\sigma > 0$. The OLS estimate of $\beta$ is

\begin{aligned} \hat{\beta}_{OLS} = (X^T X)^{-1} X^T y. \end{aligned}

Under the assumptions above, the Gauss-Markov theorem says that $\hat{\beta}_{OLS}$ is the best linear unbiased estimator (BLUE) for $\beta$.
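The OLS formula above can be sketched numerically. This is a minimal illustration with made-up data (the matrix sizes and random seed are arbitrary choices, not from the text); note that in practice one solves the normal equations rather than forming $(X^T X)^{-1}$ explicitly.

```python
import numpy as np

# Hypothetical small example: n = 5 observations, p = 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + 0.1 * rng.normal(size=5)

# OLS estimate: solve (X^T X) beta = X^T y instead of inverting X^T X.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```
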

## Generalized least squares (GLS)

In generalized least squares (GLS), instead of assuming that $\text{Cov}(\varepsilon) = \sigma^2 I$, we assume that $\text{Cov}(\varepsilon) = V$ for some known, positive definite covariance matrix $V \in \mathbb{R}^{n \times n}$ (positive definiteness guarantees that $V^{-1/2}$ exists). We have

\begin{aligned} y &= X\beta + \varepsilon, &(1)\\ V^{-1/2}y &= V^{-1/2} X \beta + \varepsilon', &(2) \end{aligned}

where $\varepsilon' = V^{-1/2}\varepsilon$, so that $\text{Cov}(\varepsilon') = V^{-1/2}\,\text{Cov}(\varepsilon)\,V^{-1/2} = V^{-1/2} V V^{-1/2} = I$. Applying the OLS formula to equation $(2)$, we get the generalized least squares (GLS) estimate for $\beta$:

\begin{aligned} \hat{\beta}_{GLS} &= [(V^{-1/2} X)^T(V^{-1/2} X)]^{-1}(V^{-1/2} X)^T(V^{-1/2}y) \\ &= (X^TV^{-1}X)^{-1}X^TV^{-1}y. \end{aligned}

Why not just apply the OLS formula to $(1)$ directly? Because the assumptions for the Gauss-Markov theorem hold for $(2)$ but not for $(1)$: applying OLS to $(2)$ lets us conclude that $\hat{\beta}_{GLS}$ is the best linear unbiased estimator (BLUE) for $\beta$ in this setup, while the plain OLS estimator from $(1)$ remains unbiased but is in general no longer best.

Note that if we let $W = V^{-1}$ denote the inverse covariance matrix, then the GLS solution has a slightly nicer form:

\begin{aligned} \hat{\beta}_{GLS} = (X^TWX)^{-1}X^TWy. \end{aligned}
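The two forms of the GLS estimate above can be checked against each other numerically. This is a sketch with synthetic data (the construction of $V$ is an arbitrary way to get a positive definite matrix, not from the text): the $W = V^{-1}$ formula and the whitening-then-OLS route should agree.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# A known positive definite covariance matrix V (A A^T + n I is PD by construction).
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)
W = np.linalg.inv(V)

# GLS estimate in the W form: (X^T W X)^{-1} X^T W y.
beta_gls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Equivalent route: whiten with the symmetric square root V^{-1/2}, then run OLS.
evals, evecs = np.linalg.eigh(V)
V_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
Xw, yw = V_inv_sqrt @ X, V_inv_sqrt @ y
beta_whitened = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
```
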

## Weighted least squares (WLS): Part 1

One can think of weighted least squares (WLS) as a special case of GLS. In WLS, we assume that the covariance matrix $V$ of $\varepsilon$ is diagonal. That is, each observation's error term may have its own variance, but the errors are pairwise uncorrelated. If $\text{Var}(\varepsilon_i) = \sigma_i^2$ for all $i = 1, \dots, n$ and $W$ is the diagonal matrix such that $w_{ii} = \frac{1}{\sigma_i^2}$, then

\begin{aligned} \hat{\beta}_{WLS} = (X^TWX)^{-1}X^TWy. \end{aligned}
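Because $W$ is diagonal here, it never needs to be formed as an $n \times n$ matrix: $X^T W X$ and $X^T W y$ reduce to row-wise scalings. A minimal sketch with made-up variances (the numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 2
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Per-observation variances sigma_i^2; the WLS weights are their reciprocals.
sigma2 = rng.uniform(0.5, 2.0, size=n)
w = 1.0 / sigma2

# Diagonal W as row-wise scaling:
# X^T W X = X^T (w[:, None] * X)  and  X^T W y = X^T (w * y).
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```
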

## Weighted least squares (WLS): Part 2

Another sense in which the least squares problem can be “weighted” is where the observations are assigned different weights. Instead of trying to solve

\begin{aligned} \text{minimize} \: \sum_{i=1}^n \left( y_i - \sum_{j=1}^p x_{ij}\beta_j \right)^2, \end{aligned}

we want to solve

\begin{aligned} \text{minimize} \: \sum_{i=1}^n w_i \left( y_i - \sum_{j=1}^p x_{ij}\beta_j \right)^2, \end{aligned}

where the $w_i$'s are some known, non-negative "weights" that we place on the observations. (We might do so, for example, if we know that some observations are more trustworthy than others.) Note that

\begin{aligned} \sum_{i=1}^n w_i \left( y_i - \sum_{j=1}^p x_{ij}\beta_j \right)^2 = \sum_{i=1}^n \left( \sqrt{w_i} y_i - \sum_{j=1}^p \sqrt{w_i}x_{ij}\beta_j \right)^2. \end{aligned}

Thus, if $W^{1/2}$ is a diagonal matrix with the $i$th diagonal entry being $\sqrt{w_i}$, solving the above is equivalent to performing OLS of $W^{1/2} y$ on $W^{1/2} X$:

\begin{aligned} \hat{\beta}_{WLS} &= [(W^{1/2} X)^T (W^{1/2} X)]^{-1} (W^{1/2} X) ^T(W^{1/2} y) \\ &= (X^T W X)^{-1} X^TWy. \end{aligned}

This is the same expression that we had in WLS Part 1, except that the weights in $W$ mean something different: there they were inverse error variances, while here they are chosen by the analyst to reflect each observation's importance.
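The equivalence derived above can be verified numerically: the direct formula $(X^T W X)^{-1} X^T W y$ should match an OLS fit of $\sqrt{w_i}\, y_i$ on $\sqrt{w_i}\, x_{ij}$. This is a sketch with arbitrary synthetic weights, not data from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 8, 2
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.uniform(0.2, 3.0, size=n)  # arbitrary known observation weights

# Direct WLS solution: (X^T W X)^{-1} X^T W y, with diagonal W as row scaling.
beta_direct = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# Equivalent: OLS of sqrt(w) * y on sqrt(w) * X.
sw = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
```
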