Generalized least squares (GLS) and weighted least squares (WLS)

Assume that we are in the standard regression setting where we have n observations, responses y = (y_1, \dots, y_n) \in \mathbb{R}^n, and p feature values X \in \mathbb{R}^{n \times p}, where x_{ij} denotes the value of the jth feature for the ith observation. Assume that X is fixed. In ordinary least squares (OLS), we assume that the true model is

\begin{aligned} y = X\beta + \varepsilon, \end{aligned}

where \mathbb{E}[\varepsilon] = 0 and \text{Cov}(\varepsilon) = \sigma^2 I for some \sigma > 0 (which need not be known to compute the estimate). The OLS estimate of \beta is

\begin{aligned} \hat{\beta}_{OLS} = (X^T X)^{-1} X^T y. \end{aligned}

Under the assumptions above, the Gauss-Markov theorem says that \hat{\beta}_{OLS} is the best linear unbiased estimator (BLUE) for \beta.
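As a quick numerical sketch (with simulated data and illustrative parameter values), the OLS formula above can be computed directly and checked against NumPy's built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3

# Simulated design matrix and true coefficients (illustrative values).
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(scale=0.1, size=n)

# OLS estimate: beta_hat = (X^T X)^{-1} X^T y.
# In practice, solve the normal equations rather than forming the inverse.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_ols, beta_lstsq))  # → True
```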

Generalized least squares (GLS)

In generalized least squares (GLS), instead of assuming that \text{Cov}(\varepsilon) = \sigma^2 I, we assume instead that \text{Cov}(\varepsilon) = V for some known, non-singular covariance matrix V \in \mathbb{R}^{n \times n}. We have

\begin{aligned} y &= X\beta + \varepsilon, &(1)\\  V^{-1/2}y &= V^{-1/2} X \beta + \varepsilon', &(2) \end{aligned}

where \text{Cov}(\varepsilon') = \text{Cov}(V^{-1/2}\varepsilon) = V^{-1/2} V V^{-1/2} = I. Applying the OLS formula to this last equation (2), we get the generalized least squares (GLS) estimate for \beta:

\begin{aligned} \hat{\beta}_{GLS} &= [(V^{-1/2} X)^T(V^{-1/2} X)]^{-1}(V^{-1/2} X)^T(V^{-1/2}y) \\  &= (X^TV^{-1}X)^{-1}X^TV^{-1}y.  \end{aligned}

Why not just apply the OLS formula to (1) directly? Because the Gauss-Markov assumptions hold for (2) but not for (1): the errors in (1) need not have covariance \sigma^2 I. It is for (2) that the theorem applies, and so we can conclude that \hat{\beta}_{GLS} is the best linear unbiased estimator (BLUE) for \beta in this setup.

Note that if we let W = V^{-1} denote the inverse covariance matrix, then the GLS solution has a slightly nicer form:

\begin{aligned} \hat{\beta}_{GLS} = (X^TWX)^{-1}X^TWy.  \end{aligned}
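Here is a small sketch of both routes to the GLS estimate: the closed form (X^T W X)^{-1} X^T W y, and OLS on the whitened data. The AR(1)-style covariance V below is just an illustrative choice; a Cholesky factor of V stands in for V^{1/2}.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 2

X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0])

# Known non-singular error covariance V; here an AR(1)-style
# structure (illustrative choice): V_ij = rho^|i - j|.
rho = 0.6
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Draw errors with Cov = V via a Cholesky factor L (so L L^T = V).
L = np.linalg.cholesky(V)
y = X @ beta + L @ rng.normal(size=n)

# Closed form: beta_hat = (X^T W X)^{-1} X^T W y with W = V^{-1}.
W = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Equivalently, whiten with L^{-1} (which plays the role of V^{-1/2},
# since Cov(L^{-1} eps) = I) and run OLS on the transformed data.
Xw = np.linalg.solve(L, X)
yw = np.linalg.solve(L, y)
beta_whitened, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
print(np.allclose(beta_gls, beta_whitened))  # → True
```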

Weighted least squares (WLS) Part 1

One can think of weighted least squares (WLS) as a special case of GLS. In WLS, we assume that the covariance matrix V for \varepsilon is diagonal. That is, the error term for each observation may have its own variance, but the errors are pairwise uncorrelated. If \text{Var}(\varepsilon_i) = \sigma_i^2 for all i = 1, \dots, n and W is the diagonal matrix such that w_{ii} = \frac{1}{\sigma_i^2}, then

\begin{aligned} \hat{\beta}_{WLS} = (X^TWX)^{-1}X^TWy.  \end{aligned}
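A minimal sketch of this diagonal-covariance case, with simulated heteroskedastic errors (the variances below are illustrative). Since W is diagonal, we can keep the weights as a 1-D array instead of forming an n x n matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 150, 2

X = rng.normal(size=(n, p))
beta = np.array([1.5, 0.5])

# Heteroskedastic errors: each observation has its own variance sigma_i^2.
sigma = rng.uniform(0.5, 2.0, size=n)
y = X @ beta + sigma * rng.normal(size=n)

# WLS with w_ii = 1 / sigma_i^2. Broadcasting the 1-D weight vector
# computes X^T W X and X^T W y without building diag(w) explicitly.
w = 1.0 / sigma**2
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
```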

Weighted least squares (WLS) Part 2

Another sense in which the least squares problem can be “weighted” is where the observations are assigned different weights. Instead of trying to solve

\begin{aligned} \text{minimize} \: \sum_{i=1}^n \left( y_i - \sum_{j=1}^p x_{ij}\beta_j \right)^2, \end{aligned}

we want to solve

\begin{aligned} \text{minimize} \: \sum_{i=1}^n w_i \left( y_i - \sum_{j=1}^p x_{ij}\beta_j \right)^2, \end{aligned}

where the w_i's are some known, non-negative "weights" that we place on the observations. (We might do so, for example, if we know that some observations are more trustworthy than others.) Note that

\begin{aligned} \sum_{i=1}^n w_i \left( y_i - \sum_{j=1}^p x_{ij}\beta_j \right)^2 = \sum_{i=1}^n \left( \sqrt{w_i} y_i - \sum_{j=1}^p \sqrt{w_i}x_{ij}\beta_j \right)^2. \end{aligned}

Thus, if W^{1/2} is a diagonal matrix with the ith diagonal entry being \sqrt{w_i}, solving the above is equivalent to performing OLS of W^{1/2} y on W^{1/2} X:

\begin{aligned} \hat{\beta}_{WLS} &= [(W^{1/2} X)^T (W^{1/2} X)]^{-1} (W^{1/2} X) ^T(W^{1/2} y) \\  &= (X^T W X)^{-1} X^TWy. \end{aligned}

This is the same expression that we had in WLS Part 1, except that the weights in W mean something different.
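The equivalence above can be sketched numerically: running OLS on the \sqrt{w_i}-scaled data gives the same estimate as the closed form (X^T W X)^{-1} X^T W y. The data and weights below are simulated, illustrative values.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 80, 2

X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.uniform(0.1, 3.0, size=n)  # known observation weights

# OLS of W^{1/2} y on W^{1/2} X solves the weighted objective.
sw = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)

# Same answer from the closed form (X^T W X)^{-1} X^T W y.
beta_closed = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
print(np.allclose(beta_scaled, beta_closed))  # → True
```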

