Assume that we are in the standard regression setting where we have $n$ observations, responses $y = (y_1, \dots, y_n) \in \mathbb{R}^n$, and feature values $X \in \mathbb{R}^{n \times p}$, where $x_{ij}$ denotes the value of the $j$th feature for the $i$th observation. Assume that $X$ is fixed. In *ordinary least squares (OLS)*, we assume that the true model is

$$y = X\beta + \epsilon,$$

where $\mathbb{E}[\epsilon] = 0$ and $\text{Cov}(\epsilon) = \sigma^2 I$ for some known $\sigma^2$. The OLS estimate of $\beta$ is

$$\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y.$$
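As a quick numerical illustration (a minimal sketch with made-up data, not from the original post), here is the OLS estimate computed in numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3

# Simulated fixed design X and true coefficients (illustrative values).
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# OLS estimate: solve the normal equations (X^T X) beta = X^T y.
# Solving the linear system is preferred to forming (X^T X)^{-1} explicitly.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_ols)
```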
Under the assumptions above, the *Gauss-Markov theorem* says that $\hat{\beta}_{OLS}$ is the **best linear unbiased estimator (BLUE)** for $\beta$.

**Generalized least squares (GLS)**

In generalized least squares (GLS), instead of assuming that $\text{Cov}(\epsilon) = \sigma^2 I$, we assume that $\text{Cov}(\epsilon) = \Sigma$ for some *known*, non-singular covariance matrix $\Sigma$. We have

$$\Sigma^{-1/2} y = \Sigma^{-1/2} X \beta + \Sigma^{-1/2} \epsilon,$$

where $\text{Cov}(\Sigma^{-1/2} \epsilon) = \Sigma^{-1/2} \Sigma \Sigma^{-1/2} = I$. Applying the OLS formula to this last equation, we get the *generalized least squares (GLS) estimate* for $\beta$:

$$\hat{\beta}_{GLS} = \left( X^T \Sigma^{-1} X \right)^{-1} X^T \Sigma^{-1} y.$$
*Why not just apply the OLS formula to $y = X\beta + \epsilon$ directly?* That is because the assumptions for the Gauss-Markov theorem hold for the transformed equation $\Sigma^{-1/2} y = \Sigma^{-1/2} X \beta + \Sigma^{-1/2} \epsilon$, and so we can conclude that $\hat{\beta}_{GLS}$ is the best linear unbiased estimator (BLUE) for $\beta$ in this setup.

Note that if we let $\Omega = \Sigma^{-1}$ denote the *inverse covariance matrix*, then the GLS solution has a slightly nicer form:

$$\hat{\beta}_{GLS} = \left( X^T \Omega X \right)^{-1} X^T \Omega y.$$
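To make the whitening argument concrete, here is a small sketch (my own example) checking that OLS on the transformed data matches the closed-form GLS estimate. For the transformation I use a Cholesky factor $\Sigma = LL^T$ rather than the symmetric square root $\Sigma^{-1/2}$; whitening by $L^{-1}$ works just as well, since $L^{-T} L^{-1} = \Sigma^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0])

# A known, non-singular covariance matrix (AR(1)-style, for illustration).
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
y = X @ beta_true + rng.multivariate_normal(np.zeros(n), Sigma)

# Direct GLS formula: (X^T Omega X)^{-1} X^T Omega y with Omega = Sigma^{-1}.
Omega = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Omega @ X, X.T @ Omega @ y)

# Whitening route: transform to L^{-1} y = L^{-1} X beta + L^{-1} eps, then OLS.
L = np.linalg.cholesky(Sigma)
Xw, yw = np.linalg.solve(L, X), np.linalg.solve(L, y)
beta_white = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)

assert np.allclose(beta_gls, beta_white)
```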
**Weighted least squares (WLS) Part 1**

One can think of *weighted least squares (WLS)* as a special case of GLS. In WLS, we assume that the covariance matrix $\Sigma$ for $\epsilon$ is diagonal. That is, the error term for each observation may have its own variance, but the error terms are pairwise uncorrelated. If $\text{Var}(\epsilon_i) = \sigma_i^2$ for all $i$, and $W$ is the diagonal matrix such that $W_{ii} = 1/\sigma_i^2$, then

$$\hat{\beta}_{WLS} = \left( X^T W X \right)^{-1} X^T W y.$$
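Since $W$ is diagonal, the WLS estimate amounts to weighting each observation by the reciprocal of its error variance. A short sketch (again with made-up data) under heteroskedastic errors:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 80, 2
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, 0.7])

# Each observation has its own error variance sigma_i^2.
sigma2 = rng.uniform(0.5, 4.0, size=n)
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2))

# W = diag(1 / sigma_i^2); use broadcasting instead of forming the full matrix.
w = 1.0 / sigma2
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
print(beta_wls)
```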
**Weighted least squares (WLS) Part 2**

Another sense in which the least squares problem can be “weighted” is when the observations are assigned different weights. Instead of trying to solve

$$\underset{\beta}{\text{minimize}} \quad \sum_{i=1}^n \left( y_i - x_i^T \beta \right)^2,$$

where $x_i^T$ denotes the $i$th row of $X$, we want to solve

$$\underset{\beta}{\text{minimize}} \quad \sum_{i=1}^n w_i \left( y_i - x_i^T \beta \right)^2,$$

where the $w_i$'s are some known “weights” that we place on the observations. (We might do so, for example, if we know that some observations are more trustworthy than others.) Note that

$$\sum_{i=1}^n w_i \left( y_i - x_i^T \beta \right)^2 = \sum_{i=1}^n \left( \sqrt{w_i}\, y_i - \sqrt{w_i}\, x_i^T \beta \right)^2.$$

Thus, if $W$ is a diagonal matrix with the $i$th diagonal entry being $w_i$, solving the above is equivalent to performing OLS of $W^{1/2} y$ on $W^{1/2} X$:

$$\hat{\beta} = \left( X^T W X \right)^{-1} X^T W y.$$

This is the same expression that we had in WLS Part 1, except that the weights in $W$ mean something different.
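The equivalence between the two routes is easy to verify numerically; below is a small sketch (illustrative weights, my own example) comparing the row-scaling route with the closed form:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 60, 3
X, y = rng.normal(size=(n, p)), rng.normal(size=n)

# Known, user-chosen observation weights w_i.
w = rng.uniform(0.1, 2.0, size=n)

# Route 1: scale each row by sqrt(w_i), then run OLS.
sw = np.sqrt(w)
Xs, ys = sw[:, None] * X, sw * y
beta_scaled = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

# Route 2: the closed form (X^T W X)^{-1} X^T W y.
beta_closed = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

assert np.allclose(beta_scaled, beta_closed)
```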
