This post derives the general formula for the covariance of the ordinary least squares (OLS) estimator.
Imagine we are in the regression setup with design matrix $X \in \mathbb{R}^{n \times p}$ and response $y \in \mathbb{R}^n$. Let $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i$th row of $X$ and the $i$th element of $y$ respectively. We can always make the following decomposition:

$$y_i = \mathbb{E}[y_i \mid x_i] + \varepsilon_i,$$

where $\mathbb{E}[\varepsilon_i \mid x_i] = 0$ and $\varepsilon_i$ is uncorrelated with any function of $x_i$. (This is Theorem 3.1.1 of Reference 1.)
The population regression function approximates $\mathbb{E}[y_i \mid x_i]$ as $x_i^\top \beta$, where $\beta \in \mathbb{R}^p$ solves the minimization problem

$$\beta = \underset{b \in \mathbb{R}^p}{\mathrm{argmin}} \; \mathbb{E}\left[ (y_i - x_i^\top b)^2 \right].$$

It can be shown that

$$\beta = \mathbb{E}[x_i x_i^\top]^{-1} \mathbb{E}[x_i y_i].$$

The ordinary least squares (OLS) estimator is a sample version of this and is given by

$$\hat{\beta} = \left( \sum_{i=1}^n x_i x_i^\top \right)^{-1} \sum_{i=1}^n x_i y_i = (X^\top X)^{-1} X^\top y.$$
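As a quick numerical sketch (the data, dimensions, and coefficients below are simulated purely for illustration, not from the post), the OLS estimator can be computed from the normal equations or with numpy's least-squares solver:

```python
import numpy as np

# Hypothetical simulated data: n observations, an intercept plus two features.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Normal equations: beta_hat solves (X^T X) beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent (and numerically preferable) least-squares solve.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
```

Solving the linear system (rather than forming $(X^\top X)^{-1}$ explicitly) is the standard numerically stable choice.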
We are often interested in estimating the covariance matrix of $\hat{\beta}$, as it is needed to construct standard errors for $\hat{\beta}$. Defining $e_i = y_i - x_i^\top \beta$ as the $i$th residual, we can rewrite the above as

$$\sqrt{n} \left( \hat{\beta} - \beta \right) = \left( \frac{1}{n} \sum_{i=1}^n x_i x_i^\top \right)^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^n x_i e_i.$$

By Slutsky's Theorem, the quantity above has the same asymptotic distribution as $\mathbb{E}[x_i x_i^\top]^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^n x_i e_i$. Since $\mathbb{E}[x_i e_i] = 0$*, the Central Limit Theorem tells us that $\frac{1}{\sqrt{n}} \sum_{i=1}^n x_i e_i$ is asymptotically normally distributed with mean zero and covariance $\mathbb{E}[x_i x_i^\top e_i^2]$ (the matrix of fourth moments). Thus,

$$\sqrt{n} \left( \hat{\beta} - \beta \right) \overset{d}{\to} \mathcal{N} \left( 0, \; \mathbb{E}[x_i x_i^\top]^{-1} \, \mathbb{E}[x_i x_i^\top e_i^2] \, \mathbb{E}[x_i x_i^\top]^{-1} \right).$$

We can use the diagonal elements of (a plug-in estimate of) this sandwich covariance to construct standard errors of $\hat{\beta}$. The standard errors computed in this way are called heteroskedasticity-consistent standard errors (or White standard errors, or Eicker-White standard errors). They are "robust" in the sense that they use few assumptions on the data and the model: only those needed to make the Central Limit Theorem go through.
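To make this concrete, here is a minimal numpy sketch (simulated heteroskedastic data, not from the post) that plugs the OLS residuals into the sample version of the sandwich covariance; this particular plug-in estimator is commonly called HC0:

```python
import numpy as np

# Simulated data with heteroskedastic errors: the noise scale grows with |x|.
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + (1.0 + np.abs(x)) * rng.normal(size=n)

# OLS fit and residuals.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
resid = y - X @ beta_hat

# Sandwich (HC0) covariance estimate:
#   (X^T X)^{-1} (sum_i e_i^2 x_i x_i^T) (X^T X)^{-1}
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc0 = XtX_inv @ meat @ XtX_inv
robust_se = np.sqrt(np.diag(cov_hc0))
print(robust_se)
```

The same estimate is available in standard libraries (e.g. statsmodels supports robust covariance types such as HC0), but the hand-rolled version above makes the bread-meat-bread structure of the formula explicit.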
*Note: We do NOT need to assume that $\mathbb{E}[y_i \mid x_i]$ is linear in $x_i$ in order to conclude that $\mathbb{E}[x_i e_i] = 0$. All we need are the relations $\beta = \mathbb{E}[x_i x_i^\top]^{-1} \mathbb{E}[x_i y_i]$ and $e_i = y_i - x_i^\top \beta$. The derivation is as follows:

$$\mathbb{E}[x_i e_i] = \mathbb{E}\left[ x_i (y_i - x_i^\top \beta) \right] = \mathbb{E}[x_i y_i] - \mathbb{E}[x_i x_i^\top] \beta = \mathbb{E}[x_i y_i] - \mathbb{E}[x_i y_i] = 0.$$
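The sample analog of this orthogonality can be checked numerically: OLS residuals are exactly orthogonal to the columns of $X$, even when the true conditional mean is nonlinear. A small sketch on simulated (made-up) data:

```python
import numpy as np

# Simulated data with a deliberately nonlinear conditional mean.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = np.sin(X[:, 1]) + X[:, 2] ** 2 + rng.normal(size=n)

# Fit OLS anyway; the linear model is misspecified.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Sample version of E[x_i e_i] = 0: X^T e vanishes up to floating-point error.
print(np.max(np.abs(X.T @ resid)))
```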
Special case of homoskedastic errors

If we assume that the errors are homoskedastic, i.e. $\mathbb{E}[e_i^2 \mid x_i] = \sigma^2$ for all $i$ for some constant $\sigma^2 > 0$, then by the tower property $\mathbb{E}[x_i x_i^\top e_i^2] = \sigma^2 \, \mathbb{E}[x_i x_i^\top]$, and the asymptotic covariance simplifies a little:

$$\mathbb{E}[x_i x_i^\top]^{-1} \, \mathbb{E}[x_i x_i^\top e_i^2] \, \mathbb{E}[x_i x_i^\top]^{-1} = \sigma^2 \, \mathbb{E}[x_i x_i^\top]^{-1}.$$
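Under homoskedasticity, the sandwich estimate and the classical estimate $\hat{\sigma}^2 (X^\top X)^{-1}$ should roughly agree in large samples. A simulation sketch (all numbers below are invented for illustration):

```python
import numpy as np

# Simulated data with homoskedastic errors.
rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
resid = y - X @ beta_hat

# Classical covariance estimate: sigma^2_hat (X^T X)^{-1}.
sigma2_hat = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2_hat * XtX_inv))

# Sandwich (robust) estimate for comparison.
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print(se_classical, se_robust)
```

With heteroskedastic errors the two sets of standard errors would diverge, which is precisely when the robust version earns its keep.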
References:
- Angrist, J. D., and Pischke, J.-S. (2009). Mostly harmless econometrics (Section 3.1.3).