# Horvitz–Thompson estimator

Let’s say we have a finite population of $N$ individuals, and we are interested in some trait that they have. Let $X_i$ denote the value of the trait for individual $i$. We don’t get to see all of the $X_i$’s: we only sample $n < N$ of them. With this sample of $n$ individuals, we may want to estimate the total $T = \sum_{i=1}^N X_i$ or the mean $\tau = \frac{1}{N}\sum_{i=1}^N X_i = T/N$.

Let’s add another wrinkle: we don’t know how the sample was obtained! (Maybe someone else gave it to us.) All we know is that the probability of individual $i$ being included in the sample was $\pi_i$. Can we still come up with reasonable estimates for $T$ and $\tau$?

It turns out that we can. In a 1952 paper, Daniel G. Horvitz and Donovan J. Thompson introduced what is now known as the Horvitz–Thompson estimator:

$\hat{T}_{HT} = \displaystyle\sum_{i=1}^n \frac{X_i}{\pi_i}.$

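Here the sampled individuals are labelled $1, \dots, n$. Note that the sum only goes over $n$ terms, but it is an estimate for a sum over $N$ terms. This estimator is performing inverse probability weighting: that is, we give each observation a weight which is the inverse of its probability of inclusion. As long as every $\pi_i > 0$, the Horvitz–Thompson estimator is unbiased for $T$. To see why, index over the whole population instead of the sample and let $I_i$ be the indicator that individual $i$ was sampled: the estimator can then be written as $\sum_{i=1}^N I_i X_i / \pi_i$, and since $\mathbb{E}[I_i] = \pi_i$, its expectation is $\sum_{i=1}^N X_i = T$. If $N$ is known, $\hat{T}_{HT} / N$ is likewise unbiased for $\tau$. The paper also worked out an expression for the estimator’s variance, but it’s substantially more complicated.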
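To make the inverse probability weighting concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the function names (`horvitz_thompson_total`, `poisson_sample`), the made-up population, and in particular the sampling design, where each individual is included independently with probability $\pi_i$ (so the sample size is random). The estimator itself only needs the sampled values and their inclusion probabilities.

```python
import random


def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz-Thompson estimate of the population total.

    values: trait values X_i of the sampled individuals.
    inclusion_probs: inclusion probabilities pi_i of those same individuals.
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    if any(p <= 0 or p > 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    # Weight each sampled value by the inverse of its inclusion probability.
    return sum(x / p for x, p in zip(values, inclusion_probs))


def poisson_sample(probs, rng):
    """Return indices of sampled units; unit i is included independently
    with probability probs[i]. This design is an assumption of the sketch."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


rng = random.Random(0)

# Hypothetical population: 200 individuals with trait values in [0, 10].
population = [rng.uniform(0, 10) for _ in range(200)]
# Individuals with larger values are more likely to be sampled.
probs = [0.1 + 0.8 * x / 10 for x in population]
true_total = sum(population)

# Repeat the sampling many times and average the estimates.
estimates = []
for _ in range(5000):
    idx = poisson_sample(probs, rng)
    sampled_values = [population[i] for i in idx]
    sampled_probs = [probs[i] for i in idx]
    estimates.append(horvitz_thompson_total(sampled_values, sampled_probs))

mean_estimate = sum(estimates) / len(estimates)
print(f"true total: {true_total:.1f}")
print(f"average HT estimate over 5000 samples: {mean_estimate:.1f}")
```

Individual estimates can land far from $T$, but their average over repeated samples should sit close to it, which is what unbiasedness means here. Note that the unequal inclusion probabilities would bias a naive estimate such as $N$ times the sample mean, since large values are over-represented in the sample.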

One potential application of this is if $X_i = 1$ for all individuals ($i = 1, \dots, N$). Here $T = N$, so we are just trying to estimate the size of the population. If we knew the inclusion probabilities (a BIG if), then we could just use the Horvitz–Thompson estimator directly: $\hat{T}_{HT} = \sum_{i=1}^n \frac{1}{\pi_i}.$ Usually we don’t know the $\pi_i$’s, since they depend on $N$ in some way! What we could do then is to get estimates $\hat{\pi}_i$ for the inclusion probabilities, then use the plug-in principle to get the estimator $\sum_{i=1}^n \frac{1}{\hat{\pi}_i}$.
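As a rough sketch of the population-size case, the snippet below sets $X_i = 1$ and sums the inverse inclusion probabilities of the sampled individuals. The function name `estimate_population_size`, the simulated probabilities, and the independent-inclusion sampling design are all assumptions made for illustration. The simulation also cheats by using the true $\pi_i$, which we would rarely have in practice.

```python
import random


def estimate_population_size(inclusion_probs):
    """Horvitz-Thompson estimate of N when X_i = 1 for everyone:
    the sum of 1 / pi_i over the sampled individuals."""
    return sum(1.0 / p for p in inclusion_probs)


rng = random.Random(1)

N = 500  # true population size (hidden from the analyst in practice)
# Hypothetical inclusion probabilities, one per individual.
true_probs = [rng.uniform(0.2, 0.6) for _ in range(N)]

size_estimates = []
for _ in range(4000):
    # Assumed design: each individual is included independently with prob. pi_i.
    sampled_probs = [p for p in true_probs if rng.random() < p]
    size_estimates.append(estimate_population_size(sampled_probs))

mean_size_estimate = sum(size_estimates) / len(size_estimates)
print(f"true N: {N}")
print(f"average estimate over 4000 samples: {mean_size_estimate:.1f}")
```

The plug-in version would call the same function with estimates $\hat{\pi}_i$ in place of the true probabilities. Because $1/\hat{\pi}_i$ is a nonlinear function of $\hat{\pi}_i$, the resulting estimator is not guaranteed to stay unbiased, and how well it works depends on how the $\hat{\pi}_i$ are obtained.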

Credits: I learnt of this estimator through a talk Kristian Lum gave recently at the Stanford statistics seminar.