Let's assume that we have some distribution $P$ and we want to estimate some quantity related to it (e.g. the mean of the distribution). A typical estimation strategy is to draw $n$ independent and identically distributed samples from $P$ (i.e. $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} P$), then plug those samples into an estimator $\hat{\theta}(X_1, \dots, X_n)$. The sample size is the number of samples we have: $n$.
But what happens if our samples are not i.i.d.? In an extreme example, imagine that we have samples $X_1, \dots, X_n$, but our sampling design forced the restriction $X_1 = X_2 = \dots = X_n$. While it looks like we have $n$ numbers, intuitively we know we really only have one sample.
Effective sample size makes this idea concrete. When $X_1, \dots, X_n$ are i.i.d. from $P$, the estimator $\hat{\theta}$ has a certain variance $\sigma_n^2$. If our samples come from some other sampling design, we will have a different expression for the variance. Effective sample size is the number $n_{\text{eff}}$ such that when we compute the estimator based on $n$ samples from our sampling design, the resulting variance is $\sigma_{n_{\text{eff}}}^2$, i.e. the variance we would have obtained from $n_{\text{eff}}$ i.i.d. samples.
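While different estimators could give rise to different effective sample sizes, estimation of the mean via the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$ is so common that when someone talks about effective sample size, it is almost always with respect to this estimator. In this setting, if the samples are i.i.d. with $\mathrm{Var}(X_i) = \sigma^2$, then

$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$

If our $X_i$'s are not drawn from the i.i.d. sampling design, then the effective sample size is the number $n_{\text{eff}}$ such that

$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n_{\text{eff}}}.$$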
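As a quick sanity check, this definition can be applied to the extreme design described earlier, where every sample is forced to equal the first one. Writing $\sigma^2$ for the variance of a single observation and $n_{\text{eff}}$ for the effective sample size (notation assumed here), the sample mean collapses to $X_1$:

$$X_1 = X_2 = \dots = X_n \;\Longrightarrow\; \bar{X} = X_1 \;\Longrightarrow\; \mathrm{Var}(\bar{X}) = \sigma^2 = \frac{\sigma^2}{1} \;\Longrightarrow\; n_{\text{eff}} = 1,$$

which matches the intuition that we really only have one sample.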
The notion of effective sample size comes up frequently in two contexts: when observations are correlated (e.g. time series data or Markov-chain Monte Carlo (MCMC) simulation) or weighted.
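When observations are correlated (with each $X_i$ still having variance $\sigma^2$), we have

$$\mathrm{Var}(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \mathrm{Cov}(X_i, X_j) = \frac{\sigma^2}{n_{\text{eff}}}, \qquad \text{so} \qquad n_{\text{eff}} = \frac{n^2 \sigma^2}{\sum_{i=1}^n \sum_{j=1}^n \mathrm{Cov}(X_i, X_j)}.$$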
Hence, we need expressions for the covariance to compute effective sample size.
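To make the correlated case concrete, here is a minimal sketch that assumes the full $n \times n$ covariance matrix is known and that every observation has the same variance $\sigma^2$. Under those assumptions, the effective sample size for the sample mean is $n^2 \sigma^2$ divided by the sum of all entries of the covariance matrix. The function name `ess_from_cov` and the example matrices are illustrative choices, not part of any library.

```python
import numpy as np

def ess_from_cov(cov):
    """Effective sample size of the sample mean, given a known covariance matrix.

    Assumes all observations share the same variance (constant diagonal),
    so sigma^2 is read off the first diagonal entry.
    """
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    sigma2 = cov[0, 0]
    # Var(mean) = cov.sum() / n^2, and we solve sigma2 / n_eff = Var(mean).
    return n**2 * sigma2 / cov.sum()

n = 5
iid = np.eye(n)              # independent observations with unit variance
identical = np.ones((n, n))  # X_1 = ... = X_n: every pair perfectly correlated

# Hypothetical AR(1)-style covariance: Cov(X_i, X_j) = rho^|i - j|
rho = 0.5
idx = np.arange(n)
ar1 = rho ** np.abs(idx[:, None] - idx[None, :])

print(ess_from_cov(iid))        # 5.0: i.i.d. samples keep the full sample size
print(ess_from_cov(identical))  # 1.0: the extreme example from above
print(ess_from_cov(ar1))        # between 1 and 5
```

In practice (e.g. for MCMC output) the covariances are not known and have to be estimated from the chain, which this sketch does not attempt.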
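When observations are weighted, the observations $X_1, \dots, X_n$ are i.i.d. but each observation $X_i$ is given an observation weight $w_i$. The weights are incorporated into the computation of the statistic (see this post for what we mean by this). These weights should affect our sample size. Taking an extreme example, if $w_1 = 1$ and $w_2 = \dots = w_n = 0$, our estimator is really using only $X_1$ and not the other samples, so the effective sample size should be 1.
In the context of estimating the mean, our weighted estimator is the weighted mean

$$\bar{X}_w = \frac{\sum_{i=1}^n w_i X_i}{\sum_{i=1}^n w_i}.$$

Since the observations are i.i.d. with $\mathrm{Var}(X_i) = \sigma^2$ (and treating the weights as fixed), we have

$$\mathrm{Var}(\bar{X}_w) = \frac{\sum_{i=1}^n w_i^2 \, \mathrm{Var}(X_i)}{\left( \sum_{i=1}^n w_i \right)^2} = \sigma^2 \, \frac{\sum_{i=1}^n w_i^2}{\left( \sum_{i=1}^n w_i \right)^2} = \frac{\sigma^2}{n_{\text{eff}}}, \qquad \text{so} \qquad n_{\text{eff}} = \frac{\left( \sum_{i=1}^n w_i \right)^2}{\sum_{i=1}^n w_i^2}.$$

With equal weights this gives $n_{\text{eff}} = n$, and in the extreme example above it gives $n_{\text{eff}} = 1$, as expected.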
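For weighted i.i.d. observations with fixed weights $w_1, \dots, w_n$, the variance of the weighted mean leads to an effective sample size of $\left( \sum_i w_i \right)^2 / \sum_i w_i^2$. The short sketch below computes this quantity. The helper name `ess_from_weights` and the example weight vectors are made up for illustration.

```python
import numpy as np

def ess_from_weights(weights):
    """Effective sample size (sum of weights)^2 / (sum of squared weights).

    Assumes fixed, non-negative weights that are not all zero.
    """
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

print(ess_from_weights([1, 1, 1, 1]))  # 4.0: equal weights recover n
print(ess_from_weights([1, 0, 0, 0]))  # 1.0: only the first sample is used
print(ess_from_weights([3, 1, 1, 1]))  # 3.0: 6^2 / (9 + 1 + 1 + 1)
```

The result does not change if all weights are multiplied by the same positive constant, so the weights do not need to be normalized first.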
- Wikipedia. Effective sample size.