Effective sample size for Markov Chain Monte Carlo

In this previous post, we introduced the notion of effective sample size. In the context of estimating a mean, if we draw X_1, \dots, X_n in an independent and identically distributed (i.i.d.) manner from a distribution with variance \sigma^2, the sample mean has variance \sigma^2 / n. When the X_i's are correlated (but still share the common variance \sigma^2), the effective sample size n_{eff} is defined by the equation

\begin{aligned}\dfrac{\sigma^2}{n_{eff}} &= \text{Var} \left( \dfrac{X_1 + \dots + X_n}{n} \right) \\  &= \sum_{i=1}^n \dfrac{\text{Var}(X_i)}{n^2} + 2 \sum_{1 \leq i < j \leq n} \dfrac{\text{Cov}(X_i, X_j)}{n^2}, \\  \dfrac{\sigma^2}{n_{eff}} &= \dfrac{\sigma^2}{n} + \dfrac{2 \sigma^2}{n^2} \sum_{1 \leq i < j \leq n} \text{Cor}(X_i, X_j). \qquad -(1) \end{aligned}
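To make (1) concrete, here is a minimal sketch that computes n_{eff} directly from a correlation matrix. It assumes, as (1) does, that every X_i has the same variance, so \sigma^2 cancels. The function names and the equicorrelated test matrix are made up for illustration.

```python
def n_eff_from_corr(corr):
    """Effective sample size from equation (1), given an n x n correlation matrix.

    Assumes every X_i has the same variance sigma^2, so sigma^2 cancels.
    """
    n = len(corr)
    # Sum of Cor(X_i, X_j) over pairs i < j (strict upper triangle).
    pair_sum = sum(corr[i][j] for i in range(n) for j in range(i + 1, n))
    # From (1): 1 / n_eff = 1 / n + (2 / n^2) * pair_sum.
    return 1.0 / (1.0 / n + 2.0 * pair_sum / n**2)


def equicorrelated(n, r):
    """Hypothetical helper: correlation matrix with r in every off-diagonal entry."""
    return [[1.0 if i == j else r for j in range(n)] for i in range(n)]


print(n_eff_from_corr(equicorrelated(10, 0.0)))  # independent draws
print(n_eff_from_corr(equicorrelated(10, 1.0)))  # perfectly correlated draws
print(n_eff_from_corr(equicorrelated(10, 0.5)))  # somewhere in between
```

Independent draws should give back n_{eff} = n, while perfectly correlated draws should collapse to n_{eff} = 1, which matches the intuition that n copies of the same draw carry the information of a single sample.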

In the Markov Chain Monte Carlo (MCMC) setting, the effective sample size is defined as

\begin{aligned} ESS = \dfrac{n}{1 + 2 \sum_{i=1}^\infty \rho(i)} = \dfrac{n}{1 + 2 \sum_{i=1}^\infty \text{Cor}(X_t, X_{t+i})}. \qquad-(2) \end{aligned}
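As a quick sanity check of (2), suppose (purely as an illustrative assumption) that the chain behaves like a stationary AR(1) process with coefficient \phi, where |\phi| < 1, so that \rho(i) = \phi^i. Then the infinite sum is a geometric series:

\begin{aligned} \sum_{i=1}^\infty \rho(i) &= \sum_{i=1}^\infty \phi^i = \dfrac{\phi}{1 - \phi}, \\  ESS &= \dfrac{n}{1 + \frac{2\phi}{1 - \phi}} = n \cdot \dfrac{1 - \phi}{1 + \phi}. \end{aligned}

With \phi = 0 we recover ESS = n, and with \phi = 0.9 we get ESS = n / 19, so under this assumption a strongly autocorrelated chain of length n is worth only about 5% as many independent draws.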

How are these two formulas related? I couldn’t find a reference that explained why (2) is the correct formula for effective sample size in the MCMC context; any pointers would be great. (This blog post provides some context, but no derivation.)

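Here’s my line of thought for how we can get from (1) to (2). Assume the chain is stationary, so that the correlation between two draws depends only on the lag between them: \text{Cor}(X_i, X_j) = \rho(j - i). Among the pairs i < j, exactly n - i of them are at lag i, so from (1),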

\begin{aligned} \dfrac{\sigma^2}{n_{eff}} &= \dfrac{\sigma^2}{n} + \dfrac{2 \sigma^2}{n^2} \sum_{i=1}^{n-1} (n-i)\rho(i), \\  \dfrac{1}{n_{eff}} &= \dfrac{n + 2 \sum_{i=1}^{n-1} (n-i)\rho(i) }{n^2}, \\  n_{eff} &= \dfrac{n}{1 + 2 \sum_{i=1}^{n-1} \frac{n-i}{n}\rho(i) }. \end{aligned}

If we replace the sum’s final index n-1 with \infty and replace all the \dfrac{n-i}{n} terms with 1, we get the formula in (2).
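To see how much these two replacements change the answer, here is a small sketch comparing the finite-n formula with formula (2). The autocorrelation function \rho(i) = \phi^i (an AR(1)-style decay) and the value \phi = 0.5 are assumptions chosen for illustration, and the infinite sum in (2) is truncated at a large `max_lag`.

```python
def n_eff_exact(n, rho):
    """Finite-n formula: n / (1 + 2 * sum_{i=1}^{n-1} ((n - i) / n) * rho(i))."""
    weighted = sum((n - i) / n * rho(i) for i in range(1, n))
    return n / (1.0 + 2.0 * weighted)


def ess_formula_2(n, rho, max_lag=10_000):
    """Formula (2), with the infinite sum truncated at max_lag (an assumption
    that is harmless when rho(i) decays quickly)."""
    total = sum(rho(i) for i in range(1, max_lag + 1))
    return n / (1.0 + 2.0 * total)


phi = 0.5  # hypothetical AR(1)-style coefficient, not from the post


def rho(i):
    """Assumed autocorrelation at lag i."""
    return phi**i


for n in (10, 100, 1000):
    print(n, n_eff_exact(n, rho), ess_formula_2(n, rho))
```

For this choice of \rho, formula (2) should come out slightly below the finite-n value, with the relative gap shrinking as n grows.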

I think these two approximations can be pretty good in the MCMC context for a few reasons:

  1. We usually expect autocorrelations at very large lags to be close to zero. Thus, adding \rho(i) terms for i \geq n probably doesn’t make much of a difference to the denominator, especially when n is pretty large.
  2. For many MCMC settings, the autocorrelations are usually all positive, so including more \rho(i) terms can only make the denominator larger, which creates a downward bias, if any. If we had to have any bias, this is probably in the right direction as it gives us a more conservative estimate of effective sample size.
  3. When the autocorrelations are positive, replacing (n-i)/n by 1 also creates a downward bias (the right direction), if any. This approximation is very good for large values of n and small values of i. While the approximation is bad for large values of i, \rho(i) is likely to be very close to zero in this case, so replacing (n-i)/n by 1 is no big deal.
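The three points above can be checked numerically by splitting the gap between the denominator of (2) and the denominator of the finite-n formula into its two sources. This is only a sketch under an assumed autocorrelation function \rho(i) = \phi^i with a hypothetical \phi = 0.9. The function name is made up, and the infinite sum is truncated at `max_lag`.

```python
def denominator_gap(n, phi, max_lag=10_000):
    """Split the gap between the denominators of (2) and the finite-n formula.

    Returns (tail, weight):
      tail   = 2 * sum_{i >= n} rho(i)            (extending the sum to infinity)
      weight = 2 * sum_{i < n} (i / n) * rho(i)   (replacing (n - i) / n by 1)
    with the assumed rho(i) = phi**i and the infinite sum truncated at max_lag.
    """
    tail = 2.0 * sum(phi**i for i in range(n, max_lag + 1))
    weight = 2.0 * sum(i / n * phi**i for i in range(1, n))
    return tail, weight


for n in (10, 100, 1000):
    tail, weight = denominator_gap(n, phi=0.9)
    print(f"n={n:5d}  tail={tail:.6f}  weight={weight:.6f}")
```

With positive autocorrelations both pieces are nonnegative, so each approximation can only inflate the denominator and push the ESS down. For this \rho, the full denominator of (2) is 1 + 2 \cdot 0.9 / 0.1 = 19, so the two gap terms can be read against that scale.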
