Effective sample size for Markov Chain Monte Carlo

In a previous post, we introduced the notion of effective sample size. In the context of estimating a mean, if we draw $X_1, \dots, X_n$ in an independent and identically distributed (i.i.d.) manner from a distribution with variance $\sigma^2$, the sample mean has variance $\sigma^2 / n$. When there is correlation between the $X_i$'s, the effective sample size $n_{eff}$ is defined by the equation

\begin{aligned}\dfrac{\sigma^2}{n_{eff}} &= \text{Var} \left( \dfrac{X_1 + \dots + X_n}{n} \right) \\ &= \sum_{i=1}^n \dfrac{\text{Var}(X_i)}{n^2} + 2 \sum_{1 \leq i < j \leq n} \dfrac{\text{Cov}(X_i, X_j)}{n^2} \\ &= \dfrac{\sigma^2}{n} + \dfrac{2 \sigma^2}{n^2} \sum_{1 \leq i < j \leq n} \text{Cor}(X_i, X_j). \qquad -(1) \end{aligned}
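As a tiny numeric illustration of $(1)$ (my own example, not from the post above): with $n = 2$ draws having correlation $\rho$, equation $(1)$ reads $\sigma^2 / n_{eff} = \sigma^2/2 + (2\sigma^2/4)\rho$, which solves to $n_{eff} = 2 / (1 + \rho)$.

```python
# Hypothetical helper illustrating (1) for n = 2 draws with correlation rho:
# sigma^2 / n_eff = sigma^2/2 * (1 + rho)  =>  n_eff = 2 / (1 + rho).
def n_eff_two_draws(rho):
    return 2 / (1 + rho)

print(n_eff_two_draws(0.0))  # independent draws: n_eff = 2.0
print(n_eff_two_draws(1.0))  # perfectly correlated draws: n_eff = 1.0
```

As expected, independent draws are worth two samples, while perfectly correlated draws are worth only one.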

In the Markov Chain Monte Carlo (MCMC) setting, the effective sample size is defined as

\begin{aligned} ESS = \dfrac{n}{1 + 2 \sum_{i=1}^\infty \rho(i)} = \dfrac{n}{1 + 2 \sum_{i=1}^\infty \text{Cor}(X_t, X_{t+i})}. \qquad-(2) \end{aligned}
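Here is a quick sketch (my own, not from any reference) of evaluating $(2)$ for a chain whose autocorrelations are known in closed form. I use a stationary AR(1) chain as the example, for which $\rho(i) = \phi^i$ and $\sum_{i \geq 1} \phi^i = \phi/(1-\phi)$; the function and variable names are my own.

```python
# Evaluate formula (2) by truncating the infinite sum at a large lag.
def ess_formula2(n, rho, max_lag=10_000):
    """ESS = n / (1 + 2 * sum_{i >= 1} rho(i)), with the sum truncated."""
    s = sum(rho(i) for i in range(1, max_lag + 1))
    return n / (1 + 2 * s)

phi, n = 0.9, 10_000
ess = ess_formula2(n, lambda i: phi**i)  # AR(1): rho(i) = phi**i

# For AR(1), sum_{i>=1} phi^i = phi / (1 - phi), so (2) reduces to
# ESS = n * (1 - phi) / (1 + phi).
closed_form = n * (1 - phi) / (1 + phi)
print(ess, closed_form)
```

With $\phi = 0.9$, a chain of $10{,}000$ draws is worth only about $526$ independent samples.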

How are these two formulas related? I couldn’t find a reference that explained why $(2)$ is the correct formula for effective sample size in the MCMC context; any pointers would be great. (This blog post provides some context, but no derivation.)

Here’s my line of thought for how we can derive $(2)$ from $(1)$. Assume the chain is stationary, so that $\text{Cor}(X_i, X_j) = \rho(j-i)$ depends only on the lag $j - i$, and note that there are $n - k$ pairs $i < j$ with lag $j - i = k$. Then from $(1)$,

\begin{aligned} \dfrac{\sigma^2}{n_{eff}} &= \dfrac{\sigma^2}{n} + \dfrac{2 \sigma^2}{n^2} \sum_{i=1}^{n-1} (n-i)\rho(i), \\ \dfrac{1}{n_{eff}} &= \dfrac{n + 2 \sum_{i=1}^{n-1} (n-i)\rho(i) }{n^2}, \\ n_{eff} &= \dfrac{n}{1 + 2 \sum_{i=1}^{n-1} \frac{n-i}{n}\rho(i) }. \end{aligned}

If we replace the sum’s final index $n-1$ with $\infty$ and replace all the $\dfrac{n-i}{n}$ terms with $1$, we get the formula in $(2)$.
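To see how much these two approximations matter, here is a sanity check of my own, again using an AR(1) autocorrelation $\rho(i) = \phi^i$ (the parameter values are arbitrary):

```python
# Compare the exact finite-sum n_eff from (1) with the approximation (2).
phi, n = 0.5, 1_000
rho = lambda i: phi**i  # AR(1) autocorrelation, my illustrative choice

# Exact: n_eff = n / (1 + 2 * sum_{i=1}^{n-1} ((n - i) / n) * rho(i))
exact = n / (1 + 2 * sum((n - i) / n * rho(i) for i in range(1, n)))

# Approximation (2): drop the (n - i)/n weights and extend the sum to
# infinity; for AR(1), sum_{i>=1} phi^i = phi / (1 - phi) in closed form.
approx = n / (1 + 2 * phi / (1 - phi))

print(exact, approx)
```

For these values the two agree to within about half a sample out of roughly 333, and the approximation comes out slightly smaller, consistent with the downward-bias argument below.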

I think these two approximations can be pretty good in the MCMC context for a few reasons:

1. We usually expect autocorrelations at very large lags to be close to zero. Thus, adding the $\rho(i)$ terms for $i \geq n$ probably doesn’t make much of a difference to the denominator, especially when $n$ is large.
2. In many MCMC settings, the autocorrelations are all positive, so including more $\rho(i)$ terms can only bias $n_{eff}$ downward. If we must incur some bias, this is probably the right direction, as it gives us a more conservative estimate of the effective sample size.
3. Replacing $(n-i)/n$ by $1$ likewise biases $n_{eff}$ downward (the right direction), if at all. The approximation is very good for large values of $n$ and small values of $i$; while it is poor for large values of $i$, $\rho(i)$ is likely to be very close to zero there, so replacing $(n-i)/n$ by $1$ is no big deal.