# Asymptotic distribution of the Pearson chi-square statistic

I recently learned of a fairly succinct proof for the asymptotic distribution of the Pearson chi-square statistic (from Chapter 9 of Reference 1), which I share below.

First, the set-up: Assume that we have $n$ independent trials, and each trial ends in one of $J$ possible outcomes, which we label (without loss of generality) as $1, 2, \dots, J$. Assume that for each trial, the probability of the outcome being $j$ is $p_j > 0$ (so $\sum_{j=1}^J p_j = 1$). Let $n_j$ denote the number of trials that result in outcome $j$, so that $\sum_{j=1}^J n_j = n$. Pearson’s $\chi^2$-statistic is defined as \begin{aligned} \chi^2 = \sum_{\text{cells}} \dfrac{(\text{obs} - \text{exp})^2}{\text{exp}} = \sum_{j=1}^J \dfrac{(n_j - np_j)^2}{np_j}. \end{aligned}
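To make the definition concrete, here is a minimal Python sketch of the statistic (the counts and cell probabilities are made-up illustrative values):

```python
# A minimal sketch of computing Pearson's chi-square statistic.
# The counts and cell probabilities below are made-up illustrative values.
counts = [18, 55, 27]     # n_j: observed count for each of J = 3 outcomes
probs = [0.2, 0.5, 0.3]   # p_j: hypothesized probability of each outcome
n = sum(counts)           # total number of trials

# chi^2 = sum over cells of (observed - expected)^2 / expected
chi2 = sum((n_j - n * p_j) ** 2 / (n * p_j)
           for n_j, p_j in zip(counts, probs))
```

For these values the expected counts are $20, 50, 30$, giving $\chi^2 = 0.2 + 0.5 + 0.3 = 1$.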

Theorem. As $n \rightarrow \infty$, $\chi^2 \stackrel{d}{\rightarrow} \chi_{J-1}^2$, where $\stackrel{d}{\rightarrow}$ denotes convergence in distribution.

Before proving the theorem, we prove a lemma that we will use:

Lemma. Let $\mathbf{X} \in \mathbb{R}^{n}$ be a random vector with distribution $\mathbf{X} \sim \mathcal{N}(0, \mathbf{\Sigma})$. Then $\mathbf{X}^\top \mathbf{X}$ has the $\chi_r^2$ distribution if and only if $\mathbf{\Sigma}$ is idempotent (i.e. a projection) with rank $r$.

(Note: We call a matrix a projection matrix if it is idempotent.)

Proof of Lemma: Since $\mathbf{\Sigma}$ is real and symmetric, it is orthogonally diagonalizable, i.e. there is an orthogonal matrix $\mathbf{U}$ and a diagonal matrix $\mathbf{D}$ such that $\mathbf{D} = \mathbf{U \Sigma U}^\top$. Let $\mathbf{Y} = \mathbf{UX}$. Since $\mathbf{X} \sim \mathcal{N}(0, \mathbf{\Sigma})$, $\mathbf{Y} \sim \mathcal{N}(0, \mathbf{U\Sigma U}^\top) = \mathcal{N}(0, \mathbf{D})$. Furthermore, $\mathbf{Y}^\top \mathbf{Y} = \mathbf{X}^\top \mathbf{U}^\top \mathbf{U} \mathbf{X} = \mathbf{X}^\top \mathbf{X}$. Thus, \begin{aligned} &\mathbf{\Sigma}^2 = \mathbf{\Sigma} \text{ and } \mathbf{\Sigma} \text{ has rank } r \\ \Leftrightarrow \quad& \mathbf{D}^2 = \mathbf{D} \text{ and } \mathbf{D} \text{ has rank } r \\ \Leftrightarrow \quad& \mathbf{D} \text{ has } r \text{ ones and } n-r \text{ zeros on its diagonal} \\ \Leftrightarrow \quad& \mathbf{Y}^\top \mathbf{Y} \sim \chi_r^2 \\ \Leftrightarrow \quad& \mathbf{X}^\top \mathbf{X} \sim \chi_r^2. \end{aligned}
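The key step in this proof, that a symmetric idempotent $\mathbf{\Sigma}$ of rank $r$ has exactly $r$ unit eigenvalues and the rest zero, can be checked numerically. A small NumPy sketch, using a made-up rank-2 projection in $\mathbb{R}^5$:

```python
import numpy as np

# Build a symmetric projection of rank r = 2 in R^5: Sigma = Q Q^T, where
# Q has orthonormal columns spanning a (made-up) 2-dimensional subspace.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))
Sigma = Q @ Q.T

assert np.allclose(Sigma @ Sigma, Sigma)    # idempotent
eigvals = np.sort(np.linalg.eigvalsh(Sigma))
# Eigenvalues are r ones and n - r zeros, as in the lemma's proof.
assert np.allclose(eigvals, [0, 0, 0, 1, 1])
```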

We are now ready to prove the theorem.

Proof of Theorem (asymptotic distribution of Pearson $\chi^2$ statistic): For each $j \in \{1, \dots, J\}$, let $\mathbf{e}_j$ denote the vector in $\mathbb{R}^J$ with all zeros except for a one in the $j$th entry. Let $\mathbf{X}_i$ be equal to $\mathbf{e}_j$ if the $i$th trial resulted in outcome $j$. Then $\mathbf{X}_1, \dots, \mathbf{X}_n$ are i.i.d. (each is a single multinomial trial) with $\mathbb{E}[\mathbf{X}_i] = \mathbf{p}$ and $\text{Cov}(\mathbf{X}_i) = \mathbf{\Sigma}$, where \begin{aligned} \mathbf{p} &= \begin{pmatrix} p_1 \\ \vdots \\ p_J \end{pmatrix}, \\ \mathbf{\Sigma} &= \begin{pmatrix} p_1 (1-p_1) & -p_1 p_2 & \dots & -p_1 p_J \\ -p_1 p_2 & p_2 (1-p_2) & \dots & -p_2 p_J \\ \vdots & \vdots & & \vdots \\ -p_1 p_J & -p_2 p_J & \dots & p_J (1-p_J) \end{pmatrix} = \text{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^\top. \end{aligned}
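These mean and covariance formulas admit a quick empirical sanity check with NumPy's multinomial sampler (a draw with one trial is exactly one of the indicator vectors $\mathbf{e}_j$); the probability vector is made up:

```python
import numpy as np

# Each row of X is one trial's indicator vector e_j, drawn with probability p_j.
rng = np.random.default_rng(0)
p = np.array([0.2, 0.5, 0.3])
X = rng.multinomial(1, p, size=200_000)

Sigma = np.diag(p) - np.outer(p, p)   # claimed covariance: diag(p) - p p^T
assert np.allclose(X.mean(axis=0), p, atol=0.01)              # E[X_i] = p
assert np.allclose(np.cov(X.T, bias=True), Sigma, atol=0.01)  # Cov(X_i) = Sigma
```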

Let $\mathbf{P} = \text{diag}(\mathbf{p})$, and note that the $j$th entry of the sample mean $\overline{\mathbf{X}}_n = \frac{1}{n} \sum_{i=1}^n \mathbf{X}_i$ is $n_j / n$. We can rewrite the Pearson $\chi^2$ statistic as \begin{aligned} \chi^2 &= \sum_{j=1}^J \dfrac{(n_j - np_j)^2}{np_j} \\ &= n \sum_{j=1}^J \dfrac{((n_j/n) - p_j)^2}{p_j} \\ &= n \left( \overline{\mathbf{X}}_n - \mathbf{p} \right)^\top \mathbf{P}^{-1} \left( \overline{\mathbf{X}}_n - \mathbf{p} \right). \end{aligned}
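This rewrite can be verified numerically; a sketch with made-up counts and probabilities:

```python
import numpy as np

# Check that the quadratic form n (Xbar - p)^T P^{-1} (Xbar - p) equals the
# cell-by-cell sum defining the Pearson statistic. Counts and p are made up.
p = np.array([0.2, 0.5, 0.3])
counts = np.array([18, 55, 27])   # n_j
n = counts.sum()

chi2_sum = np.sum((counts - n * p) ** 2 / (n * p))
xbar = counts / n                 # sample mean: j-th entry is n_j / n
chi2_quad = n * (xbar - p) @ np.diag(1 / p) @ (xbar - p)
assert np.isclose(chi2_sum, chi2_quad)
```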

By the Central Limit Theorem, $\sqrt{n} \left( \overline{\mathbf{X}}_n - \mathbf{p} \right) \stackrel{d}{\rightarrow} \mathbf{Y}$, where $\mathbf{Y} \sim \mathcal{N} (0, \mathbf{\Sigma})$. Applying the Continuous Mapping Theorem, \begin{aligned} \chi^2 = \sqrt{n} \left( \overline{\mathbf{X}}_n - \mathbf{p} \right)^\top \mathbf{P}^{-1} \sqrt{n} \left( \overline{\mathbf{X}}_n - \mathbf{p} \right) \stackrel{d}{\rightarrow} \mathbf{Y}^\top \mathbf{P}^{-1} \mathbf{Y}. \end{aligned}

If we define $\mathbf{Z} = \mathbf{P}^{-1/2} \mathbf{Y}$, then $\mathbf{Z} \sim \mathcal{N}(0, \mathbf{P}^{-1/2} \mathbf{\Sigma} \mathbf{P}^{-1/2})$ and $\chi^2 \stackrel{d}{\rightarrow} \mathbf{Z}^\top \mathbf{Z}$. By the lemma, it remains to show that $\mathbf{P}^{-1/2} \mathbf{\Sigma} \mathbf{P}^{-1/2}$ is a projection matrix of rank $J-1$. We can write this matrix as \begin{aligned} \mathbf{P}^{-1/2} \mathbf{\Sigma} \mathbf{P}^{-1/2} &= \mathbf{P}^{-1/2} \left( \mathbf{P}- \mathbf{p}\mathbf{p}^\top \right) \mathbf{P}^{-1/2} \\ &= \mathbf{I} - \mathbf{P}^{-1/2} \mathbf{p}\mathbf{p}^\top \mathbf{P}^{-1/2}. \end{aligned}

This matrix is a projection (the second step uses $\mathbf{p}^\top \mathbf{P}^{-1} \mathbf{p} = \sum_{j=1}^J p_j = 1$): \begin{aligned} \left( \mathbf{I} - \mathbf{P}^{-1/2} \mathbf{p}\mathbf{p}^\top \mathbf{P}^{-1/2} \right)^2 &= \mathbf{I} - 2 \mathbf{P}^{-1/2} \mathbf{p}\mathbf{p}^\top \mathbf{P}^{-1/2} + \mathbf{P}^{-1/2} \mathbf{p}\mathbf{p}^\top \mathbf{P}^{-1} \mathbf{p}\mathbf{p}^\top \mathbf{P}^{-1/2} \\ &= \mathbf{I} - 2 \mathbf{P}^{-1/2} \mathbf{p}\mathbf{p}^\top \mathbf{P}^{-1/2} + \mathbf{P}^{-1/2} \mathbf{p}\mathbf{p}^\top \mathbf{P}^{-1/2} \\ &= \mathbf{I} - \mathbf{P}^{-1/2} \mathbf{p}\mathbf{p}^\top \mathbf{P}^{-1/2}. \end{aligned}

We can compute the trace of the matrix: \begin{aligned} \text{tr} \left[ (\mathbf{P}^{-1/2} \mathbf{p})(\mathbf{p}^\top \mathbf{P}^{-1/2})\right] &= \text{tr} \left[(\mathbf{p}^\top \mathbf{P}^{-1/2})(\mathbf{P}^{-1/2} \mathbf{p})\right] \\ &= \mathbf{p}^\top \mathbf{P}^{-1} \mathbf{p} = \sum_{j=1}^J p_j = 1, \\ \text{tr} \left( \mathbf{I} - \mathbf{P}^{-1/2} \mathbf{p}\mathbf{p}^\top \mathbf{P}^{-1/2} \right) &= \text{tr}(\mathbf{I}) - \text{tr}\left( \mathbf{P}^{-1/2} \mathbf{p}\mathbf{p}^\top \mathbf{P}^{-1/2} \right) = J-1. \end{aligned}
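Both computations, idempotency and the trace, can be checked numerically; a sketch with a made-up $\mathbf{p}$ and $J = 4$:

```python
import numpy as np

# Numerical check that M = I - P^{-1/2} p p^T P^{-1/2} is a symmetric
# projection with trace (hence rank) J - 1. The vector p is made up.
p = np.array([0.1, 0.2, 0.3, 0.4])
J = len(p)
P_inv_sqrt = np.diag(1 / np.sqrt(p))
M = np.eye(J) - P_inv_sqrt @ np.outer(p, p) @ P_inv_sqrt

assert np.allclose(M, M.T)                 # symmetric
assert np.allclose(M @ M, M)               # idempotent
assert np.isclose(np.trace(M), J - 1)      # trace = J - 1
assert np.linalg.matrix_rank(M) == J - 1   # rank = trace for projections
```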

Since $\mathbf{I} - \mathbf{P}^{-1/2} \mathbf{p}\mathbf{p}^\top \mathbf{P}^{-1/2}$ is a symmetric projection matrix, its rank is equal to its trace, i.e. its rank is $J-1$. By the lemma, $\mathbf{Z}^\top \mathbf{Z} \sim \chi_{J-1}^2$, and this completes the proof.
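As a final sanity check, one can simulate the statistic and compare its moments with those of $\chi_{J-1}^2$ (mean $J-1$, variance $2(J-1)$); the probability vector, $n$, and the replication count below are arbitrary made-up choices:

```python
import numpy as np

# Monte Carlo check of the theorem: for large n, the Pearson chi-square
# statistic should behave like chi^2_{J-1}, with mean J - 1 = 2 and
# variance 2(J - 1) = 4 for this made-up three-outcome example.
rng = np.random.default_rng(0)
p = np.array([0.2, 0.5, 0.3])
J, n, reps = len(p), 1000, 20_000

counts = rng.multinomial(n, p, size=reps)   # reps independent experiments
chi2 = np.sum((counts - n * p) ** 2 / (n * p), axis=1)

assert abs(chi2.mean() - (J - 1)) < 0.1     # close to J - 1 = 2
assert abs(chi2.var() - 2 * (J - 1)) < 0.3  # close to 2(J - 1) = 4
```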

References:

1. Ferguson, T. S. (1996). A Course in Large Sample Theory.