General chi-square tests

In a previous post, I wrote about the asymptotic distribution of the Pearson \chi^2 statistic. Did you know that the Pearson \chi^2 statistic (and the related hypothesis test) is actually a special case of a general class of \chi^2 tests? In this post I describe the general \chi^2 test. The presentation follows Chapters 23 and 24 of Ferguson (1996) (Reference 1). I’m leaving out the proofs, which can be found in the reference.

(Warning: This post is going to be pretty abstract! Nevertheless, I think it’s worth a post since I don’t think the idea is well-known.)

Let’s define some quantities. Let Z_1, Z_2, \dots \in \mathbb{R}^d be a sequence of random vectors whose distribution depends on a k-dimensional parameter \theta which lies in a parameter space \Theta. \Theta is assumed to be a non-empty open subset of \mathbb{R}^k, where k \leq d. Next, assume that the Z_n are asymptotically normal, i.e. there exist A(\theta) \in \mathbb{R}^d and covariance matrix C(\theta) \in \mathbb{R}^{d \times d} such that

\begin{aligned} \sqrt{n} \left( Z_n - A(\theta) \right) \stackrel{d}{\rightarrow} \mathcal{N} \left( 0, C(\theta)\right). \end{aligned}

Next, let M(\theta) \in \mathbb{R}^{d \times d} be some covariance matrix (i.e. a symmetric, non-negative definite matrix), and define the quadratic form

\begin{aligned} Q_n(\theta) = n \left( Z_n - A(\theta) \right)^\top M(\theta) \left( Z_n - A(\theta) \right). \end{aligned}

We make 3 assumptions about A(\theta) and M(\theta):

  • A(\theta) is bicontinuous, i.e. \theta_n \rightarrow \theta \;\Leftrightarrow \; A(\theta_n) \rightarrow A(\theta).
  • A(\theta) has a continuous first partial derivative, \dot{A}(\theta), of full rank k.
  • M(\theta) is continuous in \theta and is uniformly bounded below, in the sense that there is a constant \alpha > 0 such that M(\theta) > \alpha I for all \theta \in \Theta.

Definition. A minimum \chi^2 estimate is a value of \theta, depending on Z_n, that minimizes Q_n(\theta). A sequence \theta_n^* (Z_n) is a minimum \chi^2 sequence if

\begin{aligned} Q_n(\theta_n^*) - \inf_{\theta \in \Theta} Q_n(\theta) \stackrel{P}{\rightarrow} 0, \end{aligned}

whatever the true value of \theta \in \Theta. Q_n(\theta_n^*) is going to be the statistic in our hypothesis test.
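To make the definition concrete, here is a minimal sketch that evaluates Q_n(\theta) on a grid and returns the grid point with the smallest value. The toy curve A(\theta) = (\theta, \theta^2), the choice M(\theta) = I, the grid and the function names are all assumptions made for illustration; none of them come from Ferguson.

```python
import numpy as np

def quadratic_form(z, n, theta, A, M):
    """Q_n(theta) = n (z - A(theta))^T M(theta) (z - A(theta))."""
    r = z - A(theta)
    return n * r @ M(theta) @ r

def min_chi2_grid(z, n, A, M, grid):
    """Return (theta_hat, Q_min), minimizing only over a finite grid of thetas."""
    values = np.array([quadratic_form(z, n, t, A, M) for t in grid])
    i = int(np.argmin(values))
    return grid[i], values[i]

# Hypothetical toy model: d = 2, k = 1, Theta = (0, 1).
A = lambda theta: np.array([theta, theta**2])
M = lambda theta: np.eye(2)

grid = np.linspace(0.01, 0.99, 99)   # step 0.01
z = np.array([0.5, 0.25])            # an observation lying exactly on the curve
theta_hat, q_min = min_chi2_grid(z, n=100, A=A, M=M, grid=grid)
```

Because z sits exactly on the curve at \theta = 0.5, the grid minimizer lands on that point and the minimized quadratic form is numerically zero. A grid search is only a stand-in here; any numerical optimizer could play the same role.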

Let \theta_0 denote the true value of the parameter, and let \dot{A}, M and C denote \dot{A}(\theta_0), M(\theta_0) and C(\theta_0) respectively. The first theorem states that minimum \chi^2 sequences are asymptotically normal with a specific mean and covariance:

Theorem 1. For any minimum \chi^2 sequence \left\{ \theta_n^* \right\}, \sqrt{n} \left( \theta_n^* - \theta_0 \right) \stackrel{d}{\rightarrow} \mathcal{N}(0, \Sigma), where

\begin{aligned} \Sigma = \left( \dot{A}^\top M \dot{A} \right)^{-1} \dot{A}^\top MCM\dot{A} \left( \dot{A}^\top M \dot{A} \right)^{-1}. \end{aligned}

The theorem above holds for any covariance matrix M (that satisfies the assumption we make on it). We are interested in finding the matrix M that makes the asymptotic covariance matrix above the smallest. The corollary below tells us what this M is:

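Corollary. Write \Sigma(M) for the covariance matrix in Theorem 1, viewed as a function of M. If there is a non-singular M_0 \in\mathbb{R}^{d \times d} such that CM_0 \dot{A} = \dot{A}, then \Sigma (M_0) = \left( \dot{A}^\top M_0 \dot{A} \right)^{-1}. Moreover, \Sigma(M_0) \leq \Sigma(M) for all M, in the sense that \Sigma(M) - \Sigma(M_0) is non-negative definite.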

It can be shown that we can take M_0 to be any generalized inverse of C. If C is non-singular, then we can take M_0 = C^{-1}. For this choice of M_0, the next theorem gives the asymptotic distribution of Q_n(\theta_n^*):
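The Corollary can be sanity-checked numerically. The sketch below assumes a non-singular C and takes M_0 = C^{-1}; the randomly generated matrices standing in for \dot{A}, C and M are purely illustrative.

```python
import numpy as np

def sandwich(A_dot, M, C):
    """Asymptotic covariance from Theorem 1 for a given weight matrix M."""
    bread = np.linalg.inv(A_dot.T @ M @ A_dot)
    return bread @ A_dot.T @ M @ C @ M @ A_dot @ bread

rng = np.random.default_rng(0)
d, k = 4, 2
A_dot = rng.normal(size=(d, k))      # stand-in for the d x k derivative matrix
B = rng.normal(size=(d, d))
C = B @ B.T + np.eye(d)              # a non-singular covariance matrix
G = rng.normal(size=(d, d))
M = G @ G.T + np.eye(d)              # an arbitrary positive definite weight

M0 = np.linalg.inv(C)
sigma_opt = sandwich(A_dot, M0, C)
sigma_other = sandwich(A_dot, M, C)

# Sigma(M0) should collapse to (A_dot^T M0 A_dot)^{-1}.
collapsed = np.linalg.inv(A_dot.T @ M0 @ A_dot)
# Sigma(M) - Sigma(M0) should be non-negative definite.
min_eig = np.linalg.eigvalsh(sigma_other - sigma_opt).min()
```

Up to floating-point error, sigma_opt matches collapsed and min_eig is non-negative, which is what the Corollary predicts for this choice of M_0.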

Theorem 2. Q_n(\theta_n^*) \stackrel{d}{\rightarrow} \chi_{v - k}^2, where v is the rank of C(\theta_0).
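A small Monte Carlo experiment gives a feel for Theorem 2. The sketch below assumes a deliberately simple model that is not from the reference: Z_n is the mean of n i.i.d. \mathcal{N}(\theta a, I) vectors in \mathbb{R}^3, so A(\theta) = \theta a, C = I, M_0 = I, and the minimizer of Q_n has a closed form. The vector a, the true parameter and the sample sizes are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, reps = 3, 200, 5000
a = np.array([1.0, 2.0, -1.0])     # hypothetical direction; A(theta) = theta * a
theta0 = 0.7                       # hypothetical true parameter

# The mean of n i.i.d. N(theta0 * a, I) vectors is N(theta0 * a, I / n),
# so Z_n is drawn directly instead of averaging n samples.
Z = theta0 * a + rng.normal(size=(reps, d)) / np.sqrt(n)

# With M = I, Q_n(theta) = n * ||Z_n - theta * a||^2 is minimized in closed form.
theta_star = Z @ a / (a @ a)
resid = Z - np.outer(theta_star, a)
Q = n * np.sum(resid**2, axis=1)

# A chi-square with v - k = 3 - 1 = 2 degrees of freedom has mean 2 and variance 4.
print(Q.mean(), Q.var())
```

In this toy model the simulated mean and variance of Q_n(\theta_n^*) should land close to 2 and 4, the moments of a \chi_2^2 distribution.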

Application to Pearson’s \chi^2

Whew! That was a lot. Let’s see how Pearson’s \chi^2 is a special case of this. (You might want to have the previous post open for reference.) For Pearson’s \chi^2,

  • d = J, the number of possible outcomes for each trial.
  • Z_n = \overline{X}_n, the vector of relative cell frequencies.
  • A(\theta) = p(\theta), the vector of cell probabilities, written as a function of some k-dimensional parameter \theta, with k \leq J-1.
  • C(\theta) = P(\theta) - p(\theta) p(\theta)^\top, where P(\theta) is the diagonal matrix with the entries of p(\theta) on its diagonal.
  • To make the expression Q_n(\theta) equal to Pearson’s \chi^2 statistic, we have to take M(\theta) = P(\theta)^{-1}.
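  • It can be shown that P(\theta)^{-1} is a generalized inverse of C(\theta). Hence, Theorem 2 applies: C(\theta) has rank J - 1 (assuming all cell probabilities are positive), so Q_n(\theta_n^*) \stackrel{d}{\rightarrow} \chi_{J - 1 - k}^2.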
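The claims in this list are easy to check numerically for a fixed p. The sketch below uses hypothetical cell probabilities and counts with J = 3; it verifies that P^{-1} is a generalized inverse of C, that C has rank J - 1, and that the quadratic form with M = P^{-1} matches the usual "sum of (observed - expected)^2 / expected" formula.

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])        # hypothetical cell probabilities, J = 3
counts = np.array([18, 35, 47])      # hypothetical observed counts
n = counts.sum()
x_bar = counts / n                   # relative cell frequencies, playing the role of Z_n

P = np.diag(p)
C = P - np.outer(p, p)               # covariance of a single multinomial trial
P_inv = np.diag(1.0 / p)

# Generalized-inverse property: C P^{-1} C = C.
is_ginv = np.allclose(C @ P_inv @ C, C)
# C is singular with rank J - 1, because its rows sum to zero.
rank_C = np.linalg.matrix_rank(C)

# Quadratic form with M = P^{-1} versus the textbook Pearson statistic.
Q = n * (x_bar - p) @ P_inv @ (x_bar - p)
expected = n * p
pearson = np.sum((counts - expected) ** 2 / expected)
```

Here Q and pearson agree up to rounding, since n (\overline{X}_n - p)^\top P^{-1} (\overline{X}_n - p) = \sum_j (n \overline{X}_{n,j} - n p_j)^2 / (n p_j).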

Other applications of general chi-square theory

This general chi-square theory has been used in recent years to construct hypothesis tests that work under various forms of privatized data: see References 2-4 (Reference 4 just came out this week!). There are probably other applications of this theory: if you know of any, please share!

References:

  1. Ferguson, T. S. (1996). A Course in Large Sample Theory.
  2. Kifer, D., and Rogers, R. (2017). A New Class of Private Chi-Square Hypothesis Tests.
  3. Gaboardi, M., and Rogers, R. (2018). Local Private Hypothesis Testing: Chi-Square Tests.
  4. Friedberg, R., and Rogers, R. (2022). Privacy Aware Experimentation over Sensitive Groups: A General Chi Square Approach.
