I recently learnt of the Shapiro-Wilk test from this blog post. So what is it?
The Shapiro-Wilk test is a statistical test for the hypothesis that a group of values come from a normal distribution. (The mean and variance of this normal distribution need not be 0 and 1 respectively.) Empirically, this test appears to have the best power among tests for normality.
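Assume that the data are $x_1, \dots, x_n$ and that we want to test whether they come from a population that is normally distributed. The test statistic is

$$W = \frac{\left( \sum_{i=1}^n a_i x_{(i)} \right)^2}{\sum_{i=1}^n (x_i - \bar{x})^2},$$

where
- $\bar{x}$ is the mean of the $x_i$'s,
- $x_{(1)} \leq \dots \leq x_{(n)}$ are the order statistics,
- $a_1, \dots, a_n$ are “constants generated from the means, variances and covariances of the order statistics of a sample of size $n$ from a normal distribution” (Ref 3).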
Let $m = (m_1, \dots, m_n)^T$ be the expected values of the standard normal order statistics, and let $V$ be the corresponding covariance matrix. Then

$$a = (a_1, \dots, a_n)^T = \frac{V^{-1} m}{\| V^{-1} m \|_2}.$$
We reject the null hypothesis (the data come from a normal distribution) if $W$ is small. In R, the Shapiro-Wilk test can be performed with the `shapiro.test()` function.
As far as I can tell there isn’t a closed form for the distribution of $W$ under the null.
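Without a closed form, one way to get a feel for the null distribution is simulation. The sketch below is not the exact Shapiro-Wilk statistic: the exact coefficients need the covariance matrix $V$, so it swaps in the simpler Shapiro-Francia variant (which treats $V$ as the identity) and approximates the expected order statistics $m$ with Blom's formula. The function names, the number of simulations and the seed are all my own illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def approx_normal_scores(n):
    """Blom's approximation to the expected standard normal order statistics m_i."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def shapiro_francia_stat(x):
    """W' = (sum a_i x_(i))^2 / sum (x_i - xbar)^2 with a = m / ||m||.

    Assumption: this is the Shapiro-Francia simplification (V = identity),
    not the exact Shapiro-Wilk W. Fails if all values in x are equal.
    """
    n = len(x)
    m = approx_normal_scores(n)
    norm = sum(mi * mi for mi in m) ** 0.5
    a = [mi / norm for mi in m]           # coefficients with unit length
    xs = sorted(x)                        # order statistics x_(1) <= ... <= x_(n)
    xbar = fmean(xs)
    numerator = sum(ai * xi for ai, xi in zip(a, xs)) ** 2
    denominator = sum((xi - xbar) ** 2 for xi in xs)
    return numerator / denominator


def simulated_null(n, n_sim=2000, seed=0):
    """Sorted statistics computed on n_sim standard normal samples of size n."""
    rng = random.Random(seed)
    return sorted(
        shapiro_francia_stat([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_sim)
    )


def monte_carlo_p_value(x, n_sim=2000, seed=0):
    """Left-tail p-value: share of simulated null statistics <= the observed one."""
    observed = shapiro_francia_stat(x)
    null = simulated_null(len(x), n_sim, seed)
    return sum(w <= observed for w in null) / n_sim
```

Because the statistic does not change when the data are shifted or rescaled, simulating from the standard normal is enough to cover every normal distribution. Small values of the statistic land in the left tail of the simulated values, which is why the p-value counts simulated statistics at or below the observed one.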
What is the intuition behind this test? Reference 2 has a good explanation for this:
The basis idea behind the Shapiro-Wilk test is to estimate the variance of the sample in two ways: (1) the regression line in the QQ-Plot allows to estimate the variance, and (2) the variance of the sample can also be regarded as an estimator of the population variance. Both estimated values should approximately equal in the case of a normal distribution and thus should result in a quotient of close to 1.0. If the quotient is significantly lower than 1.0 then the null hypothesis (of having a normal distribution) should be rejected.
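The two estimates in that explanation can be compared numerically. The sketch below fits a least-squares line through the QQ-plot points (approximate normal scores on the horizontal axis, sorted data on the vertical axis), takes the squared slope as one variance estimate, and divides it by the usual sample variance. It is only an illustration of the idea: the scores come from Blom's approximation and the line is an ordinary least-squares fit, both of which are my assumptions, so the ratio is not the exact $W$.

```python
from statistics import NormalDist, fmean, variance


def qq_variance_ratio(x):
    """Ratio of (QQ-plot slope)^2 to the sample variance of x."""
    n = len(x)
    std_normal = NormalDist()
    # Assumption: Blom's approximation to the expected normal order statistics.
    m = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    xs = sorted(x)
    m_bar, x_bar = fmean(m), fmean(xs)
    # Ordinary least-squares slope of sorted data on normal scores;
    # for normal data the slope estimates the standard deviation.
    slope = sum((mi - m_bar) * (xi - x_bar) for mi, xi in zip(m, xs)) / sum(
        (mi - m_bar) ** 2 for mi in m
    )
    return slope ** 2 / variance(xs)
```

For data that look normal the ratio should be close to 1, and for skewed data it should drop noticeably. Unlike $W$ itself, this rough ratio is not guaranteed to stay below 1, especially in small samples.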
Why is it called the Shapiro-Wilk test? It was proposed by S. S. Shapiro and M. B. Wilk in a 1965 Biometrika paper “An Analysis of Variance Test for Normality”. This original paper proves a number of properties of the $W$ statistic, e.g. $\frac{n a_1^2}{n-1} \leq W \leq 1$.
References:
1. Wikipedia. Shapiro-Wilk test.
2. Fundamentals of Statistics. Shapiro-Wilk test.
3. Engineering Statistics Handbook. Section 7.2.1.3. Anderson-Darling and Shapiro-Wilk tests.