# What is Andrew’s Sine?

## Notational set-up

Let’s say we have data $x_1, \dots, x_n \stackrel{i.i.d.}{\sim} P_\theta$ for some probability distribution $P_\theta$, and we want to use the data to estimate the parameter $\theta$. If our estimate $\hat\theta$ is the solution to a minimization problem of the form

\begin{aligned} \text{minimize}_\theta \quad \sum_{i=1}^n \rho (x_i ; \theta) \end{aligned}

for some function $\rho$, then $\hat\theta$ is called an M-estimator. In maximum likelihood estimation, we choose $\rho (x ; \theta) = - \log f(x ; \theta)$, where $f (\cdot ; \theta)$ is the probability density associated with $P_\theta$.

Define

$\psi (x ; \theta) = \dfrac{\partial \rho(x ; \theta)}{\partial \theta}.$

Sometimes, we prefer to think of $\hat\theta$ as the solution to the implicit equation

\begin{aligned} \sum_{i=1}^n \psi(x_i ; \theta) = 0. \end{aligned}

In the field of robust statistics, we want to choose $\rho$ and/or $\psi$ such that the solution to the problem above has some robustness properties. These are the functions being referred to when you come across the terms “rho function” or “psi function” in this field.
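As a concrete illustration of the framework (my own sketch, not from any of the references below): with $\rho(x ; \theta) = (x - \theta)^2/2$ the M-estimator is the sample mean, and with $\rho(x ; \theta) = |x - \theta|$ it is the sample median. A brute-force NumPy check:

```python
import numpy as np

# Sketch: recover two familiar estimators as M-estimators by minimizing
# sum_i rho(x_i; theta) over a grid of candidate values of theta.
# (A real implementation would use a proper optimizer; the grid keeps
# this example dependency-free.)
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=200)

thetas = np.linspace(0.0, 6.0, 6_001)   # candidate parameter values
resid = x[None, :] - thetas[:, None]    # residual x_i - theta for every theta

# rho(x; theta) = (x - theta)^2 / 2  -->  minimizer is the sample mean
theta_sq = thetas[np.argmin(np.sum(resid**2 / 2, axis=1))]

# rho(x; theta) = |x - theta|        -->  minimizer is the sample median
theta_abs = thetas[np.argmin(np.sum(np.abs(resid), axis=1))]
```

Up to the grid resolution, `theta_sq` matches `np.mean(x)` and `theta_abs` matches `np.median(x)`.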

## Andrew’s Sine

In this blog we’ve already come across one rho/psi function used in robust statistics: the Tukey loss function. Andrew’s Sine is another psi function that appears in robust statistics. It is defined by

\begin{aligned} \psi(x) = \begin{cases} \sin (x / c) &\text{if } |x| \leq c\pi, \\ 0 &\text{otherwise}, \end{cases} \end{aligned}

where $c$ is a user-defined parameter. The rho function implied by this choice is

\begin{aligned} \rho(x) = \begin{cases} c[1 - \cos (x / c)] &\text{if } |x| \leq c\pi, \\ 2c &\text{otherwise}. \end{cases} \end{aligned}
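The two definitions translate directly into NumPy. This is a sketch of my own (the default $c = 1.339$ is the value discussed in the next section, not a standard library default):

```python
import numpy as np

C = 1.339  # tuning constant; the 95%-efficiency value from Reference 5

def psi(x, c=C):
    """Andrew's Sine psi function: sin(x/c) on |x| <= c*pi, 0 outside."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= c * np.pi, np.sin(x / c), 0.0)

def rho(x, c=C):
    """The implied rho function (psi integrated, with rho(0) = 0)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= c * np.pi, c * (1 - np.cos(x / c)), 2 * c)

# Both functions are continuous at the cutoff |x| = c*pi:
# psi(c*pi) = sin(pi) = 0 and rho(c*pi) = c*(1 - cos(pi)) = 2c.
```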

Here are plots of both the rho and psi functions for a few choices of $c$:
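Plots like these are easy to regenerate; here is a matplotlib sketch (with the definitions repeated so the snippet is self-contained):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

def psi(x, c):
    return np.where(np.abs(x) <= c * np.pi, np.sin(x / c), 0.0)

def rho(x, c):
    return np.where(np.abs(x) <= c * np.pi, c * (1 - np.cos(x / c)), 2 * c)

x = np.linspace(-8, 8, 1001)
fig, (ax_rho, ax_psi) = plt.subplots(1, 2, figsize=(10, 4))
for c in (1.0, 1.339, 1.8):
    ax_rho.plot(x, rho(x, c), label=f"c = {c}")
    ax_psi.plot(x, psi(x, c), label=f"c = {c}")
ax_rho.set_title(r"$\rho(x)$")
ax_psi.set_title(r"$\psi(x)$")
ax_psi.legend()
fig.savefig("andrews_sine.png")
```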

## Some history

Andrew’s Sine is named after D. F. Andrews. The first mention of it I could find was in Andrews (1974) (Reference 3), but it appears to have been proposed first in Andrews et al. (1972) (Reference 4), for which I can’t find a copy.

## Choosing $c$

Reference 3 recommends using $c = 1.5$ or $c = 1.8$ without giving an explanation. Reference 5 suggests $c = 1.339$, noting that with this value of $c$, the corresponding M-estimator gives 95% efficiency at the normal distribution.
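The 95% figure can be checked numerically. For an M-estimator, the asymptotic variance at a distribution $F$ is $E_F[\psi^2] / (E_F[\psi'])^2$; at the standard normal, the sample mean has asymptotic variance 1, so the relative efficiency is $(E[\psi'])^2 / E[\psi^2]$. A quick numerical check (my own sketch, assuming the definitions above):

```python
import numpy as np

c = 1.339
# psi(x) = sin(x/c) on |x| <= c*pi and 0 outside, so both expectations under
# the standard normal reduce to integrals over [-c*pi, c*pi]; approximate
# them with a Riemann sum on a fine grid.
x = np.linspace(-c * np.pi, c * np.pi, 200_001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)    # standard normal density

E_dpsi = np.sum(np.cos(x / c) / c * phi) * dx   # E[psi'(X)]
E_psi2 = np.sum(np.sin(x / c) ** 2 * phi) * dx  # E[psi(X)^2]

# Asymptotic efficiency relative to the sample mean
efficiency = E_dpsi**2 / E_psi2
print(round(efficiency, 3))  # close to 0.95
```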

## References

1. Wolfram MathWorld. Andrew’s Sine.
2. Penn State Department of Statistics, STAT 501. 13.3 – Robust Regression Methods.
3. Andrews, D. F. (1974). A Robust Method for Multiple Linear Regression.
4. Andrews, D. F., et al. (1972). Robust Estimates of Location: Survey and Advances.
5. Young, D. S. (2017). Handbook of Regression Methods.