What is Andrew’s Sine?

Notational set-up

Let’s say we have data x_1, \dots, x_n \stackrel{i.i.d.}{\sim} P_\theta for some probability distribution P_\theta, and we want to use the data to estimate the parameter \theta. If our estimate \hat\theta is the solution to a minimization problem of the form

\begin{aligned} \text{minimize}_\theta \quad \sum_{i=1}^n \rho (x_i ; \theta) \end{aligned}

for some function \rho, then \hat\theta is called an M-estimator. In maximum likelihood estimation, we choose \rho (x ; \theta) = - \log f(x ; \theta), where f (\cdot ; \theta) is the probability density associated with P_\theta.

Define

\psi (x ; \theta) = \dfrac{\partial \rho(x ; \theta)}{\partial \theta}.

Sometimes, we prefer to think of \hat\theta as the solution to the implicit equation

\begin{aligned} \sum_{i=1}^n \psi(x_i ; \theta) = 0.  \end{aligned}
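
As a quick sanity check of this notation, here is a small worked example (my own illustration, assuming P_\theta is a normal distribution with mean \theta and known variance 1). The maximum likelihood choice of \rho and the resulting \psi are

\begin{aligned} \rho(x ; \theta) &= \tfrac{1}{2}(x - \theta)^2 + \tfrac{1}{2}\log (2\pi), \\ \psi(x ; \theta) &= \theta - x, \end{aligned}

so the implicit equation becomes

\begin{aligned} \sum_{i=1}^n (\theta - x_i) = 0 \quad \Longleftrightarrow \quad \hat\theta = \frac{1}{n} \sum_{i=1}^n x_i. \end{aligned}

In other words, under these assumptions the M-estimator is just the sample mean, which is known to be sensitive to outliers.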

In the field of robust statistics, we want to choose \rho and/or \psi such that the solution to the problem above has some robustness properties. These are the functions being referred to when you come across the terms “rho function” or “psi function” in this field.

Andrew’s Sine

In this blog we’ve already come across one rho/psi function used in robust statistics: the Tukey loss function. Andrew’s Sine is another psi function that appears in robust statistics. It is defined by

\begin{aligned} \psi(x) = \begin{cases} \sin (x / c) &\text{if } |x| \leq c\pi, \\ 0 &\text{otherwise}, \end{cases} \end{aligned}

where c > 0 is a user-defined tuning parameter. Integrating \psi from 0 gives the corresponding rho function:

\begin{aligned} \rho(x) = \begin{cases} c[1 - \cos (x / c)] &\text{if } |x| \leq c\pi, \\ 2c &\text{otherwise}. \end{cases} \end{aligned}
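
To make the two definitions concrete, here is a minimal Python sketch of both functions using only the standard library. The function names andrews_psi and andrews_rho are my own, and the value c = 1.5 in the usage lines is just an illustrative choice.

```python
import math


def andrews_psi(x, c):
    """Andrew's Sine psi function: sin(x / c) on [-c*pi, c*pi], zero outside."""
    if abs(x) <= c * math.pi:
        return math.sin(x / c)
    return 0.0


def andrews_rho(x, c):
    """Matching rho function: c * (1 - cos(x / c)) on [-c*pi, c*pi], 2c outside."""
    if abs(x) <= c * math.pi:
        return c * (1.0 - math.cos(x / c))
    return 2.0 * c


# Illustrative usage with an arbitrary tuning constant.
c = 1.5
for x in (0.0, 1.0, c * math.pi, 10.0):
    print(x, andrews_psi(x, c), andrews_rho(x, c))
```

Two things are worth noticing. First, \rho is continuous at |x| = c\pi, since c[1 - \cos(\pi)] = 2c. Second, \psi returns to zero for large |x|, so observations far from the fit contribute nothing to the estimating equation.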

[Figure: plots of the rho and psi functions for a few choices of c.]

Some history

Andrew’s Sine is named after D. F. Andrews. The earliest mention of it I could find is in Andrews (1974) (Reference 3), but it appears to have been proposed first in Andrews et al. (1972) (Reference 4), which I have not been able to find a copy of.

Choosing c

Reference 3 recommends using c = 1.5 or c = 1.8 without giving an explanation. Reference 5 suggests c = 1.339, noting that with this value of c, the corresponding M-estimator gives 95% efficiency at the normal distribution.
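
To see what a given c does in practice, here is a rough sketch of a location M-estimate based on Andrew's Sine, computed by iteratively reweighted averaging with weights \psi(r) / r. Several things here are my own assumptions rather than anything from the references: the names andrews_weight and andrews_location, the choice to standardize residuals by a fixed scale (the MAD multiplied by 1.4826), the median as the starting point, and the toy data.

```python
import math
import statistics


def andrews_weight(r, c):
    """Weight psi(r) / r for Andrew's Sine; the limit as r -> 0 is 1 / c."""
    if abs(r) > c * math.pi:
        return 0.0
    if abs(r) < 1e-12:
        return 1.0 / c
    return math.sin(r / c) / r


def andrews_location(xs, c=1.339, tol=1e-10, max_iter=100):
    """Sketch of a location M-estimate via iteratively reweighted averaging.

    Assumptions (not from the references): residuals are divided by a fixed
    scale, 1.4826 * MAD, and the iteration starts at the sample median.
    """
    theta = statistics.median(xs)
    scale = 1.4826 * statistics.median([abs(x - theta) for x in xs])
    if scale == 0.0:
        raise ValueError("MAD is zero; cannot standardize residuals")
    for _ in range(max_iter):
        weights = [andrews_weight((x - theta) / scale, c) for x in xs]
        total = sum(weights)
        if total == 0.0:
            break  # every point has zero weight; keep the current estimate
        new_theta = sum(w * x for w, x in zip(weights, xs)) / total
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta


# Toy data: five points near zero plus one gross outlier.
data = [0.1, -0.2, 0.05, 0.3, -0.15, 10.0]
print(andrews_location(data))
print(statistics.mean(data))
```

In this toy data set the point at 10.0 lies more than c\pi scaled units from the median, so it gets weight zero and the estimate stays near the bulk of the data, while the sample mean is pulled up to about 1.68. Because \psi redescends to zero, the estimating equation can have several solutions, so the starting point matters; the median is one common-sense choice, not the only one.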

References:

  1. Wolfram MathWorld. Andrew’s Sine.
  2. Penn State Department of Statistics, STAT 501. 13.3 – Robust Regression Methods.
  3. Andrews, D. F. (1974). A Robust Method for Multiple Linear Regression.
  4. Andrews, D. F., et al. (1972). Robust Estimates of Location: Survey and Advances.
  5. Young, D. S. (2017). Handbook of Regression Methods.
