How heavy-tailed is the t distribution? (Part 2)

In this previous post, we explored how heavy-tailed the t distribution is through the question: “What is the probability that the random variable is at least x standard deviations (SDs) away from the mean?” For the most part, the smaller the degrees of freedom, the larger this probability was (more “heavy-tailed”), until we realized that the trend reversed for really small degrees of freedom (2.1 in the post). In fact, for 1 < df \leq 2, the variance of the t distribution is infinite, and so the random variable is always within 1 SD of the mean!
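To make the reversal concrete, here is a minimal sketch in R, assuming we look at x = 4 SDs and a handful of df values. Both choices, and the helper name `probBeyondSD`, are illustrative and not taken from the previous post. It uses the fact that the SD of the t distribution is \sqrt{df / (df - 2)} when df > 2.

```r
# P(|T| >= x * SD) for T ~ t_df. The SD is finite only when df > 2.
probBeyondSD <- function(df, x) {
  sdT <- sqrt(df / (df - 2))                    # SD of the t distribution (df > 2)
  2 * pt(x * sdT, df = df, lower.tail = FALSE)  # two-sided tail, by symmetry
}

dfValues <- c(2.1, 3, 5, 30)                    # illustrative degrees of freedom
probs <- sapply(dfValues, probBeyondSD, x = 4)  # x = 4 is an illustrative choice
names(probs) <- paste0("df=", dfValues)
print(signif(probs, 3))
```

The probability grows as df drops from 30 to 3, then falls again at df = 2.1, because the SD itself blows up as df approaches 2.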

We need another way to think about heavy-tailedness. (The code to produce the figures in this post is available here.)

A first approach that doesn’t work

You might be wondering, why didn’t I just plot P\{ T > x \} with T \sim t_{df} against x, for various values of x and df? If I did that, I would have ended up with the plot below (for the log of the probabilities):

That seems to be exactly what we want: the smaller the degrees of freedom, the slower this probability decays…
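For readers without the figure, the numbers behind such a plot can be sketched as follows. The grid of x values and the df values are assumptions made for illustration, not necessarily the ones used in the original figure.

```r
# log10 of P(T > x) for T ~ t_df, with no rescaling of T.
x <- seq(0, 6, by = 1)        # illustrative grid
dfValues <- c(3, 5, 10, 30)   # illustrative degrees of freedom

logTail <- sapply(dfValues, function(df) {
  # log.p = TRUE returns the natural log; dividing by log(10) converts to log10
  pt(x, df = df, lower.tail = FALSE, log.p = TRUE) / log(10)
})
dimnames(logTail) <- list(paste0("x=", x), paste0("df=", dfValues))
print(round(logTail, 2))
```

Each row is one value of x and each column one value of df. Reading across a row for x > 0, the log-probability falls as df grows.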

The problem is that the comparison above ignores the scale of the random variables. Imagine if we tried to make the plot above, but instead of plotting lines for the t distribution with different degrees of freedom, let’s plot it for the normal distribution with different standard deviations. This is what we would get:

That seems to give the same trend as the plot before! Can we then conclude that the \mathcal{N}(0, 10^2) distribution is more heavy-tailed than the \mathcal{N}(0, 1) distribution? Surely not: the two differ only by a scale factor.
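A small numerical check of this point, with x = 4 and the SDs below chosen purely for illustration:

```r
x <- 4                  # illustrative cut-off
sds <- c(1, 2, 5, 10)   # illustrative standard deviations

# Raw tail probabilities P(N(0, sd^2) > x): these grow with sd.
rawTail <- pnorm(x, mean = 0, sd = sds, lower.tail = FALSE)

# Measuring the cut-off in units of each distribution's own SD removes the effect.
scaledTail <- pnorm(x * sds, mean = 0, sd = sds, lower.tail = FALSE)

print(signif(rbind(rawTail, scaledTail), 3))
```

The first row increases with the SD, while the second row is constant. The apparent "heavier tail" is purely an artifact of scale.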

One way to incorporate scale

The discussion above illustrates the need to take scale into account. We tried to do this in the previous post by scaling each distribution by its own SD, but that idea broke down for small degrees of freedom.

Here’s an idea: Pick some threshold x'. For each random variable T, find the scale factor k such that P \{ kT > x' \} = P \{ \mathcal{N}(0, 1) > x' \}. For this value of k, kT and \mathcal{N}(0, 1) are on the same scale w.r.t. this threshold. We then compare the tail probabilities of kT and \mathcal{N}(0, 1) (instead of T and \mathcal{N}(0, 1)).
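The scale factor can be written in closed form. In the sketch below, F_{df} (the CDF of the t distribution with df degrees of freedom) and \Phi (the standard normal CDF) are notation introduced here for convenience, and we assume x' > 0 so that k > 0.

```latex
\begin{aligned}
P\{ kT > x' \} &= P\{ T > x'/k \} = 1 - F_{df}(x'/k), \\
P\{ \mathcal{N}(0, 1) > x' \} &= 1 - \Phi(x'), \\
\text{so}\quad F_{df}(x'/k) = \Phi(x')
\quad&\Longrightarrow\quad
k = \frac{x'}{F_{df}^{-1}\bigl(\Phi(x')\bigr)}.
\end{aligned}
```

Since x' > 0 gives \Phi(x') > 1/2, the quantile in the denominator is positive, so k is well defined and positive.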

Finding k is not hard: here’s a three-line function that does it for the t distribution in R:

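getScaleFactor <- function(df, threshold) {
  # Upper-tail probability of N(0, 1) at the threshold
  tailProb <- pnorm(threshold, lower.tail = FALSE)
  # Point that T ~ t_df exceeds with that same probability
  tQuantile <- qt(tailProb, df = df, lower.tail = FALSE)
  # Scaling T by k moves that point onto the threshold
  return(threshold / tQuantile)
}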

Let’s plot the log10 of the tail probability P \{ kT > x \} with T \sim t_{df} against x for various values of x and df, with the scale factor k computed as above:

By definition, the tail probabilities will coincide when x is equal to the threshold used to compute the scale factors. We now see a clear trend with no breakdown: for x beyond the threshold, the smaller the value of df, the larger the tail probability P \{ kT > x \}.
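Both claims can be checked numerically. The sketch below assumes a threshold of 2 and a few df values, all chosen for illustration. `scaledTailProb` is a hypothetical helper name, and `getScaleFactor` is repeated only so that the snippet runs on its own.

```r
getScaleFactor <- function(df, threshold) {
  tailProb <- pnorm(threshold, lower.tail = FALSE)
  tQuantile <- qt(tailProb, df = df, lower.tail = FALSE)
  threshold / tQuantile
}

# P(kT > x) for T ~ t_df, with k chosen to match N(0, 1) at `threshold`.
scaledTailProb <- function(x, df, threshold) {
  k <- getScaleFactor(df, threshold)
  pt(x / k, df = df, lower.tail = FALSE)
}

threshold <- 2                # illustrative threshold
dfValues <- c(3, 5, 10, 30)   # illustrative degrees of freedom

# At the threshold, every scaled t distribution matches the standard normal.
atThreshold <- sapply(dfValues, function(df) scaledTailProb(threshold, df, threshold))

# Beyond the threshold (here x = 4), smaller df gives a larger tail probability.
beyond <- sapply(dfValues, function(df) scaledTailProb(4, df, threshold))

print(signif(rbind(atThreshold, beyond), 3))
```

The `atThreshold` row is constant and equal to `pnorm(threshold, lower.tail = FALSE)`, while the `beyond` row decreases as df increases and stays above the corresponding normal tail probability.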

Another side benefit of this way of looking at tail probabilities is that we can now compare distributions which have infinite variance, or even an undefined mean (like the Cauchy distribution, which is the t distribution with one degree of freedom)! Here is the same plot as above but for smaller degrees of freedom:
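As a quick sanity check that nothing here depends on moments existing, the sketch below computes the scale factor for the Cauchy distribution in three ways and also for df = 0.5. The threshold of 2, the value df = 0.5, and the variable names are illustrative assumptions. The closed form uses the fact that the Cauchy upper quantile at probability p is 1 / \tan(\pi p).

```r
threshold <- 2                                     # illustrative threshold
tailProb <- pnorm(threshold, lower.tail = FALSE)   # P(N(0, 1) > threshold)

# Scale factor for the Cauchy distribution, computed three ways.
kFromT <- threshold / qt(tailProb, df = 1, lower.tail = FALSE)
kFromCauchy <- threshold / qcauchy(tailProb, lower.tail = FALSE)
kClosedForm <- threshold * tan(pi * tailProb)

# The t distribution with df = 0.5 has no mean either, yet k is still finite.
kHalf <- threshold / qt(tailProb, df = 0.5, lower.tail = FALSE)

# Tail probability of the scaled Cauchy at x = 4, next to the normal one.
cauchyTail <- pcauchy(4 / kFromCauchy, lower.tail = FALSE)
normalTail <- pnorm(4, lower.tail = FALSE)

print(c(kFromT = kFromT, kFromCauchy = kFromCauchy, kClosedForm = kClosedForm, kHalf = kHalf))
print(c(cauchyTail = cauchyTail, normalTail = normalTail))
```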

