I’m writing a series of posts on various function options of the `glmnet`

function (from the package of the same name), hoping to give more detail and insight beyond R’s documentation.

In this post, we will focus on the **standardize** option.

For reference, here is the full signature of the `glmnet`

function (v3.0-2):

glmnet(x, y, family = c("gaussian", "binomial", "poisson", "multinomial", "cox", "mgaussian"), weights, offset = NULL, alpha = 1, nlambda = 100, lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04), lambda = NULL, standardize = TRUE, intercept = TRUE, thresh = 1e-07, dfmax = nvars + 1, pmax = min(dfmax * 2 + 20, nvars), exclude, penalty.factor = rep(1, nvars), lower.limits = -Inf, upper.limits = Inf, maxit = 1e+05, type.gaussian = ifelse(nvars < 500, "covariance", "naive"), type.logistic = c("Newton", "modified.Newton"), standardize.response = FALSE, type.multinomial = c("ungrouped", "grouped"), relax = FALSE, trace.it = 0, ...)

Unless otherwise stated, will denote the number of observations, will denote the number of features, and `fit`

will denote the output/result of the `glmnet`

call. The data matrix is denoted by and the response is denoted by .

**standardize**

When `standardize = TRUE`

(default), columns of the data matrix `x`

are standardized, i.e. each column of `x`

has mean 0 and standard deviation 1. More specifically, we have that for each ,

, and .

Why might we want to do this? Standardizing our features before model fitting is common practice in statistical learning. This is because if our features are on vastly different scales, the features with larger scales will tend to dominate the action. (One instance where we might not want to standardize our features is if they are already all measured along the same scale, e.g. meters or kilograms.)

Notice that the standardization here is slightly different from that offered by the `scale`

function: `scale(x, center = TRUE, scale = TRUE)`

gives the standardization

, and .

We verify this with a small data example. Generate data according to the following code:

n <- 100; p <- 5; true_p <- 2 set.seed(950) X <- matrix(rnorm(n * p), nrow = n) beta <- matrix(c(rep(1, true_p), rep(0, p - true_p)), ncol = 1) y <- X %*% beta + 3 * rnorm(n)

Create a version of the data matrix which has standardized columns:

X_centered <- apply(X, 2, function(x) x – mean(x))

Xs <- apply(X_centered, 2, function(x) x / sqrt(sum(x^2) / n))

Next, we run `glmnet`

on `Xs`

and `y`

with both possible options for `standardize`

:

library(glmnet) fit <- glmnet(Xs, y, standardize = TRUE) fit2 <- glmnet(Xs, y, standardize = FALSE)

We can check that we get the same fit in both cases (modulo numerical precision):

sum(fit$lambda != fit2$lambda)

# 0

max(abs(fit$beta – fit2$beta))

# 6.661338e-16

The documentation notes that the coefficients returned are on the original scale. Let's confirm that with our small data set. Run `glmnet`

with the original data matrix and `standardize = TRUE`

:

fit3 <- glmnet(X, y, standardize = TRUE)

For each column , our standardized variables are , where and are the mean and standard deviation of column respectively. If and represent the model coefficients of `fit2`

and `fit3`

respectively, then we should have

i.e. we should have and for . The code below checks that this is indeed the case (modulo numerical precision):

# get column means and SDs X_mean <- colMeans(X) X_sd <- apply(X_centered, 2, function(x) sqrt(sum(x^2) / n)) # check difference for intercepts fit2_int <- coefficients(fit2)[1,] fit3_int <- coefficients(fit3)[1,] temp <- fit2_int - colSums(diag(X_mean / X_sd) %*% fit2$beta) max(abs(temp - fit3_int)) # 1.110223e-16 # check difference for feature coefficients temp <- diag(1 / X_sd) %*% fit2$beta max(abs(temp - fit3$beta)) # 1.110223e-15

The discussion above has been for the standardization of `x`

. What about standardization for `y`

? The documentation notes that when `family = "gaussian"`

, `y`

is automatically standardized, and the coefficients are unstandardized at the end of the procedure.

More concretely, let the mean and standard deviation of be denoted by and respectively. If running `glmnet`

on standardized `y`

gives intercept and coefficients , then `glmnet`

on unstandardized `y`

will give intercept and coefficients .

Again, this can be verified empirically:

# get mean and SD of y y_mean <- mean(y) y_sd <- sqrt(sum((y - y_mean)^2) / n) # fit model with standardized y fit4 <- glmnet(X, (y - y_mean) / y_sd, standardize = TRUE) # check difference for intercepts fit4_int <- coefficients(fit4)[1,] temp <- fit4_int * y_sd + y_mean max(abs(temp - fit3_int)) # 1.110223e-16 # check difference for feature coefficients max(abs(y_sd * fit4$beta - fit3$beta)) # 8.881784e-16

Pingback: Distilled News | Analytixon

Thanks for this article! It helped me understand what was going on with standardization in glmnet.

I did notice what appears to be a typo. I think the beta_j term is missing from the equation for gamma_0. It should be gamma_0 = – sum_{j=1}^p beta_j mu_j / s_j, no? The beta_j term is included in the code on line 8.

LikeLike

Yes you are right! I’ve fixed it now. Thanks for the note!

LikeLike