I’ve had the privilege of working with Trevor Hastie on an extension of the `glmnet`

package which has just been released. In essence, the `glmnet()`

function’s `family`

parameter can now be any object of class `family`

. **This enables the user to fit any generalized linear model with the elastic net penalty.**

`glmnet v4.0`

is now available on CRAN here. We have also written a vignette which explains and demonstrates this new functionality. This blog post presents what we’ve done from a different perspective, and goes into a bit more detail on how we made it work.

I would love for you to download the package and try out this new functionality! Comments/suggestions/bug reports welcome 🙂

**Background on generalized linear models (GLM)**

Let’s say we have observations with corresponding responses . The simple linear model assumes that the response is a linear combination of features plus some (mean-zero) noise:

* Generalized linear models (GLM)* are an extension of linear models which allow more flexible modeling. A GLM consists of 3 parts:

- A linear predictor ,
- A link function , and
- A random component .

(You can read more about them here.) and are specified by the GLM, while and are parameters that we have to estimate. With these 3 components and the data at hand, we can define the log-likelihood function and estimate and via maximum likelihood estimation. If we define to be the negative log-likelihood associated with observation , this amounts to solving

This can be solved via a technique called * iteratively reweighted least squares (IRLS)*. At a high level this is what it does:

** Step 1: **Compute new weights and new working responses .

** Step 2: **Solve the

*problem*

**weighted least squares (WLS)*** Step 3:* Repeat Steps 1 and 2 until convergence.

Essentially in each cycle we are making a quadratic approximation of the negative log-likelihood, then minimizing that approximation. (For more details, see this webpage or this older post.) The key point here is that once we find an efficient way to do Step 2 (solve WLS), we have a way to solve the GLM optimization problem.

This is how the `glm()`

function in R fits GLMs. We specify the 3 components of the GLM via an object of class `family`

. Below are examples of how we might run logistic regression and probit regression in R:

glm(y ~ x, family = binomial()) # logistic regression glm(y ~ x, family = binomial(link = "probit")) # probit regression

**GLMs with the elastic net penalty**

In many instances, instead of minimizing the log-likelihood we want to minimize the sum of the log-likelihood and a penalty term. The penalty we choose will influence the properties that our solution will have. A popular choice of penalty is the elastic net penalty

Here, and are hyperparameters that the user chooses. The elastic net penalty reduces to the lasso penalty when , and to the ridge penalty when .

The `glmnet`

package solves this minimization problem for a grid of values. The IRLS algorithm used to compute the GLM solution can be easily adapted to compute the solution to :

** Step 1: **Compute new weights and new working responses .

** Step 2: **Solve the

*WLS problem*

**penalized*** Step 3:* Repeat Steps 1 and 2 until convergence.

Step 1 is exactly the same as in the GLM algorithm, so as long as we have an efficient routine for solving , we have a way to optimize the penalized likelihood.

**glmnet v4.0**

While we could do the above for any GLM family in theory, it was not implemented in practice. Before v4.0, `glmnet()`

could only optimize the penalized likelihood for special GLM families (e.g. ordinary least squares, logistic regression, Poisson regression). For each family, which we specified via a character string for the `family`

parameter, we had custom FORTRAN code that ran the modified IRLS algorithm above. While this was computationally efficient, it did not allow us to fit any penalized GLM of our choosing.

From v4.0 onwards, we can do the above for any GLM family * in practice*. We can do so by passing a class

`family`

object to the `family`

parameter instead of a character string. For example, if we want to do probit regression with the elastic net penalty, we would do something like this:glmnet(x, y, family = binomial(link = "probit"))

Underneath the hood, instead of having custom FORTRAN code for each family, we have a FORTRAN subroutine that solves efficiently. Everything surrounding the FORTRAN subroutine is basically the same as `glmnet()`

/`glm()`

. (Only in theory of course: as with most engineering tasks the devil is in the details!)

For the special families that `glmnet`

pre-v4.0 could fit, we still recommend passing a character string to the `family`

parameter as it would run more quickly. (The vignette has more details on this.)