glmnet v4.1: regularized Cox models for (start, stop] and stratified data

My latest work on the glmnet package has just been pushed to CRAN! In this release (v4.1), we extend the scope of regularized Cox models to include (start, stop] data and strata variables. In addition, we provide a survfit method for computing and plotting survival curves from the fitted model (as the survival package does).

Why is this a big deal? As explained in Therneau and Grambsch (2000), the ability to work with start-stop responses opens the door to fitting regularized Cox models with:

  • time-dependent covariates,
  • time-dependent strata,
  • left truncation,
  • multiple time scales,
  • multiple events per subject,
  • independent increment, marginal, and conditional models for correlated data, and
  • various forms of case-cohort models.
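To make this concrete, here is a minimal sketch of what fitting these models can look like. The data are simulated purely for illustration, and the variable names (tstart, tstop, status, strata) and the choice s = 0.05 are my own assumptions rather than anything required by the package; see the vignette for the authoritative walkthrough.

```r
library(glmnet)
library(survival)

set.seed(1)
n <- 200; p <- 10
x <- matrix(rnorm(n * p), n, p)

# Simulated (start, stop] intervals: each row is at risk from tstart to tstop.
tstart <- runif(n, 0, 2)
tstop  <- tstart + rexp(n)       # stop time is later than start time
status <- rbinom(n, 1, 0.7)      # 1 = event, 0 = censored

# Counting-process response, passed to glmnet like any other Cox response.
y <- Surv(tstart, tstop, status)
fit <- glmnet(x, y, family = "cox")

# Strata: each stratum gets its own baseline hazard.
strata <- rep(1:4, length.out = n)
y_strat <- stratifySurv(y, strata)
fit_strat <- glmnet(x, y_strat, family = "cox")

# Survival curve at a single lambda value (s = 0.05 is arbitrary here).
# survfit needs the training x and y to reconstruct the baseline hazard.
sf <- survfit(fit, s = 0.05, x = x, y = y)
plot(sf)
```

The only change from the usual right-censored workflow is the three-argument Surv() call; stratifySurv() simply attaches the strata to the response so that glmnet knows about them.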

glmnet v4.1 is now available on CRAN. We have reorganized the package’s vignettes, and the new functionality is described in the vignette “Regularized Cox Regression” (available in PDF and web versions). Don’t hesitate to reach out if you have questions.

(Note: This is joint work with Trevor Hastie, Balasubramanian Narasimhan and Rob Tibshirani.)

9 thoughts on “glmnet v4.1: regularized Cox models for (start, stop] and stratified data”

  1. Thanks for the update, and really timely it is. I am currently working on regularization to obtain the best lambda value for ridge regression. A vignette on that would be appreciated; there are so many copied examples around, and confusing ones abound.
    Good work.
    Regards,
    Ibiloye


  2. Hi Kenneth,
    Thank you so much for your work on updating the Cox model functionality of the glmnet package. It is great timing, as I have many possible predictors to include in my Cox model and would love to use the lasso penalty to reduce them. My question is this: in the survival package, the coxph function has an “id” argument that lets you identify which rows belong to the same patient when you have time-dependent covariates spread over multiple rows. How can you separate out patients in the glmnet version with time-dependent covariates? I don’t see an equivalent argument. Should you consider each patient to be a stratum and use the stratifySurv function?
    Thank you,
    Natalie


    • I’m not familiar with what the “id” argument does in survival::coxph… Considering each patient as its own stratum means that each patient has its own baseline hazard rate, which doesn’t seem correct to me.

      I guess until I figure out how “id” is used to fit the Cox model it’ll be hard for me to advise. For now, glmnet doesn’t have such an argument, and probably won’t have one until there is a compelling use case for it.


  3. Thanks for the update. I wonder how I can use the predict() function with a Cox model. The vignette refers to the predict function for everything but survival data, so censoring is not accounted for in those examples. I would appreciate any tips or help.


    • For the Cox model, predict with type="link" (default) gives the value of the linear predictor (x^T \beta), while type="response" will give exp(x^T \beta).
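As a rough illustration of the two prediction types, here is a small sketch on simulated right-censored data. The data, the value s = 0.05 and the object names (lp, risk, manual) are assumptions made up for this example.

```r
library(glmnet)
library(survival)

set.seed(1)
n <- 100; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- Surv(rexp(n), rbinom(n, 1, 0.7))   # right-censored response
fit <- glmnet(x, y, family = "cox")

newx <- x[1:3, ]   # rows to predict for
s <- 0.05          # lambda value at which to predict (arbitrary choice)

# Linear predictor x^T beta (the Cox model has no intercept).
lp <- predict(fit, newx = newx, s = s, type = "link")

# Relative risk exp(x^T beta).
risk <- predict(fit, newx = newx, s = s, type = "response")

# The same linear predictor computed by hand from the coefficients.
manual <- newx %*% as.matrix(coef(fit, s = s))

all.equal(risk, exp(lp))
all.equal(as.numeric(manual), as.numeric(lp))
```

Note that neither type returns a survival probability; both are on the relative-risk scale, so censoring only enters through the fitted coefficients.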


  4. Thank you for developing such a useful package! Could you tell me more about (start, stop] data? I want to fit a Cox model with the elastic net on a large dataset of patients, where my dependent variable would be time to cardiovascular event. Since some patients had more than one CV event during their follow-up, I think it might be interesting to use the (start, stop] data model to evaluate the risk of multiple occurrences of cardiovascular events. Is this the right situation for this type of analysis? Thank you!


    • Hi! I’m personally not too familiar with what would be best here. A good starting reference is Section 3.7 of Therneau & Grambsch’s “Modeling survival data”. The R survival package should also have some documentation/vignettes on this.
