glmnet v4.1: regularized Cox models for (start, stop] and stratified data

My latest work on the glmnet package has just been pushed to CRAN! In this release (v4.1), we extend the scope of regularized Cox models to include (start, stop] data and strata variables. In addition, we provide the survfit method for plotting survival curves based on the model (as the survival package does).
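Here is a minimal sketch of the new functionality on simulated data. The variable names (tstart, tstop, event, strata), the sample sizes and the penalty value s = 0.05 are illustrative choices only, not taken from the package documentation. The sketch fits a lasso-penalized Cox model to (start, stop] data, fits a stratified version via stratifySurv(), and plots a survival curve with the survfit method. As far as we understand it, the survfit method needs the training x and y passed back in.

```r
library(glmnet)
library(survival)  # provides Surv() and the survfit() generic

set.seed(1)
n <- 200; p <- 10
x <- matrix(rnorm(n * p), n, p)

# Hypothetical (start, stop] data: entry time plus an exit time whose
# hazard increases with the first covariate
tstart <- runif(n, 0, 2)
tstop  <- tstart + rexp(n, rate = exp(0.5 * x[, 1]))
event  <- rbinom(n, 1, 0.7)            # 1 = event, 0 = censored
y <- Surv(tstart, tstop, event)

# Lasso-penalized Cox model on (start, stop] data
fit <- glmnet(x, y, family = "cox")

# Stratified fit: each stratum gets its own baseline hazard
strata <- sample(1:2, n, replace = TRUE)
y_str <- stratifySurv(y, strata)
fit_str <- glmnet(x, y_str, family = "cox")

# Survival curve from the penalized model at one lambda value
sf <- survfit(fit, s = 0.05, x = x, y = y)
plot(sf)
```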

Why is this a big deal? As explained in Therneau and Grambsch (2000), the ability to work with start-stop responses opens the door to fitting regularized Cox models with:

- time-dependent covariates,
- time-dependent strata,
- left truncation,
- multiple time scales,
- multiple events per subject,
- independent increment, marginal, and conditional models for correlated data, and
- various forms of case-cohort models.

glmnet v4.1 is now available on CRAN. We have reorganized the package’s vignettes, and the new functionality is described in the vignette “Regularized Cox Regression” (PDF version/web version). Don’t hesitate to reach out if you have questions.

(Note: This is joint work with Trevor Hastie, Balasubramanian Narasimhan and Rob Tibshirani.)

9 thoughts on “glmnet v4.1: regularized Cox models for (start, stop] and stratified data”

Thanks for the update; it is really timely. I am currently working on regularization to obtain the best lambda value for ridge regression. A vignette on that would be appreciated: there are so many copied examples, and confusing ones abound.
Good work.
Regards,
Ibiloye

kjytay on January 14, 2021 at 11:15 pm said:

Hi, the main glmnet vignette would be most helpful for you. You can use cross-validation (CV) to find the value of lambda that gives the smallest CV error (see here: https://glmnet.stanford.edu/articles/glmnet.html#cross-validation).
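A minimal sketch of that approach for ridge regression, assuming simulated Gaussian data (the data-generating coefficients are made up for illustration). In glmnet, alpha = 0 selects the ridge penalty, and cv.glmnet() runs 10-fold cross-validation by default.

```r
library(glmnet)

set.seed(42)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(n)  # illustrative signal

# alpha = 0 gives the ridge penalty
cvfit <- cv.glmnet(x, y, alpha = 0)

cvfit$lambda.min                 # lambda with the smallest mean CV error
cvfit$lambda.1se                 # largest lambda within 1 SE of that minimum
coef(cvfit, s = "lambda.min")    # coefficients at lambda.min
plot(cvfit)                      # CV error curve across the lambda path
```

lambda.1se is a common, more conservative alternative to lambda.min when a more heavily regularized model is preferred.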

Hi Kenneth,
Thank you so much for your work on updating the Cox model functionality of the glmnet package. It is great timing, as I have many possible predictors to include in my Cox model and would love to use the lasso penalty to reduce them. My question is: in the survival package, there is an “id” argument in the coxph function that allows you to identify which rows belong to the same patient when you have time-dependent covariates with multiple lines. How can you separate out patients in the glmnet version with time-dependent covariates? I don’t see an equivalent argument. Should you consider each patient to be a stratum and use the stratifySurv function?
Thank you,
Natalie

kjytay on March 2, 2021 at 10:40 pm said:

I’m not familiar with what the “id” argument does in survival::coxph… Considering each patient as its own stratum means that each patient has its own baseline hazard rate, which doesn’t seem correct to me.

I guess until I figure out how “id” is used to fit the Cox model it’ll be hard for me to advise. For now, glmnet doesn’t have such an argument, and probably won’t have one until there is a compelling use case for it.

- Shaun on March 10, 2023 at 1:53 pm said:
  
  I’ve had a similar question, which has brought me here.

  If you look at the survival vignettes (https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf), you’ll see it used on page 7, when referring to patient IDs with start-stop data:

  coxph(Surv(tstart, tstop, infect) ~ treat + inherit + steroids, data = newcgd, cluster = id)

  It seems to be used sporadically in that document.
  
Thanks for the update. I wonder how I can use the predict() function with a Cox model. The vignette only describes the predict function for data other than survival data, where no censoring has to be accounted for. I would appreciate any tips or help.

kjytay on June 17, 2021 at 6:09 pm said:

For the Cox model, predict with type="link" (default) gives the value of the linear predictor $x^T \beta$, while type="response" will give $\exp(x^T \beta)$.
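A minimal sketch of the two prediction types, assuming simulated right-censored data; the penalty value s = 0.05 and the choice of the first five rows as "new" observations are arbitrary. The final line checks that the "response" predictions are the exponentiated "link" predictions, i.e. relative risks rather than survival probabilities (survival curves come from the survfit method instead).

```r
library(glmnet)
library(survival)

set.seed(7)
n <- 150; p <- 5
x <- matrix(rnorm(n * p), n, p)
time <- rexp(n, rate = exp(x[, 1]))   # hazard increases with x1
status <- rbinom(n, 1, 0.8)           # 1 = event, 0 = censored
fit <- glmnet(x, Surv(time, status), family = "cox")

newx <- x[1:5, ]
eta  <- predict(fit, newx = newx, s = 0.05)                     # x^T beta
risk <- predict(fit, newx = newx, s = 0.05, type = "response")  # exp(x^T beta)

all.equal(risk, exp(eta))
```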

Thank you for developing such a useful package! Could you tell me more about (start, stop] data? I want to fit a Cox model with the elastic net on a large dataset of patients, and my dependent variable would be time to cardiovascular (CV) event. Since some patients had more than one CV event during their follow-up, I think it might be interesting to exploit the (start, stop] data model to evaluate the risk of multiple occurrences of cardiovascular events. Is this the right situation to apply this type of analysis? Thank you!

kjytay on October 18, 2021 at 4:31 am said:

Hi! I’m personally not too familiar with what would be best here. A good starting reference is Section 3.7 of Therneau & Grambsch’s “Modeling Survival Data”. The R survival package should also have some documentation/vignettes on this.

Statistical Odds & Ends
