I recently encountered a situation where I wanted to run several linear models, but where the response variables would depend on previous steps in the data analysis pipeline. Let me illustrate using the `mtcars` dataset:

    data(mtcars)
    head(mtcars)
    #>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
    #> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    #> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    #> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    #> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    #> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    #> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Let’s say I wanted to fit a linear model of `mpg` vs. `hp` and get the coefficients. This is easy:

    lm(mpg ~ hp, data = mtcars)$coefficients
    #> (Intercept)          hp
    #> 30.09886054 -0.06822828

But what if I wanted to fit a linear model of `y` vs. `hp`, where `y` is a response variable that I won’t know until runtime? Or what if I want to fit 3 linear models: each of `mpg`, `disp`, `drat` vs. `hp`? Or what if I want to fit 300 such models? There has to be a way to do this programmatically.

It turns out that there are at least 4 different ways to achieve this in R. For all these methods, let’s assume that the responses we want to fit models for are in a character vector:

    response_list <- c("mpg", "disp", "drat")

Here are the 4 ways I know (in decreasing order of preference):

**1. as.formula()**

`as.formula()` converts a string to a formula object. Hence, we can programmatically create the formula we want as a string, then pass that string to `as.formula()`:

    for (y in response_list) {
        lmfit <- lm(as.formula(paste(y, "~ hp")), data = mtcars)
        print(lmfit$coefficients)
    }
    #> (Intercept)          hp
    #> 30.09886054 -0.06822828
    #> (Intercept)      hp
    #>    20.99248 1.42977
    #> (Intercept)          hp
    #>  4.10990867 -0.00349959
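As a variation on the same idea, base R's `stats::reformulate()` builds the formula from character vectors directly, which avoids assembling the string by hand. This is only a sketch and not part of the original approach; `fits` is a name introduced here for illustration.

```r
# Responses to model, as defined earlier in the post
response_list <- c("mpg", "disp", "drat")

# reformulate() builds the formula `y ~ hp` from strings, no paste() needed
fits <- lapply(response_list, function(y) {
  lm(reformulate("hp", response = y), data = mtcars)
})
names(fits) <- response_list

# Coefficients for one of the fitted models
coef(fits[["mpg"]])
```

Storing the fits in a named list keeps each model available for later inspection, rather than only printing its coefficients.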

**2. Don’t specify the data option**

Passing the `data = mtcars` option to `lm()` gives us more succinct and readable code. However, `lm()` also accepts a formula that refers to the response and predictor vectors directly. Note that the slope coefficient is then labelled with the full expression `mtcars$hp` rather than `hp`:

    for (y in response_list) {
        lmfit <- lm(mtcars[[y]] ~ mtcars$hp)
        print(lmfit$coefficients)
    }
    #> (Intercept)   mtcars$hp
    #> 30.09886054 -0.06822828
    #> (Intercept) mtcars$hp
    #>    20.99248   1.42977
    #> (Intercept)   mtcars$hp
    #>  4.10990867 -0.00349959

**Edit:** Commenter Tommaso Gennari shared a really nice solution that makes use of the fact that when you give `lm()` just a data frame, the first column is used as a dependent variable and the remaining columns are treated as independent variables:

    for (y in response_list) {
        lmfit <- lm(mtcars[, c(y, "hp")])
        print(lmfit$coefficients)
    }
    #> (Intercept)          hp
    #> 30.09886054 -0.06822828
    #> (Intercept)      hp
    #>    20.99248 1.42977
    #> (Intercept)          hp
    #>  4.10990867 -0.00349959
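To see why the data frame trick works, one can inspect the formula that R derives from a data frame. The following is a small check, assuming base R's `formula()` method for data frames; `sub_df`, `derived`, `fit_df` and `fit_ref` are names introduced here for illustration.

```r
# Two-column data frame: response first, predictor second
sub_df <- mtcars[, c("mpg", "hp")]

# formula() on a data frame treats the first column as the response
derived <- formula(sub_df)
print(derived)

# The fit from the data frame matches the fit from an explicit formula
fit_df  <- lm(sub_df)
fit_ref <- lm(mpg ~ hp, data = mtcars)
all.equal(coef(fit_df), coef(fit_ref))
```

Column order matters here: reordering the columns of the data frame changes which variable is treated as the response.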

**3. get()**

`get()`

searches for an R object by name and returns that object if it exists.

    for (y in response_list) {
        lmfit <- lm(get(y) ~ hp, data = mtcars)
        print(lmfit$coefficients)
    }
    #> (Intercept)          hp
    #> 30.09886054 -0.06822828
    #> (Intercept)      hp
    #>    20.99248 1.42977
    #> (Intercept)          hp
    #>  4.10990867 -0.00349959
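One practical difference worth knowing: with `get()`, the fitted object records the literal expression `get(y)` rather than the name of the response. The sketch below compares the stored formulas; `fit_get` and `fit_formula` are names introduced here for illustration.

```r
y <- "mpg"

fit_get     <- lm(get(y) ~ hp, data = mtcars)
fit_formula <- lm(as.formula(paste(y, "~ hp")), data = mtcars)

# The coefficients agree...
all.equal(coef(fit_get), coef(fit_formula))

# ...but the stored formulas differ
deparse(formula(fit_get))      # "get(y) ~ hp"
deparse(formula(fit_formula))  # "mpg ~ hp"
```

This matters mostly when printing or summarising many models, since the `get()` version no longer tells you which response each fit belongs to.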

**4. eval(parse())**

This one is a little complicated. `parse()`

returns the parsed but unevaluated expressions, while `eval()`

evaluates those expressions (in a specified environment).

    for (y in response_list) {
        lmfit <- lm(eval(parse(text = y)) ~ hp, data = mtcars)
        print(lmfit$coefficients)
    }
    #> (Intercept)          hp
    #> 30.09886054 -0.06822828
    #> (Intercept)      hp
    #>    20.99248 1.42977
    #> (Intercept)          hp
    #>  4.10990867 -0.00349959

Of course, for any of these methods, we could replace the `for` loop with a function from the `apply()` family (such as `lapply()` or `sapply()`) or with `purrr::map()`.
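As a sketch of a loop-free version, the following uses base R's `sapply()` together with the `as.formula()` approach to collect all coefficients in a single matrix. `coef_mat` is a name introduced here for illustration.

```r
response_list <- c("mpg", "disp", "drat")

# One column of coefficients per response variable
coef_mat <- sapply(response_list, function(y) {
  coef(lm(as.formula(paste(y, "~ hp")), data = mtcars))
})

coef_mat
```

Because `response_list` is a character vector, `sapply()` uses its elements as column names, so each column is labelled with the response it belongs to.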


You reminded me of something similar that I did 8 years ago .. https://stats.stackexchange.com/questions/6856/aggregating-results-from-linear-model-runs-r/6862#6862


Actually, you could simplify method 2 even more using: `lm(mtcars[, c(y, "hp")])`

(I have not tested this expression and there might be a detail I am not foreseeing; however my point is that when you feed into lm just a data frame, the first column is used as a dependent variable, and all the remaining as independent) – hope this helps!


Wow that’s really nifty! I’ll add it to the post later 🙂
