How do I create a simple linear regression function in R that iterates over the entire dataframe?

I am working through ISLR and am stuck on a question. Basically, I am trying to create a function that iterates through an entire dataframe. It is question 3.7, 15a.

For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions.

So my thinking is like this:

y = Boston$crim 
x = Boston[, -crim] 
TestF1 = lm(y ~ x) 
summary(TestF1) 

But this is nowhere near the right answer. I was hoping to break it down by:

  1. Iterate over the entire dataframe with crim as my response and the others as predictors
  2. Extract the p values that are statistically significant (or extract the ones insignificant)
  3. Move on to the next question (which is considerably easier)

But I am stuck. I've googled but can't find anything. I tried this combn(Boston) thing but it didn't work either. Please help, thank you.

1 answer

  • answered 2020-07-29 17:58 slava-kohut

    If your problem is how to iterate over a data frame, here is an example for mtcars (mpg is the target variable and the rest are predictors, with one single-predictor model per column). The idea is to generate strings and convert them to formulas:

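    # mpg is the first column of mtcars, so every other column is a predictor
    predictors <- names(mtcars)[-1]
    lms <- vector(mode = "list", length = length(predictors))
    names(lms) <- predictors
    
    for (i in seq_along(lms)){
      # Build a formula such as mpg ~ cyl from a string, then fit the model
      lms[[i]] <- lm(as.formula(paste0("mpg~", predictors[i])), data = mtcars)
    }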
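
To get from the fitted models to the p-values the question asks about, one option is to wrap the loop in a small helper. This is only a sketch: the function name simple_lm_pvalues and the 0.05 cutoff are assumptions for illustration, and it assumes every predictor is numeric, so that the slope sits in the second row of the coefficient table.

```r
# Hypothetical helper: fits response ~ predictor once per remaining column
# and returns the p-value of each slope. Assumes numeric predictors.
simple_lm_pvalues <- function(data, response) {
  predictors <- setdiff(names(data), response)
  sapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = data)
    # Row 1 is the intercept, row 2 is the slope of the single predictor
    summary(fit)$coefficients[2, "Pr(>|t|)"]
  })
}

p_values <- simple_lm_pvalues(mtcars, "mpg")

# Predictors below the (assumed) 5% significance threshold
significant <- names(p_values)[p_values < 0.05]
```

For the ISLR exercise, the same call would be simple_lm_pvalues(Boston, "crim"), assuming Boston has been loaded from the MASS package.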
    

    If you want to compare combinations of variables instead, start with a model that includes all predictors and then remove the non-significant ones step by step until you arrive at the best model.
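
A rough sketch of that idea with base R's step() is shown here. Note the assumption: step() drops terms based on AIC rather than on p-values, so it is a stand-in for manual elimination of non-significant predictors, not the same procedure.

```r
# Full model: mpg regressed on every other column of mtcars
full <- lm(mpg ~ ., data = mtcars)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log
reduced <- step(full, direction = "backward", trace = 0)

# Inspect which predictors were kept
formula(reduced)
summary(reduced)
```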