Home > Net >  dplyr get linear regression coefficients
dplyr get linear regression coefficients

Time:01-12

I'm wondering if there is a better way is to get linear regression coefficients as columns in dplyr. Here is some sample data.

mydata <- 
  data.frame(
    Site = c(1,1,1,1,1,1,1,1),
    Site1 = c(2,3,2,3,2,3,2,3),
    Age = c(17, 52, 19, 18, 62, 53, 41, 24),
    Gender = c(1,2,1,1,2,2,2,1),
    Outcome = c(1,1,1,1,0,0,0,1)
  ) 

I wrote this helper function to turn summary(.data)$coefficients into columns

GetCoefficients <- function(.data){
  AllData <- data.frame()
  AllData[1, ] <- ""
  col_names <- colnames(summary(.data)$coefficients)
  row_names <- rownames(summary(.data)$coefficients)
  row_len <- length(row_names)
  col_len <- length(col_names)-1
  x <- summary(.data)$coefficients
  for (i in 1:length(x)){
    AllData <- AllData %>%
      mutate(!!paste0(row_names[ifelse(i%%row_len != 0, i%%row_len, row_len)],
                      "_",col_names[ceiling(i/col_len)]) := x[i])
  }
  return(AllData)
} 

Using the helper function I can put coefficients into my data.frame()

Linear_regression <- mydata %>%
  pivot_longer(starts_with("Site"),
               names_to = ".value",
               names_pattern = "(^Site)") %>% 
  group_by(Site) %>% 
  do(Reg = lm(Outcome ~ Age   Gender, data = .)) %>%
  mutate(rsq = summary(Reg)$r.squared) %>% 
  mutate(fun = GetCoefficients(Reg))

CodePudding user response:

Here is a combination of tidyverse and broom package to get your desired output.

Very handy here is group_split -> you get a list and then you iterate with purrrs map_dfr (by the way with map_dfr you get a dataframe otherwise with map you get a list) your regression lm(... through each list element. Using brooms glance gives the desired output:

library(tidyverse)
library(broom)

mydata %>%
  pivot_longer(starts_with("Site"),
               names_to = ".value",
               names_pattern = "(^Site)") %>% 
  mutate(Site=as.factor(Site)) %>% 
  group_by(Site) %>% 
  group_split() %>% 
  map_dfr(.f = function(df){
    lm(Outcome ~ Age Gender, data=df) %>% 
      glance() %>% 
      add_column(Site = unique(df$Site), .before = 1)
  })
  Site  r.squared adj.r.squared    sigma statistic  p.value    df logLik    AIC     BIC deviance df.residual  nobs
  <fct>     <dbl>         <dbl>    <dbl>     <dbl>    <dbl> <dbl>  <dbl>  <dbl>   <dbl>    <dbl>       <int> <int>
1 1         0.6           0.44  3.87e- 1  3.75e  0 1.01e- 1     2  -1.88   11.8   12.1  7.5 e- 1           5     8
2 2         1             1     2.22e-16  1.01e 31 2.22e-16     2 141.   -275.  -277.   4.93e-32           1     4
3 3         0.351        -0.946 6.97e- 1  2.71e- 1 8.05e- 1     2  -1.46   10.9    8.47 4.86e- 1           1     4
  •  Tags:  
  • Related