How to iterate over multiple vectors to run multiple regressions in R

DATA BELOW

library(tibble)
analysis <- tibble(off_race = c("hispanic", "hispanic", "white", "white", "hispanic", "white", "hispanic", "white", "white", "white", "hispanic"),
  any_black_uof = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  any_black_arrest = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  prop_white_scale = c(0.866619646027524, -1.14647499712298, 1.33793539994219, 0.593565300512359, -0.712819809606193, 0.3473585867755, -1.37025501425243, 1.16596624239715, 0.104521426674564, 0.104521426674564, -1.53728347122581),
  prop_hisp_scale = c(-0.347382203637802, 1.54966785579018, -0.833021026477168, -0.211470492567308, 1.48353691981021, 0.421968013870802, 2.63739845069911, -0.61002505397242, 0.66674880256898, 0.66674880256898, 2.93190487813111))

I would like to run a series of regressions that iterate over these vectors:

officer_race = c("black", "white", "hispanic")
primary_ind<-c("prop_white_scale","prop_hisp_scale","prop_black_scale")
outcome<-c("any_black_uof","any_white_uof","any_hisp_uof","any_black_arrest","any_white_arrest","any_hisp_arrest","any_black_stop","any_white_stop","any_hisp_stop")

Also of note, I would like to use the fixest package, where the regressions would look like this:

feols(any_black_uof ~ prop_white_scale, data = analysis[analysis$off_race == "black", ])
feols(any_black_uof ~ prop_white_scale, data = analysis[analysis$off_race == "white", ])
feols(any_black_uof ~ prop_white_scale, data = analysis[analysis$off_race == "hispanic", ])
feols(any_black_uof ~ prop_hisp_scale, data = analysis[analysis$off_race == "black", ])
feols(any_black_uof ~ prop_hisp_scale, data = analysis[analysis$off_race == "white", ])

etc., iterating through all possible combinations and collecting the fitted models in a list.

Is this possible?

CodePudding user response：

As a self-contained example, I am using the built-in mtcars dataset.

I am using the cyl variable as the equivalent of race in your example.

primary_ind <- c("mpg", "gear", "disp")
outcome <- c("hp", "wt")

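result <- lapply(split(mtcars, mtcars$cyl), function(x) {
  # x is the subset of mtcars for one cyl value
  sapply(primary_ind, function(y) {
    sapply(outcome, function(z) {
      # outcome (z) on the left-hand side, explanatory variable (y) on the right
      lm(reformulate(y, response = z), data = x)
    }, simplify = FALSE)
  }, simplify = FALSE)
})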

result

First we split the data by cyl so that we have a list of three data frames, one for each unique value (4, 6 and 8). Then, for each of these datasets, we loop over the primary_ind and outcome values and fit lm for each combination.

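sapply with simplify = FALSE returns a list named after the character vector it iterates over, so every level of result is named: first by cyl value, then by primary_ind, then by outcome. This makes it easy to identify each model, e.g. result[["4"]][["mpg"]][["hp"]].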
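As a minimal sketch of how the nested list can be queried and flattened, the block below rebuilds the same three-level structure on mtcars and then collapses it into a single named list. It assumes the outcome belongs on the left-hand side of each formula; the names fit and flat are made up for illustration.

```r
primary_ind <- c("mpg", "gear", "disp")
outcome <- c("hp", "wt")

# Nested list: cyl value -> explanatory variable -> outcome
result <- lapply(split(mtcars, mtcars$cyl), function(x) {
  sapply(primary_ind, function(y) {
    sapply(outcome, function(z) {
      lm(reformulate(y, response = z), data = x)
    }, simplify = FALSE)
  }, simplify = FALSE)
})

# Look up one model by subset, explanatory variable, and outcome
fit <- result[["4"]][["mpg"]][["hp"]]
formula(fit)

# Flatten one level at a time so the lm objects themselves stay intact;
# the names of the three levels are joined with dots
flat <- unlist(unlist(result, recursive = FALSE), recursive = FALSE)
length(flat)
names(flat)[1]
```

A flat list like this is often easier to pass to lapply (for example to pull out coefficients) than the nested version, at the cost of having to parse the combined names.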

CodePudding user response：

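You can use fixest's built-in multiple estimation tools: see the package's dedicated vignette on multiple estimations. You also need to understand its formula expansion tools, such as the .[ ] operator used below, which are described in the package documentation.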

It seems you want to iterate over subsets of the data for different explanatory and dependent variables. Use the split argument for the subsets, sw() (stepwise) for the explanatory variables, and c() for the dependent variables.

Here is a reproducible example:

library(fixest)

base = setNames(iris, c("y1", "y2", "x1", "x2", "species"))

lhs = c("y1", "y2")
rhs = c("x1", "x2")

mult_est = feols(.[lhs] ~  sw(.[, rhs]), base, split = ~ species)

etable(mult_est)
#>                         mult_est.1        mult_est.2        mult_est.3        mult_est.4
#> Sample (species)            setosa            setosa            setosa            setosa
#> Dependent Var.:                 y1                y1                y2                y2
#>                                                                                         
#> Constant         4.213*** (0.4156) 4.777*** (0.1239) 2.861*** (0.4564) 3.222*** (0.1349)
#> x1                0.5423. (0.2823)                     0.3879 (0.3100)                  
#> x2                                  0.9302. (0.4637)                     0.8372 (0.5049)
#> ________________ _________________ _________________ _________________ _________________
#> S.E. type                      IID               IID               IID               IID
#> Observations                    50                50                50                50
#> R2                         0.07138           0.07734           0.03158           0.05417
#> Adj. R2                    0.05204           0.05812           0.01140           0.03447
#> 
#>                          mult_est.5        mult_est.6         mult_est.7        mult_est.8
#> Sample (species)         versicolor        versicolor         versicolor        versicolor
#> Dependent Var.:                  y1                y1                 y2                y2
#>                                                                                           
#> Constant          2.408*** (0.4463) 4.045*** (0.4229)   1.175** (0.3421) 1.373*** (0.2296)
#> x1               0.8283*** (0.1041)                   0.3743*** (0.0798)                  
#> x2                                  1.426*** (0.3155)                    1.054*** (0.1713)
#> ________________ __________________ _________________ __________________ _________________
#> S.E. type                       IID               IID                IID               IID
#> Observations                     50                50                 50                50
#> R2                          0.56859           0.29862            0.31419           0.44089
#> Adj. R2                     0.55960           0.28401            0.29990           0.42925
#> 
#>                          mult_est.9       mult_est.10       mult_est.11        mult_est.12
#> Sample (species)          virginica         virginica         virginica          virginica
#> Dependent Var.:                  y1                y1                y2                 y2
#>                                                                                           
#> Constant            1.060* (0.4668) 5.269*** (0.6556) 1.673*** (0.4310)  1.695*** (0.2921)
#> x1               0.9957*** (0.0837)                   0.2343** (0.0773)                   
#> x2                                   0.6508* (0.3207)                   0.6314*** (0.1429)
#> ________________ __________________ _________________ _________________ __________________
#> S.E. type                       IID               IID               IID                IID
#> Observations                     50                50                50                 50
#> R2                          0.74688           0.07902           0.16084            0.28915
#> Adj. R2                     0.74161           0.05983           0.14335            0.27434
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
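Since the question asks for a list of fitted models, here is a hedged sketch of how the multiple-estimation result might be turned into one. It assumes the installed fixest version provides an as.list method for fixest_multi objects; model_list and first_model are illustrative names.

```r
library(fixest)

base = setNames(iris, c("y1", "y2", "x1", "x2", "species"))
lhs = c("y1", "y2")
rhs = c("x1", "x2")

mult_est = feols(.[lhs] ~ sw(.[, rhs]), base, split = ~ species)

# Convert the fixest_multi object into an ordinary list of fitted models
# (assumes as.list() is supported for fixest_multi in your fixest version)
model_list = as.list(mult_est)
length(model_list)

# Each element is a single estimation that can be inspected on its own
first_model = model_list[[1]]
coef(first_model)
```

Keeping the fixest_multi object is usually more convenient for tables via etable, while the plain list is handy when each model needs to be passed to other functions one by one.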