Data below:
library(tibble)
analysis <- tibble(off_race = c("hispanic", "hispanic", "white", "white", "hispanic", "white", "hispanic", "white", "white", "white", "hispanic"),
  any_black_uof = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  any_black_arrest = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  prop_white_scale = c(0.866619646027524, -1.14647499712298, 1.33793539994219, 0.593565300512359, -0.712819809606193, 0.3473585867755, -1.37025501425243, 1.16596624239715, 0.104521426674564, 0.104521426674564, -1.53728347122581),
  prop_hisp_scale = c(-0.347382203637802, 1.54966785579018, -0.833021026477168, -0.211470492567308, 1.48353691981021, 0.421968013870802, 2.63739845069911, -0.61002505397242, 0.66674880256898, 0.66674880256898, 2.93190487813111))
I would like to run a series of regressions that iterate over these vectors:
officer_race <- c("black", "white", "hispanic")
primary_ind <- c("prop_white_scale", "prop_hisp_scale", "prop_black_scale")
outcome <- c("any_black_uof", "any_white_uof", "any_hisp_uof", "any_black_arrest", "any_white_arrest", "any_hisp_arrest", "any_black_stop", "any_white_stop", "any_hisp_stop")
I would also like to use the fixest package, with regressions that look like this:
library(fixest)
feols(any_black_uof ~ prop_white_scale, data = analysis[analysis$off_race == "black", ])
feols(any_black_uof ~ prop_white_scale, data = analysis[analysis$off_race == "white", ])
feols(any_black_uof ~ prop_white_scale, data = analysis[analysis$off_race == "hispanic", ])
feols(any_black_uof ~ prop_hisp_scale, data = analysis[analysis$off_race == "black", ])
feols(any_black_uof ~ prop_hisp_scale, data = analysis[analysis$off_race == "white", ])
and so on, iterating through all possible combinations and collecting the fitted models in a list, similar to a list of lm objects.
Is this possible?
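A side note on subsetting, shown on a small hypothetical data frame (df and its values are made up for illustration): to keep the rows for one officer race, the condition has to come before a comma, as in df[cond, ]. Without the comma, df[cond] treats the logical vector as a column index, so it does not select rows.

```r
# Hypothetical toy data, for illustration only
df <- data.frame(off_race = c("white", "hispanic", "white"),
                 x = c(1, 2, 3))

# Condition before the comma: keep the rows where off_race is "white"
white_rows <- df[df$off_race == "white", ]

# Equivalent row subset with base R's subset()
white_rows2 <- subset(df, off_race == "white")
```

Either form can be passed as the data argument of a regression call.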
Answer:
Since you did not provide sample data, I am using the mtcars dataset as an example.
I am using the cyl variable as the equivalent of race in your example.
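primary_ind <- c("mpg", "gear", "disp")  # explanatory variables
outcome <- c("hp", "wt")                 # dependent variables
# One element per cyl value; each holds a named list per primary_ind,
# which in turn holds one model per outcome
result <- lapply(split(mtcars, mtcars$cyl), function(x) {
  sapply(primary_ind, function(y) {
    sapply(outcome, function(z) {
      # regress outcome z on explanatory variable y within the subset x
      lm(as.formula(paste(z, y, sep = " ~ ")), data = x)
    }, simplify = FALSE)
  }, simplify = FALSE)
})
result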
First, we split the data by cyl so that we get a list of three datasets, one for each unique value (4, 6 and 8). Then, for each of these datasets, we loop over the primary_ind and outcome values and fit lm for each combination.
Using sapply with simplify = FALSE (instead of lapply) keeps the primary_ind and outcome values as the names of the nested lists, so each model can be identified by name, e.g. result[["4"]][["mpg"]][["hp"]].
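If a single flat list is more convenient than a nested one, a possible alternative (a sketch, not part of the original answer) is to enumerate every combination with expand.grid() and fit one lm per row with Map(). The names such as "4_hp_mpg" (cyl value, outcome, explanatory variable) are an arbitrary naming scheme chosen for this example, and each model regresses the outcome on the explanatory variable.

```r
# Sketch: the same kind of models, stored in one flat named list
primary_ind <- c("mpg", "gear", "disp")  # explanatory variables
outcome <- c("hp", "wt")                 # dependent variables

# One row per (cyl value, outcome, explanatory variable) combination
grid <- expand.grid(cyl = sort(unique(mtcars$cyl)),
                    outcome = outcome,
                    primary_ind = primary_ind,
                    stringsAsFactors = FALSE)

# Fit outcome ~ explanatory variable on the rows with that cyl value
models <- Map(function(cyl, z, y) {
  lm(reformulate(y, response = z), data = mtcars[mtcars$cyl == cyl, ])
}, grid$cyl, grid$outcome, grid$primary_ind)

# Hypothetical naming scheme: "<cyl>_<outcome>_<explanatory variable>"
names(models) <- paste(grid$cyl, grid$outcome, grid$primary_ind, sep = "_")
```

A single model can then be inspected with summary(models[["4_hp_mpg"]]). The same pattern should carry over to the question's data by swapping mtcars and cyl for analysis and off_race, and lm() for fixest's feols(), which also takes a formula and a data argument.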
Answer:
You can use the built-in multiple estimation tools of fixest: see the package's dedicated vignette on multiple estimations. You also need to understand its formula expansion tools, in particular the dot square bracket operator .[ ].
It seems you want to iterate over subsets of the data for different explanatory and dependent variables.
Use the split argument for the subsets, sw() (stepwise) for the explanatory variables, and c() for the dependent variables.
Here is a reproducible example:
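library(fixest)
# Rename the iris columns to generic names
base = setNames(iris, c("y1", "y2", "x1", "x2", "species"))
lhs = c("y1", "y2")
rhs = c("x1", "x2")
# .[lhs]: the variables named in lhs are the dependent variables
# sw(.[, rhs]): one model per variable in rhs (stepwise)
# split = ~ species: every model is estimated separately for each species
mult_est = feols(.[lhs] ~ sw(.[, rhs]), base, split = ~ species)
etable(mult_est)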
#>                         mult_est.1        mult_est.2        mult_est.3        mult_est.4
#> Sample (species)            setosa            setosa            setosa            setosa
#> Dependent Var.:                 y1                y1                y2                y2
#>
#> Constant         4.213*** (0.4156) 4.777*** (0.1239) 2.861*** (0.4564) 3.222*** (0.1349)
#> x1                0.5423. (0.2823)                     0.3879 (0.3100)
#> x2                                  0.9302. (0.4637)                     0.8372 (0.5049)
#> ________________ _________________ _________________ _________________ _________________
#> S.E. type                      IID               IID               IID               IID
#> Observations                    50                50                50                50
#> R2                         0.07138           0.07734           0.03158           0.05417
#> Adj. R2                    0.05204           0.05812           0.01140           0.03447
#>
#>                          mult_est.5        mult_est.6         mult_est.7        mult_est.8
#> Sample (species)         versicolor        versicolor         versicolor        versicolor
#> Dependent Var.:                  y1                y1                 y2                y2
#>
#> Constant          2.408*** (0.4463) 4.045*** (0.4229)   1.175** (0.3421) 1.373*** (0.2296)
#> x1               0.8283*** (0.1041)                   0.3743*** (0.0798)
#> x2                                  1.426*** (0.3155)                    1.054*** (0.1713)
#> ________________ __________________ _________________ __________________ _________________
#> S.E. type                       IID               IID                IID               IID
#> Observations                     50                50                 50                50
#> R2                          0.56859           0.29862            0.31419           0.44089
#> Adj. R2                     0.55960           0.28401            0.29990           0.42925
#>
#>                          mult_est.9       mult_est.10       mult_est.11        mult_est.12
#> Sample (species)          virginica         virginica         virginica          virginica
#> Dependent Var.:                  y1                y1                y2                 y2
#>
#> Constant            1.060* (0.4668) 5.269*** (0.6556) 1.673*** (0.4310)  1.695*** (0.2921)
#> x1               0.9957*** (0.0837)                   0.2343** (0.0773)
#> x2                                   0.6508* (0.3207)                   0.6314*** (0.1429)
#> ________________ __________________ _________________ _________________ __________________
#> S.E. type                       IID               IID               IID                IID
#> Observations                     50                50                50                 50
#> R2                          0.74688           0.07902           0.16084            0.28915
#> Adj. R2                     0.74161           0.05983           0.14335            0.27434
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
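To make clear what the single feols() call computes, here is a base R sketch (stats::lm only, no fixest) that fits the same 12 models one at a time: one per combination of species, dependent variable and explanatory variable, in the same order as the etable columns. Its point estimates should agree with the table above; the list names are an arbitrary choice for this example.

```r
# Rename iris columns as in the fixest example
base <- setNames(iris, c("y1", "y2", "x1", "x2", "species"))
lhs <- c("y1", "y2")
rhs <- c("x1", "x2")

# expand.grid varies the first column fastest, so the order is
# species (outer), then dependent variable, then explanatory variable
grid <- expand.grid(rhs = rhs, lhs = lhs, species = levels(base$species),
                    stringsAsFactors = FALSE)

# One lm per row of the grid, estimated on that species only
fits <- Map(function(sp, y, x) {
  lm(reformulate(x, response = y), data = base[base$species == sp, ])
}, grid$species, grid$lhs, grid$rhs)
names(fits) <- paste(grid$species, grid$lhs, grid$rhs, sep = ".")
```

For example, coef(fits[["setosa.y1.x1"]]) corresponds to the mult_est.1 column. Keeping everything in one feols() call, as in the answer, is shorter and lets etable() tabulate all the results directly.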
