How to bootstrap a linear regression and estimate confidence intervals in R?-CodePudding

I have data-set where the bootstraps were performed such that only values within the two main factors replicate/ level were replaced.

replicate level high.density low.density
    1     low    14          36
    1     low    54          31
    1     mid    82          10
    1     mid    24          NA
    2     low    12          28
    2     low    11          45
    2     mid    12          17
    2     mid    NA          24
    2      up    40          10
    2      up    NA           5
    2      up    20           2

##DATA FRAME

 df <- structure(list(replicate = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2), level = c("low", "low", "mid", "mid", "low", "low", "mid", "mid", "up", "up", "up"), high.density = c(14, 54, 82, 24, 12, 11, 12, NA, 40, NA, 20), low.density = c(36, 31, 10, 
    NA, 28, 45, 17, 24, 10, 5, 2)), class = c("spec_tbl_df","tbl_df","tbl", "data.frame"), row.names = c(NA, -11L), spec = structure(list(cols = list(replicate = structure(list(), class = c("collector_double", "collector")), level = structure(list(), class = c("collector_character","collector")), high.density = structure(list(), class = c("collector_double","collector")), low.density = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", "collector")), skip = 1L), class = "col_spec"))
    
    df$replicate <- as.factor(as.numeric(df$replicate))
    df$level <- as.factor(as.character(df$level))

##Creating the data-set needed for boot. Only values with-in the unique replicate/ level were allowed to shuffle (

#A sample code for the plot

df_boot <-  rbindlist(df_list, idcol = 'count')

ggplot(aes(x = low.density, y = high.density), data = df_boot)    
  stat_smooth(aes(group = factor(count)), geom = "line", method = "lm", se = FALSE, color = "red", alpha=0.02)  
  stat_smooth(geom = "line", method = "lm", se = FALSE, color = "black", linetype = "longdash")  
  theme(panel.background = element_blank())   theme(legend.position="none")

CodePudding user response：

We may conduct a non-parametric or parametric bootstrap of the postulated linear model. The important difference between a non-parametric and a parametric bootstrap is that in the non-parametric case, we are going to repeatedly sample from our original dataframe df, whereas in the parametric case, we simulate new data from our original model.

We postulate the following linear model:

high.density_i = b₀ b₁low.density e_i e_i ~ N(0, sigma²)

It is important to first listwise delete rows containing missing observations and then run the bootstrap procedure (even though you'd better multiply impute missing observations in independents, because you will obtain biased inference if the missing data mechanism is not of the so-called MCAR type). In the

To get smoother bounds, try more bootstrap replicates.