Specification curve "choices" plot using ggplot2

Time:01-13

I have a small dataset of estimates from many regressions of an outcome variable on a main treatment variable plus various sets of control variables (in fact, all possible combinations of those control variables). The table of estimates is as follows:

df <-
structure(list(control_set = c("cen21_hindu_pct", "cen83_urban_pct", 
"cen21_hindu_pct   cen83_urban_pct", "NONE"), xest = c(0.0124513609978549, 
0.00427174623249021, 0.006447506098051, 0.0137107176362076), 
    xest_conf_low = c(0.00750677700140716, -0.00436301983024899, 
    -0.0013089334064237, 0.00925185534519074), xest_conf_high = c(0.0173959449943027, 
    0.0129065122952294, 0.0142039456025257, 0.0181695799272245
    )), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))

I want to make the two plots for the classic "specification curve analysis." The top plot is simply the set of estimates ordered by the magnitude of the estimate on the main treatment variable (no issue here):

library(dplyr)
library(ggplot2)

df %>%
  arrange(xest) %>%
  mutate(specifications = 1:nrow(.)) %>%
  ggplot(aes(x = specifications, y = xest, ymin = xest_conf_low, ymax = xest_conf_high)) +
  geom_pointrange(alpha = 0.1, size = 0.6, fatten = 1) +
  labs(x = "", y = "Estimate\n") +
  theme_bw()

My problem is with the aligned plot underneath, which describes the control-set choices. Directly beneath each coefficient dot and whisker in the plot just made, I want a marker indicating which control variables were included in that model (i.e., the controls listed in the control_set column of the corresponding row of df). The plot I need in this example would look like this:

[image: sketch of the desired plot]

This is a (failed) sketch of what I tried, after reshaping the earlier estimates dataset to long form, but I couldn't get multiple ticks to show vertically. (Note: this code won't run.)

# forplot %>%
#   arrange(xest) %>%
#   mutate(specifications = 1:nrow(.)) %>%
#   mutate(value = "|") %>%
#   ggplot(aes(specifications, term)) +
#   geom_text(aes(label = value)) +
#   scale_color_manual(values = c("lightblue")) +
#   labs(x = "\nSpecification number", y = "") +
#   theme_bw()

How can I use ggplot2 to make the plot shown above from the information in the data frame df?

CodePudding user response:

If we assign your first plot to a, the second panel can be built by hand and stacked underneath it with patchwork:

library(patchwork)

b <- tibble(specifications = c(1, 2, 2, 3),
            control_set = rep(c("cen83_urban_pct", "cen21_hindu_pct"), each = 2)) %>%
  ggplot(aes(specifications, control_set)) +
  geom_text(aes(label = "|"), size = 5) +
  coord_cartesian(xlim = c(1, 4)) +
  labs(x = NULL, y = NULL) +
  theme_bw() +
  theme(axis.ticks = element_blank(),
        axis.text.x = element_blank())

a / b + plot_layout(heights = c(3, 1))

[image: the resulting combined plot]

If you want to generate the key automatically, you might use something like this:

library(dplyr)
library(tidyr)  # for separate_rows()
df %>%
  arrange(xest) %>%  # same ordering as the top plot, so the numbering lines up
  select(control_set) %>%
  mutate(specifications = row_number()) %>%
  separate_rows(control_set, sep = "[ +]+") %>%  # split on runs of spaces and/or plus signs
  ...
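
One way the automatic key could be turned into the lower panel is sketched below. This is an illustration, not the answerer's own code. The names key and b are placeholders, and the split pattern assumes the controls in control_set are separated only by spaces and/or plus signs. The first lines recreate just the two columns of df that are needed, copied from the question. The NONE row is dropped so that the no-controls specification shows no ticks.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Only the two columns of df needed here, copied from the question
df <- tibble(
  control_set = c("cen21_hindu_pct", "cen83_urban_pct",
                  "cen21_hindu_pct   cen83_urban_pct", "NONE"),
  xest = c(0.0124513609978549, 0.00427174623249021,
           0.006447506098051, 0.0137107176362076)
)

# Long-form key: one row per (specification, included control)
key <- df %>%
  arrange(xest) %>%                              # same ordering as the top plot
  mutate(specifications = row_number()) %>%
  select(specifications, control_set) %>%
  separate_rows(control_set, sep = "[ +]+") %>%  # assumed delimiter: spaces and/or "+"
  filter(control_set != "NONE")                  # no ticks for the no-controls model

# Lower panel: a "|" wherever a control is included in a specification
b <- ggplot(key, aes(specifications, control_set)) +
  geom_text(aes(label = "|"), size = 5) +
  coord_cartesian(xlim = c(1, nrow(df))) +       # keep a slot for every specification
  labs(x = NULL, y = NULL) +
  theme_bw() +
  theme(axis.ticks = element_blank(),
        axis.text.x = element_blank())
```

With the top plot saved as a, the same patchwork call as before (a / b + plot_layout(heights = c(3, 1))) stacks the two panels.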

CodePudding user response:

If you want to relabel the numbers on the y-axis with the control_set labels, you can add this to the plot:

  + scale_y_continuous(breaks = df$xest, labels = df$control_set)