Overlay two plots from different dataframes in R

Time:01-31

I would like to overlay two ggplots built from different data sources. I don't think a left_join will work, because the dataframes have different lengths and the join could potentially change the underlying plots.

library(tidyverse)
set.seed(123)
player_df <- tibble(name = rep(c("A","B","C","D"), each = 10, times = 1),
                    pos = rep(c("DEF","DEF","MID","MID"), each = 10, times = 1),
                    load = c(rnorm(10, mean = 200, sd = 100),
                             rnorm(10, mean = 300, sd = 50),
                             rnorm(10, mean = 400, sd = 100),
                             rnorm(10, mean = 500, sd = 50)))

p1 <- player_df %>% 
  ggplot(aes(x = load, y = name)) +
  geom_point()


pos_df <- tibble(pos = rep(c("DEF","MID"), each = 30, times = 1),
                 load = (c(rnorm(30, mean = 250, sd = 100),
                           rnorm(30, mean = 350, sd = 100))))

p2 <- pos_df %>% 
  ggplot(aes(x = load, y = pos)) +
  geom_boxplot()

p1
p2

# add p2 to every p1 player plot by pos

I would like p1 to have the corresponding p2 (matched by pos) appear behind it. In other words, add the matching p2 boxplot behind each player's points in p1.

p1: plot of players

p2: boxplot of position

CodePudding user response:

It's not really advisable to attempt to superimpose two plots on each other. A ggplot is made of layers already, so usually it's just a case of superimposing one geom on another. This can be difficult if (as in your case) one of the axes has different labels. However, with a little work it is possible to wrangle your data so that it all sits on a single plot. In your case, you could do something like:

levs <- c("A", "DEF", "B", "C", "MID", "D")

ggplot(within(pos_df, pos <- factor(pos, levs)), aes(x = load, y = pos)) +
  geom_boxplot(width = 2.3) +
  geom_point(data = within(player_df, pos <- factor(name, levs))) +
  scale_y_discrete(limits = c("A", "DEF", "B", " ", "C", "MID", "D"))


CodePudding user response:

Dug into ggplot a bit and re-engineered a boxplot bit by bit.

# manually calculate stats that are used in boxplots
pos_df_summary <- pos_df %>%
  group_by(pos, .drop = FALSE) %>% 
  summarise(min = fivenum(load)[1],
            Q1 = fivenum(load)[2],
            median = fivenum(load)[3],
            Q3 = fivenum(load)[4],
            max = fivenum(load)[5]
            )
# add the boxplot data to each player
joined_df <- player_df %>% 
  left_join(., pos_df_summary, by = "pos") %>% 
  distinct(name, .keep_all = TRUE)

# plot
ggplot(data = NULL, aes(group = name)) +
  # create the line from min to max
  geom_segment(data = joined_df, aes(y = name, yend = name, x = min, xend = max), color = "black") +
  # create the box with median line
  geom_crossbar(data = joined_df, 
                aes(y = name, xmin = Q1, xmax = Q3, x = median, fill = "NA"),
                color = "black",
                fatten = 1) +
  scale_fill_manual(values = "white") +
  # add the points from the player_df
  geom_point(data = player_df, 
             aes(x = load, y = name, group = name),
             color = "red",
             show.legend = FALSE) +
  theme(legend.position = "none")

final plot combining two dataframes

There may be some extraneous code in here as I cobbled it from some other resources. Specifically, I'm not sure what the aes(group = name) in the ggplot() call does exactly.
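
For comparison, here is a minimal sketch of a shorter route to a similar picture. It assumes it is acceptable to copy each position's raw loads onto every player in that position, so that geom_boxplot() can compute the box statistics itself. The helper names player_pos and box_df are made up for illustration, and the data is rebuilt with base R only so the block runs on its own. Note that geom_boxplot() draws its whiskers with the 1.5 * IQR rule, so they will not necessarily reach the min and max used in the manual version above.

```r
library(ggplot2)

set.seed(123)
# Same shape as the question's data, rebuilt with data.frame() so this block stands alone
player_df <- data.frame(
  name = rep(c("A", "B", "C", "D"), each = 10),
  pos  = rep(c("DEF", "DEF", "MID", "MID"), each = 10),
  load = c(rnorm(10, mean = 200, sd = 100),
           rnorm(10, mean = 300, sd = 50),
           rnorm(10, mean = 400, sd = 100),
           rnorm(10, mean = 500, sd = 50))
)
pos_df <- data.frame(
  pos  = rep(c("DEF", "MID"), each = 30),
  load = c(rnorm(30, mean = 250, sd = 100),
           rnorm(30, mean = 350, sd = 100))
)

# One row per player, carrying that player's position (hypothetical helper)
player_pos <- unique(player_df[, c("name", "pos")])

# Attach every load observed for a position to each player in that position.
# This lives in its own data frame, so player_df and its points are untouched.
box_df <- merge(player_pos, pos_df, by = "pos")

p <- ggplot(mapping = aes(x = load, y = name)) +
  # position-level boxplot, added first so it is drawn behind the points;
  # outliers are hidden so they are not mistaken for player observations
  geom_boxplot(data = box_df, outlier.shape = NA) +
  # player-level observations on top
  geom_point(data = player_df, color = "red")

p
```

Because the join result is only used for the boxplot layer, the different lengths of the two dataframes do not affect the scatter layer. As far as I can tell, no explicit group aesthetic is needed here, since ggplot2 groups by the discrete y variable by default.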
