ggplot geom_line on top of geom_bar

Time:01-17

I'd like to display in one plot the production (geom_line) and the import (geom_bar) of wheat over the years, with data pulled from FAOSTAT. So basically x-axis = year, y-axis = production, and a second y-axis = import. In addition, I would like the import bars split by country, since more is imported from certain countries. I don't want to display all of them, though: only the top 3, with the rest grouped into a new category "other".

I'm quite bad at coding in R, but once I have the foundation I can figure out how to adjust the looks. I just need to figure out how to start.

Here is a simplified dataset. As described, in the bar plot I want to show only the 3 highest values and have the rest combined into a new category "other". A geom_line should be displayed on top of that, with a separate axis on the right side. For some reason the geom_line is not working for me.

import <- data.frame(country = c("USA", "USA", "EU", "NZ", "EU", "Bulgaria", "Romania", "Serbia"),
                     date = c("1991", "1992", "1994", "1995", "1991", "1991", "1991", "1991"),
                     value = c(1000, 500, 2000, 1500, 1000, 100, 500, 500))

production <- data.frame(date = c("1991", "1992", "1994", "1995", "1995"),
                         value = c(50, 45, 40, 39, 38))

ggplot(data = import, mapping = aes(x = date, y = value, fill = country)) +
  geom_col()

ggplot(data = production, aes(x = date, y = value)) +
  geom_line()

Any help or guidance in the right direction will be appreciated. Thank you in advance! Love this community always great with the help <3

CodePudding user response:

You are close, but a few changes are needed to get the plot you want:

  1. To connect observations with geom_line() you need to add a group aesthetic. With a character (discrete) x variable, each date is treated as its own group, so there is nothing to connect. If you add aes(group = 1) inside the geom_line() call, you'll get an actual line.
  2. I strongly recommend you treat the date as numeric instead of character. Time is generally continuous, so it's best to treat it that way unless you have a good reason for using it as a discrete variable.
  3. There are probably multiple good ways to lump the low-value countries together. I just used if_else(). It wouldn't work here, but it's worth knowing about forcats::fct_lump_*() if you haven't seen it already, for doing something similar in a different setting.
  4. Finally, to get a second axis properly scaled, you need a transformation factor. Divide the axis scale by it inside sec_axis(), then multiply the data series you want to see on that axis by the same factor.

library(tidyverse)

# create data with date as numeric
import <- data.frame(country = c("USA", "USA", "EU", "NZ", "EU", "Bulgaria", "Romania", "Serbia"),
                     date = c(1991, 1992, 1994, 1995, 1991, 1991, 1991, 1991),
                     value = c(1000, 500, 2000, 1500, 1000, 100, 500, 500))

# lump into 'other' if not in top 3
import <- import %>% 
  mutate(country = if_else(value >= sort(value, decreasing = TRUE)[3],
                           country,
                           "other"))

# create data with date as numeric
production <- data.frame(date = c(1991, 1992, 1994, 1995, 1995),
                         value = c(50, 45, 40, 39, 38))

# calculate scaling factor for second axis
trans_fct <- max(import$value) / max(production$value)

# plot with second axis, scaling data series that appears on that axis
ggplot() +
  geom_col(data = import,
           mapping = aes(x = date, y = value, fill = country)) +
  geom_line(data = production, aes(
    x = date,
    y = value * trans_fct,
    group = 1
  )) +
  scale_y_continuous(name = "Import",
                     sec.axis = sec_axis(~ . / trans_fct, name = "Production"))
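
To make the scaling step less abstract, here is a small numeric check of what the factor does. It only reuses the values from the example data; the names import_value, production_value, scaled and back are made up for illustration and are not part of the plot code.

```r
# Values copied from the example data in this answer
import_value <- c(1000, 500, 2000, 1500, 1000, 100, 500, 500)
production_value <- c(50, 45, 40, 39, 38)

# Ratio of the two maxima: 2000 / 50 = 40
trans_fct <- max(import_value) / max(production_value)

# Production stretched onto the import (left) axis scale, as drawn by geom_line()
scaled <- production_value * trans_fct

# sec_axis(~ . / trans_fct) divides left-axis positions by the same factor,
# so the right-axis labels read in the original production units again
back <- scaled / trans_fct

print(trans_fct)  # 40
print(scaled)     # 2000 1800 1600 1560 1520
```

So the line is drawn at heights of up to 2000 on the left axis, while the right axis labels those same heights as up to 50.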

Created on 2022-01-13 by the reprex package (v2.0.1)
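
One thing to be aware of: the if_else() call above compares individual rows, so a smaller row from a large importer (the USA 1992 row with value 500) also ends up in "other". If the intent is instead to keep the three countries with the largest total import, a possible variant is sketched below. This is an assumption about what "top 3" should mean, not part of the original answer, and the names top3 and import_lumped are hypothetical.

```r
library(dplyr)

import <- data.frame(country = c("USA", "USA", "EU", "NZ", "EU", "Bulgaria", "Romania", "Serbia"),
                     date = c(1991, 1992, 1994, 1995, 1991, 1991, 1991, 1991),
                     value = c(1000, 500, 2000, 1500, 1000, 100, 500, 500))

# rank countries by their total import and keep the names of the top 3
top3 <- import %>%
  group_by(country) %>%
  summarise(total = sum(value)) %>%
  slice_max(total, n = 3, with_ties = FALSE) %>%
  pull(country)

# relabel every row whose country is not among the top 3
import_lumped <- import %>%
  mutate(country = if_else(country %in% top3, country, "other"))
```

With this data the totals are EU 3000, USA 1500 and NZ 1500, so those three keep their names and Bulgaria, Romania and Serbia become "other". import_lumped can then be passed to geom_col() in place of import.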
