I'm looking to display, in a single plot, the production (geom_line) and the import (geom_bar) of wheat over the years, with data pulled from FAOSTAT. So basically: x-axis = year, y-axis = production, and a secondary axis = import. In addition, I would like the import bars split by country, since more is imported from certain countries. I don't want to display all of them, though: only the top 3, with the rest grouped into a new category "other".
I'm quite bad at coding in R, but once I have the foundation I can figure out how to adjust the look of it. I just need to figure out how to start.
Here is a simplified dataset. As described, in the bar plot I want to show only the 3 highest values and have the rest combined into a new category "other". A geom_line should be displayed on top of that, with a separate axis on the right side. For some reason the geom_line is not working for me.
import <- data.frame(country = c("USA", "USA", "EU", "NZ", "EU", "Bulgaria", "Romania", "Serbia"),
                     date = c("1991", "1992", "1994", "1995", "1991", "1991", "1991", "1991"),
                     value = c(1000, 500, 2000, 1500, 1000, 100, 500, 500))
production <- data.frame(date = c("1991", "1992", "1994", "1995", "1995"),
                         value = c(50, 45, 40, 39, 38))
library(ggplot2)
ggplot(data = import, mapping = aes(x = date, y = value, fill = country)) +
  geom_col()
ggplot(data = production, aes(x = date, y = value)) +
  geom_line()
Any help or guidance in the right direction will be appreciated. Thank you in advance! Love this community always great with the help <3
CodePudding user response:
You are close, but there are a few things needed to get the plot you want:
- To connect observations with `geom_line()` you need to add a `group` aesthetic. So if you just add `aes(group = 1)` inside the `geom_line()` call you'll get an actual line.
- I strongly recommend you treat the date as `numeric` instead of `character`. Time is generally continuous, so it's best to treat it that way unless you have a good reason for using it as a discrete variable.
- There are probably multiple good ways to lump the low `value` countries together. I just used `if_else`. `forcats::fct_lump_*` wouldn't work here, but it's worth knowing about if you haven't seen it already, for doing something similar in a different setting.
- Finally, to get a second axis properly scaled, you need to provide a transformation factor that you divide the axis scale by inside `sec_axis()`, and then multiply the data series you want to see on that axis by the same factor.
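If your real data already has the year stored as `character`, as in the question, a minimal base R sketch of the conversion could look like this (it reuses the question's `production` data frame; nothing else is assumed):

```r
# Question's data, with years stored as character strings
production <- data.frame(date = c("1991", "1992", "1994", "1995", "1995"),
                         value = c(50, 45, 40, 39, 38))

# Convert the year column to numeric so ggplot2 treats the x-axis as continuous
production$date <- as.numeric(production$date)

# Inspect the column type after conversion
str(production$date)
```

The same one-line conversion should apply to the `import` data frame.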
library(tidyverse)
# create data with date as numeric
import <- data.frame(country = c("USA", "USA", "EU", "NZ", "EU", "Bulgaria", "Romania", "Serbia"),
                     date = c(1991, 1992, 1994, 1995, 1991, 1991, 1991, 1991),
                     value = c(1000, 500, 2000, 1500, 1000, 100, 500, 500))
# lump into 'other' if not in top 3
import <- import %>%
  mutate(country = if_else(value >= sort(value, decreasing = TRUE)[3],
                           country,
                           "other"))
# create data with date as numeric
production <- data.frame(date = c(1991, 1992, 1994, 1995, 1995),
                         value = c(50, 45, 40, 39, 38))
# calculate scaling factor for second axis
trans_fct <- max(import$value)/max(production$value)
# plot with second axis, scaling data series that appears on that axis
ggplot() +
  geom_col(data = import,
           mapping = aes(x = date, y = value, fill = country)) +
  geom_line(data = production, aes(
    x = date,
    y = value * trans_fct,
    group = 1
  )) +
  scale_y_continuous(name = "Import",
                     sec.axis = sec_axis( ~ . / trans_fct, name = "Production"))
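
To see why the multiply-then-divide trick lines up, here is a small arithmetic check using the maxima from the example data. The names `import_max`, `production_max`, `plotted` and `labels_back` are only for illustration and are not part of the plot code above:

```r
import_max     <- 2000  # max(import$value) in the example data
production_max <- 50    # max(production$value) in the example data

# Same scaling factor as in the plot code
trans_fct <- import_max / production_max

# Production values are multiplied by trans_fct before being drawn
# on the primary (import) scale ...
plotted <- c(50, 45, 40, 39, 38) * trans_fct

# ... and sec_axis(~ . / trans_fct) divides the axis labels back,
# so the right-hand axis reads in the original production units
labels_back <- plotted / trans_fct
```

With this factor the highest production value is drawn at the same height as the largest single import value, and the right-hand axis labels recover the original production numbers.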

Created on 2022-01-13 by the reprex package (v2.0.1)
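
One thing to be aware of: the `if_else` above compares individual rows, so a row from a large importer can still end up in "other" when that particular row is small (USA in 1992 in the example data). If what you want is the top 3 countries by total import, a possible sketch, assuming dplyr 1.0.0 or later, is below. The names `top3` and `import_lumped` are my own and only for illustration:

```r
library(dplyr)

import <- data.frame(
  country = c("USA", "USA", "EU", "NZ", "EU", "Bulgaria", "Romania", "Serbia"),
  date    = c(1991, 1992, 1994, 1995, 1991, 1991, 1991, 1991),
  value   = c(1000, 500, 2000, 1500, 1000, 100, 500, 500)
)

# Total import per country, then keep the three largest totals
# (with_ties = FALSE caps the result at exactly three countries)
top3 <- import %>%
  group_by(country) %>%
  summarise(total = sum(value), .groups = "drop") %>%
  slice_max(total, n = 3, with_ties = FALSE) %>%
  pull(country)

# Relabel every row whose country is not in the top 3
import_lumped <- import %>%
  mutate(country = if_else(country %in% top3, country, "other"))
```

You can then pass `import_lumped` to `geom_col()` in place of `import`. Ranking on totals is a judgment call; ranking on a single year or on the mean would only need a different `summarise()` step.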
