Having trouble making ggplot line graph

Time:01-29

I am a beginner with R so I appreciate your patience and help in advance!

I am trying to make a line graph using ggplot to display the changes in the y variable over the years. When I pass my data to ggplot, this is what I get:

[image of the resulting plot]

B: year as factor:

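  1. Column names that start with a digit are awkward to use, because they must be wrapped in backticks every time. Here we use rename() to rename the column to X100015.
  2. To keep the years in their original order, we use fct_inorder() from the forcats package (it is loaded with tidyverse).
  3. With group = 1, all points belong to a single group, so geom_line() draws one connected line. Without it, a discrete (factor) x axis puts each year in its own group, and no line is drawn.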
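library(tidyverse)

df %>% 
  rename(X100015 = `100015`) %>%     # give the column a syntactic name
  mutate(year = factor(year)) %>%    # treat year as a discrete (factor) variable
  ggplot(aes(x = fct_inorder(year), y = X100015, group = 1)) +  # group = 1 joins all points into one line
  geom_line()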
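To see what group = 1 changes, the sketch below builds the plot with and without it and counts the groups ggplot2 computes, using layer_data(). It is a minimal check under a few assumptions: it uses only the first five years of the data, starts from the already-renamed X100015 column, and replaces fct_inorder() with the base-R factor(x, levels = unique(x)), which also orders levels by first appearance. The object names (df_small, p_default, p_grouped) are illustrative only.

```r
library(ggplot2)

# Illustrative subset of the question's data, column already renamed
df_small <- data.frame(
  year    = c(1998, 1999, 2000, 2001, 2002),
  X100015 = c(1504, 1504, 1504, 1504, 984)
)

# Base-R way to order factor levels by first appearance
df_small$year_f <- factor(df_small$year, levels = unique(df_small$year))

# Without group = 1: the discrete x aesthetic creates one group per year
p_default <- ggplot(df_small, aes(x = year_f, y = X100015)) + geom_line()

# With group = 1: every point belongs to the same group
p_grouped <- ggplot(df_small, aes(x = year_f, y = X100015, group = 1)) + geom_line()

# layer_data() returns the data ggplot2 computed for the first layer
n_groups_default <- length(unique(layer_data(p_default)$group))
n_groups_grouped <- length(unique(layer_data(p_grouped)$group))

print(n_groups_default)  # 5: one group (with one point each) per year
print(n_groups_grouped)  # 1: a single group, drawn as one connected line
```

Because each year has only one observation, every default group contains a single point, so geom_line() has nothing to connect until group = 1 merges them.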

[image of the resulting line graph]

data:

df <- tribble(
  ~year, ~`100015`,
  1998,  1504,
  1999,  1504,
  2000,  1504,
  2001,  1504,
  2002,   984,
  2003,  1504,
  2004,  1504,
  2005,  1968,
  2006,  1968,
  2007,  1968,
  2008,  1968,
  2009,  1968,
  2010,  1968,
  2011,  1968,
  2012,  1968,
  2013,  1968,
  2014,  1968,
  2015,  1968,
  2016,  1968,
  2017,  1968,
  2018,  1968,
  2019,  1968,
  2020,  1968,
  2021,  1968
)

CodePudding user response:

Hi Estefan and welcome to Stack Overflow! In the future, please try to post a reproducible example with your question. These help respondents better understand and diagnose your issue.

Regarding your question, the issue is that your column name is numeric, and ggplot() reads y = '100015' (in quotes) as a constant string instead of a column name. This can be worked around with y = df$'100015', although ggplot2 discourages using df$ inside aes(); the more idiomatic fix is to refer to the column with backticks, y = `100015`. It is generally best practice to avoid purely numeric column names for this reason. Alternatively, if you are not attached to the column name '100015', you can simply rename it with colnames(df)[2] <- "ResponseVar".

Here is a reproducible example:

library(ggplot2)
library(dplyr)

##Current approach##
df<-data.frame(year=c(1998:2021))
df$'100015'<-case_when(df$year %in% c(1998:2001, 2003, 2004) ~ 1504,
                       df$year == 2002 ~ 984,
                       TRUE ~ 1968)

ggplot(data = df, aes(x = year, y = df$'100015')) + geom_line() # ggplot2 may warn that df$ inside aes() is discouraged, but the plot is still correct


##Renaming approach##
colnames(df)[2]<-"ResponseVar"
ggplot(data = df, aes(x = year, y = ResponseVar)) + geom_line() # no warning: ResponseVar is a syntactic column name
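A third option keeps the numeric name and refers to it with backticks, which is how R quotes non-syntactic names. The sketch below rebuilds the same toy data with base R's ifelse() instead of case_when() (an assumption made only to keep the example free of dplyr) and compares a quoted y with a backticked y through layer_data(). The names df_num, p_quoted and p_backtick are illustrative.

```r
library(ggplot2)

# Same values as the reproducible example: 984 in 2002, 1504 up to 2004, 1968 afterwards
df_num <- data.frame(year = 1998:2021)
df_num$`100015` <- ifelse(df_num$year == 2002, 984,
                          ifelse(df_num$year <= 2004, 1504, 1968))

# Quotes create a constant string, so every row gets the same y value
p_quoted <- ggplot(df_num, aes(x = year, y = '100015')) + geom_line()

# Backticks refer to the column itself, with no renaming and no df$
p_backtick <- ggplot(df_num, aes(x = year, y = `100015`)) + geom_line()

print(length(unique(layer_data(p_quoted)$y)))    # 1: a single constant value
print(length(unique(layer_data(p_backtick)$y)))  # 3: the values 1504, 984 and 1968
```

The quoted version collapses to one constant value, which is why the original plot did not show the changes over the years.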
