I am a beginner with R so I appreciate your patience and help in advance!
I am trying to make a line graph using ggplot to display the changes in the y variable over years. When I input my data into ggplot this is what I am getting...
B: year as factor:
- Starting a column name with a digit is not good practice. Here we use rename to rename the column to X100015.
- To get year in the right order we use fct_inorder from the forcats package (it is part of tidyverse).
- With group=1 we get the line as desired.
library(tidyverse)
df %>%
rename(X100015 = `100015`) %>%
mutate(year = factor(year)) %>%
ggplot(aes(x=fct_inorder(year), y=X100015, group=1)) +
geom_line()
data:
df <- tribble(
~year, ~`100015`,
1998, 1504,
1999, 1504,
2000, 1504,
2001, 1504,
2002, 984,
2003, 1504,
2004, 1504,
2005, 1968,
2006, 1968,
2007, 1968,
2008, 1968,
2009, 1968,
2010, 1968,
2011, 1968,
2012, 1968,
2013, 1968,
2014, 1968,
2015, 1968,
2016, 1968,
2017, 1968,
2018, 1968,
2019, 1968,
2020, 1968,
2021, 1968)
CodePudding user response:
Hi Estefan and welcome to Stack Overflow! In the future, please try to post a reproducible example with your question. These help respondents better understand and diagnose your issue.
Regarding your question, the issue is that your column name is purely numeric, so ggplot() reads y = '100015' as a string instead of a column name. This can be overcome by writing y = df$'100015' instead. It is generally best practice to avoid purely numeric column names for this reason. Alternatively, if you are not married to the column name being '100015', you can simply rename it with colnames(df)[2] <- "ResponseVar".
Here is a reproducible example:
library(ggplot2)
library(dplyr)
##Current approach##
df<-data.frame(year=c(1998:2021))
df$'100015'<-case_when(df$year %in% c(1998:2001, 2003, 2004) ~ 1504,
df$year == 2002 ~ 984,
TRUE ~ 1968)
ggplot(data = df, aes(x = year, y = df$'100015')) + geom_line() # ggplot may warn about using df$ inside aes(), but will still provide the correct output
##Renaming approach##
colnames(df)[2]<-"ResponseVar"
ggplot(data = df, aes(x = year, y = ResponseVar)) + geom_line() # ggplot won't complain
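A third option, if you would rather keep the original column name, is to wrap it in backticks inside aes(), the same quoting the first answer uses inside rename(). The sketch below is only illustrative: df2 and p are names made up for this example, and the data is rebuilt with base R so the snippet runs on its own.

```r
library(ggplot2)

# Rebuild the example data; df2 is a hypothetical name used only in this sketch.
df2 <- data.frame(year = 1998:2021)
df2[["100015"]] <- ifelse(df2$year %in% c(1998:2001, 2003, 2004), 1504,
                          ifelse(df2$year == 2002, 984, 1968))

# Backticks tell R that 100015 is a column name, not a number or a string.
p <- ggplot(data = df2, aes(x = year, y = `100015`)) +
  geom_line()
print(p)
```

This avoids both the df$ reference inside aes() and the rename, at the cost of having to remember the backticks wherever the column is used.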


