How to create a plotting function with two datasets of unequal length and two layers of plots in ggp

Time:01-19

I have one dataset called df1 that looks as such:

category    subject    runtime
foo         square     7.3
foo         circle.    10.5
foo2        square     15.3    
...

And a second data frame, df2, which is a subset of df1 that keeps only the categories occurring more than once (n > 1). It is smaller because the rows with category == foo2 are dropped:

category    subject    runtime
foo         square     7.3
foo         circle.    10.5
...

I am trying to plot these two datasets in a function that draws a boxplot for every category that occurs more than once (hence the subsetted df2 for geom_boxplot) and a dot plot for every category that occurs only once. For example:

create.my.plot <- function(df, df.subset, xaxis, yaxis, xlab, ylab) {
  ex.plot <- ggplot(df, aes(x=xaxis, y=yaxis)) +
    geom_boxplot(data = df.subset, aes(fill=xaxis)) +
    geom_point(position="identity", size=1) +
    labs(x=xlab, y=ylab)
  ex.plot
}

I can create this plot just fine outside of the plotting function, but when I call the function it returns an error about the length of the data frames:

> create.my.plot(df1, df2, df$category, df$runtime, "Category", "Runtime")
Error: Aesthetics must be either length 1 or the same as the data (3): x and y

I guess this is because ggplot() is called with df, so it expects the variables in aes() to have the same number of rows as df?

I would prefer to use a plotting function since I have to create the plot repeatedly, but how do I overcome this error when I need to plot layers using datasets of unequal length?

CodePudding user response:

The issue is that you use df$... to pass the variables to be plotted to your function. The error you get is one reason why as a general rule you should never use df$... inside aes().
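To make the mismatch concrete, here is a minimal sketch using a small stand-in for the data (the values are assumed for illustration). A vector passed as df1$category is detached from its data frame, so it keeps one element per row of df1 even when a layer is drawn from the shorter df2:

```r
# Stand-in data, assumed for illustration
df1 <- data.frame(
  category = c("foo", "foo", "foo2"),
  subject  = c("square", "circle", "square"),
  runtime  = c(7.3, 10.5, 15.3)
)
df2 <- df1[df1$category != "foo2", ]

# The free-standing vector always has one element per row of df1 ...
length(df1$category)  # 3

# ... while the boxplot layer's data has fewer rows, so the lengths disagree
nrow(df2)             # 2
```

A bare column name inside aes() is instead looked up in whichever data frame the layer uses, so its length always matches.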

Instead pass your columns to the function as unquoted names and make use of {{ (= curly-curly) inside aes():

library(ggplot2)

create.my.plot <- function(df, df.subset, xaxis, yaxis, xlab, ylab) {
  ggplot(df, aes(x = {{ xaxis }}, y = {{ yaxis }})) +
    geom_boxplot(data = df.subset, aes(fill = {{ xaxis }})) +
    geom_point(position = "identity", size = 1) +
    labs(x = xlab, y = ylab)
}

create.my.plot(df1, df2, category, runtime, "x", "y")

Or as a second option pass the column names as quoted strings and make use of the .data pronoun:

create.my.plot1 <- function(df, df.subset, xaxis, yaxis, xlab, ylab) {
  ggplot(df, aes(x = .data[[xaxis]], y = .data[[yaxis]])) +
    geom_boxplot(data = df.subset, aes(fill = .data[[xaxis]])) +
    geom_point(position = "identity", size = 1) +
    labs(x = xlab, y = ylab)
}

create.my.plot1(df1, df2, "category", "runtime", "x", "y")

DATA

df1 <- data.frame(
  category = c("foo", "foo", "foo2"),
  subject = c("square", "circle", "square"),
  runtime = 1:3
)
df2 <- df1[df1$category != "foo2", ]
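The df2 line above hard-codes the category to drop. As a rough sketch, assuming base R only, the same subset can be derived from the per-category counts, which matches the "n > 1" rule described in the question (n_per_category is a helper name introduced here for illustration):

```r
df1 <- data.frame(
  category = c("foo", "foo", "foo2"),
  subject = c("square", "circle", "square"),
  runtime = 1:3
)

# Number of rows in each row's category, aligned with the rows of df1
n_per_category <- ave(seq_along(df1$category), df1$category, FUN = length)

# Keep only the categories that occur more than once
df2 <- df1[n_per_category > 1, ]
```

This way df2 stays correct if other single-occurrence categories show up in df1.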

CodePudding user response:

Stefan's answer is a good one, but it will also plot every individual point on top of the boxplots. I would suggest this instead (I made the data a bit more realistic). First, here are the data:

df1 <- data.frame(
  category = factor(rep(c("foo", "foo3", "foo2"), c(100,100, 1))),
  subject = sample(c("circle", "square"), 201, replace=TRUE),
  runtime = runif(201, 5, 25)
)

Next, in your data make a new variable that counts the number of observations per group.

library(dplyr)

# Add a column n holding the number of rows in each category
df1 <- df1 %>% 
  group_by(category) %>% 
  mutate(n = n())

Filter the data into those that will be plotted with points and those with boxes:

df_pt <- df1 %>% 
  filter(n == 1)
df_box <- df1 %>% 
  filter(n > 1)
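The counting and filtering steps can also be condensed. The following is only a sketch: it assumes dplyr is installed and uses add_count(), which appends the per-group count as a column named n without leaving the data grouped. The seed is added here only to make the example reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df1 <- data.frame(
  category = factor(rep(c("foo", "foo3", "foo2"), c(100, 100, 1))),
  subject = sample(c("circle", "square"), 201, replace = TRUE),
  runtime = runif(201, 5, 25)
)

# add_count() adds column n = number of rows sharing the same category
counted <- df1 %>% add_count(category)

df_pt  <- counted %>% filter(n == 1)  # categories seen once  -> points
df_box <- counted %>% filter(n > 1)   # categories seen often -> boxplots

nrow(df_pt)   # 1
nrow(df_box)  # 200
```

Every row of df1 ends up in exactly one of the two data frames, so nothing is plotted twice.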

Then, the function can be modified to use the point and box datasets. The important bit is that the point dataset contains every observation that should be drawn as a point, and nothing else:

create.my.plot1 <- function(df.point, df.box, xaxis, yaxis, xlab, ylab) {
  # Note: aes_string() is deprecated in recent ggplot2; .data[[xaxis]] inside aes() is the current equivalent
  ggplot() +
    geom_point(data=df.point, aes_string(x = xaxis, y = yaxis), 
               position = "identity", size = 1) +
    geom_boxplot(data = df.box, aes_string(x=xaxis, y=yaxis, fill = xaxis)) +
    labs(x = xlab, y = ylab)
}

Finally, call the plot on your data:

create.my.plot1(df_pt, df_box, "category", "runtime", "x", "y")


Alternatively, you could also build the function to do all of the subsetting work for you:

df1 <- data.frame(
  category = factor(rep(c("foo", "foo3", "foo2"), c(100,100, 1))),
  subject = sample(c("circle", "square"), 201, replace=TRUE),
  runtime = runif(201, 5, 25)
)
create.my.plot1 <- function(dat, xaxis, yaxis, xlab, ylab) {
  # Count the rows in each group of the x variable
  x <- dat %>% 
    group_by(across(all_of(xaxis))) %>% 
    mutate(n = n())
  pt <- x %>% filter(n == 1)
  box <- x %>% filter(n > 1)
  cat("Number of single points: ", nrow(pt), "\nNumber of boxes: ", length(table(box[[xaxis]])), "\n")
  ggplot() +
    geom_point(data=pt, aes_string(x = xaxis, y = yaxis), 
               position = "identity", size = 1) +
    geom_boxplot(data = box, aes_string(x=xaxis, y=yaxis, fill = xaxis)) +
    labs(x = xlab, y = ylab)
}

create.my.plot1(df1, "category", "runtime", "x", "y")

The result is the same as above.
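Since aes_string() is deprecated in recent ggplot2 releases, here is a sketch of the same all-in-one function written with the .data pronoun instead. It assumes ggplot2 and dplyr are installed; the name create_my_plot2 and the helper column n_obs are made up for this example and are not part of either answer.

```r
library(ggplot2)
library(dplyr)

# Hypothetical variant of the function above; column names are passed as strings
create_my_plot2 <- function(dat, xaxis, yaxis, xlab, ylab) {
  # Count rows per group of the x variable, then drop the grouping
  counted <- dat %>%
    group_by(across(all_of(xaxis))) %>%
    mutate(n_obs = n()) %>%
    ungroup()

  pt  <- filter(counted, n_obs == 1)  # single observations -> points
  box <- filter(counted, n_obs > 1)   # repeated categories -> boxplots

  ggplot() +
    geom_boxplot(data = box,
                 aes(x = .data[[xaxis]], y = .data[[yaxis]],
                     fill = .data[[xaxis]])) +
    geom_point(data = pt,
               aes(x = .data[[xaxis]], y = .data[[yaxis]]),
               size = 1) +
    labs(x = xlab, y = ylab, fill = xlab)
}

df1 <- data.frame(
  category = factor(rep(c("foo", "foo3", "foo2"), c(100, 100, 1))),
  subject = sample(c("circle", "square"), 201, replace = TRUE),
  runtime = runif(201, 5, 25)
)

p <- create_my_plot2(df1, "category", "runtime", "x", "y")
p
```

Each layer carries its own data and its own mapping, so the two data frames never need to have the same number of rows.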
