I have one dataset called df1 that looks like this:
category subject runtime
foo square 7.3
foo circle 10.5
foo2 square 15.3
...
And a second dataset called df2, a subset of df1 in which I keep only the categories with n > 1 observations. This drops the category == foo2 rows and gives a smaller data frame:
category subject runtime
foo square 7.3
foo circle 10.5
...
I am trying to plot these two datasets in a function that draws a boxplot for every category with multiple occurrences (hence the subsetted df2 for geom_boxplot) and a dot plot for any category with only one occurrence. For example:
create.my.plot <- function(df, df.subset, xaxis, yaxis, xlab, ylab) {
  ex.plot <- ggplot(df, aes(x=xaxis, y=yaxis)) +
    geom_boxplot(data = df.subset, aes(fill=xaxis)) +
    geom_point(position="identity", size=1) +
    labs(x=xlab, y=ylab)
  ex.plot
}
I can create this plot just fine outside of the plotting function, but when I build it through the function, it returns an error about the length of the data:
> create.my.plot(df1, df2, df1$category, df1$runtime, "Category", "Runtime")
Error: Aesthetics must be either length 1 or the same as the data (3): x and y
I guess this is because ggplot() is called with df, so it expects the variables in aes() to have the same number of rows as df?
I would prefer to use a plotting function since I have to create the plot repeatedly, but how do I overcome this error when I need to plot layers from datasets of unequal length?
CodePudding user response:
The issue is that you use df$... to pass the variables to be plotted to your function. The error you get is one reason why as a general rule you should never use df$... inside aes().
Instead, pass your columns to the function as unquoted names and make use of {{ (curly-curly) inside aes():
library(ggplot2)
create.my.plot <- function(df, df.subset, xaxis, yaxis, xlab, ylab) {
  # {{ }} forwards the unquoted column name, so each layer looks it up in its own data
  ggplot(df, aes(x = {{ xaxis }}, y = {{ yaxis }})) +
    geom_boxplot(data = df.subset, aes(fill = {{ xaxis }})) +
    geom_point(position = "identity", size = 1) +
    labs(x = xlab, y = ylab)
}
create.my.plot(df1, df2, category, runtime, "x", "y")

Or, as a second option, pass the column names as quoted strings and make use of the .data pronoun:
create.my.plot1 <- function(df, df.subset, xaxis, yaxis, xlab, ylab) {
  # .data[[name]] looks up the column by its string name in each layer's data
  ggplot(df, aes(x = .data[[xaxis]], y = .data[[yaxis]])) +
    geom_boxplot(data = df.subset, aes(fill = .data[[xaxis]])) +
    geom_point(position = "identity", size = 1) +
    labs(x = xlab, y = ylab)
}
create.my.plot1(df1, df2, "category", "runtime", "x", "y")

DATA
df1 <- data.frame(
  category = c("foo", "foo", "foo2"),
  subject = c("square", "circle", "square"),
  runtime = 1:3
)
df2 <- df1[df1$category != "foo2", ]
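For what it's worth, the length mismatch behind the original error can be seen without ggplot2. Here is a minimal base-R sketch using the data above (the name xaxis simply mirrors the function argument from the question). The point is that df1$category is evaluated before the function runs, so it stays a length-3 vector, while the boxplot layer's data has only 2 rows:

```r
df1 <- data.frame(
  category = c("foo", "foo", "foo2"),
  subject = c("square", "circle", "square"),
  runtime = 1:3
)
df2 <- df1[df1$category != "foo2", ]

# What the function receives when called with df1$category:
# a plain vector, detached from any data frame
xaxis <- df1$category

length(xaxis)  # 3, fixed, no matter which data a layer uses
nrow(df2)      # 2, the data for the boxplot layer

# The lengths differ, so the vector cannot serve as an aesthetic for df2
length(xaxis) == nrow(df2)  # FALSE
```

With an unquoted name or a string, the column is looked up in each layer's own data instead, which is why both functions above work.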
CodePudding user response:
Stefan's answer is a good one, but it will also draw all the individual points on top of the boxplots. I would suggest this instead (I made the data a bit more realistic). First, here are the data:
df1 <- data.frame(
  category = factor(rep(c("foo", "foo3", "foo2"), c(100, 100, 1))),
  subject = sample(c("circle", "square"), 201, replace = TRUE),
  runtime = runif(201, 5, 25)
)
Next, in your data make a new variable that counts the number of observations per group.
library(dplyr)
df1 <- df1 %>%
  group_by(category) %>%
  mutate(n = n())
Filter the data into those that will be plotted with points and those with boxes:
df_pt <- df1 %>%
  filter(n == 1)
df_box <- df1 %>%
  filter(n > 1)
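If you would rather not depend on dplyr for this step, the same split can be sketched in base R. This is an alternative technique, not what the pipeline above does internally: ave() computes the group size for every row. The set.seed() call is an addition here so the example is reproducible, and the subject column is left out because it is not needed for the split:

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
df1 <- data.frame(
  category = factor(rep(c("foo", "foo3", "foo2"), c(100, 100, 1))),
  runtime = runif(201, 5, 25)
)

# ave() applies length() within each category and returns
# the group size once per row
df1$n <- ave(seq_len(nrow(df1)), df1$category, FUN = length)

df_pt <- df1[df1$n == 1, ]   # categories with a single observation
df_box <- df1[df1$n > 1, ]   # categories with several observations

nrow(df_pt)   # 1
nrow(df_box)  # 200
```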
Then, the function can be modified to use the point and box datasets. The important bit is that the point dataset contains only the categories with a single observation, so no points are drawn over the boxes:
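create.my.plot1 <- function(df.point, df.box, xaxis, yaxis, xlab, ylab) {
  # Note: aes_string() is soft-deprecated in recent ggplot2 releases;
  # aes(x = .data[[xaxis]], ...) is the current equivalent.
  ggplot() +
    geom_point(data = df.point, aes_string(x = xaxis, y = yaxis),
               position = "identity", size = 1) +
    geom_boxplot(data = df.box, aes_string(x = xaxis, y = yaxis, fill = xaxis)) +
    labs(x = xlab, y = ylab)
}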
Finally, call the plot on your data:
create.my.plot1(df_pt, df_box, "category", "runtime", "x", "y")
Alternatively, you could also build the function to do all of the subsetting work for you:
df1 <- data.frame(
  category = factor(rep(c("foo", "foo3", "foo2"), c(100, 100, 1))),
  subject = sample(c("circle", "square"), 201, replace = TRUE),
  runtime = runif(201, 5, 25)
)
create.my.plot1 <- function(dat, xaxis, yaxis, xlab, ylab) {
  # Count the observations in each group of the x variable
  x <- dat %>%
    group_by(across(all_of(xaxis))) %>%
    mutate(n = n())
  pt <- x %>% filter(n == 1)
  box <- x %>% filter(n > 1)
  # unique() rather than table(), so empty factor levels are not counted as boxes
  cat("Number of single points: ", nrow(pt),
      "\nNumber of boxes: ", length(unique(box[[xaxis]])), "\n")
  ggplot() +
    geom_point(data = pt, aes_string(x = xaxis, y = yaxis),
               position = "identity", size = 1) +
    geom_boxplot(data = box, aes_string(x = xaxis, y = yaxis, fill = xaxis)) +
    labs(x = xlab, y = ylab)
}
create.my.plot1(df1, "category", "runtime", "x", "y")
The result is the same as above.

