Draw a boxplot with several boxplots using ggplot2

Time:01-28

I would like to draw boxplots for several columns of a data frame in a single figure, with the column names as the labels on the x-axis. Here is the dataset:

OA White_British Low_Occupancy Unemployed Qualification
E00004120 42.3566879 6.293706294 1.893939394 73.62637363
E00004121 47.2 5.93220339 2.688172043 69.90291262
E00004122 40.6779661 2.912621359 1.212121212 67.58241758
E00004123 49.66216216 0.925925926 2.803738318 60.77586207
E00004124 51.13636364 2 3.816793893 65.98639456
E00004125 41.41791045 3.93258427 3.846153846 74.20634921
E00004126 48.54014599 5.555555556 4.545454545 62.44725738
E00004127 48.67924528 8.870967742 0.938967136 60.35242291
E00004128 45.39249147 2.48447205 2.164502165 70.07874016
E00004129 49.05660377 3.521126761 4.310344828 66.66666667
E00004130 38.80597015 6.25 0.917431193 66.66666667
E00004131 39.64285714 7.56302521 1.869158879 64.47368421
E00004132 55.88235294 4.347826087 3.797468354 73.4939759
E00004133 41.96078431 7.627118644 1.990049751 65.38461538
E00004134 53.19148936 6 2.702702703 72.89156627
E00004135 46.85314685 4.761904762 3.731343284 74.82014388
E00004136 59.64912281 0.909090909 2.732240437 73.68421053
E00004137 48.16176471 5.442176871 2.752293578 69.06779661
E00004138 42.22222222 2.816901408 4.972375691 58.16326531

This is in fact just a subset of a very long table. After importing the Excel table into R, I tried to plot it with ggplot2. I used this code pattern: ggplot(data, aes(x)) + geom_boxplot() and sometimes ggplot(data) + geom_boxplot(aes(x)), where x is a column of the data frame.

The first time, this is the R code I used: ggplot(Census.Data, aes(x = White_British, Low_Occupancy, Unemployed, Qualification)) + geom_boxplot()

What I get is the White_British column plotted on the x-axis and the Low_Occupancy column plotted on the y-axis, as in the figure below:

boxplot but I want both data columns on x-axis

The same thing happens when aes(x) is placed inside geom_boxplot(), but that's not the main issue. Trying to be a bit clever, I stored the four columns (White_British, Low_Occupancy, Unemployed, Qualification) in an object named censusgroups, thinking that passing this object to aes() with x = would put all of the columns on the x-axis and solve my problem. Not so clever.

censusgroups <- Census.Data[, 2:5]

ggplot(Census.Data) + geom_boxplot(aes(x = censusgroups)) -- returns a blank white page in the Plots tab in RStudio

ggplot(Census.Data) + geom_boxplot(aes(x = Census.Data[, 2:5])) -- still returns a blank page

To cut a long story short, since some other attempts also resulted in errors, I used base R's boxplot function.

gach <- boxplot(x = Census.Data[, 2:5], xlab = 'Data Groups', ylab = 'Percentages', col = topo.colors(4, 0.6, rev = FALSE))

And had the result below. In fact, this is the kind of result I would like to get using ggplot features, since I prefer ggplot to the default due to its superior graphic features.

correct boxplot with several boxplots on x axis

I would like to know how to write the code to create the boxplot as above, using ggplot.

CodePudding user response:

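You can try melting the data with the reshape2 library, so that the column names end up in one column (variable) and the numbers in another (value). Here's how you can do this:

library(reshape2)
library(ggplot2)
# df is your data frame (Census.Data in the question)
ggplot(melt(df), aes(x = variable, y = value)) +
  geom_boxplot()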

The output will have one box per column, with the column names along the x-axis.
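
To see why melting helps, it can be useful to look at the reshaped data before plotting it. The following is only a minimal sketch: it assumes the data frame is called Census.Data as in the question, uses three rows typed in from the table above (rounded to two decimals), and spells out id.vars = "OA" so that melt does not have to guess which column identifies the rows.

```r
library(reshape2)

# Three rows copied from the question's table (rounded), for illustration only
Census.Data <- data.frame(
  OA            = c("E00004120", "E00004121", "E00004122"),
  White_British = c(42.36, 47.20, 40.68),
  Low_Occupancy = c(6.29, 5.93, 2.91),
  Unemployed    = c(1.89, 2.69, 1.21),
  Qualification = c(73.63, 69.90, 67.58)
)

# Long format: one row per (OA, column) pair.
# "variable" holds the former column names, "value" the numbers.
long <- melt(Census.Data, id.vars = "OA")
str(long)
```

Each of the four original columns becomes a group in the variable column, which is what aes(x = variable, y = value) needs in order to draw one box per group.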

CodePudding user response:

Try pivoting your data first:

library(ggplot2)
library(tidyr)  # pivot_longer(); also provides the %>% pipe

data <- 
    data %>%
    pivot_longer(cols = White_British:Qualification, names_to = "Data Groups", values_to = "Percentages")
col <- topo.colors(4, 0.6, rev = FALSE)

ggplot(data, aes(x = `Data Groups`, y = Percentages, fill = `Data Groups`)) +
    geom_boxplot() +
    scale_fill_manual(values = col) +
    theme(legend.position = "none")
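
One thing to watch for: pivot_longer stores the names as a character column, so ggplot2 orders the boxes alphabetically, whereas base boxplot() follows the column order. Below is a sketch of one way to keep the original order by converting the names to a factor. It assumes the column names from the question and uses three rows from the table (rounded) purely for illustration; the object names long, groups and p are my own.

```r
library(ggplot2)
library(tidyr)

# Three rows copied from the question's table (rounded), for illustration only
Census.Data <- data.frame(
  OA            = c("E00004120", "E00004121", "E00004122"),
  White_British = c(42.36, 47.20, 40.68),
  Low_Occupancy = c(6.29, 5.93, 2.91),
  Unemployed    = c(1.89, 2.69, 1.21),
  Qualification = c(73.63, 69.90, 67.58)
)

# Column names in the order they appear in the data frame
groups <- names(Census.Data)[2:5]

long <- pivot_longer(Census.Data, cols = White_British:Qualification,
                     names_to = "Data Groups", values_to = "Percentages")

# Fix the factor levels so the x-axis follows the column order
long$`Data Groups` <- factor(long$`Data Groups`, levels = groups)

p <- ggplot(long, aes(x = `Data Groups`, y = Percentages, fill = `Data Groups`)) +
  geom_boxplot() +
  scale_fill_manual(values = topo.colors(4, 0.6)) +
  theme(legend.position = "none")
p
```

With the factor levels set, the boxes should appear in the same left-to-right order as in the base R plot.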

