I would like to draw boxplots of several columns of a data frame, with each box labelled on the x-axis by the name of its column. Here is the dataset:
| OA | White_British | Low_Occupancy | Unemployed | Qualification |
|---|---|---|---|---|
| E00004120 | 42.3566879 | 6.293706294 | 1.893939394 | 73.62637363 |
| E00004121 | 47.2 | 5.93220339 | 2.688172043 | 69.90291262 |
| E00004122 | 40.6779661 | 2.912621359 | 1.212121212 | 67.58241758 |
| E00004123 | 49.66216216 | 0.925925926 | 2.803738318 | 60.77586207 |
| E00004124 | 51.13636364 | 2 | 3.816793893 | 65.98639456 |
| E00004125 | 41.41791045 | 3.93258427 | 3.846153846 | 74.20634921 |
| E00004126 | 48.54014599 | 5.555555556 | 4.545454545 | 62.44725738 |
| E00004127 | 48.67924528 | 8.870967742 | 0.938967136 | 60.35242291 |
| E00004128 | 45.39249147 | 2.48447205 | 2.164502165 | 70.07874016 |
| E00004129 | 49.05660377 | 3.521126761 | 4.310344828 | 66.66666667 |
| E00004130 | 38.80597015 | 6.25 | 0.917431193 | 66.66666667 |
| E00004131 | 39.64285714 | 7.56302521 | 1.869158879 | 64.47368421 |
| E00004132 | 55.88235294 | 4.347826087 | 3.797468354 | 73.4939759 |
| E00004133 | 41.96078431 | 7.627118644 | 1.990049751 | 65.38461538 |
| E00004134 | 53.19148936 | 6 | 2.702702703 | 72.89156627 |
| E00004135 | 46.85314685 | 4.761904762 | 3.731343284 | 74.82014388 |
| E00004136 | 59.64912281 | 0.909090909 | 2.732240437 | 73.68421053 |
| E00004137 | 48.16176471 | 5.442176871 | 2.752293578 | 69.06779661 |
| E00004138 | 42.22222222 | 2.816901408 | 4.972375691 | 58.16326531 |
This is in fact just a subset of a much longer table. After importing the Excel table into R, I used ggplot to plot it with this general pattern:

    ggplot(data, aes(x)) + geom_boxplot()
    ggplot(data) + geom_boxplot(aes(x))

where x is a vector in the data frame (a column in this case).
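For anyone who wants to reproduce the problem without the Excel file, a data frame holding the first five rows of the table above can be built directly in R. This is only a sketch: the name `Census.Data` matches the question, and the values are copied from the table.

```r
# Reproducible subset of the census table (first five rows only).
# In the question the full table was read from an Excel file.
Census.Data <- data.frame(
  OA            = c("E00004120", "E00004121", "E00004122", "E00004123", "E00004124"),
  White_British = c(42.3566879, 47.2, 40.6779661, 49.66216216, 51.13636364),
  Low_Occupancy = c(6.293706294, 5.93220339, 2.912621359, 0.925925926, 2),
  Unemployed    = c(1.893939394, 2.688172043, 1.212121212, 2.803738318, 3.816793893),
  Qualification = c(73.62637363, 69.90291262, 67.58241758, 60.77586207, 65.98639456)
)

str(Census.Data)
```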
So the first time, this is the R code I used:

    ggplot(Census.Data, aes(x = White_British, Low_Occupancy, Unemployed, Qualification)) + geom_boxplot()
What I get is White_British plotted on the x-axis and Low_Occupancy plotted on the y-axis, as in the figure below:
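The reason for this mapping is that ggplot2's `aes()` has the signature `aes(x, y, ...)`, so an unnamed argument that follows `x = ...` is matched positionally to `y`. A small sketch (not from the original thread) showing that the two mappings below are equivalent:

```r
library(ggplot2)

# aes() captures its arguments unevaluated, so no data is needed here.
# The second, unnamed argument is matched by position to y.
m_positional <- aes(x = White_British, Low_Occupancy)
m_named      <- aes(x = White_British, y = Low_Occupancy)

names(m_positional)  # "x" "y"
names(m_named)       # "x" "y"
```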
The same thing happens when `aes()` is placed inside `geom_boxplot()`, but that is not the main issue. Trying to be a bit clever, I stored the columns White_British, Low_Occupancy, Unemployed and Qualification in an object named `censusgroups`, thinking that passing that object as `x =` in `aes()` would put all four columns on the x-axis and solve my problem. Not so clever:

    censusgroups <- Census.Data[, 2:5]
    ggplot(Census.Data) + geom_boxplot(aes(x = censusgroups))        # returns a blank white page in the RStudio Plots tab
    ggplot(Census.Data) + geom_boxplot(aes(x = Census.Data[, 2:5]))  # still returns a blank page

To cut a long story short, since other attempts also ended in errors, I fell back on base R's `boxplot()` function:

    gach <- boxplot(x = Census.Data[, 2:5], xlab = 'Data Groups', ylab = 'Percentages', col = topo.colors(4, 0.6, rev = FALSE))

This gave the result below, which is exactly the kind of plot I want. However, I would like to build it with ggplot2, since I prefer its graphics to base R's.

How do I write the code to create the boxplot above using ggplot2?
Answer:
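You can try melting the data with the reshape2 package. Here's how you can do this:

    library(ggplot2)
    library(reshape2)

    # melt() stacks the four numeric columns into `variable` (column name) and `value`
    ggplot(melt(Census.Data, id.vars = "OA"), aes(x = variable, y = value)) +
      geom_boxplot()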
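To see what `melt()` hands to `ggplot()`, it helps to inspect the long-format result. A minimal sketch, assuming `Census.Data` is loaded as in the question (a two-row stand-in copied from the table is created if it is not): `melt()` keeps `OA` as the identifier and returns one row per OA and column, with the column name in `variable` and the number in `value`.

```r
library(reshape2)

# Stand-in for the question's data if it is not already loaded
# (first two rows of the table above).
if (!exists("Census.Data")) {
  Census.Data <- data.frame(
    OA            = c("E00004120", "E00004121"),
    White_British = c(42.3566879, 47.2),
    Low_Occupancy = c(6.293706294, 5.93220339),
    Unemployed    = c(1.893939394, 2.688172043),
    Qualification = c(73.62637363, 69.90291262)
  )
}

# Wide -> long: one row per (OA, column) pair.
census_long <- melt(Census.Data, id.vars = "OA")
head(census_long)

# `variable` is a factor whose levels follow the original column order.
levels(census_long$variable)
```

Because `variable` is a factor in column order, the boxes appear in the same left-to-right order as in the base `boxplot()` call.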
The output will look like this:

Answer:
Try pivoting your data first:

    library(tidyr)    # pivot_longer(); tidyr also re-exports %>%
    library(ggplot2)

    census_long <- Census.Data %>%
      pivot_longer(cols = White_British:Qualification, names_to = "Data Groups", values_to = "Percentages")

    col <- topo.colors(4, 0.6, rev = FALSE)

    ggplot(census_long, aes(x = `Data Groups`, y = Percentages, fill = `Data Groups`)) +
      geom_boxplot() +
      scale_fill_manual(values = col) +
      theme(legend.position = "none")
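One difference from the base `boxplot()` result: `pivot_longer()` stores the column names as a character vector, and ggplot2 orders a character x-axis alphabetically, so the boxes come out as Low_Occupancy, Qualification, Unemployed, White_British. A sketch of one way to keep the original column order (an addition, not part of the answer above; it assumes `Census.Data` from the question and creates a two-row stand-in otherwise) is to turn `Data Groups` into a factor with explicit levels:

```r
library(tidyr)
library(ggplot2)

# Stand-in for the question's data if it is not already loaded
# (first two rows of the table above).
if (!exists("Census.Data")) {
  Census.Data <- data.frame(
    OA            = c("E00004120", "E00004121"),
    White_British = c(42.3566879, 47.2),
    Low_Occupancy = c(6.293706294, 5.93220339),
    Unemployed    = c(1.893939394, 2.688172043),
    Qualification = c(73.62637363, 69.90291262)
  )
}

census_long <- pivot_longer(Census.Data, cols = White_British:Qualification,
                            names_to = "Data Groups", values_to = "Percentages")

# Factor levels in column order control both the x-axis order and the
# order in which scale_fill_manual() assigns the colours.
census_long$`Data Groups` <- factor(census_long$`Data Groups`,
                                    levels = names(Census.Data)[2:5])

p <- ggplot(census_long, aes(x = `Data Groups`, y = Percentages, fill = `Data Groups`)) +
  geom_boxplot() +
  scale_fill_manual(values = topo.colors(4, 0.6)) +
  theme(legend.position = "none")
print(p)
```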



