I'm struggling to create a stacked bar chart derived from the value_counts() of the columns of a dataframe.
Assume a dataframe like the following, where responder is not important, but I would like to stack the counts of [1,2,3,4,5] for all q# columns.
responder, q1, q2, q3, q4, q5
------------------------------
r1, 5, 3, 2, 4, 1
r2, 3, 5, 1, 4, 2
r3, 2, 1, 3, 4, 5
r4, 1, 4, 5, 3, 2
r5, 1, 2, 5, 3, 4
r6, 2, 3, 4, 5, 1
r7, 4, 3, 2, 1, 5
The result would look something like a stacked bar chart in which each bar is labeled by q# and has 5 sections, one for the count of each of [1,2,3,4,5] in the data.
Ideally, all bars will be "100%" wide, showing each count as a proportion of the bar. But it's guaranteed that each responder row will have one entry for each column, so the percentage is just a bonus if possible.
Any help would be much appreciated, with a slight preference for a matplotlib solution.
CodePudding user response:
You can compute the bar widths as proportions and draw the stacked bar plot with ax = percents.T.plot(kind='barh', stacked=True), where percents is a DataFrame with q1, ..., q5 as columns and the values 1, ..., 5 as the index.
>>> percents
       q1      q2      q3      q4      q5
1  0.2015  0.2040  0.2115  0.1995  0.2070
2  0.2070  0.1905  0.2070  0.2070  0.1965
3  0.1965  0.2115  0.1795  0.1905  0.1935
4  0.1970  0.1970  0.1885  0.1960  0.2090
5  0.1980  0.1970  0.2135  0.2070  0.1940
Then you can use ax.patches to add labels to every bar segment. The labels can be generated from the original counts DataFrame: counts = df.apply(lambda x: x.value_counts())
>>> counts
    q1   q2   q3   q4   q5
1  403  408  423  399  414
2  414  381  414  414  393
3  393  423  359  381  387
4  394  394  377  392  418
5  396  394  427  414  388
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
## create some data similar to yours
np.random.seed(42)
categories = ['q1','q2','q3','q4','q5']
df = pd.DataFrame(np.random.randint(1,6,size=(2000, 5)), columns=categories)
## counts will be used for the labels
counts = df.apply(lambda x: x.value_counts())
## percents will be used to determine the width of each bar segment;
## dividing each column by its own total makes every bar sum to 1
percents = counts.div(counts.sum(axis=0), axis=1)
counts_array = counts.values
nrows, ncols = counts_array.shape
## ax.patches is ordered value by value, then question by question,
## which matches a row-major walk over counts_array
indices = [(i,j) for i in range(0,nrows) for j in range(0,ncols)]
ax = percents.T.plot(kind='barh', stacked=True)
ax.legend(bbox_to_anchor=(1, 1.01), loc='upper right')
for i, p in enumerate(ax.patches):
    ## proportion of the bar, formatted as a percentage
    ax.annotate(f"({p.get_width():.1%})", (p.get_x() + p.get_width() - 0.15, p.get_y() - 0.10), xytext=(5, 10), textcoords='offset points')
    ## raw count for the same segment
    ax.annotate(str(counts_array[indices[i]]), (p.get_x() + p.get_width() - 0.15, p.get_y() + 0.10), xytext=(5, 10), textcoords='offset points')
plt.show()
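For the matplotlib preference in the question, here is a minimal sketch of the same idea applied to the seven-row sample, using plain ax.barh with a running left offset. The variable names (data, shares, left), the Agg backend line and the output file name stacked_counts.png are assumptions for illustration only. Note that a value which never occurs in a column (for example 2 in q4) comes back as NaN from value_counts(), so the sketch reindexes to 1..5 and fills with 0.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample data copied from the question
data = {
    "responder": ["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
    "q1": [5, 3, 2, 1, 1, 2, 4],
    "q2": [3, 5, 1, 4, 2, 3, 3],
    "q3": [2, 1, 3, 5, 5, 4, 2],
    "q4": [4, 4, 4, 3, 3, 5, 1],
    "q5": [1, 2, 5, 2, 4, 1, 5],
}
df = pd.DataFrame(data).set_index("responder")

# Count each answer value per question; missing values become NaN,
# so force the index to 1..5 and replace NaN with 0
counts = (
    df.apply(lambda col: col.value_counts())
    .reindex(range(1, 6))
    .fillna(0)
    .astype(int)
)

# Share of each value within a question, so every column sums to 1
shares = counts / counts.sum(axis=0)

# Stack the segments by hand: each one starts where the previous one ended
questions = list(shares.columns)
left = np.zeros(len(questions))
fig, ax = plt.subplots()
for value in shares.index:
    widths = shares.loc[value].to_numpy()
    ax.barh(questions, widths, left=left, label=str(value))
    left += widths

ax.set_xlabel("share of responses")
ax.legend(title="answer", bbox_to_anchor=(1.02, 1), loc="upper left")
fig.tight_layout()
fig.savefig("stacked_counts.png")  # hypothetical output file name
```

Replace the savefig call with plt.show() (and drop the Agg line) to view the chart interactively.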
CodePudding user response:
- Bar annotations are most easily implemented with .bar_label.
- Transposing df with df = pd.DataFrame(data).set_index('responder').T swaps the index and columns, so each bar corresponds to a q# column rather than a responder. figsize=(12, 10) may need to be adjusted.
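A minimal sketch of how the two points above could fit together, assuming matplotlib 3.4 or later (where Axes.bar_label is available) and the sample data from the question. The names data, df and per follow this answer; the label format, the Agg backend line and the output file name stacked_bar_labels.png are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the question
data = {
    "responder": ["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
    "q1": [5, 3, 2, 1, 1, 2, 4],
    "q2": [3, 5, 1, 4, 2, 3, 3],
    "q3": [2, 1, 3, 5, 5, 4, 2],
    "q4": [4, 4, 4, 3, 3, 5, 1],
    "q5": [1, 2, 5, 2, 4, 1, 5],
}
df = pd.DataFrame(data).set_index("responder")
# df = df.T  # uncomment to get one bar per q# column instead of per responder

# Each value as a proportion of its row total, matching the per table in this answer
per = df.div(df.sum(axis=1), axis=0)

ax = per.plot(kind="barh", stacked=True, figsize=(12, 10))
ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left")

# Each container holds the segments drawn for one column;
# bar_label writes the segment width at the centre of each segment
for container in ax.containers:
    ax.bar_label(container, fmt="%.2f", label_type="center")

ax.figure.tight_layout()
ax.figure.savefig("stacked_bar_labels.png")  # hypothetical output file name
```

Passing labels=... to bar_label instead of fmt would allow custom text, such as the raw values from df.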
DataFrames
df
           q1  q2  q3  q4  q5
responder
r1          5   3   2   4   1
r2          3   5   1   4   2
r3          2   1   3   4   5
r4          1   4   5   3   2
r5          1   2   5   3   4
r6          2   3   4   5   1
r7          4   3   2   1   5
per
                 q1        q2        q3        q4        q5
responder
r1         0.333333  0.200000  0.133333  0.266667  0.066667
r2         0.200000  0.333333  0.066667  0.266667  0.133333
r3         0.133333  0.066667  0.200000  0.266667  0.333333
r4         0.066667  0.266667  0.333333  0.200000  0.133333
r5         0.066667  0.133333  0.333333  0.200000  0.266667
r6         0.133333  0.200000  0.266667  0.333333  0.066667
r7         0.266667  0.200000  0.133333  0.066667  0.333333
Referenced
- How to put the legend out of the plot shows various ways to format and move the legend.
- Adding value labels on a matplotlib bar chart provides a detailed explanation of .bar_label.
- How to add multiple annotations to a barplot
- stack bar plot in matplotlib and add label to each section
- How to annotate barplot with percent by hue/legend group
- How to add percentages on top of bars in seaborn