My original data looks something like this:
Day Time Type
3 21:00 0
3 21:00 0
3 22:00 0
4 21:00 0
3 21:00 1
3 22:00 1
4 22:00 1
3 21:00 2
4 22:00 2
4 22:00 2
And this is the grouped data I end up with:
Type  Day  Hour
0     3    21      2
           22      1
      4    21      1
1     3    21      1
           22      1
      4    22      1
2     3    21      1
      4    22      2
Imagine these are knocks on my front door, back door and side door, where 0 is the front door, 1 the back door and 2 the side door. The grouped data shows the number of knocks on each door per day and hour. The count is always displayed in the rightmost column.
I now want to show this data in a bar chart where the counts for the same day and hour are stacked on top of each other, with a different color for each type.
This represents what I am looking for. I was playing around with matplotlib, but I just can't seem to do it. I hope someone can help.
Edit: Here is my groupby code:
time_data_station1 = df.groupby([df["Type"], df["CREATION TIME"].dt.day, df["CREATION TIME"].dt.hour]).size()
CodePudding user response:
IIUC,
import pandas as pd

# Example counts indexed by (door type, day, hour)
df = pd.DataFrame({'knocks':[2,3,5,1,2,2,3,5,3]},
                  index=pd.MultiIndex.from_arrays([[0,0,0,1,1,1,2,2,2],
                                                   [3,3,4,3,4,4,3,3,4],
                                                   [21,22,21,21,21,22,21,22,21]]))
# Move the door type (index level 0) into columns: one row per (day, hour)
dfu = df.unstack(0)['knocks']
dfu.index = [f'Day {i} - Hour {j}' for i, j in dfu.index]
# Each row becomes one bar, with one stacked segment per door
ax = dfu.plot.bar(stacked=True, figsize=(12,8), rot=0)
ax.legend(title='Door');
ax.set_ylabel('Number of Knocks');
ax.set_title('Daily Hourly Knocks by Door');
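Applied to the Series that the groupby in the question produces, the same idea might look like the sketch below. It rebuilds the sample rows from the question; the year and month in `CREATION TIME` are arbitrary assumptions, and the names `raw`, `counts` and `wide` are only illustrative.

```python
import pandas as pd

# Sample rows from the question; the year and month are arbitrary assumptions.
raw = pd.DataFrame({
    "Day":  [3, 3, 3, 4, 3, 3, 4, 3, 4, 4],
    "Time": ["21:00", "21:00", "22:00", "21:00", "21:00",
             "22:00", "22:00", "21:00", "22:00", "22:00"],
    "Type": [0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
})
raw["CREATION TIME"] = pd.to_datetime(
    "2021-01-" + raw["Day"].astype(str).str.zfill(2) + " " + raw["Time"]
)

# Same groupby as in the question, with the index levels renamed for readability.
counts = raw.groupby(
    [
        raw["Type"],
        raw["CREATION TIME"].dt.day.rename("Day"),
        raw["CREATION TIME"].dt.hour.rename("Hour"),
    ]
).size()

# Move the Type level into columns; (day, hour) slots without knocks become 0.
wide = counts.unstack(level="Type", fill_value=0)
wide.index = [f"Day {d} - Hour {h}" for d, h in wide.index]

# One bar per day/hour, one stacked segment per door.
ax = wide.plot.bar(stacked=True, rot=0)
ax.legend(title="Door")
ax.set_ylabel("Number of knocks")
```

Passing `fill_value=0` keeps the table free of NaN when a door has no knocks in a given slot. In a script, call `matplotlib.pyplot.show()` or `ax.figure.savefig(...)` afterwards to see the chart.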
CodePudding user response:
Okay this is what I did:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame({"Day":[3,3,3,4,4,4], "Hour":[21,21,22,21,21,22], "Type":[0,1,0,0,1,0], "Amount":[2,1,3,5,2,2]})
# One sorted frame per type
first = df[df["Type"] == 0].sort_values(["Day", "Hour"])
first.reset_index(inplace=True, drop=True)
second = df[df["Type"] == 1].sort_values(["Day", "Hour"])
second.reset_index(inplace=True, drop=True)
#third = df[df["Type"] == 2].sort_values(by=["Day", "Hour"])
#third.reset_index(inplace=True, drop=True)
#fourth = df[df["Type"] == 3].sort_values(by=["Day", "Hour"])
#fourth.reset_index(inplace=True, drop=True)
# Total amount per (Day, Hour); after reset_index the columns are Day, Hour, Type, Amount
total = df.groupby(["Day","Hour"]).sum()
total.reset_index(inplace=True)
for i in [first,second]: # [first,second,third,fourth] modify this to increase the number of stacked elements
    # Draw the remaining height, then subtract this type's share before the next pass
    ax = sns.barplot(x=total.index, y=total["Amount"], color="#09b0a8", alpha=0.3) #alpha makes color stacking easier
    for j in i.values:
        for num, val in enumerate(total.values):
            if val[0] == j[0] and val[1] == j[1]:
                # .loc[row, column] avoids chained assignment
                total.loc[num, "Amount"] = total.loc[num, "Amount"] - j[3] #calculate height differential
plt.show()
This is a very manual way of solving the problem. There's definitely a library for this.
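For instance, pandas on its own can probably do the job. The following is a minimal sketch, assuming the same `df` as above, that pivots the types into columns with `pivot_table` and lets `plot.bar(stacked=True)` do the stacking. The name `wide` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Day": [3, 3, 3, 4, 4, 4],
                   "Hour": [21, 21, 22, 21, 21, 22],
                   "Type": [0, 1, 0, 0, 1, 0],
                   "Amount": [2, 1, 3, 5, 2, 2]})

# One row per (Day, Hour), one column per Type; absent combinations become 0.
wide = df.pivot_table(index=["Day", "Hour"], columns="Type",
                      values="Amount", aggfunc="sum", fill_value=0)

# Each row is drawn as one bar, with the Type columns stacked on top of each other.
ax = wide.plot.bar(stacked=True, rot=0)
ax.set_xlabel("(Day, Hour)")
ax.set_ylabel("Amount")
ax.legend(title="Type")
```

Adding more types needs no extra code here, since every distinct value of `Type` becomes its own column and therefore its own colored segment.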


