I want to plot my data as a bar plot, as shown in my expected output.

time,date,category
0,2002-05-01,2
1,2002-05-02,0
2,2002-05-03,0
3,2002-05-04,0
4,2002-05-05,0
5,2002-05-06,0
6,2002-05-07,0
7,2002-05-08,2
8,2002-05-09,2
9,2002-05-10,0
10,2002-05-11,2
11,2002-05-12,0
12,2002-05-13,0
13,2002-05-14,2
14,2002-05-15,2
15,2002-05-16,2
16,2002-05-17,2
17,2002-05-18,2
18,2002-05-19,0
19,2002-05-20,0
20,2002-05-21,1
21,2002-05-22,2
22,2002-05-23,0
23,2002-05-24,1
24,2002-05-25,0
25,2002-05-26,0
26,2002-05-27,0
27,2002-05-28,0
28,2002-05-29,1
29,2002-05-30,0
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
df = pd.read_csv('df.csv')
daily_category = df[['date','category']]
daily_category['weekday'] = pd.to_datetime(daily_category['date']).dt.day_name()
daily_category_plot = daily_category[['weekday','category']]
daily_category_plot[['category']].groupby('weekday').count().plot(kind='bar', legend=None)
plt.show()
However, I get the error below:
Traceback (most recent call last):
  File "day_plot.py", line 10, in <module>
    daily_category_plot[['category']].groupby('weekday').count().plot(kind='bar', legend=None)
  File "/home/..../.local/lib/python3.6/site-packages/pandas/core/frame.py", line 6525, in groupby
    dropna=dropna,
  File "/home/..../.local/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 533, in __init__
    dropna=self.dropna,
  File "/home/..../.local/lib/python3.6/site-packages/pandas/core/groupby/grouper.py", line 786, in get_grouper
    raise KeyError(gpr)
KeyError: 'weekday'
A further example below, where I extract the data manually, returns almost the expected output, except that the days are shown as numbers instead of weekday names.
Day,category1,category2,category3
Sunday,0,0,4
Monday,0,0,4
Tuesday,1,1,2
Wednesday,1,4,0
Thursday,0,2,3
Friday,1,1,2
Saturday,0,2,2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('df.csv')
ax = df.plot.bar(stacked=True, color=['green', 'red', 'blue'])
ax.set_xticklabels(labels=df.index, rotation=70, rotation_mode="anchor", ha="right")
ax.set_xlabel('')
ax.set_ylabel('Number of days')
plt.show()
CodePudding user response:
import pandas as pd
import matplotlib.pyplot as plt
d = """0,2002-05-01,2 1,2002-05-02,0 2,2002-05-03,0 3,2002-05-04,0 4,2002-05-05,0 5,2002-05-06,0 6,2002-05-07,0 7,2002-05-08,2 8,2002-05-09,2 9,2002-05-10,0 10,2002-05-11,2 11,2002-05-12,0 12,2002-05-13,0 13,2002-05-14,2 14,2002-05-15,2 15,2002-05-16,2 16,2002-05-17,2 17,2002-05-18,2 18,2002-05-19,0 19,2002-05-20,0 20,2002-05-21,1 21,2002-05-22,2 22,2002-05-23,0 23,2002-05-24,1 24,2002-05-25,0 25,2002-05-26,0 26,2002-05-27,0 27,2002-05-28,0 28,2002-05-29,1 29,2002-05-30,0"""
df = pd.DataFrame([v.split(',') for v in d.split(' ')], columns=['time', 'date', 'category'])
df.time, df.category = df.time.astype(int), df.category.astype(int)
data = df.copy()
data['weekday'] = pd.to_datetime(data['date']).dt.day_name()
data.drop(columns=['time', 'date'], inplace=True)
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
categories = sorted(list(set(df.category)))
counts = pd.DataFrame(0, index=weekdays, columns=categories)
for weekday, category in zip(data.weekday, data.category):
    # Increment the tally; plain assignment would cap every cell at 1.
    counts.loc[weekday, category] += 1
counts.plot.bar(stacked=True)
plt.show()
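The counting loop above can also be written with pd.crosstab, which tallies every (weekday, category) pair in one call. This is only a minimal sketch: it assumes the same `date` and `category` columns as in the question, and both the small inline sample and the helper name `count_by_weekday` are made up for illustration.

```python
import pandas as pd

WEEKDAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

def count_by_weekday(df):
    """Return a weekday x category table of counts (hypothetical helper)."""
    weekday = pd.to_datetime(df['date']).dt.day_name()
    counts = pd.crosstab(weekday, df['category'])
    # crosstab sorts the weekday names alphabetically; reindex restores
    # calendar order and fills weekdays that never occur with 0.
    return counts.reindex(WEEKDAYS, fill_value=0)

# Tiny illustrative sample, not the full data set from the question.
sample = pd.DataFrame({
    'date': ['2002-05-01', '2002-05-02', '2002-05-08', '2002-05-09', '2002-05-21'],
    'category': [2, 0, 2, 2, 1],
})
counts = count_by_weekday(sample)
print(counts)
```

The resulting table can be passed to `counts.plot.bar(stacked=True)` in the same way as the table built by the loop.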
CodePudding user response:
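This solution uses groupby on two columns and reshapes the resulting DataFrame with pivot. The result can be plotted with plot.bar(), but it has the wrong labels, so the index is replaced with the weekday names. Overwriting the index like this is only correct if the rows are already in Monday-to-Sunday order with all seven days present, for example when `weekday` holds the numeric day of the week (0 = Monday) from `dt.weekday`.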
ans = (df.groupby(["weekday", "category"])
         .size()
         .reset_index(name="sum")
         .pivot(index='weekday', columns='category', values='sum')
      )
ans.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
ans.plot.bar(stacked=True)
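The snippet above assumes `df` already has a `weekday` column. If that column holds day names from `dt.day_name()`, groupby returns the rows in alphabetical order, so relabelling the index would attach the wrong names. A rough sketch of one way around this is to reindex by name. It uses `unstack` in place of `reset_index` plus `pivot`, and the inline sample is invented for illustration. It also groups on the full frame: the KeyError in the question comes from selecting `[['category']]` before calling `groupby('weekday')`, which drops the column being grouped on.

```python
import pandas as pd

WEEKDAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Small illustrative sample in the question's layout (not the full data set).
df = pd.DataFrame({
    'date': ['2002-05-01', '2002-05-02', '2002-05-08', '2002-05-09', '2002-05-21'],
    'category': [2, 0, 2, 2, 1],
})
df['weekday'] = pd.to_datetime(df['date']).dt.day_name()

# 'weekday' must be a column of the object being grouped.
ans = (df.groupby(['weekday', 'category'])
         .size()
         .unstack(fill_value=0)              # categories become columns, missing pairs become 0
         .reindex(WEEKDAYS, fill_value=0))   # calendar order instead of alphabetical
print(ans)
```

Because the rows are selected by name, weekdays that do not occur in the data still appear, as rows of zeros, and `ans.plot.bar(stacked=True)` shows the weekday names on the x-axis without touching the tick labels.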


