I have a Pandas dataframe with data like the following. I need to group by company and date and then compute a rolling average (e.g. over 2 days, 3 days, etc.) within each company.
import pandas as pd

df = pd.DataFrame({
    'company': ['abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz'],
    'dt': pd.to_datetime([
        '2022-01-01', '2022-01-02', '2022-01-03',
        '2022-01-31', '2022-02-01', '2022-02-02'
    ]),
    'volume': [1, 2, 3, 4, 5, 6]
})
If I do the following, the rolling average is computed across companies rather than within each one.
df.groupby(['company', 'dt'])['volume'].mean().rolling(2).mean()
The result is as follows, which is not what I want. When it gets to company xyz, it should only consider the dates/values associated with that company.
company  dt
abc      2022-01-01    NaN
         2022-01-02    1.5
         2022-01-03    2.5
xyz      2022-01-31    3.5
         2022-02-01    4.5
         2022-02-02    5.5
Additionally, I cannot specify the window as a frequency such as 2D or 3D; if I do, I get the following error: ValueError: window must be an integer 0 or greater.
Is the only way to do what I want to run separate code for each company?
df[df['company']=='abc'].groupby(['dt'])['volume'].mean().rolling('2D').mean()
df[df['company']=='xyz'].groupby(['dt'])['volume'].mean().rolling('2D').mean()
CodePudding user response:
There is a better way. Group on company and pass dt as the column on which to calculate the rolling window:
df['dt'] = pd.to_datetime(df['dt'])
df.groupby('company').rolling('2d', on='dt')['volume'].mean()
company  dt
abc      2022-01-01    1.0
         2022-01-02    1.5
         2022-01-03    2.5
xyz      2022-01-31    4.0
         2022-02-01    4.5
         2022-02-02    5.5
Name: volume, dtype: float64
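The ValueError in the question most likely arises because an offset window such as '2D' needs a datetime-like index or an on= column, and after grouping by both company and dt the index is a MultiIndex. Building on the approach above, here is a minimal sketch that first averages duplicate dates per company (as the question's groupby does) and then computes several window sizes at once. The helper name rolling_means and the column names mean_2D / mean_3D are made up for illustration and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz'],
    'dt': pd.to_datetime([
        '2022-01-01', '2022-01-02', '2022-01-03',
        '2022-01-31', '2022-02-01', '2022-02-02',
    ]),
    'volume': [1, 2, 3, 4, 5, 6],
})

# Collapse to one row per (company, dt) first, so that duplicate dates
# within a company are averaged before the rolling step. Sorting keeps
# dt increasing within each company, which offset windows require.
daily = (
    df.groupby(['company', 'dt'], as_index=False)['volume'].mean()
      .sort_values(['company', 'dt'])
)


def rolling_means(frame, windows=('2D', '3D')):
    """Hypothetical helper: one per-company rolling mean per window size."""
    out = {}
    for w in windows:
        out[f'mean_{w}'] = (
            frame.groupby('company')
                 .rolling(w, on='dt')['volume']
                 .mean()
        )
    # Each Series shares the same index, so they line up as columns.
    return pd.DataFrame(out)


result = rolling_means(daily)
print(result)
```

With the sample data, the mean_2D column should match the output shown above, and the mean_3D value for abc on 2022-01-03 averages all three abc rows. If the result is needed as ordinary columns, result.reset_index() should flatten the index.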
