With a DataFrame organized by DateTime index, what would be the best way to take a continually rolling mean of a column inside it?
Ex:
2021-12-20 09:30:00 6.0
2021-12-20 09:31:00 36.0
2021-12-20 09:32:00 37.0
2021-12-20 09:33:00 41.0
2021-12-20 09:34:00 73.0
My goal is to have a new column where the first value is the average of the first row (6), second row is the average of the first two rows, and so on. Is this achievable through Pandas, or do I have to iterate over the whole DataFrame? Additionally, would it be possible to then reset this column to the first volume at 9:30:00 every day?
CodePudding user response:
Use DataFrame.expanding with mean:
df['A'].expanding().mean()
Per groups starting by 09:30 is possible create helper groups:
import datetime
g = np.cumsum(df.index.time == datetime.time(9, 30))
df.groupby(g)['A'].expanding().mean()
print (df)
A
2021-12-20 09:30:00 6.0
2021-12-20 09:31:00 36.0
2021-12-20 09:32:00 37.0
2021-12-20 09:33:00 41.0
2021-12-20 09:34:00 73.0
2021-12-21 09:30:00 6.0
2021-12-21 09:31:00 36.0
2021-12-21 09:32:00 37.0
2021-12-21 09:33:00 41.0
2021-12-21 09:34:00 73.0
import datetime
g = np.cumsum(df.index.time == datetime.time(9, 30))
df['rolling mean'] = df['A'].expanding().mean()
df['reset'] = df.groupby(g)['A'].expanding().mean().droplevel(0)
print (df)
A rolling mean reset
2021-12-20 09:30:00 6.0 6.000000 6.000000
2021-12-20 09:31:00 36.0 21.000000 21.000000
2021-12-20 09:32:00 37.0 26.333333 26.333333
2021-12-20 09:33:00 41.0 30.000000 30.000000
2021-12-20 09:34:00 73.0 38.600000 38.600000
2021-12-21 09:30:00 6.0 33.166667 6.000000
2021-12-21 09:31:00 36.0 33.571429 21.000000
2021-12-21 09:32:00 37.0 34.000000 26.333333
2021-12-21 09:33:00 41.0 34.777778 30.000000
2021-12-21 09:34:00 73.0 38.600000 38.600000
CodePudding user response:
One option is the find the cumulative sum using cumsum and cumulative count using expanding_count and divide the two to find the rolling average:
out = s.cumsum() / s.expanding(1).count()
Output:
2021-12-20 09:30:00 6.000000
2021-12-20 09:31:00 21.000000
2021-12-20 09:32:00 26.333333
2021-12-20 09:33:00 30.000000
2021-12-20 09:34:00 38.600000
dtype: float64
