Home > Software engineering >  Optimal way to take mean of Series
Optimal way to take mean of Series

Time:01-20

With a DataFrame organized by DateTime index, what would be the best way to take a continually rolling mean of a column inside it?

Ex:

2021-12-20 09:30:00       6.0
2021-12-20 09:31:00      36.0
2021-12-20 09:32:00      37.0
2021-12-20 09:33:00      41.0
2021-12-20 09:34:00      73.0

My goal is to have a new column where the first value is the average of the first row (6), second row is the average of the first two rows, and so on. Is this achievable through Pandas, or do I have to iterate over the whole DataFrame? Additionally, would it be possible to then reset this column to the first volume at 9:30:00 every day?

CodePudding user response:

Use DataFrame.expanding with mean:

df['A'].expanding().mean()

Per groups starting by 09:30 is possible create helper groups:

import datetime

g = np.cumsum(df.index.time == datetime.time(9, 30))
df.groupby(g)['A'].expanding().mean()

print (df)
                        A
2021-12-20 09:30:00   6.0
2021-12-20 09:31:00  36.0
2021-12-20 09:32:00  37.0
2021-12-20 09:33:00  41.0
2021-12-20 09:34:00  73.0
2021-12-21 09:30:00   6.0
2021-12-21 09:31:00  36.0
2021-12-21 09:32:00  37.0
2021-12-21 09:33:00  41.0
2021-12-21 09:34:00  73.0

import datetime

g = np.cumsum(df.index.time == datetime.time(9, 30))

df['rolling mean'] = df['A'].expanding().mean()
df['reset'] = df.groupby(g)['A'].expanding().mean().droplevel(0)
print (df)
                        A  rolling mean      reset
2021-12-20 09:30:00   6.0      6.000000   6.000000
2021-12-20 09:31:00  36.0     21.000000  21.000000
2021-12-20 09:32:00  37.0     26.333333  26.333333
2021-12-20 09:33:00  41.0     30.000000  30.000000
2021-12-20 09:34:00  73.0     38.600000  38.600000
2021-12-21 09:30:00   6.0     33.166667   6.000000
2021-12-21 09:31:00  36.0     33.571429  21.000000
2021-12-21 09:32:00  37.0     34.000000  26.333333
2021-12-21 09:33:00  41.0     34.777778  30.000000
2021-12-21 09:34:00  73.0     38.600000  38.600000

CodePudding user response:

One option is the find the cumulative sum using cumsum and cumulative count using expanding_count and divide the two to find the rolling average:

out = s.cumsum() / s.expanding(1).count()

Output:

2021-12-20 09:30:00     6.000000
2021-12-20 09:31:00    21.000000
2021-12-20 09:32:00    26.333333
2021-12-20 09:33:00    30.000000
2021-12-20 09:34:00    38.600000
dtype: float64
  •  Tags:  
  • Related