I'm looking for an efficient way to create the 'average_units' column - which, for each group in Col B, goes row by row and computes the average units for previous 4 days while EXCLUDING 0 unit days. Simple example below:
Original DF
ColA ColB Units
2021-01-01 A 2
2021-01-02 A 0
2021-01-03 A 4
2021-01-04 A 8
2021-01-05 A 1
2021-01-06 A 0
2021-01-07 A 0
2021-01-08 A 7
2021-01-01 B 3
2021-01-02 B 6
2021-01-03 B 9
2021-01-04 B 0
2021-01-05 B 1
2021-01-06 B 7
2021-01-07 B 0
2021-01-08 B 5
Desired DF... Example of 'Avg Units' calculation: On 2021-01-08 in group A... calculation will look at the previous 4 days of units (0,0,1,8)...and will compute the average with the 0's excluded - thus 1 8 / 2 = 4.5
ColA ColB Units Avg Units
2021-01-01 A 2 NA
2021-01-02 A 0 NA
2021-01-03 A 4 NA
2021-01-04 A 8 NA
2021-01-05 A 1 7
2021-01-06 A 0 6.5
2021-01-07 A 0 6.5
2021-01-08 A 7 4.5
2021-01-01 B 3 NA
2021-01-02 B 6 NA
2021-01-03 B 9 NA
2021-01-04 B 0 NA
2021-01-05 B 1 9
2021-01-06 B 7 8
2021-01-07 B 0 8.5
2021-01-08 B 5 4
I know this can be done with some loops but it would be super complex and time consuming because I have 1000's of groups and more dates. Any suggestions is appreciated!
CodePudding user response:
import numpy as np
def rolling_average_custom(group):
group_parsed = group.iloc[0:4].replace(0, np.nan)
return group_parsed.mean()
tmp.groupby('ColB').rolling(min_periods=5, window=5).apply(rolling_average_custom)
returns
Units
ColB
A 0 NaN
1 NaN
2 NaN
3 NaN
4 4.666667
5 4.333333
6 4.333333
7 4.500000
B 8 NaN
9 NaN
10 NaN
11 NaN
12 6.000000
13 5.333333
14 5.666667
15 4.000000
I think you have some mistakes in your example. Like the first value should not be 7, but (8 4 2)/3. If that's the case, the solution should work.
To add to the dataframe:
rolling_output = tmp.groupby('ColB').rolling(min_periods=5, window=5).apply(rolling_average_custom)
tmp = tmp.set_index(['ColB'], append=True)
tmp = tmp.swaplevel()
tmp['average'] = rolling_output
Result:
ColA Units average
ColB
A 0 2021-01-01 2 NaN
1 2021-01-02 0 NaN
2 2021-01-03 4 NaN
3 2021-01-04 8 NaN
4 2021-01-05 1 4.666667
5 2021-01-06 0 4.333333
6 2021-01-07 0 4.333333
7 2021-01-08 7 4.500000
B 8 2021-01-01 3 NaN
9 2021-01-02 6 NaN
10 2021-01-03 9 NaN
11 2021-01-04 0 NaN
12 2021-01-05 1 6.000000
13 2021-01-06 7 5.333333
14 2021-01-07 0 5.666667
15 2021-01-08 5 4.000000
