I have a DataFrame with two column identifier Date and ID and try to calculate the rolling mean of the Score over three days for each ID individually over time.
Date ID Score
2022-01-02 1 1
2022-01-03 1 2
2022-01-04 1 1
2022-01-05 1 3
2022-01-02 2 5
2022-01-03 2 6
2022-01-04 2 7
2022-01-05 2 3
So far I only know how to create a rolling mean over one particular column without considering the second identifier ID:
df[RollingMean3]=df[Score].rolling(3).mean()
I try to get
Date ID Score ScoreRollingMean3
2022-01-02 1 1 NaN
2022-01-03 1 2 NaN
2022-01-04 1 1 1.33
2022-01-05 1 3 2
2022-01-02 2 5 NaN
2022-01-03 2 6 NaN
2022-01-04 2 7 6
2022-01-05 2 3 5.33
For reproducability:
df = pd.DataFrame({
'Date':['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
'ID':[1, 1, 1, 1, 2, 2, 2, 2],
'Score':[1, 2, 1, 3, 5, 6, 7, 3]})
Thank you very much
CodePudding user response:
Use DataFrame.groupby with Series.droplevel if datetimes are consecutive:
df['RollingMean3']=df.groupby('ID')['Score'].rolling(3).mean().droplevel(0)
print (df)
Date ID Score RollingMean3
0 2022-01-02 1 1 NaN
1 2022-01-03 1 2 NaN
2 2022-01-04 1 1 1.333333
3 2022-01-05 1 3 2.000000
4 2022-01-02 2 5 NaN
5 2022-01-03 2 6 NaN
6 2022-01-04 2 7 6.000000
7 2022-01-05 2 3 5.333333
General solution with rolling window 3D is possible with DatetimeIndex:
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
df['RollingMean3']=df.groupby('ID')['Score'].rolling('3D').mean().droplevel(0)
print (df)
ID Score RollingMean3
Date
2022-01-02 1 1 1.000000
2022-01-03 1 2 1.500000
2022-01-04 1 1 1.333333
2022-01-05 1 3 2.000000
2022-01-02 2 5 5.000000
2022-01-03 2 6 5.500000
2022-01-04 2 7 6.000000
2022-01-05 2 3 5.333333
