Home > Enterprise >  Calculate grouped average of columns in pandas dataframe for non zero elements based on time
Calculate grouped average of columns in pandas dataframe for non zero elements based on time

Time:01-25

Hello I would like to calculate the average of days column for each Name group at each Time. The average at each Time for each group should be calculated only based on the elements of the rows of the Days which are larger than 0. Any feedback would be highly appreciated ..

Name       Time        Days    Average
 
John    2021-12-02      0        0
John    2021-12-03      2        0 
John    2021-12-05      9        2
John    2021-12-07      0       5.5
John    2021-12-10     10       5.5
Larry   2021-12-02     20        0
Jim     2021-12-09     20        0
Jim     2021-12-10     20       20
Jim     2021-12-12     40       20
Jim     2021-12-12      0       26.6
Juli    2021-11-09      0        0
Juli    2021-11-10      0        0
Juli    2021-11-12     40        0
Juli    2021-11-18      0       40
Juli    2021-11-12      0       40
Juli    2021-11-18      2       40
Juli    2021-11-19      0       21

CodePudding user response:

First replace 0 to missing values and then use GroupBy.transform with lambda function for Series.expanding with mean and Series.shift, last replace NaNs to 0 by Series.fillna:

df['Avg'] = (df.assign(Days = df['Days'].replace(0,np.nan))
               .groupby('Name')['Days']
               .transform(lambda x: x.expanding().mean().shift())
               .fillna(0))
print (df)
     Name        Time  Days  Average        Avg
0    John  2021-12-02     0      0.0   0.000000
1    John  2021-12-03     2      0.0   0.000000
2    John  2021-12-05     9      2.0   2.000000
3    John  2021-12-07     0      5.5   5.500000
4    John  2021-12-10    10      5.5   5.500000
5   Larry  2021-12-02    20      0.0   0.000000
6     Jim  2021-12-09    20      0.0   0.000000
7     Jim  2021-12-10    20     20.0  20.000000
8     Jim  2021-12-12    40     20.0  20.000000
9     Jim  2021-12-12     0     26.6  26.666667
10   Juli  2021-11-09     0      0.0   0.000000
11   Juli  2021-11-10     0      0.0   0.000000
12   Juli  2021-11-12    40      0.0   0.000000
13   Juli  2021-11-18     0     40.0  40.000000
14   Juli  2021-11-12     0     40.0  40.000000
15   Juli  2021-11-18     2     40.0  40.000000
16   Juli  2021-11-19     0     21.0  21.000000

CodePudding user response:

First of all be sure that the "Days" column is numeric. Then:

df.loc[df.Days>0].groupby(["Name","Time"]).mean()
  •  Tags:  
  • Related