Calculate the average of a column according to another column

Say I have this dummy pandas df:

    Feature1    Feature2
0       X           0
1       X           0
2       Y           0
3       Y           1
4       Y           1
5       X           1
6       Y           0
7       X           1
8       Y           1
9       X           0

How do I calculate the average of Feature2 only for the rows where Feature1 is X, and then again only for the rows where Feature1 is Y? I figure groupby is the way to do it, but it's not working for me.

My attempt (making a function to find the difference in the two averages):

def diff_of_avg(df, column_name, groupby_var):
    groupby_var = df.groupby(groupby_var)
    avgs = groupby_var[column_name].mean()
    return avgs.loc['1'] - avgs.loc['0']

where groupby_var is Feature2

and column_name is Feature1

CodePudding user response:

You can indeed use groupby():

df2 = df.groupby('Feature1').mean()

Output:

          Feature2
Feature1
X              0.4
Y              0.6

Docs for mean() give some examples as well.
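
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the dummy frame from the question and takes the per-group mean. It assumes the second column is named Feature2, as in the question text, and it selects that column explicitly so the mean is only ever taken over numeric data. The name group_means is just an illustrative choice.

```python
import pandas as pd

# Rebuild the dummy frame from the question (column name Feature2 is assumed).
df = pd.DataFrame({
    "Feature1": ["X", "X", "Y", "Y", "Y", "X", "Y", "X", "Y", "X"],
    "Feature2": [0, 0, 0, 1, 1, 1, 0, 1, 1, 0],
})

# Group rows by Feature1, then average Feature2 within each group.
# Selecting the column first returns a Series indexed by the group labels.
group_means = df.groupby("Feature1")["Feature2"].mean()

print(group_means)
print(group_means["X"], group_means["Y"])
```

Because the result is a Series indexed by X and Y, a single group's average can be looked up by label, as in the last line.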

To find the difference in the averages of X and Y, you can do this:

diffOfAverages = df.groupby('Feature1').mean().diff().iloc[-1,-1]

Output:

0.19999999999999996
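
As for the function in the question, two things look likely to trip it up: it groups by Feature2 and averages Feature1 (which holds strings), and it looks up the labels '1' and '0' as strings, while the values in Feature2 are integers. Below is a rough sketch of how the function could be rearranged, not the only way to do it. The extra parameters first_label and second_label are hypothetical additions so the caller chooses which group is subtracted from which.

```python
import pandas as pd

def diff_of_avg(df, column_name, groupby_var, first_label, second_label):
    """Return mean of column_name in group first_label minus the mean in group second_label.

    first_label and second_label are hypothetical parameters added for illustration.
    """
    # Average the numeric column within each group of groupby_var.
    avgs = df.groupby(groupby_var)[column_name].mean()
    # Look the two groups up by their actual labels (here the strings "X" and "Y").
    return avgs.loc[first_label] - avgs.loc[second_label]

# Dummy frame from the question (column name Feature2 is assumed).
df = pd.DataFrame({
    "Feature1": ["X", "X", "Y", "Y", "Y", "X", "Y", "X", "Y", "X"],
    "Feature2": [0, 0, 0, 1, 1, 1, 0, 1, 1, 0],
})

# Group by Feature1 and average Feature2, i.e. the roles are swapped
# compared with the call described in the question.
result = diff_of_avg(df, "Feature2", "Feature1", "Y", "X")
print(result)
```

Calling it with "Y" and "X" in that order gives the same sign as the diff() one-liner above, since groupby sorts the group labels and diff() subtracts the previous row from the current one.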