I wanted to apply the solutions found here and here for a problem I had. I wrote a function that had the following return line (aka line 1):
return df.loc[mask].groupby('HorseId')['Plassering'].apply(lambda x: x.shift().expanding().mean())
I tried to remove the apply function, rewriting the above line like this (aka line 2):
return df.loc[mask].groupby('HorseId')['Plassering'].shift().expanding().mean()
However, the second line completely ignored the groupby part and computed the average for the entire dataset. The first line worked perfectly.
What is the difference between line 1 and line 2?
CodePudding user response:
In the first version, the function passed to .apply() is called once per group, and the per-group results are combined into a single Series and returned. In effect, .shift().expanding().mean() is computed for each group separately.
In the second version, .shift() is called on the groupby object. It shifts the values within each group, but what it returns is an ordinary Series, not a groupby object, so the grouping is lost at that point. The subsequent .expanding().mean() therefore runs over the whole Series rather than per group.
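A small example makes the difference visible. This is only a sketch: the six-row frame below is made up for illustration, it has no mask, and group_keys=False is passed so that .apply() keeps the original row index regardless of pandas version. The last expression is one possible apply-free variant, which regroups the shifted Series by the same key before calling .expanding(); with a mask you would regroup by the HorseId column of the filtered frame instead.

```python
import pandas as pd

# Made-up data for illustration: two horses, three races each.
df = pd.DataFrame({
    "HorseId": [1, 1, 1, 2, 2, 2],
    "Plassering": [1.0, 3.0, 5.0, 2.0, 4.0, 6.0],
})

# group_keys=False keeps the original row index in the result of .apply().
g = df.groupby("HorseId", group_keys=False)["Plassering"]

# "Line 1": shift and expanding mean both run inside each group.
per_group = g.apply(lambda x: x.shift().expanding().mean())

# "Line 2": only the shift is per group; .shift() returns a plain Series,
# so the expanding mean runs across all rows.
whole = g.shift().expanding().mean()

# Apply-free variant: regroup the shifted Series before expanding.
# The result carries a (HorseId, row) MultiIndex, so drop the group level.
no_apply = (
    g.shift()
     .groupby(df["HorseId"])
     .expanding()
     .mean()
     .droplevel(0)
     .sort_index()
)

print(pd.DataFrame({"per_group": per_group, "whole": whole, "no_apply": no_apply}))
```

For the second horse, per_group restarts with NaN on its first row, while whole keeps averaging values carried over from the first horse.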
