What is the difference between these two code lines?

I wanted to apply the solutions found here and here for a problem I had. I wrote a function that had the following return line (aka line 1):

return df.loc[mask].groupby('HorseId')['Plassering'].apply(lambda x: x.shift().expanding().mean())

I tried to remove the apply function, rewriting the above line like this (aka line 2):

return df.loc[mask].groupby('HorseId')['Plassering'].shift().expanding().mean()

However, the second line completely ignored the groupby part and computed the average for the entire dataset. The first line worked perfectly.

What is the difference between line 1 and line 2?

Answer:

In the first version, the function passed to .apply() is called on each group separately, and the per-group results are then combined into a single Series and returned. In effect, .shift().expanding().mean() is computed within each group.

In the second version, .shift() is called on the groupby object. It shifts the values within each group, but what it returns is an ordinary Series that no longer carries any grouping. The subsequent .expanding().mean() therefore runs over that whole Series, across all horses at once, which is why the result looks as if the groupby had been ignored.
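A small sketch can make the difference visible. The toy data below is made up for illustration (only the column names HorseId and Plassering come from the question), and the mask is left out. The last variant, which regroups the shifted Series before calling .expanding(), is one possible way to avoid apply, not necessarily the best one for the real data.

```python
import pandas as pd

# Hypothetical toy data: two horses with three races each.
df = pd.DataFrame({
    "HorseId": [1, 1, 1, 2, 2, 2],
    "Plassering": [1.0, 3.0, 5.0, 2.0, 4.0, 6.0],
})

# group_keys=False keeps the original row index on the apply result,
# so the two versions can be compared row by row.
grouped = df.groupby("HorseId", group_keys=False)["Plassering"]

# Line 1: shift and expanding mean are both computed inside each group.
per_group = grouped.apply(lambda x: x.shift().expanding().mean())

# Line 2: only shift is group-aware. It returns a plain Series,
# so expanding().mean() runs over all rows regardless of horse.
shifted = grouped.shift()
whole = shifted.expanding().mean()

# Possible apply-free variant: regroup the shifted Series by HorseId,
# then drop the group level that groupby(...).expanding() adds to the index.
regrouped = (
    shifted.groupby(df["HorseId"])
    .expanding()
    .mean()
    .droplevel(0)
    .sort_index()
)

print(pd.DataFrame({"line_1": per_group, "line_2": whole, "regrouped": regrouped}))
```

With this data, line 1 gives NaN, 1, 2 for the first horse and NaN, 2, 3 for the second, while line 2 gives NaN, 1, 2, 2, 2, 2.5: from the fourth row onward the running mean still includes the first horse's results.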