How do you forward fill in pandas within a group?-CodePudding

I have a pandas DataFrame of the form

quarter	user_id	# Sessions
2022 Q1	1	9
2021 Q4	1
2021 Q3	1
2022 Q1	2	8
2021 Q4	2
2021 Q3	2

And I'd like to forward fill the # Sessions column within each user_id to get a table like:

quarter	user_id	# Sessions
2022 Q1	1	9
2021 Q4	1	9
2021 Q3	1	9
2022 Q1	2	8
2021 Q4	2	8
2021 Q3	2	8

I can do this with

x.groupby('user_id').apply(lambda x: x.fillna({'# Sessions': x['# Sessions'].ffill()}))

But I have reasonably big data (~10k users), and this is a very common operation in the codebase.

Is .groupby necessary or is there a more performant way to achieve the same?

CodePudding user response：

You could also try transforming first (since first skips NaNs):

df['# Sessions'] = df.groupby('user_id')['# Sessions'].transform('first')

Output:

   quarter  user_id  # Sessions
0  2022 Q1        1         9.0
1  2021 Q4        1         9.0
2  2021 Q3        1         9.0
3  2022 Q1        2         8.0
4  2021 Q4        2         8.0
5  2021 Q3        2         8.0

CodePudding user response：

We can avoid use groupby, First we use DataFrame.sort_values, Then we just use ffill except for the rows where user_id change occurs.

df2 = df.sort_values('user_id')
df['# Sessions'] = df2['# Sessions'].ffill()\
    .where(df2['user_id'].eq(df2['user_id'].shift()),
           df2['# Sessions'])

CodePudding user response：

This should be much faster than what you're doing:

x['# Sessions'] = x.groupby('user_id')['# Sessions'].ffill()