I have a dataframe df that looks like this:
column_a ...
1
1
1
2
3
3
3
3
3
I now want to group the dataframe by column_a, but no resulting group should contain more than s rows.
So, for s=2 the groups should be:
(1,1), (1), (2), (3,3), (3,3), (3).
I have this working with a simple loop over the grouped dataframe (df.groupby(['column_a'])) that splits any group that is too big, but I have the feeling there is a shorter and more elegant way to do this.
Does anyone know a short and elegant method to group with a limited group size?
CodePudding user response:
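It seems like you could group by a together with the floor division of the per-group cumcount by s. Within each value of a, rows 0 to s-1 then get chunk 0, the next s rows get chunk 1, and so on.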
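import pandas as pd
df = pd.DataFrame({'a': [1, 1, 1, 2, 3, 3, 3, 3, 3]})
s = 2  # maximum group size
# cumcount() numbers the rows within each 'a' group; // s turns that number into a chunk index
print(df.groupby(['a', df.groupby('a').cumcount() // s]).size())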
Output
a
1  0    2
   1    1
2  0    1
3  0    2
   1    2
   2    1
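To make it easier to see why this works, here is a small sketch that spells out the intermediate steps and then collects the actual groups rather than just their sizes. The names pos, chunk and groups are only illustrative, and it assumes the same toy dataframe as above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 3, 3, 3, 3, 3]})
s = 2  # maximum group size

# Position of each row inside its 'a' group, starting at 0
pos = df.groupby('a').cumcount()

# Floor division turns the position into a chunk index:
# positions 0..s-1 -> chunk 0, positions s..2s-1 -> chunk 1, ...
chunk = pos // s

# Group by both keys and collect the values of each subgroup
groups = [g['a'].tolist() for _, g in df.groupby(['a', chunk])]
print(groups)
# [[1, 1], [1], [2], [3, 3], [3, 3], [3]]
```

This matches the groups asked for in the question with s=2. Iterating over df.groupby(['a', chunk]) gives you each sub-dataframe, so you can also use apply or agg on it instead of building a list.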
