I have a dataframe that I am trying to use pandas.groupby on to get the cumulative sum. The values that I am grouping by show up in two different columns, and I am having trouble getting the groupby to work correctly. My starting dataframe is:
df = pd.DataFrame({'col_A': ['red', 'red', 'blue', 'red'], 'col_B': ['blue', 'red', 'blue', 'red'], 'col_A_qty': [1, 1, 1, 1], 'col_B_qty': [1, 1, 1, 1]})
col_A col_B col_A_qty col_B_qty
red blue 1 1
red red 1 1
blue blue 1 1
red red 1 1
The result I am trying to get is:
col_A col_B col_A_qty col_B_qty red_cumsum blue_cumsum
red blue 1 1 1 1
red red 1 1 3 1
blue blue 1 1 3 3
red red 1 1 5 3
I've tried:
df.groupby(['col_A', 'col_B'])['col_A_qty'].cumsum()
but this groups on the combination of col_A and col_B. How can I use pandas.groupby to calculate the cumulative sum of red and blue, regardless of if it's in col_A or col_B?
CodePudding user response:
Try two pivot
out = pd.pivot(df,columns='col_A',values='col_A_qty').fillna(0).cumsum().add(pd.pivot(df,columns='col_B',values='col_B_qty').fillna(0).cumsum(),fill_value=0)
Out[404]:
col_A blue red
0 1.0 1.0
1 1.0 3.0
2 3.0 3.0
3 3.0 5.0
df = df.join(out)
CodePudding user response:
A simple method is to define each cumsum column by two Series.cumsum, as follows:
df['red_cumsum'] = df['col_A'].eq('red').cumsum() df['col_B'].eq('red').cumsum()
df['blue_cumsum'] = df['col_A'].eq('blue').cumsum() df['col_B'].eq('blue').cumsum()
In each column col_A and col_B, check for values equal 'red' / 'blue' (results are boolean series). Then, we use Series.cumsum on these resultant boolean series to get the cumulative counts. You don't really need to use pandas.groupby in this use case.
If you have multiple items in col_A and col_B, you can also iterate through the unique item list, as follows:
for item in pd.unique(df['col_A'].tolist() df['col_B'].tolist()):
df[f'{item}_cumsum'] = df['col_A'].eq(item).cumsum() df['col_B'].eq(item).cumsum()
Result:
print(df)
col_A col_B col_A_qty col_B_qty red_cumsum blue_cumsum
0 red blue 1 1 1 1
1 red red 1 1 3 1
2 blue blue 1 1 3 3
3 red red 1 1 5 3
