Home > database >  Sum consecutive rows' values between 2 conditions using pandas
Sum consecutive rows' values between 2 conditions using pandas

Time:01-04

I would like to sum consecutive rows values between 2 conditions. Here is my data frame:

df = pd.DataFrame({'A': ["yes","no","no","no","no","yes","yes","no","no","no","yes"],'B':["no","no","no","no","yes","yes","no","no","no","yes","yes"],'C':[2,5,1,4,6,13,7,8,3,9,1]},index=[0, 1, 2, 3,4,5,6,7,8,9,10])

In fact, when A="yes" and B="no", I would like to start adding rows values until A="no" and B="yes". I would like to get the following result:

df1 = pd.DataFrame({'A': ["yes","yes"],'B':["no","no"],'C':[12,18]},index=[0, 6])

CodePudding user response:

You can use a boolean mask twice to get your desired outcome.

In the first mask, you filter the rows whose sum you'll need to find. In the second, you find the sum.

mask = df.apply(lambda x: 1 if (x['A']=='yes')&(x['B']=='no') else (-1 if (x['A']=='no')&(x['B']=='yes') else 0), axis=1).cumsum().astype(bool)
mask2 = df[mask].apply(lambda x: 1 if (x['A']=='yes') & (x['B']=='no') else 0, axis=1)
out = df[mask][mask2.astype(bool)]
out['C'] = df[mask].groupby(mask2.cumsum())['C'].sum().to_numpy()

Output:

     A   B   C
0  yes  no  12
6  yes  no  18
  •  Tags:  
  • Related