I would like to sum consecutive rows values between 2 conditions. Here is my data frame:
df = pd.DataFrame({'A': ["yes","no","no","no","no","yes","yes","no","no","no","yes"],'B':["no","no","no","no","yes","yes","no","no","no","yes","yes"],'C':[2,5,1,4,6,13,7,8,3,9,1]},index=[0, 1, 2, 3,4,5,6,7,8,9,10])
In fact, when A="yes" and B="no", I would like to start adding rows values until A="no" and B="yes".
I would like to get the following result:
df1 = pd.DataFrame({'A': ["yes","yes"],'B':["no","no"],'C':[12,18]},index=[0, 6])
CodePudding user response:
You can use a boolean mask twice to get your desired outcome.
In the first mask, you filter the rows whose sum you'll need to find. In the second, you find the sum.
mask = df.apply(lambda x: 1 if (x['A']=='yes')&(x['B']=='no') else (-1 if (x['A']=='no')&(x['B']=='yes') else 0), axis=1).cumsum().astype(bool)
mask2 = df[mask].apply(lambda x: 1 if (x['A']=='yes') & (x['B']=='no') else 0, axis=1)
out = df[mask][mask2.astype(bool)]
out['C'] = df[mask].groupby(mask2.cumsum())['C'].sum().to_numpy()
Output:
A B C
0 yes no 12
6 yes no 18
