How to count the number of times a condition consecutively exists across rows in a column in pandas

I have a simple df with a single column, data, of 0s and 1s: 0, 0, 1, 1, 0, 0, 1, 1, 1, 1.

I want to count the length of each run of consecutive 1s, and then create a column that assigns that length to every row in the run (rows where data is 0 get 0). It should look like this:

    data   count
0    0       0
1    0       0
2    1       2
3    1       2
4    0       0
5    0       0
6    1       4
7    1       4
8    1       4
9    1       4

Is there a simple pandas way to do this?

CodePudding user response：

Use np.where with a grouping key built from shift and cumsum:

import numpy as np

s = df.groupby((df['data'] != df['data'].shift(1)).cumsum()).transform('count')
df['count'] = np.where(df['data'].eq(1), s['data'], 0)

How it works

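# Give each run of consecutive equal values its own group id: the comparison with the shifted column is True at the start of every run, and cumsum turns those flags into an increasing id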

(df['data']!=df['data'].shift(1)).cumsum()

# Group by the ids generated above and count the rows in each group, saving the result into a staging dataframe s
s = df.groupby((df['data'] != df['data'].shift(1)).cumsum()).transform('count')


# Create a new column in the original dataframe: take the count from the staging dataframe where data is 1, otherwise set the count to 0

df['count']=np.where(df['data'].eq(1),s['data'],0)
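
To see the intermediate values of the steps above, here is a small self-contained sketch on the sample data from the question. The names starts, run_id and run_len are just illustrative, they are not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': [0, 0, 1, 1, 0, 0, 1, 1, 1, 1]})

# True wherever the value differs from the previous row, i.e. at the start of each run
starts = df['data'] != df['data'].shift(1)

# Running total of run starts: every run of equal values gets its own id
run_id = starts.cumsum()
print(run_id.tolist())       # [1, 1, 2, 2, 3, 3, 4, 4, 4, 4]

# Length of each run, broadcast back to every row of that run
run_len = df.groupby(run_id)['data'].transform('count')
print(run_len.tolist())      # [2, 2, 2, 2, 2, 2, 4, 4, 4, 4]

# Keep the run length only where data is 1, otherwise 0
df['count'] = np.where(df['data'].eq(1), run_len, 0)
print(df['count'].tolist())  # [0, 0, 2, 2, 0, 0, 4, 4, 4, 4]
```

Note that run_len also holds the lengths of the runs of 0s, which is why the final np.where step is needed.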

Outcome:

   data  count
0     0      0
1     0      0
2     1      2
3     1      2
4     0      0
5     0      0
6     1      4
7     1      4
8     1      4
9     1      4

CodePudding user response：

Another way:

import pandas as pd
df = pd.DataFrame({'data': [0, 0, 1, 1, 0, 0, 1, 1, 1, 1]})

df['count'] = 0
m = df['data'] == 1
df.loc[m, 'count'] = df.loc[m, 'data'].groupby(by=(~m).cumsum()).transform("count")

It gives:

   data  count
0     0      0
1     0      0
2     1      2
3     1      2
4     0      0
5     0      0
6     1      4
7     1      4
8     1      4
9     1      4
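
In case it is not obvious why (~m).cumsum() works as a grouping key here, this is a rough sketch of what it evaluates to on the same sample data. The names key and ones_run_len are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'data': [0, 0, 1, 1, 0, 0, 1, 1, 1, 1]})
m = df['data'] == 1

# ~m is True on the 0 rows, so the running sum only increases there
# and stays constant across each run of 1s
key = (~m).cumsum()
print(key.tolist())           # [1, 2, 2, 2, 3, 4, 4, 4, 4, 4]

# Restricted to the rows where data is 1, each run shares one key value
print(key[m].tolist())        # [2, 2, 4, 4, 4, 4]

# Counting per key on those rows gives the run length for every 1
ones_run_len = df.loc[m, 'data'].groupby(key[m]).transform('count')
print(ones_run_len.tolist())  # [2, 2, 4, 4, 4, 4]
```

Because only the rows selected by m are grouped, the 0 rows never contribute to a count and simply keep the initial value 0.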