I have a simple df like this:
   data
0     0
1     0
2     1
3     1
4     0
5     0
6     1
7     1
8     1
9     1
I want to count the length of each run of consecutive 1s, and then create a column that assigns that length to every row in the run (rows where data is 0 should get 0). It should look like this:
   data  count
0     0      0
1     0      0
2     1      2
3     1      2
4     0      0
5     0      0
6     1      4
7     1      4
8     1      4
9     1      4
Is there a simple pandas way to do this?
CodePudding user response:
Use np.where together with a shift/cumsum grouping key:
import numpy as np
s=df.groupby((df['data']!=df['data'].shift(1)).cumsum()).transform('count')
df['count']=np.where(df['data'].eq(1),s['data'],0)
How it works:
# Label each run of identical consecutive values: the comparison is True wherever a value differs from the previous row, and cumsum turns those change points into increasing group ids
(df['data']!=df['data'].shift(1)).cumsum()
# Group by those ids and count the rows in each group; transform broadcasts the count back to every row, and the result is saved as a staging dataframe
s=df.groupby((df['data']!=df['data'].shift(1)).cumsum()).transform('count')
# Create the new column in the original dataframe: take the count from the staging dataframe where data is 1, otherwise use 0
df['count']=np.where(df['data'].eq(1),s['data'],0)
Outcome:
   data  count
0     0      0
1     0      0
2     1      2
3     1      2
4     0      0
5     0      0
6     1      4
7     1      4
8     1      4
9     1      4
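To see the intermediate values, here is a minimal self-contained sketch of the same idea on the sample data from the question. The names run_id and run_len are only illustrative, and it uses transform('size') on the data column rather than transform('count') on the whole frame, which should give the same numbers here because there are no missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"data": [0, 0, 1, 1, 0, 0, 1, 1, 1, 1]})

# A new run starts wherever the value differs from the previous row;
# cumsum turns those change points into one id per run.
run_id = (df["data"] != df["data"].shift()).cumsum()

# Number of rows in each run, broadcast back to every row of that run.
run_len = df.groupby(run_id)["data"].transform("size")

# Keep the run length only for rows where data is 1, otherwise 0.
df["count"] = np.where(df["data"].eq(1), run_len, 0)

print(df.assign(run_id=run_id, run_len=run_len))
```

Printing run_id and run_len next to the data makes it easier to check that the runs of 0s are also measured and are only zeroed out in the final np.where step.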
CodePudding user response:
Another way:
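import pandas as pd
df = pd.DataFrame({'data': [0, 0, 1, 1, 0, 0, 1, 1, 1, 1]})
# Start with a count of 0 for every row
df['count'] = 0
# Boolean mask of the rows where data is 1
m = df['data'] == 1
# (~m).cumsum() only increases on rows where data is 0, so every run of 1s shares a single key; count the rows per key
df.loc[m, 'count'] = df.loc[m, 'data'].groupby(by=(~m).cumsum()).transform("count")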
It gives:
   data  count
0     0      0
1     0      0
2     1      2
3     1      2
4     0      0
5     0      0
6     1      4
7     1      4
8     1      4
9     1      4
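If this is needed for more than one column or value, the mask-based approach could be wrapped in a small helper. The following is only a sketch: run_lengths is a hypothetical name, not a pandas function, and it assumes the values can be compared with eq.

```python
import pandas as pd

def run_lengths(s: pd.Series, value=1) -> pd.Series:
    """Length of each consecutive run of `value`, repeated on every row of the run; 0 elsewhere."""
    m = s.eq(value)
    # The key only increases on non-matching rows, so each run of matches shares one key.
    key = (~m).cumsum()
    out = pd.Series(0, index=s.index)
    # Count the matching rows per key and write the counts back to the matching rows only.
    out.loc[m] = s[m].groupby(key[m]).transform("size")
    return out

df = pd.DataFrame({"data": [0, 0, 1, 1, 0, 0, 1, 1, 1, 1]})
df["count"] = run_lengths(df["data"])
print(df)
```

Because only the masked rows are grouped, the runs of 0s are never counted at all, which is the main difference from the shift-based grouping.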
