So I have this dataframe:
data = {'value':[1,1,1,0,1,0,1,0,0,0,0,0,1,1,1,0,0,1,0,1]}
df = pd.DataFrame(data)
| row | value |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
| 4 | 1 |
| 5 | 0 |
| 6 | 1 |
| 7 | 0 |
| 8 | 0 |
| 9 | 0 |
| 10 | 0 |
| 11 | 0 |
I would like to add another column called "Cumulative" which will count how many times a number appears consecutively and will stop counting when the value is no longer the same as the one before. Then it should start counting again. This would be the resulting dataframe:
| row | value | Cumulative |
|---|---|---|
| 0 | 1 | 0 |
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 0 | 0 |
| 4 | 1 | 0 |
| 5 | 0 | 0 |
| 6 | 1 | 0 |
| 7 | 0 | 0 |
| 8 | 0 | 1 |
| 9 | 0 | 2 |
| 10 | 0 | 3 |
| 11 | 0 | 4 |
| 12 | 1 | 0 |
I have tried a couple of built-in functions like where, mask, and cumsum, but I'm honestly clueless when it comes to iterating and creating for loops and I'm positive that this is probably where the answer lies. Is there a function that I am not aware of that could do this? Or is there no avoiding for loops?
CodePudding user response:
Try:
df.groupby(df['value'].diff().ne(0).cumsum()).cumcount()
Output:
0 0
1 1
2 2
3 0
4 0
5 0
6 0
7 0
8 1
9 2
10 3
11 4
12 0
13 1
14 2
15 0
16 1
17 0
18 0
19 0
dtype: int64
CodePudding user response:
This code adds the collum "Cumulative" and calculates the number of consecutive
# add col named "Cumulative"
df['Cumulative'] = [0 for i in range(len(df))]
last = 0
# if value is 1, add 1 to the vlaue in the col named "Cumulative"
for i in range(len(df)):
if df['value'][i] == 1:
df['Cumulative'][i] = last 1
last = 1
else:
df['Cumulative'][i] = df['Cumulative'][i]
last = 0
print(df)
the output is this:
value Cumulative
0 1 1
1 1 2
2 1 3
3 0 0
4 1 1
5 0 0
6 1 1
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 1 1
13 1 2
14 1 3
15 0 0
16 0 0
17 1 1
18 0 0
19 1 1
EDIT
this code now works for all different numbers.
import pandas as pd
data = {'value': [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]}
df = pd.DataFrame(data)
# add col named "Cumulative"
df['Cumulative'] = [0 for i in range(len(df))]
last_count = 0
# if value is 1, add 1 to the vlaue in the col named "Cumulative"
last_num = df['value'][0]
for i in range(len(df)):
if df['value'][i] == last_num:
df['Cumulative'][i] = last_count 1
last_count = 1
else:
df['Cumulative'][i] = 1
last_num = df['value'][i]
last_count = 1
print(df)
The result:
value Cumulative
0 1 1
1 1 2
2 1 3
3 0 1
4 1 1
5 0 1
6 1 1
7 0 1
8 0 2
9 0 3
10 0 4
11 0 5
12 1 1
13 1 2
14 1 3
15 0 1
16 0 2
17 1 1
18 0 1
19 1 1
