I wanted to assign the unique id based on the value from the column. For ex. i have a table like this:
df = pd.DataFrame({'A': [0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,0,0,0,0,1,1,1]}
Eventually I would like to have my output table looks like this:
| A | id | |
|---|---|---|
| 1 | 0 | 1 |
| 2 | 0 | 1 |
| 3 | 0 | 1 |
| 4 | 0 | 1 |
| 5 | 0 | 1 |
| 6 | 0 | 1 |
| 7 | 1 | 2 |
| 8 | 1 | 2 |
| 9 | 1 | 2 |
| 10 | 1 | 2 |
| 11 | 1 | 2 |
| 12 | 1 | 2 |
| 13 | 0 | 3 |
| 14 | 0 | 3 |
| 15 | 0 | 3 |
| 16 | 0 | 3 |
| 17 | 0 | 3 |
| 18 | 0 | 3 |
| 19 | 1 | 4 |
| 20 | 1 | 4 |
| 21 | 1 | 4 |
| 22 | 0 | 5 |
| 23 | 0 | 5 |
| 24 | 0 | 5 |
| 25 | 0 | 5 |
| 26 | 1 | 6 |
| 27 | 1 | 6 |
| 28 | 1 | 6 |
I tried data.groupby(['a'], sort=False).ngroup() 1 but its not working as what I want. Any help and guidance will be appreciated! thanks!
CodePudding user response:
diff cumsum:
df['id'] = df.A.diff().ne(0).cumsum()
df
A id
0 0 1
1 0 1
2 0 1
3 0 1
4 0 1
5 0 1
6 0 1
7 1 2
8 1 2
9 1 2
10 1 2
11 1 2
12 1 2
13 0 3
14 0 3
15 0 3
16 0 3
17 0 3
18 0 3
19 1 4
20 1 4
21 1 4
22 0 5
23 0 5
24 0 5
25 0 5
26 1 6
27 1 6
28 1 6
CodePudding user response:
import pdrle
df["id"] = pdrle.get_id(df["A"]) 1
df
# A id
# 0 0 1
# 1 0 1
# 2 0 1
# 3 0 1
# 4 0 1
# 5 0 1
# 6 0 1
# 7 1 2
# 8 1 2
# 9 1 2
# 10 1 2
# 11 1 2
# 12 1 2
# 13 0 3
# 14 0 3
# 15 0 3
# 16 0 3
# 17 0 3
# 18 0 3
# 19 1 4
# 20 1 4
# 21 1 4
# 22 0 5
# 23 0 5
# 24 0 5
# 25 0 5
# 26 1 6
# 27 1 6
# 28 1 6
