Suppose I have the following table:
| id | name | mail | date |
|---|---|---|---|
| 1 | Sta | [email protected] | 11.11.22 |
| 2 | Danny | [email protected] | 11.11.22 |
| 3 | Elle | [email protected] | 11.11.22 |
| 4 | Elle | [email protected] | 11.11.22 |
| 5 | Elle | [email protected] | 12.11.22 |
What is the best way to create an incremental counter for repeating observations for the feature subset [name, date]?
Desired output:
| id | name | mail | date | counter |
|---|---|---|---|---|
| 1 | Sta | [email protected] | 11.11.22 | 1 |
| 2 | Danny | [email protected] | 11.11.22 | 1 |
| 3 | Elle | [email protected] | 11.11.22 | 1 |
| 4 | Elle | [email protected] | 11.11.22 | 2 |
| 5 | Elle | [email protected] | 12.11.22 | 1 |
Edit: The table is already sorted correctly, so the duplicates appear on consecutive rows.
CodePudding user response:
df['counter'] = df.groupby(['name', 'date']).cumcount() + 1
df
id name mail date counter
0 1 Sta [email protected] 11.11.22 1
1 2 Danny [email protected] 11.11.22 1
2 3 Elle [email protected] 11.11.22 1
3 4 Elle [email protected] 11.11.22 2
4 5 Elle [email protected] 12.11.22 1
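For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the table from the question; the mail values are hypothetical placeholders, since the real addresses are not visible in the post.

```python
import pandas as pd

# Rebuild the question's table. The mail values are made-up placeholders
# (assumption): the original addresses are not shown in the post.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Sta", "Danny", "Elle", "Elle", "Elle"],
    "mail": [
        "sta@example.com",
        "danny@example.com",
        "elle@example.com",
        "elle@example.com",
        "elle@example.com",
    ],
    "date": ["11.11.22", "11.11.22", "11.11.22", "11.11.22", "12.11.22"],
})

# cumcount() numbers the rows inside each (name, date) group starting at 0,
# in their existing row order, so adding 1 gives a 1-based counter.
df["counter"] = df.groupby(["name", "date"]).cumcount() + 1

print(df["counter"].tolist())  # [1, 1, 1, 2, 1]
```

Because the numbering is done per group, this should also work when the duplicates are not on adjacent rows; each repeat of a (name, date) pair simply gets the next number for that pair.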
