I have a pandas DataFrame like the one below:
| Id | Date | Aid |
|---|---|---|
| 0 | 2022-01-01 | one |
| 1 | 2022-01-01 | two |
| 2 | 2022-01-05 | one |
| 3 | 2022-01-06 | three |
| 4 | 2022-01-02 | one |
| 5 | 2022-01-01 | one |
I would like to add a "Counter" column which, for each row, contains the number of rows with the same "Aid" and a "Date" less than or equal to that row's "Date".
So, for row 4, 'Counter' = 3 (rows 0, 4, 5).
The result should be:
| Id | Date | Aid | Counter |
|---|---|---|---|
| 0 | 2022-01-01 | one | 2 |
| 1 | 2022-01-01 | two | 1 |
| 2 | 2022-01-05 | one | 4 |
| 3 | 2022-01-06 | three | 1 |
| 4 | 2022-01-02 | one | 3 |
| 5 | 2022-01-01 | one | 2 |
Is there a way to do this count in pandas?
The original DataFrame contains millions of rows, so efficiency is very important.
CodePudding user response:
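Use rank with method='max' within each "Aid" group. Tied dates all receive the highest rank of the tie, which equals the number of rows in the group whose date is less than or equal to the row's date: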
pd.to_datetime(df['Date']).groupby(df['Aid']).rank(method='max').astype('int')
output:
0 2
1 1
2 4
3 1
4 3
5 2
Name: Date, dtype: int32
Assign the output to a Counter column:
df.assign(Counter=pd.to_datetime(df['Date']).groupby(df['Aid']).rank(method='max').astype('int'))
result:
Id Date Aid Counter
0 2022-01-01 one 2
1 2022-01-01 two 1
2 2022-01-05 one 4
3 2022-01-06 three 1
4 2022-01-02 one 3
5 2022-01-01 one 2
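For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the sample data from the question, applies the rank-based approach, and compares it against a brute-force count taken straight from the problem statement. The variable names (`dates`, `counter`, `result`, `expected`) are only illustrative, and the sketch assumes "Date" has no missing values: `rank` leaves NaN for missing dates by default, so the integer cast would fail.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Id": [0, 1, 2, 3, 4, 5],
    "Date": ["2022-01-01", "2022-01-01", "2022-01-05",
             "2022-01-06", "2022-01-02", "2022-01-01"],
    "Aid": ["one", "two", "one", "three", "one", "one"],
})

# Parse the strings once so dates compare chronologically
dates = pd.to_datetime(df["Date"])

# Within each Aid group, method="max" gives tied dates the highest rank
# of the tie, i.e. the number of rows with a date <= the current one
counter = dates.groupby(df["Aid"]).rank(method="max").astype("int64")
result = df.assign(Counter=counter)

# Brute-force version of the definition (quadratic, small data only),
# used here purely as a sanity check
expected = [
    int(((df["Aid"] == aid) & (dates <= d)).sum())
    for aid, d in zip(df["Aid"], dates)
]

print(result)
print(result["Counter"].tolist() == expected)
```

The brute-force check is far too slow for millions of rows; it is only there to confirm on a small sample that the grouped rank matches the requested count.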
