Hi I have a table of data like below and I want to try do a rolling count that takes the date in the group by and the values of dates prior.
Table of data:
| Date | ID |
|---|---|
| 1/1/2020 | 123 |
| 2/1/2020 | 432 |
| 2/1/2020 | 5234 |
| 4/1/2020 | 543 |
| 5/1/2020 | 645 |
| 6/1/2020 | 231 |
My desired output is something like this:
| Date | count |
|---|---|
| 1/1/2020 | 1 |
| 2/1/2020 | 3 |
| 4/1/2020 | 4 |
| 5/1/2020 | 5 |
| 6/1/2020 | 6 |
I have tried the following but it doesn't seem to work on how I want it do it.
df[['id','date']].groupby('date').cumcount()
CodePudding user response:
Convert column to datetimes for correct ordering if aggregate GroupBy.size and add cumulative sum by Series.cumsum:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.groupby('Date').size().cumsum().reset_index(name='count')
print (df)
Date count
0 2020-01-01 1
1 2020-01-02 3
2 2020-01-04 4
3 2020-01-05 5
4 2020-01-06 6
