Pandas groupby cumulative sum and month-CodePudding

I need to sort my df by month with the cumulative sum for each user (about 5 or 6). Each line is a different log entry by the user, so users may have multiple entries on the same day.

        USER        DATE
1      user1  2021-04-15
4      user5  2021-04-15
5      user3  2021-04-15
6      user1  2021-04-15
14     user2  2021-04-16
...      ...         ...
2227   user4  2021-12-30
2228   user5  2021-12-30
2229   user3  2021-12-30
2230   user2  2021-12-30
2231   user1  2021-12-30

I would like to get something like this

         MONTH    USER  CUMSUM
1      2021-04   user1       3
2      2021-04   user2       5
3      2021-04   user3       2
4      2021-04   user4       0
5      2021-04   user5       1
...        ...     ...     ...
n      2021-12   user1     232
n 1    2021-12   user2     124
n 2    2021-12   user3     152
n 3    2021-12   user4     312
n 4    2021-12   user5     218

The objective is to later graph the cumulative sum by month for each user. I have a code that is already working but had to iterate on the df and count each entry for each month on a dict. Probably not the most efficient way. I tried using cumsum and groupby but so far without success.

CodePudding user response：

how about this one-liner :

df.groupby([pd.Grouper(key='DATE', freq='M'), 'USER'])['USER'].count().groupby(['USER']).cumsum()

CodePudding user response：

You can use pandas Grouper or more typically written pd.Grouper for the month, but you have to set an index if you don't already have one.

df.set_index('DATE').groupby([pd.Grouper(freq = 'M'),'USER']).sum()

CodePudding user response：

df['MONTH'] = pd.to_datetime(df['DATE'], format='%Y-%m-%d') # getting DATE to datetime
df['MONTH'] = df['MONTH'].apply(lambda x: x.strftime("%Y-%m")) # applying your format
df['count'] = 1 # adding a count column for cumsum()
df_try = df.groupby(['USER', 'MONTH']).sum().groupby(level=0).cumsum() # groupby and cumsum