How do I get both the sum and an overlapping a list of dates that fall within a sliding window in a-CodePudding

I have a df like this:

date	name	amount
2021-07-01	'Chlorox'	1
2021-07-14	'Chlorox'	20
2021-07-29	'Chlorox'	700
2021-08-11	'Chlorox'	6000
2021-08-12	'Suriname'	3
2021-08-19	'Suriname'	10

and I'd like the sum of amounts within a one month period, plus all the dates that fall within that month range. So something like these results:

name	sum	dates
'Chlorox'	721	['2021-07-01', '2021-07-14', '2021-07-29']
'Chlorox'	6720	'2021-07-14', '2021-07-29', '2021-08-11'
'Suriname'	13	['2021-08-12', '2021-08-19']

I've been tinkering around with rolling() and groupby, but I've been struggling and unable to get overlapping dates!

CodePudding user response：

Use pd.to_datetime, Series.dt.to_period with Groupby.agg:

In [874]: df['date'] = pd.to_datetime(df['date']) # Convert date column to pandas datetime

In [923]: res = df.groupby(['name', df['date'].dt.to_period('M')], as_index=False).agg({'amount': sum, 'date': lambda x: list(x.dt.date)})

In [924]: res
Out[924]: 
         name  amount                                  date
0   'Chlorox'     721  [2021-07-01, 2021-07-14, 2021-07-29]
1   'Chlorox'    6000                          [2021-08-11]
2  'Suriname'      13              [2021-08-12, 2021-08-19]

CodePudding user response：

I think the objective is to group by the columns based on the month and name. So the result dataframe will look like this -

date  amount
name     Month                                            
Chlorox  7      2021-07-01, 2021-07-14, 2021-07-29     721
         8                              2021-08-11    6000
Suriname 8                  2021-08-12, 2021-08-19      13

If that is the case - then here is a solution.

>>> import numpy as np
>>> import pandas as pd
>>> d = {
    'date' : ['2021-07-01', '2021-07-14', '2021-07-29', '2021-08-11', '2021-08-12', '2021-08-19', '2020-08-11', '2020-08-12', '2020-08-19'],
    'name' : ['Chlorox', 'Chlorox', 'Chlorox', 'Chlorox', 'Suriname', 'Suriname','Chlorox', 'Suriname', 'Suriname'],
    'amount': [1,20,700,6000,3,10,6500,300,150]
}

>>> df = pd.DataFrame(d)

We will add a "Datetime" (dtype=datetime) column and a month (dtype=int) and a year column

>>> df['Datetime'] = pd.to_datetime( df.date)
>>> df['year'] = pd.DatetimeIndex(df['Datetime']).year
>>> df['month'] = pd.DatetimeIndex(df['Datetime']).month

>>> print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   date      9 non-null      object        
 1   name      9 non-null      object        
 2   amount    9 non-null      int64         
 3   Datetime  9 non-null      datetime64[ns]
 4   month     9 non-null      int64         
 5   year      9 non-null      int64         
dtypes: datetime64[ns](1), int64(3), object(2)
memory usage: 560.0  bytes
None

>>> print(df)

date      name  amount   Datetime  month  year
0  2021-07-01   Chlorox       1 2021-07-01      7  2021
1  2021-07-14   Chlorox      20 2021-07-14      7  2021
2  2021-07-29   Chlorox     700 2021-07-29      7  2021
3  2021-08-11   Chlorox    6000 2021-08-11      8  2021
4  2021-08-12  Suriname       3 2021-08-12      8  2021
5  2021-08-19  Suriname      10 2021-08-19      8  2021
6  2020-08-11   Chlorox    6500 2020-08-11      8  2020
7  2020-08-12  Suriname     300 2020-08-12      8  2020
8  2020-08-19  Suriname     150 2020-08-19      8  2020

>>> # we will have this aggregate function to deal with concatenating string values together, and, for numerical additions we will use numpy.sum()
>>> def f1(s):
    return ', '.join(s)

>>> df_result = df.groupby(['name', 'year', 'month']).agg({'date': f1, 'amount': np.sum})

>>> print(df_result)

                                       date  amount
name     year month                                            
Chlorox  2020 8                              2020-08-11    6500
         2021 7      2021-07-01, 2021-07-14, 2021-07-29     721
              8                              2021-08-11    6000
Suriname 2020 8                  2020-08-12, 2020-08-19     450
         2021 8                  2021-08-12, 2021-08-19      13

See if this helps or leave a comment