I tried looking for the answer for my question but can not find exactly what I'm looking for. I have revenue for a customer and product by day(each customer does not have rev per product each day) and am looking to know the 10 days revenue prior and post of that activity date for that customer, product
Current Input
| customer | product | activity_date | revenue |
|---|---|---|---|
| 43 | product a | 2022-01-04 | 45.24 |
| 89 | product b | 2021-06-17 | 9294.21 |
| 35 | product a | 2020-05-20 | 75.12 |
Expected Output
| customer | product | activity_date | revenue | rev_10_d_prior | rev_10_d_post |
|---|---|---|---|---|---|
| 43 | product a | 2022-01-04 | 45.24 | ||
| 89 | product b | 2021-06-17 | 9294.21 | ||
| 35 | product a | 2020-05-20 | 75.12 |
CodePudding user response:
Try this code to add 2 new columns to your dataframe
dataframe.insert(4,'rev_10_d_prior','')
dataframe.insert(5,'rev_10_d_post','')
CodePudding user response:
If df is your dataframe and the column activity_date is a datetime column (sorted) then you could try to use .rolling():
df["rev_10_d_prior"] = (
df.groupby(["customer", "product"], as_index=False)
.rolling("11D", on="activity_date", closed="left").sum()
.revenue
)
df["rev_10_d_post"] = (
df[::-1].groupby(["customer", "product"], as_index=False)
.rolling("11D", on="activity_date", closed="left").sum()[::-1]
.revenue
)
