I have below data frame, I'm trying to calculate Average of "Price" column before and after Rate change, not considering in the average calculation the value of the Month when Rate changed. for example for Client "XY Ltd" rate Increased on Aug. so, Average of Price before rate change would be (5 6)/2 Average price after rate changed would be (-3-5)/2
below is the code I tried, but I don't know how to exclude the values of the Month when rate changed.
Also, I'm trying to create another column with the Subtraction of "Average after rate change" - "average before rate changed"
df['New Column'] = df.groupby(['Client', 'Rate'])['Price'].transform('mean')
| Date | Client | Rate | Price |
|---|---|---|---|
| 2022-06-01 | XY Ltd | 1.50 | 5 |
| 2022-07-01 | XY Ltd | 1.50 | 6 |
| 2022-08-01 | XY Ltd | 3.00 | 10 |
| 2022-09-01 | XY Ltd | 3.00 | -3 |
| 2022-10-01 | XY Ltd | 3.00 | -5 |
| 2022-06-01 | ZZ Inc | 1.60 | 3 |
| 2022-07-01 | ZZ Inc | 1.60 | 4 |
| 2022-08-01 | ZZ Inc | 4.00 | 12 |
| 2022-09-01 | ZZ Inc | 4.00 | -4 |
| 2022-10-01 | ZZ Inc | 4.00 | -6 |
desired output
| Date | Client | Rate | Price | New Colum | New column 2 |
|---|---|---|---|---|---|
| 2022-06-01 | XY Ltd | 1.50 | 5 | 5.5 | -9.5 |
| 2022-07-01 | XY Ltd | 1.50 | 6 | 5.5 | -9.5 |
| 2022-08-01 | XY Ltd | 3.00 | 10 | 0.0 | -9.5 |
| 2022-09-01 | XY Ltd | 3.00 | -3 | -4.0 | -9.5 |
| 2022-10-01 | XY Ltd | 3.00 | -5 | -4.0 | -9.5 |
| 2022-06-01 | ZZ Inc | 1.60 | 3 | 3.5 | -8.5 |
| 2022-07-01 | ZZ Inc | 1.60 | 4 | 3.5 | -8.5 |
| 2022-08-01 | ZZ Inc | 4.00 | 12 | 0.0 | -8.5 |
| 2022-09-01 | ZZ Inc | 4.00 | -4 | -5.0 | -8.5 |
| 2022-10-01 | ZZ Inc | 4.00 | -6 | -5.0 | -8.5 |
CodePudding user response:
I think this will do what you're asking:
df = df.join(df.groupby('Client')['Rate'].agg(['first']), on='Client')
df['rate_change'] = (df.Rate != df['first']) & (df.Rate.shift() == df['first'])
df['cs'] = df.groupby('Client')['rate_change'].cumsum() * (2 * ~df.rate_change - 1)
df2 = df.groupby(['Client','cs'])['Price'].mean().reset_index(level='cs')
df = df.join(df2[df2.cs == 0].rename(columns={'Price':'pre_change_avg'}).drop(columns='cs'),on='Client')
df = df.join(df2[df2.cs == 1].rename(columns={'Price':'post_change_avg'}).drop(columns='cs'),on='Client')
df['New Column'] = np.where((df.Rate == df['first']), df.pre_change_avg, np.where(df.rate_change, 0, df.post_change_avg))
df['New Column 2'] = df.post_change_avg - df.pre_change_avg
df = df.drop(columns=['first','rate_change','cs','pre_change_avg','post_change_avg'])
Output:
Date Client Rate Price New Column New Column 2
0 2022-06-01 XY Ltd 1.5 5 5.5 -9.5
1 2022-07-01 XY Ltd 1.5 6 5.5 -9.5
2 2022-08-01 XY Ltd 3.0 10 0.0 -9.5
3 2022-09-01 XY Ltd 3.0 -3 -4.0 -9.5
4 2022-10-01 XY Ltd 3.0 -5 -4.0 -9.5
5 2022-06-01 ZZ Ltd 1.6 3 3.5 -8.5
6 2022-07-01 ZZ Ltd 1.6 4 3.5 -8.5
7 2022-08-01 ZZ Ltd 4.0 12 0.0 -8.5
8 2022-09-01 ZZ Ltd 4.0 -4 -5.0 -8.5
9 2022-10-01 ZZ Ltd 4.0 -6 -5.0 -8.5
Explanation:
- create a new column
firstcontaining the startingRatefor eachClient - create a boolean column
rate_changeindicating the first row with a newRatefor eachClient - create a column
cscontaining 0 prior to theRatechange and 1 after it, with a -1 value for the first row of the newRate - put the average of
Pricefor each unique (Client,cs) pair into a new dataframedf2 - add new columns
pre_change_avgandpost_change_averageto the original df - set
New Columnusing these before and after theRatechange, and for the row of change use 0 - set
New Column 2to be the difference betweenpre_change_avgandpost_change_average.
CodePudding user response:
Here's an alternative:
# Part 1: `New Column`
grouped = df.groupby("Client")["Rate"].diff()
m, groups = grouped.fillna(0.0).eq(0.0), grouped.ne(0.0).cumsum()
df.loc[m, "New Column"] = df[m].groupby(groups)["Price"].transform("mean")
# Part 2: `New Column 2`
df = df.set_index(groups).join(
df.groupby(groups).last().groupby("Client")["New Column"].diff()
.rename("New Column 2").bfill().to_frame()
).reset_index(drop=True)
New Column: First groupdfoverClientand determine whereRatechanges. Then build a maskmto exclude the first rows after aRatechange, and determine the relevant groups for further.groupby. To getNew Column: group overgroupsand build the mean resp. means overPrice, but only onm.New Column 2: Build a minimal temporary dataframe with the uniquegroupsas index and theNew Columndiffs perClientas values, and then.joinit ondfongroups.
Result:
Date Client Rate Price New Column New Column 2
0 2022-06-01 XY Ltd 1.5 5 5.5 -9.5
1 2022-07-01 XY Ltd 1.5 6 5.5 -9.5
2 2022-08-01 XY Ltd 3.0 10 NaN -9.5
3 2022-09-01 XY Ltd 3.0 -3 -4.0 -9.5
4 2022-10-01 XY Ltd 3.0 -5 -4.0 -9.5
5 2022-06-01 ZZ Inc 1.6 3 3.5 -8.5
6 2022-07-01 ZZ Inc 1.6 4 3.5 -8.5
7 2022-08-01 ZZ Inc 4.0 12 NaN -8.5
8 2022-09-01 ZZ Inc 4.0 -4 -5.0 -8.5
9 2022-10-01 ZZ Inc 4.0 -6 -5.0 -8.5
