My df1 looks like this:It contains 3 unique project id.The date starts on 01-01-22 and ends on 01-12-28
| id | date | p50 | p90 |
|---|---|---|---|
| apv1 | 01-01-22 | 1000 | 1000 |
| apv2 | 01-01-22 | 1000 | 1000 |
| tsso | 01-01-22 | 1202 | 2005 |
| apv1 | 01-02-22 | 1000 | 2000 |
| apv2 | 01-02-22 | 1400 | 5000 |
| tsso | 01-02-22 | 200 | 1000 |
| . | . | . | . |
| . | . | . | . |
| . | . | . | . |
| apv1 | 01-11-28 | 305 | 400 |
| apv2 | 01-11-28 | 300 | 200 |
| tsso | 01-11-28 | 250 | 499 |
| apv1 | 01-12-28 | 100 | 290 |
| apv2 | 01-12-28 | 145 | 650 |
| tsso | 01-12-28 | 299 | 179 |
df2 looks like this:
| mth | month | pct |
|---|---|---|
| 01 | january | 0.042 |
| 02 | febuary | 0.055 |
| 03 | march | 0.089 |
| 04 | april | 0.097 |
| 05 | may | 0.10 |
| 06 | june | 0.11 |
| 07 | july | 0.127 |
| 08 | august | 0.114 |
| 09 | sept | 0.091 |
| 10 | oct | 0.068 |
| 11 | nov | 0.043 |
| 12 | dec | 0.038 |
I want to multiply each value p50 of df1 by the value pct of df2 with mth or month value equals to the month of the date in df1. Similarly for p90.
The Final df should look like this:
| id | date | p50 | p90 |
|---|---|---|---|
| apv1 | 01-01-22 | 1000*0.042 | 1000*0.042 |
| apv2 | 01-01-22 | 1000*0.042 | 1000*0.042 |
| tsso | 01-01-22 | 1202*0.042 | 2005*0.042 |
| apv1 | 01-02-22 | 1000*0.055 | 2000*0.055 |
| apv2 | 01-02-22 | 1400*0.055 | 5000*0.055 |
| tsso | 01-02-22 | 200*0.055 | 1000*0.055 |
| . | . | . | . |
| . | . | . | . |
| . | . | . | . |
| apv1 | 01-11-28 | 305*0.043 | 400*0.043 |
| apv2 | 01-11-28 | 300*0.043 | 200*0.043 |
| tsso | 01-11-28 | 250*0.043 | 499*0.043 |
| apv1 | 01-12-28 | 100*0.038 | 290*0.038 |
| apv2 | 01-12-28 | 145*0.038 | 650*0.038 |
| tsso | 01-12-28 | 299*0.038 | 179*0.038 |
CodePudding user response:
Simply assign month column using the date column. Then merge both data frames by month. Finally, run multiplication with needed columns:
final_df = (
df1.assign(mth = lambda x: pd.to_datetime(x["date"]).dt.strftime("%m"))
.merge(df2, on="mth")
.assign(
p50 = lambda x: x["p50"].mul(x["pct"]),
p90 = lambda x: x["p90"].mul(x["pct"])
)
)
CodePudding user response:
You should be able to use map to modify you data in place:
s = df2.set_index('mth')['pct']
pct = pd.to_datetime(df1['date']).dt.strftime('%m').map(s)
df1[['p50', 'p90']] *= pct
