How can I get df3, that is a multiplication of the values in df1 by the coefficients in df2 matching on the type and year.
df1:
year type_1 type_2 type_3 pepse
0 2011 1 1 1 1
1 2011 2 2 2 1
2 2011 3 3 3 1
3 2011 4 4 4 1
df2:
year combo_type X Y
0 2011 type_1 1 5
1 2011 type_3 11 6
output df3:
year type_1 type_2 type_3 pepse X_type_1 X_type_2 X_type_3 Y_type_1 Y_type_2 Y_type_3
0 2011 1 1 1 1 1 0 11 5 0 6
1 2011 2 2 2 1 1 0 11 5 0 6
2 2011 3 3 3 1 1 0 11 5 0 6
3 2011 4 4 4 1 1 0 11 5 0 6
dataframes:
df1 = pd.DataFrame({'year':[2011,2011,2011,2011],'type_1':[1,2,3,4],'type_2':[1,2,3,4],'type_3':[1,2,3,4],'pepse':[1,1,1,1]})
df2 = pd.DataFrame({'year':[2011,2011],'combo_type':['type_1','type_3',],'X':[1,11],'Y':[5,6,]})
df3 = pd.DataFrame({'year':[2011,2011,2011,2011],'type_1':[1,2,3,4],'type_2':[1,2,3,4],'type_3':[1,2,3,4],'pepse':[1,1,1,1],'X_type_1':[1,1,1,1],'X_type_2':[0,0,0,0],'X_type_3':[11,11,11,11],'Y_type_1':[5,5,5,5],'Y_type_2':[0,0,0,0],'Y_type_3':[6,6,6,6]})
CodePudding user response:
I think there might be a mistake in your output. As you have a duplicated index you first need to add a group helper column to deduplicate otherwise you'll aggregate the data in the pivot steps.
I would do:
cols = ['X', 'Y']
# deduplicate df1 index
# by adding a secondary key where the year is the same
df1 = df1.assign(group=lambda d: d.groupby('year').cumcount())
idx = ['year', 'combo_type']
out = ( # reshape df1 to have one VALUE per row (wide to long)
df2.merge(df1.melt(id_vars=['year', 'group'], var_name='combo_type'),
on=idx) # merge df2 and reshaped df1 on year/type
.set_index(idx ['group']) # set year/group aside
# remove the VALUE and use it to multiply the X/Y columns
.pipe(lambda d: d[cols].mul(d.pop('value'), axis=0))
.reset_index() # put back index as columns
# reshape back to wide format
.pivot_table(index=['year', 'group'], columns='combo_type', fill_value=0)
)
# merge the multiindex levels
out.columns = out.columns.map('_'.join)
# join new columns to original dataframe
df3 = (df1.merge(out, left_on=['year', 'group'], right_index=True)
.drop(columns='group')
)
output:
year type_1 type_2 type_3 pepse X_type_1 X_type_3 Y_type_1 Y_type_3
0 2011 1 1 1 1 1 11 5 6
1 2011 2 2 2 1 2 22 10 12
2 2011 3 3 3 1 3 33 15 18
3 2011 4 4 4 1 4 44 20 24
