Merge two tables or make a new one-CodePudding

How can I get df3, that is a multiplication of the values in df1 by the coefficients in df2 matching on the type and year.

df1:

   year  type_1  type_2  type_3  pepse
0  2011       1       1       1      1
1  2011       2       2       2      1
2  2011       3       3       3      1
3  2011       4       4       4      1

df2:

   year combo_type   X  Y
0  2011     type_1   1  5
1  2011     type_3  11  6

output df3:

   year  type_1  type_2  type_3  pepse  X_type_1  X_type_2  X_type_3  Y_type_1  Y_type_2  Y_type_3
0  2011       1       1       1      1         1         0        11         5         0         6
1  2011       2       2       2      1         1         0        11         5         0         6
2  2011       3       3       3      1         1         0        11         5         0         6
3  2011       4       4       4      1         1         0        11         5         0         6

dataframes:

df1 = pd.DataFrame({'year':[2011,2011,2011,2011],'type_1':[1,2,3,4],'type_2':[1,2,3,4],'type_3':[1,2,3,4],'pepse':[1,1,1,1]})
df2 = pd.DataFrame({'year':[2011,2011],'combo_type':['type_1','type_3',],'X':[1,11],'Y':[5,6,]})
df3 = pd.DataFrame({'year':[2011,2011,2011,2011],'type_1':[1,2,3,4],'type_2':[1,2,3,4],'type_3':[1,2,3,4],'pepse':[1,1,1,1],'X_type_1':[1,1,1,1],'X_type_2':[0,0,0,0],'X_type_3':[11,11,11,11],'Y_type_1':[5,5,5,5],'Y_type_2':[0,0,0,0],'Y_type_3':[6,6,6,6]})

enter image description here

CodePudding user response：

I think there might be a mistake in your output. As you have a duplicated index you first need to add a group helper column to deduplicate otherwise you'll aggregate the data in the pivot steps.

I would do:

cols = ['X', 'Y']

# deduplicate df1 index
# by adding a secondary key where the year is the same
df1 = df1.assign(group=lambda d: d.groupby('year').cumcount())

idx = ['year', 'combo_type']

out = (    # reshape df1 to have one VALUE per row (wide to long)
 df2.merge(df1.melt(id_vars=['year', 'group'], var_name='combo_type'), 
           on=idx)  # merge df2 and reshaped df1 on year/type
    .set_index(idx ['group'])  # set year/group aside 
     # remove the VALUE and use it to multiply the X/Y columns
    .pipe(lambda d: d[cols].mul(d.pop('value'), axis=0))
    .reset_index() # put back index as columns
     # reshape back to wide format
    .pivot_table(index=['year', 'group'], columns='combo_type', fill_value=0)
)

# merge the multiindex levels
out.columns = out.columns.map('_'.join)

# join new columns to original dataframe
df3 = (df1.merge(out, left_on=['year', 'group'], right_index=True)
          .drop(columns='group')
       )

output:

   year  type_1  type_2  type_3  pepse  X_type_1  X_type_3  Y_type_1  Y_type_3
0  2011       1       1       1      1         1        11         5         6
1  2011       2       2       2      1         2        22        10        12
2  2011       3       3       3      1         3        33        15        18
3  2011       4       4       4      1         4        44        20        24