I have two dataframes:
df1 = pd.DataFrame({'A': [1,2,3,4],
'B': ['car', 'bike', 'pedestrian', 'truck']})
df2 = pd.DataFrame({'C':[301, 202],
'D': ['red', 'blue']})
I have a map between the two frames
MAP = pd.DataFrame({'A_index': [3, 2],
'C_index': [301, 202]})
I want to create a dataframe in which each row is the concatenation of the df1 & df2 following the mapping rule, the expected output will look like this:
output = pd.DataFrame({'A': [3,2],
'B': ['pedestrian', 'bike'],
'D': ['red', 'blue']})
Any ideas how to achieve that ? I'm not that fluent with pandas.
Thanks,
CodePudding user response:
One option is to perform a double merge:
(df1.merge(df2.merge(MAP, left_on='C', right_on='C_index'),
left_on='A', right_on='A_index')
.filter(regex=r'^((?!_index).)*$') # remove the "X_index" columns
.drop(columns='C')
)
NB. I used MAP as name for the mapping dataframe as map is a python builtin
Alternative, more linear, syntax:
(df1.merge(MAP, left_on='A', right_on='A_index')
.merge(df2, left_on='C_index', right_on='C')
.filter(regex=r'^((?!_index).)*$')
.drop(columns='C')
)
output:
A B D
0 2 bike blue
1 3 pedestrian red
CodePudding user response:
Set a new index for df2 dataframe based on MAP dataframe then merge it with df1:
out = df1.merge(df2.set_index(MAP.set_index('C_index')['A_index']),
left_on='A', right_index=True).drop(columns='C')
print(out)
# Output
A B D
0 2 bike blue
1 3 pedestrian red
