How to concatenate two Dataframe rows using a mapping index-CodePudding

I have two dataframes:

df1 = pd.DataFrame({'A': [1,2,3,4],
                    'B': ['car', 'bike', 'pedestrian', 'truck']})

df2 = pd.DataFrame({'C':[301, 202],
                    'D': ['red', 'blue']})

I have a map between the two frames

MAP = pd.DataFrame({'A_index': [3, 2],
                    'C_index': [301, 202]})

I want to create a dataframe in which each row is the concatenation of the df1 & df2 following the mapping rule, the expected output will look like this:

output = pd.DataFrame({'A': [3,2],
                       'B': ['pedestrian', 'bike'],
                       'D': ['red', 'blue']})

Any ideas how to achieve that ? I'm not that fluent with pandas.

Thanks,

CodePudding user response：

One option is to perform a double merge:

(df1.merge(df2.merge(MAP, left_on='C', right_on='C_index'),
           left_on='A', right_on='A_index')
    .filter(regex=r'^((?!_index).)*$') # remove the "X_index" columns
    .drop(columns='C')
)

NB. I used MAP as name for the mapping dataframe as map is a python builtin

Alternative, more linear, syntax:

(df1.merge(MAP, left_on='A', right_on='A_index')        
    .merge(df2, left_on='C_index', right_on='C')
    .filter(regex=r'^((?!_index).)*$')
    .drop(columns='C')
)

output:

   A           B     D
0  2        bike  blue
1  3  pedestrian   red

CodePudding user response：

Set a new index for df2 dataframe based on MAP dataframe then merge it with df1:

out = df1.merge(df2.set_index(MAP.set_index('C_index')['A_index']), 
                left_on='A', right_index=True).drop(columns='C')
print(out)

# Output
   A           B     D
0  2        bike  blue
1  3  pedestrian   red