Home > Enterprise >  Pandas: How to easily merge a dataframe with many other dataframes on different keys?
Pandas: How to easily merge a dataframe with many other dataframes on different keys?

Time:01-07

let's say I have a 'base' DF with encoded columns and many mapping tables - each of them includes a dictionary for one encoded column from the base df, for example

Base df: df

RETAILER    STATE_CODE  SIZE_CODE   POTENTIAL_CODE
store1      1           008         222
store2      2           001         111
store3      3           006         333

Mapping table 1: mp_df1

STATE_CODE STATE_NAME
1          California
2          Nevada
3          Washington

Mapping table 2: mp_df2

SIZE_CODE SIZE_NAME
001       Small
006       Moderate
008       Big

Mapping table 3: mp_df3

POTENTIAL_CODE POTENTIAL_NAME
111            Small
222            Moderate
333            Promising

So the base table has a seperate key for each of the mapping tables.

In this case I can use 3 merges but in real life I have more than just 3 mapping tables so it's really annoying to do it 'manually'. Is there a simpler way to combine these dfs into one?

RETAILER    STATE_CODE  SIZE_CODE   POTENTIAL_CODE STATE_NAME SIZE_NAME POTENTIAL_NAME
store1      1           008         222            California Big       Moderate  
store2      2           001         111            Nevada     Small     Small
store3      3           006         333            Washington Moderate  Promising

CodePudding user response:

You can use pd.concat and map:

out = pd.concat([df,
    df['STATE_CODE'].map(mp_df1.set_index('STATE_CODE').squeeze()),
    df['SIZE_CODE'].map(mp_df2.set_index('SIZE_CODE').squeeze()),
    df['POTENTIAL_CODE'].map(mp_df3.set_index('POTENTIAL_CODE').squeeze())], axis=1)
print(out)

# Output:
  RETAILER STATE_CODE SIZE_CODE POTENTIAL_CODE  STATE_CODE SIZE_CODE  \
0   store1          1       008            222  California       Big   
1   store2          2       001            111      Nevada     Small   
2   store3          3       006            333  Washington  Moderate   

  POTENTIAL_CODE  
0       Moderate  
1          Small  
2      Promising  

CodePudding user response:

I can't really believe how simple was the solution, although it's probably not optimized yet.

results = df
dataframes_list = ['mp_df1', 'mp_df2', 'mp_df3']

for i in range(len(dataframes_list)):
    result = pd.merge(results, dataframes_list[i], how='left')

Seems to work just fine - assuming that the mapping tables have only 2 columns: code and name, obviously.

  •  Tags:  
  • Related