Home > Mobile >  One-To-Many Merge with Pandas
One-To-Many Merge with Pandas

Time:01-30

Given two dataframes in which two keys share a centralised value in another dataframe as follows:

manager_id employee_id
1 5
4 3
4 2
7 9
employee_id location
1 US
2 France
3 UK
4 Russia
5 Mexico
6 Spain
7 Italy
8 Greece
9 Hungary

how to merge with pandas the key in the second dataframe to the other two in the first one so it ends up as follows:

manager_id employee_id manager_location employee_location
1 5 US Mexico
4 3 Russia UK
4 2 Russia France
7 9 Italy Hungary

to be clear, both manager_id and employee_id share the same id key in the location table, even if they have distinct names.

Thanks in advance.

CodePudding user response:

You can map ids to locations by converting the DataFrame with location data df2 into a Series where indices are employee_ids and values are locations:

id_to_location_mapper = df2.set_index('employee_id')['location']
df1['manager_location'] = df1['manager_id'].map(id_to_location_mapper)
df1['employee_location'] = df1['employee_id'].map(id_to_location_mapper)

You can also unstack, map, unstack and join:

df1 = df1.join(df1.unstack().map(df2.set_index('employee_id')['location']).swaplevel().unstack().add_suffix('_location'))

Output:

   manager_id  employee_id manager_location employee_location
0           1            5               US            Mexico
1           4            3           Russia                UK
2           4            2           Russia            France
3           7            9            Italy           Hungary

CodePudding user response:

You can use left join like SQL

df = pd.DataFrame({
    'manager_id' : [1,4,4,7],
    'employee_id':[5,3,2,9]
 })

df
  manager_id    employee_id
0   1              5
1   4              3
2   4              2
3   7              9
df2 = pd.DataFrame({
    'employee_id' : [1,2,3,4,5,6,7,8,9],
    'location':['US','FR','UK','RU','MX','ES','IT','GR','HU']
 })

df2

  employee_id   location
0   1             US
1   2             FR
2   3             UK
3   4             RU
4   5             MX
5   6             ES
6   7             IT
7   8             GR
8   9             HU

Left join with pandas dataframes

df_result = df.merge(df2, on=['employee_id'], how='left').merge(df2,left_on=['manager_id'],right_on=['employee_id'], how='left')
df_result = df_result.drop('employee_id_y', 1) 
df_result

Result

   manager_id   employee_id_x   location_x  location_y
0   1              5               MX        US
1   4              3               UK        RU
2   4              2               FR        RU
3   7              9               HU        IT
  •  Tags:  
  • Related