One-To-Many Merge with Pandas-CodePudding

Given two dataframes in which two keys share a centralised value in another dataframe as follows:

manager_id	employee_id
1	5
4	3
4	2
7	9

employee_id	location
1	US
2	France
3	UK
4	Russia
5	Mexico
6	Spain
7	Italy
8	Greece
9	Hungary

how to merge with pandas the key in the second dataframe to the other two in the first one so it ends up as follows:

manager_id	employee_id	manager_location	employee_location
1	5	US	Mexico
4	3	Russia	UK
4	2	Russia	France
7	9	Italy	Hungary

to be clear, both manager_id and employee_id share the same id key in the location table, even if they have distinct names.

Thanks in advance.

CodePudding user response：

You can map ids to locations by converting the DataFrame with location data df2 into a Series where indices are employee_ids and values are locations:

id_to_location_mapper = df2.set_index('employee_id')['location']
df1['manager_location'] = df1['manager_id'].map(id_to_location_mapper)
df1['employee_location'] = df1['employee_id'].map(id_to_location_mapper)

You can also unstack, map, unstack and join:

df1 = df1.join(df1.unstack().map(df2.set_index('employee_id')['location']).swaplevel().unstack().add_suffix('_location'))

Output:

   manager_id  employee_id manager_location employee_location
0           1            5               US            Mexico
1           4            3           Russia                UK
2           4            2           Russia            France
3           7            9            Italy           Hungary

CodePudding user response：

You can use left join like SQL

df = pd.DataFrame({
    'manager_id' : [1,4,4,7],
    'employee_id':[5,3,2,9]
 })

df
  manager_id    employee_id
0   1              5
1   4              3
2   4              2
3   7              9

df2 = pd.DataFrame({
    'employee_id' : [1,2,3,4,5,6,7,8,9],
    'location':['US','FR','UK','RU','MX','ES','IT','GR','HU']
 })

df2

  employee_id   location
0   1             US
1   2             FR
2   3             UK
3   4             RU
4   5             MX
5   6             ES
6   7             IT
7   8             GR
8   9             HU

Left join with pandas dataframes

df_result = df.merge(df2, on=['employee_id'], how='left').merge(df2,left_on=['manager_id'],right_on=['employee_id'], how='left')
df_result = df_result.drop('employee_id_y', 1) 
df_result

Result

   manager_id   employee_id_x   location_x  location_y
0   1              5               MX        US
1   4              3               UK        RU
2   4              2               FR        RU
3   7              9               HU        IT