I am using Python pandas, I would appreciate the code in Python. But you can suggest a solution in Excel or SQL. I have two tables A and B. Table A has two ID columns by which I want to join table B. Table B has one ID column that either corresponds with one or the other ID column of Table A. Here is an example:
And this is the resulting table AB that I want:

CodePudding user response:
in sql:
select * from tableA a
left join tableB b
on b.ID in (a.ID1, a.ID2)
CodePudding user response:
You can use pd.merge() dropna() isna() pd.concat()
import pandas as pd
m1 = pd.merge(df_A, df_B, left_on='ID Column 1', right_on='ID Column', how='left')
m1_mismatches = m1[m1['ID Column'].isna()].dropna(axis=1)
m1_matches = m1[~m1['ID Column'].isna()]
m2 = pd.merge(m1_mismatches, df_B, left_on='ID Column 2', right_on='ID Column', how='left')
df_AB = pd.concat([m1_matches, m2]).drop(columns='ID Column').reset_index(drop=True)
print(df_AB)
The idea here is to do an initial "left join" between df_A and df_B, and then a second "left join" with the mismatches found in the first join. Finally we use pd.concat to concat the results.
Output:
| ID Column 1 | ID Column 2 | col 1 | col 2 | col 3 | col 4 | |
|---|---|---|---|---|---|---|
| 0 | [email protected] | [email protected] | lol | 9479 | pee pee poo poo | 985023.0 |
| 1 | [email protected] | [email protected] | bla bla bla | 123 | bla bla bla hello | 124323.0 |
| 2 | [email protected] | [email protected] | harry potter | 2077 | NaN | NaN |


