i have 2 dataframe like below
df1 :
| col 1 | col 2 |
|---|---|
| val1 | val2 |
| xyz | abc |
| ooo | ppp |
df2:
| col 1 | col 2 |
|---|---|
| val1 | mmmm |
| xyz | abc |
| ooo | sssss |
so here the first column is the same but 2nd is different. But i want to compare both the columns at a time get the difference based on their combination.
output:
| col 1 | col 2 |
|---|---|
| val1 | mmmm |
| ooo | sssss |
CodePudding user response:
You can try merge then filter
out = (pd.merge(df1, df2, on='col 1', how='outer')
.query('`col 2_x` != `col 2_y`'))
print(out)
col 1 col 2_x col 2_y
0 val1 val2 mmmm
2 ooo ppp sssss
CodePudding user response:
IIUC, and based on your expected output, you are looking to return all rows of df2 where df1['col 2'] does not match df1['col 2']. If so, you can simply use:
df2[df2['col 2'] != df1['col 2']]
col 1 col 2
0 val1 mmmm
2 ooo sssss
If you want col 2 from both dfs here, you could do:
df1[df1['col 2'] != df2['col 2']].merge(df2, on='col 1', how='left', suffixes=['_df1','_df2'])
col 1 col 2_df1 col 2_df2
0 val1 val2 mmmm
1 ooo ppp sssss
Or if you want to preserve the index:
df1[df1['col 2'] != df2['col 2']].reset_index().merge(df2, on='col 1', how='left', suffixes=['_df1','_df2']).set_index('index')
col 1 col 2_df1 col 2_df2
index
0 val1 val2 mmmm
2 ooo ppp sssss
