I have two df which i want the values which are not present in df2 when compared-CodePudding

i have 2 dataframe like below

df1 :

col 1	col 2
val1	val2
xyz	abc
ooo	ppp

df2:

col 1	col 2
val1	mmmm
xyz	abc
ooo	sssss

so here the first column is the same but 2nd is different. But i want to compare both the columns at a time get the difference based on their combination.

output:

col 1	col 2
val1	mmmm
ooo	sssss

CodePudding user response：

You can try merge then filter

out = (pd.merge(df1, df2, on='col 1', how='outer')
       .query('`col 2_x` != `col 2_y`'))

print(out)

  col 1 col 2_x col 2_y
0  val1    val2    mmmm
2   ooo     ppp   sssss

CodePudding user response：

IIUC, and based on your expected output, you are looking to return all rows of df2 where df1['col 2'] does not match df1['col 2']. If so, you can simply use:

df2[df2['col 2'] != df1['col 2']]

  col 1  col 2
0  val1   mmmm
2   ooo  sssss

If you want col 2 from both dfs here, you could do:

df1[df1['col 2'] != df2['col 2']].merge(df2, on='col 1', how='left', suffixes=['_df1','_df2'])

  col 1 col 2_df1 col 2_df2
0  val1      val2      mmmm
1   ooo       ppp     sssss

Or if you want to preserve the index:

df1[df1['col 2'] != df2['col 2']].reset_index().merge(df2, on='col 1', how='left', suffixes=['_df1','_df2']).set_index('index')

      col 1 col 2_df1 col 2_df2
index                          
0      val1      val2      mmmm
2       ooo       ppp     sssss