Home > database >  Pandas conditional merge 2 dataframes with one to many relationship
Pandas conditional merge 2 dataframes with one to many relationship

Time:02-06

I am trying to merge two pandas DataFrames with one of many relationship. However, there are a couple of caveats. Explanation below.

import pandas as pd

df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
                    'col1': [1, 2, 3, 4],
                    'col2': [1, 2, 3, 4] })

df2 = pd.DataFrame({'name': ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
                    'col3': [0, 10, np.nan, 11, 12, 13] })

I'd like to merge the 2 DataFrames, however, ignore the 0 and np.nan in df2 when joining. I cannot simply filter df2 as there are other columns that I need.

Basically, I'd like to join on rows with one-to-many relationship that are not 0 or NaNs.

Expected output:

merge_df

CodePudding user response:

One way:

>>> df1.merge(df2).drop_duplicates(subset=['name'], keep='last')
  name  col1  col2  col3
1   AA     1     1  10.0
3   BB     2     2  11.0
4   CC     3     3  12.0
5   DD     4     4  13.0

CodePudding user response:

Try with filter then merge

out = df1.merge(df2[df2.col3.ne(0)&df2.col3.notna()])
Out[69]: 
  name  col1  col2  col3
0   AA     1     1  10.0
1   BB     2     2  11.0
2   CC     3     3  12.0
3   DD     4     4  13.0
  •  Tags:  
  • Related