I am trying to merge two pandas DataFrames that have a one-to-many relationship. However, there are a couple of caveats, explained below.
import numpy as np  # needed for np.nan below
import pandas as pd
df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
                    'col1': [1, 2, 3, 4],
                    'col2': [1, 2, 3, 4]})
df2 = pd.DataFrame({'name': ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
                    'col3': [0, 10, np.nan, 11, 12, 13]})
I'd like to merge the two DataFrames but ignore the rows of df2 where col3 is 0 or np.nan when joining. I cannot simply filter df2 as there are other columns that I need.
Basically, I'd like to join only on the rows of the one-to-many relationship whose col3 is neither 0 nor NaN.
Expected output:
CodePudding user response:
One way, which relies on the valid row being the last one for each name in df2:
>>> df1.merge(df2).drop_duplicates(subset=['name'], keep='last')
  name  col1  col2  col3
1   AA     1     1  10.0
3   BB     2     2  11.0
4   CC     3     3  12.0
5   DD     4     4  13.0
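The drop_duplicates approach above selects rows by position, not by value. The sketch below illustrates this with a hypothetical reordered frame, df2_reordered (not part of the question), in which the 0 and NaN rows come after the valid ones. It then shows one order-independent alternative, assuming the goal is simply to discard merged rows whose col3 is 0 or NaN.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
                    'col1': [1, 2, 3, 4],
                    'col2': [1, 2, 3, 4]})

# Hypothetical reordering of df2: the invalid rows (0 and NaN) now come last.
df2_reordered = pd.DataFrame({'name': ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
                              'col3': [10, 0, 11, np.nan, 12, 13]})

# Position-based: keeps whatever row happens to be last for each name,
# so here it keeps the 0 for AA and the NaN for BB.
by_position = df1.merge(df2_reordered).drop_duplicates(subset=['name'], keep='last')

# Value-based: merge first, then drop rows whose col3 is 0 or NaN.
merged = df1.merge(df2_reordered)
by_value = merged[merged['col3'].ne(0) & merged['col3'].notna()]

print(by_position)
print(by_value)
```

With the original df2 from the question both variants should agree, since the valid row is last for every name there.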
CodePudding user response:
Try filtering the rows first, then merging (a row filter keeps all of df2's columns):
out = df1.merge(df2[df2.col3.ne(0) & df2.col3.notna()])
out
Out[69]:
  name  col1  col2  col3
0   AA     1     1  10.0
1   BB     2     2  11.0
2   CC     3     3  12.0
3   DD     4     4  13.0
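Note that the default merge is an inner join, so a name with no valid row in df2 would disappear from the result. If such names should be kept, a left join is one option. The sketch below assumes a hypothetical variant of df2 in which DD has only a 0, and uses how='left' so that DD survives with NaN in col3. The names valid and out_left are made up for this illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
                    'col1': [1, 2, 3, 4],
                    'col2': [1, 2, 3, 4]})

# Hypothetical variant of df2: DD has only a 0, so it has no valid row.
df2 = pd.DataFrame({'name': ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
                    'col3': [0, 10, np.nan, 11, 12, 0]})

# Keep only the rows whose col3 is neither 0 nor NaN (all columns are retained).
valid = df2[df2['col3'].ne(0) & df2['col3'].notna()]

# A left join keeps every row of df1; names without a valid match get NaN.
out_left = df1.merge(valid, on='name', how='left')
print(out_left)
```

Whether dropping or keeping unmatched names is right depends on what the merged frame is used for; the question does not say.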

