I have two dataframes df1 and df2. I would like to check if id2 value exists in df2. If exists, fetch the rows in df1 with identical id1 values. I tried left join but it gives only the first row.
pd.merge(df1, df2.rename(columns={'id1':'id2'}), on='id2', how='left')
df1
id1 id2 id3
C45 AB ZU
C45 ZO RE
C67 RT FG
C78 TZ GH
df2
id1
AB
GH
ZU
Expected output is:
id1 id2 id3
C45 AB ZU
C45 ZO RE
CodePudding user response:
I did it in 2 separate steps.
First, find all the items in df1.id1 which match df2.id1 and save them to lookup.
Then, find all the rows which are in lookup within df1.id1 and save them to final.
First,
# Import pandas library
import pandas as pd
import numpy as np
# initialize list of lists
data = [['C45', 'AB','ZU'], ['C45', 'ZO','RE'], ['C67', 'RT','FG'],['C78','TZ','GH']]
# Create the pandas DataFrame
df1 = pd.DataFrame(data, columns=['id1', 'id2', 'id3'])
# initialize list of lists
data = [['AB'], ['GH'], ['ZU']]
# Create the pandas DataFrame
df2 = pd.DataFrame(data, columns=['id1'])
lookup = df1[df1.id2.isin(df2.id1)].id1 # contains a series of the ids in df1.id2 that match df2.id1
final = df1[df1.id1.isin(lookup)]
final looks like this,
id1 id2 id3
0 C45 AB ZU
1 C45 ZO RE
CodePudding user response:
Here we go
df1 = pd.DataFrame([['C45','AB','ZU'],['C45','ZO','RE'],
['C67','RT','FG',],['C78','TZ','GH']], columns=['id1','id2','id3'])
df2 = pd.DataFrame(['AB','GH','ZU'], columns=['id1'])
Calculate a mask for the required rows:
mask = df1.isin(df2.id1.values).any(1)
mask looks like this
0 True
1 False
2 False
3 True
dtype: bool
the required output:
df1[mask]
id1 id2 id3
0 C45 AB ZU
3 C78 TZ GH
