I have 2 dataframes which are related to each other with a foreign key. I'm trying to find rows in the first dataframe where the Fk appears in the second dataframe.
Example:
df1 = pd.DataFrame({
'a': ['1', '2', '3', '4'],
'b': ['5', '6', '7', '8'],
'c': ['3', '4', '5', '6']
})
df2 = pd.DataFrame({
'd': ['1', '2', '4', '3', '5', '6'],
'e': ['4', '3', '2', '1', '5', '6'],
'f': ['7', '4', '6', '5', '8', '3']
})
key1 = ['a', 'c']
key2 = ['d', 'f']
mask = (df1[key1].isin(df2[key2])).all(axis=1)
I'm expecting mask to be
0 False
1 True
2 True
3 True
dtype: bool
But what I'm getting is
0 False
1 False
2 False
3 False
dtype: bool
This is a simplified example. The datasets and keys are dynamic so I can't just hard-code some filters. How can I go about getting the correct indexes in the mask?
CodePudding user response:
Transform your columns into tuple to group columns before applying isin:
>>> df1[key1].apply(tuple, axis=1).isin(df2[key2].apply(tuple, axis=1))
0 False
1 True
2 True
3 True
dtype: bool
CodePudding user response:
DataFrame.isin against another DataFrame expects the indexes and columns to match so you don't get the desired output. One way is to tuplify your values along rows and then use Series.isin:
>>> d1, d2 = df1[key1], df2[key2]
>>> d1.agg(tuple, axis=1).isin(d2.agg(tuple, axis=1))
0 False
1 True
2 True
3 True
dtype: bool
