I have two DataFrame df1 and df2.
df1 is the original dataset and df2 is the dataset made from df1 after some manipulation.
In df1 I have column 'log' and in df2 I have column 'log1' and 'log2' two columns.
where the values in columns 'log1' and 'log2' contains in column 'log' in df1.
df2 sample below
date id log1 log2
1 uu1q (2,4) (3,5)
1 uu1q (2,4) (7,6)
1 uu1q (3,5) (7,6)
5 u25a (4,7) (3,9)
5 uu25a (1,9) (3,9)
6 ua3b7 (1,1) (2,2)
6 ua3b7 (1,1) (3,3)
6 ua3b7 (2,2) (3,3)
df1 column sample with data below
date id log name col1 col2
1 uu1q (2,4) xyz 1123 qqq
1 uu1q (3,5) aas 2132 wew
1 uu1q (7,6) wqas 2567 uuo
5 u25a (4,7) enj 666 ttt
5 fff (0,0) ddd 0 lll
Now I want to take fetch/filter all the records from df1 based on column values for each row in df2 i.e. based on 'date', 'id', 'log1' or 'log2' and compare it with columns in df1 i.e.
'date', 'id', 'log'.
NOTE: values columns 'log1' and 'log2' contained in single column 'log'
CodePudding user response:
IIUC, you're looking for a chained isin:
out = df1[df1['date'].isin(df2['date']) & df1['id'].isin(df2['id']) & (df1['log'].isin(df2['log1']) | df1['log'].isin(df2['log2']))]
Output:
date id log name col1 col2
0 1 uu1q (2,4) xyz 1123 qqq
1 1 uu1q (3,5) aas 2132 wew
2 1 uu1q (7,6) wqas 2567 uuo
3 5 u25a (4,7) enj 666 ttt
CodePudding user response:
Use DataFrame.melt for column log from log1, log2... columns and for filtering inner join in DataFrame.merge:
df = (df2.melt(['date','id'], value_name='log')
.drop('variable', axis=1)
.drop_duplicates()
.merge(df1))
print (df)
date id log name col1 col2
0 1 uu1q (2,4) xyz 1123 qqq
1 1 uu1q (3,5) aas 2132 wew
2 5 u25a (4,7) enj 666 ttt
3 1 uu1q (7,6) wqas 2567 uuo
