I have a data frame (df_validation) in which I have done some validation. If all was ok, this should be empty. If not, rows with info like this will be found
CHR START END
1 1000 2000 # Imagine this failed
Now, I want to communicate to another data frame (df) that validation was ok except for the ones that appear in df_validation
This is df
CHR START END
1 1000 2000
2 1000 2000
3 1000 2000
4 1000 2000
5 1000 2000
and this is the result I would like to have
CHR START END Pass_validation?
1 1000 2000 False
2 1000 2000 True
3 1000 2000 True
4 1000 2000 True
5 1000 2000 True
How can I do this? The first idea was to merge both but having same column name, it adds a '_X' on left and '_Y' on right. The new right columns I don't really need, just a new column as in the example.
CodePudding user response:
Idea is use indicator=True parameter for helper column _merge and for False for match compare for not equal both. If is omit on parameter is joined by intersection of columnsname in both dataFrames, here CHR, START and END.
df2['Pass_validation?'] = df2.merge(df_validation,
indicator=True,
how='left')['_merge'].ne('both')
print (df2)
CHR START END Pass_validation?
0 1 1000 2000 False
1 2 1000 2000 True
2 3 1000 2000 True
3 4 1000 2000 True
4 5 1000 2000 True
Details:
print (df2.merge(df_validation, indicator=True, how='left'))
CHR START END _merge
0 1 1000 2000 both
1 2 1000 2000 left_only
2 3 1000 2000 left_only
3 4 1000 2000 left_only
4 5 1000 2000 left_only
