I have a DataFrame like the one below:
       start  start_interaction
0     710000          224180000
1     710000               3445
2     715000             760000
3     755000               7603
4     755000             870000
..       ...                ...
149  1840000            1935000
150  1840000            1980000
and I have a list of tuples like the one below:
myList = [(710000, 3445), (755000, 7603), (77700, 234)]
I need to delete every row that matches one of these tuples, that is, every row where the first element of a tuple equals the value in the start column and the second element of the same tuple equals the value in the start_interaction column.
The end result that I want looks like this:
       start  start_interaction
0     710000          224180000
2     715000             760000
4     755000             870000
..       ...                ...
149  1840000            1935000
150  1840000            1980000
Please tell me how I can do this. Thanks a lot.
CodePudding user response:
Not the most elegant maybe, but it works:
# Keep only the rows that match no (start, start_interaction) tuple in myList
df = df[[not c for c in [any((row[1].start == x[0]) & (row[1].start_interaction == x[1]) for x in myList) for row in df.iterrows()]]]
An explanation: the following line checks, for each row, whether "start" matches the first element of a tuple and "start_interaction" matches the second element of the same tuple, for any tuple in the list:
cond = [any((row[1].start == x[0]) & (row[1].start_interaction == x[1]) for x in myList) for row in df.iterrows()]
Then we filter df for the rows that do not meet this condition (no match for any item in the list):
df = df[[not x for x in cond]]
where cond is the list of per-row results built above.
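To try the row-wise filter described above end to end, here is a minimal sketch. It assumes sample data containing only the five rows visible in the question, and it swaps iterrows() for itertuples() with tuple unpacking purely for readability; the logic is meant to be the same.

```python
import pandas as pd

# Assumed sample data: only the first five rows shown in the question
df = pd.DataFrame(
    {
        "start": [710000, 710000, 715000, 755000, 755000],
        "start_interaction": [224180000, 3445, 760000, 7603, 870000],
    }
)
myList = [(710000, 3445), (755000, 7603), (77700, 234)]

# cond[i] is True when row i matches some (start, start_interaction) pair
cond = [
    any(row.start == a and row.start_interaction == b for a, b in myList)
    for row in df.itertuples(index=False)
]

# Keep the rows that match no pair; a list of booleans works as a row mask
result = df[[not c for c in cond]]
print(result)
```

This loops over every row and every tuple in Python, so it should be fine for a few hundred rows but will probably get slow on large frames.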
CodePudding user response:
You can craft a dataframe from your list, perform a left merge with indicator=True and use the left_only indicator to build a boolean array. Finally slice the original dataframe:
import pandas as pd

cols = list(df.columns)  # subset here if needed
df2 = pd.DataFrame(myList, columns=cols)
# Take .values because the merged frame's index is no longer aligned with df
mask = (df.merge(df2, on=cols, how='left', indicator=True)
          ['_merge'].eq('left_only').values
       )
out = df[mask]
Output:
       start  start_interaction
0     710000          224180000
2     715000             760000
4     755000             870000
149  1840000            1935000
150  1840000            1980000
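To run the merge approach on its own, here is a minimal self-contained sketch. The sample data is an assumption (only the five rows visible in the question), and the drop_duplicates() call is an addition not present in the answer above: if myList contained the same pair twice, the left merge would repeat the matching row and the mask would no longer have one entry per row of df.

```python
import pandas as pd

# Assumed sample data: only the first five rows shown in the question
df = pd.DataFrame(
    {
        "start": [710000, 710000, 715000, 755000, 755000],
        "start_interaction": [224180000, 3445, 760000, 7603, 870000],
    }
)
myList = [(710000, 3445), (755000, 7603), (77700, 234)]

cols = ["start", "start_interaction"]

# One row per pair to remove; duplicates dropped so the merge keeps len(df) rows
df2 = pd.DataFrame(myList, columns=cols).drop_duplicates()

# indicator=True adds a "_merge" column: "left_only" marks rows with no match in df2
merged = df.merge(df2, on=cols, how="left", indicator=True)

# Convert to a plain array because merged has a fresh 0..n-1 index, not df's index
mask = merged["_merge"].eq("left_only").to_numpy()

out = df[mask]
print(out)
```

Pairs in myList that do not occur in df, such as (77700, 234), are simply ignored, because a left merge only keeps rows that come from df.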
