Drop rows that match more than one condition

I have a data frame like below:

       start  start_interaction
0     710000          224180000
1     710000               3445
2     715000             760000
3     755000             7603
4     755000             870000
..       ...                ...
149  1840000            1935000
150  1840000            1980000

and I have a list like below:

myList=[(710000,3445),(755000,7603) ,(77700,234)]

I need to delete every row where the value in the start column equals the first element of a tuple in myList and the value in the start_interaction column equals the second element of the same tuple. The end result I want looks like this:

       start  start_interaction
0     710000          224180000
2     715000             760000
4     755000             870000
..       ...                ...
149  1840000            1935000
150  1840000            1980000

Please tell me how I can do this. Thanks a lot.

CodePudding user response：

Not the most elegant, maybe, but it works:

df = df[[not x for x in [any((row[1].start == t[0]) & (row[1].start_interaction == t[1]) for t in myList) for row in df.iterrows()]]]

An explanation: the following expression checks, for each row, whether "start" matches the first element of a tuple and "start_interaction" matches the second element of that tuple, for any tuple in the list.

cond = [any((row[1].start == t[0]) & (row[1].start_interaction == t[1]) for t in myList) for row in df.iterrows()]

Then we filter df down to the rows that do not meet this condition (no match for any item in the list):

df = df[[not(x) for x in cond]]

where cond is the condition above
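
For reference, here is a minimal self-contained sketch of the same two steps. It assumes only the rows visible in the question (the elided middle rows are left out), and it uses itertuples instead of iterrows, which is my substitution and not part of the answer above.

```python
import pandas as pd

# Sample rows copied from the question; the rows hidden behind ".." are omitted.
df = pd.DataFrame(
    {
        "start": [710000, 710000, 715000, 755000, 755000, 1840000, 1840000],
        "start_interaction": [224180000, 3445, 760000, 7603, 870000, 1935000, 1980000],
    },
    index=[0, 1, 2, 3, 4, 149, 150],
)
myList = [(710000, 3445), (755000, 7603), (77700, 234)]

# Step 1: one boolean per row, True when (start, start_interaction)
# matches any tuple in myList.
cond = [
    any(row.start == a and row.start_interaction == b for a, b in myList)
    for row in df.itertuples()
]

# Step 2: keep only the rows where the condition is False.
out = df[[not matched for matched in cond]]
print(out)
```

On this sample the rows labelled 1 and 3 are dropped. The loop runs in plain Python, so it may be slow on large frames.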

CodePudding user response：

You can craft a dataframe from your list, perform a left merge with indicator=True and use the left_only indicator to build a boolean array. Finally slice the original dataframe:

import pandas as pd

cols = list(df.columns) # subset here if needed
df2 = pd.DataFrame(myList, columns=cols)

mask = (df.merge(df2, on=cols, how='left', indicator=True)
        ['_merge'].eq('left_only').values # getting the values as the new index
                                          # is no longer aligned
        )

out = df[mask]

output:

       start  start_interaction
0     710000          224180000
2     715000             760000
4     755000             870000
149  1840000            1935000
150  1840000            1980000
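
To see why the mask is taken as plain values, it may help to look at the intermediate merged frame. This is a sketch on the rows visible in the question only; the name merged is introduced here for illustration, and to_numpy() is used where the answer uses .values.

```python
import pandas as pd

# Sample rows copied from the question; the rows hidden behind ".." are omitted.
df = pd.DataFrame(
    {
        "start": [710000, 710000, 715000, 755000, 755000, 1840000, 1840000],
        "start_interaction": [224180000, 3445, 760000, 7603, 870000, 1935000, 1980000],
    },
    index=[0, 1, 2, 3, 4, 149, 150],
)
myList = [(710000, 3445), (755000, 7603), (77700, 234)]

cols = ["start", "start_interaction"]
df2 = pd.DataFrame(myList, columns=cols)

# The left merge keeps every row of df and adds a "_merge" column:
# "both" for rows also found in df2, "left_only" for the rest.
merged = df.merge(df2, on=cols, how="left", indicator=True)
print(merged)  # note the fresh 0..6 index: labels 149 and 150 are gone

# Because merged is no longer aligned with df, pass the mask as a bare array.
mask = merged["_merge"].eq("left_only").to_numpy()
out = df[mask]
print(out)
```

One caveat: if myList contained the same tuple twice, the merge would repeat the matching rows and the mask would no longer have the same length as df, so deduplicating df2 first (for example with drop_duplicates) is probably a sensible precaution.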