Pandas: best way to remove rows where columns match any set of values in a list of tuples?

I have a DataFrame with two columns, A and B, and a list of tuples. I want to remove every row whose (A, B) pair matches any of the tuples in the list. For example:

Input:

A	B
A	1
A	4
B	2
A	3

[('A', 1), ('C', 4), ('A', 3)]

Output:

A	B
A	4
B	2
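A minimal sketch that rebuilds the example input so the approaches can be run as-is. It assumes column B holds integers and column A holds strings; the question's table does not state the dtypes.

```python
import pandas as pd

# Example input from the question (assumed dtypes: A = str, B = int).
df = pd.DataFrame({"A": ["A", "A", "B", "A"], "B": [1, 4, 2, 3]})

# Blacklisted (A, B) pairs; string values need quotes in Python.
tuples = [("A", 1), ("C", 4), ("A", 3)]
```

With this setup, the expected result keeps only the rows with index labels 1 and 2.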

Answer:

You can use zip with a list comprehension:

tuples = [('A', 1), ('C', 4), ('A', 3)]
# Keep only the rows whose (A, B) pair is not one of the tuples
new_df = df[[x not in tuples for x in zip(df['A'], df['B'])]]

Output:

>>> new_df
   A  B
1  A  4
2  B  2
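Because `x not in tuples` compares whole tuples with `==`, the values in the list must have the same types as the column values. A small sketch of that pitfall; `str_tuples` is a hypothetical mismatched input, not something from the question:

```python
import pandas as pd

df = pd.DataFrame({"A": ["A", "A", "B", "A"], "B": [1, 4, 2, 3]})

# Integer values match column B; string values never do, since ("A", 1) != ("A", "1").
int_tuples = [("A", 1), ("C", 4), ("A", 3)]
str_tuples = [("A", "1"), ("C", "4"), ("A", "3")]  # hypothetical mismatched input

kept_int = df[[pair not in int_tuples for pair in zip(df["A"], df["B"])]]
kept_str = df[[pair not in str_tuples for pair in zip(df["A"], df["B"])]]

print(kept_int.index.tolist())  # [1, 2]
print(kept_str.index.tolist())  # [0, 1, 2, 3] (nothing removed)
```

If the blacklist comes from a file or user input, converting its values to the column dtypes first avoids this silent no-op.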

Answer:

I think the best you can do here is to put all the "blacklisted" tuples into a set (i.e. hash them) and perform a membership test for each row in your list. Each membership test takes constant time on average, so the overall time complexity is O(n + m), with n being the number of items (rows) in your list and m being the number of items in your blacklist: building the set costs O(m), and the n lookups cost O(n).

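def solve(arr, blacklist):
    # Hash the blacklisted tuples once: O(m)
    S = set(blacklist)
    result = [None] * len(arr)
    idx = 0
    for item in arr:
        # Average O(1) membership test per row, O(n) in total
        if item not in S:
            result[idx] = item
            idx += 1
    # Trim the unused preallocated slots
    return result[:idx]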
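The function above works on a plain list of tuples. A minimal sketch of the same set-based idea applied directly to the DataFrame, assuming the example `df` and `tuples` from the question; it returns a filtered DataFrame rather than a list:

```python
import pandas as pd

df = pd.DataFrame({"A": ["A", "A", "B", "A"], "B": [1, 4, 2, 3]})
tuples = [("A", 1), ("C", 4), ("A", 3)]

# Build the hash set once (O(m)); each lookup is then O(1) on average,
# so the mask costs O(n + m) instead of O(n * m) when scanning a list.
blacklist = set(tuples)
mask = [pair not in blacklist for pair in zip(df["A"], df["B"])]
new_df = df[mask]
print(new_df)
```

The only change from the list-comprehension answer is testing membership against a set instead of a list, which matters once the blacklist is large.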

Answer:

A "pure" pandas solution (whatever that means):

df[~df.set_index(['A','B']).index.isin(tuples)]

Output:

    A   B
1   A   4
2   B   2
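The one-liner works because a two-level MultiIndex treats each (A, B) row as a single tuple, and `isin` accepts a list of tuples. A sketch that exposes the intermediate boolean mask without calling `set_index` on `df`; using `pd.MultiIndex.from_frame` here is a substitution for `set_index(...).index`, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({"A": ["A", "A", "B", "A"], "B": [1, 4, 2, 3]})
tuples = [("A", 1), ("C", 4), ("A", 3)]

# One MultiIndex entry per row: ("A", 1), ("A", 4), ("B", 2), ("A", 3)
pairs = pd.MultiIndex.from_frame(df[["A", "B"]])

# NumPy boolean array, True where the row's pair is blacklisted
is_blacklisted = pairs.isin(tuples)
print(is_blacklisted.tolist())  # [True, False, False, True]

new_df = df[~is_blacklisted]
```

Since the mask is a plain NumPy array, it is applied by position, so `df`'s own index does not need to be a default range.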

Answer:

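Use zip with a pandas Series to avoid writing an explicit for loop (zip still iterates in Python, but the membership test runs inside pandas, so this may be faster). Note: based on "How to filter a pandas DataFrame according to a list of tuples".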

import pandas as pd
tuples = [('A',1),('C',4),('A',3)]
# Reuse df's index so the boolean mask lines up with df's rows
new_df = df[~pd.Series(list(zip(df['A'], df['B'])), index=df.index).isin(tuples)]
>>> new_df
    A   B
1   A   4
2   B   2
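A Series built from `list(zip(...))` gets a fresh 0..n-1 index by default, and boolean indexing with a Series aligns on index labels, so on a DataFrame with any other index the mask can fail to align (pandas then raises, or with a shuffled index may pick the wrong rows). A sketch with a hypothetical non-default index (labels 10 to 13), passing `index=df.index` to keep the mask aligned:

```python
import pandas as pd

# Hypothetical frame whose index is not the default 0..n-1 range
df = pd.DataFrame({"A": ["A", "A", "B", "A"], "B": [1, 4, 2, 3]},
                  index=[10, 11, 12, 13])
tuples = [("A", 1), ("C", 4), ("A", 3)]

# Series of (A, B) tuples sharing df's index, so the mask aligns row by row
pairs = pd.Series(list(zip(df["A"], df["B"])), index=df.index)
new_df = df[~pairs.isin(tuples)]
print(new_df.index.tolist())  # [11, 12]
```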