I have a DataFrame with two columns, A and B, and a list of tuples. I want to remove every row whose (A, B) pair matches any tuple in the list. For example:
Input:
| A | B |
|---|---|
| A | 1 |
| A | 4 |
| B | 2 |
| A | 3 |
Tuples to remove: [('A', 1), ('C', 4), ('A', 3)]
Output:
| A | B |
|---|---|
| A | 4 |
| B | 2 |
CodePudding user response:
You can use zip with a list comprehension to build a boolean mask:
tuples = [('A', 1), ('C', 4), ('A', 3)]
new_df = df[[x not in tuples for x in zip(df['A'], df['B'])]]
Output:
>>> new_df
   A  B
1  A  4
2  B  2
CodePudding user response:
I think the best you can do here is to put all the "blacklisted" tuples into a set (i.e. hash them) and perform a membership test on each row in your list. Each membership test takes constant time on average, so the overall time complexity of this algorithm is O(n + m), with n being the number of items in your list and m being the number of items in your blacklist.
def solve(arr, blacklist):
    S = set(blacklist)  # hash the blacklist once: O(m)
    result = [None] * len(arr)
    idx = 0  # number of rows kept so far
    for i in range(len(arr)):
        if arr[i] not in S:  # O(1) average membership test
            result[idx] = arr[i]
            idx += 1
    return result[:idx]
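The same set-based filter can also be written as a list comprehension. The following is a minimal sketch, assuming the rows are available as plain Python tuples; the name `filter_rows` and the sample data (copied from the question) are illustrative only.

```python
def filter_rows(rows, blacklist):
    """Return the rows that are not in the blacklist, preserving order."""
    blocked = set(blacklist)  # built once in O(m); lookups are O(1) on average
    return [row for row in rows if row not in blocked]


# Sample data taken from the question, written as plain tuples (an assumption).
rows = [("A", 1), ("A", 4), ("B", 2), ("A", 3)]
blacklist = [("A", 1), ("C", 4), ("A", 3)]

kept = filter_rows(rows, blacklist)
print(kept)  # [('A', 4), ('B', 2)]
```

For a DataFrame, the rows could be produced with `zip(df['A'], df['B'])`, as in the first answer.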
CodePudding user response:
A "pure" pandas solution (whatever that means):
df[~df.set_index(['A','B']).index.isin(tuples)]
Output:
   A  B
1  A  4
2  B  2
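For anyone who wants to run this end to end, here is a small self-contained sketch of the index-based approach. The DataFrame is reconstructed from the question's table, so its construction is an assumption about how the data was loaded.

```python
import pandas as pd

# Data reconstructed from the question's example table (an assumption).
df = pd.DataFrame({"A": ["A", "A", "B", "A"], "B": [1, 4, 2, 3]})
tuples = [("A", 1), ("C", 4), ("A", 3)]

# Build a MultiIndex from the two key columns and test each (A, B) pair
# against the list of tuples; the result is a NumPy boolean array.
mask = df.set_index(["A", "B"]).index.isin(tuples)

# Invert the mask to keep only the rows that match none of the tuples.
new_df = df[~mask]
print(new_df)
```

Because the mask is a plain boolean array rather than a Series, it is applied by position, so this should also work when `df` does not have the default integer index.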
CodePudding user response:
Zip the columns into a pandas Series and use isin, which avoids an explicit for loop over the rows (and should be faster). Note: based on "How to filter a pandas DataFrame according to a list of tuples".
tuples = [('A',1),('C',4),('A',3)]
new_df = df[~pd.Series(list(zip(df['A'], df['B'])), index=df.index).isin(tuples)] # no for loop; index=df.index keeps the mask aligned with df
>>> new_df
   A  B
1  A  4
2  B  2
