I'm trying to filter a DataFrame using loc and isin, but I want a result similar to any: a row should match if any of its tags is in the filter.
Data:
| column | tags |
|---|---|
| 0 | A |
| 1 | [A] |
| 2 | [] |
| 3 | |
| 4 | [A,B] |
| 5 | C |
| 6 | [C] |
| 7 | B |
import pandas as pd
df = pd.DataFrame({"tags": ["A", ["A"], [], "", ["A", "B"], "C", ["C"], "B"]})
filter = ["A","C"]
Filtering:
df.loc[df["tags"].isin(filter)]
Result:
| column | tags |
|---|---|
| 0 | A |
| 5 | C |
Desired Result:
| column | tags |
|---|---|
| 0 | A |
| 1 | [A] |
| 4 | [A,B] |
| 5 | C |
| 6 | [C] |
- I don't want to flatten the dataframe because it'll be costly for large dataframes.
CodePudding user response:
Because the column mixes lists and scalars, use set.intersection in a list comprehension, with an if-else that wraps each scalar in a list, and pass the resulting booleans to boolean indexing:
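import pandas as pd

df = pd.DataFrame({"tags": ["A", ["A"], [], "", ["A", "B"], "C", ["C"], "B"]})
f = ["A", "C"]
s = set(f)  # set of wanted tags
# Wrap scalars in a list so every row can be intersected with the set
df = df[[bool(s.intersection(x if isinstance(x, list) else [x])) for x in df["tags"]]]
print(df)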
     tags
0       A
1     [A]
4  [A, B]
5       C
6     [C]
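The same idea can also be wrapped in a small helper, which may be easier to reuse and test. This is only a sketch: the names `matches_any`, `wanted`, `mask` and `result` are made up for illustration, and it assumes every non-list value in the column is a hashable scalar such as a string. It uses `set.isdisjoint` instead of `set.intersection`, which avoids building an intermediate set for each row.

```python
import pandas as pd


def matches_any(value, wanted):
    """Return True if value (a scalar or a list of tags) shares an element with wanted."""
    # Wrap scalars in a list so both cases are handled the same way
    items = value if isinstance(value, list) else [value]
    # isdisjoint is True when nothing is shared, so negate it
    return not wanted.isdisjoint(items)


df = pd.DataFrame({"tags": ["A", ["A"], [], "", ["A", "B"], "C", ["C"], "B"]})
wanted = {"A", "C"}

# Build a boolean mask row by row, without exploding the column
mask = df["tags"].map(lambda v: matches_any(v, wanted))
result = df[mask]
print(result.index.tolist())  # [0, 1, 4, 5, 6]
```

Like the list comprehension, this still loops over the rows in Python, so it should behave about the same on large frames; the main gain is readability.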
