How to test whether a pandas Series contains elements from another list (or NumPy array or pandas Series)

Assume that I have this DataFrame, where each cell of the Animals column holds a list of animal names:

ID	Animals
1	[cat, dog, chicken]
2	[penguin]

And these lists (they could be NumPy arrays or pandas Series instead, if that is better for performance):

mammals = ['cat', 'dog', 'cow', 'sheep']
birds = ['chicken', 'duck', 'penguin']

What I am trying to do is add two columns, ContainsBirds and ContainsMammals, to my DataFrame, based on whether the Animals list in each row contains at least one bird or at least one mammal.

Here is the final expected output:

ID	Animals	ContainsBirds	ContainsMammals
1	[cat, dog, chicken]	1.0	1.0
2	[penguin]	1.0	0.0
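To make the question reproducible, here is a minimal sketch that builds the sample DataFrame and the two lists, assuming the Animals cells are plain Python lists of strings (as the bracketed display suggests):

```python
import pandas as pd

# Sample data from the question; each Animals cell is a plain Python list
df = pd.DataFrame({
    'ID': [1, 2],
    'Animals': [['cat', 'dog', 'chicken'], ['penguin']],
})

mammals = ['cat', 'dog', 'cow', 'sheep']
birds = ['chicken', 'duck', 'penguin']

print(df)
```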

Answer:

You can put the lists in a dictionary and test whether each row shares at least one value with a list: convert the list to a set, call set.isdisjoint on each row, and negate the result with ~. To get 0.0 and 1.0, cast the booleans to float; for 0 and 1, use .astype(int) instead:

d = {'Birds': birds, 'Mammals': mammals}

for k, v in d.items():
    # isdisjoint is True when a row shares no animal with v, so ~ flips it
    df[f'Contains{k}'] = (~df['Animals'].map(set(v).isdisjoint)).astype(float)
print(df)
   ID              Animals  ContainsBirds  ContainsMammals
0   1  [cat, dog, chicken]            1.0              1.0
1   2            [penguin]            1.0              0.0
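A self-contained sketch of the same isdisjoint approach, using the .astype(int) variant mentioned above so the new columns hold 0 and 1 instead of floats. The DataFrame construction is an assumption that mirrors the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2],
    'Animals': [['cat', 'dog', 'chicken'], ['penguin']],
})
mammals = ['cat', 'dog', 'cow', 'sheep']
birds = ['chicken', 'duck', 'penguin']

d = {'Birds': birds, 'Mammals': mammals}

for k, v in d.items():
    lookup = set(v)  # build the set once per category
    # isdisjoint is True when the row shares nothing with the set; ~ flips it,
    # then astype(int) turns True/False into 1/0
    df[f'Contains{k}'] = (~df['Animals'].map(lookup.isdisjoint)).astype(int)

print(df)
```

Because set.isdisjoint accepts any iterable, each row's list is passed to it directly without first being converted to a set.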

Answer:

Using a list comprehension:

lists = [birds, mammals]
names = ['Birds', 'Mammals']

# For each row x, 1 if x shares at least one animal with list l, else 0
df[names] = [[int(bool(set(l).intersection(x))) for l in lists]
             for x in df['Animals']]

output:

   ID              Animals  Birds  Mammals
0   1  [cat, dog, chicken]      1        1
1   2            [penguin]      1        0
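Since the question mentions performance and pandas Series, here is a different technique, not taken from either answer: flatten the lists with Series.explode (available since pandas 0.25), test membership with Series.isin (which accepts a list, NumPy array, or Series), and collapse back to one value per row with groupby(level=0).any(). This sketch assumes the DataFrame's index labels are unique (e.g. the default RangeIndex); it has not been benchmarked here, so whether it beats the per-row approaches on your data is an open question:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2],
    'Animals': [['cat', 'dog', 'chicken'], ['penguin']],
})
mammals = ['cat', 'dog', 'cow', 'sheep']
birds = ['chicken', 'duck', 'penguin']

# One row per animal; the original row label is repeated for each element
exploded = df['Animals'].explode()

for name, values in {'Birds': birds, 'Mammals': mammals}.items():
    # isin flags each exploded element; any() per original row label
    # answers "does this row contain at least one of the values?"
    hits = exploded.isin(values).groupby(level=0).any()
    df[f'Contains{name}'] = hits.astype(float)

print(df)
```

An empty list explodes to a single NaN, which isin reports as False, so such rows get 0.0. Note that explode materializes one row per list element, which costs extra memory on large frames.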