How can I lookup the values in a df column against multiple lists and return the list name in a new-CodePudding

I have a DataFrame with a column called 'color', containing a list of colors.

color
Red
Yellow
Green
Yellow
Violet

I've created two lists, primary and secondary. I'm trying to iterate through the values in the 'color' column against the two lists and create a new column 'category' which contains the list name (or the category "primary and "secondary").

primary = ["red","yellow","blue"]
secondary = ["green","violet","orange"]

This is the output I'm looking for.

color	category
Red	primary
Yellow	primary
Green	secondary
Yellow	primary
Violet	secondary

I've tried using two np.where statements but the second statement overwrites the first. I now understand why it's doing that but I'm struggling to find a solution to my problem.

Any suggestions?

CodePudding user response：

You could use numpy.select (since the words in df are capitalized but those in the lists aren't, we could align them with the str.lower method):

colors = df['color'].str.lower()
df['category'] = np.select([colors.isin(primary), colors.isin(secondary)], ['primary', 'secondary'], np.nan)

Output:

    color   category
0     Red    primary
1  Yellow    primary
2   Green  secondary
3  Yellow    primary
4  Violet  secondary

CodePudding user response：

With a lambda function

df['type'] = df.color.apply(lambda col : 'primary' if col.lower() in primary else ('secondary'  if col.lower() in secondary else ''))

CodePudding user response：

Try with np.where:

import numpy as np
color["category"] = np.where(color["color"].str.lower().isin(primary), 
                             "primary", 
                             "secondary")

>>> color
    color   category
0     Red    primary
1  Yellow    primary
2   Green  secondary
3  Yellow    primary
4  Violet  secondary

Alternatively, with pandas.where:

color["category"] = "secondary"
color["category"] = (color["category"].where(~color["color"].str.lower().isin(primary))
                                      .fillna("primary")
                    )