I have a DataFrame with a column called 'color', containing a list of colors.
| color |
|---|
| Red |
| Yellow |
| Green |
| Yellow |
| Violet |
I've created two lists, primary and secondary. I want to check each value in the 'color' column against the two lists and create a new column, 'category', that contains the name of the list the color belongs to ("primary" or "secondary").
primary = ["red","yellow","blue"]
secondary = ["green","violet","orange"]
This is the output I'm looking for.
| color | category |
|---|---|
| Red | primary |
| Yellow | primary |
| Green | secondary |
| Yellow | primary |
| Violet | secondary |
I've tried using two np.where statements, but the second statement overwrites the result of the first. I now understand why it does that, but I'm struggling to find a solution.
Any suggestions?
CodePudding user response:
You could use numpy.select (since the words in df are capitalized but those in the lists aren't, we could align them with the str.lower method):
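import numpy as np
colors = df['color'].str.lower()
# default is a string too: np.nan mixed with string choices gives the text 'nan' or a TypeError, depending on the NumPy version
df['category'] = np.select([colors.isin(primary), colors.isin(secondary)], ['primary', 'secondary'], default='')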
Output:
color category
0 Red primary
1 Yellow primary
2 Green secondary
3 Yellow primary
4 Violet secondary
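For reference, here is a self-contained sketch of the same np.select approach. The "Pink" row and the "other" fallback label are assumptions added only to show what happens to a color that is in neither list; they are not part of the question.

```python
import numpy as np
import pandas as pd

primary = ["red", "yellow", "blue"]
secondary = ["green", "violet", "orange"]

# "Pink" is a hypothetical extra row, included only to show the fallback label
df = pd.DataFrame({"color": ["Red", "Yellow", "Green", "Yellow", "Violet", "Pink"]})

# Lowercase once so the comparison matches the lowercase list entries
colors = df["color"].str.lower()
conditions = [colors.isin(primary), colors.isin(secondary)]
labels = ["primary", "secondary"]

# Conditions are checked in order; rows matching none of them get the default
df["category"] = np.select(conditions, labels, default="other")
print(df)
```

Because all conditions are evaluated in a single call, nothing gets overwritten the way it does with two separate np.where assignments.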
CodePudding user response:
With a lambda function (colors in neither list get an empty string):
df['category'] = df.color.apply(lambda col : 'primary' if col.lower() in primary else ('secondary' if col.lower() in secondary else ''))
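A related option, not from the answer above, is to turn the two lists into one lookup dictionary and use Series.map instead of a row-wise lambda. This is only a sketch: the name `lookup` is made up for illustration, and colors missing from both lists come out as NaN rather than an empty string.

```python
import pandas as pd

primary = ["red", "yellow", "blue"]
secondary = ["green", "violet", "orange"]

# Invert the two lists into a single table: color name -> category
# (`lookup` is a hypothetical helper name)
lookup = {name: "primary" for name in primary}
lookup.update({name: "secondary" for name in secondary})

df = pd.DataFrame({"color": ["Red", "Yellow", "Green", "Yellow", "Violet"]})

# Lowercase first so "Red" matches the key "red"; unmatched colors become NaN
df["category"] = df["color"].str.lower().map(lookup)
print(df)
```

Adding a third category then only means adding more entries to the dictionary, with no further nesting of conditionals.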
CodePudding user response:
Try with np.where (the DataFrame is named color here; every color that is not in primary gets labeled secondary):
import numpy as np
color["category"] = np.where(color["color"].str.lower().isin(primary),
"primary",
"secondary")
>>> color
color category
0 Red primary
1 Yellow primary
2 Green secondary
3 Yellow primary
4 Violet secondary
Alternatively, with pandas.Series.where:
color["category"] = "secondary"
is_primary = color["color"].str.lower().isin(primary)
color["category"] = color["category"].where(~is_primary, "primary")
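Both variants in this answer treat anything that is not primary as secondary. If some colors may be in neither list, one possible fix is to nest a second np.where as the fallback of the first, which also avoids the overwriting problem from the question. This is a sketch: the "Pink" row and the "other" label are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

primary = ["red", "yellow", "blue"]
secondary = ["green", "violet", "orange"]

# "Pink" is a hypothetical extra row that belongs to neither list
color = pd.DataFrame({"color": ["Red", "Yellow", "Green", "Yellow", "Violet", "Pink"]})

lowered = color["color"].str.lower()

# The inner np.where only decides rows that the outer condition rejected,
# so the two checks are combined in one assignment instead of two
color["category"] = np.where(
    lowered.isin(primary),
    "primary",
    np.where(lowered.isin(secondary), "secondary", "other"),
)
print(color)
```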
