I'm trying to create a new column based on whether the strings in the original column contain a certain substring. What I tried was this:
def get_group(row):
    stores = pd.Series(row['store'])
    if (stores.str.contains('Blue')): 'Blue'
    elif (stores.str.contains('Yellow')): 'Yellow'
    elif (stores.str.contains('Green')): 'Green'
    elif (stores.str.contains('Red')): 'Red'
    elif (stores.str.contains('Purple')): 'Purple'
    elif (stores.str.contains('Pink')): 'Pink'
    elif (stores.str.contains('Orange')): 'Orange'
    else: 'Outhers'
db['group'] = db.apply(lambda row: get_group(row), axis=1)
However, it is not working.
CodePudding user response:
Your function is missing a return statement. Also, when apply is called with axis=1, row['store'] is already a single string, so wrapping it in pd.Series is unnecessary; you can check for a substring directly with the in operator.
Your function should look like this:
def get_group(row):
    stores = row['store']
    to_return = 'Others'
    if 'Blue' in stores: to_return = 'Blue'
    elif 'Yellow' in stores: to_return = 'Yellow'
    elif 'Green' in stores: to_return = 'Green'
    elif 'Red' in stores: to_return = 'Red'
    elif 'Purple' in stores: to_return = 'Purple'
    elif 'Pink' in stores: to_return = 'Pink'
    elif 'Orange' in stores: to_return = 'Orange'
    return to_return
Be aware that this function is case-sensitive: it will detect 'Blue' but not 'blue'.
To make it case-insensitive, convert the string to lowercase before testing and compare against lowercase substrings, for instance: if ('blue' in stores.lower())
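As a rough sketch of the case-insensitive variant, the chain of checks can be written as a loop over the group names. The sample DataFrame and the COLORS list below are made up for illustration; only the column name 'store' and the group names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the real `db` comes from the question.
db = pd.DataFrame(
    {"store": ["Blue Market", "yellow corner", "GREEN Grocer", "Corner Shop"]}
)

# Group names checked in order; the first match wins, as in the if/elif chain.
COLORS = ["Blue", "Yellow", "Green", "Red", "Purple", "Pink", "Orange"]

def get_group(store):
    """Return the first color whose name appears in `store`, ignoring case."""
    lowered = str(store).lower()  # str() also copes with missing values (NaN)
    for color in COLORS:
        if color.lower() in lowered:
            return color
    return "Others"

# Apply to the column directly, so the function receives one string at a time.
db["group"] = db["store"].apply(get_group)
print(db)
```

Note that plain substring matching also matches inside longer words, so a store name containing "Fred" would be grouped as 'Red'.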
CodePudding user response:
There are two things you need to fix:
- Each condition in your if statements evaluates to a boolean Series, not a single True or False. pandas treats the truth value of a Series as ambiguous and raises a ValueError, because it cannot know which of the values it holds should decide the branch. One way to obtain a single truth value is to call .any(), which returns True if any of the values are True.
- You need to add a return statement for each string; as written, the function always returns None.
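The first point can be reproduced in isolation. This is a minimal sketch; the store name is made up for illustration.

```python
import pandas as pd

# A one-element Series, like the one built from a single row's value.
stores = pd.Series(["Blue Market"])  # hypothetical store name
mask = stores.str.contains("Blue")   # boolean Series, not a plain bool

try:
    if mask:  # pandas refuses to reduce a Series to True/False implicitly
        pass
    raised = False
except ValueError:
    raised = True

print(raised)            # True: the bare `if` raised ValueError
print(bool(mask.any()))  # True: .any() reduces the Series to one value
```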
With that being said, the following should work:
def get_group(row):
    stores = pd.Series(row['store'])
    if stores.str.contains('Blue').any(): return 'Blue'
    elif stores.str.contains('Yellow').any(): return 'Yellow'
    elif stores.str.contains('Green').any(): return 'Green'
    elif stores.str.contains('Red').any(): return 'Red'
    elif stores.str.contains('Purple').any(): return 'Purple'
    elif stores.str.contains('Pink').any(): return 'Pink'
    elif stores.str.contains('Orange').any(): return 'Orange'
    else: return 'Others'
db['group'] = db.apply(lambda row: get_group(row), axis=1)
