how to “extract” values from a column, in which values are found in a dictionary-CodePudding

I have a dataframe df1:

ID	item
11111	chair
11112	desk¥blue chair
11113	bed¥pen¥cable
11114	mug¥old sofa¥toy
11115	old mug¥wine glass
11116	blue chair¥old mug

etc.

and a large dictionary item_dict (from csv) which categorize some but not all items into categories:

item	category
chair	furniture
blue chair	furniture
mug	tableware
old mug	tableware
wine glass	tableware

etc.

The goal is to make a new column in df1 of all category of interest for items, if any, in each row:

ID	item	category
11111	chair	furniture
11112	desk¥blue chair	furniture
11113	bed¥pen¥cable
11114	mug¥old sofa¥toy	mug
11115	old mug¥wine glass	mug
11116	blue chair¥old mug	furniture, mug

I’ve tried

df1[‘category’] = df1[‘item’].replace(item_dict)

which will only replace values that only have a single item;

adding regex=True would allow multi-item entries to be replaced but would turn desk¥blue chair into desk¥blue furniture bc a shorter one existed.

I know replace isn’t the best option since I want the new column to only contain categories of interest without other item names, but am unsure of what to use.

CodePudding user response：

You can try this:

# !pip install -U swifter  #### only in case of very large dataframe
import swifter


delimiter = '¥'
df1['category'] = df1['item'].swifter.apply(
    lambda _: delimiter.join([item_dict.get(el, '') for el in _.split(delimiter)])
)