I am trying to use a table that has headers that correspond to categorizations to descriptions for products
Table1
| ID | Description |
|---|---|
| 1 | Wow what a nice car |
| 2 | That automobile is fabulous |
| 3 | When will they serve us dessert? |
| 4 | When is Jim's ride showing up? |
| 5 | What do I have to do to get a decent cookie around here! |
| 6 | Bury me with a chocolate sundae. |
CategoryTable
| Dessert | Cars | Coffee |
|---|---|---|
| Candy | Car | Cold Brew |
| Cake | Automobile | Starbucks |
| Chocolate | Driving | Mug |
| Cookie | Ride | |
| sundae | Park | |
| Dessert | Wheels |
| ID | Description | Result |
|---|---|---|
| 1 | Wow what a nice car | Car |
| 2 | That automobile rocks | Car |
| 3 | When's Dessert? | Dessert |
| 4 | Where is Jim's ride? | Car |
| 5 | Gimme the Cookie! | Dessert |
| 6 | Starbucks is popular. | Coffee |
I am happy to use Python, SQL or Excel to figure this all out thanks in advance!
CodePudding user response:
Note that I just separated tags and description tables in different sheets.
Sheet5:
Sheet1:
Use:
data = pd.read_excel('Example Data Set.xlsx', sheet_name='Sheet5')
tags = pd.read_excel('Example Data Set.xlsx', sheet_name='Sheet1')
tags_str = '|'.join([x.lower() for x in tags.fillna('_').values.flatten()])
temp = data['Description'].str.lower().str.extract(f"( {tags_str} )")
def tag2cat(tag):
for col in tags.columns:
if tag.values[0] in tags[col].str.lower().values:
return col
temp.apply(tag2cat, axis = 1)
Output:
CodePudding user response:
You can try:
pattern = '|'.join([f"(?P<{c}>{'|'.join(df2[c].str.lower().dropna())})"
for c in df2])
df1['Result'] = (df1['Description']
.str.lower().str.extractall(pattern)
.stack().reset_index(level=2).groupby(level=0)['level_2']
.apply(lambda x: ', '.join(set(x))))
Output:
>>> df1
ID Description Result
0 1 Wow what a nice car Cars
1 2 That automobile is fabulous Cars
2 3 When will they serve us dessert? Dessert
3 4 When is Jim's ride showing up? Cars
4 5 What do I have to do to get a decent cookie ar... Dessert
5 6 Bury me with a chocolate sundae. Dessert
It works if you have multiple match. Tags will be comma separated.



