How to match a dictionary of value lists against list elements in a pandas DataFrame column

I have a dictionary and a DataFrame column in which each element is a list of strings.

If any string in a row matches one of the values in a dictionary item, that row should be labeled with the item's key.

For example, input:

text_column=[['grapes','are','good','for','health'],['banana','is','not','good','for','health'],
['apple','keeps','the','doctor','away'],['automobile','industry','is','in','top','position','from','recent','times']]

dict={ "fruit_name":['apple','grapes','lemon','cherry'],
        "profession":['health','manufacturing','automobiles']
     }

output :

    1) fruit_name
    2) fruit_name
    3) profession
    4) profession

CodePudding user response:

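You can invert the dictionary into reverse_dct, which maps each word to its key, and then map the words in 'text_column' to a new 'word_type' column. (By the way, dict is the name of a built-in type in Python, so don't use it as a variable name; the dictionary is called dct here.)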

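import pandas as pd

# dct is the dictionary from the question, renamed so it doesn't shadow the built-in dict.
# Invert it: map each word to the key whose list contains it.
reverse_dct = {}
for k, v in dct.items():
    for i in v:
        reverse_dct[i] = k

df = pd.DataFrame({'text_column': text_column})
# One row per word, look up each word's key, drop unmatched words, then join the keys per original row.
df['word_type'] = df['text_column'].explode().map(reverse_dct).dropna().groupby(level=0).apply(','.join)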

Output:

                                         text_column              word_type
0                   [grapes, are, good, for, health]  fruit_name,profession
1               [banana, is, not, good, for, health]             profession
2                  [apple, keeps, the, doctor, away]             fruit_name
3  [automobile, industry, is, in, top, position, ...                    NaN

Note that the last row has no type because the dictionary contains 'automobiles' while text_column contains 'automobile'. You'll need to normalize the spelling on both sides if you want your program to recognize these as the same word.
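One way to do that normalization is sketched here. It assumes that lowercasing and dropping a single trailing "s" is good enough for this data. The normalize helper is hypothetical and deliberately crude, and a proper stemmer or lemmatizer would be more robust. The same helper is applied to the dictionary values and to the words in each row, so both sides are compared in the same form.

```python
import pandas as pd

text_column = [
    ['grapes', 'are', 'good', 'for', 'health'],
    ['banana', 'is', 'not', 'good', 'for', 'health'],
    ['apple', 'keeps', 'the', 'doctor', 'away'],
    ['automobile', 'industry', 'is', 'in', 'top', 'position', 'from', 'recent', 'times'],
]
dct = {
    'fruit_name': ['apple', 'grapes', 'lemon', 'cherry'],
    'profession': ['health', 'manufacturing', 'automobiles'],
}


def normalize(word):
    # Hypothetical helper (an assumption, not part of the original answer):
    # lowercase the word and drop one trailing "s" from words longer than 3 letters.
    word = word.lower()
    if len(word) > 3 and word.endswith('s'):
        return word[:-1]
    return word


# Invert the dictionary using normalized words as keys.
reverse_dct = {normalize(w): k for k, words in dct.items() for w in words}

df = pd.DataFrame({'text_column': text_column})
df['word_type'] = (
    df['text_column']
    .explode()            # one row per word, original row index repeated
    .map(normalize)       # normalize each word the same way as the dictionary
    .map(reverse_dct)     # look up the key; unmatched words become NaN
    .dropna()
    .groupby(level=0)     # regroup by original row
    .apply(','.join)
)
print(df)
```

With this change 'automobiles' and 'automobile' both normalize to 'automobile', so the last row is labeled profession. A row that matches several words from the same key will repeat that key in the joined string.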