Keep only matched words in pandas column

I want to keep only the words that are present in my list and delete all other words from a column of a pandas DataFrame.

cuisine_list = ['breakfast', 'american', 'tea', 'chicken']

name	cuisine
dominos pizza	breakfast american tea dine in
kfc	american chicken play area

The result should look like this:

name	cuisine
dominos pizza	breakfast american tea
kfc	american chicken

I am using the following code, but it is taking a lot of time.

file1_cuisine = file1[["Cuisine"]]

for index, row in file1_cuisine.iterrows():
    words_to_keep = []
    for word in row[0].split(' '):
        if word in cuisine_list:
            words_to_keep.append(word + ' ')
    file1_cuisine.loc[index, 'final_input_text'] = ''.join(words_to_keep)

CodePudding user response：

Use a lambda function with split and a set intersection, then join the values with a comma. Note that a set does not preserve the original word order:

cuisine_list = ['breakfast', 'american', 'tea', 'chicken']
df['cuisine'] = df['cuisine'].apply(lambda x: ','.join(set(x.split()).intersection(cuisine_list)))

print (df)
            name                 cuisine
0  dominos pizza  tea,breakfast,american
1            kfc        chicken,american
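
If the original word order matters, as in the expected output of the question, a generator expression can filter each row instead of a set intersection. This is only a minimal sketch: the sample DataFrame is rebuilt from the question's data, the name keep is introduced here for illustration, and the space separator is an assumption taken from the desired result.

```python
import pandas as pd

cuisine_list = ['breakfast', 'american', 'tea', 'chicken']
keep = set(cuisine_list)  # a set makes each membership test cheap

# Sample data rebuilt from the question
df = pd.DataFrame({
    'name': ['dominos pizza', 'kfc'],
    'cuisine': ['breakfast american tea dine in',
                'american chicken play area'],
})

# Keep matching words in their original order, joined by a space
df['cuisine'] = df['cuisine'].apply(
    lambda s: ' '.join(w for w in s.split() if w in keep)
)

print(df)
```

Unlike the set intersection, this keeps repeated words if a row contains the same cuisine more than once.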

Or use Series.str.findall:

cuisine_list = ['breakfast', 'american', 'tea', 'chicken']

pat = '|'.join(r"\b{}\b".format(x) for x in cuisine_list)
df['cuisine'] = df['cuisine'].str.findall(pat).str.join(',')

print (df)
            name                 cuisine
0  dominos pizza  breakfast,american,tea
1            kfc        american,chicken
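
If the list might contain regex special characters, the pattern can be built more defensively by escaping each term with re.escape. The following is a sketch under that assumption, with the sample DataFrame rebuilt from the question's data and a space used as the separator to match the desired output. The \b word boundaries only behave as expected for terms that start and end with a word character.

```python
import re
import pandas as pd

cuisine_list = ['breakfast', 'american', 'tea', 'chicken']

# Sample data rebuilt from the question
df = pd.DataFrame({
    'name': ['dominos pizza', 'kfc'],
    'cuisine': ['breakfast american tea dine in',
                'american chicken play area'],
})

# Escape each term and wrap the alternation in one non-capturing group,
# so the word boundaries apply to every alternative
pat = r'\b(?:{})\b'.format('|'.join(re.escape(x) for x in cuisine_list))

# findall returns the matches in the order they occur in each string
df['cuisine'] = df['cuisine'].str.findall(pat).str.join(' ')

print(df)
```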

CodePudding user response：

Use a set intersection with the & operator, together with Series.str.split and Series.apply:

In [760]: y = set(cuisine_list)
In [766]: df['cuisine'] = df['cuisine'].str.split().apply(lambda x: list(set(x) & y)).str.join(',')
    
In [767]: df
Out[767]: 
            name                 cuisine
0  dominos pizza  tea,american,breakfast
1            kfc        chicken,american