import pandas as pd

a = ['bed', 'mattress', 'sheets', 'pillow', 'admission kits']
df = pd.DataFrame({'sr.no': [1, 2, 3, 4, 5],
                   'des': ['2 bed rooms', 'natural language processing', '2x2 sheets grabs', '2 meter long pillow', '2x30mm long']})
df =
sr.no  des
1      2 bed rooms
2      natural language processing
3      2x2 sheets grabs
4      2 meter long admission kits
5      2x30mm long
Here are a list 'a' and a dataframe 'df'. I want to match the elements of list 'a' against the dataframe column 'des'. If a word from 'a' is present in the 'des' column, print the matched word; otherwise print 'not match'.
Here is the output that I want:
out =
sr.no  des                          output
1      2 bed rooms                  bed
2      natural language processing  not match
3      2x2 sheets grabs             sheets
4      2 meter long admission kits  admission kits
5      2x30mm long                  not match
How can I do this using Python?
CodePudding user response:
If we only want to see if there is a match, we can use str.contains for the check and np.where to assign values:
df['output'] = np.where(df['des'].str.contains('|'.join(a)), 'match', 'not match')
Output:
   sr.no  des                          output
0  1      2 bed rooms                  match
1  2      natural language processing  not match
2  3      2x2 sheets grabs             match
3  4      2 meter long pillow          match
4  5      2x30mm long                  not match
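For reference, here is a self-contained version of the check above, as a minimal sketch. The re.escape call is my own addition, not part of the answer: it is there on the assumption that some list entries might contain regex metacharacters, which would otherwise be interpreted as pattern syntax.

```python
import re

import numpy as np
import pandas as pd

a = ['bed', 'mattress', 'sheets', 'pillow', 'admission kits']
df = pd.DataFrame({
    'sr.no': [1, 2, 3, 4, 5],
    'des': ['2 bed rooms', 'natural language processing', '2x2 sheets grabs',
            '2 meter long pillow', '2x30mm long'],
})

# Escape each entry so it is matched literally, then join into one alternation.
pat = '|'.join(re.escape(word) for word in a)

# str.contains returns a boolean Series; np.where maps it to the two labels.
df['output'] = np.where(df['des'].str.contains(pat), 'match', 'not match')
print(df)
```

Note that this is a plain substring test, so 'bed' would also match inside a longer word such as 'bedroom'.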
CodePudding user response:
Don't use solutions based on split, because the list contains values made of several space-separated words (here 'admission kits'). To avoid that problem, use Series.str.extract:
pat = r"\b({})\b".format("|".join(a))
df['output'] = df['des'].str.extract(pat, expand=False).fillna('not match')
print(df)
   sr.no  des                          output
0  1      2 bed rooms                  bed
1  2      natural language processing  not match
2  3      2x2 sheets grabs             sheets
3  4      2 meter long pillow          pillow
4  5      2x30mm long                  not match
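To show the multi-word case, here is a minimal, self-contained sketch of the same extract approach. It uses the rows from the table printed in the question (with '2 meter long admission kits' in row 4) rather than the 'pillow' row from the question's code. The re.escape call and expand=False are my own choices here: the first treats list entries literally, and the second makes str.extract return a Series for the single capture group.

```python
import re

import pandas as pd

a = ['bed', 'mattress', 'sheets', 'pillow', 'admission kits']
df = pd.DataFrame({
    'sr.no': [1, 2, 3, 4, 5],
    'des': ['2 bed rooms', 'natural language processing', '2x2 sheets grabs',
            '2 meter long admission kits', '2x30mm long'],
})

# One capture group holding all alternatives, wrapped in word boundaries.
pat = r"\b({})\b".format("|".join(re.escape(word) for word in a))

# Rows without a match come back as NaN and are replaced with the label.
df['output'] = df['des'].str.extract(pat, expand=False).fillna('not match')
print(df)
```

Because the pattern is matched against the whole string instead of individual tokens, 'admission kits' is found as one unit.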
If you only need to test whether there is a match:
pat = '|'.join(r"\b{}\b".format(x) for x in a)
df['output'] = np.where(df['des'].str.contains(pat), 'match', 'not match')
print(df)
   sr.no  des                          output
0  1      2 bed rooms                  match
1  2      natural language processing  not match
2  3      2x2 sheets grabs             match
3  4      2 meter long pillow          match
4  5      2x30mm long                  not match
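The word boundaries (\b) make no difference on the sample data, so here is a small sketch of where they do matter. The row '2 bedrooms' is a made-up example and is not in the original data.

```python
import pandas as pd

a = ['bed', 'mattress', 'sheets', 'pillow', 'admission kits']

# '2 bedrooms' is a hypothetical row added only for illustration.
s = pd.Series(['2 bedrooms', '2 bed rooms'])

# Plain alternation: 'bed' is found inside 'bedrooms' as a substring.
plain = s.str.contains('|'.join(a))

# Word-bounded alternation: 'bed' must stand as a whole word.
bounded = s.str.contains('|'.join(r"\b{}\b".format(x) for x in a))

print(plain.tolist())
print(bounded.tolist())
```

With the plain pattern both rows count as a match; with the bounded pattern only '2 bed rooms' does.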
