How to match particular word in the list to pandas column?

Time:01-21

import pandas as pd

a = ['bed', 'mattress', 'sheets', 'pillow', 'admission kits']

df = pd.DataFrame({'sr.no':[1,2,3,4,5], 'des': ['2 bed rooms', 'natural language processing', '2x2 sheets grabs', '2 meter long pillow', '2x30mm long']})

df =   sr.no       des
       1           2 bed rooms
       2           natural language processing
       3           2x2 sheets grabs
       4           2 meter long admission kits
       5           2x30mm long

Here is a list 'a' and a dataframe 'df'. I want to match the elements of list 'a' against the dataframe column 'des'. If a word (or phrase) from 'a' is present in the 'des' column, print the matched word; otherwise print 'not match'.

Here is the output that I want:

out=   sr.no    des                            output
       1        2 bed rooms                    bed
       2        natural language processing    not match
       3        2x2 sheets grabs               sheets 
       4        2 meter long admission kits    admission kits
       5        2x30mm long                    not match

How can I do this using Python?

CodePudding user response:

If we only want to see if there is a match, we can use str.contains for the check and np.where to assign values:

import numpy as np
df['output'] = np.where(df['des'].str.contains('|'.join(a)), 'match', 'not match')

Output:

   sr.no                          des     output
0      1                  2 bed rooms      match
1      2  natural language processing  not match
2      3             2x2 sheets grabs      match
3      4          2 meter long pillow      match
4      5                  2x30mm long  not match
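
One caveat with this pattern: it matches substrings, so 'bed' is also found inside a longer word such as 'bedding'. A minimal sketch of that behaviour follows; the series `s` and its 'soft bedding set' entry are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

a = ['bed', 'mattress', 'sheets', 'pillow', 'admission kits']

# Hypothetical descriptions, chosen only to illustrate substring matching.
s = pd.Series(['2 bed rooms', 'soft bedding set', 'natural language processing'])

# The joined pattern has no word boundaries, so 'bed' also matches inside 'bedding'.
flags = np.where(s.str.contains('|'.join(a)), 'match', 'not match')
print(flags.tolist())  # ['match', 'match', 'not match']
```

If whole-word matching is required, the pattern needs \b word boundaries around the alternatives.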

CodePudding user response:

Don't use split-based solutions, because the list contains values joined by a space, such as 'admission kits'. To avoid that problem, use Series.str.extract:

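# \b anchors each alternative to word boundaries; the group captures the matched word
pat = r"\b({})\b".format("|".join(a))
# expand=False returns a Series instead of a one-column DataFrame
df['output'] = df['des'].str.extract(pat, expand=False).fillna('not match')
print(df)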
   sr.no                          des     output
0      1                  2 bed rooms        bed
1      2  natural language processing  not match
2      3             2x2 sheets grabs     sheets
3      4          2 meter long pillow     pillow
4      5                  2x30mm long  not match
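
To check the multi-word case, here is a sketch that uses a row containing 'admission kits', as shown in the question's expected output. Wrapping each entry in re.escape is an addition that is not part of the answer above; it should only matter if the list contains regex metacharacters.

```python
import re
import pandas as pd

a = ['bed', 'mattress', 'sheets', 'pillow', 'admission kits']
df = pd.DataFrame({
    'sr.no': [1, 2, 3, 4, 5],
    'des': ['2 bed rooms', 'natural language processing', '2x2 sheets grabs',
            '2 meter long admission kits', '2x30mm long'],
})

# Escape each phrase so it is matched literally, then join into one alternation.
pat = r"\b({})\b".format("|".join(re.escape(x) for x in a))

# expand=False returns a Series; rows without a match are NaN until filled.
df['output'] = df['des'].str.extract(pat, expand=False).fillna('not match')
print(df['output'].tolist())
# ['bed', 'not match', 'sheets', 'admission kits', 'not match']
```

A split-based approach would compare 'admission' and 'kits' separately and miss the two-word entry, which is why the regex is applied to the whole string.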

If you only need to test whether there is a match:

pat = '|'.join(r"\b{}\b".format(x) for x in a)
df['output'] = np.where(df['des'].str.contains(pat), 'match', 'not match')
print(df)
   sr.no                          des     output
0      1                  2 bed rooms      match
1      2  natural language processing  not match
2      3             2x2 sheets grabs      match
3      4          2 meter long pillow      match
4      5                  2x30mm long  not match