I have a dataframe:
,nsn,nsn2,cage,part_number,company_name
0,6520-01-533-3775 ,6520015333775 ,1TDD0,973-0404,
1,6520-01-533-3775 ,6520015333775 ,4N2Q0,973-0404,
3,5995-01-633-1445 ,5995016331445 ,0RAG4,945923,"OHIO ASSOCIATED ENTERPRISES, LLC"
4,5331-00-157-6630 ,5331001576630 ,99167,3803-28,HAMILTON SUNDSTRAND CORPORATION
5,2915-00-908-6032 ,2915009086032 ,06848,2523862,HONEYWELL INTERNATIONAL INC.
6,5905-01-446-7000 ,5905014467000 ,63005,23054911,ROLLS-ROYCE CORPORATION
7,2840-01-440-7755 ,2840014407755 ,99207,5124T01G01,GENERAL ELECTRIC COMPANY
I want to add a column matches that has only two string values Possible and Close . The values filtering condition will be if part number is fully numbers including the '-' sign, then it would be close matches if part number includes characters then it would be possible matches.
So here the expected dataframe will be :
,nsn,nsn2,cage,part_number,company_name, matches
0,6520-01-533-3775 ,6520015333775 ,1TDD0,973-0404,,close
1,6520-01-533-3775 ,6520015333775 ,4N2Q0,973-0404,,close.
3,5995-01-633-1445 ,5995016331445 ,0RAG4,945923,"OHIO ASSOCIATED ENTERPRISES, LLC",close
4,5331-00-157-6630 ,5331001576630 ,99167,3803-28,HAMILTON SUNDSTRAND CORPORATION,close
5,2915-00-908-6032 ,2915009086032 ,06848,2523862,HONEYWELL INTERNATIONAL INC.,close
6,5905-01-446-7000 ,5905014467000 ,63005,23054911,ROLLS-ROYCE CORPORATION,close
7,2840-01-440-7755 ,2840014407755 ,99207,5124T01G01,GENERAL ELECTRIC COMPANY,possible
CodePudding user response:
Use Series.str.contains with regex for match digits or - betwen start of string by ^ and end of string by $ and set new column in numpy.where:
df['matches'] = np.where(df['part_number'].str.contains('^[0-9-] $'),'close','possible')
print (df)
nsn nsn2 cage part_number \
0 6520-01-533-3775 6520015333775 1TDD0 973-0404
1 6520-01-533-3775 6520015333775 4N2Q0 973-0404
3 5995-01-633-1445 5995016331445 0RAG4 945923
4 5331-00-157-6630 5331001576630 99167 3803-28
5 2915-00-908-6032 2915009086032 06848 2523862
6 5905-01-446-7000 5905014467000 63005 23054911
7 2840-01-440-7755 2840014407755 99207 5124T01G01
company_name matches
0 NaN close
1 NaN close
3 OHIO ASSOCIATED ENTERPRISES, LLC close
4 HAMILTON SUNDSTRAND CORPORATION close
5 HONEYWELL INTERNATIONAL INC. close
6 ROLLS-ROYCE CORPORATION close
7 GENERAL ELECTRIC COMPANY possible
CodePudding user response:
This should work
df['matches'] = np.where(df['part_number'].str.replace('-','').str.isdigit() == True,'close','possible')
Here I replaces '-' with empty string and used isdigit() function, which checks if all chars in the string are digits. Based on this condition a new column was created using np.where().
