Home > Net >  How to add the column to the dataframe depending upon the condition?(Pandas)
How to add the column to the dataframe depending upon the condition?(Pandas)

Time:01-07

I have a dataframe:

,nsn,nsn2,cage,part_number,company_name
0,6520-01-533-3775   ,6520015333775   ,1TDD0,973-0404,
1,6520-01-533-3775   ,6520015333775   ,4N2Q0,973-0404,
3,5995-01-633-1445   ,5995016331445   ,0RAG4,945923,"OHIO ASSOCIATED ENTERPRISES, LLC"
4,5331-00-157-6630   ,5331001576630   ,99167,3803-28,HAMILTON SUNDSTRAND CORPORATION
5,2915-00-908-6032   ,2915009086032   ,06848,2523862,HONEYWELL INTERNATIONAL INC.
6,5905-01-446-7000   ,5905014467000   ,63005,23054911,ROLLS-ROYCE CORPORATION
7,2840-01-440-7755   ,2840014407755   ,99207,5124T01G01,GENERAL ELECTRIC COMPANY

I want to add a column matches that has only two string values Possible and Close . The values filtering condition will be if part number is fully numbers including the '-' sign, then it would be close matches if part number includes characters then it would be possible matches.

So here the expected dataframe will be :

,nsn,nsn2,cage,part_number,company_name, matches
0,6520-01-533-3775   ,6520015333775   ,1TDD0,973-0404,,close
1,6520-01-533-3775   ,6520015333775   ,4N2Q0,973-0404,,close.
3,5995-01-633-1445   ,5995016331445   ,0RAG4,945923,"OHIO ASSOCIATED ENTERPRISES, LLC",close
4,5331-00-157-6630   ,5331001576630   ,99167,3803-28,HAMILTON SUNDSTRAND CORPORATION,close
5,2915-00-908-6032   ,2915009086032   ,06848,2523862,HONEYWELL INTERNATIONAL INC.,close
6,5905-01-446-7000   ,5905014467000   ,63005,23054911,ROLLS-ROYCE CORPORATION,close
7,2840-01-440-7755   ,2840014407755   ,99207,5124T01G01,GENERAL ELECTRIC COMPANY,possible

CodePudding user response:

Use Series.str.contains with regex for match digits or - betwen start of string by ^ and end of string by $ and set new column in numpy.where:

df['matches'] = np.where(df['part_number'].str.contains('^[0-9-] $'),'close','possible')
print (df)
                   nsn           nsn2   cage part_number  \
0  6520-01-533-3775     6520015333775  1TDD0    973-0404   
1  6520-01-533-3775     6520015333775  4N2Q0    973-0404   
3  5995-01-633-1445     5995016331445  0RAG4      945923   
4  5331-00-157-6630     5331001576630  99167     3803-28   
5  2915-00-908-6032     2915009086032  06848     2523862   
6  5905-01-446-7000     5905014467000  63005    23054911   
7  2840-01-440-7755     2840014407755  99207  5124T01G01   

                       company_name   matches  
0                               NaN     close  
1                               NaN     close  
3  OHIO ASSOCIATED ENTERPRISES, LLC     close  
4   HAMILTON SUNDSTRAND CORPORATION     close  
5      HONEYWELL INTERNATIONAL INC.     close  
6           ROLLS-ROYCE CORPORATION     close  
7          GENERAL ELECTRIC COMPANY  possible  

CodePudding user response:

This should work

df['matches'] = np.where(df['part_number'].str.replace('-','').str.isdigit() == True,'close','possible')

Here I replaces '-' with empty string and used isdigit() function, which checks if all chars in the string are digits. Based on this condition a new column was created using np.where().

  •  Tags:  
  • Related