Parsing CSV Data from one column with Pandas

Time:02-04

Let's say I have a column "OU":

OU                      
CORP:Jenny Smith:    
STORE:Mary Poppins:  
STORE:Tony Stark:
STORE:Carmen Sandiego:    
NEWS:Peter Parker:
NEWS:Clark Kent:

I want to parse this column up to the first ":" and keep only the text before it. Any value that then repeats should appear only once, so the finished data should look like this:

OU                      
CORP
STORE     
NEWS

Would I need to do something in the pandas.read_csv(file, usecols=['OU']) call when I read the original CSV file?


In reference to an answer below, this is also how that one row looks in a text editor:

 OU                      
 CORP:Jenny Smith:   
 "CORP:John Smith:,John Smith:" 
 STORE:Mary Poppins:  
 STORE:Tony Stark:
 STORE:Carmen Sandiego:    
 NEWS:Peter Parker:
 NEWS:Clark Kent:

CodePudding user response:

You can use the colon as the separator and supply the column name manually, skipping the header row of the CSV file. Then call drop_duplicates:

import pandas as pd
pd.read_csv(file, sep=":", header=None, skiprows=1, usecols=[0], names=['OU']).drop_duplicates()

Result:

      OU
0   CORP
1  STORE
4   NEWS
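
If the quoted row from the question's edit is a concern (a quoted field is normally kept whole by the parser, so splitting on ":" at read time may not cut it), a possible alternative is to read the column unchanged and split it afterwards. This is only a minimal sketch: the inline sample data stands in for the real file, and the variable names csv_text and prefixes are made up for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question, including the quoted row.
# In real use, pass the path of the CSV file instead of a StringIO object.
csv_text = '''OU
CORP:Jenny Smith:
"CORP:John Smith:,John Smith:"
STORE:Mary Poppins:
STORE:Tony Stark:
STORE:Carmen Sandiego:
NEWS:Peter Parker:
NEWS:Clark Kent:
'''

# Read the column with the default comma separator; the quotes keep the
# embedded comma inside a single field.
df = pd.read_csv(io.StringIO(csv_text), usecols=["OU"])

# Keep only the text before the first ":", trim stray whitespace,
# and drop repeated values while preserving first-seen order.
prefixes = (
    df["OU"]
    .str.split(":", n=1)
    .str[0]
    .str.strip()
    .drop_duplicates()
    .reset_index(drop=True)
)

print(prefixes.tolist())  # ['CORP', 'STORE', 'NEWS']
```

So nothing special is needed in read_csv itself with this approach; the parsing happens on the column after loading.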