Home > Enterprise >  cleaning numeric columns in pandas
cleaning numeric columns in pandas

Time:01-27

I have some difficulties to exploit csv scraping file in pandas. I have several columns, one of them contain prices as '1 800 €'

After to import csv as dataframe, I can not convert my columns in Integrer

I deleted euro symbol without problem

data['prix']= data['prix'].str.strip('€')

I tried to delete space with the same approach, but the space still remaied

data['prix']= data['prix'].str.strip()
or
data['prix']= data['prix'].str.strip(' ')
or
data['prix']= data['prix'].str.replace(' ', '')

I tried to force the conversion in Int

data['prix']= pd.to_numeric(data['prix'], errors='coerce')

My column was fill by Nan value

I tried to convert before operation of replace space in string

data = data.convert_dtypes(convert_string=True)

But same result : impossible to achieve my aim the spaces are always present and I can not convert in integer

I looked with Excel into dataset, I can not identify special problem in the data

I tried also to change encoding standard in read_csv ... ditto

In this same dataset I had the same problem for the kilometrage as 15 256 km And I had no problem to retreat and convert to int ...

I would like to test through REGEX to copy only numbers of the field et create new column with

How to proceed ? I am also interested by other ideas

Thank you

CodePudding user response:

Use str.findall:

I would like to test through REGEX to copy only numbers of the field et create new column with

data['prix2'] = data['prix'].str.findall(r'\d ').str.join('').astype(int)

# Or if it raises an exception

data['prix2'] = pd.to_numeric(data['prix'].str.findall('(\d )').str.join(''), errors='coerce')

CodePudding user response:

To delete the white space use this line:

data['prix']= data['prix'].str.replace(" ","")

and to convert the string into a int use this line:

data['prix'] = [int(i) for i in data['prix']]
  •  Tags:  
  • Related