pandas adding .0 when I import from CSV

Time:01-20

My problem is that when I import the data, pandas seems to try to convert it into a number.

This is more or less what my CSV file looks like:

Data,ID,Text

2018-06-11,20995, bla bla bla 

2018-06-11,17980, bla bla bla

2018-06-11,46854, bla bla bla

The trouble starts when I import it with pd.read_csv. The ID column should stay exactly as it is in the CSV file, but pandas returns something like:

Data,ID,Text

2018-06-11,20995.0, bla bla bla 

2018-06-11,17980.0, bla bla bla

2018-06-11,46854.0, bla bla bla

I've tried specifying the dtype when reading the file:

df = pd.read_csv('df.csv', encoding='latin1', dtype={'ID': str})

but it still adds the .0. When I look at the CSV, it does not have these trailing .0 suffixes.

I've also tried converting the column to string afterwards:

df['ID'] = df['ID'].astype(str) 

I want to be clear that I've already read this question, and the responses didn't answer my question.

CodePudding user response:

You can try formatting each value with zero decimal places and assigning the result back:

df['ID'] = pd.Series(["{0:.0f}".format(val) for val in df['ID']], index=df.index)
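For illustration, here is a minimal sketch of the formatting idea using a zero-decimal format (`.0f`), assuming the ID column has already been read as float64. The DataFrame contents are hypothetical, and the handling of missing values (leaving them as None) is an assumption rather than part of the answer above. It also shows why a plain astype(str) on a float column keeps the .0.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an ID column that pandas has already stored as float64
# because one value is missing.
df = pd.DataFrame({"ID": [20995.0, 17980.0, np.nan]})

# Converting a float column to str keeps the fractional part,
# so the ".0" survives this conversion.
as_str = df["ID"].astype(str)

# Format each value with zero decimals; missing values are left as None
# (an assumption made for this sketch) instead of being formatted.
clean = pd.Series(
    [None if pd.isna(val) else "{0:.0f}".format(val) for val in df["ID"]],
    index=df.index,
)

print(as_str.iloc[0])
print(clean.iloc[0])
```

This only works after the fact; if the IDs can be kept out of float64 at read time, no reformatting is needed.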

CodePudding user response:

You have hit the worst pandas wart of all time. But it's 2022, and missing values for integers are finally supported! Check this out. Here is a CSV file with an integer column a that has a missing value:

a,b
1,y
2,m
,c
3,a

If you read it with the default settings, you get the annoying conversion to float:

pd.read_csv('test.csv'):

    a       b
--------------
0   1.0     y
1   2.0     m
2   NaN     c
3   3.0     a

But if you tell pandas that you want the new experimental integers that support missing values, you get the good stuff: pd.read_csv('test.csv', dtype={'a': 'Int64'}):

    a   b
---------
0   1   y
1   2   m
2 <NA>  c
3   3   a
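The comparison above can be reproduced without a file on disk. This is a small sketch that assumes the same four-row CSV and feeds it to read_csv through io.StringIO; the variable names are made up for the example.

```python
import io

import pandas as pd

# Same content as test.csv above: column "a" has one missing value.
csv_text = "a,b\n1,y\n2,m\n,c\n3,a\n"

# Default read: the missing value forces column "a" to float64.
default_df = pd.read_csv(io.StringIO(csv_text))

# Nullable integer dtype: the values stay integers, the gap becomes <NA>.
nullable_df = pd.read_csv(io.StringIO(csv_text), dtype={"a": "Int64"})

print(default_df["a"].dtype)
print(nullable_df["a"].dtype)

# If the column is needed as text, converting from Int64 avoids the ".0".
id_text = nullable_df["a"].astype("string")
print(id_text.iloc[0])
```

Note the capital I in 'Int64': the lowercase 'int64' is the regular NumPy integer type, which cannot hold missing values.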