My problem is that when I import the data, pandas tries to convert one of the columns into a number.
This is roughly what my CSV file looks like:
Data,ID,Text
2018-06-11,20995, bla bla bla
2018-06-11,17980, bla bla bla
2018-06-11,46854, bla bla bla
The trouble starts when I import it with pd.read_csv. The ID column should stay exactly as it is in the CSV file, but pandas returns something like:
Data,ID,Text
2018-06-11,20995.0, bla bla bla
2018-06-11,17980.0, bla bla bla
2018-06-11,46854.0, bla bla bla
I've tried setting the dtype while reading:
df = pd.read_csv('df.csv', encoding='latin1', dtype={'ID': str})
but it still adds the .0. When I look at the CSV, it does not have these trailing .0 values.
I've also tried converting to string afterwards:
df['ID'] = df['ID'].astype(str)
I want to be clear that I've already read this question, and the responses didn't answer my question.
CodePudding user response:
You can try formatting the values with zero decimals, so that no .0 is printed:
pd.Series(["{0:.0f}".format(val) for val in df['ID']], index=df.index)
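As a minimal sketch of that idea, assuming the ID column has already been loaded as floats: a format spec with zero decimals (`.0f`) prints whole numbers without a trailing `.0`. Missing values need separate handling, because they would otherwise be rendered as the text 'nan'. The sample frame and the choice of an empty string for missing IDs are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame: an ID column that pandas already turned into floats.
df = pd.DataFrame({"ID": [20995.0, float("nan"), 46854.0]})

# Format each value with zero decimals; keep missing values as empty strings.
ids = pd.Series(
    ["{0:.0f}".format(val) if pd.notna(val) else "" for val in df["ID"]],
    index=df.index,
)

print(ids.tolist())  # ['20995', '', '46854']
```

This only repairs the display after the fact; it does not stop read_csv from converting the column in the first place.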
CodePudding user response:
You have hit the worst pandas wart of all time. But it's 2022, and missing values for integers are finally supported! Check this out. Here is a CSV file with an integer column a that has a missing value:
a,b
1,y
2,m
,c
3,a
If you read it with the default settings, you get the annoying conversion to float:
pd.read_csv('test.csv'):
     a  b
0  1.0  y
1  2.0  m
2  NaN  c
3  3.0  a
But if you tell pandas that you want the new experimental nullable integer dtype, which supports missing values, you get the good stuff:
pd.read_csv('test.csv', dtype={'a': 'Int64'}):
      a  b
0     1  y
1     2  m
2  <NA>  c
3     3  a
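Applied to a file shaped like the one in the question, the same approach can be sketched as follows. This assumes the ID column contains at least one empty cell, which is what triggers the float conversion. The inline CSV text is made up for illustration and is read through io.StringIO so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical data in the question's layout, with one missing ID.
csv_text = """Data,ID,Text
2018-06-11,20995,bla bla bla
2018-06-11,,bla bla bla
2018-06-11,46854,bla bla bla
"""

# Default read: the empty cell forces the whole column to float64.
default_df = pd.read_csv(io.StringIO(csv_text))

# Nullable integer dtype: whole numbers stay integers, the gap becomes <NA>.
nullable_df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": "Int64"})

# Reading as str keeps the text exactly as it appears in the file.
text_df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str})

print(default_df["ID"].dtype)   # float64
print(nullable_df["ID"].dtype)  # Int64
print(text_df["ID"].iloc[0])    # 20995
```

Use 'Int64' if the IDs should still behave like numbers. Use str if they are only labels and must match the file character for character.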
