Why does the dtype change to object when appending a row?

Time:02-05

I create a DataFrame, print its info, append a row, and print the info again. After the append, the dtype of every column has changed to object. Why?

import numpy as np
import pandas as pd

myData = np.array([134.29, 136.97, 250.31, 312.28])
mySeries = pd.Series(myData,index=['IBM','P&G','Microsoft','Home Depot'], name="Stock Price")
myData1 = np.array(['120.573B', '336.72B', '1.885T' , '335.974B'])
mySeries1 = pd.Series(myData1, index=['IBM','P&G','Microsoft','Home Depot'], name="Market Cap")
myData2 = np.array([120_573_000_000, 336_720_000_000, 1_885_000_000_000 , 335_974_000_000])
mySeries2 = pd.Series(myData2, index=['IBM','P&G','Microsoft','Home Depot'], name="Market Cap Raw")

myDataFrame = pd.concat([mySeries, mySeries1, mySeries2], axis=1)
#print(myDataFrame)
print(myDataFrame.info())

# After adding the row below, the dtype of the numeric columns changes to object

myData = np.array([20.99, '100M', 100000000 ])
mySeries = pd.Series(myData, index = myDataFrame.columns, name = 'HML')
# Note: DataFrame.append was removed in pandas 2.0, so this line needs an older pandas
myDataFrame = myDataFrame.append(mySeries, ignore_index=False)
#print(myDataFrame)
print(myDataFrame.info())


<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, IBM to Home Depot
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Stock Price     4 non-null      float64
 1   Market Cap      4 non-null      object 
 2   Market Cap Raw  4 non-null      int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 128.0  bytes
None
<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, IBM to HML
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Stock Price     5 non-null      object
 1   Market Cap      5 non-null      object
 2   Market Cap Raw  5 non-null      object
dtypes: object(3)
memory usage: 160.0  bytes
None

CodePudding user response:

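When you create a Series from values of different, incompatible types, the dtype of that Series becomes object. Here the conversion starts even earlier: a NumPy array holds a single dtype, so np.array turns the float and the int into strings to match '100M', and pandas then stores those strings with dtype object.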

When you create myData and mySeries the second time, that's exactly what's happening:

>>> myData = np.array([20.99, '100M', 100000000 ])
>>> mySeries = pd.Series(myData, index = myDataFrame.columns, name = 'HML')
>>> mySeries.dtype
dtype('O')
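
As a quick check of what the array holds before pandas is involved, the sketch below inspects the NumPy array on its own. The name my_data is mine, not from the question; the values are the ones used for the new row.

```python
import numpy as np

# Same values as the new row in the question.
my_data = np.array([20.99, '100M', 100000000])

# A NumPy array has one dtype for all elements, so the float and the
# int are converted to strings to match '100M'.
print(my_data.dtype.kind)                   # 'U' means Unicode string
print([type(v) for v in my_data.tolist()])  # every element is a str
```

So by the time the Series is built, 20.99 and 100000000 are already text rather than numbers.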

Right after that, you append that Series (of dtype object) to the DataFrame. Each column receives one element of it, and since object is more general than float64 or int64, the numeric columns are converted to object so they can hold the new value.
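
The same upcast can be reproduced on a single column. This is a minimal sketch that uses pd.concat instead of append so it also runs on current pandas; the names prices and new_value are illustrative, and the new value is given dtype object explicitly to mirror the Series above.

```python
import pandas as pd

# A float64 column, like 'Stock Price' before the append.
prices = pd.Series([134.29, 136.97], index=['IBM', 'P&G'], name='Stock Price')

# The incoming value arrives as a string inside an object Series.
new_value = pd.Series(['20.99'], index=['HML'], dtype=object, name='Stock Price')

combined = pd.concat([prices, new_value])
print(prices.dtype)    # float64
print(combined.dtype)  # object
```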

CodePudding user response:

I figured out how to fix it:

tmpSeries = pd.to_numeric(myDataFrame['Stock Price'])
myDataFrame['Stock Price'] = tmpSeries

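This changes the column from object back to float64. to_numeric picks the numeric dtype from the values, so the same call can also be used on 'Market Cap Raw'. It will not work on 'Market Cap', because strings such as '120.573B' cannot be parsed as numbers.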
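
A sketch of both options follows: repairing the columns afterwards with to_numeric, and avoiding the problem by building the new row as a one-row DataFrame so each value keeps its own type. The frames here are small stand-ins I made up for the example (two stocks instead of four), and pd.concat replaces append so the code runs on pandas 2.x as well.

```python
import pandas as pd

# Stand-in for the frame after the append: every column is object.
broken = pd.DataFrame(
    {
        'Stock Price': [134.29, 136.97, '20.99'],
        'Market Cap': ['120.573B', '336.72B', '100M'],
        'Market Cap Raw': [120_573_000_000, 336_720_000_000, '100000000'],
    },
    index=['IBM', 'P&G', 'HML'],
    dtype=object,
)

# Option 1: convert the columns that should be numeric back again.
for column in ['Stock Price', 'Market Cap Raw']:
    broken[column] = pd.to_numeric(broken[column])
print(broken.dtypes)

# Option 2: never lose the dtypes. Build the row as a DataFrame,
# one correctly typed value per column, and concatenate.
frame = pd.DataFrame(
    {
        'Stock Price': [134.29, 136.97],
        'Market Cap': ['120.573B', '336.72B'],
        'Market Cap Raw': [120_573_000_000, 336_720_000_000],
    },
    index=['IBM', 'P&G'],
)
new_row = pd.DataFrame(
    {'Stock Price': [20.99], 'Market Cap': ['100M'], 'Market Cap Raw': [100_000_000]},
    index=['HML'],
)
result = pd.concat([frame, new_row])
print(result.dtypes)
```

With option 2 no conversion step is needed, because the float stays a float and the int stays an int all the way through.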
