Home > Blockchain >  Cannot convert multiple Pandas columns from Object to Datetime type in the existing Dataframe (inpla
Cannot convert multiple Pandas columns from Object to Datetime type in the existing Dataframe (inpla

Time:01-26

I have a Pandas dataframe with over ten columns, and most of these columns are Pandas 'object' type.

I need to convert three of these columns from 'object' to 'datetime' type and write back to the existing dataframe columns (to perform 'inplace update').

I am using the following code as the starting point for just one column and plan to expand on it for the other two columns, but it is not working.

The 'bigframe' dataframe has many duplicate rows. I am dropping duplicate rows based on 'incidentId' and assigning unique rows to 'alerts' below.

alerts = bigframe.drop_duplicates(subset=['incidentId'])
alerts['firstEventTime'] = pd.to_datetime(alerts['firstEventTime'])

The content of alerts['firstEventTime'] is similar to '2021-01-27 22:34:05.991031 00:00'

I am getting the following warning, but the code's execution completes without an error.

"A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead"

When I type 'alerts.dtypes', I still see this column is listed as 'object' type. So, I am assuming that the operations is not written back to the original dataframe for some reason.

I searched the internet, and the code listed above was supposed to work, but it doesn't. So I must be doing something wrong, but I cannot find it. Would you please let me know what I am doing wrong?

Thank you!

CodePudding user response:

there's two unrelated issues here;

#1 DataFrame.drop_duplicates returns a slice of the existing df, not a new df. You can avoid the warning by copying the result to a new df explicitly.

#2 pandas datetime dtype cannot handle mixed UTC offsets within one Series. It falls back to using Python's datetime in such cases, i.e. the elements are of type datetime.datetime which is displayed as 'object'. You can handle this by converting to UTC with keyword utc=True.

EX:

bigframe = pd.DataFrame({'incidentId': [1,1,2],
                         'firstEventTime': ['2021-01-27 22:34:05.991031 00:00',
                                            '2021-01-27 22:34:05.991031 00:00', 
                                            '2022-01-18 20:34:52']})

# avoid the warning by explicitly calling .copy():
alerts = bigframe.drop_duplicates(subset=['incidentId']).copy()

# now convert to datetime in UTC, to get a datetime64[ns] dtyped Series:
alerts['firstEventTime'] = pd.to_datetime(alerts['firstEventTime'], utc=True)

alerts['firstEventTime']
# 0   2021-01-27 22:34:05.991031 00:00
# 2          2022-01-18 20:34:52 00:00
# Name: firstEventTime, dtype: datetime64[ns, UTC]

Note that pandas will assume the input is UTC if no UTC offset is specified.

  •  Tags:  
  • Related