Home > Blockchain >  Why is Pandas to_datetime() function returning (DD-MM-YYYY) datetime in two different formats in the
Why is Pandas to_datetime() function returning (DD-MM-YYYY) datetime in two different formats in the

Time:01-14

I am a beginner to quantitative analysis of stocks with time series analysis and intend to convert the data column in a Pandas series to a datetime format. Earlier, the index of the series was

infy.index

Output:

Index(['03-01-2000', '04-01-2000', '05-01-2000', '06-01-2000', '07-01-2000',
       '10-01-2000', '11-01-2000', '12-01-2000', '13-01-2000', '14-01-2000',
       ...
       '16-04-2021', '19-04-2021', '20-04-2021', '22-04-2021', '23-04-2021',
       '26-04-2021', '27-04-2021', '28-04-2021', '29-04-2021', '30-04-2021'],
      dtype='object', name='Date', length=5306)

Currently, the format is DD-MM-YYYY. So now, I applied the following code to change the format

infy = pd.to_datetime(infy.index)
infy.head()

Output:

DatetimeIndex(['2000-03-01', '2000-04-01', '2000-05-01', '2000-06-01',
               '2000-07-01', '2000-10-01', '2000-11-01', '2000-12-01',
               '2000-01-13', '2000-01-14',
               ...
               '2021-04-16', '2021-04-19', '2021-04-20', '2021-04-22',
               '2021-04-23', '2021-04-26', '2021-04-27', '2021-04-28',
               '2021-04-29', '2021-04-30'],
              dtype='datetime64[ns]', name='Date', length=5306, freq=None)

So, the problem is that the first few dates are in the format YYYY-DD-MM, but if you look in the end, the format is now YYYY-MM-DD. I thought I might've made some error in the beginning while converting the DataFrame to a Series but upon multiple iterations of rechecking, I couldn't find any problem.

Why is this happening?

CodePudding user response:

The datetime default format in pandas is YYYY-MM-DD. So it always shows datetime objects with that format. If you want to show the dates in another format you can use:

date_as_strings=pd.to_datetime(infy.index).dt.strftime("%y-%d-%m") 

Note: strftime changes the type to string.

CodePudding user response:

pd.to_datetime, without any additional arguments, can flexibly parse a single column with multiple formats for a given date. While this can be extremely powerful it is also very problematic.

The main issue here is that the default for the dayfirst argument of pd.to_datetime is False.

This means that for your first few dates, which it can properly parse because the first part is <= 12, pandas will parse this as MM-DD-YYYY (dayfirst is False so the Month comes first). Later when it encounters a date where the first part is >12 it's smart enough to know there are only 12 months in a year so it assumes the format is DD-MM-YYYY and it parses those dates as such.

This is clearly not your desired behavior so you should either specify dayfirst=True or pass the specific format that works for all of your dates.

pd.to_datetime(infy.index, format='%d-%m-%Y')
# or
pd.to_datetime(infy.index, dayfirst=True)

DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06',
               '2000-01-07', '2000-01-10', '2000-01-11', '2000-01-12',
               '2000-01-13', '2000-01-14', 
               ...
               '2021-04-16', '2021-04-19',
               '2021-04-20', '2021-04-22', '2021-04-23', '2021-04-26',
               '2021-04-27', '2021-04-28', '2021-04-29', '2021-04-30'],
              dtype='datetime64[ns]', freq=None)
  •  Tags:  
  • Related