I am new to pandas and am trying to convert an int column to a date column.
The int in the df is something like: 10712 (first day, then month, then year).
I tried solving this with:
df_date = pd.to_datetime(df['Date'], format='%d%m%Y')
but I always get the following ValueError:
time data '10712' does not match format '%d%m%Y' (match)
Thank you for your help :)
CodePudding user response:
You should use %y (2-digit year) instead of %Y (4-digit year). But that is not enough.
The format %d%m%y converts 10712 to 10-07-2012, not to 1-07-2012 as you expect.
That's because of the following feature of the underlying strptime:
When used with the strptime() method, the leading zero is optional for %m
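The effect can be checked with the standard library alone. This is a small sketch, assuming the values are meant as day, month, two-digit year; the variable names are only for illustration:

```python
from datetime import datetime

fmt = "%d%m%y"

# Unpadded: %d takes "10", %m takes the single digit "7", %y takes "12".
ambiguous = datetime.strptime("10712", fmt)

# Zero-padded to six digits: %d takes "01", %m takes "07", %y takes "12".
padded = datetime.strptime("010712", fmt)

print(ambiguous.date())  # 2012-07-10
print(padded.date())     # 2012-07-01
```

So the parse succeeds either way, but only the padded string gives the intended 1 July 2012.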
A workaround could be to convert to a format properly understandable by strptime (and to_datetime):
>>> import pandas as pd
>>> df = pd.DataFrame({'date': [10712, 20813, 30914]})
>>> df
    date
0  10712
1  20813
2  30914
>>> df1 = df.date.astype(str).str.replace(r'(\d+)(\d\d)(\d\d)',
...                                       r'\2/\1/\3', regex=True)
>>> df1
0 07/1/12
1 08/2/13
2 09/3/14
>>> pd.to_datetime(df1)
0 2012-07-01
1 2013-08-02
2 2014-09-03
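Another option that may be simpler than the regex is to zero-pad the strings to six characters and then parse them with an explicit format. This is only a sketch: it assumes every value is day, month, two-digit year, and that the only thing lost in the int is a leading zero of the day. The value 151112 is a made-up extra row to show a two-digit day.

```python
import pandas as pd

# Sample values from above, plus one hypothetical value with a two-digit day.
df = pd.DataFrame({"date": [10712, 20813, 30914, 151112]})

# Left-pad to six characters so every value reads as ddmmyy.
padded = df["date"].astype(str).str.zfill(6)

# With fixed-width input, %d%m%y is no longer ambiguous.
parsed = pd.to_datetime(padded, format="%d%m%y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
# ['2012-07-01', '2013-08-02', '2014-09-03', '2012-11-15']
```

Passing the format explicitly also avoids relying on to_datetime guessing month-first for strings like 07/1/12.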
CodePudding user response:
Use the %y specifier to parse a two-digit year (without the century):
In [654]: pd.to_datetime(10712, format='%d%m%y')
Out[654]: Timestamp('2012-07-10 00:00:00')
CodePudding user response:
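For a string like '13000101', pandas.to_datetime only succeeds with '%Y%m%d'. The string does not match '%d%m%Y' or '%m%d%Y', so with errors='ignore' the input is returned unchanged.
Example: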
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
datetime.datetime(1300, 1, 1, 0, 0)
>>> pd.to_datetime('13000101', format='%d%m%Y', errors='ignore')
'13000101'
>>> pd.to_datetime('13000101', format='%m%d%Y', errors='ignore')
'13000101'
