Home > Software engineering >  pandas to_datetime converting 71 to 2071 instead of 1971
pandas to_datetime converting 71 to 2071 instead of 1971

Time:01-06

I have this dataframe which is a time series dataframe:

           day month   time  year
index                             
0       03    25    03/25/93    93
1        6    18     6/18/85    85
2        7     8      7/8/71    71
3        9    27     9/27/75    75
4        2     6      2/6/96    96
5        7    06     7/06/79    79
6        5    18     5/18/78    78
7       10    24    10/24/89    89
8        3     7      3/7/86    86
9        4    10     4/10/71    71
10       5    11     5/11/85    85
11       4    09     4/09/75    75
12       8    01     8/01/98    98
13       1    26     1/26/72    72
14       5    24   5/24/1990  1990
15       1    25   1/25/2011  2011
16       4    12     4/12/82    82
17      10    13  10/13/1976  1976
18       4    24     4/24/98    98
19       5    21     5/21/77    77
20       7    21     7/21/98    98
21      10    21    10/21/79    79
22       3    03     3/03/90    90
23       2    11     2/11/76    76
24      07    25  07/25/1984  1984
25       4    13     4-13-82    82
26       9    22     9/22/89    89
27       9    02     9/02/76    76
28       9    12     9/12/71    71
29      10    24    10/24/86    86
...    ...   ...         ...   ...
470    NaN   NaN        1983  1983
471    NaN   NaN        1999  1999
472    NaN   NaN        2010  2010
473    NaN   NaN        1975  1975
474    NaN   NaN        1972  1972
475    NaN   NaN        2015  2015
476    NaN   NaN        1989  1989
477    NaN   NaN        1994  1994
478    NaN   NaN        1993  1993
479    NaN   NaN        1996  1996
480    NaN   NaN        2013  2013
481    NaN   NaN        1974  1974
482    NaN   NaN        1990  1990
483    NaN   NaN        1995  1995
484    NaN   NaN        2004  2004
485    NaN   NaN        1987  1987
486    NaN   NaN        1973  1973
487    NaN   NaN        1992  1992
488    NaN   NaN        1977  1977
489    NaN   NaN        1985  1985
490    NaN   NaN        2007  2007
491    NaN   NaN        2009  2009
492    NaN   NaN        1986  1986
493    NaN   NaN        1978  1978
494    NaN   NaN        2002  2002
495    NaN   NaN        1979  1979
496    NaN   NaN        2006  2006
497    NaN   NaN        2008  2008
498    NaN   NaN        2005  2005
499    NaN   NaN        1980  1980

When i convert it with to_datetime, i get 2071 when the year is 71 instead of 1971. How should i do to get 1971?

df['Date'] = pd.to_datetime(df['time'])#get error with 1971

Cf the dataframe below:

           day month        time  year       Date
index                                        
0       03    25    03/25/93    93 1993-03-25
1        6    18     6/18/85    85 1985-06-18
2        7     8      7/8/71    71 2071-07-08
3        9    27     9/27/75    75 1975-09-27
4        2     6      2/6/96    96 1996-02-06
5        7    06     7/06/79    79 1979-07-06
6        5    18     5/18/78    78 1978-05-18
7       10    24    10/24/89    89 1989-10-24
8        3     7      3/7/86    86 1986-03-07
9        4    10     4/10/71    71 2071-04-10
10       5    11     5/11/85    85 1985-05-11
11       4    09     4/09/75    75 1975-04-09
12       8    01     8/01/98    98 1998-08-01
13       1    26     1/26/72    72 1972-01-26
14       5    24   5/24/1990  1990 1990-05-24
15       1    25   1/25/2011  2011 2011-01-25
16       4    12     4/12/82    82 1982-04-12
17      10    13  10/13/1976  1976 1976-10-13
18       4    24     4/24/98    98 1998-04-24
19       5    21     5/21/77    77 1977-05-21
20       7    21     7/21/98    98 1998-07-21
21      10    21    10/21/79    79 1979-10-21
22       3    03     3/03/90    90 1990-03-03
23       2    11     2/11/76    76 1976-02-11
24      07    25  07/25/1984  1984 1984-07-25
25       4    13     4-13-82    82 1982-04-13
26       9    22     9/22/89    89 1989-09-22
27       9    02     9/02/76    76 1976-09-02
28       9    12     9/12/71    71 2071-09-12
29      10    24    10/24/86    86 1986-10-24
...    ...   ...         ...   ...        ...
470    NaN   NaN        1983  1983 1983-01-01
471    NaN   NaN        1999  1999 1999-01-01
472    NaN   NaN        2010  2010 2010-01-01
473    NaN   NaN        1975  1975 1975-01-01
474    NaN   NaN        1972  1972 1972-01-01
475    NaN   NaN        2015  2015 2015-01-01
476    NaN   NaN        1989  1989 1989-01-01
477    NaN   NaN        1994  1994 1994-01-01
478    NaN   NaN        1993  1993 1993-01-01
479    NaN   NaN        1996  1996 1996-01-01
480    NaN   NaN        2013  2013 2013-01-01
481    NaN   NaN        1974  1974 1974-01-01
482    NaN   NaN        1990  1990 1990-01-01
483    NaN   NaN        1995  1995 1995-01-01
484    NaN   NaN        2004  2004 2004-01-01
485    NaN   NaN        1987  1987 1987-01-01
486    NaN   NaN        1973  1973 1973-01-01
487    NaN   NaN        1992  1992 1992-01-01
488    NaN   NaN        1977  1977 1977-01-01
489    NaN   NaN        1985  1985 1985-01-01
490    NaN   NaN        2007  2007 2007-01-01
491    NaN   NaN        2009  2009 2009-01-01
492    NaN   NaN        1986  1986 1986-01-01
493    NaN   NaN        1978  1978 1978-01-01
494    NaN   NaN        2002  2002 2002-01-01
495    NaN   NaN        1979  1979 1979-01-01
496    NaN   NaN        2006  2006 2006-01-01
497    NaN   NaN        2008  2008 2008-01-01
498    NaN   NaN        2005  2005 2005-01-01
499    NaN   NaN        1980  1980 1980-01-01

As you can see the year 1971 is 2071. I tried to look into the docs but I didnt find the argument or option to specify 1900s instead of 2000s

CodePudding user response:

The year column is very ambiguous since a century isn't declared Python's behavior will interpret the dates as such. You can read the reasoning here.

There is a partial solution found here. You would basically offset the years by 100 (a century) to fix this issue. This will be a janky fix. You would want to implement this after getting your second dataframe.

import pandas as pd
import numpy as np

df['Date'] = np.where(df['Date'].dt.year > 2022, df['Date'] - pd.offsets.DateOffset(years=100), df['Date'])
# Anything after 2022 is changed to have 100 years subtracted because 2022 is the current year, change it as the years progress
  •  Tags:  
  • Related