I have this dataframe which is a time series dataframe:
day month time year
index
0 03 25 03/25/93 93
1 6 18 6/18/85 85
2 7 8 7/8/71 71
3 9 27 9/27/75 75
4 2 6 2/6/96 96
5 7 06 7/06/79 79
6 5 18 5/18/78 78
7 10 24 10/24/89 89
8 3 7 3/7/86 86
9 4 10 4/10/71 71
10 5 11 5/11/85 85
11 4 09 4/09/75 75
12 8 01 8/01/98 98
13 1 26 1/26/72 72
14 5 24 5/24/1990 1990
15 1 25 1/25/2011 2011
16 4 12 4/12/82 82
17 10 13 10/13/1976 1976
18 4 24 4/24/98 98
19 5 21 5/21/77 77
20 7 21 7/21/98 98
21 10 21 10/21/79 79
22 3 03 3/03/90 90
23 2 11 2/11/76 76
24 07 25 07/25/1984 1984
25 4 13 4-13-82 82
26 9 22 9/22/89 89
27 9 02 9/02/76 76
28 9 12 9/12/71 71
29 10 24 10/24/86 86
... ... ... ... ...
470 NaN NaN 1983 1983
471 NaN NaN 1999 1999
472 NaN NaN 2010 2010
473 NaN NaN 1975 1975
474 NaN NaN 1972 1972
475 NaN NaN 2015 2015
476 NaN NaN 1989 1989
477 NaN NaN 1994 1994
478 NaN NaN 1993 1993
479 NaN NaN 1996 1996
480 NaN NaN 2013 2013
481 NaN NaN 1974 1974
482 NaN NaN 1990 1990
483 NaN NaN 1995 1995
484 NaN NaN 2004 2004
485 NaN NaN 1987 1987
486 NaN NaN 1973 1973
487 NaN NaN 1992 1992
488 NaN NaN 1977 1977
489 NaN NaN 1985 1985
490 NaN NaN 2007 2007
491 NaN NaN 2009 2009
492 NaN NaN 1986 1986
493 NaN NaN 1978 1978
494 NaN NaN 2002 2002
495 NaN NaN 1979 1979
496 NaN NaN 2006 2006
497 NaN NaN 2008 2008
498 NaN NaN 2005 2005
499 NaN NaN 1980 1980
When i convert it with to_datetime, i get 2071 when the year is 71 instead of 1971. How should i do to get 1971?
df['Date'] = pd.to_datetime(df['time'])#get error with 1971
Cf the dataframe below:
day month time year Date
index
0 03 25 03/25/93 93 1993-03-25
1 6 18 6/18/85 85 1985-06-18
2 7 8 7/8/71 71 2071-07-08
3 9 27 9/27/75 75 1975-09-27
4 2 6 2/6/96 96 1996-02-06
5 7 06 7/06/79 79 1979-07-06
6 5 18 5/18/78 78 1978-05-18
7 10 24 10/24/89 89 1989-10-24
8 3 7 3/7/86 86 1986-03-07
9 4 10 4/10/71 71 2071-04-10
10 5 11 5/11/85 85 1985-05-11
11 4 09 4/09/75 75 1975-04-09
12 8 01 8/01/98 98 1998-08-01
13 1 26 1/26/72 72 1972-01-26
14 5 24 5/24/1990 1990 1990-05-24
15 1 25 1/25/2011 2011 2011-01-25
16 4 12 4/12/82 82 1982-04-12
17 10 13 10/13/1976 1976 1976-10-13
18 4 24 4/24/98 98 1998-04-24
19 5 21 5/21/77 77 1977-05-21
20 7 21 7/21/98 98 1998-07-21
21 10 21 10/21/79 79 1979-10-21
22 3 03 3/03/90 90 1990-03-03
23 2 11 2/11/76 76 1976-02-11
24 07 25 07/25/1984 1984 1984-07-25
25 4 13 4-13-82 82 1982-04-13
26 9 22 9/22/89 89 1989-09-22
27 9 02 9/02/76 76 1976-09-02
28 9 12 9/12/71 71 2071-09-12
29 10 24 10/24/86 86 1986-10-24
... ... ... ... ... ...
470 NaN NaN 1983 1983 1983-01-01
471 NaN NaN 1999 1999 1999-01-01
472 NaN NaN 2010 2010 2010-01-01
473 NaN NaN 1975 1975 1975-01-01
474 NaN NaN 1972 1972 1972-01-01
475 NaN NaN 2015 2015 2015-01-01
476 NaN NaN 1989 1989 1989-01-01
477 NaN NaN 1994 1994 1994-01-01
478 NaN NaN 1993 1993 1993-01-01
479 NaN NaN 1996 1996 1996-01-01
480 NaN NaN 2013 2013 2013-01-01
481 NaN NaN 1974 1974 1974-01-01
482 NaN NaN 1990 1990 1990-01-01
483 NaN NaN 1995 1995 1995-01-01
484 NaN NaN 2004 2004 2004-01-01
485 NaN NaN 1987 1987 1987-01-01
486 NaN NaN 1973 1973 1973-01-01
487 NaN NaN 1992 1992 1992-01-01
488 NaN NaN 1977 1977 1977-01-01
489 NaN NaN 1985 1985 1985-01-01
490 NaN NaN 2007 2007 2007-01-01
491 NaN NaN 2009 2009 2009-01-01
492 NaN NaN 1986 1986 1986-01-01
493 NaN NaN 1978 1978 1978-01-01
494 NaN NaN 2002 2002 2002-01-01
495 NaN NaN 1979 1979 1979-01-01
496 NaN NaN 2006 2006 2006-01-01
497 NaN NaN 2008 2008 2008-01-01
498 NaN NaN 2005 2005 2005-01-01
499 NaN NaN 1980 1980 1980-01-01
As you can see the year 1971 is 2071. I tried to look into the docs but I didnt find the argument or option to specify 1900s instead of 2000s
CodePudding user response:
The year column is very ambiguous since a century isn't declared Python's behavior will interpret the dates as such. You can read the reasoning here.
There is a partial solution found here. You would basically offset the years by 100 (a century) to fix this issue. This will be a janky fix. You would want to implement this after getting your second dataframe.
import pandas as pd
import numpy as np
df['Date'] = np.where(df['Date'].dt.year > 2022, df['Date'] - pd.offsets.DateOffset(years=100), df['Date'])
# Anything after 2022 is changed to have 100 years subtracted because 2022 is the current year, change it as the years progress
