I have a column of my dataframe that is made up of the following:
df['Year'] = [2025, 2024, NaN, 2023, 2026, NaN] (these are of type float64)
How can I convert these years to datetime format? Since there are no months or days included, I assume the output has to default to [01-01-2025, 01-01-2024, NaT, 01-01-2023, 01-01-2026, NaT].
But if there was a way to still have the column as [2025, 2024, NaT, 2023, 2026, NaT] then that would work well too.
Using df['Year'] = pd.DatetimeIndex(df['Year']).year just output [1970, 1970, NaN, 1970, 1970, NaN].
Thank you very much.
CodePudding user response:
You can use pandas' to_datetime() and set errors='coerce' to take care of the NaNs (-> NaT)
df['Year'] = pd.to_datetime(df['Year'], format='%Y', errors='coerce')
The output will be 1 January of each year: 2025-01-01, 2024-01-01, NaT, ...
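One caveat with float64 input: depending on the pandas version, the values may be stringified as '2025.0', which does not match '%Y' and would then be coerced to NaT. A minimal sketch that sidesteps this by casting to the nullable Int64 dtype first (the column names Date and YearInt are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Year": [2025, 2024, np.nan, 2023, 2026, np.nan]})

# float64 -> nullable integer: 2025.0 becomes 2025, NaN becomes <NA>.
years = df["Year"].astype("Int64")

# Stringify so that '%Y' sees '2025' rather than '2025.0'; <NA> becomes NaT.
df["Date"] = pd.to_datetime(years.astype("string"), format="%Y", errors="coerce")

# Alternative from the question: keep plain years, with missing values as <NA>.
df["YearInt"] = years

print(df)
```

As for the 1970 values in the question: pd.DatetimeIndex treats bare numbers as offsets from the Unix epoch (nanoseconds by default), so 2025 ends up a few microseconds after 1970-01-01.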
CodePudding user response:
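Probably not the most elegant solution, but if you convert the column to strings and fill the missing values with a dummy year (say 1900), you can use parser from dateutil. Casting to Int64 first keeps the float years from being rendered as '2025.0'. Note that the dummy rows get the current month and day, because parser.parse fills any missing fields from today's date.
from dateutil import parser
('01/01/' + df['Year'].astype('Int64').astype('string')).fillna('1900').apply(parser.parse)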
Out[67]:
0   2025-01-01
1   2024-01-01
2   1900-07-21
3   2023-01-01
4   2026-01-01
5   1900-07-21
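The 1900-07-21 rows above come from dateutil filling the missing month and day from the current date. A possible variation, sketched under the assumption that missing years should stay missing rather than become a dummy date, passes an explicit default to parser.parse. The helper parse_year and the constant JAN_FIRST are hypothetical names, not part of either library:

```python
from datetime import datetime

import numpy as np
import pandas as pd
from dateutil import parser

df = pd.DataFrame({"Year": [2025, 2024, np.nan, 2023, 2026, np.nan]})

# Fields missing from the parsed string are taken from `default`,
# so a bare year resolves to 1 January instead of today's month and day.
JAN_FIRST = datetime(1900, 1, 1)


def parse_year(value):
    """Hypothetical helper: float year -> datetime on 1 January, NaN -> NaT."""
    if pd.isna(value):
        return pd.NaT
    return parser.parse(str(int(value)), default=JAN_FIRST)


parsed = df["Year"].apply(parse_year)
print(parsed)
```

This removes the need for both the '01/01/' prefix and the dummy year, at the cost of a Python-level loop over the column.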
