I have a time series dataframe whose headers are "Date" & "Value"
Sample Dataframe:
Date Value
2019-10-01 46486868.0
2019-11-01 36092742.0
2019-12-01 32839185.0
Unfortunately, the Date column has some missing months. For example, the dataset is supposed to be a monthly dataset for the years 2001-2010 but it is missing a few random months in each year.
Since I am making a time series model, I cant have missing dates so I have been trying to figure out a way to include the same.
I tried creating a pandas object as follows:
df_date = pd.date_range(start='1/1/2001', end='11/1/2010', freq='MS')
type(df_date)
pandas.core.indexes.datetimes.DatetimeIndex
But this is not getting merged with pandas dataframe. It gives the following error:
TypeError: Can only merge Series or DataFrame objects, a <class 'pandas.core.indexes.datetimes.DatetimeIndex'> was passed
The task at hand is to have all dates for each month in the time span by either matching it with another dataframe with all dates and ffil the values that are missing or somehow including the missing values in the existing dataframe.
Is there any way to solve this?
CodePudding user response:
pd.date_range returns an index object. You can create a dateframe with it like this:
df_date = pd.DataFrame({'Date':pd.date_range(start='1/1/2019', end='1/1/2020', freq='MS') })
and then you can merge
df_date.merge(df, on='Date', how='left')
Output:
Date Value
0 2019-01-01 NaN
1 2019-02-01 NaN
2 2019-03-01 NaN
3 2019-04-01 NaN
4 2019-05-01 NaN
5 2019-06-01 NaN
6 2019-07-01 NaN
7 2019-08-01 NaN
8 2019-09-01 NaN
9 2019-10-01 46486868.0
10 2019-11-01 36092742.0
11 2019-12-01 32839185.0
12 2020-01-01 NaN
CodePudding user response:
Set Date as index and reindex it with df_date:
df_date = pd.date_range(start='1/1/2019', end='11/1/2020', freq='MS')
df = df.set_index('Date').reindex(df_date)
Output:
>>> df
Value
2019-01-01 NaN
2019-02-01 NaN
2019-03-01 NaN
2019-04-01 NaN
2019-05-01 NaN
2019-06-01 NaN
2019-07-01 NaN
2019-08-01 NaN
2019-09-01 NaN
2019-10-01 46486868.0
2019-11-01 36092742.0
2019-12-01 32839185.0
2020-01-01 NaN
2020-02-01 NaN
2020-03-01 NaN
2020-04-01 NaN
2020-05-01 NaN
2020-06-01 NaN
2020-07-01 NaN
2020-08-01 NaN
2020-09-01 NaN
2020-10-01 NaN
2020-11-01 NaN
