I have a df
df = pd.DataFrame(index = ['A','B','C','D','E'],
columns = ['date_1','date_2','value_2','value_3','value_4'],
data = [['2021-06-28', '2022-05-03', 30, 40, 60],
['2022-01-10', '2022-05-15', 50, 90, 70],
[np.nan, '2022-05-15', 40, 60, 80],
[np.nan, '2022-04-28', 40, 60, 90],
[np.nan, '2022-06-28', 50, 60, 54]])
date_1 date_2 value_2 value_3 value_4
A 2021-06-28 2022-05-03 30 40 60
B 2022-01-03 2022-05-15 50 90 70
C NaN 2022-05-15 40 60 80
D NaN 2022-04-28 40 60 90
...
E NaN 2022-06-28 50 60 54
I am trying to fill NaN values in column date_1. The values I need to fill the date_1 column are changing every week, the min value of date_1 value need to be2021-06-28 and the max value is 2022-06-20. Each week the max value in date_1 column will be the last Monday. I need column date_1 to each date up to 2022-06-20 at least once so that each date starting from 2021-06-28 to 2022-06-20 will be in date_1 at least once. The order of these values does not matter.
I tried:
from datetime import date, timedelta
today = date.today()
last_monday = pd.to_datetime((today - timedelta(days=today.weekday()) - timedelta(days=7)).strftime('%Y-%m-%d'))
# date_mappings is a dictionary with this kind of structure:
# {1 : '2021-06-28', 2 : '2021-07-05', ... 52 : '2022-06-20'}
dates_needed = [x for x in pd.to_datetime(list(date_mappings.values())) if x >= last_monday]
So now dates_needed has the remaining of the dates that needs to be added at least once in date_1 column.
The problem I am facing is that the shapes do not match when I try to fill the values, because there can be multiple rows with the same date_2.
If I try to use:
df.loc[df['date_1'].isna(), 'date_1'] = dates_needed
I get:
ValueError: Must have equal len keys and value when setting with an iterable
Because this only works if I match the shape:
df.loc[df['date_1'].isna(), 'date_1'] = [pd.to_datetime('2022-01-10 00:00:00'),
pd.to_datetime('2022-01-17 00:00:00'),
pd.to_datetime('2022-01-24 00:00:00')]
date_1 date_2 value_2 value_3 value_4
A 2021-06-28 2022-05-03 30 40 60
B 2022-01-10 2022-05-15 50 90 70
C 2022-01-10 2022-05-15 40 60 80
D 2022-01-17 2022-04-28 40 60 90
E 2022-01-24 2022-06-28 50 60 54
So my goal is to fill NaN values in date_1 from a created list dates_needed where the each date from dates_needed is used at least once in date_1 column and the order does not matter.
CodePudding user response:
Here is solution for mapping by integers from date_mappings by helper Index by number of missing values by sum. Solution working if difference between length of dict vs number of missing values:
m = df['date_1'].isna()
df.loc[m, 'date_1'] = (pd.Index(range(m.sum())) 1).map(date_mappings)
