I need to merge two dataframes with different frequencies (daily and weekly). Each weekly value should be carried onto the daily rows until the next weekly date.
The data also has a grouping variable, group, so the weekly values must be matched within each group.
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
daily={'date':[datetime.date(2022,1,1)+relativedelta(day=i) for i in range(1,10)]*2,
       'group':['A' for x in range(1,10)]+['B' for x in range(1,10)],
       'daily_value':[x for x in range(1,10)]*2}
weekly={'date':[datetime.date(2022,1,1),datetime.date(2022,1,7)]*2,
        'group':['A','A']+['B','B'],
        'weekly_value':[100,200,300,400]}
daily_data=pd.DataFrame(daily)
weekly_data=pd.DataFrame(weekly)
daily_data output:
date group daily_value
0 2022-01-01 A 1
1 2022-01-02 A 2
2 2022-01-03 A 3
3 2022-01-04 A 4
4 2022-01-05 A 5
5 2022-01-06 A 6
6 2022-01-07 A 7
7 2022-01-08 A 8
8 2022-01-09 A 9
9 2022-01-01 B 1
10 2022-01-02 B 2
11 2022-01-03 B 3
12 2022-01-04 B 4
13 2022-01-05 B 5
14 2022-01-06 B 6
15 2022-01-07 B 7
16 2022-01-08 B 8
17 2022-01-09 B 9
weekly_data output:
date group weekly_value
0 2022-01-01 A 100
1 2022-01-07 A 200
2 2022-01-01 B 300
3 2022-01-07 B 400
The desired output:
desired={'date':[datetime.date(2022,1,1)+relativedelta(day=i) for i in range(1,10)]*2,
         'group':['A' for x in range(1,10)]+['B' for x in range(1,10)],
         'daily_value':[x for x in range(1,10)]*2,
         'weekly_value':[100]*6+[200]*3+[300]*6+[400]*3}
desired_data=pd.DataFrame(desired)
desired_data output:
date group daily_value weekly_value
0 2022-01-01 A 1 100
1 2022-01-02 A 2 100
2 2022-01-03 A 3 100
3 2022-01-04 A 4 100
4 2022-01-05 A 5 100
5 2022-01-06 A 6 100
6 2022-01-07 A 7 200
7 2022-01-08 A 8 200
8 2022-01-09 A 9 200
9 2022-01-01 B 1 300
10 2022-01-02 B 2 300
11 2022-01-03 B 3 300
12 2022-01-04 B 4 300
13 2022-01-05 B 5 300
14 2022-01-06 B 6 300
15 2022-01-07 B 7 400
16 2022-01-08 B 8 400
17 2022-01-09 B 9 400
CodePudding user response:
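Use merge_asof with by='group' so that each daily row takes the most recent weekly value from its own group. Both frames must be sorted by the datetime column first; afterwards, sort by both columns to restore the original order: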
daily_data['date'] = pd.to_datetime(daily_data['date'])
weekly_data['date'] = pd.to_datetime(weekly_data['date'])
df = (pd.merge_asof(daily_data.sort_values('date'),
                    weekly_data.sort_values('date'),
                    on='date',
                    by='group')
        .sort_values(['group','date'], ignore_index=True))
print(df)
date group daily_value weekly_value
0 2022-01-01 A 1 100
1 2022-01-02 A 2 100
2 2022-01-03 A 3 100
3 2022-01-04 A 4 100
4 2022-01-05 A 5 100
5 2022-01-06 A 6 100
6 2022-01-07 A 7 200
7 2022-01-08 A 8 200
8 2022-01-09 A 9 200
9 2022-01-01 B 1 300
10 2022-01-02 B 2 300
11 2022-01-03 B 3 300
12 2022-01-04 B 4 300
13 2022-01-05 B 5 300
14 2022-01-06 B 6 300
15 2022-01-07 B 7 400
16 2022-01-08 B 8 400
17 2022-01-09 B 9 400
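To see what merge_asof is doing here, the same result can be reproduced with a plain left merge followed by a forward fill within each group. This is only a sketch for comparison, not a replacement for the answer above. It rebuilds the sample data with pd.date_range for brevity (an assumption; the question uses dateutil), and the names merged and asof are made up for this example.

```python
import pandas as pd

# Rebuild the question's sample data. Using pd.date_range is an assumption
# made for brevity; the question builds the dates with dateutil instead.
dates = pd.date_range("2022-01-01", periods=9, freq="D")
daily_data = pd.DataFrame({
    "date": list(dates) * 2,
    "group": ["A"] * 9 + ["B"] * 9,
    "daily_value": list(range(1, 10)) * 2,
})
weekly_data = pd.DataFrame({
    "date": [dates[0], dates[6]] * 2,   # 2022-01-01 and 2022-01-07
    "group": ["A", "A", "B", "B"],
    "weekly_value": [100, 200, 300, 400],
})

# Step 1: left-join on exact (group, date) matches. Only days that coincide
# with a weekly date get a weekly_value; every other day is NaN.
merged = daily_data.merge(weekly_data, on=["group", "date"], how="left")

# Step 2: forward-fill within each group so every day inherits the most
# recent weekly value. The cast is safe here because each group's first
# day has a weekly value, so no NaN remains after the fill.
merged = merged.sort_values(["group", "date"], ignore_index=True)
merged["weekly_value"] = (
    merged.groupby("group")["weekly_value"].ffill().astype(int)
)

# The merge_asof version from the answer, for comparison.
asof = pd.merge_asof(
    daily_data.sort_values("date"),
    weekly_data.sort_values("date"),
    on="date",
    by="group",
).sort_values(["group", "date"], ignore_index=True)

print(merged)
```

The two approaches agree on this data. They can differ when a group's first daily date falls before its first weekly date: both leave those rows without a weekly value, but the astype(int) cast in the sketch would then fail, so drop the cast in that case.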
