pandas retain values on different index dataframes

Time:01-24

I need to merge two dataframes with different frequencies (weekly onto daily), and I would like each daily row to carry the most recent weekly value after the merge.

The data also contains a grouping variable, group, so the weekly values must be matched within each group.

import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta

daily={'date':[datetime.date(2022,1,1) + relativedelta(day=i) for i in range(1,10)]*2,
       'group':['A' for x in range(1,10)] + ['B' for x in range(1,10)],
       'daily_value':[x for x in range(1,10)]*2}
weekly={'date':[datetime.date(2022,1,1),datetime.date(2022,1,7)]*2,
        'group':['A','A'] + ['B','B'],
        'weekly_value':[100,200,300,400]}


daily_data=pd.DataFrame(daily)
weekly_data=pd.DataFrame(weekly)

daily_data output:

          date group  daily_value
0   2022-01-01     A            1
1   2022-01-02     A            2
2   2022-01-03     A            3
3   2022-01-04     A            4
4   2022-01-05     A            5
5   2022-01-06     A            6
6   2022-01-07     A            7
7   2022-01-08     A            8
8   2022-01-09     A            9
9   2022-01-01     B            1
10  2022-01-02     B            2
11  2022-01-03     B            3
12  2022-01-04     B            4
13  2022-01-05     B            5
14  2022-01-06     B            6
15  2022-01-07     B            7
16  2022-01-08     B            8
17  2022-01-09     B            9

weekly_data output:

         date group  weekly_value
0  2022-01-01     A           100
1  2022-01-07     A           200
2  2022-01-01     B           300
3  2022-01-07     B           400

The desired output

desired={'date':[datetime.date(2022,1,1) + relativedelta(day=i) for i in range(1,10)]*2,
         'group':['A' for x in range(1,10)] + ['B' for x in range(1,10)],
         'daily_value':[x for x in range(1,10)]*2,
         'weekly_value':[100]*6 + [200]*3 + [300]*6 + [400]*3}

desired_data=pd.DataFrame(desired)

desired_data output:

          date group  daily_value  weekly_value
0   2022-01-01     A            1           100
1   2022-01-02     A            2           100
2   2022-01-03     A            3           100
3   2022-01-04     A            4           100
4   2022-01-05     A            5           100
5   2022-01-06     A            6           100
6   2022-01-07     A            7           200
7   2022-01-08     A            8           200
8   2022-01-09     A            9           200
9   2022-01-01     B            1           300
10  2022-01-02     B            2           300
11  2022-01-03     B            3           300
12  2022-01-04     B            4           300
13  2022-01-05     B            5           300
14  2022-01-06     B            6           300
15  2022-01-07     B            7           400
16  2022-01-08     B            8           400
17  2022-01-09     B            9           400

CodePudding user response:

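Use merge_asof, which matches each daily row to the most recent weekly row at or before its date within the same group. It requires datetime keys and both DataFrames sorted by date, so convert and sort first, then sort the result by both columns to restore the original order: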

daily_data['date'] = pd.to_datetime(daily_data['date'])
weekly_data['date'] = pd.to_datetime(weekly_data['date'])


df = (pd.merge_asof(daily_data.sort_values('date'),
                    weekly_data.sort_values('date'), 
                    on='date', 
                    by='group').sort_values(['group','date'], ignore_index=True))
print(df)
         date group  daily_value  weekly_value
0  2022-01-01     A            1           100
1  2022-01-02     A            2           100
2  2022-01-03     A            3           100
3  2022-01-04     A            4           100
4  2022-01-05     A            5           100
5  2022-01-06     A            6           100
6  2022-01-07     A            7           200
7  2022-01-08     A            8           200
8  2022-01-09     A            9           200
9  2022-01-01     B            1           300
10 2022-01-02     B            2           300
11 2022-01-03     B            3           300
12 2022-01-04     B            4           300
13 2022-01-05     B            5           300
14 2022-01-06     B            6           300
15 2022-01-07     B            7           400
16 2022-01-08     B            8           400
17 2022-01-09     B            9           400
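
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The helper name attach_weekly is hypothetical (it is not part of pandas), the sample frames are rebuilt from date strings rather than datetime.date objects, and direction="backward" is written out only for clarity since it is already the merge_asof default.

```python
import pandas as pd

# Sample data equivalent to the question's, built from date strings
# so that both 'date' columns share the same datetime dtype.
daily_data = pd.DataFrame({
    "date": pd.to_datetime([f"2022-01-{d:02d}" for d in range(1, 10)] * 2),
    "group": ["A"] * 9 + ["B"] * 9,
    "daily_value": list(range(1, 10)) * 2,
})
weekly_data = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-01-07"] * 2),
    "group": ["A", "A", "B", "B"],
    "weekly_value": [100, 200, 300, 400],
})


def attach_weekly(daily, weekly):
    """Hypothetical helper: give each daily row the latest weekly value
    dated on or before it, matched within the same group."""
    merged = pd.merge_asof(
        daily.sort_values("date"),   # merge_asof needs the 'on' key sorted
        weekly.sort_values("date"),
        on="date",
        by="group",                  # only match rows with the same group
        direction="backward",        # default: look back to the last weekly date
    )
    # Restore the original group-then-date ordering with a fresh index.
    return merged.sort_values(["group", "date"], ignore_index=True)


result = attach_weekly(daily_data, weekly_data)
expected = [100] * 6 + [200] * 3 + [300] * 6 + [400] * 3
print(result["weekly_value"].tolist() == expected)
```

If some daily dates fall before the first weekly date of their group, a backward match finds nothing and weekly_value will be NaN for those rows; merge_asof also accepts direction="forward" or "nearest" and a tolerance argument if a different matching rule is needed.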