how to merge two dataframe with different times and sizes-CodePudding

I am trying to merge these two dataframe together and preserve all the rows and columns. They have different times under the column 'time', so i want them to merge in a way that is time sequential.

df1:

time	run_id	weight
0	H1	500
24	H1	400
48	H1	300
0	H2	900
24	H2	800
48	H2	700

df2:

time	run_id	totalizer
0.5	H1	100
10	H1	200
40	H1	300
60	H1	400
0.5	H2	900
5	H2	1000
35	H2	1100
70	H2	1200

How do I merge these two tables into:

time	run_id	weight	totalizer
0	H1	500
0.5	H1		100
10	H1		200
24	H1	400
40	H1		300
48	H1	300
60	H1		400
0	H2	900
0.5	H2		900
5	H2		1000
24	H2	800
35	H2		1100
48	H2	700
70	H2		1200

I tried:

mergedf = df1.merge(df2, how='outer')

but it stacked df1 on top of df2.

CodePudding user response：

One option is to use combine_first :

cols = ["run_id", "time"]

out = (
        df1.set_index(cols)
              .combine_first(df2.set_index(cols))
              .reset_index().sort_values(by=cols)
           [["time", "run_id", "weight", "totalizer"]]
       )

Output :

print(out)

    time run_id  weight  totalizer
0    0.0     H1   500.0        NaN
1    0.5     H1     NaN      100.0
2   10.0     H1     NaN      200.0
3   24.0     H1   400.0        NaN
4   40.0     H1     NaN      300.0
5   48.0     H1   300.0        NaN
6   60.0     H1     NaN      400.0
7    0.0     H2   900.0        NaN
8    0.5     H2     NaN      900.0
9    5.0     H2     NaN     1000.0
10  24.0     H2   800.0        NaN
11  35.0     H2     NaN     1100.0
12  48.0     H2   700.0        NaN
13  70.0     H2     NaN     1200.0

CodePudding user response：

You could simply add line after what you have already done:

mergedf = df1.merge(df2, how='outer') # your current code
mergedf.sort_values(['run_id', 'time']) # add this

CodePudding user response：

You can use panda's merge_ordered

df_merged=pd.merge_ordered(df1,df2, on=['run_id','time'])