I am trying to merge these two dataframe together and preserve all the rows and columns. They have different times under the column 'time', so i want them to merge in a way that is time sequential.
df1:
| time | run_id | weight |
|---|---|---|
| 0 | H1 | 500 |
| 24 | H1 | 400 |
| 48 | H1 | 300 |
| 0 | H2 | 900 |
| 24 | H2 | 800 |
| 48 | H2 | 700 |
df2:
| time | run_id | totalizer |
|---|---|---|
| 0.5 | H1 | 100 |
| 10 | H1 | 200 |
| 40 | H1 | 300 |
| 60 | H1 | 400 |
| 0.5 | H2 | 900 |
| 5 | H2 | 1000 |
| 35 | H2 | 1100 |
| 70 | H2 | 1200 |
How do I merge these two tables into:
| time | run_id | weight | totalizer |
|---|---|---|---|
| 0 | H1 | 500 | |
| 0.5 | H1 | 100 | |
| 10 | H1 | 200 | |
| 24 | H1 | 400 | |
| 40 | H1 | 300 | |
| 48 | H1 | 300 | |
| 60 | H1 | 400 | |
| 0 | H2 | 900 | |
| 0.5 | H2 | 900 | |
| 5 | H2 | 1000 | |
| 24 | H2 | 800 | |
| 35 | H2 | 1100 | |
| 48 | H2 | 700 | |
| 70 | H2 | 1200 |
I tried:
mergedf = df1.merge(df2, how='outer')
but it stacked df1 on top of df2.
CodePudding user response:
One option is to use combine_first :
cols = ["run_id", "time"]
out = (
df1.set_index(cols)
.combine_first(df2.set_index(cols))
.reset_index().sort_values(by=cols)
[["time", "run_id", "weight", "totalizer"]]
)
Output :
print(out)
time run_id weight totalizer
0 0.0 H1 500.0 NaN
1 0.5 H1 NaN 100.0
2 10.0 H1 NaN 200.0
3 24.0 H1 400.0 NaN
4 40.0 H1 NaN 300.0
5 48.0 H1 300.0 NaN
6 60.0 H1 NaN 400.0
7 0.0 H2 900.0 NaN
8 0.5 H2 NaN 900.0
9 5.0 H2 NaN 1000.0
10 24.0 H2 800.0 NaN
11 35.0 H2 NaN 1100.0
12 48.0 H2 700.0 NaN
13 70.0 H2 NaN 1200.0
CodePudding user response:
You could simply add line after what you have already done:
mergedf = df1.merge(df2, how='outer') # your current code
mergedf.sort_values(['run_id', 'time']) # add this
Read more here: https://stackoverflow.com/a/17141755/2650341
CodePudding user response:
You can use panda's merge_ordered
df_merged=pd.merge_ordered(df1,df2, on=['run_id','time'])
