Hello I have 2 dataframes I want to combine
dataframe 1 :
| ID | A | B | C |
|---|---|---|---|
| row1 | 1 | 2 | 3 |
| row2 | 4 | 5 | 6 |
dataframe 2:
| ID | A | B | D |
|---|---|---|---|
| row1 | 6 | 7 | 8 |
and I want them to merge and replace values of the same row to the values on dataframe 2 like this:
| ID | A | B | C | D |
|---|---|---|---|---|
| row1 | 6 | 7 | 3 | 8 |
| row2 | 4 | 5 | 6 | null |
how do I do this? I tried merging and concatenation but it doesn't seem to work. Thank you
CodePudding user response:
Another method to merge your 2 dataframes:
>>> pd.concat([df1, df2]).groupby('ID', as_index=False).last()
ID A B C D
0 row1 6 7 3.0 8.0
1 row2 4 5 6.0 NaN
CodePudding user response:
Assuming ID is the index in both DataFrames (if not, make it so): There is actually a function combine_first():
out = df2.combine_first(df1)
>>> out
A B C D
ID
row1 6 7 3 8.0
row2 4 5 6 NaN
Notes:
- why is column
Dof typefloat? Because of thatNaN. - what if the rows are in different order, e.g.
df1hasrow2first and thenrow1? Not a problem at all and the result is exactly the same as above (with rows sorted). Tested withpandas=1.4.2and alsopandas=1.3.4.
