I have the following tables, each as a pandas DataFrame:
| ID1 | ID2 | Value1 |
|---|---|---|
| Data1 | Data2 | Data11 |

| ID1 | ID2 | Value2 |
|---|---|---|
| Data1 | Data2 | Data12 |

| ID1 | ID2 | Value3 |
|---|---|---|
| Data1 | Data2 | Data13 |
My aim is a table of form:
| ID1 | ID2 | Value1 | Value2 | Value3 |
|---|---|---|---|---|
| Data1 | Data2 | Data11 | Data12 | Data13 |
What is the easiest way to do that with Python?
CodePudding user response:
Perform multiple merges in a chain:
df1.merge(df2.merge(df3))
#      ID1    ID2  Value1  Value2  Value3
# 0  Data1  Data2  Data11  Data12  Data13
Or with functools.reduce syntax:
from functools import reduce
reduce(lambda L, R: L.merge(R, on=['ID1', 'ID2']), [df1, df2, df3])
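As a minimal, self-contained sketch of the reduce approach: it assumes the three tables from the question are single-row DataFrames named df1, df2 and df3. The construction of those frames and the name `merged` are illustrative, not taken from the question.

```python
from functools import reduce

import pandas as pd

# Assumed reconstruction of the three tables from the question.
df1 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value1": ["Data11"]})
df2 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value2": ["Data12"]})
df3 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value3": ["Data13"]})

# Fold the list pairwise: ((df1 merge df2) merge df3), joining on both ID columns.
merged = reduce(
    lambda left, right: left.merge(right, on=["ID1", "ID2"]),
    [df1, df2, df3],
)

print(merged)
```

merge defaults to an inner join, so a key pair that is missing from any one frame is dropped from the result. Passing how="outer" would keep such rows, with NaN in the value columns that have no match.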
CodePudding user response:
new_df = pd.concat([df1, df2, df3]).groupby(['ID1', 'ID2'], as_index=False).first()
Output:
>>> new_df
     ID1    ID2  Value1  Value2  Value3
0  Data1  Data2  Data11  Data12  Data13
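To see why this works, it can help to look at the intermediate result of concat. The sketch below assumes the same single-row frames as in the question; the names `stacked` and the use of ignore_index=True are additions for illustration only.

```python
import pandas as pd

# Assumed reconstruction of the three tables from the question.
df1 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value1": ["Data11"]})
df2 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value2": ["Data12"]})
df3 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value3": ["Data13"]})

# Stacking the frames gives one row per input frame; every value column
# that a frame does not have is filled with NaN in that frame's row.
stacked = pd.concat([df1, df2, df3], ignore_index=True)
print(stacked)

# first() takes the first non-null entry per column within each
# (ID1, ID2) group, which collapses the three sparse rows into one.
new_df = stacked.groupby(["ID1", "ID2"], as_index=False).first()
print(new_df)
```

Unlike the merge approach, this keeps key pairs that appear in only some of the frames; their missing value columns simply stay NaN.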
CodePudding user response:
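This is very similar to @user17242583's answer. The only difference is that nth(0) returns the first row of each group as-is, even where it contains NaN, while first() returns the first non-null value in each column. For the tables in the question, where each value column is filled in only one of the concatenated rows, nth(0) therefore leaves Value2 and Value3 as NaN, so first() is the variant that produces the combined row.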
df_new = pd.concat([df1, df2, df3]).groupby(['ID1', 'ID2'], as_index=False).nth(0)
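The difference between the two calls can be checked directly. This is a small sketch under the assumption that the frames look like the tables in the question; `first_df` and `nth_df` are illustrative names. The index and exact column layout returned by nth have varied between pandas versions, so the sketch only inspects the value columns.

```python
import pandas as pd

# Assumed reconstruction of the three tables from the question.
df1 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value1": ["Data11"]})
df2 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value2": ["Data12"]})
df3 = pd.DataFrame({"ID1": ["Data1"], "ID2": ["Data2"], "Value3": ["Data13"]})

grouped = pd.concat([df1, df2, df3]).groupby(["ID1", "ID2"], as_index=False)

# first(): first non-null value per column, so all value columns are filled.
first_df = grouped.first()

# nth(0): the first row of the group exactly as stored, NaN included.
nth_df = grouped.nth(0)

print(first_df[["Value1", "Value2", "Value3"]])
print(nth_df[["Value1", "Value2", "Value3"]])
```

With this data, nth(0) is only a drop-in replacement for first() when the first row of each group already holds every value.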
