I have this problem:
A
| ID | Data |
|---|---|
| 1 | Something |
| 2 | Anything |
B
| ID | A_ID | Data |
|---|---|---|
| 1 | 1 | data1 |
| 2 | 1 | data2 |
| 3 | 2 | data3 |
A B
| ID | Data | B_ID_1 | B_Data | B_ID_2 | B_Data_2 |
|---|---|---|---|---|---|
| 1 | Something | 1 | data1 | 2 | data2 |
| 2 | Anything | 3 | data3 | NaN | NaN |
So basically, I want to join two tables, but without duplicating rows and creating new columns.
CodePudding user response:
Use GroupBy.cumcount for groups, pass to DataFrame.pivot and sorting MultiIndex in columns, flatten it and last join to A by DataFrame.join:
B['g'] = B.groupby('A_ID').cumcount().add(1)
B = B.pivot('A_ID','g').sort_index(axis=1, level=1, sort_remaining=False)
B.columns = B.columns.map(lambda x: f'{x[0]}_{x[1]}')
print (B)
ID_1 Data_1 ID_2 Data_2
A_ID
1 1.0 data1 2.0 data2
2 3.0 data3 NaN NaN
df = A.join(B, on='ID')
print (df)
ID Data ID_1 Data_1 ID_2 Data_2
0 1 Something 1.0 data1 2.0 data2
1 2 Anything 3.0 data3 NaN NaN
