I have the following table
| Path ID | Lane ID | Customer | Source | Destination | Mode |
|---|---|---|---|---|---|
| 1 | 1 | Mumbai | Chicago | Berlin | Ship |
| 1 | 2 | Mumbai | Berlin | Mumbai | Air |
| 2 | 1 | Mumbai | Chicago | Berlin | Air |
| 2 | 2 | Mumbai | Berlin | Dubai | Air |
| 2 | 3 | Mumbai | Dubai | Mumbai | Ship |
I want the following table
| Path ID | Source | Site2 | Site3 | Destination | Lane1 Mode | Lane2 Mode | Lane3 Mode |
|---|---|---|---|---|---|---|---|
| 1 | Chicago | Berlin | Mumbai | Ship | Air | ||
| 2 | Chicago | Berlin | Dubai | Mumbai | Air | Air | Ship |
How do I go about getting this table? I feel like groupby is obviously required but what after that? Not sure how to proceed from there. The dataset is really big so it also needs to be efficient. Any pointers would help :)
CodePudding user response:
First drop irrelevant columns. Then as @LiamRoberts notes, pivot will get us in the right shape. Then fix the "Destination" column with stack groupby last. Finally, fix the column names.
s = df.drop(columns='Customer').pivot(index=['Path ID'], columns=['Lane ID'])
s['Dest'] = s['Destination'].stack().groupby(level=0).last()
s = s.drop(columns='Destination').reset_index()
s.columns = [f'{c1}{c2}' for c1, c2 in s.columns]
s = (s.rename(columns={'Source1':'Source', 'Source2':'Site2',
'Source3':'Site3', 'Dest':'Destination',
'Mode1':'Lane1 Mode', 'Mode2':'Lane2 Mode',
'Mode3':'Lane3 Mode'})[['Path ID','Source','Site2','Site3','Destination',
'Lane1 Mode','Lane2 Mode','Lane3 Mode']])
Output:
Path ID Source Site2 Site3 Destination Lane1 Mode Lane2 Mode Lane3 Mode
0 1 Chicago Berlin NaN Mumbai Ship Air NaN
1 2 Chicago Berlin Dubai Mumbai Air Air Ship
CodePudding user response:
It looks like you're actually pivoting the Lane Id and indexing with the Path Id. This should get you most of the way.
df.pivot(index="Path ID", columns=["Lane ID"])
