Group by pandas to get path from source to end destination-CodePudding

I have the following table

Path ID	Lane ID	Customer	Source	Destination	Mode
1	1	Mumbai	Chicago	Berlin	Ship
1	2	Mumbai	Berlin	Mumbai	Air
2	1	Mumbai	Chicago	Berlin	Air
2	2	Mumbai	Berlin	Dubai	Air
2	3	Mumbai	Dubai	Mumbai	Ship

I want the following table

Path ID	Source	Site2	Site3	Destination	Lane1 Mode	Lane2 Mode	Lane3 Mode
1	Chicago	Berlin		Mumbai	Ship	Air
2	Chicago	Berlin	Dubai	Mumbai	Air	Air	Ship

How do I go about getting this table? I feel like groupby is obviously required but what after that? Not sure how to proceed from there. The dataset is really big so it also needs to be efficient. Any pointers would help :)

CodePudding user response：

First drop irrelevant columns. Then as @LiamRoberts notes, pivot will get us in the right shape. Then fix the "Destination" column with stack groupby last. Finally, fix the column names.

s = df.drop(columns='Customer').pivot(index=['Path ID'], columns=['Lane ID'])
s['Dest'] = s['Destination'].stack().groupby(level=0).last()
s = s.drop(columns='Destination').reset_index()
s.columns = [f'{c1}{c2}' for c1, c2 in s.columns]
s = (s.rename(columns={'Source1':'Source', 'Source2':'Site2', 
                       'Source3':'Site3', 'Dest':'Destination',
                      'Mode1':'Lane1 Mode', 'Mode2':'Lane2 Mode', 
                       'Mode3':'Lane3 Mode'})[['Path ID','Source','Site2','Site3','Destination',
                                               'Lane1 Mode','Lane2 Mode','Lane3 Mode']])

Output:

   Path ID   Source   Site2  Site3 Destination Lane1 Mode Lane2 Mode    Lane3 Mode  
0        1  Chicago  Berlin    NaN      Mumbai       Ship        Air           NaN  
1        2  Chicago  Berlin  Dubai      Mumbai        Air        Air          Ship

CodePudding user response：

It looks like you're actually pivoting the Lane Id and indexing with the Path Id. This should get you most of the way.

df.pivot(index="Path ID", columns=["Lane ID"])