Sort values and calculate a difference in a MultiIndex pandas DataFrame

I have a pandas dataframe like this, where place1 and place2 are the index levels. I want to sort the place2 level by the data value without disrupting the order of the place1 level, so that place2 becomes a, c, b within each place1. Is there a way to do this?

place1 place2 data
 sss     a    100
         b    300
         c    200
 ses     a    400
         b    600
         c    500
  ...   ...   ...

I also want to calculate the difference in data between place2 == b and place2 == a for each place1, so that the dataframe becomes:

place1 place2 data diff
 sss     a    100  200
         b    300
         c    200
 ses     a    400  200
         b    600
         c    500
  ...   ...   ...  ...

Is there an efficient way to calculate the difference like this? Thanks in advance!

Answer:

You can group by the first index level with .groupby() and then compute the difference within each group:

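# Group by the first index level (place1), keeping the groups in their original order
groups = df.groupby(level=0, group_keys=False, sort=False)
# Select 'data' so transform returns a Series; the scalar b - a is broadcast to every row of its group
df['diff'] = groups['data'].transform(lambda x: x.xs('b', level='place2').iloc[0] - x.xs('a', level='place2').iloc[0])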

This writes the same value to every row of the group, but you can blank out the "duplicates" afterwards if needed.

>>> df
               data  diff
place1 place2            
sss    a        100   200
       b        300   200
       c        200   200
ses    a        400   200
       b        600   200
       c        500   200
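
If the difference should appear only on the place2 == a rows, as in the question, one way to drop the "duplicates" is to mask the column. This is a minimal sketch that rebuilds the sample frame from the question. The column name diff_on_a is made up for illustration, and the masked column becomes float because the other rows hold NaN.

```python
import pandas as pd

# Rebuild the sample data from the question
index = pd.MultiIndex.from_product(
    [["sss", "ses"], ["a", "b", "c"]], names=["place1", "place2"]
)
df = pd.DataFrame({"data": [100, 300, 200, 400, 600, 500]}, index=index)

# Same per-group difference as above, broadcast to every row of the group
groups = df.groupby(level=0, group_keys=False, sort=False)
df["diff"] = groups["data"].transform(
    lambda s: s.xs("b", level="place2").iloc[0] - s.xs("a", level="place2").iloc[0]
)

# Keep the value only where place2 == 'a'; every other row becomes NaN
# ("diff_on_a" is a hypothetical column name)
is_a = df.index.get_level_values("place2") == "a"
df["diff_on_a"] = df["diff"].where(is_a)

print(df)
```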

You can then sort each group by the data column, which leaves the order of place1 unchanged:

>>> groups.apply(lambda x: x.sort_values('data'))
               data  diff
place1 place2            
sss    a        100   200
       c        200   200
       b        300   200
ses    a        400   200
       c        500   200
       b        600   200
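
Both steps can also be written without per-group lambdas, which may be faster on large frames (not measured here). The sketch below is an alternative to the approach above, not a replacement for it. It assumes every place1 has exactly one a row and one b row, and the names wide, diff_by_place1, group_no and _group are illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question
index = pd.MultiIndex.from_product(
    [["sss", "ses"], ["a", "b", "c"]], names=["place1", "place2"]
)
df = pd.DataFrame({"data": [100, 300, 200, 400, 600, 500]}, index=index)

# Pivot place2 into columns, then subtract column a from column b per place1
wide = df["data"].unstack("place2")
diff_by_place1 = wide["b"] - wide["a"]

# Look up each row's place1 label in the per-place1 result
df["diff"] = df.index.get_level_values("place1").map(diff_by_place1)

# Number the place1 groups in order of first appearance (sort=False),
# then sort by (group number, data) so the place1 order is preserved
group_no = df.groupby(level="place1", sort=False).ngroup()
sorted_df = (
    df.assign(_group=group_no)
      .sort_values(["_group", "data"])
      .drop(columns="_group")
)

print(sorted_df)
```

Because unstack sorts the place1 labels, the difference is mapped back by label rather than by position.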