I have a pandas DataFrame like the one below, where place1 and place2 are the index levels. I want to sort the place2 level by the data value without disturbing the order of place1, so that place2 becomes a, c, b within each place1. Is there a way to do this?
place1 place2 data
sss    a      100
       b      300
       c      200
ses    a      400
       b      600
       c      500
...    ...    ...
I also want to calculate the difference in data between place2 == b and place2 == a for each place1, so that the DataFrame becomes:
place1 place2 data diff
sss    a      100  200
       b      300
       c      200
ses    a      400  200
       b      600
       c      500
...    ...    ...  ...
Is there an efficient way to calculate the difference like this? Thanks in advance!
CodePudding user response:
You can use .groupby() on the first index level and then compute the difference within each group:
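# Group by the first index level (place1), keeping the original group order
groups = df.groupby(level=0, group_keys=False, sort=False)
# Select 'data' so transform returns a Series; the scalar b - a is broadcast to every row of the group
df['diff'] = groups['data'].transform(lambda x: x.xs('b', level='place2').iloc[0] - x.xs('a', level='place2').iloc[0])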
This writes the same value to every row of the group; you can blank out the repeated values afterwards if needed.
>>> df
               data  diff
place1 place2
sss    a        100   200
       b        300   200
       c        200   200
ses    a        400   200
       b        600   200
       c        500   200
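If you would rather avoid the lambda, here is a minimal sketch of the same calculation done with index lookups instead. It rebuilds the sample frame from the question so it can be run on its own; the names per_place1 and diff_on_a are just illustrative, and it assumes every place1 has exactly one a row and one b row.

```python
import pandas as pd

# Rebuild the sample frame from the question (place1 and place2 are index levels).
index = pd.MultiIndex.from_product(
    [["sss", "ses"], ["a", "b", "c"]], names=["place1", "place2"]
)
df = pd.DataFrame({"data": [100, 300, 200, 400, 600, 500]}, index=index)

# Pick the b rows and the a rows; each result is a Series indexed by place1.
b = df["data"].xs("b", level="place2")
a = df["data"].xs("a", level="place2")
per_place1 = b - a  # one value per place1

# Broadcast the per-place1 value back onto every row.
df["diff"] = df.index.get_level_values("place1").map(per_place1)

# Optional: keep the value only on the place2 == 'a' rows (NaN elsewhere),
# which matches the layout shown in the question.
diff_on_a = df["diff"].where(df.index.get_level_values("place2") == "a")
```

The where() step is one way to remove the "duplicates"; it turns the column into floats because the blanked rows hold NaN.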
You can then sort by the data column within each group:
>>> groups.apply(lambda x: x.sort_values('data'))
               data  diff
place1 place2
sss    a        100   200
       c        200   200
       b        300   200
ses    a        400   200
       c        500   200
       b        600   200
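As an alternative to apply, the same within-group ordering can be sketched with a single sort_values call. The idea is to number each place1 by its first appearance and sort on that number first, then on data. The helper column name _order is made up for this example, and the frame is rebuilt from the question's sample data.

```python
import pandas as pd

# Rebuild the sample frame from the question.
index = pd.MultiIndex.from_product(
    [["sss", "ses"], ["a", "b", "c"]], names=["place1", "place2"]
)
df = pd.DataFrame({"data": [100, 300, 200, 400, 600, 500]}, index=index)

# factorize numbers labels in order of first appearance: sss -> 0, ses -> 1,
# so the original place1 order survives the sort (sort_index would put ses first).
order, _ = pd.factorize(df.index.get_level_values("place1"))

sorted_df = (
    df.assign(_order=order)             # temporary sort key
      .sort_values(["_order", "data"])  # place1 order first, then data
      .drop(columns="_order")
)
```

This avoids calling a Python function once per group, which may matter on frames with many place1 values.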
