Home > Software engineering >  Pandas multiindex Ranking and Sorting
Pandas multiindex Ranking and Sorting

Time:01-14

I have a sorting problem.

TOT_ENERGY_CONSUMPTION REGION_CODE INDUSTRY_CODE
11289.15 110101 I65
11407.056 110101 M73
14898.249 110101 I63
9718.287 110102 M7d
9957.729923 110102 I6s
10653.603 110103 M7s
12673.787 110103 I6c
13101.499 110103 M7w

I would like my final output table be sorted to the following:

TOT_ENERGY_CONSUMPTION REGION_CODE INDUSTRY_CODE RANK
14898.249 110101 I65 1
11407.056 110101 M73 2
11289.15 110101 I63 3
9957.729923 110102 M7d 1
9718.287 110102 I6s 2
13101.499 110103 M7s 1
12673.787 110103 I6c 2
10653.603 110103 M7w 3

Sorting by fixing REGION_CODE and sort TOT_ENERGY_CONSUMPTION. Then add ranking to region_code

I tried several times and no fruit.

CodePudding user response:

Use groupby_rank then sort_values:

out = df.assign(RANK=df.groupby('REGION_CODE', sort=False)['TOT_ENERGY_CONSUMPTION']
                       .rank('dense', ascending=False).astype(int)) \
        .sort_values(['REGION_CODE', 'RANK'])
print(out)

# Output
   TOT_ENERGY_CONSUMPTION  REGION_CODE INDUSTRY_CODE  RANK
2            14898.249000       110101           I63     1
1            11407.056000       110101           M73     2
0            11289.150000       110101           I65     3
4             9957.729923       110102           I6s     1
3             9718.287000       110102           M7d     2
7            13101.499000       110103           M7w     1
6            12673.787000       110103           I6c     2
5            10653.603000       110103           M7s     3

CodePudding user response:

Use DataFrame.sort_values with GroupBy.cumcount:

df = df.sort_values(['REGION_CODE', 'TOT_ENERGY_CONSUMPTION'], 
                    ascending=[True, False],
                    ignore_index=True
                    )
df["rank"] = df.groupby(['REGION_CODE']).cumcount().add(1)
print (df)
   TOT_ENERGY_CONSUMPTION  REGION_CODE INDUSTRY_CODE  rank
0            14898.249000       110101           I63     1
1            11407.056000       110101           M73     2
2            11289.150000       110101           I65     3
3             9957.729923       110102           I6s     1
4             9718.287000       110102           M7d     2
5            13101.499000       110103           M7w     1
6            12673.787000       110103           I6c     2
7            10653.603000       110103           M7s     3

CodePudding user response:

This should be what you want:

df["rank"] = df.groupby(['REGION_CODE'])["TOT_ENERGY_CONSUMPTION"].rank("dense", ascending=False)
df = df.sort_values(['REGION_CODE','rank'])

which gives

TOT_ENERGY_CONSUMPTION  REGION_CODE INDUSTRY_CODE  rank
2            14898.249000       110101           I63   1.0
1            11407.056000       110101           M73   2.0
0            11289.150000       110101           I65   3.0
4             9957.729923       110102           I6s   1.0
3             9718.287000       110102           M7d   2.0
7            13101.499000       110103           M7w   1.0
6            12673.787000       110103           I6c   2.0
5            10653.603000       110103           M7s   3.0
  •  Tags:  
  • Related