Home > OS >  Python dataframe rank each column based on row values
Python dataframe rank each column based on row values

Time:01-31

I have a data frame. I want to rank each column based on its row value

Ex:

xdf = pd.DataFrame({'A':[10,20,30],'B':[5,30,20],'C':[15,3,8]})
xdf = 
    A   B   C
0  10   5  15
1  20  30   3
2  30  20   8

Expected result:

xdf = 
    A   B   C   Rk_1  Rk_2 Rk_3
0  10   5  15    C     A    B
1  20  30   3    B     A    C
2  30  20   8    A     B    C

OR 

xdf = 
    A   B   C   A_Rk  B_Rk C_Rk
0  10   5  15    2     3    1
1  20  30   3    2     1    2
2  30  20   8    1     2    3

Why I need this: I want to track the trend of each column and how it is changing. I would like to show this by the plot. Maybe a bar plot showing how many times A got Rank1, 2, 3, etc.

My approach:

xdf[['Rk_1','Rk_2','Rk_3']] = ""

for i in range(len(xdf)):
    xdf.loc[i,['Rk_1','Rk_2','Rk_3']] = dict(sorted(dict(xdf[['A','B','C']].loc[i]).items(),reverse=True,key=lambda item:item[1])).keys()

Present output:

    A   B   C Rk_1 Rk_2 Rk_3
0  10   5  15    C    A    B
1  20  30   3    B    A    C
2  30  20   8    A    B    C

I am iterating through each row, converting each row, column into a dictionary, sorting the values, and then extracting the keys (columns). Is there a better approach? My actual data frame has 10000 rows, 12 columns to be ranked. I just executed and it took around 2 minutes.

CodePudding user response:

You should be able to get your desired dataframe by using:

ranked = xdf.join(xdf.rank(ascending=False, method='first', axis=1), rsuffix='_rank')

This'll give you:

    A   B   C  A_rank  B_rank  C_rank
0  10   5  15     2.0     3.0     1.0
1  20  30   3     2.0     1.0     3.0
2  30  20   8     1.0     2.0     3.0

Then do whatever you need to do plotting wise.

  •  Tags:  
  • Related