Columns names of top n values of each row in a pandas DataFrame-CodePudding

Data:
d = {'a': [1,5], 'b': [2,4], 'c': [3,3], 'd':[4,2], 'e': [5,1]}
df = pd.DataFrame(d)

Desired Output:

d2 = {'a': [1,5], 'b': [2,4], 'c': [3,3], 'd':[4,2], 'e': [5,1], 'Top (First)': ['e','a'], 'Top (Second)': ['d','b'], 'Top (Third)': ['c','c']}
df2 = pd.DataFrame(d2)

I would like to create 3 columns - Top (First), Top (Second), Top (Third) that return the column names of the top 3 highest values for that row. I tried an np.where and even though that works, it becomes exponentially longer with more columns so looking for a simpler solution.

CodePudding user response：

You could use nlargest to find the 3 largest values, then get the index of the largest values (which are column names since we apply nlargest row-wise) and build DataFrame and join it back to df:

df2 = df.join(pd.DataFrame(df.apply(lambda x: x.nlargest(3).index.tolist(), axis=1).tolist(), 
                           columns=['Top (First)','Top (Second)','Top (Third)']))

Output:

   a  b  c  d  e Top (First) Top (Second) Top (Third)
0  1  2  3  4  5           e            d           c
1  5  4  3  2  1           a            b           c

CodePudding user response：

enke's answer is good enough solving the question within one library. Here's how you do with Numpy if you are feeling spicy lol.

import numpy as np

pd.concat([df,pd.DataFrame(df.apply(lambda x:list(df.columns[np.array(x).argsort()[::-1][:3]]), axis=1).to_list(),  columns=['Top (First)', 'Top (Second)', 'Top (Third)'])], axis=1)

   a  b  c  d  e Top (First) Top (Second) Top (Third)
0  1  2  3  4  5           e            d           c
1  5  4  3  2  1           a            b           c