Data:
d = {'a': [1,5], 'b': [2,4], 'c': [3,3], 'd':[4,2], 'e': [5,1]}
df = pd.DataFrame(d)
Desired Output:
d2 = {'a': [1,5], 'b': [2,4], 'c': [3,3], 'd':[4,2], 'e': [5,1], 'Top (First)': ['e','a'], 'Top (Second)': ['d','b'], 'Top (Third)': ['c','c']}
df2 = pd.DataFrame(d2)
I would like to create 3 columns - Top (First), Top (Second), Top (Third) that return the column names of the top 3 highest values for that row. I tried an np.where and even though that works, it becomes exponentially longer with more columns so looking for a simpler solution.
CodePudding user response:
You could use nlargest to find the 3 largest values, then get the index of the largest values (which are column names since we apply nlargest row-wise) and build DataFrame and join it back to df:
df2 = df.join(pd.DataFrame(df.apply(lambda x: x.nlargest(3).index.tolist(), axis=1).tolist(),
columns=['Top (First)','Top (Second)','Top (Third)']))
Output:
a b c d e Top (First) Top (Second) Top (Third)
0 1 2 3 4 5 e d c
1 5 4 3 2 1 a b c
CodePudding user response:
enke's answer is good enough solving the question within one library. Here's how you do with Numpy if you are feeling spicy lol.
import numpy as np
pd.concat([df,pd.DataFrame(df.apply(lambda x:list(df.columns[np.array(x).argsort()[::-1][:3]]), axis=1).to_list(), columns=['Top (First)', 'Top (Second)', 'Top (Third)'])], axis=1)
a b c d e Top (First) Top (Second) Top (Third)
0 1 2 3 4 5 e d c
1 5 4 3 2 1 a b c
