Home > database >  pandas.DataFrame.idxmin - how to handle equal values
pandas.DataFrame.idxmin - how to handle equal values

Time:01-22

I have the following dataframe:

frame=pd.DataFrame(index=["Index_1","Index_2","Index_3","Index_4","Index_5","Index_6","Index_7"],
                   columns=["Header_1","Header_2","Rank_1","Rank_2"], data=[[4.06,4.34,1,1],[3.30,2.35,2,3],
                    [3.26,1.02,3,7],[3.14,2.40,4,2],[2.61,1.81,5,5],[2.52,2.33,6,4],[2.36,1.62,7,6]])

which results in:

         Header_1   Header_2    Rank_1  Rank_2
Index_1    4.06      4.34          1      1
Index_2    3.30      2.35          2      3
Index_3    3.26      1.02          3      7
Index_4    3.14      2.40          4      2
Index_5    2.61      1.81          5      5
Index_6    2.52      2.33          6      4
Index_7    2.36      1.62          7      6

The Rank columns (_1,_2) show the ranking of the values (say returns) in the header columns (_1,_2) when the latter are sorted in descending order. For example at Index_2 , Header 1 - we have the second highest value of 3.3 within Header_1 column , therefore the rank of 2 at column Rank_1.

For each Index (_1,_2,_3,_4....) I want to map either Rank_1 or Rank_2 depending on where the minimum value is; In essence this achieved via pandas.DataFrame.idxmin. If I use: frame.iloc[:,[2,3]].idxmin(axis=1) I arrive at:

Index_1    Rank_1
Index_2    Rank_1
Index_3    Rank_1
Index_4    Rank_2
Index_5    Rank_1
Index_6    Rank_2
Index_7    Rank_2

This is more or less the desired output; yet when there are equal ranks in Rank_1 and Rank_2 columns, such as at Index_1 or Index_5 I want the algorithm to check also the two Header columns and make final decision based on the highest value from Header_1 or Header_2. So at Index_1 we see that the ranks are equal (1); however Header_2 contains the higher value (4.34) compared to Header_1 (4.06). Therefore I would expect the final output to be as follows:

Index_1    Rank_2
Index_2    Rank_1
Index_3    Rank_1
Index_4    Rank_2
Index_5    Rank_1
Index_6    Rank_2
Index_7    Rank_2

Note: the difference at the last two outputs is just at Index_1.

CodePudding user response:

Add or subtract an epsilon value to Rank_1 according to Header_1 < Header_2 or not:

eps = 0.000001

out = frame.assign(Rank_1=np.where(df['Header_1'] < df['Header_2'],
                                   df['Rank_1']   eps, df['Rank_1'] - eps)
                  ).iloc[:,[2,3]].idxmin(axis=1)
print(out)

# Output
Index_1    Rank_2
Index_2    Rank_1
Index_3    Rank_1
Index_4    Rank_2
Index_5    Rank_1
Index_6    Rank_2
Index_7    Rank_2
dtype: object
  •  Tags:  
  • Related