I have the loan dataset below -
| Sector | Total Units | Bad units | Bad Rate |
|---|---|---|---|
| Retail Trade | 16 | 5 | 31% |
| Construction | 500 | 1100 | 20% |
| Healthcare | 165 | 55 | 33% |
| Mining | 3 | 2 | 67% |
| Utilities | 56 | 19 | 34% |
| Other | 300 | 44 | 15% |
How can I create a ranking function to sort this data based on the bad_rate while also accounting for the number of units ?
e.g This is the result when I sort in descending order based on bad_rate
| Sector | Total Units | Bad units | Bad Rate |
|---|---|---|---|
| Mining | 3 | 2 | 67% |
| Utilities | 56 | 19 | 34% |
| Healthcare | 165 | 55 | 33% |
| Retail Trade | 16 | 5 | 31% |
| Construction | 500 | 1100 | 20% |
| Other | 300 | 44 | 15% |
Here, Mining shows up first but I don't really care about this sector as it only has a total of 3 units. I would like construction, other and healthcare to show up on the top as they have more # of total as well as bad units
CodePudding user response:
STEP 1) is easy...
Use SORT("Range","ByColNumber","Order")
Just put it in the top left cell of where you want your sorted data.
=SORT(B3:E8,4,-1):
STEP 2)
Here's the tricky part... you need to decide how to weight the outage.
Here, I found multiplying the Rate% by the Total Unit Rank:
I think this approach gives pretty good results... you just need to play with the formula!
Please let me know what formula you eventually use!
CodePudding user response:
You would need to define sorting criteria, since you don't have a priority based on column, but instead a combination. I would suggest defining a function that weighs both columns: Total Units and Bad Rate. Using a weight function would be a good idea, but first, we would need to normalize both columns. For example put the data in a range 0-100, so we can weigh each column having similar values. Once you have the data normalized then you can use criteria like this:
w_1 * x w_2 * y
This is the main idea. Now to put this information in Excel. We create an additional temporary variable with the previous calculation and name it crit. We Define a user LAMBDA function SORT_BY for calculating crit as follows:
LAMBDA(a,b, wu*a wbr*b)
and we use MAP to calculate it with the normalized data. For convenience we define another user LAMBDA function to normalize the data: NORM as follows:
LAMBDA(x, 100*(x-MIN(x))/(MAX(x) - MIN(x)))
Here is the formula:
=LET(wu, 0.6, wbr, 0.8, u, B2:B7, br, D2:D7, SORT_BY, LAMBDA(a,b, wu*a wbr*b),
NORM, LAMBDA(x, 100*(x-MIN(x))/(MAX(x) - MIN(x))),
crit, MAP(NORM(u), NORM(br), LAMBDA(a,b, SORT_BY(a,b))),
DROP(SORT(HSTACK(A2:D7, crit),5,-1),,-1))
You can customize how to weight each column (via wu for Total Units and wbr for Bad Rates columns). Finally, we present the result removing the sorting criteria (crit) via the DROP function. If you want to show it, then remove this step.




