I want to achieve the following in general, and ideally without using a for loop to just step through each element since that seems inefficient. Every element of the list is going to be int type.
3 Examples:
In [1]: L1 = [3,6,19,6,3,3,19]
In [2]: f(L1)
Out [2]: [0,1,2,1,0,0,2]
In [3]: L2 = [0,1,2,4,1,4,1,0]
In [4]: f(L2)
Out [4]: [0,1,2,3,1,3,1,0]
In [5]: L3 = [2,3,3,4,2,2,3,4]
In [6]: f(L3)
Out [6]: [0,1,1,2,0,0,1,2]
The order of the numbers doesn't matter, but the relative frequency is important.
My current attempt works if there's no spacing between numbers - like in L3.
My attempt:
def f(L):
L = np.array(L) - min(L)
return list(L)
but this obviously doesn't handle differences the way I want it to. If I could avoid a for loop for each element of L and also avoid doing element wise comparison (like checking if the next lowest number after 19 is 6 in L1) that'd be ideal, but it's not obvious to me there's a way to avoid that.
CodePudding user response:
I don't think you can possibly do this in better than O(n) time, so trying to avoid a for loop over the list isn't useful. Here's the approach I'd take:
>>> def f(nums):
... m = {num: i for i, num in enumerate(sorted(set(nums)))}
... return [m[num] for num in nums]
...
>>> f([3,6,19,6,3,3,19])
[0, 1, 2, 1, 0, 0, 2]
>>> f([0,1,2,4,1,4,1,0])
[0, 1, 2, 3, 1, 3, 1, 0]
>>> f([2,3,4,4,2,2,3,4])
[0, 1, 2, 2, 0, 0, 1, 2]
Note that the sort is O(m log m), where m is the number of unique numbers, so in the worst case this is O(n log n).
CodePudding user response:
Note, if you are actually using numpy to begin with, you can use numpy.unique with the return_inverse=True arguments for this particular factorization:
>>> np.unique(L1, return_inverse=True)
(array([ 3, 6, 19]), array([0, 1, 2, 1, 0, 0, 2]))
Which should be very performant, but you should profile for your particular use-case. Note, this has the behavior you want because np.unique will return the sorted unique elements of the array.
