Redundant calculations in python list comprehension?-CodePudding

I often run into this situation while coding in Python, and am not sure which is more performant. Suppose I have a list l = [3, 13, 6, 8, 9, 53], and I want to use a list comprehension to create a new list that subtracts off the minimum value so that the lowest number is zero. I could do:

[x - min(l) for x in l]

On the other hand, I could do:

min_val = min(l)
[x - min_val for x in l]

Is it true that the first option causes min(l) to run for new item in the list, while the second option only calculates the minimum value once? Potentially, if l is a very long list, this could be a significant performance difference? On the other hand, perhaps with a shorter list, the creation of a variable in option 2 results in some overhead?

I guess my question is: how much of a difference does this make? I find the first option cleaner and more compact, but don't know if that comes at a performance cost.

Relatedly, is there any performance difference between:

new_list = []
for x in l:
    new_list.append(x ** 2)

and:

new_list = [x ** 2 for x in l]

Does the latter directly translate into the former, or is there a difference in what goes on under the hood?

CodePudding user response：

There is this really cool module called timeit ... with wich you can ... timeit. Don't ask, measure:

import timeit


def a():
    l = list(range(100))[::-1]
    return [x - min(l) for x in l]

def b():
    l = list(range(100))[::-1]
    min_val = min(l)
    return [x - min_val for x in l]



# do 100 calls to the given function, average results to avoid
# caching mishaps
print("min inside list comp: ", timeit.timeit(a, number=100))
print("min outside list comp:", timeit.timeit(b, number=100))

Output:

min inside list comp:  0.008259299998826464
min outside list comp: 0.0010576999993645586

See How to use timeit module

CodePudding user response：

I prefer the timeit calls from the commandline in this case as this problem isn't too complicated:

$ python -m timeit --setup "x = [i for i in range(1000)]" "[i - min(x) for i in x]"
50 loops, best of 5: 8.07 msec per loop
$ python -m timeit --setup "x = [i for i in range(1000)]" "y = min(x);[i - y for i in x]"
10000 loops, best of 5: 36.8 usec per loop

Where you can see very clearly that there is a tremendous benefit to storing the minimum value before the list comprehension.

CodePudding user response：

Yes, min(l) will be called for every iteration in the list comprehension. You can test this using your own function, as @jarmod has mentioned.

Yes, this will add complexity compared to storing it in a variable, because min(l) is O(n). That makes your code with the method call inside comprehension O(n^2) vs just O(n) if you store it in a variable beforehand.

And yes, the other example (list comprehension vs append) is equivalent, as shown in the docs.

CodePudding user response：

It has a very large impact, you can run this quick test to see for yourself:

from datetime import datetime as dt
now = dt.now
l = []

#create list
for i in range(20000):
    l.append(i)

#mark time
print(str(now().time()))

#save min first
min_val = min(l)
new = [x - min_val for x in l]

#mark time
print(str(now().time()))

#do min() each time
[x - min(l) for x in l]

#mark time
print(str(now().time()))

Output:

15:17:38.392065
15:17:38.393058
15:17:40.592543

The first ran instantly and the second took over two seconds and that is only for 20,000 items.

So practically a large difference even though it would be expected due to the change in time complexity. (O(n) to O(n^2))

CodePudding user response：

import time
from functools import lru_cache

l = [i for i in range(10000)]
min_val = min(l)

def f1():
    min_l = [x - min(l) for x in l]


def f2():
    min_l = [x - min_val for x in l]


@lru_cache
def lru_min(x):
    return min(x)

if __name__ == '__main__':
    start_time = time.time()
    f1()
    print("--- f1 runs in %s seconds ---" % (time.time() - start_time))
    start_time = time.time()
    f2()
    print("--- f2 runs in %s seconds ---" % (time.time() - start_time))

This code results in

--- f1 runs in 0.7290558815002441 seconds ---
--- f2 runs in 0.0003840923309326172 seconds ---