I often run into this situation while coding in Python, and am not sure which is more performant. Suppose I have a list l = [3, 13, 6, 8, 9, 53], and I want to use a list comprehension to create a new list that subtracts off the minimum value so that the lowest number is zero. I could do:
[x - min(l) for x in l]
On the other hand, I could do:
min_val = min(l)
[x - min_val for x in l]
Is it true that the first option causes min(l) to run for new item in the list, while the second option only calculates the minimum value once? Potentially, if l is a very long list, this could be a significant performance difference? On the other hand, perhaps with a shorter list, the creation of a variable in option 2 results in some overhead?
I guess my question is: how much of a difference does this make? I find the first option cleaner and more compact, but don't know if that comes at a performance cost.
Relatedly, is there any performance difference between:
new_list = []
for x in l:
new_list.append(x ** 2)
and:
new_list = [x ** 2 for x in l]
Does the latter directly translate into the former, or is there a difference in what goes on under the hood?
CodePudding user response:
There is this really cool module called timeit ... with wich you can ... timeit. Don't ask, measure:
import timeit
def a():
l = list(range(100))[::-1]
return [x - min(l) for x in l]
def b():
l = list(range(100))[::-1]
min_val = min(l)
return [x - min_val for x in l]
# do 100 calls to the given function, average results to avoid
# caching mishaps
print("min inside list comp: ", timeit.timeit(a, number=100))
print("min outside list comp:", timeit.timeit(b, number=100))
Output:
min inside list comp: 0.008259299998826464
min outside list comp: 0.0010576999993645586
CodePudding user response:
I prefer the timeit calls from the commandline in this case as this problem isn't too complicated:
$ python -m timeit --setup "x = [i for i in range(1000)]" "[i - min(x) for i in x]"
50 loops, best of 5: 8.07 msec per loop
$ python -m timeit --setup "x = [i for i in range(1000)]" "y = min(x);[i - y for i in x]"
10000 loops, best of 5: 36.8 usec per loop
Where you can see very clearly that there is a tremendous benefit to storing the minimum value before the list comprehension.
CodePudding user response:
Yes, min(l) will be called for every iteration in the list comprehension. You can test this using your own function, as @jarmod has mentioned.
Yes, this will add complexity compared to storing it in a variable, because min(l) is O(n). That makes your code with the method call inside comprehension O(n^2) vs just O(n) if you store it in a variable beforehand.
And yes, the other example (list comprehension vs append) is equivalent, as shown in the docs.
CodePudding user response:
It has a very large impact, you can run this quick test to see for yourself:
from datetime import datetime as dt
now = dt.now
l = []
#create list
for i in range(20000):
l.append(i)
#mark time
print(str(now().time()))
#save min first
min_val = min(l)
new = [x - min_val for x in l]
#mark time
print(str(now().time()))
#do min() each time
[x - min(l) for x in l]
#mark time
print(str(now().time()))
Output:
15:17:38.392065
15:17:38.393058
15:17:40.592543
The first ran instantly and the second took over two seconds and that is only for 20,000 items.
So practically a large difference even though it would be expected due to the change in time complexity. (O(n) to O(n^2))
CodePudding user response:
import time
from functools import lru_cache
l = [i for i in range(10000)]
min_val = min(l)
def f1():
min_l = [x - min(l) for x in l]
def f2():
min_l = [x - min_val for x in l]
@lru_cache
def lru_min(x):
return min(x)
if __name__ == '__main__':
start_time = time.time()
f1()
print("--- f1 runs in %s seconds ---" % (time.time() - start_time))
start_time = time.time()
f2()
print("--- f2 runs in %s seconds ---" % (time.time() - start_time))
This code results in
--- f1 runs in 0.7290558815002441 seconds ---
--- f2 runs in 0.0003840923309326172 seconds ---
