Let's use a simple example: say I have a list of lists
ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
and I want to find all longest lists, which means all lists that maximize the len function. Of course we can do
def func(x):
return len(x)
maxlen = func(max(ll, key=lambda x: func(x)))
res = [l for l in ll if func(l) == maxlen]
print(res)
Output
[[1, 2, 3], [2, 3, 4]]
But I wonder if there are more efficient way to do this, especially when the function is very expensive or the list is very long. Any suggestions?
CodePudding user response:
when the function is very expensive
Note that you do compute func(x) for each element twice, first there
maxlen = func(max(ll, key=lambda x: func(x)))
then there
res = [l for l in ll if func(l) == maxlen]
so it would be beneficial to store what was already computed. functools.lru_cache allow that easily just replace
def func(x):
return len(x)
using
import functools
@functools.lru_cache(maxsize=None)
def func(x):
return len(x)
However, beware as due to way how data are stored argument(s) must be hashable, so in your example you would first need convert list e.g. to tuples i.e.
ll = [(1, 2), (1, 3), (2, 3), (1, 2, 3), (2, 3, 4)]
See descripiton in docs for further discussion
CodePudding user response:
Is not OK use dictionary like below, (this is O(n))
ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
from collections import defaultdict
dct = defaultdict(list)
for l in ll:
dct[len(l)].append(l)
dct[max(dct)]
Output:
[[1, 2, 3], [2, 3, 4]]
>>> dct
defaultdict(list, {2: [[1, 2], [1, 3], [2, 3]], 3: [[1, 2, 3], [2, 3, 4]]})
OR use setdefault and without defaultdict like below:
ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
dct = {}
for l in ll:
dct.setdefault(len(l), []).append(l)
Output:
>>> dct
{2: [[1, 2], [1, 3], [2, 3]], 3: [[1, 2, 3], [2, 3, 4]]}
CodePudding user response:
From a computer science/algorithms perspective, this is a very classical "reduce" problem.
so, pseudocode. It's honestly very straightforward.
metric():= a mapping from elements to non-negative numbers
winner = []
maxmetric = 0
for element in ll:
if metric(element) larger than maxmetric:
winner = [ element ]
maxmetric = metric(element)
else if metric(element) equal to maxmetric:
append element to winner
