I would like to create an extension function for a collection to check if one collection contains any item of defined set. I think about two implementations:
infix fun <T> Iterable<T>.containsAny(values: Iterable<T>): Boolean = any(values::contains)
or
infix fun <T> Iterable<T>.containsAny(values: Iterable<T>): Boolean = intersect(values).isNotEmpty()
The question is which way is more efficient and why? And is there any better solution?
CodePudding user response:
The first way with any is O(n*m) unless the parameter Iterable is a Set, in which case it's O(n).
The second way with intersect is O(n).
So the second way is much faster unless the parameter is already a Set or both inputs are so tiny that it's worth iterating repeatedly to avoid copying the receiver Iterable to a MutableSet.
The O(n) way could be improved to allow the early exit behavior of any by doing this:
infix fun <T> Iterable<T>.containsAny(values: Iterable<T>): Boolean =
any(values.toSet()::contains)
and further to avoid an unnecessary set copy:
infix fun <T> Iterable<T>.containsAny(values: Iterable<T>): Boolean =
any((values as? Set<T> ?: values.toSet())::contains)
And if the receiver Iterable is usually bigger than the parameter Iterable, you might want to swap which one is the set and which one is being iterated.
CodePudding user response:
Looking at the implementation intersect will take more memory as it creates an intermediate set.
My guess that any will be more efficient since it is quite straightforward. Ultimately it is better to just benchmark your use case with JMH.
