std::map with std::vector as key -- complexity of lookup function-CodePudding

I have a set of N customers, indexed 0,...,N-1. Periodically, for some subset S of customers, I need to evaluate a function f(S). Computing f(S) is of linear complexity in |S|. The set S of customers is represented as an object of type std::vector<int>. The subsets that come up for evaluation can be of different size each time. [Since the order of customers in S does not matter, the set can as well be represented as an object of type std::set<int> or std::unordered_set<int>.]

In the underlying application, I may have the same subset S of customers come up multiple times for evaluation of f(S). Instead of incurring the needless linear complexity each time, I am looking to see if it would benefit from some sort of less computational burdensome lookup.

I am considering having a map of key-value pairs where the key is directly the vector of customers, std::vector<int> S and the value mapped to this key is f(S). That way, I am hoping that I can first check to see if a key already exists in the map, and if it does, I can look it up without having to compute f(.) again.

Having an std::map with std::vector as keys is well-defined. See, for e.g., here.

CPPReference indicates that map lookup time is logarithmic. But I suppose this is logarithmic in the number of keys where each key if of a constant length -- such as an int or a double, etc. How is the complexity affected where the key itself need not be of constant length and can be of arbitrary length upto size N?

Since the keys can themselves be of different sizes (subset of customers that come up for evaluation could be different each time), does this introduce any additional complexity in computing a hash function or the compare operation for the std::map? Is there any benefit to maintain the key as a binary array a fixed length N? This binary array is such that B_S[i]=1 if the ith customer is in set S, and it is 0 otherwise. Does this make the lookup any easier?

I am aware that ultimately the design choice between reevaluating f(S) each time versus using std::map would have to be done based on actual profiling of my application. However, before implementing both ideas (the std::map route is more difficult to code in my underlying application), I would like to know if there are any known pre-existing best-practices / benchmarks.

CodePudding user response：

Complexity of lookup in a map is O(log N) That is, roughly log N comparisons are needed when there are N elements in the map. The cost of the comparison itself adds to that linearly. For example when you compare M vectors with K elements, then there are roughly log N comparisons, each comparing M*K vector elements, ie in total O(M*K*log N).

However, asymptotic complexity is only that: Asymptotic complexity. When there are only a small number of elements in the map then lower order factors might outweigh the log N that only dominates for large N. Consequently, the actual runtime depends on your specific application and you need to measure to be sure.

Moreover, you shouldn't use vectors as keys in the first place. Its a waste of memory. Subsets of S can be enumerated with a n-bit integer when S has n elements (simply set the i-th bit when i-th element of S is in the subset). Comparing a single integer (or bitset) is surely more efficient than comparing vectors of integers.