How does std::unordered_map determine the location of a specific key in a hash table?-CodePudding

The documentation mentions that std::unordered_map uses a hash table. How does it achieve O(1) lookup of a specific key in the hash table? The only way I can think of is to store each key at an address computed from the hash value of the data it holds. If this is the case, how does it keep all of the keys close together in memory such that it does not get used up quickly? Furthermore, what if more than one std::unordered_map is used? How does the implementation guarantee that no two maps will ever compute hashes that boil down to the same address?

CodePudding user response：

O(1) does not mean that the algorithm's complexity is "one shot", or, in this case, a value in the map must be retrieved with a single lookup, of some sort, based on its key.

O(1) really means O(amortized constant), that is: it takes a constant amount of time to complete, on average.

One possible implementation for an unordered map would be a hash table containing a linked list of all values that have the same hash key, and the unordered map dynamically adjusts the size of the hash table, as needed. It is entirely possible that an occasional map look up will have to search a small linked list to find the right hash key. Also, occasionally, an insert or delete operation will take an additional amount of time to rehash the entire unordered map.

If you look up the details of unordered maps' methods in your reference material you will find it explicitly mentioned that all existing to an unordered map get invalidated if a modification triggers a rehash.

But, on average, an unordered map lookup is expected to take a constant amount of time.

CodePudding user response：

Typically a hash map keeps an array of buckets inside. A bucket, on the other hand is a list of entries. And so something like this:

template<class TKey, class TValue>
class HashMap {
    vector<vector<pair<TKey, TValue>>> Buckets;
};

Then when you do a lookup, it simply takes the key, computes its hash, say hash, goes to Buckets[hash % Buckets.size], which is of type vector<pair<TKey,TValue>> and does a linear search on it. This makes the amortized (average) lookup complexity constant, while in the worst case (e.g. bad hashing function) it is linear. And indeed, this is the guarantee you get with the unordered_map.

Note that the top level vector will eventually grow when you add elements (maybe even when you remove), in order to allow more entries and avoid collisions. Rehashing is likely to take place in such case. And so adding/removing elements is not trivial, and can be expensive from time to time.

Also note that since vector allocates memory on the heap under the hood, then there is no issue with using more than one map. They share nothing (well, they may share something, but it doesn't affect the behaviour). But even if an implementation does not use vector (which is not mandatory, its just an example), some form of dynamic malloc has to be used to guarantee that.