The human behind Windmill - A companion for seamless iOS development.

How To Think Of... A Hash Table (part 3)

The hashtable indeed works as expected.
We do have a key to retrieve a value, we do handle conflicts when putting one under the same index.

However, in a hashtable there is more than just looking up values based on keys.
Take a look at the first bucket.

0 > [0, 3, 6, 9]

Given the fixed size of the hashtable imagine what would happen if we were to store only values with keys that came up with an index of 0.

0 > [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30 ...]

The bucket would grow so large that every time we would have to iterate through the whole list to find the value for the given key. Also the remaining 2 buckets would go to waste.

Unless you have a collection of values that you need to store with predefined keys so that you can come up with the perfect hash function that distributes them evenly, you need to think of a way to grow your bucket list and at the same time recalculate the index for each value, what is known as rehashing.

The reason you need to calculate the index again for each value when rehashing is because every time you get and put a value given the key, the hashtable needs to figure out the index based on its size, which after resizing, has changed. Therefore it will come up with a different answer as to where the value is.

As you can tell, rehashing can be quite expensive since not only there is a need to iterate through every value in the hashtable but also recalculate every index.

What you get in return though is spreading the values across all the buckets, which was the problem that lead to all this.

The question then becomes when is the right time to rehash? The most obvious answer would be when there is one value in every bucket and no buckets are left. There is however something that is called a load factor which shows how full your hashtable is. The load factor is simply the ratio between the number of values and the hashtable size.

In the next part we'll see how the size of a hash table has an affect on rehashing, given a "bad collection" of keys.