Software Engineer

I am a Software Engineer. I have a Bachelor (Honours) of Science in Information Technology from the University of Sunderland - Class of 2003. I have been developing software since 2001 when I was offered a role at CERN as part of their Technical Student Programme.

By 2016 I had grown really tired of the software industry and by the end of 2019 Apple killed whatever excitement I had left. I am not sure what the next 10 years will bring. What I do know is that my apettite to do work that is impactful has only grown bigger and stronger. Great people make me tick more than anything.

I am also tired.

How to think of... a hash table (Part 3)

/**
     * The well known hash table, now with rehashing support
     */
    public final class HashTable 
    {
        private BucketList bucketList;

        private HashTable(BucketList buckets) 
        {
            this.bucketList = buckets;
        }

        private void rehash() {
            this.bucketList = BucketList.newBucketList(this.bucketList);
        }

        /*
         * If we run out of buckets let's make some more 
         */
        private void ensureAvailableBuckets() 
        {
            if(this.bucketList.isLoaded()) {
                this.rehash();
            }
        }
        
        public void put(int key, int value) 
        {
            this.ensureAvailableBuckets();
            
            int index = this.bucketList.bucketOf(key);      

            this.bucketList.put(index, key, value);
        }
    }

    /**
     * A bucket list as you can tell has a hash function
     */
    final class BucketList
    {
        private final HashFunction hashFunction;
        private final Value[] buckets;
        private int count;  //keeping track of values added to know the load factor

        private BucketList(Value[] buckets, HashFunction hashFunction) 
        {
            this.buckets = buckets;
            this.hashFunction = hashFunction;
        }   

        float load(){
        return this.count / this.size();
        }

        int bucketOf(int key) {
        return this.hashFunction.indexOf(key, this.size());
        }

        boolean isLoaded() {
        return this.hashFunction.isLoaded( this.load() );
        }

        int size() {
        return this.buckets.length;
        }

        public void put(int index, int key, int value) 
        {       
            Value current = this.buckets[index];
            
            this.buckets[index] = new Value(key, value, current);
            this.count++;
        }
    }

The hashtable indeed works as expected. We do have a key to retrieve a value, we do handle conflicts when putting one under the same index.

However, in a hashtable there is more than just looking up values based on keys. Take a look at the first bucket.

0 > [0, 3, 6, 9]

Given the fixed size of the hashtable imagine what would happen if we were to store only values with keys that came up with an index of 0.

0 > [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30 ...]

The bucket would grow so large that every time we would have to iterate through the whole list to find the value for the given key. Also the remaining 2 buckets would go to waste.

Unless you have a collection of values that you need to store with predefined keys so that you can come up with the perfect hash function that distributes them evenly, you need to think of a way to grow your bucket list and at the same time recalculate the index for each value, what is known as rehashing.

    /**
     * A hash function is a bucket list's close friend 
     */
    public interface HashFunction
    {
	public int indexOf(int key, int bucketSize);
	public boolean isLoaded(float load);
	public int newSize(int size);
    }

The reason you need to calculate the index again for each value when rehashing is because every time you get and put a value given the key, the hashtable needs to figure out the index based on its size, which after resizing, has changed. Therefore it will come up with a different answer as to where the value is.

	/* initial size of 3 */
	hashtable.put(6, 6); 
	//6 % 3 = 0
	
	/* after resizing to 5 */	
	hashtable.get(6);
	//6 % 5 = 1

As you can tell, rehashing can be quite expensive since not only there is a need to iterate through every value in the hashtable but also recalculate every index.

	public static BucketList newBucketList(int size, HashFunction hashFunction) {
	return new BucketList(new Value[size], hashFunction);
	}

	public static BucketList newBucketList(BucketList bucketList) 
	{
		int size = bucketList.hashFunction.newSize( bucketList.size() );
		
		BucketList newBucketList = 
			newBucketList(size, bucketList.hashFunction);		
		
		for (Value value : bucketList.buckets)
		{
			while(value != null) 
			{
				int index = newBucketList.bucketOf(value.key);
				newBucketList.put(index, value.key, value.value);
				
				value = value.next;
			}				
		}

	return newBucketList;
	}

What you get in return though is spreading the values across all the buckets, which was the problem that lead to all this.

The question then becomes when is the right time to rehash? The most obvious answer would be when there is one value in every bucket and no buckets are left. There is however something that is called a load factor which shows how full your hashtable is. The load factor is simply the ratio between the number of values and the hashtable size.

In the next part we’ll see how the size of a hash table has an affect on rehashing, given a “bad collection” of keys.