In our example, when we add India to the dataset, it is appended to the linked list stored at the index 5, then our table would look like this. Hash collision handling by separate chaining, uses an additional data structure, preferrably linked list for dynamic allocation, into buckets. Two basic methods are used to handle collisions. If our keys were random words from English, where there are so many words with same length, using length as a hashing function would be fairly useless. If our dataset had a string with thousand characters, and you make an array of thousand indices to store the data, it would result in a wastage of space.
Since the index 5 is already occupied, we have to make a call on what to do with it. What if we have one another word with 5 characters, “India”, and try assigning it to an index using our hash function. Taking the length of a string is nice and fast, and so is the process of finding the value associated with a given key (certainly faster than doing up to five string comparisons).īut, what do we do if our dataset has a string which has more than 11 characters? We do waste a bit of space because, for example, there are no 1-letter keys in our data, nor keys between 8 and 10 letters.īut in this case, the wasted space isn’t so bad either. Our array needs to be big enough to accommodate the longest string, but in this case that’s only 11 slots. Now, in this specific example things work quite well. So we store Cuba in the 4th position in the keys array, and Havana in the 4th index of the values array etc. So to put an item in the hash table, we compute its hash code (in this case, simply count the number of characters), then put the key and value in the arrays at the corresponding index.įor example, Cuba has a hash code (length) of 4. So let’s say we want to store the data in Table in the map.Īnd let us suppose that our hash function is to simply take the length of the string.įor simplicity, we will have two arrays: one for our keys and one for the values. Purely as an example to help us grasp the concept, let us suppose that we want to map a list of string keys to string values (for example, map a list of countries to their capital cities). The hash code, which is an integer, is then mapped to the fixed size we have.
The key, which is used to identify the data, is given as an input to the hashing function. In hash tables, you store data in forms of key and value pairs. Generally, these hash codes are used to generate an index, at which the value is stored. This so-called hash code (or simply hash) can then be used as a way to narrow down our search when looking for the item in the map. Hashing means using some function or algorithm to map object data to some representative integer value. Hashing is a technique to make things more efficient by effectively narrowing down the search at the outset. Even if the list of words are lexicographically sorted, like in a dictionary, you will still need some time to find the word you are looking for. Hashing is designed to solve the problem of needing to efficiently find or store an item in a collection.įor example, if we have a list of 10,000 words of English and we want to check if a given word is in the list, it would be inefficient to successively compare the word with all 10,000 items until we find a match.