Hashing

Why we need hashing?
- Hashing is usefull for searching
Hashing is a technique by which we can access key element data from hash table in O(1) time.
Here key should be mapped on hash table so its just like domain and range.
So element from domain are mapped to element on range, so it is like a relationship.
So mapping domain to range is done by a function.

There are 4 relational mapping

One-one mapping

Keys: 8, 3, 6, 10, 15, 9, 4
We have to map these keys to hash table and the function we are going to use is h(x) = x

We can say, we are not storing key directly, we are using hash function and that hash function is giving index and we are storing key at that index.
Now if we have to search for any key we will use hash function.
Let's search 9, h(9) = 9, so go to 9th index.
Functions supports two types of mapping:
1. one-one
2. many-one
h(x) = x, we call it as ideal hashing because the time taken for searching, storing or deleting an element is constant.

Drawback of ideal hashing

The space required is very huge.
If we have key value as 100 then hash table should have 100 as index also.

Now who is responsible for this draw back??

Hash function.

If we want to store key in small space then we have to modify hash function.

Let's see how we can modify hash function, if we are providing only 10 spaces from 0 to 9 then hash function should be modified.
h(x) = x % 10

Drawback of modulus hash function

In above table as we can see the key 25 should be stored in index 5 as h(25) = 5, but that index is already occupied.
There is no storage for 25, so two key are mapped at same index.
When two key are mapped at same location this is known as collision.
Hence h(x) = x % 10 is many-one function.

How to resolve these collisions

Methods for resolving collision

There are two major methods
1. Open hashing: Here we will consume extra space beyond the hash table given.
  - Here the method used is chaining.
2. Closed hashing: When space given is fixed and we have to use that space only, we will not increase the space.
  - Open addressing: If the given address is already occupied, then we will not forcely store other value in that space, we will use other free space.
  - So where will you store?
    - There are 3 options:
      1. Linear probing
      2. Quadratic probing
      3. Double hashing

Chaining

For resolving collision chaining is one of the mehod which comes under open hashing.

Now if we want to search 75, we will put it in hash function - h(75) = 5
Now search for 75 in chain of 5th index.

Linear probing

It is collision resolution technique which comes under closed hashing.
We will be using same hash function h(x) = x % 10, but if the index location is already occupied then linearly we will probe (search) the next empty space.
Our function was h(x) = x % 10
Modifying it to h'(x) = (h(x) + f(i)) % 10, where f(i) = i and i = 0, 1, 2, ...

Linear probing method is little time taking it is not O(1).

Quadratic probing

It is one the collision resoluting technique.
It comes under open addressing.
It is introduced to overcome the problem of clustering of key on after another.
h'(x) = (h(x) + f(i)) % 10 where f(i) = i² and i = 0, 1, 2, ...

Different hashing function

Mod
Mid square
Folding

Properties of hash functions

The hash function should be selected such that he value in hash table must be uniformly distributed whether you are using chaining, open addressing, linear probing, quadratic probing.
Hash function should be selected according to the data that we want to insert inside the hash table.
If we are using linear probing then the size of hash table should be double the number of element.

Mod

h(x) = (x % size ) + 1 ( if array index starts from 1).

Midsquare method

This method suggest that whatever the key is you do square of that key and take the middle digit.
Example: key = 11
then (11)² = 121
middle of 121 is 2, so 11 will be stored at index 2.

Folding method

In this method group of digits of key value are added together and the sum is the index where we have to store the key value. If the sum is still large then we can perform another hash function or we can again use folding method on it.

Example:

                                
key = 123347
 12
+33
+47
----- 
 92 
if 92 index is not avail then again perform folding method on 92
9 + 2 = 11 
store value in 11th index.