Subscribe by Email

Monday, January 4, 2010

Overview of Collision Resolution in Hashing

Collisions are practically unavoidable when hashing a random subset of a large set of possible keys.Therefore, most hash table implementations have some collision resolution strategy to handle such events.

- Load factor : The performance of most collision resolution methods does not depend directly on the number n of stored entries, but depends strongly on the table's load factor, the ratio n/s between n and the size s of its bucket array. With a good hash function, the average look up cost is nearly constant as the load factor increases from 0 up to 0.7 or so. Beyond that point, the probability of collisions and the cost of handling them increases.
- Separate chaining : Each slot of the bucket array is a pointer to a linked list that contains the key-value pairs that hashed to the same location. Look-up requires scanning the list for an entry with the given key. Insertion requires appending a new entry record to either end of the list in the hashed slot. Deletion requires searching the list and removing the element.
* Chained hash tables with linked lists are popular because they require only basic data structures with simple algorithms, and can use simple hash functions that would be unsuitable for other methods.
* Chained hash tables remain effective even when the number of entries n is much higher than the number of slots.
* For separate-chaining, the worst-case scenario is when all entries were inserted into the same bucket, in which case the hash table is ineffective and the cost is that of searching the bucket data structure.
* Chained hash tables also inherit the disadvantages of linked lists.

No comments:

Facebook activity