Software Product Development | Software Testing Tutorial | Software Process: Collision Resolution in Hashing

Showing posts with label Collision Resolution in Hashing. Show all posts

Tuesday, January 5, 2010

Open Addressing - a way to deal with collisions in hashing

Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternate locations in the array (the probe sequence) until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table.
In this technique all elements are stored in the hash table itself. That is, each table entry contains either an element or NIL. When searching for element (or empty slot), we systematically examine slots until we find an element (or empty slot). There are no lists and no elements stored outside the table. This implies that table can completely "fill up"; the load factor α can never exceed 1.
Advantage of this technique is that it avoids pointers (pointers need space too). Instead of chasing pointers, we compute the sequence of slots to be examined.
To perform insertion, we successively examine or probe, the hash table until we find an empty slot. The sequence of slots probed "depends upon the key being inserted." To determine which slots to probe, the hash function includes the probe number as a second input.
Well known probe sequences include:
* Linear probing in which the interval between probes is fixed (usually 1).
* Quadratic probing in which the interval between probes increases by some constant (usually 1) after each probe.
* Double hashing in which the interval between probes is computed by another hash function.

Drawbacks :
- Open addressing schemes is that the number of stored entries cannot exceed the number of slots in the bucket array.
- Open addressing schemes also put more stringent requirements on the hash function.
- Open addressing only saves memory if the entries are small.
- Normal open addressing is a poor choice for large elements, since these elements fill entire cache lines, and a large amount of space is wasted on large empty table slots.

Separate Chaining

A scheme in which each position in the hash table has a list to handle collisions. Each position may be just a link to the list (direct chaining) or may be an item and a link, essentially, the head of a list. In the latter, one item is in the table, and other colliding items are in the list.
Also known as external chaining.

Separate chaining with list heads :
Some chaining implementations store the first record of each chain in the slot array itself. The purpose is to increase cache efficiency of hash table access. In order to save memory space, such hash tables are often designed to have about as many slots as stored entries, meaning that many slots will have two or more entries.

Separate chaining with other structures :
Instead of a list, one can use any other data structure that supports the required operations. By using a self-balancing tree, for example, the theoretical worst-case time of a hash table can be brought down to O(log n) rather than O(n). However, this approach is worth the trouble and extra memory cost only if long delays must be avoided at all costs or if one expects to have many entries hashed to the same slot.
The variant called array hashing uses a dynamic array to store all the entries that hash to the same bucket.Each inserted entry gets appended to the end of the dynamic array that is assigned to the hashed slot.

Monday, January 4, 2010

Overview of Collision Resolution in Hashing

Collisions are practically unavoidable when hashing a random subset of a large set of possible keys.Therefore, most hash table implementations have some collision resolution strategy to handle such events.

- Load factor : The performance of most collision resolution methods does not depend directly on the number n of stored entries, but depends strongly on the table's load factor, the ratio n/s between n and the size s of its bucket array. With a good hash function, the average look up cost is nearly constant as the load factor increases from 0 up to 0.7 or so. Beyond that point, the probability of collisions and the cost of handling them increases.
- Separate chaining : Each slot of the bucket array is a pointer to a linked list that contains the key-value pairs that hashed to the same location. Look-up requires scanning the list for an entry with the given key. Insertion requires appending a new entry record to either end of the list in the hashed slot. Deletion requires searching the list and removing the element.
* Chained hash tables with linked lists are popular because they require only basic data structures with simple algorithms, and can use simple hash functions that would be unsuitable for other methods.
* Chained hash tables remain effective even when the number of entries n is much higher than the number of slots.
* For separate-chaining, the worst-case scenario is when all entries were inserted into the same bucket, in which case the hash table is ineffective and the cost is that of searching the bucket data structure.
* Chained hash tables also inherit the disadvantages of linked lists.

Software Product Development | Software Testing Tutorial | Software Process

Tuesday, January 5, 2010

Open Addressing - a way to deal with collisions in hashing

Separate Chaining

Monday, January 4, 2010

Overview of Collision Resolution in Hashing

Facebook activity

Feed

Search Site

Popular Posts

Blog Archive

Contributors

Followers

Pageviews