Chaining; Hash table performance

by admin last modified 2005-05-25 15:40

Collision handling with chaining; Hash table performance

Chaining

Open addressing suffers from secondary collisions, which occur when two keys with different hash values collide at some point in their respective probe sequences.  Secondary collisions can be avoided by isolating the keys which map to a given hash value from all other keys in the table.  This can be done with the technique called chaining, which uses linked lists (chains) to handle collisions.  We'll consider two types of chaining:  chained scatter tables and separate chaining.

Chained scatter tables

As with open addressing, the hash table is an array of entries, with each entry containing a key and a value.  Empty slots in the table are those in which key = null.  In addition, each entry contains a next field, which can be used to implement linked lists within the table.  So we have

class ScatterTableEntry {
    Object key;               // null marks an empty slot
    Object value;
    ScatterTableEntry next;   // next entry on this chain, or null

    // remaining members omitted
}

The hash table is an array of ScatterTableEntries.  It works as follows:  If a collision occurs on an insertion, a search is made for an unused slot in the table, which can be used for the new entry.  This slot is connected to the original target of the hash function in a linked list.  The idea is that it will be faster to search this linked list than to perform a linear search through the table, as we did with linear probing.  In more detail:

To insert (key0, value0):
  1. Compute the hashValue of key0;
  2. Let current = hashValue;
  3. If hashTable[current] is empty, then insert key0 and value0 there, set next = null, and return;
  4. If the key at hashTable[current] matches key0, then replace hashTable[current].value by value0 and return;
  5. If the key at hashTable[current] does not match key0, and hashTable[current].next is not null, let current be the index of the slot that hashTable[current].next refers to and go to step 4;
  6. If the key at hashTable[current] does not match key0, and hashTable[current].next is null, then begin looking for an empty slot;
  7. Let probe = (current+1)%tableSize;
  8. While hashTable[probe] is not empty, let probe = (probe+1)%tableSize;
  9. Enter key0 and value0 in hashTable[probe]; set hashTable[probe].next = null;
  10. Let hashTable[current].next refer to hashTable[probe].
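
The insertion steps above can be sketched in Java as follows.  This is only an illustration under some assumptions that are not part of the notes:  the class names IndexedEntry and ScatterTableSketch are hypothetical, the next field stores a slot index (with -1 standing in for null) so that the probe arithmetic in steps 7-8 stays simple, the hash function is just hashCode() reduced modulo the table size, and a full table is reported with an exception.

```java
// Hypothetical entry type: like ScatterTableEntry, but "next" is a slot index.
class IndexedEntry {
    Object key;      // null marks an empty slot
    Object value;
    int next = -1;   // index of the next entry on the chain, -1 if none (assumption)
}

class ScatterTableSketch {
    final IndexedEntry[] hashTable;

    ScatterTableSketch(int tableSize) {
        hashTable = new IndexedEntry[tableSize];
        for (int i = 0; i < tableSize; i++) {
            hashTable[i] = new IndexedEntry();
        }
    }

    // Assumed hash function: hashCode() reduced to a valid slot index.
    int hash(Object key) {
        return Math.floorMod(key.hashCode(), hashTable.length);
    }

    void insert(Object key0, Object value0) {
        int current = hash(key0);                         // steps 1-2
        if (hashTable[current].key == null) {             // step 3
            hashTable[current].key = key0;
            hashTable[current].value = value0;
            hashTable[current].next = -1;
            return;
        }
        while (true) {
            if (hashTable[current].key.equals(key0)) {    // step 4
                hashTable[current].value = value0;
                return;
            }
            if (hashTable[current].next == -1) {          // step 6
                break;
            }
            current = hashTable[current].next;            // step 5
        }
        // Steps 7-8: probe linearly for an empty slot, giving up after a full cycle.
        int probe = (current + 1) % hashTable.length;
        while (hashTable[probe].key != null) {
            if (probe == current) {
                throw new IllegalStateException("table is full");
            }
            probe = (probe + 1) % hashTable.length;
        }
        hashTable[probe].key = key0;                      // step 9
        hashTable[probe].value = value0;
        hashTable[probe].next = -1;
        hashTable[current].next = probe;                  // step 10
    }
}
```

For example, in a 7-slot table with Integer keys, inserting 3 and then 10 sends both keys to slot 3, so 10 is placed in slot 4 and slot 3's next field is set to 4.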
Q:  How would a search work?

Q:  How does this method compare with open addressing?

The resulting linked lists may coalesce.  It is possible for keys with different hash values to wind up on the same chain.  How?

This method is often used in hash tables implemented in random-access files, with the following variation:  If a collision occurs, instead of probing for an empty slot in the table, a new record (called an overflow record) is written to the end of the file, and then chained to the other records containing keys with the same hash value.

Separate chaining

Chained scatter tables avoid some, but not all, secondary collisions.  Separate chaining eliminates them altogether.

The basic idea is this:  Each position in the hash table is a linked list of entries, rather than just a single entry.  That is, the hash table is an array of linked lists.  When a (key, value) pair is inserted in the table, it is simply added to the linked list which it maps to.  It may collide with other keys with the same hash value, but it cannot collide with keys with different hash values, because they are in other linked lists.

This method takes advantage of dynamic memory allocation (as with the Java "new" operator), so it is normally used only when dynamic memory allocation is available.

To insert (key0, value0) in a chained hash table:
  1. Compute the hashValue of key0;
  2. Search the linked list at hashTable[hashValue] for an entry containing key0;
  3. If one is found, replace its value with value0;
  4. If one is not found, add (key0, value0) to this list;

To search the table for key0:
  1. Compute the hashValue of key0;
  2. Search the linked list at hashTable[hashValue] for an entry containing key0;
  3. If one is found, return its value;
  4. If one is not found, return null;
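
A minimal Java sketch of these two operations is given below.  The names ChainedTableSketch and Node are hypothetical, the hash function is assumed to be hashCode() reduced modulo the table size, and new entries are added at the front of the chain (the notes do not say where in the list a new entry goes).

```java
class ChainedTableSketch {
    // One link in a chain; each table slot holds the head of a chain (or null).
    private static class Node {
        final Object key;
        Object value;
        Node next;

        Node(Object key, Object value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] hashTable;
    private int count;   // number of keys currently stored

    ChainedTableSketch(int tableSize) {
        hashTable = new Node[tableSize];
    }

    // Assumed hash function: hashCode() reduced to a valid slot index.
    private int hash(Object key) {
        return Math.floorMod(key.hashCode(), hashTable.length);
    }

    void insert(Object key0, Object value0) {
        int hashValue = hash(key0);                                  // step 1
        for (Node n = hashTable[hashValue]; n != null; n = n.next) { // step 2
            if (n.key.equals(key0)) {
                n.value = value0;                                    // step 3
                return;
            }
        }
        // Step 4: not found, so allocate a new node at the head of the chain.
        hashTable[hashValue] = new Node(key0, value0, hashTable[hashValue]);
        count++;
    }

    Object search(Object key0) {
        int hashValue = hash(key0);                                  // step 1
        for (Node n = hashTable[hashValue]; n != null; n = n.next) { // step 2
            if (n.key.equals(key0)) {
                return n.value;                                      // step 3
            }
        }
        return null;                                                 // step 4
    }

    int size() {
        return count;
    }
}
```

Each insertion of a new key allocates a Node with new, which is the dynamic memory allocation the method relies on.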


Hash table performance 

Recall that our goal in studying hashing was to find a data structure in which insertions and searches could be performed with a running time of O(1).  Have we achieved that goal?

Q:  What is the worst case performance of a hash table using open addressing?

Q:  What is the worst case performance of a hash table using chaining?

Q:  If O(1) performance cannot be guaranteed, why would anyone use hashing?


Consider the average running time of hash table insertions and searches.
  • The average search time depends on how full the table is.
  • Open addressing:  searches end when an empty slot is reached, so there must be some empty slots to achieve good performance
  • Chaining:  the more entries there are in the table, the longer the chains will be
  • So, consider the load factor, defined to be (number of keys in the table) / (number of slots in the table)
  • The higher the load factor, the longer the average search time
What is the relationship between load factor and search time?  It can be analyzed mathematically.

For separate chaining, the analysis is easy:
  • Let f = the load factor.
  • Then, the average length of a chain is f.
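  • The average search time for a successful search is about 1 + f/2 comparisons:  one for the matching entry itself, plus about half of the other entries on its chain.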
  • The average search time for an unsuccessful search is f.

For open addressing, the analysis depends on the type of probing that is used.  Search times depend not only on what keys are in the table, but in what order they were entered.
  • If we assume that every probe is independent of every other probe (as it is with rehashing but not double hashing or linear probing),  we can say that the empty slots in the table are dispersed randomly.
  • If the load factor is f, then the fraction of empty slots is 1-f.
  • The average number of probes for an unsuccessful search is 1/(1-f).
  • The number of probes for a successful search for a key is the same as the number of probes that were required to insert it.  So the average number of probes to search for the ith item inserted is equal to the average number of probes required to perform an unsuccessful search in a table with load factor of about i/n, where n is the number of slots in the table.  By averaging over all possible values of i, it follows that the average number of probes for a successful search is (1/f)ln(1/(1-f)).
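
To get a feel for these two formulas, they can be evaluated for a few load factors.  The small Java sketch below does only that; the class and method names are hypothetical, and the results are meaningful only under the independent-probe assumption stated above and for 0 < f < 1.

```java
// Evaluates the open-addressing estimates for a given load factor f (0 < f < 1).
class ProbeEstimates {
    // Average probes for an unsuccessful search: 1/(1-f).
    static double unsuccessful(double f) {
        return 1.0 / (1.0 - f);
    }

    // Average probes for a successful search: (1/f) ln(1/(1-f)).
    static double successful(double f) {
        return Math.log(1.0 / (1.0 - f)) / f;
    }
}
```

At f = 0.5 the estimates are 2 probes for an unsuccessful search and about 1.39 for a successful one.  At f = 0.9 they rise to 10 and about 2.56.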

The key to getting the best performance from a hash table is to keep the load factor from getting too high.  How can this be done?

If the load factor reaches some threshold, rebuild the table with more slots.
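
One way this could look for a separately chained table is sketched below.  Everything specific here is an assumption made for illustration:  the class name ResizingTableSketch, the initial size of 4 slots, the threshold of 0.75, and the choice to double the number of slots on each rebuild.

```java
class ResizingTableSketch {
    private static class Node {
        final Object key;
        Object value;
        Node next;

        Node(Object key, Object value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private static final double MAX_LOAD = 0.75;  // hypothetical threshold
    private Node[] hashTable = new Node[4];       // hypothetical initial size
    private int count;                            // number of keys stored

    private static int hash(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    void insert(Object key0, Object value0) {
        int h = hash(key0, hashTable.length);
        for (Node n = hashTable[h]; n != null; n = n.next) {
            if (n.key.equals(key0)) {
                n.value = value0;
                return;
            }
        }
        hashTable[h] = new Node(key0, value0, hashTable[h]);
        count++;
        // Load factor = keys / slots; rebuild once it exceeds the threshold.
        if ((double) count / hashTable.length > MAX_LOAD) {
            rebuild(2 * hashTable.length);
        }
    }

    Object search(Object key0) {
        int h = hash(key0, hashTable.length);
        for (Node n = hashTable[h]; n != null; n = n.next) {
            if (n.key.equals(key0)) {
                return n.value;
            }
        }
        return null;
    }

    // Move every existing node into a new, larger array of chains.
    private void rebuild(int newSize) {
        Node[] old = hashTable;
        hashTable = new Node[newSize];
        for (Node head : old) {
            Node n = head;
            while (n != null) {
                Node following = n.next;       // remember the rest of the old chain
                int h = hash(n.key, newSize);  // slot index changes with table size
                n.next = hashTable[h];
                hashTable[h] = n;
                n = following;
            }
        }
    }

    int slots() {
        return hashTable.length;
    }

    int size() {
        return count;
    }
}
```

Note that every key has to be hashed again during a rebuild, because its slot index depends on the table size.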

Q:  How expensive is this?

Conclusions:

  • A hash table usually provides the fastest implementation of a search table, with searches and insertions that take O(1) time on average, as long as the load factor is not too high.  A hash table is a classic trade-off between space and time.  Time is saved, at the cost of extra space.
  • A hash table does not preserve ordering.  To produce an ordered list of the entries in a hash table it is necessary to sort them.
  • If both fast searching and fast sequential processing are necessary, a search tree is usually the best choice.  Database systems make heavy use of a type of search tree called a B-tree.



 
