Binary Search Trees
Most of the tree methods we have looked at operate on existing trees (count nodes, preorder traversal, etc.).
- How are trees constructed?
- Well, we do have a constructor which creates a tree (binary or non-binary) from a data object (the root data item) and a pair or list of subtrees. But it's not clear how to use the constructor to create a tree from a set of data.
- Are there methods analogous to the add(Object) and add(Object, index) methods for lists? That is, can we "add at the end", "add at the beginning", or "add at a given position"?
- The problem is that because of its multidimensional structure, a tree doesn't have a beginning or an end, and it is difficult to identify a position in a tree just by giving a number.
- At another level, we can say that the problem is that the parent-child relationship could be used to represent many kinds of real-world relationships between objects. The rules for constructing trees may depend on what sort of relationship is being represented.
Definition A binary search tree is a binary tree in which:
- The data objects in the tree obey a linear ordering.
- For each node n in the tree,
  - The data items in n's left subtree are less than or equal to the data item in n, and
  - The data items in n's right subtree are greater than or equal to the data item in n.
example A binary search tree containing integer data items:
Binary search trees have some interesting properties. For example, try applying an inorder traversal to the tree in my example. The result is:
13, 18, 23, 32, 36, 44, 54, 59, 64, 73, 81, 85, 92
The traversal produces a list of the elements in the tree, in increasing order. This is the result of the ordering property of a binary search tree.
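The ordering property can be checked with a small, self-contained sketch. The Node class, the insert helper, and the insertion order below are assumptions made for illustration; they are not the BinarySearchTree class used in these notes, and the shape of the resulting tree may differ from the example tree. Only the set of values is taken from the example.

```java
import java.util.ArrayList;
import java.util.List;

class InorderDemo {
    /** Minimal node type for illustration only (hypothetical, not the course class). */
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    /** Standard BST insertion; returns the (possibly new) root of the subtree. */
    static Node insert(Node root, int value) {
        if (root == null) return new Node(value);
        if (value < root.data) root.left = insert(root.left, value);
        else root.right = insert(root.right, value);
        return root;
    }

    /** Inorder traversal: left subtree, then the node itself, then right subtree. */
    static void inorder(Node root, List<Integer> out) {
        if (root == null) return;
        inorder(root.left, out);
        out.add(root.data);
        inorder(root.right, out);
    }

    /** Builds a tree from the given insertion order and returns its inorder listing. */
    static List<Integer> sortedContents(int[] insertionOrder) {
        Node root = null;
        for (int v : insertionOrder) root = insert(root, v);
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }

    public static void main(String[] args) {
        // Same values as the example; this insertion order is an arbitrary assumption.
        int[] values = {54, 23, 73, 13, 36, 64, 85, 18, 32, 44, 59, 81, 92};
        // prints [13, 18, 23, 32, 36, 44, 54, 59, 64, 73, 81, 85, 92]
        System.out.println(sortedContents(values));
    }
}
```

Whatever order the values are inserted in, the inorder listing comes out sorted, because every node is visited after everything in its left subtree and before everything in its right subtree.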
We can also perform an efficient search of a binary search tree. The algorithm can be written iteratively or recursively. I'll use the name "contains" to be consistent with the contains method of List in Java. Assume that the objects in the tree implement Comparable.
boolean contains(Comparable target){
    /* return true if target is found in this tree, false if it is not */
    /* iterative version; target is declared Comparable so that compareTo can be called */
    BinarySearchTree node = this;
    boolean found = false;
    while(!found && !node.isEmpty()){
        int cmp = target.compareTo(node.data);
        if(cmp == 0)
            found = true;
        else if(cmp < 0) /* target is less than data in this node, so go left */
            node = node.left;
        else /* target is greater than data in this node, so go right */
            node = node.right;
    }
    return found;
}
boolean contains(Comparable target){
    /* return true if target is found in this tree, false if it is not */
    /* recursive version */
    if(isEmpty())
        return false;
    else if(target.compareTo(getData()) == 0)
        return true;
    else if(target.compareTo(getData()) < 0) /* target is less than data in this node, so go left */
        return getLeft().contains(target);
    else /* target is greater than data in this node, so go right */
        return getRight().contains(target);
}
Note the similarity with the binary search algorithm we wrote for a sorted list in an array. The recursive algorithm has the same four cases.
Q: What is the running time of the search algorithm?
Binary Search Tree Insertion
When adding an item to a binary search tree, it is essential that the ordering property of the tree be preserved. The following algorithm does so. A new item is always inserted by creating a new node which becomes a leaf in the tree. The algorithm follows the same path from the root that would be taken in a search for the object being inserted, and continues until it comes to an empty subtree. This is the insertion point. The empty subtree is replaced by a node containing the added item.
example Add 26 and 50 to the example tree.
void add(Comparable obj){ /* iterative version */
    BinarySearchTree node = this;
    while(!node.isEmpty()){ /* search for insertion point */
        if(obj.compareTo(node.data) < 0) /* obj belongs in left subtree */
            node = node.left;
        else /* obj belongs in right subtree */
            node = node.right;
    }
    /* node is now the empty subtree at the insertion point; turn it into a leaf */
    node.data = obj;
    node.left = new BinarySearchTree();
    node.right = new BinarySearchTree();
}
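void add(Comparable obj){ /* recursive version */
    if(isEmpty()){ /* insertion point: this empty tree becomes a leaf */
        data = obj;
        left = new BinarySearchTree();
        right = new BinarySearchTree();
    }
    else if(obj.compareTo(data) < 0) /* obj belongs in left subtree */
        left.add(obj);
    else /* obj belongs in right subtree */
        right.add(obj);
}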
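The methods above are fragments of a class that is not shown in full. The following is a minimal sketch of one way they could fit together so that they compile and run. It makes several assumptions that are not stated in the notes: the class is generic over a Comparable type, an empty tree is represented by a node whose data field is null (so null cannot be stored as a data item), and the starting values 54, 23, 73 are arbitrary rather than the example tree.

```java
class BinarySearchTree<T extends Comparable<T>> {
    // Assumption: an empty tree is a node whose data is null.
    private T data;
    private BinarySearchTree<T> left;
    private BinarySearchTree<T> right;

    boolean isEmpty() { return data == null; }

    /** Iterative insertion: walk down to an empty subtree, then turn it into a leaf. */
    void add(T obj) {
        BinarySearchTree<T> node = this;
        while (!node.isEmpty()) {
            node = (obj.compareTo(node.data) < 0) ? node.left : node.right;
        }
        node.data = obj;
        node.left = new BinarySearchTree<>();
        node.right = new BinarySearchTree<>();
    }

    /** Iterative search: follow the same path that add would take. */
    boolean contains(T target) {
        BinarySearchTree<T> node = this;
        while (!node.isEmpty()) {
            int cmp = target.compareTo(node.data);
            if (cmp == 0) return true;
            node = (cmp < 0) ? node.left : node.right;
        }
        return false;
    }

    public static void main(String[] args) {
        BinarySearchTree<Integer> tree = new BinarySearchTree<>();
        for (int v : new int[]{54, 23, 73}) tree.add(v);  // arbitrary starting values
        tree.add(26);
        tree.add(50);
        System.out.println(tree.contains(26));  // prints true
        System.out.println(tree.contains(27));  // prints false
    }
}
```

Representing the empty tree as an object rather than as a null reference is what lets add overwrite the insertion point in place, without keeping track of the parent node.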
Q: What if obj.compareTo(data) == 0?
Q: What is the running time of the insertion algorithm?
Running time for both search and insert is O(h), where h is the height of the tree. How can we compare this with the running times of list algorithms, which are based on the number of nodes in the list?
In a binary tree with n nodes,
log2 n <= h <= n
So O(h) is somewhere between O(log n) and O(n).
In the worst case, the running time is O(n).
On average, the running time is O(log n). (Average over what?)
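The gap between the two bounds can be seen by inserting the same keys in two different orders. This is a sketch only: the Node class and helper names are hypothetical, and height is counted here as the number of nodes on the longest root-to-leaf path, which is an assumption about the convention behind the inequality above.

```java
class HeightDemo {
    /** Minimal node type for illustration only (hypothetical, not the course class). */
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    /** Standard BST insertion; returns the (possibly new) root of the subtree. */
    static Node insert(Node root, int value) {
        if (root == null) return new Node(value);
        if (value < root.data) root.left = insert(root.left, value);
        else root.right = insert(root.right, value);
        return root;
    }

    /** Height counted in nodes: an empty tree has height 0, a single leaf has height 1. */
    static int height(Node root) {
        if (root == null) return 0;
        return 1 + Math.max(height(root.left), height(root.right));
    }

    /** Inserts the keys in the given order and returns the height of the resulting tree. */
    static int heightAfterInserting(int[] keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        // Keys in increasing order: every node gets only a right child, so h = n.
        System.out.println(heightAfterInserting(new int[]{1, 2, 3, 4, 5, 6, 7}));  // prints 7
        // Same keys, middle-first order: the tree is perfectly balanced.
        System.out.println(heightAfterInserting(new int[]{4, 2, 6, 1, 3, 5, 7}));  // prints 3
    }
}
```

The same seven keys give a height of 7 or 3 depending only on the order in which they arrive.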
Conclusions:
- A binary search tree is an efficient data structure for storing an ordered set or list of data items.
- Sequential processing of the items can be performed by an inorder traversal in time O(n).
- On average, insertions and searches run in time O(log n). This is better than any of the data structures (array or linked list) that we studied for implementation of a list.
- Drawback: In the worst case, insertions and searches may run in time O(n).
Question: How can an object be removed from a binary search tree?
Question: Is it possible to implement the operation get(i) (that is, access by index) from the List interface?
Question: Keeping the tree in balance is essential to achieve O(log n) performance. How can a binary search tree be kept in balance?
There are two approaches: on-the-fly balancing and periodic rebalancing. "On-the-fly" means that a balance check is performed on every insertion into the tree. If the insertion would violate the balance criterion, a local restructuring is performed immediately to bring the tree back into balance. AVL trees and red-black trees are two examples of binary search trees which, by definition, are never allowed to be out of balance. Both of them have insertion algorithms which perform the insertion and any necessary rebalancing in O(log n) time in the worst case. Because the tree is never too far out of balance, their search times are also O(log n).
An alternative approach is to perform insertions using the standard algorithm. Over time, the tree may become unbalanced. In that case, the entire tree is rebuilt, putting it back in balance. The rebuilding algorithm works by first putting all the data items from the tree into an array, using an inorder traversal. A new tree can then be constructed from the array using a recursive algorithm:
- Choose the middle item from the array and make it the root of the tree.
- Make a balanced tree from the left half of the array and make it the left subtree of the tree being constructed.
- Make a balanced tree from the right half of the array and make it the right subtree of the tree being constructed.
Question: What is the running time of the rebalancing algorithm?
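The rebuilding procedure described above can be sketched as follows. The Node class and the helper names (chain, build, rebalance) are hypothetical, and an ArrayList stands in for the array. The sketch assumes integer data and represents an empty subtree as null, which differs from the BinarySearchTree class used earlier in these notes.

```java
import java.util.ArrayList;
import java.util.List;

class RebalanceDemo {
    /** Minimal node type for illustration only (hypothetical, not the course class). */
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    /** Collects the data items in increasing order by an inorder traversal. */
    static void inorder(Node root, List<Integer> out) {
        if (root == null) return;
        inorder(root.left, out);
        out.add(root.data);
        inorder(root.right, out);
    }

    /** Builds a balanced tree from items[lo..hi), with the middle item as the root. */
    static Node build(List<Integer> items, int lo, int hi) {
        if (lo >= hi) return null;                 // empty range: empty subtree
        int mid = lo + (hi - lo) / 2;
        Node root = new Node(items.get(mid));
        root.left = build(items, lo, mid);         // left half of the range
        root.right = build(items, mid + 1, hi);    // right half of the range
        return root;
    }

    /** Rebuilds the whole tree: inorder into a list, then recursive construction. */
    static Node rebalance(Node root) {
        List<Integer> items = new ArrayList<>();
        inorder(root, items);
        return build(items, 0, items.size());
    }

    static int height(Node root) {
        return root == null ? 0 : 1 + Math.max(height(root.left), height(root.right));
    }

    /** Builds the worst-case tree 1 -> 2 -> ... -> n, where every node has only a right child. */
    static Node chain(int n) {
        Node root = null;
        for (int v = n; v >= 1; v--) {
            Node node = new Node(v);
            node.right = root;
            root = node;
        }
        return root;
    }

    public static void main(String[] args) {
        Node unbalanced = chain(7);
        System.out.println(height(unbalanced));             // prints 7
        System.out.println(height(rebalance(unbalanced)));  // prints 3
    }
}
```

Passing index bounds to build instead of copying sub-arrays is a design choice of this sketch; the recursive structure matches the three steps listed above.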
Question: Can a binary search tree be used to implement a Map? How?
Recall the two fundamental Map operations:
- Object get(Object key);
- Object put(Object key, Object value);
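One possible direction, offered as a sketch rather than as the intended answer: store a key together with its value in each node and order the tree by key alone. The class name BstMap and its Node type are hypothetical. The sketch uses generics instead of the Object signatures above, represents an empty subtree as null, and assumes that put returns the previous value for the key (or null if there was none), as java.util.Map.put does.

```java
class BstMap<K extends Comparable<K>, V> {
    /** Each node holds one key-value pair; the tree is ordered by key only. */
    private class Node {
        K key;
        V value;
        Node left, right;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private Node root;  // null when the map is empty

    /** Returns the value stored under key, or null if the key is absent. */
    V get(K key) {
        Node n = root;
        while (n != null) {
            int cmp = key.compareTo(n.key);
            if (cmp == 0) return n.value;
            n = (cmp < 0) ? n.left : n.right;
        }
        return null;
    }

    /** Stores value under key; returns the previous value, or null if the key is new. */
    V put(K key, V value) {
        if (root == null) {
            root = new Node(key, value);
            return null;
        }
        Node n = root;
        while (true) {
            int cmp = key.compareTo(n.key);
            if (cmp == 0) {                        // key already present: overwrite
                V old = n.value;
                n.value = value;
                return old;
            } else if (cmp < 0) {
                if (n.left == null) { n.left = new Node(key, value); return null; }
                n = n.left;
            } else {
                if (n.right == null) { n.right = new Node(key, value); return null; }
                n = n.right;
            }
        }
    }

    public static void main(String[] args) {
        BstMap<String, Integer> ages = new BstMap<>();
        ages.put("bob", 31);
        ages.put("alice", 27);
        System.out.println(ages.get("alice"));    // prints 27
        System.out.println(ages.put("bob", 32));  // prints 31
        System.out.println(ages.get("carol"));    // prints null
    }
}
```

Unlike the add method earlier, put treats an equal key as a match and overwrites the value instead of inserting a second node, since a map holds at most one entry per key.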