5 AVL Trees |
In the last chapter, we designed and implemented a table ADT using binary search trees. We were interested in binary trees from the beginning because of their promise of speed compared to linear lists.
But we only get these speed improvements if our binary trees are arranged more or less optimally, with the tree's height as small as possible. If we insert and delete items in the tree in random order, then chances are that we'll come pretty close to this optimal tree.1
In “pathological” cases, search within binary search trees can be as slow as sequential search, or even slower when the extra bookkeeping needed for a binary tree is taken into account. For example, after inserting items into a BST in sorted order, we get something like the vines on the left and the right below. The BST in the middle below illustrates a more unusual case, a “zig-zag” BST that results from inserting items from alternating ends of an ordered list.
Unfortunately, these pathological cases can easily come up in practice, because sorted data in the input to a program is common. We could periodically balance the tree using some heuristic to detect that it is “too tall”. In the last chapter, in fact, we used a weak version of this idea, rebalancing when a stack overflow force it. We could abandon the idea of a binary search tree, using some other data structure. Finally, we could adopt some modifications to binary search trees that prevent the pathological case from occurring.
For the remainder of this book, we're only interested in the latter choice. We'll look at two sets of rules that, when applied to the basic structure of a binary search tree, ensure that the tree's height is kept within a constant factor of the minimum value. Although this is not as good as keeping the BST's height at its minimum, it comes pretty close, and the required operations are much faster. A tree arranged to rules such as these is called a balanced tree (see balanced tree). The operations used for minimizing tree height are said to rebalance (see rebalance) the tree, even though this is different from the sort of rebalancing we did in the previous chapter, and are said to maintain the tree's “balance.”
A balanced tree arranged according to the first set of rebalancing rules that we'll examine is called an AVL tree (see AVL tree), after its inventors, G. M. Adel'son-Vel'skii< and E. M. Landis. AVL trees are the subject of this chapter, and the next chapter will discuss red-black trees, another type of balanced tree.
In the following sections, we'll construct a table implementation based on AVL trees. Here's an outline of the AVL code:
143. <avl.h 143> = <Library License 1> #ifndef AVL_H #define AVL_H 1 #include <stddef.h> <Table types; tbl => avl 15> <AVL maximum height 145> <BST table structure; bst => avl 28> <AVL node structure 146> <BST traverser structure; bst => avl 62> <Table function prototypes; tbl => avl 16> #endif /* avl.h */
144. <avl.c 144> = <Library License 1> #include <assert.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include "avl.h" <AVL functions 147>
See also: [Knuth 1998b], sections 6.2.2 and 6.2.3; [Cormen 1990], section 13.4.
[1] This seems true intuitively, but there are some difficult mathematics in this area. For details, refer to [Knuth 1998b] theorem 6.2.2H, [Knuth 1977], and [Knuth 1978].