Answers for Chapter 4 - GNU libavl 2.0.3

Chapter 4

1. This construct makes <bst.h 25> idempotent (see idempotent), that is, including it many times has the same effect as including it once. This is important because some C constructs, such as type definitions with typedef, are erroneous if included in a program multiple times.

Of course, <Table assertion function control directives 595> is included outside the #ifndef-protected part of <bst.h 25>. This is intentional (see Exercise 2.9-1 for details).

Section 4.2.2

1. Under many circumstances we often want to know how many items are in a binary tree. In these cases it's cheaper to keep track of item counts as we go instead of counting them each time, which requires a full binary tree traversal.

It would be better to omit it if we never needed to know how many items were in the tree, or if we only needed to know very seldom.

Section 4.2.3

1. The purpose for conditional definition of BST_MAX_HEIGHT is not to keep it from being redefined if the header file is included multiple times. There's a higher-level “include guard” for that (see Exercise 4-1), and, besides, identical definitions of a macro are okay in C. Instead, it is to allow the user to set the maximum height of binary trees by defining that macro before <bst.h 25> is #included. The limit can be adjusted upward for larger computers or downward for smaller ones.

The main pitfall is that a user program will use different values of BST_MAX_HEIGHT in different source files. This leads to undefined behavior. Less of a problem are definitions to invalid values, which will be caught at compile time by the compiler.

Section 4.3

2. The functions need to adjust the pointer from the rotated subtree's parent, so they take a double-pointer struct bst_node **. An alternative would be to accept two parameters: the rotated subtree's parent node and the bst_link[] index of the subtree.

/* Rotates right at *yp. */
static void 
rotate_right (struct bst_node **yp) 
{
  struct bst_node *y = *yp;
  struct bst_node *x = y->bst_link[0];
  y->bst_link[0] = x->bst_link[1];
  x->bst_link[1] = y;
  *yp = x;
}

/* Rotates left at *xp. */
static void 
rotate_left (struct bst_node **xp) 
{
  struct bst_node *x = *xp;
  struct bst_node *y = x->bst_link[1];
  x->bst_link[1] = y->bst_link[0];
  y->bst_link[0] = x;
  *xp = y;
}

Section 4.7

1. This is a dirty trick. The bst_root member of struct bst_table is not a struct bst_node, but we are pretending that it is by casting its address to struct bst_node *. We can get away with this only because the first member of struct bst_node * is bst_link, whose first element bst_link[0] is a struct bst_node *, the same type as bst_root. ANSI C guarantees that a pointer to a structure is a pointer to the structure's first member, so this is fine as long as we never try to access any member of *p except bst_link[0]. Trying to access other members would result in undefined behavior.

The reason that we want to do this at all is that it means that the tree's root is not a special case. Otherwise, we have to deal with the root separately from the rest of the nodes in the tree, because of its special status as the only node in the tree not pointed to by the bst_link[] member of a struct bst_node.

It is a good idea to get used to these kinds of pointer cast, because they are common in libavl.

As an alternative, we can declare an actual instance of struct bst_node, store the tree's bst_root into its bst_link[0], and copy its possibly updated value back into bst_root when done. This isn't very elegant, but it works. This technique is used much later in this book, in <TBST main copy function 281>. A different kind of alternative approach is used in Exercise 2.

2. Here, pointer-to-pointer q traverses the tree, starting with a pointer to the root, comparing each node found against item while looking for a null pointer. If an item equal to item is found, it returns a pointer to the item's data. Otherwise, q receives the address of the NULL pointer that becomes the new node, the new node is created, and a pointer to its data is returned.

622. <BST item insertion function, alternate version 622> =
void **
bst_probe (struct bst_table *tree, void *item) 
{
  struct bst_node **q;
  int cmp;

  assert (tree != NULL && item != NULL);

  for (q = &tree->bst_root; *q != NULL; q = &(*q)->bst_link[cmp > 0]) 
    {
      cmp = tree->bst_compare (item, (*q)->bst_data, tree->bst_param);
      if (cmp == 0)
        return &(*q)->bst_data;
    }

  *q = tree->bst_alloc->libavl_malloc (tree->bst_alloc, sizeof **q);
  if (*q == NULL)
    return NULL;

  (*q)->bst_link[0] = (*q)->bst_link[1] = NULL;
  (*q)->bst_data = item;
  tree->bst_count++;
  return &(*q)->bst_data;
}

3. The first item to be inserted have the value of the original tree's root. After that, at each step, we can insert either an item with the value of either child x of any node in the original tree corresponding to a node y already in the copy tree, as long as x's value is not already in the copy tree.

4. The function below traverses tree in “level order”. That is, it visits the root, then the root's children, then the children of the root's children, and so on, so that all the nodes at a particular level in the tree are visited in sequence.

See also: [Sedgewick 1998], Program 5.16.

623. <Level-order traversal 623> =
/* Calls visit for each of the nodes in tree in level order.
   Returns nonzero if successful, zero if out of memory. */
static int 
bst_traverse_level_order (struct bst_table *tree, bst_item_func *visit) 
{
  struct bst_node **queue;
  size_t head, tail;

  if (tree->bst_count == 0)
    return 1;

  queue = tree->bst_alloc->libavl_malloc (tree->bst_alloc, 
                                          sizeof *queue * tree->bst_count);
  if (queue == NULL)
    return 0;

  head = tail = 0;
  queue[head++] = tree->bst_root;
  while (head != tail) 
    {
      struct bst_node *cur = queue[tail++];
      visit (cur->bst_data, tree->bst_param);
      if (cur->bst_link[0] != NULL)
        queue[head++] = cur->bst_link[0];
      if (cur->bst_link[1] != NULL)
        queue[head++] = cur->bst_link[1];
    }
  tree->bst_alloc->libavl_free (tree->bst_alloc, queue);

  return 1;
}

Section 4.7.1

624. <Root insertion of existing node in arbitrary subtree 624> =
/* Performs root insertion of n at root within tree.
   Subtree root must not contain a node matching n.
   Returns nonzero only if successful. */
static int 
root_insert (struct bst_table *tree, struct bst_node **root, 
             struct bst_node *n) 
{
  struct bst_node *pa[BST_MAX_HEIGHT]; /* Nodes on stack. */
  unsigned char da[BST_MAX_HEIGHT];    /* Directions moved from stack nodes. */
  int k;                               /* Stack height. */

  struct bst_node *p; /* Traverses tree looking for insertion point. */

  assert (tree != NULL && n != NULL);

  <Step 1: Search for insertion point in arbitrary subtree 625>
  <Step 2: Insert n into arbitrary subtree 626>
  <Step 3: Move BST node to root 37>

  return 1;
}

625. <Step 1: Search for insertion point in arbitrary subtree 625> =
pa[0] = (struct bst_node *) root;
da[0] = 0;
k = 1;
for (p = *root; p != NULL; p = p->bst_link[da[k - 1]]) 
  {
    int cmp = tree->bst_compare (n->bst_data, p->bst_data, tree->bst_param);
    assert (cmp != 0);

    if (k >= BST_MAX_HEIGHT)
      return 0;

    pa[k] = p;
    da[k++] = cmp > 0;
  }

This code is included in 624.

626. <Step 2: Insert n into arbitrary subtree 626> =
pa[k - 1]->bst_link[da[k - 1]] = n;

This code is included in 624 and 627.

2. The idea is to optimize for the common case but allow for fallback to a slower algorithm that doesn't require a stack when necessary.

627. <Robust root insertion of existing node in arbitrary subtree 627> =
/* Performs root insertion of n at root within tree.
   Subtree root must not contain a node matching n.
   Never fails and will not rebalance tree. */
static void 
root_insert (struct bst_table *tree, struct bst_node **root, 
             struct bst_node *n) 
{
  struct bst_node *pa[BST_MAX_HEIGHT]; /* Nodes on stack. */
  unsigned char da[BST_MAX_HEIGHT];    /* Directions moved from stack nodes. */
  int k;                               /* Stack height. */
  int overflow = 0;                    /* Set nonzero if stack overflowed. */

  struct bst_node *p; /* Traverses tree looking for insertion point. */

  assert (tree != NULL && n != NULL);

  <Step 1: Robustly search for insertion point in arbitrary subtree 628>
  <Step 2: Insert n into arbitrary subtree 626>
  <Step 3: Robustly move BST node to root 629>
}

If the stack overflows while we're searching for the insertion point, we stop keeping track of any nodes but the last one and set overflow so that later we know that overflow occurred:

628. <Step 1: Robustly search for insertion point in arbitrary subtree 628> =
pa[0] = (struct bst_node *) root;
da[0] = 0;
k = 1;
for (p = *root; p != NULL; p = p->bst_link[da[k - 1]]) 
  {
    int cmp = tree->bst_compare (n->bst_data, p->bst_data, tree->bst_param);
    assert (cmp != 0);

    if (k >= BST_MAX_HEIGHT) 
      {
        overflow = 1;
        k--;
      }     

    pa[k] = p;
    da[k++] = cmp > 0;
  }

This code is included in 627.

Once we've inserted the node, we deal with the rotation in the same way as before if there was no overflow. If overflow occurred, we instead do the rotations one by one, with a full traversal from *root every time:

629. <Step 3: Robustly move BST node to root 629> =
if (!overflow)
  { 
    <Step 3: Move BST node to root 37> 
  }
else 
  {
    while (*root != n) 
      {
        struct bst_node **r; /* Link to node to rotate. */
        struct bst_node *q;  /* Node to rotate. */
        int dir;

        for (r = root; ; r = &q->bst_link[dir]) 
          {
            q = *r;
            dir = 0 < tree->bst_compare (n->bst_data, q->bst_data, 
                                         tree->bst_param);

            if (q->bst_link[dir] == n)
              break;
          }

        if (dir == 0) 
          {
            q->bst_link[0] = n->bst_link[1];
            n->bst_link[1] = q;
          } 
        else 
          {
            q->bst_link[1] = n->bst_link[0];
            n->bst_link[0] = q;
          }
        *r = n;
      }
  }

This code is included in 627.

3. One insertion order that does not require much stack is ascending order. If we insert 1...4 at the root in ascending order, for instance, we get a BST that looks like this:

If we then insert node 5, it will immediately be inserted as the right child of 4, and then a left rotation will make it the root, and we're back where we started without ever using more than one stack entry. Other obvious pathological orders such as descending order and “zig-zag” order behave similarly.

One insertion order that does require an arbitrary amount of stack space is to first insert 1...n in ascending order, then the single item 0. Each of the first group of insertions requires only one stack entry (except the first, which does not use any), but the final insertion uses n - 1.

If we're interested in high average consumption of stack space, the pattern consisting of a series of ascending insertions (n / 2 + 1)...n followed by a second ascending series 1...(n / 2), for even n, is most effective. For instance, each insertion for insertion order 6, 7, 8, 9, 10, 1, 2, 3, 4, 5 requires 0, 1, 1, 1, 1, 5, 6, 6, 6, 6 stack entries, respectively, for a total of 33.

These are, incidentally, the best possible results in each category, as determined by exhaustive search over the 10! == 3,628,800 possible root insertion orders for trees of 10 nodes. (Thanks to Richard Heathfield for suggesting exhaustive search.)

Section 4.8

1. Add this before the top-level else clause in <Step 2: Delete BST node 40>:

630. <Case 1.5 in BST deletion 630> =
else if (p->bst_link[0] == NULL) 
  q->bst_link[dir] = p->bst_link[1];

2. Be sure to look at Exercise 3 before actually making this change.

631. <Case 3 in BST deletion, alternate version 631> =
struct bst_node *s = r->bst_link[0];
while (s->bst_link[0] != NULL) 
  {
    r = s;
    s = r->bst_link[0];
  }
p->bst_data = s->bst_data;
r->bst_link[0] = s->bst_link[1];
p = s;

We could, indeed, make similar changes to the other cases, but for these cases the code would become more complicated, not simpler.

3. The semantics for libavl traversers only invalidate traversers with the deleted item selected, but the revised code would actually free the node of the successor to that item. Because struct bst_traverser keeps a pointer to the struct bst_node of the current item, attempts to use a traverser that had selected the successor of the deleted item would result in undefined behavior.

Some other binary tree libraries have looser semantics on their traversers, so they can afford to use this technique.

Section 4.9.1

1. It would probably be faster to check before each call rather than after, because this way many calls would be avoided. However, it might be more difficult to maintain the code, because we would have to remember to check for a null pointer before every call. For instance, the call to traverse_recursive() within walk() might easily be overlooked. Which is “better” is therefore a toss-up, dependent on a program's goals and the programmer's esthetic sense.

632. <Recursive traversal of BST, using nested function 632> =
void 
walk (struct bst_table *tree, bst_item_func *action, void *param) 
{
  void 
  traverse_recursive (struct bst_node *node) 
  {
    if (node != NULL) 
      {
        traverse_recursive (node->bst_link[0]);
        action (node->bst_data, param);
        traverse_recursive (node->bst_link[1]);
      }
  }

  assert (tree != NULL && action != NULL);
  traverse_recursive (tree->bst_root);
}

Section 4.9.2

1a. First of all, a minimal-height binary tree of n nodes has a height (see height) of about log2(n), that is, starting from the root and moving only downward, you can visit at most n nodes (including the root) without running out of nodes. Examination of the code should reveal to you that only moving down to the left pushes nodes on the stack and only moving upward pops nodes off. What's more, the first thing the code does is move as far down to the left as it can. So, the maximum height of the stack in a minimum-height binary tree of n nodes is the binary tree's height, or, again, about log2(n).

1b. If a binary tree has only left children, as does the BST on the left below, the stack will grow as tall as the tree, to a height of n. Conversely, if a binary tree has only right children, as does the BST on the right below, no nodes will be pushed onto the stack at all.

1c. It's only acceptable if it's known that the stack will not exceed the fixed maximum height (or if the program aborting with an error is itself acceptable). Otherwise, you should use a recursive method (but see part (e) below), or a dynamically extended stack, or a balanced binary tree library.

1d. Keep in mind this is not the only way or necessarily the best way to handle stack overflow. Our final code for tree traversal will rebalance the tree when it grows too tall.

633. <Iterative traversal of BST, with dynamically allocated stack 633> =
static void 
traverse_iterative (struct bst_node *node, bst_item_func *action, void *param) 
{
  struct bst_node **stack = NULL;
  size_t height = 0;
  size_t max_height = 0;

  for (;;) 
    {
      while (node != NULL) 
        {
          if (height >= max_height) 
            {
              max_height = max_height * 2 + 8;
              stack = realloc (stack, sizeof *stack * max_height);
              if (stack == NULL) 
                {
                  fprintf (stderr, "out of memory\n");
                  exit (EXIT_FAILURE);
                }
            }

          stack[height++] = node;
          node = node->bst_link[0];
        }

      if (height == 0)
        break;

      node = stack[--height];
      action (node->bst_data, param);
      node = node->bst_link[1];
    }

  free (stack);
}

1e. Yes, traverse_recursive() can run out of memory, because its arguments must be stored somewhere by the compiler. Given typical compilers, it will consume more memory per call than traverse_iterative() will per item on the stack, because each call includes two arguments not pushed on traverse_iterative()'s stack, plus any needed compiler-specific bookkeeping information.

Section 4.9.2.1

1. After calling bst_balance(), the structure of the binary tree may have changed completely, so we need to “find our place” again by setting up the traverser structure as if the traversal had been done on the rebalanced tree all along. Specifically, members node, stack[], and height of struct traverser need to be updated.

It is easy to set up struct traverser in this way, given the previous node in inorder traversal, which we'll call prev. Simply search the tree from the new root to find this node. Along the way, because the stack is used to record nodes whose left subtree we are examining, push nodes onto the stack as we move left down the tree. Member node receives prev->bst_link[1], just as it would have if no overflow had occurred.

A small problem with this approach is that it requires knowing the previous node in inorder, which is neither explicitly noted in struct traverser nor easy to find out. But it is easy to find out the next node: it is the smallest-valued node in the binary tree rooted at the node we were considering when the stack overflowed. (If you need convincing, refer to the code for next_item() above: the while loop descends to the left, pushing nodes as it goes, until it hits a NULL pointer, then the node pushed last is popped and returned.) So we can return this as the next node in inorder while setting up the traverser to return the nodes after it.

Here's the code:

634. <Handle stack overflow during BST traversal 634> =
struct bst_node *prev, *iter;

prev = node;
while (prev->bst_link[0] != NULL)
  prev = prev->bst_link[0];

bst_balance (trav->table);

trav->height = 0;
for (iter = trav->table->bst_root; iter != prev; )
  if (trav->table->bst_compare (prev->bst_data, iter->bst_data, 
                                trav->table->bst_param) < 0) 
    {
      trav->stack[trav->height++] = iter;
      iter = iter->bst_link[0];
    }
  else 
    iter = iter->bst_link[1];

trav->node = iter->bst_link[1];
return prev->bst_data;

Without this code, it is not necessary to have member table in struct traverser.

2. It is possible to write prev_item() given our current next_item(), but the result is not very efficient, for two reasons, both related to the way that struct traverser is used. First, the structure doesn't contain a pointer to the current item. Second, its stack doesn't contain pointers to trees that must be descended to the left to find a predecessor node, only those that must be descended to the right to find a successor node.

The next section will develop an alternate, more general method for traversal that avoids these problems.

Section 4.9.3

1. The bst_probe() function can't disturb any traversals. A change in the tree is only problematic for a traverser if it deletes the currently selected node (which is explicitly undefined: see Traversers) or if it shuffles around any of the nodes that are on the traverser's stack. An insertion into a tree only creates new leaves, so it can't cause either of those problems, and there's no need to increment the generation number.

The same logic applies to bst_t_insert(), presented later.

On the other hand, an insertion into the AVL and red-black trees discussed in the next two chapters can cause restructuring of the tree and thus potentially disturb ongoing traversals. For this reason, the insertion functions for AVL and red-black trees will increment the tree's generation number.

2. First, trav_refresh() is only called from bst_t_next() and bst_t_prev(), and these functions are mirrors of each other, so we need only show it for one of them.

Second, all of the traverser functions check the stack height, so these will not cause an item to be initialized at too high a height, nor will bst_t_next() or bst_t_prev() increase the stack height above its limit.

Since the traverser functions won't force a too-tall stack directly, this leaves the other functions. Only functions that modify the tree could cause problems, by pushing an item farther down in the tree.

There are only four functions that modify a tree. The insertion functions bst_probe() and bst_t_insert() can't cause problems, because they add leaves but never move around nodes. The deletion function bst_delete() does move around nodes in case 3, but it always moves them higher in the tree, never lower. Finally, bst_balance() always ensures that all nodes in the resultant tree are within the tree's height limit.

3. This won't work because the stack may contain pointers to nodes that have been deleted and whose memory have been freed. In ANSI C89 and C99, any use of a pointer to an object after the end of its lifetime results in undefined behavior, even seemingly innocuous uses such as pointer comparisons. What's worse, the memory for the node may already have been recycled for use for another, different node elsewhere in the tree.

This approach does work if there are never any deletions in the tree, or if we use some kind of generation number for each node that we store along with each stack entry. The latter would be overkill unless comparisons are very expensive and the traversals in changing trees are common. Another possibility would be to somehow only select this behavior if there have been no deletions in the binary tree since the traverser was last used. This could be done, for instance, with a second generation number in the binary tree incremented only on deletions, with a corresponding number kept in the traverser.

The following reimplements trav_refresh() to include this optimization. As noted, it will not work if there are any deletions in the tree. It does work for traversers that must be refreshed due to, e.g., rebalancing.

635. <BST traverser refresher, with caching 635> =
/* Refreshes the stack of parent pointers in trav
   and updates its generation number.
   Will *not* work if any deletions have occurred in the tree. */
static void 
trav_refresh (struct bst_traverser *trav) 
{
  assert (trav != NULL);

  trav->bst_generation = trav->bst_table->bst_generation;

  if (trav->bst_node != NULL) 
    {
      bst_comparison_func *cmp = trav->bst_table->bst_compare;
      void *param = trav->bst_table->bst_param;
      struct bst_node *node = trav->bst_node;
      struct bst_node *i = trav->bst_table->bst_root;
      size_t height = 0;

      if (trav->bst_height > 0 && i == trav->bst_stack[0])
        for (; height < trav->bst_height; height++) 
          {
            struct bst_node *next = trav->bst_stack[height + 1];
            if (i->bst_link[0] != next && i->bst_link[1] != next)
              break;
            i = next;
          }

      while (i != node) 
        {
          assert (height < BST_MAX_HEIGHT);
          assert (i != NULL);

          trav->bst_stack[height++] = i;
          i = i->bst_link[cmp (node->bst_data, i->bst_data, param) > 0];
        }

      trav->bst_height = height;
    }
}

Section 4.9.3.2

1. It only calls itself if it runs out of stack space. Its call to bst_balance() right before the recursive call ensures that the tree is short enough to fit within the stack, so the recursive call cannot overflow.

Section 4.9.3.6

1. The assignment statements are harmless, but memcpy() of overlapping regions produces undefined behavior.

Section 4.10.1

1a. Notice the use of & instead of && below. This ensures that both link fields get initialized, so that deallocation can be done in a simple way. If && were used instead then we wouldn't have any way to tell whether (*y)->bst_link[1] had been initialized.

636. <Robust recursive copy of BST, take 1 636> =
/* Stores in *y a new copy of tree rooted at x. 
   Returns nonzero if successful, or zero if memory was exhausted.*/
static int 
bst_robust_copy_recursive_1 (struct bst_node *x, struct bst_node **y) 
{
  if (x != NULL) 
    {
      *y = malloc (sizeof **y);
      if (*y == NULL)
        return 0;

      (*y)->bst_data = x->bst_data;
      if (!(bst_robust_copy_recursive_1 (x->bst_link[0], &(*y)->bst_link[0])
            & bst_robust_copy_recursive_1 (x->bst_link[1], 
                                           &(*y)->bst_link[1]))) 
        {
          bst_deallocate_recursive (*y);
          *y = NULL;
          return 0;
        }
    }
  else 
    *y = NULL;

  return 1;
}

Here's a needed auxiliary function:

637. <Recursive deallocation function 637> =
static void 
bst_deallocate_recursive (struct bst_node *node) 
{
  if (node == NULL)
    return;

  bst_deallocate_recursive (node->bst_link[0]);
  bst_deallocate_recursive (node->bst_link[1]);
  free (node);
}

1b.

638. <Robust recursive copy of BST, take 2 638> =
static struct bst_node error_node;

/* Makes and returns a new copy of tree rooted at x.
   If an allocation error occurs, returns &error_node. */
static struct bst_node *
bst_robust_copy_recursive_2 (struct bst_node *x) 
{
  struct bst_node *y;

  if (x == NULL)
    return NULL;

  y = malloc (sizeof *y);
  if (y == NULL)
    return &error_node;

  y->bst_data = x->bst_data;
  y->bst_link[0] = bst_robust_copy_recursive_2 (x->bst_link[0]);
  y->bst_link[1] = bst_robust_copy_recursive_2 (x->bst_link[1]);
  if (y->bst_link[0] == &error_node || y->bst_link[1] == &error_node) 
    {
      bst_deallocate_recursive (y);
      return &error_node;
    }

  return y;
}

2. Here's one way to do it, which is simple but perhaps not the fastest possible.

639. <Robust recursive copy of BST, take 3 639> =
/* Copies tree rooted at x to y, which latter is allocated but not 
   yet initialized. 
   Returns one if successful, zero if memory was exhausted. 
   In the latter case y is not freed but any partially allocated
   subtrees are. */
static int 
bst_robust_copy_recursive_3 (struct bst_node *x, struct bst_node *y) 
{
  y->bst_data = x->bst_data;

  if (x->bst_link[0] != NULL) 
    {
      y->bst_link[0] = malloc (sizeof *y->bst_link[0]);
      if (y->bst_link[0] == NULL)
        return 0;
      if (!bst_robust_copy_recursive_3 (x->bst_link[0], y->bst_link[0])) 
        {
          free (y->bst_link[0]);
          return 0;
        }
    }
  else 
    y->bst_link[0] = NULL;

  if (x->bst_link[1] != NULL) 
    {
      y->bst_link[1] = malloc (sizeof *y->bst_link[1]);
      if (y->bst_link[1] == NULL)
        return 0;
      if (!bst_robust_copy_recursive_3 (x->bst_link[1], y->bst_link[1])) 
        {
          bst_deallocate_recursive (y->bst_link[0]);
          free (y->bst_link[1]);
          return 0;
        }
    }
  else 
    y->bst_link[1] = NULL;

  return 1;      
}

Section 4.10.2

1. Here is one possibility.

640. <Intermediate step between bst_copy_recursive_2() and bst_copy_iterative() 640> =
/* Copies org to a newly created tree, which is returned. */
struct bst_table *
bst_copy_iterative (const struct bst_table *org) 
{
  struct bst_node *stack[2 * (BST_MAX_HEIGHT + 1)];
  int height = 0;

  struct bst_table *new;
  const struct bst_node *x;
  struct bst_node *y;

  new = bst_create (org->bst_compare, org->bst_param, org->bst_alloc);
  new->bst_count = org->bst_count;
  if (new->bst_count == 0)
    return new;

  x = (const struct bst_node *) &org->bst_root;
  y = (struct bst_node *) &new->bst_root;
  for (;;) 
    {
      while (x->bst_link[0] != NULL) 
        {
          y->bst_link[0] = 
            org->bst_alloc->libavl_malloc (org->bst_alloc,
                                           sizeof *y->bst_link[0]);
          stack[height++] = (struct bst_node *) x;
          stack[height++] = y;
          x = x->bst_link[0];
          y = y->bst_link[0];
        }
      y->bst_link[0] = NULL;

      for (;;) 
        {
          y->bst_data = x->bst_data;

          if (x->bst_link[1] != NULL) 
            {
              y->bst_link[1] = 
                org->bst_alloc->libavl_malloc (org->bst_alloc,
                                               sizeof *y->bst_link[1]);
              x = x->bst_link[1];
              y = y->bst_link[1];
              break;
            }
          else 
            y->bst_link[1] = NULL;

          if (height <= 2)
            return new;

          y = stack[--height];
          x = stack[--height];
        }
    }
}

Section 4.11.1

1. bst_copy() can set bst_data to NULL when memory allocation fails.

Section 4.13

1. Factoring out recursion is troublesome in this case. Writing the loop with an explicit stack exposes more explicitly the issue of stack overflow. Failure on stack overflow is not acceptable, because it would leave both trees in disarray, so we handle it by dropping back to a slower algorithm that does not require a stack.

This code also makes use of root_insert() from <Robust root insertion of existing node in arbitrary subtree 627>.

641. <BST join function, iterative version 641> =
/* Adds to tree all the nodes in the tree rooted at p. */
static void 
fallback_join (struct bst_table *tree, struct bst_node *p) 
{
  struct bst_node *q;

  for (; p != NULL; p = q)
    if (p->bst_link[0] == NULL) 
      {
        q = p->bst_link[1];
        p->bst_link[0] = p->bst_link[1] = NULL;
        root_insert (tree, &tree->bst_root, p);
      }
    else 
      {
        q = p->bst_link[0];
        p->bst_link[0] = q->bst_link[1];
        q->bst_link[1] = p;
      }
}

/* Joins a and b, which must be disjoint and have compatible 
   comparison functions.
   b is destroyed in the process. */
void 
bst_join (struct bst_table *ta, struct bst_table *tb) 
{
  size_t count = ta->bst_count + tb->bst_count;

  if (ta->bst_root == NULL)
    ta->bst_root = tb->bst_root;
  else if (tb->bst_root != NULL) 
    {
      struct bst_node **pa[BST_MAX_HEIGHT];
      struct bst_node *qa[BST_MAX_HEIGHT];
      int k = 0;

      pa[k] = &ta->bst_root;
      qa[k++] = tb->bst_root;
      while (k > 0) 
        {
          struct bst_node **a = pa[--k];
          struct bst_node *b = qa[k];

          for (;;) 
            {
              struct bst_node *b0 = b->bst_link[0];
              struct bst_node *b1 = b->bst_link[1];
              b->bst_link[0] = b->bst_link[1] = NULL;
              root_insert (ta, a, b);

              if (b1 != NULL) 
                {
                  if (k < BST_MAX_HEIGHT) 
                    {
                      pa[k] = &(*a)->bst_link[1];
                      qa[k] = b1;
                      if (*pa[k] != NULL)
                        k++;
                      else 
                        *pa[k] = qa[k];
                    } 
                  else 
                    {
                      int j;

                      fallback_join (ta, b0);
                      fallback_join (ta, b1);
                      for (j = 0; j < k; j++)
                        fallback_join (ta, qa[j]);

                      ta->bst_count = count;
                      free (tb);
                      bst_balance (ta);
                      return;
                    }
                }

              a = &(*a)->bst_link[0];
              b = b0;
              if (*a == NULL) 
                {
                  *a = b;
                  break;
                } 
              else if (b == NULL)
                break;
            }
        }
    }

  ta->bst_count = count;
  free (tb);
}

Section 4.14.1

1. Functions not used at all are bst_insert(), bst_replace(), bst_t_replace(), bst_malloc(), and bst_free().

Functions used explicitly within test() or functions that it calls are bst_create(), bst_find(), bst_probe(), bst_delete(), bst_t_init(), bst_t_first(), bst_t_last(), bst_t_insert(), bst_t_find(), bst_t_copy(), bst_t_next(), bst_t_prev(), bst_t_cur(), bst_copy(), and bst_destroy().

The trav_refresh() function is called indirectly by modifying the tree during traversal.

The copy_error_recovery() function is called if a memory allocation error occurs during bst_copy(). The bst_balance() function, and therefore also tree_to_vine(), vine_to_tree(), and compress(), are called if a stack overflow occurs. It is possible to force both these behaviors with command-line options to the test program.

2. Some kinds of errors mean that we can keep going and test other parts of the code. Other kinds of errors mean that something is deeply wrong, and returning without cleanup is the safest action short of terminating the program entirely. The third category is memory allocation errors. In our test program these are always caused intentionally in order to test out the BST functions' error recovery abilities, so a memory allocation error is not really an error at all, and we clean up and return successfully. (A real memory allocation error will cause the program to abort in the memory allocator. See the definition of mt_allocate() within <Memory tracker 127>.)

Section 4.14.1.1

1. The definition of size_t differs from one compiler to the next. All we know about it for sure is that it's an unsigned type appropriate for representing the size of an object. So we must convert it to some known type in order to pass it to printf(), because printf(), having a variable number of arguments, does not know what type to convert it into.

Incidentally, C99 solves this problem by providing a `z' modifier for printf() conversions, so that we could use "%zu" to print out size_t values without the need for a cast.

See also: [ISO 1999], section 7.19.6.1.

2. Yes.

Section 4.14.2

642. <Generate random permutation of integers 642> =
/* Fills the n elements of array[] with a random permutation of the
   integers between 0 and n - 1. */
static void 
permuted_integers (int array[], size_t n) 
{
  size_t i;

  for (i = 0; i < n; i++)
    array[i] = i;

  for (i = 0; i < n; i++) 
    {
      size_t j = i + (unsigned) rand () / (RAND_MAX / (n - i) + 1);
      int t = array[j];
      array[j] = array[i];
      array[i] = t;
    }
}

This code is included in 644.

2. All it takes is a preorder traversal. If the code below is confusing, try looking back at <Initialize smaller and larger within binary search tree 618>.

643. <Generate permutation for balanced tree 643> =
/* Generates a list of integers that produce a balanced tree when
   inserted in order into a binary tree in the usual way.
   min and max inclusively bound the values to be inserted.
   Output is deposited starting at *array. */
static void 
gen_balanced_tree (int min, int max, int **array) 
{
  int i;

  if (min > max)
    return;

  i = (min + max + 1) / 2;
  *(*array)++ = i;
  gen_balanced_tree (min, i - 1, array);
  gen_balanced_tree (i + 1, max, array);
}

This code is included in 644.

644. <Insertion and deletion order generation 644> =
<Generate random permutation of integers 642>
<Generate permutation for balanced tree 643>

/* Generates a permutation of the integers 0 to n - 1 into
   insert[] according to insert_order. */
static void 
gen_insertions (size_t n, enum insert_order insert_order, int insert[]) 
{
  size_t i;

  switch (insert_order) 
    {
    case INS_RANDOM:
      permuted_integers (insert, n);
      break;

    case INS_ASCENDING:
      for (i = 0; i < n; i++)
        insert[i] = i;
      break;

    case INS_DESCENDING:
      for (i = 0; i < n; i++)
        insert[i] = n - i - 1;
      break;

    case INS_BALANCED:
      gen_balanced_tree (0, n - 1, &insert);
      break;

    case INS_ZIGZAG:
      for (i = 0; i < n; i++)
        if (i % 2 == 0) 
          insert[i] = i / 2;
        else 
          insert[i] = n - i / 2 - 1;
      break;

    case INS_ASCENDING_SHIFTED:
      for (i = 0; i < n; i++) 
        {
           insert[i] = i + n / 2;
           if ((size_t) insert[i] >= n)
             insert[i] -= n;
        }
      break;

    case INS_CUSTOM:
      for (i = 0; i < n; i++)
        if (scanf ("%d", &insert[i]) == 0)
          fail ("error reading insertion order from stdin");
      break;

    default:
      assert (0);
    }
}

/* Generates a permutation of the integers 0 to n - 1 into
   delete[] according to delete_order and insert[]. */
static void 
gen_deletions (size_t n, enum delete_order delete_order,
               const int *insert, int *delete) 
{
  size_t i;

  switch (delete_order) 
    {
    case DEL_RANDOM:
      permuted_integers (delete, n);
      break;

    case DEL_REVERSE:
      for (i = 0; i < n; i++)
        delete[i] = insert[n - i - 1];
      break;

    case DEL_SAME:
      for (i = 0; i < n; i++)
        delete[i] = insert[i];
      break;

    case DEL_CUSTOM:
      for (i = 0; i < n; i++)
        if (scanf ("%d", &delete[i]) == 0)
          fail ("error reading deletion order from stdin");
      break;

    default:
      assert (0);
    }
}

This code is included in 98.

4. The function below is carefully designed. It uses time() to obtain the current time. The alternative clock() is a poor choice because it measures CPU time used, which is often more or less constant among runs. The actual value of a time_t is not portable, so it computes a “hash” of the bytes in it using a multiply-and-add technique. The factor used for multiplication normally comes out as 257, a prime and therefore a good candidate.

See also: [Knuth 1998a], section 3.2.1; [Aho 1986], section 7.6.

645. <Random number seeding 645> =
/* Choose and return an initial random seed based on the current time.
   Based on code by Lawrence Kirby <fred@genesis.demon.co.uk>. */
unsigned 
time_seed (void) 
{
  time_t timeval;	/* Current time. */
  unsigned char *ptr;	/* Type punned pointed into timeval. */
  unsigned seed;	/* Generated seed. */
  size_t i;

  timeval = time (NULL);
  ptr = (unsigned char *) &timeval;

  seed = 0;
  for (i = 0; i < sizeof timeval; i++)
    seed = seed * (UCHAR_MAX + 2u) + ptr[i];

  return seed;
}

This code is included in 98.

Section 4.14.3

646. <Overflow testers 125> +=
static int 
test_bst_t_last (struct bst_table *tree, int n) 
{
  struct bst_traverser trav;
  int *last;

  last = bst_t_last (&trav, tree);
  if (last == NULL || *last != n - 1) 
    {
      printf ("    Last item test failed: expected %d, got %d\n",
              n - 1, last != NULL ? *last : -1);
      return 0;
    }

  return 1;
}

static int 
test_bst_t_find (struct bst_table *tree, int n) 
{
  int i;

  for (i = 0; i < n; i++) 
    {
      struct bst_traverser trav;
      int *iter;

      iter = bst_t_find (&trav, tree, &i);
      if (iter == NULL || *iter != i) 
        {
          printf ("    Find item test failed: looked for %d, got %d\n",
                  i, iter != NULL ? *iter : -1);
          return 0;
        }
    }

  return 1;
}

static int 
test_bst_t_insert (struct bst_table *tree, int n) 
{
  int i;

  for (i = 0; i < n; i++) 
    {
      struct bst_traverser trav;
      int *iter;

      iter = bst_t_insert (&trav, tree, &i);
      if (iter == NULL || iter == &i || *iter != i) 
        {
          printf ("    Insert item test failed: inserted dup %d, got %d\n",
                  i, iter != NULL ? *iter : -1);
          return 0;
        }
    }

  return 1;
}

static int 
test_bst_t_next (struct bst_table *tree, int n) 
{
  struct bst_traverser trav;
  int i;

  bst_t_init (&trav, tree);
  for (i = 0; i < n; i++) 
    {
      int *iter = bst_t_next (&trav);
      if (iter == NULL || *iter != i) 
        {
          printf ("    Next item test failed: expected %d, got %d\n",
                  i, iter != NULL ? *iter : -1);
          return 0;
        }
    }

  return 1;
}

static int 
test_bst_t_prev (struct bst_table *tree, int n) 
{
  struct bst_traverser trav;
  int i;

  bst_t_init (&trav, tree);
  for (i = n - 1; i >= 0; i--) 
    {
      int *iter = bst_t_prev (&trav);
      if (iter == NULL || *iter != i) 
        {
          printf ("    Previous item test failed: expected %d, got %d\n",
                  i, iter != NULL ? *iter : -1);
          return 0;
        }
    }

  return 1;
}

static int 
test_bst_copy (struct bst_table *tree, int n) 
{
  struct bst_table *copy = bst_copy (tree, NULL, NULL, NULL);
  int okay = compare_trees (tree->bst_root, copy->bst_root);

  bst_destroy (copy, NULL);

  return okay;
}

Section 4.14.4

1. Attempting to apply an allocation policy to allocations of zero-byte blocks is silly. How could a failure be indicated, given that one of the successful results for an allocation of 0 bytes is NULL? At any rate, libavl never calls bst_allocate() with a size argument of 0.

See also: [ISO 1990], section 7.10.3.

Section 4.15

1. We'll use bsts_, short for “binary search tree with sentinel”, as the prefix for these functions. First, we need node and tree structures:

647. <BSTS structures 647> =
/* Node for binary search tree with sentinel. */
struct bsts_node 
  {
    struct bsts_node *link[2];
    int data;
  };

/* Binary search tree with sentinel. */
struct bsts_tree 
  {
    struct bsts_node *root;
    struct bsts_node sentinel;
    struct libavl_allocator *alloc;
  };

This code is included in 651.

Searching is simple:

648. <BSTS functions 648> =
/* Returns nonzero only if item is in tree. */
int 
bsts_find (struct bsts_tree *tree, int item) 
{
  const struct bsts_node *node;

  tree->sentinel.data = item;
  node = tree->root;
  while (item != node->data)
    if (item < node->data) 
      node = node->link[0];
    else 
      node = node->link[1];
  return node != &tree->sentinel;
}