Copying a TBST - GNU libavl 2.0.3

7.9 Copying

We can use essentially the same algorithm to copy threaded BSTs as unthreaded (see <BST copy function 84>). Some modifications are necessary, of course. The most obvious change is that the threads must be set up. This is not hard. We can do it the same way that tbst_probe() does.

Less obvious is the way to get rid of the stack. In bst_copy(), the stack was used to keep track of as yet incompletely processed parents of the current node. When we came back to one of these nodes, we did the actual copy of the node data, then visited the node's right subtree, if non-empty.

In a threaded tree, we can replace the use of the stack by the use of threads. Instead of popping an item off the stack when we can't move down in the tree any further, we follow the node's right thread. This brings us up to an ancestor (parent, grandparent, ...) of the node, which we can then deal with in the same way as before.

This diagram shows the threads that would be followed to find parents in copying a couple of different threaded binary trees. Of course, the TBSTs would have complete sets of threads, but only the ones that are followed are shown:

Why does following the right thread from a node bring us to one of the node's ancestors? Consider the algorithm for finding the successor of a node with no right child, described earlier (see Better Iterative Traversal). This algorithm just moves up the tree from a node to its parent, grandparent, etc., guaranteeing that the successor will be a ancestor of the original node.

How do we know that following the right thread won't take us too far up the tree and skip copying some subtree? Because we only move up to the right one time using that same algorithm. When we move up to the left, we're going back to some binary tree whose right subtree we've already dealt with (we are currently in the right subtree of that binary tree, so of course we've dealt with it).

In conclusion, following the right thread always takes us to just the node whose right subtree we want to copy next. Of course, if that node happens to have an empty right subtree, then there is nothing to do, so we just continue along the next right thread, and so on.

The first step is to build a function to copy a single node. The following function copy_node() does this, creating a new node as the child of an existing node:

279. <TBST node copy function 279> =
/* Creates a new node as a child of dst on side dir.
   Copies data from src into the new node, applying copy(), if non-null.
   Returns nonzero only if fully successful.
   Regardless of success, integrity of the tree structure is assured,
   though failure may leave a null pointer in a tbst_data member. */
static int 
copy_node (struct tbst_table *tree, 
           struct tbst_node *dst, int dir,
           const struct tbst_node *src, tbst_copy_func *copy) 
{
  struct tbst_node *new = 
    tree->tbst_alloc->libavl_malloc (tree->tbst_alloc, sizeof *new);
  if (new == NULL)
    return 0;

  new->tbst_link[dir] = dst->tbst_link[dir];
  new->tbst_tag[dir] = TBST_THREAD;
  new->tbst_link[!dir] = dst;
  new->tbst_tag[!dir] = TBST_THREAD;
  dst->tbst_link[dir] = new;
  dst->tbst_tag[dir] = TBST_CHILD;

  if (copy == NULL)
    new->tbst_data = src->tbst_data;
  else 
    {
      new->tbst_data = copy (src->tbst_data, tree->tbst_param);
      if (new->tbst_data == NULL)
        return 0;
    }

  return 1;
}

This code is included in 280.

Using the node copy function above, constructing the tree copy function is easy. In fact, the code is considerably easier to read than our original function to iteratively copy an unthreaded binary tree (see Handling Errors in Iterative BST Copying), because this function is not as heavily optimized.

One tricky part is getting the copy started. We can't use the dirty trick from bst_copy() of casting the address of a bst_root to a node pointer, because we need access to the first tag as well as the first link (see Exercise 2 for a way to sidestep this problem). So instead we use a couple of “pseudo-root” nodes rp and rq, allocated locally.

280. <TBST copy function 280> =
<TBST node copy function 279>
<TBST copy error helper function 282>
<TBST main copy function 281>

This code is included in 253.

281. <TBST main copy function 281> =
struct tbst_table *
tbst_copy (const struct tbst_table *org, tbst_copy_func *copy,
          tbst_item_func *destroy, struct libavl_allocator *allocator) 
{
  struct tbst_table *new;

  const struct tbst_node *p;
  struct tbst_node *q;
  struct tbst_node rp, rq;

  assert (org != NULL);
  new = tbst_create (org->tbst_compare, org->tbst_param,
                     allocator != NULL ? allocator : org->tbst_alloc);
  if (new == NULL)
    return NULL;

  new->tbst_count = org->tbst_count;
  if (new->tbst_count == 0)
    return new;

  p = &rp;
  rp.tbst_link[0] = org->tbst_root;
  rp.tbst_tag[0] = TBST_CHILD;

  q = &rq;
  rq.tbst_link[0] = NULL;
  rq.tbst_tag[0] = TBST_THREAD;

  for (;;) 
    {
      if (p->tbst_tag[0] == TBST_CHILD) 
        {
          if (!copy_node (new, q, 0, p->tbst_link[0], copy)) 
            {
              copy_error_recovery (rq.tbst_link[0], new, destroy);
              return NULL;
            }

          p = p->tbst_link[0];
          q = q->tbst_link[0];
        } 
      else 
        {
          while (p->tbst_tag[1] == TBST_THREAD) 
            {
              p = p->tbst_link[1];
              if (p == NULL) 
                {
                  q->tbst_link[1] = NULL;
                  new->tbst_root = rq.tbst_link[0];
                  return new;
                }

              q = q->tbst_link[1];
            }

          p = p->tbst_link[1];
          q = q->tbst_link[1];
        }

      if (p->tbst_tag[1] == TBST_CHILD)
        if (!copy_node (new, q, 1, p->tbst_link[1], copy)) 
          {
            copy_error_recovery (rq.tbst_link[0], new, destroy);
            return NULL;
          }
    }
}

This code is included in 280 and 331.

A sensitive issue in the code above is treatment of the final thread. The initial call to copy_node() causes a right thread to point to rq, but it needs to be a null pointer. We need to perform this kind of transformation:

When the copy is successful, this is just a matter of setting the final q's right child pointer to NULL, but when it is unsuccessful we have to find the pointer in question, which is in the greatest node in the tree so far (to see this, try constructing a few threaded BSTs by hand on paper). Function copy_error_recovery() does this, as well as destroying the tree. It also handles the case of failure when no nodes have yet been added to the tree:

282. <TBST copy error helper function 282> =
static void 
copy_error_recovery (struct tbst_node *p,
                     struct tbst_table *new, tbst_item_func *destroy) 
{
  new->tbst_root = p;
  if (p != NULL) 
    {
      while (p->tbst_tag[1] == TBST_CHILD)
        p = p->tbst_link[1];
      p->tbst_link[1] = NULL;
    }
  tbst_destroy (new, destroy);
}

This code is included in 280 and 331.

Exercises:

1. In the diagram above that shows examples of threads followed while copying a TBST, all right threads in the TBSTs are shown. Explain how this is not just a coincidence. [answer]

2. Suggest some optimization possibilities for tbst_copy(). [answer]