3.5 Binary Search of Ordered Array |

At this point we've squeezed just about all the performance we can out of sequential search in portable C. For an algorithm that searches faster than our final refinement of sequential search, we'll have to reconsider our entire approach.

What's the fundamental idea behind sequential search? It's that we examine array elements in order. That's a fundamental limitation: if we're looking for an element in the middle of the array, we have to examine every element that comes before it. If a search algorithm is going to be faster than sequential search, it will have to look at fewer elements.

One way to look at search algorithms based on repeated comparisons is to
consider what we learn about the array's content at each step. Suppose
that *array*[] has *n* elements in sorted order, without duplicates,
that *array*[*j*] contains *key*, and that we are trying to learn the
value *j*. In sequential search, we learn only a little about the data
set from each comparison with *array*[*i*]: either *key* == *array*[*i*] so
that *i* == *j*, or *key* != *array*[*i*] so that *i* != *j* and therefore *j* > *i*. As a result, we eliminate only one possibility at each step.

Suppose that we haven't made any comparisons yet, so that we know
nothing about the contents of *array*[]. If we compare *key* to
*array*[*i*] for arbitrary *i* such that 0 <= *i* < *n*, what do we learn? There are three possibilities:

*key*<*array*[*i*]: Now we know that*key*<*array*[*i*] <*array*[*i*+ 1] < ... <*array*[*n*- 1].^{1}Therefore, 0 <=*j*<*i*.*key*==*array*[*i*]: We're done:*j*==*i*.*key*>*array*[*i*]: Now we know that*key*>*array*[*i*] >*array*[*i*- 1] > ... >*array*[0]. Therefore,*i*<*j*<*n*.

So, after one step, if we're not done, we know that *j* > *i* or that *j* < *i*. If we're equally likely to be looking for each element in
*array*[], then the best choice of *i* is *n* / 2: for that value, we
eliminate about half of the possibilities either way. (If *n* is odd,
we'll round down.)

After the first step, we're back to essentially the same situation: we
know that *key* is in *array*[*j*] for some *j* in a range of about
*n* / 2. So we can repeat the same process. Eventually, we will either
find *key* and thus *j*, or we will eliminate all the possibilities.

Let's try an example. For simplicity, let *array*[] contain the values
100 through 114 in numerical order, so that *array*[*i*] is 100 + *i* and
*n* is 15. Suppose further that *key* is 110. The steps that we'd go
through to find *j* are described below. At each step, the facts are
listed: the known range that *j* can take, the selected value of *i*,
the results of comparing *key* to *array*[*i*], and what was learned from
the comparison.

- 0 <=
*j*<= 14:*i*becomes (0 + 14) / 2 == 7. 110 >*array*[*i*] == 107, so now we know that*j*> 7. - 8 <=
*j*<= 14:*i*becomes (8 + 14) / 2 == 11. 110 <*array*[*i*] == 111, so now we know that*j*< 11. - 8 <=
*j*<= 10:*i*becomes (8 + 10) / 2 == 9. 110 >*array*[*i*] == 109, so now we know that*j*> 9. - 10 <=
*j*<= 10:*i*becomes (10 + 10) / 2 == 10. 110 ==*array*[*i*] == 110, so we're done and*i*==*j*== 10.

In case you hadn't yet figured it out, this technique is called binary search (see binary search). We can make an initial C implementation pretty easily:

21. <Binary search of ordered array 21> = /* Returns the offset withinarray[] of an element equal tokey,

or -1 ifkeyis not inarray[].array[] must be an array ofnints sorted in ascending order. */intbinary_search(intarray[],intn,intkey)

{intmin= 0;intmax=n- 1;while(max>=min)

{inti= (min+max) / 2;if(key<array[i])

max=i- 1;elseif(key>array[i])

min=i+ 1;else

returni; }return-1; }

This code is included in 600.

The maximum number of comparisons for a binary search in an array of *n*
elements is about
log2(n),
as opposed to a maximum of *n* comparisons for sequential search. For
moderate to large values of *n*, this is a lot better.

On the other hand, for small values of *n*, binary search may actually
be slower because it is more complicated than sequential search. We
also have to put our array in sorted order before we can use binary
search. Efficiently sorting an *n*-element array takes time
proportional to
n * log2(n)
for large *n*. So binary search is preferred if *n* is large enough
(see the answer to Exercise 4 for one typical value) and if
we are going to do enough searches to justify the cost of the initial
sort.

Further small refinements are possible on binary search of an ordered array. Try some of the exercises below for more information.

**See also:**
[Knuth 1998b], algorithm 6.2.1B;
[Kernighan 1988], section 3.3;
[Bentley 2000], chapters 4 and 5, section 9.3, appendix 1;
[Sedgewick 1998], program 12.6.

**Exercises:**

**1.** Function *binary_search*() above uses three local variables: *min* and
*max* for the ends of the remaining search range and *i* for its
midpoint. Write and test a binary search function that uses only two
variables: *i* for the midpoint as before and *m* representing the
width of the range on either side of *i*. You may require the
existence of a dummy element just before the beginning of the array.
Be sure, if so, to specify what its value should be.
[answer]

**2.** The standard C library provides a function, *bsearch*(), for searching
ordered arrays. Commonly, *bsearch*() is implemented as a binary
search, though ANSI C does not require it. Do the following:

- Write a function compatible with the interface for
*binary_search*() that uses*bsearch*() “under the hood.” You'll also have to write an additional callback function for use by*bsearch*(). - Write and test your own version of
*bsearch*(), implementing it using a binary search. (Use a different name to avoid conflicts with the C library.)

**3.** An earlier exercise presented a simple test framework for
*seq_search*(), but now we have more search functions. Write a test
framework that will handle all of them presented so far. Add code for
timing successful and unsuccessful searches. Let the user specify, on
the command line, the algorithm to use, the size of the array to search,
and the number of search iterations to run.
[answer]

**4.** Run the test framework from the previous exercise on your own system for
each algorithm. Try different array sizes and compiler optimization
levels. Be sure to use enough iterations to make the searches take at
least a few seconds each. Analyze the results: do they make sense?
Try to explain any apparent discrepancies.
[answer]

[1] This sort of notation
means very different things in C and mathematics. In mathematics,
writing *a* < *b* < *c* asserts both of the relations *a* < *b* and *b* < *c*,
whereas in C, it expresses the evaluation of *a* < *b*, then the
comparison of the 0 or 1 result to the value of *c*. In mathematics
this notation is invaluable, but in C it is rarely meaningful. As a
result, this book uses this notation only in the mathematical sense.