Foundations of Computer Science, C Edition (Aho, Ullman) (1994)
Preface
This book was motivated by the desire we and others have had to further the evolu-
tion of the core course in computer science. Many departments across the country
have revised their curriculum in response to the introductory course in the science
of computing discussed in the “Denning Report” (Denning, P. J., D. E. Comer, D.
Gries, M. C. Mulder, A. Tucker, J. Turner, and P. R. Young, “Computing as a Dis-
cipline,” Comm. ACM 32:1, pp. 9–23, January 1989). That report draws attention
to three working methodologies or processes — theory, abstraction, and design —
as fundamental to all undergraduate programs in the discipline. More recently,
the Computing Curricula 1991 report of the joint ACM/IEEE-CS Curriculum Task
Force echoes the Denning Report in identifying key recurring concepts which are
fundamental to computing, especially: conceptual and formal models, efficiency,
and levels of abstraction. The themes of these two reports summarize what we have
tried to offer the student in this book.
This book developed from notes for a two-quarter course at Stanford — called
CS109: Introduction to Computer Science — that serves a number of goals. The
first goal is to give beginning computer science majors a solid foundation for fur-
ther study. However, computing is becoming increasingly important in a much
wider range of scientific and engineering disciplines. Therefore, a second goal is
to give those students who will not take advanced courses in computer science the
conceptual tools that the field provides. Finally, a more pervasive goal is to expose
all students not only to programming concepts but also to the intellectually rich
foundations of the field.
Our first version of this book was based on programming in Pascal and appeared
in 1992. Our choice of Pascal as the language for example programs was motivated
by that language’s use in the Computer Science Advanced Placement Test as well
as in a plurality of college introductions to programming. We were pleased to see
that since 1992 there has been a significant trend toward C as the introductory
programming language, and we accordingly developed a new version of the book
using C for programming examples. Our emphasis on abstraction and encapsulation
should provide a good foundation for subsequent courses covering object-oriented
technology using C++.
At the same time, we decided to make two significant improvements in the
content of the book. First, although it is useful to have a grounding in machine
architecture to motivate how running time is measured, we found that almost all
curricula separate architecture into a separate course, so the chapter on that subject
was not useful. Second, many introductory courses in the theory of computing
emphasize combinatorics and probability, so we decided to increase the coverage
and cluster the material into a chapter of its own.
Foundations of Computer Science covers subjects that are often found split
between a discrete mathematics course and a sophomore-level sequence in computer
science in data structures. It has been our intention to select the mathematical
foundations with an eye toward what the computer user really needs, rather than
what a mathematician might choose. We have tried to integrate effectively the
mathematical foundations with the computing. We thus hope to provide a better
feel for the soul of computer science than might be found in a programming course.
Prerequisites
Students taking courses based on this book have ranged from first-year undergrad-
uates to graduate students. We assume only that students have had a solid course
in programming. They should be familiar with the programming language ANSI
C to use this edition. In particular, we expect students to be comfortable with C
constructs such as recursive functions, structures, pointers, and operators involving
pointers and structures such as dot, ->, and &.
4. Lists: all of Chapter 6. Some may wish to cover lists before trees, which
is a more traditional treatment. We regard trees as the more fundamental
notion, but there is little harm in switching the order. The only significant
dependency is that Chapter 6 talks about the “dictionary” abstract data type
(set with operations insert, delete, and lookup), which is introduced in Section
5.7 as a concept in connection with binary search trees.
5. Sets and relations. Data structures for sets and relations are emphasized in
Sections 7.2 through 7.9 and 8.2 through 8.6.
6. Graph algorithms are covered in Sections 9.2 through 9.9.
Acknowledgments
We are deeply indebted to a number of colleagues and students who have read this
material and given us many valuable suggestions for improving the presentation.
We owe a special debt of gratitude to Brian Kernighan, Don Knuth, Apostolos
Lerios, and Bob Martin who read the original Pascal manuscript in detail and
gave us many perceptive comments. We have received, and gratefully acknowledge,
reports of course testing of the notes for the Pascal edition of this book by Michael
Anderson, Margaret Johnson, Udi Manber, Joseph Naor, Prabhakar Ragde, Rocky
Ross, and Shuky Sagiv.
There are a number of other people who found errors in earlier editions, both
the original notes and the various printings of the Pascal edition. In this regard,
we would like to thank: Susan Aho, Michael Anderson, Aaron Edsinger, Lonnie
Eldridge, Todd Feldman, Steve Friedland, Christopher Fuselier, Mike Genstil, Paul
Grubb III, Barry Hayes, John Hwang, Hakan Jakobsson, Arthur Keller, Dean Kelley,
James Kuffner Jr., Steve Lindell, Richard Long, Mark MacDonald, Simone Mar-
tini, Hugh McGuire, Alan Morgan, Monnia Oropeza, Rodrigo Philander, Andrew
Quan, Stuart Reges, John Stone, Keith Swanson, Steve Swenson, Sanjai Tiwari,
Eric Traut, and Lynzi Ziegenhagen.
We acknowledge helpful advice from Geoff Clem, Jon Kettenring, and Brian
Kernighan during the preparation of the C edition of Foundations of Computer
Science.
Peter Ullman produced a number of the figures used in this book. We are grate-
ful to Dan Clayton, Anthony Dayao, Mat Howard, and Ron Underwood for help
with TEX fonts, and to Hester Glynn and Anne Smith for help with the manuscript
preparation.
A. V. A.
Chatham, NJ
J. D. U.
Stanford, CA
July, 1994
CHAPTER 1

Computer Science: The Mechanization of Abstraction
Though it is a new field, computer science already touches virtually every aspect
of human endeavor. Its impact on society is seen in the proliferation of computers,
information systems, text editors, spreadsheets, and all of the wonderful application
programs that have been developed to make computers easier to use and people more
productive. An important part of the field deals with how to make programming
easier and software more reliable. But fundamentally, computer science is a science
of abstraction — creating the right model for thinking about a problem and devising
the appropriate mechanizable techniques to solve it.
Every other science deals with the universe as it is. The physicist’s job, for
example, is to understand how the world works, not to invent a world in which
physical laws would be simpler or more pleasant to follow. Computer scientists,
on the other hand, must create abstractions of real-world problems that can be
understood by computer users and, at the same time, that can be represented and
manipulated inside a computer.
Sometimes the process of abstraction is simple. For example, we can model the
behavior of the electronic circuits used to build computers quite well by an abstrac-
tion called “propositional logic.” The modeling of circuits by logical expressions is
not exact; it simplifies, or abstracts away, many details — such as the time it takes
for electrons to flow through circuits and gates. Nevertheless, the propositional
logic model is good enough to help us design computer circuits well. We shall have
much more to say about propositional logic in Chapter 12.
As another example, suppose we are faced with the problem of scheduling final
examinations for courses. That is, we must assign course exams to time slots so
that two courses may have their exams scheduled in the same time slot only if there
is no student taking both. At first, it may not be apparent how we should model
this problem. One approach is to draw a circle called a node for each course and
draw a line called an edge connecting two nodes if the corresponding courses have
a student in common. Figure 1.1 suggests a possible picture for five courses; the
picture is called a course-conflict graph.
Given the course-conflict graph, we can solve the exam-scheduling problem by
repeatedly finding and removing “maximal independent sets” from the graph. An
[Fig. 1.1. Course-conflict graph for the five courses CS, Econ, Eng, Math, and Phy.
An edge between two courses indicates that at least one student is taking both courses.]
independent set is a collection of nodes that have no connecting edges within the
collection. An independent set is maximal if no other node from the graph can be
added without including an edge between two nodes of the set. In terms of courses,
a maximal independent set is any maximal set of courses with no common students.
In Fig. 1.1, {Econ, Eng, Phy} is one maximal independent set. The set of courses
corresponding to the selected maximal independent set is assigned to the first time
slot.
We remove from the graph the nodes in the first maximal independent set,
along with all incident edges, and then find a maximal independent set among the
remaining courses. One choice for the next maximal independent set is the singleton
set {CS}. The course in this maximal independent set is assigned to the second
time slot.
We repeat this process of finding and deleting maximal independent sets until
no more nodes remain in the course-conflict graph. At this point, all courses will
have been assigned to time slots. In our example, after two iterations, the only
remaining node in the course-conflict graph is Math, and this forms the final maxi-
mal independent set, which is assigned to the third time slot. The resulting exam
schedule is thus

    TIME SLOT    COURSE EXAMS
        1        Econ, Eng, Phy
        2        CS
        3        Math
This algorithm does not necessarily partition the courses among the smallest
possible number of time slots, but it is simple and does tend to produce a schedule
with close to the smallest number of time slots. It is also one that can be readily
programmed using the techniques presented in Chapter 9.
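To make the greedy strategy concrete, here is a minimal sketch in C. The names and the adjacency-matrix representation of the course-conflict graph are our own illustration, not the book's; Chapter 9 develops the graph techniques properly.

#include <stdio.h>

#define N 5    /* number of courses */

/* conflict[i][j] is 1 if courses i and j share a student */
int conflict[N][N];

/* slot[i] receives the time slot (1, 2, ...) assigned to course i */
int slot[N];

void schedule(void)
{
    int assigned = 0, t = 0, i, j, ok;

    while (assigned < N) {
        ++t;    /* start a new time slot */
        for (i = 0; i < N; i++) {
            if (slot[i] != 0)
                continue;    /* course already scheduled */
            /* course i may join slot t only if it conflicts with
               no course already placed in slot t */
            ok = 1;
            for (j = 0; j < N; j++)
                if (slot[j] == t && conflict[i][j])
                    ok = 0;
            if (ok) {
                slot[i] = t;    /* greedily add i; slot t stays independent */
                ++assigned;
            }
        }
        /* after the scan, no further course fits in slot t,
           so the courses in slot t form a maximal independent set */
    }
}

int main(void)
{
    int i;

    /* a small hypothetical conflict graph, not the one of Fig. 1.1 */
    conflict[0][1] = conflict[1][0] = 1;
    conflict[1][2] = conflict[2][1] = 1;
    schedule();
    for (i = 0; i < N; i++)
        printf("course %d -> slot %d\n", i, slot[i]);
    return 0;
}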
Notice that this approach abstracts away some details of the problem that may
be important. For example, it could cause one student to have five exams in five
consecutive time slots. We could create a model that included limits on how many
exams in a row one student could take, but then both the model and its solution
would be more complicated.
1.1 What This Book Is About
This book will introduce the reader, who is assumed to have a working knowledge of
the programming language ANSI C, to the principal ideas and concerns of computer
science. The book emphasizes three important problem-solving tools:
1. Data models, the abstractions used to describe problems. We have already men-
tioned two models: logic and graphs. We shall meet many others throughout
this book.
[Figure: a diagram of data objects — Animal, Cat, Fluffy, and a saucer of milk —
connected by “is” and “owns” relationships.]
Data Models
We meet data models in two contexts. Data models such as the graphs discussed in
the introduction to this chapter are abstractions frequently used to help formulate
solutions to problems. We shall learn about several such data models in this book:
trees in Chapter 5, lists in Chapter 6, sets in Chapter 7, relations in Chapter 8,
graphs in Chapter 9, finite automata in Chapter 10, grammars in Chapter 11, and
logic in Chapters 12 and 14.
Data models are also associated with programming languages and computers.
For example, C has a data model that includes abstractions such as characters,
integers of several sizes, and floating-point numbers. Integers and floating-point
numbers in C are only approximations of integers and reals in mathematics because
of the limited precision of arithmetic available in computers. The C data model also
includes types such as structures, pointers, and functions, which we shall discuss in
more detail in Section 1.4.
Data Structures
When the data model of the language in which we are writing a program lacks a
built-in representation for the data model of the problem at hand, we must represent
the needed data model using the abstractions supported by the language. For this
purpose, we study data structures, which are methods for representing in the data
model of a programming language abstractions that are not an explicit part of
SEC. 1.1 WHAT THIS BOOK IS ABOUT 5
that language. Different programming languages may have strikingly different data
models. For example, unlike C, the language Lisp supports trees directly, and the
language Prolog has logic built into its data model.
Algorithms
Underlying Threads
1. Design algebras. In certain fields in which the underlying models have become
well understood, we can develop notations in which design trade-offs can be
expressed and evaluated. Through this understanding, we can develop a theory
of design with which well-engineered systems can be constructed. Propositional
logic, with the associated notation called Boolean algebra that we encounter in
Chapter 12, is a good example of this kind of design algebra. With it, we can
design efficient circuits for subsystems of the kind found in digital computers.
Other examples of algebras found in this book are the algebra of sets in Chapter
7, the algebra of relations in Chapter 8, and the algebra of regular expressions
in Chapter 10.
6 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTION
2. Recursion is such a useful technique for defining concepts and solving problems
that it deserves special mention. We discuss recursion in detail in Chapter 2
and use it throughout the rest of the book. Whenever we need to define an
object precisely or whenever we need to solve a problem, we should always ask,
“What does the recursive solution look like?” Frequently that solution has a
simplicity and efficiency that makes it the method of choice.
✦
✦ ✦
✦
1.2 What This Chapter Is About
The remainder of this chapter sets the stage for the study of computer science. The
primary concepts that will be covered are
✦ Data models (Section 1.3)
✦ The data model of the programming language C (Section 1.4)
✦ The principal steps in the software-creation process (Section 1.5)
We shall give examples of several different ways in which abstractions and mod-
els appear in computer systems. In particular, we mention the models found in
programming languages, in certain kinds of systems programs, such as operating
systems, and in the circuits from which computers are built. Since software is a
vital component of today’s computer systems, we need to understand the software-
creation process, the role played by models and algorithms, and the aspects of
software creation that computer science can address only in limited ways.
In Section 1.6 there are some conventional definitions that are used in C pro-
grams throughout this book.
✦
✦ ✦
✦
1.3 Data Models
Any mathematical concept can be termed a data model. In computer science, a
data model normally has two aspects:
1. The values that objects can assume. For example, many data models contain
objects that have integer values. This aspect of the data model is static; it tells
us what values objects may take. The static part of a programming language’s
data model is often called the type system.
2. The operations on the data. For example, we normally apply operations such
as addition to integers. This aspect of the model is dynamic; it tells us the
ways in which we can change values and create new values.
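A few lines of C (our own illustration) make the distinction concrete: a declaration exhibits the static aspect, while an expression exhibits the dynamic aspect.

#include <stdio.h>

int main(void)
{
    int x;      /* static aspect: the type int fixes what values x may hold */
    x = 3 + 4;  /* dynamic aspect: the operation + creates a new value */
    printf("%d\n", x);
    return 0;
}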
We may also name boxes. In general, a name for a box is any expression that
denotes that box. Often, we think of the names of boxes as the variables of the
program, but that is not quite right. For example, if x is a variable local to a
recursive function F , then there may be many boxes named x, each associated with
a different call to F . Then the true name of such a box is a combination of x and
the particular call to F .
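A minimal sketch of this point (our own code, echoing the function F of the text): with the call F(3), three boxes named x exist at once, one per active call, each holding its own value.

#include <stdio.h>

/* Each call to F creates a fresh box named x; during F(3) there are
   eventually three boxes named x, holding 3, 2, and 1. */
void F(int n)
{
    int x = n;      /* this call's own box named x */

    if (n > 1)
        F(n - 1);   /* inner calls get their own boxes named x */
    printf("box x in F(%d) holds %d\n", n, x);
}

int main(void)
{
    F(3);
    return 0;
}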
Most of the data types of C are familiar: integers, floating-point numbers,
characters, arrays, structures, and pointers. These are all static notions.
The operations permitted on data include the usual arithmetic operations on
integers and floating-point numbers, accessing operations for elements of arrays or
structures, and pointer dereferencing, that is, finding the element pointed to by a
pointer. These operations are part of the dynamics of the C data model.
In a programming course, we would see important data models that are not part
of C, such as lists, trees, and graphs. In mathematical terms, a list is a sequence of
n elements, which we shall write as (a1, a2, . . . , an), where a1 is the first element, a2
the second, and so on. Operations on lists include inserting new elements, deleting
elements, and concatenating lists (that is, appending one list to the end of another).
[Figure: a linked list of cells holding a1, a2, . . . , an; each cell points to the next,
and the pointer in the last cell is NULL.]
Cells are represented by rectangles, the left part of which is the element, and
the right part of which holds a pointer, shown as an arrow to the next cell pointed
to. A dot in the box holding a pointer means that the pointer is NULL.1 Lists will
be covered in more detail in Chapter 6. ✦
1 NULL is a symbolic constant defined in the standard header file stdio.h to be equal to a value
that cannot be a pointer to anything. We shall use it to have this meaning throughout the
book.
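Concretely, the cells pictured above might be declared and linked in C as follows. This is a sketch of our own; Section 1.6 introduces the macro the book actually uses to generate such declarations.

#include <stdio.h>

typedef struct CELL *LIST;
struct CELL {
    int element;   /* the left part of the box in the picture */
    LIST next;     /* the right part: a pointer to the next cell */
};

int main(void)
{
    /* build the two-cell list (10, 20); the last pointer is NULL */
    struct CELL b = {20, NULL};
    struct CELL a = {10, &b};
    LIST p;

    for (p = &a; p != NULL; p = p->next)
        printf("%d\n", p->element);
    return 0;
}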
[Fig. 1.4. Part of a typical UNIX directory tree: /usr contains directories ann and bob;
ann contains files a1, a2, and a3, while bob contains files b1 and b2.]
1. The data itself is stored in files, which in the UNIX system are strings of
characters.
2. Files are organized into directories, which are collections of files and/or other
directories. The directories and files form a tree with the files at the leaves.3
Figure 1.4 suggests the tree that might represent the directory structure of a
typical UNIX operating system. Directories are indicated by circles. The root
directory / contains directories called mnt, usr, bin, and so on. The directory
/usr contains directories ann and bob; directory ann contains three files: a1,
a2, and a3.
2 If you are unfamiliar with operating systems, you can skip the next paragraphs. However,
most readers have probably encountered an operating system, perhaps under another name.
For example, the Macintosh “system” is an operating system, although different terminology
is used. For example, a directory becomes a “folder” in Macintosh-ese.
3 However, “links” in directories may make it appear that a file or directory is part of several
different directories.
3. Processes are individual executions of programs. Processes take zero or more
streams as input and produce zero or more streams as output. In the UNIX
system, processes can be combined by pipes, where the output from one process
is fed as input into the next process. The resulting composition of processes
can be viewed as a single process with its own input and output.
There are many other aspects to an operating system, such as how it manages
security of data and interaction with the user. However, even these few observations
should make it apparent that the data model of an operating system is rather
different from the data model of a programming language.
Another type of data model is found in text editors. Every text editor’s data
model incorporates a notion of text strings and editing operations on text. The
data model usually includes the notion of lines, which, like most files, are character
strings. However, unlike files, lines may have associated line numbers. Lines may
also be organized into larger units such as paragraphs, and operations on lines are
normally applicable anywhere within the line — not just at the front, like the most
common file operations. The typical editor supports a notion of a “current” line
(where the cursor is) and probably a current position within that line. Operations
performed by the editor include various modifications to lines, such as deletion or
insertion of characters within the line, deletion of lines, and creation of new lines.
It is also possible in typical editors to search for features, such as specific character
strings, among the lines of the file being edited.
In fact, if you examine any other familiar piece of software, such as a spread-
sheet or a video game, a pattern emerges. Each program that is designed to be
used by others has its own data model, within which the user must work. The
data models we meet are often radically different from one another, both in the
primitives they use to represent data and in the operations on that data that are
offered to the user. Yet each data model is implemented, via data structures and
the programs that use them, in some programming language.
Computers are built from electronic components called gates. A gate has one or more
inputs and one output; the value of an input or output can be only 0 or 1. A
gate performs a simple function — such as AND, where the output is 1 if all the
inputs are 1 and the output is 0 if one or more of the inputs are 0. At one level
of abstraction, computer design is the process of deciding how to connect gates
to perform the basic operations of a computer. There are many other levels of
abstraction associated with computer design as well.
Figure 1.5 shows the usual symbol for an AND-gate, together with its truth table,
which indicates the output value of the gate for each pair of input values.4 We
discuss truth tables in Chapter 12 and gates and their interconnections in Chapter
13.
x y z
0 0 0
0 1 0
1 0 0
1 1 1

Fig. 1.5. An AND-gate with inputs x and y, output z, and its truth table.
[Fig. 1.6 (diagram omitted): a one-bit adder with inputs x and y, carry-in c,
sum bit z, and carry-out d.]

Out of a few gates, we can build a one-bit adder circuit, as suggested in Fig.
1.6. Two input bits, x and y, and a carry-in bit c, are summed, resulting in a sum
bit z and a carry-out bit d. To be precise, d is 1 if two or more of c, x, and y are 1,
while z is 1 if an odd number (one or three) of c, x, and y are 1, as suggested by
4 Note that if we think of 1 as “true” and 0 as “false,” then the AND-gate performs the same
logical operation as the && operator of C.
x y c d z
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1

Fig. 1.7. Truth table for the one-bit adder.
the table of Fig. 1.7. The carry-out bit followed by the sum bit — that is, dz —
forms a two-bit binary number, which is the total number of x, y, and c that are 1.
In this sense, the one-bit adder adds its inputs.
Many computers represent integers as 32-bit numbers. An adder circuit can
then be composed of 32 one-bit adders, as suggested in Fig. 1.8. This circuit is often
called a ripple-carry adder, because the carry ripples from right to left, one bit at a
time. Note that the carry into the rightmost (low-order bit) one-bit adder is always
0. The sequence of bits x31 x30 · · · x0 represents the bits of the first number being
added, and y31 y30 · · · y0 is the second addend. The sum is dz31 z30 · · · z0 ; that is, the
leading bit is the carry-out of the leftmost one-bit adder, and the following bits of
the sum are the sum bits of the adders, from the left.
Fig. 1.8. A ripple-carry adder: dz31 z30 · · · z0 = x31 x30 · · · x0 + y31 y30 · · · y0 .
The circuit of Fig. 1.8 is really an algorithm in the data model of bits and the
primitive operations of gates. However, it is not a particularly good algorithm. The
reason is that until we compute the carry-out of the rightmost place, we cannot
compute z1 or the carry-out of the second place. Until we compute the carry-out of
the second place, we cannot compute z2 or the carry-out of the third place, and so
on. Thus, the time taken by the circuit is the length of the numbers being added —
32 in our case — multiplied by the time needed by a one-bit adder.
One might suspect that the need to “ripple” the carry through each of the one-
bit adders, in turn, is inherent in the definition of addition. Thus, it may come as
a surprise to the reader that computers have a much faster way of adding numbers.
We shall cover such an improved algorithm for addition when we discuss the design
of circuits in Chapter 13. ✦
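To make the rippling concrete, the following sketch simulates the circuit of Fig. 1.8 in C. The code and names are our own illustration, using arrays of 0's and 1's for the bits, with bit 0 the low-order bit.

#include <stdio.h>

#define BITS 32

/* Simulate the ripple-carry adder of Fig. 1.8.
   Returns the final carry-out d; z receives the sum bits. */
int ripple_add(const int x[], const int y[], int z[])
{
    int c = 0;    /* the carry into the rightmost adder is always 0 */
    int i, d;

    for (i = 0; i < BITS; i++) {
        /* one-bit adder: z is 1 if an odd number of x, y, c are 1;
           d is 1 if two or more of them are 1 */
        z[i] = x[i] ^ y[i] ^ c;
        d = (x[i] & y[i]) | (x[i] & c) | (y[i] & c);
        c = d;    /* the carry ripples to the next position */
    }
    return c;
}

int main(void)
{
    int x[BITS] = {1, 1, 0};    /* ...011 = 3 */
    int y[BITS] = {1, 0, 1};    /* ...101 = 5 */
    int z[BITS];
    int d = ripple_add(x, y, z);

    /* 3 + 5 = 8, so the low four sum bits should read 1000 */
    printf("carry-out %d, low four sum bits %d%d%d%d\n",
           d, z[3], z[2], z[1], z[0]);
    return 0;
}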
EXERCISES
1.3.1: Explain the difference between the static and dynamic aspects of a data
model.
1.3.2: Describe the data model of your favorite video game. Distinguish between
static and dynamic aspects of the model. Hint : The static parts are not just the
parts of the game board that do not move. For example, in Pac Man, the static
part includes not only the map, but the “power pills,” “monsters,” and so on.
1.3.3: Describe the data model of your favorite text editor.
1.3.4: Describe the data model of a spreadsheet program.
1.4 The C Data Model
In this section we shall highlight important parts of the data model used by the C
programming language. As an example of a C program, consider the program in
Fig. 1.10 that uses the variable num to count the number of characters in its input.
#include <stdio.h>

main()
{
    int num;

    num = 0;
    while (getchar() != EOF)
        ++num;    /* add 1 to num */
    printf("%d\n", num);
}

Fig. 1.10. A C program that counts the characters in its input.
The first line tells the C preprocessor to include as part of the source the
standard input/output file stdio.h, which contains the definitions of the functions
getchar and printf, and the symbolic constant EOF, a value that represents the
end of a file.
A C program itself consists of a sequence of definitions, which can be either
function definitions or data definitions. One must be a definition of a function
called main. The first statement in the function body of the program in Fig. 1.10
declares the variable num to be of type int. (All variables in a C program must
be declared before their use.) The next statement initializes num to zero. The
following while statement reads input characters one at a time using the library
function getchar, incrementing num after each character read, until there are no
more input characters. The end of file is signaled by the special value EOF on the
input. The printf statement prints the value of num as a decimal integer, followed
by a newline character.
[Figure: a pointer p, drawn as a dot with an arrow to a box labeled “object of type T.”]
✦ Example 1.4. C has the typedef construct to create synonyms for type names.
The declaration
typedef int Distance;

defines the name Distance to be a synonym for the type int; Distance may then be
used wherever int may.
Functions
Functions also have associated types, even though we do not associate boxes or
“values” with functions, as we do with program variables. For any list of types
T1 , T2 , . . . , Tn , we can define a function with n parameters consisting of those types,
in order. This list of types followed by the type of the value returned by the function
(the return-value) is the “type” of the function. If the function has no return value,
its type is void.
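For instance (our own examples), the following declarations exhibit three different function types:

#include <stdio.h>

int max(int a, int b)               /* type: int, int -> int */
{
    return a > b ? a : b;
}

double average(double a, double b)  /* type: double, double -> double */
{
    return (a + b) / 2.0;
}

void greet(void)                    /* no return value: its type is void */
{
    printf("hello\n");
}

int main(void)
{
    printf("%d %f\n", max(2, 3), average(1.0, 2.0));
    greet();
    return 0;
}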
In general, we can build types by applying the type-construction rules arbi-
trarily, but there are a number of constraints. For example, we cannot construct
an array of functions, although an array of pointers to functions is permitted.
Data Combination
C has a rich set of operators for manipulating and combining values. The principal
operators are
1. Arithmetic operators. C provides the customary binary arithmetic operators
+, -, *, and / on integers and floating-point numbers, the modulus operator %
on integers, and unary minus.
5 We shall use TRUE and FALSE as defined constants 1 and 0, respectively, to represent Boolean
values; see Section 1.6.
EXERCISES
1.4.1: Explain the difference between an identifier of a C program and a name (for
a “box” or data object).
1.4.2: Give an example of a C data object that has more than one name.
1.4.3: If you are familiar with another programming language besides C, describe
its type system and operations.
1.5 Algorithms and the Design of Programs
The study of data models, their properties, and their appropriate use is one pillar of
computer science. A second, equally important pillar is the study of algorithms and
their associated data structures. We need to know the best ways to perform common
tasks, and we need to learn the principal techniques for designing good algorithms.
Further, we need to understand how the use of data structures and algorithms fits
into the process of creating useful programs. The themes of data models, algorithms,
data structures, and their implementation in programs are interdependent, and each
appears many times throughout the book. In this section, we shall mention some
generalities regarding the design and implementation of programs.
Problem definition and specification. The hardest, but most important, part of
the task of creating a software system is defining what the problem really is and
then specifying what is needed to solve it. Usually, problem definition begins by
analyzing the users’ requirements, but these requirements are often imprecise and
hard to write down. The system architect may have to consult with the future users
of the system and iterate the specification, until both the specifier and the users
are satisfied that the specification defines and solves the problem at hand. In the
specification stage, it may be helpful to build a simple prototype or model of the
final system, to gain insight into its behavior and intended use. Data modeling is
also an important tool in the problem-definition phase.
Design. Once the specification is complete, a high-level design of the system is
created, with the major components identified. A document outlining the high-level
design is prepared, and performance requirements for the system may be included.
More detailed specifications of some of the major components may also be included
during this phase. A cost-effective design often calls for the reuse or modification of
previously constructed components. Various software methodologies such as object-
oriented technology facilitate the reuse of components.
Implementation. Once the design is fixed, implementation of the components can
proceed. Many of the algorithms discussed in this book are useful in the implemen-
tation of new components. Once a component has been implemented, it is subject
to a series of tests to make sure that it behaves as specified.
Integration and system testing. When the components have been implemented and
individually tested, the entire system is assembled and tested.
Installation and field testing. Once the developer is satisfied that the system
will work to the customer’s satisfaction, the system is installed on the customer’s
premises and the final field testing takes place.
Maintenance. At this point, we might think that the bulk of the work has been
done. Maintenance remains, however, and in many situations maintenance can
account for more than half the cost of system development. Maintenance may
involve modifying components to eliminate unforeseen side-effects, to correct or
improve system performance, or to add features. Because maintenance is such an
important part of software systems design, it is important to write programs that
are correct, rugged, efficient, modifiable, and — whenever possible — portable from
one computer to another.
It is very important to catch errors as early as possible, preferably during the
problem-definition phase. At each successive phase, the cost of fixing a design error
or programming bug rises greatly. Independent reviews of requirements and designs
are beneficial in reducing downstream errors.
Programming Style
An individual programmer can ease the maintenance burden greatly by writing
programs that others can read and modify readily. Good programming style comes
only with practice, and we recommend that you begin at once to try writing pro-
grams that are easy for others to understand. There is no magic formula that will
guarantee readable programs, but there are several useful rules of thumb:
1.6 Some C Conventions Used Throughout the Book
There are several definitions and conventions that we shall find useful as we illustrate
concepts with C programs. Some of these are common conventions found in the
standard header file stdio.h, while others are defined specially for the purposes of
this book and must be included with any C program that uses them.
1. The identifier NULL is a value that may appear anywhere a pointer can appear,
but it is not a value that can ever point to anything. Thus, NULL in a field such
as next in the cells of Example 1.1 can be used to indicate the end of a list.
We shall see that NULL has a number of similar uses in other data structures.
NULL is properly defined in stdio.h.
2. The identifiers TRUE and FALSE are defined by
#define TRUE 1
#define FALSE 0
Thus, TRUE can be used anywhere a condition with logical value true is wanted,
and FALSE can be used for a condition whose value is false.
3. The type BOOLEAN is defined as
typedef int BOOLEAN;
We use BOOLEAN whenever we want to stress the fact that we are interested in
the logical rather than the numeric value of an expression.
4. The identifier EOF is a value that is returned by file-reading functions such as
getchar() when there are no more bytes left to be read from the file. An
appropriate value for EOF is provided in stdio.h.
5. We shall define a macro that generates declarations of cells of the kind used in
Example 1.1. An appropriate definition appears in Fig. 1.14. It declares cells
with two fields: an element field whose type is given by the parameter Type
and a next field to point to a cell with this structure. The macro provides
two external definitions: CellName is the name of structures of this type, and
ListName is a name for the type of pointers to these cells.
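A macro along the lines just described might be written as follows. This is a sketch consistent with the description of Fig. 1.14; the exact definition may differ in details.

/* DefCell(Type, CellName, ListName) declares a structure CellName with an
   element field of the given Type and a next field, and makes ListName a
   synonym for "pointer to struct CellName". */
#define DefCell(EltType, CellName, ListName)    \
    typedef struct CellName *ListName;          \
    struct CellName {                           \
        EltType element;                        \
        ListName next;                          \
    }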
✦ Example 1.6. We can define cells of the type used in Example 1.1 by the macro
use
DefCell(int, CELL, LIST);
The macro then expands into

typedef struct CELL *LIST;
struct CELL {
    int element;
    LIST next;
};
As a consequence, we can use CELL as the type of integer cells, and we can use LIST
as the type of pointers to these cells. For example,
struct CELL c;
LIST L;
defines c to be a cell and L to be a pointer to a cell. Note that the representation
of a list of cells is normally a pointer to the first cell on the list, or NULL if the list
is empty. ✦
1.7 Summary of Chapter 1
At this point you should be aware of the following concepts:
✦ How data models, data structures, and algorithms are used to solve problems
✦ The distinction between a list as a data model and a linked list as a data
structure
✦ The presence of some kind of data model in every software system, be it a
programming language, an operating system, or an application program
✦ The key elements of the data model supported by the programming language
C
✦ The major steps in the development of a large software system
1.8 Bibliographic Notes for Chapter 1
Kernighan and Ritchie [1988] is the classic reference for the C programming lan-
guage. Roberts [1994] is a good introduction to programming using C.
Stroustrup [1991] has created an object-oriented extension of C called C++
that is now widely used for implementing systems. Sethi [1989] provides an intro-
duction to the data models of several major programming languages.
Brooks [1974] eloquently describes the technical and managerial difficulties in
developing large software systems. Kernighan and Plauger [1978] provide sound
advice for improving your programming style.
American National Standards Institute (ANSI) [1990]. Programming Language C,
American National Standards Institute, New York.
Brooks, F. P. [1974]. The Mythical Man Month, Addison-Wesley, Reading, Mass.
Kernighan, B. W., and P. J. Plauger [1978]. The Elements of Programming Style,
second edition, McGraw-Hill, New York.
Kernighan, B. W., and D. M. Ritchie [1988]. The C Programming Language, second
edition, Prentice-Hall, Englewood Cliffs, New Jersey.
Roberts, E. S. [1994]. A C-Based Introduction to Computer Science, Addison-
Wesley, Reading, Mass.
Sethi, R. [1989]. Programming Languages: Concepts and Constructs, Addison-
Wesley, Reading, Mass.
Stroustrup, B. [1991]. The C++ Programming Language, second edition, Addison-
Wesley, Reading, Mass.
CHAPTER 2

Iteration, Induction, and Recursion
The power of computers comes from their ability to execute the same task, or
different versions of the same task, repeatedly. In computing, the theme of iteration
is met in a number of guises. Many concepts in data models, such as lists, are forms
of repetition, as “A list either is empty or is one element followed by another, then
another, and so on.” Programs and algorithms use iteration to perform repetitive
jobs without requiring a large number of similar steps to be specified individually,
as “Do the next step 1000 times.” Programming languages use looping constructs,
like the while- and for-statements of C, to implement iterative algorithms.
Closely related to repetition is recursion, a technique in which a concept is
defined, directly or indirectly, in terms of itself. For example, we could have defined
a list by saying “A list either is empty or is an element followed by a list.” Recursion
is supported by many programming languages. In C, a function F can call itself,
either directly from within the body of F itself, or indirectly by calling some other
function, which calls another, and another, and so on, until finally some function
in the sequence calls F . Another important idea, induction, is closely related to
“recursion” and is used in many mathematical proofs.
Iteration, induction, and recursion are fundamental concepts that appear in
many forms in data models, data structures, and algorithms. The following list
gives some examples of uses of these concepts; each will be covered in some detail
in this book.
which says that the sum of the integers from 1 to n equals n(n + 1)/2. The basis
could be S(1) — that is, Equation (2.1) with 1 in place of n — which is just the
equality 1 = 1 × 2/2. The inductive step is to show that $\sum_{i=1}^{n} i = n(n+1)/2$
implies that $\sum_{i=1}^{n+1} i = (n+1)(n+2)/2$; the former is S(n), which is Equation
(2.1) itself, while the latter is S(n + 1), which is Equation (2.1) with n + 1
replacing n everywhere n appears. Section 2.3 will show us how to construct
proofs such as this.
4. Proofs of program correctness. In computer science, we often wish to prove,
formally or informally, that a statement S(n) about a program is true. The
statement S(n) might, for example, describe what is true on the nth iteration
of some loop or what is true for the nth recursive call to some function. Proofs
of this sort are generally inductive proofs.
5. Inductive definitions. Many important concepts of computer science, especially
those involving data models, are best defined by an induction in which we give
a basis rule defining the simplest example or examples of the concept, and an
inductive rule or rules, where we build larger instances of the concept from
smaller ones. For instance, we noted that a list can be defined by a basis rule
(an empty list is a list) together with an inductive rule (an element followed
by a list is also a list).
6. Analysis of running time. An important criterion for the “goodness” of an
algorithm is how long it takes to run on inputs of various sizes (its “running
time”). When the algorithm involves recursion, we use a formula called a
recurrence equation, which is an inductive definition that predicts how long the
algorithm takes to run on inputs of different sizes.
Each of these subjects, except the last, is introduced in this chapter; the running
time of a program is the topic of Chapter 3.
2.1 What This Chapter Is About
In this chapter we meet the following major concepts.
✦ Iterative programming (Section 2.2)
✦ Inductive proofs (Sections 2.3 and 2.4)
✦ Inductive definitions (Section 2.6)
✦ Recursive programming (Sections 2.7 and 2.8)
✦ Proving the correctness of a program (Sections 2.5 and 2.9)
In addition, we spotlight, through examples of these concepts, several interesting
and important ideas from computer science. Among these are
✦ Sorting algorithms, including selection sort (Section 2.2) and merge sort (Sec-
tion 2.8)
✦ Parity checking and detection of errors in data (Section 2.3)
✦ Arithmetic expressions and their transformation using algebraic laws (Sections
2.4 and 2.6)
✦ Balanced parentheses (Section 2.6)
2.2 Iteration
Each beginning programmer learns to use iteration, employing some kind of looping
construct such as the for- or while-statement of C. In this section, we present an
example of an iterative algorithm, called “selection sort.” In Section 2.5 we shall
prove by induction that this algorithm does indeed sort, and we shall analyze its
running time in Section 3.6. In Section 2.8, we shall show how recursion can help
us devise a more efficient sorting algorithm using a technique called “divide and
conquer.”
Sorting
To sort a list of n elements we need to permute the elements of the list so that they
appear in nondecreasing order.
✦ Example 2.1. Suppose we are given the list of integers (3, 1, 4, 1, 5, 9, 2, 6, 5).
We sort this list by permuting it into the sequence (1, 1, 2, 3, 4, 5, 5, 6, 9). Note that
sorting not only orders the values so that each is either less than or equal to the one
that follows, but it also preserves the number of occurrences of each value. Thus,
the sorted list has two 1’s, two 5’s, and one each of the numbers that appear once
in the original list. ✦
We can sort a list of elements of any type as long as the elements have a “less-
than” order defined on them, which we usually represent by the symbol <. For
example, if the values are real numbers or integers, then the symbol < stands for the
usual less-than relation on reals or integers, and if the values are character strings,
we would use the lexicographic order on strings. (See the box on “Lexicographic
Order.”) Sometimes when the elements are complex, such as structures, we might
use only a part of each element, such as one particular field, for the comparison.
The comparison a ≤ b means, as always, that either a < b or a and b are the
same value. A list (a1, a2, . . . , an) is said to be sorted if a1 ≤ a2 ≤ · · · ≤ an; that
is, if the values are in nondecreasing order. Sorting is the operation of taking an
arbitrary list (a1, a2, . . . , an) and producing a list (b1, b2, . . . , bn) such that the list
(b1, b2, . . . , bn) is sorted and is a permutation of the original list.
Lexicographic Order
The usual way in which two character strings are compared is according to their
lexicographic order. Let c1 c2 · · · ck and d1 d2 · · · dm be two strings, where each of the
c’s and d’s represents a single character. The lengths of the strings, k and m, need
not be the same. We assume that there is a < ordering on characters; for example,
in C characters are small integers, so character constants and variables can be used
as integers in arithmetic expressions. Thus we can use the conventional < relation
on integers to tell which of two characters is “less than” the other. This ordering
includes the natural notion that lower-case letters appearing earlier in the alphabet
are “less than” lower-case letters appearing later in the alphabet, and the same
holds for upper-case letters.
We may then define the ordering on character strings called the lexicographic,
dictionary, or alphabetic ordering, as follows. We say c1 c2 · · · ck < d1 d2 · · · dm if
either of the following holds:
1. The first string is a proper prefix of the second, which means that k < m and
for i = 1, 2, . . . , k we have ci = di . According to this rule, bat < batter. As
a special case of this rule, we could have k = 0, in which case the first string
has no characters in it. We shall use ε, the Greek letter epsilon, to denote the
empty string, the string with zero characters. When k = 0, rule (1) says that
ε < s for any nonempty string s.
2. For some value of i > 0, the first i − 1 characters of the two strings agree, but
the ith character of the first string is less than the ith character of the second
string. That is, cj = dj for j = 1, 2, . . . , i − 1, and ci < di . According to
this rule, ball < base, because the two words first differ at position 3, and at
that position ball has an l, which precedes the character s found in the third
position of base.
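The two rules translate directly into code; a sketch in C for character strings (our own code, essentially the logic of the standard strcmp):

#include <stdio.h>

/* Return 1 (TRUE) if string s precedes string t in
   lexicographic order, and 0 (FALSE) otherwise. */
int precedes(const char *s, const char *t)
{
    int i = 0;

    /* skip the longest common prefix of s and t */
    while (s[i] != '\0' && t[i] != '\0' && s[i] == t[i])
        i++;
    if (s[i] == '\0' && t[i] != '\0')
        return 1;    /* rule (1): s is a proper prefix of t */
    /* rule (2): the first differing character decides */
    return s[i] != '\0' && t[i] != '\0' && s[i] < t[i];
}

int main(void)
{
    /* both tests print 1: bat < batter and ball < base */
    printf("%d %d\n", precedes("bat", "batter"), precedes("ball", "base"));
    return 0;
}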
Suppose we wish to sort the elements of an array A into nondecreasing
order. We may do so by iterating a step in which a smallest element1 not yet part of
the sorted portion of the array is found and exchanged with the element in the first
position of the unsorted part of the array. In the first iteration, we find (“select”) a
smallest element among the values found in the full array A[0..n-1] and exchange
it with A[0].2 In the second iteration, we find a smallest element in A[1..n-1] and
exchange it with A[1]. We continue these iterations. At the start of the i + 1st
iteration, A[0..i-1] contains the i smallest elements in A sorted in nondecreasing
order, and the remaining elements of the array are in no particular order. A picture
of A just before the i + 1st iteration is shown in Fig. 2.1.
sorted unsorted
↑ ↑ ↑
0 i n−1
Fig. 2.1. Picture of array just before the i + 1st iteration of selection sort.
1 We say “a smallest element” rather than “the smallest element” because there may be several
occurrences of the smallest value. If so, we shall be happy with any of those occurrences.
2 To describe a range of elements within an array, we adopt a convention from the language
Pascal. If A is an array, then A[i..j] denotes those elements of A with indexes from i to j,
inclusive.
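The walkthrough below steps through the numbered lines of the book's SelectionSort function (Fig. 2.2). A sketch consistent with those references, with the figure's line numbers shown as comments:

/* Sort elements A[0..n-1] into nondecreasing order. */
void SelectionSort(int A[], int n)
{
    int i, j, small, temp;

    for (i = 0; i < n - 1; i++) {          /* (1) */
        small = i;                         /* (2) */
        for (j = i + 1; j < n; j++)        /* (3) */
            if (A[j] < A[small])           /* (4) */
                small = j;                 /* (5) */
        temp = A[small];                   /* (6) */
        A[small] = A[i];                   /* (7) */
        A[i] = temp;                       /* (8) */
    }
}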
Lines (6) through (8) perform the swap: we copy the value
in A[small] to temp at line (6), move the value in A[i] to A[small] at line (7),
and finally move the value originally in A[small] from temp to A[i] at line (8).
We begin the outer loop with i = 0, and at line (2) we set small to 0. Lines (3) to
(5) form an inner loop, in which j is set to 1, 2, and 3, in turn. With j = 1, the
test of line (4) succeeds, since A[1], which is 30, is less than A[small], which is A[0],
or 40. Thus, we set small to 1 at line (5). At the second iteration of lines (3) to
(5), with j = 2, the test of line (4) again succeeds, since A[2] < A[1], and so we set
small to 2 at line (5). At the last iteration of lines (3) to (5), with j = 3, the test
of line (4) succeeds, since A[3] < A[2], and we set small to 3 at line (5).
We now fall out of the inner loop to line (6). We set temp to 10, which is
A[small], then A[3] to A[0], or 40, at line (7), and A[0] to 10 at line (8). Now, the
Sorting on Keys
When we sort, we apply a comparison operation to the values being sorted. Often
the comparison is made only on specific parts of the values and the part used in the
comparison is called the key.
For example, a course roster might be an array A of C structures of the form
struct STUDENT {
    int studentID;
    char *name;
    char grade;
} A[MAX];
We might want to sort by student ID, or name, or grade; each in turn would be
the key. For example, if we wish to sort structures by student ID, we would use the
comparison
A[j].studentID < A[small].studentID
at line (4) of SelectionSort. The type of array A and temporary temp used in the
swap would be struct STUDENT, rather than integer. Note that entire structures
are swapped, not just the key fields.
Since it is time-consuming to swap whole structures, a more efficient approach
is to use a second array of pointers to STUDENT structures and sort only the pointers
in the second array. The structures themselves remain stationary in the first array.
We leave this version of selection sort as an exercise.
The second iteration of the outer loop, with i = 1, sets small to 1 at line (2).
The inner loop sets j to 2 initially, and since A[2] < A[1], line (5) sets small to 2.
With j = 3, the test of line (4) fails, since A[3] ≥ A[2]. Hence, small = 2 when we
reach line (6). Lines (6) to (8) swap A[1] with A[2], leaving the array
0 1 2 3
A 10 20 30 40
Although the array now happens to be sorted, we still iterate the outer loop once
more, with i = 2. We set small to 2 at line (2), and the inner loop is executed only
with j = 3. Since the test of line (4) fails, small remains 2, and at lines (6) through
(8), we “swap” A[2] with itself. The reader should check that the swapping has no
effect when small = i. ✦
Figure 2.3 shows how the function SelectionSort can be used in a complete
program to sort a sequence of n integers, provided that n ≤ 100. Line (1) reads and
stores n integers in an array A. If the number of inputs exceeds MAX, only the first
MAX integers are put into A. A message warning the user that the number of inputs
is too large would be useful here, but we omit it.
Line (3) calls SelectionSort to sort the array. Lines (4) and (5) print the
integers in sorted order.
#include <stdio.h>
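/* The remainder of this program is a sketch consistent with the
   surrounding discussion of lines (1) through (5); the details of
   the original Fig. 2.3 may differ. */

#define MAX 100

int A[MAX];

void SelectionSort(int A[], int n);    /* as sketched above */

int main(void)
{
    int i, n;

    /* read and store at most MAX integers in A */
    n = 0;
    while (n < MAX && scanf("%d", &A[n]) == 1)    /* (1) */
        n++;                                      /* (2) */
    SelectionSort(A, n);                          /* (3) */
    for (i = 0; i < n; i++)                       /* (4) */
        printf("%d\n", A[i]);                     /* (5) */
    return 0;
}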
EXERCISES
2.2.3: Write a C function that takes two linked lists of characters as arguments and
returns TRUE if the first string precedes the second in lexicographic order. Hint :
Implement the algorithm for comparing character strings that was described in
this section. Use recursion by having the function call itself on the tails of the
character strings when it finds that the first characters of both strings are the same.
Alternatively, one can develop an iterative algorithm to do the same.
2.2.4*: Modify your program from Exercise 2.2.3 to ignore the case of letters in
comparisons.
2.2.5: What does selection sort do if all elements are the same?
2.2.6: Modify Fig. 2.3 to perform selection sort when array elements are not inte-
gers, but rather structures of type struct STUDENT, as defined in the box “Sorting
on Keys.” Suppose that the key field is studentID.
2.2.7*: Further modify Fig. 2.3 so that it sorts elements of an arbitrary type T .
You may assume, however, that there is a function key that takes an element of type
T as argument and returns the key for that element, of some arbitrary type K. Also
assume that there is a function lt that takes two elements of type K as arguments
and returns TRUE if the first is “less than” the second, and FALSE otherwise.
2.2.8: Instead of using integer indexes into the array A, we could use pointers to
integers to indicate positions in the array. Rewrite the selection sort algorithm of
Fig. 2.3 using pointers.
2.2.9*: As mentioned in the box on “Sorting on Keys,” if the elements to be sorted
are large structures such as type STUDENT, it makes sense to leave them stationary
in an array and sort pointers to these structures, found in a second array. Write
this variation of selection sort.
2.2.10: Write an iterative program to print the distinct elements of an integer array.
2.2.11: Use the Σ and Π notations described at the beginning of this chapter to
express the following.
a) The sum of the odd integers from 1 to 377
b) The sum of the squares of the even integers from 2 to n (assume that n is even)
c) The product of the powers of 2 from 8 to 2k
2.2.12: Show that when small = i, lines (6) through (8) of Fig. 2.2 (the swapping
steps) do not have any effect on array A.
2.3 Inductive Proofs
Mathematical induction is a useful technique for proving that a statement S(n) is
true for all nonnegative integers n, or, more generally, for all integers at or above
some lower limit. For example, in the introduction to this chapter we suggested
that the statement $\sum_{i=1}^{n} i = n(n+1)/2$ can be proved true for all n ≥ 1 by an
induction on n.
Now, let S(n) be some arbitrary statement about an integer n. In the simplest
form of an inductive proof of the statement S(n), we prove two facts:
1. The basis case, which is frequently taken to be S(0). However, the basis can be
S(k) for any integer k, with the understanding that then the statement S(n)
is proved only for n ≥ k.
2. The inductive step, where we prove that for all n ≥ 0 [or for all n ≥ k, if the
basis is S(k)], S(n) implies S(n + 1). In this part of the proof, we assume
that the statement S(n) is true. S(n) is called the inductive hypothesis, and
assuming it to be true, we must then prove that S(n + 1) is true.
STATEMENT S(n): $\sum_{i=0}^{n} 2^i = 2^{n+1} - 1$ for any n ≥ 0.
That is, the sum of the powers of 2, from the 0th power to the nth power, is 1 less
than the (n + 1)st power of 2.3 For example, 1 + 2 + 4 + 8 = 16 − 1. The proof
proceeds as follows.
BASIS. To prove the basis, we substitute 0 for n in the equation S(n). Then S(n)
becomes
3 S(n) can be proved without induction, using the formula for the sum of a geometric series.
However, it will serve as a simple example of the technique of mathematical induction.
Further, the proofs of the formulas for the sum of a geometric or arithmetic series that you
have probably seen in high school are rather informal, and strictly speaking, mathematical
induction should be used to prove those formulas.
$\sum_{i=0}^{0} 2^i = 2^1 - 1$   (2.2)

There is only one term, for i = 0, in the summation on the left side of Equation
(2.2), so that the left side of (2.2) sums to $2^0$, or 1. The right side of Equation (2.2),
which is $2^1 - 1$, or 2 − 1, also has value 1. Thus we have proved the basis of S(n);
that is, we have shown that this equality is true for n = 0.
INDUCTION. Now we must prove the inductive step. We assume that S(n) is true,
and we prove the same equality with n + 1 substituted for n. The equation to be
proved, S(n + 1), is

$\sum_{i=0}^{n+1} 2^i = 2^{n+2} - 1$   (2.3)
To prove Equation (2.3), we begin by considering the sum on the left side,

$\sum_{i=0}^{n+1} 2^i$

This sum is almost the same as the sum on the left side of S(n), which is

$\sum_{i=0}^{n} 2^i$

except that (2.3) also has a term for i = n + 1, that is, the term $2^{n+1}$.
Since we are allowed to assume that the inductive hypothesis S(n) is true in
our proof of Equation (2.3), we should contrive to use S(n) to advantage. We do so
by breaking the sum in (2.3) into two parts, one of which is the sum in S(n). That
is, we separate out the last term, where i = n + 1, and write
$\sum_{i=0}^{n+1} 2^i = \sum_{i=0}^{n} 2^i + 2^{n+1}$   (2.4)
Now we can make use of S(n) by substituting its right side, $2^{n+1} - 1$, for
$\sum_{i=0}^{n} 2^i$ in Equation (2.4):

$\sum_{i=0}^{n+1} 2^i = 2^{n+1} - 1 + 2^{n+1}$   (2.5)

The right side of Equation (2.5) simplifies to $2 \times 2^{n+1} - 1$, which is
$2^{n+2} - 1$, so Equation (2.3), which is S(n + 1), follows.
We then literally substitute the desired expression, n + 1 in this case, for each
occurrence of m. That gives us

$\sum_{i=0}^{n+1} 2^i = 2^{(n+1)+1} - 1$
each of our “proofs” that induction works requires an inductive proof itself, and
therefore is no proof at all. Technically, induction must be accepted as axiomatic.
Nevertheless, many people find the following intuition useful.
In what follows, we assume that the basis value is n = 0. That is, we know
that S(0) is true and that for all n greater than 0, if S(n) is true, then S(n + 1) is
true. Similar arguments work if the basis value is any other integer.
First “proof”: Iteration of the inductive step. Suppose we want to show that
S(a) is true for a particular nonnegative integer a. If a = 0, we just invoke the
truth of the basis, S(0). If a > 0, then we argue as follows. We know that S(0) is
true, from the basis. The statement “S(n) implies S(n + 1),” with 0 in place of n,
says “S(0) implies S(1).” Since we know that S(0) is true, we now know that S(1)
is true. Similarly, if we substitute 1 for n, we get “S(1) implies S(2),” and so we
also know that S(2) is true. Substituting 2 for n, we have “S(2) implies S(3),” so
that S(3) is true, and so on. No matter what the value of a is, we eventually get to
S(a), and we are done.
Second “proof”: Least counterexample. Suppose S(n) were not true for at least
one value of n. Let a be the least nonnegative integer for which S(a) is false. If
a = 0, then we contradict the basis, S(0), and so a must be greater than 0. But if
a > 0, and a is the least nonnegative integer for which S(a) is false, then S(a − 1)
must be true. Now, the inductive step, with n replaced by a−1, tells us that S(a−1)
implies S(a). Since S(a−1) is true, S(a) must be true, another contradiction. Since
we assumed there were nonnegative values of n for which S(n) is false and derived
a contradiction, S(n) must therefore be true for any n ≥ 0.
Error-Detecting Codes
We shall now begin an extended example of “error-detecting codes,” a concept that
is interesting in its own right and also leads to an interesting inductive proof. When
we transmit information over a data network, we code characters (letters, digits,
punctuation, and so on) into strings of bits, that is, 0’s and 1’s. For the moment let
us assume that characters are represented by seven bits. However, it is normal to
transmit more than seven bits per character, and an eighth bit can be used to help
detect some simple errors in transmission. That is, occasionally, one of the 0’s or 1’s
gets changed because of noise during transmission, and is received as the opposite
bit; a 0 entering the transmission line emerges as a 1, or vice versa. It is useful if
the communication system can tell when one of the eight bits has been changed, so
that it can signal for a retransmission.
To detect changes in a single bit, we must be sure that no two characters are
represented by sequences of bits that differ in only one position. For then, if that
position were changed, the result would be the code for the other character, and
we could not detect that an error had occurred. For example, if the code for one
character is the sequence of bits 01010101, and the code for another is 01000101,
then a change in the fourth position from the left turns the former into the latter.
One way to be sure that no characters have codes that differ in only one position
is to precede the conventional 7-bit code for the character by a parity bit. If the
total number of 1’s in a group of bits is odd, the group is said to have odd parity. If
the number of 1’s in the group is even, then the group has even parity. The coding
scheme we select is to represent each character by an 8-bit code with even parity;
we could as well have chosen to use only the codes with odd parity. We force the
parity to be even by selecting the parity bit judiciously.
✦ Example 2.5. The conventional ASCII (pronounced “ask-ee”; it stands for
“American Standard Code for Information Interchange”) 7-bit code for the charac-
ter A is 1000001. That sequence of seven bits already has an even number of 1’s,
and so we prefix it by 0 to get 01000001. The conventional code for C is 1000011,
which differs from the 7-bit code for A only in the sixth position. However, this
code has odd parity, and so we prefix a 1 to it, yielding the 8-bit code 11000011
with even parity. Note that after prefixing the parity bits to the codes for A and C,
we have 01000001 and 11000011, which differ in two positions, namely the first and
seventh, as seen in Fig. 2.5. ✦
A: 0 1 0 0 0 0 0 1
C: 1 1 0 0 0 0 1 1
Fig. 2.5. We can choose the initial parity bit so the 8-bit code always has even parity.
We can always pick a parity bit to attach to a 7-bit code so that the number of
1’s in the 8-bit code is even. We pick parity bit 0 if the 7-bit code for the character
at hand has even parity, and we pick parity bit 1 if the 7-bit code has odd parity.
In either case, the number of 1’s in the 8-bit code is even.
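As a concrete illustration, here is a small C sketch of the parity computation (the
function name make_even_parity and the choice to store the parity bit as the leftmost
bit of an unsigned char are our own, not from the text):

    /* Given a 7-bit character code c (0 <= c < 128), return the 8-bit
       code with a parity bit prefixed so the number of 1's is even. */
    unsigned char make_even_parity(unsigned char c)
    {
        int ones = 0;
        unsigned char b;
        for (b = c; b != 0; b >>= 1)   /* count the 1's in c */
            ones += b & 1;
        if (ones % 2 == 1)             /* c has odd parity; prefix a 1 */
            return c | 0x80;
        else                           /* c has even parity; prefix a 0 */
            return c;
    }

For example, make_even_parity(0x41) leaves the code for A as 01000001, while
make_even_parity(0x43) turns the code for C into 11000011, in agreement with
Fig. 2.5.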
No two sequences of bits that each have even parity can differ in only one
position. For if two such bit sequences differ in exactly one position, then one has
exactly one more 1 than the other. Thus, one sequence must have odd parity and
the other even parity, contradicting our assumption that both have even parity. We
conclude that addition of a parity bit to make the number of 1’s even serves to
create an error-detecting code for characters.
The parity-bit scheme is quite “efficient,” in the sense that it allows us to
transmit many different characters. Note that there are 2^n different sequences of n
bits, since we may choose either of two values (0 or 1) for the first position, either
of two values for the second position, and so on, a total of 2 × 2 × · · · × 2 (n factors)
possible strings. Thus, we might expect to be able to represent up to 2^8 = 256
characters with eight bits.
However, with the parity scheme, we can choose only seven of the bits; the
eighth is then forced upon us. We can thus represent up to 2^7, or 128, characters,
and still detect single errors. That is not so bad; we can use 128/256, or half, of
the possible 8-bit codes as legal codes for characters, and still detect an error in one
bit.
Similarly, if we use sequences of n bits, choosing one of them to be the parity
bit, we can represent 2^{n−1} characters by taking sequences of n − 1 bits and prefixing
the suitable parity bit, whose value is determined by the other n − 1 bits. Since
there are 2^n sequences of n bits, we can use 2^{n−1} of the 2^n possible sequences, or half
the possible number of characters, and still detect an error in any one of the bits of a sequence.
Is it possible to detect errors and use more than half the possible sequences of
bits as legal codes? Our next example tells us we cannot. The inductive proof uses
a statement that is not true for 0, and for which we must choose a larger basis,
namely 1.
STATEMENT S(n): If C is any set of bit strings of length n that is error detecting
(i.e., if there are no two strings that differ in exactly one position), then C
contains at most 2^{n−1} strings.
This statement is not true for n = 0. S(0) says that any error-detecting set of strings
of length 0 has at most 2^{−1} strings, that is, half a string. Technically, the set C
consisting of only the empty string (string with no positions) is an error-detecting
set of length 0, since there are no two strings in C that differ in only one position.
Set C has more than half a string; it has one string, to be exact. Thus, S(0) is false.
However, for all n ≥ 1, S(n) is true, as we shall see.
BASIS. The basis is S(1); that is, any error-detecting set of strings of length one has
at most 2^{1−1} = 2^0 = 1 string. There are only two bit strings of length one, the string
0 and the string 1. However, we cannot have both of them in an error-detecting
set, because they differ in exactly one position. Thus, every error-detecting set for
n = 1 must have at most one string.
INDUCTION. Assume that S(n) is true for some n ≥ 1. We shall show, using this
assumption, that any error-detecting set C of strings with length n + 1 has at most
2^n strings. Thus, divide C into two sets, C0 , the set of strings in
C that begin with 0, and C1 , the set of strings in C that begin with 1. For instance,
suppose n = 2 and C is the code with strings of length n + 1 = 3 constructed using
a parity bit. Then, as shown in Fig. 2.6, C consists of the strings 000, 101, 110, and
011; C0 consists of the strings 000 and 011, and C1 has the other two strings, 101
and 110.
C0 = {000, 011}; deleting the leading 0’s gives D0 = {00, 11}.
C1 = {101, 110}; deleting the leading 1’s gives D1 = {01, 10}.
Fig. 2.6. The set C is split into C0 , the strings beginning with 0, and C1 ,
the strings beginning with 1. D0 and D1 are formed by
deleting the leading 0’s and 1’s, respectively.
Consider the set D0 consisting of those strings in C0 with the leading 0 removed.
In our example above, D0 contains the strings 00 and 11. We claim that D0 cannot
have two strings differing in only one bit. The reason is that if there are two such
strings — say a1 a2 · · · an and b1 b2 · · · bn — then restoring their leading 0’s gives us
two strings in C0 , 0a1 a2 · · · an and 0b1 b2 · · · bn , and these strings would differ in only
one position as well. But strings in C0 are also in C, and we know that C does not
have two strings that differ in only one position. Thus, neither does D0 , and so D0
is an error-detecting set.
Now we can apply the inductive hypothesis to conclude that D0 , being an
error-detecting set with strings of length n, has at most 2^{n−1} strings. Thus, C0 has
at most 2^{n−1} strings.
We can reason similarly about the set C1 . Let D1 be the set of strings in C1 ,
with their leading 1’s deleted. D1 is an error-detecting set with strings of length
n, and by the inductive hypothesis, D1 has at most 2^{n−1} strings. Thus, C1 has at
most 2^{n−1} strings. However, every string in C is in either C0 or C1 . Therefore, C
has at most 2^{n−1} + 2^{n−1}, or 2^n strings.
We have proved that S(n) implies S(n + 1), and so we may conclude that S(n)
is true for all n ≥ 1. We exclude n = 0 from the claim, because the basis is n = 1,
not n = 0. We now see that the error-detecting sets constructed by parity check
are as large as possible, since they have exactly 2^{n−1} strings when strings of n bits
are used. ✦
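The error-detecting property itself is easy to test mechanically. The following C
sketch (the function names and the representation of codewords as unsigned integers
are our own choices) checks whether a set of codewords is error detecting:

    /* Two codewords differ in exactly one position exactly when their
       XOR has a single 1 bit, i.e., is a nonzero power of 2. */
    int differ_in_one_bit(unsigned x, unsigned y)
    {
        unsigned d = x ^ y;
        return d != 0 && (d & (d - 1)) == 0;
    }

    /* Return 1 if the m codewords in code[] form an error-detecting set. */
    int is_error_detecting(unsigned code[], int m)
    {
        int i, j;
        for (i = 0; i < m; i++)
            for (j = i + 1; j < m; j++)
                if (differ_in_one_bit(code[i], code[j]))
                    return 0;
        return 1;
    }

Applied to the parity code of Fig. 2.6, the codewords 000, 011, 101, and 110 (that
is, {0, 3, 5, 6}), is_error_detecting returns 1, as the proof guarantees.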
EXERCISES
2.3.1: Show the following formulas by induction on n starting at n = 1.
a) \sum_{i=1}^{n} i = n(n + 1)/2.

b) \sum_{i=1}^{n} i^2 = n(n + 1)(2n + 1)/6.

c) \sum_{i=1}^{n} i^3 = n^2(n + 1)^2/4.

d) \sum_{i=1}^{n} 1/(i(i + 1)) = n/(n + 1).
2.3.2: Numbers of the form tn = n(n + 1)/2 are called triangular numbers, because
marbles arranged in an equilateral triangle, n on a side, will total \sum_{i=1}^{n} i marbles,
which we saw in Exercise 2.3.1(a) is tn marbles. For example, bowling pins are
arranged in a triangle 4 on a side, and there are t4 = 4 × 5/2 = 10 pins. Show by
induction on n that \sum_{j=1}^{n} tj = n(n + 1)(n + 2)/6.
2.3.3: Identify the parity of each of the following bit sequences as even or odd:
a) 01101
b) 111000111
c) 010101
2.3.4: Suppose we use three digits — say 0, 1, and 2 — to code symbols. A set
of strings C formed from 0’s, 1’s, and 2’s is error detecting if no two strings in C
differ in only one position. For example, {00, 11, 22} is an error-detecting set with
strings of length two, using the digits 0, 1, and 2. Show that for any n ≥ 1, an
error-detecting set of strings of length n using the digits 0, 1, and 2, cannot have
more than 3^{n−1} strings.
2.3.5*: Show that for any n ≥ 1, there is an error-detecting set of strings of length
n, using the digits 0, 1, and 2, that has 3^{n−1} strings.
2.3.6*: Show that if we use k symbols, for any k ≥ 2, then there is an error-
detecting set of strings of length n, using k different symbols as “digits,” with k^{n−1}
strings, but no such set of strings with more than k^{n−1} strings.
2.3.7*: If n ≥ 1, the number of strings using the digits 0, 1, and 2, with no two
consecutive places holding the same digit, is 3 × 2^{n−1}. For example, there are 12
such strings of length three: 010, 012, 020, 021, 101, 102, 120, 121, 201, 202, 210,
and 212. Prove this claim by induction on the length of the strings. Is the formula
true for n = 0?
2.3.8*: Prove that the ripple-carry addition algorithm discussed in Section 1.3
produces the correct answer. Hint : Show by induction on i that after considering
the first i places from the right end, the sum of the tails of length i for the two
addends equals the number whose binary representation is the carry bit followed by
the i bits of answer generated so far.
2.3.9*: The formula for the sum of n terms of a geometric series a, ar, ar^2, . . . , ar^{n−1}
is

\sum_{i=0}^{n−1} ar^i = (ar^n − a)/(r − 1)

Prove this formula by induction on n. Note that you must assume r ≠ 1 for the
formula to hold. Where do you use that assumption in your proof?
2.3.10: The formula for the sum of an arithmetic series with first term a and
increment b, that is, a, (a + b), (a + 2b), . . . , (a + (n − 1)b), is

\sum_{i=0}^{n−1} (a + bi) = n(2a + (n − 1)b)/2

Prove this formula by induction on n.
2.3.11: Give two informal proofs that induction starting at 1 “works,” although
the statement S(0) may be false.
2.3.12: Show by induction on the length of strings that the code consisting of the
odd-parity strings detects errors.
For example, consider the sum of 3 + 5 + 7 + 9 + 11. There are n = 5 terms, the
first is 3 and the last 11. Thus, the sum is 5 × (3 + 11)/2 = 5 × 7 = 35. You can
check that this sum is correct by adding the five integers.
A geometric series is a sequence of n numbers of the form

a, ar, ar^2, ar^3, . . . , ar^{n−1}

That is, the first term is a, and each successive term is r times the previous term.
The formula for the sum of n terms of a geometric series is

\sum_{i=0}^{n−1} ar^i = (ar^n − a)/(r − 1)
Here, r can be greater or less than 1. If r = 1, the above formula does not work,
but all terms are a so the sum is obviously an.
As an example of a geometric series sum, consider 1 + 2 + 4 + 8 + 16. Here,
n = 5, the first term a is 1, and the ratio r is 2. Thus, the sum is

(1 × 2^5 − 1)/(2 − 1) = (32 − 1)/1 = 31
as you may check. For another example, consider 1 + 1/2 + 1/4 + 1/8 + 1/16. Again
n = 5 and a = 1, but r = 1/2. The sum is

(1 × (1/2)^5 − 1)/((1/2) − 1) = (−31/32)/(−1/2) = 1 15/16
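As a quick check of the formula, the following C program (our own illustration; the
values of a, r, and n are arbitrary) compares a term-by-term sum with the closed
form:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double a = 1.0, r = 2.0, term = a, sum = 0.0;
        int i, n = 5;
        for (i = 0; i < n; i++) {   /* accumulate a*r^i for i = 0,...,n-1 */
            sum += term;
            term *= r;
        }
        printf("term by term: %g   formula: %g\n",
               sum, (a * pow(r, n) - a) / (r - 1.0));
        return 0;
    }

Both values print as 31 for the series 1 + 2 + 4 + 8 + 16; changing r to 0.5 gives
1.9375, that is, 1 15/16.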
2.3.13**: If no two strings in a code differ in fewer than three positions, then we
can actually correct a single error, by finding the unique string in the code that
differs from the received string in only one position. It turns out that there is a
code of 7-bit strings that corrects single errors and contains 16 strings. Find such a
code. Hint : Reasoning it out is probably best, but if you get stuck, write a program
that searches for such a code.
2.3.14*: Does the even parity code detect any “double errors,” that is, changes in
two different bits? Can it correct any single errors?
✦
✦ ✦
✦
2.4 Complete Induction
In the examples seen so far, we have proved that S(n + 1) is true using only S(n)
as an inductive hypothesis. However, since we prove our statement S for values of
its parameter starting at the basis value and proceeding upward, we are entitled to
use S(i) for all values of i, from the basis value up to n. This form of induction is
called complete (or sometimes perfect or strong) induction, while the simple form of
induction of Section 2.3, where we used only S(n) to prove S(n + 1), is sometimes
called weak induction.
Let us begin by considering how to perform a complete induction starting with
basis n = 0. We prove that S(n) is true for all n ≥ 0 in two steps:

1. We prove the basis, S(0).

2. As an inductive step, we assume all of S(0), S(1), . . . , S(n) to be true, and
from these statements prove S(n + 1).
As for weak induction described in the previous section, we can also pick some
value a other than 0 as the basis. Then, for the basis we prove S(a), and in the
inductive step we are entitled to assume only S(a), S(a + 1), . . . , S(n). Note that
weak induction is a special case of complete induction in which we elect not to use
any of the previous statements except S(n) to prove S(n + 1).
Figure 2.7 suggests how complete induction works. Each instance of the state-
ment S(n) can (optionally) use any of the lower-indexed instances to its right in its
proof.
Fig. 2.7. Complete induction allows each instance to use one, some, or
all of the previous instances in its proof.
✦ Example 2.7. Our first example of a complete induction is a simple one that
uses multiple basis cases. As we shall see, it is only “complete” in a limited sense.
To prove S(n + 1) we do not use S(n) but we use S(n − 1) only. In more general
complete inductions to follow, we use S(n), S(n − 1), and many other instances of
the statement S.
Let us prove by induction on n the following statement for all n ≥ 0.⁴

STATEMENT S(n): There are integers a and b (positive, negative, or 0) such that
n = 2a + 3b.

BASIS. We take both S(0) and S(1) as basis cases. For S(0), choose a = 0 and
b = 0; surely 0 = 2 × 0 + 3 × 0. For S(1), choose a = −1 and b = 1; then
1 = 2 × (−1) + 3 × 1.
INDUCTION. Now, we may assume S(n) and prove S(n + 1), for any n ≥ 1. Note
that we may assume n is at least the largest of the consecutive values for which we
have proved the basis: n ≥ 1 here. Statement S(n + 1) says that n + 1 = 2a + 3b
for some integers a and b.
The inductive hypothesis says that all of S(0), S(1), . . . , S(n) are true. Note
that we begin the sequence at 0 because that was the lowest of the consecutive basis
cases. Since n ≥ 1 can be assumed, we know that n − 1 ≥ 0, and therefore, S(n − 1)
is true. This statement says that there are integers a and b such that n−1 = 2a+3b.
4 Actually, this statement is true for all n, positive or negative, but the case of negative n
requires a second induction which we leave as an exercise.
Since we need a in the statement S(n+1), let us restate S(n−1) to use different
names for the integers and say there are integers a′ and b′ such that
n − 1 = 2a′ + 3b′ (2.6)
If we add 2 to both sides of (2.6), we have n + 1 = 2(a′ + 1) + 3b′ . If we then let
a = a′ + 1 and b = b′ , we have the statement n + 1 = 2a + 3b for some integers a
and b. This statement is S(n + 1), so we have proved the induction. Notice that in
this proof, we did not use S(n), but we did use S(n − 1). ✦
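The proof translates directly into a program. Here is a minimal C sketch (the
function name two_three and the use of pointer out-parameters are our own choices)
that mirrors it: the two basis cases handle n = 0 and n = 1, and the inductive case
obtains a and b for n − 2 and then adds 1 to a, just as the proof adds 2 to both
sides of (2.6):

    /* Find integers a and b with n = 2a + 3b, for any n >= 0. */
    void two_three(int n, int *a, int *b)
    {
        if (n == 0) {             /* basis S(0): 0 = 2*0 + 3*0 */
            *a = 0; *b = 0;
        }
        else if (n == 1) {        /* basis S(1): 1 = 2*(-1) + 3*1 */
            *a = -1; *b = 1;
        }
        else {                    /* induction: n = (n-2) + 2 */
            two_three(n - 2, a, b);
            *a = *a + 1;          /* 2(a+1) + 3b = (n-2) + 2 = n */
        }
    }

For example, two_three(7, &a, &b) yields a = 2 and b = 1, since 7 = 2 × 2 + 3 × 1.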
We shall prove this claim by performing two separate inductions, the first of which
is a complete induction.
INDUCTION. Let E have n + 1 operands, and assume that S(i) is true for i =
2, 3, . . . , n. We need to prove the inductive step for n ≥ 2, so we may assume
that E has at least three operands and therefore at least two occurrences of +.
We can write E as E1 + E2 for some expressions E1 and E2 . Since E has exactly
n + 1 operands, and E1 and E2 must each have at least one of these operands, it
follows that neither E1 nor E2 can have more than n operands. Thus, the inductive
hypothesis applies to E1 and E2 , as long as they have more than one operand each
(because we started with n = 2 as the basis). There are four cases we must consider,
depending on whether a is in E1 or E2 , and on whether it is or is not the only operand
in E1 or E2 .
a) E1 is a by itself. An example of this case occurs when E is a + (b + c); here E1
is a and E2 is b + c. In this case, E2 serves as F ; that is, E is already of the
form a + F .
b) E1 has more than one operand, and a is among them. For instance,
E = c + (d + a) + (b + e)
In all four cases, we have transformed E to the desired form. Thus, the inductive
step is proved, and we conclude that S(n) holds for all n ≥ 2. ✦
✦ Example 2.9. The inductive proof of Example 2.8 leads directly to an algo-
rithm that puts an expression into the desired form. As an example, consider the
expression
E = (x + (z + v)) + (w + y)
and suppose that v is the operand we wish to “pull out,” that is, to play the role of
a in the transformation of Example 2.8. Initially, we have an example of case (b),
with E1 = x + (z + v), and E2 = w + y.
Next, we must work on the expression E1 and “pull out” v. E1 is an example of
case (d), and so we first apply the commutative law to transform it into (z + v) + x.
As an instance of case (b), we must work on the expression z + v, which is an
instance of case (c). We thus transform it by the commutative law into v + z.
Now E1 has been transformed into (v+z)+x, and a further use of the associative
law transforms it to v+(z+x). That, in turn, transforms E into (v+(z+x))+(w+y).
By the associative law, E can be transformed into v+((z+x)+(w+y)). Thus,
E = v + F , where F is the expression (z + x) + (w + y). The entire sequence of
transformations is summarized in Fig. 2.8. ✦
Now, we can use the statement proved in Example 2.8 to prove our original
contention, that any two expressions involving the operator + and the same list
of distinct operands can be transformed one to the other by the associative and
commutative laws. This proof is by weak induction, as discussed in Section 2.3,
rather than complete induction.
(x + (z + v)) + (w + y)
((z + v) + x) + (w + y)
((v + z) + x) + (w + y)
(v + (z + x)) + (w + y)
v + ((z + x) + (w + y))
Fig. 2.8. Using the commutative and associative laws, we can “pull out”
any operand, such as v.
STATEMENT T (n): If E and F are expressions involving the operator + and the
same set of n distinct operands, then it is possible to transform E into F by
a sequence of applications of the associative and commutative laws.
BASIS. If n = 1, then the two expressions must both be a single operand a. Since
they are the same expression, surely E is “transformable” into F .
INDUCTION. Suppose T (n) is true, for some n ≥ 1. We shall now prove T (n + 1).
Let E and F be expressions involving the same set of n+1 operands, and let a be one
of these operands. Since n + 1 ≥ 2, S(n + 1) — the statement from Example 2.8 —
must hold. Thus, we can transform E into a + E1 for some expression E1 involving
the other n operands of E. Similarly, we can transform F into a + F1 , for some
expression F1 involving the same n operands as E1 . What is more important, in
this case, is that we can also perform the transformations in the opposite direction,
transforming a + F1 into F by use of the associative and commutative laws.
Now we invoke the inductive hypothesis T (n) on the expressions E1 and F1 .
Each has the same n operands, and so the inductive hypothesis applies. That tells
us we can transform E1 into F1 , and therefore we can transform a + E1 into a + F1 .
We may thus perform the transformations
E → · · · → a + E1 Using S(n + 1)
→ · · · → a + F1 Using T (n)
→ ··· → F Using S(n + 1) in reverse
to turn E into F . ✦
✦ Example 2.11. Let us transform E = (x + y) + (w + z) into F = ((w + z) + y) + x.
We begin by selecting an operand, say w, to “pull out.” If we check the cases in
Example 2.8, we see that for E we perform the sequence of transformations
(x + y) + (w + z) → (w + z) + (x + y) → w + (z + (x + y))    (2.7)
while for F we do
((w + z) + y) + x → (w + (z + y)) + x → w + ((z + y) + x)    (2.8)
(x + y) + (w + z)        Expression E
(w + z) + (x + y)        Middle of (2.7)
w + (z + (x + y))        End of (2.7)
w + ((x + y) + z)        Middle of (2.9)
w + (x + (y + z))        End of (2.9)
w + (x + (z + y))        Commutative law
w + ((z + y) + x)        (2.10) in reverse
(w + (z + y)) + x        Middle of (2.8) in reverse
((w + z) + y) + x        Expression F, end of (2.8) in reverse
Fig. 2.9. Transforming one expression into another using the commutative
and associative laws.
EXERCISES
2.4.1: “Pull out” from the expression E = (u + v) + w + (x + y) + z each of
the operands in turn. That is, start from E in each of the six parts, and use the
techniques of Example 2.8 to transform E into an expression of the form u + E1 .
Then transform E1 into an expression of the form v + E2 , and so on.
2.4.2: Use the technique of Example 2.10 to transform
a) w + x + (y + z) into (w + x) + y + z
b) (v + w) + (x + y) + z into (y + w) + (v + z) + x
2.4.3*: Let E be an expression with operators +, −, ∗, and /; each operator is
binary only; that is, it takes two operands. Show, using a complete induction on
the number of occurrences of operators in E, that if E has n operator occurrences,
then E has n + 1 operands.
2.4.4: Give an example of a binary operator that is commutative but not associa-
tive.
2.4.5: Give an example of a binary operator that is associative but not commuta-
tive.
2.4.6*: Consider an expression E whose operators are all binary. The length of E
is the number of symbols in E, counting an operator or a left or right parenthesis as
one symbol, and also counting any operand such as 123 or abc as one symbol. Prove
that E must have an odd length. Hint: Prove the claim by complete induction on
the length of the expression E.
2.4.7: Show that every negative integer can be written in the form 2a + 3b for some
(not necessarily positive) integers a and b.
2.4.8*: Show that every integer (positive or negative) can be written in the form
5a + 7b for some (not necessarily positive) integers a and b.
2.4.9*: Is every proof by weak induction (as in Section 2.3) also a proof by complete
induction? Is every proof by complete induction also a proof by weak induction?
2.4.10*: We showed in this section how to justify complete induction by a least
counterexample argument. Show how complete induction can also be justified by
an iteration.
Truth in Advertising
There are many difficulties, both theoretical and practical, in proving programs
correct. An obvious question is “What does it mean for a program to be ‘correct’ ?”
As we mentioned in Chapter 1, most programs in practice are written to satisfy some
informal specification. The specification itself may be incomplete or inconsistent.
Even if there were a precise formal specification, we can show that no algorithm
exists to prove that an arbitrary program is equivalent to a given specification.
However, in spite of these difficulties, it is beneficial to state and prove asser-
tions about programs. The loop invariants of a program are often the most useful
short explanation one can give of how the program works. Further, the programmer
should have a loop invariant in mind while writing a piece of code. That is, there
must be a reason why a program works, and this reason often has to do with an
inductive hypothesis that holds each time the program goes around a loop or each
time it performs a recursive call. The programmer should be able to envision a
proof, even though it may be impractical to write out such a proof line by line.
✦
✦ ✦
✦
2.5 Proving Properties of Programs
In this section we shall delve into an area where inductive proofs are essential:
proving that a program does what it is claimed to do. We shall see a technique
for explaining what an iterative program does as it goes around a loop. If we
understand what the loops do, we generally understand what we need to know
about an iterative program. In Section 2.9, we shall consider what is needed to
prove properties of recursive programs.
Loop Invariants
[Fig. 2.10, the flowchart of the inner loop: small = i; j = i+1; then the test j < n,
at a point labeled by the loop-invariant statement S(k); if the test succeeds, the
loop body of lines (4) and (5) executes, j = j+1, and control returns to the test;
if it fails, the loop exits.]
Recall that the purpose of these lines is to make small equal to the index of an
element of A[i..n-1] with the smallest value. To see why that claim is true,
consider the flowchart for our loop shown in Fig. 2.10. This flowchart shows the
five steps necessary to execute the program:
1. First, we need to initialize small to i, as we do in line (2).
2. At the beginning of the for-loop of line (3), we need to initialize j to i + 1.
3. Then, we need to test whether j < n.
4. If so, we execute the body of the loop, which consists of lines (4) and (5).
5. At the end of the body, we need to increment j and go back to the test.
In Fig. 2.10 we see a point just before the test that is labeled by a loop-invariant
statement we have called S(k); we shall discover momentarily what this statement
must be. The first time we reach the test, j has the value i + 1 and small has the
value i. The second time we reach the test, j has the value i + 2, because j has been
incremented once. Because the body (lines 4 and 5) sets small to i + 1 if A[i + 1]
is less than A[i], we see that small is the index of whichever of A[i] and A[i + 1] is
smaller.5
5 In case of a tie, small will be i. In general, we shall pretend that no ties occur and talk about
“the smallest element” when we really mean “the first occurrence of the smallest element.”
Similarly, the third time we reach the test, the value of j is i + 3 and small
is the index of the smallest of A[i..i+2]. We shall thus try to prove the following
statement, which appears to be the general rule.
STATEMENT S(k): If we reach the test for j < n in the for-statement of line (3)
with k as the value of loop index j, then the value of small is the index of
the smallest of A[i..k-1].
Note that we are using the letter k to stand for one of the values that the variable
j assumes, as we go around the loop. That is less cumbersome than trying to use
j as the value of j, because we sometimes need to keep k fixed while the value of
j changes. Also notice that S(k) has the form “if we reach · · · ,” because for some
values of k we may never reach the loop test, as we broke out of the loop for a
smaller value of the loop index j. If k is one of those values, then S(k) is surely
true, because any statement of the form “if A then B” is true when A is false.
BASIS. The basis case is k = i + 1, where i is the value of the variable i at line
(3).6 Now j = i + 1 when we begin the loop. That is, we have just executed line
(2), which gives small the value i, and we have initialized j to i + 1 to begin the
loop. S(i + 1) says that small is the index of the smallest element in A[i..i],
which means that the value of small must be i. But we just observed that line (2)
causes small to have the value i. Technically, we must also show that j can never
have value i + 1 except the first time we reach the test. The reason, intuitively, is
that each time around the loop, we increment j, so it will never again be as low as
i + 1. (To be perfectly precise, we should give an inductive proof of the assumption
that j > i + 1 except the first time through the test.) Thus, the basis, S(i + 1), has
been shown to be true.
INDUCTION. Now let us assume as our inductive hypothesis that S(k) holds, for
some k ≥ i + 1, and prove S(k + 1). First, if k ≥ n, then we break out of the loop
when j has the value k, or earlier, and so we are sure never to reach the loop test
with the value of j equal to k + 1. In that case, S(k + 1) is surely true.
Thus, let us assume that k < n, so that we actually make the test with j equal
to k + 1. S(k) says that small indexes the smallest of A[i..k-1], and S(k + 1) says
that small indexes the smallest of A[i..k]. Consider what happens in the body of
the loop (lines 4 and 5) when j has the value k; there are two cases, depending on
whether the test of line (4) is true or not.
1. If A[k] is not smaller than the smallest of A[i..k-1], then the value of small
does not change. In that case, however, small also indexes the smallest of
A[i..k], since A[k] is not the smallest. Thus, the conclusion of S(k + 1) is
true in this case.
2. If A[k] is smaller than the smallest of A[i] through A[k − 1], then small is set
to k. Again, the conclusion of S(k + 1) now holds, because k is the index of
the smallest of A[i..k].
6 As far as the loop of lines (3) to (5) is concerned, i does not change. Thus, i + 1 is an
appropriate constant to use as the basis value.
Thus, in either case, small is the index of the smallest of A[i..k]. We go around
the for-loop by incrementing the variable j. Thus, just before the loop test, when
j has the value k + 1, the conclusion of S(k + 1) holds. We have now shown that
S(k) implies S(k + 1). We have completed the induction and conclude that S(k)
holds for all values k ≥ i + 1.
Next, we apply S(k) to make our claim about the inner loop of lines (3) through
(5). We exit the loop when the value of j reaches n. Since S(n) says that small
indexes the smallest of A[i..n-1], we have an important conclusion about the
working of the inner loop. We shall see how it is used in the next example. ✦
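For reference in this and the next example, here is a sketch of the heart of
SelectionSort, consistent with the line numbers cited in the text, rather than a
verbatim reproduction of Fig. 2.11 (the declarations are our assumptions):

    void SelectionSort(int A[], int n)
    {
        int i, j, small, temp;
(1)     for (i = 0; i < n-1; i++) {
            /* set small to the index of the first occurrence of
               the smallest element remaining in A[i..n-1] */
(2)         small = i;
(3)         for (j = i+1; j < n; j++)
(4)             if (A[j] < A[small])
(5)                 small = j;
            /* swap A[small] with A[i] */
(6)         temp = A[small];
(7)         A[small] = A[i];
(8)         A[i] = temp;
        }
    }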
✦ Example 2.13. Now, let us consider the entire SelectionSort function, the
heart of which we reproduce in Fig. 2.11. A flowchart for this code is shown in
Fig. 2.12, where “body” refers to lines (2) through (8) of Fig. 2.11. Our inductive
assertion, which we refer to as T (m), is again a statement about what must be true
just before the test for termination of the loop. Informally, when i has the value
m, we have selected m of the smallest elements and sorted them at the beginning
of the array. More precisely, we prove the following statement T (m) by induction
on m.
STATEMENT T (m): If we reach the loop test i < n − 1 of line (1) with the value
of variable i equal to m, then
a) A[0..m-1] are in sorted order; that is, A[0] ≤ A[1] ≤ · · · ≤ A[m − 1].
b) All of A[m..n-1] are at least as great as any of A[0..m-1].
BASIS. The basis case is m = 0. The basis is true for trivial reasons. If we look
at the statement T (0), part (a) says that A[0..-1] are sorted. But there are no
elements in the range A[0], . . . , A[−1], and so (a) must be true. Similarly, part (b)
of T (0) says that all of A[0..n-1] are at least as large as any of A[0..-1]. Since
there are no elements of the latter description, part (b) is also true.
[Fig. 2.12, the flowchart of the outer loop: i = 0; then the test i < n − 1, at a
point labeled by T (m); if the test succeeds, the loop body of lines (2) through (8)
executes, i = i+1, and control returns to the test; if it fails, the loop exits.]
INDUCTION. For the inductive step, we assume that T (m) is true for some m ≥ 0,
and we show that T (m + 1) holds. As in Example 2.12, we are trying to prove a
statement of the form “if A then B,” and such a statement is true whenever A is
false. Thus, T (m + 1) is true if the assumption that we reach the for-loop test with
i equal to m + 1 is false. Thus, we may assume that we actually reach the test with
i having the value m + 1; that is, we may assume m < n − 1.
When i has the value m, the body of the loop finds a smallest element in
A[m..n-1] (as proved by the statement S(m) of Example 2.12). This element is
swapped with A[m] in lines (6) through (8). Part (b) of the inductive hypothesis,
T (m), tells us the element chosen must be at least as large as any of A[0..m-1].
Moreover, those elements were sorted, so now all of A[0..m] are sorted. That proves
part (a) of statement T (m + 1).
To prove part (b) of T (m+ 1), we see that A[m] was just selected to be as small
as any of A[m+1..n-1]. Part (b) of T (m) tells us that A[0..m-1] were already as
small as any of A[m+1..n-1]. Thus, after executing the body of lines (2) through
(8) and incrementing i, we know that all of A[m+1..n-1] are at least as large as
any of A[0..m]. Since now the value of i is m + 1, we have shown the truth of the
statement T (m + 1) and thus have proved the inductive step.
Now, let m = n − 1. We know that we exit the outer for-loop when i has the
value n − 1, so T (n − 1) will hold after we finish this loop. Part (a) of T (n − 1) says
that all of A[0..n-2] are sorted, and part (b) says that A[n − 1] is as large as any
of the other elements. Thus, after the program terminates the elements in A are in
nondecreasing order; that is, they are sorted. ✦
✦ Example 2.14. The factorial function, written n!, is defined as the product of
the integers 1 × 2 × · · · × n. For example, 1! = 1, 2! = 1 × 2 = 2, and
5! = 1 × 2 × 3 × 4 × 5 = 120
Figure 2.13 shows a simple program fragment to compute n! for integers n ≥ 1.
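Since the discussion below refers to that fragment by line number, here is a sketch
consistent with those references (a reconstruction, with the I/O details as our
assumptions; n, i, and fact are int variables declared earlier):

(1)     scanf("%d", &n);
(2)     fact = 1;
(3)     i = 2;
(4)     while (i <= n) {
(5)         fact = fact*i;
(6)         i++;
        }
(7)     printf("%d\n", fact);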
To begin, let us prove that the while-loop of lines (4) to (6) in Fig. 2.13 must
terminate. We shall choose E to be the expression n − i. Notice that each time
around the while-loop, i is increased by 1 at line (6) and n remains unchanged.
Therefore, E decreases by 1 each time around the loop. Moreover, when E is −1
or less, we have n − i ≤ −1, or i ≥ n + 1. Thus, when E becomes negative, the
loop condition i ≤ n will be false and the loop will terminate. We don’t know how
large E is initially, since we don’t know what value of n will be read. Whatever that
value is, however, E will eventually reach as low as −1, and the loop will terminate.
Now we must prove that the program of Fig. 2.13 does what it is intended to
do. The appropriate loop-invariant statement, which we prove by induction on the
value of the variable i, is
STATEMENT S(j): If we reach the loop test i ≤ n with the variable i having the
value j, then the value of the variable fact is (j − 1)!.
BASIS. The basis is S(2). We reach the test with i having value 2 only when we
enter the loop from the outside. Prior to the loop, lines (2) and (3) of Fig. 2.13 set
fact to 1 and i to 2. Since 1 = (2 − 1)!, the basis is proved.
INDUCTION. Assume S(j), and prove S(j + 1). If j > n, then we break out of the
while-loop when i has the value j or earlier, and thus we never reach the loop test
with i having the value j + 1. In that case, S(j + 1) is trivially true, because it is
of the form “If we reach · · · .”
Thus, assume j ≤ n, and consider what happens when we execute the body of
the while-loop with i having the value j. By the inductive hypothesis, before line
(5) is executed, fact has value (j − 1)!, and i has the value j. Thus, after line (5)
is executed, fact has the value j × (j − 1)!, which is j!.
At line (6), i is incremented by 1 and so attains the value j + 1. Thus, when we
reach the loop test with i having value j + 1, the value of fact is j!. The statement
S(j + 1) says that when i equals j + 1, fact equals ((j + 1) − 1)!, or j!. Thus, we
have proved statement S(j + 1), and completed the inductive step.
We already have shown that the while-loop will terminate. Evidently, it ter-
minates when i first attains a value greater than n. Since i is an integer and is
incremented by 1 each time around the loop, i must have the value n + 1 when the
loop terminates. Thus, when we reach line (7), statement S(n + 1) must hold. But
that statement says that fact has the value n!. Thus, the program prints n!, as we
wished to prove.
As a practical matter, we should point out that on any computer the factorial
program in Fig. 2.13 will print n! as an answer for very few values of n. The problem
is that the factorial function grows so rapidly that the size of the answer quickly
exceeds the maximum size of an integer on any real computer. ✦
EXERCISES
2.5.1: What is an appropriate loop invariant for the following program fragment,
which sets sum equal to the sum of the integers from 1 to n?
scanf("%d",&n);
sum = 0;
for (i = 1; i <= n; i++)
sum = sum + i;
Prove your loop invariant by induction on i, and use it to prove that the program
works as intended.
2.5.2: The following fragment computes the sum of the integers in array A[0..n-1]:
sum = 0;
for (i = 0; i < n; i++)
sum = sum + A[i];
What is an appropriate loop invariant? Use it to show that the fragment works as
intended.
2.5.3*: Consider the following fragment:
scanf("%d", &n);
x = 2;
for (i = 1; i <= n; i++)
x = x * x;
An appropriate loop invariant for the point just before the test for i ≤ n is that if
we reach that point with the value k for variable i, then x = 2^{2^{k−1}}. Prove that this
invariant holds, by induction on k. What is the value of x after the loop terminates?
sum = 0;
scanf("%d", &x);
while (x >= 0) {
sum = sum + x;
scanf("%d", &x);
}
2.5.4*: The fragment in Fig. 2.14 reads integers until it finds a negative integer,
and then prints the accumulated sum. What is an appropriate loop invariant for
the point just before the loop test? Use the invariant to show that the fragment
performs as intended.
2.5.5: Find the largest value of n for which the program in Fig. 2.13 works on your
computer. What are the implications of fixed-length integers for proving programs
correct?
2.5.6: Show by induction on the number of times around the loop of Fig. 2.10 that
j > i + 1 after the first time around.
✦
✦ ✦
✦
2.6 Recursive Definitions
In a recursive, or inductive, definition, we define one or more classes of closely
related objects (or facts) in terms of the objects themselves. The definition must
not be meaningless, like “a widget is a widget of some color,” or paradoxical, like
“something is a glotz if and only if it is not a glotz.” Rather, a recursive definition
involves
1. One or more basis rules, in which some simple objects are defined, and
2. One or more inductive rules, whereby larger objects are defined in terms of
smaller ones in the collection.
BASIS. 1! = 1.
INDUCTION. n! = n × (n − 1)!.
For example, the basis tells us that 1! = 1. We can use this fact in the inductive
step with n = 2 to find
2! = 2 × 1! = 2 × 1 = 2
With n = 3, 4, and 5, we get
3! = 3 × 2! = 3 × 2 = 6
4! = 4 × 3! = 4 × 6 = 24
5! = 5 × 4! = 5 × 24 = 120
and so on. Notice that, although it appears that the term “factorial” is defined in
terms of itself, in practice, we can get the value of n! for progressively higher values
of n in terms of the factorials for lower values of n only. Thus, we have a meaningful
definition of “factorial.”
Strictly speaking, we should prove that our recursive definition of n! gives the
same result as our original definition,
n! = 1 × 2 × · · · × n
To do so, we shall prove the following statement:

STATEMENT S(n): n!, as given by the recursive definition, equals 1 × 2 × · · · × n.
BASIS. S(1) clearly holds. The basis of the recursive definition tells us that 1! = 1,
and the product 1 × · · · × 1 (i.e., the product of the integers “from 1 to 1”) is
evidently 1 as well.
INDUCTION. Assume that S(n) holds; that is, n!, as given by the recursive defini-
tion, equals 1 × 2 × · · · × n. Then the recursive definition tells us that
(n + 1)! = (n + 1) × n!
If we use the commutative law for multiplication, we see that
(n + 1)! = n! × (n + 1) (2.11)
By the inductive hypothesis,
n! = 1 × 2 × · · · × n
Thus, we may substitute 1 × 2 × · · · × n for n! in Equation (2.11) to get
(n + 1)! = 1 × 2 × · · · × n × (n + 1)
SEC. 2.6 RECURSIVE DEFINITIONS 61
which is the statement S(n + 1). We have thereby proved the inductive hypothesis
and shown that our recursive definition of n! is the same as our iterative definition.✦
BASIS. The basis covers those pairs of strings for which we can immediately resolve
the question of which comes first in lexicographic order. There are two parts of the
basis.
1. ǫ < w for any string w other than ǫ itself. Recall that ǫ is the empty string, or
the string with no characters.
2. If c < d, where c and d are characters, then for any strings w and x, we have
cw < dx.
INDUCTION. If w < x for strings w and x, then for any character c we have
cw < cx.
For instance, we can use the above definition to show that base < batter. By
rule (2) of the basis, with c = s, d = t, w = e, and x = ter, we have se < tter. If
we apply the recursive rule once, with c = a, w = se, and x = tter, we infer that
ase < atter. Finally, applying the recursive rule a second time with c = b, w =
ase, and x = atter, we find base < batter. That is, the basis and inductive steps
appear as follows:
se < tter
ase < atter
base < batter
We can also show that bat < batter as follows. Part (1) of the basis tells us
that ǫ < ter. If we apply the recursive rule three times — with c equal to t, a,
and b, in turn — we make the following sequence of inferences:
ǫ < ter
t < tter
at < atter
bat < batter
Now we should prove, by induction on the number of characters that two strings
have in common at their left ends, that one string precedes the other according to
the definition in Section 2.2 if and only if it precedes according to the recursive
definition just given. We leave these two inductive proofs as exercises. ✦
In Example 2.16, the groups of facts suggested by Fig. 2.15 are large. The basis
case gives us all facts w < x for which either w = ǫ or w and x begin with different
letters. One use of the inductive step gives us all w < x facts where w and x have
exactly one initial letter in common, the second use covers those cases where w and
x have exactly two initial letters in common, and so on.
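The recursive definition translates almost symbol for symbol into a program. Here
is a minimal C sketch (the function name less and the use of ordinary C strings are
our own choices):

    /* Return 1 if string w precedes string x in lexicographic order. */
    int less(const char *w, const char *x)
    {
        if (*w == '\0')            /* basis (1): epsilon < any nonempty x */
            return *x != '\0';
        if (*x == '\0')            /* no rule makes w < epsilon */
            return 0;
        if (*w != *x)              /* basis (2): c < d implies cw < dx */
            return *w < *x;
        return less(w+1, x+1);     /* induction: w < x implies cw < cx */
    }

For instance, less("bat", "batter") returns 1, retracing the chain of inferences
ǫ < ter, t < tter, at < atter, bat < batter.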
Expressions
Arithmetic expressions of all kinds are naturally defined recursively. For the basis
of the definition, we specify what the atomic operands can be. For example, in C,
atomic operands are either variables or constants. Then, the induction tells us what
operators may be applied, and to how many operands each is applied. For instance,
in C, the operator < can be applied to two operands, the operator symbol − can be
applied to one or two operands, and the function application operator, represented
by a pair of parentheses with as many commas inside as necessary, can be applied
to one or more operands, as f (a1 , . . . , an ).
BASIS. The following atomic operands are arithmetic expressions:

1. Variables
2. Integers
3. Real numbers
INDUCTION. If E1 and E2 are arithmetic expressions, then the following are also
arithmetic expressions:
1. (E1 + E2 )
2. (E1 − E2 )
3. (E1 × E2 )
4. (E1 / E2 )
The operators +, −, ×, and / are said to be binary operators, because they take two
Infix operator arguments. They are also said to be infix operators, because they appear between
their two arguments.
Additionally, we allow a minus sign to imply negation (change of sign), as well
as subtraction. That possibility is reflected in the fifth and last recursive rule:
5. If E is an arithmetic expression, then so is (−E).
Unary, prefix An operator like − in rule (5), which takes only one operand, is said to be a unary
operator operator. It is also said to be a prefix operator, because it appears before its
argument.
Figure 2.16 illustrates some arithmetic expressions and explains why each is an
expression. Note that sometimes parentheses are not needed, and we can omit them.
In the final expression (vi) of Fig. 2.16, the outer parentheses and the parentheses
around −(x + 10) can be omitted, and we could write y × −(x + 10). However, the
remaining parentheses are essential, since y × −x + 10 is conventionally interpreted
as (y × −x) + 10, which is not an equivalent expression (try y = 1 and x = 0, for
instance).7 ✦
7 Parentheses are redundant when they are implied by the conventional precedences of opera-
tors (unary minus highest, then multiplication and division, then addition and subtraction)
and by the convention of “left associativity,” which says that we group operators at the same
precedence level (e.g., a string of plusses and minuses) from the left. These conventions
should be familiar from C, as well as from ordinary arithmetic.
Balanced Parentheses
Strings of parentheses that can appear in expressions are called balanced parentheses.
For example, the pattern ((())) appears in expression (vi) of Fig. 2.16, and the
expression
((a + b) × ((c + d) − e))
has the pattern (()(())). The empty string, ǫ, is also a string of balanced paren-
theses; it is the pattern of the expression x, for example. In general, what makes a
string of parentheses balanced is that it is possible to match each left parenthesis
with a right parenthesis that appears somewhere to its right. Thus, a common
definition of “balanced parenthesis strings” consists of two rules:
1. A balanced string has an equal number of left and right parentheses.
Profile 2. As we move from left to right along the string, the profile of the string never
becomes negative, where the profile is the running total of the number of left
parentheses seen minus the number of right parentheses seen.
Note that the profile must begin and end at 0. For example, Fig. 2.17(a) shows the
profile of (()(())), and Fig. 2.17(b) shows the profile of ()(())().
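The profile test is easy to program; here is a small C sketch (the function name
profile_balanced is our own):

    /* Return 1 if the parenthesis string s is profile-balanced. */
    int profile_balanced(const char *s)
    {
        int profile = 0;
        for ( ; *s != '\0'; s++) {
            if (*s == '(') profile++;
            else if (*s == ')') profile--;
            if (profile < 0) return 0;   /* profile went negative */
        }
        return profile == 0;             /* profile must end at 0 */
    }

For instance, profile_balanced("(()(()))") returns 1, while
profile_balanced("())(") fails as soon as the profile dips below zero.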
There are a number of recursive definitions for the notion of “balanced paren-
theses.” The following is a bit subtle, but we shall prove that it is equivalent to the
preceding, nonrecursive definition involving profiles.

BASIS. The empty string is a string of balanced parentheses.

INDUCTION. If x and y are strings of balanced parentheses, then (x)y is also a
string of balanced parentheses.
[Fig. 2.17. (a) The profile of (()(())), which rises as high as 3; (b) the profile of
()(())(). Each begins and ends at 0 and never goes negative.]
As a final example, since we now know that (()) and ()() are balanced, we may
let these be x and y in the recursive rule, respectively, and show that ((()))()()
is balanced. ✦
We can show that the two definitions of “balanced” specify the same sets of
strings. To make things clearer, let us refer to strings that are balanced according to
the recursive definition simply as balanced and refer to those balanced according to
Profile-balanced the nonrecursive definition as profile-balanced. That is, the profile-balanced strings
are those whose profile ends at 0 and never goes negative. We need to show two
things:

1. Every string that is balanced according to the recursive definition is profile-balanced.

2. Every profile-balanced string is balanced.

These are the aims of the inductive proofs in the next two examples.
✦ Example 2.19. First, let us prove part (1), that every balanced string is profile-
balanced. The proof is a complete induction that mirrors the induction by which
the class of balanced strings is defined. That is, we prove

STATEMENT S(n): If string w can be shown balanced by n applications of the
recursive rule, then w is profile-balanced.
BASIS. The basis is n = 0. The only string that can be shown to be balanced
without any application of the recursive rule is ǫ, which is balanced according to
the basis rule. Evidently, the profile of the empty string ends at 0 and does not go
negative, so ǫ is profile-balanced.
[Figure: the profile of (x)y: a step up to 1 at the initial left parenthesis, then the
profile of x raised by 1, a step back down to 0 at the matching right parenthesis,
followed by the profile of y.]
Now we shall address the second direction of the equivalence between the two
definitions of “balanced parentheses.” We show in the next example that a profile-
balanced string is balanced.
8 Note that all profile-balanced strings happen to be of even length, so if n + 1 is odd, we are
not saying anything. However, we do not need the evenness of n for the proof.
shorter than w, the inductive hypothesis applies to them, and they are each bal-
anced. The recursive rule defining “balanced” says that if x and y are balanced,
then so is (x)y. But w = (x)y, and so w is balanced. We have now completed the
inductive step and shown statement S(n) to be true for all n ≥ 0. ✦
EXERCISES
2.6.1*: Prove that the definitions of lexicographic order given in Example 2.16
and in Section 2.2 are the same. Hint : The proof consists of two parts, and each
is an inductive proof. For the first part, suppose that w < x according to the
definition in Example 2.16. Prove the following statement S(i) by induction on i:
“If it is necessary to apply the recursive rule i times to show that w < x, then w
precedes x according to the definition of ‘lexicographic order’ in Section 2.2.” The
basis is i = 0. The second part of the exercise is to show that if w precedes x in
lexicographic order according to the definition in Section 2.2, then w < x according
to the definition in Example 2.16. Now the induction is on the number of initial
positions that w and x have in common.
2.6.2: Draw the profiles of the following strings of parentheses:
a) (()(())
b) ()())(()
c) ((()())()())
d) (()(()(())))
Which are profile-balanced? For those that are profile-balanced, use the recursive
definition in Section 2.6 to show that they are balanced.
2.6.3*: Show that every string of balanced parentheses (according to the recursive
definition in Section 2.6) is the string of parentheses in some arithmetic expression
(see Example 2.17 for a definition of arithmetic expressions). Hint : Use a proof by
induction on the number of times the recursive rule of the definition of “balanced
parentheses” is used to construct the given string of balanced parentheses.
2.6.4: Tell whether each of the following C operators is prefix, postfix, or infix, and
whether they are unary, binary, or k-ary for some k > 2:
a) <
b) &
c) %
2.6.5: If you are familiar with the UNIX file system or a similar system, give a
recursive definition of the possible directory/file structures.
2.6.6*: A certain set S of integers is defined recursively by the following rules.
BASIS. 0 is in S.

INDUCTION. If i is in S, then i + 5 and i + 7 are in S.

a) What is the largest integer not in S?
b) Let j be your answer to part (a). Prove that all integers j + 1 and greater are
in S. Hint : Note the similarity to Exercise 2.4.8 (although here we are dealing
with only nonnegative integers).
2.6.7*: Define recursively the set of even-parity strings, by induction on the length
of the string. Hint : It helps to define two concepts simultaneously, both the even-
parity strings and the odd-parity strings.
2.6.8*: We can define sorted lists of integers as follows.
Prove that this recursive definition of “sorted list” is equivalent to our original,
nonrecursive definition, which is that the list consist of integers
a1 ≤ a2 ≤ · · · ≤ an
Remember, you need to prove two parts: (a) If a list is sorted by the recursive
definition, then it is sorted by the nonrecursive definition, and (b) if a list is sorted
by the nonrecursive definition, then it is sorted by the recursive definition. Part (a)
can use induction on the number of times the recursive rule is used, and (b) can
use induction on the length of the list.
2.6.9**: As suggested by Fig. 2.15, whenever we have a recursive definition, we
can classify the objects defined according to the “round” on which each is gener-
ated, that is, the number of times the inductive step is applied before we obtain
each object. In Examples 2.15 and 2.16, it was fairly easy to describe the results
generated on each round. Sometimes it is more challenging to do so. How do you
characterize the objects generated on the nth round for each of the following?
a) Arithmetic expressions like those described in Example 2.17. Hint : If you are
familiar with trees, which are the subject of Chapter 5, you might consider the
tree representation of expressions.
b) Balanced parenthesis strings. Note that the “number of applications used,”
as discussed in Example 2.19, is not the same as the round on which a string
is discovered. For example, (())() uses the inductive rule three times but is
discovered on round 2.
✦
✦ ✦
✦
2.7 Recursive Functions
A recursive function is one that is called from within its own body. Often, the call is
direct; for example, a function F has a call to F within itself. Sometimes, however,
the call is indirect: some function F1 calls a function F2 directly, which calls F3
directly, and so on, until some function Fk in the sequence calls F1 .
There is a common belief that it is easier to learn to program iteratively, or
to use nonrecursive function calls, than it is to learn to program recursively. While
we cannot argue conclusively against that point of view, we do believe that recur-
sive programming is easy once one has had the opportunity to practice the style.
Recursive programs are often more succinct or easier to understand than their iter-
ative counterparts. More importantly, some problems are more easily attacked by
recursive programs than by iterative programs.9
Often, we can develop a recursive algorithm by mimicking a recursive definition
in the specification of a program we are trying to implement. A recursive function
that implements a recursive definition will have a basis part and an inductive part.
Frequently, the basis part checks for a simple kind of input that can be solved by
the basis of the definition, with no recursive call needed. The inductive part of the
function requires one or more recursive calls to itself and implements the inductive
part of the definition. Some examples should clarify these points.
✦ Example 2.21. Figure 2.19 gives a recursive function that computes n! given a
positive integer n. This function is a direct transcription of the recursive definition
of n! in Example 2.15. That is, line (1) of Fig. 2.19 distinguishes the basis case from
the inductive case. We assume that n ≥ 1, so the test of line (1) is really asking
whether n = 1. If so, we apply the basis rule, 1! = 1, at line (2). If n > 1, then we
apply the inductive rule, n! = n × (n − 1)!, at line (3).
int fact(int n)
{
(1)     if (n <= 1)
(2)         return 1;             /* basis */
        else
(3)         return n*fact(n-1);   /* induction */
}
For instance, if we call fact(4), the result is a call to fact(3), which calls
9 Such problems often involve some kind of search. For instance, in Chapter 5 we shall see
some recursive algorithms for searching trees, algorithms that have no convenient iterative
analog (although there are equivalent iterative algorithms using stacks).
Defensive Programming
The program of Fig. 2.19 illustrates an important point about writing recursive
programs so that they do not run off into infinite sequences of calls. We tacitly
assumed that fact would never be called with an argument less than 1. Best, of
course, is to begin fact with a test that n ≥ 1, printing an error message and
returning some particular value such as 0 if it is not. However, even if we believe
very strongly that fact will never be called with n < 1, we shall be wise to include in
the basis case all these “error cases.” Then, the function fact called with erroneous
input will simply return the value 1, which is wrong, but not a disaster (in fact, 1
is even correct for n = 0, since 0! is conventionally defined to be 1).
However, suppose we were to ignore the error cases and write line (1) of Fig.
2.19 as
if (n == 1)
Then if we called fact(0), it would look like an instance of the inductive case, and
we would next call fact(-1), then fact(-2), and so on, terminating with failure
when the computer ran out of space to record the recursive calls.
fact(2), which calls fact(1). At that point, fact(1) applies the basis rule, be-
cause n ≤ 1, and returns the value 1 to fact(2). That call to fact completes
line (3), returning 2 to fact(3). In turn, fact(3) returns 6 to fact(4), which
completes line (3) by returning 24 as the answer. Figure 2.20 suggests the pattern
of calls and returns. ✦
Call ↓                              ↑ Return 24
fact(4)                             fact(4)
    Call ↓                      ↑ Return 6
    fact(3)                     fact(3)
        Call ↓              ↑ Return 2
        fact(2)             fact(2)
            Call ↓      ↑ Return 1
            fact(1)
Fig. 2.21. A recursive function calls itself with arguments of smaller size.
✦ Example 2.22. We can turn the function SelectionSort of Fig. 2.2 into a
recursive function recSS, if we express the underlying algorithm as follows. Assume
the data to be sorted is in A[0..n-1].
1. Pick a smallest element from the tail of the array A, that is, from A[i..n-1].
2. Swap the element selected in step (1) with A[i].
3. Sort the remainder of the array, A[i+1..n-1].
We can express selection sort as the following recursive algorithm.
BASIS. If i = n − 1, then only the last element of the array remains to be sorted.
Since any one element is already sorted, we need not do anything.
INDUCTION. If i < n − 1, pick a smallest element from A[i..n-1], swap it with
A[i], and then recursively sort A[i+1..n-1].

The function recSS embodying this algorithm is shown in Fig. 2.22 and sketched
below. Line (1) tests for the basis case; the test covers all of i ≥ n − 1, rather than
i = n − 1 alone, so that even if we should happen to call with i ≥ n, we shall not go
into an infinite sequence of calls. In the basis case, we have nothing to do, so we
just return.
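A sketch of recSS consistent with the line-by-line discussion of this example (the
parameter list and the declaration style are our assumptions):

void recSS(int A[], int i, int n)
{
    int j, small, temp;

(1)     if (i < n-1) {             /* inductive case: at least two elements remain */
(2)         small = i;
(3)         for (j = i+1; j < n; j++)
(4)             if (A[j] < A[small])
(5)                 small = j;     /* small indexes a smallest element of A[i..n-1] */
(6)         temp = A[small];       /* lines (6)-(8) swap A[small] with A[i] */
(7)         A[small] = A[i];
(8)         A[i] = temp;
(9)         recSS(A, i+1, n);      /* sort the remainder of the array */
        }                          /* basis (i >= n-1): nothing to do */
}

The entire array A[0..n-1] is sorted by the call recSS(A, 0, n).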
The remainder of the function is the inductive case. Lines (2) through (8) are
copied directly from the iterative version of selection sort. Like that program, these
lines set small to the index of a smallest element in A[i..n-1]
and then swap this element with A[i]. Finally, line (9) is the recursive call, which
sorts the remainder of the array. ✦
Divide-and-Conquer

One way of attacking a problem is to try to break it into subproblems and then
solve the subproblems and combine their solutions into a solution for the problem
as a whole. The term divide-and-conquer is used to describe this problem-solving
technique. If the subproblems are similar to the original, then we may be able to
use the same function to solve the subproblems recursively.

There are two requirements for this technique to work. The first is that the
subproblems must be simpler than the original problem. The second is that af-
ter a finite number of subdivisions, we must encounter a subproblem that can be
solved outright. If these criteria are not met, a recursive algorithm will continue
subdividing the problem forever, without finding a solution.

Note that the recursive function recSS in Fig. 2.22 satisfies both criteria. Each
time it is invoked, it is on a subarray that has one fewer element, and when it is
invoked on a subarray containing a single element, it returns without invoking itself
again. Similarly, the factorial program of Fig. 2.19 involves calls with a smaller
integer value at each call, and the recursion stops when the argument of the call
reaches 1. Section 2.8 discusses a more powerful use of the divide-and-conquer
technique, called "merge sort." There, the size of the arrays being sorted diminishes
very rapidly, because merge sort works by dividing the size in half, rather than
subtracting 1, at each recursive call.

EXERCISES

2.7.8*: We can define gcd(i, j), the greatest common divisor of positive integers
i and j, recursively as follows:

BASIS. If j divides i evenly, then gcd(i, j) = j.

INDUCTION. If j does not divide i evenly, let k be the remainder when i is divided
by j. Then gcd(i, j) is the same as gcd(j, k).

Write a recursive C function to compute gcd(i, j).
2.7.9**: Prove that the recursive definition of GCD given in Exercise 2.7.8 gives
the same result as the nonrecursive definition (largest integer dividing both i and j
evenly).
2.7.10: Often, a recursive definition can be turned into an algorithm fairly directly.
For example, consider the recursive definition of “less than” on strings given in
Example 2.16. Write a recursive function that tests whether the first of two given
strings is “less than” the other. Assume that strings are represented by linked lists
of characters.
2.7.11*: From the recursive definition of a sorted list given in Exercise 2.6.8, create
a recursive sorting algorithm. How does this algorithm compare with the recursive
selection sort of Example 2.22?
✦
✦ ✦
✦
2.8 Merge Sort: A Recursive Sorting Algorithm
We shall now consider a sorting algorithm, called merge sort, which is radically
different from selection sort. Merge sort is best described recursively, and it
illustrates a powerful use of the divide-and-conquer technique, in which we sort a
list (a1, a2, . . . , an) by "dividing" the problem into two similar problems of half the
size. In principle, we could begin by dividing the list into two arbitrarily chosen
equal-sized lists, but in the program we develop, we shall make one list out of the
odd-numbered elements, (a1, a3, a5, . . .), and the other out of the even-numbered
elements, (a2, a4, a6, . . .).¹⁰ We then sort each of the half-sized lists separately. To
complete the sorting of the original list of n elements, we merge the two sorted,
half-sized lists by an algorithm to be described in the next example.
In the next chapter, we shall see that the time required for merge sort grows
much more slowly, as a function of the length n of the list to be sorted, than does
the time required by selection sort. Thus, even if recursive calls take some extra
time, merge sort is greatly preferable to selection sort when n is large. In Chapter
3 we shall examine the relative performance of these two sorting algorithms.
Merging
To “merge” means to produce from two sorted lists a single sorted list containing
all the elements of the two given lists and no other elements. For example, given
the lists (1, 2, 7, 7, 9) and (2, 4, 7, 8), the merger of these lists is (1, 2, 2, 4, 7, 7, 7, 8, 9).
Note that it does not make sense to talk about “merging” lists that are not already
sorted.
One simple way to merge two lists is to examine them from the front. At
each step, we find the smaller of the two elements at the current fronts of the lists,
choose that element as the next element on the combined list, and remove the chosen
element from its list, exposing a new "first" element on that list. Ties can be broken
arbitrarily, although we shall take from the first list when the leading elements of
both lists are the same.

¹⁰ Remember that "odd-numbered" and "even-numbered" refer to the positions of the elements
on the list, and not to the values of these elements.

    L1              L2              M
    1, 2, 7, 7, 9   2, 4, 7, 8      empty
    2, 7, 7, 9      2, 4, 7, 8      1
    7, 7, 9         2, 4, 7, 8      1, 2
    7, 7, 9         4, 7, 8         1, 2, 2
    7, 7, 9         7, 8            1, 2, 2, 4
    7, 9            7, 8            1, 2, 2, 4, 7
    9               7, 8            1, 2, 2, 4, 7, 7
    9               8               1, 2, 2, 4, 7, 7, 7
    9               empty           1, 2, 2, 4, 7, 7, 7, 8
    empty           empty           1, 2, 2, 4, 7, 7, 7, 8, 9

Fig. 2.23. Example of merging: L1 and L2 are the remaining input lists, and M is the
merged list so far.
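The recursive function merge of Fig. 2.24 is discussed line by line below; here is a
sketch consistent with those references, writing the cell type out explicitly rather
than with the DefCell macro of Section 1.6:

typedef struct CELL *LIST;      /* list cells as in Section 1.6 */
struct CELL {
    int element;
    LIST next;
};

LIST merge(LIST list1, LIST list2)
{
(1)     if (list1 == NULL) return list2;          /* basis */
(2)     else if (list2 == NULL) return list1;     /* basis */
(3)     else if (list1->element <= list2->element) {
            /* neither list is empty, and the first element of
               list 1 is at least as small; ties favor list 1 */
(4)         list1->next = merge(list1->next, list2);
(5)         return list1;
        }
        else { /* list 2 has the smaller first element */
(6)         list2->next = merge(list1, list2->next);
(7)         return list2;
        }
}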
BASIS. If either list is empty, then the other list is the desired result. This rule is
implemented by lines (1) and (2) of Fig. 2.24. Note that if both lists are empty,
then list2 will be returned. But that is correct, since the value of list2 is then
NULL and the merger of two empty lists is an empty list.
INDUCTION. If neither list is empty, then each has a first element. We can refer
to the two first elements as list1->element and list2->element, that is, the
element fields of the cells pointed to by list1 and list2, respectively. Fig. 2.25
is a picture of the data structure. The list to be returned begins with the cell of
the smallest element. The remainder of the list is formed by merging all but that
element.
For example, lines (4) and (5) handle the case in which the first element of list
1 is smallest. Line (4) is a recursive call to merge. The first argument of this call is
list1->next, that is, a pointer to the second element on the first list (or NULL if the
first list only has one element). Thus, the recursive call is passed the list consisting
of all but the first element of the first list. The second argument is the entire second
list. As a consequence, the recursive call to merge at line (4) will return a pointer
to the merged list of all the remaining elements and store a pointer to this merged
list in the next field of the first cell on list 1. At line (5), we return a pointer to
that cell, which is now the first cell on the merged list of all the elements.
Fig. 2.25. The lists pointed to by list1 and list2; all elements but the first of
list 1 are merged in the recursive call.

Figure 2.25 illustrates the changes. Dotted arrows are present when merge is
called. Solid arrows are created by merge. Specifically, the return value of merge is
a pointer to the cell of the smallest element, and the next field of that element is
shown pointing to the list returned by the recursive call to merge at line (4).
Finally, lines (6) and (7) handle the case where the second list has the smallest
element. The behavior of the algorithm is exactly as in lines (4) and (5), but the
roles of the two lists are reversed.
✦ Example 2.24. Suppose we call merge on the lists (1, 2, 7, 7, 9) and (2, 4, 7, 8)
of Example 2.23. Figure 2.26 illustrates the sequence of calls made to merge, if we
read the first column downward. We omit the commas separating list elements, but
commas are used to separate the arguments of merge.
    CALL                      RETURN
    merge(12779, 2478)        122477789
    merge(2779, 2478)         22477789
    merge(779, 2478)          2477789
    merge(779, 478)           477789
    merge(779, 78)            77789
    merge(79, 78)             7789
    merge(9, 78)              789
    merge(9, 8)               89
    merge(9, NULL)            9

Fig. 2.26. The sequence of calls to merge and the values they return.
For instance, since the first element of list 1 is less than the first element of
list 2, line (4) of Fig. 2.24 is executed and we recursively merge all but the first
element of list 1. That is, the first argument is the tail of list 1, or (2, 7, 7, 9), and
the second argument is the full list 2, or (2, 4, 7, 8). Now the leading elements of
both lists are the same. Since the test of line (3) in Fig. 2.24 favors the first list,
we remove the 2 from list 1, and our next call to merge has first argument (7, 7, 9)
and second argument (2, 4, 7, 8).
The returned lists are indicated in the second column, read upward. Notice
that, unlike the iterative description of merging suggested by Fig. 2.23, the recursive
algorithm assembles the merged list from the rear, whereas the iterative algorithm
assembles it from the front. ✦
Splitting Lists
Another important task required for merge sort is splitting a list into two equal
parts, or into parts whose lengths differ by 1 if the original list is of odd length.
One way to do this job is to count the number of elements on the list, divide by
2, and break the list at the midpoint. Instead, we shall give a simple recursive
function split that “deals” the elements into two lists, one consisting of the first,
third, and fifth elements, and so on, and the other consisting of the elements at
the even positions. More precisely, the function split removes the even-numbered
elements from the list it is given as an argument and returns a new list consisting
of the even-numbered elements.
The C code for function split is shown in Fig. 2.27. Its argument is a list
of the type LIST that was defined in connection with the merge function. Note
that the local variable pSecondCell is defined to be of type LIST. We really use
pSecondCell as a pointer to the second cell on a list, rather than as a list itself;
but of course type LIST is, in fact, a pointer to a cell.
It is important to observe that split is a function with a side effect. It removes
the cells in the even positions from the list it is given as an argument, and it
assembles these cells into a new list, which becomes the return value of the function.
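A sketch of split consistent with the line references below (Fig. 2.27), using the
LIST type defined earlier:

LIST split(LIST list)
{
    LIST pSecondCell;

(1)     if (list == NULL) return NULL;             /* basis: empty list */
(2)     else if (list->next == NULL) return NULL;  /* basis: list of length 1 */
        else { /* there are at least two cells */
(3)         pSecondCell = list->next;
(4)         list->next = pSecondCell->next;        /* first cell skips the second */
(5)         pSecondCell->next = split(pSecondCell->next);
(6)         return pSecondCell;                    /* head of the even-numbered elements */
        }
}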
BASIS. If the list is of length 0 or 1, then we do nothing. That is, an empty list
is “split” into two empty lists, and a list of a single element is split by leaving
the element on the given list and returning an empty list of the even-numbered
elements, of which there are none. The basis is handled by lines (1) and (2) of Fig.
2.27. Line (1) handles the case where list is empty, and line (2) handles the case
where it is a single element. Notice that we are careful not to examine list->next
in line (2) unless we have previously determined, at line (1), that list is not NULL.
INDUCTION. The inductive step applies when there are at least two elements on
list. At line (3), we keep a pointer to the second cell of the list in the local variable
pSecondCell. Line (4) makes the next field of the first cell skip over the second
cell and point to the third cell (or become NULL if there are only two cells on the
list). At line (5), we call split recursively, on the list consisting of all but the first
two elements. The value returned by that call is a pointer to the fourth element (or
NULL if the list is shorter than four elements), and we place this pointer in the next
field of the second cell, to complete the linking of the even-numbered elements. A
pointer to the second cell is returned by split at line (6); that pointer gives us
access to the linked list of all the even-numbered elements of the original list.
The changes made by split are suggested in Fig. 2.28. Original pointers are
dotted, and new pointers are solid. We also indicate the number of the line that
creates each of the new pointers.
Fig. 2.28. The action of split: line (3) saves a pointer to the second cell in
pSecondCell, line (4) relinks the first cell past the second, line (5) links the second
cell to the list returned by the recursive call to split, and line (6) returns pSecondCell.

With merge and split in hand, we can write the function MergeSort of Fig. 2.29,
which sorts the list it is given.
BASIS. If the list to be sorted is empty or of length 1, just return the list; it is
already sorted. The basis is taken care of by lines (1) and (2) of Fig. 2.29.
INDUCTION. If the list is of length at least 2, use the function split at line (3) to
remove the even-numbered elements from list and use them to form another list,
pointed to by local variable SecondList. Line (4) recursively sorts the half-sized
lists, and returns the merger of the two lists.
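A sketch of MergeSort consistent with the line references just given (Fig. 2.29):

LIST MergeSort(LIST list)
{
    LIST SecondList;

(1)     if (list == NULL) return NULL;             /* basis: empty list */
(2)     else if (list->next == NULL) return list;  /* basis: list of length 1 */
        else { /* there are at least two elements on list */
(3)         SecondList = split(list);
            /* as a side effect, split removes the even-numbered
               elements from list and links them into SecondList */
(4)         return merge(MergeSort(list), MergeSort(SecondList));
        }
}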
✦ Example 2.25. Let us use merge sort on the list of single-digit numbers
742897721
We again omit commas between digits for succinctness. First, the list is split into
two, by the call to split at line (3) of MergeSort. One of the resulting lists consists
of the odd positions, and the other the evens; that is, list = 72971 and SecondList
= 4872. At line (4), these lists are sorted, resulting in lists 12779 and 2478, and
then merged to produce the sorted list 122477789.
However, the sorting of the two half-sized lists does not occur by magic, but
rather by the methodical application of the recursive algorithm. Initially, MergeSort
splits the list on which it is called, if the list has length greater than 1. Figure 2.30(a)
shows the recursive splitting of the lists until each list is of length 1. Then the split
lists are merged, in pairs, going up the tree, until the entire list is sorted. This
process is suggested in Fig. 2.30(b). However, it is worth noting that the splits
and merges occur in a mixed order; not all splits are followed by all merges. For
example, the first half list, 72971, is completely split and merged before we begin
on the second half list, 4872. ✦
                       742897721
                      /         \
                 72971           4872
                /     \         /    \
             791       27     47      82
            /   \     /  \   /  \    /  \
          71     9   2    7 4    7  8    2
         /  \
        7    1

        (a) Splitting.

                       122477789
                      /         \
                12779            2478
               /     \          /    \
            179       27      47      28
           /   \     /  \    /  \    /  \
         17     9   2    7  4    7  8    2
        /  \
       7    1

        (b) Merging.

Fig. 2.30. Recursive splitting and merging of the list 742897721.

EXERCISES

2.8.1: Show the result of applying the function merge to the lists (1, 2, 3, 4, 5) and
(2, 4, 6, 8, 10).
2.8.2: Suppose we start with the list (8, 7, 6, 5, 4, 3, 2, 1). Show the sequence of calls
to merge, split, and MergeSort that result.
2.8.3*: A multiway merge sort divides a list into k pieces of equal (or approximately
equal) size, sorts them recursively, and then merges all k lists by comparing all their
respective first elements and taking the smallest. The merge sort described in this
section is for the case k = 2. Modify the program in Fig. 2.31 so that it becomes a
multiway merge sort for the case k = 3.
2.8.4*: Rewrite the merge sort program to use the functions lt and key, described
in Exercise 2.2.8, to compare elements of arbitrary type.
2.8.5: Relate each of the functions (a) merge (b) split (c) MakeList to Fig. 2.21.
What is the appropriate notion of size for each of these functions?
Figure 2.31(a) consists of the main program and the auxiliary function MakeList,
which reads integers up to end-of-file and builds them into a linked list. The sketch
below is consistent with the line references in Example 2.27 of the next section; the
prototypes are our addition:

#include <stdio.h>
#include <stdlib.h>

LIST MakeList();
void PrintList(LIST list);
LIST MergeSort(LIST list);

main()
{
    LIST list;

(1)     list = MakeList();
(2)     PrintList(MergeSort(list));
}

LIST MakeList()
{
    int x;
    LIST pNewCell;

(3)     if (scanf("%d", &x) == EOF) return NULL;
        else {
(4)         pNewCell = (LIST) malloc(sizeof(struct CELL));
(5)         pNewCell->next = MakeList();
(6)         pNewCell->element = x;
(7)         return pNewCell;
        }
}

Fig. 2.31(a). The main routine and the function MakeList of the merge sort program.
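Exercise 2.9.1 below refers to the function PrintList of Fig. 2.31(b); here is a
recursive sketch consistent with that exercise (an iterative version would serve as
well, but the inductive proof asked for in Exercise 2.9.1 fits the recursive form most
naturally):

void PrintList(LIST list)
{
    if (list != NULL) {
        printf("%d\n", list->element);   /* print the first element */
        PrintList(list->next);           /* then, recursively, the rest */
    }
}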
✦
✦ ✦
✦
2.9 Proving Properties of Recursive Programs
If we want to prove a certain property of a recursive function, we generally need to
prove a statement about the effect of one call to that function. For example, that
effect might be a relationship between the arguments and the return value, such as
“the function, called with argument i, returns i!.” Frequently, we define a notion of
the “size” of the arguments of a function and perform a proof by induction on this
size. Some of the many possible ways in which the size of the arguments could be
defined are
1. The value of some argument. For instance, for the recursive factorial program
of Fig. 2.19, the appropriate size is the value of the argument n.
2. The length of a list pointed to by some argument. The recursive function split
of Fig. 2.27 is an example where the length of the list is the appropriate size.
3. Some function of the arguments. For instance, we mentioned that the recursive
selection sort of Fig. 2.22 performs an induction on the number of elements in
the array that remain to be sorted. In terms of the arguments n and i, this
function is n − i + 1. As another example, the appropriate size for the merge
function of Fig. 2.24 is the sum of the lengths of the lists pointed to by the two
arguments of the function.
✦ Example 2.26. Consider the factorial program of Fig. 2.19 in Section 2.7. The
statement to prove by induction on i, for i ≥ 1, is
STATEMENT S(i): When called with the value i for the argument n, fact re-
turns i!.
BASIS. For i = 1, the test at line (1) of Fig. 2.19 causes the basis, line (2), to be
executed. That results in the return value 1, which is 1!.
INDUCTION. Assume S(i) to be true, that is, when called with some argument
i ≥ 1, fact returns i!. Now, consider what happens when fact is called with i + 1
as the value of variable n. If i ≥ 1, then i + 1 is at least 2, so the inductive case, line
(3), applies. The return value is thus n × fact(n − 1); or, since the variable n has
the value i + 1, the result (i + 1) × fact(i) is returned. By the inductive hypothesis,
fact(i) returns i!. Since (i + 1) × i! = (i + 1)!, we have proved the inductive step,
that fact, with argument i + 1, returns (i + 1)!. ✦
✦ Example 2.27. Now, let us examine the function MakeList, one of the auxiliary
routines in Fig. 2.31(a), in Section 2.8. This function creates a linked list to hold
the input elements and returns a pointer to this list. We shall prove the following
statement by induction on n ≥ 0, the number of elements in the input sequence.

STATEMENT S(n): If the input sequence consists of n elements, then MakeList
returns a pointer to a linked list containing those n elements.
BASIS. The basis is n = 0, that is, when the input sequence is empty. The test
for EOF in line (3) of MakeList causes the return value to be set to NULL. Thus,
MakeList correctly returns an empty list.
INDUCTION. Suppose that S(n) is true for n ≥ 0, and consider what happens
when MakeList is called on an input sequence of n + 1 elements. Suppose we have
just read the first element x1 .
Line (4) of MakeList creates a pointer to a new cell c. Line (5) recursively calls
MakeList to create, by the inductive hypothesis, a pointer to a linked list for the
remaining n elements, x2 , x3 , . . . , xn . This pointer is put into the next field of c at
line (5). Line (6) puts x1 into the element field of c. Line (7) returns the pointer
created by line (4). This pointer points to a linked list for the n + 1 input elements,
x1, x2, . . . , xn+1.
We have proved the inductive step and conclude that MakeList works correctly
on all inputs. ✦
✦ Example 2.28. For our last example, let us prove the correctness of the merge-
sort program of Fig. 2.29, assuming that the functions split and merge perform
their respective tasks correctly. The induction will be on the length of the list
that MergeSort is given as an argument. The statement to be proved by complete
induction on n ≥ 0 is

STATEMENT S(n): If MergeSort is called on a list of length n, then it returns a
sorted list of the same elements.
BASIS. We take the basis to be both S(0) and S(1). When list is of length 0,
its value is NULL, and so the test of line (1) in Fig. 2.29 succeeds and the entire
function returns NULL. Likewise, if list is of length 1, the test of line (2) succeeds,
and the function returns list. Thus, MergeSort returns list when n is 0 or 1.
This observation proves statements S(0) and S(1), because a list of length 0 or 1 is
already sorted.

INDUCTION. Suppose that S(k) holds for all k ≤ n, where n ≥ 1, and consider a
call on a list of length n + 1 ≥ 2. The tests of lines (1) and (2) fail, so at line (3)
split divides the list into two lists, of lengths ⌈(n + 1)/2⌉ and ⌊(n + 1)/2⌋, assuming
split performs its task correctly. Each of these lengths is at most n, so by the
inductive hypothesis, the two recursive calls at line (4) sort the half-sized lists
correctly. Assuming merge works correctly, line (4) then returns a sorted list of all
the elements, which proves S(n + 1). ✦
int sum(LIST L)
{
    if (L == NULL) return 0;
    else return L->element + sum(L->next);
}

int find0(LIST L)
{
    /* TRUE and FALSE are the usual macros for 1 and 0 */
    if (L == NULL) return FALSE;
    else if (L->element == 0) return TRUE;
    else return find0(L->next);
}

Fig. 2.32. Two recursive functions, sum and find0, referred to in the exercises.
EXERCISES
2.9.1: Prove that the function PrintList in Fig. 2.31(b) prints the elements on the
list that it is passed as an argument. What statement S(i) do you prove inductively?
What is the basis value for i?
2.9.2: The function sum in Fig. 2.32 computes the sum of the elements on its given
list (whose cells are of the usual type as defined by the macro DefCell of Section 1.6
and used in the merge-sort program of Section 2.8) by adding the first element to
the sum of the remaining elements; the latter sum is computed by a recursive call
on the remainder of the list. Prove that sum correctly computes the sum of the list
elements. What statement S(i) do you prove inductively? What is the basis value
for i?
2.9.3: The function find0 in Fig. 2.32 returns TRUE if at least one of the elements
on its list is 0, and returns FALSE otherwise. It returns FALSE if the list is empty,
returns TRUE if the first element is 0 and otherwise, makes a recursive call on the
remainder of the list, and returns whatever answer is produced for the remainder.
Prove that find0 correctly determines whether 0 is present on the list. What
statement S(i) do you prove inductively? What is the basis value for i?
2.9.4*: Prove that the functions (a) merge of Fig. 2.24 and (b) split of Fig. 2.27
perform as claimed in Section 2.8.
2.9.5: Give an intuitive “least counterexample” proof of why induction starting
from a basis including both 0 and 1 is valid.
2.9.6**: Prove the correctness of (your C implementation of) the recursive GCD
algorithm of Exercise 2.7.8.
✦
✦ ✦
✦
2.10 Summary of Chapter 2
Here are the important ideas we should take from Chapter 2.
✦ Inductive proofs, recursive definitions, and recursive programs are closely re-
lated ideas. Each depends on a basis and an inductive step to “work.”
✦ In “ordinary” or “weak” inductions, successive steps depend only on the pre-
vious step. We frequently need to perform a proof by complete induction, in
which each step depends on all the previous steps.
✦ There are several different ways to sort. Selection sort is a simple but slow
sorting algorithm, and merge sort is a faster but more complex algorithm.
✦ Induction is essential to prove that a program or program fragment works
correctly.
✦ Divide-and-conquer is a useful technique for designing some good algorithms,
such as merge sort. It works by dividing the problem into independent subparts
and then combining the results.
✦ Expressions are defined in a natural, recursive way in terms of their operands
and operators. Operators can be classified by the number of arguments they
take: unary (one argument), binary (two arguments), and k-ary (k arguments).
Also, a binary operator appearing between its operands is infix, an operator
appearing before its operands is prefix, and one appearing after its operands is
postfix.
✦
✦ ✦
✦
2.11 Bibliographic Notes for Chapter 2
An excellent treatment of recursion is Roberts [1986]. For more on sorting algo-
rithms, the standard source is Knuth [1973]. Berlekamp [1968] tells about tech-
niques — of which the error detection scheme in Section 2.3 is the simplest — for
detecting and correcting errors in streams of bits.
Berlekamp, E. R. [1968]. Algebraic Coding Theory, McGraw-Hill, New York.
Knuth, D. E. [1973]. The Art of Computer Programming, Vol. III: Sorting and
Searching, Addison-Wesley, Reading, Mass.
Roberts, E. [1986]. Thinking Recursively, Wiley, New York.
CHAPTER 3
✦
✦ ✦
✦
The Running Time
of Programs
In Chapter 2, we saw two radically different algorithms for sorting: selection sort
and merge sort. There are, in fact, scores of algorithms for sorting. This situation
is typical: every problem that can be solved at all can be solved by more than one
algorithm.
How, then, should we choose an algorithm to solve a given problem? As a
general rule, we should always pick an algorithm that is easy to understand, im-
plement, and document. When performance is important, as it often is, we also
need to choose an algorithm that runs quickly and uses the available computing
resources efficiently. We are thus led to consider the often subtle matter of how we
can measure the running time of a program or an algorithm, and what steps we can
take to make a program run faster.
✦
✦ ✦
✦
3.1 What This Chapter Is About
In this chapter we shall cover the following topics:
✦ The important performance measures for programs
✦ Methods for evaluating program performance
✦ “Big-oh” notation
✦ Estimating the running time of programs using the big-oh notation
✦ Using recurrence relations to evaluate the running time of recursive programs
The big-oh notation introduced in Sections 3.4 and 3.5 simplifies the process of esti-
mating the running time of programs by allowing us to avoid dealing with constants
that are almost impossible to determine, such as the number of machine instructions
that will be generated by a typical C compiler for a given source program.
We introduce the techniques needed to estimate the running time of programs
in stages. In Sections 3.6 and 3.7 we present the methods used to analyze programs
with no function calls. Section 3.8 extends our capability to programs with calls to
nonrecursive functions. Then in Sections 3.9 and 3.10 we show how to deal with
recursive functions. Finally, Section 3.11 discusses solutions to recurrence relations,
which are inductive definitions of functions that arise when we analyze the running
time of recursive functions.
✦
✦ ✦
✦
3.2 Choosing an Algorithm
If you need to write a program that will be used once on small amounts of data
and then discarded, then you should select the easiest-to-implement algorithm you
know, get the program written and debugged, and move on to something else. How-
ever, when you need to write a program that is to be used and maintained by many
people over a long period of time, other issues arise. One is the understandability, or
simplicity, of the underlying algorithm. Simple algorithms are desirable for several
reasons. Perhaps most important, a simple algorithm is easier to implement cor-
rectly than a complex one. The resulting program is also less likely to have subtle
bugs that get exposed when the program encounters an unexpected input after it
has been in use for a substantial period of time.
Programs should be written clearly and documented carefully so that they can
be maintained by others. If an algorithm is simple and understandable, it is easier
to describe. With good documentation, modifications to the original program can
readily be done by someone other than the original writer (who frequently will not
be available to do them), or even by the original writer if the program was done some
time earlier. There are numerous stories of programmers who wrote efficient and
clever algorithms, then left the company, only to have their algorithms ripped out
and replaced by something slower but more understandable by subsequent main-
tainers of the code.
When a program is to be run repeatedly, its efficiency and that of its underlying
algorithm become important. Generally, we associate efficiency with the time it
takes a program to run, although there are other resources that a program sometimes
must conserve, such as
1. The amount of storage space taken by its variables.
2. The amount of traffic it generates on a network of computers.
3. The amount of data that must be moved to and from disks.
For large problems, however, it is the running time that determines whether a given
program can be used, and running time is the main topic of this chapter. We
shall, in fact, take the efficiency of a program to mean the amount of time it takes,
measured as a function of the size of its input.
Often, understandability and efficiency are conflicting aims. For example, the
reader who compares the selection sort program of Fig. 2.3 with the merge sort
program of Fig. 2.31 will surely agree that the latter is not only longer, but quite
a bit harder to understand. That would still be true even if we summarized the
explanation given in Sections 2.2 and 2.8 by placing well-thought-out comments in
the programs. As we shall learn, however, merge sort is much more efficient than
selection sort, as long as the number of elements to be sorted is a hundred or more.
Unfortunately, this situation is quite typical: algorithms that are efficient for large
amounts of data tend to be more complex to write and understand than are the
relatively inefficient algorithms.
The understandability, or simplicity, of an algorithm is somewhat subjective.
We can overcome lack of simplicity in an algorithm, to a certain extent, by explain-
ing the algorithm well in comments and program documentation. The documentor
should always consider the person who reads the code and its comments: Is a rea-
sonably intelligent person likely to understand what is being said, or are further
explanation, details, definitions, and examples needed?
On the other hand, program efficiency is an objective matter: a program takes
what time it takes, and there is no room for dispute. Unfortunately, we cannot run
the program on all possible inputs — which are typically infinite in number. Thus,
we are forced to make measures of the running time of a program that summarize
the program's performance on all inputs, usually as a single expression such as "n²."
How we can do so is the subject matter of the balance of this chapter.
✦
✦ ✦
✦
3.3 Measuring Running Time
Once we have agreed that we can evaluate a program by measuring its running
time, we face the problem of determining what the running time actually is. The
two principal approaches to summarizing the running time are
1. Benchmarking
2. Analysis
We shall consider each in turn, but the primary emphasis of this chapter is on the
techniques for analyzing a program or an algorithm.
Benchmarking
When comparing two or more programs designed to do the same set of tasks, it is
customary to develop a small collection of typical inputs that can serve as bench-
marks. That is, we agree to accept the benchmark inputs as representative of the
job mix; a program that performs well on the benchmark inputs is assumed to
perform well on all inputs.
For example, a benchmark to evaluate sorting programs might contain one
small set of numbers, such as the first 20 digits of π; one medium set, such as the
set of zip codes in Texas; and one large set, such as the set of phone numbers in the
Brooklyn telephone directory. We might also want to check that a program works
efficiently (and correctly) when given an empty set of elements to sort, a singleton
set, and a list that is already in sorted order. Interestingly, some sorting algorithms
perform poorly when given a list of elements that is already sorted.1
1 Neither selection sort nor merge sort is among these; they take approximately the same time
on a sorted list as they would on any other list of the same length.
Analysis of a Program
To analyze a program, we begin by grouping inputs according to size. What we
choose to call the size of an input can vary from program to program, as we discussed
in Section 2.9 in connection with proving properties of recursive programs. For a
sorting program, a good measure of the size is the number of elements to be sorted.
For a program that solves n linear equations in n unknowns, it is normal to take n to
be the size of the problem. Other programs might use the value of some particular
input, or the length of a list that is an input to the program, or the size of an array
that is an input, or some combination of quantities such as these.
Running Time
It is convenient to use a function T(n) to represent the number of units of time
taken by a program or an algorithm on any input of size n. We shall call T(n) the
running time of the program. For example, a program may have a running time
T (n) = cn, where c is some constant. Put another way, the running time of this
program is linearly proportional to the size of the input on which it is run. Such a
program or algorithm is said to be linear time, or just linear.

We can think of the running time T(n) as the number of C statements executed
by the program or as the length of time taken to run the program on some standard
computer. Most of the time we shall leave the units of T (n) unspecified. In fact,
as we shall see in the next section, it makes sense to talk of the running time of a
program only as some (unknown) constant factor times T (n).
Quite often, the running time of a program depends on a particular input, not
just on the size of the input. In these cases, we define T (n) to be the worst-case
running time, that is, the maximum running time on any input among all inputs of
size n.

Another common performance measure is Tavg(n), the average running time of
the program over all inputs of size n. The average running time is sometimes a more
realistic measure of what performance one will see in practice, but it is often much
harder to compute than the worst-case running time. The notion of an “average”
running time also implies that all inputs of size n are equally likely, which may or
may not be true in a given situation.
✦ Example 3.1. Let us estimate the running time of the SelectionSort fragment
shown in Fig. 3.1. The statements have the original line numbers from Fig. 2.2. The
purpose of the code is to set small to the index of the smallest of the elements found
in the portion of the array A from A[i] through A[n-1].
(2)     small = i;
(3)     for (j = i+1; j < n; j++)
(4)         if (A[j] < A[small])
(5)             small = j;

Fig. 3.1. The inner-loop fragment of SelectionSort.

Let m = n − i be the number of elements A[i..n-1] from which the smallest is
selected. Counting one time unit for the assignment of line (2), one for the initial-
ization of j, four for each of the m − 1 iterations of the loop (the test j < n, the test
of line (4), the assignment of line (5) in the worst case, and the incrementation of
j), and one for the final test j < n, we find that the fragment takes T(m) = 4m − 1
time units in the worst case. ✦
Now consider two programs, A and B, whose running times on inputs of size n
are TA = 100n and TB = 2n², respectively, in some fixed time unit, say milliseconds.
Figure 3.2 plots these two running times.

Fig. 3.2. Running times of programs A (TA = 100n) and B (TB = 2n²), plotted for
input sizes n from 0 to 100; T(n) runs from 0 to 20,000, and the two curves cross at
n = 50.
From Fig. 3.2 we see that for inputs of size less than 50, program B is faster
than program A. When the input becomes larger than 50, program A becomes
faster, and from that point on, the larger the input, the bigger the advantage A has
over B. For inputs of size 100, A is twice as fast as B, and for inputs of size 1000,
A is 20 times as fast.
The functional form of a program’s running time ultimately determines how big
a problem we can solve with that program. As the speed of computers increases, we
get bigger improvements in the sizes of problems that we can solve with programs
whose running times grow slowly than with programs whose running times rise
rapidly.
Again, assuming that the running times of the programs shown in Fig. 3.2 are
in milliseconds, the table in Fig. 3.3 indicates how large a problem we can solve with
each program on the same computer in various amounts of time given in seconds.
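The entries of such a table follow directly from the two formulas: with t seconds,
that is, 1000t milliseconds, available, program A can handle n up to 10t, while
program B can handle n up to √(500t). Computed values (Fig. 3.3 itself may round
differently):

    TIME (sec.)     MAX n FOR A (100n)     MAX n FOR B (2n²)
    1               10                      22
    10              100                     70
    100             1,000                   223
    1000            10,000                  707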
For example, suppose we can afford 100 seconds of computer time. If computers
become 10 times as fast, then in 100 seconds we can handle problems of the size
that used to require 1000 seconds. With algorithm A, we can now solve problems 10
times as large, but with algorithm B we can only solve problems about 3 times as
large. Thus, as computers continue to get faster, we gain an even more significant
advantage by using algorithms and programs with lower growth rates.
2 This situation is not too dissimilar to the situation where algorithm A is merge sort and
algorithm B is selection sort. However, the running time of merge sort grows as n log n, as
we shall see in Section 3.10.
EXERCISES
3.3.1: Consider the factorial program fragment in Fig. 2.13, and let the input size
be the value of n that is read. Counting one time unit for each assignment, read,
and write statement, and one unit each time the condition of the while-statement
is tested, compute the running time of the program.
3.3.2: For the program fragments of (a) Exercise 2.5.1 and (b) Fig. 2.14, give an
appropriate size for the input. Using the counting rules of Exercise 3.3.1, determine
the running times of the programs.
3.3.3: Suppose program A takes 2ⁿ/1000 units of time and program B takes 1000n²
units. For what values of n does program A take less time than program B?

3.3.4: For each of the programs of Exercise 3.3.3, how large a problem can be solved
in (a) 10⁶ time units, (b) 10⁹ time units, and (c) 10¹² time units?

3.3.5: Repeat Exercises 3.3.3 and 3.3.4 if program A takes 1000n⁴ time units and
program B takes n¹⁰ time units.
✦
✦ ✦
✦
3.4 Big-Oh and Approximate Running Time
Suppose we have written a C program and have selected the particular input on
which we would like it to run. The running time of the program on this input still
depends on two factors:
1. The computer on which the program is run. Some computers execute instruc-
tions more rapidly than others; the ratio between the speeds of the fastest
supercomputers and the slowest personal computers is well over 1000 to 1.
2. The particular C compiler used to generate a program for the computer to
execute. Different programs can take different amounts of time to execute on
the same machine, even though the programs have the same effect.
As a result, we cannot look at a C program and its input and say, “This task
will take 3.21 seconds,” unless we know which machine and which compiler will
be used. Moreover, even if we know the program, the input, the machine, and
the compiler, it is usually far too complex a task to predict exactly the number of
machine instructions that will be executed.
For these reasons, we usually express the running time of a program using
“big-oh” notation, which is designed to let us hide constant factors such as
1. The average number of machine instructions a particular compiler generates.
2. The average number of machine instructions a particular machine executes per
second.
For example, instead of saying, as we did in Example 3.1, that the SelectionSort
fragment we studied takes time 4m − 1 on an array of length m, we would say
that it takes O(m) time, which is read “big-oh of m” or just “oh of m,” and which
informally means “some constant times m.”
The notion of “some constant times m” not only allows us to ignore unknown
constants associated with the compiler and the machine, but also allows us to make
some simplifying assumptions. In Example 3.1, for instance, we assumed that all
assignment statements take the same amount of time, and that this amount of time
was also taken by the test for termination in the for-loop, the incrementation of j
around the loop, the initialization, and so on. Since none of these assumptions is
valid in practice, the constants 4 and −1 in the running-time formula T(m) = 4m − 1
are at best approximations to the truth. It would be more appropriate to describe
T (m) as “some constant times m, plus or minus another constant” or even as “at
most proportional to m.” The notation O(m) enables us to make these statements
without getting involved in unknowable or meaningless constants.
On the other hand, representing the running time of the fragment as O(m) does
tell us something very important. It says that the time to execute the fragment
on progressively larger arrays grows linearly, like the hypothetical Program A of
Figs. 3.2 and 3.3 discussed at the end of Section 3.3. Thus, the algorithm embodied
by this fragment will be superior to competing algorithms whose running time grows
faster, such as the hypothetical Program B of that discussion.
Definition of Big-Oh
We shall now give a formal definition of the notion of one function being “big-oh”
of another. Let T (n) be a function, which typically is the running time of some
program, measured as a function of the input size n. As befits a function that
measures the running time of a program, we shall assume that T(n) is nonnegative
for all n.

Let f(n) be some function defined on the nonnegative integers n. We say that

"T(n) is O(f(n))"

if there is an integer n₀ and a constant c > 0 such that for all n ≥ n₀, we have
T(n) ≤ cf(n). We call the pair n₀ and c witnesses to the fact that T(n) is O(f(n)).
It may seem odd that although (n + 1)² is larger than n², we can still say that
(n + 1)² is O(n²). In fact, we can also say that (n + 1)² is big-oh of any fraction
of n², for example, O(n²/100). To see why, choose witnesses n₀ = 1 and c = 400.
Then if n ≥ 1, we know that

(n + 1)² ≤ 400(n²/100) = 4n²
by the same reasoning as was used in Example 3.2. The general principles underlying
these observations are that
1. Constant factors don't matter. For any positive constant d and any function
T(n), T(n) is O(dT(n)), regardless of whether d is a large number or a very
small fraction, as long as d > 0. To see why, choose witnesses n₀ = 0 and
c = 1/d.³ Then T(n) ≤ c(dT(n)), since cd = 1. Likewise, if we know that
T(n) is O(f(n)), then we also know that T(n) is O(df(n)) for any d > 0, even a
very small d. The reason is that we know that T(n) ≤ c₁f(n) for some constant
c₁ and all n ≥ n₀. If we choose c = c₁/d, we can see that T(n) ≤ c(df(n)) for
n ≥ n₀.
2. Low-order terms don't matter. Suppose T(n) is a polynomial of the form

aₖnᵏ + aₖ₋₁nᵏ⁻¹ + · · · + a₂n² + a₁n + a₀

where the leading coefficient, aₖ, is positive. Then we can throw away all terms
but the first (the term with the highest exponent, k) and, by rule (1), ignore
the constant aₖ, replacing it by 1. That is, we can conclude that T(n) is O(nᵏ). In
proof, let n₀ = 1, and let c be the sum of all the positive coefficients among
the aᵢ's, 0 ≤ i ≤ k. If a coefficient aⱼ is 0 or negative, then surely aⱼnʲ ≤ 0. If
aⱼ is positive, then aⱼnʲ ≤ aⱼnᵏ for all n ≥ 1. Thus, T(n) is at most nᵏ times the
sum of the positive coefficients, that is, T(n) ≤ cnᵏ for all n ≥ 1.

³ Note that although we are required to choose constants as witnesses, not functions, there is
nothing wrong with choosing c = 1/d, because d itself is some constant.

Low-order terms can also be thrown away when they are not polynomials in n,
provided the ratio of the low-order term to the dominant term shrinks as n grows.
For example, consider T(n) = 2ⁿ + n³. Since the ratio n³/2ⁿ goes to 0 as n
increases, we can throw away the lower-order term and conclude that T(n)
is O(2ⁿ).
To prove formally that 2ⁿ + n³ is O(2ⁿ), let n₀ = 10 and c = 2. We must show
that for n ≥ 10, we have

2ⁿ + n³ ≤ 2 × 2ⁿ

If we subtract 2ⁿ from both sides, we see it is sufficient to show that for n ≥ 10, it
is the case that n³ ≤ 2ⁿ.

For n = 10 we have 2¹⁰ = 1024 and 10³ = 1000, and so n³ ≤ 2ⁿ for n = 10.
Each time we add 1 to n, 2ⁿ doubles, while n³ is multiplied by a quantity (n + 1)³/n³
that is less than 2 when n ≥ 10. Thus, as n increases beyond 10, n³ becomes
progressively less than 2ⁿ. We conclude that n³ ≤ 2ⁿ for n ≥ 10, and thus that
2ⁿ + n³ is O(2ⁿ). ✦
EXERCISES
3.4.1: Consider the four functions

f₁: n²
f₂: n³
f₃: n² if n is odd, and n³ if n is even
f₄: n² if n is prime, and n³ if n is composite

For each i and j equal to 1, 2, 3, 4, determine whether fᵢ(n) is O(fⱼ(n)). Either
give values n₀ and c that prove the big-oh relationship, or assume that there are
such values n₀ and c, and then derive a contradiction to prove that fᵢ(n) is not
O(fⱼ(n)). Hint: Remember that all primes except 2 are odd. Also remember that
there are an infinite number of primes and an infinite number of composite numbers
(nonprimes).
3.4.2: Following are some big-oh relationships. For each, give witnesses n₀ and c
that can be used to prove the relationship. Choose your witnesses to be minimal,
in the sense that n₀ − 1 and c are not witnesses, and if d < c, then n₀ and d are
not witnesses.

a) n² is O(.001n³)
b) 25n⁴ − 19n³ + 13n² − 10⁶n + 77 is O(n⁴)
c) 2ⁿ⁺¹⁰ is O(2ⁿ)
d) n¹⁰ is O(3ⁿ)
e)* log₂ n is O(√n)
3.4.3*: Prove that if f(n) ≤ g(n) for all n, then f(n) + g(n) is O(g(n)).

3.4.4**: Suppose that f(n) is O(g(n)) and g(n) is O(f(n)). What can you say
about f(n) and g(n)? Is it necessarily true that f(n) = g(n)? Does the limit
f(n)/g(n) as n goes to infinity necessarily exist?
✦
✦ ✦
✦
3.5 Simplifying Big-Oh Expressions
As we saw in the previous section, it is possible to simplify big-oh expressions by
dropping constant factors and low-order terms. We shall see how important it is to
make such simplifications when we analyze programs. It is common for the running
time of a program to be attributable to many different statements or program
fragments, but it is also normal for a few of these pieces to account for the bulk
of the running time (by the “90-10” rule). By dropping low-order terms, and by
combining equal or approximately equal terms, we can often greatly simplify the
expressions for running time.
In particular, we shall explain why O(n²) is the preferred way to express the running time of
SelectionSort.
Tightness
First, we generally want the "tightest" big-oh upper bound we can prove. That is, if
T(n) is O(n²), we want to say so, rather than make the technically true but weaker
statement that T(n) is O(n³). On the other hand, this way lies madness, because
if we like O(n²) as an expression of running time, we should like O(0.5n²) even
better, because it is "tighter," and we should like O(.01n²) still more, and so on.
However, since constant factors don’t matter in big-oh expressions, there is really
no point in trying to make the estimate of running time “tighter” by shrinking the
constant factor. Thus, whenever possible, we try to use a big-oh expression that
has a constant factor 1.
Figure 3.4 lists some of the more common running times for programs and their
informal names. Note in particular that O(1) is an idiomatic shorthand for "some
constant," and we shall use O(1) repeatedly for this purpose.

    BIG-OH BOUND    INFORMAL NAME
    O(1)            constant
    O(log n)        logarithmic
    O(n)            linear
    O(n log n)      n log n
    O(n²)           quadratic
    O(n³)           cubic
    O(2ⁿ)           exponential

Fig. 3.4. Informal names for some common big-oh running times.
More precisely, we shall say that f(n) is a tight big-oh bound on T(n) if

1. T(n) is O(f(n)), and

2. If T(n) is O(g(n)), then it is also true that f(n) is O(g(n)). Informally, we
cannot find a function g(n) that grows at least as fast as T(n) but grows slower
than f(n).
✦ Example 3.6. Let T(n) = 2n² + 3n and f(n) = n². We claim that f(n) is
a tight bound on T(n). To see why, suppose T(n) is O(g(n)). Then there are
constants c and n₀ such that for all n ≥ n₀, we have T(n) = 2n² + 3n ≤ cg(n).
Then g(n) ≥ (2/c)n² for n ≥ n₀. Since f(n) is n², we have f(n) ≤ (c/2)g(n) for
n ≥ n₀. Thus, f(n) is O(g(n)).

On the other hand, f(n) = n³ is not a tight big-oh bound on T(n). Now we
can pick g(n) = n². We have seen that T(n) is O(g(n)), but we cannot show that
f(n) is O(g(n)), since n³ is not O(n²). Thus, n³ is not a tight big-oh bound on
T(n). ✦
Simplicity
The other goal in our choice of a big-oh bound is simplicity in the expression of the
function. Unlike tightness, simplicity can sometimes be a matter of taste. However,
we shall generally regard a function f(n) as simple if
1. It is a single term and
2. The coefficient of that term is 1.
✦ Example 3.7. The function n² is simple; 2n² is not simple because the coefficient
is not 1, and n² + n is not simple because there are two terms. ✦
There are some situations, however, where the tightness of a big-oh upper
bound and the simplicity of the bound are conflicting goals. The following is an
example where the simple bound doesn't tell the whole story. Fortunately, such
cases are rare in practice.
int PowersOfTwo(int n)
{
    int i;

(1)     i = 0;
(2)     while (n%2 == 0) {
(3)         n = n/2;
(4)         i++;
        }
(5)     return i;
}

Fig. 3.5. Counting the factors of 2 in a positive integer n.
✦ Example 3.8. Consider the function PowersOfTwo in Fig. 3.5, which takes a
positive argument n and counts the number of times 2 divides n. That is, the test of
line (2) asks whether n is even and, if so, removes a factor of 2 at line (3) of the
loop body. Also in the loop, we increment i, which counts the number of factors
we have removed from the original value of n.
Let the size of the input be the value of n itself. The body of the while-loop
consists of two C assignment statements, lines (3) and (4), and so we can say that
the time to execute the body once is O(1), that is, some constant amount of time,
independent of n. If the loop is executed m times, then the total time spent going
around the loop will be O(m), or some amount of time that is proportional to m.
To this quantity we must add O(1), or some constant, for the single executions of
lines (1) and (5), plus the first test of the while-condition, which is technically not
part of any loop iteration. Thus, the time spent by the program is O(m) + O(1).
Following our rule that low-order terms can be neglected, the time is O(m), unless
m = 0, in which case it is O(1). Put another way, the time spent on input n is
proportional to 1 plus the number of times 2 divides n.
How many times does 2 divide n? For every odd n, the answer is 0, so the
function PowersOfTwo takes time O(1) on every odd n. However, when n is a power
of 2 — that is, when n = 2ᵏ for some k — 2 divides n exactly k times. When n = 2ᵏ,
we may take logarithms to base 2 of both sides and conclude that log₂ n = k. That
is, m is at most logarithmic in n, or m = O(log n).⁴
Fig. 3.6. The function f(n) = m(n) + 1, where m(n) is the number of times 2 divides n,
plotted for n from 1 to 10; the value oscillates between 1 (at odd n) and 1 + log₂ n
(at powers of 2).
We may thus say that the running time of PowersOfTwo is O(log n). This
bound meets our definition of simplicity. However, there is another, more precise
way of stating an upper bound on the running time of PowersOfTwo, which is to
say that it is big-oh of the function f (n) = m(n) + 1, where m(n) is the number
of times 2 divides n. This function is hardly simple, as Fig. 3.6 shows. It oscillates
wildly but never goes above 1 + log2 n.
Since the running time of PowersOfTwo is O(f(n)), but log n is not O(f(n)),
we claim that log n is not a tight bound on the running time. On the other hand,
f (n) is a tight bound, but it is not simple. ✦
⁴ Note that when we speak of logarithms within a big-oh expression, there is no need to specify
the base. The reason is that if a and b are bases, then log_a n = (log_b n)(log_a b). Since log_a b is
a constant, we see that log_a n and log_b n differ by only a constant factor. Thus, the functions
log_x n for different bases x are big-oh of each other, and we can, by the transitive law, replace
within a big-oh expression any log_a n by log_b n, where b is a base different from a.
The Summation Rule

Suppose T₁(n) is known to be O(f₁(n)), while T₂(n) is known to be O(f₂(n)).
Further, suppose that f₂ grows no faster than f₁; that is, f₂(n) is O(f₁(n)).
Then we can conclude that T₁(n) + T₂(n) is O(f₁(n)).

In proof, we know that there are constants n₁, n₂, n₃, c₁, c₂, and c₃ such that

1. If n ≥ n₁, then T₁(n) ≤ c₁f₁(n).

2. If n ≥ n₂, then T₂(n) ≤ c₂f₂(n).

3. If n ≥ n₃, then f₂(n) ≤ c₃f₁(n).

Let n₀ be the largest of n₁, n₂, and n₃, so that (1), (2), and (3) hold when n ≥ n₀.
Thus, for n ≥ n₀, we have

T₁(n) + T₂(n) ≤ c₁f₁(n) + c₂f₂(n)

If we use (3) to provide an upper bound on f₂(n), we can get rid of f₂(n) altogether
and conclude that

T₁(n) + T₂(n) ≤ c₁f₁(n) + c₂c₃f₁(n)

Therefore, for all n ≥ n₀ we have

T₁(n) + T₂(n) ≤ cf₁(n)

if we define c to be c₁ + c₂c₃. This statement is exactly what we need to conclude
that T₁(n) + T₂(n) is O(f₁(n)).
✦ Example 3.9. Consider the program fragment in Fig. 3.7. This program makes
A an n × n identity matrix. Lines (2) through (4) place 0 in every cell of the n × n
array, and then lines (5) and (6) place 1’s in all the diagonal positions from A[0][0]
to A[n-1][n-1]. The result is an identity matrix A with the property that
A × M = M × A = M

for any n × n matrix M.

(1)     scanf("%d",&n);
(2)     for (i = 0; i < n; i++)
(3)         for (j = 0; j < n; j++)
(4)             A[i][j] = 0;
(5)     for (i = 0; i < n; i++)
(6)         A[i][i] = 1;

Fig. 3.7. Program fragment to make A an n × n identity matrix.

Line (1) takes O(1) time, lines (2) through (4) take O(n²) time, and lines (5)
and (6) take O(n) time, as we shall work out in the next section; by the rule of
sums, the whole fragment takes O(n²) time. ✦

Example 3.9 is just an application of the rule that low-order terms don't matter,
since we threw away the terms 1 and n, which are lower-degree polynomials than
n². However, the rule of sums allows us to do more than just throw away low-order
terms. If we have any constant number of terms that are, to within big-oh, the
same, such as a sequence of 10 assignment statements, each of which takes O(1)
time, then we can “add” ten O(1)’s to get O(1). Less formally, the sum of 10
constants is still a constant. To see why, note that 1 is O(1), so that any of the ten
O(1)'s can be "added" to any other to get O(1) as a result. We keep combining
terms until only one O(1) is left.
Incommensurate Functions
It would be nice if any two functions f(n) and g(n) could be compared by big-oh;
that is, either f(n) is O(g(n)), or g(n) is O(f(n)) (or both, since as we observed,
there are functions such as 2n² and n² + 3n that are each big-oh of the other).
Unfortunately, there are pairs of incommensurate functions, neither of which is
big-oh of the other.
✦ Example 3.10. Consider the function f(n) that is n for odd n and n² for even
n. That is, f(1) = 1, f(2) = 4, f(3) = 3, f(4) = 16, f(5) = 5, and so on. Similarly,
let g(n) be n² for odd n and let g(n) be n for even n. Then f(n) cannot be O(g(n)),
because of the even n's. For as we observed in Section 3.4, n² is definitely not O(n).
Similarly, g(n) cannot be O(f(n)), because of the odd n's, where the values of g
outrace the corresponding values of f. ✦
EXERCISES
3.5.6*: Show that if f₁(n) and f₂(n) are both tight bounds on some function T(n),
then f₁(n) and f₂(n) are each big-oh of the other.

3.5.7*: Show that log₂ n is not O(f(n)), where f(n) is the function from Fig. 3.6.
3.5.8: In the program of Fig. 3.7, we created an identity matrix by first putting
0’s everywhere and then putting 1’s along the diagonal. It might seem that a faster
way to do the job is to replace line (4) by a test that asks if i = j, putting 1 in
A[i][j] if so and 0 if not. We can then eliminate lines (5) and (6).

a) Write the revised program fragment.

b)* Consider the programs of Fig. 3.7 and your answer to (a). Making simplifying
assumptions like those of Example 3.1, compute the number of time units taken
by each of the programs. Which is faster? Run the two programs on various-
sized arrays and plot their running times.
✦
✦ ✦
✦
3.6 Analyzing the Running Time of a Program
Armed with the concept of big-oh and the rules from Sections 3.4 and 3.5 for
manipulating big-oh expressions, we shall now learn how to derive big-oh upper
bounds on the running times of typical programs. Whenever possible, we shall look
for simple and tight big-oh bounds. In this section and the next, we shall consider
only programs without function calls (other than library functions such as printf),
leaving the matter of function calls to Sections 3.8 and beyond.
We do not expect to be able to analyze arbitrary programs, since questions
about running time are as hard as any in mathematics. On the other hand, we can
discover the running time of most programs encountered in practice, once we learn
some simple rules.
We ask the reader to accept the principle that certain simple operations on data
can be done in O(1) time, that is, in time that is independent of the size of the
input. These primitive operations in C consist of arithmetic operations (such as +
or %), logical operations (such as &&), comparisons (such as <=), structure-accessing
operations (such as the array indexing in A[i], or following a pointer with the ->
operator), and simple assignment, that is, copying a value into a variable.
The justification for this principle requires a detailed study of the machine
instructions (primitive steps) of a typical computer. Let us simply observe that
each of the described operations can be done with some small number of machine
instructions; often only one or two instructions are needed.
As a consequence, several kinds of statements in C can be executed in O(1)
time, that is, in some constant amount of time independent of input. These simple
statements include

1. Assignment statements that do not involve function calls in their expressions.

2. Read statements.

3. Write statements that do not require function calls to evaluate their arguments.

4. The jump statements break, continue, goto, and return expression, where
expression does not contain a function call.
In (1) through (3), the statements each consist of some finite number of primi-
tive operations, each of which we take on faith to require O(1) time. The summation
rule then tells us that the entire statements take O(1) time. Of course, the constants
hidden in the big-oh are larger for statements than for single primitive operations,
but as we already know, we cannot associate concrete constants with running time
of C statements anyway.
✦ Example 3.11. We observed in Example 3.9 that the read statement of line
(1) of Fig. 3.7 and the assignments of lines (4) and (6) each take O(1) time. For
another illustration, consider the fragment of the selection-sort program shown in
Fig. 3.8. The assignments of lines (2), (5), (6), (7), and (8) each take O(1) time. ✦
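Fig. 3.8 is cited by line number in the next several examples; here is a sketch
consistent with those references (it is the body of SelectionSort from Fig. 2.2):

(1)     for (i = 0; i < n-1; i++) {
(2)         small = i;
(3)         for (j = i+1; j < n; j++)
(4)             if (A[j] < A[small])
(5)                 small = j;
(6)         temp = A[small];
(7)         A[small] = A[i];
(8)         A[i] = temp;
        }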
Frequently, we find a block of simple statements that are executed consecutively.
If the running time of each of these statements is O(1), then the entire block
takes O(1) time, by the summation rule. That is, any constant number of O(1)'s
sums to O(1).
✦ Example 3.12. Lines (6) through (8) of Fig. 3.8 form a block, since they are
always executed consecutively. Since each takes O(1) time, the block of lines (6) to
(8) takes O(1) time.
Note that we should not include line (5) in the block, since it is part of the
if-statement on line (4). That is, sometimes lines (6) to (8) are executed without
executing line (5). ✦
✦ Example 3.13. Consider the for-loop of lines (3) and (4) in Fig. 3.7, which is
(3) for (j = 0; j < n; j++)
(4) A[i][j] = 0;
We know that line (4) takes O(1) time. Clearly, we go around the loop n times, as
we can determine by subtracting the lower limit from the upper limit found on line
(3) and then adding 1. Since the body, line (4), takes O(1) time, we can neglect the
time to increment j and the time to compare j with n, both of which are also O(1).
Thus, the running time of lines (3) and (4) is the product of n and O(1), which is
O(n).
Similarly, we can bound the running time of the outer loop consisting of lines
(2) through (4), which is
(2) for (i = 0; i < n; i++)
(3) for (j = 0; j < n; j++)
(4) A[i][j] = 0;
We have already established that the loop of lines (3) and (4) takes O(n) time.
Thus, we can neglect the O(1) time to increment i and to test whether i < n in
each iteration, concluding that each iteration of the outer loop takes O(n) time.
The initialization i = 0 of the outer loop and the (n + 1)st test of the condition
i < n likewise take O(1) time and can be neglected. Finally, we observe that we go
around the outer loop n times, taking O(n) time for each iteration, giving a total
O(n^2) running time. ✦
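The rest of the fragment of Fig. 3.7, which is also not reproduced in this copy, is needed below. A sketch consistent with the line references in this section (line (1) reads n; lines (5) and (6) set the diagonal) is:

(1)     scanf("%d", &n);
(2)     for (i = 0; i < n; i++)
(3)         for (j = 0; j < n; j++)
(4)             A[i][j] = 0;
(5)     for (i = 0; i < n; i++)
(6)         A[i][i] = 1;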
✦ Example 3.14. Now, let us consider the for-loop of lines (3) to (5) of Fig. 3.8.
Here, the body is an if-statement, a construct we shall learn how to analyze next.
It is not hard to deduce that line (4) takes O(1) time to perform the test and line
(5), if we execute it, takes O(1) time because it is an assignment with no function
calls. Thus, we take O(1) time to execute the body of the for-loop, regardless of
whether line (5) is executed. The incrementation and test in the loop add O(1)
time, so that the total time for one iteration of the loop is just O(1).
Now we must calculate the number of times we go around the loop. The number
of iterations is not given directly by n, the size of the input. Rather, the formula
“last value minus initial value, divided by the increment” gives us (n − (i + 1))/1,
or n − i − 1, as the number of times around the loop. Strictly speaking, that
formula holds only if i < n. Fortunately, we can observe from line (1) of Fig. 3.8
that we do not enter the loop body of lines (2) through (8) unless i ≤ n − 2. Thus,
not only is n − i − 1 the number of iterations, but we also know that this number
cannot be 0. We conclude that the time spent in the loop is (n − i − 1) × O(1), or
O(n − i − 1).5 We do not have to add in O(1) for initializing j, since we have
established that n − i − 1 cannot be 0. If we had not observed that n − i − 1 was
positive, then we would have to write the upper bound on the running time as
O(max(1, n − i − 1)). ✦
5 Technically, we have not discussed a big-oh operator applied to a function of more than one
variable. In this case, we can regard O(n − i − 1) as saying “at most some constant times
n − i − 1.” That is, we can consider n − i − 1 as a surrogate for a single variable.
A selection statement has the form
if ( <condition> ) <if-part> else <else-part>
where
1. The condition is an expression that is evaluated to decide which part runs,
2. The if-part is a statement that is executed only if the condition is true (the
value of the expression is not zero), and
3. The else-part is a statement that is executed if the condition is false (evaluates
to 0). The else followed by the <else-part> is optional.
A condition, no matter how complex, requires the computer to perform only a
constant number of primitive operations, as long as there are no function calls
within the condition. Thus, the evaluation of the condition will take O(1) time.
Suppose that there are no function calls in the condition, and that the if- and
else-parts have big-oh upper bounds f(n) and g(n), respectively. Let us also suppose
that f(n) and g(n) are not both 0; that is, while the else-part may be missing, the
if-part is something other than an empty block. We leave as an exercise the question
of determining what happens if both parts are missing or are empty blocks.
If f(n) is O(g(n)), then we can take O(g(n)) to be an upper bound on the
running time of the selection statement. The reason is that
1. We can neglect the O(1) for the condition,
2. If the else-part is executed, g(n) is known to be a bound on the running time,
and
3. If the if-part is executed instead of the else-part, the running time will be
O(g(n)) because f(n) is O(g(n)).
Similarly, if g(n) is O(f(n)), we can bound the running time of the selection state-
ment by O(f(n)). Note that when the else-part is missing, as it often is, g(n) is 0,
which is surely O(f(n)).
The problem case is when neither f nor g is big-oh of the other. We know
that either the if-part or the else-part, but not both, will be executed, and so a safe
upper bound on the running time is the larger of f(n) and g(n). Which is larger
can depend on n, as we saw in Example 3.10. Thus, we must write the running
time of the selection statement as O(max(f(n), g(n))). For instance, in the following
fragment the if-part takes O(n^2) time and the else-part takes O(n) time, so the
selection statement as a whole takes O(max(n^2, n)) time, which is O(n^2):
(1) if (A[1][1] == 0)
(2) for (i = 0; i < n; i++)
(3) for (j = 0; j < n; j++)
(4) A[i][j] = 0;
else
(5) for (i = 0; i < n; i++)
(6) A[i][i] = 1;
✦ Example 3.17. In the selection sort fragment of Fig. 3.8, we can view the body
of the outer loop, that is, lines (2) through (8), as a block. This block consists of
five statements:
1. The assignment of line (2)
2. The loop of lines (3), (4), and (5)
3. The assignment of line (6)
4. The assignment of line (7)
5. The assignment of line (8)
Note that the selection statement of lines (4) and (5), and the assignment of line (5),
are not visible at the level of this block; they are hidden within a larger statement,
the for-loop of lines (3) to (5).
We know that the four assignment statements take O(1) time each. In Example
3.14 we learned that the running time of the second statement of the block — that
is, lines (3) through (5) — is O(n − i − 1). Thus, the running time of the block is
O(1) + O(n − i − 1) + O(1) + O(1) + O(1)
Since 1 is O(n − i − 1) (recall we also deduced that i never gets higher than n − 2),
we can eliminate all the O(1)’s by the summation rule. Thus, the entire block takes
O(n − i − 1) time.
For another example, consider the program fragment of Fig. 3.7 again. It can
be considered a single block consisting of three statements:
1. The read statement of line (1)
2. The loop of lines (2) through (4)
3. The loop of lines (5) and (6)
We know that line (1) takes O(1) time. From Example 3.13, lines (2) through (4)
take O(n^2) time; lines (5) and (6) take O(n) time. The block itself takes

O(1) + O(n^2) + O(n)

time. By the summation rule, we can eliminate O(1) and O(n) in favor of O(n^2).
We conclude that the fragment of Fig. 3.7 takes O(n^2) time. ✦
✦ Example 3.18. Consider the program fragment shown in Fig. 3.10. The pro-
gram searches an array A[0..n-1] for the location of an element x that is believed
to be in the array.
(1) i = 0;
(2) while(x != A[i])
(3) i++;
The two assignment statements (1) and (3) in Fig. 3.10 have a running time of
O(1). The while-loop of lines (2) and (3) may be executed as many as n times, but
no more, because we assume one of the array elements is x. Since the loop body,
line (3), requires time O(1), the running time of the while-loop is O(n). From the
summation rule, the running time of the entire program fragment is O(n), because
that is the maximum of the time for the assignment of line (1) and the time for the
while-loop. In Chapter 6, we shall see how this O(n) program can be replaced by
an O(log n) program using binary search. ✦
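As a self-contained illustration of this analysis, here is a hypothetical function wrapping the fragment of Fig. 3.10; the name find and its interface are ours, not the book's:

int find(int A[], int n, int x)
{
    /* Return the index of the first occurrence of x in A[0..n-1].
       As in the analysis above, we assume x is present, so the
       while-loop iterates at most n times (n itself is never
       tested); each iteration takes O(1) time, and the whole
       search is therefore O(n). */
    int i;

    i = 0;
    while (x != A[i])
        i++;
    return i;
}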
EXERCISES
3.6.1: In a for-loop headed
3.6.2: Give a big-oh upper bound on the running time of the trivial selection
statement
if ( C ) { }
where C is a condition that does not involve any function calls.
3.6.3: Repeat Exercise 3.6.2 for the trivial while-loop
while ( C ) { }
3.6.6: Give a rule for the running time of a degenerate while-loop, in which the
condition is known to be false right from the start, such as
while (1 != 1)
    <something taking O(f(n)) time>;
✦
✦ ✦
✦
3.7 A Recursive Rule for Bounding Running Time
In the previous section, we informally described a number of rules for defining the
running time of certain program constructs in terms of the running times of their
parts. For instance, we said that the time of a for-loop is essentially the time
taken by the body multiplied by the number of iterations. Hidden among these
rules was the notion that programs are built using inductive rules by which com-
pound statements (loops, selections, and other statements that have substatements
as constituent parts) are constructed from a basis consisting of simple statements
such as assignment, read, write, and jump statements. The inductive rules covered
loop formation, selection statements, and blocks, which are sequences of compound
statements.
We shall state some syntactic rules for building statements of C as a recursive
definition. These rules correspond to the grammatical rules for defining C that
often appear in a text on C. We shall see in Chapter 11 that grammars can serve as
a succinct recursive notation for specifying the syntax of programming languages.
INDUCTION. The following rules let us construct statements from smaller state-
ments:
1. While-statement. If S is a statement and C is a condition (an expression with
an arithmetic value), then
while ( C ) S
is a statement. The body S is executed as long as C is true (has a nonzero
value).
2. Do-while-statement. If S is a statement and C is a condition, then
do S while ( C ) ;
is a statement. The do-while is similar to the while-statement except that the
body S is executed at least once.
Each node of the tree corresponds to the line or lines of the program that form the
simple or compound statement represented by the node. From each node N that
represents a compound statement we draw lines downward to its “children” nodes.
The children of node N represent the substatements forming the compound
statement that N represents. Such a tree is called the structure tree for a program.
for (1)–(8)
└── block (2)–(8)
    ├── (2)
    ├── for (3)–(5)
    │   └── if (4)–(5)
    │       └── (5)
    ├── (6)
    ├── (7)
    └── (8)

[Fig. 3.12. The structure tree for the fragment of Fig. 3.11; the leaves, drawn as circles in the original, are the five assignment statements.]
✦ Example 3.20. Figure 3.12 is the structure tree for the program fragment
of Fig. 3.11. Each of the circles is a leaf representing one of the five assignment
statements of Fig. 3.11. We have omitted from Fig. 3.12 an indication that these
five statements are assignment statements.
At the top (the “root”) of the tree is a node representing the entire fragment
of lines (1) through (8); it is a for-statement. The body of the for-loop is a block
consisting of lines (2) through (8).7 This block is represented by a node just below
the root. That block node has five children, representing the five statements of the
block. Four of them are assignment statements, lines (2), (6), (7), and (8). The
fifth is the for-loop of lines (3) through (5).
The node for this for-loop has a child representing its body, the if-statement of
lines (4) and (5). The latter node has a child representing its constituent statement,
the assignment of line (5). ✦
Just as program structures are built recursively, we can define big-oh upper bounds
on the running time of programs, using an analogous recursive method. As in
Section 3.6, we presume that there are no function calls within the expressions that
form assignment statements, print statements, conditions in selections, conditions
in while-, for-, or do-while- loops, or the initialization or reinitialization of a for-loop.
The only exception is a call to a read- or write-function such as printf.
BASIS. The bound for a simple statement — that is, for an assignment, a read, a
write, or a jump — is O(1).
INDUCTION. For the five compound constructs we have discussed, the rules for
computing their running time are as follows.
1. While-statement. Let O(f(n)) be the upper bound on the running time of the
body of the while-statement; f(n) was discovered by the recursive application
of these rules. Let g(n) be an upper bound on the number of times we may
go around the loop. Then O(1 + (f(n) + 1)g(n)) is an upper bound on the
running time of the while-loop. That is, O(f(n) + 1) is an upper bound on the
running time of the body plus the test after the body. The additional 1 at the
beginning of the formula accounts for the first test, before entering the loop.
In the common case where f(n) and g(n) are at least 1 (or we can define them
to be 1 if their value would otherwise be 0), we can write the running time
of the while-loop as O(f(n)g(n)). This common formula for running time is
suggested by Fig. 3.13(a).
2. Do-while-statement. If O(f(n)) is an upper bound on the body of the loop and
g(n) is an upper bound on the number of times we can go around the loop, then
O((f(n) + 1)g(n)) is an upper bound on the running time of the do-while-loop.
The “+1” represents the time to compute and test the condition at the end
of each iteration of the loop. Note that for a do-while-loop, g(n) is always at
least 1. In the common case that f(n) ≥ 1 for all n, the running time of the
do-while-loop is O(f(n)g(n)). Figure 3.13(b) suggests the way that running
time is computed for the common case of a do-while-loop.
7 A more detailed structure tree would have children representing the expressions for the
initialization, termination test, and reinitialization of the for-loop.
[Fig. 3.13 diagrams: (a) While-statement: test O(1), then body O(f(n)), at most g(n) times around; total O(g(n)f(n)). (b) Do-while-statement: body O(f(n)), then test O(1), at most g(n) times around; total O(g(n)f(n)). (c) For-statement: initialize O(1); then test O(1), body O(f(n)), and reinitialize O(1), g(n) times around; total O(g(n)f(n)).]
Fig. 3.13. Computing the running time of loop statements without function calls.
3. For-statement. If O(f(n)) is an upper bound on the running time of the body,
and g(n) is an upper bound on the number of times around the loop, then an
upper bound on the time of a for-statement is O(1 + (f(n) + 1)g(n)). The
factor f(n) + 1 represents the cost of going around once, including the body,
the test, and the reinitialization. The “1+” at the beginning represents the
first initialization and the possibility that the first test is negative, resulting in
zero iterations of the loop. In the common case that f(n) and g(n) are both
at least 1, or can be redefined to be at least 1, the running time of the
for-statement is O(f(n)g(n)), as is illustrated by Fig. 3.13(c).
4. Selection statement. If O(f1(n)) and O(f2(n)) are upper bounds on the run-
ning time of the if-part and the else-part, respectively (f2(n) is 0 if the else-part
is missing), then an upper bound on the running time of the selection statement
is O(1 + max(f1(n), f2(n))). The “1+” represents the test; in the common
case, where at least one of f1(n) and f2(n) is positive for all n, the “1+”
can be omitted. Further, if one of f1(n) and f2(n) is big-oh of the other,
this expression may be simplified to whichever is the larger, as was stated in
Exercise 3.5.5. Figure 3.14 suggests the computation of running time for an
if-statement.
[Fig. 3.14 diagram: test O(1); if-part O(f1(n)) and else-part O(f2(n)); total O(max(f1(n), f2(n))), or O(larger of f1(n) and f2(n)).]
Fig. 3.14. Computing the running time of an if-statement without function calls.
5. Block. If O(f1(n)), O(f2(n)), . . . , O(fk(n)) are our upper bounds on the state-
ments within the block, then O(f1(n) + f2(n) + · · · + fk(n)) is an upper bound
on the running time of the block. If possible, use the summation rule to simplify
this expression. The rule is illustrated in Fig. 3.15.
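Before turning to the structure tree, it may help to see rules (1), (4), and (5) applied together to a small made-up fragment; the annotations carry out the rules just stated, assuming declarations of sum, i, n, and A:

    sum = 0;              /* simple statement: O(1) */
    i = 0;                /* simple statement: O(1) */
    while (i < n) {       /* at most n iterations */
        if (A[i] > 0)     /* test O(1); if-part O(1); by rule (4) */
            sum += A[i];  /* the selection statement is O(1) */
        i++;              /* simple statement: O(1) */
    }                     /* by rule (1), the while-loop is O(n) */
                          /* by rule (5), the whole block is O(n) */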
We apply these rules traveling up the structure tree that represents the con-
struction of compound statements from smaller statements. Alternatively, we can
see the application of these rules as beginning with the simple statements covered
by the basis and then proceeding to progressively larger compound statements, ap-
plying whichever of the five inductive rules is appropriate at each step. However
we view the process of computing the upper bound on running time, we analyze
[Fig. 3.15 diagram: a block of statements with bounds O(f1(n)), O(f2(n)), . . . , O(fk(n)); total O(f1(n) + f2(n) + · · · + fk(n)), or O(largest fi(n)).]
Fig. 3.15. Computing the running time of a block without function calls.
a compound statement only after we have analyzed all the statements that are its
constituent parts.
✦ Example 3.21. Let us revisit our selection sort program of Fig. 3.11, whose
structure tree is in Fig. 3.12. To begin, we know that each of the assignment
statements at the leaves in Fig. 3.12 takes O(1) time. Proceeding up the tree, we
next come to the if-statement of lines (4) and (5). We recall from Example 3.15
that this compound statement takes O(1) time.
Next as we travel up the tree (or proceed from smaller statements to their
surrounding larger statements), we must analyze the for-loop of lines (3) to (5). We
did so in Example 3.14, where we discovered that the time was O(n − i − 1). Here,
we have chosen to express the running time as a function of the two variables n and
i. That choice presents us with some computational difficulties, and as we shall see,
we could have chosen the looser upper bound of O(n). Working with O(n − i − 1)
as our bound, we must now observe from line (1) of Fig. 3.11 that i can never get
as large as n − 1. Thus, n − i − 1 is strictly greater than 0 and dominates O(1).
Consequently, we need not add to O(n − i − 1) an O(1) term for initializing the
index j of the for-loop.
Now we come to the block of lines (2) through (8). As discussed in Example
3.17, the running time of this block is the sum of four O(1)’s corresponding to the
four assignment statements, plus the term O(n − i − 1) for the compound statement
of lines (3) to (5). By the rule for sums, plus our observation that i < n, we drop
the O(1)’s, leaving O(n − i − 1) as the running time of the block.
Finally, we must consider the for-loop of lines (1) through (8). This loop was
not analyzed in Section 3.6, but we can apply inductive rule (3). That rule needs
an upper bound on the running time of the body, which is the block of lines (2)
through (8). We just determined the bound O(n − i − 1) for this block, presenting
us with a situation we have not previously seen. While i is constant within the
block, i is the index of the outer for-loop, and therefore varies within the loop.
Thus, the bound O(n − i − 1) makes no sense if we think of it as the running time
of all iterations of the loop. Fortunately, we can see from line (1) that i is never
below 0, and so O(n − 1) is an upper bound on O(n − i − 1). Moreover, by our rule
that low-order terms don’t matter, we can simplify O(n − 1) to O(n).
Next, we need to determine the number of times we go around the loop. Since
i ranges from 0 to n − 2, evidently we go around n − 1 times. When we multiply
n − 1 by O(n), we get O(n^2 − n). Throwing away low-order terms again, we see
that O(n^2) is an upper bound on the running time of the whole selection sort
program. That is to say, selection sort has a quadratic upper bound on its running
time. The quadratic upper bound is the tightest possible, since we can show that if
the elements are initially in the reverse of sorted order, then selection sort does make
n(n − 1)/2 comparison steps. ✦
As we shall see, we can derive n log n upper and lower bounds for the running
time of merge sort. In practice, merge sort is more efficient than selection sort for
all but some small values of n. The reason merge sort is sometimes slower than
selection sort is because the O(n log n) upper bound hides a larger constant than
the O(n2 ) bound for selection sort. The true situation is a pair of crossing curves,
as shown in Fig. 3.2 of Section 3.3.
✦ Example 3.22. We shall perform this more precise analysis on the outer loop
of selection sort. Despite this extra effort, we still get a quadratic upper bound.
As Example 3.21 demonstrated, the running time of the iteration of the outer loop
with the value i for the index variable is O(n − i − 1). Thus an upper bound on
the time taken by all iterations, as i ranges from its initial value 0 up to n − 2, is
O(Σ_{i=0}^{n−2} (n − i − 1)). The terms in this sum form an arithmetic progression, so we
can use the formula “average of the first and last terms times the number of terms.”
That formula tells us that

Σ_{i=0}^{n−2} (n − i − 1) = n(n − 1)/2 = 0.5n^2 − 0.5n

Neglecting low-order terms and constant factors, we see that O(0.5n^2 − 0.5n) is the
same as O(n^2). We again conclude that selection sort has a quadratic upper bound
on its running time. ✦
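The count n(n − 1)/2 is easy to confirm experimentally by instrumenting the inner loop of selection sort. The counter and test program below are our own additions, not part of the original text:

#include <stdio.h>
#define MAX 100

int A[MAX];

/* Sort A[0..n-1] by selection sort, returning the number of
   comparisons A[j] < A[small] that were made. */
long sortAndCount(int n)
{
    int i, j, small, temp;
    long count = 0;

    for (i = 0; i < n-1; i++) {
        small = i;
        for (j = i+1; j < n; j++) {
            count++;               /* one comparison per inner iteration */
            if (A[j] < A[small])
                small = j;
        }
        temp = A[small]; A[small] = A[i]; A[i] = temp;
    }
    return count;
}

main()
{
    int i, n = 10;

    for (i = 0; i < n; i++)
        A[i] = n - i;              /* the reverse of sorted order */
    printf("%ld comparisons\n", sortAndCount(n));
    /* prints "45 comparisons", and 45 = 10 * 9 / 2 */
}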
[Fig. 3.16. Time per iteration, ranging from n − 1 down to 1, plotted against the value of i, from 0 to n − 2.]
The difference between the simple analysis in Example 3.21 and the more de-
tailed analysis in Example 3.22 is illustrated by Fig. 3.16. In Example 3.21, we
took the maximum time of any iteration as the time for each iteration, thus getting
the area of the rectangle as our bound on the running time of the outer for-loop in
Fig. 3.11. In Example 3.22, we bounded the running time of each iteration by the
diagonal line, since the time for each iteration decreases linearly with i. Thus, we
obtained the area of the triangle as our estimate of the running time. However, it is
well known that the area of the triangle is half the area of the rectangle. Since the
factor of 2 gets lost with the other constant factors that are hidden by the big-oh
notation anyway, the two upper bounds on the running time are really the same.
EXERCISES
3.7.1: Figure 3.17 contains a C program to find the average of the elements of an
array A[0..n-1] and to print the index of the element that is closest to average (a
tie is broken in favor of the element appearing first). We assume that n ≥ 1 and do
not include the necessary check for an empty array. Build a structure tree showing
how the statements are grouped into progressively more complex statements, and
give a simple and tight big-oh upper bound on the running time for each of the
statements in the tree. What is the running time of the entire program?
3.7.2: The fragment in Fig. 3.18 transforms an n by n matrix A. Show the structure
tree for the fragment. Give a big-oh upper bound on the running time of each
compound statement:
a) Making the bound for the two inner loops a function of n and i.
b) Making the bound for all loops be a function of n only.
For the whole program, is there a big-oh difference between your answers to parts
(a) and (b)?
#include <stdio.h>
#define MAX 100
int A[MAX];
main()
{
int closest, i, n;
float avg, sum;
3.7.3*: Figure 3.19 contains a program fragment that applies the powers-of-2 op-
eration discussed in Example 3.8 to the integers i from 1 to n. Show the structure
tree for the fragment. Give a big-oh upper bound on the running time of each
compound statement:
a) Making the bound for the while-loop a function of (the factors of) i.
b) Making the bound for the while-loop a function of n only.
For the whole program, is there a big-oh difference between your answers to parts
(a) and (b)?
3.7.4: In Fig. 3.20 is a function that determines whether the argument n is a prime.
Note that if n is not a prime, then it is divisible evenly by some integer i between
2 and √n. Show the structure tree for the function. Give a big-oh upper bound
on the running time of each compound statement, as a function of n. What is the
running time of the function as a whole?
int prime(int n)
{
int i;
(1) i = 2;
(2) while (i*i <= n)
(3) if (n%i == 0)
(4) return FALSE;
else
(5) i++;
(6) return TRUE;
}
✦
✦ ✦
✦
3.8 Analyzing Programs with Function Calls
We now show how to analyze the running time of a program or program fragment
that contains function calls. To begin, if all functions are nonrecursive, we can
determine the running time of the functions making up the program one at a time,
starting with those functions that do not call any other function. Then we evaluate
the running times of the functions that call only functions whose running times we
have already determined. We proceed in this fashion until we have evaluated the
running time for all functions.
There are some complexities introduced by the fact that for different functions
there may be different natural measures of the size of the input. In general, the
input to a function is the list of the arguments of that function. If function F calls
function G, we must relate the size measure for the arguments of G to the measure
of size that is used for F . It is hard to give useful generalities, but some examples in
this section and the next will help us see how the process of bounding the running
time for functions works in simple cases.
Suppose we have determined that a good upper bound on the running time
of a function F is O(h(n)), where n is a measure of the size of the arguments of
F. Then when a call to F is made from within a simple statement (e.g., in an
assignment), we add O(h(n)) to the cost of that statement.
#include <stdio.h>
int bar(int x, int n);
int foo(int x, int n);
main()
{
int a, n;
When a function call with upper bound O(h(n)) appears in a condition of a
while-, do-while-, or if-statement, or in the initialization, test, or reinitialization of
a for-statement, the time of that function call is accounted for as follows:
1. If the function call is in the condition of a while- or do-while-loop, or in the
condition or reinitialization of a for-loop, add h(n) to the bound on the time
for each iteration. Then proceed as in Section 3.7 to obtain the running time
of the loop.
2. If the function call is in the initialization of a for-loop, add O(h(n)) to the cost
of the loop.
3. If the function call is in the condition of an if-statement, add h(n) to the cost
of the statement.
✦ Example 3.23. Let us analyze the (meaningless) program of Fig. 3.21. First,
let us note that it is not a recursive program. The function main calls both functions
foo and bar, and foo calls bar, but that is all. The diagram of Fig. 3.22, called a
calling graph, indicates how functions call one another. Since there are no cycles,
there is no recursion, and we can analyze the functions by starting with “group 0,”
those that do not call other functions (bar in this case), then working on “group 1,”
those that call only functions in group 0 (foo in this case), and then on “group 2,”
those that call only functions in groups 0 and 1 (main in this case). At this point,
we are done, since all functions are in groups. In general, we would have to consider
a larger number of groups, but as long as there are no cycles, we can eventually
place each function in a group.
The order in which we analyze the running time of the functions is also the
order in which we should examine them in order to gain an understanding of what
the program does. Thus, let us first consider what the function bar does. The
for-loop of lines (4) and (5) adds each of the integers 1 through n to x. As a
result, bar(x, n) is equal to x + Σ_{i=1}^{n} i. The summation Σ_{i=1}^{n} i is another example
of summing an arithmetic progression, which we can calculate by adding the first
and last terms, multiplying by the number of terms, and dividing by 2. That is,
Σ_{i=1}^{n} i = (1 + n)n/2. Thus, bar(x, n) = x + n(n + 1)/2.
Now, consider function foo, which adds to its argument x the sum
Σ_{i=1}^{n} bar(i, n)

[Fig. 3.22. The calling graph: main calls foo and bar, and foo calls bar.]
From our understanding of bar we know that bar(i, n) = i + n(n + 1)/2. Thus foo
adds to x the quantity Σ_{i=1}^{n} (i + n(n + 1)/2). We have another arithmetic progression
to sum, and this one requires more algebraic manipulation. However, the reader
may check that the quantity foo adds to its argument x is (n^3 + 2n^2 + n)/2.
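Figure 3.21 is cut off above. A sketch consistent with the behavior just described is the following; the parenthesized line numbers match those used in the analysis, though the original figure may differ in detail:

        main()
        {
            int a, n;

(1)         scanf("%d", &n);
(2)         a = foo(0, n);
(3)         printf("%d\n", bar(a, n));
        }

        int bar(int x, int n)
        {
            int i;

(4)         for (i = 1; i <= n; i++)
(5)             x += i;
(6)         return x;
        }

        int foo(int x, int n)
        {
            int i;

(7)         for (i = 1; i <= n; i++)
(8)             x += bar(i, n);
            return x;
        }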
Finally, let us consider the function main. We read n at line (1). At line (2) we
apply foo to 0 and n. By our understanding of foo, the value of foo(0,n) at line (2)
will be 0 plus (n^3 + 2n^2 + n)/2. At line (3), we print the value of bar(foo(0,n),n),
which, by our understanding of bar, is the sum of n(n + 1)/2 and the value just
computed for foo(0,n). Thus, the value printed is (n^3 + 3n^2 + 2n)/2.
Now let us analyze the running time of the program in Fig. 3.21, working from
bar to foo and then to main, as we did in Example 3.23. In this case, we shall take
the value n to be the size of the input to all three functions. That is, even though
we generally want to consider the “size” of all the arguments to a function, in this
case the running time of the functions depends only on n.
To analyze bar, we note that line (5) takes O(1) time. The for-loop of lines
(4) and (5) iterates n times, and so the time for lines (4) and (5) is O(n). Line (6)
takes O(1) time, and so the time for the block of lines (4) to (6) is O(n).
Now we proceed to foo. The assignment in line (8) takes O(1) plus the time of
a call to bar(i,n). That call, we already know, takes O(n) time, and so the time
for line (8) is O(n). The for-loop of lines (7) and (8) iterates n times, and so we
multiply the O(n) for the body by n to get O(n^2) as the running time of a call to
foo.
Finally, we may analyze main. Line (1) takes O(1) time. The call to foo at
line (2) takes O(n^2) time, as we just deduced. The print-statement of line (3) takes
O(1) plus the time for a call to bar. The latter takes O(n) time, and so line (3) as
a whole takes O(1) + O(n) time. The total time for the block of lines (1) through
(3) is therefore O(1) + O(n^2) + O(1) + O(n). By the rule for sums, we can eliminate
all but the second term, concluding that the function main takes O(n^2) time. That
is, the call to foo at line (2) is the dominant cost. ✦
EXERCISES
3.8.2: Suppose that prime(n) is a function call that takes time O(√n). Consider
a function whose body is
if ( prime(n) )
A;
else
B;
Give a simple and tight big-oh upper bound on the running time of this function,
as a function of n, on the assumption that
a) A takes O(n) time and B takes O(1) time
b) A and B both take O(1) time
3.8.3: Consider a function whose body is
sum = 0;
for (i = 1; i <= f(n); i++)
sum += i;
where f (n) is a function call. Give a simple and tight big-oh upper bound on the
running time of this function, as a function of n, on the assumption that
a) The running time of f (n) is O(n), and the value of f (n) is n!
b) The running time of f (n) is O(n), and the value of f (n) is n
c) The running time of f (n) is O(n2 ), and the value of f (n) is n
d) The running time of f (n) is O(1), and the value of f (n) is 0
3.8.4: Draw the calling graph for the functions in the merge sort program from
Section 2.8. Is the program recursive?
3.8.5*: Suppose that line (7) in the function foo of Fig. 3.21 were replaced by
for (i = 1; i <= bar(n,n); i++)
What would the running time of main be then?
✦
✦ ✦
✦
3.9 Analyzing Recursive Functions
Determining the running time of a function that calls itself recursively requires more
work than analyzing nonrecursive functions. The analysis for a recursive function
requires that we associate with each function F in a program an unknown running
time T_F(n). This unknown function represents F’s running time as a function of
n, the size of F’s arguments. We then establish an inductive definition, called a
recurrence relation, for T_F(n) that relates T_F(n) to functions of the form T_G(k) for
the other functions G in the program and their associated argument sizes k. If F
is directly recursive, then one or more of the G’s will be the same as F.
The value of T_F(n) is normally established by an induction on the argument
size n. Thus, it is necessary to pick a notion of argument size that guarantees
functions are called with progressively smaller arguments as the recursion proceeds.
The requirement is the same as what we encountered in Section 2.9, when we tried
to prove statements about recursive programs. That should be no surprise, because
a statement about the running time of a program is just one example of something
that we might try to prove about a program.
Once we have a suitable notion of argument size, we can consider two cases:
1. The argument size is sufficiently small that no recursive calls will be made by
F. This case corresponds to the basis in an inductive definition of T_F(n).
2. For larger argument sizes, one or more recursive calls may occur. Note that
whatever recursive calls F makes, whether to itself or to some other function G,
will be made with smaller arguments. This case corresponds to the inductive
step in the definition of T_F(n).
The recurrence relation defining T_F(n) is derived by examining the code for
function F and doing the following:
a) For each call to a function G or use of a function G in an expression (note
that G may be F), use T_G(k) as the running time of the call, where k is the
appropriate measure of the size of the arguments in the call.
b) Evaluate the running time of the body of function F, using the same techniques
as in previous sections, but leaving terms like T_G(k) as unknown functions,
rather than concrete functions such as n^2. These terms cannot generally be
combined with concrete functions using simplification tricks such as the sum-
mation rule. We must analyze F twice — once on the assumption that F’s
argument size n is sufficiently small that no recursive function calls are made,
and once assuming that n is not that small. As a result, we obtain two expres-
sions for the running time of F — one (the basis expression) serving as the basis
of the recurrence relation for T_F(n), and the other (the induction expression)
serving as the inductive part.
c) In the resulting basis and induction expressions for the running time of F,
replace big-oh terms like O(f(n)) by a specific constant times the function
involved — for example, cf(n).
d) If a is a basis value for the input size, set T_F(a) equal to the basis expression
resulting from step (c) on the assumption that there are no recursive calls.
Also, set T_F(n) equal to the induction expression from (c) for the case where
n is not a basis value.
int fact(int n)
{
(1) if (n <= 1)
(2) return 1; /* basis */
else
(3) return n*fact(n-1); /* induction */
}
The running time of the entire function is determined by solving this recurrence
relation. In Section 3.11, we shall give general techniques for solving recurrences of
the kind that arise in the analysis of common recursive functions. For the moment,
we solve these recurrences by ad hoc means.
✦ Example 3.24. Let us reconsider the recursive program from Section 2.7 to
compute the factorial function; the code is shown in Fig. 3.23. Since there is only
one function, fact, involved, we shall use T (n) for the unknown running time
of this function. We shall use n, the value of the argument, as the size of the
argument. Clearly, recursive calls made by fact when the argument is n have a
smaller argument, n − 1 to be precise.
For the basis of the inductive definition of T (n) we shall take n = 1, since no
recursive call is made by fact when its argument is 1. With n = 1, the condition
of line (1) is true, and so the call to fact executes lines (1) and (2). Each takes
O(1) time, and so the running time of fact in the basis case is O(1). That is, T (1)
is O(1).
Now, consider what happens when n > 1. The condition of line (1) is false,
and so we execute only lines (1) and (3). Line (1) takes O(1) time, and line (3)
takes O(1) for the multiplication and assignment, plus T (n − 1) for the recursive
call to fact. That is, for n > 1, the running time of fact is O(1) + T (n − 1). We
can thus define T(n) by the following recurrence relation:

BASIS. T(1) is O(1).

INDUCTION. T(n) is O(1) + T(n − 1), for n > 1.
We now invent constant symbols to stand for those constants hidden within
the various big-oh expressions, as was suggested by rule (c) above. In this case, we
can replace the O(1) in the basis by some constant a, and the O(1) in the induction
by some constant b. These changes give us the following recurrence relation:
BASIS. T(1) = a.

INDUCTION. T(n) = b + T(n − 1), for n > 1.
Now we must solve this recurrence for T (n). We can calculate the first few
values easily. T (1) = a by the basis. Thus, by the inductive rule, we have
T (2) = b + T (1) = a + b
Continuing to use the inductive rule, we get
T (3) = b + T (2) = b + (a + b) = a + 2b
Then
T (4) = b + T (3) = b + (a + 2b) = a + 3b
By this point, it should be no surprise if we guess that T (n) = a + (n − 1)b, for all
n ≥ 1. Indeed, computing some sample values, then guessing a solution, and finally
proving our guess correct by an inductive proof is a common method of dealing with
recurrences.
In this case, however, we can derive the solution directly by a method known
as repeated substitution. First, let us make a substitution of variables, m for n, in
the recursive equation, which now becomes
T (m) = b + T (m − 1), for m > 1 (3.3)
Now, we can substitute n, n − 1, n − 2, . . . , 2 for m in equation (3.3) to get the
sequence of equations
1) T (n) = b + T (n − 1)
2) T (n − 1) = b + T (n − 2)
3) T (n − 2) = b + T (n − 3)
···
n − 1) T (2) = b + T (1)
Next, we can use line (2) above to substitute for T (n − 1) in (1), to get the equation
T (n) = b + b + T (n − 2) = 2b + T (n − 2)
Now, we use line (3) to substitute for T (n − 2) in the above to get
T (n) = 2b + b + T (n − 3) = 3b + T (n − 3)
We proceed in this manner, each time replacing T (n − i) by b + T (n − i − 1), until
we get down to T (1). At that point, we have the equation
T (n) = (n − 1)b + T (1)
We can then use the basis to replace T (1) by a, and conclude that T (n) = a+(n−1)b.
To make this analysis more formal, we need to prove by induction our intuitive
observations about what happens when we repeatedly substitute for T (n − i). Thus
we shall prove the following statement by induction on i:

STATEMENT S(i): If 1 ≤ i ≤ n − 1, then T(n) = ib + T(n − i).
BASIS. The basis is i = 1. S(1) says that T (n) = b + T (n − 1). This is the inductive
part of the definition of T (n) and therefore known to be true.
INDUCTION. The hard part is when i ≤ n − 2. In that case, S(i) says that T(n) = ib + T(n − i).
Since i ≤ n − 2, the argument of T (n − i) is at least 2. Thus we can apply the
inductive rule for T — that is, (3.3) with n − i in place of m — to get the equation
T (n − i) = b + T (n − i − 1). When we substitute b + T (n − i − 1) for T (n − i) in
the equation T (n) = ib + T (n − i), we obtain T (n) = ib + b + T (n − i − 1) , or
regrouping terms,
T (n) = (i + 1)b + T n − (i + 1)
This equation is the statement S(i + 1), and we have now proved the induction step.
We have now shown that T (n) = a + (n − 1)b. However, a and b are unknown
constants. Thus, there is no point in presenting the solution this way. Rather,
we can express T (n) as a polynomial in n, namely, bn + (a − b), and then replace
terms by big-oh expressions, giving O(n) + O(1). Using the summation rule, we
can eliminate O(1), which tells us that T (n) is O(n). That makes sense; it says
that to compute n!, we make on the order of n calls to fact (the actual number is
exactly n), each of which requires O(1) time, excluding the time spent performing
the recursive call to fact. ✦
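The closing observation, that computing n! takes exactly n calls to fact, is easy to check by adding a counter; the global variable and the test program are our own additions:

#include <stdio.h>

long calls = 0;   /* counts invocations of fact */

int fact(int n)
{
    calls++;
    if (n <= 1)
        return 1;              /* basis */
    else
        return n*fact(n-1);    /* induction */
}

main()
{
    int f;

    f = fact(10);
    printf("fact(10) = %d, using %ld calls\n", f, calls);
    /* prints "fact(10) = 3628800, using 10 calls" */
}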
EXERCISES
3.9.1: Set up a recurrence relation for the running time of the function sum men-
tioned in Exercise 2.9.2, as a function of the length of the list that is input to the
program. Replace big-oh’s by (unknown) constants, and try to solve your recur-
rence. What is the running time of sum?
3.9.2: Repeat Exercise 3.9.1 for the function find0 from Exercise 2.9.3. What is a
suitable size measure?
3.9.3*: Repeat Exercise 3.9.1 for the recursive selection sort program in Fig. 2.22
of Section 2.7. What is a suitable size measure?
3.9.4**: Repeat Exercise 3.9.1 for the function of Fig. 3.24, which computes the
Fibonacci numbers. (The first two are 1, and each succeeding number is the sum of
the previous two. The first seven Fibonacci numbers are 1, 1, 2, 3, 5, 8, 13.) Note
that the value of n is the appropriate size of an argument and that you need both
1 and 2 as basis cases.
int fibonacci(int n)
{
if (n <= 2)
return 1;
else
return fibonacci(n-1) + fibonacci(n-2);
}
3.9.5*: Write a recursive program to compute gcd(i, j), the greatest common di-
visor of two integers i and j, as outlined in Exercise 2.7.8. Show that the running
time of the program is O(log i). Hint: Show that after two calls we invoke gcd(m, n)
where m ≤ i/2.
✦
✦ ✦
✦
3.10 Analysis of Merge Sort
We shall now analyze the merge sort algorithm that we presented in Section 2.8.
First, we show that the merge and split functions each take O(n) time on lists of
length n, and then we use these bounds to show that the MergeSort function takes
O(n log n) time on lists of length n.
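The code of merge is not reproduced at this point in this copy. The following sketch agrees with the linked-list representation of Section 2.8 and with the line numbers and structure tree used in the analysis below; the original figure may differ in inessential details:

        typedef struct CELL *LIST;
        struct CELL {
            int element;
            LIST next;
        };

        LIST merge(LIST list1, LIST list2)
        {
(1)         if (list1 == NULL) return list2;
(2)         else if (list2 == NULL) return list1;
(3)         else if (list1->element <= list2->element) {
                /* take the first element of list1 */
(4)             list1->next = merge(list1->next, list2);
(5)             return list1;
            }
            else { /* take the first element of list2 */
(6)             list2->next = merge(list1, list2->next);
(7)             return list2;
            }
        }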
if (1)–(7)
├── retn (1)
└── if (2)–(7)
    ├── retn (2)
    └── if (3)–(7)
        ├── block (4)–(5)
        └── block (6)–(7)

[The structure tree for merge.]
When we take the maximum of the running times of the if- and else-parts, we find these times to
be the same. The O(1) for the test of the condition can be neglected, and so we
conclude that the running time of the innermost selection is O(1) + T (n − 1).
Now we proceed to the selection beginning on line (2), where we test whether
list2 equals NULL. The time for testing the condition is O(1), and the time for
the if-part, which is the return on line (2), is also O(1). However, the else-part
is the selection statement of lines (3) through (7), which we just determined takes
O(1) + T(n − 1) time. Thus, the time for the selection of lines (2) through (7) is

O(1) + max(O(1), O(1) + T(n − 1))
The second term of the maximum dominates the first and also dominates the O(1)
contributed by the test of the condition. Thus, the time for the if-statement begin-
ning at line (2) is also O(1) + T (n − 1).
Finally, we perform the same analysis for the outermost if-statement. Essen-
tially, the dominant time is the else-part, which consists of lines (2) through (7).
That is, the time for the cases in which there is a recursive call, lines (4) and (5)
or lines (6) and (7), dominates the time for the cases in which there is no recursive
call, represented by lines (1) and (2), and also dominates the time for all three tests
on lines (1), (2), and (3). Thus, the time for the function merge, when n > 1, is
bounded above by O(1) + T (n − 1). We therefore have the following recurrence
relation for defining T(n):

BASIS. T(1) = a.

INDUCTION. T(n) = b + T(n − 1), for n > 1.
These equations are exactly the same as those derived for the function fact
in Example 3.24. Thus, the solution is the same and we can conclude that T (n)
is O(n). That result makes intuitive sense, since merge works by eliminating an
element from one of the lists, taking O(1) time to do so, and then calling itself
recursively on the remaining lists. It follows that the number of recursive calls will
be no greater than the sum of the lengths of the lists. Since each call takes O(1)
time, exclusive of the time taken by its recursive call, we expect the time for merge
to be O(n).
Now let us consider the split function, which we reproduce as Fig. 3.27. The
analysis is quite similar to that for merge. We let the size n of the argument be
the length of the list, and we here use T (n) for the time taken by split on a list of
length n.
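Fig. 3.27 does not appear in this copy either. A sketch of split consistent with Section 2.8 and with the line references below, using the LIST type shown above for merge, is:

        LIST split(LIST list)
        {
            LIST pSecondCell;

(1)         if (list == NULL) return NULL;
(2)         else if (list->next == NULL) return NULL;
            else { /* there are at least two cells */
(3)             pSecondCell = list->next;
(4)             list->next = pSecondCell->next;
(5)             pSecondCell->next = split(pSecondCell->next);
(6)             return pSecondCell;
            }
        }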
For the basis, we take both n = 0 and n = 1. If n = 0 — that is, list is empty
— the test of line (1) succeeds and we return NULL on line (1). Lines (2) through (6)
are not executed, and we therefore take O(1) time. If n = 1, that is, list is a single
element, the test of line (1) fails, but the test of line (2) succeeds. We therefore
return NULL on line (2) and do not execute lines (3) through (6). Again, only O(1)
time is needed for the two tests and one return statement.
For the induction, n > 1, there is a three-way selection branch, similar to the
four-way branch we encountered in merge. To save time in analysis, we may observe
— as we eventually concluded for merge — that we take O(1) time to do one or
both of the selection tests of lines (1) and (2). Also, in the cases in which one of
these two tests is true, where we return on line (1) or (2), the additional time is only
O(1). The dominant time is the case in which both tests fail, that is, in which the
list is of length at least 2; in this case we execute the statements of lines (3) through
(6). All but the recursive call in line (5) contributes O(1) time. The recursive call
takes T (n − 2) time, since the argument list is the original value of list, missing
its first two elements (to see why, refer to the material in Section 2.8, especially the
diagram of Fig. 2.28). Thus, in the inductive case, T (n) is O(1) + T (n − 2).
We may set up the following recurrence relation:

BASIS. T(0) = a and T(1) = b.

INDUCTION. T(n) = c + T(n − 2), for n ≥ 2.
Let us evaluate the first few values of T (n). Evidently T (0) = a and T (1) = b
by the basis. We may use the inductive step to deduce
T(2) = c + T(0) = a + c
T(3) = c + T(1) = b + c
T(4) = c + T(2) = c + (a + c) = a + 2c
T(5) = c + T(3) = c + (b + c) = b + 2c
T(6) = c + T(4) = c + (a + 2c) = a + 3c
The calculation of T (n) is really two separate calculations, one for odd n and the
other for even n. For even n, we get T (n) = a + cn/2. That makes sense, since with
an even-length list, we eliminate two elements, taking time c to do so, and after n/2
recursive calls, we are left with an empty list, on which we make no more recursive
calls and take a time.
On an odd-length list, we again eliminate two elements, taking time c to do so.
After (n − 1)/2 calls, we are down to a list of length 1, for which time b is required.
Thus, the time for odd-length lists will be b + c(n − 1)/2.
The inductive proofs of these observations closely parallel the proof in Example
3.24. As in that example, we first write the inductive rule with argument m:

T(m) = c + T(m − 2), for m ≥ 2   (3.4)

We then prove the following statement by induction on i:

STATEMENT S(i): If 1 ≤ i ≤ n/2, then T(n) = ic + T(n − 2i).
INDUCTION. Because S(i) has an if-then form, S(i + 1) is always true if i ≥ n/2.
Thus, the inductive step — that S(i) implies S(i+1) — requires no proof if i ≥ n/2.
The hard case occurs when 1 ≤ i < n/2. In this situation, suppose that the
inductive hypothesis S(i) is true; T (n) = ic + T (n − 2i). We substitute n − 2i for
m in (3.4), giving us
T (n − 2i) = c + T (n − 2i − 2)
If we substitute for T (n − 2i) in S(i), we get
T (n) = ic + c + T (n − 2i − 2)
If we then group terms, we get

T(n) = (i + 1)c + T(n − 2(i + 1))

which is the statement S(i + 1). We have thus proved the inductive step, and we
conclude T (n) = ic + T (n − 2i).
Now if n is even, let i = n/2. Then S(n/2) says that T (n) = cn/2 + T (0),
which is a + cn/2. If n is odd, let i = (n − 1)/2. S((n − 1)/2) tells us that T (n) is
c(n − 1)/2 + T (1)
which equals b + c(n − 1)/2 since T (1) = b.
Finally, we must convert to big-oh notation the constants a, b, and c, which
represent compiler- and machine-specific quantities. Both the polynomials a + cn/2
and b + c(n − 1)/2 have high-order terms proportional to n. Thus, the question
whether n is odd or even is actually irrelevant; the running time of split is O(n)
in either case. Again, that is the intuitively correct answer, since on a list of length
n, split makes about n/2 recursive calls, each taking O(1) time.
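We turn at last to MergeSort itself, whose code is likewise not reproduced in this copy; we let T(n) be its running time on a list of length n. A sketch consistent with Section 2.8 and with the line numbers in the analysis that follows is:

        LIST MergeSort(LIST list)
        {
            LIST SecondList;

(1)         if (list == NULL) return NULL;
(2)         else if (list->next == NULL) return list;
            else { /* there are at least two elements on list */
(3)             SecondList = split(list);
(4)             return merge(MergeSort(list), MergeSort(SecondList));
            }
        }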
BASIS. If list consists of a single element, then we execute lines (1) and (2), but
none of the other code. Thus, in the basis case, T (1) is O(1).
INDUCTION. In the inductive case, the tests of lines (1) and (2) both fail, and so
we execute the block of lines (3) and (4). To make things simpler, let us assume
that n is a power of 2. The reason it helps to make this assumption is that when
n is even, the split of the list is into two pieces of length exactly n/2. Moreover,
if n is a power of 2, then n/2 is also a power of 2, and the divisions by 2 are all
into equal-sized pieces until we get down to pieces of 1 element each, at which time
the recursion ends. The time spent by MergeSort when n > 1 is the sum of the
following terms:
1. O(1) for the two tests
2. O(1) + O(n) for the assignment and call to split on line (3)
3. T (n/2) for the first recursive call to MergeSort on line (4)
4. T (n/2) for the second recursive call to MergeSort on line (4)
5. O(n) for the call to merge on line (4)
6. O(1) for the return on line (4).
If we add these terms, and drop the O(1)’s in favor of the larger O(n)’s that come
from the calls to split and merge, we get the bound 2T (n/2) + O(n) for the time
spent by MergeSort in the inductive case. We thus have the recurrence relation:

BASIS. T(1) is O(1).

INDUCTION. T(n) is 2T(n/2) + O(n), for n a power of 2 and n > 1.
Our next step is to replace the big-oh expressions by functions with concrete
constants. We shall replace the O(1) in the basis by constant a, and the O(n) in
the inductive step by bn, for some constant b. Our recurrence relation thus becomes
BASIS. T(1) = a.

INDUCTION. T(n) = 2T(n/2) + bn, for n a power of 2 and n > 1.
This recurrence is rather more complicated than the ones studied so far, but
we can apply the same techniques. First, let us explore the values of T (n) for some
small n’s. The basis tells us that T (1) = a. Then the inductive step says
T (2) = 2T (1) + 2b = 2a + 2b
T (4) = 2T (2) + 4b = 2(2a + 2b) + 4b = 4a + 8b
T (8) = 2T (4) + 8b = 2(4a + 8b) + 8b = 8a + 24b
T (16) = 2T (8) + 16b = 2(8a + 24b) + 16b = 16a + 64b
It may not be easy to see what is going on. Evidently, the coefficient of a keeps
pace with the value of n; that is, T (n) is n times a plus some number of b’s. But
the coefficient of b grows faster than any multiple of n. The relationship between n
and the coefficient of b is summarized as follows:
Value of n 2 4 8 16
Coefficient of b 2 8 24 64
Ratio 1 2 3 4
The ratio is the coefficient of b divided by the value of n. Thus, it appears that
the coefficient of b is n times another factor that grows by 1 each time n doubles.
In particular, we can see that this ratio is log_2 n, because log_2 2 = 1, log_2 4 = 2,
log_2 8 = 3, and log_2 16 = 4. It is thus reasonable to conjecture that the solution to
our recurrence relation is T(n) = an + bn log_2 n, at least for n a power of 2. We
shall see that this formula is correct.
To get a solution to the recurrence, let us follow the same strategy as for
previous examples. We write the inductive rule with argument m, as
T (m) = 2T (m/2) + bm, for m a power of 2 and m > 1 (3.5)
We then start with T (n) and use (3.5) to replace T (n) by an expression involv-
ing smaller values of the argument; in this case, the replacing expression involves
T (n/2). That is, we begin with
T (n) = 2T (n/2) + bn (3.6)
Next, we use (3.5), with n/2 in place of m, to get a replacement for T (n/2) in
(3.6). That is, (3.5) says that T (n/2) = 2T (n/4) + bn/2, and we can replace (3.6)
by
T(n) = 2(2T(n/4) + bn/2) + bn = 4T(n/4) + 2bn
Then, we can replace T (n/4) by 2T (n/8) + bn/4; the justification is (3.5) with n/4
in place of m. That gives us
T(n) = 4(2T(n/8) + bn/4) + 2bn = 8T(n/8) + 3bn
The statement that we shall prove by induction on i is

STATEMENT S(i): If 1 ≤ i ≤ log_2 n, then T(n) = 2^i T(n/2^i) + ibn.
BASIS. For i = 1, the statement S(1) says that T(n) = 2T(n/2) + bn. This equality
is the inductive rule in the definition of T(n), the running time of MergeSort,
so we know that the basis holds.
We conclude that the equality S(i) — that is, T(n) = 2^i T(n/2^i) + ibn — holds
for any i between 1 and log_2 n. Now consider the formula S(log_2 n), that is,

T(n) = 2^{log_2 n} T(n/2^{log_2 n}) + (log_2 n)bn

We know that 2^{log_2 n} = n (recall that the definition of log_2 n is the power to which
we must raise 2 to equal n). Also, n/2^{log_2 n} = 1. Thus, S(log_2 n) can be written

T(n) = nT(1) + bn log_2 n

We also know that T(1) = a, by the basis of the definition of T. Thus,

T(n) = an + bn log_2 n
After this analysis, we must replace the constants a and b by big-oh expressions.
That is, T (n) is O(n) + O(n log n).8 Since n grows more slowly than n log n, we
may neglect the O(n) term and say that T (n) is O(n log n). That is, merge sort
is an O(n log n)-time algorithm. Remember that selection sort was shown to take
O(n^2) time. Although strictly speaking, O(n^2) is only an upper bound, it is in fact
the tightest simple bound for selection sort. Thus, we can be sure that, as n gets
large, merge sort will always run faster than selection sort. In practice, merge sort
is faster than selection sort for n’s larger than a few dozen.
EXERCISES
3.10.1: Draw structure trees for the functions
a) split
b) MergeSort
Indicate the running time for each node of the trees.
3.10.2*: Define a function k-mergesort that splits a list into k pieces, sorts each
piece, and then merges the result.
a) What is the running time of k-mergesort as a function of k and n?
b)** What value of k gives the fastest algorithm (as a function of n)? This problem
requires that you estimate the running times sufficiently precisely that you can
distinguish constant factors. Since you cannot be that precise in practice, for
the reasons we discussed at the beginning of the chapter, you really need to
examine how the running time from (a) varies with k and get an approximate
minimum.
✦
✦ ✦
✦
3.11 Solving Recurrence Relations
There are many techniques for solving recurrence relations. In this section, we shall
discuss two such approaches. The first, which we have already seen, is repeatedly
substituting the inductive rule into itself until we get a relationship between T (n)
and T (1) or — if 1 is not the basis — between T (n) and T (i) for some i that is
covered by the basis. The second method we introduce is guessing a solution and
checking its correctness by substituting into the basis and the inductive rules.
8 Remember that inside a big-oh expression, we do not have to specify the base of a logarithm,
because logarithms to all bases are the same, to within a constant factor.
In the previous two sections, we have solved exactly for T (n). However, since
T (n) is really a big-oh upper bound on the exact running time, it is sufficient to find
a tight upper bound on T (n). Thus, especially for the “guess-and-check” approach,
we require only that the solution be an upper bound on the true solution to the
recurrence.
BASIS. T(1) = a.

INDUCTION. T(n) = T(n − 1) + b, for n > 1.
We can generalize this form slightly if we allow the addition of some function g(n)
in place of the constant b in the induction. We can write this form as
BASIS. T(1) = a.

INDUCTION. T(n) = T(n − 1) + g(n), for n > 1.
This form arises whenever we have a recursive function that takes time g(n) and
then calls itself with an argument one smaller than the argument with which the
current function call was made. Examples are the factorial function of Example
3.24, the function merge of Section 3.10, and the recursive selection sort of Section
2.7. In the first two of these functions, g(n) is a constant, and in the third it is linear
in n. The function split of Section 3.10 is almost of this form; it calls itself with
an argument that is smaller by 2. We shall see that this difference is unimportant.
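Before solving recurrences of this form symbolically, it can be useful to tabulate them numerically and compare the values against a guessed closed form. The following small program is our own illustration, taking g(n) = bn with a = b = 1, the case that arises for selection sort in Example 3.25 below:

#include <stdio.h>

/* Tabulate T(n) = T(n-1) + g(n) with T(1) = a and g(n) = b*n,
   and compare against the guessed closed form
   T(n) = a + b*(n-1)*(n+2)/2 derived in Example 3.25. */
main()
{
    double a = 1.0, b = 1.0, T, guess;
    int n;

    T = a;
    for (n = 2; n <= 8; n++) {
        T = T + b*n;                        /* the inductive rule */
        guess = a + b*(n - 1)*(n + 2)/2;    /* the closed form */
        printf("n = %d   T = %.0f   guess = %.0f\n", n, T, guess);
    }
    /* both columns agree: 3, 6, 10, 15, 21, 28, 36 */
}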
Let us solve this recurrence by repeated substitution. As in Example 3.24, we
first write the inductive rule with the argument m, as
T (m) = T (m − 1) + g(m)
and then proceed to substitute for T repeatedly in the right side of the original
inductive rule. Doing this, we get the sequence of expressions
T (n) = T (n − 1) + g(n)
= T (n − 2) + g(n − 1) + g(n)
= T (n − 3) + g(n − 2) + g(n − 1) + g(n)
···
= T (n − i) + g(n − i + 1) + g(n − i + 2) + · · · + g(n − 1) + g(n)
Using the technique in Example 3.24, we can prove by induction on i, for i =
1, 2, . . . , n − 1, that

T(n) = T(n − i) + Σ_{j=0}^{i−1} g(n − j)

In particular, with i = n − 1, so that T(n − i) = T(1) = a, the solution to the
recurrence is

T(n) = a + Σ_{j=0}^{n−2} g(n − j)
✦ Example 3.25. Consider the recursive selection sort function of Fig. 2.22, the
body of which we reproduce as Fig. 3.29. If we let T (m) be the running time of the
function SelectionSort when given an array of m elements to sort (that is, when
the value of its argument i is n − m), we can develop a recurrence relation for T (m)
as follows. First, the basis case is m = 1. Here, only line (1) is executed, taking
O(1) time.
For the inductive case, m > 1, we execute the test of line (1) and the assign-
ments of lines (2), (6), (7), and (8), all of which take O(1) time. The for-loop of
lines (3) to (5) takes O(n − i) time, or O(m) time, as we discussed in connection
with the iterative selection sort program in Example 3.17. To review why, note
that the body, lines (4) and (5), takes O(1) time, and we go m − 1 times around
the loop. Thus, the time of the for-loop dominates lines (1) through (8), and we
can write T (m), the time of the entire function, as T (m − 1) + O(m). The second
term, O(m), covers lines (1) through (8), and the T (m − 1) term is the time for
line (9), the recursive call. If we replace the hidden constant factors in the big-oh
expressions by concrete constants, we get the recurrence relation
BASIS. T(1) = a.

INDUCTION. T(m) = T(m − 1) + bm, for m > 1.
This recurrence is of the form we studied, with g(m) = bm. That is, the
solution is
T(m) = a + Σ_{j=0}^{m−2} b(m − j)
     = a + 2b + 3b + · · · + mb
     = a + b(m − 1)(m + 2)/2
Thus, T(m) is O(m^2). Since we are interested in the running time of function
SelectionSort on the entire array of length n, that is, when called with i = 1, we
need the expression for T(n) and find that it is O(n^2). Thus, the recursive version
of selection sort is quadratic, just like the iterative version. ✦
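The closed form can be checked against the recurrence numerically. Here is a minimal C sketch (not from the book), with the arbitrary choices a = 1 and b = 2 standing in for the unknown constants:

#include <stdio.h>

/* Check T(m) = a + b(m-1)(m+2)/2 against the recurrence
   T(1) = a, T(m) = T(m-1) + b*m. */
int main(void)
{
    const long long a = 1, b = 2;
    long long t = a;                                  /* T(1) */
    long long m;
    for (m = 2; m <= 20; m++) {
        t += b * m;                                   /* inductive rule */
        printf("m = %2lld: recurrence = %4lld, closed form = %4lld\n",
               m, t, a + b * (m - 1) * (m + 2) / 2);
    }
    return 0;
}

The two columns agree for every m, as the induction above guarantees.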
Another common form of recurrence is

BASIS. T(1) = a.

INDUCTION. T(n) = 2T(n/2) + g(n), for n a power of 2 greater than 1.
This is the recurrence for a recursive algorithm that solves a problem of size n
by subdividing it into two subproblems, each of size n/2. Here g(n) is the amount
of time taken to create the subproblems and combine the solutions. For example,
MergeSort divides a problem of size n into two problems of size n/2. The function
g(n) is bn for some constant b, since the time taken by MergeSort exclusive of
recursive calls to itself is O(n), principally for the split and merge algorithms.
To solve this recurrence, we substitute for T on the right side. Here we assume
that n = 2k for some k. The recursive equation can be written with m as its
argument: T (m) = 2T (m/2) + g(m). If we substitute n/2i for m, we get
T(n/2^i) = 2T(n/2^(i+1)) + g(n/2^i)    (3.8)
If we start with the inductive rule and proceed to substitute for T using (3.8) with
progressively greater values of i, we find

T(n) = 2T(n/2) + g(n)
     = 4T(n/4) + 2g(n/2) + g(n)
     = 8T(n/8) + 4g(n/4) + 2g(n/2) + g(n)
     ···
     = 2^k T(n/2^k) + Σ_{j=0}^{k−1} 2^j g(n/2^j)

Since n = 2^k, we have T(n/2^k) = T(1) = a and k = log₂ n. Thus

T(n) = an + Σ_{j=0}^{(log₂ n)−1} 2^j g(n/2^j)    (3.9)

is the solution to our recurrence.
Intuitively, the first term of (3.9) represents the contribution of the basis value
a. That is, there are n calls to the recursive function with an argument of size
1. The summation is the contribution of the recursion, and it represents the work
performed by all the calls with argument size greater than 1.
Figure 3.30 suggests the accumulation of time during the execution of Merge-
Sort. It represents the time to sort eight elements. The first row represents the
work on the outermost call, involving all eight elements; the second row represents
the work of the two calls with four elements each; and the third row is the four
calls with two elements each. Finally, the bottom row represents the eight calls
to MergeSort with a list of length one. In general, if there are n elements in the
original unsorted list, there will be log2 n levels at which bn work is done by calls
to MergeSort that result in other calls. The accumulated time of all these calls is
thus bn log2 n. There will be one level at which calls are made that do not result in
further calls, and an is the total time spent by these calls. Note that the first log2 n
levels represent the terms of the summation in (3.9) and the lowest level represents
the term an.
Fig. 3.30. Time spent at each level of MergeSort: bn at each of the log₂ n levels
of calls that make further recursive calls, and an at the lowest level, where the
basis calls occur.
✦ Example 3.26. In the case of MergeSort, the function g(n) is bn for some
constant b. The solution (3.9) with these parameters is therefore
T(n) = an + Σ_{j=0}^{(log₂ n)−1} 2^j · bn/2^j
     = an + bn · Σ_{j=0}^{(log₂ n)−1} 1
     = an + bn log₂ n
The last equality comes from the fact that there are log₂ n terms in the sum and each
term is 1. Thus, when g(n) is linear, the solution to (3.9) is O(n log n). ✦
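As with the previous recurrence, the solution can be checked numerically; a small C sketch (not from the book), again with arbitrary constants a = 1 and b = 2:

#include <stdio.h>

/* Check T(n) = a*n + b*n*log2(n) against the recurrence
   T(1) = a, T(n) = 2T(n/2) + b*n, for n a power of 2. */
int main(void)
{
    const long long a = 1, b = 2;
    long long t = a;                                  /* T(1) */
    long long n, log2n = 0;
    for (n = 2; n <= 1024; n *= 2) {
        t = 2 * t + b * n;                            /* inductive rule */
        log2n++;
        printf("n = %5lld: recurrence = %7lld, closed form = %7lld\n",
               n, t, a * n + b * n * log2n);
    }
    return 0;
}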
✦ Example 3.27. Let us again examine the MergeSort recurrence relation from
the previous section, which we wrote as
BASIS. T(1) = a.

INDUCTION. T(n) ≤ 2T(n/2) + bn, for n a power of 2 greater than 1.
We are going to guess that an upper bound to T (n) is f (n) = cn log2 n + d for
some constants c and d. Recall that this form is not exactly right; in the previous
example we derived the solution and saw that it had an O(n log n) term with an
O(n) term, rather than with a constant term. However, this guess turns out to be
good enough to prove an O(n log n) upper bound on T (n).
We shall use complete induction on n to prove the following, for some constants
c and d:

STATEMENT S(n): If n is a power of 2 and n ≥ 1, then T(n) ≤ f(n) = cn log₂ n + d.
BASIS. When n = 1, T (1) ≤ f (1) provided a ≤ d. The reason is that the cn log2 n
term of f (n) is 0 when n = 1, so that f (1) = d, and it is given that T (1) = a.
INDUCTION. Let us assume S(i) for all i < n, and prove S(n) for some n > 1.10
If n is not a power of 2, there is nothing to prove, since the if-portion of the if-then
statement S(n) is not true. Thus, consider the hard case, in which n is a power of
2. We may assume S(n/2), that is,
T (n/2) ≤ (cn/2) log2 (n/2) + d
because it is part of the inductive hypothesis. For the inductive step, we need to
show that
T (n) ≤ f (n) = cn log2 n + d
When n ≥ 2, the inductive part of the definition of T (n) tells us that
T (n) ≤ 2T (n/2) + bn
Using the inductive hypothesis to bound T (n/2), we have
T (n) ≤ 2[c(n/2) log2 (n/2) + d] + bn
9 If it is any comfort, the theory of differential equations, which in many ways resembles the
theory of recurrence equations, also relies on known solutions to equations of a few common
forms and then educated guessing to solve other equations.
10 In most complete inductions we assume S(i) for i up to n and prove S(n + 1). In this case it
is notationally simpler to prove S(n) from S(i), for i < n, which amounts to the same thing.
Manipulating Inequalities
In Example 3.27 we derive one inequality, T (n) ≤ cn log2 n + d, from another,
T (n) ≤ cn log2 n + (b − c)n + 2d
Our method was to find an “excess” and require it to be at most 0. The general
principle is that if we have an inequality A ≤ B + E, and if we want to show that
A ≤ B, then it is sufficient to show that E ≤ 0. In Example 3.27, A is T (n), B is
cn log2 n + d, and E, the excess, is (b − c)n + d.
Since log2 (n/2) = log2 n − log2 2 = log2 n − 1, we may simplify this expression to
T (n) ≤ cn log2 n + (b − c)n + 2d (3.10)
We now show that T (n) ≤ cn log2 n + d, provided that the excess over cn log2 n + d
on the right side of (3.10) is at most 0; that is, (b − c)n + d ≤ 0. Since n > 1, this
inequality is true when d ≥ 0 and b − c ≤ −d.
We now have three constraints for f(n) = cn log₂ n + d to be an upper bound
on T(n):
1. The constraint a ≤ d comes from the basis.
2. d ≥ 0 comes from the induction, but since we know that a > 0, this inequality
is superseded by (1).
3. b − c ≤ −d, or c ≥ b + d, also comes from the induction.
These constraints are obviously satisfied if we let d = a and c = a + b. We have
now shown by induction on n that for all n ≥ 1 and a power of 2,
T (n) ≤ (a + b)n log2 n + a
This argument shows that T (n) is O(n log n), that is, that T (n) does not grow
any faster than n log n. However, the bound (a + b)n log2 n + a that we obtained is
slightly larger than the exact answer that we obtained in Example 3.26, which was
bn log2 n + an. At least we were successful in obtaining a bound. Had we taken the
simpler guess of f (n) = cn log2 n, we would have failed, because there is no value of
c that can make f (1) ≥ a. The reason is that c × 1 × log2 1 = 0, so that f (1) = 0.
If a > 0, we evidently cannot make f (1) ≥ a. ✦
✦ Example 3.28. Now let us consider a recurrence relation that we shall en-
counter later in the book:
BASIS. G(1) = 3.

INDUCTION. G(n) = (2^(n/2) + 1)G(n/2), for n a power of 2 greater than 1.
This recurrence has actual numbers, like 3, instead of symbolic constants like a. In
Chapter 13, we shall use recurrences such as this to count the number of gates in
a circuit, and gates can be counted exactly, without needing the big-oh notation to
hide unknown constant factors. We shall guess that an upper bound on G(n) is
f(n) = c·2^n + d for some constants c and d, and prove by complete induction on n
the statement S(n): if n is a power of 2, then G(n) ≤ c·2^n + d.
Summary of Solutions

In the table below, we list the solutions to some of the most common recurrence
relations, including some we have not covered in this section. In each case, we
assume that the basis equation is T(1) = a and that k ≥ 0.

INDUCTIVE EQUATION                        T(n)
T(n) = T(n − 1) + b·n^k                   O(n^(k+1))
T(n) = cT(n − 1) + b·n^k, for c > 1       O(c^n)
T(n) = cT(n/d) + b·n^k, for c > d^k       O(n^(log_d c))
T(n) = cT(n/d) + b·n^k, for c < d^k       O(n^k)
T(n) = cT(n/d) + b·n^k, for c = d^k       O(n^k log n)

All the above also hold if b·n^k is replaced by any kth-degree polynomial.
BASIS. If n = 1, then we must show that G(1) ≤ c·2^1 + d, that is, that 3 ≤ 2c + d.
This inequality becomes one of the constraints on c and d.
INDUCTION. As in Example 3.27, the only hard part occurs when n is a power of
2 and we want to prove S(n) from S(n/2). The inequality in this case is

G(n/2) ≤ c·2^(n/2) + d

We must prove S(n), which is G(n) ≤ c·2^n + d. We start with the inductive definition
of G,

G(n) = (2^(n/2) + 1)G(n/2)
and then substitute our upper bound for G(n/2), converting this expression to

G(n) ≤ (2^(n/2) + 1)(c·2^(n/2) + d)

Simplifying, we get

G(n) ≤ c·2^n + (c + d)·2^(n/2) + d

That will give the desired upper bound, c·2^n + d, on G(n), provided that the excess
on the right, (c + d)·2^(n/2), is no more than 0. It is thus sufficient that c + d ≤ 0.
We need to select c and d to satisfy the two inequalities
1. 2c + d ≥ 3, from the basis, and
2. c + d ≤ 0, from the induction.
For example, these inequalities are satisfied if c = 3 and d = −3. Then we know
that G(n) ≤ 3(2^n − 1). Thus, G(n) grows exponentially with n. It happens that
this function is the exact solution; that is, G(n) = 3(2^n − 1), as the reader may
prove by induction on n. ✦
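Because this recurrence has concrete numbers, the exact solution can be confirmed mechanically; a short C sketch (not from the book):

#include <stdio.h>

/* Check G(n) = 3(2^n - 1) against the recurrence
   G(1) = 3, G(n) = (2^(n/2) + 1) * G(n/2), for n a power of 2. */
int main(void)
{
    long long g = 3;                                  /* G(1) */
    long long n;
    for (n = 2; n <= 32; n *= 2) {
        g = ((1LL << (n / 2)) + 1) * g;               /* inductive rule */
        printf("n = %2lld: recurrence = %12lld, 3(2^n - 1) = %12lld\n",
               n, g, 3 * ((1LL << n) - 1));
    }
    return 0;
}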
EXERCISES
The values F(0), F(1), F(2), . . . form the sequence of Fibonacci numbers, in which
each number after the first two is the sum of the two previous numbers. (See
Exercise 3.9.4.) Let r = (1 + √5)/2. This constant r is called the golden ratio
and its value is about 1.62. Show that F(n) is O(r^n). Hint: For the induction, it
helps to guess that F(n) ≤ ar^n for some constant a, and attempt to prove that
inequality by induction on n. The basis must incorporate the two values n = 0 and
n = 1. In the inductive step, it helps to notice that r satisfies the equation
r^2 = r + 1.
✦
✦ ✦
✦
3.12 Summary of Chapter 3
✦ Many factors go into the selection of an algorithm for a program, but simplicity,
ease of implementation, and efficiency often dominate.
✦ Big-oh expressions provide a convenient notation for upper bounds on the run-
ning times of programs.
✦ There are recursive rules for evaluating the running time of the various com-
pound statements of C, such as for-loops and conditions, in terms of the running
times of their constituent parts.
✦ Recurrence relations are a natural way to model the running time of recursive
programs.
✦ More generally, a function that takes time O(n^k) and then calls itself on a
subproblem of size n − 1 takes time O(n^(k+1)).
✦ If a function calls itself twice but the recursion goes on for log2 n levels (as in
merge sort), then the total running time is O(n log n) times the work done at
each call, plus O(n) times the work done at the basis. In the case of merge
sort, the work at each call, including basis calls, is O(1), so the total running
time is O(n log n) + O(n), or just O(n log n).
✦ If a function calls itself twice and the recursion goes on for n levels (as in the
Fibonacci program of Exercise 3.9.4), then the running time is exponential in
n.
✦
✦ ✦
✦
3.13 Bibliographic Notes for Chapter 3
The study of the running time of programs and the computational complexity of
problems was pioneered by Hartmanis and Stearns [1964]. Knuth [1968] was the
book that established the study of the running time of algorithms as an essential
ingredient in computer science.
Since that time, a rich theory for the difficulty of problems has been developed.
Many of the key ideas are found in Aho, Hopcroft, and Ullman [1974, 1983].
In this chapter, we have concentrated on upper bounds for the running times of
programs. Knuth [1976] describes analogous notations for lower bounds and exact
bounds on running times.
For more discussion of divide and conquer as an algorithm-design technique,
see Aho, Hopcroft, and Ullman [1974] or Borodin and Munro [1975]. Additional
information on techniques for solving recurrence relations can be found in Graham,
Knuth, and Patashnik [1989].
Aho, A. V., J. E. Hopcroft, and J. D. Ullman [1974]. The Design and Analysis of
Computer Algorithms, Addison-Wesley, Reading, Mass.
Aho, A. V., J. E. Hopcroft, and J. D. Ullman [1983]. Data Structures and Algo-
rithms, Addison-Wesley, Reading, Mass.
Borodin, A. B., and I. Munro [1975]. The Computational Complexity of Algebraic
and Numeric Problems, American Elsevier, New York.
Graham, R. L., D. E. Knuth, and O. Patashnik [1989]. Concrete Mathematics: a
Foundation for Computer Science, Addison-Wesley, Reading, Mass.
Knuth, D. E. [1968]. The Art of Computer Programming Vol. I: Fundamental
Algorithms, Addison-Wesley, Reading, Mass.
Knuth, D. E. [1976]. “Big omicron, big omega, and big theta,” ACM SIGACT
News 8:2, pp. 18–23.
CHAPTER 4

✦
✦ ✦
✦

Combinatorics and Probability
In computer science we frequently need to count things and measure the likelihood
of events. The science of counting is captured by a branch of mathematics called
combinatorics. The concepts that surround attempts to measure the likelihood of
events are embodied in a field called probability theory. This chapter introduces the
rudiments of these two fields. We shall learn how to answer questions such as how
many execution paths are there in a program, or what is the likelihood of occurrence
of a given path?
✦
✦ ✦
✦
4.1 What This Chapter Is About
In the first half of this chapter we shall study combinatorics, covering the following
counting problems:

✦ Counting the number of ways to assign values to items (Section 4.2). The
paradigm problem is counting the number of ways to paint a row of houses,
each in any of a fixed set of colors.

✦ Counting the number of orders in which items may appear (Section 4.3). The
paradigm problem is counting the number of possible outcomes of sorting n
distinct elements.

✦ Counting ordered selections of a few items from among many (Section 4.4).
The paradigm problem is counting the ways to award win, place, and show
among the horses in a race.

✦ Counting unordered selections of a few items from among many (Section 4.5).
The paradigm problem is counting the number of possible poker hands.
✦ Counting permutations with some identical items (Section 4.6). The paradigm
problem is counting the number of anagrams of a word that may have some
letters appearing more than once.
✦ Counting the number of ways objects, some of which may be identical, can be
distributed among bins (Section 4.7). The paradigm problem is counting the
number of ways of distributing fruits to children.
In the second half of this chapter we discuss probability theory, covering the follow-
ing topics:
✦ Basic concepts: probability spaces, experiments, events, probabilities of events.
✦ Conditional probabilities and independence of events. These concepts help
us think about how observation of the outcome of one experiment, e.g., the
drawing of a card, influences the probability of future events.
✦ Probabilistic reasoning and ways that we can estimate probabilities of com-
binations of events from limited data about the probabilities and conditional
probabilities of events.
We also discuss some applications of probability theory to computing, including
systems for making likely inferences from data and a class of useful algorithms that
work “with high probability” but are not guaranteed to work all the time.
✦
✦ ✦
✦
4.2 Counting Assignments
One of the simplest but most important counting problems deals with a list of items,
to each of which we must assign one of a fixed set of values. We need to determine
how many different assignments of values to items are possible.
✦ Example 4.1. A typical example is suggested by Fig. 4.1, where we have four
houses in a row, and we may paint each in one of three colors: red, green, or blue.
Here, the houses are the “items” mentioned above, and the colors are the “values.”
Figure 4.1 shows one possible assignment of colors, in which the first house is painted
red, the second and fourth blue, and the third green.
To answer the question, “How many different assignments are there?” we first
need to define what we mean by an “assignment.” In this case, an assignment is a
list of four values, in which each value is chosen from one of the three colors red,
green, or blue. We shall represent these colors by the letters R, G, and B. Two
such lists are different if and only if they differ in at least one position.
In the example of houses and colors, we can choose any of three colors for the
first house. Whatever color we choose for the first house, there are three colors in
which to paint the second house. There are thus nine different ways to paint the first
two houses, corresponding to the nine different pairs of letters, each letter chosen
from R, G, and B. Similarly, for each of the nine assignments of colors to the first
two houses, we may select a color for the third house in three possible ways. Thus,
there are 9 × 3 = 27 ways to paint the first three houses. Finally, each of these 27
assignments can be extended to the fourth house in 3 different ways, giving a total
of 27 × 3 = 81 assignments of colors to the houses. ✦
We can extend the above example. In the general setting, we have a list of n
“items,” such as the houses in Example 4.1. There is also a set of k “values,” such
as the colors in Example 4.1, any one of which can be assigned to an item. An
assignment is a list of n values (v1, v2, . . . , vn), where each of v1, v2, . . . , vn is
chosen to be one of the k values. This assignment assigns the value vi to the ith
item, for i = 1, 2, . . . , n.
There are k^n different assignments when there are n items and each item is to
be assigned one of k values. For instance, in Example 4.1 we had n = 4 items, the
houses, and k = 3 values, the colors. We calculated that there were 81 different
assignments. Note that 3^4 = 81. We can prove the general rule by an induction
on n.

STATEMENT S(n): The number of ways to assign any one of k values to each of
n items is k^n.

BASIS. The basis is n = 1. If there is one item, we can choose any of the k values
for it. Thus there are k different assignments. Since k^1 = k, the basis is proved.
INDUCTION. Suppose the statement S(n) is true, and consider S(n + 1), the
statement that there are k^(n+1) ways to assign one of k values to each of n + 1 items.
We may break any such assignment into a choice of value for the first item and, for
each choice of first value, an assignment of values to the remaining n items. There
are k choices of value for the first item. For each such choice, by the inductive
hypothesis there are k^n assignments of values to the remaining n items. The total
number of assignments is thus k × k^n, or k^(n+1). We have thus proved S(n + 1) and
completed the induction.
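The rule is easy to exercise in C. In the sketch below (a hypothetical helper, not code from the book), assignments(k, n) computes k^n with one multiplication per item:

#include <stdio.h>

/* Number of ways to assign one of k values to each of n items: k^n. */
long long assignments(int k, int n)
{
    long long count = 1;
    int i;
    for (i = 0; i < n; i++)
        count *= k;              /* one factor of k per item */
    return count;
}

int main(void)
{
    printf("%lld\n", assignments(3, 4));   /* prints 81, as in Example 4.1 */
    return 0;
}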
Figure 4.2 suggests this selection of first value and the associated choices of
assignment for the remaining items in the case that n + 1 = 4 and k = 3, using
as a concrete example the four houses and three colors of Example 4.1. There, we
assume by the inductive hypothesis that there are 27 assignments of three colors to
three houses.
Fig. 4.2. Choosing red, green, or blue for the first house, each choice followed by
one of the 27 assignments of colors to the remaining three houses.
EXERCISES
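/* f returns the product of those of the primes 2 through 19 that evenly divide x */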
int f(int x)
{
int n;
n = 1;
if (x%2 == 0) n *= 2;
if (x%3 == 0) n *= 3;
if (x%5 == 0) n *= 5;
if (x%7 == 0) n *= 7;
if (x%11 == 0) n *= 11;
if (x%13 == 0) n *= 13;
if (x%17 == 0) n *= 17;
if (x%19 == 0) n *= 19;
return n;
}
4.2.4: In the game of “Hollywood squares,” X’s and O’s may be placed in any of the
nine squares of a tic-tac-toe board (a 3×3 matrix) in any combination (i.e., unlike
ordinary tic-tac-toe, it is not necessary that X’s and O’s be placed alternately, so,
for example, all the squares could wind up with X’s). Squares may also be blank,
i.e., not containing either an X or an O. How many different boards are there?
4.2.5: How many different strings of length n can be formed from the ten digits?
A digit may appear any number of times in the string or not at all.
4.2.6: How many different strings of length n can be formed from the 26 lower-case
letters? A letter may appear any number of times or not at all.
4.2.7: Convert the following into K’s, M’s, G’s, T’s, or P’s, according to the rules
of the box in Section 4.2: (a) 2^13 (b) 2^17 (c) 2^24 (d) 2^38 (e) 2^45 (f) 2^59.
4.2.8*: Convert the following powers of 10 into approximate powers of 2: (a) 10^12
(b) 10^18 (c) 10^99.
✦
✦ ✦
✦
4.3 Counting Permutations
In this section we shall address another fundamental counting problem: Given n
distinct objects, in how many different ways can we order those objects in a line?
Such an ordering is called a permutation of the objects. We shall let Π(n) stand for
the number of permutations of n objects.
As one example of where counting permutations is significant in computer
science, suppose we are given n objects, a1 , a2 , . . . , an , to sort. If we know nothing
about the objects, it is possible that any order will be the correct sorted order, and
thus the number of possible outcomes of the sort will be equal to Π(n), the number
of permutations of n objects. We shall soon see that this observation helps us
argue that general-purpose sorting algorithms require time proportional to n log n,
and therefore that algorithms like merge sort, which we saw in Section 3.10 takes
2^10 = 1K (kilo), about 10^3
2^20 = 1M (mega), about 10^6
2^30 = 1G (giga), about 10^9
2^40 = 1T (tera), about 10^12
2^50 = 1P (peta), about 10^15

This table suggests that for powers of 2 beyond 2^29 we factor out 2^30, 2^40, or 2
raised to whatever multiple-of-10 power we can. The remaining powers of 2 name
the number of giga-, tera-, or peta- of whatever unit we are measuring. For example,
2^43 bytes is 8 terabytes.
O(n log n) time, are to within a constant factor as fast as can be.
There are many other applications of the counting rule for permutations. For
example, it figures heavily in more complex counting questions like combinations
and probabilities, as we shall see in later sections.
✦ Example 4.2. Suppose there are three objects, A, B, and C. If we pick A first,
we may order the remaining objects B and C in two ways, giving the orders ABC
and ACB. Similarly, if we pick B first, there are two orders, corresponding to the
two ways in which we may order the remaining objects A and C.
We thus have orders BAC and BCA. Finally, if we start with C first, we can order
the remaining objects A and B in the two possible ways, giving us orders CAB and
CBA. These six orders,
ABC, ACB, BAC, BCA, CAB, CBA
are all the possible orders of three elements. That is, Π(3) = 3 × 2 × 1 = 6.
Next, consider how many permutations there are for 4 objects: A, B, C, and
D. If we pick A first, we may follow A by the objects B, C, and D in any of their
6 orders. Similarly, if we pick B first, we can order the remaining A, C, and D in
any of their 6 ways. The general pattern should now be clear. We can pick any
of the four elements first, and for each such selection, we can order the remaining
three elements in any of the Π(3) = 6 possible ways. It is important to note that
the number of permutations of the three objects does not depend on which three
elements they are. We conclude that the number of permutations of 4 objects is 4
times the number of permutations of 3 objects. ✦
More generally,
Π(n + 1) = (n + 1)Π(n) for any n ≥ 1 (4.1)
That is, to count the permutations of n + 1 objects we may pick any of the n + 1
objects to be first. We are then left with n remaining objects, and these can be
permuted in Π(n) ways, as suggested in Fig. 4.4. For our example where n + 1 = 4,
we have Π(4) = 4 × Π(3) = 4 × 6 = 24.
Fig. 4.4. Any of the n + 1 objects may be chosen first; each such choice is followed
by any of the Π(n) orders of the remaining n objects.
We can now prove by induction on n that Π(n) = n!.

STATEMENT S(n): Π(n) = n! for all n ≥ 1.

BASIS. For n = 1, there is only one order for a single object, so Π(1) = 1 = 1!,
proving S(1).

INDUCTION. Suppose Π(n) = n!. Then S(n + 1), which we must prove, says that
Π(n + 1) = (n + 1)!. We start with Equation (4.1), which says that
Π(n + 1) = (n + 1) × Π(n)
By the inductive hypothesis, Π(n) = n!. Thus, Π(n + 1) = (n + 1)n!. Since
n! = n × (n − 1) × · · · × 1
it must be that (n + 1) × n! = (n + 1) × n × (n − 1) × · · · × 1. But the latter product
is (n + 1)!, which proves S(n + 1).
✦ Example 4.3. As a result of the formula Π(n) = n!, we conclude that the
number of permutations of 4 objects is 4! = 4 × 3 × 2 × 1 = 24, as we saw above.
As another example, the number of permutations of 7 objects is 7! = 5040. ✦
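A direct C sketch of the rule Π(n) = n! (a hypothetical helper; note that n! exceeds the range of a 64-bit integer for n > 20):

/* Number of permutations of n distinct objects: PI(n) = n!. */
long long permutations(int n)
{
    long long p = 1;
    int i;
    for (i = 2; i <= n; i++)
        p *= i;                  /* PI(i) = i * PI(i-1) */
    return p;
}

For example, permutations(7) returns 5040, agreeing with Example 4.3.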
Suppose we are given n distinct elements to sort. The answer — that is, the
correct sorted order — can be any of the n! permutations of these elements. If our
algorithm for sorting arbitrary types of elements is to work correctly, it must be
able to distinguish all n! different possible answers.
Consider the first comparison of elements that the algorithm makes, say
lessThan(X,Y)
For each of the n! possible sorted orders, either X is less than Y or it is not. Thus,
the n! possible orders are divided into two groups, those for which the answer to
the first test is “yes” and those for which it is “no.”
One of these groups must have at least n!/2 members. For if both groups have
fewer than n!/2 members, then the total number of orders is less than n!/2 + n!/2,
or less than n! orders. But this upper limit on orders contradicts the fact that we
started with exactly n! orders.
Now consider the second test, on the assumption that the outcome of the
comparison between X and Y was such that the larger of the two groups of possible
orders remains (take either outcome if the groups are the same size). That is, at
least n!/2 orders remain, among which the algorithm must distinguish. The second
comparison likewise has two possible outcomes, and at least half the remaining
orders will be consistent with one of these outcomes. Thus, we can find a group of
at least n!/4 orders consistent with the first two tests.
We can repeat this argument until the algorithm has determined the correct
sorted order. At each step, by focusing on the outcome with the larger population
of consistent possible orders, we are left with at least half as many possible orders as
at the previous step. Thus, we can find a sequence of tests and outcomes such that
after the ith test, there are at least n!/2^i orders consistent with all these outcomes.
Since we cannot finish sorting until every sequence of tests and outcomes is
consistent with at most one sorted order, the number of tests t made before we
finish must satisfy the equation
n!/2^t ≤ 1    (4.2)

If we take logarithms base 2 of both sides of Equation (4.2) we have log₂ n! − t ≤ 0,
or

t ≥ log₂(n!)
We shall see that log2 (n!) is about n log2 n. But first, let us consider an example of
the process of splitting the possible orders.
✦ Example 4.4. Let us consider how the selection sort algorithm of Fig. 2.2
makes its decisions when given three elements (a, b, c) to sort. The first comparison
is between a and b, as suggested at the top of Fig. 4.5, where we show in the box
that all 6 possible orders are consistent before we make any tests. After the test,
the orders abc, acb, and cab are consistent with the “yes” outcome (i.e., a < b),
while the orders bac, bca, and cba are consistent with the opposite outcome, where
b > a. We again show in a box the consistent orders in each case.
In the algorithm of Fig. 2.2, the index of the smaller becomes the value small.
Thus, we next compare c with the smaller of a and b. Note that which test is made
next depends on the outcome of previous tests.
After making the second decision, the smallest of the three is moved into the
first position of the array, and a third comparison is made to determine which of
the remaining elements is the larger. That comparison is the last comparison made
by the algorithm when three elements are to be sorted. As we see at the bottom
of Fig. 4.5, sometimes that decision is determined. For example, if we have already
found a < b and c < a, then c is the smallest and the last comparison of a and b
must find a to be the smaller.
In this example, all paths involve 3 decisions, and at the end there is at most
one consistent order, which is the correct sorted order. The two paths with no
consistent order never occur. Equation (4.2) tells us that the number of tests t
must be at least log₂ 3!, which is log₂ 6. Since 6 is between 2^2 and 2^3, we know that
log2 6 will be between 2 and 3. Thus, at least some sequences of outcomes in any
algorithm that sorts three elements must make 3 tests. Since selection sort makes
only 3 tests for 3 elements, it is at least as good as any other sorting algorithm for 3
elements in the worst case. Of course, as the number of elements becomes large, we
know that selection sort is not as good as can be done, since it is an O(n2 ) sorting
algorithm and there are better algorithms such as merge sort. ✦
We must now estimate how large log₂ n! is. Since n! is the product of all the
integers from 1 to n, it is surely larger than the product of only the n/2 + 1 integers
from n/2 through n. This product is in turn at least as large as n/2 multiplied by
itself n/2 times, or (n/2)^(n/2). Thus, log₂ n! is at least log₂((n/2)^(n/2)). But the
latter is (n/2)(log₂ n − log₂ 2), which is

(n/2)(log₂ n − 1)
For large n, this formula is approximately (n log2 n)/2.
A more careful analysis will tell us that the factor of 1/2 does not have to be
there. That is, log2 n! is very close to n log2 n rather than to half that expression.
We have shown only that any general-purpose sorting algorithm must have
some input for which it makes about n log2 n comparisons or more. Thus any
general-purpose sorting algorithm must take at least time proportional to n log n
in the worst case. In fact, it can be shown that the same applies to the “average”
input. That is, the average over all inputs of the time taken by a general-purpose
sorting algorithm must be at least proportional to n log n. Thus, merge sort is about
as good as we can do, since it has this big-oh running time for all inputs.
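The estimate can be seen numerically; a short C sketch (not from the book) compares log₂(n!), computed as a sum of logarithms, with n log₂ n:

#include <stdio.h>
#include <math.h>

/* Compare log2(n!) = log2(2) + log2(3) + ... + log2(n)
   with n*log2(n), for growing n. */
int main(void)
{
    int n;
    for (n = 10; n <= 100000; n *= 10) {
        double log2_fact = 0.0;
        int i;
        for (i = 2; i <= n; i++)
            log2_fact += log2((double)i);
        printf("n = %6d: log2(n!) = %11.1f, n*log2(n) = %11.1f\n",
               n, log2_fact, n * log2((double)n));
    }
    return 0;
}

The ratio of the two columns approaches 1 as n grows, illustrating that log₂ n! is close to n log₂ n.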
EXERCISES
4.3.3: How many comparisons does the merge sort algorithm of Section 2.8 make
if there are 4 elements? Is this number the best possible? Show the top 3 levels of
the decision tree in the style of Fig. 4.5.
4.3.4*: Are there more assignments of n values to n items or permutations of n + 1
items? Note: The answer may not be the same for all n.
4.3.5*: Are there more assignments of n/2 values to n items than there are per-
mutations of n items?
4.3.6**: Show how to sort n integers in the range 0 to n^2 − 1 in O(n) time.
✦
✦ ✦
✦
4.4 Ordered Selections
Sometimes we wish to select only some of the items in a set and give them an order.
Let us generalize the function Π(n) that counted permutations in the previous
section to a two-argument function Π(n, m), which we define to be the number of
ways we can select m items from n in such a way that order matters for the selected
items, but there is no order for the unselected items. Thus, Π(n) = Π(n, n).
✦ Example 4.5. A horse race awards prizes to the first three finishers; the first
horse is said to “win,” the second to “place,” and the third to “show.” Suppose
there are 10 horses in a race. How many different awards for win, place, and show
are there?
Clearly, any of the 10 horses can be the winner. Given which horse is the
winner, any of the 9 remaining horses can place. Thus, there are 10 × 9 = 90
choices for horses in first and second positions. For any of these 90 selections of
win and place, there are 8 remaining horses. Any of these can finish third. Thus,
there are 90 × 8 = 720 selections of win, place, and show. Figure 4.6 suggests all
these possible selections, concentrating on the case where 3 is selected first and 1 is
selected second. ✦
Fig. 4.6. Selections of win, place, and show: any of the ten horses may win; for
each winner, any of the nine remaining horses may place; and for each win-place
pair, any of the eight remaining horses may show.

In general, we may compute Π(n, m) as follows. The first of the m selections
may be any of the n items. For each choice of first item, the second selection may
be any of the remaining n − 1 items, and so on, until the mth selection may be any
of the n − m + 1 items that remain after the first m − 1 selections. Thus,

Π(n, m) = n × (n − 1) × ··· × (n − m + 1) = n!/(n − m)!    (4.3)
✦ Example 4.6. Consider the case from Example 4.5, where n = 10 and m =
3. We observed that Π(10, 3) = 10 × 9 × 8 = 720. The formula (4.3) says that
Π(10, 3) = 10!/7!, or
(10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1)/(7 × 6 × 5 × 4 × 3 × 2 × 1)
The factors from 1 through 7 appear in both numerator and denominator and thus
cancel. The result is the product of the integers from 8 through 10, or 10 × 9 × 8,
as we saw in Example 4.5. ✦
EXERCISES
4.4.1: How many ways are there to form a sequence of m letters out of the 26
letters, if no letter is allowed to appear more than once, for (a) m = 3 (b) m = 5.
4.4.2: In a class of 200 students, we wish to elect a President, Vice President,
Secretary, and Treasurer. In how many ways can these four officers be selected?
4.4.3: Compute the following quotients of factorials: (a) 100!/97! (b) 200!/195!.
4.4.4: The game of Mastermind requires players to select a “code” consisting of a
sequence of four pegs, each of which may be of any of six colors: red, green, blue,
yellow, white, and black.
a) How many different codes are there?
b*) How many different codes are there that have two or more pegs of the same
color? Hint : This quantity is the difference between the answer to (a) and
another easily computed quantity.
c) How many codes are there that have no red peg?
d*) How many codes are there that have no red peg but have at least two pegs of
the same color?
Quotients of Factorials
Note that in general, a!/b! is the product of the integers between b + 1 and a, as
long as b < a. It is much easier to calculate the quotient of factorials as
a × (a − 1) × · · · × (b + 1)
than to compute each factorial and divide, especially if b is not much less than a.
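A C sketch of this computation (ordered_selections is our name, not the book's); it computes Π(n, m) as the falling product n × (n − 1) × ··· × (n − m + 1) rather than by dividing factorials:

/* Ordered selections of m items out of n: PI(n, m) = n!/(n-m)!,
   computed as the product of the m factors from n down to n-m+1. */
long long ordered_selections(int n, int m)
{
    long long result = 1;
    int i;
    for (i = n; i > n - m; i--)
        result *= i;
    return result;
}

For instance, ordered_selections(10, 3) returns 720, the win-place-show count of Example 4.5.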
✦
✦ ✦
✦
4.5 Unordered Selections
There are many situations in which we wish to count the ways to select a set of
items, but the order in which the selections are made does not matter. In terms of
the horse race example of the previous section, we may wish to know which horses
were the first three to finish, but we do not care about the order in which these
three finished. Put another way, we wish to know how many ways we can select
three horses out of n to be the top three finishers.
✦ Example 4.7. Let us again assume n = 10. We know from Example 4.5 that
there are 720 ways to select three horses, say A, B, and C, to be the win, place, and
show horses, respectively. However, now we do not care about the order of finish
of these three horses, only that A, B, and C were the first three finishers in some
order. Thus, we shall get the answer “A, B, and C are the three best horses” in
six different ways, corresponding to the ways that these three horses can be ordered
among the top three. We know there are exactly six ways, because the number of
ways to order 3 items is Π(3) = 3! = 6. However, if there is any doubt, the six ways
are seen in Fig. 4.7.
ABC, ACB, BAC, BCA, CAB, CBA

Fig. 4.7. Six orders in which a set of three horses may finish.
What is true for the set of horses A, B, and C is true of any set of three horses.
Each set of three horses will appear exactly 6 times, in all of their possible orders,
when we count the ordered selections of three horses out of 10. Thus, if we wish
to count only the sets of three horses that may be the three top finishers, we must
divide Π(10, 3) by 6. Thus, there are 720/6 = 120 different sets of three horses out
of 10. ✦
✦ Example 4.8. Let us count the number of poker hands. In poker, each player
is dealt five cards from a 52-card deck. We do not care in what order the five cards
are dealt, just what five cards we have. To count the number of sets of five cards
we may be dealt, we could start by calculating Π(52, 5), which is the number of
ordered selections of five objects out of 52. This number is 52!/(52 − 5)!, which is
52!/47!, or 52 × 51 × 50 × 49 × 48 = 311,875,200.
However, just as the three fastest horses in Example 4.7 appear in 3! = 6
different orders, any set of five cards can appear in Π(5) = 5! = 120 different orders.
Thus, to count the number of poker hands without regard to order of selection,
we must take the number of ordered selections and divide by 120. The result is
311,875,200/120 = 2,598,960 different hands. ✦
Counting Combinations

Let us now generalize Examples 4.7 and 4.8 to get a formula for the number of ways
to select m items out of n without regard to order of selection. This function is
usually written (n choose m) and spoken “n choose m” or “combinations of m things
out of n.” To compute (n choose m), we start with Π(n, m) = n!/(n − m)!, the
number of ordered selections of m things out of n. We then group these ordered
selections according to the set of m items selected. Since these m items can be
ordered in Π(m) = m! different ways, the groups each have m! members. We must
divide the number of ordered selections by m! to get the number of unordered
selections. That is,

(n choose m) = Π(n, m)/Π(m) = n!/((n − m)! × m!)    (4.4)
✦ Example 4.9. Let us recompute the number of poker hands using formula (4.4),
with n = 52 and m = 5:

(52 choose 5) = 52!/(47! × 5!) = (52 × 51 × 50 × 49 × 48)/(5 × 4 × 3 × 2 × 1)

Simplifying, we get (52 choose 5) = 26 × 17 × 10 × 49 × 12 = 2,598,960. ✦
The function (n choose m) also has a simple recursive definition.

BASIS. (n choose 0) = 1 for any n ≥ 1. That is, there is only one way to pick zero
things out of n: pick nothing. Also, (n choose n) = 1; the only way to pick n things
out of n is to pick them all.
INDUCTION. If 0 < m < n, then (n choose m) = (n−1 choose m) + (n−1 choose m−1).
That is, if we wish to pick m things out of n, we can either

i) Not pick the first element, and then pick m things from the remaining n − 1
elements. The term (n−1 choose m) counts this number of possibilities, or

ii) Pick the first element, and then select m − 1 things from among the remaining
n − 1 elements. The term (n−1 choose m−1) counts these possibilities.
Incidentally, while the idea of the induction should be clear — we proceed from
the simplest cases of picking all or none to more complicated cases where we pick
some but not all — we have to be careful to state what quantity the induction
is “on.” One way to look at this induction is that it is a complete induction on
the product of n and the minimum of m and n − m. Then the basis case occurs
when this product is 0, and the induction is for larger values of the product. We
have to check for the induction that n × min(m, n − m) is always greater than
(n − 1) × min(m, n − m − 1) and (n − 1) × min(m − 1, n − m) when 0 < m < n. This
check is left as an exercise.
This recursion is often displayed by Pascal’s triangle, illustrated in Fig. 4.8,
where the borders are all 1’s (for the basis) and each interior entry is the sum of
the two numbers above it to the northeast and northwest (for the induction). Then
(n choose m) can be read from the (m + 1)st entry of the (n + 1)st row.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1

Fig. 4.8. The first five rows of Pascal’s triangle.
✦ Example 4.10. Consider the case where n = 4 and m = 2. We find the value
of (4 choose 2) in the 3rd entry of the 5th row of Fig. 4.8. This entry is 6, and it is
easy to check that formula (4.4) agrees: 4!/(2! × 2!) = 24/4 = 6. ✦
The two ways we have to compute (n choose m) — by formula (4.4) or by the
above recursion — each compute the same value, naturally. We can argue so by
appeal to physical reasoning. Both methods compute the number of ways to select
m items out of n in an unordered fashion, so they must produce the same value.
However, we can also prove the equality of the two approaches by an induction on
n. We leave this proof as an exercise.
Running Time of Algorithms to Compute (n choose m)

As we saw in Example 4.9, when we use formula (4.4) to compute (n choose m) we
can cancel (n − m)! in the denominator against the last n − m factors in n!, to
express (n choose m) as

(n choose m) = (n × (n − 1) × ··· × (n − m + 1))/(m × (m − 1) × ··· × 1)    (4.5)
If m is small compared to n, we can evaluate the above formula much faster than
we can evaluate (4.4). In principle, the fragment of C code in Fig. 4.9 does the job.
(1) c = 1;
(2) for (i = n; i > n-m; i--)
(3) c *= i;
(4) for (i = 2; i <= m; i++)
(5) c /= i;
Fig. 4.9. Code to compute (n choose m).
Line (1) initializes c to 1; c will become the result, (n choose m). Lines (2) and (3)
multiply c by each of the integers between n − m + 1 and n. Then, lines (4) and (5)
divide c by each of the integers between 2 and m. Thus, Fig. 4.9 implements the
formula of Equation (4.5).
For the running time of Fig. 4.9, we have only to observe that the two loops,
lines (2) – (3) and lines (4) – (5), each iterate m times and have a body that takes
O(1) time. Thus, the running time is O(m).
In the case that m is close to n but n−m is small, we can interchange the role of
m and n−m. That is, we can cancel factors of n! and m!, getting n(n−1) · · · (m+1)
and divide that by (n − m)!. This approach gives us an alternative to (4.5), which
is

(n choose m) = (n × (n − 1) × ··· × (m + 1))/((n − m) × (n − m − 1) × ··· × 1)    (4.6)
Likewise, there is a code fragment similar to Fig. 4.9 that implements formula (4.6)
and takes time O(n − m). Since both n − m and m must be n or less for (n choose m)
to be defined, we know that either way, O(n) is a bound on the running time. Moreover,
when m is either close to 0 or close to n, then the running time of the better of the
two approaches is much less than O(n).
However, Fig. 4.9 is flawed in an important way. It starts by computing the
product of a number of integers and then divides by an equal number of integers.
Since ordinary computer arithmetic can only deal with integers of a limited size
(often, about two billion is as large as an integer can get), we run the risk of
computing an intermediate result after line (3) of Fig. 4.9 that overflows the limit
on integer size. That may be the case even though the value of (n choose m) is
small enough to be represented in the computer.
A more desirable approach is to alternate multiplications and divisions. Start
by multiplying by n, then divide by m. Multiply by n − 1; then divide by m − 1,
and so on. The problem with this approach is that we have no reason to believe
the result will be an integer at each stage. For instance, in Example 4.9 we would
begin by multiplying by 52 and dividing by 5. The result is already not an integer.
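One standard way around both difficulties — not the approach the book's fragment takes, and with binom as our own name — is to multiply and divide in ascending order of the divisor, so that every intermediate value is itself a binomial coefficient and hence an integer. A sketch:

/* Binomial coefficient (n choose m), with multiplications and divisions
   interleaved so that intermediate results stay integral: after the ith
   iteration, c holds (n-m+i choose i), which is always an integer, so
   the division by i is exact. */
long long binom(int n, int m)
{
    long long c = 1;
    int i;
    if (m > n - m)
        m = n - m;                 /* (n choose m) = (n choose n-m) */
    for (i = 1; i <= m; i++)
        c = c * (n - m + i) / i;   /* c is now (n-m+i choose i) */
    return c;
}

For instance, binom(52, 5) evaluates to 2,598,960, and no intermediate value exceeds n times the final answer.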
Formulas for (n choose m) Must Yield Integers
It may not be obvious why the quotients of many factors in Equations (4.4), (4.5),
or (4.6) must always turn out to be an integer. The only simple argument is to
appeal to physical reasoning. The formulas all compute the number of ways to
choose m things out of n, and this number must be some integer.
It is much harder to argue this fact from properties of integers, without appeal-
ing to the physical meaning of the formulas. It can in fact be shown by a careful
analysis of the number of factors of each prime in numerator and denominator. As
a sample, look at the expression in Example 4.9. There is a 5 in the denominator,
and there are 5 factors in the numerator. Since these factors are consecutive, we
know one of them must be divisible by 5; it happens to be the middle factor, 50.
Thus, the 5 in the denominator surely cancels.
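The recursive definition of (n choose m) also translates directly into C. The book analyzes such a function as its Fig. 4.10, which is not reproduced above, so the following is only a plausible sketch of it; the two recursive calls are the ones the discussion below refers to as line (6):

/* Recursive computation of (n choose m), directly from the
   basis and induction above. A sketch of the kind of function
   analyzed as Fig. 4.10. */
int choose(int n, int m)
{
    if (m == 0 || m == n)        /* basis: pick none or pick all */
        return 1;
    else                         /* induction: split on the first element */
        return choose(n-1, m) + choose(n-1, m-1);
}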
Let a be an upper bound on the time any one call to choose takes, exclusive of
the time of its recursive calls — that is, the time for the basis tests and returns, plus
that part of line (6) that is involved in the calls and return, but not the time of the
recursive calls themselves. Then we can prove by induction on n:

STATEMENT S(n): If choose is called with first argument n and some second
argument m between 0 and n, then the running time T(n) of the call is at
most a(2^n − 1).

BASIS. For n = 1, m must be 0 or 1, so the call returns on a basis test without
recursing, taking time at most a = a(2^1 − 1). Thus S(1) holds.
INDUCTION. Assume S(n); that is, T(n) ≤ a(2^n − 1). To prove S(n + 1), suppose
we call choose with first argument n + 1. Then Fig. 4.10 takes time a plus the time
of the two recursive calls on line (6). By the inductive hypothesis, each call takes
at most time a(2^n − 1). Thus, the total time consumed is at most

a + 2a(2^n − 1) = a(1 + 2^(n+1) − 2) = a(2^(n+1) − 1)

We have thus proved S(n + 1), and we conclude that T(n) ≤ a(2^n − 1) for all
n ≥ 1. Dropping the constant factor and the low-order term, we see that T(n) is
O(2^n).
Curiously, while in our analyses of Chapter 3 we easily proved a smooth and
tight upper bound on running time, the O(2^n) bound on T(n) is smooth but not
tight. The proper smooth, tight upper bound is slightly less: O(2^n/√n). A proof
of this fact is quite difficult, but we leave as an exercise the easier fact that the
running time of Fig. 4.10 is proportional to the value it returns: (n choose m). An
important observation is that the recursive algorithm of Fig. 4.10 is much less
efficient than the linear algorithm of Fig. 4.9. This example is one where recursion
hurts considerably.
The Shape of the Function (n choose m)

For a fixed value of n, the function of m that is (n choose m) has a number of
interesting properties. For a large value of n, its form is the bell-shaped curve
suggested in Fig. 4.11. We immediately notice that this function is symmetric
around the midpoint n/2; this is easy to check using formula (4.4), which tells us
that (n choose m) = (n choose n−m).
The maximum height, at the center m = n/2, is approximately 2^n/√(πn/2).
For example, if n = 10, this formula gives 258.37, while (10 choose 5) = 252.
The “thick part” of the curve extends for approximately √n on either side of
the midpoint. For example, if n = 10,000, then for m between 4900 and 5100 the
value of (10,000 choose m) is close to the maximum. For m outside this range, the
value of (10,000 choose m) falls off very rapidly.
Fig. 4.11. The function (n choose m) for fixed n: a bell-shaped curve over
0 ≤ m ≤ n, with maximum approximately 2^n/√(πn/2) at m = n/2.
Binomial Coefficients

The function (n choose m), in addition to its use in counting, gives us the binomial
coefficients. These numbers are found when we expand a two-term polynomial (a
binomial) raised to a power, such as (x + y)^n.

When we expand (x + y)^n, we get 2^n terms, each of which is x^m y^(n−m) for some
m between 0 and n. That is, from each of the factors x + y we may choose either
x or y to contribute as a factor in a particular term. The coefficient of x^m y^(n−m) in
the expansion is the number of terms that are composed of m choices of x and the
remaining n − m choices of y.

✦ Example 4.11. Consider the expansion of (x + y)^4. One term with two x’s and
two y’s arises for each of the (4 choose 2) = 6 ways to select the two factors that
contribute an x. Thus, the coefficient of x^2 y^2 in (x + y)^4 is 6. ✦
We can generalize the idea that we used to calculate the coefficient of x^2 y^2
in Example 4.11. The coefficient of x^m y^(n−m) in the expansion of (x + y)^n is
(n choose m). The reason is that we get a term x^m y^(n−m) whenever we select m
of the n factors to provide an x and the remainder of the factors to provide a y.
The number of ways to choose m factors out of n is (n choose m).
There is another interesting consequence of the relationship between binomial
coefficients and the function (n choose m). We just observed that

(x + y)^n = Σ_{m=0}^{n} (n choose m) x^m y^(n−m)

If we substitute x = y = 1, we find that

2^n = Σ_{m=0}^{n} (n choose m)

That is, the sum of all the binomial coefficients for a given n is 2^n.
EXERCISES
4.5.1: Compute the following values: (a) (7 choose 3) (b) (8 choose 3)
(c) (10 choose 7) (d) (12 choose 11).
4.5.2: In how many ways can we choose a set of 5 different letters out of the 26
possible lower-case letters?
4.5.3: What is the coefficient of
a) x^3 y^4 in the expansion of (x + y)^7
b) x^5 y^3 in the expansion of (x + y)^8
4.5.4*: At Real Security, Inc., computer passwords are required to have four digits
(of 10 possible) and six letters (of 52 possible). Letters and digits may repeat. How
many different possible passwords are there? Hint : Start by counting the number
of ways to select the four positions holding digits.
4.5.5*: How many sequences of 5 letters are there in which exactly two are vowels?
4.5.6: Rewrite the fragment of Fig. 4.9 to take advantage of the case when n − m
is small compared with n.
4.5.7: Rewrite the fragment of Fig. 4.9 to convert to floating-point numbers and
alternately multiply and divide.
4.5.8: Prove that if 0 ≤ m ≤ n, then (n choose m) = (n choose n−m)

a) By appealing to the meaning of the function (n choose m)

b) By using Equation (4.4)

4.5.9*: Prove by induction on n that the recursive definition of (n choose m)
correctly defines (n choose m) to be equal to n!/((n − m)! × m!).
4.5.10**: Show by induction on n that the running time of the recursive function
choose(n,m) of Fig. 4.10 is at most c(n choose m) for some constant c.
4.5.11*: Show that n × min(m, n − m) is always greater than
(n − 1) × min(m, n − m − 1) and (n − 1) × min(m − 1, n − m) when 0 < m < n.
✦
✦ ✦
✦
4.6 Orderings With Identical Items
In this section, we shall examine a class of selection problems in which some of the
items are indistinguishable from one another, but the order of appearance of the
items matters when items can be distinguished. The next section will address a
similar class of problems where we do not care about order, and some items are
indistinguishable.
✦ Example 4.12. Anagram puzzles give us a list of letters, which we are asked
to rearrange to form a word. We can solve such problems mechanically if we have
a dictionary of legal words and we can generate all the possible orderings of the
letters. Chapter 10 considers efficient ways to check whether a given sequence of
letters is in the dictionary. But now, considering combinatorial problems, we might
start by asking how many different potential words we must check for presence in
the dictionary.
For some anagrams, the count is easy. Suppose we are given the letters abenst.
There are six letters, which may be ordered in Π(6) = 6! = 720 ways. One of these
720 ways is absent, the “solution” to the puzzle.
However, anagrams often contain duplicate letters. Consider the puzzle eilltt.
There are not 720 different sequences of these letters. For example, interchanging
the positions in which the two t’s appear does not make the word different.
Suppose we “tagged” the t’s and l’s so we could distinguish between them, say
t1, t2, l1, and l2. Then we would have 720 orders of tagged letters. However, pairs
of orders that differ only in the positions of the tagged l’s, such as l1it2t1l2e and
l2it2t1l1e, are not really different.
only in the subscript on the l’s, we can account for the fact that the l’s are really
identical if we divide the number of strings of letters by 2. We conclude that the
number of different anagrams in which the t’s are tagged but the l’s are not is
720/2=360.
Similarly, we may pair the strings with only t’s tagged if they differ only in the
subscript of the t’s. For example, lit1 t2 le and lit2 t1 le are paired. Thus, if we
divide by 2 again, we have the number of different anagram strings with the tags
removed from both t’s and l’s. This number is 360/2 = 180. We conclude that
there are 180 different anagrams of eilltt. ✦
We may generalize the idea of Example 4.12 to a situation where there are
n items, and these items are divided into k groups. Members of each group are
indistinguishable, but members of different groups are distinguishable. We may let
ij be the number of items in the jth group, for j = 1, 2, . . . , k.
✦ Example 4.13. Reconsider the anagram problem eilltt from Example 4.12.
Here, there are six items, so n = 6. The number of groups k is 4, since there are
4 different letters. Two of the groups have one member (e and i), while the other
two groups have two members. We may thus take i1 = i2 = 1 and i3 = i4 = 2. ✦
If we tag the items so members of a group are distinguishable, then there are
n! different orders. However, if there are i1 members of the first group, these tagged
items may appear in i1 ! different orders. Thus, when we remove the tags from the
items in group 1, we cluster the orders into sets of size i1 ! that become identical.
We must thus divide the number of orders by i1 ! to account for the removal of tags
from group 1.
Similarly, removing the tags from each group in turn forces us to divide the
number of distinguishable orders by i2 !, by i3 !, and so on. For those ij ’s that are 1,
this division is by 1! = 1 and thus has no effect. However, we must divide by the
factorial of the size of each group of more than one item. That is what happened in
Example 4.12. There were two groups with more than one member, each of size 2,
and we divided by 2! twice. We can state and prove the general rule by induction
on k.
STATEMENT S(k): If n items are divided into k groups of sizes i1, i2, . . . , ik, and
items within a group are indistinguishable from one another, then the number
of distinguishable orders of the n items is

n!/(i1! × i2! × ··· × ik!)    (4.7)

BASIS. For k = 1, all n items are identical, so there is just one distinguishable
order. Formula (4.7) gives n!/n! = 1 in this case, so the basis is proved.

INDUCTION. Suppose S(k) is true, and consider a situation with k + 1 groups. Let
the last group have m = ik+1 members. These items will appear in m positions,
and we can choose these positions in (n choose m) different ways. Once we have
chosen the m positions, it does not matter which items in the last group we place
in these positions, since they are indistinguishable.
Having chosen the positions for the last group, we have n − m positions left to
fill with the remaining k groups. The inductive hypothesis applies and tells us that
each selection of positions for the last group can be coupled with
(n − m)!/(i1! × i2! × ··· × ik!) distinguishable orders in which to place the remaining
groups in the remaining positions. This formula is just (4.7) with n − m replacing
n, since there are only n − m items remaining to be placed. The total number of
ways to order the k + 1 groups is thus

(n choose m) × (n − m)!/(i1! × i2! × ··· × ik!)    (4.8)
Let us replace (n choose m) in (4.8) by its equivalent in factorials,
n!/((n − m)! × m!). We then have

(n! × (n − m)!)/((n − m)! × m! × i1! × i2! × ··· × ik!)    (4.9)

We may cancel (n − m)! from numerator and denominator in (4.9). Also, remember
that m is ik+1, the number of members in the (k + 1)st group, so m! = ik+1!. We
thus discover that the number of orders is

n!/(i1! × i2! × ··· × ik+1!)

which is (4.7) with k + 1 in place of k. We have thus proved S(k + 1) and completed
the induction.
✦ Example 4.14. An explorer has rations for two weeks, consisting of 4 cans of
Tuna, 7 cans of Spam, and 3 cans of Beanie Weenies. If he opens one can each day,
in how many orders can he consume the rations? Here, there are 14 items divided
into groups of 4, 7, and 3 identical items. In terms of Equation (4.7), n = 14, k = 3,
i1 = 4, i2 = 7, and i3 = 3. The number of orders is thus
14!/(4! × 7! × 3!)

Let us begin by canceling the 7! in the denominator with the last 7 factors of 14!
in the numerator. That gives us

(14 × 13 × 12 × 11 × 10 × 9 × 8)/(4 × 3 × 2 × 1 × 3 × 2 × 1)
Continuing to cancel factors in the numerator and denominator, we find the resulting
product is 120,120. That is, there are over a hundred thousand ways in which to
consume the rations. None sounds very appetizing. ✦
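Formula (4.7) can be evaluated without computing any full factorial. In the sketch below (orderings is our own name, not the book's), the count is built as a product of binomial coefficients, choosing positions for one group at a time, so every intermediate value is an integer:

/* Distinguishable orders of n items divided into k groups of identical
   items, with group sizes in sizes[0..k-1]:
   n! / (sizes[0]! * sizes[1]! * ... * sizes[k-1]!). */
long long orderings(const int sizes[], int k)
{
    long long result = 1;
    int remaining = 0, g, i;
    for (g = 0; g < k; g++)
        remaining += sizes[g];                 /* n, the total item count */
    for (g = 0; g < k; g++) {
        long long c = 1;                       /* (remaining choose sizes[g]) */
        for (i = 1; i <= sizes[g]; i++)
            c = c * (remaining - sizes[g] + i) / i;
        result *= c;                           /* place this group */
        remaining -= sizes[g];
    }
    return result;
}

With group sizes {4, 7, 3}, orderings returns 120,120, matching the hand calculation above.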
EXERCISES
4.6.1: Count the number of anagrams of the following words: (a) error (b) street
(c) allele (d) Mississippi.
4.6.2: In how many ways can we arrange in a line
a) Three apples, four pears, and five bananas
b) Two apples, six pears, three bananas, and two plums
4.6.3*: In how many ways can we place a white king, a black king, two white
knights, and a black rook on the chessboard?
4.6.4*: One hundred people participate in a lottery. One will win the grand prize
of $1000, and five more will win consolation prizes of a $50 savings bond. How
many different possible outcomes of the lottery are there?
4.6.5: Write a simple formula for the number of orders in which we may place 2n
objects that occur in n pairs of two identical objects each.
✦
✦ ✦
✦
4.7 Distribution of Objects to Bins
Our next class of counting problems involve the selection of a bin in which to place
each of several objects. The objects may or may not be identical, but the bins are
distinguishable. We must count the number of ways in which the bins can be filled.
✦ Example 4.15. Kathy, Peter, and Susan are three children. We have four
apples to distribute among them, without cutting apples into parts. In how many
different ways may the children receive apples?
There are sufficiently few ways that we can enumerate them. Kathy may receive
anything from 0 to 4 apples, and whatever remains can be divided between Peter
and Susan in only a few ways. If we let (i, j, k) represent the situation in which
Kathy receives i apples, Peter receives j, and Susan receives k, the 15 possibilities
are as shown in Fig. 4.12. Each row corresponds to the number of apples given to
Kathy.
We can count distributions of identical objects to bins in general by representing
each distribution as a string of A’s and *’s: we write one A for each of the n objects,
and if there are m bins, we need m − 1 *’s to serve as boundaries between the
portions for the various bins. Thus, strings are of length n + m − 1.

We may choose any n of these positions to hold A’s, and the rest will hold *’s.
There are thus (n+m−1 choose n) strings of A’s and *’s, so there are this many
distributions of objects to bins. In Example 4.15 we had n = 4 and m = 3, and we
concluded that there were (6 choose 4) = 15 distributions.
✦ Example 4.16. In the game of Chuck-a-Luck we throw three dice, each with
six sides numbered 1 through 6. Players bet a dollar on a number. If the number
does not come up, the dollar is lost. If the number comes up one or more times,
the player wins as many dollars as there are occurrences of the number.
We would like to count the “outcomes,” but there may initially be some ques-
tion about what an “outcome” is. If we were to color the dice different colors so
we could tell them apart, we could see this counting problem as that of Section 4.2,
where we assign one of six numbers to each of three dice. We know there are
6^3 = 216 ways to make this assignment.
However, the dice ordinarily aren’t distinguishable, and the order in which the
numbers come up doesn’t matter; it is only the occurrences of each number that
determines which players get paid and how much. For instance, we might observe
that 1 comes up on two dice and 6 comes up on the third. The 6 might have
appeared on the first, second, or third die, but it doesn’t matter which.
Thus, we can see the problem as one of distribution of identical objects to bins.
The “bins” are the numbers 1 through 6, and the “objects” are the three dice. A
die is “distributed” to the bin corresponding to the number thrown on that die.
Thus, there are (6+3−1 choose 3) = (8 choose 3) = 56 different outcomes in
Chuck-a-Luck. ✦
✦ Example 4.17. Suppose we have three apples, two pears, and a banana to
distribute to Kathy, Peter, and Susan. Then m = 3, the number of “bins,” which
is the number of children. There are k = 3 groups, with i1 = 3, i2 = 2, and i3 = 1.
Since there are 6 objects in all, n = 6, and the strings in question are of length
The problems of Sections 4.2 and 4.4 are not differentiated in the table above.
The distinction is one of replacement, as discussed in the box on “Selections With
and Without Replacement” in Section 4.4. That is, in Section 4.2 we had an infinite
supply of each “color” and could select a color many times. In Section 4.4, a “horse”
selected is not available for later selections.
n + m − 1 = 8. The strings consist of three A’s standing for apples, two P’s standing
for pears, one B standing for the banana, and two *’s, the boundaries between the
shares of the children. The formula for the number of distributions is thus
(n + m − 1)!/((m − 1)! × i1! × i2! × i3!) = 8!/(2! × 3! × 2! × 1!) = 1680
ways in which these fruits may be distributed to Kathy, Peter, and Susan. ✦
✦
✦ ✦
✦
4.8 Combining Counting Rules
The subject of combinatorics offers myriad challenges, and few are as simple as
those discussed so far in this chapter. However, the rules learned so far are valuable
building blocks that may be combined in various ways to count more complex
structures. In this section, we shall learn three useful “tricks” for counting:
1. Express a count as a sequence of choices.
2. Express a count as a difference of counts.
3. Express a count as a sum of counts for subcases.
✦ Example 4.18. Let us count the number of poker hands that are one-pair
hands. A hand with one pair consists of two cards of one rank and three cards
of ranks that are different and also distinct from the rank of the pair. We can
describe all one-pair hands by the following steps.
1. Select the rank of the pair.
2. Select the three ranks for the other three cards from the remaining 12 ranks.
3. Select the suits for the two cards of the pair.
4. Select the suits for each of the other three cards.
If we multiply all these numbers together, we shall have the number of one-pair
hands. Note that the order in which the cards appear in the hand is not important,
as we discussed in Example 4.8, and we have made no attempt to specify the order.
Now, let us take each of these factors in turn. We can select the rank of the pair
in 13 different ways. Whichever rank we select for the pair, we have 12 ranks left.
We must select 3 of these for the remaining cards of the hand. This is a selection
in which order is unimportant, as discussed in Section 4.5. We may perform this
selection in $\binom{12}{3} = 220$ ways.
Now, we must select the suits for the pair. There are four suits, and we must
select two of them. Again we have an unordered selection, which we may do in
$\binom{4}{2} = 6$ ways. Finally, we must select a suit for each of the three remaining cards.
Each has 4 choices of suit, so we have an assignment like those of Section 4.2. We
may make this assignment in 4^3 = 64 ways.
The total number of one-pair hands is thus 13 × 220 × 6 × 64 = 1,098,240.
This number is over 40% of the total number of 2,598,960 poker hands. ✦
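The arithmetic of Example 4.18 is easy to check by machine. The following is a small sketch of ours, not a figure from the book; the helper choose computes binomial coefficients exactly by the product formula.

#include <stdio.h>

/* n choose k by the product formula; each intermediate result
   is itself a binomial coefficient, so the division is exact. */
long choose(int n, int k)
{
    long result = 1;
    int i;

    for (i = 1; i <= k; i++)
        result = result * (n - k + i) / i;
    return result;
}

int main()
{
    long hands = 13L            /* the rank of the pair */
        * choose(12, 3)         /* ranks of the other three cards */
        * choose(4, 2)          /* suits for the two cards of the pair */
        * 4 * 4 * 4;            /* a suit for each remaining card */

    printf("one-pair hands: %ld\n", hands);   /* prints 1098240 */
    return 0;
}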
✦ Example 4.19. There are a number of other poker hands — two pairs, three
of a kind, four of a kind, and full house — that can be counted in a manner similar
to Example 4.18. However, there are other hands that require a different approach.
First, let us consider a straight-flush, which is five cards of consecutive rank (a
straight) of the same suit (a flush). First, each straight begins with one of the nine
ranks 2 through 10 as the lowest card. That is, the straights are 2-3-4-5-6, 3-4-5-6-7,
and so on, up to 10-Jack-Queen-King-Ace. Once the ranks are determined, the
straight-flush can be completely specified by giving the suit. Thus, we can count
the straight-flushes by
1. Select the lowest rank in the straight (9 choices).
2. Select the suit (4 choices).
Thus, there are 9 × 4 = 36 straight-flushes.
Now, let us count the straights, that is, those hands whose ranks are consec-
utive but that are not straight-flushes. We shall first count all those hands with
consecutive ranks, regardless of whether the suits are the same, and then subtract
the 36 straight-flushes. To count hands with consecutive ranks, we can
1. Select the low rank (9 choices).
2. Assign a suit to each of the five ranks (4^5 = 1024 choices, as in Section 4.2).
The number of straights and straight-flushes is thus 9 × 1024 = 9216. When we
subtract the straight-flushes, we are left with 9216 − 36 = 9180 hands that are
classified as straights.
Next, let us count the number of flushes. Again, we shall first include the
straight-flushes and then subtract 36. We can define a flush by
1. Select the suit (4 choices).
2. Select the five ranks of the hand from the 13 ranks of that suit ($\binom{13}{5} = 1287$ choices).
Thus, there are 4 × 1287 = 5148 hands whose five cards are all of one suit. Subtracting
the 36 straight-flushes leaves 5148 − 36 = 5112 hands classified as flushes. ✦
EXERCISES
4.8.1: Count the number of poker hands of each of the following types:
a) Two pairs
b) Three of a kind
c) Full house
d) Four of a kind
Be careful when counting one type of hand not to include hands that are better.
For example, in (a), make sure that the two pairs are different (so you don’t really
have four of a kind) and the fifth card is different from the pairs (so you don’t have
a full house).
Blackjack 4.8.2*: A blackjack consists of two cards, one of which is an Ace and the other of
which is a 10-point card, either a 10, Jack, Queen, or King.
a) How many different blackjacks are there in a 52-card deck?
b) In the game blackjack, one card is dealt down and the other is dealt up. Thus,
in a sense order of the two cards matters. In this case, how many different
blackjacks are there?
Pinochle deck c) In a pinochle deck there are eight cards of each rank 9, 10, Jack, Queen, King,
Ace (two indistinguishable cards of each suit) and no other cards. How many
blackjacks are there, assuming order is unimportant?
4.8.3: How many poker hands are “nothing” (i.e., not one-pair or better)? You may
use the results of Examples 4.18 and 4.19 as well as the answer to Exercise 4.8.1.
4.8.4: If we toss 12 coins in sequence, how many of the possible outcomes have
a) At least 9 heads
b) At most 4 heads
c) Between 5 and 7 heads
d) Fewer than 2 or more than 10 heads
4.8.5*: How many outcomes in Chuck-a-Luck have at least one 1?
4.8.6*: How many anagrams of the word little are there in which the two t’s are
not adjacent?
Bridge 4.8.7**: A bridge hand consists of 13 of the 52 cards. We often classify hands by
“distribution,” that is, the way cards are grouped into suits. For example, a hand
of 4-3-3-3 distribution has four of one suit and three of each of the other suits. A
hand with 5-4-3-1 distribution has one suit of five cards, one of four, one of three,
and one of one card. Count the number of hands with the following distributions:
(a) 4-3-3-3 (b) 5-4-3-1 (c) 4-4-3-2 (d) 9-2-2-0.
✦
✦ ✦
✦
4.9 Introduction to Probability Theory
Probability theory, along with its general importance, has many uses in computer
science. One important application is the estimation of the running time of pro-
grams in the case of average or typical inputs. This evaluation is important for
those algorithms whose worst-case running time is very much larger than the aver-
age running time. We shall see examples of such evaluations shortly.
Another use of probability is in designing algorithms for making decisions in
the presence of uncertainty. For example, we can use probability theory to design
algorithms for making the best possible medical diagnosis from available information
or algorithms for allocating resources on the basis of expected future needs.
Probability Spaces
When we speak of a probability space, we mean a finite set of points, each of which
Experiment, represents one possible outcome of an experiment. Each point x has associated with
outcome it a nonnegative real number called the probability of x, such that the sum of the
probabilities of all the points is 1. One also speaks of probability spaces that have
infinite numbers of points, although there is little application for these in computer
science, and we shall not deal with them here.
Commonly, the points of a probability space have equal probability. Unless we
state otherwise, you may assume that the probabilities of the various points in a
probability space are equal. Thus, if there are n points in the probability space the
probability of each point is 1/n.
✦ Example 4.22. In Fig. 4.13 is a probability space with six points. The points
are each identified with one of the numbers from 1 to 6, and we may think of this
space as representing the outcomes of the “experiment” of throwing a single fair
die. That is, one of the six numbers will appear on top of the die, and each number
is equally likely to appear, that is, 1/6th of the time. ✦
Fig. 4.13. A probability space with six points, one for each of the outcomes 1 through 6 of the throw of a die.
Event Any subset of the points in a probability space is called an event. The probability
of an event E, denoted PROB(E), is the sum of the probabilities of the points in
E. If the points are all equally likely, then we can compute the probability of E by
dividing the number of points in E by the number of points in the total probability
space.
Probability Calculations
Often, the calculation of the probability of an event involves combinatorics. We
must count the number of points in the event as well as the number of points in
the entire probability space. When points are equally likely, the ratio of these two
counts is the probability of the event. We shall give a series of examples to illustrate
the calculation of probabilities in this fashion.
As an example of a probability space with infinitely many points, consider a square region of the plane.
We may suppose that any point in the square is equally likely to be chosen.
The “experiment” may be thought of as throwing a dart at the square in such a
way that the dart is equally likely to wind up anywhere within the square, but not
outside it. Although any point has only infinitesimal probability of being hit, the
probability of a region of the square is the ratio of the area of the region to the area
of the entire square. Thus, we can compute the probability of certain events.
For example, we show within the probability space an event E consisting of an
ellipse contained within the square. Let us suppose the area of the ellipse is 29%
of the area of the square. Then PROB(E) is 0.29. That is, if the dart is thrown at
random at the square, 29% of the time the dart will land within the ellipse.
✦ Example 4.23. Figure 4.14 shows the probability space representing the throw
of two dice. That is, the experiment is the tossing of two dice, in order, and observing
the numbers on their upper faces. Assuming the dice are fair, there are 36 equally
likely points, or outcomes of the experiment, so each point has probability 1/36.
Each point corresponds to the assignment of one of six values to each die. For
example, (2, 3) represents the outcome where 2 appears on the first die and 3 on
the second. Pair (3, 2) represents 3 on the first die and 2 on the second.
The outlined region represents the event “craps,” that is, a total 7 or 11 on the
two dice. There are eight points in this event, six where the total is 7 and two where
the total is 11. The probability of throwing craps is thus 8/36, or about 22%. ✦

Fig. 4.14. The event “craps” in the probability space for the toss of two dice.
✦ Example 4.24. Let us calculate the probability that a poker hand is a one-
pair hand. We learned in Example 4.8 that there are 2,598,960 different poker
hands. Consider the experiment of dealing a poker hand fairly, that is, with all
hands equally likely. Thus, the probability space for this experiment has 2,598,960
points. We also learned in Example 4.18 that 1,098,240 of these points represent
hands classified as one pair. Assuming all hands are equally likely to be dealt, the
probability of the event “one pair” is 1,098,240/2,598,960, or about 42%. ✦
✦ Example 4.25. In the game of Keno, twenty of the numbers from 1 to 80 are
selected at random. Before the selection, players may guess some numbers; we shall
Keno concentrate on the “5-spot game,” where players guess five numbers. A player who
guesses correctly three, four, or five of the twenty selected numbers is rewarded;
the amount of the payoff increases with the number of correct guesses. We shall
calculate the probability that a player guesses exactly three numbers correctly in a
5-spot game. The probabilities of guessing four or five correctly are left as exercises.
To begin, the appropriate probability space has one point for each possible
selection of twenty numbers from 1 to 80. The number of such selections is
$$\binom{80}{20} = \frac{80!}{20!\,60!}$$
The number of these points in which exactly three of the player’s five numbers are
among the twenty selected is $\binom{5}{3}\binom{75}{17}$; the factor $\binom{5}{3} = 10$
counts the ways to choose which three of the five guesses are correct, and $\binom{75}{17}$
counts the ways to choose the other seventeen selected numbers from the 75 numbers
the player did not guess. The probability of guessing exactly three correctly is therefore

$$10\left(\frac{75!}{17!\,58!}\right)\left(\frac{20!\,60!}{80!}\right)$$
Now, we find in the numerator and denominator pairs of factorials that are close
and can almost be canceled. For instance, 75! in the numerator and 80! in the
denominator can be replaced by the product of the five numbers from 80 down to
76 in the denominator. The resulting simplification is
$$\frac{10 \times 60 \times 59 \times 20 \times 19 \times 18}{80 \times 79 \times 78 \times 77 \times 76}$$
Now we have a computation that involves manageable numbers. The result is
about 0.084. That is, about 8.4% of the time the player guesses three out of five
correctly. ✦
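The same computation mechanizes nicely. In the sketch below (ours, not the book’s), the probability of guessing exactly k of the five numbers is C(5, k)·C(75, 20 − k)/C(80, 20), following the counting argument above; binomial coefficients are computed in double precision, since values like C(80, 20) are far too large for ordinary integer arithmetic.

#include <stdio.h>

/* n choose k, computed in floating point to avoid integer overflow. */
double choose(int n, int k)
{
    double result = 1.0;
    int i;

    for (i = 1; i <= k; i++)
        result *= (double)(n - k + i) / i;
    return result;
}

int main()
{
    int k;

    /* Probability of guessing exactly k of 5 numbers in the 5-spot game. */
    for (k = 3; k <= 5; k++)
        printf("P(exactly %d of 5 correct) = %.7f\n", k,
               choose(5, k) * choose(75, 20 - k) / choose(80, 20));
    return 0;
}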
Fundamental Relationships
Let us close this section by observing several important properties of probabilities.
First, if p is the probability of any event, then
0 ≤ p ≤ 1
That is, any event consists of 0 or more points, so its probability cannot be negative.
Also, no event can consist of more points than are in the entire probability space,
so its probability cannot exceed 1.
Complement event
Second, let E be an event in a probability space P . Then the complement
event Ē for E is the set of points of P that are not in event E. We may observe that
PROB(E) + PROB(Ē) = 1
or put another way, PROB(Ē) = 1 − PROB(E). The reason is that every point in P
is either in E or in Ē, but not both.
EXERCISES
4.9.1: Using the probability space for the toss of two fair dice shown in Fig. 4.14,
give the probabilities of the following events:
a) A six is thrown (i.e., the sum of the dice is 6)
b) A ten is thrown
c) The sum of the dice is odd
d) The sum of the dice is between 5 and 9
4.9.2*: Calculate the probabilities of the following events. The probability space
is the deal of two cards in order from an ordinary 52-card deck.
a) At least one card is an Ace
b) The cards are of the same rank
c) The cards are of the same suit
d) The cards are of the same rank and suit
e) The cards are the same either in rank or in suit
f) The first card is of a higher rank than the second card
4.9.3*: A dart is thrown at a one foot square on the wall, with equal likelihood of
entering the square at any point. What is the probability that the dart is thrown
✦
✦ ✦
✦
4.10 Conditional Probability
In this section we shall develop a number of formulas and strategies for think-
ing about relationships among the probabilities of several events. One important
development is a notion of independent experiments, where the outcome of one ex-
periment does not affect the outcome of others. We shall also use our techniques to
calculate probabilities in some complicated situations.
These developments depend upon a notion of “conditional probability.” Infor-
mally, if we conduct an experiment and we find that an event E has occurred, it
may or may not be the case that the point representing the outcome is also in some
other event F . Figure 4.15 suggests this situation. The conditional probability of
F given E is the probability that F has also occurred.
Fig. 4.15. The conditional probability of F given E is the probability that the
outcome is in A divided by the probability that the outcome is in A or B.
Formally, if E and F are two events in a probability space, we say the condi-
tional probability of F given E, denoted PROB(F/E), is the sum of the probabilities
of the points that are in both E and F divided by the sum of the probabilities of
the points in E. In Fig. 4.15, region A is those points that are in both E and F ,
and B is those points that are in E but not F . If all points are equally likely, then
PROB(F/E) is the number of points in region A divided by the total number of points in regions A and B together.
✦ Example 4.26. Let us consider the probability space of Fig. 4.14, which repre-
sents the toss of two dice. Let the event E be the six points in which the first die
comes out 1, and let the event F be the six points in which the second die comes
out 1. The situation is shown in Fig. 4.16. There is one point in both E and F ,
namely the point (1, 1). There are five points in E that are not in F . Thus, the
conditional probability PROB(F/E) is 1/6. That is, the chance that the second die
is 1, given that the first die is 1, is 1/6.
We may notice that this conditional probability is exactly the same as the
probability of F itself. That is, since F has 6 out of the 36 points in the space,
PROB(F ) = 6/36 = 1/6. Intuitively, the probability of throwing 1 on the second
die is not affected by the fact that 1 has been thrown on the first die. We shall
soon define the notion of “independent experiments,” such as the throwing of dice
in sequence, where the outcome of one experiment does not influence the outcome
of the others. In these cases, if E and F are events representing outcomes of the
two experiments, we expect that PROB(F/E) = PROB(F ). We have just seen an
example of this phenomenon. ✦
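For a small space of equally likely points, conditional probabilities can also be checked by brute-force enumeration. Here is a sketch of ours that counts points for the events of Example 4.26 over the 36 outcomes of Fig. 4.14.

#include <stdio.h>

int main()
{
    int d1, d2;
    int inE = 0, inEandF = 0;

    /* Enumerate the 36 equally likely points (d1, d2). */
    for (d1 = 1; d1 <= 6; d1++)
        for (d2 = 1; d2 <= 6; d2++)
            if (d1 == 1) {              /* event E: the first die is 1 */
                inE++;
                if (d2 == 1)            /* event F: the second die is 1 */
                    inEandF++;
            }

    /* PROB(F/E) is the count for E-and-F divided by the count for E. */
    printf("PROB(F/E) = %d/%d = %.4f\n", inEandF, inE,
           (double)inEandF / inE);      /* prints 1/6 = 0.1667 */
    return 0;
}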
✦ Example 4.27. Suppose our experiment is the deal of two cards, in order,
from the usual 52-card deck. The number of points in this experiment of selection
without replacement (as in Section 4.4) is 52 × 51 = 2652. We shall assume that
the deal is fair, so each point has the same probability.
Let the event E be that the first card is an Ace, and let event F be that the
second card is an Ace. Then the number of points in E is 4 × 51 = 204. That is, the
first card must be one of the four Aces, and the second card can be any of 51 cards,
excluding the Ace that was chosen first. Thus, PROB(E) = 204/2652 = 1/13. That
result matches our intuition. All 13 ranks being equally likely, we expect that one
time in 13 an Ace will appear first.
Similarly, the number of points in event F is 51 × 4 = 204. We can choose
any of the 4 Aces for the second card and any of the remaining 51 cards for the
first card. The fact that the first card is theoretically dealt first is irrelevant. There
are thus 204 outcomes in which an Ace appears in the second position. Therefore,
PROB(F ) = 1/13 just as for E. Again, this result meets our intuition that one time
in 13 an Ace will be dealt as the second card.
Now, let us compute PROB(F/E). Of the 204 points in E, there are 12 that
have an Ace in the second position and therefore are also in F . That is, all points
in E have an Ace in the first position. We may select this Ace in 4 different ways,
corresponding to the 4 suits. For each selection, we have 3 different choices of
Ace for the second position. Thus, the number of choices of two Aces with order
considered is 4 × 3 according to the technique of Section 4.4.
Therefore, the conditional probability PROB(F/E) is 12/204, or 1/17. We no-
tice that the conditional probability of F given E is not the same as the probability
of F in this example. That also makes intuitive sense. The probability of getting an
Ace in the second position goes down when we know there is an Ace in the first po-
sition. For then, there are only 3 Aces remaining out of 51 cards, and 3/51 = 1/17.
In comparison, if we know nothing about the first card, there are 4 Aces out of 52
that we may receive as the second card. ✦

Fig. 4.16. The events E (the first die shows 1) and F (the second die shows 1) in the probability space of Fig. 4.14.
Independent Experiments
As we suggested in Examples 4.23, 4.26, and 4.27, we sometimes form a probability
space representing the outcomes of two or more experiments. In the simplest cases,
Joint probability space
the points in this joint probability space are lists of outcomes, one for each of
the experiments. Figure 4.16 is an example of a probability space that is joint
between two experiments. In other situations, where there is a connection between
the outcomes of the experiments, the joint space may have some points missing.
Example 4.27 discussed such a case, where the joint space represented the deal of
two cards and the pairs of outcomes in which the two cards are identical are not
possible.
There is an intuitive notion of the outcome of one experiment X being “inde-
pendent” of previous experiments in the sequence, meaning that the probabilities
of the various outcomes of X do not depend on the outcomes of the previous ex-
periments. Thus, in Example 4.26 we argued that the roll of the second die is
independent of the roll of the first die, while in Example 4.27 we saw that the
second card dealt was not independent of the first, since the first card was then
unavailable.
In defining independence, we shall focus on two experiments. However, since
either experiment may itself be the sequence of several experiments, we effectively
cover the case of many experiments. We must begin with a probability space that
represents the outcome of two successive experiments, X1 and X2 .
✦ Example 4.28. Figure 4.14 illustrates a joint probability space in which ex-
periment X1 is the throw of the first die and X2 is the throw of the second die.
Here, every pair of outcomes is represented by one point, and the points have equal
probability, 1/36.
In Example 4.27 we discussed a space of 2652 points representing the selection
of two cards in order. This space consists of all pairs (C, D) in which C and D are
cards, and C ≠ D. Again, each of these points has the same probability, 1/2652. ✦
✦ Example 4.29. In Fig. 4.16, E is E1 , the event of all points for which the
outcome of the first experiment is 1. Likewise, F is the event F1 , whose points are
those where the outcome of the second experiment is 1. More generally, each row
corresponds to one of the six possible outcomes of the first experiment, and each
column corresponds to one of the six possible outcomes of the second experiment. ✦
Next, introduce factor ri in both the numerator and denominator in each term of
the sum above. The result is
$$\mathrm{PROB}(E) = \sum_{i=1}^{k} \left(\frac{e_i}{r_i}\right)\left(\frac{r_i}{n}\right)$$
Now, notice that ri /n = PROB(Ri ); that is, ri /n is the fraction of the entire space
in region Ri . Also, ei /ri is PROB(E/Ri ), the conditional probability of event E
given event Ri . Put another way, ei /ri is the fraction of the points in region Ri
that are in E. The result is the following formula for the probability of an event E.
$$\mathrm{PROB}(E) = \sum_{i=1}^{k} \mathrm{PROB}(E/R_i)\,\mathrm{PROB}(R_i) \qquad (4.10)$$
Informally, the probability of E is the sum over all regions of the probability of
being in that region times the probability of E within that region.
2 A “region” is synonymous with an “event,” that is, a subset of a probability space. However,
we shall use the term region to emphasize the fact that the space is being partitioned into
events that completely cover the space and do not overlap.
✦ Example 4.31. The diagram in Fig. 4.17 suggests how Equation (4.10) is to
be applied. There we see a probability space that has been divided vertically into
three regions, R1 , R2 , and R3 . There is an event E, which we doubly outline. We
let a through f be the numbers of points in the six sets shown.
Fig. 4.17. A probability space divided into three regions R1 , R2 , and R3 , with an event E doubly outlined; a through f are the numbers of points in the six sets into which E and the regions divide the space.
✦ Example 4.32. Let us use Equation (4.10) to compute the probability of the
event E that two cards dealt in order are both Aces. The probability space is the
2652 points discussed in Example 4.27. We shall divide this space into two regions:
R1 : Those points in which the first card is an Ace. There are 4 × 51 = 204 such
points, since we may pick the first card to be an Ace in 4 ways, and there are
then 51 choices of second card.
R2 : The remaining 2448 points.
✦ Example 4.33. Let us apply Equation (4.10) to the problem of computing the
probability of the event E that at least one 1 appears in the toss of three dice, as in
the game Chuck-a-Luck described in Example 4.16. First, we must understand that
the notion of an “outcome” described in that example does not match the notion
of a point in a probability space. In Example 4.16 we established that there were
56 different “outcomes,” which were defined to be the numbers of occurrences of 1
through 6 on the faces of the dice. For instance, “a 4, a 5, and a 6” is one possible
outcome; “two 3’s and a 4” is another. However, not all outcomes in this sense have
the same probability. In particular, outcomes with three different numbers showing
have twice the probability of outcomes with two of one number and six times the
probability of an outcome where all three dice show the same number.
While we could use the probability space whose points are “outcomes” in
the sense of Example 4.16, it is more natural to consider the order in which
the dice are rolled and thus develop a probability space whose points have equal
probability. There are 6^3 = 216 different outcomes corresponding to the roll of
three dice, in order, and each of these outcomes has probability 1/216.
We could calculate the probability of at least one 1 in a direct manner, without
using Equation (4.10). First, calculate the number of rolls in which no 1 appears.
We can assign to each of the three dice any of the numbers from 2 to 6. There are
thus 5^3 = 125 points in the space that have no 1, and 216 − 125 = 91 points that
do have a 1. Therefore, PROB(E) = 91/216, or about 42%.
The approach above is short but requires that we use several “tricks.” An-
other way to calculate the probability “by force” is to divide the space into three
regions, corresponding to the cases in which there are one, two, or three different
numbers showing. Let Ri be the region of points with i different numbers. We
can calculate the probabilities of the various regions as follows. For R1 there are
only six points, one for each of the numbers 1 through 6 that may appear on all
three dice. For R3 there are 6 × 5 × 4 = 120 ways to select three different numbers
out of six, according to the rule of Section 4.4. Thus, R2 must have the remain-
ing 216 − 6 − 120 = 90 points.3 The probabilities of the regions are PROB(R1 ) =
6/216 = 1/36, PROB(R2 ) = 90/216 = 5/12, and PROB(R3 ) = 120/216 = 5/9.
Next, we can calculate the conditional probabilities. If there are three numbers
out of the possible six showing, the probability is 1/2 that one of them is 1. If two
3 We can compute this number directly by multiplying 6 ways that we can choose the number
to appear twice, times 5 ways we can pick the number on the remaining die, times $\binom{3}{1} = 3$
ways we can pick the die that has the unique number. Note that 6 × 5 × 3 = 90.
numbers are showing, the probability is 1/3 that 1 appears at least once. If only
one number shows, there is a 1/6 chance that it is 1. Thus, PROB(E/R1 ) = 1/6,
PROB(E/R2 ) = 1/3, and PROB(E/R3 ) = 1/2. We can put all these probabilities
together to evaluate (4.10). The result is
PROB(E) = (1/6)(1/36) + (1/3)(5/12) + (1/2)(5/9)
= 1/216 + 5/36 + 5/18 = 91/216
This fraction agrees with the direct calculation, of course. If we see the “trick” of
the direct calculation, that approach is considerably easier. However, breaking the
problem into regions is frequently a more reliable way to guarantee success. ✦
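Since the space of ordered rolls has only 216 points, the figure 91/216 is also easy to confirm by exhaustive enumeration; a sketch of ours:

#include <stdio.h>

int main()
{
    int d1, d2, d3, count = 0;

    /* Enumerate all 6*6*6 = 216 equally likely ordered rolls. */
    for (d1 = 1; d1 <= 6; d1++)
        for (d2 = 1; d2 <= 6; d2++)
            for (d3 = 1; d3 <= 6; d3++)
                if (d1 == 1 || d2 == 1 || d3 == 1)
                    count++;

    printf("rolls with at least one 1: %d out of 216\n", count);  /* 91 */
    return 0;
}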
When all k regions have the same probability, 1/k, Equation (4.10) simplifies to

$$\mathrm{PROB}(E) = \frac{1}{k}\sum_{i=1}^{k} \mathrm{PROB}(E/R_i) \qquad (4.11)$$

A useful way to look at (4.11) is that the probability of E is the average over all
regions of the probability of E given we are in that region.
Now, consider a probability space that represents the outcomes of two inde-
pendent experiments X1 and X2 . We may divide this space into k regions, each the
set of points for which the outcome of X1 has a particular value. Then each of the
regions has the same probability, 1/k.
Suppose we want to calculate the probability of the event E in which X1 has
the outcome a and X2 has the outcome b. We may use formula (4.11). If Ri is not
the region corresponding to outcome a for X1 , then PROB(E/Ri ) = 0. Thus, all
but the term for the region of a drops out of (4.11). If Ra is that region, we get
PROB(E) = (1/k)PROB(E/Ra )    (4.12)
What is PROB(E/Ra )? It is the probability that X1 has outcome a and X2 has
outcome b, given that X1 has outcome a. Since we are given that X1 has outcome
a, PROB(E/Ra ) is just the probability that X2 has outcome b given that X1 has
outcome a. Since X1 and X2 are independent, PROB(E/Ra ) is just the probability
that X2 has outcome b. If there are m possible outcomes for X2 , then PROB(E/Ra )
is 1/m. Then (4.12) becomes
PROB(E) = (1/k)(1/m)
We may generalize the above reasoning to any number of experiments. To do
so, we let experiment X1 be a sequence of experiments and show by induction on
the total number of independent experiments that the probability of all having a
particular sequence of outcomes is the product of the probabilities of each outcome.
✦ Example 4.34. The probability that the last four digits of a phone number
are 1234 is 0.0001. The selection of each digit is an experiment with ten possible
outcomes: 0 through 9. Moreover, each selection is independent of the other se-
lections, since we are performing a “selection with replacement” as in Section 4.2.
The probability that the first digit is 1 is 1/10. Similarly, the probability that the
second digit is 2 is 1/10, and likewise for the other two digits. The probability of
the event that the four digits are 1234 in order is (1/10)4 = 0.0001. ✦
EXERCISES
4.10.1: Using the space of Fig. 4.14, give the conditional probabilities of the fol-
lowing pairs of events.
a) The second die is even, given that the first die is odd
b) The first die is even, given that the second die is at least 3
c) The sum of the dice is at least 7, given that the first die is 4
d) The second die is 3, given that the sum of the dice is 8
4.10.2: Divide the Chuck-a-Luck (see Example 4.16) probability space into three
regions, as in Example 4.33. Use this division and Equation 4.10 to compute the
probability that
a) There are at least two 1’s showing
b) All three dice are 1
c) There is exactly one 1 showing
4.10.3: Show that in Chuck-a-Luck, the probability of any event in which all three
dice have different values is twice the probability of any event where one number
appears exactly twice and six times the probability of any event in which all three
dice show the same number.
4.10.6*: Consider the set of sequences of seven letters chosen from W and L. We may
think of these sequences as representing the outcomes of a match of seven games,
where W means the first team wins the game and L means the second team wins the
game. The match is won by the first team to win four games (thus, some games
may never get played, but we need to include their hypothetical outcomes in the
points in order that we have a probability space of equally likely points).
a) What is the probability that a team will win the match, given that it has won
the first game?
b) What is the probability that a team will win the match, given that it has won
the first two games?
c) What is the probability that a team will win the match, given that it has won
two out of the first three games?
4.10.7**: There are three prisoners, A, B, and C. They are told one and only one
is to be shot and that the guard knows who. A asks the guard to tell him the name
of one of the other prisoners who will not be shot. The guard answers that B will
not be shot.
A reasons that either he or C will be shot, so the probability that A will be
shot is 1/2. On the other hand, reasons A, no matter who is to be shot, the guard
knows somebody besides A who will not be shot, so he always has an answer to
A’s question. Therefore, the asking and answering of the question provides no
information about whether or not A is to be shot, so the probability that A will be
shot is still 1/3, as it was before the question was asked.
What is the true probability that A will be shot after the sequence of events
described above? Hint : You need to construct an appropriate probability space,
one that represents not only the experiment in which a prisoner is chosen to be shot
but also the possibility that the guard has a choice of whether to answer “B” or
“C,” and the experiment in which he chooses one if so.
Bayes’ Rule The formula

$$\mathrm{PROB}(R_j/E) = \frac{\mathrm{PROB}(E/R_j)\,\mathrm{PROB}(R_j)}{\sum_{i=1}^{k} \mathrm{PROB}(E/R_i)\,\mathrm{PROB}(R_i)}$$

is called Bayes’ Rule. It gives a value for the probability of Rj given
that E has been observed. For Example 4.31, calculate PROB(R1 /E), PROB(R2 /E),
and PROB(R3 /E) using Bayes’ rule.
✦
✦ ✦
✦
4.11 Probabilistic Reasoning
An important application of probability in computing is in the design of systems
that predict events. An example is a medical diagnosis system. Ideally, the process
of diagnosis consists of performing tests or observing symptoms until the outcome of
the tests and presence or absence of certain symptoms is sufficient for the physician
to determine what disease the patient has. In practice, however, diagnoses are
rarely certain. Rather, a diagnosis is the most likely disease, or a disease whose
conditional probability, given the outcomes of the experiments that are the tests
and observations of symptoms, is highest.
Let us consider an overly simple example that has the flavor of diagnosis using
probability. Suppose it is known that when a patient has a headache, the probability
that he has the flu is 50%. That is,
PROB(Flu/Headache) = 0.5
In the above, we interpret Flu as the name of an event that can be interpreted as
“the patient has the flu.” Similarly, Headache is the name of the event that the
patient complains of a headache.
Suppose we also know that when the patient’s temperature is measured at 102
(Fahrenheit) or above, the probability is 60% that he has the flu. If we allow Fever
to be the name of the event that the patient’s temperature is at least 102, then we
can write this observation as
PROB(Flu/Fever) = 0.6
Now, consider the following diagnosis situation. A patient comes to the doctor
complaining of a headache. The doctor takes his temperature and finds it is 102.
What is the probability that the patient has the flu?
The situation is suggested by Fig. 4.18. There we see the three events Flu,
Headache, and Fever, which together divide the space into 8 regions, which we
indicate by letters a through h. For example, c is the event that the patient has a
headache and flu, but not a fever.
The given information about probabilities puts some constraints on the sizes
of the events in Fig. 4.18. Let us use the letters a through h as standing not only
for the regions indicated in Fig. 4.18 but as the probabilities of those events. Then
the condition that PROB(Flu/Headache) = 0.5 says that the sum of regions c + f
is half the total size of the headache event, or put another way:
c+f = d+g (4.13)
Similarly, the fact that PROB(Flu/Fever) = 0.6 says that e + f is 3/5 of the total
size of the Fever event, or:
e + f = (3/2)(g + h)    (4.14)
Now, let us interpret the question: what is the probability of flu, given both a
fever and a headache? The fact that there is both fever and headache says that we
are either in region f or region g. In region f the diagnosis of flu is correct, and in
g it is not. Thus, the probability of flu is f /(f + g).
What is the value of f /(f + g)? The answer may be surprising. We have
absolutely no information about the probability of flu; it could be 0 or 1 or anything
in between. Here are two examples of how the points of the probability space of
Fig. 4.18 could actually be distributed.
Fig. 4.18. The events Flu, Headache, and Fever divide the probability space into eight regions, a through h.
✦ Example 4.35. Suppose that in Fig. 4.18 the probabilities associated with the
various events are: d = f = 0.3, a = h = 0.2, and the four other regions have
probability 0. Note that these values satisfy the constraining equations (4.13) and
(4.14). In this example, f /(f + g) = 1; that is, a patient with both a headache and
fever is certain to have the flu. Then the probability space of Fig. 4.18 actually
looks like that of Fig. 4.19. There we see that whenever a patient has both a fever
and headache, he has the flu, and conversely, whenever he has the flu, he has both
fever and headache.4 ✦
4 There are also examples where b ≠ 0, that is, where one can have the flu and yet have
neither fever nor headache, and still f /(f + g) = 1.
Fig. 4.19. Example of space where Fever and Headache guarantee Flu.
Fig. 4.20. Example of space where Fever and Headache guarantee no Flu.
✦ Example 4.37. Referring again to the situation of Fig. 4.18, suppose we are
told that at any time, 2% of the population has a fever and 3% of the population
has a headache. That is, the size of the event Fever is 0.02, and the size of the event
Headache is 0.03. What fraction of the population has either a fever or a headache,
or both?
The answer is that between 3% and 5% of the population has at least one. To
see why, let us do some calculation in terms of the eight regions defined in Fig. 4.18.
We can generalize our explorations of Example 4.37 to any two events. The
Rule for sums rule for sums is as follows. If E and F are any two events, and G is the event that
either E or F or both occurs, then
max(PROB(E), PROB(F )) ≤ PROB(G) ≤ PROB(E) + PROB(F )    (4.18)
That is, the probability of E-or-F is between the larger of the probabilities of E
and F and the sum of those probabilities.
The same idea holds within any other event H. That is, all the probabilities
in (4.18) may be made conditional on event H, giving us the more general rule
max(PROB(E/H), PROB(F/H)) ≤ PROB(G/H) ≤ PROB(E/H) + PROB(F/H)    (4.19)
✦ Example 4.38. Suppose, in the scenario of Fig. 4.18 we are told that 70% of all
people with the flu have a fever, and 80% of all people with the flu have a headache.
Then in (4.19), Flu is the event H, E is the event Fever, F is Headache, and G is
Headache-or-Fever. We are told that PROB(E/H) = PROB(Fever/Flu) = 0.7, and
PROB(F/H) = PROB(Headache/Flu) = 0.8.
Rule (4.19) says that PROB(G/H) is at least the larger of 0.7 and 0.8. That is,
if you have the flu, then the probability that you have a fever or headache or both
is at least 0.8. Rule (4.19) also says that PROB(G/H) is at most
PROB(E/H) + PROB(F/H)
or 0.7 + 0.8 = 1.5. However, that upper bound is not useful. We know that no event
can have probability greater than 1, so 1 is a better upper bound on PROB(G/H). ✦
There is an analogous rule for products. If E and F are any two events, and G
is now the event that both E and F occur, then
PROB(E) + PROB(F ) − 1 ≤ PROB(G) ≤ min(PROB(E), PROB(F ))
As with the rule for sums, the same idea applies to probabilities that are
conditional upon some other event H. That is,
PROB(E/H) + PROB(F/H) − 1 ≤ PROB(G/H) ≤ min(PROB(E/H), PROB(F/H))    (4.20)
✦ Example 4.39. Again referring to Fig. 4.18, suppose that 70% of those with
the flu have a fever and 80% have a headache. How many have both a fever and
a headache? According to (4.20) with H the event Flu, the probability of both a
fever and headache, given that the person has the flu, is at least 0.7 + 0.8 − 1 = 0.5
and at most min(0.7, 0.8) = 0.7. ✦
Fig. 4.21. If E and F are independent events with probabilities p and q, then the event E-and-F has probability pq, and the event E-or-F has probability p + q − pq.
✦ Example 4.40. For the case n = 2, such as Example 4.35, we need only
give three conditional probabilities. Thus we might assert, as we did previously,
that PROB(Flu/Fever) = 0.6 and PROB(Flu/Headache) = 0.5. Then, we might add
information such as PROB(Flu/Fever-and-Headache) = 0.9. ✦
✦ Example 4.41. Suppose we state that whenever one has the flu, one is sure to
have a headache. In terms of Fig. 4.18, we are saying that regions b and e are empty.
Suppose also that whenever one has the flu, one also has a fever. Then region c
of Fig. 4.18 is also empty. Figure 4.22 suggests the simplification to Fig. 4.18 that
results from these two assumptions.
Under the conditions that b, c, and e are all 0, and again assuming that
PROB(Flu/Headache) = 0.5 and PROB(Flu/Fever) = 0.6, we can rewrite Equations
(4.13) and (4.14) as
f = d + g
f = (3/2)(g + h)
Since d and h are both at least 0, the first equation says f ≥ g and the second says
f ≥ 3g/2.
Now, let us see what we know about the probability of flu, given both fever and
headache, that is, PROB(Flu/Fever-and-Headache). This conditional probability
in either Fig. 4.18 or Fig. 4.22 is f /(f + g). Since f ≥ 3g/2, we conclude that
f /(f + g) ≥ 0.6. That is, the probability is at least 0.6 that a patient with headache
and fever has the flu. ✦
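The inequality in the last step is worth making explicit. For fixed g, the ratio f /(f + g) increases with f, so f ≥ (3/2)g gives

$$\frac{f}{f+g} \;\ge\; \frac{(3/2)g}{(3/2)g + g} \;=\; \frac{3}{5} = 0.6$$

and when g = 0 the ratio is 1, so the bound holds in that case as well.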
Fig. 4.22. Simplification of Fig. 4.18 when regions b, c, and e are empty.
We can generalize Example 4.41 to apply to any three events, two of which are
implied by the third. Suppose E, F , and G are events, and
PROB(E/G) = PROB(F/G) = 1
That is, whenever G occurs, E and F are certain to occur as well. Suppose further
that PROB(G/E) = p and PROB(G/F ) = q. Then
PROB(G/E-and-F ) ≥ max(p, q) (4.21)
The reason Equation (4.21) holds can be seen from Fig. 4.22, if we interpret Flu
as G, Fever as E, and Headache as F . Then p = f /(f + g + h) and q = f /(f + g + d).
Since d and h are at least 0, it follows that p ≤ f /(f + g) and q ≤ f /(f + g). But
f /(f + g) is PROB(G/E-and-F ). Thus, this conditional probability is equal to or
greater than the larger of p and q.
EXERCISES
4.11.1: Generalize the rule of sums and the rule of products to more than two
events. That is, if E1 , E2 , . . . , En are events with probabilities p1 , p2 , . . . , pn , re-
spectively,
a) What can we say about the probability that at least one of the n events occur?
b) What can we say about the probability that all n events occur?
Recall that Ē is the complement event for E and F̄ is the complement event for F .
c) Suppose we are also told that whenever it is going to be cold at night, the
sunlight sensor reads high, and the temperature drops at least 4 degrees after
sunset, i.e., PROB(High/Cold) and PROB(Dropping/Cold) are both 1. Give
upper and lower limits on PROB(Cold/High-and-Dropping).
d**) Under the same assumption as part (c), give upper and lower limits on
PROB(Cold/High-or-Dropping)
Note that this problem requires reasoning not covered in the section.
4.11.4*: In many situations, such as Example 4.35, two or more events mutually
reinforce a conclusion. That is, we expect intuitively that whatever
PROB(Flu/Headache)
may be, being told that the patient has a fever as well as a headache increases
Reinforcing the probability of flu. Say that event E reinforces event F in the conclusion G
events if PROB(G/E-and-F ) ≥ PROB(G/F ). Show that if events E and F each reinforce
the other in the conclusion G, then Equation (4.21) holds. That is, the conditional
probability of G given both E and F is at least the larger of PROB(G/E) and PROB(G/F ).
✦
✦ ✦
✦
4.12 Expected Value Calculations
Commonly, the possible outcomes of an experiment have associated values. In this
section, we shall use simple gambling games as examples, where money is won or lost
depending on the outcome of the experiment. In the next section, we shall discuss
more complex examples from computer science, where we compute the expected
running time of certain algorithms.
Payoff function Suppose we have a probability space and a payoff function f on the points of
that space. The expected value of f is the sum over all points x of f (x)PROB(x).
We denote this value by EV(f ). When all points are equally likely, we can compute
the expected value EV(f ) by
1. Summing f (x) for all x in the space and then
2. Dividing by the number of points in the space.
Mean value The expected value is sometimes called the mean value and it can be thought of as
a “center of gravity.”
✦ Example 4.42. Suppose the space is the six points representing the outcomes
of the throw of a fair die. These points are naturally thought of as integers 1
through 6. Let the payoff function be the identity function; that is, f (i) = i for
i = 1, 2, . . . , 6. Then the expected value of f is
EV(f ) = (f (1) + f (2) + f (3) + f (4) + f (5) + f (6))/6
= (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 3.5
That is, the expected value of the number on a die is 3.5.
As another example, let g be the payoff function g(i) = i2 . Then, for the same
experiment — the throw of one die — the expected value of g is
EV(g) = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)/6
= (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6 = 15.17
Informally, the expected value for the square of the number thrown on a die is
15.17. ✦
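The two averages are immediate to compute by machine as well; a sketch of ours:

#include <stdio.h>

int main()
{
    int i;
    double sumf = 0.0, sumg = 0.0;

    /* Six equally likely points; payoffs f(i) = i and g(i) = i*i. */
    for (i = 1; i <= 6; i++) {
        sumf += i;
        sumg += (double)i * i;
    }
    printf("EV(f) = %.4f\n", sumf / 6.0);   /* 3.5 */
    printf("EV(g) = %.4f\n", sumg / 6.0);   /* about 15.17 */
    return 0;
}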
In the common case where the points have equal probability, let nv be the
number of points in Ev , the event that the payoff function takes the value v, and let
n be the total number of points in the space. Then PROB(Ev ) is nv /n, and we may write

$$\mathrm{EV}(f) = \sum_{v} v\,n_v/n$$
✦ Example 4.44. In Example 4.25, we introduced the game of Keno, and com-
puted the probability of guessing three out of five correctly. Now let us compute the
expected value of the payoff in the 5-spot game of Keno. Recall that in the 5-spot
game, the player guesses five numbers from 1 to 80. When the game is played,
twenty numbers from 1 to 80 are selected. The player wins if three or more of those
twenty numbers are among the five he selected.
However, the payoff depends on how many of the player’s five numbers are
correct. Typically, for a $1 bet, the player receives $2 if he guesses three out of five
(i.e., the player has a net gain of $1). If he guesses four out of five, he receives $15,
and for guessing all five, he is rewarded with $300. If he guesses fewer than three
correctly, the player receives nothing, and loses his $1.
In Example 4.25, we calculated the probability of guessing three out of five to be
0.08394 (to four significant places). We can similarly calculate that the probability
of guessing four out of five is 0.01209, and the probability of guessing all five is
0.0006449. Then, the probability of guessing fewer than three is 1 minus these
three fractions, or 0.90333. The payoffs for fewer than three, for three, four, and
five are −1, +1, +14, and +299, respectively. Thus, we may apply formula (4.22)
to get the expected payoff of the 5-spot game of Keno. It is
0.90333 × −1 + 0.08394 × 1 + 0.01209 × 14 + 0.0006449 × 299 = −0.4573
Thus, the player loses almost 46 cents of every dollar he bets in this game. ✦
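As a check on this arithmetic, the expected value is just the probability-weighted sum of the net payoffs. A sketch of ours, with the probabilities and payoffs from the text hard-coded:

#include <stdio.h>

int main()
{
    /* Net payoffs and probabilities for the 5-spot game: fewer than
       three correct, then exactly three, four, and five correct. */
    double prob[4]   = {0.90333, 0.08394, 0.01209, 0.0006449};
    double payoff[4] = {-1.0, 1.0, 14.0, 299.0};
    double ev = 0.0;
    int i;

    for (i = 0; i < 4; i++)
        ev += prob[i] * payoff[i];
    printf("expected payoff = %.4f dollars\n", ev);  /* about -0.4573 */
    return 0;
}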
EXERCISES
4.12.1: Show that if we throw three dice, the expected number of 1’s that will
appear is 1/2.
4.12.2*: Since we win when there is a 1 and lose when there is not, why does
the fact in Exercise 4.12.1 not imply that Chuck-a-Luck is an even game (i.e., the
expected payoff by betting on 1, or any other number, is 0)?
4.12.3: Suppose that in a 4-spot game of Keno, where the player guesses four
numbers, the payout is as follows: for guessing two, $1 (i.e., the player gets his
dollar back); for guessing three, $4; for guessing all four, $50. What is the expected
value of the payout?
4.12.4: Suppose in a 6-spot game of Keno, the payouts are as follows: for guessing
three, $1; for four, $4; for five, $25; for guessing all six, $1000. What is the expected
value of the payout?
4.12.5: Suppose we play a Chuck-a-Luck type of game with six dice. The player
pays $1 to play, bets on a number, and throws the dice. He is rewarded with $1 for
every time his selected number appears. For instance, if it appears once, the net
Fair game payout is 0; if it appears twice, the net payout is +1, and so on. Is this a fair game
(i.e., is the expected value of the payout 0)?
4.12.6*: Based on the style of payout suggested by Exercise 4.12.5, we could modify
the payout of the standard 3-dice form of Chuck-a-Luck so that the player pays some
amount to play. He is then rewarded with $1 for every time his number appears.
What is the proper amount the player should pay in order that this be a fair game?
✦
✦ ✦
✦
4.13 Some Programming Applications of Probability
In this section, we shall consider two types of uses for probability calculations in
computer science. The first is an analysis of the expected running time of an
algorithm. The second is a new type of algorithm, often called a “Monte Carlo”
algorithm, because it takes a risk of being incorrect. As we shall see, by adjusting
parameters, it is possible to make Monte-Carlo algorithms correct with as high
a probability as we like, except that we cannot reach probability 1, or absolute
certainty.
A Probabilistic Analysis
Let us consider the following simple problem. Suppose we have an array of n
integers, and we ask whether an integer x is an entry of the array A[0..n-1]. The
algorithm in Fig. 4.23 does as well as any. Note that it returns a type which we
called BOOLEAN, defined to be a synonym for int in Section 1.6. Also in that section
were defined the constants TRUE and FALSE, which stand for 1 and 0, respectively.
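A minimal sketch of the kind of function the discussion describes, using the BOOLEAN, TRUE, and FALSE definitions of Section 1.6 (the function name find is ours, and the line numbers the text refers to are marked in comments):

typedef int BOOLEAN;    /* as in Section 1.6 */
#define TRUE  1
#define FALSE 0

/* Return TRUE if x appears in A[0..n-1], and FALSE otherwise. */
BOOLEAN find(int x, int A[], int n)
{
    int i;

    for (i = 0; i < n; i++)     /* line (1) */
        if (A[i] == x)          /* line (2) */
            return TRUE;        /* line (3) */
    return FALSE;               /* line (4) */
}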
The loop of lines (1) – (3) examines each entry of the array, and if x is found
there, immediately terminates the loop with TRUE as our answer. If x is never found,
we reach line (4) and return FALSE. Let the time taken by the body of the loop
and the loop incrementation and test be c. Let d be the time of line (4) and the
initialization of the loop. Then if x is not found, the running time of the function
of Fig. 4.23 is cn + d, which is O(n).
However, suppose x is found; what is the running time of Fig. 4.23 then?
Clearly, the earlier x is found, the less time it takes. If x were somehow guaranteed
to be in A[0], the time would be O(1), since the loop would iterate only once. But
if x were always at or near the end, the time would be O(n).
Surely the worst case is when we find x at the last step, so O(n) is a smooth
and tight upper bound on the worst case. However, is it possible that the average
case is much better than O(n)? In order to address this question, we need to define
a probability space whose points represent the possible places in which x can be
found. The simplest assumption is that x is equally likely to be placed in any of
the n entries of array A. If so, then our space has n points, one representing each
of the integers from 0 to n − 1, which are the bounds of the index of array A.
Our question then becomes: in this probability space, what is the expected
value of the running time of the function of Fig. 4.23? Consider a point i in the
space; i can be anything from 0 to n − 1. If x is in A[i], then the loop will iterate
i + 1 times. An upper bound on the running time is thus ci + d. This bound is off
slightly in the constant d, since line (4) is never executed. However, the difference
does not matter, since d will disappear when we translate to a big-oh expression
anyway.
We must thus find the expected value of the function f (i) = ci + d on this
probability space. We sum ci + d where i ranges from 0 to n − 1, and then divide
by n, the number of points. That is,
$$\mathrm{EV}(f) = \Big(\sum_{i=0}^{n-1} (ci + d)\Big)\Big/n = \big(cn(n-1)/2 + dn\big)\big/n = c(n-1)/2 + d$$
For large n, this expression is about cn/2. Thus, O(n) is the smooth and tight
upper bound on this expected value. That is, the expected value is, to within a
constant factor of about 2, the same as the worst case. This result makes intuitive
sense. If x is equally likely to be anywhere in the array, it will “typically” be half
way down the array, and we therefore do about half the work that would be done
if x were not in the array at all, or if it were at the last element.
[Figure: the probability space of a Monte-Carlo algorithm, showing the region where the algorithm says “true” and the region where it says “I don’t know” although the correct answer is “true.”]
1. If the correct answer is “false,” then the algorithm surely answers “false”; there
is no chance of failure in this case.
2. If the correct answer is “true,” then the algorithm answers “false” with proba-
bility (1 − p)^n, which we assume is very small because n is chosen large enough
to make it small. The algorithm answers “true” with probability 1 − (1 − p)^n,
which is presumably very close to 1.
Thus, there are no failures when the correct answer is “false” and very few failures
when the correct answer is “true.”
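The structure assumed by this discussion can be sketched as follows. The sketch is ours; test() is a hypothetical stand-in for whatever basic randomized test is being repeated. We assume it never says “true” when the correct answer is “false,” and says “true” with probability at least p when the correct answer is “true.”

typedef int BOOLEAN;
#define TRUE  1
#define FALSE 0

/* Hypothetical basic test; see the assumptions stated above. */
extern BOOLEAN test(void);

/* Repeat the basic test n times. If the correct answer is "false,"
   we always answer FALSE; if it is "true," we answer FALSE only
   when all n trials fail, which happens with probability (1-p)^n. */
BOOLEAN monteCarlo(int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (test())
            return TRUE;
    return FALSE;
}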
EXERCISES
A[0]<A[2], the test would not be independent, since knowing the first two hold we
can be sure the third holds.
4.13.6**: Suppose we are given an array of size n filled with integers in the range
1 to n. These integers were either selected to be different, or they were selected at
random and independently, so we can expect some equalities among entries in the
array. Give a Monte-Carlo algorithm that has running time O(√n) and has at most
a probability of 10−6 of saying the array was filled with distinct integers when in
fact it was filled at random.
✦
✦ ✦
✦
4.14 Summary of Chapter 4
The reader should remember the following formulas and paradigm problems for
counting.
✦ We can select k items out of n and arrange the k selected items in any order in
n!/(n − k)! different ways. The paradigm problem is choosing the win, place,
and show horses (k = 3) in a race with n horses.
✦ The number of ways to select m objects out of n, without order, is $\binom{n}{m}$, or
n!/(m!(n − m)!). The paradigm problem is dealing poker hands, where n = 52
and m = 5.
✦ If we want to order n items, some of which are identical, the number of ways to
do so is computed as follows. Start with n!. Then, if one value appears k > 1
times among the n items, divide by k!. Perform this division for each value
that appears more than once. The paradigm problem is counting anagrams of
a word of length n, where we must divide n! by k! for each letter that appears
k times in the word and k > 1.
✦ The number of ways to distribute n identical objects to m bins is $\binom{n+m-1}{n}$.
The paradigm problem is distributing apples to children.

✦ If, as above, we distribute objects to bins, but the objects are not all identical,
we count the number of ways
to distribute them to bins as follows. Start with (n + m − 1)!/(m − 1)!. Then,
if there is a group of k identical objects, and k > 1, divide by k!. Perform the
division for each value that appears more than once. The paradigm problem is
distributing fruits of various kinds to children.
In addition, the reader should remember the following points about probability.
✦
✦ ✦
✦
4.15 Bibliographic Notes for Chapter 4
A venerable and excellent introduction to combinatorics is Liu [1968]. Graham,
Knuth, and Patashnik [1989] is a deeper discussion of the subject. Feller [1968] is
the classic book on probability theory and its applications.
The Monte-Carlo algorithm for testing whether a number is a prime is from
Rabin [1976]. A discussion of this algorithm and other interesting issues involving
computer security and algorithms that use randomness in an important way can be
found in Dewdney [1993]. A more advanced discussion of these topics is presented
in Papadimitriou [1994].
Dewdney, A. K. [1993]. The Turing Omnibus, Computer Science Press, New York.
Feller, W. [1968]. An Introduction to Probability Theory and Its Applications,
Third Edition, Wiley, New York.
Graham, R. L., D. E. Knuth, and O. Patashnik [1989]. Concrete Mathematics: A
Foundation for Computer Science, Addison-Wesley, Reading, Mass.
Liu, C.-L. [1968]. An Introduction to Combinatorial Mathematics, McGraw-Hill,
New York.
Papadimitriou, C. H. [1994]. Computational Complexity, Addison-Wesley, Reading,
Mass.
Rabin, M. O. [1976]. “Probabilistic algorithms,” in Algorithms and Complexity:
New Directions and Recent Trends (J. F. Traub, ed.), pp. 21–39, Academic Press,
New York.
CHAPTER 5
✦
✦ ✦
✦
The Tree
Data Model
There are many situations in which information has a hierarchical or nested struc-
ture like that found in family trees or organization charts. The abstraction that
models hierarchical structure is called a tree and this data model is among the most
fundamental in computer science. It is the model that underlies several program-
ming languages, including Lisp.
Trees of various types appear in many of the chapters of this book. For in-
stance, in Section 1.3 we saw how directories and files in some computer systems
are organized into a tree structure. In Section 2.8 we used trees to show how lists
are split recursively and then recombined in the merge sort algorithm. In Section
3.7 we used trees to illustrate how simple statements in a program can be combined
to form progressively more complex statements.
✦
✦ ✦
✦
5.1 What This Chapter Is About
✦ The priority queue, which is a set to which elements can be added, but from
which only the maximum element can be deleted at any one time. An efficient
data structure, called a partially ordered tree, is introduced for implementing
priority queues, and an O(n log n) algorithm, called heapsort, for sorting n
elements is derived using a balanced partially ordered tree data structure, called
a heap (Sections 5.9 and 5.10).
✦
✦ ✦
✦
5.2 Basic Terminology
Nodes and edges
Trees are sets of points, called nodes, and lines, called edges. An edge connects two
distinct nodes. To be a tree, a collection of nodes and edges must satisfy certain
properties; Fig. 5.1 is an example of a tree.
Root 1. In a tree, one node is distinguished and called the root. The root of a tree is
generally drawn at the top. In Fig. 5.1, the root is n1 .
Parent and child
2. Every node c other than the root is connected by an edge to some one other
node p called the parent of c. We also call c a child of p. We draw the parent
of a node above that node. For example, in Fig. 5.1, n1 is the parent of n2 , n3 ,
and n4 , while n2 is the parent of n5 and n6 . Said another way, n2 , n3 , and n4
are children of n1 , while n5 and n6 are children of n2 .
All nodes are connected to the root
3. A tree is connected in the sense that if we start at any node n other than the
root, move to the parent of n, to the parent of the parent of n, and so on, we
eventually reach the root of the tree. For instance, starting at n7 , we move to
its parent, n4 , and from there to n4 ’s parent, which is the root, n1 .
Fig. 5.1. A tree: the root n1 has children n2 , n3 , and n4 ; n5 and n6 are the children of n2 , and n7 is the child of n4 .
BASIS. A single node n is a tree. We say that n is the root of this one-node tree.
INDUCTION. Let r be a new node and let T1 , T2 , . . . , Tk be one or more trees with
roots c1 , c2 , . . . , ck , respectively. We require that no node appear more than once in
the Ti ’s; and of course r, being a “new” node, cannot appear in any of these trees.
We form a new tree T from r and T1 , T2 , . . . , Tk as follows:
a) Make r the root of tree T .
b) Add an edge from r to each of c1 , c2 , . . . , ck , thereby making each node ci a child of the root r.
✦ Example 5.1. We can use this recursive definition to construct the tree in Fig.
5.1. This construction also verifies that the structure in Fig. 5.1 is a tree. The nodes
n5 and n6 are each trees themselves by the basis rule, which says that a single node
can be considered a tree. Then we can apply the inductive rule to create a new
tree with n2 as the root r, and the tree T1 , consisting of n5 alone, and the tree T2 ,
consisting of n6 alone, as children of this new root. The nodes c1 and c2 are n5 and
n6 , respectively, since these are the roots of the trees T1 and T2 . As a result, we
can conclude that the structure
n2
n5 n6
is a tree; its root is n2 .
Similarly, n7 alone is a tree by the basis, and by the inductive rule, the structure
n4
n7
is a tree; its root is n4 .
Node n3 by itself is a tree. Finally, if we take the node n1 as r, and n2 , n3 , and
n4 as the roots of the three trees just mentioned, we create the structure in Fig.
5.1, verifying that it indeed is a tree. ✦
✦ Example 5.3. In Fig. 5.1, all seven nodes are descendants of n1 , and n1 is an
ancestor of all nodes. Also, all nodes but n1 itself are proper descendants of n1 ,
and n1 is a proper ancestor of all nodes in the tree but itself. The ancestors of n5
are n5 , n2 , and n1 . The descendants of n4 are n4 and n7 . ✦
Sibling Nodes that have the same parent are sometimes called siblings. For example,
in Fig. 5.1, nodes n2 , n3 , and n4 are siblings, and n5 and n6 are siblings.
Subtrees
In a tree T , a node n, together with all of its proper descendants, if any, is called
a subtree of T . Node n is the root of this subtree. Notice that a subtree satisfies
the three conditions for being a tree: it has a root, all other nodes in the subtree
have a unique parent in the subtree, and by following parents from any node in the
subtree, we eventually reach the root of the subtree.
✦ Example 5.4. Referring again to Fig. 5.1, node n3 by itself is a subtree, since
n3 has no descendants other than itself. As another example, nodes n2 , n5 , and n6
form a subtree, with root n2 , since these nodes are exactly the descendants of n2 .
However, the two nodes n2 and n6 by themselves do not form a subtree without
node n5 . Finally, the entire tree of Fig. 5.1 is a subtree of itself, with root n1 . ✦
✦ Example 5.5. In Fig. 5.1, the leaves are n5 , n6 , n3 , and n7 . The nodes n1 , n2 ,
and n4 are interior. ✦
✦ Example 5.6. In Fig. 5.1, node n1 has height 2, n2 has height 1, and leaf n3
has height 0. In fact, any leaf has height 0. The tree in Fig. 5.1 has height 2. The
depth of n1 is 0, the depth of n2 is 1, and the depth of n5 is 2. ✦
Ordered Trees
Optionally, we can assign a left-to-right order to the children of any node. For
example, the order of the children of n1 in Fig. 5.1 is n2 leftmost, then n3 , then n4 .
This left-to-right ordering can be extended to order all the nodes in a tree. If m
and n are siblings and m is to the left of n, then all of m’s descendants are to the
left of all of n’s descendants.
✦ Example 5.7. In Fig. 5.1, the nodes of the subtree rooted at n2 — that is, n2 ,
n5 , and n6 — are all to the left of the nodes of the subtrees rooted at n3 and n4 .
Thus, n2 , n5 , and n6 are all to the left of n3 , n4 , and n7 . ✦
In a tree, take any two nodes x and y neither of which is an ancestor of the
other. As a consequence of the definition of “to the left,” one of x and y will be
to the left of the other. To tell which, follow the paths from x and y toward the
root. At some point, perhaps at the root, perhaps lower, the paths will meet at
some node z as suggested by Fig. 5.2. The paths from x and y reach z from two
different nodes m and n, respectively; it is possible that m = x and/or n = y, but
it must be that m ≠ n, or else the paths would have converged somewhere below z.
Fig. 5.2. The paths from x and y toward the root converge at a node z,
which they reach from the distinct nodes m and n, respectively.
✦ Example 5.8. Since no leaf can be an ancestor of another leaf, it follows that
all leaves can be ordered “from the left.” For instance, the order of the leaves in
Fig. 5.1 is n5 , n6 , n3 , n7 . ✦
Labeled Trees
A labeled tree is a tree in which a label or value is associated with each node of the
tree. We can think of the label as the information associated with a given node.
The label can be something as simple as a single integer or as complex as
the text of an entire document. We can change the label of a node, but we cannot
change the name of a node.
If the name of a node is not important, we can represent a node by its label.
However, the label does not always provide a unique name for a node, since several
nodes may have the same label. Thus, many times we shall draw a node with both
its label and its name. The following paragraphs illustrate the concept of a labeled
tree and offer some samples.
(Figure: expression trees. A root labeled + with subtrees T1 and T2 represents a
sum; a root labeled − with the single subtree T1 represents a negation.)
Expressions (i), (ii), and (v) are single operands, and so the basis rule tells us that
the trees of Fig. 5.4(a), (b), and (e), respectively, represent these expressions. Note
that each of these trees consists of a single node to which we have given a name —
n1 , n2 , and n5 , respectively — and a label, which is the operand in the circle.
Fig. 5.4. Expression trees: (a) for x, the single node n1 ; (b) for 10, the single
node n2 ; (c) for (x + 10), with root n3 labeled +; (d) for (−(x + 10)), with root
n4 labeled −; (e) for y, the single node n5 ; (f) for (y × (−(x + 10))), with root
n6 labeled ×.
formed by applying unary − to expression (iii), so that the tree for (−(x + 10)),
shown in Fig. 5.4(d), has a root labeled − above the tree for (x + 10). Finally, the
tree for the expression (y × (−(x + 10))), shown in Fig. 5.4(f), has a root labeled
×, whose children are the roots of the trees of Fig. 5.4(e) and (d), in that order. ✦
(Fig. 5.5: the tree used in the exercises below, with fifteen nodes numbered 1
through 15. Below the root 1, successive levels contain the nodes 2, 3; 4, 5, 6, 7;
8, 9, 10, 11, 12; 13, 14; and 15.)
EXERCISES
5.2.1: In Fig. 5.5 we see a tree. Tell what is described by each of the following
phrases:
a) The root of the tree
b) The leaves of the tree
c) The interior nodes of the tree
d) The siblings of node 6
e) The subtree with root 5
f) The ancestors of node 10
g) The descendants of node 10
h) The nodes to the left of node 10
i) The nodes to the right of node 10
j) The longest path in the tree
k) The height of node 3
l) The depth of node 13
m) The height of the tree
5.2.2: Can a leaf in a tree ever have any (a) descendants? (b) proper descendants?
5.2.3: Prove that in a tree no leaf can be an ancestor of another leaf.
5.2.4*: Prove that the two definitions of trees in this section are equivalent. Hint :
To show that a tree according to the nonrecursive definition is a tree according to
the recursive definition, use induction on the number of nodes in the tree. In the
opposite direction, use induction on the number of rounds used in the recursive
definition.
5.2.5: Suppose we have a graph consisting of four nodes, r, a, b, and c. Node r is
an isolated node and has no edges connecting it. The remaining three nodes form
a cycle; that is, we have an edge connecting a and b, an edge connecting b and c,
and an edge connecting c and a. Why is this graph not a tree?
5.2.6: In many kinds of trees, there is a significant distinction between the interior
nodes and the leaves (or rather the labels of these two kinds of nodes). For example,
in an expression tree, the interior nodes represent operators, and the leaves represent
atomic operands. Give the distinction between interior nodes and leaves for each of
the following kinds of trees:
a) Trees representing directory structures, as in Section 1.3
b) Trees representing the splitting and merging of lists for merge sort, as in Section
2.8
c) Trees representing the structure of a function, as in Section 3.7
5.2.7: Give expression trees for the following expressions. Note that, as is cus-
tomary with expressions, we have omitted redundant parentheses. You must first
restore the proper pairs of parentheses, using the customary rules for precedence
and associativity of operators.
a) (x + 1) × (x − y + 4)
b) 1+2+3+4+5+6
c) 9×8+7×6+5
5.2.8: Show that if x and y are two distinct nodes in an ordered tree, then exactly
one of the following conditions must hold:
a) x is a proper ancestor of y
b) x is a proper descendant of y
c) x is to the left of y
d) x is to the right of y
✦
✦ ✦
✦
5.3 Data Structures for Trees
Many data structures can be used to represent trees. Which one we should use
depends on the particular operations we want to perform. As a simple example, if
all we ever want to do is to locate the parents of nodes, then we can represent each
node by a structure consisting of a label plus a pointer to the structure representing
the parent of that node.
As a general rule, the nodes of a tree can be represented by structures in which
the fields link the nodes together in a manner similar to the way in which the nodes
are connected in the abstract tree; the tree itself can be represented by a pointer to
the root’s structure. Thus, when we talk about representing trees, we are primarily
interested in how the nodes are represented.
One distinction in representations concerns where the structures for the nodes
“live” in the memory of the computer. In C, we can create the space for struc-
tures for nodes by using the function malloc from the standard library stdlib.h,
in which case nodes “float” in memory and are accessible only through pointers.
Alternatively, we can create an array of structures and use elements of the array to
represent nodes. Again nodes can be linked according to their position in the tree,
but it is also possible to visit nodes by walking down the array. We can thus access
nodes without following a path through the tree. The disadvantage of an array-
based representation is that we cannot create more nodes than there are elements
in the array. In what follows, we shall assume that nodes are created by malloc,
although in situations where there is a limit on how large trees can grow, an array
of structures of the same type is a viable, and possibly preferred, alternative.
(Figure: a node consisting of an info field and an array of pointers
p0 , p1 , . . . , pbf−1 to its children, where bf is the branching factor.)
(Trie with ten nodes n1 , . . . , n10 , each showing its letter and isWord bit: the
root n1 has children h (n2 ) and s (n3 ); n2 has children e (n4 , isWord 1) and
i (n5 ); n3 has the child h (n6 ); n4 has the child r (n7 ); n5 has the child s
(n8 , isWord 1); n6 has the child e (n9 , isWord 1); and n7 has the child s
(n10 , isWord 1).)
Fig. 5.7. Trie for words he, hers, his, and she.
If we do not distinguish between upper- and lower-case, and words contain no special characters
such as apostrophes, then we can take the branching factor to be 26. The type of
a node, including the two label fields, can be defined as in Fig. 5.8. In the array
children, we assume that the letter a is represented by index 0, the letter b by
index 1, and so on.
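Fig. 5.8 itself is not reproduced in this copy. A plausible sketch of the node type
it defines, consistent with the two label fields and the children array just
described (the macro name BF is our own), is:

    #define BF 26    /* branching factor: one child slot per letter */

    typedef struct NODE *pNODE;
    struct NODE {
        char letter;          /* label field 1: the letter on this node */
        int isWord;           /* label field 2: 1 if a word ends at this node */
        pNODE children[BF];   /* children[0] for a, children[1] for b, etc. */
    };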
The abstract trie of Fig. 5.7 can be represented by the data structure of Fig.
5.9. We represent nodes by showing the first two fields, letter and isWord, along
with those elements of the array children that have non-NULL pointers. In the
children array, for each non-NULL element, the letter indexing the array is shown
in the entry above the pointer to the child, but that letter is not actually present
in the structure. Note that the letter field of the root is irrelevant. ✦
(Fig. 5.9: the trie of Fig. 5.7 represented by nodes with letter, isWord, and
children fields, drawn as described in the preceding paragraph.)
✦ Example 5.11. In Fig. 5.1, n3 is the right sibling of n2 , n4 is the right sibling
of n3 , and n4 has no right sibling. We would find the children of n1 by following
its leftmost-child pointer to n2 , then the right-sibling pointer to n3 , and then the
right-sibling pointer of n3 to n4 . There, we would find a NULL right-sibling pointer
and know that n1 has no more children.
Figure 5.10 contains a sketch of the leftmost-child–right-sibling representation
for the tree in Fig. 5.1. The downward arrows are the leftmost-child links; the
sideways arrows are the right-sibling links. ✦
(Fig. 5.10: the tree of Fig. 5.1 with downward leftmost-child links from n1 to n2 ,
from n2 to n5 , and from n4 to n7 , and sideways right-sibling links
n2 → n3 → n4 and n5 → n6 .)
The field info holds the label associated with the node and it can have any
type. The fields leftmostChild and rightSibling point to the leftmost child
and right sibling of the node in question. Note that while leftmostChild gives
information about the node itself, the field rightSibling at a node is really part
of the linked list of children of that node’s parent.
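The definition that this paragraph refers to is not reproduced here; a minimal
sketch, with int chosen arbitrarily for the info field, would be:

    typedef struct NODE *pNODE;
    struct NODE {
        int info;                            /* the label; any type could be used */
        pNODE leftmostChild, rightSibling;   /* the links described above */
    };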
✦ Example 5.12. Let us represent the trie of Fig. 5.7 in the leftmost-child–right-
sibling form. First, the type of nodes is
typedef struct NODE *pNODE;
struct NODE {
char letter;
int isWord;
pNODE leftmostChild, rightSibling;
};
The first two fields represent information, according to the scheme described in
Example 5.10. The trie of Fig. 5.7 is represented by the data structure shown
in Fig. 5.11. Notice that each leaf has a NULL leftmost-child pointer, and each
rightmost child has a NULL right-sibling pointer.
(Fig. 5.11: the trie of Fig. 5.7 in leftmost-child–right-sibling form. Each leaf has
a NULL leftmost-child pointer, and each rightmost child has a NULL right-sibling
pointer.)
Parent Pointers
Sometimes, it is useful to include in the structure for each node a pointer to the
parent. The root has a NULL parent pointer. For example, the structure of Example
5.12 could become
typedef struct NODE *pNODE;
struct NODE {
char letter;
int isWord;
pNODE leftmostChild, rightSibling, parent;
};
With this structure, it becomes possible to determine what word a given node
represents. We repeatedly follow parent pointers until we come to the root, which
we can identify because it alone has the value of parent equal to NULL. The letter
fields encountered along the way spell the word, backward.
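As a sketch of this process, the following hypothetical function (assuming the
structure just defined and <stdio.h>) prints the word that a given node
represents; recursing on the parent before printing reverses the backward
spelling into the correct order:

    void printWord(pNODE n)
    {
        if (n->parent == NULL)   /* the root carries no letter */
            return;
        printWord(n->parent);    /* print the ancestors' letters first */
        putchar(n->letter);      /* then this node's letter */
    }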
EXERCISES
5.3.1: For each node in the tree of Fig. 5.5, indicate the leftmost child and right
sibling.
✦
✦ ✦
✦
5.4 Recursions on Trees
The usefulness of trees is highlighted by the number of recursive operations on trees
that can be written naturally and cleanly. Figure 5.13 suggests the general form of a
recursive function F (n) that takes a node n of a tree as argument. F first performs
some steps (perhaps none), which we represent by action A0 . Then F calls itself
on the first child, c1 , of n. During this recursive call, F will “explore” the subtree
rooted at c1 , doing whatever it is F does to a tree. When that call returns to the
call at node n, some other action — say A1 — is performed. Then F is called on
the second child of n, resulting in exploration of the second subtree, and so on, with
actions at n alternating with calls to F on the children of n.
(Fig. 5.13: a node n with children c1 , c2 , . . . , ck . F performs action A0 at n,
calls itself on c1 , performs A1 , calls itself on c2 , and so on, ending with action
Ak after the call on ck .)
✦ Example 5.13. A simple recursion on a tree produces what is known as the
preorder listing of the node labels of the tree. Here, action A0 prints the label of the
node, and the other actions do nothing other than some “bookkeeping” operations
that enable us to visit each child of a given node. The effect is to print the labels
as we would first meet them if we started at the root and circumnavigated the
tree, visiting all the nodes in a counterclockwise tour. Note that we print the
label of a node only the first time we visit that node. The circumnavigation is
suggested by the arrow in Fig. 5.14, and the order in which the nodes are visited is
+a + ∗ − b − c − ∗d ∗ +. The preorder listing is the sequence of node labels +a ∗ −bcd.
Let us suppose that we use a leftmost-child–right-sibling representation of nodes
in an expression tree, with labels consisting of a single character. The label of an
interior node is the arithmetic operator at that node, and the label of a leaf is
(Fig. 5.14: expression tree whose root + has children a and ∗; the node ∗ has
children − and d, and the node − has children b and c. An arrow suggests the
counterclockwise circumnavigation.)
a letter standing for an operand. Nodes and pointers to nodes can be defined as
follows:
typedef struct NODE *pNODE;
struct NODE {
char nodeLabel;
pNODE leftmostChild, rightSibling;
};
The function preorder is shown in Fig. 5.15. In the explanation that follows, it is
convenient to think of pointers to nodes as if they were the nodes themselves.
void preorder(pNODE n)
{
pNODE c; /* a child of node n */
(1) printf("%c\n", n->nodeLabel);
(2) c = n->leftmostChild;
(3) while (c != NULL) {
(4) preorder(c);
(5) c = c->rightSibling;
}
}
Action “A0 ” consists of the following parts of the program in Fig. 5.15:
1. Printing the label of node n, at line (1),
2. Initializing c to be the leftmost child of n, at line (2), and
3. Performing the first test for c != NULL, at line (3).
Line (2) initializes a loop in which c becomes each child of n, in turn. Note that if
n is a leaf, then c is assigned the value NULL at line (2).
We go around the while-loop of lines (3) to (5) until we run out of children
of n. For each child, we call the function preorder recursively on that child, at
line (4), and then advance to the next child, at line (5). Each of the actions Ai ,
for i ≥ 1, consists of line (5), which moves c through the children of n, and the
test at line (3) to see whether we have exhausted the children. These actions are
for bookkeeping only; in comparison, line (1) in action A0 does the significant step,
printing the label.
The sequence of events for calling preorder on the root of the tree in Fig. 5.14
is summarized in Fig. 5.16. The character at the left of each line is the label of the
node n at which the call of preorder(n) is currently being executed. Because no
two nodes have the same label, it is convenient here to use the label of a node as
its name. Notice that the characters printed are +a ∗ −bcd, in that order, which is
the same as the order of circumnavigation. ✦
call preorder(+)
(+) print +
(+) call preorder(a)
(a) print a
(+) call preorder(∗)
(∗) print ∗
(∗) call preorder(−)
(−) print −
(−) call preorder(b)
(b) print b
(−) call preorder(c)
(c) print c
(∗) call preorder(d)
(d) print d
✦ Example 5.14. Another common way to order the nodes of the tree, called
postorder, corresponds to circumnavigating the tree as in Fig. 5.14 but listing a
node the last time it is visited, rather than the first. For instance, in Fig. 5.14, the
postorder listing is abc − d ∗ +.
To produce a postorder listing of the nodes, the last action does the printing,
and so a node’s label is printed after the postorder listing function is called on all
of its children, in order from the left. The other actions initialize the loop through
the children or move to the next child. Note that if a node is a leaf, all we do is list
the label; there are no recursive calls.
If we use the representation of Example 5.13 for nodes, we can create postorder
listings by the recursive function postorder of Fig. 5.17. The action of this function
when called on the root of the tree in Fig. 5.14 is shown in Fig. 5.18. The same
convention regarding node names is used here as in Fig. 5.16. ✦
void postorder(pNODE n)
{
pNODE c; /* a child of node n */
(1) c = n->leftmostChild;
(2) while (c != NULL) {
(3) postorder(c);
(4) c = c->rightSibling;
}
(5) printf("%c\n", n->nodeLabel);
}
call postorder(+)
(+) call postorder(a)
(a) print a
(+) call postorder(∗)
(∗) call postorder(−)
(−) call postorder(b)
(b) print b
(−) call postorder(c)
(c) print c
(−) print −
(∗) call postorder(d)
(d) print d
(∗) print ∗
(+) print +
Evaluating an Expression Tree

BASIS. For a leaf we produce the integer value of the node as the value of the tree.
INDUCTION. Suppose we wish to compute the value of the expression formed by
the subtree rooted at some node n. We evaluate the subexpressions for the two
subtrees rooted at the children of n; these are the values of the operands for the
operator at n. We then apply the operator labeling n to the values of these two
subtrees, and we have the value of the entire subtree rooted at n.
int eval(pNODE n)
{
int val1, val2; /* values of first and second subtrees */
(1) if (n->op == 'i') /* n points to a leaf */
(2) return n->value;
else {/* n points to an interior node */
(3) val1 = eval(n->leftmostChild);
(4) val2 = eval(n->leftmostChild->rightSibling);
(5) switch (n->op) {
(6) case ’+’: return val1 + val2;
(7) case ’-’: return val1 - val2;
(8) case ’*’: return val1 * val2;
(9) case ’/’: return val1 / val2;
}
}
}
If the node n is a leaf, then the test of line (1) succeeds and we return the
integer label of that leaf at line (2). If the node is not a leaf, then we evaluate its
left operand at line (3) and its right operand at line (4), storing the results in val1
and val2, respectively. Note in connection with line (4) that the second child of a
node n is the right sibling of the leftmost child of the node n. Lines (5) through (9)
form a switch statement, in which we decide what the operator at n is and apply
the appropriate operation to the values of the left and right operands.
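The type of the nodes on which eval operates is not reproduced in this copy; a
reconstruction consistent with the fields op and value used above (with op holding
'i' at a leaf that carries an integer operand) is:

    typedef struct NODE *pNODE;
    struct NODE {
        char op;       /* 'i' at an operand leaf; else '+', '-', '*', or '/' */
        int value;     /* the integer at a leaf; unused at interior nodes */
        pNODE leftmostChild, rightSibling;
    };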
(Fig. 5.20: expression tree for 5 + (10 − 3) ∗ 2. The root + has children 5 and ∗;
the node ∗ has children − and 2; the node − has children 10 and 3.)
For instance, consider the expression tree of Fig. 5.20. We see in Fig. 5.21
the sequence of calls and returns that are made at each node during the evaluation
of this expression. As before, we have taken advantage of the fact that labels are
unique and have named nodes by their labels. ✦
call eval(+)
(+) call eval(5)
(5) return 5
(+) call eval(∗)
(∗) call eval(−)
(−) call eval(10)
(10) return 10
(−) call eval(3)
(3) return 3
(−) return 7
(∗) call eval(2)
(2) return 2
(∗) return 14
(+) return 19
Fig. 5.21. Actions of function eval at each node on tree of Fig. 5.20.
We can translate this definition (the height of a leaf is 0, and the height of an
interior node is 1 more than the largest of the heights of its children) into a
recursive program that computes the height of each node and stores it in a field
height.
This program is shown in Fig. 5.22. We assume that nodes are structures of
the form
typedef struct NODE *pNODE;
struct NODE {
int height;
pNODE leftmostChild, rightSibling;
};
The function computeHt takes a pointer to a node as argument and computes the
height of that node in the field height. If we call this function on the root of a
tree, it will compute the heights of all the nodes of that tree.
At line (1) we initialize the height of n to 0. If n is a leaf, we are done, because
the test of line (3) will fail immediately, and so the height of any leaf is computed
to be 0. Line (2) sets c to be (a pointer to) the leftmost child of n. As we go around
the loop of lines (3) through (7), c becomes each child of n in turn. We recursively
compute the height of c at line (4). As we proceed, the value in n->height will
be 1 greater than the height of the highest child seen so far, but 0 if we have not
seen any children. Thus, lines (5) and (6) allow us to increase the height of n if we
void computeHt(pNODE n)
{
pNODE c;
(1) n->height = 0;
(2) c = n->leftmostChild;
(3) while (c != NULL) {
(4) computeHt(c);
(5) if (c->height >= n->height)
(6) n->height = 1+c->height;
(7) c = c->rightSibling;
}
}
Fig. 5.22. Procedure to compute the height of all the nodes of a tree.
find a new child that is higher than any previous child. Also, for the first child, the
test of line (5) will surely be satisfied, and we set n->height to 1 more than the
height of the first child. When we fall out of the loop because we have seen all the
children, n->height has been set to 1 more than the maximum height of any of n’s
children. ✦
EXERCISES
5.4.1: Write a recursive program to count the number of nodes in a tree that is
represented by leftmost-child and right-sibling pointers.
5.4.2: Write a recursive program to find the maximum label of the nodes of a tree.
Assume that the tree has integer labels, and that it is represented by leftmost-child
and right-sibling pointers.
5.4.3: Modify the program in Fig. 5.19 to handle trees containing unary minus
nodes.
5.4.4*: Write a recursive program that computes for a tree, represented by leftmost-
child and right-sibling pointers, the number of left-right pairs, that is, pairs of nodes
n and m such that n is to the left of node m. For example, in Fig. 5.20, node 5 is to
the left of the nodes labeled ∗, −, 10, 3, and 2; node 10 is to the left of nodes 3 and
2; node 3 is to the left of node 2; and node − is to the left of node 2. Thus, the
answer for this tree is nine pairs.
Hint : Let your recursive function return two pieces of information when called on
a node n: the number of left-right pairs in the subtree rooted at n, and also the
number of nodes in the subtree rooted at n.
5.4.5: List the nodes of the tree in Fig. 5.5 (see the Exercises for Section 5.2) in
(a) preorder and (b) postorder.
5.4.6: For each of the expressions
i) (x + y) ∗ (x + z)
ii) (x − y) ∗ z + (y − w) ∗ x
iii) (a ∗ x + b) ∗ x + c ∗ x + d ∗ x + e ∗ x + f
do the following:
a) Construct the expression tree.
b) Find the equivalent prefix expression.
c) Find the equivalent postfix expression.
5.4.7: Convert the expression ab + c ∗ de − /f + from postfix to (a) infix and (b)
prefix.
5.4.8: Write a function that “circumnavigates” a tree, printing the name of a node
each time it is passed.
5.4.9: What are the actions A0 , A1 , and so forth, for the postorder function in Fig.
5.17? (“Actions” are as indicated in Fig. 5.13.)
✦
✦ ✦
✦
5.5 Structural Induction
(Figure: a tree T whose root has children c1 , c2 , . . . , ck , the roots of the subtrees
T1 , T2 , . . . , Tk .)

Fig. 5.24. The body of the function eval(n) from Fig. 5.19; lines (1) through (9)
are as shown there.
STATEMENT S(T ): The value returned by eval when called on the root of T
equals the value of the arithmetic expression represented by T .
BASIS. For the basis, T consists of a single node. That is, the argument n is a
(pointer to a) leaf. Since the op field has the value ’i’ when the node represents
an operand, the test of line (1) in Fig. 5.24 succeeds, and the value of that operand
is returned at line (2).
Fig. 5.25. A node n labeled +, with subtrees T1 and T2 ; the call eval(n)
returns the sum of the values of T1 and T2 .
If we examine the switch statement of lines (5) through (9), we see that what-
ever operator appears at the root n is applied to the two values val1 and val2. For
example, if the root holds +, as in Fig. 5.25, then at line (6) the value returned is
val1 + val2, as it should be for an expression that is the sum of the expressions of
trees T1 and T2 . We have now completed the inductive step.
We conclude that S(T ) holds for all expression trees T , and, therefore, the
function eval correctly evaluates trees that represent expressions. ✦
✦ Example 5.18. Now let us consider the function computeHt of Fig. 5.22, the
body of which we reproduce as Fig. 5.26. This function takes as argument a (pointer
to a) node n and computes the height of n. We shall prove the following statement
by structural induction:
(1) n->height = 0;
(2) c = n->leftmostChild;
(3) while (c != NULL) {
(4) computeHt(c);
(5) if (c->height >= n->height)
(6) n->height = 1+c->height;
(7) c = c->rightSibling;
}
Fig. 5.26. The body of the function computeHt(n) from Fig. 5.22.
STATEMENT S(T ): When computeHt is called on (a pointer to) the root of
tree T , the correct height of each node of T is stored in the height field of
that node.

BASIS. If the tree T is a single node n, then at line (2) of Fig. 5.26, c will be given
the value NULL, since n has no children. Thus, the test of line (3) fails immediately,
and the body of the while-loop is never executed. Since line (1) sets n->height to
0, which is the correct value for a leaf, we conclude that S(T ) holds when T has a
single node.
INDUCTION. Now suppose n is the root of a tree T that is not a single node. Then
n has at least one child. We may assume by the inductive hypothesis that when
computeHt(c) is called at line (4), the correct height is installed in the height field
of each node in the subtree rooted at c, including c itself. We need to show that
the while-loop of lines (3) through (7) correctly sets n->height to 1 more than
the maximum of the heights of the children of n. To do so, we need to perform
another induction, which is nested “inside” the structural induction, just as one loop
might be nested within another loop in a program. This induction is an “ordinary”
induction, not a structural induction, and its statement is
STATEMENT S ′ (i): After the loop of lines (3) to (7) has been executed i times,
the value of n->height is 1 more than the largest of the heights of the first
i children of n.
BASIS. The basis is i = 1. Since n->height is set to 0 outside the loop — at line
(1) — and surely no height can be less than 0, the test of line (5) will be satisfied.
Line (6) sets n->height to 1 more than the height of its first child.
INDUCTION. Assume that S ′ (i) is true. That is, after i iterations of the loop,
n->height is 1 larger than the largest height among the first i children. If there is
an (i + 1)st child, then the test of line (3) will succeed and we execute the body an
(i + 1)st time. The test of line (5) compares the new height with the largest of the
previous heights. If the new height, c->height, is less than 1 plus the largest of
the first i heights, no change to n->height will be made. That is correct, since the
maximum height of the first i + 1 children is the same as the maximum height of the
first i children. However, if the new height is greater than the previous maximum,
then the test of line (5) will succeed, and n->height is set to 1 more than the height
of the (i + 1)st child, which is correct.
We can now return to the structural induction. When the test of line (3) fails,
we have considered all the children of n. The inner induction, S ′ (i), tells us that
when i is the total number of children, n->height is 1 more than the largest height
of any child of n. That is the correct height for n. The inductive hypothesis S
applied to each of the children of n tells us that the correct height has been stored
in each of their height fields. Since we just saw that n’s height has also been
correctly computed, we conclude that all the nodes in T have been assigned their
correct height.
We have now completed the inductive step of the structural induction, and we
conclude that computeHt correctly computes the height of each node of every tree
on which it is called. ✦
tree making S false. Thus, we know that S(T1 ), S(T2 ), . . . , S(Tk ) are all true. The
inductive step, which we assume proved, tells us that S(T0 ) is also true. Again we
contradict the assumption that T0 violates S.
We have considered the two possible cases, a tree of one node or a tree with
more than one node, and have found that in either case, T0 cannot be a violation
of S. Therefore, S has no violations, and S(T ) must be true for all trees T .
EXERCISES
2 The branching factor and the degree are related concepts, but not the same. The branching
factor is the maximum degree of any node in the tree.
5.5.7**: Show the converse of Exercise 5.5.6: every tree in the nonrecursive sense
is a tree in the recursive sense.
✦
✦ ✦
✦
5.6 Binary Trees
This section presents another kind of tree, called a binary tree, which is different
from the “ordinary” tree introduced in Section 5.2. In a binary tree, a node can
have at most two children, and rather than counting children from the left, there
are two “slots,” one for a left child and the other for a right child. Either or both
slots may be empty.
(Fig. 5.27: the two binary trees with two nodes, each with root n1 . In the first,
n2 is the left child of the root; in the second, n2 is the right child.)
✦ Example 5.19. Figure 5.27 shows two binary trees. Each has node n1 as root.
The first has n2 as the left child of the root and no right child. The second has no
left child, and has n2 as the right child of the root. In both trees, n2 has neither a
left nor a right child. These are the only binary trees with two nodes. ✦
INDUCTION. If r is a node, and T1 and T2 are binary trees, then there is a binary
tree with root r, left subtree T1 , and right subtree T2 , as suggested in Fig. 5.28.
That is, the root of T1 is the left child of r, unless T1 is the empty tree, in which
case r has no left child. Similarly, the root of T2 is the right child of r, unless T2 is
empty, in which case r has no right child.
(Fig. 5.28: a binary tree with root r, left subtree T1 , and right subtree T2 .)
✦ Example 5.20. Figure 5.29 shows the five shapes that a binary tree of three
nodes can have. In each binary tree in Fig. 5.29, n3 is a descendant of n1 , and there
is a path from n1 to n3 . Node n3 is a leaf in each tree, while n2 is a leaf in the
middle tree and an interior node in the other four trees.
The height of n3 is 0 in each tree, while the height of n1 is 2 in all but the
middle tree, where the height of n1 is 1. The height of each tree is the same as the
height of n1 in that tree. Node n3 is of depth 2 in all but the middle tree, where it
is of depth 1. ✦
There is another technical difference. While trees are defined to have at least one
node, it is convenient to include the empty tree, the tree with no nodes, among the
binary trees.
(Fig. 5.29: the five shapes of a binary tree with the three nodes n1 , n2 , and n3 ,
as described in Example 5.20.)
we can add a pointer to the parent. Note that the type of the parent pointer is
*NODE, or equivalently TREE.
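The type definitions this paragraph refers to are not reproduced in this copy; a
reconstruction consistent with the preorder function of Fig. 5.32 below is:

    typedef struct NODE *TREE;
    struct NODE {
        char nodeLabel;
        TREE leftChild, rightChild;   /* a parent field of type TREE may be added */
    };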
{
action A0 ;
recursive call on left subtree;
action A1 ;
recursive call on right subtree;
action A2 ;
}
(Fig. 5.31: binary expression tree for a + (b − c) ∗ d. The root + has left child a
and right child ∗; the node ∗ has left child − and right child d; the node − has
children b and c.)
void preorder(TREE t)
{
(1) if (t != NULL) {
(2) printf("%c\n", t->nodeLabel);
(3) preorder(t->leftChild);
(4) preorder(t->rightChild);
}
}
Assuming these definitions for nodes and trees, Fig. 5.32 shows a recursive function
that lists the labels of the nodes of a binary tree T in preorder.
The behavior of this function is similar to that of the function of the same name
in Fig. 5.15 that was designed to work on ordinary trees. The significant difference
is that when the function of Fig. 5.32 comes to a leaf, it calls itself on the (missing)
left and right children. These calls return immediately, because when t is NULL,
none of the body of the function except the test of line (1) is executed. We could
save the extra calls if we replaced lines (3) and (4) of Fig. 5.32 by
(3) if (t->leftChild != NULL) preorder(t->leftChild);
(4) if (t->rightChild != NULL) preorder(t->rightChild);
However, that would not protect us against a call to preorder from another func-
tion, with NULL as the argument. Thus, we would have to leave the test of line (1)
in place for safety. ✦
EXERCISES
5.6.1: Write a function that prints an inorder listing of the (labels of the) nodes
of a binary tree. Assume that the nodes are represented by records with left-child
and right-child pointers, as described in this section.
5.6.2: Write a function that takes a binary expression tree and prints a fully paren-
thesized version of the represented expression. Assume the same data structure as
in Exercise 5.6.1.
5.6.3*: Repeat Exercise 5.6.2 but print only the needed parentheses, assuming the
usual precedence and associativity of arithmetic operators.
5.6.4: Write a function that produces the height of a binary tree.
5.6.5: Define a node of a binary tree to be full if it has both a left and a right
child. Prove by structural induction that the number of full nodes in a binary tree
is 1 fewer than the number of leaves.
5.6.6: Suppose we represent a binary tree by the left-child, right-child record type.
Prove by structural induction that the number of NULL pointers is 1 greater than
the number of nodes.
Inorder Traversals
In addition to preorder and postorder listings of binary trees, there is another
ordering of nodes that makes sense for binary trees only. An inorder listing of
the nodes of a binary tree is formed by listing each node after exploring the left
subtree, but before exploring the right subtree (i.e., in the position for action A1
of Fig. 5.30). For example, on the tree of Fig. 5.31, the inorder listing would be
a + b − c ∗ d.
A preorder traversal of a binary tree that represents an expression produces
the prefix form of that expression, and a postorder traversal of the same tree pro-
duces the postfix form of the expression. The inorder traversal almost produces the
ordinary, or infix, form of an expression, but the parentheses are missing. That is,
the tree of Fig. 5.31 represents the expression a + (b − c) ∗ d, which is not the same
as the inorder listing, a + b − c ∗ d, but only because the necessary parentheses are
missing from the latter.
To be sure that needed parentheses are present, we could parenthesize all op-
erators. In this modified inorder traversal, action A0 , the step performed before
exploring the left subtree, checks whether the label of the node is an operator and,
if so, prints ’(’, a left parenthesis. Similarly, action A2 , performed after exploring
both subtrees, prints a right parenthesis, ’)’, if the label is an operator. The
result, applied to the binary tree of Fig. 5.31, would be (a + ((b − c) ∗ d)), which
has the needed pair of parentheses around b − c, along with two pairs of parentheses
that are redundant.
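A sketch of this modified inorder traversal, assuming the TREE type used for
Fig. 5.32, <stdio.h>, and a hypothetical helper isOperator, follows:

    int isOperator(char c)    /* an assumed helper for this sketch */
    {
        return c == '+' || c == '-' || c == '*' || c == '/';
    }

    void parenInorder(TREE t)
    {
        if (t != NULL) {
            if (isOperator(t->nodeLabel)) putchar('(');   /* action A0 */
            parenInorder(t->leftChild);
            putchar(t->nodeLabel);                        /* action A1 */
            parenInorder(t->rightChild);
            if (isOperator(t->nodeLabel)) putchar(')');   /* action A2 */
        }
    }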
5.6.7**: Trees can be used to represent recursive calls. Each node represents a
recursive call of some function F , and its children represent the calls made by F .
In this exercise, we shall consider the recursion for (n choose m) given in Section
4.5, based on the recursion (n choose m) = (n−1 choose m) + (n−1 choose m−1).
Each call can be represented by a binary tree. If a node corresponds to the
computation of (n choose m), and the basis cases (m = 0 and m = n) do not apply,
then the left child represents (n−1 choose m) and the right child represents
(n−1 choose m−1). If the node represents a basis case, then it has neither left nor
right child.

a) Prove by structural induction that a binary tree with root corresponding to
(n choose m) has exactly 2(n choose m) − 1 nodes.

b) Use (a) to show that the running time of the recursive algorithm for
(n choose m) is O(n choose m). Note that this running time is therefore also
O(2^n ), but the latter is a smooth-but-not-tight bound.
✦
✦ ✦
✦
5.7 Binary Search Trees
✦ Example 5.22. Figure 5.33 shows a binary search tree for the set
{Hairy, Bashful, Grumpy, Sleepy, Sleazy, Happy}
where the < order is lexicographic. Note that the names in the left subtree of the
root are all lexicographically less than Hairy, while those in the right subtree are
all lexicographically greater. This property holds at every node of the tree. ✦
Fig. 5.33. Binary search tree with six nodes labeled by strings: Hairy is at the
root, with left child Bashful and right child Sleepy; Grumpy is the right child of
Bashful; Sleazy is the left child of Sleepy; and Happy is the left child of Sleazy.
BASIS. If the tree T is empty, then x is not present. If T is not empty, and x
appears at the root, then x is present.
INDUCTION. If T is not empty but x is not at the root, let y be the element at the
root of T . If x < y, look up x only in the left subtree of the root, and if x > y, look
up x only in the right subtree of y. The BST property guarantees that x cannot be
in the subtree we do not search.
✦ Example 5.23. Suppose we want to look up Grumpy in the binary search tree
of Fig. 5.33. We compare Grumpy with Hairy at the root and find that Grumpy
precedes Hairy in lexicographic order. We thus call lookup on the left subtree.
The root of the left subtree is Bashful, and we compare this label with Grumpy,
finding that the former precedes the latter. We thus call lookup recursively on the
right subtree of Bashful. Now we find Grumpy at the root of this subtree and return
TRUE. These steps would be carried out by a function modeled after Fig. 5.34 that
dealt with lexicographic comparisons. ✦
More concretely, the recursive function lookup(x,T) in Fig. 5.34 implements
this algorithm, using the left-child–right-child data structure. Note that lookup
returns a value of type BOOLEAN, which is a defined type synonymous with int, but
with the intent that only defined values TRUE and FALSE, defined to be 1 and 0,
respectively, will be used. Type BOOLEAN was introduced in Section 1.6. Also, note
that lookup is written only for types that can be compared by =, <, and so on. It
would require rewriting for data like the character strings used in Example 5.23.
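Fig. 5.34 is not reproduced in this copy. The following sketch is consistent with
the surrounding description, assuming left-child–right-child nodes holding an int
element (the field name element is our assumption) and the BOOLEAN type of
Section 1.6:

    typedef int BOOLEAN;     /* as in Section 1.6 */
    #define TRUE  1
    #define FALSE 0

    typedef struct NODE *TREE;
    struct NODE {
        int element;
        TREE leftChild, rightChild;
    };

    BOOLEAN lookup(int x, TREE T)
    {
    (1)     if (T == NULL)
    (2)         return FALSE;
    (3)     else if (x == T->element)
    (4)         return TRUE;
    (5)     else if (x < T->element)
    (6)         return lookup(x, T->leftChild);
            else /* x > T->element */
    (7)         return lookup(x, T->rightChild);
    }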
At line (1), lookup determines whether T is empty. If not, then at line (3)
lookup determines whether x is stored at the current node. If x is not there, then
lookup recursively searches the left subtree or right subtree depending on whether
x is less than or greater than the element stored at the current node.
INDUCTION. If T is not empty and does not have x at its root, then insert x into
the left subtree if x is less than the element at the root, or insert x into the right
subtree if x is greater than the element at the root.
The function insert(x,T) shown in Fig. 5.35 implements this algorithm for
the left-child–right-child data structure. When we find that the value of T is NULL
at line (1), we create a new node, which becomes the tree T . This tree is created
by lines (2) through (5) and returned at line (10).
If x is not found at the root of T , then, at lines (6) through (9), insert is
called on the left or right subtree, whichever is appropriate. The subtree, modified
by the insertion, becomes the new value of the left or right subtree of the root of T
at lines (7) or (9), respectively. Line (10) returns the augmented tree.
Notice that if x is at the root of T , then none of the tests of lines (1), (6),
and (8) succeed. In this case, insert returns T without doing anything, which is
correct, since x is already in the tree.
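Fig. 5.35 likewise is not reproduced; the following sketch is numbered to match
the lines cited above, using the same node type as in the lookup sketch and
malloc from stdlib.h:

    TREE insert(int x, TREE T)
    {
    (1)     if (T == NULL) {
    (2)         T = (TREE) malloc(sizeof(struct NODE));
    (3)         T->element = x;
    (4)         T->leftChild = NULL;
    (5)         T->rightChild = NULL;
            }
    (6)     else if (x < T->element)
    (7)         T->leftChild = insert(x, T->leftChild);
    (8)     else if (x > T->element)
    (9)         T->rightChild = insert(x, T->rightChild);
    (10)    return T;
    }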
✦ Example 5.24. Let us continue with Example 5.23, understanding that techni-
cally, the comparison of character strings requires slightly different code from that
of Fig. 5.35, in which arithmetic comparisons like < are replaced by calls to suit-
ably defined functions like lt. Figure 5.36 shows the binary search tree of Fig. 5.33
after we insert Filthy. We begin by calling insert at the root, and we find that
Filthy < Hairy. Thus, we call insert on the left child, at line (7) of Fig. 5.35.
The result is that we find Filthy > Bashful, and so we call insert on the right
child, at line (9). That takes us to Grumpy, which follows Filthy in lexicographic
order, and we call insert on the left child of Grumpy.
The pointer to the left child of Grumpy is NULL, so at line (1) we discover that
we must create a new node. This one-node tree is returned to the call of insert at
the node for Grumpy, and the tree is installed as the value of the left child of Grumpy
at line (7). The modified tree with Grumpy and Filthy is returned to the call of
insert at the node labeled Bashful, and this modified tree becomes the right child
of Bashful. Then, continuing up the tree, the new tree rooted at Bashful becomes
the left child of the root of the entire tree. The final tree is shown in Fig. 5.36. ✦
Fig. 5.36. The binary search tree of Fig. 5.33 after inserting Filthy as the
left child of Grumpy.
Fig. 5.37. To delete x, remove the node containing y, the smallest element in the
right subtree, and then replace the label x by y at node n.
Second, suppose n has both children present. One strategy is to find the node
m with label y, the smallest element in the right subtree of n, and replace x by y
in node n, as suggested by Fig. 5.37. We can then remove node m from the right
subtree.
The BST property continues to hold. The reason is that x is greater than
everything in the left subtree of n, and so y, being greater than x (because y is
in the right subtree of n), is also greater than everything in the left subtree of n.
Thus, as far as the left subtree of n is concerned, y is a suitable element at n. As
far as the right subtree of n is concerned, y is also suitable as the root, because y
was chosen to be the smallest element in the right subtree.
Fig. 5.38. Function deletemin(pT) removes and returns the smallest element from T .
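The body of Fig. 5.38 is missing in this copy; a sketch matching the caption, with
pT a pointer to the tree and the same node type as in the earlier sketches, is:

    int deletemin(TREE *pT)
    {
        int min;

        if ((*pT)->leftChild == NULL) {
            min = (*pT)->element;        /* the leftmost node holds the smallest */
            (*pT) = (*pT)->rightChild;   /* splice its node out of the tree */
            return min;
        }
        else  /* the smallest element is farther down the leftmost path */
            return deletemin(&((*pT)->leftChild));
    }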
✦ Example 5.25. Figure 5.41 shows what would happen if we used a function
similar to delete (but able to compare character strings) to remove Hairy from the
Fig. 5.39. The leftmost path in the right subtree runs from zk down through
zk−1 , . . . , z1 to y, the leftmost node; all the other elements in the right subtree
are greater than y.
binary search tree of Fig. 5.36. Since Hairy is at a node with two children, delete
calls the function deletemin, which removes and returns the smallest element,
Happy, from the right subtree of the root. Happy then becomes the label of the root
of the tree, the node at which Hairy was stored. ✦
Fig. 5.41. The tree of Fig. 5.36 after deleting Hairy: Happy is now at the root,
with left child Bashful and right child Sleepy; Grumpy, with left child Filthy, is
the right child of Bashful, and Sleazy is the left child of Sleepy.
EXERCISES
m cannot previously have had a left child). Show why this set of changes preserves
the BST property. Would you prefer this strategy to the one described in Section
5.7? Hint : For both methods, consider their effect on the lengths of paths. As we
shall see in the next section, short paths make the operations run fast.
5.7.8*: In this exercise, refer to the binary search tree represented in Fig. 5.39.
Show by induction on i that if 1 ≤ i ≤ k, then y < zi . Then, show that y is the
least element in the tree rooted at zk .
5.7.9: Write a complete C program to implement a dictionary that stores integers.
Accept commands of the form x i, where x is one of the letters i (insert), d (delete),
and l (lookup). Integer i is the argument of the command, the integer to be inserted,
deleted, or searched for.
✦
✦ ✦
✦
5.8 Efficiency of Binary Search Tree Operations
The binary search tree provides a reasonably fast implementation of a dictionary.
First, notice that each of the operations insert, delete, and lookup makes a number
of recursive calls equal to the length of the path followed (but this path must include
the route to the smallest element of the right subtree, in case deletemin is called).
Also, a simple analysis of the functions lookup, insert, delete, and deletemin
tells us that each operation takes O(1) time, plus the time for one recursive call.
Moreover, since this recursive call is always made at a child of the current node,
the height of the node in each successive call decreases by at least 1.
Thus, if T (h) is the time taken by any of these functions when called with
a pointer to a node of height h, we have the following recurrence relation upper-
bounding T (h):
BASIS. T (0) = O(1). That is, when called on a leaf, the call either terminates
without further calls or makes a recursive call with a NULL argument and then
returns without further calls. All of this work takes O(1) time.
INDUCTION. T (h) ≤ T (h − 1) + O(1) for h ≥ 1. That is, the time taken by a call
on any interior node is O(1) plus the time for a recursive call, which is on a node
of height at most h − 1. If we make the reasonable assumption that T (h) increases
with increasing h, then the time for the recursive call is no greater than T (h − 1).
The solution to the recurrence for T (h) is O(h), as discussed in Section 3.9.
Thus, the running time of each dictionary operation on a binary search tree of n
nodes is at most proportional to the height of the tree. But what is the height of a
typical binary search tree of n nodes?
(Fig. 5.42: a pessimal binary search tree, a chain of k nodes n1 , n2 , . . . , nk in
which each node has exactly one child.)
Such a chain need not consist of right children only but can be a mixture of right
and left children, with the path taking a turn either left or right at each interior
node.
The height of a k-node tree like Fig. 5.42 is clearly k − 1. We thus expect
that lookup, insert, and delete will take O(k) time on a dictionary of k elements, if
the representation of that dictionary happens to be one of these unfortunate trees.
Intuitively, if we need to look for element x, on the average we shall find it halfway
down the path, requiring us to look at k/2 nodes. If we fail to find x, we shall
likewise have to search down the tree until we come to the place where x would be
found, which will also be halfway down, on the average. Since each of the operations
lookup, insert, and delete requires searching for the element involved, we know that
these operations each take O(k) time on the average, given one of the bad trees of
the form of Fig. 5.42.
(Fig. 5.43: a complete binary tree of height 2. The root n1 has children n2 and
n3 ; n2 has children n4 and n5 , and n3 has children n6 and n7 .)
A complete binary tree of height h has 2^{h+1} − 1 nodes. We can prove this claim
by induction on the height h.

BASIS. If h = 0, the tree consists of a single node. Since 2^{0+1} − 1 = 1, the basis
case holds.

INDUCTION. Suppose that a complete binary tree of height h has 2^{h+1} − 1 nodes,
and consider a complete binary tree of height h + 1. This tree consists of one node
at the root and left and right subtrees that are complete binary trees of height h.
For example, the height-2 complete binary tree of Fig. 5.43 consists of the root, n1 ;
a left subtree containing n2 , n4 , and n5 , which is a complete binary tree of height
1; and a right subtree consisting of the remaining three nodes, which is another
complete binary tree of height 1. Now the number of nodes in two complete binary
trees of height h is 2(2^{h+1} − 1), by the inductive hypothesis. When we add the
root node, we find that a complete binary tree of height h + 1 has
2(2^{h+1} − 1) + 1 = 2^{h+2} − 1 nodes, which proves the inductive step.

Now we can invert this relationship and say that a complete binary tree of
k = 2^{h+1} − 1 nodes has height h. Equivalently, k + 1 = 2^{h+1}. If we take
logarithms, then log2 (k + 1) = h + 1, or approximately, h is O(log k). Since the
running time of
lookup, insert, and delete is proportional to the height of the tree, we can conclude
that on a complete binary tree, these operations take time that is logarithmic in the
number of nodes. That performance is much better than the linear time taken for
pessimal trees like Fig. 5.42. As the dictionary size becomes large, the running time
of the dictionary operations grows much more slowly than the number of elements
in the set.
Thus, we ask, For what value of d is (3/4)^d k ≤ 1? If we take logarithms to the base
2, we get

d log2 (3/4) + log2 k ≤ log2 1        (5.1)

Now log2 1 = 0, and the quantity log2 (3/4) is a negative constant, about −0.4.
Thus we can rewrite (5.1) as log2 k ≤ 0.4d, or d ≥ (log2 k)/0.4 = 2.5 log2 k.
Put another way, at a depth of about two and a half times the logarithm to the
base 2 of the number of nodes, we expect to find only leaves (or to have found the
leaves at higher levels). This argument justifies, but does not prove, the statement
that the typical binary search tree will have a height that is proportional to the
logarithm of the number of nodes in the tree.
EXERCISES
5.8.1: If tree T has height h and branching factor b, what are the largest and
smallest numbers of nodes that T can have?
5.8.2**: Perform an experiment in which we choose one of the n! orders for n
different values and insert the values in this order into an initially empty binary
search tree. Let P (n) be the expected value of the depth of the node at which a
particular value v among the n values is found after this experiment.
a) Show that, for n ≥ 2,

P (n) = 1 + (2/n^2) · Σ_{k=1}^{n−1} k P (k)
✦
✦ ✦
✦
5.9 Priority Queues and Partially Ordered Trees
So far, we have seen only one abstract data type, the dictionary, and one imple-
mentation for it, the binary search tree. In this section we shall study another
abstract data type and one of its most efficient implementations. This ADT, called
a priority queue, is a set of elements each of which has an associated priority. For
example, the elements could be records and the priority could be the value of one
field of the record. The two operations associated with the priority queue ADT are
the following:
1. Inserting an element into the set (insert).
2. Finding and deleting from the set an element of highest priority (this combined
operation is called deletemax). The deleted element is returned by this function.
have certain background jobs such as backup of data to tape or long calculations
that the user has designated to run with a low priority.
Jobs can be represented by records consisting of an integer ID for the job and
an integer for the job’s priority. That is, we might use the structure
struct ETYPE {
int jobID;
int priority;
};
for elements of a priority queue. When a new job is initiated, it gets an ID and a
priority. We then execute the insert operation for this element on the priority queue
of jobs waiting for service. When a processor becomes available, the system goes
to the priority queue and executes the deletemax operation. The element returned
by this operation is a waiting job of highest priority, and that is the one executed
next. ✦
✦ Example 5.27. We can implement a sorting algorithm using the priority queue
ADT. Suppose we are given the sequence of integers a1 , a2 , . . . , an to sort. We
insert each into a priority queue, using the element’s value as its priority. If we
then execute deletemax n times, the integers will be selected highest first, or in the
reverse of their sorted (lowest-first) order. We shall discuss this algorithm in more
detail in the next section; it is known as heapsort. ✦
1. The labels of the nodes are elements with a “priority”; that priority may be
the value of an element or the value of some component of an element.
2. The element stored at a node has at least as large a priority as the elements
stored at the children of that node.
Property 2 implies that the element at the root of any subtree is always a largest
element of that subtree. We call property 2 the partially ordered tree property, or
POT property.
✦ Example 5.28. Figure 5.44 shows a partially ordered tree with 10 elements.
Here, as elsewhere in this section, we shall represent elements by their priorities, as
if the element and the priority were the same thing. Note that equal elements can
appear on different levels in the tree. To see that the POT property is satisfied at
the root, note that 18, the element there, is no less than the elements 18 and 16
found at its children. Similarly, we can check that the POT property holds at every
interior node. Thus, Fig. 5.44 is a partially ordered tree. ✦
(Fig. 5.44: a balanced partially ordered tree with 10 nodes. The root 18 has
children 18 and 16; the left 18 has children 9 and 7; 16 has children 1 and 9;
the node 9 at the next level has children 3 and 7; and the node 7 has the single
child 5.)
index:  1  2  3  4  5  6  7  8  9 10
A[i]:  18 18 16  9  7  1  9  3  7  5

Fig. 5.45. Heap (array) representation of the partially ordered tree of Fig. 5.44.
Layers of Implementation
It is useful to compare our two ADT’s, the dictionary and the priority queue, and
to notice that, in each case, we have given one abstract implementation and one
data structure for that implementation. There are other abstract implementations
for each, and other data structures for each abstract implementation. We promised
to discuss other abstract implementations for the dictionary, such as the hash table,
and in the exercises of Section 5.9 we suggest that the binary search tree may be
a suitable abstract implementation for the priority queue. The table below
summarizes what we already know about abstract implementations and data
structures for the dictionary and the priority queue.

ADT               Abstract Implementation     Data Structure
dictionary        binary search tree          left-child–right-child tree
priority queue    partially ordered tree      heap
✦ Example 5.29. The heap for the balanced partially ordered tree in Fig. 5.44
is shown in Fig. 5.45. For instance, A[4] holds the value 9; this array element
represents the left child of the left child of the root in Fig. 5.44. The children of
this node are found in A[8] and A[9]. Their elements, 3 and 7, are each no greater
than 9, as is required by the POT property. Array element A[5], which corresponds
to the right child of the left child of the root, has a left child in A[10]. It would
have a right child in A[11], but the partially ordered tree has only 10 elements at
the moment, and so A[11] is not part of the heap. ✦
While we have shown tree nodes and array elements as if they were the prior-
ities themselves, in principle an entire record appears at the node or in the array.
As we shall see, we shall have to do much swapping of elements between children
and parents in a partially ordered tree or its heap representation. Thus, it is con-
siderably more efficient if the array elements themselves are pointers to the records
representing the objects in the priority queue and these records are stored in an-
other array “outside” the heap. Then we can simply swap pointers, leaving the
records in place.
When a new element is placed in the next available position of the heap, we “bubble
up” the new element until it either reaches a position where the parent has a larger
element or reaches the root.
The C function bubbleUp to perform this operation is shown in Fig. 5.46. It
makes use of a function swap(A,i,j) that exchanges the elements in A[i] and A[j];
this function is also defined in Fig. 5.46. The operation of bubbleUp is simple.
Given argument i indicating the node that, with its parent, possibly violates the
POT property, we test whether i = 1 (that is, whether we are already at the root,
so that no POT violation can occur), and if not, whether the element A[i] is greater
than the element at its parent. If so, we swap A[i] with its parent and recursively
call bubbleUp at the parent.
void swap(int A[], int i, int j)
{
    int temp;

    temp = A[i];
    A[i] = A[j];
    A[j] = temp;
}
Fig. 5.46. The function swap exchanges array elements, and the function bubbleUp
pushes a new element of a heap into its rightful place.
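The bubbleUp half of Fig. 5.46 is missing in this copy; a sketch consistent with
the description above (elements represented by int priorities, with the parent of
A[i] at A[i/2]) is:

    void bubbleUp(int A[], int i)
    {
        if (i > 1 && A[i] > A[i/2]) {   /* not at the root, and parent smaller */
            swap(A, i, i/2);            /* exchange A[i] with its parent */
            bubbleUp(A, i/2);           /* continue from the parent's position */
        }
    }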
✦ Example 5.30. Suppose we start with the heap of Fig. 5.45 and we add an
eleventh element, with priority 13. This element goes in A[11], giving us the array
index:  1  2  3  4  5  6  7  8  9 10 11
A[i]:  18 18 16  9  7  1  9  3  7  5 13
We now call bubbleUp(A,11), which compares A[11] with A[5] and finds that
we must swap these elements because A[11] is larger. That is, A[5] and A[11] violate
the POT property. Thus, the array becomes
index:  1  2  3  4  5  6  7  8  9 10 11
A[i]:  18 18 16  9 13  1  9  3  7  5  7
Now we call bubbleUp(A,5). This results in comparison of A[2] and A[5]. Since
A[2] is larger, there is no POT violation, and bubbleUp(A,5) does nothing. We
have now restored the POT property to the array. ✦
Implementation of insert

We now show how to implement the priority queue operation insert. Let n be
the current number of elements in the priority queue, and assume A[1..n] already
satisfies the POT property. We increment n and then store the element to be
inserted into the new A[n]. Finally, we call bubbleUp(A,n). The code for insert is
shown in Fig. 5.47. The argument x is the element to be inserted, and the argument
pn is a pointer to the current size of the priority queue. Note that n must be passed
by reference, that is, by a pointer to n, so that when n is incremented the change
is visible outside insert. A check that n < MAX is omitted.
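Fig. 5.47 is missing in this copy; a sketch matching that description is:

    void insert(int A[], int x, int *pn)
    {
        ++(*pn);              /* one more element in the priority queue */
        A[*pn] = x;           /* place x in the new last position */
        bubbleUp(A, *pn);     /* restore the POT property */
    }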
Fig. 5.48. bubbleDown pushes a POT violator down to its proper position.
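The code of Fig. 5.48 is missing in this copy; the following sketch is numbered to
match the line-by-line discussion below:

    void bubbleDown(int A[], int i, int n)
    {
        int child;

    (1)     child = 2*i;                              /* assume the larger child is A[2i] */
    (2)     if (child < n && A[child+1] > A[child])
    (3)         ++child;                              /* the right child is the larger */
    (4)     if (child <= n && A[i] < A[child]) {
    (5)         swap(A, i, child);
    (6)         bubbleDown(A, child, n);
            }
    }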
This function is a bit tricky. We have to decide which child of A[i] to swap
with, if any, and the first thing we do is assume that the larger child is A[2i], at
line (1) of Fig. 5.48. If the right child exists (i.e., child < n) and the right child is
the larger, then the tests of line (2) are met and at line (3) we make child be the
right child of A[i].
Now at line (4) we test for two things. First, it is possible that A[i] really has
no children in the heap. We therefore check whether A[i] is an interior node by
asking whether child ≤ n. The second test of line (4) is whether A[i] is less than
A[child]. If both these conditions are met, then at line (5) we swap A[i] with its
larger child, and at line (6) we recursively call bubbleDown, to push the offending
element further down, if necessary.
Implementation of deletemax We can use bubbleDown to implement the priority queue operation deletemax
as shown in Fig. 5.49. The function deletemax takes as arguments an array A and
a pointer pn to the number n of elements currently in the heap.
We omit a test that n > 0.
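Fig. 5.49 is not reproduced in this text; a sketch matching the line numbers used below is:

void deletemax(int A[], int *pn)
{
(1)     swap(A, 1, *pn);         /* move the maximum to the old A[n] */
(2)     --(*pn);                 /* shrink the heap by one element */
(3)     bubbleDown(A, 1, *pn);   /* repair the POT property at the root */
}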
In line (1), we swap the element at the root, which is to be deleted, with the
last element, in A[n]. Technically, we should return the deleted element, but, as
we shall see, it is convenient to put it in A[n], which will no longer be part of the
heap.
At line (2), we decrement n by 1, effectively deleting the largest element, now
residing in the old A[n]. Since the root may now violate the POT property, we call
bubbleDown(A,1,n) at line (3), which will recursively push the offending element
down until it either reaches a point where it is no less than either of its children, or
becomes a leaf; either way, there is no violation of the POT property.
✦ Example 5.31. Suppose we start with the heap of Fig. 5.45 and execute
deletemax. After swapping A[1] and A[10], we set n to 9. The heap then becomes
1 2 3 4 5 6 7 8 9
5 18 16 9 7 1 9 3 7
We then call bubbleDown(A,1,9), which exchanges A[1] with its larger child, A[2]; the
recursive call bubbleDown(A,2,9) in turn exchanges A[2] with its larger child, A[4],
giving the array

1 2 3 4 5 6 7 8 9
18 9 16 5 7 1 9 3 7

Next, in bubbleDown(A,4,9) we compare A[8] and A[9], finding that the latter is larger,
so that child = 9 at line (4). We again perform the swap, since A[4] < A[9],
resulting in the array
1 2 3 4 5 6 7 8 9
18 9 16 7 7 1 9 3 5
Next, we call bubbleDown(A,9,9). We set child to 18 at line (1), and the first
test of line (2) fails, because child < n is false. Similarly, the test of line (4) fails,
and we make no swap or recursive call. The array is now a heap with the POT
property restored. ✦
To analyze the running time of insert, let T (i) be the time taken by bubbleUp(A,i).

BASIS. If i = 1, then T (i) is O(1), since it is easy to check that the bubbleUp
program of Fig. 5.46 does not make any recursive calls and only the test of the
if-statement is executed.
INDUCTION. If i > 1, then the if-statement test may fail anyway, because A[i]
does not need to rise further. If the test succeeds, then we execute swap, which
takes O(1) time, plus a recursive call to bubbleUp with an argument i/2 (or slightly
less if i is odd). Thus T (i) ≤ T (i/2) + O(1).
If we write a for the cost of the basis and b for the cost at each level, so that T (1) = a
and T (i) ≤ T (i/2) + b for i > 1, then expanding the recurrence j times gives

T (i) ≤ T (i/2^j ) + bj        (5.2)

for each j. As in Section 3.10, we choose the value of j that makes T (i/2^j ) simplest.
In this case, we make j equal to log2 i, so that i/2^j = 1. Thus, (5.2) becomes
T (i) = a + b log2 i; that is, T (i) is O(log i). Since bubbleUp is O(log i), so is insert.
Now consider deletemax. We can see from Fig. 5.49 that the running time of
deletemax is O(1) plus the running time of bubbleDown. The analysis of bubble-
Down, in Fig. 5.48, is essentially the same as that of bubbleUp. We omit it and
conclude that bubbleDown and deletemax also take O(log n) time.
EXERCISES
5.9.1: Starting with the heap of Fig. 5.45, show what happens when we
a) Insert 3
b) Insert 20
c) Delete the maximum element
d) Again delete the maximum element
5.9.2: Prove Equation (5.2) by induction on i.
5.9.3: Prove by induction on the depth of the POT-property violation that the
function bubbleUp of Fig. 5.46 correctly restores a tree with one violation to a tree
that has the POT property.
5.9.4: Prove that the function insert(A,x,n) makes A into a heap of size n, if A
was previously a heap of size n − 1. You may use Exercise 5.9.3. What happens if
A was not previously a heap?
5.9.5: Prove by induction on the height of the POT-property violation that the
function bubbleDown of Fig. 5.48 correctly restores a tree with one violation to a
tree that has the POT property.
5.9.6: Prove that deletemax(A,n) makes a heap of size n into one of size n − 1.
What happens if A was not previously a heap?
5.9.7: Prove that bubbleDown(A,1,n) takes O(log n) time on a heap of length n.
5.9.8**: What is the probability that an n-element heap, with distinct element
priorities chosen at random, is a partially ordered tree? If you cannot derive the
general rule, write a recursive function to compute the probability as a function of
n.
5.9.9: We do not need to use a heap to implement a partially ordered tree. Suppose
we use the conventional left-child–right-child data structure for binary trees. Show
how to implement the functions bubbleDown, insert, and deletemax using this
structure instead of the heap structure.
5.9.10*: A binary search tree can be used as an abstract implementation of a
priority queue. Show how the operations insert and deletemax can be implemented
using a binary search tree with the left-child–right-child data structure. What is
the running time of these operations (a) in the worst case and (b) on the average?
✦
✦ ✦
✦
5.10 Heapsort: Sorting with Balanced POTs
We shall now describe the algorithm known as heapsort. It sorts an array A[1..n]
in two phases. In the first phase, heapsort gives A the POT property. The second
phase of heapsort repeatedly selects the largest remaining element from the heap
until the heap consists of only the smallest element, whereupon the array A is
sorted.
Figure 5.50 shows the array A during the second phase. The initial part of
the array has the POT property, and the remaining part has its elements sorted in
nondecreasing order. Furthermore, the elements in the sorted part are the largest
n − i elements in the array. During the second phase, i is allowed to run from n
down to 1, so that the heap, initially the entire array A, eventually shrinks until
it is only the smallest element, located in A[1]. In more detail, the second phase
consists of the following steps.
1. A[1], the largest element in A[1..i], is exchanged with A[i]. Since all elements
in A[i+1..n] are as large as or larger than any of A[1..i], and since we just
moved the largest of the latter group of elements to position i, we know that
A[i..n] are the largest n − i + 1 elements and are in sorted order.
2. The value i is decremented, reducing the size of the heap by 1.
3. The POT property is restored to the initial part of the array by bubbling down
the element at the root, which we just moved to A[1].
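The code for heapsort itself is not reproduced in this text; the following is a sketch of the two phases, reusing swap and bubbleDown from Section 5.9 (the first phase, heapifying, is analyzed at the end of this section):

void heapsort(int A[], int n)
{
    int i;

    /* phase 1: give A[1..n] the POT property by bubbling down
       each interior node, from the last interior node to the root */
    for (i = n/2; i >= 1; i--)
        bubbleDown(A, i, n);

    /* phase 2: repeatedly move the largest remaining element
       to the end of the shrinking heap */
    for (i = n; i >= 2; i--) {
        swap(A, 1, i);
        bubbleDown(A, 1, i-1);
    }
}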
✦ Example 5.32. Consider the array in Fig. 5.45, which has the POT property.
Let us go through the first iteration of the second phase. In the first step, we
exchange A[1] and A[10] to get:
1 2 3 4 5 6 7 8 9 10
5 18 16 9 7 1 9 3 7 18
The second step reduces the heap size to 9, and the third step restores the POT
property to the first nine elements by calling bubbleDown(A,1,9). In this call, A[1] and
A[2] are exchanged:

1 2 3 4 5 6 7 8 9 10
18 5 16 9 7 1 9 3 7 18

Then A[2] and A[4] are exchanged:

1 2 3 4 5 6 7 8 9 10
18 9 16 5 7 1 9 3 7 18

Finally, A[4] and A[9] are exchanged, and the first nine elements again have the
POT property:

1 2 3 4 5 6 7 8 9 10
18 9 16 7 7 1 9 3 5 18

The second iteration of phase 2 swaps the 18 in A[1] with the 5 in A[9], reduces the
heap size to 8, and bubbles 5 down, yielding:

1 2 3 4 5 6 7 8 9 10
16 9 9 7 7 1 5 3 18 18
At this stage, the last two elements of the array are the two largest elements, in
sorted order.
Phase 2 continues until the array is completely sorted:
1 2 3 4 5 6 7 8 9 10
1 3 5 7 7 9 9 16 18 18
Heapifying an Array
In the first phase, heapsort must give the array A[1..n] the POT property, which it
can do by calling bubbleDown(A,i,n) for each interior node i, from i = n/2 down to
i = 1. A node at height i requires at most i swaps, and there are at most n/2^{i+1}
nodes at height i, so the total number of swaps is bounded by the finite sum

Σ_{i=1}^{log2 n} i · n/2^{i+1}        (5.3)

We can provide an upper bound on the finite sum (5.3) by extending it to an infinite
sum and then pulling out the factor n/2:
(n/2) · Σ_{i=1}^{∞} i/2^i        (5.4)

We must now get an upper bound on the sum in (5.4). This sum, Σ_{i=1}^{∞} i/2^i,
can be written as
(1/2) + (1/4 + 1/4) + (1/8 + 1/8 + 1/8) + (1/16 + 1/16 + 1/16 + 1/16) + · · ·
We can write these inverse powers of 2 as the triangle shown in Fig. 5.53. Each row
is an infinite geometric series with ratio 1/2, which sums to twice the first term in
the series, as indicated at the right edge of Fig. 5.53. The row sums form another
geometric series, which sums to 2. Thus the sum in (5.4) is at most 2, so (5.4) itself
is at most n, and the number of swaps in heapifying an array of n elements is O(n).
#include <stdio.h>

#define MAX 100   /* a bound on the number of elements; any value will do */

int A[MAX+1];

void heapsort(int A[], int n);

main()
{
    int i, n, x;

    n = 0;
    while (n < MAX && scanf("%d", &x) != EOF)
        A[++n] = x;
    heapsort(A, n);
    for (i = 1; i <= n; i++)
        printf("%d\n", A[i]);
}
✦
✦ ✦
✦
5.11 Summary of Chapter 5
The reader should take away the following points from Chapter 5:
✦ Trees are an important data model for representing hierarchical information.
✦ Many data structures involving combinations of arrays and pointers can be used
to implement trees, and the data structure of choice depends on the operations
performed on the tree.
✦ Two of the most important representations for tree nodes are the leftmost-
child–right-sibling representation and the trie (array of pointers to children).
✦ Recursive algorithms and proofs are well suited for trees. A variant of our
basic induction scheme, called structural induction, is effectively a complete
induction on the number of nodes in a tree.
✦ The binary tree is a variant of the tree model in which each node has (optional)
left and right children.
✦ A binary search tree is a labeled binary tree with the “binary search tree
property” that all the labels in the left subtree precede the label at a node, and
all labels in the right subtree follow the label at the node.
✦ The dictionary abstract data type is a set upon which we can perform the oper-
ations insert, delete, and lookup. The binary search tree efficiently implements
dictionaries.
✦ A priority queue is another abstract data type, a set upon which we can perform
the operations insert and deletemax.
✦ A partially ordered tree is a labeled binary tree with the property that the
label at any node is at least as great as the label at its children.
✦ Balanced partially ordered trees, in which every level is fully occupied except
possibly the lowest, where only the leftmost positions are occupied, can be
implemented by an array structure called a heap. This structure provides an
O(log n) implementation of a priority queue and leads to an O(n log n) sorting
algorithm called heapsort.
✦
✦ ✦
✦
5.12 Bibliographic Notes for Chapter 5
The trie representation of trees is from Fredkin [1960]. The binary search tree was
invented independently by a number of people, and the reader is referred to Knuth
[1973] for a history as well as a great deal more information on various kinds of
search trees. For more advanced applications of trees, see Tarjan [1983].
Williams [1964] devised the heap implementation of balanced partially ordered
trees. Floyd [1964] describes an efficient version of heapsort.
Floyd, R. W. [1964]. “Algorithm 245: Treesort 3,” Comm. ACM 7:12, p. 701.
Fredkin, E. [1960]. “Trie memory,” Comm. ACM 3:4, pp. 490–500.
Knuth, D. E. [1973]. The Art of Computer Programming, Vol. III, Sorting and
Searching, 2nd ed., Addison-Wesley, Reading, Mass.
Tarjan, R. E. [1983]. Data Structures and Network Algorithms, SIAM Press,
Philadelphia.
Williams, J. W. J. [1964]. “Algorithm 232: Heapsort,” Comm. ACM 7:6, pp. 347–
348.
CHAPTER 6
✦
✦ ✦
✦
The List
Data Model
Like trees, lists are among the most basic of data models used in computer programs.
Lists are, in a sense, simple forms of trees, because one can think of a list as a binary
tree in which every left child is a leaf. However, lists also present some aspects that
are not special cases of what we have learned about trees. For instance, we shall talk
about operations on lists, such as pushing and popping, that have no common analog
for trees, and we shall talk of character strings, which are special and important
kinds of lists requiring their own data structures.
✦
✦ ✦
✦
6.1 What This Chapter Is About
We introduce list terminology in Section 6.2. Then in the remainder of the chapter
we present the following topics:
✦ The stack, a list upon which we insert and delete at only one end (Section 6.6).
✦ The queue, a list upon which we insert at one end and delete at the other
(Section 6.8).
✦ Character strings and the special data structures we use to represent them
(Section 6.10).
✦ The run-time stack and the way C and many other languages implement re-
cursive functions (Section 6.7).
✦ The problem of finding longest common subsequences of two strings, and its
solution by a “dynamic programming,” or table-filling, algorithm (Section 6.9).
✦
✦ ✦
✦
6.2 Basic Terminology
A list is a finite sequence of zero or more elements. If the elements are all of type
T , then we say that the type of the list is “list of T .” Thus we can have lists of
integers, lists of real numbers, lists of structures, lists of lists of integers, and so on.
We generally expect the elements of a list to be of some one type. However, since
a type can be the union of several types, the restriction to a single “type” can be
circumvented.
A list is often written with its elements separated by commas and enclosed in
parentheses:
(a1 , a2 , . . . , an )
where the ai ’s are the elements of the list.
In some situations we shall not show commas or parentheses explicitly. In
particular, we shall study character strings, which are lists of characters. Character
strings are generally written with no comma or other separating marks and with no
surrounding parentheses. Elements of character strings will normally be written in
typewriter font. Thus foo is the list of three characters of which the first is f and
the second and third are o.
✦ Example 6.2. A line of text is another example of a list. The individual charac-
ters making up the line are the elements of this list, so the list is a character string.
This character string usually includes several occurrences of the blank character,
and normally the last character in a line of text is the “newline” character.
As another example, a document can be viewed as a list. Here the elements
of the list are the lines of text. Thus a document is a list whose elements are
themselves lists, character strings in particular. ✦
[Figure: the eight lists of length three with elements chosen from {0, 1}, from (0,0,0) to (1,1,1), drawn as the vertices of a cube.]
✦ Example 6.4. The length of list (1) in Example 6.1 is 8, and the length of list
(2) is 6. The length of list (3) is 12, since there is one position for each month. The
fact that there are only three different numbers on the list is irrelevant as far as the
length of the list is concerned. ✦
Parts of a List
Head and tail of a list If a list is not empty, then it consists of a first element, called the head, and the
remainder of the list, called the tail. For instance, the head of list (2) in Example
6.1 is helium, while the tail is the list consisting of the remaining five elements,
(neon, argon, krypton, xenon, radon)
✦ Example 6.5. Let L be the character string abc. The sublists of L are
ǫ, a, b, c, ab, bc, abc
These are all subsequences of L as well, and in addition, ac is a subsequence, but
not a sublist.
For another example, let L be the character string abab. Then the sublists are
ǫ, a, b, ab, ba, aba, bab, abab
These are also subsequences of L, and in addition, L has the subsequences aa, bb,
aab, and abb. Notice that a character string like bba is not a subsequence of L.
Even though L has two b’s and an a, they do not appear in such an order in L
that we can form bba by striking out elements of L. That is, there is no a after the
second b in L. ✦
Prefix and suffix A prefix of a list is any sublist that starts at the beginning of the list. A suffix
is a sublist that terminates at the end of the list. As special cases, we regard ǫ as
both a prefix and a suffix of any list.
✦ Example 6.6. The prefixes of the list abc are ǫ, a, ab, and abc. Its suffixes are
ǫ, c, bc, and abc. ✦
EXERCISES
6.2.1: Answer the following questions about the list (2, 7, 1, 8, 2).
a) What is the length?
b) What are all the prefixes?
c) What are all the suffixes?
d) What are all the sublists?
e) How many subsequences are there?
f) What is the head?
g) What is the tail?
h) How many positions are there?
6.2.2: Repeat Exercise 6.2.1 for the character string banana.
6.2.3**: In a list of length n ≥ 0, what are the largest and smallest possible
numbers of (a) prefixes (b) sublists (c) subsequences?
6.2.4: If the tail of the tail of the list L is the empty list, what is the length of L?
6.2.5*: Bea Fuddled wrote a list whose elements are themselves lists of integers,
but omitted the parentheses: 1,2,3. There are many lists of lists that could have
been represented, such as ((1), (2, 3)). What are all the possible lists that do not
have the empty list as an element?
✦
✦ ✦
✦
6.3 Operations on Lists
A great variety of operations can be performed on lists. In Chapter 2, when we
discussed merge sort, the basic problem was to sort a list, but we also needed to
split a list into two, and to merge two sorted lists. Formally, the operation of
sorting a list (a1 , a2 , . . . , an ) amounts to replacing the list with a list consisting of
a permutation of its elements, (b1 , b2 , . . . , bn ), such that b1 ≤ b2 ≤ · · · ≤ bn . Here,
as before, ≤ represents an ordering of the elements, such as “less than or equal to”
on integers or reals, or lexicographic order on strings. The operation of merging
two sorted lists consists of constructing from them a sorted list containing the same
elements as the two given lists. Multiplicity must be preserved; that is, if there are
k occurrences of element a among the two given lists, then the resulting list has k
occurrences of a. Review Section 2.8 for examples of these two operations on lists.
The dictionary operations insert(x, L), delete(x, L), and lookup(x, L), which we used
with binary search trees in Chapter 5, make sense for lists as well, although a list may
now contain duplicates.

✦ Example 6.7. Let L be the list (1, 2, 3, 2). The result of insert(1, L) could be
the list (1, 1, 2, 3, 2), if we chose to push 1, that is, to insert 1 at the beginning. We
could also insert the new 1 at the end, yielding (1, 2, 3, 2, 1). Alternatively, the new
1 could be placed in any of three positions interior to the list L.
The result of delete(2, L) is the list (1, 3, 2) if we delete the first occurrence of
2. If we ask lookup(x, L), the answer is TRUE if x is 1, 2, or 3, but FALSE if x is any
other value. ✦
Concatenation
We concatenate two lists L and M by forming the list that begins with the elements
of L and then continues with the elements of M . That is, if L = (a1 , a2 , . . . , an )
and M = (b1 , b2 , . . . , bk ), then LM , the concatenation of L and M , is the list
(a1 , a2 , . . . , an , b1 , b2 , . . . , bk )
Identity for concatenation Note that the empty list is the identity for concatenation. That is, for any list L,
we have ǫL = Lǫ = L.
✦ Example 6.8. If L is the list (1, 2, 3) and M is the list (3, 1), then LM is the list
(1, 2, 3, 3, 1). If L is the character string dog and M is the character string house,
then LM is the character string doghouse. ✦
EXERCISES
Determine whether each of the following equalities holds for every list L and element x; if not, give a counterexample.
a) delete(x, insert(x, L)) = L
b) insert(x, delete(x, L)) = L
c) first(L) = retrieve(1, L)
d) last(L) = retrieve(length(L), L)
✦
✦ ✦
✦
6.4 The Linked-List Data Structure
The easiest way to implement a list is to use a linked list of cells. Each cell consists
of two fields, one containing an element of the list, the other a pointer to the next
cell on the linked list. In this chapter we shall make the simplifying assumption that
elements are integers. Not only may we use the specific type int for the type of
elements, but we can compare elements by the standard comparison operators ==, <,
and so on. The exercises invite the reader to develop variants of our functions that
work for arbitrary types, where comparisons are made by user-defined functions
such as eq to test equality, lt(x, y) to test if x precedes y in some ordering, and so
on.
In what follows, we shall use our macro from Section 1.6:
DefCell(int, CELL, LIST);
which expands into our standard structure for cells and lists:
typedef struct CELL *LIST;
struct CELL {
int element;
LIST next;
};
Note that LIST is the type of a pointer to a cell. In effect, the next field of each
cell points both to the next cell and to the entire remainder of the list.
Figure 6.2 shows a linked list that represents the abstract list
L = (a1 , a2 , . . . , an )
There is one cell for each element; the element ai appears in the element field of the
ith cell. The pointer in the ith cell points to the (i + 1)st cell, for i = 1, 2, . . . , n − 1,
and the pointer in the last cell is NULL, indicating the end of the list. Outside the
list is a pointer, named L, that points to the first cell of the list; L is of type LIST.
If the list L were empty, then the value of L would be NULL.
[Fig. 6.2: A linked list representing (a1 , a2 , . . . , an ); L points to the first cell, and the next field of the last cell is NULL.]
Lookup
To perform lookup(x, D), we examine each cell of the list representing D to see
whether it holds the desired element x. If so, we return TRUE. If we reach the end
of the list without finding x, we return FALSE. As before, the defined constants
TRUE and FALSE stand for the constants 1 and 0, and BOOLEAN for the defined type
int. A recursive function lookup(x,D) is shown in Fig. 6.3.
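Fig. 6.3 is not reproduced in this text; a sketch consistent with this description, using the cell type defined above, is:

BOOLEAN lookup(int x, LIST L)
{
    if (L == NULL)
        return FALSE;              /* reached the end without finding x */
    else if (x == L->element)
        return TRUE;               /* x is in the current cell */
    else
        return lookup(x, L->next); /* search the rest of the list */
}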
If the list has length n, we claim that the function of Fig. 6.3 takes O(n) time.
Except for the recursive call at the end, lookup takes O(1) time. When the call is
made, the length of the remaining list is 1 less than the length of the list L. Thus
it should surprise no one that lookup on a list of length n takes O(n) time. More
formally, the following recurrence relation gives the running time T (n) of lookup
when the list L pointed to by the second argument has length n: T (0) is O(1), and
T (n) is O(1) + T (n − 1) for n ≥ 1.
The solution to this recurrence is T (n) = O(n), as we saw several times in Chapter
3. Since a dictionary of n elements is represented by a list of length n, lookup takes
O(n) time on a dictionary of size n.
Unfortunately, the average time for a successful lookup is also proportional to
n. For if we are looking for an element x known to be in D, the expected value of
the position of x in the list is (n + 1)/2. That is, x could be anywhere from the first
to the nth element, with equal probability. Thus, the expected number of recursive
calls to lookup is (n + 1)/2. Since each takes O(1) time, the average successful
lookup takes O(n) time. Of course, if the lookup is unsuccessful, we make all n
calls before reaching the end of the list and returning FALSE.
Deletion
A function to delete an element x from a linked list is shown in Fig. 6.4. The second
parameter pL is a pointer to the list L (rather than the list L itself). We use the
“call by reference” style here because we want delete to remove the cell containing
x from the list. As we move down the list, pL holds a pointer to a pointer to the
“current” cell. If we find x in the current cell C, at line (2), then we change the
pointer to cell C at line (3), so that it points to the cell following C on the list. If
C happens to be last on the list, the former pointer to C becomes NULL. If x is not
the current element, then at line (4) we recursively delete x from the tail of the list.
Note that the test at line (1) causes the function to return with no action if
the list is empty. That is because x is not present on an empty list, and we need
not do anything to remove x from the dictionary. If D is a linked list representing
a dictionary, then a call to delete(x, &D) initiates the deletion of x from the
dictionary D.
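Fig. 6.4 is not reproduced in this text; a sketch matching the line numbers used in this description is:

void delete(int x, LIST *pL)
{
(1)     if ((*pL) != NULL) {
(2)         if (x == (*pL)->element)
(3)             (*pL) = (*pL)->next;       /* bypass the cell holding x */
            else
(4)             delete(x, &((*pL)->next)); /* delete x from the tail */
        }
}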
If the element x is not on the list for the dictionary D, then we run down to
the end of the list, taking O(1) time for each element. The analysis is similar to
that for the lookup function of Fig. 6.3, and we leave the details for the reader.
Thus the time to delete an element not in D is O(n) if D has n elements. If x is
in the dictionary D, then, on the average, we shall encounter x halfway down the
list. Therefore we search (n + 1)/2 cells on the average, and the running time of a
successful deletion is also O(n).
Insertion
A function to insert an element x into a linked list is shown in Fig. 6.5. To insert
x, we need to check that x is not already on the list (if it is, we do nothing). If x
is not already present, we must add it to the list. It does not matter where in the
list we add x, but the function in Fig. 6.5 adds x to the end of the list. When at
line (1) we detect the NULL at the end, we are therefore sure that x is not already
on the list. Then, lines (2) through (4) append x to the end of the list.
If the list is not NULL, line (5) checks for x at the current cell. If x is not there,
line (6) makes a recursive call on the tail. If x is found at line (5), then function
insert terminates with no recursive call and with no change to the list L. A call
to insert(x, &D) initiates the insertion of x into dictionary D.
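Fig. 6.5 is not reproduced in this text; a sketch matching the line numbers used in this description (and assuming malloc as in our other examples) is:

void insert(int x, LIST *pL)
{
(1)     if ((*pL) == NULL) {
            /* end of list reached: x is absent, so append it */
(2)         (*pL) = (LIST) malloc(sizeof(struct CELL));
(3)         (*pL)->element = x;
(4)         (*pL)->next = NULL;
        }
(5)     else if (x != (*pL)->element)
(6)         insert(x, &((*pL)->next));
}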
As in the case of lookup and deletion, if we do not find x on the list, we travel to
the end, taking O(n) time. If we do find x, then on the average we travel halfway1
down the list, and we still take O(n) time on the average.
1 In the following analyses, we shall use terms like “halfway” or “n/2” when we mean the
middle of a list of length n. Strictly speaking, (n + 1)/2 is more accurate.
If we allow duplicates on the list, insertion can be done in O(1) time: we simply place
the new element in a cell at the front of the list, without checking whether it is already
present. Lookup is carried out exactly as before.
Deletion is slightly different. We cannot stop our search for x when we en-
counter a cell with element x, because there could be other copies of x. Thus we
must delete x from the tail of a list L, even when the head of L contains x. As a
result, not only do we have longer lists to contend with, but to achieve a successful
deletion we must search every cell rather than an average of half the list, as we
could for the case in which no duplicates were allowed on the list. The details of
these versions of the dictionary operations are left as an exercise.
In summary, by allowing duplicates, we make insertion faster, O(1) instead of
O(n). However, successful deletions require search of the entire list, rather than an
average of half the list, and for both lookup and deletion, we must contend with lists
that are longer than when duplicates are not allowed, although how much longer
depends on how often we insert an element that is already present in the dictionary.
Which method to choose is a bit subtle. Clearly, if insertions predominate, we
should allow duplicates. In the extreme case, where we do only insertions but never
lookup or deletion, we get performance of O(1) per operation, instead of O(n).2 If
we can be sure, for some reason, that we shall never insert an element already in
the dictionary, then we can use both the fast insertion and the fast deletion, where
we stop when we find one occurrence of the element to be deleted. On the other
hand, if we may insert duplicate elements, and lookups or deletions predominate,
then we are best off checking for the presence of x before inserting it, as in the
insert function of Fig. 6.5.
2 But why bother inserting into a dictionary if we never look to see what is there?
Comparison of Methods
The table in Fig. 6.7 indicates the number of cells we must search for each of
the three dictionary operations, for each of the three list-based representations of
dictionaries we have discussed. We take n to be the number of elements in the
dictionary, which is also the length of the list if no duplicates are allowed. We use
m for the length of the list when duplicates are allowed. We know that m ≥ n, but
we do not know how much greater m is than n. Where we use n/2 → n we mean
that the number of cells is an average of n/2 when the search is successful, and n
when unsuccessful. The entry n/2 → m indicates that on a successful lookup we
shall see n/2 elements of the dictionary, on the average, before seeing the one we
want,3 but on an unsuccessful search, we must go all the way to the end of a list of
length m.
Notice that all of these running times, except for insertion with duplicates,
are worse than the average running times for dictionary operations when the data
structure is a binary search tree. As we saw in Section 5.8, dictionary operations
take only O(log n) time on the average when a binary search tree is used.
3 In fact, since there may be duplicates, we may have to examine somewhat more than n/2
cells before we can expect to see n/2 different elements.
[Fig. 6.8: A doubly linked list representing (a1 , a2 , . . . , an ); each cell holds pointers to both its predecessor and its successor.]
Dictionary operations on a doubly linked list structure are essentially the same
as those on a singly linked list. To see the advantage of doubly linked lists, consider
the operation of deleting an element ai , given only a pointer to the cell containing
that element. With a singly linked list, we would have to find the previous cell by
searching the list from the beginning. With a doubly linked list, we can do the task
in O(1) time by a sequence of pointer manipulations, as shown in Fig. 6.9.
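Fig. 6.9 is not reproduced in this text; the following sketch shows the idea, assuming cells declared with both previous and next fields:

typedef struct CELL *LIST;
struct CELL {
    LIST previous;
    int element;
    LIST next;
};

void delete(LIST p, LIST *pL)
{
    /* splice out the cell pointed to by p; pL points to the header L */
    if (p->next != NULL)
        p->next->previous = p->previous;
    if (p->previous == NULL)   /* p is the first cell on the list */
        (*pL) = p->next;
    else
        p->previous->next = p->next;
}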
EXERCISES
6.4.1: Set up the recurrence relations for the running times of (a) delete in Fig.
6.4 (b) insert in Fig. 6.5. What are their solutions?
6.4.2: Write C functions for dictionary operations insert, lookup and delete using
linked lists with duplicates.
6.4.3: Write C functions for insert and delete, using sorted lists as in Fig. 6.6.
6.4.4: Write a C function that inserts an element x into a new cell that follows
the cell pointed to by p on a doubly linked list. Figure 6.9 is a similar function for
deletion, but for insertion, we don’t need to know the list header L.
6.4.5: If we use the doubly linked data structure for lists, an option is to represent a
list not by a pointer to a cell, but by a cell with the element field unused. Note that
this “header” cell is not itself a part of the list. The next field of the header points
to the first true cell of the list, and the previous field of the first cell points to the
header cell. We can then delete the cell (not the header) pointed to by pointer p
without knowing the header L, as we needed to know in Fig. 6.9. Write a C function
to delete from a doubly linked list using the format described here.
6.4.6: Write recursive functions for (a) retrieve(i, L) (b) length(L) (c) last(L)
using the linked-list data structure.
6.4.7: Extend the functions lookup, delete, and insert of this section to cells with an arbitrary type ETYPE
for elements, using functions eq(x, y) to test if x and y are equal and lt(x, y) to tell
if x precedes y in an ordering of the elements of ETYPE.
✦
✦ ✦
✦
6.5 Array-Based Implementation of Lists
Figure 6.10 shows how we might represent the list (a0 , a1 , . . . , an−1 ) using an array
A[0..MAX-1]. Elements a0 , a1 , . . . , an−1 are stored in A[0..n-1], and length = n.
[Fig. 6.10: An array A holding the list; a0 , . . . , an−1 occupy A[0..n-1], and positions n through MAX − 1 are unused.]
As in the previous section, we assume that list elements are integers and invite
the reader to generalize the functions to arbitrary types. The structure declaration
for this array-based implementation of lists is:
typedef struct {
int A[MAX];
int length;
} LIST;
Here, LIST is a structure of two fields; the first is an array A that stores the elements,
the second an integer length that contains the number of elements currently on
the list. The quantity MAX is a user-defined constant that bounds the number of
elements that will ever be stored on the list.
The array-based representation of lists is in many ways more convenient than
the linked-list representation. It does suffer, however, from the limitation that lists
cannot grow longer than the array, which can cause an insertion to fail. In the
linked-list representation, we can grow lists as long as we have available computer
memory.
We can perform the dictionary operations on array-based lists in roughly the
same time as on lists in the linked-list representation. To insert x, we look for x,
and if we do not find it, we check whether length < MAX. If not, there is an error
condition, as we cannot fit the new element into the array. Otherwise, we store x
in A[length] and then increase length by 1. To delete x, we again lookup x, and if
found, we shift all following elements of A down by one position; then, we decrement
length by 1. The details of functions to implement insert and delete are left as
exercises. We shall concentrate on the details of lookup below.
It is easy to see that, on the average, we search half the array A[0..length-1]
before finding x if it is present. Thus letting n be the value of length, we take
O(n) time to perform a lookup. If x is not present, we search the whole array
A[0..length-1], again requiring O(n) time. This performance is the same as for
a linked-list representation of a list.
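The lookup function of Fig. 6.11, which performs this linear search without sentinels, is not reproduced in this text; a sketch on the LIST structure above is:

BOOLEAN lookup(int x, LIST *pL)
{
    int i;

    for (i = 0; i < pL->length; i++)
        if (x == pL->A[i])
            return TRUE;   /* found x at position i */
    return FALSE;          /* searched the whole list without finding x */
}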
A useful trick is to plant a sentinel copy of x just beyond the end of the list, so that
the search loop needs no separate end-of-list test; Fig. 6.12 is such a lookup function:

BOOLEAN lookup(int x, LIST *pL)
{
    int i;

(1)     pL->A[pL->length] = x;
(2)     i = 0;
(3)     while (x != pL->A[i])
(4)         i++;
(5)     return (i < pL->length);
}
The sentinel is placed just beyond the list by line (1). Note that since length
does not change, this x is not really part of the list. The loop of lines (3) and (4)
increases i until we find x. Note that we are guaranteed to find x even if the list
is empty, because of the sentinel. After finding x, we test at line (5) whether we
have found a real occurrence of x (that is, i < length), or whether we have found
the sentinel (that is, i = length). Note that if we are using a sentinel, it is essential
that length be kept strictly less than MAX, or else there will be no place to put the
sentinel.
[Fig. 6.13: In binary search, we compare x with the middle element A[⌊(n − 1)/2⌋]; we search the lower half of the array if x < A[⌊(n − 1)/2⌋], and the upper half if x > A[⌊(n − 1)/2⌋].]
4 The notation ⌊a⌋, the floor of a, is the integer part of a. Thus ⌊6.5⌋ = 6 and ⌊6⌋ = 6.
Also, ⌈a⌉, the ceiling of a, is the smallest integer greater than or equal to a. For instance,
⌈6.5⌉ = 7 and ⌈6⌉ = 6.
1. Takes O(1) time per call, exclusive of any recursive call, and
2. Calls itself recursively at line (5) or line (7) on a sublist that is at most half as
long as the array A[low..high] that it was given to search.
Starting with an array of length n, we cannot divide the length of the array to be
searched in half more than log2 n times before it has length 1, whereupon we either
find x at A[mid], or we fail to find x at all after a call on the empty list.
To look for x in an array A with n elements, we call binsearch(x,A,0,n-1).
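Fig. 6.14 is not reproduced in this text; a sketch consistent with the lines referred to in the surrounding text is:

BOOLEAN binsearch(int x, int A[], int low, int high)
{
    int mid;

(1)     if (low > high)
(2)         return FALSE;         /* empty range: x is not present */
        else {
(3)         mid = (low + high)/2; /* probe the middle element */
(4)         if (x < A[mid])
(5)             return binsearch(x, A, low, mid-1);
(6)         else if (x > A[mid])
(7)             return binsearch(x, A, mid+1, high);
            else /* x == A[mid] */
(8)             return TRUE;
        }
}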
We see that binsearch calls itself O(log n) times at most. At each call, we spend
O(1) time, plus the time of the recursive call. The running time of binary search
is therefore O(log n). That compares favorably with the linear search, which takes
O(n) time on the average, as we have seen.
EXERCISES
6.5.1: Write functions to (a) insert x and (b) delete x from a list L, using linear
search of an array.
6.5.2: Repeat Exercise 6.5.1 for an array with sentinels.
6.5.3: Repeat Exercise 6.5.1 for a sorted array.
6.5.4: Write the following functions assuming that list elements are of some arbi-
trary type ETYPE, for which we have functions eq(x, y) telling whether x and y
are equal and lt(x, y) telling whether x precedes y in the order of elements of type
ETYPE.
a) Function lookup of Fig. 6.11.
b) Function lookup of Fig. 6.12.
c) Function binsearch of Fig. 6.14.
Probes in binary search 6.5.5**: Let P (k) be the length (high − low + 1) of the longest array such that the
binary search algorithm of Fig. 6.14 never makes more than k probes [evaluations of
mid at line (3)]. For example, P (1) = 1, and P (2) = 3. Write a recurrence relation
for P (k). What is the solution to your recurrence relation? Does it demonstrate
that binary search makes O(log n) probes?
6.5.6*: Prove by induction on the difference between low and high that if x is in
the range A[low..high], then the binary search algorithm of Fig. 6.14 will find x.
6.5.7: Suppose we allowed arrays to have duplicates, so insertion could be done in
O(1) time. Write insert, delete, and lookup functions for this data structure.
6.5.8: Rewrite the binary search program to use iteration rather than recursion.
6.5.9**: Set up and solve a recurrence relation for the running time of binary search
on an array of n elements. Hint : To simplify, it helps to take T (n) as an upper
bound on the running time of binary search on any array of n or fewer elements
(rather than on exactly n elements, as would be our usual approach).
Ternary search 6.5.10: In ternary search, given a range low to high, we compute the approximate
1/3 point of the range,
first = ⌊(2 × low + high)/3⌋
and compare the lookup element x with A[first]. If x > A[first], we compute the
approximate 2/3 point,
second = ⌈(low + 2 × high)/3⌉
and compare x with A[second]. Thus we isolate x to within one of three ranges,
each no more than one third the range low to high. Write a function to perform
ternary search.
6.5.11**: Repeat Exercise 6.5.5 for ternary search. That is, find and solve a
recurrence relation for the largest array that requires no more than k probes during
ternary search. How do the number of probes required for binary and ternary search
compare? That is, for a given k, can we handle larger arrays by binary search or
by ternary search?
✦
✦ ✦
✦
6.6 Stacks
A stack is an abstract data type based on the list data model in which all operations
are performed at one end of the list, which is called the top of the stack. The term
“LIFO (for last-in first-out) list” is a synonym for stack.
The abstract model of a stack is the same as that of a list — that is, a sequence
of elements a1 , a2 , . . . , an of some one type. What distinguishes stacks from general
lists is the particular set of operations permitted. We shall give a more complete set
of operations later, but for the moment, we note that the quintessential stack oper-
ations are push and pop, where push(x) puts the element x on top of the stack and
pop removes the topmost element from the stack. If we write stacks with the top at
the right end, the operation push(x) applied to the list (a1 , a2 , . . . , an ) yields the list
(a1 , a2 , . . . , an , x). Popping the list (a1 , a2 , . . . , an ) yields the list (a1 , a2 , . . . , an−1 );
popping the empty list, ǫ, is impossible and causes an error condition.
✦ Example 6.9. Many compilers begin by turning the infix expressions that
appear in programs into equivalent postfix expressions. For example, the expression
(3 + 4) × 2 is 3 4 + 2 × in postfix notation. A stack can be used to evaluate postfix
expressions. Starting with an empty stack, we scan the postfix expression from left
to right. Each time we encounter an argument, we push it onto the stack. When we
encounter an operator, we pop the stack twice, remembering the operands popped.
We then apply the operator to the two popped values (with the second as the left
operand) and push the result onto the stack. Figure 6.15 shows the stack after each
step in the processing of the postfix expression 3 4 + 2 ×. The result, 14, remains
on the stack after processing. ✦
Operations on a Stack
The two previous ADT’s we discussed, the dictionary and the priority queue, each
had a definite set of associated operations. The stack is really a family of similar
ADT’s with the same underlying model, but with some variation in the set of
allowable operations. In this section, we shall discuss the common operations on
stacks and show two data structures that can serve as implementations for the stack,
one based on linked lists and the other on arrays.
In any collection of stack operations we expect to see push and pop, as we
mentioned. There is another common thread to the operations chosen for the stack
ADT(s): they can all be implemented simply in O(1) time, independent of the
number of elements on the stack. You can check as an exercise that for the two
data structures suggested, all operations require only constant time.
In addition to push and pop, we generally need an operation clear that initial-
izes the stack to be empty. In Example 6.9, we tacitly assumed that the stack started
out empty, without explaining how it got that way. Another possible operation is
a test to determine whether the stack is currently empty.
Full and empty stacks The last of the operations we shall consider is a test whether the stack is
“full.” Now in our abstract model of a stack, there is no notion of a full stack, since
a stack is a list and lists can grow as long as we like, in principle. However, in any
implementation of a stack, there will be some length beyond which it cannot grow.
The most common example is when we represent a list or stack by an array. As
seen in the previous section, we had to assume the list would not grow beyond the
constant MAX, or our implementation of insert would not work.
The formal definitions of the operations we shall use in our implementation of
stacks are the following. Let S be a stack of type ETYPE and x an element of type
ETYPE.
1. clear(S). Make the stack S empty.
2. isEmpty(S). Return TRUE if S is empty, FALSE otherwise.
3. isFull(S). Return TRUE if S is full, FALSE otherwise.
4. pop(S, x). If S is empty, return FALSE; otherwise, set x to the value of the top
element on stack S, remove this element from S, and return TRUE.
5. push(x, S). If S is full, return FALSE; otherwise, add the element x to the top
of S and return TRUE.
There is a common variation of pop that assumes S is nonempty. It takes only S
as an argument and returns the element x that is popped. Yet another alternative
version of pop does not return a value at all; it just removes the element at the top
of the stack. Similarly, we may write push with the assumption that S is not “full.”
In that case, push does not return any value.
[Fig. 6.16: An array-based stack; elements a0 , . . . , an−1 occupy A[0..n-1], with the most recently pushed element an−1 in A[n-1], and positions n through MAX − 1 unused.]
With an array-based implementation, the stack can grow either upward (from lower
locations to higher) or downward (from higher locations to lower). We choose to
have the stack grow upward;5 that is, the oldest element a0 in the stack is in
location 0, the next-to-oldest element a1 is in location 1, and the most recently
inserted element an−1 is in the location n − 1.
The field top in the array structure indicates the position of the top of stack.
Thus, in Fig. 6.16, top has the value n− 1. An empty stack is represented by having
top = −1. In that case, the content of array A is irrelevant, there being no elements
on the stack.
The programs for the five stack operations defined earlier in this section are
5 Thus the “top” of the stack is physically shown at the bottom of the page, an unfortunate
but quite standard convention.
shown in Fig. 6.17. We pass stacks by reference to avoid having to copy large arrays
that are arguments of the functions.
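Fig. 6.17 is not reproduced in this text; a sketch of the five operations, assuming the array-based structure

typedef struct {
    int A[MAX];
    int top;   /* index of the top element; -1 when the stack is empty */
} STACK;

is the following:

void clear(STACK *pS)
{
    pS->top = -1;
}

BOOLEAN isEmpty(STACK *pS)
{
    return (pS->top < 0);
}

BOOLEAN isFull(STACK *pS)
{
    return (pS->top >= MAX-1);
}

BOOLEAN pop(STACK *pS, int *px)
{
    if (isEmpty(pS))
        return FALSE;
    *px = pS->A[(pS->top)--];
    return TRUE;
}

BOOLEAN push(int x, STACK *pS)
{
    if (isFull(pS))
        return FALSE;
    pS->A[++(pS->top)] = x;
    return TRUE;
}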
[A stack (a1 , . . . , an ) represented as a linked list; the top element an is in the first cell.]
The type definition macro we have used for list cells can as well be used for
stacks. The macro
DefCell(int, CELL, STACK);
defines stacks of integers, expanding into
typedef struct CELL *STACK;
struct CELL {
int element;
STACK next;
};
With this representation, the five operations can be implemented by the functions
in Fig. 6.18. We assume that malloc never runs out of space, which means that
the isFull operation always returns FALSE, and the push operation never fails.
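Fig. 6.18 is not reproduced in this text; a sketch of the five operations on this linked representation is:

void clear(STACK *pS)
{
    (*pS) = NULL;
}

BOOLEAN isEmpty(STACK *pS)
{
    return ((*pS) == NULL);
}

BOOLEAN isFull(STACK *pS)
{
    return FALSE;   /* a linked-list stack is never full */
}

BOOLEAN pop(STACK *pS, int *px)
{
    STACK temp;

    if ((*pS) == NULL)
        return FALSE;
    *px = (*pS)->element;
    temp = (*pS);
    (*pS) = (*pS)->next;
    free(temp);
    return TRUE;
}

BOOLEAN push(int x, STACK *pS)
{
    STACK temp;

    temp = (STACK) malloc(sizeof(struct CELL));
    temp->element = x;
    temp->next = (*pS);
    (*pS) = temp;
    return TRUE;    /* by assumption, malloc never fails */
}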
The effects of push and pop on a stack implemented as a linked list are illus-
trated in Fig. 6.19.
[Fig. 6.19, push and pop operations on a stack implemented as a linked list: (a) a list L = (a, b, c); (b) the list (x, a, b, c) that results from push(x, &L); (c) the list (b, c) that results from popping the original list L.]
EXERCISES
6.6.1: Show the stack that remains after executing the following sequence of oper-
ations, starting with an empty stack: push(a), push(b), pop, push(c), push(d), pop,
push(e), pop, pop.
6.6.2: Using only the five operations on stacks discussed in this section to ma-
nipulate the stack, write a C program to evaluate postfix expressions with integer
operands and the four usual arithmetic operators, following the algorithm suggested
in Example 6.9. Show that you can use either the array or the linked-list imple-
mentation with your program by defining the data type STACK appropriately and
including with your program first the functions of Fig. 6.17, and then the functions
of Fig. 6.18.
✦
✦ ✦
✦
6.7 Implementing Function Calls Using a Stack
An important application of stacks is normally hidden from view: a stack is used to
allocate space in the computer’s memory to the variables belonging to the various
functions of a program. We shall discuss the mechanism used in C, although a
similar mechanism is used in almost every other programming language as well.
int fact(int n)
{
(1) if (n <= 1)
(2) return 1; /* basis */
else
(3) return n*fact(n-1); /* induction */
}
To see what the problem is, consider the simple, recursive factorial function
fact from Section 2.7, which we reproduce here as Fig. 6.20. The function has a
parameter n and a return value. As fact calls itself recursively, different calls are
active at the same time. These calls have different values of the parameter n and
produce different return values. Where are these different objects with the same
names kept?
Run-time To answer the question, we must learn a little about the run-time organization
organization associated with a programming language. The run-time organization is the plan
used to subdivide the computer’s memory into regions to hold the various data items
[Fig. 6.21: Typical subdivision of run-time memory into four regions: the code area, the static data area, the run-time stack, and the heap.]
Figure 6.21 shows a typical subdivision of run-time memory. The first area
contains the object code for the program being executed. The next area contains
Static data the static data for the program, such as the values of certain constants and exter-
Run-time stack nal variables used by the program. The third area is the run-time stack, which
grows downward toward the higher addresses in memory. At the highest-numbered
memory locations is the heap, an area set aside for the objects that are dynamically
allocated using malloc.6
The run-time stack holds the activation records for all the currently live acti-
vations. A stack is the appropriate structure, because when we call a function, we
can push an activation record onto the stack. At all times, the currently executing
activation A1 has its activation record at the top of the stack. Just below the top of
the stack is the activation record for the activation A2 that called A1 . Below A2 ’s
activation record is the record for the activation that called A2 , and so on. When
a function returns, we pop its activation record off the top of stack, exposing the
activation record of the function that called it. That is exactly the right thing to
do, because when a function returns, control passes to the calling function.
✦ Example 6.10. Consider the skeletal program shown in Fig. 6.22. This program
is nonrecursive, and there is never more than one activation for any one function.
6 Do not confuse this use of the term “heap” with the heap data structure discussed in Section
5.9.
void P();
void Q();
main() {
int x, y, z;
P(); /* Here */
}
void P()
{
int p1, p2;
Q();
}
void Q()
{
int q1, q2, q3;
···
}
When the main function starts to execute, its activation record containing the space
for the variables x, y, and z is pushed onto the stack. When function P is called,
at the place marked Here, its activation record, which contains the space for the
variables p1 and p2, is pushed onto the stack.7 When P calls Q, Q’s activation record
is pushed onto the stack. At this point, the stack is as shown in Fig. 6.23.
When Q finishes executing, its activation record is popped off the stack. At
that time, P is also finished, and so its activation record is popped. Finally, main
too is finished and has its activation record popped off the stack. Now the stack is
empty, and the program is finished. ✦
✦ Example 6.11. Consider the recursive function fact from Fig. 6.20. There
may be many activations of fact live at any one time, but each one will have an
activation record of the same form, namely
n
fact
consisting of a word for the parameter n, which is filled initially, and a word for the
return value, which we have denoted fact. The return value is not filled until the
last step of the activation, just before the return.
7 Notice that the activation record for P has two data objects, and so is of a “type” different
from that of the activation record for the main program. However, we may regard all ac-
tivation record forms for a program as variants of a single record type, thus preserving the
viewpoint that a stack has all its elements of the same type.
[Fig. 6.23: The run-time stack at this point; from the bottom, the activation record for main (holding x, y, and z), then the record for P (holding p1 and p2), and on top the record for Q (holding q1, q2, and q3).]
Suppose we call fact(4). Then we create one activation record, of the form
n 4
fact -
As fact(4) calls fact(3), we next push an activation record for that activation
onto the run-time stack, which now appears as
n 4
fact -
n 3
fact -
Note that there are two locations named n and two named fact. There is no
confusion, since they belong to different activations and only one activation record
can be at the top of the stack at any one time: the activation record belonging to
the currently executing activation.
n 4
fact -
n 3
fact -
n 2
fact -
n 1
fact -
Then fact(3) calls fact(2), which calls fact(1). At that point, the run-time
stack is as in Fig. 6.24. Now fact(1) makes no recursive call, but assigns fact = 1.
The value 1 is thus placed in the slot of the top activation record reserved for fact.
n 4
fact -
n 3
fact -
n 2
fact -
n 1
fact 1
The other slots labeled fact are unaffected, as shown in Fig. 6.25.
Then, fact(1) returns, exposing the activation record for fact(2) and return-
ing control to the activation fact(2) at the point where fact(1) was called. The
return value, 1, from fact(1) is multiplied by the value of n in the activation record
for fact(2), and the product is placed in the slot for fact of that activation record,
as required by line (3) in Fig. 6.20. The resulting stack is shown in Fig. 6.26.
n 4
fact -
n 3
fact -
n 2
fact 2
Similarly, fact(2) then returns control to fact(3), and the activation record
for fact(2) is popped off the stack. The return value, 2, multiplies n of fact(3),
producing the return value 6. Then, fact(3) returns, and its return value multiplies
n in fact(4), producing the return value 24. The stack is now
n 4
fact 24
EXERCISES
6.7.1: Consider the C program of Fig. 6.27. The activation record for main has a
slot for the integer i. The important data in the activation record for sum is
#define MAX 4
int A[MAX];
int sum(int i);
main()
{
    int i;

(1)     for (i = 0; i < MAX; i++)
(2)         scanf("%d", &A[i]);
(3)     printf("%d\n", sum(0));
}

int sum(int i)
{
(4)     if (i >= MAX)
(5)         return 0;
        else
(6)         return A[i] + sum(i+1);
}
1. The parameter i.
2. The return value.
3. An unnamed temporary location, which we shall call temp, to store the value
of sum(i+1). The latter is computed in line (6) and then added to A[i] to form
the return value.
Show the stack of activation records immediately before and immediately after each
call to sum, on the assumption that the value of A[i] is 10i. That is, show the stack
immediately after we have pushed an activation record for sum, and just before
we pop an activation record off the stack. You need not show the contents of the
bottom activation record (for main) each time.
6.7.2*: The function delete of Fig. 6.28 removes the first occurrence of integer x
from a linked list composed of the usual cells defined by
DefCell(int, CELL, LIST);
The activation record for delete consists of the parameters x and pL. However,
since pL is a pointer to a list, the value of the second parameter in the activation
record is not a pointer to the first cell on the list, but rather a pointer to a pointer
to the first cell. Typically, an activation record will hold a pointer to the next field
of some cell. Show the sequence of stacks when delete(3,&L) is called (from some
other function) and L is a pointer to the first cell of a linked list containing elements
1, 2, 3, and 4, in that order.
✦
✦ ✦
✦
6.8 Queues
Another important ADT based on the list data model is the queue, a restricted
form of list in which elements are inserted at one end, the rear, and removed from
the other end, the front. The term “FIFO (first-in first-out) list” is a synonym for
queue.
The intuitive idea behind a queue is a line at a cashier’s window. People enter
the line at the rear and receive service once they reach the front. Unlike a stack,
there is fairness to a queue; people are served in the order in which they enter the
line. Thus the person who has waited the longest is the one who is served next.
Operations on a Queue
The abstract model of a queue is the same as that of a list (or a stack), but the
operations applied are special. The two operations that are characteristic of a
queue are enqueue and dequeue; enqueue(x) adds x to the rear of a queue, and dequeue
removes the element from the front of the queue. As is true of stacks, there are
certain other useful operations that we may want to apply to queues.
Let Q be a queue whose elements are of type ETYPE, and let x be an element
of type ETYPE. We shall consider the following operations on queues:
1. clear(Q). Make the queue Q empty.
2. dequeue(Q, x). If Q is empty, return FALSE; otherwise, set x to the value of
the element at the front of Q, remove this element from Q, and return TRUE.
3. enqueue(x, Q). If Q is full, return FALSE; otherwise, add the element x to the
rear of Q and return TRUE.
4. isEmpty(Q). Return TRUE if Q is empty and FALSE otherwise.
5. isFull(Q). Return TRUE if Q is full and FALSE otherwise.
As with stacks, we can have more “trusting” versions of enqueue and dequeue that
do not check for a full or empty queue, respectively. Then enqueue does not return
a value, and dequeue takes only Q as an argument and returns the value dequeued.
A convenient data structure for queues is the linked list, with an added pointer to
the last cell so that we can add an element to the rear in O(1) time:

typedef struct {
LIST front, rear;
} QUEUE;
If the queue is empty, front will be NULL, and the value of rear is then irrelevant.
Figure 6.29 gives programs for the queue operations mentioned in this section.
Note that when a linked list is used there is no notion of a “full” queue, and so isFull
returns FALSE always. However, if we used some sort of array-based implementation
of a queue, there would be the possibility of a full queue.
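Fig. 6.29 is not reproduced in this text; a sketch of the five operations on this representation is:

void clear(QUEUE *pQ)
{
    pQ->front = NULL;
}

BOOLEAN isEmpty(QUEUE *pQ)
{
    return (pQ->front == NULL);
}

BOOLEAN isFull(QUEUE *pQ)
{
    return FALSE;   /* a linked-list queue is never full */
}

BOOLEAN dequeue(QUEUE *pQ, int *px)
{
    LIST temp;

    if (isEmpty(pQ))
        return FALSE;
    *px = pQ->front->element;
    temp = pQ->front;
    pQ->front = pQ->front->next;
    free(temp);
    return TRUE;
}

BOOLEAN enqueue(int x, QUEUE *pQ)
{
    LIST temp;

    temp = (LIST) malloc(sizeof(struct CELL));
    temp->element = x;
    temp->next = NULL;
    if (isEmpty(pQ))
        pQ->front = temp;      /* new cell is both front and rear */
    else
        pQ->rear->next = temp;
    pQ->rear = temp;
    return TRUE;
}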
EXERCISES
6.8.1: Show the queue that remains after executing the following sequence of opera-
tions, starting with an empty queue: enqueue(a), enqueue(b), dequeue, enqueue(c),
enqueue(d), dequeue, enqueue(e), dequeue, dequeue.
6.8.2: Show that each of the functions in Fig. 6.29 can be executed in O(1) time,
regardless of the length of the queue.
6.8.3*: We can represent a queue by an array, provided that the queue does not
grow too long. In order to make the operations take O(1) time, we must think of
the array as circular. That is, the array A[0..n-1] can be thought of as having
A[1] follow A[0], A[2] follow A[1], and so on, up to A[n-1] following A[n-2],
but also A[0] follows A[n-1]. The queue is represented by a pair of integers front
and rear that indicate the positions of the front and rear elements of the queue.
An empty queue is represented by having front be the position that follows rear
in the circular sense; for example, front = 23 and rear = 22, or front = 0 and
rear = n − 1. Note that therefore, the queue cannot have n elements, or that
condition too would be represented by front immediately following rear. Thus the
queue is full when it has n−1 elements, not when it has n elements. Write functions
for the queue operations assuming the circular array data structure. Do not forget
to check for full and empty queues.
6.8.4*: Show that if (a1 , a2 , . . . , an ) is a queue with a1 at the front, then ai was
enqueued before ai+1 , for i = 1, 2, . . . , n − 1.
✦
✦ ✦
✦
6.9 Longest Common Subsequences
This section is devoted to an interesting problem about lists. Suppose we have two
lists and we want to know what differences there are between them. This problem
appears in many different guises; perhaps the most common occurs when the two
lists represent two different versions of a text file and we want to determine which
lines are common to the two versions. For notational convenience, throughout this
section we shall assume that lists are character strings.
A useful way to think about this problem is to treat the two files as sequences
of symbols, x = a1 · · · am and y = b1 · · · bn , where ai represents the ith line of the
first file and bj represents the jth line of the second file. Thus an abstract symbol
like ai may really be a “big” object, perhaps a full sentence.
Diff command There is a UNIX command diff that compares two text files for their differ-
ences. One file, x, might be the current version of a program and the other, y,
might be the version of the program before a small change was made. We could
use diff to remind ourselves of the changes that were made turning y into x. The
typical changes that are made to a text file are
1. Inserting a line.
2. Deleting a line.
A modification of a line can be treated as a deletion followed by an insertion.
Usually, if we examine two text files in which a small number of such changes
have been made when transforming one into the other, it is easy to see which lines
correspond to which, and which lines have been deleted and which inserted. The
diff command makes the assumption that one can identify what the changes are by
first finding a longest common subsequence, or LCS, of the two lists whose elements
are the lines of the two text files involved. An LCS represents those lines that have
not been changed.
Recall that a subsequence is formed from a list by deleting zero or more elements,
keeping the remaining elements in order. A common subsequence of two lists
is a list that is a subsequence of both. A longest common subsequence of two lists
is a common subsequence that is as long as any common subsequence of the two
lists.
✦ Example 6.12. The strings abcabba and cbabac have several common subsequences;
the longest are baba and cbba, each of length four, and there
are no common subsequences of length five or more. That fact will follow from the
algorithm we describe next. ✦
✦ Example 6.13. Figure 6.30(a) shows one of two possible matchings between
strings abcabba and cbabac corresponding to the common subsequence baba and
Fig. 6.30(b) shows a matching corresponding to cbba. ✦
[Fig. 6.30. Two matchings between abcabba and cbabac: (a) one corresponding to the common subsequence baba; (b) one corresponding to cbba. In the original, lines connect the matched symbols of the two strings.]
Thus let us consider any matching between prefixes (a1 , . . . , ai ) and (b1 , . . . , bj ).
There are two cases, depending on whether or not the last symbols of the two lists
are equal.
a) If ai ≠ bj , then the matching cannot include both ai and bj . Thus an LCS of
(a1 , . . . , ai ) and (b1 , . . . , bj ) must be either
i) An LCS of (a1 , . . . , ai−1 ) and (b1 , . . . , bj ), or
ii) An LCS of (a1 , . . . , ai ) and (b1 , . . . , bj−1 ).
If we have already found the lengths of the LCS’s of these two pairs of prefixes,
then we can take the larger to be the length of the LCS of (a1 , . . . , ai ) and
(b1 , . . . , bj ). This situation is formalized in rule (2) of the induction that follows.
b) If ai = bj , we can match ai and bj , and the matching will not interfere
with any other potential matches. Thus the length of the LCS of (a1 , . . . , ai )
and (b1 , . . . , bj ) is 1 greater than the length of the LCS of (a1 , . . . , ai−1 ) and
(b1 , . . . , bj−1 ). This situation is formalized in rule (3) of the following induction.
These observations let us give a recursive definition for L(i, j), the length of the
LCS of (a1 , . . . , ai ) and (b1 , . . . , bj ). We use complete induction on the sum i + j.
BASIS. If i + j = 0, then both i and j are 0, and so the LCS is ε, the empty list. Thus L(0, 0) = 0.
INDUCTION. Consider i and j, and suppose we have already computed L(g, h) for
any g and h such that g + h < i + j. There are three cases to consider.
1. If either i or j is 0, then L(i, j) = 0.
2. If i > 0 and j > 0, and ai ≠ bj , then L(i, j) = max(L(i, j − 1), L(i − 1, j)).
3. If i > 0 and j > 0, and ai = bj , then L(i, j) = 1 + L(i − 1, j − 1).
8 Strictly speaking, we discussed only big-oh expressions that are a function of one variable.
However, the meaning here should be clear. If T (m, n) is the running time of the program
on lists of length m and n, then there are constants m0 , n0 , and c such that for all m ≥ m0
and n ≥ n0 , we have T (m, n) ≤ cmn.
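The recursive definition above translates directly into a table-filling (dynamic-programming) program; the book's Fig. 6.31, referred to in the exercises, is such a program, but it is not reproduced in this text. A minimal sketch under our own naming, assuming null-terminated C strings of length at most a hypothetical bound MAX, is:

#include <string.h>

#define MAX 100    /* hypothetical bound on string length */

/* Fill table[i][j] with L(i, j), the length of an LCS of the first i
   characters of x and the first j characters of y. In C, x[i-1]
   holds the symbol the text calls a_i. */
void LCStable(char *x, char *y, int table[MAX+1][MAX+1])
{
    int i, j;
    int m = strlen(x), n = strlen(y);

    for (i = 0; i <= m; i++) table[i][0] = 0;     /* rule (1) */
    for (j = 0; j <= n; j++) table[0][j] = 0;     /* rule (1) */
    for (i = 1; i <= m; i++)
        for (j = 1; j <= n; j++)
            if (x[i-1] == y[j-1])                 /* rule (3) */
                table[i][j] = 1 + table[i-1][j-1];
            else                                  /* rule (2) */
                table[i][j] = table[i][j-1] > table[i-1][j]
                                  ? table[i][j-1] : table[i-1][j];
}

Each of the (m + 1)(n + 1) entries is filled in O(1) time, which is the source of the O(mn) running time mentioned in the footnote.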
Dynamic Programming
The term “dynamic programming” comes from a general theory developed by R.
E. Bellman in the 1950’s for solving problems in control systems. People who work
in the field of artificial intelligence often speak of the technique under the name
memoing or tabulation.
✦ Example 6.14. Let x be the list cbabac and y the list abcabba. Figure 6.32
shows the table constructed for these two lists. For instance, L(6, 7) is a case where
a6 ≠ b7 . Thus L(6, 7) is the larger of the entries just below and just to the left.
Since these are 4 and 3, respectively, we set L(6, 7), the entry in the upper right
corner, to 4. Now consider L(4, 5). Since both a4 and b5 are the symbol b, we add
1 to the entry L(3, 4) that we find to the lower left. Since that entry is 2, we set
L(4, 5) to 3. ✦
Recovery of an LCS
We now have a table giving us the length of the LCS, not only for the lists in
question, but for each pair of their prefixes. From this information we must deduce
one of the possible LCS’s for the two lists in question. To do so, we shall find the
matching pairs of elements that form one of the LCS’s. We shall find a path through
the table, beginning at the upper right corner; this path will identify an LCS.
Suppose that our path, starting at the upper right corner, has taken us to row
i and column j, the point in the table that corresponds to the pair of elements ai
and bj .
c  6   0  1  2  3  3  3  3  4
a  5   0  1  2  2  3  3  3  4
b  4   0  1  2  2  2  3  3  3
a  3   0  1  1  1  2  2  2  3
b  2   0  0  1  1  1  2  2  2
c  1   0  0  0  1  1  1  1  1
   0   0  0  0  0  0  0  0  0

       0  1  2  3  4  5  6  7
          a  b  c  a  b  b  a

Fig. 6.32. Table of longest common subsequences for cbabac and abcabba.
c  6   0  1  2  3  3  3  3  4
a  5   0  1  2  2  3  3  3  4
b  4   0  1  2  2  2  3  3  3
a  3   0  1  1  1  2  2  2  3
b  2   0  0  1  1  1  2  2  2
c  1   0  0  0  1  1  1  1  1
   0   0  0  0  0  0  0  0  0

       0  1  2  3  4  5  6  7
          a  b  c  a  b  b  a

Fig. 6.33. The table of Fig. 6.32 again; the path traced in Example 6.15, from L(6, 7) back to L(0, 0), appears in bold in the original figure.
✦ Example 6.15. The table of Fig. 6.32 is shown again in Fig. 6.33, with a
path shown in bold. We start with L(6, 7), which is 4. Since a6 ≠ b7 , we look
immediately to the left and down to find the value 4, which must appear in at least
one of these places. In this case, 4 appears only below, and so we go to L(5, 7). Now
a5 = b7 ; both are a. Thus a is the last symbol of the LCS, and we move southwest,
to L(4, 6).
Since a4 and b6 are both b, we include b, ahead of a, in the LCS being formed,
and we again move southwest, to L(3, 5). Here, we find a3 ≠ b5 , but L(3, 5), which
is 2, equals both the entry below and the entry to the left. We have elected in this
situation to move down, so we next move to L(2, 5). There we find a2 = b5 = b,
and so we put a b ahead of the LCS being formed and move southwest to L(1, 4).
Since a1 ≠ b4 and only the entry to the left has the same value (1) as L(1, 4),
we move to L(1, 3). Now we have a1 = b3 = c, and so we add c to the beginning
of the LCS and move to L(0, 2). At this point, we have no choice but to move left
to L(0, 1) and then L(0, 0), and we are done. The resulting LCS consists of the
four characters we discovered, in the reverse order, or cbba. That happens to be
one of the two LCS’s we mentioned in Example 6.12. We can obtain other LCS’s
by choosing to go left instead of down when L(i, j) equals both L(i, j − 1) and
L(i − 1, j), and by choosing to go left or down when one of these equals L(i, j), even
in the situation when ai = bj (i.e., by skipping certain matches in favor of matches
farther to the left). ✦
We can prove that this path-finding algorithm always finds an LCS. The statement
that we prove by complete induction on the sum of the lengths of the lists is:

STATEMENT S(k): If we are at an entry L(i, j), where i + j = k, then we shall find L(i, j) elements for our LCS.

BASIS. If k = 0, then i = j = 0. Since L(0, 0) = 0, we find zero elements, and so S(0) holds.
INDUCTION. Assume the inductive hypothesis for sums k or less, and let i + j =
k + 1. Suppose we are at L(i, j), which has value v. If ai = bj , then we find one
match and move to L(i − 1, j − 1). Since the sum (i − 1) + (j − 1) is less than i + j,
the inductive hypothesis applies. Since L(i − 1, j − 1) must be v − 1, we know that
we shall find v − 1 more elements for our LCS, which, with the one element just
found, will give us v elements. That observation proves the inductive hypothesis in
this case.
The only other case is when ai 6= bj . Then, either L(i − 1, j) or L(i, j − 1), or
both, must have the value v, and we move to one of these positions that does have
the value v. Since the sum of the row and column is i + j − 1 in either case, the
inductive hypothesis applies, and we conclude that we find v elements for the LCS.
Again we can conclude that S(k + 1) is true. Since we have considered all cases,
we are done and conclude that if we are at an entry L(i, j), we always find L(i, j)
elements for our LCS.
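The path just described is easy to program. Here is a minimal sketch of a recovery function, assuming a table filled by the hypothetical LCStable sketch given earlier; when the entries below and to the left are equal, it elects to move down, as in Example 6.15.

#include <stdio.h>

/* Print one LCS of x and y, given the filled table; call initially with
   i = strlen(x) and j = strlen(y), the upper right corner. Because the
   recursion reaches L(0, 0) before any printing is done, the matched
   characters come out in the correct, not the reverse, order. */
void printLCS(char *x, char *y, int table[MAX+1][MAX+1], int i, int j)
{
    if (i == 0 || j == 0) return;              /* the path has ended */
    if (x[i-1] == y[j-1]) {                    /* a_i = b_j: a match */
        printLCS(x, y, table, i-1, j-1);       /* move southwest */
        putchar(x[i-1]);
    }
    else if (table[i-1][j] >= table[i][j-1])   /* elect to move down */
        printLCS(x, y, table, i-1, j);
    else                                       /* move left */
        printLCS(x, y, table, i, j-1);
}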
EXERCISES
6.9.2*: Find all the LCS’s of the pairs of lists from Exercise 6.9.1. Hint : After
building the table from Exercise 6.9.1, trace backward from the upper right corner,
following each choice in turn when you come to a point that could be explained in
two or three different ways.
6.9.3**: Suppose we use the recursive algorithm for computing the LCS that we
described first (instead of the table-filling program that we recommend). If we call
L(4, 4) with two lists having no symbols in common, how many calls to L(1, 1) are
made? Hint : Use a table-filling (dynamic programming) algorithm to compute a
table giving the value of L(i, j) for all i and j. Compare your result with Pascal’s
triangle from Section 4.5. What does this relationship suggest about a formula for
the number of calls?
6.9.4**: Suppose we have two lists x and y, each of length n. For n below a certain
size, there can be at most one string that is an LCS of x and y (although that string
may occur in different positions of x and/or y). For example, if n = 1, then the
LCS can only be ε, unless x and y are both the same symbol a, in which case a
is the only LCS. What is the smallest value of n for which x and y can have two
different LCS’s?
6.9.5: Show that the program of Fig. 6.31 has running time O(mn).
6.9.6: Write a C program to take a table, such as that computed by the program
of Fig. 6.31, and find the positions, in each string, of one LCS. What is the running
time of your program, if the table is m by n?
6.9.7: In the beginning of this section, we suggested that the length of an LCS and
the size of the largest matching between positions of two strings were related.
a) Prove that if two strings have a common subsequence of length k, then they
have a matching of length k.
b) Prove that if two strings have a matching of length k, then they have a common
subsequence of length k.
c) Conclude from (a) and (b) that the lengths of the LCS and the greatest size of
a matching are the same.
✦
✦ ✦
✦
6.10 Representing Character Strings
Character strings are probably the most common form of list encountered in prac-
tice. There are a great many ways to represent strings, and some of these techniques
are rarely appropriate for other kinds of lists. Therefore, we shall devote this section
to the special issues regarding character strings.
First, we should realize that storing a single character string is rarely the whole
problem. Often, we have a large number of character strings, each rather short.
They may form a dictionary, meaning that we insert and delete strings from the
population as time goes on, or they may be a static set of strings, unchanging over
time. The following are two typical examples.
Concordance 1. A useful tool for studying texts is a concordance, a list of all the words used in
the document and the places in which they occur. There will typically be tens
of thousands of different words used in a large document, and each occurrence
must be stored once. The set of words used is static; that is, once formed it does
not change, except perhaps if there were errors in the original concordance.
2. The compiler that turns a C program into machine code must keep track of
all the character strings that represent variables of the program. A large pro-
gram may have hundreds or thousands of variable names, especially when we
remember that two local variables named i that are declared in two functions
are really two distinct variables. As the compiler scans the program, it finds
new variable names and inserts them into the set of names. Once the com-
piler has finished compiling a function, the variables of that function are not
available to subsequent functions, and so may be deleted.
In both of these examples, there will be many short character strings. Short
words abound in English, and programmers like to use single letters such as i or x
for variables. On the other hand, there is no limit on the length of words, either in
English texts or in programs.
Character Strings in C
Character-string constants, as might appear in a C program, are stored as arrays of
characters, followed by the special character '\0', called the null character, whose
value is 0. However, in applications such as the ones mentioned above, we need the
facility to create and store new strings as a program runs. Thus, we need a data
structure in which we can store arbitrary character strings. Some of the possibilities
are:
1. Use a fixed-length array to hold character strings. Strings shorter than the
array are followed by a null character. Strings longer than the array cannot be
stored in their entirety. They must be truncated by storing only their prefix of
length equal to the length of the array.
2. A scheme similar to (1), but assume that every string, or prefix of a truncated
string, is followed by the null character. This approach simplifies the reading
of strings, but it reduces by one the number of string characters that can be
stored.
3. A scheme similar to (1), but instead of following strings by a null character,
use another integer length to indicate how long the string really is.
4. To avoid the restriction of a maximum string length, we can store the characters
of the string as the elements of a linked list. Possibly, several characters can
be stored in one cell.
5. We may create a large array of characters in which individual character strings
are placed. A string is then represented by a pointer to a place in the array
where the string begins. Strings may be terminated by a null character or they
may have an associated length.
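As an illustration, scheme (3) might be declared along the following lines; the type and function names here are hypothetical:

#include <string.h>

#define MAX 32        /* fixed array length */

typedef struct {
    int length;       /* how many characters are actually stored */
    char chars[MAX];  /* the characters; no null terminator is needed */
} FIXEDSTR;

/* Store s into fs, truncating to its prefix of length MAX if necessary. */
void storeString(FIXEDSTR *fs, const char *s)
{
    int n = strlen(s);
    fs->length = n < MAX ? n : MAX;
    memcpy(fs->chars, s, fs->length);
}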
✦ Example 6.16. Consider the data structure we might use to hold one entry in
a concordance, that is, a single word and its associated information. We need to
hold
1. The word itself.
2. The number of times the word appears.
3. A list of the lines of the document in which there are one or more occurrences
of the word.
Thus we might use the following structure:
typedef struct {
char word[MAX];
int occurrences;
LIST lines;
} WORDCELL;
Here, MAX is the maximum length of a word. All WORDCELL structures have an array
called word of MAX bytes, no matter how short the word happens to be.
The field occurrences is a count of the number of times the word appears,
and lines is a pointer to the beginning of a linked list of cells. These cells are of
the conventional type defined by the macro
DefCell(int, CELL, LIST);
Each cell holds one integer, representing a line on which there are one or more
occurrences of the word in question. Note that occurrences could be larger than
the length of the list, if the word appeared several times on one line.
In Fig. 6.34 we see the structure for the word earth in the first chapter of
Genesis. We assume MAX is at least 6. The complete list of line (verse) numbers is
(1, 2, 10, 11, 12, 15, 17, 20, 22, 24, 25, 26, 28, 29, 30).
word: "earth\0"
occurrences: 20
lines: 1 2 10 ··· 30 •
Fig. 6.34. Concordance entry for the word earth in the first chapter of Genesis.
We could organize the WORDCELL structures into a binary search tree, by adding left- and right-child fields, or we could string the
structures in a linked list, by adding a "next" field to the type WORDCELL instead.
That would be a simpler structure, but it would be less efficient if the number of
words is large. We shall see, in the next chapter, how to arrange these structures
in a hash table, which probably offers the best performance of all data structures
for this problem. ✦
[Figure: under scheme (4), the word earth stored in a linked list with one character per cell: e → a → r → t → h → •]
This scheme removes any upper limit on the length of words, but it is, in practice,
not very economical of space. The reason is that each structure of type CHARCELL
takes at least five bytes, assuming one for the character and a typical four for a
pointer to the next cell on the list. Thus, the great majority of the space is used
for the “overhead” of pointers rather than the “payload” of characters.
Packing characters into cells
We can be a bit more clever, however, if we pack several bytes into the data
field of each cell. For example, if we put four characters into each cell, and pointers
consume four bytes, then half our space will be used for "payload," compared with
20% payload in the one-character-per-cell scheme. The only caution is that we
must have some character, such as the null character, that can serve as a string-
terminating character, as is the case for character strings stored in arrays. In
general, if CPC (characters per cell) is the number of characters that we are willing
to place in one cell, we can declare cells by
typedef struct CHARCELL *CHARSTRING;
struct CHARCELL {
char characters[CPC];
CHARSTRING next;
};
For example, if CPC = 4, then we could store the word earth in two cells, as
[e a r t] → [h \0 – –] •
We could also increase CPC above 4. As we do so, the fraction of space taken
for pointers decreases, which is good; it means that the overhead of using linked
lists rather than arrays is dropping. On the other hand, if we used a very large
value for CPC, we would find that almost all words used only one cell, but that cell
would have many unused locations in it, just as an array of length CPC would.
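To traverse a packed string we follow the next pointers, scanning up to CPC characters in each cell. For instance, a minimal length function, assuming the CHARCELL declaration above with CPC defined as some constant such as 4, might be:

/* Number of characters in a packed string, assuming that, unless the
   string exactly fills its cells, it is terminated by '\0' within the
   final cell. */
int packedLength(CHARSTRING s)
{
    int n = 0, i;
    for ( ; s != NULL; s = s->next)
        for (i = 0; i < CPC; i++) {
            if (s->characters[i] == '\0')
                return n;
            n++;
        }
    return n;
}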
[Fig. 6.36. A WORDCELL whose word field gives the position of its word in the large array space; positions 0 through 8 of space hold i n * t h e * b e ···, and the structure shown has occurrences = 1377 and a list of lines beginning with 1.]
As in Example 6.16, the structures of Example 6.18 can be formed into data
structures such as binary search trees or linked lists by adding the appropriate
pointer fields to the WORDCELL structure. The function lt(W1 , W2 ) that compares
two WORDCELL’s W1 and W2 follows the word fields of these structures and compares
them lexicographically.
To build a concordance using such a binary search tree, we maintain a pointer
available to the first unoccupied position in the array space. Initially, available
points to space[0]. Suppose we are scanning the text for which the concordance
is being built and we find the next word — say, the. We do not know whether or
not the is already in the binary search tree. We thus temporarily add the* to the
position indicated by available and the three following positions. We remember
that the newly added word takes up 4 bytes.
Now we can search for the word the in the binary search tree. If found, we
add 1 to its count of occurrences and insert the current line into the list of lines.
If not found, we create a new node — which includes the fields of the WORDCELL
structure, plus left- and right-child pointers (both NULL) — and insert it into the
tree at the proper place. We set the word field in the new node to available, so
that it refers to our copy of the word the. We set occurrences to 1 and create a
list for the field lines consisting of only the current line of text. Finally, we must
add 4 to available, since the word the has now been added permanently to the
space array.
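The search-and-insert step just described might be sketched as follows. All the names here — record, compareWords, the tree type, and the constants — are our own illustration rather than the book's code, and the handling of the list of lines is elided.

#include <stdlib.h>

#define SPACE_SIZE 1000000   /* hypothetical size for the array space */

char space[SPACE_SIZE];      /* all words, each followed by the endmarker '*' */
int available = 0;           /* first unoccupied position in space */

typedef struct NODE *TREE;
struct NODE {                /* the fields of a WORDCELL, plus children */
    int word;                /* position of the word within space */
    int occurrences;
    /* LIST lines would go here; elided in this sketch */
    TREE left, right;
};

/* Compare the '*'-terminated words at positions i and j of space;
   the result is negative, zero, or positive, as for strcmp. */
int compareWords(int i, int j)
{
    while (space[i] != '*' && space[i] == space[j]) { i++; j++; }
    if (space[i] == '*' && space[j] == '*') return 0;
    if (space[i] == '*') return -1;
    if (space[j] == '*') return 1;
    return space[i] - space[j];
}

/* The word just scanned has been copied temporarily into space at
   position available, occupying len bytes including the '*'. Search
   the tree rooted at t, updating or inserting the word's node; return
   the possibly new root. Advancing available makes the copy permanent;
   leaving it alone lets the next word overwrite the copy. */
TREE record(TREE t, int len)
{
    int cmp;
    if (t == NULL) {                   /* new word */
        t = (TREE) malloc(sizeof(struct NODE));
        t->word = available;
        t->occurrences = 1;
        t->left = t->right = NULL;
        available += len;
        /* also create the list lines, holding the current line */
    }
    else if ((cmp = compareWords(available, t->word)) == 0) {
        t->occurrences++;              /* word found; copy is discarded */
        /* also add the current line to t->lines if not already there */
    }
    else if (cmp < 0)
        t->left = record(t->left, len);
    else
        t->right = record(t->right, len);
    return t;
}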
EXERCISES
6.10.1: For the structure type WORDCELL discussed in Example 6.16, write the
following programs:
a) A function create that returns a pointer to a structure of type WORDCELL.
b) A function insert(WORDCELL *pWC, int line) that takes a pointer to the
structure WORDCELL and a line number, adds 1 to the number of occurrences
for that word, and adds that line to the list of lines if it is not already there.
6.10.2: Redo Example 6.17 under the assumption that any word length from 1 to
40 is equally likely; that is, 10% of the words are of length 1–4, 10% are of length
5–8, and so on, up to 10% in the range 37–40. What is the average number of bytes
required if CPC is 4, 8, . . . , 40?
6.10.3*: If, in the model of Example 6.17, all word lengths from 1 to n are equally
likely, what value of CPC, as a function of n, minimizes the number of bytes used?
If you cannot get the exact answer, a big-oh approximation is useful.
6.10.4*: One advantage of using the structure of Example 6.18 is that one can share
parts of the space array among two or more words. For example, the structure for
the word he could have word field equal to 5 in the array of Fig. 6.36. Compress
the words all, call, man, mania, maniac, recall, two, woman into as few elements
of the space array as you can. How much space do you save by compression?
6.10.7: Write a program to take two WORDCELL’s as discussed in Example 6.18 and
determine which one’s word precedes the other in lexicographic order. Recall that
words are terminated by * in this example.
✦
✦ ✦
✦
6.11 Summary of Chapter 6
✦ Linked lists and arrays are two data structures that can be used to implement
lists.
✦ Lists are a simple implementation of dictionaries, but their efficiency does not
compare with that of the binary search tree of Chapter 5 or the hash table to
be covered in Chapter 7.
✦ Placing a “sentinel” at the end of an array to make sure we find the element
we are seeking is a useful efficiency improver.
✦
✦ ✦
✦
6.12 Bibliographic Notes for Chapter 6
Knuth [1968] is still the fundamental source on list data structures. While it is
hard to trace the origins of very basic notions such as “list” or “stack,” the first
programming language to use lists as a part of its data model was IPL-V (Newell et
al. [1961]), although among the early list-processing languages, only Lisp (McCarthy
et al. [1962]) survives among the currently important languages. Lisp, by the way,
stands for “LISt Processing.”
The use of stacks in run-time implementation of recursive programs is discussed
in more detail in Aho, Sethi, and Ullman [1986].
The longest-common-subsequence algorithm described in Section 6.9 is by Wag-
ner and Fischer [1975]. The algorithm actually used in the UNIX diff command is
described in Hunt and Szymanski [1977]. Aho [1990] surveys a number of algorithms
involving the matching of character strings.
Dynamic programming as an abstract technique was described by Bellman
[1957]. Aho, Hopcroft, and Ullman [1983] give a number of examples of algorithms
using dynamic programming.
Aho, A. V. [1990]. “Algorithms for finding patterns in strings,” in Handbook of
Theoretical Computer Science Vol. A: Algorithms and Complexity (J. Van Leeuwen,
ed.), MIT Press, Cambridge, Mass.
Aho, A. V., J. E. Hopcroft, and J. D. Ullman [1983]. Data Structures and Algo-
rithms, Addison-Wesley, Reading, Mass.
Aho, A. V., R. Sethi, and J. D. Ullman [1986]. Compilers: Principles, Techniques,
and Tools, Addison-Wesley, Reading, Mass.
Bellman, R. E. [1957]. Dynamic Programming, Princeton University Press, Prince-
ton, NJ.
Hunt, J. W. and T. G. Szymanski [1977]. “A fast algorithm for computing longest
common subsequences,” Comm. ACM 20:5, pp. 350–353.
Knuth, D. E. [1968]. The Art of Computer Programming, Vol. I, Fundamental
Algorithms, Addison-Wesley, Reading, Mass.
McCarthy, J. et al. [1962]. LISP 1.5 Programmer’s Manual, MIT Computation
Center and Research Laboratory of Electronics, Cambridge, Mass.
Newell, A., F. M. Tonge, E. A. Feigenbaum, B. F. Green, and G. H. Mealy [1961].
Information Processing Language-V Manual, Prentice-Hall, Englewood Cliffs, New
Jersey.
✦
✦ ✦
✦
The Set Data Model
The set is the most fundamental data model of mathematics. Every concept in
mathematics, from trees to real numbers, is expressible as a special kind of set.
In this book, we have seen sets in the guise of events in a probability space. The
dictionary abstract data type is a kind of set, on which particular operations —
insert, delete, and lookup — are performed. Thus, it should not be surprising that
sets are also a fundamental model of computer science. In this chapter, we learn
the basic definitions concerning sets and then consider algorithms for efficiently
implementing set operations.
✦
✦ ✦
✦
7.1 What This Chapter Is About
✦
✦ ✦
✦
7.2 Basic Definitions
In mathematics, the term “set” is not defined explicitly. Rather, like terms such as
“point” and “line” in geometry, the term set is defined by its properties. Specifically,
there is a notion of membership that makes sense only for sets. When S is a set and
x is anything, we can ask the question, “Is x a member of set S?” The set S then
consists of all those elements x for which x is a member of S. The following points
summarize some important notations for talking about sets.
1. The expression x ∈ S means that the element x is a member of the set S.
2. If x1 , x2 , . . . , xn are all the members of set S, then we can write
S = {x1 , x2 , . . . , xn }
Here, each of the x’s must be distinct; we cannot repeat an element twice in a
set. However, the order in which the members of a set are listed is arbitrary.
Empty set 3. The empty set, denoted ∅, is the set that has no members. That is, x ∈ ∅ is
false, no matter what x is.
✦ Example 7.1. Let S = {1, 3, 6}; that is, let S be the set that has the integers
1, 3, and 6, and nothing else, as members. We can say 1 ∈ S, 3 ∈ S, and 6 ∈ S.
However, the statement 2 ∈ S is false, as is the statement that any other thing is a
member of S.
Sets can also have other sets as members. For example, let T = {{1, 2}, 3, ∅}.
Then T has three members. First is the set {1, 2}, that is, the set with 1 and 2 as
its sole members. Second is the integer 3. Third is the empty set. The following are
true statements: {1, 2} ∈ T , 3 ∈ T , and ∅ ∈ T . However, 1 ∈ T is false. That is,
the fact that 1 is a member of a member of T does not mean that 1 is a member of
T itself. ✦
Atoms
In formal set theory, there really is nothing but sets. However, in our informal
set theory, and in data structures and algorithms based on sets, it is convenient to
assume the existence of certain atoms, which are elements that are not sets. An
atom can be a member of a set, but nothing can be a member of an atom. It
is important to remember that the empty set, like the atoms, has no members.
However, the empty set is a set rather than an atom.
We shall generally assume that integers and lowercase letters denote atoms.
When talking about data structures, it is often convenient to use complex data
types as the types of atoms. Thus, atoms may be structures or arrays, and not be
very “atomic” at all.
Another way to define a set is by abstraction. We write an expression of the form

{x | x ∈ S and P (x)}

read "the set of elements x in S such that x has property P ."
Set former The preceding expression is called a set former. The variable x in the set
former is local to the expression, and we could just as well have written
{y | y ∈ S and P (y)}
to describe the same set.
✦ Example 7.2. Let S be the set {1, 3, 6} from Example 7.1. Let P (x) be the
property “x is odd.” Then
{x | x ∈ S and x is odd }
is another way of defining the set {1, 3}. That is, we accept the elements 1 and 3
from S because they are odd, but we reject 6 because it is not odd.
As another example, consider the set T = {{1, 2}, 3, ∅} from Example 7.1.
Then
{A | A ∈ T and A is a set }
denotes the set {{1, 2}, ∅}. ✦
Equality of Sets
We must not confuse what a set is with how it is represented. Two sets are equal,
that is, they are really the same set, if they have exactly the same members. Thus,
most sets have many different representations, including those that explicitly enu-
merate their elements in some order and representations that use abstraction.
✦ Example 7.3. The set {1, 2} is the set that has exactly the elements 1 and 2 as
members. We can present these elements in either order, so {1, 2} = {2, 1}. There
are also many ways to express this set by abstraction. For example,
Infinite Sets
It is comforting to assume that sets are finite — that is, that there is some particular
integer n such that the set at hand has exactly n members. For example, the set
{1, 3, 6} has three members. However, some sets are infinite, meaning there is no
integer that is the number of elements in the set. We are familiar with infinite sets
such as
1. N, the set of nonnegative integers
2. Z, the set of all integers, nonnegative and negative
3. R, the set of real numbers
4. C, the set of complex numbers
From these sets, we can create other infinite sets by abstraction.
There are some subtle and interesting properties of infinite sets. We shall take
up the matter again in Section 7.11.
EXERCISES
7.2.1: What are the members of the set {{a, b}, {a}, {b, c}}?
7.2.2: Write set-former expressions for the following:
a) The set of integers greater than 1000.
b) The set of even integers.
7.2.3: Find two different representations for the following sets, one using abstrac-
tion, the other not.
a) {a, b, c}.
b) {0, 1, 5}.
Russell’s Paradox
One might wonder why the operation of abstraction requires that we designate some
other set from which the elements of the new set must come. Why can’t we just
use an expression like {x | P (x)}, for example,
{x | x is blue }
to define the set of all blue things? The reason is that allowing such a general way
to define sets gets us into a logical inconsistency discovered by Bertrand Russell and
called Russell’s paradox. We may have met this paradox informally when we heard
about the town where the barber shaves everyone who doesn’t shave himself, and
then were asked whether the barber shaves himself. If he does, then he doesn’t, and
if he doesn’t, he does. The way out of this anomaly is to realize that the statement
“shaves everyone who doesn’t shave himself,” while it looks reasonable, actually
makes no formal sense.
To understand Russell’s paradox concerning sets, suppose we could define sets
of the form {x | P (x)} for any property P . Then let P (x) be the property “x is not
a member of x.” That is, let P be true of a set x if x is not a member of itself. Let
S be the set
S = {x | x is not a member of x}

Now we may ask: is S a member of itself? That is, is P (S) true or false? Suppose
first that P (S) is true, so that S is not a member of S. Then S satisfies the property
that defines the members of S, and so S is a member of S — the opposite of what
we assumed. Suppose instead that P (S) is false, so that S is a member of itself.
Then S does not satisfy the defining property, and so S is not a member of the set

{x | x is not a member of x}
But again, that set is S, and so we conclude that S is not a member of itself.
Thus, when we start by assuming that P (S) is false, we prove that it is true,
and when we start by assuming that P (S) is true, we wind up proving that it is
false. Since we arrive at a contradiction either way, we are forced to blame the
notation. That is, the real problem is that it makes no sense to define the set S as
we did.
Another interesting consequence of Russell’s paradox is that it makes no sense
to suppose there is a “set of all elements.” If there were such a “universal set” —
say U — then we could speak of the set

{x | x ∈ U and x is not a member of x}
and we would again have Russell’s paradox. We would then be forced to give up
abstraction altogether, and that operation is far too useful to drop.
✦
✦ ✦
✦
7.3 Operations on Sets
There are special operations that are commonly performed on sets, such as union
and intersection. You are probably familiar with many of them, but we shall re-
view the most important operations here. In the next sections we discuss some
implementations of these operations.
✦ Example 7.5. Let S be the set {1, 2, 3} and T the set {3, 4, 5}. Then
S ∪ T = {1, 2, 3, 4, 5}, S ∩ T = {3}, and S − T = {1, 2}
That is, S ∪ T contains all the elements appearing in either S or T . Although 3
appears in both S and T , there is, of course, only one occurrence of 3 in S ∪ T ,
because elements cannot appear more than once in a set. S ∩ T contains only 3,
because no other element appears in both S and T . Finally, S − T contains 1 and
2, because these appear in S and do not appear in T . The element 3 is not present
in S − T , because although it appears in S, it also appears in T . ✦
When the sets S and T are events in a probability space, the union, intersection,
and difference have a natural meaning. S ∪ T is the event that either S or T (or
both) occurs. S ∩ T is the event that both S and T occur. S − T is the event that
S, but not T occurs. However, if S is the set that is the entire probability space,
then S − T is the event “T does not occur,” that is, the complement of T .
Venn Diagrams
It is often helpful to see operations involving sets as pictures called Venn diagrams.
Figure 7.1 is a Venn diagram showing two sets, S and T , each of which is represented
by an ellipse. The two ellipses divide the plane into four regions, which we have
numbered 1 to 4.
1. Region 1 represents those elements that are in neither S nor T .
2. Region 2 represents S − T , those elements that are in S but not in T .
3. Region 3 represents S ∩ T , those elements that are in both S and T .
4. Region 4 represents T − S, those elements that are in T but not in S.
5. Regions 2, 3, and 4 combined represent S ∪ T , those elements that are in S or
T , or both.
[Figure: two overlapping ellipses labeled S and T ; region 1 is the area outside both ellipses.]
Fig. 7.1. Regions representing Venn diagrams for the basic set operations.
What Is an Algebra?
We may think that the term “algebra” refers to solving word problems, finding
roots of polynomials, and other matters covered in a high school algebra course. To
a mathematician, however, the term algebra refers to any sort of system in which
there are operands and operators from which one builds expressions. For an algebra
to be interesting and useful, it usually has special constants and laws that allow us
to transform one expression into another “equivalent” expression.
The most familiar algebra is that in which operands are integers, reals, or per-
haps complex numbers — or variables representing values from one of these classes
— and the operators are the ordinary arithmetic operators: addition, multiplica-
tion, subtraction, and division. The constants 0 and 1 are special and satisfy laws
like x + 0 = x. In manipulating arithmetic expressions, we use laws such as the
distributive law, which lets us replace any expression of the form a × b + a × c by
an equivalent expression a × (b + c). Notice that by making this transformation, we
reduce the number of arithmetic operations by 1. Often the purpose of algebraic
manipulation of expressions, such as this one, is to find an equivalent expression
whose evaluation takes less time than the evaluation of the original.
Throughout this book, we shall meet various kinds of algebras. Section 8.7
introduces relational algebra, a generalization of the algebra of sets that we discuss
here. Section 10.5 talks about the algebra of regular expressions for describing
patterns of character strings. Section 12.8 introduces the reader to the Boolean
algebra of logic.
While we have suggested that Region 1 in Fig. 7.1 has finite extent, we should
remember that this region represents everything outside S and T . Thus, this region
is not a set. If it were, we could take its union with S and T to get the “univer-
sal set,” which we know by Russell’s paradox does not exist. Nevertheless, it is
often convenient to draw as a region the elements that are not in any of the sets
represented explicitly in the Venn diagram, as we did in Fig. 7.1.
There is an algebra of sets, in which the operators are union, intersection, and difference, and the
operands are sets or variables denoting sets. Once we allow ourselves to build up
complicated expressions like R ∪ (S ∩ T ) − U , we can ask whether two expressions
are equivalent, that is, whether they always denote the same set regardless of what
sets we substitute for the operands that are variables. By substituting one expres-
sion for an equivalent expression, sometimes we can simplify expressions involving
sets so that they may be evaluated more efficiently.
In what follows, we shall list the most important algebraic laws — that is,
statements asserting that one expression is equivalent to another — for union, in-
tersection, and difference of sets. The symbol ≡ is used to denote equivalence of
expressions.
In many of these algebraic laws, there is an analogy between union, intersection,
and difference of sets, on one hand, and addition, multiplication, and subtraction
of integers on the other hand. We shall, however, point out those laws that do not
have analogs for ordinary arithmetic.
a) The commutative law of union: (S ∪ T ) ≡ (T ∪ S). That is, it does not matter
which of two sets appears first in a union. The reason this law holds is simple.
The element x is in S ∪ T if x is in S or if x is in T , or both. That is exactly
the condition under which x is in T ∪ S.
b) The associative law of union: S ∪ (T ∪ R) ≡ (S ∪ T ) ∪ R. That is, the
union of three sets can be written either by first taking the union of the first
two or the last two; in either case, the result will be the same. We can justify
this law as we did the commutative law, by arguing that an element is in the
set on the left if and only if it is in the set on the right. The intuitive reason
is that both sets contain exactly those elements that are in either S, T , or R,
or any two or three of them.
The commutative and associative laws of union together tell us that we can
take the union of a collection of sets in any order. The result will always be the
same set of elements, namely those elements that are in one or more of the sets. The
argument is like the one we presented for addition, which is another commutative
and associative operation, in Section 2.4. There, we showed that all ways to group
a sum led to the same result.
c) The commutative law of intersection: (S ∩ T ) ≡ (T ∩ S). Intuitively, an
element x is in the sets S ∩ T and T ∩ S under exactly the same circumstances:
when x is in S and x is in T .
d) The associative law of intersection: S ∩ (T ∩ R) ≡ (S ∩ T ) ∩ R. Intu-
itively, x is in either of these sets exactly when x is in all three of S, T , and
R. Like addition or union, the intersection of any collection of sets may be
grouped as we choose, and the result will be the same; in particular, it will be
the set of elements in all the sets.
e) Distributive law of intersection over union: Just as we know that multiplication
distributes over addition — that is, a × (b + c) = a × b + a × c — the law
S ∩ (T ∪ R) ≡ (S ∩ T ) ∪ (S ∩ R)
holds for sets. Intuitively, an element x is in each of these sets exactly when x
is in S and also in at least one of T and R. Similarly, by the commutativity of
intersection, the same set is (T ∪ R) ∩ S.

f) Distributive law of union over intersection: likewise, the law

S ∪ (T ∩ R) ≡ (S ∪ T ) ∩ (S ∪ R)

holds. Intuitively, an element x is in each of these sets exactly when x is in S, or
x is in both T and R.
✦ Example 7.6. Let S = {1, 2, 3}, T = {3, 4, 5}, and R = {1, 4, 6}. Then
S ∪ (T ∩ R) = {1, 2, 3} ∪ ({3, 4, 5} ∩ {1, 4, 6})
= {1, 2, 3} ∪ {4}
= {1, 2, 3, 4}
On the other hand,
(S ∪ T ) ∩ (S ∪ R) = ({1, 2, 3} ∪ {3, 4, 5}) ∩ ({1, 2, 3} ∪ {1, 4, 6})
= {1, 2, 3, 4, 5} ∩ {1, 2, 3, 4, 6}
= {1, 2, 3, 4}
Thus, the distributive law of union over intersection holds in this case. That doesn’t
prove that the law holds in general, of course, but the intuitive argument we gave
with rule (f) should be convincing. ✦
g) Associative law of union and difference: S − (T ∪ R) ≡ (S − T ) − R. Both
sides contain an element x exactly when x is in S but in neither T nor R.
Notice that this law is analogous to the arithmetic law a − (b + c) = (a − b) − c.
h) Distributive law of difference over union: (S ∪ T ) − R ≡ (S − R) ∪ (T − R).
In justification, an element x is in either set when it is not in R, but is in either
S or T , or both. Here is another point at which the analogy with addition and
subtraction breaks down; it is not true that (a + b) − c = (a − c) + (b − c),
unless c = 0.
i) The empty set is the identity for union. That is, (S ∪ ∅) ≡ S, and by commu-
tativity of union, (∅ ∪ S) ≡ S. Informally, an element x can be in S ∪ ∅ only
when x is in S, since x cannot be in ∅.
Note that there is no identity for intersection. We might imagine that the set
of “all elements” could serve as the identity for intersection, since the intersection
of a set S with this “set” would surely be S. However, as mentioned in connection
with Russell’s paradox, there cannot be a “set of all elements.”
[Figure: three overlapping ellipses S, T , and R, with region 1 outside all three, region 2 in S alone, region 3 in S ∩ T only, region 4 in T alone, region 5 in S ∩ R only, region 6 in S ∩ T ∩ R, region 7 in T ∩ R only, and region 8 in R alone.]
Fig. 7.2. Venn diagram showing the distributive law of intersection over union:
S ∩ (T ∪ R) consists of regions 3, 5, and 6, as does (S ∩ T ) ∪ (S ∩ R).
We can use the diagram to help us keep track of the values of various sub-
expressions. For instance, T ∪ R is regions 3, 4, 5, 6, 7, and 8. Since S is regions
2, 3, 5, and 6, it follows that S ∩ (T ∪ R) is regions 3, 5, and 6. Similarly, S ∩ T is
regions 3 and 6, while S ∩ R is regions 5 and 6. It follows that (S ∩ T ) ∪ (S ∩ R)
is the same regions 3, 5, and 6, proving that
S ∩ (T ∪ R) ≡ (S ∩ T ) ∪ (S ∩ R)
✦ Example 7.7. We shall prove the equivalence S − (S ∪ R) ≡ ∅. Let us start
with law (g), the associative law for union and difference, which is
S − (T ∪ R) ≡ (S − T ) − R
We substitute S for each of the two occurrences of T to get a new equivalence:
S − (S ∪ R) ≡ (S − S) − R
By law (l), (S − S) ≡ ∅. Thus, we may substitute ∅ for (S − S) above to get:
S − (S ∪ R) ≡ (∅ − R)

Since no element can be a member of ∅, surely ∅ − R ≡ ∅. Thus we have the
desired equivalence S − (S ∪ R) ≡ ∅. ✦
Subsets
If S and T are sets, we say that S is a subset of T , written S ⊆ T , if every member
of S is also a member of T . We say that S is a proper subset of T , written S ⊂ T ,
if S ⊆ T and S and T are not the same set.

✦ Example 7.8. The following are true statements:
1. {1, 2} ⊆ {1, 2, 3}
2. {1, 2} ⊂ {1, 2, 3}
3. {1, 2} ⊆ {1, 2}
Note that a set is always a subset of itself but a set is never a proper subset of itself,
so that {1, 2} ⊂ {1, 2} is false. ✦
There are a number of algebraic laws involving the subset operator and the
other operators that we have already seen. We list some of them here.
o) ∅ ⊆ S for any set S.
p) If S ⊆ T , then
i) (S ∪ T ) ≡ T ,
ii) (S ∩ T ) ≡ S, and
iii) (S − T ) ≡ ∅.
STEP REASON
1) x is in S − (T ∪ R) Given
2) x is in S Definition of − and (1)
3) x is not in T ∪ R Definition of − and (1)
4) x is not in T Definition of ∪ and (3)
5) x is not in R Definition of ∪ and (3)
6) x is in S − T Definition of − with (2) and (4)
7) x is in (S − T ) − R Definition of − with (6) and (5)
Fig. 7.3. Proof of one half of the associative law for union and difference.
✦ Example 7.9. Let us prove the associative law for union and difference,
S − (T ∪ R) ≡ (S − T ) − R
We start by assuming that x is in the expression on the left. The sequence of steps
is shown in Fig. 7.3. Note that in steps (4) and (5), we use the definition of union
backwards. That is, (3) tells us that x is not in T ∪ R. If x were in T , (3) would
be wrong, and so we can conclude that x is not in T . Similarly, x is not in R.
STEP REASON
1) x is in (S − T ) − R Given
2) x is in S − T Definition of − and (1)
3) x is not in R Definition of − and (1)
4) x is in S Definition of − and (2)
5) x is not in T Definition of − and (2)
6) x is not in T ∪ R Definition of ∪ with (3) and (5)
7) x is in S − (T ∪ R) Definition of − with (4) and (6)
Fig. 7.4. Second half of the proof of the associative law for union and difference.
✦ Example 7.10. As another example, let us prove part of (p), the rule that if
S ⊆ T , then S ∪ T ≡ T . We begin by assuming that x is in S ∪ T . We know by
the definition of union that either
1. x is in S or
2. x is in T .
In case (1), since S ⊆ T is assumed, we know that x is in T . In case (2), we
immediately see that x is in T . Thus, in either case x is in T , and we have completed
the first half of the proof, the statement that (S ∪ T ) ⊆ T .
Now let us assume that x is in T . Then x is in S ∪ T by the definition of
union. Thus, T ⊆ (S ∪ T ), which is the second half of the proof. We conclude that
if S ⊆ T then (S ∪ T ) ≡ T . ✦
Power Sets
The set of all subsets of a set S is called the power set of S and is denoted P(S).

✦ Example 7.11. If S = {1, 2, 3}, then

P(S) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}

That is, P(S) is a set with eight members; each member is itself a set. The empty
set is in P(S), since surely ∅ ⊆ S. The singletons — sets with one member of S,
namely, {1}, {2}, and {3} — are in P(S). Likewise, the three sets with two of the
three members of S are in P(S), and S itself is a member of P(S).
As another example, P(∅) = {∅} since ∅ ⊆ S, but for no set S besides the
empty set is S ⊆ ∅. Note that {∅}, the set containing the empty set, is not the
same as the empty set. In particular, the former has a member, namely ∅, while
the empty set has no members. ✦
We can prove by induction on n that a set S with n members has exactly 2^n
subsets, and thus that P(S) has 2^n members.

BASIS. If n = 0, then S is ∅. We have already observed that P(∅) has one member.
Since 2^0 = 1, we have proved the basis.

INDUCTION. Suppose that every set of n elements has 2^n subsets, and let T be a
set of n + 1 elements, say T = S ∪ {a_{n+1}}, where S consists of the other n elements.
Every subset of T either does not contain a_{n+1}, in which case it is one of the 2^n
subsets of S, or it contains a_{n+1}, in which case it is formed from a subset of S by
including a_{n+1}.

We conclude that there are exactly 2 × 2^n, or 2^(n+1), subsets of T : half are
subsets of S, and half are formed from a subset of S by including a_{n+1}. Thus,
the inductive step is proved; given that any set S of n elements has 2^n subsets, we
have shown that any set T of n + 1 elements has 2^(n+1) subsets.
EXERCISES
7.3.1: In Fig. 7.2, we showed two expressions for the set of regions {3, 5, 6}. How-
ever, each of the regions can be represented by expressions involving S, T , and R and
the operators union, intersection, and difference. Write two different expressions for
each of the following:
a) Region 6 alone
b) Regions 2 and 4 together
c) Regions 2, 4, and 8 together
7.3.2: Use Venn diagrams to show the following algebraic laws. For each sub-
expression involved in the equivalence, indicate the set of regions it represents.
a) S ∪ (T ∩ R) ≡ (S ∪ T ) ∩ (S ∪ R)
b) (S ∪ T ) − R ≡ (S − R) ∪ (T − R)
c) S − (T ∪ R) ≡ (S − T ) − R
7.3.3: Show each of the equivalences from Exercise 7.3.2 by showing containment
of each side in the other.
7.3.4: Assuming S ⊆ T , prove the following by showing that each side of the
equivalence is a subset of the other:
a) (S ∩ T ) ≡ S
b) (S − T ) ≡ ∅
7.3.5*: Into how many regions does a Venn diagram with n sets divide the plane,
assuming that no set is a subset of any other? Suppose that of the n sets there is
one that is a subset of one other, but there are no other containments. Then some
regions would be empty. For example, in Fig. 7.1, if S ⊆ T , then region 2 would be
empty, because there is no element that is in S but not in T . In general, how many
nonempty regions would there be?
7.3.6: Prove that if S ⊆ T , then P(S) ⊆ P(T ).
7.3.7*: In C we can represent a set S whose members are sets by a linked list
whose elements are the headers for lists; each such list represents a set that is one
of the members of S. Write a C program that takes a list of elements representing
a set (i.e., a list in which all the elements are distinct) and returns the power set of
the given set. What is the running time of your program? Hint : Use the inductive
proof that there are 2^n members in the power set of a set of n elements to devise
a recursive algorithm that creates the power set. If you are clever, you can use the
same list as part of several sets, to avoid copying the lists that represent members
of the power set, thus saving both time and space.
7.3.8: Show that
a) P(S) ∪ P(T ) ⊆ P(S ∪ T )
b) P(S ∩ T ) ⊆ P(S) ∩ P(T )
Are either (a) or (b) true if containment is replaced by equivalence?
7.3.9: What is P(P(P(∅)))?
7.3.10*: If we apply the power-set operator n times, starting with ∅, how many
members does the resulting set have? For an example, Exercise 7.3.9 is the case
n = 3.
✦
✦ ✦
✦
7.4 List Implementation of Sets
We have already seen, in Section 6.4, how to implement the dictionary operations
insert, delete, and lookup using a linked-list data structure. We also observed
there that the expected running time of these operations is O(n) if the set has n
elements. This running time is not as good as the O(log n) average time taken
for the dictionary operations using a balanced binary search tree data structure,
as in Section 5.8. On the other hand, as we shall see in Section 7.6, linked lists are
a basic building block for the hash table, which generally offers the best performance
of all.
(1) copy S to U ;
(2) for (each x in T )
(3) if (!lookup(x, S))
(4) insert(x, U );
Fig. 7.5. Pseudocode sketch of the algorithm for taking the union
of sets represented by unsorted lists.
Suppose S has n members and T has m members. The operation in line (1),
copying S to U , can easily be accomplished in O(n) time. The lookup of line (3)
takes O(n) time. We only execute the insertion of line (4) if we know from line (3)
that x is not in S. Since x can only appear once on the list for T , we know that x is
not yet in U . Therefore, it is safe to place x at the front of U ’s list, and line (4) can
be accomplished in O(1) time. The for-loop of lines (2) through (4) is iterated m
times, and its body takes time O(n). Thus, the time for lines (2) to (4) is O(mn),
which dominates the O(n) for line (1).
There are similar algorithms for intersection and difference, each taking O(mn)
time. We leave these algorithms for the reader to design.
When the lists representing the two sets are sorted, we can perform these operations
much more efficiently. Note that the function we are about to develop does not use the cells of S and
T while creating their union. Instead, we must make copies of all elements to form
the union.
We assume that the types LIST and CELL are defined as before, by the macro
DefCell(int, CELL, LIST);
The function setUnion is shown in Fig. 7.6. It makes use of an auxiliary function
assemble(x, L, M ) that creates a new cell at line (1), places element x in that
cell at line (2), and calls setUnion at line (3) to take the union of the lists L
and M . Then assemble returns a cell for x followed by the list that results from
applying setUnion to L and M . Note that the functions assemble and setUnion
are mutually recursive; each calls the other.
Function setUnion selects the least element from its two given sorted lists and
passes to assemble the chosen element and the remainders of the two lists. There
are six cases for setUnion, depending on whether or not one of its lists is NULL, and
if not, which of the two elements at the heads of the lists precedes the other.
1. If both lists are NULL, setUnion simply returns NULL, ending the recursion.
This case is lines (5) and (6) of Fig. 7.6.
2. If L is NULL and M is not, then at lines (7) and (8) we assemble the union by
taking the first element from M , followed by the “union” of the NULL list with
the tail of M . Note that, in this case, successive calls to setUnion result in M
being copied.
3. If M is NULL but L is not, then at lines (9) and (10) we do the opposite,
assembling the answer from the first element of L and the tail of L.
4. If the first elements of L and M are the same, then at lines (11) and (12) we
assemble the answer from one copy of this element, referred to as L->element,
and the tails of L and M .
5. If the first element of L precedes that of M , then at lines (13) and (14) we
assemble the answer from this smallest element, the tail of L, and the entire
list M .
6. Symmetrically, at lines (15) and (16), if M has the smallest element, then we
assemble the answer from that element, the entire list L, and the tail of M .
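Figure 7.6 itself is not reproduced in this text. The following sketch reconstructs the two functions from the cases just described; the line numbering of the original figure is not preserved.

#include <stdlib.h>

typedef struct CELL *LIST;    /* the expansion of DefCell(int, CELL, LIST) */
struct CELL { int element; LIST next; };

LIST setUnion(LIST L, LIST M);

/* Create a cell holding x, followed by the union of L and M. */
LIST assemble(int x, LIST L, LIST M)
{
    LIST first = (LIST) malloc(sizeof(struct CELL));
    first->element = x;
    first->next = setUnion(L, M);
    return first;
}

LIST setUnion(LIST L, LIST M)
{
    if (L == NULL && M == NULL)                    /* case (1) */
        return NULL;
    else if (L == NULL)                            /* case (2) */
        return assemble(M->element, NULL, M->next);
    else if (M == NULL)                            /* case (3) */
        return assemble(L->element, L->next, NULL);
    else if (L->element == M->element)             /* case (4) */
        return assemble(L->element, L->next, M->next);
    else if (L->element < M->element)              /* case (5) */
        return assemble(L->element, L->next, M);
    else                                           /* case (6) */
        return assemble(M->element, L, M->next);
}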
✦ Example 7.12. Suppose S is {1, 3, 6} and T is {5, 3}. The sorted lists rep-
resenting these sets are L = (1, 3, 6) and M = (3, 5). We call setUnion(L, M ) to
take the union. Since the first element of L, which is 1, precedes the first element
of M , which is 3, case (5) applies, and we assemble the answer from 1, the tail
of L, which we shall call L1 = (3, 6), and M . Function assemble(1, L1, M ) calls
setUnion(L1 , M ) at line (3), and the result is the list with first element 1 and tail
equal to whatever the union is.
This call to setUnion is case (4), where the two leading elements are equal;
both are 3 here. Thus, we assemble the union from one copy of element 3 and the
tails of the lists L1 and M . These tails are L2 , consisting of only the element 6, and
M1 , consisting of only the element 5. The next call is setUnion(L2 , M1 ), which is an
instance of case (6). We thus add 5 to the union and call setUnion(L2 , NULL). That
is case (3), generating 6 for the union and calling setUnion(NULL, NULL). Here, we
have case (1), and the recursion ends. The result of the initial call to setUnion is
the list (1, 3, 5, 6). Figure 7.7 shows in detail the sequence of calls and returns made
on this example data. ✦
Notice that the list generated by setUnion always comes out in sorted order.
We can see why the algorithm works, by observing that whichever case applies, each
element in lists L or M is either copied to the output, by becoming the first param-
eter in a call to assemble, or remains on the lists that are passed as parameters in
the recursive call to setUnion.
call setUnion((1, 3, 6), (3, 5))
  call assemble(1, (3, 6), (3, 5))
    call setUnion((3, 6), (3, 5))
      call assemble(3, (6), (5))
        call setUnion((6), (5))
          call assemble(5, (6), NULL)
            call setUnion((6), NULL)
              call assemble(6, NULL, NULL)
                call setUnion(NULL, NULL)
                return NULL
              return (6)
            return (6)
          return (5, 6)
        return (5, 6)
      return (3, 5, 6)
    return (3, 5, 6)
  return (1, 3, 5, 6)
return (1, 3, 5, 6)

Fig. 7.7. Sequence of calls and returns for setUnion((1, 3, 6), (3, 5)).
We claim that the time taken by setUnion is O(m + n). To see why, note that calls to assemble spend O(1) time
creating a cell for the output list and then calling setUnion on the remaining lists.
Thus, the calls to assemble in Fig. 7.6 can be thought of as costing O(1) time plus
the time for a call to setUnion on lists the sum of whose lengths is either one less
than that of L and M , or in case (4), two less. Further, all the work in setUnion,
exclusive of the call to assemble, takes O(1) time.
It follows that when setUnion is called on lists of total length m+n, it will result
in at most m + n recursive calls to setUnion and an equal number to assemble.
Each takes O(1) time, exclusive of the time taken by the recursive call. Thus, the
time to take the union is O(m + n), that is, proportional to the sum of the sizes of
the sets.
This time is less than that of the O(mn) time needed to take the union of sets
represented by unsorted lists. In fact, if the lists for our sets are not sorted, we
can sort them in O(n log n + m log m) time, and then take the union of the sorted
lists. Since n log n dominates n and m log m dominates m, we can express the total
cost of sorting and taking the union as O(n log n + m log m). That expression can
be greater than O(mn), but is less whenever n and m are close in value — that is,
whenever the sets are approximately the same size. Thus, it usually makes sense to
sort before taking the union.
EXERCISES
7.4.1: Write C programs for taking the (a) union, (b) intersection, and (c) difference
of sets represented by unsorted lists.
7.4.2: Modify the program of Fig. 7.6 so that it takes the (a) intersection and (b)
difference of sets represented by sorted lists.
7.4.3: The functions assemble and setUnion from Fig. 7.6 leave the lists whose
union they take intact; that is, they make copies of elements rather than use the
cells of the given lists themselves. Can you simplify the program by allowing it to
destroy the given lists as it takes their union?
7.4.4*: Prove by induction on the sum of the lengths of the lists given as parameters
that setUnion from Fig. 7.6 returns the union of the given lists.
Symmetric difference 7.4.5*: The symmetric difference of two sets S and T is (S − T ) ∪ (T − S), that
is, the elements that are in exactly one of S and T . Write a program to take the
symmetric difference of two sets that are represented by sorted lists. Your program
should make one pass through the lists, like Fig. 7.6, rather than call routines for
union and difference.
7.4.6*: We analyzed the program of Fig. 7.6 informally by arguing that if the total
of the lengths of the lists was n, there were O(n) calls to setUnion and assemble
and each call took O(1) time plus whatever time the recursive call took. We can
formalize this argument by letting TU (n) be the running time for setUnion and
TA (n) be the running time of assemble on lists of total length n. Write recursive
rules defining TU and TA in terms of each other. Substitute to eliminate TA , and
set up a conventional recurrence for TU . Solve that recurrence. Does it show that
setUnion takes O(n) time?
✦
✦ ✦
✦
7.5 Characteristic-Vector Implementation of Sets
Frequently, the sets we encounter are each subsets of some small set U , which we
shall refer to as the “universal set.”1 For example, a hand of cards is a subset of the
set of all 52 cards. When the sets with which we are concerned are each subsets of
some small set U , there is a representation of sets that is much more efficient than
the list representation discussed in the previous section. We order the elements of U
in some way so that each element of U can be associated with a unique “position,”
which is an integer from 0 up to n − 1, where n is the number of elements in U .
Then, given a set S that is contained in U , we can represent S by a charac-
teristic vector of 0’s and 1’s, such that for each element x of U , if x is in S, the
position corresponding to x has a 1, and if x is not in S, then that position has a 0.
✦ Example 7.13. Let U be the set of cards. We may order the cards any way we
choose, but one reasonable scheme is to order them by suits: clubs, then diamonds,
then hearts, then spades. Then, within a suit, we order the cards ace, 2, 3, . . . , 10,
jack, queen, king. For instance, the position of the ace of clubs is 0, the king of
clubs is 12, the ace of diamonds is 13, and the jack of spades is 49. A royal flush in
hearts is represented by the characteristic vector
0000000000000000000000000010000000011110000000000000
1 Of course U cannot be a true universal set, or set of all sets, which we argued does not exist
because of Russell’s paradox.
The first 1, in position 26, represents the ace of hearts; and the other four 1’s, in
positions 35 through 38, represent the 10, jack, queen, and king of hearts.
The set of all clubs is represented by
1111111111111000000000000000000000000000000000000000
The type BOOLEAN is as described in Section 1.6. To insert the element corresponding
to position i into a set S declared to be of type USET, we have only to execute
S[i] = TRUE;
If we want to look up this element, we have only to return the value S[i], which
tells us whether the ith element is present in S or not.
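As a sketch (the function names are ours), each dictionary operation is then a single line:

void insert(int i, USET S)    { S[i] = TRUE; }
void delete(int i, USET S)    { S[i] = FALSE; }
BOOLEAN lookup(int i, USET S) { return S[i]; }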
Note that each of the dictionary operations insert, delete, and lookup thus
takes O(1) time, when sets are represented by characteristic vectors. The only
disadvantage to this technique is that all sets must be subsets of some universal set
U . Moreover, the universal set must be small; otherwise, the length of the arrays
becomes so large that we cannot store them conveniently. In fact, since we shall
normally have to initialize all elements of the array for a set S to TRUE or FALSE,
the initialization of any subset of U (even ∅) must take time proportional to the
size of U . If U had a large number of elements, the time to initialize a set could
dominate the cost of all other operations.
To form the union of two sets that are subsets of a common universal set of
n elements, and that are represented by characteristic vectors S and T , we define
another characteristic vector R to be the bitwise OR of the characteristic vectors S
and T :
R[i] = S[i] || T[i], for 0 ≤ i < n
Similarly, we can make R represent the intersection of S and T by taking the bitwise
AND of S and T :
R[i] = S[i] && T[i], for 0 ≤ i < n
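These definitions translate directly into loops over the vectors; the following is a minimal sketch, using the USET type assumed above (so n = 52):

/* R becomes the union of S and T */
void vectorUnion(USET S, USET T, USET R)
{
    int i;

    for (i = 0; i < 52; i++)
        R[i] = S[i] || T[i];
}

/* R becomes the intersection of S and T */
void vectorIntersection(USET S, USET T, USET R)
{
    int i;

    for (i = 0; i < 52; i++)
        R[i] = S[i] && T[i];
}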
The arrays representing characteristic vectors and the Boolean operations on them
can be implemented using the bitwise operators of C if we define the type BOOLEAN
appropriately. However, the code is machine specific, and so we shall not present
any details here. A portable (but more space consuming) implementation of char-
acteristic vectors can be accomplished with arrays of int’s of the appropriate size,
and this is the definition of BOOLEAN that we have assumed.
✦ Example 7.14. Let us consider sets of apple varieties. Our universal set will
consist of the six varieties listed in Fig. 7.9; the order of their listing indicates their
position in characteristic vectors.
Notice that the time to perform union, intersection, and difference using char-
acteristic vectors is proportional to the length of the vectors. That length is not
directly related to the size of the sets, but is equal to the size of the universal set.
If the sets have a reasonable fraction of the elements in the universal set, then the
time for union, intersection, and difference is proportional to the sizes of the sets
involved. That is better than the O(n log n) time for sorted lists, and much better
than the O(n2 ) time for unsorted lists. However, the drawback of characteristic
vectors is that, should the sets be much smaller than the universal set, the running
time of these operations can be far greater than the sizes of the sets involved.
EXERCISES
7.5.1: Give the characteristic vectors of the following sets of cards. For convenience,
you can use 0^k to represent k consecutive 0’s and 1^k for k consecutive 1’s.
7.5.2: Using bitwise operators, write C programs to compute the (a) union and (b)
difference of two sets of cards, the first represented by words a1 and a2, the second
represented by b1 and b2.
✦
✦ ✦
✦
7.6 Hashing
The characteristic-vector idea breaks down when the universal set is huge — say, the set of all possible words (character strings) that might appear in a dictionary. Imagine listing the possible words in some order and dividing the list into groups of, say, 100 million consecutive possible words each, with one array cell per group and about as many groups as there are words in the dictionary itself. This compromise runs into two problems:
1. It is no longer enough just to put TRUE in a cell, because we won’t know which
of the 100 million possible words are actually present in the dictionary, or if in
fact more than one word in any one group is present.
2. If, for example, the first 100 million possible words include all the short words,
then we would expect many more than the average number of words from the
dictionary to fall into this group of possible words. Note that our arrangement
has as many cells of the array as there are words in the dictionary, and so we
expect the average cell to represent one word; but surely there are in English
many thousands of words in the first group, which would include all the words
of up to five letters, and some of the six-letter words.
To solve problem (1), we need to list, in each cell of the array, all the words
in its group that are present in the dictionary. That is, the array cell becomes
the header of a linked list with these words. To solve problem (2), we need to
be careful how we assign potential words to groups. We must distribute elements
among groups so that it is unlikely (although never impossible) that there will be
many elements in a single group. Note that if there are a large number of elements
in a group, and we represent groups by linked lists, then lookup will be very slow
for members of a large group.
[Fig. 7.10: A hash table. The array headers has B entries, numbered 0 through B − 1; the entry for bucket h(x) points to a linked list of the elements a1, a2, · · · , an in that bucket.]
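A sketch of type declarations consistent with this organization (B buckets, each the header of a linked list of elements of type ETYPE) is:

#define B 5                 /* number of buckets */
typedef char ETYPE[32];     /* elements are character strings */

typedef struct CELL *LIST;
struct CELL {
    ETYPE element;
    LIST next;
};
typedef LIST HASHTABLE[B];  /* a hash table is an array of B lists */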
If we then make the declaration

HASHTABLE headers;

the array headers is of the appropriate type to contain the bucket headers for our
hash table.
int h(ETYPE x)
{
int i, sum;
sum = 0;
for (i = 0; x[i] != ’\0’; i++)
sum += x[i];
return sum % B;
}
Fig. 7.11. A hash function that sums the integer equivalents of characters,
assuming ETYPE is an array of characters.
Now, we must define the hash function h. The code for this function is shown
in Fig. 7.11. The integer equivalent of each of the characters of the string x is
summed in the variable sum. The last step computes and returns as the value of
the hash function h the remainder of this sum when it is divided by the number of
buckets B.
Let us consider some examples of words and the buckets into which the function
h puts them. We shall enter into the hash table the seven words3
anyone lived in a pretty how town
In order to compute h(anyone), we need to understand the integer values of char-
acters. In the usual ASCII code for characters, the lower-case letters have integer
values starting at 97 for a (that’s 1100001 in binary), 98 for b, and so on, up to
122 for z. The upper-case letters correspond to integers that are 32 less than their
lower-case equivalents — that is, from 65 for A (1000001 in binary) to 90 for Z.
Thus, the integer equivalents for the characters in anyone are 97, 110, 121, 111,
110, 101. The sum of these is 650. When we divide by B, which is 5, we get the
remainder 0. Thus, anyone belongs in bucket 0. The seven words of our example
are assigned, by the hash function of Fig. 7.11, to the buckets indicated in Fig. 7.12.
We see that three of the seven words have been assigned to one bucket, number
0. Two words are assigned to bucket 2, and one each to buckets 1 and 4. That is
somewhat less even a distribution than would be typical, but with a small number of
words and buckets, we should expect anomalies. As the number of words becomes
large, they will tend to distribute themselves among the five buckets approximately
evenly. The hash table, after insertion of these seven words, is shown in Fig. 7.13. ✦
3 The words are from a poem of the same name by e. e. cummings. The poem doesn’t get any
easier to decode. The next line is “with up so floating many bells down.”
[Fig. 7.13: The hash table after insertion of the seven words — bucket 0: anyone, in, pretty; bucket 1: town; bucket 2: lived, a; bucket 3: empty; bucket 4: how.]
The function insert(x,H) of Fig. 7.14 calls a function bucketInsert after first finding the element of the array that is the header for the
appropriate bucket, h(x). We assume that the hash function h is defined elsewhere.
Also recall that the type HASHTABLE means that H is an array of pointers to cells
(i.e., an array of lists).
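As a sketch consistent with this description, using the declarations assumed earlier (the name bucketInsert is our inference), insert can be written:

#include <stdlib.h>
#include <string.h>

int h(ETYPE x);   /* the hash function of Fig. 7.11 */

/* insert x into the list (*pL) if it is not already present */
void bucketInsert(ETYPE x, LIST *pL)
{
    if ((*pL) == NULL) {
        (*pL) = (LIST) malloc(sizeof(struct CELL));
        strcpy((*pL)->element, x);
        (*pL)->next = NULL;
    }
    else if (strcmp((*pL)->element, x) != 0)  /* x differs from this element */
        bucketInsert(x, &((*pL)->next));
}

/* insert x into the hash table H */
void insert(ETYPE x, HASHTABLE H)
{
    bucketInsert(x, &(H[h(x)]));
}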
✦ Example 7.16. Suppose we wish to delete the element in from the hash table
of Fig. 7.13, assuming the hash function described in Example 7.15. The delete
operation is carried out essentially like the function insert of Fig. 7.14. We compute
h(in), which is 0. We thus go to the header for bucket number 0. The second cell
on the list for this bucket holds in, and we delete that cell. The detailed C program
is left as an exercise. ✦
If we keep the number of buckets B roughly proportional to the number of elements n, then the average bucket holds O(1) elements, and the dictionary operations on a hash table take O(1) time each, on the average,
just as when we use a characteristic-vector representation. If we try to do better
by making B much larger than n, so that most buckets are empty, it still takes
us O(1) time to find the bucket header, and so the running time does not improve
significantly once B becomes larger than n.
We must also consider that in some circumstances it may not be possible to
keep B close to n all the time. If the set is growing rapidly, then n increases while
B remains fixed, so that ultimately n/B becomes large. It is possible to restructure
the hash table by picking a larger value for B and then inserting each of the elements
into the new table. It takes O(n) time to do so, but that time is no greater than
the O(n) time that must be spent inserting the n elements into the hash table in
the first place. (Note that n insertions, at O(1) average time per insertion, require
O(n) time in all.)
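As an illustrative sketch of such restructuring (our own, not the book's code), we can rehash every element into a new array of buckets, using a hash function that takes the number of buckets as a parameter:

/* like the function h of Fig. 7.11, but with the number of buckets a parameter */
int hashInto(ETYPE x, int nbuckets)
{
    int i, sum;

    sum = 0;
    for (i = 0; x[i] != '\0'; i++)
        sum += x[i];
    return sum % nbuckets;
}

/* move every element of oldTable (oldB buckets) into newTable (newB buckets) */
void rehash(LIST oldTable[], int oldB, LIST newTable[], int newB)
{
    int i;
    LIST p;

    for (i = 0; i < newB; i++)
        newTable[i] = NULL;
    for (i = 0; i < oldB; i++)
        for (p = oldTable[i]; p != NULL; p = p->next)
            bucketInsert(p->element, &newTable[hashInto(p->element, newB)]);
}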
EXERCISES
7.6.1: Continue filling the hash table of Fig. 7.13 with the words with up so
floating many bells down.
7.6.2*: Comment on how effective the following hash functions would be at dividing
typical sets of English words into buckets of roughly equal size:
a) Use B = 10, and let h(x) be the remainder when the length of the word x is
divided by 10.
b) Use B = 128, and let h(x) be the integer value of the last character of x.
c) Use B = 10. Take the sum of the values of the characters in x. Square the
result, and take the remainder when divided by 10.
7.6.3: Write C programs for performing (a) delete and (b) lookup in a hash table,
using the same assumptions as for the code in Fig. 7.14.
✦
✦ ✦
✦
7.7 Relations and Functions
While we have generally assumed that elements of sets are atomic, in practice it is
often useful to give elements some structure. For example, in the previous section we
talked about elements that were character strings of length 32. Another important
structure for elements is fixed-length lists, which are similar to C structures. Lists
used as set elements will be called tuples, and each list element is called a component
of the tuple.
The number of components a tuple has is called its arity. For example, (a, b)
is a tuple of arity 2; its first component is a and its second component is b. A tuple
of arity k is also called a k-tuple.
A set of elements, each of which is a tuple of the same arity — say, k — is
called a relation. The arity of this relation is k. A tuple or relation of arity 1 is
unary. If the arity is 2, it is binary, and in general, if the arity is k, then the tuple
or relation is k-ary.
✦ Example 7.17. The relation R = {(1, 2), (1, 3), (2, 2)} is a relation of arity
2, or a binary relation. Its members are (1, 2), (1, 3), and (2, 2), each of which is a
tuple of arity 2. ✦
In this section, we shall consider primarily binary relations. There are also
many important applications of nonbinary relations, especially in representing and
manipulating tabular data (as in relational databases). We shall discuss this topic
extensively in Chapter 8.
Cartesian Products
Let A and B be two sets. The product of A and B, denoted A × B, is the set of
pairs (a, b) in which a is chosen from A and b is chosen from B. The product is
sometimes called the Cartesian product, after the French mathematician René Descartes.
✦ Example 7.18. Recall that Z is the conventional symbol for the set of all
integers. Thus, Z × Z stands for the set of pairs of integers.
As another example, if A is the two-element set {1, 2} and B is the three-
element set {a, b, c}, then A × B is the six-element set
{(1, a), (1, b), (1, c), (2, a), (2, b), (2, c)}
Note that the product of sets is aptly named, because if A and B are finite sets,
then the number of elements in A × B is the product of the number of elements in
A and the number of elements in B. ✦
Unlike the arithmetic product, the Cartesian product does not have the common
properties of commutativity or associativity. It is easy to find examples where
A × B ≠ B × A
disproving commutativity. The associative law does not even make sense, because
(A × B) × C would have as members pairs like ((a, b), c), while members of A × (B × C)
would be pairs of the form (a, (b, c)).
Since we shall need on several occasions to talk about sets of tuples with more
than two components, we extend the product notation to a k-way product of sets.
We let A1 × A2 × · · · × Ak stand for the product of sets A1 , A2 , . . . , Ak , that is, the
set of k-tuples (a1 , a2 , . . . , ak ) such that a1 ∈ A1 , a2 ∈ A2 , . . . , and ak ∈ Ak .
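For instance, if A1 = {1, 2}, A2 = {a}, and A3 = {x, y}, then A1 × A2 × A3 is the four-element set {(1, a, x), (1, a, y), (2, a, x), (2, a, y)}; in general, the number of tuples in a k-way product of finite sets is the product of the sizes of the sets.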
Binary Relations
A binary relation R is a set of pairs that is a subset of the product of two sets A
and B. If a relation R is a subset of A × B, we say that R is from A to B. We call
A the domain and B the range of the relation. If B is the same as A, we say that
R is a relation on A or “on the domain” A.
There is an infix notation commonly used for binary relations; we write aRb to mean that the pair (a, b) is in R, just as a < b means that (a, b) is in the relation <.
✦ Example 7.21. The same notation can be used for arbitrary binary relations.
For instance, the relation R from Example 7.17 can be written as the three “facts”
1R2, 1R3, and 2R2. ✦
We can also picture a binary relation R as a graph: there is a node for each element that appears as a component of some pair in R, and an arc from node a to node b whenever aRb.
✦ Example 7.22. The graph for the relation R from Example 7.17 is shown in
Fig. 7.15. It has nodes for the elements 1, 2, and 3. Since 1R2, there is an arc from
node 1 to node 2. Since 1R3, there is an arc from 1 to 3, and since 2R2, there is an
arc from node 2 to itself. There are no other arcs, because there are no other pairs
in R. ✦
[Fig. 7.15: Graph for the relation {(1, 2), (1, 3), (2, 2)} — arcs 1 → 2 and 1 → 3, and a loop at node 2.]
Functions
Suppose a relation R, from domain A to range B, has the property that for every
member a of A there is at most one element b in B such that aRb. Then R is said
to be a partial function from domain A to range B.
If for every member a of A there is exactly one element b in B such that aRb,
then R is said to be a total function from A to B. The difference between a partial
function and a total function is that a partial function can be undefined on some
elements of its domain; for example, for some a in A, there may be no b in B such
that aRb. We shall use the term “function” to refer to the more general notion of
a partial function, but whenever the distinction between a partial function and a
total function is important, we shall use the word “partial.”
There is a common notation used to describe functions. We often write R(a) = b
if b is the unique element such that aRb.
Notice that the set-theoretic notion of a function is not much different from the
notion of a function that we encountered in C. That is, suppose s is a C function
declared as
int s(int a)
{
return a*a;
}
that takes an integer and returns its square. We usually think of s as being essentially
the same as the abstract squaring function S, where S(a) is defined to be a × a,
although the former is a way to compute squares and the
latter only defines the operation of squaring abstractly. Also note that in practice
s(a) is always a partial function, since there are many values of a for which s(a)
will not return an integer because of the finiteness of computer arithmetic.
C has functions that take more than one parameter. A C function f that takes
two integer parameters a and b, returning an integer, is a function from Z × Z to
Z. Similarly, if the two parameters are of types that make them belong to sets A
and B, respectively, and f returns a member of type C, then f is a function from
A×B to C. More generally, if f takes k parameters — say, from sets A1 , A2 , . . . , Ak ,
respectively — and returns a member of set B, then we say that f is a function
from A1 × A2 × · · · × Ak to B.
For example, we can regard the function lookup(x,L) from Section 6.4 as a
function from Z × L to {TRUE, FALSE}. Here, L is the set of linked lists of integers.
Formally, a function from domain A1 × · · · × Ak to range B is a set of pairs of
the form ((a1 , . . . , ak ), b), where each ai is in set Ai and b is in B. Notice that the
first element of the pair is itself a k-tuple. For example, the function lookup(x,L)
discussed above can be thought of as the set of pairs ((x, L), t), where x is an
integer, L is a list of integers, and t is the truth value TRUE or FALSE.
[Fig. 7.16: The function lookup associates a truth value t with each pair (x, L).]
One-to-One Correspondences
Let F be a partial function from domain A to range B with the following properties:
1. For every element a in A, there is an element b in B such that F (a) = b.
2. For every b in B, there is some a in A such that F (a) = b.
3. For no b in B are there two elements a1 and a2 in A such that F (a1 ) and F (a2 )
are both b.
Then F is said to be a one-to-one correspondence from A to B. The term bijection
is also used for a one-to-one correspondence.
Property (1) says that F is a total function from A to B. Property (2) is the
condition of being onto: F is a total function from A onto B. Some mathematicians
use the term surjection for a total function that is onto.
Properties (2) and (3) together say that F behaves like a total function from
B to A. A total function with property (3) is sometimes called an injection.
A one-to-one correspondence is basically a total function in both directions, but
it is important to observe that whether F is a one-to-one correspondence depends
not only on the pairs in F , but on the declared domain and range. For example, we
could take any one-to-one correspondence from A to B and change the domain by
adding a new element that appears in no pair of F ; the result would no longer be a
one-to-one correspondence, since property (1) would fail for the added element.
[Fig. 7.17: Graph for the relation that is the function P (a) = a + 1 on the integers; each integer a has an arc to a + 1.]
EXERCISES
7.7.1: Give an example of sets A and B for which A × B is not the same as B × A.
7.7.2: Let R be the relation defined by aRb, bRc, cRd, aRc, and bRd.
a) Draw the graph of R.
b) Is R a function?
c) Name two possible domains for R; name two possible ranges.
d) What is the smallest set S such that R is a relation on S (i.e., the domain and
the range can both be S)?
7.7.3: Let T be a tree and let S be the set of nodes of T . Let R be the “child-
parent” relation; that is, cRp if and only if c is a child of p. Answer the following,
and justify your answers:
a) Is R a partial function, no matter what tree T is?
b) Is R a total function from S to S no matter what T is?
c) Can R ever be a one-to-one correspondence (i.e., for some tree T )?
d) What does the graph for R look like?
7.7.4: Let R be the relation on the set of integers {1, 2, . . . , 10} defined by aRb if a
and b are distinct and have a common divisor other than 1. For example, 2R4 and
6R9, but not 2R3.
a) Draw the graph for R.
b) Is R a function? Why or why not?
7.7.5*: Although we observed that S = (A × B) × C and T = A × (B × C) are
not the same set, we can show that they are “essentially the same” by exhibiting a
natural one-to-one correspondence between them. For each ((a, b), c) in S, let
F (((a, b), c)) = (a, (b, c))
Show that F is a one-to-one correspondence from S to T .
✦
✦ ✦
✦
7.8 Implementing Functions as Data
In a programming language, functions are usually implemented by code, but when
their domain is small, they can be implemented using techniques quite similar to
the ones we used for sets. In this section we shall discuss the use of linked lists,
characteristic vectors, and hash tables to implement finite functions.
Operations on Functions
The operations we most commonly perform on functions are similar to those for
dictionaries. Suppose F is a function from domain set A to range set B. Then we
may
1. Insert a new pair (a, b), such that F (a) = b. The only nuance is that, since
F must be a function, should there already be a pair (a, c) for any c, this pair
must be replaced by (a, b).
2. Delete the value associated with F (a). Here, we need to give only the domain
value a. If there is any b such that F (a) = b, the pair (a, b) is removed from
the set. If there is no such pair, then no change is made.
3. Lookup the value associated with F (a); that is, given domain value a, we return
the value b such that F (a) = b. If there is no such pair (a, b) in the set, then
we return some special value warning that F (a) is undefined.
✦ Example 7.25. Suppose F consists of the pairs {(3, 9), (−4, 16), (0, 0)}; that
is, F (3) = 9; F (−4) = 16, and F (0) = 0. Then lookup(3) returns 9, and lookup(2)
returns a value indicating that no value is defined for F (2). If F is the “squaring”
function, the value −1 might be used to indicate a missing value, since −1 is not
the true square of any integer.
The operation delete(3) removes the pair (3, 9), while delete(2) has no effect.
If we execute insert(5, 25), the pair (5, 25) is added to the set F , or equivalently,
we now have F (5) = 25. If we execute insert(3, 10), the old pair (3, 9) is removed
from F , and the new pair (3, 10) is added to F , so that now F (3) = 10. ✦
A function, being a set of pairs, can be stored in a linked list just like any other set.
It is useful to define cells with three fields, one for the domain value, one for the
range value, and one for a next-cell pointer. For example, we could define cells as
typedef struct CELL *LIST;
struct CELL {
DTYPE domain;
RTYPE range;
LIST next;
};
where DTYPE is the type for domain elements and RTYPE is the type for range
elements. Then a function is represented by a pointer to (the first cell of) a linked
list.
The function in Fig. 7.18 performs the operation insert(a, b, L), assuming that
DTYPE and RTYPE are both arrays of 32 characters. It searches for a cell containing
a in the domain field. If it reaches the end of the list without finding one, it creates
a new cell and stores (a, b) therein. If it finds a cell whose domain element is a, it
changes the range value of that cell to b, and we are done. Otherwise — the current
cell has a domain value other than a — it recursively inserts into the tail of the list.
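A sketch consistent with this description (with DTYPE and RTYPE both 32-character arrays) is:

#include <stdlib.h>
#include <string.h>

void insert(DTYPE a, RTYPE b, LIST *pL)
{
    if ((*pL) == NULL) {                 /* end of list: add a new cell */
        (*pL) = (LIST) malloc(sizeof(struct CELL));
        strcpy((*pL)->domain, a);
        strcpy((*pL)->range, b);
        (*pL)->next = NULL;
    }
    else if (!strcmp((*pL)->domain, a))  /* found a: replace its range value */
        strcpy((*pL)->range, b);
    else                                 /* domain value differs from a */
        insert(a, b, &((*pL)->next));
}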
Fig. 7.18. Inserting a new fact into a function represented as a linked list.
If the function F has n pairs, then insert takes O(n) time on the average.
Likewise, the analogous delete and lookup functions for a function represented as a
linked list require O(n) time on the average.
✦ Example 7.26. Suppose we want to store information about apples, like the
harvest information of Fig. 7.9, but we now want to give the actual month of harvest,
rather than the binary choice early/late. We can associate an integer constant with
each element in the domain and range by defining the enumeration types
enum APPLES {Delicious, GrannySmith, Jonathan, McIntosh,
Gravenstein, Pippin};
enum MONTHS {Unknown, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug,
Sep, Oct, Nov, Dec};
This declaration associates 0 with the identifier Delicious, 1 with GrannySmith,
and so on. It also associates 0 with Unknown, 1 with Jan, and so on. The identifier
Unknown indicates that the harvest month is not known. We can now declare an
array
int Harvest[6];
with the intention that the array Harvest represents the set of pairs in Fig. 7.19.
Then the array Harvest appears as in Fig. 7.20, where, for example, the entry
Harvest[Delicious] = Oct means Harvest[0] = 10. ✦
VARIETY         HARVESTED
Delicious       Oct
GrannySmith     Aug
Jonathan        Sep
McIntosh        Oct
Gravenstein     Sep
Pippin          Nov
Fig. 7.19. Apple varieties and the months in which they are harvested.
✦ Example 7.27. Let us use the same data about apples that appeared in Exam-
ple 7.26, except now we shall use the actual names rather than integers to represent
the domain. To represent the function Harvest, we shall use a hash table with
five buckets. We shall define APPLES to be 32-character arrays, while MONTHS is an
enumeration as in Example 7.26. The buckets are linked lists with field variety
for a domain element of type APPLES, field harvested for a range element of type
int (a month), and a link field next to the next element of the list.
We shall use a hash function h similar to that shown in Fig. 7.11 of Section
7.6. Of course, h is applied to domain elements only — that is, to character strings
of length 32, consisting of the name of an apple variety.
Now, we can define the type HASHTABLE as an array of B LIST’s. B is the
number of buckets, which we have taken to be 5. All these declarations appear in
the beginning of Fig. 7.22. We may then declare a hash table Harvest to represent
the desired function.
[Fig. 7.21: Apples and their harvest months stored in a hash table; for example, bucket 0 holds GrannySmith (Aug) and McIntosh (Oct), and bucket 3 holds Gravenstein (Sep).]
After inserting the six apple varieties listed in Fig. 7.19, the arrangement of cells
within buckets is shown in Fig. 7.21. For example, the word Delicious yields the
sum 929 if we add up the integer values of the nine characters. Since the remainder
when 929 is divided by 5 is 4, the Delicious apple belongs in bucket 4. The cell
for Delicious has that string in the variety field, the month Oct in the harvested
field, and a pointer to the next cell of the bucket. ✦
The function lookup(a,H) of Fig. 7.22 returns the value of the function for the variety a,
that is, the month in which apple variety a is harvested. If the month is undefined,
it returns the value Unknown.
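A sketch of Fig. 7.22 consistent with Example 7.27 follows; the cell layout matches the description (a variety, a month, and a link), and the names bucketLookup and h are our assumptions:

#include <string.h>
#define B 5

typedef char APPLES[32];

typedef struct CELL *LIST;
struct CELL {
    APPLES variety;
    int harvested;   /* a value of enum MONTHS from Example 7.26 */
    LIST next;
};
typedef LIST HASHTABLE[B];

int h(APPLES x);     /* a hash function like that of Fig. 7.11 */

int bucketLookup(APPLES a, LIST L)
{
    if (L == NULL)
        return Unknown;        /* no pair with domain value a */
    else if (!strcmp(L->variety, a))
        return L->harvested;
    else
        return bucketLookup(a, L->next);
}

int lookup(APPLES a, HASHTABLE H)
{
    return bucketLookup(a, H[h(a)]);
}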
EXERCISES
7.8.1: Write functions that perform (a) delete and (b) lookup on functions repre-
sented by linked lists, analogous to the insert function of Fig. 7.18.
7.8.2: Write functions that perform (a) insert, (b) delete, and (c) lookup on a
function represented by a vector, that is, an array of RTYPE’s indexed by integers
representing DTYPE’s.
7.8.3: Write functions that perform (a) insert and (b) delete on functions repre-
sented by hash tables, analogous to the lookup function of Fig. 7.22.
7.8.4: A binary search tree can also be used to represent functions as data. Define
appropriate data structures for a binary search tree to hold the apple information in
Fig. 7.19, and implement (a) insert, (b) delete, and (c) lookup using these structures.
7.8.5: Design an information retrieval system to keep track of information about
at bats and hits for baseball players. Your system should accept triples of the form
Ruth 5 2
to indicate that Ruth in 5 at bats got 2 hits. The entry for Ruth should be updated
appropriately. You should also be able to query the number of at bats and hits
for any player. Implement your system so that the functions insert and lookup will
work on any data structure as long as they use the proper subroutines and types.
✦
✦ ✦
✦
7.9 Implementing Binary Relations
The implementation of binary relations differs in some details from the implemen-
tation of functions. Recall that both binary relations and functions are sets of pairs,
but a function has for each domain element a at most one pair of the form (a, b)
for any b. In contrast, a binary relation can have any number of range elements
associated with a given domain element a.
In this section, we shall first consider the meaning of insertion, deletion, and
lookup for binary relations. Then we see how the three implementations we have
been using — linked lists, characteristic vectors, and hash tables — generalize to
binary relations. In Chapter 8, we shall discuss implementation of relations with
more than two components. Frequently, data structures for such relations are built
from the structures for functions and binary relations.
✦ Example 7.28. Most varieties of plums require one of several other specific
varieties for pollination; without the appropriate “pollinizer,” the tree cannot bear
fruit. A few varieties are “self-fertile”: they can serve as their own pollinizer. Figure
7.23 shows a binary relation on the set of plum varieties. A pair (a, b) in this relation
means that variety b is a pollinizer for variety a.
Inserting a pair into this table corresponds to asserting that one variety is a
pollinizer for another. For example, if a new variety is developed, we might enter
into the relation facts about which varieties pollinize the new variety, and which
varieties it pollinizes.
VARIETY      POLLINIZER
Beauty       Santa Rosa
Santa Rosa   Santa Rosa
Burbank      Beauty
Burbank      Santa Rosa
Eldorado     Santa Rosa
Eldorado     Wickson
Wickson      Santa Rosa
Wickson      Beauty
Fig. 7.23. The pollinizer relation for plum varieties.
To represent a binary relation as a linked list, we use cells that consist
of a domain element, a range element, and a pointer to the next cell, just like
the cells for functions. Insertion and deletion are carried out as for ordinary sets,
as discussed in Section 6.4. The only nuance is that equality of set members is
determined by comparing both the field holding the domain element and the field
holding the range element.
Lookup is a somewhat different operation from the operations of the same name
we have encountered previously. We must go down the list, looking for cells with
a particular domain value a, and we must assemble a list of the associated range
values. An example will show the mechanics of the lookup operation on linked lists.
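The code below depends on the declarations of Example 7.29, which we assume were the following sketch (cells hold a variety, its pollinizer, and a link):

#include <stdlib.h>   /* for malloc */
#include <string.h>   /* for strcmp, strcpy */

typedef char PVARIETY[32];

typedef struct PCELL *PLIST;
struct PCELL {
    PVARIETY variety;      /* the domain element */
    PVARIETY pollinizer;   /* the range element */
    PLIST next;
};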
PLIST lookup(PVARIETY a, PLIST L)
{
    PLIST P;

    if (L == NULL)
        return NULL;
    else if (!strcmp(L->variety, a)) { /* L->variety == a */
        P = (PLIST) malloc(sizeof(struct PCELL));
        strcpy(P->pollinizer, L->pollinizer);
        P->next = lookup(a, L->next);
        return P;
    }
    else /* a not the domain value of current pair */
        return lookup(a, L->next);
}
Fig. 7.24. Lookup in a binary relation represented by a linked list: given a variety a, return the list of its pollinizers.
A Characteristic-Vector Approach
For sets and for functions, we saw that we could create an array indexed by elements
of a “universal” set and place appropriate values in the array. For sets, the appro-
priate array values are TRUE and FALSE, and for functions they are those values that
can appear in the range, plus (usually) a special value that means “none.”
For binary relations, we can index an array by members of a small declared
domain, just as we did for functions. However, we cannot use a single value as an
array element, because a relation can have any number of range values for a given
domain value. The best we can do is to use as an array element the header of a
linked list that contains all the range values associated with a given domain value.
✦ Example 7.30. Let us redo the plum example using this organization. As was
pointed out in the last section, when we use a characteristic-vector style, we must
fix the set of values, in the domain at least; there is no such constraint for linked-list
or hash-table representations. Thus, we must redeclare the PVARIETY type to be an
enumerated type:
enum PVARIETY {Beauty, SantaRosa, Burbank, Eldorado, Wickson};
We can continue to use the PCELL type for lists of varieties, as defined in Example
7.29. Then we may define the array
PLIST Pollinizers[5];
That is, the array representing the relation of Fig. 7.23 is indexed by the varieties
mentioned in that figure, and the value associated with each variety is a pointer
to the first cell on its list of pollinizers. Figure 7.25 shows the pairs of Fig. 7.23
represented in this way. ✦
[Fig. 7.25: The pollinizer relation represented characteristic-vector style — the array Pollinizers, indexed by variety, with each entry heading a linked list of that variety’s pollinizers; for example, the entries for Beauty and SantaRosa each begin with SantaRosa.]
Consider first the list representation. To insert or delete a pair, we must search the list for it; on a list of
length n, this search takes O(n) average time, since we must scan the entire list if
the pair is not found and, on the average, half the list if it is found.
For lookup, an examination of Fig. 7.24 should convince us that this function
takes O(1) time plus a recursive call on the tail of a list. We thus make n calls if
the list is of length n, for a total time of O(n).
Now consider the generalized characteristic vector. The operation lookup(a) is
easiest. We go to the array element indexed by a, and there we find our answer, a
list of all the b’s such that (a, b) is in the relation. We don’t even have to examine the
elements or copy them. Thus, lookup takes O(1) time when characteristic vectors
are used.
On the other hand, insert and delete are less simple. To insert (a, b), we can
go to the array element indexed by a easily enough, but we must search the entire
list to make sure that (a, b) is not already there.5 That requires an amount of
time proportional to the average length of a list, that is, to the average number of
range values associated with a given domain value. We shall call this parameter m.
Another way to look at m is that it is n, the total number of pairs in the relation,
divided by the number of different domain values. If we assume that any list is as
likely to be searched as any other, then we require O(m) time on the average to
perform an insert or a delete.
Finally, let us consider the hash table. If there are n pairs in our relation and
B buckets, we expect there to be an average of n/B pairs per bucket. However,
the parameter m must be figured in as well. If there are n/m different domain
values, then at most n/m buckets can be nonempty, since the bucket for a pair is
determined only by the domain value. Thus, m is a lower bound on the average size
of a bucket, regardless of B. Since n/B is also a lower bound, the time to perform
one of the three operations is O(max(m, n/B)).
✦ Example 7.31. Suppose there is a relation of 1000 pairs, distributed among 100
domain values. Then the typical domain value has 10 associated range values; that
is, m = 10. If we use 1000 buckets — that is, B = 1000 — then m is greater than
n/B, which is 1, and we expect the average bucket that we might actually search
(because its number is h(a) for some domain value a that appears in the relation)
to have about 10 pairs. In fact, it will have on the average slightly more, because by
coincidence, the same bucket could be h(a1 ) and h(a2 ) for different domain values
a1 and a2 . If we choose B = 100, then m = n/B = 10, and we would again expect
each bucket we might search to have about 10 elements. As just mentioned, the
actual number is slightly more because of coincidences, where two or more domain
values hash to the same bucket. ✦
EXERCISES
7.9.1: Using the data types from Example 7.29, write a function that takes a
pollinizer value b and a list of variety-pollinizer pairs, and returns a list of the
varieties that are pollinized by b.
5 We could insert the pair without regard for whether it is already present, but that would
have both the advantages and disadvantages of the list representation discussed in Section
6.4, where we allowed duplicates.
7.9.2: Write (a) insert and (b) delete routines for variety-pollinizer pairs using the
assumptions of Example 7.29.
7.9.3: Write (a) insert, (b) delete, and (c) lookup functions for a relation repre-
sented by the vector data structure of Example 7.30. When inserting, do not forget
to check for an identical pair already in the relation.
7.9.4: Design a hash-table data structure to represent the pollinizer relation that
forms the primary example of this section. Write functions for the operations insert,
delete, and lookup.
7.9.5*: Prove that the function lookup of Fig. 7.24 works correctly, by showing
by induction on the length of list L that lookup returns a list of all the elements b
such that the pair (a, b) is on the list L.
7.9.6*: Design a data structure that allows O(1) average time to perform each of
the operations insert, delete, lookup, and inverseLookup. The latter operation
takes a range element and finds the associated domain elements.
7.9.7: In this section and the previous, we defined some new abstract data types
that had operations we called insert, delete, and lookup. However, these operations
were defined slightly differently from the operations of the same name on dictio-
naries. Make a table for the ADT’s DICTIONARY, FUNCTION (as discussed in
Section 7.8), and RELATION (as discussed in this section) and indicate the possi-
ble abstract implementations and the data structures that support them. For each,
indicate the running time of each operation.
✦
✦ ✦
✦
7.10 Some Special Properties of Binary Relations
In this section we shall consider some of the special properties that certain useful
binary relations have. We begin by defining some basic properties: transitivity,
reflexivity, symmetry, and antisymmetry. These are combined to form common
types of binary relations: partial orders, total orders, and equivalence relations.
Transitivity
Let R be a binary relation on the domain D. We say that the relation R is transitive
if whenever aRb and bRc are true, aRc is also true. Figure 7.26 illustrates the
transitivity property as it appears in the graph of a relation. Whenever the dotted
arrows from a to b and from b to c appear in the diagram, for some particular a, b,
and c, then the solid arrow from a to c must also be in the diagram. It is important
to remember that transitivity, like the other properties to be defined in this section,
pertains to the relation as a whole. It is not enough that the property be satisfied
for three particular domain elements; it must be satisfied for all triples a, b, and c
in the declared domain D.
Fig. 7.26. Transitivity condition requires that if both the arcs aRb and bRc
are present in the graph of a relation, then so is the arc aRc.
✦ Example 7.32. Consider the relation < on Z, the set of integers. That is, <
is the set of pairs of integers (a, b) such that a is less than b. The relation < is
transitive, because if a < b and b < c, we know that a < c. Similarly, the relations
≤, >, and ≥ on integers are transitive. These four comparison relations are likewise
transitive on the set of real numbers.
However, consider the relation ≠ on the integers (or the reals for that matter).
This relation is not transitive. For instance, let a and c both be 3, and let b be 5.
Then a ≠ b and b ≠ c are both true. If the relation were transitive, we would have
a ≠ c. But that says 3 ≠ 3, which is wrong. We conclude that ≠ is not transitive.
For another example of a transitive relation, consider ⊆, the subset relation.
We might like to consider the relation as being the set of all pairs of sets (S, T )
such that S ⊆ T , but to imagine that there is such a set would lead us to Russell’s
paradox again. However, suppose we have a “universal” set U . We can let ⊆U be
the set of pairs of sets
{(S, T ) | S ⊆ T and T ⊆ U }
Then ⊆U is a relation on P(U ), the power set of U , and we can think of ⊆U as the
subset relation.
For instance, let U = {1, 2}. Then ⊆{1,2} consists of the nine (S, T )-pairs shown
in Fig. 7.27. Thus, ⊆U contains exactly those pairs such that the first component
is a subset (not necessarily proper) of the second component and both are subsets
of {1, 2}.
It is easy to check that ⊆U is transitive, no matter what the universal set U
is. If A ⊆ B and B ⊆ C, then it must be that A ⊆ C. The reason is that for every
x in A, we know that x is in B, because A ⊆ B. Since x is in B, we know that x
is in C, because B ⊆ C. Thus, every element of A is an element of C. Therefore,
A ⊆ C. ✦
S          T
∅          ∅
∅          {1}
∅          {2}
∅          {1, 2}
{1}        {1}
{1}        {1, 2}
{2}        {2}
{2}        {1, 2}
{1, 2}     {1, 2}
Fig. 7.27. The pairs in the relation ⊆{1,2} .
Reflexivity
Some binary relations R have the property that for every element a in the declared
domain, R has the pair (a, a); that is, aRa. If so, we say that R is reflexive. Figure
7.28 suggests that the graph of a reflexive relation has a loop on every element of
its declared domain. The graph may have other arrows in addition to the loops.
However, it is not sufficient that there be loops for the elements of the current
domain; there must be one for each element of the declared domain.
Fig. 7.28. A reflexive relation R has xRx for every x in its declared domain.
✦ Example 7.33. The relation ≥ on the reals is reflexive. For each real number
a, we have a ≥ a. Similarly, ≤ is reflexive, and both these relations are also reflexive
on the integers. However, < and > are not reflexive, since a < a and a > a are each
false for at least one value of a; in fact, they are both false for all a.
The subset relations ⊆U defined in Example 7.32 are also reflexive, since A ⊆ A
for any set A. However, the similarly defined relations ⊂U that contain the pair
(S, T ) if T ⊆ U and S ⊂ T — that is, S is a proper subset of T — are not reflexive.
The reason is that A ⊂ A is false for some A (in fact, for all A). ✦
Symmetry and Antisymmetry
The inverse of a binary relation R is the relation obtained by reversing the components of each of its pairs, that is, the set
{(b, a) | (a, b) ∈ R}
For example, > is the inverse of <, since a > b exactly when b < a. Likewise, ≥ is
the inverse of ≤.
[Fig. 7.29: Symmetry in the graph of a relation — whenever there is an arc from a to b, there is also an arc from b to a.]
We say that R is symmetric if it is its own inverse. That is, R is symmetric if,
whenever aRb, we also have bRa. Figure 7.29 suggests what symmetry looks like
in the graph of a relation. Whenever the forward arc is present, the backward arc
must also be present.
We say that R is antisymmetric if aRb and bRa are both true only when
a = b. Note that it is not necessary that aRa be true for any particular a in
an antisymmetric relation. However, an antisymmetric relation can be reflexive.
Figure 7.30 shows how the antisymmetry condition relates to graphs of relations.
[Fig. 7.30: The antisymmetry condition — arcs in both directions between two distinct elements a and b never occur together; a loop on an element is optional.]
[Fig. 7.31: A total order pictured as a chain a1 , a2 , a3 , . . . , an .]
✦ Example 7.36. Figure 7.32 represents the partial order ⊆{1,2,3} . We have
drawn the relation as a reduced graph, in which we have omitted arcs that can be
inferred by transitivity. That is, S ⊆{1,2,3} T if either
1. S = T,
2. There is an arc from S to T , or
3. There is a path of two or more arcs leading from S to T .
For example, we know that ∅ ⊆{1,2,3} {1, 3}, because of the path from ∅ to {1} to
{1, 3}. ✦
[Fig. 7.32: Reduced graph for the partial order ⊆{1,2,3} — ∅ at the bottom; above it the singletons {1}, {2}, {3}; above them {1, 2}, {1, 3}, {2, 3}; and {1, 2, 3} at the top.]
Equivalence Relations
An equivalence relation is a binary relation that is reflexive, symmetric, and transi-
tive. This kind of relation is quite different from the partial orders and total orders
we have met in our previous examples. In fact, a partial order can never be an
equivalence relation, except in the trivial cases that the declared domain is empty,
or there is only one element a in the declared domain and the relation is {(a, a)}.
Equivalence Classes
Another way to view an equivalence relation is that it partitions its domain into
equivalence classes. If R is an equivalence relation on a domain D, then we can
divide D into equivalence classes so that
1. Each domain element is in exactly one equivalence class.
2. If aRb, then a and b are in the same equivalence class.
3. If aRb is false, then a and b are in different equivalence classes.
✦ Example 7.38. Consider the relation R of Example 7.37, where aRb when
a − b is a multiple of 3. One equivalence class is the set of integers that are exactly
divisible by 3, that is, those that leave a remainder of 0 when divided by 3. This
class is {. . . , −3, 0, 3, 6, . . .}. A second is the set of integers that leave a remainder of
1 when divided by 3, that is, {. . . , −2, 1, 4, 7, . . .}. The last class is the set of integers
that leave a remainder of 2 when divided by 3. This class is {. . . , −1, 2, 5, 8, . . .}.
The classes partition the set of integers into three disjoint sets, as suggested by Fig.
7.33.
Notice that when two integers leave the same remainder when divided by 3,
then their difference is evenly divided by 3. For instance, 14 = 3 × 4 + 2 and
5 = 3 × 1 + 2. Thus, 14 − 5 = 3 × 4 − 3 × 1 + 2 − 2 = 3 × 3. We therefore know that
14R5. On the other hand, if two integers leave different remainders when divided by
3, their difference surely is not evenly divisible by 3. Thus, integers from different
classes, like 5 and 7, are not related by R. ✦
[Fig. 7.33: The three equivalence classes of R partition the integers — those exactly divisible by 3, those leaving remainder 1, and those leaving remainder 2.]
Closures of Relations
A common operation on relations is to take a relation that does not have some
property and add as few pairs as possible to create a relation that does have that
property. The resulting relation is called the closure (for that property) of the
original relation.
✦ Example 7.39. We discussed reduced graphs in connection with Fig. 7.32.
Although we were representing a transitive relation, ⊆{1,2,3} , we drew arcs
corresponding to only a subset of the pairs in the relation. We can reconstruct the entire
relation by applying the transitive law to infer new pairs, until no new pairs can
be inferred. For example, we see that there are arcs corresponding to the pairs
({1}, {1, 3}) and ({1, 3}, {1, 2, 3}), and so the transitive law tells us that the pair
({1}, {1, 2, 3}) must also be in the relation. Then this pair, together with the pair
(∅, {1}) tells us that (∅, {1, 2, 3}) is in the relation. To these we must add the “re-
flexive” pairs (S, S), for each set S that is a subset of {1, 2, 3}. In this manner, we
can reconstruct all the pairs in the relation ⊆{1,2,3} . ✦
Another useful closure operation is topological sorting, where we take a partial
order and add tuples until it becomes a total order. While the transitive closure
of a binary relation is unique, there are frequently several total orders that contain
a given partial order. We shall learn in Chapter 9 of a surprisingly efficient algo-
rithm for topological sorting. For the moment, let us consider an example where
topological sorting is useful.
EXERCISES
7.10.1: Give an example of a relation that is reflexive for one declared domain but
not reflexive for another declared domain. Remember that for D to be a possible
domain for a relation R, D must include every element that appears in a pair of R
but it may also include more elements.
7.10.2**: How many pairs are there in the relation ⊆{1,2,3} ? In general, how many
pairs are there in ⊆U , if U has n elements? Hint : Try to guess the function from a
few cases like the two-element case (Fig. 7.27) where there are 9 pairs. Then prove
your guess correct by induction.
7.10.3: Consider the binary relation R on the domain of four-letter strings defined
by sRt if t is formed from the string s by cycling its characters one position left.
That is, abcdRbcda, where a, b, c, and d are individual letters. Determine whether
R is (a) reflexive, (b) symmetric, (c) transitive, (d) a partial order, and/or (e) an
equivalence relation. Give a brief argument why, or a counterexample, in each case.
7.10.4: Consider the domain of four-letter strings in Exercise 7.10.3. Let S be
the binary relation consisting of R applied 0 or more times. Thus, abcdSabcd,
abcdSbcda, abcdScdab, and abcdSdabc. Put another way, a string is related by S to
any of its rotations. Answer the five questions from Exercise 7.10.3 for the relation
S. Again, give justification in each case.
7.10.5*: What is wrong with the following “proof”?
(Non)Theorem: If binary relation R is symmetric and transitive, then R is reflexive.
(Non)Proof : Let x be some member of the domain of R. Pick y such that xRy. By
symmetry, yRx. By transitivity, xRy and yRx imply xRx. Since x is an arbitrary
member of R’s domain, we have shown that xRx for every element in the domain
of R, which “proves” that R is reflexive.
7.10.6: Give examples of relations with declared domain {1, 2, 3} that are
a) Reflexive and transitive, but not symmetric
b) Reflexive and symmetric, but not transitive
c) Symmetric and transitive, but not reflexive
d) Symmetric and antisymmetric
e) Reflexive, transitive, and a total function
f) Antisymmetric and a one-to-one correspondence
7.10.7*: How many arcs are saved if we use the reduced graph for the relation ⊆U ,
where U has n elements, rather than the full graph?
7.10.8: Are (a) ⊆U and (b) ⊂U either partial orders or total orders when U has
one element? What if U has zero elements?
✦
✦ ✦
✦
7.11 Infinite Sets
All of the sets that one would implement in a computer program are finite, or
limited, in extent; one could not store them in a computer’s memory if they were
not. Many sets in mathematics, such as the integers or reals, are infinite in extent.
These remarks seem intuitively clear, but what distinguishes a finite set from an
infinite one?
The distinction between finite and infinite is rather surprising. A finite set is
one that does not have the same number of elements as any of its proper subsets.
Recall from Section 7.7 that we said we could use the existence of a one-to-one
correspondence between two sets to establish that they are equipotent, that is,
that they have the same number of members.
If we take a finite set such as S = {1, 2, 3, 4} and any proper subset of it, such
as T = {1, 2, 3}, there is no way to find a one-to-one correspondence between the
two sets. For example, we could map 4 of S to 3 of T , 3 of S to 2 of T , and 2 of S
to 1 of T , but then we would have no member of T to associate with 1 of S. Any
other attempt to build a one-to-one correspondence from S to T must likewise fail.
Your intuition might suggest that the same should hold for any set whatsoever:
how could a set have the same number of elements as a set formed by throwing away
one or more of its elements? Consider the natural numbers (nonnegative integers)
N and the proper subset of N formed by throwing away 0; call it N − {0}, or
{1, 2, 3, . . .}. Then consider the one-to-one correspondence F from N to N − {0}
defined by F (0) = 1, F (1) = 2, and, in general, F (i) = i + 1.
Surprisingly, F is a one-to-one correspondence from N to N − {0}. For each
i in N, there is at most one j such that F (i) = j, so F is a function. In fact,
there is exactly one such j, namely i + 1, so that condition (1) in the definition
of one-to-one correspondence (see Section 7.7) is satisfied. For every j in N − {0}
there is some i such that F (i) = j, namely, i = j − 1. Thus condition (2) in the
definition of one-to-one correspondence is satisfied. Finally, there cannot be two
Infinite Hotels
To help you appreciate that there are as many numbers from 0 up as from 1 up,
imagine a hotel with an infinite number of rooms, numbered 0, 1, 2, and so on; for
any integer, there is a room with that integer as room number. At a certain time,
there is a guest in each room. A kangaroo comes to the front desk and asks for
a room. The desk clerk says, “We don’t see many kangaroos around here.” Wait
— that’s another story. Actually, the desk clerk makes room for the kangaroo as
follows. He moves the guest in room 0 to room 1, the guest in room 1 to room 2,
and so on. All the old guests still have a room, and now room 0 is vacant, and the
kangaroo goes there. The reason this “trick” works is that there are truly the same
number of rooms numbered from 1 up as are numbered from 0 up.
distinct numbers i1 and i2 in N such that F (i1 ) and F (i2 ) are both j, because then
i1 + 1 and i2 + 1 would both be j, from which we would conclude that i1 = i2 . We
are forced to conclude that F is a one-to-one correspondence between N and its
proper subset N − {0}.
✦ Example 7.41. The set of natural numbers and the set of even natural numbers
are equipotent. Let F (i) = 2i. Then F is a one-to-one correspondence that maps 0
to 0, 1 to 2, 2 to 4, 3 to 6, and in general, every natural number to a unique natural
number, its double.
Similarly, Z and N are the same size; that is, there are as many nonnegative
and negative integers as nonnegative integers. Let F (i) = 2i for all i ≥ 0, and let
F (i) = −2i − 1 for i < 0. Then 0 goes to 0, 1 to 2, −1 to 1, 2 to 4, −2 to 3, and so
on. Every integer is sent to a unique nonnegative integer, with the negative integers
going to odd numbers and the nonnegative integers to even numbers.
Even more surprising, the set of pairs of natural numbers is equinumerous with
N itself. To see how the one-to-one correspondence is constructed, consider Fig.
7.34, which shows the pairs in N × N arranged in an infinite square. We order the
pairs according to their sum, and among pairs of equal sum, by order of their first
components. This order begins (0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0), (0, 3), (1, 2),
and so on, as suggested by Fig. 7.34.
Now, every pair has a place in the order. The reason is that for any pair (i, j),
there are only a finite number of pairs with a smaller sum, and a finite number
with the same sum and a smaller value of i. In fact, we can calculate the position
of the pair (i, j) in the order; it is (i + j)(i + j + 1)/2 + i. That is, our one-
to-one correspondence associates the pair (i, j) with the unique natural number
(i + j)(i + j + 1)/2 + i.
Notice that we have to be careful how we order pairs. Had we ordered them
by rows in Fig. 7.34, we would never get to the pairs on the second or higher rows,
        5 | 15
   ↑    4 | 10  16
   j    3 |  6  11
        2 |  3   7  12
        1 |  1   4   8  13
        0 |  0   2   5   9  14
          +----------------------
             0   1   2   3   4   5
                      i →
Fig. 7.34. The pairs in N × N; the entry in row j, column i is the position of the pair (i, j) in the ordering.
because there are an infinite number of pairs on each row. Similarly, ordering by
columns would not work. ✦
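The two correspondences of Example 7.41 are easy to express in C; the following is a small sketch (the function names are ours):

#include <stdio.h>

/* the correspondence from Z to N: F(i) = 2i for i >= 0, and -2i - 1 for i < 0 */
long toNatural(long i)
{
    return (i >= 0) ? 2 * i : -2 * i - 1;
}

/* the position of the pair (i, j) in the ordering of Fig. 7.34,
   namely (i + j)(i + j + 1)/2 + i */
long pairOrder(long i, long j)
{
    long s = i + j;
    return s * (s + 1) / 2 + i;
}

int main(void)
{
    printf("%ld\n", toNatural(-2));   /* prints 3 */
    printf("%ld\n", pairOrder(1, 2)); /* prints 7, matching the order above */
    return 0;
}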
The formal definition of infinite sets is interesting, but that definition may not
meet our intuition of what infinite sets are. For example, one might expect that
an infinite set was one that, for every integer n, contained at least n elements.
Fortunately, this property can be proved for every set that the formal definition
tells us is infinite. The proof is an example of induction.
STATEMENT S(n): If I is an infinite set (according to the formal definition), then I has a subset with n elements.
BASIS. S(0) is immediate: the empty set ∅ is a subset of I with 0 elements.
INDUCTION. Assume S(n) for some n ≥ 0. We shall prove that I has a subset
with n + 1 elements. By the inductive hypothesis, I has a subset T with n elements.
By the formal definition of an infinite set, there is a proper subset J ⊂ I and a 1-1
correspondence f from I to J. Let a be an element in I − J; surely a exists because
J is a proper subset.
Consider R, the image of T under f , that is, if T = {b1 , . . . , bn }, then R =
{f (b1 ), . . . , f (bn )}. Since f is 1-1, the values f (b1 ), . . . , f (bn ) are all different, so R is of
size n. Since f is from I to J, each of the f (bk )’s is in J; that is, R ⊆ J. Thus, a
is not in R, and therefore R ∪ {a} is a subset of I with n + 1 elements, proving S(n + 1).
Cardinality of Sets
We defined two sets S and T to be equipotent (equal in size) if there is a one-to-one
correspondence from S to T . Equipotence is an equivalence relation on any set of
sets, and we leave this point as an exercise. The equivalence class to which a set S
belongs is said to be the cardinality of S. For example, the empty set belongs to an
equivalence class by itself; we can identify this class with cardinality 0. The class
containing the set {a}, where a is any element, is cardinality 1, the class containing
the set {a, b} is cardinality 2, and so on.
The class containing N is “the cardinality of the integers,” usually given the
name aleph-zero, and a set in this class is said to be countable. The set of real
numbers belongs to another equivalence class, often called the continuum. There
are, in fact, an infinite number of different infinite cardinalities.
                        POSITIONS
              0   1   2   3   4   5   6   ···
         0    3   1   4   1   5   9   2   ···
REAL     1    5   5   5   5   5   5   5   ···
NUMBERS  2    6   2   5   0   0   0   0   ···
  ↓      3    1   2   1   2   1   2   1   ···
         4    ···
Fig. 7.35. Hypothetical table of real numbers, assuming that the reals are countable.
Suppose the real numbers between 0 and 1 were countable; then they could be listed
in a table like the hypothetical one of Fig. 7.35, with the decimal expansion of the ith
real in row i. From such a table we construct a new real number r = .a0 a1 a2 · · · .
The value of the ith digit, ai , depends on that of the ith diagonal digit, that is, on
the value found at the ith position of the ith real. If this value is 0 through 4, we
let ai = 8. If the value at the ith diagonal position is 5 through 9, then ai = 1.
✦ Example 7.42. Given the part of the table suggested by Fig. 7.35, our real
number r begins .8118 · · · . To see why, note that the value at position 0 of real 0 is
3, and so a0 = 8. The value at position 1 of real 1 is 5, and so a1 = 1. Continuing,
the value at position 2 of real 2 is 5 and the value at position 3 of real 3 is 2, and
so the next two digits are 18. ✦
We claim that r does not appear anywhere in the hypothetical list of reals, even
though we supposed that all real numbers from 0 to 1 were in the list. Suppose r
were rj , the real number associated with row j. Consider the difference d between
r and rj . We know that aj , the digit in position j of the decimal expansion of r,
was specifically chosen to differ by at least 4 and at most 8 from the digit in the jth
position of rj . Thus, the contribution to d from the jth position is between 4/10^(j+1)
and 8/10^(j+1).
The contribution to d from all positions after the jth is no more than 1/10^(j+1),
since that would be the difference if one of r and rj had all 0’s there and the other
had all 9’s. Hence, the contribution to d from all positions j and greater is between
3/10^(j+1) and 9/10^(j+1).
Finally, in positions before the jth, r and rj are either the same, in which case
the contribution to d from the first j − 1 positions is 0, or r and rj differ by at least
1/10^j. In either case, we see that d cannot be 0. Thus, r and rj cannot be the same
real number.
We conclude that r does not appear in the list of real numbers. Thus, our
hypothetical one-to-one correspondence from the nonnegative integers to the reals
between 0 and 1 is not one to one. We have shown there is at least one real number
in that range, namely r, that is not associated with any integer.
EXERCISES
7.11.1: Show that equipotence is an equivalence relation. Hint : The hard part
is transitivity, showing that if there is a one-to-one correspondence f from S to
T , and a one-to-one correspondence g from T to R, then there is a one-to-one
correspondence from S to R. This function is the composition of f and g, that is,
the function that sends each element x in S to g(f (x)) in R.
7.11.2: In the ordering of pairs in Fig. 7.34, what pair is assigned number 100?
7.11.3*: Show that the following sets are countable (have a one-to-one correspon-
dence between them and the natural numbers):
a) The set of perfect squares
b) The set of triples (i, j, k) of natural numbers
c) The set of powers of 2
d) The set of finite sets of natural numbers
7.11.4**: Show that P(N), the power set of the natural numbers, has the same
cardinality as the reals — that is, there is a one-to-one correspondence from P(N)
to the reals between 0 and 1. Note that this conclusion does not contradict Exercise
7.11.3(d), because here we are talking about finite and infinite sets of integers, while
there we counted only finite sets. Hint: The following construction almost works,
but needs to be fixed. Consider the characteristic vector for any set of natural
numbers. This vector is an infinite sequence of 0’s and 1’s. For example, {0, 1} has
the characteristic vector 1100 · · · , and the set of odd numbers has the characteristic
vector 010101 · · · . If we put a decimal point in front of a characteristic vector, we
have a binary fraction between 0 and 1, which represents a real number. Thus, every
set is sent to a real in the range 0 to 1, and every real number in that range can
be associated with a set, by turning its binary representation into a characteristic
vector. The reason this association is not a one-to-one correspondence is that certain
reals have two binary representations. For example, .11000 · · · and .10111 · · · both
represent the real number 3/4. However, these sequences as characteristic vectors
represent different sets; the first is {0, 1} and the second is the set of all integers
except 1. You can modify this construction to define a one-to-one correspondence.
7.11.5**: Show that there is a one-to-one correspondence from pairs of reals in the
range 0 to 1 to reals in that range. Hint : It is not possible to imitate the table of
Fig. 7.34 directly. However, we may take a pair of reals, say, (r, s), and combine
the infinite decimal fractions for r and s to make a unique new real number t. This
number will not be related to r and s by any simple arithmetic expression, but from
t, we can recover r and s uniquely. The reader must discover a way to construct
the decimal expansion of t from the expansions of r and s.
7.11.6**: Show that whenever a set S contains subsets of all integer sizes 0, 1, . . . ,
then it is an infinite set according to the formal definition of “infinite”; that is, S
has a one-to-one correspondence with one of its proper subsets.
✦
✦ ✦
✦
7.12 Summary of Chapter 7
You should take away the following points from Chapter 7:
✦ The concept of a set is fundamental to both mathematics and computer science.
✦ The common operations on sets such as union, intersection, and difference can
be visualized in terms of Venn diagrams.
✦ Algebraic laws can be used to manipulate and simplify expressions involving
sets and operations on sets.
✦ Linked lists, characteristic vectors, and hash tables provide three basic ways to
represent sets. Linked lists offer the greatest flexibility for most set operations
but are not always the most efficient. Characteristic vectors provide the great-
est speed for certain set operations but can be used only when the universal set
is small. Hash tables are often the method of choice, providing both economy
of representation and speed of access.
✦ (Binary) relations are sets of pairs. A function is a relation in which there is
at most one tuple with a given first component.
✦ A one-to-one correspondence between two sets is a function that associates a
unique element of the second set with each element of the first, and vice versa.
✦ There are a number of significant properties of binary relations: reflexivity,
transitivity, symmetry, and asymmetry are among the most important.
✦ Partial orders, total orders, and equivalence relations are important special
cases of binary relations.
✦ Infinite sets are those sets that have a one-to-one correspondence with one of
their proper subsets.
✦ Some infinite sets are “countable,” that is, they have a one-to-one correspon-
dence with the integers. Other infinite sets, such as the reals, are not countable.
✦ The data structures and operations defined on sets and relations in this chapter
will be used in many different ways in the remainder of this book.
✦
✦ ✦
✦
7.13 Bibliographic Notes for Chapter 7
Halmos [1974] provides a good introduction to set theory. Hashing techniques were
first developed in the 1950’s, and Peterson [1957] covers the early techniques. Knuth
[1973] and Morris [1968] contain additional material on hashing techniques. Rein-
gold [1972] discusses the computational complexity of basic set operations. The
theory of infinite sets was developed by Cantor [1915].
Cantor, G. [1915]. “Contributions to the founding of the theory of transfinite num-
bers,” reprinted by Dover Press, New York.
Halmos, P. R. [1974]. Naive Set Theory, Springer-Verlag, New York.
Knuth, D. E. [1973]. The Art of Computer Programming, Vol. III, Sorting and
Searching, Addison-Wesley, Reading, Mass.
Morris, R. [1968]. “Scatter storage techniques,” Comm. ACM 11:1, pp. 35–44.
Peterson, W. W. [1957]. “Addressing for random access storage,” IBM J. Research
and Development 1:7, pp. 130–146.
Reingold, E. M. [1972]. “On the optimality of some set algorithms,” J. ACM 19:4,
pp. 649–659.
CHAPTER 8
✦
✦ ✦
✦
The Relational Data Model
One of the most important applications for computers is storing and managing
information. The manner in which information is organized can have a profound
effect on how easy it is to access and manage. Perhaps the simplest but most
versatile way to organize information is to store it in tables.
The relational model is centered on this idea: the organization of data into
collections of two-dimensional tables called “relations.” We can also think of the
relational model as a generalization of the set data model that we discussed in
Chapter 7, extending binary relations to relations of arbitrary arity.
Originally, the relational data model was developed for databases — that is,
information stored over a long period of time in a computer system — and for
database management systems, the software that allows people to store, access, and
modify this information. Databases still provide us with important motivation for
understanding the relational data model. They are found today not only in their
original, large-scale applications such as airline reservation systems or banking sys-
tems, but in desktop computers handling individual activities such as maintaining
expense records, homework grades, and many other uses.
Other kinds of software besides database systems can make good use of tables
of information as well, and the relational data model helps us design these tables and
develop the data structures that we need to access them efficiently. For example,
such tables are used by compilers to store information about the variables used in
the program, keeping track of their data type and of the functions for which they
are defined.
✦
✦ ✦
✦
8.1 What This Chapter Is About
There are three intertwined themes in this chapter. First, we introduce you to the
design of information structures using the relational model. We shall see that
✦ Tables of information, called “relations,” are a powerful and flexible way to
represent information (Section 8.2).
✦
✦ ✦
✦
8.2 Relations
Section 7.7 introduced the notion of a “relation” as a set of tuples. Each tuple of
a relation is a list of components, and each relation has a fixed arity, which is the
number of components each of its tuples has. While we studied primarily binary
relations, that is, relations of arity 2, we indicated that relations of other arities
were possible, and indeed can be quite useful.
The relational model uses a notion of “relation” that is closely related to this
set-theoretic definition, but differs in some details. In the relational model, in-
formation is stored in tables such as the one shown in Fig. 8.1. This particular
table represents data that might be stored in a registrar’s computer about courses,
students who have taken them, and the grades they obtained.
The columns of the table are given names, called attributes. In Fig. 8.1, the
attributes are Course, StudentId, and Grade.
Each row in the table is called a tuple and represents a basic fact. The first
row, (CS101, 12345, A), represents the fact that the student with ID number 12345
got an A in the course CS101.
A table has two aspects:
1. The set of column names, and
2. The rows containing the information.
The term “relation” refers to the latter, that is, the set of rows. Each row represents
a tuple of the relation, and the order in which the rows appear in the table is
immaterial. No two rows of the same table may have identical values in all columns.
Item (1), the set of column names (attributes), is called the scheme of the
relation. The order in which the attributes appear in the scheme is immaterial, but
we need to know the correspondence between the attributes and the columns of
the table in order to write the tuples properly. Frequently, we shall use the scheme
as the name of the relation. Thus the table in Fig. 8.1 will often be called the
Course-StudentId-Grade relation. Alternatively, we could give the relation a name,
like CSG.
Representing Relations
As sets, there are a variety of ways to represent relations by data structures. A
table looks as though its rows should be structures, with fields corresponding to
the column names. For example, the tuples in the relation of Fig. 8.1 could be
represented by structures of the type
    struct CSG {
        char Course[5];   /* course name, e.g., "CS101" (fixed width, no '\0') */
        int StudentId;    /* student ID number, e.g., 12345 */
        char Grade[2];    /* letter grade, e.g., "A" or "B+" */
    };
The table itself could be represented in any of a number of ways, such as
1. An array of structures of this type.
2. A linked list of structures of this type, with the addition of a next field to link
the cells of the list.
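For concreteness, here is a minimal sketch of representation (2); the type names
CELL and TUPLELIST are our own choice, not fixed by the text.

    typedef struct CELL *TUPLELIST;
    struct CELL {
        struct CSG tuple;   /* the tuple itself */
        TUPLELIST next;     /* link to the next cell of the list */
    };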
Additionally, we can identify one or more attributes as the “domain” of the relation
and regard the remainder of the attributes as the “range.” For instance, the relation
of Fig. 8.1 could be viewed as a relation from domain Course to a range consisting of
StudentId-Grade pairs. We could then store the relation in a hash table according
to the scheme for binary relations that we discussed in Section 7.9. That is, we hash
Course values, and the elements in buckets are Course-StudentId-Grade triples. We
shall take up this issue of data structures for relations in more detail, starting in
Section 8.4.
Databases
A collection of relations is called a database. The first thing we need to do when
designing a database for some application is to decide on how the information to
be stored should be arranged into tables. Design of a database, like all design
problems, is a matter of business needs and judgment. In an example to follow, we
shall expand our application of a registrar’s database involving courses, and thereby
expose some of the principles of good database design.
Some of the most powerful operations on a database involve the use of several
relations to represent coordinated types of data. By setting up appropriate data
structures, we can jump from one relation to another efficiently, and thus obtain
information from the database that we could not uncover from a single relation.
The data structures and algorithms involved in “navigating” among relations will
be discussed in Sections 8.6 and 8.8.
The set of schemes for the various relations in a database is called the scheme
of the database. Notice the difference between the scheme for the database, which
tells us something about how information is organized in the database, and the set
of tuples in each relation, which is the actual information stored in the database.
✦ Example 8.1. Let us supplement the relation of Fig. 8.1, which has scheme
{Course, StudentId, Grade}
with four other relations. Their schemes and intuitive meanings are:
Queries on a Database
We saw in Chapter 7 some of the most important operations performed on relations
and functions; they were called insert, delete, and lookup, although their appropri-
ate meanings differed, depending on whether we were dealing with a dictionary, a
function, or a binary relation. There is a great variety of operations one can perform
on database relations, especially on combinations of two or more relations, and we
shall give a feel for this spectrum of operations in Section 8.7. For the moment, let
us focus on the basic operations that we might perform on a single relation. These
are a natural generalization of the operations discussed in the previous chapter.
1. insert(t, R). We add the tuple t to the relation R, if it is not already there.
This operation is in the same spirit as insert for dictionaries or binary relations.
2. delete(X, R). Here, X is intended to be a specification of some tuples. It
consists of components for each of the attributes of R, and each component
can be either
a) A value, or
b) The symbol ∗, which means that any value is acceptable.
The effect of this operation is to delete all tuples that match the specification
X. For example, if we cancel CS101, we want to delete all tuples of the
Course-Day-Hour
relation that have Course = “CS101.” We could express this condition by
delete((“CS101”, ∗, ∗), Course-Day-Hour)
That operation would delete the first three tuples of the relation in Fig. 8.2(c),
because their first components each are the same value as the first component
of the specification, and their second and third components all match ∗, as any
values do.
(a) StudentId-Name-Address-Phone (tuples not shown here)

        Course    Prerequisite

        CS101     CS100
        EE200     EE005
        EE200     CS100
        CS120     CS101
        CS121     CS120
        CS205     CS101
        CS206     CS121
        CS206     CS205

(b) Course-Prerequisite

(c) Course-Day-Hour (tuples not shown here)

        Course    Room

        CS101     Turing Aud.
        EE200     25 Ohm Hall
        PH100     Newton Lab.

(d) Course-Room

Fig. 8.2. Four relations of the registrar's database.
3. lookup(X, R). The result of this operation is the set of tuples in R that match
the specification X; the latter is a symbolic tuple as described in the preceding
item (2). For example, if we wanted to know for what courses CS101 is a
prerequisite, we could ask
lookup((∗, “CS101”), Course-Prerequisite)
For the relation of Fig. 8.2(b), the result would be the two tuples
(CS120, CS101)
(CS205, CS101)
✦ Example 8.2. Here are some more examples of operations on our registrar’s
database:
a) lookup((“CS101”, 12345, ∗), Course-StudentId-Grade) finds the grade of the
student with ID 12345 in CS101. Formally, the result is the one matching
tuple, namely the first tuple in Fig. 8.1.
b) lookup((“CS205”, “CS120”), Course-Prerequisite) asks whether CS120 is a
prerequisite of CS205. Formally, it produces as an answer either the single
tuple (“CS205”, “CS120”) if that tuple is in the relation, or the empty set if
not. For the particular relation of Fig. 8.2(b), the empty set is the answer.
c) delete((“CS101”, ∗), Course-Room) drops the first tuple from the relation of
Fig. 8.2(d).
d) insert((“CS205”, “CS120”), Course-Prerequisite) makes CS120 a prerequisite
of CS205.
e) insert((“CS205”, “CS101”), Course-Prerequisite) has no effect on the relation
of Fig. 8.2(b), because the inserted tuple is already there. ✦
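To make the matching convention concrete, here is one possible C sketch of
lookup(X, R) for the Course-StudentId-Grade relation, with the relation stored as
a linked list. The names, the field sizes (widened by one byte per string so each
can hold a terminating '\0'), and the use of NULL and ANY_ID to play the role of
∗ are all our own conventions; the text does not fix an implementation.

    #include <stdio.h>
    #include <string.h>

    #define ANY_ID -1             /* stands for * in the StudentId component */

    struct CSG {
        char Course[6];           /* e.g., "CS101", with a '\0' terminator */
        int  StudentId;
        char Grade[3];            /* e.g., "A" or "B+" */
    };

    typedef struct CELL {
        struct CSG tuple;
        struct CELL *next;
    } CELL;

    /* Does tuple t match the specification?  A NULL course or grade, or
       ANY_ID, means "any value is acceptable," like the symbol *. */
    int matches(struct CSG *t, const char *course, int id, const char *grade)
    {
        if (course != NULL && strcmp(t->Course, course) != 0) return 0;
        if (id != ANY_ID && t->StudentId != id) return 0;
        if (grade != NULL && strcmp(t->Grade, grade) != 0) return 0;
        return 1;
    }

    /* lookup(X, R): print every tuple of R that matches the specification. */
    void lookup(CELL *R, const char *course, int id, const char *grade)
    {
        CELL *p;
        for (p = R; p != NULL; p = p->next)
            if (matches(&p->tuple, course, id, grade))
                printf("(%s, %d, %s)\n",
                       p->tuple.Course, p->tuple.StudentId, p->tuple.Grade);
    }

A call like lookup(R, "CS101", ANY_ID, NULL) then prints all tuples for CS101,
regardless of student and grade.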
Notice that we take six tuples, with four components each, to do the work previously
done by five tuples, with two or three components each.
✦ Conversely, do not separate attributes when they represent connected informa-
tion.
For example, we cannot replace the Course-Day-Hour relation by two relations, one
with scheme Course-Day and the other with scheme Course-Hour. For then, we
could only tell that EE200 meets Tuesday, Wednesday, and Thursday, and that it
has meetings at 10AM and 1PM, but we could not tell when it met on each of its
three days.
EXERCISES
8.2.1: Give appropriate structure declarations for the tuples of the relations of Fig.
8.2(a) through (d).
8.2.2*: What is an appropriate database scheme for
a) A telephone directory, including all the information normally found in a direc-
tory, such as area codes.
✦
✦ ✦
✦
8.3 Keys
Many database relations can be considered functions from one set of attributes to
the remaining attributes. For example, we might choose to view the
Course-StudentId-Grade
relation as a function whose domain is Course-StudentId pairs and whose range
is Grade. Because functions have somewhat simpler data structures than general
relations, it helps if we know a set of attributes that can serve as the domain of a
function. Such a set of attributes is called a “key.”
More formally, a key for a relation is a set of one or more attributes such that
under no circumstances will the relation have two tuples whose values agree in each
column headed by a key attribute. Frequently, there are several different sets of
attributes that could serve as a key for a relation, but we normally pick one and
refer to it as “the key.”
Finding Keys
Because keys can be used as the domain of a function, they play an important role
in the next section when we discuss primary index structures. In general, we cannot
deduce or prove that a set of attributes forms a key; rather, we need to examine
carefully our assumptions about the application being modeled and how they are
reflected in the database scheme we are designing. Only then can we know whether
it is appropriate to use a given set of attributes as a key. There follows a sequence
of examples that illustrate some of the issues.
Thus it is reasonable to take the StudentId attribute by itself as a key for the
StudentId-Name-Address-Phone relation.
However, in declaring StudentId a key, we have made a critical assumption,
enunciated in item (2) preceding, that we never want to store two names, addresses,
or phone numbers for one student. But we could just as well have decided otherwise,
for example, that we want to store for each student both a home address and a
campus address. If so, we would probably be better off designing the relation to
have five attributes, with Address replaced by HomeAddress and LocalAddress,
rather than have two tuples for each student, with all but the Address component
the same. If we did use two tuples — differing in their Address components only —
then StudentId would no longer be a key but {StudentId, Address} would be a
key. ✦
✦ Example 8.6. In the Course-Day-Hour relation of Fig. 8.2(c), all three at-
tributes form the only reasonable key. Perhaps Course and Day alone could be
declared a key, but then it would be impossible to store the fact that a course met
twice in one day (e.g., for a lecture and a lab). ✦
EXERCISES
8.3.1*: Suppose we want to store home and local addresses and also home and
local phones for students in the StudentId-Name-Address-Phone relation.
a) What would then be the most suitable key for the relation?
b) This change causes redundancy; for example, the name of a student could be
repeated four times as his or her two addresses and two phones are combined
in all possible ways in different tuples. We suggested in Example 8.3 that one
solution is to use separate attributes for the different addresses and different
phones. What would the relation scheme be then? What would be the most
suitable key for this relation?
c) Another approach to handling redundancy, which we suggested in Section 8.2,
is to split the relation into two relations, with different schemes, that together
hold all the information of the original. Into what relations should we split
StudentId-Name-Address-Phone, if we are going to allow multiple addresses
and phones for one student? What would be the most suitable keys for these
relations? Hint: A critical issue is whether addresses and phones are inde-
pendent. That is, would you expect a phone number to ring in all addresses
belonging to one student (in which case address and phone are independent),
or are phones associated with single addresses?
8.3.2*: The Department of Motor Vehicles keeps a database with the following
kinds of information.
1. The name of a driver (Name).
2. The address of a driver (Addr).
3. The license number of a driver (LicenseNo).
4. The serial number of an automobile (SerialNo).
5. The manufacturer of an automobile (Manf).
6. The model name of an automobile (Model).
✦
✦ ✦
✦
8.4 Primary Storage Structures for Relations
In Sections 7.8 and 7.9 we saw how certain operations on functions and binary
relations were speeded up by storing pairs according to their domain value. In
terms of the general insert, delete, and lookup operations that we defined in Section
8.2, the operations that are helped are those where the domain value is specified.
Recalling the Variety-Pollinizer relation from Section 7.9 again, if we regard Variety
as the domain of the relation, then we favor operations that specify a variety but
we do not care whether a pollinizer is specified.
Here are some structures we might use to represent a relation.
1. A binary search tree, with a “less than” relation on domain values to guide the
placement of tuples, can serve to facilitate operations in which a domain value
is specified.
2. An array used as a characteristic vector, with domain values as the array index,
can sometimes serve.
3. A hash table in which we hash domain values to find buckets will serve.
4. In principle, a linked list of tuples is a candidate structure. We shall ignore
this possibility, since it does not facilitate operations of any sort.
The same structures work when the relation is not binary. In place of a single
attribute for the domain, we may have a combination of k attributes, which we call
the domain attributes or just the “domain” when it is clear we are referring to a
set of attributes. Then, domain values are k-tuples, with one component for each
attribute of the domain. The range attributes are all those attributes other than
the domain attributes. The range values may also have several components, one for
each attribute of the range.
In general, we have to pick which attributes we want for the domain. The
easiest case occurs when there is one or a small number of attributes that serve
as a key for the relation. Then it is common to choose the key attribute(s) as the
SEC. 8.4 PRIMARY STORAGE STRUCTURES FOR RELATIONS 415
domain and the rest as the range. In cases where there is no key (except the set
of all attributes, which is not a useful key), we may pick any set of attributes as
the domain. For example, we might consider typical operations that we expect to
perform on the relation and pick for the domain an attribute we expect will be
specified frequently. We shall see some concrete examples shortly.
Once we have selected a domain, we can select any of the four data structures
just named to represent the relation, or indeed we could select another structure.
However, it is common to choose a hash table based on domain values as the index,
and we shall generally do so here.
The chosen structure is said to be the primary index structure for the relation.
The adjective “primary” refers to the fact that the location of tuples is determined
by this structure. An index is a data structure that helps find tuples, given a
value for one or more components of the desired tuples. In the next section, we
shall discuss “secondary” indexes, which help answer queries but do not affect the
location of the data.
1 1009 is a convenient prime around 1000. We might choose about 1000 buckets if there were
several thousand students in our database, so that the average number of tuples in a bucket
would be small.
in the linked lists of the buckets and for the bucket header array. Figure 8.4 suggests
what the hash table would look like. ✦

[Fig. 8.4: the hash table, with bucket headers numbered 0 through 1008. Hashing
student ID 12345 yields bucket 237, whose list holds the tuple (12345, C. Brown,
12 Apple St., 555-1234), linked to the other tuples in bucket 237.]
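One natural hash function for this primary index, consistent with Fig. 8.4, where
student 12345 lands in bucket 237, takes the remainder of the student ID modulo
1009; this sketch is ours.

    #define B 1009    /* number of buckets; see the footnote above */

    int hash_id(int student_id)
    {
        return student_id % B;    /* for example, 12345 % 1009 == 237 */
    }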
In either case, we would not be able to compute a value for the hash function. For
example, given only the course, we do not have a student ID to add to the sum of
the characters converted to integers, and thus have no value to divide by 1009 to
get the bucket number.
However, suppose it is quite common to ask queries like, “Who is taking
CS101?,” that is,
lookup((“CS101”, ∗, ∗), Course-StudentId-Grade)
We might find it more efficient to use a primary structure based only on the value of
the Course component. That is, we may regard our relation as a binary relation in
the set-theoretic sense, with domain equal to Course and range the StudentId-Grade
pairs.
For instance, suppose we convert the characters of the course name to integers,
sum them, divide by 197, and take the remainder. Then the tuples of the
Course-StudentId-Grade
relation would be divided by this hash function into 197 buckets, numbered 0
through 196. However, if CS101 has 100 students, then there would be at least
100 structures in its bucket, regardless of how many buckets we chose for our hash
table; that is the disadvantage of using something other than a key on which to
base our primary index structure. There could even be more than 100 structures,
if some other course were hashed to the same bucket as CS101.
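The hash function just described might be coded as follows; we assume the course
name is a null-terminated string.

    #define CBUCKETS 197

    int hash_course(const char *course)
    {
        int sum = 0;
        while (*course != '\0')
            sum += *course++;       /* sum the characters as integers */
        return sum % CBUCKETS;      /* remainder on division by 197 */
    }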
On the other hand, we still get help when we want to find the students in a
given course. If the number of courses is significantly more than 197, then on the
average, we shall have to search something like 1/197 of the entire
Course-StudentId-Grade
relation, which is a great saving. Moreover, we get some help when performing
operations like looking up a particular student’s grade in a particular course, or
inserting or deleting a Course-StudentId-Grade tuple. In each case, we can use the
Course value to restrict our search to one of the 197 buckets of the hash table. The
only sort of operation for which no help is provided is one in which no course is
specified. For example, to find the courses taken by student 12345, we must search
all the buckets. Such a query can be made more efficient only if we use a secondary
index structure, as discussed in the next section. ✦
binary relations in Chapter 7. To review the ideas, let us focus on a hash table as
the primary index structure. If the operation specifies a value for the domain, then
we hash this value to find a bucket.
1. To insert a tuple t, we examine the bucket to check that t is not already there,
and we create a new cell on the bucket’s list for t if it is not.
2. To delete tuples that match a specification X, we find the domain value from X,
hash to find the proper bucket, and run down the list for this bucket, deleting
each tuple that matches the specification X.
3. To lookup tuples according to a specification X, we again find the domain value
from X and hash that value to find the proper bucket. We run down the list
for that bucket, producing as an answer each tuple on the list that matches the
specification X.
If the operation does not specify the domain value, we are not so fortunate.
An insert operation always specifies the inserted tuple completely, but a delete or
lookup might not. In those cases, we must search all the bucket lists for matching
tuples and delete or list them, respectively.
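As an illustration of item (1), here is a sketch of insert for the Course-StudentId-
Grade relation hashed on Course. It reuses the CELL list type and hash_course()
from the sketches above; bucket_headers is an assumed global array of 197 list
headers, not something the text prescribes.

    #include <stdlib.h>
    #include <string.h>

    extern CELL *bucket_headers[197];

    void insert_csg(struct CSG t)
    {
        CELL **pp = &bucket_headers[hash_course(t.Course)];
        CELL *p;

        for (p = *pp; p != NULL; p = p->next)
            if (strcmp(p->tuple.Course, t.Course) == 0 &&
                p->tuple.StudentId == t.StudentId &&
                strcmp(p->tuple.Grade, t.Grade) == 0)
                return;                       /* tuple already present */

        p = (CELL *) malloc(sizeof(CELL));    /* not present: prepend it */
        p->tuple = t;
        p->next = *pp;
        *pp = p;
    }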
EXERCISES
8.4.1: The DMV database of Exercise 8.3.2 should be designed to handle the
following sorts of queries, all of which may be assumed to occur with significant
frequency.
1. What is the address of a given driver?
2. What is the license number of a given driver?
3. What is the name of the driver with a given license number?
4. What is the name of the driver who owns a given automobile, identified by its
registration number?
5. What are the serial number, manufacturer, and model of the automobile with
a given registration number?
6. Who owns the automobile with a given registration number?
Suggest appropriate primary index structures for the relations you designed in Ex-
ercise 8.3.2, using a hash table in each case. State your assumptions about how
many drivers and automobiles there are. Tell how many buckets you suggest, as
well as what the domain attribute(s) are. How many of these types of queries can
you answer efficiently, that is, in average time O(1) independent of the size of the
relations?
8.4.2: The primary structure for the Course-Day-Hour relation of Fig. 8.2(c) might
depend on the typical operations we intended to perform. Suggest an appropriate
hash table, including both the attributes in the domain and the number of buckets
if the typical queries are of each of the following forms. You may make reasonable
assumptions about how many courses and different class periods there are. In each
case, a specified value like “CS101” is intended to represent a “typical” value; in
this case, we would mean that Course is specified to be some particular course.
a) lookup((“CS101”, “M”, ∗), Course-Day-Hour).
b) lookup((∗, “M”, “9AM”), Course-Day-Hour).
c) delete((“CS101”, ∗, ∗), Course-Day-Hour).
d) Half of type (a) and half of type (b).
e) Half of type (a) and half of type (c).
f) Half of type (b) and half of type (c).
✦
✦ ✦
✦
8.5 Secondary Index Structures
Suppose we store the StudentId-Name-Address-Phone relation in a hash table,
where the hash function is based on the key StudentId, as in Fig. 8.4. This primary
index structure helps us answer queries in which the student ID number is specified.
However, perhaps we wish to ask questions in terms of students’ names, rather than
impersonal — and probably unknown — ID’s. For example, we might ask, “What
is the phone number of the student named C. Brown?” Now, our primary index
structure gives no help. We must go to each bucket and examine the lists of records
until we find one whose Name field has value “C. Brown.”
To answer such a query rapidly, we need an additional data structure that takes
us from a name to the tuple or tuples with that name in the Name component of
the tuple.2 A data structure that helps us find tuples — given a value for a certain
attribute or attributes — but is not used to position the tuples within the overall
structure, is called a secondary index.
What we want for our secondary index is a binary relation whose
1. Domain is Name.
2. Range is the set of pointers to tuples of the StudentId-Name-Address-Phone
relation.
In general, a secondary index on attribute A of relation R is a set of pairs (v, p),
where
a) v is a value for attribute A, and
b) p is a pointer to one of the tuples, in the primary index structure for relation
R, whose A-component has the value v.
The secondary index has one such pair for each tuple with the value v in attribute
A.
We may use any of the data structures for binary relations for storing secondary
indexes. Usually, we would expect to use a hash table on the value of the attribute
A. As long as the number of buckets is no greater than the number of different
values of attribute A, we can normally expect good performance — that is, O(n/B)
time, on the average — to find one pair (v, p) in the hash table, given a desired
value of v. (Here, n is the number of pairs and B is the number of buckets.) To
show that other structures are possible for secondary (or primary) indexes, in the
next example we shall use a binary search tree as a secondary index.
2 Remember that Name is not a key for the StudentId-Name-Address-Phone relation, despite
the fact that in the sample relation of Fig. 8.2(a), there are no tuples that have the same
Name value. For example, if Linus goes to the same college as Lucy, we could find two tuples
with Name equal to “L. Van Pelt,” but with different student ID’s.
For the secondary index, we shall use a binary search tree, whose nodes store
elements that are pairs consisting of the name of a student and a pointer to a tuple.
The tuples themselves are stored as records, which are linked in a list to form one
of the buckets of the hash table, and so the pointers to tuples are really pointers to
records. Thus we need the structures of Fig. 8.5. The types TUPLE and HASHTABLE
are the same as in Fig. 8.3, except that we are now using two buckets rather than
1009 buckets.
The type NODE is a binary tree node with two fields, Name and toTuple, repre-
senting the element at the node — that is, a student’s name — and a pointer to a
record where the tuple for that student is kept. The remaining two fields, lc and
rc, are intended to be pointers to the left and right children of the node. We shall
use alphabetic order on the last names of students as the “less than” order with
which we compare elements at the nodes of the tree. The secondary index itself is
a variable of type TREE — that is, a pointer to a node — and it takes us to the root
of the binary search tree.
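The declarations of Fig. 8.5 are not reproduced here, but from the description
above they amount to something like the following sketch; the array bound for
Name is our assumption, and TUPLE is the record type of Fig. 8.3.

    #define MAXNAME 20    /* assumed maximum length of a student's name */

    typedef struct NODE *TREE;

    struct NODE {
        char Name[MAXNAME];      /* the element at the node */
        struct TUPLE *toTuple;   /* pointer to the record holding the tuple */
        TREE lc;                 /* left child */
        TREE rc;                 /* right child */
    };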
An example of the entire structure is shown in Fig. 8.6. To save space, the
Address and Phone components of tuples are not shown. The Li’s indicate the
memory locations at which the records of the primary index structure are stored.
[Fig. 8.6: (a) the primary hash table with two buckets. Bucket 0 holds the tuples
for student 67890 (L. Van Pelt), stored at location L1, and student 22222 (P.
Patty), at L2; bucket 1 holds student 12345 (C. Brown), at L3. (b) The binary
search tree of the secondary index, whose nodes pair each Name with a pointer to
its tuple: C. Brown with L3, L. Van Pelt with L1, and P. Patty with L2.]
pointers to tuples, one pointer for each tuple with a given value in the Name field.
For instance, if there were several P. Patty’s, the bottom node in Fig. 8.6(b) would
have, in place of L2, the header of a linked list. The elements of that list would be
the pointers to the various tuples that had Name attribute equal to “P. Patty.”
EXERCISES
8.5.1: Show how to modify the binary search tree structure of Fig. 8.5 to allow for
the possibility that there are several tuples in the StudentId-Name-Address-Phone
relation that have the same student name. Write a C function that takes a name
and lists all the tuples of the relation that have that name for the Name attribute.
8.5.2**: Suppose that we have decided to store the
StudentId-Name-Address-Phone
relation with a primary index on StudentId. We may also decide to create some
secondary indexes. Suppose that all lookups will specify only one attribute, either
Name, Address, or Phone. Assume that 75% of all lookups specify Name, 20%
specify Address, and 5% specify Phone. Suppose that the cost of an insertion or a
deletion is 1 time unit, plus 1/2 time unit for each secondary index we choose to
build (e.g., the cost is 2.5 time units if we build all three secondary indexes). Let the
cost of a lookup be 1 unit if we specify an attribute for which there is a secondary
index, and 10 units if there is no secondary index on the specified attribute. Let a be
the fraction of operations that are insertions or deletions of tuples with all attributes
specified; the remaining fraction 1 − a of the operations are lookups specifying one
of the attributes, according to the probabilities we assumed [e.g., .75(1 − a) of all
operations are lookups given a Name value]. If our goal is to minimize the average
time of an operation, which secondary indexes should we create if the value of
parameter a is (a) .01 (b) .1 (c) .5 (d) .9 (e) .99?
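In symbols (our formulation of the cost model, not part of the original exercise):
if S is the set of secondary indexes built, the average cost per operation is

    a(1 + |S|/2) + (1 − a)(.75 c(Name) + .20 c(Address) + .05 c(Phone))

where c(X) is 1 if X is in S and 10 otherwise.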
8.5.3: Suppose that the DMV wants to be able to answer the following types of
queries efficiently, that is, much faster than by searching entire relations.
i) Given a driver’s name, find the driver’s license(s) issued to people with that
name.
ii) Given a driver’s license number, find the name of the driver.
iii) Given a driver’s license number, find the registration numbers of the auto(s)
owned by this driver.
iv) Given an address, find all the drivers’ names at that address.
v) Given a registration number (i.e., a license plate), find the driver’s license(s)
of the owner(s) of the auto.
Suggest a suitable data structure for your relations from Exercise 8.3.2 that will
allow all these queries to be answered efficiently. It is sufficient to suppose that
each index will be built from a hash table and tell what the primary and secondary
indexes are for each relation. Explain how you would then answer each type of
query.
8.5.4*: Suppose that it is desired to find efficiently the pointers in a given secondary
index that point to a particular tuple t in the primary index structure. Suggest
a data structure that allows us to find these pointers in time proportional to the
number of pointers found. What operations are made more time-consuming because
of this additional structure?
✦
✦ ✦
✦
8.6 Navigation among Relations
Until now, we have considered only operations involving a single relation, such
as finding a tuple given values for one or more of its components. The power of
the relational model can be seen best when we consider operations that require
us to “navigate,” or jump from one relation to another. For example, we could
answer the query “What grade did the student with ID 12345 get in CS101?” by
working entirely within the Course-StudentId-Grade relation. But it would be more
natural to ask, “What grade did C. Brown get in CS101?” That query cannot be
answered within the Course-StudentId-Grade relation alone, because that relation
uses student ID’s, rather than names.
If there are no indexes we can use, then answering this query can be quite
time-consuming. Suppose that there are n tuples in the
StudentId-Name-Address-Phone
relation and m tuples in the Course-StudentId-Grade relation. Also assume that
there are k students with the name “C. Brown.” A sketch of the algorithm for
finding the grades of this student or students in CS101, assuming there are no
indexes we can use, is shown in Fig. 8.8.
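Figure 8.8 itself did not survive reproduction here; the following fragment is our
reconstruction, with comment numbers matching the lines cited in the analysis
below. SNAP and CSG are assumed arrays holding the n and m tuples of the two
relations.

    int i, j, id;

    for (i = 0; i < n; i++) {                             /* (1) */
        if (strcmp(SNAP[i].Name, "C. Brown") == 0) {      /* (2) */
            id = SNAP[i].StudentId;                       /* (3) */
            for (j = 0; j < m; j++)                       /* (4) */
                if (CSG[j].StudentId == id &&
                    strcmp(CSG[j].Course, "CS101") == 0)  /* (5) */
                    printf("%s\n", CSG[j].Grade);         /* (6) */
        }
    }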
Let us determine the running time of the program in Fig. 8.8. Starting from
the inside out, the print statement of line (6) takes O(1) time. The conditional
statement of lines (5) and (6) also takes O(1) time, since the test of line (5) is an
O(1)-time test. Since we assume that there are m tuples in the relation
Course-StudentId-Grade
the loop of lines (4) through (6) is iterated m times and thus takes O(m) time in
total. Since line (3) takes O(1) time, the block of lines (3) to (6) takes O(m) time.
Now consider the if-statement of lines (2) to (6). Since the test of line (2) takes
O(1) time, the entire if-statement takes O(1) time if the condition is false and O(m)
time if it is true. However, we have assumed that the condition is true for k tuples
and false for the rest; that is, there are k tuples t for which the name component
is “C. Brown.” Since there is so much difference between the times taken when
the condition is true and when it is false, we should be careful how we analyze the
for-loop of lines (1) to (6). That is, instead of counting the number of times around
the loop and multiplying by the greatest time the body can take, we shall consider
separately the two outcomes of the test at line (2).
First, we go around the loop n times, because that is the number of different
values of t. For the k tuples t on which the test at line (2) is true, we take O(m)
time each, or a total of O(km) time. For the remaining n − k tuples for which the
test is false, we take O(1) time per tuple, or O(n − k) total. Since k is presumably
much less than n, we can take O(n) as a simpler but tight upper bound instead of
O(n − k). Thus the cost of the entire program is O(n + km). In the likely case
where k = 1, when there is only one student with the given name, the time required,
O(n + m), is proportional to the sum of the sizes of the two relations involved. If
k is greater than 1, the time is greater still.
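The program of Fig. 8.9, which exploits the indexes, was also lost here; the sketch
below is our reconstruction, with comment numbers matching the analysis that
follows. SNAPLIST is an assumed linked-list type of StudentId-Name-Address-Phone
tuples; find_by_name() consults the secondary index on Name, and lookup_csg()
uses the primary index on the key {Course, StudentId}.

    SNAPLIST t, found = find_by_name("C. Brown");    /* (1) */

    for (t = found; t != NULL; t = t->next) {        /* (2) */
        int id = t->tuple.StudentId;                 /* (3) */
        struct CSG *g = lookup_csg("CS101", id);     /* (4) */
        if (g != NULL)
            printf("%s\n", g->Grade);                /* (5) */
    }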
Let us assume that the index on Name is a hash table with about n buckets,
used as a secondary index. Since n is the number of tuples in the
StudentId-Name-Address-Phone
relation, the buckets have O(1) tuples each, on the average. Finding the bucket for
Name value “C. Brown” takes O(1) time. If there are k tuples with this name, it
will take O(k) time to find these tuples in the bucket and O(1) time to skip over
possible other tuples in the bucket. Thus line (1) of Fig. 8.9 takes O(k) time on the
average.
The loop of lines (2) through (5) is executed k times. Let us suppose we store
the k tuples t that were found at line (1) in a linked list. Then the cost of going
around the loop by finding the next tuple t or discovering that there are no more
tuples is O(1), as are the costs of lines (3) and (5). We claim that line (4) can also
be executed in O(1) time, and therefore the loop of lines (2) to (5) takes O(k) time.
We analyze line (4) as follows. Line (4) requires the lookup of a single tuple,
given its key value. Let us suppose that the Course-StudentId-Grade relation has a
primary index on its key, {Course, StudentId}, and that this index is a hash table
with about m buckets. Then the average number of tuples per bucket is O(1), and
therefore line (4) of Fig. 8.9 takes O(1) time. We conclude that the body of the
loop of lines (2) through (5) takes O(1) average time, and thus the entire program
of Fig. 8.9 takes O(k) average time. That is, the cost is proportional to the number
of students with the particular name we query about, regardless of the size of the
relations involved.
If we do not have indexes, then the best we can hope for is that we can execute
this plan in time proportional to the sum of the sizes of the four relations involved.
However, there are a number of indexes we can take advantage of.
a) In step (1), we can use an index on the Name component of the
StudentId-Name-Address-Phone
relation to get the student ID of C. Brown in O(1) average time.
b) In step (2), we can take advantage of an index on the StudentId component of
Course-StudentId-Grade to get in O(k) time all the courses C. Brown is taking,
if he is taking k courses.
c) In step (3), we can take advantage of an index on Course in the
Course-Day-Hour
relation to find all the meetings of the k courses from step (2) in average time
proportional to the sum of the numbers of meetings of these courses. If we
assume that no course meets more than five times a week, then there are at
most 5k tuples, and we can find them in O(k) average time. If there is no index
on Course for this relation, but there is an index on Day and/or Hour, we can
take some advantage of such an index, although we may look at far more than
O(k) tuples, depending on how many courses there are that meet on Monday
or that meet at 9AM on some day.
d) In step (4), we can take advantage of an index on Course for the Course-Room
relation. In that case, we can retrieve the desired room in O(1) average time.
We conclude that, with all the right indexes, we can answer this very complicated
query in O(k) average time. Since k, the number of courses taken by C. Brown, can
be assumed small — say, 5 or so — this amount of time is normally quite small
compared with the sizes of the relations involved.
EXERCISES
8.6.1: Suppose that the Course-StudentId-Grade relation in Fig. 8.9 did not have
an index on Course-StudentId pairs, but rather had an index on Course alone. How
would that affect the running time of Fig. 8.9? What if the index were only on
StudentId?
8.6.2: Discuss how the following queries can be answered efficiently. In each case,
state what assumptions you make about the number of elements in intermediate
sets (e.g., the number of courses taken by C. Brown), and also state what indexes
you assume exist.
a) Find all the prerequisites of the courses taken by C. Brown.
b) Find the phone numbers of all the students taking a course that meets in Turing
Aud.
c) Find the prerequisites of the prerequisites of CS206.
8.6.3: Assuming no indexes, how much time would each of the queries in Exercise
8.6.2 take, as a function of the sizes of the relations involved, assuming straightfor-
ward iterations over all tuples, as in the examples of this section?
✦
✦ ✦
✦
8.7 An Algebra of Relations
In Section 8.6 we saw that a query involving several relations can be quite compli-
cated. It is useful to express such queries in language that is much “higher-level”
than C, in the sense that the query expresses what we want (e.g., all tuples with
Course component equal to “CS101”) without having to deal with issues such as
A B C
0 1 2
0 3 4
5 2 3
This relation has scheme {A, B, C}, and it has three tuples, (0, 1, 2), (0, 3, 4), and
(5, 2, 3).
A variable argument might be represented by R(A, B, C), which denotes a
relation called R, whose columns are named A, B, and C but whose set of tuples
is unknown. If the scheme {A, B, C} for R is understood or irrelevant, we can just
use R as the operand.
✦ Example 8.11. Let R and S be the relations of Fig. 8.11(a) and (b), re-
spectively. Note that both relations have the scheme {A, B}. The union operator
produces a relation with each tuple that appears in either R or S, or both. Note
that since relations are sets, they can never have two or more copies of the same
tuple, even if a tuple appears in both R and S, as does the tuple (0, 1) in this
example. The relation R ∪ S is shown in Fig. 8.11(c).
The intersection operator produces the relation that has those tuples appearing
in both operands. Thus the relation R ∩ S has only the tuple (0, 1), as shown in
Fig. 8.11(d). The set difference produces a relation with those tuples in the first
relation that are not also in the second. The relation R − S, shown in Fig. 8.11(e),
    A  B        A  B
    0  1        0  1
    2  3        4  5

    (a) R       (b) S

    A  B
    0  1        A  B        A  B
    2  3        0  1        2  3
    4  5

    (c) R ∪ S   (d) R ∩ S   (e) R − S

Fig. 8.11. Two relations and their union, intersection, and difference.
has the tuple (2, 3) of R, because that tuple is not in S, but does not have the tuple
(0, 1) of R, because that tuple is also in S. ✦
The other operators of relational algebra are designed to perform the kinds of actions
we have studied in this chapter. For example, we have frequently wanted to extract
from a relation tuples meeting certain conditions, such as all tuples from the
Course-StudentId-Grade
relation that have Course component “CS101.” For this purpose, we use the se-
lection operator. This operator takes a single relation as operand, but also has a
conditional expression as a “parameter.” We write the selection operator σC (R),
where σ (Greek lower-case sigma) is the symbol for selection, C is the condition,
and R is the relation operand. The condition C is allowed to have operands that
are attributes from the scheme of R, as well as constants. The operators allowed in
C are the usual ones for C conditional expressions, that is, arithmetic comparisons
and the logical connectives.
The result of this operation is a relation whose scheme is the same as that of
R. Into this relation we put every tuple t of R such that condition C becomes true
when we substitute for each attribute A the component of tuple t in the column for
A.
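Here is a minimal C sketch of one particular selection, with condition Course =
"CS101", over a linked list of CSG tuples; CELL is the list type from our earlier
sketches, and a general σ operator would take the condition as a parameter, for
instance a function pointer.

    #include <stdlib.h>
    #include <string.h>

    CELL *select_cs101(CELL *R)
    {
        CELL *result = NULL, *p, *q;

        for (p = R; p != NULL; p = p->next)
            if (strcmp(p->tuple.Course, "CS101") == 0) {
                q = (CELL *) malloc(sizeof(CELL));
                q->tuple = p->tuple;    /* copy the selected tuple */
                q->next = result;
                result = q;
            }
        return result;    /* same scheme as R, but only the matching tuples */
    }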
✦ Example 8.12. Let CSG stand for the Course-StudentId-Grade relation of Fig.
8.1. If we want those tuples that have Course component “CS101,” we can write
the expression
σCourse=“CS101” (CSG)
The result of this expression is a relation with the same scheme as CSG, that is,
{Course, StudentId, Grade}, and the set of tuples shown in Fig. 8.12. That is,
the condition becomes true only for those tuples where the Course component is
“CS101.” For then, when we substitute “CS101” for Course, the condition becomes
“CS101” = “CS101.” If the tuple has any other value, such as “EE200”, in the
Course component, we get an expression like “EE200” = “CS101,” which is false. ✦
Whereas the selection operator makes a copy of the relation with some rows deleted,
we often want to make a copy in which some columns are eliminated. For that pur-
pose we have the projection operator, represented by the symbol π. Like selection,
the projection operator takes a single relation as argument, and it also takes a pa-
rameter, which is a list of attributes, chosen from the scheme of the relation that is
the argument.
If R is a relation with set of attributes {A1 , . . . , Ak }, and (B1 , . . . , Bn ) is a list
of some of the A’s, then πB1 ,...,Bn (R), the projection of R onto attributes B1 , . . . , Bn ,
is the set of tuples formed as follows. Take each tuple t in R, and extract its com-
ponents in attributes B1 , . . . , Bn ; say these components are b1 , . . . , bn , respectively.
Then add the tuple (b1 , . . . , bn ) to the relation πB1 ,...,Bn (R). Note that two or more
tuples of R may have the same components in all of B1 , . . . , Bn . If so, only one copy
of the projection of those tuples goes into πB1 ,...,Bn (R), since that relation, like all
relations, cannot have more than one copy of any tuple.
✦ Example 8.13. Suppose we wanted to see only the student ID’s for the students
who are taking CS101. We could apply the same selection as in Example 8.12, which
gives us all the tuples for CS101 in the CSG relation, but we then must project
out the course and grade; that is, we project onto StudentId alone. The expression
that performs both operations is
πStudentId(σCourse=“CS101”(CSG))
The result of this expression is the relation of Fig. 8.12 projected onto its StudentId
component — that is, the unary relation of Fig. 8.13. ✦
    StudentId

    12345
    67890
    33333

Fig. 8.13. The student ID's of the students taking CS101.
Joining Relations
Finally, we need a way to express the idea that two relations are connected, so that
we can navigate from one to the other. For this purpose, we use the join operator,
which we denote ⊲⊳.3 Suppose we have two relations R and S, with sets of attributes
(schemes) {A1 , . . . , An } and {B1 , . . . , Bm }, respectively. We pick one attribute from
each set — say, Ai and Bj — and these attributes become parameters of the join
operation with arguments R and S.
The join of R and S, written R ⊲⊳_{Ai=Bj} S (the condition Ai = Bj is written as
a subscript of the join symbol), is formed by taking each tuple r from
R and each tuple s from S and comparing them. If the component of r for Ai equals
the component of s for Bj , then we form one tuple from r and s; otherwise, no tuple
is created from the pairing of r and s. We form a tuple from r and s by taking the
components of r and following them by all the components of s, but omitting the
component for Bj , which is the same as the Ai component of r anyway.
The relation R ⊲⊳_{Ai=Bj} S is the set of tuples formed in this manner. Note that
there could be no tuples in this relation, if no value appearing in the Ai column
of R also appeared in the Bj column of S. At the other extreme, every tuple of
R could have the same value in the Ai component, and this component could also
appear in the Bj component of every tuple in S. Then, the number of tuples in
the join would be the product of the number of tuples in R and the number in S,
since every pair of tuples would match. Generally, the truth lies somewhere between
these extremes; each tuple of R pairs with some but not all of the tuples of S.
The scheme of the joined relation is
{A1 , . . . , An , B1 , . . . , Bj−1 , Bj+1 , . . . , Bm }
that is, the set of all attributes of R and S except for Bj . However, there could be
two occurrences of the same name on this list, if one of the A’s was the same as one
of the B’s (other than Bj , which is not an attribute of the join). If that is the case,
we shall insist that one of the pair of identical attributes be renamed.
3 The “join” that we describe here is less general than that normally found in relational algebra
but will serve to get the flavor of the operator without going into all the complexities of the
subject.
that is, if the tuples are talking about the same course. Thus if we join CR with
CDH, requiring equality of the two Course attributes, we shall get a relation with
scheme
{Course, Room, Day, Hour}
that contains each tuple (c, r, d, h) such that (c, r) is a tuple of CR and (c, d, h) is
a tuple of CDH. The expression defining this relation is
CR ⊲⊳_{Course=Course} CDH
and the value of the relation produced by this expression, assuming that the relations
have the tuples found in Fig. 8.2, is as shown in Fig. 8.14.
To see how the relation of Fig. 8.14 is constructed, consider the first tuple of
CR, which is (CS101, Turing Aud.). We examine the tuples of CDH for those that
have the same Course value, that is, “CS101.” In Fig. 8.2(c), we find that the first
three tuples match, and from each of them, we construct one of the first three tuples
of Fig. 8.14. For example, the first tuple of CDH, which is (CS101, M, 9AM), joins
with tuple (CS101, Turing Aud.) to create the first tuple of Fig. 8.14. Notice how
that tuple agrees with each of the two tuples from which it is constructed.
Similarly, the second tuple of CR, (EE200, 25 Ohm Hall), shares a common
Course component with each of the last three tuples of CDH. These three pairings
give rise to the last three tuples of Fig. 8.14. The last tuple of CR,
(PH100, Newton Lab.)
does not have the same Course component as any tuple of CDH. Thus that tuple
does not contribute anything at all to the join. ✦
Natural Join
When we join two relations R and S, it is common that the attributes we equate
have the same name. If, in addition, R and S have no other attribute names in
common, then we can omit the parameter of the join and simply write R ⊲⊳ S. Such
a join is called a natural join.
For instance, the join in Example 8.14 is a natural join. The equated attributes
are both called Course, and the remaining attributes of CR and CDH all have
distinct names. Thus we could have written this join simply as CR ⊲⊳ CDH.
           πDay,Hour
               |
      σRoom=“Turing Aud.”
               |
              ⊲⊳
             /  \
           CR    CDH

Fig. 8.15. The query as an expression tree.
✦ Example 8.15. Building on Example 8.14, suppose we wanted to see not the
entire relation CR ⊲⊳ CDH, but just the Day-Hour pairs during which Turing Aud.
is occupied by some course. Then we need to take the relation of Fig. 8.14 and
1. Select for those tuples having Room component “Turing Aud.,” and
2. Project onto attributes Day and Hour.
The expression that performs the join, selection, and projection, in that order,
is
πDay,Hour(σRoom=“Turing Aud.”(CR ⊲⊳ CDH))
Alternatively, we could display this expression as the tree shown in Fig. 8.15. The
relation computed at the join node appeared in Fig. 8.14. The relation for the
selection node is the first three tuples in Fig. 8.14, because these have “Turing Aud.”
in their Room component. The relation for the root of the expression is shown in
Fig. 8.16; it consists of the Day and Hour components of those three tuples. ✦
    Day    Hour

    M      9AM
    W      9AM
    F      9AM

Fig. 8.16. The days and hours when Turing Aud. is in use.
EXERCISES
✦
✦ ✦
✦
8.8 Implementing Relational Algebra Operations
Using the right data structures and algorithms for relational algebra operations can
speed up database queries. In this section, we shall consider some of the simpler
and more common strategies for implementing relational algebra operations.
Implementing Projection
In principle, when we perform a projection, we have no choice but to run through
every tuple and make a copy that omits the components corresponding to attributes
not on the projection list. Indexes do not help us at all. Moreover, after we compute
the projection of each tuple, we may find that we are left with many duplicates.
For example, suppose we have a relation R with scheme {A, B, C} and we
compute πA,B (R). Even though R cannot have tuples that agree on all of A, B,
and C, it may have many tuples with the same values for attributes A and B
but different values for C. Then all these tuples will yield the same tuple in the
projection.
Thus, after we compute a projection such as S = πL (R), for some relation R
and list of attributes L, we must eliminate duplicates. For example, we could sort
S and then run through the tuples in the sorted order. Any tuple that is the same
as the previous tuple in the order will be eliminated. Another way to eliminate
duplicates is to treat the relation S as an ordinary set. Each time we generate a
tuple by projecting a tuple of R onto the attributes in the list L, we insert it into
the set. As with all insertions into a set, if the element inserted is already there, we
do nothing. A structure such as a hash table will serve adequately to represent the
set S of tuples generated by the projection.
To sort the relation S before eliminating duplicates requires O(n log n) time if
there are n tuples in the relation R. If we instead hash tuples of S as we generate
them and we use a number of buckets proportional to n, then the entire projection
will take O(n) time, on the average. Thus hashing is normally slightly better than
sorting.
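A sketch of the hashing approach follows, under our own representation choices:
R held as a linked list of integer triples, and 1009 buckets for the result.

    #include <stdlib.h>

    #define PB 1009

    struct ABC { int A, B, C; };
    typedef struct RCELL { struct ABC t; struct RCELL *next; } RCELL;
    typedef struct PCELL { int A, B; struct PCELL *next; } PCELL;

    PCELL *pbuckets[PB];    /* the hash table holding S = pi_{A,B}(R) */

    /* Insert the pair (a,b) unless it is already present in its bucket. */
    void insert_pair(int a, int b)
    {
        int h = (unsigned)(31 * a + b) % PB;
        PCELL *p;

        for (p = pbuckets[h]; p != NULL; p = p->next)
            if (p->A == a && p->B == b)
                return;                       /* duplicate: do nothing */
        p = (PCELL *) malloc(sizeof(PCELL));
        p->A = a; p->B = b;
        p->next = pbuckets[h];
        pbuckets[h] = p;
    }

    void project(RCELL *R)
    {
        RCELL *r;
        for (r = R; r != NULL; r = r->next)
            insert_pair(r->t.A, r->t.B);      /* O(1) average per tuple */
    }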
Implementing Selection
When we perform a selection S = σC (R) and there are no indexes on R, then
we have no choice but to run through all the tuples of R to apply the condition C.
Regardless of how we perform the selection, we know that there can be no duplicates
in the result S, as long as R has no duplicates.
However, if there are indexes on R, then we can often take advantage of one of
them to home in on the tuples that meet the condition C, and we can thus avoid
looking at most or all of the tuples that do not meet condition C. The simplest
situation occurs when condition C is of the form A = b, where A is an attribute of
R and b is a constant. If R has an index on A, then we can retrieve all the tuples
that meet this condition by looking up b in the index.
If condition C is the logical AND of several conditions, then we can use any one
of them to look up tuples using an index, and then check the retrieved tuples to see
which ones meet the remaining conditions. For example, suppose condition C is
(A = a) AND (B = b)
Then we have the choice of using an index on A or an index on B, if either or both
exists. Suppose that there is an index on B, and either there is no index on A or
we prefer to use the index on B. Then we get all the tuples of R that have the
value b in their B component. Each of these tuples that has a in the A component
belongs in the relation S, the result of the selection; other retrieved tuples do not.
The time taken for the selection is proportional to the number of tuples with B
value b, which generally lies somewhere between the number of tuples in R and the
number of tuples in the answer, S.
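As an illustration, the following C sketch performs the selection with condition (A = a) AND (B = b), using a hash-table index on attribute B. The structure declarations and the use of printf to emit result tuples are illustrative, not prescribed by the text.

#include <stdio.h>

#define B_BUCKETS 1009

/* hypothetical tuples of R, chained into buckets by an index on B */
struct TUPLE {
    int A, B;
    struct TUPLE *nextInBucket;
};
struct TUPLE *indexOnB[B_BUCKETS];

int hashB(int B) { return B % B_BUCKETS; }

/* retrieve the tuples with B value b via the index,
   then check the remaining condition A = a */
void selectTuples(int a, int b)
{
    struct TUPLE *t;
    for (t = indexOnB[hashB(b)]; t != NULL; t = t->nextInBucket)
        if (t->B == b && t->A == a)
            printf("(%d, %d)\n", t->A, t->B);
}

Only the tuples in b's bucket are examined, which is why the time is proportional to the number of tuples with B value b rather than to the size of R.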
Implementing Join
Suppose we want to take the natural join of relation R with scheme {A, B} and
relation S with scheme {B, C}. Suppose also that the join is the natural join,
with equality between the B attributes of the two relations.4 How we perform this
join depends on what indexes on attribute B we can find. The issues are similar to
those discussed in Section 8.6, when we considered how to navigate among relations,
because the join is the essence of navigation.
There is an obvious and slow way to compute the join, called nested-loop join: we compare every tuple of one relation with every tuple of the other relation, and whenever the pair agrees on the join attribute B, we produce a tuple of the result.
4 We show for each relation only one attribute (A and C, respectively) that is not involved in
the join, but the ideas mentioned here clearly carry over to relations with many attributes.
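In C, a nested-loop join of R(A, B) and S(B, C), stored as arrays of tuples, might be sketched as follows; the tuple types are illustrative.

#include <stdio.h>

/* hypothetical tuple types for R(A, B) and S(B, C) */
struct RTUPLE { int A, B; };
struct STUPLE { int B, C; };

/* nested-loop join: pair every tuple of R with every tuple of S,
   producing a joined tuple whenever the B components agree */
void nestedLoopJoin(struct RTUPLE R[], int nR,
                    struct STUPLE S[], int nS)
{
    int i, j;
    for (i = 0; i < nR; i++)
        for (j = 0; j < nS; j++)
            if (R[i].B == S[j].B)
                printf("(%d, %d, %d)\n", R[i].A, R[i].B, S[j].C);
}

If R and S each have n tuples, this method clearly takes O(n^2) time.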
One better method is called sort-join. We pad each tuple with the name of its relation, merge the tuples of the two relations into a single list, and sort the list on the join attribute B. Tuples with a common B value then appear next to each other in the sorted order, and we can join them without examining distant pairs.

✦ Example 8.16. Suppose we want to join the relation CDH from Fig. 8.2(c) with
the relation CR from Fig. 8.2(d). Here, Course plays the role of attribute B, Day
and Hour together play the role of A, and Room is C. The six tuples from CDH and
the three from CR are first padded with the name of the relation. No reordering
of components is necessary, because Course is first in both relations. When we
compare tuples, we first compare the Course components, using lexicographic order
to determine which course name comes first in the order. If there is a tie, that is,
if the course names are the same, we compare the last components, where we take
CDH to precede CR. If there is still a tie, we can allow either tuple to precede the
other.
Then one sorted order of the tuples will be as shown in Fig. 8.17. Note that
this list is not a relation, because it has tuples of varying lengths. However, it does
group the tuples for CS101 and the tuples for EE200, so that we can easily take the join of tuples that agree on their Course components. ✦
5 We could arrange while sorting that the last component — that is, the relation name —
be taken into account, so that a tuple with a given B value from relation R is deemed to
precede a tuple with the same B value from S. Then, the tuples with a common B value
would appear with the tuples from R first, and then the tuples from S.
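The following C sketch carries out the idea of sort-join. It differs from Example 8.16 in one inessential way: instead of sorting one merged list, it sorts the two relations separately and then marches through both in tandem; tuples with equal B values are still brought together so that each matching pair is joined. The tuple types are illustrative.

#include <stdio.h>
#include <stdlib.h>

struct RTUPLE { int A, B; };
struct STUPLE { int B, C; };

int cmpR(const void *x, const void *y)
{
    int bx = ((const struct RTUPLE *) x)->B;
    int by = ((const struct RTUPLE *) y)->B;
    return (bx > by) - (bx < by);
}

int cmpS(const void *x, const void *y)
{
    int bx = ((const struct STUPLE *) x)->B;
    int by = ((const struct STUPLE *) y)->B;
    return (bx > by) - (bx < by);
}

void sortJoin(struct RTUPLE R[], int nR, struct STUPLE S[], int nS)
{
    int i = 0, j = 0, k;

    qsort(R, nR, sizeof R[0], cmpR);     /* O(n log n) sorts */
    qsort(S, nS, sizeof S[0], cmpS);
    while (i < nR && j < nS) {
        if (R[i].B < S[j].B)
            i++;                         /* no S tuple matches R[i] */
        else if (R[i].B > S[j].B)
            j++;                         /* no R tuple matches S[j] */
        else {
            /* pair R[i] with every S tuple sharing its B value */
            for (k = j; k < nS && S[k].B == R[i].B; k++)
                printf("(%d, %d, %d)\n", R[i].A, R[i].B, S[k].C);
            i++;                         /* next R tuple may share B */
        }
    }
}

The sorts dominate, so, apart from the time to produce the output itself, sort-join takes O(n log n) time on relations of n tuples.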
EXERCISES
8.8.1: Suppose that the StudentId-Name-Address-Phone relation (SNAP) of Fig.
8.2(a) is stored with a primary index on StudentId (the key) and a secondary
index on Phone. How would you compute most efficiently the answer to the query
σC (SNAP) if C were
a) StudentId = 12345 AND Address ≠ “45 Kumquat Blvd”?
b) Name = “C. Brown” AND Phone = 555-1357?
c) Name = “C. Brown” OR Phone = 555-1357?
8.8.2: Show how to sort-join the relations CSG from Fig. 8.1 and SNAP from Fig.
8.2(a) by sorting the merged list of tuples as in Example 8.16. Assume the natural
join, or equality on the StudentId components, is wanted. Show the result of the
sort, analogous to Fig. 8.17, and give the tuples in the result of the join.
8.8.3*: Suppose that we join relations R and S, each with n tuples, and the result
has O(n^{3/2}) tuples. Write formulas for the big-oh running time, as a function of n,
for the following techniques for taking the join:
a) Nested-loop join
b) Sort-join
c) Index-join, using an index on the join attribute of R
d) Index-join, using an index on the join attribute of S
8.8.4*: We proposed taking the union of two relations by using an index on an
attribute A that was a key for one of the relations. Is the method a reasonable way
to take a union if the attribute A that has an index is not a key?
8.8.5*: Suppose we want to compute (a) R ∩ S (b) R − S using an index on
attribute A for one of R and S. Can we obtain running time close to the sum of
the sizes of the two relations?
8.8.6: If we project a relation R onto a set of attributes that contains a key for R,
do we need to eliminate duplicates? Why?
✦
✦ ✦
✦
8.9 Algebraic Laws for Relations
As with other algebras, by transforming expressions we often have the opportunity
to “optimize” expressions. That is, we can take an expression that is expensive to
evaluate and turn it into an equivalent expression whose evaluation has a lower cost.
While transformations to arithmetic or logical expressions sometimes save a few
operations, the right transformations applied to expressions of relational algebra can
save orders of magnitude in the time it takes to evaluate the expression. Because of
the tremendous difference between the running times of optimized and unoptimized
relational algebra expressions, our ability to optimize such expressions is essential
if programmers are going to program in very high-level languages, like the language
SQL that we mentioned in Section 8.7.
✦ Example 8.17. Let us take up the complex query that we first considered
in Section 8.6: “Where is C. Brown at 9AM on Mondays?” This query involves
navigating over the four relations
1. CSG (Course-StudentId-Grade),
2. SNAP (StudentId-Name-Address-Phone),
3. CDH (Course-Day-Hour), and
4. CR (Course-Room).
To get an algebraic expression for the query, we can start by taking the natural
join of all four relations. That is, we connect CSG and SNAP by equating the
StudentId components. Think of this operation as extending each
Course-StudentId-Grade
tuple by adding components for the name, address, and phone of the student men-
tioned in the tuple. Of course, we wouldn’t want to store data this way, because it
forces us to repeat the information about each student once for each course the stu-
dent takes. However, we are not storing this data, but just designing an expression
to compute it.
To the result of CSG ⊲⊳ SNAP we join CDH, by equating on the Course
components. That join has the effect of taking each CSG tuple (already extended
by the student information), making one copy for each meeting of the course, and
extending each tuple by one of the possible Day and Hour values. Finally, we join
the result of (CSG ⊲⊳ SNAP) ⊲⊳ CDH with the CR relation, equating Course
components, which has the effect of extending each tuple by adding a component
with the room in which the course meets. The resulting relation has scheme
{Course, StudentId, Grade, Name, Address, Phone, Day, Hour, Room}
and the meaning of a tuple (c, s, g, n, a, p, d, h, r) is that the student with ID s, whose name is n, whose address is a, and whose phone is p, is taking course c, in which he or she is receiving grade g, and that course meets on day d, at hour h, in room r.
[Figure: the expression tree with πRoom at the root, above the cascade of joins ((CSG ⊲⊳ SNAP) ⊲⊳ CDH) ⊲⊳ CR.]
The selection in this query is the AND of three conditions. We could split it into three selections, but in this example it suffices to
split the condition Name = “C. Brown” off from the other two. The result of the
split is shown in Fig. 8.19(b).
Now, the selection involving Day and Hour can be pushed down to the right
operand of the middle join, since the right operand, the relation CDH, has both
attributes Day and Hour. Then the other selection, involving Name, can be pushed
to the left operand of the middle join, since that operand, CSG ⊲⊳ SNAP, has Name
as an attribute. These two changes yield the expression tree shown in Fig. 8.19(c).
Finally, the selection on Name involves an attribute of SNAP, and so we can
push this selection to the right operand of the bottom join. This change is shown
in Fig. 8.19(d).
Now we have an expression that gives us almost the same plan as we developed
in Section 8.6 for this query. We begin at the bottom of the expression in Fig.
8.19(d) by finding the student ID(s) for the student(s) named “C. Brown.” By
joining the tuples of SNAP that have Name = “C. Brown” with the CSG relation,
we get the courses taken by C. Brown. When we apply the second selection to
relation CDH, we get the courses that meet at 9AM on Mondays. The middle join
in Fig. 8.19(d) thus gives us tuples with a course that both is taken by C. Brown
and meets at 9AM Mondays. The top join gets the rooms in which those courses
meet, and the projection gives us these rooms as answer.
The major difference between this plan and the plan of Section 8.6 is that the
latter projects away useless components of tuples, while the plan here carries them
along until the end. Thus to complete our optimization of expressions of relational
algebra, we need laws that push projections down the tree. These laws are not all
the same as the laws for selection, as we shall see in the next subsection. ✦
[Fig. 8.19: four expression trees for the query. (a) The original expression, with πRoom above the selection σName=“C.Brown” AND Day=“M” AND Hour=“9AM”, which sits above the joins. (b) The selection split into two, σName=“C.Brown” and σDay=“M” AND Hour=“9AM”. (c) The two selections pushed in different directions. (d) The selection on Name pushed below the bottom join.]
πL (R ⊲⊳A=B S) ≡ πL (πM (R) ⊲⊳A=B πN (S))
where
1. List M consists of those attributes of L that are in the scheme for R, followed
by attribute A if it is not on L, and
2. List N is the attributes of L that are in the scheme of S, followed by B if that
attribute is not on list L.
Note that the useful way in which to apply this projection pushing law is from
left to right, even though we thereby introduce two additional projections and do
not get rid of any. The reason is that it is usually beneficial to project out what
attributes we can as early as possible, that is, as far down the tree as we can. We
still may have to do the projection onto the list L after the join, in the situation
where the join attribute A is not on the list L (recall that the other join attribute,
B from S, will not appear in the join anyway).
Sometimes, the lists M and/or N consist of all attributes of R or S, respectively.
If so, there is no point in performing the projection, since it has no effect, except
perhaps a pointless permutation of the columns of the relation. Thus we shall use
the following law.
πL (R) ≡ R
provided that list L consists of all the attributes in the scheme for R. Note that
this law takes the point of view that relations are not changed by permutations of
their columns.
There is also a situation in which we do not want to bother projecting. Suppose
we have a subexpression πL (R) that is part of a larger expression, and let R be a
single relation (rather than an expression involving one or more occurrences of
operators). Suppose also that above this subexpression in the expression tree is
another projection. To perform the projection on R now requires us to examine the
entire relation, regardless of the existence of indexes. If we instead carry along the
attributes of R not on the list L, until the next opportunity to project out those
attributes, we are frequently able to save a significant amount of time.
For instance, we shall, in the next example, discuss a subexpression
πCourse,StudentId(CSG)
which has the effect of getting rid of grades. Since our entire expression, which
is for the query of Example 8.17, eventually focuses on a few tuples of the CSG
relation, we are much better off projecting out grades later; by so doing, we avoid
ever examining the entire CSG relation.
✦ Example 8.18. Let us proceed from Fig. 8.19(d) to push projections down.
The projection at the root is first pushed below the top join. The projection list
consists of only Room, and the join attribute on both sides of the join is Course.
Thus on the left we project onto Course alone, since Room is not an attribute of the
expression on the left. The right operand of the join is projected onto both Course
and Room. Since these are all the attributes of the operand CR, we can omit the
projection. The resulting expression is shown in Fig. 8.20(a).
[Fig. 8.20: four expression trees. (a) The projection pushed below the top join, leaving πCourse on the left operand of that join. (b) The projection pushed below the middle join, introducing πCourse on both of its operands. (c) The projection pushed below the bottom join. (d) The step that projects out the grade from CSG removed.]

Now, we can push the projection onto Course below the middle join. Since
Course is also the join attribute on both sides, we introduce two operators πCourse
below the middle join. Since the result of the middle join then has only attribute
Course, we no longer need the projection above that join; the new expression is
shown in Fig. 8.20(b). Note that this join, involving two relations whose tuples
have only the one component Course, is effectively an intersection of sets. That
makes sense — it intersects the set of courses C. Brown is taking with the set of
courses that meet at 9AM Mondays.
At this point, we need to push πCourse below the bottom join. The join attribute
is StudentId on both sides, and so the projection list on the left is (Course, Studen-
tId) and the list on the right is just StudentId (because Course is not an attribute
of the expression on the right). The expression that results is shown in Fig. 8.20(c).
Finally, as we mentioned just before the example, it is advantageous here not
to project Grade out of the CSG relation immediately. Above that projection we
meet the operator πCourse , which will get rid of the grades anyway. If we instead
use the expression of Fig. 8.20(d), we have essentially the plan of Section 8.6 for
this query. That is, the expression πStudentId σName=“C.Brown” (SNAP) gives us
the student ID(s) for students named “C. Brown,” and the first join followed by
projection πCourse gives us the courses taken by those students. If there is an index
on Name for relation SNAP and there is an index on StudentId for relation CSG,
then these operations are performed quickly.
The subexpression πCourse σDay=“M” AND Hour=“9AM” (CDH) has as its value
the courses that meet at 9AM Mondays, and the middle join intersects these sets
to give us the courses taken by a student named “C. Brown” that meet at 9AM
Mondays. Finally, the top join followed by projection looks up these courses in the
CR relation (a fast operation if there is an index on Course), and produces the
associated rooms as answer. ✦
EXERCISES
8.9.3: Take each of your relational algebra queries from Exercise 8.7.3 and push
selections and projections down as far as you can.
8.9.4: Let us make the following gross simplifications regarding the number of
tuples that appear in relations that are the result of the operations of relational
algebra.
✦
✦ ✦
✦
8.10 Summary of Chapter 8
✦ A “key” for a relation is a set of attributes that uniquely determine values for
the other attributes of the relation. Often, a primary index uses a key for its
domain.
✦ “Secondary indexes” are data structures that facilitate operations that specify
a particular attribute, usually one not part of the domain for the primary index.
✦ Relational algebra is a high-level notation for specifying queries about one
or more relations. Its principal operations are union, intersection, difference,
selection, projection, and join.
✦ There are a number of ways to implement joins more efficiently than the obvious
“nested-loop join,” which pairs each tuple of one relation with each tuple of
the other. Index-join and sort-join run in time that is close to what it takes to
look at the two relations involved and produce the result of the join.
✦ Optimization of expressions in relational algebra can make significant improve-
ments in the running time for evaluation of expressions and is therefore essential
if languages based on relational algebra are to be used in practice to express
queries.
✦ A number of ways to improve the running time of a given expression are known.
Pushing down selections is often the most profitable.
✦
✦ ✦
✦
8.11 Bibliographic Notes for Chapter 8
Further study of database systems, especially those based on the relational model,
can be found in Ullman [1988].
The paper by Codd [1970] is generally regarded as the origin of the relational
data model, although there were a number of earlier works that contained some
of the ideas. The first implementations of systems using this model were INGRES
(Stonebraker et al. [1976]) at Berkeley and System R (Astrahan et al. [1976]) at
IBM. The latter is the origin of the language SQL sampled in Section 8.7 and found
in many database management systems today; see Chamberlin et al. [1976]. The
relational model is also found in the UNIX command awk (Aho, Kernighan, and
Weinberger [1988]).
Aho, A. V., B. W. Kernighan, and P. J. Weinberger [1988]. The AWK Programming Language, Addison-Wesley, Reading, Mass.
Astrahan, M. M., et al. [1976]. “System R: a relational approach to data manage-
ment,” ACM Trans. on Database Systems 1:2, pp. 97–137.
Chamberlin, D. D., et al. [1976]. “SEQUEL 2: a unified approach to data definition,
manipulation, and control,” IBM J. Research and Development 20:6, pp. 560–575.
Codd, E. F. [1970]. “A relational model for large shared data banks,” Comm. ACM
13:6, pp. 377–387.
Stonebraker, M., E. Wong, P. Kreps, and G. Held [1976]. “The design and imple-
mentation of INGRES,” ACM Trans. on Database Systems 1:3, pp. 189–222.
Ullman, J. D. [1988]. Principles of Database and Knowledge-Base Systems (two volumes), Computer Science Press, New York.
CHAPTER 9
✦
✦ ✦
✦
The Graph Data Model
A graph is, in a sense, nothing more than a binary relation. However, it has a
powerful visualization as a set of points (called nodes) connected by lines (called
edges) or by arrows (called arcs). In this regard, the graph is a generalization of the
tree data model that we studied in Chapter 5. Like trees, graphs come in several
forms: directed/undirected, and labeled/unlabeled.
Also like trees, graphs are useful in a wide spectrum of problems such as com-
puting distances, finding circularities in relationships, and determining connectiv-
ities. We have already seen graphs used to represent the structure of programs in
Chapter 2. Graphs were used in Chapter 7 to represent binary relations and to
illustrate certain properties of relations, like commutativity. We shall see graphs
used to represent automata in Chapter 10 and to represent electronic circuits in
Chapter 13. Several other important applications of graphs are discussed in this
chapter.
✦
✦ ✦
✦
9.1 What This Chapter Is About
The main topics of this chapter are
✦ The definitions concerning directed and undirected graphs (Sections 9.2 and
9.10).
✦ The two principal data structures for representing graphs: adjacency lists and
adjacency matrices (Section 9.3).
✦ An algorithm and data structure for finding the connected components of an
undirected graph (Section 9.4).
✦ A technique for finding minimal spanning trees (Section 9.5).
✦ A useful technique for exploring graphs, called “depth-first search” (Section
9.6).
✦
✦ ✦
✦
9.2 Basic Concepts
A directed graph consists of
1. A set N of nodes, and
2. A binary relation A on N . We call A the set of arcs of the directed graph. Arcs are thus pairs of nodes.
Graphs are drawn as suggested in Fig. 9.1. Each node is represented by a
circle, with the name of the node inside. We shall usually name the nodes by
integers starting at 0, or we shall use an equivalent enumeration. In Fig. 9.1, the
set of nodes is N = {0, 1, 2, 3, 4}.
Each arc (u, v) in A is represented by an arrow from u to v. In Fig. 9.1, the
set of arcs is
A = {(0, 0), (0, 1), (0, 2), (1, 3), (2, 0), (2, 1), (2, 4), (3, 2), (3, 4), (4, 1)}
[Fig. 9.1: a directed graph with nodes 0, 1, 2, 3, and 4, and the arcs listed above.]
In text, it is customary to represent an arc (u, v) as u → v. We call v the head
of the arc and u the tail to conform with the notion that v is at the head of the
arrow and u is at its tail. For example, 0 → 1 is an arc of Fig. 9.1; its head is node
1 and its tail is node 0. Another arc is 0 → 0; such an arc from a node to itself is
called a loop. For this arc, both the head and the tail are node 0.
Labels
As for trees, it is permissible to attach a label to each node. Labels will be drawn
near their node. Similarly, we can label arcs by placing the label near the middle
of the arc. Any type can be used as a node label or an arc label. For instance, Fig.
9.2 shows a node named 1, with a label “dog,” a node named 2, labeled “cat,” and
an arc 1 → 2 labeled “bites.”
[Fig. 9.2: node 1, labeled “dog,” node 2, labeled “cat,” and an arc 1 → 2 labeled “bites.”]
Again as with trees, we should not confuse the name of a node with its label.
Node names must be unique in a graph, but two or more nodes can have the same
label.
Paths
A path in a directed graph is a list of nodes (v1 , v2 , . . . , vk ) such that there is an arc from each node to the next, that is, vi → vi+1 for i = 1, 2, . . . , k − 1. The length of the path is k − 1, the number of arcs along the path. For example, (0, 1, 3) is a path of length two in Fig. 9.1.
The trivial case k = 1 is permitted. That is, any node v by itself is a path of
length zero from v to v. This path has no arcs.
Cycles

A cycle in a directed graph is a path of length 1 or more that begins and ends at the same node.

✦ Example 9.1. Consider the graph of Fig. 9.1. There is a cycle (0, 0) of length 1
because of the loop 0 → 0. There is a cycle (0, 2, 0) of length 2 because of the arcs
0 → 2 and 2 → 0. Similarly, (1, 3, 2, 1) is a cycle of length 3, and (1, 3, 2, 4, 1) is a
cycle of length 4. ✦
Note that a cycle can be written to start and end at any of its nodes. That is, the cycle (v1 , v2 , . . . , vk , v1 ) could also be written as (v2 , . . . , vk , v1 , v2 ) or as (v3 , . . . , vk , v1 , v2 , v3 ), and so on. For example, the cycle (1, 3, 2, 4, 1) could also have been written as (2, 4, 1, 3, 2).
On every cycle, the first and last nodes are the same. We say that a cycle (v1 , v2 , . . . , vk , v1 ) is simple if no node appears more than once among v1 , . . . , vk ; that is, the only repetition in a simple cycle occurs at the final node.
✦ Example 9.2. All the cycles in Example 9.1 are simple. In Fig. 9.1 the
cycle (0, 2, 0) is simple. However, there are cycles that are not simple, such as
(0, 2, 1, 3, 2, 0) in which node 2 appears twice. ✦
Given a nonsimple cycle containing node v, we can find a simple cycle contain-
ing v. To see why, write the cycle to begin and end at v, as in (v, v1 , v2 , . . . , vk , v).
If the cycle is not simple, then either
1. v appears three or more times, or
2. There is some node u other than v that appears twice; that is, the cycle must
look like (v, . . . , u, . . . , u, . . . , v).
In case (1), we can remove everything up to, but not including, the next-to-last
occurrence of v. The result is a shorter cycle from v to v. In case (2), we can remove
the section from u to u, replacing it by a single occurrence of u, to get the cycle
(v, . . . , u, . . . , v). The result must still be a cycle in either case, because each arc of
the result is present in the original cycle, and therefore is present in the graph.
It may be necessary to repeat this transformation several times before the cycle
becomes simple. Since the cycle always gets shorter with each iteration, eventually
we must arrive at a simple cycle. What we have just shown is that if there is a cycle
in a graph, then there must be at least one simple cycle.
✦ Example 9.3. Given the cycle (0, 2, 1, 3, 2, 0), we can remove the first 2 and the
following 1, 3 to get the simple cycle (0, 2, 0). In physical terms, we started with
the cycle that begins at 0, goes to 2, then 1, then 3, back to 2, and finally back to
0. The first time we are at 2, we can pretend it is the second time, skip going to 1
and 3, and proceed right back to 0.
For another example, consider the nonsimple cycle (0, 0, 0). As 0 appears three
times, we remove the first 0, that is, everything up to but not including the next-
to-last 0. Physically, we have replaced the path in which we went around the loop
0 → 0 twice by the path in which we go around once. ✦
If a graph has one or more cycles, we say the graph is cyclic. If there are no
cycles, the graph is said to be acyclic. By what we just argued about simple cycles,
a graph is cyclic if and only if it has a simple cycle, because if it has any cycles at
all, it will have a simple cycle.
✦ Example 9.4. We mentioned in Section 3.8 that we could represent the calls
made by a collection of functions with a directed graph called the “calling graph.”
The nodes are the functions, and there is an arc P → Q if function P calls function
Q. For instance, Fig. 9.3 shows the calling graph associated with the merge sort
algorithm of Section 2.9.
[Fig. 9.3: the calling graph for the merge sort algorithm, with nodes main, MakeList, PrintList, MergeSort, split, and merge; each of MakeList, MergeSort, split, and merge has a loop.]
The existence of a cycle in the calling graph implies a recursion in the algorithm.
In Fig. 9.3 there are four simple cycles, one around each of the nodes MakeList,
MergeSort, split, and merge. Each cycle is a trivial loop. Recall that all these
functions call themselves, and thus are recursive. Recursions in which a function
calls itself are by far the most common kind, and each of these appears as a loop in
the calling graph. We call these recursions direct. However, one occasionally sees an indirect recursion, in which there is a cycle of length greater than 1. For instance, the graph

[P → Q → R, with an arc from R back to P]
represents a function P that calls function Q, which calls function R, which calls
function P . ✦
Acyclic Paths
A path is said to be acyclic if no node appears more than once on the path. Clearly,
no cycle is acyclic. The argument that we just gave to show that for every cycle
there is a simple cycle also demonstrates the following principle. If there is any path
at all from u to v, then there is an acyclic path from u to v. To see why, start with
any path from u to v. If there is a repetition of some node w, which could be u or
v, replace the two occurrences of w and everything in between by one occurrence of
w. As for the case of cycles, we may have to repeat this process several times, but
eventually we reduce the path to an acyclic path.
✦ Example 9.5. Consider the graph of Fig. 9.1 again. The path (0, 1, 3, 2, 1, 3, 4)
is a path from 0 to 4 that contains a cycle. We can focus on the two occurrences of
node 1, and replace them, and the 3, 2 between them, by 1, leaving (0, 1, 3, 4), which
is an acyclic path because no node appears twice. We could also have obtained the
same result by focusing on the two occurrences of node 3. ✦
Undirected Graphs
Sometimes it makes sense to connect nodes by lines that have no direction, called
edges. Formally, an edge is a set of two nodes. The edge {u, v} says that nodes u and v are connected in both directions.1 If {u, v} is an edge, then nodes u and v are said to be adjacent or to be neighbors. A graph with edges, that is, a graph with a symmetric arc relation, is called an undirected graph.
✦ Example 9.6. Figure 9.4 represents a partial road map of the Hawaiian Islands,
indicating some of the principal cities. Cities with a road between them are indicated
by an edge, and the edge is labeled by the driving distance. It is natural to represent
roads by edges, rather than arcs, because roads are normally two-way. ✦
1 Note that the edge is required to have exactly two nodes. A singleton set consisting of one
node is not an edge. Thus, although an arc from a node to itself is permitted, we do not
permit a looping edge from a node to itself. Some definitions of “undirected graph” do permit
such loops.
[Fig. 9.4: a partial road map of the Hawaiian Islands, with edges labeled by driving distances. The cities are Laie, Kaneohe, Honolulu, Pearl City, Maili, and Wahiawa on Oahu; Lahaina, Kahului, Hana, and Keokea on Maui; and Kamuela, Hilo, and Kona on Hawaii.]
In an undirected graph, we must take care in saying what we mean by a cycle, since traversing a single edge from u to v and back to u again should not count as a cycle. Perhaps the easiest approach is to define a simple cycle in an undirected graph to be a path of length three or more that begins and ends at the same node, and
with the exception of the last node does not repeat any node. The notion of a
nonsimple cycle in an undirected graph is not generally useful, and we shall not
pursue this concept.
As with directed cycles, we regard two undirected cycles as the same if they consist of the same nodes in the same order, with a different starting point. Undirected cycles are also the same if they consist of the same nodes in reverse order.
Formally, the simple cycle (v1 , v2 , . . . , vk ) is equivalent, for each i between 1 and k,
to the cycle (vi , vi+1 , . . . , vk , v1 , v2 , . . . , vi−1 ) and to the cycle
(vi , vi−1 , . . . , v1 , vk , vk−1 , . . . , vi+1 )
✦ Example 9.7. In Fig. 9.4, the cycle (PearlCity, Wahiawa, Maili, PearlCity) is a simple cycle of length three. It could also have been written as (Maili, PearlCity, Wahiawa, Maili), by starting at Maili and proceeding in the same order around the circle. Likewise,
it could have been written to start at Pearl City and proceed around the circle in
reverse order:
(Pearl City, Maili, Wahiawa, Pearl City)
For another example,
(Laie, Wahiawa, Pearl City, Honolulu, Kaneohe, Laie)
is a simple cycle of length five. ✦
[Fig. 9.5: a directed graph on nodes a through f, used in the exercises below and in Section 9.3.]
EXERCISES
9.2.6: Find an example of indirect recursion among the functions so far in this
book.
9.2.7: Write the cycle (0, 1, 2, 0) in all possible ways.
9.2.8*: Let G be a directed graph and let R be the relation on the cycles of G defined
by (u1 , . . . , uk , u1 )R(v1 , . . . , vk , v1 ) if and only if (u1 , . . . , uk , u1 ) and (v1 , . . . , vk , v1 )
represent the same cycle. Show that R is an equivalence relation on the cycles of
G.
9.2.9*: Show that the relation S defined on the nodes of a graph by uSv if and only
if u = v or there is some cycle that includes both nodes u and v, is an equivalence
relation.
✦
✦ ✦
✦
9.3 Implementation of Graphs
There are two standard ways to represent a graph. One, called adjacency lists, is
familiar from the implementation of binary relations in general. The second, called
adjacency matrices, is a new way to represent binary relations, and is more suitable
for relations where the number of pairs is a sizable fraction of the total number
of pairs that could possibly exist over a given domain. We shall consider these
representations, first for directed graphs, then for undirected graphs.
Adjacency Lists
If the nodes are named by integers 0, 1, . . . , MAX − 1, we can represent a graph by an array successors of linked lists, one list for each node. That is, the entry successors[u] contains a pointer to a linked list of all the successors of node u.
successors

0: 0 → 1 → 2 •
1: 3 •
2: 0 → 1 → 4 •
3: 2 → 4 •
4: 1 •

Fig. 9.6. Adjacency lists for the graph of Fig. 9.1.
✦ Example 9.8. The graph of Fig. 9.1 can be represented by the adjacency lists
shown in Fig. 9.6. We have sorted the adjacency lists by node number, but the
successors of a node can appear in any order on its adjacency list. ✦
Adjacency Matrices
Another common way to represent directed graphs is as adjacency matrices. We
can create a two-dimensional array
BOOLEAN arcs[MAX][MAX];
in which the value of arcs[u][v] is TRUE if there is an arc u → v, and FALSE if not.
✦ Example 9.9. The adjacency matrix for the graph of Fig. 9.1 is shown in Fig.
9.7. We use 1 for TRUE and 0 for FALSE. ✦
     0  1  2  3  4

0    1  1  1  0  0
1    0  0  0  1  0
2    1  1  0  0  1
3    0  0  1  0  1
4    0  1  0  0  0

Fig. 9.7. Adjacency matrix for the graph of Fig. 9.1.
Operations on Graphs
We can see some of the distinctions between the two graph representations if we
consider some simple operations on graphs. Perhaps the most basic operation is
to determine whether there is an arc u → v from a node u to a node v. In the
adjacency matrix, it takes O(1) time to look up arcs[u][v] to see whether the
entry there is TRUE or not.
With adjacency lists, it takes O(1) time to find the header of the adjacency
list for u. We must then traverse this list to the end if v is not there, or half the
way down the list on the average if v is present. If there are a arcs and n nodes in
the graph, then we take time O(1 + a/n) on the average to do the lookup. If a is
no more than a constant factor times n, this quantity is O(1). However, the larger
a is when compared with n, the longer it takes to tell whether an arc is present
using the adjacency list representation. In the extreme case where a is around n2 ,
its maximum possible value, there are around n nodes on each adjacency list. In
this case, it takes O(n) time on the average to find a given arc. Put another way,
the denser a graph is, the more we prefer the adjacency matrix to adjacency lists,
when we need to look up a given arc.
On the other hand, we often need to find all the successors of a given node
u. Using adjacency lists, we go to successors[u] and traverse the list, in average
time O(a/n), to find all the successors. If a is comparable to n, then we find all the
successors of u in O(1) time. But with adjacency matrices, we must examine the
entire row for node u, taking O(n) time no matter what a is. Thus, for graphs with
a small number of edges per node, adjacency lists are much faster than adjacency
matrices when we need to examine all the successors of a given node.
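For instance, using the declarations sketched after Example 9.8, we can visit every successor of a node u by one traversal of its list; here visit is a hypothetical placeholder for whatever processing is desired.

void visitSuccessors(NODE u)
{
    LIST p;

    for (p = successors[u]; p != NULL; p = p->next)
        visit(p->nodeName);    /* process one successor of u */
}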
However, suppose we want to find all the predecessors of a given node v. With
an adjacency matrix, we can examine the column for v; a 1 in the row for u means
that u is a predecessor of v. This examination takes O(n) time. The adjacency-
list representation gives us no help finding predecessors. We must examine the
adjacency list for every node u, to see if that list includes v. Thus, we may examine
A Matter of Degree
The number of arcs out of a node v is called the out-degree of v. Thus, the out-degree of a node equals the length of its adjacency list; it also equals the number of 1’s in the row for v in the adjacency matrix. The number of arcs into node v is
the in-degree of v. The in-degree measures the number of times v appears on the
adjacency list of some node, and it is the number of 1’s found in the column for v
in the adjacency matrix.
In an undirected graph, we do not distinguish between edges coming in or
going out of a node. For an undirected graph, the degree of node v is the number
of neighbors of v, that is, the number of edges {u, v} containing v for some node
u. Remember that in a set, order of members is unimportant, so {u, v} and {v, u}
are the same edge, and are counted only once. The degree of an undirected graph is the maximum degree of any node in the graph. For example, if we regard a binary
tree as an undirected graph, its degree is 3, since a node can only have edges to its
parent, its left child, and its right child. For a directed graph, we can say that the
in-degree of a graph is the maximum of the in-degrees of its nodes, and likewise, the
out-degree of a graph is the maximum of the out-degrees of its nodes.
all the cells of all the adjacency lists, and we shall probably examine most of them.
Since the number of cells in the entire adjacency list structure is equal to a, the
number of arcs of the graph, the time to find predecessors using adjacency lists is
thus O(a) on a graph of a arcs. Here, the advantage goes to the adjacency matrix;
and the denser the graph, the greater the advantage.
✦ Example 9.10. Consider how to represent the largest component of the undi-
rected graph of Fig. 9.4 (which represents six cities on the island of Oahu). For the
moment, we shall ignore the labels on the edges. The adjacency matrix representa-
tion is shown in Fig. 9.8. Notice that the matrix is symmetric.
Figure 9.9 shows the representation by adjacency lists. In both cases, we are
using an enumeration type
enum CITYTYPE {Laie, Kaneohe, Honolulu,
PearlCity, Maili, Wahiawa};
to index arrays. That arrangement is somewhat rigid, since it does not allow any
changes in the set of nodes of the graph. We shall give a similar example shortly
where we name nodes explicitly by integers, and use city names as node labels, for
more flexibility in changing the set of nodes. ✦
Suppose a graph has labels on its arcs (or edges if it is undirected). Using an
adjacency matrix, we can replace the 1 that represents the presence of arc u → v in
the graph by the label of this arc. It is necessary that we have some value that is
permissible as a matrix entry but cannot be mistaken for a label; we use this value
to represent the absence of an arc.
If we represent the graph by adjacency lists, we add to the cells forming the
lists an additional field nodeLabel. If there is an arc u → v with label L, then on
the adjacency list for node u we shall find a cell with v in its nodeName field and L
in its nodeLabel field. That value represents the label of the arc.
We represent labels on nodes in a different way. For an adjacency matrix, we
simply create another array, say NodeLabels, and let NodeLabels[u] be the label of
node u. When we use adjacency lists, we already have an array of headers indexed
by nodes. We change this array so that it has elements that are structures, one field
for the node label and one field pointing to the beginning of the adjacency list.
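Putting the two kinds of labels together, declarations along the lines of what Example 9.11 will use might be sketched as follows; the field and array names are illustrative.

#define MAX 6                  /* number of cities */
typedef int NODE;

typedef struct CELL *LIST;
struct CELL {                  /* one cell of an adjacency list */
    NODE nodeName;             /* the neighbor at the other end */
    int nodeLabel;             /* the label (distance) on the edge */
    LIST next;
};

struct GRAPHNODE {
    char cityName[32];         /* the node label */
    LIST adjList;              /* header of the adjacency list */
};
struct GRAPHNODE cities[MAX];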
Fig. 9.10. Map of Oahu with nodes named by integers and labeled by cities.
cities

0  Laie
1  Kaneohe
2  Honolulu
3  PearlCity
4  Maili
5  Wahiawa

distances

      0    1    2    3    4    5
0    −1   24   −1   −1   −1   28
1    24   −1   11   −1   −1   −1
2    −1   11   −1   13   −1   −1
3    −1   −1   13   −1   20   12
4    −1   −1   −1   20   −1   15
5    28   −1   −1   12   15   −1

Fig. 9.11. Adjacency-matrix representation with node labels (array cities) and edge labels (array distances); −1 indicates the absence of an edge.
✦ Example 9.11. Let us again represent the large component of the graph of
Fig. 9.4, but this time, we shall incorporate the edge labels, which are distances.
Furthermore, we shall give the nodes integer names, starting with 0 for Laie, and
proceeding clockwise. The city names themselves are indicated by node labels.
We shall take the type of node labels to be character arrays of length 32. This
representation is more flexible than that of Example 9.10, since if we allocate extra
places in the array, we can add cities should we wish. The resulting graph is redrawn in Fig. 9.10, the arrays of the adjacency-matrix representation are shown in Fig. 9.11, and the adjacency-list representation appears in Fig. 9.12. ✦
cities

0  Laie       → (1, 24) → (5, 28) •
1  Kaneohe    → (0, 24) → (2, 11) •
2  Honolulu   → (1, 11) → (3, 13) •
3  PearlCity  → (2, 13) → (4, 20) → (5, 12) •
4  Maili      → (3, 20) → (5, 15) •
5  Wahiawa    → (0, 28) → (3, 12) → (4, 15) •
Fig. 9.12. Adjacency-list representation of graph with node and edge labels.
EXERCISES
9.3.1: Represent the graph of Fig. 9.5 (see the exercises of Section 9.2) by
a) Adjacency lists
b) An adjacency matrix
9.3.2: Suppose the arcs of Fig. 9.5 were instead edges (i.e., the graph were undi-
rected). Repeat Exercise 9.3.1 for the undirected graph.
9.3.3: Let us label each of the arcs of the directed graph of Fig. 9.5 by the character
string of length 2 consisting of the tail followed by the head. For example, the arc
a → b is labeled by the character string ab. Also, suppose each node is labeled
by the capital letter corresponding to its name. For instance, the node named a is
labeled A. Repeat Exercise 9.3.1 for this labeled, directed graph.
9.3.6: Design algorithms to insert and delete arcs from an (a) adjacency-matrix (b)
adjacency-list representation of a directed graph.
9.3.7: For a directed graph with n nodes and a arcs, compare the average running time, under the adjacency-matrix and adjacency-list representations, of:
a) Looking up an arc?
b) Finding all successors?
c) Finding all predecessors?
✦
✦ ✦
✦
9.4 Connected Components of an Undirected Graph
We can divide any undirected graph into one or more connected components. Each
connected component is a set of nodes with paths from any member of the compo-
nent to any other. Moreover, the connected components are maximal; that is, for no node in the component is there a path to any node outside the component. If a graph consists of a single connected component, then we say the graph is connected.
✦ Example 9.12. Consider again the graph of the Hawaiian Islands in Fig. 9.4.
There are three connected components, corresponding to three islands. The largest
component consists of Laie, Kaneohe, Honolulu, Pearl City, Maili, and Wahiawa.
These are cities on the island of Oahu, and they are clearly mutually connected
by roads, that is, by paths of edges. Also, clearly, there are no roads leading from
Oahu to any other island. In graph-theoretic terms, there are no paths from any of
the six cities mentioned above to any of the other cities in Fig. 9.4.
A second component consists of the cities of Lahaina, Kahului, Hana, and
Keokea; these are cities on the island of Maui. The third component is the cities of
Hilo, Kona, and Kamuela, on the “big island” of Hawaii. ✦
✦ Example 9.13. In Fig. 9.4, there is a path (Honolulu, PearlCity, Wahiawa, Maili) from Honolulu to Maili, and a path (Maili, PearlCity, Wahiawa, Laie) from Maili to Laie in the same graph. If we put these paths together, we get a path
from Honolulu to Laie:
(Honolulu, PearlCity, Wahiawa, Maili, PearlCity, Wahiawa, Laie)
It happens that this path is cyclic. As mentioned in Section 9.2, we can always
remove cycles to get an acyclic path. In this case, one way to do so is to replace
the two occurrences of Wahiawa and the nodes in between by one occurrence of
Wahiawa to get
(Honolulu, PearlCity, Wahiawa, Laie) ✦

To compute the connected components of a graph G, we consider its edges one at a time. Let Gi be the graph consisting of all the nodes of G, but only the first i edges; we construct the components of the Gi ’s by induction on i.
BASIS. G0 consists of only the nodes of G with none of the edges. Every node is
in a component by itself.
INDUCTION. Suppose we have the connected components for the graph Gi after
considering the first i edges, and we now consider the (i + 1)st edge, {u, v}.
1. If u and v are in the same component of Gi , then Gi+1 has the same set of
connected components as Gi , because the new edge does not connect any nodes
that were not already connected.
2. If u and v are in different components of Gi , then we merge the components containing u and v into a single component, as suggested by Fig. 9.13.
When we have considered all edges in this manner, we have the connected compo-
nents of the full graph.
Fig. 9.13. Adding edge {u, v} connects the components containing u and v.
✦ Example 9.14. Let us consider the graph of Fig. 9.4. We can consider edges
in any order, but for reasons having to do with an algorithm in the next section,
let us list the edges in order of the edge labels, smallest first. This list of edges is
shown in Fig. 9.14.
Initially, all thirteen nodes are in components of their own. When we consider
edge 1, {Kaneohe, Honolulu}, we merge these two nodes into a single component.
The second edge, {Wahiawa, PearlCity}, merges those two cities. The third edge
is {PearlCity, Honolulu}. That edge merges the components containing these two
cities. Presently, each of these components contains two cities, so we now have one
component with four cities, namely
{Wahiawa, PearlCity, Honolulu, Kaneohe}
All other cities are still in components by themselves.
Edge 4 is {Maili, Wahiawa} and adds Maili to the large component. The fifth
edge is {Kahului, Keokea}, which merges these two cities into a component. When
we consider edge 6, {Maili, PearlCity}, we see a new phenomenon: both ends of the
edge are already in the same component. We therefore do no merging with edge 6.
Edge 7 is {Lahaina, Kahului}, and it adds the node Lahaina to the component
{Kahului, Keokea}, forming the component {Lahaina, Kahului, Keokea}. Edge 8
adds Laie to the largest component, which is now
{Laie, Kaneohe, Honolulu, PearlCity, Wahiawa, Maili}
The ninth edge, {Laie, Wahiawa}, connects two cities in this component and is thus
ignored.
Edge 10 groups Kamuela and Kona into a component, and edge 11 adds Hilo
to this component. Edge 12 adds Hana to the component of
{Lahaina, Kahului, Keokea}
Finally, edge 13, {Hilo, Kona}, connects two cities already in the same component.
Thus, the final connected components are:
{Laie, Kaneohe, Honolulu, PearlCity, Wahiawa, Maili}
{Lahaina, Kahului, Keokea, Hana}
{Kamuela, Hilo, Kona}
To carry out this algorithm, we need two operations: find the component containing a given node, and merge two components into one. There are a number of data structures that can support these operations. We shall
study one simple idea that gives surprisingly good performance. The key is to put
the nodes of each component into a tree.2 The component is represented by the
root of the tree. The two operations above can now be implemented as follows:
1. To find the component containing a graph node, we go to the corresponding tree node and follow parent pointers to the root; the root represents the component.
2. To merge two different components, we make the root of one component a child of the root of the other.
2 It is important to understand that, in what follows, the “tree” and the “graph” are distinct
structures. There is a one-to-one correspondence between the nodes of the graph and the
nodes of the tree; that is, each tree node represents a graph node. However, the parent-child
edges of the tree are not necessarily edges in the graph.
✦ Example 9.15. Let us follow the steps of Example 9.14, showing the trees
created at certain steps. Initially, every node is in a one-node tree by itself. The
first edge, {Kaneohe, Honolulu}, causes us to merge two one-node trees, {Kaneohe}
and {Honolulu}, into one two-node tree, {Kaneohe, Honolulu}. Either node could
be made a child of the other. Let us suppose that Honolulu is made the child of
the root Kaneohe.
Similarly, the second edge, {Wahiawa, PearlCity}, merges two trees, and we
may suppose that PearlCity is made the child of the root Wahiawa. At this point,
the current collection of components is represented by the two trees in Fig. 9.15 and
nine one-node trees.
[Fig. 9.15: two trees, one with root Wahiawa and child PearlCity, the other with root Kaneohe and child Honolulu.]
The third edge, {PearlCity, Honolulu}, merges these two components. Let
us suppose that Wahiawa is made a child of the other root, Kaneohe. Then the
resulting component is represented by the tree of Fig. 9.16.
[Fig. 9.16: a tree with root Kaneohe, whose children are Wahiawa and Honolulu; PearlCity is a child of Wahiawa.]
When we consider the fourth edge, {Wahiawa, Maili}, we merge Maili into the
component represented by the tree of Fig. 9.16. We could either make Maili a child
of Kaneohe, or make Kaneohe a child of Maili. We prefer the former, since that
keeps the height of the tree small, while making the root of the large component
a child of the root of the small component tends to make paths in the tree larger.
Large paths, in turn, cause us to take more time following a path to the root, which
we need to do to determine the component of a node. By following that policy
and making arbitrary decisions when components have the same height, we might
wind up with the three trees in Fig. 9.17 that represent the three final connected
components. ✦
Fig. 9.17. Trees representing final connected components using tree-merging algorithm.
STATEMENT S(h): A tree of height h, formed by the policy of merging lower into higher, has at least 2^h nodes.
BASIS. The basis is h = 0. Such a tree must be a single node, and since 2^0 = 1, the statement S(0) is true.
INDUCTION. Suppose S(h) is true for some h ≥ 0, and consider a tree T of height
h + 1. At some time during the formation of T by mergers, the height first reached
h + 1. The only way to get a tree of height h + 1 is to make the root of some tree
T1 , of height h, a child of the root of some tree T2 . T is T1 plus T2 , plus perhaps
other nodes that were added later, as suggested by Fig. 9.18.
Now T1 , by the inductive hypothesis, has at least 2^h nodes. Since its root was made a child of the root of T2 , the height of T2 is also at least h. Thus, T2 also has at least 2^h nodes. T consists of T1 , T2 , and perhaps more, so T has at least 2^h + 2^h = 2^{h+1} nodes. That statement is S(h + 1), and we have proved the inductive step.
We now know that if a tree has n nodes and height h, it must be that n ≥ 2^h . Taking logarithms of both sides, we have log2 n ≥ h; that is, the height of the tree
cannot be greater than the logarithm of the number of nodes. Consequently, when
[Fig. 9.18: the tree T of height h + 1, formed by making the root of T1 , of height h, a child of the root of T2 .]
we follow any path from a node to its root, we take O(log n) time.
We shall now describe in more detail the data structure that implements these
ideas. First, suppose that there is a type NODE representing nodes. As before, we
assume the type NODE is int and MAX is at least the number of nodes in the graph.
For our example of Fig. 9.4, we shall let MAX be 13.
We shall also assume that there is a list edges consisting of cells of type EDGE.
These cells are defined by
typedef struct EDGE *EDGELIST;
struct EDGE {
    NODE node1, node2;
    EDGELIST next;
};
Finally, for each node of the graph, we need a corresponding tree node. Tree
nodes will be structures of type TREENODE, consisting of
1. A parent pointer, so that we can build a tree on the graph’s nodes, and follow
the tree to its root. A root node will be identified by having NULL as its parent.
2. The height of the tree of which a given node is the root. The height will only
be used if the node is presently a root.
We may thus define type TREENODE by
typedef struct TREENODE *TREE;
struct TREENODE {
    int height;
    TREE parent;
};
We shall define an array
TREE nodes[MAX];
to associate with each graph node a node in some tree. It is important to realize
that each entry in the array nodes is a pointer to a node in the tree, yet this entry
is the sole representative of the node in the graph.
Two important auxiliary functions are shown in Fig. 9.19. The first, find,
takes a node a, gets a pointer to the corresponding tree node, x, and follows the
parent pointers in x and its ancestors, until it comes to the root. This search for
the root is performed by lines (2) and (3). If the root is found, a pointer to the root
is returned at line (4). Note that at line (1), the type NODE must be int so it may
be used to index the array nodes.
TREE find(NODE a, TREE nodes[])
{
    TREE x;

(1)     x = nodes[a];
(2)     while (x->parent != NULL)
(3)         x = x->parent;
(4)     return x;
}
The second function, merge,3 takes pointers to two tree nodes, x and y, which
must be the roots of distinct trees for the merger to work properly. The test of line
(5) determines which of the roots has the greater height; ties are broken in favor of
y. The higher is assigned to the local variable higher and the lower to the local
variable lower at lines (6–7) or lines (8–9), whichever is appropriate. Then at line
(10) the lower is made a child of the higher and at lines (11) and (12) the height
of the higher, which is now the root of the combined tree, is incremented by one if
the heights of T1 and T2 are equal. The height of the lower remains as it was, but
it is now meaningless, because the lower is no longer a root.
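The code for merge does not appear above; in code, it might be sketched as follows, with the parenthesized numbers matching the lines referred to in the description.

void merge(TREE x, TREE y)
{
    TREE higher, lower;

(5)     if (y->height >= x->height) {  /* ties are broken in favor of y */
(6)         higher = y;
(7)         lower = x;
        }
        else {
(8)         higher = x;
(9)         lower = y;
        }
(10)    lower->parent = higher;        /* lower becomes a child */
(11)    if (lower->height == higher->height)
(12)        higher->height++;          /* combined tree grew taller */
}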
The heart of the algorithm to find connected components is shown in Fig. 9.20.
We assume that the function makeEdges() turns the graph at hand into a list of edges.
3 Do not confuse this function with a function of the same name used for merge sorting in
Chapters 2 and 3.
#include <stdio.h>
#include <stdlib.h>
#define MAX 13
typedef int NODE;
typedef struct EDGE *EDGELIST;
struct EDGE {
    NODE node1, node2;
    EDGELIST next;
};
main()
{
    NODE u;
    TREE a, b;
    EDGELIST e;
    TREE nodes[MAX];
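    /* What follows is a sketch consistent with the surrounding
       description, not the original listing; makeEdges(), described
       above, is assumed to build the list of edges. */
    int i;

    /* initially, each graph node is a one-node tree by itself */
    for (i = 0; i < MAX; i++) {
        nodes[i] = (TREE) malloc(sizeof(struct TREENODE));
        nodes[i]->height = 0;
        nodes[i]->parent = NULL;
    }

    /* consider each edge, merging components whenever the two
       ends are found in different components */
    for (e = makeEdges(); e != NULL; e = e->next) {
        a = find(e->node1, nodes);
        b = find(e->node2, nodes);
        if (a != b)
            merge(a, b);
    }
}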
4 It is normal to think of m as the number of edges, but in some graphs, there are more nodes
than edges.
EXERCISES
9.4.1: Figure 9.21 lists some cities in the state of Michigan and the road mileage
between them. For the purposes of this exercise, ignore the mileage. Construct the
connected components of the graph by examining each edge in the manner described
in this section.
9.4.2*: Prove, by induction on k, that a connected component of k nodes has at
least k − 1 edges.
9.4.3*: There is a simpler way to implement “merge” and “find,” in which we keep
an array indexed by nodes, giving the component of each node. Initially, each node
is in a component by itself, and we name the component by the node. To find
the component of a node, we simply look up the corresponding array entry. To
merge components, we run down the array, changing each occurrence of the first
component to the second.
a) Write a C program to implement this algorithm.
b) As a function of n, the number of nodes, and m, the larger of the number of
nodes and edges, what is the running time of this program?
c) For certain numbers of edges and nodes, this implementation is actually better
than the one described in the section. When?
9.4.4*: Suppose that instead of merging lower trees into higher trees in the con-
nected components algorithm of this section, we merge trees with fewer nodes
into trees with a larger number of nodes. Is the running time of the connected-
components algorithm still O(m log n)?
✦
✦ ✦
✦
9.5 Minimal Spanning Trees
There is an important generalization of the connected components problem, in which
we are given an undirected graph with edges labeled by numbers (integers or reals).
We must not only find the connected components, but for each component we must
find a tree connecting the nodes of that component. Moreover, this tree must be
minimal, meaning that the sum of the edge labels is as small as possible.
The trees talked about here are not quite the same as the trees of Chapter 5. Here, no node is designated the root, and there is no notion of children or of order
among the children. Rather, when we speak of “trees” in this section, we mean
unrooted, unordered trees, which are just undirected graphs that have no simple
cycles.
A spanning tree for an undirected graph G is the nodes of G together with a
subset of the edges of G that
1. Connect the nodes; that is, there is a path between any two nodes using only
edges in the spanning tree.
2. Form an unrooted, unordered tree; that is, there are no (simple) cycles.
If G is a single connected component, then there is always a spanning tree. A
minimal spanning tree is a spanning tree the sum of whose edge labels is as small
as that of any spanning tree for the given graph.
✦ Example 9.16. Let graph G be the connected component for the island of
Oahu, as in Fig. 9.4 or Fig. 9.10. One possible spanning tree is shown in Fig. 9.22.
It is formed by deleting the edges {Maili, Wahiawa} and {Kaneohe, Laie}, and
retaining the other five edges. The weight, or sum of edge labels, for this tree is 84.
As we shall see, that is not a minimum. ✦
[Fig. 9.22: the spanning tree, retaining the edges {Maili, PearlCity} (20), {PearlCity, Wahiawa} (12), {PearlCity, Honolulu} (13), {Honolulu, Kaneohe} (11), and {Wahiawa, Laie} (28).]
An unrooted tree can be viewed as a rooted tree: we simply pick any node as the root. For instance, we could pick PearlCity as the root of the tree of Fig. 9.22. We can order the children of each node if we wish, but the order will be arbitrary, bearing no relation to the original unrooted tree.
✦ Example 9.17. The Acme Surfboard Wax Company has offices in the thirteen
cities shown in Fig. 9.4. It wishes to rent dedicated data transmission lines from the
phone company, and we shall suppose that the phone lines run along the roads that
are indicated by edges in Fig. 9.4. Between islands, the company must use satellite
transmission, and the cost will be proportional to the number of components. How-
ever, for the ground transmission lines, the phone company charges by the mile.5
Thus, we wish to find a minimal spanning tree for each connected component of the
graph of Fig. 9.4.
If we divide the edges by component, then we can run Kruskal’s algorithm on
each component separately. However, if we do not already know the components,
5 This is one possible way to charge for leased telephone lines. One finds a minimal spanning
tree connecting the desired sites, and the charge is based on the weight of that tree, regardless
of how the phone connections are provided physically.
then we must consider all the edges together, smallest label first, in the order of
Fig. 9.14. As in Section 9.4, we begin with each node in a component by itself.
We first consider the edge {Kaneohe, Honolulu}, the edge with the smallest
label. This edge merges these two cities into one component, and because we
perform a merge operation, we select that edge for the minimal spanning tree.
Edge 2 is {Wahiawa, PearlCity}, and since that edge also merges two components,
it is selected for the spanning tree. Likewise, edges 3 and 4, {PearlCity, Honolulu}
and {Wahiawa, Maili}, merge components, and are therefore put in the spanning
tree.
Edge 5, {Kahului, Keokea}, merges these two cities, and is also accepted for
the spanning tree, although this edge will turn out to be part of the spanning tree
for the Maui component, rather than the Oahu component as was the case for the
four previous edges.
Edge 6, {Maili, PearlCity}, connects two cities that are already in the same
component. Thus, this edge is rejected for the spanning tree. Even though we
shall have to pick some edges with larger labels, we cannot pick {Maili, PearlCity},
because to do so would form a cycle of the cities Maili, Wahiawa, and Pearl City. We
cannot have a cycle in the spanning tree, so one of the three edges must be excluded.
As we consider edges in order of label, the last edge of the cycle considered must
have the largest label, and is the best choice to exclude.
Edge 7, {Lahaina, Kahului}, and edge 8, {Laie, Kaneohe}, are both accepted
for the spanning tree, because they merge components. Edge 9, {Laie, Wahiawa},
is rejected because its ends are in the same component. We accept edges 10 and 11;
they form the spanning tree for the “big island” component, and we accept edge 12
to complete the Maui component. Edge 13 is rejected, because it connects Kona
and Hilo, which were merged into the same component by edges 10 and 11. The
resulting spanning trees of the components are shown in Fig. 9.23. ✦
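In code, Kruskal’s algorithm is the connected-components program of Section 9.4 with one addition: an edge that merges two components also goes into the spanning tree. Here is a minimal sketch, in which sortedEdges (the edge list ordered by label, smallest first) and accept (which records a spanning-tree edge) are hypothetical names; find, merge, and the array nodes are as in Section 9.4.

EDGELIST e;
TREE a, b;

for (e = sortedEdges; e != NULL; e = e->next) {
    a = find(e->node1, nodes);
    b = find(e->node2, nodes);
    if (a != b) {           /* ends in different components */
        accept(e);          /* e joins the spanning tree */
        merge(a, b);
    }
    /* otherwise reject e; it would complete a cycle */
}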
We can prove that Kruskal’s algorithm produces a spanning tree whose weight is
as small as that of any spanning tree for the given graph. Let G be an undirected,
connected graph. For convenience, let us add infinitesimal amounts to some labels, if necessary, so that all labels are distinct, and yet the sum of the added infinitesimals is less than the difference between any two unequal labels of G.
As a result, G with the new labels will have a unique minimal spanning tree, which
will be one of the minimal spanning trees of G with the original weights.
Then, let e1 , e2 , . . . , em be all the edges of G, in order of their labels, smallest
first. Note that this order is also the order in which Kruskal’s algorithm considers
the edges. Let K be the spanning tree for G with the adjusted labels produced by
Kruskal’s algorithm, and let T be the unique minimal spanning tree for G.
We shall prove that K and T are really the same. If they are different, then
there must be at least one edge that is in one but not the other. Let ei be the first
such edge in the ordering of edges; that is, each of e1 , . . . , ei−1 is either in both K
and T , or in neither of K and T . There are two cases, depending on whether ei is
in K or is in T . We shall show a contradiction in each case, and thus conclude that
ei does not exist; thus K = T , and K is the minimal spanning tree for G.
Fig. 9.23. The spanning trees of the three components, as found by Kruskal's algorithm.

The first case, in which ei is in T but not in K, is impossible. Kruskal's algorithm, when it considered ei, could only have rejected it because ei formed a cycle with edges already selected; those edges are among e1, . . . , ei−1, and so are in T as well, meaning that T itself would have a cycle. Now consider the second case, in which ei is in K but not in T. Let u and v be the endpoints of ei. Adding ei to T forms a cycle, consisting of ei and the path Q that runs between u and v in T.
There are two subcases, depending on whether or not ei has a higher label than all
the edges on path Q.
a) Edge ei has the highest label. Then all the edges on Q are among {e1 , . . . , ei−1 }.
Remember that T and K agree on all edges before ei , and so all the edges of
Q are also edges of K. But ei is also in K, which implies K has a cycle. We
thus rule out the possibility that ei has a higher label than any of the edges of
path Q.
b) There is some edge f on path Q that has a higher label than ei . Suppose
f connects nodes w and x. Figure 9.25 shows the situation in tree T . If we
remove edge f from T , and add edge ei , we do not form a cycle, because path
Q was broken by the removal of f . The resulting collection of edges has a lower
weight than T , because f has a higher label than ei . We claim the resulting
edges still connect all the nodes. To see why, notice that w and x are still
connected; there is a path that follows Q from w to u, then follows the edge
ei , then the path Q from v to x. Since {w, x} was the only edge removed, if its
endpoints are still connected, surely all nodes are connected. Thus, the new
set of edges is a spanning tree, and its existence contradicts the assumption
that T was minimal.
We have now shown that it is impossible for ei to be in K but not in T . That rules
out the second case. Since it is impossible that ei is in one of T and K, but not the
other, we conclude that K really is the minimal spanning tree T . That is, Kruskal’s
algorithm always finds a minimal spanning tree.
Fig. 9.25. The path Q in tree T between u and v; edge f = {w, x} on Q has a higher label than ei.
The running time of Kruskal's algorithm is dominated by sorting the m edges, which takes O(m log m) time, and by the merge and find operations on components, each of which takes O(log n) time with the data structures of the previous section. It appears that the total time for Kruskal's algorithm is thus O(m(log n + log m)).
However, notice that m ≤ n², because there are only n(n − 1)/2 pairs of nodes. Thus, log m ≤ 2 log n, and m(log n + log m) ≤ 3m log n. Since constant factors can
be neglected within a big-oh expression, we conclude that Kruskal’s algorithm takes
O(m log n) time.
EXERCISES
9.5.1: Draw the tree of Fig. 9.22 if Wahiawa is selected as the root.
9.5.2: Use Kruskal’s algorithm to find minimal spanning trees for each of the
components of the graph whose edges and labels are listed in Fig. 9.21 (see the
exercises for Section 9.4).
9.5.3**: Prove that if G is a connected, undirected graph of n nodes, and T is a
spanning tree for G, then T has n − 1 edges. Hint : We need to do an induction on
n. The hard part is to show that T must have some node v with degree 1; that is,
T has exactly one edge containing v. Consider what would happen if for every node
u, there were at least two edges of T containing u. By following edges into and out
of a sequence of nodes, we would eventually find a cycle. Since T is supposedly a
spanning tree, it could not have a cycle, which gives us a contradiction.
9.5.4*: Once we have selected n − 1 edges, it is not necessary to consider any more
edges for possible inclusion in the spanning tree. Describe a variation of Kruskal’s
algorithm that does not sort all the edges, but puts them in a priority queue, with
the negative of the edge’s label as its priority (i.e., shortest edge is selected first by
deleteMax). Show that if a spanning tree can be found among the first m/ log m
edges, then this version of Kruskal’s algorithm takes only O(m) time.
9.5.5*: Suppose we find a minimal spanning tree T for a graph G. Let us then add
to G the edge {u, v} with weight w. Under what circumstances will T be a minimal
spanning tree of the new graph?
9.5.6**: An Euler circuit for an undirected graph G is a path that starts and ends
at the same node and contains each edge of G exactly once.
a) Show that a connected, undirected graph has an Euler circuit if and only if
each node is of even degree.
b) Let G be an undirected graph with m edges in which every node is of even
degree. Give an O(m) algorithm to construct an Euler circuit for G.
✦
✦ ✦
✦
9.6 Depth-First Search
We shall now describe a graph-exploration method that is useful for directed graphs.
In Section 5.4 we discussed the preorder and postorder traversals of trees, where we
start at the root and recursively explore the children of each node we visit. We can
apply almost the same idea to any directed graph.6 From any node, we recursively
explore its successors.
However, we must be careful if the graph has cycles. If there is a cycle, we can
wind up calling the exploration function recursively around the cycle forever. For
instance, consider the graph of Fig. 9.26. Starting at node a, we might decide to
explore node b next. From b we might explore c first, and from c we could explore
b first. That gets us into an infinite recursion, where we alternate exploring from b
and c. In fact, it doesn’t matter in what order we choose to explore successors of
b and c. Either we shall get caught in some other cycle, or we eventually explore c
from b and explore b from c, infinitely.
Fig. 9.26. An example directed graph with cycles.
The search algorithm is called depth-first search because we find ourselves going
as far from the initial node (as “deep”) as fast as we can. It can be implemented
with a simple data structure. Again, let us assume that the type NODE is used
to name nodes and that this type is int. We represent arcs by adjacency lists.
Since we need a “mark” for each node, which can take on the values VISITED and
UNVISITED, we shall create an array of structures to represent the graph. These
structures will contain both the mark and the header for the adjacency list.
enum MARKTYPE {VISITED, UNVISITED};
typedef struct {
enum MARKTYPE mark;
LIST successors;
} GRAPH[MAX];
where LIST is an adjacency list, defined in the customary manner:
typedef struct CELL *LIST;
struct CELL {
NODE nodeName;
LIST next;
};
We begin by marking all the nodes UNVISITED. Recursive function dfs(u, G)
of Fig. 9.27 works on a node u of some externally defined graph G of type GRAPH.
At line (1) we mark u VISITED, so we don’t call dfs on it again. Line (2)
initializes p to point to the first cell on the adjacency list for node u. The loop of
lines (3) through (7) takes p down the adjacency list, considering each successor, v,
of u, in turn.
Line (4) sets v to be the “current” successor of u. At line (5) we test whether
v has ever been visited before. If so, we skip the recursive call at line (6) and we
move p to the next cell of the adjacency list at line (7). However, if v has never been
visited, we start a depth-first search from node v, at line (6). Eventually, we finish
the call to dfs(v, G). Then, we execute line (7) to move p down u’s adjacency list.
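The function of Fig. 9.27 can be reconstructed from this line-by-line description; the declarations of p and v are assumed.

void dfs(NODE u, GRAPH G)
{
    LIST p; /* runs down the adjacency list of u */
    NODE v; /* the node in the cell pointed to by p */

(1)     G[u].mark = VISITED;
(2)     p = G[u].successors;
(3)     while (p != NULL) {
(4)         v = p->nodeName;
(5)         if (G[v].mark == UNVISITED)
(6)             dfs(v, G);
(7)         p = p->next;
        }
}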
✦ Example 9.18. Suppose G is the graph of Fig. 9.26, and, for specificity, assume
the nodes on each adjacency list are ordered alphabetically. Initially, all nodes are
marked UNVISITED. Let us call dfs(a).7 Node a is marked VISITED at line (1), and
at line (2) we initialize p to point to the first cell on a’s adjacency list. At line (4)
v is set to b, since b is the node in the first cell. Since b is currently unvisited, the
test of line (5) succeeds, and at line (6) we call dfs(b).
Now, we start a new call to dfs, with u = b, while the old call with u = a is
dormant but still alive. We begin at line (1), marking b VISITED. Since c is the first
node on b’s adjacency list, c becomes the value of v at line (4). Node c is unvisited,
so that we succeed at line (5) and at line (6) we call dfs(c).
A third call to dfs is now alive, and to begin dfs(c), we mark c VISITED
and set v to b at line (4), since b is the first, and only, node on c’s adjacency list.
However, b was already marked VISITED at line (1) of the call to dfs(b), so that
we skip line (6) and move p down c’s adjacency list at line (7). Since c has no more
successors, p becomes NULL, so that the test of line (3) fails, and dfs(c) is finished.
We now return to the call dfs(b). Pointer p is advanced at line (7), and it
now points to the second cell of b’s adjacency list, which holds node d. We set v to
d at line (4), and since d is unvisited, we call dfs(d) at line (6).
For the execution of dfs(d), we mark d VISITED. Then v is first set to c. But
c is visited, and so next time around the loop, v = e. That leads to the call dfs(e).
Node e has only c as a successor, and so after marking e VISITED, dfs(e) returns
to dfs(d). We next set v = f at line (4) of dfs(d), and call dfs(f). After marking
f VISITED, we find that f also has only c as a successor, and c is visited.
We are now finished with dfs(f). Since f is the last successor of d, we are
also finished with dfs(d), and since d is the last successor of b, we are done with
dfs(b) as well. That takes us back to dfs(a). Node a has another successor, d,
but that node is visited, and so we are done with dfs(a) as well.
Figure 9.28 summarizes the action of dfs on the graph of Fig. 9.26. We show the
stack of calls to dfs, with the currently active call at the right. We also indicate the
action taken at each step, and we show the value of the local variable v associated
with each currently live call, or show that p = NULL, indicating that there is no
active value for v. ✦
7 In what follows, we shall omit the second argument of dfs, which is always the graph G.
Fig. 9.28. The sequence of calls made during the depth-first search of Example 9.18, with the currently active call at the right, the action taken at each step, and the value of v in each live call.
✦ Example 9.19. The tree for the exploration of the graph in Fig. 9.26 that was
summarized in Fig. 9.28 is seen in Fig. 9.29. We show the tree arcs, representing the
parent-child relationship, as solid lines. Other arcs of the graph are shown as dotted
arrows. For the moment, we should ignore the numbers labeling the nodes. ✦
Fig. 9.29. One possible depth-first search tree for the graph of Fig. 9.26. The number on each node is its postorder number: c = 1, e = 2, f = 3, d = 4, b = 5, a = 6.
Fig. 9.30. Part of the tree that is built when arc u → v is considered.
If v were unvisited when the arc u → v was considered, then the call dfs(v) would have made v a descendant of u. Since that evidently did not happen (for then v would not be to the right of u), it must be that v is already in the tree when the arc u → v is considered.
However, Fig. 9.30 shows the parts of the tree that exist while dfs(u) is active.
Since children are added in left-to-right order, no proper ancestor of node u as yet
has a child to the right of u. Thus, v can only be an ancestor of u, a descendant of u, or somewhere to the left of u. Thus, if u → v is a cross arc, v must be to the left of u, not to the right of u as we initially supposed.
If a node u has already been visited by the time the loop of dfsForest applies the test of line (4), then u will be marked VISITED, and so we do not create a tree with root u.
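The dfsForest function of Fig. 9.31 can be reconstructed to match the references to its lines (3) through (5):

void dfsForest(GRAPH G)
{
    NODE u;

(1)     for (u = 0; u < MAX; u++)
(2)         G[u].mark = UNVISITED;
(3)     for (u = 0; u < MAX; u++)
(4)         if (G[u].mark == UNVISITED)
(5)             dfs(u, G);
}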
✦ Example 9.20. Suppose we apply the above algorithm to the graph of Fig.
9.26, but let d be the node whose name is 0; that is, d is the first root of a tree
for the depth-first spanning forest. We call dfs(d), which constructs the first tree
of Fig. 9.32. Now, all nodes but a are visited. As u becomes each of the various
nodes in the loop of lines (3) to (5) of Fig. 9.31, the test of line (4) fails except
when u = a. Then, we create the one-node second tree of Fig. 9.32. Note that both
successors of a are marked VISITED when we call dfs(a), and so we do not make
any recursive calls from dfs(a). ✦
Fig. 9.32. The depth-first search forest obtained when d is chosen as the first root; the second tree consists of node a alone.
When we present the nodes of a graph as a depth-first search forest, the notions
of forward, backward, and tree arcs apply as before. However, the notion of a cross
arc must be extended to include arcs that run from one tree to a tree to its left.
Examples of such cross arcs are a → b and a → d in Fig. 9.32.
The rule that cross arcs always go from right to left continues to hold. The
reason is also the same. If there were a cross arc u → v that went from one tree
to a tree to the right, then consider what happens when we call dfs(u). Since v
was not added to the tree being formed at the moment, it must already have been
in some tree. But the trees to the right of u have not yet been created, and so v
cannot be part of one of them.
8 In fact, the sum of the mu ’s will be exactly m, except in the case that the number of nodes
exceeds the number of arcs; recall that m is the larger of the numbers of nodes and arcs.
void dfsForest(GRAPH G)
{
NODE u;
(10) k = 0;
(11) for (u = 0; u < MAX; u++)
(12) G[u].mark = UNVISITED;
(13) for (u = 0; u < MAX; u++)
(14) if (G[u].mark == UNVISITED)
(15) dfs(u, G);
}
✦ Example 9.21. In the depth-first search of Example 9.18, the call dfs(c) finishes first, so c is numbered 1; then e, f, d, and b finish, receiving the numbers 2 through 5; last, the call on a returns, giving a the number 6. Notice that this order is exactly the one
we would get if we simply walked the tree in postorder. ✦
We can assign the postorder numbers to the nodes with a few simple modifica-
tions to the depth-first search algorithm we have written so far; these changes are
summarized in Fig. 9.33.
1. In the GRAPH type, we need an additional field for each node, called postorder.
For the graph G, we place the postorder number of node u in G[u].postorder.
This assignment is accomplished at line (9) of Fig. 9.33.
2. We use a global variable k to count nodes in postorder. This variable is defined
externally to dfs and dfsForest. As seen in Fig. 9.33, we initialize k to 0
in line (10) of dfsForest, and just before assigning a postorder number, we
increment k by 1 at line (8) in dfs.
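In dfs itself the changes amount to two new lines at the end of the function; a reconstruction of the revised dfs, with lines (1) through (7) as before, is:

void dfs(NODE u, GRAPH G)
{
    LIST p;
    NODE v;

(1)     G[u].mark = VISITED;
(2)     p = G[u].successors;
(3)     while (p != NULL) {
(4)         v = p->nodeName;
(5)         if (G[v].mark == UNVISITED)
(6)             dfs(v, G);
(7)         p = p->next;
        }
(8)     ++k;
(9)     G[u].postorder = k;
}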
Notice that as a result, when there is more than one tree in the depth-first search
forest, the first tree gets the lowest numbers, the next tree gets the next numbers
in order, and so on. For example, in Fig. 9.32, a would get the postorder number 6.
Fig. 9.34. (a) Nodes u, v, and w in a depth-first search tree, with v a descendant of u and w to the right of v; (b) the active intervals for their calls to dfs.
We can make several observations. First, the call to dfs on a descendant like v
is active for only a subinterval of the time during which the call on an ancestor, like
u, is active. In particular, the call to dfs(v) terminates before the call to dfs(u)
does. Thus, the postorder number of v must be less than the postorder number of
u whenever v is a proper descendant of u.
Second, if w is to the right of v, then the call to dfs(w) cannot begin until after
the call to dfs(v) terminates. Thus, whenever v is to the left of w, the postorder
number of v is less than that of w. Although not shown in Fig. 9.34, the same is
true even if v and w are in different trees of the depth-first search forest, with v’s
tree to the left of w’s tree.
We can now consider the relationship between the postorder numbers of u and
v for each arc u → v.
1. If u → v is a tree arc or forward arc, then v is a descendant of u, and so v
precedes u in postorder.
2. If u → v is a cross arc, then we know v is to the left of u, and again v precedes
u in postorder.
3. If u → v is a backward arc and v ≠ u, then v is a proper ancestor of u, and so
v follows u in postorder. However, v = u is possible for a backward arc, since
a loop is a backward arc. Thus, in general, for backward arc u → v, we know
that the postorder number of v is at least as high as the postorder number of
u.
In summary, we see that in postorder, the head of an arc precedes the tail, unless the arc is a backward arc, in which case the tail precedes or equals the head. Thus,
we can identify the backward arcs simply by finding those arcs whose tails are equal
to or less than their heads in postorder. We shall see a number of applications of
this idea in the next section.
EXERCISES
9.6.1: For the tree of Fig. 9.5 (see the exercises for Section 9.2), give two depth-first
search trees starting with node a. Give a depth-first search tree starting with node
d.
9.6.2*: No matter which node we start with in Fig. 9.5, we wind up with only one
tree in the depth-first search forest. Explain briefly why that must be the case for
this particular graph.
9.6.3: For each of your trees of Exercise 9.6.1, indicate which of the arcs are tree,
forward, backward, and cross arcs.
9.6.4: For each of your trees of Exercise 9.6.1, give the postorder numbers for the
nodes.
9.6.5*: Consider the graph with three nodes, a, b, and c, and the two arcs a → b
and b → c. Give all the possible depth-first search forests for this graph, considering
all possible starting nodes for each tree. What is the postorder numbering of the
nodes for each forest? Are the postorder numbers always the same for this graph?
9.6.6*: Consider the generalization of the graph of Exercise 9.6.5 to a graph with n
nodes, a1 , a2 , . . . , an , and arcs a1 → a2 , a2 → a3 , . . . , an−1 → an . Prove by complete
induction on n that this graph has 2n−1 different depth-first search forests. Hint :
It helps to remember that 1 + 1 + 2 + 4 + 8 + · · · + 2^i = 2^(i+1), for i ≥ 0.
9.6.7*: Suppose we start with a graph G and add a new node x that is a predecessor
of all other nodes in G. If we run dfsForest of Fig. 9.31 on the new graph, starting
at node x, then a single tree results. If we then delete x from this tree, several trees
may result. How do these trees relate to the depth-first search forest of the original
graph G?
9.6.8**: Suppose we have a directed graph G, from whose representation we have
just constructed a depth-first spanning forest F by the algorithm of Fig. 9.31. Let
us now add the arc u → v to G to form a new graph H, whose representation is
exactly that of G, except that node v now appears somewhere on the adjacency
list for node u. If we now run Fig. 9.31 on this representation of H, under what
circumstances will the same depth-first forest F be constructed? That is, when will the tree arcs for H be exactly the same as the tree arcs for G?
✦
✦ ✦
✦
9.7 Some Uses of Depth-First Search
In this section, we see how depth-first search can be used to solve some problems
quickly. As previously, we shall here use n to represent the number of nodes of a
graph, and we shall use m for the larger of the number of nodes and the number of
arcs; in particular, we assume that n ≤ m is always true. Each of the algorithms
presented takes O(m) time, on a graph represented by adjacency lists. The first
algorithm determines whether a directed graph is acyclic. Then for those graphs
that are acyclic, we see how to find a topological sort of the nodes (topological sort-
ing was discussed in Section 7.10; we shall review the definitions at the appropriate
time). We also show how to compute the transitive closure of a graph (see Section
7.10 again), and how to find connected components of an undirected graph faster
than the algorithm given in Section 9.4.
Fig. 9.35. Every backward arc forms a cycle with tree arcs.

A directed graph is cyclic if and only if its depth-first search forest has a backward arc. In one direction, suggested by Fig. 9.35, a backward arc u → v together with the tree arcs from v down to its descendant u forms a cycle. Conversely, suppose there is a cycle v1 → v2 → · · · → vk → v1, and let pi be the postorder number of node vi. Choose v1 to be the node of the cycle with the highest postorder number, so that in particular pk < p1. Then consider the arc vk → v1 that completes the cycle.
number of its tail, which is pk , is less than the postorder number of its head, p1 ,
and so this arc is a backward arc. That proves there must be some backward arc
in any cycle.
As a result, after computing the postorder numbers of all nodes, we simply
examine all the arcs, to see if any has a tail less than or equal to its head, in
postorder. If so, we have found a backward arc, and the graph is cyclic. If there is
no such arc, the graph is acyclic. Figure 9.36 shows a function that tests whether an
externally defined graph G is acyclic, using the data structure for graphs described
in the previous section. It also makes use of the function dfsForest defined in Fig.
9.33 to compute the postorder numbers of the nodes of G.
BOOLEAN testAcyclic(GRAPH G)
{
NODE u, v; /* u runs through all the nodes */
LIST p; /* p points to each cell on the adjacency list
for u; v is a node on the adjacency list */
(1) dfsForest(G);
(2) for (u = 0; u < MAX; u++) {
(3) p = G[u].successors;
(4) while (p != NULL) {
(5) v = p->nodeName;
(6) if (G[u].postorder <= G[v].postorder)
(7) return FALSE;
(8) p = p->next;
}
}
(9) return TRUE;
}
Topological Sorting
Suppose we know that a directed graph G is acyclic. As for any graph, we may find
a depth-first search forest for G and thereby determine a postorder for the nodes of
G. Suppose (v1 , v2 , . . . , vn ) is a list of the nodes of G in the reverse of postorder;
that is, v1 is the node numbered n in postorder, v2 is numbered n−1, and in general,
vi is the node numbered n − i + 1 in postorder.
The order of the nodes on this list has the property that all arcs of G go forward
in the order. To see why, suppose vi → vj is an arc of G. Since G is acyclic, there
are no backward arcs. Thus, for every arc, the head precedes the tail. That is, vj
precedes vi in postorder. But the list is the reverse of postorder, and so vi precedes
vj on the list. That is, every tail precedes the corresponding head in the list order.
An order for the nodes of a graph G with the property that for every arc of G the tail precedes the head is called a topological order, and the process of finding such an order for the nodes is called topological sorting. Only acyclic graphs have a
topological order, and as we have just seen, we can produce a topological order for
an acyclic graph in O(m) time, where m is the larger of the number of nodes and arcs,
by performing a depth-first search. As we are about to give a node its postorder
number, that is, as we complete the call to dfs on that node, we push the node
onto a stack. When we are done, the stack is a list in which the nodes appear in
postorder, with the highest at the top (front). That is the reverse postorder we
desire. Since the depth-first search takes O(m) time, and pushing the nodes onto a
stack takes only O(n) time, the whole process takes O(m) time.
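As an illustrative sketch (ours, not the book's), we can replace the explicit stack by an output array: when node u receives postorder number k, we store u in position MAX − k, so that the array ends up in reverse postorder. The names dfsTopo and topOrder are hypothetical.

NODE topOrder[MAX]; /* topOrder[0] will be first in topological order */

void dfsTopo(NODE u, GRAPH G)
{
    LIST p;
    NODE v;

    G[u].mark = VISITED;
    for (p = G[u].successors; p != NULL; p = p->next) {
        v = p->nodeName;
        if (G[v].mark == UNVISITED)
            dfsTopo(v, G);
    }
    ++k;                   /* k counts nodes in postorder, as in Fig. 9.33 */
    topOrder[MAX - k] = u; /* the node numbered k goes k from the end */
}

Marking every node UNVISITED, setting k to 0, and calling dfsTopo on each node still unvisited, exactly as in dfsForest, then fills topOrder with a topological order of an acyclic graph.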
✦ Example 9.22. In Fig. 9.37(a) is an acyclic graph, and in Fig. 9.37(b) is the
depth-first search forest we get by considering the nodes in alphabetic order. We
also show in Fig. 9.37(b) the postorder numbers that we get from this depth-first
search. If we list the nodes highest postorder number first, we get the topological
order (d, e, c, f, b, a). The reader should check that each of the eight arcs in Fig.
9.37(a) has a tail that precedes its head according to this list. There are, incidentally,
three other topological orders for this graph, such as (d, c, e, b, f, a). ✦
We can also use depth-first search to compute the set of nodes reachable from a given node u: mark all nodes UNVISITED, call dfs(u), and the nodes then marked VISITED are exactly the nodes reachable from u.

✦ Example 9.23. Consider the graph of Fig. 9.37(a). If we start our depth-first search
search from node a, we can go nowhere, since there are no arcs out of a. Thus,
dfs(a) terminates immediately. Since only a is visited, we conclude that a is the
only node reachable from a.
If we start with b, we can reach a, but that is all; the reachable set for b is
{a, b}. Similarly, from c we reach {a, b, c, f }, from d we reach all the nodes, from e
we reach {a, b, e, f }, and from f we can reach only {a, f }.
For another example, consider the graph of Fig. 9.26. From a we can reach all
the nodes. From any node but a, we can reach all the nodes except a. ✦
Fig. 9.37. (a) A directed acyclic graph. (b) A depth-first search forest for it; the postorder numbers are a = 1, b = 2, f = 3, c = 4, e = 5, d = 6.
To find the connected components of an undirected graph, we can convert the graph into a directed graph by replacing each edge {u, v} by the pair of arcs u → v and v → u, and then construct a depth-first search forest for the directed graph; the trees of the forest are the connected components. To see why, first note that the presence of an arc u → v in the directed graph
indicates that there is an edge {u, v}. Thus, all the nodes of a tree are connected.
Now we must show the converse, that if two nodes are connected, then they are
in the same tree. Suppose there were a path in the undirected graph between two
nodes u and v that are in different trees. Say the tree of u was constructed first.
Then there is a path in the directed graph from u to v, which tells us that v, and
all the nodes on this path, should have been added to the tree with u. Thus, nodes
are connected in the undirected graph if and only if they are in the same tree; that
is, the trees are the connected components.
✦ Example 9.24. Consider the undirected graph of Fig. 9.4 again. One possible
depth-first search forest we might construct for this graph is shown in Fig. 9.38.
Notice how the three depth-first search trees correspond to the three connected
components. ✦

Fig. 9.38. A depth-first search forest whose trees are the connected components of the graph of Fig. 9.4.
EXERCISES
9.7.1: Find all the topological orders for the graph of Fig. 9.37.
9.7.2*: Suppose R is a partial order on domain D. We can represent R by its
graph, where the nodes are the elements of D and there is an arc u → v whenever
uRv and u ≠ v. Let (v1, v2, . . . , vn) be a topological ordering of the graph of R.
Let T be the relation defined by vi T vj whenever i ≤ j. Show that
a) T is a total order, and
b) The pairs in R are a subset of the pairs in T ; that is, T is a total order containing
the partial order R.
9.7.3: Apply depth-first search to the graph of Fig. 9.21 (after converting it to a
symmetric directed graph), to find the connected components.
9.7.4: Consider the graph with arcs a → c, b → a, b → c, d → a, and e → c.
a) Test the graph for cycles.
b) Find all the topological orders for the graph.
c) Find the reachable set of each node.
9.7.5*: In the next section we shall consider the general problem of finding shortest
paths from a source node s. That is, we want for each node u the length of the
shortest path from s to u if one exists. When we have a directed, acyclic graph, the
problem is easier. Give an algorithm that will compute the length of the shortest
path from node s to each node u (infinity if no such path exists) in a directed,
acyclic graph G. Your algorithm should take O(m) time, where m is the larger of
the number of nodes and arcs of G. Prove that your algorithm has this running
time. Hint : Start with a topological sort of G, and visit each node in turn. On
visiting a node u, calculate the shortest distance from s to u in terms of the already
calculated shortest distances to the predecessors of u.
9.7.6*: Give algorithms to compute the following for a directed, acyclic graph G.
Your algorithms should run in time O(m), where m is the larger of the number of
nodes and arcs of G, and you should prove that this running time is all that your
algorithm requires. Hint : Adapt the idea of Exercise 9.7.5.
a) For each node u, find the length of the longest path from u to anywhere.
b) For each node u, find the length of the longest path to u from anywhere.
c) For a given source node s and for all nodes u of G, find the length of the longest
path from s to u.
d) For a given source node s and for all nodes u of G, find the length of the longest
path from u to s.
e) For each node u, find the length of the longest path through u.
✦
✦ ✦
✦
9.8 Dijkstra’s Algorithm for Finding Shortest Paths
Suppose we have a graph, which could be either directed or undirected, with labels
on the arcs (or edges) to represent the “length” of that arc. An example is Fig.
9.4, which showed the distance along certain roads of the Hawaiian Islands. It
is quite common to want to know the minimum distance between two nodes; for
example, maps often include tables of driving distance as a guide to how far one can
travel in a day, or to help determine which of two routes (that go through different
intermediate cities) is shorter. A similar kind of problem would associate with each
arc the time it takes to travel along that arc, or perhaps the cost of traveling that
arc. Then the minimum “distance” between two nodes would correspond to the
traveling time or the fare, respectively.
In general, the distance along a path is the sum of the labels of that path. The minimum distance from node u to node v is the minimum of the distance of any path from u to v.
✦ Example 9.25. Consider the map of Oahu in Fig. 9.10. Suppose we want to
find the minimum distance from Maili to Kaneohe. There are several paths we
could choose. One useful observation is that, as long as the labels of the arcs are
nonnegative, the minimum-distance path need never have a cycle. For we could
skip that cycle and find a path between the same two nodes, but with a distance
no greater than that of the path with the cycle. Thus, we need only consider
1. The path through Pearl City and Honolulu.
2. The path through Wahiawa, Pearl City, and Honolulu.
3. The path through Wahiawa and Laie.
4. The path through Pearl City, Wahiawa, and Laie.
The distances of these paths are 44, 51, 67, and 84, respectively. Thus, the minimum
distance from Maili to Kaneohe is 44. ✦
If we wish to find the minimum distance from one given node, called the source node, to all the nodes of the graph, one of the most efficient techniques to use is
a method called Dijkstra’s algorithm, the subject of this section. It turns out that
if all we want is the distance from one node u to another node v, the best way is
to run Dijkstra’s algorithm with u as the source node and stop when we deduce
the distance to v. If we want to find the minimum distance between every pair of
nodes, there is an algorithm that we shall cover in the next section, called Floyd’s
algorithm, that sometimes is preferable to running Dijkstra’s algorithm with every
node as a source.
The essence of Dijkstra’s algorithm is that we discover the minimum distance
from the source to other nodes in the order of those minimum distances, that is,
closest nodes first. As Dijkstra’s algorithm proceeds, we have a situation like that suggested in Fig. 9.39. In the graph G there are certain nodes that are settled, that is, their minimum distance is known; this set always includes s, the source node. For the unsettled nodes v, we record the length of the shortest special path, which is a path that starts at the source node, travels only through settled nodes, then at the last step jumps out of the settled region to v.
Fig. 9.39. The settled nodes, and a special path from the source s to an unsettled node v.
✦ Example 9.26. Consider the map of Oahu in Fig. 9.10. That graph is undi-
rected, but we shall assume edges are arcs in both directions. Let the source be
Honolulu. Then initially, only Honolulu is settled and its distance is 0. We can set
dist(PearlCity) to 13 and dist(Kaneohe) to 11, but other cities, having no arc from
Honolulu, are given distance INFTY. The situation is shown in the first column of
Fig. 9.40. The star on distances indicates that the node is settled.
                     ROUND
CITY           (1)     (2)     (3)     (4)     (5)
Honolulu        0*      0*      0*      0*      0*
PearlCity      13      13      13*     13*     13*
Maili        INFTY   INFTY     33      33      33*
Wahiawa      INFTY   INFTY     25      25*     25*
Laie         INFTY     35      35      35      35
Kaneohe        11      11*     11*     11*     11*

Fig. 9.40. Values of dist in each round; a star marks a settled node.
Among the unsettled nodes, the one with the smallest distance is now Kaneohe,
and so this node is settled. There are arcs from Kaneohe to Honolulu and Laie.
The arc to Honolulu does not help, but the value of dist(Kaneohe), which is 11,
plus the label of the arc from Kaneohe to Laie, which is 24, totals 35, which is less
than “infinity,” the current value of dist(Laie). Thus, in the second column, we
have reduced the distance to Laie to 35. Kaneohe is now settled.
In the next round, the unsettled node with the smallest distance is Pearl City,
with a distance of 13. When we make Pearl City settled, we must consider the
neighbors of Pearl City, which are Maili and Wahiawa. We reduce the distance to
Maili to 33 (the sum of 13 and 20), and we reduce the distance to Wahiawa to 25
(the sum of 13 and 12). The situation is now as in column (3).
Next to be settled is Wahiawa, with a distance of 25, least among the currently
unsettled nodes. However, that node does not allow us to reduce the distance to
any other node, and so column (4) has the same distances as column (3). Similarly,
we next settle Maili, with a distance of 33, but that does not reduce any distances,
leaving column (5) the same as column (4). Technically, we have to settle the last
node, Laie, but the last node cannot affect any other distances, and so column (5)
gives the shortest distances from Honolulu to all six cities. ✦
Fig. 9.41. A supposedly shorter path to v must leave the settled region at some node w and immediately reach an unsettled node u.

We shall show, by induction on the number k of settled nodes, that as long as all arc labels are nonnegative,9

a) For each settled node u, dist(u) is the minimum distance from s to u.

b) For each unsettled node u, dist(u) is the minimum distance from s to u along a special path, or INFTY if there is no such path.

BASIS. For the basis, k = 1, only the source node s is settled. Then dist(s) = 0, and for any other node u, dist(u) is the label of the arc s → u if there is one, and INFTY if not. Since a special path to u at this point can only be the arc s → u itself, both (a) and (b) hold.
INDUCTION. Now assume (a) and (b) hold after k nodes have been settled, and
let v be the (k + 1)st node settled. We claim that (a) still holds, because dist(v)
is the least distance of any path from s to v. Suppose not. By part (b) of the
inductive hypothesis, when k nodes are settled, dist(v) is the minimum distance
of any special path to v, and so there must be some shorter nonspecial path to v.
As suggested in Fig. 9.41, this path must leave the settled nodes at some node w
(which could be s), and go to some unsettled node u. From there, the path could
meander in and out of the settled nodes, until it finally arrives at v.
However, v was chosen to be the (k + 1)st node settled, which means that at
this time, dist(u) could not be less than dist(v), or else we would have selected u
as the (k + 1)st node. By (b) of the inductive hypothesis, dist(u) is the minimum
length of any special path to u. But the path from s to w to u in Fig. 9.41 is a
special path, so that its distance is at least dist(u). Thus, the supposed shorter
path from s to v through w and u has a distance that is at least dist(v), because the initial part from s to u already has distance dist(u), and dist(u) ≥ dist(v).10 Thus, (a) holds for k + 1 nodes, that is, (a) continues to hold when we include v among the settled nodes.

9 When labels are allowed to be negative, we can find graphs for which Dijkstra’s algorithm gives incorrect answers.
Now we must show that (b) holds when we add v to the settled nodes. Consider
some node u that remains unsettled when we add v to the settled nodes. On the
shortest special path to u, there must be some penultimate (next-to-last) node; this
node could either be v or some other node w. The two possibilities are suggested
by Fig. 9.42.
Fig. 9.42. What is the penultimate node on the shortest special path to u?
First, suppose the penultimate node is v. Then the length of the path from s
to v to u suggested in Fig. 9.42 is dist(v) plus the label of the arc v → u.
Alternatively, suppose the penultimate node is some other node w. By induc-
tive hypothesis (a), the shortest path from s to w consists only of nodes that were
settled prior to v, and therefore, v does not appear on the path. Thus, the length of
the shortest special path to u does not change when we add v to the settled nodes.
Now recall that when we settle v, we adjust each dist(u) to be the smaller of
the old value of dist(u) and dist(v) plus the label of arc v → u. The former covers
the case that some w other than v is the penultimate node, and the latter covers
the case that v is the penultimate node. Thus, part (b) also holds, and we have
completed the inductive step.
10 Note that the fact that the labels are nonnegative is vital; if not, the portion of the path
from u to v could have a negative distance, resulting in a shorter path to v.
11 Actually, this implementation is only best when the number of arcs is somewhat less than
the square of the number of nodes, which is the maximum number of arcs there can be. A
simple implementation for the dense case is discussed in the exercises.
We use an array potNodes to represent the partially ordered tree. The intent is that to each graph node u
there corresponds a partially ordered tree node a that has priority equal to dist(u).
However, unlike Section 5.9, we shall organize the partially ordered tree by least
priority rather than greatest. (Alternatively, we could take the priority of a to be
−dist(u).) Figure 9.43 illustrates the data structure.
Fig. 9.43. The data structure for Dijkstra’s algorithm: each entry of the array graph has fields dist, successors, and toPOT, and the array potNodes runs from 1 to n; graph node u and its partially-ordered-tree node a refer to each other through toPOT and potNodes.
We use NODE for the type of graph nodes. As usual, we shall name nodes with
integers starting at 0. We shall use the type POTNODE for the type of nodes in the
partially ordered tree. As in Section 5.9, we shall assume that the nodes of the
partially ordered tree are numbered starting at 1 for convenience. Thus, both NODE
and POTNODE are synonyms for int.
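That is, we may declare:

typedef int NODE;
typedef int POTNODE;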
The data type GRAPH is defined to be
typedef struct {
float dist;
LIST successors;
POTNODE toPOT;
} GRAPH[MAX];
Here, MAX is the number of nodes in the graph, and LIST is the type of adjacency
lists consisting of cells of type CELL. Since we need to include labels, which we take
to be floating-point numbers, we shall declare as the type CELL
typedef struct CELL *LIST;
struct CELL {
NODE nodeName;
float nodeLabel;
LIST next;
};
We declare the data type POT to be an array of graph nodes; since positions 1 through MAX of the array are used, it has MAX+1 elements:

typedef NODE POT[MAX+1];
void swap(GRAPH G, POT P, POTNODE a, POTNODE b)
{
    NODE temp; /* used to exchange the entries of P */

    temp = P[b];
    P[b] = P[a];
    P[a] = temp;
    G[P[a]].toPOT = a;
    G[P[b]].toPOT = b;
}
Fig. 9.44. Function to swap two nodes of the partially ordered tree.
We shall need to bubble nodes up and down the partially ordered tree, as we
did in Section 5.9. The major difference is that here, the value in an element of
the array potNodes is not the priority. Rather, that value takes us to a node of
graph, and in the structure for that node we find the field dist, which gives us
the priority. We therefore need an auxiliary function priority that returns dist
for the appropriate graph node. We shall also assume for this section that smaller
priorities rise to the top of the partially ordered tree, rather than larger priorities
as in Section 5.9.
Figure 9.45 shows the function priority and functions bubbleUp and bubble-
Down that are simple modifications of the functions of the same name in Section 5.9.
Each takes a graph G and a partially ordered tree P as arguments. Function bub-
bleDown also needs an integer last that indicates the end of the current partially
ordered tree in the array P.
Fig. 9.45. Bubbling nodes up and down the partially ordered tree.
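As a sketch of those functions, the following adapts the corresponding code of Section 5.9, with smaller priorities rising to the top; the details are a plausible reconstruction rather than the original figure.

float priority(GRAPH G, POT P, POTNODE a)
{
    return G[P[a]].dist; /* the dist field of the graph node for POT node a */
}

void bubbleUp(GRAPH G, POT P, POTNODE a)
{
    if (a > 1 && priority(G, P, a) < priority(G, P, a/2)) {
        swap(G, P, a, a/2);   /* a beats its parent; exchange them */
        bubbleUp(G, P, a/2);
    }
}

void bubbleDown(GRAPH G, POT P, POTNODE a, int last)
{
    POTNODE child;

    child = 2*a;
    if (child < last && priority(G, P, child+1) < priority(G, P, child))
        ++child;  /* child is now the smaller-priority child of a */
    if (child <= last && priority(G, P, a) > priority(G, P, child)) {
        swap(G, P, a, child);
        bubbleDown(G, P, child, last);
    }
}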
Initialization
We shall assume that the adjacency list for each graph node has already been
created and that a pointer to the adjacency list for graph node u appears in
graph[u].successors. We shall also assume that node 0 is the source node. If we
take the graph node i to correspond to node i + 1 of the partially ordered tree, then
the array potNodes is appropriately initialized as a partially ordered tree. That is,
the root of the partially ordered tree represents the source node of the graph, to
which we give priority 0, and to all other nodes we give priority INFTY, our “infinite”
defined constant.
As we shall see, on the first round of Dijkstra’s algorithm, we select the source
node to “settle,” which will create the condition we regard as our starting point in
the informal introduction, where the source node is settled and dist[u] is noninfi-
nite only when there is an arc from the source to u. The initialization function is
shown in Fig. 9.46. As with previous functions in this section, initialize takes as
arguments the graph and the partially ordered tree. It also takes a pointer pLast to
the integer last, so it can initialize it to MAX, the number of nodes in the graph.
Recall that last will indicate the last position in the array for the partially ordered
tree that is currently in use.
Note that the indexes of the partially ordered tree are 1 through MAX, while for the graph, they are 0 through MAX − 1. Thus, in lines (3) and (4) of Fig. 9.46,
we have to make node i of the graph correspond initially to node i+1 of the partially
ordered tree.
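The initialization function of Fig. 9.46 can be reconstructed consistently with this description, including the correspondence set up in its lines (3) and (4):

void initialize(GRAPH G, POT P, int *pLast)
{
    int i; /* we use i as both a graph node and a POT node */

(1)     for (i = 0; i < MAX; i++) {
(2)         G[i].dist = INFTY;
(3)         G[i].toPOT = i+1;  /* graph node i corresponds */
(4)         P[i+1] = i;        /* to POT node i+1 */
        }
(5)     G[0].dist = 0; /* the source node, node 0, gets priority 0 */
(6)     *pLast = MAX;  /* initially, the POT holds all MAX nodes */
}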
At line (7) we begin updating distances to reflect the fact that v is now settled.
Pointer p is initialized to the beginning of the adjacency list for node v. Then in the
loop of lines (8) to (13), we consider each successor u of v. After setting variable
u to one of the successors of v at line (9), we test at line (10) whether the shortest
special path to u goes through v. That is the case whenever the old value of dist(u),
represented in this data structure by G[u].dist, is greater than the sum of dist(v)
plus the label of the arc v → u. If so, then at line (11), we set dist(u) to its new,
smaller value, and at line (12) we call bubbleUp, so, if necessary, u can rise in the
partially ordered tree to reflect its new priority. The loop completes when at line
(13) we move p down the adjacency list of v.
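The surrounding function (Fig. 9.47 in the original) can be sketched as follows. Lines (7) through (13) match the description just given; lines (2) through (6), which select and remove the unsettled node v of smallest dist, are our reading of the deleteMin operation of Section 5.9.

void dijkstra(GRAPH G, POT P, int *pLast)
{
    NODE u, v;
    LIST p;

(1)     initialize(G, P, pLast);
(2)     while ((*pLast) > 1) {
(3)         v = P[1];               /* the unsettled node of smallest dist */
(4)         swap(G, P, 1, *pLast);  /* move v to the end of the POT, */
(5)         --(*pLast);             /* shrink the POT, settling v, */
(6)         bubbleDown(G, P, 1, *pLast); /* and restore the POT property */
(7)         p = G[v].successors;
(8)         while (p != NULL) {
(9)             u = p->nodeName;
(10)            if (G[u].dist > G[v].dist + p->nodeLabel) {
(11)                G[u].dist = G[v].dist + p->nodeLabel;
(12)                bubbleUp(G, P, G[u].toPOT);
                }
(13)            p = p->next;
            }
        }
}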
EXERCISES
9.8.1: Find the shortest distance from Detroit to the other cities, according to the
graph of Fig. 9.21 (see the exercises for Section 9.4). If a city is unreachable from
Detroit, the minimum distance is “infinity.”
9.8.2: Sometimes, we wish to count the number of arcs traversed getting from one
node to another. For example, we might wish to minimize the number of transfers
needed in a plane or bus trip. If we label each arc 1, then a minimum-distance
calculation will count arcs. For the graph in Fig. 9.5 (see the exercises for Section
9.2), find the minimum number of arcs needed to reach each node from node a.
Australopithecus Afarensis AF
Australopithecus Africanus AA
Homo Habilis HH
Australopithecus Robustus AR
Homo Erectus HE
Australopithecus Boisei AB
Homo Sapiens HS

Fig. 9.48(a). Seven species of hominids and their abbreviations.
9.8.3: In Fig. 9.48(a) are seven species of hominids and their convenient abbrevia-
tions. Certain of these species are known to have preceded others because remains
have been found in the same place separated by layers indicating that time had
elapsed. The table in Fig. 9.48(b) gives triples (x, y, t) that mean species x has
been found in the same place as species y, but x appeared t millions of years before
y.
a) Draw a directed graph representing the data of Fig. 9.48, with arcs from the
earlier species to the later, labeled by the time difference.
b) Run Dijkstra’s algorithm on the graph from (a), with AF as the source, to find
the shortest time by which each of the other species could have followed AF.
9.8.4*: The implementation of Dijkstra’s algorithm that we gave takes O(m log n)
time, which is less than O(n²) time, except in the case that the number of arcs is close to n², its maximum possible number. If m is large, we can devise another
implementation, without a priority queue, where we take O(n) time to select the
winner at each round, but only O(mu ) time, that is, time proportional to the
number of arcs out of the settled node u, to update dist. The result is an O(n²)
time algorithm. Develop the ideas suggested here, and write a C program for this
implementation of Dijkstra’s algorithm.
9.8.5**: Dijkstra’s algorithm does not always work if there are negative labels
on some arcs. Give an example of a graph with some negative labels for which
Dijkstra’s algorithm gives the wrong answer for some minimum distance.
9.8.6**: Let G be a graph for which we have run Dijkstra’s algorithm and settled
the nodes in some order. Suppose we add to G an arc u → v with a weight of 0, to
form a new graph G′. Under what conditions will Dijkstra’s algorithm, run on G′, settle the nodes in the same order as it does for G?
9.8.7*: In this section we took the approach of linking the arrays representing the
graph G and the partially ordered tree by storing integers that were indices into the
other array. Another approach is to use pointers to array elements. Reimplement
Dijkstra’s algorithm using pointers instead of integer indices.
✦
✦ ✦
✦
9.9 Floyd’s Algorithm for Shortest Paths
If we want the minimum distances between all pairs of nodes in a graph with n
nodes, with nonnegative labels, we can run Dijkstra’s algorithm with each of the n
nodes as source. Since one run of Dijkstra’s algorithm takes O(m log n) time, where
m is the larger of the number of nodes and number of arcs, finding the minimum
distances between all pairs of nodes this way takes O(mn log n) time. Moreover,
if m is close to its maximum, n², we can use an O(n²)-time implementation of Dijkstra’s algorithm discussed in Exercise 9.8.4, which when run n times gives us an O(n³)-time algorithm to find the minimum distances between each pair of nodes.
There is another algorithm for finding the minimum distances between all pairs
of nodes, called Floyd’s algorithm. This algorithm takes O(n³) time, and thus is in principle no better than Dijkstra’s algorithm, and worse than Dijkstra’s algorithm when the number of arcs is much less than n². However, Floyd’s algorithm works
on an adjacency matrix, rather than adjacency lists, and it is conceptually much
simpler than Dijkstra’s algorithm.
The essence of Floyd’s algorithm is that we consider in turn each node u of the graph as a pivot. When u is the pivot, we try to take advantage of u as an
intermediate node between all pairs of nodes, as suggested in Fig. 9.49. For each
pair of nodes, say v and w, if the sum of the labels of arcs v → u and u → w, which
is d + e in Fig. 9.49, is less than the current label, f , of the arc from v to w, then
we replace f by d + e.
A fragment of code implementing Floyd’s algorithm is shown in Fig. 9.50. As
before, we assume nodes are named by integers starting at 0. We use NODE as the
type of nodes, but we assume this type is integers or an equivalent enumerated type.
We assume there is an n × n array arc, such that arc[v][w] is the label of the arc
v → w in the given graph. However, on the diagonal we have arc[v][v] = 0 for
all nodes v, even if there is an arc v → v. The reason is that the shortest distance
from a node to itself is always 0, and we do not wish to follow any arcs at all. If
there is no arc from v to w, then we let arc[v][w] be INFTY, a special value that is
much greater than any other label. There is a similar array dist that, at the end,
holds the minimum distances; dist[v][w] will become the minimum distance from node v to node w.

Fig. 9.49. Using node u as a pivot: if d + e, the sum of the labels of the arcs v → u and u → w, is less than f, the label of the arc v → w, then we replace f by d + e.
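The fragment of Fig. 9.50 can be reconstructed from the discussion that follows; we assume, as described above, that arc and dist are MAX × MAX arrays of floats, and the line numbers match the references to lines (1) through (8).

NODE u, v, w;

(1)     for (v = 0; v < MAX; v++)
(2)         for (w = 0; w < MAX; w++)
(3)             dist[v][w] = arc[v][w];
(4)     for (u = 0; u < MAX; u++)
(5)         for (v = 0; v < MAX; v++)
(6)             for (w = 0; w < MAX; w++)
(7)                 if (dist[v][u] + dist[u][w] < dist[v][w])
(8)                     dist[v][w] = dist[v][u] + dist[u][w];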
Lines (1) to (3) initialize dist to be arc. Lines (4) to (8) form a loop in which
each node u is taken in turn to be the pivot. For each pivot u, in a double loop on
v and w, we consider each pair of nodes. Line (7) tests whether it is shorter to go
from v to w through u than directly, and if so, line (8) lowers dist[v][w] to the
sum of the distances from v to u and from u to w.
✦ Example 9.27. Let us work with the graph of Fig. 9.10 from Section 9.3, using
the numbers 0 through 5 for the nodes; 0 is Laie, 1 is Kaneohe, and so on. Figure
9.51 shows the arc matrix, with label INFTY for any pair of nodes that do not have
a connecting edge. The arc matrix is also the initial value of the dist matrix.
Note that the graph of Fig. 9.10 is undirected, so the matrix is symmetric; that
Warshall’s Algorithm
Sometimes, we are only interested in telling whether there exists a path between two
nodes, rather than what the minimum distance is. If so, we can use an adjacency
matrix where the type of elements is BOOLEAN (int), with TRUE (1) indicating the
presence of an arc and FALSE (0) its absence. Similarly, the elements of the dist
matrix are of type BOOLEAN, with TRUE indicating the existence of a path and FALSE
indicating that no path between the two nodes in question is known. The only
modification we need to make to Floyd’s algorithm is to replace lines (7) and (8) of
Fig. 9.50 by
(7) if (dist[v][w] == FALSE)
(8) dist[v][w] = dist[v][u] && dist[u][w];
These lines will set dist[v][w] to TRUE, if it is not already TRUE, whenever both
dist[v][u] and dist[u][w] are TRUE.
The resulting algorithm, called Warshall’s algorithm, computes the reflexive
and transitive closure of a graph of n nodes in O(n³) time. That is never better than the O(nm) time that the method of Section 9.7 takes, where we used depth-first
search from each node. However, Warshall’s algorithm uses an adjacency matrix
rather than lists, and if m is near n², it may actually be more efficient than multiple
depth-first searches because of the simplicity of Warshall’s algorithm.
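In full, then, Warshall’s algorithm is the fragment of Fig. 9.50 with the test and assignment replaced; here arc and dist are BOOLEAN matrices, and we take arc[v][v] to be TRUE for every v, so that the closure computed is reflexive.

for (v = 0; v < MAX; v++)
    for (w = 0; w < MAX; w++)
        dist[v][w] = arc[v][w];
for (u = 0; u < MAX; u++)
    for (v = 0; v < MAX; v++)
        for (w = 0; w < MAX; w++)
            if (dist[v][w] == FALSE)
                dist[v][w] = dist[v][u] && dist[u][w];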
is, arc[v][w] = arc[w][v]. If the graph were directed, this symmetry might not
be present, but Floyd’s algorithm takes no advantage of symmetry, and thus works
for directed or undirected graphs.
0 1 2 3 4 5
0 0 24 INFTY INFTY INFTY 28
1 24 0 11 INFTY INFTY INFTY
2 INFTY 11 0 13 INFTY INFTY
3 INFTY INFTY 13 0 20 12
4 INFTY INFTY INFTY 20 0 15
5 28 INFTY INFTY 12 15 0
Fig. 9.51. The arc matrix, which is the initial value of the dist matrix.
The first pivot is u = 0. Since the sum of INFTY and anything is INFTY, the only
pair of nodes v and w, neither of which is u, for which dist[v][u] + dist[u][w] is
less than INFTY is v = 1 and w = 5, or vice versa.12 Since dist[1][5] is INFTY at
this time, we replace dist[1][5] by the sum of dist[1][0] + dist[0][5] which
is 52. Similarly, we replace dist[5][1] by 52. No other distances can be improved
with pivot 0, which leaves the dist matrix of Fig. 9.52.
12 If one of v and w is the u, it is easy to see dist[v][w] can never be improved by going
through u. Thus, we can ignore pairs of the form (v, u) or (u, w) when searching for pairs
whose distance is improved by going through the pivot u.
0 1 2 3 4 5
0 0 24 INFTY INFTY INFTY 28
1 24 0 11 INFTY INFTY 52
2 INFTY 11 0 13 INFTY INFTY
3 INFTY INFTY 13 0 20 12
4 INFTY INFTY INFTY 20 0 15
5 28 52 INFTY 12 15 0

Fig. 9.52. The dist matrix after using node 0 as the pivot.
Now we make node 1 the pivot. In the current dist, shown in Fig. 9.52, node 1
has noninfinite connections to 0 (distance 24), 2 (distance 11), and 5 (distance 52).
We can combine these edges to reduce the distance between nodes 0 and 2 from
INFTY to 24+11 = 35. Also, the distance between 2 and 5 is reduced to 11+52 = 63.
Note that 63 is the distance along the path from Honolulu, to Kaneohe, to Laie, to
Wahiawa, not the shortest way to get to Wahiawa, but the shortest way that only
goes through nodes that have been the pivot so far. Eventually, we shall find the
shorter route through Pearl City. The current dist matrix is shown in Fig. 9.53.
0 1 2 3 4 5
0 0 24 35 INFTY INFTY 28
1 24 0 11 INFTY INFTY 52
2 35 11 0 13 INFTY 63
3 INFTY INFTY 13 0 20 12
4 INFTY INFTY INFTY 20 0 15
5 28 52 63 12 15 0

Fig. 9.53. The dist matrix after using node 1 as the pivot.
13 The reader should compare Fig. 9.55 with Fig. 9.49. The latter shows how to use a pivot
node in the general case of a directed graph, where the arcs in and out of the pivot may have
different labels. Fig. 9.55 takes advantage of the symmetry in the example graph, letting us
use edges between node 3 and the other nodes to represent both arcs into node 3, as on the
left of Fig. 9.49, and arcs out of 3, as on the right of Fig. 9.49.
Using node 2 as the pivot improves two more entries: the distance between 0 and 3 falls to 35 + 13 = 48, and the distance between 1 and 3 falls to 11 + 13 = 24, giving the matrix of Fig. 9.54. Next, node 3 becomes the pivot; it currently connects to all five other nodes, as shown in Fig. 9.55, and pivoting on it yields the matrix of Fig. 9.56.

0 1 2 3 4 5
0 0 24 35 48 INFTY 28
1 24 0 11 24 INFTY 52
2 35 11 0 13 INFTY 63
3 48 24 13 0 20 12
4 INFTY INFTY INFTY 20 0 15
5 28 52 63 12 15 0

Fig. 9.54. The dist matrix after using node 2 as the pivot.
Fig. 9.55. Node 3 as the pivot; its current distances to the other nodes are 48 to node 0, 24 to node 1, 13 to node 2, 20 to node 4, and 12 to node 5.
0 1 2 3 4 5
0 0 24 35 48 68 28
1 24 0 11 24 44 36
2 35 11 0 13 33 25
3 48 24 13 0 20 12
4 68 44 33 20 0 15
5 28 36 25 12 15 0

Fig. 9.56. The dist matrix after using nodes 3 and 4 as pivots.
The use of 4 as a pivot does not improve any distances. When 5 is the pivot,
we can improve the distance between 0 and 3, since in Fig. 9.56,
dist[0][5] + dist[5][3] = 40
which is less than dist[0][3], or 48. In terms of cities, that corresponds to discov-
ering that it is shorter to go from Laie to Pearl City via Wahiawa than via Kaneohe
and Honolulu. Similarly, we can improve the distance between 0 and 4 to 43, from
68. The final dist matrix is shown in Fig. 9.57. ✦
0 1 2 3 4 5
0 0 24 35 40 43 28
1 24 0 11 24 44 36
2 35 11 0 13 33 25
3 40 24 13 0 20 12
4 43 44 33 20 0 15
5 28 36 25 12 15 0

Fig. 9.57. The final dist matrix, after using node 5 as the pivot.
Fig. 9.58. A k-path cannot have nodes higher than k, except (possibly) at the ends.

To see why Floyd’s algorithm works, define a k-path from v to w to be a path from v to w such that no intermediate node is numbered higher than k; the endpoints v and w themselves may be any nodes, as suggested in Fig. 9.58. Thus a (−1)-path has no intermediate nodes at all, and an (n − 1)-path is any path whatsoever. We can then prove the following statement by induction on k.
STATEMENT S(k): If labels of arcs are nonnegative, then just before we set u to
k + 1 in the loop of lines (4) to (8) of Fig. 9.50, dist[v][w] is the length of
the shortest k-path from v to w, or INFTY if there is no such path.
BASIS. The basis is k = −1. We set u to 0 just before we execute the body of the
loop for the first time. We have just initialized dist to be arc in lines (1) to (3).
Since the arcs and the paths consisting of a node by itself are the only (−1)-paths,
the basis holds.
INDUCTION. Assume S(k), and consider what happens to dist[v][w] during the
iteration of the loop with u = k + 1. Suppose P is a shortest (k + 1)-path from v
to w. There are two cases, depending on whether P goes through node k + 1.
1. If P is a k-path, that is, P does not actually go through node k + 1, then by
the inductive hypothesis, dist[v][w] already equals the length of P after the
kth iteration. We cannot change dist[v][w] during the round with k + 1 as
pivot, because there are no shorter (k + 1)-paths.
2. If P is a (k + 1)-path, we can assume that P only goes through node k + 1
once, because cycles can never decrease distances (recall we require all labels
to be nonnegative). Thus, P is composed of a k-path Q from v to node k + 1,
followed by a k-path R from node k + 1 to w, as suggested in Fig. 9.59. By the
inductive hypothesis, dist[v][k+1] and dist[k+1][w] will be the lengths of
paths Q and R, respectively, after the kth iteration.
Fig. 9.59. A (k + 1)-path P from v to w, composed of a k-path Q from v to node k + 1 followed by a k-path R from node k + 1 to w.
Note that neither dist[v][k+1] nor dist[k+1][w] can be changed in the (k + 1)st iteration. The reason is that all arc labels are nonnegative,
and so all lengths of paths are nonnegative; thus the test of line (7) in Fig. 9.50
must fail when u (i.e., node k + 1) is one of v or w.
Thus, when we apply the test of line (7) for arbitrary v and w, with u = k + 1,
the values of dist[v][k+1] and dist[k+1][w] have not changed since the end of
the kth iteration. That is to say, the test of line (7) compares the length of the shortest k-path from v to w with the sum of the lengths of the shortest k-paths from v to k + 1
and from k + 1 to w. In case (1), where path P does not go through k + 1, the
former will be the shorter, and in case (2), where P does go through k + 1, the
latter will be the sum of the lengths of the paths Q and R in Fig. 9.59, and will be
the shorter.
We conclude that the (k + 1)st iteration sets dist[v][w] to the length of the
shortest (k + 1)-path, for all nodes v and w. That is the statement S(k + 1), and
so we conclude the induction.
To finish our proof, we let k = n − 1. That is, we know that after finishing all
n iterations, dist[v][w] is the minimum distance of any (n − 1)-path from v to
w. But since any path is an (n − 1)-path, we have shown that dist[v][w] is the
minimum distance along any path from v to w.
EXERCISES
9.9.1: Assuming all arcs in Fig. 9.5 (see the exercises for Section 9.2) have label 1,
use Floyd’s algorithm to find the length of the shortest path between each pair of
nodes. Show the distance matrix after pivoting with each node.
9.9.2: Apply Warshall’s algorithm to the graph of Fig. 9.5 to compute its reflexive
and transitive closure. Show the reachability matrix after pivoting with each node.
9.9.3: Use Floyd’s algorithm to find the shortest distances between each pair of
cities in the graph of Michigan in Fig. 9.21 (see the exercises for Section 9.4).
9.9.4: Use Floyd’s algorithm to find the shortest possible time between each of the
hominid species in Fig. 9.48 (see the exercises for Section 9.8).
9.9.5: Sometimes we want to consider only paths of one or more arcs, and exclude
single nodes as paths. How can we modify the initialization of the arc matrix so
that only paths of length 1 or more will be considered when finding the shortest
path from a node to itself?
9.9.6*: Find all the acyclic 2-paths in Fig. 9.10.
9.9.7*: Why does Floyd’s algorithm not work when there are both positive and
negative costs on the arcs?
9.9.8**: Give an algorithm to find the longest acyclic path between two given
nodes.
9.9.9**: Suppose we run Floyd’s algorithm on a graph G. Then, we lower the label
of the arc u → v to 0, to construct the new graph G′ . For what pairs of nodes s and
t will dist[s][t] be the same at each round when Floyd’s algorithm is applied to
G and G′ ?
✦
✦ ✦
✦
9.10 An Introduction to Graph Theory
Graph theory is the branch of mathematics concerned with properties of graphs.
In the previous sections, we have presented the basic definitions of graph theory,
along with some fundamental algorithms that computer scientists have developed to
calculate key properties of graphs efficiently. We have seen algorithms for computing
shortest paths, spanning trees, and depth-first-search trees. In this section, we shall
present a few more important concepts from graph theory.
Complete Graphs
An undirected graph that has an edge between every pair of distinct nodes is called
a complete graph. The complete graph with n nodes is called Kn . Figure 9.60 shows
the complete graphs K1 through K4 .
Fig. 9.60. The complete graphs K1 through K4.
We can count the edges of Kn by considering an arbitrary edge {u, v} of Kn. For u we can pick any of the n nodes; for v we can pick any of the remaining n − 1 nodes. The total number of choices is therefore n(n − 1). However, we count each edge twice that way, once as {u, v} and a second time as {v, u}, so that we must divide the total number of choices by 2, giving n(n − 1)/2 edges. For example, K4 has 4 × 3/2 = 6 edges.
There is also a notion of a complete directed graph. This graph has an arc from every node to every other node, including itself. A complete directed graph with n nodes has n^2 arcs. Figure 9.61 shows the complete directed graph with 3 nodes and 9 arcs.
Fig. 9.61. The complete directed graph with three nodes.
Planar Graphs
An undirected graph is said to be planar if it is possible to place its nodes on a
plane and then draw its edges as continuous lines so that no two edges cross.
✦ Example 9.29. The graph K4 was drawn in Fig. 9.60 in such a way that its
two diagonal edges crossed. However, K4 is a planar graph, as we can see by the
drawing in Fig. 9.62. There, by redrawing one of the diagonals on the outside, we
avoid having any two edges cross. We say that Fig. 9.62 is a plane presentation of the graph K4, while the drawing in Fig. 9.60 is a nonplane presentation of K4.
Note that it is permissible to have edges that are not straight lines in a plane
presentation. ✦
Fig. 9.62. A plane presentation of K4.
In Figure 9.63 we see what are in a sense the two simplest nonplanar graphs, that is, graphs that do not have any plane presentation. One is K5, the complete graph with five nodes. The other is sometimes called K3,3; it is formed by taking
two groups of three nodes and connecting each node of one group to each node of
the other group, but not to nodes of the same group. The reader should try to
redraw each of these graphs so that no two edges cross, just to get a feel for why
they are not planar.
Fig. 9.63. The two simplest nonplanar graphs, K5 and K3,3.
Applications of Planarity
Planarity has considerable importance in computer science. For example, many
graphs or similar diagrams need to be presented on a computer screen or on paper.
For clarity, it is desirable to make a plane presentation of the graph, or if the graph
is not planar, to make as few crossings of edges as possible.
The reader may observe that in Chapter 13 we draw some fairly complex dia-
grams of circuits, which are really graphs whose nodes are gates and junction points
of wires, and whose edges are the wires. Since these circuits are not planar in gen-
eral, we had to adopt a convention in which wires were allowed to cross without
connecting, and a dot signals a connection of wires.
A related application concerns the design of integrated circuits. Integrated
circuits, or “chips,” embody logical circuits such as those discussed in Chapter 13.
They do not require that the logical circuit have a plane presentation, but there is a similar restriction: the edges are assigned to several “levels,” often three or four. Within one level, the graph of the circuit must have a plane presentation; that is, edges on the same level are not allowed to cross. However, edges on different levels may cross.
Graph Coloring
The problem of graph coloring for a graph G is to assign a “color” to each node
so that no two nodes that are connected by an edge are assigned the same color. We may then ask how many distinct colors are required to color a graph in this sense. The minimum number of colors needed for a graph G is called the chromatic number of G, often denoted χ(G). A graph that can be colored with no more than k colors is called k-colorable.
Fig. 9.64. A graph colored with four colors: r, g, b, and y.
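Finding the chromatic number exactly is computationally hard in general, but a simple greedy heuristic always produces some legal coloring, and so illustrates the definition. The sketch below is ours, not the book’s; the adjacency-matrix representation and all names are assumptions.

#define N 5   /* number of nodes; an assumption for this sketch */

/* Assign color numbers so that no two adjacent nodes share a color.
   adj[u][v] is 1 if there is an edge {u, v}, else 0. */
void greedyColor(int adj[N][N], int color[N])
{
    int u, v, c;
    int used[N];                        /* colors taken by earlier neighbors */

    for (u = 0; u < N; u++) {
        for (c = 0; c < N; c++)
            used[c] = 0;
        for (v = 0; v < u; v++)         /* nodes 0..u-1 are already colored */
            if (adj[u][v])
                used[color[v]] = 1;
        for (c = 0; used[c]; c++)       /* smallest color not used by a neighbor */
            ;
        color[u] = c;
    }
}

The greedy method never uses more than d + 1 colors, where d is the largest degree of any node, but it may use more colors than the chromatic number requires.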
Cliques
A clique in an undirected graph G is a set of nodes such that there is in G an edge between every pair of nodes in the set. A clique of k nodes is called a k-clique. The size of the largest clique in a graph is called the clique number of that graph.
EXERCISES
✦
✦ ✦
✦
9.11 Summary of Chapter 9
The table of Fig. 9.65 summarizes the various problems we have addressed in this
chapter, the algorithms for solving them, and the running time of the algorithms.
In this table, n is the number of nodes in the graph and m is the larger of the
number of nodes and the number of arcs/edges. Unless otherwise noted, we assume
graphs are represented by adjacency lists.
In addition, we have introduced the reader to most of the key concepts of graph theory, including paths, cycles, connectivity, trees, planarity, graph coloring, and cliques.
✦
✦ ✦
✦
9.12 Bibliographic Notes for Chapter 9
For additional material on graph algorithms, see Aho, Hopcroft, and Ullman [1974,
1983]. Depth-first search was first used to create efficient graph algorithms by
Hopcroft and Tarjan [1973]. Dijkstra’s algorithm is from Dijkstra [1959], Floyd’s
algorithm from Floyd [1962], Kruskal’s algorithm from Kruskal [1956], and War-
shall’s algorithm from Warshall [1962].
Berge [1962] covers the mathematical theory of graphs. Lawler [1976], Pa-
padimitriou and Steiglitz [1982], and Tarjan [1983] present advanced graph opti-
mization techniques.
Aho, A. V., J. E. Hopcroft, and J. D. Ullman [1974]. The Design and Analysis of
Computer Algorithms, Addison-Wesley, Reading, Mass.
Aho, A. V., J. E. Hopcroft, and J. D. Ullman [1983]. Data Structures and Algo-
rithms, Addison-Wesley, Reading, Mass.
Berge, C. [1962]. The Theory of Graphs and its Applications, Wiley, New York.
Dijkstra, E. W. [1959]. “A note on two problems in connexion with graphs,” Numerische Mathematik 1, pp. 269–271.
Floyd, R. W. [1962]. “Algorithm 97: shortest path,” Comm. ACM 5:6, p. 345.
Hopcroft, J. E., and R. E. Tarjan [1973]. “Efficient algorithms for graph manipulation,” Comm. ACM 16:6, pp. 372–378.
Kruskal, J. B., Jr. [1956]. “On the shortest spanning subtree of a graph and the
traveling salesman problem,” Proc. AMS 7:1, pp. 48–50.
Lawler, E. [1976]. Combinatorial Optimization: Networks and Matroids, Holt,
Rinehart and Winston, New York.
Papadimitriou, C. H., and K. Steiglitz [1982]. Combinatorial Optimization: Algo-
rithms and Complexity, Prentice-Hall, Englewood Cliffs, New Jersey.
Tarjan, R. E. [1983]. Data Structures and Network Algorithms, SIAM, Philadelphia.
Warshall, S. [1962]. “A theorem on Boolean matrices,” J. ACM 9:1, pp. 11–12.
CHAPTER 10
✦
✦ ✦
✦
Patterns, Automata, and Regular Expressions
A pattern is a set of objects with some recognizable property. One type of pattern
is a set of character strings, such as the set of legal C identifiers, each of which is
a string of letters, digits, and underscores, beginning with a letter or underscore.
Another example would be the set of arrays of 0’s and 1’s of a given size that
a character reader might interpret as representing the same symbol. Figure 10.1
shows three 7 × 7-arrays that might be interpreted as letter A’s. The set of all such
arrays would constitute the pattern called “A.”
Fig. 10.1. Three 7 × 7 arrays of 0’s and 1’s, each of which might be interpreted as the letter A.
The two fundamental problems associated with patterns are their definition
and their recognition, subjects of this and the next chapter. Recognizing patterns
is an integral part of tasks such as optical character recognition, an example of
which was suggested by Fig. 10.1. In some applications, pattern recognition is a
component of a larger problem. For example, recognizing patterns in programs
is an essential part of compiling — that is, the translation of programs from one
language, such as C, into another, such as machine language.
There are many other examples of pattern use in computer science. Patterns
play a key role in the design of the electronic circuits used to build computers and
other digital devices. They are used in text editors to allow us to search for instances
of specific words or sets of character strings, such as “the letters if followed by any
sequence of characters followed by then.” Most operating systems allow us to use
patterns in commands; for example, the UNIX command “ls *tex” lists all files
whose names end with the three-character sequence “tex”.
An extensive body of knowledge has developed around the definition and recog-
nition of patterns. This theory is called “automata theory” or “language theory,”
and its basic definitions and techniques are part of the core of computer science.
✦
✦ ✦
✦
10.1 What This Chapter Is About
This chapter deals with patterns consisting of sets of strings. In it, we shall learn:
✦ The “finite automaton” is a graph-based way of specifying patterns. These
come in two varieties, deterministic automata (Section 10.2) and nondetermin-
istic automata (Section 10.3).
✦ A deterministic automaton is convertible in a simple way into a program that
recognizes its pattern (Section 10.2).
✦ A nondeterministic automaton can be converted to a deterministic automaton
recognizing the same pattern by use of the “subset construction” discussed in
Section 10.4.
✦ Regular expressions are an algebra for describing the same kinds of patterns
that can be described by automata (Sections 10.5 through 10.7).
✦ Regular expressions can be converted to automata (Section 10.8) and vice versa
(Section 10.9).
We also discuss string patterns in the next chapter. There we introduce a
recursive notation called “context-free grammars” for defining patterns. We shall
see that this notation is able to describe patterns not expressible by automata
or regular expressions. However, in many cases grammars are not convertible to
programs in as simple a manner as are automata or regular expressions.
✦
✦ ✦
✦
10.2 State Machines and Automata
Programs that search for patterns often have a special structure. We can identify
certain positions in the code at which we know something particular about the
program’s progress toward its goal of finding an instance of a pattern. We call these
State positions states. The overall behavior of the program can be viewed as moving from
state to state as it reads its input.
To make these ideas more concrete, let us consider a specific pattern-matching
problem: “What English words contain the five vowels in order?” To help answer
this question, we can use a word list that is found with many operating systems. For
example, in the UNIX system one can find such a list in the file /usr/dict/words,
where the commonly used words of English appear one to a line. In this file, some
of the words that contain the vowels in order are
abstemious
facetious
sacrilegious
#include <stdio.h>

#define TRUE 1
#define FALSE 0
typedef int BOOLEAN;

/* Advance *pp past the first occurrence of c; return FALSE if the
   end of the string is reached without finding c. */
BOOLEAN findChar(char **pp, char c)
{
    while (**pp != c && **pp != '\0')
        (*pp)++;
    if (**pp == '\0')
        return FALSE;
    else {
        (*pp)++;
        return TRUE;
    }
}

BOOLEAN testWord(char *p)
{
    /* state 0 */
    if (findChar(&p, 'a'))
        /* state 1 */
        if (findChar(&p, 'e'))
            /* state 2 */
            if (findChar(&p, 'i'))
                /* state 3 */
                if (findChar(&p, 'o'))
                    /* state 4 */
                    if (findChar(&p, 'u'))
                        /* state 5 */
                        return TRUE;
    return FALSE;
}

main()
{
    printf("%d\n", testWord("abstemious"));
}

Fig. 10.2. A program to test whether a word contains the five vowels in order.
The picture representing a program’s states is a directed graph, whose arcs are
labeled by sets of characters. There is an arc from state s to state t, labeled by the
set of characters C, if, when in state s, we go to state t exactly when we see one of
Transition the characters in set C. The arcs are called transitions. If x is one of the characters
in set C, which labels the transition from state s to state t, then if we are in state s
and receive an x as our next character, we say we “make a transition on x to state
t.” In the common case that set C is a singleton {x}, we shall label the arc by x,
rather than {x}.
We also label certain of the nodes accepting states. When we reach one of these
states, we have found our pattern and “accept.” Conventionally, accepting states
are represented by double circles. Finally, one of the nodes is designated the start
state, the state in which we begin to recognize the pattern. We indicate the start
state by an arrow entering from nowhere. Such a graph is called a finite automaton, or just automaton. We see an example of an automaton in Fig. 10.3.
The behavior of an automaton is conceptually simple. We imagine that an
automaton receives a list of characters known as the input sequence. It begins in
the start state, about to read the first character of the input sequence. Depending
on that character, it makes a transition, perhaps to the same state, or perhaps
to another state. The transition is dictated by the graph of the automaton. The
automaton then reads the second character and makes the proper transition, and
so on.
Fig. 10.3. Automaton to recognize sequences of letters that have subsequence aeiou.
✦ Example 10.2. Our next example is from signal processing. Instead of regard-
ing all characters as potential inputs for an automaton, we shall allow only inputs 0
Bounce filter and 1. The particular automaton we shall design, sometimes called a bounce filter,
takes a sequence of 0’s and 1’s as inputs. The object is to “smooth” the sequence
by regarding a single 0 surrounded by 1’s as “noise,” and replacing the 0 by 1.
Similarly, one 1 surrounded by 0’s will be regarded as noise and replaced by 0.
As an example of how a bounce filter might be used, we could be scanning
a digitized black-and-white image, line by line. Each line of the image is, in fact, a
sequence of 0’s and 1’s. Since pictures sometimes do have small spots of the wrong
color, due, for example, to imperfections in the film or the photography process, it
is useful to get rid of such spots, in order to reduce the number of distinct regions
in the image and allow us to concentrate on “real” features, rather than spurious
ones.
Figure 10.4 is the automaton for our bounce filter. The interpretations of the
four states are as follows:
a) We have just seen a sequence of 0’s, at least two in a row.
b) We have just seen a sequence of 0’s followed by a single 1.
c) We have just seen a sequence of at least two 1’s.
d) We have just seen a sequence of 1’s followed by a single 0.
State a is designated the start state, which implies that our automaton will behave
as if there were an unseen prefix of 0’s prior to the input.
Fig. 10.4. Automaton for the bounce filter.
The accepting states are c and d. For this automaton, acceptance has a some-
what different significance from that of the automaton of Fig. 10.3. There, when we
reached the accepting state, we said that the whole input was accepted, including
characters the automaton had not even read yet.1 Here, we want an accepting state
to say “output a 1,” and a nonaccepting state to say “output a 0.” Under this in-
terpretation, we shall translate each bit in the input to a bit of the output. Usually,
the output will be the same as the input, but sometimes it will differ. For instance,
Fig. 10.5 shows the input, states, and their outputs when the input is 0101101.
Input:  0 1 0 1 1 0 1
State:  a a b a b c d c
Output: 0 0 0 0 0 1 1 1

Fig. 10.5. The input, states, and outputs of the bounce filter on input 0101101.
1 However, we could have modified the automaton to read all letters following the u, by adding
a transition from state 5 to itself on all letters.
EXERCISES
10.2.1: Design automata to read strings of 0’s and 1’s, and
a) Determine if the sequence read so far has even parity (i.e., there have been an
even number of 1’s). Specifically, the automaton accepts if the string so far has
even parity and rejects if it has odd parity.
b) Check that there are no more than two consecutive 1’s. That is, accept unless
111 is a substring of the input string read so far.
What is the intuitive meaning of each of your states?
10.2.2: Indicate the sequence of states and the outputs when your automata from
Exercise 10.2.1 are given the input 101001101110.
10.2.3*: Design an automaton that reads a word (character string) and tells
whether the letters of the word are in sorted order. For example, adept and chilly
have their letters in sorted order; baby does not, because an a follows the first b.
The word must be terminated by a blank, so that the automaton will know when
it has read all the characters. (Unlike Example 10.1, here we must not accept until
we have seen all the characters, that is, until we reach the blank at the end of the
word.) How many states do you need? What are their intuitive meanings? How
many transitions are there out of each state? How many accepting states are there?
10.2.4: Design an automaton that tells whether a character string is a legal C
identifier (letter followed by letters, digits, or underscore) followed by a blank.
10.2.5: Write C programs to implement each of the automata of Exercises 10.2.1
through 10.2.4.
10.2.6: Design an automaton that tells whether a given character string is one of
the third-person singular pronouns, he, his, him, she, her, or hers, followed by a
blank.
10.2.7*: Convert your automaton from Exercise 10.2.6 into a C function and use
it in a program to find all places where the third-person singular pronouns appear
as substrings of a given string.
✦
✦ ✦
✦
10.3 Deterministic and Nondeterministic Automata
One of the most basic operations using an automaton is to take a sequence of
symbols a1 a2 · · · ak and follow from the start state a path whose arcs have labels
that include these symbols in order. That is, for i = 1, 2, . . . , k, ai is a member of the
set Si that labels the ith arc of the path. Constructing this path and its sequence
of states is called simulating the automaton on the input sequence a1 a2 · · · ak . This
path is said to have the label a1 a2 · · · ak ; it may also have other labels, of course,
since the sets Si labeling the arcs along the path may each include many characters.
✦ Example 10.3. We did one such simulation in Fig. 10.5, where we followed the
automaton of Fig. 10.4 on input sequence 0101101. For another example, consider
the automaton of Fig. 10.3, which we used to recognize words with subsequence
aeiou. Consider the character string adept.
We start in state 0. There are two transitions out of state 0, one on the set of
characters Λ − a, and the other on a alone. Since the first character in adept is a,
we follow the latter transition, which takes us to state 1. Out of state 1, there are
transitions on Λ − e and e. Since the second character is d, we must take the former
transition, because all letters but e are included in the set Λ − e. That leaves us
in state 1 again. As the third character is e, we follow the second transition out of
state 1, which takes us to state 2. The final two letters of adept are both included
in the set Λ − i, and so our next two transitions are from state 2 to state 2. We
thus finish our processing of adept in state 2. The sequence of state transitions is
shown in Fig. 10.6. Since state 2 is not an accepting state, we do not accept the
input adept. ✦
Input: a d e p t
State: 0 1 1 2 2 2

Fig. 10.6. Simulating the automaton of Fig. 10.3 on input adept.
Deterministic Automata
The automata discussed in the previous section have an important property. For
any state s and any input character x, there is at most one transition out of state
s whose label includes x. Such an automaton is said to be deterministic.
Simulating a deterministic automaton on a given input sequence is straight-
forward. In any state s, given the next input character x, we consider each of the
labels of the transitions out of s. If we find a transition whose label includes x,
then that transition points to the proper next state. If none includes x, then the
automaton “dies,” and cannot process any more input, just as the automaton of
Fig. 10.3 dies after it reaches state 5, because it knows it already has found the
subsequence aeiou.
It is easy to convert a deterministic automaton into a program. We create a
piece of code for each state. The code for state s examines its input and decides
which of the transitions out of s, if any, should be followed. If a transition from
state s to state t is selected, then the code for state s must arrange for the code of
state t to be executed next, perhaps by using a goto-statement.
void bounce()
{
    int x;    /* int, not char, so that EOF from getchar() is representable */

    /* state a */
a:  putchar('0');
    x = getchar();
    if (x == '0') goto a; /* transition to state a */
    if (x == '1') goto b; /* transition to state b */
    goto finis;
    /* state b */
b:  putchar('0');
    x = getchar();
    if (x == '0') goto a; /* transition to state a */
    if (x == '1') goto c; /* transition to state c */
    goto finis;
    /* state c */
c:  putchar('1');
    x = getchar();
    if (x == '0') goto d; /* transition to state d */
    if (x == '1') goto c; /* transition to state c */
    goto finis;
    /* state d */
d:  putchar('1');
    x = getchar();
    if (x == '0') goto a; /* transition to state a */
    if (x == '1') goto c; /* transition to state c */
    goto finis;
finis: ;
}

Fig. 10.7. Function to simulate the bounce-filter automaton of Fig. 10.4.
The code is shown in Fig. 10.7. For instance, in state a we print the character
0, because a is a nonaccepting state. If the input character is 0, we stay in state a,
and if the input character is 1, we go to state b. ✦
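The goto style of Fig. 10.7 mirrors the automaton’s graph directly. An alternative, which the book does not show at this point, is to drive a single loop from a transition table; the state numbering (a = 0, b = 1, c = 2, d = 3) and all names in this sketch are our assumptions.

#include <stdio.h>

int main(void)
{
    /* next[s][x] is the state entered from state s on input digit x;
       states 0, 1, 2, 3 play the roles of a, b, c, d in Fig. 10.4. */
    static const int next[4][2] = {
        {0, 1},   /* a: on 0 stay in a; on 1 go to b */
        {0, 2},   /* b: on 0 go to a;  on 1 go to c */
        {3, 2},   /* c: on 0 go to d;  on 1 stay in c */
        {0, 2}    /* d: on 0 go to a;  on 1 go to c */
    };
    static const char out[4] = {'0', '0', '1', '1'};   /* output per state */
    int s = 0, x;

    for (;;) {
        putchar(out[s]);          /* emit on entering a state, as in Fig. 10.7 */
        x = getchar();
        if (x != '0' && x != '1')
            break;
        s = next[s][x - '0'];
    }
    return 0;
}

On the input 0101101 this program produces 00000111, matching the simulation of Fig. 10.5.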
Fig. 10.8. A state with two transitions out whose labels both contain the symbol x, one to state t and one to state u; an automaton with such a state is nondeterministic.
Nondeterministic Automata
Nondeterministic automata are allowed (but not required) to have two or more
transitions containing the same symbol out of one state. Note that a deterministic
automaton is technically a nondeterministic automaton as well, one that happens
not to have multiple transitions on one symbol. An “automaton” is in general
nondeterministic, but we shall use “nondeterministic automaton” when we want to
emphasize that the automaton need not be a deterministic automaton.
Nondeterministic automata are not directly implementable by programs, as
we mentioned, but they are useful conceptual tools for a number of applications
that we shall discuss. Moreover, by using the “subset construction,” to be covered
in the next section, it is possible to convert any nondeterministic automaton to a
deterministic automaton that accepts the same set of character strings.
✦ Example 10.5. The League Against Sexist Speech (LASS) wishes to catch
sexist writing that contains the word “man.” They not only want to catch constructs
such as “ombudsman,” but more subtle forms of discrimination such as “maniac” or
“emancipate.” LASS plans to design a program using an automaton; that program
will scan character strings and “accept” when it finds the character string man
anywhere within the input.
Fig. 10.9. Deterministic automaton that recognizes most, but not all,
strings ending in man.
One might first try a deterministic automaton like that shown in Fig. 10.9. In
this automaton, state 0, the start state, represents the case in which we have not
begun to see the letters of “man.” State 1 is intended to represent the situation in
which we have seen an m; in state 2 we have recognized ma, and in state 3 we have
seen man. In states 0, 1, and 2, if we fail to see the hoped-for letter, we go back to
state 0 and try again.
However, Fig. 10.9 does not quite work correctly. On an input string such as
command, it stays in state 0 while reading the c and o. It goes to state 1 on reading
the first m, but the second m takes it back to state 0, which it does not subsequently
leave.
A nondeterministic automaton that correctly recognizes character strings with
an embedded man is shown in Fig. 10.10. The key innovation is that in state 0 we
guess whether an m marks the beginning of man or not. Since the automaton is
nondeterministic, it is allowed to guess both “yes” (represented by the transition
from state 0 to state 1) and “no” (represented by the fact that the transition from
state 0 to state 0 can be performed on all letters, including m) at the same time.
Fig. 10.10. Nondeterministic automaton that recognizes all strings ending in man.
Input:  c  o  m  m  a  n  d
States: 0  0  0  0  0  0  0  0
                 1  1  2  3

Fig. 10.11. Simulating the automaton of Fig. 10.10 on input command.
Since state 3 is an accepting state, we accept at that point.2 The fact that we
are also in state 0 after seeing comman is irrelevant as far as acceptance is concerned.
The final transition is on input d, from state 0 to 0. Note that state 3 goes nowhere
on any input, and so that branch dies.
Also note that the transitions back to state 0, which were present in Fig. 10.9
to handle the case in which the next character of the word man was not received,
are unnecessary in Fig. 10.10, because in Fig. 10.10 we are not compelled to follow
the sequence from state 0 to 1 to 2 to 3 when we see input man. Thus, although
state 3 looks like it is “dead” and ends computation when we see man, we are also
in state 0 upon seeing man. That state allows us to accept inputs like manoman by
staying in state 0 during the first man and going through states 1, 2, and 3 when
the second man is read. ✦
2 Notice that the automaton of Fig. 10.10, like that of Fig. 10.3, accepts when it sees the
pattern it is looking for, not at the end of the word. When we eventually convert Fig. 10.10
to a deterministic automaton, we can design from it a program that prints the entire word,
like the program of Fig. 10.2.
Of course, the design of Fig. 10.10, while appealing, cannot be turned into
a program directly. We shall see in the next section how it is possible to turn
Fig. 10.10 into a deterministic automaton with only four states. This deterministic
automaton, unlike that of Fig. 10.9, will correctly recognize all occurrences of man.
While we can convert any nondeterministic automaton into a deterministic one,
we are not always as fortunate as we are in the case of Fig. 10.10. In that case, the
corresponding deterministic automaton will be seen to have no more states than the
nondeterministic automaton, four states for each. There are other nondeterministic
automata whose corresponding deterministic automata have many more states. A
nondeterministic automaton with n states might be convertible only into a deter-
ministic automaton with 2^n states. The next example happens to be one in which
the deterministic automaton has many more states than the nondeterministic one.
Consequently, a nondeterministic automaton may be considerably easier to design
than a deterministic automaton for the same problem.
✦ Example 10.6. When Peter Ullman, the son of one of the authors, was in fourth grade, he had a teacher who tried to build the students’ vocabularies by assigning them “partial anagram” problems. Each week they would be given a word and were asked to find all the words that could be made using one or more of its letters.
One week, when the word was “Washington,” the two authors of this book
got together and decided to do an exhaustive search to see how many words were
possible. Using the file /usr/dict/words and a three-step procedure, we found 269
words. Among them were five 7-letter words:
agonist
goatish
showing
washing
wasting
Since the case of a letter is not significant for this problem, our first step was
to translate all upper-case letters in the dictionary into lower case. A program to
carry out this task is straightforward.
Our second step was to select the words that contain only characters from the
set S = {a,g,h,i,n,o,s,t,w}, the letters in washington. A simple, deterministic
automaton can do this task; one is shown in Fig. 10.12. The newline character is
the character that marks the ends of lines in /usr/dict/words. In Fig. 10.12, we
stay in state 0 as long as we see letters that appear in washington. If we encounter
any other character besides newline, there is no transition, and the automaton
can never reach the accepting state 1. If we encounter newline after reading only
letters in washington, then we make the transition from state 0 to state 1, and accept.

Fig. 10.12. Automaton that accepts words, terminated by a newline, whose letters all appear in washington.
The automaton in Fig. 10.12 accepts words such as hash that have more occur-
rences of some letter than are found in the word washington itself. Our third and
final step, therefore, was to eliminate those words that contain three or more n’s or
two or more of another of the characters in set S. This task can also be done by
an automaton. For example, the automaton in Fig. 10.13 accepts words that have
at least two a’s. We stay in state 0 until we see an a, whereupon we go to state 1.
We stay there until we see a second a; at that point we go to state 2 and accept.
This automaton accepts those words that fail to be partial anagrams of washington because they have too many a’s. In this case, the words we want are exactly those that never cause the automaton to enter the accepting state 2 at any time during their processing.
Fig. 10.13. Automaton that accepts words with at least two a’s.
Fig. 10.14. Nondeterministic automaton that detects words with more than one
a, g, h, i, o, s, t, or w, or more than two n’s.
Input: s h i n i n g
Sets of states after each input, starting from {0}:
{0, 14}, {0, 5, 14}, {0, 5, 7, 14}, {0, 5, 7, 9, 14},
{0, 5, 7, 8, 9, 14}, {0, 5, 7, 9, 10, 14}, {0, 3, 5, 7, 9, 10, 14}

Fig. 10.15. Simulating the automaton of Fig. 10.14 on input shining.
To review, our procedure had three steps:

1. We translated all upper-case letters in the dictionary into lower case.

2. We found all resulting words that are accepted by the automaton of Fig. 10.12
and therefore consist only of letters in washington.
3. We removed from the list created in (2) all those words that are accepted by
the nondeterministic automaton of Fig. 10.14.
This algorithm is a straightforward way to find all the partial anagrams of “Wash-
ington” in the file /usr/dict/words. Of course, we must find some reasonable way
to simulate the nondeterministic automaton of Fig. 10.14, and we shall discuss how
that can be done in the next sections. ✦
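One reasonable way to simulate a nondeterministic automaton, anticipating the subset idea of the next section, is to carry along the entire set of states the automaton could be in, represented as bits of an integer. The sketch below is ours, not the book’s; it simulates the small automaton of Fig. 10.10 to keep the transition code short, but the same representation would put all twenty states of Fig. 10.14 into one 32-bit word.

#include <stdio.h>

/* Simulate the nondeterministic automaton of Fig. 10.10: bit i of
   "states" is set exactly when the automaton could be in state i. */
int acceptsMan(const char *s)
{
    int states = 0x1;                      /* start in the set {0} */

    for ( ; *s != '\0'; s++) {
        /* state 0 loops to itself on every letter, so it stays in the set */
        int next = 0x1;
        if ((states & 0x1) && *s == 'm') next |= 0x2;   /* 0 -> 1 on m */
        if ((states & 0x2) && *s == 'a') next |= 0x4;   /* 1 -> 2 on a */
        if ((states & 0x4) && *s == 'n') next |= 0x8;   /* 2 -> 3 on n */
        states = next;
        if (states & 0x8)                  /* state 3 is accepting */
            return 1;
    }
    return 0;
}

int main(void)
{
    printf("%d\n", acceptsMan("command"));   /* prints 1 */
    printf("%d\n", acceptsMan("commas"));    /* prints 0 */
    return 0;
}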
EXERCISES
10.3.5: Simulate the automata of Figs. 10.9 and 10.10 on the input string summand.
10.3.6: Simulate the automaton of Fig. 10.14 on input strings
a) saint
b) antagonist
c) hashish
Which are accepted?
10.3.7: We can represent an automaton by a relation with attributes State, Input,
and Next. The intent is that if (s, x, t) is a tuple, then input symbol x is a label of
the transition from state s to state t. If the automaton is deterministic, what is a
suitable key for the relation? What if the automaton is nondeterministic?
10.3.8: What data structures would you suggest for representing the relation of
the previous exercise, if we only wanted to find the next state(s), given a state and
an input symbol?
10.3.9: Represent the automata of
a) Fig. 10.10
b) Fig. 10.9
c) Fig. 10.14
as relations. You may use ellipses to represent the transitions on large sets of letters
such as Λ − m.
✦
✦ ✦
✦
10.4 From Nondeterminism to Determinism
In this section we shall see that every nondeterministic automaton can be replaced
by a deterministic one. As we have seen, it is sometimes easier to think of a non-
deterministic automaton to perform a certain task. However, because we cannot
write programs from nondeterministic automata as readily as from deterministic
machines, it is quite important that there is an algorithm to transform a nondeter-
ministic automaton into an equivalent deterministic one.
Equivalence of Automata
In the previous sections, we have seen two views of acceptance. In some examples,
such as Example 10.1 (words containing the subsequence aeiou), we took accep-
tance to mean that the entire word was accepted, even though we may not have
scanned the entire word yet. In others, like the bounce filter of Example 10.2, or the
automaton of Fig. 10.12 (words whose letters are all in washington), we accepted
only when we wanted to signal approval of the exact input that we had seen since we
started the automaton. Thus in Example 10.2 we accepted all sequences of inputs
that result in a 1 output. In Fig. 10.12, we accepted only when we had seen the
newline character, and thus knew that the entire word had been seen.
When we talk about the formal behavior of automata, we require only the
second interpretation (the input so far is accepted). Formally, suppose A and B
are two automata (deterministic or not). We say A and B are equivalent if they
accept the same set of input strings. Put another way, if a1 a2 · · · ak is any string of
symbols, then the following two conditions hold:
1. If there is a path labeled a1 a2 · · · ak from the start state of A to some accepting
state of A, then there is also a path labeled a1 a2 · · · ak from the start state of
B to some accepting state of B, and
2. If there is a path labeled a1 a2 · · · ak from the start state of B to some accepting
state of B, then there is also a path labeled a1 a2 · · · ak from the start state of
A to some accepting state of A.
✦ Example 10.7. Consider the automata of Figs. 10.9 and 10.10. As we noted
from Fig. 10.11, the automaton of Fig. 10.10 accepts the input string comman, be-
cause this sequence of characters labels the path 0 → 0 → 0 → 0 → 1 → 2 → 3 in
Fig. 10.10, and this path goes from the start state to an accepting state. However,
in the automaton of Fig. 10.9, which is deterministic, we can check that the only
path labeled comman is 0 → 0 → 0 → 1 → 0 → 0 → 0. Thus if Fig. 10.9 is au-
tomaton A, and Fig. 10.10 is automaton B, we have a violation of point (2) above,
which tells us that these two automata are not equivalent. ✦
The key idea of the construction is that the deterministic automaton keeps track of all the states the nondeterministic automaton could be in. That is, after reading some input list a1 a2 · · · ak , the nondeterministic automaton is “in” those states that are reached from the start state along paths labeled a1 a2 · · · ak .
✦ Example 10.8. After reading input string shin, the automaton illustrated in
Fig. 10.15 is in the set of states {0, 5, 7, 9, 14}. These are the states that appear
in the column just after the first n. After reading the next i, it is in the set
of states {0, 5, 7, 8, 9, 14}, and after reading the following n, it is in set of states
{0, 5, 7, 9, 10, 14}. ✦
Fig. 10.16. From the set of states S = {0, 5, 7, 9, 14}, input i leads to the set T = {0, 5, 7, 8, 9, 14}, and input n then leads to the set U = {0, 5, 7, 9, 10, 14}.
The accepting states of D are those sets of N ’s states that include at least one
accepting state of N . That makes intuitive sense. If S is a state of D and a set of
N ’s states, then the inputs a1 a2 · · · ak that take D from its start state to state S
also take N from its start state to all of the states in S. If S includes an accepting
state, then a1 a2 · · · ak is accepted by N , and D must also accept. Since D enters
only state S on receiving input a1 a2 · · · ak , S must be an accepting state of D.
Fig. 10.17. Nondeterministic automaton that recognizes the strings ending in man.
Fig. 10.18. The start of the subset construction for Fig. 10.17: state {0} has a transition to itself on Λ − m and a transition on m to the new state {0, 1}.
Next, we must consider the transitions out of {0, 1}. Examining Fig. 10.17
again, we see that on all inputs except m and a, state 0 goes only to 0, and state 1
goes nowhere. Thus there is a transition from state {0, 1} to state {0}, labeled by
all the letters except m and a. On input m, state 1 again goes nowhere, but 0 goes to
both 0 and 1. Hence there is a transition from {0, 1} to itself, labeled by m. Finally,
on input a, state 0 goes only to itself, but state 1 goes to state 2. Thus there is a
transition labeled a from state {0, 1} to state {0, 2}. The portion of D constructed
so far is shown in Fig. 10.19.
Fig. 10.19. The portion of the deterministic automaton constructed so far: {0} loops on Λ − m and goes to {0, 1} on m; {0, 1} loops on m, goes to {0, 2} on a, and returns to {0} on Λ − m − a.
Now we need to construct the transitions out of state {0, 2}. On all inputs
except m and n, state 0 goes only to 0, and state 2 goes nowhere, and so there is a
transition from {0, 2} to {0}, labeled by all the letters except m and n. On input
m, state 2 goes nowhere, and 0 goes to both 0 and 1, and so there is a transition
from {0, 2} to {0, 1} labeled m. On input n, state 0 goes only to itself and 2 goes
to 3. Thus there is a transition labeled n from {0, 2} to {0, 3}. This state of D is
accepting, since it includes the accepting state of Fig. 10.17, state 3.
Finally, we must supply the transitions out of {0, 3}. Since state 3 goes nowhere
on any input, the transitions out of {0, 3} will reflect the transitions out of 0 only,
and thus will go to the same states as state {0}. As the transitions out of {0, 3} do
not take us to any state of D that we have not already seen, the construction of D
is finished. The complete deterministic automaton is shown in Fig. 10.20.
Notice that this deterministic automaton correctly accepts all and only the
strings of letters that end in man. Intuitively, the automaton will be in state {0}
whenever the character string so far does not end with any prefix of man except the
empty string. State {0, 1} means that the string seen so far ends in m, {0, 2} means
that it ends in ma, and {0, 3} means that it ends in man. ✦
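Following the recipe of Section 10.2, the automaton of Fig. 10.20 converts directly into a C function. This sketch is ours rather than the book’s; the renaming of {0}, {0, 1}, {0, 2}, {0, 3} to 0, 1, 2, 3 and the function name are assumptions.

#include <stdio.h>

/* Deterministic recognition of strings ending in man, per Fig. 10.20.
   State i means: the input seen so far ends in exactly the first i
   characters of man (and in no longer prefix of man). */
int endsInMan(const char *s)
{
    int state = 0;

    for ( ; *s != '\0'; s++) {
        if (*s == 'm')
            state = 1;                         /* every m could start man */
        else if (*s == 'a')
            state = (state == 1) ? 2 : 0;
        else if (*s == 'n')
            state = (state == 2) ? 3 : 0;
        else
            state = 0;
    }
    return state == 3;
}

int main(void)
{
    printf("%d\n", endsInMan("ombudsman"));  /* prints 1 */
    printf("%d\n", endsInMan("command"));    /* prints 0: man is inside, not at the end */
    return 0;
}

To accept as soon as man appears anywhere, as LASS intended, test state == 3 inside the loop instead of after it.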
✦ Example 10.10. The nondeterministic automaton of Fig. 10.17 has four states,
and its equivalent deterministic automaton of Fig. 10.20 has four states also. It
would be nice if all nondeterministic automata converted to small deterministic au-
tomata, and many common examples used in compiling of programming languages,
for example, do in fact convert to relatively small deterministic automata. Yet there
is no guarantee that the deterministic automaton will be small, and a k-state nonde-
terministic automaton could wind up being converted to a deterministic automaton
with as many as 2^k states. That is, the deterministic automaton could have one
state for every member of the power set of the set of states of the nondeterministic
automaton.
As an example where we get many states, consider the automaton of Fig. 10.14,
from the previous section. Since this nondeterministic automaton has 20 states,
conceivably the deterministic automaton constructed by the subset construction
Fig. 10.20. Deterministic automaton that recognizes strings ending in man.
could have as many as 2^20 states, or over a million states; these states would be all
members of the power set of {0, 1, . . . , 19}. It turns out not to be that bad, but
there are quite a few states.
We shall not attempt to draw the equivalent deterministic automaton for Fig.
10.14. Rather, let us think about what sets of states we actually need. First, since
there is a transition from state 0 to itself on every letter, all sets of states that we
actually see will include 0. If the letter a has not yet been input, then we cannot
get to state 1. However, if we have seen exactly one a, we shall be in state 1, no
matter what else we have seen. We can make an analogous observation about any
of the other letters in washington.
If we start Fig. 10.14 in state 0 and feed it a sequence of letters that are a
subset of the letters appearing in washington, then in addition to being in state
0, we shall also be in some subset of the states 1, 3, 5, 7, 9, 12, 14, 16, and 18.
By choosing the input letters properly, we can arrange to be in any of these sets of
states. As there are 2^9 = 512 such sets, there are at least that number of states in
the deterministic automaton equivalent to Fig. 10.14.
However, there are even more states, because letter n is treated specially in
552 PATTERNS, AUTOMATA, AND REGULAR EXPRESSIONS
Fig. 10.14. If we are in state 9, we can also be in state 10, and in fact we shall be
in both 9 and 10 if we have seen two n’s. Thus, while for the other eight letters we
have two choices (e.g., for letter a, either include state 1 or don’t), for letter n we
have three choices (include neither of 9 and 10, include 9 only, or include both 9
and 10). Thus there are at least 3 × 2^8 = 768 states.
But that is not all. If the input so far ends in one of the letters of washington,
and we previously saw enough of that letter, then we shall also be in the accepting
state corresponding to that letter (e.g., state 2 for a). However, we cannot be in
two accepting states after the same input. Counting the number of additional sets
of states becomes trickier.
Suppose accepting state 2 is a member of the set. Then we know 1 is a member
of the set, and of course 0 is a member, but we still have all our options for the
states corresponding to the letters other than a; that number of sets is 3 × 2^7, or
384. The same applies if our set includes accepting state 4, 6, 8, 13, 15, 17, or 19;
in each case there are 384 sets including that accepting state. The only exception
is when accepting state 11 is included (and therefore 9 and 10 are also present).
Then, there are only 2^8 = 256 options. The total number of states in the equivalent
deterministic automaton is thus
768 + 8 × 384 + 256 = 4096
The first term, 768, counts the sets that have no accepting state. The next term
counts the eight cases in which the set includes the accepting state for one of the
eight letters other than n, and the third term, 256, counts the sets that include
state 11. ✦
Fig. 10.21. (a) In automaton D, one path labeled a1 a2 · · · ak leads from the start state to the state {s1 , s2 , . . . , sn }. (b) In automaton N , paths labeled a1 a2 · · · ak lead from the start state to each of the states s1 , s2 , . . . , sn .
We shall show by induction on k that the state of D reached along the path labeled a1 a2 · · · ak from the start state of D is exactly the set of states of N that are reached from N ’s start state by following some path labeled a1 a2 · · · ak .
BASIS. Let k = 0. A path of length 0 leaves us where we started, that is, in the
start state of both automata D and N . Recall that if s0 is the start state of N ,
then the start state of D is {s0 }. Thus the inductive statement holds for k = 0.
INDUCTION. Suppose the statement holds for k, and consider an input string
a1 a2 · · · ak ak+1
Then the path from the start state of D to state T labeled by a1 a2 · · · ak ak+1 appears
as shown in Fig. 10.22; that is, it goes through some state S just before making the
last transition to T on input ak+1 .
Fig. 10.22. The path labeled a1 a2 · · · ak ak+1 in D goes through some state S just before making the last transition, on input ak+1 , to state T .
We may assume, by the inductive hypothesis, that S is exactly the set of states
that automaton N reaches from its start state along paths labeled a1 a2 · · · ak , and
we must prove that T is exactly the set of states N reaches from its start state along
paths labeled a1 a2 · · · ak ak+1 . There are two parts to the proof of this inductive
step.
1. We must prove that T does not contain too much; that is, if t is a state of N
that is in T , then t is reached by a path labeled a1 a2 · · · ak ak+1 from N ’s start
state.
2. We must prove that T contains enough; that is, if t is a state of N reached
from the start state along a path labeled a1 a2 · · · ak ak+1 , then t is in T .
For (1), let t be in T . Then, as suggested by Fig. 10.23, there must be a state
s in S that justifies t being in T . That is, there is in N a transition from s to t,
and its label includes ak+1 . By the inductive hypothesis, since s is in S, there must
be a path from the start state of N to s, labeled a1 a2 · · · ak . Thus there is a path
from the start state of N to t, labeled a1 a2 · · · ak ak+1 .
Fig. 10.23. The state s in S justifies the presence of t in T : in N there is a transition from s to t whose label includes ak+1 .
Now we must show (2), that if there is a path from the start state of N to t,
labeled a1 a2 · · · ak ak+1 , then t is in T . This path must go through some state s just
before making a transition to t on input ak+1 . Thus there is a path from the start
state of N to s, labeled a1 a2 · · · ak . By the inductive hypothesis, s is in the set of states S. Since N has a transition from s to t, with a label that includes ak+1 , the subset construction applied to the set of states S and input symbol ak+1 demands that t be placed in T . Thus t is in T .
Given the inductive hypothesis, we have now shown that T consists of exactly
the states of N that are reachable from the start state of N along some path labeled
a1 a2 · · · ak ak+1 . That is the inductive step, and we conclude that the state of the
deterministic automaton D reached along the path labeled a1 a2 · · · ak is always the
set of N ’s states reachable along some path with that label. Since the accepting
states of D are those that include an accepting state of N , we conclude that D
and N accept the same strings; that is, D and N are equivalent, and the subset
construction “works.”
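The proof also suggests an algorithm: begin with the set {s0} and repeatedly apply the construction of T from S for each input symbol, adding each new set as a state of D. Below is a compact sketch, assuming the nondeterministic automaton has at most 32 states so that a set fits in one unsigned int; the transition table encodes Fig. 10.10 with the alphabet collapsed to the four classes m, a, n, and “other letter,” and all names are our assumptions.

#include <stdio.h>

#define NSTATES 4
#define NSYMS 4   /* symbol classes: 0 = m, 1 = a, 2 = n, 3 = any other letter */

/* delta[s][x] is the SET of NFA states (as a bit vector) reachable
   from state s on a symbol of class x. */
static const unsigned delta[NSTATES][NSYMS] = {
    {0x3, 0x1, 0x1, 0x1},  /* state 0: loops on Λ; also goes to 1 on m */
    {0x0, 0x4, 0x0, 0x0},  /* state 1: goes to 2 on a */
    {0x0, 0x0, 0x8, 0x0},  /* state 2: goes to 3 on n */
    {0x0, 0x0, 0x0, 0x0}   /* state 3: no transitions out */
};

/* The set T of states reachable from some state of S on symbol class x. */
static unsigned move(unsigned S, int x)
{
    unsigned T = 0;
    int s;

    for (s = 0; s < NSTATES; s++)
        if (S & (1u << s))
            T |= delta[s][x];
    return T;
}

int main(void)
{
    unsigned dstates[1 << NSTATES];   /* the subsets discovered so far */
    int n = 0, i, x, j;

    dstates[n++] = 0x1;               /* the start state {0} */
    for (i = 0; i < n; i++)
        for (x = 0; x < NSYMS; x++) {
            unsigned T = move(dstates[i], x);
            for (j = 0; j < n; j++)
                if (dstates[j] == T)
                    break;
            if (j == n)
                dstates[n++] = T;     /* a new deterministic state */
        }
    printf("%d deterministic states constructed\n", n);
    return 0;
}

Run on this table, the loop discovers exactly the four sets {0}, {0, 1}, {0, 2}, and {0, 3} of Fig. 10.20 and prints 4.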
EXERCISES
Minimization of Automata
One of the issues concerning automata, especially when they are used to design
circuits, is how few states are needed to perform a given task. That is, we may ask,
given an automaton, whether there is an equivalent automaton with fewer states,
and if so, what is the least number of states of any equivalent automaton?
It turns out that if we restrict ourselves to deterministic automata, there is a
unique minimum-state deterministic automaton equivalent to any given automaton,
and it is fairly easy to find it. The key is to define when two states s and t of a
deterministic automaton are equivalent, that is, for any input sequence, the paths from s and t labeled by that sequence either both lead to accepting states or neither
does. If states s and t are equivalent, then there is no way to tell them apart by
feeding inputs to the automaton, and so we can merge s and t into a single state.
Actually, we can more easily define when states are not equivalent, as follows.
BASIS. If s is an accepting state and t is not accepting, or vice versa, then s and t
are not equivalent.
INDUCTION. If there is some input symbol x such that there are transitions from
states s and t on input x to two states that are known not to be equivalent, then s
and t are not equivalent.
There are some additional details necessary to make this test work; in particu-
lar, we may have to add a “dead state,” which is not accepting and has transitions
to itself on every input. As a deterministic automaton may have no transition out
of a given state on a given symbol, before performing this minimization procedure,
we need to add transitions to the dead state from any state, on all inputs for which
no other transition exists. We note that there is no similar theory for minimizing
nondeterministic automata.
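The basis and induction above translate into a marking algorithm: mark pairs known to be inequivalent, and iterate until nothing changes. The following sketch is ours, not the book’s; it assumes the transitions are already total (the dead state has been added), and the sizes and names are placeholders.

#define NS 4   /* number of states; a placeholder */
#define NX 2   /* number of input symbols; a placeholder */

int inequiv[NS][NS];   /* inequiv[s][t] = 1 once s and t are known inequivalent */

void markInequivalent(const int next[NS][NX], const int accept[NS])
{
    int s, t, x, changed;

    /* basis: an accepting and a nonaccepting state are not equivalent */
    for (s = 0; s < NS; s++)
        for (t = 0; t < NS; t++)
            inequiv[s][t] = (accept[s] != accept[t]);

    /* induction: s and t are inequivalent if some symbol x takes them
       to a pair of states already known to be inequivalent */
    do {
        changed = 0;
        for (s = 0; s < NS; s++)
            for (t = 0; t < NS; t++)
                for (x = 0; x < NX; x++)
                    if (!inequiv[s][t] && inequiv[next[s][x]][next[t][x]])
                        inequiv[s][t] = changed = 1;
    } while (changed);
    /* any pair s, t left unmarked may be merged into a single state */
}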
10.4.4*: Some automata have state-input combinations for which there is no tran-
sition at all. If state s has no transition on symbol x, we can add a transition
from s to a special “dead state” on input x. The dead state is not accepting, and
has a transition to itself on every input symbol. Show that adding a “dead state”
produces an automaton equivalent to the one with which we started.
10.4.5: Show that if we add a dead state to a deterministic automaton, we can
get an equivalent automaton that has paths from the start state labeled by every
possible string.
10.4.6*: Show that if we apply the subset construction to a deterministic automa-
ton, we either get the same automaton, with each state s renamed {s}, or we add
a dead state (corresponding to the empty set of states).
10.4.7**: Suppose that we take a deterministic automaton and change every ac-
cepting state to nonaccepting and every nonaccepting state to accepting.
a) How would you describe the language accepted by the new automaton in terms
of the language of the old automaton?
[Drawings of four automata, (a) through (d), accompany an exercise whose statement appears on an earlier page; they are not reproduced here.]
✦
✦ ✦
✦
10.5 Regular Expressions
An automaton defines a pattern, namely the set of strings labeling paths from the
initial state to some accepting state in the graph of the automaton. In this section,
we meet regular expressions, which are an algebraic way to define patterns. Regular
expressions are analogous to the algebra of arithmetic expressions with which we are all familiar, and to the relational algebra that we met in Chapter 8. Interestingly, the set of patterns that can be expressed in the regular-expression algebra is exactly the same set of patterns that can be described by automata.
Notational Convention
We shall continue to use typewriter font for the characters that appear in strings.
The regular expression atomic operand for a given character will be denoted by
that character in boldface. For instance, a is the regular expression corresponding
to character a. When we need to use a variable, we shall write it in italics. Variables
are used to stand for complicated expressions. For instance, we shall use the variable
letter to stand for “any letter,” a set whose regular expression we shall soon meet.
Union
The first of the three operators on regular expressions, and the most familiar, is the union operator, which we shall denote | .3 The
rule for union is that if R and S are two regular expressions, then R | S denotes
the union of the languages that R and S denote. That is, L(R | S) = L(R) ∪ L(S).
Recall that L(R) and L(S) are each sets of strings, so the notion of taking their
union makes sense.
Concatenation
The second operator for the algebra of regular expressions is called concatenation.
It is represented by no operator symbol at all, just as multiplication is sometimes
written without an operator; for instance, in arithmetic ab denotes the product of a
and b. Like union, concatenation is a binary, infix operator. If R and S are regular
expressions, then RS is the concatenation of R and S.4
L(RS), the language denoted by RS, is formed from the languages L(R) and
L(S), as follows. For each string r in L(R) and each string s in L(S), the string rs,
the concatenation of strings r and s, is in L(RS). Recall that the concatenation of two
lists, such as character strings, is formed by taking the elements of the first list, in
order, and following them by the elements of the second list, in order.
3 The plus sign + is also commonly used for the union operator in regular expressions; we shall
not do so, however.
4 Technically, we should write RS as (R)(S), to emphasize the fact that R and S are separate
expressions and that their parts must not be blended because of precedence rules. The
situation is analogous to the fact that if we multiply the arithmetic expression w + x by
the arithmetic expression y + z, we must write the product as (w + x)(y + z). Note that,
because multiplication takes precedence over addition, the product with parentheses omitted,
w + xy + z, would not be interpreted as the product of w + x and y + z. As we shall see,
concatenation and union have precedences that make them similar to multiplication and
addition, respectively.
✦ Example 10.12. Let R be the regular expression a, and so L(R) is the set {a}.
Also let S be the regular expression b, so L(S) = {b}. Then RS is the expression
ab. To form L(RS), we need to take every string in L(R) and concatenate it
with every string in L(S). In this simple case, both languages L(R) and L(S) are
singletons, so we have only one choice from each. We pick a from L(R) and b from
L(S), and concatenate these lists of length 1 to get the string ab. Thus L(RS) is
{ab}. ✦
Example 10.12 can be generalized, in the sense that any string, written in
boldface, is a regular expression that denotes the language consisting of one string,
the corresponding list of characters. For instance, then is a regular expression whose
language is {then}. We shall see that concatenation is an associative operator, so
it doesn’t matter how the characters in the regular expression are grouped, and we
do not need to use any parentheses.
✦ Example 10.13. Now let us look at the concatenation of two regular ex-
pressions whose languages are not singleton sets. Let R be the regular expression
a | (ab).5 The language L(R) is the union of L(a) and L(ab), that is {a, ab}.
Let S be the regular expression c | (bc). Similarly, L(S) = {c, bc}. The regular expression RS is (a | (ab))(c | (bc)). Note that the parentheses around R and S are required, because of precedence.
          c      bc
  a       ac     abc
  ab      abc    abbc

Fig. 10.25. Forming the concatenation of {a, ab} with {c, bc}.
5 As we shall see, concatenation takes precedence over union, so the parentheses are redundant.
To discover the strings in L(RS), we pair each of the two strings from L(R)
with each of the two strings from L(S). This pairing is suggested in Fig. 10.25.
From a in L(R) and c in L(S), we get the string ac. The string abc is obtained
in two different ways, either as (a)(bc) or as (ab)(c). Finally, the string abbc is
obtained as the concatenation of ab from L(R) and bc from L(S). Thus L(RS) is
{ac, abc, abbc}. ✦
Note that the number of strings in the language L(RS) cannot be greater than
the product of the number of strings in L(R) times the number of strings in L(S).
In fact, the number of strings in L(RS) is exactly this product, unless there are
“coincidences,” in which the same string is formed in two or more different ways.
Example 10.13 was an instance where the string abc was produced in two ways,
and therefore the number of strings in L(RS), which was 3, was one less than the
product of the number of strings in the languages of R and S. Similarly, the number
of strings in the language L(R | S) is no greater than the sum of the number of
strings in the languages L(R) and L(S), and can be less only when there are strings
in common to L(R) and L(S). As we shall see when we discuss algebraic laws for
these operators, there is a close, although not exact, analogy between union and
concatenation on one hand, and the arithmetic operators + and × on the other.
Closure
The third operator is called Kleene closure or just closure.6 It is a unary, postfix
operator; that is, it takes one operand and it appears after that operand. Closure is
denoted by a star, so R* is the closure of regular expression R. Because the closure
operator is of highest precedence, it is often necessary to put parentheses around
the R, and write (R)*.
The effect of the closure operator is to say “zero or more occurrences of strings
in R.” That is, L(R*) consists of
1. The empty string ǫ, which we can think of as zero occurrences of strings in R.
2. All the strings in L(R); these represent one occurrence of a string in L(R).
3. All the strings in L(RR), the concatenation of L(R) with itself; these represent
two occurrences of strings in L(R).
4. All the strings in L(RRR), L(RRRR), and so on, representing three, four, and
more occurrences of strings in L(R).
We can informally write
R* = ǫ | R | RR | RRR | · · ·
However, we must understand that the expression on the right side of the equals sign
is not a regular expression, because it contains an infinite number of occurrences
of the union operator. All regular expressions are built from a finite number of
occurrences of the three operators.
6 Stephen C. Kleene wrote the original paper describing the algebra of regular expressions.
✦ Example 10.15. Now let R be the regular expression a | b, so L(R) = {a, b},
and consider what L(R*) is. Again, this language contains ǫ, representing zero
occurrences of strings from L(R). One occurrence of a string from R gives us
{a, b} for L(R*). Two occurrences give us the four strings {aa, ab, ba, bb}, three
occurrences give us the eight strings of length three that consist of a’s and/or b’s,
and so on. Thus L(R*) is all strings of a’s and b’s of any finite length whatsoever. ✦
✦ Example 10.16. Consider the expression a | bc*d. We first consider the *’s.
There is only one, and to its left, the smallest expression is c. We may thus group
this * with its operand, as a | b(c*)d.
Next, we consider the concatenations in the above expression. There are two,
one between the b and the left parenthesis, and the second between the right paren-
thesis and the d. Considering the first, we find the expression b immediately to the
left, but to the right we must go until we include the right parenthesis, since expres-
sions must have balanced parentheses. Thus the operands of the first concatenation
are b and (c*). We place parentheses around these to get the expression
a | (b(c*))d
For the second concatenation, the shortest expression immediately to the left is now
b(c*) , and the shortest expression immediately to the right is d. With parentheses
added to group the operands of this concatenation, the expression becomes
a | ((b(c*))d)
Finally, we must consider the unions. There is only one; its left operand is a,
and its right operand is the rest of the expression above. Technically, we must place
parentheses around the entire expression, yielding
(a | ((b(c*))d))
✦ Example 10.17. We can extend the idea from Example 10.15 to say “strings
of any length consisting of symbols a1 , a2 , . . . , an ” with the regular expression
(a1 | a2 | · · · | an )*
For instance, we can describe C identifiers as follows. First, define the regular
expression
letter = A | B | · · · | Z | a | b | · · · | z | _
That is, the “letters” in C are the upper- and lowercase letters and the underscore.
Similarly define
digit = 0 | 1 | · · · | 9
Then the regular expression
letter(letter | digit)*
represents all strings of letters, digits, and underscores not beginning with a digit. ✦
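As an illustration of how such an expression becomes a program, the following C
function tests whether a string matches letter(letter | digit)*. It is only a sketch
of one natural implementation; the function name and the use of the standard
header <ctype.h> are our own choices, not part of the example above.

    #include <ctype.h>

    /* return 1 if string s matches letter(letter | digit)*, where the
       "letters" include the underscore as above; return 0 otherwise */
    int matchIdentifier(const char *s)
    {
        if (!isalpha(*s) && *s != '_')
            return 0;              /* first character must be a letter or _ */
        for (s++; *s != '\0'; s++)
            if (!isalnum(*s) && *s != '_')
                return 0;          /* the rest must be letters, digits, or _ */
        return 1;
    }

For example, matchIdentifier("x1") returns 1, while matchIdentifier("1x") returns
0, because the leading character is a digit.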
0 by a 1. Thus the expression (0 | 1)*11(1 | 01)* represents all strings of 0’s and
1’s that end in two 1’s followed by any sequence in which 0’s, if any, are followed
immediately by 1’s. The final factor, (ǫ | 0), says “an optional 0,” that is, the
strings just described may be followed by a 0, or not, as we choose. ✦
EXERCISES
10.5.1: In Example 10.13 we considered the regular expression (a | ab)(c | bc),
and saw that its language consisted of the three strings ac, abc, and abbc, that is,
an a and a c, separated by from zero to two b’s. Write two other regular expressions
that define the same language.
10.5.2: Write regular expressions that define the following languages.
a) The strings for the six C comparison operators, ==, <=, <, >=, >, and !=.
b) All strings of 0’s and 1’s that end in 0.
c) All strings of 0’s and 1’s with at least one 1.
d) All strings of 0’s and 1’s with at most one 1.
e) All strings of 0’s and 1’s such that the third position from the right end is 1.
f) All strings of lower-case letters that are in sorted order.
10.5.3*: Write regular expressions that define the following languages.
a) All strings of a’s and b’s such that all runs of a’s are of even length. That is,
strings such as bbbaabaaaa, aaaabb, and ǫ are in the language; abbabaa and
aaa are not.
b) Strings that represent numbers of type float in C.
c) Strings of 0’s and 1’s having even parity, that is, an even number of 1’s. Hint :
Think of even-parity strings as the concatenation of elementary strings with
even parity, either a single 0 or a pair of 1’s separated only by 0’s.
10.5.4**: Write regular expressions that define the following languages.
a) The set of all C identifiers that are not keywords. If you forget some of the
keywords, it is not serious. The point of the exercise is to express strings that
are not in some reasonably large set of strings.
b) All strings of a’s, b’s, and c’s such that no two consecutive positions are the
same character.
c) The set of all strings of two lower-case letters that are not the same. Hint : You
can “brute-force” this one, but there are 650 pairs of distinct letters. A better
idea is to do some grouping. For example, the relatively short expression
(a | b | · · · | m)(n | o | · · · | z)
covers 169 of the 650 pairs.
d) All strings of 0’s and 1’s that, as a binary number, represent an integer that is
a multiple of 3.
10.5.5: Put parentheses in the following regular expressions to indicate the proper
grouping of operands according to the precedence of operators union, concatenation,
and closure.
a) a | bc | de
b) a | b* | (a | b)*a
10.5.6: Remove redundant parentheses from the following expressions, that is,
remove parentheses whose grouping would be implied by the precedence of operators
and the fact that union and concatenation are each associative (and therefore, the
grouping of adjacent unions or adjacent concatenations is irrelevant).
a) (ab)(cd)
b) a | b(c)*
c) (a) | b (c | d)
✦
✦ ✦
✦
10.6 The UNIX Extensions to Regular Expressions
The UNIX operating system has several commands that use a regular-expression
like notation to describe patterns. Even if the reader is not familiar with UNIX or
with most of these commands, these notations are useful to know. We find regular
expressions used in at least three kinds of commands.
1. Editors. The UNIX editors ed and vi, as well as most modern text editors,
allow the user to scan text for a place where an instance of a given pattern is
found. The pattern is specified by a regular expression, although there is no
general union operator, just “character classes,” which we shall discuss below.
2. The pattern-matching program grep and its cousins. The UNIX command grep
scans a file and examines each line. If the line contains a substring that matches
the pattern specified by a regular expression, then the line is printed (grep
stands for “globally search for regular expression and print”). The command
grep itself allows only a subset of the regular expressions, but the extended
command egrep allows the full regular expression notation, including some
other extensions. The command awk allows full regular expression searching,
and also treats lines of text as if they were tuples of a relation, thus allowing
operations of relational algebra like selection and projection to be performed
on files.
3. Lexical analysis. The UNIX command lex is useful for writing a piece of a
compiler and for many similar tasks. The first thing a compiler must do is partition
a program into tokens, which are substrings that fit together logically.
Examples are identifiers, constants, keywords such as then, and operators such
as + or <=. Each token type can be specified as a regular expression; for in-
stance, Example 10.17 showed us how to specify the token class “identifier.”
The lex command allows the user to specify the token classes by regular ex-
pressions. It then produces a program that serves as a lexical analyzer, that is,
a program that partitions its input into tokens.
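For concreteness, here is a sketch of what a small lex specification might look
like; the particular token classes and actions are our own illustration, not an
example from the text. Each rule pairs a regular expression with a fragment of C
code to execute when a matching token is found; the variable yytext holds the
matched string.

    %%
    [A-Za-z_][A-Za-z0-9_]*    { printf("identifier: %s\n", yytext); }
    [0-9]+                    { printf("number: %s\n", yytext); }
    .|\n                      { /* skip any other character */ }

Running lex on this file produces a C program, lex.yy.c, that reads its input and
performs the indicated action for each token it finds.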
Character Classes
Often, we need to write a regular expression that denotes a set of characters, or
strictly speaking, a set of character strings of length one, each string consisting of
a different character in the set. Thus in Example 10.17 we defined the expression
letter to denote any of the strings consisting of one upper- or lower-case letter, and
we defined the expression digit to denote any of the strings consisting of a single
digit. These expressions tend to be rather long, and UNIX provides some important
shorthands.
First, we can enclose any list of characters in square brackets, to stand for the
regular expression that is the union of these letters. Such an expression is called a
character class. For example, the expression [aghinostw] denotes the set of letters
appearing in the word washington, and [aghinostw]* denotes the set of strings
composed of those letters only.
Second, we do not always have to list all the characters explicitly. Recall that
characters are almost invariably coded in ASCII. This code assigns bit strings, which
are naturally interpreted as integers, to the various characters, and it does so in a
rational way. For instance, the capital letters are assigned consecutive integers.
Likewise, the lower-case letters are assigned consecutive integers, and the digits are
assigned consecutive integers.
If we put a dash between two characters, we denote not only those characters,
but also all the characters whose codes lie between their codes.
✦ Example 10.19. We can define the upper- and lower-case letters by [A-Za-z].
The first three characters, A-Z, represent all the characters whose codes lie between
those for A and Z, that is, all the upper-case letters. The next three characters, a-z,
similarly denote all the lower-case letters.
Incidentally, because the dash has this special meaning, we must be careful if
we want to define a character class including -. We must place the dash either first
or last in the list. For instance, we could specify the set of four arithmetic operators
by [-+*/], but it would be an error to write [+-*/], because then the range +-*
would denote all the characters whose codes are between the codes for + and *. ✦
✦ Example 10.20. The automaton of Fig. 10.12 in Section 10.3, started at the
beginning of a line, will accept that line exactly when the line consists only of letters
in the word washington. We can express this pattern as a UNIX regular expression:
^[aghinostw]*$. In words, the pattern is “the beginning of the line, followed by
any sequence of letters from the word washington, followed by the end of the line.”
As an example of how this regular expression is used, the UNIX command line
grep '^[aghinostw]*$' /usr/dict/words
will print all the words in the dictionary that consist only of letters from wash-
ington. UNIX requires, in this case, that the regular expression be written as a
quoted string. The effect of the command is that each line of the specified file
/usr/dict/words is examined. If it has any substring that is in the set of strings
denoted by the regular expression, then the line is printed; otherwise, the line is
not printed. Note that the line beginning and ending symbols are essential here.
Suppose they were missing. Since the empty string is in the language denoted by
the regular expression [aghinostw]*, we would find that every line has a substring
(namely ǫ) that is in the language of the regular expression, and thus every line
would be printed. ✦
will find and print all the words that have aeiou as a subsequence.
The fact that the dots will match characters other than letters is unimportant,
since there are no other characters besides letters and the newline character in the
file /usr/dict/words. However, if the dot could match the newline character, then
this regular expression could allow grep to use several lines together to find one
occurrence of the vowels in order. It is for examples like this one that the dot is
defined not to match the newline character. ✦
Additional Operators
The regular expressions in the UNIX commands awk and egrep also include some
additional operators.
1. Unlike grep, the commands awk and egrep also permit the union operator | in
their regular expressions.
2. The unary postfix operators ? and + do not allow us to define additional
languages, but they often make it easier to express languages. If R is a regular
expression, then R? stands for ǫ | R, that is, an optional R. Thus L(R?) is
L(R) ∪ {ǫ}. R+ stands for RR*, or equivalently, “one or more occurrences of
words from R.” Thus,
L(R+) = L(R) ∪ L(RR) ∪ L(RRR) ∪ · · ·
In particular, if ǫ is in L(R), then L(R+ ) and L(R*) denote the same language.
If ǫ is not in L(R), then L(R+ ) denotes L(R*) − {ǫ}. The operators + and ?
have the same associativity and precedence as *.
✦ Example 10.23. We can scan input for all lines whose letters are in strictly
increasing alphabetical order with the egrep command
egrep '^a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?$'
That is, we scan each line to see if between the beginning and end of the line there
is an optional a, and optional b, and so on. A line containing the word adept,
for instance, matches this expression, because the ?’s after a, d, e, p, and t can
be interpreted as “one occurrence,” while the other ?’s can be interpreted as “zero
occurrences,” that is, ǫ. ✦
EXERCISES
10.6.1: Write expressions for the following character classes.
10.6.2*: If you have UNIX available, write egrep programs to examine the file
/usr/dict/words
and find
✦
✦ ✦
✦
10.7 Algebraic Laws for Regular Expressions
It is possible for two regular expressions to denote the same language, just as two
arithmetic expressions can denote the same function of their operands. As an
example, the arithmetic expressions x + y and y + x each denote the same function
of x and y, because addition is commutative. Similarly, the regular expressions
R | S and S | R denote the same languages, no matter what regular expressions
we substitute for R and S; the justification is that union is also a commutative
operation.
Often, it is useful to simplify regular expressions. We shall see shortly that,
when we construct regular expressions from automata, we often construct a regular
expression that is unnecessarily complex. A repertoire of algebraic equivalences may
allow us to “simplify” expressions, that is, replace one regular expression by another
that involves fewer operands and/or operators, yet that denotes the same language.
The process is analogous to what we go through when we manipulate arithmetic
expressions to simplify an unwieldy expression. For example, we might multiply
two large polynomials and then simplify the result by grouping similar terms. As
another example, we simplified expressions of relational algebra in Section 8.9 to
allow faster evaluation.
Two regular expressions R and S are equivalent, written R ≡ S, if L(R) = L(S).
If so, we say that R ≡ S is an equivalence. In what follows, we shall assume that R,
S, and T are arbitrary regular expressions, and state our equivalences with these
operands.
Proving Equivalences
In this section we prove a number of equivalences involving regular expressions.
Recall that an equivalence between two regular expressions is a claim that the lan-
guages of these two expressions are equal, no matter what languages we substitute
for their variables. We thus prove an equivalence by showing the equality of two
languages, that is, two sets of strings. In general, we prove that set S1 equals set
S2 by proving containment in both directions. That is, we prove S1 ⊆ S2 , and we
also prove S2 ⊆ S1 . Both directions are necessary to prove equality of sets.
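For instance, to prove that R | S ≡ S | R, suppose w is a string in L(R | S). Then
w is in L(R) or in L(S), and hence in L(S) ∪ L(R), which is L(S | R); thus
L(R | S) ⊆ L(S | R). The opposite containment follows by the same argument with
the roles of R and S interchanged.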
11. RR* ≡ R*R. Note that both sides are equivalent to R+ in the extended
notation of Section 10.6.
12. (RR* | ǫ) ≡ R*. That is, the union of R+ and the empty string is equivalent
to R*.
EXERCISES
10.7.1: Prove that the right distributive law of concatenation over union, equiva-
lence (8), holds.
10.7.2: The equivalences ∅∅ ≡ ∅ and ǫǫ ≡ ǫ follow from equivalences already stated,
by substitution for variables. Which equivalences do we use?
10.7.3: Prove equivalences (10) through (12).
10.7.4: Prove that
a) (R | R*) ≡ R*
b) (ǫ | R*) ≡ R*
10.7.5*: Are there examples of particular regular expressions R and S that are
“commutative,” in the sense that RS = SR for these particular expressions? Give
a proof if not, or some examples if so.
10.7.6*: The operand ∅ is not needed in regular expressions, except that without
it, we could not find a regular expression whose language is the empty set. Call a
regular expression ∅-free if it has no occurrences of ∅. Prove by induction on the
number of operator occurrences in a ∅-free regular expression R, that L(R) is not
the empty set. Hint : The next section gives an example of an induction on the
number of operator occurrences of a regular expression.
10.7.7**: Show by induction on the number of operator occurrences in a regular
expression R, that R is equivalent to either the regular expression ∅, or some ∅-free
regular expression.
✦
✦ ✦
✦
10.8 From Regular Expressions to Automata
Remember our initial discussion of automata in Section 10.2, where we observed a
close relationship between deterministic automata and programs that used the con-
cept of “state” to distinguish the roles played by different parts of the program. We
said then that designing deterministic automata is often a good way to design such
programs. However, we also saw that deterministic automata could be hard to de-
sign. We saw in Section 10.3 that sometimes nondeterministic automata were easier
to design, and that the subset construction allows us to turn any nondeterministic
automaton into a deterministic one. Now that we have met regular expressions, we
see that often it is even easier to write a regular expression than it is to design a
nondeterministic automaton.
Thus, it is good news that there is a way to convert any regular expression into
a nondeterministic automaton, and from there we can use the subset construction
to convert to a deterministic automaton. In fact, we shall see in the next section
that it is also possible to convert any automaton into a regular expression whose
language is exactly the set of strings that the automaton accepts. Thus automata
and regular expressions have exactly the same capability to describe languages.
In this section, we need to do a number of things to show how regular expres-
sions are converted to automata.
1. We introduce automata with ǫ-transitions, that is, with arcs labeled ǫ. These
arcs are used in paths but do not contribute to the labels of paths. This form
of automaton is an intermediate between regular expressions and the automata
discussed earlier in this chapter.
We first extend our notion of automata to allow arcs labeled ǫ. Such automata still
accept a string s if and only if there is a path labeled s from the start state to an
accepting state. However, note that ǫ, the empty string, is “invisible” in strings,
and so when constructing the label for a path we in effect delete all the ǫ’s and use
only the “real” characters.
Fig. 10.26. An automaton with ǫ-transitions for a | bc*.
When we remember that ǫ concatenated with any other string is that other string,
we see we can “drop out” the ǫ’s to get the string bcc, which is the label of the
path in question.
You can probably discover that the paths from state 0 to state 3 are labeled
by all and only the strings a, b, bc, bcc, bccc, and so on. A regular expression for
this set is a | bc*, and we shall see that the automaton of Fig. 10.26 is constructed
naturally from that expression. ✦
expression, we would create three different automata, with six states in all, each
similar to Fig. 10.27(c), but with a in place of x.
The automaton of Fig. 10.27(a) evidently accepts no strings, since you cannot
get from the start state to the accepting state; thus its language is ∅. Figure 10.27(b)
is suitable for ǫ, since it accepts the empty string but no other. Figure 10.27(c) is
an automaton for accepting only the string x. We can create new automata with
different values of the symbol x as we choose. Note that each of these automata
satisfies the three requirements stated above; there is one accepting state, no arcs
into the start state and no arcs out of the accepting state.
Fig. 10.27. The automata for the basis: (a) for ∅; (b) for ǫ; (c) for a symbol x.
INDUCTION. Now suppose that S(i) is true for all i ≤ n; that is, for any regular
expression R with up to n operator occurrences, there is an automaton satisfying
the conditions of the inductive hypothesis and accepting all and only the strings
in L(R). Now, let R be a regular expression with n + 1 operator occurrences. We
focus on the “outermost” operator in R; that is, R can only be of the form R1 | R2 ,
R1 R2 , or R1 *, depending on whether union, concatenation, or closure was the last
operator used when R was formed.
In any of these three cases, R1 and R2 cannot have more than n operators,
because there is one operator of R that is not part of either.7 Thus the inductive
hypothesis applies to R1 and R2 in all three cases. We can prove S(n + 1) by
consideration of these cases in turn.
7 Let us not forget that even though concatenation is represented by juxtaposition, rather
than a visible operator symbol, uses of concatenation still count as operator occurrences
when deciding how many operator occurrences R has.
(a) Constructing the automaton for the union of two regular expressions.
(b) Constructing the automaton for the concatenation of two regular expressions.
(c) Constructing the automaton for the closure of a regular expression.
Fig. 10.28. Inductive part of the construction of an automaton from a regular expression.
path in the chosen automaton to get to its accepting state, and then an ǫ-transition
to the accepting state of the automaton for R. This path is labeled by some string
s that the automaton we traveled through accepts, because we go from the start
state to the accepting state of that automaton. Therefore, s is either in L(R1 ) or
L(R2 ), depending on which automaton we traveled through. Since we only add
ǫ’s to that path’s labels, the automaton of Fig. 10.28(a) also accepts s. Thus the
strings accepted are those in L(R1 ) ∪ L(R2 ), which is L(R1 | R2 ), or L(R).
The only way to get from the start to the accepting state of Fig. 10.28(b) is
1. Along a path labeled by a string s in L(R1 ), to get from the start state to the
accepting state of the automaton for R1 , then
2. Along the arc labeled ǫ to the start state of the automaton for R2 , and then
3. Along a path labeled by some string t in L(R2 ) to get to the accepting state.
The label of this path is st. Thus the automaton of Fig. 10.28(b) accepts exactly
the strings in L(R1 R2 ), or L(R).
Fig. 10.29. Expression tree for the regular expression a | bc*.
✦ Example 10.25. Let us construct the automaton for the regular expression
a | bc*. An expression tree for this regular expression is shown in Fig. 10.29; it is
analogous to the expression trees we discussed in Section 5.2, and it helps us see
the order in which the operators are applied to the operands.
There are three leaves, and for each, we construct an instance of the automaton
of Fig. 10.27(c). These automata are shown in Fig. 10.30, and we have used the
states that are consistent with the automaton of Fig. 10.26, which as we mentioned,
is the automaton we shall eventually construct for our regular expression. It should
be understood, however, that it is essential for the automata corresponding to the
various occurrences of operands to have distinct states. In our example, since each
operand is different, we would expect to use different states for each, but even if
there were several occurrences of a, for example, in the expression, we would create
distinct automata for each occurrence.
(a) Automaton for a.
(b) Automaton for b.
(c) Automaton for c.
Fig. 10.30. Automata for the operands of a | bc*.
Now we must work up the tree of Fig. 10.29, applying operators and construct-
ing larger automata as we go. The first operator applied is the closure operator,
which is applied to operand c. We use the construction of Fig. 10.28(c) for the
closure. The new states introduced are called 6 and 9, again to be consistent with
Fig. 10.26. Fig. 10.31 shows the automaton for the regular expression c*.
Fig. 10.31. Automaton for c*.
Next, we apply the concatenation operator to b and c*. We use the construc-
tion of Fig. 10.28(b), and the resulting automaton is shown in Fig. 10.32.
Fig. 10.32. Automaton for bc*.
Finally, we apply the union operator to a and bc*. The construction used is
that of Fig. 10.28(a), and we call the new states introduced 0 and 3. The resulting
automaton appeared in Fig. 10.26. ✦
Eliminating Epsilon-Transitions
If we are in any state s of an automaton with ǫ-transitions, we in effect are also
in any state that we can get to from s by following a path of arcs labeled ǫ. The
reason is that whatever string labels the path we’ve taken to get to state s, the
same string will be the label of the path extended with ǫ-transitions.
✦ Example 10.26. In Fig. 10.26, we can get to the state 5 by following a path
labeled b. From state 5, we can get to states 6, 7, 9, and 3 by following paths of
ǫ-labeled arcs. Thus if we are in state 5, we are, in effect, also in these four other
states. For instance, since 3 is an accepting state, we can think of 5 as an accepting
state as well, since every input string that gets us to state 5 will also get us to state
3, and thus be accepted. ✦
Thus the first question we need to ask is, from each state, what other states
can we reach following only ǫ-transitions? We gave an algorithm to answer this
question in Section 9.7, when we studied reachability as an application of depth-
first search. For the problem at hand, we have only to modify the graph of the
finite automaton by removing transitions on anything but ǫ. That is, for each real
symbol x, we remove all arcs labeled by x. Then, we perform a depth-first search
of the remaining graph from each node. The nodes visited during the depth-first
search from node v are exactly the set of nodes reachable from v using ǫ-transitions
only.
Recall that one depth-first search takes O(m) time, where m is the larger of
the number of nodes and arcs of the graph. In this case, there are n depth-first
searches to do, if the graph has n nodes, for a total of O(mn) time. However, there
are at most two arcs out of any one node in the automata constructed from regular
expressions by the algorithm described previously in this section. Thus m ≤ 2n,
and O(mn) is O(n2 ) time.
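This computation is easy to render in C. The following sketch assumes the
ǫ-labeled arcs are stored in an adjacency matrix eps, with states numbered 0
through NSTATES − 1; all the names here are our own choices, not from the text.

    #define NSTATES 10    /* ten states, 0 through 9, as in Fig. 10.26 */

    int eps[NSTATES][NSTATES];    /* eps[v][w] = 1 if arc v -> w is labeled epsilon */
    int reach[NSTATES][NSTATES];  /* reach[i][j] = 1 if j reachable from i by eps-arcs */

    /* depth-first search: mark in row reach[i] every state reachable from v */
    void search(int i, int v)
    {
        int w;
        reach[i][v] = 1;
        for (w = 0; w < NSTATES; w++)
            if (eps[v][w] && !reach[i][w])
                search(i, w);
    }

    /* one depth-first search from each state fills in the table of Fig. 10.34 */
    void computeReach(void)
    {
        int i;
        for (i = 0; i < NSTATES; i++)
            search(i, i);
    }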
Fig. 10.33. The ǫ-transitions from Fig. 10.26.
✦ Example 10.27. In Fig. 10.33 we see the arcs that remain from Fig. 10.26
when the three arcs labeled by a real symbol, a, b, or c, are deleted. Figure 10.34
is a table giving the reachability information for Fig. 10.33; that is, a 1 in row i and
column j means that there is a path of length 0 or more from node i to node j. ✦
      0  1  2  3  4  5  6  7  8  9
   0  1  1        1
   1     1
   2        1  1
   3           1
   4              1
   5           1     1  1  1     1
   6           1        1  1     1
   7                       1
   8           1           1  1  1
   9           1                 1

Fig. 10.34. Reachability table for the graph of Fig. 10.33.
Armed with the reachability information, we can construct our equivalent au-
tomaton that has no ǫ-transitions. The idea is to bundle into one transition of the
new automaton a path of zero or more ǫ-transitions of the old automaton followed
by one transition of the old automaton on a real symbol. Every such transition
takes us to the second state of one of the automata that were introduced by the
basis rule of Fig. 10.27(c), the rule for operands that are real symbols. The reason
is that only these states are entered by arcs with real symbols as labels. Thus our
new automaton needs only these states and the start state for its own set of states.
Let us call these states the important states.
In the new automaton, there is a transition from important state i to important
state j with symbol x among its labels if there is some state k such that
1. State k is reachable from state i along a path of zero or more ǫ-transitions.
Note that k = i is always permitted.
2. In the old automaton, there is a transition from state k to state j, labeled x.
We also must decide which states are accepting states in the new automaton.
As we mentioned, when we are in a state, we are effectively in any state it can reach
along ǫ-labeled arcs, and so in the new automaton, we shall make state i accepting if
there is, in the old automaton, a path of ǫ-labeled arcs from state i to the accepting
state of the old automaton. Note that i may itself be the accepting state of the old
automaton, which therefore remains accepting in the new automaton.
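Continuing the C sketch begun above, with the same caveat that the names and
representation are our own, we can compute the transitions and accepting states of
the new automaton directly from the reach table. We assume at most one real
symbol labels any arc, which is true of the automata produced by our construction.

    int delta[NSTATES][NSTATES];     /* delta[k][j] = symbol on arc k -> j, or 0 if none */
    int newDelta[NSTATES][NSTATES];  /* transitions of the new automaton */
    int accepting[NSTATES];          /* accepting[k] = 1 if k accepts in old automaton */
    int newAccepting[NSTATES];

    void eliminateEpsilons(void)
    {
        int i, j, k;
        for (i = 0; i < NSTATES; i++)
            for (k = 0; k < NSTATES; k++)
                if (reach[i][k]) {            /* k reachable from i by eps-arcs */
                    if (accepting[k])
                        newAccepting[i] = 1;  /* i accepts in the new automaton */
                    for (j = 0; j < NSTATES; j++)
                        if (delta[k][j])      /* old arc k -> j on a real symbol */
                            newDelta[i][j] = delta[k][j];
                }
    }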
✦ Example 10.28. Let us eliminate the ǫ-transitions from the automaton of Fig.
10.26. The important states are state 0, which is the initial state, and states 2, 5,
and 8, because these are entered by arcs labeled by a real symbol.
We shall begin by discovering the transitions for state 0. According to Fig.
10.34, from state 0 we can reach states 0, 1, and 4 along paths of ǫ-labeled arcs.
We find a transition on a from state 1 to 2 and a transition on b from 4 to 5. Thus
in the new automaton, there is a transition from 0 to 2 labeled a and from 0 to 5
labeled b. Notice that we have collapsed the paths 0 → 1 → 2 and 0 → 4 → 5 of
Fig. 10.26 into single transitions with the label of the non-ǫ transition along those
paths. Since neither state 0 nor the states 1 and 4 that it reaches along ǫ-labeled
paths is an accepting state, state 0 is not accepting in the new automaton.
Fig. 10.35. The automaton for a | bc* after eliminating ǫ-transitions.
Next, consider the transitions out of state 2. Figure 10.34 tells us that from 2
we can reach only itself and 3 via ǫ-transitions, and so we must look for transitions
out of states 2 or 3 on real symbols. Finding none, we know that there are no
transitions out of state 2 in the new automaton. However, since 3 is accepting, and
2 reaches 3 by ǫ-transitions, we make 2 accepting in the new automaton.
When we consider state 5, Fig. 10.34 tells us to look at states 3, 5, 6, 7, and
9. Among these, only state 7 has a non-ǫ transition out; it is labeled c and goes to
state 8. Thus in the new automaton, the only transition out of state 5 is a transition
on c to state 8. We make state 5 accepting in the new automaton, since it reaches
accepting state 3 following ǫ-labeled arcs.
Finally, we must look at the transitions out of state 8. By reasoning similar to
that for state 5, we conclude that in the new automaton, the only transition out of
state 8 is to itself and is labeled c. Also, state 8 is accepting in the new automaton.
Figure 10.35 shows the new automaton. Notice that the set of strings it accepts
is exactly those strings in L(a | bc*), that is, the string a (which takes us to state
2), the string b (which takes us to state 5), and the strings bc, bcc, bccc, and
so on, all of which take us to state 8. The automaton of Fig. 10.35 happens to
be deterministic. If it were not, we would have to use the subset construction to
convert it to a deterministic automaton, should we wish to design a program that
would recognize the strings of the original regular expression. ✦
EXERCISES
10.8.1: Construct automata with ǫ-transitions for the following regular expressions.
a) aaa Hint : Remember to create a new automaton for each occurrence of operand
a.
b) (ab | ac)*
c) (0 | 1 | 1*)*
10.8.2: For each of the automata constructed in Exercise 10.8.1, find the reachable
sets of nodes for the graph formed from the ǫ-labeled arcs. Note that you need only
construct the reachable states for the start state and the states that have non-ǫ
transitions in, when you construct the automaton without ǫ-transitions.
10.8.3: For each of the automata of Exercise 10.8.1, construct an equivalent au-
tomaton without ǫ-transitions.
10.8.4: Which of the automata in Exercise 10.8.3 are deterministic? For those that
are not, construct an equivalent deterministic automaton.
10.8.5*: For the deterministic automata constructed from Exercises 10.8.3 or Ex-
ercise 10.8.4, are there equivalent deterministic automata with fewer states? If so,
find minimal ones.
10.8.6*: We can generalize our construction from a regular expression to an au-
tomaton with ǫ-transitions to include expressions that use the extended operators
of Section 10.7. That statement is true in principle, since each of those extensions is
a shorthand for an “ordinary” regular expression, by which we could replace the ex-
tended operator. However, we can also incorporate the extended operators directly
into our construction. Show how to modify the construction to cover
Fig. 10.37. Alternative automaton for the concatenation of two regular expressions.
✦
✦ ✦
✦
10.9 From Automata to Regular Expressions
In this section, we shall demonstrate the other half of the equivalence between
automata and regular expressions, by showing that for every automaton A there is a
regular expression whose language is exactly the set of strings accepted by A. While
we generally use the construction of the last section, where we convert “designs” in
the form of regular expressions into programs, in the form of deterministic automata,
this construction is also interesting and instructive. It completes the proof of the
equivalence, in expressive power, of two radically different notations for describing
patterns.
Our construction involves the elimination of states, one by one, from an au-
tomaton. As we proceed, we replace the labels on the arcs, which are initially
sets of characters, by more complicated regular expressions. Initially, if we have
label {x1 , x2 , . . . , xn } on an arc, we replace the label by the regular expression
x1 | x2 | · · · | xn , which represents essentially the same set of symbols, although
technically the regular expression represents strings of length 1.
In general, we can think of the label of a path as the concatenation of the regular
expressions along that path, or as the language defined by the concatenation of those
expressions. That view is consistent with our notion of a path labeled by a string.
That is, if the arcs of a path are labeled by the regular expressions R1 , R2 , . . . , Rn ,
in that order, then the path is labeled by w, if and only if string w is in the language
L(R1 R2 · · · Rn ).
Fig. 10.38. Path with regular expressions as labels. The label of the path is the
language of the concatenated regular expressions.
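For instance, the path 0 → 1 → 2 of Fig. 10.38 is labeled by exactly the strings in
L((a | b)(a | b | c)), that is, by the six strings aa, ab, ac, ba, bb, and bc.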
Fig. 10.39. A state u about to be eliminated. Si is the label of the arc from
predecessor si to u, Tj is the label of the arc from u to successor tj, U is the label
of the loop on u, and Rij is the label of the arc si → tj.
Hence, after eliminating u and all arcs into or out of u, we must replace Rij ,
the label of the arc si → tj , by
Rij | Si U*Tj
There are a number of useful special cases. First, if U = ∅, that is, the loop on
u is not really there, then U * = ∅* = ǫ. Since ǫ is the identity under concatenation,
(Si ǫ)Tj = Si Tj ; that is, U has effectively disappeared as it should. Similarly, if
Rij = ∅, meaning that there was formerly no arc from si to tj , then we introduce
this arc and give it label Si U *Tj , or just Si Tj , if U = ∅. The reason we can do so
is that ∅ is the identity for union, and so ∅ | Si U *Tj = Si U *Tj .
Fig. 10.40. The bounce filter automaton (a copy of Fig. 10.4).
✦ Example 10.30. Let us consider the bounce filter automaton of Fig. 10.4,
which we reproduce here as Fig. 10.40. Suppose we wish to eliminate state b, which
thus plays the role of state u in Fig. 10.39. State b has one predecessor, a, and two
successors, a and c. There is no loop on b, and so we introduce one, labeled ∅. There
is an arc from a to itself, labeled 0. Since a is both a predecessor and a successor
of b, this arc is needed in the transformation. The only other predecessor-successor
pair is a and c. Since there is no arc a → c, we add one with label ∅. The diagram
of relevant states and arcs is as shown in Fig. 10.41.
Fig. 10.41. The states and arcs relevant to the elimination of state b.
For the a–a pair, we replace the label of the arc a → a by 0 | 1∅*0. The term
0 represents the original label of the arc, the 1 is the label of the arc a → b, ∅ is
the label of the loop b → b, and the second 0 is the label of the arc b → a. We can
simplify, as described above, to eliminate ∅*, leaving us with the expression 0 | 10.
That makes sense. In Fig. 10.40, the paths from a to a, going through b zero or
more times, but no other state, have the set of labels {0, 10}.
The pair a–c is handled similarly. We replace the label ∅ on the arc a → c by
∅ | 1∅*1, which simplifies to 11. That again makes sense, since in Fig. 10.40 the
only path from a to c via b has label 11. When we eliminate node b and change
the arc labels, Fig. 10.40 becomes Fig. 10.42. Note that in this automaton, some
of the arcs have labels that are regular expressions whose languages have strings of
length greater than 1. However, the sets of path labels for the paths among states
a, c, and d have not changed from what they were in Fig. 10.40. ✦
Fig. 10.42. The automaton of Fig. 10.40 after eliminating state b.
Fig. 10.43. An automaton reduced to two states: a loop labeled S on start state s,
an arc labeled U from s to t, a loop labeled T on t, and an arc labeled V from t
back to s.
We need to discover what regular expression describes the set of labels of paths
that start at s and end at t. One way to express this set of strings is to recognize
that each such path gets to t for the first time, and then goes from t to itself, zero
or more times, possibly passing through s as it goes. The set of strings that take
us to t for the first time is L(S*U ). That is, we use strings in L(S) zero or more
times, staying in state s as we do, and then we follow a string from L(U ). We can
stay in state t either by following a string in L(T ), which takes us from t to t, or by
following a string in V S*U , which takes us to state s, keeps us at s for a while, and
then takes us back to t. We can follow zero or more strings from these two groups,
in any order, which we can express as (T | V S*U )*. Thus a regular expression for
the set of strings that get us from state s to state t is
S*U(T | V S*U)*                                   (10.4)
There is one special case, when the start state s is itself an accepting state.
Then, there are strings that are accepted because they take the automaton A from
s to s. We eliminate all states but s, leaving an automaton that looks like Fig.
10.44. The set of strings that take A from s to s is L(S*). Thus we may use S* as
a regular expression to account for the contribution of accepting state s.
Fig. 10.44. An automaton reduced to one state s, with a loop labeled S.
1. Eliminate states b and d from Fig. 10.40 to get an automaton involving only a
and c.
2. Eliminate states b and c from Fig. 10.40 to get an automaton involving only a
and d.
Since in each case we must eliminate state b, Fig. 10.42 has gotten us half way
toward both goals. For (1), let us eliminate state d from Fig. 10.42. There is a path
labeled 00 from c to a via d, so we need to introduce an arc labeled 00 from c to
a. There is a path labeled 01 from c to itself, via d, so we need to add label 01 to
the label of the loop at c, and that label becomes 1 | 01. The resulting automaton
is shown in Fig. 10.45.
Fig. 10.45. The automaton with only states a and c: loop 0 | 10 on a, arc 11
from a to c, loop 1 | 01 on c, and arc 00 from c to a.
Now for goal (2) we shall again start at Fig. 10.42 and eliminate state c this
time. In Fig. 10.42 we can go from state a to state d via c, and the regular expression
describing the possible strings is 111*0.8 That is, 11 takes us from a to c, 1* allows
us to loop at c zero or more times, and finally 0 takes us from c to d. Thus we
introduce an arc labeled 111*0 from a to d. Similarly, in Fig. 10.42 we can go from
d to itself, via c, by following strings in 11*0. Thus this expression becomes the
label of a loop at d. The reduced automaton is shown in Fig. 10.46.
Fig. 10.46. The automaton with only states a and d: loop 0 | 10 on a, arc 111*0
from a to d, loop 11*0 on d, and arc 0 from d to a.
Now we may apply the formula developed in (10.4) to the automata in Figs.
10.45 and 10.46. For Fig. 10.45, we have S = 0 | 10, U = 11, V = 00, and
T = 1 | 01. Thus a regular expression denoting the set of strings that take the
automaton of Fig. 10.40 from start state a to accepting state c is
(0 | 10)*11((1 | 01) | 00(0 | 10)*11)*                        (10.5)
and the expression denoting the strings that take state a to accepting state d is
(0 | 10)*111*0(11*0 | 0(0 | 10)*111*0)*                       (10.6)
8 Remember that because * takes precedence over concatenation, 111*0 is parsed as 11(1*)0,
and represents the strings consisting of two or more 1’s followed by a 0.
The expression that denotes the strings accepted by the bounce filter automaton is
the union of (10.5) and (10.6), or
(0 | 10)*11((1 | 01) | 00(0 | 10)*11)* |
(0 | 10)*111*0(11*0 | 0(0 | 10)*111*0)*
You may recall that we suggested a much simpler regular expression for the same
language,
(0 | 1)*11(1 | 01)*(ǫ | 0)
This difference should remind us that there can be more than one regular expression
for the same language, and that the expression we get by converting an automaton
to a regular expression is not necessarily the simplest expression for that language. ✦
EXERCISES
10.9.1: Find regular expressions for the automata of
a) Fig. 10.3
b) Fig. 10.9
c) Fig. 10.10
d) Fig. 10.12
e) Fig. 10.13
f) Fig. 10.17
g) Fig. 10.20
You may wish to use the shorthands of Section 10.6.
10.9.2: Convert the automata of Exercise 10.4.1 to regular expressions.
10.9.3*: Show that another regular expression we could use for the set of strings
that get us from state s to state t in Fig. 10.43 is (S | U T *V )*U T *.
10.9.4: How can you modify the construction of this section so that regular expres-
sions can be generated from automata with ǫ-transitions?
✦
✦ ✦
✦
10.10 Summary of Chapter 10
The subset construction of Section 10.4, together with the conversions of Sections
10.8 and 10.9, tells us that three ways to express languages have exactly the same
expressive power. That is, the following three statements about a language L are
either all true or all false.
1. There is some deterministic automaton that accepts all and only the strings in
L.
2. There is some (possibly nondeterministic) automaton that accepts all and only
the strings in L.
3. L is L(R) for some regular expression R.
The subset construction shows that (2) implies (1). Evidently (1) implies (2),
since a deterministic automaton is a special kind of nondeterministic automaton.
We showed that (3) implies (2) in Section 10.8, and we showed (2) implies (3) in
Section 10.9. Thus, all of (1), (2), and (3) are equivalent.
In addition to these equivalences, we should take away a number of important
ideas from Chapter 10.
✦ Deterministic automata can be used as the core of programs that recognize
many different kinds of patterns in strings.
✦ Regular expressions are often a convenient notation for describing patterns.
✦ There are algebraic laws for regular expressions that make union and concate-
nation behave in ways similar to + and ×, but with some differences.
✦
✦ ✦
✦
10.11 Bibliographic Notes for Chapter 10
The reader can learn more about the theory of automata and languages in Hopcroft
and Ullman [1979].
The automaton model for processing strings was first expressed in roughly the
form described here by Huffman [1954], although there were a number of similar
models discussed earlier and concurrently; the history can be found in Hopcroft
and Ullman [1979]. Regular expressions and their equivalence to automata are
from Kleene [1956]. Nondeterministic automata and the subset construction are
from Rabin and Scott [1959]. The construction of nondeterministic automata from
regular expressions that we used in Section 10.8 is from McNaughton and Yamada
[1960], while the construction in the opposite direction, in Section 10.9, is from
Kleene’s paper.
The use of regular expressions as a way to describe patterns in strings first
appeared in Ken Thompson’s QED system (Thompson [1968]), and the same ideas
later influenced many commands in his UNIX system. There are a number of other
applications of regular expressions in system software, much of which is described
in Aho, Sethi, and Ullman [1986].
Aho, A. V., R. Sethi, and J. D. Ullman [1986]. Compilers: Principles, Techniques,
and Tools, Addison-Wesley, Reading, Mass.
Hopcroft, J. E. and J. D. Ullman [1979]. Introduction to Automata Theory, Lan-
guages, and Computation, Addison-Wesley, Reading, Mass.
Huffman, D. A. [1954]. “The synthesis of sequential switching machines,” Journal
of the Franklin Institute 257:3-4, pp. 161–190 and 275–303.
Kleene, S. C. [1956]. “Representation of events in nerve nets and finite automata,”
in Automata Studies (C. E. Shannon and J. McCarthy, eds.), Princeton University
Press.
McNaughton, R. and H. Yamada [1960]. “Regular expressions and state graphs for
automata,” IRE Trans. on Electronic Computers EC-9:1, pp. 39–47.
Rabin, M. O. and D. Scott [1959]. “Finite automata and their decision problems,”
IBM J. Research and Development 3:2, pp. 115–125.
Thompson, K. [1968]. “Regular expression search algorithm,” Comm. ACM 11:6,
pp. 419–422.
CHAPTER 11
✦
✦ ✦
✦
Recursive Description of Patterns
In the last chapter, we saw two equivalent ways to describe patterns. One was
graph-theoretic, using the labels of paths in a kind of graph that we called an
“automaton.” The other was algebraic, using the regular expression notation. In
this chapter, we shall see a third way to describe patterns, using a form of recursive
definition called a “context-free grammar” (“grammar” for short).
One important application of grammars is the specification of programming
languages. Grammars are a succinct notation for describing the syntax of typical
programming languages; we shall see many examples in this chapter. Further, there
is a mechanical way to turn a grammar for a typical programming language into a
“parser,” one of the key parts of a compiler for the language. The parser uncovers
the structure of the source program, often in the form of an expression tree for each
statement in the program.
✦
✦ ✦
✦
11.1 What This Chapter Is About
✦ A proof that grammars are more powerful than regular expressions for describ-
ing languages (Section 11.8). First, we show that grammars are at least as
descriptive as regular expressions by showing how to simulate a regular ex-
pression with a grammar. Then we describe a particular language that can be
specified by a grammar, but by no regular expression.
✦
✦ ✦
✦
11.2 Context-Free Grammars
Arithmetic expressions can be defined naturally by a recursive definition. The
following example illustrates how the definition works. Let us consider arithmetic
expressions that involve
1. The four binary operators, +, −, ∗, and /,
2. Parentheses for grouping, and
3. Operands that are numbers.
The usual definition of such expressions is an induction of the following form:

BASIS. A number is an expression.

INDUCTION. If s and t are expressions, then the following are also expressions:
a) (s)
b) s+t
c) s-t
d) s*t
e) s/t
This induction defines a language, that is, a set of strings. The basis states
that any number is in the language. Rule (a) states that if s is a string in the
language, then so is the parenthesized string (s); this string is s preceded by a left
parenthesis and followed by a right parenthesis. Rules (b) to (e) say that if s and t
are two strings in the language, then so are the strings s+t, s-t, s*t, and s/t.
Grammars allow us to write down such rules succinctly and with a precise
meaning. As an example, we could write our definition of arithmetic expressions
with the grammar shown in Fig. 11.1.

(1) <Expression> → number
(2) <Expression> → ( <Expression> )
(3) <Expression> → <Expression> + <Expression>
(4) <Expression> → <Expression> - <Expression>
(5) <Expression> → <Expression> * <Expression>
(6) <Expression> → <Expression> / <Expression>

Fig. 11.1. A grammar for simple arithmetic expressions.
The symbols used in Fig. 11.1 require some explanation. The symbol
<Expression>
is called a syntactic category; it stands for any string in the language of arithmetic
expressions. The symbol → means “can be composed of.” For instance, rule (2) in
Fig. 11.1 states that an expression can be composed of a left parenthesis followed by
any string that is an expression followed by a right parenthesis. Rule (3) states that
an expression can be composed of any string that is an expression, the character +,
and any other string that is an expression. Rules (4) through (6) are similar to rule
(3).
Rule (1) is different because the symbol number on the right of the arrow is
not intended to be a literal string, but a placeholder for any string that can be
interpreted as a number. We shall later show how numbers can be defined gram-
matically, but for the moment let us imagine that number is an abstract symbol,
and expressions use this symbol to represent any atomic operand.
Notational Conventions
We denote syntactic categories by a name, in italics, surrounded by angular brackets,
for example, <Expression >. Terminals in productions will either be denoted by a
boldface x to stand for the string x (in analogy with the convention for regular
expressions), or by an italicized character string with no angular brackets, for the
case that the terminal, like number, is an abstract symbol.
We use the metasymbol ǫ to stand for an empty body. Thus, the production
<S> → ǫ means that the empty string is in the language of syntactic category <S>.
We sometimes group the bodies for one syntactic category into one production,
separating the bodies by the metasymbol |, which we can read as “or.” For example,
if we have productions
<S> → B1 , <S> → B2 , . . . , <S> → Bn
where the B’s are each the body of a production for the syntactic category <S>,
then we can write these productions as
<S> → B1 | B2 | · · · | Bn
<Digit > → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<Number> → <Digit >
<Number> → <Number> <Digit >
Note that, by our convention regarding the metasymbol |, the first line is short for
the ten productions
<Digit > → 0
<Digit > → 1
···
<Digit > → 9
We could similarly have combined the two productions for <Number> into one
line. Note that the first production for <Number > states that a single digit is a
number, and the second production states that any number followed by another
digit is also a number. These two productions together say that any string of one
or more digits is a number.
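For instance, 3 is a number, because 3 is a digit; then 35 is a number, because it
is the number 3 followed by the digit 5; and 352 is a number, because it is the
number 35 followed by the digit 2.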
Figure 11.2 is an expanded grammar for expressions, in which the abstract
terminal number has been replaced by productions that define the concept. Notice
that the grammar has three syntactic categories, <Expression>, <Number> and
<Digit >. We shall treat the syntactic category <Expression> as the start symbol;
it generates the strings (in this case, well-formed arithmetic expressions) that we
intend to define with the grammar. The other syntactic categories, <Number> and
<Digit >, stand for auxiliary concepts that are essential, but not the main concept
for which the grammar was written. ✦
<Balanced > → ǫ
<Balanced > → ( <Balanced > ) <Balanced >
There is another way that strings of balanced parentheses could be defined. If
we recall Section 2.6, our original motivation for describing such strings was that
they are the subsequences of parentheses that appear within expressions when we
delete all but the parentheses. Figure 11.1 gives us a grammar for expressions.
Consider what happens if we remove all terminals but the parentheses. Production
(1) becomes
<Expression> → ǫ
Production (2) becomes
<Expression> → ( <Expression> )
and productions (3) through (6) all become
<Expression> → <Expression> <Expression>
If we replace the syntactic category <Expression> by a more appropriate name,
<BalancedE >, we get another grammar for balanced strings of parentheses, shown
in Fig. 11.4. These productions are rather natural. They say that
1. The empty string is balanced,
2. If we parenthesize a balanced string, the result is balanced, and
3. The concatenation of balanced strings is balanced.
<BalancedE > → ǫ
<BalancedE > → ( <BalancedE > )
<BalancedE > → <BalancedE > <BalancedE >
The grammars of Figs. 11.3 and 11.4 look rather different, but they do define
the same set of strings. Perhaps the easiest way to prove that they do is to show that
the strings defined by <BalancedE > in Fig. 11.4 are exactly the “profile balanced”
strings defined in Section 2.6. There, we proved the same assertion about the strings
defined by <Balanced > in Fig. 11.3. ✦
✦ Example 11.3. We can also describe the structure of control flow in languages
like C grammatically. For a simple example, it helps to imagine that there are ab-
stract terminals condition and simpleStat. The former stands for a conditional ex-
pression. We could replace this terminal by a syntactic category, say <Condition >.
The productions for <Condition> would resemble those of our expression grammar
above, but with logical operators like &&, comparison operators like <, and the
arithmetic operators.
The terminal simpleStat stands for a statement that does not involve nested
control structure, such as an assignment, function call, read, write, or jump state-
ment. Again, we could replace this terminal by a syntactic category and the pro-
ductions to expand it.
We shall use <Statement> for our syntactic category of C statements. One
way statements can be formed is through the while-construct. That is, if we have a
statement to serve as the body of the loop, we can precede it by the keyword while
and a parenthesized condition to form another statement. The production for this
statement-formation rule is
<Statement> → while ( condition ) <Statement>
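For instance, since the statement serving as the body may itself be a while-
statement, two applications of this production build statements of the form
while ( condition ) while ( condition ) <Statement>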
EXERCISES
11.2.1: Give a grammar to define the syntactic category <Identifier>, for all those
strings that are C identifiers. You may find it useful to define some auxiliary
syntactic categories like <Digit >.
place of the terminal condition. You may use an abstract terminal comparison to
represent any comparison expression, such as x+1<y+z. Then replace comparison by
a syntactic category <Comparison> that expresses arithmetic comparisons in terms
of the comparison operators such as < and a syntactic category <Expression >. The
latter can be defined roughly as in the beginning of Section 11.2, but with additional
operators found in C, such as unary minus and %.
11.2.9*: Write productions that will define the syntactic category <SimpleStat >,
to replace the abstract terminal simpleStat in Fig. 11.5. You may assume the syn-
tactic category <Expression> stands for C arithmetic expressions. Recall that a
“simple statement” can be an assignment, function call, or jump, and that, techni-
cally, the empty string is also a simple statement.
✦
✦ ✦
✦
11.3 Languages from Grammars
A grammar is essentially an inductive definition involving sets of strings. The major
departure from the examples of inductive definitions seen in Section 2.6 and many
of the examples we had in Section 11.2 is that with grammars it is routine for several
syntactic categories to be defined by one grammar. In contrast, our examples of
Section 2.6 each defined a single notion. Nonetheless, the way we constructed the set
of defined objects in Section 2.6 applies to grammars. For each syntactic category
<S> of a grammar, we define a language L(<S>), as follows:
BASIS. Start by assuming that for each syntactic category <S> in the grammar,
the language L(<S>) is empty.
condenses the strings considerably. The shorthand uses the terminals w (while),
c (parenthesized condition), and s (simpleStat). The grammar uses the syntactic
category <S> for statements and the syntactic category <L> for statement lists.
The productions are shown in Fig. 11.6.
Let L be the language of strings in the syntactic category <L>, and let S
be the language of strings in the syntactic category <S>. Initially, by the basis
rule, both L and S are empty. In the first round, only productions (3) and (5)
are useful, because the bodies of all the other productions each have a syntactic
category, and we do not yet have any strings in the languages for the syntactic
categories. Production (3) lets us infer that s; is a string in the language S, and
production (5) tells us that ǫ is in language L.
The second round begins with L = {ǫ}, and S = {s;}. Production (1) now
allows us to add wcs; to S, since s; is already in S. That is, in the body of
production (1), terminals w and c can only stand for themselves, but syntactic
category <S> can be replaced by any string in the language S. Since at present,
string s; is the only member of S, we have but one choice to make, and that choice
yields the string wcs;.
Production (2) adds string {}, since terminals { and } can only stand for
themselves, but syntactic category <L> can stand for any string in language L. At
the moment, L has only ǫ.
Since production (3) has a body consisting of a terminal, it will never produce
any string other than s;, so we can forget this production from now on. Similarly,
production (5) will never produce any string other than ǫ, so we can ignore it on
this and future rounds.
Finally, production (4) produces string s; for L when we replace <L> by ǫ and
replace <S> by s;. At the end of round 2, the languages are S = {s;, wcs;, {}},
and L = {ǫ, s;}.
On the next round, we can use productions (1), (2), and (4) to produce new
strings. In production (1), we have three choices to substitute for <S>, namely s;,
wcs;, and {}. The first gives us a string for language S that we already have, but
the other two give us new strings wcwcs; and wc{}.
Production (2) allows us to substitute ǫ or s; for <L>, giving us old string {}
and new string {s;} for language S. In production (4), we can substitute ǫ or s;
for <L> and s;, wcs;, or {} for <S>, giving us for language L one old string, s;,
and the five new strings wcs;, {}, s;s;, s;wcs;, and s;{}.1
1 We are being extremely systematic about the way we substitute strings for syntactic cate-
gories. We assume that throughout each round, the languages L and S are fixed as they were
defined at the end of the previous round. Substitutions are made into each of the production
bodies. The bodies are allowed to produce new strings for the syntactic categories of the
heads, but we do not use the strings newly constructed from one production in the body
of another production on the same round. It doesn’t matter. All strings that are going to
be generated will eventually be generated on some round, regardless of whether or not we
immediately recycle new strings into the bodies or wait for the next round to use the new
strings.
The current languages are S = {s;, wcs;, {}, wcwcs;, wc{}, {s;}}, and
L = {ǫ, s;, wcs;, {}, s;s;, s;wcs;, s;{}}
We may proceed in this manner as long as we like. Figure 11.7 summarizes the first
three rounds. ✦
            S                        L
Round 1:    s;                       ǫ
Round 2:    wcs;, {}                 s;
Round 3:    wcwcs;, wc{}, {s;}       wcs;, {}, s;s;, s;wcs;, s;{}

Fig. 11.7. New strings added to S and L on each of the first three rounds.
As in Example 11.4, the language defined by a grammar may be infinite. When
a language is infinite, we cannot list every string. The best we can do is to enumerate
the strings by rounds, as we started to do in Example 11.4. Any string in the
language will appear on some round, but there is no round at which we shall have
produced all the strings. The set of strings that would ever be put into the language
of a syntactic category <S> forms the (infinite) language L(<S>).
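The round-by-round enumeration is mechanical enough to program directly. The following is a minimal C sketch for the grammar of Fig. 11.6 (the productions are hard-coded, ǫ is represented by the empty string, and the array bounds and helper names are choices of this sketch rather than anything from the text); it reproduces the sets summarized in Fig. 11.7:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX 1000

    char *S[MAX], *L[MAX];      /* the strings so far in the languages S and L */
    int nS = 0, nL = 0;

    /* add s to the set (set, n) if it is not already present */
    void add(char **set, int *n, const char *s)
    {
        for (int i = 0; i < *n; i++)
            if (strcmp(set[i], s) == 0) return;
        char *copy = malloc(strlen(s) + 1);
        strcpy(copy, s);
        set[(*n)++] = copy;
    }

    int main(void)
    {
        char buf[100];
        add(S, &nS, "s;");                      /* round 1, production (3) */
        add(L, &nL, "");                        /* round 1, production (5): epsilon */
        for (int round = 2; round <= 3; round++) {
            int oldS = nS, oldL = nL;           /* freeze last round's languages */
            for (int i = 0; i < oldS; i++) {    /* production (1): <S> -> wc<S> */
                sprintf(buf, "wc%s", S[i]);
                add(S, &nS, buf);
            }
            for (int i = 0; i < oldL; i++) {    /* production (2): <S> -> {<L>} */
                sprintf(buf, "{%s}", L[i]);
                add(S, &nS, buf);
            }
            for (int i = 0; i < oldL; i++)      /* production (4): <L> -> <L><S> */
                for (int j = 0; j < oldS; j++) {
                    sprintf(buf, "%s%s", L[i], S[j]);
                    add(L, &nL, buf);
                }
            printf("after round %d: S has %d strings, L has %d\n", round, nS, nL);
        }
        return 0;
    }

Freezing oldS and oldL at the top of each round implements the systematic substitution policy described in the footnote above; as the footnote observes, recycling new strings immediately would generate the same languages, only on earlier rounds.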
EXERCISES
11.3.1: What new strings are added on the fourth round in Example 11.4?
11.3.2*: On the ith round of Example 11.4, what is the length of the shortest
string that is new for either of the syntactic categories? What is the length of the
longest new string for
a) <S>
b) <L>?
11.3.3: Using the grammar of
a) Fig. 11.3
b) Fig. 11.4
generate strings of balanced parentheses by rounds. Do the two grammars generate
the same strings on the same rounds?
11.3.4: Suppose that each production with some syntactic category <S> as its
head also has <S> appearing somewhere in its body. Why is L(<S>) empty?
11.3.5*: When generating strings by rounds, as described in this section, the only
new strings that can be generated for a syntactic category <S> are found by making
a substitution for the syntactic categories of the body of some production for <S>,
such that at least one substituted string was newly discovered on the previous round.
Explain why the italicized condition is correct.
11.3.6**: Suppose we want to tell whether a particular string s is in the language
of some syntactic category <S>.
a) Explain why, if on some round, all the new strings generated for any syntactic
category are longer than s, and s has not already been generated for L(<S>),
then s cannot ever be put in L(<S>). Hint : Use Exercise 11.3.5.
b) Explain why, after some finite number of rounds, we must fail to generate any
new strings that are as short as or shorter than s.
c) Use (a) and (b) to develop an algorithm that takes a grammar, one of its
syntactic categories <S>, and a string of terminals s, and tells whether s is in
L(<S>).
✦
✦ ✦
✦
11.4 Parse Trees
As we have seen, we can discover that a string s belongs to the language L(<S>),
for some syntactic category <S>, by the repeated application of productions. We
start with some strings derived from basis productions, those that have no syntactic
category in the body. We then “apply” productions to strings already derived for
the various syntactic categories. Each application involves substituting strings for
occurrences of the various syntactic categories in the body of the production, and
thereby constructing a string that belongs to the syntactic category of the head.
Eventually, we construct the string s by applying a production with <S> at the
head.
It is often useful to draw the “proof” that s is in L(<S>) as a tree, which
we call a parse tree. The nodes of a parse tree are labeled, either by terminals, by
syntactic categories, or by the symbol ǫ. The leaves are labeled only by terminals
or ǫ, and the interior nodes are labeled only by syntactic categories.
Every interior node v represents the application of a production. That is, there
must be some production such that
1. The syntactic category labeling v is the head of the production, and
2. The labels of the children of v, from the left, form the body of the production.
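A parse-tree node translates naturally into a C structure. The following minimal sketch shows one plausible representation; the type and field names, and the fixed bound on children, are assumptions of the sketch:

    #include <stdlib.h>

    #define MAXCHILDREN 10

    struct TREENODE {
        const char *label;       /* a terminal, a syntactic category, or "e" for epsilon */
        int nChildren;           /* 0 for a leaf */
        struct TREENODE *children[MAXCHILDREN];   /* subtrees, from the left */
    };

    /* create a leaf labeled with a terminal or epsilon */
    struct TREENODE *makeLeaf(const char *label)
    {
        struct TREENODE *n = malloc(sizeof(struct TREENODE));
        n->label = label;
        n->nChildren = 0;
        return n;
    }

An interior node would be built by allocating a node, setting its label to the head of a production, and filling children with the roots of the subtrees for the body, from the left.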
✦ Example 11.5. Figure 11.8 is an example of a parse tree, based on the grammar
of Fig. 11.2. However, we have abbreviated the syntactic categories <Expression>,
<Number>, and <Digit> to <E>, <N >, and <D>, respectively. The string
represented by this parse tree is 3*(2+14).
For example, the root and its children represent the production <E> → <E> * <E>. ✦

Fig. 11.8. Parse tree for the string 3 ∗ (2 + 14) using the grammar from Fig. 11.2.
The yield of a parse tree is the string formed by concatenating the labels of its leaves, from the left. We can construct the parse trees of a grammar recursively.

BASIS. For every terminal of the grammar, say x, there is a tree with one node
labeled x. This tree has yield x, of course.

INDUCTION. Suppose <S> → X1 X2 · · · Xn is a production, and for each Xi we already have a parse tree Ti whose root is labeled Xi ; if Xi is a terminal, Ti is the one-node tree labeled Xi . Then, as suggested by Fig. 11.10, we may construct a parse tree whose root is labeled <S> and whose children, from the left, are the roots of T1 , T2 , . . . , Tn .

Fig. 11.10. Constructing a parse tree using a production and other parse trees.
✦ Example 11.6. Let us follow the construction of the parse tree in Fig. 11.8, and
see how its construction mimics a proof that the string 3*(2+14) is in L(<E>).
First, we can construct a one-node tree for each of the terminals in the tree. Then
the group of productions on line (1) of Fig. 11.2 says that each of the ten digits is a
string of length 1 belonging to L(<D>). We use four of these productions to create
the four trees shown in Fig. 11.11. For instance, we use the production <D> →1
to create the parse tree in Fig. 11.11(a) as follows. We create a tree with a single
node labeled 1 for the symbol 1 in the body. Then we create a node labeled <D>
as the root and give it one child, the root (and only node) of the tree selected for 1.
Our next step is to use production (2) of Fig. 11.2, or <N > → <D>, to
discover that digits are numbers. For instance, we may choose the tree of Fig.
11.11(a) to substitute for <D> in the body of production (2), and get the tree of
Fig. 11.12(a). The other two trees in Fig. 11.12 are produced similarly.
Fig. 11.11. Parse trees constructed using production <D> → 1 and similar productions.

Fig. 11.12. Parse trees constructed using production <N > → <D>.
Now we can use production (3), which is <N > → <N ><D>. For <N > in the
body we shall select the tree of Fig. 11.12(a), and for <D> we select Fig. 11.11(d).
We create a new node labeled by <N >, for the head, and give it two children, the
roots of the two selected trees. The resulting tree is shown in Fig. 11.13. The yield
of this tree is the number 14.
Fig. 11.13. Parse tree constructed using production <N > → <N ><D>.
Our next task is to create a tree for the sum 2+14. First, we use the production
(4), or <E> → <N >, to build the parse trees of Fig. 11.14. These trees show that
3, 2, and 14 are expressions. The first of these comes from selecting the tree of Fig.
11.12(c) for <N > of the body. The second is obtained by selecting the tree of Fig.
11.12(b) for <N >, and the third by selecting the tree of Fig. 11.13.
Then we use production (6), which is <E> → <E>+<E>. For the first <E>
in the body we use the tree of Fig. 11.14(b), and for the second <E> in the body
we use the tree of Fig. 11.14(c). For the terminal + in the body, we use a one-node
tree with label +. The resulting tree is shown in Fig. 11.15; its yield is 2+14.
Fig. 11.14. Parse trees constructed using production <E> → <N >.
Fig. 11.15. Parse tree constructed using production <E> → <E>+<E>; its yield is 2+14.
We next use production (5), or <E> → (<E>), to construct the parse tree of
Fig. 11.16. We have simply selected the parse tree of Fig. 11.15 for the <E> in the
body, and we select the obvious one-node trees for the terminal parentheses.
Lastly, we use production (8), which is <E> → <E> * <E>, to construct the
parse tree that we originally showed in Fig. 11.8. For the first <E> in the body,
we choose the tree of Fig. 11.14(a), and for the second we choose the tree of Fig.
11.16. ✦
Fig. 11.16. Parse tree constructed using production <E> → (<E>); its yield is (2+14).
The parse trees of a grammar and the languages of its syntactic categories are closely related. In particular:

1. If T is a parse tree with root labeled <S> and yield s, then string s is in the
language L(<S>).
2. If string s is in L(<S>), then there is a parse tree with yield s and root labeled
<S>.
This equivalence should be fairly intuitive. Roughly, parse trees are assembled from
smaller parse trees in the same way that we assemble long strings from shorter ones,
using substitution for syntactic categories in the bodies of productions. We begin
with part (1), which we prove by complete induction on the height of tree T .
BASIS. Suppose the height of the parse tree is 1. Then the tree looks like Fig. 11.17,
or, in the special case where n = 0, like the tree of Fig. 11.9. The only way we can
construct such a tree is if there is a production <S> → x1 x2 · · · xn , where each of
the x’s is a terminal (if n = 0, the production is <S> → ǫ). Thus, x1 x2 · · · xn is a
string in L(<S>).
Fig. 11.17. A parse tree of height 1, with root <S> and yield x1 x2 · · · xn .
INDUCTION. Suppose that statement (1) holds for all trees of height k or less.
Now consider a tree of height k + 1 that looks like Fig. 11.10. Then each of the
subtrees Ti , for i = 1, 2, . . . , n, can be of height at most k. For if any one of the
subtrees had height k + 1 or more, the entire tree would have height at least k + 2.
Thus, the inductive hypothesis applies to each of the trees Ti .
By the inductive hypothesis, if Xi , the root of the subtree Ti , is a syntactic
category, then the yield of Ti , say si , is in the language L(Xi ). If Xi is a terminal,
let us define string si to be Xi . Then the yield of the entire tree is s1 s2 · · · sn .
We know that <S> → X1 X2 · · · Xn is a production, by the definition of a
parse tree. Suppose that we substitute string si for Xi , whenever Xi is a syntactic
category. By definition, Xi is si if Xi is a terminal. It follows that the substituted
body is s1 s2 · · · sn , the same as the yield of the tree. By the inductive rule for the
language of <S>, we know that s1 s2 · · · sn is in L(<S>).
Now we must prove statement (2), that every string s in a syntactic category
<S> has a parse tree with root <S> and s as yield. To begin, let us note that
for each terminal x, there is a parse tree with both root and yield x. Now we use
complete induction on the number of times we applied the inductive step (described
in Section 11.3) when we deduced that s is in L(<S>).
BASIS. Suppose s requires one application of the inductive step to show that s is
in L(<S>). Then there must be a production <S> → x1 x2 · · · xn , where all the
x’s are terminals, and s = x1 x2 · · · xn . We know that there is a one node parse
tree labeled xi for i = 1, 2, . . . , n. Thus, there is a parse tree with yield s and root
labeled <S>; this tree looks like Fig. 11.17. In the special case that n = 0, we know
s = ǫ, and we use the tree of Fig. 11.9 instead.
INDUCTION. Suppose that any string t found to be in the language of any syntactic
category <T > by k or fewer applications of the inductive step has a parse tree with
t as yield and <T > at the root. Consider a string s that is found to be in the
language of syntactic category <S> by k + 1 applications of the inductive step.
Then there is a production <S> → X1 X2 · · · Xn , and s = s1 s2 · · · sn , where each
substring si is either
1. Xi , if Xi is a terminal, or
2. a string found to be in the language L(Xi ) by at most k applications of the inductive step, if Xi is a syntactic category.

Thus, for each i, we can find a tree Ti , with yield si and root labeled Xi . If Xi is a
syntactic category, we invoke the inductive hypothesis to claim that Ti exists, and
if Xi is a terminal, we do not need the inductive hypothesis to claim that there is
a one-node tree labeled Xi . Thus, the tree constructed as in Fig. 11.10 has yield s
and root labeled <S>, proving the induction step.
EXERCISES
11.4.1: Find a parse tree for the strings
a) 35+21
b) 123-(4*5)
c) 1*2*(3-4)
according to the grammar of Fig. 11.2. The syntactic category at the root should
be <E> in each case.
11.4.2: Using the statement grammar of Fig. 11.6, find parse trees for the following
strings:
a) wcwcs;
b) {s;}
c) {s;wcs;}.
The syntactic category at the root should be <S> in each case.
11.4.3: Using the balanced parenthesis grammar of Fig. 11.3, find parse trees for
the following strings:
a) (()())
b) ((()))
c) ((())()).
11.4.4: Find parse trees for the strings of Exercise 11.4.3, using the grammar of
Fig. 11.4.
✦
✦ ✦
✦
11.5 Ambiguity and the Design of Grammars
Let us consider the grammar for balanced parentheses that we originally showed in
Fig. 11.4, with syntactic category <B> abbreviating <Balanced >:
<B> → (<B>) | <B><B> | ǫ (11.1)
Suppose we want a parse tree for the string ()()(). Two such parse trees are shown
in Fig. 11.18, one in which the first two pairs of parentheses are grouped first, and
the other in which the second two pairs are grouped first.
Fig. 11.18. Two parse trees with the same yield and root: (a) a parse tree that groups from the left; (b) a parse tree that groups from the right.
It should come as no surprise that these two parse trees exist. Once we es-
tablish that both () and ()() are balanced strings of parentheses, we can use the
production <B> → <B><B> with () substituting for the first <B> in the body
and ()() substituting for the second, or vice-versa. Either way, the string ()()()
is discovered to be in the syntactic category <B>.
A grammar in which there are two or more parse trees with the same yield and the same syntactic category labeling the root is said to be ambiguous. Notice that
not every string has to be the yield of several parse trees; it is sufficient that there
be even one such string, to make the grammar ambiguous. For example, the string
()()() is sufficient for us to conclude that the grammar (11.1) is ambiguous. A
grammar that is not ambiguous is called unambiguous. In an unambiguous gram-
mar, for every string s and syntactic category <S>, there is at most one parse tree
with yield s and root labeled <S>.
An example of an unambiguous grammar is that of Fig. 11.3, which we repro-
duce here with <B> in place of <Balanced >,
<B> → (<B>)<B> | ǫ (11.2)
A proof that the grammar is unambiguous is rather difficult. In Fig. 11.19 is the
unique parse tree for string ()()(); the fact that this string has a unique parse tree
does not prove the grammar (11.2) is unambiguous, of course. We can only prove
unambiguity by showing that every string in the language has a unique parse tree.
Fig. 11.19. The unique parse tree for the string ( ) ( ) ( ) using the grammar (11.2).
Ambiguity in Expressions
While the grammar of Fig. 11.4 is ambiguous, there is no great harm in its ambiguity,
because whether we group several strings of balanced parentheses from the left or
the right matters little. When we consider grammars for expressions, such as that
of Fig. 11.2 in Section 11.2, some more serious problems can occur. Specifically,
some parse trees imply the wrong value for the expression, while others imply the
correct value.
✦ Example 11.7. Let us use the shorthand notation for the expression grammar
that was developed in Example 11.5. Then consider the expression 1-2+3. It has
two parse trees, depending on whether we group operators from the left or the right.
These parse trees are shown in Fig. 11.20(a) and (b).
Fig. 11.20. Two parse trees for the string 1-2+3: (a) the correct parse tree, which groups from the left; (b) the incorrect parse tree, which groups from the right.
The tree of Fig. 11.20(a) associates from the left, and therefore groups the
operands from the left. That grouping is correct, since we generally group operators
at the same precedence from the left; 1-2+3 is conventionally interpreted as (1-2)+3, which has the value 2. If we evaluate the expressions represented by subtrees,
working up the tree of Fig. 11.20(a), we first compute 1 − 2 = −1 at the leftmost
child of the root, and then compute −1 + 3 = 2 at the root.
On the other hand, Fig. 11.20(b), which associates from the right, groups our
expression as 1-(2+3), whose value is −4. This interpretation of the expression
is unconventional, however. The value −4 is obtained working up the tree of Fig.
11.20(b), since we evaluate 2 + 3 = 5 at the rightmost child of the root, and then
1 − 5 = −4 at the root. ✦
Associating operators of equal precedence from the wrong direction can cause
problems. We also have problems with operators of different precedence; it is pos-
sible to group an operator of low precedence before one of higher precedence, as we
see in the next example.
✦ Example 11.8. Consider the expression 1+2*3. In Fig. 11.21(a) we see the
expression incorrectly grouped from the left, while in Fig. 11.21(b), we have correctly
grouped the expression from the right, so that the multiplication gets its operands
grouped before the addition. The former grouping yields the erroneous value 9,
while the latter grouping produces the conventional value of 7. ✦
Fig. 11.21. Two parse trees for the string 1+2*3: (a) the incorrect parse tree, grouping 1+2 first; (b) the correct parse tree, grouping 2*3 first.
Figure 11.22 is an unambiguous grammar for arithmetic expressions that enforces the conventional grouping. It uses syntactic categories <E> for expressions, <T > for terms, and <F > for factors.

    (1) <E> → <E> + <T > | <E> − <T > | <T >
    (2) <T > → <T > ∗ <F > | <T >/<F > | <F >
    (3) <F > → (<E>) | <N >
    (4) <N > → <N ><D> | <D>
    (5) <D> → 0 | 1 | · · · | 9

Fig. 11.22. An unambiguous grammar for arithmetic expressions.
For instance, the three productions in line (1) define an expression to be either
a smaller expression followed by a + or - and another term, or to be a single term.
If we put these ideas together, the productions say that every expression is a term,
followed by zero or more pairs, each pair consisting of a + or - and a term. Similarly,
line (2) says that a term is either a smaller term followed by * or / and a factor, or it
is a single factor. That is, a term is a factor followed by zero or more pairs, each pair
consisting of a * or a / and a factor. Line (3) says that factors are either numbers,
or expressions surrounded by parentheses. Lines (4) and (5) define numbers and
digits as we have done previously.
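This reading — an expression is a term followed by zero or more pairs — can be turned directly into code. Here is a minimal C sketch of an evaluator patterned on the grammar of Fig. 11.22 (error handling is omitted, and rendering each level as a loop rather than as literal left recursion is a choice of this sketch, not the grammar's); the loops produce grouping from the left, and the expression/term/factor split produces the precedence:

    #include <stdio.h>
    #include <ctype.h>

    const char *p;               /* input cursor */

    int expr(void);              /* one function per syntactic category */
    int term(void);
    int factor(void);

    int number(void)             /* lines (4)-(5): one or more digits */
    {
        int v = 0;
        while (isdigit((unsigned char)*p))
            v = 10 * v + (*p++ - '0');
        return v;
    }

    int factor(void)             /* line (3): ( <E> ) or <N> */
    {
        if (*p == '(') {
            p++;                 /* consume ( */
            int v = expr();
            p++;                 /* consume ) */
            return v;
        }
        return number();
    }

    int term(void)               /* line (2): factor, then zero or more (*|/) factor */
    {
        int v = factor();
        while (*p == '*' || *p == '/') {
            char op = *p++;
            int w = factor();
            v = (op == '*') ? v * w : v / w;   /* left grouping: (1/2)*3 */
        }
        return v;
    }

    int expr(void)               /* line (1): term, then zero or more (+|-) term */
    {
        int v = term();
        while (*p == '+' || *p == '-') {
            char op = *p++;
            int w = term();
            v = (op == '+') ? v + w : v - w;   /* left grouping: (1-2)+3 */
        }
        return v;
    }

    int main(void)
    {
        p = "1-2+3";
        printf("%d\n", expr());  /* prints 2, the conventional value */
        return 0;
    }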
The fact that in lines (1) and (2) we use productions such as
<E> → <E> + <T >
rather than the seemingly equivalent <E> → <T > + <E>, forces terms to be
grouped from the left. Thus, we shall see that an expression such as 1-2+3 is
correctly grouped as (1-2)+3. Likewise, terms such as 1/2*3 are correctly grouped
as (1/2)*3, rather than the incorrect 1/(2*3). Figure 11.23 shows the only possible
parse tree for the expression 1-2+3 in the grammar of Fig. 11.22. Notice that 1-2
must be grouped as an expression first. If we had grouped 2+3 first, as in Fig.
11.20(b), there would be no way, in the grammar of Fig. 11.22, to attach the 1- to
this expression.
The distinction among expressions, terms, and factors enforces the correct
grouping of operators at different levels of precedence. For example, the expres-
sion 1+2*3 has only the parse tree of Fig. 11.24, which groups the subexpression
2*3 first, like the tree of Fig. 11.21(b) and unlike the incorrect tree of Fig. 11.21(a),
which groups 1+2 first.
As for the matter of balanced parentheses, we have not proved that the gram-
mar of Fig. 11.22 is unambiguous. The exercises contain a few more examples that
should help convince the reader that this grammar is not only unambiguous, but
gives the correct grouping for each expression. We also suggest how the idea of this
grammar can be extended to more general families of expressions.
Fig. 11.24. Parse tree for 1 + 2 ∗ 3 in the unambiguous grammar of Fig. 11.22.
EXERCISES
11.5.1: In the grammar of Fig. 11.22, give the unique parse tree for each of the
following expressions:
a) (1+2)/3
b) 1*2-3
c) (1+2)*(3+4)
11.5.2*: The expressions of the grammar in Fig. 11.22 have two levels of precedence;
+ and − at one level, and ∗ and / at a second, higher level. In general, we can handle
expressions with k levels of precedence by using k + 1 syntactic categories. Modify
the grammar of Fig. 11.22 to include the exponentiation operator ^, which is at a
level of precedence higher than * and /. As a hint, define a primary to be an operand
or a parenthesized expression, and redefine a factor to be one or more primaries
connected by the exponentiation operator. Note that exponentiation groups from
the right, not the left, and 2^3^4 means 2^(3^4), rather than (2^3)^4. How do
we force grouping from the right among primaries?
11.5.3*: Extend the unambiguous expression grammar to allow the comparison
operators, =, <=, and so on, which are all at the same level of precedence and
left-associative. Their precedence is below that of + and −.
11.5.4: Extend the expression grammar of Fig. 11.22 to include the unary minus
sign. Note that this operator is at a higher precedence than the other operators;
for instance, -2*-3 is grouped (-2)*(-3).
11.5.5: Extend your grammar of Exercise 11.5.3 to include the logical operators
&&, ||, and !. Give && the precedence of *, || the precedence of +, and ! a higher
precedence than unary −. && and || are binary operators that group from the left.
11.5.6*: Not every expression has more than one parse tree according to the am-
biguous grammar of Fig. 11.2 in Section 11.2. Give several examples of expressions
that have unique parse trees according to this grammar. Can you give a rule indi-
cating when an expression will have a unique parse tree?
11.5.7: The following grammar defines the set of strings (other than ǫ) consisting
of 0’s and 1’s only.
<String> → <String><String> | 0 | 1
In this grammar, how many parse trees does the string 010 have?
11.5.8: Give an unambiguous grammar that defines the same language as the
grammar of Exercise 11.5.7.
11.5.9*: How many parse trees does grammar (11.1) have for the empty string?
Show three different parse trees for the empty string.
✦
✦ ✦
✦
11.6 Constructing Parse Trees
Grammars are like regular expressions, in that both notations describe languages
but do not give directly an algorithm for determining whether a string is in the
language defined. There are general algorithms that can parse according to any grammar, but they are beyond the scope of this book.
Recursive-Descent Parsing
What we shall give instead is a simpler but less powerful parsing technique called
“recursive descent,” in which the grammar is replaced by a collection of mutually
recursive functions, each corresponding to one of the syntactic categories of the
grammar. The goal of the function S that corresponds to the syntactic category
<S> is to read a sequence of input characters that form a string in the language
L(<S>), and to return a pointer to the root of a parse tree for this string.
A production’s body can be thought of as a sequence of goals — the terminals
and syntactic categories — that must be fulfilled in order to find a string in the
syntactic category of the head. For instance, consider the unambiguous grammar
for balanced parentheses, which we reproduce here as Fig. 11.25.
    (1) <B> → ǫ
    (2) <B> → ( <B> ) <B>

Fig. 11.25. Grammar for balanced strings of parentheses.
Production (2) states that one way to find a string of balanced parentheses is
to fulfill the following four goals in order.
1. Find the character (, then
2. Find a string of balanced parentheses, then
3. Find the character ), and finally
4. Find another string of balanced parentheses.
In general, a terminal goal is satisfied if we find that this terminal is the next input
symbol, but the goal cannot be satisfied if the next input symbol is something else.
To tell whether a syntactic category in the body is satisfied, we call a function for
that syntactic category.
The arrangement for constructing parse trees according to a grammar is sug-
gested in Fig. 11.26. Suppose we want to determine whether the sequence of ter-
minals X1 X2 · · · Xn is a string in the syntactic category <S>, and to find its parse
tree if so. Then on the input file we place X1 X2 · · · Xn ENDM, where ENDM is a special symbol that is not a terminal.2 We call ENDM the endmarker; its purpose is to
2 In real compilers for programming languages, the entire input might not be placed in a file
at once, but terminals would be discovered one at a time by a preprocessor called a “lexical
analyzer” that examines the source program one character at a time.
indicate that the entire string being examined has been read. For example, in C
programs it would be typical to use the end-of-file or end-of-string character for the
endmarker.
Fig. 11.26. Parsing begins by calling the function S with the input cursor at X1 , the first symbol of the input X1 X2 · · · Xn ENDM.
An input cursor marks the terminal to be processed, the current terminal. If
the input is a string of characters, then the cursor might be a pointer to a character.
We start our parsing program by calling the function S for the starting syntactic
category <S>, with the input cursor at the beginning of the input.
Each time we are working on a production body, and we come to a terminal a
in the production, we look for the matching terminal a at the position indicated by
the input cursor. If we find a, we advance the input cursor to the next terminal on
the input. If the current terminal is something other than a, then we fail to match,
and we cannot find a parse tree for the input string.
On the other hand, if we are working on a production body and we come to
a syntactic category <T >, we call the function T for <T >. If T “fails,” then the
entire parse fails, and the input is deemed not to be in the language being parsed.
If T succeeds, then it "consumes" some input, by moving the input cursor forward
zero or more positions on the input. All input positions, from the position at the
time T was called, up to but not including the position at which T leaves the cursor,
are consumed. T also returns a tree, which is the parse tree for the consumed input.
When we have succeeded with each of the symbols in a production body, we
assemble the parse tree for the portion of the input represented by that production.
To do so, we create a new root node, labeled by the head of the production. The
root’s children are the roots of the trees returned by successful calls to functions for
the syntactic categories of the body and leaves created for each of the terminals of
the body.
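Figure 11.27(a), which declares the type TREE, the value FAILED, the cursor nextTerminal, and the makeNode helpers used below, is not reproduced here. The following is one plausible reconstruction, constrained only by the calls that appear in Fig. 11.27(b); every detail of it is an assumption of this sketch:

    #include <stdlib.h>

    typedef struct NODE *TREE;
    struct NODE {
        char label;            /* 'B', '(', ')', or 'e' for epsilon */
        TREE child[4];         /* at most four children are ever needed */
        int nChildren;
    };

    #define FAILED NULL        /* assumption: a null TREE signals failure */

    char *nextTerminal;        /* the input cursor of Fig. 11.26 */

    static TREE node(char label, int n, TREE c0, TREE c1, TREE c2, TREE c3)
    {
        TREE t = malloc(sizeof(struct NODE));
        t->label = label; t->nChildren = n;
        t->child[0] = c0; t->child[1] = c1; t->child[2] = c2; t->child[3] = c3;
        return t;
    }

    /* the helpers called in Fig. 11.27(b): nodes with 0, 1, or 4 children */
    TREE makeNode0(char label)         { return node(label, 0, 0, 0, 0, 0); }
    TREE makeNode1(char label, TREE c) { return node(label, 1, c, 0, 0, 0); }
    TREE makeNode4(char label, TREE c1, TREE c2, TREE c3, TREE c4)
                                       { return node(label, 4, c1, c2, c3, c4); }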
     TREE B()
     {
(1)      TREE firstB, secondB;
(2)      if(*nextTerminal == '(') /* follow production 2 */ {
(3)          nextTerminal++;
(4)          firstB = B();
(5)          if(firstB != FAILED && *nextTerminal == ')') {
(6)              nextTerminal++;
(7)              secondB = B();
(8)              if(secondB == FAILED)
(9)                  return FAILED;
                 else
(10)                 return makeNode4('B',
                         makeNode0('('),
                         firstB,
                         makeNode0(')'),
                         secondB);
             }
             else /* first call to B failed */
(11)             return FAILED;
         }
         else /* follow production 1 */
(12)         return makeNode1('B', makeNode0('e'));
     }

Fig. 11.27(b). Function to construct parse trees for strings of balanced parentheses.
If the test of line (8) fails, then the second call to B, at line (7), has succeeded in finding a balanced string. We return a tree constructed by makeNode4. This tree has a root labeled 'B', and four children. The first child is a leaf labeled (, constructed by makeNode0. The second is the tree we stored in firstB, which is the parse tree produced by the call to B at line (4). The third child is a leaf labeled ), and the fourth is the parse tree stored in secondB, which was returned by the second call to B at line (7).

Line (11) is used only when the test of line (5) fails. Finally, line (12) handles the case where the original test of line (2) fails to find ( as the first character. In
that case, we assume that production (1) is correct. This production has the body
ǫ, and so we consume no input but return a node, created by makeNode1, that has
the label B and one child labeled ǫ.
✦ Example 11.9. Let us trace the calls made to B on the input ( ) ( ) ENDM; the five calls are summarized in Fig. 11.28.

Fig. 11.28. The calls made to B on the input ( ) ( ) ENDM.

In call 1, the test of line (2) succeeds, since the first input symbol is (. We advance nextTerminal at line (3) and call B at line (4); this call is "call 2" in Fig. 11.28.
In call 2, the test of line (2) fails, and we thus return the tree of Fig. 11.29(a)
at line (12). Now we return to call 1, where we are at line (5), with ) pointed to
by nextTerminal and the tree of Fig. 11.29(a) in firstB. Thus, the test of line (5)
succeeds. We advance nextTerminal at line (6) and call B at line (7). This is “call
3” in Fig. 11.28.
In call 3 we succeed at line (2), advance nextTerminal at line (3), and call B
at line (4); this call is “call 4” in Fig. 11.28. As with call 2, call 4 fails the test of
line (2) and returns a (distinct) tree like that of Fig. 11.29(a) at line (12).
We now return to call 3, with nextTerminal still pointing to ), with firstB
(local to this call of B) holding a tree like Fig. 11.29(a), and with control at line
(5). The test succeeds, and we advance nextTerminal at line (6), so it now points
to ENDM. We make the fifth call to B at line (7). This call has its test fail at line (2)
and returns another copy of Fig. 11.29(a) at line (12). This tree becomes the value
of secondB for call 3, and the test of line (8) fails. Thus, at line (10) of call 3, we
construct the tree shown in Fig. 11.29(b).
At this point, call 3 returns successfully to call 1 at line (8), with secondB
of call 1 holding the tree of Fig. 11.29(b). As in call 3, the test of line (8) fails,
and at line (10) we construct a tree with a new root node, whose second child is a
copy of the tree in Fig. 11.29(a) — this tree was held in firstB of call 1 — and
whose fourth child is the tree of Fig. 11.29(b). The resulting tree, which is placed
in parseTree by main, is shown in Fig. 11.29(c). ✦
Fig. 11.29. Trees constructed by recursive calls to B: (a) the tree for ǫ; (b) the tree for ( ); (c) the tree for ( ) ( ).
In general, the function for a syntactic category <S> selects a production <S> → X1 X2 · · · Xn , based on the lookahead, and considers each of the Xi 's in turn, matching terminals against the input and calling the corresponding function for each syntactic category. If we have not failed after considering all the Xi 's, then we assemble a parse tree to
return by creating a new node, with children corresponding to X1 , X2 , . . . , Xn ,
in order. If Xi is a terminal, then the child for Xi is a newly created leaf with
label Xi . If Xi is a syntactic category, then the child for Xi is the root of
the tree that was returned when a call to the function for Xi was completed.
Figure 11.29 was an example of this tree construction.
If the syntactic category <S> represents the language whose strings we want
to recognize and parse, then we start the parsing process by placing the input cursor
at the first input terminal. A call to the function S will cause a parse tree for the
input to be constructed if there is one and will return failure if the input is not in
the language L(<S>).
EXERCISES
11.6.1: Show the sequence of calls made by the program of Fig. 11.27 on the inputs
a) (())
b) (()())
c) ())(
in each case followed by the endmarker symbol ENDM.
11.6.2: Consider the following grammar for numbers.
<Number> → <Digit ><Number> | ǫ
<Digit > → 0 | 1 | · · · | 9
Design a recursive-descent parser for this grammar; that is, write a pair of functions,
one for <Number > and the other for <Digit >. You may follow the style of Fig.
11.27 and assume that there are functions like makeNode1 that return trees with
the root having a specified number of children.
11.6.3**: Suppose we had written the productions for <Number > in Exercise
11.6.2 as
<Number> → <Digit ><Number> | <Digit >
or as
<Number> → <Number><Digit > | ǫ

In each case, could we still construct a recursive-descent parser? Why or why not?
11.6.4*: The grammar in Fig. 11.30 defines nonempty lists, which are elements
separated by commas and surrounded by parentheses. An element can be either an
atom or a list structure. Here, <E> stands for element, <L> for list, and <T >
for “tail,” that is, either a closing ), or pairs of commas and elements ended by ).
Write a recursive-descent parser for the grammar of Fig. 11.30.
✦
✦ ✦
✦
11.7 A Table-Driven Parsing Algorithm
As we have seen in Section 6.7, recursive function calls are normally implemented
by a stack of activation records. As the functions in a recursive-descent parser do
something very specific, it is possible to replace them by a single function that
examines a table and manipulates a stack itself.
Remember that the function S for a syntactic category <S> first decides what
production to use, then goes through a sequence of steps, one for each symbol in
the body of the selected production. Thus, we can maintain a stack of grammar
symbols that roughly corresponds to the stack of activation records. However,
both terminals and syntactic categories are placed on the stack. When a syntactic
category <S> is on top of the stack, we first determine the correct production.
Then we replace <S> by the body of the selected production, with the left end at
the top of the stack. When a terminal is at the top of the stack, we make sure it
matches the current input symbol. If so, we pop the stack and advance the input
cursor.
To see intuitively why this arrangement works, suppose that a recursive-descent
parser has just called S, the function for syntactic category <S>, and the selected
production has body a<B><C>. Then there would be four times when this acti-
vation record for S is active.
1. When it checks for a on the input,
2. When it makes the call to B,
3. When that call returns and C is called, and
4. When the call to C returns and S is finished.
If, in the table-driven parser, we immediately replace <S> by the symbols of
the body, a<B><C> in this example, then the stack will expose these symbols at
the same points on the input when control returns to the corresponding activation
of S in the recursive-descent parser.
1. The first time, a is exposed, and we check for a on the input, just as function
S would.
2. The second time, which occurs immediately afterward, S would call B, but we
have <B> at the top of the stack, which will cause the same action.
3. The third time, S calls C, but we find <C> on top of the stack and do the
same.
4. The fourth time, S returns, and we find no more of the symbols by which <S>
was replaced. Thus, the symbol below the point on the stack that formerly held
<S> is now exposed. Analogously, the activation record below S’s activation
record would receive control in the recursive-descent parser.
Parsing Tables
As an alternative to writing a collection of recursive functions, we can construct a
parsing table, whose rows correspond to the syntactic categories, and whose columns
correspond to the possible lookahead symbols. The entry in the row for syntactic
category <S> and lookahead symbol X is the number of the production with head
<S> that must be used to expand <S> if the lookahead is X.
Certain entries of the parse table are left blank. Should we find that syntactic
category <S> needs to be expanded, and the lookahead is X, but the entry in the
row for <S> and the column for X is blank, then the parse has failed. In this case,
we can be sure that the input is not in the language.
✦ Example 11.10. In Fig. 11.31 we see the parsing table for the grammar of
Fig. 11.25, the unambiguous grammar for balanced parentheses. This parsing table
is rather simple, because there is only one syntactic category. The table expresses
the same strategy that we used in our running example of Section 11.6. Expand
by production (2), or <B> → (<B>)<B>, if the lookahead is (, and expand by
production (1), or <B> → ǫ, otherwise. We shall see shortly how parsing tables
such as this one are used. ✦
                 (     )     ENDM
    <B>          2     1     1

Fig. 11.31. Parsing table for the grammar of Fig. 11.25.

✦ Example 11.11. Figure 11.33 shows a grammar for the statement language of Fig. 11.6, rewritten so that it can be parsed by the methods of this section; Fig. 11.32 is its parsing table.

    (1) <S> → wc<S>
    (2) <S> → {<T >
    (3) <S> → s;
    (4) <T > → <S><T >
    (5) <T > → }

Fig. 11.33. A statement grammar suitable for recursive-descent parsing.

                 w     c     {     }     s     ;     ENDM
    <S>          1           2           3
    <T >         4           4     5     4

Fig. 11.32. Parsing table for the grammar of Fig. 11.33.

The grammar of Fig. 11.33 has the form it does so that it can be parsed by
recursive descent (or equivalently, by the table-driven parsing algorithm we are
describing). To see why this form is necessary, let us consider the productions for
<L> in the grammar of Fig. 11.6:
<L> → <L><S> | ǫ
If the current input is a terminal like s that begins a statement, we know that <L>
must be expanded at least once by the first production, whose body is <L><S>.
However, we cannot tell how many times to expand until we examine subsequent
inputs and see how many statements there are in the statement list.
Our approach in Fig. 11.33 is to remember that a block consists of a left bracket
followed by zero or more statements and a right bracket. Call the zero or more
statements and the right bracket the “tail,” represented by syntactic category <T >.
Production (2) in Fig. 11.33 says that a statement can be a left bracket followed by
a tail. Productions (4) and (5) say that a tail is either a statement followed by a
tail, or just a right bracket.
We can decide whether to expand a <T > by production (4) or (5) quite eas-
ily. Production (5) only makes sense if a right bracket is the current input, while
production (4) only makes sense if the current input can start a statement. In our
simple grammar, the only terminals that start statements are w, {, and s. Thus, we
see in Fig. 11.32 that in the row for syntactic category <T > we choose production
(4) on these three lookaheads and choose production (5) on the lookahead }. On
other lookaheads, it is impossible that we could have the beginning of a tail, so we
leave the entries for other lookaheads blank in the row for <T >.
Similarly, the decision for syntactic category <S> is easy. If the lookahead
symbol is w, then only production (1) could work. If the lookahead is {, then
only production (2) is a possible choice, and on lookahead s, the only possibility is
production (3). On any other lookahead, there is no way that the input could form
a statement. These observations explain the row for <S> in Fig. 11.32. ✦
The table-driven parser repeatedly examines the symbol at the top of the stack. If that symbol is a terminal, we compare it with the current lookahead symbol; if they match, we pop the stack and advance the input cursor, and if not, the parse fails. If the top of the stack holds a syntactic category <S>, we consult the entry of the parsing table in the row for <S> and the column for the current lookahead symbol.

a) If the entry is blank, then the parse fails.

b) If the entry contains production i, then we pop <S> from the top of the
stack and push each of the symbols in the body of production i onto the
stack. The symbols of the body are pushed rightmost first, so that at the
end, the first symbol of the body is at the top of the stack, the second
symbol is immediately below it, and so on. As a special case, if the body
is ǫ, we simply pop <S> off the stack and push nothing.
Suppose we wish to determine whether string s is in L(<S>). In that case, we
start our driver with the string s ENDM on the input,3 and read the first terminal
as the lookahead symbol. The stack initially only consists of the syntactic category
<S>.
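Before tracing an example, here is a minimal C sketch of such a driver, hard-coded for the table of Fig. 11.31 and the grammar of Fig. 11.25 (in this sketch the character 'B' stands for <B> on the stack, '$' plays the role of ENDM, and the fixed-size stack with no tree building is a simplification):

    #include <stdio.h>

    #define ENDM '$'    /* assumption: '$' plays the role of ENDM */

    int parseB(const char *input)
    {
        char stack[100];
        int top = 0;
        stack[top++] = 'B';              /* the stack starts holding only <B> */
        while (top > 0) {
            char X = stack[--top];       /* pop the top grammar symbol */
            if (X == 'B') {
                if (*input == '(') {     /* table entry: production (2);      */
                    stack[top++] = 'B';  /* push ( <B> ) <B>, rightmost first */
                    stack[top++] = ')';
                    stack[top++] = 'B';
                    stack[top++] = '(';
                }
                /* otherwise production (1): body epsilon, push nothing */
            } else if (X == *input) {
                input++;                 /* terminal matches lookahead: advance */
            } else {
                return 0;                /* mismatch: the parse fails */
            }
        }
        return *input == ENDM;           /* succeed iff all input was consumed */
    }

    int main(void)
    {
        printf("%d\n", parseB("(()())$"));   /* prints 1 */
        printf("%d\n", parseB("())($"));     /* prints 0 */
        return 0;
    }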
✦ Example 11.12. Let us use the parsing table of Fig. 11.32 on the input
{w c s ; s ; }ENDM
Figure 11.34 shows the steps taken by the table-driven parser. The stack contents
are shown with the top at the left end, so that when we replace a syntactic category
at the top of the stack by the body of one of its productions, the body appears in
the top positions of the stack, with its symbols in the usual order.
          Stack           Lookahead    Remaining Input
     (1)  <S>             {            wcs;s;}ENDM
     (2)  {<T >           {            wcs;s;}ENDM
     (3)  <T >            w            cs;s;}ENDM
     (4)  <S><T >         w            cs;s;}ENDM
     (5)  wc<S><T >       w            cs;s;}ENDM
     (6)  c<S><T >        c            s;s;}ENDM
     (7)  <S><T >         s            ;s;}ENDM
     (8)  s;<T >          s            ;s;}ENDM
     (9)  ;<T >           ;            s;}ENDM
    (10)  <T >            s            ;}ENDM
    (11)  <S><T >         s            ;}ENDM
    (12)  s;<T >          s            ;}ENDM
    (13)  ;<T >           ;            }ENDM
    (14)  <T >            }            ENDM
    (15)  }               }            ENDM
    (16)  (empty)         ENDM

Fig. 11.34. Steps of a table-driven parser using the table of Fig. 11.32.
3 Sometimes the endmarker symbol ENDM is needed as a lookahead symbol to tell us that we
have reached the end of the input; other times it is only to catch errors. For instance, ENDM is
needed in Fig. 11.31, because we can always have more parentheses after a balanced string,
but it is not needed in Fig. 11.32, as is attested to by the fact that we never put any entries
in the column for ENDM.
Line (1) of Fig. 11.34 shows the initial situation. As <S> is the syntactic
category in which we want to test membership of the string {wcs;s;}, we start
with the stack holding only <S>. The first symbol of the given string, {, is the
lookahead symbol, and the remainder of the string, followed by ENDM is the remaining
input.
If we consult the entry in Fig. 11.32 for syntactic category <S> and lookahead
{, we see that we must expand <S> by production (2). The body of this production
is {<T >, and we see that this sequence of two grammar symbols has replaced <S>
at the top of the stack when we get to line (2).
Now there is a terminal, {, at the top of the stack. We thus compare it with
the lookahead symbol. Since the stack top and the lookahead agree, we pop the
stack and advance to the next input symbol, w, which becomes the new lookahead
symbol. These changes are reflected in line (3).
Next, with <T > on top of the stack and w the lookahead symbol, we consult
Fig. 11.32 and find that the proper action is to expand by production (4). We thus
pop <T > off the stack and push <S><T >, as seen in line (4). Similarly, the <S>
now on top of the stack is replaced by the body of production (1), since that is the
action decreed by the row for <S> and the column for lookahead w in Fig. 11.32;
that change is reflected in line (5). After lines (5) and (6), the terminals on top of
the stack are compared with the current lookahead symbol, and since each matches,
they are popped and the input cursor advanced.
The reader is invited to follow lines (7) through (16) and check that each is
the proper action to take according to the parsing table. As each terminal, when
it gets to the top of the stack, matches the then current lookahead symbol, we do
not fail. Thus, the string {wcs;s;} is in the syntactic category <S>; that is, it is
a statement. ✦
The algorithm described above tells whether a given string is in a given syntac-
tic category, but it doesn’t produce the parse tree. There is, however, a simple
modification of the algorithm that will also give us a parse tree, when the input
string is in the syntactic category with which we initialize the stack. The recursive-
descent parser described in the previous section builds its parse trees bottom-up,
that is, starting at the leaves and combining them into progressively larger subtrees
as function calls return.
For the table-driven parser, it is more convenient to build the parse trees from
the top down. That is, we start with the root, and as we choose a production
with which to expand the syntactic category at the top of the stack, we simulta-
neously create children for a node in the tree under construction; these children
correspond to the symbols in the body of the selected production. The rules for
tree construction are as follows.
1. Initially, the stack contains only some syntactic category, say, <S>. We ini-
tialize the parse tree to have only one node, labeled <S>. The <S> on the
stack corresponds to the one node of the parse tree being constructed.
2. Whenever we expand a syntactic category <S> at the top of the stack by some production <S> → X1 X2 · · · Xn , we create children labeled X1 , X2 , . . . , Xn , from the left, for the node corresponding to that <S>; the occurrences of X1 , . . . , Xn now on the stack correspond to these new nodes, in order. If the body is ǫ, we instead create one child labeled ǫ.
✦ Example 11.13. Let us follow the steps of Fig. 11.34 and construct the parse
tree as we go. To begin, at line (1), the stack consists of only <S>, and the
corresponding tree is the single node shown in Fig. 11.35(a). At line (2) we expanded
<S> using the production
<S> → {<T >
and so we give the leaf of Fig. 11.35(a) two children, labeled { and <T >, from the
left. The tree for line (2) is shown in Fig. 11.35(b).
Line (3) results in no change in the parse tree, since we match terminals and
do not expand a syntactic category. However, at line (4) we expand <T > into
<S><T >, and so we give the leaf labeled <T > in Fig. 11.35(b) two children with
these symbols as labels, as shown in Fig. 11.35(c). Then at line (5) the <S> is
expanded to wc<S>, which results in the leaf labeled <S> in Fig. 11.35(c) being
given three children. The reader is invited to continue this process. The final parse
tree is shown in Fig. 11.36. ✦
Fig. 11.36. Complete parse tree for the parse of Fig. 11.34.
Making Grammars Parsable

Many grammars, as written, cannot be parsed by recursive descent or by the table-driven method just described. However, two tricks often convert such a grammar into one that defines the same language and can be so parsed. The first trick is to eliminate left recursion. We pointed out in Example 11.11 how the productions
<L> → <L><S> | ǫ
could not be parsed by these methods, because we could not tell how many times
to apply the first production. In general, whenever a production for some syntactic
category <X> has a body that begins with <X> itself, we are going to get confused
as to how many times the production needs to be applied to expand <X>. We
call this situation left recursion. However, we can often rearrange symbols of the
body of the offending production so <X> comes later. This step is left recursion
elimination.
✦ Example 11.14. In Example 11.11 discussed above, we can observe that <L>
represents zero or more <S>’s. We can therefore eliminate the left-recursion by
reversing the <L> and <S> in the body of the first production, as
<L> → <S><L> | ǫ
For another example, consider the productions for numbers:

<Number> → <Number> <Digit> | <Digit>

Given a digit as lookahead, we do not know how many times to use the first production to expand <Number>. However, we observe that a number is one or more digits, allowing us to reorder the body of the first production, as:

<Number> → <Digit> <Number> | <Digit>
This pair of productions eliminates the left recursion. ✦
Unfortunately, the productions of Example 11.14 are still not parsable by our methods. To make them parsable, we need the second trick, which is left factoring.
When two productions for a syntactic category <X> have bodies that begin with
the same symbol C, we cannot tell which production to use, whenever the lookahead
is something that could come from that common symbol C.
To left factor the productions, we create a new syntactic category <T > that
represents the “tail” of either production, that is, the parts of the body that follow
C. We then replace the two productions for <X> by a production
<X> → C <T >
and two productions with head <T >. These two productions have bodies that are
the “tails,” that is, whatever follows C in the two productions for <X>.
✦ Example 11.15. Consider the productions for <Number> that we developed in Example 11.14:

<Number> → <Digit> <Number> | <Digit>

These two productions begin with a common symbol, <Digit>. We thus cannot tell which to use when the lookahead is a digit. However, we can defer the decision if we left factor them into

<Number> → <Digit> <Tail>
<Tail> → <Number> | ǫ

Here, the two productions for <Tail> allow us to choose the tail of the first production for <Number>, which is <Number> itself, or the tail of the second production for <Number>, which is ǫ.

Now, when we have to expand <Number> and see a digit as the lookahead, we replace <Number> by <Digit> <Tail>, match the digit, and then choose how to expand <Tail>, depending on what follows that digit. That is, if another digit follows, we expand by the first choice for <Tail>, and if something other than a digit follows, we know we have seen the whole number and replace <Tail> by ǫ. ✦
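The left-factored productions of Example 11.15 translate directly into recursive-descent functions that need only one symbol of lookahead. Here is a minimal C recognizer sketch (tree construction and error reporting are omitted; the function names, chosen to mirror the syntactic categories, are ours, and isdigit plays the role of checking for <Digit>):

    #include <stdio.h>
    #include <ctype.h>

    const char *nextTerminal;   /* input cursor, as in Section 11.6 */

    int Number(void);

    /* <Tail> -> <Number> | epsilon : decided by one symbol of lookahead */
    int Tail(void)
    {
        if (isdigit((unsigned char)*nextTerminal))
            return Number();    /* another digit follows: <Tail> -> <Number> */
        return 1;               /* otherwise <Tail> -> epsilon, consuming nothing */
    }

    /* <Number> -> <Digit> <Tail> */
    int Number(void)
    {
        if (!isdigit((unsigned char)*nextTerminal))
            return 0;           /* fail: a number must start with a digit */
        nextTerminal++;         /* match the digit */
        return Tail();
    }

    int main(void)
    {
        nextTerminal = "1994";
        printf("%d\n", Number() && *nextTerminal == '\0');  /* prints 1 */
        return 0;
    }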
EXERCISES
11.7.1: Simulate the table-driven parser using the parsing table of Fig. 11.32 on
the following input strings:
a) {s;}
b) wc{s;s;}
c) {{s;s;}s;}
d) {s;s}
11.7.2: For each of the parses in Exercise 11.7.1 that succeeds, show how the parse
tree is constructed during the parse.
11.7.3: Simulate the table-driven parser, using the parsing table of Fig. 11.31, on
the input strings of Exercise 11.6.1.
11.7.4: Show the construction of the parse trees during the parses of Exercise
11.7.3.
11.7.5*: The following grammar
✦
✦ ✦
✦
11.8 Grammars Versus Regular Expressions
Both grammars and regular expressions are notations for describing languages. We
saw in Chapter 10 that the regular expression notation was equivalent to two other
notations, deterministic and nondeterministic automata, in the sense that the set
of languages describable by each of these notations is the same. Is it possible that
grammars are another notation equivalent to the ones we have seen previously?
The answer is “no”; grammars are more powerful than the notations such as
regular expressions that we introduced in Chapter 10. We shall demonstrate the
expressive power of grammars in two steps. First, we shall show that every language
describable by a regular expression is also describable by a grammar. Then we shall
exhibit a language that can be described by a grammar, but not by any regular
expression.
STATEMENT For every regular expression R, there is a grammar such that for
one of its syntactic categories <S>, we have L(<S>) = L(R).
That is, the language denoted by the regular expression is also the language of the
syntactic category <S>.
We prove this statement by complete induction on n, the number of operator occurrences in R.

BASIS. The basis case is n = 0, where the regular expression R has zero operator
occurrences. Either R is a single symbol — say, x — or R is ǫ or ∅. We create a
new syntactic category <S>. In the first case, where R = x, we also create the
production <S> → x. Thus, L(<S>) = {x}, and L(R) is the same language of
one string. If R is ǫ, we similarly create the production <S> → ǫ for <S>, and if
R = ∅, we create no production at all for <S>. Then L(<S>) is {ǫ} when R is ǫ,
and L(<S>) is ∅ when R is ∅.
INDUCTION. Suppose the inductive hypothesis holds for regular expressions with n
or fewer occurrences of operators. Let R be a regular expression with n + 1 operator
occurrences. There are three cases, depending on whether the last operator applied
to build R is union, concatenation, or closure.
1. R = R1 | R2 . Since there is one operator occurrence, | (union), that is part of
neither R1 nor R2 , we know that neither R1 nor R2 have more than n operator
occurrences. Thus, the inductive hypothesis applies to each of these, and we can
find a grammar G1 with a syntactic category <S1 >, and a grammar G2 with a
syntactic category <S2 >, such that L(<S1 >) = L(R1 ) and L(<S2 >) = L(R2 ).
To avoid coincidences when the two grammars are merged, we can assume that
as we construct new grammars, we always create syntactic categories with
names that appear in no other grammar. As a result, G1 and G2 have no
syntactic category in common. We create a new syntactic category <S> that
appears neither in G1 , in G2 , nor in any other grammar that we may have
constructed for other regular expressions. To the productions of G1 and G2 we
add the two productions
<S> → <S1 > | <S2 >
Then the language of <S> consists of all and only the strings in the languages
of <S1 > and <S2 >. These are L(R1 ) and L(R2 ), respectively, and so
L(<S>) = L(R1 ) ∪ L(R2 ) = L(R)
as we desired.
2. R = R1 R2 . As in case (1), suppose there are grammars G1 and G2 , with syn-
tactic categories <S1 > and <S2 >, respectively, such that L(<S1 >) = L(R1 )
and L(<S2 >) = L(R2 ). Then create a new syntactic category <S> and add
the production
<S> → <S1 ><S2 >
to the productions of G1 and G2 . Then L(<S>) = L(<S1 >)L(<S2 >).
3. R = R1 *. Let G1 be a grammar with a syntactic category <S1 > such that
L(<S1 >) = L(R1 ). Create a new syntactic category <S> and add the pro-
ductions

<S> → <S1 ><S> | ǫ

to the productions of G1 . Then L(<S>) consists of the concatenations of zero or more strings chosen from L(<S1 >); that is, L(<S>) = L(R1 )* = L(R).

Now we must exhibit a language that can be described by a grammar but by no regular expression. Our example is the language E = {0n 1n | n ≥ 1}, the set of strings consisting of one or more 0's followed by an equal number of 1's; E is defined by the grammar <S> → 0<S>1 | 01.

Fig. 11.37. The sequence of states entered by automaton A as it reads 0's.

Suppose E were the language of some deterministic finite automaton A with m states, for some integer m, and consider what happens when A reads a string of m 0's. A starts in state s0 ; after reading the first 0 it enters a state we shall call s1 , after reading two 0's a state we
shall call s2 , and so on. In general, after reading i 0’s A is in state si , as suggested
by Fig. 11.37.5
Now A was assumed to have exactly m states, and there are m+1 states among
s0 , s1 , . . . , sm . Thus, it is not possible that all of these states are different. There
must be some distinct integers i and j in the range 0 to m, such that si and sj are
really the same state. If we assume that i is the smaller of i and j, then the path of
Fig. 11.37 must have at least one loop in it, as suggested by Fig. 11.38. In practice,
there could be many more loops and repetitions of states than is suggested by Fig.
11.38. Also notice that i could be 0, in which case the path from s0 to si suggested
in Fig. 11.38 is really just a single node. Similarly, sj could be sm , in which case
the path from sj to sm is but a single node.
The implication of Fig. 11.38 is that the automaton A cannot "remember" how many 0's it has seen. If it is in state sm , it might have seen exactly m 0's, and so it must be that if, starting in state sm , we feed A exactly m 1's, then A arrives at an accepting state, as suggested by Fig. 11.39.
However, suppose that we fed A a string of m − j + i 0's. Looking at Fig. 11.38, we see that i 0's take A from s0 to si , which is the same as sj . We also see
that m − j 0’s take A from sj to sm . Thus, m − j + i 0’s take A from s0 to sm , as
suggested by the upper path in Fig. 11.39.
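In symbols, the count of 0's along the upper path is

\[
\underbrace{i}_{s_0 \to s_i = s_j} \;+\; \underbrace{(m-j)}_{s_j \to s_m} \;=\; m - j + i \;<\; m \qquad (\text{since } j > i).
\]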
Hence, m − j + i 0’s followed by m 1’s takes A from s0 to an accepting state.
Put another way, the string 0m−j+i 1m is in the language of A. But since j is greater
5 The reader should remember that we don’t really know the names of A’s states; we only
know that A has m states for some integer m. Thus, the names s0 , . . . , sm are not A’s names
for its states, but rather our names for its states. That is not as odd as it might seem. For
example, we routinely do things like create an array s, indexed from 0 to m, and store in
s[i] some value, which might be the name of a state of automaton A. We might then, in a
program, refer to this state name as s[i], rather than by its own name.
Fig. 11.38. The path of Fig. 11.37 must contain a loop, since si and sj are the same state.
Fig. 11.39. Automaton A cannot tell whether it has seen m 0's or m − j + i 0's.
than i, this string has more 1’s than 0’s, and is not in the language E. We conclude
that A’s language is not exactly E, as we had supposed.
As we started by assuming only that E had a deterministic finite automaton
and wound up deriving a contradiction, we conclude that our assumption was false;
that is, E has no deterministic finite automaton. Thus, E cannot have a regular
expression either.
The language {0n 1n | n ≥ 1} is just one of an infinite number of languages that
can be specified by a grammar but not by any regular expression. Some examples
are offered in the exercises.
EXERCISES
11.8.1: Find grammars that define the languages of the following regular expres-
sions.
a) (a | b)*a
b) a* | b* | (ab)*
c) a*b*c*
11.8.2*: Show that the set of strings of balanced parentheses is not defined by any
regular expression. Hint : The proof is similar to the proof for the language E above.
Suppose that the set of balanced strings had a deterministic finite automaton of m
states. Feed this automaton m (’s, and examine the states it enters. Show that the
automaton can be “fooled” into accepting a string with unbalanced parentheses.
11.8.3*: Show that the language consisting of strings of the form 0n 10n , that is,
two equal-length runs of 0’s separated by a single 1, is not defined by any regular
expression.
11.8.4*: One sometimes sees fallacious assertions that a language like E of this
section is described by a regular expression. The argument is that for each n, 0n 1n
is a regular expression defining the language with one string, 0n 1n . Thus,
01 11 | 02 12 | 03 13 | · · ·
is a regular expression describing E. What is wrong with this argument?
11.8.5*: Another fallacious argument about languages claims that E has the fol-
lowing finite automaton. The automaton has one state a, which is both the start
state and an accepting state. There is a transition from a to itself on symbols 0 and
1. Then surely string 0i 1i takes state a to itself, and is thus accepted. Why does
this argument not show that E is the language of a finite automaton?
11.8.6**: Show that each of the following languages cannot be defined by a regular
expression.
✦
✦ ✦
✦
11.9 Summary of Chapter 11
After reading this chapter, the reader should be aware of the following points:
✦ How a (context-free) grammar defines a language
✦ How to construct a parse tree to represent the grammatical structure of a string
✦ What ambiguity is and why ambiguous grammars are undesirable in the spec-
ification of programming languages
✦ A technique called recursive-descent parsing that can be used to construct parse
trees for certain classes of grammars
✦ A table-driven way of implementing recursive-descent parsers
✦ Why grammars are a more powerful notation for describing languages than are
regular expressions or finite automata
✦
✦ ✦
✦
11.10 Bibliographic Notes for Chapter 11
Context-free grammars were first studied by Chomsky [1956] as a formalism for
describing natural languages. Similar formalisms were used to define the syntax of
two of the earliest important programming languages, Fortran (Backus et al. [1957])
and Algol 60 (Naur [1963]). As a result, context-free grammars are often referred
to as Backus-Naur Form (BNF, for short). The study of context-free grammars
through their mathematical properties begins with Bar-Hillel, Perles, and Shamir
[1961]. For a more thorough study of context-free grammars and their applications
see Hopcroft and Ullman [1979] or Aho, Sethi, and Ullman [1986].
Recursive-descent parsers have been used in many compilers and compiler-
writing systems (see Lewis, Rosenkrantz, and Stearns [1974]). Knuth [1965] was
the first to identify LR grammars, the largest natural class of grammars that can
be deterministically parsed, scanning the input from left to right.
Aho, A. V., R. Sethi, and J. D. Ullman [1986]. Compilers: Principles, Techniques, and Tools, Addison-Wesley, Reading, Mass.
Backus, J. W. [1957]. “The FORTRAN automatic coding system,” Proc. AFIPS
Western Joint Computer Conference, pp. 188–198, Spartan Books, Baltimore.
Bar-Hillel, Y., M. Perles, and E. Shamir [1961]. “On formal properties of simple
phrase structure grammars," Z. Phonetik, Sprachwissenschaft und Kommunikationsforschung 14, pp. 143–172.
Chomsky, N. [1956]. “Three models for the description of language,” IRE Trans.
Information Theory IT-2:3, pp. 113–124.
Hopcroft, J. E., and J. D. Ullman [1979]. Introduction to Automata Theory, Lan-
guages, and Computation, Addison-Wesley, Reading, Mass.
Knuth, D. E. [1965]. “On the translation of languages from left to right,” Informa-
tion and Control 8:6, pp. 607–639.
Lewis, P. M., D. J. Rosenkrantz, and R. E. Stearns [1974]. “Attributed transla-
tions,” J. Computer and System Sciences 9:3, pp. 279–307.
Naur, P. (ed.) [1963]. “Revised report on the algorithmic language Algol 60,”
Comm. ACM 6:1, pp. 1–17.
CHAPTER 12
✦
✦ ✦
✦
Propositional Logic
✦
✦ ✦
✦
12.1 What This Chapter Is About
Section 12.2 gives an intuitive explanation of what propositional logic is, and why it
is useful. The next section, 12.3, introduces an algebra for logical expressions with
Boolean-valued operands and with logical operators such as AND, OR, and NOT that
operate on Boolean (true/false) values. This algebra is often called Boolean algebra,
after George Boole, the logician who first framed logic as an algebra. We then learn
the following ideas.
✦ Truth tables are a useful way to represent the meaning of an expression in logic
(Section 12.4).
✦ We can convert a truth table to a logical expression for the same logical function
(Section 12.5).
✦ The Karnaugh map is a useful tabular technique for simplifying logical expres-
sions (Section 12.6).
✦ There is a rich set of “tautologies,” or algebraic laws that can be applied to
logical expressions (Sections 12.7 and 12.8).
✦
✦ ✦
✦
12.2 What Is Propositional Logic?
Sam wrote a C program containing the if-statement
if (a < b || (a >= b && c == d)) ... (12.1)
Sally points out that the conditional expression in the if-statement could have been
written more simply as
if (a < b || c == d) ... (12.2)
How did Sally draw this conclusion?
She might have reasoned as follows. Suppose a < b. Then the first of the two
OR’ed conditions is true in both statements, so the then-branch is taken in either of
the if-statements (12.1) and (12.2).
Now suppose a < b is false. In this case, we can only take the then-branch
if the second of the two conditions is true. For statement (12.1), we are asking
whether
a >= b && c == d
is true. Now a >= b is surely true, since we assume a < b is false. Thus we take the
then-branch in (12.1) exactly when c == d is true. For statement (12.2), we clearly
take the then-branch exactly when c == d. Thus no matter what the values of a, b,
c, and d are, either both or neither of the if-statements cause the then-branch to be
followed. We conclude that Sally is right, and the simplified conditional expression
can be substituted for the first with no change in what the program does.
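We can also let a machine confirm Sally's reasoning by brute force. The following
C program is a sketch of ours, added for illustration; the loop bound 3 is arbitrary,
since each condition depends only on whether a < b and whether c == d, so any
small range of values exercises all the relevant cases.

    #include <stdio.h>

    int main()
    {
        int a, b, c, d, agree = 1;

        /* Compare the conditions of if-statements (12.1) and (12.2). */
        for (a = 0; a < 3; a++)
            for (b = 0; b < 3; b++)
                for (c = 0; c < 3; c++)
                    for (d = 0; d < 3; d++)
                        if ((a < b || (a >= b && c == d)) !=
                            (a < b || c == d)) {
                            printf("disagree: a=%d b=%d c=%d d=%d\n",
                                   a, b, c, d);
                            agree = 0;
                        }
        if (agree)
            printf("the two conditions agree everywhere tested\n");
        return 0;
    }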
Propositional logic is a mathematical model that allows us to reason about the
truth or falsehood of logical expressions. We shall define logical expressions formally
in the next section, but for the time being we can think of a logical expression as
a simplification of a conditional expression such as lines (12.1) or (12.2) above that
abstracts away the order-of-evaluation constraints of the logical operators in C.
The operands of a logical expression are propositional variables, each of which
stands for a “proposition,” that is, any statement that can have one of the truth
values, true or false.
Logical expressions can contain logical operators such as AND, OR, and NOT.
When the values of the operands of the logical operators in a logical expression are
known, the value of the expression can be determined using rules such as
1. The expression p AND q is true only when both p and q are true; it is false
otherwise.

2. The expression p OR q is true when either p or q, or both, are true; it is false
otherwise.

3. The expression NOT p is true when p is false, and false when p is true.
The operator NOT has the same meaning as the C operator !. The operators AND and
OR are like the C operators && and ||, respectively, but with a technical difference.
The C operators are defined to evaluate the second operand only when the first
operand does not resolve the matter — that is, when the first operand of && is
true or the first operand of || is false. However, this detail is only important when
the C expression has side effects. Since there are no “side effects” in the evaluation
of logical expressions, we can take AND to be synonymous with the C operator &&
and take OR to be synonymous with ||.
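The following small program (a sketch of ours, not from the text) makes the
technical difference concrete: the function f below has the side effect of printing
a message whenever it is called, and short-circuit evaluation determines whether
the call happens at all.

    #include <stdio.h>

    /* f has a side effect: it prints a message whenever it is called. */
    int f(int x)
    {
        printf("f(%d) was called\n", x);
        return x;
    }

    int main()
    {
        /* The first operand of && is false, so f(1) is never called. */
        if (0 && f(1))
            printf("first then-branch taken\n");

        /* The first operand of || is true, so f(2) is never called. */
        if (1 || f(2))
            printf("second then-branch taken\n");
        return 0;
    }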
For example, the condition in Equation (12.1) can be written as the logical
expression
p OR (NOT p) AND q
and Equation (12.2) can be written as p OR q. Our reasoning about the two if-
statements (12.1) and (12.2) showed the general proposition that
p OR (NOT p) AND q ≡ (p OR q) (12.3)
where ≡ means “is equivalent to” or “has the same Boolean value as.” That is,
no matter what truth values are assigned to the propositional variables p and q,
the left-hand side and right-hand side of ≡ are either both true or both false. We
discovered that for the equivalence above, both are true when p is true or when q is
true, and both are false if p and q are both false. Thus, we have a valid equivalence.
As p and q can be any propositions we like, we can use equivalence (12.3) to
simplify many different expressions. For example, we could let p be
a == b+1 && c < d

and q be

a == c || b == c

Then the left-hand side of (12.3) becomes

(a == b+1 && c < d) || (!(a == b+1 && c < d) && (a == c || b == c)) (12.4)

Note that we placed parentheses around the values of p and q to make sure the
resulting expression is grouped properly.
Equivalence (12.3) tells us that (12.4) can be simplified to the right-hand side
of (12.3), which is
(a == b+1 && c < d) || (a == c || b == c)
As another example, we could let p be the proposition, “It is sunny,” and q the
proposition, “Joe takes his umbrella.” Then the left-hand side of (12.3) is
“It is sunny, or it is not sunny and Joe takes his umbrella.”
while the right-hand side, which says the same thing, is
“It is sunny or Joe takes his umbrella.”
✦
✦ ✦
✦
12.3 Logical Expressions
BASIS. Propositional variables and the logical constants, TRUE and FALSE, are log-
ical expressions. These are the atomic operands.

INDUCTION. If E and F are logical expressions, then so are E AND F, E OR F, and
NOT E.

That is, logical expressions can be built from the binary infix operators AND and
OR, and the unary prefix operator NOT. As with other algebras, we need parentheses
for grouping, but in some cases we can use the precedence and associativity of operators
to eliminate redundant pairs of parentheses, as we do in the conditional expressions
of C that involve these logical operators. In the next section, we shall see more
logical operators that can appear in logical expressions.
✦ Example 12.2. NOT NOT p OR q is grouped NOT (NOT p) OR q. NOT p OR q AND r
is grouped (NOT p) OR (q AND r). You should observe that there is an analogy
between the precedence and associativity of AND, OR, and NOT on one hand, and the
arithmetic operators ×, +, and unary − on the other. For instance, the second of
the above expressions can be compared with the arithmetic expression −p + q × r,
which has the same grouping, (−p) + (q × r). ✦
[Figure: the expression tree for the logical expression (p AND (q OR r)) OR s.]
Boolean Functions
The “meaning” of any expression can be described formally as a function from
the values of its arguments to a value for the whole expression. For example, the
arithmetic expression x × (x + y) is a function that takes values for x and y (say
reals) and returns the value obtained by adding the two arguments and multiplying
the sum by the first argument. The behavior is similar to that of a C function
declared

    float f(float x, float y)
    {
        return x*(x+y);
    }
EXERCISES
12.3.1: Evaluate the following expressions for all possible truth values, to express
their Boolean functions as set-theoretic functions.
a) p AND (p OR q)
b) NOT p OR q
c) (p AND q) OR (NOT p AND NOT q)
12.3.2: Write C functions to implement the logical expressions in Exercise 12.3.1.
✦
✦ ✦
✦
12.4 Truth Tables
p q  p AND q        p q  p OR q        p  NOT p
0 0     0           0 0    0           0    1
0 1     0           0 1    1           1    0
1 0     0           1 0    1
1 1     1           1 1    1

Fig. 12.2. Truth tables for AND, OR, and NOT.
✦ Example 12.5. The truth tables for AND, OR, and NOT are shown in Fig. 12.2.
Here, and frequently in this chapter, we shall use the shorthand that 1 stands for
TRUE and 0 stands for FALSE. Thus the truth table for AND says that the result is
TRUE if and only if both operands are TRUE; the second truth table says that the
result of applying the OR operator is TRUE when either of the operands, or both, are
TRUE; the third truth table says that the result of applying the NOT operator is TRUE
if and only if the operand has the value FALSE. ✦
Understanding “Implies”
The meaning of the implication operator → may appear unintuitive, since we must
get used to the notion that “falsehood implies everything.” We should not confuse
→ with causation. That is, p → q may be true, yet p does not “cause” q in any
sense. For example, let p be “it is raining,” and q be “Sue takes her umbrella.” We
might assert that p → q is true. It might even appear that the rain is what caused
Sue to take her umbrella. However, it could also be true that Sue is the sort of
person who doesn’t believe weather forecasts and prefers to carry an umbrella at
all times.
1. Implication, written →, where p → q is read “p implies q” or “if p then q”;
it is false only when p is true and q is false. Its truth table is

p q  p→q
0 0 1
0 1 1
1 0 0
1 1 1
2. Equivalence, written ≡, means “if and only if”; that is, p ≡ q is true when
both p and q are true, or when both are false, but not otherwise. Its truth
table is shown in Fig. 12.4. Another way of looking at the ≡ operator is that
it asserts that the operands on the left and right have the same truth value.
That is what we meant
in Section 12.2 when we claimed, for example, that
p OR (NOT p AND q) ≡ (p OR q).
3. The NAND, or “not-and,” operator applies AND to its operands and then comple-
ments the result by applying NOT. We write p NAND q to denote NOT (p AND q).
4. Similarly, the NOR, or “not-or,” operator takes the OR of its operands and com-
plements the result; p NOR q denotes NOT (p OR q). The truth tables for NAND
and NOR are shown in Fig. 12.4.
[Figure: a Venn diagram for three sets P, Q, and R, with the eight regions
numbered 0 through 7.]
Here, the region 0 corresponds to the set of elements that are in none of P , Q,
and R, region 1 corresponds to the elements that are in R, but not in P or Q. In
general, if we look at the 3-place binary representation of a region number, say abc,
then the elements of the region are in P if a = 1, in Q if b = 1, and in R if c = 1.
Thus the region numbered (abc)2 corresponds to the row of the truth table where
p, q, and r have truth values a, b, and c, respectively.
When dealing with Venn diagrams, we took the union of two sets of regions to
include the regions in either set. In analogy, when we take the OR of columns in a
truth table, we put 1 in the union of the rows that have 1 in the first column and
the rows that have 1 in the second column. Similarly, we intersect sets of regions
in a Venn diagram by taking only those regions in both sets, and we take the AND
of columns by putting a 1 in the intersection of the set of rows that have 1 in the
first column and the set of rows with 1 in the second column.
The logical NOT operator does not quite correspond to a set operator. However,
if we imagine that the union of all the regions is a “universal set,” then logical
NOT corresponds to taking a set of regions and producing the set consisting of the
remaining regions of the Venn diagram, that is, subtracting the given set from the
universal set.
EXERCISES
12.4.1: Give the rule for computing the (a) NAND (b) NOR (c) ≡ of two columns of
a truth table.
12.4.2: Compute the truth table for the following expressions and their subexpres-
sions.
a) (p → q) ≡ (NOT p OR q)
b) p → q → (r OR NOT p)
c) (p OR q) → (p AND q)
12.4.3*: To what set operator does the logical expression p AND NOT q correspond?
(See the box comparing Venn diagrams and truth tables.)
12.4.4*: Give examples to show that →, NAND, and NOR are not associative.
12.4.5*: We say that a Boolean function f (x1 , x2 , . . . , xk ) does not depend on its
first argument if f (TRUE, x2 , . . . , xk ) = f (FALSE, x2 , . . . , xk )
for any truth values x2 , x3 , . . . , xk . Similarly, we can say f does not depend on
its ith argument if the value of f never changes when its ith argument is switched
between TRUE and FALSE. How many Boolean functions of two arguments do not
depend on their first or second argument (or both)?
12.4.6*: Construct truth tables for the 16 Boolean functions of two variables. How
many of these functions are commutative?
12.4.7: The binary exclusive-or function, ⊕, is defined to have the value TRUE if
and only if exactly one of its arguments is TRUE.
✦
✦ ✦
✦
12.5 From Boolean Functions to Logical Expressions
Now, let us consider the problem of designing a logical expression from a truth
table. We start with a truth table as the specification of the logical expression,
and our goal is to find an expression with the given truth table. Generally, there is
an infinity of different expressions we could use; we usually limit our selection to a
particular set of operators, and we often want an expression that is “simplest” in
some sense.
This problem is a fundamental one in circuit design. The logical operators in
the expression may be taken as the gates of the circuit, and so there is a straight-
forward translation from a logical expression to an electronic circuit, by a process
we shall discuss in the next chapter.
Fig. 12.7. A one-bit adder: inputs x, y, and carry-in c; outputs sum z and
carry-out d.
✦ Example 12.7. As we saw in Section 1.3, we can design a 32-bit adder out of
one-bit adders of the type shown in Fig. 12.7. The one-bit adder sums two input
bits x and y, and a carry-in bit c, to produce a carry-out bit d and a sum bit z.
The truth table in Fig. 12.8 tells us the value of the carry-out bit d and the
sum-bit z, as a function of x, y, and c for each of the eight combinations of input
values. The carry-out bit d is 1 if at least two of x, y, and c have the value 1, and
d = 0 if only zero or one of the inputs is 1. The sum bit z is 1 if an odd number of
x, y, and c are 1, and 0 if not.
x y c d z
0) 0 0 0 0 0
1) 0 0 1 0 1
2) 0 1 0 0 1
3) 0 1 1 1 0
4) 1 0 0 0 1
5) 1 0 1 1 0
6) 1 1 0 1 0
7) 1 1 1 1 1
Fig. 12.8. Truth table for the carry-out bit d and the sum-bit z.
By inspecting Fig. 12.8, we can design the expression

xy + xc + yc (12.5)

for the carry-out bit d: the product xy is 1 exactly in rows 6 and 7, xc covers
rows 5 and 7, and yc covers rows 3 and 7, so the sum is 1 exactly in rows 3, 5,
6, and 7, the rows where d = 1.

Fig. 12.9. Truth table for carry-out expression (12.5) and its subexpressions.
Shorthand Notation
Before proceeding to describe how we build expressions from truth tables, there are
some simplifications in notation that will prove helpful.
1. We can represent the AND operator by juxtaposition, that is, by no operator at
all, just as we often represent multiplication, and as we represented concatena-
tion in Chapter 10.
2. The OR operator can be represented by +.
3. The NOT operator can be represented by an overbar. This convention is espe-
cially useful when the NOT applies to a single variable, and we shall often write
NOT p as p̄.
One important reason for the new notation is that it allows us to think of
AND and OR as if they were multiplication and addition in arithmetic. Thus we can
apply such familiar laws as commutativity, associativity, and distributivity, which
we shall see in Section 12.8 apply to these logical operators, just as they do to the
corresponding arithmetic operators. For example, we shall see that p(q + r) can be
replaced by pq + pr, and then by rp + qp, whether the operators involved are AND
and OR, or multiplication and addition.

Because of this shorthand notation, it is common to refer to the AND of expres-
sions as a product and to the OR of expressions as a sum. Another name for the
product of expressions is their conjunction, and the sum of expressions is also
called their disjunction.

Given the truth table for a Boolean function, we can construct a logical
expression for it of the form

m1 + m2 + · · · + mn
Each mi is a term that corresponds to one of the rows of the truth table for which
the function has value 1. Thus there are as many terms in the expression as there
are 1's in the column for that function. Each of the terms mi is called a minterm
and has a special form that we shall describe below.
To begin our explanation of minterms, a literal is an expression that is either
a single propositional variable, such as p, or a negated variable, such as NOT p,
which we shall often write as p̄.
each minterm consists of the logical AND, or “product,” of k literals. Let r be a row
for which we wish to construct the minterm. If the variable p has the value 1 in
row r, then select literal p. If p has value 0 in row r, then select p̄ as the literal.
The minterm for row r is the product of the literals for each variable. Clearly, the
minterm can only have the value 1 if all the variables have the values that appear
in row r of the truth table.
Now construct an expression for the function by taking the logical OR, or “sum,”
of those minterms that correspond to rows with 1 as the value of the function. The
resulting expression is in “sum of products” form, or disjunctive normal form. The
expression is correct, because it has the value 1 exactly when there is a minterm
with value 1; this minterm cannot be 1 unless the values of the variables correspond
to the row of the truth table for that minterm, and that row has value 1.
For instance, the carry-out function d of Fig. 12.8 has the value 1 in rows 3, 5,
6, and 7, so its sum-of-products expression is

x̄yc + xȳc + xyc̄ + xyc (12.6)

This expression is more complex than (12.5). However, we shall see in the next
section how expression (12.5) can be derived.
Similarly, we can construct a logical expression for the sum-bit z by taking the
sum of the minterms for rows 1, 2, 4, and 7 to obtain
x̄ȳc + x̄yc̄ + xȳc̄ + xyc
✦
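As a check, we can compare expression (12.5) for d and the sum of minterms just
obtained for z against the arithmetic of the one-bit adder, using the fact that
x + y + c = 2d + z. The following C sketch (ours, not from the text; it applies
the C operators &, |, and ! to 0/1 values) runs through all eight rows of Fig. 12.8.

    #include <stdio.h>

    int main()
    {
        int x, y, c, d, z;

        for (x = 0; x <= 1; x++)
            for (y = 0; y <= 1; y++)
                for (c = 0; c <= 1; c++) {
                    d = (x & y) | (x & c) | (y & c);  /* xy + xc + yc, (12.5) */
                    z = (!x & !y & c) | (!x & y & !c) /* minterms for rows   */
                      | (x & !y & !c) | (x & y & c);  /* 1, 2, 4, and 7      */
                    if (x + y + c != 2*d + z)
                        printf("mismatch at x=%d y=%d c=%d\n", x, y, c);
                }
        printf("check complete\n");
        return 0;
    }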
p q r a b
0 0 0 0 1
0 0 1 1 1
0 1 0 0 1
0 1 1 0 0
1 0 0 1 0
1 0 1 1 0
1 1 0 1 0
1 1 1 1 0

Fig. 12.10. A truth table defining two Boolean functions, a and b.
EXERCISES
12.5.1: Figure 12.10 is a truth table that defines two Boolean functions, a and b, in
terms of variables p, q, and r. Write sum-of-products expressions for each of these
functions.
12.5.2: Write product-of-sums expressions (see the box on “Product-of-Sums Ex-
pressions”) for
a) Function a of Fig. 12.10.
b) Function b of Fig. 12.10.
c) Function z of Fig. 12.8.
Product-of-Sums Expressions
There is a dual way to convert a truth table into an expression involving AND, OR,
and NOT; this time, the expression will be a product (logical AND) of sums (logical
OR) of literals. This form is called “product-of-sums,” or conjunctive normal form.

For each row of a truth table, we can define a maxterm, which is the sum of
those literals that disagree with the value of one of the argument variables in that
row. That is, if the row has value 0 for variable p, then use literal p, and if the
value of that row for p is 1, then use p̄. The value of the maxterm is thus 1 unless
each variable p has the value specified for p by that row.
Thus, if we look at all the rows of the truth table for which the value is 0, and
take the logical AND of the maxterms for all those rows, our expression will be 0
exactly when the inputs match one of the rows for which the function is to be 0. It
follows that the expression has value 1 for all the other rows, that is, those rows for
which the truth table gives the value 1. For example, the rows with value 0 for d
in Fig. 12.8 are numbered 0, 1, 2, and 4. The maxterm for row 0 is x + y + c, and
that for row 1 is x + y + c̄, for example. The product-of-sums expression for d is
(x + y + c)(x + y + c̄)(x + ȳ + c)(x̄ + y + c)
This expression is equivalent to (12.5) and (12.6).
12.5.3**: Which of the following logical operators form a complete set of operators
by themselves: (a) ≡ (b) → (c) NOR? Prove your answer in each case.
12.5.4**: Of the 16 Boolean functions of two variables, how many are complete by
themselves?
12.5.5*: Show that the AND and OR of monotone functions are monotone. Then show
that any expression with operators AND and OR only is monotone.
✦
✦ ✦
✦
12.6 Designing Logical Expressions by Karnaugh Maps
In this section, we present a tabular technique for finding sum-of-products expres-
sions for Boolean functions. The expressions produced are often simpler than those
constructed in the previous section by the expedient of taking the logical OR of all
the necessary minterms in the truth table.
For instance, in Example 12.7 we did an ad hoc design of an expression for
the carry-out function of a one-bit adder. We saw that it was possible to use a
product of literals that was not a minterm; that is, it was missing literals for some
of the variables. For example, we used the product of literals xy to cover the sixth
and seventh rows of Fig. 12.8, in the sense that xy has value 1 exactly when the
variables x, y, and c have the values indicated by one of those two rows.
Similarly, in Example 12.7 we used the expression xc to cover rows 5 and 7,
and we used yc to cover rows 3 and 7. Note that row 7 is covered by all three
expressions. There is no harm in that. In fact, had we used only the minterms for
rows 5 and 3, which are xȳc and x̄yc, respectively, in place of xc and yc, we would
have obtained an expression that was correct, but that had two more occurrences
of operators than the expression xy + xc + yc obtained in Example 12.7.
The essential concept here is that if we have two minterms differing only by
the negation of one variable, such as xyc̄ and xyc for rows 6 and 7, respectively,
we can combine the two minterms by taking the common literals and dropping the
variable in which the terms differ. This observation follows from the general law
(pq + p̄q) ≡ q
To see this equivalence, note that if q is true, then either pq is true, or p̄q is true,
and conversely, when either pq or p̄q is true, then it must be that q is true.
We shall see a technique for verifying such laws in the next section, but, for
the moment, we can let the intuitive meaning of our law justify its use. Note also
that use of this law is not limited to minterms. We could, for example, let p be
any propositional variable and q be any product of literals. Thus we can combine
any two products of literals that differ only in one variable (one product has the
variable itself and the other its complement), replacing the two products by the one
product of the common literals.
Karnaugh Maps
There is a graphical technique for designing sum-of-products expressions from truth
tables; the method works well for Boolean functions up to four variables. The
idea is to write a truth table as a two-dimensional array called a Karnaugh map
(pronounced “car-no”) whose entries, or “points,” each represent a row of the truth
table. By keeping adjacent the points that represent rows differing in only one
variable, we can “see” useful products of literals as certain rectangles, all of whose
points have the value 1.
✦ Example 12.10. In Fig. 12.11 we see the Karnaugh map for the “implies”
function, p → q. There are four points corresponding to the four possible values for
p and q. Note that “implies” has value 1 except when p = 1 and q = 0, and so the
only point in the Karnaugh map with value 0 is the entry for p = 1 and q = 0; all
the other points have value 1. ✦
Implicants
An implicant for a Boolean function f is a product x of literals for which no assign-
ment of values to the variables of f makes x true and f false. For example, every
minterm for which the function f has value 1 is an implicant of f . However, there
are other products that can also be implicants, and we shall learn to read these off
of the Karnaugh map for f .
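The definition of an implicant can be tested by brute force. In the following C
sketch (ours, for illustration), a function of k ≤ 4 variables is represented by a
truth table packed into the bits of an unsigned integer, one bit per row; x is an
implicant of f exactly when every assignment that makes x true also makes f true.

    #include <stdio.h>

    /* Bit i of a table holds the function's value on the i-th row.
       x is an implicant of f if no row makes x true and f false, i.e.,
       every 1-bit of x's table is also a 1-bit of f's table. */
    int is_implicant(unsigned x, unsigned f, int k)
    {
        unsigned rows = (1u << (1 << k)) - 1;  /* one bit per row */
        return (x & ~f & rows) == 0;
    }

    int main()
    {
        /* Two variables, rows ordered pq = 00, 01, 10, 11.
           The "implies" function p -> q has table 1,1,0,1 = 0xB. */
        unsigned implies = 0xB;
        printf("pbar: %d\n", is_implicant(0x3, implies, 2)); /* 1: yes */
        printf("pq:   %d\n", is_implicant(0x8, implies, 2)); /* 1: yes */
        printf("p:    %d\n", is_implicant(0xC, implies, 2)); /* 0: no  */
        return 0;
    }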
        q
        0   1
p   0   1   1
    1   0   1

Fig. 12.11. Karnaugh map for the function p → q.
An implicant is said to cover the points for which it has the value 1. A logical
expression can be constructed for a Boolean function by taking the OR of a set of
implicants that together cover all points for which that function has value 1.
✦ Example 12.12. Figure 12.12 shows two implicants in the Karnaugh map for
the “implies” function. The larger, which covers two points, corresponds to the
single literal, p̄. This implicant covers the top two points of the map, both of which
have 1’s in them. The smaller implicant, pq, covers the point p = 1 and q = 1.
Since these two implicants together cover all the points that have value 1, their
sum, p̄ + pq, is an equivalent expression for p → q; that is, (p → q) ≡ (p̄ + pq). ✦
        q
        0   1
p   0   1   1
    1   0   1

Fig. 12.12. Two implicants, p̄ (the top row) and pq (the point p = 1, q = 1),
in the Karnaugh map for p → q.
Prime Implicants
A prime implicant x for a Boolean function f is an implicant for f that ceases to
be an implicant for f if any literal in x is deleted. In effect, a prime implicant is an
implicant that has as few literals as possible.
Note that the bigger a rectangle is, the fewer literals there are in its product.
We would generally prefer to replace a product with many literals by one with fewer
literals, which involves fewer occurrences of operators, and thus is “simpler.” We
are thus motivated to consider only those implicants that are prime, when selecting
a set of implicants to cover a map.
Remember that every implicant for a given Karnaugh map consists only of
points with 1's. An implicant is a prime implicant when expanding it by doubling
its size in any direction would force it to cover a point with value 0.
✦ Example 12.13. In Fig. 12.12, the larger implicant p̄ is prime, since the only
possible larger implicant is the entire map, which cannot be used because it contains
a 0. The smaller implicant pq is not prime, since it is contained in the second
column, which consists only of 1’s, and is therefore an implicant for the “implies”
Karnaugh map. Figure 12.13 shows the only possible choice of prime implicants for
the “implies” map.1 They correspond to the products p̄ and q, and they give rise
to the expression p̄ + q, which we noted in Section 12.3 was equivalent to p → q. ✦
1 In general, there may be many sets of prime implicants that cover a given Karnaugh map.
        q
        0   1
p   0   1   1
    1   0   1

Fig. 12.13. The prime implicants p̄ (the top row) and q (the second column)
for the Karnaugh map of p → q.
        yc
        00  01  11  10
x   0    0   0   1   0
    1    0   1   1   1

Fig. 12.14. Karnaugh map for the carry-out function, with prime implicants
xc, yc, and xy.
✦ Example 12.14. The three prime implicants for the carry-out function were
indicated in Fig. 12.14. We may convert each to a product of literals; see the box
“Reading Implicants from the Karnaugh Map.” The corresponding products are
xc for the leftmost one, yc for the vertical one, and xy for the rightmost one. The
sum of these three expressions is the sum-of-products that we obtained informally
in Example 12.7; we now see how this expression was obtained. ✦
✦ Example 12.15. Figure 12.15 shows the Karnaugh map for the three-variable
Boolean function NAND (p, q, r). The prime implicants are
1. The first row, which corresponds to p̄.
2. The first two columns, which correspond to q̄.
3. Columns 1 and 4, which correspond to r̄.
The sum-of-products expression for this map is p̄ + q̄ + r̄. ✦
        qr
        00  01  11  10
p   0    1   1   1   1
    1    1   1   0   1
Fig. 12.15. Karnaugh map with prime implicants p̄, q̄, and r̄ for NAND(p, q, r).
        rs
        00  01  11  10
pq  00   1   1   0   1
    01   1   0   0   0
    11   0   0   0   0
    10   1   0   0   0
Fig. 12.16. Karnaugh map with prime implicants for the “at most one 1” function.
In a Karnaugh map, the rectangles of 1's that can form implicants include

1. Any point.

2. Any two horizontally or vertically adjacent points, including those that are
adjacent in the end-around sense.

3. Any four points that form a complete row or column.

4. Any 2 × 2 square, including those in the end-around sense, such as two adjacent
points in the top row and the two points in the bottom row that are in the same
columns. The four corners are, as we mentioned, a special case of a “square” as
well.
✦ Example 12.16. Figure 12.16 shows the Karnaugh map of a Boolean function
of four variables, p, q, r, and s, that has the value 1 when at most one of the inputs
is 1. There are four prime implicants, all of size 2, and two of them are end-around.
The implicant consisting of the first and last points of the top row has points that
agree in variables p, q, and s; the common value is 0 for each variable. Thus its
product of literals is p̄q̄s̄. Similarly, the other implicants have products p̄q̄r̄, p̄r̄s̄,
and q̄r̄s̄. The expression for the function is thus
p̄q̄r̄ + p̄q̄s̄ + p̄r̄s̄ + q̄r̄s̄
✦
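Again the result can be confirmed by enumeration. The C sketch below (ours, for
illustration) compares the expression just derived with the condition "at most one
of p, q, r, and s is 1" on all sixteen assignments.

    #include <stdio.h>

    int main()
    {
        int m, p, q, r, s, expr, spec;

        for (m = 0; m < 16; m++) {
            p = (m >> 3) & 1; q = (m >> 2) & 1;
            r = (m >> 1) & 1; s = m & 1;
            expr = (!p & !q & !r) | (!p & !q & !s)
                 | (!p & !r & !s) | (!q & !r & !s);
            spec = (p + q + r + s <= 1);   /* at most one input is 1 */
            if (expr != spec)
                printf("mismatch at p=%d q=%d r=%d s=%d\n", p, q, r, s);
        }
        printf("check complete\n");
        return 0;
    }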
        rs
        00  01  11  10
pq  00   1   1   0   1
    01   0   0   0   1
    11   1   0   0   0
    10   1   0   1   1

Fig. 12.17. A Karnaugh map with five prime implicants covering its 1's.
✦ Example 12.17. The map of Fig. 12.17 was chosen for the pattern of its 1’s,
rather than for any significance its function has. It does illustrate an important
point. Five prime implicants that together cover all the 1 points are shown, in-
cluding the all-corners implicant (shown dashed), for which the product of literals
expression is q̄s̄; the other four prime implicants have products p̄q̄r̄, p̄rs̄, pq̄r, and
pr̄s̄.
We might think, from the examples seen so far, that to form the logical expres-
sion for this map we should take the logical OR of all five implicants. However, a
moment’s reflection tells us that the largest implicant, q̄s̄, is superfluous, since all
its points are covered by other prime implicants. Moreover, this is the only prime
implicant that we have the option to eliminate, since each other prime implicant
has a point that only it covers. For example, p̄q̄r̄ is the only prime implicant to
cover the point in the first row and second column. Thus
p̄q̄r̄ + p̄rs̄ + pq̄r + pr̄s̄
is the preferred sum-of-products expression obtained from the map of Fig. 12.17. ✦
EXERCISES
12.6.1: Draw the Karnaugh maps for the following functions of variables p, q, r,
and s.
a) The function that is TRUE if one, two, or three of p, q, r, and s are TRUE, but
not if zero or all four are TRUE.
b) The function that is TRUE if up to two of p, q, r, and s are TRUE, but not if
three or four are TRUE.
c) The function that is TRUE if one, three, or four of p, q, r, and s are TRUE, but
not if zero or two are TRUE.
d) The function represented by the logical expression pqr → s.
e) The function that is TRUE if pqrs, regarded as a binary number, has value less
than ten.
12.6.2: Find the implicants — other than the minterms — for each of your Kar-
naugh maps from Exercise 12.6.1. Which of them are prime implicants? For each
function, find a sum of prime implicants that covers all the 1’s of the map. Do you
need to use all the prime implicants?
12.6.3: Show that every product in a sum-of-products expression for a Boolean
function is an implicant of that function.
12.6.4*: One can also construct a product-of-sums expression from a Karnaugh
map. We begin by finding rectangles of the types that form implicants, but with all
Anti-implicant points 0, instead of all points 1. Call such a rectangle an “anti-implicant.” We can
construct for each anti-implicant a sum of literals that is 1 on all points but those of
the anti-implicant. For each variable x, this sum has literal x if the anti-implicant
includes only points for which x = 0, and it has literal x̄ if the anti-implicant has
only points for which x = 1. Otherwise, the sum does not have a literal involving
x. Find all the prime anti-implicants for your Karnaugh maps of Exercise 12.6.1.
12.6.5: Using your answer to Exercise 12.6.4, write product-of-sums expressions
for each of the functions of Exercise 12.6.1. Include as few sums as you can.
12.6.6**: How many (a) 1 × 2 (b) 2 × 2 (c) 1 × 4 (d) 2 × 4 rectangles that form
implicants are there in a 4×4 Karnaugh map? Describe their implicants as products
of literals, assuming the variables are p, q, r, and s.
✦
✦ ✦
✦
12.7 Tautologies
A tautology is a logical expression whose value is true regardless of the values of its
propositional variables. For a tautology, all the rows of the truth table, or all the
points in the Karnaugh map, have the value 1. Simple examples of tautologies are
TRUE
p + p̄
(p + q) ≡ (p + p̄q)
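Since a tautology has the value 1 on every row of its truth table, a program can
test a small expression by cycling through all the assignments. Here is a C sketch
(ours, for illustration) for the third example above, (p + q) ≡ (p + p̄q).

    #include <stdio.h>

    int main()
    {
        int p, q, taut = 1;

        for (p = 0; p <= 1; p++)
            for (q = 0; q <= 1; q++)
                if ((p | q) != (p | (!p & q)))  /* the two sides of ≡ */
                    taut = 0;
        printf(taut ? "a tautology\n" : "not a tautology\n");
        return 0;
    }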
Tautologies have many important uses. For example, suppose we have an
expression of the form E1 ≡ E2 that is a tautology. Then, whenever we have an
instance of E1 within any expression, we can replace E1 by E2 , and the resulting
expression will represent the same Boolean function.
Figure 12.18(a) shows the expression tree for a logical expression F containing
E1 as a subexpression. Figure 12.18(b) shows the same expression with E1 replaced
by E2 . If E1 ≡ E2 , the values of the roots of the two trees must be the same, no
matter what assignment of truth values is made to the variables. The reason is that
we know the nodes marked n in the two trees, which are the roots of the expression
trees for E1 and E2 , must get the same value in both trees, because E1 ≡ E2 . The
evaluation of the trees above n will surely yield the same value, proving that the
two trees are equivalent. The ability to substitute equivalent expressions for one
another is colloquially known as the “substitution of equals for equals.” Note that
in other algebras, such as those for arithmetic, sets, relations, or regular expressions,
we also may substitute one expression for another that has the same value.
Fig. 12.18. (a) An expression F with subexpression E1 rooted at node n;
(b) the same expression with E1 replaced by E2 at node n.
✦ Example 12.18. Consider the associative law for the logical operator OR, which
can be phrased as the expression
(p + q) + r ≡ p + (q + r) (12.7)
The truth table for the various subexpressions appears in Fig. 12.19. The final
column, labeled E, represents the entire expression. Observe that every row has
value 1 for E, showing that the expression (12.7) is a tautology. As a result, any
time we see an expression of the form (p + q) + r, we are free to replace it by
p + (q + r). Note that p, q, and r can stand for any expressions, as long as the
same expression is used for both occurrences of p, and q and r are likewise treated
consistently. ✦
p q r p+q (p + q) + r q+r p + (q + r) E
0 0 0 0 0 0 0 1
0 0 1 0 1 1 1 1
0 1 0 1 1 1 1 1
0 1 1 1 1 1 1 1
1 0 0 1 1 0 1 1
1 0 1 1 1 1 1 1
1 1 0 1 1 1 1 1
1 1 1 1 1 1 1 1
Fig. 12.19. Truth table proving the associative law for OR.
✦ Example 12.19. The commutative law for the logical operator AND can be
verified by showing that the logical expression pq ≡ qp is a tautology. To get some
instances of this law, we can perform substitutions on this expression. For example,
we could substitute r + s for p and r̄ for q to get the equivalence
(r + s)(r̄) ≡ (r̄)(r + s)
Note that we put parentheses around each substituted expression to avoid acci-
dentally changing the grouping of operators because of our operator-precedence
conventions. In this case, the parentheses around r + s are essential, but the paren-
theses around r̄ could be omitted.
Some other substitution instances follow. We could replace p by r and not
replace q, to get rq ≡ qr. We could leave p alone and replace q by the constant
expression 1 (TRUE), to get p AND 1 ≡ 1 AND p. However, we cannot substitute r
for the first occurrence of p and substitute a different expression, say r + s, for the
second occurrence of p.2 ✦
2 We should not confuse the substitution principle with the “substitution of equals for equals.”
The substitution principle applies to tautologies only, while we may substitute equals for
equals in any expression.
Fig. 12.20. Expression tree for the tautology pq ≡ qp.
The reason the substitution principle holds true can be seen if we think about
expression trees. Imagine the expression tree for some tautology, such as the one
discussed in Example 12.19, which we show in Fig. 12.20. Since the expression is
a tautology, we know that, whatever assignment of truth values we make for the
propositional variables at the leaves, the value at the root is true (as long as we
assign the same value to each leaf that is labeled by a given variable).
Now suppose that we substitute for p an expression with tree Tp and that we
substitute for q an expression with tree Tq ; in general, we select one tree for each
variable of the tautology, and replace all leaves for that variable by the tree selected
for that variable.3 Then we have a new expression tree similar to that suggested
by Fig. 12.21. When we make an assignment of truth values for the variables of
the new tree, the value of each node that is a root of a tree Tp gets the same value,
because the same evaluation steps are performed underneath any such node.
Fig. 12.21. The tree for pq ≡ qp after substituting the tree Tp for each leaf
labeled p and the tree Tq for each leaf labeled q.
Once the roots of the trees like Tp and Tq in Fig. 12.21 are evaluated, we have
a consistent assignment of values to the variables at the leaves of the original tree,
which we illustrated in Fig. 12.20. That is, we take whatever value is computed
for the occurrences of Tp , which must all be the same value, and assign it to all
3 As a special case, the tree selected for some variable x can be a single node labeled x, which
is the same as making no substitution for x.
the leaves labeled p in the original tree. We do the same for q, and in general,
for any variable appearing in the original tree. Since the original tree represents a
tautology, we know that evaluating that tree will result in the value TRUE at the
root. But above the substituted trees, the new and original trees are the same, and
so the new tree also produces value TRUE at the root. Since the above reasoning
holds true no matter what substitution of values we make for the variables of the
new tree, we conclude that the expression represented by the new tree is also a
tautology.
EXERCISES
12.7.1: Which of the following expressions are tautologies?
Inherent Intractability
The tautology problem, “Is E a tautology?” is an important example of a problem
that appears to be inherently exponential. That is, if k is the number of variables
in expression E, all the known algorithms to solve the tautology problem have a
running time that is an exponential function of k.
There is a family of problems, called NP-complete, which includes many impor-
tant optimization problems that no one knows how to solve in less than exponential
time. Many mathematicians and scientists have worked long and hard trying to find
for at least one of these problems an algorithm that runs in less than exponential
time, but no such algorithm has yet been found, and so many people now suspect
that none exists.
One of the classic NP-complete problems is the satisfiability problem: “Is there
a truth assignment that makes logical expression E true?” Satisfiability is closely
related to the tautology problem, and as with that problem, no significantly better
solution to the satisfiability problem is known than cycling through all possible
truth assignments.
Either all the NP-complete problems have less-than-exponential time solutions,
or none do. The fact that each NP-complete problem appears to require exponential
time thus reinforces our belief that all are inherently exponential-time problems.
Thus we have strong evidence that the straightforward satisfiability test is about
the best we can do.
Incidentally, “NP” stands for “nondeterministic polynomial.” “Nondetermin-
istic” informally means “the ability to guess right,” as discussed in Section 10.3.
A problem can be “solved in nondeterministic polynomial time” if, given a guess
at a solution for some instance of size n, we can check that the guess is correct in
polynomial time, that is, in time n^c for some constant c.
Satisfiability is an example of such a problem. If someone gave us an assign-
ment of truth values to variables that they claimed, or guessed, made expression E
evaluate to 1, we could evaluate E with that assignment to its operands, and check,
in time at most quadratic in the length of E, that the expression is satisfiable.
The class of problems that — like satisfiability — can be “solved” by guessing
followed by a polynomial time check is called NP. Some problems in NP are
actually quite easy, and can be solved without the guessing, still taking only time
that is polynomial in the length of the input. However, there are many problems in
NP that can be proved to be as hard as any in NP, and these are the NP-complete
problems. (Do not confuse “completeness” in this sense, meaning “hardest in the
class,” with “complete set of operators,” meaning “able to express every Boolean
function.”)

The family of problems solvable in polynomial time with no guessing is often
called P. Figure 12.22 shows the relationship between P, NP, and the NP-complete
problems. If any NP-complete problem is in P, then P = NP, something we doubt
very much is the case, because all the known NP-complete problems, and some other
problems in NP, appear not to be in P. The tautology problem is not believed to
be in NP, but it is as hard or harder than any problem in NP (called an NP-hard
problem), and if the tautology problem is in P, then P = NP.
a) pqr → p + q
b) (p → q)(q → r) → (p → r)
c) (p → q) → p
d) p ≡ (q + r) → (q̄ → pr)
12.7.2*: Suppose we had an algorithm to solve the tautology problem for a logical
expression. Show how this algorithm could be used to
a) Determine whether two expressions were equivalent.
b) Solve the satisfiability problem (see the box on “Inherent Intractability”).
✦
✦ ✦
✦
12.8 Some Algebraic Laws for Logical Expressions
In this section, we shall enumerate some useful tautologies. In each case, we shall
state the law, leaving the tautology test to be carried out by the reader by con-
structing the truth table.
Laws of Equivalence
We begin with some observations about how equivalence works. The reader should
notice the dual role played by equivalence. It is one of a number of operators that
we use in logical expressions. However, it is also a signal that two expressions are
“equal,” and that one can be substituted for the other. Thus a tautology of the
form E1 ≡ E2 tells us something about E1 and E2 , namely that either can be
substituted for the other within larger expressions, using the principle “equals can
be substituted for equals.”
Further, we can use equivalences to prove other equivalences. If we have a
sequence of expressions E1 , E2 , . . . , Ek , such that each is derived from the previous
one by a substitution of equals for equals, then each of these expressions gives the
same value when evaluated with the same truth assignment. As a consequence,
E1 ≡ Ek must be a tautology.
12.1. Reflexivity of equivalence: p ≡ p.
As with all the laws we state, the principle of substitution applies, and we
may replace p by any expression. Thus this law says that any expression is
equivalent to itself.
12.2. Commutative law for equivalence: (p ≡ q) ≡ (q ≡ p).
Informally, p is equivalent to q if and only if q is equivalent to p. By the principle
of substitution, if any expression E1 is equivalent to another expression E2 , then
E2 is equivalent to E1 . Thus either of E1 and E2 may be substituted for the
other.
12.3. Transitive law for equivalence: (p ≡ q) AND (q ≡ r) → (p ≡ r).
Informally, if p is equivalent to q, and q is equivalent to r, then p is equivalent to
r. An important consequence of this law is that if we have found both E1 ≡ E2
and E2 ≡ E3 to be tautologies, then E1 ≡ E3 is a tautology.
12.4. Equivalence of the negations: (p ≡ q) ≡ (p̄ ≡ q̄).
Two expressions are equivalent if and only if their negations are equivalent.
4 Of course, (0 AND p) ≡ 0 holds as well. We shall not, in the future, mention all the conse-
quences of the commutative laws.
DeMorgan’s Laws
There are two laws that allow us to push NOT’s through an expression of AND’s and
OR’s, resulting in an expression in which all the negations apply to propositional
variables. The resulting expression is an AND-OR expression applied to literals. In-
tuitively, if we negate an expression with AND’s and OR’s, we can push the negation
down the expression tree, “flipping” operators as we go. That is, each AND becomes
an OR, and vice versa. Finally, the negations reach the leaves, where they stay, un-
less they meet a negated literal, in which case we can remove two negations by law
12.13. We must be careful, when we construct the new expression, to place paren-
theses properly, because the precedence of operators changes when we exchange
AND’s and OR’s.
The basic rules are called “DeMorgan’s laws.” They are the following two
tautologies.
12.20. DeMorgan’s laws.
a) NOT (pq) ≡ p̄ + q̄.
b) NOT (p + q) ≡ p̄q̄.
Part (a) says that p and q are not both true exactly when at least one of them
is false, and (b) says that neither p nor q is true if and only if they are both
false. We can generalize these two laws to allow any number of propositional
variables as follows.
c) NOT (p1 p2 · · · pk ) ≡ (p̄1 + p̄2 + · · · + p̄k ).

d) NOT (p1 + p2 + · · · + pk ) ≡ (p̄1 p̄2 · · · p̄k ).
For example, (d) says that none of some collection of expressions is true if and
only if all of them are false.
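For a fixed small k, the generalized laws can be checked exhaustively, just like any
other candidate tautology. The following C sketch (ours, for illustration) verifies
(c) and (d) for k = 3 by trying all eight assignments.

    #include <stdio.h>

    int main()
    {
        int m, p1, p2, p3;

        for (m = 0; m < 8; m++) {
            p1 = (m >> 2) & 1; p2 = (m >> 1) & 1; p3 = m & 1;
            if (!(p1 & p2 & p3) != (!p1 | !p2 | !p3))  /* law (c) */
                printf("law (c) fails at m=%d\n", m);
            if (!(p1 | p2 | p3) != (!p1 & !p2 & !p3))  /* law (d) */
                printf("law (d) fails at m=%d\n", m);
        }
        printf("check complete\n");
        return 0;
    }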
✦ Example 12.20. We have seen in Sections 12.5 and 12.6 how to construct sum-
of-products expressions for arbitrary logical expressions. Suppose we start with an
arbitrary such expression E, which we may write as E1 + E2 + · · · + Ek , where each
Ei is the AND of literals. We can construct a product-of-sums expression for NOT E,
by starting with
NOT (E1 + E2 + · · · + Ek )
and applying DeMorgan’s law (d) to get
NOT (E1 ) NOT (E2 ) · · · NOT (Ek ) (12.8)
Now let Ei be the product of literals Xi1 Xi2 · · · Xiji , where each X is either a
variable or its negation. Then we can apply (c) to NOT (Ei ) to turn it into the sum

NOT (Xi1 ) + NOT (Xi2 ) + · · · + NOT (Xiji )

Each NOT (Xij ) is again a literal: if Xij is a variable, we get its negation, and if
Xij is a negated variable, then law 12.13 lets us eliminate the double negation,
leaving the variable itself. Thus (12.8) becomes a product of sums of literals, that
is, a product-of-sums expression for NOT E. ✦

A NOT that is above an AND flips it to an OR, and causes NOT's to appear on its
two arguments.
The right-hand argument becomes NOT q, while the left-hand argument, which was
NOT p, becomes NOT NOT p, or simply p. The resulting tree is shown in Fig. 12.23(c).
The tree of Fig. 12.23(c) represents the expression p̄(p + q̄) ≡ p̄q̄. To get the
expression into the form of law 12.19(a), we must negate the variables. That is, we
substitute expression p̄ for p and q̄ for q. When we eliminate the double negations,
we are left with exactly 12.19(a). ✦
EXERCISES
12.8.1: Check, by constructing the truth tables, that each of the laws 12.1 to 12.24
is a tautology.
12.8.2: We can substitute expressions for any propositional variable in a tautology
and get another tautology. Substitute x + y for p, yz for q, and x̄ for r in each of the
tautologies 12.1 to 12.24, to get new tautologies. Do not forget to put parentheses
around the substituted expressions if needed.
12.8.3: Prove that
a) p1 + p2 + · · · + pn is equivalent to the sum (logical OR) of the pi ’s in any order.
b) p1 p2 · · · pn is equivalent to the product (logical AND) of the pi ’s in any order.
Hint : A similar result was shown for addition in Section 2.4.
Fig. 12.23. Pushing NOT's down the expression trees: the trees in (a) are for
NOT (p + p̄q) and NOT (p + q); after all the NOT's have been pushed down, the
trees in (c) represent p̄(p + q̄) and p̄q̄.
12.8.4*: Use laws given in this section to transform the first of each pair of ex-
pressions into the second. To save effort, you may omit steps that use laws 12.5
through 12.13, which are analogous to arithmetic. For example, commutativity and
associativity of AND and OR may be assumed.
a) Transform pq + rs into (p + r)(p + s)(q + r)(q + s).
b) Transform pq + pq̄r into p(q + r).
c) Transform pq + pq̄ + p̄q + p̄q̄ into 1. (This transformation requires law 12.25
from the next section.)
d) Transform pq → r into (p → r) + (q → r).
e) Transform NOT (pq → r) into pqr̄.
12.8.5*: Show that the subsumption laws, 12.18(a) and (b), follow from previously
given laws, in the sense that it is possible to transform p + pq into p and transform
p(p + q) into p using only laws 12.1 through 12.17.
12.8.6: Apply DeMorgan’s laws to turn the following expressions into expressions
where the only NOT’s are applied to propositional variables (i.e., the NOT’s appear
in literals only).
a) NOT (pq + p̄r)
b) NOT (NOT p + q(NOT (r + s̄)))
12.8.7*: Prove the generalized DeMorgan’s laws 12.20(c) and (d) by induction
on k, using the basic laws 12.20(a) and (b). Then, justify the generalized laws
informally by describing what the 2k -row truth tables for each expression and their
subexpressions look like.
12.8.8*: Find the pairs of laws in this section that are duals of one another.
12.8.9*: Prove law 12.24(b) by induction on n.
12.8.10*: Show that law 12.24(b) holds by describing the 2n rows of the truth
table for the expression and each of its subexpressions.
12.8.11: Simplify the following by using the subsumption laws and the commutative
and associative laws for AND and OR.
a) wx̄ + wx̄y + z̄ x̄w
b) (w + x̄)(w + y + z̄)(w̄ + x̄ + ȳ)(x̄)
12.8.12*: Show that the arithmetic analogs of laws 12.14 through 12.20 are false,
by giving specific numbers for which the analogous equalities do not hold.
12.8.13*: If we start with logical expression whose only operators are AND, OR, and
NOT, we can push all the NOT’s down the tree until the only NOT’s are immediately
above propositions; that is, the expression is the AND and OR of literals. Prove that
we can do so. Hint : Whenever we see a NOT, either it is immediately above another
NOT (in which case we can eliminate them both by rule 12.13), or above a proposition
(in which case the statement is satisfied), or it is above an AND or an OR (in which case
we can use DeMorgan’s laws to push it down one level). However, a proof that we
eventually reach an equivalent expression with all NOT’s above propositions cannot
proceed by induction on an obvious “size” measure such as the sum of the heights
of the nodes labeled NOT. The reason is that when we use DeMorgan’s laws to push
one NOT down, it is replaced by two NOT’s, and this sum might increase. In order
to prove that we eventually reach an expression with all NOT’s above propositions,
you need to find a suitable “size” measure that always decreases when DeMorgan’s
laws are applied in the direction where a NOT is pushed down below an AND or OR.
Find such a size measure and prove the claim.
✦
✦ ✦
✦
12.9 Tautologies and Methods of Proof
In the past three sections, we have seen one aspect of logic: its use as a design
theory. In Section 12.6 we saw how to use Karnaugh maps to design expressions
given a Boolean function, and in Chapter 13 we shall see how this methodology
helps design the switching circuits from which computers and other digital devices
are built. Sections 12.7 and 12.8 introduced us to tautologies, which can be used
to simplify expressions, and therefore serve as another important tool when good
expressions must be designed for a given Boolean function.
A second important use of logic will be seen in this section. When people reason
or prove statements of mathematics, they use a variety of techniques to further their
arguments. Examples of these techniques are
1. Case analysis,
2. Proof of the contrapositive,
3. Proof by contradiction, and
4. Proof by reduction to truth.
In this section we shall define these techniques, showing how each can be used in
proofs. We also show how these techniques are justified by certain tautologies of
propositional logic.
12.25. The law of the excluded middle: (p + p̄) ≡ 1.

A proposition and its negation cannot both be false; one of them must be true.

12.26. Proof by case analysis: ((p → q) AND (p̄ → q)) ≡ q.

That is, the two cases occur when p is true and when p is false. If q is implied
in both cases, then q must be true. We leave it as an exercise to show that
12.26 follows from 12.25 and other laws we have proved, using the equivalence
(pq + p̄q) ≡ q.
12.27. pp̄ ≡ 0.
A proposition and its negation cannot both be true simultaneously. This law
is vital when we make a “proof by contradiction.” We discuss this technique of
proof shortly, in law 12.29, and also in Section 12.11, when we cover resolution
proofs.
✦ Example 12.23. Let us consider a simple example of a proof that shows how
the contrapositive law may be used. This example also shows the limitations of
propositional logic in proofs. Logic takes us part of the way, allowing us to reason
about statements without reference to what the statements mean. However, to
get a complete proof, we normally have to make some argument that refers to the
meaning of our terms. For this example, we need to know what concepts about
integers, like “prime,” “odd,” and “greater than” mean.
We shall consider three propositions about a positive integer x:
a “x > 2”
b “x is a prime”
c “x is odd”
We begin by applying some of the laws we have studied to turn the expression
ab → c into an equivalent expression that is more amenable to proof. First, we use
law 12.28 to turn it into its contrapositive, c̄ → NOT (ab). Then we use DeMorgan’s
law 12.20(a) to turn NOT (ab) into ā + b̄. That is, we have transformed the theorem
to be proved into c̄ → (ā + b̄). Put another way, we need to prove that
STATEMENT “If x is not odd, then x is not greater than 2 or x is not prime.”
We can replace “not odd” by “even,” “not greater than 2” by “equal to or less than
2,” and “not prime” by “composite.” Thus we want to prove

STATEMENT “If x is even, then x ≤ 2 or x is composite.”
Now we have gone as far as we can go with propositional logic, and we must
start talking about the meaning of our terms. If x is even, then x = 2y for some
integer y; that is what it means for x to be even. Since x is assumed through this
proof to be a positive integer, y must be 1 or greater.
Now we use case analysis, considering the cases where y is 1, and y is greater
than 1, which are the only possibilities, since we just argued that y ≥ 1. If y = 1,
then x = 2, and so we have proved x ≤ 2. If y > 1, then x is the product of two
integers, 2 and y, both greater than 1, which means that x is composite. Thus we
have shown that if x is even, then either x ≤ 2 (in the case y = 1) or x is composite
(in the case y > 1). ✦
Proof by Contradiction

Another proof technique is to assume the negation of what we wish to prove, and
show that this assumption leads to a contradiction, that is, to the truth value 0
(FALSE). The technique is justified by the law

12.29. Proof by contradiction: (p̄ → 0) ≡ p.

To see why 12.29 holds, first use law 12.24, the definition of → in terms of AND and
OR, to rewrite p̄ → 0 as NOT (p̄) + 0. Law 12.13, the elimination of double negatives,
lets us replace NOT (p̄) by p, and so

(p̄ → 0) ≡ (p + 0)

Since 0 is the identity for OR, (p + 0) ≡ p, which gives us law 12.29.
Equivalence to Truth
Our next proof method allows us to prove an expression to be a tautology by
transforming it by substitution of equals for equals until the expression is reduced
to 1 (TRUE).
12.30. Proof by equivalence to truth: (p ≡ 1) ≡ p.
✦ Example 12.25. The expression rs → r says the AND of two expressions implies
the first of them (and by commutativity of AND, also implies the second). We can
show that rs → r is a tautology by the following sequence of equivalences.
rs → r
1) ≡ NOT (rs) + r
2) ≡ (r̄ + s̄) + r
3) ≡ 1 + s̄
4) ≡ 1
(1) follows by applying law 12.24, the definition of → in terms of AND and OR.
(2) is an application of DeMorgan’s law. (3) follows when we use 12.7 and 12.8 to
reorder terms and then replace r + r̄ by 1 according to law 12.25. Finally, (4) is an
application of law 12.12, the fact that 1 is an annihilator for OR. ✦
EXERCISES
12.9.1: Show that laws 12.25 and 12.27 are duals of each other.
12.9.2*: We would like to prove the theorem “If x is a perfect square and x is even,
then x is divisible by 4.”
✦
✦ ✦
✦
12.10 Deduction
We have seen logic as a design theory in Sections 12.6 to 12.8, and as a formalization
of proof techniques in Section 12.9. Now, let us see a third side of the picture: the use
of logic in deduction, that is, in sequences of statements that constitute a complete
proof. Deduction should be familiar to the reader from the study of plane geometry
in high school, where we learn to start with certain hypotheses (the “givens”), and
to prove a conclusion by a sequence of steps, each of which follows from previous
steps by one of a limited number of reasons, called inference rules. In this section,
we explain what constitutes a deductive proof and give a number of examples.
Unfortunately, discovering a deductive proof for a tautology is difficult. As we
mentioned in Section 12.7, it is an example of an “inherently intractable” problem,
in the NP-hard class. Thus we cannot expect to find deductive proofs except by
luck or by exhaustive search. In Section 12.11, we shall discuss resolution proofs,
which appear to be a good heuristic for finding proofs, although in the worst case
this technique, like all others, must take exponential time.
Applications of Deduction
In addition to being the stuff of which all proofs in mathematics are ultimately made,
deduction or formal proof has many uses in computer science. One application is
Automated automated theorem proving. There are systems that find proofs of theorems by
theorem searching for sequences of steps that proceed from hypotheses to conclusion. Some
proving systems search for proofs on their own, and others work interactively with the user,
taking hints and filling in small gaps in the sequence of steps that form a proof. Some
believe that these systems will eventually be useful for proving the correctness of
programs, although much progress must be made before such facilities are practical.
A second use of deductive proofs is in programming languages that relate de-
duction to computation. As a very simple example, a robot finding its way through
a maze might represent its possible states by a finite set of positions in the centers
of hallways. We could draw a graph in which the nodes represent the positions, and
an arc u → v means that it is possible for the robot to move from position u to
position v by a simple move because u and v represent adjacent hallways.
We could also think of the positions as propositions, where u stands for “The
robot can reach position u.” Then u → v can be interpreted not only as an arc,
but as a logical implication, that is, “If the robot can reach u, then it can reach
v.” (Note the “pun”; the arrow can represent an arc or an implication.) A natural
question to ask is: “What positions can the robot reach from position a?”
We can phrase this question as a deduction if we take the expression a, and
all expressions u → v for adjacent positions u and v, as hypotheses, and see which
propositional variables x we can prove from these hypotheses. In this case, we
don’t really need a tool as powerful as deduction, because depth-first search works,
as discussed in Section 9.7. However, there are many related situations where graph-
theoretic methods are not effective, yet the problem can be couched as a deduction
and a reasonable solution obtained.
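To make the robot example concrete, here is a C sketch of ours; the maze, its
positions, and its arcs are invented for illustration. Each arc u → v is treated as
the rule "from u infer v" (an application of modus ponens), and the rules are
applied repeatedly until no new position can be proved reachable, in effect
performing the graph search just mentioned.

    #include <stdio.h>

    #define N 6   /* a made-up maze with positions 0 through 5 */

    int main()
    {
        int arc[N][N] = {{0}};  /* arc[u][v] = 1 is the hypothesis u -> v */
        int proved[N] = {0};    /* proved[u] = 1: "u is reachable" proved */
        int u, v, changed;

        arc[0][1] = arc[1][2] = arc[2][3] = arc[4][5] = 1;
        proved[0] = 1;          /* hypothesis a: the starting position */

        /* Apply modus ponens with the hypotheses u -> v until no new
           proposition can be added. */
        do {
            changed = 0;
            for (u = 0; u < N; u++)
                for (v = 0; v < N; v++)
                    if (proved[u] && arc[u][v] && !proved[v]) {
                        proved[v] = 1;
                        changed = 1;
                    }
        } while (changed);

        for (v = 0; v < N; v++)
            if (proved[v])
                printf("position %d is reachable\n", v);
        return 0;
    }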
Suppose we take as hypotheses the three expressions r → u, u → w̄, and r̄ → w̄,
where r stands for “It is raining,” u for “Joe brings his umbrella,” and w for “Joe
gets wet.” We are asked to prove w̄, that is, Joe never gets wet. In a sense, the
matter is trivial, since the reader may check that

(r → u) AND (u → w̄) AND (r̄ → w̄) → w̄

is a tautology. Here, however, is a deductive proof of w̄ from the three hypotheses:
1) r→u Hypothesis
2) u → w̄ Hypothesis
3) (r → u) AND (u → w̄) (c) applied to (1) and (2)
4) (r → u) AND (u → w̄) → (r → w̄) Substitution into law (12.23)
5) r → w̄ Modus ponens, with (3) and (4)
6) r̄ → w̄ Hypothesis
7) (r → w̄) AND (r̄ → w̄) (c) applied to (5) and (6)
8) (r → w̄) AND (r̄ → w̄) ≡ w̄ Substitution into law (12.26)
9) w̄ (d) with (7) and (8)
The tautology we need is

(p → q1 ) AND (p → q2 ) AND · · · AND (p → qn ) AND ((q1 q2 · · · qn ) → r) → (p → r) (12.12)

That is, if p implies each of the qi ’s, and the qi ’s together imply r, then p implies r.
We find that (12.12) is a tautology by the following reasoning. The only way
that (12.12) could be false is if p → r were false and the left-hand side true. But
p → r can only be false if p is true and r is false, and so in what follows we shall
assume p and r̄. We must show that the left-hand side of (12.12) is then false.
If the left-hand side of (12.12) is true, then each of its subexpressions connected
by AND is true. For example, p → q1 is true. Since we assume p is true, the only
way p → q1 can be true is if q1 is true. Similarly, we can conclude that q2 , . . . , qn
are all true. Thus q1 q2 · · · qn → r must be false, since we assume r is false and we
have just discovered that all the qi ’s are true.
We started by assuming that (12.12) was false and observed that the right-hand
side must therefore be true, and thus p and r̄ must be true. We then concluded that
the left-hand side of (12.12) is false when p is true and r is false. But if the left-hand
side of (12.12) is false, then (12.12) itself is true, and we have a contradiction. Thus
(12.12) can never be false and is therefore a tautology.
Note that if n = 1 in (12.12), then we have the usual transitive law of →,
which is law 12.23. Also, if n = 0, then (12.12) becomes (1 → r) → r, which is
a tautology. Recall that when n = 0, q1 q2 · · · qn is conventionally taken to be the
identity for AND, which is 1.
We also need a family of tautologies to justify the fact that we can add the
hypotheses to the proof. It is a generalization of a tautology discussed in Example
12.25. We claim that for any m and i such that 1 ≤ i ≤ m,
(p1 p2 · · · pm ) → pi (12.13)
is a tautology. That is, the AND of one or more propositions implies any one of them.
The expression (12.13) is a tautology because the only way it could be false is
if the left-hand side is true and the right-hand side, pi , is false. But if pi is false,
then the AND of pi and other p’s is surely false, so the left-hand side of (12.13) is
false. But (12.13) is true whenever its left-hand side is false.
Now we can prove that, given
1. Hypotheses E1 , E2 , . . . , Ek , and
2. A set of inference rules such that, whenever they allow us to write a line F ,
this line is either one of the Ei ’s, or there is a tautology
(F1 AND F2 AND · · · AND Fn ) → F
for some set of previous lines F1 , F2 , . . . , Fn ,
it must be that (E1 AND E2 AND · · · AND Ek ) → F is a tautology for each line F .
The induction is on the number of lines added to the proof.
BASIS. For a basis, we take zero lines. The statement holds, since it says something
about every line F of a proof, and there are no such lines to discuss. That is, our
inductive statement is really of the form “if F is a line then · · · ,” and we know such
an if-then statement is true if the condition is false.
INDUCTION. For the induction, suppose that for each previous line G,
(E1 AND E2 AND · · · AND Ek ) → G
is a tautology. Let F be the next line added. There are two cases.

1. If F is one of the hypotheses Ei , then (E1 AND E2 AND · · · AND Ek ) → F is a
tautology by (12.13), with the E’s in the roles of the p’s and F in the role of pi .

2. Otherwise, the inference rules let us write F because there is a tautology

(F1 AND F2 AND · · · AND Fn ) → F

where each of the Fj ’s is one of the previous lines. By the inductive hypothesis,

(E1 AND E2 AND · · · AND Ek ) → Fj

is a tautology for each j. Thus, if in (12.12) we substitute Fj for qj , substitute
E1 AND E2 AND · · · AND Ek for p, and substitute F for r, we know that any substitution of truth values for the
variables of the E’s and F ’s makes the left-hand side of (12.12) true. Since (12.12)
is a tautology, every assignment of truth values must also make the right-hand side
true. But the right-hand side is (E1 AND E2 AND · · · AND Ek ) → F . We conclude
that this expression is true for every assignment of truth values; that is, it is a
tautology.
We have now shown by induction that (E1 AND E2 AND · · · AND Ek ) → F is a tautology
for every line F of the proof. In particular, if the last line of the proof is our goal
E, we know that (E1 AND E2 AND · · · AND Ek ) → E is a tautology.
EXERCISES
12.10.1*: Give proofs of the following conclusions from the following hypotheses.
You may use inference rules (a) through (d). For tautologies you may use only the
laws stated in Sections 12.8 and 12.9 and tautologies that follow by using instances
of these laws to “substitute equals for equals.”
a) Hypotheses: p → q, p → r; conclusion: p → qr.
b) Hypotheses: p → (q + r), p → (q + r̄); conclusion: p → q.
c) Hypotheses: p → q, qr → s; conclusion: pr → s.
12.10.2: Justify why the following is a rule of inference. If E → F is a line, and G
is any expression whatsoever, then we can add E → (F OR G) as a line.
✦
✦ ✦
✦
12.11 Proofs by Resolution
As we mentioned earlier in this chapter, finding proofs is a hard problem, and since
the tautology problem is very likely to be inherently exponential, there is no general
way to make finding proofs easy. However, many techniques are known that, for
“typical” tautologies, appear to help with the exploration needed in the search for
a proof. In this section we shall study a useful inference rule, called resolution, that
is perhaps the most basic of these techniques. Resolution is based on the following
tautology.
(p + q)(p̄ + r) → (q + r) (12.14)
The validity of this rule of inference is easy to check. The only way it could be false
is if q + r were false, and the left-hand side were true. If q + r is false, then both
q and r are false. Suppose p is true, so that p̄ is false. Then p̄ + r is false, and the
left-hand side of (12.14) must be false. Similarly, if p is false, then p + q is false,
which again tells us that the left-hand side is false. Thus it is impossible for the
right-hand side to be false while the left-hand side is true, and we conclude that
(12.14) is a tautology.
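The same case analysis can be checked exhaustively by machine. The following C sketch runs through all eight truth assignments to p, q, and r and confirms that (12.14) is never false; this is the truth-table method suggested in Exercise 12.11.1 below.

    #include <stdio.h>

    /* a -> b is equivalent to (NOT a) OR b */
    static int implies(int a, int b) { return !a || b; }

    int main(void)
    {
        int p, q, r, tautology = 1;
        for (p = 0; p <= 1; p++)
            for (q = 0; q <= 1; q++)
                for (r = 0; r <= 1; r++)
                    /* (p OR q) AND (NOT p OR r) -> (q OR r) */
                    if (!implies((p || q) && (!p || r), q || r))
                        tautology = 0;
        printf("(12.14) is %sa tautology\n", tautology ? "" : "not ");
        return 0;
    }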
The usual way resolution is applied is to convert our hypotheses into clauses,
which are sums (logical OR’s) of literals. We convert each of our hypotheses into
a product of clauses. Our proof begins with each of these clauses as a line of the
proof, and the justification is that each is “given.” We then apply the resolution
rule to construct additional lines, which will always turn out to be clauses. That
is, if q and r in (12.14) are each replaced by any sum of literals, then q + r will also
be a sum of literals.
In practice, we shall simplify clauses by removing duplicates. That is, both q
and r could include a literal X, in which case we shall remove one copy of X from
q + r. The justification is found in laws 12.17, 12.7, and 12.8, the idempotence,
commutativity, and associativity of OR. In general, a useful point of view is that a
clause is a set, rather than a list, of literals. The associative and commutative laws
allow us to order the literals any way we please, and the idempotent law allows us
to eliminate duplicates.
We also eliminate clauses that contain contradictory literals. That is, if both X
and X̄ are found in one clause, then by laws 12.25, 12.7, 12.8, and 12.15, the clause
is equivalent to 1; such a clause gives no information and may be dropped.
In order to make resolution work, we need to put all hypotheses, and the conclu-
sion, into product-of-sums form, or “conjunctive normal form.” There are several
approaches that may be taken. Perhaps the simplest is the following.
1. First, we get rid of any operators except AND, OR, and NOT. We replace E ≡ F
by (E → F )(F → E), by law 12.21. Then, we replace G → H by
NOT (G) + (H)
according to law 12.24. NAND and NOR are easily replaced by AND or OR, respec-
tively, followed by NOT. In fact, since AND, OR, and NOT are a complete set of
operators, we know that any logical operator whatsoever, including those not
introduced in this book, can be replaced by expressions involving only AND, OR,
and NOT.
2. Next, apply DeMorgan’s laws to push all negations down until they either
cancel with other negations by law 12.13 in Section 12.8, or they apply only to
propositional variables.
3. Now we apply the distributive law for OR over AND to push all OR’s below all
AND’s. The result is an expression in which there are literals, combined by OR’s,
which are then combined by AND’s; this is a conjunctive normal form expression.
Note that to balance conciseness and clarity, we are using overbar, +, and juxtapo-
sition mixed with their equivalents — NOT, OR, and AND — in this and subsequent
expressions.
As an example, let us convert the expression p + q(NOT (r(s → t))) to conjunctive
normal form. Step (1) requires us to replace s → t by s̄ + t, giving the AND-OR-NOT
expression

p + q(NOT (r(s̄ + t)))
In step (2), we must push the first NOT down by DeMorgan’s laws. The sequence of
steps, in which the NOT reaches the propositional variables is
p + q(r̄ + NOT (s̄ + t))
p + q(r̄ + (NOT s̄)(NOT t))
p + q(r̄ + st̄)
Now we apply law 12.14 to push the first OR below the first AND.

(p + q)(p + r̄ + st̄)
Next, we regroup, using law 12.8 of Section 12.8, so that we can push the second
and third OR’s below the second AND.
(p + q)((p + r̄) + st̄)
Finally, we use law 12.14 again, and all OR’s are below all AND’s. The resulting
expression,
(p + q)(p + r̄ + s)(p + r̄ + t̄)
is in conjunctive normal form. ✦
5 You should have observed by now that it doesn’t matter whether we write many hypotheses
or connect them all with AND’s and write one hypothesis.
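These three steps are mechanical enough to program. Below is a minimal C sketch of step (2) alone: it pushes NOT’s down an expression tree by DeMorgan’s laws and cancels double negations. The tree representation and function names are invented for illustration; steps (1) and (3) would be handled by similar tree rewrites.

    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { VAR, NOT, AND, OR } Op;

    typedef struct Expr {
        Op op;
        char name;                  /* used when op == VAR */
        struct Expr *left, *right;  /* NOT uses only left */
    } Expr;

    Expr *mk(Op op, char name, Expr *l, Expr *r)
    {
        Expr *e = malloc(sizeof(Expr));
        e->op = op; e->name = name; e->left = l; e->right = r;
        return e;
    }

    /* Return an expression for e (or for NOT e, if neg is true) in which
       all NOT's apply directly to variables; DeMorgan's laws swap AND
       and OR as a negation descends. */
    Expr *pushNot(Expr *e, int neg)
    {
        switch (e->op) {
        case VAR: return neg ? mk(NOT, 0, e, NULL) : e;
        case NOT: return pushNot(e->left, !neg);   /* double negation cancels */
        case AND: return mk(neg ? OR : AND, 0,
                            pushNot(e->left, neg), pushNot(e->right, neg));
        default : return mk(neg ? AND : OR, 0,     /* case OR */
                            pushNot(e->left, neg), pushNot(e->right, neg));
        }
    }

    void print(Expr *e)
    {
        if (e->op == VAR) { putchar(e->name); return; }
        if (e->op == NOT) { printf("NOT "); print(e->left); return; }
        putchar('('); print(e->left);
        printf(e->op == AND ? " AND " : " OR ");
        print(e->right); putchar(')');
    }

    int main(void)
    {
        /* NOT (r AND ((NOT s) OR t)), the subexpression pushed down above */
        Expr *s = mk(VAR, 's', NULL, NULL), *t = mk(VAR, 't', NULL, NULL);
        Expr *r = mk(VAR, 'r', NULL, NULL);
        Expr *e = mk(NOT, 0,
                     mk(AND, 0, r, mk(OR, 0, mk(NOT, 0, s, NULL), t)), NULL);
        print(pushNot(e, 0));       /* prints (NOT r OR (s AND NOT t)) */
        putchar('\n');
        return 0;
    }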
✦ Example 12.29. Let us reconsider the hypotheses r → u, u → w̄, and r̄ → w̄.
By law 12.24 they are equivalent to the clauses (r̄ + u), (ū + w̄), and (r + w̄),
respectively. Recall that a single literal is a clause, and a single clause is a product
of clauses. Thus we begin with clauses

(r̄ + u)(ū + w̄)(r + w̄)
Now, suppose we resolve the first and third clauses, using r in the role of p.
The resulting clause is (u + w̄). This clause may be resolved with the second clause
in the hypothesis, with u in the role of p, to get clause (w̄). Since this clause is the
desired conclusion, we are done. Figure 12.25 shows the proof as a series of lines,
each of which is a clause. ✦
1) (r̄ + u)      Hypothesis
2) (ū + w̄)      Hypothesis
3) (r + w̄)      Hypothesis
4) (u + w̄)      Resolution of (1) and (3)
5) (w̄)          Resolution of (2) and (4)

Fig. 12.25. A resolution proof of the conclusion (w̄).
Resolution proofs typically work by contradiction: we try to derive the empty
clause, which stands for the constant 0. For example, if we have clauses (p) and (p̄),
we may apply (12.14) with q = r = 0, to get the clause 0.
The reason this approach is valid stems from the contradiction law 12.29 of
Section 12.9, or (p̄ → 0) ≡ p. Here, let p be the statement we want to prove:
(E1 E2 · · · Ek ) → E, for some hypotheses E1 , E2 , . . . , Ek and
conclusion E. Then p̄
is NOT ((E1 E2 · · · Ek ) → E), or NOT (NOT (E1 E2 · · · Ek ) + E), using law 12.24. Several
applications of DeMorgan’s laws tell us that p̄ is equivalent to E1 E2 · · · Ek Ē. Thus,
to prove p we can instead prove p̄ → 0, or (E1 E2 · · · Ek Ē) → 0. That is, we
prove that the hypotheses and the negation of the conclusion together imply a
contradiction.
✦ Example 12.30. Let us reconsider Example 12.29, but start with both the
three hypothesis clauses and the negation of the desired conclusion, that is, with
clause (w) as well. The resolution proof of 0 is shown in Fig. 12.26. Using the law
of contradiction, we can conclude that the hypotheses imply w̄, the conclusion. ✦
1) (r̄ + u)      Hypothesis
2) (ū + w̄)      Hypothesis
3) (r + w̄)      Hypothesis
4) (w)           Negation of conclusion
5) (u + w̄)      Resolution of (1) and (3)
6) (w̄)          Resolution of (2) and (5)
7) 0             Resolution of (4) and (6)

Fig. 12.26. A resolution proof of 0 from the hypotheses and the negated conclusion.
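Resolution is also easy to mechanize, because a clause is just a set of literals. The following C sketch stores each clause as two bitmasks (positive and negated literals), starts from the four clauses of Fig. 12.26, and resolves pairs of clauses until it derives the empty clause 0. The representation is invented for illustration, and no attempt is made at the search heuristics a real theorem prover would use.

    #include <stdio.h>

    #define MAXC 64
    enum { R, U, W };                       /* variables r, u, w as bit positions */

    typedef struct { unsigned pos, neg; } Clause;

    Clause cs[MAXC];
    int n = 0;

    int have(Clause c)                      /* avoid adding duplicate clauses */
    {
        int i;
        for (i = 0; i < n; i++)
            if (cs[i].pos == c.pos && cs[i].neg == c.neg) return 1;
        return 0;
    }

    int main(void)
    {
        cs[n++] = (Clause){ 1u << U, 1u << R };          /* (r̄ + u)  */
        cs[n++] = (Clause){ 0, (1u << U) | (1u << W) };  /* (ū + w̄) */
        cs[n++] = (Clause){ 1u << R, 1u << W };          /* (r + w̄)  */
        cs[n++] = (Clause){ 1u << W, 0 };                /* (w), the negated conclusion */

        int changed = 1;
        while (changed) {
            changed = 0;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    unsigned clash = cs[i].pos & cs[j].neg;
                    if (!clash) continue;
                    unsigned x = clash & -clash;         /* resolve on one variable */
                    Clause c = { (cs[i].pos | cs[j].pos) & ~x,
                                 (cs[i].neg | cs[j].neg) & ~x };
                    if (c.pos & c.neg) continue;         /* contains X and X̄; drop */
                    if (c.pos == 0 && c.neg == 0) {
                        printf("derived the empty clause 0\n");
                        return 0;
                    }
                    if (!have(c) && n < MAXC) { cs[n++] = c; changed = 1; }
                }
        }
        printf("no contradiction found\n");
        return 0;
    }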
EXERCISES
12.11.1: Use the truth table method to check that expression (12.14) is a tautology.
12.11.2: Let the propositions have the intuitive meanings given in Fig. 12.27. Write
a clause or product of clauses that express the following ideas.
✦
✦ ✦
✦
12.12 Summary of Chapter 12
In this chapter, we have seen the elements of propositional logic, including:
✦ The principal operators, AND, OR, NOT, →, ≡, NAND, and NOR.
✦ The use of truth tables to represent the meaning of a logical expression, includ-
ing algorithms to construct a truth table from an expression and vice versa.
✦ Some of the many algebraic laws that apply to the logical operators.
We also discussed logic as a design theory, seeing:
✦ How Karnaugh maps help us design simple expressions for logical functions
that have up to four variables.
✦ How algebraic laws can be used sometimes to simplify expressions of logic.
Then, we saw that logic helps us express and understand the common proof tech-
niques such as:
✦ Proof by case analysis,
✦ Proof of the contrapositive,
✦ Proof by contradiction, and
✦ Proof by reduction to truth.
Finally, we studied deduction, that is, the construction of line-by-line proofs, seeing:
✦ There are a number of inference rules, such as “modus ponens,” that allow us
to construct one line of a proof from previous lines.
✦ The resolution technique often helps us find proofs quickly by representing lines
of a proof as sums of literals and combining sums in useful ways.
✦ However, there is no known algorithm that is guaranteed to find a proof of an
expression in time less than an exponential in the size of the expression.
✦
✦ ✦
✦
12.13 Bibliographic Notes for Chapter 12
The study of deduction in logic dates back to Aristotle. Boole [1854] developed the
algebra of propositions, and it is from this work that Boolean algebra comes.
Lewis and Papadimitriou [1981] is a somewhat more advanced treatment of
logic. Enderton [1972] and Mendelson [1987] are popular treatments of mathemat-
ical logic. Manna and Waldinger [1990] present the subject from the point of view
of proving correctness of programs.
Genesereth and Nilsson [1987] treat logic from the point of view of applications
to artificial intelligence. There, you can find more on the matter of heuristics
for discovering proofs, including resolution-like techniques. The original paper on
resolution as a method of proof is Robinson [1965].
For more on the theory of intractable problems, read Garey and Johnson [1979].
The concept of NP-completeness is by Cook [1971], and the paper by Karp [1972]
made clear the importance of the concept for commonly encountered problems.
Boole, G. [1854]. An Investigation of the Laws of Thought, Macmillan; reprinted by
Dover Press, New York, in 1958.
Cook, S. A. [1971]. “The complexity of theorem proving procedures,” Proc. Third
Annual ACM Symposium on the Theory of Computing, pp. 151–158.
Enderton, H. B. [1972]. A Mathematical Introduction to Logic, Academic Press,
New York.
Garey, M. R. and D. S. Johnson [1979]. Computers and Intractability: A Guide to
the Theory of NP-Completeness, W. H. Freeman, New York.
Genesereth, M. R. and N. J. Nilsson [1987]. Logical Foundations for Artificial
Intelligence, Morgan-Kaufmann, San Mateo, Calif.
Karp, R. M. [1972]. “Reducibility among combinatorial problems,” in Complexity
of Computer Computations (R. E. Miller and J. W. Thatcher, eds.), Plenum, New
York, pp. 85–103.
Lewis, H. R. and C. H. Papadimitriou [1981]. Elements of the Theory of Compu-
tation, Prentice-Hall, Englewood Cliffs, New Jersey.
Manna, Z. and R. Waldinger [1990]. The Logical Basis for Computer Programming
(two volumes), Addison-Wesley, Reading, Mass.
Mendelson, E. [1987]. Introduction to Mathematical Logic, Wadsworth and Brooks,
Monterey, Calif.
Robinson, J. A. [1965]. “A machine-oriented logic based on the resolution principle,”
J. ACM 12:1, pp. 23–41.
CHAPTER 13

✦
✦ ✦
✦

Using Logic to Design Computer Components
In this chapter we shall see that the propositional logic studied in the previous
chapter can be used to design digital electronic circuits. Such circuits, found in
every computer, use two voltage levels (“high” and “low”) to represent the binary
values 1 and 0. In addition to gaining some appreciation for the design process,
we shall see that algorithm-design techniques, such as “divide-and-conquer,” can
also be applied to hardware. In fact, it is important to realize that the process of
designing a digital circuit to perform a given logical function is quite similar in spirit
to the process of designing a computer program to perform a given task. The data
models differ significantly, and frequently circuits are designed to do many things
in parallel (at the same time) while common programming languages are designed
to execute their steps sequentially (one at a time). However, general programming
techniques like modularizing a design are as applicable to circuits as they are to
programs.
✦
✦ ✦
✦
13.1 What This Chapter is About
This chapter covers the following concepts from digital circuit design:
✦ Physical constraints under which circuits are designed, and what properties
circuits must have to produce their answers quickly (Section 13.5).
✦
✦ ✦
✦
13.2 Gates
A gate is an electronic device with one or more inputs, each of which can assume
either the value 0 or the value 1. As mentioned earlier, the logical values 0 and
1 are generally represented electronically by two different voltage levels, but the
physical method of representation need not concern us. A gate usually has one
output, which is a function of its inputs, and which is also either 0 or 1.
Each gate computes some particular Boolean function. Most electronic “tech-
nologies” (ways of manufacturing electronic circuits) favor the construction of gates
for certain Boolean functions and not others. In particular, AND- and OR-gates are
usually easy to build, as are NOT-gates, which are called inverters. AND- and OR-gates
can have any number of inputs, although, as we discuss in Section 13.5, there is
usually a practical limitation on how many inputs a gate can have. The output of
an AND-gate is 1 if all its inputs are 1, and its output is 0 if any one or more of its
inputs are 0. Likewise, the output of an OR-gate is 1 if one or more of its inputs are
1, and the output is 0 if all inputs are 0. The inverter (NOT-gate) has one input; its
output is 1 if its input is 0 and 0 if its input is 1.
We also find it easy to implement NAND- and NOR-gates in most technologies.
The NAND-gate produces the output 1 unless all its inputs are 1, in which case
it produces the output 0. The NOR-gate produces the output 1 when all inputs
are 0 and produces 0 otherwise. An example of a logical function that is harder
to implement electronically is equivalence, which takes two inputs x and y and
produces a 1 output if x and y are both 1 or both 0, and a 0 output when exactly
one of x and y is 1. However, we can build equivalence circuits out of AND-, OR-,
and NOT-gates by implementing a circuit that realizes the logical function xy + x̄ȳ.
The symbols for the gates we have mentioned are shown in Fig. 13.1. In each
case except for the inverter (NOT-gate), we have shown the gate with two inputs.
However, we could easily show more than two inputs, by adding additional lines. A
one-input AND- or OR-gate is possible, but doesn’t really do anything; it just passes
its input to the output. A one-input NAND- or NOR-gate is really an inverter.
✦
✦ ✦
✦
13.3 Circuits
Gates are combined into circuits by connecting the outputs of some gates to the
inputs of others. The circuit as a whole has one or more inputs, each of which can
be inputs to various gates within the circuit. The outputs of one or more gates are
designated circuit outputs. If there is more than one output, then an order for the
output gates must be specified as well.
Fig. 13.2. A circuit that computes the equivalence function of inputs x and y, using gates A through E.
✦ Example 13.1. Figure 13.2 shows a circuit that produces as output z, the
equivalence function of inputs x and y. Conventionally, we show inputs at the top.
Both inputs x and y are fed to gate A, which is an AND-gate, and which therefore
produces a 1 output when (and only when) x = y = 1. Also, x and y are inverted
by NOT-gates B and C respectively, and the outputs of these inverters are fed to
AND-gate D. Thus, the output of gate D is 1 if and only if both x and y are 0. Since
the outputs of gates A and D are fed to OR-gate E, we see that the output of that
gate is 1 if and only if either x = y = 1 or x = y = 0. The table in Fig. 13.3 gives
a logical expression for the output of each gate.
Thus, the output z of the circuit, which is the output of gate E, is 1 if and only
if the logical expression xy + x̄ȳ has value 1. Since this expression is equivalent to
the expression x ≡ y, we see that the circuit output is the equivalence function of
its two inputs. ✦
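This gate-by-gate reasoning is easy to check mechanically. The C sketch below evaluates gates A through E of Fig. 13.2 for all four input combinations, using C’s logical operators as the gate functions.

    #include <stdio.h>

    int main(void)
    {
        int x, y;
        printf(" x y | z\n");
        for (x = 0; x <= 1; x++)
            for (y = 0; y <= 1; y++) {
                int a = x && y;       /* AND-gate A */
                int b = !x;           /* inverter B */
                int c = !y;           /* inverter C */
                int d = b && c;       /* AND-gate D */
                int z = a || d;       /* OR-gate E, the circuit output */
                printf(" %d %d | %d\n", x, y, z);   /* z is 1 exactly when x = y */
            }
        return 0;
    }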
✦ Example 13.2. In Fig. 13.4 we see the directed graph that comes from the
circuit of Fig. 13.2. For example, there is an arc A → E because the output of gate
A is connected to an input of gate E. The graph of Fig. 13.4 clearly has no cycles;
in fact, it is a tree with root E, drawn upside-down. Thus, we conclude that the
circuit of Fig. 13.2 is combinational.
On the other hand, consider the circuit of Fig. 13.5(a). There, the output of
gate A is an input to gate B, and the output of B is an input to A. The graph
for this circuit is shown in Fig. 13.5(b). It clearly has a cycle, so that the circuit is
sequential.
Fig. 13.4. Directed graph constructed from the circuit of Fig. 13.2.
Fig. 13.5. (a) A circuit in which the output of gate A is an input to gate B and vice versa; (b) the directed graph for this circuit, which has a cycle.
Suppose inputs x and y to this circuit are both 1. Then the output of B is
surely 1, and therefore, both inputs to the AND-gate A are 1. Thus, this gate will
produce output 1. Now we can let input y become 0, and the output of OR-gate B
will remain 1, because its other input (the input from the output of A) is 1. Thus,
both inputs to A remain 1, and its output is 1 as well.
However, suppose x becomes 0, whether or not y is 0. Then the output of gate
A, and therefore the circuit output z, must be 0. We can describe the circuit output
z as 1 if, at some time in the past, both x and y were 1 and since then x (but not
necessarily y) has remained 1. Figure 13.6 shows the output as a function of time
for various input value combinations; the low level represents 0 and the elevated
level represents 1. ✦

Fig. 13.6. Output as a function of time, for the circuit of Fig. 13.5(a).
We shall discuss sequential circuits briefly at the end of this chapter. As we just
saw in Example 13.2, sequential circuits have the ability to remember important
things about the sequence of inputs seen so far, and thus they are needed for key
components of computers, such as main memory and registers. Combinational cir-
cuits, on the other hand, can compute the values of logical functions, but they must
work from a single setting for their inputs, and cannot remember what the inputs
were set to previously. Nevertheless, combinational circuits are also vital compo-
nents of computers. They are needed to add numbers, decode instructions into the
electronic signals that cause the computer to perform those instructions, and many
other tasks. In the following sections, we shall devote most of our attention to the
design of combinational circuits.
EXERCISES
13.3.1: Design circuits that produce the following outputs. You may use any of
the gates shown in Fig. 13.1.
a) The parity, or sum-mod-2, function of inputs x and y that is 1 if and only if
exactly one of x and y is 1.
b) The majority function of inputs w, x, y, and z that is 1 if and only if three or
more of the inputs are 1.
c) The function of inputs w, x, y, and z that is 1 unless all or none of the inputs
are 1.
d) The exclusive-or function ⊕ discussed in Exercise 12.4.7.
13.3.2*: Suppose the circuit of Fig. 13.5(a) is modified so that both gates A and B
are AND-gates, and both inputs x and y are initially 1. As the inputs change, under
what circumstances will the output be 1?
13.3.3*: Repeat Exercise 13.3.2 if both gates are OR-gates.
✦
✦ ✦
✦
13.4 Logical Expressions and Circuits
It is relatively simple to build a circuit whose output, as a function of its inputs, is
the same as that of a given logical expression. Conversely, given a combinational
circuit, we can find a logical expression for each circuit output, as a function of its
inputs. The same is not true of a sequential circuit, as we saw in Example 13.2.
We can construct a circuit for a given expression by structural induction on the
expression tree for that expression.

BASIS. If the expression tree is a single node, the expression can only be an input,
say x. The “circuit” for this expression will be the circuit input x itself.
Fig. 13.7. An expression tree whose root operator θ is applied to the subexpressions E1 , E2 , . . . , En .
INDUCTION. For the induction, suppose that the expression tree in question is
similar to Fig. 13.7. There is some logical operator, which we call θ, at the root;
θ might be AND or OR, for example. The root has n subtrees for some n, and the
operator θ is applied to the results of these subtrees to produce a result for the
whole tree.
Since we are performing a structural induction, we may assume that the in-
ductive hypothesis applies to the subexpressions. Thus, there is a circuit C1 for
expression E1 , circuit C2 for E2 , and so on.
To build the circuit for E, we take a gate for the operator θ and give it n
inputs, one from each of the outputs of the circuits C1 , C2 , . . . , Cn , in that order.
Fig. 13.8. The circuit for θ(E1 , . . . , En ) where Ci is the circuit for Ei .
The output of the circuit for E is taken from the θ-gate just introduced. The
construction is suggested in Fig. 13.8.
The circuit we have constructed computes the expression in the obvious way.
However, there may be circuits producing the same output function with fewer gates
or fewer levels. For example, if the given expression is (x + y)z + (x + y)w̄, then
the circuit we construct will have two occurrences of the subcircuit that realizes the
common expression x + y. We can redesign the circuit to use just one occurrence of
this subcircuit, and feed its output everywhere the common subexpression is used.
There are other more radical transformations that we can make to improve the
design of circuits. Circuit design, like the design of efficient algorithms, is an art,
and we shall see a few of the important techniques of this art later in this chapter.
✦ Example 13.3. One possible topological order of the gates in the circuit of Fig.
13.2 is ABCDE, and another is BCDAE. However, ABDCE is not a topological
order, since gate C feeds gate D, but D appears before C in this sequence. ✦
STATEMENT S(i): For the first i gates in the topological order, there are logical
expressions for the output of these gates.
BASIS. The basis will be i = 0. Since there are zero gates to consider, there is
nothing to prove, so the basis part is done.
INDUCTION. For the induction, look at the ith gate in the topological order.
Suppose gate i’s inputs are I1 , I2 , . . . , Ik . If Ij is a circuit input, say x, then let the
expression Ej for input Ij be x. If input Ij is the output of some other gate, that
gate must precede the ith gate in the topological order, which means that we have
already constructed some expression Ej for the output of that gate. Let the operator
associated with gate i be θ. Then an expression for gate i is θ(E1 , E2 , . . . , Ek ). In the
common case that θ is a binary operator for which infix notation is conventionally
used, the expression for gate i can be written (E1 )θ(E2 ). The parentheses are placed
there for safety, although depending on the precedence of operators, they may or
may not be necessary.
✦ Example 13.4. Let us determine the output expression for the circuit in Fig.
13.2, using the topological order ABCDE for the gates. First, we look at AND-gate
A. Its two inputs are from the circuit inputs x and y, so that the expression for the
output of A is xy.
Gate B is an inverter with input x, so that its output is x̄. Similarly, gate C
has output expression ȳ. Now we can work on gate D, which is an AND-gate with
inputs taken from the outputs of B and C. Thus, the expression for the output of
D is x̄ȳ. Finally, gate E is an OR-gate, whose inputs are the outputs of A and D.
We thus connect the output expressions for these gates by the OR operator, to get
the expression xy + x̄ȳ as the output expression for gate E. Since E is the only
output gate of the circuit, that expression is also the circuit output. Recall that the
circuit of Fig. 13.2 was designed to realize the Boolean function x ≡ y. It is easy to
verify that the expression we derived for gate E is equivalent to x ≡ y. ✦
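A program can carry out this construction by visiting the gates in a topological order. Below is a minimal C sketch, hard-coded for the circuit of Fig. 13.2 in the order ABCDE; each gate’s output expression is assembled as a string from the expressions of its inputs.

    #include <stdio.h>

    int main(void)
    {
        char A[64], B[64], C[64], D[64], E[64];

        /* Process gates in the topological order A, B, C, D, E of Fig. 13.2. */
        sprintf(A, "(x AND y)");              /* AND-gate on circuit inputs x, y */
        sprintf(B, "(NOT x)");                /* inverter on x                   */
        sprintf(C, "(NOT y)");                /* inverter on y                   */
        sprintf(D, "(%s AND %s)", B, C);      /* AND of the outputs of B and C   */
        sprintf(E, "(%s OR %s)", A, D);       /* OR-gate E, the circuit output   */

        /* Prints ((x AND y) OR ((NOT x) AND (NOT y))), that is, xy + x̄ȳ. */
        printf("output expression: %s\n", E);
        return 0;
    }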
✦ Example 13.5. In the previous examples, we have had only one circuit output,
and the circuit itself has been a tree. Neither of these conditions holds generally.
We shall now take up an important example of the design of a circuit with multiple
outputs, and where some gates have their output used as input to several gates.
Recall from Chapter 1 that we discussed the use of a one-bit adder in building a
circuit to add binary numbers. A one-bit adder circuit has two inputs x and y that
represent the bits in some particular position of the two numbers being added. It
has a third input, c, that represents the carry-in to this position from the position
to the right (next lower-order position). The one-bit adder produces as output the
following two bits:
1. The sum bit z, which is 1 if an odd number of x, y, and c are 1, and
2. The carry-out bit d, which is 1 if two or more of x, y, and c are 1.
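It may help to see the two outputs first as ordinary C computations; this sketch simply transcribes the descriptions of z and d and prints their truth table.

    #include <stdio.h>

    int main(void)
    {
        int x, y, c;
        printf(" x y c | z d\n");
        for (x = 0; x <= 1; x++)
            for (y = 0; y <= 1; y++)
                for (c = 0; c <= 1; c++) {
                    int z = (x + y + c) % 2;     /* sum bit: an odd number of 1's */
                    int d = (x + y + c) >= 2;    /* carry-out: two or more 1's    */
                    printf(" %d %d %d | %d %d\n", x, y, c, z, d);
                }
        return 0;
    }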
         z (sum)                        d (carry-out)
  yc:    00  01  11  10          yc:    00  01  11  10
x = 0:    0   1   0   1        x = 0:    0   0   1   0
x = 1:    1   0   1   0        x = 1:    0   1   1   1
Fig. 13.9. Karnaugh maps for the sum and carry-out functions.
In Fig. 13.9 we see Karnaugh maps for z and d, the sum and carry-out functions
of the one-bit adder. Of the eight possible minterms, seven appear in the functions
for z or d, and only one, xyc, appears in both.
A systematically designed circuit for the one-bit adder is shown in Fig. 13.10.
We begin by taking the circuit inputs and inverting them, using the three inverters
at the top. Then we create AND-gates for each of the minterms that we need in
one or more outputs. These gates are numbered 1 through 7, and each integer
tells us which of its inputs are “true” circuit inputs, x, y, or c, and which are
“complemented” inputs, x̄, ȳ, or c̄. That is, write the integer as a 3-bit binary
number, and regard the bits as representing x, y, and c, in that order. For example,
gate 4, or (100)2 , has input x true and inputs y and c complemented; that is, it
produces the output expression xȳc̄. Notice that there is no gate 0 here, because
the minterm x̄ȳc̄ is not needed for either output.
Finally, the circuit outputs, z and d, are assembled with OR-gates at the bottom.
The OR-gate for z has inputs from the output of each AND-gate whose minterm makes
z true, and the inputs to the OR-gate for d are selected similarly.
Let us compute the output expressions for the circuit of Fig. 13.10. The topo-
logical order we shall use is the inverters first, then the AND-gates 1, 2, . . . , 7, and
finally the OR-gates for z and d. First, the three inverters obviously have output
expressions x̄, ȳ, and c̄. Then we already mentioned how the inputs to the AND-gates
were selected and how the expression for the output of each is associated with the
Fig. 13.10. A one-bit adder. Three inverters produce x̄, ȳ, and c̄; AND-gates numbered 1 through 7 produce the needed minterms; and two OR-gates assemble the outputs z and d.
binary representation of the number of the gate. Thus, gate 1 has output expression
x̄ȳc. Finally, the output of the OR-gate for z is the OR of the output expressions for
gates 1, 2, 4, and 7, that is
x̄ȳc + x̄yc̄ + xȳc̄ + xyc
Similarly, the output of the OR-gate for d is the OR of the output expressions for
gates 3, 5, 6, and 7, which is
x̄yc + xȳc + xyc̄ + xyc
We leave it as an exercise to show that this expression is equivalent to the expression
yc + xc + xy
that we would get if we worked from the Karnaugh map for d alone. ✦
EXERCISES
13.4.1: Design circuits for the following Boolean functions. You need not restrict
yourself to 2-input gates if you can group three or more operands that are connected
by the same operator.
a) x + y + z. Hint : Think of this expression as OR(x, y, z).
b) xy + xz + yz
c) x + (ȳ x̄)(y + z)
13.4.2: For each of the circuits in Fig. 13.11, compute the logical expression for
each gate. What are the expressions for the outputs of the circuits? For circuit (b)
construct an equivalent circuit using only AND, OR, and NOT gates.
Fig. 13.11. Circuits (a) and (b) for Exercise 13.4.2, each with inputs x, y, and z.
13.4.3: Prove the following tautologies used in Examples 13.4 and 13.5:
a) (xy + x̄ȳ) ≡ (x ≡ y)
b) (x̄yc + xȳc + xyc̄ + xyc) ≡ (yc + xc + xy)
Chips
Chips generally have several “layers” of material that can be used, in combination,
to build gates. Wires can run in any layer, to interconnect the gates; wires on
different layers usually can cross without interacting. The “feature size,” roughly
the minimum width of a wire, is in 1994 usually below half a micron (a micron is
0.001 millimeter, or about 0.00004 inches). Gates can be built in an area several
microns on a side.
The process by which chips are fabricated is complex. For example, one step
might deposit a thin layer of a certain substance, called a photoresist, all over a
chip. Then a photographic negative of the features desired on a certain layer is
used. By shining light or a beam of electrons through the negative, the top layer
can be etched away in places where the beam shines through, leaving only the
desired circuit pieces.
✦
✦ ✦
✦
13.5 Some Physical Constraints on Circuits
Today, most circuits are built as “chips,” or integrated circuits. Large numbers of
gates, perhaps as many as millions of gates, and the wires interconnecting them,
are constructed out of semiconductor and metallic materials in an area about a
centimeter (0.4 inches) on a side. The various “technologies,” or methods of con-
structing integrated circuits, impose a number of constraints on the way efficient
circuits can be designed. For example, we mentioned earlier that certain types of
gates, such as AND, OR, and NOT, are easier to construct than other kinds.
Circuit Speed
Associated with each gate is a delay, between the time that the inputs become active
and the time that the output becomes available. This delay might be only a few
nanoseconds (a nanosecond is 10−9 seconds), but in a complex circuit, such as the
central processing unit of a computer, information propagates through many levels
of gates, even during the execution of a single instruction. As modern computers
perform instructions in much less than a microsecond (which is 10−6 seconds), it is
evidently imperative that the number of gates through which a value must propagate
be kept to a minimum.
Thus, for a combinational circuit, the maximum number of gates that lie along
any path from an input to an output is analogous to the running time of a program
as a figure of merit. That is, if we want our circuits to compute their outputs fast,
we must minimize the longest path length in the graph of the circuit. The delay of
a circuit is the number of gates on the longest path — that is, one plus the length
of the path equals the delay. For example, the adder of Fig. 13.10 has delay 3, since
the longest paths from input to output go through one of the inverters, then one
of the AND-gates, and finally, through one of the OR-gates; there are many paths of
length 3.
Notice that, like running time, circuit delay only makes sense as an “order of
magnitude” quantity. Different technologies will give us different values of the time
that it takes an input of one gate to affect the output of that gate. Thus, if we have
two circuits, of delay 10 and 20, respectively, we know that if implemented in the
same technology, with all other factors being equal, the first will take half the time
of the second. However, if we implement the second circuit in a faster technology,
it could beat the first circuit implemented in the original technology.
Size Limitations
The cost of building a circuit is roughly proportional to the number of gates in the
circuit, and so we would like to reduce the number of gates. Moreover, the size of
a circuit also influences its speed, and small circuits tend to run faster. In general,
the more gates a circuit has, the greater the area on a chip that it will consume.
There are at least two negative effects of using a large area.
1. If the area is large, long wires are needed to connect gates that are located far
apart. The longer a wire is, the longer it takes a signal to travel from one end
to the other. This propagation delay is another source of delay in the circuit,
in addition to the time it takes a gate to “compute” its output.
2. There is a limit to how large chips can be, because the larger they are, the
more likely it is that there will be an imperfection that causes the chip to fail.
If we have to divide a circuit across several chips, then wires connecting the
chips will introduce a severe propagation delay.
Our conclusion is that there is a significant benefit to keeping the number of gates
in a circuit low.
1 Strictly speaking, this observation is true only in 2’s complement notation. In some other
notations, there are two ways to represent 0. For example, in sign-and-magnitude, we would
test only whether the last 31 bits are 0.
Each 2-input gate can combine only two values into
one value, and if we have designed the circuit properly, that one value will be the
OR of all n original values. Thus, we need at least 31 gates to compute the OR of 32
bits, x1 , x2 , . . . , x32 .
Fig. 13.12. A cascading circuit that computes the OR of inputs x1 , x2 , . . . , x32 .
A naive way to do this OR is shown in Fig. 13.12. There, we group the bits in
a left-associative way. As each gate feeds the next, the graph of the circuit has a
path with 31 gates, and the delay of the circuit is 31.
A better way is suggested in Fig. 13.13. A complete binary tree with five levels
uses the same 31 gates, but the delay is only 5. We would expect the circuit of Fig.
13.13 therefore to run about six times faster than the circuit of Fig. 13.12. Other
factors that influence speed might reduce the factor of six, but even for a “small”
number of bits like 32, the clever design is significantly faster than the naive design.
If one doesn’t immediately “see” the trick of using a complete binary tree as a
circuit, one can obtain the circuit of Fig. 13.13 by applying the divide-and-conquer
paradigm. That is, to take the OR of 2^k bits, we divide the bits into two groups of
2^(k−1) bits each. Circuits for each of the two groups are combined by a final OR-gate,
as suggested in Fig. 13.14. Of course, the circuit for the basis case k = 1 (i.e.,
two inputs) is provided not by divide-and-conquer, but by using a single two-input
OR-gate. ✦
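To make the comparison concrete, here is a small C sketch that computes the delay, in gate levels, of the cascading circuit of Fig. 13.12 and of the balanced-tree circuit of Fig. 13.13, assuming 2-input OR-gates throughout.

    #include <stdio.h>

    int cascade_delay(int n) { return n - 1; }       /* Fig. 13.12 */

    int tree_delay(int n)                            /* Fig. 13.13 */
    {
        int d = 0;
        while (n > 1) { n = (n + 1) / 2; d++; }      /* each level halves n */
        return d;
    }

    int main(void)
    {
        printf("n = 32: cascade delay %d, tree delay %d\n",
               cascade_delay(32), tree_delay(32));   /* prints 31 and 5 */
        return 0;
    }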
EXERCISES
13.5.1*: Suppose that we can use OR-gates with fan-in of k, and we wish to take
the OR of n inputs, where n is a power of k. What is the minimum possible delay
for such a circuit? What would be the delay if we used a naive “cascading” circuit
as shown in Fig. 13.12?
13.5.3*: The divide-and-conquer approach of Fig. 13.14 works even when the num-
ber of inputs is not a power of two. Then the basis must include sets of two or three
inputs; three-input sets are handled by two OR-gates, one feeding the other, assum-
ing we wish to keep strictly to our fan-in limitation of two. What is the delay of
such circuits, as a function of the number of inputs?
13.5.4: First-string commandos are ready, willing, and able. Suppose we have n
commandos, and circuit inputs ri , wi , and ai indicate, respectively, whether the ith
commando is ready, willing, and able. We only want to send the commando team
on a raid if they all are ready, willing, and able. Design a divide-and-conquer circuit
to indicate whether we can send the team on a raid.
✦
✦ ✦
✦
13.6 A Divide-and-Conquer Addition Circuit
One of the key parts of a computer is a circuit that adds two numbers. While
actual microprocessor circuits do more, we shall study the essence of the problem
by designing a circuit to add two nonnegative integers. This problem is quite
instructive as an example of divide-and-conquer circuit design.
We can build an adder for n-bit numbers from n one-bit adders, connected in
one of several ways. Let us suppose that we use the circuit of Fig. 13.10 as a one-
bit-adder circuit. This circuit has a delay of 3, which is close to the best we can do.2
The simplest approach to building an adder circuit is the ripple-carry adder which
we saw in Section 1.3. In this circuit, an output of each one-bit adder becomes an
input of the next one-bit adder, so that adding two n-bit numbers incurs a delay of
3n. For example, in the case where n = 32, the circuit delay is 96.
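The ripple-carry scheme itself is easy to express in C; the following sketch adds two n-bit numbers bit by bit, each iteration playing the role of one copy of the one-bit adder of Fig. 13.10.

    #include <stdio.h>

    /* Ripple-carry addition of the low n bits of a and b. */
    unsigned ripple_add(unsigned a, unsigned b, int n)
    {
        unsigned sum = 0;
        int i, carry = 0;                 /* no carry into the rightmost bit */
        for (i = 0; i < n; i++) {
            int x = (a >> i) & 1, y = (b >> i) & 1;
            sum |= (unsigned)((x + y + carry) % 2) << i;   /* sum bit z   */
            carry = (x + y + carry) >= 2;                  /* carry-out d */
        }
        return sum;
    }

    int main(void)
    {
        printf("%u\n", ripple_add(25, 17, 8));   /* prints 42 */
        return 0;
    }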
Fig. 13.15. A divide-and-conquer adder is built from a left-half adder and a right-half adder.

The divide-and-conquer adder computes, for each half, sum bits s1 , . . . , sn (assuming
no carry into that half) and t1 , . . . , tn (assuming a carry), together with a
carry-generate bit g and a carry-propagate bit p.
BASIS. Consider the case n = 1. Here we have two inputs, x and y, and we need
to compute four outputs, s, t, p, and g, given by the logical expressions
s = xȳ + x̄y
t = xy + x̄ȳ
g = xy
p = x+y
To see why these expressions are correct, first assume there is no carry into
the one place in question. Then the sum bit, which is 1 if an odd number of x, y,
and the carry-in are 1, will be 1 if exactly one of x and y is 1. The expression for
s above clearly has that property. Further, with no carry-in, there can only be a
Fig. 13.16. The basis circuit, which computes s, t, g, and p from the inputs x and y.
carry-out if both x and y are 1, which explains the expression for g above.
Now suppose that there is a carry-in. Then for an odd number of x, y, and the
carry-in to be 1, it must be that both or neither of x and y are 1, explaining the
expression for t. Also, there will now be a carry-out if either one or both of x and
y are 1, which justifies the expression for p. A circuit for the basis is shown in Fig.
13.16. It is similar in spirit to the full adder of Fig. 13.10, but is actually somewhat
simpler, because it has only two inputs instead of three.
INDUCTION. The inductive step is illustrated in Fig. 13.17, where we build a 2n-
adder from two n-adders. A 2n-adder is composed of two n-adders, followed by two
pieces of circuitry labeled FIX in Fig. 13.17, to handle two issues:
1. Computing the carry propagate and generate bits for the 2n-adder
2. Adjusting the left half of the s’s and t’s to take into account whether or not
there is a carry into the left half from the right
Fig. 13.17. A 2n-adder built from two n-adders whose outputs (the g, p, s, and t bits of the left and right halves) feed the FIX circuitry.

First, suppose that there is a carry into the right end of the entire circuit for the
2n-adder. Then there will be a carry out at the left end of the entire circuit if either
of the following hold:
a) Both halves of the adder propagate a carry; that is, pL pR is true. Note this
expression includes the case when the right half generates a carry and the left
half propagates it. Then pL gR is true, but gR → pR , so (pL pR + pL gR ) ≡ pL pR .
b) The left half generates a carry; that is, g L is true. In this case, the existence
of a carry-out on the left does not depend on whether or not there is a carry
into the right end, or on whether the right half generates a carry.
Thus, the expression for p, the carry-propagate bit for the 2n-adder, is
p = gL + pL pR
Now assume there is no carry-in at the right end of the 2n-adder. Then there
is a carry-out at the left end of the 2n-adder if either
a) The right half generates a carry and the left half propagates it, or
b) The left half generates a carry.
Thus, the logical expression for g is
g = gL + pL gR
Now let us turn our attention to the si ’s and the ti ’s. First, the right-half bits
are unchanged from the outputs of the right n-adder, because the presence of the
left half has no effect on the right half. Thus, sn+i = siR , and tn+i = tiR , for
i = 1, 2, . . . , n.
The left-half bits must be modified, however, to take into account the ways in
which the right half can generate a carry. First, suppose that there is no carry-in
at the right end of the 2n-adder. This is the situation that the si ’s are supposed
to tell us about, so that we can develop expressions for the si ’s on the left, that
is, s1 , s2 , . . . , sn . Since there is no carry-in for the right half, there is a carry-in for
the left half only if a carry is generated by the right half. Thus, if gR is true, then
si = tiL (since the tiL ’s tell us about what happens when there is a carry into the
left half). If gR is false, then si = siL (since the siL ’s tell us what happens when
there is no carry into the left half). As a logical expression, we can write

si = siL ḡR + tiL gR
for i = 1, 2, . . . , n.
Finally, consider what happens when there is a carry-in at the right end of the
2n-adder. Now we can address the question of the values for the ti ’s on the left as
follows. There will be a carry into the left half if the right half propagates a carry,
that is, if pR = 1. Thus, ti takes its value from tiL if pR is true and from siL if pR
is false. As a logical expression,

ti = siL p̄R + tiL pR
In summary, the circuits represented by the box labeled FIX in Fig. 13.17
compute the following expressions:
p = gL + pL pR
g = gL + pL gR
si = siL ḡR + tiL gR , for i = 1, 2, . . . , n
ti = siL p̄R + tiL pR , for i = 1, 2, . . . , n
These expressions can each be realized by a circuit of at most three levels. For
example, the last expression needs only the circuit of Fig. 13.18.
Fig. 13.18. A three-level circuit computing ti = siL p̄R + tiL pR .
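The FIX expressions translate directly into code. Here is a minimal C sketch of the inductive step, assuming each group of bits is summarized by its s and t bits (packed into unsigned words) and its g and p bits; the struct and function names are invented for illustration.

    #include <stdio.h>

    /* Summary of an n-bit group: s and t are its sum bits assuming no
       carry-in and a carry-in, respectively; g and p are the carry
       generate and propagate bits. */
    typedef struct { unsigned s, t; int g, p; } Group;

    /* The FIX step: combine a left (high-order) and a right (low-order)
       n-bit group into a 2n-bit group. */
    Group fix(Group L, Group R, int n)
    {
        Group res;
        res.g = L.g | (L.p & R.g);          /* g  = gL + pL gR       */
        res.p = L.g | (L.p & R.p);          /* p  = gL + pL pR       */
        unsigned sHigh = R.g ? L.t : L.s;   /* si = siL ḡR + tiL gR  */
        unsigned tHigh = R.p ? L.t : L.s;   /* ti = siL p̄R + tiL pR  */
        res.s = (sHigh << n) | R.s;         /* right-half bits unchanged */
        res.t = (tHigh << n) | R.t;
        return res;
    }

    /* Basis for one bit: s = x XOR y, t = its complement, g = xy, p = x + y. */
    Group bit(int x, int y)
    {
        Group b = { (unsigned)(x ^ y), (unsigned)!(x ^ y), x & y, x | y };
        return b;
    }

    int main(void)
    {
        /* Add the 2-bit numbers 10 and 11 (2 and 3), high-order bits first. */
        Group sum = fix(bit(1, 1), bit(0, 1), 1);
        printf("s = %u, g = %d\n", sum.s, sum.g);
        /* Prints s = 1 (bits 01) and g = 1: 2 + 3 = 101 in binary. */
        return 0;
    }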
 n      G(n)
 1         9
 2        30
 4        78
 8       186
16       426
32       954

Fig. 13.19. The number of gates G(n) used by the divide-and-conquer n-adder.
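The table of Fig. 13.19 is consistent with the recurrence G(1) = 9 and G(2n) = 2G(n) + 6n + 6; the accounting behind it (three gates for each of the n left-half si’s and ti’s, four gates for p and g, and two inverters for ḡR and p̄R) is our own reading of the FIX circuitry, so take the recurrence as a stated assumption that happens to match every entry. The C sketch below reproduces the table.

    #include <stdio.h>

    int main(void)
    {
        int n, g = 9;                 /* basis: the 9-gate circuit of Fig. 13.16 */
        for (n = 1; n <= 32; n *= 2) {
            printf("G(%2d) = %d\n", n, g);
            g = 2 * g + 6 * n + 6;    /* assumed count for the 2n-adder */
        }
        return 0;
    }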
The ripple-carry adder uses 12n gates, and for n = 32, this number is 384 (we can
save a few gates if we remember that the carry into the rightmost bit is 0).
We see that for the interesting case, n = 32, the ripple-carry adder, while
much slower, does use fewer than half as many gates as the divide-and-conquer
adder. Moreover, the latter’s growth rate, O(n log n), is higher than the growth
rate of the ripple-carry adder, O(n), so that the difference in the number of gates
gets larger as n grows. However, the ratio is only O(log n), so that the difference
in the number of gates used is not severe. As the difference in the time required by
the two classes of circuits is much more significant [O(n) vs. O(log n)], some sort of
divide-and-conquer adder is used in essentially all modern computers.
EXERCISES
13.6.6**: We observed that if all we want is a 32-bit adder, we do not need all 954
gates as was indicated in Fig. 13.19. The reason is that we can assume there is no
carry into the rightmost place of the 32 bits. How many gates do we really need?
✦
✦ ✦
✦
13.7 Design of a Multiplexer
Fig. 13.20. A multiplexer: the d control inputs x1 , x2 , . . . , xd select which of the 2^d data inputs y0 , y1 , . . . , y(2^d −1) is passed to the output, y(x1 x2 ···xd )2 .
Fig. 13.21. The expression si = siL ḡR + tiL gR selects siL or tiL according to gR .
Notice that there is one term for each data input. The term with data input yi also
has each of the control inputs, either negated or unnegated. We can tell which are
negated by writing i as a d-bit binary integer. If the jth position of i in binary has
0, then xj is negated, and if the jth position has 1, we do not negate xj . Note that
this rule works for any number d of control inputs. ✦
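This rule is easy to state as a program. The C sketch below computes the simple multiplexer’s output by OR-ing, for each data input y[i], a term that ANDs y[i] with every control input, negated exactly where the binary representation of i has a 0.

    #include <stdio.h>

    int mux(int d, const int x[], const int y[])
    {
        int out = 0;
        for (int i = 0; i < (1 << d); i++) {
            int term = y[i];
            for (int j = 0; j < d; j++) {
                int bit = (i >> (d - 1 - j)) & 1;   /* jth position of i in binary */
                term = term && (bit ? x[j] : !x[j]);
            }
            out = out || term;
        }
        return out;
    }

    int main(void)
    {
        int x[2] = {1, 0};                /* controls select i = (10)2 = 2 */
        int y[4] = {0, 0, 1, 0};
        printf("%d\n", mux(2, x, y));     /* prints 1, the value of y[2] */
        return 0;
    }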
A Divide-and-Conquer Multiplexer
The circuit of Fig. 13.22 has maximum fan-in 4, which is generally acceptable.
However, as d gets larger, the fan-in of the OR-gate, which is 2^d , grows unaccept-
ably. Even the AND-gates, with d + 1 inputs each, begin to have uncomfortably
large fan-in. Fortunately, there is a divide-and-conquer approach based on splitting
the control bits in half, that allows us to build the circuit with gates of fan-in at
most 2. Moreover, this circuit uses many fewer gates and is almost as fast as the
generalization of Fig. 13.22, provided we require that all circuits be built of gates
with the same limit on fan-in.
An inductive construction of a family of multiplexer circuits follows. We call
the circuit for a multiplexer with d control inputs and 2^d data inputs a d-MUX.
BASIS. The basis is a multiplexer circuit for d = 1, that is, a 1-MUX, which we
show in Fig. 13.23. It consists of four gates, and the fan-in is limited to 2.
INDUCTION. The induction is performed by the circuit in Fig. 13.24, which con-
structs a 2d-MUX from 2^d + 1 copies of the d-MUX. Notice that while we double the
number of control inputs, we square the number of data inputs, since 2^(2d) = (2^d )^2 .
Fig. 13.22. The simple multiplexer circuit for d = 2, with data inputs y0 , y1 , y2 , y3 and control inputs x1 , x2 .
Suppose that the control inputs to the 2d-MUX call for data input yi ; that is,
i = (x1 x2 · · · x2d )2
Each d-MUX in the top row of Fig. 13.24 takes a group of 2^d data inputs, starting
with some yj , where j is a multiple of 2^d . Thus, if we use the low-order d control
bits, xd+1 , . . . , x2d , to control each of these d-MUX’s, the selected input is the kth
from each group (counting the leftmost in each group as input 0), where

k = (xd+1 · · · x2d )2
That is, k is the integer represented by the low-order half of the bits.
The data inputs to the bottom d-MUX are the outputs of the top row of
d-MUX’s, which we just discovered are yk , y(2^d +k) , y(2×2^d +k) , . . . , y((2^d −1)2^d +k) . The
bottom d-MUX is controlled by the high-order d control bits, x1 , . . . , xd , and so it
selects the output of the d-MUX whose group number is (x1 · · · xd )2 ; that output is
exactly yi .
Fig. 13.23. The basis: a 1-MUX built from four gates, with fan-in at most 2.

Fig. 13.24. A 2d-MUX built from 2^d + 1 copies of the d-MUX: a top row of 2^d d-MUX’s controlled by the low-order control bits xd+1 , . . . , x2d , feeding one bottom d-MUX controlled by x1 , . . . , xd , whose output is y(x1 x2 ···x2d )2 .
DELAY

 d    Divide-and-conquer MUX    Simple MUX
 1              3                    3
 2              5                    5
 4              9                    8
 8             17                   13
16             33                   22
Gate Count
In this section we compare the number of gates between the simple MUX and
the divide-and-conquer MUX. We shall see that the divide-and-conquer MUX has
strikingly fewer gates as d increases.
To count the number of gates in the divide-and-conquer MUX, we can tem-
porarily ignore the inverters. We know that each of the d control inputs is inverted
once, so that we can just add d to the count at the end. Let G(d) be the number of
gates (excluding inverters) used in the d-MUX. Then we can develop a recurrence
for G as follows:
BASIS. For the basis case, d = 1, there are three gates in the circuit of Fig. 13.23,
excluding the inverter. Thus, G(1) = 3.
INDUCTION. For the induction, the 2d-MUX in Fig. 13.24 is built entirely from
2^d + 1 copies of the d-MUX, so that G(2d) = (2^d + 1)G(d).
The first few values of the recurrence are G(2) = 9, G(4) = 45, and G(8) = 765.
Now consider the number of gates used in the simple MUX, converted to use
only gates of fan-in 2. As before, we shall ignore the d inverters needed for the
control inputs. The final OR-gate is replaced by a tree of 2^d − 1 OR-gates. Each
of the 2^d AND-gates is replaced by a tree of d AND-gates. Thus, the total number
of gates is 2^d (d + 1) − 1. This function is greater than the number of gates for
the divide-and-conquer MUX, approximately by the ratio (d + 1)/3. Figure 13.26
compares the gate counts (excluding the d inverters in each case) for the two kinds
of MUX.
GATE COUNT

 d    Divide-and-conquer MUX    Simple MUX
 1              3                      3
 2              9                     11
 4             45                     79
 8            765                   2303
16        196,605              1,114,111

Fig. 13.26. Gate counts (excluding inverters) for the two kinds of MUX.
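Both tables can be regenerated from the recurrences above. In the C sketch below, the divide-and-conquer delay recurrence D(2d) = 2D(d) − 1 and the simple-MUX delay formula 1 + ⌈log2 (d + 1)⌉ + d are our own readings of the constructions (they match every entry of the two tables), so treat them as stated assumptions.

    #include <stdio.h>

    int main(void)
    {
        long gDC = 3, dDC = 3;                 /* the 1-MUX: 3 gates, delay 3 */
        for (int d = 1; d <= 16; d *= 2) {
            /* Simple MUX with 2-input gates: 2^d AND-trees of d gates each,
               plus 2^d - 1 OR-gates; delay is one inverter level, then
               ceil(log2(d+1)) AND levels, then d OR levels. */
            long gSimple = ((1L << d) * (d + 1)) - 1;
            int lev = 0;
            while ((1 << lev) < d + 1) lev++;
            long dSimple = 1 + lev + d;
            printf("d = %2d: d-and-c %7ld gates, delay %2ld; simple %9ld gates, delay %2ld\n",
                   d, gDC, dDC, gSimple, dSimple);
            gDC = ((1L << d) + 1) * gDC;       /* G(2d) = (2^d + 1) G(d)   */
            dDC = 2 * dDC - 1;                 /* assumed delay recurrence */
        }
        return 0;
    }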
EXERCISES
13.7.1: Using the divide-and-conquer technique of this section, construct a
a) 2-MUX
b) 3-MUX
13.7.2*: How would you construct a multiplexer for which the number of data
inputs is not a power of two?
13.7.3*: Use the divide-and-conquer technique to design a one-hot-decoder. This
circuit takes d inputs, x1 , x2 , . . . , xd and has 2^d outputs y0 , y1 , . . . , y(2^d −1) . Exactly
one of the outputs will be 1, specifically that yi such that i = (x1 x2 · · · xd )2 .
What is the delay of your circuit as a function of d? How many gates does it use as
a function of d? Hint : There are several approaches. One is to design the circuit
for d by taking a one-hot-decoder for the first d − 1 inputs and splitting each output
of that decoder into two outputs based on the last input, xd . A second is to assume
d is a power of 2 and start with two one-hot-decoders, one for the first d/2 inputs
and the other for the last d/2 inputs. Then combine the outputs of these decoders
appropriately.
13.7.4*: How does your circuit for Exercise 13.7.3 compare, in delay and number
of gates, with the obvious one-hot-decoder formed by creating one AND-gate for each
output and feeding to that gate the appropriate inputs and inverted inputs? How
does the circuit of Exercise 13.7.3 compare with your circuit of this exercise if you
replace AND-gates with large fan-in by trees of 2-input gates?
13.7.5*: A majority circuit takes 2d − 1 inputs and has a single output. Its output
is 1 if d or more of the inputs are 1. Design a divide-and-conquer majority circuit.
What are its delay and gate count as a function of d? Hint : Like the adder of
Section 13.6, this problem is best solved by a circuit that computes more than we
need. In particular, we can design a circuit that takes n inputs and has n + 1
outputs, y0 , y1 , . . . , yn . Output yi is 1 if exactly i of the inputs are 1. We can then
construct the majority circuit inductively by either of the two approaches suggested
in Exercise 13.7.3.
13.7.6*: There is a naive majority circuit that is constructed by having one AND
gate for every set of d inputs. The output of the majority circuit is the OR of all
these AND-gates. How do the delay and gate count of the naive circuit compare with
that of the divide-and-conquer circuit of Exercise 13.7.5? What if the gates of the
naive circuit are replaced by 2-input gates?
✦
✦ ✦
✦
13.8 Memory Elements
Before leaving the topic of logic circuits, let us consider a very important type
of circuit that is sequential rather than combinational. A memory element is a
collection of gates that can remember its last input and produce that input at its
output, no matter how long ago that input was given. The main memory of the
computer consists of bits that can be stored into and that will hold their value until
another value is stored.
Fig. 13.27. A memory element. Inverter a negates load; AND-gate b combines the output of a with the circuit output; AND-gate c combines load with in; and OR-gate d combines the outputs of b and c to produce the circuit output out.

Consider first what happens when load = 0. The output of inverter a is then 1,
so AND-gate b passes the current circuit output, while the output of AND-gate c
is 0. Thus OR-gate d reproduces the output of b, and the circuit output circulates
unchanged around the loop through gates b and d.
Now consider what happens when load = 1. The output of inverter a is now
0, so that the output of AND-gate b will be 0 as well. On the other hand, the first
input to AND-gate c is 1, so that the output of c will be whatever the input in is.
Likewise, as the first input to OR-gate d is 0, the output of d will be the same as the
output of c, which in turn is the same as circuit input in. Thus, setting load to 1
causes the circuit output to become whatever in is. When we change load back to
0, that circuit output continues to circulate between gates b and d, as discussed.
We conclude that the circuit of Fig. 13.27 behaves like a memory element, if
we interpret “circuit input” as meaning whatever value in has at a time when load
is 1. If load is zero, then we say there is no circuit input, regardless of the value of
in. By setting load to 1, we can cause the memory element to accept a new value.
The element will hold that value as long as load is 0, that is, as long as there is no
new circuit input.
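Although the circuit is sequential, its behavior can be simulated by letting the gate outputs settle one step at a time. A minimal C sketch, with gate names following Fig. 13.27:

    #include <stdio.h>

    static int out = 0;       /* the value circulating between gates b and d */

    /* One settling step of the circuit of Fig. 13.27. */
    void step(int load, int in)
    {
        int a = !load;        /* inverter a                       */
        int b = a && out;     /* AND-gate b: holds the old value  */
        int c = load && in;   /* AND-gate c: admits the new value */
        out = b || c;         /* OR-gate d: the circuit output    */
    }

    int main(void)
    {
        step(1, 1); printf("store 1: out = %d\n", out);
        step(0, 0); printf("hold:    out = %d\n", out);   /* in ignored; out stays 1 */
        step(1, 0); printf("store 0: out = %d\n", out);
        step(0, 1); printf("hold:    out = %d\n", out);   /* out stays 0 */
        return 0;
    }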
EXERCISES
13.8.1: Draw a timing diagram similar to that in Fig. 13.6 for the memory-element
circuit shown in Fig. 13.27.
13.8.2: Describe what happens to the behavior of the memory element shown in
Fig. 13.27 if an alpha particle hits the inverter and for a short time (but enough
time for signals to propagate around the circuit) causes the output of gate a to be
the same as its input.
✦
✦ ✦
✦
13.9 Summary of Chapter 13
After reading this chapter, the reader should have more familiarity with the circuitry
in a computer and how logic can be used to help design this circuitry. In particular,
the following points were covered:
✦ What gates are and how they are combined to form circuits
✦ The difference between a combinational circuit and a sequential circuit
✦ How combinational circuits can be designed from logical expressions, and how
logical expressions can be used to model combinational circuits
✦ How algorithm-design techniques such as divide-and-conquer can be used to
design circuits such as adders and multiplexers
✦ Some of the factors that go into the design of fast circuits
✦ An indication of how a computer stores bits in its electronic circuitry
✦
✦ ✦
✦
13.10 Bibliographic Notes for Chapter 13
Shannon [1938] was the first to observe that Boolean algebra can be used to describe
the behavior of combinational circuits. For a more comprehensive treatment of the
theory and design of combinational circuits, see Friedman and Menon [1975].
Mead and Conway [1980] describe techniques used to construct very large scale
integrated circuits. Hennessy and Patterson [1990] discuss computer architecture
and the techniques for organizing its circuit elements.
Friedman, A. D., and P. R. Menon [1975]. Theory and Design of Switching Circuits,
Computer Science Press, New York.
Hennessy, J. L., and D. A. Patterson [1990]. Computer Architecture: A Quantita-
tive Approach, Morgan Kaufmann, San Mateo, Calif.
Mead, C., and L. Conway [1980]. Introduction to VLSI Systems, Addison-Wesley,
Reading, Mass.
Shannon, C. E. [1938]. “Symbolic analysis of relay and switching circuits,” Trans.
of AIEE 57, pp. 713–723.
CHAPTER 14

✦
✦ ✦
✦

Predicate Logic
✦
✦ ✦
✦
14.1 What This Chapter Is About
We introduce predicates in Section 14.2. As we shall see, predicates provide much
greater power to express ideas formally than do propositional variables. Much of
the development of predicate logic parallels that of propositional logic in Chapter
12, although there are important differences.
✦ Expressions of predicate logic can be built from predicates using the operators
of propositional logic (Section 14.3).
✦ “Quantifiers” are operators of predicate logic that have no counterpart in
propositional logic (Section 14.4). We can use quantifiers to state that an
expression is true for all values of some argument or that there exists at least
one value of the argument that makes the expression true.
✦ Tautologies of predicate logic are expressions that are true for all interpreta-
tions. Some tautologies of predicate logic are analogs of tautologies for propo-
sitional logic (Section 14.6), while others are not (Section 14.7).
✦ Turing’s theorem tells us that there are problems we can state but
cannot solve by any computer. An example is whether or not a given C program
goes into an infinite loop on certain inputs.
✦
✦ ✦
✦
14.2 Predicates
If we define the proposition uMary to mean that Mary takes her umbrella, and
wMary to mean that Mary gets wet, then we have the similar set of hypotheses
r → uMary , uMary → w̄Mary , and r̄ → w̄Mary
Atomic Formulas
An atomic formula is a predicate with zero or more arguments. For example, u(X)
is an atomic formula with predicate u and one argument, here occupied by the
variable X. In general, an argument is either a variable or a constant.1 While, in
principle, we must allow any sort of value for a constant, we shall usually imagine
that values are integers, reals, or character strings.
Variables are symbols capable of taking on any constant as value. We should
not confuse “variables” in this sense with “propositional variables,” as used in Chap-
ter 12. In fact, a propositional variable is equivalent to a predicate with no argu-
ments, and we shall write p for an atomic formula with predicate name p and zero
arguments.
An atomic formula all of whose arguments are constants is called a ground
atomic formula. Nonground atomic formulas can have constants or variables as
arguments, but at least one argument must be a variable. Note that any proposition,
being an atomic formula with no arguments, has “all arguments constant,” and is
therefore a ground atomic formula.
1 Predicate logic also allows arguments that are more complicated expressions than single
variables or constants. These are important for certain purposes that we do not discuss in
this book. Therefore, in this chapter we shall only see variables and constants as arguments
of predicates.
✦ Example 14.1. We might invent a predicate name csg to represent the in-
formation contained in the Course-Student-Grade relation discussed in Section 8.2.
The atomic formula csg(C, S, G) can be thought of as saying, of variables C, S,
and G, that student S took course C and got grade G. Put another way, when we
substitute constants c for C, s for S, and g for G, the value of csg(c, s, g) is TRUE if
and only if student s took course c and got grade g.
We can also express the particular facts (i.e., tuples) in the relation as ground
atomic formulas, by using constants as arguments. For instance, the first tuple of
Fig. 8.1 could be expressed as csg(“CS101”, 12345, “A”), asserting that the student
with ID 12345 got an A in CS101. Finally, we can mix constants and variables as
arguments, so that we might see an atomic formula like csg(“CS101”, S, G). This
atomic formula is true if variables S and G take on any pair of values (s, g) such
that s is a student who took course CS101 and got grade g, and is false otherwise. ✦
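The correspondence between predicates and relations can be made concrete in C. In the following sketch (ours, not part of the original development), csg is a function that scans a table holding the tuples of Fig. 8.1, returning 1 exactly for those ground atomic formulas that are facts:

    #include <stdio.h>
    #include <string.h>

    struct CSG { const char *course; int student; const char *grade; };

    /* The tuples of the Course-Student-Grade relation of Fig. 8.1. */
    struct CSG facts[] = {
        {"CS101", 12345, "A"},  {"CS101", 67890, "B"},
        {"EE200", 12345, "C"},  {"EE200", 22222, "B+"},
        {"CS101", 33333, "A-"}, {"PH100", 67890, "C+"},
    };
    int nFacts = sizeof facts / sizeof facts[0];

    /* csg(c, s, g) is 1 (TRUE) iff csg(c, s, g) is one of the facts. */
    int csg(const char *c, int s, const char *g)
    {
        int i;
        for (i = 0; i < nFacts; i++)
            if (strcmp(facts[i].course, c) == 0 && facts[i].student == s &&
                strcmp(facts[i].grade, g) == 0)
                return 1;
        return 0;
    }

    int main()
    {
        printf("%d\n", csg("CS101", 12345, "A"));  /* prints 1 */
        printf("%d\n", csg("CS101", 12345, "B"));  /* prints 0 */
        return 0;
    }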
EXERCISES
14.2.1: Identify the following as constants, variables, ground atomic formulas, or
nonground atomic formulas, using the conventions of this section.
a) CS205
b) cs205
c) 205
d) “cs205”
e) p(X, x)
f) p(3, 4, 5)
g) “p(3, 4, 5)”
✦
✦ ✦
✦
14.3 Logical Expressions
The notions that we used in Chapter 12 for propositional logic — literals, logical
expressions, clauses, and so on — carry over to predicate logic. In the next section
we introduce two additional operators to form logical expressions. However, the
basic idea behind the construction of logical expressions remains essentially the
same in both propositional and predicate logic.
Literals
A literal is either an atomic formula or its negation. If there are no variables among
the arguments of the atomic formula, then the literal is a ground literal.
2 Constants are often called “atoms” in logic. Unfortunately, what we have referred to as
“atomic formulas” are also called “atoms” at times. We shall generally avoid the term
“atom.”
As for propositional logic, we can use an overbar in place of the NOT operator.
However, the bars become confusing to read when applied to a long expression, and
we shall see NOT used more frequently in this chapter than in Chapter 12.
Logical Expressions
We can build expressions from atomic formulas just as we built expressions in
Section 12.3 from propositional variables. We shall continue to use the operators
AND, OR, NOT, →, and ≡, as well as other logical connectives discussed in Chapter
12. In the next section, we introduce “quantifiers,” operators that can be used to
construct expressions in predicate logic but have no counterpart in propositional
logic.
As with the bar shorthand for NOT, we can continue to use the shorthands of
juxtaposition (no operator) for AND and + for OR. However, we use these shorthands
infrequently because they tend to make the longer expressions of predicate logic hard
to understand.
The following example should give the reader some insight into the meaning
of logical expressions. However, note that this discussion is a considerable oversim-
plification, and we shall have to wait until Section 14.5 to discuss “interpretations”
and the meaning that they impart to logical expressions in predicate logic.
✦ Example 14.3. Suppose that we have predicates csg and snap, which we
interpret as the relations Course-Student-Grade and Student-Name-Address-Phone
that were introduced in Chapter 8. Suppose also that we want to find the grade
of the student named “C. Brown” in course CS101. We could assert the logical
expression
csg(“CS101”, S, G) AND snap(S, “C. Brown”, A, P ) → answer(G) (14.1)
Here, answer is another predicate, intended to be true of a grade G if G is the
grade of some student named “C. Brown” in CS101.
When we “assert” an expression, we mean that its value is TRUE no matter
what values we substitute for its variables. Informally, an expression such as (14.1)
can be interpreted as follows. If we substitute a constant for each of the variables,
then each of the atomic formulas becomes a ground atomic formula. We can decide
whether a ground atomic formula is true or false by referring either to the “real
world,” or by looking it up in a relation that lists the true ground atomic formulas
with a given predicate. When we substitute 0 or 1 for each of the ground atomic
formulas, we can evaluate the expression itself, just as we did for propositional logic
expressions in Chapter 12.
In the case of expression (14.1), we can take the tuples in Fig. 8.1 and 8.2(a)
to be true. In particular,
csg(“CS101”, 12345, “A”)
and
snap(12345, “C. Brown”, “12 Apple St.”, “555-1234”)
are true. Then we can let
S = 12345
G = “A”
A = “12 Apple St.”
P = “555-1234”
That makes the left side of (14.1) become 1 AND 1, which has the value 1, of course.
In principle, we don’t know anything about the predicate answer. However, we
asserted (14.1), which means that whatever values we substitute for its variables,
its value is TRUE. Since its left side is made TRUE by the above substitution, the
right side cannot be FALSE. Thus we deduce that answer(“A”) is true. ✦
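To see the computational content of asserting (14.1), imagine the csg and snap predicates stored as tables. Finding every G for which answer(G) must be true is then a nested-loop search, as in the following C sketch of ours (only a few of the tuples from Figs. 8.1 and 8.2(a) are included):

    #include <stdio.h>
    #include <string.h>

    struct CSG  { const char *c; int s; const char *g; };
    struct SNAP { int s; const char *n; const char *a; const char *p; };

    struct CSG csg[] = {
        {"CS101", 12345, "A"}, {"CS101", 67890, "B"}, {"EE200", 12345, "C"},
    };
    struct SNAP snap[] = {
        {12345, "C.Brown",   "12 Apple St.", "555-1234"},
        {67890, "L.VanPelt", "34 Pear Ave.", "555-5678"},
    };

    int main()
    {
        int i, j;
        /* Whenever a substitution makes the left side of (14.1) true,
           the right side, answer(G), cannot be false. */
        for (i = 0; i < 3; i++)
            for (j = 0; j < 2; j++)
                if (strcmp(csg[i].c, "CS101") == 0 &&  /* csg("CS101", S, G) */
                    csg[i].s == snap[j].s &&           /* the same S in both */
                    strcmp(snap[j].n, "C.Brown") == 0) /* snap(S, "C.Brown", A, P) */
                    printf("answer(%s)\n", csg[i].g);
        return 0;   /* prints answer(A) */
    }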
Other Terminology
We shall use other terms associated with propositional logic as well. In general,
when in Chapter 12 we spoke of propositional variables, in this chapter we speak of
any atomic formula, including a predicate with zero arguments (i.e., a propositional
variable) as a special case. For example, a clause is a collection of literals, connected
by OR’s. Similarly, an expression is said to be in product-of-sums form if it is the
AND of clauses. We may also speak of sum-of-products form, where the expression
is the OR of terms and each such term is the AND of literals.
EXERCISES
14.3.1: Write an expression similar to (14.1) for the question “What grade did
L. Van Pelt get in PH100?” For what value of its argument is answer definitely
true, assuming the facts of Figs. 8.1 and 8.2? What substitution for variables did
you make to demonstrate the truth of this answer?
14.3.2: Let cdh be a predicate that stands for the Course-Day-Hour relation of
Fig. 8.2(c), and cr a predicate for the Course-Room relation of Fig. 8.2(d). Write
an expression similar to (14.1) for the question “Where is C. Brown 9AM Monday
morning?” (More precisely, in what room does the course C. Brown is taking on
Monday at 9AM meet?) For what value of its argument is answer definitely true,
assuming the facts of Figs. 8.1 and 8.2? What substitution for variables did you
make to demonstrate the truth of this answer?
14.3.3**: Each of the operations of relational algebra discussed in Section 8.7 can
be expressed in predicate logic, using an expression like (14.1). For example, (14.1)
itself is the equivalent of the relational algebra expression
πGrade (σCourse=“CS101” AND Name=“C.Brown” (CSG ⊲⊳ SNAP ))
Show how the effect of each of the operations selection, projection, join, union, in-
tersection, and difference can be expressed in predicate logic in the form “expression
implies answer.” Then translate each of the relational algebra expressions found in
the examples of Section 8.7 into logic.
✦
✦ ✦
✦
14.4 Quantifiers
Let us return to our example involving the zero-argument predicate r (“It is rain-
ing”) and the one-argument predicates u(X) (“X takes his umbrella”) and w(X)
(“X gets wet”). We might wish to assert that “If it rains, then somebody gets
wet.” Perhaps we could try
r → w(“Joe”) OR w(“Sally”) OR w(“Sue”) OR w(“Sam”) OR · · ·
However, there are two problems with this approach:
1. The set of individuals may be very large, or even infinite, so that we could not write the required expression at all.
2. We don’t know the complete set of individuals about whom we are speaking.
What we need instead is a symbol ∃ (read “there exists”) that lets us construct the OR of the collection of expressions formed from a given expression by substituting all possible values for a given variable; (∃X)w(X) then says “somebody gets wet.” Dually, we need the symbol ∀ (read “for all”) to let us construct the AND of such a collection. We write (∀X)u(X) in this example to mean “for all
X, X takes his or her umbrella.” In general, for any logical expression E, the
expression (∀X)(E) means that for all possible values of the other variables of E,
every constant we may substitute for X makes E true.
The symbols ∀ and ∃ are called quantifiers. We sometimes call ∀ the universal
quantifier and ∃ the existential quantifier.
✦ Example 14.4. The expression r → (∀X)(u(X) OR w(X)) means “If it rains,
then for all individuals X, either X takes an umbrella or X gets wet.” Note that
quantifiers can apply to arbitrary expressions, not just to atomic formulas as was
the case in previous examples.
For another example, we can interpret the expression
(∀C)((∃S)csg(C, S, “A”) → (∃T )csg(C, T, “B”)) (14.2)
3 The parentheses around the E are sometimes needed and sometimes not, depending on
the expression. The matter will be clear when we discuss precedence and associativity of
operators later in the section. The parentheses around ∃X are part of the notation and are
invariably required.
as saying, “For all courses C, if there exists a student S who gets an A in the course,
then there must exist a student T who gets a B.” Less formally, “If you give A’s,
then you also have to give B’s.”
A third example expression is
((∀X) NOT w(X)) OR (∃Y )w(Y ) (14.3)
Informally, “Either all individuals X stay dry or at least one individual Y gets
wet.” Expression (14.3) is different from the other two in this example, in that
here we have a tautology — that is, an expression which is true, regardless of the
meaning of predicate w. The truth of (14.3) has nothing to do with properties of
“wetness.” No matter what the set S of values that make predicate w true is, either
S is empty (i.e., for all X, w(X) is false) or S is not empty (i.e., there exists a Y
for which w(Y ) is true). ✦
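When the set of values for a variable is finite, the quantifiers have a simple procedural reading: (∀X)E is an AND over all values for X, and (∃X)E is an OR. The C sketch below (our own; the predicate w is an arbitrary stand-in) makes this reading explicit for the domain 0, 1, . . . , n − 1:

    #include <stdio.h>

    int w(int x) { return x % 2 == 0; }   /* an example predicate: "x is even" */

    /* (forall X)w(X): the AND of w(0), w(1), ..., w(n-1). */
    int forallW(int n)
    {
        int x;
        for (x = 0; x < n; x++)
            if (!w(x)) return 0;          /* one counterexample falsifies it */
        return 1;
    }

    /* (exists X)w(X): the OR of w(0), w(1), ..., w(n-1). */
    int existsW(int n)
    {
        int x;
        for (x = 0; x < n; x++)
            if (w(x)) return 1;           /* one witness verifies it */
        return 0;
    }

    int main()
    {
        printf("forall: %d, exists: %d\n", forallW(5), existsW(5));
        return 0;   /* prints forall: 0, exists: 1 */
    }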
Precedence of Operators
In general, we need to put parentheses around all uses of expressions E and F .
However, as with the other algebras we have encountered, it is often possible to
remove parentheses because of the precedence of operators. We continue to use the
precedence of operators defined in Section 12.4, NOT (highest), AND, OR, →, and ≡
(lowest). However, quantifiers have highest precedence of all.
Order of Quantifiers
A common logical mistake is to confuse the order of quantifiers — for example, to
think that (∀X)(∃Y ) means the same as (∃Y )(∀X), which it does not. For example,
if we informally interpret loves(X, Y ) as “X loves Y ,” then (∀X)(∃Y )loves(X, Y )
means “Everybody loves somebody,” that is, for every individual X there is at least
one individual Y that X loves. On the other hand, (∃Y )(∀X)loves(X, Y ) means
that there is some individual Y who is loved by everyone — a very fortunate Y , if
such a person exists.
Note that the parentheses around the quantifiers (∀X) and (∃X) are not used
for grouping, and should be regarded as part of the symbol indicating a quantifier.
Also, remember that the quantifiers and NOT are unary, prefix operators, and the
only sensible way to group them is from the right.
✦ Example 14.6. The expression (∀X) NOT (∃Y )p(X, Y ) is grouped
(∀X)(NOT ((∃Y )p(X, Y )))
and means “For all X there is no Y such that p(X, Y ) is true.” Put another way,
there is no pair of values for X and Y that makes p(X, Y ) true. ✦
int X;                /* a global variable named X */
···
main()
{
    ···
    ++X;              /* this X is the global X */
    ···
}

void f()
{
    int X;            /* a local X, which hides the global X */
    ···
}

Fig. 14.1. The name X used in two different declarations of a C program.
Fig. 14.2. Expression tree for expression (14.4), (∀X)u(X) OR (∀X)w(X).
Note we could have used different variables for the two “declarations” of X
in (14.4), perhaps writing (∀X)u(X) OR (∀Y )w(Y ). In general, we can always
rename variables of a predicate logic expression so no one variable appears in two
quantifiers. The situation is analogous to a programming language such as C, in
which we can rename variables of a program, so that the same name is not used
in two declarations. For example, in Fig. 14.1 we could change all instances of the
variable name X in the function f to any new variable name Y.
Consider now the expression
(∀X)(u(X) OR (∃X)w(X))
Informally, “For each individual, either that individual takes his umbrella, or there
exists some (perhaps other) individual who gets wet.” The tree for this expression
is shown in Fig. 14.3. Notice that the use of X within w refers to the closest
enclosing “declaration” of X, which is the existential quantifier. Put another way,
if we travel up the tree from w(X), we meet the existential quantifier before we
meet the universal quantifier. However, the use of X within u is not in the “scope”
of the existential quantifier. If we proceed upward from u(X), we first meet the
universal quantifier. We could rewrite the expression as
(∀X)(u(X) OR (∃Y )w(Y ))
so no variable is quantified more than once. ✦
Fig. 14.3. Expression tree for (∀X)(u(X) OR (∃X)w(X)).
EXERCISES
b) (∃X) NOT p(X) AND (∃Y )(p(Y ) OR (∃X)q(X, Z))
14.4.2: Draw expression trees for the expressions of Exercise 14.4.1. Indicate for
each occurrence of a variable to which quantifier, if any, it is bound.
14.4.3: Rewrite the expression of Exercise 14.4.1(b) so that it does not quantify
the same variable twice.
14.4.5*: Using the csg predicate of our running example, write expressions that
assert the following.
14.4.6*: Design a grammar that describes the legal expressions of predicate logic.
You may use symbolic terminals like constant and variable, and you need not avoid
redundant parentheses.
✦
✦ ✦
✦
14.5 Interpretations
Until now, we have been rather vague about what an expression of predicate logic
“means,” or how we ascribe a meaning to an expression. We shall approach the
subject by first recalling the “meaning” of a propositional logic expression E. That
meaning is a function that takes a “truth assignment” (assignment of truth values
0 and 1 to the propositional variables in E) as its argument and produces 0 or 1
as its result. The result is determined by evaluating E with the atomic operands
replaced by 0 or 1, according to the given truth assignment. Put another way, the
meaning of a logical expression E is a truth table, which gives the value of E (0 or
1) for each truth assignment.
A truth assignment, in turn, is a function that takes propositional variables as
arguments and returns 0 or 1 for each. Alternatively, we can see a truth assignment
as a table that gives, for each propositional variable, a truth value, 0 or 1. Figure
14.5 suggests the role of these two kinds of functions.
Fig. 14.5. (a) A truth assignment is a function from propositional variables to
truth values 0 and 1. (b) The meaning of an expression is a function from truth
assignments to truth values.
Fig. 14.6. (a) An interpretation for a predicate p is a function from values for
its arguments to 0 or 1. (b) An interpretation supplies an interpretation for each
predicate p and a value for each free variable X. (c) The meaning of an expression
is a function from interpretations to truth values.
Consider the expression
p(X, Y ) → (∃Z)(p(X, Z) AND p(Z, Y )) (14.5)
One possible interpretation, I1 , for the predicate p is:
1. D is the set of real numbers.
2. p(U, V ) is true whenever U < V . That is, the interpretation of p is the relation
consisting of the infinite set of pairs (U, V ) such that U and V are real numbers,
and U is less than V .
Then (14.5) states that for any real numbers X and Y , if X < Y , then there is
some Z lying strictly between X and Y ; that is, X < Z < Y . For interpretation I1 ,
(14.5) is always true. If X < Y , we can pick Z = (X + Y )/2 — that is, the average
of X and Y — and we can then be sure that X < Z and Z < Y . If X ≥ Y , then
the left-hand side of the implication is false, so surely (14.5) is true.
We can build an infinite number of interpretations for (14.5) based on the
interpretation I1 for predicate p, by picking any real numbers for the free variables
X and Y . By what we just said, any of these interpretations for (14.5) will make
(14.5) true.
A second possible interpretation, I2 , for p is
1. D is the set of integers.
2. p(U, V ) is true if and only if U < V .
Now, we claim that (14.5) is true unless Y = X + 1. For if Y exceeds X by two or
more, then Z can be selected to be X + 1. It will then be the case that X < Z < Y .
If Y ≤ X, then p(X, Y ) is false, and so (14.5) is again true. However, if Y = X + 1,
then p(X, Y ) is true, but there is no integer Z lying strictly between X and Y .
Thus for every integer Z, either p(X, Z) or p(Z, Y ) will be false,
and the right-hand
side of the implication — that is, (∃Z) p(X, Z) AND p(Z, Y ) — is not true.
We can extend I2 to an interpretation for (14.5) by assigning integers to the
free variables X and Y . The analysis above shows that (14.5) will be true for any
such interpretation, except for those in which Y = X + 1.
Our third interpretation for p, I3 , is abstract, without a common meaning in
mathematics like those possessed by interpretations I1 and I2 :
1. D is the set of three symbols a, b, c.
2. p(U, V ) is true if U V is one of the six pairs
aa, ab, ba, bc, cb, cc
and false for the other three pairs: ac, bb, and ca.
Then it happens that (14.5) is true for each of the nine pairs XY . In each case,
either p(X, Y ) is false, or there is a Z that makes the right side of (14.5) true. The
nine cases are enumerated in Fig. 14.7. We may extend I3 to an interpretation for
(14.5) in nine ways, by assigning any combination of a, b, and c to the free variables
X and Y . Each of these interpretations imparts the value true to (14.5). ✦
Meaning of Expressions
Recall that the meaning of an expression in propositional logic is a function from
the truth assignments to truth values 0 and 1, as was illustrated in Fig. 14.5(b).
That is, a truth assignment states all that there is to know about the values of
the atomic operands of the expression, and the expression then evaluates to 0 or 1.
Similarly, in predicate logic, the meaning of an expression is a function that takes an
interpretation, which is what we need to evaluate the atomic operands, and returns
0 or 1. This notion of meaning was illustrated in Fig. 14.6(c).
X    Y    Why (14.5) is true
a    a    Z = a or b
a    b    Z = a
a    c    p(a, c) false
b    a    Z = a
b    b    p(b, b) false
b    c    Z = c
c    a    p(c, a) false
c    b    Z = c
c    c    Z = b or c

Fig. 14.7. Why (14.5) is true under interpretation I3 , for each pair of values for X and Y .
✦ Example 14.11. Consider the expression (14.5) from Example 14.10. The free
variables of (14.5) are X and Y . If we are given interpretation I1 of Example 14.10
for p (p is < on reals), and we are given values X = 3.14 and Y = 3.5, then the
value of (14.5) is 1. In fact, with interpretation I1 for p and any values for X and
Y , the expression has value 1, as was discussed in Example 14.10. The same is true
of interpretation I3 for p: any values for X and Y chosen from the domain {a, b, c}
give (14.5) the value 1.
On the other hand, if we are given interpretation I2 (p is < on integers) and
values X = 3 and Y = 4, then (14.5) has value 0 as we discussed in Example 14.10.
If we have interpretation I2 and values X = 3 and Y = 5 for the free variables, then
(14.5) has the value 1. ✦
5 Strictly speaking, we must throw away from I the interpretation for any predicate p that
appears in E but not E1 . Also, we must drop the value for any free variable that appears in
E but not E1 . However, there is no conceptual difficulty if we include in an interpretation
additional information that is not used.
✦ Example 14.12. Let us evaluate expression (14.5) with the interpretation I2 for
p (< on the integers) and the values 3 and 7 for free variables X and Y , respectively.
The expression tree for (14.5) is shown in Fig. 14.8. We observe that the operator
at the root is →. We did not cover this case explicitly, but the principle should
be clear. The entire expression can be written as E1 → E2 , where E1 is p(X, Y ),
and E2 is (∃Z)(p(X, Z) AND p(Z, Y )). Because of the meaning of →, the entire
expression (14.5) is true except in the case that E1 is true and E2 is false.
Fig. 14.8. Expression tree for expression (14.5): an → at the root, with left
operand p(X, Y ) and right operand (∃Z) applied to p(X, Z) AND p(Z, Y ).
6 Technically, E1 might not have any free occurrences of X, even though we apply a quantifier
involving X to E1 . In that case, the quantifier may as well not be there, but we have not
prohibited its presence.
Evaluating E1 is easy: with X = 3 and Y = 7, p(X, Y ) is p(3, 7), which is true
under I2 , since 3 < 7. Evaluating E2 is more difficult. We must consider all possible
values v for Z, to see if there is at least one value that makes p(X, Z) AND p(Z, Y )
true. For example, if we try Z = 0, then
p(Z, Y ) is true, but p(X, Z) is false, since X = 3 is not less than Z.
If we think about the matter, we see that to make p(X, Z) AND p(Z, Y ) true,
we need a value v such that 3 < v [so p(X, Z) will be true] and such that v < 7
[so p(Z, Y ) will be true]. For example, v = 4 makes p(X, Z) AND p(Z, Y ) true
and therefore shows that E2 , or (∃Z)(p(X, Z) AND p(Z, Y )), is true for the given
interpretation.
We now know that both E1 and E2 are true. Since E1 → E2 is true when both
E1 and E2 are true, we conclude that (14.5) has value 1 for the interpretation in
which predicate p has the interpretation I2 , X = 3, and Y = 7. ✦
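Since the domain of interpretation I3 is finite, this style of evaluation can be carried out entirely mechanically. The C program below (our own sketch) codes the symbols a, b, and c as 0, 1, and 2, implements the existential quantifier as a loop over Z, and confirms that (14.5) has value 1 for all nine pairs of values for X and Y , as tabulated in Fig. 14.7:

    #include <stdio.h>

    const char name[] = "abc";   /* the domain of I3, coded 0, 1, 2 */

    /* p under I3: true except for the pairs ac, bb, and ca. */
    int p(int u, int v)
    {
        return !((u == 0 && v == 2) ||    /* ac */
                 (u == 1 && v == 1) ||    /* bb */
                 (u == 2 && v == 0));     /* ca */
    }

    /* Evaluate (14.5): p(X,Y) -> (exists Z)(p(X,Z) AND p(Z,Y)). */
    int expr145(int x, int y)
    {
        int z, e2 = 0;
        for (z = 0; z < 3; z++)           /* the existential quantifier */
            if (p(x, z) && p(z, y)) e2 = 1;
        return !p(x, y) || e2;            /* E1 -> E2 is NOT E1 OR E2 */
    }

    int main()
    {
        int x, y;
        for (x = 0; x < 3; x++)
            for (y = 0; y < 3; y++)       /* all nine lines print 1 */
                printf("X=%c Y=%c: %d\n", name[x], name[y], expr145(x, y));
        return 0;
    }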
EXERCISES
14.5.1: For each of the following expressions, give one interpretation that makes it
true and one interpretation that makes it false.
a) (∀X)(∃Y )(loves(X, Y ))
b) p(X) → NOT p(X)
c) (∃X)p(X) → (∀X)p(X)
d) p(X, Y ) AND p(Y, Z) → p(X, Z)
14.5.2: Explain why every interpretation makes the expression p(X) → p(X) true.
✦
✦ ✦
✦
14.6 Tautologies
Recall that in propositional logic, we call an expression a tautology if for every truth
assignment, the value of the expression is 1. The same idea holds true in predicate
logic. An expression E is called a tautology if for every interpretation of E, the
value of E is 1.
For example, in the expression q(X) OR NOT q(X), no matter what the interpre-
tation of q or the value of X, q(X) is either true or false. Thus, either the expression
becomes 1 OR NOT 1 or it becomes 0 OR NOT 0, both of which evaluate to 1.
Equivalence of Expressions
As in propositional logic, we can define two expressions, E and F , of predicate
logic to be equivalent if E ≡ F is a tautology. The “principle of substitution of
equals for equals,” also introduced in Section 12.7, continues to hold when we have
equivalent expressions of predicate logic. That is, if E1 is equivalent to E2 , then we
may substitute E2 for E1 in any expression F1 , and the resulting expression F2 will
be equivalent; that is, F1 ≡ F2 .
✦ Example 14.14. The commutative law for AND says (p AND q) ≡ (q AND p). We
might substitute p(X) for p and q(Y, Z) for q, giving us the tautology of predicate
logic
p(X) AND q(Y, Z) ≡ q(Y, Z) AND p(X)
Thus the expressions (p(X) AND q(Y, Z)) and (q(Y, Z) AND p(X)) are equivalent. Now,
if we have an expression like (p(X) AND q(Y, Z)) OR q(X, Y ), we can substitute
(q(Y, Z) AND p(X)) for (p(X) AND q(Y, Z)), to produce another expression,
(q(Y, Z) AND p(X)) OR q(X, Y )
and know that
(p(X) AND q(Y, Z)) OR q(X, Y ) ≡ (q(Y, Z) AND p(X)) OR q(X, Y )
There are more subtle cases of equivalent expressions in predicate logic. Nor-
mally, we would expect the equivalent expressions to have the same free variables
and predicates, but there are some cases in which the free variables and/or predi-
cates can be different. For example, the expression
p(X) OR NOT p(X) ≡ q(Y ) OR NOT q(Y )
is a tautology, simply because both sides of the ≡ are tautologies, as we argued
in Example 14.13. Thus in the expression p(X) OR NOT p(X) OR q(X) we may
substitute q(Y ) OR NOT q(Y ) for p(X) OR NOT p(X), to deduce the equivalence
p(X) OR NOT p(X) OR q(X) ≡ q(Y ) OR NOT q(Y ) OR q(X)
Since the left-hand side of the ≡ is a tautology, we can also infer that
q(Y ) OR NOT q(Y ) OR q(X)
is a tautology. ✦
EXERCISES
14.6.1: Explain why each of the following is a tautology. That is, what expres-
sion(s) of predicate logic did we substitute into which tautologies of propositional
logic?
a) p(X) OR q(Y ) ≡ q(Y ) OR p(X)
b) p(X, Y ) AND p(X, Y ) ≡ p(X, Y )
c) p(X) → FALSE ≡ NOT p(X)
✦
✦ ✦
✦
14.7 Tautologies Involving Quantifiers
Tautologies of predicate logic that involve quantifiers do not have direct counterparts
in propositional logic. This section explores these tautologies and shows how they
can be used to manipulate expressions. The main result of this section is that we
can convert any expression into an equivalent expression with all the quantifiers at
the beginning.
Variable Renaming
In C, we can change the name of a local variable, provided we change all uses of
that local variable consistently. Analogously, one can change the variable used in
a quantifier, provided we also change all occurrences of that variable bound to the
quantifier. Also as in C, we must be careful which new variable name we pick,
because if we choose a name that is defined outside the function in question, then
we may change the meaning of the program, thereby committing a serious error.
Bearing in mind this kind of renaming, we can consider the following type of
equivalence and conditions under which it is a tautology.
(QX)E ≡ (QY )E ′ (14.6)
where E ′ is E with all occurrences of X that are bound to the explicitly shown
quantifier (QX) replaced by Y . We claim that (14.6) is a tautology, provided no
occurrence of Y is free in E. To see why, consider any interpretation I for (QX)E (or
equivalently, for (QY )E ′ , since the free variables and predicates of either quantified
expression are the same). If I, extended by giving X the value v, makes E true,
then I with the value v for Y will make E ′ true. Conversely, if extending I by using
v for X makes E false, then extending I with v for Y makes E ′ false.
If quantifier Q is ∃, then should there be a value v for X that makes E true,
there will be a value, namely v, for Y that makes E ′ true, and conversely. If Q is
∀, then all values of X will make E true if and only if all values of Y make E ′ true.
Thus, for either quantifier, (QX)E is true under any given interpretation I if and
only if (QY )E ′ is true under the same interpretation, showing that
(QX)E ≡ (QY )E ′
is a tautology.
Consider, for instance, the expression
(∃X)p(X, Y ) OR NOT (∃X)p(X, Y ) (14.7)
which happens to be a tautology. We shall show how to rename one of the two X’s,
to form another tautology with distinct variables used in the two quantifiers.
If we let E in (14.6) be p(X, Y ), and we choose variable Z to play the role of
Y in (14.6), then we have the tautology (∃X)p(X, Y ) ≡ (∃Z)p(Z, Y ). That is,
to construct the expression E ′ we substitute Z for X in E = p(X, Y ), to obtain
p(Z, Y ). Thus we can substitute “equals for equals,” replacing the first occurrence
of (∃X)p(X, Y ) in (14.7) by (∃Z)p(Z, Y ), to obtain the expression
(∃Z)p(Z, Y ) OR NOT (∃X)p(X, Y )
This expression is equivalent to (14.7), and therefore is also a tautology.
Note that we could also replace X in the second half of (14.7) by Z; it doesn’t
matter whether or not we do so, because the two quantifiers define distinct and
unrelated variables, each of which was named X in (14.7). However, we should
understand that it is not permissible to replace either occurrence of ∃X by ∃Y ,
because Y is free in each of the subexpressions p(X, Y ).
That is, (∃X)p(X, Y ) ≡ (∃Y )p(Y, Y ) is not an instance of (14.6) that is
a tautology, because Y is free in the expression p(X, Y ). To see that it is not a
tautology, let p be interpreted as < on integers. Then for any value of the free
variable Y , say Y = 10, the expression (∃X)p(X, Y ) is true, because we can let
X = 9, for example. Yet the right side of the equivalence, (∃Y )p(Y, Y ) is false,
because no integer is strictly less than itself.
Similarly, it is not permissible to substitute (∃Y )p(Y, Y ) for the first instance
of (∃X)p(X, Y ) in (14.7). The resulting expression,
(∃Y )p(Y, Y ) OR NOT (∃X)p(X, Y ) (14.8)
can also be seen not to be a tautology. Again, let the interpretation of p be < on
the integers, and let, for instance, the value of the free variable Y be 10. Note that
in (14.8), the first two occurrences of Y , in p(Y, Y ), are bound occurrences, bound
to the quantifier (∃Y ). Only the last occurrence of Y , in p(X, Y ), is free. Then
(∃Y )p(Y, Y ) is false for this interpretation, because no value of Y is less than itself.
On the other hand, (∃X)p(X, Y ) is true when Y = 10 (or any other integer, for
that matter), and so NOT (∃X)p(X, Y ) is false. As a result, (14.8) is false for this
interpretation. ✦
Closed Expressions
An interesting consequence is that for tautologies, we can assume there are no free
variables. We can apply the preceding transformation to universally quantify one
free variable at a time. An expression with no free variables is called a closed
expression.
NOT (∀X)E ≡ (∃X)(NOT E) (14.9)
Informally, (14.9) says that E fails to be true for all X exactly when there is some
value of X that makes E false.
There is a similar tautology that lets us push a NOT inside an existential quan-
tifier.
NOT (∃X)E ≡ (∀X)(NOT E) (14.10)
Informally, there does not exist an X that makes E true exactly when E is false for
all X.
✦ Example 14.18. Let us transform the expression
(∀X)p(X) OR (∃X) NOT p(X)
so that the quantifiers are outside the expression. First, we need to rename the
variable used by one of the two quantifiers. By law (14.6), we can replace the
subexpression (∃X) NOT p(X) by (∃Y ) NOT p(Y ), giving us the tautology
(∀X)p(X) OR (∃Y ) NOT p(Y ) (14.14)
Now we can use (14.13), in its variant form where the quantifier on the left
operand of the OR is moved, to take the ∀ outside the OR. The resulting expression
is
(∀X)(p(X) OR (∃Y ) NOT p(Y )) (14.15)
Expression (14.15) differs from (14.14) in form, but not in meaning; (14.15) states
that for all values of X, at least one of the following holds:
1. p(X) is true.
2. There is some value of Y that makes p(Y ) false.
To see why (14.15) is a tautology, consider some value v for X. If the interpretation
under consideration makes p(v) true, then p(X) OR (∃Y ) NOT p(Y ) is true. If p(v)
is false, then in this interpretation, (2) must hold. In particular, when Y = v,
NOT p(Y ) is true, and so (∃Y ) NOT p(Y ) is true.
Finally, we can apply (14.13) to move ∃Y outside the OR. The expression that
results is
(∀X)(∃Y )(p(X) OR NOT p(Y ))
This expression also must be a tautology. Informally, it states that for every value
of X, there exists some value of Y that makes p(X) OR NOT p(Y ) true. To see why,
let v be a possible value of X. If p(v) is true in given interpretation I, then surely
p(X) OR NOT p(Y )
is true, regardless of Y . If p(v) is false in interpretation I, then we may pick v for
Y , and (∃Y )(p(X) OR NOT p(Y )) will be true. ✦
Prenex Form
A consequence of the laws (14.9), (14.10), (14.12), and (14.13) is that, given any
expression involving quantifiers and the logical operators AND, OR, and NOT, we can
find an equivalent expression that has all its quantifiers on the outside (at the top
of the expression tree). That is, we can find an equivalent expression of the form
(Q1 X1 )(Q2 X2 ) · · · (Qk Xk )E (14.16)
where Q1 , . . . , Qk each stand for one of the quantifiers ∀ or ∃, and the subexpression
E is quantifier-free — that is, it has no quantifiers. The expression (14.16) is said
to be in prenex form.
We can transform an expression into prenex form in two steps.
1. Rectify the expression. That is, use law (14.6) to make each of the quantifiers
refer to a distinct variable, one that appears neither in another quantifier nor
free in the expression.
2. Then, move each quantifier through NOT’s by laws (14.9) and (14.10), through
AND’s by (14.12), and through OR’s by (14.13).
✦ Example 14.19. Examples 14.17 and 14.18 were examples of this process. We
started in Example 14.17 with the expression (∀X)p(X) OR NOT ((∀X)p(X)). By
moving the second ∀ through the NOT, we obtained the expression
(∀X)p(X) OR (∃X) NOT p(X)
with which we started in Example 14.18. We then renamed the second use of X,
which we could (and should) have done initially. By moving the two quantifiers
through the OR, we obtained (∀X)(∃Y )(p(X) OR NOT p(Y )), which is in prenex
form. ✦
Note that expressions involving logical operators other than AND, OR, and NOT
can also be put in prenex form. Every logical operator can be written in terms
of AND, OR, and NOT, as we learned in Chapter 12. For example, E → F can be
replaced by NOT E OR F . If we write each logical operator in terms of AND, OR, and
NOT, then we are able to apply the transformation just outlined to find an equivalent
expression in prenex form.
Reordering Quantifiers
Our final family of tautologies is derived by noting that in applying a universal
quantifier to two variables, the order in which we write the quantifiers does not
matter. Similarly, we can write two existential quantifiers in either order. Formally,
the following are tautologies.
(∀X)(∀Y )E ≡ (∀Y )(∀X)E (14.17)
(∃X)(∃Y )E ≡ (∃Y )(∃X)E (14.18)
EXERCISES
14.7.1: Transform the following expressions into rectified expressions, that is, ex-
pressions for which no two quantifier occurrences share the same variable.
a) (∃X) NOT p(X) AND (∃Y )(p(Y ) OR (∃X)q(X, Z))
b) (∃X)((∃X)p(X) OR (∃X)q(X) OR r(X))
14.7.2: Turn the following into closed expressions by universally quantifying each
of the free variables. If necessary, rename variables so that no two quantifier occur-
rences use the same variable.
14.7.3*: Does law (14.12) imply that p(X, Y ) AND (∀X)q(X) is equivalent to
(∀X)(p(X, Y ) AND q(X))?
14.7.5*: Show how to move quantifiers through an → operator. That is, turn
the expression (Q1 X)E → (Q2 Y )F into a prenex form expression. What con-
straints on free variables in E and F do you need?
14.7.6: We can use tautologies (14.9) and (14.10) to move NOT’s inside quantifiers
as well as to move them outside. Using these laws, plus DeMorgan’s laws, we can
move all NOT’s so they apply directly to atomic formulas. Apply this transformation
to the following expressions.
a) NOT (∀X)(∃Y )p(X, Y )
b) NOT (∀X)(p(X) OR (∃Y )q(X, Y ))
✦
✦ ✦
✦
14.8 Proofs in Predicate Logic
In this chapter and the next, we shall discuss proofs in predicate logic. We do not,
however, extend the resolution method of Section 12.11 to predicate logic, although
it can be done. In fact, resolution is extremely important for many systems that use
predicate logic. The mechanics of proofs were introduced in Section 12.10. Recall
that in a proof of propositional logic we are given some expressions E1 , E2 , . . . , Ek
as hypotheses, or “axioms,” and we construct a sequence of expressions (lines) such
that each expression either
1. Is one of the given hypotheses, or
2. Follows from zero or more of the previous expressions by some rule of inference.
Rules of inference must have the property that, whenever we are allowed to add F
to the list of expressions because of the presence of F1 , F2 , . . . , Fn on the list,
(F1 AND F2 AND · · · AND Fn ) → F
is a tautology.
Proofs in predicate logic are much the same. Of course, the expressions that are
hypotheses and lines of the proof are expressions of predicate logic, not propositional
logic. Moreover, it does not make sense to have, in one expression, free variables
that bear a relationship to a free variable of the same name in another expression.
Thus we shall require that the hypotheses and lines of the proof be closed formulas.
However, we shall continue to refer to variables that are free in E as “free” in (∀∗)E.
This use of the term “free” is strictly incorrect, but is quite useful.
Expressions in Proofs
Remember that when we see an expression E in a proof, it is really short for the
expression (∀∗)E. Note that E ≡ (∀∗)E is generally not a tautology, and so we are
definitely using one expression to stand for a different expression.
It is also helpful to remember that when E appears in a proof, we are not
asserting that (∀∗)E is a tautology. Rather, we are asserting that (∀∗)E follows
from the hypotheses. That is, if E1 , E2 , . . . , En are the hypotheses, and we correctly
write proof line E, then we know
(∀∗)E1 AND (∀∗)E2 AND · · · AND (∀∗)En → (∀∗)E
is a tautology.
✦ Example 14.20. Let E be the rule
csg(“CS101”, S, G) AND snap(S, “C.Brown”, A, P ) → answer(G) (14.19)
which we shall meet again in Section 14.9, and let sub be a substitution with
sub(G) = “B” and sub(P ) = S. That is, we substitute the constant “B” for the
variable G and we substitute variable S for variable P . The variables S and A
remain unchanged. The expression sub(E) is
csg(“CS101”, S, “B”) AND snap(S, “C.Brown”, A, S) → answer(“B”) (14.20)
Informally, (14.20) says that if there is a student S who received a B in CS101, and
the student’s name is C. Brown, and the student’s phone number and student ID
are identical, then “B” is an answer.
Notice that (14.20) is a special case of the more general rule expressed by
(14.19). That is, (14.20) only infers the correct answer in the case that the grade
is B, and C. Brown, by a strange coincidence, has the same student ID and phone
number; otherwise (14.20) infers nothing. ✦
✦ Example 14.21. The expression
p(X, Y ) OR (∃Z)q(X, Z) (14.21)
has free variables X and Y , and it has bound variable Z. Recall that technically,
(14.21) stands for the closed expression (∀∗)(p(X, Y ) OR (∃Z)q(X, Z)), and that
here the (∀∗) stands for quantification over the free variables X and Y , that is,
(∀X)(∀Y )(p(X, Y ) OR (∃Z)q(X, Z))
Substitution as Special-Casing
Example 14.20 is typical, in that whenever we apply a substitution sub to an expres-
sion E, what we get is a special case of E. If sub replaces variable X by a constant
c, then the expression sub(E) only applies when X = c, and not otherwise. If
sub makes two variables become the same, then sub(E) only applies in the special
case that these two variables have the same value. Nonetheless, substitutions for
variables are often exactly what we need to make a proof, because they allow us to
apply a general rule in a special case, and they allow us to combine rules to make
additional rules. We shall study this form of proof in the next section.
Thus, for instance,
(∀X)(∀Y )(p(X, Y ) OR (∃Z)q(X, Z)) → (p(a, b) OR (∃Z)q(a, Z))
is a tautology.
One might wonder what happened to the implied quantifiers in (14.21) when
we substituted a and b for X and Y . The answer is that in the resulting expression,
p(a, b) OR (∃Z)q(a, Z), there are no free variables, and so the implied expression
(∀∗)(p(a, b) OR (∃Z)q(a, Z)) has no prefix of universal quantifiers; that is,
p(a, b) OR (∃Z)q(a, Z)
stands for itself in this case. We do not replace (∀∗) by (∀a)(∀b), which makes no
sense, since constants cannot be quantified. ✦
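Substitutions are also easy to represent concretely: a term is either a variable or a constant, and applying a substitution replaces variable arguments. The C sketch below (ours; the representation is one of many possible) applies the substitution sub(P ) = S of Example 14.20 to the argument list of snap(S, “C.Brown”, A, P ):

    #include <stdio.h>
    #include <string.h>

    /* A term is either a variable or a constant, with a name. */
    struct Term { int isVar; const char *name; };

    /* Replace each variable argument named from by the term to. */
    void substitute(struct Term *args, int nargs,
                    const char *from, struct Term to)
    {
        int i;
        for (i = 0; i < nargs; i++)
            if (args[i].isVar && strcmp(args[i].name, from) == 0)
                args[i] = to;
    }

    int main()
    {
        /* The arguments of snap(S, "C.Brown", A, P). */
        struct Term args[4] = {
            {1, "S"}, {0, "C.Brown"}, {1, "A"}, {1, "P"},
        };
        struct Term S = {1, "S"};
        int i;
        substitute(args, 4, "P", S);          /* sub(P) = S */
        for (i = 0; i < 4; i++)
            printf("%s%s", args[i].name, i < 3 ? ", " : "\n");
        return 0;   /* prints S, C.Brown, A, S */
    }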
EXERCISES
14.8.1: Prove the following conclusions from hypotheses, using the inference rules
discussed in Section 12.10, plus the variable-substitution rule just discussed. Note
that you can use as a line of proof any tautology of either propositional or predi-
cate calculus. However, try to restrict your tautologies to the ones enumerated in
Sections 12.8, 12.9, and 14.7.
a) From hypothesis (∀X)p(X) prove the conclusion (∀X)p(X) OR q(Y ).
b) From hypothesis (∃X)p(X, Y ) prove the conclusion NOT (∀X) NOT p(X, a) .
c) From the hypotheses p(X) and p(X) → q(X) prove the conclusion q(X).
✦
✦ ✦
✦
14.9 Proofs from Rules and Facts
Perhaps the simplest form of proof in predicate logic involves hypotheses that fall
into two classes:
1. Facts, which are ground atomic formulas, like the tuples of a relation.
2. Rules, which are implications; we call the left side of a rule its body and the
atomic formula on the right side its head.
✦ Example 14.22. One simple application of proofs from rules and facts is in
answering queries as in the relational model discussed in Chapter 8. Each relation
corresponds to a predicate symbol, and each tuple in the relation corresponds to a
ground atomic formula with that predicate symbol and with arguments equal to the
components of the tuple, in order. For example, from the Course-Student-Grade
relation of Fig. 8.1, we would get the facts
csg(“CS101”, 12345, “A”) csg(“CS101”, 67890, “B”)
csg(“EE200”, 12345, “C”) csg(“EE200”, 22222, “B+”)
csg(“CS101”, 33333, “A–”) csg(“PH100”, 67890, “C+”)
Similarly, from the Student-Name-Address-Phone relation of Fig. 8.2(a), we get the
facts
snap(12345, “C.Brown”, “12 Apple St.”, “555-1234”)
snap(67890, “L.VanPelt”, “34 Pear Ave.”, “555-5678”)
snap(22222, “P.Patty”, “56 Grape Blvd.”, “555-9999”)
To these facts, we might add the rule (14.19),
csg(“CS101”, S, G) AND snap(S, “C.Brown”, A, P ) → answer(G)
to complete the list of hypotheses.
Suppose we want to show that answer(“A”) is true, that is, C. Brown gets an
A in CS101. We could begin our proof with all of the facts and the rule, although
in this case we only need the rule, the first csg fact and the first snap fact. That
is, the first three lines of the proof are
1. csg(“CS101”, S, G) AND snap(S, “C.Brown”, A, P ) → answer(G)
2. csg(“CS101”, 12345, “A”)
3. snap(12345, “C.Brown”, “12 Apple St.”, “555-1234”)
Interpreting Rules
Rules, like all expressions that appear in proofs, are implicitly universally quantified.
Thus we can read (14.19) as “for all S, G, A, and P , if csg(“CS101”, S, G) is true,
and snap(S, “C.Brown”, A, P ) is true, then answer(G) is true.” However, we may
treat variables that appear in the body, but not in the head, such as S, A, and P ,
as existentially quantified for the scope of the body. Formally, (14.19) is equivalent
to
(∀G)((∃S)(∃A)(∃P )(csg(“CS101”, S, G) AND snap(S, “C.Brown”, A, P ))
→ answer(G))
That is, for all G, if there exists S, A, and P such that csg(“CS101”, S, G) and
snap(S, “C.Brown”, A, P ) are both true, then answer(G) is true.
This phrasing corresponds more closely to the way we think of applying a rule.
It suggests that for each value of the variable or variables that appear in the head,
we should try to find values of the variables appearing only in the body, that make
the body true. If we find such values, then the head is true for the chosen values of
its variables.
To see why we can treat variables that are local to the body as existentially
quantified, start with a rule of the form B → H, where B is the body, and H is the
head. Let X be one variable that appears only in B. Implicitly, this rule is
(∀∗)(B → H)
and by law (14.17), we can make the quantifier for X be the innermost, writing the
expression as (∀∗)(∀X)(B → H). Here, the (∀∗) includes all variables but X. Now
we replace the implication by its equivalent expression using NOT and OR, that is,
(∀∗)(∀X)((NOT B) OR H). Since X does not appear in H, we may apply law (14.13)
in reverse, to make the (∀X) apply to NOT B only, as (∀∗)(((∀X) NOT B) OR H).
Next, we use law (14.10) to move the (∀X) inside the negation, yielding
(∀∗)((NOT (∃X)(NOT NOT B)) OR H)
or, after eliminating the double negation, (∀∗)((NOT (∃X)B) OR H). Finally, we
restore the implication to get (∀∗)((∃X)B → H).
The substitution sub is as given in that example, and the subgoals of sub(R) are
lines (2) and (3) in Example 14.22. By the new inference rule, we could write down
line (6) of Example 14.22 immediately; we do not need lines (4) and (5). In fact,
line (1), the rule R itself, can be omitted from the proof as long as it is a given
hypothesis. ✦
✦ Example 14.24. For another example of how rules may be applied in proofs,
let us consider the Course-Prerequisite relation of Fig. 8.2(b), whose eight facts can
be represented by eight ground atomic formulas with predicate cp,
cp(“CS101”, “CS100”) cp(“EE200”, “EE005”)
cp(“EE200”, “CS100”) cp(“CS120”, “CS101”)
cp(“CS121”, “CS120”) cp(“CS205”, “CS101”)
cp(“CS206”, “CS121”) cp(“CS206”, “CS205”)
We might wish to define another predicate before(X, Y ) that means course Y must
be taken before course X. Either Y is a prerequisite of X, a prerequisite of a
prerequisite of X, or so on. We can define the notion “before” recursively, by
saying
1. If Y is a prerequisite of X, then Y comes before X.
2. If X has a prerequisite Z, and Y comes before Z, then Y comes before X.
Rules (1) and (2) can be expressed as rules of predicate logic as follows.
cp(X, Y ) → before(X, Y ) (14.22)
cp(X, Z) AND before(Z, Y ) → before(X, Y ) (14.23)
Let us now explore some of the before facts that we can prove with the eight
Course-Prerequisite facts given at the beginning of the example, plus rules (14.22)
and (14.23). First, we can apply rule (14.22) to turn each of the cp facts into a
corresponding before fact, yielding
before(“CS101”, “CS100”) before(“EE200”, “EE005”)
before(“EE200”, “CS100”) before(“CS120”, “CS101”)
before(“CS121”, “CS120”) before(“CS205”, “CS101”)
before(“CS206”, “CS121”) before(“CS206”, “CS205”)
For example, we may use the substitution
sub1 (X) = “CS101”
sub1 (Y ) = “CS100”
on (14.22) to get the substituted rule instance
cp(“CS101”, “CS100”) → before(“CS101”, “CS100”)
This rule, together with the hypothesis cp(“CS101”, “CS100”), gives us
before(“CS101”, “CS100”)
Now we can use rule (14.23) with the hypothesis cp(“CS120”, “CS101”) and
the fact before(“CS101”, “CS100”) that we just proved, to prove
before(“CS120”, “CS100”)
That is, we apply the substitution
sub2 (X) = “CS120”
sub2 (Y ) = “CS100”
sub2 (Z) = “CS101”
to (14.23) to obtain the rule
Paths in a Graph
Example 14.24 deals with a common form of rules that define paths in a directed
graph, given the arcs of the graph. Think of the courses as nodes, with an arc
a → b if course b is a prerequisite of course a. Then before(a, b) corresponds to the
existence of a path of length 1 or more from a to b. Figure 14.9 shows the graph
based on the Course-Prerequisite information from Fig. 8.2(b).
When the graph represents prerequisites, we expect it to be acyclic, because
it would not do to have a course that had to be taken before itself. However, even
if the graph has cycles, the same sort of logical rules define paths in terms of arcs.
We can write these rules
arc(X, Y ) → path(X, Y )
that is, if there is an arc from node X to node Y , then there is a path from X to
Y , and
arc(X, Z) AND path(Z, Y ) → path(X, Y )
That is, if there is an arc from X to some Z, and a path from Z to Y , then there
is a path from X to Y . Notice that these are the same rules as (14.22) and (14.23),
with predicate arc in place of cp, and path in place of before.
cp(“CS120”, “CS101”) AND before(“CS101”, “CS100”)
→ before(“CS120”, “CS100”)
We then may infer the head of this substituted rule, to prove
before(“CS120”, “CS100”)
Similarly, we may apply rule (14.23) to the ground atomic formulas
cp(“CS121”, “CS120”)
and before(“CS120”, “CS100”) to prove before(“CS121”, “CS100”). Then we use
(14.23) on cp(“CS206”, “CS121”) and before(“CS121”, “CS100”) to prove
before(“CS206”, “CS100”)
There are many other before facts we could prove in a similar manner. ✦
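This style of rule application can be mechanized: start with the before facts given by rule (14.22), then apply rule (14.23) repeatedly until no new facts appear. The C program below is our own sketch of that fixed-point computation on the eight cp facts:

    #include <stdio.h>
    #include <string.h>

    #define NCP  8
    #define MAXB 64

    /* The eight cp facts: cp(X, Y) means Y is a prerequisite of X. */
    const char *cp[NCP][2] = {
        {"CS101", "CS100"}, {"EE200", "EE005"}, {"EE200", "CS100"},
        {"CS120", "CS101"}, {"CS121", "CS120"}, {"CS205", "CS101"},
        {"CS206", "CS121"}, {"CS206", "CS205"},
    };

    const char *before[MAXB][2];    /* the before facts proved so far */
    int nBefore = 0;

    int known(const char *x, const char *y)
    {
        int i;
        for (i = 0; i < nBefore; i++)
            if (strcmp(before[i][0], x) == 0 && strcmp(before[i][1], y) == 0)
                return 1;
        return 0;
    }

    void add(const char *x, const char *y)   /* record before(x, y) */
    {
        before[nBefore][0] = x;
        before[nBefore][1] = y;
        nBefore++;                           /* MAXB is ample for this data */
        printf("before(%s, %s)\n", x, y);
    }

    int main()
    {
        int i, j, changed;
        for (i = 0; i < NCP; i++)    /* rule (14.22): cp(X,Y) -> before(X,Y) */
            add(cp[i][0], cp[i][1]);
        do {                         /* rule (14.23), applied to a fixed point */
            changed = 0;
            for (i = 0; i < NCP; i++)
                for (j = 0; j < nBefore; j++)
                    if (strcmp(cp[i][1], before[j][0]) == 0 &&
                        !known(cp[i][0], before[j][1])) {
                        add(cp[i][0], before[j][1]);  /* deduce before(X, Y) */
                        changed = 1;
                    }
        } while (changed);
        return 0;
    }

The program prints fifteen before facts in all; the loop terminates because each pass either adds a new fact or makes no change, and there are only finitely many possible facts.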
EXERCISES
14.9.1*: We can show that the before predicate of Example 14.24 is the transi-
tive closure of the cp predicate as follows. Suppose there is a sequence of courses
c1 , c2 , . . . , cn , for some n ≥ 2, and c1 is a prerequisite of c2 , which is a prerequisite of
c3 , and so on; in general, cp(ci , ci+1 ) is a given fact for i = 1, 2, . . . , n − 1. Show that
c1 comes before cn by showing that before(c1 , ci ) for all i = 2, 3, . . . , n, by induction
on i.
14.9.2: Using the rules and facts of Example 14.24, prove the following facts.
a) before(“CS120”, “CS100”)
b) before(“CS206”, “CS100”)
Fig. 14.9. The prerequisite graph for the courses. There is an arc a → b when
course b is a prerequisite of course a: CS206 → CS121, CS206 → CS205, CS205 →
CS101, CS121 → CS120, CS120 → CS101, CS101 → CS100, EE200 → CS100, and
EE200 → EE005.
✦
✦ ✦
✦
14.10 Truth and Provability
We close our discussion of predicate logic with an introduction to one of the more
subtle issues of logic: the distinction between what is provable and what is true.
We have seen inference rules that allow us to prove things in either propositional or
predicate logic, yet we could not be sure that a given set of rules was complete, in the
sense that they allowed us to prove every true statement. We asserted, for instance,
that resolution as we presented it in Section 12.11 is complete for propositional
logic. A generalized form of resolution, which we do not cover here, is also complete
for predicate logic.
Models
However, to understand completeness of a proof strategy, we need to grasp the
notion of “truth.” To get at “truth,” we need to understand the notion of a model.
Every kind of logic has a notion of models for a collection of expressions. These
models are the interpretations that make the expressions true.
Entailment
We can now state what it means for an expression E to be true, given a collection
of expressions {E1 , E2 , . . . , En }. We say that {E1 , E2 , . . . , En } entails expression E
if every model M for {E1 , E2 , . . . , En } is also a model for E. The double turnstile
operator |= denotes entailment, as
E1 , E2 , . . . , En |= E
The intuition we need is that each interpretation is a possible world. When we say
E1 , E2 , . . . , En |= E, we are saying that E is true in every possible world where the
expressions E1 , E2 , . . . , En are true.
The notion of entailment should be contrasted with the notion of proof. If
we have a particular proof system such as resolution in mind, then we can use the
single turnstile operator ⊢ to denote proof in the same way. That is,
E1 , E2 , . . . , En ⊢ E
means that, for the set of inference rules at hand, there is a proof of E from the
hypotheses E1 , E2 , . . . , En . Note that ⊢ can have different meanings for different
proof systems. Also remember that |= and ⊢ are not necessarily the same relation-
ship, although we would generally like to have a proof system in which one is true
if and only if the other is true.
There is a close connection between tautologies and entailment. In particular,
suppose E1 , E2 , . . . , En |= E. Then we claim
(E1 AND E2 AND · · · AND En ) → E (14.24)
is a tautology. Consider some interpretation I. If I makes the left-hand side of
(14.24) true, then I is a model of {E1 , E2 , . . . , En }. Since E1 , E2 , . . . , En |= E,
interpretation I must also make E true. Thus I makes (14.24) true.
The only other possibility is that I makes the left-hand side of (14.24) false.
Then, because an implication is always true when its left-hand side is false, we know
(14.24) is again true. Thus (14.24) is a tautology.
Conversely, if (14.24) is a tautology, then we can prove E1 , E2 , . . . , En |= E.
We leave this proof as an exercise.
Notice that our argument does not depend on whether the expressions involved
are of propositional or predicate logic, or some other kind of logic that we have not
studied. We only need to know that the tautologies are the expressions made true
by every “interpretation” and that a model for an expression or set of expressions
is an interpretation making the expression(s) true.
Undecidability
The logician Alan Turing developed a formal theory of computing in the 1930s,
considerably before there were any electronic computers to model with his theory.
The most important result of this theory is the discovery that certain problems are
undecidable; no computer whatsoever can answer them.
A centerpiece of the theory is the Turing machine, an abstract computer that
consists of a finite automaton with an infinite tape divided into squares. In a single
move, the Turing machine can read the character on the one square seen by its tape
head, and based on that character and its current state, replace the character by
a different one, change its state, and move the tape head one square left or right.
An observed fact is that every real computer, as well as every other mathematical
model of what a computing engine should be, can compute exactly what the Turing
machine can compute. Thus we take the Turing machine as the standard abstract
model of a computer.
However, we do not have to learn the details of what a Turing machine can do
in order to appreciate Turing’s theory. It suffices to take as a model of a computer a
kind of C program that reads character input and has only two possible write state-
ments: printf("yes\n") and printf("no\n"). Moreover, after making an output
of either type, the program must terminate, so that it cannot make a contradictory
output later. Understand that a program of this type might, on some inputs, give
neither a “yes” nor a “no” response; it might run forever in a loop.
We shall prove that there is no program like D, the “decider” program of Fig.
14.10(a). D supposedly takes as input a program P of the special type above, and
says “yes” if P says “yes” when given P itself as input. D says “no” if — when
P is given P as input — P either says “no” or P fails to make any decision. As
we shall see, it is this requirement that D figure out the occasions when P is never
going to render a decision that makes D impossible to write.
However, supposing that D exists, it is a simple matter to write a “comple-
menter” program C, as suggested in Fig. 14.10(b). C is formed from the hypotheti-
cal D by changing every statement that prints “no” into one that prints “yes,” and
vice versa.
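In C, the construction of the complementer from the hypothetical decider might be rendered as follows. This is only a sketch: we can declare D, but no body for D can ever be written (that is precisely the content of the theorem), so the sketch can never be completed into a running program.

    #include <stdio.h>

    /* Hypothetical: D(P) returns 1 if the program whose source text is P
       prints "yes" when given P itself as input, and 0 if P says "no" or
       never decides.  No such function can be written. */
    int D(const char *P);

    /* The complementer C of Fig. 14.10(b): it answers the opposite of D. */
    void complementer(const char *P)
    {
        if (D(P))
            printf("no\n");    /* where D would say "yes", C says "no" */
        else
            printf("yes\n");   /* and vice versa */
    }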
Now consider what happens when C is given itself as input, as suggested in Fig.
14.10(c). If C says “yes,” then as Fig. 14.10(b) reminds us, C is asserting that “C
does not say ‘yes’ on input C.” If C says “no,” then C is asserting that “C says
‘yes’ on input C.” We now have a contradiction similar to Russell’s paradox, where
C can say neither “yes” nor “no” truthfully.
The conclusion is that the decider program D does not really exist. That is,
the problem solved by D, which is whether a given C program of the restricted type
says “yes” or fails to say “yes” (by saying “no” or by saying nothing) when given
itself as input, cannot be solved by computer. It is an undecidable problem.
Since Turing’s original result, a wide variety of undecidable problems have been
discovered. For example, it is undecidable whether a given C program enters an
infinite loop on a given input, or whether two C programs produce the same output
on the same input.
Fig. 14.10(c). The complementer C given itself as input; whichever answer C
gives contradicts what that answer asserts.
EXERCISES
14.10.2*: Consider the following five expressions:
1. E1 = (∀X)p(X, X)
2. E2 = (∀X)(∀Y )(p(X, Y ) → p(Y, X))
3. E3 = (∀X)(∀Y )(∀Z)(p(X, Y ) AND p(Y, Z) → p(X, Z))
4. E4 = (∀X)(∀Y )(p(X, Y ) OR p(Y, X))
5. E5 = (∀X)(∃Y )p(X, Y )
Which of these five expressions are entailed by the other four? In each case, either
give an argument about all possible interpretations, to show entailment, or give a
particular interpretation that is a model of four of the expressions but not the fifth.
Hint : Start by imagining that the predicate p represents the arcs of a directed graph,
and look at each expression as a property of graphs. The material in Section 7.10
should give some hints either for finding appropriate models in which the domain is
the nodes of a certain graph and predicate p the arcs of that graph, or for showing
why there must be entailment. Note, however, that it is not sufficient to show
entailment by insisting that the interpretation be a graph.
14.10.3*: Let S1 and S2 be two sets of expressions of predicate logic (or propositional logic; it does not matter), and let their corresponding sets of models be M1 and M2, respectively.

a) Show that the set of models for the set of expressions S1 ∪ S2 is M1 ∩ M2.

b) Is the set of models for the set of expressions S1 ∩ S2 always equal to M1 ∪ M2?
14.10.4*: Show that if (E1 AND E2 AND · · · AND En ) → E is a tautology, then
E1 , E2 , . . . , En |= E.
✦
✦ ✦
✦
14.11 Summary of Chapter 14
The reader should have learned the following points from this chapter.
✦ Predicate logic uses atomic formulas, that is, predicates with arguments, as
atomic operands and the operators of propositional logic, plus the two quanti-
fiers, “for all” and “there exists.”
✦ Variables in an expression of predicate logic are bound by quantifiers in a
manner analogous to the binding of variables in a program to declarations.
✦ Instead of the truth assignments of propositional logic, in predicate logic we
have a complex structure called an “interpretation.” An interpretation consists
of a domain of values, relations on that domain for the predicates, and values
from the domain for any free variables.
✦ The interpretations that make a set of expressions true are the “models” of
that set of expressions.
✦ Tautologies of predicate calculus are those expressions that evaluate to TRUE for
every interpretation. While many tautologies are obtained by substitution into
tautologies of propositional logic, there are also some important tautologies
involving quantifiers.
✦
✦ ✦
✦
14.12 Bibliographic Notes for Chapter 14
The books on logic, including Enderton [1972], Mendelson [1987], Lewis and Pa-
padimitriou [1981], and Manna and Waldinger [1990] that we cited in Section 12.14,
also cover predicate logic.
Gödel’s incompleteness theorem appeared in Gödel [1931]. Turing’s paper on
undecidability is Turing [1936].
Gödel, K. [1931]. “Über formal unentscheidbare Sätze der Principia Mathematica
und verwandter Systeme,” Monatshefte für Mathematik und Physik 38, pp. 173–198.

Turing, A. M. [1936]. “On computable numbers, with an application to the Entscheidungsproblem,” Proc. London Math. Soc. 2:42, pp. 230–265.
Solutions to Selected Exercises

Chapter 1. Computer Science: The Mechanization of Abstraction

Section 1.3
1.3.1: The static part of a data model consists of the values for the objects in the
model; the dynamic part consists of the operations that can be applied to these
values. For example, we can think of the set of integers with the operation addition
as a data model. The static part is the set of integers and the dynamic part is the
addition operator.
1.3.3: The data objects in a line-oriented text editor, such as vi, are files consisting
of sequences of lines, where each line is a sequence of characters. A cursor identifies
a position within a line. There are operators for positioning the cursor within a
file. Typical operations on lines include inserting an additional line and deleting an
existing line. A line may be modified by inserting, deleting, or changing characters
within it. In addition, there are operators for creating, writing, and reading files.
Section 1.4

1.4.1: An identifier can be one of the names for a box. For example, an identifier
x in C may be attached to a box containing an integer by means of a variable
declaration int x;. One of the names of that integer box is then x.
Chapter 2. Iteration, Induction, and Recursion

Section 2.2
2.2.1(a): With 5 elements in the array, SelectionSort makes 4 iterations, with the
loop index i = 0, 1, 2, 3. The first iteration makes 4 comparisons, the second 3, the
third 2, and the fourth 1, for a total of 10 comparisons. With the array 6, 8, 14, 17, 23,
there are no swaps (exchanges of elements) in any iteration.
2.2.1(b): On the array 17, 23, 14, 6, 8, SelectionSort makes 4 iterations. The
numbers of comparisons and swaps made during each iteration are summarized in
the following table. We shall not regard a swap as having occurred if the selected
element is already in its proper position. However, the reader should be aware that
lines (6) through (8) of Fig. 2.2 are executed regardless of whether a swap is needed. Note
that when small = i, these lines have no effect.

    ITERATION   ARRAY AFTER ITERATION   NO. OF COMPARISONS   NO. OF SWAPS
    Start       17, 23, 14, 6, 8
    1           6, 23, 14, 17, 8        4                    1
    2           6, 8, 14, 17, 23        3                    1
    3           6, 8, 14, 17, 23        2                    0
    4           6, 8, 14, 17, 23        1                    0
2.2.3: In what follows, we use the conventions and macros of Section 1.6. To begin,
we use the cell/list macro to define linked lists of characters, as:

    DefCell(char, CELL, LIST);
2.2.5: If all n elements in the array A are the same, then SelectionSort(A, n)
makes n(n − 1)/2 comparisons but no swaps.
2.2.7: Let T be an arbitrary type. Define

    typedef T TARRAY[MAX];
    TARRAY A;

We modify SelectionSort as follows to sort elements of type T. The function
key(x) returns the key, of type K, for the element x. We assume that the function
lt(u,v) returns TRUE if u is “less than” v and FALSE otherwise, where u and v are
values of type K.

    void SelectionSort(TARRAY A, int n) {
        int i, j, small;
        T temp;
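        /* The rest of the body was not reproduced in this copy; a sketch
           of how it might continue, using key and lt as described above: */
        for (i = 0; i < n-1; i++) {
            small = i;                          /* index of smallest key so far */
            for (j = i+1; j < n; j++)
                if (lt(key(A[j]), key(A[small])))
                    small = j;
            temp = A[small];                    /* swap A[small] and A[i] */
            A[small] = A[i];
            A[i] = temp;
        }
    }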
2.2.11:

a) $\sum_{i=1}^{189} (2i - 1)$

b) $\sum_{i=1}^{n/2} (2i)^2$

c) $\prod_{i=3}^{k} 2^i$
Section 2.3

2.3.1(a): We shall prove the following statement S(n) by induction on n, for n ≥ 1.

STATEMENT S(n): $\sum_{i=1}^{n} i = n(n+1)/2$

BASIS. The basis, n = 1, is obtained by substituting 1 for n in S(n). Doing so, we
get $\sum_{i=1}^{1} i = 1$. We thus see that S(1) is true.

INDUCTION. Now assume that n ≥ 1 and that S(n) is true. We must prove
S(n + 1), which is
$\sum_{i=1}^{n+1} i = (n+1)(n+2)/2$

We can rewrite the left-hand side as

$\left( \sum_{i=1}^{n} i \right) + (n + 1)$

Then, using the inductive hypothesis to replace the first term, we get

$n(n+1)/2 + (n+1) = \bigl( n(n+1) + 2(n+1) \bigr)/2 = (n+1)(n+2)/2$

which is the right-hand side of S(n + 1). We have now proven the inductive step
and thus shown that S(n + 1) is true. We conclude that S(n) holds for all n ≥ 1.
2.3.1(b): We shall prove the following statement S(n) by induction on n, for n ≥ 0.

STATEMENT S(n): $\sum_{i=1}^{n} i^2 = n(n+1)(2n+1)/6$

BASIS. S(0), the basis, is $\sum_{i=1}^{0} i^2 = 0$, which is true by the definition of a sum of
zero elements.

INDUCTION. Assume that n ≥ 0 and that S(n) is true. We now need to prove
S(n + 1), which is

$\sum_{i=1}^{n+1} i^2 = (n+1)(n+2)(2n+3)/6$

We can rewrite the left-hand side as

$\left( \sum_{i=1}^{n} i^2 \right) + (n+1)^2$

Using the inductive hypothesis to replace the first term, we get

$n(n+1)(2n+1)/6 + (n+1)^2 = (n+1)\bigl(n(2n+1) + 6(n+1)\bigr)/6 = (n+1)(2n^2 + 7n + 6)/6 = (n+1)(n+2)(2n+3)/6$

The last expression is the right-hand side of S(n + 1). We have now proven the
inductive step. We therefore conclude S(n) is true for all n ≥ 0.
2.3.3:
a) 01101 has three 1's and is therefore of odd parity.
b) 111000111 has six 1's and is of even parity.
Section 2.4

Section 2.5

2.5.1: As in Fig. 2.12, we establish an invariant that is true at the top of the loop,
that is, at the time when the program tests whether i ≤ n as part of the code for
the for-statement. The invariant is proved by induction on the value of the
variable i.

BASIS. The basis is i = 1, which occurs when we enter the for-loop for the first
time. At this time, sum has its initialized value 0. Thus, S(1) is true.
2.5.3: The loop invariant we shall prove by induction on k, the value of variable i,
is

STATEMENT S(k): If we reach the test i ≤ n in the for-loop with the variable i
having the value k, then x = 2^{2^{k−1}}.

BASIS. The basis is k = 1. Upon entry to the loop, x = 2. Since 2^{2^{k−1}} = 2^{2^{1−1}} =
2^{2^0} = 2^1 = 2, we see that S(1) holds.
Section 2.6

2.6.2(b): The profile of running counts (left parentheses minus right parentheses) for
( ) ( ) ) ( ( ) is 1, 0, 1, 0, −1, 0, 1, 0; the count goes negative at the fifth position, so
the string is not balanced.

2.6.2(c): The profile for ( ( ( ) ( ) ) ( ) ( ) ) is 1, 2, 3, 2, 3, 2, 1, 2, 1, 2, 1, 0; the count
never goes negative and ends at 0, so the string is balanced.
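The check the profiles illustrate is easy to program; here is a minimal C sketch of
our own (not from the text) that computes the running count:

    /* Returns 1 if s is a balanced string of parentheses: the running
       count of '(' minus ')' never goes negative and ends at 0. */
    int balanced(const char *s)
    {
        int count = 0;
        for ( ; *s != '\0'; s++) {
            if (*s == '(') count++;
            else if (*s == ')') count--;
            if (count < 0) return 0;   /* a prefix with more )'s than ('s */
        }
        return count == 0;
    }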
2.6.4:
a) < is an infix, binary operator.
b) & is a prefix, unary operator.
c) % is an infix, binary operator.
2.6.6:
a) By direct enumeration we can show that S starts off
0, 5, 7, 10, 12, 14, 15, 17, 19, 20, 22, 24, 25, 26, 27, 28, …
We shall prove that 23 is the largest integer not in S.
b) We shall prove the following statement T(n) by induction on n, for n ≥ 24.
Section 2.7

2.7.1:
a) Here is a C function to compute sq(n) = n^2, when n is a positive integer.

    int sq(int n) {
        if (n == 1) return 1;               /* basis: 1^2 = 1 */
        else return sq(n-1) + 2*n - 1;      /* n^2 = (n-1)^2 + 2n - 1 */
    }
2.7.5: The following procedure is adapted from Fig. 2.22. The array A and its cursor
i are replaced by L, a pointer to a list of elements. The cursor small, indicating our
current guess at the selected element, is replaced by a pointer Small that points
to the cell of the current guess. Cursor j, used to run down the list of unsorted
elements, is replaced by a pointer J, and n, the size of the array A, is implicit in the
length of the given list.

    void SelectionSort(LIST L) {
        LIST J, Small;
        int temp;
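        /* The rest of the body was not reproduced in this copy; a sketch
           of how it might continue, under the conventions described above: */
        while (L != NULL) {
            Small = L;
            for (J = L->next; J != NULL; J = J->next)
                if (J->element < Small->element)
                    Small = J;               /* Small points to smallest cell */
            temp = Small->element;           /* swap the elements in cells */
            Small->element = L->element;     /* L and Small                 */
            L->element = temp;
            L = L->next;                     /* continue with the unsorted tail */
        }
    }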
2.7.7: Procedure g(i) prints the remainder when i is divided by 2 and then calls
itself recursively on the integer part of i/2. An easy inductive proof shows that
g(i) prints the binary representation of i, least significant bit first, for all i ≥ 1.
The following function handles the case i = 0 as well:

    void f(int i) {
        if (i == 0) printf("0");
        else g(i);
    }
Section 2.8

2.8.5: In all of the parts, the trick is to identify an appropriate measure of size for
the arguments that decreases with each recursive call.

a) A good size measure for merge(L,M) would be s, the sum of the lengths of the
lists L and M. We see that merge of size s calls merge of size s − 1, which calls
merge of size s − 2, and so on, until one or the other of L or M becomes NULL.
Section 2.9

2.9.1: Note: PrintList is in Fig. 2.31(a), not (b) as it states erroneously in the
first printing. We shall prove by induction on i the following statement S(i), for
i ≥ 0.
Chapter 3. The Running Time of Programs

Section 3.3
3.3.1: Lines (1) through (3) each take one time unit. For line (4), the test takes one
unit, and it is executed n times. Lines (5) and (6) each take one unit, and they are
executed n − 1 times. Line (7) takes one unit. Thus, the total time taken by the
program in Fig. 2.13 is 3n + 2 units.
3.3.3: Program A takes less time than program B for all values of n ≤ 29. For
n ≥ 30, program A takes more time than program B. At n = 29, program A takes
5.4 × 10^5 time units and program B takes 8.4 × 10^5. At n = 30, program A takes
1.1 × 10^6 time units and program B takes 9 × 10^5.

3.3.5: Program A takes more time than program B for n ≤ 3, and less time
thereafter.
    TIME      MAXIMUM PROBLEM SIZE       MAXIMUM PROBLEM SIZE
    UNITS     SOLVABLE WITH PROGRAM A    SOLVABLE WITH PROGRAM B
    10^6      5                          3
    10^9      31                         7
    10^12     177                        15
Section 3.4

3.4.1: f1(n) is O(f2(n)), O(f3(n)), and O(f4(n)). In each case, we can use witnesses c = 1 and n0 = 0.

f2(n) is not O(f1(n)), O(f3(n)), or O(f4(n)). To show that f2(n) is not
O(f1(n)), suppose that it were. Then there would be witnesses c > 0 and n0 such
that n^3 ≤ cn^2 for all n ≥ n0. But this implies c ≥ n for all n ≥ n0, contradicting
our assumption that c is a constant.

f3(n) is not O(f1(n)), but it is O(f2(n)) and O(f4(n)). To show f3(n) is
O(f4(n)), we can use n0 = 3 and c = 1. Remember that every even number except
2 is composite.

f4(n) is not O(f1(n)), but it is O(f2(n)) and O(f3(n)).

3.4.3: Choose c = 2 and n0 = 0. Since f(n) ≤ g(n) for n ≥ 0, we know that
f(n) + g(n) ≤ 2g(n) for n ≥ 0. Therefore, f(n) + g(n) is O(g(n)).
Section 3.5

3.5.1:
a) Choose witnesses c = 1 and n0 = 1. Because a ≤ b, we know n^a ≤ n^b for
n ≥ n0. Thus, n^a is O(n^b).
b) Suppose there exist witnesses c > 0 and n0 such that n^a ≤ cn^b for all n ≥ n0,
when a > b. Let d be the larger of n0 and c^{1/(a−b)} + 1. Because of the assumed
big-oh relationship, we infer that d^a ≤ cd^b, or d^{a−b} ≤ c. But from our choice of
d, we know that d^{a−b} > c, a contradiction. Thus, we conclude that n^a is not
O(n^b) if a > b.

3.5.3: Since T(n) is O(f(n)), we know that there exist witnesses c > 0 and n0 ≥ 0
such that T(n) ≤ cf(n) for all n ≥ n0. Since g(n) ≥ 0 for all n ≥ 0, we know
g(n)T(n) ≤ cf(n)g(n), for n ≥ n0. Thus, g(n)T(n) is O(g(n)f(n)).

3.5.5: Since f(n) is O(g(n)), there exist witnesses c > 0 and n0 ≥ 0 such
that f(n) ≤ cg(n) for all n ≥ n0. Choose d = max(c, 1). For any value of
n, max(f(n), g(n)) is either f(n) or g(n). If max(f(n), g(n)) is f(n), we know
f(n) ≤ cg(n) ≤ dg(n). If max(f(n), g(n)) is g(n), we know g(n) ≤ dg(n). Thus,
max(f(n), g(n)) is O(g(n)).
Section 3.6

Section 3.7
3.7.1: The tree is shown in Fig. S3.1. The assignment statements at the leaves (2),
(4), (5), (6), (7), (10), and (11) each take O(1) time. The for-statement, (3)–(4), takes
O(n) time. The if-statement, (9)–(10), takes O(1) time, and the while-statement,
(8)–(11), takes O(n) time. The running time of the entire program, represented by
the block node at the root of the tree, is therefore O(n).

Fig. S3.1. The tree of program structure, rooted at a block node that spans the
entire program.
Section 3.8
3.8.1: There are several ways to attack this problem. A proof by induction on n
can be used to show that

$\sum_{i=1}^{n} \bigl( i + n(n+1)/2 \bigr) = (n^3 + 2n^2 + n)/2$

for all n ≥ 0. Perhaps simpler is to note that the left-hand side can be written as

$\sum_{i=1}^{n} i + \sum_{i=1}^{n} n(n+1)/2$

The first term sums to n(n+1)/2, as we saw in the introduction to Chapter 2.
The expression in the second term is independent of i. The second term is thus
n^2(n+1)/2. Adding these two sums, we get

$n(n+1)/2 + n^2(n+1)/2 = (n^3 + 2n^2 + n)/2$
which is the expression on the right-hand side of the equality we wanted to prove.
3.8.3: Each time we go around the loop we evaluate f(n) and increment i. We
also initialize i before the loop. Initialization and incrementation of i are each O(1)
operations, and we can neglect them. The body of the loop is iterated f(n) times,
taking O(1) time per iteration, or O(f(n)) time total. Thus, the running time of
the loop is O(f(n)) plus the time to evaluate f(n) a total of f(n) + 1 times. For example,
the answer to (a) is O(n(n! + 1)) + O(n!) = O(n × n!).
3.8.5: Note that bar(n, n) = (n^2 + 3n)/2. The function bar takes O(n) time as
before. Line (8) of procedure foo takes O(n) time and the for-loop of lines (7)–(8)
is iterated (n^2 + 3n)/2 times. The evaluation of bar(n, n) in the new line (7) takes
O(n) time and can be neglected. Thus, procedure foo now takes O(n^3) time. The
running time of main is dominated by the running time of foo, and thus main takes
O(n^3) time.
Section 3.9

3.9.1: Let T(n) be the running time of sum(L), where n is the length of the list L.
We can define T(n) by the following recurrence relation:

    T(0) = O(1)
    T(n) = O(1) + T(n − 1)

Replacing the big-oh's by constants, we get

    T(0) = a
    T(n) = b + T(n − 1), for n ≥ 1
3.9.3: Let m be the number of elements yet to be sorted. Let T(m) be the running
time of SelectionSort applied to m elements. We can define the following
recurrence for T(m):

    T(1) = O(1)
    T(m) = O(m) + T(m − 1), for m > 1

    int gcd(int i, int j) {
        int r;

        r = i % j;
        if (r != 0) return gcd(j,r);
        else return j;
    }

For convenience, assume that i > j. (Note that this property always holds except
possibly for the first invocation of gcd.) Let T(i) be the running time of gcd(i,j).
Suppose gcd(i,j) calls gcd(j,m), which calls gcd(m,n). We shall show that
m ≤ i/2. There are two cases. First, if j ≤ i/2, then m < j ≤ i/2. Second, if
j > i/2, then m = i MOD j = i − j < i/2.

Thus, we conclude that after every two calls to gcd, the first argument is
reduced by at least half. If we substitute the text of gcd for one invocation of the
recursive call, we can model the running time of gcd by the recurrence

    T(i) ≤ O(1) + T(i/2)

whose solution is T(i) = O(log i).
Section 3.10

3.10.1(a): The tree for split has an if-node for lines (1)–(6) at the root, with
children a return-node (1) and an if-node (2)–(6); the latter has children a
return-node (2) and a block-node (3)–(6), whose children are assignment-nodes (3),
(4), and (5), and a return-node (6).

As in the text, let T(n) be the time taken by split on a list of length n. The
running time of each of lines (1), (2), (3), (4), and (6) is O(1). The running
time of assignment (5) is T(n − 2). The running time of the block-node representing
lines (3)–(6) is O(1) + T(n − 2), as is the running time of the if-node representing
lines (1)–(6).
Section 3.11

    T(n − i) = T(n − i − 1) + g(n − i)

Substituting this equation into the previous, we get

$T(n) = T(n - i - 1) + \sum_{j=0}^{i} g(n - j)$

which is S(i + 1). We have thus proven the inductive step. We conclude S(i) is true
for 1 ≤ i < n.

3.11.3: The general form of the solution is

$T(n) = a + \sum_{j=0}^{\log_2 n - 1} g(n/2^j)$
Chapter 4. Combinatorics and Probability

Section 4.2

4.2.1(a): 4^3 = 64.
4.2.3: First, note that we can choose input x so that each of the eight conditions
is either true or false, as we wish. The reason is that each test asks whether x is
divisible by a different prime. We may pick x to be the product of those primes
for which we would like the test to be true. It is not possible that this product is
divisible by any of the other primes.

Also note that different sets of true conditions lead to different values of n.
The reason is that two different products of primes cannot yield the same value of
n.

Now, we can compute the answer. We are asked to choose a “color,” true or
false, for each of eight conditions. We can do so in 2^8 = 256 ways.

4.2.5: 10^n.

4.2.7: (a) 8K (c) 16M (e) 512P.
Section 4.3

4.3.1(a): 9! = 362880.

4.3.3: There are at most five comparisons in any branch. This number is best
possible, since four comparisons can only distinguish 16 different orders, and there
are 4! = 24 orders.

Given (a, b, c, d) to sort, the list is split into (a, c) and (b, d). The first thing
that happens is (a, c) is sorted, resulting in the comparison of a and c. Then
(b, d) is sorted similarly. The third comparison is between the winners of the two
comparisons. For example, if a and b are the winners, we compare these two. If,
say, a wins, then the fourth comparison is between b and c. If b wins we are done,
but if c wins, we need a fifth comparison of b against d. The first three levels of the
decision tree are shown in Fig. S4.1.
Fig. S4.1. Decision tree for Exercise 4.3.3. The root tests a < c; on each branch
the next test is b < d; the four third-level tests are a < b, a < d, c < b, and c < d,
respectively.

Section 4.4

c) 5^4 = 625.
d) If there are only five colors, then the number of codes without repetition is
5!/(5 − 4)! = 5!/1! = 120. Thus, the number with a repetition but no red peg
is 625 − 120 = 505.
Section 4.5

4.5.1(a): 7!/(3!(7 − 3)!) = 7!/(3! × 4!) = 5040/(6 × 24) = 35.

4.5.3(a): $\binom{7}{3}$, which, as we learned in Exercise 4.5.1(a), is 35.

4.5.5: To begin, we need to pick the positions that are vowels. There are $\binom{5}{2} = 10$
ways to do so. For each of these 10 ways, we can pick the 3 consonant positions
in 21^3 = 9261 ways. We can pick the vowels for the two vowel positions in 5^2 = 25
ways. Thus, the total number of words of length five with vowels in two particular
positions is 9261 × 25 = 231,525. The number of words altogether is ten times this,
or 2,315,250.
4.5.7: The following fragment computes $\binom{n}{m}$, leaving the result in c.

    c = 1.0;
    for (i = n; i > n-m; i--) {
        c *= i;             /* multiply in n, n-1, ..., n-m+1 */
        c /= (i-n+m);       /* divide by m, m-1, ..., 1 */
    }
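For reference, the same loop can be wrapped as a self-contained function; the name
choose, the use of double, and the small driver are our own illustration, not the
text's:

    #include <stdio.h>

    double choose(int n, int m)    /* computes C(n, m), for 0 <= m <= n */
    {
        double c = 1.0;
        int i;

        for (i = n; i > n-m; i--) {
            c *= i;
            c /= (i - n + m);
        }
        return c;
    }

    int main()
    {
        printf("%.0f\n", choose(7, 3));   /* prints 35, as in Exercise 4.5.1(a) */
        return 0;
    }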
Section 4.6

4.6.3: One way to look at this problem is that we wish to order 64 items, of which
3 are unique (the squares with pieces other than knights), two are indistinguishable
from each other (the squares with the white knights), and the remaining 59 (the
squares that do not have a piece on them) are also indistinguishable from each
other. The number of orders is thus 64!/(59! × 2! × 1! × 1! × 1!) = 64!/(59! × 2!) =
457,470,720.

4.6.5: (2n)!/(2! × 2! × ⋯ × 2!) (n times), or (2n)!/2^n.
Section 4.7

4.7.1:
a) (6 + 3)!/(6! × 3!) = 9!/(6! × 3!) = 84.
c) (6 + 3 + 4)!/(6! × 3! × 4!) = 13!/(6! × 3! × 4!) = 60060.

4.7.3: Let us reserve one apple for each of the three children. Then, we may
distribute the remaining four apples as we like. There are (4 + 2)!/(4! × 2!) = 15
ways to do so.
Section 4.8

4.8.1(a): Begin by picking the card that is not part of the two pairs. We can do
so in 52 ways. Now, pick the ranks of the two pairs. There are only 12 remaining
ranks, so we can do so in $\binom{12}{2} = 66$ ways. For each of the pairs, we can pick the
two suits in $\binom{4}{2}$, or 6, ways. We now have the 5 cards of the hand without order,
and the number of possibilities is 52 × 66 × 6 × 6 = 123,552.

4.8.2(a): We may pick the Ace four different ways, and we may pick the 10-point
card in 16 ways. Thus, there are 4 × 16 = 64 different blackjacks.

4.8.4(a): $\binom{12}{9} + \binom{12}{10} + \binom{12}{11} + \binom{12}{12} = 220 + 66 + 12 + 1 = 299$.

4.8.7(a): First, we must pick the suit of which there is four. We can do so in 4
ways. Now, we pick the cards of the suit of four; there are $\binom{13}{4} = 715$ ways to do
so. For each of the suits of three, we can select the cards in $\binom{13}{3} = 286$ ways. Thus,
the number of hands is 4 × 715 × 286 × 286 × 286 = 66,905,856,160.
Section 4.9

4.9.1(a): 5/36.

4.9.2(a): First, there are 52 × 51 = 2652 members of the probability space. To
calculate the number of points with one or more Aces, it is easier to calculate the
number with no Ace and subtract the corresponding probability from 1. The number
of deals of two cards from the 48 that remain after removing the Aces is 48 × 47 = 2256.
Thus, the probability of no Ace is 2256/2652, and the probability of at least one Ace is
1 − (2256/2652) = 396/2652 ≈ 14.9%.

4.9.3(a): The area of a circle of radius 3 inches is 9π ≈ 28.27 square inches. The
area of the entire square is 144 square inches. Thus, the probability of hitting the
circle is 28.27/144 ≈ 19.6%.

4.9.4(a): The probability is $\binom{5}{4}\binom{75}{16} / \binom{80}{20}$. If we cancel common factors in
numerator and denominator, we get 5 × 20 × 19 × 18 × 17 × 60/(80 × 79 × 78 × 77 × 76),
or about 1.21%.
Section 4.10

4.10.1:
a) Among the 18 points with an odd first die, 9 have an even second die. Thus,
the probability is 9/18, or 50%.
c) There are six points that have 4 as the first die. Of these, four have a sum of at
least 7. Thus, the probability is 4/6, or 66.7%.

4.10.2(a): In the region of 120 points where there are three different numbers,
the probability of two 1's is 0. In the region of 90 points with two of one number,
1/6 will have two 1's. In the region of six points with all three dice the same, the
probability of at least two 1's is 1/6. Thus, the probability of at least two 1's is
0 × (120/216) + (1/6) × (90/216) + (1/6) × (6/216) ≈ 7.41%.

4.10.7: An appropriate probability space, in which all points are of equal probability,
is one with six points, two for each of the choices of which prisoner to shoot.
The distinction between the two points for prisoner A is the order in which the
guard will consider the other two prisoners when asked for a prisoner (other than
the questioner) who will not be shot. Thus, the two points for A can be thought of
as (A, “B-before-C”) and (A, “C-before-B”). The remaining four points, for B and
C, are specified similarly.

Now, suppose that the guard answers “B” to A's question. There are three
points that could have occurred:

(A, “B-before-C”), (C, “A-before-B”), and (C, “B-before-A”)

In only the first of these will A be shot, so the probability is still 1/3.
Section 4.11

4.11.1(a): The probability that at least one of the events occurs is at least the largest of
the p_i's. The probability would be exactly max(p_1, p_2, …, p_n) in the case that all
the events were contained within the largest. The probability of at least one of the
events is no greater than the sum of the p_i's, and of course it is no greater than
1. The probability $\sum_{i=1}^{n} p_i$ would be reached in the case that all the events were
disjoint.

4.11.2:
a) Nothing. It could be anything between 0 and 1.
b) The probability is 1 − p.

4.11.3:
a) Between 0 and 0.3.
c) Cold must be contained in both High and Dropping. Thus, Cold cannot have
probability greater than the smaller of High and Dropping, that is, 0.3. Its
probability could be as low as 0, however; for example, it is 0 if High and Dropping
are disjoint.
Section 4.12

4.12.1: Intuitively, the expected number of 1's on one die is 1/6. The tosses of dice
are independent, so we expect 1/6 of a 1 from each. From three dice, we thus expect
1/2 of a 1.

Alternatively, consider the 6^3 = 216 tosses of three dice. The number of tosses
with three 1's is 1. The number of tosses with two 1's is 15, since the other die can
be any of five numbers, and the non-1 die can appear in any of three positions. The
number of tosses with exactly one 1 is 75. In explanation, the 1 can appear in any
of 3 positions, and for each of the other two positions there are 5 choices, for a total of
3 × 5 × 5 = 75 tosses. The expected number of 1's is thus (1 × 3 + 15 × 2 + 75 × 1)/216 =
(3 + 30 + 75)/216 = 108/216 = 1/2.

4.12.2: Exercise 4.12.1 suggests that the average amount of our winnings is 50
cents. However, the average amount we lose is not 50 cents. It is one dollar times
the probability that we lose. This probability is 125/216, the fraction of tosses that
do not contain a 1. This expected loss is 57.9 cents, so the expected value of our
payout is −7.9 cents.

4.12.5: The game is fair. Generalizing Exercise 4.12.1, we have six independent
tosses, each with an expected 1/6 of a 1, so the expected amount received in each
play is one dollar. Since we pay a dollar to play, our net expected payout is 0.
Section 4.13

Chapter 5. The Tree Data Model

Section 5.2
Section 5.3

5.3.1: For each node, the leftmost child and right sibling are shown in Fig. S5.1.

5.3.5: There are 10^7 nodes, each with 4 bytes of information and two 4-byte pointers,
or 1.2 × 10^8 bytes in all. There would be 10^7 + 1 NULL pointers (see Exercise 5.5.5).
Section 5.4

5.4.1: Here is a surprisingly simple function that counts the nodes in a tree.

    int count(pNODE n) {
        if (n != NULL)
            return(count(n->rightSibling) +
                   count(n->leftmostChild) + 1);
        else return 0;
    }
At first glance, the function doesn't seem to address the problem. However,
the “inductive assertion” about count is that for any node n, count(n) is the sum
of the number of nodes in the subtree rooted at n and in all those subtrees rooted at
siblings of n to the right of n. The induction is straightforward, once we realize that
the induction is on the length of the longest path of leftmost-child and right-sibling
pointers extending from a node. Since the root has no siblings, the desired result
appears at the root.

Another way to look at the count function above is that the leftmost-child
and right-sibling pointers turn the tree into a binary tree with the same number of
nodes. Surely, the rule that the number of nodes in a binary tree rooted at n is 1
plus the sum of the numbers of nodes in the left and right subtrees makes sense.

Incidentally, this technique applies to any computation on trees that can be
expressed as an associative operator applied to the results of the children (the
operator is + in the case of count), and a final, unary operator (add-one in this
case). For example, the function to compute the height of a tree in Fig. 5.22 of
the text can be replaced by a simpler (but less transparent) function that computes
the height of n to be the larger of the height of the right sibling of n and 1 plus
the height of the leftmost child of n, as sketched below. The height will be correct at the root, but in
general, height(n) is the largest of the heights of n and of any siblings of n to the right.
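Here is a minimal sketch of that simpler function, written in the style of count
above; the convention that a leaf has height 0 (and an empty tree height −1) is
ours:

    int height(pNODE n)
    {
        int h1, h2;

        if (n == NULL) return -1;               /* empty tree */
        h1 = height(n->leftmostChild) + 1;      /* 1 + height of leftmost child */
        h2 = height(n->rightSibling);           /* height of right siblings */
        return h1 > h2 ? h1 : h2;               /* the larger of the two */
    }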
5.4.5: The node listings are
a) Preorder: 1, 2, 4, 8, 9, 3, 5, 10, 13, 14, 15, 6, 7, 11, 12
b) Postorder: 8, 9, 4, 2, 13, 15, 14, 10, 5, 6, 11, 12, 7, 3, 1

5.4.7: First, construct the expression tree. Then the infix and prefix expressions
can be read off the tree.
a) The infix expression is (a + b) × c/(d × e) + f.
b) The prefix expression is + / × + a b c × d e f.
Section 5.5

BASIS. When the tree T is a single node n, line (1) prints the label of the root n.
At line (2), c is set to NULL, and consequently the body of the while-loop is not
executed.

INDUCTION. Suppose n, the root of a tree T, has nodes c1, c2, …, ck as children.
Let numnodes(ci) be the number of nodes in the subtree rooted at ci. Let degree(ci)
be the sum of the degrees of the nodes in the subtree rooted at ci. By the inductive
hypothesis, we know that numnodes(ci) = degree(ci) + 1 for 1 ≤ i ≤ k. The total
number of nodes in T is $1 + \sum_{i=1}^{k} numnodes(c_i)$. The sum of the degrees of all the
nodes in T is $\sum_{i=1}^{k} degree(c_i) + k$. We therefore have

$1 + \sum_{i=1}^{k} numnodes(c_i) = 1 + \sum_{i=1}^{k} degree(c_i) + k$

Since the root has degree k, the latter sum is 1 plus the sum of the degrees of all the
nodes in T, proving the induction.
5.5.5:

BASIS. A leaf has 2 NULL pointers and 1 node.

INDUCTION. Let T be a tree with root r. Let r have children c1, …, ck, the roots
of subtrees T1, …, Tk, respectively. Let Ti, as a tree by itself, have pi NULL pointers
and ni nodes. By the inductive hypothesis, pi = ni + 1 for i = 1, 2, …, k.

When we assemble T from r and the Ti's, we replace the NULL pointers in the
rightSibling fields of c1, c2, …, c_{k−1} by non-NULL pointers. The root r has a non-NULL
leftmostChild field and a NULL rightSibling field. The number of NULL
pointers in T is thus $(\sum_{i=1}^{k} p_i) - (k - 1) + 1$. Since pi = ni + 1 by the inductive
hypothesis, the number of NULL pointers is $(\sum_{i=1}^{k} n_i) + 2$. This is one greater than
the number of nodes in T, which is $(\sum_{i=1}^{k} n_i) + 1$.
Section 5.6

5.6.1:

    void inorder(TREE t) {
        if (t != NULL) {
            inorder(t->leftChild);
            printf("%d ", t->nodelabel);
            inorder(t->rightChild);
        }
    }
5.6.3: In the code of Fig. S5.2, we assume the function pr(x) returns the precedence
associated with the node label x, with smaller values meaning higher precedence. We
assume leaf operands have the highest precedence, so no parentheses are put around
them. When the left operand of the root t has lower precedence than the root, we put
parentheses around the left operand; the right operand is treated similarly.

    void pinorder(TREE t) {
        if (t != NULL) {
            if (t->leftChild != NULL) {
                if (pr(t->leftChild->nodelabel) <= pr(t->nodelabel))
                    pinorder(t->leftChild);
                else {
                    printf("(");
                    pinorder(t->leftChild);
                    printf(")");
                }
            }
            printf("%d ", t->nodelabel);
            if (t->rightChild != NULL) {
                if (pr(t->rightChild->nodelabel) <= pr(t->nodelabel))
                    pinorder(t->rightChild);
                else {
                    printf("(");
                    pinorder(t->rightChild);
                    printf(")");
                }
            }
        }
    }
Section 5.7

Section 5.8

5.8.1: The branching factor is the maximum number of children a node can have.
The smallest tree of height h with branching factor b is a simple path of h + 1 nodes.
The largest tree of height h with branching factor b > 1 is a complete b-ary tree, in
which all nodes on the first h levels have b children. There is one root, b children of the
root, b^2 children of those, and so on, down to depth h. The total number of nodes is
$\sum_{i=0}^{h} b^i$, or (b^{h+1} − 1)/(b − 1) nodes.
Section 5.9

5.9.1:
a) Sequence of steps to insert 3:

    18 18 16 9 7 1 9 3 7 5 3       initially, 3 goes into A[11]
    18 18 16 9 7 1 9 3 7 5 3       after bubbleUp(A,11) (no change to A)

b) Insert 20:

    18 18 16 9 7 1 9 3 7 5 3 20    initially, 20 goes into A[12]
    20 18 18 9 7 16 9 3 7 5 3 1    after bubbleUp(A,12)

c) Delete maximum element (replacing it by A[12]):

    1 18 18 9 7 16 9 3 7 5 3       initial array
    18 9 18 7 7 16 9 3 1 5 3       after calling deletemax(A,11)

d) Again, delete maximum element (replacing it by A[11]):

    3 9 18 7 7 16 9 3 1 5          initial array
    18 9 16 7 7 3 9 3 1 5          after calling deletemax(A,10)
5.9.7: Clearly, bubbleDown takes O(1) time plus the time of the recursive call. The
second argument of the recursive call is at least twice the value of the second formal
parameter i. When i exceeds n/2, no recursive call is made. Thus, no more
than log2 n recursive calls can result from an initial call to bubbleDown. Hence, the
total time is O(log n).
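To make the analysis concrete, here is a minimal sketch of a bubbleDown of the
kind the argument assumes, for a partially ordered tree stored in A[1..n]; the
array-parameter style and the helper swap (which exchanges A[i] and A[j]) are
our own choices, not necessarily the text's:

    void bubbleDown(int A[], int i, int n)
    {
        int child = 2*i;                       /* left child of i */

        if (child < n && A[child+1] > A[child])
            child++;                           /* take the larger child */
        if (child <= n && A[i] < A[child]) {
            swap(A, i, child);                 /* restore the heap property locally */
            bubbleDown(A, child, n);           /* second argument at least doubles */
        }
    }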
Section 5.10

Chapter 6. The List Data Model

Section 6.2
6.2.1:
a) The length is 5.
b) The prefixes are ε, (2), (2,7), (2,7,1), (2,7,1,8), and (2,7,1,8,2).
c) The suffixes are ε, (2), (8,2), (1,8,2), (7,1,8,2), and (2,7,1,8,2).
d) The sublists are ε, (2), (7), (1), (8), (2), (2,7), (7,1), (1,8), (8,2), (2,7,1), (7,1,8),
(1,8,2), (2,7,1,8), (7,1,8,2), and (2,7,1,8,2).
e) There are 31 distinct subsequences.
f) The first 2 is the head.
g) The list (7,1,8,2) is the tail.
h) There are five positions.
6.2.3: Prefixes: There are always exactly n + 1 prefixes, one each of the lengths 0
through n.

Sublists: First, suppose that all the positions of a string of length n hold
different symbols. Then there is one sublist of length 0, n different sublists of
length 1, n − 1 different sublists of length 2, n − 2 of length 3, and so on, for a total
of n(n + 1)/2 + 1. This is the maximum possible number. The minimum occurs
when all the positions hold the same symbol. Then all sublists of the same length
are the same, and there are only n + 1 different sublists.

Subsequences: Suppose all symbols are distinct. Then every subset of the n
positions yields a distinct subsequence, so there are 2^n subsequences. That is the
maximum number. If all positions hold the same symbol, then all subsequences
of the same length are the same, and we have n + 1 subsequences, the minimum
possible number.
6.2.5: (1,2,3) can represent an infinite number of different kinds of lists of lists,
including ((1),(2),(3)), ((1,2),(3)), ((1),(2,3)), ((1,2,3)), (((1,2,3))), ((((1,2,3)))), and
so on.
Section 6.3

6.3.1:
a) delete(5, L) = (3,1,4,1,9)
b) delete(1, L) = (3,4,1,5,9) or (3,1,4,5,9)
c) pop(L) removes 3 from L, leaving (1,4,1,5,9)
d) push(2, L) adds 2 to the beginning of L, giving (2,3,1,4,1,5,9)
e) lookup(6, L) returns FALSE
f) LM = (3,1,4,1,5,9,6,7,8)
g) first(L) = 3; last(L) = 9
h) retrieve(3, L) = 4, the element at position 3
i) length(L) = 5
j) isEmpty(L) = FALSE

6.3.3:
a) One condition under which delete(x, insert(x, L)) = L is true would be if
insert(x, L) always added x to the beginning of L and delete(x, L) removed
the first occurrence of x from L.
c) first(L) is always equal to retrieve(1, L).
Section 6.4

6.4.1:
a) Let T(n) be the running time of delete(x, L), where n is the length of list L.
The recurrence for T(n) is

    T(0) = a
    T(n) = b + T(n − 1), for n > 0

The solution to this recurrence is T(n) = a + bn.

6.4.3: Here is a program that inserts an element x into a sorted list L.

    void insert(ETYPE x, LIST* pL) {
        LIST M;

        if ((*pL) == NULL) {
            (*pL) = (LIST) malloc(sizeof(struct CELL));
            (*pL)->element = x;
            (*pL)->next = NULL;
        }
        else if (x > (*pL)->element)
            insert(x, &((*pL)->next));
        else { /* insert x between the cell holding pointer pL and
                  the cell pointed to by *pL */
            M = (LIST) malloc(sizeof(struct CELL));
            M->element = x;
            M->next = *pL;
            (*pL) = M;
        }
    }
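A short driver, assuming ETYPE is int and that LIST comes from
DefCell(int, CELL, LIST) as in the text's conventions, might exercise insert like
this:

    int main()
    {
        LIST L = NULL;

        insert(3, &L);
        insert(1, &L);
        insert(2, &L);
        /* L is now the sorted list (1, 2, 3) */
        return 0;
    }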
Section 6.5

6.5.1(b): The following procedure deletes element x from list L, using linear search
to locate x.

    void delete(ETYPE x, LIST* pL) {
        int i, j;

        i = 0;
        while (i < pL->length && x != pL->A[i]) i++;
        if (i < pL->length) {                /* x == pL->A[i] */
            for (j = i; j < pL->length - 1; j++)
                pL->A[j] = pL->A[j+1];       /* shift following elements forward */
            (pL->length)--;
        }
    }

In this case, line (8) of Fig. 6.14 correctly returns TRUE, since x = A[mid].
INDUCTION. Suppose that d ≥ 0 and that S(d) is true. We shall prove S(d + 1).
Suppose that x is in A[low..high], where high − low = d + 1. Since high > low,
the block consisting of lines (3) through (8) in Fig. 6.14 is executed. Line (3) computes
mid = ⌊(low + high)/2⌋. There are three cases to consider.

Case 1. If x < A[mid], then x is in A[low..mid-1]. Then, by the inductive
hypothesis, the call to binsearch(x, L, low, mid − 1) on line (5) finds x, since
mid − 1 − low ≤ d.

Case 2. If x > A[mid], then the call to binsearch(x, L, mid + 1, high) on line (7)
finds x.

Case 3. If x = A[mid], line (8) finds x and returns TRUE.

We have now proven the inductive step and conclude that S(d) is true for all
d ≥ 0.
Section 6.6

6.6.1: The following table shows the contents of the stack after each operation.
The top of the stack is on the right.

    STACK   ACTION
    a       push(a)
    ab      push(b)
    a       pop
    ac      push(c)
    acd     push(d)
    ac      pop
    ace     push(e)
    ac      pop
    a       pop
6.6.3: Let us assume that we have a well-formed prefix expression containing numbers
and binary operators. The following algorithm evaluates the expression; a sketch in
C appears below.

Step 1: Push the numbers and operators from the expression (from left to right)
onto the stack until the top three symbols on the stack are a binary operator θ, a
number a, and a number b (b is on top).

Step 2: Replace θ, a, and b on top of the stack by the result of applying the operator θ
to a and b.

Step 3: Repeat steps (1) and (2) until no more numbers or operators remain in the
prefix expression.
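Here is a minimal C sketch of this algorithm, for single-digit numbers and the
operators + and *; the parallel-array representation of the stack and all names are
our own illustration, not the text's:

    #include <stdio.h>

    /* stack of items: op[k] holds an operator character, or 0 when
       val[k] holds a number */
    static char op[100];
    static double val[100];
    static int top = 0;            /* number of stacked items */

    double evalPrefix(const char *s)
    {
        for ( ; *s != '\0'; s++) {
            if (*s == ' ') continue;
            if (*s == '+' || *s == '*') { op[top] = *s; top++; }
            else { op[top] = 0; val[top] = *s - '0'; top++; }
            /* reduce while the top three items are: operator, number, number */
            while (top >= 3 && op[top-3] != 0 && op[top-2] == 0 && op[top-1] == 0) {
                double a = val[top-2], b = val[top-1];
                double r = (op[top-3] == '+') ? a + b : a * b;
                top -= 3;
                op[top] = 0; val[top] = r; top++;
            }
        }
        return val[0];             /* a well-formed expression reduces to one number */
    }

    int main()
    {
        printf("%g\n", evalPrefix("+ 1 * 2 3"));   /* prints 7 */
        return 0;
    }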
CHAPTER 6. THE LIST DATA MODEL 35
Section 6.7

6.7.1: The first column in Fig. S6.1 shows the stack of activation records after
we have pushed an activation record for sum. The remaining columns show the
activation records just before we pop each activation record for sum off the stack.
We use ret to name the return value.
Section 6.8

6.8.1: Below are the contents of the queue after each command. The front of the
queue is at the left.

    QUEUE   ACTION
    a       enqueue(a)
    ab      enqueue(b)
    b       dequeue
    bc      enqueue(c)
    bcd     enqueue(d)
    cd      dequeue
    cde     enqueue(e)
    de      dequeue
    e       dequeue
Section 6.9

6.9.1:
a) 4. aana is one.
b) 7. bacbcab is one.

Fig. S6.1. The stacks of activation records for sum; the records for i = 1, 2, 3, 4, 5
come to hold ret = 100, 90, 70, 40, 0, respectively, and temp = 90, 70, 40, 0 (temp in
the record for i = 5 remains undefined).
6.9.3: There are 20 calls to L(1,1). Let C(i,j) be the number of calls to L(i,j)
when the strings of length i and j have no symbols in common. The definition of
the recursive algorithm tells us that

    C(i,j) = C(i−1,j) + C(i,j−1), whenever i > 0 and j > 0
    C(1,1) = 1
    C(i,0) = 0, for all i
    C(0,j) = 0, for all j

From these observations it follows that C(i,1) = 1 for all i ≥ 1 and C(1,j) = 1 for
all j ≥ 1. Thus, a simple induction on i + j shows that $C(i,j) = \binom{i+j-2}{i-1}$ for all
i ≥ 1 and j ≥ 1 (note $\binom{0}{0} = 1$). Thus, L(4,4) calls L(1,1) $\binom{6}{3} = 20$ times.
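As a quick cross-check of the closed form, the number of calls can also be counted
directly; this little function is our own illustration, mirroring the definition of
C(i,j) above:

    int calls(int i, int j)          /* number of calls to L(1,1) */
    {
        if (i == 0 || j == 0) return 0;
        if (i == 1 && j == 1) return 1;
        return calls(i-1, j) + calls(i, j-1);
    }
    /* calls(4, 4) returns 20, in agreement with the formula */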
Section 6.10

6.10.3: Let n be the maximum string length and c the number of characters per
cell. Assume both n and c are “large.” There are two sources of waste space:
the space used by pointers and the unused character space in the last cell. The
average number of cells will be about n/2c, because the average word is about n/2
characters long and is packed c characters to a cell. There is one pointer of 4 bytes in
each cell, so the number of waste bytes due to pointers is 2n/c. The number of waste
bytes in the last cell is c/2 on the average. We must thus find the value of c that
minimizes

    2n/c + c/2

If you know calculus, you know that the minimum occurs when both terms are
equal; that is, c^2 = 4n, or c = 2√n.

6.10.5: We are in effect replacing a single byte by a 4-byte integer, which costs us
3 bytes per word stored. If integers can be stored in one byte, then it is a wash; the
costs are the same.
Chapter 7. The Set Data Model

Section 7.2

7.2.1: The set {{a,b}, {a}, {b,c}} contains three members {a,b}, {a}, and {b,c}, each
of which is also a set.

7.2.3(a): One representation is {b, c, a}. Another, using abstraction, is
{x | x is a letter, a ≤ x, and x ≤ c}.
Section 7.3

INDUCTION. Suppose that the plane has been divided into 2^n regions with n sets.
Now consider adding an (n+1)st set. The new set partitions each of the 2^n existing
regions into two, containing those points within the new set and those without.
(Here is where the property that no set is a subset of another is used.) Therefore,
n + 1 sets partition the plane into 2 × 2^n = 2^{n+1} regions. Thus, the inductive step
is proved.

We conclude that a Venn diagram with n sets divides the plane into 2^n regions, for
all n ≥ 1.

If there are n ≥ 2 sets such that exactly one set is a subset of another, then a
Venn diagram for the n sets would have 2^{n−1} + 2^{n−2} nonempty regions. If S and
T are two of the n sets and S ⊆ T, then every member of S must be contained
within T. T can partition each of the 2^{n−2} regions formed by the other sets in two,
but the 2^{n−2} new regions formed by S must all be partitions of regions contained
within T.
7.3.7: We shall represent a set of elements (integers) by a linked list of SCELLs,
defined in the usual way by our DefCell macro:

    DefCell(int, SCELL, SLIST);

We shall represent the power set, whose elements are sets, by a linked list of PCELLs,
defined by:

    DefCell(SLIST, PCELL, PLIST);
    PLIST powerset(SLIST S) {
        PLIST P;
        SLIST newS;
        PLIST newP;
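        /* The rest of powerset was not reproduced in this copy. One way to
           complete it, as a sketch under the DefCell conventions: the power
           set of the empty set is { {} }; otherwise, take the power set of
           the tail and add, for each member set, a version with S->element
           adjoined (the new sets share cells with the old ones). */
        PLIST q;

        if (S == NULL) {
            newP = (PLIST) malloc(sizeof(struct PCELL));
            newP->element = NULL;            /* NULL represents the empty set */
            newP->next = NULL;
            return newP;
        }
        P = powerset(S->next);               /* power set without the first element */
        newP = P;
        for (q = P; q != NULL; q = q->next) {
            newS = (SLIST) malloc(sizeof(struct SCELL));
            newS->element = S->element;      /* adjoin S->element ...        */
            newS->next = q->element;         /* ... to a version of q's set  */
            P = (PLIST) malloc(sizeof(struct PCELL));
            P->element = newS;
            P->next = newP;
            newP = P;
        }
        return newP;
    }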
Section 7.4

A function to compute the union of the sets represented by the lists L and M (the
result list is named result below, because union is a reserved word in C):

    LIST setUnion(LIST L, LIST M) {
        LIST result, newCell;

        /* copy L to result */
        result = NULL;
        while (L != NULL) {
            newCell = (LIST) malloc(sizeof(struct CELL));
            newCell->element = L->element;
            newCell->next = result;
            result = newCell;
            L = L->next;
        }
        /* add those elements of M not already in result */
        while (M != NULL) {
            if (!lookup(M->element, result)) {
                newCell = (LIST) malloc(sizeof(struct CELL));
                newCell->element = M->element;
                newCell->next = result;
                result = newCell;
            }
            M = M->next;
        }
        return result;
    }
For the set difference, the sketch is

    for (each x on L)
        if (!lookup(x, M))
            insert(x, difference);

Again, the C implementation is similar to Fig. S7.2.
7.4.3: If we allowed union to use portions of the lists L and M in the answer, we
could simplify the union program in Figure 7.6 by replacing line (8) by

    union = M

7.4.5: We can write a program almost identical to that of Figure 7.6, replacing union
by symmetric difference. However, when the first elements of L and M are the same,
we discard both from the symmetric difference. Thus, we replace lines (11) and (12)
by

    (11)    else if (L->element == M->element)
    (12)        return symdiff(L->next, M->next);
Section 7.5

7.5.1:
a) A pinochle deck contains the A, K, Q, J, 10, and 9 of each suit. The characteristic
vector is thus

    1 0^7 1^6 0^7 1^6 0^7 1^6 0^7 1^5

b) The characteristic vector for the red cards (diamonds and hearts) is

    0^13 1^26 0^13

c) The Jack of hearts and the Jack of spades are one-eyed. The King of hearts is
the suicide king. The characteristic vector for these three cards is

    0^36 1 0 1 0^10 1 0^2
Section 7.6

7.6.3(a):

    void bucketDelete(ETYPE x, LIST* pL) {
        if ((*pL) != NULL) {
            if ((*pL)->element == x) (*pL) = (*pL)->next;
            else bucketDelete(x, &((*pL)->next));
        }
    }

7.6.3(b):

    BOOLEAN bucketLookup(ETYPE x, LIST L) {
        if (L == NULL) return FALSE;
        else if (L->element == x) return TRUE;
        else return bucketLookup(x, L->next);
    }
Section 7.7

7.7.1: Let A = {a} and B = {b}. Then A × B = {(a,b)} and B × A = {(b,a)}.

7.7.3:
a) R is a partial function, because every node in a tree has at most one parent.
b) R is not a total function from S to S, because the root of a tree has no parent.
c) R is never a one-to-one correspondence, because R is not a total function from
S to S.
d) The graph for R is isomorphic to the tree.

7.7.5: F is a partial function from S to T with the following properties:
1. For every element ((a,b),c) in S there is an element (a,(b,c)) in T such that
F(((a,b),c)) = (a,(b,c)).
2. For every element (a,(b,c)) in T there is an element ((a,b),c) in S such that
F(((a,b),c)) = (a,(b,c)).
3. For no b in T are there two distinct elements x1 and x2 in S such that F(x1) =
F(x2) = b.

Hence F is a one-to-one correspondence from S to T.
7.7.7:
a) The graph of the inverse of R is obtained by reversing the directions of the arcs
in the graph for R.
Section 7.8

7.8.1(a):

    void delete(DTYPE a, LIST* F) {
        if ((*F) != NULL) {
            if ((*F)->domain == a) (*F) = (*F)->next;
            else delete(a, &((*F)->next));
        }
    }

7.8.1(b):

    RTYPE lookup(DTYPE a, LIST F) {
        if (F == NULL) return UNDEFINED;
        else if (F->domain == a) return F->range;
        else return lookup(a, F->next);
    }

7.8.3(b):

    void deleteBucket(DTYPE a, LIST* pL) {
        if ((*pL) != NULL) {
            if ((*pL)->domain == a) (*pL) = (*pL)->next;
            else deleteBucket(a, &((*pL)->next));
        }
    }
Section 7.9

7.9.1: This is just a slight rewrite of the function lookup from Fig. 7.24. Here
we search the range rather than the domain for the value b, and produce a list of
varieties v such that (v, b) is in L.

    PLIST lookup(PVARIETY p, RLIST L) {
        PLIST P;

7.9.3(a):

    void insertP(PVARIETY p, PLIST* pL) {
        if ((*pL) == NULL) {
            (*pL) = malloc(sizeof(struct PCELL));
            (*pL)->pollinizer = p;
            (*pL)->next = NULL;
        }
        else if ((*pL)->pollinizer != p)
            insertP(p, &((*pL)->next));
    }
Section 7.10

7.10.1: Let R be the relation such that aRa for element a. Then R is reflexive on
the domain {a} but not on the domain {a, b}.
7.10.3:
a) R is not reflexive, because abcdRabcd is false when b ≠ a.
b) R is not symmetric, because if abcdRbcda is true, then bcdaRabcd is false when
a, b, c, and d are distinct letters.
c) R is not transitive, because if abcdRbcda and bcdaRcdab are true, then abcdRcdab
is false when a, b, c, and d are distinct.
d) R is neither antisymmetric nor transitive. Hence R is not a partial order.
e) R is not an equivalence relation, because it is not symmetric or transitive.
7.10.5: The problem with the “proof” is that there may be no y such that xRy.
Let D be the domain {a} and let R be the empty relation on D. Trivially, R is a
symmetric and transitive relation on D.

7.10.7: To count the number of arcs in the full graph, we need to count the number
of pairs of sets (S, T) such that S ⊆ T ⊆ U. Each of the n elements of U may be
placed in one of the following three sets: S, T − S, and U − T. By the method of
Section 4.2, there are 3^n ways to make this assignment. Thus, the full graph for
U has 3^n arcs.

In the reduced graph, each set has n arcs, one to each of the sets formed by
inserting or deleting one of the n elements of U. Since each arc is thus counted twice,
once for each end, the number of arcs is n2^n/2 = n2^{n−1}. Therefore, 3^n − n2^{n−1}
arcs are saved.
7.10.9: We shall prove by induction on n

STATEMENT S(n): If a0Ra1, a1Ra2, …, a_{n−1}Ra_n, and R is transitive, then a0Ra_n.

BASIS. n = 1. Clearly, a0Ra1 is true.

INDUCTION. We assume S(n) and prove S(n + 1). Consider the sequence of
n + 1 pairs a0Ra1, a1Ra2, …, a_{n−1}Ra_n, a_nRa_{n+1}. From the inductive hypothesis,
we know a0Ra_n. By transitivity, a0Ra_n and a_nRa_{n+1} imply a0Ra_{n+1}, proving the
inductive step.
7.10.11:
a) R is not reflexive, because aRa is false.
b) R is symmetric, because if aRb, then a and b have the common divisor in both
situations.
c) R is not transitive. For example, 2R6 and 6R9, but 2R9 is false, since 2 and 9
do not have a common divisor other than 1.
d) R is not a partial order, since R is neither transitive nor antisymmetric.
e) R is not an equivalence relation, since it is neither reflexive nor transitive.
Section 7.11

7.11.1: Let A be a set of sets and let E be an equipotence relation on A; that is,
S E T if there is a 1-1 correspondence from S to T. We shall show that E is (a)
reflexive, (b) symmetric, and (c) transitive.

a) S E S for every set S in A, because we can define the identity function f(x) = x
for all x in S as a 1-1 correspondence from S to itself.
b) If S E T, then T E S. Since S E T, there is a 1-1 correspondence f from S to
T. We can show that f^{-1}, the inverse of f, is a 1-1 correspondence from T to
S.
c) If S E T and T E R, then we shall show S E R. Let f be a 1-1 correspondence
from S to T and g a 1-1 correspondence from T to R. Then the composition, the
function that sends each x in S to g(f(x)), can be shown to be a 1-1 correspondence
from S to R.
Chapter 8. The Relational Data Model

Section 8.2

8.2.1:
a) For the relation StudentId-Name-Address-Phone we define the record structure:

    struct {
        int StudentId;
        char Name[30];
        char Address[50];
        char Phone[10];
    };

c) For Course-Day-Hour:

    struct {
        char Course[5];
        char Day[2];
        char Hour[4];
    };
Section 8.3

8.3.1:
a) {StudentId, Address} would be a key, assuming that a student has only one
phone at a given address.
b) We could use the relation scheme

    StudentId-Name-HomeAddress-LocalAddress-HomePhone-LocalPhone

StudentId is a key for this relation.
c) We need to separate the scheme into three schemes:

    StudentId-Name
    StudentId-Address
    StudentId-Phone

Any other decomposition either has redundancy or does not allow us to associate
names, ID's, addresses, and phones properly. StudentId is a key for each
scheme.
Section 8.4

8.4.1: For Exercise 8.3.2, we suggest a database scheme with three relations:
Section 8.5

Let us create a linked list of SNAME (“same name”) cells, each of which has as
element a pointer to a tuple with that name. We change the declaration of NODE in
Fig. 8.5 of the text by making the second field be the header of a linked list of SNAME
cells:

    typedef struct NODE *TREE;
    struct NODE {
        char Name[30];
        SNAMELIST tuples;
        TREE lc, rc;
    };

Here is an outline of a function printTuples(x, T) that prints all the tuples in the
binary tree T that have x for the Name attribute.

    void printTuples(char x[], TREE T) {
        if (T != NULL) {
            if (eq(x, T->Name)) printList(T->tuples);
            else if (lt(x, T->Name)) printTuples(x, T->lc);
            else printTuples(x, T->rc);
        }
    }
Section 8.6

Stage (1) takes O(n) time. Let us assume that k tuples with C. Brown in the Name
field are found.

In stage (2) we search through the Course-StudentId-Grade relation to find all
tuples whose StudentId matches the StudentId field of one of the k tuples found in
stage (1). If there are m tuples in the CSG relation, then stage (2) takes O(km) time.
Let us assume c CSG-tuples are found.

In stage (3) we search through the Course-Prerequisite relation to find all tuples
whose Course component matches the Course component of one of the c CSG-tuples
found in stage (2). For each CP-tuple found, we print the Prerequisite field.
Assuming that there are p tuples in the CP relation, this stage takes O(cp) time.

This three-stage process takes O(n + km + cp) time.
Section 8.7

8.7.1:
a) σ_{Course="CS101" AND Day="M"}(CDH)
b) σ_{Day="M" AND Hour="9AM"}(CDH)
c) CDH := CDH − σ_{Course="CS101"}(CDH)

8.7.3:
a) The courses taken by C. Brown can be expressed by the relational-algebra
expression

    X = π_{Course}(σ_{Name="C. Brown"}(SNAP) ⋈ CSG)

The prerequisites of courses taken by C. Brown can then be expressed by

    π_{Prerequisite}(X ⋈ CP)

Here, we must understand that X is a relation with attribute Course, so the
join is on equality of the one column of X with the Course attribute of CP.

b) The students taking courses in Turing Aud. can be expressed by

    Y = π_{StudentId}(CSG ⋈ σ_{Room="Turing Aud."}(CR))

The phone numbers of these students are given by

    π_{Phone}(Y ⋈ SNAP)

Here, Y is a relation with attribute StudentId.

c) The prerequisites of CS206 can be expressed by

    Z = π_{Prerequisite}(σ_{Course="CS206"}(CP))

The prerequisites of these courses are given by

    π_{Prerequisite}(Z ⋈ CP)

Here, Z must be regarded as a relation with attribute Course, not Prerequisite.
} Section 8.8
8.8.1:
a) Use the primary index on StudentId to find, in O(1) time, each tuple in the StudentId-Name-Address-Phone relation with StudentId = 12345. Include the tuple in the answer if its address is not 45 Kumquat Blvd.
b) Use the secondary index on Phone to find, in O(1) time, each tuple in the SNAP relation with Phone = 555-1357. Include the tuple in the answer if the name matches C. Brown.
c) Neither index helps. The easiest (and fastest) way to proceed is to iterate through all the tuples in the SNAP relation, selecting those tuples where Name = "C. Brown" is true or Phone = 555-1357 is true (or both are true). If there are n tuples in the SNAP relation, then this process takes O(n) time.
8.8.3:
a) There are two nested loops, each of which iterates n times. The body of the inner loop is a comparison of tuples and so takes O(1) time. The total time is thus O(n^2).
b) We can sort the relations in O(n log n) time. As we compare tuples, we find O(n^{3/2}) matches, and this cost dominates the sorting. The total time is thus O(n^{3/2}).
c) We must consider each of the n tuples in S. For each one, we look up the matching tuples in R through the index, taking time proportional to the number of matching tuples found. Since there are n^{3/2} matches in all, the sum of the number of matches, over all tuples in S, is n^{3/2}, and thus the total time is O(n^{3/2}).
d) Same as (c).
8.8.5:
a) Suppose R and S each have the scheme {A, B} and S has an index on attribute A. To compute R ∩ S, consider each tuple (a, b) in R and use the index with value a to find all tuples (a, x) in S. Include (a, b) in R ∩ S if (a, b) is among them (a sketch of this test follows part (b)). If A is a key for relation S, then in this way we can compute R ∩ S in time proportional to the sum of the sizes of R and S.
b) The answer is similar to part (a), except that we include (a, b) in R − S if (a, b) is not in S.
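Here is a C sketch of the membership test of part (a). The TUPLE layout, the CELL list type, and the lookup function (assumed to use S's index on attribute A and return a list of the S-tuples with that A-value) are illustrative assumptions.

struct TUPLE { int A, B; };

typedef struct CELL *LIST;
struct CELL {
    struct TUPLE tuple;
    LIST next;
};

LIST lookup(int a);   /* assumed: index lookup on attribute A of S */

/* Return 1 if tuple t of R is also in S; include t in R ∩ S if so. */
int inS(struct TUPLE t)
{
    LIST matches;
    for (matches = lookup(t.A); matches != NULL; matches = matches->next)
        if (matches->tuple.B == t.B)
            return 1;
    return 0;
}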
✦ Section 8.9
8.9.3: We have, in the solution to Exercise 8.7.1, described each expression with
the selections and projections pushed down as far as they go.
8.9.5: First, we prove that if a tuple t is in σ_C(R ⋈ S), then it is in σ_C(R) ⋈ S. Since t is in σ_C(R ⋈ S), t satisfies condition C and is in R ⋈ S. If t is in R ⋈ S, then there are tuples r in R and s in S that agree with t on their common attributes and also agree with each other on the join attributes. Since σ_C(R) makes sense, condition C must involve only attributes of R. Since r and t agree on those attributes, and t satisfies C, r also satisfies C. Thus, in σ_C(R) ⋈ S, tuples r and s join to make t, proving that t is in σ_C(R) ⋈ S.
Conversely, suppose t is in σ_C(R) ⋈ S. Then there are r in σ_C(R) and s in S that agree with t and with each other on common attributes. Since r is in σ_C(R), r must be in R and must satisfy C. Therefore t, which agrees with r on whatever attributes C mentions, must also satisfy C. When we form R ⋈ S, r in R and s in S join to form t. Since t satisfies C, we conclude that t is in σ_C(R ⋈ S).
8.9.7: Let R be the relation {(a, b), (a, c)} and S the relation {(a, c)}. Both relations have attributes A and B. Here, π_A(R − S) = {a}, but π_A(R) − π_A(S) = {a} − {a} = ∅. Thus, π_A(R − S) ≠ π_A(R) − π_A(S).
✦✦
✦✦ Chapter 9. The Graph Data Model
✦ Section 9.2
9.2.1:
a) There are 8 arcs.
b) There are 2 simple paths: ad and abcd.
c) Nodes a and e are predecessors of node b.
d) Nodes c and f are successors of node b.
e) There are 5 simple cycles: abfa, abcdefa, adefa, bcdeb, adebfa.
f) abfabfa is the only nonsimple cycle of length at most 7.
9.2.3: A complete directed graph has an arc from each node to every node, including itself. A complete directed graph has the maximum possible number of arcs: an n-node complete directed graph has n^2 arcs. Thus, a 10-node graph has at most 100 arcs. The smallest number of arcs any graph can have is zero.
9.2.5: The maximum number of arcs an n-node acyclic directed graph can have is n(n − 1)/2, the number of pairs of distinct nodes. To see this, we can start with a complete n-node undirected graph, in which there is an edge between every pair of distinct nodes; such a graph has n(n − 1)/2 edges. We can then assign a direction to each edge {i, j}, directing it from node i to node j if i < j. We can show that the resulting directed graph is acyclic and has the maximum possible number of arcs.
9.2.7: The cycle (0, 1, 2, 0) can also be written as the cycles (1, 2, 0, 1) and (2, 0, 1, 2).
9.2.9: Let S be the relation defined on the subset of the nodes of the graph that are involved in simple cycles. To show that S is an equivalence relation, we need to show that it is reflexive, symmetric, and transitive.
a) Reflexivity. If node u is involved in a simple cycle, then there is a simple cycle that begins and ends at u. Thus, uSu.
b) Symmetry. If uSv, then vSu, because u and v are included in the same simple cycle.
c) Transitivity. Suppose uSv and vSw. Then there are two intersecting simple cycles, one including u and v and the other including v and w. Let a and b be the first and last nodes of intersection on the simple cycle from u through v and back to u. Then u, …, a, …, w, …, b, …, u is a simple cycle that includes u and w. Therefore, uSw.
✦ Section 9.3
We declare the array of adjacency lists by

LIST successors[MAX];

The successors lists are:

a: b, d
b: c, f
c: d
d: e
e: b, f
f: a

The adjacency matrix is:

    a  b  c  d  e  f
a   0  1  0  1  0  0
b   0  0  1  0  0  1
c   0  0  0  1  0  0
d   0  0  0  0  1  0
e   0  1  0  0  0  1
f   1  0  0  0  0  0
Note that we must leave room for the null character '\0' at the end of the two-character arc label. (We have not shown the null character in the following figures.) We declare the array of list headers by

struct {
    char nodeLabel;
    LIST successor;
} headers[MAX];
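The cells on these successor lists are not declared in the solution. A plausible declaration, consistent with the note about the null character, is the following; the field names are illustrative.

typedef struct CELL *LIST;
struct CELL {
    NODE toNode;     /* index of the successor node */
    char label[3];   /* two-character arc label, plus room for '\0' */
    LIST next;
};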
The headers array and its successor lists are:

0  A:  (1, ab), (3, ad)
1  B:  (2, bc), (5, bf)
2  C:  (3, cd)
3  D:  (4, de)
4  E:  (1, eb), (5, ef)
5  F:  (0, fa)

The labeled adjacency matrix is:

    a   b   c   d   e   f
a   -   ab  -   ad  -   -
b   -   -   bc  -   -   bf
c   -   -   -   cd  -   -
d   -   -   -   -   de  -
e   -   eb  -   -   -   ef
f   fa  -   -   -   -   -
STATEMENT S(n): In an undirected graph with n nodes and e edges, the sum of the degrees of the nodes is 2e.

BASIS. For the basis, we choose n = 1. A single-node graph has 0 edges, and the sum of the degrees of the nodes is 0.

INDUCTION. Assume S(n) holds for all graphs of n nodes. Consider any graph G of n + 1 nodes, and pick a node x in G with m edges incident upon x. If we remove x and all edges incident upon x from G, we are left with a graph G′ of n nodes and some number e of edges. By the inductive hypothesis, the sum of the degrees of the nodes of G′ is 2e. When we restore x to G′ along with its incident edges, we see that G has m + e edges, and the sum of the degrees of its nodes is 2e + 2m = 2(m + e), since each restored edge contributes 1 to the degree of x and 1 to the degree of its other endpoint. This proves the inductive step.
9.3.7: For an undirected graph, an edge appears twice, in both an adjacency-matrix and an adjacency-list representation.
a) Function to insert edge {a, b} into an adjacency matrix:

void insert(NODE a, NODE b, BOOLEAN edges[MAX][MAX])
{
    edges[a][b] = TRUE;
    edges[b][a] = TRUE;
}

The delete function is similar, except that we make the two entries in the array FALSE; a sketch follows.
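For instance, it might be written as follows (a sketch under the same declarations):

void delete(NODE a, NODE b, BOOLEAN edges[MAX][MAX])
{
    /* remove edge {a,b} by clearing both symmetric entries */
    edges[a][b] = FALSE;
    edges[b][a] = FALSE;
}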
b) Function to insert edge {a, b} for an adjacency-list representation:

void insert(NODE a, NODE b, LIST successors[])
{
    insertList(b, &successors[a]);
    insertList(a, &successors[b]);
}
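A matching deletion for the list representation (as a separate sketch) might be the following, assuming a deleteList function, analogous to insertList, that removes one occurrence of its first argument from the named list:

void delete(NODE a, NODE b, LIST successors[])
{
    /* remove b from a's list and a from b's list */
    deleteList(b, &successors[a]);
    deleteList(a, &successors[b]);
}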
✦ Section 9.4
✦ Section 9.5
9.5.6:
a) It is easy to see why a node of odd degree inhibits an Euler circuit. The circuit must visit the node some number of times, and each time it visits, it enters and leaves on two different edges (the direction in which we traverse the circuit is arbitrary, but once we pick a direction, the edges incident upon a node v can be identified as entering or leaving). It follows that there are as many entering edges as leaving edges for a node v, and therefore the number of edges incident upon v is even.
For the converse, we need to show how to construct an Euler circuit when all the degrees are even. We do so in part (b), where we are asked not only to produce an algorithm to construct Euler circuits, but to design an efficient algorithm.
b) We need to use an appropriate data structure: adjacency lists, plus a list of all the edges. We also need to generalize the notion of an Euler circuit to cover the case in which the graph has more than one connected component. In that case, we say an "Euler circuit" for the graph is an Euler circuit for each connected component. Start with any edge, say {v0, v1}, and arbitrarily pick one of the nodes, say v0, as the beginning of a path. Extend the path to nodes v2, v3, and so on, without reusing any edge, which we may do since every time we enter a node, we know it has even degree, so there is an unused edge by which to leave. Eventually, we repeat a node on the path, say vi.
Now, remove the edges of the cycle, say vi, vi+1, …, vk, vi, but leave the nodes. Recursively find an "Euler circuit" for the remaining graph, but start with the portion of the path already constructed, v0, v1, …, vi. Note that we quote "Euler circuit" because the resulting graph may not be connected. Finally, we need to assemble an Euler circuit for the entire graph. We use the removed cycle vi, …, vk, vi as a base and follow it around. Each time we visit a node, say vj, if we have not previously visited any nodes from its connected component, we follow the Euler circuit for this connected component, starting and ending at vj. Then we continue around the cycle to vj+1. When we return around the cycle to vi, we have an Euler circuit for the entire graph.
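The construction just described can be rendered compactly in C. The following iterative sketch (essentially the same edge-removal idea, often called Hierholzer's algorithm) assumes a connected graph with all degrees even, stored as an edge-count adjacency matrix; the representation and names are assumptions, not the book's. It prints the nodes of an Euler circuit; they come out in reverse order of discovery, which, for a circuit, is itself an Euler circuit.

#include <stdio.h>
#define MAX 20

int adj[MAX][MAX];   /* adj[u][v] = number of unused edges between u and v */
int n;               /* number of nodes */

void eulerCircuit(int start)
{
    int stack[MAX * MAX], top = 0, v, w;
    stack[top++] = start;
    while (top > 0) {
        v = stack[top - 1];
        for (w = 0; w < n && adj[v][w] == 0; w++)
            ;                                 /* find an unused edge out of v */
        if (w == n)
            printf("%d ", stack[--top]);      /* no unused edge: emit v */
        else {
            adj[v][w]--;                      /* use edge {v,w} once */
            adj[w][v]--;
            stack[top++] = w;                 /* extend the path to w */
        }
    }
    printf("\n");
}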
✦ Section 9.6
9.6.3:
a) Tree arcs: ab, bc, cd, de, ef
   Forward arcs: ad, bf
   Backward arcs: eb, fa
   There are no cross arcs.
b) Tree arcs: ab, bf, bc, cd, de
   Forward arcs: ad
   Backward arcs: eb, ef, fa
   There are no cross arcs.
c) Tree arcs: de, ef, fa, ab, bc
   Forward arcs: eb
   Backward arcs: ad, bf, cd
   There are no cross arcs.
✦ Section 9.7
✦ Section 9.8
9.8.1:
CITY              DISTANCE
Detroit           0
Ann Arbor         28
Escanaba          INFTY
Flint             58
Grand Rapids      138
Kalamazoo         138
Lansing           78
Marquette         INFTY
Menominee         INFTY
Saginaw           89
Sault Ste. Marie  INFTY
9.8.3(b):
SPECIFIES TIME
AF 0
AA 0.8
HH 1.0
AR 1.3
HE 2.2
AB 1.2
HS 2.9
✦ Section 9.10
9.10.1:
a) The chromatic number of the graph in Fig. 9.4 is 3.
b) The clique number is 3.
c) {Maili, Pearl City, Wahiawa} and {Hilo, Kamuela, Kona} are both cliques of size 3.
✦✦
✦✦ Chapter 10. Patterns, Automata, and Regular Expressions
✦ Section 10.2
10.2.1:
a) The following automaton accepts strings of 0's and 1's having an even number of 1's. (Diagram not reproduced: two states, with the accepting start state meaning an even number of 1's seen so far; each 1 switches states, and each 0 leaves the state unchanged.)
The next automaton has states 0 through 3, with state 3 looping on both 0 and 1. (Diagram not reproduced.) In this automaton, state 0 means the previous input symbol was not a 1, state 1 means the previous input symbol was a 1 and the one before that was not a 1, state 2 means the two previous input symbols were 1's, and state 3 means the three previous input symbols were 1's.
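The first automaton is easy to simulate directly; the following C sketch (mine, not the book's) returns 1 exactly when its argument, a string of 0's and 1's, contains an even number of 1's.

int acceptsEvenOnes(const char *s)
{
    int state = 0;                /* state 0: even number of 1's seen (accepting) */
    for ( ; *s != '\0'; s++)
        if (*s == '1')
            state = 1 - state;    /* each 1 switches states; each 0 changes nothing */
    return state == 0;
}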
✦ Section 10.3
10.3.3: The nondeterministic automaton in Fig. (a) accepts all strings of letters ending in father, man, or son.
Fig. (a). Automaton accepting strings ending in father, man, or son. (Diagram not reproduced: from start state 0, which loops on every letter, chains of states spell out f-a-t-h-e-r, m-a-n through states 7, 8, 9, and s-o-n through states 10, 11, 12, each chain ending in an accepting state.)
10.3.5: Figures (b) and (c) simulate the automata in Figs. 10.10 and 10.11, respectively, on the input summand. (The tables of state sets after each prefix of the input are not reproduced.)
✦ Section 10.4
✦ Section 10.5
10.5.1: The two regular expressions (ac | abc | abbc) and a(c | b(c | bc)) also define the same language.
10.5.3:
a) b*(aa*b*)*
b) (0 | 1 | ··· | 9)(0 | 1 | ··· | 9)*.(0 | 1 | ··· | 9)*
c) 0*(10*10*)*
10.5.5:
a) (a | (bc)) | (de)
b) (a | (b*)) | ((a | b)*a)
10.5.7:
a) ∅ | ε defines either the empty set or the set containing the empty string.
c) (a | b)* defines the set of all strings of a's and b's (including the empty string).
e) (a*ba*b)*a* defines the set of all strings of a's and b's containing an even number of b's.
g) R** is the same as R*; that is, it defines the set consisting of the concatenations of zero or more strings from the set defined by R.
✦ Section 10.6
10.6.1:
a) Single-character operators and punctuation symbols in C:
!"#$%&'()*+,-./:;<=>?[]^{}~
c) Lower-case consonants:
[bcdfghjklmnpqrstvwxyz]
✦ Section 10.7
✦ Section 10.8
10.8.1:
a) Automaton for aaa. (Diagram not reproduced: a chain from start state 0 through intermediate states, with an a-transition at each step, ending in an accepting state.)
✦ Section 10.9
10.9.1:
a) (Σ − a)*a(Σ − e)*e(Σ − i)*i(Σ − o)*o(Σ − u)*u
c) Σ*man
e) (Σ − a)*a(Σ − a)*a
g) Σ*man
✦✦
✦✦ Chapter 11. Recursive Description of Patterns
✦ Section 11.2
✦ Section 11.3
11.3.1: The new strings for the languages S and L are tabulated in Fig. (a).
            S                 L
Round 4:    wcdwcdwcds        wcdwcds
            wcdbse            bse
            bs;wcdse          s;wcds;s
            bs;se             s;wcds;wcds
            bwcdse            s;wcds;wcdwcds
                              s;wcds;bse
                              s;s;s
                              s;s;wcds
                              s;s;wcdwcds
                              s;s;bse
                              wcds;s
                              wcds;wcds
                              wcds;wcdwcds
                              wcds;bse
                              s;wcdwcds
                              s;bse
11.3.3:
            Fig. 11.3    Fig. 11.4
Round 1:
Round 2:    ()           ()
Round 3:    ()()         (())
            (())         ()()
            (())()

On round 3, the grammars generate different sets of strings. Thus, the answer to the question is "no." In fact, on all subsequent rounds the sets of strings generated by the two grammars are different. However, the sets of strings generated taken over all the rounds are the same; both are the set of all balanced parenthesis strings.
11.3.5: Suppose we are generating round r. If we make a substitution that uses only strings available on round r − 2 or earlier, then the same substitution could have been made on round r − 1. Thus, the string generated by this substitution must have appeared on round r − 1 or on some round earlier than that.
✦ Section 11.4
(b) Parse tree for 35+21. (Diagram not reproduced: the root <E> has children <E> + <E>, and each subsidiary <E> derives its number through <N> and <D> nodes.)
(c) Parse tree for (()()). (Diagram not reproduced: a tree of <B> nodes, each expanded by <B> → ( <B> ) <B> or by the empty-string production.)
✦ Section 11.5
(d) Parse tree for (1+2)/3. (Diagram not reproduced: the root <E> derives <T> / <F>; the <T> derives <F> → ( <E> ), whose <E> derives 1 + 2, and the final <F> derives 3 through <N> and <D>.)
(e) Sequence of calls made on input (()). (Diagram not reproduced: it shows five recursive calls, with call 1 outermost.)
✦ Section 11.6
✦ Section 11.7
11.7.1(a):
      STACK       LOOKAHEAD   REMAINING INPUT
1)    <S>         b           seENDM
2)    b<L>e       b           seENDM
3)    <L>e        s           eENDM
4)    <S><T>e     s           eENDM
5)    s<T>e       s           eENDM
6)    <T>e        e           ENDM
7)    e           e           ENDM
8)                ENDM
11.7.1(c): See Fig. (f), a table similar to the one above (not reproduced here).
11.7.5: We factor the first two productions to get

<Statement> → if condition then <Statement> <Tail>
<Statement> → simpleStat
<Tail> → else <Statement> | ε

When <Statement> is on top of the stack, the lookahead symbol, if or simpleStat, tells us which production for <Statement> to use. When we need to expand <Tail>, we make the first choice on lookahead else and the second choice (ε) on any other lookahead; a recursive-descent rendering of these decisions follows.
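In the recursive-descent style of this chapter, the two decisions might be coded as follows; the token codes and the lookahead and match helpers are assumptions for the sketch.

enum { IF, CONDITION, THEN, ELSE, SIMPLESTAT };

int lookahead(void);      /* assumed: returns the current token without consuming it */
void match(int token);    /* assumed: checks and consumes the current token */
void Tail(void);

void Statement(void)
{
    if (lookahead() == IF) {        /* <Statement> -> if condition then <Statement> <Tail> */
        match(IF); match(CONDITION); match(THEN);
        Statement();
        Tail();
    }
    else                            /* <Statement> -> simpleStat */
        match(SIMPLESTAT);
}

void Tail(void)
{
    if (lookahead() == ELSE) {      /* <Tail> -> else <Statement> */
        match(ELSE);
        Statement();
    }
    /* on any other lookahead, choose <Tail> -> epsilon: consume nothing */
}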
✦ Section 11.8
11.8.1(a):
<A> → a
<B> → b
<C> → <A> | <B>
<D> → <C><D> | ε
<E> → <D><A>
11.8.1(c):
<A> → a
<B> → b
<C> → c
<D> → <A><D> | ε
<E> → <B><E> | ε
<F> → <C><F> | ε
<G> → <D><E>
<H> → <G><F>
11.8.3: If L were defined by a regular expression, then it would also be accepted by a finite automaton. Suppose the language L = {0^n 1 0^n | n ≥ 0} is the language of some finite automaton A. Let A have m states. Consider what happens when A has input 0^m 1 0^m. This string is in the language L, so there is a path labeled 0^m 1 0^m from the start state of A to some final state f. Consider the first m + 1 states along this path. As A has only m different states, there must be two numbers of 0's, say i and j, with 0 ≤ i < j ≤ m, such that after reading i 0's and again after reading a total of j 0's, A is in the same state, say s.
Now, consider what happens when the input to A is 0^{m−j+i} 1 0^m. The first i 0's get us to state s. The remainder of the input, 0^{m−j} 1 0^m, takes us to state f, because we know that when the input was 0^m 1 0^m, A was in state s after reading the first j 0's and went from there to state f on the rest of the input. Thus, A accepts 0^{m−j+i} 1 0^m, which is not in L, since j > i. This contradicts our assumption that A accepts the language L. Since we assumed nothing except that A accepts L, we conclude that no finite automaton accepts L. Hence, L cannot be defined by a regular expression.
✦✦
✦✦ Chapter 12. Propositional Logic
✦ Section 12.3
12.3.1(a): In this and the next answer we use 0 for FALSE and 1 for TRUE. The function has a domain consisting of pairs of truth values, for p and q respectively, and a range that is a truth value. This function can therefore be represented as the set
{((0,0), 0), ((0,1), 0), ((1,0), 1), ((1,1), 1)}
12.3.1(c): {((0,0), 1), ((0,1), 0), ((1,0), 0), ((1,1), 1)}
✦ Section 12.4
12.4.1(a): A row has 1 unless both of the given columns have 1 in that row.
12.4.1(c): A row has 1 if the two given columns agree in the row, and 0 if not.
12.4.3: The logical expression p AND NOT q corresponds to the set expression P − Q.
12.4.5: Here are the 16 Boolean functions of two variables.
p q   f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15
0 0   0  1  0  1  0  1  0  1  0  1  0   1   0   1   0   1
0 1   0  0  1  1  0  0  1  1  0  0  1   1   0   0   1   1
1 0   0  0  0  0  1  1  1  1  0  0  0   0   1   1   1   1
1 1   0  0  0  0  0  0  0  0  1  1  1   1   1   1   1   1

Functions f0, f5, f10, and f15 do not depend on their first argument. That is, these columns agree in the rows for pq = 00 and pq = 10, and they also agree in the rows for pq = 01 and pq = 11. Functions f0, f3, f12, and f15 do not depend on their second argument.
12.4.7:
STATEMENT S(b): There are a^b ways to paint b houses using a colors.
BASIS. b = 1. There are a^1 = a color choices for one house.
INDUCTION. Assume S(b) and prove S(b + 1). Consider the (b + 1)st house. For each of the a color choices for this house, there are, by the inductive hypothesis, a^b ways to paint the remaining b houses. Thus, there are a · a^b = a^{b+1} color choices for the b + 1 houses.
✦ Section 12.5
✦ Section 12.6
The Karnaugh maps (rows labeled by pq, columns by rs):

         rs
         00  01  11  10
pq  00    0   1   1   1
    01    1   1   1   1
    11    1   1   0   1
    10    1   1   1   1

         rs
         00  01  11  10
pq  00    0   1   0   1
    01    1   0   1   0
    11    0   1   1   1
    10    1   0   1   0
12.6.5:
(a) (p + q + r + s)(p + q + r + s)
(c) (p + q + r + s)(p + q + r + s)(p + q + r + s)(p + q + r + s)(p + q + r + s)(p + q + r + s)(p + q + r + s)
(e) (p + q)(p + r)
         rs
         00  01  11  10
pq  00    1   1   1   1
    01    1   1   1   1
    11    0   0   0   0
    10    1   1   0   0
✦ Section 12.7
✦ Section 12.8
To begin,

NOT(p1 p2 ··· pk+1) ≡ NOT(p1 p2 ··· pk) + NOT pk+1     (1)

by (12.20a), with p1 p2 ··· pk in place of p and pk+1 in place of q.
By the inductive hypothesis, NOT(p1 p2 ··· pk) ≡ (NOT p1 + NOT p2 + ··· + NOT pk). When we make this substitution in (1) and use the associative law of +, we get exactly S(k + 1).
We can also look at the proof from the point of view of truth tables. It is easy to observe that both sides of S(k) have value 1 except when all of the pi's are 1.
The two proofs for (12.20d) have essentially the same ideas.
12.8.9: The question is ill-formed in two ways. First, there is the matter of a typo: k should be n. More serious, there is a simple, noninductive proof of (12.24b), given (12.24a). By (12.24a), with p1 p2 ··· pn in place of p, we have (p1 p2 ··· pn → q) ≡ NOT(p1 p2 ··· pn) + q. By (12.20c) and the associative law of +, (p1 p2 ··· pn → q) ≡ (NOT p1 + NOT p2 + ··· + NOT pn + q).
12.8.11:
(a) (wx + wxy + zxw) ≡ (wx + zxw) ≡ (wx)
(b) (w + x)(w + y + z)(w + x + y)(x) ≡ (w + x)(w + y + z)(x) ≡ (w + y + z)(x)
✦ Section 12.9
AND_{i=0}^{2^k − 1} Ci → q
✦ Section 12.10
12.10.1(a):
1)  p → q                                            Hypothesis
2)  (p → q) ≡ (NOT p + q)                            Law 12.24(a)
3)  NOT p + q                                        (d) with lines (1) and (2)
4)  p → r                                            Hypothesis
5)  (p → r) ≡ (NOT p + r)                            Law 12.24(a)
6)  NOT p + r                                        (d) with (4) and (5)
7)  (NOT p + q) AND (NOT p + r)                      (c) with (3) and (6)
8)  ((NOT p + q) AND (NOT p + r)) ≡ (NOT p + qr)     Law 12.14
9)  NOT p + qr                                       (d) with (7) and (8)
10) (NOT p + qr) ≡ (p → qr)                          Law 12.24(a)
11) p → qr                                           (d) with (9) and (10)
12.10.1(b):
1)  p → (q + r)                                                   Hypothesis
2)  (p → (q + r)) ≡ (NOT p + q + r)                               Law 12.24(a)
3)  NOT p + q + r                                                 (d) with (1) and (2)
4)  p → (q + NOT r)                                               Hypothesis
5)  (p → (q + NOT r)) ≡ (NOT p + q + NOT r)                       Law 12.24(a)
6)  NOT p + q + NOT r                                             (d) with (4) and (5)
7)  (NOT p + q + r)(NOT p + q + NOT r)                            (c) with (3) and (6)
8)  (NOT p + q + r)(NOT p + q + NOT r) ≡ (NOT p + q + (r AND NOT r))   Law 12.14
9)  NOT p + q + (r AND NOT r)                                     (d) with (7) and (8)
10) (r AND NOT r) ≡ 0                                             Law 12.27
11) NOT p + q + 0                                                 Substitution into (9), using (10)
12) NOT p + q                                                     Law 12.11
13) (NOT p + q) ≡ (p → q)                                         Law 12.24(a)
14) p → q                                                         (d) with (12) and (13)
✦ Section 12.11
12.11.1:
p q r   p+q   NOT p + r   (p+q)(NOT p + r)   q+r   (p+q)(NOT p + r) → (q+r)
0 0 0    0        1               0            0               1
0 0 1    0        1               0            1               1
0 1 0    1        1               1            1               1
0 1 1    1        1               1            1               1
1 0 0    1        0               0            0               1
1 0 1    1        1               1            1               1
1 1 0    1        0               0            1               1
1 1 1    1        1               1            1               1
Since the final column is all 1's, the expression is a tautology.
✦✦
✦✦ Chapter 13. Using Logic to Design Components
✦ Section 13.3
13.3.1(a): (circuit diagram not reproduced)
✦ Section 13.4
(circuit diagrams, with inputs w, x, y, and z, not reproduced)
✦ Section 13.5
13.5.1: Using OR-gates with fan-in k, we can take the OR of n inputs with delay ⌈log_k n⌉, by using a complete k-ary tree of OR-gates. If we used a cascading circuit like that shown in Fig. 13.13 of the text, the delay would be (n − k)/(k − 1) + 1.
13.5.3: Let 2^k be the smallest power of 2 that is no less than n. Then we can take the OR of n inputs with k levels of 2-input OR-gates. That k levels suffice should be obvious: with that many levels, we can take the OR of 2^k inputs, which is at least n inputs. If n is strictly less than 2^k, we can set 2^k − n of the inputs to 0. That may let us eliminate some of the OR-gates; eliminating gates cannot increase the number of levels.
Also, we cannot take the OR of n inputs in fewer than k levels. In k − 1 levels, we can only take the OR of 2^{k−1} inputs, which is strictly less than n, because we chose k so that 2^k is the smallest power of 2 equal to or greater than n.
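As a worked instance (the numbers are mine, not the text's):

\[
n = 5:\qquad 2^{k-1} < n \le 2^{k} \;\Longrightarrow\; 2^{2} = 4 < 5 \le 8 = 2^{3},
\]

so k = 3 levels of 2-input OR-gates are both sufficient and necessary for n = 5.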
✦ Section 13.6
✦ Section 13.7
(Figure: a tree of 1-MUXes with selector inputs x1 and x2, producing output y indexed by (x1 x2) read as a binary number; diagram not reproduced.)
13.7.3: In Fig. (d) is the suggestion of a circuit that follows the second strategy of the hint. Two one-hot decoders for d inputs each are used. The first has as its inputs the first d of the 2d bits, and the second has the last d bits. Their outputs, y1, …, y_{2^d} and z1, …, z_{2^d}, are combined in all possible ways through AND-gates, to create 2^{2d} outputs, one for each possible setting of the 2d inputs.
Now, let us consider the gate count and delay for this circuit. For the case d = 1, there is an obvious basis circuit that uses only a single inverter; it has gate count 1 and delay 1. For the inductive step, the delay increases by only 1 in going from d to 2d inputs. Thus, the delay for d inputs is easily seen to be 1 + log_2 d.
For the gate count, note that the circuit for 2d inputs uses twice the gates of the d-input circuit, plus 2^{2d} AND-gates at the last level. Thus, the recurrence for G(d), the number of gates in the d-input circuit, is
BASIS. G(1) = 1.
INDUCTION. G(2d) = 2G(d) + 2^{2d}.
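Unrolling the recurrence for the first few powers of 2 (the arithmetic here is mine, not the text's):

\[
G(1) = 1,\qquad
G(2) = 2\cdot 1 + 2^{2} = 6,\qquad
G(4) = 2\cdot 6 + 2^{4} = 28,\qquad
G(8) = 2\cdot 28 + 2^{8} = 312.
\]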
Fig. (d). Two d-input one-hot decoders, with inputs x1, …, xd and xd+1, …, x2d and outputs y1, …, y_{2^d} and z1, …, z_{2^d}, feeding a level of AND-gates. (Diagram not reproduced.)
✦ Section 13.8
13.8.1: (circuit diagram not reproduced: a memory element with inputs load and in and output out)
✦✦
✦✦ Chapter 14. Predicate Logic
✦ Section 14.2
14.2.1:
a) CS205 is a variable.
b) cs205 is a constant.
c) 205 is a constant.
d) "cs205" is a constant.
e) p(X, x) is a nonground atomic formula.
f) p(3, 4, 5) is a ground atomic formula.
g) "p(3, 4, 5)" is a constant.
✦ Section 14.3
✦ Section 14.4
14.4.1:
a) (∀X)(∃Y) NOT (p(X) OR p(Y) AND q(X))
b) (∃X)(NOT p(X) AND ((∃Y)p(Y) OR (∃X)q(X, Z)))
14.4.3: (∃X)(NOT p(X) AND ((∃Y)p(Y) OR (∃W)q(W, Z)))
14.4.5:
a) (∀C)csg(C, "C. Brown", "A")
b) (∃C) NOT csg(C, "C. Brown", "A")
✦ Section 14.5
14.5.1(a): Consider the interpretation I1:
1. D = {a, b}
2. loves(X, Y) is true if XY is one of aa, ab, ba, bb
Under this interpretation ("everyone loves everyone"), expression (a) is true.
Now consider the interpretation I2:
1. D = {a, b}
2. loves(X, Y) is true if XY is one of ba, bb
Under this interpretation ("a is a misanthrope"), expression (a) is false.
14.5.1(b): Interpretation I1:
1. D = {a}
2. p(a) is true
Under I1, expression (b) is true.
Interpretation I2:
1. D = {a}
2. p(a) is false
Under I2, expression (b) is false.
14.5.1(c): Interpretation I1:
1. D = {a}
2. p(a) is true
Under I1, expression (c) is true.
Interpretation I2:
1. D = {a, b}
2. p(a) is true, p(b) is false
Under I2, expression (c) is false, because p(a) → (∀X)p(X) is false.
14.5.1(d): Interpretation I1:
1. D = {a, b, c}
2. p(X, Y) is true if XY is one of ab, bc, ac
Under I1, expression (d) is true.
Interpretation I2:
1. D = {a, b, c}
2. p(X, Y) is true if XY is one of ab, bc
Under I2, expression (d) is false.
✦ Section 14.6
14.6.1:
a) (r OR s) ≡ (s OR r) is a tautology in propositional logic (law 12.7). The predicate-logic expression (p(X) OR q(Y)) ≡ (q(Y) OR p(X)) is derived by substituting p(X) for r and q(Y) for s.
✦ Section 14.7
14.7.1:
a) (∃X) NOT p(X) AND (∃Y)(p(Y) OR (∃W) q(W, Z))
b) (∃X)((∃Y)p(Y) OR (∃Z)q(Z) OR r(X))
14.7.3: Technically, law (14.12) does not allow us to change the binding of any variable occurrence. Thus, law (14.12) does not allow us to conclude
(p(X, Y) AND (∀X)q(X)) ≡ (∀X)(p(X, Y) AND q(X))
However, the two expressions are equivalent for other reasons.