Algorithm Design

Contents
Preface
3 Graphs 73
3.1 Basic Definitions and Applications 73
3.2 Graph Connectivity and Graph Traversal 78
3.3 Implementing Graph Traversal Using Queues and Stacks 87
3.4 Testing Bipartiteness: An Application of Breadth-First Search 94
3.5 Connectivity in Directed Graphs 97
6 Dynamic Programming 251
6.1 Weighted Interval Scheduling: A Recursive Procedure 252
6.2 Principles of Dynamic Programming: Memoization or Iteration over Subproblems 258
6.3 Segmented Least Squares: Multi-way Choices 261

8 NP and Computational Intractability 451
8.1 Polynomial-Time Reductions 452
8.2 Reductions via "Gadgets": The Satisfiability Problem 459
8.3 Efficient Certification and the Definition of NP 463
8.4 NP-Complete Problems 466
8.5 Sequencing Problems 473
8.6 Partitioning Problems 481
8.7 Graph Coloring 485
8.8 Numerical Problems 490
8.9 Co-NP and the Asymmetry of NP 495
8.10 A Partial Taxonomy of Hard Problems 497
Exercises 505
Notes and Further Reading 529

9 PSPACE: A Class of Problems beyond NP 531
9.1 PSPACE 531
9.2 Some Hard Problems in PSPACE 533
9.3 Solving Quantified Problems and Games in Polynomial Space 536
9.4 Solving the Planning Problem in Polynomial Space 538
Exercises 550
Notes and Further Reading 551

10 Extending the Limits of Tractability 553
10.1 Finding Small Vertex Covers 554
10.2 Solving NP-Hard Problems on Trees 558
10.3 Coloring a Set of Circular Arcs 563
* 10.4 Tree Decompositions of Graphs 572
Exercises 594
Notes and Further Reading 598

11 Approximation Algorithms 599
11.1 Greedy Algorithms and Bounds on the Optimum: A Load Balancing Problem 600
11.3 Set Cover: A General Greedy Heuristic 612
11.4 The Pricing Method: Vertex Cover 618
11.5 Maximization via the Pricing Method: The Disjoint Paths Problem 624
11.6 Linear Programming and Rounding: An Application to Vertex Cover 630
* 11.7 Load Balancing Revisited: A More Advanced LP Application 637
11.8 Arbitrarily Good Approximations: The Knapsack Problem 644
Exercises 651
Notes and Further Reading 659

12 Local Search 661
12.1 The Landscape of an Optimization Problem 662
12.2 The Metropolis Algorithm and Simulated Annealing 666
12.3 An Application of Local Search to Hopfield Neural Networks
12.5 Choosing a Neighbor Relation 679
12.6 Classification via Local Search 681
Exercises 702
Notes and Further Reading 705

13 Randomized Algorithms 707
13.1 A First Application: Contention Resolution 708
13.2 Finding the Global Minimum Cut 714
13.3 Random Variables and Their Expectations 719
13.4 A Randomized Approximation Algorithm for MAX 3-SAT 724
13.5 Randomized Divide and Conquer: Median-Finding and Quicksort 727
13.6 Hashing: A Randomized Implementation of Dictionaries 734
13.7 Finding the Closest Pair of Points: A Randomized Approach 741
13.8 Randomized Caching 750
13.9 Chernoff Bounds 758
13.10 Load Balancing 760
13.11 Packet Routing 762
13.12 Background: Some Basic Probability Definitions 769
Exercises 782
Notes and Further Reading 793

* The star indicates an optional section. (See the Preface for more information about the relationships among the chapters and sections.)
Preface

Algorithmic ideas are pervasive, and their reach is apparent in examples both
within computer science and beyond. Some of the major shifts in Internet
routing standards can be viewed as debates over the deficiencies of one
shortest-path algorithm and the relative advantages of another. The basic
notions used by biologists to express similarities among genes and genomes
have algorithmic definitions. The concerns voiced by economists over the
feasibility of combinatorial auctions in practice are rooted partly in the fact that
these auctions contain computationally intractable search problems as special
cases. And algorithmic notions aren’t just restricted to well-known and long-
standing problems; one sees the reflections of these ideas on a regular basis,
in novel issues arising across a wide range of areas. The scientist from Yahoo!
who told us over lunch one day about their system for serving ads to users was
describing a set of issues that, deep down, could be modeled as a network flow
problem. So was the former student, now a management consultant working
on staffing protocols for large hospitals, whom we happened to meet on a trip
to New York City.
The point is not simply that algorithms have many applications. The
deeper issue is that the subject of algorithms is a powerful lens through which
to view the field of computer science in general. Algorithmic problems form
the heart of computer science, but they rarely arrive as cleanly packaged,
mathematically precise questions. Rather, they tend to come bundled together
with lots of messy, application-specific detail, some of it essential, some of it
extraneous. As a result, the algorithmic enterprise consists of two fundamental
components: the task of getting to the mathematically clean core of a problem,
and then the task of identifying the appropriate algorithm design techniques,
based on the structure of the problem. These two components interact: the
more comfortable one is with the full array of possible design techniques,
the more one starts to recognize the clean formulations that lie within messy
problems out in the world. At their most effective, then, algorithmic ideas do not just provide solutions to well-posed problems; they form the language that lets you cleanly express the underlying questions.

The goal of our book is to convey this approach to algorithms, as a design process that begins with problems arising across the full range of computing applications, builds on an understanding of algorithm design techniques, and results in the development of efficient solutions to these problems. We seek to explore the role of algorithmic ideas in computer science generally, and relate these ideas to the range of precisely formulated problems for which we can design and analyze algorithms. In other words, what are the underlying issues that motivate these problems, and how did we choose these particular ways of formulating them? How did we recognize which design principles were appropriate in different situations?

In keeping with this, our goal is to offer advice on how to identify clean algorithmic problem formulations in complex issues from different areas of computing and, from this, how to design efficient algorithms for the resulting problems. Sophisticated algorithms are often best understood by reconstructing the sequence of ideas--including false starts and dead ends--that led from simpler initial approaches to the eventual solution. The result is a style of exposition that does not take the most direct route from problem statement to algorithm, but we feel it better reflects the way that we and our colleagues genuinely think about these questions.

Overview

The book is intended for students who have completed a programming-based two-semester introductory computer science sequence (the standard "CS1/CS2" courses) in which they have written programs that implement basic algorithms, manipulate discrete structures such as trees and graphs, and apply basic data structures such as arrays, lists, queues, and stacks. Since the interface between CS1/CS2 and a first algorithms course is not entirely standard, we begin the book with self-contained coverage of topics that at some institutions are familiar to students from CS1/CS2, but which at other institutions are included in the syllabi of the first algorithms course. This material can thus be treated either as a review or as new material; by including it, we hope the book can be used in a broader array of courses, and with more flexibility in the prerequisite knowledge that is assumed.

In keeping with the approach outlined above, we develop the basic algorithm design techniques by drawing on problems from across many areas of computer science and related fields. To mention a few representative examples here, we include fairly detailed discussions of applications from systems and networks (caching, switching, interdomain routing on the Internet), artificial intelligence (planning, game playing, Hopfield networks), computer vision (image segmentation), data mining (change-point detection, clustering), operations research (airline scheduling), and computational biology (sequence alignment, RNA secondary structure).

The notion of computational intractability, and NP-completeness in particular, plays a large role in the book. This is consistent with how we think about the overall process of algorithm design. Some of the time, an interesting problem arising in an application area will be amenable to an efficient solution, and some of the time it will be provably NP-complete; in order to fully address a new algorithmic problem, one should be able to explore both of these options with equal familiarity. Since so many natural problems in computer science are NP-complete, the development of methods to deal with intractable problems has become a crucial issue in the study of algorithms, and our book heavily reflects this theme. The discovery that a problem is NP-complete should not be taken as the end of the story, but as an invitation to begin looking for approximation algorithms, heuristic local search techniques, or tractable special cases. We include extensive coverage of each of these three approaches.

Problems and Solved Exercises

An important feature of the book is the collection of problems. Across all chapters, the book includes over 200 problems, almost all of them developed and class-tested in homework or exams as part of our teaching of the course at Cornell. We view the problems as a crucial component of the book, and they are structured in keeping with our overall approach to the material. Most of them consist of extended verbal descriptions of a problem arising in an application area in computer science or elsewhere out in the world, and part of the problem is to practice what we discuss in the text: setting up the necessary notation and formalization, designing an algorithm, and then analyzing it and proving it correct. (We view a complete answer to one of these problems as consisting of all these components: a fully explained algorithm, an analysis of the running time, and a proof of correctness.) The ideas for these problems come in large part from discussions we have had over the years with people working in different areas, and in some cases they serve the dual purpose of recording an interesting (though manageable) application of algorithms that we haven't seen written down anywhere else.

To help with the process of working on these problems, we include in each chapter a section entitled "Solved Exercises," where we take one or more problems and describe how to go about formulating a solution. The discussion devoted to each solved exercise is therefore significantly longer than what would be needed simply to write a complete, correct solution (in other words,
significantly longer than what it would take to receive full credit if these were being assigned as homework problems). Rather, as with the rest of the text, the discussions in these sections should be viewed as trying to give a sense of the larger process by which one might think about problems of this type, culminating in the specification of a precise solution.

It is worth mentioning two points concerning the use of these problems as homework in a course. First, the problems are sequenced roughly in order of increasing difficulty, but this is only an approximate guide and we advise against placing too much weight on it: since the bulk of the problems were designed as homework for our undergraduate class, large subsets of the problems in each chapter are really closely comparable in terms of difficulty. Second, aside from the lowest-numbered ones, the problems are designed to involve some investment of time, both to relate the problem description to the algorithmic techniques in the chapter, and then to actually design the necessary algorithm. In our undergraduate class, we have tended to assign roughly three of these problems per week.

Pedagogical Features and Supplements

In addition to the problems and solved exercises, the book has a number of further pedagogical features, as well as additional supplements to facilitate its use for teaching.

As noted earlier, a large number of the sections in the book are devoted to the formulation of an algorithmic problem--including its background and underlying motivation--and the design and analysis of an algorithm for this problem. To reflect this style, these sections are consistently structured around a sequence of subsections: "The Problem," where the problem is described and a precise formulation is worked out; "Designing the Algorithm," where the appropriate design technique is employed to develop an algorithm; and "Analyzing the Algorithm," which proves properties of the algorithm and analyzes its efficiency. These subsections are highlighted in the text with an icon depicting a feather. In cases where extensions to the problem or further analysis of the algorithm is pursued, there are additional subsections devoted to these issues. The goal of this structure is to offer a relatively uniform style of presentation that moves from the initial discussion of a problem arising in a computing application through to the detailed analysis of a method to solve it.

A number of supplements are available in support of the book itself. An instructor's manual works through all the problems, providing full solutions to each. A set of lecture slides, developed by Kevin Wayne of Princeton University, is also available; these slides follow the order of the book's sections and can thus be used as the foundation for lectures in a course based on the book. These files are available at www.aw.com. For instructions on obtaining a professor login and password, search the site for either "Kleinberg" or "Tardos" or contact your local Addison-Wesley representative.

Finally, we would appreciate receiving feedback on the book. In particular, as in any book of this length, there are undoubtedly errors that have remained in the final version. Comments and reports of errors can be sent to us by e-mail, at the address [email protected]; please include the word "feedback" in the subject line of the message.

Chapter-by-Chapter Synopsis

Chapter 1 starts by introducing some representative algorithmic problems. We begin immediately with the Stable Matching Problem, since we feel it sets up the basic issues in algorithm design more concretely and more elegantly than any abstract discussion could: stable matching is motivated by a natural though complex real-world issue, from which one can abstract an interesting problem statement and a surprisingly effective algorithm to solve this problem. The remainder of Chapter 1 discusses a list of five "representative problems" that foreshadow topics from the remainder of the course. These five problems are interrelated in the sense that they are all variations and/or special cases of the Independent Set Problem; but one is solvable by a greedy algorithm, one by dynamic programming, one by network flow, one (the Independent Set Problem itself) is NP-complete, and one is PSPACE-complete. The fact that closely related problems can vary greatly in complexity is an important theme of the book, and these five problems serve as milestones that reappear as the book progresses.

Chapters 2 and 3 cover the interface to the CS1/CS2 course sequence mentioned earlier. Chapter 2 introduces the key mathematical definitions and notations used for analyzing algorithms, as well as the motivating principles behind them. It begins with an informal overview of what it means for a problem to be computationally tractable, together with the concept of polynomial time as a formal notion of efficiency. It then discusses growth rates of functions and asymptotic analysis more formally, and offers a guide to commonly occurring functions in algorithm analysis, together with standard applications in which they arise. Chapter 3 covers the basic definitions and algorithmic primitives needed for working with graphs, which are central to so many of the problems in the book. A number of basic graph algorithms are often implemented by students late in the CS1/CS2 course sequence, but it is valuable to present the material here in a broader algorithm design context. In particular, we discuss basic graph definitions, graph traversal techniques such as breadth-first search and depth-first search, and directed graph concepts including strong connectivity and topological ordering.
Chapters 2 and 3 also present many of the basic data structures that will be used for implementing algorithms throughout the book; more advanced data structures are presented in subsequent chapters. Our approach to data structures is to introduce them as they are needed for the implementation of the algorithms being developed in the book. Thus, although many of the data structures covered here will be familiar to students from the CS1/CS2 sequence, our focus is on these data structures in the broader context of algorithm design and analysis.

Chapters 4 through 7 cover four major algorithm design techniques: greedy algorithms, divide and conquer, dynamic programming, and network flow. With greedy algorithms, the challenge is to recognize when they work and when they don't; our coverage of this topic is centered around a way of classifying the kinds of arguments used to prove greedy algorithms correct. This chapter concludes with some of the main applications of greedy algorithms, for shortest paths, undirected and directed spanning trees, clustering, and compression. For divide and conquer, we begin with a discussion of strategies for solving recurrence relations as bounds on running times; we then show how familiarity with these recurrences can guide the design of algorithms that improve over straightforward approaches to a number of basic problems, including the comparison of rankings, the computation of closest pairs of points in the plane, and the Fast Fourier Transform. Next we develop dynamic programming by starting with the recursive intuition behind it, and subsequently building up more and more expressive recurrence formulations through applications in which they naturally arise. This chapter concludes with extended discussions of the dynamic programming approach to two fundamental problems: sequence alignment, with applications in computational biology; and shortest paths in graphs, with connections to Internet routing protocols. Finally, we cover algorithms for network flow problems, devoting much of our focus in this chapter to discussing a large array of different flow applications. To the extent that network flow is covered in algorithms courses, students are often left without an appreciation for the wide range of problems to which it can be applied; we try to do justice to its versatility by presenting applications to load balancing, scheduling, image segmentation, and a number of other problems.

Chapters 8 and 9 cover computational intractability. We devote most of our attention to NP-completeness, organizing the basic NP-complete problems thematically to help students recognize candidates for reductions when they encounter new problems. We build up to some fairly complex proofs of NP-completeness, with guidance on how one goes about constructing a difficult reduction. We also consider types of computational hardness beyond NP-completeness, particularly through the topic of PSPACE-completeness. We find this is a valuable way to emphasize that intractability doesn't end at NP-completeness, and PSPACE-completeness also forms the underpinning for some central notions from artificial intelligence--planning and game playing--that would otherwise not find a place in the algorithmic landscape we are surveying.

Chapters 10 through 12 cover three major techniques for dealing with computationally intractable problems: identification of structured special cases, approximation algorithms, and local search heuristics. Our chapter on tractable special cases emphasizes that instances of NP-complete problems arising in practice may not be nearly as hard as worst-case instances, because they often contain some structure that can be exploited in the design of an efficient algorithm. We illustrate how NP-complete problems are often efficiently solvable when restricted to tree-structured inputs, and we conclude with an extended discussion of tree decompositions of graphs. While this topic is more suitable for a graduate course than for an undergraduate one, it is a technique with considerable practical utility for which it is hard to find an existing accessible reference for students. Our chapter on approximation algorithms discusses both the process of designing effective algorithms and the task of understanding the optimal solution well enough to obtain good bounds on it. As design techniques for approximation algorithms, we focus on greedy algorithms, linear programming, and a third method we refer to as "pricing," which incorporates ideas from each of the first two. Finally, we discuss local search heuristics, including the Metropolis algorithm and simulated annealing. This topic is often missing from undergraduate algorithms courses, because very little is known in the way of provable guarantees for these algorithms; however, given their widespread use in practice, we feel it is valuable for students to know something about them, and we also include some cases in which guarantees can be proved.

Chapter 13 covers the use of randomization in the design of algorithms. This is a topic on which several nice graduate-level books have been written. Our goal here is to provide a more compact introduction to some of the ways in which students can apply randomized techniques using the kind of background in probability one typically gains from an undergraduate discrete math course.

Use of the Book

The book is primarily designed for use in a first undergraduate course on algorithms, but it can also be used as the basis for an introductory graduate course.

When we use the book at the undergraduate level, we spend roughly one lecture per numbered section; in cases where there is more than one
lecture's worth of material in a section (for example, when a section provides further applications as additional examples), we treat this extra material as a supplement that students can read about outside of lecture. We skip the starred sections; while these sections contain important topics, they are less central to the development of the subject, and in some cases they are harder as well. We also tend to skip one or two other sections per chapter in the first half of the book (for example, we tend to skip Sections 4.3, 4.7-4.8, 5.5-5.6, 6.5, 7.6, and 7.11). We cover roughly half of each of Chapters 11-13.

This last point is worth emphasizing: rather than viewing the later chapters as "advanced," and hence off-limits to undergraduate algorithms courses, we have designed them with the goal that the first few sections of each should be accessible to an undergraduate audience. Our own undergraduate course involves material from all these chapters, as we feel that all of these topics have an important place at the undergraduate level.

Finally, we treat Chapters 2 and 3 primarily as a review of material from earlier courses; but, as discussed above, the use of these two chapters depends heavily on the relationship of each specific course to its prerequisites.

The resulting syllabus looks roughly as follows: Chapter 1; Chapters 4-8 (excluding 4.3, 4.7-4.9, 5.5-5.6, 6.5, 6.10, 7.4, 7.6, 7.11, and 7.13); Chapter 9 (briefly); Chapter 10, Sections 10.1 and 10.2; Chapter 11, Sections 11.1, 11.2, 11.6, and 11.8; Chapter 12, Sections 12.1-12.3; and Chapter 13, Sections 13.1-13.5.

The book also naturally supports an introductory graduate course on algorithms. Our view of such a course is that it should introduce students destined for research in all different areas to the important current themes in algorithm design. Here we find the emphasis on formulating problems to be useful as well, since students will soon be trying to define their own research problems in many different subfields. For this type of course, we cover the later topics in Chapters 4 and 6 (Sections 4.5-4.9 and 6.5-6.10), cover all of Chapter 7 (moving more rapidly through the early sections), quickly cover NP-completeness in Chapter 8 (since many beginning graduate students will have seen this topic as undergraduates), and then spend the remainder of the time on Chapters 10-13. Although our focus in an introductory graduate course is on the more advanced sections, we find it useful for the students to have the full book to consult for reviewing or filling in background knowledge, given the range of different undergraduate backgrounds among the students in such a course.

Finally, the book can be used to support self-study by graduate students, researchers, or computer professionals who want to get a sense for how they might be able to use particular algorithm design techniques in the context of their own work. A number of graduate students and colleagues have used portions of the book in this way.

Acknowledgments

This book grew out of the sequence of algorithms courses that we have taught at Cornell. These courses have grown, as the field has grown, over a number of years, and they reflect the influence of the Cornell faculty who helped to shape them during this time, including Juris Hartmanis, Monika Henzinger, John Hopcroft, Dexter Kozen, Ronitt Rubinfeld, and Sam Toueg. More generally, we would like to thank all our colleagues at Cornell for countless discussions both on the material here and on broader issues about the nature of the field.

The course staffs we've had in teaching the subject have been tremendously helpful in the formulation of this material. We thank our undergraduate and graduate teaching assistants, Siddharth Alexander, Rie Ando, Elliot Anshelevich, Lars Backstrom, Steve Baker, Ralph Benzinger, John Bicket, Doug Burdick, Mike Connor, Vladimir Dizhoor, Shaddin Doghmi, Alexander Druyan, Bowei Du, Sasha Evfimievski, Ariful Gani, Vadim Grinshpun, Ara Hayrapetyan, Chris Jeuell, Igor Kats, Omar Khan, Mikhail Kobyakov, Alexei Kopylov, Brian Kulis, Amit Kumar, Yeongwee Lee, Henry Lin, Ashwin Machanavajjhala, Ayan Mandal, Bill McCloskey, Leonid Meyerguz, Evan Moran, Niranjan Nagarajan, Tina Nolte, Travis Ortogero, Martin Pál, Jon Peress, Matt Piotrowski, Joe Polastre, Mike Priscott, Xin Qi, Venu Ramasubramanian, Aditya Rao, David Richardson, Brian Sabino, Rachit Siamwalla, Sebastian Silgardo, Alex Slivkins, Chaitanya Swamy, Perry Tam, Nadya Travinin, Sergei Vassilvitskii, Matthew Wachs, Tom Wexler, Shan-Leung Maverick Woo, Justin Yang, and Misha Zatsman. Many of them have provided valuable insights, suggestions, and comments on the text. We also thank all the students in these classes who have provided comments and feedback on early drafts of the book over the years.

For the past several years, the development of the book has benefited greatly from the feedback and advice of colleagues who have used prepublication drafts for teaching. Anna Karlin fearlessly adopted a draft as her course textbook at the University of Washington when it was still in an early stage of development; she was followed by a number of people who have used it either as a course textbook or as a resource for teaching: Paul Beame, Allan Borodin, Devdatt Dubhashi, David Kempe, Gene Kleinberg, Dexter Kozen, Amit Kumar, Mike Molloy, Yuval Rabani, Tim Roughgarden, Alexa Sharp, Shanghua Teng, Aravind Srinivasan, Dieter van Melkebeek, Kevin Wayne, Tom Wexler, and
Sue Whitesides. We deeply appreciate their input and advice, which has informed many of our revisions to the content. We would like to additionally thank Kevin Wayne for producing supplementary material associated with the book, which promises to greatly extend its utility to future instructors.

In a number of other cases, our approach to particular topics in the book reflects the influence of specific colleagues. Many of these contributions have undoubtedly escaped our notice, but we especially thank Yuri Boykov, Ron Elber, Dan Huttenlocher, Bobby Kleinberg, Evie Kleinberg, Lillian Lee, David McAllester, Mark Newman, Prabhakar Raghavan, Bart Selman, David Shmoys, Steve Strogatz, Olga Veksler, Duncan Watts, and Ramin Zabih.

It has been a pleasure working with Addison Wesley over the past year. First and foremost, we thank Matt Goldstein for all his advice and guidance in this process, and for helping us to synthesize a vast amount of review material into a concrete plan that improved the book. Our early conversations about the book with Susan Hartman were extremely valuable as well. We thank Matt and Susan, together with Michelle Brown, Marilyn Lloyd, Patty Mahtani, and Maite Suarez-Rivas at Addison Wesley, and Paul Anagnostopoulos and Jacqui Scarlott at Windfall Software, for all their work on the editing, production, and management of the project. We further thank Paul and Jacqui for their expert composition of the book. We thank Joyce Wells for the cover design, Nancy Murphy of Dartmouth Publishing for her work on the figures, Ted Laux for the indexing, and Carol Leyba and Jennifer McClain for the copyediting and proofreading.

We thank Anselm Blumer (Tufts University), Richard Chang (University of Maryland, Baltimore County), Kevin Compton (University of Michigan), Diane Cook (University of Texas, Arlington), Sariel Har-Peled (University of Illinois, Urbana-Champaign), Sanjeev Khanna (University of Pennsylvania), Philip Klein (Brown University), David Matthias (Ohio State University), Adam Meyerson (UCLA), Michael Mitzenmacher (Harvard University), Stephan Olariu (Old Dominion University), Mohan Paturi (UC San Diego), Edgar Ramos (University of Illinois, Urbana-Champaign), Sanjay Ranka (University of Florida, Gainesville), Leon Reznik (Rochester Institute of Technology), Subhash Suri (UC Santa Barbara), Dieter van Melkebeek (University of Wisconsin, Madison), and Bulent Yener (Rensselaer Polytechnic Institute), who generously contributed their time to provide detailed and thoughtful reviews of the manuscript; their comments led to numerous improvements, both large and small, in the final version of the text.

Finally, we thank our families--Lillian and Alice, and David, Rebecca, and Amy. We appreciate their support, patience, and many other contributions more than we can express in any acknowledgments here.

This book was begun amid the irrational exuberance of the late nineties, when the arc of computing technology seemed, to many of us, briefly to pass through a place traditionally occupied by celebrities and other inhabitants of the pop-cultural firmament. (It was probably just in our imaginations.) Now, several years after the hype and stock prices have come back to earth, one can appreciate that in some ways computer science was forever changed by this period, and in other ways it has remained the same: the driving excitement that has characterized the field since its early days is as strong and enticing as ever, the public's fascination with information technology is still vibrant, and the reach of computing continues to extend into new disciplines. And so to all students of the subject, drawn to it for so many different reasons, we hope you find this book an enjoyable and useful guide wherever your computational pursuits may take you.

Jon Kleinberg
Éva Tardos
Ithaca, 2005
1.1 A First Problem: Stable Matching
As an opening topic, we look at an algorithmic problem that nicely illustrates
many of the themes we will be emphasizing. It is motivated by some very
natural and practical concerns, and from these we formulate a clean and
simple statement of a problem. The algorithm to solve the problem is very
clean as well, and most of our work will be spent in proving that it is correct
and giving an acceptable bound on the amount of time it takes to terminate
with an answer. The problem itself--the Stable Matching Problem--has several
origins.
The Problem
The Stable Matching Problem originated, in part, in 1962, when David Gale
and Lloyd Shapley, two mathematical economists, asked the question: Could
one design a college admissions process, or a job recruiting process, that was
self-enforcing? What did they mean by this?
To set up the question, let’s first think informally about the kind of situation
that might arise as a group of friends, all juniors in college majoring in
computer science, begin applying to companies for summer internships. The
crux of the application process is the interplay between two different types
of parties: companies (the employers) and students (the applicants). Each
applicant has a preference ordering on companies, and each company--once
the applications come in--forms a preference ordering on its applicants. Based
on these preferences, companies extend offers to some of their applicants,
applicants choose which of their offers to accept, and people begin heading
off to their summer internships.
Gale and Shapley considered the sorts of things that could start going wrong with this process, in the absence of any mechanism to enforce the status quo. Suppose, for example, that your friend Raj has just accepted a summer job at the large telecommunications company CluNet. A few days later, the small start-up company WebExodus, which had been dragging its feet on making a few final decisions, calls up Raj and offers him a summer job as well. Now, Raj actually prefers WebExodus to CluNet--won over perhaps by the laid-back, anything-can-happen atmosphere--and so this new development may well cause him to retract his acceptance of the CluNet offer and go to WebExodus instead. Suddenly down one summer intern, CluNet offers a job to one of its wait-listed applicants, who promptly retracts his previous acceptance of an offer from the software giant Babelsoft, and the situation begins to spiral out of control.

Things look just as bad, if not worse, from the other direction. Suppose that Raj's friend Chelsea, destined to go to Babelsoft but having just heard Raj's story, calls up the people at WebExodus and says, "You know, I'd really rather spend the summer with you guys than at Babelsoft." They find this very easy to believe; and furthermore, on looking at Chelsea's application, they realize that they would have rather hired her than some other student who actually is scheduled to spend the summer at WebExodus. In this case, if WebExodus were a slightly less scrupulous company, it might well find some way to retract its offer to this other student and hire Chelsea instead.

Situations like this can rapidly generate a lot of chaos, and many people--both applicants and employers--can end up unhappy with the process as well as the outcome. What has gone wrong? One basic problem is that the process is not self-enforcing--if people are allowed to act in their self-interest, then it risks breaking down.

We might well prefer the following, more stable situation, in which self-interest itself prevents offers from being retracted and redirected. Consider another student, who has arranged to spend the summer at CluNet but calls up WebExodus and reveals that he, too, would rather work for them. But in this case, based on the offers already accepted, they are able to reply, "No, it turns out that we prefer each of the students we've accepted to you, so we're afraid there's nothing we can do." Or consider an employer, earnestly following up with its top applicants who went elsewhere, being told by each of them, "No, I'm happy where I am." In such a case, all the outcomes are stable--there are no further outside deals that can be made.

So this is the question Gale and Shapley asked: Given a set of preferences among employers and applicants, can we assign applicants to employers so that for every employer E, and every applicant A who is not scheduled to work for E, at least one of the following two things is the case?

(i) E prefers every one of its accepted applicants to A; or
(ii) A prefers her current situation over working for employer E.

If this holds, the outcome is stable: individual self-interest will prevent any applicant/employer deal from being made behind the scenes.

Gale and Shapley proceeded to develop a striking algorithmic solution to this problem, which we will discuss presently. Before doing this, let's note that this is not the only origin of the Stable Matching Problem. It turns out that for a decade before the work of Gale and Shapley, unbeknownst to them, the National Resident Matching Program had been using a very similar procedure, with the same underlying motivation, to match residents to hospitals. Indeed, this system, with relatively little change, is still in use today.

This is one testament to the problem's fundamental appeal. And from the point of view of this book, it provides us with a nice first domain in which to reason about some basic combinatorial definitions and the algorithms that build on them.

Formulating the Problem To get at the essence of this concept, it helps to make the problem as clean as possible. The world of companies and applicants contains some distracting asymmetries. Each applicant is looking for a single company, but each company is looking for many applicants; moreover, there may be more (or, as is sometimes the case, fewer) applicants than there are available slots for summer jobs. Finally, each applicant does not typically apply to every company.

It is useful, at least initially, to eliminate these complications and arrive at a more "bare-bones" version of the problem: each of n applicants applies to each of n companies, and each company wants to accept a single applicant. We will see that doing this preserves the fundamental issues inherent in the problem; in particular, our solution to this simplified version will extend directly to the more general case as well.

Following Gale and Shapley, we observe that this special case can be viewed as the problem of devising a system by which each of n men and n women can end up getting married: our problem naturally has the analogue of two "genders"--the applicants and the companies--and in the case we are considering, everyone is seeking to be paired with exactly one individual of the opposite gender.1

1 Gale and Shapley considered the same-sex Stable Matching Problem as well, where there is only a single gender. This is motivated by related applications, but it turns out to be fairly different at a technical level. Given the applicant-employer application we're considering here, we'll be focusing on the version with two genders.
So consider a set M = {m_1, ..., m_n} of n men, and a set W = {w_1, ..., w_n} of n women. Let M × W denote the set of all possible ordered pairs of the form (m, w), where m ∈ M and w ∈ W. A matching S is a set of ordered pairs, each from M × W, with the property that each member of M and each member of W appears in at most one pair in S. A perfect matching S' is a matching with the property that each member of M and each member of W appears in exactly one pair in S'.

Matchings and perfect matchings are objects that will recur frequently throughout the book; they arise naturally in modeling a wide range of algorithmic problems. In the present situation, a perfect matching corresponds simply to a way of pairing off the men with the women, in such a way that everyone ends up married to somebody, and nobody is married to more than one person--there is neither singlehood nor polygamy.

Now we can add the notion of preferences to this setting. Each man m ∈ M ranks all the women; we will say that m prefers w to w' if m ranks w higher than w'. We will refer to the ordered ranking of m as his preference list. We will not allow ties in the ranking. Each woman, analogously, ranks all the men.

Given a perfect matching S, what can go wrong? Guided by our initial motivation in terms of employers and applicants, we should be worried about the following situation: There are two pairs (m, w) and (m', w') in S (as depicted in Figure 1.1) with the property that m prefers w' to w, and w' prefers m to m'. In this case, there's nothing to stop m and w' from abandoning their current partners and heading off together; the set of marriages is not self-enforcing. We'll say that such a pair (m, w') is an instability with respect to S: (m, w') does not belong to S, but each of m and w' prefers the other to their partner in S.

[Figure 1.1: Perfect matching S with instability (m, w'). An instability: m and w' each prefer the other to their current partners.]

Our goal, then, is a set of marriages with no instabilities. We'll say that a matching S is stable if (i) it is perfect, and (ii) there is no instability with respect to S. Two questions spring immediately to mind:

o Does there exist a stable matching for every set of preference lists?
o Given a set of preference lists, can we efficiently construct a stable matching if there is one?

Some Examples To illustrate these definitions, consider the following two very simple instances of the Stable Matching Problem.

First, suppose we have a set of two men, {m, m'}, and a set of two women, {w, w'}. The preference lists are as follows:

m prefers w to w'.
m' prefers w to w'.
w prefers m to m'.
w' prefers m to m'.

If we think about this set of preference lists intuitively, it represents complete agreement: the men agree on the order of the women, and the women agree on the order of the men. There is a unique stable matching here, consisting of the pairs (m, w) and (m', w'). The other perfect matching, consisting of the pairs (m', w) and (m, w'), would not be a stable matching, because the pair (m, w) would form an instability with respect to this matching. (Both m and w would want to leave their respective partners and pair up.)

Next, here's an example where things are a bit more intricate. Suppose the preferences are

m prefers w to w'.
m' prefers w' to w.
w prefers m' to m.
w' prefers m to m'.

What's going on in this case? The two men's preferences mesh perfectly with each other (they rank different women first), and the two women's preferences likewise mesh perfectly with each other. But the men's preferences clash completely with the women's preferences.

In this second example, there are two different stable matchings. The matching consisting of the pairs (m, w) and (m', w') is stable, because both men are as happy as possible, so neither would leave their matched partner. But the matching consisting of the pairs (m', w) and (m, w') is also stable, for the complementary reason that both women are as happy as possible. This is an important point to remember as we go forward--it's possible for an instance to have more than one stable matching.
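To make these definitions concrete, here is a small illustrative check, not from the text, written in Python under the assumption that the preference lists of this second example are stored as dictionaries; it simply applies the definition of instability to both perfect matchings.

    # Preference lists of the second example, most preferred first (illustrative encoding).
    prefs = {
        "m": ["w", "w'"], "m'": ["w'", "w"],
        "w": ["m'", "m"], "w'": ["m", "m'"],
    }

    def prefers(x, a, b):
        # True if x ranks a higher than b on x's preference list.
        return prefs[x].index(a) < prefs[x].index(b)

    def is_stable(matching):
        # matching maps each man to his partner in a perfect matching;
        # search for an instability (m, w') as defined above.
        partner = {w: m for m, w in matching.items()}
        for m, w in matching.items():
            for other in prefs[m]:
                if prefers(m, other, w) and prefers(other, m, partner[other]):
                    return False  # (m, other) is an instability
        return True

    print(is_stable({"m": "w", "m'": "w'"}))   # True
    print(is_stable({"m": "w'", "m'": "w"}))   # True

Both checks succeed, confirming that this instance has more than one stable matching.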
Designing the Algorithm

We now show that there exists a stable matching for every set of preference lists among the men and women. Moreover, our means of showing this will also answer the second question that we asked above: we will give an efficient algorithm that takes the preference lists and constructs a stable matching.

Let us consider some of the basic ideas that motivate the algorithm.

Initially, everyone is unmarried. Suppose an unmarried man m chooses the woman w who ranks highest on his preference list and proposes to her. Can we declare immediately that (m, w) will be one of the pairs in our final stable matching? Not necessarily: at some point in the future, a man m' whom w prefers may propose to her. On the other hand, it would be
dangerous for w to reject m right away; she may never receive a proposal from someone she ranks as highly as m. So a natural idea would be to have the pair (m, w) enter an intermediate state--engagement.

Suppose we are now at a state in which some men and women are free--not engaged--and some are engaged. The next step could look like this. An arbitrary free man m chooses the highest-ranked woman w to whom he has not yet proposed, and he proposes to her. If w is also free, then m and w become engaged. Otherwise, w is already engaged to some other man m'. In this case, she determines which of m or m' ranks higher on her preference list; this man becomes engaged to w and the other becomes free.

Finally, the algorithm will terminate when no one is free; at this moment, all engagements are declared final, and the resulting perfect matching is returned.

Here is a concrete description of the Gale-Shapley algorithm, with Figure 1.2 depicting a state of the algorithm.

[Figure 1.2: An intermediate state of the G-S algorithm when a free man m is proposing to a woman w. Woman w will become engaged to m if she prefers him to m'.]

Initially all m ∈ M and w ∈ W are free
While there is a man m who is free and hasn't proposed to every woman
    Choose such a man m
    Let w be the highest-ranked woman in m's preference list to whom m has not yet proposed
    If w is free then
        (m, w) become engaged
    Else w is currently engaged to m'
        If w prefers m' to m then
            m remains free
        Else w prefers m to m'
            (m, w) become engaged
            m' becomes free
        Endif
    Endif
Endwhile
Return the set S of engaged pairs
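The pseudocode above translates almost line by line into a short program. The following is a minimal Python sketch, not from the book, which assumes that the preference lists are given as dictionaries mapping each man and each woman to a list ordered from most to least preferred; the names gale_shapley, next_proposal, and rank are illustrative.

    def gale_shapley(men_prefs, women_prefs):
        # Men-proposing Gale-Shapley. Returns a dict mapping each man to his partner.
        # rank[w][m] is m's position on w's list, so "w prefers m to m'" becomes a
        # constant-time comparison instead of a scan of her list.
        rank = {w: {m: i for i, m in enumerate(plist)}
                for w, plist in women_prefs.items()}

        free_men = list(men_prefs)                 # initially every man is free
        next_proposal = {m: 0 for m in men_prefs}  # index of the next woman on m's list
        current = {}                               # current[w] = man w is engaged to

        while free_men:                            # While there is a free man ...
            m = free_men.pop()                     # choose such a man (arbitrarily)
            w = men_prefs[m][next_proposal[m]]     # highest-ranked woman not yet proposed to
            next_proposal[m] += 1
            if w not in current:                   # w is free: (m, w) become engaged
                current[w] = m
            elif rank[w][m] < rank[w][current[w]]: # w prefers m to her current partner
                free_men.append(current[w])        # her former partner becomes free
                current[w] = m
            else:                                  # w prefers her current partner; m stays free
                free_men.append(m)

        return {m: w for w, m in current.items()}

When the preference lists are complete, a free man always has someone left to propose to, and each iteration makes one new proposal, so the loop runs at most n^2 times; both points are proved below.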
An intriguing thing is that, although the G-S algorithm is quite simple to state, it is not immediately obvious that it returns a stable matching, or even a perfect matching. We proceed to prove this now, through a sequence of intermediate facts.

Analyzing the Algorithm

First consider the view of a woman w during the execution of the algorithm. For a while, no one has proposed to her, and she is free. Then a man m may propose to her, and she becomes engaged. As time goes on, she may receive additional proposals, accepting those that increase the rank of her partner. So we discover the following.

(1.1) w remains engaged from the point at which she receives her first proposal; and the sequence of partners to which she is engaged gets better and better (in terms of her preference list).

The view of a man m during the execution of the algorithm is rather different. He is free until he proposes to the highest-ranked woman on his list; at this point he may or may not become engaged. As time goes on, he may alternate between being free and being engaged; however, the following property does hold.

(1.2) The sequence of women to whom m proposes gets worse and worse (in terms of his preference list).

Now we show that the algorithm terminates, and give a bound on the maximum number of iterations needed for termination.

(1.3) The G-S algorithm terminates after at most n^2 iterations of the While loop.

Proof. A useful strategy for upper-bounding the running time of an algorithm, as we are trying to do here, is to find a measure of progress. Namely, we seek some precise way of saying that each step taken by the algorithm brings it closer to termination.

In the case of the present algorithm, each iteration consists of some man proposing (for the only time) to a woman he has never proposed to before. So if we let P(t) denote the set of pairs (m, w) such that m has proposed to w by the end of iteration t, we see that for all t, the size of P(t + 1) is strictly greater than the size of P(t). But there are only n^2 possible pairs of men and women in total, so the value of P(.) can increase at most n^2 times over the course of the algorithm. It follows that there can be at most n^2 iterations. ∎

Two points are worth noting about the previous fact and its proof. First, there are executions of the algorithm (with certain preference lists) that can involve close to n^2 iterations, so this analysis is not far from the best possible. Second, there are many quantities that would not have worked well as a progress measure for the algorithm, since they need not strictly increase in each
iteration. For example, the number of free individuals could remain constant from one iteration to the next, as could the number of engaged pairs. Thus, these quantities could not be used directly in giving an upper bound on the maximum possible number of iterations, in the style of the previous paragraph.

Let us now establish that the set S returned at the termination of the algorithm is in fact a perfect matching. Why is this not immediately obvious? Essentially, we have to show that no man can "fall off" the end of his preference list; the only way for the While loop to exit is for there to be no free man. In this case, the set of engaged couples would indeed be a perfect matching. So the main thing we need to show is the following.

(1.4) If m is free at some point in the execution of the algorithm, then there is a woman to whom he has not yet proposed.

Proof. Suppose there comes a point when m is free but has already proposed to every woman. Then by (1.1), each of the n women is engaged at this point in time. Since the set of engaged pairs forms a matching, there must also be n engaged men at this point in time. But there are only n men total, and m is not engaged, so this is a contradiction. ∎

(1.5) The set S returned at termination is a perfect matching.

Proof. The set of engaged pairs always forms a matching. Let us suppose that the algorithm terminates with a free man m. At termination, it must be the case that m had already proposed to every woman, for otherwise the While loop would not have exited. But this contradicts (1.4), which says that there cannot be a free man who has proposed to every woman. ∎

Finally, we prove the main property of the algorithm--namely, that it results in a stable matching.

(1.6) Consider an execution of the G-S algorithm that returns a set of pairs S. The set S is a stable matching.

Proof. We have already seen, in (1.5), that S is a perfect matching. Thus, to prove S is a stable matching, we will assume that there is an instability with respect to S and obtain a contradiction. As defined earlier, such an instability would involve two pairs, (m, w) and (m', w'), in S with the properties that

o m prefers w' to w, and
o w' prefers m to m'.

In the execution of the algorithm that produced S, m's last proposal was, by definition, to w. Now we ask: Did m propose to w' at some earlier point in this execution? If he didn't, then w must occur higher on m's preference list than w', contradicting our assumption that m prefers w' to w. If he did, then he was rejected by w' in favor of some other man m'', whom w' prefers to m. m' is the final partner of w', so either m'' = m' or, by (1.1), w' prefers her final partner m' to m''; either way this contradicts our assumption that w' prefers m to m'.

It follows that S is a stable matching. ∎

Extensions

We began by defining the notion of a stable matching; we have just proven that the G-S algorithm actually constructs one. We now consider some further questions about the behavior of the G-S algorithm and its relation to the properties of different stable matchings.

To begin with, recall that we saw an example earlier in which there could be multiple stable matchings. To recap, the preference lists in this example were as follows:

m prefers w to w'.
m' prefers w' to w.
w prefers m' to m.
w' prefers m to m'.

Now, in any execution of the Gale-Shapley algorithm, m will become engaged to w, m' will become engaged to w' (perhaps in the other order), and things will stop there. Thus, the other stable matching, consisting of the pairs (m', w) and (m, w'), is not attainable from an execution of the G-S algorithm in which the men propose. On the other hand, it would be reached if we ran a version of the algorithm in which the women propose. And in larger examples, with more than two people on each side, we can have an even larger collection of possible stable matchings, many of them not achievable by any natural algorithm.

This example shows a certain "unfairness" in the G-S algorithm, favoring men. If the men's preferences mesh perfectly (they all list different women as their first choice), then in all runs of the G-S algorithm all men end up matched with their first choice, independent of the preferences of the women. If the women's preferences clash completely with the men's preferences (as was the case in this example), then the resulting stable matching is as bad as possible for the women. So this simple set of preference lists compactly summarizes a world in which someone is destined to end up unhappy: women are unhappy if men propose, and men are unhappy if women propose.

Let's now analyze the G-S algorithm in more detail and try to understand how general this "unfairness" phenomenon is.
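As a concrete illustration, running the earlier gale_shapley sketch (an illustrative implementation, not code from the book) on this example with the two sides swapped produces the two different stable matchings described above.

    men = {"m": ["w", "w'"], "m'": ["w'", "w"]}
    women = {"w": ["m'", "m"], "w'": ["m", "m'"]}

    # Men propose: each man gets his first choice -> the pairs (m, w) and (m', w').
    print(gale_shapley(men, women))
    # Women propose (roles swapped): each woman gets her first choice ->
    # the pairs (m', w) and (m, w'), the other stable matching.
    print(gale_shapley(women, men))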
To begin with, our example reinforces the point that the G-S algorithm is actually underspecified: as long as there is a free man, we are allowed to choose any free man to make the next proposal. Different choices specify different executions of the algorithm; this is why, to be careful, we stated (1.6) as "Consider an execution of the G-S algorithm that returns a set of pairs S," instead of "Consider the set S returned by the G-S algorithm."

Thus, we encounter another very natural question: Do all executions of the G-S algorithm yield the same matching? This is a genre of question that arises in many settings in computer science: we have an algorithm that runs asynchronously, with different independent components performing actions that can be interleaved in complex ways, and we want to know how much variability this asynchrony causes in the final outcome. To consider a very different kind of example, the independent components may not be men and women but electronic components activating parts of an airplane wing; the effect of asynchrony in their behavior can be a big deal.

In the present context, we will see that the answer to our question is surprisingly clean: all executions of the G-S algorithm yield the same matching. We proceed to prove this now.

All Executions Yield the Same Matching There are a number of possible ways to prove a statement such as this, many of which would result in quite complicated arguments. It turns out that the easiest and most informative approach for us will be to uniquely characterize the matching that is obtained and then show that all executions result in the matching with this characterization.

What is the characterization? We'll show that each man ends up with the "best possible partner" in a concrete sense. (Recall that this is true if all men prefer different women.) First, we will say that a woman w is a valid partner of a man m if there is a stable matching that contains the pair (m, w). We will say that w is the best valid partner of m if w is a valid partner of m, and no woman whom m ranks higher than w is a valid partner of his. We will use best(m) to denote the best valid partner of m.

Now, let S* denote the set of pairs {(m, best(m)) : m ∈ M}. We will prove the following fact.

(1.7) Every execution of the G-S algorithm results in the set S*.

This statement is surprising at a number of levels. First of all, as defined, there is no reason to believe that S* is a matching at all, let alone a stable matching. After all, why couldn't it happen that two men have the same best valid partner? Second, the result shows that the G-S algorithm gives the best possible outcome for every man simultaneously; there is no stable matching in which any of the men could have hoped to do better. And finally, it answers our question above by showing that the order of proposals in the G-S algorithm has absolutely no effect on the final outcome. Despite all this, the proof is not so difficult.

Proof. Let us suppose, by way of contradiction, that some execution E of the G-S algorithm results in a matching S in which some man is paired with a woman who is not his best valid partner. Since men propose in decreasing order of preference, this means that some man is rejected by a valid partner during the execution E of the algorithm. So consider the first moment during the execution E in which some man, say m, is rejected by a valid partner w. Again, since men propose in decreasing order of preference, and since this is the first time such a rejection has occurred, it must be that w is m's best valid partner best(m).

The rejection of m by w may have happened either because m proposed and was turned down in favor of w's existing engagement, or because w broke her engagement to m in favor of a better proposal. But either way, at this moment w forms or continues an engagement with a man m' whom she prefers to m.

Since w is a valid partner of m, there exists a stable matching S' containing the pair (m, w). Now we ask: Who is m' paired with in this matching? Suppose it is a woman w' ≠ w.

Since the rejection of m by w was the first rejection of a man by a valid partner in the execution E, it must be that m' had not been rejected by any valid partner at the point in E when he became engaged to w. Since he proposed in decreasing order of preference, and since w' is clearly a valid partner of m', it must be that m' prefers w to w'. But we have already seen that w prefers m' to m, for in execution E she rejected m in favor of m'. Since (m', w) ∉ S', it follows that (m', w) is an instability in S'.

This contradicts our claim that S' is stable and hence contradicts our initial assumption. ∎

So for the men, the G-S algorithm is ideal. Unfortunately, the same cannot be said for the women. For a woman w, we say that m is a valid partner if there is a stable matching that contains the pair (m, w). We say that m is the worst valid partner of w if m is a valid partner of w, and no man whom w ranks lower than m is a valid partner of hers.

(1.8) In the stable matching S*, each woman is paired with her worst valid partner.

Proof. Suppose there were a pair (m, w) in S* such that m is not the worst valid partner of w. Then there is a stable matching S' in which w is paired
with a man m' whom she likes less than m. In S', m is paired with a woman w' ≠ w; since w is the best valid partner of m, and w' is a valid partner of m, we see that m prefers w to w'.

But from this it follows that (m, w) is an instability in S', contradicting the claim that S' is stable and hence contradicting our initial assumption. ∎

Thus, we find that our simple example above, in which the men's preferences clashed with the women's, hinted at a very general phenomenon: for any input, the side that does the proposing in the G-S algorithm ends up with the best possible stable matching (from their perspective), while the side that does not do the proposing correspondingly ends up with the worst possible stable matching.
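To make (1.7) concrete, here is a small illustrative sketch in Python (not the book's pseudocode): it runs the proposal loop under several different rules for choosing the next free man and checks that the returned matching is the same each time. The helper name pick_free_man and the three-couple preference lists are hypothetical choices made for this illustration.

    import random

    def gale_shapley(men_prefs, women_prefs, pick_free_man):
        """Run G-S; pick_free_man selects which free man proposes next."""
        next_proposal = {m: 0 for m in men_prefs}   # index into m's list
        engaged_to = {}                             # woman -> man
        rank = {w: {m: i for i, m in enumerate(p)}
                for w, p in women_prefs.items()}
        free_men = set(men_prefs)
        while free_men:
            m = pick_free_man(free_men)             # the underspecified choice
            w = men_prefs[m][next_proposal[m]]
            next_proposal[m] += 1
            if w not in engaged_to:
                engaged_to[w] = m
                free_men.remove(m)
            elif rank[w][m] < rank[w][engaged_to[w]]:   # w prefers m
                free_men.add(engaged_to[w])
                engaged_to[w] = m
                free_men.remove(m)
            # otherwise m stays free and proposes further down his list
        return {m: w for w, m in engaged_to.items()}

    # Hypothetical instance; any instance would do.
    men = {"m1": ["w1", "w2", "w3"], "m2": ["w1", "w3", "w2"], "m3": ["w2", "w1", "w3"]}
    women = {"w1": ["m2", "m1", "m3"], "w2": ["m1", "m3", "m2"], "w3": ["m3", "m2", "m1"]}

    results = {frozenset(gale_shapley(men, women, pick).items())
               for pick in (min, max, lambda s: random.choice(sorted(s)))}
    assert len(results) == 1   # every proposal order yields the same matching S*

Running the check with other picking rules, or other instances, gives the same outcome; that is exactly what (1.7) guarantees.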
1.2 Five Representative Problems

The Stable Matching Problem provides us with a rich example of the process of algorithm design. For many problems, this process involves a few significant steps: formulating the problem with enough mathematical precision that we can ask a concrete question and start thinking about algorithms to solve it; designing an algorithm for the problem; and analyzing the algorithm by proving it is correct and giving a bound on the running time so as to establish the algorithm's efficiency.

This high-level strategy is carried out in practice with the help of a few fundamental design techniques, which are very useful in assessing the inherent complexity of a problem and in formulating an algorithm to solve it. As in any area, becoming familiar with these design techniques is a gradual process; but with experience one can start recognizing problems as belonging to identifiable genres and appreciating how subtle changes in the statement of a problem can have an enormous effect on its computational difficulty.

To get this discussion started, then, it helps to pick out a few representative milestones that we'll be encountering in our study of algorithms: cleanly formulated problems, all resembling one another at a general level, but differing greatly in their difficulty and in the kinds of approaches that one brings to bear on them. The first three will be solvable efficiently by a sequence of increasingly subtle algorithmic techniques; the fourth marks a major turning point in our discussion, serving as an example of a problem believed to be unsolvable by any efficient algorithm; and the fifth hints at a class of problems believed to be harder still.

The problems are self-contained and are all motivated by computing applications. To talk about some of them, though, it will help to use the terminology of graphs. While graphs are a common topic in earlier computer science courses, we'll be introducing them in a fair amount of depth in Chapter 3; due to their enormous expressive power, we'll also be using them extensively throughout the book. For the discussion here, it's enough to think of a graph G as simply a way of encoding pairwise relationships among a set of objects. Thus, G consists of a pair of sets (V, E)--a collection V of nodes and a collection E of edges, each of which "joins" two of the nodes. We thus represent an edge e ∈ E as a two-element subset of V: e = (u, v) for some u, v ∈ V, where we call u and v the ends of e. We typically draw graphs as in Figure 1.3, with each node as a small circle and each edge as a line segment joining its two ends.

Figure 1.3  Each of (a) and (b) depicts a graph on four nodes.

Let's now turn to a discussion of the five representative problems.

Interval Scheduling

Consider the following very simple scheduling problem. You have a resource--it may be a lecture room, a supercomputer, or an electron microscope--and many people request to use the resource for periods of time. A request takes the form: Can I reserve the resource starting at time s, until time f? We will assume that the resource can be used by at most one person at a time. A scheduler wants to accept a subset of these requests, rejecting all others, so that the accepted requests do not overlap in time. The goal is to maximize the number of requests accepted.

More formally, there will be n requests labeled 1, ..., n, with each request i specifying a start time s_i and a finish time f_i. Naturally, we have s_i < f_i for all i. Two requests i and j are compatible if the requested intervals do not overlap: that is, either request i is for an earlier time interval than request j (f_i ≤ s_j), or request i is for a later time than request j (f_j ≤ s_i). We'll say more generally that a subset A of requests is compatible if all pairs of requests i, j ∈ A, i ≠ j are compatible. The goal is to select a compatible subset of requests of maximum possible size.

We illustrate an instance of this Interval Scheduling Problem in Figure 1.4. Note that there is a single compatible set of size 4, and this is the largest compatible set.

Figure 1.4  An instance of the Interval Scheduling Problem.
We will see shortly that this problem can be solved by a very natural algorithm that orders the set of requests according to a certain heuristic and then "greedily" processes them in one pass, selecting as large a compatible subset as it can. This will be typical of a class of greedy algorithms that we will consider for various problems--myopic rules that process the input one piece at a time with no apparent look-ahead. When a greedy algorithm can be shown to find an optimal solution for all instances of a problem, it's often fairly surprising. We typically learn something about the structure of the underlying problem from the fact that such a simple approach can be optimal.
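As a preview of what such a greedy rule can look like, here is a short illustrative Python sketch of one natural ordering -- by earliest finish time -- which is in fact the rule that yields an optimal schedule for this problem. The function name and the sample request list are hypothetical choices for the illustration.

    def schedule_requests(requests):
        """Greedy interval scheduling: repeatedly accept the compatible
        request with the earliest finish time."""
        accepted = []
        last_finish = float("-inf")
        for s, f in sorted(requests, key=lambda r: r[1]):  # order by finish time
            if s >= last_finish:     # compatible with everything accepted so far
                accepted.append((s, f))
                last_finish = f
        return accepted

    # Hypothetical instance: (start, finish) pairs.
    print(schedule_requests([(0, 6), (1, 4), (3, 5), (5, 7), (3, 8), (8, 9)]))
    # -> [(1, 4), (5, 7), (8, 9)]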
Weighted Interval Scheduling

In the Interval Scheduling Problem, we sought to maximize the number of requests that could be accommodated simultaneously. Now, suppose more generally that each request interval i has an associated value, or weight, v_i > 0; we could picture this as the amount of money we will make from the ith individual if we schedule his or her request. Our goal will be to find a compatible subset of intervals of maximum total value.

The case in which v_i = 1 for each i is simply the basic Interval Scheduling Problem; but the appearance of arbitrary values changes the nature of the maximization problem quite a bit. Consider, for example, that if v_1 exceeds the sum of all other v_i, then the optimal solution must include interval 1 regardless of the configuration of the full set of intervals. So any algorithm for this problem must be very sensitive to the values, and yet degenerate to a method for solving (unweighted) interval scheduling when all the values are equal to 1.

There appears to be no simple greedy rule that walks through the intervals one at a time, making the correct decision in the presence of arbitrary values. Instead, we employ a technique, dynamic programming, that builds up the optimal value over all possible solutions in a compact, tabular way that leads to a very efficient algorithm.
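As a glimpse of the dynamic programming idea, here is a compact illustrative Python sketch built on the standard recurrence for this problem: either take interval j together with the best solution on the intervals compatible with it, or skip j. The helper names and the sample instance are hypothetical.

    import bisect
    from functools import lru_cache

    def weighted_interval_scheduling(intervals):
        """intervals: list of (start, finish, value). Returns max total value."""
        intervals = sorted(intervals, key=lambda t: t[1])     # by finish time
        finishes = [f for _, f, _ in intervals]
        # p[j] = number of intervals that finish no later than interval j starts
        p = [bisect.bisect_right(finishes, s) for s, _, _ in intervals]

        @lru_cache(maxsize=None)
        def opt(j):                    # best value using the first j intervals
            if j == 0:
                return 0
            s, f, v = intervals[j - 1]
            return max(v + opt(p[j - 1]), opt(j - 1))   # take j or skip it

        return opt(len(intervals))

    # Hypothetical instance: the value-10 interval forces a different choice
    # than the unweighted greedy would make.
    print(weighted_interval_scheduling([(0, 6, 10), (1, 4, 1), (5, 7, 1), (8, 9, 1)]))  # -> 11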
Bipartite Matching

When we considered the Stable Matching Problem, we defined a matching to be a set of ordered pairs of men and women with the property that each man and each woman belong to at most one of the ordered pairs. We then defined a perfect matching to be a matching in which every man and every woman belong to some pair.

We can express these concepts more generally in terms of graphs, and in order to do this it is useful to define the notion of a bipartite graph. We say that a graph G = (V, E) is bipartite if its node set V can be partitioned into sets X and Y in such a way that every edge has one end in X and the other end in Y. A bipartite graph is pictured in Figure 1.5; often, when we want to emphasize a graph's "bipartiteness," we will draw it this way, with the nodes in X and Y in two parallel columns. But notice, for example, that the two graphs in Figure 1.3 are also bipartite.

Figure 1.5  A bipartite graph.

Now, in the problem of finding a stable matching, matchings were built from pairs of men and women. In the case of bipartite graphs, the edges are pairs of nodes, so we say that a matching in a graph G = (V, E) is a set of edges M ⊆ E with the property that each node appears in at most one edge of M. M is a perfect matching if every node appears in exactly one edge of M.

To see that this does capture the same notion we encountered in the Stable Matching Problem, consider a bipartite graph G' with a set X of n men, a set Y of n women, and an edge from every node in X to every node in Y. Then the matchings and perfect matchings in G' are precisely the matchings and perfect matchings among the set of men and women.

In the Stable Matching Problem, we added preferences to this picture. Here, we do not consider preferences; but the nature of the problem in arbitrary bipartite graphs adds a different source of complexity: there is not necessarily an edge from every x ∈ X to every y ∈ Y, so the set of possible matchings has quite a complicated structure. In other words, it is as though only certain pairs of men and women are willing to be paired off, and we want to figure out how to pair off many people in a way that is consistent with this. Consider, for example, the bipartite graph G in Figure 1.5: there are many matchings in G, but there is only one perfect matching. (Do you see it?)

Matchings in bipartite graphs can model situations in which objects are being assigned to other objects. Thus, the nodes in X can represent jobs, the nodes in Y can represent machines, and an edge (x_i, y_j) can indicate that machine y_j is capable of processing job x_i. A perfect matching is then a way of assigning each job to a machine that can process it, with the property that each machine is assigned exactly one job. In the spring, computer science departments across the country are often seen pondering a bipartite graph in which X is the set of professors in the department, Y is the set of offered courses, and an edge (x_i, y_j) indicates that professor x_i is capable of teaching course y_j. A perfect matching in this graph consists of an assignment of each professor to a course that he or she can teach, in such a way that every course is covered.

Thus the Bipartite Matching Problem is the following: Given an arbitrary bipartite graph G, find a matching of maximum size. If |X| = |Y| = n, then there is a perfect matching if and only if the maximum matching has size n. We will find that the algorithmic techniques discussed earlier do not seem adequate
for providing an efficient algorithm for this problem. There is, however, a very elegant and efficient algorithm to find a maximum matching; it inductively builds up larger and larger matchings, selectively backtracking along the way. This process is called augmentation, and it forms the central component in a large class of efficiently solvable problems called network flow problems.

Independent Set

Now let's talk about an extremely general problem, which includes most of these earlier problems as special cases. Given a graph G = (V, E), we say a set of nodes S ⊆ V is independent if no two nodes in S are joined by an edge. The Independent Set Problem is, then, the following: Given G, find an independent set that is as large as possible. For example, the maximum size of an independent set in the graph in Figure 1.6 is four, achieved by the four-node independent set {1, 4, 5, 6}.

Figure 1.6  A graph whose largest independent set has size 4.

The Independent Set Problem encodes any situation in which you are trying to choose from among a collection of objects and there are pairwise conflicts among some of the objects. Say you have n friends, and some pairs of them don't get along. How large a group of your friends can you invite to dinner if you don't want any interpersonal tensions? This is simply the largest independent set in the graph whose nodes are your friends, with an edge between each conflicting pair.

Interval Scheduling and Bipartite Matching can both be encoded as special cases of the Independent Set Problem. For Interval Scheduling, define a graph G = (V, E) in which the nodes are the intervals and there is an edge between each pair of them that overlap; the independent sets in G are then just the compatible subsets of intervals. Encoding Bipartite Matching as a special case of Independent Set is a little trickier to see. Given a bipartite graph G' = (V', E'), the objects being chosen are edges, and the conflicts arise between two edges that share an end. (These, indeed, are the pairs of edges that cannot belong to a common matching.) So we define a graph G = (V, E) in which the node set V is equal to the edge set E' of G'. We define an edge between each pair of elements in V that correspond to edges of G' with a common end. We can now check that the independent sets of G are precisely the matchings of G'. While it is not complicated to check this, it takes a little concentration to deal with this type of "edges-to-nodes, nodes-to-edges" transformation.²

² For those who are curious, we note that not every instance of the Independent Set Problem can arise in this way from Interval Scheduling or from Bipartite Matching; the full Independent Set Problem really is more general. The graph in Figure 1.3(a) cannot arise as the "conflict graph" in an instance of Interval Scheduling, and the graph in Figure 1.3(b) cannot arise as the "conflict graph" in an instance of Bipartite Matching.
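The "edges-to-nodes" construction above is mechanical enough to write down directly; here is a tiny illustrative Python sketch (the two-by-two bipartite example is hypothetical) that builds the conflict graph whose independent sets correspond to matchings of a bipartite graph.

    from itertools import combinations

    def conflict_graph(bipartite_edges):
        """Nodes of the new graph are the edges of G'; join two of them
        whenever they share an endpoint (i.e., cannot coexist in a matching)."""
        nodes = list(bipartite_edges)
        conflicts = {(e, f) for e, f in combinations(nodes, 2) if set(e) & set(f)}
        return nodes, conflicts

    # Hypothetical bipartite graph on X = {x1, x2}, Y = {y1, y2}.
    edges = [("x1", "y1"), ("x1", "y2"), ("x2", "y2")]
    nodes, conflicts = conflict_graph(edges)
    # {("x1","y1"), ("x2","y2")} is independent here -- and it is a matching in G'.
    print(conflicts)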
Given the generality of the Independent Set Problem, an efficient algorithm to solve it would be quite impressive. It would have to implicitly contain algorithms for Interval Scheduling, Bipartite Matching, and a host of other natural optimization problems.

The current status of Independent Set is this: no efficient algorithm is known for the problem, and it is conjectured that no such algorithm exists. The obvious brute-force algorithm would try all subsets of the nodes, checking each to see if it is independent, and then recording the largest one encountered. It is possible that this is close to the best we can do on this problem. We will see later in the book that Independent Set is one of a large class of problems that are termed NP-complete. No efficient algorithm is known for any of them; but they are all equivalent in the sense that a solution to any one of them would imply, in a precise sense, a solution to all of them.

Here's a natural question: Is there anything good we can say about the complexity of the Independent Set Problem? One positive thing is the following: If we have a graph G on 1,000 nodes, and we want to convince you that it contains an independent set S of size 100, then it's quite easy. We simply show you the graph G, circle the nodes of S in red, and let you check that no two of them are joined by an edge. So there really seems to be a great difference in difficulty between checking that something is a large independent set and actually finding a large independent set. This may look like a very basic observation--and it is--but it turns out to be crucial in understanding this class of problems. Furthermore, as we'll see next, it's possible for a problem to be so hard that there isn't even an easy way to "check" solutions in this sense.

Competitive Facility Location

Finally, we come to our fifth problem, which is based on the following two-player game. Consider two large companies that operate café franchises across the country--let's call them JavaPlanet and Queequeg's Coffee--and they are currently competing for market share in a geographic area. First JavaPlanet opens a franchise; then Queequeg's Coffee opens a franchise; then JavaPlanet; then Queequeg's; and so on. Suppose they must deal with zoning regulations that require no two franchises be located too close together, and each is trying to make its locations as convenient as possible. Who will win?

Let's make the rules of this "game" more concrete. The geographic region in question is divided into n zones, labeled 1, 2, ..., n. Each zone i has a
value b_i, which is the revenue obtained by either of the companies if it opens a franchise there. Finally, certain pairs of zones (i, j) are adjacent, and local zoning laws prevent two adjacent zones from each containing a franchise, regardless of which company owns them. (They also prevent two franchises from being opened in the same zone.) We model these conflicts via a graph G = (V, E), where V is the set of zones, and (i, j) is an edge in E if the zones i and j are adjacent. The zoning requirement then says that the full set of franchises opened must form an independent set in G.

Thus our game consists of two players, P1 and P2, alternately selecting nodes in G, with P1 moving first. At all times, the set of all selected nodes must form an independent set in G. Suppose that player P2 has a target bound B, and we want to know: is there a strategy for P2 so that no matter how P1 plays, P2 will be able to select a set of nodes with a total value of at least B? We will call this an instance of the Competitive Facility Location Problem.

Consider, for example, the instance pictured in Figure 1.7, and suppose that P2's target bound is B = 20. Then P2 does have a winning strategy. On the other hand, if B = 25, then P2 does not.

Figure 1.7  An instance of the Competitive Facility Location Problem.

One can work this out by looking at the figure for a while; but it requires some amount of case-checking of the form, "If P1 goes here, then P2 will go there; but if P1 goes over there, then P2 will go here...." And this appears to be intrinsic to the problem: not only is it computationally difficult to determine whether P2 has a winning strategy; on a reasonably sized graph, it would even be hard for us to convince you that P2 has a winning strategy. There does not seem to be a short proof we could present; rather, we'd have to lead you on a lengthy case-by-case analysis of the set of possible moves.

This is in contrast to the Independent Set Problem, where we believe that finding a large solution is hard but checking a proposed large solution is easy. This contrast can be formalized in the class of PSPACE-complete problems, of which Competitive Facility Location is an example. PSPACE-complete problems are believed to be strictly harder than NP-complete problems, and this conjectured lack of short "proofs" for their solutions is one indication of this greater hardness. The notion of PSPACE-completeness turns out to capture a large collection of problems involving game-playing and planning; many of these are fundamental issues in the area of artificial intelligence.

Solved Exercises

Solved Exercise 1

Consider a town with n men and n women seeking to get married to one another. Each man has a preference list that ranks all the women, and each woman has a preference list that ranks all the men.

The set of all 2n people is divided into two categories: good people and bad people. Suppose that for some number k, 1 < k < n - 1, there are k good men and k good women; thus there are n - k bad men and n - k bad women.

Everyone would rather marry any good person than any bad person. Formally, each preference list has the property that it ranks each good person of the opposite gender higher than each bad person of the opposite gender: its first k entries are the good people (of the opposite gender) in some order, and its next n - k are the bad people (of the opposite gender) in some order.

Show that in every stable matching, every good man is married to a good woman.

Solution  A natural way to get started thinking about this problem is to assume the claim is false and try to work toward obtaining a contradiction. What would it mean for the claim to be false? There would exist some stable matching M in which a good man m was married to a bad woman w.

Now, let's consider what the other pairs in M look like. There are k good men and k good women. Could it be the case that every good woman is married to a good man in this matching M? No: one of the good men (namely, m) is already married to a bad woman, and that leaves only k - 1 other good men. So even if all of them were married to good women, that would still leave some good woman who is married to a bad man.

Let w' be such a good woman, who is married to a bad man. It is now easy to identify an instability in M: consider the pair (m, w'). Each is good, but is married to a bad partner. Thus, each of m and w' prefers the other to their current partner, and hence (m, w') is an instability. This contradicts our assumption that M is stable, and hence concludes the proof.

Solved Exercise 2

We can think about a generalization of the Stable Matching Problem in which certain man-woman pairs are explicitly forbidden. In the case of employers and applicants, we could imagine that certain applicants simply lack the necessary qualifications or certifications, and so they cannot be employed at certain companies, however desirable they may seem. Using the analogy to marriage between men and women, we have a set M of n men, a set W of n women,
and a set F ⊆ M × W of pairs who are simply not allowed to get married. Each man m ranks all the women w for which (m, w) ∉ F, and each woman w' ranks all the men m' for which (m', w') ∉ F.

In this more general setting, we say that a matching S is stable if it does not exhibit any of the following types of instability.

(i) There are two pairs (m, w) and (m', w') in S with the property that (m, w') ∉ F, m prefers w' to w, and w' prefers m to m'. (The usual kind of instability.)

(ii) There is a pair (m, w) ∈ S, and a man m', so that m' is not part of any pair in the matching, (m', w) ∉ F, and w prefers m' to m. (A single man is more desirable and not forbidden.)

(iii) There is a pair (m, w) ∈ S, and a woman w', so that w' is not part of any pair in the matching, (m, w') ∉ F, and m prefers w' to w. (A single woman is more desirable and not forbidden.)

(iv) There is a man m and a woman w, neither of whom is part of any pair in the matching, so that (m, w) ∉ F. (There are two single people with nothing preventing them from getting married to each other.)

Note that under these more general definitions, a stable matching need not be a perfect matching.

Now we can ask: For every set of preference lists and every set of forbidden pairs, is there always a stable matching? Resolve this question by doing one of the following two things: (a) give an algorithm that, for any set of preference lists and forbidden pairs, produces a stable matching; or (b) give an example of a set of preference lists and forbidden pairs for which there is no stable matching.

Solution  The Gale-Shapley algorithm is remarkably robust to variations on the Stable Matching Problem. So, if you're faced with a new variation of the problem and can't find a counterexample to stability, it's often a good idea to check whether a direct adaptation of the G-S algorithm will in fact produce stable matchings.

That turns out to be the case here. We will show that there is always a stable matching, even in this more general model with forbidden pairs, and we will do this by adapting the G-S algorithm. To do this, let's consider why the original G-S algorithm can't be used directly. The difficulty, of course, is that the G-S algorithm doesn't know anything about forbidden pairs, and so the condition in the While loop,

    While there is a man m who is free and hasn't proposed to
        every woman,

won't work: we don't want m to propose to a woman w for which the pair (m, w) is forbidden. Thus, let's consider a variation of the G-S algorithm in which we make only one change: we modify the While loop to say,

    While there is a man m who is free and hasn't proposed to
        every woman w for which (m, w) ∉ F.

Here is the algorithm in full.

    Initially all m ∈ M and w ∈ W are free
    While there is a man m who is free and hasn't proposed to
        every woman w for which (m, w) ∉ F
      Choose such a man m
      Let w be the highest-ranked woman in m's preference list
        to which m has not yet proposed
      If w is free then
        (m, w) become engaged
      Else w is currently engaged to m'
        If w prefers m' to m then
          m remains free
        Else w prefers m to m'
          (m, w) become engaged
          m' becomes free
        Endif
      Endif
    Endwhile
    Return the set S of engaged pairs
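For readers who want to experiment, here is a direct transcription of the pseudocode above into runnable Python, offered only as an illustrative sketch; the dictionary-based representation, the function name, and the small two-couple instance with one forbidden pair are choices made here, not part of the original.

    def gs_with_forbidden_pairs(men_prefs, women_prefs, forbidden):
        """men_prefs[m] / women_prefs[w]: preference lists restricted to
        non-forbidden partners; forbidden: set of (m, w) pairs."""
        rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
        to_propose = {m: [w for w in men_prefs[m] if (m, w) not in forbidden]
                      for m in men_prefs}
        engaged = {}                                  # woman -> man
        free = set(men_prefs)
        while True:
            # a free man who still has a non-forbidden woman to propose to
            candidates = [m for m in free if to_propose[m]]
            if not candidates:
                break
            m = candidates[0]
            w = to_propose[m].pop(0)                  # highest-ranked not yet proposed
            if w not in engaged:
                engaged[w] = m
                free.remove(m)
            elif rank[w][m] < rank[w][engaged[w]]:    # w prefers m to her fiancé
                free.add(engaged[w])
                engaged[w] = m
                free.remove(m)
        return {(m, w) for w, m in engaged.items()}

    # Hypothetical instance with one forbidden pair (m2, w2).
    men = {"m1": ["w1", "w2"], "m2": ["w1"]}
    women = {"w1": ["m1", "m2"], "w2": ["m1"]}
    print(gs_with_forbidden_pairs(men, women, forbidden={("m2", "w2")}))
    # -> {("m1", "w1")}; m2 and w2 remain single, which is fine since (m2, w2) is forbidden.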
We now prove that this yields a stable matching, under our new definition of stability.

To begin with, facts (1.1), (1.2), and (1.5) from the text remain true (in particular, the algorithm will terminate in at most n^2 iterations). Also, we don't have to worry about establishing that the resulting matching S is perfect (indeed, it may not be). We also notice an additional pair of facts. If m is a man who is not part of a pair in S, then m must have proposed to every nonforbidden woman; and if w is a woman who is not part of a pair in S, then it must be that no man ever proposed to w.

Finally, we need only show

(1.9) There is no instability with respect to the returned matching S.
Proof. Our general definition of instability has four parts: this means that we have to make sure that none of the four bad things happens.

First, suppose there is an instability of type (i), consisting of pairs (m, w) and (m', w') in S with the property that (m, w') ∉ F, m prefers w' to w, and w' prefers m to m'. It follows that m must have proposed to w'; so w' rejected m, and thus she prefers her final partner to m--a contradiction.

Next, suppose there is an instability of type (ii), consisting of a pair (m, w) ∈ S, and a man m', so that m' is not part of any pair in the matching, (m', w) ∉ F, and w prefers m' to m. Then m' must have proposed to w and been rejected; again, it follows that w prefers her final partner to m'--a contradiction.

Third, suppose there is an instability of type (iii), consisting of a pair (m, w) ∈ S, and a woman w', so that w' is not part of any pair in the matching, (m, w') ∉ F, and m prefers w' to w. Then no man proposed to w' at all; in particular, m never proposed to w', and so he must prefer w to w'--a contradiction.

Finally, suppose there is an instability of type (iv), consisting of a man m and a woman w, neither of which is part of any pair in the matching, so that (m, w) ∉ F. But for m to be single, he must have proposed to every nonforbidden woman; in particular, he must have proposed to w, which means she would no longer be single--a contradiction. ∎

Exercises

1. Decide whether you think the following statement is true or false. If it is true, give a short explanation. If it is false, give a counterexample.

True or false? In every instance of the Stable Matching Problem, there is a stable matching containing a pair (m, w) such that m is ranked first on the preference list of w and w is ranked first on the preference list of m.

2. Decide whether you think the following statement is true or false. If it is true, give a short explanation. If it is false, give a counterexample.

True or false? Consider an instance of the Stable Matching Problem in which there exists a man m and a woman w such that m is ranked first on the preference list of w and w is ranked first on the preference list of m. Then in every stable matching S for this instance, the pair (m, w) belongs to S.

3. There are many other settings in which we can ask questions related to some type of "stability" principle. Here's one, involving competition between two enterprises.

Suppose we have two television networks, whom we'll call A and B. There are n prime-time programming slots, and each network has n TV shows. Each network wants to devise a schedule--an assignment of each show to a distinct slot--so as to attract as much market share as possible.

Here is the way we determine how well the two networks perform relative to each other, given their schedules. Each show has a fixed rating, which is based on the number of people who watched it last year; we'll assume that no two shows have exactly the same rating. A network wins a given time slot if the show that it schedules for the time slot has a larger rating than the show the other network schedules for that time slot. The goal of each network is to win as many time slots as possible.

Suppose in the opening week of the fall season, Network A reveals a schedule S and Network B reveals a schedule T. On the basis of this pair of schedules, each network wins certain time slots, according to the rule above. We'll say that the pair of schedules (S, T) is stable if neither network can unilaterally change its own schedule and win more time slots. That is, there is no schedule S' such that Network A wins more slots with the pair (S', T) than it did with the pair (S, T); and symmetrically, there is no schedule T' such that Network B wins more slots with the pair (S, T') than it did with the pair (S, T).

The analogue of Gale and Shapley's question for this kind of stability is the following: For every set of TV shows and ratings, is there always a stable pair of schedules? Resolve this question by doing one of the following two things:

(a) give an algorithm that, for any set of TV shows and associated ratings, produces a stable pair of schedules; or

(b) give an example of a set of TV shows and associated ratings for which there is no stable pair of schedules.

4. Gale and Shapley published their paper on the Stable Matching Problem in 1962; but a version of their algorithm had already been in use for ten years by the National Resident Matching Program, for the problem of assigning medical residents to hospitals.

Basically, the situation was the following. There were m hospitals, each with a certain number of available positions for hiring residents. There were n medical students graduating in a given year, each interested in joining one of the hospitals. Each hospital had a ranking of the students in order of preference, and each student had a ranking of the hospitals in order of preference. We will assume that there were more students graduating than there were slots available in the m hospitals.
The interest, naturally, was in finding a way of assigning each student to at most one hospital, in such a way that all available positions in all hospitals were filled. (Since we are assuming a surplus of students, there would be some students who do not get assigned to any hospital.)

We say that an assignment of students to hospitals is stable if neither of the following situations arises.

- First type of instability: There are students s and s', and a hospital h, so that
  - s is assigned to h, and
  - s' is assigned to no hospital, and
  - h prefers s' to s.

- Second type of instability: There are students s and s', and hospitals h and h', so that
  - s is assigned to h, and
  - s' is assigned to h', and
  - h prefers s' to s, and
  - s' prefers h to h'.

So we basically have the Stable Matching Problem, except that (i) hospitals generally want more than one resident, and (ii) there is a surplus of medical students.

Show that there is always a stable assignment of students to hospitals, and give an algorithm to find one.

5. The Stable Matching Problem, as discussed in the text, assumes that all men and women have a fully ordered list of preferences. In this problem we will consider a version of the problem in which men and women can be indifferent between certain options. As before we have a set M of n men and a set W of n women. Assume each man and each woman ranks the members of the opposite gender, but now we allow ties in the ranking. For example (with n = 4), a woman could say that m1 is ranked in first place; second place is a tie between m2 and m3 (she has no preference between them); and m4 is in last place. We will say that w prefers m to m' if m is ranked higher than m' on her preference list (they are not tied).

With indifferences in the rankings, there could be two natural notions for stability. And for each, we can ask about the existence of stable matchings, as follows.

(a) A strong instability in a perfect matching S consists of a man m and a woman w, such that each of m and w prefers the other to their partner in S. Does there always exist a perfect matching with no strong instability? Either give an example of a set of men and women with preference lists for which every perfect matching has a strong instability; or give an algorithm that is guaranteed to find a perfect matching with no strong instability.

(b) A weak instability in a perfect matching S consists of a man m and a woman w, such that their partners in S are w' and m', respectively, and one of the following holds:
  - m prefers w to w', and w either prefers m to m' or is indifferent between these two choices; or
  - w prefers m to m', and m either prefers w to w' or is indifferent between these two choices.

In other words, the pairing between m and w is either preferred by both, or preferred by one while the other is indifferent. Does there always exist a perfect matching with no weak instability? Either give an example of a set of men and women with preference lists for which every perfect matching has a weak instability; or give an algorithm that is guaranteed to find a perfect matching with no weak instability.

6. Peripatetic Shipping Lines, Inc., is a shipping company that owns n ships and provides service to n ports. Each of its ships has a schedule that says, for each day of the month, which of the ports it's currently visiting, or whether it's out at sea. (You can assume the "month" here has m days, for some m > n.) Each ship visits each port for exactly one day during the month. For safety reasons, PSL Inc. has the following strict requirement:

(†) No two ships can be in the same port on the same day.

The company wants to perform maintenance on all the ships this month, via the following scheme. They want to truncate each ship's schedule: for each ship S_i, there will be some day when it arrives in its scheduled port and simply remains there for the rest of the month (for maintenance). This means that S_i will not visit the remaining ports on its schedule (if any) that month, but this is okay. So the truncation of S_i's schedule will simply consist of its original schedule up to a certain specified day on which it is in a port P; the remainder of the truncated schedule simply has it remain in port P.

Now the company's question to you is the following: Given the schedule for each ship, find a truncation of each so that condition (†) continues to hold: no two ships are ever in the same port on the same day.

Show that such a set of truncations can always be found, and give an algorithm to find them.
Example. Suppose we have two ships and two ports, and the "month" has four days. Suppose the first ship's schedule is

    port P1; at sea; port P2; at sea

and the second ship's schedule is

    at sea; port P1; at sea; port P2

Then the (only) way to choose truncations would be to have the first ship remain in port P2 starting on day 3, and have the second ship remain in port P1 starting on day 2.

7. Some of your friends are working for CluNet, a builder of large communication networks, and they are looking at algorithms for switching in a particular type of input/output crossbar.

Here is the setup. There are n input wires and n output wires, each directed from a source to a terminus. Each input wire meets each output wire in exactly one distinct point, at a special piece of hardware called a junction box. Points on the wire are naturally ordered in the direction from source to terminus; for two distinct points x and y on the same wire, we say that x is upstream from y if x is closer to the source than y, and otherwise we say x is downstream from y. The order in which one input wire meets the output wires is not necessarily the same as the order in which another input wire meets the output wires. (And similarly for the orders in which output wires meet input wires.) Figure 1.8 gives an example of such a collection of input and output wires.

Now, here's the switching component of this situation. Each input wire is carrying a distinct data stream, and this data stream must be switched onto one of the output wires. If the stream of Input i is switched onto Output j, at junction box B, then this stream passes through all junction boxes upstream from B on Input i, then through B, then through all junction boxes downstream from B on Output j. It does not matter which input data stream gets switched onto which output wire, but each input data stream must be switched onto a different output wire. Furthermore--and this is the tricky constraint--no two data streams can pass through the same junction box following the switching operation.

Finally, here's the problem. Show that for any specified pattern in which the input wires and output wires meet each other (each pair meeting exactly once), a valid switching of the data streams can always be found--one in which each input data stream is switched onto a different output, and no two of the resulting streams pass through the same junction box. Additionally, give an algorithm to find such a valid switching.

Figure 1.8  An example with two input wires and two output wires. Input 1 has its junction with Output 2 upstream from its junction with Output 1; Input 2 has its junction with Output 1 upstream from its junction with Output 2. A valid solution is to switch the data stream of Input 1 onto Output 2, and the data stream of Input 2 onto Output 1. On the other hand, if the stream of Input 1 were switched onto Output 1, and the stream of Input 2 were switched onto Output 2, then both streams would pass through the junction box at the meeting of Input 1 and Output 2--and this is not allowed.

8. For this problem, we will explore the issue of truthfulness in the Stable Matching Problem and specifically in the Gale-Shapley algorithm. The basic question is: Can a man or a woman end up better off by lying about his or her preferences? More concretely, we suppose each participant has a true preference order. Now consider a woman w. Suppose w prefers man m to m', but both m and m' are low on her list of preferences. Can it be the case that by switching the order of m and m' on her list of preferences (i.e., by falsely claiming that she prefers m' to m) and running the algorithm with this false preference list, w will end up with a man m" that she truly prefers to both m and m'? (We can ask the same question for men, but we will focus on the case of women for purposes of this question.)

Resolve this question by doing one of the following two things:

(a) Give a proof that, for any set of preference lists, switching the order of a pair on the list cannot improve a woman's partner in the Gale-Shapley algorithm; or
approaches we consider are drawn from fundamental issues that arise throughout computer science, and a general study of algorithms turns out to serve as a nice survey of computational ideas that arise in many areas.

Another property shared by many of the problems we study is their fundamentally discrete nature. That is, like the Stable Matching Problem, they will involve an implicit search over a large set of combinatorial possibilities; and the goal will be to efficiently find a solution that satisfies certain clearly delineated conditions.

As we seek to understand the general notion of computational efficiency, we will focus primarily on efficiency in running time: we want algorithms that run quickly. But it is important that algorithms be efficient in their use of other resources as well. In particular, the amount of space (or memory) used by an algorithm is an issue that will also arise at a number of points in the book, and we will see techniques for reducing the amount of space needed to perform a computation.

So what we could ask for is a concrete definition of efficiency that is platform-independent, instance-independent, and of predictive value with respect to increasing input sizes. Before focusing on any specific consequences of this claim, we can at least explore its implicit, high-level suggestion: that we need to take a more mathematical view of the situation.

We can use the Stable Matching Problem as an example to guide us. The input has a natural "size" parameter N; we could take this to be the total size of the representation of all preference lists, since this is what any algorithm for the problem will receive as input. N is closely related to the other natural parameter in this problem: n, the number of men and the number of women. Since there are 2n preference lists, each of length n, we can view N = 2n^2, suppressing more fine-grained details of how the data is represented. In considering the problem, we will seek to describe an algorithm at a high level, and then analyze its running time mathematically as a function of this input size N.

is by comparison with brute-force search over the search space of possible solutions.

Let's return to the example of the Stable Matching Problem. Even when the size of a Stable Matching input instance is relatively small, the search space it defines is enormous (there are n! possible perfect matchings between n men and n women), and we need to find a matching that is stable. The natural "brute-force" algorithm for this problem would plow through all perfect matchings by enumeration, checking each to see if it is stable. The surprising punchline, in a sense, to our solution of the Stable Matching Problem is that we needed to spend time proportional only to N in finding a stable matching from among this stupendously large space of possibilities. This was a conclusion we reached at an analytical level. We did not implement the algorithm and try it out on sample preference lists; we reasoned about it mathematically. Yet, at the same time, our analysis indicated how the algorithm could be implemented in practice and gave fairly conclusive evidence that it would be a big improvement over exhaustive enumeration.

This will be a common theme in most of the problems we study: a compact representation, implicitly specifying a giant search space. For most of these problems, there will be an obvious brute-force solution: try all possibilities and see if any one of them works. Not only is this approach almost always too slow to be useful, it is an intellectual cop-out; it provides us with absolutely no insight into the structure of the problem we are studying. And so if there is a common thread in the algorithms we emphasize in this book, it would be the following alternative definition of efficiency.

Proposed Definition of Efficiency (2): An algorithm is efficient if it achieves qualitatively better worst-case performance, at an analytical level, than brute-force search.

This will turn out to be a very useful working definition for us. Algorithms that improve substantially on brute-force search nearly always contain a valuable heuristic idea that makes them work; and they tell us something about the intrinsic structure, and computational tractability, of the underlying problem itself.

But if there is a problem with our second working definition, it is vagueness. What do we mean by "qualitatively better performance?" This suggests that we consider the actual running time of algorithms more carefully, and try to quantify what a reasonable running time would be.

Polynomial Time as a Definition of Efficiency

When people first began analyzing discrete algorithms mathematically--a thread of research that began gathering momentum through the 1960s--a consensus began to emerge on how to quantify the notion of a "reasonable" running time. Search spaces for natural combinatorial problems tend to grow exponentially in the size N of the input; if the input size increases by one, the number of possibilities increases multiplicatively. We'd like a good algorithm for such a problem to have a better scaling property: when the input size increases by a constant factor--say, a factor of 2--the algorithm should only slow down by some constant factor C.

Arithmetically, we can formulate this scaling behavior as follows. Suppose an algorithm has the following property: There are absolute constants c > 0 and d > 0 so that on every input instance of size N, its running time is bounded by cN^d primitive computational steps. (In other words, its running time is at most proportional to N^d.) For now, we will remain deliberately vague on what we mean by the notion of a "primitive computational step"--but it can be easily formalized in a model where each step corresponds to a single assembly-language instruction on a standard processor, or one line of a standard programming language such as C or Java. In any case, if this running-time bound holds, for some c and d, then we say that the algorithm has a polynomial running time, or that it is a polynomial-time algorithm. Note that any polynomial-time bound has the scaling property we're looking for. If the input size increases from N to 2N, the bound on the running time increases from cN^d to c(2N)^d = c · 2^d · N^d, which is a slow-down by a factor of 2^d. Since d is a constant, so is 2^d; of course, as one might expect, lower-degree polynomials exhibit better scaling behavior than higher-degree polynomials.
the following alternative definition of efficiency. From this notion, and the intuition expressed above, emerges our third
attempt at a working definition of efficiency.
Proposed Definition of Efficiency (2): An algorithm is efficient if it achieves
qualitatively better worst-case performance, at an analytical level, than
Proposed Definition of Efficiency (3)" An algorithm is efficient if it has a
brute-force search.
polynomial running time.
This will turn out to be a very usefu! working definition for us. Algorithms
that improve substantially on brute-force search nearly always contain a
Where our previous definition seemed overly vague, this one seems much
valuable heuristic idea that makes them work; and they tell us something
too prescriptive. Wouldn’t an algorithm with running time proportional to
about the intrinsic structure, and computational tractability, of the underlying
nl°°--and hence polynomial--be hopelessly inefficient? Wouldn’t we be rel-
problem itself.
atively pleased with a nonpolynomial running time of nl+’02(l°g n)? The an-
But if there is a problem with our second working definition, it is vague- swers are, of course, "yes" and "yes." And indeed, however, much one may
ness. What do we mean by "qualitatively better performance?" This suggests try to abstractly motivate the definition of efficiency in terms of polynomial
that we consider the actual running time of algorithms more carefully, and try time, a primary justification for it is this: It really works. Problems for which
to quantify what a reasonable running time would be. polynomial-time algorithms exist almost invariably turn out to have algorithms
with running times proportional to very moderately growing polynomials like
Polynomial Time as a Definition of Efficiency n, n log n, n2, or n3. Conversely, problems for which no polynomial-time al-
When people first began analyzing discrete algorithms mathematicalfy--a gorithm is known tend to be very difficult in practice. There are certainly
thread of research that began gathering momentum through the 1960s-- exceptions to this principle in both directions: there are cases, for example, in
which an algorithm with exponential worst-case behavior generally runs well on the kinds of instances that arise in practice; and there are also cases where the best polynomial-time algorithm for a problem is completely impractical due to large constants or a high exponent on the polynomial bound. All this serves to reinforce the point that our emphasis on worst-case, polynomial-time bounds is only an abstraction of practical situations. But overwhelmingly, the concrete mathematical definition of polynomial time has turned out to correspond surprisingly well in practice to what we observe about the efficiency of algorithms, and the tractability of problems, in real life.

One further reason why the mathematical formalism and the empirical evidence seem to line up well in the case of polynomial-time solvability is that the gulf between the growth rates of polynomial and exponential functions is enormous. Suppose, for example, that we have a processor that executes a million high-level instructions per second, and we have algorithms with running-time bounds of n, n log2 n, n^2, n^3, 1.5^n, 2^n, and n!. In Table 2.1, we show the running times of these algorithms (in seconds, minutes, days, or years) for inputs of size n = 10, 30, 50, 100, 1,000, 10,000, 100,000, and 1,000,000.

Table 2.1  The running times (rounded up) of different algorithms on inputs of increasing size, for a processor performing a million high-level instructions per second. In cases where the running time exceeds 10^25 years, we simply record the algorithm as taking a very long time.

                    n        n log2 n   n^2       n^3           1.5^n          2^n           n!
    n = 10          < 1 sec  < 1 sec    < 1 sec   < 1 sec       < 1 sec        < 1 sec       4 sec
    n = 30          < 1 sec  < 1 sec    < 1 sec   < 1 sec       < 1 sec        18 min        10^25 years
    n = 50          < 1 sec  < 1 sec    < 1 sec   < 1 sec       11 min         36 years      very long
    n = 100         < 1 sec  < 1 sec    < 1 sec   1 sec         12,892 years   10^17 years   very long
    n = 1,000       < 1 sec  < 1 sec    1 sec     18 min        very long      very long     very long
    n = 10,000      < 1 sec  < 1 sec    2 min     12 days       very long      very long     very long
    n = 100,000     < 1 sec  2 sec      3 hours   32 years      very long      very long     very long
    n = 1,000,000   1 sec    20 sec     12 days   31,710 years  very long      very long     very long
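The entries in Table 2.1 can be reproduced directly; here is a small illustrative Python sketch that computes the number of seconds each of a few of these bounds requires at a million steps per second (the set of bounds and sizes shown is just a sample).

    import math

    RATE = 1_000_000                     # primitive steps per second
    bounds = {
        "n log2 n": lambda n: n * math.log2(n),
        "n^2":      lambda n: n ** 2,
        "2^n":      lambda n: 2.0 ** n,
    }

    for n in (10, 30, 50, 100):
        row = {name: f(n) / RATE for name, f in bounds.items()}
        print(n, {name: f"{seconds:.3g} sec" for name, seconds in row.items()})
    # For n = 30, 2^n already takes about 1.07e3 sec (roughly 18 minutes),
    # while n^2 is still well under a millisecond.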
There is a final, fundamental benefit to making our definition of efficiency so specific: it becomes negatable. It becomes possible to express the notion that there is no efficient algorithm for a particular problem. In a sense, being able to do this is a prerequisite for turning our study of algorithms into good science, for it allows us to ask about the existence or nonexistence of efficient algorithms as a well-defined question. In contrast, both of our previous definitions were completely subjective, and hence limited the extent to which we could discuss certain issues in concrete terms.

In particular, the first of our definitions, which was tied to the specific implementation of an algorithm, turned efficiency into a moving target: as processor speeds increase, more and more algorithms fall under this notion of efficiency. Our definition in terms of polynomial time is much more an absolute notion; it is closely connected with the idea that each problem has an intrinsic level of computational tractability: some admit efficient solutions, and others do not.

2.2 Asymptotic Order of Growth

Our discussion of computational tractability has turned out to be intrinsically based on our ability to express the notion that an algorithm's worst-case running time on inputs of size n grows at a rate that is at most proportional to some function f(n). The function f(n) then becomes a bound on the running time of the algorithm. We now discuss a framework for talking about this concept.

We will mainly express algorithms in the pseudo-code style that we used for the Gale-Shapley algorithm. At times we will need to become more formal, but this style of specifying algorithms will be completely adequate for most purposes. When we provide a bound on the running time of an algorithm, we will generally be counting the number of such pseudo-code steps that are executed; in this context, one step will consist of assigning a value to a variable, looking up an entry in an array, following a pointer, or performing an arithmetic operation on a fixed-size integer.

When we seek to say something about the running time of an algorithm on inputs of size n, one thing we could aim for would be a very concrete statement such as, "On any input of size n, the algorithm runs for at most 1.62n^2 + 3.5n + 8 steps." This may be an interesting statement in some contexts, but as a general goal there are several things wrong with it. First, getting such a precise bound may be an exhausting activity, and more detail than we wanted anyway. Second, because our ultimate goal is to identify broad classes of algorithms that have similar behavior, we'd actually like to classify running times at a coarser level of granularity so that similarities among different algorithms, and among different problems, show up more clearly. And finally, extremely detailed statements about the number of steps an algorithm executes are often--in a strong sense--meaningless. As just discussed, we will generally be counting steps in a pseudo-code specification of an algorithm that resembles a high-level programming language. Each one of these steps will typically unfold into some fixed number of primitive steps when the program is compiled into
an intermediate representation, and then into some further number of steps depending on the particular architecture being used to do the computing. So the most we can safely say is that as we look at different levels of computational abstraction, the notion of a "step" may grow or shrink by a constant factor--for example, if it takes 25 low-level machine instructions to perform one operation in our high-level language, then our algorithm that took at most 1.62n^2 + 3.5n + 8 steps can also be viewed as taking 40.5n^2 + 87.5n + 200 steps when we analyze it at a level that is closer to the actual hardware.

O, Ω, and Θ

For all these reasons, we want to express the growth rate of running times and other functions in a way that is insensitive to constant factors and low-order terms. In other words, we'd like to be able to take a running time like the one we discussed above, 1.62n^2 + 3.5n + 8, and say that it grows like n^2, up to constant factors. We now discuss a precise way to do this.

Asymptotic Upper Bounds  Let T(n) be a function--say, the worst-case running time of a certain algorithm on an input of size n. (We will assume that all the functions we talk about here take nonnegative values.) Given another function f(n), we say that T(n) is O(f(n)) (read as "T(n) is order f(n)") if, for sufficiently large n, the function T(n) is bounded above by a constant multiple of f(n). We will also sometimes write this as T(n) = O(f(n)). More precisely, T(n) is O(f(n)) if there exist constants c > 0 and n_0 ≥ 0 so that for all n ≥ n_0, we have T(n) ≤ c · f(n). In this case, we will say that T is asymptotically upper-bounded by f. It is important to note that this definition requires a constant c to exist that works for all n; in particular, c cannot depend on n.

As an example of how this definition lets us express upper bounds on running times, consider an algorithm whose running time (as in the earlier discussion) has the form T(n) = pn^2 + qn + r for positive constants p, q, and r. We'd like to claim that any such function is O(n^2). To see why, we notice that for all n ≥ 1, we have qn ≤ qn^2, and r ≤ rn^2. So we can write

    T(n) = pn^2 + qn + r ≤ pn^2 + qn^2 + rn^2 = (p + q + r)n^2

for all n ≥ 1. This inequality is exactly what the definition of O(·) requires: T(n) ≤ cn^2, where c = p + q + r.

Note that O(·) expresses only an upper bound, not the exact growth rate of the function. For example, just as we claimed that the function T(n) = pn^2 + qn + r is O(n^2), it's also correct to say that it's O(n^3). Indeed, we just argued that T(n) ≤ (p + q + r)n^2, and since we also have n^2 ≤ n^3, we can conclude that T(n) ≤ (p + q + r)n^3 as the definition of O(n^3) requires. The fact that a function can have many upper bounds is not just a trick of the notation; it shows up in the analysis of running times as well. There are cases where an algorithm has been proved to have running time O(n^3); some years pass, people analyze the same algorithm more carefully, and they show that in fact its running time is O(n^2). There was nothing wrong with the first result; it was a correct upper bound. It's simply that it wasn't the "tightest" possible running time.

Asymptotic Lower Bounds  There is a complementary notation for lower bounds. Often when we analyze an algorithm--say we have just proven that its worst-case running time T(n) is O(n^2)--we want to show that this upper bound is the best one possible. To do this, we want to express the notion that for arbitrarily large input sizes n, the function T(n) is at least a constant multiple of some specific function f(n). (In this example, f(n) happens to be n^2.) Thus, we say that T(n) is Ω(f(n)) (also written T(n) = Ω(f(n))) if there exist constants ε > 0 and n_0 ≥ 0 so that for all n ≥ n_0, we have T(n) ≥ ε · f(n). By analogy with O(·) notation, we will refer to T in this case as being asymptotically lower-bounded by f. Again, note that the constant ε must be fixed, independent of n.

This definition works just like O(·), except that we are bounding the function T(n) from below, rather than from above. For example, returning to the function T(n) = pn^2 + qn + r, where p, q, and r are positive constants, let's claim that T(n) = Ω(n^2). Whereas establishing the upper bound involved "inflating" the terms in T(n) until it looked like a constant times n^2, now we need to do the opposite: we need to reduce the size of T(n) until it looks like a constant times n^2. It is not hard to do this; for all n ≥ 0, we have

    T(n) = pn^2 + qn + r ≥ pn^2,

which meets what is required by the definition of Ω(·) with ε = p > 0.

Just as we discussed the notion of "tighter" and "weaker" upper bounds, the same issue arises for lower bounds. For example, it is correct to say that our function T(n) = pn^2 + qn + r is Ω(n), since T(n) ≥ pn^2 ≥ pn.

Asymptotically Tight Bounds  If we can show that a running time T(n) is both O(f(n)) and also Ω(f(n)), then in a natural sense we've found the "right" bound: T(n) grows exactly like f(n) to within a constant factor. This, for example, is the conclusion we can draw from the fact that T(n) = pn^2 + qn + r is both O(n^2) and Ω(n^2).

There is a notation to express this: if a function T(n) is both O(f(n)) and Ω(f(n)), we say that T(n) is Θ(f(n)). In this case, we say that f(n) is an asymptotically tight bound for T(n). So, for example, our analysis above shows that T(n) = pn^2 + qn + r is Θ(n^2).
conclude that T(n) < (p + q + r)n~ as the definition of O(n~) requires. The
fact that a function can have many upper bounds is not just a trick of the Asymptotically tight bounds on worst-case running times are nice things
notation; it shows up in the analysis of running times as well. There are cases to find, since they characterize the worst-case performance of an algorithm
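As a quick, purely illustrative check of this definition, the following Python sketch (not from the text; the constants p, q, r are the example values used above) verifies numerically that the witnesses c = p + q + r and n0 = 1 do certify T(n) = O(n^2) over a range of inputs.

    # Illustrative sketch: checking the upper-bound definition numerically for
    # T(n) = p*n^2 + q*n + r with the witnesses c = p + q + r and n0 = 1.
    def T(n, p=1.62, q=3.5, r=8.0):
        return p * n**2 + q * n + r

    p, q, r = 1.62, 3.5, 8.0
    c = p + q + r          # the constant from the argument in the text
    n0 = 1                 # the threshold from the argument in the text

    # Every n >= n0 we try should satisfy T(n) <= c * n^2.
    assert all(T(n, p, q, r) <= c * n**2 for n in range(n0, 10000))
    print("T(n) <= (p+q+r) * n^2 holds for all tested n >= 1")

Of course, a finite check like this is not a proof; the inequality above is what actually establishes the bound for all n >= 1.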
Asymptotically tight bounds on worst-case running times are nice things to find, since they characterize the worst-case performance of an algorithm precisely up to constant factors. And as the definition of Θ(·) shows, one can obtain such bounds by closing the gap between an upper bound and a lower bound. For example, sometimes you will read a (slightly informally phrased) sentence such as "An upper bound of O(n^3) has been shown on the worst-case running time of the algorithm, but there is no example known on which the algorithm runs for more than Ω(n^2) steps." This is implicitly an invitation to search for an asymptotically tight bound on the algorithm's worst-case running time.

Sometimes one can also obtain an asymptotically tight bound directly by computing a limit as n goes to infinity. Essentially, if the ratio of functions f(n) and g(n) converges to a positive constant as n goes to infinity, then f(n) = Θ(g(n)).

(2.1) Let f and g be two functions such that

lim_{n→∞} f(n)/g(n)

exists and is equal to some number c > 0. Then f(n) = Θ(g(n)).

Proof. We will use the fact that the limit exists and is positive to show that f(n) = O(g(n)) and f(n) = Ω(g(n)), as required by the definition of Θ(·). Since

lim_{n→∞} f(n)/g(n) = c > 0,

it follows from the definition of a limit that there is some n0 beyond which the ratio is always between (1/2)c and 2c. Thus, f(n) <= 2c·g(n) for all n >= n0, which implies that f(n) = O(g(n)); and f(n) >= (1/2)c·g(n) for all n >= n0, which implies that f(n) = Ω(g(n)).

Properties of Asymptotic Growth Rates

Having seen the definitions of O, Ω, and Θ, it is useful to explore some of their basic properties.

Transitivity A first property is transitivity: if a function f is asymptotically upper-bounded by a function g, and if g in turn is asymptotically upper-bounded by a function h, then f is asymptotically upper-bounded by h. A similar property holds for lower bounds. We write this more precisely as follows.

(2.2)
(a) If f = O(g) and g = O(h), then f = O(h).
(b) If f = Ω(g) and g = Ω(h), then f = Ω(h).

Proof. We'll prove part (a) of this claim; the proof of part (b) is very similar. For (a), we're given that for some constants c and n0, we have f(n) <= cg(n) for all n >= n0. Also, for some (potentially different) constants c' and n0', we have g(n) <= c'h(n) for all n >= n0'. So consider any number n that is at least as large as both n0 and n0'. We have f(n) <= cg(n) <= cc'h(n), and so f(n) <= cc'h(n) for all n >= max(n0, n0'). This latter inequality is exactly what is required for showing that f = O(h).

Combining parts (a) and (b) of (2.2), we can obtain a similar result for asymptotically tight bounds. Suppose we know that f = Θ(g) and that g = Θ(h). Then since f = O(g) and g = O(h), we know from part (a) that f = O(h); since f = Ω(g) and g = Ω(h), we know from part (b) that f = Ω(h). It follows that f = Θ(h). Thus we have shown

(2.3) If f = Θ(g) and g = Θ(h), then f = Θ(h).

Sums of Functions It is also useful to have results that quantify the effect of adding two functions. First, if we have an asymptotic upper bound that applies to each of two functions f and g, then it applies to their sum.

(2.4) Suppose that f and g are two functions such that for some other function h, we have f = O(h) and g = O(h). Then f + g = O(h).

Proof. We're given that for some constants c and n0, we have f(n) <= ch(n) for all n >= n0. Also, for some (potentially different) constants c' and n0', we have g(n) <= c'h(n) for all n >= n0'. So consider any number n that is at least as large as both n0 and n0'. We have f(n) + g(n) <= ch(n) + c'h(n). Thus f(n) + g(n) <= (c + c')h(n) for all n >= max(n0, n0'), which is exactly what is required for showing that f + g = O(h).

There is a generalization of this to sums of a fixed constant number of functions k, where k may be larger than two. The result can be stated precisely as follows; we omit the proof, since it is essentially the same as the proof of (2.4), adapted to sums consisting of k terms rather than just two.

(2.5) Let k be a fixed constant, and let f1, f2, ..., fk and h be functions such that fi = O(h) for all i. Then f1 + f2 + ... + fk = O(h).

There is also a consequence of (2.4) that covers the following kind of situation. It frequently happens that we're analyzing an algorithm with two high-level parts, and it is easy to show that one of the two parts is slower than the other. We'd like to be able to say that the running time of the whole algorithm is asymptotically comparable to the running time of the slow part. Since the overall running time is a sum of two functions (the running times of the two parts), results on asymptotic bounds for sums of functions are directly relevant.
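The following small Python sketch (illustrative only; the functions f, g, h and their constants are arbitrary choices, not taken from the text) checks (2.4) numerically: if (c, n0) witnesses f = O(h) and (c', n0') witnesses g = O(h), then (c + c', max(n0, n0')) witnesses f + g = O(h).

    # Numerical sketch of (2.4) with explicit witnesses.
    def f(n): return 4 * n + 100        # f = O(n) with c = 5,  n0 = 100
    def g(n): return 2 * n + 7          # g = O(n) with c2 = 3, n02 = 7
    def h(n): return n

    c, n0 = 5, 100
    c2, n02 = 3, 7
    N0 = max(n0, n02)
    assert all(f(n) + g(n) <= (c + c2) * h(n) for n in range(N0, 5000))
    print("f(n) + g(n) <= (c + c') * h(n) for all tested n >= max(n0, n0')")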
(2.7) Let f be a polynomial of degree d, in which the coefficient a_d is positive. Then f = O(n^d).

Proof. We write f = a0 + a1·n + a2·n^2 + ... + ad·n^d, where ad > 0. The upper bound is a direct application of (2.5). First, notice that coefficients aj for j < d may be negative, but in any case we have aj·n^j <= |aj|·n^d for all n >= 1. Thus each term in the polynomial is O(n^d). Since f is a sum of a constant number of functions, each of which is O(n^d), it follows from (2.5) that f is O(n^d).

One can also show that under the conditions of (2.7), we have f = Ω(n^d), and hence it follows that in fact f = Θ(n^d).

Logarithms in different bases differ only by a constant factor, because of the identity

log_a n = log_b n / log_b a.

This equation explains why you'll often notice people writing bounds like O(log n) without indicating the base of the logarithm. This is not sloppy usage: the identity above says that log_a n = (1/log_b a) · log_b n, so the point is that log_a n = Θ(log_b n), and the base of the logarithm is not important when writing bounds using asymptotic notation.
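A one-line numerical illustration of the base-change identity, assuming only Python's standard math module; the bases 2 and 10 are arbitrary choices.

    import math

    # log_a(n) and log_b(n) differ only by the constant factor 1/log_b(a),
    # so the base is irrelevant inside O(.).
    a, b, n = 2, 10, 1_000_000
    ratio = math.log(n, a) / math.log(n, b)
    print(ratio, 1 / math.log(a, b))   # the two printed values agree (about 3.3219)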
Exponentials Exponential functions are functions of the form f(n) = r^n for some constant base r. Here we will be concerned with the case in which r > 1, which results in a very fast-growing function.

In particular, where polynomials raise n to a fixed exponent, exponentials raise a fixed number to n as a power; this leads to much faster rates of growth. One way to summarize the relationship between polynomials and exponentials is as follows.

(2.9) For every r > 1 and every d > 0, we have n^d = O(r^n).

In particular, every exponential grows faster than every polynomial. And as we saw in Table 2.1, when you plug in actual values of n, the differences in growth rates are really quite impressive.

Just as people write O(log n) without specifying the base, you'll also see people write "The running time of this algorithm is exponential," without specifying which exponential function they have in mind. Unlike the liberal use of log n, which is justified by ignoring constant factors, this generic use of the term "exponential" is somewhat sloppy. In particular, for different bases r > s > 1, it is never the case that r^n = Θ(s^n). Indeed, this would require that for some constant c > 0, we would have r^n <= c·s^n for all sufficiently large n. But rearranging this inequality would give (r/s)^n <= c for all sufficiently large n. Since r > s, the expression (r/s)^n is tending to infinity with n, and so it cannot possibly remain bounded by a fixed constant c.

So asymptotically speaking, exponential functions are all different. Still, it's usually clear what people intend when they inexactly write "The running time of this algorithm is exponential"--they typically mean that the running time grows at least as fast as some exponential function, and all exponentials grow so fast that we can effectively dismiss this algorithm without working out further details of the exact running time. This is not entirely fair. Occasionally there's more going on with an exponential algorithm than first appears, as we'll see, for example, in Chapter 10; but as we argued in the first section of this chapter, it's a reasonable rule of thumb.

Taken together, then, logarithms, polynomials, and exponentials serve as useful landmarks in the range of possible functions that you encounter when analyzing running times. Logarithms grow more slowly than polynomials, and polynomials grow more slowly than exponentials.

2.3 Implementing the Stable Matching Algorithm Using Lists and Arrays

We've now seen a general approach for expressing bounds on the running time of an algorithm. In order to asymptotically analyze the running time of an algorithm expressed in a high-level fashion--as we expressed the Gale-Shapley Stable Matching algorithm in Chapter 1, for example--one doesn't have to actually program, compile, and execute it, but one does have to think about how the data will be represented and manipulated in an implementation of the algorithm, so as to bound the number of computational steps it takes.

The implementation of basic algorithms using data structures is something that you probably have had some experience with. In this book, data structures will be covered in the context of implementing specific algorithms, and so we will encounter different data structures based on the needs of the algorithms we are developing. To get this process started, we consider an implementation of the Gale-Shapley Stable Matching algorithm; we showed earlier that the algorithm terminates in at most n^2 iterations, and our implementation here provides a corresponding worst-case running time of O(n^2), counting actual computational steps rather than simply the total number of iterations. To get such a bound for the Stable Matching algorithm, we will only need to use two of the simplest data structures: lists and arrays. Thus, our implementation also provides a good chance to review the use of these basic data structures as well.

In the Stable Matching Problem, each man and each woman has a ranking of all members of the opposite gender. The very first question we need to discuss is how such a ranking will be represented. Further, the algorithm maintains a matching and will need to know at each step which men and women are free, and who is matched with whom. In order to implement the algorithm, we need to decide which data structures we will use for all these things.

An important issue to note here is that the choice of data structure is up to the algorithm designer; for each algorithm we will choose data structures that make it efficient and easy to implement. In some cases, this may involve preprocessing the input to convert it from its given input representation into a data structure that is more appropriate for the problem being solved.

Arrays and Lists

To start our discussion we will focus on a single list, such as the list of women in order of preference by a single man. Maybe the simplest way to keep a list of n elements is to use an array A of length n, and have A[i] be the ith element of the list. Such an array is simple to implement in essentially all standard programming languages, and it has the following properties.

We can answer a query of the form "What is the ith element on the list?" in O(1) time, by a direct access to the value A[i].

If we want to determine whether a particular element e belongs to the list (i.e., whether it is equal to A[i] for some i), we need to check the elements one by one in O(n) time, assuming we don't know anything about the order in which the elements appear in A.
If the array elements are sorted in some clear way (either numerically or alphabetically), then we can determine whether an element e belongs to the list in O(log n) time using binary search; we will not need to use binary search for any part of our stable matching implementation, but we will have more to say about it in the next section.

(Figure: a doubly linked list shown before and after the deletion of an element e.)

This assumption (or notation) allows us to define an array indexed by all men or all women. We need to have a preference list for each man and for each woman. To do this we will have two arrays, one for women's preference lists and one for the men's preference lists; we will use ManPref[m, i] to denote the ith woman on man m's preference list, and similarly WomanPref[w, i] to be the ith man on the preference list of woman w. Note that the amount of space needed to give the preferences for all 2n individuals is O(n^2), as each person has a list of length n.

We need to consider each step of the algorithm and understand what data structure allows us to implement it efficiently. Essentially, we need to be able to do each of four things in constant time.

1. We need to be able to identify a free man.
2. We need, for a man m, to be able to identify the highest-ranked woman to whom he has not yet proposed.
3. For a woman w, we need to decide if w is currently engaged, and if she is, we need to identify her current partner.
4. For a woman w and two men m and m', we need to be able to decide, again in constant time, which of m or m' is preferred by w.

First, consider selecting a free man. We will do this by maintaining the set of free men as a linked list. When we need to select a free man, we take the first man m on this list. We delete m from the list if he becomes engaged, and possibly insert a different man m', if some other man m' becomes free. In this case, m' can be inserted at the front of the list, again in constant time.

Next, consider a man m. We need to identify the highest-ranked woman to whom he has not yet proposed. To do this we will need to maintain an extra array Next that indicates for each man m the position of the next woman he will propose to on his list. We initialize Next[m] = 1 for all men m. If a man m needs to propose to a woman, he'll propose to w = ManPref[m, Next[m]], and once he proposes to w, we increment the value of Next[m] by one, regardless of whether or not w accepts the proposal.

Now assume man m proposes to woman w; we need to be able to identify the man m' that w is engaged to (if there is such a man). We can do this by maintaining an array Current of length n, where Current[w] is the woman w's current partner m'. We set Current[w] to a special null symbol when we need to indicate that woman w is not currently engaged; at the start of the algorithm, Current[w] is initialized to this null symbol for all women w.

To sum up, the data structures we have set up thus far can implement the operations (1)-(3) in O(1) time each.

Maybe the trickiest question is how to maintain women's preferences to keep step (4) efficient. Consider a step of the algorithm, when man m proposes to a woman w. Assume w is already engaged, and her current partner is m' = Current[w]. We would like to decide in O(1) time if woman w prefers m or m'. Keeping the women's preferences in an array WomanPref, analogous to the one we used for men, does not work, as we would need to walk through w's list one by one, taking O(n) time to find m and m' on the list. While O(n) is still polynomial, we can do a lot better if we build an auxiliary data structure at the beginning.

At the start of the algorithm, we create an n x n array Ranking, where Ranking[w, m] contains the rank of man m in the sorted order of w's preferences. By a single pass through w's preference list, we can create this array in linear time for each woman, for a total initial time investment proportional to n^2. Then, to decide which of m or m' is preferred by w, we simply compare the values Ranking[w, m] and Ranking[w, m'].

This allows us to execute step (4) in constant time, and hence we have everything we need to obtain the desired running time.

(2.10) The data structures described above allow us to implement the G-S algorithm in O(n^2) time.
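The following Python sketch is one possible rendering of the bookkeeping just described (the free-men list, Next, Current, and Ranking); the function name, the 0-based indexing, and the tiny two-person example are illustrative choices, not the book's own code.

    from collections import deque

    def stable_matching(man_pref, woman_pref):
        # man_pref[m] and woman_pref[w] are 0-indexed preference lists.
        n = len(man_pref)

        # Ranking[w][m] = position of man m in woman w's list; built in O(n^2) total.
        ranking = [[0] * n for _ in range(n)]
        for w in range(n):
            for pos, m in enumerate(woman_pref[w]):
                ranking[w][m] = pos

        next_proposal = [0] * n       # Next[m]: index of the next woman on m's list
        current = [None] * n          # Current[w]: w's partner, or None if free
        free_men = deque(range(n))    # the free men, with the next one at the front

        while free_men:
            m = free_men[0]
            w = man_pref[m][next_proposal[m]]
            next_proposal[m] += 1
            if current[w] is None:                       # w is free: (m, w) engaged
                current[w] = m
                free_men.popleft()
            elif ranking[w][m] < ranking[w][current[w]]:  # w prefers m to her partner
                free_men.popleft()
                free_men.appendleft(current[w])          # her old partner is free again
                current[w] = m
            # otherwise w rejects m, and m stays at the front of the free list

        return {current[w]: w for w in range(n)}   # man -> woman

    # Tiny usage example with n = 2 (hypothetical preference lists).
    print(stable_matching([[0, 1], [0, 1]], [[0, 1], [1, 0]]))   # {0: 0, 1: 1}

Each man proposes at most n times and each proposal is handled in O(1) time, so the sketch runs in O(n^2) once the Ranking table has been built.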
2.4 A Survey of Common Running Times

When trying to analyze a new algorithm, it helps to have a rough sense of the "landscape" of different running times. Indeed, there are styles of analysis that recur frequently, and so when one sees running-time bounds like O(n), O(n log n), and O(n^2) appearing over and over, it's often for one of a very small number of distinct reasons. Learning to recognize these common styles of analysis is a long-term goal. To get things under way, we offer the following survey of common running-time bounds and some of the typical approaches that lead to them.

Earlier we discussed the notion that most problems have a natural "search space"--the set of all possible solutions--and we noted that a unifying theme in algorithm design is the search for algorithms whose performance is more efficient than a brute-force enumeration of this search space. In approaching a new problem, then, it often helps to think about two kinds of bounds: one on the running time you hope to achieve, and the other on the size of the problem's natural search space (and hence on the running time of a brute-force algorithm for the problem). The discussion of running times in this section will begin in many cases with an analysis of the brute-force algorithm, since it is a useful way to get one's bearings with respect to a problem; the task of improving on such algorithms will be our goal in most of the book.

Linear Time

An algorithm that runs in O(n), or linear, time has a very natural property: its running time is at most a constant factor times the size of the input. One basic way to get an algorithm with this running time is to process the input in a single pass, spending a constant amount of time on each item of input encountered. Other algorithms achieve a linear time bound for more subtle reasons. To illustrate some of the ideas here, we consider two simple linear-time algorithms as examples.

Computing the Maximum Computing the maximum of n numbers, for example, can be performed in the basic "one-pass" style. Suppose the numbers are provided as input in either a list or an array. We process the numbers a1, a2, ..., an in order, keeping a running estimate of the maximum as we go. Each time we encounter a number ai, we check whether ai is larger than our current estimate, and if so we update the estimate to ai.

max = a1
For i = 2 to n
   If ai > max then
      set max = ai
   Endif
Endfor

In this way, we do constant work per element, for a total running time of O(n).

Sometimes the constraints of an application force this kind of one-pass algorithm on you--for example, an algorithm running on a high-speed switch on the Internet may see a stream of packets flying past it, and it can try computing anything it wants to as this stream passes by, but it can only perform a constant amount of computational work on each packet, and it can't save the stream so as to make subsequent scans through it. Two different subareas of algorithms, online algorithms and data stream algorithms, have developed around models of computation with exactly this kind of one-pass restriction.

Merging Two Sorted Lists As our second example, suppose we are given two lists of n numbers each, a1, a2, ..., an and b1, b2, ..., bn, each already arranged in ascending order, and we'd like to merge them into a single list c1, c2, ..., c2n that is also arranged in ascending order. For example, merging the lists 2, 3, 11, 19 and 4, 9, 16, 25 results in the output 2, 3, 4, 9, 11, 16, 19, 25.

To do this, we could just throw the two lists together, ignore the fact that they're separately arranged in ascending order, and run a sorting algorithm. But this clearly seems wasteful; we'd like to make use of the existing order in the input. One way to think about designing a better algorithm is to imagine performing the merging of the two lists by hand: suppose you're given two piles of numbered cards, each arranged in ascending order, and you'd like to produce a single ordered pile containing all the cards. If you look at the top card on each stack, you know that the smaller of these two should go first on the output pile; so you could remove this card, place it on the output, and now iterate on what's left.

In other words, we have the following algorithm.

To merge sorted lists A = a1, ..., an and B = b1, ..., bn:
   Maintain a Current pointer into each list, initialized to
      point to the front elements
   While both lists are nonempty:
      Let ai and bj be the elements pointed to by the Current pointer
      Append the smaller of these two to the output list
      Advance the Current pointer in the list from which the
         smaller element was selected
   EndWhile
   Once one list is empty, append the remainder of the other list
      to the output

See Figure 2.2 for a picture of this process. (Figure 2.2: the two sorted lists, with the smaller of ai and bj appended to the output at each step.)
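Here is a small Python sketch of the two linear-time routines just described; the function names are illustrative, and the example inputs reuse the lists from the discussion above.

    def find_max(numbers):
        # One-pass maximum: constant work per element, O(n) total.
        current = numbers[0]
        for x in numbers[1:]:
            if x > current:
                current = x
        return current

    def merge_sorted(a, b):
        # Merge two ascending lists by repeatedly taking the smaller front element.
        i, j, out = 0, 0, []
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        out.extend(a[i:])      # one of these two slices is empty
        out.extend(b[j:])
        return out

    print(find_max([3, 9, 2, 8]))                        # 9
    print(merge_sorted([2, 3, 11, 19], [4, 9, 16, 25]))  # [2, 3, 4, 9, 11, 16, 19, 25]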
Now, to show a linear-time bound, one is tempted to describe an argument like what worked for the maximum-finding algorithm: "We do constant work per element, for a total running time of O(n)." But it is actually not true that we do only constant work per element. Suppose that n is an even number, and consider the lists A = 1, 3, 5, ..., 2n - 1 and B = n, n + 2, n + 4, ..., 3n - 2. The number b1 at the front of list B will sit at the front of the list for n/2 iterations while elements from A are repeatedly being selected, and hence it will be involved in Ω(n) comparisons. Now, it is true that each element can be involved in at most O(n) comparisons (at worst, it is compared with each element in the other list), and if we sum this over all elements we get a running-time bound of O(n^2). This is a correct bound, but we can show something much stronger.

The better way to argue is to bound the number of iterations of the While loop by an "accounting" scheme. Suppose we charge the cost of each iteration to the element that is selected and added to the output list. An element can be charged only once, since at the moment it is first charged, it is added to the output and never seen again by the algorithm. But there are only 2n elements total, and the cost of each iteration is accounted for by a charge to some element, so there can be at most 2n iterations. Each iteration involves a constant amount of work, so the total running time is O(n), as desired.

While this merging algorithm iterated through its input lists in order, the "interleaved" way in which it processed the lists necessitated a slightly subtle running-time analysis. In Chapter 3 we will see linear-time algorithms for graphs that have an even more complex flow of control: they spend a constant amount of time on each node and edge in the underlying graph, but the order in which they process the nodes and edges depends on the structure of the graph.

O(n log n) Time

O(n log n) is also a very common running time, and in Chapter 5 we will see one of the main reasons for its prevalence: it is the running time of any algorithm that splits its input into two equal-sized pieces, solves each piece recursively, and then combines the two solutions in linear time.

Sorting is perhaps the most well-known example of a problem that can be solved this way. Specifically, the Mergesort algorithm divides the set of input numbers into two equal-sized pieces, sorts each half recursively, and then merges the two sorted halves into a single sorted output list. We have just seen that the merging can be done in linear time; and Chapter 5 will discuss how to analyze the recursion so as to get a bound of O(n log n) on the overall running time.

One also frequently encounters O(n log n) as a running time simply because there are many algorithms whose most expensive step is to sort the input. For example, suppose we are given a set of n time-stamps x1, x2, ..., xn on which copies of a file arrived at a server, and we'd like to find the largest interval of time between the first and last of these time-stamps during which no copy of the file arrived. A simple solution to this problem is to first sort the time-stamps x1, x2, ..., xn and then process them in sorted order, determining the sizes of the gaps between each number and its successor in ascending order. The largest of these gaps is the desired subinterval. Note that this algorithm requires O(n log n) time to sort the numbers, and then it spends constant work on each number in ascending order. In other words, the remainder of the algorithm after sorting follows the basic recipe for linear time that we discussed earlier.
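A short Python sketch of the sort-then-scan approach just described; the function name and the sample time-stamps are illustrative.

    def largest_gap(timestamps):
        # Sort in O(n log n), then scan adjacent pairs in O(n) for the largest gap.
        xs = sorted(timestamps)
        best = None
        for prev, nxt in zip(xs, xs[1:]):
            gap = nxt - prev
            if best is None or gap > best[0]:
                best = (gap, prev, nxt)
        return best   # (length, start, end) of the largest empty interval

    print(largest_gap([5, 1, 9, 2]))   # (4, 5, 9)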
Quadratic Time

Here's a basic problem: suppose you are given n points in the plane, each specified by (x, y) coordinates, and you'd like to find the pair of points that are closest together. The natural brute-force algorithm for this problem would enumerate all pairs of points, compute the distance between each pair, and then choose the pair for which this distance is smallest.

What is the running time of this algorithm? The number of pairs of points is (n choose 2) = n(n - 1)/2, and since this quantity is bounded by (1/2)n^2, it is O(n^2). More crudely, the number of pairs is O(n^2) because we multiply the number of ways of choosing the first member of the pair (at most n) by the number of ways of choosing the second member of the pair (also at most n). The distance between points (xi, yi) and (xj, yj) can be computed by the formula sqrt((xi - xj)^2 + (yi - yj)^2) in constant time, so the overall running time is O(n^2). This example illustrates a very common way in which a running time of O(n^2) arises: performing a search over all pairs of input items and spending constant time per pair.

Quadratic time also arises naturally from a pair of nested loops: An algorithm consists of a loop with O(n) iterations, and each iteration of the loop launches an internal loop that takes O(n) time. Multiplying these two factors of n together gives the running time.

The brute-force algorithm for finding the closest pair of points can be written in an equivalent way with two nested loops:

For each input point (xi, yi)
   For each other input point (xj, yj)
      Compute distance d = sqrt((xi - xj)^2 + (yi - yj)^2)
      If d is less than the current minimum, update minimum to d
   Endfor
Endfor

Note how the "inner" loop, over (xj, yj), has O(n) iterations, each taking constant time; and the "outer" loop, over (xi, yi), has O(n) iterations, each invoking the inner loop once.

It's important to notice that the algorithm we've been discussing for the Closest-Pair Problem really is just the brute-force approach: the natural search space for this problem has size O(n^2), and we're simply enumerating it. At first, one feels there is a certain inevitability about this quadratic algorithm--we have to measure all the distances, don't we?--but in fact this is an illusion. In Chapter 5 we describe a very clever algorithm that finds the closest pair of points in the plane in only O(n log n) time, and in Chapter 13 we show how randomization can be used to reduce the running time to O(n).

Cubic Time

More elaborate sets of nested loops often lead to algorithms that run in O(n^3) time. Consider, for example, the following problem. We are given sets S1, S2, ..., Sn, each of which is a subset of {1, 2, ..., n}, and we would like to know whether some pair of these sets is disjoint--in other words, has no elements in common.

What is the running time needed to solve this problem? Let's suppose that each set Si is represented in such a way that the elements of Si can be listed in constant time per element, and we can also check in constant time whether a given number p belongs to Si. The following is a direct way to approach the problem.

For each pair of sets Si and Sj
   Determine whether Si and Sj have an element in common
Endfor

This is a concrete algorithm, but to reason about its running time it helps to open it up (at least conceptually) into three nested loops.

For each set Si
   For each other set Sj
      For each element p of Si
         Determine whether p also belongs to Sj
      Endfor
      If no element of Si belongs to Sj then
         Report that Si and Sj are disjoint
      Endif
   Endfor
Endfor

Each of the sets has maximum size O(n), so the innermost loop takes time O(n). Looping over the sets Sj involves O(n) iterations around this innermost loop; and looping over the sets Si involves O(n) iterations around this. Multiplying these three factors of n together, we get the running time of O(n^3).

For this problem, there are algorithms that improve on O(n^3) running time, but they are quite complicated. Furthermore, it is not clear whether the improved algorithms for this problem are practical on inputs of reasonable size.
O(n^k) Time

In the same way that we obtained a running time of O(n^2) by performing brute-force search over all pairs formed from a set of n items, we obtain a running time of O(n^k) for any constant k when we search over all subsets of size k.

Consider, for example, the problem of finding independent sets in a graph, which we discussed in Chapter 1. Recall that a set of nodes is independent if no two are joined by an edge. Suppose, in particular, that for some fixed constant k, we would like to know if a given n-node input graph G has an independent set of size k. The natural brute-force algorithm for this problem would enumerate all subsets of k nodes, and for each subset S it would check whether there is an edge joining any two members of S. That is,

For each subset S of k nodes
   Check whether S constitutes an independent set
   If S is an independent set then
      Stop and declare success
   Endif
Endfor
If no k-node independent set was found then
   Declare failure
Endif

To understand the running time of this algorithm, we need to consider two quantities. First, the total number of k-element subsets in an n-element set is

(n choose k) = n(n - 1)(n - 2) ... (n - k + 1) / (k(k - 1)(k - 2) ... (2)(1)) <= n^k / k!

Since we are treating k as a constant, this quantity is O(n^k). Thus, the outer loop in the algorithm above will run for O(n^k) iterations as it tries all k-node subsets of the n nodes of the graph.

Inside this loop, we need to test whether a given set S of k nodes constitutes an independent set. The definition of an independent set tells us that we need to check, for each pair of nodes, whether there is an edge joining them. Hence this is a search over pairs, like we saw earlier in the discussion of quadratic time; it requires looking at (k choose 2), that is, O(k^2), pairs and spending constant time on each.

Thus the total running time is O(k^2 n^k). Since we are treating k as a constant here, and since constants can be dropped in O(·) notation, we can write this running time as O(n^k).

Independent Set is a principal example of a problem believed to be computationally hard, and in particular it is believed that no algorithm to find k-node independent sets in arbitrary graphs can avoid having some dependence on k in the exponent. However, as we will discuss in Chapter 10 in the context of a related problem, even once we've conceded that brute-force search over k-element subsets is necessary, there can be different ways of going about this that lead to significant differences in the efficiency of the computation.

Beyond Polynomial Time

The previous example of the Independent Set Problem starts us rapidly down the path toward running times that grow faster than any polynomial. In particular, two kinds of bounds that come up very frequently are 2^n and n!, and we now discuss why this is so.

Suppose, for example, that we are given a graph and want to find an independent set of maximum size (rather than testing for the existence of one with a given number of nodes). Again, people don't know of algorithms that improve significantly on brute-force search, which in this case would look as follows.

For each subset S of nodes
   Check whether S constitutes an independent set
   If S is a larger independent set than the largest seen so far then
      Record the size of S as the current maximum
   Endif
Endfor

This is very much like the brute-force algorithm for k-node independent sets, except that now we are iterating over all subsets of the graph. The total number of subsets of an n-element set is 2^n, and so the outer loop in this algorithm will run for 2^n iterations as it tries all these subsets. Inside the loop, we are checking all pairs from a set S that can be as large as n nodes, so each iteration of the loop takes at most O(n^2) time. Multiplying these two together, we get a running time of O(n^2 2^n).

Thus we see that 2^n arises naturally as a running time for a search algorithm that must consider all subsets. In the case of Independent Set, something at least nearly this inefficient appears to be necessary; but it's important to keep in mind that 2^n is the size of the search space for many problems, and for many of them we will be able to find highly efficient polynomial-time algorithms. For example, a brute-force search algorithm for the Interval Scheduling Problem that we saw in Chapter 1 would look very similar to the algorithm above: try all subsets of intervals, and find the largest subset that has no overlaps. But in the case of the Interval Scheduling Problem, as opposed to the Independent Set Problem, we will see (in Chapter 4) how to find an optimal solution in O(n log n) time. This is a recurring kind of dichotomy in the study of algorithms: two algorithms can have very similar-looking search spaces, but in one case you're able to bypass the brute-force search algorithm, and in the other you aren't.

The function n! grows even more rapidly than 2^n, so it's even more menacing as a bound on the performance of an algorithm. Search spaces of size n! tend to arise for one of two reasons. First, n! is the number of ways to match up n items with n other items--for example, it is the number of possible perfect matchings of n men with n women in an instance of the Stable Matching Problem. To see this, note that there are n choices for how we can match up the first man; having eliminated this option, there are n - 1 choices for how we can match up the second man; having eliminated these two options, there are n - 2 choices for how we can match up the third man; and so forth. Multiplying all these choices out, we get n(n - 1)(n - 2) ... (2)(1) = n!

Despite this enormous set of possible solutions, we were able to solve the Stable Matching Problem in O(n^2) iterations of the proposal algorithm. In Chapter 7, we will see a similar phenomenon for the Bipartite Matching Problem we discussed earlier; if there are n nodes on each side of the given bipartite graph, there can be up to n! ways of pairing them up. However, by a fairly subtle search algorithm, we will be able to find the largest bipartite matching in O(n^3) time.

The function n! also arises in problems where the search space consists of all ways to arrange n items in order. A basic problem in this genre is the Traveling Salesman Problem: given a set of n cities, with distances between all pairs, what is the shortest tour that visits all cities? We assume that the salesman starts and ends at the first city, so the crux of the problem is the implicit search over all orders of the remaining n - 1 cities, leading to a search space of size (n - 1)!. In Chapter 8, we will see that Traveling Salesman is another problem that, like Independent Set, belongs to the class of NP-complete problems and is believed to have no efficient solution.
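A Python sketch of this kind of exhaustive subset search, applied to the maximum-independent-set version of the problem discussed above; itertools.combinations is used to enumerate subsets, and the 4-node example graph is an illustrative choice.

    from itertools import combinations

    def max_independent_set(n, edges):
        # Brute-force search over all 2^n subsets of the nodes, as in the
        # pseudocode above; each subset is checked in O(n^2) time,
        # for O(n^2 * 2^n) overall.
        edge_set = {frozenset(e) for e in edges}
        best = []
        for k in range(n + 1):
            for subset in combinations(range(n), k):
                if all(frozenset((u, v)) not in edge_set
                       for u, v in combinations(subset, 2)):
                    if len(subset) > len(best):
                        best = list(subset)
        return best

    # A 4-node path 0-1-2-3: a largest independent set has size 2, e.g. {0, 2}.
    print(max_independent_set(4, [(0, 1), (1, 2), (2, 3)]))   # [0, 2]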
Sublinear Time

Finally, there are cases where one encounters running times that are asymptotically smaller than linear. Since it takes linear time just to read the input, these situations tend to arise in a model of computation where the input can be "queried" indirectly rather than read completely, and the goal is to minimize the amount of querying that must be done.

Perhaps the best-known example of this is the binary search algorithm. Given a sorted array A of n numbers, we'd like to determine whether a given number p belongs to the array. We could do this by reading the entire array, but we'd like to do it much more efficiently, taking advantage of the fact that the array is sorted, by carefully probing particular entries. In particular, we probe the middle entry of A and get its value--say it is q--and we compare q to p. If q = p, we're done. If q > p, then in order for p to belong to the array A, it must lie in the lower half of A; so we ignore the upper half of A from now on and recursively apply this search in the lower half. Finally, if q < p, then we apply the analogous reasoning and recursively search in the upper half of A.

The point is that in each step, there's a region of A where p might possibly be; and we're shrinking the size of this region by a factor of two with every probe. So how large is the "active" region of A after k probes? It starts at size n, so after k probes it has size at most (1/2)^k n.

Given this, how long will it take for the size of the active region to be reduced to a constant? We need k to be large enough so that (1/2)^k = O(1/n), and to do this we can choose k = log2 n. Thus, when k = log2 n, the size of the active region has been reduced to a constant, at which point the recursion bottoms out and we can search the remainder of the array directly in constant time.

So the running time of binary search is O(log n), because of this successive shrinking of the search region. In general, O(log n) arises as a time bound whenever we're dealing with an algorithm that does a constant amount of work in order to throw away a constant fraction of the input. The crucial fact is that O(log n) such iterations suffice to shrink the input down to constant size, at which point the problem can generally be solved directly.
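A Python sketch of binary search as just described; the function name and the example array are illustrative.

    def binary_search(A, p):
        # Probe the middle of the active region and halve it each time:
        # O(log n) probes in total.
        lo, hi = 0, len(A) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if A[mid] == p:
                return mid
            elif A[mid] > p:
                hi = mid - 1      # p can only lie in the lower half
            else:
                lo = mid + 1      # p can only lie in the upper half
        return -1                 # p is not in the array

    print(binary_search([2, 3, 4, 9, 11, 16, 19, 25], 16))   # 5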
2.5 A More Complex Data Structure: Priority Queues

Our primary goal in this book was expressed at the outset of the chapter: we seek algorithms that improve qualitatively on brute-force search, and in general we use polynomial-time solvability as the concrete formulation of this. Typically, achieving a polynomial-time solution to a nontrivial problem is not something that depends on fine-grained implementation details; rather, the difference between exponential and polynomial is based on overcoming higher-level obstacles. Once one has an efficient algorithm to solve a problem, however, it is often possible to achieve further improvements in running time by being careful with the implementation details, and sometimes by using more complex data structures.

Some complex data structures are essentially tailored for use in a single kind of algorithm, while others are more generally applicable. In this section, we describe one of the most broadly useful sophisticated data structures, the priority queue. Priority queues will be useful when we describe how to implement some of the graph algorithms developed later in the book. For our purposes here, it is a useful illustration of the analysis of a data structure that, unlike lists and arrays, must perform some nontrivial processing each time it is invoked.

The Problem

In the implementation of the Stable Matching algorithm in Section 2.3, we discussed the need to maintain a dynamically changing set S (such as the set of all free men in that case). In such situations, we want to be able to add elements to and delete elements from the set S, and we want to be able to select an element from S when the algorithm calls for it. A priority queue is designed for applications in which elements have a priority value, or key, and each time we need to select an element from S, we want to take the one with highest priority.

A priority queue is a data structure that maintains a set of elements S, where each element v ∈ S has an associated value key(v) that denotes the priority of element v; smaller keys represent higher priorities. Priority queues support the addition and deletion of elements from the set, and also the selection of the element with smallest key. Our implementation of priority queues will also support some additional operations that we summarize at the end of the section.

A motivating application for priority queues, and one that is useful to keep in mind when considering their general function, is the problem of managing real-time events such as the scheduling of processes on a computer. Each process has a priority, or urgency, but processes do not arrive in order of their priorities. Rather, we have a current set of active processes, and we want to be able to extract the one with the currently highest priority and run it. We can maintain the set of processes in a priority queue, with the key of a process representing its priority value. Scheduling the highest-priority process corresponds to selecting the element with minimum key from the priority queue; concurrent with this, we will also be inserting new processes as they arrive, according to their priority values.

How efficiently do we hope to be able to execute the operations in a priority queue? We will show how to implement a priority queue containing at most n elements at any time so that elements can be added and deleted, and the element with minimum key selected, in O(log n) time per operation.

Before discussing the implementation, let us point out a very basic application of priority queues that highlights why O(log n) time per operation is essentially the "right" bound to aim for.

(2.11) A sequence of O(n) priority queue operations can be used to sort a set of n numbers.

Proof. Set up a priority queue H, and insert each number into H with its value as a key. Then extract the smallest number one by one until all numbers have been extracted; this way, the numbers will come out of the priority queue in sorted order.

Thus, with a priority queue that can perform insertion and the extraction of minima in O(log n) per operation, we can sort n numbers in O(n log n) time. It is known that, in a comparison-based model of computation (when each operation accesses the input only by comparing a pair of numbers), the time needed to sort must be at least proportional to n log n, so (2.11) highlights a sense in which O(log n) time per operation is the best we can hope for. We should note that the situation is a bit more complicated than this: implementations of priority queues more sophisticated than the one we present here can improve the running time needed for certain operations, and add extra functionality. But (2.11) shows that any sequence of priority queue operations that results in the sorting of n numbers must take time at least proportional to n log n in total.

A Data Structure for Implementing a Priority Queue

We will use a data structure called a heap to implement a priority queue. Before we discuss the structure of heaps, we should consider what happens with some simpler, more natural approaches to implementing the functions of a priority queue. We could just have the elements in a list, and separately have a pointer labeled Min to the one with minimum key. This makes adding new elements easy, but extraction of the minimum hard. Specifically, finding the minimum is quick--we just consult the Min pointer--but after removing this minimum element, we need to update the Min pointer to be ready for the next operation, and this would require a scan of all elements in O(n) time to find the new minimum.

This complication suggests that we should perhaps maintain the elements in the sorted order of the keys. This makes it easy to extract the element with smallest key, but now how do we add a new element to our set? Should we have the elements in an array, or a linked list? Suppose we want to add s with key value key(s). If the set S is maintained as a sorted array, we can use binary search to find the array position where s should be inserted in O(log n) time, but to insert s in the array, we would have to move all later elements one position to the right. This would take O(n) time. On the other hand, if we maintain the set as a sorted doubly linked list, we could insert it in O(1) time into any position, but the doubly linked list would not support binary search, and hence we may need up to O(n) time to find the position where s should be inserted.

The Definition of a Heap So in all these simple approaches, at least one of the operations can take up to O(n) time--much more than the O(log n) per operation that we're hoping for. This is where heaps come in. The heap data structure combines the benefits of a sorted array and list for purposes of this application. Conceptually, we think of a heap as a balanced binary tree as shown on the left of Figure 2.3. The tree will have a root, and each node can have up to two children, a left and a right child. The keys in such a binary tree are said to be in heap order if the key of any element is at least as large as the key of the element at its parent node in the tree. In other words,

Heap order: For every element v, at a node i, the element w at i's parent satisfies key(w) <= key(v).

In Figure 2.3 the numbers in the nodes are the keys of the corresponding elements.
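As a small illustration of the sorting application in (2.11), the following Python sketch uses the standard heapq module as a stand-in for the heap-based priority queue developed in this section; the function name and sample input are illustrative.

    import heapq

    def pq_sort(numbers):
        # Sorting with a priority queue, as in (2.11): insert everything,
        # then repeatedly extract the minimum.
        h = []
        for x in numbers:
            heapq.heappush(h, x)          # O(log n) per insertion
        return [heapq.heappop(h) for _ in range(len(numbers))]   # O(log n) per extraction

    print(pq_sort([5, 1, 9, 2]))   # [1, 2, 5, 9]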
Before we discuss how to work with a heap, we need to consider what data structure should be used to represent it. We can use pointers: each node of the heap could keep the element it stores, its key, and three pointers pointing to the two children and the parent of the heap node. We can avoid using pointers, however, if a bound N is known in advance on the total number of elements that will ever be in the heap at any one time. Such heaps can be maintained in an array H indexed by i = 1, ..., N. We will think of the heap nodes as corresponding to the positions in this array. H[1] is the root, and for any node at position i, the children are the nodes at positions leftChild(i) = 2i and rightChild(i) = 2i + 1. So the two children of the root are at positions 2 and 3, and the parent of a node at position i is at position parent(i) = ⌊i/2⌋. If the heap has n < N elements at some time, we will use the first n positions of the array to store the n heap elements, and use length(H) to denote the number of elements in H. This representation keeps the heap balanced at all times. See the right-hand side of Figure 2.3 for the array representation of the heap on the left-hand side.

(Figure 2.3 Values in a heap shown as a binary tree on the left, and represented as the array 1 2 5 10 3 7 11 15 17 20 9 15 8 16 on the right. The arrows show the children for the top three nodes in the tree; each node's key is at least as large as its parent's.)

Implementing the Heap Operations

The heap element with smallest key is at the root, so it takes O(1) time to identify the minimal element. How do we add or delete heap elements? First consider adding a new heap element v, and assume that our heap H has n < N elements so far. Now it will have n + 1 elements. To start with, we can add the new element v to the final position i = n + 1, by setting H[i] = v. Unfortunately, this does not maintain the heap property, as the key of element v may be smaller than the key of its parent. So we now have something that is almost a heap, except for a small "damaged" part where v was pasted on at the end.

We will use the procedure Heapify-up to fix our heap. Let j = parent(i) = ⌊i/2⌋ be the parent of the node i, and assume H[j] = w. If key[v] < key[w], then we will simply swap the positions of v and w. This will fix the heap property at position i, but the resulting structure will possibly fail to satisfy the heap property at position j--in other words, the site of the "damage" has moved upward from i to j. We thus call the process recursively from position j = parent(i) to continue fixing the heap by pushing the damaged part upward. Figure 2.4 shows the first two steps of the process after an insertion.

(Figure 2.4 The Heapify-up process, which moves element v toward the root. Key 3 (at position 16) is too small (on the left). After swapping keys 3 and 11, the heap violation moves one step closer to the root of the tree (on the right).)

Heapify-up(H, i):
   If i > 1 then
      let j = parent(i) = ⌊i/2⌋
      If key[H[i]] < key[H[j]] then
         swap the array entries H[i] and H[j]
         Heapify-up(H, j)
      Endif
   Endif

To see why Heapify-up works, eventually restoring the heap order, it helps to understand more fully the structure of our slightly damaged heap in the middle of this process. Assume that H is an array, and v is the element in position i. We say that H is almost a heap with the key of H[i] too small, if there is a value α >= key(v) such that raising the value of key(v) to α would make the resulting array satisfy the heap property. (In other words, element v in H[i] is too small, but raising it to α would fix the problem.) One important point to note is that if H is almost a heap with the key of the root (i.e., H[1]) too small, then in fact it is a heap. To see why this is true, consider that if raising the value of H[1] to α would make H a heap, then the value of H[1] must also be smaller than both its children, and hence it already has the heap-order property.
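A Python sketch of Heapify-up on the array representation just described; the 1-indexing convention, the helper name, and the sample heap (whose keys mirror the array in Figure 2.3) are illustrative choices rather than the book's code.

    def heapify_up(H, i):
        # H is 1-indexed to match the text: H[0] is unused, H[1] is the root.
        # If H is almost a heap with the key of H[i] too small, repeatedly swap
        # H[i] with its parent until the heap order is restored.
        while i > 1:
            j = i // 2                    # parent(i) = floor(i/2)
            if H[i] < H[j]:
                H[i], H[j] = H[j], H[i]
                i = j                     # the "damage" has moved up to position j
            else:
                break

    # Insert key 3 at the end of a heap and bubble it up (values are the keys).
    H = [None, 1, 2, 5, 10, 3, 7, 11, 15, 17, 20, 9, 15, 8, 16]
    H.append(3)
    heapify_up(H, len(H) - 1)
    print(H[1])   # 1: the minimum key is still at the root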
Now, the function f3 isn't so hard to deal with. It starts out smaller than 10^n, but once n >= 10, then clearly 10^n <= n^n. This is exactly what we need for the definition of O(·) notation: for all n >= 10, we have 10^n <= c·n^n, where in this case c = 1, and so 10^n = O(n^n).

Finally, we come to function f5, which is admittedly kind of strange-looking. A useful rule of thumb in such situations is to try taking logarithms to see whether this makes things clearer. In this case, log2 f5(n) = sqrt(log2 n) = (log2 n)^(1/2). What do the logarithms of the other functions look like? log f4(n) = log2 log2 n, while log f2(n) = (1/3) log2 n. All of these can be viewed as functions of log2 n, and so using the notation z = log2 n, we can write

log f2(n) = (1/3)z
log f4(n) = log2 z
log f5(n) = z^(1/2)

Now it's easier to see what's going on. First, for z >= 16, we have log2 z <= z^(1/2). But the condition z >= 16 is the same as n >= 2^16 = 65,536; thus once n >= 2^16 we have log f4(n) <= log f5(n), and so f4(n) <= f5(n). Thus we can write f4(n) = O(f5(n)). Similarly we have z^(1/2) <= (1/3)z once z >= 9--in other words, once n >= 2^9 = 512. For n above this bound we have log f5(n) <= log f2(n) and hence f5(n) <= f2(n), and so we can write f5(n) = O(f2(n)). Essentially, we have discovered that 2^sqrt(log2 n) is a function whose growth rate lies somewhere between that of logarithms and polynomials.

Since we have sandwiched f5 between f4 and f2, this finishes the task of putting the functions in order.

Solved Exercise 2

Let f and g be two functions that take nonnegative values, and suppose that f = O(g). Show that g = Ω(f).

Solution This exercise is a way to formalize the intuition that O(·) and Ω(·) are in a sense opposites. It is, in fact, not difficult to prove; it is just a matter of unwinding the definitions.

We're given that, for some constants c and n0, we have f(n) <= cg(n) for all n >= n0. Dividing both sides by c, we can conclude that g(n) >= (1/c)·f(n) for all n >= n0. But this is exactly what is required to show that g = Ω(f): we have established that g(n) is at least a constant multiple of f(n) (where the constant is 1/c), for all sufficiently large n (at least n0).

Exercises

1. Suppose you have algorithms with the five running times listed below. (Assume these are the exact running times.) How much slower do each of these algorithms get when you (a) double the input size, or (b) increase the input size by one?
(a) n^2
(b) n^3
(c) 100n^2
(d) n log n
(e) 2^n

2. Suppose you have algorithms with the six running times listed below. (Assume these are the exact number of operations performed as a function of the input size n.) Suppose you have a computer that can perform 10^10 operations per second, and you need to compute a result in at most an hour of computation. For each of the algorithms, what is the largest input size n for which you would be able to get the result within an hour?
(a) n^2
(b) n^3
(c) 100n^2
(d) n log n
(e) 2^n
(f) 2^(2^n)

3. Take the following list of functions and arrange them in ascending order of growth rate. That is, if function g(n) immediately follows function f(n) in your list, then it should be the case that f(n) is O(g(n)).
f1(n) = n
f3(n) = n + 10
f4(n) = 10^n
f5(n) = 100^n
f6(n) = n^2 log n

4. Take the following list of functions and arrange them in ascending order of growth rate. That is, if function g(n) immediately follows function f(n) in your list, then it should be the case that f(n) is O(g(n)).
~ gl(a) = 2~°4i~ the relevant entries of the array B, filling in a value for each--it
contains some highly urmecessary sources of inefficiency. Give a
" g2(n) = 2n
different algorithm to solve this problem, with an asymptotically
i g4(n) ---- n4/3 better nmning time. In other words, you should design an algorithm
g3(n) = n(log n)3 with running time O(g(n)), where lim~_.oo g(n)/f(n) = O.
gs(n) = nlogn
There’s a class of folk songs and holiday songs in which each verse
g6(n) = 22n
consists of the previous verse, with one extra line added on. "The Twelve
i gT(n) = 2n2 Days of Christmas" has this property; for e.xample, when you get to the
fifth verse, you sing about the five golden rings and then, reprising the
Assume you have functions f and g such that f(n) is O(g(n)). For each of lines from the fourth verse, also cover the four calling birds, the three
the following statements, decide whether you think it is true or false and French hens, the two turtle doves, and of course the.partridge in the’pear
give a proof or counterexample. tree. The Aramaic song "Had gadya" from the PassoVer Haggadah works
(a) log2 f(n)’is O(log2g(n))- like this as well, as do many other songs.
(b) 2f(n) is O(2g(~)). These songs tend to last a long time, despite having relatively short
(C) /(n)2 iS O(g(n)2). scripts. In particular, you can convey the words plus instructions for one
of these songs by specifying just the new line that is added In each verse,
Consider the following basic problem. You’re given an array A consisting without ha~4ng to write out all the previous lines each time. (So the phrase
A[n]. You’d like to output a two-dimensional "five golden rings" ouly has to be written once, even though it will appear
n-by-n array B in which B[i,j] (for i <j) contains the sum of array entries in verses five and Onward.)
A[i] through A~]--that is, the sum A[i] +A[i + 1] +-" + A[j]. (The value of There’s someth~g asy~nptotic that can be analyzed here. Suppose,
array entry B[i,j] is left unspecified whenever i >_j, so it doesn’t matter for concreteness, that ~ach line has a length that i~ bounded by a constant
what is output for these values.) c, and suppose that the song, when sung out loud, runs for n words total.
Here’s a simple algorithm to solve this problem. Show how to encode such a song using a script that has length f(n), for
a function f(n) that grows as slowly as possible.
For i=1, 2,...,n
n
You’re doing some stress-testing on various models of glass jars to
Add up array entries A[i] through A[j] determine the height from which they can be dropped and still not break.
Store the result in B[i,]] The setup for this experiment, on a particular type of jar, is as follows.
End/or You have a ladder with n rungs, and you want to find the highest rung
End/or
from which you can drop a copy of the jar and not have it break..We ca~,
this the highest safe rung.
(a) For some function f that you should choose, give a bound of the
form O(f(n)) on the running time of this algorithm on an input of It might be natural to try binary search: drop a jar from the middle
size n (i.e., a bound on the number of operations performed by the rung, see if it breaks, and then recursively try from rung n/4 or 3n/4
algorithm). depending on the outcome. But this has the drawback that y9u could
break a lot of jars in finding the answer.
(b) For this same function f, show that thertmning time of the algorithm
on an input of size n is also ~2 (f(n)). (This shows an asymptotically If your primary goal were to conserve jars, on the other hand, you
tight bound of ®(f(n)) on the running time.) could try the following strategy. Start by dropping a jar from the first
rung, then the second rung, and so forth, climbing one higher each time
(c) Although the algorithm you analyzed in parts (a) and (b) is the most until the jar breaks. In this way, you only need a single j ar--at the moment
natural way to solve the problem--after all, it just iterates through
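For readers who want to experiment with this exercise, here is a direct Python transcription of the simple algorithm above; the function name and the use of a dictionary for B are incidental choices made for this sketch, not part of the exercise.

    def partial_sums_naive(A):
        """Fill B[i][j] (for i < j) with A[i] + A[i+1] + ... + A[j].

        A direct transcription of the nested-loop algorithm above,
        using 1-based indices i, j as in the exercise statement.
        """
        n = len(A)
        B = {}                          # B[(i, j)] holds the sum for i < j
        for i in range(1, n + 1):
            for j in range(i + 1, n + 1):
                # Add up array entries A[i] through A[j] from scratch.
                total = 0
                for k in range(i, j + 1):
                    total += A[k - 1]   # A is 0-based in Python
                B[(i, j)] = total
        return B

    # Example: entries A[1..4] = 1, 2, 3, 4; B[(1, 3)] should be 1 + 2 + 3 = 6.
    print(partial_sums_naive([1, 2, 3, 4])[(1, 3)])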
There's a class of folk songs and holiday songs in which each verse consists of the previous verse, with one extra line added on. "The Twelve Days of Christmas" has this property; for example, when you get to the fifth verse, you sing about the five golden rings and then, reprising the lines from the fourth verse, also cover the four calling birds, the three French hens, the two turtle doves, and of course the partridge in the pear tree. The Aramaic song "Had gadya" from the Passover Haggadah works like this as well, as do many other songs.
These songs tend to last a long time, despite having relatively short scripts. In particular, you can convey the words plus instructions for one of these songs by specifying just the new line that is added in each verse, without having to write out all the previous lines each time. (So the phrase "five golden rings" only has to be written once, even though it will appear in verses five and onward.)
There's something asymptotic that can be analyzed here. Suppose, for concreteness, that each line has a length that is bounded by a constant c, and suppose that the song, when sung out loud, runs for n words total. Show how to encode such a song using a script that has length f(n), for a function f(n) that grows as slowly as possible.

You're doing some stress-testing on various models of glass jars to determine the height from which they can be dropped and still not break. The setup for this experiment, on a particular type of jar, is as follows. You have a ladder with n rungs, and you want to find the highest rung from which you can drop a copy of the jar and not have it break. We call this the highest safe rung.
It might be natural to try binary search: drop a jar from the middle rung, see if it breaks, and then recursively try from rung n/4 or 3n/4 depending on the outcome. But this has the drawback that you could break a lot of jars in finding the answer.
If your primary goal were to conserve jars, on the other hand, you could try the following strategy. Start by dropping a jar from the first rung, then the second rung, and so forth, climbing one higher each time until the jar breaks. In this way, you only need a single jar--at the moment it breaks, you have the correct answer--but you may have to drop it n times (rather than log n as in the binary search solution).
So here is the trade-off: it seems you can perform fewer drops if you're willing to break more jars. To understand better how this trade-off works at a quantitative level, let's consider how to run this experiment given a fixed "budget" of k ≥ 1 jars. In other words, you have to determine the correct answer--the highest safe rung--and can use at most k jars in doing so.
(a) Suppose you are given a budget of k = 2 jars. Describe a strategy for finding the highest safe rung that requires you to drop a jar at most f(n) times, for some function f(n) that grows slower than linearly. (In other words, it should be the case that lim_{n→∞} f(n)/n = 0.)
(b) Now suppose you have a budget of k > 2 jars, for some given k. Describe a strategy for finding the highest safe rung using at most k jars. If fk(n) denotes the number of times you need to drop a jar according to your strategy, then the functions f1, f2, f3, ... should have the property that each grows asymptotically slower than the previous one: lim_{n→∞} fk(n)/f_{k-1}(n) = 0 for each k.

Notes and Further Reading
... cost spanning trees, and we will discuss randomized hashing in Chapter 13. A number of other data structures are discussed in the book by Tarjan (1983). The LEDA library (Library of Efficient Datatypes and Algorithms) of Mehlhorn and Näher (1999) offers an extensive library of data structures useful in combinatorial and geometric applications.
Notes on the Exercises Exercise 8 is based on a problem we learned from Sam Toueg.
Chapter 3 Graphs

Our focus in this book is on problems with a discrete flavor. Just as continuous mathematics is concerned with certain basic structures such as real numbers, vectors, and matrices, discrete mathematics has developed basic combinatorial structures that lie at the heart of the subject. One of the most fundamental and expressive of these is the graph.
The more one works with graphs, the more one tends to see them everywhere. Thus, we begin by introducing the basic definitions surrounding graphs, and list a spectrum of different algorithmic settings where graphs arise naturally. We then discuss some basic algorithmic primitives for graphs, beginning with the problem of connectivity and developing some fundamental graph search techniques.
When we want to emphasize that the graph we are considering is not directed, we will call it an undirected graph; by default, however, the term "graph" will mean an undirected graph. It is also worth mentioning two warnings in our use of graph terminology. First, although an edge e in an undirected graph should properly be written as a set of nodes {u, v}, one will more often see it written (even in this book) in the notation used for ordered pairs: e = (u, v). Second, a node in a graph is also frequently called a vertex; in this context, the two words have exactly the same meaning.

Examples of Graphs Graphs are very simple to define: we just take a collection of things and join some of them by edges. But at this level of abstraction, it's hard to appreciate the typical kinds of situations in which they arise. Thus, we propose the following list of specific contexts in which graphs serve as important models. The list covers a lot of ground, and it's not important to remember everything on it; rather, it will provide us with a lot of useful examples against which to check the basic definitions and algorithmic problems that we'll be encountering later in the chapter. Also, in going through the list, it's useful to digest the meaning of the nodes and the meaning of the edges in the context of the application. In some cases the nodes and edges both correspond to physical objects in the real world, in others the nodes are real objects while the edges are virtual, and in still others both nodes and edges are pure abstractions.

1. Transportation networks. The map of routes served by an airline carrier naturally forms a graph: the nodes are airports, and there is an edge from u to v if there is a nonstop flight that departs from u and arrives at v. Described this way, the graph is directed; but in practice when there is an edge (u, v), there is almost always an edge (v, u), so we would not lose much by treating the airline route map as an undirected graph with edges joining pairs of airports that have nonstop flights each way. Looking at such a graph (you can generally find them depicted in the backs of in-flight airline magazines), we'd quickly notice a few things: there are often a small number of hubs with a very large number of incident edges; and it's possible to get between any two nodes in the graph via a very small number of intermediate stops.
Other transportation networks can be modeled in a similar way. For example, we could take a rail network and have a node for each terminal, and an edge joining u and v if there's a section of railway track that goes between them without stopping at any intermediate terminal. The standard depiction of the subway map in a major city is a drawing of such a graph.

2. Communication networks. A collection of computers connected via a communication network can be naturally modeled as a graph in a few different ways. First, we could have a node for each computer and an edge joining u and v if there is a direct physical link connecting them. Alternatively, for studying the large-scale structure of the Internet, people often define a node to be the set of all machines controlled by a single Internet service provider, with an edge joining u and v if there is a direct peering relationship between them--roughly, an agreement to exchange data under the standard BGP protocol that governs global Internet routing. Note that this latter network is more "virtual" than the former, since the links indicate a formal agreement in addition to a physical connection.
In studying wireless networks, one typically defines a graph where the nodes are computing devices situated at locations in physical space, and there is an edge from u to v if v is close enough to u to receive a signal from it. Note that it's often useful to view such a graph as directed, since it may be the case that v can hear u's signal but u cannot hear v's signal (if, for example, u has a stronger transmitter). These graphs are also interesting from a geometric perspective, since they roughly correspond to putting down points in the plane and then joining pairs that are close together.

3. Information networks. The World Wide Web can be naturally viewed as a directed graph, in which nodes correspond to Web pages and there is an edge from u to v if u has a hyperlink to v. The directedness of the graph is crucial here; many pages, for example, link to popular news sites, but these sites clearly do not reciprocate all these links. The structure of all these hyperlinks can be used by algorithms to try inferring the most important pages on the Web, a technique employed by most current search engines.
The hypertextual structure of the Web is anticipated by a number of information networks that predate the Internet by many decades. These include the network of cross-references among articles in an encyclopedia or other reference work, and the network of bibliographic citations among scientific papers.

4. Social networks. Given any collection of people who interact (the employees of a company, the students in a high school, or the residents of a small town), we can define a network whose nodes are people, with an edge joining u and v if they are friends with one another. We could have the edges mean a number of different things instead of friendship: the undirected edge (u, v) could mean that u and v have had a romantic relationship or a financial relationship; the directed edge (u, v) could mean that u seeks advice from v, or that u lists v in his or her e-mail address book. One can also imagine bipartite social networks based on a
(3.3) For each j ≥ 1, layer Lj produced by BFS consists of all nodes at distance exactly j from s. There is a path from s to t if and only if t appears in some layer.

A further property of breadth-first search is that it produces, in a very natural way, a tree T rooted at s on the set of nodes reachable from s. Specifically, for each such node v (other than s), consider the moment when v is first "discovered" by the BFS algorithm; this happens when some node u in layer Lj is being examined, and we find that it has an edge to the previously unseen node v. At this moment, we add the edge (u, v) to the tree T--u becomes the parent of v, representing the fact that u is "responsible" for completing the path to v. We call the tree T that is produced in this way a breadth-first search tree.
Figure 3.3 depicts the construction of a BFS tree rooted at node 1 for the graph in Figure 3.2. The solid edges are the edges of T; the dotted edges are edges of G that do not belong to T. The execution of BFS that produces this tree can be described as follows.
(a) Starting from node 1, layer L1 consists of the nodes {2, 3}.
(b) Layer L2 is then grown by considering the nodes in layer L1 in order (say, first 2, then 3). Thus we discover nodes 4 and 5 as soon as we look at 2, so 2 becomes their parent. When we consider node 2, we also discover an edge to 3, but this isn't added to the BFS tree, since we already know about node 3. We first discover nodes 7 and 8 when we look at node 3. On the other hand, the edge from 3 to 5 is another edge of G that does not end up in the BFS tree, because by the time we look at this edge out of node 3, we already know about node 5.
(c) We then consider the nodes in layer L2 in order, but the only new node discovered when we look through L2 is node 6, which is added to layer L3. Note that the edges (4, 5) and (7, 8) don't get added to the BFS tree, because they don't result in the discovery of new nodes.
(d) No new nodes are discovered when node 6 is examined, so nothing is put in layer L4, and the algorithm terminates. The full BFS tree is depicted in Figure 3.3(c).

We notice that as we ran BFS on this graph, the nontree edges all either connected nodes in the same layer, or connected nodes in adjacent layers. We now prove that this is a property of BFS trees in general.

(3.4) Let T be a breadth-first search tree, let x and y be nodes in T belonging to layers Li and Lj respectively, and let (x, y) be an edge of G. Then i and j differ by at most 1.

Proof. Suppose by way of contradiction that i and j differed by more than 1; in particular, suppose i < j - 1. Now consider the point in the BFS algorithm when the edges incident to x were being examined. Since x belongs to layer Li, the only nodes discovered from x belong to layers Li+1 and earlier; hence, if y is a neighbor of x, then it should have been discovered by this point at the latest and hence should belong to layer Li+1 or earlier. ∎
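To make the layer structure concrete, here is a small Python sketch (not from the text) that computes BFS layers and parent pointers, and then checks the claim of (3.4) that every edge of G joins nodes whose layers differ by at most 1. The graph format (a dictionary of adjacency lists) is an assumption made for the example.

    from collections import deque

    def bfs_layers(adj, s):
        """Return (layer, parent) dictionaries for a BFS from s.

        adj maps each node to a list of its neighbors (undirected graph).
        layer[v] is the distance from s; parent[v] records the BFS tree edge.
        """
        layer = {s: 0}
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in layer:          # v is discovered for the first time
                    layer[v] = layer[u] + 1
                    parent[v] = u           # edge (u, v) joins the BFS tree
                    queue.append(v)
        return layer, parent

    # The graph of Figure 3.2 is not reproduced here; a small example instead.
    adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}
    layer, parent = bfs_layers(adj, 1)
    # Property (3.4): every edge joins nodes in the same or adjacent layers.
    assert all(abs(layer[u] - layer[v]) <= 1 for u in adj for v in adj[u])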
Exploring a Connected Component
The set of nodes discovered by the BFS algorithm is precisely those reachable from the starting node s. We will refer to this set R as the connected component of G containing s; and once we know the connected component containing s, we can simply check whether t belongs to it so as to answer the question of s-t connectivity.
Now, if one thinks about it, it's clear that BFS is just one possible way to produce this component. At a more general level, we can build the component R by "exploring" G in any order, starting from s. To start off, we define R = {s}. Then at any point in time, if we find an edge (u, v) where u ∈ R and v ∉ R, we can add v to R. Indeed, if there is a path P from s to u, then there is a path from s to v obtained by first following P and then following the edge (u, v). Figure 3.4 illustrates this basic step in growing the component R.

Figure 3.4 When growing the connected component containing s, we look for nodes like v that have not yet been visited.

Suppose we continue growing the set R until there are no more edges leading out of R; in other words, we run the following algorithm.

   R will consist of nodes to which s has a path
   Initially R = {s}
   While there is an edge (u, v) where u ∈ R and v ∉ R
      Add v to R
   Endwhile

Here is the key property of this algorithm.

(3.5) The set R produced at the end of the algorithm is precisely the connected component of G containing s.

Proof. We have already argued that for any node v ∈ R, there is a path from s to v.
Now, consider a node w ∉ R, and suppose by way of contradiction that there is an s-w path P in G. Since s ∈ R but w ∉ R, there must be a first node v on P that does not belong to R; and this node v is not equal to s. Thus there is a node u immediately preceding v on P, so (u, v) is an edge. Moreover, since v is the first node on P that does not belong to R, we must have u ∈ R. It follows that (u, v) is an edge where u ∈ R and v ∉ R; this contradicts the stopping rule for the algorithm. ∎

For any node t in the component R, observe that it is easy to recover the actual path from s to t along the lines of the argument above: we simply record, for each node v, the edge (u, v) that was considered in the iteration in which v was added to R. Then, by tracing these edges backward from t, we proceed through a sequence of nodes that were added in earlier and earlier iterations, eventually reaching s; this defines an s-t path.
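As a rough illustration (not from the text), the following Python sketch grows R exactly as described, picking an arbitrary edge leading out of R at each step and recording, for each newly added node, the edge used to reach it; tracing these records backward then recovers an s-t path. The function names are invented for this example.

    def grow_component(adj, s):
        """Grow the connected component R of s, recording discovery edges."""
        R = {s}
        via = {s: None}                      # via[v] = node u such that edge (u, v) added v to R
        while True:
            frontier = [(u, v) for u in R for v in adj[u] if v not in R]
            if not frontier:                 # no more edges leading out of R
                return R, via
            u, v = frontier[0]               # any such edge will do
            R.add(v)
            via[v] = u

    def recover_path(via, s, t):
        """Trace the recorded edges backward from t to s."""
        path = [t]
        while path[-1] != s:
            path.append(via[path[-1]])
        return list(reversed(path))

    adj = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2], 5: [6], 6: [5]}
    R, via = grow_component(adj, 1)          # R = {1, 2, 3, 4}; nodes 5 and 6 are unreachable
    print(recover_path(via, 1, 4))           # e.g. [1, 2, 4]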
To conclude, we notice that the general algorithm we have defined to grow R is underspecified: how do we decide which edge to consider next? The BFS algorithm arises, in particular, as a particular way of ordering the nodes we visit--in successive layers, based on their distance from s. But there are other natural ways to grow the component, several of which lead to efficient algorithms for the connectivity problem while producing search patterns with different structures. We now go on to discuss a different one of these algorithms, depth-first search, and develop some of its basic properties.

Depth-First Search
Another natural method to find the nodes reachable from s is the approach you might take if the graph G were truly a maze of interconnected rooms and you were walking around in it. You'd start from s and try the first edge leading out of it, to a node v. You'd then follow the first edge leading out of v, and continue in this way until you reached a "dead end"--a node for which you had already explored all its neighbors. You'd then backtrack until you got to a node with an unexplored neighbor, and resume from there. We call this algorithm depth-first search (DFS), since it explores G by going as deeply as possible and only retreating when necessary.
DFS is also a particular implementation of the generic component-growing algorithm that we introduced earlier. It is most easily described in recursive form: we can invoke DFS from any starting point but maintain global knowledge of which nodes have already been explored.
   DFS(u):
      Mark u as "Explored" and add u to R
      For each edge (u, v) incident to u
         If v is not marked "Explored" then
            Recursively invoke DFS(v)
         Endif
      Endfor

To apply this to s-t connectivity, we simply declare all nodes initially to be not explored, and invoke DFS(s).
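A direct recursive rendering in Python might look as follows; this is a sketch with invented names, and a shared `explored` set stands in for the "Explored" marks. For very large components one would raise Python's recursion limit or switch to the stack-based version discussed later in this chapter.

    def dfs(adj, u, explored=None):
        """Recursive DFS: returns the set R of nodes reachable from u."""
        if explored is None:
            explored = set()
        explored.add(u)                  # mark u as "Explored" and add it to R
        for v in adj[u]:
            if v not in explored:
                dfs(adj, v, explored)    # recursively invoke DFS(v)
        return explored

    adj = {1: [2, 3], 2: [1, 3, 4, 5], 3: [1, 2], 4: [2, 5], 5: [2, 4]}
    print(dfs(adj, 1))                   # to decide s-t connectivity, check whether t is in this set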
There are some fundamental similarities and some fundamental differences between DFS and BFS. The similarities are based on the fact that they both build the connected component containing s, and we will see in the next section that they achieve qualitatively similar levels of efficiency.
While DFS ultimately visits exactly the same set of nodes as BFS, it typically does so in a very different order; it probes its way down long paths, potentially getting very far from s, before backing up to try nearer unexplored nodes. We can see a reflection of this difference in the fact that, like BFS, the DFS algorithm yields a natural rooted tree T on the component containing s, but the tree will generally have a very different structure. We make s the root of the tree T, and make u the parent of v when u is responsible for the discovery of v. That is, whenever DFS(v) is invoked directly during the call to DFS(u), we add the edge (u, v) to T. The resulting tree is called a depth-first search tree of the component R.
Figure 3.5 depicts the construction of a DFS tree rooted at node 1 for the graph in Figure 3.2. The solid edges are the edges of T; the dotted edges are edges of G that do not belong to T. The execution of DFS begins by building a path on nodes 1, 2, 3, 5, 4. The execution reaches a dead end at 4, since there are no new nodes to find, and so it "backs up" to 5, finds node 6, backs up again to 3, and finds nodes 7 and 8. At this point there are no new nodes to find in the connected component, so all the pending recursive DFS calls terminate, one by one, and the execution comes to an end. The full DFS tree is depicted in Figure 3.5(g).

Figure 3.5 The construction of a depth-first search tree T for the graph in Figure 3.2, with (a) through (g) depicting the nodes as they are discovered in sequence. The solid edges are the edges of T; the dotted edges are edges of G that do not belong to T.

This example suggests the characteristic way in which DFS trees look different from BFS trees. Rather than having root-to-leaf paths that are as short as possible, they tend to be quite narrow and deep. However, as in the case of BFS, we can say something quite strong about the way in which nontree edges of G must be arranged relative to the edges of a DFS tree T: as in the figure, nontree edges can only connect ancestors of T to descendants.
To establish this, we first observe the following property of the DFS algorithm and the tree that it produces.

(3.6) For a given recursive call DFS(u), all nodes that are marked "Explored" between the invocation and end of this recursive call are descendants of u in T.

Using (3.6), we prove

(3.7) Let T be a depth-first search tree, let x and y be nodes in T, and let (x, y) be an edge of G that is not an edge of T. Then one of x or y is an ancestor of the other.
Proof. Suppose that (x, y) is an edge of G that is not an edge of T, and suppose without loss of generality that x is reached first by the DFS algorithm. When the edge (x, y) is examined during the execution of DFS(x), it is not added to T because y is marked "Explored." Since y was not marked "Explored" when DFS(x) was first invoked, it is a node that was discovered between the invocation and end of the recursive call DFS(x). It follows from (3.6) that y is a descendant of x. ∎

The Set of All Connected Components
So far we have been talking about the connected component containing a particular node s. But there is a connected component associated with each node in the graph. What is the relationship between these components?
In fact, this relationship is highly structured and is expressed in the following claim.

(3.8) For any two nodes s and t in a graph, their connected components are either identical or disjoint.

This is a statement that is very clear intuitively, if one looks at a graph like the example in Figure 3.2. The graph is divided into multiple pieces with no edges between them; the largest piece is the connected component of nodes 1 through 8, the medium piece is the connected component of nodes 11, 12, and 13, and the smallest piece is the connected component of nodes 9 and 10. To prove the statement in general, we just need to show how to define these "pieces" precisely for an arbitrary graph.

Proof. Consider any two nodes s and t in a graph G with the property that there is a path between s and t. We claim that the connected components containing s and t are the same set. Indeed, for any node v in the component of s, the node v must also be reachable from t by a path: we can just walk from t to s, and then on from s to v. The same reasoning works with the roles of s and t reversed, and so a node is in the component of one if and only if it is in the component of the other.
On the other hand, if there is no path between s and t, then there cannot be a node v that is in the connected component of each. For if there were such a node v, then we could walk from s to v and then on to t, constructing a path between s and t. Thus, if there is no path between s and t, then their connected components are disjoint. ∎

This proof suggests a natural algorithm for producing all the connected components of a graph, by growing them one component at a time. We start with an arbitrary node s, and we use BFS (or DFS) to generate its connected component. We then find a node v (if any) that was not visited by the search from s, and iterate, using BFS starting from v, to generate its connected component--which, by (3.8), will be disjoint from the component of s. We continue in this way until all nodes have been visited.
3.3 Implementing Graph Traversal Using Queues and Stacks
So far we have been discussing basic algorithmic primitives for working with graphs without mentioning any implementation details. Here we discuss how to use lists and arrays to represent graphs, and we discuss the trade-offs between the different representations. Then we use these data structures to implement the graph traversal algorithms breadth-first search (BFS) and depth-first search (DFS) efficiently. We will see that BFS and DFS differ essentially only in that one uses a queue and the other uses a stack, two simple data structures that we will describe later in this section.

Representing Graphs
There are two basic ways to represent graphs: by an adjacency matrix and by an adjacency list representation. Throughout the book we will use the adjacency list representation. We start, however, by reviewing both of these representations and discussing the trade-offs between them.
A graph G = (V, E) has two natural input parameters, the number of nodes |V| and the number of edges |E|. We will use n = |V| and m = |E| to denote these, respectively. Running times will be given in terms of both of these two parameters. As usual, we will aim for polynomial running times, and lower-degree polynomials are better. However, with two parameters in the running time, the comparison is not always so clear. Is O(m^2) or O(n^3) a better running time? This depends on what the relation is between n and m. With at most one edge between any pair of nodes, the number of edges m can be at most (n choose 2) ≤ n^2. On the other hand, in many applications the graphs of interest are connected, and by (3.1), connected graphs must have at least m ≥ n - 1 edges. But these comparisons do not always tell us which of two running times (such as m^2 and n^3) is better, so we will tend to keep the running times in terms of both of these parameters. In this section we aim to implement the basic graph search algorithms in time O(m + n). We will refer to this as linear time, since it takes O(m + n) time simply to read the input. Note that when we work with connected graphs, a running time of O(m + n) is the same as O(m), since m ≥ n - 1.
Consider a graph G = (V, E) with n nodes, and assume the set of nodes is V = {1, ..., n}. The simplest way to represent a graph is by an adjacency
matrix, which is an n × n matrix A where A[u, v] is equal to 1 if the graph contains the edge (u, v) and 0 otherwise. If the graph is undirected, the matrix A is symmetric, with A[u, v] = A[v, u] for all nodes u, v ∈ V. The adjacency matrix representation allows us to check in O(1) time if a given edge (u, v) is present in the graph. However, the representation has two basic disadvantages.
- The representation takes Θ(n^2) space. When the graph has many fewer edges than n^2, more compact representations are possible.
- Many graph algorithms need to examine all edges incident to a given node v. In the adjacency matrix representation, doing this involves considering all other nodes w, and checking the matrix entry A[v, w] to see whether the edge (v, w) is present--and this takes Θ(n) time. In the worst case, v may have Θ(n) incident edges, in which case checking all these edges will take Θ(n) time regardless of the representation. But many graphs in practice have significantly fewer edges incident to most nodes, and so it would be good to be able to find all these incident edges more efficiently.

The representation of graphs used throughout the book is the adjacency list, which works better for sparse graphs--that is, those with many fewer than n^2 edges. In the adjacency list representation there is a record for each node v, containing a list of the nodes to which v has edges. To be precise, we have an array Adj, where Adj[v] is a record containing a list of all nodes adjacent to node v. For an undirected graph G = (V, E), each edge e = (v, w) ∈ E occurs on two adjacency lists: node w appears on the list for node v, and node v appears on the list for node w.
Let's compare the adjacency matrix and adjacency list representations. First consider the space required by the representation. An adjacency matrix requires O(n^2) space, since it uses an n × n matrix. In contrast, we claim that the adjacency list representation requires only O(m + n) space. Here is why. First, we need an array of pointers of length n to set up the lists in Adj, and then we need space for all the lists. Now, the lengths of these lists may differ from node to node, but we argued in the previous paragraph that overall, each edge e = (v, w) appears in exactly two of the lists: the one for v and the one for w. Thus the total length of all lists is 2m = O(m).
Another (essentially equivalent) way to justify this bound is as follows. We define the degree n_v of a node v to be the number of incident edges it has. The length of the list at Adj[v] is n_v, so the total length over all nodes is O(Σ_{v∈V} n_v). Now, the sum of the degrees in a graph is a quantity that often comes up in the analysis of graph algorithms, so it is useful to work out what this sum is.

(3.9) Σ_{v∈V} n_v = 2m.

Proof. Each edge e = (u, w) contributes exactly twice to this sum: once in the quantity n_u and once in the quantity n_w. Since the sum is the total of the contributions of each edge, it is 2m. ∎

We sum up the comparison between adjacency matrices and adjacency lists as follows.

(3.10) The adjacency matrix representation of a graph requires O(n^2) space, while the adjacency list representation requires only O(m + n) space.

Since we have already argued that m ≤ n^2, the bound O(m + n) is never worse than O(n^2); and it is much better when the underlying graph is sparse, with m much smaller than n^2.
Now we consider the ease of accessing the information stored in these two different representations. Recall that in an adjacency matrix we can check in O(1) time if a particular edge (u, v) is present in the graph. In the adjacency list representation, this can take time proportional to the degree n_u: we have to follow the pointers on u's adjacency list to see if node v occurs on the list. On the other hand, if the algorithm is currently looking at a node u, it can read the list of neighbors in constant time per neighbor.
In view of this, the adjacency list is a natural representation for exploring graphs. If the algorithm is currently looking at a node u, it can read this list of neighbors in constant time per neighbor; move to a neighbor v once it encounters it on this list in constant time; and then be ready to read the list associated with node v. The list representation thus corresponds to a physical notion of "exploring" the graph, in which you learn the neighbors of a node u once you arrive at u, and can read them off in constant time per neighbor.
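For concreteness, here is a small Python sketch (not from the text) that builds an adjacency list from a list of undirected edges; note how each edge contributes one entry to each endpoint's list, so the total list length is 2m, in line with (3.9).

    def build_adjacency_list(n, edges):
        """Build Adj for an undirected graph on nodes 1..n from a list of edges."""
        adj = {v: [] for v in range(1, n + 1)}
        for v, w in edges:
            adj[v].append(w)     # w appears on the list for v ...
            adj[w].append(v)     # ... and v appears on the list for w
        return adj

    edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 5)]
    adj = build_adjacency_list(5, edges)
    degrees = {v: len(adj[v]) for v in adj}
    assert sum(degrees.values()) == 2 * len(edges)    # the degree sum is 2m, as in (3.9)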
Queues and Stacks
Many algorithms have an inner step in which they need to process a set of elements, such as the set of all edges adjacent to a node in a graph, the set of visited nodes in BFS and DFS, or the set of all free men in the Stable Matching algorithm. For this purpose, it is natural to maintain the set of elements to be considered in a linked list, as we have done for maintaining the set of free men in the Stable Matching algorithm.
One important issue that arises is the order in which to consider the elements in such a list. In the Stable Matching algorithm, the order in which we considered the free men did not affect the outcome, although this required a fairly subtle proof to verify. In many other algorithms, such as DFS and BFS, the order in which elements are considered is crucial.
Two of the simplest and most natural options are to maintain a set of elements as either a queue or a stack. A queue is a set from which we extract elements in first-in, first-out (FIFO) order: we select elements in the same order in which they were added. A stack is a set from which we extract elements in last-in, first-out (LIFO) order: each time we select an element, we choose the one that was added most recently. Both queues and stacks can be easily implemented via a doubly linked list. In both cases, we always select the first element on our list; the difference is in where we insert a new element. In a queue a new element is added to the end of the list as the last element, while in a stack a new element is placed in the first position on the list. Recall that a doubly linked list has explicit First and Last pointers to the beginning and end, respectively, so each of these insertions can be done in constant time.
Next we will discuss how to implement the search algorithms of the previous section in linear time. We will see that BFS can be thought of as using a queue to select which node to consider next, while DFS is effectively using a stack.

Implementing Breadth-First Search
The adjacency list data structure is ideal for implementing breadth-first search. The algorithm examines the edges leaving a given node one by one. When we are scanning the edges leaving u and come to an edge (u, v), we need to know whether or not node v has been previously discovered by the search. To make this simple, we maintain an array Discovered of length n and set Discovered[v] = true as soon as our search first sees v. The algorithm, as described in the previous section, constructs layers of nodes L1, L2, ..., where Li is the set of nodes at distance i from the source s. To maintain the nodes in a layer Li, we have a list L[i] for each i = 0, 1, 2, ...

   BFS(s):
      Set Discovered[s] = true and Discovered[v] = false for all other v
      Initialize L[0] to consist of the single element s
      Set the layer counter i = 0
      Set the current BFS tree T = ∅
      While L[i] is not empty
         Initialize an empty list L[i+1]
         For each node u ∈ L[i]
            Consider each edge (u, v) incident to u
            If Discovered[v] = false then
               Set Discovered[v] = true
               Add edge (u, v) to the tree T
               Add v to the list L[i+1]
            Endif
         Endfor
         Increment the layer counter i by one
      Endwhile

In this implementation it does not matter whether we manage each list L[i] as a queue or a stack, since the algorithm is allowed to consider the nodes in a layer Li in any order.

(3.11) The above implementation of the BFS algorithm runs in time O(m + n) (i.e., linear in the input size), if the graph is given by the adjacency list representation.

Proof. As a first step, it is easy to bound the running time of the algorithm by O(n^2) (a weaker bound than our claimed O(m + n)). To see this, note that there are at most n lists L[i] that we need to set up, so this takes O(n) time. Now we need to consider the nodes u on these lists. Each node occurs on at most one list, so the For loop runs at most n times over all iterations of the While loop. When we consider a node u, we need to look through all edges (u, v) incident to u. There can be at most n such edges, and we spend O(1) time considering each edge. So the total time spent on one iteration of the For loop is at most O(n). We've thus concluded that there are at most n iterations of the For loop, and that each iteration takes at most O(n) time, so the total time is at most O(n^2).
To get the improved O(m + n) time bound, we need to observe that the For loop processing a node u can take less than O(n) time if u has only a few neighbors. As before, let n_u denote the degree of node u, the number of edges incident to u. Now, the time spent in the For loop considering edges incident to node u is O(n_u), so the total over all nodes is O(Σ_{u∈V} n_u). Recall from (3.9) that Σ_{u∈V} n_u = 2m, and so the total time spent considering edges over the whole algorithm is O(m). We need O(n) additional time to set up lists and manage the array Discovered. So the total time spent is O(m + n) as claimed. ∎

We described the algorithm using up to n separate lists L[i] for each layer Li. Instead of all these distinct lists, we can implement the algorithm using a single list L that we maintain as a queue. In this way, the algorithm processes nodes in the order they are first discovered: each time a node is discovered, it is added to the end of the queue, and the algorithm always processes the edges out of the node that is currently first in the queue.
If we maintain the discovered nodes in this order, then all nodes in layer Li will appear in the queue ahead of all nodes in layer Li+1, for i = 0, 1, 2, .... Thus, all nodes in layer Li will be considered in a contiguous sequence, followed by all nodes in layer Li+1, and so forth. Hence this implementation in terms of a single queue will produce the same result as the BFS implementation above.
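A single-queue version is easy to express in Python using collections.deque; the sketch below (with invented names) also records the BFS tree via parent pointers, and discovers nodes in the same layer-by-layer order as the implementation above.

    from collections import deque

    def bfs(adj, s):
        """Single-queue BFS from s; returns the BFS tree as a parent dictionary."""
        discovered = {s}                 # plays the role of the array Discovered
        parent = {s: None}               # tree edges: (parent[v], v) for every v other than s
        queue = deque([s])
        while queue:
            u = queue.popleft()          # the node currently first in the queue
            for v in adj[u]:
                if v not in discovered:
                    discovered.add(v)
                    parent[v] = u        # add edge (u, v) to the tree T
                    queue.append(v)      # v goes to the end of the queue
        return parent

    adj = {1: [2, 3], 2: [1, 4, 5], 3: [1, 5], 4: [2], 5: [2, 3]}
    print(bfs(adj, 1))                   # {1: None, 2: 1, 3: 1, 4: 2, 5: 2}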
Implementing Depth-First Search
We now consider the depth-first search algorithm. In the previous section we presented DFS as a recursive procedure, which is a natural way to specify it. However, it can also be viewed as almost identical to BFS, with the difference that it maintains the nodes to be processed in a stack, rather than in a queue. Essentially, the recursive structure of DFS can be viewed as pushing nodes onto a stack for later processing, while moving on to more freshly discovered nodes. We now show how to implement DFS by maintaining this stack of nodes to be processed explicitly.
In both BFS and DFS, there is a distinction between the act of discovering a node v--the first time it is seen, when the algorithm finds an edge leading to v--and the act of exploring a node v, when all the incident edges to v are scanned, resulting in the potential discovery of further nodes. The difference between BFS and DFS lies in the way in which discovery and exploration are interleaved.
In BFS, once we started to explore a node u in layer Li, we added all its newly discovered neighbors to the next layer Li+1, and we deferred actually exploring these neighbors until we got to the processing of layer Li+1. In contrast, DFS is more impulsive: when it explores a node u, it scans the neighbors of u until it finds the first not-yet-explored node v (if any), and then it immediately shifts attention to exploring v.
To implement the exploration strategy of DFS, we first add all of the nodes adjacent to u to our list of nodes to be considered, but after doing this we proceed to explore a new neighbor v of u. As we explore v, in turn, we add the neighbors of v to the list we're maintaining, but we do so in stack order, so that these neighbors will be explored before we return to explore the other neighbors of u. We only come back to other nodes adjacent to u when there are no other nodes left.
In addition, we use an array Explored analogous to the Discovered array we used for BFS. The difference is that we only set Explored[v] to be true when we scan v's incident edges (when the DFS search is at v), while BFS sets Discovered[v] to true as soon as v is first discovered. The implementation in full looks as follows.

   DFS(s):
      Initialize S to be a stack with one element s
      While S is not empty
         Take a node u from S
         If Explored[u] = false then
            Set Explored[u] = true
            For each edge (u, v) incident to u
               Add v to the stack S
            Endfor
         Endif
      Endwhile

There is one final wrinkle to mention. Depth-first search is underspecified, since the adjacency list of a node being explored can be processed in any order. Note that the above algorithm, because it pushes all adjacent nodes onto the stack before considering any of them, in fact processes each adjacency list in the reverse order relative to the recursive version of DFS in the previous section.

(3.12) The above algorithm implements DFS, in the sense that it visits the nodes in exactly the same order as the recursive DFS procedure in the previous section (except that each adjacency list is processed in reverse order).
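The explicit-stack version translates directly into Python. The sketch below (names are illustrative) uses a list as the stack, takes the small shortcut of not pushing neighbors that are already explored, and also keeps the parent array discussed next, overwriting parent[v] each time a new copy of v is pushed.

    def dfs_iterative(adj, s):
        """Stack-based DFS from s; returns (explored set, DFS-tree parent dict)."""
        explored = set()
        parent = {s: None}
        stack = [s]                          # S is a stack with one element s
        while stack:
            u = stack.pop()                  # take the most recently added node
            if u not in explored:
                explored.add(u)              # set Explored[u] = true
                for v in adj[u]:
                    if v not in explored:
                        parent[v] = u        # overwritten if v is pushed again later
                        stack.append(v)
        return explored, parent

    adj = {1: [2, 3], 2: [1, 3, 4, 5], 3: [1, 2, 5], 4: [2, 5], 5: [2, 3, 4]}
    explored, parent = dfs_iterative(adj, 1)
    tree_edges = [(parent[v], v) for v in explored if parent[v] is not None]
    print(sorted(tree_edges))                # [(1, 3), (3, 5), (4, 2), (5, 4)]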
If we want the algorithm to also find the DFS tree, we need to have each node u on the stack S maintain the node that "caused" u to get added to the stack. This can be easily done by using an array parent and setting parent[v] = u when we add node v to the stack due to edge (u, v). When we mark a node u ≠ s as Explored, we also can add the edge (u, parent[u]) to the tree T. Note that a node v may be in the stack S multiple times, as it can be adjacent to multiple nodes u that we explore, and each such node adds a copy of v to the stack S. However, we will only use one of these copies to explore node v, the copy that we add last. As a result, it suffices to maintain one value parent[v] for each node v by simply overwriting the value parent[v] every time we add a new copy of v to the stack S.
The main step in the algorithm is to add and delete nodes to and from the stack S, which takes O(1) time. Thus, to bound the running time, we need to bound the number of these operations. To count the number of stack operations, it suffices to count the number of nodes added to S, as each node needs to be added once for every time it can be deleted from S.
How many elements ever get added to S? As before, let n_v denote the degree of node v. Node v will be added to the stack S every time one of its n_v adjacent nodes is explored, so the total number of nodes added to S is at
most Σ_v n_v = 2m. This proves the desired O(m + n) bound on the running time of DFS.

(3.13) The above implementation of the DFS algorithm runs in time O(m + n) (i.e., linear in the input size), if the graph is given by the adjacency list representation.

Finding the Set of All Connected Components
In the previous section we talked about how one can use BFS (or DFS) to find all connected components of a graph. We start with an arbitrary node s, and we use BFS (or DFS) to generate its connected component. We then find a node v (if any) that was not visited by the search from s and iterate, using BFS (or DFS) starting from v to generate its connected component--which, by (3.8), will be disjoint from the component of s. We continue in this way until all nodes have been visited.
Although we earlier expressed the running time of BFS and DFS as O(m + n), where m and n are the total number of edges and nodes in the graph, both BFS and DFS in fact spend work only on edges and nodes in the connected component containing the starting node. (They never see any of the other nodes or edges.) Thus the above algorithm, although it may run BFS or DFS a number of times, only spends a constant amount of work on a given edge or node in the iteration when the connected component it belongs to is under consideration. Hence the overall running time of this algorithm is still O(m + n).

3.4 Testing Bipartiteness: An Application of Breadth-First Search
Recall the definition of a bipartite graph: it is one where the node set V can be partitioned into sets X and Y in such a way that every edge has one end in X and the other end in Y. To make the discussion a little smoother, we can imagine that the nodes in the set X are colored red, and the nodes in the set Y are colored blue. With this imagery, we can say a graph is bipartite if it is possible to color its nodes red and blue so that every edge has one red end and one blue end.

The Problem
In the earlier chapters, we saw examples of bipartite graphs. Here we start by asking: What are some natural examples of a nonbipartite graph, one where no such partition of V is possible?
Clearly a triangle is not bipartite, since we can color one node red, another one blue, and then we can't do anything with the third node. More generally, consider a cycle C of odd length, with nodes numbered 1, 2, 3, ..., 2k, 2k + 1. If we color node 1 red, then we must color node 2 blue, and then we must color node 3 red, and so on--coloring odd-numbered nodes red and even-numbered nodes blue. But then we must color node 2k + 1 red, and it has an edge to node 1, which is also red. This demonstrates that there's no way to partition C into red and blue nodes as required. More generally, if a graph G simply contains an odd cycle, then we can apply the same argument; thus we have established the following.

(3.14) If a graph G is bipartite, then it cannot contain an odd cycle.

It is easy to recognize that a graph is bipartite when appropriate sets X and Y (i.e., red and blue nodes) have actually been identified for us; and in many settings where bipartite graphs arise, this is natural. But suppose we encounter a graph G with no annotation provided for us, and we'd like to determine for ourselves whether it is bipartite--that is, whether there exists a partition into red and blue nodes, as required. How difficult is this? We see from (3.14) that an odd cycle is one simple "obstacle" to a graph's being bipartite. Are there other, more complex obstacles to bipartiteness?

Designing the Algorithm
In fact, there is a very simple procedure to test for bipartiteness, and its analysis can be used to show that odd cycles are the only obstacle. First we assume the graph G is connected, since otherwise we can first compute its connected components and analyze each of them separately. Next we pick any node s ∈ V and color it red; there is no loss in doing this, since s must receive some color. It follows that all the neighbors of s must be colored blue, so we do this. It then follows that all the neighbors of these nodes must be colored red, their neighbors must be colored blue, and so on, until the whole graph is colored. At this point, either we have a valid red/blue coloring of G, in which every edge has ends of opposite colors, or there is some edge with ends of the same color. In this latter case, it seems clear that there's nothing we could have done: G simply is not bipartite. We now want to argue this point precisely and also work out an efficient way to perform the coloring.
The first thing to notice is that the coloring procedure we have just described is essentially identical to the description of BFS: we move outward from s, coloring nodes as soon as we first encounter them. Indeed, another way to describe the coloring algorithm is as follows: we perform BFS, coloring
s red, all of layer L1 blue, all of layer L2 red, and so on, coloring odd-numbered layers blue and even-numbered layers red.
We can implement this on top of BFS, by simply taking the implementation of BFS and adding an extra array Color over the nodes. Whenever we get to a step in BFS where we are adding a node v to a list L[i + 1], we assign Color[v] = red if i + 1 is an even number, and Color[v] = blue if i + 1 is an odd number. At the end of this procedure, we simply scan all the edges and determine whether there is any edge for which both ends received the same color. Thus, the total running time for the coloring algorithm is O(m + n), just as it is for BFS.
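As a sketch of this idea in Python (the function names and graph format are assumptions of this example, not the book's code), one can layer a two-coloring on top of BFS and then scan the edges for one whose ends received the same color:

    from collections import deque

    def two_color(adj, s):
        """BFS two-coloring from s: color 0 ('red') for even layers, 1 ('blue') for odd."""
        color = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]      # the next layer gets the opposite color
                    queue.append(v)
        return color

    def is_bipartite(adj, s):
        """Assumes the graph is connected; scan all edges for two ends of the same color."""
        color = two_color(adj, s)
        return all(color[u] != color[v] for u in adj for v in adj[u])

    even_cycle = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
    odd_cycle = {1: [2, 3], 2: [1, 3], 3: [2, 1]}     # a triangle
    print(is_bipartite(even_cycle, 1), is_bipartite(odd_cycle, 1))   # True False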
Analyzing the Algorithm
We now prove a claim that shows this algorithm correctly determines whether G is bipartite, and it also shows that we can find an odd cycle in G whenever it is not bipartite.

(3.15) Let G be a connected graph, and let L1, L2, ... be the layers produced by BFS starting at node s. Then exactly one of the following two things must hold.
(i) There is no edge of G joining two nodes of the same layer. In this case G is a bipartite graph in which the nodes in even-numbered layers can be colored red, and the nodes in odd-numbered layers can be colored blue.
(ii) There is an edge of G joining two nodes of the same layer. In this case, G contains an odd-length cycle, and so it cannot be bipartite.

Proof. First consider case (i), where we suppose that there is no edge joining two nodes of the same layer. By (3.4), we know that every edge of G joins nodes either in the same layer or in adjacent layers. Our assumption for case (i) is precisely that the first of these two alternatives never happens, so this means that every edge joins two nodes in adjacent layers. But our coloring procedure gives nodes in adjacent layers the opposite colors, and so every edge has ends with opposite colors. Thus this coloring establishes that G is bipartite.
Now suppose we are in case (ii); why must G contain an odd cycle? We are told that G contains an edge joining two nodes of the same layer. Suppose this is the edge e = (x, y), with x, y ∈ Lj. Also, for notational reasons, recall that L0 ("layer 0") is the set consisting of just s. Now consider the BFS tree T produced by our algorithm, and let z be the node whose layer number is as large as possible, subject to the condition that z is an ancestor of both x and y in T; for obvious reasons, we can call z the lowest common ancestor of x and y. Suppose z ∈ Li, where i < j. We now have the situation pictured in Figure 3.6. We consider the cycle C defined by following the z-x path in T, then the edge e, and then the y-z path in T. The length of this cycle is (j - i) + 1 + (j - i), adding the length of its three parts separately; this is equal to 2(j - i) + 1, which is an odd number. ∎

Figure 3.6 If two nodes x and y in the same layer are joined by an edge, then the cycle through x, y, and their lowest common ancestor z has odd length, demonstrating that the graph cannot be bipartite.

3.5 Connectivity in Directed Graphs
Thus far, we have been looking at problems on undirected graphs; we now consider the extent to which these ideas carry over to the case of directed graphs.
Recall that in a directed graph, the edge (u, v) has a direction: it goes from u to v. In this way, the relationship between u and v is asymmetric, and this has qualitative effects on the structure of the resulting graph. In Section 3.1, for example, we discussed the World Wide Web as an instance of a large, complex directed graph whose nodes are pages and whose edges are hyperlinks. The act of browsing the Web is based on following a sequence of edges in this directed graph; and the directionality is crucial, since it's not generally possible to browse "backwards" by following hyperlinks in the reverse direction.
At the same time, a number of basic definitions and algorithms have natural analogues in the directed case. This includes the adjacency list representation and graph search algorithms such as BFS and DFS. We now discuss these in turn.

Representing Directed Graphs
In order to represent a directed graph for purposes of designing algorithms, we use a version of the adjacency list representation that we employed for undirected graphs. Now, instead of each node having a single list of neighbors, each node has two lists associated with it: one list consists of nodes to which it has edges, and a second list consists of nodes from which it has edges. Thus an algorithm that is currently looking at a node u can read off the nodes reachable by going one step forward on a directed edge, as well as the nodes that would be reachable if one went one step in the reverse direction on an edge from u.

The Graph Search Algorithms
Breadth-first search and depth-first search are almost the same in directed graphs as they are in undirected graphs. We will focus here on BFS. We start at a node s, define a first layer of nodes to consist of all those to which s has an edge, define a second layer to consist of all additional nodes to which these first-layer nodes have an edge, and so forth. In this way, we discover nodes layer by layer as they are reached in this outward search from s, and the nodes in layer j are precisely those for which the shortest path from s has exactly j edges. As in the undirected case, this algorithm performs at most constant work for each node and edge, resulting in a running time of O(m + n).
It is important to understand what this directed version of BFS is computing. In directed graphs, it is possible for a node s to have a path to a node t even though t has no path to s; and what directed BFS is computing is the set of all nodes t with the property that s has a path to t. Such nodes may or may not have paths back to s.
There is a natural analogue of depth-first search as well, which also runs in linear time and computes the same set of nodes. It is again a recursive procedure that tries to explore as deeply as possible, in this case only following edges according to their inherent direction. Thus, when DFS is at a node u, it recursively launches a depth-first search, in order, for each node to which u has an edge.
Suppose that, for a given node s, we wanted the set of nodes with paths to s, rather than the set of nodes to which s has paths. An easy way to do this would be to define a new directed graph, G^rev, that we obtain from G simply by reversing the direction of every edge. We could then run BFS or DFS in G^rev; a node has a path from s in G^rev if and only if it has a path to s in G.

Strong Connectivity
Recall that a directed graph is strongly connected if, for every two nodes u and v, there is a path from u to v and a path from v to u. It's worth also formulating some terminology for the property at the heart of this definition; let's say that two nodes u and v in a directed graph are mutually reachable if there is a path from u to v and also a path from v to u. (So a graph is strongly connected if every pair of nodes is mutually reachable.)
Mutual reachability has a number of nice properties, many of them stemming from the following simple fact.

(3.16) If u and v are mutually reachable, and v and w are mutually reachable, then u and w are mutually reachable.

Proof. To construct a path from u to w, we first go from u to v (along the path guaranteed by the mutual reachability of u and v), and then on from v to w (along the path guaranteed by the mutual reachability of v and w). To construct a path from w to u, we just reverse this reasoning: we first go from w to v (along the path guaranteed by the mutual reachability of v and w), and then on from v to u (along the path guaranteed by the mutual reachability of u and v). ∎

There is a simple linear-time algorithm to test if a directed graph is strongly connected, implicitly based on (3.16). We pick any node s and run BFS in G starting from s. We then also run BFS starting from s in G^rev. Now, if one of these two searches fails to reach every node, then clearly G is not strongly connected. But suppose we find that s has a path to every node, and that every node has a path to s. Then s and v are mutually reachable for every v, and so it follows that every two nodes u and v are mutually reachable: s and u are mutually reachable, and s and v are mutually reachable, so by (3.16) we also have that u and v are mutually reachable.
By analogy with connected components in an undirected graph, we can define the strong component containing a node s in a directed graph to be the set of all v such that s and v are mutually reachable. If one thinks about it, the algorithm in the previous paragraph is really computing the strong component containing s: we run BFS starting from s both in G and in G^rev; the set of nodes reached by both searches is the set of nodes with paths to and from s, and hence this set is the strong component containing s.
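This two-search procedure is easy to sketch in Python (again, the helper names and the directed adjacency-list format are assumptions of the example): reach computes the nodes reachable from s, and intersecting the forward search in G with a search in the reversed graph G^rev gives the strong component of s.

    from collections import deque

    def reach(adj, s):
        """Nodes reachable from s by directed BFS."""
        seen = {s}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return seen

    def strong_component(adj, s):
        """Strong component of s: nodes with paths both from s and to s."""
        rev = {u: [] for u in adj}               # build G^rev by reversing every edge
        for u in adj:
            for v in adj[u]:
                rev[v].append(u)
        return reach(adj, s) & reach(rev, s)

    # A directed 3-cycle 1->2->3->1 plus an extra edge 1->4.
    adj = {1: [2, 4], 2: [3], 3: [1], 4: []}
    print(strong_component(adj, 1))              # {1, 2, 3}; node 4 has no path back to 1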
There are further similarities between the notion of connected components in undirected graphs and strong components in directed graphs. Recall that connected components naturally partitioned the graph, since any two were either identical or disjoint. Strong components have this property as well, and for essentially the same reason, based on (3.16).

(3.17) For any two nodes s and t in a directed graph, their strong components are either identical or disjoint.

Proof. Consider any two nodes s and t that are mutually reachable; we claim that the strong components containing s and t are identical. Indeed, for any node v, if s and v are mutually reachable, then by (3.16), t and v are mutually reachable as well. Similarly, if t and v are mutually reachable, then again by (3.16), s and v are mutually reachable.
On the other hand, if s and t are not mutually reachable, then there cannot be a node v that is in the strong component of each. For if there were such a node v, then s and v would be mutually reachable, and v and t would be mutually reachable, so from (3.16) it would follow that s and t were mutually reachable. ∎

In fact, although we will not discuss the details of this here, with more work it is possible to compute the strong components for all nodes in a total time of O(m + n).

3.6 Directed Acyclic Graphs and Topological Ordering
If an undirected graph has no cycles, then it has an extremely simple structure: each of its connected components is a tree. But it is possible for a directed graph to have no (directed) cycles and still have a very rich structure. For example, such graphs can have a large number of edges: if we start with the node
set {1, 2, ..., n} and include an edge (i, j) whenever i < j, then the resulting directed graph has (n choose 2) edges but no cycles.
If a directed graph has no cycles, we call it--naturally enough--a directed acyclic graph, or a DAG for short. (The term DAG is typically pronounced as a word, not spelled out as an acronym.) In Figure 3.7(a) we see an example of a DAG, although it may take some checking to convince oneself that it really has no directed cycles.

Figure 3.7 (a) A directed acyclic graph. (b) The same DAG with a topological ordering, specified by the labels on each node. (c) A different drawing of the same DAG, arranged so as to emphasize the topological ordering. (In a topological ordering, all edges point from left to right.)

The Problem
DAGs are a very common structure in computer science, because many kinds of dependency networks of the type we discussed in Section 3.1 are acyclic. Thus DAGs can be used to encode precedence relations or dependencies in a natural way. Suppose we have a set of tasks labeled {1, 2, ..., n} that need to be performed, and there are dependencies among them stipulating, for certain pairs i and j, that i must be performed before j. For example, the tasks may be courses, with prerequisite requirements stating that certain courses must be taken before others. Or the tasks may correspond to a pipeline of computing jobs, with assertions that the output of job i is used in determining the input to job j, and hence job i must be done before job j.
We can represent such an interdependent set of tasks by introducing a node for each task, and a directed edge (i, j) whenever i must be done before j. If the precedence relation is to be at all meaningful, the resulting graph G must be a DAG. Indeed, if it contained a cycle C, there would be no way to do any of the tasks in C: since each task in C cannot begin until some other one completes, no task in C could ever be done.
Let's continue a little further with this picture of DAGs as precedence relations. Given a set of tasks with dependencies, it would be natural to seek a valid order in which the tasks could be performed, so that all dependencies are respected. Specifically, for a directed graph G, we say that a topological ordering of G is an ordering of its nodes as v1, v2, ..., vn so that for every edge (vi, vj), we have i < j. In other words, all edges point "forward" in the ordering. A topological ordering on tasks provides an order in which they can be safely performed; when we come to the task vj, all the tasks that are required to precede it have already been done. In Figure 3.7(b) we've labeled the nodes of the DAG from part (a) with a topological ordering; note that each edge indeed goes from a lower-indexed node to a higher-indexed node.
In fact, we can view a topological ordering of G as providing an immediate "proof" that G has no cycles, via the following.

(3.18) If G has a topological ordering, then G is a DAG.

Proof. Suppose, by way of contradiction, that G has a topological ordering v1, v2, ..., vn, and also has a cycle C. Let vi be the lowest-indexed node on C, and let vj be the node on C just before vi--thus (vj, vi) is an edge. But by our choice of i, we have j > i, which contradicts the assumption that v1, v2, ..., vn was a topological ordering. ∎

The proof of acyclicity that a topological ordering provides can be very useful, even visually. In Figure 3.7(c), we have drawn the same graph as in (a) and (b), but with the nodes laid out in the topological ordering. It is immediately clear that the graph in (c) is a DAG since each edge goes from left to right.
of dependency networks of the type we discussed in Section 3.1 are acyclic.
Thus DAGs can be used to encode precedence relations or dependencies in a Computing a Topological Ordering Themain question we consider here is
natural way. Suppose we have a set of tasks labeled {1, 2 ..... n} that need to the converse of (3. ! 8): Does every DAG have a topological ordering, and if so,
be performed, and there are dependencies among them stipulating, for certain how do we find one efficiently? A method to do this for every DAG would be
pairs i and j, that i must be performed before j. For example, the tasks may be very useful: it would show that for any precedence relation on a set of tasks
courses, with prerequisite requirements stating that certain courses must be without cycles, there is an efficiently computable order in which to perform
taken before others. Or the tasks may correspond to a pipeline of computing the tasks.
iobs, with assertions that the output of iob i is used in determining the input
to iob j, and hence job i must be .done before iob j. ~ Designing and Analyzing the Algorithm
We can represent such an interdependent Set of tasks by introducing a In fact, the converse of (3.18) does hold, and we establish this via an efficient
node for each task, and a directed edge (i, j) whenever i must be done before algorithra to compute a topological ordering. The key to this lies in finding a
j. If the precedence relation is to be at all meaningful, the resulting graph G way to get started: which node do we put at the beginning of the topological
must be a DAG. Indeed, if it contained a cycle C, there would be no way to do ordering? Such a node Vl would need to have no incoming edges, since any
any of the tasks in C: since each task in C cannot begin until some other one such incoming edge would violate the defining property of the topological
completes, no task in C could ever be done, since none could be done first.
3.6 Directed Acyclic Graphs and Topological Ordering
Chapter 3 Graphs 103
102
ordering, that all edges point forward. Thus, we need to prove the following
fact.
(3.19) In every DAG G, there is a node v with no incoming edges.
Proof. Let G be a directed graph in which every node has at least one incoming edge. We show how to find a cycle in G; this will prove the claim. We pick any node v, and begin following edges backward from v: since v has at least one incoming edge (u, v), we can walk backward to u; then, since u has at least one incoming edge (x, u), we can walk backward to x; and so on. We can continue this process indefinitely, since every node we encounter has an incoming edge. But after n + 1 steps, we will have visited some node w twice. If we let C denote the sequence of nodes encountered between successive visits to w, then clearly C forms a cycle. ∎
In fact, the existence of such a node v is all we need to produce a topological ordering of G by induction. Specifically, let us claim by induction that every DAG has a topological ordering. This is clearly true for DAGs on one or two nodes. Now suppose it is true for DAGs with up to some number of nodes n. Then, given a DAG G on n + 1 nodes, we find a node v with no incoming edges, as guaranteed by (3.19). We place v first in the topological ordering; this is safe, since all edges out of v will point forward. Now G - {v} is a DAG, since deleting v cannot create any cycles that weren't there previously. Also, G - {v} has n nodes, so we can apply the induction hypothesis to obtain a topological ordering of G - {v}. We append the nodes of G - {v} in this order after v; this is an ordering of G in which all edges point forward, and hence it is a topological ordering.

Thus we have proved the desired converse of (3.18).

(3.20) If G is a DAG, then G has a topological ordering.

The inductive proof contains the following algorithm to compute a topological ordering of G.

  To compute a topological ordering of G:
    Find a node v with no incoming edges and order it first
    Delete v from G
    Recursively compute a topological ordering of G-{v}
      and append this order after v

In Figure 3.8 we show the sequence of node deletions that occurs when this algorithm is applied to the graph in Figure 3.7. The shaded nodes in each iteration are those with no incoming edges; the crucial point, which is what (3.19) guarantees, is that when we apply this algorithm to a DAG, there will always be at least one such node available to delete.

Figure 3.8 Starting from the graph in Figure 3.7, nodes are deleted one by one so as to be added to a topological ordering. The shaded nodes are those with no incoming edges; note that there is always at least one such node at every stage of the algorithm's execution.

To bound the running time of this algorithm, we note that identifying a node v with no incoming edges, and deleting it from G, can be done in O(n) time. Since the algorithm runs for n iterations, the total running time is O(n^2).

This is not a bad running time; and if G is very dense, containing Θ(n^2) edges, then it is linear in the size of the input. But we may well want something better when the number of edges m is much less than n^2. In such a case, a running time of O(m + n) could be a significant improvement over Θ(n^2).

In fact, we can achieve a running time of O(m + n) using the same high-level algorithm--iteratively deleting nodes with no incoming edges. We simply have to be more efficient in finding these nodes, and we do this as follows. We declare a node to be "active" if it has not yet been deleted by the algorithm, and we explicitly maintain two things:

(a) for each node w, the number of incoming edges that w has from active nodes; and

(b) the set S of all active nodes in G that have no incoming edges from other active nodes.
At the start, all nodes are active, so we can initialize (a) and (b) with a single pass through the nodes and edges. Then, each iteration consists of selecting a node v from the set S and deleting it. After deleting v, we go through all nodes w to which v had an edge, and subtract one from the number of active incoming edges that we are maintaining for w. If this causes the number of active incoming edges to w to drop to zero, then we add w to the set S. Proceeding in this way, we keep track of nodes that are eligible for deletion at all times, while spending constant work per edge over the course of the whole algorithm.
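The bookkeeping just described translates almost line for line into code. The following Python sketch uses our own illustrative names; it assumes nodes are labeled 0, ..., n-1 and raises an error if the input turns out not to be a DAG. It maintains the count of active incoming edges (item (a)) and the set S of currently deletable nodes (item (b)).

    from collections import deque

    def topological_ordering(n, edges):
        # edges is a list of directed pairs (u, v), meaning u must come before v.
        adj = [[] for _ in range(n)]
        incoming = [0] * n              # (a) incoming edges from active nodes
        for u, v in edges:
            adj[u].append(v)
            incoming[v] += 1

        S = deque(v for v in range(n) if incoming[v] == 0)   # (b) deletable nodes
        order = []
        while S:
            v = S.popleft()
            order.append(v)
            for w in adj[v]:            # "delete" v: update counts of its successors
                incoming[w] -= 1
                if incoming[w] == 0:
                    S.append(w)

        if len(order) < n:              # can only happen if the input has a cycle
            raise ValueError("input graph is not a DAG")
        return order

Initialization is one pass over the nodes and edges, and each edge is examined exactly once when its tail is deleted, so the total running time is O(m + n), as claimed.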
Solved Exercises

Solved Exercise 1
Consider the directed acyclic graph G in Figure 3.9. How many topological orderings does it have?

Figure 3.9 How many topological orderings does this graph have?

Solution  Recall that a topological ordering of G is an ordering of the nodes as v1, v2, ..., vn so that all edges point "forward": for every edge (vi, vj), we have i < j.

So one way to answer this question would be to write down all 5 · 4 · 3 · 2 · 1 = 120 possible orderings and check whether each is a topological ordering. But this would take a while.

Instead, we think about this as follows. As we saw in the text (or reasoning directly from the definition), the first node in a topological ordering must be one that has no edge coming into it. Analogously, the last node must be one that has no edge leaving it. Thus, in every topological ordering of G, the node a must come first and the node e must come last.

Now we have to figure out how the nodes b, c, and d can be arranged in the middle of the ordering. The edge (c, d) enforces the requirement that c must come before d; but b can be placed anywhere relative to these two: before both, between c and d, or after both. This exhausts all the possibilities, and so we conclude that there are three possible topological orderings:

    a, b, c, d, e
    a, c, b, d, e
    a, c, d, b, e

Solved Exercise 2
Some friends of yours are working on techniques for coordinating groups of mobile robots. Each robot has a radio transmitter that it uses to communicate with a base station, and your friends find that if the robots get too close to one another, then there are problems with interference among the transmitters. So a natural problem arises: how to plan the motion of the robots in such a way that each robot gets to its intended destination, but in the process the robots don't come close enough together to cause interference problems.

We can model this problem abstractly as follows. Suppose that we have an undirected graph G = (V, E), representing the floor plan of a building, and there are two robots initially located at nodes a and b in the graph. The robot at node a wants to travel to node c along a path in G, and the robot at node b wants to travel to node d. This is accomplished by means of a schedule: at each time step, the schedule specifies that one of the robots moves across a single edge, from one node to a neighboring node; at the end of the schedule, the robot from node a should be sitting on c, and the robot from b should be sitting on d.

A schedule is interference-free if there is no point at which the two robots occupy nodes that are at a distance ≤ r from one another in the graph, for a given parameter r. We'll assume that the two starting nodes a and b are at a distance greater than r, and so are the two ending nodes c and d.

Give a polynomial-time algorithm that decides whether there exists an interference-free schedule by which each robot can get to its destination.

Solution  This is a problem of the following general flavor. We have a set of possible configurations for the robots, where we define a configuration to be a choice of location for each one. We are trying to get from a given starting configuration (a, b) to a given ending configuration (c, d), subject to constraints on how we can move between configurations (we can only change one robot's location to a neighboring node), and also subject to constraints on which configurations are "legal."

This problem can be tricky to think about if we view things at the level of the underlying graph G: for a given configuration of the robots--that is, the current location of each one--it's not clear what rule we should be using to decide how to move one of the robots next. So instead we apply an idea that can be very useful for situations in which we're trying to perform this type of search. We observe that our problem looks a lot like a path-finding problem, not in the original graph G but in the space of all possible configurations.

Let us define the following (larger) graph H. The node set of H is the set of all possible configurations of the robots; that is, H consists of all possible pairs of nodes in G. We join two nodes of H by an edge if they represent configurations that could be consecutive in a schedule; that is, (u, v) and (u', v') will be joined by an edge in H if one of the pairs u, u' or v, v' are equal, and the other pair corresponds to an edge in G.
We can already observe that paths in H from (a, b) to (c, d) correspond to schedules for the robots: such a path consists precisely of a sequence of configurations in which, at each step, one robot crosses a single edge in G. However, we have not yet encoded the notion that the schedule should be interference-free.

To do this, we simply delete from H all nodes that correspond to configurations in which there would be interference. Thus we define H' to be the graph obtained from H by deleting all nodes (u, v) for which the distance between u and v in G is at most r.

The full algorithm is then as follows. We construct the graph H', and then run the connectivity algorithm from the text to determine whether there is a path from (a, b) to (c, d). The correctness of the algorithm follows from the fact that paths in H' correspond to schedules, and the nodes in H' correspond precisely to the configurations in which there is no interference.

Finally, we need to consider the running time. Let n denote the number of nodes in G, and m denote the number of edges in G. We'll analyze the running time by doing three things: (1) bounding the size of H' (which will in general be larger than G), (2) bounding the time it takes to construct H', and (3) bounding the time it takes to search for a path from (a, b) to (c, d) in H'.

1. First, then, let's consider the size of H'. H' has at most n^2 nodes, since its nodes correspond to pairs of nodes in G. Now, how many edges does H' have? A node (u, v) will have edges to (u', v) for each neighbor u' of u in G, and to (u, v') for each neighbor v' of v in G. A simple upper bound says that there can be at most n choices for (u', v), and at most n choices for (u, v'), so there are at most 2n edges incident to each node of H'. Summing over the (at most) n^2 nodes of H', we have O(n^3) edges. (We can actually give a better bound of O(mn) on the number of edges in H', by using the bound (3.9) we proved in Section 3.3 on the sum of the degrees in a graph. We'll leave this as a further exercise.)

2. Now we bound the time needed to construct H'. We first build H by enumerating all pairs of nodes in G in time O(n^2), and constructing edges using the definition above in time O(n) per node, for a total of O(n^3). Now we need to figure out which nodes to delete from H so as to produce H'. We can do this as follows. For each node u in G, we run a breadth-first search from u and identify all nodes v within distance r of u. We list all these pairs (u, v) and delete them from H. Each breadth-first search in G takes time O(m + n), and we're doing one from each node, so the total time for this part is O(mn + n^2).

3. Now we have H', and so we just need to decide whether there is a path from (a, b) to (c, d). This can be done using the connectivity algorithm from the text in time that is linear in the number of nodes and edges of H'. Since H' has O(n^2) nodes and O(n^3) edges, this final step takes polynomial time as well.
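To make the configuration-space idea concrete, here is a small Python sketch along the lines of the solution above; the names are ours. Rather than materializing H' explicitly, it explores H' implicitly by BFS, skipping any configuration whose two nodes are within distance r (the pairwise distances are precomputed with one BFS per node).

    from collections import deque

    def interference_free_schedule_exists(adj, a, b, c, d, r):
        # adj: dict mapping each node of G to a list of its neighbors.
        nodes = list(adj)

        def bfs_dist(src):
            dist = {src: 0}
            q = deque([src])
            while q:
                u = q.popleft()
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        q.append(w)
            return dist

        dist = {u: bfs_dist(u) for u in nodes}

        def far_enough(u, v):
            return dist[u].get(v, float("inf")) > r

        start, goal = (a, b), (c, d)
        if not (far_enough(a, b) and far_enough(c, d)):
            return False
        seen = {start}
        q = deque([start])
        while q:
            u, v = q.popleft()
            if (u, v) == goal:
                return True
            # Move robot 1 along an edge, or robot 2 along an edge.
            for conf in [(u2, v) for u2 in adj[u]] + [(u, v2) for v2 in adj[v]]:
                if conf not in seen and far_enough(*conf):
                    seen.add(conf)
                    q.append(conf)
        return False

This keeps the same polynomial bounds as the analysis above: the distance precomputation costs O(n(m + n)), and the configuration BFS touches each of the O(n^2) configurations and their O(mn) incident edges at most once.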
Exercises

1. Consider the directed acyclic graph G in Figure 3.10. How many topological orderings does it have? (Figure 3.10 How many topological orderings does this graph have?)

2. Give an algorithm to detect whether a given undirected graph contains a cycle. If the graph contains a cycle, then your algorithm should output one. (It should not output all cycles in the graph, just one of them.) The running time of your algorithm should be O(m + n) for a graph with n nodes and m edges.

3. The algorithm described in Section 3.6 for computing a topological ordering of a DAG repeatedly finds a node with no incoming edges and deletes it. This will eventually produce a topological ordering, provided that the input graph really is a DAG.
   But suppose that we're given an arbitrary graph that may or may not be a DAG. Extend the topological ordering algorithm so that, given an input directed graph G, it outputs one of two things: (a) a topological ordering, thus establishing that G is a DAG; or (b) a cycle in G, thus establishing that G is not a DAG. The running time of your algorithm should be O(m + n) for a directed graph with n nodes and m edges.

4. Inspired by the example of that great Cornellian, Vladimir Nabokov, some of your friends have become amateur lepidopterists (they study butterflies). Often when they return from a trip with specimens of butterflies, it is very difficult for them to tell how many distinct species they've caught--thanks to the fact that many species look very similar to one another.
   One day they return with n butterflies, and they believe that each belongs to one of two different species, which we'll call A and B for purposes of this discussion. They'd like to divide the n specimens into two groups--those that belong to A and those that belong to B--but it's very hard for them to directly label any one specimen. So they decide to adopt the following approach.
   For each pair of specimens i and j, they study them carefully side by side. If they're confident enough in their judgment, then they label the pair (i, j) either "same" (meaning they believe them both to come from the same species) or "different" (meaning they believe them to come from different species). They also have the option of rendering no judgment on a given pair, in which case we'll call the pair ambiguous.
   So now they have the collection of n specimens, as well as a collection of m judgments (either "same" or "different") for the pairs that were not declared to be ambiguous. They'd like to know if this data is consistent with the idea that each butterfly is from one of species A or B. So more concretely, we'll declare the m judgments to be consistent if it is possible to label each specimen either A or B in such a way that for each pair (i, j) labeled "same," it is the case that i and j have the same label; and for each pair (i, j) labeled "different," it is the case that i and j have different labels. They're in the middle of tediously working out whether their judgments are consistent, when one of them realizes that you probably have an algorithm that would answer this question right away.
   Give an algorithm with running time O(m + n) that determines whether the m judgments are consistent.

5. A binary tree is a rooted tree in which each node has at most two children. Show by induction that in any binary tree the number of nodes with two children is exactly one less than the number of leaves.

6. We have a connected graph G = (V, E), and a specific vertex u in V. Suppose we compute a depth-first search tree rooted at u, and obtain a tree T that includes all nodes of G. Suppose we then compute a breadth-first search tree rooted at u, and obtain the same tree T. Prove that G = T. (In other words, if T is both a depth-first search tree and a breadth-first search tree rooted at u, then G cannot contain any edges that do not belong to T.)

7. Some friends of yours work on wireless networks, and they're currently studying the properties of a network of n mobile devices. As the devices move around (actually, as their human owners move around), they define a graph at any point in time as follows: there is a node representing each of the n devices, and there is an edge between device i and device j if the physical locations of i and j are no more than 500 meters apart. (If so, we say that i and j are "in range" of each other.)
   They'd like it to be the case that the network of devices is connected at all times, and so they've constrained the motion of the devices to satisfy the following property: at all times, each device i is within 500 meters of at least n/2 of the other devices. (We'll assume n is an even number.) What they'd like to know is: Does this property by itself guarantee that the network will remain connected?
   Here's a concrete way to formulate the question as a claim about graphs.
   Claim: Let G be a graph on n nodes, where n is an even number. If every node of G has degree at least n/2, then G is connected.
   Decide whether you think the claim is true or false, and give a proof of either the claim or its negation.

8. A number of stories in the press about the structure of the Internet and the Web have focused on some version of the following question: How far apart are typical nodes in these networks? If you read these stories carefully, you find that many of them are confused about the difference between the diameter of a network and the average distance in a network; they often jump back and forth between these concepts as though they're the same thing.
   As in the text, we say that the distance between two nodes u and v in a graph G = (V, E) is the minimum number of edges in a path joining them; we'll denote this by dist(u, v). We say that the diameter of G is the maximum distance between any pair of nodes; and we'll denote this quantity by diam(G).
   Let's define a related quantity, which we'll call the average pairwise distance in G (denoted apd(G)). We define apd(G) to be the average, over all (n choose 2) sets of two distinct nodes u and v, of the distance between u and v. That is,

       apd(G) = [ sum of dist(u, v) over all sets {u, v} of two distinct nodes ] / (n choose 2).

   Here's a simple example to convince yourself that there are graphs G for which diam(G) ≠ apd(G). Let G be a graph with three nodes u, v, w, and with the two edges {u, v} and {v, w}. Then

       diam(G) = dist(u, w) = 2,

   while

       apd(G) = [dist(u, v) + dist(u, w) + dist(v, w)]/3 = 4/3.
   Of course, these two numbers aren't all that far apart in the case of this three-node graph, and so it's natural to ask whether there is always a close relation between them. Here's a claim that tries to make this precise.
   Claim: There exists a positive natural number c so that for all connected graphs G, it is the case that

       diam(G) / apd(G) ≤ c.

   Decide whether you think the claim is true or false, and give a proof of either the claim or its negation.

9. There's a natural intuition that two nodes that are far apart in a communication network--separated by many hops--have a more tenuous connection than two nodes that are close together. There are a number of algorithmic results that are based to some extent on different ways of making this notion precise. Here's one that involves the susceptibility of paths to the deletion of nodes.
   Suppose that an n-node undirected graph G = (V, E) contains two nodes s and t such that the distance between s and t is strictly greater than n/2. Show that there must exist some node v, not equal to either s or t, such that deleting v from G destroys all s-t paths. (In other words, the graph obtained from G by deleting v contains no path from s to t.) Give an algorithm with running time O(m + n) to find such a node v.

10. A number of art museums around the country have been featuring work by an artist named Mark Lombardi (1951-2000), consisting of a set of intricately rendered graphs. Building on a great deal of research, these graphs encode the relationships among people involved in major political scandals over the past several decades: the nodes correspond to participants, and each edge indicates some type of relationship between a pair of participants. And so, if you peer closely enough at the drawings, you can trace out ominous-looking paths from a high-ranking U.S. government official, to a former business partner, to a bank in Switzerland, to a shadowy arms dealer.
    Such pictures form striking examples of social networks, which, as we discussed in Section 3.1, have nodes representing people and organizations, and edges representing relationships of various kinds. And the short paths that abound in these networks have attracted considerable attention recently, as people ponder what they mean. In the case of Mark Lombardi's graphs, they hint at the short set of steps that can carry you from the reputable to the disreputable.
    Of course, a single, spurious short path between nodes v and w in such a network may be more coincidental than anything else; a large number of short paths between v and w can be much more convincing. So in addition to the problem of computing a single shortest v-w path in a graph G, social networks researchers have looked at the problem of determining the number of shortest v-w paths.
    This turns out to be a problem that can be solved efficiently. Suppose we are given an undirected graph G = (V, E), and we identify two nodes v and w in G. Give an algorithm that computes the number of shortest v-w paths in G. (The algorithm should not list all the paths; just the number suffices.) The running time of your algorithm should be O(m + n) for a graph with n nodes and m edges.

11. You're helping some security analysts monitor a collection of networked computers, tracking the spread of an online virus. There are n computers in the system, labeled C1, C2, ..., Cn, and as input you're given a collection of trace data indicating the times at which pairs of computers communicated. Thus the data is a sequence of ordered triples (Ci, Cj, tk); such a triple indicates that Ci and Cj exchanged bits at time tk. There are m triples total.
    We'll assume that the triples are presented to you in sorted order of time. For purposes of simplicity, we'll assume that each pair of computers communicates at most once during the interval you're observing.
    The security analysts you're working with would like to be able to answer questions of the following form: If the virus was inserted into computer Ca at time x, could it possibly have infected computer Cb by time y? The mechanics of infection are simple: if an infected computer Ci communicates with an uninfected computer Cj at time tk (in other words, if one of the triples (Ci, Cj, tk) or (Cj, Ci, tk) appears in the trace data), then computer Cj becomes infected as well, starting at time tk. Infection can thus spread from one machine to another across a sequence of communications, provided that no step in this sequence involves a move backward in time. Thus, for example, if Ci is infected by time tk, and the trace data contains triples (Ci, Cj, tk) and (Cj, Cq, tr), where tk ≤ tr, then Cq will become infected via Cj. (Note that it is okay for tk to be equal to tr; this would mean that Cj had open connections to both Ci and Cq at the same time, and so a virus could move from Ci to Cq.)
    For example, suppose n = 4, the trace data consists of the triples

        (C1, C2, 4), (C2, C4, 8), (C3, C4, 8), (C1, C4, 12),
    and the virus was inserted into computer C1 at time 2. Then C3 would be infected at time 8 by a sequence of three steps: first C2 becomes infected at time 4, then C4 gets the virus from C2 at time 8, and then C3 gets the virus from C4 at time 8. On the other hand, if the trace data were

        (C2, C3, 8), (C1, C4, 12), (C1, C2, 14),

    and again the virus was inserted into computer C1 at time 2, then C3 would not become infected during the period of observation: although C2 becomes infected at time 14, we see that C3 only communicates with C2 before C2 was infected. There is no sequence of communications moving forward in time by which the virus could get from C1 to C3 in this second example.
    Design an algorithm that answers questions of this type: given a collection of trace data, the algorithm should decide whether a virus introduced at computer Ca at time x could have infected computer Cb by time y. The algorithm should run in time O(m + n).

12. You're helping a group of ethnographers analyze some oral history data they've collected by interviewing members of a village to learn about the lives of people who've lived there over the past two hundred years.
    From these interviews, they've learned about a set of n people (all of them now deceased), whom we'll denote P1, P2, ..., Pn. They've also collected facts about when these people lived relative to one another. Each fact has one of the following two forms:
    * For some i and j, person Pi died before person Pj was born; or
    * for some i and j, the life spans of Pi and Pj overlapped at least partially.
    Naturally, they're not sure that all these facts are correct; memories are not so good, and a lot of this was passed down by word of mouth. So what they'd like you to determine is whether the data they've collected is at least internally consistent, in the sense that there could have existed a set of people for which all the facts they've learned simultaneously hold.
    Give an efficient algorithm to do this: either it should produce proposed dates of birth and death for each of the n people so that all the facts hold true, or it should report (correctly) that no such dates can exist--that is, the facts collected by the ethnographers are not internally consistent.

Notes and Further Reading
The study of graphs began with the work of Euler (1736), grew through interest in graph representations of maps and chemical compounds in the nineteenth century, and emerged as a systematic area of study in the twentieth century, first as a branch of mathematics and later also through its applications to computer science. The books by Berge (1976), Bollobas (1998), and Diestel (2000) provide substantial further coverage of graph theory. Recently, extensive data has become available for studying large networks that arise in the physical, biological, and social sciences, and there has been interest in understanding properties of networks that span all these different domains. The books by Barabasi (2002) and Watts (2002) discuss this emerging area of research, with presentations aimed at a general audience.

The basic graph traversal techniques covered in this chapter have numerous applications. We will see a number of these in subsequent chapters, and we refer the reader to the book by Tarjan (1983) for further results.

Notes on the Exercises  Exercise 12 is based on a result of Martin Golumbic and Ron Shamir.
In Wall Street, that iconic movie of the 1980s, Michael Douglas gets up in
front of a room full of stockholders and proclaims, "Greed... is good. Greed
is right. Greed works." In this chapter, we’ll be taking a much more understated
perspective as we investigate the pros and cons of short-sighted greed in the
design of algorithms. Indeed, our aim is to approach a number of different
computational problems with a recurring set of questions: Is greed good? Does
greed work?
It is hard, if not impossible, to define precisely what is meant by a greedy
algorithm. An algorithm is greedy if it builds up a solution in small steps,
choosing a decision at each step myopically to optimize some underlying
criterion. One can often design many different greedy algorithms for the same
problem, each one locally, incrementally optimizing some different measure
on its way to a solution.
When a greedy algorithm succeeds in solving a nontrivial problem opti-
mally, it typically implies something interesting and useful about the structure
of the problem itself; there is a local decision rule that one can use to con-
struct optimal solutions. And as we’ll see later, in Chapter 11, the same is true
of problems in which a greedy algorithm can produce a solution that is guar-
anteed to be close to optimal, even if it does not achieve the precise optimum.
These are the kinds of issues we’ll be dealing with in this chapter. It’s easy to
invent greedy algorithms for almost any problem; finding cases in which they
work well, and proving that they work well, is the interesting challenge.
The first two sections of this chapter will develop two basic methods for
proving that a greedy algorithm produces an optimal solution to a problem.
One can view the first approach as establishing that the greedy algorithm stays
ahead. By this we mean that if one measures the greedy algorithm’s progress
in a step-by-step fashion, one sees that it does better than any other algorithm at each step; it then follows that it produces an optimal solution. The second approach is known as an exchange argument, and it is more general: one considers any possible solution to the problem and gradually transforms it into the solution found by the greedy algorithm without hurting its quality. Again, it will follow that the greedy algorithm must have found a solution that is at least as good as any other solution.

Following our introduction of these two styles of analysis, we focus on several of the most well-known applications of greedy algorithms: shortest paths in a graph, the Minimum Spanning Tree Problem, and the construction of Huffman codes for performing data compression. They each provide nice examples of our analysis techniques. We also explore an interesting relationship between minimum spanning trees and the long-studied problem of clustering. Finally, we consider a more complex application, the Minimum-Cost Arborescence Problem, which further extends our notion of what a greedy algorithm is.

4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead
Let's recall the Interval Scheduling Problem, which was the first of the five representative problems we considered in Chapter 1. We have a set of requests {1, 2, ..., n}; the ith request corresponds to an interval of time starting at s(i) and finishing at f(i). (Note that we are slightly changing the notation from Section 1.2, where we used si rather than s(i) and fi rather than f(i). This change of notation will make things easier to talk about in the proofs.) We'll say that a subset of the requests is compatible if no two of them overlap in time, and our goal is to accept as large a compatible subset as possible. Compatible sets of maximum size will be called optimal.

The basic idea in a greedy algorithm for this problem is to use a simple rule to select which request to accept next; the challenge lies in choosing the rule.

The most obvious rule might be to always select the available request that starts earliest--that is, the one with minimal start time s(i). This way our resource starts being used as quickly as possible.

This method does not yield an optimal solution. If the earliest request i is for a very long interval, then by accepting request i we may have to reject a lot of requests for shorter time intervals. Since our goal is to satisfy as many requests as possible, we will end up with a suboptimal solution. In a really bad case--say, when the finish time f(i) is the maximum among all requests--the accepted request i keeps our resource occupied for the whole time. In this case our greedy method would accept a single request, while the optimal solution could accept many. Such a situation is depicted in Figure 4.1(a).

This might suggest that we should start out by accepting the request that requires the smallest interval of time--namely, the request for which f(i) - s(i) is as small as possible. As it turns out, this is a somewhat better rule than the previous one, but it still can produce a suboptimal schedule. For example, in Figure 4.1(b), accepting the short interval in the middle would prevent us from accepting the other two, which form an optimal solution. (A third natural rule--accepting the request that overlaps the fewest of the other remaining requests--also turns out to fail on certain instances.)

A greedy rule that does lead to the optimal solution is based on a fourth idea: we should accept first the request that finishes first, that is, the request i for which f(i) is as small as possible. This is also quite a natural idea: we ensure that our resource becomes free as soon as possible while still satisfying one request. In this way we can maximize the time left to satisfy other requests.

Let us state the algorithm a bit more formally. We will use R to denote the set of requests that we have neither accepted nor rejected yet, and use A to denote the set of accepted requests. For an example of how the algorithm runs, see Figure 4.2.

  Initially let R be the set of all requests, and let A be empty
  While R is not yet empty
    Choose a request i in R that has the smallest finishing time
    Add request i to A
    Delete all requests from R that are not compatible with request i
  EndWhile
  Return the set A as the set of accepted requests

Figure 4.2 Sample run of the Interval Scheduling Algorithm. At each step the selected intervals are darker lines, and the intervals deleted at the corresponding step are indicated with dashed lines.

Analyzing the Algorithm
While this greedy method is quite natural, it is certainly not obvious that it returns an optimal set of intervals. Indeed, it would only be sensible to reserve judgment on its optimality: the ideas that led to the previous nonoptimal versions of the greedy method also seemed promising at first.

As a start, we can immediately declare that the intervals in the set A returned by the algorithm are all compatible.

(4.1) A is a compatible set of requests.

What we need to show is that this solution is optimal. So, for purposes of comparison, let O be an optimal set of intervals. Ideally one might want to show that A = O, but this is too much to ask: there may be many optimal solutions, and at best A is equal to a single one of them. So instead we will simply show that |A| = |O|, that is, that A contains the same number of intervals as O and hence is also an optimal solution.

The idea underlying the proof, as we suggested initially, will be to find a sense in which our greedy algorithm "stays ahead" of this solution O. We will compare the partial solutions that the greedy algorithm constructs to initial segments of the solution O, and show that the greedy algorithm is doing better in a step-by-step fashion.

We introduce some notation to help with this proof. Let i_1, ..., i_k be the set of requests in A in the order they were added to A. Note that |A| = k. Similarly, let the set of requests in O be denoted by j_1, ..., j_m. Our goal is to prove that k = m. Assume that the requests in O are also ordered in the natural left-to-right order of the corresponding intervals, that is, in the order of the start and finish points. Note that the requests in O are compatible, which implies that the start points have the same order as the finish points.
Our intuition for the greedy method came from wanting our resource to become free again as soon as possible after satisfying the first request. And indeed, our greedy rule guarantees that f(i_1) ≤ f(j_1). This is the sense in which we want to show that our greedy rule "stays ahead"--that each of its intervals finishes at least as soon as the corresponding interval in the set O. Thus we now prove that for each r ≥ 1, the rth accepted request in the algorithm's schedule finishes no later than the rth request in the optimal schedule.

(4.2) For all indices r ≤ k we have f(i_r) ≤ f(j_r).

Proof. We will prove this statement by induction. For r = 1 the statement is clearly true: the algorithm starts by selecting the request i_1 with minimum finish time.

Now let r > 1. We will assume as our induction hypothesis that the statement is true for r - 1, and we will try to prove it for r. As shown in Figure 4.3, the induction hypothesis lets us assume that f(i_{r-1}) ≤ f(j_{r-1}). In order for the algorithm's rth interval not to finish earlier as well, it would need to "fall behind" as shown. But there's a simple reason why this could not happen: rather than choose a later-finishing interval, the greedy algorithm always has the option (at worst) of choosing j_r and thus fulfilling the induction step.

We can make this argument precise as follows. We know (since O consists of compatible intervals) that f(j_{r-1}) ≤ s(j_r). Combining this with the induction hypothesis f(i_{r-1}) ≤ f(j_{r-1}), we get f(i_{r-1}) ≤ s(j_r). Thus the interval j_r is in the set R of available intervals at the time when the greedy algorithm selects i_r. The greedy algorithm selects the available interval with smallest finish time; since interval j_r is one of these available intervals, we have f(i_r) ≤ f(j_r). This completes the induction step. ∎

Thus we have formalized the sense in which the greedy algorithm is remaining ahead of O: for each r, the rth interval it selects finishes at least as soon as the rth interval in O. We now see why this implies the optimality of the greedy algorithm's set A.

Figure 4.3 The inductive step in the proof that the greedy algorithm stays ahead. (Can the greedy algorithm's rth interval really finish later?)

(4.3) The greedy algorithm returns an optimal set A.

Proof. We will prove the statement by contradiction. If A is not optimal, then an optimal set O must have more requests, that is, we must have m > k. Applying (4.2) with r = k, we get that f(i_k) ≤ f(j_k). Since m > k, there is a request j_{k+1} in O. This request starts after request j_k ends, and hence after i_k ends. So after deleting all requests that are not compatible with requests i_1, ..., i_k, the set of possible requests R still contains j_{k+1}. But the greedy algorithm stops with request i_k, and it is only supposed to stop when R is empty--a contradiction. ∎

Implementation and Running Time  We can make our algorithm run in time O(n log n) as follows. We begin by sorting the n requests in order of finishing time and labeling them in this order; that is, we will assume that f(i) ≤ f(j) when i < j. This takes time O(n log n). In an additional O(n) time, we construct an array S[1 ... n] with the property that S[i] contains the value s(i).

We now select requests by processing the intervals in order of increasing f(i). We always select the first interval; we then iterate through the intervals in order until reaching the first interval j for which s(j) ≥ f(1); we then select this one as well. More generally, if the most recent interval we've selected ends at time f, we continue iterating through subsequent intervals until we reach the first j for which s(j) ≥ f. In this way, we implement the greedy algorithm analyzed above in one pass through the intervals, spending constant time per interval. Thus this part of the algorithm takes time O(n).
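The single-pass implementation just described is short enough to state concretely. Here is a minimal Python sketch under the assumption that the requests are given as (start, finish) pairs; the function and variable names are ours, not the text's.

    def schedule_intervals(requests):
        # requests: list of (s, f) pairs. Returns indices of a maximum-size
        # compatible subset, using the earliest-finish-time rule described above.
        order = sorted(range(len(requests)), key=lambda i: requests[i][1])
        accepted = []
        last_finish = float("-inf")
        for i in order:                  # one pass in order of increasing f(i)
            s, f = requests[i]
            if s >= last_finish:         # compatible with everything accepted so far
                accepted.append(i)
                last_finish = f
        return accepted

Note the convention that intervals which merely touch (s(j) = f) are treated as compatible, matching the use of s(j) ≥ f in the discussion above; the sort dominates the running time, giving O(n log n) overall.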
Extensions
The Interval Scheduling Problem we considered here is a quite simple scheduling problem. There are many further complications that could arise in practical settings. The following point out issues that we will see later in the book in various forms.

In defining the problem, we assumed that all requests were known to the scheduling algorithm when it was choosing the compatible subset. It would also be natural, of course, to think about the version of the problem in which the scheduler needs to make decisions about accepting or rejecting certain requests before knowing about the full set of requests. Customers (requestors) may well be impatient, and they may give up and leave if the scheduler waits too long to gather information about all other requests. An active area of research is concerned with such online algorithms, which must make decisions as time proceeds, without knowledge of future input.
Our goal was to maximize the number of satisfied requests. But we could picture a situation in which each request has a different value to us. For example, each request i could also have a value v_i (the amount gained by satisfying request i), and the goal would be to maximize our income: the sum of the values of all satisfied requests. This leads to the Weighted Interval Scheduling Problem, the second of the representative problems we described in Chapter 1.

There are many other variants and combinations that can arise. We now discuss one of these further variants in more detail, since it forms another case in which a greedy algorithm can be used to produce an optimal solution.
4.2 Scheduling to Minimize Lateness: An Exchange Argument
In this problem, each job i has a required processing time t_i and a deadline d_i; all jobs are available at a common start time s, and the quality of a schedule is measured by its maximum lateness L, the largest amount by which any job finishes past its deadline. The greedy rule is simply to schedule the jobs in order of their deadlines, with no idle time in between:

  Order the jobs in order of their deadlines
  Assume for simplicity of notation that d_1 ≤ ... ≤ d_n
  Initially, f = s
  Consider the jobs i = 1, ..., n in this order
    Assign job i to the time interval from s(i) = f to f(i) = f + t_i
    Let f = f + t_i
  End
  Return the set of scheduled intervals [s(i), f(i)] for i = 1, ..., n
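For concreteness, here is a small Python sketch of this earliest-deadline-first schedule, together with the maximum lateness it produces; the function name and tuple format are our own illustrative choices.

    def edf_schedule(jobs, s=0):
        # jobs: list of (t_i, d_i) pairs -- required time and deadline.
        # Returns the scheduled intervals in deadline order and the maximum lateness.
        order = sorted(range(len(jobs)), key=lambda i: jobs[i][1])  # sort by deadline
        f = s
        intervals, max_lateness = [], 0
        for i in order:
            t, d = jobs[i]
            start, finish = f, f + t        # schedule job i with no idle time
            intervals.append((i, start, finish))
            max_lateness = max(max_lateness, finish - d)
            f = finish
        return intervals, max_lateness

The analysis that follows shows that no other schedule can achieve a smaller maximum lateness.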
Analyzing the Algorithm
To reason about the optimality of the algorithm, we first observe that the schedule it produces has no "gaps"--times when the machine is not working yet there are jobs left. The time that passes during a gap will be called idle time: there is work to be done, yet for some reason the machine is sitting idle. Not only does the schedule A produced by our algorithm have no idle time; it is also very easy to see that there is an optimal schedule with this property. We do not write down a proof for this.

(4.7) There is an optimal schedule with no idle time.

Now, how can we prove that our schedule A is optimal, that is, its maximum lateness L is as small as possible? As in previous analyses, we will start by considering an optimal schedule O. Our plan here is to gradually modify O, preserving its optimality at each step, but eventually transforming it into a schedule that is identical to the schedule A found by the greedy algorithm. We refer to this type of analysis as an exchange argument, and we will see that it is a powerful way to think about greedy algorithms in general.

We first try characterizing schedules in the following way. We say that a schedule A' has an inversion if a job i with deadline d_i is scheduled before another job j with earlier deadline d_j < d_i. Notice that, by definition, the schedule A produced by our algorithm has no inversions. If there are jobs with identical deadlines then there can be many different schedules with no inversions. However, we can show that all these schedules have the same maximum lateness L.

(4.8) All schedules with no inversions and no idle time have the same maximum lateness.

Proof. If two different schedules have neither inversions nor idle time, then they might not produce exactly the same order of jobs, but they can only differ in the order in which jobs with identical deadlines are scheduled. Consider such a deadline d. In both schedules, the jobs with deadline d are all scheduled consecutively (after all jobs with earlier deadlines and before all jobs with later deadlines). Among the jobs with deadline d, the last one has the greatest lateness, and this lateness does not depend on the order of the jobs. ∎

The main step in showing the optimality of our algorithm is to establish that there is an optimal schedule that has no inversions and no idle time. To do this, we will start with any optimal schedule having no idle time; we will then convert it into a schedule with no inversions without increasing its maximum lateness. Thus the resulting schedule after this conversion will be optimal as well.

(4.9) There is an optimal schedule that has no inversions and no idle time.

Proof. By (4.7), there is an optimal schedule O with no idle time. The proof will consist of a sequence of statements. The first of these is simple to establish.

(a) If O has an inversion, then there is a pair of jobs i and j such that j is scheduled immediately after i and has d_j < d_i.

Indeed, consider an inversion in which a job a is scheduled sometime before a job b, and d_a > d_b. If we advance in the scheduled order of jobs from a to b one at a time, there has to come a point at which the deadline we see decreases for the first time. This corresponds to a pair of consecutive jobs that form an inversion.

Now suppose O has at least one inversion, and by (a), let i and j be a pair of inverted requests that are consecutive in the scheduled order. We will decrease the number of inversions in O by swapping the requests i and j in the schedule O. The pair (i, j) formed an inversion in O, this inversion is eliminated by the swap, and no new inversions are created. Thus we have

(b) After swapping i and j we get a schedule with one less inversion.

The hardest part of this proof is to argue that the inverted schedule is also optimal.

(c) The new swapped schedule has a maximum lateness no larger than that of O.

It is clear that if we can prove (c), then we are done. The initial schedule O can have at most (n choose 2) inversions (if all pairs are inverted), and hence after at most (n choose 2) swaps we get an optimal schedule with no inversions.

So we now conclude by proving (c), showing that by swapping a pair of consecutive, inverted jobs, we do not increase the maximum lateness L of the schedule. ∎

Proof of (c). We invent some notation to describe the schedule O: assume that each request r is scheduled for the time interval [s(r), f(r)] and has lateness l'_r. Let L' = max_r l'_r denote the maximum lateness of this schedule.
Let S̄ denote the swapped schedule; we will use s̄(r), f̄(r), l̄_r, and L̄ to denote the corresponding quantities in the swapped schedule.

Figure 4.6 The effect of swapping two consecutive, inverted jobs.

Now recall our two adjacent, inverted jobs i and j. The situation is roughly as pictured in Figure 4.6. The finishing time of j before the swap is exactly equal to the finishing time of i after the swap. Thus all jobs other than jobs i and j finish at the same time in the two schedules. Moreover, job j will get finished earlier in the new schedule, and hence the swap does not increase the lateness of job j.

Thus the only thing to worry about is job i: its lateness may have been increased, and what if this actually raises the maximum lateness of the whole schedule? After the swap, job i finishes at time f(j), when job j was finished in the schedule O. If job i is late in this new schedule, its lateness is l̄_i = f̄(i) - d_i = f(j) - d_i. But the crucial point is that i cannot be more late in the schedule S̄ than j was in the schedule O. Specifically, our assumption d_i > d_j implies that

    l̄_i = f(j) - d_i < f(j) - d_j = l'_j.

Since the maximum lateness of the schedule O was L' ≥ l'_j > l̄_i, this shows that the swap does not increase the maximum lateness of the schedule. ∎

The optimality of our greedy algorithm now follows immediately.

Extensions
There are many possible generalizations of this scheduling problem. For example, we assumed that all jobs were available to start at the common start time s. A natural, but harder, version of this problem would contain requests i that, in addition to the deadline d_i and the requested time t_i, would also have an earliest possible starting time r_i. This earliest possible starting time is usually referred to as the release time. Problems with release times arise naturally in scheduling problems where requests can take the form: Can I reserve the room for a two-hour lecture, sometime between 1 P.M. and 5 P.M.? Our proof that the greedy algorithm finds an optimal solution relied crucially on the fact that all jobs were available at the common start time s. (Do you see where?) Unfortunately, as we will see later in the book, in Chapter 8, this more general version of the problem is much more difficult to solve optimally.

4.3 Optimal Caching: A More Complex Exchange Argument
We now consider a problem that involves processing a sequence of requests of a different form, and we develop an algorithm whose analysis requires a more subtle use of the exchange argument. The problem is that of cache maintenance.

The Problem
To motivate caching, consider the following situation. You're working on a long research paper, and your draconian library will only allow you to have eight books checked out at once. You know that you'll probably need more than this over the course of working on the paper, but at any point in time, you'd like to have ready access to the eight books that are most relevant at that time. How should you decide which books to check out, and when should you return some in exchange for others, to minimize the number of times you have to exchange a book at the library?

This is precisely the problem that arises when dealing with a memory
hierarchy: there is a small amount of data that can be accessed very quickly, and a large amount of data that requires more time to access; and you must decide which pieces of data to have close at hand.

Memory hierarchies have been a ubiquitous feature of computers since very early in their history. To begin with, data in the main memory of a processor can be accessed much more quickly than the data on its hard disk; but the disk has much more storage capacity. Thus, it is important to keep the most regularly used pieces of data in main memory, and go to disk as infrequently as possible. The same phenomenon, qualitatively, occurs with on-chip caches in modern processors. These can be accessed in a few cycles, and so data can be retrieved from cache much more quickly than it can be retrieved from main memory. This is another level of hierarchy: small caches have faster access time than main memory, which in turn is smaller and faster to access than disk. And one can see extensions of this hierarchy in many other settings. When one uses a Web browser, the disk often acts as a cache for frequently visited Web pages, since going to disk is still much faster than downloading something over the Internet.

Caching is a general term for the process of storing a small amount of data in a fast memory so as to reduce the amount of time spent interacting with a slow memory. In the previous examples, the on-chip cache reduces the need to fetch data from main memory, the main memory acts as a cache for the disk, and the disk acts as a cache for the Internet. (Much as your desk acts as a cache for the campus library, and the assorted facts you're able to remember without looking them up constitute a cache for the books on your desk.)

For caching to be as effective as possible, it should generally be the case that when you go to access a piece of data, it is already in the cache. To achieve this, a cache maintenance algorithm determines what to keep in the cache and what to evict from the cache when new data needs to be brought in.

Thus, on a particular sequence of memory references, a cache maintenance algorithm determines an eviction schedule--specifying which items should be evicted from the cache at which points in the sequence--and this determines the contents of the cache and the number of misses over time. Let's consider an example of this process.

Suppose we have three items {a, b, c}, the cache size is k = 2, and we are presented with the sequence

    a, b, c, b, c, a, b.

Suppose that the cache initially contains the items a and b. Then on the third item in the sequence, we could evict a so as to bring in c; and on the sixth item we could evict c so as to bring in a; we thereby incur two cache misses over the whole sequence. After thinking about it, one concludes that any eviction schedule for this sequence must include at least two cache misses.

Under real operating conditions, cache maintenance algorithms must process memory references d_1, d_2, ... without knowledge of what's coming in the future; but for purposes of evaluating the quality of these algorithms, systems researchers very early on sought to understand the nature of the optimal solution to the caching problem. Given a full sequence S of memory references, what is the eviction schedule that incurs as few cache misses as possible?

Designing and Analyzing the Algorithm
In the 1960s, Les Belady showed that the following simple rule will always incur the minimum number of misses: when an item needs to be brought into the cache, evict the item that is needed the farthest into the future. We will call this the Farthest-in-Future rule.
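As an illustration of the rule itself (not of the exchange-argument analysis that follows), here is a small Python sketch that simulates Farthest-in-Future on a request sequence and counts the misses it incurs; the names are our own.

    def farthest_in_future_misses(sequence, k, initial_cache):
        # sequence: list of requested items; k: cache size;
        # initial_cache: iterable of at most k items already in the cache.
        cache = set(initial_cache)
        misses = 0
        for step, item in enumerate(sequence):
            if item in cache:
                continue                      # cache hit: nothing to do
            misses += 1
            if len(cache) >= k:
                # Evict the cached item whose next request lies farthest in the
                # future (items never requested again are evicted first).
                def next_use(x):
                    for t in range(step + 1, len(sequence)):
                        if sequence[t] == x:
                            return t
                    return float("inf")
                cache.remove(max(cache, key=next_use))
            cache.add(item)
        return misses

On the sequence a, b, c, b, c, a, b from the example above, with k = 2 and the cache initially holding a and b, this simulation reports the two misses described in the text.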
Consider, for example, a sequence of requests on an instance with k = 3 and items {a, b, c} initially in the cache. The Farthest-in-Future rule will produce a schedule S that evicts c on the fourth step and b on the seventh step. But there are other eviction schedules that are just as good. Consider the schedule S' that evicts b on the fourth step and c on the seventh step, incurring the same number of misses. So in fact it's easy to find cases where schedules produced by rules other than Farthest-in-Future are also optimal; and given this flexibility, why might a deviation from Farthest-in-Future early on not yield an actual savings farther along in the sequence? For example, on the seventh step in our example, the schedule S' is actually evicting an item (c) that is needed farther into the future than the item evicted at this point by Farthest-in-Future, since Farthest-in-Future gave up c earlier on.

These are some of the kinds of things one should worry about before concluding that Farthest-in-Future really is optimal. In thinking about the example above, we quickly appreciate that it doesn't really matter whether b or c is evicted at the fourth step, since the other one should be evicted at the seventh step; so given a schedule where b is evicted first, we can swap the choices of b and c without changing the cost. This reasoning--swapping one decision for another--forms the first outline of an exchange argument that proves the optimality of Farthest-in-Future.

Before delving into this analysis, let's clear up one important issue. All the cache maintenance algorithms we've been considering so far produce schedules that only bring an item d into the cache in a step i if there is a request to d in step i, and d is not already in the cache. Let us call such a schedule reduced--it does the minimal amount of work necessary in a given step. But in general one could imagine an algorithm that produced schedules that are not reduced, by bringing in items in steps when they are not requested. We now show that for every nonreduced schedule, there is an equally good reduced schedule.

Let S be a schedule that may not be reduced. We define a new schedule S̄--the reduction of S--as follows. In any step i where S brings in an item d that has not been requested, our construction of S̄ "pretends" to do this but actually leaves d in main memory. It only really brings d into the cache in the next step j after this in which d is requested. In this way, the cache miss incurred by S̄ in step j can be charged to the earlier cache operation performed by S in step i, when it brought in d. Hence we have the following fact.

(4.11) S̄ is a reduced schedule that brings in at most as many items as the schedule S.

Note that for any reduced schedule, the number of items that are brought in is exactly the number of misses.

Proving the Optimality of Farthest-in-Future  We now proceed with the exchange argument showing that Farthest-in-Future is optimal. Consider an arbitrary sequence D of memory references; let S_FF denote the schedule produced by Farthest-in-Future, and let S* denote a schedule that incurs the minimum possible number of misses. We will now gradually "transform" the schedule S* into the schedule S_FF, one eviction decision at a time, without increasing the number of misses.

Here is the basic fact we use to perform one step in the transformation.

(4.12) Let S be a reduced schedule that makes the same eviction decisions as S_FF through the first j items in the sequence, for a number j. Then there is a reduced schedule S' that makes the same eviction decisions as S_FF through the first j + 1 items, and incurs no more misses than S does.

Proof. Consider the (j + 1)st request, to item d = d_{j+1}. Since S and S_FF have agreed up to this point, they have the same cache contents. If d is in the cache for both, then no eviction decision is necessary (both schedules are reduced), and so S in fact agrees with S_FF through step j + 1, and we can set S' = S. Similarly, if d needs to be brought into the cache, but S and S_FF both evict the same item to make room for d, then we can again set S' = S.

So the interesting case arises when d needs to be brought into the cache, and to do this S evicts item f while S_FF evicts item e ≠ f. Here S and S_FF do not already agree through step j + 1 since S has e in cache while S_FF has f in cache. Hence we must actually do something nontrivial to construct S'.

As a first step, we should have S' evict e rather than f. Now we need to further ensure that S' incurs no more misses than S. An easy way to do this would be to have S' agree with S for the remainder of the sequence; but this is no longer possible, since S and S' have slightly different caches from this point onward. So instead we'll have S' try to get its cache back to the same state as S as quickly as possible, while not incurring unnecessary misses. Once the caches are the same, we can finish the construction of S' by just having it behave like S.

Specifically, from request j + 2 onward, S' behaves exactly like S until one of the following things happens for the first time.

(i) There is a request to an item g ≠ e, f that is not in the cache of S, and S evicts e to make room for it. Since S' and S only differ on e and f, it must be that g is not in the cache of S' either; so we can have S' evict f, and now the caches of S and S' are the same. We can then have S' behave exactly like S for the rest of the sequence.

(ii) There is a request to f, and S evicts an item e'. If e' = e, then we're all set: S' can simply access f from the cache, and after this step the caches
d'(v) = min_{e=(u,v): u∈S} (d(u) + ℓ_e). We choose the node v ∈ V − S for which this quantity is minimized, add v to S, and define d(v) to be the value d'(v).

[Figure 4.8 The shortest path P_v and an alternate s-v path P through the node ...]

long as P_v by the time it has left the set S. Indeed, in iteration k + 1, Dijkstra's Algorithm must have considered adding node y to the set S via the edge (x, y) and rejected this option in favor of adding v. This means that there is no path from s to y through x that is shorter than P_v. But the subpath of P up to y is such a path, and so this subpath is at least as long as P_v. Since edge lengths are nonnegative, the full path P is at least as long as P_v as well.

This is a complete proof; one can also spell out the argument in the previous paragraph using the following inequalities. Let P' be the subpath of P from s to x. Since x ∈ S, we know by the induction hypothesis that P_x is a shortest s-x path (of length d(x)), and so ℓ(P') ≥ ℓ(P_x) = d(x). Thus the subpath of P out to node y has length ℓ(P') + ℓ(x, y) ≥ d(x) + ℓ(x, y) ≥ d'(y), and the full path P is at least as long as this subpath. Finally, since Dijkstra's Algorithm selected v in this iteration, we know that d'(y) ≥ d'(v) = ℓ(P_v). Combining these inequalities shows that ℓ(P) ≥ ℓ(P') + ℓ(x, y) ≥ ℓ(P_v).

Here are two observations about Dijkstra's Algorithm and its analysis. First, the algorithm does not always find shortest paths if some of the edges can have negative lengths. (Do you see where the proof breaks?) Many shortest-path applications involve negative edge lengths, and a more complex algorithm--due to Bellman and Ford--is required for this case. We will see this algorithm when we consider the topic of dynamic programming.

The second observation is that Dijkstra's Algorithm is, in a sense, even simpler than we've described here. Dijkstra's Algorithm is really a "continuous" version of the standard breadth-first search algorithm for traversing a graph, and it can be motivated by the following physical intuition. Suppose the edges of G formed a system of pipes filled with water, joined together at the nodes; each edge e has length ℓ_e and a fixed cross-sectional area. Now suppose an extra droplet of water falls at node s and starts a wave from s. As the wave expands out of node s at a constant speed, the expanding sphere

Implementation and Running Time  To conclude our discussion of Dijkstra's Algorithm, we consider its running time. There are n − 1 iterations of the While loop for a graph with n nodes, as each iteration adds a new node v to S. Selecting the correct node v efficiently is a more subtle issue. One's first impression is that each iteration would have to consider each node v ∉ S, and go through all the edges between S and v to determine the minimum min_{e=(u,v): u∈S} (d(u) + ℓ_e), so that we can select the node v for which this minimum is smallest. For a graph with m edges, computing all these minima can take O(m) time, so this would lead to an implementation that runs in O(mn) time.

We can do considerably better if we use the right data structures. First, we will explicitly maintain the values of the minima d'(v) = min_{e=(u,v): u∈S} (d(u) + ℓ_e) for each node v ∈ V − S, rather than recomputing them in each iteration. We can further improve the efficiency by keeping the nodes V − S in a priority queue with d'(v) as their keys. Priority queues were discussed in Chapter 2; they are data structures designed to maintain a set of n elements, each with a key. A priority queue can efficiently insert elements, delete elements, change an element's key, and extract the element with the minimum key. We will need the third and fourth of the above operations: ChangeKey and ExtractMin.

How do we implement Dijkstra's Algorithm using a priority queue? We put the nodes V in a priority queue with d'(v) as the key for v ∈ V. To select the node v that should be added to the set S, we need the ExtractMin operation. To see how to update the keys, consider an iteration in which node v is added to S, and let w ∉ S be a node that remains in the priority queue. What do we have to do to update the value of d'(w)? If (v, w) is not an edge, then we don't have to do anything: the set of edges considered in the minimum min_{e=(u,w): u∈S} (d(u) + ℓ_e) is exactly the same before and after adding v to S. If e' = (v, w) ∈ E, on the other hand, then the new value for the key is min(d'(w), d(v) + ℓ_{e'}). If d'(w) > d(v) + ℓ_{e'} then we need to use the ChangeKey operation to decrease the key of node w appropriately. This ChangeKey operation can occur at most once per edge, when the tail of the edge e' is added to S. In summary, we have the following result.
(4.15) Using a priority queue, Dijkstra's Algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations.

Using the heap-based priority queue implementation discussed in Chapter 2, each priority queue operation can be made to run in O(log n) time. Thus the overall time for the implementation is O(m log n).
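As a concrete illustration, here is a Python sketch of this implementation using the standard heapq module (names such as dijkstra and adj are illustrative). Since heapq provides no ChangeKey operation, the sketch pushes a fresh heap entry whenever a key would decrease and discards stale entries as they are extracted; this keeps the O(m log n) bound, with the heap holding at most O(m) entries:

    import heapq

    def dijkstra(n, adj, s):
        # adj[u] is a list of (v, length) pairs; nodes are 0..n-1; s is the source.
        d = [float('inf')] * n
        d[s] = 0
        in_S = [False] * n                      # the explored set S of the text
        heap = [(0, s)]                         # entries are (tentative d'(v), v)
        while heap:
            dist_u, u = heapq.heappop(heap)     # ExtractMin
            if in_S[u]:
                continue                        # stale entry; skip it
            in_S[u] = True
            for v, length in adj[u]:
                if not in_S[v] and dist_u + length < d[v]:
                    d[v] = dist_u + length      # stands in for ChangeKey
                    heapq.heappush(heap, (d[v], v))
        return d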
4.5 The Minimum Spanning Tree Problem
We now apply an exchange argument in the context of a second fundamental problem on graphs: the Minimum Spanning Tree Problem.

The Problem
Suppose we have a set of locations V = {v1, v2, ..., vn}, and we want to build a communication network on top of them. The network should be connected--there should be a path between every pair of nodes--but subject to this requirement, we wish to build it as cheaply as possible.

For certain pairs (vi, vj), we may build a direct link between vi and vj for a certain cost c(vi, vj) > 0. Thus we can represent the set of possible links that may be built using a graph G = (V, E), with a positive cost c_e associated with each edge e = (vi, vj). The problem is to find a subset of the edges T ⊆ E so that the graph (V, T) is connected, and the total cost Σ_{e∈T} c_e is as small as possible. (We will assume that the full graph G is connected; otherwise, no solution is possible.)

Here is a basic observation.

(4.16) Let T be a minimum-cost solution to the network design problem defined above. Then (V, T) is a tree.

Proof. By definition, (V, T) must be connected; we show that it also will contain no cycles. Indeed, suppose it contained a cycle C, and let e be any edge on C. We claim that (V, T − {e}) is still connected, since any path that previously used the edge e can now go "the long way" around the remainder of the cycle C instead. It follows that (V, T − {e}) is also a valid solution to the problem, and it is cheaper--a contradiction. ∎

If we allow some edges to have 0 cost (that is, we assume only that the costs c_e are nonnegative), then a minimum-cost solution to the network design problem may have extra edges--edges that have 0 cost and could optionally be deleted. But even in this case, there is always a minimum-cost solution that is a tree. Starting from any optimal solution, we could keep deleting edges on cycles until we had a tree; with nonnegative edges, the cost would not increase during this process.

We will call a subset T ⊆ E a spanning tree of G if (V, T) is a tree. Statement (4.16) says that the goal of our network design problem can be rephrased as that of finding the cheapest spanning tree of the graph; for this reason, it is generally called the Minimum Spanning Tree Problem. Unless G is a very simple graph, it will have exponentially many different spanning trees, whose structures may look very different from one another. So it is not at all clear how to efficiently find the cheapest tree from among all these options.

Designing Algorithms
As with the previous problems we've seen, it is easy to come up with a number of natural greedy algorithms for the problem. But curiously, and fortunately, this is a case where many of the first greedy algorithms one tries turn out to be correct: they each solve the problem optimally. We will review a few of these algorithms now and then discover, via a nice pair of exchange arguments, some of the underlying reasons for this plethora of simple, optimal algorithms.

Here are three greedy algorithms, each of which correctly finds a minimum spanning tree.

o One simple algorithm starts without any edges at all and builds a spanning tree by successively inserting edges from E in order of increasing cost. As we move through the edges in this order, we insert each edge e as long as it does not create a cycle when added to the edges we've already inserted. If, on the other hand, inserting e would result in a cycle, then we simply discard e and continue. This approach is called Kruskal's Algorithm.

o Another simple greedy algorithm can be designed by analogy with Dijkstra's Algorithm for paths, although, in fact, it is even simpler to specify than Dijkstra's Algorithm. We start with a root node s and try to greedily grow a tree from s outward. At each step, we simply add the node that can be attached as cheaply as possible to the partial tree we already have. More concretely, we maintain a set S ⊆ V on which a spanning tree has been constructed so far. Initially, S = {s}. In each iteration, we grow S by one node, adding the node v that minimizes the "attachment cost" min_{e=(u,v): u∈S} c_e, and including the edge e = (u, v) that achieves this minimum in the spanning tree. This approach is called Prim's Algorithm.

o Finally, we can design a greedy algorithm by running sort of a "backward" version of Kruskal's Algorithm. Specifically, we start with the full graph (V, E) and begin deleting edges in order of decreasing cost. As we get to each edge e (starting from the most expensive), we delete it as
long as doing so would not actually disconnect the graph we currently have. For want of a better name, this approach is generally called the Reverse-Delete Algorithm (as far as we can tell, it's never been named after a specific person).

[Figure 4.9 Sample run of the Minimum Spanning Tree Algorithms of (a) Prim and (b) Kruskal, on the same input. The first 4 edges added to the spanning tree are indicated by solid lines; the next edge to be added is a dashed line.]

For example, Figure 4.9 shows the first four edges added by Prim's and Kruskal's Algorithms respectively, on a geometric instance of the Minimum Spanning Tree Problem in which the cost of each edge is proportional to the geometric distance in the plane.

The fact that each of these algorithms is guaranteed to produce an optimal solution suggests a certain "robustness" to the Minimum Spanning Tree Problem--there are many ways to get to the answer. Next we explore some of the underlying reasons why so many different algorithms produce minimum-cost spanning trees.

Analyzing the Algorithms
All these algorithms work by repeatedly inserting or deleting edges from a partial solution. So, to analyze them, it would be useful to have in hand some basic facts saying when it is "safe" to include an edge in the minimum spanning tree, and, correspondingly, when it is safe to eliminate an edge on the grounds that it couldn't possibly be in the minimum spanning tree. For purposes of the analysis, we will make the simplifying assumption that all edge costs are distinct from one another (i.e., no two are equal). This assumption makes it easier to express the arguments that follow, and we will show later in this section how this assumption can be easily eliminated.

When Is It Safe to Include an Edge in the Minimum Spanning Tree?  The crucial fact about edge insertion is the following statement, which we will refer to as the Cut Property.

(4.17) Assume that all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v, w) be the minimum-cost edge with one end in S and the other in V − S. Then every minimum spanning tree contains the edge e.

Proof. Let T be a spanning tree that does not contain e; we need to show that T does not have the minimum possible cost. We'll do this using an exchange argument: we'll identify an edge e' in T that is more expensive than e, and with the property that exchanging e for e' results in another spanning tree. This resulting spanning tree will then be cheaper than T, as desired.

The crux is therefore to find an edge that can be successfully exchanged with e. Recall that the ends of e are v and w. T is a spanning tree, so there must be a path P in T from v to w. Starting at v, suppose we follow the nodes of P in sequence; there is a first node w' on P that is in V − S. Let v' ∈ S be the node just before w' on P, and let e' = (v', w') be the edge joining them. Thus, e' is an edge of T with one end in S and the other in V − S. See Figure 4.10 for the situation at this stage in the proof.

If we exchange e for e', we get a set of edges T' = T − {e'} ∪ {e}. We claim that T' is a spanning tree. Clearly (V, T') is connected, since (V, T) is connected, and any path in (V, T) that used the edge e' = (v', w') can now be "rerouted" in (V, T') to follow the portion of P from v' to v, then the edge e, and then the portion of P from w to w'. To see that (V, T') is also acyclic, note that the only cycle in (V, T' ∪ {e'}) is the one composed of e and the path P, and this cycle is not present in (V, T') due to the deletion of e'.

We noted above that the edge e' has one end in S and the other in V − S. But e is the cheapest edge with this property, and so c_e < c_{e'}. (The inequality is strict since no two edges have the same cost.) Thus the total cost of T' is less than that of T, as desired. ∎

The proof of (4.17) is a bit more subtle than it may first appear. To appreciate this subtlety, consider the following shorter but incorrect argument for (4.17). Let T be a spanning tree that does not contain e. Since T is a spanning tree, it must contain an edge f with one end in S and the other in V − S. Since e is the cheapest edge with this property, we have c_e < c_f, and hence T − {f} ∪ {e} is a spanning tree that is cheaper than T.
[Figure 4.10 Swapping the edge e for the edge e' in the spanning tree T, as described in the proof of (4.17). (e can be swapped for e'.)]

The problem with this argument is not in the claim that f exists, or that T − {f} ∪ {e} is cheaper than T. The difficulty is that T − {f} ∪ {e} may not be a spanning tree, as shown by the example of the edge f in Figure 4.10. The point is that we can't prove (4.17) by simply picking any edge in T that crosses from S to V − S; some care must be taken to find the right one.

The Optimality of Kruskal's and Prim's Algorithms  We can now easily prove the optimality of both Kruskal's Algorithm and Prim's Algorithm. The point is that both algorithms only include an edge when it is justified by the Cut Property (4.17).

(4.18) Kruskal's Algorithm produces a minimum spanning tree of G.

Proof. Consider any edge e = (v, w) added by Kruskal's Algorithm, and let S be the set of all nodes to which v has a path at the moment just before e is added. Clearly v ∈ S, but w ∉ S, since adding e does not create a cycle. Moreover, no edge from S to V − S has been encountered yet, since any such edge could have been added without creating a cycle, and hence would have been added by Kruskal's Algorithm. Thus e is the cheapest edge with one end in S and the other in V − S, and so by (4.17) it belongs to every minimum spanning tree.

So if we can show that the output (V, T) of Kruskal's Algorithm is in fact a spanning tree of G, then we will be done. Clearly (V, T) contains no cycles, since the algorithm is explicitly designed to avoid creating cycles. Further, if (V, T) were not connected, then there would exist a nonempty subset of nodes S (not equal to all of V) such that there is no edge from S to V − S. But this contradicts the behavior of the algorithm: we know that since G is connected, there is at least one edge between S and V − S, and the algorithm will add the first of these that it encounters. ∎

(4.19) Prim's Algorithm produces a minimum spanning tree of G.

Proof. For Prim's Algorithm, it is also very easy to show that it only adds edges belonging to every minimum spanning tree. Indeed, in each iteration of the algorithm, there is a set S ⊆ V on which a partial spanning tree has been constructed, and a node v and edge e are added that minimize the quantity min_{e=(u,v): u∈S} c_e. By definition, e is the cheapest edge with one end in S and the other end in V − S, and so by the Cut Property (4.17) it is in every minimum spanning tree.

It is also straightforward to show that Prim's Algorithm produces a spanning tree of G, and hence it produces a minimum spanning tree. ∎

When Can We Guarantee an Edge Is Not in the Minimum Spanning Tree?  The crucial fact about edge deletion is the following statement, which we will refer to as the Cycle Property.

(4.20) Assume that all edge costs are distinct. Let C be any cycle in G, and let edge e = (v, w) be the most expensive edge belonging to C. Then e does not belong to any minimum spanning tree of G.

Proof. Let T be a spanning tree that contains e; we need to show that T does not have the minimum possible cost. By analogy with the proof of the Cut Property (4.17), we'll do this with an exchange argument, swapping e for a cheaper edge in such a way that we still have a spanning tree.

So again the question is: How do we find a cheaper edge that can be exchanged in this way with e? Let's begin by deleting e from T; this partitions the nodes into two components: S, containing node v; and V − S, containing node w. Now, the edge we use in place of e should have one end in S and the other in V − S, so as to stitch the tree back together.

We can find such an edge by following the cycle C. The edges of C other than e form, by definition, a path P with one end at v and the other at w. If we follow P from v to w, we begin in S and end up in V − S, so there is some
edge e' on P that crosses from S to V − S. See Figure 4.11 for an illustration of this.

[Figure 4.11 Swapping the edge e' for the edge e in the spanning tree T, as described in the proof of (4.20). (e' can be swapped for e.)]

Now consider the set of edges T' = T − {e} ∪ {e'}. Arguing just as in the proof of the Cut Property (4.17), the graph (V, T') is connected and has no cycles, so T' is a spanning tree of G. Moreover, since e is the most expensive edge on the cycle C, and e' belongs to C, it must be that e' is cheaper than e, and hence T' is cheaper than T, as desired. ∎

The Optimality of the Reverse-Delete Algorithm  Now that we have the Cycle Property (4.20), it is easy to prove that the Reverse-Delete Algorithm produces a minimum spanning tree. The basic idea is analogous to the optimality proofs for the previous two algorithms: Reverse-Delete only deletes an edge when it is justified by (4.20).

(4.21) The Reverse-Delete Algorithm produces a minimum spanning tree of G.

Proof. Consider any edge e = (v, w) removed by Reverse-Delete. At the time that e is removed, it lies on a cycle C; and since it is the first edge encountered by the algorithm in decreasing order of edge costs, it must be the most expensive edge on C. Thus by (4.20), e does not belong to any minimum spanning tree.

So if we show that the output (V, T) of Reverse-Delete is a spanning tree of G, we will be done. Clearly (V, T) is connected, since the algorithm never removes an edge when this will disconnect the graph. Now, suppose by way of contradiction that (V, T) contains a cycle C. Consider the most expensive edge e on C, which would be the first one encountered by the algorithm. This edge should have been removed, since its removal would not have disconnected the graph, and this contradicts the behavior of Reverse-Delete. ∎

While we will not explore this further here, the combination of the Cut Property (4.17) and the Cycle Property (4.20) implies that something even more general is going on. Any algorithm that builds a spanning tree by repeatedly including edges when justified by the Cut Property and deleting edges when justified by the Cycle Property--in any order at all--will end up with a minimum spanning tree. This principle allows one to design natural greedy algorithms for this problem beyond the three we have considered here, and it provides an explanation for why so many greedy algorithms produce optimal solutions for this problem.

Eliminating the Assumption that All Edge Costs Are Distinct  Thus far, we have assumed that all edge costs are distinct, and this assumption has made the analysis cleaner in a number of places. Now, suppose we are given an instance of the Minimum Spanning Tree Problem in which certain edges have the same cost--how can we conclude that the algorithms we have been discussing still provide optimal solutions?

There turns out to be an easy way to do this: we simply take the instance and perturb all edge costs by different, extremely small numbers, so that they all become distinct. Now, any two costs that differed originally will still have the same relative order, since the perturbations are so small; and since all of our algorithms are based on just comparing edge costs, the perturbations effectively serve simply as "tie-breakers" to resolve comparisons among costs that used to be equal.

Moreover, we claim that any minimum spanning tree T for the new, perturbed instance must have also been a minimum spanning tree for the original instance. To see this, we note that if T cost more than some tree T* in the original instance, then for small enough perturbations, the change in the cost of T cannot be enough to make it better than T* under the new costs. Thus, if we run any of our minimum spanning tree algorithms, using the perturbed costs for comparing edges, we will produce a minimum spanning tree T that is also optimal for the original instance.

Implementing Prim's Algorithm
We next discuss how to implement the algorithms we have been considering so as to obtain good running-time bounds. We will see that both Prim's and Kruskal's Algorithms can be implemented, with the right choice of data structures, to run in O(m log n) time. We will see how to do this for Prim's Algorithm
here, and defer discussing the implementation of Kruskal's Algorithm to the next section. Obtaining a running time close to this for the Reverse-Delete Algorithm is difficult, so we do not focus on Reverse-Delete in this discussion.

For Prim's Algorithm, while the proof of correctness was quite different from the proof for Dijkstra's Algorithm for the Shortest-Path Problem, the implementations of Prim and Dijkstra are almost identical. By analogy with Dijkstra's Algorithm, we need to be able to decide which node v to add next to the growing set S, by maintaining the attachment costs a(v) = min_{e=(u,v): u∈S} c_e for each node v ∈ V − S. As before, we keep the nodes in a priority queue with these attachment costs a(v) as the keys; we select a node with an ExtractMin operation, and update the attachment costs using ChangeKey operations. There are n − 1 iterations in which we perform ExtractMin, and we perform ChangeKey at most once for each edge. Thus we have

(4.22) Using a priority queue, Prim's Algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations.

As with Dijkstra's Algorithm, if we use a heap-based priority queue we can implement both ExtractMin and ChangeKey in O(log n) time, and so get an overall running time of O(m log n).
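The same heapq-based workaround used in the Dijkstra sketch earlier in this chapter applies here as well; the following Python sketch (the names prim and adj are again illustrative) returns the edges of a minimum spanning tree:

    import heapq

    def prim(n, adj, s=0):
        # adj[v] is a list of (w, cost) pairs; nodes are 0..n-1; s is the root.
        in_S = [False] * n
        tree = []
        heap = [(0, s, None)]              # (attachment cost a(v), node v, node it attaches to)
        while heap:
            cost, v, parent = heapq.heappop(heap)   # ExtractMin
            if in_S[v]:
                continue                            # stale entry
            in_S[v] = True
            if parent is not None:
                tree.append((parent, v, cost))      # the edge achieving a(v)
            for w, c in adj[v]:
                if not in_S[w]:
                    heapq.heappush(heap, (c, w, v)) # stands in for ChangeKey
        return tree                        # n-1 edges when the graph is connected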
Extensions
The minimum spanning tree problem emerged as a particular formulation of a broader network design goal--finding a good way to connect a set of sites by installing edges between them. A minimum spanning tree optimizes a particular goal, achieving connectedness with minimum total edge cost. But there are a range of further goals one might consider as well.

We may, for example, be concerned about point-to-point distances in the spanning tree we build, and be willing to reduce these even if we pay more for the set of edges. This raises new issues, since it is not hard to construct examples where the minimum spanning tree does not minimize point-to-point distances, suggesting some tension between these goals.

Alternately, we may care more about the congestion on the edges. Given traffic that needs to be routed between pairs of nodes, one could seek a spanning tree in which no single edge carries more than a certain amount of this traffic. Here too, it is easy to find cases in which the minimum spanning tree ends up concentrating a lot of traffic on a single edge.

More generally, it is reasonable to ask whether a spanning tree is even the right kind of solution to our network design problem. A tree has the property that destroying any one edge disconnects it, which means that trees are not at all robust against failures. One could instead make resilience an explicit goal, for example seeking the cheapest connected network on the set of sites that remains connected after the deletion of any one edge.

All of these extensions lead to problems that are computationally much harder than the basic Minimum Spanning Tree Problem, though due to their importance in practice there has been research on good heuristics for them.

4.6 Implementing Kruskal's Algorithm: The Union-Find Data Structure
One of the most basic graph problems is to find the set of connected components. In Chapter 3 we discussed linear-time algorithms using BFS or DFS for finding the connected components of a graph.

In this section, we consider the scenario in which a graph evolves through the addition of edges. That is, the graph has a fixed population of nodes, but it grows over time by having edges appear between certain pairs of nodes. Our goal is to maintain the set of connected components of such a graph throughout this evolution process. When an edge is added to the graph, we don't want to have to recompute the connected components from scratch. Rather, we will develop a data structure that we call the Union-Find structure, which will store a representation of the components in a way that supports rapid searching and updating.

This is exactly the data structure needed to implement Kruskal's Algorithm efficiently. As each edge e = (v, w) is considered, we need to efficiently find the identities of the connected components containing v and w. If these components are different, then there is no path from v to w, and hence edge e should be included; but if the components are the same, then there is a v-w path on the edges already included, and so e should be omitted. In the event that e is included, the data structure should also support the efficient merging of the components of v and w into a single new component.

The Problem
The Union-Find data structure allows us to maintain disjoint sets (such as the components of a graph) in the following sense. Given a node u, the operation Find(u) will return the name of the set containing u. This operation can be used to test if two nodes u and v are in the same set, by simply checking if Find(u) = Find(v). The data structure will also implement an operation Union(A, B) to take two sets A and B and merge them to a single set.

These operations can be used to maintain connected components of an evolving graph G = (V, E) as edges are added. The sets will be the connected components of the graph. For a node u, the operation Find(u) will return the
name of the component containing u. If we add an edge (u, v) to the graph, then we first test if u and v are already in the same connected component (by testing if Find(u) = Find(v)). If they are not, then Union(Find(u), Find(v)) can be used to merge the two components into one. It is important to note that the Union-Find data structure can only be used to maintain components of a graph as we add edges; it is not designed to handle the effects of edge deletion, which may result in a single component being "split" into two.

To summarize, the Union-Find data structure will support three operations.

o MakeUnionFind(S) for a set S will return a Union-Find data structure on set S where all elements are in separate sets. This corresponds, for example, to the connected components of a graph with no edges. Our goal will be to implement MakeUnionFind in time O(n), where n = |S|.

o For an element u ∈ S, the operation Find(u) will return the name of the set containing u. Our goal will be to implement Find(u) in O(log n) time. Some implementations that we discuss will in fact take only O(1) time for this operation.

o For two sets A and B, the operation Union(A, B) will change the data structure by merging the sets A and B into a single set. Our goal will be to implement Union in O(log n) time.

Let's briefly discuss what we mean by the name of a set--for example, as returned by the Find operation. There is a fair amount of flexibility in defining the names of the sets; they should simply be consistent in the sense that Find(v) and Find(w) should return the same name if v and w belong to the same set, and different names otherwise. In our implementations, we will name each set using one of the elements it contains.

A Simple Data Structure for Union-Find
Maybe the simplest possible way to implement a Union-Find data structure is to maintain an array Component that contains the name of the set currently containing each element. Let S be a set, and assume it has n elements denoted {1, ..., n}. We will set up an array Component of size n, where Component[s] is the name of the set containing s. To implement MakeUnionFind(S), we set up the array and initialize it to Component[s] = s for all s ∈ S. This implementation makes Find(u) easy: it is a simple lookup and takes only O(1) time. However, Union(A, B) for two sets A and B can take as long as O(n) time, as we have to update the values of Component[s] for all elements in sets A and B.

To improve this bound, we will do a few simple optimizations. First, it is useful to explicitly maintain the list of elements in each set, so we don't have to look through the whole array to find the elements that need updating. Further, we save some time by choosing the name for the union to be the name of one of the sets, say, set A: this way we only have to update the values Component[s] for s ∈ B, but not for any s ∈ A. Of course, if set B is large, this idea by itself doesn't help very much. Thus we add one further optimization. When set B is big, we may want to keep its name and change Component[s] for all s ∈ A instead. More generally, we can maintain an additional array size of length n, where size[A] is the size of set A, and when a Union(A, B) operation is performed, we use the name of the larger set for the union. This way, fewer elements need to have their Component values updated.

Even with these optimizations, the worst case for a Union operation is still O(n) time; this happens if we take the union of two large sets A and B, each containing a constant fraction of all the elements. However, such bad cases for Union cannot happen very often, as the resulting set A ∪ B is even bigger. How can we make this statement more precise? Instead of bounding the worst-case running time of a single Union operation, we can bound the total (or average) running time of a sequence of k Union operations.

(4.23) Consider the array implementation of the Union-Find data structure for some set S of size n, where unions keep the name of the larger set. The Find operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and any sequence of k Union operations takes at most O(k log k) time.

Proof. The claims about the MakeUnionFind and Find operations are easy to verify. Now consider a sequence of k Union operations. The only part of a Union operation that takes more than O(1) time is updating the array Component. Instead of bounding the time spent on one Union operation, we will bound the total time spent updating Component[v] for an element v throughout the sequence of k operations.

Recall that we start the data structure from a state when all n elements are in their own separate sets. A single Union operation can consider at most two of these original one-element sets, so after any sequence of k Union operations, all but at most 2k elements of S have been completely untouched. Now consider a particular element v. As v's set is involved in a sequence of Union operations, its size grows. It may be that in some of these Unions, the value of Component[v] is updated, and in others it is not. But our convention is that the union uses the name of the larger set, so in every update to Component[v] the size of the set containing v at least doubles. The size of v's set starts out at 1, and the maximum possible size it can reach is 2k (since we argued above that all but at most 2k elements are untouched by Union operations). Thus Component[v] gets updated at most log2(2k) times throughout the process. Moreover, at most 2k elements are involved in any Union operations at all, so
we get a bound of O(k log k) for the time spent updating Component values in a sequence of k Union operations. ∎

While this bound on the average running time for a sequence of k operations is good enough in many applications, including implementing Kruskal's Algorithm, we will try to do better and reduce the worst-case time required. We'll do this at the expense of raising the time required for the Find operation to O(log n).
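Before moving on, here is a Python sketch of the array-based structure just analyzed (a dictionary plays the role of the Component array, and the class and method names are illustrative); Union relabels the smaller of the two sets, as in the proof of (4.23):

    class ArrayUnionFind:
        def __init__(self, elements):              # MakeUnionFind(S): O(n)
            self.component = {s: s for s in elements}
            self.members = {s: [s] for s in elements}   # explicit member lists
            self.size = {s: 1 for s in elements}

        def find(self, u):                          # O(1): a single lookup
            return self.component[u]

        def union(self, a, b):                      # a, b are set names
            if self.size[a] < self.size[b]:
                a, b = b, a                         # keep the larger set's name
            for s in self.members[b]:
                self.component[s] = a               # relabel the smaller set
            self.members[a] += self.members[b]
            self.size[a] += self.size[b]
            del self.members[b], self.size[b]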
A Better Data Structure for Union-Find
The data structure for this alternate implementation uses pointers. Each node v ∈ S will be contained in a record with an associated pointer to the name of the set that contains v. As before, we will use the elements of the set S as possible set names, naming each set after one of its elements. For the MakeUnionFind(S) operation, we initialize a record for each element v ∈ S with a pointer that points to itself (or is defined as a null pointer), to indicate that v is in its own set.

Consider a Union operation for two sets A and B, and assume that the name we used for set A is a node v ∈ A, while set B is named after node u ∈ B. The idea is to have either u or v be the name of the combined set; assume we select v as the name. To indicate that we took the union of the two sets, and that the name of the union set is v, we simply update u's pointer to point to v. We do not update the pointers at the other nodes of set B.

As a result, for elements w ∈ B other than u, the name of the set they belong to must be computed by following a sequence of pointers, first leading them to the "old name" u and then via the pointer from u to the "new name" v. See Figure 4.12 for what such a representation looks like. For example, the two sets in Figure 4.12 could be the outcome of the following sequence of Union operations: Union(w, u), Union(s, u), Union(t, v), Union(z, u), Union(i, x), Union(y, j), Union(x, j), and Union(u, ...).

[Figure 4.12 A Union-Find data structure using pointers. The data structure has only two sets at the moment, named after nodes v and j. The dashed arrow from u to v is the result of the last Union operation. To answer a Find query, we follow the arrows until we get to a node that has no outgoing arrow. For example, answering the query Find(i) would involve following the arrows i to x, and then x to j.]

This pointer-based data structure implements Union in O(1) time: all we have to do is to update one pointer. But a Find operation is no longer constant time, as we have to follow a sequence of pointers through a history of old names the set had, in order to get to the current name. How long can a Find(u) operation take? The number of steps needed is exactly the number of times the set containing node u had to change its name, that is, the number of times the Component[u] array position would have been updated in our previous array representation. This can be as large as O(n) if we are not careful with choosing set names. To reduce the time required for a Find operation, we will use the same optimization we used before: keep the name of the larger set as the name of the union. The sequence of Unions that produced the data structure in Figure 4.12 followed this convention. To implement this choice efficiently, we will maintain an additional field with the nodes: the size of the corresponding set.

(4.24) Consider the above pointer-based implementation of the Union-Find data structure for some set S of size n, where unions keep the name of the larger set. A Union operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and a Find operation takes O(log n) time.

Proof. The statements about Union and MakeUnionFind are easy to verify. The time to evaluate Find(u) for a node u is the number of times the set containing node u changes its name during the process. By the convention that the union keeps the name of the larger set, it follows that every time the name of the set containing node u changes, the size of this set at least doubles. Since the set containing u starts at size 1 and is never larger than n, its size can double at most log2 n times, and so there can be at most log2 n name changes. ∎
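A Python sketch of this pointer-based structure follows (the class name is illustrative; a dictionary of parent references plays the role of the pointers). Union takes O(1) time, Find follows pointers, and the larger set keeps its name, as in (4.24):

    class PointerUnionFind:
        def __init__(self, elements):              # MakeUnionFind(S): O(n)
            self.parent = {v: v for v in elements}  # each element points to itself
            self.size = {v: 1 for v in elements}

        def find(self, u):                          # follow old names: O(log n)
            while self.parent[u] != u:
                u = self.parent[u]
            return u

        def union(self, a, b):                      # a, b are set names: O(1)
            if a == b:
                return
            if self.size[a] < self.size[b]:
                a, b = b, a                         # keep the name of the larger set
            self.parent[b] = a
            self.size[a] += self.size[b]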
Further Improvements
Next we will briefly discuss a natural optimization in the pointer-based Union-Find data structure that has the effect of speeding up the Find operations. Strictly speaking, this improvement will not be necessary for our purposes in this book: for all the applications of Union-Find data structures that we consider, the O(log n) time per operation is good enough in the sense that further improvement in the time for operations would not translate to improvements
in the overall running time of the algorithms where we use them. (The Union-Find operations will not be the only computational bottleneck in the running time of these algorithms.)

To motivate the improved version of the data structure, let us first discuss a bad case for the running time of the pointer-based Union-Find data structure. First we build up a structure where one of the Find operations takes about log n time. To do this, we can repeatedly take Unions of equal-sized sets. Assume v is a node for which the Find(v) operation takes about log n time. Now we can issue Find(v) repeatedly, and it takes log n for each such call. Having to follow the same sequence of log n pointers every time for finding the name of the set containing v is quite redundant: after the first request for Find(v), we already "know" the name x of the set containing v, and we also know that all other nodes that we touched during our path from v to the current name also are all contained in the set x. So in the improved implementation, we will compress the path we followed after every Find operation by resetting all pointers along the path to point to the current name of the set. No information is lost by doing this, and it makes subsequent Find operations run more quickly. See Figure 4.13 for a Union-Find data structure and the result of Find(v) using path compression.

[Figure 4.13: after Find(v) with path compression, everything on the path from v to x now points directly to x.]
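In code, the change is confined to Find. The sketch below (again illustrative, written as a subclass of the PointerUnionFind sketch given earlier) records the nodes visited on the way to the current name x and then re-points each of them directly at x:

    class CompressedUnionFind(PointerUnionFind):
        def find(self, u):
            path = []
            while self.parent[u] != u:
                path.append(u)
                u = self.parent[u]             # u ends up being the current name x
            for v in path:
                self.parent[v] = u             # compress: point the whole path at x
            return u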
Now consider the running time of the operations in the resulting implementation. As before, a Union operation takes O(1) time and MakeUnionFind(S) takes O(n) time to set up a data structure for a set of size n. How did the time required for a Find(v) operation change? Some Find operations can still take up to log n time; and for some Find operations we actually increase the time, since after finding the name x of the set containing v, we have to go back through the same path of pointers from v to x, and reset each of these pointers to point to x directly. But this additional work can at most double the time required, and so does not change the fact that a Find takes at most O(log n) time. The real gain from compression is in making subsequent calls to Find cheaper, and this can be made precise by the same type of argument we used in (4.23): bounding the total time for a sequence of n Find operations, rather than the worst-case time for any one of them. Although we do not go into the details here, a sequence of n Find operations employing compression requires an amount of time that is extremely close to linear in n; the actual upper bound is O(nα(n)), where α(n) is an extremely slow-growing function of n called the inverse Ackermann function. (In particular, α(n) < 4 for any value of n that could be encountered in practice.)

Implementing Kruskal's Algorithm
Now we'll use the Union-Find data structure to implement Kruskal's Algorithm. First we need to sort the edges by cost. This takes time O(m log m). Since we have at most one edge between any pair of nodes, we have m ≤ n^2 and hence this running time is also O(m log n).

After the sorting operation, we use the Union-Find data structure to maintain the connected components of (V, T) as edges are added. As each edge e = (v, w) is considered, we compute Find(v) and Find(w) and test if they are equal to see if v and w belong to different components. We use Union(Find(v), Find(w)) to merge the two components, if the algorithm decides to include edge e in the tree T.
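Putting the pieces together, here is a Python sketch of Kruskal's Algorithm built on the Union-Find sketches above (function and parameter names are illustrative; edges are given as (v, w, cost) triples):

    def kruskal(nodes, edges):
        uf = PointerUnionFind(nodes)           # any of the implementations above works
        tree = []
        for v, w, cost in sorted(edges, key=lambda e: e[2]):   # O(m log m) sort
            a, b = uf.find(v), uf.find(w)
            if a != b:                         # different components: no v-w path yet
                uf.union(a, b)
                tree.append((v, w, cost))      # including e cannot create a cycle
        return tree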
We are doing a total of at most 2m Find and n − 1 Union operations over the course of Kruskal's Algorithm. We can use either (4.23) for the array-based implementation of Union-Find, or (4.24) for the pointer-based implementation, to conclude that this is a total of O(m log n) time. (While more efficient implementations of the Union-Find data structure are possible, this would not help the running time of Kruskal's Algorithm, which has an unavoidable O(m log n) term due to the initial sorting of the edges by cost.) To sum up, we have
4.7 Clustering
spanning trees arise in a range of different settings, several of which appear on the surface to be quite different from one another. An appealing example is the role that minimum spanning trees play in the area of clustering.

The Problem
Clustering arises whenever one has a collection of objects--say, a set of photographs, documents, or microorganisms--that one is trying to classify or organize into coherent groups. Faced with such a situation, it is natural to look first for measures of how similar or dissimilar each pair of objects is. One common approach is to define a distance function on the objects, with the interpretation that objects at a larger distance from one another are less similar to each other. For points in the physical world, distance may actually be related to their physical distance; but in many applications, distance takes on a much more abstract meaning. For example, we could define the distance between two species to be the number of years since they diverged in the course of evolution; we could define the distance between two images in a video stream as the number of corresponding pixels at which their intensity values differ by at least some threshold.

Now, given a distance function on the objects, the clustering problem seeks to divide them into groups so that, intuitively, objects within the same group are "close," and objects in different groups are "far apart." Starting from this vague set of goals, the field of clustering branches into a vast number of technically different approaches, each seeking to formalize this general notion of what a good set of groups might look like.

Clusterings of Maximum Spacing  Minimum spanning trees play a role in one of the most basic formalizations, which we describe here. Suppose we are given a set U of n objects, labeled p1, p2, ..., pn. For each pair, pi and pj, we have a numerical distance d(pi, pj). We require only that d(pi, pi) = 0; that d(pi, pj) > 0 for distinct pi and pj; and that distances are symmetric: d(pi, pj) = d(pj, pi).

Suppose we are seeking to divide the objects in U into k groups, for a given parameter k. We say that a k-clustering of U is a partition of U into k nonempty sets C1, C2, ..., Ck. We define the spacing of a k-clustering to be the minimum distance between any pair of points lying in different clusters. Given that we want points in different clusters to be far apart from one another, a natural goal is to seek the k-clustering with the maximum possible spacing.

The question now becomes the following. There are exponentially many different k-clusterings of a set U; how can we efficiently find the one that has maximum spacing?

Designing the Algorithm
To find a clustering of maximum spacing, we consider growing a graph on the vertex set U. The connected components will be the clusters, and we will try to bring nearby points together into the same cluster as rapidly as possible. (This way, they don't end up as points in different clusters that are very close together.) Thus we start by drawing an edge between the closest pair of points. We then draw an edge between the next closest pair of points. We continue adding edges between pairs of points, in order of increasing distance d(pi, pj).

In this way, we are growing a graph H on U edge by edge, with connected components corresponding to clusters. Notice that we are only interested in the connected components of the graph H, not the full set of edges; so if we are about to add the edge (pi, pj) and find that pi and pj already belong to the same cluster, we will refrain from adding the edge--it's not necessary, because it won't change the set of components. In this way, our graph-growing process will never create a cycle; so H will actually be a union of trees. Each time we add an edge that spans two distinct components, it is as though we have merged the two corresponding clusters. In the clustering literature, the iterative merging of clusters in this way is often termed single-link clustering, a special case of hierarchical agglomerative clustering. (Agglomerative here means that we combine clusters; single-link means that we do so as soon as a single link joins them together.) See Figure 4.14 for an example of an instance with k = 3 clusters where this algorithm partitions the points into an intuitively natural grouping.

What is the connection to minimum spanning trees? It's very simple: although our graph-growing procedure was motivated by this cluster-merging idea, our procedure is precisely Kruskal's Minimum Spanning Tree Algorithm. We are doing exactly what Kruskal's Algorithm would do if given a graph G on U in which there was an edge of cost d(pi, pj) between each pair of nodes (pi, pj). The only difference is that we seek a k-clustering, so we stop the procedure once we obtain k connected components.

In other words, we are running Kruskal's Algorithm but stopping it just before it adds its last k − 1 edges. This is equivalent to taking the full minimum spanning tree T (as Kruskal's Algorithm would have produced it), deleting the k − 1 most expensive edges (the ones that we never actually added), and defining the k-clustering to be the resulting connected components C1, C2, ..., Ck. Thus, iteratively merging clusters is equivalent to computing a minimum spanning tree and deleting the most expensive edges.
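In code, the only change from the Kruskal sketch of the previous section is the stopping condition. The following Python sketch (the names are illustrative, and dist is assumed to be a function giving the distance between two points) stops as soon as k components remain:

    def max_spacing_clusters(points, dist, k):
        uf = PointerUnionFind(points)          # from the Union-Find sketch in Section 4.6
        edges = sorted(((dist(p, q), p, q)
                        for i, p in enumerate(points)
                        for q in points[i + 1:]), key=lambda e: e[0])
        components = len(points)
        for d_pq, p, q in edges:
            if components == k:
                break                          # stop before the last k-1 merges
            a, b = uf.find(p), uf.find(q)
            if a != b:
                uf.union(a, b)
                components -= 1
        clusters = {}
        for p in points:
            clusters.setdefault(uf.find(p), []).append(p)
        return list(clusters.values())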
Analyzing the Algorithm
Have we achieved our goal of producing clusters that are as spaced apart as possible? The following claim shows that we have.
[Figure 4.15 An illustration of the proof of (4.26), showing that the spacing of any other clustering can be no larger than that of the clustering found by the single-linkage algorithm.]
(4.26) The components C1, C2, ..., Ck formed by deleting the k − 1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.

Proof. Let C denote the clustering C1, C2, ..., Ck. The spacing of C is precisely the length d* of the (k − 1)st most expensive edge in the minimum spanning tree; this is the length of the edge that Kruskal's Algorithm would have added next, at the moment we stopped it.

Now consider some other k-clustering C', which partitions U into nonempty sets C'1, C'2, ..., C'k. We must show that the spacing of C' is at most d*.

Since the two clusterings C and C' are not the same, it must be that one of our clusters Cr is not a subset of any of the k sets C's in C'. Hence there are points pi, pj ∈ Cr that belong to different clusters in C'--say, pi ∈ C's and pj ∉ C's.

Now consider the picture in Figure 4.15. Since pi and pj belong to the same component Cr, it must be that Kruskal's Algorithm added all the edges of a pi-pj path P before we stopped it. In particular, this means that each edge on ...

4.8 Huffman Codes and Data Compression
In the Shortest-Path and Minimum Spanning Tree Problems, we've seen how greedy algorithms can be used to commit to certain parts of a solution (edges in a graph, in these cases), based entirely on relatively short-sighted considerations. We now consider a problem in which this style of "committing" is carried out in an even looser sense: a greedy rule is used, essentially, to shrink the size of the problem instance, so that an equivalent smaller problem can then be solved by recursion. The greedy operation here is proved to be "safe," in the sense that solving the smaller instance still leads to an optimal solution for the original instance, but the global consequences of the initial greedy decision do not become fully apparent until the full recursion is complete.

The problem itself is one of the basic questions in the area of data compression, an area that forms part of the foundations for digital communication.
The Problem

Encoding Symbols Using Bits  Since computers ultimately operate on sequences of bits (i.e., sequences consisting only of the symbols 0 and 1), one needs encoding schemes that take text written in richer alphabets (such as the alphabets underpinning human languages) and converts this text into long strings of bits.

The simplest way to do this would be to use a fixed number of bits for each symbol in the alphabet, and then just concatenate the bit strings for each symbol to form the text. To take a basic example, suppose we wanted to encode the 26 letters of English, plus the space (to separate words) and five punctuation characters: comma, period, question mark, exclamation point, and apostrophe. This would give us 32 symbols in total to be encoded.

Now, you can form 2^b different sequences out of b bits, and so if we use 5 bits per symbol, then we can encode 2^5 = 32 symbols--just enough for our purposes. So, for example, we could let the bit string 00000 represent a, the bit string 00001 represent b, and so forth up to 11111, which could represent the apostrophe. Note that the mapping of bit strings to symbols is arbitrary; the point is simply that five bits per symbol is sufficient. In fact, encoding schemes like ASCII work precisely this way, except that they use a larger number of bits per symbol so as to handle larger character sets, including capital letters, parentheses, and all those other special symbols you see on a typewriter or computer keyboard.

Let's think about our bare-bones example with just 32 symbols. Is there anything more we could ask for from an encoding scheme? We couldn't ask to encode each symbol using just four bits, since 2^4 is only 16--not enough for the number of symbols we have. Nevertheless, it's not clear that over large stretches of text, we really need to be spending an average of five bits per symbol. If we think about it, the letters in most human alphabets do not get used equally frequently. In English, for example, the letters e, t, a, o, i, and n get used much more frequently than q, j, x, and z (by more than an order of magnitude). So it's really a tremendous waste to translate them all into the same number of bits; instead we could use a small number of bits for the frequent letters, and a larger number of bits for the less frequent ones, and hope to end up using fewer than five bits per letter when we average over a long string of typical text.

This issue of reducing the average number of bits per letter is a fundamental problem in the area of data compression. When large files need to be shipped across communication networks, or stored on hard disks, it's important to represent them as compactly as possible, subject to the requirement that a subsequent reader of the file should be able to correctly reconstruct it. A huge amount of research is devoted to the design of compression algorithms that can take files as input and reduce their space through efficient encoding schemes.

We now describe one of the fundamental ways of formulating this issue, building up to the question of how we might construct the optimal way to take advantage of the nonuniform frequencies of the letters. In one sense, such an optimal solution is a very appealing answer to the problem of compressing data: it squeezes all the available gains out of nonuniformities in the frequencies. At the end of the section, we will discuss how one can make further progress in compression, taking advantage of features other than nonuniform frequencies.

Variable-Length Encoding Schemes  Before the Internet, before the digital computer, before the radio and telephone, there was the telegraph. Communicating by telegraph was a lot faster than the contemporary alternatives of hand-delivering messages by railroad or on horseback. But telegraphs were only capable of transmitting pulses down a wire, and so if you wanted to send a message, you needed a way to encode the text of your message as a sequence of pulses.

To deal with this issue, the pioneer of telegraphic communication, Samuel Morse, developed Morse code, translating each letter into a sequence of dots (short pulses) and dashes (long pulses). For our purposes, we can think of dots and dashes as zeros and ones, and so this is simply a mapping of symbols into bit strings, just as in ASCII. Morse understood the point that one could communicate more efficiently by encoding frequent letters with short strings, and so this is the approach he took. (He consulted local printing presses to get frequency estimates for the letters in English.) Thus, Morse code maps e to 0 (a single dot), t to 1 (a single dash), a to 01 (dot-dash), and in general maps more frequent letters to shorter bit strings.

In fact, Morse code uses such short strings for the letters that the encoding of words becomes ambiguous. For example, just using what we know about the encoding of e, t, and a, we see that the string 0101 could correspond to any of the sequences of letters eta, aa, etet, or aet. (There are other possibilities as well, involving other letters.) To deal with this ambiguity, Morse code transmissions involve short pauses between letters (so the encoding of aa would actually be dot-dash-pause-dot-dash-pause). This is a reasonable solution--using very short bit strings and then introducing pauses--but it means that we haven't actually encoded the letters using just 0 and 1; we've actually encoded it using a three-letter alphabet of 0, 1, and "pause." Thus, if we really needed to encode everything using only the bits 0 and 1, there would need to be some further encoding in which the pause got mapped to bits.
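To make the ambiguity concrete, here is a minimal sketch in Python (not taken from the text; the dictionary below contains only the three codewords for e, t, and a mentioned above) that enumerates every way the string 0101 can be split into codewords:

    # Illustrative sketch only: partial Morse-style code {e -> 0, t -> 1, a -> 01}.
    code = {"e": "0", "t": "1", "a": "01"}

    def decodings(bits):
        """Return every way to split `bits` into codewords of `code`."""
        if bits == "":
            return [""]
        results = []
        for letter, word in code.items():
            if bits.startswith(word):
                results += [letter + rest for rest in decodings(bits[len(word):])]
        return results

    print(sorted(decodings("0101")))  # ['aa', 'aet', 'eta', 'etet']

Running it prints the four decodings that use only these three letters, which is exactly the ambiguity that the pauses in real Morse transmissions are meant to resolve.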
Prefix Codes  The ambiguity problem in Morse code arises because there exist pairs of letters where the bit string that encodes one letter is a prefix of the bit string that encodes another. To eliminate this problem, and hence to obtain an encoding scheme that has a well-defined interpretation for every sequence of bits, it is enough to map letters to bit strings in such a way that no encoding is a prefix of any other.

Concretely, we say that a prefix code for a set S of letters is a function γ that maps each letter x ∈ S to some sequence of zeros and ones, in such a way that for distinct x, y ∈ S, the sequence γ(x) is not a prefix of the sequence γ(y).

Now suppose we have a text consisting of a sequence of letters x1 x2 x3 ... xn. We can convert this to a sequence of bits by simply encoding each letter as a bit sequence using γ and then concatenating all these bit sequences together: γ(x1) γ(x2) ... γ(xn). If we then hand this message to a recipient who knows the function γ, they will be able to reconstruct the text according to the following rule.

o Scan the bit sequence from left to right.
o As soon as you've seen enough bits to match the encoding of some letter, output this as the first letter of the text. This must be the correct first letter, since no shorter or longer prefix of the bit sequence could encode any other letter.
o Now delete the corresponding set of bits from the front of the message and iterate.

In this way, the recipient can produce the correct set of letters without our having to resort to artificial devices like pauses to separate the letters.

For example, suppose we are trying to encode the set of five letters S = {a, b, c, d, e}. The encoding γ1 specified by

    γ1(a) = 11
    γ1(b) = 01
    γ1(c) = 001
    γ1(d) = 10
    γ1(e) = 000

is a prefix code, since we can check that no encoding is a prefix of any other. Now, for example, the string cecab would be encoded as 0010000011101. A recipient of this message, knowing γ1, would begin reading from left to right. Neither 0 nor 00 encodes a letter, but 001 does, so the recipient concludes that the first letter is c. This is a safe decision, since no longer sequence of bits beginning with 001 could encode a different letter. The recipient now iterates on the rest of the message, 0000011101; next they will conclude that the second letter is e, encoded as 000.

Optimal Prefix Codes  We've been doing all this because some letters are more frequent than others, and we want to take advantage of the fact that more frequent letters can have shorter encodings. To make this objective precise, we now introduce some notation to express the frequencies of letters.

Suppose that for each letter x ∈ S, there is a frequency fx, representing the fraction of letters in the text that are equal to x. In other words, assuming there are n letters total, nfx of these letters are equal to x. We notice that the frequencies sum to 1; that is, Σx∈S fx = 1.

Now, if we use a prefix code γ to encode the given text, what is the total length of our encoding? This is simply the sum, over all letters x ∈ S, of the number of times x occurs times the length of the bit string γ(x) used to encode x. Using |γ(x)| to denote the length of γ(x), we can write this as

    encoding length = Σx∈S nfx |γ(x)| = n Σx∈S fx |γ(x)|.

Dropping the leading coefficient of n from the final expression gives us Σx∈S fx |γ(x)|, the average number of bits required per letter. We denote this quantity by ABL(γ).

To continue the earlier example, suppose we have a text with the letters S = {a, b, c, d, e}, and their frequencies are as follows:

    fa = .32, fb = .25, fc = .20, fd = .18, fe = .05.

Then the average number of bits per letter using the prefix code γ1 defined previously is

    .32 · 2 + .25 · 2 + .20 · 3 + .18 · 2 + .05 · 3 = 2.25.

It is interesting to compare this to the average number of bits per letter using a fixed-length encoding. (Note that a fixed-length encoding is a prefix code: if all letters have encodings of the same length, then clearly no encoding can be a prefix of any other.) With a set S of five letters, we would need three bits per letter for a fixed-length encoding, since two bits could only encode four letters. Thus, using the code γ1 reduces the bits per letter from 3 to 2.25, a savings of 25 percent.

And, in fact, γ1 is not the best we can do in this example. Consider the prefix code γ2 given by
    γ2(a) = 11
    γ2(b) = 10
    γ2(c) = 01
    γ2(d) = 001
    γ2(e) = 000

The average number of bits per letter using γ2 is

    .32 · 2 + .25 · 2 + .20 · 2 + .18 · 3 + .05 · 3 = 2.23.

So now it is natural to state the underlying question. Given an alphabet and a set of frequencies for the letters, we would like to produce a prefix code that is as efficient as possible--namely, a prefix code that minimizes the average number of bits per letter ABL(γ) = Σx∈S fx |γ(x)|. We will call such a prefix code optimal.

Designing the Algorithm

The search space for this problem is fairly complicated; it includes all possible ways of mapping letters to bit strings, subject to the defining property of prefix codes. For alphabets consisting of an extremely small number of letters, it is feasible to search this space by brute force, but this rapidly becomes infeasible.

We now describe a greedy method to construct an optimal prefix code very efficiently. As a first step, it is useful to develop a tree-based means of representing prefix codes that exposes their structure more clearly than simply the lists of function values we used in our previous examples.

Representing Prefix Codes Using Binary Trees  Suppose we take a rooted tree T in which each node that is not a leaf has at most two children; we call such a tree a binary tree. Further suppose that the number of leaves is equal to the size of the alphabet S, and we label each leaf with a distinct letter in S.

Such a labeled binary tree T naturally describes a prefix code, as follows. For each letter x ∈ S, we follow the path from the root to the leaf labeled x; each time the path goes from a node to its left child, we write down a 0, and each time the path goes from a node to its right child, we write down a 1. We take the resulting string of bits as the encoding of x.

Now we observe

(4.27) The encoding of S constructed from T is a prefix code.

Proof. In order for the encoding of x to be a prefix of the encoding of y, the path from the root to x would have to be a prefix of the path from the root to y. But this is the same as saying that x would lie on the path from the root to y, which isn't possible if x is a leaf. ∎

This relationship between binary trees and prefix codes works in the other direction as well. Given a prefix code γ, we can build a binary tree recursively as follows. We start with a root; all letters x ∈ S whose encodings begin with a 0 will be leaves in the left subtree of the root, and all letters y ∈ S whose encodings begin with a 1 will be leaves in the right subtree of the root. We now build these two subtrees recursively using this rule.

For example, the labeled tree in Figure 4.16(a) corresponds to the prefix code γ0 specified by

    γ0(a) = 1
    γ0(b) = 011
    γ0(c) = 010
    γ0(d) = 001
    γ0(e) = 000

To see this, note that the leaf labeled a is obtained by simply taking the right-hand edge out of the root (resulting in an encoding of 1); the leaf labeled e is obtained by taking three successive left-hand edges starting from the root; and analogous explanations apply for b, c, and d. By similar reasoning, one can see that the labeled tree in Figure 4.16(b) corresponds to the prefix code γ1 defined earlier, and the labeled tree in Figure 4.16(c) corresponds to the prefix code γ2 defined earlier. Note also that the binary trees for the two prefix codes γ1 and γ2 are identical in structure; only the labeling of the leaves is different. The tree for γ0, on the other hand, has a different structure.

Thus the search for an optimal prefix code can be viewed as the search for a binary tree T, together with a labeling of the leaves of T, that minimizes the average number of bits per letter. Moreover, this average quantity has a natural interpretation in terms of the structure of T: the length of the encoding of a letter x ∈ S is simply the length of the path from the root to the leaf labeled x. We will refer to the length of this path as the depth of the leaf, and we will denote the depth of a leaf u in T simply by depthT(u). (As two bits of notational convenience, we will drop the subscript T when it is clear from context, and we will often use a letter x ∈ S to also denote the leaf that is labeled by it.) Thus we are seeking the labeled tree that minimizes the weighted average of the depths of all leaves, where the average is weighted by the frequencies of the letters that label the leaves: Σx∈S fx · depthT(x). We will use ABL(T) to denote this quantity.
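As a quick check on the arithmetic above, here is a small Python sketch (the function and variable names are illustrative, not from the text) that evaluates ABL(γ) = Σx∈S fx |γ(x)| for the two codes γ1 and γ2 under the frequencies of the running example:

    # Sketch: evaluating ABL for the running five-letter example.
    freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}

    gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
    gamma2 = {"a": "11", "b": "10", "c": "01", "d": "001", "e": "000"}

    def abl(code, freq):
        """Average number of bits per letter under the prefix code `code`."""
        return sum(freq[x] * len(code[x]) for x in freq)

    print(round(abl(gamma1, freq), 2))  # 2.25
    print(round(abl(gamma2, freq), 2))  # 2.23

The same function applies unchanged to any prefix code over S, since ABL depends only on the codeword lengths and the frequencies.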
a node u with exactly one child v. Now convert T into a tree T' by replacing node u with v.

To be precise, we need to distinguish two cases. If u was the root of the tree, we simply delete node u and use v as the root. If u is not the root, let w be the parent of u in T. Now we delete node u and make v be a child of w in place of u. This change decreases the number of bits needed to encode any leaf in the subtree rooted at node u, and it does not affect other leaves. So the prefix code corresponding to T' has a smaller average number of bits per letter than the prefix code for T, contradicting the optimality of T. ∎

Proof. This has a quick proof using an exchange argument. If fy < fz, then consider the code obtained by exchanging the labels at the nodes u and v. In the expression for the average number of bits per letter, ABL(T*) = Σx∈S fx · depth(x), the effect of this exchange is as follows: the multiplier on fy increases (from depth(u) to depth(v)), and the multiplier on fz decreases by the same amount (from depth(v) to depth(u)).

Thus the change to the overall sum is (depth(v) - depth(u))(fy - fz). If fy < fz, this change is a negative number, contradicting the supposed optimality of the prefix code that we had before the exchange. ∎

We can see the idea behind (4.29) in Figure 4.16(b): a quick way to see that the code here is not optimal is to notice that it can be improved by exchanging the positions of the labels c and d. Having a lower-frequency letter at a strictly smaller depth than some other higher-frequency letter is precisely what (4.29) rules out for an optimal solution.

Proof. If w were not a leaf, there would be some leaf w' in the subtree below it. But then w' would have a depth greater than that of v, contradicting our assumption that v is a leaf of maximum depth in T*. ∎

So v and w are sibling leaves that are as deep as possible in T*. Thus our level-by-level process of labeling T*, as justified by (4.29), will get to the level containing v and w last. The leaves at this level will get the lowest-frequency letters. Since we have already argued that the order in which we assign these letters to the leaves within this level doesn't matter, there is an optimal labeling in which v and w get the two lowest-frequency letters of all.

We sum this up in the following claim.

(4.31) There is an optimal prefix code, with corresponding tree T*, in which the two lowest-frequency letters are assigned to leaves that are siblings in T*.
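The exchange described above for Figure 4.16(b) is easy to check numerically. The following sketch (illustrative only, reusing the frequencies of the running example) swaps the codewords of c and d in γ1 and confirms that the average number of bits per letter drops:

    # Sketch: exchanging the labels of c and d in gamma_1, as in the discussion of (4.29).
    freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
    gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}

    def abl(code):
        return sum(freq[x] * len(code[x]) for x in freq)

    swapped = dict(gamma1)
    swapped["c"], swapped["d"] = gamma1["d"], gamma1["c"]  # more frequent letter moves higher

    print(round(abl(gamma1), 2), round(abl(swapped), 2))   # 2.25 2.23

After the swap, the lower-frequency letter d sits at the greater depth, exactly the configuration that (4.29) says an optimal labeling must have.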
        Take the leaf labeled ω and add two children below it labeled y* and z*
    Endif

We prove the optimality of the algorithm's prefix code by induction on the size of the alphabet. Clearly it is optimal for all two-letter alphabets (since it uses only one bit per letter). So suppose by induction that it is optimal for all alphabets of size k - 1, and consider an input instance consisting of an alphabet S of size k.

Let's quickly recap the behavior of the algorithm on this instance. The algorithm merges the two lowest-frequency letters y*, z* ∈ S into a single letter ω, calls itself recursively on the smaller alphabet S' (in which y* and z* are replaced by ω), and by induction produces an optimal prefix code for S', represented by a labeled binary tree T'. It then extends this into a tree T for S, by attaching leaves labeled y* and z* as children of the node in T' labeled ω.

There is a close relationship between ABL(T) and ABL(T'). (Note that the former quantity is the average number of bits used to encode letters in S, while the latter quantity is the average number of bits used to encode letters in S'.)

(4.32) ABL(T') = ABL(T) - fω.

Proof. The depth of each letter x other than y*, z* is the same in both T and T'. Also, the depths of y* and z* in T are each one greater than the depth of ω in T'. Using this, plus the fact that fω = fy* + fz*, we have

    ABL(T) = Σx∈S fx · depthT(x)
           = fy* · depthT(y*) + fz* · depthT(z*) + Σx≠y*,z* fx · depthT(x)
           = (fy* + fz*) · (1 + depthT'(ω)) + Σx≠y*,z* fx · depthT'(x)
           = fω · (1 + depthT'(ω)) + Σx≠y*,z* fx · depthT'(x)
           = fω + Σx∈S' fx · depthT'(x)
           = fω + ABL(T'). ∎

Using this, we now prove optimality as follows.

(4.33) The Huffman code for a given alphabet achieves the minimum average number of bits per letter of any prefix code.

Proof. Suppose by way of contradiction that the tree T produced by our greedy algorithm is not optimal. This means that there is some labeled binary tree Z such that ABL(Z) < ABL(T); and by (4.31), there is such a tree Z in which the leaves representing y* and z* are siblings.

It is now easy to get a contradiction, as follows. If we delete the leaves labeled y* and z* from Z, and label their former parent with ω, we get a tree Z' that defines a prefix code for S'. In the same way that T is obtained from T', the tree Z is obtained from Z' by adding leaves for y* and z* below ω; thus the identity in (4.32) applies to Z and Z' as well: ABL(Z') = ABL(Z) - fω. But we have assumed that ABL(Z) < ABL(T); subtracting fω from both sides of this inequality we get ABL(Z') < ABL(T'), which contradicts the optimality of T' as a prefix code for S'. ∎

Implementation and Running Time  It is clear that Huffman's Algorithm can be made to run in polynomial time in k, the number of letters in the alphabet. The recursive calls of the algorithm define a sequence of k - 1 iterations over smaller and smaller alphabets, and each iteration except the last consists simply of identifying the two lowest-frequency letters and merging them into a single letter that has the combined frequency. Even without being careful about the implementation, identifying the lowest-frequency letters can be done in a single scan of the alphabet, in time O(k), and so summing this over the k - 1 iterations gives O(k²) time.

But in fact Huffman's Algorithm is an ideal setting in which to use a priority queue. Recall that a priority queue maintains a set of k elements, each with a numerical key, and it allows for the insertion of new elements and the extraction of the element with the minimum key. Thus we can maintain the alphabet S in a priority queue, using each letter's frequency as its key. In each iteration we just extract the minimum twice (this gives us the two lowest-frequency letters), and then we insert a new letter whose key is the sum of these two minimum frequencies. Our priority queue now contains a representation of the alphabet that we need for the next iteration.

Using an implementation of priority queues via heaps, as in Chapter 2, we can make each insertion and extraction of the minimum run in time O(log k); hence, each iteration--which performs just three of these operations--takes time O(log k). Summing over all k iterations, we get a total running time of O(k log k).

Extensions

The structure of optimal prefix codes, which has been our focus here, stands as a fundamental result in the area of data compression. But it is important to understand that this optimality result does not by any means imply that we have found the best way to compress data under all circumstances.
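Before moving on, here is a sketch of the heap-based implementation described under Implementation and Running Time above, written in Python with the standard heapq module. The conventions below--merged letters represented as nested pairs, ties broken by a counter, 0 for a left edge and 1 for a right edge--are our own illustrative choices, not code from the text.

    import heapq
    from itertools import count

    def huffman_code(freq):
        """Sketch of Huffman's Algorithm with a heap as the priority queue.
        freq maps letters to frequencies; returns a prefix code as bit strings."""
        tie = count()  # tie-breaker so the heap never compares tree nodes directly
        heap = [(f, next(tie), letter) for letter, f in freq.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # the two lowest-frequency "letters"
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))  # merged letter omega
        _, _, root = heap[0]

        code = {}
        def assign(node, bits):
            if isinstance(node, tuple):          # internal node: 0 = left, 1 = right
                assign(node[0], bits + "0")
                assign(node[1], bits + "1")
            else:
                code[node] = bits or "0"         # degenerate one-letter alphabet
        assign(root, "")
        return code

    freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
    code = huffman_code(freq)
    print(round(sum(freq[x] * len(code[x]) for x in freq), 2))  # 2.23 for this example

Each iteration performs two extract-min operations and one insertion, matching the O(k log k) analysis above; on the five-letter example used earlier, the resulting code averages 2.23 bits per letter.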
What more could we want beyond an optimal prefix code? First, consider an application in which we are transmitting black-and-white images: each image is a 1,000-by-1,000 array of pixels, and each pixel takes one of the two values black or white. Further, suppose that a typical image is almost entirely white: roughly 1,000 of the million pixels are black, and the rest are white. Now, if we wanted to compress such an image, the whole approach of prefix codes has very little to say: we have a text of length one million over the two-letter alphabet {black, white}. As a result, the text is already encoded using one bit per letter--the lowest possible in our framework.

It is clear, though, that such images should be highly compressible. Intuitively, one ought to be able to use a "fraction of a bit" for each white pixel, since they are so overwhelmingly frequent, at the cost of using multiple bits for each black pixel. (In an extreme version, sending a list of (x, y) coordinates for each black pixel would be an improvement over sending the image as a text with a million bits.) The challenge here is to define an encoding scheme where the notion of using fractions of bits is well-defined. There are results in the area of data compression, however, that do just this; arithmetic coding and a range of other techniques have been developed to handle settings like this.

A second drawback of prefix codes, as defined here, is that they cannot adapt to changes in the text. Again let's consider a simple example. Suppose we are trying to encode the output of a program that produces a long sequence of letters from the set {a, b, c, d}. Further suppose that for the first half of this sequence, the letters a and b occur equally frequently, while c and d do not occur at all; but in the second half of this sequence, the letters c and d occur equally frequently, while a and b do not occur at all. In the framework developed in this section, we are trying to compress a text over the four-letter alphabet {a, b, c, d}, and all letters are equally frequent. Thus each would be encoded with two bits.

But what's really happening in this example is that the frequency remains stable for half the text, and then it changes radically. So one could get away with just one bit per letter, plus a bit of extra overhead, as follows.

o Begin with an encoding in which the bit 0 represents a and the bit 1 represents b.
o Halfway into the sequence, insert some kind of instruction that says, "We're changing the encoding now. From now on, the bit 0 represents c and the bit 1 represents d."
o Use this new encoding for the rest of the sequence.

The point is that investing a small amount of space to describe a new encoding can pay off many times over if it reduces the average number of bits per letter over a long run of text that follows. Such approaches, which change the encoding in midstream, are called adaptive compression schemes, and for many kinds of data they lead to significant improvements over the static method we've considered here.

These issues suggest some of the directions in which work on data compression has proceeded. In many of these cases, there is a trade-off between the power of the compression technique and its computational cost. In particular, many of the improvements to Huffman codes just described come with a corresponding increase in the computational effort needed both to produce the compressed version of the data and also to decompress it and restore the original text. Finding the right balance among these trade-offs is a topic of active research.

* 4.9 Minimum-Cost Arborescences: A Multi-Phase Greedy Algorithm

As we've seen more and more examples of greedy algorithms, we've come to appreciate that there can be considerable diversity in the way they operate. Many greedy algorithms make some sort of an initial "ordering" decision on the input, and then process everything in a one-pass fashion. Others make more incremental decisions--still local and opportunistic, but without a global "plan" in advance. In this section, we consider a problem that stresses our intuitive view of greedy algorithms still further.

The Problem

The problem is to compute a minimum-cost arborescence of a directed graph. This is essentially an analogue of the Minimum Spanning Tree Problem for directed, rather than undirected, graphs; we will see that the move to directed graphs introduces significant new complications. At the same time, the style of the algorithm has a strongly greedy flavor, since it still constructs a solution according to a local, myopic rule.

We begin with the basic definitions. Let G = (V, E) be a directed graph in which we've distinguished one node r ∈ V as a root. An arborescence (with respect to r) is essentially a directed spanning tree rooted at r. Specifically, it is a subgraph T = (V, F) such that T is a spanning tree of G if we ignore the direction of edges; and there is a path in T from r to each other node v ∈ V if we take the direction of edges into account. Figure 4.18 gives an example of two different arborescences in the same directed graph.

There is a useful equivalent way to characterize arborescences, and this is as follows.
It is easy to see that, just as every connected graph has a spanning tree, a directed graph has an arborescence rooted at r provided that r can reach every node. Indeed, in this case, the edges in a breadth-first search tree rooted at r will form an arborescence.

(4.35) A directed graph G has an arborescence rooted at r if and only if there is a path from r to each other node.

The basic problem we consider here is the following. We are given a directed graph G = (V, E), with a distinguished root node r and with a nonnegative cost ce ≥ 0 on each edge, and we wish to compute an arborescence rooted at r of minimum total cost. (We will refer to this as an optimal arborescence.) We will assume throughout that G at least has an arborescence rooted at r; by (4.35), this can be easily checked at the outset.

Figure 4.19 (a) A directed graph with costs on its edges, and (b) an optimal arborescence rooted at r for this graph.
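The observation above is easy to turn into code. The following Python sketch (an illustration under our own conventions--nodes numbered 0 to n - 1, edges given as directed pairs--rather than anything prescribed by the text) runs breadth-first search from r and either returns the edges of a BFS arborescence or reports that some node is unreachable, which by (4.35) means no arborescence rooted at r exists:

    from collections import deque

    def bfs_arborescence(n, edges, r):
        """Return the edges of a BFS arborescence rooted at r, or None if some
        node is unreachable from r. Nodes are 0..n-1; edges are pairs (u, v)."""
        out = [[] for _ in range(n)]
        for u, v in edges:
            out[u].append(v)
        parent_edge = {}
        queue, seen = deque([r]), {r}
        while queue:
            u = queue.popleft()
            for v in out[u]:
                if v not in seen:
                    seen.add(v)
                    parent_edge[v] = (u, v)   # exactly one edge entering each v != r
                    queue.append(v)
        if len(seen) < n:
            return None
        return list(parent_edge.values())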
the other edges out of r. This kind of argument never clouded our thinking in the Minimum Spanning Tree Problem, where it was always safe to plunge ahead and include the cheapest edge; it suggests that finding the optimal arborescence may be a significantly more complicated task. (It's worth noticing that the optimal arborescence in Figure 4.19 also includes the most expensive edge on a cycle; with a different construction, one can even cause the optimal arborescence to include the most expensive edge in the whole graph.)

Despite this, it is possible to design a greedy type of algorithm for this problem; it's just that our myopic rule for choosing edges has to be a little more sophisticated. First let's consider a little more carefully what goes wrong with the general strategy of including the cheapest edges. Here's a particular version of this strategy: for each node v ≠ r, select the cheapest edge entering v (breaking ties arbitrarily), and let F* be this set of n - 1 edges. Now consider the subgraph (V, F*). Since we know that the optimal arborescence needs to have exactly one edge entering each node v ≠ r, and (V, F*) represents the cheapest possible way of making these choices, we have the following fact.

(4.36) If (V, F*) is an arborescence, then it is a minimum-cost arborescence.

So the difficulty is that (V, F*) may not be an arborescence. In this case, (4.34) implies that (V, F*) must contain a cycle C, which does not include the root. We now must decide how to proceed in this situation.

To make matters somewhat clearer, we begin with the following observation. Every arborescence contains exactly one edge entering each node v ≠ r; so if we pick some node v and subtract a uniform quantity from the cost of every edge entering v, then the total cost of every arborescence changes by exactly the same amount. This means, essentially, that the actual cost of the cheapest edge entering v is not important; what matters is the cost of all other edges entering v relative to this. Thus let yv denote the minimum cost of any edge entering v. For each edge e = (u, v), with cost ce ≥ 0, we define its modified cost c'e to be ce - yv. Note that since ce ≥ yv, all the modified costs are still nonnegative. More crucially, our discussion motivates the following fact.

This is because an arborescence has exactly one edge entering each node in the sum. Since the difference between the two costs is independent of the choice of the arborescence T, we see that T has minimum cost subject to {ce} if and only if it has minimum cost subject to {c'e}. ∎

We now consider the problem in terms of the costs {c'e}. All the edges in our set F* have cost 0 under these modified costs; and so if (V, F*) contains a cycle C, we know that all edges in C have cost 0. This suggests that we can afford to use as many edges from C as we want (consistent with producing an arborescence), since including edges from C doesn't raise the cost.

Thus our algorithm continues as follows. We contract C into a single supernode, obtaining a smaller graph G' = (V', E'). Here, V' contains the nodes of V - C, plus a single node c* representing C. We transform each edge e ∈ E to an edge e' ∈ E' by replacing each end of e that belongs to C with the new node c*. This can result in G' having parallel edges (i.e., edges with the same ends), which is fine; however, we delete self-loops from E'--edges that have both ends equal to c*. We recursively find an optimal arborescence in this smaller graph G', subject to the costs {c'e}. The arborescence returned by this recursive call can be converted into an arborescence of G by including all but one edge on the cycle C.

In summary, here is the full algorithm.

    For each node v ≠ r
        Let yv be the minimum cost of an edge entering node v
        Modify the costs of all edges e entering v to c'e = ce - yv
    Choose one 0-cost edge entering each v ≠ r, obtaining a set F*
    If F* forms an arborescence, then return it
    Else there is a directed cycle C ⊆ F*
        Contract C to a single supernode, yielding a graph G' = (V', E')
        Recursively find an optimal arborescence (V', F') in G'
            with costs {c'e}
        Extend (V', F') to an arborescence (V, F) in G
            by adding all but one edge of C
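As a concrete companion to the pseudocode above, here is a sketch in Python of a single phase of the algorithm: it picks one cheapest edge entering each node v ≠ r (these are exactly the 0-cost edges after the modification c'e = ce - yv) and then either reports that the chosen edges already form an arborescence or returns a directed cycle C to contract. The representation--nodes 0 to n - 1, parallel lists of edges and costs--is an illustrative assumption, and the contraction and recursive call are left to follow the pseudocode.

    def one_phase(n, edges, costs, r):
        """One phase of the arborescence algorithm above (nodes 0..n-1, root r).
        Assumes, as in the text, that G has an arborescence rooted at r.
        Returns ("arborescence", F) if the cheapest entering edges already form
        one, or ("cycle", C) with a directed cycle of chosen edges to contract."""
        # For each v != r, keep one cheapest edge entering v; under the modified
        # costs c'_e = c_e - y_v these are exactly the 0-cost entering edges.
        in_edge, in_cost = {}, {}
        for (u, v), c in zip(edges, costs):
            if v != r and (v not in in_cost or c < in_cost[v]):
                in_cost[v], in_edge[v] = c, (u, v)

        # Walk backward along chosen edges from every node: either we reach r,
        # or we revisit a node on the current walk, which exposes a cycle in F*.
        for start in range(n):
            path, pos, u = [], {}, start
            while u != r and u in in_edge and u not in pos:
                pos[u] = len(path)
                path.append(in_edge[u])
                u = in_edge[u][0]            # step to the tail of the chosen edge
            if u != r and u in pos:          # directed cycle not containing r
                return "cycle", path[pos[u]:]
        return "arborescence", list(in_edge.values())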
The arborescences of G' are in one-to-one correspondence with arborescences of G that have exactly one edge entering the cycle C; and these corresponding arborescences have the same cost with respect to {c'e}, since C consists of 0-cost edges. (We say that an edge e = (u, v) enters C if v belongs to C but u does not.) So to prove that our algorithm finds an optimal arborescence in G, we must prove that G has an optimal arborescence with exactly one edge entering C. We do this now.

(4.38) Let C be a cycle in G consisting of edges of cost 0, such that r ∉ C. Then there is an optimal arborescence rooted at r that has exactly one edge entering C.

Proof. Consider an optimal arborescence T in G. Since r has a path in T to every node, there is at least one edge of T that enters C. If T enters C exactly once, then we are done. Otherwise, suppose that T enters C more than once. We show how to modify it to obtain an arborescence of no greater cost that enters C exactly once.

Let e = (a, b) be an edge entering C that lies on as short a path as possible from r; this means in particular that no edges on the path from r to a can enter C. We delete all edges of T that enter C, except for the edge e. We add in all edges of C except for the one edge that enters b, the head of edge e. Let T' denote the resulting subgraph of G.

We claim that T' is also an arborescence. This will establish the result, since the cost of T' is clearly no greater than that of T: the only edges of T' that do not also belong to T have cost 0. So why is T' an arborescence? Observe that T' has exactly one edge entering each node v ≠ r, and no edge entering r. So T' has exactly n - 1 edges; hence if we can show there is an r-v path in T' for each v, then T' must be connected in an undirected sense, and hence a tree. Thus it would satisfy our initial definition of an arborescence.

So consider any node v ≠ r; we must show there is an r-v path in T'. If v ∈ C, we can use the fact that the path in T from r to e has been preserved in the construction of T'; thus we can reach v by first reaching e and then following the edges of the cycle C. Now suppose that v ∉ C, and let P denote the r-v path in T. If P did not touch C, then it still exists in T'. Otherwise, let w be the last node in P ∩ C, and let P' be the subpath of P from w to v. Observe that all the edges in P' still exist in T'. We have already argued that w is reachable from r in T', since it belongs to C. Concatenating this path to w with the subpath P' gives us a path to v as well. ∎

We can now put all the pieces together to argue that our algorithm is correct.

(4.39) The algorithm finds an optimal arborescence rooted at r in G.

Proof. The proof is by induction on the number of nodes in G. If the edges of F* form an arborescence, then the algorithm returns an optimal arborescence by (4.36). Otherwise, we consider the problem with the modified costs {c'e}, which is equivalent by (4.37). After contracting a 0-cost cycle C to obtain a smaller graph G', the algorithm produces an optimal arborescence in G' by the inductive hypothesis. Finally, by (4.38), there is an optimal arborescence in G that corresponds to the optimal arborescence computed for G'. ∎

Solved Exercises

Solved Exercise 1

Suppose that three of your friends, inspired by repeated viewings of the horror-movie phenomenon The Blair Witch Project, have decided to hike the Appalachian Trail this summer. They want to hike as much as possible per day but, for obvious reasons, not after dark. On a map they've identified a large set of good stopping points for camping, and they're considering the following system for deciding when to stop for the day. Each time they come to a potential stopping point, they determine whether they can make it to the next one before nightfall. If they can make it, then they keep hiking; otherwise, they stop.

Despite many significant drawbacks, they claim this system does have one good feature. "Given that we're only hiking in the daylight," they claim, "it minimizes the number of camping stops we have to make."

Is this true? The proposed system is a greedy algorithm, and we wish to determine whether it minimizes the number of stops needed.

To make this question precise, let's make the following set of simplifying assumptions. We'll model the Appalachian Trail as a long line segment of length L, and assume that your friends can hike d miles per day (independent of terrain, weather conditions, and so forth). We'll assume that the potential stopping points are located at distances x1, x2, ..., xn from the start of the trail. We'll also assume (very generously) that your friends are always correct when they estimate whether they can make it to the next stopping point before nightfall.

We'll say that a set of stopping points is valid if the distance between each adjacent pair is at most d, the first is at distance at most d from the start of the trail, and the last is at distance at most d from the end of the trail. Thus a set of stopping points is valid if one could camp only at these places and
still make it across the whole trail. We'll assume, naturally, that the full set of n stopping points is valid; otherwise, there would be no way to make it the whole way.

We can now state the question as follows. Is your friends' greedy algorithm--hiking as long as possible each day--optimal, in the sense that it finds a valid set whose size is as small as possible?

Solution  Often a greedy algorithm looks correct when you first encounter it, so before succumbing too deeply to its intuitive appeal, it's useful to ask: why might it not work? What should we be worried about?

There's a natural concern with this algorithm: Might it not help to stop early on some day, so as to get better synchronized with camping opportunities on future days? But if you think about it, you start to wonder whether this could really happen. Could there really be an alternate solution that intentionally lags behind the greedy solution, and then puts on a burst of speed and passes the greedy solution? How could it pass it, given that the greedy solution travels as far as possible each day?

This last consideration starts to look like the outline of an argument based on the "staying ahead" principle from Section 4.1. Perhaps we can show that as long as the greedy camping strategy is ahead on a given day, no other solution can catch up and overtake it the next day.

We now turn this into a proof showing the algorithm is indeed optimal, identifying a natural sense in which the stopping points it chooses "stay ahead" of any other legal set of stopping points. Although we are following the style of proof from Section 4.1, it's worth noting an interesting contrast with the Interval Scheduling Problem: there we needed to prove that a greedy algorithm maximized a quantity of interest, whereas here we seek to minimize a certain quantity.

Let R = {xp_1, ..., xp_k} denote the set of stopping points chosen by the greedy algorithm, and suppose by way of contradiction that there is a smaller valid set of stopping points; let's call this smaller set S = {xq_1, ..., xq_m}, with m < k.

To obtain a contradiction, we first show that the stopping point reached by the greedy algorithm on each day j is farther than the stopping point reached under the alternate solution. That is,

(4.40) For each j = 1, 2, ..., m, we have xp_j ≥ xq_j.

Proof. We prove this by induction on j. The case j = 1 follows directly from the definition of the greedy algorithm: your friends travel as long as possible on the first day before stopping. Now let j > 1 and assume that the claim is true for all i < j. Then

    xq_j - xq_{j-1} ≤ d,

since S is a valid set of stopping points, and

    xq_j - xp_{j-1} ≤ xq_j - xq_{j-1},

since xp_{j-1} ≥ xq_{j-1} by the induction hypothesis. Combining these two inequalities, we have

    xq_j - xp_{j-1} ≤ d.

This means that your friends have the option of hiking all the way from xp_{j-1} to xq_j in one day; and hence the location xp_j at which they finally stop can only be farther along than xq_j. (Note the similarity with the corresponding proof for the Interval Scheduling Problem: here too the greedy algorithm is staying ahead because, at each step, the choice made by the alternate solution is one of its valid options.) ∎

Statement (4.40) implies in particular that xq_m ≤ xp_m. Now, if m < k, then we must have xp_m < L - d, for otherwise your friends would never have needed to stop at the location xp_{m+1}. Combining these two inequalities, we have concluded that xq_m < L - d; but this contradicts the assumption that S is a valid set of stopping points.

Consequently, we cannot have m < k, and so we have proved that the greedy algorithm produces a valid set of stopping points of minimum possible size.

Solved Exercise 2

Your friends are starting a security company that needs to obtain licenses for n different pieces of cryptographic software. Due to regulations, they can only obtain these licenses at the rate of at most one per month.

Each license is currently selling for a price of $100. However, they are all becoming more expensive according to exponential growth curves: in particular, the cost of license j increases by a factor of rj > 1 each month, where rj is a given parameter. This means that if license j is purchased t months from now, it will cost 100 · rj^t. We will assume that all the price growth rates are distinct; that is, ri ≠ rj for licenses i ≠ j (even though they start at the same price of $100).
The question is: Given that the company can only buy at most one license a month, in which order should it buy the licenses so that the total amount of money it spends is as small as possible?

Give an algorithm that takes the n rates of price growth r1, r2, ..., rn, and computes an order in which to buy the licenses so that the total amount of money spent is minimized. The running time of your algorithm should be polynomial in n.

Solution  Two natural guesses for a good sequence would be to sort the ri in decreasing order, or to sort them in increasing order. Faced with alternatives like this, it's perfectly reasonable to work out a small example and see if the example eliminates at least one of them. Here we could try r1 = 2, r2 = 3, and r3 = 4. Buying the licenses in increasing order results in a total cost of

    100(2 + 3² + 4³) = 7,500,

while buying them in decreasing order results in a total cost of

    100(4 + 3² + 2³) = 2,100.

This tells us that increasing order is not the way to go. (On the other hand, it doesn't tell us immediately that decreasing order is the right answer, but our goal was just to eliminate one of the two options.)

Let's try proving that sorting the ri in decreasing order in fact always gives the optimal solution. When a greedy algorithm works for problems like this, in which we put a set of things in an optimal order, we've seen in the text that it's often effective to try proving correctness using an exchange argument.

To do this here, let's suppose that there is an optimal solution O that differs from our solution S. (In other words, S consists of the licenses sorted in decreasing order.) So this optimal solution O must contain an inversion--that is, there must exist two neighboring months t and t + 1 such that the price increase rate of the license bought in month t (let us denote it by rt) is less than that bought in month t + 1 (similarly, we use rt+1 to denote this). That is, we have rt < rt+1.

We claim that by exchanging these two purchases, we can strictly improve our optimal solution, which contradicts the assumption that O was optimal. Therefore if we succeed in showing this, we will successfully show that our algorithm is indeed the correct one.

Notice that if we swap these two purchases, the rest of the purchases are identically priced. In O, the amount paid during the two months involved in the swap is 100(r_t^t + r_{t+1}^{t+1}). On the other hand, if we swapped these two purchases, we would pay 100(r_{t+1}^t + r_t^{t+1}). Since the constant 100 is common to both expressions, we want to show that the second term is less than the first one. So we want to show that

    r_{t+1}^t + r_t^{t+1} < r_t^t + r_{t+1}^{t+1}
    r_t^{t+1} - r_t^t < r_{t+1}^{t+1} - r_{t+1}^t
    r_t^t (r_t - 1) < r_{t+1}^t (r_{t+1} - 1).

But this last inequality is true simply because ri > 1 for all i and since rt < rt+1.

This concludes the proof of correctness. The running time of the algorithm is O(n log n), since the sorting takes that much time and the rest (outputting) is linear. So the overall running time is O(n log n).

Note: It's interesting to note that things become much less straightforward if we vary this question even a little. Suppose that instead of buying licenses whose prices increase, you're trying to sell off equipment whose cost is depreciating. Item i depreciates at a factor of ri < 1 per month, starting from $100, so if you sell it t months from now you will receive 100 · ri^t. (In other words, the exponential rates are now less than 1, instead of greater than 1.) If you can only sell one item per month, what is the optimal order in which to sell them? Here, it turns out that there are cases in which the optimal solution doesn't put the rates in either increasing or decreasing order (as in the input ...).

Solved Exercise 3

Suppose you are given a connected graph G, with edge costs that you may assume are all distinct. G has n vertices and m edges. A particular edge e of G is specified. Give an algorithm with running time O(m + n) to decide whether e is contained in a minimum spanning tree of G.

Solution  From the text, we know of two rules by which we can conclude whether an edge e belongs to a minimum spanning tree: the Cut Property (4.17) says that e is in every minimum spanning tree when it is the cheapest edge crossing from some set S to the complement V - S; and the Cycle Property (4.20) says that e is in no minimum spanning tree if it is the most expensive edge on some cycle C. Let's see if we can make use of these two rules as part of an algorithm that solves this problem in linear time.

Both the Cut and Cycle Properties are essentially talking about how e relates to the set of edges that are cheaper than e. The Cut Property can be viewed as asking: Is there some set S ⊆ V so that in order to get from S to V - S without using e, we need to use an edge that is more expensive than e? And if we think about the cycle C in the statement of the Cycle Property, going the
Exercises

Prove that, for a given set of boxes with specified weights, the greedy algorithm currently in use actually minimizes the number of trucks that are needed. Your proof should follow the type of analysis we used for the Interval Scheduling Problem: it should establish the optimality of this greedy packing algorithm by identifying a measure under which it "stays ahead" of all other solutions.

4. Some of your friends have gotten into the burgeoning field of time-series data mining, in which one looks for patterns in sequences of events that occur over time. Purchases at stock exchanges--what's being bought--are one source of data with a natural ordering in time. Given a long sequence S of such events, your friends want an efficient way to detect certain "patterns" in them--for example, they may want to know if the four events

    buy Yahoo, buy eBay, buy Yahoo, buy Oracle

occur in this sequence S, in order but not necessarily consecutively.

They begin with a collection of possible events (e.g., the possible transactions) and a sequence S of n of these events. A given event may occur multiple times in S (e.g., Yahoo stock may be bought many times in a single sequence S). We will say that a sequence S' is a subsequence of S if there is a way to delete certain of the events from S so that the remaining events, in order, are equal to the sequence S'. So, for example, the sequence of four events above is a subsequence of the sequence

    buy Amazon, buy Yahoo, buy eBay, buy Yahoo, buy Yahoo, buy Oracle

Their goal is to be able to dream up short sequences and quickly detect whether they are subsequences of S. So this is the problem they pose to you: Give an algorithm that takes two sequences of events--S' of length m and S of length n, each possibly containing an event more than once--and decides in time O(m + n) whether S' is a subsequence of S.

5. Let's consider a long, quiet country road with houses scattered very sparsely along it. (We can picture the road as a long line segment, with an eastern endpoint and a western endpoint.) Further, let's suppose that despite the bucolic setting, the residents of all these houses are avid cell phone users. You want to place cell phone base stations at certain points along the road, so that every house is within four miles of one of the base stations.

Give an efficient algorithm that achieves this goal, using as few base stations as possible.

6. Your friend is working as a camp counselor, and he is in charge of organizing activities for a set of junior-high-school-age campers. One of his plans is the following mini-triathlon exercise: each contestant must swim 20 laps of a pool, then bike 10 miles, then run 3 miles. The plan is to send the contestants out in a staggered fashion, via the following rule: the contestants must use the pool one at a time. (In other words, first one contestant swims the 20 laps, gets out, and starts biking. As soon as this first person is out of the pool, a second contestant begins swimming the 20 laps; as soon as he or she is out and starts biking, a third contestant begins swimming... and so on.)

Each contestant has a projected swimming time (the expected time it will take him or her to complete the 20 laps), a projected biking time (the expected time it will take him or her to complete the 10 miles of bicycling), and a projected running time (the time it will take him or her to complete the 3 miles of running). Your friend wants to decide on a schedule for the triathlon: an order in which to sequence the starts of the contestants. Let's say that the completion time of a schedule is the earliest time at which all contestants will be finished with all three legs of the triathlon, assuming they each spend exactly their projected swimming, biking, and running times on the three parts. (Again, note that participants can bike and run simultaneously, but at most one person can be in the pool at any time.) What's the best order for sending people out, if one wants the whole competition to be over as early as possible? More precisely, give an efficient algorithm that produces a schedule whose completion time is as small as possible.

7. The wildly popular Spanish-language search engine El Goog needs to do a serious amount of computation every time it recompiles its index. Fortunately, the company has at its disposal a single large supercomputer, together with an essentially unlimited supply of high-end PCs.

They've broken the overall computation into n distinct jobs, labeled J1, J2, ..., Jn, which can be performed completely independently of one another. Each job consists of two stages: first it needs to be preprocessed on the supercomputer, and then it needs to be finished on one of the PCs. Let's say that job Ji needs pi seconds of time on the supercomputer, followed by fi seconds of time on a PC.

Since there are at least n PCs available on the premises, the finishing of the jobs can be performed fully in parallel--all the jobs can be processed at the same time. However, the supercomputer can only work on a single job at a time, so the system managers need to work out an order in which to feed the jobs to the supercomputer. As soon as the first job
in order is done on the supercomputer, it can be handed off to a PC for finishing; at that point in time a second job can be fed to the supercomputer; when the second job is done on the supercomputer, it can proceed to a PC regardless of whether or not the first job is done (since the PCs work in parallel); and so on.

Let's say that a schedule is an ordering of the jobs for the supercomputer, and the completion time of the schedule is the earliest time at which all jobs will have finished processing on the PCs. This is an important quantity to minimize, since it determines how rapidly El Goog can generate a new index.

Give a polynomial-time algorithm that finds a schedule with as small a completion time as possible.

8. Suppose you are given a connected graph G, with edge costs that are all distinct. Prove that G has a unique minimum spanning tree.

9. One of the basic motivations behind the Minimum Spanning Tree Problem is the goal of designing a spanning network for a set of nodes with minimum total cost. Here we explore another type of objective: designing a spanning network for which the most expensive edge is as cheap as possible.

Specifically, let G = (V, E) be a connected graph with n vertices, m edges, and positive edge costs that you may assume are all distinct. Let T = (V, E') be a spanning tree of G; we define the bottleneck edge of T to be the edge of T with the greatest cost.

A spanning tree T of G is a minimum-bottleneck spanning tree if there is no spanning tree T' of G with a cheaper bottleneck edge.

(a) Is every minimum-bottleneck tree of G a minimum spanning tree of G? Prove or give a counterexample.

(b) Is every minimum spanning tree of G a minimum-bottleneck tree of G? Prove or give a counterexample.

10. Let G = (V, E) be an (undirected) graph with costs ce ≥ 0 on the edges e ∈ E. Assume you are given a minimum-cost spanning tree T in G. Now assume that a new edge is added to G, connecting two nodes v, w ∈ V with cost c.

(a) Give an efficient algorithm to test if T remains the minimum-cost spanning tree with the new edge added to G (but not to the tree T). Make your algorithm run in time O(|E|). Can you do it in O(|V|) time? Please note any assumptions you make about what data structure is used to represent the tree T and the graph G.

(b) Suppose T is no longer the minimum-cost spanning tree. Give a linear-time algorithm (time O(|E|)) to update the tree T to the new minimum-cost spanning tree.

11. Suppose you are given a connected graph G = (V, E), with a cost ce on each edge e. In an earlier problem, we saw that when all edge costs are distinct, G has a unique minimum spanning tree. However, G may have many minimum spanning trees when the edge costs are not all distinct. Here we formulate the question: Can Kruskal's Algorithm be made to find all the minimum spanning trees of G?

Recall that Kruskal's Algorithm sorted the edges in order of increasing cost, then greedily processed edges one by one, adding an edge e as long as it did not form a cycle. When some edges have the same cost, the phrase "in order of increasing cost" has to be specified a little more carefully: we'll say that an ordering of the edges is valid if the corresponding sequence of edge costs is nondecreasing. We'll say that a valid execution of Kruskal's Algorithm is one that begins with a valid ordering of the edges of G.

For any graph G, and any minimum spanning tree T of G, is there a valid execution of Kruskal's Algorithm on G that produces T as output? Give a proof or a counterexample.

12. Suppose you have n video streams that need to be sent, one after another, over a communication link. Stream i consists of a total of bi bits that need to be sent, at a constant rate, over a period of ti seconds. You cannot send two streams at the same time, so you need to determine a schedule for the streams: an order in which to send them. Whichever order you choose, there cannot be any delays between the end of one stream and the start of the next. Suppose your schedule starts at time 0 (and therefore ends at time Σi ti, whichever order you choose). We assume that all the values bi and ti are positive integers.

Now, because you're just one user, the link does not want you taking up too much bandwidth, so it imposes the following constraint, using a fixed parameter r:

(*) For each natural number t > 0, the total number of bits you send over the time interval from 0 to t cannot exceed rt.

Note that this constraint is only imposed for time intervals that start at 0, not for time intervals that start at any other value.

We say that a schedule is valid if it satisfies the constraint (*) imposed by the link.
The Problem. Given a set of n streams, each specified by its number of bits bi and its time duration ti, as well as the link parameter r, determine whether there exists a valid schedule.
Example. Suppose we have n = 3 streams, with
(b1, t1) = (2000, 1), (b2, t2) = (6000, 2), (b3, t3) = (2000, 1),
and suppose the link's parameter is r = 5000. Then the schedule that runs the streams in the order 1, 2, 3 is valid, since the constraint (*) is satisfied:
t = 1: the whole first stream has been sent, and 2000 <= 5000 * 1.
t = 2: half of the second stream has also been sent, and 2000 + 3000 <= 5000 * 2.
Similar calculations hold for t = 3 and t = 4.
(a) Consider the following claim:
Claim: There exists a valid schedule if and only if each stream i satisfies bi <= r ti.
Decide whether you think the claim is true or false, and give a proof of either the claim or its negation.
(b) Give an algorithm that takes a set of n streams, each specified by its number of bits bi and its time duration ti, as well as the link parameter r, and determines whether there exists a valid schedule. The running time of your algorithm should be polynomial in n.

13. A small business--say, a photocopying service with a single large machine--faces the following scheduling problem. Each morning they get a set of jobs from customers. They want to do the jobs on their single machine in an order that keeps their customers happiest. Customer i's job will take ti time to complete. Given a schedule (i.e., an ordering of the jobs), let Ci denote the finishing time of job i. For example, if job j is the first to be done, we would have Cj = tj; and if job j is done right after job i, we would have Cj = Ci + tj. Each customer i also has a given weight wi that represents his or her importance to the business. The happiness of customer i is expected to be dependent on the finishing time of i's job. So the company decides that they want to order the jobs to minimize the weighted sum of the completion times, sum_{i=1}^{n} wi Ci.
Design an efficient algorithm to solve this problem. That is, you are given a set of n jobs with a processing time ti and a weight wi for each job. You want to order the jobs so as to minimize the weighted sum of the completion times, sum_{i=1}^{n} wi Ci.
Example. Suppose there are two jobs: the first takes time t1 = 1 and has weight w1 = 10, while the second job takes time t2 = 3 and has weight w2 = 2. Then doing job 1 first would yield a weighted completion time of 10*1 + 2*4 = 18, while doing the second job first would yield the larger weighted completion time of 10*4 + 2*3 = 46. (A short script that checks this arithmetic appears after Exercise 14 below.)

14. You're working with a group of security consultants who are helping to monitor a large computer system. There's particular interest in keeping track of processes that are labeled "sensitive." Each such process has a designated start time and finish time, and it runs continuously between these times; the consultants have a list of the planned start and finish times of all sensitive processes that will be run that day.
As a simple first step, they've written a program called status_check that, when invoked, runs for a few seconds and records various pieces of logging information about all the sensitive processes running on the system at that moment. (We'll model each invocation of status_check as lasting for only this single point in time.) What they'd like to do is to run status_check as few times as possible during the day, but enough that for each sensitive process P, status_check is invoked at least once during the execution of process P.
(a) Give an efficient algorithm that, given the start and finish times of all the sensitive processes, finds as small a set of times as possible at which to invoke status_check, subject to the requirement that status_check is invoked at least once during each sensitive process P.
(b) While you were designing your algorithm, the security consultants were engaging in a little back-of-the-envelope reasoning. "Suppose we can find a set of k sensitive processes with the property that no two are ever running at the same time. Then clearly your algorithm will need to invoke status_check at least k times: no one invocation of status_check can handle more than one of these processes."
This is true, of course, and after some further discussion, you all begin wondering whether something stronger is true as well, a kind of converse to the above argument. Suppose that k* is the largest value of k such that one can find a set of k sensitive processes with no two ever running at the same time. Is it the case that there must be a set of k* times at which you can run status_check so that some invocation occurs during the execution of each sensitive process? (In other words, the kind of argument in the previous paragraph is really the only thing forcing you to need a lot of invocations of status_check.) Decide whether you think this claim is true or false, and give a proof or a counterexample.
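As a quick check of the arithmetic in the example of Exercise 13 above, the following Python sketch (names are illustrative, not from the text) computes the weighted sum of completion times for a given job order.

```python
def weighted_completion_time(jobs):
    """jobs: list of (t_i, w_i) pairs in the order they are run; returns sum of w_i * C_i."""
    total, finish = 0, 0
    for t, w in jobs:
        finish += t          # C_i, the finishing time of this job
        total += w * finish
    return total

job1, job2 = (1, 10), (3, 2)                      # (t_1, w_1) and (t_2, w_2) from the example
print(weighted_completion_time([job1, job2]))     # 10*1 + 2*4 = 18
print(weighted_completion_time([job2, job1]))     # 2*3 + 10*4 = 46
```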
15. The manager of a large student union on campus comes to you with the following problem. She's in charge of a group of n students, each of whom is scheduled to work one shift during the week. There are different jobs associated with these shifts (tending the main desk, helping with package delivery, rebooting cranky information kiosks, etc.), but we can view each shift as a single contiguous interval of time. There can be multiple shifts going on at once.
She's trying to choose a subset of these n students to form a supervising committee that she can meet with once a week. She considers such a committee to be complete if, for every student not on the committee, that student's shift overlaps (at least partially) the shift of some student who is on the committee. In this way, each student's performance can be observed by at least one person who's serving on the committee.
Give an efficient algorithm that takes the schedule of n shifts and produces a complete supervising committee containing as few students as possible.
Example. Suppose n = 3, and the shifts are
Monday 4 P.M.-Monday 8 P.M.,
Monday 6 P.M.-Monday 10 P.M.,
Monday 9 P.M.-Monday 11 P.M.
Then the smallest complete supervising committee would consist of just the second student, since the second shift overlaps both the first and the third.

16. Some security consultants working in the financial domain are currently advising a client who is investigating a potential money-laundering scheme. The investigation thus far has indicated that n suspicious transactions took place in recent days, each involving money transferred into a single account. Unfortunately, the sketchy nature of the evidence to date means that they don't know the identity of the account, the amounts of the transactions, or the exact times at which the transactions took place. What they do have is an approximate time-stamp for each transaction; the evidence indicates that transaction i took place at time ti ± ei, for some "margin of error" ei. (In other words, it took place sometime between ti - ei and ti + ei.) Note that different transactions may have different margins of error.
In the last day or so, they've come across a bank account that (for other reasons we don't need to go into here) they suspect might be the one involved in the crime. There are n recent events involving the account, which took place at times x1, x2, ..., xn. To see whether it's plausible that this really is the account they're looking for, they're wondering whether it's possible to associate each of the account's n events with a distinct one of the n suspicious transactions in such a way that, if the account event at time xi is associated with the suspicious transaction that occurred approximately at time tj, then |tj - xi| <= ej. (In other words, they want to know if the activity on the account lines up with the suspicious transactions to within the margin of error; the tricky part here is that they don't know which account event to associate with which suspicious transaction.)
Give an efficient algorithm that takes the given data and decides whether such an association exists. If possible, you should make the running time be at most O(n^2).

17. Consider the following variation on the Interval Scheduling Problem. You have a processor that can operate 24 hours a day, every day. People submit requests to run daily jobs on the processor. Each such job comes with a start time and an end time; if the job is accepted to run on the processor, it must run continuously, every day, for the period between its start and end times. (Note that certain jobs can begin before midnight and end after midnight; this makes for a type of situation different from what we saw in the Interval Scheduling Problem.)
Given a list of n such jobs, your goal is to accept as many jobs as possible (regardless of their length), subject to the constraint that the processor can run at most one job at any given point in time. Provide an algorithm to do this with a running time that is polynomial in n. You may assume for simplicity that no two jobs have the same start or end times.
Example. Consider the following four jobs, specified by (start-time, end-time) pairs.
(6 P.M., 6 A.M.), (9 P.M., 4 A.M.), (3 A.M., 2 P.M.), (1 P.M., 7 P.M.).
The optimal solution would be to pick the two jobs (9 P.M., 4 A.M.) and (1 P.M., 7 P.M.), which can be scheduled without overlapping.

18. Your friends are planning an expedition to a small town deep in the Canadian north next winter break. They've researched all the travel options and have drawn up a directed graph whose nodes represent intermediate destinations and edges represent the roads between them.
In the course of this, they've also learned that extreme weather causes roads in this part of the world to become quite slow in the winter and may cause large travel delays. They've found an excellent travel Web site that can accurately predict how fast they'll be able to travel along the roads; however, the speed of travel depends on the time of year. More precisely, the Web site answers queries of the following form: given an
edge e = (u, w) connecting two sites u and w, and given a proposed starting time t from location u, the site will return a value fe(t), the predicted arrival time at w. The Web site guarantees that fe(t) >= t for all edges e and all times t (you can't travel backward in time), and that fe(t) is a monotone increasing function of t (that is, you do not arrive earlier by starting later). Other than that, the functions fe(t) may be arbitrary. For example, in areas where the travel time does not vary with the season, we would have fe(t) = t + le, where le is the time needed to travel from the beginning to the end of edge e.
Your friends want to use the Web site to determine the fastest way to travel through the directed graph from their starting point to their intended destination. (You should assume that they start at time 0, and that all predictions made by the Web site are completely correct.) Give a polynomial-time algorithm to do this, where we treat a single query to the Web site (based on a specific edge e and a time t) as taking a single computational step.

19. A group of network designers at the communications company CluNet find themselves facing the following problem. They have a connected graph G = (V, E), in which the nodes represent sites that want to communicate. Each edge e is a communication link, with a given available bandwidth be.
For each pair of nodes u, v ∈ V, they want to select a single u-v path P on which this pair will communicate. The bottleneck rate b(P) of this path P is the minimum bandwidth of any edge it contains; that is, b(P) = min_{e ∈ P} be. The best achievable bottleneck rate for the pair u, v in G is simply the maximum, over all u-v paths P in G, of the value b(P).
It's getting to be very complicated to keep track of a path for each pair of nodes, and so one of the network designers makes a bold suggestion: Maybe one can find a spanning tree T of G so that for every pair of nodes u, v, the unique u-v path in the tree actually attains the best achievable bottleneck rate for u, v in G. (In other words, even if you could choose any u-v path in the whole graph, you couldn't do better than the u-v path in T.)
This idea is roundly heckled in the offices of CluNet for a few days, and there's a natural reason for the skepticism: each pair of nodes might want a very different-looking path to maximize its bottleneck rate; why should there be a single tree that simultaneously makes everybody happy? But after some failed attempts to rule out the idea, people begin to suspect it could be possible.
Show that such a tree exists, and give an efficient algorithm to find one. That is, give an algorithm constructing a spanning tree T in which, for each u, v ∈ V, the bottleneck rate of the u-v path in T is equal to the best achievable bottleneck rate for the pair u, v in G.

20. Every September, somewhere in a far-away mountainous part of the world, the county highway crews get together and decide which roads to keep clear through the coming winter. There are n towns in this county, and the road system can be viewed as a (connected) graph G = (V, E) on this set of towns, each edge representing a road joining two of them.
In the winter, people are high enough up in the mountains that they stop worrying about the length of roads and start worrying about their altitude--this is really what determines how difficult the trip will be. So each road--each edge e in the graph--is annotated with a number ae that gives the altitude of the highest point on the road. We'll assume that no two edges have exactly the same altitude value ae. The height of a path P in the graph is then the maximum of ae over all edges e on P. Finally, a path between towns i and j is declared to be winter-optimal if it achieves the minimum possible height over all paths from i to j.
The highway crews are going to select a set E' ⊆ E of the roads to keep clear through the winter; the rest will be left unmaintained and kept off limits to travelers. They all agree that whichever subset of roads E' they decide to keep clear, it should have the property that (V, E') is a connected subgraph; and more strongly, for every pair of towns i and j, the height of the winter-optimal path in (V, E') should be no greater than it is in the graph G = (V, E). We'll say that (V, E') is a minimum-altitude connected subgraph if it has this property.
Given that they're going to maintain this key property, however, they otherwise want to keep as few roads clear as possible. One year, they hit upon the following conjecture:
The minimum spanning tree of G, with respect to the edge weights ae, is a minimum-altitude connected subgraph.
(In an earlier problem, we claimed that there is a unique minimum spanning tree when the edge weights are distinct. Thus, thanks to the assumption that all ae are distinct, it is okay for us to speak of the minimum spanning tree.)
Initially, this conjecture is somewhat counterintuitive, since the minimum spanning tree is trying to minimize the sum of the values ae, while the goal of minimizing altitude seems to be asking for a fully different thing. But lacking an argument to the contrary, they begin considering an even bolder second conjecture:
21. Let us say that a graph G = (V, E) is a near-tree if it is connected and has at The root generates a clock signal which is propagated along the edges
most n + 8 edges, where n = IVI. Give an algorithm with running t~me O(n)
to the leaves. We’]] assume that the time it takes for the signal to reach a
that takes a near-tree G with costs on its edges, and returns a minimum
given leaf is proportional to the distance from the root to the leaf.
spanning tree of G. You may assume that all the edge costs are distinct.
Now, if all leaves do not have the same distance from the root, then
the signal will not reach the leaves at the same time, and this is a big
22. Consider the Minimum Spanning Tree Problem on an undirected graph
G = (V, E), with a cost ce >_ 0 on each edge, where the costs may not all problem. We want the leaves to be completely synchronized, and all to
be different. If the costs are not a~ distinct, there can in general be receive the signal at the same time. To make this happen, we will have to
many distinct minimum-cost solutions. Suppose we are given a spanning increase the lengths of certain edges, so that all root-to-leaf paths have
tree T c E with the guarantee that for every e ~ T, e belongs to some the same length (we’re not able to shrink edge lengths). If we achieve this,
minimum-cost spanning tree in G. Can we conclude that T itself must then the tree (with its new edge lengths) will be said to have zero skew.
be a minimum-cost spanning tree in G? Give a proof or a counterexample Our goal is to achieve zero skew in a way that keeps the sum of all the
with explanation. edge lengths as small as possible.
Give an algorithm that increases the lengths of certain edges so that
23. Recall the problem of computing a minimum-cost arborescence in a the resulting tree has zero skew and the total edge length is as sma]] as
directed graph G = (V, E), with a cost ce >_ 0 on each edge. Here we will possible.
consider the case in which G is a directed acyclic graph--that is, it contains
Example. Consider the tree in Figure 4.20, in which letters name the nodes
no directed cycles. and numbers indicate the edge lengths.
As in general directed graphs, there can be many distinct minimum- The unique optimal solution for ~s instance would be to take the
cost solutions. Suppose we are given a directed acyclic graph G = (V, E), three length-1 edges and increase each of their lengths to 2. The resulting
and an arborescence A c E with the guarantee that for every e ~ A, e
tree has zero skew, and the total edge length is 12, the smallest possible.
belongs to some minimum-cost arborescence in G. Can we conclude that
A itself must be a minimum-cost arborescence in G? Give a proof or a 25. Suppose we are given a set of points P = [Pl,P2 ..... Pn}, together with a
counterexample with explanation. distance function d on the set P; d is simply a function bn paJ_rs of points in
P with the properties that d(p~,pi) = d(py, Pi) > 0 ff i #j, and that d(p~, pi) = 0
24. TimJ.ng circuits are a crucial component of VLSI chips. Here’s a simple for each i.
model of such a timing circuit. Consider a complete balanced binary tree
We define a hierarchical metric onP to be any distance function r that
with n leaves, where n is a power of two. Each edge e of the tree has an
can be constructed as fo]]ows. We build a rooted tree T with n leaves, and
associated length ~e, which is a positive number. The distance from the we associate with each node v of T (both leaves and internal nodes) a
root to a given leaf is the sum of the lengths of all the edges on the path height hr. These heights must satisfy the properties that h(v) = 0 for each
from the root to the leaf.
Here is one way to do this. Let G be a connected graph, and T and T’
leaf v, and ff u is the parent of v in T, then h(u) >_ h(v). We place each point two different spanning trees of G.. We say that T and T’ are neighbors if
in P at a distinct leaf in T. Now, for any pair of points p~ and Pi, their T contains exactly one edge that is not in T’, and T"contains exactly one
distance ~(p~, Pi) is defined as follows. We determine the least common edge that is not in T.
ancestor v in T of the leaves containing p~ and Pi, and define ~(p~,
Now, from any graph G, we can build a (large) graph 9~ as follows.
We say that a hierarchical metric r is consistent with our distance The nodes of 9~ are the spanning trees of G, and there is an edge between
function d if, for all pairs i,j, we have r(p~,pl) _< d(p~,Pi). two nodes of 9C if the corresponding spanning trees are neighbors.
Give a polynomial-time algorithm that takes the distance function d Is it true that, for any connected graph G, the resulting graph ~
and produces a hierarchical metric ~ with the following properties. is connected? Give a proof that ~K is always connected, or provide an
(i) ~ is consistent with d, and example (with explanation) of a connected graph G for which % is not
<- connected.
(ii) ff ~’ is any other hierarchical metric consistent with d, then ~’(P~,Pi)
r(p~,pi) for each pair of points Pi and
28. Suppose you’re a consultant for the networking company CluNet, and
26. One of the first things you learn in calculus is how to minimize a dif- they have the following problem. The network that they’re currently
ferentiable function such as y = ax2 + bx + c, where a > 0. The Minimum working on is modeled by a connected graph G = (V, E) with n nodes.
Spanning Tree Problem, on the other hand, is a minimization problem of Each edge e is a fiber-optic cable that is owned by one of two companies--
a very different flavor: there are now just a~ finite number of possibilities, creatively named X and Y--and leased to CluNet.
for how the minimum might be achieved--rather than a continuum of Their plan is to choose a spanning tree T of G and upgrade the links
possibilities--and we are interested in how to perform the computation corresponding to the edges of T. Their business relations people have
without having to exhaust this (huge) finite number of possibilities. already concluded an agreement with companies X and Y stipulating a
One Can ask what happens when these two minimization issues number k so that in the tree T that is chosen, k of the edges will be owned
are brought together, and the following question is an example of this. by X and n - k - 1 of the edges will be owned by Y.
Suppose we have a connected graph G = (V, E). Each edge e now has a time- CluNet management now faces the following problem. It is not at all
varying edge cost given by a function fe :R-+R. Thus, at time t, it has cost clear to them whether there even exists a spanning tree T meeting these
re(t). We’l! assume that all these functions are positive over their entire conditions, or how to find one if it exists. So this is the problem they put
range. Observe that the set of edges constituting the minimum spanning to you: Give a polynomial-time algorithm that takes G, with each edge
tree of G may change over time. Also, of course, the cost of the minimum labeled X or Y, and either (i) returns a spanning tree with e~xactly k edges
spanning tree of G becomes a function of the time t; we’ll denote this labeled X, or (ii) reports correctly that no such tree exists.
function ca(t). A natural problem then becomes: find a value of t at which
cG(t) is minimized. 29. Given a list of n natural numbers all, d2 ..... tin, show how to decide
Suppose each function fe is a polynomial of degree 2: re(t) =aetz + in polynomial time whether there exists an undirected graph G = (V, E)
bet + Ce, where ae > 0. Give an algorithm that takes the graph G and the whose node degrees are precisely the numbers d~, d2 ..... dn. (That is, ff
values {(ae, be, ce) : e ~ E} and returns a value of the time t at which the V = {Ul, v2 ..... vn}, then the degree of u~ should be exactly dv) G should not
minimum spanning tree has minimum cost. Your algorithm should run contain multiple edges between the same pair of nodes, or "!oop" edges
in time polynomial in the number of nodes and edges of the graph G. You with both endpoints equal to the same node.
may assume that arithmetic operations on the numbers {(ae, be, q)} can
be done in constant time per operation. 30. Let G = (V, E) be a graph with n nodes in which each pair of nodes is
joined by an edge. There is a positive weight w~i on each edge (i,]); and
27. In trying to understand the combinatorial StlXlcture of spanning trees, we will assume these weights satisfy the triangle inequality tv~k <_ ra~i + Wik.
we can consider the space of all possible spanning trees of a given graph For a subset V’ _ V, we will use G[V’] to denote the subgraph (with edge
and study the properties of this space. This is a strategy that has been weights) induced on the nodes in V’.
applied to many similar problems as well.
32. Consider a directed graph G = (V, E) with a root r ~ V and nonnegative
We are given a set X _ V of k terminals that must be connected by
costs on the edges. In this problem we consider variants of the ~um-
edges. We say that a Steiner tree onX is a set Z so that X ~_ Z _ V, together
cost arborescence algorithm.
with a spanning subtree T of G[Z]. The weight of the Steiner tree is the
weight of the tree T. (a) The algorithm discussed in Section 4.9 works as follows. We modify
the costs, consider the subgraph of zero-cost edges, look for a
Show that the problem of finding a minimum-weight Steiner tree on
directed cycle in this subgraph, and contract it (if one exists). Argue
X can be solved in time briefly that instead of looking for cycles, we can instead identify and
contract strong components of this subgraph.
31. Let’s go back to the original motivation for the Minimum Spanning Tree
Problem. We are given a connected, undirected graph G = (V, E) with (b) In the course of the algorithm, we defined Yv to be the minimum
positive edge lengths {~e}, and we want to find a spanning subgraph of cost of an edge entering ~, and we modified the costs of all edges e
it. Now suppose we are ~g to settle for a subgraph/4 = (V, F) that is entering node u to be c’e = ce - yr. Suppose we instead use the follow-
"denser" than a tree, and we are interested in guaranteeing that, for each ing modified cost: c~’ = max(0, ce - 2y~). This new change is_likely to
pair of vertices a, v ~ V, the length of the shortest u-v path in/4 is not turn more edges to 0 cost. Suppose now we find an arborescence T
much longer than the length of the shortest a-v path in G. By the length of 0 cost. Prove that this T has cost at most twice the cost of the
of a path P here, we mean the sum of ~e over all edges e in P. minimum-cost arborescence in the original graph.
Here’s a variant of Kruskal’s Algorithm designed to produce such a (c) Assume you do not find an arborescence of 0 cost. Contract al! 0-
subgraph. cost strong components and recursively apply the same procedure
on the resttlting graph unti! an arborescence is found. Prove that this
* First we sort all the edges in order of increasing length. (You may
T has cost at most twice the cost of the minimum-cost arborescence
assume all edge lengths are distinct.)
in the original graph.
o We then construct a subgraph H = (V, F) by considering each edge in
order. 33. Suppose you are given a directed graph G = (V, E) In which each edge has
¯ When we come to edge e = (u, v), we add e to the subgraph/4 if there a cost of either 0 or 1. Also suppose that G has a node r such that there is a
is currently no a-v path in/4. (This is what Kruskal’s Algorithm would path from r to every other node in G. You are also given an integer k. Give a
do as well.) On the other hand, if there is a u-v path in/4, we let duv polynomial-time algorithm that either constructs an arborescence rooted
denote the length of the shortest such path; again, length is with at r of cost exactly k, or reports (correctly) that no such arborescence
respect to the values {~e}. We add e to/4 ff 3~e < duv- exists.
In other words, we add an edge even when a and v are already In the same
connected component, provided that the addition of the edge reduces Notes and Further Reading
their shortest-path distance by a sufficient amount.
Let H = (V, F) be the, subgraph of G returned by the algorithm. Due to their conceptual cleanness and intuitive appeal, greedy algorithms have
(a) Prove that for evet3~ pair of nodes a, v ~ V, the length of the shortest a long history and many applications throughout computer science. In this
u-v path in H is at most three times the length of the shortest a-v chapter we focused on cases in which greedy algorithms find the optimal
solution. Greedy algorithms are also often used as simple heuristics even when
path in G.
they are not guaranteed to find the optimal solution. In Chapter 11 we will
(b) Despite its ability to approximately preserve sh°rtest-p ath distances’
discuss greedy algorithms that find near-optimal approximate solutions.
the subgraph/4 produced by the algorithm cannot be too dense.
Let f(n) denote the maximum number of edges that can possibly As discussed in Chapter 1, Interval Scheduling can be viewed as a special
be produced as the out-put of this algorithm, over all n-node input case of the Independent Set Problem on a graph that represents the overlaps
among a collection of intervals. Graphs arising this way are called interval
graphs with edge lengths. Prove that
graphs, and they have been extensively studied; see, for example, the book
by Golumbic (1980). Not just Independent Set but many hard computational
problems become much more tractable when restricted to the special case of interval graphs.
Interval Scheduling and the problem of scheduling to minimize the maximum lateness are two of a range of basic scheduling problems for which a simple greedy algorithm can be shown to produce an optimal solution. A wealth of related problems can be found in the survey by Lawler, Lenstra, Rinnooy Kan, and Shmoys (1993).
The optimal algorithm for caching and its analysis are due to Belady (1966). As we mentioned in the text, under real operating conditions caching algorithms must make eviction decisions in real time without knowledge of future requests. We will discuss such caching strategies in Chapter 13.
The algorithm for shortest paths in a graph with nonnegative edge lengths is due to Dijkstra (1959). Surveys of approaches to the Minimum Spanning Tree Problem, together with historical background, can be found in the reviews by Graham and Hell (1985) and Nesetril (1997).
The single-link algorithm is one of the most widely used approaches to the general problem of clustering; the books by Anderberg (1973), Duda, Hart, and Stork (2001), and Jain and Dubes (1981) survey a variety of clustering techniques.
The algorithm for optimal prefix codes is due to Huffman (1952); the earlier approaches mentioned in the text appear in the books by Fano (1949) and Shannon and Weaver (1949). General overviews of the area of data compression can be found in the book by Bell, Cleary, and Witten (1990) and the survey by Lelewer and Hirschberg (1987). More generally, this topic belongs to the area of information theory, which is concerned with the representation and encoding of digital information. One of the founding works in this field is the book by Shannon and Weaver (1949), and the more recent textbook by Cover and Thomas (1991) provides detailed coverage of the subject.
The algorithm for finding minimum-cost arborescences is generally credited to Chu and Liu (1965) and to Edmonds (1967) independently. As discussed in the chapter, this multi-phase approach stretches our notion of what constitutes a greedy algorithm. It is also important from the perspective of linear programming, since in that context it can be viewed as a fundamental application of the pricing method, or the primal-dual technique, for designing algorithms. The book by Nemhauser and Wolsey (1988) develops these connections to linear programming. We will discuss this method in Chapter 11 in the context of approximation algorithms.
More generally, as we discussed at the outset of the chapter, it is hard to find a precise definition of what constitutes a greedy algorithm. In the search for such a definition, it is not even clear that one can apply the analogue of U.S. Supreme Court Justice Potter Stewart's famous test for obscenity--"I know it when I see it"--since one finds disagreements within the research community on what constitutes the boundary, even intuitively, between greedy and nongreedy algorithms. There has been research aimed at formalizing classes of greedy algorithms: the theory of matroids is one very influential example (Edmonds 1971; Lawler 2001); and the paper of Borodin, Nielsen, and Rackoff (2002) formalizes notions of greedy and "greedy-type" algorithms, as well as providing a comparison to other formal work on this question.
Notes on the Exercises Exercise 24 is based on results of M. Edahiro, T. Chao, Y. Hsu, J. Ho, K. Boese, and A. Kahng; Exercise 31 is based on a result of Ingo Althofer, Gautam Das, David Dobkin, and Deborah Joseph.
Chapter 5
Divide and Conquer
we saw in Chapter 2 that the natural brute-force algorithm for finding the closest pair among n points in the plane would simply measure all Θ(n^2) distances, for a (polynomial) running time of O(n^2). Using divide and conquer, we will improve the running time to O(n log n). At a high level, then, the overall theme of this chapter is the same as what we've been seeing earlier: that improving on brute-force search is a fundamental conceptual hurdle in solving a problem efficiently, and the design of sophisticated algorithms can achieve this. The difference is simply that the distinction between brute-force search and an improved solution here will not always be the distinction between exponential and polynomial.

5.1 A First Recurrence: The Mergesort Algorithm
To motivate the general approach to analyzing divide-and-conquer algorithms, we begin with the Mergesort Algorithm. We discussed the Mergesort Algorithm briefly in Chapter 2, when we surveyed common running times for algorithms. Mergesort sorts a given list of numbers by first dividing them into two equal halves, sorting each half separately by recursion, and then combining the results of these recursive calls--in the form of the two sorted halves--using the linear-time algorithm for merging sorted lists that we saw in Chapter 2.
To analyze the running time of Mergesort, we will abstract its behavior into the following template, which describes many common divide-and-conquer algorithms.
(†) Divide the input into two pieces of equal size; solve the two subproblems on these pieces separately by recursion; and then combine the two results into an overall solution, spending only linear time for the initial division and final recombining.
In Mergesort, as in any algorithm that fits this style, we also need a base case for the recursion, typically having it "bottom out" on inputs of some constant size. In the case of Mergesort, we will assume that once the input has been reduced to size 2, we stop the recursion and sort the two elements by simply comparing them to each other.
Consider any algorithm that fits the pattern in (†), and let T(n) denote its worst-case running time on input instances of size n. Supposing that n is even, the algorithm spends O(n) time to divide the input into two pieces of size n/2 each; it then spends time T(n/2) to solve each one (since T(n/2) is the worst-case running time for an input of size n/2); and finally it spends O(n) time to combine the solutions from the two recursive calls. Thus the running time T(n) satisfies the following recurrence relation.
(5.1) For some constant c,
T(n) <= 2T(n/2) + cn
when n > 2, and
T(2) <= c.
The structure of (5.1) is typical of what recurrences will look like: there's an inequality or equation that bounds T(n) in terms of an expression involving T(k) for smaller values k; and there is a base case that generally says that T(n) is equal to a constant when n is a constant. Note that one can also write (5.1) more informally as T(n) <= 2T(n/2) + O(n), suppressing the constant c. However, it is generally useful to make c explicit when analyzing the recurrence.
To keep the exposition simpler, we will generally assume that parameters like n are even when needed. This is somewhat imprecise usage; without this assumption, the two recursive calls would be on problems of size ⌈n/2⌉ and ⌊n/2⌋, and the recurrence relation would say that
T(n) <= T(⌈n/2⌉) + T(⌊n/2⌋) + cn
for n > 2. Nevertheless, for all the recurrences we consider here (and for most that arise in practice), the asymptotic bounds are not affected by the decision to ignore all the floors and ceilings, and it makes the symbolic manipulation much cleaner.
Now (5.1) does not explicitly provide an asymptotic bound on the growth rate of the function T; rather, it specifies T(n) implicitly in terms of its values on smaller inputs. To obtain an explicit bound, we need to solve the recurrence relation so that T appears only on the left-hand side of the inequality, not the right-hand side as well.
Recurrence solving is a task that has been incorporated into a number of standard computer algebra systems, and the solution to many standard recurrences can now be found by automated means. It is still useful, however, to understand the process of solving recurrences and to recognize which recurrences lead to good running times, since the design of an efficient divide-and-conquer algorithm is heavily intertwined with an understanding of how a recurrence relation determines a running time.

Approaches to Solving Recurrences
There are two basic ways one can go about solving a recurrence, each of which we describe in more detail below.
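Before turning to those two approaches, it may help to see a concrete algorithm that gives rise to recurrence (5.1). The following Python sketch of Mergesort (an illustration, not code from the book) follows the template (†): two recursive calls on halves, plus a linear-time merge.

```python
def merge_sort(a):
    if len(a) <= 2:                       # base case: sort two (or fewer) elements directly
        return sorted(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Merge the two sorted halves in linear time.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 4, 1, 3]))   # [1, 2, 3, 4, 5]
```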
The most intuitively natural way to search for a solution to a recurrence is to "unroll" the recursion, accounting for the running time across the first few levels, and identify a pattern that can be continued as the recursion expands. One then sums the running times over all levels of the recursion (i.e., until it "bottoms out" on subproblems of constant size) and thereby arrives at a total running time.
A second way is to start with a guess for the solution, substitute it into the recurrence relation, and check that it works. Formally, one justifies this plugging-in using an argument by induction on n. There is a useful variant of this method in which one has a general form for the solution, but does not have exact values for all the parameters. By leaving these parameters unspecified in the substitution, one can often work them out as needed.
We now discuss each of these approaches, using the recurrence in (5.1) as an example.

Unrolling the Mergesort Recurrence
Let's start with the first approach to solving the recurrence in (5.1). The basic argument is depicted in Figure 5.1.
Analyzing the first few levels: At the first level of recursion, we have a single problem of size n, which takes time at most cn plus the time spent in all subsequent recursive calls. At the next level, we have two problems each of size n/2. Each of these takes time at most cn/2, for a total of at most cn, again plus the time in subsequent recursive calls. At the third level, we have four problems each of size n/4, each taking time at most cn/4, for a total of at most cn.
[Figure 5.1 Unrolling the recurrence T(n) <= 2T(n/2) + O(n). Level 0 contributes cn; level 1 contributes cn/2 + cn/2 = cn; level 2 contributes 4(cn/4) = cn.]
Identifying a pattern: What's going on in general? At level j of the recursion, the number of subproblems has doubled j times, so there are now a total of 2^j. Each has correspondingly shrunk in size by a factor of two j times, and so each has size n/2^j, and hence each takes time at most cn/2^j. Thus level j contributes a total of at most 2^j (cn/2^j) = cn to the total running time.
Summing over all levels of recursion: We've found that the recurrence in (5.1) has the property that the same upper bound of cn applies to the total amount of work performed at each level. The number of times the input must be halved in order to reduce its size from n to 2 is log_2 n. So summing the cn work over log n levels of recursion, we get a total running time of O(n log n).
We summarize this in the following claim.
(5.2) Any function T(.) satisfying (5.1) is bounded by O(n log n), when n > 1.

Substituting a Solution into the Mergesort Recurrence
The argument establishing (5.2) can be used to determine that the function T(n) is bounded by O(n log n). If, on the other hand, we have a guess for the running time that we want to verify, we can do so by plugging it into the recurrence as follows.
Suppose we believe that T(n) <= cn log_2 n for all n >= 2, and we want to check whether this is indeed true. This clearly holds for n = 2, since in this case cn log_2 n = 2c, and (5.1) explicitly tells us that T(2) <= c. Now suppose, by induction, that T(m) <= cm log_2 m for all values of m less than n, and we want to establish this for T(n). We do this by writing the recurrence for T(n) and plugging in the inequality T(n/2) <= c(n/2) log_2(n/2). We then simplify the resulting expression by noticing that log_2(n/2) = (log_2 n) - 1. Here is the full calculation.
T(n) <= 2T(n/2) + cn
     <= 2c(n/2) log_2(n/2) + cn
     = cn[(log_2 n) - 1] + cn
     = (cn log_2 n) - cn + cn
     = cn log_2 n.
This establishes the bound we want for T(n), assuming it holds for smaller values m < n, and thus it completes the induction argument.
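The induction above can also be sanity-checked numerically. The sketch below (an illustration, not from the book; it assumes c = 1 and n a power of 2) tabulates the largest value allowed by the recurrence and compares it with n log_2 n.

```python
import math

def worst_case_T(n):
    """Largest value allowed by T(n) <= 2*T(n/2) + c*n with c = 1 and T(2) <= c."""
    return 1 if n == 2 else 2 * worst_case_T(n // 2) + n

for k in range(2, 11):
    n = 2 ** k
    # The middle column stays below the right one, as (5.2) and the induction predict.
    print(n, worst_case_T(n), n * math.log2(n))
```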
[Figure 5.2 Unrolling the recurrence T(n) <= 3T(n/2) + O(n). Level 0 contributes cn; level 1 contributes cn/2 + cn/2 + cn/2 = (3/2)cn; level 2 contributes 9(cn/4) = (9/4)cn.]
Summing over all levels of recursion: As before, there are log_2 n levels of recursion, and the total amount of work performed is the sum over all these:
T(n) <= sum_{j=0}^{log_2 n - 1} (q/2)^j cn.
This is a geometric sum, consisting of powers of r = q/2. We can use the formula for a geometric sum when r > 1, which gives us the formula
T(n) <= cn (r^{log_2 n} - 1)/(r - 1) <= cn r^{log_2 n}/(r - 1).
Since we're aiming for an asymptotic upper bound, it is useful to figure out what's simply a constant; we can pull out the factor of r - 1 from the denominator, and write the last expression as
T(n) <= (c/(r - 1)) n r^{log_2 n}.
Finally, we need to figure out what r^{log_2 n} is. Here we use a very handy identity, which says that, for any a > 1 and b > 1, we have a^{log b} = b^{log a}. Thus
r^{log_2 n} = n^{log_2 r} = n^{log_2 (q/2)} = n^{(log_2 q) - 1}.
Thus we have
T(n) <= (c/(r - 1)) n * n^{(log_2 q) - 1} <= (c/(r - 1)) n^{log_2 q} = O(n^{log_2 q}).
We sum this up as follows.
(5.4) Any function T(.) satisfying (5.3) with q > 2 is bounded by O(n^{log_2 q}).
So we find that the running time is more than linear, since log_2 q > 1, but still polynomial in n. Plugging in specific values of q, the running time is O(n^{log_2 3}) = O(n^{1.59}) when q = 3; and the running time is O(n^{log_2 4}) = O(n^2) when q = 4. This increase in running time as q increases makes sense, of course, since the recursive calls generate more work for larger values of q.
Applying Partial Substitution The appearance of log_2 q in the exponent followed naturally from our solution to (5.3), but it's not necessarily an expression one would have guessed at the outset. We now consider how an approach based on partial substitution into the recurrence yields a different way of discovering this exponent.
Suppose we guess that the solution to (5.3), when q > 2, has the form T(n) <= k n^d for some constants k > 0 and d > 1. This is quite a general guess, since we haven't even tried specifying the exponent d of the polynomial. Now let's try starting the inductive argument and seeing what constraints we need on k and d. We have
T(n) <= q T(n/2) + cn,
and applying the inductive hypothesis to T(n/2), this expands to
T(n) <= q k (n/2)^d + cn = (q/2^d) k n^d + cn.
This is remarkably close to something that works: if we choose d so that q/2^d = 1, then we have T(n) <= k n^d + cn, which is almost right except for the extra term cn. So let's deal with these two issues: first, how to choose d so we get q/2^d = 1; and second, how to get rid of the cn term.
Choosing d is easy: we want 2^d = q, and so d = log_2 q. Thus we see that the exponent log_2 q appears very naturally once we decide to discover which value of d works when substituted into the recurrence.
But we still have to get rid of the cn term. To do this, we change the form of our guess for T(n) so as to explicitly subtract it off. Suppose we try the form T(n) <= k n^d - ℓn, where we've now decided that d = log_2 q but we haven't fixed the constants k or ℓ. Applying the new formula to T(n/2), this expands to
T(n) <= q [ k (n/2)^d - ℓ(n/2) ] + cn
     = (q/2^d) k n^d - (qℓ/2) n + cn
     = k n^d - (qℓ/2) n + cn
     = k n^d - (qℓ/2 - c) n.
This now works completely, if we simply choose ℓ so that (qℓ/2 - c) = ℓ: in other words, ℓ = 2c/(q - 2). This completes the inductive step for n. We also need to handle the base case n = 2, and this we do using the fact that the value of k has not yet been fixed: we choose k large enough so that the formula is a valid upper bound for the case n = 2.
[Figure 5.3 Unrolling the recurrence T(n) <= T(n/2) + O(n). Level 0 contributes cn; level 1 contributes cn/2; level 2 contributes cn/4.]
most of the work is spent in the recursion: when q = 1, the total running time is dominated by the top level, whereas when q > 2 it's dominated by the work done on constant-size subproblems at the bottom of the recursion. Viewed this way, we can appreciate that the recurrence for q = 2 really represents a "knife-edge"--the amount of work done at each level is exactly the same, which is what yields the O(n log n) running time.

A Related Recurrence: T(n) <= 2T(n/2) + O(n^2)
We conclude our discussion with one final recurrence relation; it is illustrative both as another application of a decaying geometric sum and as an interesting contrast with the recurrence (5.1) that characterized Mergesort. Moreover, we will see a close variant of it in Chapter 6, when we analyze a divide-and-conquer algorithm for solving the Sequence Alignment Problem using a small amount of working memory.
The recurrence is based on the following divide-and-conquer structure.
Divide the input into two pieces of equal size; solve the two subproblems on these pieces separately by recursion; and then combine the two results into an overall solution, spending quadratic time for the initial division and final recombining.
For our purposes here, we note that this style of algorithm has a running time T(n) that satisfies the following recurrence.
(5.6) For some constant c,
T(n) <= 2T(n/2) + cn^2
when n > 2, and
T(2) <= c.
One's first reaction is to guess that the solution will be T(n) = O(n^2 log n), since it looks almost identical to (5.1) except that the amount of work per level is larger by a factor equal to the input size. In fact, this upper bound is correct (it would need a more careful argument than what's in the previous sentence), but it will turn out that we can also show a stronger upper bound.
We'll do this by unrolling the recurrence, following the standard template for doing this.
Analyzing the first few levels: At the first level of recursion, we have a single problem of size n, which takes time at most cn^2 plus the time spent in all subsequent recursive calls. At the next level, we have two problems, each of size n/2. Each of these takes time at most c(n/2)^2 = cn^2/4, for a total of at most cn^2/2, again plus the time in subsequent recursive calls. At the third level, we have four problems each of size n/4, each taking time at most c(n/4)^2 = cn^2/16, for a total of at most cn^2/4. Already we see that something is different from our solution to the analogous recurrence (5.1); whereas the total amount of work per level remained the same in that case, here it's decreasing.
Identifying a pattern: At an arbitrary level j of the recursion, there are 2^j subproblems, each of size n/2^j, and hence the total work at this level is bounded by 2^j c(n/2^j)^2 = cn^2/2^j.
Summing over all levels of recursion: Having gotten this far in the calculation, we've arrived at almost exactly the same sum that we had for the case q = 1 in the previous recurrence. We have
T(n) <= sum_{j=0}^{log_2 n - 1} cn^2/2^j = cn^2 sum_{j=0}^{log_2 n - 1} (1/2)^j <= 2cn^2 = O(n^2),
where the second inequality follows from the fact that we have a convergent geometric sum.
In retrospect, our initial guess of T(n) = O(n^2 log n), based on the analogy to (5.1), was an overestimate because of how quickly n^2 decreases as we replace it with (n/2)^2, (n/4)^2, (n/8)^2, and so forth in the unrolling of the recurrence. This means that we get a geometric sum, rather than one that grows by a fixed amount over all n levels (as in the solution to (5.1)).

5.3 Counting Inversions
We've spent some time discussing approaches to solving a number of common recurrences. The remainder of the chapter will illustrate the application of divide-and-conquer to problems from a number of different domains; we will use what we've seen in the previous sections to bound the running times of these algorithms. We begin by showing how a variant of the Mergesort technique can be used to solve a problem that is not directly related to sorting numbers.

The Problem
We will consider a problem that arises in the analysis of rankings, which are becoming important to a number of current applications. For example, a number of sites on the Web make use of a technique known as collaborative filtering, in which they try to match your preferences (for books, movies, restaurants) with those of other people out on the Internet. Once the Web site has identified people with "similar" tastes to yours--based on a comparison
of how you and they rate various things--it can recommend new things that these other people have liked. Another application arises in meta-search tools on the Web, which execute the same query on many different search engines and then try to synthesize the results by looking for similarities and differences among the various rankings that the search engines return.
A core issue in applications like this is the problem of comparing two rankings. You rank a set of n movies, and then a collaborative filtering system consults its database to look for other people who had "similar" rankings. But what's a good way to measure, numerically, how similar two people's rankings are? Clearly an identical ranking is very similar, and a completely reversed ranking is very different; we want something that interpolates through the middle region.
Let's consider comparing your ranking and a stranger's ranking of the same set of n movies. A natural method would be to label the movies from 1 to n according to your ranking, then order these labels according to the stranger's ranking, and see how many pairs are "out of order." More concretely, we will consider the following problem. We are given a sequence of n numbers a1, ..., an; we will assume that all the numbers are distinct. We want to define a measure that tells us how far this list is from being in ascending order; the value of the measure should be 0 if a1 < a2 < ... < an, and should increase as the numbers become more scrambled.
A natural way to quantify this notion is by counting the number of inversions. We say that two indices i < j form an inversion if ai > aj, that is, if the two elements ai and aj are "out of order." We will seek to determine the number of inversions in the sequence a1, ..., an.
Just to pin down this definition, consider an example in which the sequence is 2, 4, 1, 3, 5. There are three inversions in this sequence: (2, 1), (4, 1), and (4, 3). There is also an appealing geometric way to visualize the inversions, pictured in Figure 5.4: we draw the sequence of input numbers in the order they're provided, and below that in ascending order. We then draw a line segment between each number in the top list and its copy in the lower list. Each crossing pair of line segments corresponds to one pair that is in the opposite order in the two lists--in other words, an inversion.
[Figure 5.4 Counting the number of inversions in the sequence 2, 4, 1, 3, 5. Each crossing pair of line segments corresponds to one pair that is in the opposite order in the input list and the ascending list--in other words, an inversion.]
Note how the number of inversions is a measure that smoothly interpolates between complete agreement (when the sequence is in ascending order, then there are no inversions) and complete disagreement (if the sequence is in descending order, then every pair forms an inversion, and so there are (n choose 2) of them).

Designing and Analyzing the Algorithm
What is the simplest algorithm to count inversions? Clearly, we could look at every pair of numbers (ai, aj) and determine whether they constitute an inversion; this would take O(n^2) time.
We now show how to count the number of inversions much more quickly, in O(n log n) time. Note that since there can be a quadratic number of inversions, such an algorithm must be able to compute the total number without ever looking at each inversion individually. The basic idea is to follow the strategy (†) defined in Section 5.1. We set m = ⌈n/2⌉ and divide the list into the two pieces a1, ..., am and am+1, ..., an. We first count the number of inversions in each of these two halves separately. Then we count the number of inversions (ai, aj), where the two numbers belong to different halves; the trick is that we must do this part in O(n) time, if we want to apply (5.2). Note that these first-half/second-half inversions have a particularly nice form: they are precisely the pairs (ai, aj), where ai is in the first half, aj is in the second half, and ai > aj.
To help with counting the number of inversions between the two halves, we will make the algorithm recursively sort the numbers in the two halves as well. Having the recursive step do a bit more work (sorting as well as counting inversions) will make the "combining" portion of the algorithm easier.
So the crucial routine in this process is Merge-and-Count. Suppose we have recursively sorted the first and second halves of the list and counted the inversions in each. We now have two sorted lists A and B, containing the first and second halves, respectively. We want to produce a single sorted list C from their union, while also counting the number of pairs (a, b) with a ∈ A, b ∈ B, and a > b. By our previous discussion, this is precisely what we will need for the "combining" step that computes the number of first-half/second-half inversions.
This is closely related to the simpler problem we discussed in Chapter 2, which formed the corresponding "combining" step for Mergesort: there we had two sorted lists A and B, and we wanted to merge them into a single sorted list in O(n) time. The difference here is that we want to do something extra: not only should we produce a single sorted list from A and B, but we should also count the number of "inverted pairs" (a, b) where a ∈ A, b ∈ B, and a > b.
It turns out that we will be able to do this in very much the same style that we used for merging. Our Merge-and-Count routine will walk through the sorted lists A and B, removing elements from the front and appending them to the sorted list C. In a given step, we have a Current pointer into each list, showing our current position. Suppose that these pointers are currently
at elements ai and bj. In one step, we compare the elements ai and bj being pointed to in each list, remove the smaller one from its list, and append it to the end of list C.
This takes care of merging. How do we also count the number of inversions? Because A and B are sorted, it is actually very easy to keep track of the number of inversions we encounter. Every time the element ai is appended to C, no new inversions are encountered, since ai is smaller than everything left in list B, and it comes before all of them. On the other hand, if bj is appended to list C, then it is smaller than all the remaining items in A, and it comes after all of them, so we increase our count of the number of inversions by the number of elements remaining in A. This is the crucial idea: in constant time, we have accounted for a potentially large number of inversions. See Figure 5.5 for an illustration of this process.
[Figure 5.5 Merging two sorted lists while also counting the number of inversions between them.]
To summarize, we have the following algorithm.

Merge-and-Count(A, B)
  Maintain a Current pointer into each list, initialized to point to the front elements
  Maintain a variable Count for the number of inversions, initialized to 0
  While both lists are nonempty:
    Let ai and bj be the elements pointed to by the Current pointer
    Append the smaller of these two to the output list
    If bj is the smaller element then
      Increment Count by the number of elements remaining in A
    Endif
    Advance the Current pointer in the list from which the smaller element was selected
  EndWhile
  Once one list is empty, append the remainder of the other list to the output
  Return Count and the merged list

The running time of Merge-and-Count can be bounded by the analogue of the argument we used for the original merging algorithm at the heart of Mergesort: each iteration of the While loop takes constant time, and in each iteration we add some element to the output that will never be seen again. Thus the number of iterations can be at most the sum of the initial lengths of A and B, and so the total running time is O(n).
We use this Merge-and-Count routine in a recursive procedure that simultaneously sorts and counts the number of inversions in a list L.

Sort-and-Count(L)
  If the list has one element then
    there are no inversions
  Else
    Divide the list into two halves:
      A contains the first ⌈n/2⌉ elements
      B contains the remaining ⌊n/2⌋ elements
    (rA, A) = Sort-and-Count(A)
    (rB, B) = Sort-and-Count(B)
    (r, L) = Merge-and-Count(A, B)
  Endif
  Return r = rA + rB + r, and the sorted list L

Since our Merge-and-Count procedure takes O(n) time, the running time T(n) of the full Sort-and-Count procedure satisfies the recurrence (5.1). By (5.2), we have
(5.7) The Sort-and-Count algorithm correctly sorts the input list and counts the number of inversions; it runs in O(n log n) time for a list with n elements.

5.4 Finding the Closest Pair of Points
We now describe another problem that can be solved by an algorithm in the style we've been discussing; but finding the right way to "merge" the solutions to the two subproblems it generates requires quite a bit of ingenuity.
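Before moving on to the closest-pair problem, here is a compact Python rendering of the Sort-and-Count and Merge-and-Count routines described above (a sketch of the pseudocode, not code from the book). The brute-force O(n^2) count mentioned at the start of the section would simply examine every pair; the recursive version reproduces the same answer in O(n log n) time.

```python
def sort_and_count(L):
    """Return (number of inversions in L, sorted copy of L)."""
    if len(L) <= 1:
        return 0, list(L)
    mid = len(L) // 2
    ra, A = sort_and_count(L[:mid])
    rb, B = sort_and_count(L[mid:])
    r, merged = merge_and_count(A, B)
    return ra + rb + r, merged

def merge_and_count(A, B):
    """Merge two sorted lists, counting pairs (a, b) with a in A, b in B, a > b."""
    count, merged, i, j = 0, [], 0, 0
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            merged.append(A[i]); i += 1
        else:
            # B[j] precedes all remaining elements of A: each such pair is an inversion.
            count += len(A) - i
            merged.append(B[j]); j += 1
    merged.extend(A[i:]); merged.extend(B[j:])
    return count, merged

print(sort_and_count([2, 4, 1, 3, 5]))   # (3, [1, 2, 3, 4, 5]): inversions (2,1), (4,1), (4,3)
```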
The Problem
The problem we consider is very simple to state: Given n points in the plane, find the pair that is closest together.
The problem was considered by M. I. Shamos and D. Hoey in the early 1970s, as part of their project to work out efficient algorithms for basic computational primitives in geometry. These algorithms formed the foundations of the then-fledgling field of computational geometry, and they have found their way into areas such as graphics, computer vision, geographic information systems, and molecular modeling. And although the closest-pair problem is one of the most natural algorithmic problems in geometry, it is surprisingly hard to find an efficient algorithm for it. It is immediately clear that there is an O(n^2) solution--compute the distance between each pair of points and take the minimum--and so Shamos and Hoey asked whether an algorithm asymptotically faster than quadratic could be found. It took quite a long time before they resolved this question, and the O(n log n) algorithm we give below is essentially the one they discovered. In fact, when we return to this problem in Chapter 13, we will see that it is possible to further improve the running time to O(n) using randomization.

Designing the Algorithm
We begin with a bit of notation. Let us denote the set of points by P = {p_1, ..., p_n}, where p_i has coordinates (x_i, y_i); and for two points p_i, p_j ∈ P, we use d(p_i, p_j) to denote the standard Euclidean distance between them. Our goal is to find a pair of points p_i, p_j that minimizes d(p_i, p_j).
We will assume that no two points in P have the same x-coordinate or the same y-coordinate. This makes the discussion cleaner; and it's easy to eliminate this assumption either by initially applying a rotation to the points that makes it true, or by slightly extending the algorithm we develop here.
It's instructive to consider the one-dimensional version of this problem for a minute, since it is much simpler and the contrasts are revealing. How would we find the closest pair of points on a line? We'd first sort them, in O(n log n) time, and then we'd walk through the sorted list, computing the distance from each point to the one that comes after it. It is easy to see that one of these distances must be the minimum one.
In two dimensions, we could try sorting the points by their y-coordinate (or x-coordinate) and hoping that the two closest points were near one another in the order of this sorted list. But it is easy to construct examples in which they are very far apart, preventing us from adapting our one-dimensional approach.
Instead, our plan will be to apply the style of divide and conquer used in Mergesort: we find the closest pair among the points in the "left half" of P and the closest pair among the points in the "right half" of P; and then we use this information to get the overall solution in linear time. If we develop an algorithm with this structure, then the solution of our basic recurrence from (5.1) will give us an O(n log n) running time.
It is the last, "combining" phase of the algorithm that's tricky: the distances that have not been considered by either of our recursive calls are precisely those that occur between a point in the left half and a point in the right half; there are Ω(n^2) such distances, yet we need to find the smallest one in O(n) time after the recursive calls return. If we can do this, our solution will be complete: it will be the smallest of the values computed in the recursive calls and this minimum "left-to-right" distance.

Setting Up the Recursion. Let's get a few easy things out of the way first. It will be very useful if every recursive call, on a set P' ⊆ P, begins with two lists: a list P'_x in which all the points in P' have been sorted by increasing x-coordinate, and a list P'_y in which all the points in P' have been sorted by increasing y-coordinate. We can ensure that this remains true throughout the algorithm as follows.
First, before any of the recursion begins, we sort all the points in P by x-coordinate and again by y-coordinate, producing lists P_x and P_y. Attached to each entry in each list is a record of the position of that point in both lists.

[Figure 5.6 The first level of recursion: The point set P is divided evenly into Q and R by the line L, and the closest pair is found on each side recursively.]

The first level of recursion will work as follows, with all further levels working in a completely analogous way. We define Q to be the set of points in the first ⌈n/2⌉ positions of the list P_x (the "left half") and R to be the set of points in the final ⌊n/2⌋ positions of the list P_x (the "right half"). See Figure 5.6. By a single pass through each of P_x and P_y, in O(n) time, we can create the
following four lists: Q_x, consisting of the points in Q sorted by increasing x-coordinate; Q_y, consisting of the points in Q sorted by increasing y-coordinate; and analogous lists R_x and R_y. For each entry of each of these lists, as before, we record the position of the point in both lists it belongs to.
We now recursively determine a closest pair of points in Q (with access to the lists Q_x and Q_y). Suppose that q*_0 and q*_1 are (correctly) returned as a closest pair of points in Q. Similarly, we determine a closest pair of points in R, obtaining r*_0 and r*_1.

Combining the Solutions. The general machinery of divide and conquer has gotten us this far, without our really having delved into the structure of the closest-pair problem. But it still leaves us with the problem that we saw looming originally: How do we use the solutions to the two subproblems as part of a linear-time "combining" operation?
Let δ be the minimum of d(q*_0, q*_1) and d(r*_0, r*_1). The real question is: Are there points q ∈ Q and r ∈ R for which d(q, r) < δ? If not, then we have already found the closest pair in one of our recursive calls. But if there are, then the closest such q and r form the closest pair in P.
Let x* denote the x-coordinate of the rightmost point in Q, and let L denote the vertical line described by the equation x = x*. This line L "separates" Q from R. Here is a simple fact.

(5.8) If there exists q ∈ Q and r ∈ R for which d(q, r) < δ, then each of q and r lies within a distance δ of L.

Proof. Suppose such q and r exist; we write q = (q_x, q_y) and r = (r_x, r_y). By the definition of x*, we know that q_x ≤ x* ≤ r_x. Then we have

x* − q_x ≤ r_x − q_x ≤ d(q, r) < δ

and

r_x − x* ≤ r_x − q_x ≤ d(q, r) < δ,

so each of q and r has an x-coordinate within δ of x* and hence lies within distance δ of the line L.

So if we want to find a close q and r, we can restrict our search to the narrow band consisting only of points in P within δ of L. Let S ⊆ P denote this set, and let S_y denote the list consisting of the points in S sorted by increasing y-coordinate. By a single pass through the list P_y, we can construct S_y in O(n) time.
We can restate (5.8) as follows, in terms of the set S.

(5.9) There exist q ∈ Q and r ∈ R for which d(q, r) < δ if and only if there exist s, s' ∈ S for which d(s, s') < δ.

It's worth noticing at this point that S might in fact be the whole set P, in which case (5.8) and (5.9) really seem to buy us nothing. But this is actually far from true, as the following amazing fact shows.

(5.10) If s, s' ∈ S have the property that d(s, s') < δ, then s and s' are within 15 positions of each other in the sorted list S_y.

Proof. Consider the subset Z of the plane consisting of all points within distance δ of L. We partition Z into boxes: squares with horizontal and vertical sides of length δ/2. One row of Z will consist of four boxes whose horizontal sides have the same y-coordinates. This collection of boxes is depicted in Figure 5.7.
Suppose two points of S lie in the same box. Since all points in this box lie on the same side of L, these two points either both belong to Q or both belong to R. But any two points in the same box are within distance δ · √2/2 < δ, which contradicts our definition of δ as the minimum distance between any pair of points in Q or in R. Thus each box contains at most one point of S.
Now suppose that s, s' ∈ S have the property that d(s, s') < δ, and that they are at least 16 positions apart in S_y. Assume without loss of generality that s has the smaller y-coordinate. Then, since there can be at most one point per box, there are at least three rows of Z lying between s and s'. But any two points in Z separated by at least three rows must be a distance of at least 3δ/2 apart--a contradiction.

[Figure 5.7 The portion of the plane close to the dividing line L, as analyzed in the proof of (5.10): the band around L is partitioned into boxes, and each box can contain at most one input point.]

We note that the value of 15 can be reduced; but for our purposes at the moment, the important thing is that it is an absolute constant.
In view of (5.10), we can conclude the algorithm as follows. We make one pass through S_y, and for each s ∈ S_y, we compute its distance to each of the next 15 points in S_y. Statement (5.10) implies that in doing so, we will have computed the distance of each pair of points in S (if any) that are at distance less than δ from each other. So having done this, we can compare the smallest such distance to δ, and we can report one of two things: (i) the closest pair of points in S, if their distance is less than δ; or (ii) the (correct) conclusion that no pairs of points in S are within δ of each other. In case (i), this pair is the closest pair in P; in case (ii), the closest pair found by our recursive calls is the closest pair in P.
Note the resemblance between this procedure and the algorithm we rejected at the very beginning, which tried to make one pass through P in order
of y-coordinate. The reason such an approach works now is due to the extra knowledge (the value of δ) we've gained from the recursive calls, and the special structure of the set S.
This concludes the description of the "combining" part of the algorithm, since by (5.9) we have now determined whether the minimum distance between a point in Q and a point in R is less than δ, and if so, we have found the closest such pair.
A complete description of the algorithm and its proof of correctness are implicitly contained in the discussion so far, but for the sake of concreteness, we now summarize both.

Summary of the Algorithm. A high-level description of the algorithm is the following, using the notation we have developed above.

Closest-Pair(P)
  Construct P_x and P_y  (O(n log n) time)
  (p*_0, p*_1) = Closest-Pair-Rec(P_x, P_y)

Closest-Pair-Rec(P_x, P_y)
  If |P| ≤ 3 then
    find closest pair by measuring all pairwise distances
  Endif

  Construct Q_x, Q_y, R_x, R_y  (O(n) time)
  (q*_0, q*_1) = Closest-Pair-Rec(Q_x, Q_y)
  (r*_0, r*_1) = Closest-Pair-Rec(R_x, R_y)

  δ = min(d(q*_0, q*_1), d(r*_0, r*_1))
  x* = maximum x-coordinate of a point in set Q
  L = {(x, y) : x = x*}
  S = points in P within distance δ of L

  Construct S_y  (O(n) time)
  For each point s ∈ S_y, compute distance from s
    to each of next 15 points in S_y
  Let s, s' be the pair achieving the minimum of these distances
    (O(n) time)

  If d(s, s') < δ then
    Return (s, s')
  Else if d(q*_0, q*_1) < d(r*_0, r*_1) then
    Return (q*_0, q*_1)
  Else
    Return (r*_0, r*_1)
  Endif

Analyzing the Algorithm
We first prove that the algorithm produces a correct answer, using the facts we've established in the process of designing it.

(5.11) The algorithm correctly outputs a closest pair of points in P.

Proof. As we've noted, all the components of the proof have already been worked out, so here we just summarize how they fit together.
We prove the correctness by induction on the size of P, the case of |P| ≤ 3 being clear. For a given P, the closest pair in the recursive calls is computed correctly by induction. By (5.10) and (5.9), the remainder of the algorithm correctly determines whether any pair of points in S is at distance less than δ, and if so returns the closest such pair. Now the closest pair in P either has both elements in one of Q or R, or it has one element in each. In the former case, the closest pair is correctly found by the recursive call; in the latter case, this pair is at distance less than δ, and it is correctly found by the remainder of the algorithm.

We now bound the running time as well, using (5.2).

(5.12) The running time of the algorithm is O(n log n).

Proof. The initial sorting of P by x- and y-coordinate takes time O(n log n). The running time of the remainder of the algorithm satisfies the recurrence (5.1), and hence is O(n log n) by (5.2).
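The following Python sketch mirrors the recursive structure of Closest-Pair-Rec above. The function names, the way the pre-sorted lists are threaded through the recursion, and the return value (distance together with the pair) are implementation choices of this sketch rather than details fixed by the text.

from math import dist, inf, ceil

def closest_pair(P):
    # P: list of (x, y) points with distinct x- and y-coordinates.
    Px = sorted(P)                               # sorted by x
    Py = sorted(P, key=lambda p: (p[1], p[0]))   # sorted by y
    return rec(Px, Py)

def brute_force(P):
    best = (inf, None)
    for i in range(len(P)):
        for j in range(i + 1, len(P)):
            d = dist(P[i], P[j])
            if d < best[0]:
                best = (d, (P[i], P[j]))
    return best

def rec(Px, Py):
    # Returns (distance, (p, q)) for a closest pair of the points in Px.
    if len(Px) <= 3:
        return brute_force(Px)
    mid = ceil(len(Px) / 2)
    Q, R = Px[:mid], Px[mid:]
    Qset = set(Q)
    Qy = [p for p in Py if p in Qset]            # y-order preserved within each half
    Ry = [p for p in Py if p not in Qset]
    dq, pair_q = rec(Q, Qy)
    dr, pair_r = rec(R, Ry)
    delta, best = min((dq, pair_q), (dr, pair_r))
    x_star = Q[-1][0]                            # rightmost x-coordinate in Q
    S = [p for p in Py if abs(p[0] - x_star) < delta]    # the band S, in y-order
    for i, s in enumerate(S):
        for t in S[i + 1 : i + 16]:              # next 15 points in S_y
            d = dist(s, t)
            if d < delta:
                delta, best = d, (s, t)
    return delta, best

# Example: closest_pair([(0, 0), (5, 4), (1, 1), (9, 2), (3, 7)])
# returns (1.414..., ((0, 0), (1, 1))).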
5.5 Integer Multiplication
We now discuss a different application of divide and conquer, in which the "default" quadratic algorithm is improved by means of a different recurrence. The analysis of the faster algorithm will exploit one of the recurrences considered in Section 5.2, in which more than two recursive calls are spawned at each level.

The Problem
The problem we consider is an extremely basic one: the multiplication of two integers. In a sense, this problem is so basic that one may not initially think of it
[Figure 5.8 The elementary-school algorithm for multiplying two integers, in (a) decimal and (b) binary representation; the binary example shown is 1100 × 1101 = 10011100.]

even as an algorithmic question. But, in fact, elementary schoolers are taught a concrete (and quite efficient) algorithm to multiply two n-digit numbers x and y. You first compute a "partial product" by multiplying each digit of y separately by x, and then you add up all the partial products. (Figure 5.8 should help you recall this algorithm. In elementary school we always see this done in base-10, but it works exactly the same way in base-2 as well.) Counting a single operation on a pair of bits as one primitive step in this computation, it takes O(n) time to compute each partial product, and O(n) time to combine it in with the running sum of all partial products so far. Since there are n partial products, this is a total running time of O(n^2).
If you haven't thought about this much since elementary school, there's something initially striking about the prospect of improving on this algorithm. Aren't all those partial products "necessary" in some way? But, in fact, it is possible to improve on O(n^2) time using a different, recursive way of performing the multiplication.

Designing the Algorithm
The improved algorithm is based on a more clever way to break up the product into partial sums. Let's assume we're in base-2 (it doesn't really matter), and start by writing x as x_1 · 2^{n/2} + x_0. In other words, x_1 corresponds to the "high-order" n/2 bits, and x_0 corresponds to the "low-order" n/2 bits. Similarly, we write y = y_1 · 2^{n/2} + y_0. Thus, we have

xy = (x_1 · 2^{n/2} + x_0)(y_1 · 2^{n/2} + y_0)
   = x_1y_1 · 2^n + (x_1y_0 + x_0y_1) · 2^{n/2} + x_0y_0.    (5.1)

Equation (5.1) reduces the problem of solving a single n-bit instance (multiplying the two n-bit numbers x and y) to the problem of solving four n/2-bit instances (computing the products x_1y_1, x_1y_0, x_0y_1, and x_0y_0). So we have a first candidate for a divide-and-conquer solution: recursively compute the results for these four n/2-bit instances, and then combine them using Equation (5.1). The combining of the solution requires a constant number of additions of O(n)-bit numbers, so it takes time O(n); thus, the running time T(n) is bounded by the recurrence

T(n) ≤ 4T(n/2) + cn

for a constant c. Is this good enough to give us a subquadratic running time? We can work out the answer by observing that this is just the case q = 4 of the class of recurrences in (5.3). As we saw earlier in the chapter, the solution to this is T(n) ≤ O(n^{log_2 q}) = O(n^2).
So, in fact, our divide-and-conquer algorithm with four-way branching was just a complicated way to get back to quadratic time! If we want to do better using a strategy that reduces the problem to instances on n/2 bits, we should try to get away with only three recursive calls. This will lead to the case q = 3 of (5.3), which we saw had the solution T(n) ≤ O(n^{log_2 q}) = O(n^{1.59}).
Recall that our goal is to compute the expression x_1y_1 · 2^n + (x_1y_0 + x_0y_1) · 2^{n/2} + x_0y_0 in Equation (5.1). It turns out there is a simple trick that lets us determine all of the terms in this expression using just three recursive calls. The trick is to consider the result of the single multiplication (x_1 + x_0)(y_1 + y_0) = x_1y_1 + x_1y_0 + x_0y_1 + x_0y_0. This has the four products above added together, at the cost of a single recursive multiplication. If we now also determine x_1y_1 and x_0y_0 by recursion, then we get the outermost terms explicitly, and we get the middle term by subtracting x_1y_1 and x_0y_0 away from (x_1 + x_0)(y_1 + y_0).
Thus, in full, our algorithm is

Recursive-Multiply(x, y):
  Write x = x_1 · 2^{n/2} + x_0
        y = y_1 · 2^{n/2} + y_0
  Compute x_1 + x_0 and y_1 + y_0
  p = Recursive-Multiply(x_1 + x_0, y_1 + y_0)
  x_1y_1 = Recursive-Multiply(x_1, y_1)
  x_0y_0 = Recursive-Multiply(x_0, y_0)
  Return x_1y_1 · 2^n + (p − x_1y_1 − x_0y_0) · 2^{n/2} + x_0y_0
Analyzing the Algorithm
We can determine the running time of this algorithm as follows. Given two n-bit numbers, it performs a constant number of additions on O(n)-bit numbers, in addition to the three recursive calls. Ignoring for now the issue that x_1 + x_0 and y_1 + y_0 may have n/2 + 1 bits (rather than just n/2), which turns out not to affect the asymptotic results, each of these recursive calls is on an instance of size n/2. Thus, in place of our four-way branching recursion, we now have
a three-way branching one, with a running time that satisfies

T(n) ≤ 3T(n/2) + cn.

5.6 Convolutions and the Fast Fourier Transform
Given two vectors a = (a_0, a_1, ..., a_{n−1}) and b = (b_0, b_1, ..., b_{n−1}), their convolution a ∗ b is the vector with 2n − 1 coordinates, where coordinate k is Σ_{(i,j): i+j=k} a_ib_j; in other words,

a ∗ b = (a_0b_0, a_0b_1 + a_1b_0, ..., a_{n−2}b_{n−1} + a_{n−1}b_{n−2}, a_{n−1}b_{n−1}).

This definition is a bit hard to absorb when you first see it. Another way to think about the convolution is to picture an n × n table whose (i, j) entry is a_ib_j, like this,

a_0b_0   a_0b_1   ...   a_0b_{n−2}   a_0b_{n−1}
a_1b_0   a_1b_1   ...   a_1b_{n−2}   a_1b_{n−1}
a_2b_0   a_2b_1   ...   a_2b_{n−2}   a_2b_{n−1}
  ...

One place convolutions arise directly is in multiplying polynomials: if C(x) = A(x)B(x) is the product of two polynomials, then the coefficient on x^k in C(x) is exactly Σ_{(i,j): i+j=k} a_ib_j. In other words, the coefficient vector c of C(x) is the convolution of the coefficient vectors of A(x) and B(x).

Arguably the most important application of convolutions in practice is for signal processing. This is a topic that could fill an entire course, so we'll just give a simple example here to suggest one way in which the convolution arises. Suppose we have a vector a = (a_0, a_1, ..., a_{m−1}) which represents a sequence of measurements, such as a temperature or a stock price, sampled at m consecutive points in time. Sequences like this are often very noisy due to measurement error or random fluctuations, and so a common operation is to "smooth" the measurements by averaging each value a_i with a weighted sum of its neighbors within k steps to the left and right in the sequence, the weights decaying quickly as one moves away from a_i. For example, in Gaussian smoothing, one replaces a_i with

a'_i = (1/Z) Σ_{j=i−k}^{i+k} a_j e^{−(j−i)^2}

for some "width" parameter k, and with Z chosen simply to normalize the weights in the average to add up to 1. (There are some issues with boundary conditions--what do we do when i − k < 0 or i + k > m?--but we could deal with these, for example, by discarding the first and last k entries from the smoothed signal, or by scaling them differently to make up for the missing terms.)
To see the connection with the convolution operation, we picture this smoothing operation as follows. We first define a "mask"

w = (w_{−k}, w_{−(k−1)}, ..., w_{−1}, w_0, w_1, ..., w_{k−1}, w_k)

consisting of the weights we want to use for averaging each point with its neighbors. (For example, w = (e^{−k^2}, e^{−(k−1)^2}, ..., e^{−1}, 1, e^{−1}, ..., e^{−(k−1)^2}, e^{−k^2}) in the Gaussian case above.) We then iteratively position this mask so it is centered at each possible point in the sequence a; and for each positioning, we compute the weighted average. In other words, we replace a_i with a'_i = Σ_{s=−k}^{k} w_s a_{i+s}.
This last expression is essentially a convolution; we just have to warp the notation a bit so that this becomes clear. Let's define b = (b_0, b_1, ..., b_{2k}) by setting b_ℓ = w_{k−ℓ}. Then it's not hard to check that with this definition we have the smoothed value

a'_i = Σ_{(j,ℓ): j+ℓ = i+k} a_j b_ℓ.

In other words, the smoothed sequence is just the convolution of the original signal and the reverse of the mask (with some meaningless coordinates at the beginning and end).

We mention one final application: the problem of combining histograms. Suppose we're studying a population of people, and we have the following two histograms: One shows the annual income of all the men in the population, and one shows the annual income of all the women. We'd now like to produce a new histogram, showing for each k the number of pairs (M, W) for which man M and woman W have a combined income of k.
This is precisely a convolution. We can write the first histogram as a vector a = (a_0, ..., a_{m−1}), to indicate that there are a_i men with annual income equal to i. We can similarly write the second histogram as a vector b = (b_0, ..., b_{n−1}). Now, let c_k denote the number of pairs (m, w) with combined income k; this is the number of ways of choosing a man with income i and a woman with income j, for any pair (i, j) where i + j = k. In other words,

c_k = Σ_{(i,j): i+j=k} a_ib_j,

so the combined histogram c = (c_0, ..., c_{m+n−2}) is simply the convolution of a and b.
(Using terminology from probability that we will develop in Chapter 13, one can view this example as showing how convolution is the underlying means for computing the distribution of the sum of two independent random variables.)

Computing the Convolution. Having now motivated the notion of convolution, let's discuss the problem of computing it efficiently. For simplicity, we will consider the case of equal length vectors (i.e., m = n), although everything we say carries over directly to the case of vectors of unequal lengths.
Computing the convolution is a more subtle question than it may first appear. The definition of convolution, after all, gives us a perfectly valid way to compute it: for each k, we just calculate the sum

Σ_{(i,j): i+j=k} a_ib_j

and use this as the value of the kth coordinate. The trouble is that this direct way of computing the convolution involves calculating the product a_ib_j for every pair (i, j) (in the process of distributing over the sums in the different terms) and this is Θ(n^2) arithmetic operations. Spending O(n^2) time on computing the convolution seems natural, as the definition involves O(n^2) multiplications a_ib_j. However, it's not inherently clear that we have to spend quadratic time to compute a convolution, since the input and output both only have size O(n).
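To make the quadratic baseline concrete, here is a direct Python transcription of the definition (the function name is our own):

def convolve_naive(a, b):
    # Direct computation of c_k = sum of a[i]*b[j] over all i + j = k.
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

# Example: convolve_naive([1, 2, 3], [4, 5, 6]) == [4, 13, 28, 27, 18].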
Could one design an algorithm that bypasses the quadratic-size definition of convolution and computes it in some smarter way?
In fact, quite surprisingly, this is possible. We now describe a method that computes the convolution of two vectors using only O(n log n) arithmetic operations. The crux of this method is a powerful technique known as the Fast Fourier Transform (FFT). The FFT has a wide range of further applications in analyzing sequences of numerical values; computing convolutions quickly, which we focus on here, is just one of these applications.

Designing and Analyzing the Algorithm
To break through the quadratic time barrier for convolutions, we are going to exploit the connection between the convolution and the multiplication of two polynomials, as illustrated in the first example discussed previously. But rather than use convolution as a primitive in polynomial multiplication, we are going to exploit this connection in the opposite direction.
Suppose we are given the vectors a = (a_0, a_1, ..., a_{n−1}) and b = (b_0, b_1, ..., b_{n−1}). We will view them as the polynomials A(x) = a_0 + a_1x + a_2x^2 + ... + a_{n−1}x^{n−1} and B(x) = b_0 + b_1x + b_2x^2 + ... + b_{n−1}x^{n−1}, and we'll seek to compute their product C(x) = A(x)B(x) in O(n log n) time. If c = (c_0, c_1, ..., c_{2n−2}) is the vector of coefficients of C, then we recall from our earlier discussion that c is exactly the convolution a ∗ b, and so we can then read off the desired answer directly from the coefficients of C(x).
Now, rather than multiplying A and B symbolically, we can treat them as functions of the variable x and multiply them as follows.
(i) First we choose 2n values x_1, x_2, ..., x_{2n} and evaluate A(x_j) and B(x_j) for each of j = 1, 2, ..., 2n.
(ii) We can now compute C(x_j) for each j very easily: C(x_j) is simply the product of the two numbers A(x_j) and B(x_j).
(iii) Finally, we have to recover C from its values on x_1, x_2, ..., x_{2n}. Here we take advantage of a fundamental fact about polynomials: any polynomial of degree d can be reconstructed from its values on any set of d + 1 or more points. This is known as polynomial interpolation, and we'll discuss the mechanics of performing interpolation in more detail later. For the moment, we simply observe that since A and B each have degree at most n − 1, their product C has degree at most 2n − 2, and so it can be reconstructed from the values C(x_1), C(x_2), ..., C(x_{2n}) that we computed in step (ii).
This approach to multiplying polynomials has some promising aspects and some problematic ones. First, the good news: step (ii) requires only O(n) arithmetic operations, since it simply involves the multiplication of O(n) numbers. But the situation doesn't look as hopeful with steps (i) and (iii). In particular, evaluating the polynomials A and B on a single value takes Ω(n) operations, and our plan calls for performing 2n such evaluations. This seems to bring us back to quadratic time right away.
The key idea that will make this all work is to find a set of 2n values x_1, x_2, ..., x_{2n} that are intimately related in some way, such that the work in evaluating A and B on all of them can be shared across different evaluations. A set for which this will turn out to work very well is the complex roots of unity.

The Complex Roots of Unity. At this point, we're going to need to recall a few facts about complex numbers and their role as solutions to polynomial equations.
Recall that complex numbers can be viewed as lying in the "complex plane," with axes representing their real and imaginary parts. We can write a complex number using polar coordinates with respect to this plane as re^{θi}, where e^{πi} = −1 (and e^{2πi} = 1). Now, for a positive integer k, the polynomial equation x^k = 1 has k distinct complex roots, and it is easy to identify them. Each of the complex numbers ω_{j,k} = e^{2πji/k} (for j = 0, 1, 2, ..., k − 1) satisfies the equation, since (e^{2πji/k})^k = (e^{2πi})^j = 1, and each of these numbers is distinct, so these are all the roots. We refer to these numbers as the kth roots of unity. We can picture these roots as a set of k equally spaced points lying on the unit circle in the complex plane, as shown in Figure 5.9 for the case k = 8.

[Figure 5.9 The 8th roots of unity in the complex plane.]

For our numbers x_1, ..., x_{2n} on which to evaluate A and B, we will choose the (2n)th roots of unity. It's worth mentioning (although it's not necessary for understanding the algorithm) that the use of the complex roots of unity is the basis for the name Fast Fourier Transform: the representation of a degree-d
polynomial P by its values on the (d + 1)st roots of unity is sometimes referred to as the discrete Fourier transform of P; and the heart of our procedure is a method for making this computation fast.

A Recursive Procedure for Polynomial Evaluation. We want to design an algorithm for evaluating A on each of the (2n)th roots of unity recursively, so as to take advantage of the familiar recurrence from (5.1)--namely, T(n) ≤ 2T(n/2) + O(n), where T(n) in this case denotes the number of operations required to evaluate a polynomial of degree n − 1 on all the (2n)th roots of unity. For simplicity in describing this algorithm, we will assume that n is a power of 2.
How does one break the evaluation of a polynomial into two equal-sized subproblems? A useful trick is to define two polynomials, A_even(x) and A_odd(x), that consist of the even and odd coefficients of A, respectively. That is,

A_even(x) = a_0 + a_2x + a_4x^2 + ... + a_{n−2}x^{(n−2)/2},

and

A_odd(x) = a_1 + a_3x + a_5x^2 + ... + a_{n−1}x^{(n−2)/2}.

Simple algebra shows us that

A(x) = A_even(x^2) + xA_odd(x^2),

and so this gives us a way to compute A(x) in a constant number of operations, given the evaluation of the two constituent polynomials that each have half the degree of A.
Now suppose that we evaluate each of A_even and A_odd on the nth roots of unity. This is exactly a version of the problem we face with A and the (2n)th roots of unity, except that the input is half as large: the degree is (n − 2)/2 rather than n − 1, and we have n roots of unity rather than 2n. Thus we can perform these evaluations in time T(n/2) for each of A_even and A_odd, for a total time of 2T(n/2).
We're now very close to having a recursive algorithm that obeys (5.1) and gives us the running time we want; we just have to produce the evaluations of A on the (2n)th roots of unity using O(n) additional operations. But this is easy, given the results from the recursive calls on A_even and A_odd. Consider one of these roots of unity ω_{j,2n} = e^{2πji/2n}. The quantity ω^2_{j,2n} is equal to (e^{2πji/2n})^2 = e^{2πji/n}, and hence ω^2_{j,2n} is an nth root of unity. So when we go to compute

A(ω_{j,2n}) = A_even(ω^2_{j,2n}) + ω_{j,2n} A_odd(ω^2_{j,2n}),

we discover that both of the evaluations on the right-hand side have been performed in the recursive step, and so we can determine A(ω_{j,2n}) using a constant number of operations. Doing this for all 2n roots of unity is therefore O(n) additional operations after the two recursive calls, and so the bound T(n) on the number of operations indeed satisfies T(n) ≤ 2T(n/2) + O(n). We run the same procedure to evaluate the polynomial B on the (2n)th roots of unity as well, and this gives us the desired O(n log n) bound for step (i) of our algorithm outline.

Polynomial Interpolation. We've now seen how to evaluate A and B on the set of all (2n)th roots of unity using O(n log n) operations and, as noted above, we can clearly compute the products C(ω_{j,2n}) = A(ω_{j,2n})B(ω_{j,2n}) in O(n) more operations. Thus, to conclude the algorithm for multiplying A and B, we need to execute step (iii) in our earlier outline using O(n log n) operations, reconstructing C from its values on the (2n)th roots of unity.
In describing this part of the algorithm, it's worth keeping track of the following top-level point: it turns out that the reconstruction of C can be achieved simply by defining an appropriate polynomial (the polynomial D below) and evaluating it at the (2n)th roots of unity. This is exactly what we've just seen how to do using O(n log n) operations, so we do it again here, spending an additional O(n log n) operations and concluding the algorithm.
Consider a polynomial C(x) = Σ_{s=0}^{2n−1} c_s x^s that we want to reconstruct from its values C(ω_{s,2n}) at the (2n)th roots of unity. Define a new polynomial D(x) = Σ_{s=0}^{2n−1} d_s x^s, where d_s = C(ω_{s,2n}). We now consider the values of D(x) at the (2n)th roots of unity:

D(ω_{j,2n}) = Σ_{s=0}^{2n−1} C(ω_{s,2n}) ω^s_{j,2n}
            = Σ_{s=0}^{2n−1} (Σ_{t=0}^{2n−1} c_t ω^t_{s,2n}) ω^s_{j,2n}

by definition. Now recall that ω_{s,2n} = (e^{2πi/2n})^s. Using this fact and extending the notation to ω_{s,2n} = (e^{2πi/2n})^s even when s ≥ 2n, we get that

D(ω_{j,2n}) = Σ_{t=0}^{2n−1} c_t (Σ_{s=0}^{2n−1} e^{(2πi)(st+js)/2n})
            = Σ_{t=0}^{2n−1} c_t (Σ_{s=0}^{2n−1} ω^s_{t+j,2n}).
To analyze the last line, we use the fact that for any (2n)th root of unity ω ≠ 1, we have Σ_{s=0}^{2n−1} ω^s = 0. This is simply because ω is by definition a root of x^{2n} − 1 = 0; since x^{2n} − 1 = (x − 1)(Σ_{t=0}^{2n−1} x^t) and ω ≠ 1, it follows that ω is also a root of Σ_{t=0}^{2n−1} x^t.
Thus the only term of the last line's outer sum that is not equal to 0 is for c_t such that ω_{t+j,2n} = 1; and this happens if t + j is a multiple of 2n, that is, if t = 2n − j. For this value, Σ_{s=0}^{2n−1} ω^s_{t+j,2n} = Σ_{s=0}^{2n−1} 1 = 2n. So we get that D(ω_{j,2n}) = 2n · c_{2n−j}. Evaluating the polynomial D(x) at the (2n)th roots of unity thus gives us the coefficients of the polynomial C(x) in reverse order (multiplied by 2n each). We sum this up as follows.

(5.14) For any polynomial C(x) = Σ_{s=0}^{2n−1} c_s x^s, and corresponding polynomial D(x) = Σ_{s=0}^{2n−1} C(ω_{s,2n}) x^s, we have that c_s = (1/2n) D(ω_{2n−s,2n}).

We can do all the evaluations of the values D(ω_{2n−s,2n}) in O(n log n) operations using the divide-and-conquer approach developed for step (i).
And this wraps everything up: we reconstruct the polynomial C from its values on the (2n)th roots of unity, and then the coefficients of C are the coordinates in the convolution vector c = a ∗ b that we were originally seeking. In summary, we have shown the following.

(5.15) Using the Fast Fourier Transform to determine the product polynomial C(x), we can compute the convolution of the original vectors a and b in O(n log n) time.
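The Python sketch below mirrors this outline: it evaluates A and B at the roots of unity with the even/odd splitting described above, multiplies the values pointwise, and interpolates via the polynomial D as in (5.14). The helper names, and the padding of the coefficient vectors to a power-of-two length, are choices made for this sketch.

from cmath import exp, pi

def fft(coeffs):
    # Evaluate the polynomial with the given coefficients at the kth roots of
    # unity, k = len(coeffs) (assumed a power of 2); entry j of the result is
    # the value at w_{j,k} = e^{2*pi*i*j/k}.
    k = len(coeffs)
    if k == 1:
        return list(coeffs)
    even = fft(coeffs[0::2])     # A_even at the (k/2)th roots of unity
    odd = fft(coeffs[1::2])      # A_odd at the (k/2)th roots of unity
    vals = [0] * k
    for j in range(k // 2):
        w = exp(2 * pi * 1j * j / k)
        vals[j] = even[j] + w * odd[j]            # A(w_{j,k})
        vals[j + k // 2] = even[j] - w * odd[j]   # since w_{j+k/2,k} = -w_{j,k}
    return vals

def convolution(a, b):
    # Convolution of a and b via the FFT, following steps (i)-(iii).
    m = 1
    while m < 2 * max(len(a), len(b)):
        m *= 2                                    # pad to a power-of-two length
    A = fft(list(a) + [0] * (m - len(a)))         # step (i)
    B = fft(list(b) + [0] * (m - len(b)))
    C = [A[j] * B[j] for j in range(m)]           # step (ii): values of C at the roots
    D = fft(C)                                    # step (iii): evaluate D at the roots
    c = [D[-s % m].real / m for s in range(m)]    # c_s = (1/m) * D(w_{m-s,m}), as in (5.14)
    return [round(x) for x in c[: len(a) + len(b) - 1]]   # round off float error (integer inputs)

# Example: convolution([1, 2, 3], [4, 5, 6]) == [4, 13, 28, 27, 18],
# matching the direct computation given earlier.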
Solved Exercises

Solved Exercise 1
Suppose you are given an array A with n entries, with each entry holding a distinct number, and you are told that the sequence of values A[1], A[2], ..., A[n] is unimodal: For some index p between 1 and n, the values in the array entries increase up to position p in A and then decrease the remainder of the way until position n. (So if you were to draw a plot with the array position j on the x-axis and the value of the entry A[j] on the y-axis, the plotted points would rise until x-value p, where they'd achieve their maximum, and then fall from there on.)
You'd like to find the "peak entry" p without having to read the entire array--in fact, by reading as few entries of A as possible. Show how to find the entry p by reading at most O(log n) entries of A.

Solution. Let's start with a general discussion on how to achieve a running time of O(log n) and then come back to the specific problem here. If one needs to compute something using only O(log n) operations, a useful strategy that we discussed in Chapter 2 is to perform a constant amount of work, throw away half the input, and continue recursively on what's left. This was the idea, for example, behind the O(log n) running time for binary search.
We can view this as a divide-and-conquer approach: for some constant c > 0, we perform at most c operations and then continue recursively on an input of size at most n/2. As in the chapter, we will assume that the recursion "bottoms out" when n = 2, performing at most c operations to finish the computation. If T(n) denotes the running time on an input of size n, then we have the recurrence

(5.16)  T(n) ≤ T(n/2) + c  when n > 2, and
        T(2) ≤ c.

It is not hard to solve this recurrence by unrolling it, as follows.
Analyzing the first few levels: At the first level of recursion, we have a single problem of size n, which takes time at most c plus the time spent in all subsequent recursive calls. The next level has one problem of size at most n/2, which contributes another c, and the level after that has one problem of size at most n/4, which contributes yet another c.
Identifying a pattern: No matter how many levels we continue, each level will have just one problem: level j has a single problem of size at most n/2^j, which contributes c to the running time, independent of j.
Summing over all levels of recursion: Each level of the recursion is contributing at most c operations, and it takes log_2 n levels of recursion to reduce n to 2. Thus the total running time is at most c times the number of levels of recursion, which is at most c log_2 n = O(log n).
We can also do this by partial substitution. Suppose we guess that T(n) ≤ k log_b n, where we don't know k or b. Assuming that this holds for smaller values of n in an inductive argument, we would have

T(n) ≤ T(n/2) + c ≤ k log_b(n/2) + c = k log_b n − k log_b 2 + c.
The first term on the right is exactly what we want, so we just need to choose k and b to negate the added c at the end. This we can do by setting b = 2 and k = c, so that k log_b 2 = c log_2 2 = c. Hence we end up with the solution T(n) ≤ c log_2 n, which is exactly what we got by unrolling the recurrence.
Finally, we should mention that one can get an O(log n) running time, by essentially the same reasoning, in the more general case when each level of the recursion throws away any constant fraction of the input, transforming an instance of size n to one of size at most an, for some constant a < 1. It now takes at most log_{1/a} n levels of recursion to reduce n down to a constant size, and each level of recursion involves at most c operations.
Now let's get back to the problem at hand. If we wanted to set ourselves up to use (5.16), we could probe the midpoint of the array and try to determine whether the "peak entry" p lies before or after this midpoint.
So suppose we look at the value A[n/2]. From this value alone, we can't tell whether p lies before or after n/2, since we need to know whether entry n/2 is sitting on an "up-slope" or on a "down-slope." So we also look at the values A[n/2 − 1] and A[n/2 + 1]. There are now three possibilities.
o If A[n/2 − 1] < A[n/2] < A[n/2 + 1], then entry n/2 must come strictly before p, and so we can continue recursively on entries n/2 + 1 through n.
o If A[n/2 − 1] > A[n/2] > A[n/2 + 1], then entry n/2 must come strictly after p, and so we can continue recursively on entries 1 through n/2 − 1.
o Finally, if A[n/2] is larger than both A[n/2 − 1] and A[n/2 + 1], we are done: the peak entry is in fact equal to n/2 in this case.
In all these cases, we perform at most three probes of the array A and reduce the problem to one of at most half the size. Thus we can apply (5.16) to conclude that the running time is O(log n).
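A direct Python rendering of this probing strategy. It uses 0-based indexing, and it compares only A[mid] with A[mid + 1] per step (a slight variant of the three-probe description above, which suffices because the entries are distinct):

def find_peak(A):
    # A is unimodal: strictly increasing, then strictly decreasing.
    # Returns the index of the maximum entry using O(log n) probes.
    lo, hi = 0, len(A) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] < A[mid + 1]:
            lo = mid + 1      # mid is on the "up-slope"; the peak lies to the right
        else:
            hi = mid          # mid is on the "down-slope" (or is the peak itself)
    return lo

# Example: find_peak([1, 4, 9, 7, 3, 2]) == 2, since A[2] == 9 is the peak.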
Solved Exercise 2
You're consulting for a small computation-intensive investment company, and they have the following type of problem that they want to solve over and over. A typical instance of the problem is the following. They're doing a simulation in which they look at n consecutive days of a given stock, at some point in the past. Let's number the days i = 1, 2, ..., n; for each day i, they have a price p(i) per share for the stock on that day. (We'll assume for simplicity that the price was fixed during each day.) Suppose during this time period, they wanted to buy 1,000 shares on some day and sell all these shares on some (later) day. They want to know: When should they have bought and when should they have sold in order to have made as much money as possible? (If there was no way to make money during the n days, you should report this instead.)
For example, suppose n = 3, p(1) = 9, p(2) = 1, p(3) = 5. Then you should return "buy on 2, sell on 3" (buying on day 2 and selling on day 3 means they would have made $4 per share, the maximum possible for that period).
Clearly, there's a simple algorithm that takes time O(n^2): try all possible pairs of buy/sell days and see which makes them the most money. Your investment friends were hoping for something a little better.
Show how to find the correct numbers i and j in time O(n log n).

Solution. We've seen a number of instances in this chapter where a brute-force search over pairs of elements can be reduced to O(n log n) by divide and conquer. Since we're faced with a similar issue here, let's think about how we might apply a divide-and-conquer strategy.
A natural approach would be to consider the first n/2 days and the final n/2 days separately, solving the problem recursively on each of these two sets, and then figure out how to get an overall solution from this in O(n) time. This would give us the usual recurrence T(n) ≤ 2T(n/2) + O(n), and hence O(n log n) by (5.1).
Also, to make things easier, we'll make the usual assumption that n is a power of 2. This is no loss of generality: if n' is the next power of 2 greater than n, we can set p(i) = p(n) for all i between n and n'. In this way, we do not change the answer, and we at most double the size of the input (which will not affect the O( ) notation).
Now, let S be the set of days 1, ..., n/2, and S' be the set of days n/2 + 1, ..., n. Our divide-and-conquer algorithm will be based on the following observation: either there is an optimal solution in which the investors are holding the stock at the end of day n/2, or there isn't. Now, if there isn't, then the optimal solution is the better of the optimal solutions on the sets S and S'. If there is an optimal solution in which they hold the stock at the end of day n/2, then the value of this solution is p(j) − p(i) where i ∈ S and j ∈ S'. But this value is maximized by simply choosing i ∈ S which minimizes p(i), and choosing j ∈ S' which maximizes p(j).
Thus our algorithm is to take the best of the following three possible solutions.
o The optimal solution on S.
o The optimal solution on S'.
o The maximum of p(j) − p(i), over i ∈ S and j ∈ S'.
The first two alternatives are computed in time T(n/2), each by recursion, and the third alternative is computed by finding the minimum in S and the
maximum in S', which takes time O(n). Thus the running time T(n) satisfies

T(n) ≤ 2T(n/2) + O(n),

as desired.
We note that this is not the best running time achievable for this problem. In fact, one can find the optimal pair of days in O(n) time using dynamic programming, the topic of the next chapter; at the end of that chapter, we will pose this question as Exercise 7.
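A Python sketch of this three-way combination, returning the best profit together with the buy/sell days (1-based day numbers, to match the statement). Reporting the "no way to make money" case through a nonpositive profit is a choice of this sketch.

def best_trade(p):
    # p[i] is the price on day i + 1; returns (profit, buy_day, sell_day),
    # with profit <= 0 meaning there was no way to make money.
    def rec(lo, hi):
        # Best trade using only days lo..hi (inclusive).
        if hi - lo < 1:
            return (float('-inf'), lo, hi)        # no valid buy/sell pair
        mid = (lo + hi) // 2
        left = rec(lo, mid)
        right = rec(mid + 1, hi)
        # Best "straddling" trade: buy at the minimum in S, sell at the maximum in S'.
        buy = min(range(lo, mid + 1), key=lambda i: p[i - 1])
        sell = max(range(mid + 1, hi + 1), key=lambda i: p[i - 1])
        straddle = (p[sell - 1] - p[buy - 1], buy, sell)
        return max(left, right, straddle)
    return rec(1, len(p))

# Example: best_trade([9, 1, 5]) == (4, 2, 3), i.e., buy on day 2, sell on day 3.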
to decide the answer to their question with only O(n log n) invocations of
the equivalence tester.
Exercises
You’ve been working with some physicists who need to study, as part of
1. You are interested in analyzing some hard-to-obtain data from two sepa- their experimental design, the interactions among large numbers of very
rate databases. Each database contains n numerical values--so there are small charged particles. Basically, their setup works as follows. They have
2n values total--and you may assume that no two values are the same. an inert lattice structure, and they use this for placing charged particles
You’d like to determine the median of this set of 2n values, which we will at regular spacing along a straight line. Thus we can model their structure
define here to be the nth smallest value. n} on the real line; and at each of
However, the only way you can access these values is through queries these points j, they have a particle with charge qJ" (Each charge can be
to the databases. Ina single query, you can specify a value k to one of the either positive or negative.)
two databases, and the chosen database will return the/(m smallest value They want to study the total force on each particle, by measuring it
that it contains. Since queries are expensive, you would like to compute and then comparing it to a computationa! prediction. This computational
the median using as few queries as possible. part is where they need your help. The total net force on particle j, by
Give an algorithm that finds the median value using at most O(log n) Couiomb’s Law, is equal to
queries.
..(i-i)2 ..(]-i)2
Recall the problem of finding the number of inversions. As in the text,
an, which we assume are all They’ve written the following simple program to compute F~ for all j:
distinct, and we define an inversion to be a pair i < j such that ai > ai.
We motivated the problem of counting inversions as a good measure n
of how different two orderings are. However, one might feel that this Initialize Fi to 0
measure is too sensitive. Let’s call a pair a significant inversion ff i <j and n
ai > 2aj. Give an O(n log n) algorithm to count the number of significant If i < j then
C qi qi
inversions between two orderings.
Else if i > j then
Suppose you’re consulting for a bank that’s concerned about fraud de- C qi qJ
Add - q 0------ ~
tection, and they come to you with the following problem. They have a Endif
collection of n bank cards that they’ve confiscated, suspecting them of Endfer
being used in fraud. Each bank card is a small plastic object, contain- Output F]
ing a magnetic stripe with some encrypted data, and it corresponds to Endfor
a unique account in the bank. Each account can have many bank cards
It's not hard to analyze the running time of this program: each invocation of the inner loop, over i, takes O(n) time, and this inner loop is invoked O(n) times total, so the overall running time is O(n^2).
The trouble is, for the large values of n they're working with, the program takes several minutes to run. On the other hand, their experimental setup is optimized so that they can throw down n particles, perform the measurements, and be ready to handle n more particles within a few seconds. So they'd really like it if there were a way to compute all the forces F_j much more quickly, so as to keep up with the rate of the experiment.
Help them out by designing an algorithm that computes all the forces F_j in O(n log n) time.

5. Hidden surface removal is a problem in computer graphics that scarcely needs an introduction: when Woody is standing in front of Buzz, you should be able to see Woody but not Buzz; when Buzz is standing in front of Woody ... well, you get the idea.
The magic of hidden surface removal is that you can often compute things faster than your intuition suggests. Here's a clean geometric example to illustrate a basic speed-up that can be achieved. You are given n nonvertical lines in the plane, labeled L_1, ..., L_n, with the ith line specified by the equation y = a_ix + b_i. We will make the assumption that no three of the lines all meet at a single point. We say line L_i is uppermost at a given x-coordinate x_0 if its y-coordinate at x_0 is greater than the y-coordinates of all the other lines at x_0: a_ix_0 + b_i > a_jx_0 + b_j for all j ≠ i. We say line L_i is visible if there is some x-coordinate at which it is uppermost--intuitively, some portion of it can be seen if you look down from "y = ∞."
Give an algorithm that takes n lines as input and in O(n log n) time returns all of the ones that are visible. Figure 5.10 gives an example.

[Figure 5.10 An instance of hidden surface removal with five lines (labeled 1-5 in the figure). All the lines except for 2 are visible.]

6. Consider an n-node complete binary tree T, where n = 2^d − 1 for some d. Each node v of T is labeled with a real number x_v. You may assume that the real numbers labeling the nodes are all distinct. A node v of T is a local minimum if the label x_v is less than the label x_w for all nodes w that are joined to v by an edge.
You are given such a complete binary tree T, but the labeling is only specified in the following implicit way: for each node v, you can determine the value x_v by probing the node v. Show how to find a local minimum of T using only O(log n) probes to the nodes of T.

7. Suppose now that you're given an n × n grid graph G. (An n × n grid graph is just the adjacency graph of an n × n chessboard. To be completely precise, it is a graph whose node set is the set of all ordered pairs of natural numbers (i, j), where 1 ≤ i ≤ n and 1 ≤ j ≤ n; the nodes (i, j) and (k, ℓ) are joined by an edge if and only if |i − k| + |j − ℓ| = 1.)
We use some of the terminology of the previous question. Again, each node v is labeled by a real number x_v; you may assume that all these labels are distinct. Show how to find a local minimum of G using only O(n) probes to the nodes of G. (Note that G has n^2 nodes.)

Notes and Further Reading
The militaristic coinage "divide and conquer" was introduced somewhat after the technique itself. Knuth (1998) credits John von Neumann with one early explicit application of the approach, the development of the Mergesort Algorithm in 1945. Knuth (1997b) also provides further discussion of techniques for solving recurrences.
The algorithm for computing the closest pair of points in the plane is due to Michael Shamos, and is one of the earliest nontrivial algorithms in the field of computational geometry; the survey paper by Smid (1999) discusses a wide range of results on closest-point problems. A faster randomized algorithm for this problem will be discussed in Chapter 13. (Regarding the nonobviousness of the divide-and-conquer algorithm presented here, Smid also makes the interesting historical observation that researchers originally suspected quadratic time might be the best one could do for finding the closest pair of points in the plane.) More generally, the divide-and-conquer approach has proved very useful in computational geometry, and the books by Preparata and Shamos
Chapter 6
Dynamic Programming
brute-force search: although it's systematically working through the exponentially large set of possible solutions to the problem, it does this without ever examining them all explicitly. It is because of this careful balancing act that dynamic programming can be a tricky technique to get used to; it typically takes a reasonable amount of practice before one is fully comfortable with it.
With this in mind, we now turn to a first example of dynamic programming: the Weighted Interval Scheduling Problem that we defined back in Section 1.2. We are going to develop a dynamic programming algorithm for this problem in two stages: first as a recursive procedure that closely resembles brute-force search; and then, by reinterpreting this procedure, as an iterative algorithm that works by building up solutions to larger and larger subproblems.

6.1 Weighted Interval Scheduling: A Recursive Procedure
We have seen that a particular greedy algorithm produces an optimal solution to the Interval Scheduling Problem, where the goal is to accept as large a set of nonoverlapping intervals as possible. The Weighted Interval Scheduling Problem is a strictly more general version, in which each interval has a certain value (or weight), and we want to accept a set of maximum value.

Designing a Recursive Algorithm
Since the original Interval Scheduling Problem is simply the special case in which all values are equal to 1, we know already that most greedy algorithms will not solve this problem optimally. But even the algorithm that worked before (repeatedly choosing the interval that ends earliest) is no longer optimal in this more general setting, as the simple example in Figure 6.1 shows.

[Figure 6.1 A simple instance of weighted interval scheduling: three intervals with values 1, 3, and 1.]

Indeed, no natural greedy algorithm is known for this problem, which is what motivates our switch to dynamic programming. As discussed above, we will begin our introduction to dynamic programming with a recursive type of algorithm for this problem, and then in the next section we'll move to a more iterative method that is closer to the style we use in the rest of this chapter.
We use the notation from our discussion of Interval Scheduling in Section 1.2. We have n requests labeled 1, ..., n, with each request i specifying a start time s_i and a finish time f_i. Each interval i now also has a value, or weight, v_i. Two intervals are compatible if they do not overlap. The goal of our current problem is to select a subset S ⊆ {1, ..., n} of mutually compatible intervals, so as to maximize the sum of the values of the selected intervals, Σ_{i∈S} v_i.
Let's suppose that the requests are sorted in order of nondecreasing finish time: f_1 ≤ f_2 ≤ ... ≤ f_n. We'll say a request i comes before a request j if i < j. This will be the natural left-to-right order in which we'll consider intervals. To help in talking about this order, we define p(j), for an interval j, to be the largest index i < j such that intervals i and j are disjoint. In other words, i is the leftmost interval that ends before j begins. We define p(j) = 0 if no request i < j is disjoint from j. An example of the definition of p(j) is shown in Figure 6.2.

[Figure 6.2 An instance of weighted interval scheduling with the functions p(j) defined for each interval j: v_1 = 2, p(1) = 0; v_2 = 4, p(2) = 0; v_3 = 4, p(3) = 1; v_4 = 7, p(4) = 0; p(5) = 3; v_6 = 1, p(6) = 3.]

Now, given an instance of the Weighted Interval Scheduling Problem, let's consider an optimal solution O, ignoring for now that we have no idea what it is. Here's something completely obvious that we can say about O: either interval n (the last one) belongs to O, or it doesn't. Suppose we explore both sides of this dichotomy a little further. If n ∈ O, then clearly no interval indexed strictly between p(n) and n can belong to O, because by the definition of p(n), the intervals p(n) + 1, p(n) + 2, ..., n − 1 all overlap interval n.
Moreover, if n ∈ O, then O must include an optimal solution to the problem consisting of requests {1, ..., p(n)}--for if it didn't, we could replace O's choice of requests from {1, ..., p(n)} with a better one, with no danger of overlapping request n.
On the other hand, if n ∉ O, then O is simply equal to the optimal solution to the problem consisting of requests {1, ..., n − 1}. This is by completely analogous reasoning: we're assuming that O does not include request n; so if it does not include the best possible solution to the problem consisting of requests {1, ..., n − 1}, we could replace it with a better one.
All this suggests that finding the optimal solution on intervals {1, 2, ..., n} involves looking at the optimal solutions of smaller problems of the form {1, 2, ..., j}. Thus, for any value of j between 1 and n, let O_j denote the optimal solution to the problem consisting of requests {1, ..., j}, and let OPT(j) denote the value of this solution. (We define OPT(0) = 0, based on the convention that this is the optimum over an empty set of intervals.) The optimal solution we're seeking is precisely O_n, with value OPT(n). For the optimal solution O_j on {1, 2, ..., j}, our reasoning above (generalizing from the case in which j = n) says that either j ∈ O_j, in which case OPT(j) = v_j + OPT(p(j)), or j ∉ O_j, in which case OPT(j) = OPT(j − 1). Since these are precisely the two possible choices (j ∈ O_j or j ∉ O_j), we can further say that

(6.1) OPT(j) = max(v_j + OPT(p(j)), OPT(j − 1)).

And how do we decide whether j belongs to the optimal solution O_j? This too is easy: it belongs to the optimal solution if and only if the first of the options above is at least as good as the second; in other words,

(6.2) Request j belongs to an optimal solution on the set {1, 2, ..., j} if and only if v_j + OPT(p(j)) ≥ OPT(j − 1).

These facts form the first crucial component on which a dynamic programming solution is based: a recurrence equation that expresses the optimal solution (or its value) in terms of the optimal solutions to smaller subproblems.
Despite the simple reasoning that led to this point, (6.1) is already a significant development. It directly gives us a recursive algorithm to compute OPT(n), assuming that we have already sorted the requests by finishing time.

Compute-Opt(j)
  If j = 0 then
    Return 0
  Else
    Return max(v_j + Compute-Opt(p(j)), Compute-Opt(j − 1))
  Endif

The correctness of the algorithm follows directly by induction on j:

(6.3) Compute-Opt(j) correctly computes OPT(j) for each j = 1, 2, ..., n.

Proof. By definition OPT(0) = 0. Now, take some j > 0, and suppose by way of induction that Compute-Opt(i) correctly computes OPT(i) for all i < j. By the induction hypothesis, we know that Compute-Opt(p(j)) = OPT(p(j)) and Compute-Opt(j − 1) = OPT(j − 1); and hence from (6.1) it follows that

OPT(j) = max(v_j + Compute-Opt(p(j)), Compute-Opt(j − 1)) = Compute-Opt(j).

Unfortunately, if we really implemented the algorithm Compute-Opt as just written, it would take exponential time to run in the worst case. For example, see Figure 6.3 for the tree of calls issued for the instance of Figure 6.2: the tree widens very quickly due to the recursive branching. To take a more extreme example, on a nicely layered instance like the one in Figure 6.4, where p(j) = j − 2 for each j = 2, 3, ..., n, we see that Compute-Opt(j) generates separate recursive calls on problems of sizes j − 1 and j − 2. In other words, the total number of calls made to Compute-Opt on this instance will grow

[Figure 6.3 The tree of subproblems called by Compute-Opt on the problem instance of Figure 6.2; the tree of subproblems grows very quickly.]
like the Fibonacci numbers, which increase exponentially. Thus we have not achieved a polynomial-time solution.

Memoizing the Recursion

In fact, though, we're not so far from having a polynomial-time algorithm. A fundamental observation, which forms the second crucial component of a dynamic programming solution, is that our recursive algorithm Compute-Opt is really only solving n + 1 different subproblems: Compute-Opt(0), Compute-Opt(1), ..., Compute-Opt(n). The fact that it runs in exponential time as written is simply due to the spectacular redundancy in the number of times it issues each of these calls.

How could we eliminate all this redundancy? We could store the value of Compute-Opt in a globally accessible place the first time we compute it and then simply use this precomputed value in place of all future recursive calls. This technique of saving values that have already been computed is referred to as memoization.

We implement the above strategy in the more "intelligent" procedure M-Compute-Opt. This procedure will make use of an array M[0 ... n]; M[j] will start with the value "empty," but will hold the value of Compute-Opt(j) as soon as it is first determined. To determine OPT(n), we invoke M-Compute-Opt(n).

  M-Compute-Opt(j)
    If j = 0 then
      Return 0
    Else if M[j] is not empty then
      Return M[j]
    Else
      Define M[j] = max(v_j + M-Compute-Opt(p(j)), M-Compute-Opt(j − 1))
      Return M[j]
    Endif

(6.4)  The running time of M-Compute-Opt(n) is O(n) (assuming the input intervals are sorted by their finish times).

Proof. The time spent in a single call to M-Compute-Opt is O(1), excluding the time spent in recursive calls it generates. So the running time is bounded by a constant times the number of calls ever issued to M-Compute-Opt. Since the implementation itself gives no explicit upper bound on this number of calls, we try to find a bound by looking for a good measure of "progress."

The most useful progress measure here is the number of entries in M that are not "empty." Initially this number is 0; but each time the procedure invokes the recurrence, issuing two recursive calls to M-Compute-Opt, it fills in a new entry, and hence increases the number of filled-in entries by 1. Since M has only n + 1 entries, it follows that there can be at most O(n) calls to M-Compute-Opt, and hence the running time of M-Compute-Opt(n) is O(n), as desired. ∎
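As a concrete illustration, here is a minimal Python sketch of the memoized recursion. It assumes the requests are given as (start, finish, value) triples; the function names (weighted_interval_value, m_compute_opt) and the use of lru_cache in place of the array M are illustrative choices, not part of the text's pseudocode.

    import bisect
    from functools import lru_cache

    def weighted_interval_value(requests):
        """Memoized recursion for Weighted Interval Scheduling.

        requests: list of (start, finish, value) triples.
        Returns the value of an optimal compatible subset.
        """
        # Sort requests by finishing time, as the text assumes.
        jobs = sorted(requests, key=lambda r: r[1])
        finishes = [f for (_, f, _) in jobs]

        # p(j) = largest 1-indexed i < j whose interval is compatible with j
        # (finish(i) <= start(j)); 0 means "no compatible earlier interval".
        def p(j):
            s = jobs[j - 1][0]
            return bisect.bisect_right(finishes, s, 0, j - 1)

        @lru_cache(maxsize=None)
        def m_compute_opt(j):
            if j == 0:
                return 0
            _, _, v = jobs[j - 1]
            return max(v + m_compute_opt(p(j)), m_compute_opt(j - 1))

        return m_compute_opt(len(jobs))

Here lru_cache plays the role of the array M: each subproblem value is computed once and then looked up, so only O(n) distinct calls do real work.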
Computing a Solution in Addition to Its Value

So far we have simply computed the value of an optimal solution; presumably we want a full optimal set of intervals as well. It would be easy to extend M-Compute-Opt so as to keep track of an optimal solution in addition to its value: we could maintain an additional array S so that S[i] contains an optimal solution on the set of intervals {1, ..., i}. Naively enhancing the code to maintain the solutions in the array S, however, would blow up the running time by an additional factor of O(n): while a position in the M array can be updated in O(1) time, writing down a set in the S array takes O(n) time. We can avoid this O(n) blow-up by not explicitly maintaining S, but rather by recovering the optimal solution from values saved in the array M after the optimum value has been computed.

We know from (6.2) that j belongs to an optimal solution for the set of intervals {1, ..., j} if and only if v_j + OPT(p(j)) ≥ OPT(j − 1). Using this observation, we get the following simple procedure, which "traces back" through the array M to find the set of intervals in an optimal solution.
  Find-Solution(j)
    If j = 0 then
      Output nothing
    Else
      If v_j + M[p(j)] ≥ M[j − 1] then
        Output j together with the result of Find-Solution(p(j))
      Else
        Output the result of Find-Solution(j − 1)
      Endif
    Endif

Since Find-Solution calls itself recursively only on strictly smaller values, it makes a total of O(n) recursive calls; and since it spends constant time per call, we have

(6.5)  Given the array M of the optimal values of the subproblems, Find-Solution returns an optimal solution in O(n) time.

6.2 Principles of Dynamic Programming: Memoization or Iteration over Subproblems

We now use the algorithm for the Weighted Interval Scheduling Problem developed in the previous section to summarize the basic principles of dynamic programming, and also to offer a different perspective that will be fundamental to the rest of the chapter: iterating over subproblems, rather than computing solutions recursively.

In the previous section, we developed a polynomial-time solution to the Weighted Interval Scheduling Problem by first designing an exponential-time recursive algorithm and then converting it (by memoization) to an efficient recursive algorithm that consulted a global array M of optimal solutions to subproblems. To really understand what is going on here, however, it helps to formulate an essentially equivalent version of the algorithm. It is this new formulation that most explicitly captures the essence of the dynamic programming technique, and it will serve as a general template for the algorithms we develop in later sections.

Designing the Algorithm

The key to the efficient algorithm is really the array M. It encodes the notion that we are using the value of optimal solutions to the subproblems on intervals {1, 2, ..., j} for each j, and it uses (6.1) to define the value of M[j] based on values that come earlier in the array. Once we have the array M, the problem is solved: M[n] contains the value of the optimal solution on the full instance, and Find-Solution can be used to trace back through M efficiently and return an optimal solution itself.

The point to realize, then, is that we can directly compute the entries in M by an iterative algorithm, rather than using memoized recursion. We just start with M[0] = 0 and keep incrementing j; each time we need to determine a value M[j], the answer is provided by (6.1). The algorithm looks as follows.

  Iterative-Compute-Opt
    M[0] = 0
    For j = 1, 2, ..., n
      M[j] = max(v_j + M[p(j)], M[j − 1])
    Endfor

Analyzing the Algorithm

By exact analogy with the proof of (6.3), we can prove by induction on j that this algorithm writes OPT(j) in array entry M[j]; (6.1) provides the induction step. Also, as before, we can pass the filled-in array M to Find-Solution to get an optimal solution in addition to the value. Finally, the running time of Iterative-Compute-Opt is clearly O(n), since it explicitly runs for n iterations and spends constant time in each.

An example of the execution of Iterative-Compute-Opt is depicted in Figure 6.5. In each iteration, the algorithm fills in one additional entry of the array M, by comparing the value of v_j + M[p(j)] to the value of M[j − 1].

Figure 6.5  Part (b) shows the iterations of Iterative-Compute-Opt on the sample instance of Weighted Interval Scheduling depicted in part (a).
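A short Python sketch of the iterative version, together with the traceback of an optimal set, might look as follows. The input format ((start, finish, value) triples) and the helper names are illustrative assumptions, not the text's notation.

    import bisect

    def iterative_weighted_interval(requests):
        """Iterative DP for Weighted Interval Scheduling, plus traceback.

        requests: list of (start, finish, value) triples.
        Returns (optimal_value, list_of_chosen_requests).
        """
        jobs = sorted(requests, key=lambda r: r[1])
        n = len(jobs)
        finishes = [f for (_, f, _) in jobs]

        # p[j]: largest 1-indexed i < j compatible with j (0 if none).
        p = [0] * (n + 1)
        for j in range(1, n + 1):
            p[j] = bisect.bisect_right(finishes, jobs[j - 1][0], 0, j - 1)

        # Fill M[0..n] exactly as in Iterative-Compute-Opt, using (6.1).
        M = [0] * (n + 1)
        for j in range(1, n + 1):
            M[j] = max(jobs[j - 1][2] + M[p[j]], M[j - 1])

        # Trace back through M as in Find-Solution, using (6.2).
        chosen, j = [], n
        while j > 0:
            if jobs[j - 1][2] + M[p[j]] >= M[j - 1]:
                chosen.append(jobs[j - 1])
                j = p[j]
            else:
                j -= 1
        chosen.reverse()
        return M[n], chosen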
A Basic Outline of Dynamic Programming

This, then, provides a second efficient algorithm to solve the Weighted Interval Scheduling Problem. The two approaches clearly have a great deal of conceptual overlap, since they both grow from the insight contained in the recurrence (6.1). For the remainder of the chapter, we will develop dynamic programming algorithms using the second type of approach--iterative building up of subproblems--because the algorithms are often simpler to express this way. But in each case that we consider, there is an equivalent way to formulate the algorithm as a memoized recursion.

Most crucially, the bulk of our discussion about the particular problem of selecting intervals can be cast more generally as a rough template for designing dynamic programming algorithms. To set about developing an algorithm based on dynamic programming, one needs a collection of subproblems derived from the original problem that satisfies a few basic properties.
(i) There are only a polynomial number of subproblems.
(ii) The solution to the original problem can be easily computed from the solutions to the subproblems. (For example, the original problem may actually be one of the subproblems.)
(iii) There is a natural ordering on subproblems from "smallest" to "largest," together with an easy-to-compute recurrence (as in (6.1) and (6.2)) that allows one to determine the solution to a subproblem from the solutions to some number of smaller subproblems.

6.3 Segmented Least Squares: Multi-way Choices

The Problem

Often when looking at scientific or statistical data, plotted on a two-dimensional set of axes, one tries to pass a "line of best fit" through the data, as in Figure 6.6.

This is a foundational problem in statistics and numerical analysis, formulated as follows. Suppose our data consists of a set P of n points in the plane, (x_1, y_1), (x_2, y_2), ..., (x_n, y_n); and suppose x_1 < x_2 < ... < x_n. Given a line L defined by the equation y = ax + b, we say that the error of L with respect to P is the sum of its squared "distances" to the points in P:

    Error(L, P) = Σ_{i=1}^{n} (y_i − a x_i − b)².

Figure 6.7  A set of points that lie approximately on two lines.
Figure 6.8  A set of points that lie approximately on three lines.
with the input, and by tuning C, we can penalize the use of additional lines to a greater or lesser extent.)

There are exponentially many possible partitions of P, and initially it is not clear that we should be able to find the optimal one efficiently. We now show how to use dynamic programming to find a partition of minimum penalty in time polynomial in n.

Designing the Algorithm

To begin with, we should recall the ingredients we need for a dynamic programming algorithm, as outlined at the end of Section 6.2. We want a polynomial number of subproblems, the solutions of which should yield a solution to the original problem; and we should be able to build up solutions to these subproblems using a recurrence. As with the Weighted Interval Scheduling Problem, it helps to think about some simple properties of the optimal solution. Note, however, that there is not really a direct analogy to weighted interval scheduling: there we were looking for a subset of n objects, whereas here we are seeking to partition n objects.

For segmented least squares, the following observation is very useful: The last point p_n belongs to a single segment in the optimal partition, and that segment begins at some earlier point p_i. This is the type of observation that can suggest the right set of subproblems: if we knew the identity of the last segment p_i, ..., p_n (see Figure 6.9), then we could remove those points from consideration and recursively solve the problem on the remaining points p_1, ..., p_{i−1}.

Suppose we let OPT(i) denote the optimum solution for the points p_1, ..., p_i, and we let e_{i,j} denote the minimum error of any line with respect to p_i, p_{i+1}, ..., p_j. (We will write OPT(0) = 0 as a boundary case.) Then our observation above says the following.

(6.6)  If the last segment of the optimal partition is p_i, ..., p_n, then the value of the optimal solution is OPT(n) = e_{i,n} + C + OPT(i − 1).

Using the same observation for the subproblem consisting of the points p_1, ..., p_j, we see that to get OPT(j) we should find the best way to produce a final segment p_i, ..., p_j--paying the error plus an additive C for this segment--together with an optimal solution OPT(i − 1) for the remaining points. In other words, we have justified the following recurrence.

(6.7)  For the subproblem on the points p_1, ..., p_j,
       OPT(j) = min_{1≤i≤j} (e_{i,j} + C + OPT(i − 1)),
and the segment p_i, ..., p_j is used in an optimum solution for the subproblem if and only if the minimum is obtained using index i.

The hard part in designing the algorithm is now behind us. From here, we simply build up the solutions OPT(i) in order of increasing i.

  Segmented-Least-Squares(n)
    Array M[0 ... n]
    Set M[0] = 0
    For all pairs i ≤ j
      Compute the least squares error e_{i,j} for the segment p_i, ..., p_j
    Endfor
    For j = 1, 2, ..., n
      Use the recurrence (6.7) to compute M[j]
    Endfor
    Return M[n]
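A Python sketch of this algorithm follows. The text's formula for computing e_{i,j} is not reproduced here, so the helper below uses the standard closed-form least-squares fit for a single line; the function names and this choice of helper are assumptions of the sketch.

    def segmented_least_squares(points, C):
        """DP for Segmented Least Squares, following recurrence (6.7).

        points: list of (x, y) pairs with strictly increasing x values.
        C: penalty added for each segment used.
        Returns the minimum total penalty (segment errors plus C per segment).
        """
        pts = sorted(points)
        n = len(pts)

        def segment_error(i, j):
            # Error of the best single line through pts[i-1 .. j-1]
            # (1-indexed endpoints), via the usual closed-form fit.
            xs = [x for x, _ in pts[i - 1:j]]
            ys = [y for _, y in pts[i - 1:j]]
            k = len(xs)
            if k <= 1:
                return 0.0
            sx, sy = sum(xs), sum(ys)
            sxx = sum(x * x for x in xs)
            sxy = sum(x * y for x, y in zip(xs, ys))
            a = (k * sxy - sx * sy) / (k * sxx - sx * sx)
            b = (sy - a * sx) / k
            return sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))

        # e[i][j] = least-squares error of the segment p_i, ..., p_j.
        e = [[0.0] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                e[i][j] = segment_error(i, j)

        # M[j] = OPT(j), built up in order of increasing j.
        M = [0.0] * (n + 1)
        for j in range(1, n + 1):
            M[j] = min(e[i][j] + C + M[i - 1] for i in range(1, j + 1))
        return M[n]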
6.4 Subset Sums and Knapsacks: Adding a Variable

The tricky issue here lies in figuring out a good set of subproblems.

Designing the Algorithm

A False Start  One general strategy, which worked for us in the case of Weighted Interval Scheduling, is to consider subproblems involving only the first i requests. We start by trying this strategy here. We use the notation OPT(i), analogously to the notation used before, to denote the best possible solution using a subset of the requests {1, ..., i}. The key to our method for the Weighted Interval Scheduling Problem was to concentrate on an optimal solution O to our problem and consider two cases, depending on whether or not the last request n is accepted or rejected by this optimum solution. Just as in that case, we have the first part, which follows immediately from the definition of OPT(i).

  o If n ∉ O, then OPT(n) = OPT(n − 1).

Next we have to consider the case in which n ∈ O. What we'd like here is a simple recursion, which tells us the best possible value we can get for solutions that contain the last request n. For Weighted Interval Scheduling this was easy, as we could simply delete each request that conflicted with request n. In the current problem, this is not so simple. Accepting request n does not immediately imply that we have to reject any other request. Instead, for the subset S ⊆ {1, ..., n − 1} that we will accept, we have less available weight left: a weight of w_n is used on the accepted request n, and we only have W − w_n weight left for the set S of remaining requests that we accept. See Figure 6.10.

A Better Solution  This suggests that we need more subproblems: To find out the value for OPT(n) we not only need the value of OPT(n − 1), but we also need to know the best solution we can get using a subset of the first n − 1 items and total allowed weight W − w_n. We are therefore going to use many more subproblems: one for each initial set {1, ..., i} of the items, and each possible value for the remaining available weight w. Assume that W is an integer, and all requests i = 1, ..., n have integer weights w_i. We will have a subproblem for each i = 0, 1, ..., n and each integer 0 ≤ w ≤ W. We will use OPT(i, w) to denote the value of the optimal solution using a subset of the items {1, ..., i} with maximum allowed weight w, that is, the maximum of Σ_{j∈S} w_j over subsets S ⊆ {1, ..., i} that satisfy Σ_{j∈S} w_j ≤ w.

Figure 6.10  After item n is included in the solution, a weight of w_n is used up and there is W − w_n available weight left.

Using this new set of subproblems, we will be able to express the value OPT(i, w) as a simple expression in terms of values from smaller problems. Moreover, OPT(n, W) is the quantity we're looking for in the end. As before, let O denote an optimum solution for the original problem.

  o If n ∉ O, then OPT(n, W) = OPT(n − 1, W), since we can simply ignore item n.
  o If n ∈ O, then OPT(n, W) = w_n + OPT(n − 1, W − w_n), since we now seek to use the remaining capacity of W − w_n in an optimal way across items 1, ..., n − 1.

When the nth item is too big, that is, W < w_n, then we must have OPT(n, W) = OPT(n − 1, W). Otherwise, we get the optimum solution allowing all n requests by taking the better of these two options. Using the same line of argument for the subproblem for items {1, ..., i}, and maximum allowed weight w, gives us the following recurrence.

(6.8)  If w < w_i then OPT(i, w) = OPT(i − 1, w). Otherwise
       OPT(i, w) = max(OPT(i − 1, w), w_i + OPT(i − 1, w − w_i)).

As before, we want to design an algorithm that builds up a table of all OPT(i, w) values while computing each of them at most once.

  Subset-Sum(n, W)
    Array M[0 ... n, 0 ... W]
    Initialize M[0, w] = 0 for each w = 0, 1, ..., W
    For i = 1, 2, ..., n
      For w = 0, ..., W
        Use the recurrence (6.8) to compute M[i, w]
      Endfor
    Endfor
    Return M[n, W]
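The following Python sketch builds the same table and then traces back an optimal set, in the spirit of (6.10). The function names are illustrative.

    def subset_sum_table(weights, W):
        """Build the table M[i][w] = OPT(i, w) of recurrence (6.8).

        weights: list of nonnegative integer weights w_1, ..., w_n.
        W: integer capacity. The optimal value is M[n][W].
        """
        n = len(weights)
        M = [[0] * (W + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            wi = weights[i - 1]
            for w in range(W + 1):
                if w < wi:
                    M[i][w] = M[i - 1][w]
                else:
                    M[i][w] = max(M[i - 1][w], wi + M[i - 1][w - wi])
        return M

    def recover_subset(M, weights, W):
        """Trace back through M to recover an optimal set of item indices."""
        S, w = [], W
        for i in range(len(weights), 0, -1):
            if M[i][w] != M[i - 1][w]:      # item i must be used here
                S.append(i)
                w -= weights[i - 1]
        return sorted(S)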
Each entry M[i, w] is computed, via (6.8), from at most two entries in the previous row: M[i − 1, w] or M[i − 1, w − w_i].

As an example of this algorithm executing, consider an instance with weight limit W = 6, and n = 3 items of sizes w_1 = w_2 = 2 and w_3 = 3. We find that the optimal value OPT(3, 6) = 5 (which we get by using the third item and one of the first two items). Figure 6.12 illustrates the way the algorithm fills in the two-dimensional table of OPT values row by row.

Next we will worry about the running time of this algorithm. As before in the case of weighted interval scheduling, we are building up a table of solutions M, and we compute each of the values M[i, w] in O(1) time using the previous values. Thus the running time is proportional to the number of entries in the table.

To recover an optimal set S of items, we can trace back through the array M by a procedure similar to those we developed in the previous sections.

(6.10)  Given a table M of the optimal values of the subproblems, the optimal set S can be found in O(n) time.

Extension: The Knapsack Problem

The Knapsack Problem is a bit more complex than the scheduling problem we discussed earlier. Consider a situation in which each item i has a nonnegative weight w_i as before, and also a distinct value v_i. Our goal is now to find a
subset S of maximum value Σ_{i∈S} v_i, subject to the restriction that the total weight of the set should not exceed W: Σ_{i∈S} w_i ≤ W.

It is not hard to extend our dynamic programming algorithm to this more general problem. We use the analogous set of subproblems, OPT(i, w), to denote the value of the optimal solution using a subset of the items {1, ..., i} and maximum available weight w. We consider an optimal solution O, and identify two cases depending on whether or not n ∈ O, exactly as before. Using this line of argument for the subproblems implies the following analogue of (6.8).

(6.11)  If w < w_i then OPT(i, w) = OPT(i − 1, w). Otherwise
        OPT(i, w) = max(OPT(i − 1, w), v_i + OPT(i − 1, w − w_i)).

Using this recurrence, we can write down a completely analogous dynamic programming algorithm, and this implies that the Knapsack Problem, too, can be solved in O(nW) time.
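A Python sketch of that analogous algorithm, with an illustrative (weight, value) input format:

    def knapsack(items, W):
        """Knapsack DP, following recurrence (6.11).

        items: list of (weight, value) pairs with nonnegative integer weights.
        W: integer capacity.
        Returns the maximum achievable total value.
        """
        n = len(items)
        M = [[0] * (W + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            wi, vi = items[i - 1]
            for w in range(W + 1):
                if w < wi:
                    M[i][w] = M[i - 1][w]
                else:
                    M[i][w] = max(M[i - 1][w], vi + M[i - 1][w - wi])
        return M[n][W]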
6.5 RNA Secondary Structure: Dynamic Programming over Intervals

Figure 6.13  An RNA secondary structure. Thick lines connect adjacent elements of the sequence; thin lines indicate pairs of elements that are matched.

A secondary structure on B = b_1 b_2 ... b_n is a set of pairs S = {(i, j)}, with i, j ∈ {1, 2, ..., n}, that satisfies the following conditions.

(i) (No sharp turns.) The ends of each pair in S are separated by at least four intervening bases; that is, if (i, j) ∈ S, then i < j − 4.
(ii) The elements of any pair in S consist of either {A, U} or {C, G} (in either order).
(iii) S is a matching: no base appears in more than one pair.
(iv) (The noncrossing condition.) If (i, j) and (k, ℓ) are two pairs in S, then we cannot have i < k < j < ℓ. (See Figure 6.14 for an illustration.)

Note that the RNA secondary structure in Figure 6.13 satisfies properties (i) through (iv). From a structural point of view, condition (i) arises simply because the RNA molecule cannot bend too sharply; and conditions (ii) and (iii) are the fundamental Watson-Crick rules of base-pairing. Condition (iv) is the striking one, since it's not obvious why it should hold in nature. But while there are sporadic exceptions to it in real molecules (via so-called pseudoknotting), it does turn out to be a good approximation to the spatial constraints on real RNA secondary structures.

Now, out of all the secondary structures that are possible for a single RNA molecule, which are the ones that are likely to arise under physiological conditions? The usual hypothesis is that a single-stranded RNA molecule will form the secondary structure with the optimum total free energy. The correct model for the free energy of a secondary structure is a subject of much debate; but a first approximation here is to assume that the free energy of a secondary structure is proportional simply to the number of base pairs that it contains.

Thus, having said all this, we can state the basic RNA secondary structure prediction problem very simply: We want an efficient algorithm that takes a single-stranded RNA molecule B = b_1 b_2 ... b_n and determines a secondary structure S with the maximum possible number of base pairs. (Note that the symbol T from the alphabet of DNA has been replaced by a U, but this is not important for us here.)

Figure 6.14  Two views of an RNA secondary structure. In the second view, (b), the string has been "stretched" lengthwise, and edges connecting matched pairs appear as noncrossing "bubbles" over the string.

Designing and Analyzing the Algorithm

A First Attempt at Dynamic Programming  The natural first attempt to apply dynamic programming would presumably be based on the following subproblems: We say that OPT(j) is the maximum number of base pairs in a secondary structure on b_1 b_2 ... b_j. By the no-sharp-turns condition above, we know that OPT(j) = 0 for j ≤ 5; and we know that OPT(n) is the solution we're looking for.

The trouble comes when we try writing down a recurrence that expresses OPT(j) in terms of the solutions to smaller subproblems. We can get partway there: in the optimal secondary structure on b_1 b_2 ... b_j, it's the case that either

  o j is not involved in a pair; or
  o j pairs with t for some t < j − 4.

In the first case, we just need to consult our solution for OPT(j − 1). The second case is depicted in Figure 6.15(a); because of the noncrossing condition, we now know that no pair can have one end between 1 and t − 1 and the other end between t + 1 and j − 1. We've therefore effectively isolated two new subproblems: one on the bases b_1 b_2 ... b_{t−1}, and the other on the bases b_{t+1} ... b_{j−1}. The first is solved by OPT(t − 1), but the second is not on our list of subproblems, because it does not begin with b_1.
As always, we can recover the secondary structure itself (not just its value) by recording how the maxima in (6.13) are achieved and tracing back through the computation.
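The recurrence (6.13) referred to here works over subproblems OPT(i, j) on the interval b_i ... b_j. A Python sketch of the resulting interval DP, assuming the standard two-variable formulation (pair t with j only when b_t and b_j form an allowed pair and t < j − 4), is below; the function name and indexing scheme are the sketch's own choices.

    def rna_secondary_structure(B):
        """Interval DP for the RNA secondary structure problem.

        B: string over {'A', 'C', 'G', 'U'}.
        Returns the maximum number of base pairs in a structure satisfying
        conditions (i)-(iv).
        """
        n = len(B)
        allowed = {frozenset("AU"), frozenset("CG")}
        # opt[i][j] = optimum on the substring B[i..j] (0-indexed, inclusive);
        # intervals too short to contain a pair stay at 0.
        opt = [[0] * n for _ in range(n)]
        for length in range(6, n + 1):            # a pair needs i < j - 4
            for i in range(n - length + 1):
                j = i + length - 1
                best = opt[i][j - 1]              # case: j is not in a pair
                for t in range(i, j - 4):         # case: j pairs with t
                    if frozenset((B[t], B[j])) in allowed:
                        left = opt[i][t - 1] if t > i else 0
                        best = max(best, 1 + left + opt[t + 1][j - 1])
                opt[i][j] = best
        return opt[0][n - 1] if n > 0 else 0

Since there are O(n²) intervals and each is filled by scanning O(n) choices of t, this sketch runs in O(n³) time.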
6.6 Sequence Alignment

For the remainder of this chapter, we consider two further dynamic programming algorithms that each have a wide range of applications. In the next two sections we discuss sequence alignment, a fundamental problem that arises in comparing strings. Following this, we turn to the problem of computing shortest paths in graphs when edges have costs that may be negative.

The Problem

Dictionaries on the Web seem to get more and more useful: often it seems easier to pull up a bookmarked online dictionary than to get a physical dictionary down from the bookshelf. And many online dictionaries offer functions that you can't get from a printed one: if you're looking for a definition and type in a word it doesn't contain--say, ocurrance--it will come back and ask, "Perhaps you mean occurrence?" How does it do this? Did it truly know what you had in mind?

Let's defer the second question to a different book and think a little about the first one. To decide what you probably meant, it would be natural to search the dictionary for the word most "similar" to the one you typed in. To do this, we have to answer the question: How should we define similarity between two words or strings?

Intuitively, we'd like to say that ocurrance and occurrence are similar because we can make the two words identical if we add a c to the first word and change the a to an e. Since neither of these changes seems so large, we conclude that the words are quite similar. To put it another way, we can nearly line up the two words letter by letter.

The hyphen (-) indicates a gap where we had to add a letter to the second word to get it to line up with the first. Moreover, our lining up is not perfect in that an e is lined up with an a.

We want a model in which similarity is determined roughly by the number of gaps and mismatches we incur when we line up the two words. Of course, there are many possible ways to line up the two words; for example, we could have written an alignment which involves three gaps and no mismatches. Which is better: one gap and one mismatch, or three gaps and no mismatches?

This discussion has been made easier because we know roughly what the correspondence ought to look like. When the two strings don't look like English words--for example, abbbaabbbbaab and ababaaabbbbbab--it may take a little work to decide whether they can be lined up nicely or not:

    abbbaa--bbbbaab
    ababaaabbbbba-b

Dictionary interfaces and spell-checkers are not the most computationally intensive application for this type of problem. In fact, determining similarities among strings is one of the central computational problems facing molecular biologists today.

Strings arise very naturally in biology: an organism's genome--its full set of genetic material--is divided up into giant linear DNA molecules known as chromosomes, each of which serves conceptually as a one-dimensional chemical storage device. Indeed, it does not obscure reality very much to think of it as an enormous linear tape, containing a string over the alphabet {A, C, G, T}. The string of symbols encodes the instructions for building protein molecules; using a chemical mechanism for reading portions of the chromosome, a cell can construct proteins that in turn control its metabolism.

Why is similarity important in this picture? To a first approximation, the sequence of symbols in an organism's genome can be viewed as determining the properties of the organism. So suppose we have two strains of bacteria, X and Y, which are closely related evolutionarily. Suppose further that we've determined that a certain substring in the DNA of X codes for a certain kind of toxin. Then, if we discover a very "similar" substring in the DNA of Y, we might be able to hypothesize, before performing any experiments at all, that this portion of the DNA in Y codes for a similar kind of toxin. This use of computation to guide decisions about biological experiments is one of the hallmarks of the field of computational biology.

All this leaves us with the same question we asked initially, while typing badly spelled words into our online dictionary: How should we define the notion of similarity between two strings?

In the early 1970s, the two molecular biologists Needleman and Wunsch proposed a definition of similarity, which, basically unchanged, has become
the standard definition in use today. Its position as a standard was reinforced by its simplicity and intuitive appeal, as well as through its independent discovery by several other researchers around the same time. Moreover, this definition of similarity came with an efficient dynamic programming algorithm to compute it. In this way, the paradigm of dynamic programming was independently discovered by biologists some twenty years after mathematicians and computer scientists first articulated it.

The definition is motivated by the considerations we discussed above, and in particular by the notion of "lining up" two strings. Suppose we are given two strings X and Y, where X consists of the sequence of symbols x_1 x_2 ... x_m and Y consists of the sequence of symbols y_1 y_2 ... y_n. Consider the sets {1, 2, ..., m} and {1, 2, ..., n} as representing the different positions in the strings X and Y, and consider a matching of these sets; recall that a matching is a set of ordered pairs with the property that each item occurs in at most one pair. We say that a matching M of these two sets is an alignment if there are no "crossing" pairs: if (i, j), (i′, j′) ∈ M and i < i′, then j < j′. Intuitively, an alignment gives a way of lining up the two strings, by telling us which pairs of positions will be lined up with one another. Thus, for example,

    stop-
    -tops

corresponds to the alignment {(2, 1), (3, 2), (4, 3)}.

Our definition of similarity will be based on finding the optimal alignment between X and Y, according to the following criteria. Suppose M is a given alignment between X and Y.

  o First, there is a parameter δ > 0 that defines a gap penalty. For each position of X or Y that is not matched in M--it is a gap--we incur a cost of δ.
  o Second, for each pair of letters p, q in our alphabet, there is a mismatch cost of α_pq for lining up p with q. Thus, for each (i, j) ∈ M, we pay the appropriate mismatch cost α_{x_i y_j} for lining up x_i with y_j. One generally assumes that α_pp = 0 for each letter p--there is no mismatch cost to line up a letter with another copy of itself--although this will not be necessary in anything that follows.
  o The cost of M is the sum of its gap and mismatch costs, and we seek an alignment of minimum cost.

The process of minimizing this cost is often referred to as sequence alignment in the biology literature. The quantities δ and {α_pq} are external parameters that must be plugged into software for sequence alignment; indeed, a lot of work goes into choosing the settings for these parameters. From our point of view, in designing an algorithm for sequence alignment, we will take them as given. To go back to our first example, notice how these parameters determine which alignment of ocurrance and occurrence we should prefer: the first is strictly better if and only if δ + α_ae < 3δ.

Designing the Algorithm

We now have a concrete numerical definition for the similarity between strings X and Y: it is the minimum cost of an alignment between X and Y. The lower this cost, the more similar we declare the strings to be. We now turn to the problem of computing this minimum cost, and an optimal alignment that yields it, for a given pair of strings X and Y.

One of the approaches we could try for this problem is dynamic programming, and we are motivated by the following basic dichotomy.

  o In the optimal alignment M, either (m, n) ∈ M or (m, n) ∉ M. (That is, either the last symbols in the two strings are matched to each other, or they aren't.)

By itself, this fact would be too weak to provide us with a dynamic programming solution. Suppose, however, that we compound it with the following basic fact.

(6.14)  Let M be any alignment of X and Y. If (m, n) ∉ M, then either the mth position of X or the nth position of Y is not matched in M.

Proof. Suppose by way of contradiction that (m, n) ∉ M, and there are numbers i < m and j < n so that (m, j) ∈ M and (i, n) ∈ M. But this contradicts our definition of alignment: we have (i, n), (m, j) ∈ M with i < m, but n > j, so the pairs (i, n) and (m, j) cross. ∎

There is an equivalent way to write (6.14) that exposes three alternative possibilities, and leads directly to the formulation of a recurrence.

(6.15)  In an optimal alignment M, at least one of the following is true:
  (i) (m, n) ∈ M; or
  (ii) the mth position of X is not matched; or
  (iii) the nth position of Y is not matched.

Now, let OPT(i, j) denote the minimum cost of an alignment between x_1 x_2 ... x_i and y_1 y_2 ... y_j. If case (i) of (6.15) holds, we pay α_{x_m y_n} and then align x_1 x_2 ... x_{m−1} as well as possible with y_1 y_2 ... y_{n−1}; we get OPT(m, n) = α_{x_m y_n} + OPT(m − 1, n − 1). If case (ii) holds, we pay a gap cost of δ since the mth position of X is not matched, and then we align x_1 x_2 ... x_{m−1} as well as
possible with y_1 y_2 ... y_n. In this way, we get OPT(m, n) = δ + OPT(m − 1, n). Similarly, if case (iii) holds, we get OPT(m, n) = δ + OPT(m, n − 1).

Using the same argument for the subproblem of finding the minimum-cost alignment between x_1 x_2 ... x_i and y_1 y_2 ... y_j, we get the following fact.

(6.16)  The minimum alignment costs satisfy the following recurrence for i ≥ 1 and j ≥ 1:
        OPT(i, j) = min[α_{x_i y_j} + OPT(i − 1, j − 1), δ + OPT(i − 1, j), δ + OPT(i, j − 1)].
Moreover, (i, j) is in an optimal alignment M for this subproblem if and only if the minimum is achieved by the first of these values.
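A Python sketch of the table-filling algorithm implied by (6.16): delta is the gap penalty and alpha a function giving the mismatch cost of two letters (both illustrative parameter names, with 0-indexed strings).

    def alignment_cost(X, Y, delta, alpha):
        """Minimum alignment cost between X and Y, per recurrence (6.16).

        delta: gap penalty (a number).
        alpha: function alpha(p, q) giving the cost of lining up p with q.
        Returns OPT(m, n).
        """
        m, n = len(X), len(Y)
        # A[i][j] = OPT(i, j); first row/column are the all-gaps base cases.
        A = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            A[i][0] = i * delta
        for j in range(1, n + 1):
            A[0][j] = j * delta
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                A[i][j] = min(alpha(X[i - 1], Y[j - 1]) + A[i - 1][j - 1],
                              delta + A[i - 1][j],
                              delta + A[i][j - 1])
        return A[m][n]

For instance, alignment_cost("ocurrance", "occurrence", 1.0, lambda p, q: 0.0 if p == q else 1.0) returns the optimal cost under a unit gap penalty and a 0/1 mismatch cost.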
6.7 Sequence Alignment in Linear Space via Divide and Conquer

It is easy to verify that when this algorithm completes, the array entry B[i, 1] holds the value of OPT(i, n) for i = 0, 1, ..., m. Moreover, it uses O(mn) time and O(m) space. The problem is: where is the alignment itself? We haven't left enough information around to be able to run a procedure like Find-Alignment. Since B at the end of the algorithm only contains the last two columns of the original dynamic programming array A, if we were to try tracing back to get the path, we'd run out of information after just these two columns. We could imagine getting around this difficulty by trying to "predict" what the alignment is going to be in the process of running our space-efficient procedure. In particular, as we compute the values in the jth column of the (now implicit) array A, we could try hypothesizing that a certain entry has a very small value, and hence that the alignment that passes through this entry is a promising candidate to be the optimal one. But this promising alignment might run into big problems later on, and a different alignment that currently looks much less attractive could turn out to be the optimal one.

There is, in fact, a solution to this problem--we will be able to recover the alignment itself using O(m + n) space--but it requires a genuinely new idea. The insight is based on employing the divide-and-conquer technique that we've seen earlier in the book. We begin with a simple alternative way to implement the basic dynamic programming solution.

A Backward Formulation of the Dynamic Program  Recall that we use f(i, j) to denote the length of the shortest path from (0, 0) to (i, j) in the graph G_XY. (As we showed in the initial sequence alignment algorithm, f(i, j) has the same value as OPT(i, j).) Now let's define g(i, j) to be the length of the shortest path from (i, j) to (m, n) in G_XY. The function g provides an equally natural dynamic programming approach to sequence alignment, except that we build it up in reverse: we start with g(m, n) = 0, and the answer we want is g(0, 0). By strict analogy with (6.16), we have the following recurrence for g.

(6.18)  For i < m and j < n we have
        g(i, j) = min[α_{x_{i+1} y_{j+1}} + g(i + 1, j + 1), δ + g(i, j + 1), δ + g(i + 1, j)].

This is just the recurrence one obtains by taking the graph G_XY, "rotating" it so that the node (m, n) is in the lower left corner, and using the previous approach. Using this picture, we can also work out the full dynamic programming algorithm to build up the values of g, backward starting from (m, n). Similarly, there is a space-efficient version of this backward dynamic programming algorithm, analogous to Space-Efficient-Alignment, which computes the value of the optimal alignment using only O(m + n) space. We will refer to this backward version, naturally enough, as Backward-Space-Efficient-Alignment.

Combining the Forward and Backward Formulations  So now we have symmetric algorithms which build up the values of the functions f and g. The idea will be to use these two algorithms in concert to find the optimal alignment. First, here are two basic facts summarizing some relationships between the functions f and g.

(6.19)  The length of the shortest corner-to-corner path in G_XY that passes through (i, j) is f(i, j) + g(i, j).

Proof. Let ℓ_ij denote the length of the shortest corner-to-corner path in G_XY that passes through (i, j). Clearly, any such path must get from (0, 0) to (i, j) and then from (i, j) to (m, n). Thus its length is at least f(i, j) + g(i, j), and so we have ℓ_ij ≥ f(i, j) + g(i, j). On the other hand, consider the corner-to-corner path that consists of a minimum-length path from (0, 0) to (i, j), followed by a minimum-length path from (i, j) to (m, n). This path has length f(i, j) + g(i, j), and so we have ℓ_ij ≤ f(i, j) + g(i, j). It follows that ℓ_ij = f(i, j) + g(i, j). ∎

(6.20)  Let k be any number in {0, ..., n}, and let q be an index that minimizes the quantity f(q, k) + g(q, k). Then there is a corner-to-corner path of minimum length that passes through the node (q, k).

Proof. Let ℓ* denote the length of the shortest corner-to-corner path in G_XY. Now fix a value of k ∈ {0, ..., n}. The shortest corner-to-corner path must use some node in the kth column of G_XY--let's suppose it is node (p, k)--and thus by (6.19)

    ℓ* = f(p, k) + g(p, k) ≥ min_q [f(q, k) + g(q, k)].

Now consider the index q that achieves the minimum in the right-hand side of this expression; we have

    ℓ* ≥ f(q, k) + g(q, k).

By (6.19) again, the shortest corner-to-corner path using the node (q, k) has length f(q, k) + g(q, k), and since ℓ* is the minimum length of any corner-to-corner path, we have

    ℓ* ≤ f(q, k) + g(q, k).

It follows that ℓ* = f(q, k) + g(q, k). Thus the shortest corner-to-corner path using the node (q, k) has length ℓ*, and this proves (6.20). ∎
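To make the forward and backward formulations concrete, here is a Python sketch that computes a single column of f values (and, symmetrically, of g values) while keeping only two columns in memory, in the spirit of Space-Efficient-Alignment and its backward counterpart. The function and parameter names are the sketch's own; delta and alpha are as in the alignment_cost sketch above.

    def forward_column(X, Y, k, delta, alpha):
        """Return [f(i, k) for i = 0..m] using O(m) working space."""
        m = len(X)
        col = [i * delta for i in range(m + 1)]          # f(i, 0)
        for j in range(1, k + 1):
            new = [j * delta] + [0.0] * m                # f(0, j)
            for i in range(1, m + 1):
                new[i] = min(alpha(X[i - 1], Y[j - 1]) + col[i - 1],
                             delta + col[i],
                             delta + new[i - 1])
            col = new
        return col

    def backward_column(X, Y, k, delta, alpha):
        """Return [g(i, k) for i = 0..m], built backward via (6.18)."""
        m, n = len(X), len(Y)
        col = [(m - i) * delta for i in range(m + 1)]    # g(i, n)
        for j in range(n - 1, k - 1, -1):
            new = [0.0] * m + [(n - j) * delta]          # g(m, j)
            for i in range(m - 1, -1, -1):
                new[i] = min(alpha(X[i], Y[j]) + col[i + 1],
                             delta + col[i],
                             delta + new[i + 1])
            col = new
        return col

By (6.19) and (6.20), the minimum over q of forward_column(X, Y, k, delta, alpha)[q] + backward_column(X, Y, k, delta, alpha)[q] equals the optimal alignment cost, and any minimizing q is a safe split point for the divide-and-conquer recursion described next.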
Using (6.20) and our space-efficient algorithms to compute the value of the optimal alignment, we will proceed as follows. We divide G_XY along its center column and compute the value of f(i, n/2) and g(i, n/2) for each value of i, using our two space-efficient algorithms. We can then determine the minimum value of f(i, n/2) + g(i, n/2), and conclude via (6.20) that there is a shortest corner-to-corner path passing through the node (i, n/2). Given this, we can search for the shortest path recursively in the portion of G_XY between (0, 0) and (i, n/2) and in the portion between (i, n/2) and (m, n). The crucial point is that we apply these recursive calls sequentially and reuse the working space from one call to the next. Thus, since we only work on one recursive call at a time, the total space usage is O(m + n). The key question we have to resolve is whether the running time of this algorithm remains O(mn).

In running the algorithm, we maintain a globally accessible list P which will hold nodes on the shortest corner-to-corner path as they are discovered. Initially, P is empty. P need only have m + n entries, since no corner-to-corner path can use more than this many edges. We also use the following notation: X[i : j], for 1 ≤ i ≤ j ≤ m, denotes the substring of X consisting of x_i x_{i+1} ... x_j; and we define Y[i : j] analogously. We will assume for simplicity that n is a power of 2; this assumption makes the discussion much cleaner, although it can be easily avoided.

  Divide-and-Conquer-Alignment(X, Y)
    Let m be the number of symbols in X
    Let n be the number of symbols in Y
    If m ≤ 2 or n ≤ 2 then
      Compute optimal alignment using Alignment(X, Y)
    Call Space-Efficient-Alignment(X, Y[1 : n/2])
    Call Backward-Space-Efficient-Alignment(X, Y[n/2 + 1 : n])
    Let q be the index minimizing f(q, n/2) + g(q, n/2)
    Add (q, n/2) to global list P
    Divide-and-Conquer-Alignment(X[1 : q], Y[1 : n/2])
    Divide-and-Conquer-Alignment(X[q + 1 : m], Y[n/2 + 1 : n])
    Return P

As an example of the first level of recursion, consider Figure 6.19. If the minimizing index q turns out to be 1, we get the two subproblems pictured.

Figure 6.19  The first level of recurrence for the space-efficient Divide-and-Conquer-Alignment. The two boxed regions indicate the input to the two recursive calls.

Analyzing the Algorithm

The previous arguments already establish that the algorithm returns the correct answer and that it uses O(m + n) space. Thus, we need only verify the following fact.

(6.21)  The running time of Divide-and-Conquer-Alignment on strings of length m and n is O(mn).

Proof. Let T(m, n) denote the maximum running time of the algorithm on strings of length m and n. The algorithm performs O(mn) work to build up the arrays B and B′; it then runs recursively on strings of size q and n/2, and on strings of size m − q and n/2. Thus, for some constant c, and some choice of index q, we have

    T(m, n) ≤ cmn + T(q, n/2) + T(m − q, n/2)
    T(m, 2) ≤ cm
    T(2, n) ≤ cn.

This recurrence is more complex than the ones we've seen in our earlier applications of divide-and-conquer in Chapter 5. First of all, the running time is a function of two variables (m and n) rather than just one; also, the division into subproblems is not necessarily an "even split," but instead depends on the value q that is found through the earlier work done by the algorithm.

So how should we go about solving such a recurrence? One way is to try guessing the form by considering a special case of the recurrence, and then using partial substitution to fill out the details of this guess. Specifically, suppose that we were in a case in which m = n, and in which the split point q were exactly in the middle. In this (admittedly restrictive) special case, we could write the function T(·) in terms of the single variable n, set q = n/2 (since we're assuming a perfect bisection), and have

    T(n) ≤ 2T(n/2) + cn².
This is a useful expression, since it's something that we solved in our earlier discussion of recurrences at the outset of Chapter 5. Specifically, this recurrence implies T(n) = O(n²).

So when m = n and we get an even split, the running time grows like the square of n. Motivated by this, we move back to the fully general recurrence for the problem at hand and guess that T(m, n) grows like the product of m and n. Specifically, we'll guess that T(m, n) ≤ kmn for some constant k, and see if we can prove this by induction. To start with the base cases m ≤ 2 and n ≤ 2, we see that these hold as long as k ≥ c/2. Now, assuming T(m′, n′) ≤ km′n′ holds for pairs (m′, n′) with a smaller product, we have

    T(m, n) ≤ cmn + T(q, n/2) + T(m − q, n/2)
            ≤ cmn + kqn/2 + k(m − q)n/2
            = cmn + kqn/2 + kmn/2 − kqn/2
            = (c + k/2)mn.

Thus the inductive step will work if we choose k = 2c, and this completes the proof. ∎

6.8 Shortest Paths in a Graph

For the final three sections, we focus on the problem of finding shortest paths in a graph, together with some closely related issues.

The Problem

Let G = (V, E) be a directed graph. Assume that each edge (i, j) ∈ E has an associated weight c_ij. The weights can be used to model a number of different things; we will picture here the interpretation in which the weight c_ij represents a cost for going directly from node i to node j in the graph.

Earlier we discussed Dijkstra's Algorithm for finding shortest paths in graphs with positive edge costs. Here we consider the more complex problem in which we seek shortest paths when costs may be negative. Among the motivations for studying this problem, here are two that particularly stand out. First, negative costs turn out to be crucial for modeling a number of phenomena with shortest paths. For example, the nodes may represent agents in a financial setting, and c_ij represents the cost of a transaction in which we buy from agent i and then immediately sell to agent j. In this case, a path would represent a succession of transactions, and edges with negative costs would represent transactions that result in profits. Second, the algorithm that we develop for dealing with edges of negative cost turns out, in certain crucial ways, to be more flexible and decentralized than Dijkstra's Algorithm. As a consequence, it has important applications for the design of distributed routing algorithms that determine the most efficient path in a communication network.

In this section and the next two, we will consider the following two related problems.

  o Given a graph G with weights, as described above, decide if G has a negative cycle--that is, a directed cycle C such that

        Σ_{(i,j)∈C} c_ij < 0.

  o If the graph has no negative cycles, find a path P from an origin node s to a destination node t with minimum total cost:

        Σ_{(i,j)∈P} c_ij

    should be as small as possible for any s-t path. This is generally called both the Minimum-Cost Path Problem and the Shortest-Path Problem.

In terms of our financial motivation above, a negative cycle corresponds to a profitable sequence of transactions that takes us back to our starting point: we buy from i_1, sell to i_2, buy from i_2, sell to i_3, and so forth, finally arriving back at i_1 with a net profit. Thus negative cycles in such a network can be viewed as good arbitrage opportunities.

It makes sense to consider the minimum-cost s-t path problem under the assumption that there are no negative cycles. As illustrated by Figure 6.20, if there is a negative cycle C, a path P_s from s to the cycle, and another path P_t from the cycle to t, then we can build an s-t path of arbitrarily negative cost: we first use P_s to get to the negative cycle C, then we go around C as many times as we want, and then we use P_t to get from C to the destination t.

Figure 6.20  In this graph, one can find s-t paths of arbitrarily negative cost (by going around the cycle C many times).

Designing and Analyzing the Algorithm

A Few False Starts  Let's begin by recalling Dijkstra's Algorithm for the Shortest-Path Problem when there are no negative costs. That method
computes a shortest path from the origin s to every other node v in the graph, essentially using a greedy algorithm. The basic idea is to maintain a set S with the property that the shortest path from s to each node in S is known. We start with S = {s}--since we know the shortest path from s to s has cost 0 when there are no negative edges--and we add elements greedily to this set S. As our first greedy step, we consider the minimum-cost edge leaving node s, that is, min_v c_sv. Let v be a node on which this minimum is obtained. A key observation underlying Dijkstra's Algorithm is that the shortest path from s to v is the single-edge path {s, v}. Thus we can immediately add the node v to the set S. The path {s, v} is clearly the shortest to v if there are no negative edge costs: any other path from s to v would have to start on an edge out of s that is at least as expensive as edge (s, v).

The above observation is no longer true if we can have negative edge costs. As suggested by the example in Figure 6.21(a), a path that starts on an expensive edge, but then compensates with subsequent edges of negative cost, can be cheaper than a path that starts on a cheap edge. This suggests that the Dijkstra-style greedy approach will not work here.

Figure 6.21  (a) With negative edge costs, Dijkstra's Algorithm can give the wrong answer for the Shortest-Path Problem. (b) Adding 3 to the cost of each edge will make all edges nonnegative, but it will change the identity of the shortest s-t path.

Another natural idea is to first modify the costs c_ij by adding some large constant M to each; that is, we let c′_ij = c_ij + M for each edge (i, j) ∈ E. If the constant M is large enough, then all modified costs are nonnegative, and we can use Dijkstra's Algorithm to find the minimum-cost path subject to costs c′. However, this approach fails to find the correct minimum-cost paths with respect to the original costs c. The problem here is that changing the costs from c to c′ changes the minimum-cost path. For example (as in Figure 6.21(b)), if a path P consisting of three edges is only slightly cheaper than another path P′ that has two edges, then after the change in costs, P′ will be cheaper, since we only add 2M to the cost of P′ while adding 3M to the cost of P.

A Dynamic Programming Approach  We will try to use dynamic programming to solve the problem of finding a shortest path from s to t when there are negative edge costs but no negative cycles. We could try an idea that has worked for us so far: subproblem i could be to find a shortest path using only the first i nodes. This idea does not immediately work, but it can be made to work with some effort. Here, however, we will discuss a simpler and more efficient solution, the Bellman-Ford Algorithm. The development of dynamic programming as a general algorithmic technique is often credited to the work of Bellman in the 1950s; and the Bellman-Ford Shortest-Path Algorithm was one of the first applications.

The dynamic programming solution we develop will be based on the following crucial observation.

(6.22)  If G has no negative cycles, then there is a shortest path from s to t that is simple (i.e., does not repeat nodes), and hence has at most n − 1 edges.

Proof. Since every cycle has nonnegative cost, the shortest path P from s to t with the fewest number of edges does not repeat any vertex v. For if P did repeat a vertex v, we could remove the portion of P between consecutive visits to v, resulting in a path of no greater cost and fewer edges. ∎

Let's use OPT(i, v) to denote the minimum cost of a v-t path using at most i edges. By (6.22), our original problem is to compute OPT(n − 1, s). (We could instead design an algorithm whose subproblems correspond to the minimum cost of an s-v path using at most i edges. This would form a more natural parallel with Dijkstra's Algorithm, but it would not be as natural in the context of the routing protocols we discuss later.)

We now need a simple way to express OPT(i, v) using smaller subproblems. We will see that the most natural approach involves the consideration of many different options; this is another example of the principle of "multi-way choices" that we saw in the algorithm for the Segmented Least Squares Problem.

Let's fix an optimal path P representing OPT(i, v), as depicted in Figure 6.22.

Figure 6.22  The minimum-cost path P from v to t using at most i edges.

  o If the path P uses at most i − 1 edges, then OPT(i, v) = OPT(i − 1, v).
  o If the path P uses i edges, and the first edge is (v, w), then OPT(i, v) = c_vw + OPT(i − 1, w).

This leads to the following recursive formula.

(6.23)  If i > 0 then
        OPT(i, v) = min(OPT(i − 1, v), min_{w∈V}(OPT(i − 1, w) + c_vw)).

Using this recurrence, we get the following dynamic programming algorithm to compute the value OPT(n − 1, s).
  Shortest-Path(G, s, t)
    n = number of nodes in G
    Array M[0 ... n − 1, V]
    Define M[0, t] = 0 and M[0, v] = ∞ for all other v ∈ V
    For i = 1, ..., n − 1
      For v ∈ V in any order
        Compute M[i, v] using the recurrence (6.23)
      Endfor
    Endfor
    Return M[n − 1, s]

The correctness of the method follows directly by induction from (6.23). We can bound the running time as follows. The table M has n² entries; and each entry can take O(n) time to compute, as there are at most n nodes w ∈ V we have to consider.

(6.24)  The Shortest-Path method correctly computes the minimum cost of an s-t path in any graph that has no negative cycles, and runs in O(n³) time.

Given the table M containing the optimal values of the subproblems, the shortest path using at most i edges can be obtained in O(in) time, by tracing back through smaller subproblems.

As an example, consider the graph in Figure 6.23(a), where the goal is to find a shortest path from each node to t. The table in Figure 6.23(b) shows the array M, with entries corresponding to the values M[i, v] from the algorithm. Thus a single row in the table corresponds to the shortest path from a particular node to t, as we allow the path to use an increasing number of edges. For example, the shortest path from node d to t is updated four times, as it changes from d-t, to d-a-t, to d-a-b-e-t, and finally to d-a-b-e-c-t.

Figure 6.23  For the directed graph in (a), the Shortest-Path Algorithm constructs the dynamic programming table in (b).
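A Python sketch of this table-filling algorithm follows. The graph representation (a list of node names plus a dictionary of edge costs) and the function name are illustrative assumptions; the logic is exactly the recurrence (6.23).

    import math

    def shortest_path_to_t(nodes, cost, t):
        """Table-filling algorithm for recurrence (6.23); assumes no negative cycles.

        nodes: list of node names; cost: dict mapping (v, w) -> c_vw.
        Returns the table M, where M[i][v] = OPT(i, v); M[n-1][s] is the
        minimum cost of an s-t path for any source s.
        """
        n = len(nodes)
        M = [{v: math.inf for v in nodes} for _ in range(n)]
        M[0][t] = 0.0
        for i in range(1, n):
            for v in nodes:
                best = M[i - 1][v]                  # use at most i - 1 edges
                for w in nodes:                     # or take a first edge (v, w)
                    c = cost.get((v, w), math.inf)
                    if c + M[i - 1][w] < best:
                        best = c + M[i - 1][w]
                M[i][v] = best
        return M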
Extensions: Some Basic Improvements to the Algorithm

An Improved Running-Time Analysis  We can actually provide a better running-time analysis for the case in which the graph G does not have too many edges. A directed graph with n nodes can have close to n² edges, since there could potentially be an edge between each pair of nodes, but many graphs are much sparser than this. When we work with a graph for which the number of edges m is significantly less than n², we've already seen in a number of cases earlier in the book that it can be useful to write the running time in terms of both m and n; this way, we can quantify our speed-up on graphs with relatively fewer edges.

If we are a little more careful in the analysis of the method above, we can improve the running-time bound to O(mn) without significantly changing the algorithm itself.

(6.25)  The Shortest-Path method can be implemented in O(mn) time.

Proof. Consider the computation of the array entry M[i, v] according to the recurrence (6.23); we have

    M[i, v] = min(M[i − 1, v], min_{w∈V}(M[i − 1, w] + c_vw)).

We assumed it could take up to O(n) time to compute this minimum, since there are n possible nodes w. But, of course, we need only compute this minimum over all nodes w for which v has an edge to w; let us use n_v to denote this number. Then it takes time O(n_v) to compute the array entry M[i, v]. We have to compute an entry for every node v and every index 0 ≤ i ≤ n − 1, so this gives a running-time bound of

    O(n Σ_{v∈V} n_v).

In Chapter 3, we performed exactly this kind of analysis for other graph algorithms, and used (3.9) from that chapter to bound the expression Σ_{v∈V} n_v for undirected graphs. Here we are dealing with directed graphs, and n_v denotes the number of edges leaving v. In a sense, it is even easier to work out the value of Σ_{v∈V} n_v for the directed case: each edge leaves exactly one of the nodes in V, and so each edge is counted exactly once by this expression. Thus we have Σ_{v∈V} n_v = m. Plugging this into our expression for the running time, we get a running-time bound of O(mn). ∎

Improving the Memory Requirements  We can also significantly improve the memory requirements with only a small change to the implementation. A common problem with many dynamic programming algorithms is the large space usage, arising from the M array that needs to be stored. In the Bellman-Ford Algorithm as written, this array has size n²; however, we now show how to reduce this to O(n). Rather than recording M[i, v] for each value i, we will use and update a single value M[v] for each node v, the length of the shortest path from v to t that we have found so far. We still run the algorithm for
iterations i = 1, 2, ..., n − 1, but the role of i will now simply be as a counter; in each iteration, and for each node v, we perform the update

    M[v] = min(M[v], min_{w∈V}(c_vw + M[w])).

We now observe the following fact.

(6.26)  Throughout the algorithm M[v] is the length of some path from v to t, and after i rounds of updates the value M[v] is no larger than the length of the shortest path from v to t using at most i edges.

Given (6.26), we can then use (6.22) as before to show that we are done after n − 1 iterations. Since we are only storing an M array that indexes over the nodes, this requires only O(n) working memory.

Finding the Shortest Paths  One issue to be concerned about is whether this space-efficient version of the algorithm saves enough information to recover the shortest paths themselves. In the case of the Sequence Alignment Problem in the previous section, we had to resort to a tricky divide-and-conquer method to recover the solution from a similar space-efficient implementation. Here, however, we will be able to recover the shortest paths much more easily.

To help with recovering the shortest paths, we will enhance the code by having each node v maintain the first node (after itself) on its path to the destination t; we will denote this first node by first[v]. To maintain first[v], we update its value whenever the distance M[v] is updated. In other words, whenever the value of M[v] is reset to the minimum min_{w∈V}(c_vw + M[w]), we set first[v] to the node w that attains this minimum.

Now let P denote the directed "pointer graph" whose nodes are V, and whose edges are {(v, first[v])}. The main observation is the following.

(6.27)  If the pointer graph P contains a cycle C, then this cycle must have negative cost.

Proof. Notice that if first[v] = w at any time, then we must have M[v] ≥ c_vw + M[w]. Indeed, the left- and right-hand sides are equal after the update that sets first[v] equal to w; and since M[w] may decrease, this equation may turn into an inequality.

Let v_1, v_2, ..., v_k be the nodes along the cycle C in the pointer graph, and assume that (v_k, v_1) is the last edge to have been added. Now, consider the values right before this last update. At this time we have M[v_i] ≥ c_{v_i v_{i+1}} + M[v_{i+1}] for all i = 1, ..., k − 1, and we also have M[v_k] > c_{v_k v_1} + M[v_1] since we are about to update M[v_k] and change first[v_k] to v_1. Adding all these inequalities, the M[v_i] values cancel, and we get 0 > Σ_{i=1}^{k−1} c_{v_i v_{i+1}} + c_{v_k v_1}: a negative cycle, as claimed. ∎

Now note that if G has no negative cycles, then (6.27) implies that the pointer graph P will never have a cycle. For a node v, consider the path we get by following the edges in P, from v to first[v] = v_1, to first[v_1] = v_2, and so forth. Since the pointer graph has no cycles, and the sink t is the only node that has no outgoing edge, this path must lead to t. We claim that when the algorithm terminates, this is in fact a shortest path in G from v to t.

(6.28)  Suppose G has no negative cycles, and consider the pointer graph P at the termination of the algorithm. For each node v, the path in P from v to t is a shortest v-t path in G.

Proof. Consider a node v and let w = first[v]. Since the algorithm terminated, we must have M[v] = c_vw + M[w]. The value M[t] = 0, and hence the length of the path traced out by the pointer graph is exactly M[v], which we know is the shortest-path distance. ∎

Note that in the more space-efficient version of Bellman-Ford, the path whose length is M[v] after i iterations can have substantially more edges than i. For example, if the graph is a single path from s to t, and we perform updates in the reverse of the order the edges appear on the path, then we get the final shortest-path values in just one iteration. This does not always happen, so we cannot claim a worst-case running-time improvement, but it would be nice to be able to use this fact opportunistically to speed up the algorithm on instances where it does happen. In order to do this, we need a stopping signal in the algorithm--something that tells us it's safe to terminate before iteration n − 1 is reached.

Such a stopping signal is a simple consequence of the following observation: If we ever execute a complete iteration i in which no M[v] value changes, then no M[v] value will ever change again, since future iterations will begin with exactly the same set of array entries. Thus it is safe to stop the algorithm. Note that it is not enough for a particular M[v] value to remain the same; in order to safely terminate, we need for all these values to remain the same for a single iteration.
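A sketch of this space-efficient version in Python, maintaining a single value M[v] and the pointer first[v] and stopping early when a full iteration changes nothing (graph representation and function name as in the earlier sketch, i.e., illustrative):

    import math

    def bellman_ford_with_pointers(nodes, cost, t):
        """Space-efficient Bellman-Ford: O(n) values M[v] plus first[v] pointers.

        Assumes no negative cycles. Returns (M, first); following first[v],
        first[first[v]], ... leads to t along a shortest v-t path, as in (6.28).
        """
        n = len(nodes)
        M = {v: math.inf for v in nodes}
        first = {v: None for v in nodes}
        M[t] = 0.0
        for _ in range(n - 1):
            changed = False
            for (v, w), c in cost.items():
                if c + M[w] < M[v]:
                    M[v] = c + M[w]
                    first[v] = w
                    changed = True
            if not changed:
                break                  # the safe stopping signal described above
        return M, first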
6.9 Shortest Paths and Distance Vector Protocols

One important application of the Shortest-Path Problem is for routers in a communication network to determine the most efficient path to a destination. We represent the network using a graph in which the nodes correspond to routers, and there is an edge between v and w if the two routers are connected by a direct communication link. We define a cost c_vw representing the delay on the link (v, w); the Shortest-Path Problem with these costs is to determine the path with minimum delay from a source node s to a destination t. Delays are
naturally nonnegative, so one could use Dijkstra's Algorithm to compute the shortest path. However, Dijkstra's shortest-path computation requires global knowledge of the network: it needs to maintain a set S of nodes for which shortest paths have been determined, and make a global decision about which node to add next to S. While routers can be made to run a protocol in the background that gathers enough global information to implement such an algorithm, it is often cleaner and more flexible to use algorithms that require only local knowledge of neighboring nodes.

If we think about it, the Bellman-Ford Algorithm discussed in the previous section has just such a "local" property. Suppose we let each node v maintain its value M[v]; then to update this value, v needs only obtain the value M[w] from each neighbor w, and compute

    min_{w∈V} (c_vw + M[w])

based on the information obtained.

We now discuss an improvement to the Bellman-Ford Algorithm that makes it better suited for routers and, at the same time, a faster algorithm in practice. Our current implementation of the Bellman-Ford Algorithm can be thought of as a pull-based algorithm. In each iteration i, each node v has to contact each neighbor w, and "pull" the new value M[w] from it. If a node w has not changed its value, then there is no need for v to get the value again; however, v has no way of knowing this fact, and so it must execute the pull anyway.

This wastefulness suggests a symmetric push-based implementation, where values are only transmitted when they change. Specifically, each node w whose distance value M[w] changes in an iteration informs all its neighbors of the new value in the next iteration; this allows them to update their values accordingly. If M[w] has not changed, then the neighbors of w already have the current value, and there is no need to "push" it to them again. This leads to savings in the running time, as not all values need to be pushed in each iteration. We also may terminate the algorithm early, if no value changes during an iteration. Here is a concrete description of the push-based implementation.

    Push-Based-Shortest-Path(G, s, t)
      n = number of nodes in G
      Array M[V]
      Initialize M[t] = 0 and M[v] = ∞ for all other v ∈ V
      For i = 1, ..., n−1
        For w ∈ V in any order
          If M[w] has been updated in the previous iteration then
            For all edges (v, w) in any order
              M[v] = min(M[v], c_vw + M[w])
              If this changes the value of M[v], then first[v] = w
            Endfor
          Endif
        Endfor
        If no value changed in this iteration, then end the algorithm
      Endfor
      Return M[s]

In this algorithm, nodes are sent updates of their neighbors' distance values in rounds, and each node sends out an update in each iteration in which it has changed. However, if the nodes correspond to routers in a network, then we do not expect everything to run in lockstep like this; some routers may report updates much more quickly than others, and a router with an update to report may sometimes experience a delay before contacting its neighbors. Thus the routers will end up executing an asynchronous version of the algorithm: each time a node w experiences an update to its M[w] value, it becomes "active" and eventually notifies its neighbors of the new value. If we were to watch the behavior of all routers interleaved, it would look as follows.

    Asynchronous-Shortest-Path(G, s, t)
      n = number of nodes in G
      Array M[V]
      Initialize M[t] = 0 and M[v] = ∞ for all other v ∈ V
      Declare t to be active and all other nodes inactive
      While there exists an active node
        Choose an active node w
        For all edges (v, w) in any order
          M[v] = min(M[v], c_vw + M[w])
          If this changes the value of M[v], then
            first[v] = w
            v becomes active
        Endfor
        w becomes inactive
      EndWhile

One can show that even this version of the algorithm, with essentially no coordination in the ordering of updates, will converge to the correct values of the shortest-path distances to t, assuming only that each time a node becomes active, it eventually contacts its neighbors.
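The asynchronous protocol can be viewed as a worklist algorithm. The sketch below is our own illustration (the function name and the in_edges representation are assumptions, not from the text); it presumes, as in this section, that the graph has no negative cycles.

    from collections import deque
    from math import inf

    def asynchronous_shortest_path(nodes, in_edges, t):
        # in_edges[w]: list of (v, cost) pairs, one per edge (v, w) of cost c_vw.
        M = {v: inf for v in nodes}
        first = {v: None for v in nodes}
        M[t] = 0
        active = deque([t])                  # t starts active; all others inactive
        while active:
            w = active.popleft()             # choose an active node w
            for v, cost in in_edges[w]:
                if M[w] + cost < M[v]:
                    M[v] = M[w] + cost
                    first[v] = w
                    if v not in active:      # v becomes active
                        active.append(v)
            # w becomes inactive until its own value changes again
        return M, first

The queue here is just one convenient scheduling choice; any policy that eventually processes every active node leads to the same final values.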
The algorithm we have developed here uses a single destination t, and all nodes v ∈ V compute their shortest path to t. More generally, we are
presumably interested in finding distances and shortest paths between all pairs of nodes in a graph. To obtain such distances, we effectively use n separate computations, one for each destination. Such an algorithm is referred to as a distance vector protocol, since each node maintains a vector of distances to every other node in the network.

Problems with the Distance Vector Protocol

One of the major problems with the distributed implementation of Bellman-Ford on routers (the protocol we have been discussing above) is that it's derived from an initial dynamic programming algorithm that assumes edge costs will remain constant during the execution of the algorithm. Thus far we've been designing algorithms with the tacit understanding that a program executing the algorithm will be running on a single computer (or a centrally managed set of computers), processing some specified input. In this context, it's a rather benign assumption to require that the input not change while the program is actually running. Once we start thinking about routers in a network, however, this assumption becomes troublesome. Edge costs may change for all sorts of reasons: links can become congested and experience slow-downs; or a link (v, w) may even fail, in which case the cost c_vw effectively increases to ∞.

Here's an indication of what can go wrong with our shortest-path algorithm when this happens. If an edge (v, w) is deleted (say the link goes down), it is natural for node v to react as follows: it should check whether its shortest path to some node t used the edge (v, w), and, if so, it should increase the distance using other neighbors. Notice that this increase in distance from v can now trigger increases at v's neighbors, if they were relying on a path through v, and these changes can cascade through the network. Consider the extremely simple example in Figure 6.24, in which the original graph has three edges (s, v), (v, s), and (v, t), each of cost 1.

Figure 6.24 When the edge (v, t) is deleted, the distributed Bellman-Ford Algorithm will begin "counting to infinity": the deleted edge causes an unbounded sequence of updates by s and v.

Now suppose the edge (v, t) in Figure 6.24 is deleted. How does node v react? Unfortunately, it does not have a global map of the network; it only knows the shortest-path distances of each of its neighbors to t. Thus it does not know that the deletion of (v, t) has eliminated all paths from s to t. Instead, it sees that M[s] = 2, and so it updates M[v] = c_vs + M[s] = 3, assuming that it will use its cost-1 edge to s, followed by the supposed cost-2 path from s to t. Seeing this change, node s will update M[s] = c_sv + M[v] = 4, based on its cost-1 edge to v, followed by the supposed cost-3 path from v to t. Nodes s and v will continue updating their distance to t until one of them finds an alternate route; in the case, as here, that the network is truly disconnected, these updates will continue indefinitely, a behavior known as the problem of counting to infinity.
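The counting-to-infinity behavior is easy to reproduce. The following toy Python loop, our own illustration rather than anything from the text, runs the local update rule on the two surviving nodes of Figure 6.24 after the edge (v, t) has been deleted; the estimates climb without bound.

    # Figure 6.24 with the edge (v, t) deleted: the only links left are (s, v)
    # and (v, s), each of cost 1, and t is in fact unreachable.
    cost = {("s", "v"): 1, ("v", "s"): 1}
    neighbors = {"s": ["v"], "v": ["s"]}
    M = {"s": 2, "v": 1}                     # stale estimates of the distance to t

    for step in range(6):
        u = "v" if step % 2 == 0 else "s"    # s and v take turns recomputing
        M[u] = min(cost[(u, w)] + M[w] for w in neighbors[u])
        print(step, M)                       # estimates grow without bound: 3, 4, 5, ...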
To avoid this problem and related difficulties arising from the limited amount of information available to nodes in the Bellman-Ford Algorithm, the designers of network routing schemes have tended to move from distance vector protocols to more expressive path vector protocols, in which each node stores not just the distance and first hop of its path to a destination, but some representation of the entire path. Given knowledge of the paths, nodes can avoid updating their paths to use edges they know to be deleted; at the same time, they require significantly more storage to keep track of the full paths. In the history of the Internet, there has been a shift from distance vector protocols to path vector protocols; currently, the path vector approach is used in the Border Gateway Protocol (BGP) in the Internet core.

6.10 Negative Cycles in a Graph

So far in our consideration of the Bellman-Ford Algorithm, we have assumed that the underlying graph has negative edge costs but no negative cycles. We now consider the more general case of a graph that may contain negative cycles.

The Problem

There are two natural questions we will consider.

- How do we decide if a graph contains a negative cycle?
- How do we actually find a negative cycle in a graph that contains one?

The algorithm developed for finding negative cycles will also lead to an improved practical implementation of the Bellman-Ford Algorithm from the previous sections.

It turns out that the ideas we've seen so far will allow us to find negative cycles that have a path reaching a sink t. Before we develop the details of this, let's compare the problem of finding a negative cycle that can reach a given t with the seemingly more natural problem of finding a negative cycle anywhere in the graph, regardless of its position relative to a sink. It turns out that if we
develop a solution to the first problem, we'll be able to obtain a solution to the second problem as well, in the following way. Suppose we start with a graph G, add a new node t to it, and connect each other node v in the graph to node t via an edge of cost 0, as shown in Figure 6.25. Let us call the new "augmented graph" G'.

Figure 6.25 The augmented graph G': any negative cycle in G will be able to reach the sink t.

(6.29) The augmented graph G' has a negative cycle C such that there is a path from C to the sink t if and only if the original graph has a negative cycle.

Proof. Assume G has a negative cycle. Then this cycle C clearly has an edge to t in G', since all nodes have an edge to t.

Now suppose G' has a negative cycle with a path to t. Since no edge leaves t in G', this cycle cannot contain t. Since G' is the same as G aside from the node t, it follows that this cycle is also a negative cycle of G. ∎

So it is really enough to solve the problem of deciding whether G has a negative cycle that has a path to a given sink node t, and we do this now.

Designing and Analyzing the Algorithm

To get started thinking about the algorithm, we begin by adopting the original version of the Bellman-Ford Algorithm, which was less efficient in its use of space. We first extend the definitions of OPT(i, v) from the Bellman-Ford Algorithm, defining them for values i ≥ n. With the presence of a negative cycle in the graph, (6.22) no longer applies, and indeed the shortest path may get shorter and shorter as we go around a negative cycle. In fact, for any node v on a negative cycle that has a path to t, we have the following.

(6.30) If node v can reach node t and is contained in a negative cycle, then

    lim_{i→∞} OPT(i, v) = −∞.

If the graph has no negative cycles, then (6.22) implies the following statement.

(6.31) If there are no negative cycles in G, then OPT(i, v) = OPT(n − 1, v) for all nodes v and all i ≥ n.

But for how large an i do we have to compute the values OPT(i, v) before concluding that the graph has no negative cycles? For example, a node v may satisfy the equation OPT(n, v) = OPT(n − 1, v), and yet still lie on a negative cycle. (Do you see why?) However, it turns out that we will be in good shape if this equation holds for all nodes.

(6.32) There is no negative cycle with a path to t if and only if OPT(n, v) = OPT(n − 1, v) for all nodes v.

Proof. Statement (6.31) has already proved the forward direction. For the other direction, we use an argument employed earlier for reasoning about when it's safe to stop the Bellman-Ford Algorithm early. Specifically, suppose OPT(n, v) = OPT(n − 1, v) for all nodes v. The values of OPT(n + 1, v) can be computed from OPT(n, v); but all these values are the same as the corresponding OPT(n − 1, v). It follows that we will have OPT(n + 1, v) = OPT(n − 1, v). Extending this reasoning to future iterations, we see that none of the values will ever change again; that is, OPT(i, v) = OPT(n − 1, v) for all nodes v and all i ≥ n. Thus there cannot be a negative cycle C that has a path to t; for any node w on this cycle C, (6.30) implies that the values OPT(i, w) would have to become arbitrarily negative as i increased. ∎

Statement (6.32) gives an O(mn) method to decide if G has a negative cycle that can reach t. We compute values of OPT(i, v) for nodes of G and for values of i up to n. By (6.32), there is no negative cycle if and only if there is some value of i ≤ n at which OPT(i, v) = OPT(i − 1, v) for all nodes v.

So far we have determined whether or not the graph has a negative cycle with a path from the cycle to t, but we have not actually found the cycle. To find a negative cycle, we consider a node v such that OPT(n, v) ≠ OPT(n − 1, v): for this node, a path P from v to t of cost OPT(n, v) must use exactly n edges. We find this minimum-cost path P from v to t by tracing back through the subproblems. As in our proof of (6.22), a simple path can only have n − 1
edges, so P must contain a cycle C. We claim that this cycle C has negative cost.

(6.33) If G has n nodes and OPT(n, v) ≠ OPT(n − 1, v), then a path P from v to t of cost OPT(n, v) contains a cycle C, and C has negative cost.

Proof. First observe that the path P must have n edges, as OPT(n, v) ≠ OPT(n − 1, v), and so every path using n − 1 edges has cost greater than that of the path P. In a graph with n nodes, a path consisting of n edges must repeat a node somewhere; let w be a node that occurs on P more than once. Let C be the cycle on P between two consecutive occurrences of node w. If C were not a negative cycle, then deleting C from P would give us a v-t path with fewer than n edges and no greater cost. This contradicts our assumption that OPT(n, v) ≠ OPT(n − 1, v), and hence C must be a negative cycle. ∎

(6.34) The algorithm above finds a negative cycle in G, if such a cycle exists, and runs in O(mn) time.
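The discussion in (6.29) through (6.34) translates into a short procedure. The following Python sketch is our own illustration (the function name and edge-list representation are assumptions): it computes OPT(i, v) for i up to n using the original, less space-efficient recurrence, applies the test of (6.32), and, if a negative cycle can reach the sink t, extracts one as in (6.33) by tracing a path with n edges and locating a repeated node.

    from math import inf

    def find_negative_cycle(nodes, edges, t):
        # edges: triples (v, w, cost).  To search for a negative cycle anywhere in
        # a graph G, first build the augmented graph G' of (6.29) by adding a new
        # sink t and a 0-cost edge (v, t) for every v; here t is assumed to be it.
        n = len(nodes)
        OPT = [{v: (0 if v == t else inf) for v in nodes}]
        succ = [{v: None for v in nodes}]
        for i in range(1, n + 1):                      # values up to i = n are needed
            cur, s = dict(OPT[i - 1]), dict(succ[i - 1])
            for v, w, cost in edges:
                if OPT[i - 1][w] + cost < cur[v]:
                    cur[v] = OPT[i - 1][w] + cost
                    s[v] = w
            OPT.append(cur)
            succ.append(s)
        # (6.32): a negative cycle reaches t iff OPT(n, v) != OPT(n-1, v) for some v.
        bad = [v for v in nodes if OPT[n][v] != OPT[n - 1][v]]
        if not bad:
            return None
        # Trace the minimum-cost path with n edges from such a node; it must
        # repeat a node, and the cycle between the repetitions is negative (6.33).
        v, i, path = bad[0], n, []
        while v is not None:
            if v in path:
                return path[path.index(v):]            # nodes of a negative cycle
            path.append(v)
            v, i = succ[i][v], i - 1
        return None                                    # not reached when bad is nonempty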
Extensions: Improved Shortest Paths and Negative Cycle Detection Algorithms

At the end of Section 6.8 we discussed a space-efficient implementation of the Bellman-Ford Algorithm for graphs with no negative cycles. Here we implement the detection of negative cycles in a comparably space-efficient way. In addition to the savings in space, this will also lead to a considerable speedup in practice even for graphs with no negative cycles. The implementation will be based on the same pointer graph P derived from the "first edges" (v, first[v]) that we used for the space-efficient implementation in Section 6.8. By (6.27), we know that if the pointer graph ever has a cycle, then the cycle has negative cost, and we are done. But if G has a negative cycle, does this guarantee that the pointer graph will ever have a cycle? Furthermore, how much extra computation time do we need for periodically checking whether P has a cycle?

Ideally, we would like to determine whether a cycle is created in the pointer graph P every time we add a new edge (v, w) with first[v] = w. An additional advantage of such "instant" cycle detection will be that we will not have to wait for n iterations to see that the graph has a negative cycle: We can terminate as soon as a negative cycle is found. Earlier we saw that if a graph G has no negative cycles, the algorithm can be stopped early if in some iteration the shortest-path values M[v] remain the same for all nodes v. Instant negative cycle detection will be an analogous early termination rule for graphs that have negative cycles.

Consider a new edge (v, w), with first[v] = w, that is added to the pointer graph P. Before we add (v, w) the pointer graph has no cycles, so it consists of paths from each node v to the sink t. The most natural way to check whether adding edge (v, w) creates a cycle in P is to follow the current path from w to the terminal t in time proportional to the length of this path. If we encounter v along this path, then a cycle has been formed, and hence, by (6.27), the graph has a negative cycle. Consider Figure 6.26, for example, where in both (a) and (b) the pointer first[v] is being updated from u to w; in (a), this does not result in a (negative) cycle, but in (b) it does. However, if we trace out the sequence of pointers from v like this, then we could spend as much as O(n) time following the path to t and still not find a cycle. We now discuss a method that does not require an O(n) blow-up in the running time.

Figure 6.26 Changing the pointer graph P when first[v] is updated from u to w. In (b), this creates a (negative) cycle, whereas in (a) it does not.

We know that before the new edge (v, w) was added, the pointer graph was a directed tree. Another way to test whether the addition of (v, w) creates a cycle is to consider all nodes in the subtree directed toward v. If w is in this subtree, then (v, w) forms a cycle; otherwise it does not. (Again, consider the two sample cases in Figure 6.26.) To be able to find all nodes in the subtree directed toward v, we need to have each node v maintain a list of all other nodes whose selected edges point to v. Given these pointers, we can find the subtree in time proportional to the size of the subtree pointing to v, at most O(n) as before. However, here we will be able to make additional use of the work done. Notice that the current distance value M[x] for all nodes x in the subtree was derived from node v's old value. We have just updated v's distance, and hence we know that the distance values of all these nodes will be updated again. We'll mark each of these nodes x as "dormant," delete the
edge (x, first[x]) from the pointer graph, and not use x for future updates until its distance value changes.
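A sketch of this bookkeeping might look as follows in Python. It is our own illustration under the stated assumptions: rev[u] is taken to be the set of nodes x with first[x] = u (the reverse pointers of P), and dormant is the set of currently dormant nodes; neither name comes from the text.

    def install_pointer(v, w, first, rev, dormant):
        # Called when the update rule wants to set first[v] = w.  Returns True if
        # adding the edge (v, w) would close a cycle in the pointer graph P, which
        # by (6.27) means the graph has a negative cycle.
        subtree, stack = set(), [v]
        while stack:                          # collect the subtree directed toward v
            x = stack.pop()
            for y in rev[x]:
                if y not in subtree:
                    subtree.add(y)
                    stack.append(y)
        if w in subtree:
            return True                       # "instant" negative cycle detection
        if first[v] is not None:              # install the new pointer edge (v, w)
            rev[first[v]].discard(v)
        first[v] = w
        rev[w].add(v)
        # Every M[x] in the subtree was derived from v's old value and will change
        # again, so mark those nodes dormant and delete their pointer edges.
        for x in subtree:
            dormant.add(x)
            rev[first[x]].discard(x)
            first[x] = None
        return False

A dormant node is then skipped by the update loop and is woken up (removed from dormant) the next time its own distance value M[x] actually changes, as described above.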
This can save a lot of future work in updates, but what is the effect on the worst-case running time? We can spend as much as O(n) extra time marking nodes dormant after every update in distances. However, a node can be marked dormant only if a pointer had been defined for it at some point in the past, so the time spent on marking nodes dormant is at most as much as the time the algorithm spends updating distances.

Now consider the time the algorithm spends on operations other than marking nodes dormant. Recall that the algorithm is divided into iterations, where iteration i + 1 processes nodes whose distance has been updated in iteration i. For the original version of the algorithm, we showed in (6.26) that after i iterations, the value M[v] is no larger than the value of the shortest path from v to t using at most i edges. However, with many nodes dormant in each iteration, this may not be true anymore. For example, if the shortest path from v to t using at most i edges starts on edge e = (v, w), and w is dormant in this iteration, then we may not update the distance value M[v], and so it stays at a value higher than the length of the path through the edge (v, w). This seems like a problem; however, in this case, the path through edge (v, w) is not actually the shortest path, so M[v] will have a chance to get updated later to an even smaller value.

So instead of the simpler property that held for M[v] in the original versions of the algorithm, we now have the following claim.

(6.35) Throughout the algorithm M[v] is the length of some simple path from v to t; the path has at least i edges if the distance value M[v] is updated in iteration i; and after i iterations, the value M[v] is the length of the shortest path for all nodes v where there is a shortest v-t path using at most i edges.

Proof. The first pointers maintain a tree of paths to t, which implies that all paths used to update the distance values are simple. The fact that updates in iteration i are caused by paths with at least i edges is easy to show by induction on i. Similarly, we use induction to show that after iteration i the value M[v] is the distance for all nodes v where the shortest path from v to t uses at most i edges. Note that a node v where M[v] is the actual shortest-path distance cannot be dormant, as the value M[v] will be updated in the next iteration for all dormant nodes. ∎

Using this claim, we can see that the worst-case running time of the algorithm is still bounded by O(mn): Ignoring the time spent on marking nodes dormant, each iteration is implemented in O(m) time, and there can be at most n − 1 iterations that update values in the array M without finding a negative cycle, as simple paths can have at most n − 1 edges. Finally, the time spent marking nodes dormant is bounded by the time spent on updates. We summarize the discussion with the following claim about the worst-case performance of the algorithm. In fact, as mentioned above, this new version is in practice the fastest implementation of the algorithm even for graphs that do not have negative cycles, or even negative-cost edges.

(6.36) The improved algorithm outlined above finds a negative cycle in G if such a cycle exists. It terminates immediately if the pointer graph P of first[v] pointers contains a cycle C, or if there is an iteration in which no update occurs to any distance value M[v]. The algorithm uses O(n) space, has at most n iterations, and runs in O(mn) time in the worst case.

Solved Exercises

Solved Exercise 1

Suppose you are managing the construction of billboards on the Stephen Daedalus Memorial Highway, a heavily traveled stretch of road that runs west-east for M miles. The possible sites for billboards are given by numbers x_1, x_2, ..., x_n, each in the interval [0, M] (specifying their position along the highway, measured in miles from its western end). If you place a billboard at location x_i, you receive a revenue of r_i > 0.

Regulations imposed by the county's Highway Department require that no two of the billboards be within 5 miles of each other; that is, any two of the chosen sites must be more than 5 miles apart. You'd like to place billboards at a subset of the sites so as to maximize your total revenue, subject to this restriction.

Example. Suppose M = 20, n = 4,

    {x_1, x_2, x_3, x_4} = {6, 7, 12, 14},

and

    {r_1, r_2, r_3, r_4} = {5, 6, 5, 1}.

Then the optimal solution would be to place billboards at x_1 and x_3, for a total revenue of 10.

Give an algorithm that takes an instance of this problem as input and returns the maximum total revenue that can be obtained from any valid subset of sites. The running time of the algorithm should be polynomial in n.

Solution We can naturally apply dynamic programming to this problem if we reason as follows. Consider an optimal solution for a given input instance; in this solution, we either place a billboard at site x_n or not. If we don't, the optimal solution on sites x_1, ..., x_n is really the same as the optimal solution
on sites x_1, ..., x_{n−1}; if we do, then we should eliminate x_n and all other sites that are within 5 miles of it, and find an optimal solution on what's left. The same reasoning applies when we're looking at the problem defined by just the first j sites, x_1, ..., x_j: we either include x_j in the optimal solution or we don't, with the same consequences.

Let's define some notation to help express this. For a site x_j, we let e(j) denote the easternmost site x_i that is more than 5 miles from x_j. Since sites are numbered west to east, this means that the sites x_1, x_2, ..., x_{e(j)} are still valid options once we've chosen to place a billboard at x_j, but the sites x_{e(j)+1}, ..., x_{j−1} are not. Now, our reasoning above justifies the following recurrence. If we let OPT(j) denote the revenue from the optimal subset of sites among x_1, ..., x_j, then we have

    OPT(j) = max(r_j + OPT(e(j)), OPT(j − 1)).

As usual, we can build up the values OPT(1), ..., OPT(n) iteratively from this recurrence. The values e(j) are easy to compute in advance: scanning the sites from west to east, and keeping track of the sites x_i that lie more than 5 miles to the west of the current site x_j, we simply define e(j) to be the largest value of i for which we've seen x_i in our scan.

Here's a final observation on this problem. Clearly, the solution looks very much like that of the Weighted Interval Scheduling Problem, and there's a fundamental reason for that. In fact, our billboard placement problem can be directly encoded as an instance of Weighted Interval Scheduling, as follows. Suppose that for each site x_i, we define an interval with endpoints [x_i − 5, x_i] and weight r_i. Then, given any nonoverlapping set of intervals, the corresponding set of sites has the property that no two lie within 5 miles of each other. Conversely, given any such set of sites (no two within 5 miles), the intervals associated with them will be nonoverlapping. Thus the collections of nonoverlapping intervals correspond precisely to the set of valid billboard placements, and so dropping the set of intervals we've just defined (with their weights) into an algorithm for Weighted Interval Scheduling will yield the desired solution.
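As a concrete version of this solution, the following Python sketch (our own; the function name is not from the text) assumes the sites are already sorted from west to east and uses binary search to compute the values e(j).

    from bisect import bisect_left

    def max_billboard_revenue(x, r):
        # x: site coordinates sorted west to east; r[i]: revenue of the site at x[i].
        # OPT[j] = maximum revenue obtainable using only the first j sites.
        n = len(x)
        OPT = [0] * (n + 1)
        for j in range(1, n + 1):
            # e(j): the number of sites lying strictly more than 5 miles west of x[j-1]
            e_j = bisect_left(x, x[j - 1] - 5)
            OPT[j] = max(r[j - 1] + OPT[e_j], OPT[j - 1])
        return OPT[n]

    # The example from the text: sites {6, 7, 12, 14} with revenues {5, 6, 5, 1}.
    print(max_billboard_revenue([6, 7, 12, 14], [5, 6, 5, 1]))   # prints 10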
(a) Show that the following algorithm does not correctly solve this problem, by giving an instance on which it does not return the correct answer.

      For iterations i = 1 to n
        If h_{i+1} > l_i + l_{i+1} then
          Output "Choose no job in week i"
          Output "Choose a high-stress job in week i+1"
          Continue with iteration i+2
        Else
          Output "Choose a low-stress job in week i"
          Continue with iteration i+1
        Endif

   To avoid problems with overflowing array bounds, we define h_i = l_i = 0 when i > n.

   In your example, say what the correct answer is and also what the above algorithm finds.

(b) Give an efficient algorithm that takes values for l_1, ..., l_n and h_1, ..., h_n and returns the value of an optimal plan.
3. Let G = (V, E) be a directed graph with nodes v_1, ..., v_n. We say that G is an ordered graph if it has the following properties.

   (i) Each edge goes from a node with a lower index to a node with a higher index. That is, every directed edge has the form (v_i, v_j) with i < j.

   (ii) Each node except v_n has at least one edge leaving it. That is, for every node v_i, i = 1, 2, ..., n − 1, there is at least one edge of the form (v_i, v_j).

   The length of a path is the number of edges in it. The goal in this question is to solve the following problem (see Figure 6.29 for an example).

   Given an ordered graph G, find the length of the longest path that begins at v_1 and ends at v_n.

   Figure 6.29 The correct answer for this ordered graph is 3: the longest path from v_1 to v_n uses the three edges (v_1, v_2), (v_2, v_4), and (v_4, v_5).

   (a) Show that the following algorithm does not correctly solve this problem, by giving an example of an ordered graph on which it does not return the correct answer.

      Set w = v_1
      Set L = 0
      While there is an edge out of the node w
        Choose the edge (w, v_j)
          for which j is as small as possible
        Set w = v_j
        Increase L by 1
      end while
      Return L as the length of the longest path

   In your example, say what the correct answer is and also what the algorithm above finds.

   (b) Give an efficient algorithm that takes an ordered graph G and returns the length of the longest path that begins at v_1 and ends at v_n. (Again, the length of a path is the number of edges in the path.)

4. Suppose you're running a lightweight consulting business: just you, two associates, and some rented equipment. Your clients are distributed between the East Coast and the West Coast, and this leads to the following question.

   Each month, you can either run your business from an office in New York (NY) or from an office in San Francisco (SF). In month i, you'll incur an operating cost of N_i if you run the business out of NY; you'll incur an operating cost of S_i if you run the business out of SF. (It depends on the distribution of client demands for that month.)

   However, if you run the business out of one city in month i, and then out of the other city in month i + 1, then you incur a fixed moving cost of M to switch base offices.

   Given a sequence of n months, a plan is a sequence of n locations, each one equal to either NY or SF, such that the ith location indicates the city in which you will be based in the ith month. The cost of a plan is the sum of the operating costs for each of the n months, plus a moving cost of M for each time you switch cities. The plan can begin in either city.
   The problem. Given a value for the moving cost M, and sequences of operating costs N_1, ..., N_n and S_1, ..., S_n, find a plan of minimum cost. (Such a plan will be called optimal.)

   Example. Suppose n = 4, M = 10, and the operating costs are given by the following table.

            Month 1   Month 2   Month 3   Month 4
      NY       1         3         20        30
      SF      50        20          2         4

   Then the plan of minimum cost would be the sequence of locations

      [NY, NY, SF, SF],

   with a total cost of 1 + 3 + 2 + 4 + 10 = 20, where the final term of 10 arises because you change locations once.

   (a) Show that the following algorithm does not correctly solve this problem, by giving an instance on which it does not return the correct answer.

      For i = 1 to n
        If N_i < S_i then
          Output "NY in Month i"
        Else
          Output "SF in Month i"
      End

   In your example, say what the correct answer is and also what the algorithm above finds.

   (b) Give an example of an instance in which every optimal plan must move (i.e., change locations) at least three times. Provide a brief explanation, saying why your example has this property.

   (c) Give an efficient algorithm that takes values for n, M, and sequences of operating costs N_1, ..., N_n and S_1, ..., S_n, and returns the cost of an optimal plan.

5. As some of you know well, and others of you may be interested to learn, a number of languages (including Chinese and Japanese) are written without spaces between the words. Consequently, software that works with text written in these languages must address the word segmentation problem: inferring likely boundaries between consecutive words in the text. If English were written without spaces, the analogous problem would consist of taking a string like "meetateight" and deciding that the best segmentation is "meet at eight" (and not "me et at eight," or "meet ate ight," or any of a huge number of even less plausible alternatives). How could we automate this process?

   A simple approach that is at least reasonably effective is to find a segmentation that simply maximizes the cumulative "quality" of its individual constituent words. Thus, suppose you are given a black box that, for any string of letters x = x_1x_2...x_k, will return a number quality(x). This number can be either positive or negative; larger numbers correspond to more plausible English words. (So quality("me") would be positive, while quality("ght") would be negative.)

   Given a long string of letters y = y_1y_2...y_n, a segmentation of y is a partition of its letters into contiguous blocks of letters; each block corresponds to a word in the segmentation. The total quality of a segmentation is determined by adding up the qualities of each of its blocks. (So we'd get the right answer above provided that quality("meet") + quality("at") + quality("eight") was greater than the total quality of any other segmentation of the string.)

   Give an efficient algorithm that takes a string y and computes a segmentation of maximum total quality. (You can treat a single call to the black box computing quality(x) as a single computational step.)

   (A final note, not necessary for solving the problem: To achieve better performance, word segmentation software in practice works with a more complex formulation of the problem, for example, incorporating the notion that solutions should not only be reasonable at the word level, but also form coherent phrases and sentences. If we consider the example "theyouthevent," there are at least three valid ways to segment this into common English words, but one constitutes a much more coherent phrase than the other two. If we think of this in the terminology of formal languages, this broader problem is like searching for a segmentation that also can be parsed well according to a grammar for the underlying language. But even with these additional criteria and constraints, dynamic programming approaches lie at the heart of a number of successful segmentation systems.)

6. In a word processor, the goal of "pretty-printing" is to take text with a ragged right margin, like this,

      Call me Ishmael.
      Some years ago,
      never mind how long precisely,
      having little or no money in my purse,
      and nothing particular to interest me on shore,
      I thought I would sail about a little
      and see the watery part of the world.

   and turn it into text whose right margin is as "even" as possible, like this.

      Call me Ishmael. Some years ago, never
      mind how long precisely, having little
      or no money in my purse, and nothing
      particular to interest me on shore, I
      thought I would sail about a little
      and see the watery part of the world.

   To make this precise enough for us to start thinking about how to write a pretty-printer for text, we need to figure out what it means for the right margins to be "even." So suppose our text consists of a sequence of words, W = {w_1, w_2, ..., w_n}, where w_i consists of c_i characters. We have a maximum line length of L. We will assume we have a fixed-width font and ignore issues of punctuation or hyphenation.

   A formatting of W consists of a partition of the words in W into lines. In the words assigned to a single line, there should be a space after each word except the last; and so if w_j, w_{j+1}, ..., w_k are assigned to one line, then we should have

      Σ_{i=j}^{k−1} (c_i + 1) + c_k ≤ L.

   We will call an assignment of words to a line valid if it satisfies this inequality. The difference between the left-hand side and the right-hand side will be called the slack of the line; that is, the number of spaces left at the right margin.

   Give an efficient algorithm to find a partition of a set of words W into valid lines, so that the sum of the squares of the slacks of all lines (including the last line) is minimized.

7. As a solved exercise in Chapter 5, we gave an algorithm with O(n log n) running time for the following problem. We're looking at the price of a given stock over n consecutive days, numbered i = 1, 2, ..., n. For each day i, we have a price p(i) per share for the stock on that day. (We'll assume for simplicity that the price was fixed during each day.) We'd like to know: How should we choose a day i on which to buy the stock and a later day j > i on which to sell it, if we want to maximize the profit per share, p(j) − p(i)? (If there is no way to make money during the n days, we should conclude this instead.)

   In the solved exercise, we showed how to find the optimal pair of days i and j in time O(n log n). But, in fact, it's possible to do better than this. Show how to find the optimal numbers i and j in time O(n).

8. The residents of the underground city of Zion defend themselves through a combination of kung fu, heavy artillery, and efficient algorithms. Recently they have become interested in automated methods that can help fend off attacks by swarms of robots.

   Here's what one of these robot attacks looks like.

   - A swarm of robots arrives over the course of n seconds; in the ith second, x_i robots arrive. Based on remote sensing data, you know this sequence x_1, x_2, ..., x_n in advance.

   - You have at your disposal an electromagnetic pulse (EMP), which can destroy some of the robots as they arrive; the EMP's power depends on how long it's been allowed to charge up. To make this precise, there is a function f(·) so that if j seconds have passed since the EMP was last used, then it is capable of destroying up to f(j) robots.

   - So specifically, if it is used in the kth second, and it has been j seconds since it was previously used, then it will destroy min(x_k, f(j)) robots. (After this use, it will be completely drained.)

   - We will also assume that the EMP starts off completely drained, so if it is used for the first time in the jth second, then it is capable of destroying up to f(j) robots.

   The problem. Given the data on robot arrivals x_1, x_2, ..., x_n, and given the recharging function f(·), choose the points in time at which you're going to activate the EMP so as to destroy as many robots as possible.

   Example. Suppose n = 4, and the values of x_i and f(i) are given by the following table.

         i      1    2    3    4
         x_i    1   10   10    1
         f(i)   1    2    4    8

   The best solution would be to activate the EMP in the 3rd and the 4th seconds. In the 3rd second, the EMP has gotten to charge for 3 seconds, and so it destroys min(10, 4) = 4 robots; in the 4th second, the EMP has only gotten to charge for 1 second since its last use, and it destroys min(1, 1) = 1 robot. This is a total of 5.
   (a) Show that the following algorithm does not correctly solve this problem, by giving an instance on which it does not return the correct answer.

   ... the days on which you're going to reboot so as to maximize the total amount of data you process.

   Example. Suppose n = 4, and the values of x_i and s_i are given by the following table.
10. ... cannot appear in consecutive minutes. For example, if your job is on machine A in minute i, and you want to switch to machine B, then your choice for minute i + 1 must be move, and then your choice for minute i + 2 can be B. The value of a plan is the total number of steps that you manage to execute over the n minutes: so it's the sum of a_i over all minutes in which the job is on A, plus the sum of b_i over all minutes in which the job is on B.

   The problem. Given values a_1, ..., a_n and b_1, ..., b_n, find a plan of maximum value. (Such a strategy will be called optimal.) Note that your plan can start with either of the machines A or B in minute 1.

   Example. Suppose n = 4, and the values of a_i and b_i are given by the following table.

         Minute 1   Minute 2   Minute 3   Minute 4
      A     10          1          1         10
      B      5          1         20         20

   Then the plan of maximum value would be to choose A for minute 1, then move for minute 2, and then B for minutes 3 and 4. The value of this plan would be 10 + 0 + 20 + 20 = 50.

   (a) Show that the following algorithm does not correctly solve this problem, by giving an instance on which it does not return the correct answer.

      In minute 1, choose the machine achieving the larger of a_1, b_1
      Set i = 2
      While i < n
        What was the choice in minute i−1?
        If A:
          If b_{i+1} > a_i + a_{i+1} then
            Choose move in minute i and B in minute i+1
            Proceed to iteration i+2
          Else
            Choose A in minute i
            Proceed to iteration i+1
          Endif
        If B: behave as above, with the roles of A and B reversed
      EndWhile

   In your example, say what the correct answer is and also what the algorithm above finds.

   (b) Give an efficient algorithm that takes values for a_1, ..., a_n and b_1, ..., b_n and returns the value of an optimal plan.

11. Suppose you're consulting for a company that manufactures PC equipment and ships it to distributors all over the country. For each of the next n weeks, they have a projected supply s_i of equipment (measured in pounds), which has to be shipped by an air freight carrier.

   Each week's supply can be carried by one of two air freight companies, A or B.

   - Company A charges a fixed rate r per pound (so it costs r · s_i to ship a week's supply s_i).

   - Company B makes contracts for a fixed amount c per week, independent of the weight. However, contracts with company B must be made in blocks of four consecutive weeks at a time.

   A schedule, for the PC company, is a choice of air freight company (A or B) for each of the n weeks, with the restriction that company B, whenever it is chosen, must be chosen for blocks of four contiguous weeks at a time. The cost of the schedule is the total amount paid to company A and B, according to the description above.

   Give a polynomial-time algorithm that takes a sequence of supply values s_1, s_2, ..., s_n and returns a schedule of minimum cost.

   Example. Suppose r = 1, c = 10, and the sequence of supply values is such that the optimal schedule is to choose company A for the first three weeks, then company B for a block of four consecutive weeks, and then company A for the final three weeks.

12. Suppose we want to replicate a file over a collection of n servers, labeled S_1, S_2, ..., S_n. To place a copy of the file at server S_i results in a placement cost of c_i, for an integer c_i > 0.

   Now, if a user requests the file from server S_i, and no copy of the file is present at S_i, then the servers S_{i+1}, S_{i+2}, S_{i+3}, ... are searched in order until a copy of the file is finally found, say at server S_j, where j > i. This results in an access cost of j − i. (Note that the lower-indexed servers S_{i−1}, S_{i−2}, ... are not consulted in this search.) The access cost is 0 if S_i holds a copy of the file. We will require that a copy of the file be placed at server S_n, so that all such searches will terminate, at the latest, at S_n.
   We'd like to place copies of the files at the servers so as to minimize the sum of placement and access costs. Formally, we say that a configuration is a choice, for each server S_i with i = 1, 2, ..., n − 1, of whether to place a copy of the file at S_i or not. (Recall that a copy is always placed at S_n.) The total cost of a configuration is the sum of all placement costs for servers with a copy of the file, plus the sum of all access costs associated with all n servers.

   Give a polynomial-time algorithm to find a configuration of minimum total cost.

13. The problem of searching for cycles in graphs arises naturally in financial trading applications. Consider a firm that trades shares in n different companies. For each pair i ≠ j, they maintain a trade ratio r_ij, meaning that one share of i trades for r_ij shares of j. Here we allow the rate r to be fractional; that is, r_ij = 2/3 means that you can trade three shares of i to get two shares of j.

   A trading cycle for a sequence of shares i_1, i_2, ..., i_k consists of successively trading shares in company i_1 for shares in company i_2, then shares in company i_2 for shares in i_3, and so on, finally trading shares in i_k back to shares in company i_1. After such a sequence of trades, one ends up with shares in the same company i_1 that one starts with. Trading around a cycle is usually a bad idea, as you tend to end up with fewer shares than you started with. But occasionally, for short periods of time, there are opportunities to increase shares. We will call such a cycle an opportunity cycle, if trading along the cycle increases the number of shares. This happens exactly if the product of the ratios along the cycle is above 1. In analyzing the state of the market, a firm engaged in trading would like to know if there are any opportunity cycles.

   Give a polynomial-time algorithm that finds such an opportunity cycle, if one exists.

14. A large collection of mobile wireless devices can naturally form a network in which the devices are the nodes, and two devices x and y are connected by an edge if they are able to directly communicate with each other (e.g., by a short-range radio link). Such a network of wireless devices is a highly dynamic object, in which edges can appear and disappear over time as the devices move around. For instance, an edge (x, y) might disappear as x and y move far apart from each other and lose the ability to communicate directly.

   In a network that changes over time, it is natural to look for efficient ways of maintaining a path between certain designated nodes. There are two opposing concerns in maintaining such a path: we want paths that are short, but we also do not want to have to change the path frequently as the network structure changes. (That is, we'd like a single path to continue working, if possible, even as the network gains and loses edges.) Here is a way we might model this problem.

   Suppose we have a set of mobile nodes V, and at a particular point in time there is a set E_0 of edges among these nodes. As the nodes move, the set of edges changes from E_0 to E_1, then to E_2, then to E_3, and so on, to an edge set E_b. For i = 0, 1, 2, ..., b, let G_i denote the graph (V, E_i). So if we were to watch the structure of the network on the nodes V as a "time lapse," it would look precisely like the sequence of graphs G_0, G_1, G_2, ..., G_{b−1}, G_b. We will assume that each of these graphs G_i is connected.

   Now consider two particular nodes s, t ∈ V. For an s-t path P in one of the graphs G_i, we define the length of P to be simply the number of edges in P, and we denote this ℓ(P). Our goal is to produce a sequence of paths P_0, P_1, ..., P_b so that for each i, P_i is an s-t path in G_i. We want the paths to be relatively short. We also do not want there to be too many changes: points at which the identity of the path switches. Formally, we define changes(P_0, P_1, ..., P_b) to be the number of indices i (0 ≤ i ≤ b − 1) for which P_i ≠ P_{i+1}.

   Fix a constant K > 0. We define the cost of the sequence of paths P_0, P_1, ..., P_b to be

      cost(P_0, P_1, ..., P_b) = Σ_{i=0}^{b} ℓ(P_i) + K · changes(P_0, P_1, ..., P_b).

   (a) Suppose it is possible to choose a single path P that is an s-t path in each of the graphs G_0, G_1, ..., G_b. Give a polynomial-time algorithm to find the shortest such path.

   (b) Give a polynomial-time algorithm to find a sequence of paths P_0, P_1, ..., P_b of minimum cost, where P_i is an s-t path in G_i for i = 0, 1, ..., b.

15. On most clear days, a group of your friends in the Astronomy Department gets together to plan out the astronomical events they're going to try observing that night. We'll make the following assumptions about the events.

   - There are n events, which for simplicity we'll assume occur in sequence separated by exactly one minute each. Thus event j occurs at minute j; if they don't observe this event at exactly minute j, then they miss out on it.
   - The sky is mapped according to a one-dimensional coordinate system (measured in degrees from some central baseline); event j will be taking place at coordinate d_j, for some integer value d_j. The telescope starts at coordinate 0 at minute 0.

   - The last event, n, is much more important than the others; so it is required that they observe event n.

   The Astronomy Department operates a large telescope that can be used for viewing these events. Because it is such a complex instrument, it can only move at a rate of one degree per minute. Thus they do not expect to be able to observe all n events; they just want to observe as many as possible, limited by the operation of the telescope and the requirement that event n must be observed.

   We say that a subset S of the events is viewable if it is possible to observe each event j ∈ S at its appointed time j, and the telescope has adequate time (moving at its maximum of one degree per minute) to move between consecutive events in S.

   The problem. Given the coordinates of each of the n events, find a viewable subset of maximum size, subject to the requirement that it should contain event n. Such a solution will be called optimal.

   Example. Suppose the one-dimensional coordinates of the events are as shown here.

      Event        1   2   3   4   5   6   7   8   9
      Coordinate   1  -4  -1   4   5  -4   6   7  -2

   Then the optimal solution is to observe events 1, 3, 6, 9. Note that the telescope has time to move from one event in this set to the next, even moving at one degree per minute.

   (a) Show that the following algorithm does not correctly solve this problem, by giving an instance on which it does not return the correct answer.

      Mark all events j with |d_n − d_j| > n − j as illegal (as
        observing them would prevent you from observing event n)
      Mark all other events as legal
      Initialize current position to coordinate 0 at minute 0
      While not at end of event sequence
        Find the earliest legal event j that can be reached without
          exceeding the maximum movement rate of the telescope
        Add j to the set S
        Update current position to be coordinate d_j at minute j
      Endwhile
      Output the set S

   In your example, say what the correct answer is and also what the algorithm above finds.

   (b) Give an efficient algorithm that takes values for the coordinates d_1, d_2, ..., d_n of the events and returns the size of an optimal solution.

16. There are many sunny days in Ithaca, New York; but this year, as it happens, the spring ROTC picnic at Cornell has fallen on a rainy day. The ranking officer decides to postpone the picnic and must notify everyone by phone. Here is the mechanism she uses to do this.

   Each ROTC person on campus except the ranking officer reports to a unique superior officer. Thus the reporting hierarchy can be described by a tree T, rooted at the ranking officer, in which each other node v has a parent node u equal to his or her superior officer. Conversely, we will call v a direct subordinate of u. See Figure 6.30, in which A is the ranking officer, B and D are the direct subordinates of A, and C is the direct subordinate of B.

   Figure 6.30 A hierarchy with four people. The fastest broadcast scheme is for A to call B in the first round. In the second round, A calls D and B calls C. If A were to call D first, then C could not learn the news until the third round; so A should call B before D.

   To notify everyone of the postponement, the ranking officer first calls each of her direct subordinates, one at a time. As soon as each subordinate gets the phone call, he or she must notify each of his or her direct subordinates, one at a time. The process continues this way until everyone has been notified. Note that each person in this process can only call direct subordinates on the phone; for example, in Figure 6.30, A would not be allowed to call C.

   We can picture this process as being divided into rounds. In one round, each person who has already learned of the postponement can call one of his or her direct subordinates on the phone. The number of rounds it takes for everyone to be notified depends on the sequence in which each person calls their direct subordinates. For example, in Figure 6.30, it will take only two rounds if A starts by calling B, but it will take three rounds if A starts by calling D.

   Give an efficient algorithm that determines the minimum number of rounds needed for everyone to be notified, and outputs a sequence of phone calls that achieves this minimum number of rounds.

17. Your friends have been studying the closing prices of tech stocks, looking for interesting patterns. They've defined something called a rising trend, as follows.
   They have the closing price for a given stock recorded for n days in succession; let these prices be P[1], P[2], ..., P[n]. A rising trend in these prices is a subsequence of the prices P[i_1], P[i_2], ..., P[i_k], for days i_1 < i_2 < ... < i_k, so that

   - i_1 = 1, and
   - P[i_j] < P[i_{j+1}] for each j = 1, 2, ..., k − 1.

   Thus a rising trend is a subsequence of the days, beginning on the first day and not necessarily contiguous, so that the price strictly increases over the days in this subsequence.

   They are interested in finding the longest rising trend in a given sequence of prices.

   Example. Suppose n = 7, and the sequence of prices is such that the longest rising trend is given by the prices on days 1, 4, and 7. Note that days 2, 3, 5, and 6 consist of increasing prices; but because this subsequence does not begin on day 1, it does not fit the definition of a rising trend.

   (a) Show that the following algorithm does not correctly return the length of the longest rising trend, by giving an instance on which it fails to return the correct answer.

      Define i = 1
      L = 1
      For j = 2 to n
        If P[j] > P[i] then
          Set i = j
          Add 1 to L
        Endif
      Endfor

   In your example, give the actual length of the longest rising trend, and say what the algorithm above returns.

   (b) Give an efficient algorithm that takes a sequence of prices P[1], ..., P[n] and returns the length of the longest rising trend.

18. Consider the sequence alignment problem over a four-letter alphabet {z_1, z_2, z_3, z_4}, with a given gap cost and given mismatch costs. Assume that each of these parameters is a positive integer.

   Suppose you are given two strings A = a_1a_2...a_m and B = b_1b_2...b_n and a proposed alignment between them. Give an O(mn) algorithm to decide whether this alignment is the unique minimum-cost alignment between A and B.

19. You're consulting for a group of people (who would prefer not to be mentioned here by name) whose jobs consist of monitoring and analyzing electronic signals coming from ships in coastal Atlantic waters. They want a fast algorithm for a basic primitive that arises frequently: "untangling" a superposition of two known signals. Specifically, they're picturing a situation in which each of two ships is emitting a short sequence of 0s and 1s over and over, and they want to make sure that the signal they're hearing is simply an interleaving of these two emissions, with nothing extra added in.

   This describes the whole problem; we can make it a little more explicit as follows. Given a string x consisting of 0s and 1s, we write x^k to denote k copies of x concatenated together. We say that a string x' is a repetition of x if it is a prefix of x^k for some number k. So x' = 10110110110 is a repetition of x = 101.

   We say that a string s is an interleaving of x and y if its symbols can be partitioned into two (not necessarily contiguous) subsequences s' and s", so that s' is a repetition of x and s" is a repetition of y. (So each symbol in s must belong to exactly one of s' or s".) For example, if x = 101 and y = 00, then s = 100010101 is an interleaving of x and y, since characters 1, 2, 5, 7, 8, 9 form 101101, a repetition of x, and the remaining characters 3, 4, 6 form 000, a repetition of y.

   In terms of our application, x and y are the repeating sequences from the two ships, and s is the signal we're listening to: We want to make sure s "unravels" into simple repetitions of x and y. Give an efficient algorithm that takes strings s, x, and y and decides if s is an interleaving of x and y.

20. Suppose it's nearing the end of the semester and you're taking n courses, each with a final project that still has to be done. Each project will be graded on the following scale: It will be assigned an integer number on a scale of 1 to g > 1, higher numbers being better grades. Your goal, of course, is to maximize your average grade on the n projects.

   You have a total of H > n hours in which to work on the n projects cumulatively, and you want to decide how to divide up this time. For simplicity, assume H is a positive integer, and you'll spend an integer number of hours on each project. To figure out how best to divide up your time, you have come up with functions {f_1, ..., f_n} (rough
estimates, of course) for each of your n courses; if you spend h ≤ H hours on the project for course i, you'll get a grade of f_i(h). (You may assume that the functions f_i are nondecreasing: if h < h', then f_i(h) ≤ f_i(h').)

   So the problem is: Given these functions {f_i}, decide how many hours to spend on each project (in integer values only) so that your average grade, as computed according to the f_i, is as large as possible. In order to be efficient, the running time of your algorithm should be polynomial in n, g, and H; none of these quantities should appear as an exponent in your running time.

21. Some time back, you helped a group of friends who were doing simulations for a computation-intensive investment company, and they've come back to you with a new problem. They're looking at n consecutive days of a given stock, at some point in the past. The days are numbered i = 1, 2, ..., n; for each day i, they have a price p(i) per share for the stock on that day.

   For certain (possibly large) values of k, they want to study what they call k-shot strategies. A k-shot strategy is a collection of m pairs of days (b_1, s_1), ..., (b_m, s_m), where 0 ≤ m ≤ k and

      1 ≤ b_1 < s_1 < b_2 < s_2 < ... < b_m < s_m ≤ n.

   We view these as a set of up to k nonoverlapping intervals, during each of which the investors buy 1,000 shares of the stock (on day b_i) and then sell it (on day s_i). The return of a given k-shot strategy is simply the profit obtained from the m buy-sell transactions, namely,

      1,000 · Σ_{i=1}^{m} (p(s_i) − p(b_i)).

   The investors want to assess the value of k-shot strategies by running simulations on their n-day trace of the stock price. Your goal is to design an efficient algorithm that determines, given the sequence of prices, the k-shot strategy with the maximum possible return. Since k may be relatively large in these simulations, your running time should be polynomial in both n and k; it should not contain k in the exponent.

22. To assess how "well-connected" two nodes in a directed graph are, one can not only look at the length of the shortest path between them, but can also count the number of shortest paths.

   This turns out to be a problem that can be solved efficiently, subject to some restrictions on the edge costs. Suppose we are given a directed graph G = (V, E), with costs on the edges; the costs may be positive or negative, but every cycle in the graph has strictly positive cost. We are also given two nodes v, w ∈ V. Give an efficient algorithm that computes the number of shortest v-w paths in G. (The algorithm should not list all the paths; just the number suffices.)

23. Suppose you are given a directed graph G = (V, E) with costs c_e on the edges e ∈ E and a sink t (costs may be negative). Assume that you also have finite values d(v) for v ∈ V. Someone claims that, for each node v ∈ V, the quantity d(v) is the cost of the minimum-cost path from node v to the sink t.

   (a) Give a linear-time algorithm (time O(m) if the graph has m edges) that verifies whether this claim is correct.

   (b) Assume that the distances are correct, and d(v) is finite for all v ∈ V. Now you need to compute distances to a different sink t'. Give an O(m log n) algorithm for computing distances d'(v) for all nodes v ∈ V to the sink node t'. (Hint: It is useful to consider a new cost function defined as follows: for edge e = (v, w), let c'_e = c_e − d(v) + d(w). Is there a relation between costs of paths for the two different costs c and c'?)

24. Gerrymandering is the practice of carving up electoral districts in very careful ways so as to lead to outcomes that favor a particular political party. Recent court challenges to the practice have argued that through this calculated redistricting, large numbers of voters are being effectively (and intentionally) disenfranchised.

   Computers, it turns out, have been implicated as the source of some of the "villainy" in the news coverage on this topic: Thanks to powerful software, gerrymandering has changed from an activity carried out by a bunch of people with maps, pencil, and paper into the industrial-strength process that it is today. Why is gerrymandering a computational problem? There are database issues involved in tracking voter demographics down to the level of individual streets and houses; and there are algorithmic issues involved in grouping voters into districts. Let's think a bit about what these latter issues look like.

   Suppose we have a set of n precincts P_1, P_2, ..., P_n, each containing m registered voters. We're supposed to divide these precincts into two districts, each consisting of n/2 of the precincts. Now, for each precinct, we have information on how many voters are registered to each of two political parties. (Suppose, for simplicity, that every voter is registered to one of these two.) We'll say that the set of precincts is susceptible to gerrymandering if it is possible to perform the division into two districts in such a way that the same party holds a majority in both districts.
    Give an algorithm to determine whether a given set of precincts is susceptible to gerrymandering; the running time of your algorithm should be polynomial in n and m.

    Example. Suppose we have n = 4 precincts, and the following information on registered voters.

        Precinct                         1    2    3    4
        Number registered for party A    55   43   60   47
        Number registered for party B    45   57   40   53

    This set of precincts is susceptible since, if we grouped precincts 1 and 4 into one district, and precincts 2 and 3 into the other, then party A would have a majority in both districts. (Presumably, the "we" who are doing the grouping here are members of party A.) This example is a quick illustration of the basic unfairness in gerrymandering: Although party A holds only a slim majority in the overall population (205 to 195), it ends up with a majority in not one but both districts.

25. Consider the problem faced by a stockbroker trying to sell a large number of shares of stock in a company whose stock price has been steadily falling in value. It is always hard to predict the right moment to sell stock, but owning a lot of shares in a single company adds an extra complication: the mere act of selling many shares in a single day will have an adverse effect on the price.

    Since future market prices, and the effect of large sales on these prices, are very hard to predict, brokerage firms use models of the market to help them make such decisions. In this problem, we will consider the following simple model. Suppose we need to sell x shares of stock in a company, and suppose that we have an accurate model of the market: it predicts that the stock price will take the values p1, p2, ..., pn over the next n days. Moreover, there is a function f(·) that predicts the effect of large sales: if we sell y shares on a single day, it will permanently decrease the price by f(y) from that day onward. So, if we sell y1 shares on day 1, we obtain a price per share of p1 − f(y1), for a total income of y1 · (p1 − f(y1)). Having sold y1 shares on day 1, we can then sell y2 shares on day 2 for a price per share of p2 − f(y1) − f(y2); this yields an additional income of y2 · (p2 − f(y1) − f(y2)). This process continues over all n days. (Note, as in our calculation for day 2, that the decreases from earlier days are absorbed into the prices for all later days.)

    Design an efficient algorithm that takes the prices p1, ..., pn and the function f(·) (written as a list of values f(1), f(2), ..., f(x)) and determines the best way to sell x shares by day n. In other words, find natural numbers y1, y2, ..., yn so that x = y1 + ... + yn, and selling yi shares on day i for i = 1, 2, ..., n maximizes the total income achievable. You should assume that the share value pi is monotone decreasing, and f(·) is monotone increasing; that is, selling a larger number of shares causes a larger drop in the price. Your algorithm's running time can have a polynomial dependence on n (the number of days), x (the number of shares), and p1 (the peak price of the stock).

    Example. Consider the case when n = 3; the prices for the three days are 90, 80, 40; and f(y) = 1 for y ≤ 40,000 and f(y) = 20 for y > 40,000. Assume you start with x = 100,000 shares. Selling all of them on day 1 would yield a price of 70 per share, for a total income of 7,000,000. On the other hand, selling 40,000 shares on day 1 yields a price of 89 per share, and selling the remaining 60,000 shares on day 2 results in a price of 59 per share, for a total income of 7,100,000.

26. Consider the following inventory problem. You are running a company that sells some large product (let's assume you sell trucks), and predictions tell you the quantity of sales to expect over the next n months. Let di denote the number of sales you expect in month i. We'll assume that all sales happen at the beginning of the month, and trucks that are not sold are stored until the beginning of the next month. You can store at most S trucks, and it costs C to store a single truck for a month. You receive shipments of trucks by placing orders for them, and there is a fixed ordering fee of K each time you place an order (regardless of the number of trucks you order). You start out with no trucks. The problem is to design an algorithm that decides how to place orders so that you satisfy all the demands {di}, and minimize the costs. In summary:

    * There are two parts to the cost: (1) storage: it costs C for every truck on hand that is not needed that month; (2) ordering fees: it costs K for every order placed.
    * In each month you need enough trucks to satisfy the demand di, but the number left over after satisfying the demand for the month should not exceed the inventory limit S.

    Give an algorithm that solves this problem in time that is polynomial in n and S.

27. The owners of an independently operated gas station are faced with the following situation. They have a large underground tank in which they store gas; the tank can hold up to L gallons at one time. Ordering gas is quite expensive, so they want to order relatively rarely. For each order,
    they need to pay a fixed price P for delivery in addition to the cost of the gas ordered. However, it costs c to store a gallon of gas for an extra day, so ordering too much ahead increases the storage cost.

    They are planning to close for a week in the winter, and they want their tank to be empty by the time they close. Luckily, based on years of experience, they have accurate projections for how much gas they will need each day until this point in time. Assume that there are n days left until they close, and they need gi gallons of gas for each of the days i = 1, ..., n. Assume that the tank is empty at the end of day 0. Give an algorithm to decide on which days they should place orders, and how much to order so as to minimize their total cost.

28. Recall the scheduling problem from Section 4.2 in which we sought to minimize the maximum lateness. There are n jobs, each with a deadline di and a required processing time ti, and all jobs are available to be scheduled starting at time s. For a job i to be done, it needs to be assigned a period from si ≥ s to fi = si + ti, and different jobs should be assigned nonoverlapping intervals. As usual, such an assignment of times will be called a schedule.

    In this problem, we consider the same setup, but want to optimize a different objective. In particular, we consider the case in which each job must either be done by its deadline or not at all. We'll say that a subset J of the jobs is schedulable if there is a schedule for the jobs in J so that each of them finishes by its deadline. Your problem is to select a schedulable subset of maximum possible size and give a schedule for this subset that allows each job to finish by its deadline.

    (a) Prove that there is an optimal solution J (i.e., a schedulable set of maximum size) in which the jobs in J are scheduled in increasing order of their deadlines.

    (b) Assume that all deadlines di and required times ti are integers. Give an algorithm to find an optimal solution. Your algorithm should run in time polynomial in the number of jobs n, and the maximum deadline D = max_i di.

29. Let G = (V, E) be a graph with n nodes in which each pair of nodes is joined by an edge. There is a positive weight wij on each edge (i, j); and we will assume these weights satisfy the triangle inequality wik ≤ wij + wjk. For a subset V' ⊆ V, we will use G[V'] to denote the subgraph (with edge weights) induced on the nodes in V'.

    We are given a set X ⊆ V of k terminals that must be connected by edges. We say that a Steiner tree on X is a set Z so that X ⊆ Z ⊆ V, together with a spanning subtree T of G[Z]. The weight of the Steiner tree is the weight of the tree T.

    Show that there is a function f(·) and a polynomial function p(·) so that the problem of finding a minimum-weight Steiner tree on X can be solved in time O(f(k) · p(n)).

Notes and Further Reading

Richard Bellman is credited with pioneering the systematic study of dynamic programming (Bellman 1957); the algorithm in this chapter for segmented least squares is based on Bellman's work from this early period (Bellman 1961). Dynamic programming has since grown into a technique that is widely used across computer science, operations research, control theory, and a number of other areas. Much of the recent work on this topic has been concerned with stochastic dynamic programming: Whereas our problem formulations tended to tacitly assume that all input is known at the outset, many problems in scheduling, production and inventory planning, and other domains involve uncertainty, and dynamic programming algorithms for these problems encode this uncertainty using a probabilistic formulation. The book by Ross (1983) provides an introduction to stochastic dynamic programming.

Many extensions and variations of the Knapsack Problem have been studied in the area of combinatorial optimization. As we discussed in the chapter, the pseudo-polynomial bound arising from dynamic programming can become prohibitive when the input numbers get large; in these cases, dynamic programming is often combined with other heuristics to solve large instances of Knapsack Problems in practice. The book by Martello and Toth (1990) is devoted to computational approaches to versions of the Knapsack Problem.

Dynamic programming emerged as a basic technique in computational biology in the early 1970s, in a flurry of activity on the problem of sequence comparison. Sankoff (2000) gives an interesting historical account of the early work in this period. The books by Waterman (1995) and Gusfield (1997) provide extensive coverage of sequence alignment algorithms (as well as many related algorithms in computational biology); Mathews and Zuker (2004) discuss further approaches to the problem of RNA secondary structure prediction. The space-efficient algorithm for sequence alignment is due to Hirschberg (1975).

The algorithm for the Shortest-Path Problem described in this chapter is based originally on the work of Bellman (1958) and Ford (1956). Many optimizations, motivated both by theoretical and experimental considerations,
have been added to this basic approach to shortest paths; a Web site main-
tained by Andrew Goldberg contains state-of-the-art code that he has de-
veloped for this problem (among a number of others), based on work by
Cherkassky, Goldberg and Radzik (1994). The applications of shortest-path
methods to Internet routing, and the trade-offs among the different algorithms
for networking applications, are covered in books by Bertsekas and Gallager
(1992), Keshav (1997), and Stewart (1998).
Notes on the Exercises Exercise 5 is based on discussions with Lillian Lee;
Exercise 6 is based on a result of Donald Knuth; Exercise 25 is based on results
of Dimitris Bertsimas and Andrew Lo; and Exercise 29 is based on a result of
S. Dreyfus and R. Wagner.
distinct sets of objects, such as the relation between customers and stores; or houses and nearby fire stations; and so forth.

One of the oldest problems in combinatorial algorithms is that of determining the size of the largest matching in a bipartite graph G. (As a special case, note that G has a perfect matching if and only if |X| = |Y| and it has a matching of size |X|.) This problem turns out to be solvable by an algorithm that runs in polynomial time, but the development of this algorithm needs ideas fundamentally different from the techniques that we've seen so far. Rather than developing the algorithm directly, we begin by formulating a general class of problems--network flow problems--that includes the Bipartite Matching Problem as a special case. We then develop a polynomial-time algorithm for a general problem, the Maximum-Flow Problem, and show how this provides an efficient algorithm for Bipartite Matching as well. While the initial motivation for network flow problems comes from the issue of traffic in a network, we will see that they have applications in a surprisingly diverse set of areas and lead to efficient algorithms not just for Bipartite Matching, but for a host of other problems as well.

7.1 The Maximum-Flow Problem and the Ford-Fulkerson Algorithm

The Problem
One often uses graphs to model transportation networks--networks whose edges carry some sort of traffic and whose nodes act as "switches" passing traffic between different edges. Consider, for example, a highway system in which the edges are highways and the nodes are interchanges; or a computer network in which the edges are links that can carry packets and the nodes are switches; or a fluid network in which edges are pipes that carry liquid, and the nodes are junctures where pipes are plugged together. Network models of this type have several ingredients: capacities on the edges, indicating how much they can carry; source nodes in the graph, which generate traffic; sink (or destination) nodes in the graph, which can "absorb" traffic as it arrives; and finally, the traffic itself, which is transmitted across the edges.

Flow Networks. We'll be considering graphs of this form, and we refer to the traffic as flow--an abstract entity that is generated at source nodes, transmitted across edges, and absorbed at sink nodes. Formally, we'll say that a flow network is a directed graph G = (V, E) with the following features.
o Associated with each edge e is a capacity, which is a nonnegative number that we denote ce.
o There is a single source node s ∈ V.
o There is a single sink node t ∈ V.
Nodes other than s and t will be called internal nodes.

We will make three assumptions about the flow networks we deal with: first, that no edge enters the source s and no edge leaves the sink t; second, that there is at least one edge incident to each node; and third, that all capacities are integers. These assumptions make things cleaner to think about, and while they eliminate a few pathologies, they preserve essentially all the issues we want to think about.

Figure 7.2 illustrates a flow network with four nodes and five edges, and capacity values given next to each edge.

Figure 7.2 A flow network, with source s and sink t. The numbers next to the edges are the capacities.

Defining Flow. Next we define what it means for our network to carry traffic, or flow. We say that an s-t flow is a function f that maps each edge e to a nonnegative real number, f : E → R+; the value f(e) intuitively represents the amount of flow carried by edge e. A flow f must satisfy the following two properties.¹

(i) (Capacity conditions) For each e ∈ E, we have 0 ≤ f(e) ≤ ce.
(ii) (Conservation conditions) For each node v other than s and t, we have

        Σ_{e into v} f(e) = Σ_{e out of v} f(e).

Here Σ_{e into v} f(e) sums the flow value f(e) over all edges entering node v, while Σ_{e out of v} f(e) is the sum of flow values over all edges leaving node v.

Thus the flow on an edge cannot exceed the capacity of the edge. For every node other than the source and the sink, the amount of flow entering must equal the amount of flow leaving. The source has no entering edges (by our assumption), but it is allowed to have flow going out; in other words, it can generate flow. Symmetrically, the sink is allowed to have flow coming in, even though it has no edges leaving it. The value of a flow f, denoted v(f), is defined to be the amount of flow generated at the source:

        v(f) = Σ_{e out of s} f(e).

¹ Our notion of flow models traffic as it goes through the network at a steady rate. We have a single variable f(e) to denote the amount of flow on edge e. We do not model bursty traffic, where the flow fluctuates over time.
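The two conditions and the definition of v(f) translate directly into a short validity check. The sketch below is ours, not the book's: it stores capacities and a flow as Python dictionaries keyed by directed edges (u, v), and the function names are illustrative assumptions.

    def is_valid_flow(nodes, capacity, flow, s, t):
        """Check the capacity and conservation conditions for an s-t flow."""
        # Capacity conditions: 0 <= f(e) <= c_e on every edge.
        for e, c in capacity.items():
            if not (0 <= flow.get(e, 0) <= c):
                return False
        # Conservation conditions: flow in equals flow out at every internal node.
        for v in nodes:
            if v in (s, t):
                continue
            f_in = sum(val for (u, w), val in flow.items() if w == v)
            f_out = sum(val for (u, w), val in flow.items() if u == v)
            if f_in != f_out:
                return False
        return True

    def flow_value(flow, s):
        # v(f): the total amount of flow generated at the source s.
        return sum(val for (u, w), val in flow.items() if u == s)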
To make the notation more compact, we define f^out(v) = Σ_{e out of v} f(e) and f^in(v) = Σ_{e into v} f(e). We can extend this to sets of vertices: if S ⊆ V, we define f^out(S) = Σ_{e out of S} f(e) and f^in(S) = Σ_{e into S} f(e). In this terminology, the conservation condition for nodes v ≠ s, t becomes f^in(v) = f^out(v); and we can write v(f) = f^out(s).

The Maximum-Flow Problem. Given a flow network, a natural goal is to arrange the traffic so as to make as efficient use as possible of the available capacity. Thus the basic algorithmic problem we will consider is the following: Given a flow network, find a flow of maximum possible value.

So we include the edge e = (u, v) in Gf, with a capacity of ce − f(e). We will call edges included this way forward edges.
o For each edge e = (u, v) of G on which f(e) > 0, there are f(e) units of flow that we can "undo" if we want to, by pushing flow backward, so we include the edge e' = (v, u) in Gf, with a capacity of f(e). Note that e' has the same ends as e, but its direction is reversed; we will call edges included this way backward edges.

This completes the definition of the residual graph Gf. Note that each edge e in G can give rise to one or two edges in Gf: If 0 < f(e) < ce it results in both a forward edge and a backward edge being included in Gf. Thus Gf has at most twice as many edges as G. We will sometimes refer to the capacity of an edge in the residual graph as a residual capacity, to help distinguish it from the capacity of the corresponding edge in the original flow network G.

Augmenting Paths in a Residual Graph. Now we want to make precise the way in which we push flow from s to t in Gf. Let P be a simple s-t path in Gf--that is, P does not visit any node more than once. We define bottleneck(P, f) to be the minimum residual capacity of any edge on P, with respect to the flow f. We now define the following operation augment(f, P), which yields a new flow f' in G.

    augment(f, P)
      Let b = bottleneck(P, f)
      For each edge (u, v) ∈ P
        If e = (u, v) is a forward edge then
          increase f(e) in G by b
        Else ((u, v) is a backward edge, and let e = (v, u))
          decrease f(e) in G by b
        Endif
      Endfor
      Return(f)

It was purely to be able to perform this operation that we defined the residual graph; to reflect the importance of augment, one often refers to any s-t path in the residual graph as an augmenting path.

Figure 7.4 (a) The graph G with the path s, u, v, t used to push the first 20 units of flow. (b) The residual graph of the resulting flow f, with the residual capacity next to each edge. The dotted line is the new augmenting path. (c) The residual graph after pushing an additional 10 units of flow along the new augmenting path s, v, u, t.

The result of augment(f, P) is a new flow f' in G, obtained by increasing and decreasing the flow values on edges of P. Let us first verify that f' is indeed a flow.

(7.1) f' is a flow in G.

Proof. We must verify the capacity and conservation conditions.

Since f' differs from f only on edges of P, we need to check the capacity conditions only on these edges. Thus, let (u, v) be an edge of P. Informally, the capacity condition continues to hold because if e = (u, v) is a forward edge, we specifically avoided increasing the flow on e above ce; and if (u, v) is a backward edge arising from edge e = (v, u) ∈ E, we specifically avoided decreasing the flow on e below 0. More concretely, note that bottleneck(P, f) is no larger than the residual capacity of (u, v). If e = (u, v) is a forward edge, then its residual capacity is ce − f(e); thus we have

        0 ≤ f(e) ≤ f'(e) = f(e) + bottleneck(P, f) ≤ f(e) + (ce − f(e)) = ce,

so the capacity condition holds. If (u, v) is a backward edge arising from edge e = (v, u) ∈ E, then its residual capacity is f(e), so we have

        ce ≥ f(e) ≥ f'(e) = f(e) − bottleneck(P, f) ≥ f(e) − f(e) = 0,

and again the capacity condition holds.

We need to check the conservation condition at each internal node that lies on the path P. Let v be such a node; we can verify that the change in the amount of flow entering v is the same as the change in the amount of flow exiting v; since f satisfied the conservation condition at v, so must f'. Technically, there are four cases to check, depending on whether the edge of P that enters v is a forward or backward edge, and whether the edge of P that exits v is a forward or backward edge. However, each of these cases is easily worked out, and we leave them to the reader. ■
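To make the forward/backward bookkeeping concrete, here is a small sketch in the same dictionary representation as before. The names residual_graph and augment are ours, and the sketch assumes the network contains no pair of antiparallel edges, so each residual edge comes from exactly one edge of G.

    def residual_graph(capacity, flow):
        """Residual capacities: forward edges with c_e - f(e) > 0,
        backward edges with capacity f(e) > 0."""
        residual = {}
        for (u, v), c in capacity.items():
            f = flow.get((u, v), 0)
            if c - f > 0:
                residual[(u, v)] = c - f      # forward edge
            if f > 0:
                residual[(v, u)] = f          # backward edge (assumes (v, u) is not also in G)
        return residual

    def augment(capacity, flow, path):
        """path is a simple s-t path, given as a list of nodes, in the
        residual graph of flow; returns the new flow f'."""
        residual = residual_graph(capacity, flow)
        edges = list(zip(path, path[1:]))
        b = min(residual[e] for e in edges)               # bottleneck(P, f)
        new_flow = dict(flow)
        for (u, v) in edges:
            if (u, v) in capacity:                        # forward edge: add b units
                new_flow[(u, v)] = new_flow.get((u, v), 0) + b
            else:                                         # backward edge: undo b units on (v, u)
                new_flow[(v, u)] = new_flow.get((v, u), 0) - b
        return new_flow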
This augmentation operation captures the type of forward and backward pushing of flow that we discussed earlier. Let's now consider the following algorithm to compute an s-t flow in G.

    Max-Flow
      Initially f(e) = 0 for all e in G
      While there is an s-t path in the residual graph Gf
        Let P be a simple s-t path in Gf
        f' = augment(f, P)
        Update f to be f'
        Update the residual graph Gf to be Gf'
      Endwhile
      Return f

We'll call this the Ford-Fulkerson Algorithm, after the two researchers who developed it in 1956. See Figure 7.4 for a run of the algorithm. The Ford-Fulkerson Algorithm is really quite simple. What is not at all clear is whether its central While loop terminates, and whether the flow returned is a maximum flow. The answers to both of these questions turn out to be fairly subtle.

Analyzing the Algorithm: Termination and Running Time
First we consider some properties that the algorithm maintains by induction on the number of iterations of the While loop, relying on our assumption that all capacities are integers.

(7.2) At every intermediate stage of the Ford-Fulkerson Algorithm, the flow values {f(e)} and the residual capacities in Gf are integers.

Proof. The statement is clearly true before any iterations of the While loop. Now suppose it is true after j iterations. Then, since all residual capacities in Gf are integers, the value bottleneck(P, f) for the augmenting path found in iteration j + 1 will be an integer. Thus the flow f' will have integer values, and hence so will the capacities of the new residual graph. ■

We can use this property to prove that the Ford-Fulkerson Algorithm terminates. As at previous points in the book we will look for a measure of progress that will imply termination.

First we show that the flow value strictly increases when we apply an augmentation.

(7.3) Let f be a flow in G, and let P be a simple s-t path in Gf. Then v(f') = v(f) + bottleneck(P, f); and since bottleneck(P, f) > 0, we have v(f') > v(f).

Proof. The first edge e of P must be an edge out of s in the residual graph Gf; and since the path is simple, it does not visit s again. Since G has no edges entering s, the edge e must be a forward edge. We increase the flow on this edge by bottleneck(P, f), and we do not change the flow on any other edge incident to s. Therefore the value of f' exceeds the value of f by bottleneck(P, f). ■

We need one more observation to prove termination: We need to be able to bound the maximum possible flow value. Here's one upper bound: If all the edges out of s could be completely saturated with flow, the value of the flow would be Σ_{e out of s} ce. Let C denote this sum. Thus we have v(f) ≤ C for all s-t flows f. (C may be a huge overestimate of the maximum value of a flow in G, but it's handy for us as a finite, simply stated bound.) Using statement (7.3), we can now prove termination.

(7.4) Suppose, as above, that all capacities in the flow network G are integers. Then the Ford-Fulkerson Algorithm terminates in at most C iterations of the While loop.

Proof. We noted above that no flow in G can have value greater than C, due to the capacity condition on the edges leaving s. Now, by (7.3), the value of the flow maintained by the Ford-Fulkerson Algorithm increases in each iteration; so by (7.2), it increases by at least 1 in each iteration. Since it starts with the value 0, and cannot go higher than C, the While loop in the Ford-Fulkerson Algorithm can run for at most C iterations. ■
Next we consider the running time of the Ford-Fulkerson Algorithm. Let n denote the number of nodes in G, and m denote the number of edges in G. We have assumed that all nodes have at least one incident edge, hence m ≥ n/2, and so we can use O(m + n) = O(m) to simplify the bounds.

(7.5) Suppose, as above, that all capacities in the flow network G are integers. Then the Ford-Fulkerson Algorithm can be implemented to run in O(mC) time.

Proof. We know from (7.4) that the algorithm terminates in at most C iterations of the While loop. We therefore consider the amount of work involved in one iteration when the current flow is f.

The residual graph Gf has at most 2m edges, since each edge of G gives rise to at most two edges in the residual graph. We will maintain Gf using an adjacency list representation; we will have two linked lists for each node v, one containing the edges entering v, and one containing the edges leaving v. To find an s-t path in Gf, we can use breadth-first search or depth-first search, which run in O(m + n) time; by our assumption that m ≥ n/2, O(m + n) is the same as O(m). The procedure augment(f, P) takes time O(n), as the path P has at most n − 1 edges. Given the new flow f', we can build the new residual graph in O(m) time: For each edge e of G, we construct the correct forward and backward edges in Gf'. ■

A somewhat more efficient version of the algorithm would maintain the linked lists of edges in the residual graph Gf as part of the augment procedure that changes the flow f via augmentation.

7.2 Maximum Flows and Minimum Cuts in a Network

We now continue with the analysis of the Ford-Fulkerson Algorithm, an activity that will occupy this whole section. In the process, we will not only learn a lot about the algorithm, but also find that analyzing the algorithm provides us with considerable insight into the Maximum-Flow Problem itself.

Analyzing the Algorithm: Flows and Cuts
Our next goal is to show that the flow that is returned by the Ford-Fulkerson Algorithm has the maximum possible value of any flow in G. To make progress toward this goal, we return to an issue that we raised in Section 7.1: the way in which the structure of the flow network places upper bounds on the maximum value of an s-t flow. We have already seen one upper bound: the value v(f) of any s-t flow f is at most C = Σ_{e out of s} ce. Sometimes this bound is useful, but sometimes it is very weak. We now use the notion of a cut to develop a much more general means of placing upper bounds on the maximum-flow value.

Consider dividing the nodes of the graph into two sets, A and B, so that s ∈ A and t ∈ B. As in our discussion in Section 7.1, any such division places an upper bound on the maximum possible flow value, since all the flow must cross from A to B somewhere. Formally, we say that an s-t cut is a partition (A, B) of the vertex set V, so that s ∈ A and t ∈ B. The capacity of a cut (A, B), which we will denote c(A, B), is simply the sum of the capacities of all edges out of A: c(A, B) = Σ_{e out of A} ce.

Cuts turn out to provide very natural upper bounds on the values of flows, as expressed by our intuition above. We make this precise via a sequence of facts.

(7.6) Let f be any s-t flow, and (A, B) any s-t cut. Then v(f) = f^out(A) − f^in(A).

This statement is actually much stronger than a simple upper bound. It says that by watching the amount of flow f sends across a cut, we can exactly measure the flow value: It is the total amount that leaves A, minus the amount that "swirls back" into A. This makes sense intuitively, although the proof requires a little manipulation of sums.

Proof. By definition v(f) = f^out(s). By assumption we have f^in(s) = 0, as the source s has no entering edges, so we can write v(f) = f^out(s) − f^in(s). Since every node v in A other than s is internal, we know that f^out(v) − f^in(v) = 0 for all such nodes. Thus

        v(f) = Σ_{v ∈ A} (f^out(v) − f^in(v)),

since the only term in this sum that is nonzero is the one in which v is set to s.

Let's try to rewrite the sum on the right as follows. If an edge e has both ends in A, then f(e) appears once in the sum with a "+" and once with a "−", and hence these two terms cancel out. If e has only its tail in A, then f(e) appears just once in the sum, with a "+". If e has only its head in A, then f(e) also appears just once in the sum, with a "−". Finally, if e has neither end in A, then f(e) doesn't appear in the sum at all. In view of this, we have

        Σ_{v ∈ A} (f^out(v) − f^in(v)) = Σ_{e out of A} f(e) − Σ_{e into A} f(e) = f^out(A) − f^in(A).

Putting together these two equations, we have the statement of (7.6). ■

If A = {s}, then f^out(A) = f^out(s), and f^in(A) = 0 as there are no edges entering the source by assumption. So the statement for this set A = {s} is exactly the definition of the flow value v(f).

Note that if (A, B) is a cut, then the edges into B are precisely the edges out of A. Similarly, the edges out of B are precisely the edges into A. Thus we have f^out(A) = f^in(B) and f^in(A) = f^out(B), just by comparing the definitions for these two expressions. So we can rephrase (7.6) in the following way.

(7.7) Let f be any s-t flow, and (A, B) any s-t cut. Then v(f) = f^in(B) − f^out(B).

If we set A = V − {t} and B = {t} in (7.7), we have v(f) = f^in(B) − f^out(B) = f^in(t) − f^out(t). By our assumption the sink t has no leaving edges, so we have f^out(t) = 0. This says that we could have originally defined the value of a flow equally well in terms of the sink t: It is f^in(t), the amount of flow arriving at the sink.
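Statements (7.6) and (7.7) are easy to check numerically in the dictionary representation sketched earlier; the helpers below are our own names, and flow_value is the function from the first sketch.

    def f_out(flow, A):
        return sum(val for (u, v), val in flow.items() if u in A and v not in A)

    def f_in(flow, A):
        return sum(val for (u, v), val in flow.items() if u not in A and v in A)

    def cut_capacity(capacity, A):
        # c(A, B): total capacity of the edges leaving A.
        return sum(c for (u, v), c in capacity.items() if u in A and v not in A)

    def check_cut(capacity, flow, A, s):
        value = flow_value(flow, s)
        assert value == f_out(flow, A) - f_in(flow, A)    # statement (7.6)
        assert value <= cut_capacity(capacity, A)         # the upper bound made precise just below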
A very useful consequence of (7.6) is the following upper bound.

(7.8) Let f be any s-t flow, and (A, B) any s-t cut. Then v(f) ≤ c(A, B).

Proof.

        v(f) = f^out(A) − f^in(A)
             ≤ f^out(A)
             = Σ_{e out of A} f(e)
             ≤ Σ_{e out of A} ce
             = c(A, B).

Here the first line is simply (7.6); we pass from the first to the second since f^in(A) ≥ 0, and we pass from the third to the fourth by applying the capacity conditions to each term of the sum. ■

Figure 7.5 The (A*, B*) cut in the proof of (7.9).

The harder part is to bound the number of augmentations done in each scaling phase. The idea here is that we are using paths that augment the flow by a lot, and so there should be relatively few augmentations. During the Δ-scaling phase, we only use edges with residual capacity of at least Δ. Using (7.3), we have

(7.17) During the Δ-scaling phase, each augmentation increases the flow value by at least Δ.

        ≥ c(A, B) − mΔ.

Here the first inequality follows from our bounds on the flow values of edges across the cut, and the second inequality follows from the simple fact that the graph only contains m edges total.

The maximum-flow value is bounded by the capacity of any cut by (7.8). We use the cut (A, B) to obtain the bound claimed in the second statement. ■
(7.19) The number of augmentations in a scaling phase is at most 2m.

Proof. The statement is clearly true in the first scaling phase: we can use each of the edges out of s only for at most one augmentation in that phase. Now consider a later scaling phase Δ, and let fp be the flow at the end of the previous scaling phase. In that phase, we used Δ' = 2Δ as our parameter. By (7.18), the maximum flow f* is at most v(f*) ≤ v(fp) + mΔ' = v(fp) + 2mΔ. In the Δ-scaling phase, each augmentation increases the flow by at least Δ, and hence there can be at most 2m augmentations. ■

An augmentation takes O(m) time, including the time required to set up the graph and find the appropriate path. We have at most 1 + ⌈log₂ C⌉ scaling phases and at most 2m augmentations in each scaling phase. Thus we have the following result.

(7.20) The Scaling Max-Flow Algorithm in a graph with m edges and integer capacities finds a maximum flow in at most 2m(1 + ⌈log₂ C⌉) augmentations. It can be implemented to run in at most O(m² log₂ C) time.

When C is large, this time bound is much better than the O(mC) bound that applied to an arbitrary implementation of the Ford-Fulkerson Algorithm. In our example at the beginning of this section, we had capacities of size 100, but we could just as well have used capacities of size 2¹⁰⁰; in this case, the generic Ford-Fulkerson Algorithm could take time proportional to 2¹⁰⁰, while the scaling algorithm will take time proportional to log₂(2¹⁰⁰) = 100. One way to view this distinction is as follows: The generic Ford-Fulkerson Algorithm requires time proportional to the magnitude of the capacities, while the scaling algorithm only requires time proportional to the number of bits needed to specify the capacities in the input to the problem. As a result, the scaling algorithm is running in time polynomial in the size of the input (i.e., the number of edges and the numerical representation of the capacities), and so it meets our traditional goal of achieving a polynomial-time algorithm. Bad implementations of the Ford-Fulkerson Algorithm, which can require close to C iterations, do not meet this standard of polynomiality. (Recall that in Section 6.4 we used the term pseudo-polynomial to describe such algorithms, which are polynomial in the magnitudes of the input numbers but not in the number of bits needed to represent them.)

Extensions: Strongly Polynomial Algorithms
Could we ask for something qualitatively better than what the scaling algorithm guarantees? Here is one thing we could hope for: Our example graph (Figure 7.6) had four nodes and five edges; so it would be nice to use a number of iterations that is polynomial in the numbers 4 and 5, completely independently of the values of the capacities. Such an algorithm, which is polynomial in |V| and |E| only, and works with numbers having a polynomial number of bits, is called a strongly polynomial algorithm. In fact, there is a simple and natural implementation of the Ford-Fulkerson Algorithm that leads to such a strongly polynomial bound: each iteration chooses the augmenting path with the fewest number of edges. Dinitz, and independently Edmonds and Karp, proved that with this choice the algorithm terminates in at most O(mn) iterations. In fact, these were the first polynomial algorithms for the Maximum-Flow Problem. There has since been a huge amount of work devoted to improving the running times of maximum-flow algorithms. There are currently algorithms that achieve running times of O(mn log n), O(n³), and O(min(n^(2/3), m^(1/2)) m log n log U), where the last bound assumes that all capacities are integral and at most U. In the next section, we'll discuss a strongly polynomial maximum-flow algorithm based on a different principle.

7.4 The Preflow-Push Maximum-Flow Algorithm

From the very beginning, our discussion of the Maximum-Flow Problem has been centered around the idea of an augmenting path in the residual graph. However, there are some very powerful techniques for maximum flow that are not explicitly based on augmenting paths. In this section we study one such technique, the Preflow-Push Algorithm.

Designing the Algorithm
Algorithms based on augmenting paths maintain a flow f, and use the augment procedure to increase the value of the flow. By way of contrast, the Preflow-Push Algorithm will, in essence, increase the flow on an edge-by-edge basis. Changing the flow on a single edge will typically violate the conservation condition, and so the algorithm will have to maintain something less well behaved than a flow--something that does not obey conservation--as it operates.

Preflows. We say that an s-t preflow (preflow, for short) is a function f that maps each edge e to a nonnegative real number, f : E → R+. A preflow f must satisfy the capacity conditions:

(i) For each e ∈ E, we have 0 ≤ f(e) ≤ ce.

In place of the conservation conditions, we require only inequalities: Each node other than s must have at least as much flow entering as leaving.

(ii) For each node v other than the source s, we have

        Σ_{e into v} f(e) ≥ Σ_{e out of v} f(e).
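As a quick illustration, the two preflow conditions can be checked in the same dictionary representation used in the earlier sketches; is_preflow is our own illustrative name.

    def is_preflow(nodes, capacity, flow, s):
        # (i) Capacity conditions.
        for e, c in capacity.items():
            if not (0 <= flow.get(e, 0) <= c):
                return False
        # (ii) At every node other than s, inflow is at least outflow.
        for v in nodes:
            if v == s:
                continue
            f_in = sum(val for (u, w), val in flow.items() if w == v)
            f_out = sum(val for (u, w), val in flow.items() if u == v)
            if f_in < f_out:
                return False
        return True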
We will call the difference

        ef(v) = Σ_{e into v} f(e) − Σ_{e out of v} f(e)

the excess of the preflow at node v. Notice that a preflow where all nodes other than s and t have zero excess is a flow, and the value of the flow is exactly ef(t) = −ef(s). We can still define the concept of a residual graph Gf for a preflow f, just as we did for a flow. The algorithm will "push" flow along edges of the residual graph (using both forward and backward edges).

Preflows and Labelings. The Preflow-Push Algorithm will maintain a preflow and work on converting the preflow into a flow. The algorithm is based on the physical intuition that flow naturally finds its way "downhill." The "heights" for this intuition will be labels h(v) for each node v that the algorithm will define and maintain, as shown in Figure 7.7. We will push flow from nodes with higher labels to those with lower labels, following the intuition that fluid flows downhill. To make this precise, a labeling is a function h : V → Z≥0 from the nodes to the nonnegative integers. We will also refer to the labels as heights of the nodes. We will say that a labeling h and an s-t preflow f are compatible if h(t) = 0, h(s) = n, and h(v) ≤ h(w) + 1 for every edge (v, w) in the residual graph (the steepness condition).

Intuitively, the height difference n between the source and the sink is meant to ensure that the flow starts high enough to flow from s toward the sink t, while the steepness condition will help by making the descent of the flow gradual enough to make it to the sink.

The key property of a compatible preflow and labeling is that there can be no s-t path in the residual graph.

(7.21) If s-t preflow f is compatible with a labeling h, then there is no s-t path in the residual graph Gf.

Proof. We prove the statement by contradiction. Let P be a simple s-t path in the residual graph Gf. Assume that the nodes along P are s, v1, ..., vk = t. By definition of a labeling compatible with preflow f, we have that h(s) = n. The edge (s, v1) is in the residual graph, and hence h(v1) ≥ h(s) − 1 = n − 1. Using induction on i and the steepness condition for the edge (vi−1, vi), we get that for all nodes vi in path P the height is at least h(vi) ≥ n − i. Notice that the last node of the path is vk = t; hence we get that h(t) ≥ n − k. However, h(t) = 0 by definition; and k < n as the path P is simple. This contradiction proves the claim. ■

Since flows are nonnegative, we see that the sum of the excesses in B is zero; since each individual excess in B is nonnegative, they must therefore all be 0.

Now we are ready to prove that the labels do not change too much. Recall that n denotes the number of nodes in V.

The hardest part of the analysis is proving a bound on the number of nonsaturating pushes, and this also will be the bottleneck for the theoretical bound on the running time.

(7.29) Throughout the algorithm, the number of nonsaturating push operations is at most 4mn².
Figure 7.8 After a saturating push(f, h, v, w), the height of v exceeds the height of w by 1. (The height of node w has to increase by 2 before it can push flow back to node v.)

Proof. For this proof, we will use a so-called potential function method. For a preflow f and a compatible labeling h, we define

        Φ(f, h) = Σ_{v: ef(v) > 0} h(v)

to be the sum of the heights of all nodes with positive excess. (Φ is often called a potential since it resembles the "potential energy" of all nodes with positive excess.)

In the initial preflow and labeling, all nodes with positive excess are at height 0, so Φ(f, h) = 0. Φ(f, h) remains nonnegative throughout the algorithm. A nonsaturating push(f, h, v, w) operation decreases Φ(f, h) by at least 1, since after the push the node v will have no excess, and w, the only node that gets new excess from the operation, is at a height 1 less than v. However, each saturating push and each relabel operation can increase Φ(f, h). A relabel operation increases Φ(f, h) by exactly 1. There are at most 2n² relabel operations, so the total increase in Φ(f, h) due to relabel operations is 2n². A saturating push(f, h, v, w) operation does not change labels, but it can increase Φ(f, h), since the node w may suddenly acquire positive excess after the push. This would increase Φ(f, h) by the height of w, which is at most 2n − 1. There are at most 2nm saturating push operations, so the total increase in Φ(f, h) due to push operations is at most 2mn(2n − 1). So, between the two causes, Φ(f, h) can increase by at most 4mn² during the algorithm.

But since Φ remains nonnegative throughout, and it decreases by at least 1 on each nonsaturating push operation, it follows that there can be at most 4mn² nonsaturating push operations. ■

(7.30) If at each step we choose the node with excess at maximum height, then the number of nonsaturating push operations throughout the algorithm is at most 4n³.

Proof. Consider the maximum height H = max_{v: ef(v) > 0} h(v) of any node with excess as the algorithm proceeds. The analysis will use this maximum height H in place of the potential function Φ in the previous O(n²m) bound.

This maximum height H can only increase due to relabeling (as flow is always pushed to nodes at lower height), and so the total increase in H throughout the algorithm is at most 2n² by (7.26). H starts out 0 and remains nonnegative, so the number of times H changes is at most 4n².

Now consider the behavior of the algorithm over a phase of time in which H remains constant. We claim that each node can have at most one nonsaturating push operation during this phase. Indeed, during this phase, flow is being pushed from nodes at height H to nodes at height H − 1; and after a nonsaturating push operation from v, it must receive flow from a node at height H + 1 before we can push from it again.

Since there are at most n nonsaturating push operations between each change to H, and H changes at most 4n² times, the total number of nonsaturating push operations is at most 4n³. ■

As a follow-up to (7.30), it is interesting to note that experimentally the computational bottleneck of the method is the number of relabeling operations, and a better experimental running time is obtained by variants that work on increasing labels faster than one by one. This is a point that we pursue further in some of the exercises.

Implementing the Preflow-Push Algorithm
Finally, we need to briefly discuss how to implement this algorithm efficiently. Maintaining a few simple data structures will allow us to effectively implement
the operations of the algorithm in constant time each, and overall to implement the algorithm in time O(mn) plus the number of nonsaturating push operations. Hence the generic algorithm will run in O(mn²) time, while the version that always selects the node at maximum height will run in O(n³) time.

We can maintain all nodes with excess on a simple list, and so we will be able to select a node with excess in constant time. One has to be a bit more careful to be able to select a node with maximum height H in constant time. In order to do this, we will maintain a linked list of all nodes with excess at every possible height. Note that whenever a node v gets relabeled, or continues to have positive excess after a push, it remains a node with maximum height H. Thus we only have to select a new node after a push when the current node v no longer has positive excess. If node v was at height H, then the new node at maximum height will also be at height H or, if no node at height H has excess, then the maximum height will be H − 1, since the previous push operation out of v pushed flow to a node at height H − 1.

Now assume we have selected a node v, and we need to select an edge (v, w) on which to apply push(f, h, v, w) (or relabel(f, h, v) if no such w exists). To be able to select an edge quickly, we will use the adjacency list representation of the graph. More precisely, we will maintain, for each node v, all possible edges leaving v in the residual graph (both forward and backward edges) in a linked list, and with each edge we keep its capacity and flow value. Note that this way we have two copies of each edge in our data structure: a forward and a backward copy. These two copies will have pointers to each other, so that updates done at one copy can be carried over to the other one in O(1) time. We will select edges leaving a node v for push operations in the order they appear on node v's list. To facilitate this selection, we will maintain a pointer current(v) for each node v to the last edge on the list that has been considered for a push operation. So, if node v no longer has excess after a nonsaturating push operation out of node v, the pointer current(v) will stay at this edge, and we will use the same edge for the next push operation out of v. After a saturating push operation out of node v, we advance current(v) to the next edge on the list.

The key observation is that, after advancing the pointer current(v) from an edge (v, w), we will not want to apply push to this edge again until we relabel v.

(7.31) After the current(v) pointer is advanced from an edge (v, w), we cannot apply push to this edge until v gets relabeled.

Proof. At the moment current(v) is advanced from the edge (v, w), there is some reason push cannot be applied to this edge. Either h(w) > h(v), or the edge is not in the residual graph. In the first case, we clearly need to relabel v before applying a push on this edge. In the latter case, one needs to apply push to the reverse edge (w, v) to make (v, w) reenter the residual graph. However, when we apply push to edge (w, v), then w is above v, and so v needs to be relabeled before one can push flow from v to w again. ■

Since edges do not have to be considered again for push before relabeling, we get the following.

(7.32) When the current(v) pointer reaches the end of the edge list for v, the relabel operation can be applied to node v.

After relabeling node v, we reset current(v) to the first edge on the list and start considering edges again in the order they appear on v's list.

(7.33) The running time of the Preflow-Push Algorithm, implemented using the above data structures, is O(mn) plus O(1) for each nonsaturating push operation. In particular, the generic Preflow-Push Algorithm runs in O(n²m) time, while the version where we always select the node at maximum height runs in O(n³) time.

Proof. The initial flow and relabeling is set up in O(m) time. Both push and relabel operations can be implemented in O(1) time, once the operation has been selected. Consider a node v. We know that v can be relabeled at most 2n times throughout the algorithm. We will consider the total time the algorithm spends on finding the right edge on which to push flow out of node v, between two times that node v gets relabeled. If node v has dv adjacent edges, then by (7.32) we spend O(dv) time on advancing the current(v) pointer between consecutive relabelings of v. Thus the total time spent on advancing the current pointers throughout the algorithm is O(Σ_{v ∈ V} n·dv) = O(mn), as claimed. ■
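The following condensed sketch shows the overall push/relabel structure described in this section, again in the dictionary representation used earlier. It is deliberately simple: it rescans the edge set instead of keeping current(v) pointers or per-height lists, so it is correct under the usual assumptions (integer capacities, no antiparallel edges) but does not attain the O(n²m) and O(n³) bounds above. All helper names are our own.

    def preflow_push(nodes, capacity, s, t):
        n = len(nodes)
        flow = {e: 0 for e in capacity}
        h = {v: 0 for v in nodes}
        h[s] = n                                   # compatible initial labeling
        for (u, v) in capacity:
            if u == s:
                flow[(u, v)] = capacity[(u, v)]    # saturate all edges out of s

        def excess(v):
            return (sum(val for (a, b), val in flow.items() if b == v)
                    - sum(val for (a, b), val in flow.items() if a == v))

        def residual_edges(v):
            for (a, b), c in capacity.items():
                if a == v and c - flow[(a, b)] > 0:
                    yield b, (a, b), 1, c - flow[(a, b)]   # forward residual edge
                if b == v and flow[(a, b)] > 0:
                    yield a, (a, b), -1, flow[(a, b)]      # backward residual edge

        while True:
            active = [v for v in nodes if v not in (s, t) and excess(v) > 0]
            if not active:
                return flow                        # no excess left: the preflow is a flow
            v = active[0]
            pushed = False
            for w, e, sign, cap in residual_edges(v):
                if h[w] < h[v]:                    # push downhill along a residual edge
                    delta = min(excess(v), cap)
                    flow[e] += sign * delta
                    pushed = True
                    break
            if not pushed:
                h[v] += 1                          # relabel v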
7.5 A First Application: The Bipartite Matching Problem

Having developed a set of powerful algorithms for the Maximum-Flow Problem, we now turn to the task of developing applications of maximum flows and minimum cuts in graphs. We begin with two very basic applications. First, in this section, we discuss the Bipartite Matching Problem mentioned at the beginning of this chapter. In the next section, we discuss the more general Disjoint Paths Problem.

The Problem
One of our original goals in developing the Maximum-Flow Problem was to be able to solve the Bipartite Matching Problem, and we now show how to do this. Recall that a bipartite graph G = (V, E) is an undirected graph whose node set can be partitioned as V = X ∪ Y, with the property that every edge e ∈ E has one end in X and the other end in Y. A matching M in G is a subset of the edges M ⊆ E such that each node appears in at most one edge in M. The Bipartite Matching Problem is that of finding a matching in G of largest possible size.

Designing the Algorithm
The graph defining a matching problem is undirected, while flow networks are directed; but it is actually not difficult to use an algorithm for the Maximum-Flow Problem to find a maximum matching.

Beginning with the graph G in an instance of the Bipartite Matching Problem, we construct a flow network G' as shown in Figure 7.9. First we direct all edges in G from X to Y. We then add a node s, and an edge (s, x) from s to each node in X. We add a node t, and an edge (y, t) from each node in Y to t. Finally, we give each edge in G' a capacity of 1.

Figure 7.9 (a) A bipartite graph. (b) The corresponding flow network, with all capacities equal to 1.

We now compute a maximum s-t flow in this network G'. We will discover that the value of this maximum is equal to the size of the maximum matching in G. Moreover, our analysis will show how one can use the flow itself to recover the matching.

Analyzing the Algorithm
The analysis is based on showing that integer-valued flows in G' encode matchings in G in a fairly transparent fashion. First, suppose there is a matching in G consisting of k edges (xi1, yi1), ..., (xik, yik). Then consider the flow f that sends one unit along each path of the form s, xij, yij, t--that is, f(e) = 1 for each edge on one of these paths. One can verify easily that the capacity and conservation conditions are indeed met and that f is an s-t flow of value k.

Conversely, suppose there is a flow f' in G' of value k. By the integrality theorem for maximum flows (7.14), we know there is an integer-valued flow f of value k; and since all capacities are 1, this means that f(e) is equal to either 0 or 1 for each edge e. Now, consider the set M' of edges of the form (x, y) on which the flow value is 1.

Here are three simple facts about the set M'.

(7.34) M' contains k edges.

Proof. To prove this, consider the cut (A, B) in G' with A = {s} ∪ X. The value of the flow is the total flow leaving A, minus the total flow entering A. The first of these terms is simply the cardinality of M', since these are the edges leaving A that carry flow, and each carries exactly one unit of flow. The second of these terms is 0, since there are no edges entering A. Thus, M' contains k edges. ■

(7.35) Each node in X is the tail of at most one edge in M'.

Proof. To prove this, suppose x ∈ X were the tail of at least two edges in M'. Since our flow is integer-valued, this means that at least two units of flow leave from x. By conservation of flow, at least two units of flow would have to come into x--but this is not possible, since only a single edge of capacity 1 enters x. Thus x is the tail of at most one edge in M'. ■

(7.37) The size of the maximum matching in G is equal to the value of the maximum flow in G'; and the edges in such a matching in G are the edges that carry flow from X to Y in G'.
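The construction is mechanical enough to write down directly. The sketch below is ours; it reuses the max_flow sketch from Section 7.1, takes X, Y, and the edge list of G, and assumes the labels 's' and 't' do not clash with node names.

    def maximum_bipartite_matching(X, Y, edges):
        """edges is a list of pairs (x, y) with x in X and y in Y."""
        s, t = 's', 't'
        capacity = {}
        for x in X:
            capacity[(s, x)] = 1                  # edge (s, x) of capacity 1
        for (x, y) in edges:
            capacity[(x, y)] = 1                  # each edge of G, directed from X to Y
        for y in Y:
            capacity[(y, t)] = 1                  # edge (y, t) of capacity 1
        flow = max_flow(capacity, s, t)           # integer-valued, by the integrality theorem
        return [(x, y) for (x, y) in edges if flow.get((x, y), 0) == 1]

The returned edges form a maximum matching precisely because of (7.34), (7.35), and (7.37): an integer flow of value k selects k edges from X to Y, no two sharing an endpoint.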
Note the crucial way in which the integrality theorem (7.14) figured in this construction: we needed to know if there is a maximum flow in G' that takes only the values 0 and 1.

Bounding the Running Time. Now let's consider how quickly we can compute a maximum matching in G. Let n = |X| = |Y|, and let m be the number of edges of G. We'll tacitly assume that there is at least one edge incident to each node in the original problem, and hence m ≥ n/2. The time to compute a maximum matching is dominated by the time to compute an integer-valued maximum flow in G', since converting this to a matching in G is simple. For this flow problem, we have that C = Σ_{e out of s} ce = |X| = n, as s has an edge of capacity 1 to each node of X. Thus, by using the O(mC) bound in (7.5), we get the following.

(7.38) The Ford-Fulkerson Algorithm can be used to find a maximum matching in a bipartite graph in O(mn) time.

It's interesting that if we were to use the "better" bounds of O(m² log₂ C) or O(n³) that we developed in the previous sections, we'd get the inferior running times of O(m² log n) or O(n³) for this problem. There is nothing contradictory in this. These bounds were designed to be good for all instances, even when C is very large relative to m and n. But C = n for the Bipartite Matching Problem, and so the cost of this extra sophistication is not needed.

It is worthwhile to consider what the augmenting paths mean in the network G'. Consider the matching M consisting of edges (x2, y2), (x3, y3), and (x5, y5) in the bipartite graph in Figure 7.1; see also Figure 7.10. Let f be the corresponding flow in G'. This matching is not maximum, so f is not a maximum s-t flow, and hence there is an augmenting path in the residual graph G'f. One such augmenting path is marked in Figure 7.10(b). Note that the edges (x2, y2) and (x3, y3) are used backward, and all other edges are used forward. All augmenting paths must alternate between edges used backward and forward, as all edges of the graph G' go from X to Y. Augmenting paths are therefore also called alternating paths in the context of finding a maximum matching. The effect of this augmentation is to take the edges used backward out of the matching, and replace them with the edges going forward. Because the augmenting path goes from s to t, there is one more forward edge than backward edge; thus the size of the matching increases by one.

Figure 7.10 (a) A bipartite graph, with a matching M. (b) The augmenting path in the corresponding residual graph. (c) The matching obtained by the augmentation.

Extensions: The Structure of Bipartite Graphs with No Perfect Matching
Algorithmically, we've seen how to find perfect matchings: We use the algorithm above to find a maximum matching and then check to see if this matching is perfect.

But let's ask a slightly less algorithmic question. Not all bipartite graphs have perfect matchings. What does a bipartite graph without a perfect matching look like? Is there an easy way to see that a bipartite graph does not have a perfect matching--or at least an easy way to convince someone the graph has no perfect matching, after we run the algorithm? More concretely, it would be nice if the algorithm, upon concluding that there is no perfect matching, could produce a short "certificate" of this fact. The certificate could allow someone to be quickly convinced that there is no perfect matching, without having to look over a trace of the entire execution of the algorithm.

One way to understand the idea of such a certificate is as follows. We can decide if the graph G has a perfect matching by checking if the maximum flow in a related graph G' has value at least n. By the Max-Flow Min-Cut Theorem, there will be an s-t cut of capacity less than n if the maximum-flow value in G' has value less than n. So, in a way, a cut with capacity less than n provides such a certificate. However, we want a certificate that has a natural meaning in terms of the original graph G.

What might such a certificate look like? For example, if there are nodes x1, x2 ∈ X that have only one incident edge each, and the other end of each edge is the same node y, then clearly the graph has no perfect matching: both x1 and x2 would need to get matched to the same node y. More generally, consider a subset of nodes A ⊆ X, and let Γ(A) ⊆ Y denote the set of all nodes
consider a subset of nodes A ⊆ X, and let Γ(A) ⊆ Y denote the set of all nodes that are adjacent to nodes in A. If the graph has a perfect matching, then each node in A has to be matched to a different node in Γ(A), so Γ(A) has to be at least as large as A. This gives us the following fact.

(7.39) If a bipartite graph G = (V, E) with two sides X and Y has a perfect matching, then for all A ⊆ X we must have |Γ(A)| >= |A|.

This statement suggests a type of certificate demonstrating that a graph does not have a perfect matching: a set A ⊆ X such that |Γ(A)| < |A|. But is the converse of (7.39) also true? Is it the case that whenever there is no perfect matching, there is a set A like this that proves it? The answer turns out to be yes, provided we add the obvious condition that |X| = |Y| (without which there could certainly not be a perfect matching). This statement is known in the literature as Hall's Theorem, though versions of it were discovered independently by a number of different people--perhaps first by König--in the early 1900s. The proof of the statement also provides a way to find such a subset A in polynomial time.

(7.40) Assume that the bipartite graph G = (V, E) has two sides X and Y such that |X| = |Y|. Then the graph G either has a perfect matching or there is a subset A ⊆ X such that |Γ(A)| < |A|. A perfect matching or an appropriate subset A can be found in O(mn) time.

Proof. We will use the same graph G' as in (7.37). Assume that |X| = |Y| = n. By (7.37) the graph G has a maximum matching if and only if the value of the maximum flow in G' is n.

We need to show that if the value of the maximum flow is less than n, then there is a subset A such that |Γ(A)| < |A|, as claimed in the statement. By the Max-Flow Min-Cut Theorem (7.12), if the maximum-flow value is less than n, then there is a cut (A', B') with capacity less than n in G'. Now the set A' contains s, and may contain nodes from both X and Y as shown in Figure 7.11. We claim that the set A = X ∩ A' has the claimed property. This will prove both parts of the statement, as we've seen in (7.11) that a minimum cut (A', B') can also be found by running the Ford-Fulkerson Algorithm.

First we claim that one can modify the minimum cut (A', B') so as to ensure that Γ(A) ⊆ A', where A = X ∩ A' as before. To do this, consider a node y ∈ Γ(A) that belongs to B' as shown in Figure 7.11(a). We claim that by moving y from B' to A', we do not increase the capacity of the cut. For what happens when we move y from B' to A'? The edge (y, t) now crosses the cut, increasing the capacity by one. But previously there was at least one edge (x, y) with x ∈ A, since y ∈ Γ(A); all edges from A to y used to cross the cut, and don't anymore. Thus, overall, the capacity of the cut cannot increase. (Note that we don't have to be concerned about nodes x ∈ X that are not in A. The two ends of the edge (x, y) will be on different sides of the cut, but this edge does not add to the capacity of the cut, as it goes from B' to A'.)

Next consider the capacity of this minimum cut (A', B') that has Γ(A) ⊆ A' as shown in Figure 7.11(b). Since all neighbors of A belong to A', we see that the only edges out of A' are either edges that leave the source s or that enter the sink t. Thus the capacity of the cut is exactly

c(A', B') = |X ∩ B'| + |Y ∩ A'|.

Notice that |X ∩ B'| = n - |A|, and |Y ∩ A'| >= |Γ(A)|. Now the assumption that c(A', B') < n implies that

n - |A| + |Γ(A)| <= |X ∩ B'| + |Y ∩ A'| = c(A', B') < n.

Comparing the first and the last terms, we get the claimed inequality |A| > |Γ(A)|. ∎

Figure 7.11 (a) A minimum cut in the proof of (7.40); node y can be moved to the s-side of the cut. (b) The same cut after moving node y to the A' side. The edges crossing the cut are dark.
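The proof is constructive, and the certificate is easy to extract once a maximum matching is available (for instance, from the routine sketched earlier): the X-nodes reachable from some unmatched X-node by an alternating path play the role of A = X ∩ A'. The following Python sketch is our own illustration, not the book's pseudocode.

```python
from collections import deque

def hall_violator(adj, X, match_x, match_y):
    """Given a *maximum* matching (match_x / match_y dictionaries),
    return a set A ⊆ X with |Γ(A)| < |A|, or None if the matching is
    perfect.  A consists of the X-nodes reachable from a free X-node
    by an alternating path, mirroring A = X ∩ A' in the proof of (7.40)."""
    free = [x for x in X if match_x.get(x) is None]
    if not free:
        return None                    # perfect matching: no certificate exists
    A, reached_y = set(free), set()
    queue = deque(free)
    while queue:
        x = queue.popleft()
        for y in adj.get(x, []):       # follow a non-matching edge x -> y
            if y in reached_y:
                continue
            reached_y.add(y)
            x2 = match_y.get(y)        # follow the matching edge y -> x2
            if x2 is not None and x2 not in A:
                A.add(x2)
                queue.append(x2)
    # reached_y is exactly Γ(A), and |Γ(A)| = |A| - (number of free nodes) < |A|
    return A
```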
7.6 Disjoint Paths in Directed and Undirected Graphs

In Section 7.1, we described a flow f as a kind of "traffic" in the network. But our actual definition of a flow has a much more static feel to it: For each edge e, we simply specify a number f(e) saying the amount of flow crossing e. Let's see if we can revive the more dynamic, traffic-oriented picture a bit, and try formalizing the sense in which units of flow "travel" from the source to the sink. From this more dynamic view of flows, we will arrive at something called the s-t Disjoint Paths Problem.

Figure 7.12 The edges in the figure all carry one unit of flow. The path P of dashed edges is one possible path in the proof of (7.42). (Flow around a cycle can be zeroed out.)

The proof of (7.42) will not only give us the maximum number of edge-disjoint paths, but the paths as well. We can summarize (7.41) and (7.42) in the following result.

(7.43) There are k edge-disjoint paths in a directed graph G from s to t if and only if the value of the maximum s-t flow in G is at least k.

Notice also how the proof of (7.42) provides an actual procedure for constructing the k paths, given an integer-valued maximum flow in G. This procedure is sometimes referred to as a path decomposition of the flow, since it "decomposes" the flow into a constituent set of paths. Hence we have shown that our flow-based algorithm finds the maximum number of edge-disjoint s-t paths and also gives us a way to construct the actual paths.

Bounding the Running Time  For this flow problem, C = Σ_{e out of s} c_e <= |V| = n, as there are at most |V| edges out of s, each of which has capacity 1. Thus, by using the O(mC) bound in (7.5), we get an integer maximum flow in O(mn) time.

The path decomposition procedure in the proof of (7.42), which produces the paths themselves, can also be made to run in O(mn) time. To see this, note that this procedure, with a little care, can produce a single path from s to t using at most constant work per edge in the graph, and hence in O(m) time. Since there can be at most n - 1 edge-disjoint paths from s to t (each must use a different edge out of s), it therefore takes time O(mn) to produce all the paths.

In summary, we have shown

(7.44) The Ford-Fulkerson Algorithm can be used to find a maximum set of edge-disjoint s-t paths in a directed graph G in O(mn) time.
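The path decomposition procedure is simple enough to sketch directly. The following Python fragment (our own illustration, not the book's pseudocode) peels edge-disjoint s-t paths off a 0/1-valued flow, zeroing out any cycle it happens to walk into, as in the proof of (7.42).

```python
def decompose_unit_flow(flow_edges, s, t):
    """Decompose a 0/1-valued s-t flow, given as the set of directed
    edges carrying one unit of flow, into edge-disjoint s-t paths."""
    out = {}
    for u, v in flow_edges:
        out.setdefault(u, []).append(v)

    paths = []
    while out.get(s):                    # one path per unit of flow out of s
        walk, pos = [s], {s: 0}          # current walk and node positions on it
        while walk[-1] != t:
            u = walk[-1]
            if not out.get(u):
                raise ValueError("input is not a valid 0/1 s-t flow")
            v = out[u].pop()             # consume one unit of flow on (u, v)
            if v in pos:                 # walked into a cycle: zero it out
                for w in walk[pos[v] + 1:]:
                    del pos[w]
                del walk[pos[v] + 1:]
            else:
                pos[v] = len(walk)
                walk.append(v)
        paths.append(walk)
    return paths
```

Each flow-carrying edge is consumed at most once, which is the source of the O(m) bound per path mentioned above.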
A Version of the Max-Flow Min-Cut Theorem for Disjoint Paths  The Max-Flow Min-Cut Theorem (7.13) can be used to give the following characterization of the maximum number of edge-disjoint s-t paths. We say that a set F ⊆ E of edges separates s from t if, after removing the edges F from the graph G, no s-t paths remain in the graph.

(7.45) In every directed graph with nodes s and t, the maximum number of edge-disjoint s-t paths is equal to the minimum number of edges whose removal separates s from t.

Proof. If the removal of a set F ⊆ E of edges separates s from t, then each s-t path must use at least one edge from F, and hence the number of edge-disjoint s-t paths is at most |F|.

To prove the other direction, we will use the Max-Flow Min-Cut Theorem (7.13). By (7.43) the maximum number of edge-disjoint paths is the value ν of the maximum s-t flow. Now (7.13) states that there is an s-t cut (A, B) with capacity ν. Let F be the set of edges that go from A to B. Each edge has capacity 1, so |F| = ν and, by the definition of an s-t cut, removing these ν edges from G separates s from t. ∎

This result, then, can be viewed as the natural special case of the Max-Flow Min-Cut Theorem in which all edge capacities are equal to 1. In fact, this special case was proved by Menger in 1927, much before the full Max-Flow Min-Cut Theorem was formulated and proved; for this reason, (7.45) is often called Menger's Theorem. If we think about it, the proof of Hall's Theorem (7.40) for bipartite matchings involves a reduction to a graph with unit-capacity edges, and so it can be proved using Menger's Theorem rather than the general Max-Flow Min-Cut Theorem. In other words, Hall's Theorem is really a special case of Menger's Theorem, which in turn is a special case of the Max-Flow Min-Cut Theorem. And the history follows this progression, since they were discovered in this order, a few decades apart.²

² In fact, in an interesting retrospective written in 1981, Menger relates his version of the story of how he first explained his theorem to König, one of the independent discoverers of Hall's Theorem. You might think that König, having thought a lot about these problems, would have immediately grasped why Menger's generalization of his theorem was true, and perhaps even considered it obvious. But, in fact, the opposite happened; König didn't believe it could be right and stayed up all night searching for a counterexample. The next day, exhausted, he sought out Menger and asked him for the proof.

Extensions: Disjoint Paths in Undirected Graphs

Finally, we consider the disjoint paths problem in an undirected graph G. Despite the fact that our graph G is now undirected, we can use the maximum-flow algorithm to obtain edge-disjoint paths in G. The idea is quite simple: We replace each undirected edge (u, v) in G by two directed edges (u, v) and
(v, u), and in this way create a directed version G' of G. (We may delete the edges into s and out of t, since they are not useful.) Now we want to use the Ford-Fulkerson Algorithm in the resulting directed graph. However, there is an important issue we need to deal with first. Notice that two paths P1 and P2 may be edge-disjoint in the directed graph and yet share an edge in the undirected graph G: This happens if P1 uses directed edge (u, v) while P2 uses edge (v, u). However, it is not hard to see that there always exists a maximum flow in any network that uses at most one out of each pair of oppositely directed edges.

(7.46) In any flow network, there is a maximum flow f where for all opposite directed edges e = (u, v) and e' = (v, u), either f(e) = 0 or f(e') = 0. If the capacities of the flow network are integral, then there also is such an integral maximum flow.

Proof. We consider any maximum flow f, and we modify it to satisfy the claimed condition. Assume e = (u, v) and e' = (v, u) are opposite directed edges, and f(e) ≠ 0, f(e') ≠ 0. Let δ be the smaller of these values, and modify f by decreasing the flow value on both e and e' by δ. The resulting flow f' is feasible, has the same value as f, and its value on one of e and e' is 0. ∎

Now we can use the Ford-Fulkerson Algorithm and the path decomposition procedure from (7.42) to obtain edge-disjoint paths in the undirected graph G.

(7.47) There are k edge-disjoint paths in an undirected graph G from s to t if and only if the maximum value of an s-t flow in the directed version G' of G is at least k. Furthermore, the Ford-Fulkerson Algorithm can be used to find a maximum set of disjoint s-t paths in an undirected graph G in O(mn) time.

The undirected analogue of (7.45) is also true, as in any s-t cut, at most one of the two oppositely directed edges can cross from the s-side to the t-side of the cut (for if one crosses, then the other must go from the t-side to the s-side).

(7.48) In every undirected graph with nodes s and t, the maximum number of edge-disjoint s-t paths is equal to the minimum number of edges whose removal separates s from t.
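As a concrete illustration of this reduction (our own sketch, not from the text), the following fragment builds the directed version G' and applies the cancellation step from the proof of (7.46) to an integer flow; the path decomposition from (7.42) can then be run on the surviving flow edges.

```python
def directed_version(undirected_edges, s, t):
    """Replace each undirected edge {u, v} by the two directed edges
    (u, v) and (v, u), each of capacity 1, dropping edges into s and
    out of t as suggested in the text."""
    cap = {}
    for u, v in undirected_edges:
        for a, b in ((u, v), (v, u)):
            if b != s and a != t:
                cap[(a, b)] = 1
    return cap

def cancel_opposite_flow(flow):
    """Apply the proof of (7.46): whenever both (u, v) and (v, u) carry
    flow, reduce both by the smaller amount; the flow value is unchanged."""
    for (u, v) in list(flow):
        e, e_rev = (u, v), (v, u)
        if flow.get(e, 0) > 0 and flow.get(e_rev, 0) > 0:
            delta = min(flow[e], flow[e_rev])
            flow[e] -= delta
            flow[e_rev] -= delta
    return flow
```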
7.7 Extensions to the Maximum-Flow Problem

Much of the power of the Maximum-Flow Problem has essentially nothing to do with the fact that it models traffic in a network. Rather, it lies in the fact that many problems with a nontrivial combinatorial search component can be solved in polynomial time because they can be reduced to the problem of finding a maximum flow or a minimum cut in a directed graph.

Bipartite Matching is a natural first application in this vein; in the coming sections, we investigate a range of further applications. To begin with, we stay with the picture of flow as an abstract kind of "traffic," and look for more general conditions we might impose on this traffic. These more general conditions will turn out to be useful for some of our further applications.

In particular, we focus on two generalizations of maximum flow. We will see that both can be reduced to the basic Maximum-Flow Problem.

The Problem: Circulations with Demands

One simplifying aspect of our initial formulation of the Maximum-Flow Problem is that we had only a single source s and a single sink t. Now suppose that there can be a set S of sources generating flow, and a set T of sinks that can absorb flow. As before, there is an integer capacity on each edge.

With multiple sources and sinks, it is a bit unclear how to decide which source or sink to favor in a maximization problem. So instead of maximizing the flow value, we will consider a problem where sources have fixed supply values and sinks have fixed demand values, and our goal is to ship flow from nodes with available supply to those with given demands. Imagine, for example, that the network represents a system of highways or railway lines in which we want to ship products from factories (which have supply) to retail outlets (which have demand). In this type of problem, we will not be seeking to maximize a particular value; rather, we simply want to satisfy all the demand using the available supply.

Thus we are given a flow network G = (V, E) with capacities on the edges. Now, associated with each node v ∈ V is a demand d_v. If d_v > 0, this indicates that the node v has a demand of d_v for flow; the node is a sink, and it wishes to receive d_v units more flow than it sends out. If d_v < 0, this indicates that v has a supply of -d_v; the node is a source, and it wishes to send out -d_v units more flow than it receives. If d_v = 0, then the node v is neither a source nor a sink. We will assume that all capacities and demands are integers.

We use S to denote the set of all nodes with negative demand and T to denote the set of all nodes with positive demand. Although a node v in S wants to send out more flow than it receives, it will be okay for it to have flow that enters on incoming edges; it should just be more than compensated by the flow that leaves v on outgoing edges. The same applies (in the opposite direction) to the set T.
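To make this generalization concrete, here is a minimal sketch (ours, not the book's) of the reduction pictured in Figure 7.13(b): a super-source is attached to every node with negative demand and a super-sink to every node with positive demand. The node names "s*" and "t*" are our own; a feasible circulation exists exactly when the maximum s*-t* flow saturates all the new edges.

```python
def circulation_to_max_flow(cap, demand):
    """Reduce a circulation-with-demands instance to a single-source,
    single-sink Maximum-Flow instance, in the spirit of Figure 7.13(b).

    cap maps directed edges (u, v) to integer capacities; demand maps
    each node v to d_v (negative for sources, positive for sinks).
    A feasible circulation exists iff the maximum s*-t* flow value
    equals D, the sum of the positive demands."""
    flow_cap = dict(cap)
    for v, d in demand.items():
        if d < 0:                      # source: attach it to the super-source
            flow_cap[("s*", v)] = -d
        elif d > 0:                    # sink: attach it to the super-sink
            flow_cap[(v, "t*")] = d
    D = sum(d for d in demand.values() if d > 0)
    return flow_cap, "s*", "t*", D
```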
Figure 7.13 (a) An instance of the Circulation Problem together with a solution: Numbers inside the nodes are demands; numbers labeling the edges are capacities and flow values, with the flow values inside boxes. (b) The result of reducing this instance to an equivalent instance of the Maximum-Flow Problem.

Thanks to (7.49), we know that

Σ_{v: d_v > 0} d_v = Σ_{v: d_v < 0} (-d_v).

Figure 7.15 (a) An instance of the Circulation Problem with lower bounds: Numbers inside the nodes are demands, and numbers labeling the edges are capacities. We also assign a lower bound of 2 to one of the edges. (b) The result of reducing this instance to an equivalent instance of the Circulation Problem without lower bounds. (Eliminating a lower bound from an edge.)

(7.52) There is a feasible circulation in G if and only if there is a feasible circulation in G'. If all demands, capacities, and lower bounds in G are integers, and there is a feasible circulation, then there is a feasible circulation that is integer-valued.

Proof. First suppose there is a circulation f' in G'. Define a circulation f in G by f(e) = f'(e) + ℓ_e. Then f satisfies the capacity conditions in G, and

f^in(v) - f^out(v) = Σ_{e into v} (ℓ_e + f'(e)) - Σ_{e out of v} (ℓ_e + f'(e)) = L_v + (d_v - L_v) = d_v,

so it satisfies the demand conditions in G as well.

Conversely, suppose there is a circulation f in G, and define a circulation f' in G' by f'(e) = f(e) - ℓ_e. Then f' satisfies the capacity conditions in G', and

(f')^in(v) - (f')^out(v) = Σ_{e into v} (f(e) - ℓ_e) - Σ_{e out of v} (f(e) - ℓ_e) = d_v - L_v,

so it satisfies the demand conditions in G' as well. ∎
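The construction used in (7.52) is mechanical, and a short sketch (ours, not the book's) makes the bookkeeping explicit: each edge keeps capacity c_e - ℓ_e, and the ℓ_e compulsory units are charged to the demands of the edge's endpoints.

```python
def eliminate_lower_bounds(cap, lower, demand):
    """Reduce circulation with lower bounds to circulation without them,
    following the construction behind (7.52).

    cap[(u, v)] and lower[(u, v)] are the capacity and lower bound of
    edge (u, v); demand[v] is d_v.  A circulation f' in the returned
    instance G' corresponds to f(e) = f'(e) + lower[e] in G."""
    new_cap = {}
    new_demand = dict(demand)
    for (u, v), c in cap.items():
        lb = lower.get((u, v), 0)
        new_cap[(u, v)] = c - lb       # remaining free capacity on the edge
        # the lb compulsory units leave u and enter v
        new_demand[u] = new_demand.get(u, 0) + lb
        new_demand[v] = new_demand.get(v, 0) - lb
    return new_cap, new_demand
```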
7.8 Survey Design

Many problems that arise in applications can, in fact, be solved efficiently by a reduction to Maximum Flow, but it is often difficult to discover when such a reduction is possible. In the next few sections, we give several paradigmatic examples of such problems. The goal is to indicate what such reductions tend to look like and to illustrate some of the most common uses of flows and cuts in the design of efficient combinatorial algorithms. One point that will emerge is the following: Sometimes the solution one wants involves the computation of a maximum flow, and sometimes it involves the computation of a minimum cut; both flows and cuts are very useful algorithmic tools.

We begin with a basic application that we call survey design, a simple version of a task faced by many companies wanting to measure customer satisfaction. More generally, the problem illustrates how the construction used to solve the Bipartite Matching Problem arises naturally in any setting where we want to carefully balance decisions across a set of options--in this case, designing questionnaires by balancing relevant questions across a population of consumers.

The Problem

A major issue in the burgeoning field of data mining is the study of consumer preference patterns. Consider a company that sells k products and has a database containing the purchase histories of a large number of customers. (Those of you with "Shopper's Club" cards may be able to guess how this data gets collected.) The company wishes to conduct a survey, sending customized questionnaires to a particular group of n of its customers, to try to determine which products people like overall.

Here are the guidelines for designing the survey.

o Each customer will receive questions about a certain subset of the products.
o A customer can only be asked about products that he or she has purchased.
o To make each questionnaire informative, but not too long so as to discourage participation, each customer i should be asked about a number of products between c_i and c_i'.
o Finally, to collect sufficient data about each product, there must be between p_j and p_j' distinct customers asked about each product j.

More formally, the input to the Survey Design Problem consists of a bipartite graph G whose nodes are the customers and the products, and there is an edge between customer i and product j if he or she has ever purchased product j. Further, for each customer i = 1, ..., n, we have limits c_i <= c_i' on the number of products he or she can be asked about; for each product j = 1, ..., k, we have limits p_j <= p_j' on the number of distinct customers that have to be asked about it. The problem is to decide if there is a way to design a questionnaire for each customer so as to satisfy all these conditions.
Designing the Algorithm

We will solve this problem by reducing it to a circulation problem on a flow network G' with demands and lower bounds as shown in Figure 7.16. To obtain the graph G' from G, we orient the edges of G from customers to products, add nodes s and t with edges (s, i) for each customer i = 1, ..., n, edges (j, t) for each product j = 1, ..., k, and an edge (t, s). The circulation in this network will correspond to the way in which questions are asked. The flow on the edge (s, i) is the number of products included on the questionnaire for customer i, so this edge will have a capacity of c_i' and a lower bound of c_i. The flow on the edge (j, t) will correspond to the number of customers who were asked about product j, so this edge will have a capacity of p_j' and a lower bound of p_j. Each edge (i, j) going from a customer to a product he or she bought has capacity 1, and 0 as the lower bound. The flow carried by the edge (t, s) corresponds to the overall number of questions asked. We can give this edge a capacity of Σ_i c_i' and a lower bound of Σ_i c_i. All nodes have demand 0.

Our algorithm is simply to construct this network G' and check whether it has a feasible circulation. We now formulate a claim that establishes the correctness of this algorithm.

Figure 7.16 The Survey Design Problem can be reduced to the problem of finding a feasible circulation: Flow passes from customers (with capacity bounds indicating how many questions they can be asked) to products (with capacity bounds indicating how many questions should be asked about each product).

Analyzing the Algorithm

(7.53) The graph G' just constructed has a feasible circulation if and only if there is a feasible way to design the survey.

Proof. The construction above immediately suggests a way to turn a survey design into the corresponding flow. The edge (i, j) will carry one unit of flow if customer i is asked about product j in the survey, and will carry no flow otherwise. The flow on the edge (s, i) is the number of questions asked of customer i, the flow on the edge (j, t) is the number of customers who were asked about product j, and finally, the flow on edge (t, s) is the overall number of questions asked. This flow satisfies the 0 demands; that is, there is flow conservation at every node. If the survey satisfies these rules, then the corresponding flow satisfies the capacities and lower bounds.

Conversely, if the Circulation Problem is feasible, then by (7.52) there is a feasible circulation that is integer-valued, and such an integer-valued circulation naturally corresponds to a feasible survey design. Customer i will be surveyed about product j if and only if the edge (i, j) carries a unit of flow. ∎
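The construction is easy to set up in code. The following Python sketch (ours, not the book's pseudocode) builds the capacity and lower-bound functions of G'; it assumes customer and product identifiers are distinct, and it leaves the choice of feasible-circulation solver open.

```python
def survey_design_network(purchases, c_lo, c_hi, p_lo, p_hi):
    """Build the circulation instance G' for the Survey Design Problem.

    purchases[i] is the set of products customer i has bought; customer i
    must be asked between c_lo[i] and c_hi[i] questions, and product j
    must be asked about by between p_lo[j] and p_hi[j] customers.
    Returns (capacity, lower_bound) dictionaries; all node demands are 0.
    The node names "s" and "t" are our own choice."""
    cap, lower = {}, {}
    for i, prods in purchases.items():
        cap[("s", i)], lower[("s", i)] = c_hi[i], c_lo[i]
        for j in prods:
            cap[(i, j)], lower[(i, j)] = 1, 0    # ask i about j at most once
    for j in p_hi:
        cap[(j, "t")], lower[(j, "t")] = p_hi[j], p_lo[j]
    cap[("t", "s")] = sum(c_hi.values())         # overall number of questions
    lower[("t", "s")] = sum(c_lo.values())
    return cap, lower
```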
7.9 Airline Scheduling

The computational problems faced by the nation's large airline carriers are almost too complex to even imagine. They have to produce schedules for thousands of routes each day that are efficient in terms of equipment usage, crew allocation, customer satisfaction, and a host of other factors--all in the face of unpredictable issues like weather and breakdowns. It's not surprising that they're among the largest consumers of high-powered algorithmic techniques.

Covering these computational problems in any realistic level of detail would take us much too far afield. Instead, we'll discuss a "toy" problem that captures, in a very clean way, some of the resource allocation issues that arise in a context such as this. And, as is common in this book, the toy problem will be much more useful for our purposes than the "real" problem, for the solution to the toy problem involves a very general technique that can be applied in a wide range of situations.

The Problem

Suppose you're in charge of managing a fleet of airplanes and you'd like to create a flight schedule for them. Here's a very simple model for this. Your market research has identified a set of m particular flight segments that would be very lucrative if you could serve them; flight segment j is specified by four parameters: its origin airport, its destination airport, its departure time, and its arrival time. Figure 7.17(a) shows a simple example, consisting of six flight segments you'd like to serve with your planes over the course of a single day:

(1) Boston (depart 6 A.M.) - Washington DC (arrive 7 A.M.)
(2) Philadelphia (depart 7 A.M.) - Pittsburgh (arrive 8 A.M.)
(3) Washington DC (depart 8 A.M.) - Los Angeles (arrive 11 A.M.)
(4) Philadelphia (depart 11 A.M.) - San Francisco (arrive 2 P.M.)
(5) San Francisco (depart 2:15 P.M.) - Seattle (arrive 3:15 P.M.)
(6) Las Vegas (depart 5 P.M.) - Seattle (arrive 6 P.M.)

Note that each segment includes the times you want the flight to serve as well as the airports.

It is possible to use a single plane for a flight segment i, and then later for a flight segment j, provided that

(a) the destination of i is the same as the origin of j, and there's enough time to perform maintenance on the plane in between; or
(b) you can add a flight segment in between that gets the plane from the destination of i to the origin of j with adequate time in between.

For example, assuming an hour for intermediate maintenance time, you could use a single plane for flights (1), (3), and (6) by having the plane sit in Washington, DC, between flights (1) and (3), and then inserting the flight Los Angeles (depart 12 noon) - Las Vegas (1 P.M.) in between flights (3) and (6).

Figure 7.17 (a) A small instance of our simple Airline Scheduling Problem. (b) An expanded graph showing which flights are reachable from which others.

Formulating the Problem  We can model this situation in a very general way as follows, abstracting away from specific rules about maintenance times and intermediate flight segments: We will simply say that flight j is reachable from flight i if it is possible to use the same plane for flight i, and then later for flight j as well. So under our specific rules (a) and (b) above, we can easily determine for each pair i, j whether flight j is reachable from flight i. (Of course, one can easily imagine more complex rules for reachability. For example, the length of maintenance time needed in (a) might depend on the airport; or in (b) we might require that the flight segment you insert be sufficiently profitable on its own.) But the point is that we can handle any set of rules with our definition: The input to the problem will include not just the flight segments, but also a specification of the pairs (i, j) for which a later flight j is reachable from an earlier flight i. These pairs can form an arbitrary directed acyclic graph.

The goal in this problem is to determine whether it's possible to serve all m flights on your original list, using at most k planes total. In order to do this, you need to find a way of efficiently reusing planes for multiple flights.

For example, let's go back to the instance in Figure 7.17 and assume we have k = 2 planes. If we use one of the planes for flights (1), (3), and (6) as proposed above, we wouldn't be able to serve all of flights (2), (4), and (5) with the other (since there wouldn't be enough maintenance time in San Francisco between flights (4) and (5)). However, there is a way to serve all six flights using two planes, via a different solution: One plane serves flights (1), (3), and (5) (splicing in an LAX-SFO flight), while the other serves (2), (4), and (6) (splicing in PIT-PHL and SFO-LAS).

Designing the Algorithm

We now discuss an efficient algorithm that can solve arbitrary instances of the Airline Scheduling Problem, based on network flow. We will see that flow techniques adapt very naturally to this problem.

The solution is based on the following idea. Units of flow will correspond to airplanes. We will have an edge for each flight, and upper and lower capacity bounds of 1 on these edges to require that exactly one unit of flow crosses this edge. In other words, each flight must be served by one of the planes. If (u_i, v_i) is the edge representing flight i, and (u_j, v_j) is the edge representing flight j, and flight j is reachable from flight i, then we will have an edge from v_i to u_j
with capacity 1; in this way, a unit of flow can traverse (u_i, v_i) and then move directly to (u_j, v_j). Such a construction of edges is shown in Figure 7.17(b).

We extend this to a flow network by including a source and sink; we now give the full construction in detail. The node set of the underlying graph G is defined as follows.

o For each flight i, the graph G will have the two nodes u_i and v_i.
o G will also have a distinct source node s and sink node t.

The edge set of G is defined as follows.

o For each i, there is an edge (u_i, v_i) with a lower bound of 1 and a capacity of 1. (Each flight on the list must be served.)
o For each i and j so that flight j is reachable from flight i, there is an edge (v_i, u_j) with a lower bound of 0 and a capacity of 1. (The same plane can perform flights i and j.)
o For each i, there is an edge (s, u_i) with a lower bound of 0 and a capacity of 1. (Any plane can begin the day with flight i.)
o For each j, there is an edge (v_j, t) with a lower bound of 0 and a capacity of 1. (Any plane can end the day with flight j.)
o There is an edge (s, t) with lower bound 0 and capacity k. (If we have extra planes, we don't need to use them for any of the flights.)

Finally, the node s will have a demand of -k, and the node t will have a demand of k. All other nodes will have a demand of 0.

Our algorithm is to construct the network G and search for a feasible circulation in it. We now prove the correctness of this algorithm.
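For concreteness, here is a sketch (our own, not the book's pseudocode) of the network just described, with flights numbered 1, ..., m and node names of our own choosing.

```python
def airline_network(num_flights, reachable, k):
    """Build the circulation instance for the Airline Scheduling Problem.

    reachable is the set of pairs (i, j) for which flight j is reachable
    from flight i.  Returns the capacities, lower bounds, and node
    demands of the network described above."""
    cap, lower, demand = {}, {}, {"s": -k, "t": k}
    for i in range(1, num_flights + 1):
        u, v = ("u", i), ("v", i)
        cap[(u, v)], lower[(u, v)] = 1, 1      # flight i must be served
        cap[("s", u)], lower[("s", u)] = 1, 0  # a plane may start the day here
        cap[(v, "t")], lower[(v, "t")] = 1, 0  # a plane may end the day here
        demand[u] = demand[v] = 0
    for i, j in reachable:
        e = (("v", i), ("u", j))
        cap[e], lower[e] = 1, 0                # the same plane serves i then j
    cap[("s", "t")], lower[("s", "t")] = k, 0  # unused planes bypass the flights
    return cap, lower, demand
```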
Analyzing the Algorithm

(7.54) There is a way to perform all flights using at most k planes if and only if there is a feasible circulation in the network G.

Proof. First, suppose there is a way to perform all flights using k' <= k planes. The set of flights performed by each individual plane defines a path P in the network G, and we send one unit of flow on each such path P. To satisfy the full demands at s and t, we send k - k' units of flow on the edge (s, t). The resulting circulation satisfies all demand, capacity, and lower bound conditions.

Conversely, consider a feasible circulation in the network G. By (7.52), we know that there is a feasible circulation with integer flow values. Suppose that k' units of flow are sent on edges other than (s, t). Since all other edges have a capacity bound of 1, and the circulation is integer-valued, each such edge that carries flow has exactly one unit of flow on it. We now convert this to a schedule using the same kind of construction we saw in the proof of (7.42), where we converted a flow to a collection of paths. In fact, the situation is easier here since the graph has no cycles. Consider an edge (s, u_i) that carries one unit of flow. It follows by conservation that (u_i, v_i) carries one unit of flow, and that there is a unique edge out of v_i that carries one unit of flow. If we continue in this way, we construct a path P from s to t, so that each edge on this path carries one unit of flow. We can apply this construction to each edge of the form (s, u_i) carrying one unit of flow; in this way, we produce k' paths from s to t, each consisting of edges that carry one unit of flow. Now, for each path P we create in this way, we can assign a single plane to perform all the flights contained in this path. ∎

Extensions: Modeling Other Aspects of the Problem

Airline scheduling consumes countless hours of CPU time in real life. We mentioned at the beginning, however, that our formulation here is really a toy problem; it ignores several obvious factors that would have to be taken into account in these applications. First of all, it ignores the fact that a given plane can only fly a certain number of hours before it needs to be temporarily taken out of service for more significant maintenance. Second, we are making up an optimal schedule for a single day (or at least for a single span of time) as though there were no yesterday or tomorrow; in fact we also need the planes to be optimally positioned for the start of day N + 1 at the end of day N. Third, all these planes need to be staffed by flight crews, and while crews are also reused across multiple flights, a whole different set of constraints operates here, since human beings and airplanes experience fatigue at different rates. And these issues don't even begin to cover the fact that serving any particular flight segment is not a hard constraint; rather, the real goal is to optimize revenue, and so we can pick and choose among many possible flights to include in our schedule (not to mention designing a good fare structure for passengers) in order to achieve this goal.

Ultimately, the message is probably this: Flow techniques are useful for solving problems of this type, and they are genuinely used in practice. Indeed, our solution above is a general approach to the efficient reuse of a limited set of resources in many settings. At the same time, running an airline efficiently in real life is a very difficult problem.

7.10 Image Segmentation

A central problem in image processing is the segmentation of an image into various coherent regions. For example, you may have an image representing a picture of three people standing in front of a complex background scene.
A natural but difficult goal is to identify each of the three people as coherent objects in the scene.

The Problem

One of the most basic problems to be considered along these lines is that of foreground/background segmentation: We wish to label each pixel in an image as belonging to either the foreground of the scene or the background. It turns out that a very natural model here leads to a problem that can be solved efficiently by a minimum cut computation.

Let V be the set of pixels in the underlying image that we're analyzing. We will declare certain pairs of pixels to be neighbors, and use E to denote the set of all pairs of neighboring pixels. In this way, we obtain an undirected graph G = (V, E). We will be deliberately vague on what exactly we mean by a "pixel," or what we mean by the "neighbor" relation. In fact, any graph G will yield an efficiently solvable problem, so we are free to define these notions in any way that we want. Of course, it is natural to picture the pixels as constituting a grid of dots, and the neighbors of a pixel to be those that are directly adjacent to it in this grid, as shown in Figure 7.18(a).

For each pixel i, we have a likelihood a_i that it belongs to the foreground, and a likelihood b_i that it belongs to the background. For our purposes, we will assume that these likelihood values are arbitrary nonnegative numbers provided as part of the problem, and that they specify how desirable it is to have pixel i in the background or foreground. Beyond this, it is not crucial precisely what physical properties of the image they are measuring, or how they were determined.

In isolation, we would want to label pixel i as belonging to the foreground if a_i > b_i, and to the background otherwise. However, decisions that we make about the neighbors of i should affect our decision about i. If many of i's neighbors are labeled "background," for example, we should be more inclined to label i as "background" too; this makes the labeling "smoother" by minimizing the amount of foreground/background boundary. Thus, for each pair (i, j) of neighboring pixels, there is a separation penalty p_ij >= 0 for placing one of i or j in the foreground and the other in the background.

We can now specify our Segmentation Problem precisely, in terms of the likelihood and separation parameters: It is to find a partition of the set of pixels into sets A and B (foreground and background, respectively) so as to maximize

q(A, B) = Σ_{i∈A} a_i + Σ_{j∈B} b_j - Σ_{(i,j)∈E, |A∩{i,j}|=1} p_ij.

Thus we are rewarded for having high likelihood values and penalized for having neighboring pairs (i, j) with one pixel in A and the other in B. The problem, then, is to compute an optimal labeling--a partition (A, B) that maximizes q(A, B).

The sum Σ_i (a_i + b_i) of all the likelihood values is a constant independent of the partition, so maximizing q(A, B) is the same as minimizing this constant minus q(A, B). Thus we see that the maximization of q(A, B) is the same problem as the minimization of the quantity

q'(A, B) = Σ_{i∈B} a_i + Σ_{j∈A} b_j + Σ_{(i,j)∈E, |A∩{i,j}|=1} p_ij.

As for the missing source and the sink, we work by analogy with our constructions in previous sections: We create a new "super-source" s to represent the foreground, and a new "super-sink" t to represent the background. This also gives us a way to deal with the values a_i and b_i that reside at the nodes (whereas minimum cuts can only handle numbers associated with edges). Specifically, we will attach each of s and t to every pixel, and use a_i and b_i to define appropriate capacities on the edges between pixel i and the source and sink respectively.

Finally, to take care of the undirected edges, we model each neighboring pair (i, j) with two directed edges, (i, j) and (j, i), as we did in the undirected Disjoint Paths Problem. We will see that this works very well here too, since in any s-t cut, at most one of these two oppositely directed edges can cross from the s-side to the t-side of the cut (for if one does, then the other must go from the t-side to the s-side).

Specifically, we define the following flow network G' = (V', E') shown in Figure 7.18(b). The node set V' consists of the set V of pixels, together with two additional nodes s and t. For each neighboring pair of pixels i and j, we add directed edges (i, j) and (j, i), each with capacity p_ij. For each pixel i, we add an edge (s, i) with capacity a_i and an edge (i, t) with capacity b_i.

Now, an s-t cut (A, B) corresponds to a partition of the pixels into sets A and B. Let's consider how the capacity of the cut c(A, B) relates to the quantity q'(A, B) that we are trying to minimize. We can group the edges that cross the cut (A, B) into three natural categories.

o Edges (s, j), where j ∈ B; this edge contributes a_j to the capacity of the cut.
o Edges (i, t), where i ∈ A; this edge contributes b_i to the capacity of the cut.
o Edges (i, j) where i ∈ A and j ∈ B; this edge contributes p_ij to the capacity of the cut.

Figure 7.19 illustrates what each of these three kinds of edges looks like relative to a cut, on an example with four pixels.

Figure 7.19 An s-t cut on a graph constructed from four pixels. Note how the three types of terms in the expression for q'(A, B) are captured by the cut.

If we add up the contributions of these three kinds of edges, we get

c(A, B) = Σ_{j∈B} a_j + Σ_{i∈A} b_i + Σ_{(i,j)∈E, |A∩{i,j}|=1} p_ij = q'(A, B).

So everything fits together perfectly. The flow network is set up so that the capacity of the cut (A, B) exactly measures the quantity q'(A, B): The three kinds of edges crossing the cut (A, B), as we have just defined them (edges from the source, edges to the sink, and edges involving neither the source nor the sink), correspond to the three kinds of terms in the expression for q'(A, B).

Thus, if we want to minimize q'(A, B) (since we have argued earlier that this is equivalent to maximizing q(A, B)), we just have to find a cut of minimum capacity. And this latter problem, of course, is something that we know how to solve efficiently.

Thus, through solving this minimum-cut problem, we have an optimal algorithm in our model of foreground/background segmentation.

(7.55) The solution to the Segmentation Problem can be obtained by a minimum-cut algorithm in the graph G' constructed above. For a minimum cut (A', B'), the partition (A, B) obtained by deleting s and t maximizes the segmentation value q(A, B).
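The network G' is simple to assemble. The following Python sketch (ours, not the book's pseudocode) builds its capacity function; the node names "s" and "t" are our own choice, and solving the resulting minimum-cut instance is left to whatever max-flow routine is at hand.

```python
def segmentation_network(pixels, neighbors, a, b):
    """Build the flow network G' for foreground/background segmentation.

    neighbors maps each unordered pair (i, j) of neighboring pixels to
    the separation penalty p_ij; a and b map each pixel to its likelihood
    values.  A minimum s-t cut (A', B') in this network yields the
    optimal labeling A = A' - {s}."""
    cap = {}
    for i in pixels:
        cap[("s", i)] = a[i]           # cut if i is placed in the background
        cap[(i, "t")] = b[i]           # cut if i is placed in the foreground
    for (i, j), p in neighbors.items():
        cap[(i, j)] = p                # separating i and j costs p_ij,
        cap[(j, i)] = p                # whichever way the cut crosses the pair
    return cap
```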
7.11 Project Selection

Large (and small) companies are constantly faced with a balancing act between projects that can yield revenue, and the expenses needed for activities that can support these projects. Suppose, for example, that the telecommunications giant CluNet is assessing the pros and cons of a project to offer some new type of high-speed access service to residential customers. Marketing research shows that the service will yield a good amount of revenue, but it must be weighed against some costly preliminary projects that would be needed in order to make this service possible: increasing the fiber-optic capacity in the core of their network, and buying a newer generation of high-speed routers.

What makes these types of decisions particularly tricky is that they interact in complex ways: in isolation, the revenue from the high-speed access service might not be enough to justify modernizing the routers; however, once the company has modernized the routers, they'll also be in a position to pursue a lucrative additional project with their corporate customers; and maybe this additional project will tip the balance. And these interactions chain together: the corporate project actually would require another expense, but this in turn would enable two other lucrative projects--and so forth. In the end, the question is: Which projects should be pursued, and which should be passed up? It's a basic issue of balancing costs incurred with profitable opportunities that are made possible.

The Problem

Here's a very general framework for modeling a set of decisions such as this. There is an underlying set P of projects, and each project i ∈ P has an associated revenue p_i, which can either be positive or negative. (In other words, each of the lucrative opportunities and costly infrastructure-building steps in our example above will be referred to as a separate project.) Certain projects are prerequisites for other projects, and we model this by an underlying directed acyclic graph G = (P, E). The nodes of G are the projects, and there is an edge (i, j) to indicate that project i can only be selected if project j is selected as well. Note that a project i can have many prerequisites, and there can be many projects that have project j as one of their prerequisites. A set of projects A ⊆ P is feasible if the prerequisite of every project in A also belongs to A: for each i ∈ A, and each edge (i, j) ∈ E, we also have j ∈ A. We will refer to requirements of this form as precedence constraints. The profit of a set of projects is defined to be

profit(A) = Σ_{i∈A} p_i.

The Project Selection Problem is to select a feasible set of projects with maximum profit.

This problem also became a hot topic of study in the mining literature, starting in the early 1960s; here it was called the Open-Pit Mining Problem.³ Open-pit mining is a surface mining operation in which blocks of earth are extracted from the surface to retrieve the ore contained in them. Before the mining operation begins, the entire area is divided into a set P of blocks, and the net value p_i of each block is estimated: This is the value of the ore minus the processing costs, for this block considered in isolation. Some of these net values will be positive, others negative. The full set of blocks has precedence constraints that essentially prevent blocks from being extracted before others on top of them are extracted. The Open-Pit Mining Problem is to determine the most profitable set of blocks to extract, subject to the precedence constraints. This problem falls into the framework of project selection--each block corresponds to a separate project.

³ In contrast to the field of data mining, which has motivated several of the problems we considered earlier, we're talking here about actual mining, where you dig things out of the ground.

Designing the Algorithm

Here we will show that the Project Selection Problem can be solved by reducing it to a minimum-cut computation on an extended graph G', defined analogously to the graph we used in Section 7.10 for image segmentation. The idea is to construct G' from G in such a way that the source side of a minimum cut in G' will correspond to an optimal set of projects to select.

To form the graph G', we add a new source s and a new sink t to the graph G as shown in Figure 7.20. For each node i ∈ P with p_i > 0, we add an edge (s, i) with capacity p_i. For each node i ∈ P with p_i < 0, we add an edge (i, t) with capacity -p_i. We will set the capacities on the edges in G later. However, we can already see that the capacity of the cut ({s}, P ∪ {t}) is C = Σ_{i∈P: p_i>0} p_i, so the maximum-flow value in this network is at most C.

We want to ensure that if (A', B') is a minimum cut in this graph, then A = A' - {s} obeys the precedence constraints; that is, if the node i ∈ A has an edge (i, j) ∈ E, then we must have j ∈ A. The conceptually cleanest way to ensure this is to give each of the edges in G capacity of ∞. We haven't previously formalized what an infinite capacity would mean, but there is no problem in doing this: it is simply an edge for which the capacity condition imposes no upper bound at all. The algorithms of the previous sections, as well as the Max-Flow Min-Cut Theorem, carry over to handle infinite capacities. However, we can also avoid bringing in the notion of infinite capacities by
Figure 7.20 The flow graph used to solve the Project Selection Problem. A possible minimum-capacity cut is shown on the right.

simply assigning each of these edges a capacity that is "effectively infinite." In our context, giving each of these edges a capacity of C + 1 would accomplish this: The maximum possible flow value in G' is at most C, and so no minimum cut can contain an edge with capacity above C. In the description below, it will not matter which of these options we choose.

We can now state the algorithm: We compute a minimum cut (A', B') in G', and we declare A' - {s} to be the optimal set of projects. We now turn to proving that this algorithm indeed gives the optimal solution.
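The construction is easy to write down explicitly. The following Python sketch (ours, not the book's pseudocode) builds G', using the "effectively infinite" capacity C + 1 on the prerequisite edges; node names "s" and "t" are our own choice.

```python
def project_selection_network(revenue, prereq_edges):
    """Build the graph G' for the Project Selection Problem.

    revenue[i] is p_i; prereq_edges contains the precedence edges (i, j),
    meaning project i requires project j.  The optimal set of projects is
    A' - {s} for a minimum s-t cut (A', B') of the returned network."""
    C = sum(p for p in revenue.values() if p > 0)
    cap = {}
    for i, p in revenue.items():
        if p > 0:
            cap[("s", i)] = p          # profitable project: edge from the source
        elif p < 0:
            cap[(i, "t")] = -p         # costly project: edge to the sink
    for i, j in prereq_edges:
        cap[(i, j)] = C + 1            # never crosses a cut of capacity <= C
    return cap, C
```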
Analyzing the Algorithm

First consider a set of projects A that satisfies the precedence constraints. Let A' = A ∪ {s} and B' = (P - A) ∪ {t}, and consider the s-t cut (A', B'). If the set A satisfies the precedence constraints, then no edge (i, j) ∈ E crosses this cut, as shown in Figure 7.20. The capacity of the cut can be expressed as follows.

(7.56) The capacity of the cut (A', B'), as defined from a project set A satisfying the precedence constraints, is c(A', B') = C - Σ_{i∈A} p_i.

Proof. Edges of G' can be divided into three categories: those corresponding to the edge set E of G, those leaving the source s, and those entering the sink t. Because A satisfies the precedence constraints, the edges in E do not cross the cut (A', B'), and hence do not contribute to its capacity. The edges entering the sink t contribute

Σ_{i∈A and p_i<0} (-p_i)

to the capacity of the cut, and the edges leaving the source s contribute

Σ_{i∉A and p_i>0} p_i.

Using the definition of C, we can rewrite this latter quantity as C - Σ_{i∈A and p_i>0} p_i. The capacity of the cut (A', B') is the sum of these two terms, which is

Σ_{i∈A and p_i<0} (-p_i) + C - Σ_{i∈A and p_i>0} p_i = C - Σ_{i∈A} p_i,

as claimed. ∎

Next, recall that edges of G have capacity more than C = Σ_{i∈P: p_i>0} p_i, and so these edges cannot cross a cut of capacity at most C. This implies that such cuts define feasible sets of projects.

(7.57) If (A', B') is a cut with capacity at most C, then the set A = A' - {s} satisfies the precedence constraints.

Now we can prove the main goal of our construction, that the minimum cut in G' determines the optimum set of projects. Putting the previous two claims together, we see that the cuts (A', B') of capacity at most C are in one-to-one correspondence with feasible sets of projects A = A' - {s}. The capacity of such a cut (A', B') is

c(A', B') = C - profit(A).

The capacity value C is a constant, independent of the cut (A', B'), so the cut with minimum capacity corresponds to the set of projects A with maximum profit. We have therefore proved the following.
7.12 Baseball Elimination

Over on the radio side the producer's saying, "See that thing in the paper last week about Einstein? ... Some reporter asked him to figure out the mathematics of the pennant race. You know, one team wins so many of their remaining games, the other teams win this number or that number. What are the myriad possibilities? Who's got the edge?"

"The hell does he know?"

"Apparently not much. He picked the Dodgers to eliminate the Giants last Friday."

--Don DeLillo, Underworld

The Problem

Suppose you're a reporter for the Algorithmic Sporting News, and the following situation arises late one September. There are four baseball teams trying to finish in first place in the American League Eastern Division; let's call them New York, Baltimore, Toronto, and Boston. Currently, each team has the following number of wins:

New York: 92, Baltimore: 91, Toronto: 91, Boston: 90.

There are five games left in the season: These consist of all possible pairings of the four teams above, except for New York and Boston.

The question is: Can Boston finish with at least as many wins as every other team in the division (that is, finish in first place, possibly in a tie)?

If you think about it, you realize that the answer is no. One argument is the following. Clearly, Boston must win both its remaining games and New York must lose both its remaining games. But this means that Baltimore and Toronto will both beat New York; so then the winner of the Baltimore-Toronto game will end up with the most wins.

Here's an argument that avoids this kind of case analysis. Boston can finish with at most 92 wins. Cumulatively, the other three teams have 274 wins currently, and their three games against each other will produce exactly three more wins, for a final total of 277. But 277 wins over three teams means that one of them must have ended up with more than 92 wins.

So now you might start wondering: (i) Is there an efficient algorithm to determine whether a team has been eliminated from first place? And (ii) whenever a team has been eliminated from first place, is there an "averaging" argument like this that proves it?

In more concrete notation, suppose we have a set S of teams, and for each x ∈ S, its current number of wins is w_x. Also, for two teams x, y ∈ S, they still have to play g_xy games against one another. Finally, we are given a specific team z.

We will use maximum-flow techniques to achieve the following two things. First, we give an efficient algorithm to decide whether z has been eliminated from first place--or, to put it in positive terms, whether it is possible to choose outcomes for all the remaining games in such a way that the team z ends with at least as many wins as every other team in S. Second, we prove the following clean characterization theorem for baseball elimination--essentially, that there is always a short "proof" when a team has been eliminated.

(7.59) Suppose that team z has indeed been eliminated. Then there exists a "proof" of this fact of the following form:

o z can finish with at most m wins.
o There is a set of teams T ⊆ S so that

Σ_{x∈T} w_x + Σ_{x,y∈T} g_xy > m|T|.

(And hence one of the teams in T must end with strictly more than m wins.)

As a second, more complex illustration of how the averaging argument in (7.59) works, consider the following example. Suppose we have the same four teams as before, but now the current number of wins is

New York: 90, Baltimore: 88, Toronto: 87, Boston: 79.

The remaining games are as follows. Boston still has four games against each of the other three teams. Baltimore has one more game against each of New York and Toronto. And finally, New York and Toronto still have six games left to play against each other. Clearly, things don't look good for Boston, but is it actually eliminated?

The answer is yes; Boston has been eliminated. To see this, first note that Boston can end with at most 91 wins; and now consider the set of teams T = {New York, Toronto}. Together New York and Toronto already have 177 wins; their six remaining games will result in a total of 183; and 183/2 > 91. This means that one of them must end up with more than 91 wins, and so Boston can't finish in first. Interestingly, in this instance the set of all three teams ahead of Boston cannot constitute a similar proof: All three teams taken together have a total of 265 wins with 8 games left among them; this is a total of 273, and 273/3 = 91 -- not enough by itself to prove that Boston couldn't end up in a multi-way tie for first. So it's crucial for the averaging argument that we choose the set T consisting just of New York and Toronto, and omit Baltimore.
Designing and Analyzing the Algorithm

We begin by constructing a flow network that provides an efficient algorithm for determining whether z has been eliminated. Then, by examining the minimum cut in this network, we will prove (7.59).

Clearly, if there's any way for z to end up in first place, we should have z win all its remaining games. Let's suppose that this leaves it with m wins. We now want to carefully allocate the wins from all remaining games so that no other team ends with more than m wins. Allocating wins in this way can be solved by a maximum-flow computation, via the following basic idea. We have a source s from which all wins emanate. The ith win can pass through one of the two teams involved in the ith game. We then impose a capacity constraint saying that at most m - w_x wins can pass through team x.

More concretely, we construct the following flow network G, as shown in Figure 7.21. First, let S' = S - {z}, and let g* = Σ_{x,y∈S'} g_xy--the total number of games left between all pairs of teams in S'. We include nodes s and t, a node v_x for each team x ∈ S', and a node u_xy for each pair of teams x, y ∈ S' with a nonzero number of games left to play against each other. We have the following edges.

o Edges (s, u_xy) (wins emanate from s);
o Edges (u_xy, v_x) and (u_xy, v_y) (only x or y can win a game that they play against each other); and
o Edges (v_x, t) (wins are absorbed at t).

Let's consider what capacities we want to place on these edges. We want wins to flow from s to u_xy at saturation, so we give (s, u_xy) a capacity of g_xy. We want to ensure that team x cannot win more than m - w_x games, so we give the edge (v_x, t) a capacity of m - w_x. Finally, an edge of the form (u_xy, v_x) should have at least g_xy units of capacity, so that it has the ability to transport all the wins from u_xy on to v_x; in fact, our analysis will be the cleanest if we give it infinite capacity. (We note that the construction still works even if this edge is given only g_xy units of capacity, but the proof of (7.59) will become a little more complicated.)
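For concreteness, here is a sketch (our own, not the book's pseudocode) of this construction. It assumes m >= w_x for every team x other than z; if some team already has more than m wins, z is trivially eliminated.

```python
def elimination_network(wins, games_left, z):
    """Build the flow network G used to test whether team z is eliminated.

    wins[x] is w_x; games_left[(x, y)] is g_xy for unordered pairs of
    teams.  We let z win all of its remaining games, for a total of m
    wins.  Team z is eliminated iff the maximum s-t flow value is < g*."""
    m = wins[z] + sum(g for (x, y), g in games_left.items() if z in (x, y))
    cap, g_star = {}, 0
    INF = float("inf")
    for (x, y), g in games_left.items():
        if z in (x, y) or g == 0:
            continue
        g_star += g
        uxy = ("game", x, y)
        cap[("s", uxy)] = g                      # the g_xy wins emanate from s
        cap[(uxy, ("team", x))] = INF            # a win may go to x ...
        cap[(uxy, ("team", y))] = INF            # ... or to y
    for x, w in wins.items():
        if x != z:
            cap[(("team", x), "t")] = m - w      # x may absorb at most m - w_x wins
    return cap, g_star
```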
Now, if there is a flow of value g*, then it is possible for the outcomes of all remaining games to yield a situation where no team has more than m wins; and hence, if team z wins all its remaining games, it can still achieve at least a tie for first place. Conversely, if there are outcomes for the remaining games in which z achieves at least a tie, we can use these outcomes to define a flow of value g*. For example, in Figure 7.21, which is based on our second example, the indicated cut shows that the maximum flow has value at most 7, whereas g* = 6 + 1 + 1 = 8.

In summary, we have shown

(7.60) Team z has been eliminated if and only if the maximum flow in G has value strictly less than g*. Thus we can test in polynomial time whether z has been eliminated.

Characterizing When a Team Has Been Eliminated

Our network flow construction can also be used to prove (7.59). The idea is that the Max-Flow Min-Cut Theorem gives a nice "if and only if" characterization for the existence of flow, and if we interpret this characterization in terms of our application, we get the comparably nice characterization here. This illustrates a general way in which one can generate characterization theorems for problems that are reducible to network flow.

Figure 7.21 The flow network for the second example. As the minimum cut indicates, there is no flow of value g*, and so Boston has been eliminated. (The set T = {NY, Toronto} proves Boston is eliminated.)

Proof of (7.59). Suppose that z has been eliminated from first place. Then the maximum s-t flow in G has value g' < g*; so there is an s-t cut (A, B) of capacity g', and (A, B) is a minimum cut. Let T be the set of teams x for which v_x ∈ A. We will now prove that T can be used in the "averaging argument" in (7.59).

First, consider the node u_xy, and suppose one of x or y is not in T, but u_xy ∈ A. Then the edge (u_xy, v_x) would cross from A into B, and hence the cut (A, B) would have infinite capacity. This contradicts the assumption that (A, B) is a minimum cut of capacity less than g*. So if one of x or y is not in T, then u_xy ∈ B. On the other hand, suppose both x and y belong to T, but u_xy ∈ B. Consider the cut (A', B') that we would obtain by adding u_xy to the set A and deleting it from the set B. The capacity of (A', B') is simply the capacity of (A, B), minus the capacity g_xy of the edge (s, u_xy)--for this edge (s, u_xy) used
to cross from A to B, and now it does not cross from A' to B'. But since g_xy > 0,
this means that (A', B') has smaller capacity than (A, B), again contradicting
our assumption that (A, B) is a minimum cut. So, if both x and y belong to T,
then u_xy ∈ A.

Thus we have established the following conclusion, based on the fact that
(A, B) is a minimum cut: u_xy ∈ A if and only if both x, y ∈ T.

Now we just need to work out the minimum-cut capacity c(A, B) in terms
of its constituent edge capacities. By the conclusion in the previous paragraph,
we know that edges crossing from A to B have one of the following two forms:
o edges of the form (v_x, t), where x ∈ T, and
o edges of the form (s, u_xy), where at least one of x or y does not belong
  to T (in other words, {x, y} ⊄ T).

Thus we have

  c(A, B) = Σ_{x ∈ T} (m - w_x) + Σ_{{x,y} ⊄ T} g_xy.

Since this capacity equals g' < g* = Σ_{{x,y} ⊆ S'} g_xy, it follows that
Σ_{x ∈ T} (m - w_x) < Σ_{{x,y} ⊆ T} g_xy: the teams in T already have too many wins,
and too many games left among themselves, for all of them to finish with at
most m wins. This is exactly the averaging condition in (7.59). ∎

7.13 A Further Direction: Adding Costs to the Matching Problem

The Problem
A natural way to formulate a problem based on this notion is to introduce
costs. It may be that we incur a certain cost to perform a given job on a given
machine, and we'd like to match jobs with machines in a way that minimizes
the total cost. Or there may be n fire trucks that must be sent to n distinct
houses; each house is at a given distance from each fire station, and we'd
like a matching that minimizes the average distance each truck drives to its
associated house. In short, it is very useful to have an algorithm that finds a
perfect matching of minimum total cost.

Formally, we consider a bipartite graph G = (V, E) whose node set, as
usual, is partitioned as V = X ∪ Y so that every edge e ∈ E has one end in X
and the other end in Y. Furthermore, each edge e has a nonnegative cost c_e.
For a matching M, we say that the cost of the matching is the total cost of all
edges in M, that is, cost(M) = Σ_{e ∈ M} c_e. The Minimum-Cost Perfect Matching
Problem assumes that |X| = |Y| = n, and the goal is to find a perfect matching
of minimum cost.
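As a quick, concrete reading of this definition (not drawn from the text), the brute-force
Python sketch below enumerates all perfect matchings of a small instance and returns one of
minimum cost. It is exponential in n and only meant to make the objective
cost(M) = Σ_{e ∈ M} c_e tangible; the cost matrix c is a hypothetical input in which c[i][j]
is the cost of pairing the ith node of X with the jth node of Y.

  from itertools import permutations

  def min_cost_perfect_matching_bruteforce(c):
      # Every perfect matching of X to Y corresponds to a permutation of {0, ..., n-1}.
      n = len(c)
      best_cost, best_matching = float('inf'), None
      for perm in permutations(range(n)):
          cost = sum(c[i][perm[i]] for i in range(n))
          if cost < best_cost:
              best_cost, best_matching = cost, [(i, perm[i]) for i in range(n)]
      return best_cost, best_matching

  # Example: three jobs and three machines with made-up costs.
  c = [[4, 2, 8],
       [4, 3, 7],
       [3, 1, 6]]
  print(min_cost_perfect_matching_bruteforce(c))   # (12, [(0, 0), (1, 2), (2, 1)])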
Why are such prices useful? Intuitively, compatible prices suggest that the
matching is cheap: Along the matched edges reward equals cost, while on
all other edges the reward is no bigger than the cost. For a partial matching,
this may not imply that the matching has the smallest possible cost for its
size (it may be taking care of expensive jobs). However, we claim that if M
is any matching for which there exists a set of compatible prices, then G_M
has no negative cycles. For a perfect matching M, this will imply that M is of
minimum cost by (7.63).

To see why G_M can have no negative cycles, we extend the definition of
reduced cost to edges in the residual graph by using the same expression
c̄_e = p(v) + c_e - p(w) for any edge e = (v, w). Observe that the definition
of compatible prices implies that all edges in the residual graph G_M have
nonnegative reduced costs. Now, note that for any cycle C, we have

  cost(C) = Σ_{e ∈ C} c_e = Σ_{e ∈ C} c̄_e,

since all the terms on the right-hand side corresponding to prices cancel out.
We know that each term on the right-hand side is nonnegative, and so clearly
cost(C) is nonnegative.
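A tiny Python check (my own illustration, with made-up numbers) of the cancellation just
used: around any directed cycle, each price p(v) is added once and subtracted once, so the
reduced costs sum to the same value as the original costs.

  cycle = [('a', 'b', 5), ('b', 'c', -2), ('c', 'a', 4)]     # edges (v, w, cost) of a directed cycle
  p = {'a': 3, 'b': 7, 'c': 1}                               # arbitrary node prices
  cost = sum(c for _, _, c in cycle)
  reduced = sum(p[v] + c - p[w] for v, w, c in cycle)
  assert cost == reduced == 7                                # the price terms telescope away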
There is a second, algorithmic reason why it is useful to have prices on
the nodes. When you have a graph with negative-cost edges but no negative
cycles, you can compute shortest paths using the Bellman-Ford Algorithm in
O(mn) time. But if the graph in fact has no negative-cost edges, then you can
use Dijkstra's Algorithm instead, which only requires time O(m log n) -- almost
a full factor of n faster.

In our case, having the prices around allows us to compute shortest paths
with respect to the nonnegative reduced costs c̄, arriving at an equivalent
answer. Indeed, suppose we use Dijkstra's Algorithm to find the minimum
cost d_{p,M}(u) of a directed path from s to every node u ∈ X ∪ Y subject to the
costs c̄. Given the minimum costs d_{p,M}(y) for an unmatched node y ∈ Y, the
(nonreduced) cost of the path from s to t through y is d_{p,M}(y) + p(y), and so
we find the minimum cost in O(n) additional time. In summary, we have the
following fact.

(7.64) Let M be a matching, and p be compatible prices. We can use one
run of Dijkstra's Algorithm and O(n) extra time to find the minimum-cost path
from s to t.
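The sketch below (my own Python illustration, not the book's code) carries out the step in
(7.64): a single run of Dijkstra's Algorithm over the reduced costs c̄_e = p(v) + c_e - p(w),
followed by an O(n) scan that converts the reduced distances of the unmatched nodes of Y back
into true path costs. It assumes the residual graph G_M is given as an adjacency list
residual[v] = [(w, c_e), ...] and that p assigns a price to every node, with p[s] = 0 for the
auxiliary source.

  import heapq

  def dijkstra_reduced(residual, p, s):
      # Distances from s under the nonnegative reduced costs p[v] + c - p[w].
      dist = {s: 0}
      heap = [(0, s)]
      while heap:
          d, v = heapq.heappop(heap)
          if d > dist.get(v, float('inf')):
              continue                        # stale heap entry
          for w, c in residual.get(v, []):
              nd = d + p[v] + c - p[w]
              if nd < dist.get(w, float('inf')):
                  dist[w] = nd
                  heapq.heappush(heap, (nd, w))
      return dist

  def cheapest_path_endpoint(residual, p, s, unmatched_Y):
      # Statement (7.64): one Dijkstra run plus O(n) extra work; the true cost of the
      # s-t path through an unmatched y is its reduced distance plus p[y].
      dist = dijkstra_reduced(residual, p, s)
      return min(unmatched_Y, key=lambda y: dist.get(y, float('inf')) + p[y])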
Updating the Node Prices  We took advantage of the prices to improve one
iteration of the algorithm. In order to be ready for the next iteration, we need
not only the minimum-cost path (to get the next matching), but also a way to
produce a set of compatible prices with respect to the new matching.

Figure 7.22 A matching M (the dark edges), and a residual graph used to increase the
size of the matching.

To get some intuition on how to do this, consider an unmatched node x
with respect to a matching M, and an edge e = (x, y), as shown in Figure 7.22.
If the new matching M' includes edge e (that is, if e is on the augmenting
path we use to update the matching), then we will want to have the reduced
cost of this edge to be zero. However, the prices p we used with matching M
may result in a reduced cost c̄_e > 0 -- that is, the assignment of person x to
job y, in our economic interpretation, may not be viewed as cheap enough.
We can arrange the zero reduced cost by either increasing the price p(y) (the
reward) by c̄_e, or by decreasing the price p(x) by the same amount. To keep
prices nonnegative, we will increase the price p(y). However, node y may be
matched in the matching M to some other node x' via an edge e' = (x', y), as
shown in Figure 7.22. Increasing the reward p(y) makes the reduced cost of
edge e' negative, and hence the prices are no longer compatible. To keep
things compatible, we can increase p(x') by the same amount. However, this
change might cause problems on other edges. Can we update all prices and
keep the matching and the prices compatible on all edges? Surprisingly, this
can be done quite simply by using the distances from s to all other nodes
computed by Dijkstra's Algorithm.

(7.65) Let M be a matching, let p be compatible prices, and let M' be a
matching obtained by augmenting along the minimum-cost path from s to t.
Then p'(v) = d_{p,M}(v) + p(v) is a compatible set of prices for M'.

Proof. To prove compatibility, consider first an edge e = (x', y) ∈ M. The only
edge entering x' is the directed edge (y, x'), and hence d_{p,M}(x') = d_{p,M}(y) - c̄_e,
where c̄_e = p(y) + c_e - p(x'), and thus we get the desired equation on
such edges. Next consider edges (x, y) in M' - M. These edges are along the
minimum-cost path from s to t, and hence they satisfy d_{p,M}(y) = d_{p,M}(x) + c̄_e
as desired. Finally, we get the required inequality for all other edges since all
edges e = (x, y) ∉ M must satisfy d_{p,M}(y) ≤ d_{p,M}(x) + c̄_e. ∎
Finally, we have to consider how to initialize the algorithm, so as to get it
underway. We initialize M to be the empty set, define p(x) = 0 for all x ∈ X,
and define p(y), for y ∈ Y, to be the minimum cost of an edge entering y. Note
that these prices are compatible with respect to M = ∅.

We summarize the algorithm below.

  Start with M equal to the empty set
  Define p(x) = 0 for x ∈ X, and p(y) = min_{e into y} c_e for y ∈ Y
  While M is not a perfect matching
    Find a minimum-cost s-t path P in G_M using (7.64) with prices p
    Augment along P to produce a new matching M'
    Find a set of compatible prices with respect to M' via (7.65)
  Endwhile

The final set of compatible prices yields a proof that G_M has no negative
cycles; and by (7.63), this implies that M has minimum cost.

(7.66) The minimum-cost perfect matching can be found in the time required
for n shortest-path computations with nonnegative edge costs.
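To connect the pieces, here is a compact Python sketch of this successive-shortest-path
scheme. It is my own illustration of the pseudocode above, not code from the text, and it
assumes a complete cost matrix c[i][j] (every pair of X- and Y-nodes is an allowed edge).
Each iteration runs Dijkstra's Algorithm over the reduced costs as in (7.64), updates the
prices as in (7.65), and augments along the cheapest path.

  import heapq

  def min_cost_perfect_matching(c):
      # c[i][j]: cost of matching x_i in X to y_j in Y; assumes every pair is an allowed edge.
      n = len(c)
      INF = float('inf')
      match_of_y = [None] * n            # match_of_y[j] = i when (x_i, y_j) is in M
      p_x = [0] * n                      # p(x) = 0 is compatible with M = empty set
      p_y = [min(c[i][j] for i in range(n)) for j in range(n)]   # p(y) = min cost of edge into y

      for _ in range(n):                 # each iteration adds one edge to the matching
          dist_x, dist_y = [INF] * n, [INF] * n
          parent_y = [None] * n          # predecessor x of y on the shortest path from s
          heap = []
          for i in range(n):
              if i not in match_of_y:    # unmatched x_i: residual edge (s, x_i) of cost 0
                  dist_x[i] = 0
                  heap.append((0, 'x', i))
          heapq.heapify(heap)
          while heap:                    # Dijkstra over the nonnegative reduced costs
              d, kind, v = heapq.heappop(heap)
              if kind == 'x' and d == dist_x[v]:
                  for j in range(n):
                      if match_of_y[j] == v:
                          continue       # matched edges are directed from Y back to X
                      nd = d + p_x[v] + c[v][j] - p_y[j]
                      if nd < dist_y[j]:
                          dist_y[j], parent_y[j] = nd, v
                          heapq.heappush(heap, (nd, 'y', j))
              elif kind == 'y' and d == dist_y[v]:
                  i = match_of_y[v]
                  if i is not None:      # reverse edge (y, x_i) of cost -c[i][v]
                      nd = d + p_y[v] - c[i][v] - p_x[i]
                      if nd < dist_x[i]:
                          dist_x[i] = nd
                          heapq.heappush(heap, (nd, 'x', i))
          # (7.64): the cheapest s-t path ends at the unmatched y minimizing d(y) + p(y).
          end = min((jj for jj in range(n) if match_of_y[jj] is None),
                    key=lambda jj: dist_y[jj] + p_y[jj])
          # (7.65): p'(v) = d(v) + p(v) is again a compatible set of prices.
          p_x = [p_x[i] + dist_x[i] for i in range(n)]
          p_y = [p_y[j] + dist_y[j] for j in range(n)]
          j = end                        # augment: walk predecessor pointers back toward s
          while True:
              i = parent_y[j]
              freed = next((jj for jj in range(n) if match_of_y[jj] == i), None)
              match_of_y[j] = i
              if freed is None:          # reached an x that was unmatched: the path starts here
                  break
              j = freed
      return sum(c[match_of_y[j]][j] for j in range(n)), match_of_y

  cost, assignment = min_cost_perfect_matching([[4, 2, 8], [4, 3, 7], [3, 1, 6]])
  print(cost)   # 12; on small instances this can be cross-checked against the brute-force sketch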
Extensions: An Economic Interpretation of the Prices
To conclude our discussion of the Minimum-Cost Perfect Matching Problem,
we develop the economic interpretation of the prices a bit further. We consider
the following scenario. Assume X is a set of n people each looking to buy a
house, and Y is a set of n houses that they are all considering. Let v(x, y) denote
the value of house y to buyer x. Since each buyer wants one of the houses,
one could argue that the best arrangement would be to find a perfect matching
M that maximizes Σ_{(x,y) ∈ M} v(x, y). We can find such a perfect matching by
using our minimum-cost perfect matching algorithm with costs c_e = -v(x, y)
if e = (x, y).

The question we will ask now is this: Can we convince these buyers to
buy the house they are allocated? On her own, each buyer x would want to
buy the house y that has maximum value v(x, y) to her. How can we convince
her to buy instead the house that our matching M allocated? We will use prices
to change the incentives of the buyers. Suppose we set a price P(y) for each
house y, that is, the person buying the house y must pay P(y). With these
prices in mind, a buyer will be interested in buying the house with maximum
net value, that is, the house y that maximizes v(x, y) - P(y). We say that a
perfect matching M and house prices P are in equilibrium if, for all edges
(x, y) ∈ M and all other houses y', we have

  v(x, y) - P(y) ≥ v(x, y') - P(y').

But can we find a perfect matching and a set of prices so as to achieve this
state of affairs, with every buyer ending up happy? In fact, the minimum-cost
perfect matching and an associated set of compatible prices provide exactly
what we're looking for.

(7.67) Let M be a perfect matching of minimum cost, where c_e = -v(x, y) for
each edge e = (x, y), and let p be a compatible set of prices. Then the matching
M and the set of prices {P(y) = -p(y) : y ∈ Y} are in equilibrium.

Proof. Consider an edge e = (x, y) ∈ M, and let e' = (x, y'). Since M and p are
compatible, we have p(x) + c_e = p(y) and p(x) + c_{e'} ≥ p(y'). Subtracting these
two inequalities to cancel p(x), and substituting the values of p and c, we get
the desired inequality in the definition of equilibrium. ∎

Solved Exercises

Solved Exercise 1
Suppose you are given a directed graph G = (V, E), with a positive integer
capacity c_e on each edge e, a designated source s ∈ V, and a designated sink
t ∈ V. You are also given an integer maximum s-t flow in G, defined by a flow
value f_e on each edge e.

Now suppose we pick a specific edge e ∈ E and increase its capacity by
one unit. Show how to find a maximum flow in the resulting capacitated graph
in time O(m + n), where m is the number of edges in G and n is the number
of nodes.

Solution The point here is that O(m + n) is not enough time to compute a
new maximum flow from scratch, so we need to figure out how to use the flow
f that we are given. Intuitively, even after we add 1 to the capacity of edge e,
the flow f can't be that far from maximum; after all, we haven't changed the
network very much.

In fact, it's not hard to show that the maximum flow value can go up by
at most 1.

(7.68) Consider the flow network G' obtained by adding 1 to the capacity of
e. The value of the maximum flow in G' is either v(f) or v(f) + 1.
Proof. The value of the maximum flow in G' is at least v(f), since f is still a
feasible flow in this network. It is also integer-valued. So it is enough to show
that the maximum-flow value in G' is at most v(f) + 1.

By the Max-Flow Min-Cut Theorem, there is some s-t cut (A, B) in the
original flow network G of capacity v(f). Now we ask: What is the capacity of
(A, B) in the new flow network G'? All the edges crossing (A, B) have the same
capacity in G' that they did in G, with the possible exception of e (in case e
crosses (A, B)). But c_e only increased by 1, and so the capacity of (A, B) in the
new flow network G' is at most v(f) + 1. ∎

Statement (7.68) suggests a natural algorithm. Starting with the feasible
flow f in G', we try to find a single augmenting path from s to t in the residual
graph G'_f. This takes time O(m + n). Now one of two things will happen. Either
we will fail to find an augmenting path, and in this case we know that f is
a maximum flow. Otherwise the augmentation succeeds, producing a flow f'
of value at least v(f) + 1. In this case, we know by (7.68) that f' must be a
maximum flow. So either way, we produce a maximum flow after a single
augmenting path computation.
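The single-augmentation idea translates directly into code. The Python sketch below is an
illustration under an assumed representation -- capacities as cap[(u, v)] and the given
integer flow as f[(u, v)], with entries only for edges carrying positive flow -- rather than
code from the text. After adding 1 to the capacity of e, it builds the residual graph of f
and performs at most one breadth-first augmentation, which is O(m + n).

  from collections import defaultdict, deque

  def max_flow_after_unit_increase(cap, f, e, s, t):
      # By (7.68), after cap[e] grows by 1 a single augmenting path (or none) suffices.
      cap = dict(cap)
      cap[e] = cap[e] + 1
      residual = defaultdict(dict)
      for (u, v), c in cap.items():
          fe = f.get((u, v), 0)
          residual[u][v] = residual[u].get(v, 0) + (c - fe)   # leftover forward capacity
          residual[v][u] = residual[v].get(u, 0) + fe         # capacity to undo existing flow
      parent = {s: None}
      queue = deque([s])
      while queue and t not in parent:                        # one BFS: O(m + n)
          u = queue.popleft()
          for v, rc in residual[u].items():
              if rc > 0 and v not in parent:
                  parent[v] = u
                  queue.append(v)
      if t not in parent:
          return f                                            # f is still maximum in G'
      f = dict(f)
      v = t
      while parent[v] is not None:                            # push one unit along the path
          u = parent[v]
          if f.get((u, v), 0) < cap.get((u, v), 0):
              f[(u, v)] = f.get((u, v), 0) + 1                # forward edge: raise flow by 1
          else:
              f[(v, u)] -= 1                                  # backward edge: cancel one unit
          v = u
      return f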
Solved Exercise 2
You are helping the medical consulting firm Doctors Without Weekends set up
the work schedules of doctors in a large hospital. They've got the regular daily
schedules mainly worked out. Now, however, they need to deal with all the
special cases and, in particular, make sure that they have at least one doctor
covering each vacation day.

Here's how this works. There are k vacation periods (e.g., the week of
Christmas, the July 4th weekend, the Thanksgiving weekend, ...), each
spanning several contiguous days. Let D_j be the set of days included in the
jth vacation period; we will refer to the union of all these days, ∪_j D_j, as the set
of all vacation days.

There are n doctors at the hospital, and doctor i has a set of vacation days
S_i when he or she is available to work. (This may include certain days from a
given vacation period but not others; so, for example, a doctor may be able to
work the Friday, Saturday, or Sunday of Thanksgiving weekend, but not the
Thursday.)

Give a polynomial-time algorithm that takes this information and deter-
mines whether it is possible to select a single doctor to work on each vacation
day, subject to the following constraints.
o For a given parameter c, each doctor should be assigned to work at most
  c vacation days total, and only days when he or she is available.
o For each vacation period j, each doctor should be assigned to work at
  most one of the days in the set D_j. (In other words, although a particular
  doctor may work on several vacation days over the course of a year, he or
  she should not be assigned to work two or more days of the Thanksgiving
  weekend, or two or more days of the July 4th weekend, etc.)
The algorithm should either return an assignment of doctors satisfying these
constraints or report (correctly) that no such assignment exists.

Solution This is a very natural setting in which to apply network flow, since
at a high level we're trying to match one set (the doctors) with another set
(the vacation days). The complication comes from the requirement that each
doctor can work at most one day in each vacation period.

Figure 7.23 (a) Doctors are assigned to holiday days without restricting how many
days in one holiday a doctor can work. (b) The flow network is expanded with "gadgets"
that prevent a doctor from working more than one day from each vacation period. The
shaded sets correspond to the different vacation periods.

So to begin, let's see how we'd solve the problem without that require-
ment, in the simpler case where each doctor i has a set S_i of days when he or
she can work, and each doctor should be scheduled for at most c days total.
The construction is pictured in Figure 7.23(a). We have a node u_i representing
each doctor attached to a node v_ℓ representing each day when he or she can
work; this edge has a capacity of 1. We attach a super-source s to each doctor
node u_i by an edge of capacity c, and we attach each day node v_ℓ to a super-
sink t by an edge with upper and lower bounds of 1. This way, assigned days
can "flow" through doctors to days when they can work, and the lower bounds
on the edges from the days to the sink guarantee that each day is covered. Fi-
nally, suppose there are d vacation days total; we put a demand of +d on the
sink and -d on the source, and we look for a feasible circulation. (Recall that
once we've introduced lower bounds on some edges, the algorithms in the text
are phrased in terms of circulations with demands, not maximum flow.)

But now we have to handle the extra requirement, that each doctor can
work at most one day from each vacation period. To do this, we take each pair
(i, j) consisting of a doctor i and a vacation period j, and we add a "vacation
gadget" as follows. We include a new node w_ij with an incoming edge of
capacity 1 from the doctor node u_i, and with outgoing edges of capacity 1 to
each day in vacation period j when doctor i is available to work. This gadget
serves to "choke off" the flow from u_i into the days associated with vacation
period j, so that at most one unit of flow can go to them collectively. The
construction is pictured in Figure 7.23(b). As before, we put a demand of +d
on the sink and -d on the source, and we look for a feasible circulation. The
total running time is the time to construct the graph, which is O(nd), plus the
time to check for a single feasible circulation in this graph.

The correctness of the algorithm is a consequence of the following claim.

(7.69) There is a way to assign doctors to vacation days in a way that respects
all constraints if and only if there is a feasible circulation in the flow network
we have constructed.
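The network just described can be written down in a few lines. The Python sketch below (my
own rendering, not from the text) builds the doctor nodes, the vacation gadgets w_ij, and the
day nodes, with a capacity-c edge from the source to each doctor, unit-capacity gadget edges,
and day-to-sink edges carrying a lower bound of 1, together with the demands -d and +d; the
input names availability and periods are assumed formats. Any feasible-circulation routine
(for example, the lower-bound reduction to maximum flow from earlier in the chapter) can
then be applied, and by (7.69) its answer settles the scheduling question.

  def build_vacation_network(availability, periods, c):
      # availability[i]: set of days doctor i can work; periods[j]: set of days in period j.
      days = set().union(*periods)
      d = len(days)
      edges = []                                    # entries: (u, v, lower_bound, capacity)
      demand = {'s': -d, 't': +d}                   # every vacation day must be covered
      for i, S_i in enumerate(availability):
          edges.append(('s', ('doc', i), 0, c))     # doctor i works at most c vacation days
          for j, D_j in enumerate(periods):
              if S_i & D_j:
                  w_ij = ('gadget', i, j)
                  edges.append((('doc', i), w_ij, 0, 1))   # at most one day of period j
                  for day in S_i & D_j:
                      edges.append((w_ij, ('day', day), 0, 1))
      for day in days:
          edges.append((('day', day), 't', 1, 1))   # lower bound 1: the day must be covered
      return edges, demand

  # Hypothetical instance: two vacation periods and three doctors.
  periods = [{'Dec24', 'Dec25'}, {'Jul4', 'Jul5'}]
  availability = [{'Dec24', 'Jul4'}, {'Dec25'}, {'Jul4', 'Jul5'}]
  edges, demand = build_vacation_network(availability, periods, c=2)
  # Feed (edges, demand) to a feasible-circulation routine; a feasible circulation
  # exists exactly when a valid assignment of doctors exists.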
Exercises
1. (a) List all the minimum s-t cuts in the flow network pictured in Fig-
       ure 7.24. The capacity of each edge appears as a label next to the
       edge.
   (b) What is the minimum capacity of an s-t cut in the flow network in
       Figure 7.25? Again, the capacity of each edge appears as a label next
       to the edge.

   Figure 7.24 What are the minimum s-t cuts in this flow network?
   Figure 7.25 What is the minimum capacity of an s-t cut in this flow network?

2. Figure 7.26 shows a flow network on which an s-t flow has been computed.
   The capacity of each edge appears as a label next to the edge, and the
   numbers in boxes give the amount of flow sent on each edge. (Edges
   without boxed numbers--specifically, the four edges of capacity 3--have
   no flow being sent on them.)
   (a) What is the value of this flow? Is this a maximum (s,t) flow in this
       graph?
   (b) Find a minimum s-t cut in the flow network pictured in Figure 7.26,
       and also say what its capacity is.

3. Figure 7.27 shows a flow network on which an s-t flow has been computed.
   The capacity of each edge appears as a label next to the edge, and the
   numbers in boxes give the amount of flow sent on each edge. (Edges
   without boxed numbers have no flow being sent on them.)
   (a) What is the value of this flow? Is this a maximum (s,t) flow in this
       graph?
   (b) Find a minimum s-t cut in the flow network pictured in Figure 7.27,
       and also say what its capacity is.

   Figure 7.27 What is the value of the depicted flow? Is it a maximum flow? What is the
   minimum cut?

4. Decide whether you think the following statement is true or false. If it is
   true, give a short explanation. If it is false, give a counterexample.
   Let G be an arbitrary flow network, with a source s, a sink t, and a positive
   integer capacity c_e on every edge e. If f is a maximum s-t flow in G, then f
   saturates every edge out of s with flow (i.e., for all edges e out of s, we have
   f(e) = c_e).

5. Decide whether you think the following statement is true or false. If it is
   true, give a short explanation. If it is false, give a counterexample.
   Let G be an arbitrary flow network, with a source s, a sink t, and a positive
   integer capacity c_e on every edge e; and let (A, B) be a minimum s-t cut with
   respect to these capacities {c_e : e ∈ E}. Now suppose we add 1 to every capacity;
   then (A, B) is still a minimum s-t cut with respect to these new capacities
   {1 + c_e : e ∈ E}.

6. Suppose you're a consultant for the Ergonomic Architecture Commission,
   and they come to you with the following problem.
   They're really concerned about designing houses that are "user-
   friendly," and they've been having a lot of trouble with the setup of light
   fixtures and switches in newly designed houses. Consider, for example,
   a one-floor house with n light fixtures and n locations for light switches
   mounted in the wall. You'd like to be able to wire up one switch to control
   each light fixture, in such a way that a person at the switch can see the
   light fixture being controlled.
   Sometimes this is possible and sometimes it isn't. Consider the two
   simple floor plans for houses in Figure 7.28. There are three light fixtures
   (labeled a, b, c) and three switches (labeled 1, 2, 3). It is possible to wire
   switches to fixtures in Figure 7.28(a) so that every switch has a line of
   sight to the fixture, but this is not possible in Figure 7.28(b).

   Figure 7.28 The floor plan in (a) is ergonomic, because we can wire switches to fixtures
   in such a way that each fixture is visible from the switch that controls it. (This can be
   done by wiring switch 1 to a, switch 2 to b, and switch 3 to c.) The floor plan in (b) is not
   ergonomic, because no such wiring is possible.

   Let's call a floor plan, together with n light fixture locations and n
   switch locations, ergonomic if it's possible to wire one switch to each
   fixture so that every fixture is visible from the switch that controls it.
   A floor plan will be represented by a set of m horizontal or vertical
   line segments in the plane (the walls), where the ith wall has endpoints
   (x_i, y_i), (x'_i, y'_i). Each of the n switches and each of the n fixtures is given by
   its coordinates in the plane. A fixture is visible from a switch if the line
   segment joining them does not cross any of the walls.
   Give an algorithm to decide if a given floor plan is ergonomic. The
   running time should be polynomial in m and n. You may assume that you
   have a subroutine with O(1) running time that takes two line segments as
   input and decides whether or not they cross in the plane.

7. Consider a set of mobile computing clients in a certain town who each
   need to be connected to one of several possible base stations. We'll
   suppose there are n clients, with the position of each client specified
   by its (x, y) coordinates in the plane. There are also k base stations; the
   position of each of these is specified by (x, y) coordinates as well.
   For each client, we wish to connect it to exactly one of the base
   stations. Our choice of connections is constrained in the following ways.
   There is a range parameter r--a client can only be connected to a base
   station that is within distance r. There is also a load parameter L--no
   more than L clients can be connected to any single base station.
   Your goal is to design a polynomial-time algorithm for the following
   problem. Given the positions of a set of clients and a set of base stations,
   as well as the range and load parameters, decide whether every client can
   be connected simultaneously to a base station, subject to the range and
   load conditions in the previous paragraph.

8. Statistically, the arrival of spring typically results in increased accidents
   and increased need for emergency medical treatment, which often re-
   quires blood transfusions. Consider the problem faced by a hospital that
   is trying to evaluate whether its blood supply is sufficient.
   The basic rule for blood donation is the following. A person's own
   blood supply has certain antigens present (we can think of antigens as a
   kind of molecular signature); and a person cannot receive blood with a
   particular antigen if their own blood does not have this antigen present.
   Concretely, this principle underpins the division of blood into four types:
   A, B, AB, and O. Blood of type A has the A antigen, blood of type B has the B
   antigen, blood of type AB has both, and blood of type O has neither. Thus,
   patients with type A can receive only blood types A or O in a transfusion,
   patients with type B can receive only B or O, patients with type O can
   receive only O, and patients with type AB can receive any of the four
   types.4
   (a) Let s_O, s_A, s_B, and s_AB denote the supply in whole units of the different
       blood types on hand. Assume that the hospital knows the projected
       demand for each blood type d_O, d_A, d_B, and d_AB for the coming week.
       Give a polynomial-time algorithm to evaluate if the blood on hand
       would suffice for the projected need.
   (b) Consider the following example. Over the next week, they expect to
       need at most 100 units of blood. The typical distribution of blood
       types in U.S. patients is roughly 45 percent type O, 42 percent type
       A, 10 percent type B, and 3 percent type AB. The hospital wants to
       know if the blood supply it has on hand would be enough if 100
       patients arrive with the expected type distribution. There is a total
       of 105 units of blood on hand. The table below gives these demands,
       and the supply on hand.

       blood type   supply   demand
       O                50       45
       A                36       42
       B                11        8
       AB                8        3

       Is the 105 units of blood on hand enough to satisfy the 100 units
       of demand? Find an allocation that satisfies the maximum possible
       number of patients. Use an argument based on a minimum-capacity
       cut to show why not all patients can receive blood. Also, provide an
       explanation for this fact that would be understandable to the clinic
       administrators, who have not taken a course on algorithms. (So, for
       example, this explanation should not involve the words flow, cut, or
       graph in the sense we use them in this book.)

   4 The Austrian scientist Karl Landsteiner received the Nobel Prize in 1930 for his discovery of
   the blood types A, B, O, and AB.

9. Network flow issues come up in dealing with natural disasters and other
   crises, since major unexpected events often require the movement and
   evacuation of large numbers of people in a short amount of time.
   Consider the following scenario. Due to large-scale flooding in a re-
   gion, paramedics have identified a set of n injured people distributed
   across the region who need to be rushed to hospitals. There are k hos-
   pitals in the region, and each of the n people needs to be brought to a
   hospital that is within a half-hour's driving time of their current location
   (so different people will have different options for hospitals, depending
   on where they are right now).
   At the same time, one doesn't want to overload any one of the
   hospitals by sending too many patients its way. The paramedics are in
   touch by cell phone, and they want to collectively work out whether they
   can choose a hospital for each of the injured people in such a way that
   the load on the hospitals is balanced: Each hospital receives at most ⌈n/k⌉
   people.
   Give a polynomial-time algorithm that takes the given information
   about the people's locations and determines whether this is possible.

10. Suppose you are given a directed graph G = (V, E), with a positive integer
    capacity c_e on each edge e, a source s ∈ V, and a sink t ∈ V. You are also
    given a maximum s-t flow in G, defined by a flow value f_e on each edge
    e. The flow f is acyclic: There is no cycle in G on which all edges carry
    positive flow. The flow f is also integer-valued.
    Now suppose we pick a specific edge e* ∈ E and reduce its capacity
    by 1 unit. Show how to find a maximum flow in the resulting capacitated
    graph in time O(m + n), where m is the number of edges in G and n is the
    number of nodes.

11. Your friends have written a very fast piece of maximum-flow code based
    on repeatedly finding augmenting paths as in Section 7.1. However, after
    you've looked at a bit of output from it, you realize that it's not always
    finding a flow of maximum value. The bug turns out to be pretty easy
    to find; your friends hadn't really gotten into the whole backward-edge
    thing when writing the code, and so their implementation builds a variant
    of the residual graph that only includes the forward edges. In other words,
    it searches for s-t paths in a graph consisting only of edges e for which
    f(e) < c_e, and it terminates when there is no augmenting path consisting
    entirely of such edges. We'll call this the Forward-Edge-Only Algorithm.
    (Note that we do not try to prescribe how this algorithm chooses its
    forward-edge paths; it may choose them in any fashion it wants, provided
    that it terminates only when there are no forward-edge paths.)
    It's hard to convince your friends they need to reimplement the
    code. In addition to its blazing speed, they claim, in fact, that it never
    returns a flow whose value is less than a fixed fraction of optimal. Do you
    believe this? The crux of their claim can be made precise in the following
    statement.
    There is an absolute constant b > 1 (independent of the particular input
    flow network), so that on every instance of the Maximum-Flow Problem, the
    Forward-Edge-Only Algorithm is guaranteed to find a flow of value at least 1/b
    times the maximum-flow value (regardless of how it chooses its forward-edge
    paths).
    Decide whether you think this statement is true or false, and give a proof
    of either the statement or its negation.

12. Consider the following problem. You are given a flow network with unit-
    capacity edges: It consists of a directed graph G = (V, E), a source s ∈ V,
    and a sink t ∈ V; and c_e = 1 for every e ∈ E. You are also given a parameter k.
    The goal is to delete k edges so as to reduce the maximum s-t flow in
    G by as much as possible. In other words, you should find a set of edges
    F ⊆ E so that |F| = k and the maximum s-t flow in G' = (V, E - F) is as small
    as possible subject to this.
    Give a polynomial-time algorithm to solve this problem.

13. In the standard Maximum-Flow Problem, capacities are placed on the
    edges of the network, not on the nodes. In this problem, we consider the
    variant of the Maximum-Flow and Minimum-Cut problems with node
    capacities.
    Let G = (V, E) be a directed graph, with source s ∈ V, sink t ∈ V, and
    nonnegative node capacities {c_v ≥ 0} for each v ∈ V. Given a flow f in this
    graph, the flow through a node v is defined as f^in(v). We say that a flow
    is feasible if it satisfies the usual flow-conservation constraints and the
    node-capacity constraints: f^in(v) ≤ c_v for all nodes.
    Give a polynomial-time algorithm to find an s-t maximum flow in
    such a node-capacitated network. Define an s-t cut for node-capacitated
    networks, and show that the analogue of the Max-Flow Min-Cut Theorem
    holds true.

14. We define the Escape Problem as follows. We are given a directed graph
    G = (V, E) (picture a network of roads). A certain collection of nodes X ⊂ V
    are designated as populated nodes, and a certain other collection S ⊂ V
    are designated as safe nodes. (Assume that X and S are disjoint.) In case
    of an emergency, we want evacuation routes from the populated nodes
    to the safe nodes. A set of evacuation routes is defined as a set of paths
    in G so that (i) each node in X is the tail of one path, (ii) the last node on
    each path lies in S, and (iii) the paths do not share any edges. Such a set of
    paths gives a way for the occupants of the populated nodes to "escape"
    to S, without overly congesting any edge in G.
    (a) Given G, X, and S, show how to decide in polynomial time whether
        such a set of evacuation routes exists.
    (b) Suppose we have exactly the same problem as in (a), but we want to
        enforce an even stronger version of the "no congestion" condition
        (iii). Thus we change (iii) to say "the paths do not share any nodes."
        With this new condition, show how to decide in polynomial time
        whether such a set of evacuation routes exists.
        Also, provide an example with the same G, X, and S, in which the
        answer is yes to the question in (a) but no to the question in (b).

15. Suppose you and your friend Alanis live, together with n - 2 other people,
    at a popular off-campus cooperative apartment, the Upson Collective.
    Over the next n nights, each of you is supposed to cook dinner for the
    co-op exactly once, so that someone cooks on each of the nights.
    Of course, everyone has scheduling conflicts with some of the nights
    (e.g., exams, concerts, etc.), so deciding who should cook on which night
    becomes a tricky task. For concreteness, let's label the people
Thus the hospital relaxes the requirements as follows. They add You show your friends a solution computed by your algorithm from
a new parameter c > 0, and the system now should try to return to (a), and to your surprise they reply, "This won’t do at all--one of the
conditions is only being measured by balloons from a single subcon-
each doctor j a list L~ with the following properties.
tractor." You hadn’t heard anything about subcontractors before; it
(A*) L~ contains at most c days that do not appear on the list Lj. turns out there’s an extra wrinkle they forgot to mention ....
L~,
(B) (Same as before) If we consider the whole set of lists L~ ..... Each of the balloons is produced by one of three different sub.
it causes exactly Pi doctors to be present on day i, for i = 1, 2 ..... n.
Describe a polynomial-time algorithm that implements this re- contractors involved in the experiment. A requirement of the exper-
vised system. It should take the numbers Pl, Pz ..... Pn, the lists iment is that there be no condition for which all k measurements
come from balloons produced by a single subcontractor.
L1 ..... Lk, and the parameter c > 0, and do one of the following two For example, suppose balloon 1 comes from the first subcon-
things.
, , L. ....’ satisfying properties (A*) and (B); or tractor, balloons 2 and 3 come from the second subcontractor, and
- Return lists L1, 2
Lk
balloon 4 comes from the third subcontractor. Then our previous so-
- Report (correctly) that there is no set of lists Lv L2 ..... Lk ’ that
lution no longer works, as both of the measurements for condition
satisfies both properties (A*) and (B).
c~ were done by balloons from the second subcontractor. However,_
20. Your friends are involved in a large-scale atmospheric science experi- we could use balloons 1 and 2 to each measure conditions c1, c2, and
ment. They need to get good measurements on a set S of n different use balloons 3 and 4 to each measure conditions c3, c4.
conditions in the atmosphere (such as the ozone level at various places), Explain how to modify your polynomial-time algorithm for part
and they have a set of m balloons that they plan to send up to make these (a) into a new algorithm that decides whether there exists a solution
measurements. Each balloon can make at most two measurements. satisfying all the conditions from (a), plus the new requirement about
Unfortunately, not all balloons are capable of measuring all condi- subcontractors.
tions, so for each balloon i = 1 ..... m, they have a set Si of conditions
that balloon i can measure. Finally, to make the results more reliable, they
plan to take each measurement from at least k different balloons. (Note 21. You’re helping to organize a class on campus that has decided to give all
that a single balloon should not measure the same condition twice.) They its students wireless laptops for the semester. Thus there is a collection
are having trouble figuring out which conditions to measure on which of n wireless laptops; there is also have a collection of n wireless access
balloon. points, to which a laptop can connect when it is in range.
Example. Suppose that k = 2, there are n = 4 conditions labeled q, c2, c3, c4, The laptops are currently scattered across campus; laptop e is within
and there are rn = 4 balloons that can measure conditions, subject to range of a set S~ of access points. We will assume that each laptop is within
the limitation that S~ = Sz = {q, c2, c3}, and $3 = $4 = {q, q, c4}..Then one range of at least one access point (so the sets S~ are nonempty); we will
possible way to make sure that each condition is measured at least k = 2 also assume that every access point p has at least one laptop within range
times is to have of it.
¯ balloon I measure conditions To make sure that all the wireless connectivity software is working
¯ balloon 2 measure conditions cz, cs, correctly, you need to try having laptops make contact with access points
in such a way that each laptop and each access point is involved in at
¯ balloon 3 measure conditions q, c4, and
least one connection. Thus we will say that a test set T is a collection of
¯ balloon 4 measure conditions c~, c4. ordered pairs of the form (e, p), for a laptop e and access point p, with
(a) Give a polynomial-time algorithm that takes the input to an instance the properties that
of this problem (the n conditions, the sets Si for each of the ra (i) If (LP) ~ T, then e is within range ofp (i.e., p ~ Se).
balloons, and the parameter k) and decides whether there is a way to
(ii) Each laptop appears in at least one ordered pair in T.
measure each condition by k different balloons, while each balloon
(i~) Each access point appears in at least one ordered pair in T.
only measures at most two conditions.
Here’s one way to divide the nodes of G into three categories of this
This way, by trying out all the connections specified by the pairs in T,
we can be sure that each laptop and each access point have correctly sort.
functioning software. * We say a node v is upstream if, for all minimum s-t cuts (A, B), we
The problem is: Given the sets Se for each laptop (i.e., which laptops have v ~ A--that is, v lies on the source side of every minimum cut.
are within range of which access points), and a number k, decide whether ~ We say a node v is downstream if, for all minimum s-t cuts (A, B), we
there is a test set of size at most k. have ~ ~ B--that is, v lies on the sink side of every minimum cut.
Example. Suppose that n = 3; laptop I is within range of access points 1 o We say a node v is central if it is neither upstream nor downstream;
and 2; laptop 2 is within range of access point 2; and laptop 3 is within there is at least one minimum s-t cut (A, B) for which v ~ A, and at
range of access points 2 and 3. Then the set of pairs least one minimum s-t cut (A’, B’) for which v ~ B’.
Give an algorithm that takes a flow network G and classifies each of "
(laptop 1o access point 1), (laptop 2, access point 2),
its nodes as being upstream, downstream, or centra!. The running time
(laptop 3, access point 3)
of your algorithm should be within a constant factor of the time required
would form a test set of size 3. to compute a single maximum flow.
(a) Give an example of an instance of this problem for which there is no
test set of size n. (Recall that we assume each laptop is within range 24. Let G = (V, E) be a directed graph, with source s ~ V, sink t ~ V, and
of at least one access point, and each access point p has at least one nonnegative edge capacities {ce}. Give a polynomial-time algorithm to
decide whether G has a unique minimum s-t cut (i.e., an s-t of capacity
laptop within range of it.)
strictly less than that of all other s-t cuts).
(b) Give a polynomial-time algorithm that takes the input to an instance
of this problem (including the parameter k) and decides whether 25. Suppose you live in a big apartment with a lot of friends. Over the
there is a test set of size at most k. course of a year, there are many occasions when one of you pays for an
expense shared by some subset of the apartment, with the expectation
22. Let M be an n x n matrix with each entry equal to either 0 or 1. Let mij
denote the entry in row i and column j. A diagonal entry is one of the that everything will get balanced out fairly at the end of the year. For
example, one of you may pay the whole phone bill in a given month,
form mii for some i.
another will occasionally make communal grocery runs to the nearby
Swapping rows i and j of the matrix M denotes the following action:
organic food emporium, and a thud might sometimes use a credit card
we swap the values m~k and mjk for k = 1, 2 ..... n. Swapping two columns
to cover the whole bill at the local Italian-Indian restaurant, Little Ida.
is defined analogously.
In any case, it’s now the end of the year and time to settle up. There
We say that M is rearrangeable if it is possible to swap some of the
are n people in the apartment; and for each ordered pair (i,j) there’s an
pairs of rows and some of the pairs of columns (in any sequefice) so that,
amount a~.l _> 0 that i owes ], accumulated over the course of the year. We
after a!l the swapping, all the diagonal entries of M are equal to 1. will reqt~e that for any two people ~ and j, at least one of the quantities
(a) Give an example of a matrix M that is not rearrangeable, but for
aij or aji is equal to 0. This can be easily made to happen as follows: If it
which at least one entry in each row and each column is equal to !. turns out that i owes j a positive amount x, andj owes i a positive amount
(b) Give a polynomial-time algorithm that determines whether a matrix y < x, then we will subtract off y from both sides and declare a~j = x - y
M with 0-1 entries is rearrangeable. while ag = 0. In terms of all these quantities, we now define the imbalance
of a person ~ to be the sum of the amounts that i is owed by everyone
23. Suppose you’re looking at a flow network G with source s and sink t, and else, minus the sum of the amounts that i owes everyone else. (Note that
you want to be able to express something like the following intuitive no- an imbalance can be positive, negative, or zero.)
tion: Some nodes are clearly on the "source side" of the main bottlenecks;
In order to restore all imbalances to 0, so that everyone departs on
some nodes are clearly on the "sink side" of the main bottlenecks; and
good terms, certain people will write checks to others; in other words, for
some nodes are in the middle. However, G can have many minimum cuts,
so we have to be careful in how we try making this idea precise. certain ordered pairs (i,j), i will write a check to j for an amount bi~ > O.
motion: We begin by assigning Pl to b~ and P2 to b2, and we reassign pl to
We will say that a set of checks constitutes a reconciliation if, for each
person i, the total value of the checks received by i, minus the total value bz and Pz to b~ during the motion (for example, when p~ passes the point
of the checks written by i, is equal to the imbalance of i. Finally, you and (2, 0)).
your friends feel it is bad form for i to write j a check if ~ did not actua!ly 27. Some of your friends with jobs out West decide they rea!ly need some
owe j money, so we say that a reconci~ation is consistent if, whenever i extra time each day to sit in front of their laptops, and the morning
writes a check to j, it is the case that aij > O. commute from Woodside to Palo Alto seems like the only option. So they
Show that, for any set of amounts a~j, there is always a consistent decide to carpool to work.
reconciliation in which at most n- 1 checks get written, by giving a Unfortufiately, they all hate to drive, so they want to make sure that
polynomial-time algorithm to compute such a reconciliation. any carpool arrangement they agree upon is fair and doesn’t overload
any individual with too much driving. Some sort of simple round-robin
26. You can tell that cellular phones are at work in rural communities, from scheme is out, because none of them goes to work every day, and so the
the giant microwave towers you sometimes see sprouting out of corn subset of them in the car varies from day to day.
fields and cow pastures. Let’s consider a very simplified model of a
Here’s one way to define fairness. Let the people be labeled
cellular phone network in a sparsely populated area.
{Px ..... Pk}. We say that the total driving obligation of p~ over a set of
We are given the locations of n base stations, specified as points
days is the expected number of times that p1 would have driven, had a
bl ..... bn in the plane. We are also given the locations of n cellular phones, driver been chosen uniformly at random from among the people going
specified as points Pl ..... Pn in the plane. Finally, we are given’a range to work each day. More concretely, suppose the carpool plan lasts for d
parameter A > 0. We call the set of cell phones fully connected if it is days, and on the ith day a subset St _ S of the people go to work. Then the
possible to assign each phone to a base station in such a way that above definition of the total driving obligation ~ for pj can be written as
* Each phone is assigned to a different base station, and A~ = ~:;~S~ I~,-i¯ Ideally, we’d like to require that p~ drives at most A~ times;
* If a phone at p~ is assigned to a base station at bj, then the straight-line unfortunately, A~ may not be an integer.
distance between the points Pi and b1 is at most A. So let’s say that a driving schedule is a choice of a driver for each
Suppose that the owner of the cell phone at point Pl decides to go day--that is, a sequence p~, p~ ..... pi~ with p~ ~ St-and that a fair driving
for a drive, traveling continuously for a total of z units of distance due schedule is one in which each p~ is chosen as the driver on at most
east. As this cell phone moves, we may have to update the assignment of days.
phones to base stations (possibly several times) in order to keep the set (a) Prove that for any sequence of sets ~ ..... Sa, there e~xists a fair
of phones fully connected. driving schedule.
Give a polynomial-time algorithm to decide whether it is possible to (b) Give an algorithm to compute a fair drixring schedule with ~g
keep the set of phones fully connected at all times during the travel of time polynomial in k and d.
this one cell phone. (You should assume that all other phones remain sta-
tionary during this travel.) If it is possible, you should report a sequence 28. A group of students has decided to add some features to Cornell’s on-line
of assignments of phones to base stations that will be sufficient in order Course Management System (CMS), to handle aspects of course planning
to maintain full connectivity; ff it is not possible, you should report a that are not currently covered by the software. They’re beginning with a
point on the traveling phone’s path at which full connectivity cannot be module that helps schedule office hours at the start of the semester.
maintained. Their initial prototype works as follows. The office hour schedule will
You should try to mak~ your algorithm run in O(n3) time if possible. be the same from one week to the next, so it’s enough to focus on the
scheduling problem for a single week. The course administrator enters
Example. Suppose we have phones at Pl = (0, 0) and P2 = (2, 1); we have
a collection of nonoverlapping one-hour time intervals I1, I2 ..... Ik when
base stations at bl = (1, 1) and b2 ----- (3, 1); and A = 2. Now consider the case
it would be possible for teaching assistants (TAs) to hold office hours;
in which the phone at pl moves due east a distance of 4 units, ending at
the eventual office-hour schedule will consist of a subset of some, but
(4, 0). Then it is possible to keep the phones fully connected during this
generally, not all, of these time slots. Then each of the TAs enters his or For example, suppose that in our previous example, we add the
her weekly schedule, showing the times when he or she would be available constraint that we want at least one office hour on Wednesday and at
least one office hour on Thursday. Then the previous solution does
to hold office hours.
not work; but there is a possible solution In which we have the first
Finally, the course administrator specifies, for parameters a, b, and
TA hold office hours in tLme slot 11, and the second TA hold office
c, that they would like each TA to hold between a and b office hours per
hours In time slots I3 and I5. (Another solution would be to have the
week, and they would like a total of exactly c office hours to be held over
first TA hold office hours in time slots 11 and I4, and the second TA
the course of the week. hold office hours in time slot Is.)
The problem, then, is how to assign each TA to some of the office- Give" a polynomial-time algorithm that computes office-hour
hour time slots, so that each TA is available for each of his or her office- schedules under this more complex set of constraints. The algo-
hour slots, and so that the right number of office hours gets held. (There
rithm should either construct a schedule or report (correctly) that
should be only one TA at each office hour.) none exists.
Example. Suppose there are five possible time slots for office hours:
29. Some of your friends have recently graduated and started a small com-
11 =Mon 3-4 P.M.; 12 = Tue 1-2 P.M.; 13 = Wed 10-11 A.M.; I4 = Wed 3-4
pany, which they are currently running out of their parents’ garages in
and I5 = Thu 10-11 A.M.. Santa Clara. They’re in the process of porting all their software from an
There are two TAs; the first would be able to hold office hours, at any old system to a new, revved-up system; and they’re facing the following
time on Monday or Wednesday afternoons, and the second would be able problem.
to hold office hours at any, time on Tuesday,, Wednesday, or Thursday,. They have a collection of n soft, rare applications, {1, 2 ..... n}, rtm-
(In general, TA availabiliW might be more complicated to spec.ify than ning on their old system; and they’d like to port some of these to the new
this, but we’re keepIng this example simple.) Finally, each TA should hold system. If they move apphcation i to the new system, they expect a net
between a = 1 and b = 2 office hours, and we want exactly c = 3 office hours (monetary) benefit of b~ > 0. The different software applications interact
per week total. with one another; ff applications i and j have extensive interaction, then
One possible solution would be to have the first TA hold office hours the company, will incur an expense if they move one of i or j to the new
in Be slot I1, and the second TA to hold office hours In time slots I2 system but not both; let’s denote this expense by xij >- 0.
and I5. So, ff the situation were really this simple, your friends would just
(a) Give a polynomial-time algorithm that takes the Input to an Instance port all n applications, achieving a total benefit of ~.~ b~. Unfortunately,
of this problem (the time slots, the TA schedules, and the parameters there’s a problem ....
a, b, and c) and does one of the following two things: Due to small but fundamental incompatibilities between the two
- Constructs a valid schedule for office hours, specifying which systems, there’s no way to port application 1 to the new system; it will
TA will cover which time slots, or have to remain on the old system. Nevertheless, it might still pay off to
- Reports (correctly) that there is no valid way to schedule office port some of the other applications, accruing the associated benefit and
hours. Incurring the expense of the interaction between applications on different
(b) This office-hour scheduling feature becomes very popular, and so systems.
course staffs begin to demand more. In particular, they observe that So this is the question they pose to you: Which of the remaining
it’s good to have a greater density of office hours closer to the due applications, ff any, should be moved? Give a polynomial-time algorithm
date of a homework assignment. to find a set S ~ {2, 3 ..... n} for which the sum of the benefits minus the
So what they want to be able to do is to specify an office-hour expenses of moving the applications in S to the new system is maximized.
density parameter for each day of the week: The number d~ specifies
that they want to have at least d~ office hours on a given day i of the 30. Consider a variation on the previous problem. In the new scenario, any
week. apphcation can potentially be moved, but now some of the benefits b~ for
second or third boxes; but in any nesting arrangement, both the second
moving to the new system are in fact negative: If bi < 0, then it is preferable
and third boxes ~ be visible. So the minimum possible number of vis-
(by an amount quantified in hi) to keep i on the old system. Again, give
ible boxes is two, and one solution that achieves this is to nest the first
a polynomial-time algorithm to find a set $ ~_ {1, 2 ..... n} for which the
box inside the second.
sum of the benefits mlnus the expenses of moving the applications in S
to the Sew system is maximized. 32. Given a graph G = (V, E), and a natural number/(, we can define a relation
~ on pairs of vertices of G as follows. If x, y E V, we say that x ~ y if
31. Some of your friends are interning at the small high-tech company Web-
there exist k mutually edge-disjoint paths from x to y in G.
Exodus. A running joke among the employees there is that the back room
has less space devoted to high-end servers than it does to empty boxes Is it true ~hat for every G and every k >_ 0, the relation -~ is transitive?
of computer equipment, piled up in case something needs to be shipped That is, is it always the case that if x ~ y and y --+ z, men we nave x ~ z.
back to the supplier for maintainence. Give a proof or a countere.xample.
A few days ago, a large shipment of computer monitors arrived, each
in its own large box; and since there are many different kinds of monitors 33. Let G -- (V, E) be a directed graph, and suppose that for each node u, the
in the shipment, the boxes do not all have the same dimensions. A bunch number of edges into u is equal to the number of edges out of v. That is,
of people spent some time in the morning trying to figure out how to for all u,
store all these things, realizing of course that less space would be tkken
up If some of the boxes could be nested inside others.
Suppose each box i is a rectangnlar parallelepiped with side lengths Let xo y be two nodes of G, and suppose that there exist k mutually edge-
equal to (i1, iz, i3); and suppose each side length is strictly between half a disjoint paths from x to y. Under these conditions, does it follow that
meter and one meter. Geometrically, you know what it means for one box there exist k mutually edge-disjoint paths from y to x? Give a proof or a
to nest inside another: It’s possible if you can rotate the smaller so that counterexample with explanation.
it fits inside the larger in each dimension. Formally, we can say that box
i with dimensions (il, i2, i3) nests inside boxj with dimensions (Jl,J2,J3) If 34. Ad hoc networks, made up of low-powered wireless devices, have been
there is a permutation a, b, c of the dimensions {1, 2, 3} so that in <J~, and proposed for situations like natural disasters in which the coordinators
ib <Jz, and ic <j~. Of course, nesting is recursive: If i nests in j, andj nests of a rescue effort might want to monitor conditions in a hard-to-reach
in k, then by putting ~ inside j inside k, only box k is visible. We say that area. The idea is that a large collection of these wireless devices could be
a nesting arrangement for a set of n boxes is a sequence of operations dropped into such an area from an aLrplane and then configured into a
in which a box i is put inside another box j in which it nests; and ff there functioning network.
were already boxes nested inside i, then these end up inside. ] as well. Note that we’re talking about (a) relatively inexpensive devices that
(Also notice the following: Since the side lengths of ~ are more than half are (b) being dropped from an airplane into (c) dangerous territory; and
a meter each, and since the side lengths of j are less than a meter each, for the combination of reasons (a), (b), and (c), it becomes necessary to
box i will take up more than half of each dimension of j, and so after ~ is include provisions for dealing with the failure of a reasonable number of
put inside j, nothing else can be put inside j.) We say that a box k is visible the nodes.
in a nesting arrangement ff the sequence of operations does not result in We’d like it to be the case that if one of the devices u detects that it is in
its ever being put inside another box. danger of failing, it should transmit a representation of its current state to
Here is the problem faced by the people at WebExodus: Since only the some other device in the network. Each device has a limited transmitting
visible boxes are taking up any space, how should a nesting arrangement range--say it can communicate with other devices that lie within d meters
be chosen so as to minimize the number of visible boxes? of it. Moreover, since we don’t want it to try transmitting its state to a
Give a polynomial-time algorithm to solve this problem. device that has already failed, we should include some redundancy: A
device v should have a set of k other devices that it can potentially contact,
Example. Suppose there are three boxes with dimensions (.6, .6, .6),
each within d meters of it. We’ll call this a back-up set for device u.
(.75, .75, .75), and (.9, .7, .7). The first box can be put into either of the
(a) Suppose you're given a set of n wireless devices, with positions
represented by an (x, y) coordinate pair for each. Design an algorithm
that determines whether it is possible to choose a back-up set for
each device (i.e., k other devices, each within d meters), with the
further property that, for some parameter b, no device appears in
the back-up set of more than b other devices. The algorithm should
output the back-up sets themselves, provided they can be found.
(b) The idea that, for each pair of devices v and w, there's a strict
dichotomy between being "in range" or "out of range" is a simplified
abstraction. More accurately, there's a power decay function f(.) that
specifies, for a pair of devices at distance δ, the signal strength f(δ)
that they'll be able to achieve on their wireless connection. (We'll
assume that f(δ) decreases with increasing δ.)
We might want to build this into our notion of back-up sets as
follows: among the k devices in the back-up set of v, there should
be at least one that can be reached with very high signal strength,
at least one other that can be reached with moderately high signal
strength, and so forth. More concretely, we have values p1 ≥ p2 ≥ ... ≥
pk, so that if the back-up set for v consists of devices at distances
d1 ≤ d2 ≤ ... ≤ dk, then we should have f(dj) ≥ pj for each j.
Give an algorithm that determines whether it is possible to
choose a back-up set for each device subject to this more detailed
condition, still requiring that no device should appear in the back-up
set of more than b other devices. Again, the algorithm should output
the back-up sets themselves, provided they can be found.
35. You're designing an interactive image segmentation tool that works as
follows. You start with the image segmentation setup described in Section
7.10, with n pixels, a set of neighboring pairs, and parameters {ai}, {bi},
and {pij}. We will make two assumptions about this instance. First, we
suppose that each of the parameters {ai}, {bi}, and {pij} is a nonnegative
integer between 0 and d, for some number d. Second, we will suppose that
the neighbor relation among the pixels has the property that each pixel
is a neighbor of at most four other pixels (so in the resulting graph, there
are at most four edges out of each node).
You first perform an initial segmentation (A0, B0) so as to maximize
the quantity q(A0, B0). Now, this might result in certain pixels being
assigned to the background when the user knows that they ought to be
in the foreground. So, when presented with the segmentation, the user
has the option of mouse-clicking on a particular pixel v1, thereby bringing
it to the foreground. But the tool should not simply bring this pixel into
the foreground; rather, it should compute a segmentation (A1, B1) that
maximizes the quantity q(A1, B1) subject to the condition that v1 is in the
foreground. (In practice, this is useful for the following kind of operation:
In segmenting a photo of a group of people, perhaps someone is holding
a bag that has been accidentally labeled as part of the background. By
clicking on a single pixel belonging to the bag, and recomputing an
optimal segmentation subject to the new condition, the whole bag will
often become part of the foreground.)
In fact, the system should allow the user to perform a sequence
of such mouse-clicks v1, v2, ..., vt; and after mouse-click vi, the sys-
tem should produce a segmentation (Ai, Bi) that maximizes the quantity
q(Ai, Bi) subject to the condition that all of v1, v2, ..., vi are in the fore-
ground.
Give an algorithm that performs these operations so that the initial
segmentation is computed within a constant factor of the time for a single
maximum flow, and then the interaction with the user is handled in O(dn)
time per mouse-click.
(Note: Solved Exercise 1 from this chapter is a useful primitive for
doing this. Also, the symmetric operation of forcing a pixel to belong to
the background can be handled by analogous means, but you do not have
to work this out here.)
36. We now consider a different variation of the image segmentation problem
in Section 7.10. We will develop a solution to an image labeling problem,
where the goal is to label each pixel with a rough estimate of its distance
from the camera (rather than the simple foreground/background labeling
used in the text). The possible labels for each pixel will be 0, 1, 2, ..., M
for some integer M.
Let G = (V, E) denote the graph whose nodes are pixels, and edges
indicate neighboring pairs of pixels. A labeling of the pixels is a partition
of V into sets A0, A1, ..., AM, where Ak is the set of pixels that is labeled
with distance k for k = 0, ..., M. We will seek a labeling of minimum cost;
the cost will come from two types of terms. By analogy with the fore-
ground/background segmentation problem, we will have an assignment
cost: for each pixel i and label k, the cost ai,k is the cost of assigning label
k to pixel i. Next, if two neighboring pixels (i, j) ∈ E are assigned different
labels, there will be a separation cost. In Section 7.10, we used a sepa-
ration penalty pi,j. In our current problem, the separation cost will also
depend on how far the two pixels are separated; specifically, it will be
proportional to the difference in value between their two labels.
Thus the overall cost q' of a labeling is defined as follows:
q'(A0, ..., AM) = Σ_{k=0}^{M} Σ_{i ∈ Ak} ai,k + Σ_{k < ℓ} Σ_{(i,j) ∈ E : i ∈ Ak, j ∈ Aℓ} pi,j (ℓ − k).

Figure 7.30 The set of nodes corresponding to a single pixel i in Exercise 36
(shown together with the source s and sink t).

The goal of this problem is to develop a polynomial-time algorithm
that finds the optimal labeling given the graph G and the penalty pa-
rameters ai,k and pi,j. The algorithm will be based on constructing a flow
network, and we will start you off on designing the algorithm by providing
a portion of the construction.
The flow network will have a source s and a sink t. In addition, for
each pixel i ∈ V and each label k = 1, ..., M, there will be a node ui,k,
as shown in Figure 7.30. (M = 5 in the example in the figure.)
For notational convenience, the nodes ui,0 and ui,M+1 will refer to s
and t, respectively, for any choice of i ∈ V.
We now add edges (ui,k, ui,k+1) with capacity ai,k for k = 0, ..., M; and
edges (ui,k+1, ui,k) in the opposite direction with very large capacity L. We
will refer to this collection of nodes and edges as the chain associated
with pixel i.
Notice that if we make this very large capacity L large enough, then
there will be no minimum cut (A, B) so that an edge of capacity L leaves
the set A. (How large do we have to make it for this to happen?) Hence, for
any minimum cut (A, B), and each pixel i, there will be exactly one low-
capacity edge in the chain associated with i that leaves the set A. (You
should check that if there were two such edges, then a large-capacity
edge would also have to leave the set A.)
Finally, here's the question: Use the nodes and edges defined so far
to complete the construction of a flow network with the property that a
minimum-cost labeling can be efficiently computed from a minimum s-t
cut. You should prove that your construction has the desired property,
and show how to recover the minimum-cost labeling from the cut.
37. In a standard minimum s-t cut problem, we assume that all capacities are
nonnegative; allowing an arbitrary set of positive and negative capacities
results in a problem that is computationally much more difficult. How-
ever, as we'll see here, it is possible to relax the nonnegativity requirement
a little and still have a problem that can be solved in polynomial time.
Let G = (V, E) be a directed graph, with source s ∈ V, sink t ∈ V, and
edge capacities {ce}. Suppose that for every edge e that has neither s nor t
as an endpoint, we have ce ≥ 0. Thus ce can be negative for edges e that have
at least one end equal to either s or t. Give a polynomial-time algorithm
to find an s-t cut of minimum value in such a graph. (Despite the new
nonnegativity requirements, we still define the value of an s-t cut (A, B)
to be the sum of the capacities of all edges e for which the tail of e is in
A and the head of e is in B.)
38. You're working with a large database of employee records. For the pur-
poses of this question, we'll picture the database as a two-dimensional
table T with a set R of m rows and a set C of n columns; the rows corre-
spond to individual employees, and the columns correspond to different
attributes.
To take a simple example, we may have four columns labeled
name, phone number, start date, manager's name
and a table with five employees as shown here.

name      phone number   start date   manager's name
Alanis    3-4563         6/13/95      Chelsea
Chelsea   3-2341         1/20/93      Lou
Elrond    3-2345         12/19/01     Chelsea
Hal       3-9000         1/12/97      Chelsea
Raj       3-3453         7/1/96       Chelsea

Given a subset S of the columns, we can obtain a new, smaller table
by keeping only the entries that involve columns from S. We will call this
new table the projection of T onto S, and denote it by T[S]. For example,
if S = {name, start date}, then the projection T[S] would be the table
consisting of just the first and third columns.
There's a different operation on tables that is also useful, which is
to permute the columns. Given a permutation p of the columns, we can
obtain a new table of the same size as T by simply reordering the columns
according to p. We will call this new table the permutation of T by p, and
denote it by Tp.
All of this comes into play for your particular application, as follows.
Suppose that you're
going to be working with a lot, so you'd like to have them available in a
readily accessible format. One choice would be to store the k projections
T[S1], T[S2], ..., T[Sk], but this would take up a lot of space. In considering
alternatives to this, you learn that you may not need to explicitly project
onto each subset, because the underlying database system can deal with
a subset of the columns particularly efficiently if (in some order) the
members of the subset constitute a prefix of the columns in left-to-right
order. So, in our example, the subsets {name, phone number} and {name,
start date, phone number} constitute prefixes (they're the first two and
first three columns from the left, respectively); and as such, they can
be processed much more efficiently in this table than a subset such as
{name, start date}, which does not constitute a prefix. (Again, note that
a given subset Si does not come with a specified order, and so we are
interested in whether there is some order under which it forms a prefix
of the columns.)
So here's the question: Given a parameter ℓ < k, can you find ℓ per-
mutations of the columns p1, p2, ..., pℓ so that for every one of the given
subsets Si (for i = 1, 2, ..., k), it's the case that the columns in Si consti-
tute a prefix of at least one of the permuted tables Tp1, Tp2, ..., Tpℓ? We'll
say that such a set of permutations constitutes a valid solution to the
problem; if a valid solution exists, it means you only need to store the
ℓ permuted tables rather than all k projections. Give a polynomial-time
algorithm to solve this problem; for instances on which there is a valid
solution, your algorithm should return an appropriate set of ℓ permuta-
tions.
Example. Suppose the table is as above, the given subsets are
S1 = {name, phone number},
S2 = {name, start date},
S3 = {name, manager's name, start date},
and ℓ = 2. Then there is a valid solution to the instance, and it could be
achieved by the two permutations
p1 = {name, phone number, start date, manager's name},
p2 = {name, start date, manager's name, phone number}.
This way, S1 constitutes a prefix of the permuted table Tp1, and both S2
and S3 constitute prefixes of the permuted table Tp2.
39. You are consulting for an environmental statistics firm. They collect
statistics and publish the collected data in a book. The statistics are
about populations of different regions in the world and are recorded in
multiples of one million. Examples of such statistics would look like the
following table.

Country          A        B        C        Total
grown-up men     11.998   9.083    2.919    24.000
grown-up women   12.983   10.872   3.145    27.000
children         1.019    2.045    0.936    4.000
Total            26.000   22.000   7.000    55.000

We will assume here for simplicity that our data is such that all
row and column sums are integers. The Census Rounding Problem is to
round all data to integers without changing any row or column sum. Each
fractional number can be rounded either up or down. For example, a good
rounding for our table data would be as follows.

Country          A        B        C        Total
grown-up men     11.000   10.000   3.000    24.000
grown-up women   13.000   10.000   4.000    27.000
children         2.000    2.000    0.000    4.000
Total            26.000   22.000   7.000    55.000

(a) Consider first the special case when all data are between 0 and 1.
So you have a matrix of fractional numbers between 0 and 1, and
your problem is to round each fraction that is between 0 and 1 to
either 0 or 1 without changing the row or column sums. Use a flow
computation to check if the desired rounding is possible.
(b) Consider the Census Rounding Problem as defined above, where row
and column sums are integers, and you want to round each fractional
number α to either ⌊α⌋ or ⌈α⌉. Use a flow computation to check if the
desired rounding is possible.
(c) Prove that the rounding we are looking for in (a) and (b) always exists.
40. In a lot of numerical computations, we can ask about the "stability"
or "robustness" of the answer. This kind of question can be asked for
combinatorial problems as well; here's one way of phrasing the question
for the Minimum Spanning Tree Problem.
Suppose you are given a graph G = (V, E), with a cost ce on each edge e.
We view the costs as quantities that have been measured experimentally,
subject to possible errors in measurement. Thus, the minimum spanning
tree one computes for G may not in fact be the "real" minimum spanning
tree.
Given error parameters ε > 0 and k > 0, and a specific edge e' = (u, v),
you would like to be able to make a claim of the following form.
(*) Even if the cost of each edge were to be changed by at most ε (either
increased or decreased), and the costs of k of the edges other than e' were
further changed to arbitrarily different values, the edge e' would still not belong
to any minimum spanning tree of G.
Such a property provides a type of guarantee that e' is not likely to belong
to the minimum spanning tree, even assuming significant measurement
error.
Give a polynomial-time algorithm that takes G, e', ε, and k, and decides
whether or not property (*) holds for e'.
41. Suppose you're managing a collection of processors and must schedule
a sequence of jobs over time.
The jobs have the following characteristics. Each job j has an arrival
time aj when it is first available for processing, a length ℓj which indicates
how much processing time it needs, and a deadline dj by which it must
be finished. (We'll assume 0 < ℓj ≤ dj − aj.) Each job can be run on any
of the processors, but only on one at a time; it can also be preempted
and resumed from where it left off (possibly after a delay) on another
processor.
Moreover, the collection of processors is not entirely static either:
You have an overall pool of k possible processors; but for each processor
i, there is an interval of time [ti, t'i] during which it is available; it is
unavailable at all other times.
Given all this data about job requirements and processor availability,
you'd like to decide whether the jobs can all be completed or not. Give a
polynomial-time algorithm that either produces a schedule completing all
jobs by their deadlines or reports (correctly) that no such schedule exists.
You may assume that all the parameters associated with the problem are
integers.
Example. Suppose we have two jobs J1 and J2. J1 arrives at time 0, is due
at time 4, and has length 3. J2 arrives at time 1, is due at time 3, and has
length 2. We also have two processors P1 and P2. P1 is available between
times 0 and 4; P2 is available between times 2 and 3. In this case, there is
a schedule that gets both jobs done.
• At time 0, we start job J1 on processor P1.
• At time 1, we preempt J1 to start J2 on P1.
• At time 2, we resume J1 on P2. (J2 continues processing on P1.)
• At time 3, J2 completes by its deadline. P2 ceases to be available, so
we move J1 back to P1 to finish its remaining one unit of processing
there.
• At time 4, J1 completes its processing on P1.
Notice that there is no solution that does not involve preemption and
moving of jobs.
42. Give a polynomial-time algorithm for the following minimization ana-
logue of the Maximum-Flow Problem. You are given a directed graph
G = (V, E), with a source s ∈ V and sink t ∈ V, and numbers (capacities)
ℓ(v, w) for each edge (v, w) ∈ E. We define a flow f, and the value of a flow,
as usual, requiring that all nodes except s and t satisfy flow conserva-
tion. However, the given numbers are lower bounds on edge flow--that
is, they require that f(v, w) ≥ ℓ(v, w) for every edge (v, w) ∈ E, and there is
no upper bound on flow values on edges.
(a) Give a polynomial-time algorithm that finds a feasible flow of mini-
mum possible value.
(b) Prove an analogue of the Max-Flow Min-Cut Theorem for this problem
(i.e., does min-flow = max-cut?).
43. You are trying to solve a circulation problem, but it is not feasible. The
problem has demands, but no capacity limits on the edges. More formally,
there is a graph G = (V, E), and demands dv for each node v (satisfying
Σv∈V dv = 0), and the problem is to decide if there is a flow f such that
f(e) ≥ 0 and f^in(v) − f^out(v) = dv for all nodes v ∈ V. Note that this problem
can be solved via the circulation algorithm from Section 7.7 by setting
ce = +∞ for all edges e ∈ E. (Alternately, it is enough to set ce to be an
extremely large number for each edge--say, larger than the total of all
positive demands dv in the graph.)
You want to fix up the graph to make the problem feasible, so it
would be very useful to know why the problem is not feasible as it stands
now. On a closer look, you see that there is a subset U of nodes such that
there is no edge into U, and yet Σv∈U dv > 0. You quickly realize that the
existence of such a set immediately implies that the flow cannot exist:
The set U has a positive total demand, and so needs incoming flow, and
yet U has no edges into it. In trying to evaluate how far the problem is
from being solvable, you wonder how big the demand of a set with no
incoming edges can be.
Cook et al. (1998) and Ahuja, Magnanti, and Orlin (1993) discuss algorithms
for this problem.
While network flow models routing problems that can be reduced to the
task of constructing a number of paths from a single source to a single sink,
there is a more general, and harder, class of routing problems in which paths
must be simultaneously constructed between different pairs of senders and
receivers. The relationship among these classes of problems is a bit subtle;
we discuss this issue, as well as algorithms for some of these harder types of
routing problems, in Chapter 11.
Notes on the Exercises Exercise 8 is based on a problem we learned from Bob
Bland; Exercise 16 is based on discussions with Udi Manber; Exercise 25 is
based on discussions with Jordan Erenrich; Exercise 35 is based on discussions
with Yuri Boykov, Olga Veksler, and Ramin Zabih; Exercise 36 is based on
results of Hiroshi Ishikawa and Davi Geiger, and of Boykov, Veksler, and Zabih;
Exercise 38 is based on a problem we learned from Al Demers; and Exercise 46
is based on a result of J. Picard and H. Ratliff.

8 NP and Computational Intractability

We now arrive at a major transition point in the book. Up until now, we've de-
veloped efficient algorithms for a wide range of problems and have even made
some progress on informally categorizing the problems that admit efficient
solutions--for example, problems expressible as minimum cuts in a graph, or
problems that allow a dynamic programming formulation. But although we’ve
often paused to take note of other problems that we don’t see how to solve, we
haven’t yet made any attempt to actually quantify or characterize the range of
problems that can’t be solved efficiently.
Back when we were first laying out the fundamental definitions, we settled
on polynomial time as our working notion of efficiency. One advantage of
using a concrete definition like this, as we noted earlier, is that it gives us the
opportunity to prove mathematically that certain problems cannot be solved
by polynomial-time--and hence "efficient"--algorithms.
When people began investigating computational complexity in earnest,
there was some initial progress in proving that certain extremely hard problems
cannot be solved by efficient algorithms. But for many of the most funda-
mental discrete computational problems--arising in optimization, artificial
intelligence, combinatorics, logic, and elsewhere--the question was too dif-
ficult to resolve, and it has remained open since then: We do not know of
polynomial-time algorithms for these problems, and we cannot prove that no
polynomial-time algorithm exists.
In the face of this formal ambiguity, which becomes increasingly hardened
as years pass, people working in the study of complexity have made significant
progress. A large class of problems in this "gray area" has been characterized,
and it has been proved that they are equivalent in the following sense: a
polynomial-time algorithm for any one of them would imply the existence of a
polynomial-time algorithm for all of them. These are the NP-complete problems,
a name that will make more sense as we proceed a little further. There are
literally thousands of NP-complete problems, arising in numerous areas, and
the class seems to contain a large fraction of the fundamental problems whose
complexity we can't resolve. So the formulation of NP-completeness, and the
proof that all these problems are equivalent, is a powerful thing: it says that
all these open questions are really a single open question, a single type of
complexity that we don't yet fully understand.
From a pragmatic point of view, NP-completeness essentially means "com-
putationally hard for all practical purposes, though we can't prove it." Discov-
ering that a problem is NP-complete provides a compelling reason to stop
searching for an efficient algorithm--you might as well search for an efficient
algorithm for any of the famous computational problems already known to
be NP-complete, for which many people have tried and failed to find efficient
algorithms.

8.1 Polynomial-Time Reductions
Our plan is to explore the space of computationally hard problems, eventually
arriving at a mathematical characterization of a large class of them. Our basic
technique in this exploration is to compare the relative difficulty of different
problems; we'd like to formally express statements like, "Problem X is at least
as hard as problem Y." We will formalize this through the notion of reduction:
we will show that a particular problem X is at least as hard as some other
problem Y by arguing that, if we had a "black box" capable of solving X,
then we could also solve Y. (In other words, X is powerful enough to let us
solve Y.)
To make this precise, we add the assumption that X can be solved in
polynomial time directly to our model of computation. Suppose we had a
black box that could solve instances of a problem X; if we write down the
input for an instance of X, then in a single step, the black box will return the
correct answer. We can now ask the following question:
(*) Can arbitrary instances of problem Y be solved using a polynomial
number of standard computational steps, plus a polynomial number of
calls to a black box that solves problem X?
If the answer to this question is yes, then we write Y ≤p X; we read this as
"Y is polynomial-time reducible to X," or "X is at least as hard as Y (with
respect to polynomial time)." Note that in this definition, we still pay for the
time it takes to write down the input to the black box solving X, and to read
the answer that the black box provides.
This formulation of reducibility is very natural. When we ask about reduc-
tions to a problem X, it is as though we've supplemented our computational
model with a piece of specialized hardware that solves instances of X in a
single step. We can now explore the question: How much extra power does
this piece of hardware give us?
An important consequence of our definition of ≤p is the following. Suppose
Y ≤p X and there actually exists a polynomial-time algorithm to solve X. Then
our specialized black box for X is actually not so valuable; we can replace
it with a polynomial-time algorithm for X. Consider what happens to our
algorithm for problem Y that involved a polynomial number of steps plus
a polynomial number of calls to the black box. It now becomes an algorithm
that involves a polynomial number of steps, plus a polynomial number of calls
to a subroutine that runs in polynomial time; in other words, it has become a
polynomial-time algorithm. We have therefore proved the following fact.
(8.1) Suppose Y ≤p X. If X can be solved in polynomial time, then Y can be
solved in polynomial time.
We've made use of precisely this fact, implicitly, at a number of earlier
points in the book. Recall that we solved the Bipartite Matching Problem using
a polynomial amount of preprocessing plus the solution of a single instance
of the Maximum-Flow Problem. Since the Maximum-Flow Problem can be
solved in polynomial time, we concluded that Bipartite Matching could as well.
Similarly, we solved the foreground/background Image Segmentation Problem
using a polynomial amount of preprocessing plus the solution of a single
instance of the Minimum-Cut Problem, with the same consequences. Both of
these can be viewed as direct applications of (8.1). Indeed, (8.1) summarizes
a great way to design polynomial-time algorithms for new problems: by
reduction to a problem we already know how to solve in polynomial time.
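As a small illustration of this pattern--polynomial preprocessing plus a call to
a problem we already know how to solve--here is a hedged Python sketch of the
Bipartite Matching reduction mentioned above. The function max_flow is assumed
to be any polynomial-time maximum-flow black box that takes a capacity
dictionary and returns the value of a maximum s-t flow; the graph construction
is the part the reduction supplies.

    def bipartite_matching_size(left, right, edges, max_flow):
        """Size of a maximum matching in a bipartite graph, computed by
        building a unit-capacity flow network and calling a max-flow black box.

        left, right: iterables of node names on the two sides
        edges: iterable of (u, v) pairs with u in left, v in right
        max_flow: assumed black box; max_flow(capacity, s, t) -> flow value,
                  where capacity[(x, y)] is the capacity of edge (x, y)
        """
        s, t = "source", "sink"
        capacity = {}
        for u in left:                 # source -> left side, capacity 1
            capacity[(s, u)] = 1
        for v in right:                # right side -> sink, capacity 1
            capacity[(v, t)] = 1
        for (u, v) in edges:           # original edges, directed left -> right
            capacity[(u, v)] = 1
        return max_flow(capacity, s, t)

The point is only the shape of the reduction: a polynomial-size construction
followed by a single call to the black box, exactly as in the definition of ≤p.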
To make this precise, we add the assumption that X can be solved in reduction to a problem we already know how to solve in polynomial time.
polynomial time directly to our model of computation. Suppose we had a In this chapter, however, we will be using (8.1) to establish the computa-
black box that could solve instances of a problem X; if we write down the tional intractability of various problems. We will be engaged in the somewhat
input for an instance of X, then in a single step, the black box will return the subtle activity of relating the tractability of problems even when we don’t know
correct answer. We can now ask the following question: how to solve either of them in polynomial time. For this purpose, we will really
(.) Can arbitrary instances of problem Y be solved using a polynomial be using the contrapositive of (8.1), which is sufficiently valuable that we’ll
number of standard computational steps, plus a polynomial number of state it as a separate fact.
calls to a black box that solves problem X?
(8.2) Suppose Y <_p X. If Y cannot be solved in polynomial time, then X
If the answer to this question is yes, then we write Y _<p X; we read this as cannot be solved in polynomial time.
"Y is polynomial-time reducible to X;’ or "X is at least as hard as Y (with
respect to polynomial time)." Note that in this definition, we still pay for the Statement (8.2) is transparently equivalent to (8.1), but it emphasizes our
time it takes to write down the input to the black box solving X, and to read overall plan: If we have a problem Y that is known to be hard, and we show
the answer that the black box provides.
that Y ≤p X, then the hardness has "spread" to X; X must be hard or else it
could be used to solve Y.
In reality, given that we don't actually know whether the problems we're
studying can be solved in polynomial time or not, we'll be using ≤p to establish
relative levels of difficulty among problems.
With this in mind, we now establish some reducibilities among an initial
collection of fundamental hard problems.

A First Reduction: Independent Set and Vertex Cover
The Independent Set Problem, which we introduced as one of our five repre-
sentative problems in Chapter 1, will serve as our first prototypical example
of a hard problem. We don't know a polynomial-time algorithm for it, but we
also don't know how to prove that none exists.
Let's review the formulation of Independent Set, because we're going to
add one wrinkle to it. Recall that in a graph G = (V, E), we say a set of nodes
S ⊆ V is independent if no two nodes in S are joined by an edge. It is easy
to find small independent sets in a graph (for example, a single node forms
an independent set); the hard part is to find a large independent set, since
you need to build up a large collection of nodes without ever including two
neighbors. For example, the set of nodes {3, 4, 5} is an independent set of
size 3 in the graph in Figure 8.1, while the set of nodes {1, 4, 5, 6} is a larger
independent set. (Figure 8.1 A graph whose largest independent set has size 4,
and whose smallest vertex cover has size 3.)
In Chapter 1, we posed the problem of finding the largest independent set
in a graph G. For purposes of our current exploration in terms of reducibility,
it will be much more convenient to work with problems that have yes/no
answers only, and so we phrase Independent Set as follows.
Given a graph G and a number k, does G contain an independent set of
size at least k?
In fact, from the point of view of polynomial-time solvability, there is not a
significant difference between the optimization version of the problem (find
the maximum size of an independent set) and the decision version (decide, yes
or no, whether G has an independent set of size at least a given k). Given a
method to solve the optimization version, we automatically solve the decision
version (for any k) as well. But there is also a slightly less obvious converse
to this: If we can solve the decision version of Independent Set for every k,
then we can also find a maximum independent set. For given a graph G on n
nodes, we simply solve the decision version of Independent Set for each k; the
largest k for which the answer is "yes" is the size of the largest independent
set in G. (And using binary search, we need only solve the decision version
for O(log n) different values of k.) This simple equivalence between decision
and optimization will also hold in the problems we discuss below.
Now, to illustrate our basic strategy for relating hard problems to one an-
other, we consider another fundamental graph problem for which no efficient
algorithm is known: Vertex Cover. Given a graph G = (V, E), we say that a set
of nodes S ⊆ V is a vertex cover if every edge e ∈ E has at least one end in S.
Note the following fact about this use of terminology: In a vertex cover, the
vertices do the "covering," and the edges are the objects being "covered." Now,
it is easy to find large vertex covers in a graph (for example, the full vertex
set is one); the hard part is to find small ones. We formulate the Vertex Cover
Problem as follows.
Given a graph G and a number k, does G contain a vertex cover of size at
most k?
For example, in the graph in Figure 8.1, the set of nodes {1, 2, 6, 7} is a vertex
cover of size 4, while the set {2, 3, 7} is a vertex cover of size 3.
We don't know how to solve either Independent Set or Vertex Cover in
polynomial time; but what can we say about their relative difficulty? We now
show that they are equivalently hard, by establishing that Independent Set ≤p
Vertex Cover and also that Vertex Cover ≤p Independent Set. This will be a
direct consequence of the following fact.
(8.3) Let G = (V, E) be a graph. Then S is an independent set if and only if
its complement V − S is a vertex cover.
Proof. First, suppose that S is an independent set. Consider an arbitrary edge
e = (u, v). Since S is independent, it cannot be the case that both u and v are
in S; so one of them must be in V − S. It follows that every edge has at least
one end in V − S, and so V − S is a vertex cover.
Conversely, suppose that V − S is a vertex cover. Consider any two nodes
u and v in S. If they were joined by edge e, then neither end of e would lie
in V − S, contradicting our assumption that V − S is a vertex cover. It follows
that no two nodes in S are joined by an edge, and so S is an independent set. ■
Reductions in each direction between the two problems follow immedi-
ately from (8.3).
(8.4) Independent Set ≤p Vertex Cover.
Proof. If we have a black box to solve Vertex Cover, then we can decide
whether G has an independent set of size at least k by asking the black box
whether G has a vertex cover of size at most n − k. ■
(8.5) Vertex Cover ≤p Independent Set.
Proof. If we have a black box to solve Independent Set, then we can decide
whether G has a vertex cover of size at most k by asking the black box whether
G has an independent set of size at least n − k. ■
To sum up, this type of analysis illustrates our plan in general: although
we don’t know how to solve either Independent Set or Vertex Cover efficiently,
(8.4) and (8.5) tell us how we could solve either given an efficient solution to
the other, and hence these two facts establish the relative levels of difficulty
of these problems.
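To make (8.4) and (8.5) concrete, here is a minimal Python sketch of the two
black-box reductions. Both has_vertex_cover and has_independent_set stand for
assumed decision black boxes that take a node set, an edge set, and a number k.

    def has_independent_set_via_vc(nodes, edges, k, has_vertex_cover):
        # (8.4): G has an independent set of size >= k
        # iff G has a vertex cover of size <= n - k.
        return has_vertex_cover(nodes, edges, len(nodes) - k)

    def has_vertex_cover_via_is(nodes, edges, k, has_independent_set):
        # (8.5): G has a vertex cover of size <= k
        # iff G has an independent set of size >= n - k.
        return has_independent_set(nodes, edges, len(nodes) - k)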
We now pursue this strategy for a number of other problems.
Then the truth assignment v that sets all variables to 1 is not a satisfying
assignment, because it does not satisfy the second of these clauses; but the
truth assignment v' that sets all variables to 0 is a satisfying assignment.
Given a set of clauses C1, ..., Ck over a set of variables X = {x1, ..., xn},
does there exist a satisfying truth assignment?
There is a special case of SAT that will turn out to be equivalently difficult and
is somewhat easier to think about; this is the case in which all clauses contain
exactly three terms (corresponding to distinct variables). We call this problem
3-Satisfiability, or 3-SAT:
Given a set of clauses C1, ..., Ck, each of length 3, over a set of variables
X = {x1, ..., xn}, does there exist a satisfying truth assignment?
Satisfiability and 3-Satisfiability are really fundamental combinatorial
search problems; they contain the basic ingredients of a hard computational
problem in very "bare-bones" fashion. We have to make n independent deci-
sions (the assignments for each xi) so as to satisfy a set of constraints. There
are several ways to satisfy each constraint in isolation, but we have to arrange
our decisions so that all constraints are satisfied simultaneously.

Reducing 3-SAT to Independent Set
We now relate the type of computational hardness embodied in SAT and 3-
SAT to the superficially different sort of hardness represented by the search for
independent sets and vertex covers in graphs. Specifically, we will show that
3-SAT ≤p Independent Set. The difficulty in proving a thing like this is clear;
3-SAT is about setting Boolean variables in the presence of constraints, while
Independent Set is about selecting vertices in a graph. To solve an instance of
3-SAT using a black box for Independent Set, we need a way to encode all these
Boolean constraints in the nodes and edges of a graph, so that satisfiability
corresponds to the existence of a large independent set.
Doing this illustrates a general principle for designing complex reductions
Y ≤p X: building "gadgets" out of components in problem X to represent what
is going on in problem Y.
(8.8) 3-SAT ≤p Independent Set.
Proof. We have a black box for Independent Set and want to solve an instance
of 3-SAT consisting of variables X = {x1, ..., xn} and clauses C1, ..., Ck.
The key to thinking about the reduction is to realize that there are two
conceptually distinct ways of thinking about an instance of 3-SAT.
(Figure 8.3 The reduction from 3-SAT to Independent Set. Any independent set
contains at most one node from each triangle.)
One way to picture the 3-SAT instance was suggested earlier: You have to
make an independent 0/1 decision for each of the n variables, and you
succeed if you manage to achieve one of three ways of satisfying each
clause.
A different way to picture the same 3-SAT instance is as follows: You have
to choose one term from each clause, and then find a truth assignment
that causes all these terms to evaluate to 1, thereby satisfying all clauses.
So you succeed if you can select a term from each clause in such a way
that no two selected terms "conflict"; we say that two terms conflict if
one is equal to a variable xi and the other is equal to its negation x̄i. If
we avoid conflicting terms, we can find a truth assignment that makes
the selected terms from each clause evaluate to 1.
Our reduction will be based on this second view of the 3-SAT instance;
here is how we encode it using independent sets in a graph. First, construct a
graph G = (V, E) consisting of 3k nodes grouped into k triangles as shown in
Figure 8.3. That is, for i = 1, 2, ..., k, we construct three vertices vi1, vi2, vi3
joined to one another by edges. We give each of these vertices a label; vij is
labeled with the jth term from the clause Ci of the 3-SAT instance.
Before proceeding, consider what the independent sets of size k look like
in this graph: Since two vertices cannot be selected from the same triangle,
they consist of all ways of choosing one vertex from each of the triangles. This
is implementing our goal of choosing a term in each clause that will evaluate
to 1; but we have so far not prevented ourselves from choosing two terms that
conflict.
We encode conflicts by adding some more edges to the graph: For each
pair of vertices whose labels correspond to terms that conflict, we add an edge
between them. Have we now destroyed all the independent sets of size k, or
does one still exist? It's not clear; it depends on whether we can still select one
node from each triangle so that no conflicting pairs of vertices are chosen. But
this is precisely what the 3-SAT instance required.
Let's claim, precisely, that the original 3-SAT instance is satisfiable if and
only if the graph G we have constructed has an independent set of size at least
k. First, if the 3-SAT instance is satisfiable, then each triangle in our graph
contains at least one node whose label evaluates to 1. Let S be a set consisting
of one such node from each triangle. We claim S is independent; for if there
were an edge between two nodes u, v ∈ S, then the labels of u and v would
have to conflict; but this is not possible, since they both evaluate to 1.
Conversely, suppose our graph G has an independent set S of size at least
k. Then, first of all, the size of S is exactly k, and it must consist of one node
from each triangle. Now, we claim that there is a truth assignment v for the
variables in the 3-SAT instance with the property that the labels of all nodes
in S evaluate to 1. Here is how we could construct such an assignment v. For
each variable xi, if neither xi nor x̄i appears as a label of a node in S, then we
arbitrarily set v(xi) = 1. Otherwise, exactly one of xi or x̄i appears as a label
of a node in S; for if one node in S were labeled xi and another were labeled
x̄i, then there would be an edge between these two nodes, contradicting our
assumption that S is an independent set. Thus, if xi appears as a label of a
node in S, we set v(xi) = 1, and otherwise we set v(xi) = 0. By constructing v
in this way, all labels of nodes in S will evaluate to 1.
Since G has an independent set of size at least k if and only if the original
3-SAT instance is satisfiable, the reduction is complete. ■
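The construction in this proof is mechanical enough to write down directly. The
following is a minimal Python sketch, under the assumption that a 3-SAT instance
is given as a list of clauses, each clause a tuple of three nonzero integers where
the literal xi is written as i and its negation as -i, and that has_independent_set
is an assumed decision black box taking a node set, an edge set, and a target size.

    def three_sat_to_independent_set(clauses):
        """Build the graph of the reduction: one triangle per clause,
        plus an edge between every pair of conflicting literals."""
        nodes = []            # node = (clause index, position in clause)
        label = {}            # node -> literal it is labeled with
        edges = set()
        for i, clause in enumerate(clauses):
            triangle = [(i, j) for j in range(3)]
            nodes.extend(triangle)
            for j, lit in enumerate(clause):
                label[(i, j)] = lit
            # the three edges of the triangle
            edges.update({(triangle[0], triangle[1]),
                          (triangle[1], triangle[2]),
                          (triangle[0], triangle[2])})
        # conflict edges: a literal and its negation can't both be chosen
        for u in nodes:
            for w in nodes:
                if u < w and label[u] == -label[w]:
                    edges.add((u, w))
        return nodes, edges

    def is_satisfiable(clauses, has_independent_set):
        nodes, edges = three_sat_to_independent_set(clauses)
        return has_independent_set(nodes, edges, len(clauses))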
Transitivity can be quite useful. For example, since we have proved
3-SAT ≤p Independent Set ≤p Vertex Cover ≤p Set Cover,
we can conclude that 3-SAT ≤p Set Cover.

8.3 Efficient Certification and the Definition of NP
Reducibility among problems was the first main ingredient in our study of
computational intractability. The second ingredient is a characterization of the
class of problems that we are dealing with. Combining these two ingredients,
together with a powerful theorem of Cook and Levin, will yield some surprising
consequences.
Recall that in Chapter 1, when we first encountered the Independent Set
Problem, we asked: Can we say anything good about it, from a computational
point of view? And, indeed, there was something: If a graph does contain an
independent set of size at least k, then we could give you an easy proof of this
fact by exhibiting such an independent set. Similarly, if a 3-SAT instance is
satisfiable, we can prove this to you by revealing the satisfying assignment. It
may be an enormously difficult task to actually find such an assignment; but
if we've done the hard work of finding one, it's easy for you to plug it into the
clauses and check that they are all satisfied.
The issue here is the contrast between finding a solution and check-
ing a proposed solution. For Independent Set or 3-SAT, we do not know a
polynomial-time algorithm to find solutions; but checking a proposed solution
to these problems can be easily done in polynomial time. To see that this is
not an entirely trivial issue, consider the problem we'd face if we had to prove
that a 3-SAT instance was not satisfiable. What "evidence" could we show that
would convince you, in polynomial time, that the instance was unsatisfiable?
for this consistent failure is that these problems simply cannot be solved in
polynomial time.

8.4 NP-Complete Problems
In the absence of progress on the P = NP question, people have turned to a
related but more approachable question: What are the hardest problems in
NP? Polynomial-time reducibility gives us a way of addressing this question
and gaining insight into the structure of NP.
Arguably the most natural way to define a "hardest" problem X is via the
following two properties: (i) X ∈ NP; and (ii) for all Y ∈ NP, Y ≤p X. In other
words, we require that every problem in NP can be reduced to X. We will call
such an X an NP-complete problem.
The following fact helps to further reinforce our use of the term hardest.
(8.12) Suppose X is an NP-complete problem. Then X is solvable in polyno-
mial time if and only if P = NP.
Proof. Clearly, if P = NP, then X can be solved in polynomial time since it
belongs to NP. Conversely, suppose that X can be solved in polynomial time.
If Y is any other problem in NP, then Y ≤p X, and so by (8.1), it follows that
Y can be solved in polynomial time. Hence NP ⊆ P; combined with (8.10),
we have the desired conclusion. ■
A crucial consequence of (8.12) is the following: If there is any problem in
NP that cannot be solved in polynomial time, then no NP-complete problem
can be solved in polynomial time.
In 1971, Cook and Levin independently showed how to do this for very
natural problems in NP. Maybe the most natural problem choice for a first
NP-complete problem is the following Circuit Satisfiability Problem.
To specify this problem, we need to make precise what we mean by a
circuit. Consider the standard Boolean operators that we used to define the
Satisfiability Problem: ∧ (AND), ∨ (OR), and ¬ (NOT). Our definition of a circuit
is designed to represent a physical circuit built out of gates that implement
these operators. Thus we define a circuit K to be a labeled, directed acyclic
graph such as the one shown in the example of Figure 8.4.
• The sources in K (the nodes with no incoming edges) are labeled either
with one of the constants 0 or 1, or with the name of a distinct variable.
The nodes of the latter type will be referred to as the inputs to the circuit.
• Every other node is labeled with one of the Boolean operators ∧, ∨, or
¬; nodes labeled with ∧ or ∨ will have two incoming edges, and nodes
labeled with ¬ will have one incoming edge.
• There is a single node with no outgoing edges, and it will represent the
output: the result that is computed by the circuit.
A circuit computes a function of its inputs in the following natural way. We
imagine the edges as "wires" that carry the 0/1 value at the node they emanate
from. Each node v other than the sources will take the values on its incoming
edge(s) and apply the Boolean operator that labels it. The result of this ∧, ∨,
or ¬ operation will be passed along the edge(s) leaving v. The overall value
computed by the circuit will be the value computed at the output node.
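The evaluation rule just described is easy to state in code. Below is a minimal
Python sketch, assuming the circuit is given as a dictionary mapping each node
name to its label ('AND', 'OR', 'NOT', a constant 0 or 1, or a variable name)
together with the list of its incoming nodes; this representation and the helper
names are illustrative, not from the text.

    def evaluate_circuit(nodes, output, inputs):
        """nodes: dict mapping node -> (label, list of predecessor nodes);
        sources have an empty predecessor list and a label that is either
        0, 1, or a variable name looked up in `inputs`.
        Returns the 0/1 value computed at the `output` node."""
        memo = {}

        def value(v):
            if v in memo:
                return memo[v]
            label, preds = nodes[v]
            if not preds:                       # a source
                result = label if label in (0, 1) else inputs[label]
            elif label == 'NOT':
                result = 1 - value(preds[0])
            elif label == 'AND':
                result = value(preds[0]) & value(preds[1])
            else:                               # 'OR'
                result = value(preds[0]) | value(preds[1])
            memo[v] = result
            return result

        return value(output)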
For example, consider the circuit in Figure 8.4. The leftmost two sources
are preassigned the values 1 and 0, and the next three sources constitute the
seen problems that (like 3-SAT) have involved searching over 0/1 settings to a
collection of variables. Another type of computationally hard problem involves
searching over the set of all permutations of a collection of objects.

The Traveling Salesman Problem
Probably the most famous such sequencing problem is the Traveling Salesman
Problem. Consider a salesman who must visit n cities labeled v1, v2, ..., vn.
The salesman starts in city v1, his home, and wants to find a tour--an order
in which to visit all the other cities and return home. His goal is to find a tour
that causes him to travel as little total distance as possible.
To formalize this, we will take a very general notion of distance: for each
ordered pair of cities (vi, vj), we will specify a nonnegative number d(vi, vj)
as the distance from vi to vj. We will not require the distance to be symmetric
(so it may happen that d(vi, vj) ≠ d(vj, vi)), nor will we require it to satisfy
the triangle inequality (so it may happen that d(vi, vj) plus d(vj, vk) is actually
less than the "direct" distance d(vi, vk)). The reason for this is to make our
formulation as general as possible. Indeed, Traveling Salesman arises naturally
in many applications where the points are not cities and the traveler is not a
salesman. For example, people have used Traveling Salesman formulations for
problems such as planning the most efficient motion of a robotic arm that drills
holes in n points on the surface of a VLSI chip; or for serving I/O requests on
a disk; or for sequencing the execution of n software modules to minimize the
context-switching time.
Thus, given the set of distances, we ask: Order the cities into a tour
vi1, vi2, ..., vin, with i1 = 1, so as to minimize the total distance
Σ_{j=1}^{n−1} d(vij, vij+1) + d(vin, vi1). The requirement i1 = 1 simply "orients"
the tour so that it starts at the home city, and the terms in the sum simply
give the distance from each city on the tour to the next one. (The last term in
the sum is the distance required for the salesman to return home at the end.)
Here is a decision version of the Traveling Salesman Problem.
Given a set of distances on n cities, and a bound D, is there a tour of length
at most D?

The Hamiltonian Cycle Problem
The Traveling Salesman Problem has a natural graph-based analogue, which
forms one of the fundamental problems in graph theory. Given a directed graph
G = (V, E), we say that a cycle C in G is a Hamiltonian cycle if it visits each
vertex exactly once. In other words, it constitutes a "tour" of all the vertices,
with no repetitions. (Figure 8.6 A directed graph containing a Hamiltonian
cycle.) For example, the directed graph pictured in Figure 8.6 has
several Hamiltonian cycles; one visits the nodes in the order 1, 6, 4, 3, 2, 5, 1,
while another visits the nodes in the order 1, 2, 4, 5, 6, 3, 1.
The Hamiltonian Cycle Problem is then simply the following:
Given a directed graph G, does it contain a Hamiltonian cycle?

Proving Hamiltonian Cycle is NP-Complete
We now show that both these problems are NP-complete. We do this by first
establishing the NP-completeness of Hamiltonian Cycle, and then proceeding
to reduce from Hamiltonian Cycle to Traveling Salesman.
(8.17) Hamiltonian Cycle is NP-complete.
Proof. We first show that Hamiltonian Cycle is in NP. Given a directed graph
G = (V, E), a certificate that there is a solution would be the ordered list of
the vertices on a Hamiltonian cycle. We could then check, in polynomial time,
that this list of vertices does contain each vertex exactly once, and that each
consecutive pair in the ordering is joined by an edge; this would establish that
the ordering defines a Hamiltonian cycle.
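The polynomial-time check described in this first part of the proof is
straightforward to write down. Here is a minimal Python sketch, assuming the
graph is given as a set of vertices and a set of directed edges, and the certificate
is the ordered list of vertices on the claimed cycle.

    def check_hamiltonian_cycle_certificate(vertices, edges, order):
        """Return True if `order` lists every vertex exactly once and each
        consecutive pair (including the wrap-around back to the start)
        is joined by a directed edge."""
        if len(order) != len(vertices) or set(order) != set(vertices):
            return False
        n = len(order)
        return all((order[i], order[(i + 1) % n]) in edges for i in range(n))

    # Example for the graph of Figure 8.6 (edge set E assumed for illustration):
    # check_hamiltonian_cycle_certificate({1, 2, 3, 4, 5, 6}, E, [1, 6, 4, 3, 2, 5])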
We now show that 3-SAT ≤p Hamiltonian Cycle. Why are we reducing
from 3-SAT? Essentially, faced with Hamiltonian Cycle, we really have no idea
what to reduce from; it's sufficiently different from all the problems we've
seen so far that there's no real basis for choosing. In such a situation, one
strategy is to go back to 3-SAT, since its combinatorial structure is very basic.
Of course, this strategy guarantees at least a certain level of complexity in the
reduction, since we need to encode variables and clauses in the language of
graphs.
So consider an arbitrary instance of 3-SAT, with variables x1, ..., xn and
clauses C1, ..., Ck. We must show how to solve it, given the ability to detect
Hamiltonian cycles in directed graphs. As always, it helps to focus on the
essential ingredients of 3-SAT: We can set the values of the variables however
we want, and we are given three chances to satisfy each clause.
We begin by describing a graph that contains 2^n different Hamiltonian
cycles that correspond very naturally to the 2^n possible truth assignments to
the variables. After this, we will add nodes to model the constraints imposed
by the clauses.
We construct n paths P1, ..., Pn, where Pi consists of nodes vi1, vi2, ..., vib
for a quantity b that we take to be somewhat larger than the number of clauses
k; say, b = 3k + 3. There are edges from vij to vi,j+1 and in the other direction
from vi,j+1 to vij. Thus Pi can be traversed "left to right," from vi1 to vib, or
"right to left," from vib to vi1.
(Figure 8.7 The reduction from 3-SAT to Hamiltonian Cycle: part 1.)
We hook these paths together as follows. For each i = 1, 2, ..., n − 1, we
define edges from vi1 to vi+1,1 and to vi+1,b. We also define edges from vib to
vi+1,1 and to vi+1,b. We add two extra nodes s and t; we define edges from s
to v11 and v1b; from vn1 and vnb to t; and from t to s.
The construction up to this point is pictured in Figure 8.7. It's important
to pause here and consider what the Hamiltonian cycles in our graph look like.
Since only one edge leaves t, we know that any Hamiltonian cycle C must use
the edge (t, s). After entering s, the cycle C can then traverse P1 either left to
right or right to left; regardless of what it does here, it can then traverse P2
either left to right or right to left; and so forth, until it finishes traversing Pn
and enters t. In other words, there are exactly 2^n different Hamiltonian cycles,
and they correspond to the n independent choices of how to traverse each Pi.
This naturally models the n independent choices of how to set each of the
variables x1, ..., xn in the 3-SAT instance. Thus we will identify each Hamiltonian
cycle uniquely with a truth assignment as follows: If C traverses Pi left to right,
then xi is set to 1; otherwise, xi is set to 0.
Now we add nodes to model the clauses; the 3-SAT instance will turn out
to be satisfiable if and only if any Hamiltonian cycle survives. Let's consider,
as a concrete example, a clause
C1 = x1 ∨ x̄2 ∨ x3.
In the language of Hamiltonian cycles, this clause says, "The cycle should
traverse P1 left to right; or it should traverse P2 right to left; or it should traverse
P3 left to right." So we add a node c1, as in Figure 8.8, that does just this. (Note
that certain edges have been eliminated from this drawing, for the sake of
clarity.) For some value of ℓ, node c1 will have edges from v1ℓ, v2,ℓ+1, and
v3ℓ; it will have edges to v1,ℓ+1, v2,ℓ, and v3,ℓ+1. Thus it can be easily spliced
into any Hamiltonian cycle that traverses P1 left to right by visiting node c1
between v1ℓ and v1,ℓ+1; similarly, c1 can be spliced into any Hamiltonian cycle
that traverses P2 right to left, or P3 left to right. It cannot be spliced into a
Hamiltonian cycle that does not do any of these things.
More generally, we will define a node cj for each clause Cj. We will reserve
node positions 3j and 3j + 1 in each path Pi for variables that participate in
clause Cj. Suppose clause Cj contains a term t. Then if t = xi, we will add
edges (vi,3j, cj) and (cj, vi,3j+1); if t = x̄i, we will add edges (vi,3j+1, cj) and
(cj, vi,3j).
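The construction described so far can be written out mechanically. The following
is a minimal Python sketch; it assumes the same clause encoding as before
(literal i for xi, -i for its negation) and simply emits the node and edge sets of
the graph G, with path nodes written as pairs (i, j) and clause nodes as strings.

    def three_sat_to_hamiltonian_cycle(num_vars, clauses):
        n, k = num_vars, len(clauses)
        b = 3 * k + 3                       # length of each path P_i
        edges = set()
        # paths P_1, ..., P_n, traversable in either direction
        for i in range(1, n + 1):
            for j in range(1, b):
                edges.add(((i, j), (i, j + 1)))
                edges.add(((i, j + 1), (i, j)))
        # hook consecutive paths together at their endpoints
        for i in range(1, n):
            for end in (1, b):
                edges.add(((i, end), (i + 1, 1)))
                edges.add(((i, end), (i + 1, b)))
        # source s, sink t, and the edge (t, s) closing the cycle
        edges.update({('s', (1, 1)), ('s', (1, b)),
                      ((n, 1), 't'), ((n, b), 't'), ('t', 's')})
        # one node c_j per clause, spliced in at positions 3j and 3j + 1
        for j, clause in enumerate(clauses, start=1):
            cj = 'c%d' % j
            for lit in clause:
                i = abs(lit)
                if lit > 0:                 # term x_i: usable left-to-right
                    edges.add(((i, 3 * j), cj))
                    edges.add((cj, (i, 3 * j + 1)))
                else:                       # term not x_i: usable right-to-left
                    edges.add(((i, 3 * j + 1), cj))
                    edges.add((cj, (i, 3 * j)))
        nodes = ({'s', 't'}
                 | {(i, j) for i in range(1, n + 1) for j in range(1, b + 1)}
                 | {'c%d' % j for j in range(1, k + 1)})
        return nodes, edges

Feeding the resulting graph to a Hamiltonian Cycle black box then decides
satisfiability of the 3-SAT instance, exactly as the proof argues.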
This completes the construction of the graph G. Now, following our
generic outline for NP-completeness proofs, we claim that the 3-SAT instance
is satisfiable if and only if G has a Hamiltonian cycle.
First suppose there is a satisfying assignment for the 3-SAT instance. Then
we define a Hamiltonian cycle following our informal plan above. If xi is
assigned 1 in the satisfying assignment, then we traverse the path Pi left to
right; otherwise we traverse Pi right to left. For each clause Cj, since it is
satisfied by the assignment, there will be at least one path Pi in which we will
be going in the "correct" direction relative to the node cj, and we can splice it
into the tour there via edges incident on vi,3j and vi,3j+1.
Conversely, suppose that there is a Hamiltonian cycle C in G. The crucial
thing to observe is the following. If C enters a node cj on an edge from vi,3j,
it must depart on an edge to vi,3j+1. For if not, then vi,3j+1 will have only one
unvisited neighbor left, namely, vi,3j+2, and so the tour will not be able to
visit this node and still maintain the Hamiltonian property. Symmetrically, if it
enters from vi,3j+1, it must depart immediately to vi,3j. Thus, for each node cj,
Having established that the 3-SAT instance is satisfiable if and only if G
has a Hamiltonian cycle, our proof is complete. ■
(Figure 8.8: c1 can only be visited if the cycle traverses some path in the
correct direction.)

Proving Traveling Salesman is NP-Complete
Armed with our basic hardness result for Hamiltonian Cycle, we can move on
to show the hardness of Traveling Salesman.
As in other 3-SAT reductions, let's consider a clause like x1 ∨ x̄2 ∨ x3. In
the language of 3-colorings of G, it says, "At least one of the nodes v1, v̄2, or
v3 should get the True color." So what we need is a little subgraph that we can
plug into G, so that any 3-coloring that extends into this subgraph must have
the property of assigning the True color to at least one of v1, v̄2, or v3. It takes
some experimentation to find such a subgraph, but one that works is depicted
in Figure 8.12.
(Figure caption: "... in the first place. The goal is to have a node like the topmost
one that cannot receive any color. So we start by 'plugging in' three nodes
corresponding to the terms, all colored False, at the bottom. For each one, we
then work upward, pairing it off with a node of a known color to force the node
above to have the third color. Proceeding in this way, we can arrive at a node
that is forced to have any color we want. So we force each of the three different
colors, starting from each of the three different terms, and then we plug all three
of these differently colored nodes into our topmost node, arriving at the
impossibility.")
We now claim that the given 3-SAT instance is satisfiable if and only if G'
has a 3-coloring. First, suppose that there is a satisfying assignment for the
3-SAT instance. We define a coloring of G' by first coloring Base, True, and
False arbitrarily with the three colors, then, for each i, assigning vi the True
color if xi = 1 and the False color if xi = 0. We then assign v̄i the only available
color. Finally, as argued above, it is now possible to extend this 3-coloring into
each six-node clause subgraph, resulting in a 3-coloring of all of G'.
Conversely, suppose G' has a 3-coloring. In this coloring, each node vi
is assigned either the True color or the False color; we set the variable xi
correspondingly. Now we claim that in each clause of the 3-SAT instance, at
least one of the terms in the clause has the truth value 1. For if not, then all
three of the corresponding nodes have the False color in the 3-coloring of G'
and, as we have seen above, there is no 3-coloring of the corresponding clause
subgraph consistent with this--a contradiction. ■
When k > 3, it is very easy to reduce the 3-Coloring Problem to k-Coloring.
Essentially, all we do is to take an instance of 3-Coloring, represented by a
graph G, add k − 3 new nodes, and join these new nodes to each other and to
every node in G. The resulting graph is k-colorable if and only if the original
graph G is 3-colorable. Thus k-Coloring for any k > 3 is NP-complete as well.

Coda: The Resolution of the Four-Color Conjecture
To conclude this section, we should finish off the story of the Four-Color
Conjecture for maps in the plane as well. After more than a hundred years,
the conjecture was finally proved by Appel and Haken in 1976. The structure
of the proof was a simple induction on the number of regions, but the
induction step involved nearly two thousand fairly complicated cases, and
the verification of these cases had to be carried out by a computer. This was
not a satisfying outcome for most mathematicians: Hoping for a proof that
would yield some insight into why the result was true, they instead got a case
analysis of enormous complexity whose proof could not be checked by hand.
The problem of finding a reasonably short, human-readable proof still remains
open.

8.8 Numerical Problems
We now consider some computationally hard problems that involve arithmetic
operations on numbers. We will see that the intractability here comes from the
way in which some of the problems we have seen earlier in the chapter can
be encoded in the representations of very large integers.

The Subset Sum Problem
Our basic problem in this genre will be Subset Sum, a special case of the
Knapsack Problem that we saw before in Section 6.4 when we covered dynamic
programming. We can formulate a decision version of this problem as follows.
Given natural numbers w1, ..., wn, and a target number W, is there a
subset of {w1, ..., wn} that adds up to precisely W?
We have already seen an algorithm to solve this problem; why are we now
including it on our list of computationally hard problems? This goes back to an
issue that we raised the first time we considered Subset Sum in Section 6.4. The
algorithm we developed there has running time O(nW), which is reasonable
when W is small, but becomes hopelessly impractical as W (and the numbers
wi) grow large. Consider, for example, an instance with 100 numbers, each of
which is 100 bits long. Then the input is only 100 × 100 = 10,000 digits, but
W is now roughly 2^100.
To phrase this more generally, since integers will typically be given in bit
representation, or base-10 representation, the quantity W is really exponential
in the size of the input; our algorithm was not a polynomial-time algorithm.
(We referred to it as pseudo-polynomial, to indicate that it ran in time polyno-
mial in the magnitude of the input numbers, but not polynomial in the size of
their representation.)
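For comparison with the hardness result that follows, here is a hedged sketch of
the pseudo-polynomial dynamic program referred to above: a standard O(nW)
table-filling routine in Python that decides whether some subset of the given
numbers sums to exactly W. (This is one common way to organize the
computation; the formulation in Section 6.4 is equivalent in spirit.)

    def subset_sum_decision(w, W):
        """reachable[t] is True when some subset of the numbers seen so far
        sums to exactly t; the table has W + 1 entries and is updated once
        per number, giving O(nW) time overall."""
        reachable = [False] * (W + 1)
        reachable[0] = True
        for x in w:
            for t in range(W, x - 1, -1):   # downward so each number is used once
                if reachable[t - x]:
                    reachable[t] = True
        return reachable[W]

    # subset_sum_decision([2, 7, 9, 11], 18) -> True   (7 + 11)
    # subset_sum_decision([2, 7, 9, 11], 5)  -> False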
This is an issue that comes up in many settings; for example, we encoun-
tered it in the context of network flow algorithms, where the capacities had
Coda: The Resolution of the Four-Color Conjecture integer values. Other settings may be familiar to you as well. For example, the
To conclude this section, we should finish off the story of the Four-Color security of a cryptosystem such as RSA is motivated by the sense that ~actoring
Conjecture for maps in the plane as well. After more than a hundred years, a 1,000-bit number is difficult. But if we considered a running time of 2l°°°
the conjecture was finally proved by Appel and Haken in 1976. The structure steps feasible, factoring such a number would not be difficult at all.
of the proof was a simple induction on the number of regions, but the It is worth pausing here for a moment and asking: Is this notion of
induction step involved nearly two thousand fairly complicated cases, and polynomial time for numerical operations too severe a restriction? For example,
the verification of these cases had to be carried out by a computer. This was given two natural numbers w1 and w2 represented in base-d notation for some
not a satisfying outcome for most mathematicians: Hoping for a proof that d > 1, how long does it take to add, subtract, or multiply them? This is an
would yield some insight into why the result was true, they instead got a case issue we touched on in Section 5.5, where we noted that the standard ways
analysis of enormous complexity whose proof could not be checked by hand. that kids in elementary school learn to perform these operations have (low-
The problem of finding a reasonably short, human-readable proof still remains degree) polynomial running times. Addition and subtraction (with carries) take
open. O(log Wl + log w2) time, while the standard multiplication algorithm runs in
O(log Wl- log w2) time. (Recall that in Section 5.5 we discussed the design of an
asymptotically faster multiplication algorithm that elementary schoolchildren
8.8 Numerical Problems are unlikely to invent on their own.)
We now consider some computationailY hard problems that involve arithmetic So a basic question is: Can Subset Sum be solved by a (genuinely)
operations on numbers. We will see that the intractability here comes from the polynomial-time algorithm? In other words, could there be an algorithm with
way in which some of the problems we have seen earlier in the chapter can running time polynomial in n and log W? Or polynomial in n alone?
be encoded in the representations of very large integers.
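To make the pseudo-polynomial behavior concrete, here is a minimal sketch, in Python, of a table-based algorithm along the lines of the O(nW) dynamic program discussed above. The function name and the details of the table layout are illustrative choices, not taken from the text; the point is only that the table has W + 1 entries, so the work grows with the magnitude of W rather than with the number of bits needed to write W down.

    def subset_sum_exists(w, W):
        """Decide whether some subset of the numbers in w sums to exactly W.

        Runs in O(n * W) time and space: pseudo-polynomial, because the
        table size grows with the magnitude of W, not with log W.
        """
        reachable = [False] * (W + 1)
        reachable[0] = True              # the empty subset sums to 0
        for wi in w:
            # iterate downward so each number is used at most once
            for target in range(W, wi - 1, -1):
                if reachable[target - wi]:
                    reachable[target] = True
        return reachable[W]

    # Example: is there a subset of {2, 7, 9, 14} summing to 16?  (Yes: 2 + 14.)
    print(subset_sum_exists([2, 7, 9, 14], 16))   # True

Doubling the number of bits in W squares the magnitude of W, and hence roughly squares the size of the table this sketch builds; this is exactly the sense in which the algorithm is exponential in the input size.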
Proving Subset Sum Is NP-Complete
The following result suggests that this is not likely to be the case.

(8.23) Subset Sum is NP-complete.

Proof. We first show that Subset Sum is in NP. Given natural numbers w1, ..., wn, and a target W, a certificate that there is a solution would be the subset wi1, ..., wik that is purported to add up to W. In polynomial time, we can compute the sum of these numbers and verify that it is equal to W.

We now reduce a known NP-complete problem to Subset Sum. Since we are seeking a set that adds up to exactly a given quantity (as opposed to being bounded above or below by this quantity), we look for a combinatorial problem that is based on meeting an exact bound. The 3-Dimensional Matching Problem is a natural choice; we show that 3-Dimensional Matching ≤P Subset Sum. The trick will be to encode the manipulation of sets via the addition of integers.

So consider an instance of 3-Dimensional Matching specified by sets X, Y, Z, each of size n, and a set of m triples T ⊆ X × Y × Z. A common way to represent sets is via bit-vectors: Each entry in the vector corresponds to a different element, and it holds a 1 if and only if the set contains that element. We adopt this type of approach for representing each triple t = (xi, yj, zk) ∈ T: we construct a number wt with 3n digits that has a 1 in positions i, n + j, and 2n + k, and a 0 in all other positions. In other words, for some base d > 1,

wt = d^(i-1) + d^(n+j-1) + d^(2n+k-1).

Note how taking the union of triples almost corresponds to integer addition: The 1s fill in the places where there is an element in any of the sets. But we say almost because addition includes carries: too many 1s in the same column will "roll over" and produce a nonzero entry in the next column. This has no analogue in the context of the union operation.

In the present situation, we handle this problem by a simple trick. We have only m numbers in all, and each has digits equal to 0 or 1; so if we assume that our numbers are written in base d = m + 1, then there will be no carries at all. Thus we construct the following instance of Subset Sum. For each triple t = (xi, yj, zk) ∈ T, we construct a number wt in base m + 1 as defined above. We define W to be the number in base m + 1 with 3n digits, each of which is equal to 1; that is, W = Σ_{i=0}^{3n-1} (m + 1)^i.

We claim that the set T of triples contains a perfect three-dimensional matching if and only if there is a subset of the numbers {wt} that adds up to W. For suppose there is a perfect three-dimensional matching consisting of triples t1, ..., tn. Then in the sum wt1 + ... + wtn, there is a single 1 in each of the 3n digit positions, and so the result is equal to W.

Conversely, suppose there exists a set of numbers wt1, ..., wtk that adds up to W. Then since each wti has three 1s in its representation, and there are no carries, we know that k = n. It follows that for each of the 3n digit positions, exactly one of the wti has a 1 in that position. Thus, t1, ..., tk constitute a perfect three-dimensional matching. ■
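For concreteness, here is a short sketch of the encoding used in the proof, written in Python. The identification of X, Y, and Z with {0, ..., n-1} and the function and variable names are choices made for this sketch, not part of the text; the arithmetic is exactly the base-(m + 1) construction above.

    def encode_3dm_as_subset_sum(n, triples):
        """Encode a 3-Dimensional Matching instance as a Subset Sum instance.

        X, Y, Z are identified with {0, ..., n-1}; `triples` is a list of
        (i, j, k) tuples.  Each triple becomes a number with 3n digits in
        base m + 1, carrying a 1 in digit positions i, n + j, and 2n + k.
        The target W has a 1 in every one of the 3n positions, so a subset
        of the numbers sums to W exactly when the corresponding triples
        form a perfect three-dimensional matching.
        """
        m = len(triples)
        base = m + 1                      # large enough that no carries occur
        numbers = [base**i + base**(n + j) + base**(2 * n + k)
                   for (i, j, k) in triples]
        W = sum(base**d for d in range(3 * n))
        return numbers, W

    # Tiny check: n = 2 with triples (0,0,0) and (1,1,1) gives numbers 91 and 273
    # in base 3, and their sum 364 equals the target W.
    print(encode_3dm_as_subset_sum(2, [(0, 0, 0), (1, 1, 1)]))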
Extensions: The Hardness of Certain Scheduling Problems
The hardness of Subset Sum can be used to establish the hardness of a range of scheduling problems--including some that do not obviously involve the addition of numbers. Here is a nice example, a natural (but much harder) generalization of a scheduling problem we solved in Section 4.2 using a greedy algorithm.

Suppose we are given a set of n jobs that must be run on a single machine. Each job i has a release time ri when it is first available for processing; a deadline di by which it must be completed; and a processing duration ti. We will assume that all of these parameters are natural numbers. In order to be completed, job i must be allocated a contiguous slot of ti time units somewhere in the interval [ri, di]. The machine can run only one job at a time. The question is: Can we schedule all jobs so that each completes by its deadline? We will call this an instance of Scheduling with Release Times and Deadlines.

(8.24) Scheduling with Release Times and Deadlines is NP-complete.

Proof. Given an instance of the problem, a certificate that it is solvable would be a specification of the starting time for each job. We could then check that each job runs for a distinct interval of time, between its release time and deadline. Thus the problem is in NP.

We now show that Subset Sum is reducible to this scheduling problem. Thus, consider an instance of Subset Sum with numbers w1, ..., wn and a target W. In constructing an equivalent scheduling instance, one is struck initially by the fact that we have so many parameters to manage: release times, deadlines, and durations. The key is to sacrifice most of this flexibility, producing a "skeletal" instance of the problem that still encodes the Subset Sum Problem.

Let S = Σi wi. We define jobs 1, 2, ..., n; job i has a release time of 0, a deadline of S + 1, and a duration of wi. For this set of jobs, we have the freedom to arrange them in any order, and they will all finish on time.

We now further constrain the instance so that the only way to solve it will be to group together a subset of the jobs whose durations add up precisely to W. To do this, we define an (n + 1)st job; it has a release time of W, a deadline of W + 1, and a duration of 1.

Now consider any feasible solution to this instance of the scheduling problem. The (n + 1)st job must be run in the interval [W, W + 1]. This leaves S available time units between the common release time and the common deadline; and there are S time units worth of jobs to run. Thus the machine must not have any idle time, when no jobs are running. In particular, if jobs i1, ..., ik are the ones that run before time W, then the corresponding numbers wi1, ..., wik in the Subset Sum instance add up to exactly W.

Conversely, if there are numbers wi1, ..., wik that add up to exactly W, then we can schedule these before job n + 1 and the remainder after job n + 1; this is a feasible solution to the scheduling instance. ■

Caveat: Subset Sum with Polynomially Bounded Numbers
There is a very common source of pitfalls involving the Subset Sum Problem, and while it is closely connected to the issues we have been discussing already, we feel it is worth discussing explicitly. The pitfall is the following.

Consider the special case of Subset Sum, with n input numbers, in which W is bounded by a polynomial function of n. Assuming P ≠ NP, this special case is not NP-complete.

It is not NP-complete for the simple reason that it can be solved in time O(nW), by our dynamic programming algorithm from Section 6.4; when W is bounded by a polynomial function of n, this is a polynomial-time algorithm.

All this is very clear; so you may ask: Why dwell on this? The reason is that there is a genre of problem that is often wrongly claimed to be NP-complete (even in published papers) via reduction from this special case of Subset Sum. Here is a basic example of such a problem, which we will call Component Grouping.

Given a graph G that is not connected, and a number k, does there exist a subset of its connected components whose union has size exactly k?

Incorrect Claim. Component Grouping is NP-complete.

Incorrect Proof. Component Grouping is in NP, and we'll skip the proof of this. We now attempt to show that Subset Sum ≤P Component Grouping. Given an instance of Subset Sum with numbers w1, ..., wn and target W, we construct an instance of Component Grouping as follows. For each i, we construct a path Pi of length wi. The graph G will be the union of the paths P1, ..., Pn, each of which is a separate connected component. We set k = W. It is clear that G has a set of connected components whose union has size k if and only if some subset of the numbers w1, ..., wn adds up to W. ■

The error here is subtle; in particular, the claim in the last sentence is correct. The problem is that the construction described above does not establish that Subset Sum ≤P Component Grouping, because it requires more than polynomial time. In constructing the input to our black box that solves Component Grouping, we had to build the encoding of a graph of size w1 + ... + wn, and this takes time exponential in the size of the input to the Subset Sum instance. In effect, Subset Sum works with the numbers w1, ..., wn in a very compact representation, but Component Grouping does not accept "compact" encodings of graphs.

The problem is more fundamental than the incorrectness of this proof; in fact, Component Grouping is a problem that can be solved in polynomial time. If n1, n2, ..., nc denote the sizes of the connected components of G, we simply use our dynamic programming algorithm for Subset Sum to decide whether some subset of these numbers {ni} adds up to k. The running time required for this is O(ck); and since c and k are both bounded by n, this is O(n^2) time. Thus we have discovered a new polynomial-time algorithm by reducing in the other direction, to a polynomial-time solvable special case of Subset Sum.
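The polynomial-time algorithm just described is short enough to sketch directly. The adjacency-list representation and the function names below are assumptions made for this sketch, not part of the text; the algorithm itself is the one above: compute component sizes, then run the O(ck) Subset Sum dynamic program on those sizes.

    from collections import deque

    def component_sizes(adj):
        """Sizes of the connected components of an undirected graph.

        `adj` maps each node to an iterable of its neighbors (an assumed
        adjacency-list representation).
        """
        seen, sizes = set(), []
        for start in adj:
            if start in seen:
                continue
            seen.add(start)
            size, queue = 0, deque([start])
            while queue:
                u = queue.popleft()
                size += 1
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        queue.append(v)
            sizes.append(size)
        return sizes

    def component_grouping(adj, k):
        """Is there a set of components whose sizes add up to exactly k?"""
        reachable = [False] * (k + 1)
        reachable[0] = True
        for s in component_sizes(adj):
            for target in range(k, s - 1, -1):    # each component used at most once
                if reachable[target - s]:
                    reachable[target] = True
        return reachable[k]

Since the number of components and the target k are both at most n, the table has at most n + 1 entries and the whole computation is polynomial in the size of the graph, in contrast to the exponential blow-up in the incorrect reduction.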
8.9 Co-NP and the Asymmetry of NP
As a further perspective on this general class of problems, let's return to the definitions underlying the class NP. We've seen that the notion of an efficient certifier doesn't suggest a concrete algorithm for actually solving the problem that's better than brute-force search.

Now here's another observation: The definition of efficient certification, and hence of NP, is fundamentally asymmetric. An input string s is a "yes" instance if and only if there exists a short t so that B(s, t) = yes. Negating this statement, we see that an input string s is a "no" instance if and only if for all short t, it's the case that B(s, t) = no.

This relates closely to our intuition about NP: When we have a "yes" instance, we can provide a short proof of this fact. But when we have a "no" instance, no correspondingly short proof is guaranteed by the definition; the answer is no simply because there is no string that will serve as a proof. In concrete terms, recall our question from Section 8.3: Given an unsatisfiable set of clauses, what evidence could we show to quickly convince you that there is no satisfying assignment?

For every problem X, there is a natural complementary problem X̄: For all input strings s, we say s ∈ X̄ if and only if s ∉ X. Note that if X ∈ P, then X̄ ∈ P, since from an algorithm A that solves X, we can simply produce an algorithm Ā that runs A and then flips its answer.

But it is far from clear that if X ∈ NP, it should follow that X̄ ∈ NP. The problem X̄, rather, has a different property: for all s, we have s ∈ X̄ if and only if for all t of length at most p(|s|), B(s, t) = no. This is a fundamentally different definition, and it can't be worked around by simply "inverting" the output of the efficient certifier B to produce B̄. The problem is that the "there exists t" in the definition of X has become a "for all t," and this is a serious change.

There is a class of problems parallel to NP that is designed to model this issue; it is called, naturally enough, co-NP. A problem X belongs to co-NP if and only if the complementary problem X̄ belongs to NP. We do not know for sure that NP and co-NP are different; we can only ask

(8.25) Does NP = co-NP?

Again, the widespread belief is that NP ≠ co-NP: Just because the "yes" instances of a problem have short proofs, it is not clear why we should believe that the "no" instances have short proofs as well.

Proving NP ≠ co-NP would be an even bigger step than proving P ≠ NP, for the following reason: if P = NP, then P would be closed under complementation, so NP would be as well, and we would have NP = co-NP; hence NP ≠ co-NP implies P ≠ NP.

Good Characterizations: The Class NP ∩ co-NP
If a problem X belongs to both NP and co-NP, then a "yes" answer has a short certificate and a "no" answer has a short certificate as well. Problems in NP ∩ co-NP are said to have a good characterization, since there is always a nice certificate for the solution.

This notion corresponds directly to some of the results we have seen earlier. For example, consider the problem of determining whether a flow network contains a flow of value at least v, for some quantity v. To prove that the answer is yes, we could simply exhibit a flow that achieves this value; this is consistent with the problem belonging to NP. But we can also prove the answer is no: We can exhibit a cut whose capacity is strictly less than v. This duality between "yes" and "no" instances is the crux of the Max-Flow Min-Cut Theorem.

Similarly, Hall's Theorem for matchings from Section 7.5 proved that the Bipartite Perfect Matching Problem is in NP ∩ co-NP: We can exhibit either a perfect matching, or a set of vertices A ⊆ X such that the total number of neighbors of A is strictly less than |A|.

Now, if a problem X is in P, then it belongs to both NP and co-NP; thus, P ⊆ NP ∩ co-NP. Interestingly, both our proof of the Max-Flow Min-Cut Theorem and our proof of Hall's Theorem came hand in hand with proofs of the stronger results that Maximum Flow and Bipartite Matching are problems in P. Nevertheless, the good characterizations themselves are so clean that formulating them separately still gives us a lot of conceptual leverage in reasoning about these problems.

Naturally, one would like to know whether there's a problem that has a good characterization but no polynomial-time algorithm. But this too is an open question:
Exercises

one counselor who's skilled at each of the n sports covered by the camp (baseball, volleyball, and so on). They have received job applications from m potential counselors. For each of the n sports, there is some subset of the m applicants qualified in that sport. The question is: For a given number k < m, is it possible to hire at most k of the counselors and have at least one counselor qualified in each of the n sports? We'll call this the Efficient Recruiting Problem.
Show that Efficient Recruiting is NP-complete.

4. Suppose you're consulting for a group that manages a high-performance real-time system in which asynchronous processes make use of shared resources. Thus the system has a set of n processes and a set of m resources. At any given point in time, each process specifies a set of resources that it requests to use. Each resource might be requested by many processes at once; but it can only be used by a single process at a time. Your job is to allocate resources to processes that request them. If a process is allocated all the resources it requests, then it is active; otherwise it is blocked. You want to perform the allocation so that as many processes as possible are active. Thus we phrase the Resource Reservation Problem as follows: Given a set of processes and resources, the set of requested resources for each process, and a number k, is it possible to allocate resources to processes so that at least k processes will be active?
Consider the following list of problems, and for each problem either give a polynomial-time algorithm or prove that the problem is NP-complete.
(a) The general Resource Reservation Problem defined above.
(b) The special case of the problem when k = 2.
(c) The special case of the problem when there are two types of resources--say, people and equipment--and each process requires at most one resource of each type. (In other words, each process requires one specific person and one specific piece of equipment.)
(d) The special case of the problem when each resource is requested by at most two processes.

5. Consider a set A = {a1, ..., an} and a collection B1, B2, ..., Bm of subsets of A (i.e., Bi ⊆ A for each i).
We say that a set H ⊆ A is a hitting set for the collection B1, B2, ..., Bm if H contains at least one element from each Bi--that is, if H ∩ Bi is not empty for each i (so H "hits" all the sets Bi).
We now define the Hitting Set Problem as follows. We are given a set A = {a1, ..., an}, a collection B1, B2, ..., Bm of subsets of A, and a number k. We are asked: Is there a hitting set H ⊆ A for B1, B2, ..., Bm so that the size of H is at most k?
Prove that Hitting Set is NP-complete.

6. Consider an instance of the Satisfiability Problem, specified by clauses C1, ..., Ck over a set of Boolean variables x1, ..., xn. We say that the instance is monotone if each term in each clause consists of a nonnegated variable; that is, each term is equal to xi, for some i, rather than x̄i. Monotone instances of Satisfiability are very easy to solve: They are always satisfiable, by setting each variable equal to 1.
For example, suppose we have the three clauses
(x1 ∨ x2), (x1 ∨ x3), (x2 ∨ x3).
This is monotone, and indeed the assignment that sets all three variables to 1 satisfies all the clauses. But we can observe that this is not the only satisfying assignment; we could also have set x1 and x2 to 1, and x3 to 0. Indeed, for any monotone instance, it is natural to ask how few variables we need to set to 1 in order to satisfy it.
Given a monotone instance of Satisfiability, together with a number k, the problem of Monotone Satisfiability with Few True Variables asks: Is there a satisfying assignment for the instance in which at most k variables are set to 1? Prove this problem is NP-complete.

7. Since the 3-Dimensional Matching Problem is NP-complete, it is natural to expect that the corresponding 4-Dimensional Matching Problem is at least as hard. Let us define 4-Dimensional Matching as follows. Given sets W, X, Y, and Z, each of size n, and a collection C of ordered 4-tuples of the form (wi, xj, yk, zl), do there exist n 4-tuples from C so that no two have an element in common?
Prove that 4-Dimensional Matching is NP-complete.

8. Your friends' preschool-age daughter Madison has recently learned to spell some simple words. To help encourage this, her parents got her a colorful set of refrigerator magnets featuring the letters of the alphabet (some number of copies of the letter A, some number of copies of the letter B, and so on), and the last time you saw her the two of you spent a while arranging the magnets to spell out words that she knows.
Somehow with you and Madison, things always end up getting more elaborate than originally planned, and soon the two of you were trying to spell out words so as to use up all the magnets in the full set--that is, picking words that she knows how to spell, so that once they were all spelled out, each magnet was participating in the spelling of exactly one
of the words. (Multiple copies of words are okay here; so for example, if the set of refrigerator magnets includes two copies each of C, A, and T, it would be okay to spell out CAT twice.)
This turned out to be pretty difficult, and it was only later that you realized a plausible reason for this. Suppose we consider a general version of the problem of Using Up All the Refrigerator Magnets, where we replace the English alphabet by an arbitrary collection of symbols, and we model Madison's vocabulary as an arbitrary set of strings over this collection of symbols. The goal is the same as in the previous paragraph.
Prove that the problem of Using Up All the Refrigerator Magnets is NP-complete.

9. Consider the following problem. You are managing a communication network, modeled by a directed graph G = (V, E). There are c users who are interested in making use of this network. User i (for each i = 1, 2, ..., c) issues a request to reserve a specific path Pi in G on which to transmit data.
You are interested in accepting as many of these path requests as possible, subject to the following restriction: if you accept both Pi and Pj, then Pi and Pj cannot share any nodes.
Thus, the Path Selection Problem asks: Given a directed graph G = (V, E), a set of requests P1, P2, ..., Pc--each of which must be a path in G--and a number k, is it possible to select at least k of the paths so that no two of the selected paths share any nodes?
Prove that Path Selection is NP-complete.

10. Your friends at WebExodus have recently been doing some consulting work for companies that maintain large, publicly accessible Web sites--contractual issues prevent them from saying which ones--and they've come across the following Strategic Advertising Problem.
A company comes to them with the map of a Web site, which we'll model as a directed graph G = (V, E). The company also provides a set of t trails typically followed by users of the site; we'll model these trails as directed paths P1, P2, ..., Pt in the graph G (i.e., each Pi is a path in G).
The company wants WebExodus to answer the following question for them: Given G, the paths {Pi}, and a number k, is it possible to place advertisements on at most k of the nodes in G, so that each path Pi includes at least one node containing an advertisement? We'll call this the Strategic Advertising Problem, with input G, {Pi : i = 1, ..., t}, and k.
Your friends figure that a good algorithm for this will make them all rich; unfortunately, things are never quite this simple.
(a) Prove that Strategic Advertising is NP-complete.
(b) Your friends at WebExodus forge ahead and write a pretty fast algorithm B that produces yes/no answers to arbitrary instances of the Strategic Advertising Problem. You may assume that the algorithm B is always correct.
Using the algorithm B as a black box, design an algorithm that takes input G, {Pi}, and k as in part (a), and does one of the following two things:
- Outputs a set of at most k nodes in G so that each path Pi includes at least one of these nodes, or
- Outputs (correctly) that no such set of at most k nodes exists.
Your algorithm should use at most a polynomial number of steps, together with at most a polynomial number of calls to the algorithm B.

11. As some people remember, and many have been told, the idea of hypertext predates the World Wide Web by decades. Even hypertext fiction is a relatively old idea: Rather than being constrained by the linearity of the printed page, you can plot a story that consists of a collection of interlocked virtual "places" joined by virtual "passages."4 So a piece of hypertext fiction is really riding on an underlying directed graph; to be concrete (though narrowing the full range of what the domain can do), we'll model this as follows.

4 See, e.g., http://www.eastgate.com.

Let's view the structure of a piece of hypertext fiction as a directed graph G = (V, E). Each node u ∈ V contains some text; when the reader is currently at u, he or she can choose to follow any edge out of u; and if the reader chooses e = (u, v), he or she arrives next at the node v. There is a start node s ∈ V where the reader begins, and an end node t ∈ V; when the reader first reaches t, the story ends. Thus any path from s to t is a valid plot of the story. Note that, unlike one's experience using a Web browser, there is not necessarily a way to go back; once you've gone from u to v, you might not be able to ever return to u.
In this way, the hypertext structure defines a huge number of different plots on the same underlying content; and the relationships among all these possibilities can grow very intricate. Here's a type of problem one encounters when reasoning about a structure like this. Consider a piece of hypertext fiction built on a graph G = (V, E) in which there are certain crucial thematic elements: love, death, war, an intense desire to major in computer science, and so forth. Each thematic element i is represented by a set Ti ⊆ V consisting of the nodes in G at which this theme
appears. Now, given a particular set of thematic elements, we may ask: Is there a valid plot of the story in which each of these elements is encountered? More concretely, given a directed graph G, with start node s and end node t, and thematic elements represented by sets T1, T2, ..., Tk, the Plot Fulfillment Problem asks: Is there a path from s to t that contains at least one node from each of the sets Ti?
Prove that Plot Fulfillment is NP-complete.

12. Some friends of yours maintain a popular news and discussion site on the Web, and the traffic has reached a level where they want to begin differentiating their visitors into paying and nonpaying customers. A standard way to do this is to make all the content on the site available to customers who pay a monthly subscription fee; meanwhile, visitors who don't subscribe can still view a subset of the pages (all the while being bombarded with ads asking them to become subscribers).
Here are two simple ways to control access for nonsubscribers: You could (1) designate a fixed subset of pages as viewable by nonsubscribers, or (2) allow any page in principle to be viewable, but specify a maximum number of pages that can be viewed by a nonsubscriber in a single session. (We'll assume the site is able to track the path followed by a visitor through the site.)
Your friends are experimenting with a way of restricting access that is different from and more subtle than either of these two options. They want nonsubscribers to be able to sample different sections of the Web site, so they designate certain subsets of the pages as constituting particular zones--for example, there can be a zone for pages on politics, a zone for pages on music, and so forth. It's possible for a page to belong to more than one zone. Now, as a nonsubscribing user passes through the site, the access policy allows him or her to visit one page from each zone, but an attempt by the user to access a second page from the same zone later in the browsing session will be disallowed. (Instead, the user will be directed to an ad suggesting that he or she become a subscriber.)
More formally, we can model the site as a directed graph G = (V, E), in which the nodes represent Web pages and the edges represent directed hyperlinks. There is a distinguished entry node s ∈ V, and there are zones Z1, ..., Zk ⊆ V. A path P taken by a nonsubscriber is restricted to include at most one node from each zone Zi.
One issue with this more complicated access policy is that it gets difficult to answer even basic questions about reachability, including: Is it possible for a nonsubscriber to visit a given node t? More precisely, we define the Evasive Path Problem as follows: Given G, Z1, ..., Zk, s ∈ V, and a destination node t ∈ V, is there an s-t path in G that includes at most one node from each zone Zi? Prove that Evasive Path is NP-complete.

13. A combinatorial auction is a particular mechanism developed by economists for selling a collection of items to a collection of potential buyers. (The Federal Communications Commission has studied this type of auction for assigning stations on the radio spectrum to broadcasting companies.)
Here's a simple type of combinatorial auction. There are n items for sale, labeled I1, ..., In. Each item is indivisible and can only be sold to one person. Now, m different people place bids: The ith bid specifies a subset Si of the items, and an offering price xi that the bidder is willing to pay for the items in the set Si, as a single unit. (We'll represent this bid as the pair (Si, xi).)
An auctioneer now looks at the set of all m bids; she chooses to accept some of these bids and to reject the others. Each person whose bid i is accepted gets to take all the items in the corresponding set Si. Thus the rule is that no two accepted bids can specify sets that contain a common item, since this would involve giving the same item to two different people.
The auctioneer collects the sum of the offering prices of all accepted bids. (Note that this is a "one-shot" auction; there is no opportunity to place further bids.) The auctioneer's goal is to collect as much money as possible.
Thus, the problem of Winner Determination for Combinatorial Auctions asks: Given items I1, ..., In, bids (S1, x1), ..., (Sm, xm), and a bound B, is there a collection of bids that the auctioneer can accept so as to collect an amount of money that is at least B?
Example. Suppose an auctioneer decides to use this method to sell some excess computer equipment. There are four items labeled "PC," "monitor," "printer," and "scanner"; and three people place bids. Define
S1 = {PC, monitor}, S2 = {PC, printer}, S3 = {monitor, printer, scanner}
and
x1 = x2 = x3 = 1.
The bids are (S1, x1), (S2, x2), (S3, x3), and the bound B is equal to 2.
Then the answer to this instance is no: The auctioneer can accept at most one of the bids (since any two bids have a desired item in common), and this results in a total monetary value of only 1.
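The instance in the example above is small enough to check exhaustively. The following Python sketch (the function name and data layout are choices made here, not part of the exercise) enumerates all subsets of bids, keeps only the sets of pairwise item-disjoint bids, and reports the best achievable total. It is of course not an efficient algorithm in general; it simply confirms the answer of no for B = 2.

    from itertools import combinations

    def best_auction_revenue(bids):
        """Maximum total price over all sets of pairwise item-disjoint bids.

        `bids` is a list of (item_set, price) pairs.  Brute force over all
        subsets of bids -- fine for tiny examples, exponential in general.
        """
        best = 0
        for r in range(1, len(bids) + 1):
            for chosen in combinations(bids, r):
                item_sets = [s for s, _ in chosen]
                union = set().union(*item_sets)
                if len(union) == sum(len(s) for s in item_sets):  # pairwise disjoint
                    best = max(best, sum(price for _, price in chosen))
        return best

    bids = [({"PC", "monitor"}, 1),
            ({"PC", "printer"}, 1),
            ({"monitor", "printer", "scanner"}, 1)]
    print(best_auction_revenue(bids))        # 1: no two bids are item-disjoint
    print(best_auction_revenue(bids) >= 2)   # False, matching the example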
Prove that the problem of Winner Determination in Combinatorial Auctions is NP-complete.

14. We've seen the Interval Scheduling Problem in Chapters 1 and 4. [...] Prove that this problem is NP-complete.

[...] a sensor placed at any location in the set Li will not be able to receive signals on any frequency in the set Fi. We say that a subset L' ⊆ L of locations is sufficient if, for each of the n [...]

26. You and a friend have been trekking through various far-off parts of the world and have accumulated a big pile of souvenirs. At the time you weren't really thinking about which of these you were planning to keep and which your friend was going to keep, but now the time has come to divide everything up.
Here's a way you could go about doing this. Suppose there are n objects, labeled 1, ..., n, and object i has an agreed-upon value xi. (We could think of this, for example, as a monetary resale value; the case in which you and your friend don't agree on the value is something we won't pursue here.) One reasonable way to divide things would be to look for a partition of the objects into two sets, so that the total value of the objects in each set is the same.
This suggests solving the following Number Partitioning Problem. You are given positive integers x1, ..., xn; you want to decide whether the numbers can be partitioned into two sets S1 and S2 with the same sum:
Σ_{i ∈ S1} xi = Σ_{j ∈ S2} xj.
Show that Number Partitioning is NP-complete.

27. Consider the following problem. You are given positive integers x1, ..., xn, and numbers k and B. You want to know whether it is possible to partition [...]

29. You're configuring a large network of workstations, which we'll model as an undirected graph G; the nodes of G represent individual workstations and the edges represent direct communication links. The workstations all need access to a common core database, which contains data necessary for basic operating system functions.
You could replicate this database on each workstation; this would make look-ups very fast from any workstation, but you'd have to manage a huge number of copies. Alternately, you could keep a single copy of the database on one workstation and have the remaining workstations issue requests for data over the network G; but this could result in large delays for a workstation that's many hops away from the site of the database.
So you decide to look for the following compromise: You want to maintain a small number of copies, but place them so that any workstation either has a copy of the database or is connected by a direct link to a workstation that has a copy of the database. In graph terminology, such a set of locations is called a dominating set.
Thus we phrase the Dominating Set Problem as follows. Given the network G, and a number k, is there a way to place k copies of the database at k different nodes so that every node either has a copy of the database or is connected by a direct link to a node that has a copy of the database? Show that Dominating Set is NP-complete.

30. One thing that's not always apparent when thinking about traditional "continuous math" problems is the way discrete, combinatorial issues
of the kind we're studying here can creep into what look like standard calculus questions.
Consider, for example, the traditional problem of minimizing a one-variable function like f(x) = 3 + x - 3x^2 over an interval like x ∈ [0, 1]. The derivative has a zero at x = 1/6, but this in fact is a maximum of the function, not a minimum; to get the minimum, one has to heed the standard warning to check the values on the boundary of the interval as well. (The minimum is in fact achieved on the boundary, at x = 1.)
Checking the boundary isn't such a problem when you have a function in one variable; but suppose we're now dealing with the problem of minimizing a function in n variables x1, x2, ..., xn over the unit cube, where each of x1, x2, ..., xn ∈ [0, 1]. The minimum may be achieved on the interior of the cube, but it may be achieved on the boundary; and this latter prospect is rather daunting, since the boundary consists of 2^n "corners" (where each xi is equal to either 0 or 1) as well as various pieces of other dimensions. Calculus books tend to get suspiciously vague around here, when trying to describe how to handle multivariable minimization problems in the face of this complexity.
It turns out there's a reason for this: Minimizing an n-variable function over the unit cube in n dimensions is as hard as an NP-complete problem. To make this concrete, let's consider the special case of polynomials with integer coefficients over n variables x1, x2, ..., xn. To review some terminology, we say a monomial is a product of a real-number coefficient c and each variable xi raised to some nonnegative integer power ai; we can write this as c x1^a1 x2^a2 ... xn^an. (For example, 2x1^2 x2 x3^4 is a monomial.) A polynomial is then a sum of a finite set of monomials. (For example, 2x1x2x3 + x1x3^4 - 6x1x2^2 is a polynomial.)
We define the Multivariable Polynomial Minimization Problem as follows: Given a polynomial in n variables with integer coefficients, and given an integer bound B, is there a choice of real numbers x1, x2, ..., xn ∈ [0, 1] that causes the polynomial to achieve a value that is ≤ B?
Choose a problem Y from this chapter that is known to be NP-complete and show that
Y ≤P Multivariable Polynomial Minimization.

31. Given an undirected graph G = (V, E), a feedback set is a set X ⊆ V with the property that G - X has no cycles. The Undirected Feedback Set Problem asks: Given G and k, does G contain a feedback set of size at most k? Prove that Undirected Feedback Set is NP-complete.

32. The mapping of genomes involves a large array of difficult computational problems. At the most basic level, each of an organism's chromosomes can be viewed as an extremely long string (generally containing millions of symbols) over the four-letter alphabet {A, C, G, T}. One family of approaches to genome mapping is to generate a large number of short, overlapping snippets from a chromosome, and then to infer the full long string representing the chromosome from this set of overlapping substrings.
While we won't go into these string assembly problems in full detail, here's a simplified problem that suggests some of the computational difficulty one encounters in this area. Suppose we have a set S = {s1, s2, ..., sn} of short DNA strings over a q-letter alphabet; and each string si has length 2ℓ, for some number ℓ ≥ 1. We also have a library of additional strings T = {t1, t2, ..., tm} over the same alphabet; each of these also has length 2ℓ. In trying to assess whether the string sb might come directly after the string sa in the chromosome, we will look to see whether the library T contains a string tk so that the first ℓ symbols in tk are equal to the last ℓ symbols in sa, and the last ℓ symbols in tk are equal to the first ℓ symbols in sb. If this is possible, we will say that tk corroborates the pair (sa, sb). (In other words, tk could be a snippet of DNA that straddled the region in which sb directly followed sa.)
Now we'd like to concatenate all the strings in S in some order, one after the other with no overlaps, so that each consecutive pair is corroborated by some string in the library T. That is, we'd like to order the strings in S as si1, si2, ..., sin, where i1, i2, ..., in is a permutation of {1, 2, ..., n}, so that for each j = 1, 2, ..., n - 1, there is a string tk that corroborates the pair (sij, sij+1). (The same string tk can be used for more than one consecutive pair in the concatenation.) If this is possible, we will say that the set S has a perfect assembly.
Given sets S and T, the Perfect Assembly Problem asks: Does S have a perfect assembly with respect to T? Prove that Perfect Assembly is NP-complete.
Example. Suppose the alphabet is {A, C, G, T}, the set S = {AG, TC, TA}, and the set T = {AC, CA, GC, GT} (so each string has length 2ℓ = 2). Then the answer to this instance of Perfect Assembly is yes: We can concatenate the three strings in S in the order TCAGTA (so si1 = s2, si2 = s1, and si3 = s3). In this order, the pair (si1, si2) is corroborated by the string CA in the library T, and the pair (si2, si3) is corroborated by the string GT in the library T.

33. In a barter economy, people trade goods and services directly, without money as an intermediate step in the process. Trades happen when each
party views the set of goods they're getting as more valuable than the set of goods they're giving in return. Historically, societies tend to move from barter-based to money-based economies; thus various online systems that have been experimenting with barter can be viewed as intentional attempts to regress to this earlier form of economic interaction. In doing this, they've rediscovered some of the inherent difficulties with barter relative to money-based systems. One such difficulty is the complexity of identifying opportunities for trading, even when these opportunities exist.
To model this complexity, we need a notion that each person assigns a value to each object in the world, indicating how much this object would be worth to them. Thus we assume there is a set of n people p1, ..., pn, and a set of m distinct objects a1, ..., am. Each object is owned by one of the people. Now each person pi has a valuation function vi, defined so that vi(aj) is a nonnegative number that specifies how much object aj is worth to pi--the larger the number, the more valuable the object is to the person. Note that everyone assigns a valuation to each object, including the ones they don't currently possess, and different people can assign very different valuations to the same object.
A two-person trade is possible in a system like this when there are people pi and pj, and subsets of objects Ai and Aj possessed by pi and pj, respectively, so that each person would prefer the objects in the subset they don't currently have. More precisely,
- pi's total valuation for the objects in Aj exceeds his or her total valuation for the objects in Ai, and
- pj's total valuation for the objects in Ai exceeds his or her total valuation for the objects in Aj.
(Note that Ai doesn't have to be all the objects possessed by pi (and likewise for Aj); Ai and Aj can be arbitrary subsets of their possessions that meet these criteria.)
Suppose you are given an instance of a barter economy, specified by the above data on people's valuations for objects. (To prevent problems with representing real numbers, we'll assume that each person's valuation for each object is a natural number.) Prove that the problem of determining whether a two-person trade is possible is NP-complete.

34. In the 1970s, researchers including Mark Granovetter and Thomas Schelling in the mathematical social sciences began trying to develop models of certain kinds of collective human behaviors: Why do particular fads catch on while others die out? Why do particular technological innovations achieve widespread adoption, while others remain focused on a small group of users? What are the dynamics by which rioting and looting behavior sometimes (but only rarely) emerges from a crowd of angry people? They proposed that these are all examples of cascade processes, in which an individual's behavior is highly influenced by the behaviors of his or her friends, and so if a few individuals instigate the process, it can spread to more and more people and eventually have a very wide impact. We can think of this process as being like the spread of an illness, or a rumor, jumping from person to person.
The most basic version of their models is the following. There is some underlying behavior (e.g., playing ice hockey, owning a cell phone, taking part in a riot), and at any point in time each person is either an adopter of the behavior or a nonadopter. We represent the population by a directed graph G = (V, E) in which the nodes correspond to people and there is an edge (v, w) if person v has influence over the behavior of person w: If person v adopts the behavior, then this helps induce person w to adopt it as well. Each person w also has a given threshold θ(w) ∈ [0, 1], and this has the following meaning: At any time when at least a θ(w) fraction of the nodes with edges to w are adopters of the behavior, the node w will become an adopter as well.
Note that nodes with lower thresholds are more easily convinced to adopt the behavior, while nodes with higher thresholds are more conservative. A node w with threshold θ(w) = 0 will adopt the behavior immediately, with no inducement from friends. Finally, we need a convention about nodes with no incoming edges: We will say that they become adopters if θ(w) = 0, and cannot become adopters if they have any larger threshold.
Given an instance of this model, we can simulate the spread of the behavior as follows.

    Initially, set all nodes w with θ(w) = 0 to be adopters
      (All other nodes start out as nonadopters)
    Until there is no change in the set of adopters:
      For each nonadopter w simultaneously:
        If at least a θ(w) fraction of nodes with edges to w are adopters then
          w becomes an adopter
        Endif
      Endfor
    End
    Output the final set of adopters
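A direct Python transcription of the simulation loop above may help make the process concrete. The representation of the graph (a predecessor map built from directed edge pairs) and the function name are choices made for this sketch, not part of the exercise; the update rule and the convention for nodes with no incoming edges follow the description above.

    def simulate_cascade(nodes, edges, theta):
        """Return the final set of adopters for the threshold process above.

        `edges` contains directed pairs (v, w) meaning v influences w, and
        `theta` maps each node to its threshold in [0, 1].
        """
        preds = {w: [] for w in nodes}              # who has influence over w
        for v, w in edges:
            preds[w].append(v)

        adopters = {w for w in nodes if theta[w] == 0}
        while True:
            # evaluate all current nonadopters simultaneously, as in the loop above
            newly = set()
            for w in nodes:
                if w in adopters or not preds[w]:
                    continue
                frac = sum(1 for v in preds[w] if v in adopters) / len(preds[w])
                if frac >= theta[w]:
                    newly.add(w)
            if not newly:
                return adopters
            adopters |= newly

    # The five-node instance used in the example further below, seeded at S = {a, e}:
    nodes = ["a", "b", "c", "d", "e"]
    edges = {("a", "b"), ("b", "c"), ("e", "d"), ("d", "c")}
    theta = dict.fromkeys(nodes, 2 / 3)
    theta["a"] = theta["e"] = 0                 # resetting the seed set's thresholds
    print(sorted(simulate_cascade(nodes, edges, theta)))   # all five nodes adopt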
Note that this process terminates, since there are only n individuals total, and at least one new person becomes an adopter in each iteration.
Now, in the last few years, researchers in marketing and data mining have looked at how a model like this could be used to investigate "word-of-mouth" effects in the success of new products (the so-called viral marketing phenomenon). The idea here is that the behavior we're concerned with is the use of a new product; we may be able to convince a few key people in the population to try out this product, and hope to trigger as large a cascade as possible.
Concretely, suppose we choose a set of nodes S ⊆ V and we reset the threshold of each node in S to 0. (By convincing them to try the product, we've ensured that they're adopters.) We then run the process described above, and see how large the final set of adopters is. Let's denote the size of this final set of adopters by f(S) (note that we write it as a function of S, since it naturally depends on our choice of S). We could think of f(S) as the influence of the set S, since it captures how widely the behavior spreads when "seeded" at S.
The goal, if we're marketing a product, is to find a small set S whose influence f(S) is as large as possible. We thus define the Influence Maximization Problem as follows: Given a directed graph G = (V, E), with a threshold value at each node, and parameters k and b, is there a set S of at most k nodes for which f(S) ≥ b?
Prove that Influence Maximization is NP-complete.
Example. Suppose our graph G = (V, E) has five nodes {a, b, c, d, e}, four edges (a, b), (b, c), (e, d), (d, c), and all node thresholds equal to 2/3. Then the answer to the Influence Maximization instance defined by G, with k = 2 and b = 5, is yes: We can choose S = {a, e}, and this will cause the other three nodes to become adopters as well. (This is the only choice of S that will work here. For example, if we choose S = {a, d}, then b and c will become adopters, but e won't; if we choose S = {a, b}, then none of c, d, or e will become adopters.)

35. Three of your friends work for a large computer-game company, and they've been working hard for several months now to get their proposal for a new game, Droid Trader!, approved by higher management. In the process, they've had to endure all sorts of discouraging comments, ranging from "You're really going to have to work with Marketing on the name" to "Why don't you emphasize the parts where people get to kick each other in the head?"
At this point, though, it's all but certain that the game is really heading into production, and your friends come to you with one final issue that's been worrying them: What if the overall premise of the game is too simple, so that players get really good at it and become bored too quickly?
It takes you a while, listening to their detailed description of the game, to figure out what's going on; but once you strip away the space battles, kick-boxing interludes, and Star Wars-inspired pseudo-mysticism, the basic idea is as follows. A player in the game controls a spaceship and is trying to make money buying and selling droids on different planets. There are n different types of droids and k different planets. Each planet p has the following properties: there are s(j, p) ≥ 0 droids of type j available for sale, at a fixed price of x(j, p) ≥ 0 each, for j = 1, 2, ..., n; and there is a demand for d(j, p) ≥ 0 droids of type j, at a fixed price of y(j, p) ≥ 0 each. (We will assume that a planet does not simultaneously have both a positive supply and a positive demand for a single type of droid; so for each j, at least one of s(j, p) or d(j, p) is equal to 0.)
The player begins on planet s with z units of money and must end at planet t; there is a directed acyclic graph G on the set of planets, such that s-t paths in G correspond to valid routes by the player. (G is chosen to be acyclic to prevent arbitrarily long games.) For a given s-t path P in G, the player can engage in transactions as follows. Whenever the player arrives at a planet p on the path P, she can buy up to s(j, p) droids of type j for x(j, p) units of money each (provided she has sufficient money on hand) and/or sell up to d(j, p) droids of type j for y(j, p) units of money each (for j = 1, 2, ..., n). The player's final score is the total amount of money she has on hand when she arrives at planet t. (There are also bonus points based on space battles and kick-boxing, which we'll ignore for the purposes of formulating this question.)
So basically, the underlying problem is to achieve a high score. In other words, given an instance of this game, with a directed acyclic graph G on a set of planets, all the other parameters described above, and also a target bound B, is there a path P in G and a sequence of transactions on P so that the player ends with a final score that is at least B? We'll call this an instance of the High-Score-on-Droid-Trader! Problem, or HSoDT! for short.
Prove that HSoDT! is NP-complete, thereby guaranteeing (assuming P ≠ NP) that there isn't a simple strategy for racking up high scores on your friends' game.

36. Sometimes you can know people for years and never really understand them. Take your friends Raj and Alanis, for example. Neither of them is a morning person, but now they're getting up at 6 AM every day to visit
local farmers' markets, gathering fresh fruits and vegetables for the new health-food restaurant they've opened, Chez Alanisse.
In the course of trying to save money on ingredients, they've come across the following thorny problem. There is a large set of n possible raw ingredients they could buy, I1, I2, ..., In (e.g., bundles of dandelion greens, jugs of rice vinegar, and so forth). Ingredient Ij must be purchased in units of size s(j) grams (any purchase must be for a whole number of units), and it costs c(j) dollars per unit. Also, it remains safe to use for t(j) days from the date of purchase.
Now, over the next k days, they want to make a set of k different daily specials, one each day. (The order in which they schedule the specials is up to them.) The ith daily special uses a subset Si ⊆ {I1, I2, ..., In} of the raw ingredients. Specifically, it requires a(i, j) grams of ingredient Ij. And there's a final constraint: The restaurant's rabidly loyal customer base only remains rabidly loyal if they're being served the freshest meals available; so for each daily special, the ingredients Si are partitioned into two subsets: those that must be purchased on the very day when the daily special is being offered, and those that can be used any day while they're still safe. (For example, the mesclun-basil salad special needs to be made with basil that has been purchased that day; but the arugula-basil pesto with Cornell dairy goat cheese special can use basil that is several days old, as long as it is still safe.)
This is where the opportunity to save money on ingredients comes up. Often, when they buy a unit of a certain ingredient Ij, they don't need the whole thing for the special they're making that day. Thus, if they can follow up quickly with another special that uses Ij but doesn't require it to be fresh that day, then they can save money by not having to purchase Ij again. Of course, scheduling the basil recipes close together may make it harder to schedule the goat cheese recipes close together, and so forth--this is where the complexity comes in.
So we define the Daily Special Scheduling Problem as follows: Given data on ingredients and recipes as above, and a budget x, is there a way to schedule the k daily specials so that the total money spent on ingredients over the course of all k days is at most x?
Prove that Daily Special Scheduling is NP-complete.

37. There are those who insist that the initial working title for Episode XXVII of the Star Wars series was "P = NP"--but this is surely apocryphal. In any case, if you're so inclined, it's easy to find NP-complete problems lurking just below the surface of the original Star Wars movies.
Consider the problem faced by Luke, Leia, and friends as they tried to make their way from the Death Star back to the hidden Rebel base. We can view the galaxy as an undirected graph G = (V, E), where each node is a star system and an edge (u, v) indicates that one can travel directly from u to v. The Death Star is represented by a node s, the hidden Rebel base by a node t. Certain edges in this graph represent longer distances than others; thus each edge e has an integer length ℓe ≥ 0. Also, certain edges represent routes that are more heavily patrolled by evil Imperial spacecraft; so each edge e also has an integer risk re ≥ 0, indicating the expected amount of damage incurred from special-effects-intensive space battles if one traverses this edge.
It would be safest to travel through the outer rim of the galaxy, from one quiet upstate star system to another; but then one's ship would run out of fuel long before getting to its destination. Alternately, it would be quickest to plunge through the cosmopolitan core of the galaxy; but then there would be far too many Imperial spacecraft to deal with. In general, for any path P from s to t, we can define its total length to be the sum of the lengths of all its edges; and we can define its total risk to be the sum of the risks of all its edges.
So Luke, Leia, and company are looking at a complex type of shortest-path problem in this graph: they need to get from s to t along a path whose total length and total risk are both reasonably small. In concrete terms, we can phrase the Galactic Shortest-Path Problem as follows: Given a setup as above, and integer bounds L and R, is there a path from s to t whose total length is at most L, and whose total risk is at most R?
Prove that Galactic Shortest Path is NP-complete.

38. Consider the following version of the Steiner Tree Problem, which we'll refer to as Graphical Steiner Tree. You are given an undirected graph G = (V, E), a set X ⊆ V of vertices, and a number k. You want to decide whether there is a set F ⊆ E of at most k edges so that in the graph (V, F), X belongs to a single connected component.
Show that Graphical Steiner Tree is NP-complete.

39. The Directed Disjoint Paths Problem is defined as follows. We are given a directed graph G and k pairs of nodes (s1, t1), (s2, t2), ..., (sk, tk). The problem is to decide whether there exist node-disjoint paths P1, P2, ..., Pk so that Pi goes from si to ti.
Show that Directed Disjoint Paths is NP-complete.

40. Consider the following problem that arises in the design of broadcasting schemes for networks. We are given a directed graph G = (V, E), with a
designated node r ∈ V and a designated set of "target nodes" T ⊆ V - {r}. Each node v has a switching time sv, which is a positive integer.

At time 0, the node r generates a message that it would like every node in T to receive. To accomplish this, we want to find a scheme whereby r tells some of its neighbors (in sequence), who in turn tell some of their neighbors, and so on, until every node in T has received the message. More formally, a broadcast scheme is defined as follows. Node r may send a copy of the message to one of its neighbors at time 0; this neighbor will receive the message at time 1. In general, at time t ≥ 0, any node v that has already received the message may send a copy of the message to one of its neighbors, provided it has not sent a copy of the message in any of the time steps t - sv + 1, t - sv + 2, ..., t - 1. (This reflects the role of the switching time; v needs a pause of sv - 1 steps between successive sendings of the message. Note that if sv = 1, then no restriction is imposed by this.)

The completion time of the broadcast scheme is the minimum time t by which all nodes in T have received the message. The Broadcast Time Problem is the following: Given the input described above, and a bound b, is there a broadcast scheme with completion time at most b?

Prove that Broadcast Time is NP-complete.

Example. Suppose we have a directed graph G = (V, E), with V = {r, a, b, c}; edges (r, a), (a, b), (r, c); the set T = {b, c}; and switching time sv = 2 for each v ∈ V. Then a broadcast scheme with minimum completion time would be as follows: r sends the message to a at time 0; a sends the message to b at time 1; r sends the message to c at time 2; and the scheme completes at time 3 when c receives the message. (Note that a can send the message as soon as it receives it at time 1, since this is its first sending of the message; but r cannot send the message at time 1 since sr = 2 and it sent the message at time 0.)

41. Given a directed graph G, a cycle cover is a set of node-disjoint cycles so that each node of G belongs to a cycle. The Cycle Cover Problem asks whether a given directed graph has a cycle cover.

(a) Show that the Cycle Cover Problem can be solved in polynomial time. (Hint: Use Bipartite Matching.)

(b) Suppose we require each cycle to have at most three edges. Show that determining whether a graph G has such a cycle cover is NP-complete.
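The hint in part (a) can be made concrete. The sketch below is our own illustration, not part of the exercise: the function name and the tiny test graphs are assumptions. It builds the standard bipartite graph that pairs an "outgoing" copy of each node with an "incoming" copy, so that a perfect matching corresponds exactly to a cycle cover, and finds the matching with a simple augmenting-path search.

```python
# A directed graph has a cycle cover exactly when the bipartite graph pairing
# each node's "out" copy with each node's "in" copy has a perfect matching.

def has_cycle_cover(nodes, edges):
    """nodes: iterable of hashable node names; edges: iterable of (u, v) pairs."""
    nodes = list(nodes)
    succ = {u: [] for u in nodes}          # u_out may be matched to v_in iff (u, v) is an edge
    for u, v in edges:
        succ[u].append(v)

    match_in = {}                          # "in" copy v -> node currently matched to it

    def try_augment(u, seen):
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_in or try_augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    matched = sum(try_augment(u, set()) for u in nodes)
    return matched == len(nodes)           # perfect matching <=> cycle cover exists

# A directed triangle has a cycle cover; a single directed edge does not.
print(has_cycle_cover("abc", [("a", "b"), ("b", "c"), ("c", "a")]))   # True
print(has_cycle_cover("ab", [("a", "b")]))                            # False
```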
42. Suppose you're consulting for a company in northern New Jersey that designs communication networks, and they come to you with the following problem. They're studying a specific n-node communication network, modeled as a directed graph G = (V, E). For reasons of fault tolerance, they want to divide up G into as many virtual "domains" as possible: A domain in G is a set X of nodes, of size at least 2, so that for each pair of nodes u, v ∈ X there are directed paths from u to v and v to u that are contained entirely in X.

Show that the following Domain Decomposition Problem is NP-complete. Given a directed graph G = (V, E) and a number k, can V be partitioned into at least k sets, each of which is a domain?

Notes and Further Reading
In the notes to Chapter 2, we described some of the early work on formalizing computational efficiency using polynomial time; NP-completeness evolved out of this work and grew into its central role in computer science following the papers of Cook (1971), Levin (1973), and Karp (1972). Edmonds (1965) is credited with drawing particular attention to the class of problems in NP ∩ co-NP--those with "good characterizations." His paper also contains the explicit conjecture that the Traveling Salesman Problem cannot be solved in polynomial time, thereby prefiguring the P ≠ NP question. Sipser (1992) is a useful guide to all of this historical context.

The book by Garey and Johnson (1979) provides extensive material on NP-completeness and concludes with a very useful catalog of known NP-complete problems. While this catalog, necessarily, only covers what was known at the time of the book's publication, it is still a very useful reference when one encounters a new problem that looks like it might be NP-complete. In the meantime, the space of known NP-complete problems has continued to expand dramatically; as Christos Papadimitriou said in a lecture, "Roughly 6,000 papers every year contain an NP-completeness result. That means another NP-complete problem has been discovered since lunch." (His lecture was at 2:00 in the afternoon.)

One can interpret NP-completeness as saying that each individual NP-complete problem contains the entire complexity of NP hidden inside it. A concrete reflection of this is the fact that several of the NP-complete problems we discuss here are the subject of entire books: the Traveling Salesman is the subject of Lawler et al. (1985); Graph Coloring is the subject of Jensen and Toft (1995); and the Knapsack Problem is the subject of Martello and Toth (1990). NP-completeness results for scheduling problems are discussed in the survey by Lawler et al. (1993).
Notes on the Exercises A number of the exercises illustrate further problems
that emerged as paradigmatic examples early in the development of NP-
completeness; these include Exercises 5, 26, 29, 31, 38, 39, 40, and 41.
Exercise 33 is based on discussions with Daniel Golovin, and Exercise 34
is based on our work with David Kempe. Exercise 37 is an example of the
class of Bicriteria Shortest-Path problems; its motivating application here was
suggested by Maverick Woo.
9 PSPACE: A Class of Problems beyond NP
Throughout the book, one of the main issues has been the notion of time as a computational resource. It was this notion that formed the basis for adopting polynomial time as our working definition of efficiency; and, implicitly, it underlies the distinction between P and NP. To some extent, we have also been concerned with the space (i.e., memory) requirements of algorithms. In this chapter, we investigate a class of problems defined by treating space as the fundamental computational resource. In the process, we develop a natural class of problems that appear to be even harder than NP and co-NP.
9.1 PSPACE
The basic class we study is PSPACE, the set of all problems that can be solved by an algorithm with polynomial space complexity--that is, an algorithm that uses an amount of space that is polynomial in the size of the input.

We begin by considering the relationship of PSPACE to classes of problems we have considered earlier. First of all, in polynomial time, an algorithm can consume only a polynomial amount of space; so we can say

(9.1) P ⊆ PSPACE.

But PSPACE is much broader than this. Consider, for example, an algorithm that just counts from 0 to 2^n - 1 in base-2 notation. It simply needs to implement an n-bit counter, which it maintains in exactly the same way one increments an odometer in a car. Thus this algorithm runs for an exponential amount of time, and then halts; in the process, it has used only a polynomial amount of space. Although this algorithm is not doing anything particularly
interesting, it illustrates an important principle: Space can be reused during a computation in ways that time, by definition, cannot.

Here is a more striking application of this principle.

(9.2) There is an algorithm that solves 3-SAT using only a polynomial amount of space.

Proof. We simply use a brute-force algorithm that tries all possible truth assignments; each assignment is plugged into the set of clauses to see if it satisfies them. The key is to implement this all in polynomial space.

To do this, we increment an n-bit counter from 0 to 2^n - 1 just as described above. The values in the counter correspond to truth assignments in the following way: When the counter holds a value q, we interpret it as a truth assignment v that sets xi to be the value of the ith bit of q.

Thus we devote a polynomial amount of space to enumerating all possible truth assignments v. For each truth assignment, we need only polynomial space to plug it into the set of clauses and see if it satisfies them. If it does satisfy the clauses, we can stop the algorithm immediately. If it doesn't, we delete the intermediate work involved in this "plugging in" operation and reuse this space for the next truth assignment. Thus we spend only polynomial space cumulatively in checking all truth assignments; this completes the bound on the algorithm's space requirements. ∎
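The proof translates almost line for line into code. The following minimal sketch is our own illustration; the encoding of a clause as a list of signed integers is an assumption, not the book's notation. It steps an n-bit counter through all truth assignments and keeps nothing between iterations beyond the counter itself.

```python
# Brute-force, polynomial-space 3-SAT check in the spirit of (9.2).
# A clause is a list of signed integers: literal j means x_j, and -j means
# the negation of x_j (an illustrative convention).

def sat_in_poly_space(n, clauses):
    for q in range(2 ** n):                 # the n-bit counter
        # Interpret bit i of q as the value of x_{i+1}; this uses O(n) space.
        assignment = [(q >> i) & 1 for i in range(n)]
        if all(any((assignment[abs(l) - 1] == 1) == (l > 0) for l in clause)
               for clause in clauses):
            return True                     # stop as soon as one assignment works
        # Nothing is kept between iterations: the working space is reused.
    return False

# (x1 v x2 v x3) ^ (-x1 v -x2 v -x3) is satisfiable, e.g. by x1 = 1, x2 = x3 = 0.
print(sat_in_poly_space(3, [[1, 2, 3], [-1, -2, -3]]))   # True
```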
Since 3-SAT is an NP-complete problem, (9.2) has a significant consequence.

(9.3) NP ⊆ PSPACE.

Proof. Consider an arbitrary problem Y in NP. Since Y ≤p 3-SAT, there is an algorithm that solves Y using a polynomial number of steps plus a polynomial number of calls to a black box for 3-SAT. Using the algorithm in (9.2) to implement this black box, we obtain an algorithm for Y that uses only polynomial space. ∎

Just as with the class P, a problem X is in PSPACE if and only if its complementary problem X̄ is in PSPACE as well. Thus we can conclude that co-NP ⊆ PSPACE. We draw what is known about the relationships among these classes of problems in Figure 9.1.

Figure 9.1 The subset relationships among various classes of problems. Note that we don't know how to prove the conjecture that all of these classes are different from one another.

Given that PSPACE is an enormously large class of problems, containing both NP and co-NP, it is very likely that it contains problems that cannot be solved in polynomial time. But despite this widespread belief, amazingly, it has not been proven that P ≠ PSPACE. Nevertheless, the nearly universal conjecture is that PSPACE contains problems that are not even in NP.

9.2 Some Hard Problems in PSPACE
We now survey some natural examples of problems in PSPACE that are not known--and not believed--to belong to NP.

As was the case with NP, we can try understanding the structure of PSPACE by looking for complete problems--the hardest problems in the class. We will say that a problem X is PSPACE-complete if (i) it belongs to PSPACE; and (ii) for all problems Y in PSPACE, we have Y ≤p X.

It turns out, analogously to the case of NP, that a wide range of natural problems are PSPACE-complete. Indeed, a number of the basic problems in artificial intelligence are PSPACE-complete, and we describe three genres of these here.

Planning
Planning problems seek to capture, in a clean way, the task of interacting with a complex environment to achieve a desired set of goals. Canonical applications include large logistical operations that require the movement of people, equipment, and materials. For example, as part of coordinating a disaster-relief effort, we might decide that twenty ambulances are needed at a particular high-altitude location. Before this can be accomplished, we need to get ten snowplows to clear the road; this in turn requires emergency fuel and snowplow crews; but if we use the fuel for the snowplows, then we may not have enough for the ambulances; and... you get the idea. Military operations
also require such reasoning on an enormous scale, and automated planning techniques from artificial intelligence have been used to great effect in this domain as well.

One can see very similar issues at work in complex solitaire games such as Rubik's Cube or the fifteen-puzzle--a 4 × 4 grid with fifteen movable tiles labeled 1, 2, ..., 15, and a single hole, with the goal of moving the tiles around so that the numbers end up in ascending order. (Rather than ambulances and snowplows, we now are worried about things like getting the tile labeled 6 one position to the left, which involves getting the 11 out of the way; but that involves moving the 9, which was actually in a good position; and so on.) These toy problems can be quite tricky and are often used in artificial intelligence as a test-bed for planning algorithms.

Having said all this, how should we define the problem of planning in a way that's general enough to include each of these examples? Both solitaire puzzles and disaster-relief efforts have a number of abstract features in common: There are a number of conditions we are trying to achieve and a set of allowable operators that we can apply to achieve these conditions. Thus we model the environment by a set C = {C1, ..., Cn} of conditions: A given state of the world is specified by the subset of the conditions that currently hold. We interact with the environment through a set {O1, ..., Ok} of operators. Each operator Oi is specified by a prerequisite list, containing a set of conditions that must hold for Oi to be invoked; an add list, containing a set of conditions that will become true after Oi is invoked; and a delete list, containing a set of conditions that will cease to hold after Oi is invoked. For example, we could model the fifteen-puzzle by having a condition for each possible location of each tile, and an operator to move each tile between each pair of adjacent locations; the prerequisite for an operator is that its two locations contain the designated tile and the hole.

The problem we face is the following: Given a set C0 of initial conditions, and a set C* of goal conditions, is it possible to apply a sequence of operators beginning with C0 so that we reach a situation in which precisely the conditions in C* (and no others) hold? We will call this an instance of the Planning Problem.

Quantification
We have seen, in the 3-SAT problem, some of the difficulty in determining whether a set of disjunctive clauses can be simultaneously satisfied. When we add quantifiers, the problem appears to become even more difficult.

Let Φ(x1, ..., xn) be a Boolean formula of the form

C1 ∧ C2 ∧ ... ∧ Ck,

where each Ci is a disjunction of three terms (in other words, it is an instance of 3-SAT). Assume for simplicity that n is an odd number, and suppose we ask

∃x1 ∀x2 ... ∃xn-2 ∀xn-1 ∃xn Φ(x1, ..., xn)?

That is, we wish to know whether there is a choice for x1, so that for both choices of x2, there is a choice for x3, and so on, so that Φ is satisfied. We will refer to this decision problem as Quantified 3-SAT (or, briefly, QSAT).

The original 3-SAT problem, by way of comparison, simply asked

∃x1 ∃x2 ... ∃xn-2 ∃xn-1 ∃xn Φ(x1, ..., xn)?

In other words, in 3-SAT it was sufficient to look for a single setting of the Boolean variables.

Here's an example to illustrate the kind of reasoning that underlies an instance of QSAT. Suppose that we have the formula

Φ(x1, x2, x3) = (x1 ∨ x2 ∨ x3) ∧ (x1 ∨ x̄2 ∨ x̄3) ∧ (x̄1 ∨ x2 ∨ x3) ∧ (x̄1 ∨ x̄2 ∨ x̄3)

and we ask

∃x1 ∀x2 ∃x3 Φ(x1, x2, x3)?

The answer to this question is yes: We can set x1 so that for both choices of x2, there is a way to set x3 so that Φ is satisfied. Specifically, we can set x1 = 1; then if x2 is set to 1, we can set x3 to 0 (satisfying all clauses); and if x2 is set to 0, we can set x3 to 1 (again satisfying all clauses).

Problems of this type, with a sequence of quantifiers, arise naturally as a form of contingency planning--we wish to know whether there is a decision we can make (the choice of x1) so that for all possible responses (the choice of x2) there is a decision we can make (the choice of x3), and so forth.

Games
In 1996 and 1997, world chess champion Garry Kasparov was billed by the media as the defender of the human race, as he faced IBM's program Deep Blue in two chess matches. We needn't look further than this picture to convince ourselves that computational game-playing is one of the most visible successes of contemporary artificial intelligence.

A large number of two-player games fit naturally into the following framework. Players alternate moves, and the first one to achieve a specific goal wins. (For example, depending on the game, the goal could be capturing the king, removing all the opponent's checkers, placing four pieces in a row, and so on.) Moreover, there is often a natural, polynomial, upper bound on the maximum possible length of a game.
The Competitive Facility Location Problem that we introduced in Chapter 1 fits naturally within this framework. (It also illustrates the way in which games can arise not just as pastimes, but through competitive situations in everyday life.) Recall that in Competitive Facility Location, we are given a graph G, with a nonnegative value bi attached to each node i. Two players alternately select nodes of G, so that the set of selected nodes at all times forms an independent set. Player 2 wins if she ultimately selects a set of nodes of total value at least B, for a given bound B; Player 1 wins if he prevents this from happening. The question is: Given the graph G and the bound B, is there a strategy by which Player 2 can force a win?

9.3 Solving Quantified Problems and Games in Polynomial Space
We now discuss how to solve all of these problems in polynomial space. As we will see, this will be trickier--in one case, a lot trickier--than the (simple) task we faced in showing that problems like 3-SAT and Independent Set belong to PSPACE.

We begin here with QSAT and Competitive Facility Location, and then consider Planning in the next section.

Designing an Algorithm for QSAT
First let's show that QSAT can be solved in polynomial space. As was the case with 3-SAT, the idea will be to run a brute-force algorithm that reuses space carefully as the computation proceeds.

Here is the basic brute-force approach. To deal with the first quantifier ∃x1, we consider both possible values for x1 in sequence. We first set x1 = 0 and see, recursively, whether the remaining portion of the formula evaluates to 1. We then set x1 = 1 and see, recursively, whether the remaining portion of the formula evaluates to 1. The full formula evaluates to 1 if and only if either of these recursive calls yields a 1--that's simply the definition of the ∃ quantifier.

This is essentially a divide-and-conquer algorithm, which, given an input with n variables, spawns two recursive calls on inputs with n - 1 variables each. If we were to save all the work done in both these recursive calls, our space usage S(n) would satisfy the recurrence

S(n) ≤ 2S(n - 1) + p(n),

where p(n) is a polynomial function. This would result in an exponential bound, which is too large.

Fortunately, we can perform a simple optimization that greatly reduces the space usage. When we're done with the case x1 = 0, all we really need to save is the single bit that represents the outcome of the recursive call; we can throw away all the other intermediate work. This is another example of "reuse"--we're reusing the space from the computation for x1 = 0 in order to compute the case x1 = 1.

Here is a compact description of the algorithm.

If the first quantifier is ∃xi then
  Set xi = 0 and recursively evaluate the quantified expression over the remaining variables
  Save the result (0 or 1) and delete all other intermediate work
  Set xi = 1 and recursively evaluate the quantified expression over the remaining variables
  If either outcome yielded an evaluation of 1, then
    return 1
  Else return 0
  Endif
Endif
If the first quantifier is ∀xi then
  Set xi = 0 and recursively evaluate the quantified expression over the remaining variables
  Save the result (0 or 1) and delete all other intermediate work
  Set xi = 1 and recursively evaluate the quantified expression over the remaining variables
  If both outcomes yielded an evaluation of 1, then
    return 1
  Else return 0
  Endif
Endif

Analyzing the Algorithm
Since the recursive calls for the cases x1 = 0 and x1 = 1 overwrite the same space, our space requirement S(n) for an n-variable problem is simply a polynomial in n plus the space requirement for one recursive call on an (n - 1)-variable problem:

S(n) ≤ S(n - 1) + p(n),

where again p(n) is a polynomial function. Unrolling this recurrence, we get

S(n) ≤ p(n) + p(n - 1) + p(n - 2) + ... + p(1) ≤ n · p(n).

Since p(n) is a polynomial, so is n · p(n), and hence our space usage is polynomial in n, as desired.

In summary, we have shown the following.

(9.4) QSAT can be solved in polynomial space.
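To make the recursion concrete, here is a minimal sketch in Python. The representation of the quantifier prefix as a list of pairs, and of Φ as a Boolean function, are our own conventions; Python does not literally enforce the space bound, so the point is only the structure of the recursion, in which each branch keeps just the one-bit outcomes of its recursive calls.

```python
# Space-reusing recursion for QSAT (illustrative sketch).
# The prefix is a list like [('E', 1), ('A', 2), ('E', 3)]; phi is any
# function of a dict mapping variable indices to 0/1 values.

def evaluate(quantifiers, phi, values=None):
    values = dict(values or {})
    if not quantifiers:
        return 1 if phi(values) else 0
    kind, x = quantifiers[0]
    outcomes = []
    for bit in (0, 1):
        values[x] = bit
        # Only the single result bit of each recursive call is retained.
        outcomes.append(evaluate(quantifiers[1:], phi, values))
    return max(outcomes) if kind == 'E' else min(outcomes)

# The example from the text: does E x1 A x2 E x3 Phi(x1, x2, x3) hold?
phi = lambda v: ((v[1] or v[2] or v[3]) and (v[1] or not v[2] or not v[3]) and
                 (not v[1] or v[2] or v[3]) and (not v[1] or not v[2] or not v[3]))
print(evaluate([('E', 1), ('A', 2), ('E', 3)], phi))    # prints 1
```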
Extensions: An Algorithm for Competitive Facility Location
We can determine which player has a forced win in a game such as Competitive Facility Location by a very similar type of algorithm.

Suppose Player 1 moves first. We consider all of his possible moves in sequence. For each of these moves, we see who has a forced win in the resulting game, with Player 2 moving first. If Player 1 has a forced win in any of them, then Player 1 has a forced win from the initial position. The crucial point, as in the QSAT algorithm, is that we can reuse the space from one candidate move to the next; we need only store the single bit representing the outcome. In this way, we only consume a polynomial amount of space plus the space requirement for one recursive call on a graph with fewer nodes. As in the case of QSAT, we get the recurrence

S(n) ≤ S(n - 1) + p(n)

for a polynomial p(n).

In summary, we have shown the following.

(9.5) Competitive Facility Location can be solved in polynomial space.
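The same one-bit-per-candidate-move pattern can be written out for Competitive Facility Location. The sketch below is our own illustration and assumes one particular reading of the rules recalled above: play continues until no legal node remains, and Player 2 wins exactly when her selected nodes total at least B.

```python
# Game-tree recursion that keeps only a one-bit outcome per candidate move.

def player2_forces_win(adj, value, B, used=frozenset(), p2_total=0, player=1):
    # A node is legal if neither it nor any neighbour has been selected.
    legal = [v for v in adj
             if v not in used and not any(u in used for u in adj[v])]
    if not legal:
        return p2_total >= B              # no moves left: did Player 2 reach the bound?
    outcomes = []
    for v in legal:
        gain = value[v] if player == 2 else 0
        # Recurse on the position after this move, keeping only its one-bit outcome.
        outcomes.append(player2_forces_win(adj, value, B, used | {v},
                                           p2_total + gain, 3 - player))
    # On Player 2's turn one winning move suffices; on Player 1's turn,
    # Player 2 must be able to win no matter which move Player 1 picks.
    return any(outcomes) if player == 2 else all(outcomes)

# Two disjoint edges, all values 1: whatever Player 1 picks, Player 2 can
# still take a node from the other edge and reach B = 1.
adj = {'a': ['b'], 'b': ['a'], 'c': ['d'], 'd': ['c']}
print(player2_forces_win(adj, {n: 1 for n in adj}, B=1))   # True
```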
9.4 Solving the Planning Problem in Polynomial Space
Now we consider how to solve the basic Planning Problem in polynomial space. The issues here will look quite different, and it will turn out to be a much more difficult task.

The Problem
Recall that we have a set of conditions C = {C1, ..., Cn} and a set of operators {O1, ..., Ok}. Each operator Oi has a prerequisite list Pi, an add list Ai, and a delete list Di. Note that Oi can still be applied even if conditions other than those in Pi are present; and it does not affect conditions that are not in Ai or Di. We define a configuration to be a subset C' ⊆ C; the state of the Planning Problem at any given time can be identified with a unique configuration C' consisting precisely of the conditions that hold at that time. For an initial configuration C0 and a goal configuration C*, we wish to determine whether there is a sequence of operators that will take us from C0 to C*.

We can view our Planning instance in terms of a giant, implicitly defined, directed graph 𝒢. There is a node of 𝒢 for each of the 2^n possible configurations (i.e., each possible subset of C); and there is an edge of 𝒢 from configuration C' to configuration C'' if, in one step, one of the operators can convert C' to C''. In terms of this graph, the Planning Problem has a very natural formulation: Is there a path in 𝒢 from C0 to C*? Such a path corresponds precisely to a sequence of operators leading from C0 to C*.

It's possible for a Planning instance to have a short solution (as in the example of the fifteen-puzzle), but this need not hold in general. That is, there need not always be a short path in 𝒢 from C0 to C*. This should not be so surprising, since 𝒢 has an exponential number of nodes. But we must be careful in applying this intuition, since 𝒢 has a special structure: It is defined very compactly in terms of the n conditions and k operators.
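A short sketch makes the phrase "implicitly defined" concrete: we never materialize the 2^n configurations, we only expose a successor function. The encoding of operators as triples of sets is our own convention, and the breadth-first search at the end is included only to check the 2^n - 1 bound of the statement that follows on a tiny instance; it is of course not space-efficient.

```python
from collections import deque

def successors(config, operators):
    """Yield every configuration reachable from config in one step."""
    for prereq, add, delete in operators:
        if prereq <= config:                 # all prerequisites hold
            yield (config | add) - delete

# The counter operators used below, for n = 3 conditions named 1, 2, 3:
# O_i requires {1, ..., i-1}, adds {i}, and deletes {1, ..., i-1}.
n = 3
ops = [(frozenset(range(1, i)), frozenset({i}), frozenset(range(1, i)))
       for i in range(1, n + 1)]

# Breadth-first search over the implicit graph, purely as a sanity check.
dist = {frozenset(): 0}
queue = deque([frozenset()])
while queue:
    c = queue.popleft()
    for nxt in successors(c, ops):
        if nxt not in dist:
            dist[nxt] = dist[c] + 1
            queue.append(nxt)
print(dist[frozenset({1, 2, 3})])            # prints 7 = 2**3 - 1
```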
(9.6) There are instances of the Planning Problem with n conditions and k operators for which there exists a solution, but the shortest solution has length 2^n - 1.

Proof. We give a simple example of such an instance; it essentially encodes the task of incrementing an n-bit counter from the all-zeros state to the all-ones state.

o We have conditions C1, C2, ..., Cn.
o We have operators Oi for i = 1, 2, ..., n.
o O1 has no prerequisites or delete list; it simply adds C1.
o For i > 1, Oi requires Cj for all j < i as prerequisites. When invoked, it adds Ci and deletes Cj for all j < i.

Now we ask: Is there a sequence of operators that will take us from C0 = ∅ to C* = {C1, C2, ..., Cn}?

We claim the following, by induction on i:

From any configuration that does not contain Cj for any j ≤ i, there exists a sequence of operators that reaches a configuration containing Cj for all j ≤ i; but any such sequence has at least 2^i - 1 steps.

This is clearly true for i = 1. For larger i, here's one solution.

o By induction, achieve conditions {Ci-1, ..., C1} using operators O1, ..., Oi-1.
o Now invoke operator Oi, adding Ci but deleting everything else.
o Again, by induction, achieve conditions {Ci-1, ..., C1} using operators O1, ..., Oi-1. Note that condition Ci is preserved throughout this process.

Now we take care of the other part of the inductive step--that any such sequence requires at least 2^i - 1 steps. So consider the first moment when Ci is added. At this step, Ci-1, ..., C1 must have been present, and by induction, this must have taken at least 2^(i-1) - 1 steps. Ci can only be added by Oi, which deletes all Cj for j < i. Now we have to achieve conditions {Ci-1, ..., C1} again; this will take another 2^(i-1) - 1 steps, by induction, for a total of at least 2(2^(i-1) - 1) + 1 = 2^i - 1 steps.

The overall bound now follows by applying this claim with i = n. ∎

One natural approach to the Planning Problem is simply to build the graph 𝒢 and run a standard graph search on it to decide whether there is a path from C0 to C*. Of course, this algorithm is too brute-force for our purposes; it takes exponential space just to construct the graph 𝒢. We could try an approach in which we never actually build 𝒢, and just simulate the behavior of depth-first search or breadth-first search on it. But this likewise is not feasible. Depth-first search crucially requires us to maintain a list of all the nodes in the current path we are exploring, and this can grow to exponential size. Breadth-first search requires a list of all nodes in the current "frontier" of the search, and this too can grow to exponential size.

We seem stuck. Our problem is transparently equivalent to finding a path in 𝒢, and all the standard path-finding algorithms we know are too lavish in their use of space. Could there really be a fundamentally different path-finding algorithm out there?

There is, and its underlying idea can be read off from the procedure below: to decide whether C1 can reach C2 in at most L steps, we try every possible midpoint configuration C' and recursively ask whether each half of the journey can be completed in at most ⌈L/2⌉ steps. Does this really reduce the space usage to a polynomial amount? We first write down the procedure carefully, and then analyze it. We will think of L as a power of 2, which it is for our purposes.

Path(C1, C2, L)
If L = 1 then
  If there is an operator converting C1 to C2 then
    return "yes"
  Else
    return "no"
  Endif
Else (L > 1)
  Enumerate all configurations C' using an n-bit counter
  For each C' do the following:
    Compute x = Path(C1, C', ⌈L/2⌉)
    Delete all intermediate work, saving only the return value x
    Compute y = Path(C', C2, ⌈L/2⌉)
    Delete all intermediate work, saving only the return value y
    If both x and y are equal to "yes", then return "yes"
  Endfor
  If "yes" was not returned for any C' then
    Return "no"
  Endif
Endif

Again, note that this procedure solves a generalization of our original question, which simply asked for Path(C0, C*, 2^n). This does mean, however, that we should remember to view L as an exponentially large parameter: log L = n.

The procedure is correct by induction on L: if there is a sequence of at most L operators from C1 to C2, then there is a configuration C' that this sequence reaches within its first ⌈L/2⌉ steps, and from which it reaches C2 within at most another ⌈L/2⌉ steps; by induction, Path(C1, C', ⌈L/2⌉) and Path(C', C2, ⌈L/2⌉) will both return "yes," and so Path(C1, C2, L) will return "yes." Conversely, if there is a configuration C' so that Path(C1, C', ⌈L/2⌉) and Path(C', C2, ⌈L/2⌉) both return "yes," then the induction hypothesis implies that there exist corresponding sequences of operators; concatenating these two sequences, we obtain a sequence of operators from C1 to C2 of length at most L.

Now we consider the space requirements. Aside from the space spent inside recursive calls, each invocation of Path involves an amount of space polynomial in n, k, and log L. But at any given point in time, only a single recursive call is active, and the intermediate work from all other recursive calls has been deleted. Thus, for a polynomial function p, the space requirement S(n, k, L) satisfies the recurrence

S(n, k, L) ≤ p(n, k, log L) + S(n, k, ⌈L/2⌉),
S(n, k, 1) ≤ p(n, k, 1).

Unwinding the recurrence for O(log L) levels, we obtain the bound S(n, k, L) = O(log L · p(n, k, log L)), which is a polynomial in n, k, and log L. ∎

If dynamic programming has an opposite, this is it. Back when we were solving problems by dynamic programming, the fundamental principle was to save all the intermediate work, so you don't have to recompute it. Now that conserving space is our goal, we have just the opposite priorities: throw away all the intermediate work, since it's just taking up space and it can always be recomputed.

As we saw when we designed the space-efficient Sequence Alignment Algorithm, the best strategy often lies somewhere in between, motivated by these two approaches: throw away some of the intermediate work, but not so much that you blow up the running time.
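Here is a minimal sketch of the Path procedure over an abstract one-step relation. The interface, and the extra equality test in the base case that lets sequences shorter than L count, are our own choices for the illustration.

```python
# Midpoint recursion in the style of Path: each level tries every midpoint
# configuration and keeps only two yes/no answers, so only O(log L) levels
# of live information exist at any time.

def path(c1, c2, L, step, all_configs):
    if L <= 1:
        return c1 == c2 or step(c1, c2)     # equality handles paths shorter than L
    half = (L + 1) // 2                     # ceil(L / 2)
    for mid in all_configs():               # e.g. enumerate configurations via an n-bit counter
        if path(c1, mid, half, step, all_configs) and path(mid, c2, half, step, all_configs):
            return True                     # only the boolean outcomes were kept
    return False

# Tiny illustration with the n = 3 counter operators from the proof of (9.6).
n = 3
ops = [(frozenset(range(1, i)), frozenset({i}), frozenset(range(1, i)))
       for i in range(1, n + 1)]
step = lambda a, b: any(p <= a and (a | add) - dele == b for p, add, dele in ops)
configs = lambda: (frozenset(s for s in range(1, n + 1) if (m >> (s - 1)) & 1)
                   for m in range(2 ** n))
print(path(frozenset(), frozenset({1, 2, 3}), 2 ** n, step, configs))   # True
```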
Figure 9.2 The reduction from Competitive 3-SAT to Competitive Facility Location.

must form an independent set naturally prevents both vi and vi' from being chosen. At this point, we do not define any other edges.

How do we get the players to set the variables in order--first x1, then x2, and so forth? We place values on v1 and v1' so high that Player 1 will lose instantly if he does not choose them. We place somewhat lower values on v2 and v2', and continue in this way. Specifically, for a value c > k + 2, we define the node values bvi and bvi' to be c^(1+n-i). We define the bound that Player 2 is trying to achieve to be

B = c^(n-1) + c^(n-3) + ... + c^2 + 1.

Let's pause, before worrying about the clauses, to consider the game played on this graph. In the opening move of the game, Player 1 must select one of v1 or v1' (thereby obliterating the other one); for if not, then Player 2 will immediately select one of them on her next move, winning instantly. Similarly, in the second move of the game, Player 2 must select one of v2 or v2'. For otherwise, Player 1 will select one on his next move; and then, even if Player 2 acquired all the remaining nodes in the graph, she would not be able to meet the bound B. Continuing by induction in this way, we see that to avoid an immediate loss, the player making the ith move must select one of vi or vi'. Note that our choice of node values has achieved precisely what we wanted: The players must set the variables in order. And what is the outcome on this graph? Player 2 ends up with a total value of c^(n-1) + c^(n-3) + ... + c^2 = B - 1: she has lost by one unit!

We now complete the analogy with Competitive 3-SAT by giving Player 2 one final move on which she can try to win. For each clause Cj, we define a node cj with value bcj = 1 and an edge associated with each of its terms as

Solved Exercises

Solved Exercise 1
Self-avoiding walks are a basic object of study in the area of statistical physics; they can be defined as follows. Let L denote the set of all points in R^2 with integer coordinates. (We can think of these as the "grid points" of the plane.) A self-avoiding walk W of length n is a sequence of points (p1, p2, ..., pn) drawn from L so that

(i) p1 = (0, 0). (The walk starts at the origin.)
(ii) No two of the points are equal. (The walk "avoids" itself.)
(iii) For each i = 1, 2, ..., n - 1, the points pi and pi+1 are at distance 1 from each other. (The walk moves between neighboring points in L.)

Self-avoiding walks (in both two and three dimensions) are used in physical chemistry as a simple geometric model for the possible conformations of long-chain polymer molecules. Such molecules can be viewed as a flexible chain of beads that flops around, adopting different geometric layouts; self-avoiding walks are a simple combinatorial abstraction for these layouts.

A famous unsolved problem in this area is the following. For a natural number n ≥ 1, let A(n) denote the number of distinct self-avoiding walks of length n. Note that we view walks as sequences of points rather than sets; so two walks can be distinct even if they pass through the same set of points, provided that they do so in different orders. (Formally, the walks (p1, p2, ..., pn) and (q1, q2, ..., qn) are distinct if there is some i (1 ≤ i ≤ n) for which pi ≠ qi.) See Figure 9.3 for an example. In polymer models based on self-avoiding walks, A(n) is directly related to the entropy of a chain molecule,
and so it appears in theories concerning the rates of certain metabolic and organic synthesis reactions.

Figure 9.3 Three distinct self-avoiding walks of length 4. Note that although walks (a) and (b) involve the same set of points, they are considered different walks because they pass through them in a different order.

Despite its importance, no simple formula is known for the value A(n). Indeed, no algorithm is known for computing A(n) that runs in time polynomial in n.

(a) Show that A(n) ≥ 2^(n-1) for all natural numbers n ≥ 1.

(b) Give an algorithm that takes a number n as input, and outputs A(n) as a number in binary notation, using space (i.e., memory) that is polynomial in n.

(Thus the running time of your algorithm can be exponential, as long as its space usage is polynomial. Note also that polynomial here means "polynomial in n," not "polynomial in log n." Indeed, by part (a), we see that it will take at least n - 1 bits to write the value of A(n), so clearly n - 1 is a lower bound on the amount of space you need for producing a correct answer.)

Solution We consider part (b) first. One's first thought is that enumerating all self-avoiding walks sounds like a complicated prospect; it's natural to imagine the search as growing a chain starting from a single bead, exploring possible conformations, and backtracking when there's no way to continue growing and remain self-avoiding. You can picture attention-grabbing screen-savers that do things like this, but it seems a bit messy to write down exactly what the algorithm would be.

So we back up; polynomial space is a very generous bound, and we can afford to take an even more brute-force approach. Suppose that instead of trying just to enumerate all self-avoiding walks of length n, we simply enumerate all walks of length n, and then check which ones turn out to be self-avoiding. The advantage of this is that the space of all walks is much easier to describe than the space of self-avoiding walks.

A walk (p1, p2, ..., pn) on the set L of grid points in the plane can be described by the sequence of directions it takes. Each step from pi to pi+1 in the walk can be viewed as moving in one of four directions: north, south, east, or west. Thus any walk of length n can be mapped to a distinct string of length n - 1 over the alphabet {N, S, E, W}. (The three walks in Figure 9.3 would be ENW, NES, and EEN.) Each such string corresponds to a walk of length n, but not all such strings correspond to walks that are self-avoiding: for example, the walk NESW revisits the point (0, 0).

We can use this encoding of walks for part (b) of the question as follows. Using a counter in base 4, we enumerate all strings of length n - 1 over the alphabet {N, S, E, W}, by viewing this alphabet equivalently as {0, 1, 2, 3}. For each such string, we construct the corresponding walk and test, in polynomial space, whether it is self-avoiding. Finally, we increment a second counter A (initialized to 0) if the current walk is self-avoiding. At the end of this algorithm, A will hold the value of A(n).

Now we can bound the space used by this algorithm as follows. The first counter, which enumerates strings, has n - 1 positions, each of which requires two bits (since it can take four possible values). Similarly, the second counter holding A can be incremented at most 4^(n-1) times, and so it too needs at most 2n bits. Finally, we use polynomial space to check whether each generated walk is self-avoiding, but we can reuse the same space for each walk, and so the space needed for this is polynomial as well.
The encoding scheme also provides a way to answer part (a). We observe that all walks that can be encoded using only the letters {N, E} are self-avoiding, since they only move up and to the right in the plane. As there are 2^(n-1) strings of length n - 1 over these two letters, there are at least 2^(n-1) self-avoiding walks; in other words, A(n) ≥ 2^(n-1).

(Note that we argued earlier that our encoding technique also provides an upper bound, showing immediately that A(n) ≤ 4^(n-1).)

Exercises

1. Let's consider a special case of Quantified 3-SAT in which the underlying Boolean formula has no negated variables. Specifically, let Φ(x1, ..., xn) be a Boolean formula of the form

C1 ∧ C2 ∧ ... ∧ Ck,

where each Ci is a disjunction of three terms. We say Φ is monotone if each term in each clause consists of a nonnegated variable--that is, each term is equal to xi, for some i, rather than x̄i.

We define Monotone QSAT to be the decision problem

∃x1 ∀x2 ... ∃xn-2 ∀xn-1 ∃xn Φ(x1, ..., xn)?

where the formula Φ is monotone.

Do one of the following two things: (a) prove that Monotone QSAT is PSPACE-complete; or (b) give an algorithm to solve arbitrary instances of Monotone QSAT that runs in time polynomial in n. (Note that in (b), the goal is polynomial time, not just polynomial space.)

2. Consider the following word game, which we'll call Geography. You have a set of names of places, like the capital cities of all the countries in the world. The first player begins the game by naming the capital city c of the country the players are in; the second player must then choose a city c' that starts with the letter on which c ends; and the game continues in this way, with each player alternately choosing a city that starts with the letter on which the previous one ended. The player who loses is the first one who cannot choose a city that hasn't been named earlier in the game.

For example, a game played in Hungary would start with "Budapest," and then it could continue (for example), "Tokyo, Ottawa, Ankara, Amsterdam, Moscow, Washington, Nairobi, ..."

This game is a good test of geographical knowledge, of course, but even with a list of the world's capitals sitting in front of you, it's also a major strategic challenge. Which word should you pick next, to try to force your opponent into a situation where they'll be the one who's ultimately stuck without a move?

To highlight the strategic aspect, we define the following abstract version of the game, which we call Geography on a Graph. Here, we have a directed graph G = (V, E), and a designated start node s ∈ V. Players alternate turns starting from s; each player must, if possible, follow an edge out of the current node to a node that hasn't been visited before. The player who loses is the first one who cannot move to a node that hasn't been visited earlier in the game. (There is a direct analogy to Geography, with nodes corresponding to words.) In other words, a player loses if the game is currently at node v, and for all edges of the form (v, w), the node w has already been visited.

Prove that it is PSPACE-complete to decide whether the first player can force a win in Geography on a Graph.

3. Give a polynomial-time algorithm to decide whether a player has a forced win in Geography on a Graph, in the special case when the underlying graph G has no directed cycles (in other words, when G is a DAG).

Notes and Further Reading
PSPACE is just one example of a class of intractable problems beyond NP; charting the landscape of computational hardness is the goal of the field of complexity theory. There are a number of books that focus on complexity theory; see, for example, Papadimitriou (1995) and Savage (1998).

The PSPACE-completeness of QSAT is due to Stockmeyer and Meyer (1973).

Some basic PSPACE-completeness results for two-player games can be found in Schaefer (1978) and in Stockmeyer and Chandra (1979). The Competitive Facility Location Problem that we consider here is a stylized example of a class of problems studied within the broader area of facility location; see, for example, the book edited by Drezner (1995) for surveys of this topic.

Two-player games have provided a steady source of difficult questions for researchers in both mathematics and artificial intelligence. Berlekamp, Conway, and Guy (1982) and Nowakowski (1998) discuss some of the mathematical questions in this area. The design of a world-champion-level chess program was for fifty years the foremost applied challenge problem in the field of computer game-playing. Alan Turing is known to have worked on devising algorithms to play chess, as did many leading figures in artificial intelligence over the years. Newborn (1996) gives a readable account of the history of work
on this problem, covering the state of the art up to a year before IBM's Deep Blue finally achieved the goal of defeating the human world champion in a match.

Planning is a fundamental problem in artificial intelligence; it features prominently in the text by Russell and Norvig (2002) and is the subject of a book by Ghallab, Nau, and Traverso (2004). The argument that Planning can be solved in polynomial space is due to Savitch (1970), who was concerned with issues in complexity theory rather than the Planning Problem per se.

Notes on the Exercises Exercise 1 is based on a problem we learned from Maverick Woo and Ryan Williams; Exercise 2 is based on a result of Thomas Schaefer.

10 Extending the Limits of Tractability

We use the term NP-hard to mean "at least as hard as an NP-complete problem." We avoid referring to optimization problems as NP-complete, since technically this term applies only to decision problems.
even when we are not able to establish any provable guarantees about their behavior.

But to start, we explore some situations in which one can exactly solve instances of NP-complete problems with reasonable efficiency. How do these situations arise? The point is to recall the basic message of NP-completeness: the worst-case instances of these problems are very difficult and not likely to be solvable in polynomial time. On a particular instance, however, it's possible that we are not really in the "worst case"--maybe, in fact, the instance we're looking at has some special structure that makes our task easier. Thus the crux of this chapter is to look at situations in which it is possible to quantify some precise senses in which an instance may be easier than the worst case, and to take advantage of these situations when they occur.

We'll look at this principle in several concrete settings. First we'll consider the Vertex Cover Problem, in which there are two natural "size" parameters for a problem instance: the size of the graph, and the size of the vertex cover being sought. The NP-completeness of Vertex Cover suggests that we will have to be exponential in (at least) one of these parameters; but judiciously choosing which one can have an enormous effect on the running time.

Next we'll explore the idea that many NP-complete graph problems become polynomial-time solvable if we require the input to be a tree. This is a concrete illustration of the way in which an input with "special structure" can help us avoid many of the difficulties that can make the worst case intractable. Armed with this insight, one can generalize the notion of a tree to a more general class of graphs--those with small tree-width--and show that many NP-complete problems are tractable on this more general class as well.

Having said this, we should stress that our basic point remains the same as it has always been: Exponential algorithms scale very badly. The current chapter represents ways of staving off this problem that can be effective in various settings, but there is clearly no way around it in the fully general case. This will motivate our focus on approximation algorithms and local search in subsequent chapters.

10.1 Finding Small Vertex Covers
Let us briefly recall the Vertex Cover Problem, which we saw in Chapter 8 when we covered NP-completeness. Given a graph G = (V, E) and an integer k, we would like to find a vertex cover of size at most k--that is, a set of nodes S ⊆ V of size |S| ≤ k, such that every edge e ∈ E has at least one end in S.

Like many NP-complete decision problems, Vertex Cover comes with two parameters: n, the number of nodes in the graph, and k, the allowable size of a vertex cover. This means that the range of possible running-time bounds is much richer, since it involves the interplay between these two parameters.

The Problem
Let's consider this interaction between the parameters n and k more closely. First of all, we notice that if k is a fixed constant (e.g., k = 2 or k = 3), then we can solve Vertex Cover in polynomial time: We simply try all subsets of V of size k, and see whether any of them constitute a vertex cover. There are (n choose k) subsets, and each takes time O(kn) to check whether it is a vertex cover, for a total time of O(kn · (n choose k)) = O(kn^(k+1)). So from this we see that the intractability of Vertex Cover only sets in for real once k grows as a function of n.

However, even for moderately small values of k, a running time of O(kn^(k+1)) is quite impractical. For example, if n = 1,000 and k = 10, then on a computer executing a million high-level instructions per second, it would take at least 10^24 seconds to decide if G has a k-node vertex cover--which is several orders of magnitude larger than the age of the universe. And this is for a small value of k, where the problem was supposed to be more tractable! It's natural to start asking whether we can do something that is practically viable when k is a small constant.

It turns out that a much better algorithm can be developed, with a running-time bound of O(2^k · kn). There are two things worth noticing about this. First, plugging in n = 1,000 and k = 10, we see that our computer should be able to execute the algorithm in a few seconds. Second, we see that as k grows, the running time is still increasing very rapidly; it's simply that the exponential dependence on k has been moved out of the exponent on n and into a separate function. From a practical point of view, this is much more appealing.

Designing the Algorithm
As a first observation, we notice that if a graph has a small vertex cover, then it cannot have very many edges. Recall that the degree of a node is the number of edges that are incident to it.

(10.1) If G = (V, E) has n nodes, the maximum degree of any node is at most d, and there is a vertex cover of size at most k, then G has at most kd edges.

Proof. Let S be a vertex cover in G of size k' ≤ k. Every edge in G has at least one end in S; but each node in S can cover at most d edges. Thus there can be at most k'd ≤ kd edges in G. ∎

Since the degree of any node in a graph can be at most n - 1, we have the following simple consequence of (10.1).
(10.2) If G = (V, E) has n nodes and a vertex cover of size k, then G has at most k(n - 1) ≤ kn edges.

So, as a first step in our algorithm, we can check if G contains more than kn edges; if it does, then we know that the answer to the decision problem--Is there a vertex cover of size at most k?--is no. Having done this, we will assume that G contains at most kn edges.

The idea behind the algorithm is conceptually very clean. We begin by considering any edge e = (u, v) in G. In any k-node vertex cover S of G, one of u or v must belong to S. Suppose that u belongs to such a vertex cover S. Then if we delete u and all its incident edges, it must be possible to cover the remaining edges by at most k - 1 nodes. That is, defining G-{u} to be the graph obtained by deleting u and all its incident edges, there must be a vertex cover of size at most k - 1 in G-{u}. Similarly, if v belongs to S, this would imply there is a vertex cover of size at most k - 1 in G-{v}.

Here is a concrete way to formulate this idea.

(10.3) Let e = (u, v) be any edge of G. The graph G has a vertex cover of size at most k if and only if at least one of the graphs G-{u} and G-{v} has a vertex cover of size at most k - 1.

Proof. First, suppose G has a vertex cover S of size at most k. Then S contains at least one of u or v; suppose that it contains u. The set S-{u} must cover all edges that have neither end equal to u. Therefore S-{u} is a vertex cover of size at most k - 1 for the graph G-{u}.

Conversely, suppose that one of G-{u} and G-{v} has a vertex cover of size at most k - 1--suppose in particular that G-{u} has such a vertex cover T. Then the set T ∪ {u} covers all edges in G, so it is a vertex cover for G of size at most k. ∎

This gives us the following recursive procedure for deciding whether G has a k-node vertex cover.

To search for a k-node vertex cover in G:
If G contains no edges, then the empty set is a vertex cover
If G contains more than kn edges, then it has no k-node vertex cover
Else let e = (u, v) be an edge of G
  Recursively check if either of G-{u} or G-{v} has a vertex cover of size k - 1
  If neither of them does, then G has no k-node vertex cover
  Else, one of them (say, G-{u}) has a (k - 1)-node vertex cover T
    In this case, T ∪ {u} is a k-node vertex cover of G
  Endif
Endif

Analyzing the Algorithm
Now we bound the running time of this algorithm. Intuitively, we are searching a "tree of possibilities"; we can picture the recursive execution of the algorithm as giving rise to a tree, in which each node corresponds to a different recursive call. A node corresponding to a recursive call with parameter k has, as children, two nodes corresponding to recursive calls with parameter k - 1. Thus the tree has a total of at most 2^(k+1) nodes. In each recursive call, we spend O(kn) time.

Thus, we can prove the following.

(10.4) The running time of the Vertex Cover Algorithm on an n-node graph, with parameter k, is O(2^k · kn).

We could also prove this by a recurrence as follows. If T(n, k) denotes the running time on an n-node graph with parameter k, then T(·, ·) satisfies the following recurrence, for some absolute constant c:

T(n, 1) ≤ cn,
T(n, k) ≤ 2T(n, k - 1) + ckn.

By induction on k ≥ 1, it is easy to prove that T(n, k) ≤ c · 2^k kn. Indeed, if this is true for k - 1, then T(n, k) ≤ 2T(n, k - 1) + ckn ≤ 2c · 2^(k-1)(k - 1)n + ckn ≤ c · 2^k kn.
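A compact implementation of the whole procedure, with the edge-count test from (10.2) as a pruning step and the branching rule from (10.3), might look as follows; the function name and input format are our own.

```python
# Bounded search tree for k-node Vertex Cover; edges are 2-element tuples.

def vertex_cover_of_size(edges, k):
    """Return a vertex cover of size <= k if one exists, and None otherwise."""
    edges = set(frozenset(e) for e in edges)
    if not edges:
        return set()                          # no edges: the empty set covers everything
    nodes = {v for e in edges for v in e}
    if k == 0 or len(edges) > k * len(nodes):
        return None                           # too many edges for any k-node cover, as in (10.2)
    u, v = tuple(next(iter(edges)))           # branch on an arbitrary edge (u, v), as in (10.3)
    for w in (u, v):
        remaining = {e for e in edges if w not in e}
        cover = vertex_cover_of_size(remaining, k - 1)
        if cover is not None:
            return cover | {w}
    return None

# A 4-cycle has a vertex cover of size 2 but not of size 1.
square = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(vertex_cover_of_size(square, 2))   # e.g. {1, 3}
print(vertex_cover_of_size(square, 1))   # None
```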
Figure 10.3 (a) Cutting through the cycle in an instance of Circular-Arc Coloring, and then unrolling it so it becomes, in (b), a collection of intervals on a line.

pieces of the same path Pi on G, it's not clear how to take the differing colors of Pi' and Pi'' and infer from this how to color Pi on G. For example, having sliced open the cycle in Figure 10.3(a), we get the set of intervals pictured in Figure 10.3(b). Suppose we compute a coloring so that the intervals in the first row get the color 1, those in the second row get the color 2, and those in the third row get the color 3. Then we don't have an obvious way to figure out a color for a and c.

This suggests a way to formalize the relationship between the instance of Circular-Arc Coloring in G and the instance of Interval Coloring in G'.

(10.9) The paths in G can be k-colored if and only if the paths in G' can be k-colored subject to the restriction that Pi' and Pi'' receive the same color, for each i = 1, 2, ..., k.

Proof. If the paths in G can be k-colored, then we simply use these as the colors in G', assigning each of Pi' and Pi'' the color of Pi. In the resulting coloring, no two paths with the same color have an edge in common.

Conversely, suppose the paths in G' can be k-colored subject to the additional restriction that Pi' and Pi'' receive the same color, for each i = 1, 2, ..., k. Then we assign path Pi (for i ≤ k) the common color of Pi' and Pi''; and we assign path Pj (for j > k) the color that Pj gets in G'. Again, under this coloring, no two paths with the same color have an edge in common. ∎

We've now transformed our problem into a search for a coloring of the paths in G' subject to the condition in (10.9): The paths Pi' and Pi'' (for 1 ≤ i ≤ k) should get the same color.

Before proceeding, we introduce some further terminology that makes it easier to talk about algorithms for this problem. First, since the names of the colors are arbitrary, we can assume that path Pi is assigned the color i for each i = 1, 2, ..., k. Now, for each edge ei = (vi, vi+1), we let Si denote the set of paths that contain this edge. A k-coloring of just the paths in Si has a very simple structure: it is simply a way of assigning exactly one of the colors {1, 2, ..., k} to each of the k paths in Si. We will think of such a k-coloring as a one-to-one function f : Si → {1, 2, ..., k}.

Here's the crucial definition: We say that a k-coloring f of Si and a k-coloring g of Sj are consistent if there is a single k-coloring of all the paths that is equal to f on Si, and also equal to g on Sj. In other words, the k-colorings f and g on restricted parts of the instance could both arise from a single k-coloring of the whole instance. We can state our problem in terms of consistency as follows: If f' denotes the k-coloring of S0 that assigns color i to Pi', and f'' denotes the k-coloring of Sn that assigns color i to Pi'', then we need to decide whether f' and f'' are consistent.

Searching for an Acceptable Interval Coloring It is not clear how to decide the consistency of f' and f'' directly. Instead, we adopt a dynamic programming approach by building up the solution through a series of subproblems.

The subproblems are as follows: For each set Si, working in order over i = 0, 1, 2, ..., n, we will compute the set Fi of all k-colorings on Si that are consistent with f'. Once we have computed Fn, we need only check whether it contains f'' in order to answer our overall question: whether f' and f'' are consistent.
For a k-coloring f of Si, we say that a k-coloring g of Si+1 is an extension of f if all the paths in Si ∩ Si+1 have the same colors with respect to f and g. It is easy to check that if g is an extension of f, and f is consistent with f', then so is g. On the other hand, suppose some coloring g of Si+1 is consistent with f'; in other words, there is a coloring h of all paths that is equal to f' on S0 and is equal to g on Si+1. Then, if we consider the colors assigned by h to paths in Si, we get a coloring f ∈ Fi, and g is an extension of f.

This proves the following fact.

(10.10) The set Fi+1 is equal to the set of all extensions of k-colorings in Fi.

So, in order to compute Fi+1, we simply need to list all extensions of all colorings in Fi. For each f ∈ Fi, this means that we want a list of all colorings g of Si+1 that agree with f on Si ∩ Si+1. To do this, we simply list all possible ways of assigning the colors of Si - Si+1 (with respect to f) to the paths in Si+1 - Si. Merging these lists for all f ∈ Fi then gives us Fi+1.

Thus the overall algorithm is as follows.

To determine whether f' and f'' are consistent:
  Define F0 = {f'}
  For i = 1, 2, ..., n
    For each f ∈ Fi-1
      Add all extensions of f to Fi
    Endfor
  Endfor
  Check whether f'' is in Fn

Figure 10.4 shows the results of executing this algorithm on the example of Figure 10.3. As with all the dynamic programming algorithms we have seen in this book, the actual coloring can be computed by tracing back through the steps that built up the sets F1, F2, ....
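One round of this computation is easy to write down explicitly. The sketch below uses our own naming: colorings are dictionaries from path names to colors, and the uniform-depth assumption (each Si has exactly k paths) is taken for granted.

```python
# Computing F_{i+1} from F_i as in (10.10).

from itertools import permutations

def extensions(f, S_i, S_next):
    shared = S_i & S_next
    leaving_colors = [f[p] for p in sorted(S_i - S_next)]   # colors freed by paths that end
    entering = sorted(S_next - S_i)                          # paths that begin at edge i+1
    for perm in permutations(leaving_colors):
        g = {p: f[p] for p in shared}                        # agree with f on the overlap
        g.update(zip(entering, perm))
        yield g

def next_level(F_i, S_i, S_next):
    # Merge the extension lists of all colorings in F_i, removing duplicates.
    seen, F_next = set(), []
    for f in F_i:
        for g in extensions(f, S_i, S_next):
            key = tuple(sorted(g.items()))
            if key not in seen:
                seen.add(key)
                F_next.append(g)
    return F_next

# Tiny example with k = 2: paths a, b cross edge 0; paths a, c cross edge 1.
F1 = next_level([{'a': 1, 'b': 2}], {'a', 'b'}, {'a', 'c'})
print(F1)    # [{'a': 1, 'c': 2}] -- c must inherit the color freed by b
```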
We will discuss the running time of this algorithm in a moment. First, however, we show how to remove the assumption that the input instance has uniform depth.

Removing the Uniform-Depth Assumption Recall that the algorithm we just designed assumes that for each edge e, exactly k paths contain e. In general, each edge may carry a different number of paths, up to a maximum of k. (If there were an edge contained in k + 1 paths, then all these paths would need a different color, and so we could immediately conclude that the input instance is not colorable with k colors.)

It is not hard to modify the algorithm directly to handle the general case, but it is also easy to reduce the general case to the uniform-depth case. For each edge ei that carries only ki < k paths, we add k - ki paths that consist only of the single edge ei. We now have a uniform-depth instance, and we claim

(10.11) The original instance can be colored with k colors if and only if the modified instance (obtained by adding single-edge paths) can be colored with k colors.

Proof. Clearly, if the modified instance has a k-coloring, then we can use this same k-coloring for the original instance (simply ignoring the colors it assigns to the single-edge paths that we added). Conversely, suppose the original instance has a k-coloring f. Then we can construct a k-coloring of the modified instance by starting with f and considering the extra single-edge paths one at a time, assigning any free color to each of these paths as we consider them.
Analyzing the Algorithm
Finally, we bound the running time of the algorithm. This is dominated by the time to compute the sets F1, F2, ..., Fn. To build one of these sets Fi+1, we need to consider each coloring f ∈ Fi, and list all permutations of the colors that f assigns to paths in Si - Si+1. Since Si has k paths, the number of colorings in Fi is at most k!. Listing all permutations of the colors that f assigns to Si - Si+1 also involves enumerating a set of size ℓ!, where ℓ ≤ k is the size of Si - Si+1. Thus the total time to compute Fi+1 from Fi has the form O(f(k)) for a function f(·) that depends only on k. Over the n iterations of the outer loop to compute F1, F2, ..., Fn, this gives a total running time of O(f(k) · n), as desired.

This concludes the description and analysis of the algorithm. We summarize its properties in the following statement.

(10.12) The algorithm described in this section correctly determines whether a collection of paths on an n-node cycle can be colored with k colors; and its running time is O(f(k) · n) for a function f(·) that depends only on k.

Looking back on it, then, we see that the running time of the algorithm came from the intuition we described at the beginning of the section: For each i, the subproblems based on computing Fi and Fi+1 fit together along the "narrow" interface consisting of the paths in just Si and Si+1, each of which has size at most k. Thus the time needed to go from one to the other could be made to depend only on k, and not on the size of the cycle G or on the number of paths.

10.4 Tree Decompositions of Graphs
In the previous two sections, we've seen how particular NP-hard problems (specifically, Maximum-Weight Independent Set and Graph Coloring) can be solved when the input has a restricted structure. When you find yourself in this situation--able to solve an NP-complete problem in a reasonably natural special case--it's worth asking why the approach doesn't work in general. As we discussed in Sections 10.2 and 10.3, our algorithms in both cases were taking advantage of a particular kind of structure: the fact that the input could be broken down into subproblems with very limited interaction.

For example, to solve Maximum-Weight Independent Set on a tree, we took advantage of a special property of (rooted) trees: Once we decide whether or not to include a node u in the independent set, the subproblems in each subtree become completely separated; we can solve each as though the others did not exist. We don't encounter such a nice situation in general graphs, where there might not be a node that "breaks the communication" between subproblems in the rest of the graph. Rather, for the Independent Set Problem in general graphs, decisions we make in one place seem to have complex repercussions all across the graph.

So we can ask a weaker version of our question instead: For how general a class of graphs can we use this notion of "limited interaction"--recursively chopping up the input using small sets of nodes--to design efficient algorithms for a problem like Maximum-Weight Independent Set?

In fact, there is a natural and rich class of graphs that supports this type of algorithm; they are essentially "generalized trees," and for reasons that will become clear shortly, we will refer to them as graphs of bounded tree-width. Just as with trees, many NP-complete problems are tractable on graphs of bounded tree-width; and the class of graphs of bounded tree-width turns out to have considerable practical value, since it includes many real-world networks on which NP-complete graph problems arise. So, in a sense, this type of graph serves as a nice example of finding the "right" special case of a problem that simultaneously allows for efficient algorithms and also includes graphs that arise in practice.

In this section, we define tree-width and give the general approach for solving problems on graphs of bounded tree-width. In the next section, we discuss how to tell whether a given graph has bounded tree-width.

Defining Tree-Width
We now give a precise definition for this class of graphs that is designed to generalize trees. The definition is motivated by two considerations. First, we want to find graphs that we can decompose into disconnected pieces by removing a small number of nodes; this allows us to implement dynamic programming algorithms of the type we discussed earlier. Second, we want to make precise the intuition conveyed by "tree-like" drawings of graphs as in Figure 10.5(b).

We want to claim that the graph G pictured in this figure is decomposable in a tree-like way, along the lines that we've been considering. If we were to encounter G as it is drawn in Figure 10.5(a), it might not be immediately clear why this is so. In the drawing in Figure 10.5(b), however, we see that G is really composed of ten interlocking triangles; and seven of the ten triangles have the property that if we delete them, then the remainder of G falls apart into disconnected pieces that recursively have this interlocking-triangle structure. The other three triangles are attached at the extremities, and deleting them is sort of like deleting the leaves of a tree.
Chapter 10 Extending the Limits of Tractability 10.4 Tree Decompositions of Graphs 575
574
o (Node Coverage) Every node of G belongs to at least one piece Vt.
o (Edge Coverage) For every edge e of G, there is some piece Vt containing
both ends of e.
(Coherence) Let tl, t2, and t3 be three nodes of T such that tz lies on the
path from t1 to t3. Then, if a node v of G belongs to both Vtl and Vt3, it
also belongs to Vt~_.
It’s worth checkin~g that the tree in Figure 10.5(c) is a tree decomposition of
the graph using the ten triangles as the pieces.
Next consider the case when the graph G is a tree. We can build a tree
C decomposition of it as follows. The decomposition tree T has a node tv for each
node V of G, and a node te for each edge e of G. The tree T has an edge (tv, te)
(c) when v is an end of e. Finally, if v is a node, then we define the piece Vt, = IV};
(a) (b) and if e = (u, v) is an edge, then we define the piece Vte= [u, v}. One can now
check that the three properties in the definition of a tree decomposition are
Figure 10.5 Parts (a) and (b) depict the same graph drawn in different ways. The drawing
in (b) emphasizes the way in which it is composed of ten interlocking triangles. Part (c) satisfied.
illustrates schematically how these ten triangles "fit together."
Gtl
No edge (u, u)
No edge (10.18) The value of it(U) is given "by the following recurrence:
d
it(U) = w(U) + ~ max{ft~(U~) _ w(U~ ~ U)"
More than
Proof. Suppose, by way of contradiction, that G has a (w + 1)-linked set X of ~2w elements
size at least 3w, and it also has a tree decomposition (T, {Vt}) of width less of X
than w; in other words, each piece Vt has size at most w. We may further
assume that (T, {Vt}) is nonredundant.
The idea of the proof is to find a piece Vt that is "centered" with respect
to X, so that when some part of Vt is deleted from G, one small subset of X is
separated from another. Since Vt has size at most w, this will contradict our
assumption that X is (w ÷ D-linked.
So how do we find this piece Vt? We first root the tree T at a node r; using Between tv and 2u~ elements of X
the same notation as before, we let Tt denote the subtree rooted at’a node
t, and write Gt for G5. Now let t be a node that is as far from the root r as Figure 10.9 The final step in the proof of (10.20).
possible, subject to the condition that Gt contains more than 2w nodes of X.
Clearly, t is not a leaf (or else Gt could contain at most w nodes of X); so
let tl ..... td be the children of t. Note that since each ti is farther than t from An Algorithm to Search for a Low, Width Tree Decomposition Building on
the root, each subgraph Gt~ contains at most 2w nodes of X. If there is a child ti these ideas, we now give a greedy algorithm for constructing a tree decomposi-
so that Gt~ contains at least w nodes of X, then we can define Y to be w nodes tion of low width. The algorithm will not precisely determine the tree-width of
of X belonging to Gti, and Z to be w nodes of X belonging to G-Gt~. Since the input graph G = (V, E); rather, given a parameter w, either it wil] produce
(T, {Vt}) is nonredundant, S = Vt~ N Vt has size at most w 1; but by (10.14), a tree decomposition of width less than 4w, or it will discover a (w + D-linked
deleting S disconnects Y-S from Z-S. This contradicts our assumption that set of size at least 3w. In the latter case, this constitutes a proof that the tree-
X is (w + 1)-linked. width of G is at least w, by (10.20); so our algorithm is essentially capable of
So we consider the case in which there is no child ti such that Gti contains narrowing down the true tree-width of G to within a factor of 4. As discussed
at least w nodes of X; Figure 10.9 suggests the structure of the argument in earlier, the running time will have the form O(f(w) ¯ ran), where rn and n are
this case. We begin with the node set of Gtl, combine it with Gt2, then Gt3, and the number of edges and nodes of G, and f(.) depends only on w.
so forth, unti! we first obtain a set of nodes containing more than w members Having worked with tree decompositions for a little while now, one can
of X. This will clearly happen by the time we get to Gtd, since Gt contains start imagining what might be involved in constructing one for an arbitrary
more than 2w nodes of X, and at most w of them can belong to Vt. So suppose input graph G. The process is depicted at a high level in Figure !0.!0. Our goal
our process of combining Gq, Gt2 .... first yields more than w members of X is to make G fall apart into tree-like portions; we begin the decomposition
once we reach index i _< d. Let W denote the set of nodes in the subgraphs by placing the first piece Vt anywhere. Now, hopefully, G-Vt consists of
Gt~, Gt2 ..... Gt~. By our stopping condition, we have IW C~XI > w. But since several disconnected components; we recursively move into each of these
Gti contains fewer than tv nodes of X, we also have I W N XI < 2w. Hence we components, placing a piece in each so that it partially overlaps the piece
can define Y to be tv + 1 nodes of X belonging to W, and Z to be tv + 1 nodes Vt that we’ve already defined. We hope that these new pieces cause the graph
of X belonging to V-W. By (10.13), the piece Vt is now a set of size at most to break up further, and we thus continue in this way, push~g forward with
tv whose deletion disconnects Y-Vt from Z-Vt. Again this contradicts our small sets while the graph breaks apart in front of us. The key to making this
assumption that X is (w + 1)-linked, completing the proof. [] algorithm work is to argue the following: If at some point we get stuck, and our
Chapter 10 Extending the Limits of Tractability 10.5 Constructing a Tree Decomposition
588 589
Why is this invariant so useful? It’s useful because it will let us add a new
node s to T and grow a new piece Vs in the component C, with the confidence
that s can be a leaf hanging off t in the larger partial tree decomposition.
Moreover, (,) requires there be at most 3w neighbors, while we are trying to
produce a tree decomposition of width less than 4w; this extra w gives our
Step 2 new piece "room" to expand by a little as it moves into C.
Step 3
Specifically, we now describe how to add a new node and a new piece
so that we still" have a partial tree decomposition, the invariant (,) is still
Step 1 maintained, and the set U has grown strictly larger. In this way, we make at
least one node’s worth of progress, and so the algorithm will terminate in at
most n iterations with a tree decomposition of the whole graph G.
Figure 10.11 Adding a new piece to the partial tree decomposition. Solved Exercises
Solved Exercise 1
We define a new piece Vs = X U S’, making s a leaf of t. All the edges As we’ve seen, 3-SAT is often used to model complex planning and decision-
from S’ into U have their ends in X, and IX U $/I < 3w + w = 4w, so we still making problems in artificial intelligence: the variables represent binary de-
have a partial tree decomposition. Moreover, the set of nodes co~ered by our cisions to be made, and the clauses represent constraints on these decisions.
partial tree decomposition has grown, since S’ A C is not empty. So we will be Systems that work with instances of 3-SAT often need to representMtuations
done if we can show that the invariant (*) still holds. This brings us exactly
in which some decisions have been made while others are still undetermined,
the intuition we tried to capture when discussing Figure 10.10: As we add the and for this purpose it is useful to introduce the notion of a partial assignment
new piece X U S’, we are hoping that the component C breaks up into further of truth values to variables.
components in a nice way.
Concretely, our partial tree decomposition now covers U U S’; and where Xn}, we say
that a partial assignment for X is an assignment of the value O, 1, or ? to each
we previously had a component C of G- U, we now may have several compo- xi; in other words, it is a function p : X --+ {0, 1, ?}. We say that a variable xi
nents C’ _ C of G- (U U S’). Each of these components C’ has all its. neighbors in is determined by the partial assignment if it receives the value 0 or 1, and
X U S’; but we must additionally make sure there are at most 3w such neigh- undetermined if it receives the value ?. We can think of a partial assignment
bors, so that the invariant (,) continues to hold. So consider one of these as choosing a truth value of 0 or 1 for each of its determined variables, and
components C’. We claim that all its neighbors in X U S’ actually belong to leaving the troth value of each undetermined variable up in the air.
one of the two subsets (X-Z) U S’ or (X-Y) U S~, and each of these sets has
size at most IXI < 3w. For, if this did not hold, then C’ would have a neighbor C,~, each a disjunction of
in both Y-S and Z-S, and hence there would be a path, through C’, from three distinct terms, we may be interested in whether a partial assignment is
sufficient to "force" the collection of clauses to be satisfied, regardless of how
Y-S to Z-S in G-S. But we have already argued that there cannot be such
a path. This establishes that (,) still holds after the addition of the new piece we set the undetermined variables. Similarly, we may be interested in whether
and completes the argument that the algorithm works correctly. there exists a partial assignment with only a few determined variables that
can force the collection of clauses to be satisfied; this smal! set of determined
Finally, what is the running time of the algorithm.~ The time to add a new
variables can be viewed as highly "influential," since their outcomes alone can
piece to the partial tree decomposition is dominated by the time required to be enough to force the satisfaction of the clauses.
check whether X is (w + 1)-linked, which is O(f(w). m). We do this for at
Chapter 10 Extending the Limits of Tractability Solved Exercises 593
592
following truth assignment v: v agrees with p on all determined variables, it
For example, suppose we are given clauses
assigns an arbitrary truth value to each undetermined variable not appearing
in Ci, and it sets each undetermined variable in Ci in a way that fails to satisfy
it. We observe that v sets each of the variables in Ci so as not to satisfy it, and
Then the partia! assignment that sets x1 to 1, sets x3 to 0, and sets a~ other
hence v is not a satisfying assignment. But v is consistent with p, and so it
variables to ? has only two determined variables, but it forces the collection
follows that p is not a forcing partial assignment. []
of clauses to be satisfied: No matter how we set the remaining four variables,
the clauses will be satisfied.
In view of (10.22), we have a problem that is very much like the search
Here’s a way to formalize this. Recall that a truth assignment for X is an
for small vertex cbvers at the beginning of the chapter. There we needed to
assignment of the value 0 or I to each xi; in other words, it must select a truth find a set of nodes that covered all edges, and we were limited to choosing at
value for every variable and not leave any variables undetermined. We. say that
most k nodes. Here we need to find a set of variables that covers all clauses
a truth assignment v is consistent with a partial assignment p if each variable (and with the right true/false values), and we’re limited to choosing at most
that is determined in p has the same truth value in both p and v. (In other
b variables.
words, if p(xi) #?, then p(xi) = v(xi).) Finally, we say that a partial assignment
So let’s try an analogue of the approach we used for finding a smal! vertex
p forces the collection of clauses C1 ..... Cm if, for every truth assignment v
cover. We pick an arbitrary clause Ce, containing xi, xj, and xk (each possibly
that is consistent with p, it is the case that v satisfies C1 ..... Cm. (We will also
negated). We know from (10.22) that any forcing assignment p must set one
call p a forcing partial assignment.)
of these three variables the way it appears in Ce, and so we can try all three
Motivated by the issues raised above, here’s the question. W6 are given a of these possibilities. Suppose we set x~ the way it appears in C~; we can then
collection of Boolean variables X = {x~, x2 ..... xn}, a parameter b < n, and eliminate from the instance all clauses (including Ce) that are satisfied by this
a collection of clauses C~ ..... Cm over the variables, where each clause is a assignment to xi, and consider trying to satisfy what’s left. We call this smaller
disjunction of three distinct terms. We want to decide whether there exists a set of clauses the instance reduced by the assignment to x~. We can do the same
forcing partial assignment p for X, such that at most b variables are determined for x] and xk. Since p must determine one of these three variables the way they
by p. Give an algorithm that solves this problem with a running time of the appear in Ce, and then still satisfy what’s left, we have justified the following
form Off(b). p(n, m)), where p(-) is a polynomial function, and f(-) is an analogue of (10.3). (To make the terminology a bit easier to discuss, we say
arbitrary function that depends only on b, not on n or m. that the size of a partial assignment is the number of variables it determines.)
Solution Intuitively, a forcing partia! assignment must "hit" each clause in
at least one place, since otherwise it wouldn’t be able to ensure the truth (10.23) There exists a forcing assignment of size at most b if and only if there
value. Although this seems natural, it’s not actually part of the definition (the is a forcing assignment of size at most b - 1 on at least one of the instances
definition just talks about truth assignments that are consistent with the partial reduced by the assignment to xi, xj, or xk.
assignment), so we begin by formalizing and proving this intuition.
We therefore have the following algorithm. (It relies on the boundary cases
(10.22) A partial assignment p forces all clauses if and only if, foreach clause in which there are no clauses (when by definition we can declare success) and
Ci, at least one of the variables in Ci is determined by p in a way that satis- in which there are clauses but b = 0 (in which case we declare failure).
fies Ci.
Proof. Clearly, if p determines at least one variable in each Ci in a way To search for a forcing partial assignment of size at most b:
that satisfies it, then no matter how we construct a full truth assignment for If there are no clauses, then by definition we’have
the remaining variables, all the clauses are already satisfied¯ Thus any truth a forclng assignment
assignment consistent with p satisfies all clauses. Else if b----O then by (10.22) there is no forcing assignment
Now, for the converse, suppose there is a clause Ci such that p does not Else let C~ be an arbitrary clause containing variables xi, x], xk
determine any of the variables in Ci in a way that satisfies Ci. We want to show For each of xi, x], Xk:
Set xi the way it appears in C~
that p is not forcing, which, according to the definition, requires us to exhibit
Reduce the instance by this assignment
a consistent truth assignment that does not satisfy all clauses. So consider the
Exercises 595
Chapter 10 Extending the Limits of Tractability
594
determine whether there’s a satisfying assignment in less time than it
Recursively check for a forcing assignment of size at
would take to enumerate all possible settings of the variables.
most b--1 on this reduced instmnce
Here we’ll develop one such algorithm, which solves instances of 3-
Endfor
If any of these recursive calls (say for xi) returns a SAT in O(p(n) ¯ (v~)n) time for some polynomial p(n). Note that the main
forcing assignment #’ of size most b--1 then term in this running time is (~%n, which is bounded by 1.74n.
Combining p’ with the assignment to Xi is the desired answer (a) For a truth assignment ~ for the variables Xl, xa ..... xn, we use qb (x~)
Else (none of these recursive calls succeeds)
to denote the value assigned by ~ to x~. (This can be either 0 or
There is no forcing assignment of size at most b 1.) If ® and ~’ are each truth assignments, we define the distance
Endif between $ and ®’ to be the number of variables x~ for which they
Endif assign different values, and we denote this distance by d(~, a;,). In
other words, d(~, ~’) = I{i : ®(xi) # ~’(x~)}l.
To bound the running time, we consider the tree of possibilities being A basic building block for our algorithm will be the ability to
searched, just as in the algorithm for finding a vertex cover. Each recursive answer the following kind of question: Given a truth assignment ~)
call gives rise to three children in this tree, and this goes on to a depth of at and a distance d, we’d like to know whether there exists a satisfying
most b. Thus the tree has at most 1 + 3 + 32 + ¯ ¯ ¯ + 3b < 3b+1 nodes, and at assignment ~)’ such that the distance from ~ to ®’ is at most d.
each node we spend at most O(m + n) time to produce the reduced instances. Consider the following algorithm, Explore(oh, d), that attempts to
Thus the total running time is O(3b(m + n)). answer this question.
Explore (~,d) :
Exercises If ~ is a satisfying assignment then return "yes"
Else if d = 0 then return "no"
In Exercise 5 of Chapter 8, we claimed that the Hitting Set Problem was
Else
NP-complete. To recap the definitions, consider a set A = {al ..... an} and a
collection B1. B2 ..... Bm of subsets of A. We say that a set H _ A is a hi.rig Let Q be a clause that is not satisfied by ¯
(i.e., all three terms in Ci evaluate to false) ~
set for the collection BI. B~_ ..... Bm ff H contains at least one element from Let ~I denote the assignment obtained from ¯ by
each B~--that is, ff H n B~ is not empty for each i. (So H "hits" all the sets
taking the variable that occurs in the first term of
B~.) clause Ci and inverting its assigned value
Now suppose we are given an instance of t~s problem, and we’d like Define ~2 and qb3 analogously in terms of the
to determine whether there is a hitting set for the collection of size at second and third terms of the clause Ci
most k. Furthermore suppose that each set B~ has at most c elements, for Recursively invoke :
a constant c. Give an algorithm that solves this problem with a running Explore (~I, d - I)
time of the form O([(c, k) ¯ p(n, m)), where p(.) is a polynomial function, Explore (qb2, d - I)
and f(-) is an arbitrary function that depends only on c and k, not on n Explore (~3, d - i)
If any of these three calls returns "yes"
then return "yes"
The difficulty in 3-SAT comes from the fact that there are 2n possible Else return "no"
assignments to the input variables xl, x2 ..... xn, and there’s no apparent
way to search this space in polynomial time. This intuitive picture, how- Prove that Explore(~, d) returns "yes" if and only if there exists
ever, might create the misleading impression that the fastest algorithms a satisfying assignment ®’ such that the distance from ~ to ~’ is at
for 3-SAT actually require time 2n. In fact, though it’s somewhat counter- most d. Also, give an analysis of the running time of Explore(qb, d)
intuitive when you first hear it, there are algorithms for 3-SAT that run as a function of n and d.
in significantly less than 2n time in the worst case; in other words, they
Chapter 10 Extending the Limits of TractabiliW
596 Exercises 597
(b) Clearly any two assignments ¯ and ~’ have distance at most n Prove that every triangulated cycle graph has a tree decomposition
from each other, so one way to solve the given instance of 3-SAT of width at most 2, and describe an efficient algorithm to construct such
would be to pick an arbitrary starting assignment ¢ and then run a decomposition.
ExpJ_ore(¢, n). However, this will not give us the iaxtming time we
want. The Minimum-CostDominating SetProblem is specified by an undirected
Instead, we will need to make several calls to Explore, from graph G = (V, E) and costs c(v) on the nodes v ~ V. A subset S c V is said
different starting poInts ¢, and search each time out to more limited to be a dominating set ff all nodes u ~ V-S have an edge (u, v) to a node u
distances. Describe how to do this in such a way that you can solve in S. (Note the difference between dominating sets and vertex covers: in
the Instance of 3-SAT In a running time of ordy O(p(n). (~f~)n). a dominating set, it is fine to have an edge (tt, u) with neither u nor v in
the set S as long as both u and v have neighbors in S.)
(a) Give a polynomial-time algorithm for the Dominating Set Problem for
Suppose we are given a directed graph G = (V, E), with V = {Vl, the special case in which G is a tree.
and we want to decide whether G has a Hamiltonian path from Vl to vn. (b) Give a polynomial-time algorithm for the Dominating Set Problem for
(That is, is there a path in G that goes from v~ to vn, passing through every the special case in which G has tree-width 2, and we are also given a
other vertex exactly once?) tree decomposition of G with width 2.
Since the Hamiltonian Path Problem is NP-complete, we do not ex-
The Node-Disjoint Paths Problem is given by an undirected graph G and
pect that there is a polynomial-time solution for this problem. However,
this does not mean that all nonpolynomial-time algorithms are ,equally k pairs of nodes (s~, ti) for i = 1 ..... k. The problem is to decide whether
"bad." For example, here’s the simplest brute-force approach: For each there are node-disjoint paths Pi so that path p~ connects s~ to t~. Give a
permutation of the vertices, see ff it forms a Hamiltonian path from polynomial-time algorithm for the Node-Disjoint Paths Problem for the
special case in which G has tree-width 2, and we are also given a tree
to vn. This takes time roughly proportional to n!, which is about 3 x !017
decomposition T of G with width 2.
when n = 20.
Show that the Hamiltonian Path Problem can In fact be solved In time The chromatic number of a graph G is the minimum k such that it has a
0(2n¯ p(n)), where p(n) is a polynomial function of n. This is a much better k-coloring. As we saw in Chapter 8, it is NP-complete for k > 3 to decide
algorithm for moderate values of n; for example, 2n is only about a million whether a given Input graph has chromatic number < k.
when n = 20. (a) Show that for every natural number w > 1, there is a number k(w) so
that the following holds.~-~ G is a graph of tree-width at ~ost w, then
G has chromatic number at most k(w). (The point is that k(w) depends
We say that a graph G = (V, E) is a triangulated cycle graph If it consists ouly on tu, not on the number of nodes in G.)
of the vertices and edges of a triangulated convex n-gon in the plane--in
(b) Given an undirected x-node graph G = (V, E) of tree-width at most
other words, If it can be drawn in the plane as follows.
w, show how to compute the chromatic number of G in time O(f(ra) ¯
The vertices are all placed on the boundary of a convex set In the plane p(n)), where p(.) is a polynomial but f(.) can be an arbitrary function.
(we may assume on the boundary of a circle), with each pair of consecutive
vertices on the circle joined by an edge. The remaining edges are then Consider the class of 3-SAT instances in which each of the n variables
drawn as straight line segments through the interior of the circle, with no occurs--counting positive and negated appearances combined--in ex-
pair of edges crossing in the Interior. We require the drawing to have the actly three clauses. Show that any such Instance of 3-SAT is in fact sat-
following property. If we let S denote the set of all poInts in theplane that isfiable, and that a satisfying assignment can be found In polynomial
Figure 10.12 A triangulated lie onvertices or edges of the drawing, then each bounded component of time.
cycle graph: The edges form the plane after deleting S is bordered by exactly, three edges. (This is the
the boundary of a convex Give a polynomial-time algorithm for the following problem. We are given
polygon together with a set sense in which the graph is a "triangulation.") a binary tree T = (V, E) with an even number of nodes, and a nonnegative
of line segments that divide A triangulated cycle graph is pictured In Figure 10.12. weight on each edge. We wish to find a partition of the nodes V into two
its interior into triangles.
Chapter 10 Extending the Limits of Tractability
598
sets of equal size so that the weight of the cut between the two sets is
as large as possible (i.e., the total weight of edges with one end in each
set is as large as possible). Note that the restriction that the graph is a
tree is crucial here, but the assumption that the tree is binary is not. The
problem is NP-hard in general graphs. Chapter
Notes and Further Reading
The first topic in this chapter, on how to avoid a rhnning time of O(kn~+1) for Approximation A ~gorithms
Vertex Cover, is an example of the general theme of parameterized complexity:
for problems with two such "size parameters" n and k, one generally prefers
running times of the form O(f(k) -p(n)), where p(-) is a polynomial, rather
than running times of the form O(n~). A body of work has grown up around
this issue, including a methodology for identifying NP-complete problems that
are unlikely to allow for such improved running times. This area is covered in
the book by Downey and Fellows (1999).
Following our encounter with NP-completeness and the idea of computational
The problem of coloring a collection of circular arcs was shown to be
intractability in general, we’ve been dealing with a fimdamental question: How
NP-complete by Garey, Johnson, Miller, and Papadimitriou (1980). They also
should we design algorithms for problems where polynomial time is probably
described how the algorithm presented in this chapter follows directly from
an unattainable goal?
a construction due to Tucker (1975). Both Interval Coloring and circular-
Arc Coloring belong to the following class of problems: Take a collection of In this chapter, we focus On a new theme related to this question: approx-
geometric objects (such as intervals or arcs), define a graph by joining pairs imation algorithms, which run in polynomial time and find solutions that are
of objects that intersect, and study the problem of coloring this graph. The guaranteed to be close to optimal. There are two key words to notice in this
book on graph co!oring by Jensen and Toff (1995) includes descriptions of a definition: close and guaranteed. We will not be seeking the optimal solution,
and as a result, it becomes feasible to aim for a polynomial running time. At
number of other problems in this style.
the same time, we will be interested in proving that our algorithms find so-
The importance of tree decompositions and tree-width was brought into
lutions that are guaranteed to be close to the optimum. There is something
prominence largely through the work of Robertson and Seymour (1990). The
inherently tricky in trying to do this: In order to prove an approximation guar-
algorithm for constructing a tree decomposition described in Section 10.5 is antee, we need to compare our solution with--and hence reason about--an
due to Diestel et al. (!999). Further discussion of tree-width and its.role in both optimal solution that is computationally very hard to find. This difficulty wil!
algorithms and graph theory can be found in the survey by Reed (1997) and
be a recurring issue in the analysis of the algorithms in this chapter.
the book by Diestel (2000). Tree-width has also come to play an important role
in inference algorithms for probabilistic models in machine learning (Jordan We will consider four general techniques for designing approximation al-
gorithms. We start with greedy algorithms, analogous to the kind of algorithms
1998).
Notes on the Exercises Exercise 2 is based on a result of Uwe Sch~ning; and we developed in Chapter 4. These algorithms will be simple and fast, as in
Chapter 4, with the challenge being to find a greedy rule that leads to solu-
Exercise 8 is based on a problem we learned from Amit Kumar.
lions provably close to optimal. The second general approach we pursue is
the pricing method. This approach is motivated by an economic perspective;
we will consider a price one has to pay to enforce each constraint of the prob-
lem. For example, in a graph problem, we can think of the nodes or edges of
the graph sha~ng the cost of the solution in some equitable way. The pricing
method is often referred to as the primal-dual technique, a term inherited from
11.1 Greedy Algorithms and Bounds on the Optimum: A Load Balancing Problem 601
Chapter 11 Approximation Algorithms
600
~ Designing the Mgorithm
the study of linear programming, which can also be used to motivate this ap- We first consider a very simple greedy algOrithm for the problem. The algorithm
proach. Our presentation of the pricing method here will not assume familiarity
makes one pass through the jobs in any order; when it comes to jobj, it assigns
with linear programming. We will introduce linear programming through our j to the machine whose load is smallest so far.
third technique in this chapter, linear programming and rounding, in which
one exploits the relationship between the computational feasibility of linear
Greedy-Balance :
programming and the expressive power of its more difficult cousin, integer
Start with no jobs assigned
programming. Finally, we will describe a technique that can lead to extremely
Set Ti=0 and A(0=0 for all machines Mi
good approximations: using dynamic programming on a rounded version of
For j=l .....
the input. Let Mi be a machine that achieves the minimum min/~ Tk
Assign job j to machine M
j~A(i)
M1
and we declare this to be the load on machine Mi. We seek to minimize a
quantity known as the makespan; it is simply the maximum load on any Figure 11.1 The result of running the greedy load balancing algorithm on three
machine, T = maxi Ti. Mthough we wil! not prove this, the scheduling problem machines with job sizes 2, 3, 4, 6, 2, 2,
of finding an assignment of minimum makespan is NP-hard.
11.1 Greedy Algorithms and Bounds on the Optimum: A Load Balancing Problem 603
Chapter 11 Approximation Algorithms
602
analysis, therefore, we wig need a lower bound on the optimum--a quantity
I~he contribution of
with the guarantee that no matter how good the optimum is, it cannot be less e last job alone is
than this bound. most the optimum
There are many possible lower bounds on the optimum. One idea for a
lower bound is based on considering the total processing time ~j tj. One of
the m machines must do at least a 1/m fraction of the total work, and so we ust before adding
have the following.
(iL3) AIgO~thra Greedy2Bal~he produCe~ an assign~ent b[ ]obs to ~a- ¯ Now we account for the remaining part of the load onM~, which is just the
chines wtth makespan T_ 2T , final job j. Here we simply use the other lower bound we have, (11.2), which
says that ti < T*. Adding up these two inequalities, we see that
Prooi. Here is the overall plan for the proof. In analyzing an approximation
algorithm, one compares the solution obtained to what one knows about the =- + < 2.W*.
optimum--in this case, our lower bounds (11.1) and (11.2). We consider a Since our makespan T is equal to Ti, this is the result we want. []
machine Mi that attains the maximum load T in our assignment, and we ask:
What was the last job j to be placed on Mi~. If tj is not too large relative t.o m6st It is not hard to give an example in which the solution is indeed close
of the other jobs, then we are not too far above the lower bound (11.1). And, to a factor of 2 away from optimal. Suppose we have m machines and
if tj is a very large job, then we can use (11.2). Figure 11.2 shows the structure n = m(m - 1) + 1 jobs. The first rn(m - 1) = n - 1 jobs each require time tj = 1.
of this argument. The last job is much larger; it requires time tn = m. What does our greedy
Here is how we can make this precise. When we assigned job j to Mi, the algorithm do with this sequence of jobs? It evenly balances the first n - 1 jobs,
machine Mi had the smallest load of any machine; this is the key property and then has to add the giant job n to one of them; the resulting makespan is
of our greedy algorithm. Its load just before this assignment was Ti - tp and T=2m- 1.
since this was the smallest load at that moment, it follows that every machine
11.1 Greedy Algorithms and Bounds on the Optimum: A Load Balancing Problem
Chapter 11 Approximation Algorithms
605
604
We will prove that the resulting assignment has a makespan that is at most 1.5
Optimal solution:
Approximate solution times the optimum.
via greedy algorithm:
greedy
What does the optimal solution look like in this example? It assigns the The improvement comes from the fo!lowing observation. If we have fewer
large job to one of the machines, say, M1, and evenly spreads the remaining than m jobs, then the greedy solution will clearly be optimal, since itputs each
jobs over the other m- 1 machines. This results in a makespan of m. Thus job on its own machine. And if we have more than m jobs, then we can use
¯
the ratio between the greedy algorithm solution and the optimal solution is the following further lower bound on the optimum.
(2m - 1)/m = 2 - 1Ira, which is close to a factor of 2 when m is large.
(11.4) If there are more than m jobs, then T* >_ 2tin+~.
See Figure 11.3 for a picture of this with m----4; one has to admire the
perversity of the construction, which misleads the greedy algorithm into Proof. Consider only the first m + 1 jobs in the sorted order. They each take
perfectly balancing everything, only to mess everything up with the final giant at least tin+1 time. There are m + 1 jobs and only m machines, so there must
item. be a machine that gets assigned two of these jobs. This machine will have
In fact, with a little care, one can improve the analysis in (11.3) to show processing time at least 2tin+P []
that the greedy algorithm with m machines is within exactly this factor of
2 - 1Ira on every instance; the example above is really as bad as possible¯
:
.
Now let’s think about how we might develop a better approximation Proof. The proof will be very similar to the analysis of the previous algorithm.
algorithm--in other words, one for which we are always guaranteed to be As before, we will consider a machine M~ that has the maximum load. If M~
within a factor strictly smaller than 2 away from the optimum. To do this, it only holds a single job, then the schedule is optimal.
helps to think about the worst cases for our current approximation algorithm. So let’s assume that machine M~ has at least two jqbs, and let t1 be the
Our earlier bad example had the following flavor: We spread everything out last job assigned to the machine. Note thatj > m + 1, since the algorithm will
very evenly across the machines, and then one last, giant, unfortunate iob assign the first m jobs to m distinct machines. Thus t1 < tra+~ < ½T*, where
arrived. Intuitively, it looks like it would help to get the largest jobs arranged the second inequality is (11.4).
nicely first, with the idea that later, small iobs can only do so much damage.
And in fact, this idea does lead to a measurable improvement. We now proceed as in the proof of (11.3), with the following single change.
Thus we now analyze the variant of the greedy algorithm that first sorts At the end of that proof, we had inequalities T~ - t] _< T* and t] _< T*, and we
the jobs in decreasing order of processing time and then proceeds as before. added them up to get the factor of 2. But in our case here, the second of these
11.2 The Center Selection Problem 607
Chapter 11 Approximation Algorithms
606
Next we have to clarify what we mean by the goal of wanting the centers
inequalities is, in fact, tj1 < iT , so adding the two inequalities gives us the
to be "central." Let C be a set of centers. We assume that the people in a given
bound town will shop at the closest mall. This suggests we define the distance of a
Ti < ! T*. [] site s to the centers as dist(s, C) = mAnc~c dist(s, c). We say that C forms an r-
-2 couer if each site is within distance at most r from one of the centers--that is,
if dist(s, C) < r for a!l sites s E S. The minimum r for which C is an r-cover will
be called the covering radius of C and will be denoted by r(C). In other words,
11.2 The Center Selection Problem the covering radius of a set of centers C is the faythest that anyone needs to
Like the problem in the previous section, the Center Selection Problem, which travel to get to his" or her nearest center. Our goal will be to select a set C of k
we consider here, also relates to the general task of allocating work across centers for which r(C) is as small as possible.
multiple servers. The issue at the heart of Center Selection is where best to
place the servers; in order to keep the formulation dean and simple, we will not
incorporate the notion of load balancing into the problem. The Center Selection f! Designing and Analyzing the Algorithm
Problem also provides an example of a case in which the most natural greedy Difficulties with a Simple Greedy Algorithm We now discuss, greedy algo-
algorithm can result in an arbitrarily bad solution, but a slightly different rithms for this problem. As before, the meaning of "greedy" here is necessarily
greedy method is guaranteed to always result in a near-optimal solution. a little fuzzy; essentially: we consider algorithms that select sites once by one in
a myopic fashion--that is, choosing each without explicitly considering where
the remaining sites will go.
j The Problem Probably the simplest greedy algorithm would work as follows. It would
’ Consider the following scenario. We have a set S of n sites--say, n little towns put the first center at the best possible location for a single center, then keep
in upstate New York. We want to select k centers for building large shopping adding centers so as to reduce the covering radius, each time, b~ as much as
malls. We expect that people in each of these n towns will shop at ofle of the possible. It turns out that this approach is a bit too simplistic to be effective:
malls, and so we want to select the sites of the k malls to be central. there are cases where it can lead to very bad solutions.
Let us start by defining the input to our problem more formally. We are To see that this simple greedy approach can be really bad, consider an
given an integer k, a set S of n sites (corresponding to the towns), and a example with only two sites s and z, and k = 2. Assume that s and z are
distance function. When we consider instances where the sites are points located in the plane, with distance equal to the standard Euclidean distance
in the plane, the distance function will be the standard Euclidean distance in the plane, and that any point in the plane is an option for placing a center.
between points, and any point in the plane is an option for placing a center. Let d be the distance between s and z. Then the best location for a single
The algorithm we develop, however, can be applied to more general notions of cenfer c1 is halfway between s and z, and the covering radius of this one
distance. In applications, distance sometimes means straight-line distance, but center is r({cl}) : d/2. The greedy algorithm would start with c1 as the first
can also mean the travel time from point s to point z, or the drivin~ distance center. No matter where we add a second center, at least one of s or z will have
(i.e., distance along roads), or even the cost of traveling. We will allow any the center c~ as closest, and so the covering radius of the set of two Centers
distance function that satisfies the following natural properties. will still be d/2. Note that the optimum solution with k = 2 is to select s and
z themselves as the centers. This will lead to a covering radius of 0. A more
o dist(s, s) = 0 for all s ~ S complex example illustrating the same problem can be obtained by having two
o the distance is symmetric: dist(s, z) = dist(z, s) for all sites s, z ~ S dense "clusters" of sites, one around s and one around z. Here our proposed
~ the triangle inequality: dist(s, z) + dist(z, h) > dist(s, h) greedy algorithm would start by opening a center halfway between the clusters,
while the optimum solution would open a separate center for each cluster.
The first and third of these properties tend to be satisfied by essentially all
natural notions of distance. Although there are applications with asymmetric Knowing the Optimal Radi~ Helps In searching for an improved algorithm,
distances, most cases of interest also satisfy the second property. Our greedy we begin with a useful thought experiment. Suppose for a minute that someone
told us what the optimum radius r is. Would this information help? That is,
gorithm will apply to any distance function that satisfies these three properties,
suppose we know that there is a set of k centers C* with radius r(C*) <_ r, and
and it will depend on a!l three.
Chapter 11 Approximation Algorithms 11.2 The Center Selection Problem
608 609
Next we argue that if the algorithm fails.to return a set of centers, then its
conclusion that no set can have covering radius at most r is indeed correct.
Site s covered by c* (11.7) Suppose the algorithm selects more than k centers. Then, ]’or any set
C* of size at most k, the covering radius is r(C*) > r.
Figure 11.4 Everything covered at radius r by c* is also covered at radius 2r by s.
Proof. Assume the opposite, that there is a set C* of at most k centers with
our job is to find some set of k centers C whose coveting radius is not much covering radius r(C*) < r. Each center c ~ C selected by the greedy algorithm
more than r. It turns out that finding a set of/~ centers with coveting radius at is one of the original sites in S, and the set~* has covering radius at most r,
: so there must be a center c* ~ C* that is at most a distance of r from c--that
is, dist(c, c*) <_ r. Let us say that such a center c* is close to c. We want to
Here is the idea: We can use the existence of this solution C* in our claim that no center c* in the optimal solution C* can be close to two different
algorithm even though we do not know what C* is. Consider any site s ~ S. centers in the greedy solution C. If we can do this, we are done: each center
There must be a center c* ~ C* that covers site s, and this center c* is at c ~ C has a close optimal center c* ~ C*, and each of these close optimal centers
distance at most r from s. Now our idea would be to take this site s as a is distinct. This will imply that IC*l > ICI, and since ICI > k, this will contradict
center in our solution instead of c*, as we have no idea what c* is. We would our assumption that C* contains at most k centers.
like to make s cover all the sites that c* covers in the unknown solution C*.
This is accomplished by expanding the radius from r to 2r. All .the sites that So we just need to show that no optimal center c* ~ C can be close to each
of two centers c, c’ ~ C. The reason for this is pictured in Figure 11.5. Each pair
were at distance at most r from center c* are at distance at most 2r from s of centers c, c’ ~ C is separated by a distance of more than 2r, so if c* were
(by the triangle inequality). See Figure 11.4 for a simple illustration of this
within a distance of at most r from each, then this would violate the triangle
argument.
inequality, since dist(c, c*) + dist(c*, c’) >_ dist(c, c’) > 2r. ,.
A Greedy Algorithm That Works For the specific case of the Center Selection
Problem, there is a surprising way to get around the assumption of knowing the
radius, without resorting to the general technique described earlier. It turns out
we can run essentially the same greedy algorithm developed earlier without
knowing anything about the value of r.
The earlier greedy algorithm, armed with knowledge of r, repeatedly
selects one of the original sites s as the next center, making sure that it is
at least 2r away if, ore all previously selected sites. To achieve essentially the
same effect without knowing r, we can simply select the site s that is farthest
away from all previously selected centers: If there is any site at least 2r away
from all previously chosen centers, then this farthest site s must be one of
them. Here is the resulting algorithm.
Centers used by optimal solution
have S’ # 0 after selecting k centers, as it would have s ~ S’, and so it would weight of a set cover is at most k if and only if there is a collection of at most
go on and select more than k centers and eventually conclude that k centers k sets that covers U.
cannot have covering radius at most r. This contradicts our choide of r, and
the contradiction proves that r(C) <_ 2r.. ~ Designing the Algorithm
We will develop and analyze a greedy algorithm for this problem. The algo-
Note the smprising fact that our final greedy 2-approximation algorithm rithm will have the property that it builds the cover one set at a time; to choose
is a very simple modification of the first greedy algorithm that did not work. its next set, it looks for one that seems to make the most progress toward the
Perhaps the most important change is simply that our algorithm always selects goal. What is a natural way to define "progress" in this setting? Desirable
sites as centers (i.e., every mall will be built in one of the little towns and not sets have two properties: They have small weight wi, and they cover lots of
halfway between two of them). elements. Neither of these properties alone, however, would be enough for
designing a good approximation algorithm. Instead, it is natural to combine
these two criteria into the single measure wi/ISil--that is, by selecting Si, we
11.3 Set Cover: A General Greedy Heuristic cover tSit elements at a cost of wi, and so this ratio gives the ,COSt per element
In this section we will consider a very general problem that we also encoun- covered," a very reasonable thing to use as a guide.
tered in Chapter 8, the Set Cover Problem. A number of important algorithmic Of course, once some sets have already been selected, we are only con-
problems can be formulated as special cases of Set Cover, and hencean ap- cerned with how we are doing on the elements still 1eft uncovered. So we will
proximation algorithm for this problem will be widely applicable. We will see maintain the set R of remaining uncovered elements and choose the set Si that
that it is possible to design a greedy algorithm here that produces solutions minimizes wi/[Si ~ RI.
with a guaranteed approximation factor relative to the optimum, although this
factor will be weaker than what we saw for the problems in Sections 11.1 and
Greedy-Set-Cover:
11.2.
Start with R= U and no sets selected
While the greedy algorithm we design for Set Cover will be very simple, While R # 0 -
the analysis will be more complex than what we encountered in the previous Select set Si that minimizes u~i/[S~ N
two sections. There we were able to get by with very simple bounds on Delete set S~ from R
the (unknown) optimum solution, while here the task of comparing to the EndWhile
optimum is more difficult, and we will need to use more sophisticated bounds. Return the selected sets
This aspect of the method can be viewed as our first example of the pricing
method, which we will explore more fully in the next two sections. As an example of the behavior of this algorithm, consider what it would do
on the instance in Figure 11.6. It would first choose the set containing the four
~ The Problem nodes at the bottom (since this has the best weight-to-coverage ratio, !/4). It
Recall from our discussion of NP-completeness that the Set Cover Problem is then chooses the set containing the two nodes in the second row, and finally
based on a set U of n elements and a list $1 ..... Srn of subsets of U; we say it chooses the sets containing the two individual nodes at the top. It thereby
that a set cover is a collection of these sets whose tmion is equal to all of U. chooses a collection of sets of total weight 4. Because it myopically chooses
the best option each time, this algorithm misses the fact that there’s a way to
In the version of the problem we consider here, each set Si has an
cover everything using a weight of just 2 + 2~, by selectit~g the two sets that
associated weight tvi >_ O. The goal is to find a set cover e so that the total
each cover a fi~ column.
weight
~4~ Analyzing the Algorithm
The sets selected by the algorithm clearly form a set cover. The question we
is minimized. Note that this problem is at least as hard as the decision version want to address is: How much larger is the weight of this set cover than the
of Set Cover we encountered earlier; if we set al! wi = 1, then the minimum weight ~v* of an optimal set cover?
11.3 Set Cover: A General Greedy Heuristic
614 Chapter 11 Approximation Algorithms 615
element s in the quantity cs. We add the following line to the code immediately
I+£ after selecting the set St.
1 The values cs do not affect the behavior of the algorithm at all; we view
wo sets can be used to ~
I:~.
over everything, but the|
reedy algorithm doesn’t|
nd them. J
them as a bookkeeping device to help in our comparison with the optimum
w*. As each set Si is selected, its weight is distributed over the costs cs of the
elements that are ~ewly covered. Thus these costs completely account for the
total weight of the set cover, and so we have
(11.9) If ~ is the set cover obtained by Greedy-Set-Cover, then ~s~e wi =
~s~u Cs.
The key to the analysis is to ask how much total cost any single set Sk
can account for--in other words, to give a bound on ~,s~sk Cs relative to the
weight wk of the set, even for sets not selected by the greedy algorithm. Giving
an upper bound on the ratio
~k
that holds for every set says, in effect, "To cover a lot of cost, you must use a lot
of weight." We know that the optimum solution must cover the full cost ~s~u Cs
via the sets it selects; so this type of bound will establish that it needs to use
at least a certain amount of weight. This is a lower bound on the optimum,
just as we need for the analysis.
Our analysis will use the harmonic function
n
Figure 11.6 An instance of the Set Cover Problem where the weights of sets are either
H(n) = ~ 1
i---1 l
1 or 1 + e for some small e > 0. The greedy algorithm chooses sets of tot.a1 weight 4,
rather than the optimal solution of weight 2 + 2e. To understand its asymptotic size as a function of n, we can interpret it as a
sum approximating the area under the curve y = 1Ix. Figure 11.7 shows how
it is naturally bounded above by 1 + f{nl ~ dx = 1 + In n, and bounded below
. rn+l 1
by dl ~ ax = ln(n + !). Thus we see that H(n) = ® (lnn).
Figure 11.7 Upper and lower bounds for the Harmonic Function H(n).

As in Sections 11.1 and 11.2, our analysis will require a good lower bound on this optimum. In the case of the Load Balancing Problem, we used lower bounds that emerged naturally from the statement of the problem: the average load, and the maximum job size. The Set Cover Problem will turn out to be more subtle; "simple" lower bounds are not very useful, and instead we will use a lower bound that the greedy algorithm implicitly constructs as a by-product.

Recall the intuitive meaning of the ratio wi/|Si ∩ R| used by the algorithm; it is the "cost paid" for covering each new element. Let's record this cost paid for element s in the quantity cs. Here is the key to establishing a bound on the performance of the algorithm.

(11.10) For every set Sk, the sum Σ_{s∈Sk} cs is at most H(|Sk|) · wk.

Proof. To simplify the notation, we will assume that the elements of Sk are the first d = |Sk| elements of the set U; that is, Sk = {s1, ..., sd}. Furthermore, let us assume that these elements are labeled in the order in which they are assigned a cost csj by the greedy algorithm (with ties broken arbitrarily). There is no loss of generality in doing this, since it simply involves a renaming of the elements in U.

Now consider the iteration in which element sj is covered by the greedy algorithm. At the start of this iteration, sj, sj+1, ..., sd ∈ R by our labeling of the elements. This implies that |Sk ∩ R| is at least d − j + 1, and so the average cost of the set Sk is at most

    wk/|Sk ∩ R| ≤ wk/(d − j + 1).

Note that this is not necessarily an equality, since sj may be covered in the same iteration as some of the other elements sj' for j' < j. In this iteration, the greedy algorithm selected a set Si of minimum average cost; so this set Si has average cost at most that of Sk. It is the average cost of Si that gets assigned to sj, and so we have

    csj = wi/|Si ∩ R| ≤ wk/|Sk ∩ R| ≤ wk/(d − j + 1).

We now simply add up these inequalities for all elements s ∈ Sk:

    Σ_{s∈Sk} cs = Σ_{j=1}^{d} csj ≤ Σ_{j=1}^{d} wk/(d − j + 1) = H(d) · wk. ∎

We now complete our plan to use the bound in (11.10) for comparing the greedy algorithm's set cover to the optimal one. Letting d* = max_i |Si| denote the maximum size of any set, we have the following approximation result.

(11.11) The set cover C selected by Greedy-Set-Cover has weight at most H(d*) times the optimal weight w*.

Proof. Let C* denote the optimum set cover, so that w* = Σ_{Si∈C*} wi. For each of the sets in C*, (11.10) implies

    wi ≥ (1/H(d*)) Σ_{s∈Si} cs.

Since these sets form a set cover, summing over them accounts for every element of U at least once. Combining these with (11.9), we obtain the desired bound:

    w* = Σ_{Si∈C*} wi ≥ Σ_{Si∈C*} (1/H(d*)) Σ_{s∈Si} cs ≥ (1/H(d*)) Σ_{s∈U} cs = (1/H(d*)) Σ_{Si∈C} wi. ∎

Asymptotically, then, the bound in (11.11) says that the greedy algorithm finds a solution within a factor O(log d*) of optimal. Since the maximum set size d* can be a constant fraction of the total number of elements n, this is a worst-case upper bound of O(log n). However, expressing the bound in terms of d* shows us that we're doing much better if the largest set is small.

It's interesting to note that this bound is essentially the best one possible, since there are instances where the greedy algorithm can do this badly. To see how such instances arise, consider again the example in Figure 11.6. Now suppose we generalize this so that the underlying set of elements U consists of two tall columns with n/2 elements each. There are still two sets, each of weight 1 + ε, for some small ε > 0, that cover the columns separately. We also create O(log n) sets that generalize the structure of the other sets in the figure: there is a set that covers the bottommost n/2 nodes, another that covers the next n/4, another that covers the next n/8, and so forth. Each of these sets will have weight 1.

Now the greedy algorithm will choose the sets of size n/2, n/4, n/8, ..., in the process producing a solution of weight Ω(log n). Choosing the two sets that cover the columns separately, on the other hand, yields the optimal solution, with weight 2 + 2ε. Through more complicated constructions, one can strengthen this to produce instances where the greedy algorithm incurs a weight that is very close to H(n) times the optimal weight. And in fact, by much more complicated means, it has been shown that no polynomial-time approximation algorithm can achieve an approximation bound much better than H(n) times optimal, unless P = NP.
11.4 The Pricing Method: Vertex Cover

We now turn to our second general technique for designing approximation algorithms, the pricing method. We will introduce this technique by considering a version of the Vertex Cover Problem. As we saw in Chapter 8, Vertex Cover is in fact a special case of Set Cover, and so we will begin this section by considering the extent to which one can use reductions in the design of approximation algorithms. Following this, we will develop an algorithm with a better approximation guarantee than the general bound that we obtained for Set Cover in the previous section.

The Problem
Recall that a vertex cover in a graph G = (V, E) is a set S ⊆ V so that each edge has at least one end in S. In the version of the problem we consider here, each vertex i ∈ V has a weight wi ≥ 0, with the weight of a set S of vertices denoted w(S) = Σ_{i∈S} wi. We would like to find a vertex cover S for which w(S) is minimum. When all weights are equal to 1, deciding if there is a vertex cover of weight at most k is the standard decision version of Vertex Cover.

Approximations via Reductions? Before we work on developing an algorithm, we pause to discuss an interesting issue that arises: Vertex Cover is easily reducible to Set Cover, and we have just seen an approximation algorithm for Set Cover. What does this imply about the approximability of Vertex Cover? A discussion of this question brings out some of the subtle ways in which approximation results interact with polynomial-time reductions.

First consider the special case in which all weights are equal to 1; that is, we are looking for a vertex cover of minimum size. We will call this the unweighted case. Recall that we showed Set Cover to be NP-complete using a reduction from the decision version of unweighted Vertex Cover. That is, we showed

    Vertex Cover ≤P Set Cover.

This reduction says, "If we had a polynomial-time algorithm that solves the Set Cover Problem, then we could use this algorithm to solve the Vertex Cover Problem in polynomial time." We now have a polynomial-time algorithm for the Set Cover Problem that approximates the solution. Does this imply that we can use it to formulate an approximation algorithm for Vertex Cover?

(11.12) One can use the Set Cover approximation algorithm to give an H(d)-approximation algorithm for the weighted Vertex Cover Problem, where d is the maximum degree of the graph.

Proof. The proof is based on the reduction that showed Vertex Cover ≤P Set Cover, which also extends to the weighted case. Consider an instance of the weighted Vertex Cover Problem, specified by a graph G = (V, E). We define an instance of Set Cover as follows. The underlying set U is equal to E. For each node i, we define a set Si consisting of all edges incident to node i and give this set weight wi. Collections of sets that cover U now correspond precisely to vertex covers. Note that the maximum size of any Si is precisely the maximum degree d.

Hence we can use the approximation algorithm for Set Cover to find a vertex cover whose weight is within a factor of H(d) of minimum. ∎

This H(d)-approximation is quite good when d is small; but it gets worse as d gets larger, approaching a bound that is logarithmic in the number of vertices. In the following, we will develop a stronger approximation algorithm that comes within a factor of 2 of optimal.

Before turning to the 2-approximation algorithm, we make the following further observation: One has to be very careful when trying to use reductions for designing approximation algorithms. It worked in (11.12), but we made sure to go through an argument for why it worked; it is not the case that every polynomial-time reduction leads to a comparable implication for approximation algorithms.

Here is a cautionary example. We used Independent Set to prove that the Vertex Cover Problem is NP-complete. Specifically, we proved

    Independent Set ≤P Vertex Cover,

which states that "if we had a polynomial-time algorithm that solves the Vertex Cover Problem, then we could use this algorithm to solve the Independent Set Problem in polynomial time." Can we use an approximation algorithm for the minimum-size vertex cover to design a comparably good approximation algorithm for the maximum-size independent set?

The answer is no. Recall that a set I of vertices is independent if and only if its complement S = V − I is a vertex cover. Given a minimum-size vertex cover S*, we obtain a maximum-size independent set by taking the complement I* = V − S*. Now suppose we use an approximation algorithm for the Vertex Cover Problem to get an approximately minimum vertex cover S. The complement I = V − S is indeed an independent set; there's no problem there. The trouble is when we try to determine our approximation factor for the Independent Set Problem; I can be very far from optimal. Suppose, for example, that the optimal vertex cover S* and the optimal independent set I* both have size |V|/2. If we invoke a 2-approximation algorithm for the Vertex Cover Problem, we may perfectly well get back the set S = V. But, in this case, our "approximately maximum independent set" I = V − S has no elements.
Designing the Algorithm: The Pricing Method
Even though (11.12) gave us an approximation algorithm with a provable guarantee, we will be able to do better. Our approach forms a nice illustration of the pricing method for designing approximation algorithms.

The Pricing Method to Minimize Cost The pricing method (also known as the primal-dual method) is motivated by an economic perspective. For the case of the Vertex Cover Problem, we will think of the weights on the nodes as costs, and we will think of each edge as having to pay for its "share" of the cost of the vertex cover we find. We have actually just seen an analysis of this sort, in the greedy algorithm for Set Cover from Section 11.3; it too can be thought of as a pricing algorithm. The greedy algorithm for Set Cover defined values cs, the cost the algorithm paid for covering element s. We can think of cs as the element s's "share" of the cost. Statement (11.9) shows that it is very natural to think of the values cs as cost-shares, as the sum of the cost-shares Σ_{s∈U} cs is the cost of the set cover C returned by the algorithm, Σ_{Si∈C} wi. The key to proving that the algorithm is an H(d*)-approximation algorithm was a certain approximate "fairness" property for the cost-shares: (11.10) shows that the elements in a set Sk are charged by at most an H(|Sk|) factor more than the cost of covering them by the set Sk.

In this section, we'll develop the pricing technique through another application, Vertex Cover. Again, we will think of the weight wi of the vertex i as the cost for using i in the cover. We will think of each edge e as a separate "agent" who is willing to "pay" something to the node that covers it. The algorithm will not only find a vertex cover S, but also determine prices pe ≥ 0 for each edge e ∈ E, so that if each edge e ∈ E pays the price pe, this will in total approximately cover the cost of S. These prices pe are the analogues of cs from the Set Cover Algorithm.

Thinking of the edges as agents suggests some natural fairness rules for prices, analogous to the property proved by (11.10). First of all, selecting a vertex i covers all edges incident to i, so it would be "unfair" to charge these incident edges in total more than the cost of vertex i. We call prices pe fair if, for each vertex i, the edges adjacent to i do not have to pay more than the cost of the vertex: Σ_{e=(i,j)} pe ≤ wi. Note that the property proved by (11.10) for Set Cover is an approximate fairness condition, while in the Vertex Cover algorithm we'll actually use the exact fairness defined here. A useful fact about fair prices is that they provide a lower bound on the cost of any solution.

(11.13) For any vertex cover S*, and any nonnegative and fair prices pe, we have Σ_{e∈E} pe ≤ w(S*).

Proof. Consider a vertex cover S*. By the definition of fairness, we have Σ_{e=(i,j)} pe ≤ wi for all nodes i ∈ S*. Adding these inequalities over all nodes in S*, we get

    Σ_{i∈S*} Σ_{e=(i,j)} pe ≤ Σ_{i∈S*} wi = w(S*).

Now the expression on the left-hand side is a sum of terms, each of which is some edge price pe. Since S* is a vertex cover, each edge e contributes at least one term pe to the left-hand side. It may contribute more than one copy of pe to this sum, since it may be covered from both ends by S*; but the prices are nonnegative, and so the sum on the left-hand side is at least as large as the sum of all prices pe. That is,

    Σ_{e∈E} pe ≤ Σ_{i∈S*} Σ_{e=(i,j)} pe.

Combining this with the previous inequality, we get Σ_{e∈E} pe ≤ w(S*), as desired. ∎

The Algorithm The goal of the approximation algorithm will be to find a vertex cover and to set prices at the same time. We can think of the algorithm as being greedy in how it sets the prices. It then uses these prices to drive the way it selects nodes for the vertex cover.

We say that a node i is tight (or "paid for") if Σ_{e=(i,j)} pe = wi.

Vertex-Cover-Approx(G, w):
    Set pe = 0 for all e ∈ E
    While there is an edge e = (i, j) such that neither i nor j is tight
        Select such an edge e
        Increase pe without violating fairness
    EndWhile
    Let S be the set of all tight nodes
    Return S

For example, consider the execution of this algorithm on the instance in Figure 11.8. Initially, no node is tight; the algorithm decides to select the edge (a, b). It can raise the price paid by (a, b) up to 3, at which point the node b becomes tight and it stops. The algorithm then selects the edge (a, d). It can only raise this price up to 1, since at this point the node a becomes tight (due to the fact that the weight of a is 4, and it is already incident to an edge that is
paying 3). Finally, the algorithm selects the edge (c, d). It can raise the price paid by (c, d) up to 2, at which point d becomes tight. We now have a situation where all edges have at least one tight end, so the algorithm terminates. The tight nodes are a, b, and d; so this is the resulting vertex cover. (Note that this is not the minimum-weight vertex cover; that would be obtained by selecting a and c.)

Figure 11.8 Parts (a)-(d) depict the steps in an execution of the pricing algorithm on an instance of the weighted Vertex Cover Problem. The numbers inside the nodes indicate their weights; the numbers annotating the edges indicate the prices they pay as the algorithm proceeds.

Analyzing the Algorithm
At first sight, one may have the sense that the vertex cover S is fully paid for by the prices: all nodes in S are tight, and hence the edges adjacent to the node i in S can pay for the cost of i. But the point is that an edge e can be adjacent to more than one node in the vertex cover (i.e., if both ends of e are in the vertex cover), and hence e may have to pay for more than one node. This is the case, for example, with the edges (a, b) and (a, d) at the end of the execution in Figure 11.8.

However, notice that if we take edges for which both ends happened to show up in the vertex cover, and we charge them their price twice, then we're exactly paying for the vertex cover. (In the example, the cost of the cover is the cost of nodes a, b, and d, which is 10. We can account for this cost exactly by charging (a, b) and (a, d) twice, and (c, d) once.) Now, it's true that this is unfair to some edges, but the amount of unfairness can be bounded: Each edge gets charged its price at most two times (once for each end). We now make this argument precise, as follows.

(11.14) The set S and prices p returned by the algorithm satisfy the inequality w(S) ≤ 2 Σ_{e∈E} pe.

Proof. All nodes in S are tight, so we have Σ_{e=(i,j)} pe = wi for all i ∈ S. Adding over all nodes in S we get

    w(S) = Σ_{i∈S} wi = Σ_{i∈S} Σ_{e=(i,j)} pe.

An edge e = (i, j) can be included in the sum on the right-hand side at most twice (if both i and j are in S), and so we get

    w(S) ≤ 2 Σ_{e∈E} pe,

as claimed. ∎

Finally, this factor of 2 carries into an argument that yields the approximation guarantee.

(11.15) The set S returned by the algorithm is a vertex cover, and its cost is at most twice the cost of a minimum vertex cover.

Proof. First note that S is indeed a vertex cover. Suppose, by contradiction, that S does not cover edge e = (i, j). This implies that neither i nor j is tight, and this contradicts the fact that the While loop of the algorithm terminated.

To get the claimed approximation bound, we simply put together statement (11.14) with (11.13). Let p be the prices set by the algorithm, and let S* be an optimal vertex cover. By (11.14) we have 2 Σ_{e∈E} pe ≥ w(S), and by (11.13) we have Σ_{e∈E} pe ≤ w(S*). In other words, the sum of the edge prices is a lower bound on the weight of any vertex cover, and twice the sum of the edge prices is an upper bound on the weight of our vertex cover:

    w(S) ≤ 2 Σ_{e∈E} pe ≤ 2w(S*). ∎
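A minimal Python sketch of the pricing algorithm above (not from the text; function and variable names are illustrative). A single pass over the edges suffices: after an edge is processed, at least one of its endpoints is tight, so that edge can never be selected again.

    def pricing_vertex_cover(weights, edges):
        """weights: node -> w_i (nonnegative); edges: list of pairs (i, j).
        Returns (vertex cover S, edge prices p) following Vertex-Cover-Approx."""
        price = {e: 0.0 for e in edges}
        charged = {v: 0.0 for v in weights}     # total price of edges incident to v

        def tight(v):
            return charged[v] >= weights[v]

        for (i, j) in edges:
            if tight(i) or tight(j):
                continue
            # raise p_e as far as fairness allows, i.e. until i or j becomes tight
            delta = min(weights[i] - charged[i], weights[j] - charged[j])
            price[(i, j)] += delta
            charged[i] += delta
            charged[j] += delta

        S = [v for v in weights if tight(v)]
        return S, price

    # small hypothetical instance, loosely modeled on the discussion of Figure 11.8
    # (the exact weights of that figure are not reproduced here)
    S, p = pricing_vertex_cover({"a": 4, "b": 3, "c": 3, "d": 3},
                                [("a", "b"), ("a", "d"), ("c", "d")])
    print(S, p)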
11.5 Maximization via the Pricing Method: The Disjoint Paths Problem

We now continue the theme of pricing algorithms with a fundamental problem that arises in network routing: the Disjoint Paths Problem. We'll start out by developing a greedy algorithm for this problem and then show an improved algorithm based on pricing.

The Problem
To set up the problem, it helps to recall one of the first applications we saw for the Maximum-Flow Problem: finding disjoint paths in graphs, which we discussed in Chapter 7. There we were looking for edge-disjoint paths all starting at a node s and ending at a node t. How crucial is it to the tractability of this problem that all paths have to start and end at the same node? Using the technique from Section 7.7, one can extend this to find disjoint paths where we are given a set of start nodes S and a set of terminals T, and the goal is to find edge-disjoint paths where paths may start at any node in S and end at any node in T.

Here, however, we will look at a case where each path to be routed has its own designated starting node and ending node. Specifically, we consider the following Maximum Disjoint Paths Problem. We are given a directed graph G, together with k pairs of nodes (s1, t1), (s2, t2), ..., (sk, tk) and an integer capacity c. We think of each pair (si, ti) as a routing request, which asks for a path from si to ti. A solution to this instance consists of a subset of the requests we will satisfy, I ⊆ {1, ..., k}, together with paths that satisfy them while not overloading any one edge: a path Pi for i ∈ I so that Pi goes from si to ti, and each edge is used by at most c paths. The problem is to find a solution with |I| as large as possible, that is, to satisfy as many requests as possible. Note that the capacity c controls how much "sharing" of edges we allow; when c = 1, we are requiring the paths to be fully edge-disjoint, while larger c allows some overlap among the paths.

We have seen in Exercise 39 in Chapter 8 that it is NP-complete to determine whether all k routing requests can be satisfied when the paths are required to be node-disjoint. It is not hard to show that the edge-disjoint version of the problem (corresponding to the case with c = 1) is also NP-complete.

Thus it turns out to have been crucial for the application of efficient network flow algorithms that the endpoints of the paths not be explicitly paired up as they are in Maximum Disjoint Paths. To develop this point a little further, suppose we attempted to reduce Maximum Disjoint Paths to a network flow problem by defining the set of sources to be S = {s1, s2, ..., sk}, defining the set of sinks to be T = {t1, t2, ..., tk}, setting each edge capacity to be c, and looking for the maximum possible number of disjoint paths starting in S and ending in T. Why wouldn't this work? The problem is that there's no way to tell the flow algorithm that a path starting at si ∈ S must end at ti ∈ T; the algorithm guarantees only that this path will end at some node in T. As a result, the paths that come out of the flow algorithm may well not constitute a solution to the instance of Maximum Disjoint Paths, since they might not link a source si to its corresponding endpoint ti.

"Disjoint paths" problems, where we need to find paths connecting designated pairs of terminal nodes, are very common in networking applications. Just think about paths on the Internet that carry streaming media or Web data, or paths through the phone network carrying voice traffic.¹ Paths sharing edges can interfere with each other, and too many paths sharing a single edge will cause problems in most applications. The maximum allowable amount of sharing will differ from application to application. Requiring the paths to be disjoint is the strongest constraint, eliminating all interference between paths. We'll see, however, that in cases where some sharing is allowed (even just two paths to an edge), better approximation algorithms are possible.

¹ A researcher from the telecommunications industry once gave the following explanation for the distinction between Maximum Disjoint Paths and network flow, and the broken reduction in the previous paragraph. On Mother's Day, traditionally the busiest day of the year for telephone calls, the phone company must solve an enormous disjoint paths problem: ensuring that each individual si is connected by a path through the voice network to his or her mother ti. Network flow algorithms, finding disjoint paths between a set S and a set T, on the other hand, will ensure only that each person gets their call through to somebody's mother.

Designing and Analyzing a Greedy Algorithm
We first consider a very simple algorithm for the case when the capacity c = 1: that is, when the paths need to be edge-disjoint. The algorithm is essentially greedy, except that it exhibits a preference for short paths. We will show that this simple algorithm is an O(√m)-approximation algorithm, where m = |E| is the number of edges in G. This may sound like a rather large factor of approximation, and it is, but there is a strong sense in which it is essentially the best we can do. The Maximum Disjoint Paths Problem is not only NP-complete, but it is also hard to approximate: It has been shown that unless P = NP, it is impossible for any polynomial-time algorithm to achieve an approximation bound significantly better than O(√m) in arbitrary directed graphs.

After developing the greedy algorithm, we will consider a slightly more sophisticated pricing algorithm for the capacitated version. It is interesting
to note that the pricing algorithm does much better than the simple greedy algorithm, even when the capacity c is only slightly more than 1.

Greedy-Disjoint-Paths:
    Set I = ∅
    Until no new path can be found
        Let Pi be the shortest path (if one exists) that is edge-disjoint
            from previously selected paths, and connects some (si, ti) pair
            that is not yet connected
        Add i to I and select path Pi to connect si to ti
    EndUntil

Figure 11.9 A case in which it's crucial that a greedy algorithm for selecting disjoint paths favors short paths over long ones.

To analyze the algorithm, we compare the set I of paths it selects with an optimal solution I*, distinguishing short paths from long ones and bounding them separately. We will call a path long if it has at least √m edges, and we will call it short otherwise. Let I*_s denote the set of indices in I* so that the corresponding path P*_i is short, and let Is denote the set of indices in I so that the corresponding path Pi is short.

The graph G has m edges, and each long path uses at least √m edges, so there can be at most √m long paths in I*.

Now consider the short paths in I*. In order for I* to be much larger than I, there would have to be many pairs that are connected in I* but not in I. Thus let us consider pairs that are connected by the optimum using a short path, but are not connected by the greedy algorithm. Since the path P*_i connecting si and ti in the optimal solution I* is short, the greedy algorithm would have selected this path, if it had been available, before selecting any long paths. But the greedy algorithm did not connect si and ti at all, and hence one of the edges e along the path P*_i must occur in a path Pj that was selected earlier by the greedy algorithm. We will say that edge e blocks the path P*_i.

Now the lengths of the paths selected by the greedy algorithm are monotone increasing, since each iteration has fewer options for choosing paths. The path Pj was selected before considering P*_i and hence it must be shorter: |Pj| ≤ |P*_i| ≤ √m. So path Pj is short. Since the paths used by the optimum are edge-disjoint, each edge in a path Pj can block at most one path P*_i. It follows that each short path Pj blocks at most √m paths in the optimal solution, and so the number of short paths in I* whose pairs are left unconnected by the greedy algorithm is at most √m · |Is|.
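A runnable sketch of Greedy-Disjoint-Paths for the edge-disjoint case c = 1 (not from the text; it uses networkx for shortest paths, and the function name is an illustrative choice):

    import networkx as nx

    def greedy_disjoint_paths(G, pairs):
        """G: directed graph (nx.DiGraph); pairs: list of (s_i, t_i) requests.
        Repeatedly routes the not-yet-connected pair with the fewest-edge path
        among the unused edges, then removes that path's edges (c = 1)."""
        H = G.copy()                                   # edges still available
        routed = {}
        while True:
            best = None
            for idx, (s, t) in enumerate(pairs):
                if idx in routed:
                    continue
                try:
                    P = nx.shortest_path(H, s, t)      # fewest edges
                except (nx.NetworkXNoPath, nx.NodeNotFound):
                    continue
                if best is None or len(P) < len(best[1]):
                    best = (idx, P)
            if best is None:
                break
            idx, P = best
            routed[idx] = P
            H.remove_edges_from(zip(P, P[1:]))         # keep future paths edge-disjoint
        return routed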
Designing and Analyzing a Pricing Algorithm
Not letting any two paths use the same edge is quite extreme; in most applications one can allow a few paths to share an edge. We will now develop an analogous algorithm, based on the pricing method, for the case where c > 1 paths may share any edge. In the disjoint case just considered, we viewed all edges as equal and preferred short paths. We can think of this as a simple kind of pricing algorithm: the paths have to pay for using up the edges, and each edge has a unit cost. Here we will consider a pricing scheme in which edges are viewed as more expensive if they have been used already, and hence have less capacity left over. This will encourage the algorithm to "spread out" its paths, rather than piling them up on any single edge. We will refer to the cost of an edge e as its length ℓe, and define the length of a path to be the sum of the lengths of the edges it contains: ℓ(P) = Σ_{e∈P} ℓe. We will use a multiplicative parameter β to increase the length of an edge each time an additional path uses it.

Greedy-Paths-with-Capacity:
    Set I = ∅
    Set edge length ℓe = 1 for all e ∈ E
    Until no new path can be found
        Let Pi be the shortest path (if one exists) so that adding Pi to
            the selected set of paths does not use any edge more than c
            times, and Pi connects some (si, ti) pair not yet connected
        Add i to I and select path Pi to connect si to ti
        Multiply the length of all edges along Pi by β
    EndUntil

Analyzing the Algorithm For the analysis we will focus on the simplest case, when at most two paths may use the same edge, that is, when c = 2. We'll see that, for this case, setting β = m^{1/3} will give the best approximation result for this algorithm. Unlike the disjoint paths case (when c = 1), it is not known whether the approximation bounds we obtain here for c > 1 are close to the best possible for polynomial-time algorithms in general, assuming P ≠ NP.

The key to the analysis in the disjoint case was to distinguish "short" and "long" paths. For the case when c = 2, we will consider a path Pi selected by the algorithm to be short if the length is less than β². Let Is denote the set of short paths selected by the algorithm.

Next we want to compare the number of paths selected with the maximum possible. Let I* be an optimal solution and {P*_i} be the set of paths used in this solution. As before, the key to the analysis is to consider the edges that block the selection of paths in I*. Long paths can block a lot of other paths, so for now we will focus on the short paths in Is. As we try to continue following what we did in the disjoint case, we immediately run into a difficulty, however. In that case, the length of a path in I* was simply the number of edges it contained; but here, the lengths are changing as the algorithm runs, and so it is not clear how to define the length of a path in I* for purposes of the analysis. In other words, for the analysis, when should we measure this length? (At the beginning of the execution? At the end?)

It turns out that the crucial moment in the algorithm, for purposes of our analysis, is the first point at which there are no short paths left to choose. Let ℓ̄ be the length function at this point in the execution of the algorithm; we'll use ℓ̄ to measure the length of paths in I*. For a path P, we use ℓ̄(P) to denote its length, Σ_{e∈P} ℓ̄e. We consider a path P*_i in the optimal solution I* short if ℓ̄(P*_i) < β², and long otherwise. Let I*_s denote the set of short paths in I*. The first step is to show that there are no short paths connecting pairs that are not connected by the approximation algorithm.

(11.17) Consider a source-sink pair i ∈ I* that is not connected by the approximation algorithm; that is, i ∉ I. Then ℓ̄(P*_i) ≥ β².

Proof. As long as short paths are being selected, we do not have to worry about explicitly enforcing the requirement that each edge be used by at most c = 2 paths: any edge e considered for selection by a third path would already have length ℓe = β², and hence be long.

Consider the state of the algorithm with length ℓ̄. By the argument in the previous paragraph, we can imagine the algorithm having run up to this point without caring about the limit of c; it just selected a short path whenever it could find one. Since the endpoints si, ti of P*_i are not connected by the greedy algorithm, and since there are no short paths left when the length function reaches ℓ̄, it must be the case that path P*_i has length at least β² as measured by ℓ̄. ∎

The analysis in the disjoint case used the fact that there are only m edges to limit the number of long paths. Here we consider length ℓ̄, rather than the number of edges, as the quantity that is being consumed by paths. Hence, to be able to reason about this, we will need a bound on the total length in the graph Σe ℓ̄e. The sum of the lengths over all edges Σe ℓe starts out at m (length 1 for each edge). Adding a short path to the solution Is can increase the length by at most β³, as the selected path has length at most β², and the lengths of the edges are increased by a β factor along the path. This gives us a useful comparison between the number of short paths selected and the total length.
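The capacitated variant can be sketched in the same style (again not from the text; it uses networkx shortest paths, with names chosen for illustration). Edge lengths start at 1, are multiplied by β whenever a selected path uses the edge, and an edge used c times is closed to further paths; β defaults to m^{1/3}, the choice analyzed here for c = 2.

    import networkx as nx

    def greedy_paths_with_capacity(G, pairs, c=2, beta=None):
        m = G.number_of_edges()
        beta = beta if beta is not None else m ** (1.0 / 3.0)
        length = {e: 1.0 for e in G.edges()}
        uses = {e: 0 for e in G.edges()}
        routed = {}
        while True:
            # subgraph of edges with residual capacity, weighted by current length
            H = nx.DiGraph()
            H.add_nodes_from(G.nodes())
            for e in G.edges():
                if uses[e] < c:
                    H.add_edge(*e, length=length[e])
            best = None
            for idx, (s, t) in enumerate(pairs):
                if idx in routed:
                    continue
                try:
                    P = nx.shortest_path(H, s, t, weight="length")
                except (nx.NetworkXNoPath, nx.NodeNotFound):
                    continue
                cost = sum(length[e] for e in zip(P, P[1:]))
                if best is None or cost < best[0]:
                    best = (cost, idx, P)
            if best is None:
                break
            _, idx, P = best
            routed[idx] = P
            for e in zip(P, P[1:]):
                uses[e] += 1                # edge used by one more path
                length[e] *= beta           # make heavily used edges look expensive
        return routed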
(11.18) The set Is of short paths selected by the approximation algorithm, and the lengths ℓ̄, satisfy the relation Σe ℓ̄e ≤ β³|Is| + m.

Finally, we prove an approximation bound for this algorithm. We will find that even though we have simply increased the number of paths allowed on each edge from 1 to 2, the approximation guarantee drops by a significant amount that essentially incorporates this change into the exponent: from O(m^{1/2}) down to O(m^{1/3}).

11.6 Linear Programming and Rounding: An Application to Vertex Cover

We will start by introducing a powerful technique from operations research: linear programming. Linear programming is the subject of entire courses, and we will not attempt to provide any kind of comprehensive overview of it here. In this section, we will introduce some of the basic ideas underlying linear programming and show how these can be used to approximate NP-hard optimization problems.

Recall that in Section 11.4 we developed a 2-approximation algorithm for the weighted Vertex Cover Problem. As a first application for the linear programming technique, we'll give here a different 2-approximation algorithm that is conceptually much simpler (though slower in running time).

We first describe the Linear Programming Problem in its general form. Given an m × n matrix A, and vectors b ∈ R^m and c ∈ R^n, find a vector x ∈ R^n to solve the following optimization problem:

    min(c^t x such that x ≥ 0; Ax ≥ b).
Figure 11.10 shows the feasible region satisfying the inequalities

    x1 ≥ 0, x2 ≥ 0
    x1 + 2x2 ≥ 6
    2x1 + x2 ≥ 6.

Figure 11.10 The feasible region of a simple linear program.

c^t x is often called the objective function of the linear program, and Ax ≥ b is called the set of constraints. For example, suppose we define the vector c to be (1.5, 1) in the example in Figure 11.10; in other words, we are seeking to minimize the quantity 1.5x1 + x2 over the region defined by the inequalities. The solution to this would be to choose the point x = (2, 2), where the two slanting lines cross; this yields a value of c^t x = 5, and one can check that there is no way to get a smaller value.

We can phrase Linear Programming as a decision problem in the following way.

    Given a matrix A, vectors b and c, and a bound γ, does there exist x so that x ≥ 0, Ax ≥ b, and c^t x ≤ γ?

To avoid issues related to how we represent real numbers, we will assume that the coordinates of the vectors and matrices involved are integers.

The Computational Complexity of Linear Programming The decision version of Linear Programming is in NP. This is intuitively very believable; we just have to exhibit a vector x satisfying the desired properties. The one concern is that even if all the input numbers are integers, such a vector x may not have integer coordinates, and it may in fact require very large precision to specify: How do we know that we'll be able to read and manipulate it in polynomial time? But, in fact, one can show that if there is a solution, then there is one that is rational and needs only a polynomial number of bits to write down; so this is not a problem.

Linear Programming was also known to be in co-NP for a long time, though this is not as easy to see. Students who have taken a linear programming course may notice that this fact follows from linear programming duality.²

² Those of you who are familiar with duality may also notice that the pricing method of the previous sections is motivated by linear programming duality: the prices are exactly the variables in the dual linear program (which explains why pricing algorithms are often referred to as primal-dual algorithms).

For a long time, indeed, Linear Programming was the most famous example of a problem in both NP and co-NP that was not known to have a polynomial-time solution. Then, in 1981, Leonid Khachiyan, who at the time was a young researcher in the Soviet Union, gave a polynomial-time algorithm for the problem. After some initial concern in the U.S. popular press that this discovery might turn out to be a Sputnik-like event in the Cold War (it didn't), researchers settled down to understand exactly what Khachiyan had done. His initial algorithm, while polynomial-time, was in fact quite slow and impractical; but since then practical polynomial-time algorithms, so-called interior point methods, have also been developed following the work of Narendra Karmarkar in 1984.

Linear programming is an interesting example for another reason as well. The most widely used algorithm for this problem is the simplex method. It works very well in practice and is competitive with polynomial-time interior point methods on real-world problems. Yet its worst-case running time is known to be exponential; it is simply that this exponential behavior shows up in practice only very rarely. For all these reasons, linear programming has been a very useful and important example for thinking about the limits of polynomial time as a formal definition of efficiency.

For our purposes here, though, the point is that linear programming problems can be solved in polynomial time, and very efficient algorithms exist in practice. You can learn a lot more about all this in courses on linear programming. The question we ask here is this: How can linear programming help us when we want to solve combinatorial problems such as Vertex Cover?
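As a quick illustration (not from the text), the small linear program of Figure 11.10 can be solved with an off-the-shelf solver; here is a sketch using scipy, which expects "≤" constraints, so each "≥" row is negated:

    from scipy.optimize import linprog

    # minimize 1.5*x1 + x2  subject to  x1 + 2*x2 >= 6,  2*x1 + x2 >= 6,  x >= 0
    c = [1.5, 1.0]
    A_ub = [[-1.0, -2.0],          # -(x1 + 2*x2) <= -6
            [-2.0, -1.0]]          # -(2*x1 + x2) <= -6
    b_ub = [-6.0, -6.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None), (0, None)], method="highs")
    print(res.x, res.fun)          # expected: approximately [2, 2] with value 5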
Vertex Cover as an Integer Program
We now try to formulate a linear program that is in close correspondence with the Vertex Cover Problem. Thus we consider a graph G = (V, E) with a weight wi ≥ 0 on each node i. Linear programming is based on the use of vectors of variables. In our case, we will have a decision variable xi for each node i ∈ V to model the choice of whether to include node i in the vertex cover; xi = 0 will indicate that node i is not in the vertex cover, and xi = 1 will indicate that node i is in the vertex cover. We can create a single n-dimensional vector x in which the i-th coordinate corresponds to the i-th decision variable xi.

We use linear inequalities to encode the requirement that the selected nodes form a vertex cover; we use the objective function to encode the goal of minimizing the total weight. For each edge (i, j) ∈ E, it must have one end in the vertex cover, and we write this as the inequality xi + xj ≥ 1. Finally, to express the minimization problem, we write the set of node weights as an n-dimensional vector w, with the i-th coordinate corresponding to wi; we then seek to minimize w^t x. In summary, we have formulated the Vertex Cover Problem as follows.

(VC.IP)  Min Σ_{i∈V} wi xi
         s.t.  xi + xj ≥ 1      for all (i, j) ∈ E
               xi ∈ {0, 1}      for all i ∈ V.

We claim that the vertex covers of G are in one-to-one correspondence with the solutions x to this system of linear inequalities in which all coordinates are equal to 0 or 1.

(11.21) S is a vertex cover in G if and only if the vector x, defined as xi = 1 for i ∈ S, and xi = 0 for i ∉ S, satisfies the constraints in (VC.IP). Further, we have w(S) = w^t x.

We can put this system into the matrix form we used for linear programming, as follows. We define a matrix A whose columns correspond to the nodes in V and whose rows correspond to the edges in E; entry A[e, i] = 1 if node i is an end of the edge e, and 0 otherwise. (Note that each row has exactly two nonzero entries.) If we use 1 to denote the vector with all coordinates equal to 1, and 0 to denote the vector with all coordinates equal to 0, then the system of inequalities above can be written as

    Ax ≥ 1,  1 ≥ x ≥ 0.

But keep in mind that this is not just an instance of the Linear Programming Problem: We have crucially required that all coordinates in the solution be either 0 or 1. So our formulation suggests that we should solve the problem

    min(w^t x subject to 1 ≥ x ≥ 0, Ax ≥ 1, x has integer coordinates).

This is an instance of the Linear Programming Problem in which we require the coordinates of x to take integer values; without this extra constraint, the coordinates of x could be arbitrary real numbers. We call this problem Integer Programming, as we are looking for integer-valued solutions to a linear program.

Integer Programming is considerably harder than Linear Programming; indeed, our discussion really constitutes a reduction from Vertex Cover to the decision version of Integer Programming. In other words, we have proved

(11.22) Vertex Cover ≤P Integer Programming.

To show the NP-completeness of Integer Programming, we would still have to establish that the decision version is in NP. There is a complication here, as with Linear Programming, since we need to establish that there is always a solution x that can be written using a polynomial number of bits. But this can indeed be proven. Of course, for our purposes, the integer program we are dealing with is explicitly constrained to have solutions in which each coordinate is either 0 or 1. Thus it is clearly in NP, and our reduction from Vertex Cover establishes that even this special case is NP-complete.

Using Linear Programming for Vertex Cover
We have yet to resolve whether our foray into linear and integer programming will turn out to be useful or simply a dead end. Trying to solve the integer programming problem (VC.IP) optimally is clearly not the right way to go, as this is NP-hard.

The way to make progress is to exploit the fact that Linear Programming is not as hard as Integer Programming. Suppose we take (VC.IP) and modify it, dropping the requirement that each xi ∈ {0, 1} and reverting to the constraint that each xi is an arbitrary real number between 0 and 1. This gives us an instance of the Linear Programming Problem that we could call (VC.LP), and we can solve it in polynomial time: We can find a set of values {x*_i} between 0 and 1 so that x*_i + x*_j ≥ 1 for each edge (i, j), and Σi wi x*_i is minimized. Let x* denote this vector, and wLP = w^t x* denote the value of the objective function.
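The relaxation (VC.LP), together with the rounding rule analyzed below (take every node with x*_i ≥ 1/2), fits in a few lines of scipy; this is an illustrative sketch, not the text's own code:

    from scipy.optimize import linprog

    def lp_round_vertex_cover(weights, edges):
        """weights: node -> w_i; edges: list of pairs (i, j).
        Solves (VC.LP) and rounds x_i >= 1/2 up; returns (cover, w_LP)."""
        nodes = list(weights)
        pos = {v: k for k, v in enumerate(nodes)}
        c = [weights[v] for v in nodes]
        A_ub, b_ub = [], []
        for (i, j) in edges:
            row = [0.0] * len(nodes)
            row[pos[i]] = row[pos[j]] = -1.0     # x_i + x_j >= 1 becomes -x_i - x_j <= -1
            A_ub.append(row)
            b_ub.append(-1.0)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, 1)] * len(nodes), method="highs")
        cover = [v for v in nodes if res.x[pos[v]] >= 0.5]
        return cover, res.fun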
We note the following basic fact.

(11.23) Let S* denote a vertex cover of minimum weight. Then wLP ≤ w(S*).

Proof. Vertex covers of G correspond to integer solutions of (VC.IP), so the minimum of min(w^t x : 1 ≥ x ≥ 0, Ax ≥ 1) over all integer x vectors is exactly the minimum-weight vertex cover. To get the minimum of the linear program (VC.LP), we allow x to take arbitrary real-number values, that is, we minimize over many more choices of x, and so the minimum of (VC.LP) is no larger than that of (VC.IP). ∎

Note that (11.23) is one of the crucial ingredients we need for an approximation algorithm: a good lower bound on the optimum, in the form of the efficiently computable quantity wLP.

However, wLP can definitely be smaller than w(S*). For example, if the graph G is a triangle and all weights are 1, then the minimum vertex cover has a weight of 2. But, in a linear programming solution, we can set xi = 1/2 for all three vertices, and so get a linear programming solution of weight only 3/2. As a more general example, consider a graph on n nodes in which each pair of nodes is connected by an edge. Again, all weights are 1. Then the minimum vertex cover has weight n − 1, but we can find a linear programming solution of value n/2 by setting xi = 1/2 for all vertices i.

So the question is: How can solving this linear program help us actually find a near-optimal vertex cover? The idea is to work with the values x*_i and to infer a vertex cover S from them. It is natural that if x*_i = 1 for some node i, then we should put it in the vertex cover S; and if x*_i = 0, then we should leave it out of S. But what should we do with fractional values in between? What should we do if x*_i = .4 or x*_i = .5? The natural approach here is to round. Given a fractional solution {x*_i}, we define S = {i ∈ V : x*_i ≥ 1/2}; that is, we round values at least 1/2 up, and those below 1/2 down.

(11.24) The set S defined in this way is a vertex cover, and w(S) ≤ 2wLP.

Proof. First we argue that S is a vertex cover. Consider an edge e = (i, j). We claim that at least one of i and j must be in S. Recall that one of our inequalities is xi + xj ≥ 1. So in any solution x* that satisfies this inequality, either x*_i ≥ 1/2 or x*_j ≥ 1/2. Thus at least one of these two will be rounded up, and i or j will be placed in S.

Now we consider the weight w(S) of this vertex cover. The set S only has vertices with x*_i ≥ 1/2; thus the linear program "paid" at least wi/2 for node i, and we only pay wi: at most twice as much. More formally, we have the following chain of inequalities:

    wLP = Σ_i wi x*_i ≥ Σ_{i∈S} wi x*_i ≥ (1/2) Σ_{i∈S} wi = w(S)/2. ∎

Thus we have produced a vertex cover S of weight at most 2wLP. The lower bound in (11.23) showed that the optimal vertex cover has weight at least wLP, and so we have the following result.

(11.25) The algorithm produces a vertex cover S of at most twice the minimum possible weight.

* 11.7 Load Balancing Revisited: A More Advanced LP Application
In this section we consider a more general load balancing problem. We will develop an approximation algorithm using the same general outline as the 2-approximation we just designed for Vertex Cover: We solve a corresponding linear program, and then round the solution. However, the algorithm and its analysis here will be significantly more complex than what was needed for Vertex Cover. It turns out that the instance of the Linear Programming Problem we need to solve is, in fact, a flow problem. Using this fact, we will be able to develop a much deeper understanding of what the fractional solutions to the linear program look like, and we will use this understanding in order to round them. For this problem, the only known constant-factor approximation algorithm is based on rounding this linear programming solution.

The Problem
The problem we consider in this section is a significant, but natural, generalization of the Load Balancing Problem with which we began our study of approximation algorithms. There, as here, we have a set J of n jobs, and a set M of m machines, and the goal is to assign each job to a machine so that the maximum load on any machine will be as small as possible. In the simple Load Balancing Problem we considered earlier, each job j can be assigned to any machine i. Here, on the other hand, we will restrict the set of machines that each job may consider; that is, for each job there is just a subset of machines to which it can be assigned. This restriction arises naturally in a number of applications: for example, we may be seeking to balance load while maintaining the property that each job is assigned to a physically nearby machine, or to a machine with an appropriate authorization to process the job.

More formally, each job j has a fixed given size tj > 0 and a set of machines Mj ⊆ M that it may be assigned to. The sets Mj can be completely arbitrary. We call an assignment of jobs to machines feasible if each job j is assigned to a machine i ∈ Mj. The goal is still to minimize the maximum load on any machine: Using Ji ⊆ J to denote the jobs assigned to a machine i ∈ M in a feasible assignment, and using Li = Σ_{j∈Ji} tj to denote the resulting load,
we seek to minimize maxi Li. This is the definition of the Generalized Load Balancing Problem.

In addition to containing our initial Load Balancing Problem as a special case (setting Mj = M for all jobs j), Generalized Load Balancing includes the Bipartite Perfect Matching Problem as another special case. Indeed, given a bipartite graph with the same number of nodes on each side, we can view the nodes on the left as jobs and the nodes on the right as machines; we define tj = 1 for all jobs j, and define Mj to be the set of machine nodes i such that there is an edge (i, j) ∈ E. There is an assignment of maximum load 1 if and only if there is a perfect matching in the bipartite graph. (Thus, network flow techniques can be used to find the optimum load in this special case.) The fact that Generalized Load Balancing includes both these problems as special cases gives some indication of the challenge in designing an algorithm for it.

Designing and Analyzing the Algorithm
We now develop an approximation algorithm based on linear programming for the Generalized Load Balancing Problem. The basic plan is the same one we saw in the previous section: we'll first formulate the problem as an equivalent linear program where the variables have to take specific discrete values; we'll then relax this to a linear program by dropping this requirement on the values of the variables; and then we'll use the resulting fractional assignment to obtain an actual assignment that is close to optimal. We'll need to be more careful than in the case of the Vertex Cover Problem in rounding the solution to produce the actual assignment.

Integer and Linear Programming Formulations First we formulate the Generalized Load Balancing Problem as a linear program with restrictions on the variable values. We use variables xij corresponding to each pair (i, j) of machine i ∈ M and job j ∈ J. Setting xij = 0 will indicate that job j is not assigned to machine i, while setting xij = tj will indicate that all the load tj of job j is assigned to machine i. We can think of x as a single vector with mn coordinates. We use linear inequalities to encode the requirement that each job is assigned to a machine: For each job j we require that Σi xij = tj. The load of a machine i can then be expressed as Li = Σj xij. We require that xij = 0 whenever i ∉ Mj. We will use the objective function to encode the goal of finding an assignment that minimizes the maximum load. To do this, we will need one more variable, L, that will correspond to the load. We use the inequalities Σj xij ≤ L for all machines i. In summary, we have formulated the following problem.

(GL.IP)  min L
         Σi xij = tj      for all j ∈ J
         Σj xij ≤ L       for all i ∈ M
         xij ∈ {0, tj}    for all j ∈ J, i ∈ Mj
         xij = 0          for all j ∈ J, i ∉ Mj.

First we claim that the feasible assignments are in one-to-one correspondence with the solutions x satisfying the above constraints, and, in an optimal solution to (GL.IP), L is the load of the corresponding assignment.

(11.26) An assignment of jobs to machines has load at most L if and only if the vector x, defined by setting xij = tj whenever job j is assigned to machine i, and xij = 0 otherwise, satisfies the constraints in (GL.IP), with L set to the maximum load of the assignment.

Next we will consider the corresponding linear program obtained by replacing the requirement that each xij ∈ {0, tj} by the weaker requirement that xij ≥ 0 for all j ∈ J and i ∈ Mj. Let (GL.LP) denote the resulting linear program. It would also be natural to add xij ≤ tj. We do not add these inequalities explicitly, as they are implied by the nonnegativity and the equation Σi xij = tj that is required for each job j.

We immediately see that if there is an assignment with load at most L, then (GL.LP) must have a solution with value at most L. Or, in the contrapositive,

(11.27) If the optimum value of (GL.LP) is L, then the optimal load is at least L* ≥ L.

We can use linear programming to obtain such a solution (x, L) in polynomial time. Our goal will then be to use x to create an assignment. Recall that the Generalized Load Balancing Problem is NP-hard, and hence we cannot expect to solve it exactly in polynomial time. Instead, we will find an assignment with load at most two times the minimum possible. To be able to do this, we will also need the simple lower bound (11.2), which we used already in the original Load Balancing Problem.

(11.28) The optimal load is at least L* ≥ maxj tj.

Rounding the Solution When There Are No Cycles The basic idea is to round the xij values to 0 or tj. However, we cannot use the simple idea of just rounding large values up and small values down. The problem is that the linear programming solution may assign small fractions of a job j to each of
the m machines, and hence for some jobs there may be no large xij values. The algorithm we develop will be a rounding of x in the weak sense that each job j will be assigned to a machine i with xij > 0, but we may have to round a few really small values up. This weak rounding already ensures that the assignment is feasible, in the sense that we do not assign any job j to a machine i not in Mj (because if i ∉ Mj, then we have xij = 0).

The key is to understand what the structure of the fractional solution is like and to show that while a few jobs may be spread out to many machines, this cannot happen to too many jobs. To this end, we'll consider the following bipartite graph G(x) = (V(x), E(x)): The nodes are V(x) = M ∪ J, the set of jobs and the set of machines, and there is an edge (i, j) ∈ E(x) if and only if xij > 0.

We'll show that, given any solution for (GL.LP), we can obtain a new solution x with the same load L, such that G(x) has no cycles. This is the crucial step, as we show that a solution x with no cycles can be used to obtain an assignment with load at most L + L*.

(11.29) Given a solution (x, L) of (GL.LP) such that the graph G(x) has no cycles, we can use this solution x to obtain a feasible assignment of jobs to machines with load at most L + L* in O(mn) time.

Proof. Since the graph G(x) has no cycles, each of its connected components is a tree. We can produce the assignment by considering each component separately. Thus, consider one of the components, which is a tree whose nodes correspond to jobs and machines, as shown in Figure 11.11.

First, root the tree at an arbitrary node. Now consider a job j. If the node corresponding to job j is a leaf of the tree, let machine node i be its parent. Since j has degree 1 in the tree G(x), machine i is the only machine that has been assigned any part of job j, and hence we must have that xij = tj. Our assignment will assign such a job j to its only neighbor i. For a job j whose corresponding node is not a leaf in G(x), we assign j to an arbitrary child of the corresponding node in the rooted tree.

Figure 11.11 An example of a graph G(x) with no cycles, where the squares are machines and the circles are jobs. The solid lines show the resulting assignment of jobs to machines.

The method can clearly be implemented in O(mn) time (including the time to set up the graph G(x)). It defines a feasible assignment, as the linear program (GL.LP) required that xij = 0 whenever i ∉ Mj. To finish the proof, we need to show that the load is at most L + L*. Let i be any machine, and let Ji be the set of jobs assigned to machine i. The jobs assigned to machine i form a subset of the neighbors of i in G(x): the set Ji contains those children of node i that are leaves, plus possibly the parent p(i) of node i. To bound the load, we consider the parent p(i) separately. For all other jobs j ≠ p(i) assigned to i, we have xij = tj, and hence we can bound the load using the solution x, as follows:

    Σ_{j∈Ji, j≠p(i)} tj ≤ Σj xij ≤ L,

using the inequality bounding the load in (GL.LP). For the parent j = p(i) of node i, we use tj ≤ L* by (11.28). Adding the two inequalities, we get that Σ_{j∈Ji} tj ≤ L + L*, as claimed. ∎

Now, by (11.27), we know that L ≤ L*, so a solution whose load is bounded by L + L* is also bounded by 2L*; in other words, twice the optimum. Thus we have the following consequence of (11.29).

(11.30) Given a solution (x, L) of (GL.LP) such that the graph G(x) has no cycles, then we can use this solution x to obtain a feasible assignment of jobs to machines with load at most twice the optimum in O(mn) time.

Eliminating Cycles from the Linear Programming Solution To wrap up our approximation algorithm, then, we just need to show how to convert
an arbitrary solution of (GL.LP) into a solution x with no cycles in G(x). In the process, we will also show how to obtain a solution to the linear program (GL.LP) using flow computations. More precisely, given a fixed load value L, we show how to use a flow computation to decide if (GL.LP) has a solution with value at most L. For this construction, consider the following directed graph G = (V, E) shown in Figure 11.12. The set of vertices of the flow graph G will be V = M ∪ J ∪ {v}, where v is a new node. The nodes j ∈ J will be sources with supply tj, and the only demand node is the new sink v, which has demand Σj tj. We'll think of the flow in this network as "load" flowing from jobs to the sink v via the machines. We add an edge (j, i) with infinite capacity from job j to machine i if and only if i ∈ Mj. Finally, we add an edge (i, v) for each machine node i with capacity L.

Figure 11.12 The network flow computation used to find a solution to (GL.LP). Edges between the jobs and machines have infinite capacity.

(11.31) The solutions of this flow problem with capacity L are in one-to-one correspondence with solutions of (GL.LP) with value L, where xij is the flow value along the edge (j, i), and the flow value on the edge (i, v) is the load Σj xij on machine i.

This statement allows us to solve (GL.LP) using flow computations and a binary search for the optimal value L: we try successive values of L until we find the smallest one for which there is a feasible flow.

Here we'll use the understanding we gained of (GL.LP) from the equivalent flow formulation to modify a solution x to eliminate all cycles from G(x). In terms of the flow we have just defined, G(x) is the undirected graph obtained from G by ignoring the directions of the edges, deleting the sink v and all adjacent edges, and also deleting all edges from J to M that do not carry flow. We'll eliminate all cycles in G(x) in a sequence of at most mn steps, where the goal of a single step is to eliminate at least one edge from G(x) without increasing the load L or introducing any new edges.

(11.32) Let (x, L) be a solution to (GL.LP) and let C be a cycle in G(x). In time linear in the length of the cycle, we can modify the solution x to eliminate at least one edge from G(x) without increasing the load or introducing any new edges.

Proof. Consider the cycle C in G(x). Recall that G(x) corresponds to the set of edges that carry flow in the solution x. We will modify the solution by augmenting the flow along the cycle C, using essentially the procedure augment from Section 7.1. The augmentation along a cycle will not change the balance between incoming and outgoing flow at any node; rather, it will eliminate one backward edge from the residual graph, and hence an edge of G(x). Assume the cycle C consists of the edges (i1, j1), (j1, i2), (i2, j2), ..., (ik, jk), (jk, i1), where iℓ is a machine node and jℓ is a job node. We'll modify the solution by decreasing the flow along all edges (jℓ, iℓ) and increasing the flow on the edges (jℓ, iℓ+1) for all ℓ = 1, ..., k (where k + 1 is used to denote 1), by the same amount δ. This change will not affect the flow conservation constraints. By setting δ = min_{ℓ=1,...,k} x_{iℓjℓ}, we ensure that the flow remains feasible and the edge obtaining the minimum is deleted from G(x). ∎

We can use the algorithm contained in the proof of (11.32) repeatedly to eliminate all cycles from G(x). Initially, G(x) may have mn edges, so after at most O(mn) iterations, the resulting solution (x, L) will have no cycles in G(x). At this point, we can use (11.30) to obtain a feasible assignment with at most twice the optimal load. We summarize the result by the following statement.

(11.33) Given an instance of the Generalized Load Balancing Problem, we can find, in polynomial time, a feasible assignment with load at most twice the minimum possible.
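A sketch of the flow formulation (not from the text): it builds the network of Figure 11.12 with networkx, using a super-source in place of the individual job supplies, and binary-searches for the smallest L admitting a feasible flow. Function names and tolerances are illustrative.

    import networkx as nx

    def glb_flow_feasible(sizes, allowed, machines, L):
        """sizes: job -> t_j; allowed: job -> iterable of machines M_j.
        Returns True if the network of Figure 11.12 can ship all load with
        machine capacities L, i.e. (GL.LP) has a solution of value <= L."""
        G = nx.DiGraph()
        total = sum(sizes.values())
        for j, t in sizes.items():
            G.add_edge("s", ("job", j), capacity=t)   # super-source replaces supplies
            for i in allowed[j]:
                G.add_edge(("job", j), ("mach", i))   # no capacity attribute = infinite
        for i in machines:
            G.add_edge(("mach", i), "v", capacity=L)
        value, _ = nx.maximum_flow(G, "s", "v")
        return value >= total - 1e-9

    def glb_lp_value(sizes, allowed, machines, iters=50):
        """Binary search for the smallest feasible L, as described after (11.31)."""
        lo, hi = 0.0, float(sum(sizes.values()))
        for _ in range(iters):
            mid = (lo + hi) / 2.0
            if glb_flow_feasible(sizes, allowed, machines, mid):
                hi = mid
            else:
                lo = mid
        return hi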
11.8 Arbitrarily Good Approximations: The Knapsack Problem

Often, when you talk to someone faced with an NP-hard optimization problem, they're hoping you can give them something that will produce a solution within, say, 1 percent of the optimum, or at least within a small percentage of optimal. Viewed from this perspective, the approximation algorithms we've seen thus far come across as quite weak: solutions within a factor of 2 of the minimum for Center Selection and Vertex Cover (i.e., 100 percent more than optimal). The Set Cover Algorithm in Section 11.3 is even worse: Its cost is not even within a fixed constant factor of the minimum possible!

Here is an important point underlying this state of affairs: NP-complete problems, as you well know, are all equivalent with respect to polynomial-time solvability; but assuming P ≠ NP, they differ considerably in the extent to which their solutions can be efficiently approximated. In some cases, it is actually possible to prove limits on approximability. For example, if P ≠ NP, then the guarantee provided by our Center Selection Algorithm is the best possible for any polynomial-time algorithm. Similarly, the guarantee provided by the Set Cover Algorithm, however bad it may seem, is very close to the best possible, unless P = NP. For other problems, such as the Vertex Cover Problem, the approximation algorithm we gave is essentially the best known, but it is an open question whether there could be polynomial-time algorithms with better guarantees. We will not discuss the topic of lower bounds on approximability in this book; while some lower bounds of this type are not so difficult to prove (such as for Center Selection), many are extremely technical.

The Problem

In this section, we discuss an NP-complete problem for which it is possible to design a polynomial-time algorithm providing a very strong approximation. We will consider a slightly more general version of the Knapsack (or Subset Sum) Problem. Suppose you have n items that you consider packing in a knapsack. Each item i = 1, ..., n has two integer parameters: a weight w_i and a value v_i. Given a knapsack capacity W, the goal of the Knapsack Problem is to find a subset S of items of maximum value subject to the restriction that the total weight of the set should not exceed W. In other words, we wish to maximize Σ_{i∈S} v_i subject to the condition Σ_{i∈S} w_i ≤ W.

How strong an approximation can we hope for? Our algorithm will take as input the weights and values defining the problem and will also take an extra parameter ε, the desired precision. It will find a subset S whose total weight does not exceed W, with value Σ_{i∈S} v_i at most a (1 + ε) factor below the maximum possible. The algorithm will run in polynomial time for any fixed choice of ε > 0; however, the dependence on ε will not be polynomial. We call such an algorithm a polynomial-time approximation scheme.

You may ask: How could such a strong kind of approximation algorithm be possible in polynomial time when the Knapsack Problem is NP-hard? With integer values, if we get close enough to the optimum value, we must reach the optimum itself! The catch is in the nonpolynomial dependence on the desired precision: for any fixed choice of ε, such as ε = .5, ε = .2, or even ε = .01, the algorithm runs in polynomial time, but as we change ε to smaller and smaller values, the running time gets larger. By the time we make ε small enough to make sure we get the optimum value, it is no longer a polynomial-time algorithm.

Designing the Algorithm

In Section 6.4 we considered algorithms for the Subset Sum Problem, the special case of the Knapsack Problem when v_i = w_i for all items i. We gave a dynamic programming algorithm for this special case that ran in O(nW) time assuming the weights are integers. This algorithm naturally extends to the more general Knapsack Problem (see the end of Section 6.4 for this extension). The algorithm given in Section 6.4 works well when the weights are small (even if the values may be big). It is also possible to extend our dynamic programming algorithm for the case when the values are small, even if the weights may be big. At the end of this section, we give a dynamic programming algorithm for that case running in time O(n²v*), where v* = max_i v_i. Note that this algorithm does not run in polynomial time: It is only pseudo-polynomial, because of its dependence on the size of the values v_i. Indeed, since we proved this problem to be NP-complete in Chapter 8, we don't expect to be able to find a polynomial-time algorithm.

Algorithms that depend on the values in a pseudo-polynomial way can often be used to design polynomial-time approximation schemes, and the algorithm we develop here is a very clean example of the basic strategy. In particular, we will use the dynamic programming algorithm with running time O(n²v*) to design a polynomial-time approximation scheme; the idea is as follows. If the values are small integers, then v* is small and the problem can be solved in polynomial time already. On the other hand, if the values are large, then we do not have to deal with them exactly, as we only want an approximately optimum solution. We will use a rounding parameter b (whose value we'll set later) and will consider the values rounded to an integer multiple of b. We will use our dynamic programming algorithm to solve the problem with the rounded values. More precisely, for each item i, let its rounded value be ṽ_i = ⌈v_i/b⌉ b. Note that the rounded and the original value are quite close to each other.
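To make the rounding strategy concrete, here is a small Python sketch (not from the text; the function names are made up). knapsack_dp_by_value is the value-indexed dynamic program mentioned above, which runs in O(n²v*) time when the values are positive integers, and knapsack_approx applies it to the scaled values v̂_i = ⌈v_i/b⌉, using the specific choice b = (ε/(2n)) max_i v_i that the algorithm below adopts.

    import math

    def knapsack_dp_by_value(weights, values, W):
        """Exact Knapsack by dynamic programming over values: dp[i][v] is the
        minimum total weight of a subset of the first i items with total value
        exactly v.  Runs in O(n * V) time, where V = sum(values) <= n * max(values).
        Returns (best_value, chosen_indices)."""
        n = len(values)
        V = sum(values)
        INF = float("inf")
        dp = [[INF] * (V + 1) for _ in range(n + 1)]
        dp[0][0] = 0
        for i in range(1, n + 1):
            wi, vi = weights[i - 1], values[i - 1]
            for v in range(V + 1):
                dp[i][v] = dp[i - 1][v]                        # item i not used
                if v >= vi and dp[i - 1][v - vi] + wi < dp[i][v]:
                    dp[i][v] = dp[i - 1][v - vi] + wi          # item i used
        best = max(v for v in range(V + 1) if dp[n][v] <= W)   # best achievable value
        chosen, v = [], best
        for i in range(n, 0, -1):                              # trace back the chosen items
            if dp[i][v] != dp[i - 1][v]:
                chosen.append(i - 1)
                v -= values[i - 1]
        return best, chosen

    def knapsack_approx(weights, values, W, eps):
        """The rounding scheme sketched in the text: round each value up to a
        multiple of b, scale down by b, and solve the scaled problem exactly.
        Assumes positive integer values."""
        n = len(values)
        usable = [i for i in range(n) if weights[i] <= W]      # heavier items can never be packed
        if not usable:
            return []
        b = (eps / (2 * n)) * max(values[i] for i in usable)
        scaled = [math.ceil(values[i] / b) for i in usable]    # the scaled values ⌈v_i / b⌉
        _, chosen = knapsack_dp_by_value([weights[i] for i in usable], scaled, W)
        return [usable[j] for j in chosen]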
(11.34) For each item i we have v_i ≤ ṽ_i ≤ v_i + b.

What did we gain by the rounding? If the values were big to start with, we did not make them any smaller. However, the rounded values are all integer multiples of a common value b. So, instead of solving the problem with the rounded values ṽ_i, we can change the units; we can divide all values by b and get an equivalent problem. Let v̂_i = ṽ_i/b = ⌈v_i/b⌉ for i = 1, ..., n.

(11.35) The Knapsack Problem with values ṽ_i and the scaled problem with values v̂_i have the same set of optimum solutions, the optimum values differ exactly by a factor of b, and the scaled values are integral.

Now we are ready to state our approximation algorithm. We will assume that all items have weight at most W (as items with weight w_i > W are not in any solution, and hence can be deleted). We also assume for simplicity that ε^{-1} is an integer.

  Knapsack-Approx(ε):
    Set b = (ε/(2n)) max_i v_i
    Solve the Knapsack Problem with values v̂_i (equivalently ṽ_i)
    Return the set S of items found

(11.36) The algorithm Knapsack-Approx runs in polynomial time for any fixed ε > 0.

Proof. The running time is dominated by the dynamic programming step, which depends on the largest scaled value, max_i v̂_i. The item j with maximum value v_j = max_i v_i also has maximum value in the rounded problem, so max_i v̂_i = v̂_j = ⌈v_j/b⌉ = 2nε^{-1}. Hence the overall running time of the algorithm is O(n³ε^{-1}). Note that this is polynomial time for any fixed ε > 0 as claimed; but the dependence on the desired precision ε is not polynomial, as the running time includes ε^{-1} rather than log ε^{-1}. ∎

Finally, we need to consider the key issue: How good is the solution obtained by this algorithm? Statement (11.34) shows that the values ṽ_i we used are close to the real values v_i, and this suggests that the solution obtained may not be far from optimal.

(11.37) If S is the solution found by the Knapsack-Approx algorithm, and S* is any other solution satisfying Σ_{i∈S*} w_i ≤ W, then we have (1 + ε) Σ_{i∈S} v_i ≥ Σ_{i∈S*} v_i.

Proof. Let S* be any set satisfying Σ_{i∈S*} w_i ≤ W. Our algorithm finds the optimal solution with the rounded values ṽ_i, so we know that Σ_{i∈S} ṽ_i ≥ Σ_{i∈S*} ṽ_i.
The Shortest-First Algorithm only terminates when every unselected interval conflicts with one of the intervals it selected. So, in particular, each interval in O is either included in A, or conflicts with an interval in A.

Now we use the following accounting scheme to bound the number of intervals in O. For each i ∈ O, we have some interval j ∈ A "pay" for i, as follows. If i is also in A, then i will pay for itself. Otherwise, we arbitrarily choose an interval j ∈ A that conflicts with i and have j pay for i. As we just argued, every interval in O conflicts with some interval in A, so all intervals in O will be paid for under this scheme. But by (11.41), each interval j ∈ A conflicts with at most two intervals in O, and so it will only pay for at most two intervals in O.

2. At a lecture in a computational biology conference one of us attended a few years ago, a well-known protein chemist talked about the idea of building a "representative set" for a large collection of protein molecules whose properties we don't understand. The idea would be to intensively study the proteins in the representative set and thereby learn (by inference) about all the proteins in the full collection.

To be useful, the representative set must have two properties.

• It should be relatively small, so that it will not be too expensive to study it.
• Every protein in the full collection should be "similar" to some protein in the representative set. (In this way, it truly provides some information about all the proteins.)

More concretely, there is a large set P of proteins. We define similarity on proteins by a distance function d: Given two proteins p and q, it returns a number d(p, q) ≥ 0. In fact, the function d(·, ·) most typically used is the sequence alignment measure, which we looked at when we studied dynamic programming in Chapter 6. We'll assume this is the distance being used here. There is a predefined distance cut-off Δ that's specified as part of the input to the problem; two proteins p and q are deemed to be "similar" to one another if and only if d(p, q) ≤ Δ.

We say that a subset of P is a representative set if, for every protein p, there is a protein q in the subset that is similar to it--that is, for which d(p, q) ≤ Δ. Our goal is to find a representative set that is as small as possible.

(a) Give a polynomial-time algorithm that approximates the minimum representative set to within a factor of O(log n). Specifically, your algorithm should have the following property: If the minimum possible size of a representative set is s*, your algorithm should return a representative set of size at most O(s* log n).

(b) Note the close similarity between this problem and the Center Selection Problem--a problem for which we considered approximation algorithms in Section 11.2. Why doesn't the algorithm described there solve the current problem?

3. Suppose you are given a set of positive integers A = {a_1, a_2, ..., a_n} and a positive integer B. A subset S ⊆ A is called feasible if the sum of the numbers in S does not exceed B:

  Σ_{a_i∈S} a_i ≤ B.

The sum of the numbers in S will be called the total sum of S.

You would like to select a feasible subset S of A whose total sum is as large as possible.

Example. If A = {8, 2, 4} and B = 11, then the optimal solution is the subset S = {8, 2}.

(a) Here is an algorithm for this problem (a code sketch follows part (b) below).

  Initially S = ∅
  Define T = 0
  For i = 1, 2, ..., n
    If T + a_i ≤ B then
      S ← S ∪ {a_i}
      T ← T + a_i
    Endif
  Endfor

Give an instance in which the total sum of the set S returned by this algorithm is less than half the total sum of some other feasible subset of A.

(b) Give a polynomial-time approximation algorithm for this problem with the following guarantee: It returns a feasible set S ⊆ A whose total sum is at least half as large as the maximum total sum of any feasible set S' ⊆ A. Your algorithm should have a running time of at most O(n log n).
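For concreteness, the algorithm stated in part (a) is just the following one-pass greedy rule (a Python sketch, not part of the exercise; it simply mirrors the pseudocode above):

    def greedy_feasible_subset(a, B):
        """Scan the numbers in the given order and add each one that still fits
        under the bound B; returns the selected subset."""
        S, T = [], 0
        for ai in a:
            if T + ai <= B:
                S.append(ai)
                T += ai
        return S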
4. Consider an optimization version of the Hitting Set Problem defined as follows. We are given a set A = {a_1, ..., a_n} and a collection B_1, B_2, ..., B_m of subsets of A. Also, each element a_i ∈ A has a weight w_i ≥ 0. The problem is to find a hitting set H ⊆ A such that the total weight of the elements in H, that is, Σ_{a_i∈H} w_i, is as small as possible. (As in Exercise 5 in Chapter 8, we say that H is a hitting set if H ∩ B_i is not empty for each i.) Let b = max_i |B_i| denote the maximum size of any of the sets B_1, B_2, ..., B_m. Give a polynomial-time approximation algorithm for this problem that finds a hitting set whose total weight is at most b times the minimum possible.

5. You are asked to consult for a business where clients bring in jobs each day for processing. Each job has a processing time t_i that is known when the job arrives. The company has a set of ten machines, and each job can be processed on any of these ten machines.

At the moment the business is running the simple Greedy-Balance Algorithm we discussed in Section 11.1. They have been told that this may not be the best approximation algorithm possible, and they are wondering if they should be afraid of bad performance. However, they are reluctant to change the scheduling as they really like the simplicity of the current algorithm: jobs can be assigned to machines as soon as they arrive, without having to defer the decision until later jobs arrive.

In particular, they have heard that this algorithm can produce solutions with makespan as much as twice the minimum possible; but their experience with the algorithm has been quite good: They have been running it each day for the last month, and they have not observed it to produce a makespan more than 20 percent above the average load, (1/10) Σ_i t_i.
To try understanding why they don't seem to be encountering this factor-of-two behavior, you ask a bit about the kind of jobs and loads they see. You find out that the sizes of jobs range between 1 and 50, that is, 1 ≤ t_i ≤ 50 for all jobs i; and the total load Σ_i t_i is quite high each day: it is always at least 3,000.

Prove that on the type of inputs the company sees, the Greedy-Balance Algorithm will always find a solution whose makespan is at most 20 percent above the average load.

6. Recall that in the basic Load Balancing Problem from Section 11.1, we're interested in placing jobs on machines so as to minimize the makespan--the maximum load on any one machine. In a number of applications, it is natural to consider cases in which you have access to machines with different amounts of processing power, so that a given job may complete more quickly on one of your machines than on another. The question then becomes: How should you allocate jobs to machines in these more heterogeneous systems?

Here's a basic model that exposes these issues. Suppose you have a system that consists of m slow machines and k fast machines. The fast machines can perform twice as much work per unit time as the slow machines. Now you're given a set of n jobs; job i takes time t_i to process on a slow machine and time ½t_i to process on a fast machine. You want to assign each job to a machine; as before, the goal is to minimize the makespan--that is, the maximum, over all machines, of the total processing time of jobs assigned to that machine.

Give a polynomial-time algorithm that produces an assignment of jobs to machines with a makespan that is at most three times the optimum.

7. You're consulting for an e-commerce site that receives a large number of visitors each day. For each visitor i, where i ∈ {1, 2, ..., n}, the site has assigned a value v_i, representing the expected revenue that can be obtained from this customer.

Each visitor i is shown one of m possible ads A_1, A_2, ..., A_m as they enter the site. The site wants a selection of one ad for each customer so that each ad is seen, overall, by a set of customers of reasonably large total weight. Thus, given a selection of one ad for each customer, we will define the spread of this selection to be the minimum, over j = 1, 2, ..., m, of the total weight of all customers who were shown ad A_j.

Example. Suppose there are six customers with values 3, 4, 12, 2, 4, 6, and there are m = 3 ads. Then, in this instance, one could achieve a spread of 9 by showing ad A_1 to customers 1, 2, 4, ad A_2 to customer 3, and ad A_3 to customers 5 and 6.

The ultimate goal is to find a selection of an ad for each customer that maximizes the spread. Unfortunately, this optimization problem is NP-hard (you don't have to prove this). So instead, we will try to approximate it.

(a) Give a polynomial-time algorithm that approximates the maximum spread to within a factor of 2. That is, if the maximum spread is s, then your algorithm should produce a selection of one ad for each customer that has spread at least s/2. In designing your algorithm, you may assume that no single customer has a value that is significantly above the average; specifically, if V = Σ_{i=1}^{n} v_i denotes the total value of all customers, then you may assume that no single customer has a value exceeding V/(2m).

(b) Give an example of an instance on which the algorithm you designed in part (a) does not find an optimal solution (that is, one of maximum spread). Say what the optimal solution is in your sample instance, and what your algorithm finds.

8. Some friends of yours are working with a system that performs real-time scheduling of jobs on multiple servers, and they've come to you for help in getting around an unfortunate piece of legacy code that can't be changed.

Here's the situation. When a batch of jobs arrives, the system allocates them to servers using the simple Greedy-Balance Algorithm from Section 11.1, which provides an approximation to within a factor of 2. In the decade and a half since this part of the system was written, the hardware has gotten faster to the point where, on the instances that the system needs to deal with, your friends find that it's generally possible to compute an optimal solution.

The difficulty is that the people in charge of the system's internals won't let them change the portion of the software that implements the Greedy-Balance Algorithm so as to replace it with one that finds the optimal solution. (Basically, this portion of the code has to interact with so many other parts of the system that it's not worth the risk of something going wrong if it's replaced.)

After grumbling about this for a while, your friends come up with an alternate idea. Suppose they could write a little piece of code that takes the description of the jobs, computes an optimal solution (since they're able to do this on the instances that arise in practice), and then feeds the jobs to the Greedy-Balance Algorithm in an order that will cause it to allocate them optimally. In other words, they're hoping to be able to
reorder the input in such a way that when Greedy-Balance encounters the input in this order, it produces an optimal solution.

So their question to you is simply the following: Is this always possible? Their conjecture is,

For every instance of the load balancing problem from Section 11.1, there exists an order of the jobs so that when Greedy-Balance processes the jobs in this order, it produces an assignment of jobs to machines with the minimum possible makespan.

Decide whether you think this conjecture is true or false, and give either a proof or a counterexample.

9. Consider the following maximization version of the 3-Dimensional Matching Problem. Given disjoint sets X, Y, and Z, and given a set T ⊆ X × Y × Z of ordered triples, a subset M ⊆ T is a 3-dimensional matching if each element of X ∪ Y ∪ Z is contained in at most one of these triples. The Maximum 3-Dimensional Matching Problem is to find a 3-dimensional matching M of maximum size. (The size of the matching, as usual, is the number of triples it contains. You may assume |X| = |Y| = |Z| if you want.)

Give a polynomial-time algorithm that finds a 3-dimensional matching of size at least ½ times the maximum possible size.

10. Suppose you are given an n × n grid graph G, as in Figure 11.13.

Figure 11.13 A grid graph.

Associated with each node v is a weight w(v), which is a nonnegative integer. You may assume that the weights of all nodes are distinct. Your goal is to choose an independent set S of nodes of the grid, so that the sum of the weights of the nodes in S is as large as possible. (The sum of the weights of the nodes in S will be called its total weight.)

Consider the following greedy algorithm for this problem (a code sketch follows part (b) below).

  The "heaviest-first" greedy algorithm:
    Start with S equal to the empty set
    While some node remains in G
      Pick a node v_i of maximum weight
      Add v_i to S
      Delete v_i and its neighbors from G
    Endwhile
    Return S

(a) Let S be the independent set returned by the "heaviest-first" greedy algorithm, and let T be any other independent set in G. Show that, for each node v ∈ T, either v ∈ S, or there is a node v' ∈ S so that w(v) ≤ w(v') and (v, v') is an edge of G.

(b) Show that the "heaviest-first" greedy algorithm returns an independent set of total weight at least ¼ times the maximum total weight of any independent set in the grid graph G.
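A direct rendering of the heaviest-first rule (a Python sketch, not part of the exercise; the graph is assumed to be given as a weight dictionary w and an adjacency dictionary adj):

    def heaviest_first(w, adj):
        """Heaviest-first greedy independent set: repeatedly pick the remaining
        node of maximum weight, add it to S, and delete it and its neighbors."""
        remaining = set(w)
        S = []
        while remaining:
            u = max(remaining, key=lambda v: w[v])   # node of maximum weight
            S.append(u)
            remaining.discard(u)                     # delete u ...
            remaining.difference_update(adj[u])      # ... and its neighbors from G
        return S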
11. Recall that in the Knapsack Problem, we have n items, each with a weight w_i and a value v_i. We also have a weight bound W, and the problem is to select a set of items S of highest possible value subject to the condition that the total weight does not exceed W--that is, Σ_{i∈S} w_i ≤ W. Here's one way to look at the approximation algorithm that we designed in this chapter.

If we are told there exists a subset O whose total weight is Σ_{i∈O} w_i ≤ W and whose total value is Σ_{i∈O} v_i = V for some V, then our approximation algorithm can find a set A with total weight Σ_{i∈A} w_i ≤ W and total value at least Σ_{i∈A} v_i ≥ V/(1 + ε). Thus the algorithm approximates the best value, while keeping the weights strictly under W. (Of course, returning the set O is always a valid solution, but since the problem is NP-hard, we don't expect to always be able to find O itself; the approximation bound of 1 + ε means that other sets A, with slightly less value, can be valid answers as well.)

Now, as is well known, you can always pack a little bit more for a trip just by "sitting on your suitcase"--in other words, by slightly overflowing the allowed weight limit. This too suggests a way of formalizing the approximation question for the Knapsack Problem, but it's the following, different, formalization.
Suppose, as before, that you're given n items with weights and values, as well as parameters W and V; and you're told that there is a subset O whose total weight is Σ_{i∈O} w_i ≤ W and whose total value is Σ_{i∈O} v_i = V for some V. For a given fixed ε > 0, design a polynomial-time algorithm that finds a subset of items A such that Σ_{i∈A} w_i ≤ (1 + ε)W and Σ_{i∈A} v_i ≥ V. In other words, you want A to achieve at least as high a total value as the given bound V, but you're allowed to exceed the weight limit W by a factor of 1 + ε.

Example. Suppose you're given four items, with weights and values as follows:

  (w_1, v_1) = (5, 3), (w_2, v_2) = (4, 6)
  (w_3, v_3) = (1, 4), (w_4, v_4) = (6, 11)

You're also given W = 10 and V = 13 (since, indeed, the subset consisting of the first three items has total weight at most 10 and has value 13). Finally, you're given ε = .1. This means you need to find (via your approximation algorithm) a subset of weight at most (1 + .1) · 10 = 11 and value at least 13. One valid solution would be the subset consisting of the first and fourth items, with value 14 ≥ 13. (Note that this is a case where you're able to achieve a value strictly greater than V, since you're allowed to slightly overfill the knapsack.)

12. Consider the following problem. There is a set U of n nodes, which we can think of as users (e.g., these are locations that need to access a service, such as a Web server). You would like to place servers at multiple locations. Suppose you are given a set S of possible sites that would be willing to act as locations for the servers. For each site s ∈ S, there is a fee f_s ≥ 0 for placing a server at that location. Your goal will be to approximately minimize the cost while providing the service to each of the customers. So far this is very much like the Set Cover Problem: The places s are sets, the weight of set s is f_s, and we want to select a collection of sets that covers all users. There is one extra complication: Users u ∈ U can be served from multiple sites, but there is an associated cost d_us for serving user u from site s. When the value d_us is very high, we do not want to serve user u from site s; and in general the service cost d_us serves as an incentive to serve customers from "nearby" servers whenever possible.

So here is the question, which we call the Facility Location Problem: Given the sets U and S, and costs f and d, you need to select a subset A ⊆ S at which to place servers (at a cost of Σ_{s∈A} f_s), and assign each user u to the active server where it is cheapest to be served, min_{s∈A} d_us. The goal is to minimize the overall cost Σ_{s∈A} f_s + Σ_{u∈U} min_{s∈A} d_us. Give an H(n)-approximation for this problem.

(Note that if all service costs d_us are 0 or infinity, then this problem is exactly the Set Cover Problem: f_s is the cost of the set named s, and d_us is 0 if node u is in set s, and infinity otherwise.)

Notes and Further Reading

The design of approximation algorithms for NP-hard problems is an active area of research, and it is the focus of a book of surveys edited by Hochbaum (1996) and a text by Vazirani (2001).

The greedy algorithm for load balancing and its analysis is due to Graham (1966, 1969); in fact, he proved that when the jobs are first sorted in descending order of size, the greedy algorithm achieves an assignment within a factor 4/3 of optimal. (In the text, we give a simpler proof for the weaker bound of 3/2.) Using more complicated algorithms, even stronger approximation guarantees can be proved for this problem (Hochbaum and Shmoys 1987; Hall 1996). The techniques used for these stronger load balancing approximation algorithms are also closely related to the method described in the text for designing arbitrarily good approximations for the Knapsack Problem.

The approximation algorithm for the Center Selection Problem follows the approach of Hochbaum and Shmoys (1985) and Dyer and Frieze (1985). Other geometric location problems of this flavor are discussed by Bern and Eppstein (1996) and in the book of surveys edited by Drezner (1995).

The greedy algorithm for Set Cover and its analysis are due independently to Johnson (1974), Lovász (1975), and Chvátal (1979). Further results for the Set Cover Problem are discussed in the survey by Hochbaum (1996).

As mentioned in the text, the pricing method for designing approximation algorithms is also referred to as the primal-dual method and can be motivated using linear programming. This latter perspective is the subject of the survey by Goemans and Williamson (1996). The pricing algorithm to approximate the Weighted Vertex Cover Problem is due to Bar-Yehuda and Even (1981).

The greedy algorithm for the disjoint paths problem is due to Kleinberg and Tardos (1995); the pricing-based approximation algorithm for the case when multiple paths can share an edge is due to Awerbuch, Azar, and Plotkin (1993). Algorithms have been developed for many other variants of the Disjoint Paths Problem; see the book of surveys edited by Korte et al. (1990) for a discussion of cases that can be solved optimally in polynomial time, and Plotkin (1995) and Kleinberg (1996) for surveys of work on approximation.
The linear programming rounding algorithm for the Weighted Vertex Cover Problem is due to Hochbaum (1982). The rounding algorithm for Generalized Load Balancing is due to Lenstra, Shmoys, and Tardos (1990); see the survey by Hall (1996) for other results in this vein. As discussed in the text, these two results illustrate a widely used method for designing approximation algorithms: One sets up an integer programming formulation for the problem, transforms it to a related (but not equivalent) linear programming problem, and then rounds the resulting solution. Vazirani (2001) discusses many further applications of this technique.

Local search and randomization are two other powerful techniques for designing approximation algorithms; we discuss these connections in the next two chapters.

One topic that we do not cover in this book is inapproximability. Just as one can prove that a given NP-hard problem can be approximated to within a certain factor in polynomial time, one can also sometimes establish lower bounds, showing that if the problem could be approximated to within better than some factor c in polynomial time, then it could be solved optimally, thereby proving P = NP. There is a growing body of work that establishes such limits to approximability for many NP-hard problems. In certain cases, these positive and negative results have lined up perfectly to produce an approximation threshold, establishing for certain problems that there is a polynomial-time approximation algorithm to within some factor c, and it is impossible to do better unless P = NP. Some of the early results on inapproximability were not so difficult to prove, but more recent work has introduced powerful techniques that become quite intricate. This topic is covered in the survey by Arora and Lund (1996).

Notes on the Exercises Exercises 4 and 12 are based on results of Dorit Hochbaum. Exercise 11 is based on results of Sartaj Sahni, Oscar Ibarra, and Chul Kim, and of Dorit Hochbaum and David Shmoys.

Chapter 12

Local Search

In the previous two chapters, we have considered techniques for dealing with computationally intractable problems: in Chapter 10, by identifying structured special cases of NP-hard problems, and in Chapter 11, by designing polynomial-time approximation algorithms. We now develop a third and final topic related to this theme: the design of local search algorithms.

Local search is a very general technique; it describes any algorithm that "explores" the space of possible solutions in a sequential fashion, moving in one step from a current solution to a "nearby" one. The generality and flexibility of this notion has the advantage that it is not difficult to design a local search approach to almost any computationally hard problem; the counterbalancing disadvantage is that it is often very difficult to say anything precise or provable about the quality of the solutions that a local search algorithm finds, and consequently very hard to tell whether one is using a good local search algorithm or a poor one.

Our discussion of local search in this chapter will reflect these trade-offs. Local search algorithms are generally heuristics designed to find good, but not necessarily optimal, solutions to computational problems, and we begin by talking about what the search for such solutions looks like at a global level. A useful intuitive basis for this perspective comes from connections with energy minimization principles in physics, and we explore this issue first. Our discussion for this part of the chapter will have a somewhat different flavor from what we've generally seen in the book thus far; here, we'll introduce some algorithms, discuss them qualitatively, but admit quite frankly that we can't prove very much about them.

There are cases, however, in which it is possible to prove properties of local search algorithms, and to bound their performance relative to an
optimal solution. This will be the focus of the latter part of the chapter: We begin by considering a case--the dynamics of Hopfield neural networks--in which local search provides the natural way to think about the underlying behavior of a complex process; we then focus on some NP-hard problems for which local search can be used to design efficient algorithms with provable approximation guarantees. We conclude the chapter by discussing a different type of local search: the game-theoretic notions of best-response dynamics and Nash equilibria, which arise naturally in the study of systems that contain many interacting agents.

12.1 The Landscape of an Optimization Problem

Much of the core of local search was developed by people thinking in terms of analogies with physics. Looking at the wide range of hard computational problems that require the minimization of some quantity, they reasoned as follows. Physical systems are performing minimization all the time, when they seek to minimize their potential energy. What can we learn from the ways in which nature performs minimization? Does it suggest new kinds of algorithms?

If the world really looked the way a freshman mechanics class suggests, it seems that it would consist entirely of hockey pucks sliding on ice and balls rolling down inclined surfaces. Hockey pucks usually slide because you push them; but why do balls roll downhill? One perspective that we learn from Newtonian mechanics is that the ball is trying to minimize its potential energy. In particular, if the ball has mass m and falls a distance of h, it loses an amount of potential energy proportional to mh. So, if we release a ball from the top of the funnel-shaped landscape in Figure 12.1, its potential energy will be minimized at the lowest point.

Figure 12.1 When the potential energy landscape has the structure of a simple funnel, it is easy to find the lowest point.

If we make the landscape a little more complicated, some extra issues creep in. Consider the "double funnel" in Figure 12.2. Point A is lower than point B, and so is a more desirable place for the ball to come to rest. But if we start the ball rolling from point C, it will not be able to get over the barrier between the two funnels, and it will end up at B. We say that the ball has become trapped in a local minimum: It is at the lowest point if one looks in the neighborhood of its current location; but stepping back and looking at the whole landscape, we see that it has missed the global minimum.

Figure 12.2 Most landscapes are more complicated than simple funnels; for example, in this "double funnel," there's a deep global minimum and a shallower local minimum.

Of course, enormously large physical systems must also try to minimize their energy. Consider, for example, taking a few grams of some homogeneous substance, heating it up, and studying its behavior over time. To capture the potential energy exactly, we would in principle need to represent the behavior of each atom in the substance, as it interacts with nearby atoms. But it is also useful to speak of the properties of the system as a whole--as an aggregate--and this is the domain of statistical mechanics. We will come back to statistical mechanics in a little while, but for now we simply observe that our notion of an "energy landscape" provides useful visual intuition for the process by which even a large physical system minimizes its energy. Thus, while it would in reality take a huge number of dimensions to draw the true "landscape" that constrains the system, we can use one-dimensional "cartoon" representations to discuss the distinction between local and global energy minima, the "funnels" around them, and the "height" of the energy barriers between them.

Taking a molten material and trying to cool it to a perfect crystalline solid is really the process of trying to guide the underlying collection of atoms to its global potential energy minimum. This can be very difficult, and the large number of local minima in a typical energy landscape represent the pitfalls that can lead the system astray in its search for the global minimum. Thus, rather than the simple example of Figure 12.2, which simply contains a single wrong choice, we should be more worried about landscapes with the schematic cartoon representation depicted in Figure 12.3. This can be viewed as a "jagged funnel," in which there are local minima waiting to trap the system all the way along its journey to the bottom.

Figure 12.3 In a general energy landscape, there may be a very large number of local minima that make it hard to find the global minimum, as in the "jagged funnel" drawn here.

The Connection to Optimization

This perspective on energy minimization has really been based on the following core ingredients: The physical system can be in one of a large number of possible states; its energy is a function of its current state; and from a given state, a small perturbation leads to a "neighboring" state. The way in which these neighboring states are linked together, along with the structure of the energy function on them, defines the underlying energy landscape.

It's from this perspective that we again start to think about computational minimization problems. In a typical such problem, we have a large (typically exponential-size) set C of possible solutions. We also have a cost function c(·) that measures the quality of each solution; for a solution S ∈ C, we write its cost as c(S). The goal is to find a solution S* ∈ C for which c(S*) is as small as possible.

So far this is just the way we've thought about such problems all along. We now add to this the notion of a neighbor relation on solutions, to capture the idea that one solution S' can be obtained by a small modification of another solution S. We write S ~ S' to denote that S' is a neighboring solution of S, and we use N(S) to denote the neighborhood of S, the set {S' : S ~ S'}. We will primarily be considering symmetric neighbor relations here, though the
basic points we discuss will apply to asymmetric neighbor relations as well. A crucial point is that, while the set C of possible solutions and the cost function c(·) are provided by the specification of the problem, we have the freedom to make up any neighbor relation that we want.

A local search algorithm takes this setup, including a neighbor relation, and works according to the following high-level scheme. At all times, it maintains a current solution S ∈ C. In a given step, it chooses a neighbor S' of S, declares S' to be the new current solution, and iterates. Throughout the execution of the algorithm, it remembers the minimum-cost solution that it has seen thus far; so, as it runs, it gradually finds better and better solutions. The crux of a local search algorithm is in the choice of the neighbor relation, and in the design of the rule for choosing a neighboring solution at each step.

Thus one can think of a neighbor relation as defining a (generally undirected) graph on the set of all possible solutions, with edges joining neighboring pairs of solutions. A local search algorithm can then be viewed as performing a walk on this graph, trying to move toward a good solution.

An Application to the Vertex Cover Problem

This is still all somewhat vague without a concrete problem to think about; so we'll use the Vertex Cover Problem as a running example here. It's important to keep in mind that, while Vertex Cover makes for a good example, there are many other optimization problems that would work just as well for this illustration.

Thus we are given a graph G = (V, E); the set C of possible solutions consists of all subsets S of V that form vertex covers. Hence, for example, we always have V ∈ C. The cost c(S) of a vertex cover S will simply be its size; in this way, minimizing the cost of a vertex cover is the same as finding one of minimum size. Finally, we will focus our examples on local search algorithms that use a particularly simple neighbor relation: we say that S ~ S' if S' can be obtained from S by adding or deleting a single node. Thus our local search algorithms will be walking through the space of possible vertex covers, adding or deleting a node to their current solution in each step, and trying to find as small a vertex cover as possible.

One useful fact about this neighbor relation is the following.

(12.1) Each vertex cover S has at most n neighboring solutions.

The reason is simply that each neighboring solution of S is obtained by adding or deleting a distinct node. A consequence of (12.1) is that we can efficiently examine all possible neighboring solutions of S in the process of choosing which to select.

Let's think first about a very simple local search algorithm, which we'll term gradient descent. Gradient descent starts with the full vertex set V and uses the following rule for choosing a neighboring solution.

  Let S denote the current solution. If there is a neighbor S' of S with strictly lower cost, then choose the neighbor whose cost is as small as possible. Otherwise terminate the algorithm.

So gradient descent moves strictly "downhill" as long as it can; once this is no longer possible, it stops.
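Since the only neighbors of a vertex cover S are obtained by adding or deleting a single node, and adding a node always increases the cost, gradient descent on this instance simply keeps deleting nodes whose removal leaves every edge covered. The following Python sketch (illustrative only, not from the text; it assumes a simple graph with no self-loops) makes this explicit:

    def gradient_descent_vertex_cover(nodes, edges):
        """Gradient descent for Vertex Cover with the add/delete-one-node
        neighbor relation: start from the full vertex set V and repeatedly
        delete a node as long as some deletion leaves every edge covered.
        All downhill neighbors have the same cost |S| - 1, so ties are broken
        arbitrarily, as the rule in the text allows."""
        S = set(nodes)

        def deletable(v):
            # v can be deleted iff the other endpoint of every edge incident
            # to v remains in S
            return all((a if b == v else b) in S
                       for (a, b) in edges if v in (a, b))

        while True:
            candidates = [v for v in S if deletable(v)]
            if not candidates:
                return S              # local minimum: no strictly cheaper neighbor
            S.remove(candidates[0])

On the star graph discussed next, this procedure reaches the optimal cover {x_1} unless x_1 happens to be the very first node deleted, in which case it gets stuck at {y_1, ..., y_{n-1}}.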
We can see that gradient descent terminates precisely at solutions that are local minima: solutions S such that, for all neighboring S', we have c(S) ≤ c(S'). This definition corresponds very naturally to our notion of local minima in energy landscapes: They are points from which no one-step perturbation will improve the cost function.

How can we visualize the behavior of a local search algorithm in terms of the kinds of energy landscapes we illustrated earlier? Let's think first about gradient descent. The easiest instance of Vertex Cover is surely an n-node graph with no edges. The empty set is the optimal solution (since there are no edges to cover), and gradient descent does exceptionally well at finding this solution: It starts with the full vertex set V, and keeps deleting nodes until there are none left. Indeed, the set of vertex covers for this edge-less graph corresponds naturally to the funnel we drew in Figure 12.1: The unique local minimum is the global minimum, and there is a downhill path to it from any point.

When can gradient descent go astray? Consider a "star graph" G, consisting of nodes x_1, y_1, y_2, ..., y_{n-1}, with an edge from x_1 to each y_i. The minimum vertex cover for G is the singleton set {x_1}, and gradient descent can reach this solution by successively deleting y_1, ..., y_{n-1} in any order. But, if gradient descent deletes the node x_1 first, then it is immediately stuck: No node y_i can be deleted without destroying the vertex cover property, so the only neighboring solution is the full node set V, which has higher cost. Thus the algorithm has become trapped in the local minimum {y_1, y_2, ..., y_{n-1}}, which has very high cost relative to the global minimum.

Pictorially, we see that we're in a situation corresponding to the double funnel of Figure 12.2. The deeper funnel corresponds to the optimal solution {x_1}, while the shallower funnel corresponds to the inferior local minimum {y_1, y_2, ..., y_{n-1}}. Sliding down the wrong portion of the slope at the very beginning can send one into the wrong minimum. We can easily generalize this situation to one in which the two minima have any relative depths we want. Consider, for example, a bipartite graph G with nodes x_1, x_2, ..., x_k and y_1, y_2, ..., y_ℓ, where k < ℓ, and there is an edge from every node of the form x_i
to every node of the form y_j. Then there are two local minima, corresponding to the vertex covers {x_1, ..., x_k} and {y_1, ..., y_ℓ}. Which one is discovered by a run of gradient descent is entirely determined by whether it first deletes an element of the form x_i or y_j.

With more complicated graphs, it's often a useful exercise to think about the kind of landscape they induce; and conversely, one sometimes may look at a landscape and consider whether there's a graph that gives rise to something like it.

For example, what kind of graph might yield a Vertex Cover instance with a landscape like the jagged funnel in Figure 12.3? One such graph is simply an n-node path, where n is an odd number, with nodes labeled v_1, v_2, ..., v_n in order. The unique minimum vertex cover S* consists of all nodes v_i where i is even. But there are many local optima. For example, consider the vertex cover {v_2, v_3, v_5, v_6, v_8, v_9, ...} in which every third node is omitted. This is a vertex cover that is significantly larger than S*; but there's no way to delete any node from it while still covering all edges. Indeed, it's very hard for gradient descent to find the minimum vertex cover S* starting from the full vertex set V: Once it's deleted just a single node v_i with an even value of i, it's lost the chance to find the global optimum S*. Thus the even/odd parity distinction in the nodes captures a plethora of different wrong turns in the local search, and hence gives the overall funnel its jagged character. Of course, there is not a direct correspondence between the ridges in the drawing and the local optima; as we warned above, Figure 12.3 is ultimately just a cartoon rendition of what's going on.

But we see that even for graphs that are structurally very simple, gradient descent is much too straightforward a local search algorithm. We now look at some more refined local search algorithms that use the same type of neighbor relation, but include a method for "escaping" from local minima.

12.2 The Metropolis Algorithm and Simulated Annealing

The first idea for an improved local search algorithm comes from the work of Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller (1953). They considered the problem of simulating the behavior of a physical system according to principles of statistical mechanics. A basic model from this field asserts that the probability of finding a physical system in a state with energy E is proportional to the Gibbs-Boltzmann function e^{-E/(kT)}, where T > 0 is the temperature and k > 0 is a constant. Let's look at this function. For any temperature T, the function is monotone decreasing in the energy E, so this states that a physical system is more likely to be in a lower energy state than in a high energy state. Now let's consider the effect of the temperature T. When T is small, the probability for a low-energy state is significantly larger than the probability for a high-energy state. However, if the temperature is large, then the difference between these two probabilities is very small, and the system is almost equally likely to be in any state.

The Metropolis Algorithm

Metropolis et al. proposed the following method for performing step-by-step simulation of a system at a fixed temperature T. At all times, the simulation maintains a current state of the system and tries to produce a new state by applying a perturbation to this state. We'll assume that we're only interested in states of the system that are "reachable" from some fixed initial state by a sequence of small perturbations, and we'll assume that there is only a finite set C of such states. In a single step, we first generate a small random perturbation to the current state S of the system, resulting in a new state S'. Let E(S) and E(S') denote the energies of S and S', respectively. If E(S') ≤ E(S), then we update the current state to be S'. Otherwise let ΔE = E(S') - E(S) > 0. We update the current state to be S' with probability e^{-ΔE/(kT)}, and otherwise leave the current state at S.

Metropolis et al. proved that their simulation algorithm has the following property. To prevent too long a digression, we omit the proof; it is actually a direct consequence of some basic facts about random walks.

(12.2) Let Z = Σ_{S∈C} e^{-E(S)/(kT)}. For a state S, let f_S(t) denote the fraction of the first t steps in which the state of the simulation is in S. Then the limit of f_S(t) as t approaches ∞ is, with probability approaching 1, equal to (1/Z) · e^{-E(S)/(kT)}.

This is exactly the sort of fact one wants, since it says that the simulation spends roughly the correct amount of time in each state, according to the Gibbs-Boltzmann equation.

If we want to use this overall scheme to design a local search algorithm for minimization problems, we can use the analogies of Section 12.1 in which states of the system are candidate solutions, with energy corresponding to cost. We then see that the operation of the Metropolis Algorithm has a very desirable pair of features in a local search algorithm: It is biased toward "downhill"
moves but will also accept "uphill" moves with smaller probability. In this way, it is able to make progress even when situated in a local minimum. Moreover, as expressed in (12.2), it is globally biased toward lower-cost solutions.

Here is a concrete formulation of the Metropolis Algorithm for a minimization problem.

  Start with an initial solution S_0, and constants k and T
  In one step:
    Let S be the current solution
    Let S' be chosen uniformly at random from the neighbors of S
    If c(S') ≤ c(S) then
      Update S ← S'
    Else
      With probability e^{-(c(S')-c(S))/(kT)}
        Update S ← S'
      Otherwise
        Leave S unchanged
    EndIf
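This step translates directly into code. The sketch below (illustrative, not from the text; the function names are made up) implements the Metropolis step for a minimization problem, given a function that lists the neighbors of a solution and a cost function; the constant k is folded into the temperature T for simplicity.

    import math
    import random

    def metropolis_step(S, neighbors, cost, T):
        """One step of the Metropolis Algorithm for a minimization problem.

        S          -- current solution
        neighbors  -- function returning a list of neighboring solutions of S
        cost       -- cost function c(.)
        T          -- temperature (the constant k is folded into T here)"""
        S_prime = random.choice(neighbors(S))          # uniform random neighbor
        delta = cost(S_prime) - cost(S)
        if delta <= 0:
            return S_prime                             # downhill (or level) moves are always accepted
        if random.random() < math.exp(-delta / T):     # uphill move accepted with prob e^(-delta/T)
            return S_prime
        return S

    def metropolis(S0, neighbors, cost, T, steps):
        """Run the Metropolis Algorithm for a fixed number of steps at fixed
        temperature T, remembering the best solution seen so far."""
        S = best = S0
        for _ in range(steps):
            S = metropolis_step(S, neighbors, cost, T)
            if cost(S) < cost(best):
                best = S
        return best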
Thus, on the Vertex Cover instance consisting of the star graph in Section 12.1, in which x_1 is joined to each of y_1, ..., y_{n-1}, we see that the Metropolis Algorithm will quickly bounce out of the local minimum that arises when x_1 is deleted: The neighboring solution in which x_1 is put back in will be generated and will be accepted with positive probability. On more complex graphs as well, the Metropolis Algorithm is able, to some extent, to correct the wrong choices it makes as it proceeds.

At the same time, the Metropolis Algorithm does not always behave the way one would want, even in some very simple situations. Let's go back to the very first graph we considered, a graph G with no edges. Gradient descent solves this instance with no trouble, deleting nodes in sequence until none are left. But, while the Metropolis Algorithm will start out this way, it begins to go astray as it nears the global optimum. Consider the situation in which the current solution contains only c nodes, where c is much smaller than the total number of nodes, n. With very high probability, the neighboring solution generated by the Metropolis Algorithm will have size c + 1, rather than c - 1, and with reasonable probability this uphill move will be accepted. Thus it gets harder and harder to shrink the size of the vertex cover as the algorithm proceeds; it is exhibiting a sort of "flinching" reaction near the bottom of the funnel.

This behavior shows up in more complex examples as well, and in more complex ways; but it is certainly striking for it to show up here so simply. In order to figure out how we might fix this behavior, we return to the physical analogy that motivated the Metropolis Algorithm, and ask: What's the meaning of the temperature parameter T in the context of optimization?

We can think of T as a one-dimensional knob that we're able to turn, and it controls the extent to which the algorithm is willing to accept uphill moves. As we make T very large, the probability of accepting an uphill move approaches 1, and the Metropolis Algorithm behaves like a random walk that is basically indifferent to the cost function. As we make T very close to 0, on the other hand, uphill moves are almost never accepted, and the Metropolis Algorithm behaves almost identically to gradient descent.

Simulated Annealing

Neither of these temperature extremes--very low or very high--is an effective way to solve minimization problems in general, and we can see this in physical settings as well. If we take a solid and heat it to a very high temperature, we do not expect it to maintain a nice crystal structure, even if this is energetically favorable; and this can be explained by the large value of kT in the expression e^{-E(S)/(kT)}, which makes the enormous number of less favorable states too probable. This is a way in which we can view the "flinching" behavior of the Metropolis Algorithm on an easy Vertex Cover instance: It's trying to find the lowest energy state at too high a temperature, when all the competing states have too high a probability. On the other hand, if we take a molten solid and freeze it very abruptly, we do not expect to get a perfect crystal either; rather, we get a deformed crystal structure with many imperfections. This is because, with T very small, we've come too close to the realm of gradient descent, and the system has become trapped in one of the numerous ridges of its jagged energy landscape. It is interesting to note that when T is very small, then statement (12.2) shows that in the limit, the random walk spends most of its time in the lowest energy state. The problem is that the random walk will take an enormous amount of time before getting anywhere near this limit.

In the early 1980s, as people were considering the connection between energy minimization and combinatorial optimization, Kirkpatrick, Gelatt, and Vecchi (1983) thought about the issues we've been discussing, and they asked the following question: How do we solve this problem for physical systems, and what sort of algorithm does this suggest? In physical systems, one guides a material to a crystalline state by a process known as annealing: The material is cooled very gradually from a high temperature, allowing it enough time to reach equilibrium at a succession of intermediate lower temperatures. In this
way, it is able to escape from the energy minima that it encounters all the way through the cooling process, eventually arriving at the global optimum.

We can thus try to mimic this process computationally, arriving at an algorithmic technique known as simulated annealing. Simulated annealing works by running the Metropolis Algorithm while gradually decreasing the value of T over the course of the execution. The exact way in which T is updated is called, for natural reasons, a cooling schedule, and a number of considerations go into the design of the cooling schedule. Formally, a cooling schedule is a function τ from {1, 2, 3, ...} to the positive real numbers; in iteration i of the Metropolis Algorithm, we use the temperature T = τ(i) in our definition of the probability.
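In code, simulated annealing is just the Metropolis step driven by a cooling schedule τ. The sketch below (illustrative, not from the text) reuses the metropolis_step function from the previous sketch; the geometric schedule shown is purely an example, since choosing a good schedule is exactly the design question discussed here.

    def simulated_annealing(S0, neighbors, cost, tau, steps):
        """Simulated annealing: iteration i of the Metropolis Algorithm is run
        at temperature T = tau(i).  Returns the best solution encountered."""
        S = best = S0
        for i in range(1, steps + 1):
            S = metropolis_step(S, neighbors, cost, tau(i))
            if cost(S) < cost(best):
                best = S
        return best

    # One possible cooling schedule (purely illustrative): geometric cooling.
    def geometric_schedule(T0=10.0, alpha=0.999):
        return lambda i: T0 * (alpha ** i)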
Qualitatively, we can see that simulated annealing allows for large changes in the solution in the early stages of its execution, when the temperature is high. Then, as the search proceeds, the temperature is lowered so that we are less likely to undo progress that has already been made. We can also view simulated annealing as trying to optimize a trade-off that is implicit in (12.2). According to (12.2), values of T arbitrarily close to 0 put the highest probability on minimum-cost solutions; however, (12.2) by itself says nothing about the rate of convergence of the functions f_S(t) that it uses. It turns out that these functions converge, in general, much more rapidly for large values of T; and so to find minimum-cost solutions quickly, it is useful to speed up convergence by starting the process with T large, and then gradually reducing it so as to raise the probability on the optimal solutions. While we believe that physical systems reach a minimum energy state via annealing, the simulated annealing method has no guarantee of finding an optimal solution. To see why, consider the double funnel of Figure 12.2. If the two funnels take equal area, then at high temperatures the system is essentially equally likely to be in either funnel. Once we cool the temperature, it will become harder and harder to switch between the two funnels. There appears to be no guarantee that at the end of annealing, we will be at the bottom of the lower funnel.

There are many open problems associated with simulated annealing, both in proving properties of its behavior and in determining the range of settings for which it works well in practice. Some of the general questions that come up here involve probabilistic issues that are beyond the scope of this book.

Having spent some time considering local search at a very general level, we now turn, in the next few sections, to some applications in which it is possible to prove fairly strong statements about the behavior of local search algorithms and about the local optima that they find.

12.3 An Application of Local Search to Hopfield Neural Networks

Thus far we have been discussing local search as a method for trying to find the global optimum in a computational problem. There are some cases, however, in which, by examining the specification of the problem carefully, we discover that it is really just an arbitrary local optimum that is required. We now consider a problem that illustrates this phenomenon.

The Problem

The problem we consider here is that of finding stable configurations in Hopfield neural networks. Hopfield networks have been proposed as a simple model of an associative memory, in which a large collection of units are connected by an underlying network, and neighboring units try to correlate their states. Concretely, a Hopfield network can be viewed as an undirected graph G = (V, E), with an integer-valued weight w_e on each edge e; each weight may be positive or negative. A configuration S of the network is an assignment of the value -1 or +1 to each node u; we will refer to this value as the state s_u of the node u. The meaning of a configuration is that each node u, representing a unit of the neural network, is trying to choose between one of two possible states ("on" or "off"; "yes" or "no"); and its choice is influenced by those of its neighbors as follows. Each edge of the network imposes a requirement on its endpoints: If u is joined to v by an edge of negative weight, then u and v want to have the same state, while if u is joined to v by an edge of positive weight, then u and v want to have opposite states. The absolute value |w_e| will indicate the strength of this requirement, and we will refer to |w_e| as the absolute weight of edge e.

Unfortunately, there may be no configuration that respects the requirements imposed by all the edges. For example, consider three nodes a, b, c all mutually connected to one another by edges of weight 1. Then, no matter what configuration we choose, two of these nodes will have the same state and thus will be violating the requirement that they have opposite states.

In view of this, we ask for something weaker. With respect to a given configuration, we say that an edge e = (u, v) is good if the requirement it imposes is satisfied by the states of its two endpoints: either w_e < 0 and s_u = s_v, or w_e > 0 and s_u ≠ s_v. Otherwise we say e is bad. Note that we can express the condition that e is good very compactly, as follows: w_e s_u s_v < 0. Next we say that a node u is satisfied in a given configuration if the total absolute weight
of all good edges incident to u is at least as large as the total absolute weight of all bad edges incident to u. We can write this as

    Σ_{v : e = (u, v) ∈ E} w_e s_u s_v ≤ 0.

Finally, we call a configuration stable if all nodes are satisfied.

Why do we use the term stable for such configurations? This is based on viewing the network from the perspective of an individual node u. On its own, the only choice u has is whether to take the state −1 or +1; and like all nodes, it wants to respect as many edge requirements as possible (as measured in absolute weight). Suppose u asks: Should I flip my current state? We see that if u does flip its state (while all other nodes keep their states the same), then all the good edges incident to u become bad, and all the bad edges incident to u become good. So, to maximize the amount of good edge weight under its direct control, u should flip its state if and only if it is not satisfied. In other words, a stable configuration is one in which no individual node has an incentive to flip its current state.

A basic question now arises: Does a Hopfield network always have a stable configuration, and if so, how can we find one?

Designing the Algorithm
We will now design an algorithm that establishes the following result.

(12.3) Every Hopfield network has a stable configuration, and such a configuration can be found in time polynomial in n and W = Σ_e |w_e|.

We will see that stable configurations in fact arise very naturally as the local optima of a certain local search procedure on the Hopfield network.

To see that the statement of (12.3) is not entirely trivial, we note that it fails to remain true if one changes the model in certain natural ways. For example, suppose we were to define a directed Hopfield network exactly as above, except that each edge is directed, and each node determines whether or not it is satisfied by looking only at edges for which it is the tail. Then, in fact, such a network need not have a stable configuration. Consider, for example, a directed version of the three-node network we discussed earlier: There are nodes a, b, c, with directed edges (a, b), (b, c), (c, a), all of weight 1. Then, if all nodes have the same state, they will all be unsatisfied; and if one node has a different state from the other two, then the node directly in front of it will be unsatisfied. Thus there is no configuration of this directed network in which all nodes are satisfied. It is clear that a proof of (12.3) will need to rely somewhere on the undirected nature of the network.

To prove (12.3), we will analyze the following simple iterative procedure, which we call the State-Flipping Algorithm, to search for a stable configuration.

    While the current configuration is not stable
        There must be an unsatisfied node
        Choose an unsatisfied node u
        Flip the state of u
    Endwhile

An example of the execution of this algorithm is depicted in Figure 12.4, ending in a stable configuration.

[Figure 12.4 Parts (a)–(f) depict the steps in an execution of the State-Flipping Algorithm for a five-node Hopfield network, ending in a stable configuration. (Nodes are colored black or white to indicate their state.)]
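To make the State-Flipping Algorithm concrete, here is a small Python sketch of it under the conventions above (states are ±1, and an edge e = (u, v) of weight w_e is good when w_e · s_u · s_v < 0). The dictionary-based representation and the function names are illustrative assumptions, not something taken from the text.

```python
def unsatisfied_nodes(weights, state):
    """Return the nodes whose incident bad edge weight exceeds their good edge weight.

    weights: dict mapping an undirected edge (u, v) to an integer weight w_e
    state:   dict mapping each node to +1 or -1
    """
    score = {u: 0 for u in state}           # sum of w_e * s_u * s_v over incident edges
    for (u, v), w in weights.items():
        contrib = w * state[u] * state[v]
        score[u] += contrib
        score[v] += contrib
    return [u for u, s in score.items() if s > 0]   # satisfied means the sum is <= 0

def state_flipping(weights, state):
    """Run the State-Flipping Algorithm until every node is satisfied."""
    while True:
        bad = unsatisfied_nodes(weights, state)
        if not bad:
            return state                     # stable configuration reached
        u = bad[0]                           # any unsatisfied node will do
        state[u] = -state[u]                 # flipping u turns its bad edges good

# Example: the three-node "triangle" with weights 1 has no configuration in which
# every edge is good, but the algorithm still stops at a stable configuration.
w = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1}
print(state_flipping(w, {"a": 1, "b": 1, "c": 1}))
```

On the triangle example, a single flip already yields a stable configuration, illustrating that stability is weaker than requiring every edge to be good.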
If we add together inequalities (12.1) and (12.2), and divide by 2, we get

    Σ_{{u,v} ⊆ A} w_uv + Σ_{{u,v} ⊆ B} w_uv ≤ w(A, B).        (12.3)

The left-hand side of inequality (12.3) accounts for all edge weight that does not cross from A to B; so if we add w(A, B) to both sides of (12.3), the left-hand side becomes equal to W. The right-hand side becomes 2w(A, B), so we have W ≤ 2w(A, B), or w(A, B) ≥ ½W.

Since the globally optimal partition (A*, B*) clearly satisfies w(A*, B*) ≤ W, we have w(A, B) ≥ ½ w(A*, B*). ∎

Notice that we never really thought much about the optimal partition (A*, B*) in the proof of (12.5); we really showed the stronger statement that, in any locally optimal solution under the single-flip neighborhood, at least half the total edge weight in the graph crosses the partition.

Statement (12.5) proves that a local optimum is a 2-approximation to the maximum cut. This suggests that the local optimization may be a good algorithm for approximately maximizing the cut value. However, there is one more issue that we need to consider: the running time. As we saw at the end of Section 12.3, the Single-Flip Algorithm is only pseudo-polynomial, and it is an open problem whether a local optimum can be found in polynomial time. However, in this case we can do almost as well, simply by stopping the algorithm when there are no "big enough" improvements.

Let (A, B) be a partition with weight w(A, B). For a fixed ε > 0, let us say that a single node flip is a big-improvement-flip if it improves the cut value by at least (ε/n) w(A, B), where n = |V|. Now consider a version of the Single-Flip Algorithm in which we only accept big-improvement-flips and terminate once no such flip exists, even if the current partition is not a local optimum. We claim that this will lead to almost as good an approximation and will run in polynomial time. First we can extend the previous proof to show that the resulting cut is almost as good. We simply have to add the term (ε/n) w(A, B) to each inequality, as all we know is that there are no big-improvement-flips.

(12.6) Let (A, B) be a partition such that no big-improvement-flip is possible. Let (A*, B*) be a globally optimal partition. Then w(A*, B*) ≤ (2 + ε) w(A, B).

Next we consider the running time.

(12.7) The version of the Single-Flip Algorithm that only accepts big-improvement-flips terminates after at most O(ε⁻¹ n log W) flips.

Proof. Each flip improves the objective function by at least a factor of (1 + ε/n). Since (1 + 1/x)^x ≥ 2 for any x ≥ 1, we see that (1 + ε/n)^(n/ε) ≥ 2, and so the objective function increases by a factor of at least 2 every n/ε flips. The weight cannot exceed W, and hence it can only be doubled at most log W times. ∎

12.5 Choosing a Neighbor Relation

We began the chapter by saying that a local search algorithm is really based on two fundamental ingredients: the choice of the neighbor relation, and the rule for choosing a neighboring solution at each step. In Section 12.2 we spent time thinking about the second of these: both the Metropolis Algorithm and simulated annealing took the neighbor relation as given and modified the way in which a neighboring solution should be chosen.

What are some of the issues that should go into our choice of the neighbor relation? This can turn out to be quite subtle, though at a high level the trade-off is a basic one.

(i) The neighborhood of a solution should be rich enough that we do not tend to get stuck in bad local optima; but
(ii) the neighborhood of a solution should not be too large, since we want to be able to efficiently search the set of neighbors for possible local moves.

If the first of these points were the only concern, then it would seem that we should simply make all solutions neighbors of one another--after all, then there would be no local optima, and the global optimum would always be just one step away! The second point exposes the (obvious) problem with doing this: If the neighborhood of the current solution consists of every possible solution, then the local search paradigm gives us no leverage whatsoever; it reduces simply to brute-force search of this neighborhood.
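Before moving on, here is a small Python sketch of the single-flip local search for Maximum Cut from Section 12.4, including the variant that only accepts big-improvement-flips (improvements of at least (ε/n) · w(A, B)). The representation and names are illustrative assumptions, not from the text.

```python
def cut_value(weights, side):
    """Total weight of edges crossing the partition; side[u] is 0 or 1."""
    return sum(w for (u, v), w in weights.items() if side[u] != side[v])

def gain(weights, side, u):
    """Change in cut value if node u is moved to the other side."""
    delta = 0
    for (a, b), w in weights.items():
        if u in (a, b):
            other = b if a == u else a
            delta += w if side[u] == side[other] else -w
    return delta

def single_flip_max_cut(weights, nodes, eps=0.0):
    """Flip single nodes while an improving (or big-improvement) flip exists.

    With eps == 0 this searches for a true local optimum; with eps > 0 it stops
    as soon as no flip improves the cut by at least (eps/n) * current cut value,
    which keeps the number of flips polynomial in the spirit of (12.7).
    """
    side = {u: 0 for u in nodes}
    n = len(nodes)
    improved = True
    while improved:
        improved = False
        threshold = (eps / n) * cut_value(weights, side)
        for u in nodes:
            g = gain(weights, side, u)
            if g > 0 and g >= threshold:
                side[u] = 1 - side[u]
                improved = True
                break
    return side

# Example: a 4-cycle with unit weights; the optimal cut has value 4.
w = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("d", "a"): 1}
part = single_flip_max_cut(w, ["a", "b", "c", "d"], eps=0.1)
print(part, cut_value(w, part))
```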
Actually, we've already encountered one case in which choosing the right neighbor relation had a profound effect on the tractability of a problem, though we did not explicitly take note of this at the time: This was in the Bipartite Matching Problem. Probably the simplest neighbor relation on matchings would be the following: M' is a neighbor of M if M' can be obtained by the insertion or deletion of a single edge in M. Under this definition, we get "landscapes" that are quite jagged, quite like the Vertex Cover examples we saw earlier; and we can get locally optimal matchings under this definition that have only half the size of the maximum matching.

But suppose we try defining a more complicated (indeed, asymmetric) neighbor relation: We say that M' is a neighbor of M if, when we set up the corresponding flow network, M' can be obtained from M by a single augmenting path. What can we say about a matching M if it is a local maximum under this neighbor relation? In this case, there is no augmenting path, and so M must in fact be a (globally) maximum matching. In other words, with this neighbor relation, the only local maxima are global maxima, and so direct gradient ascent will produce a maximum matching. If we reflect on what the Ford-Fulkerson algorithm is doing in our reduction from Bipartite Matching to Maximum Flow, this makes sense: the size of the matching strictly increases in each step, and we never need to "back out" of a local maximum. Thus, by choosing the neighbor relation very carefully, we've turned a jagged optimization landscape into a simple, tractable funnel.

Of course, we do not expect that things will always work out this well. For example, since Vertex Cover is NP-complete, it would be surprising if it allowed for a neighbor relation that simultaneously produced "well-behaved" landscapes and neighborhoods that could be searched efficiently. We now look at several possible neighbor relations in the context of the Maximum Cut Problem, which we considered in the previous section. The contrasts among these neighbor relations will be characteristic of issues that arise in the general topic of local search algorithms for computationally hard graph-partitioning problems.

Local Search Algorithms for Graph Partitioning
In Section 12.4, we considered a state-flipping algorithm for the Maximum-Cut Problem, and we showed that the locally optimal solutions provide a 2-approximation. We now consider neighbor relations that produce larger neighborhoods than the single-flip rule, and consequently attempt to reduce the prevalence of local optima. Perhaps the most natural generalization is the k-flip neighborhood, for k ≥ 1: we say that partitions (A, B) and (A', B') are neighbors under the k-flip rule if (A', B') can be obtained from (A, B) by moving at most k nodes from one side of the partition to the other.

Now, clearly if (A, B) and (A', B') are neighbors under the k-flip rule, then they are also neighbors under the k'-flip rule for every k' > k. Thus, if (A, B) is a local optimum under the k'-flip rule, it is also a local optimum under the k-flip rule for every k < k'. But reducing the set of local optima by raising the value of k comes at a steep computational price: to examine the set of neighbors of (A, B) under the k-flip rule, we must consider all O(n^k) ways of moving up to k nodes to the opposite side of the partition. This becomes prohibitive even for small values of k.

Kernighan and Lin (1970) proposed an alternate method for generating neighboring solutions; it is computationally much more efficient, but still allows large-scale transformations of solutions in a single step. Their method, which we'll call the K-L heuristic, defines the neighbors of a partition (A, B) according to the following n-phase procedure.

In phase 1, we choose a single node to flip, in such a way that the value of the resulting solution is as large as possible. We perform this flip even if the value of the solution decreases relative to w(A, B). We mark the node that has been flipped and let (A_1, B_1) denote the resulting solution.

At the start of phase k, for k > 1, we have a partition (A_{k−1}, B_{k−1}); and k − 1 of the nodes are marked. We choose a single unmarked node to flip, in such a way that the value of the resulting solution is as large as possible. (Again, we do this even if the value of the solution decreases as a result.) We mark the node we flip and let (A_k, B_k) denote the resulting solution.

After n phases, each node is marked, indicating that it has been flipped precisely once. Consequently, the final partition (A_n, B_n) is actually the mirror image of the original partition (A, B): We have A_n = B and B_n = A.

Finally, the K-L heuristic defines the n − 1 partitions (A_1, B_1), ..., (A_{n−1}, B_{n−1}) to be the neighbors of (A, B). Thus (A, B) is a local optimum under the K-L heuristic if and only if w(A, B) ≥ w(A_i, B_i) for 1 ≤ i ≤ n − 1.

So we see that the K-L heuristic tries a very long sequence of flips, even while it appears to be making things worse, in the hope that some partition (A_i, B_i) generated along the way will turn out better than (A, B). But even though it generates neighbors very different from (A, B), it only performs n flips in total, and each takes only O(n) time to perform. Thus it is computationally much more reasonable than the k-flip rule for larger values of k. Moreover, the K-L heuristic has turned out to be very powerful in practice, despite the fact that rigorous analysis of its properties has remained largely an open problem.
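As a concrete illustration of the n-phase procedure just described, here is a small Python sketch that generates the K-L neighbors of a partition; the cut_value helper and the data layout are assumptions made for the example, not part of the text.

```python
def cut_value(weights, side):
    """Weight of the cut for a 0/1 assignment side[u]."""
    return sum(w for (u, v), w in weights.items() if side[u] != side[v])

def kl_neighbors(weights, nodes, side):
    """Generate the n-1 Kernighan-Lin neighbors of the partition given by side.

    In each of n phases we flip the still-unmarked node whose flip yields the
    largest cut value (even if the value goes down), mark it, and record the
    resulting partition.  The first n-1 recorded partitions are the neighbors;
    the n-th is just the mirror image of the original partition.
    """
    current = dict(side)
    marked = set()
    neighbors = []
    for phase in range(len(nodes)):
        best_node, best_val = None, None
        for u in nodes:
            if u in marked:
                continue
            current[u] = 1 - current[u]          # tentatively flip u
            val = cut_value(weights, current)
            current[u] = 1 - current[u]          # undo the flip
            if best_val is None or val > best_val:
                best_node, best_val = u, val
        current[best_node] = 1 - current[best_node]
        marked.add(best_node)
        if phase < len(nodes) - 1:
            neighbors.append(dict(current))
    return neighbors

# (A, B) is a local optimum under the K-L heuristic iff no returned partition
# has a larger cut value than the current one.
```

This brute-force version recomputes the cut value from scratch in each phase; a more careful implementation would maintain per-node gains so that each flip costs only O(n) time, as the text observes.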
12.6 Classification via Local Search

We now consider a more complex application of local search to the design of approximation algorithms, related to the Image Segmentation Problem that we considered as an application of network flow in Section 7.10. The more complex version of Image Segmentation that we focus on here will serve as an example where, in order to obtain good performance from a local search algorithm, one needs to use a rather complex neighborhood structure on the set of solutions. We will find that the natural "state-flipping" neighborhood that we saw in earlier sections can result in very bad local optima. To obtain good performance, we will instead use an exponentially large neighborhood. One problem with such a large neighborhood is that we can no longer afford to search through all neighbors of the current solution one by one for an improving solution. Rather, we will need a more sophisticated algorithm to find an improving neighbor whenever one exists.

The Problem
Recall the basic Image Segmentation Problem that we considered as an application of network flow in Section 7.10. There we formulated the problem of segmenting an image as a labeling problem; the goal was to label (i.e., classify) each pixel as belonging to the foreground or the background of the image. At the time, it was clear that this was a very simple formulation of the problem, and it would be nice to handle more complex labeling tasks--for example, to segment the regions of an image based on their distance from the camera. Thus we now consider a labeling problem with more than two labels. In the process, we will end up with a framework for classification that applies more broadly than just to the case of pixels in an image.

In setting up the two-label foreground/background segmentation problem, we ultimately arrived at the following formulation. We were given a graph G = (V, E) where V corresponded to the pixels of the image, and the goal was to classify each node in V as belonging to one of two possible classes: foreground or background. Edges represented pairs of nodes likely to belong to the same class (e.g., because they were next to each other), and for each edge (i, j) we were given a separation penalty p_ij ≥ 0 for placing i and j in different classes. In addition, we had information about the likelihood of whether a node or pixel was more likely to belong to the foreground or the background. These likelihoods translated into penalties for assigning a node to the class where it was less likely to belong. Then the problem was to find a labeling of the nodes that minimized the total separation and assignment penalties. We showed that this minimization problem could be solved via a minimum-cut computation. For the rest of this section, we will refer to the problem we defined there as Two-Label Image Segmentation.

Here we will formulate the analogous classification/labeling problem with more than two classes or labels. This problem will turn out to be NP-hard, and we will develop a local search algorithm where the local optima are 2-approximations for the best labeling. The general labeling problem, which we will consider in this section, is formulated as follows. We are given a graph G = (V, E) and a set L of k labels. The goal is to label each node in V with one of the labels in L so as to minimize a certain penalty. There are two competing forces that will guide the choice of the best labeling. For each edge (i, j) ∈ E, we have a separation penalty p_ij ≥ 0 for labeling the two nodes i and j with different labels. In addition, nodes are more likely to have certain labels than others. This is expressed through an assignment penalty. For each node i ∈ V and each label a ∈ L, we have a nonnegative penalty c_i(a) ≥ 0 for assigning label a to node i. (These penalties play the role of the likelihoods from the Two-Label Image Segmentation Problem, except that here we view them as costs to be minimized.) The Labeling Problem is to find a labeling f : V → L that minimizes the total penalty:

    Φ(f) = Σ_{i ∈ V} c_i(f(i)) + Σ_{(i,j) ∈ E : f(i) ≠ f(j)} p_ij.

Observe that the Labeling Problem with only two labels is precisely the Image Segmentation Problem from Section 7.10. For three labels, the Labeling Problem is already NP-hard, though we will not prove this here.

Our goal is to develop a local search algorithm for this problem, in which local optima are good approximations to the optimal solution. This will also serve as an illustration of the importance of choosing good neighborhoods for defining the local search algorithm. There are many possible choices for neighbor relations, and we'll see that some work a lot better than others. In particular, a fairly complex definition of the neighborhoods will be used to obtain the approximation guarantee.
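As a quick sanity check on the objective, here is a short Python sketch that evaluates the penalty Φ(f) of a labeling under the definition above; the dictionary-based representation is an assumption made for the example.

```python
def penalty(assign_cost, sep_cost, labeling):
    """Total penalty Phi(f) of a labeling.

    assign_cost[i][a]   -- assignment penalty c_i(a) for giving node i label a
    sep_cost[(i, j)]    -- separation penalty p_ij for edge (i, j)
    labeling[i]         -- the label f(i) chosen for node i
    """
    total = sum(assign_cost[i][labeling[i]] for i in labeling)
    total += sum(p for (i, j), p in sep_cost.items() if labeling[i] != labeling[j])
    return total

# Tiny example with three labels on a two-node graph.
assign_cost = {"u": {"a": 0, "b": 2, "c": 5}, "v": {"a": 4, "b": 1, "c": 5}}
sep_cost = {("u", "v"): 3}
print(penalty(assign_cost, sep_cost, {"u": "a", "v": "b"}))   # 0 + 1 + 3 = 4
print(penalty(assign_cost, sep_cost, {"u": "a", "v": "a"}))   # 0 + 4 + 0 = 4
```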
Designing the Algorithm
A First Attempt: The Single-Flip Rule  The simplest and perhaps most natural choice for neighbor relation is the single-flip rule from the State-Flipping Algorithm for the Maximum-Cut Problem: Two labelings are neighbors if we can obtain one from the other by relabeling a single node. Unfortunately, this neighborhood can lead to quite poor local optima for our problem even when there are only two labels.

This may be initially surprising, since the rule worked quite well for the Maximum-Cut Problem. However, our problem is related to the Minimum-Cut Problem. In fact, Minimum s-t Cut corresponds to a special case when there are only two labels, and s and t are the only nodes with assignment penalties. It is not hard to see that this State-Flipping Algorithm is not a good approximation algorithm for the Minimum-Cut Problem. See Figure 12.5, which indicates how the edges incident to s may form the global optimum, while the edges incident to t can form a local optimum that is much worse.

[Figure 12.5 An instance of the Minimum s-t Cut Problem, where all edges have capacity 1.]

A Closer Attempt: Considering Two Labels at a Time  Here we will develop a local search algorithm in which the neighborhoods are much more elaborate. One interesting feature of our algorithm is that it allows each solution to have exponentially many neighbors. This appears to be contrary to the general rule that "the neighborhood of a solution should not be too large," as stated in Section 12.5. However, we will be working with neighborhoods in a more subtle way here. Keeping the size of the neighborhood small is good if the plan is to search for an improving local step by brute force; here, however, we will use a polynomial-time minimum-cut computation to determine whether any of a solution's exponentially many neighbors represent an improvement.

The idea of the local search is to use our polynomial-time algorithm for Two-Label Image Segmentation to find improving local steps. First let's consider a basic implementation of this idea that does not always give a good approximation guarantee. For a labeling f, we pick two labels a, b ∈ L and restrict attention to the nodes that have labels a or b in labeling f. In a single local step, we will allow any subset of these nodes to flip labels from a to b, or from b to a. More formally, two labelings f and f' are neighbors if there are two labels a, b ∈ L such that for all other labels c ∉ {a, b} and all nodes i ∈ V, we have f(i) = c if and only if f'(i) = c. Note that a state f can have exponentially many neighbors, as an arbitrary subset of the nodes labeled a and b can flip their label. However, we have the following.

(12.8) If a labeling f is not locally optimal for the neighborhood above, then a neighbor with smaller penalty can be found via k² minimum-cut computations.

Proof. There are fewer than k² pairs of distinct labels, so we can try each pair separately. Given a pair of labels a, b ∈ L, consider the problem of finding an improved labeling via swapping labels of nodes between labels a and b. This is exactly the Segmentation Problem for two labels on the subgraph of nodes that f labels a or b. We use the algorithm developed for Two-Label Image Segmentation to find the best such relabeling. ∎

This neighborhood is much better than the single-flip neighborhood we considered first. For example, it solves the case of two labels optimally. However, even with this improved neighborhood, local optima can still be bad, as shown in Figure 12.6. In this example, there are three nodes s, t, and z that are each required to keep their initial labels. Each other node lies on one of the sides of the triangle; it has to get one of the two labels associated with the nodes at the ends of this side. These requirements can be expressed simply by giving each node a very large assignment penalty for the labels that we are not allowing. We define the edge separation penalties as follows: The light edges in the figure have penalty 1, while the heavy edges have a large separation penalty of M. Now observe that the labeling in the figure has penalty M + 3 but is locally optimal. The (globally) optimal penalty is only 3 and is obtained from the labeling in the figure by relabeling both nodes next to s.

[Figure 12.6 A bad local optimum for the local search algorithm that considers only two labels at a time.]

A Local Search Neighborhood That Works  Next we define a different neighborhood that leads to a good approximation algorithm. The local optimum in Figure 12.6 may be suggestive of what would be a good neighborhood: We need to be able to relabel nodes of different labels in a single step. The key is to find a neighbor relation rich enough to have this property, yet one that still allows us to find an improving local step in polynomial time.

Consider a labeling f. As part of a local step in our new algorithm, we will want to do the following. We pick one label a ∈ L and restrict attention to the
nodes that do not have label a in labeling f. As a single local step, we will allow any subset of these nodes to change their labels to a. More formally, for two labelings f and f', we say that f' is a neighbor of f if there is a label a ∈ L such that, for all nodes i ∈ V, either f'(i) = f(i) or f'(i) = a. Note that this neighbor relation is not symmetric; that is, we cannot get f back from f' via a single step. We will now show that for any labeling f we can find its best neighbor via k minimum-cut computations, and further, a local optimum for this neighborhood is a 2-approximation for the minimum penalty labeling.

Finding a Good Neighbor  To find the best neighbor, we will try each label a separately. Consider a label a. We claim that the best relabeling in which nodes may change their labels to a can be found via a minimum-cut computation. The construction of the minimum-cut graph G' = (V', E') is analogous to the minimum-cut computation developed for Two-Label Image Segmentation. There we introduced a source s and a sink t to represent the two labels. Here we will also introduce a source and a sink, where the source s will represent label a, while the sink t will effectively represent the alternate option nodes have--namely, to keep their original labels. The idea will be to find the minimum cut in G' and relabel all nodes on the s-side of the cut to label a, while letting all nodes on the t-side keep their original labels.

For each node of G, we will have a corresponding node in the new set V' and will add edges (i, t) and (s, i) to E', as was done in Figure 7.18 from Chapter 7 for the case of two labels. The edge (i, t) will have capacity c_i(a), as cutting the edge (i, t) places node i on the source side and hence corresponds to labeling node i with label a. The edge (s, i) will have capacity c_i(f(i)) if f(i) ≠ a, and a very large number M (or +∞) if f(i) = a. Cutting edge (s, i) places node i on the sink side and hence corresponds to node i retaining its original label f(i) ≠ a. The large capacity of M prevents nodes i with f(i) = a from being placed on the sink side.

In the construction for the two-label problem, we added edges between the nodes of V and used the separation penalties as capacities. This works well for nodes that are separated by the cut, or nodes on the source side that are both labeled a. However, if both i and j are on the sink side of the cut, then the edge connecting them is not cut, yet i and j are separated if f(i) ≠ f(j). We deal with this difficulty by enhancing the construction of G' as follows. For an edge (i, j), if f(i) = f(j) or one of i or j is labeled a, then we add an edge to E' with capacity p_ij. For the edges e = (i, j) where f(i) ≠ f(j) and neither has label a, we'll have to do something different to correctly encode via the graph G' that i and j remain separated even if they are both on the sink side. For each such edge e, we add an extra node e to V' corresponding to edge e, and add the edges (i, e), (e, j), and (e, s), all with capacity p_ij. See Figure 12.7 for these edges.

[Figure 12.7 The construction for an edge e = (i, j) with f(i) ≠ f(j) and neither label equal to a.]

(12.9) Given a labeling f and a label a, the minimum cut in the graph G' = (V', E') corresponds to the minimum-penalty neighbor of labeling f obtained by relabeling a subset of nodes to label a. As a result, the minimum-penalty neighbor of f can be found via k minimum-cut computations, one for each label in L.

Proof. Let (A, B) be an s-t cut in G'. The large value of M ensures that a minimum-capacity cut will not cut any of these high-capacity edges. Now consider a node e in G' corresponding to an edge e = (i, j) ∈ E. The node e ∈ V' has three adjacent edges, each with capacity p_ij. Given any partition of the other nodes, we can place e so that at most one of these three edges is cut. We'll call a cut good if no edge of capacity M is cut and, for all the nodes corresponding to edges in E, at most one of the adjacent edges is cut. So far we have argued that all minimum-capacity cuts are good.

Good s-t cuts in G' are in one-to-one correspondence with relabelings of f obtained by changing the label of a subset of nodes to a. Consider the capacity of a good cut. The edges (s, i) and (i, t) contribute exactly the assignment penalty to the capacity of the cut. The edges (i, j) directly connecting nodes in V contribute exactly the separation penalty of the nodes in the corresponding labeling: p_ij if they are separated, and 0 otherwise. Finally, consider an edge e = (i, j) with a corresponding node e ∈ V'. If i and j are both on the source side, none of the three edges adjacent to e are cut, and in all other cases exactly one of these edges is cut. So again, the three edges adjacent to e contribute to the cut exactly the separation penalty between i and j in the corresponding labeling. As a result, the capacity of a good cut is exactly the same as the penalty of the corresponding labeling, and so the minimum-capacity cut corresponds to the best relabeling of f. ∎
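The construction of G' described above is mechanical enough to sketch in code. The following Python fragment, which uses the networkx library and dictionary-based inputs that are assumptions made for the example, builds G' for one label a and reads off the best neighbor from a minimum s-t cut; each undirected capacity is added as a pair of directed arcs, which is equivalent for cut purposes.

```python
import networkx as nx

def best_relabel_to(nodes, edges, assign_cost, sep_cost, f, a, M=10**9):
    """Best neighbor of labeling f obtained by switching some subset of nodes to label a.

    nodes, edges        -- the original graph G = (V, E); edges is a list of pairs (i, j)
    assign_cost[i][b]   -- assignment penalty c_i(b)
    sep_cost[(i, j)]    -- separation penalty p_ij
    f                   -- current labeling, a dict node -> label
    M plays the role of the "very large number" in the text.
    """
    G = nx.DiGraph()

    def undirected(u, v, cap):
        G.add_edge(u, v, capacity=cap)
        G.add_edge(v, u, capacity=cap)

    for i in nodes:
        undirected(i, "t", assign_cost[i][a])                  # cut (i, t): i gets label a
        keep = assign_cost[i][f[i]] if f[i] != a else M        # cut (s, i): i keeps f(i)
        undirected("s", i, keep)

    for (i, j) in edges:
        p = sep_cost[(i, j)]
        if f[i] == f[j] or a in (f[i], f[j]):
            undirected(i, j, p)                                # ordinary separation edge
        else:
            e = ("edge", i, j)                                 # extra node for this edge
            undirected(i, e, p)
            undirected(e, j, p)
            undirected(e, "s", p)

    _, (source_side, _) = nx.minimum_cut(G, "s", "t")
    return {i: (a if i in source_side else f[i]) for i in nodes}
```

Calling this routine once for each label a in L and keeping the labeling of smallest penalty implements one local step of the algorithm, in line with (12.9).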
Analyzing the Algorithm
Finally, we need to consider the quality of the local optima under this definition of the neighbor relation. Recall that in our previous two attempts at defining neighborhoods, we found that they can both lead to bad local optima. Now, by contrast, we'll show that any local optimum under our new neighbor relation is a 2-approximation to the minimum possible penalty.

To begin the analysis, consider an optimal labeling f*, and for a label a ∈ L let V*_a = {i : f*(i) = a} be the set of nodes labeled by a in f*. Consider a locally optimal labeling f. We obtain a neighbor f_a of labeling f by starting with f and relabeling all nodes in V*_a to a. The labeling f is locally optimal, and hence this neighbor f_a has no smaller penalty: Φ(f_a) ≥ Φ(f). Now consider the difference Φ(f_a) − Φ(f), which we know is nonnegative. What quantities contribute to this difference? The only possible change in the assignment penalties could come from nodes in V*_a: for each i ∈ V*_a, the change is c_i(f*(i)) − c_i(f(i)). The separation penalties differ between the two labelings only in edges (i, j) that have at least one end in V*_a. The labeling f pays p_ij for each such edge that it separates, while the labeling f_a has a separation penalty of at most Σ_{(i,j) leaving V*_a} p_ij for these edges. (Note that this latter expression is only an upper bound, since an edge (i, j) leaving V*_a that has its other end labeled a does not contribute to the separation penalty of f_a.) The following inequality accounts for these differences.

(12.10) For a labeling f and its neighbor f_a, we have

    Φ(f_a) ≤ Φ(f) − [ Σ_{i ∈ V*_a} c_i(f(i)) + Σ_{(i,j) in or leaving V*_a, separated by f} p_ij ]
                  + [ Σ_{i ∈ V*_a} c_i(f*(i)) + Σ_{(i,j) leaving V*_a} p_ij ].

Now we are ready to prove our main claim.

(12.11) For any locally optimal labeling f, and any other labeling f*, we have Φ(f) ≤ 2 Φ(f*).

Proof. Let f_a be the neighbor of f defined previously by relabeling nodes to label a. The labeling f is locally optimal, so we have Φ(f_a) − Φ(f) ≥ 0 for all a ∈ L. We use (12.10) to bound Φ(f_a) − Φ(f) and then add the resulting inequalities for all labels to obtain the following:

    0 ≤ Σ_{a ∈ L} (Φ(f_a) − Φ(f))
      ≤ Σ_{a ∈ L} [ Σ_{i ∈ V*_a} (c_i(f*(i)) − c_i(f(i))) + Σ_{(i,j) leaving V*_a} p_ij − Σ_{(i,j) in or leaving V*_a, separated by f} p_ij ].

We will rearrange the inequality by grouping the positive terms on the left-hand side and the negative terms on the right-hand side. On the left-hand side, we get c_i(f*(i)) for all nodes i, which is exactly the assignment penalty of f*. In addition, we get the term p_ij twice for each of the edges separated by f* (once for each of the two labels f*(i) and f*(j)). On the right-hand side, we get c_i(f(i)) for each node i, which is exactly the assignment penalty of f. In addition, we get the terms p_ij for edges separated by f, each counted at least once. Thus the left-hand side is at most 2Φ(f*), while the right-hand side is at least Φ(f), and so Φ(f) ≤ 2Φ(f*). ∎

We proved that all local optima are good approximations to the labeling with minimum penalty. There is one more issue to consider: How fast does the algorithm find a local optimum? Recall that in the case of the Maximum-Cut Problem, we had to resort to a variant of the algorithm that accepts only big improvements, as repeated local improvements may not run in polynomial time. The same is also true here. Let ε > 0 be a constant. For a given labeling f, we will consider a neighboring labeling f' a significant improvement if Φ(f') ≤ (1 − ε/3k) Φ(f). To make sure the algorithm runs in polynomial time, we should only accept significant improvements, and terminate when no significant improvements are possible. After at most ε⁻¹ k significant improvements, the penalty decreases by a constant factor; hence the algorithm will terminate in polynomial time. It is not hard to adapt the proof of (12.11) to establish the following.

(12.12) For any fixed ε > 0, the version of the local search algorithm that only accepts significant improvements terminates in polynomial time and results in a labeling f such that Φ(f) ≤ (2 + ε) Φ(f*) for any other labeling f*.
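A minimal sketch of the outer loop with the significant-improvement rule might look as follows; best_neighbor stands for a routine (such as the minimum-cut based one sketched earlier) that returns the best neighbor of the current labeling, and is an assumed helper rather than something defined in the text.

```python
def local_search(f, penalty, best_neighbor, k, eps=0.1):
    """Repeatedly move to the best neighbor, but only while the move is a
    significant improvement, i.e. the penalty drops by a (1 - eps/(3k)) factor."""
    while True:
        g = best_neighbor(f)                          # best labeling reachable in one step
        if penalty(g) <= (1 - eps / (3 * k)) * penalty(f):
            f = g                                     # accept the significant improvement
        else:
            return f                                  # no significant improvement: stop
```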
12.7 Best-Response Dynamics and Nash Equilibria

Thus far we have been considering local search as a technique for solving optimization problems with a single objective--in other words, applying local operations to a candidate solution so as to minimize its total cost. There are many settings, however, where a potentially large number of agents, each with its own goals and objectives, collectively interact so as to produce a solution to some problem. A solution that is produced under these circumstances often reflects the "tug-of-war" that led to it, with each agent trying to pull the solution in a direction that is favorable to it. We will see that these interactions can be viewed as a kind of local search procedure; analogues of local minima have a natural meaning as well, but having multiple agents and multiple objectives introduces new challenges.

The field of game theory provides a natural framework in which to talk about what happens in such situations, when a collection of agents interacts strategically--in other words, with each trying to optimize an individual objective function. To illustrate these issues, we consider a concrete application, motivated by the problem of routing in networks; along the way, we will introduce some notions that occupy central positions in the area of game theory more generally.

The Problem
In a network like the Internet, one frequently encounters situations in which a number of nodes all want to establish a connection to a single source node s. For example, the source s may be generating some kind of data stream that all the given nodes want to receive, as in a style of one-to-many network communication known as multicast. We will model this situation by representing the underlying network as a directed graph G = (V, E), with a cost c_e ≥ 0 on each edge. There is a designated source node s ∈ V and a collection of k agents located at distinct terminal nodes t_1, t_2, ..., t_k ∈ V. For simplicity, we will not make a distinction between the agents and the nodes at which they reside; in other words, we will think of the agents as being t_1, t_2, ..., t_k. Each agent t_j wants to construct a path P_j from s to t_j using as little total cost as possible.

Now, if there were no interaction among the agents, this would consist of k separate shortest-path problems: Each agent t_j would find an s-t_j path for which the total cost of all edges is minimized, and use this as its path P_j. What makes this problem interesting is the prospect of agents being able to share the costs of edges. Suppose that after all the agents have chosen their paths, agent t_j only needs to pay its "fair share" of the cost of each edge e on its path; that is, rather than paying c_e for each e on P_j, it pays c_e divided by the number of agents whose paths contain e. In this way, there is an incentive for the agents to choose paths that overlap, since they can then benefit by splitting the costs of edges. (This sharing model is appropriate for settings in which the presence of multiple agents on an edge does not significantly degrade the quality of transmission due to congestion or increased latency. If latency effects do come into play, then there is a countervailing penalty for sharing; this too leads to interesting algorithmic questions, but we will stick to our current focus for now, in which sharing comes with benefits only.)

Best-Response Dynamics and Nash Equilibria: Definitions and Examples
To see how the option of sharing affects the behavior of the agents, let's begin by considering the pair of very simple examples in Figure 12.8. In example (a), each of the two agents has two options for constructing a path: the middle route through v, and the outer route using a single edge. Suppose that each agent starts out with an initial path but is continually evaluating the current situation to decide whether it's possible to switch to a better path.

[Figure 12.8 (a) It is in the two agents' interest to share the middle path. (b) It would be better for all the agents to share the edge on the left. But if all k agents start on the right-hand edge, then no one of them will want to unilaterally move from right to left; in other words, the solution in which all agents share the edge on the right is a bad Nash equilibrium.]

In example (a), suppose the two agents start out using their outer paths. Then t_1 sees no advantage in switching paths (since 4 < 5 + 1), but t_2 does (since 8 > 5 + 1), and so t_2 updates its path by moving to the middle. Once this happens, things have changed from the perspective of t_1: There is suddenly an advantage for t_1 in switching as well, since it now gets to share the cost of the middle path, and hence its cost to use the middle path becomes 2.5 + 1 < 4. Thus it will switch to the middle path. Once we are in a situation where both
sides are using the middle path, neither has an incentive to switch, and so this is a stable solution.

Let's discuss two definitions from the area of game theory that capture what's going on in this simple example. While we will continue to focus on our particular multicast routing problem, these definitions are relevant to any setting in which multiple agents, each with an individual objective, interact to produce a collective solution. As such, we will phrase the definitions in these general terms.

o First of all, in the example, each agent was continually prepared to improve its solution in response to changes made by the other agent(s). We will refer to this process as best-response dynamics. In other words, we are interested in the dynamic behavior of a process in which each agent updates based on its best response to the current situation.

o Second, we are particularly interested in stable solutions, where the best response of each agent is to stay put. We will refer to such a solution, from which no agent has an incentive to deviate, as a Nash equilibrium. (This is named after the mathematician John Nash, who won the Nobel Prize in economics for his pioneering work on this concept.) Hence, in example (a), the solution in which both agents use the middle path is a Nash equilibrium. Note that the Nash equilibria are precisely the solutions at which best-response dynamics terminate.

The example in Figure 12.8(b) illustrates the possibility of multiple Nash equilibria. In this example, there are k agents that all reside at a common node t (that is, t_1 = t_2 = ... = t_k = t), and there are two parallel edges from s to t with different costs. The solution in which all agents use the left-hand edge is a Nash equilibrium in which all agents pay (1 + ε)/k. The solution in which all agents use the right-hand edge is also a Nash equilibrium, though here the agents each pay k/k = 1. The fact that this latter solution is a Nash equilibrium exposes an important point about best-response dynamics. If the agents could somehow synchronously agree to move from the right-hand edge to the left-hand one, they'd all be better off. But under best-response dynamics, each agent is only evaluating the consequences of a unilateral move by itself. In effect, an agent isn't able to make any assumptions about future actions of other agents--in an Internet setting, it may not even know anything about these other agents or their current solutions--and so it is only willing to perform updates that lead to an immediate improvement for itself.

To quantify the sense in which one of the Nash equilibria in Figure 12.8(b) is better than the other, it is useful to introduce one further definition. We say that a solution is a social optimum if it minimizes the total cost to all agents. We can think of such a solution as the one that would be imposed by a benevolent central authority that viewed all agents as equally important and hence evaluated the quality of a solution by summing the costs they incurred. Note that in both (a) and (b), there is a social optimum that is also a Nash equilibrium, although in (b) there is also a second Nash equilibrium whose cost is much greater.

The Relationship to Local Search
Around here, the connections to local search start to come into focus. A set of agents following best-response dynamics are engaged in some kind of gradient descent process, exploring the "landscape" of possible solutions as they try to minimize their individual costs. The Nash equilibria are the natural analogues of local minima in this process: solutions from which no improving move is possible. And the "local" nature of the search is clear as well, since agents are only updating their solutions when it leads to an immediate improvement.

Having said all this, it's important to think a bit further and notice the crucial ways in which this differs from standard local search. In the beginning of this chapter, it was easy to argue that the gradient descent algorithm for a combinatorial problem must terminate at a local minimum: each update decreased the cost of the solution, and since there were only finitely many possible solutions, the sequence of updates could not go on forever. In other words, the cost function itself provided the progress measure we needed to establish termination.

In best-response dynamics, on the other hand, each agent has its own personal objective function to minimize, and so it's not clear what overall "progress" is being made when, for example, agent t_j decides to update its path from s. There's progress for t_j, of course, since its cost goes down, but this may be offset by an even larger increase in the cost to some other agent. Consider, for example, the network in Figure 12.9. If both agents start on the middle path, then t_1 will in fact have an incentive to move to the outer path; its cost drops from 3.5 to 3, but in the process the cost of t_2 increases from 3.5 to 6. (Once this happens, t_2 will also move to its outer path, and this solution--with both nodes on the outer paths--is the unique Nash equilibrium.)

[Figure 12.9 A network in which the unique Nash equilibrium differs from the social optimum.]

There are examples, in fact, where the cost-increasing effects of best-response dynamics can be much worse than this. Consider the situation in Figure 12.10, where we have k agents that each have the option to take a common outer path of cost 1 + ε (for some small number ε > 0), or to take their own alternate path. The alternate path for t_j has cost 1/j. Now suppose we start with a solution in which all agents are sharing the outer path. Each agent pays (1 + ε)/k, and this is the solution that minimizes the total cost to all agents. But running best-response dynamics starting from this solution causes things to unwind rapidly. First t_k switches to its alternate path, since 1/k < (1 + ε)/k.
Once this happens, the next agent t_{k−1} has an incentive to switch to its own alternate path as well, and the process continues until every agent is using its own alternate path. The resulting solution is the unique Nash equilibrium, and its total cost is much worse than the total cost under the social optimum. How much worse? The total cost of the social optimum in this example is 1 + ε, while the cost of the unique Nash equilibrium is 1 + 1/2 + 1/3 + ... + 1/k = Σ_{j=1}^{k} 1/j. We encountered this expression in Chapter 11, where we defined it to be the harmonic number H(k) and showed that its asymptotic value is H(k) = Θ(log k).

[Figure 12.10 The solution in which all agents share the outer path costs 1 + ε, while the unique Nash equilibrium costs much more.]

These examples suggest that one can't really view the social optimum as the analogue of the global minimum in a traditional local search procedure. In standard local search, the global minimum is always a stable solution, since no improvement is possible. Here the social optimum can be an unstable solution, since it just requires one agent to have an interest in deviating.
The only edges on which it changes are those in P′_j − P_j and those in P_j − P′_j. On the former set, it increases by

    Σ_{e ∈ P′_j − P_j} c_e / (x_e + 1),

and on the latter set, it decreases by

    Σ_{e ∈ P_j − P′_j} c_e / x_e.

So the criterion that t_j used for switching paths is precisely the statement that the total increase is strictly less than the total decrease, and hence the potential Φ decreases as a result of t_j's switch. ∎

Now there are only finitely many ways to choose a path for each agent t_j, and (12.14) says that best-response dynamics can never revisit a set of paths P_1, ..., P_k once it leaves it due to an improving move by some agent. Thus we have shown the following.

(12.15) Best-response dynamics always leads to a set of paths that forms a Nash equilibrium solution.

Bounding the Price of Stability  Our potential function Φ also turns out to be very useful in providing a bound on the price of stability. The point is that, although Φ is not equal to the total cost incurred by all agents, it tracks it reasonably closely.

To see this, let C(P_1, ..., P_k) denote the total cost to all agents when the selected paths are P_1, ..., P_k. This quantity is simply the sum of c_e over all edges that appear in the union of these paths, since the cost of each such edge is completely covered by the agents whose paths contain it.

Now the relationship between the cost function C and the potential function Φ is as follows.

(12.16) For any set of paths P_1, ..., P_k, we have

    C(P_1, ..., P_k) ≤ Φ(P_1, ..., P_k) ≤ H(k) · C(P_1, ..., P_k).

Proof. Recall our notation in which x_e denotes the number of paths containing edge e. For the purposes of comparing C and Φ, we also define E⁺ to be the set of all edges that belong to at least one of the paths P_1, ..., P_k. Then, by the definition of C, we have C(P_1, ..., P_k) = Σ_{e ∈ E⁺} c_e. Since 1 ≤ x_e ≤ k for every edge e ∈ E⁺, and since H is nondecreasing with H(1) = 1, we have

    C(P_1, ..., P_k) = Σ_{e ∈ E⁺} c_e ≤ Σ_{e ∈ E⁺} c_e H(x_e) = Φ(P_1, ..., P_k)

and

    Φ(P_1, ..., P_k) = Σ_{e ∈ E⁺} c_e H(x_e) ≤ Σ_{e ∈ E⁺} c_e H(k) = H(k) · C(P_1, ..., P_k). ∎

Using this, we can give a bound on the price of stability.

(12.17) In every instance, there is a Nash equilibrium solution for which the total cost to all agents exceeds that of the social optimum by at most a factor of H(k).

Proof. To produce the desired Nash equilibrium, we start from a social optimum consisting of paths P*_1, ..., P*_k and run best-response dynamics. By (12.15), this must terminate at a Nash equilibrium P_1, ..., P_k.

During this run of best-response dynamics, the total cost to all agents may have been going up, but by (12.14) the potential function was decreasing. Thus we have Φ(P_1, ..., P_k) ≤ Φ(P*_1, ..., P*_k).

This is basically all we need since, for any set of paths, the quantities C and Φ differ by at most a factor of H(k). Specifically,

    C(P_1, ..., P_k) ≤ Φ(P_1, ..., P_k) ≤ Φ(P*_1, ..., P*_k) ≤ H(k) · C(P*_1, ..., P*_k). ∎

Thus we have shown that a Nash equilibrium always exists, and there is always a Nash equilibrium whose total cost is within an H(k) factor of the social optimum. The example in Figure 12.10 shows that it isn't possible to improve on the bound of H(k) in the worst case.

Although this wraps up certain aspects of the problem very neatly, there are a number of questions here for which the answer isn't known. One particularly intriguing question is whether it's possible to construct a Nash equilibrium for this problem in polynomial time. Note that our proof of the existence of a Nash equilibrium argued simply that as best-response dynamics iterated through sets of paths, it could never revisit the same set twice, and hence it could not run forever. But there are exponentially many possible sets of paths, and so this does not give a polynomial-time algorithm. Beyond the question of finding any Nash equilibrium efficiently, there is also the open question of efficiently finding a Nash equilibrium that achieves a bound of H(k) relative to the social optimum, as guaranteed by (12.17).
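To make the fair-share model and the potential Φ = Σ_e c_e H(x_e) concrete, here is a small Python sketch that evaluates each agent's shared cost and the potential for a given choice of paths; each path is represented as a list of edge names, and the whole representation is an assumption made for the example.

```python
from collections import Counter

def harmonic(n):
    """H(n) = 1 + 1/2 + ... + 1/n, with H(0) = 0."""
    return sum(1.0 / i for i in range(1, n + 1))

def usage(paths):
    """x_e: how many of the chosen paths contain each edge e."""
    return Counter(e for path in paths.values() for e in path)

def agent_costs(cost, paths):
    """Fair-share cost of each agent: it pays c_e / x_e for every edge on its path."""
    x = usage(paths)
    return {j: sum(cost[e] / x[e] for e in path) for j, path in paths.items()}

def potential(cost, paths):
    """Phi(P_1, ..., P_k) = sum over used edges of c_e * H(x_e)."""
    x = usage(paths)
    return sum(cost[e] * harmonic(x[e]) for e in x)

# Example in the spirit of Figure 12.10 with k = 3: a shared edge of cost 1 + eps
# versus personal edges of cost 1/j.
eps = 0.01
cost = {"shared": 1 + eps, "own1": 1.0, "own2": 0.5, "own3": 1.0 / 3}
everyone_shares = {1: ["shared"], 2: ["shared"], 3: ["shared"]}
print(agent_costs(cost, everyone_shares))   # each agent pays (1 + eps) / 3
print(potential(cost, everyone_shares))     # (1 + eps) * H(3)
```

On this instance the total cost of the shared solution is 1 + ε while its potential is (1 + ε)·H(3), consistent with the sandwich C ≤ Φ ≤ H(k)·C of (12.16).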
It's also important to reiterate something that we mentioned earlier: It's not hard to find problems for which best-response dynamics may cycle forever and for which Nash equilibria do not necessarily exist. We were fortunate here that best-response dynamics could be viewed as iteratively improving a potential function that guaranteed our progress toward a Nash equilibrium, but the point is that potential functions like this do not exist for all problems in which agents interact.

Finally, it's interesting to compare what we've been doing here to a problem that we considered earlier in this chapter: finding a stable configuration in a Hopfield network. If you recall the discussion of that earlier problem, we analyzed a process in which each node "flips" between two possible states, seeking to increase the total weight of "good" edges incident to it. This can in fact be viewed as an instance of best-response dynamics for a problem in which each node has an objective function that seeks to maximize this measure of good edge weight. However, showing the convergence of best-response dynamics for the Hopfield network problem was much easier than the challenge we faced here: There it turned out that the state-flipping process was in fact a disguised form of local search with an objective function obtained simply by adding together the objective functions of all nodes--in effect, the analogue of the total cost to all agents served as a progress measure. In the present case, it was precisely because this total cost function did not work as a progress measure that we were forced to embark on the more complex analysis described here.

Solved Exercises

Solved Exercise 1
The Center Selection Problem from Chapter 11 is another case in which one can study the performance of local search algorithms.

Here is a simple local search approach to Center Selection (indeed, it's a common strategy for a variety of problems that involve locating facilities). In this problem, we are given a set of sites S = {s_1, s_2, ..., s_n} in the plane, and we want to choose a set of k centers C = {c_1, c_2, ..., c_k} whose covering radius--the farthest that people in any one site must travel to their nearest center--is as small as possible.

We start by arbitrarily choosing k points in the plane to be the centers c_1, c_2, ..., c_k. We now alternate the following two steps.

(i) Given the set of k centers c_1, c_2, ..., c_k, we divide S into k sets: For i = 1, 2, ..., k, we define S_i to be the set of all the sites for which c_i is the closest center.

(ii) Given this division of S into k sets, construct new centers that will be as "central" as possible relative to them. For each set S_i, we find the smallest circle in the plane that contains all points in S_i, and define center c_i to be the center of this circle.

If steps (i) and (ii) cause the covering radius to strictly decrease, then we perform another iteration; otherwise the algorithm stops.

The alternation of steps (i) and (ii) is based on the following natural interplay between sites and centers. In step (i) we partition the sites as well as possible given the centers; and then in step (ii) we place the centers as well as possible given the partition of the sites. In addition to its role as a heuristic for placing facilities, this type of two-step interplay is also the basis for local search algorithms in statistics, where (for reasons we won't go into here) it is called the Expectation Maximization approach.

(a) Prove that this local search algorithm eventually terminates.

(b) Consider the following statement.

    There is an absolute constant b > 1 (independent of the particular input instance), so that when the local search algorithm terminates, the covering radius of its solution is at most b times the optimal covering radius.

Decide whether you think this statement is true or false, and give a proof of either the statement or its negation.

Solution  To prove part (a), one's first thought is the following: The covering radius decreases in each iteration; it can't drop below the optimal covering radius; and so the iterations must terminate. But we have to be a bit careful, since we're dealing with real numbers. What if the covering radii decreased in every iteration, but by less and less, so that the algorithm was able to run arbitrarily long as its covering radii converged to some value from above?

It's not hard to take care of this concern, however. Note that the covering radius at the end of step (ii) in each iteration is completely determined by the current partition of the sites into S_1, S_2, ..., S_k. There are a finite number of ways to partition the sites into k sets, and if the local search algorithm ran for more than this number of iterations, it would have to produce the same partition in two of these iterations. But then it would have the same covering radius at the end of each of these iterations, and this contradicts the assumption that the covering radius strictly decreases from each iteration to the next.

This proves that the algorithm always terminates. (Note that it only gives an exponential bound on the number of iterations, however, since there are exponentially many ways to partition the sites into k sets.)

To disprove part (b), it would be enough to find a run of the algorithm in which the iterations get "stuck" in a configuration with a very large covering radius. This is not very hard to do. For any constant b > 1, consider a set S
of four points in the plane that form the corners of a tall, narrow rectangle of width w and height h = 2bw. For example, we could have the four points be (0, 0), (0, h), (w, h), (w, 0).

Now suppose k = 2, and we start the two centers anywhere to the left and right of the rectangle, respectively (say, at (−1, h/2) and (w + 1, h/2)). The first iteration proceeds as follows.

o Step (i) will divide S into the two points S_1 on the left side of the rectangle (with x-coordinate 0) and the two points S_2 on the right side of the rectangle (with x-coordinate w).

o Step (ii) will place centers at the midpoints of S_1 and S_2 (i.e., at (0, h/2) and (w, h/2)).

We can check that in the next iteration, the partition of S will not change, and so the locations of the centers will not change; the algorithm terminates here at a local minimum.

The covering radius of this solution is h/2. But the optimal solution would place centers at the midpoints of the top and bottom sides of the rectangle, for a covering radius of w/2. Thus the covering radius of our solution is h/w = 2b > b times that of the optimum.
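For completeness, here is a small Python sketch of the alternating procedure from this exercise, using a brute-force smallest-enclosing-circle routine (adequate for small site sets); run on the rectangle instance above, it gets stuck exactly as described. All names and the representation are illustrative assumptions.

```python
import itertools, math

def circle_two(p, q):
    cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    return (cx, cy), math.dist(p, q) / 2

def circle_three(p, q, r):
    # Circumcircle of three points; None if they are (nearly) collinear.
    ax, ay = p; bx, by = q; cx, cy = r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay) + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx) + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), p)

def smallest_enclosing_circle(points):
    """Brute force over circles through 2 or 3 points (fine for small sets)."""
    if len(points) == 1:
        return points[0], 0.0
    candidates = [circle_two(p, q) for p, q in itertools.combinations(points, 2)]
    candidates += [c for t in itertools.combinations(points, 3) if (c := circle_three(*t))]
    best = None
    for center, r in candidates:
        if all(math.dist(center, p) <= r + 1e-9 for p in points):
            if best is None or r < best[1]:
                best = (center, r)
    return best

def center_selection_local_search(sites, centers):
    """Alternate steps (i) and (ii) until the covering radius stops decreasing."""
    def covering_radius(cs):
        return max(min(math.dist(s, c) for c in cs) for s in sites)
    radius = covering_radius(centers)
    while True:
        groups = [[] for _ in centers]                        # step (i): assign each site
        for s in sites:
            groups[min(range(len(centers)), key=lambda i: math.dist(s, centers[i]))].append(s)
        centers = [smallest_enclosing_circle(g)[0] if g else c
                   for g, c in zip(groups, centers)]           # step (ii): recenter each group
        new_radius = covering_radius(centers)
        if new_radius >= radius:                               # no strict decrease: stop
            return centers, new_radius
        radius = new_radius

# The 2b-approximation counterexample from the text (b = 2, w = 1, h = 4):
sites = [(0, 0), (0, 4), (1, 4), (1, 0)]
print(center_selection_local_search(sites, [(-1.0, 2.0), (2.0, 2.0)]))
```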
Exercises

1. Consider the problem of finding a stable state in a Hopfield neural network, in the special case when all edge weights are positive. This corresponds to the Maximum-Cut Problem that we discussed earlier in the chapter: For every edge e in the graph G, the endpoints of e would prefer to have opposite states.

   Now suppose the underlying graph G is connected and bipartite; the nodes can be partitioned into sets X and Y so that each edge has one end in X and the other in Y. Then there is a natural "best" configuration for the Hopfield net, in which all nodes in X have the state +1 and all nodes in Y have the state −1. This way, all edges are good, in that their ends have opposite states.

   The question is: In this special case, when the best configuration is so clear, will the State-Flipping Algorithm described in the text (as long as there is an unsatisfied node, choose one and flip its state) always find this configuration? Give a proof that it will, or an example of an input instance, a starting configuration, and an execution of the State-Flipping Algorithm that terminates at a configuration in which not all edges are good.

2. Recall that for a problem in which the goal is to maximize some underlying quantity, gradient descent has a natural "upside-down" analogue, in which one repeatedly moves from the current solution to a solution of strictly greater value. Naturally, we could call this a gradient ascent algorithm. (Often in the literature you'll also see such methods referred to as hill-climbing algorithms.)

   By straight symmetry, the observations we've made in this chapter about gradient descent carry over to gradient ascent: For many problems you can easily end up with a local optimum that is not very good. But sometimes one encounters problems--as we saw, for example, with the Maximum-Cut and Labeling Problems--for which a local search algorithm comes with a very strong guarantee: Every local optimum is close in value to the global optimum. We now consider the Bipartite Matching Problem and find that the same phenomenon happens here as well.

   Thus, consider the following Gradient Ascent Algorithm for finding a matching in a bipartite graph.

       As long as there is an edge whose endpoints are unmatched, add it to the current matching. When there is no longer such an edge, terminate with a locally optimal matching.

   (a) Give an example of a bipartite graph G for which this gradient ascent algorithm does not return the maximum matching.

   (b) Let M and M' be matchings in a bipartite graph G. Suppose that |M'| > 2|M|. Show that there is an edge e' ∈ M' such that M ∪ {e'} is a matching in G.

   (c) Use (b) to conclude that any locally optimal matching returned by the gradient ascent algorithm in a bipartite graph G is at least half as large as a maximum matching in G.

3. Suppose you're consulting for a biotech company that runs experiments on two expensive high-throughput assay machines, each identical, which we'll label M1 and M2. Each day they have a number of jobs that they need to do, and each job has to be assigned to one of the two machines. The problem they need help on is how to assign the jobs to machines to keep the loads balanced each day. The problem is stated as follows. There are n jobs, and each job j has a required processing time t_j. They need to partition the jobs into two groups A and B, where set A is assigned to M1 and set B to M2. The time needed to process all of the jobs on the two machines is T_1 = Σ_{j∈A} t_j and T_2 = Σ_{j∈B} t_j. The problem is to have the two machines work roughly for the same amounts of time--that is, to minimize |T_1 − T_2|.
   A previous consultant showed that the problem is NP-hard (by a reduction from Subset Sum). Now they are looking for a good local search algorithm. They propose the following. Start by assigning jobs to the two machines arbitrarily (say jobs 1, ..., n/2 to M1, the rest to M2). The local moves are to move a single job from one machine to the other, and we only move jobs if the move decreases the absolute difference in the processing times. You are hired to answer some basic questions about the performance of this algorithm.

   (a) The first question is: How good is the solution obtained? Assume that there is no single job that dominates all the processing time--that is, that tj ≤ (1/2) Σ_i ti for all jobs j. Prove that for every locally optimal solution, the times the two machines operate are roughly balanced: (1/2) T1 ≤ T2 ≤ 2 T1.

   (b) Next you worry about the running time of the algorithm: How often will jobs be moved back and forth between the two machines? You propose the following small modification in the algorithm. If, in a local move, many different jobs can move from one machine to the other, then the algorithm should always move the job j with maximum tj. Prove that, under this variant, each job will move at most once. (Hence the local search terminates in at most n moves.)

   (c) Finally, they wonder if they should work on better algorithms. Give an example in which the local search algorithm above will not lead to an optimal solution.

4. Consider the Load Balancing Problem from Section 11.1. Some friends of yours are running a collection of Web servers, and they've designed a local search heuristic for this problem, different from the algorithms described in Chapter 11.

   Recall that we have m machines M1, ..., Mm, and we must assign each job to a machine. The load of the ith job is denoted ti. The makespan of an assignment is the maximum load on any machine:

      max_{machines Mi} Σ_{jobs j assigned to Mi} tj.

   Your friends' local search heuristic works as follows. They start with an arbitrary assignment of jobs to machines, and they then repeatedly try to apply the following type of "swap move."

      Let A(i) and A(j) be the jobs assigned to machines Mi and Mj, respectively. To perform a swap move on Mi and Mj, choose subsets of jobs B(i) ⊆ A(i) and B(j) ⊆ A(j), and "swap" these jobs between the two machines. That is, update A(i) to be A(i) ∪ B(j) - B(i), and update A(j) to be A(j) ∪ B(i) - B(j). (One is allowed to have B(i) = A(i), or to have B(i) be the empty set; and analogously for B(j).)

   Consider a swap move applied to machines Mi and Mj. Suppose the loads on Mi and Mj before the swap are Ti and Tj, respectively, and the loads after the swap are Ti' and Tj'. We say that the swap move is improving if max(Ti', Tj') < max(Ti, Tj)--in other words, the larger of the two loads involved has strictly decreased. We say that an assignment of jobs to machines is stable if there does not exist an improving swap move, beginning with the current assignment.

   Thus the local search heuristic simply keeps executing improving swap moves until a stable assignment is reached; at this point, the resulting stable assignment is returned as the solution.

   Example. Suppose there are two machines: In the current assignment, the machine M1 has jobs of sizes 1, 3, 5, 8, and machine M2 has jobs of sizes 2, 4. Then one possible improving swap move would be to define B(1) to consist of the job of size 8, and define B(2) to consist of the job of size 2. After these two sets are swapped, the resulting assignment has jobs of size 1, 2, 3, 5 on M1, and jobs of size 4, 8 on M2. This assignment is stable. (It also has an optimal makespan of 12.)

   (a) As specified, there is no explicit guarantee that this local search heuristic will always terminate. What if it keeps cycling forever through assignments that are not stable? Prove that, in fact, the local search heuristic terminates in a finite number of steps, with a stable assignment, on any instance.

   (b) Show that any stable assignment has a makespan that is within a factor of 2 of the minimum possible makespan.

Notes and Further Reading

Kirkpatrick, Gelatt, and Vecchi (1983) introduced simulated annealing, building on an algorithm of Metropolis et al. (1953) for simulating physical systems. In the process, they highlighted the analogy between energy landscapes and the solution spaces of computational problems.

The book of surveys edited by Aarts and Lenstra (1997) covers a wide range of applications of local search techniques for algorithmic problems. Hopfield neural networks were introduced by Hopfield (1982) and are discussed in more detail in the book by Haykin (1999). The heuristic for graph partitioning discussed in Section 12.5 is due to Kernighan and Lin (1970).
The local search algorithm for classification based on the Labeling Problem
is due to Boykov, Veksler, and Zabih (1999). Further results and computational
experiments are discussed in the thesis by Veksler (1999).
The multi-agent routing problem considered in Section 12.7 raises issues
at the intersection of algorithms and game theory, an area concerned with
the general issue of strategic interaction among agents. The book by Osborne
(2003) provides an introduction to game theory; the algorithmic aspects of the
subject are discussed in surveys by Papadimitriou (2001) and Tardos (2004)
and the thesis and subsequent book by Roughgarden (2002, 2004). The use
of potential functions to prove the existence of Nash equilibria has a long
history in game theory (Beckmann, McGuire, and Winsten, 1956; Rosenthal,
1973), and potential functions were used to analyze best-response dynamics
by Monderer and Shapley (1996). The bound on the price of stability for the
routing problem in Section 12.7 is due to Anshelevich et al. (2004).
13 Randomized Algorithms

The idea that a process can be "random" is not a modern one; we can trace
the notion far back into the history of human thought and certainly see its
reflections in gambling and the insurance business, each of which reaches into
ancient times. Yet, while similarly intuitive subjects like geometry and logic
have been treated mathematically for several thousand years, the mathematical
study of probability is surprisingly young; the first known attempts to seriously
formalize it came about in the 1600s. Of course, the history of computer science
plays out on a much shorter time scale, and the idea of randomization has been
with it since its early days.
Randomization and probabilistic analysis are themes that cut across many
areas of computer science, including algorithm design, and when one thinks
about random processes in the context of computation, it is usually in one of
two distinct ways. One view is to consider the world as behaving randomly:
One can consider traditional algorithms that confront randomly generated
input. This approach is often termed average-case analysis, since we are
studying the behavior of an algorithm on an "average" input (subject to some
underlying random process), rather than a worst-case input.
A second view is to consider algorithms that behave randomly: The world
provides the same worst-case input as always, but we allow our algorithm to
make random decisions as it processes the input. Thus the role of randomiza-
tion in this approach is purely internal to the algorithm and does not require
new assumptions about the nature of the input. It is this notion of a randomized
algorithm that we will be considering in this chapter.
Why might it be useful to design an algorithm that is allowed to make random decisions? A first answer would be to observe that by allowing randomization, we've made our underlying model more powerful. Efficient deterministic algorithms that always yield the correct answer are a special case of efficient randomized algorithms that only need to yield the correct answer with high probability; they are also a special case of randomized algorithms that are always correct, and run efficiently in expectation. Even in a worst-case world, an algorithm that does its own "internal" randomization may be able to offset certain worst-case phenomena. So problems that may not have been solvable by efficient deterministic algorithms may still be amenable to randomized algorithms.

But this is not the whole story, and in fact we'll be looking at randomized algorithms for a number of problems where there exist comparably efficient deterministic algorithms. Even in such situations, a randomized approach often exhibits considerable power for further reasons: It may be conceptually much simpler; or it may allow the algorithm to function while maintaining very little internal state or memory of the past. The advantages of randomization seem to increase further as one considers larger computer systems and networks, with many loosely interacting processes--in other words, a distributed system. Here random behavior on the part of individual processes can reduce the amount of explicit communication or synchronization that is required; it is often valuable as a tool for symmetry-breaking among processes, reducing the danger of contention and "hot spots." A number of our examples will come from settings like this: regulating access to a shared resource, balancing load on multiple processors, or routing packets through a network. Even a small level of comfort with randomized heuristics can give one considerable leverage in thinking about large systems.

A natural worry in approaching the topic of randomized algorithms is that it requires an extensive knowledge of probability. Of course, it's always better to know more rather than less, and some algorithms are indeed based on complex probabilistic ideas. But one further goal of this chapter is to illustrate how little underlying probability is really needed in order to understand many of the well-known algorithms in this area. We will see that there is a small set of useful probabilistic tools that recur frequently, and this chapter will try to develop the tools alongside the algorithms. Ultimately, facility with these tools is as valuable as an understanding of the specific algorithms themselves.

13.1 A First Application: Contention Resolution

Our first application of randomized algorithms is contention resolution in a distributed system; it illustrates the general style of analysis we will be using for many of the algorithms that follow. In particular, it is a chance to work through some basic manipulations involving events and their probabilities, analyzing intersections of events using independence as well as unions of events using a simple Union Bound. For the sake of completeness, we give a brief summary of these concepts in the final section of this chapter (Section 13.12).

The Problem

Suppose we have n processes P1, P2, ..., Pn, each competing for access to a single shared database. We imagine time as being divided into discrete rounds. The database has the property that it can be accessed by at most one process in a single round; if two or more processes attempt to access it simultaneously, then all processes are "locked out" for the duration of that round. So, while each process wants to access the database as often as possible, it's pointless for all of them to try accessing it in every round; then everyone will be perpetually locked out. What's needed is a way to divide up the rounds among the processes in an equitable fashion, so that all processes get through to the database on a regular basis.

If it is easy for the processes to communicate with one another, then one can imagine all sorts of direct means for resolving the contention. But suppose that the processes can't communicate with one another at all; how then can they work out a protocol under which they manage to "take turns" in accessing the database?

Designing a Randomized Algorithm

Randomization provides a natural protocol for this problem, which we can specify simply as follows. For some number p > 0 that we'll determine shortly, each process will attempt to access the database in each round with probability p, independently of the decisions of the other processes. So, if exactly one process decides to make the attempt in a given round, it will succeed; if two or more try, then they will all be locked out; and if none try, then the round is in a sense "wasted." This type of strategy, in which each of a set of identical processes randomizes its behavior, is the core of the symmetry-breaking paradigm that we mentioned initially: If all the processes operated in lockstep, repeatedly trying to access the database at the same time, there'd be no progress; but by randomizing, they "smooth out" the contention.
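To make the protocol concrete, here is a small Python simulation sketch (illustrative code, not from the text; the function and parameter names are ours). Each process attempts access independently with probability p in every round, and a process gets through in a round exactly when it is the only one attempting.

   import math
   import random

   def simulate_contention(n, p, rounds, seed=0):
       # Return, for each process, the first round in which it succeeded
       # (or None if it never succeeded within the horizon).
       rng = random.Random(seed)
       first_success = [None] * n
       for t in range(1, rounds + 1):
           # Each process attempts independently with probability p.
           attempts = [i for i in range(n) if rng.random() < p]
           # A process succeeds only if it is the unique process attempting.
           if len(attempts) == 1 and first_success[attempts[0]] is None:
               first_success[attempts[0]] = t
       return first_success

   if __name__ == "__main__":
       n = 50
       p = 1.0 / n   # the choice of p analyzed below
       horizon = 2 * math.ceil(math.e * n) * math.ceil(math.log(n))
       results = simulate_contention(n, p, horizon)
       print(sum(r is not None for r in results), "of", n, "processes got through")

Running this with p = 1/n typically shows every process succeeding well within roughly 2en ln n rounds, which is consistent with the analysis that follows.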
Our real concern is whether a process succeeds in accessing the database in a given round. Let S[i, t] denote this event. Clearly, the process Pi must attempt an access in round t in order to succeed. Indeed, succeeding is equivalent to the following: Process Pi attempts to access the database in round t, and each other process does not attempt to access the database in round t. Thus S[i, t] is equal to the intersection of the event A[i, t] that Pi attempts an access in round t with all the complementary events ¬A[j, t], for j ≠ i:

   S[i, t] = A[i, t] ∩ (⋂_{j ≠ i} ¬A[j, t]).

All the events in this intersection are independent, by the definition of the contention-resolution protocol. Thus, to get the probability of S[i, t], we can multiply the probabilities of all the events in the intersection:

   Pr[S[i, t]] = Pr[A[i, t]] · ∏_{j ≠ i} Pr[¬A[j, t]] = p (1 - p)^{n-1}.

We now have a nice, closed-form expression for the probability that Pi succeeds in accessing the database in round t; we can now ask how to set p so that this success probability is maximized. Observe first that the success probability is 0 for the extreme cases p = 0 and p = 1 (these correspond to the extreme case in which processes never bother attempting, and the opposite extreme case in which every process tries accessing the database in every round, so that everyone is locked out). The function f(p) = p (1 - p)^{n-1} is positive for values of p strictly between 0 and 1, and its derivative f'(p) = (1 - p)^{n-1} - (n - 1) p (1 - p)^{n-2} has a single zero at p = 1/n, so the maximum is achieved by setting p = 1/n. (Notice that p = 1/n is a natural intuitive choice as well, if one wants exactly one process to attempt an access in any round.)

Using (13.1), we see that 1/(en) ≤ Pr[S[i, t]] ≤ 1/(2n), and hence Pr[S[i, t]] is asymptotically equal to Θ(1/n).
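As a quick numerical sanity check on this claim (an illustrative snippet, not part of the text), one can tabulate f(p) = p(1 - p)^{n-1} on a grid of values of p and confirm both that it peaks at p = 1/n and that the resulting success probability lies between 1/(en) and 1/(2n):

   import math

   def success_probability(p, n):
       # Probability that a fixed process is the unique one attempting in a round.
       return p * (1.0 - p) ** (n - 1)

   for n in (5, 50, 500):
       at_one_over_n = success_probability(1.0 / n, n)
       # Best value of f(p) found on a fine grid of p values in (0, 1).
       best_on_grid = max(success_probability(k / (10.0 * n), n) for k in range(1, 10 * n))
       print(n, round(at_one_over_n, 6), round(best_on_grid, 6),
             round(1.0 / (math.e * n), 6), round(1.0 / (2.0 * n), 6))

For each n, the value at p = 1/n matches the best value found on the grid, and it sits between the printed bounds 1/(en) and 1/(2n).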
Waiting for a Particular Process to Succeed Let's consider this protocol with the optimal value p = 1/n for the access probability. Suppose we are interested in how long it will take process Pi to succeed in accessing the database at least once. We see from the earlier calculation that the probability of its succeeding in any one round is not very good, if n is reasonably large. How about if we consider multiple rounds?

Let F[i, t] denote the "failure event" that process Pi does not succeed in any of the rounds 1 through t. This is clearly just the intersection of the complementary events ¬S[i, r] for r = 1, 2, ..., t. Moreover, since each of these events is independent, we can compute the probability of F[i, t] by multiplication:

   Pr[F[i, t]] = Pr[⋂_{r=1}^{t} ¬S[i, r]] = ∏_{r=1}^{t} Pr[¬S[i, r]] = (1 - (1/n)(1 - 1/n)^{n-1})^t.

This calculation does give us the value of the probability; but at this point, we're in danger of ending up with some extremely complicated-looking expressions, and so it's important to start thinking asymptotically. Recall that the probability of success was Θ(1/n) after one round; specifically, it was bounded between 1/(en) and 1/(2n). Using the expression above, we have

   Pr[F[i, t]] = ∏_{r=1}^{t} Pr[¬S[i, r]] ≤ (1 - 1/(en))^t.

Now we notice that if we set t = en, then we have an expression that can be plugged directly into (13.1). Of course en will not be an integer; so we can take t = ⌈en⌉ and write

   Pr[F[i, t]] ≤ (1 - 1/(en))^{⌈en⌉} ≤ (1 - 1/(en))^{en} ≤ 1/e.

This is a very compact and useful asymptotic statement: The probability that process Pi does not succeed in any of rounds 1 through ⌈en⌉ is upper-bounded by the constant e^{-1}, independent of n. Now, if we increase t by some fairly small factors, the probability that Pi does not succeed in any of rounds 1 through t drops precipitously: If we set t = ⌈en⌉ · (c ln n), then we have

   Pr[F[i, t]] ≤ ((1 - 1/(en))^{⌈en⌉})^{c ln n} ≤ e^{-c ln n} = n^{-c}.

So, asymptotically, we can view things as follows. After Θ(n) rounds, the probability that Pi has not yet succeeded is bounded by a constant; and between then and Θ(n ln n), this probability drops to a quantity that is quite small, bounded by an inverse polynomial in n.

Waiting for All Processes to Get Through Finally, we're in a position to ask the question that was implicit in the overall setup: How many rounds must elapse before there's a high probability that all processes will have succeeded in accessing the database at least once?

To address this, we say that the protocol fails after t rounds if some process has not yet succeeded in accessing the database. Let F_t denote the event that the protocol fails after t rounds; the goal is to find a reasonably small value of t for which Pr[F_t] is small.

The event F_t occurs if and only if one of the events F[i, t] occurs; so we can write

   F_t = ⋃_{i=1}^{n} F[i, t].

Previously, we considered intersections of independent events, which were very simple to work with; here, by contrast, we have a union of events that are not independent. Probabilities of unions like this can be very hard to compute exactly, and in many settings it is enough to analyze them using a simple Union Bound, which says that the probability of a union of events is upper-bounded by the sum of their individual probabilities:

(13.2) (The Union Bound) Given events E1, E2, ..., En, we have

   Pr[⋃_{i=1}^{n} Ei] ≤ Σ_{i=1}^{n} Pr[Ei].

Note that this is not an equality; but the upper bound is good enough when, as here, the union on the left-hand side represents a "bad event" that we're trying to avoid, and we want a bound on its probability in terms of constituent "bad events" on the right-hand side.

For the case at hand, recall that F_t = ⋃_{i=1}^{n} F[i, t], and so

   Pr[F_t] ≤ Σ_{i=1}^{n} Pr[F[i, t]].

The expression on the right-hand side is a sum of n terms, each with the same value; so to make the probability of F_t small, we need to make each of the terms on the right significantly smaller than 1/n. From our earlier discussion, we see that choosing t = Θ(n) will not be good enough, since then each term on the right is only bounded by a constant. If we choose t = ⌈en⌉ · (c ln n), then we have Pr[F[i, t]] ≤ n^{-c} for each i, which is what we want. Thus, in particular, taking t = 2⌈en⌉ ln n gives us

   Pr[F_t] ≤ Σ_{i=1}^{n} Pr[F[i, t]] ≤ n · n^{-2} = n^{-1},

and so we have shown the following.

(13.3) With probability at least 1 - n^{-1}, all processes succeed in accessing the database at least once within t = 2⌈en⌉ ln n rounds.

An interesting observation here is that if we had chosen a value of t equal to qn ln n for a very small value of q (rather than the coefficient 2e that we actually used), then we would have gotten an upper bound for Pr[F[i, t]] that was larger than n^{-1}, and hence a corresponding upper bound for the overall failure probability Pr[F_t] that was larger than 1--in other words, a completely worthless bound. Yet, as we saw, by choosing larger and larger values for the coefficient q, we can drive the upper bound on Pr[F_t] down to n^{-c} for any constant c we want; and this is really a very tiny upper bound. So, in a sense, all the "action" in the Union Bound takes place rapidly in the period when t = Θ(n ln n); as we vary the hidden constant inside the Θ(·), the Union Bound goes from providing no information to giving an extremely strong upper bound on the probability.

We can ask whether this is simply an artifact of using the Union Bound for our upper bound, or whether it's intrinsic to the process we're observing. Although we won't do the (somewhat messy) calculations here, one can show that when t is a small constant times n ln n, there really is a sizable probability that some process has not yet succeeded in accessing the database. So a rapid falling-off in the value of Pr[F_t] genuinely does happen over the range t = Θ(n ln n). For this problem, as in many problems of this flavor, we're
really identifying the asymptotically "correct" value of t despite our use of the seemingly weak Union Bound.

13.2 Finding the Global Minimum Cut

Randomization naturally suggested itself in the previous example, since we were assuming a model with many processes that could not directly communicate. We now look at a problem on graphs for which a randomized approach comes as somewhat more of a surprise, since it is a problem for which perfectly reasonable deterministic algorithms exist as well.

The Problem

Given an undirected graph G = (V, E), we define a cut of G to be a partition of V into two non-empty sets A and B. Earlier, when we looked at network flows, we worked with the closely related definition of an s-t cut: there, given a directed graph G = (V, E) with distinguished source and sink nodes s and t, an s-t cut was defined to be a partition of V into sets A and B such that s ∈ A and t ∈ B. Our definition now is slightly different, since the underlying graph is now undirected and there is no source or sink.

For a cut (A, B) in an undirected graph G, the size of (A, B) is the number of edges with one end in A and the other in B. A global minimum cut (or "global min-cut" for short) is a cut of minimum size. The term global here is meant to connote that any cut of the graph is allowed; there is no source or sink. Thus the global min-cut is a natural "robustness" parameter; it is the smallest number of edges whose deletion disconnects the graph. We first check that network flow techniques are indeed sufficient to find a global min-cut.

(13.4) There is a polynomial-time algorithm to find a global min-cut in an undirected graph G.

Proof. We start from the similarity between cuts in undirected graphs and s-t cuts in directed graphs, and with the fact that we know how to find the latter optimally.

So given an undirected graph G = (V, E), we need to transform it so that there are directed edges and there is a source and sink. We first replace every undirected edge e = (u, v) ∈ E with two oppositely oriented directed edges, e' = (u, v) and e'' = (v, u), each of capacity 1. Let G' denote the resulting directed graph.

Now suppose we pick two arbitrary nodes s, t ∈ V, and find the minimum s-t cut in G'. It is easy to check that if (A, B) is this minimum cut in G', then (A, B) is also a cut of minimum size in G among all those that separate s from t. But we know that the global min-cut in G must separate s from something, since both sides A and B are nonempty, and s belongs to only one of them. So we fix any s ∈ V and compute the minimum s-t cut in G' for every other node t ∈ V - {s}. This is n - 1 directed minimum-cut computations, and the best among these will be a global min-cut of G. ∎

The algorithm in (13.4) gives the strong impression that finding a global min-cut in an undirected graph is in some sense a harder problem than finding a minimum s-t cut in a flow network, as we had to invoke a subroutine for the latter problem n - 1 times in our method for solving the former. But it turns out that this is just an illusion. A sequence of increasingly simple algorithms in the late 1980s and early 1990s showed that global min-cuts in undirected graphs could actually be computed just as efficiently as s-t cuts or even more so, and by techniques that didn't require augmenting paths or even a notion of flow. The high point of this line of work came with David Karger's discovery in 1992 of the Contraction Algorithm, a randomized method that is qualitatively simpler than all previous algorithms for global min-cuts. Indeed, it is sufficiently simple that, on a first impression, it is very hard to believe that it actually works.

Designing the Algorithm

Here we describe the Contraction Algorithm in its simplest form. This version, while it runs in polynomial time, is not among the most efficient algorithms for global min-cuts. However, subsequent optimizations to the algorithm have given it a much better running time.

The Contraction Algorithm works with a connected multigraph G = (V, E); this is an undirected graph that is allowed to have multiple "parallel" edges between the same pair of nodes. It begins by choosing an edge e = (u, v) of G uniformly at random and contracting it, as shown in Figure 13.1. This means we produce a new graph G' in which u and v have been identified into a single new node w; all other nodes keep their identity. Edges that had one end equal to u and the other equal to v are deleted from G'. Each other edge e is preserved in G', but if one of its ends was equal to u or v, then this end is updated to be equal to the new node w. Note that, even if G had at most one edge between any two nodes, G' may end up with parallel edges.

The Contraction Algorithm then continues recursively on G', choosing an edge uniformly at random and contracting it. As these recursive calls proceed, the constituent vertices of G' should be viewed as supernodes: Each supernode w corresponds to the subset S(w) ⊆ V that has been "swallowed up" in the contractions that produced w. The algorithm terminates when it reaches a graph G' that has only two supernodes v1 and v2 (presumably with a number of parallel edges between them). Each of these supernodes vi has a corresponding subset S(vi) ⊆ V consisting of the nodes that have been
contracted into it, and these two sets S(v1) and S(v2) form a partition of V. We output (S(v1), S(v2)) as the cut found by the algorithm.

[Figure 13.1 The Contraction Algorithm applied to a four-node input graph.]

   The Contraction Algorithm applied to a multigraph G = (V, E):
      For each node v, we will record
         the set S(v) of nodes that have been contracted into v
      Initially S(v) = {v} for each v
      If G has two nodes v1 and v2, then return the cut (S(v1), S(v2))
      Else choose an edge e = (u, v) of G uniformly at random
         Let G' be the graph resulting from the contraction of e,
            with a new node zuv replacing u and v
         Define S(zuv) = S(u) ∪ S(v)
         Apply the Contraction Algorithm recursively to G'
      Endif

Analyzing the Algorithm

The algorithm is making random choices, so there is some probability that it will succeed in finding a global min-cut and some probability that it won't. One might imagine at first that the probability of success is exponentially small. After all, there are exponentially many possible cuts of G; what's favoring the minimum cut in the process? But we'll show first that, in fact, the success probability is only polynomially small. It will then follow that by running the algorithm a polynomial number of times and returning the best cut found in any run, we can actually produce a global min-cut with high probability.

(13.5) The Contraction Algorithm returns a global min-cut of G with probability at least 1/(n choose 2).

Proof. We focus on a global min-cut (A, B) of G and suppose it has size k; that is, there is a set F of k edges with one end in A and the other in B. We want to give a lower bound on the probability that the Contraction Algorithm returns the cut (A, B).

Consider what could go wrong in the first step of the Contraction Algorithm: The problem would be if an edge in F were contracted. For then, a node of A and a node of B would get thrown together in the same supernode, and (A, B) could not be returned as the output of the algorithm. Conversely, if an edge not in F is contracted, then there is still a chance that (A, B) could be returned.

So what we want is an upper bound on the probability that an edge in F is contracted, and for this we need a lower bound on the size of E. Notice that if any node v had degree less than k, then the cut ({v}, V - {v}) would have size less than k, contradicting our assumption that (A, B) is a global min-cut. Thus every node in G has degree at least k, and so |E| ≥ (1/2)kn. Hence the probability that an edge in F is contracted is at most

   k / ((1/2)kn) = 2/n.

Now consider the situation after j iterations, when there are n - j supernodes in the current graph G', and suppose that no edge in F has been contracted yet. Every cut of G' is a cut of G, and so there are at least k edges incident to every supernode of G'. Thus G' has at least (1/2)k(n - j) edges, and so the probability that an edge of F is contracted in the next iteration j + 1 is at most

   k / ((1/2)k(n - j)) = 2/(n - j).

The cut (A, B) will actually be returned by the algorithm if no edge of F is contracted in any of iterations 1, 2, ..., n - 2. If we write Ej for the event that an edge of F is not contracted in iteration j, then we have shown Pr[E1] ≥ 1 - 2/n and Pr[Ej+1 | E1 ∩ E2 ∩ ... ∩ Ej] ≥ 1 - 2/(n - j). We are interested in lower-bounding the quantity Pr[E1 ∩ E2 ∩ ... ∩ En-2], and we can check by unwinding the formula for conditional probability that this is equal to

   Pr[E1] · Pr[E2 | E1] ⋯ Pr[En-2 | E1 ∩ ... ∩ En-3]
      ≥ (1 - 2/n)(1 - 2/(n-1)) ⋯ (1 - 2/3)
      = ((n-2)/n)((n-3)/(n-1)) ⋯ (2/4)(1/3)
      = 2/(n(n-1)) = 1/(n choose 2).
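The following Python sketch (illustrative, not the book's code) implements the Contraction Algorithm as described above, representing the multigraph as a list of edges over supernode labels, and wraps it in the repetition suggested before (13.5): since one run succeeds with probability at least 1/(n choose 2), running it on the order of n² log n times and keeping the smallest cut found yields a global min-cut with high probability.

   import math
   import random

   def contract_once(nodes, edges, rng):
       # One run of the Contraction Algorithm on a connected multigraph.
       # Returns (cut_size, (S1, S2)), where S1, S2 partition the original nodes.
       supernode = {v: {v} for v in nodes}          # S(v) for each supernode v
       edges = list(edges)
       while len(supernode) > 2:
           u, v = rng.choice(edges)                 # uniform over remaining parallel edges
           supernode[u] |= supernode.pop(v)         # merge v into u
           # Reattach edges incident to v and drop the now-internal u-v edges.
           edges = [(u if a == v else a, u if b == v else b) for a, b in edges]
           edges = [(a, b) for a, b in edges if a != b]
       (_, side1), (_, side2) = supernode.items()
       return len(edges), (side1, side2)

   def global_min_cut(nodes, edges, seed=0):
       # Repeat enough times that a global min-cut is found with high probability.
       n = len(set(nodes))
       rng = random.Random(seed)
       trials = math.ceil((n * (n - 1) / 2) * math.log(max(n, 2))) + 1
       best = None
       for _ in range(trials):
           cut = contract_once(nodes, edges, rng)
           if best is None or cut[0] < best[0]:
               best = cut
       return best

   # Example: two triangles joined by a single bridge edge; the global min-cut has size 1.
   nodes = [1, 2, 3, 4, 5, 6]
   edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
   print(global_min_cut(nodes, edges))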
There is something genuinely surprising about the statement of (13.15). We have arrived at a nonobvious fact about 3-SAT--the existence of an assignment satisfying many clauses--whose statement has nothing to do with randomization; but we have done so by a randomized construction. And, in fact, the randomized construction provides what is quite possibly the simplest proof of (13.15). This is a fairly widespread principle in the area of combinatorics--namely, that one can show the existence of some structure by showing that a random construction produces it with positive probability. Constructions of this sort are said to be applications of the probabilistic method.

Here's a cute but minor application of (13.15): Every instance of 3-SAT with at most seven clauses is satisfiable. Why? If the instance has k ≤ 7 clauses, then (13.15) implies that there is an assignment satisfying at least (7/8)k of them. But when k ≤ 7, it follows that (7/8)k > k - 1; and since the number of clauses satisfied by this assignment must be an integer, it must be equal to k. In other words, all clauses are satisfied.

Now let k' denote the largest natural number that is strictly smaller than (7/8)k. The right-hand side of the above equation only increases if we replace the terms in the first sum by k' pj and the terms in the second sum by k pj. We also observe that Σ_{j < 7k/8} pj = 1 - p, and so

   (7/8)k ≤ Σ_{j < 7k/8} k' pj + Σ_{j ≥ 7k/8} k pj = k'(1 - p) + kp,

and hence kp ≥ (7/8)k - k'. But (7/8)k - k' ≥ 1/8, since k' is a natural number strictly smaller than 7/8 times another natural number, and so

   p ≥ ((7/8)k - k') / k ≥ 1/(8k).

This was our goal--to get a lower bound on p--and so by the waiting-time bound (13.7), we see that the expected number of trials needed to find the satisfying assignment we want is at most 8k.
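As a concrete companion to this argument, here is an illustrative Python sketch (not the book's code) that keeps drawing uniformly random truth assignments until one satisfies at least 7/8 of the clauses; by the bound just derived, the expected number of trials is at most 8k.

   import random

   def satisfied_count(clauses, assignment):
       # clauses: list of 3-tuples of nonzero ints; literal i means variable |i|,
       # negated if i < 0.  assignment: dict mapping variable -> bool.
       return sum(1 for clause in clauses
                  if any(assignment[abs(lit)] == (lit > 0) for lit in clause))

   def good_assignment(clauses, seed=0):
       # Repeat until an assignment satisfies at least 7/8 of the clauses.
       rng = random.Random(seed)
       variables = {abs(lit) for clause in clauses for lit in clause}
       k = len(clauses)
       trials = 0
       while True:
           trials += 1
           assignment = {v: rng.random() < 0.5 for v in variables}
           if 8 * satisfied_count(clauses, assignment) >= 7 * k:
               return assignment, trials

On the three-clause instance [(1, 2, 3), (-1, 2, 4), (1, -3, -4)], for example, the returned assignment must satisfy all three clauses, in line with the observation above that any instance with at most seven clauses is satisfiable.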
"middle position"; thus we define things precisely as follows: The median of The kth largest element lies in S+
S = {al, a2 ..... an} is equal to the kth largest element in S, where k = (n + 1)/2 Recursively call Select(S+, k - 1 - 4)
Endif
if n is odd, and k = n/2 if n is even. In what follows, we’ll assume for the sake
of simplicity that all the numbers are distinct. Without this assumption, the
problem becomes notationallY more complicated, but no new ideas are brought
into play. Observe that the algorithm is always called recursively on a strictly smaller set,
It is clearly easy to compute the median in time O(n log n) if we simply so it must terminate. Also, observe that if ISl = 1, then we must have k = 1,
sort the numbers first. But if one begins thinking about the problem, it’s far and indeed the sirfgle element in S will be returned by the algorithm. Finally,
from clear why sorting is necessary for computing the median, or even why from the choice of which recursive call to make, it’s clear by induction that the
fl(n log n) time is necessary. In fact, we’ll show how a simple randomized right answer will be returned when ]SI > I as well. Thus we have the following
approach, based on divide-and-conquer, yields an expected running time of
O(n).
(13.17) Regardless of how the splitter is chosen, the algorithm above returns
the kth largest element of S.
~ Designing the Algorithm
A Generic Algorithm Based on Splitters The first key step toward getting
an expected linear running time is to move from median-finding to the more
general problem of selection. Given a set of n numbers S and a nun/bet k Choosing a Good Splitter Now let’s consider how the running time of Select
between 1 and n, consider the function Select(S, k) that returns the kth largest depends on the way we choose the splitter. Assuming we can select a splitter
element in S. As special cases, Select includes the problem of finding the in linear time, the rest of the algorithm takes linear time plus the time for the
median of S via Select(S, n/2) or Select(S, (n + 1)/2); it also includes the recursive call. But how is the running time of the recursive call affected by the
easier problems of finding the minimum (Select(S, 1)) and the maximum choice of the splitter? Essentially, it’s important that the splitter significantly
(Select(S, n)). Our goal is to design an algorithm that implements Select so reduce the size of the set being considered, so that we don’t keep making
that it runs in expected time O(n). passes through large sets of numbers many times. So a good choice of splitter
The basic structure of the algorithm implementing Select is as follows. should produce sets S- and S+ that are approximately equal in size.
We choose an element aie S, the "splitter," and form the sets S- = {a] : aj < ai} For example, if we could always choose the median as the splitter, then
and S+ = {aj : a~ > ai}. We can then determine which of S- or S+ contains the we could show a linear bound on the running time as follows. Let cn be the
kth largest element, and iterate only on this one. Without specifying yet how running time for Select, not counting the time for the recursive cal!. Then,
we plan to choose the splitter, here’s a more concrete description of how we with medians as splitters, the running time T(n) would be bounded by the
form the two sets and iterate. recurrence T(n) < T(n/2) + cn. This is a recurrence that we encountered at the
beginning of Chapter 5, where we showed that it has the solution T(n) = O(n).
Select (S, k) : Of course, hoping to be able to use the median as the splitter is rather
Choose a splitter aieS circular, since the median is what we want to compute in the first place! But,
For each element a] of S in fact, one can show that any "well-centered" element can serve as a good
Put aj in S- if aj<ai splitter: If we had a way to choose splitters ai such that there were at least
Put aj in S+ if aj>ai ~n elements both larger and smaller than ai, for any fixed constant e > 0,
End[or then the size of the sets in the recursive call would shrink by a factor of at
If Is-I = k -1 then least (1- ~) each time. Thus the running time T(n) would be bounded by
The splitter ai was in fact the desired answer the recurrence T(n) < T((1 - ~)n) + cn. The same argument that showed the
Else if lS-[>_k then previous recurrence had the solution T(n) = O(n) can be used here: If we
The kth largest element lies in S- unroll this recurrence for any ~ > 0, we get
Rect~rsively call Select(S-, k)
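Here is a minimal Python transcription of Select (an illustrative sketch assuming distinct elements, with the splitter chosen uniformly at random, which is the rule the text settles on below). Since S- holds the elements below the splitter, the parameter k here counts positions from the bottom of the sorted order.

   import random

   def select(S, k, rng=None):
       # Return the element of S with exactly k - 1 elements below it
       # (so k = 1 gives the minimum), assuming distinct elements.
       rng = rng or random.Random()
       splitter = rng.choice(S)
       S_minus = [a for a in S if a < splitter]
       S_plus = [a for a in S if a > splitter]
       if len(S_minus) == k - 1:
           return splitter                        # the splitter is the desired answer
       elif len(S_minus) >= k:
           return select(S_minus, k, rng)         # the answer lies in S-
       else:
           # |S-| = l < k - 1, so we need element k - 1 - l of S+.
           return select(S_plus, k - 1 - len(S_minus), rng)

   def median(S):
       return select(S, (len(S) + 1) // 2)

Because the splitter is random, the set handed to the recursive call is expected to shrink by a constant factor each time, which is exactly what the analysis below makes precise.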
Choosing a Good Splitter Now let's consider how the running time of Select depends on the way we choose the splitter. Assuming we can select a splitter in linear time, the rest of the algorithm takes linear time plus the time for the recursive call. But how is the running time of the recursive call affected by the choice of the splitter? Essentially, it's important that the splitter significantly reduce the size of the set being considered, so that we don't keep making passes through large sets of numbers many times. So a good choice of splitter should produce sets S- and S+ that are approximately equal in size.

For example, if we could always choose the median as the splitter, then we could show a linear bound on the running time as follows. Let cn be the running time for Select, not counting the time for the recursive call. Then, with medians as splitters, the running time T(n) would be bounded by the recurrence T(n) ≤ T(n/2) + cn. This is a recurrence that we encountered at the beginning of Chapter 5, where we showed that it has the solution T(n) = O(n).

Of course, hoping to be able to use the median as the splitter is rather circular, since the median is what we want to compute in the first place! But, in fact, one can show that any "well-centered" element can serve as a good splitter: If we had a way to choose splitters ai such that there were at least εn elements both larger and smaller than ai, for any fixed constant ε > 0, then the size of the sets in the recursive call would shrink by a factor of at least (1 - ε) each time. Thus the running time T(n) would be bounded by the recurrence T(n) ≤ T((1 - ε)n) + cn. The same argument that showed the previous recurrence had the solution T(n) = O(n) can be used here: If we unroll this recurrence for any ε > 0, we get

   T(n) ≤ cn + (1 - ε)cn + (1 - ε)²cn + ⋯ = [1 + (1 - ε) + (1 - ε)² + ⋯] · cn ≤ (1/ε) · cn,

since we have a convergent geometric series.

Indeed, the only thing to really beware of is a very "off-center" splitter. For example, if we always chose the minimum element as the splitter, then we may end up with a set in the recursive call that's only one element smaller than we had before. In this case, the running time T(n) would be bounded by the recurrence T(n) ≤ T(n - 1) + cn. Unrolling this recurrence, we see that there's a problem:

   T(n) ≤ cn + c(n - 1) + c(n - 2) + ⋯ = cn(n + 1)/2 = Θ(n²).

Random Splitters Choosing a "well-centered" splitter, in the sense we have just defined, is certainly similar in flavor to our original problem of choosing the median; but the situation is really not so bad, since any well-centered splitter will do.

Thus we will implement the as-yet-unspecified step of selecting a splitter using the following simple rule:

   Choose a splitter ai ∈ S uniformly at random

The intuition here is very natural: since a fairly large fraction of the elements are reasonably well-centered, we will be likely to end up with a good splitter simply by choosing an element at random.

The analysis of the running time with a random splitter is based on this idea; we expect the size of the set under consideration to go down by a fixed constant fraction every iteration, so we should get a convergent series and hence a linear bound as previously. We now show how to make this precise.

Analyzing the Algorithm

We'll say that the algorithm is in phase j when the size of the set under consideration is at most n(3/4)^j but greater than n(3/4)^{j+1}. Let's try to bound the expected time spent by the algorithm in phase j. In a given iteration of the algorithm, we say that an element of the set under consideration is central if at least a quarter of the elements are smaller than it and at least a quarter of the elements are larger than it.

Now observe that if a central element is chosen as a splitter, then at least a quarter of the set will be thrown away, the set will shrink by a factor of 3/4 or better, and the current phase will come to an end. Moreover, half of all the elements in the set are central, and so the probability that our random choice of splitter produces a central element is 1/2. Hence, by our simple waiting-time bound (13.7), the expected number of iterations before a central element is found is 2; and so the expected number of iterations spent in phase j, for any j, is at most 2.

This is pretty much all we need for the analysis. Let X be a random variable equal to the number of steps taken by the algorithm. We can write it as the sum X = X0 + X1 + X2 + ⋯, where Xj is the expected number of steps spent by the algorithm in phase j. When the algorithm is in phase j, the set has size at most n(3/4)^j, and so the number of steps required for one iteration in phase j is at most cn(3/4)^j for some constant c. We have just argued that the expected number of iterations spent in phase j is at most two, and hence we have E[Xj] ≤ 2cn(3/4)^j. Thus we can bound the total expected running time using linearity of expectation,

   E[X] = Σ_j E[Xj] ≤ Σ_j 2cn(3/4)^j ≤ 8cn,

since the sum Σ_j (3/4)^j is a geometric series that converges. Thus we have the following desired result.

(13.18) The expected running time of Select(n, k) is O(n).

A Second Application: Quicksort

The randomized divide-and-conquer technique we used to find the median is also the basis of the sorting algorithm Quicksort. As before, we choose a splitter for the input set S, and separate S into the elements below the splitter value and those above it. The difference is that, rather than looking for the median on just one side of the splitter, we sort both sides recursively and glue the two sorted pieces together (with the splitter in between) to produce the overall output. Also, we need to explicitly include a base case for the recursive code: we only use recursion on sets of size at least 4. A complete description of Quicksort is as follows.

   Quicksort(S):
      If |S| ≤ 3 then
         Sort S
         Output the sorted list
      Else
         Choose a splitter ai ∈ S uniformly at random
         For each element aj of S
            Put aj in S- if aj < ai
            Put aj in S+ if aj > ai
         Endfor
         Recursively call Quicksort(S-) and Quicksort(S+)
         Output the sorted set S-, then ai, then the sorted set S+
      Endif

As with median-finding, the worst-case running time of this method is not so good. If we always select the smallest element as a splitter, then the running time T(n) on n-element sets satisfies the same recurrence as before: T(n) ≤ T(n - 1) + cn, and so we end up with a time bound of T(n) = Θ(n²). In fact, this is the worst-case running time for Quicksort.

On the positive side, if the splitters selected happened to be the medians of the sets at each iteration, then we get the recurrence T(n) ≤ 2T(n/2) + cn, which arose frequently in the divide-and-conquer analyses of Chapter 5; the running time in this lucky case is O(n log n).

Here we are concerned with the expected running time; we will show that this can be bounded by O(n log n), almost as good as in the best case when the splitters are perfectly centered. Our analysis of Quicksort will closely follow the analysis of median-finding. Just as in the Select procedure that we used for median-finding, the crucial definition is that of a central splitter--one that divides the set so that each side contains at least a quarter of the elements. (As we discussed earlier, it is enough for the analysis that each side contains at least some fixed constant fraction of the elements; the use of a quarter here is chosen for convenience.) The idea is that a random choice is likely to lead to a central splitter, and central splitters work well. In the case of sorting, a central splitter divides the problem into two considerably smaller subproblems.

To simplify the presentation, we will slightly modify the algorithm so that it only issues its recursive calls when it finds a central splitter. Essentially, this modified algorithm differs from Quicksort in that it prefers to throw away an "off-center" splitter and try again; Quicksort, by contrast, launches the recursive calls even with an off-center splitter, and at least benefits from the work already done in splitting S. The point is that the expected running time of this modified algorithm can be analyzed very simply, by direct analogy with our analysis for median-finding. With a bit more work, a very similar but somewhat more involved analysis can also be done for the original Quicksort algorithm as well; however, we will not describe this analysis here.

   Modified Quicksort(S):
      If |S| ≤ 3 then
         Sort S
         Output the sorted list
      Else
         While no central splitter has been found
            Choose a splitter ai ∈ S uniformly at random
            For each element aj of S
               Put aj in S- if aj < ai
               Put aj in S+ if aj > ai
            Endfor
            If |S-| ≥ |S|/4 and |S+| ≥ |S|/4 then
               ai is a central splitter
            Endif
         Endwhile
         Recursively call Quicksort(S-) and Quicksort(S+)
         Output the sorted set S-, then ai, then the sorted set S+
      Endif

Consider a subproblem for some set S. Each iteration of the While loop selects a possible splitter ai and spends O(|S|) time splitting the set and deciding if ai is central. Earlier we argued that the expected number of iterations needed until we find a central splitter is at most 2. This gives us the following statement.

(13.19) The expected running time for the algorithm on a set S, excluding the time spent on recursive calls, is O(|S|).

The algorithm is called recursively on multiple subproblems. We will group these subproblems by size. We'll say that the subproblem is of type j if the size of the set under consideration is at most n(3/4)^j but greater than n(3/4)^{j+1}. By (13.19), the expected time spent on a subproblem of type j, excluding recursive calls, is O(n(3/4)^j). To bound the overall running time, we need to bound the number of subproblems for each type j. Splitting a type j subproblem via a central splitter creates two subproblems of higher type. So the subproblems of a given type j are disjoint. This gives us a bound on the number of subproblems.

(13.20) The number of type j subproblems created by the algorithm is at most (4/3)^{j+1}.

There are at most (4/3)^{j+1} subproblems of type j, and the expected time spent on each is O(n(3/4)^j) by (13.19). Thus, by linearity of expectation, the expected time spent on subproblems of type j is O(n). The number of different types is bounded by log_{4/3} n = O(log n), which gives the desired bound.

(13.21) The expected running time of Modified Quicksort is O(n log n).
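The following Python sketch mirrors Modified Quicksort as just described (illustrative code assuming distinct elements, not the book's). The while loop keeps drawing random splitters until a central one is found, and only then recurses; as argued above, at least half the elements are central, so the expected number of draws per subproblem is at most 2.

   import random

   def modified_quicksort(S, rng=None):
       rng = rng or random.Random()
       if len(S) <= 3:
           return sorted(S)
       while True:
           splitter = rng.choice(S)
           S_minus = [a for a in S if a < splitter]
           S_plus = [a for a in S if a > splitter]
           # A central splitter leaves at least a quarter of S on each side.
           if 4 * len(S_minus) >= len(S) and 4 * len(S_plus) >= len(S):
               break
       return modified_quicksort(S_minus, rng) + [splitter] + modified_quicksort(S_plus, rng)

   print(modified_quicksort([9, 1, 7, 3, 8, 2, 6, 4, 5]))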
We considered this modified version of Quicksort to simplify the analysis. Coming back to the original Quicksort, our intuition suggests that the expected running time is no worse than in the modified algorithm, as accepting the noncentral splitters helps a bit with sorting, even if it does not help as much as when a central splitter is chosen. As mentioned earlier, one can in fact make this intuition precise, leading to an O(n log n) expected time bound for the original Quicksort algorithm; we will not go into the details of this here.

13.6 Hashing: A Randomized Implementation of Dictionaries

Randomization has also proved to be a powerful technique in the design of data structures. Here we discuss perhaps the most fundamental use of randomization in this setting, a technique called hashing that can be used to maintain a dynamically changing set of elements. In the next section, we will show how an application of this technique yields a very simple algorithm for a problem that we saw in Chapter 5--the problem of finding the closest pair of points in the plane.

The Problem

One of the most basic applications of data structures is to simply maintain a set of elements that changes over time. For example, such applications could include a large company maintaining the set of its current employees and contractors, a news indexing service recording the first paragraphs of news articles it has seen coming across the newswire, or a search algorithm keeping track of the small part of an exponentially large search space that it has already explored.

In all these examples, there is a universe U of possible elements that is extremely large: the set of all possible people, all possible paragraphs (say, up to some character length limit), or all possible solutions to a computationally hard problem. The data structure is trying to keep track of a set S ⊆ U whose size is generally a negligible fraction of U, and the goal is to be able to insert and delete elements from S and quickly determine whether a given element belongs to S.

We call a data structure that accomplishes this a dictionary. More precisely, a dictionary is a data structure that supports the following operations.

o MakeDictionary. This operation initializes a fresh dictionary that can maintain a subset S of U; the dictionary starts out empty.

o Insert(u) adds element u ∈ U to the set S. In many applications, there may be some additional information that we want to associate with u (for example, u may be the name or ID number of an employee, and we want to also store some personal information about this employee), and we will simply imagine this being stored in the dictionary as part of a record together with u. (So, in general, when we talk about the element u, we really mean u and any additional information stored with u.)

o Delete(u) removes element u from the set S, if it is currently present.

o Lookup(u) determines whether u currently belongs to S; if it does, it also retrieves any additional information stored with u.

Many of the implementations we've discussed earlier in the book involve (most of) these operations: For example, in the implementation of the BFS and DFS graph traversal algorithms, we needed to maintain the set S of nodes already visited. But there is a fundamental difference between those problems and the present setting, and that is the size of U. The universe U in BFS or DFS is the set of nodes V, which is already given explicitly as part of the input. Thus it is completely feasible in those cases to maintain a set S ⊆ U as we did there: defining an array with |U| positions, one for each possible element, and setting the array position for u equal to 1 if u ∈ S, and equal to 0 if u ∉ S. This allows for insertion, deletion, and lookup of elements in constant time per operation, by simply accessing the desired array entry.

Here, by contrast, we are considering the setting in which the universe U is enormous. So we are not going to be able to use an array whose size is anywhere near that of U. The fundamental question is whether, in this case, we can still implement a dictionary to support the basic operations almost as quickly as when U was relatively small.

We now describe a randomized technique called hashing that addresses this question. While we will not be able to do quite as well as the case in which it is feasible to define an array over all of U, hashing will allow us to come quite close.

Designing the Data Structure

As a motivating example, let's think a bit more about the problem faced by an automated service that processes breaking news. Suppose you're receiving a steady stream of short articles from various wire services, weblog postings, and so forth, and you're storing the lead paragraph of each article (truncated to at most 1,000 characters). Because you're using many sources for the sake of full coverage, there's a lot of redundancy: the same article can show up many times.

When a new article shows up, you'd like to quickly check whether you've seen the lead paragraph before. So a dictionary is exactly what you want for this problem: The universe U is the set of all strings of length at most 1,000 (or of
length exactly 1,000, if we pad them out with blanks), and we're maintaining a set S ⊆ U consisting of strings (i.e., lead paragraphs) that we've seen before.

One solution would be to keep a linked list of all paragraphs, and scan this list each time a new one arrives. But a Lookup operation in this case takes time proportional to |S|. How can we get back to something that looks like an array-based solution?

Hash Functions The basic idea of hashing is to work with an array of size |S|, rather than one comparable to the (astronomical) size of U. Suppose we want to be able to store a set S of size up to n. We will set up an array H of size n to store the information, and use a function h : U → {0, 1, ..., n - 1} that maps elements of U to array positions. We call such a function h a hash function, and the array H a hash table. Now, if we want to add an element u to the set S, we simply place u in position h(u) of the array H. In the case of storing paragraphs of text, we can think of h(·) as computing some kind of numerical signature or "check-sum" of the paragraph u, and this tells us the array position at which to store u.

This would work extremely well if, for all distinct u and v in our set S, it happened to be the case that h(u) ≠ h(v). In such a case, we could look up u in constant time: when we check array position H[h(u)], it would either be empty or would contain just u.

In general, though, we cannot expect to be this lucky: there can be distinct elements u, v ∈ S for which h(u) = h(v). We will say that these two elements collide, since they are mapped to the same place in H. There are a number of ways to deal with collisions. Here we will assume that each position H[i] of the hash table stores a linked list of all elements u ∈ S with h(u) = i. The operation Lookup(u) would now work as follows.

o Compute the hash function h(u).

o Scan the linked list at position H[h(u)] to see if u is present in this list.

Hence the time required for Lookup(u) is proportional to the time to compute h(u), plus the length of the linked list at H[h(u)]. And this latter quantity, in turn, is just the number of elements in S that collide with u. The Insert and Delete operations work similarly: Insert adds u to the linked list at position H[h(u)], and Delete scans this list and removes u if it is present.

So now the goal is clear: We'd like to find a hash function that "spreads out" the elements being added, so that no one entry of the hash table H contains too many elements. This is not a problem for which worst-case analysis is very informative. Indeed, suppose that |U| ≥ n² (we're imagining applications where it's much larger than this). Then, for any hash function h that we choose, there will be some set S of n elements that all map to the same position. In the worst case, we will insert all the elements of this set, and then our Lookup operations will consist of scanning a linked list of length n.

Our main goal here is to show that randomization can help significantly for this problem. As usual, we won't make any assumptions about the set of elements S being random; we will simply exploit randomization in the design of the hash function. In doing this, we won't be able to completely avoid collisions, but can make them relatively rare enough, and so the lists will be quite short.

Choosing a Good Hash Function We've seen that the efficiency of the dictionary is based on the choice of the hash function h. Typically, we will think of U as a large set of numbers, and then use an easily computable function h that maps each number u ∈ U to some value in the smaller range of integers {0, 1, ..., n - 1}. There are many simple ways to do this: we could use the first or last few digits of u, or simply take u modulo n. While these simple choices may work well in many situations, it is also possible to get large numbers of collisions. Indeed, a fixed choice of hash function may run into problems because of the types of elements u encountered in the application: Maybe the particular digits we use to define the hash function encode some property of u, and hence maybe only a few options are possible. Taking u modulo n can have the same problem, especially if n is a power of 2. To take a concrete example, suppose we used a hash function that took an English paragraph, used a standard character encoding scheme like ASCII to map it to a sequence of bits, and then kept only the first few bits in this sequence. We'd expect a huge number of collisions at the array entries corresponding to the bit strings that encoded common English words like The, while vast portions of the array can be occupied only by paragraphs that begin with strings like qxf, and hence will be empty.

A slightly better choice in practice is to take (u mod p) for a prime number p that is approximately equal to n. While in some applications this may yield a good hashing function, it may not work well in all applications, and some primes may work much better than others (for example, primes very close to powers of 2 may not work so well).

Since hashing has been widely used in practice for a long time, there is a lot of experience with what makes for a good hash function, and many hash functions have been proposed that tend to work well empirically. Here we would like to develop a hashing scheme where we can prove that it results in efficient dictionary operations with high probability.
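To fix the mechanics just described in code, here is a small chaining-based dictionary sketch in Python (illustrative; the class and method names are ours, and the hash function is passed in as a parameter, anticipating the randomized choices discussed next).

   class HashDictionary:
       # A hash table with chaining: position i stores a list of the
       # elements u currently in S with h(u) = i.
       def __init__(self, n, h):
           self.h = h                        # hash function mapping U -> {0, ..., n-1}
           self.table = [[] for _ in range(n)]

       def insert(self, u):
           bucket = self.table[self.h(u)]
           if u not in bucket:
               bucket.append(u)

       def delete(self, u):
           bucket = self.table[self.h(u)]
           if u in bucket:
               bucket.remove(u)

       def lookup(self, u):
           # Cost: computing h(u) plus the length of the list at position h(u).
           return u in self.table[self.h(u)]

   # Example with the simple rule "u mod n" discussed above, here with n a prime.
   n = 97
   d = HashDictionary(n, lambda u: u % n)
   d.insert(12345)
   print(d.lookup(12345), d.lookup(54321))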
random in the set {0, 1, ..., n − 1}, independently of all previous choices. In this case, the probability that two randomly selected values h(u) and h(v) are equal (and hence cause a collision) is quite small.

(13.22) With this uniform random hashing scheme, the probability that two randomly selected values h(u) and h(v) collide--that is, that h(u) = h(v)--is exactly 1/n.

Proof. Of the n² possible choices for the pair of values (h(u), h(v)), all are equally likely, and exactly n of these choices result in a collision. ∎

However, it will not work to use a hash function with independently random chosen values. To see why, suppose we inserted u into S, and then later want to perform either Delete(u) or Lookup(u). We immediately run into the "Where did I put it?" problem: We will need to know the random value h(u) that we used, so we will need to have stored the value h(u) in some form where we can quickly look it up. But this is exactly the same problem we were trying to solve in the first place.

There are two things that we can learn from (13.22). First, it provides a concrete basis for the intuition from practice that hash functions that spread things around in a "random" way can be effective at reducing collisions. Second, and more crucial for our goals here, we will be able to show how a more controlled use of randomization achieves performance as good as suggested in (13.22), but in a way that leads to an efficient dictionary implementation.

Universal Classes of Hash Functions  The key idea is to choose a hash function at random not from the collection of all possible functions into [0, n − 1], but from a carefully selected class of functions. Each function h in our class of functions ℋ will map the universe U into the set {0, 1, ..., n − 1}, and we will design it so that it has two properties. First, we'd like it to come with the guarantee from (13.22):

o For any pair of elements u, v ∈ U, the probability that a randomly chosen h ∈ ℋ satisfies h(u) = h(v) is at most 1/n.

We say that a class ℋ of functions is universal if it satisfies this first property. Thus (13.22) can be viewed as saying that the class of all possible functions from U into {0, 1, ..., n − 1} is universal.

However, we also need ℋ to satisfy a second property. We will state this slightly informally for now and make it more precise later.

o Each h ∈ ℋ can be compactly represented and, for a given h ∈ ℋ and u ∈ U, we can compute the value h(u) efficiently.

The class of all possible functions failed to have this property: Essentially, the only way to represent an arbitrary function from U into {0, 1, ..., n − 1} is to write down the value it takes on every single element of U.

In the remainder of this section, we will show the surprising fact that there exist classes ℋ that satisfy both of these properties. Before we do this, we first make precise the basic property we need from a universal class of hash functions. We argue that if a function h is selected at random from a universal class of hash functions, then for any set S ⊆ U of size at most n, and any u ∈ U, the expected number of items in S that collide with u is a constant.

(13.23) Let ℋ be a universal class of hash functions mapping a universe U to the set {0, 1, ..., n − 1}, let S be an arbitrary subset of U of size at most n, and let u be any element in U. We define X to be a random variable equal to the number of elements s ∈ S for which h(s) = h(u), for a random choice of hash function h ∈ ℋ. (Here S and u are fixed, and the randomness is in the choice of h ∈ ℋ.) Then E[X] ≤ 1.

Proof. For an element s ∈ S, we define a random variable Xs that is equal to 1 if h(s) = h(u), and equal to 0 otherwise. We have E[Xs] = Pr[Xs = 1] ≤ 1/n, since the class of functions is universal.

Now X = Σ_{s ∈ S} Xs, and so, by linearity of expectation, we have

E[X] = Σ_{s ∈ S} E[Xs] ≤ |S| · (1/n) ≤ 1. ∎

Designing a Universal Class of Hash Functions  Next we will design a universal class of hash functions. We will use a prime number p ≈ n as the size of the hash table H. To be able to use integer arithmetic in designing our hash functions, we will identify the universe with vectors of the form x = (x1, x2, ..., xr) for some integer r, where 0 ≤ xi < p for each i. For example, we can first identify U with integers in the range [0, N − 1] for some N, and then use consecutive blocks of ⌊log p⌋ bits of u to define the corresponding coordinates xi. If U ⊆ [0, N − 1], then we will need a number of coordinates r ≈ log N / log n.

Let A be the set of all vectors of the form a = (a1, ..., ar), where ai is an integer in the range [0, p − 1] for each i = 1, ..., r. For each a ∈ A, we define the linear function

    ha(x) = (Σ_i ai xi) mod p.
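To make the construction concrete, here is a short Python sketch of a chained-hashing dictionary built on this linear family. This is our own illustration, not code from the text; the base-p digit encoding of integer keys and the trial-division prime search are simplifications chosen only to keep the sketch self-contained.

    import random

    def next_prime(m):
        # Smallest prime >= m, found by trial division (adequate for table-sized numbers).
        def is_prime(q):
            if q < 2:
                return False
            d = 2
            while d * d <= q:
                if q % d == 0:
                    return False
                d += 1
            return True
        while not is_prime(m):
            m += 1
        return m

    class UniversalHashDictionary:
        # Chained hash table over the family h_a(x) = (sum_i a_i * x_i) mod p.
        def __init__(self, n, key_bits=64):
            self.p = next_prime(max(2, n))                  # prime table size p >= n
            digit_bits = max(1, self.p.bit_length() - 1)    # bits captured per base-p digit
            self.r = -(-key_bits // digit_bits)             # number of coordinates r
            self.a = [random.randrange(self.p) for _ in range(self.r)]   # random a in A
            self.table = [[] for _ in range(self.p)]

        def _hash(self, u):
            # Split the integer key u into base-p digits x_1, ..., x_r, then apply h_a.
            total = 0
            for ai in self.a:
                total += ai * (u % self.p)
                u //= self.p
            return total % self.p

        def insert(self, u):
            bucket = self.table[self._hash(u)]
            if u not in bucket:
                bucket.append(u)

        def lookup(self, u):
            return u in self.table[self._hash(u)]

        def delete(self, u):
            bucket = self.table[self._hash(u)]
            if u in bucket:
                bucket.remove(u)

    d = UniversalHashDictionary(n=100)
    d.insert(12345)
    print(d.lookup(12345), d.lookup(54321))   # True False

Choosing a fresh random vector a is exactly the MakeDictionary step discussed next; the table and the vector a together are the compact representation of the chosen hash function.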
This now completes our random implementation of dictionaries. We define the family of hash functions to be ℋ = {ha : a ∈ A}. To execute MakeDictionary, we choose a random hash function from ℋ; in other words, we choose a random vector from A (by choosing each coordinate uniformly at random), and form the function ha. Note that in order to define A, we need to find a prime number p ≥ n. There are methods for generating prime numbers quickly, which we will not go into here. (In practice, this can also be accomplished using a table of known prime numbers, even for relatively large n.) We then use this as the hash function with which to implement Insert, Delete, and Lookup. The family ℋ = {ha : a ∈ A} satisfies a formal version of the second property we were seeking: It has a compact representation, since by simply choosing and remembering a random a ∈ A, we can compute ha(u) for all elements u ∈ U. Thus, to show that ℋ leads to an efficient, hashing-based implementation of dictionaries, we just need to establish that ℋ is a universal family of hash functions.

Analyzing the Data Structure  If we are using a hash function ha from the class ℋ that we've defined, then a collision ha(x) = ha(y) defines a linear equation modulo the prime number p. In order to analyze such equations, it's useful to have the following "cancellation law."

(13.24) For any prime p and any integer z ≢ 0 mod p, and any two integers α, β, if αz = βz mod p, then α = β mod p.

Proof. Suppose αz = βz mod p. Then, by rearranging terms, we get z(α − β) = 0 mod p, and hence z(α − β) is divisible by p. But z ≢ 0 mod p, so z is not divisible by p. Since p is prime, it follows that α − β must be divisible by p; that is, α = β mod p as claimed. ∎

We now use this to prove the main result in our analysis.

(13.25) The class of linear functions ℋ defined above is universal.

Proof. Let x = (x1, x2, ..., xr) and y = (y1, y2, ..., yr) be two distinct elements of U. We need to show that the probability of ha(x) = ha(y), for a randomly chosen a ∈ A, is at most 1/p.

Since x ≠ y, there must be an index j such that xj ≠ yj. We now consider the following way of choosing the random vector a ∈ A. We first choose all the coordinates ai where i ≠ j. Then, finally, we choose coordinate aj. We will show that regardless of how all the other coordinates ai were chosen, the probability of ha(x) = ha(y), taken over the final choice of aj, is exactly 1/p. It will follow that the probability of ha(x) = ha(y) over the random choice of the full vector a must be 1/p as well.

This conclusion is intuitively clear: If the probability is 1/p regardless of how we choose all other ai, then it is 1/p overall. There is also a direct proof of this using conditional probabilities. Let ℰ be the event that ha(x) = ha(y), and let ℱb be the event that all coordinates ai (for i ≠ j) receive a sequence of values b. We will show, below, that Pr[ℰ | ℱb] = 1/p for all b. It then follows that Pr[ℰ] = Σ_b Pr[ℰ | ℱb] · Pr[ℱb] = (1/p) Σ_b Pr[ℱb] = 1/p.

So, to conclude the proof, we assume that values have been chosen arbitrarily for all other coordinates ai, and we consider the probability of selecting aj so that ha(x) = ha(y). By rearranging terms, we see that ha(x) = ha(y) if and only if

    aj(yj − xj) = Σ_{i ≠ j} ai(xi − yi) mod p.

Since the choices for all ai (i ≠ j) have been fixed, we can view the right-hand side as some fixed quantity m. Also, let us define z = yj − xj.

Now it is enough to show that there is exactly one value 0 ≤ aj < p that satisfies aj z = m mod p; indeed, if this is the case, then there is a probability of exactly 1/p of choosing this value for aj. So suppose there were two such values, aj and aj′. Then we would have aj z = aj′ z mod p, and so by (13.24) we would have aj = aj′ mod p. But we assumed that aj, aj′ < p, and so in fact aj and aj′ would be the same. It follows that there is only one aj in this range that satisfies aj z = m mod p.

Tracing back through the implications, this means that the probability of choosing aj so that ha(x) = ha(y) is 1/p, however we set the other coordinates ai in a; thus the probability that x and y collide is 1/p. Thus we have shown that ℋ is a universal class of hash functions. ∎

13.7 Finding the Closest Pair of Points: A Randomized Approach

In Chapter 5, we used the divide-and-conquer technique to develop an O(n log n) time algorithm for the problem of finding the closest pair of points in the plane. Here we will show how to use randomization to develop a different algorithm for this problem, using an underlying dictionary data structure. We will show that this algorithm runs in O(n) expected time, plus O(n) expected dictionary operations.

There are several related reasons why it is useful to express the running time of our algorithm in this way, accounting for the dictionary operations
separately. We have seen in Section 13.6 that dictionaries have a very efficient implementation using hashing, so abstracting out the dictionary operations allows us to treat the hashing as a "black box" and have the algorithm inherit an overall running time from whatever performance guarantee is satisfied by this hashing procedure. A concrete payoff of this is the following. It has been shown that with the right choice of hashing procedure (more powerful, and more complicated, than what we described in Section 13.6), one can make the underlying dictionary operations run in linear expected time as well, yielding an overall expected running time of O(n). Thus the randomized approach we describe here leads to an improvement over the running time of the divide-and-conquer algorithm that we saw earlier. We will talk about the ideas that lead to this O(n) bound at the end of the section.

It is worth remarking at the outset that randomization shows up for two independent reasons in this algorithm: the way in which the algorithm processes the input points will have a random component, regardless of how the dictionary data structure is implemented; and when the dictionary is implemented using hashing, this introduces an additional source of randomness as part of the hash-table operations. Expressing the running time via the number of dictionary operations allows us to cleanly separate the two uses of randomness.

The Problem

Let us start by recalling the problem's (very simple) statement. We are given n points in the plane, and we wish to find the pair that is closest together. As discussed in Chapter 5, this is one of the most basic geometric proximity problems, a topic with a wide range of applications.

We will use the same notation as in our earlier discussion of the closest-pair problem. We will denote the set of points by P = {p1, ..., pn}, where pi has coordinates (xi, yi); and for two points pi, pj ∈ P, we use d(pi, pj) to denote the standard Euclidean distance between them. Our goal is to find the pair of points pi, pj that minimizes d(pi, pj).

To simplify the discussion, we will assume that the points are all in the unit square: 0 ≤ xi, yi ≤ 1 for all i = 1, ..., n. This is no loss of generality: in linear time, we can rescale all the x- and y-coordinates of the points so that they lie in a unit square, and then we can translate them so that this unit square has its lower left corner at the origin.

Designing the Algorithm

The basic idea of the algorithm is very simple. We'll consider the points in random order, and maintain a current value δ for the closest pair as we process the points in this order. When we get to a new point p, we look "in the vicinity" of p to see if any of the previously considered points are at a distance less than δ from p. If not, then the closest pair hasn't changed, and we move on to the next point in the random order. If there is a point within a distance less than δ from p, then the closest pair has changed, and we will need to update it.

The challenge in turning this into an efficient algorithm is to figure out how to implement the task of looking for points in the vicinity of p. It is here that the dictionary data structure will come into play.

We now begin making this more concrete. Let us assume for simplicity that the points in our random order are labeled p1, ..., pn. The algorithm proceeds in stages; during each stage, the closest pair remains constant. The first stage starts by setting δ = d(p1, p2), the distance of the first two points. The goal of a stage is to either verify that δ is indeed the distance of the closest pair of points, or to find a pair of points pi, pj with d(pi, pj) < δ. During a stage, we'll gradually add points in the order p1, p2, ..., pn. The stage terminates when we reach a point pi so that for some j < i, we have d(pi, pj) < δ. We then let δ for the next stage be the closest distance found so far: δ = min_{j : j < i} d(pi, pj).

The number of stages used will depend on the random order. If we get lucky, and p1, p2 are the closest pair of points, then a single stage will do. It is also possible to have as many as n − 2 stages, if adding a new point always decreases the minimum distance. We'll show that the expected running time of the algorithm is within a constant factor of the time needed in the first, lucky case, when the original value of δ is the smallest distance.

Testing a Proposed Distance  The main subroutine of the algorithm is a method to test whether the current pair of points with distance δ remains the closest pair when a new point is added and, if not, to find the new closest pair.

The idea of the verification is to subdivide the unit square (the area where the points lie) into subsquares whose sides have length δ/2, as shown in Figure 13.2. Formally, there will be N² subsquares, where N = ⌈2/δ⌉: for 0 ≤ s ≤ N − 1 and 0 ≤ t ≤ N − 1, we define the subsquare Sst as

    Sst = {(x, y) : sδ/2 ≤ x < (s + 1)δ/2; tδ/2 ≤ y < (t + 1)δ/2}.

We claim that this collection of subsquares has two nice properties for our purposes. First, any two points that lie in the same subsquare have distance less than δ. Second, and a partial converse to this, any two points that are less than δ away from each other must fall in either the same subsquare or in very close subsquares.

(13.26) If two points p and q belong to the same subsquare Sst, then d(p, q) < δ.
Now, when the next point p is considered, we determine which of the subsquares Sst it belongs to. If p is going to cause the minimum distance to change, there must be some earlier point p′ ∈ P′ at distance less than δ from it; and hence, by (13.27), the point p′ must be in one of the 25 squares around the square Sst containing p. So we will simply check each of these 25 squares one by one to see if it contains a point in P′; for each point in P′ that we find this way, we compute its distance to p. By (13.26), each of these subsquares contains at most one point of P′, so this is at most a constant number of distance computations. (Note that we used a similar idea, via (5.10), at a crucial point in the divide-and-conquer algorithm for this problem in Chapter 5.)

[Figure: the grid of side-δ/2 subsquares around p; if p is involved in the closest pair, then the other point lies in a close subsquare.]
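The following Python sketch is our own illustration of this scheme, not code from the text: a plain Python dict stands in for the hashed dictionary of subsquares, the points are assumed to be distinct and to lie in the unit square, and the grid is rebuilt whenever the minimum distance changes.

    import math
    import random

    def closest_pair(points):
        # Randomized closest-pair sketch: 'points' are distinct (x, y) pairs in the unit square.
        pts = list(points)
        random.shuffle(pts)                       # process the points in a random order
        delta = math.dist(pts[0], pts[1])         # current closest distance
        best = (pts[0], pts[1])

        def cell(p, d):
            # Index (s, t) of the side-d/2 subsquare containing p.
            return (int(p[0] // (d / 2)), int(p[1] // (d / 2)))

        def rebuild(prefix, d):
            # One dictionary entry per occupied subsquare (at most one point each).
            return {cell(q, d): q for q in prefix}

        grid = rebuild(pts[:2], delta)
        for i in range(2, len(pts)):
            p = pts[i]
            s, t = cell(p, delta)
            closest = None
            # Check the 5 x 5 block of subsquares around (s, t) for a closer point.
            for ds in range(-2, 3):
                for dt in range(-2, 3):
                    q = grid.get((s + ds, t + dt))
                    if q is not None and math.dist(p, q) < delta:
                        if closest is None or math.dist(p, q) < math.dist(p, closest):
                            closest = q
            if closest is None:
                grid[(s, t)] = p                    # delta unchanged; just insert p
            else:
                delta = math.dist(p, closest)       # the closest pair has changed
                best = (p, closest)
                grid = rebuild(pts[:i + 1], delta)  # start a new stage with a finer grid
        return best, delta

Each new point triggers at most 25 dictionary lookups and a constant number of distance computations, exactly as in the discussion above; all of the real cost is in the rebuild step, which is what the analysis below accounts for.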
Analyzing the Algorithm

There are already some things we can say about the overall running time of the algorithm. To consider a new point pi, we need to perform only a constant number of Lookup operations and a constant number of distance computations. Moreover, even if we had to update the closest pair in every iteration, we'd only do n MakeDictionary operations.

The missing ingredient is the total expected cost, over the course of the algorithm's execution, due to reinsertions into new dictionaries when the closest pair is updated. We will consider this next. For now, we can at least summarize the current state of our knowledge as follows.

(13.28) The algorithm correctly maintains the closest pair at all times, and it performs at most O(n) distance computations, O(n) Lookup operations, and O(n) MakeDictionary operations.

We now conclude the analysis by bounding the expected number of Insert operations. Trying to find a good bound on the total expected number of Insert operations seems a bit problematic at first: An update to the closest pair forces all the points considered so far to be reinserted into a fresh dictionary, so a single update can be expensive. The saving fact is that such updates become unlikely once many points have been processed.

(13.30) The probability that the ith point in the random order causes the minimum distance to change is at most 2/i.

Proof. Consider the first i points p1, p2, ..., pi in the random order. Assume that the minimum distance among these points is achieved by p and q. Now the point pi can only cause the minimum distance to decrease if pi = p or pi = q. Since the first i points are in a random order, any of them is equally likely to be last, so the probability that p or q is last is 2/i. ∎

Note that 2/i is only an upper bound in (13.30) because there could be multiple pairs among the first i points that define the same smallest distance.

By (13.29) and (13.30), we can bound the total number of Insert operations as

    E[X] = n + Σ_i E[Xi] ≤ n + 2n = 3n.

(Here X is the total number of Insert operations, and Xi is the number of reinsertions performed because the ith point changed the minimum distance; if the ith point causes a change we reinsert i points, so by (13.30), E[Xi] ≤ i · (2/i) = 2.)

Combining this with (13.28), we obtain the following bound on the running time of the algorithm.

(13.31) In expectation, the randomized closest-pair algorithm requires O(n) time plus O(n) dictionary operations.
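As a quick sanity check of the sketch given earlier in this section (again our own illustrative code, not part of the text), one can compare its answer against the brute-force O(n²) computation on a random input:

    import math, random

    random.seed(1)
    pts = [(random.random(), random.random()) for _ in range(200)]
    pair, delta = closest_pair(pts)          # the sketch defined above
    brute = min(math.dist(p, q) for i, p in enumerate(pts) for q in pts[:i])
    print(delta == brute, pair)              # expected: True, plus the closest pair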
Achieving Linear Expected Running Time

Up to this point, we have treated the dictionary data structure as a black box, and in (13.31) we bounded the running time of the algorithm in terms of computational time plus dictionary operations. We now want to give a bound on the actual expected running time, and so we need to analyze the work involved in performing these dictionary operations.

To implement the dictionary, we'll use a universal hashing scheme, like the one discussed in Section 13.6. Once the algorithm employs a hashing scheme, it is making use of randomness in two distinct ways: First, we randomly order the points to be added; and second, for each new minimum distance δ, we apply randomization to set up a new hash table using a universal hashing scheme.

When inserting a new point pi, the algorithm uses the hash-table Lookup operation to find all nodes in the 25 subsquares close to pi. However, if the hash table has collisions, then these 25 Lookup operations can involve inspecting many more than 25 nodes. Statement (13.23) from Section 13.6 shows that each such Lookup operation involves considering O(1) previously inserted points, in expectation. It seems intuitively clear that performing O(n) hash-table operations in expectation, each of which involves considering O(1) elements in expectation, will result in an expected running time of O(n) overall. To make this intuition precise, we need to be careful with how these two sources of randomness interact.

(13.32) Assume we implement the randomized closest-pair algorithm using a universal hashing scheme. In expectation, the total number of points considered during the Lookup operations is bounded by O(n).

Proof. From (13.31) we know that the expected number of Lookup operations is O(n), and from (13.23) we know that each of these Lookup operations involves considering only O(1) points in expectation. In order to conclude that this implies the expected number of points considered is O(n), we now consider the relationship between these two sources of randomness.

Let X be a random variable denoting the number of Lookup operations performed by the algorithm. Now the random order σ that the algorithm chooses for the points completely determines the sequence of minimum-distance values the algorithm will consider and the sequence of dictionary operations it will perform. As a result, the choice of σ determines the value of X; we let X(σ) denote this value, and we let ℱσ denote the event that the algorithm chooses the random order σ. Note that the conditional expectation E[X | ℱσ] is equal to X(σ). Also, by (13.31), we know that E[X] ≤ c0 n, for some constant c0.

Now consider this sequence of Lookup operations for a fixed order σ. For i = 1, ..., X(σ), let Yi be the number of points that need to be inspected during the ith Lookup operation--namely, the number of previously inserted points that collide with the dictionary entry involved in this Lookup operation. We would like to bound the expected value of Σ_{i=1}^{X(σ)} Yi, where the expectation is over both the random choice of σ and the random choice of hash function.

By (13.23), we know that E[Yi | ℱσ] = O(1) for all σ and all values of i. It is useful to be able to refer to the constant in the expression O(1) here, so we will say that E[Yi | ℱσ] ≤ c1 for all σ and all values of i. Summing over all i, and using linearity of expectation, we get E[Σ_i Yi | ℱσ] ≤ c1 X(σ). Now we have

    E[Σ_i Yi] = Σ_σ E[Σ_i Yi | ℱσ] · Pr[ℱσ] ≤ Σ_σ c1 X(σ) · Pr[ℱσ] = c1 Σ_σ E[X | ℱσ] · Pr[ℱσ] = c1 E[X].

Since we know that E[X] is at most c0 n, the total expected number of points considered is at most c0 c1 n = O(n), which proves the claim. ∎

Armed with this claim, we can use the universal hash functions from Section 13.6 in our closest-pair algorithm. In expectation, the algorithm will consider O(n) points during the Lookup operations. We have to set up multiple hash tables--a new one each time the minimum distance changes--and we have to compute O(n) hash-function values. All hash tables are set up for the same size, a prime p > n. We can select one prime and use it throughout the algorithm. Using this, we get the following bound on the running time.

(13.33) In expectation, the algorithm uses O(n) hash-function computations and O(n) additional time for finding the closest pair of points.

Note the distinction between this statement and (13.31). There we counted each dictionary operation as a single, atomic step; here, on the other hand, we've conceptually opened up the dictionary operations so as to account for the time incurred due to hash-table collisions and hash-function computations.

Finally, consider the time needed for the O(n) hash-function computations. How fast is it to compute the value of a universal hash function h? The class of universal hash functions developed in Section 13.6 breaks numbers in our universe U into r ≈ log N/log n smaller numbers of size O(log n) each, and
then uses O(r) arithmetic operations on these smaller numbers to compute the hash-function value. So computing the hash value of a single point involves O(log N/log n) multiplications, on numbers of size log n. This is a total of O(n log N/log n) arithmetic operations over the course of the algorithm, more than the O(n) we were hoping for.

In fact, it is possible to decrease the number of arithmetic operations to O(n) by using a more sophisticated class of hash functions. There are other classes of universal hash functions where computing the hash-function value can be done by only O(1) arithmetic operations (though these operations will have to be done on larger numbers, integers of size roughly log N). This class of improved hash functions also comes with one extra difficulty for this application: the hashing scheme needs a prime that is bigger than the size of the universe (rather than just the size of the set of points). Now the universe in this application grows inversely with the minimum distance δ, and so, in particular, it increases every time we discover a new, smaller minimum distance. At such points, we will have to find a new prime and set up a new hash table. Although we will not go into the details of this here, it is possible to deal with these difficulties and make the algorithm achieve an expected running time of O(n).

13.8 Randomized Caching

We now discuss the use of randomization for the caching problem, which we first encountered in Chapter 4. We begin by developing a class of algorithms, the marking algorithms, that include both deterministic and randomized approaches. After deriving a general performance guarantee that applies to all marking algorithms, we show how a stronger guarantee can be obtained for a particular marking algorithm that exploits randomization.

The Problem

We begin by recalling the Cache Maintenance Problem from Chapter 4. In the most basic setup, we consider a processor whose full memory has n addresses; it is also equipped with a cache containing k slots of memory that can be accessed very quickly. We can keep copies of k items from the full memory in the cache slots, and when a memory location is accessed, the processor will first check the cache to see if it can be quickly retrieved. We say the request is a cache hit if the cache contains the requested item; in this case, the access is very quick. We say the request is a cache miss if the requested item is not in the cache; in this case, the access takes much longer, and moreover, one of the items currently in the cache must be evicted to make room for the new item. (We will assume that the cache is kept full at all times.)

The goal of a Cache Maintenance Algorithm is to minimize the number of cache misses, which are the truly expensive part of the process. The sequence of memory references is not under the control of the algorithm--this is simply dictated by the application that is running--and so the job of the algorithms we consider is simply to decide on an eviction policy: Which item currently in the cache should be evicted on each cache miss?

In Chapter 4, we saw a greedy algorithm that is optimal for the problem: Always evict the item that will be needed the farthest in the future. While this algorithm is useful to have as an absolute benchmark on caching performance, it clearly cannot be implemented under real operating conditions, since we don't know ahead of time when each item will be needed next. Rather, we need to think about eviction policies that operate online, using only information about past requests without knowledge of the future.

The eviction policy that is typically used in practice is to evict the item that was used the least recently (i.e., whose most recent access was the longest ago in the past); this is referred to as the Least-Recently-Used, or LRU, policy. The empirical justification for LRU is that algorithms tend to have a certain locality in accessing data, generally using the same set of data frequently for a while. If a data item has not been accessed for a long time, this is a sign that it may not be accessed again for a long time.

Here we will evaluate the performance of different eviction policies without making any assumptions (such as locality) on the sequence of requests. To do this, we will compare the number of misses made by an eviction policy on a sequence σ with the minimum number of misses it is possible to make on σ. We will use f(σ) to denote this latter quantity; it is the number of misses achieved by the optimal Farthest-in-Future policy. Comparing eviction policies to the optimum is very much in the spirit of providing performance guarantees for approximation algorithms, as we did in Chapter 11. Note, however, the following interesting difference: the reason the optimum was not attainable in our approximation analyses from that chapter (assuming P ≠ NP) is that the algorithms were constrained to run in polynomial time; here, on the other hand, the eviction policies are constrained in their pursuit of the optimum by the fact that they do not know the requests that are coming in the future.

For eviction policies operating under this online constraint, it initially seems hopeless to say something interesting about their performance: Why couldn't we just design a request sequence that completely confounds any online eviction policy? The surprising point here is that it is in fact possible to give absolute guarantees on the performance of various online policies relative to the optimum.
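To make the two policies concrete, here is a small Python sketch (ours, purely illustrative) that counts the misses of LRU and of the Farthest-in-Future benchmark on the same request sequence. Items are assumed to be hashable values, and the cache starts empty rather than full, a harmless simplification for counting misses.

    def lru_misses(requests, k):
        # Least-Recently-Used: the list keeps the most recently used item last.
        cache, misses = [], 0
        for item in requests:
            if item in cache:
                cache.remove(item)          # refresh its recency
            else:
                misses += 1
                if len(cache) == k:
                    cache.pop(0)            # evict the least recently used item
            cache.append(item)
        return misses

    def farthest_in_future_misses(requests, k):
        # The optimal offline benchmark from Chapter 4; it needs the whole sequence.
        cache, misses = set(), 0
        for i, item in enumerate(requests):
            if item in cache:
                continue
            misses += 1
            if len(cache) == k:
                def next_use(x):
                    for j in range(i + 1, len(requests)):
                        if requests[j] == x:
                            return j
                    return float('inf')     # never needed again
                cache.remove(max(cache, key=next_use))
            cache.add(item)
        return misses

    # A round-robin request sequence on k + 1 items, which turns out to be bad for LRU:
    k = 4
    seq = [i % (k + 1) for i in range(60)]
    print(lru_misses(seq, k), farthest_in_future_misses(seq, k))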
We first show that the number of misses incurred by LRU, on any request sequence, can be bounded by roughly k times the optimum. We then use randomization to develop a variation on LRU that has an exponentially stronger bound on its performance: Its number of misses is never more than O(log k) times the optimum.

Designing the Class of Marking Algorithms

The bounds for both LRU and its randomized variant will follow from a general template for designing online eviction policies--a class of policies called marking algorithms. They are motivated by the following intuition. To do well against the benchmark of f(σ), we need an eviction policy that is sensitive to the difference between the following two possibilities: (a) in the recent past, the request sequence has contained more than k distinct items; or (b) in the recent past, the request sequence has come exclusively from a set of at most k items. In the first case, we know that f(σ) must be increasing, since no algorithm can handle more than k distinct items without incurring a cache miss. But, in the second case, it's possible that σ is passing through a long stretch in which an optimal algorithm need not incur any misses at all. It is here that our policy must make sure that it incurs very few misses.

Guided by these considerations, we now describe the basic outline of a marking algorithm, which prefers evicting items that don't seem to have been used in a long time. Such an algorithm operates in phases; the description of one phase is as follows.

    Each memory item can be either marked or unmarked
    At the beginning of the phase, all items are unmarked
    On a request to item s:
        Mark s
        If s is in the cache, then evict nothing
        Else s is not in the cache:
            If all items currently in the cache are marked then
                Declare the phase over
                Processing of s is deferred to start of next phase
            Else evict an unmarked item from the cache
            Endif
        Endif

Note that this describes a class of algorithms, rather than a single specific algorithm, because the key step--evict an unmarked item from the cache--does not specify which unmarked item should be selected. We will see that eviction policies with different properties and performance guarantees arise depending on how we resolve this ambiguity.

We first observe that, since a phase starts with all items unmarked, and items become marked only when accessed, the unmarked items have all been accessed less recently than the marked items. This is the sense in which a marking algorithm is trying to evict items that have not been requested recently. Also, at any point in a phase, if there are any unmarked items in the cache, then the least recently used item must be unmarked. It follows that the LRU policy evicts an unmarked item whenever one is available, and so we have the following fact.

(13.34) The LRU policy is a marking algorithm.

Analyzing Marking Algorithms

We now describe a method for analyzing marking algorithms, ending with a bound on performance that applies to all marking algorithms. After this, when we add randomization, we will need to strengthen this analysis.

Consider an arbitrary marking algorithm operating on a request sequence σ. For the analysis, we picture an optimal caching algorithm operating on σ alongside this marking algorithm, incurring an overall cost of f(σ). Suppose that there are r phases in this sequence σ, as defined by the marking algorithm.

To make the analysis easier to discuss, we are going to "pad" the sequence σ both at the beginning and the end with some extra requests; these will not add any extra misses to the optimal algorithm--that is, they will not cause f(σ) to increase--and so any bound we show on the performance of the marking algorithm relative to the optimum for this padded sequence will also apply to σ. Specifically, we imagine a "phase 0" that takes place before the first phase, in which all the items initially in the cache are requested once. This does not affect the cost of either the marking algorithm or the optimal algorithm. We also imagine that the final phase r ends with an epilogue in which every item currently in the cache of the optimal algorithm is requested twice in round-robin fashion. This does not increase f(σ); and by the end of the second pass through these items, the marking algorithm will contain each of them in its cache, and each will be marked.

For the performance bound, we need two things: an upper bound on the number of misses incurred by the marking algorithm, and a lower bound saying that the optimum must incur at least a certain number of misses.

The division of the request sequence σ into phases turns out to be the key to doing this. First of all, here is how we can picture the history of a
phase, from the marking algorithm's point of view. At the beginning of the phase, all items are unmarked. Any item that is accessed during the phase is marked, and it then remains in the cache for the remainder of the phase. Over the course of the phase, the number of marked items grows from 0 to k, and the next phase begins with a request to a (k + 1)st item, different from all of these marked items. We summarize some conclusions from this picture in the following claim.

(13.35) In each phase, σ contains accesses to exactly k distinct items. The subsequent phase begins with an access to a different (k + 1)st item.

Since an item, once marked, remains in the cache until the end of the phase, the marking algorithm cannot incur a miss for an item more than once in a phase. Combined with (13.35), this gives us an upper bound on the number of misses incurred by the marking algorithm.

(13.36) The marking algorithm incurs at most k misses per phase, for a total of at most kr misses over all r phases.

As a lower bound on the optimum, we have the following fact.

(13.37) The optimum incurs at least r − 1 misses. In other words, f(σ) ≥ r − 1.

Proof. Consider any phase but the last one, and look at the situation just after the first access (to an item s) in this phase. Currently s is in the cache maintained by the optimal algorithm, and (13.35) tells us that the remainder of the phase will involve accesses to k − 1 other distinct items, and the first access of the next phase will involve a kth other item as well. Let S be this set of k items other than s. We note that at least one of the members of S is not currently in the cache maintained by the optimal algorithm (since, with s there, it only has room for k − 1 other items), and the optimal algorithm will incur a miss the first time this item is accessed.

What we've shown, therefore, is that for every phase j < r, the sequence from the second access in phase j through the first access in phase j + 1 involves at least one miss by the optimum. This makes for a total of at least r − 1 misses. ∎

Combining (13.36) and (13.37), we can now bound any marking algorithm against the optimum.

(13.38) A marking algorithm incurs at most k · f(σ) + k misses on any request sequence σ.

Proof. The number of misses incurred by the marking algorithm is at most

    kr = k(r − 1) + k ≤ k · f(σ) + k,

where the final inequality is just (13.37). ∎

Note that the "+k" in the bound of (13.38) is just an additive constant, independent of the length of the request sequence σ, and so the key aspect of the bound is the factor of k relative to the optimum. To see that this factor of k is the best bound possible for some marking algorithms, and for LRU in particular, consider the behavior of LRU on a request sequence in which k + 1 items are repeatedly requested in a round-robin fashion. LRU will each time evict the item that will be needed just in the next step, and hence it will incur a cache miss on each access. (It's possible to get this kind of terrible caching performance in practice for precisely such a reason: the program is executing a loop that is just slightly too big for the cache.) On the other hand, the optimal policy, evicting the page that will be requested farthest in the future, incurs a miss only every k steps, so LRU incurs a factor of k more misses than the optimal policy.

Designing a Randomized Marking Algorithm

The bad example for LRU that we just saw implies that, if we want to obtain a better bound for an online caching algorithm, we will not be able to reason about fully general marking algorithms. Rather, we will define a simple Randomized Marking Algorithm and show that it never incurs more than O(log k) times the number of misses of the optimal algorithm--an exponentially better bound.

Randomization is a natural choice in trying to avoid the unfortunate sequence of "wrong" choices in the bad example for LRU. To get this bad sequence, we needed to define a sequence that always evicted precisely the wrong item. By randomizing, a policy can make sure that, "on average," it is throwing out an unmarked item that will at least not be needed right away. Specifically, where the general description of a marking algorithm contained the line

    Else evict an unmarked item from the cache

the Randomized Marking Algorithm evicts an unmarked item chosen uniformly at random from the cache.
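As a concrete and entirely illustrative sketch (not the book's code), here is one way the Randomized Marking Algorithm could be simulated in Python to count misses on a request sequence; the phase bookkeeping follows the marking template above, and the requested items are assumed to be hashable values.

    import random

    def randomized_marking_misses(requests, k, rng=random.Random(0)):
        # Simulate the Randomized Marking Algorithm with a cache of k slots
        # and return the number of cache misses it incurs on the sequence.
        cache, marked, misses = set(), set(), 0
        for s in requests:
            if s in cache:
                marked.add(s)                        # a hit: just mark s
                continue
            misses += 1                              # s is not in the cache: a miss
            if len(cache) == k:
                if not (cache - marked):
                    marked = set()                   # all items marked: a new phase begins
                unmarked = list(cache - marked)
                cache.remove(rng.choice(unmarked))   # evict a random *unmarked* item
            cache.add(s)
            marked.add(s)
        return misses

    # On the round-robin sequence that is bad for LRU, the randomized policy
    # does noticeably better than missing on every single request.
    k = 8
    seq = [i % (k + 1) for i in range(2000)]
    print(randomized_marking_misses(seq, k), len(seq))

Replacing the random choice by "evict the least recently used unmarked item" would turn this back into LRU, which by (13.34) is also a marking algorithm.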
(Footnote: The Randomized Marking Algorithm is not, however, the simplest way to incorporate randomization into a caching algorithm. We could have considered the Purely Random Algorithm that dispenses with the whole notion of marking, and on each cache miss selects one of its k current items for eviction uniformly at random. (Note the difference: The Randomized Marking Algorithm randomizes only over the unmarked items.) Although we won't prove this here, the Purely Random Algorithm can incur at least c times more misses than the optimum, for any constant c < k, and so it does not lead to an improvement over LRU.)

The analysis is again phrased in terms of the requests made in each phase. Within a phase j, call a request fresh if the requested item was not requested at all in the previous phase j − 1, and stale otherwise; among the requests in phase j, we let cj denote the number of these that are to fresh items.

To strengthen the result from (13.37), which essentially said that the optimum incurs at least one miss per phase, we provide a bound in terms of the number of fresh items in a phase.

(13.39) f(σ) ≥ ½ Σ_{j=1}^{r} cj.

Proof. Let fj(σ) denote the number of misses incurred by the optimal algorithm in phase j, so that f(σ) = Σ_{j=1}^{r} fj(σ). From (13.35), we know that in any phase j, there are requests to k distinct items. Moreover, by our definition of fresh, there are requests to cj+1 further items in phase j + 1; so between phases j and j + 1, there are at least k + cj+1 distinct items requested. It follows that the optimal algorithm must incur at least cj+1 misses over the course of phases j and j + 1. Summing these lower bounds over alternate phases, so that no miss is counted twice, gives f(σ) ≥ ½ Σ_{j=1}^{r} cj. ∎

Let Xj denote the number of misses incurred by the Randomized Marking Algorithm in phase j. Each request to a fresh item results in a guaranteed miss for the Randomized Marking Algorithm; since the fresh item was not marked in the previous phase, it cannot possibly be in the cache when it is requested in phase j. Thus the Randomized Marking Algorithm incurs at least cj misses in phase j because of requests to fresh items.

Stale items, by contrast, are a more subtle matter. The phase starts with k stale items in the cache; these are the items that were unmarked en masse at the beginning of the phase. On a request to a stale item s, the concern is whether the Randomized Marking Algorithm evicted it earlier in the phase and now incurs a miss as it has to bring it back in. What is the probability that the ith request to a stale item, say s, results in a miss? Suppose that there have been c ≤ cj requests to fresh items thus far in the phase. Then the cache contains the c formerly fresh items that are now marked, i − 1 formerly stale items that are now marked, and k − c − i + 1 items that are stale and not yet marked in this phase. But there are k − i + 1 items overall that are still stale; and since exactly k − c − i + 1 of them are in the cache, the remaining c of them are not. Each of the k − i + 1 stale items is equally likely to be no longer in the cache, and so s is not in the cache at this moment with probability c/(k − i + 1) ≤ cj/(k − i + 1).
This is the probability of a miss on the request to s. Summing over all requests to unmarked items, we have

    E[Xj] ≤ cj + Σ_{i=1}^{k−cj} cj/(k − i + 1) ≤ cj + cj (H(k) − H(cj)) ≤ cj H(k).

Thus the total expected number of misses incurred by the Randomized Marking Algorithm is

    (13.40) E[Σ_{j=1}^{r} Xj] = Σ_{j=1}^{r} E[Xj] ≤ H(k) Σ_{j=1}^{r} cj.

Combining (13.39) and (13.40), we immediately get the following performance guarantee.

(13.41) The expected number of misses incurred by the Randomized Marking Algorithm is at most 2H(k) · f(σ) = O(log k) · f(σ).

13.9 Chernoff Bounds

In Section 13.3, we defined the expectation of a random variable formally and have worked with this definition and its consequences ever since. Intuitively, we have a sense that the value of a random variable ought to be "near" its expectation with reasonably high probability, but we have not yet explored the extent to which this is true. We now turn to some results that allow us to reach conclusions like this, and see a sampling of the applications that follow.

We say that two random variables X and Y are independent if, for any values i and j, the events Pr[X = i] and Pr[Y = j] are independent. This definition extends naturally to larger sets of random variables. Now consider a random variable X that is a sum of several independent 0-1-valued random variables: X = X1 + X2 + ... + Xn, where Xi takes the value 1 with probability pi, and the value 0 otherwise. By linearity of expectation, we have E[X] = Σ_{i=1}^{n} pi. Intuitively, the independence of the random variables X1, X2, ..., Xn suggests that their fluctuations are likely to "cancel out," and so their sum X will have a value close to its expectation with high probability. This is in fact true, and we state two concrete versions of this result: one bounding the probability that X deviates above E[X], the other bounding the probability that X deviates below E[X]. We call these results Chernoff bounds, after one of the probabilists who first established bounds of this form.

(13.42) Let X, X1, X2, ..., Xn be defined as above, and assume that μ ≥ E[X]. Then, for any δ > 0, we have

    Pr[X > (1 + δ)μ] < (e^δ / (1 + δ)^{(1+δ)})^μ.

Proof. To bound the probability that X exceeds (1 + δ)μ, we go through a sequence of simple transformations. First note that, for any t > 0, we have Pr[X > (1 + δ)μ] = Pr[e^{tX} > e^{t(1+δ)μ}], as the function f(x) = e^{tx} is monotone in x. We will use this observation with a t that we'll select later.

Next we use some simple properties of the expectation. For a random variable Y, we have γ Pr[Y ≥ γ] ≤ E[Y], by the definition of the expectation. This allows us to bound the probability that Y exceeds γ in terms of E[Y]. Combining these two ideas, we get the following inequalities.

    Pr[X > (1 + δ)μ] = Pr[e^{tX} > e^{t(1+δ)μ}] ≤ e^{−t(1+δ)μ} E[e^{tX}].

Next we need to bound the expectation E[e^{tX}]. Writing X as X = Σ_i Xi, the expectation is E[e^{tX}] = E[e^{t Σ_i Xi}] = E[Π_i e^{tXi}]. For independent variables Y and Z, the expectation of the product YZ is E[YZ] = E[Y] E[Z]. The variables Xi are independent, so we get E[Π_i e^{tXi}] = Π_i E[e^{tXi}].

Now, e^{tXi} is e^t with probability pi and e^0 = 1 otherwise, so its expectation can be bounded as

    E[e^{tXi}] = pi e^t + (1 − pi) = 1 + pi(e^t − 1) ≤ e^{pi(e^t − 1)},

where the last inequality follows from the fact that 1 + α ≤ e^α for any α ≥ 0. Combining the inequalities, we get the following bound.

    Pr[X > (1 + δ)μ] ≤ e^{−t(1+δ)μ} Π_i E[e^{tXi}] ≤ e^{−t(1+δ)μ} Π_i e^{pi(e^t − 1)} ≤ e^{−t(1+δ)μ} e^{μ(e^t − 1)}.

To obtain the bound claimed by the statement, we substitute t = ln(1 + δ). ∎

Where (13.42) provided an upper bound, showing that X is not likely to deviate far above its expectation, the next statement, (13.43), provides a lower bound, showing that X is not likely to deviate far below its expectation. Note that the statements of the results are not symmetric, and this makes sense: For the upper bound, it is interesting to consider values of δ much larger than 1, while this would not make sense for the lower bound.
(13.43) Let X, X1, X2, ..., Xn and μ be as defined above, and assume that μ ≤ E[X]. Then for any 1 > δ > 0, we have

    Pr[X < (1 − δ)μ] < e^{−δ²μ/2}.

The proof of (13.43) is similar to the proof of (13.42), and we do not give it here. For the applications that follow, the statements of (13.42) and (13.43), rather than the internals of their proofs, are the key things to keep in mind.

13.10 Load Balancing

In Section 13.1, we considered a distributed system in which communication among processes was difficult, and randomization to some extent replaced explicit coordination and synchronization. We now revisit this theme through another stylized example of randomization in a distributed setting.

The Problem

Suppose we have a system in which m jobs arrive in a stream and need to be processed immediately. We have a collection of n identical processors that are capable of performing the jobs; so the goal is to assign each job to a processor in a way that balances the workload evenly across the processors. If we had a central controller for the system that could receive each job and hand it off to the processors in round-robin fashion, it would be trivial to make sure that each processor received at most ⌈m/n⌉ jobs--the most even balancing possible.

But suppose the system lacks the coordination or centralization to implement this. A much more lightweight approach would be to simply assign each job to one of the processors uniformly at random. Intuitively, this should also balance the jobs evenly, since each processor is equally likely to get each job. At the same time, since the assignment is completely random, one doesn't expect everything to end up perfectly balanced. So we ask: How well does this simple randomized approach work?

Although we will stick to the motivation in terms of jobs and processors here, it is worth noting that comparable issues come up in the analysis of hash functions, as we saw in Section 13.6. There, instead of assigning jobs to processors, we're assigning elements to entries in a hash table. The concern about producing an even balancing in the case of hash tables is based on wanting to keep the number of collisions at any particular entry relatively small. As a result, the analysis in this section is also relevant to the study of hashing schemes.

Analyzing a Random Allocation

We will see that the analysis of our random load balancing process depends on the relative sizes of m, the number of jobs, and n, the number of processors. We start with a particularly clean case: when m = n. Here it is possible for each processor to end up with exactly one job, though this is not very likely. Rather, we expect that some processors will receive no jobs and others will receive more than one. As a way of assessing the quality of this randomized load balancing heuristic, we study how heavily loaded with jobs a processor can become.

Let Xi be the random variable equal to the number of jobs assigned to processor i, for i = 1, 2, ..., n. It is easy to determine the expected value of Xi: We let Yij be the random variable equal to 1 if job j is assigned to processor i, and 0 otherwise; then Xi = Σ_{j=1}^{n} Yij and E[Yij] = 1/n, so E[Xi] = Σ_{j=1}^{n} E[Yij] = 1. But our concern is with how far Xi can deviate above its expectation: What is the probability that Xi > c? To give an upper bound on this, we can directly apply (13.42): Xi is a sum of independent 0-1-valued random variables {Yij}; we have μ = 1 and 1 + δ = c. Thus the following statement holds.

(13.44) Pr[Xi > c] < e^{c−1} / c^c.

In order for there to be a small probability of any Xi exceeding c, we will take the Union Bound over i = 1, 2, ..., n; and so we need to choose c large enough to drive Pr[Xi > c] down well below 1/n for each i. This requires looking at the denominator c^c in (13.44). To make this denominator large enough, we need to understand how this quantity grows with c, and we explore this by first asking the question: What is the x such that x^x = n?

Suppose we write γ(n) to denote this number x. There is no closed-form expression for γ(n), but we can determine its asymptotic value as follows. If x^x = n, then taking logarithms gives x log x = log n; and taking logarithms again gives log x + log log x = log log n. Thus we have

    2 log x ≥ log x + log log x = log log n ≥ log x,

and, using this to divide through the equation x log x = log n, we get

    ½ x ≤ (log n / log log n) ≤ x = γ(n).

Thus γ(n) = Θ(log n / log log n).
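A quick simulation (ours, not from the text) makes this scale tangible: when n jobs are thrown uniformly at random onto n processors, the heaviest observed load is a small number of the same order as log n / log log n.

    import math
    import random

    def max_load(n, rng):
        # Throw n jobs uniformly at random onto n processors; return the heaviest load.
        load = [0] * n
        for _ in range(n):
            load[rng.randrange(n)] += 1
        return max(load)

    n = 10_000
    trials = [max_load(n, random.Random(seed)) for seed in range(20)]
    print(max(trials), math.log(n) / math.log(math.log(n)))   # observed vs. the Theta scale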
Now, if we set c = eγ(n), then by (13.44) we have

    Pr[Xi > c] < e^{c−1} / c^c < (e/c)^c = (1/γ(n))^{eγ(n)} = (γ(n)^{γ(n)})^{−e} = n^{−e}.

Thus, applying the Union Bound over this upper bound for X1, X2, ..., Xn, we have the following.

(13.45) With probability at least 1 − n^{−1}, no processor receives more than eγ(n) = Θ(log n / log log n) jobs.

13.11 Packet Routing

[Figure 13.4 A case in which the scheduling of packets matters. Only one packet can cross an edge e per time step, so packet 1 may need to wait for packets 2, 3, 6, and 9, depending on the schedule.]

Suppose, in the example of Figure 13.4, that each edge manages its queue by always transmitting the packet that is closest to its destination. In this case, packet 1 will have to wait for packets 2 and 3 at the second level of the tree; and then later it will have to wait for packets 6 and 9 at the fourth level of the tree. Thus it will take nine steps for this packet to reach its destination. On the other hand, suppose that each edge e manages its queue by always transmitting the packet that is farthest from its destination. Then packet 1 will never have to wait, and it will reach its destination in five steps; moreover, one can check that every packet will reach its destination within six steps.

There is a natural generalization of the tree network in Figure 13.4, in which the tree has height h and the nodes at every other level have k children. In this case, the queue management policy that always transmits the packet nearest its destination results in some packet requiring Ω(hk) steps to reach its destination (since the packet traveling farthest is delayed by Ω(k) steps at each of Ω(h) levels), while the policy that always transmits the packet farthest from its destination does much better on this network.

Schedules and Their Durations  Let's now move from these examples to the question of scheduling packets and managing queues in an arbitrary network G. Given packets labeled 1, 2, ..., N and associated paths P1, P2, ..., PN, a packet schedule specifies, for each edge e and each time step t, which packet will cross edge e in step t. Of course, the schedule must satisfy some basic consistency properties: at most one packet can cross any edge e in any one step; and if packet i is scheduled to cross e at step t, then e should be on the path Pi, and the earlier portions of the schedule should cause i to have already reached e. We will say that the duration of the schedule is the number of steps that elapse until every packet reaches its destination; the goal is to find a schedule of minimum duration.

What are the obstacles to having a schedule of low duration? One obstacle would be a very long path that some packet must traverse; clearly, the duration will be at least the length of this path. Another obstacle would be a single edge e that many packets must cross; since each of these packets must cross e in a distinct step, this also gives a lower bound on the duration. So, if we define the dilation d of the set of paths {P1, P2, ..., PN} to be the maximum length of any Pi, and the congestion c of the set of paths to be the maximum number that have any single edge in common, then the duration is at least max(c, d) = Ω(c + d).

In 1988, Leighton, Maggs, and Rao proved the following striking result: Congestion and dilation are the only obstacles to finding fast schedules, in the sense that there is always a schedule of duration O(c + d). While the statement of this result is very simple, it turns out to be extremely difficult to prove; and it yields only a very complicated method to actually construct such a schedule. So, instead of trying to prove this result, we'll analyze a simple algorithm (also proposed by Leighton, Maggs, and Rao) that can be easily implemented in a distributed setting and yields a duration that is only worse by a logarithmic factor: O(c + d log(mN)), where m is the number of edges and N is the number of packets.

Designing the Algorithm

A Simple Randomized Schedule  If each edge simply transmits an arbitrary waiting packet in each step, it is easy to see that the resulting schedule has duration O(cd): at worst, a packet can be blocked by c − 1 other packets on each of the d edges in its path. To reduce this bound, we need to set things up so that each packet only waits for a much smaller number of steps over the whole trip to its destination.
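Since the analysis that follows is phrased entirely in terms of the congestion c and the dilation d, here is a small illustrative Python helper (our own, not from the text) that computes both quantities for a given set of packet paths, each path written as a list of edges.

    def congestion_and_dilation(paths):
        # paths: one list of edges (u, v) per packet.
        edge_count = {}
        for path in paths:
            for e in path:
                edge_count[e] = edge_count.get(e, 0) + 1
        c = max(edge_count.values())              # congestion: the most heavily shared edge
        d = max(len(path) for path in paths)      # dilation: the longest path
        return c, d

    # Two packets sharing the edge (1, 2):
    print(congestion_and_dilation([[(0, 1), (1, 2)], [(3, 1), (1, 2), (2, 4)]]))   # (2, 3)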
i then moves forward one edge p~r block,
The reason a bound as large as O(cd) can arise is that the packets are until it reaches its destination
very badly timed with respect to one another: Blocks of c of them all meet
at an edge at the same time, and once this congestion has cleared, the same This schedule will work provided that we avoid a more extreme type of
thing happens at the next edge. This sounds pathological, but one should collision: It should not be the case that more than b packets are supposed to
remember that a very natural queue management policy caused it to happen show up at the same edge e at the start of the same block. If this happens, then
in Figure 13.4. However, it is the case that such bad behavior relies on very at least one of them will not be able to cross e in the next block. However, if the
unfortunate synchronization in the motion of the packets; so it is believable initial delays smooth things out enough so that no more than b packets arrive
that, if we introduce some randomization in the timing of the packets, then at any edge in the ’same block, then the schedule will work just as intended.
this kind of behavior is unlikely to happen. The simplest idea would be just to randomly shift the times at which the packets are released from their sources. Then if there are many packets all aimed at the same edge, they are unlikely to hit it all at the same time, as the contention for edges has been "smoothed out." We now show that this kind of randomization, properly implemented, in fact works quite well.

Consider first the following algorithm, which will not quite work. It involves a parameter r whose value will be determined later.

  Each packet i behaves as follows:
    i chooses a random delay s between 1 and r
    i waits at its source for s time steps
    i then moves full speed ahead, one edge per time step
      until it reaches its destination

If the set of random delays were really chosen so that no two packets ever "collided"--reaching the same edge at the same time--then this schedule would work just as advertised; its duration would be at most r (the maximum initial delay) plus d (the maximum number of edges on any path). However, unless r is chosen to be very large, it is likely that a collision will occur somewhere in the network, and so the algorithm will probably fail: Two packets will show up at the same edge e in the same time step t, and both will be required to cross e in the next step.

Grouping Time into Blocks  To get around this problem, we consider the following generalization of this strategy: rather than implementing the "full speed ahead" plan at the level of individual time steps, we implement it at the level of contiguous blocks of time steps.

  For a parameter b, group intervals of b consecutive time steps
    into single blocks of time
  Each packet i behaves as follows:
    i chooses a random delay s between 1 and r
    i waits at its source for s blocks
    i then moves forward one edge per block,
      until it reaches its destination

In this case, the duration will be at most b(r + d)--the maximum number of blocks, r + d, times the length of each block, b.

(13.47) Let ℰ denote the event that more than b packets are required to be at the same edge e at the start of the same block. If ℰ does not occur, then the duration of the schedule is at most b(r + d).

Our goal is now to choose values of r and b so that both the probability Pr[ℰ] and the duration b(r + d) are small quantities. This is the crux of the analysis, since, if we can show this, then (13.47) gives a bound on the duration.
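The book presents its algorithms in pseudocode only; as a rough illustration (not from the text), the following Python sketch simulates the block-based random-delay schedule just described, assuming each packet's route is given explicitly as a list of edge identifiers. The function name block_schedule and its parameters are ours; it simply tallies, for every (edge, block) pair, how many packets the random delays place there, so one can check how often the bad event "more than b packets at one edge at the start of one block" actually occurs.

  import random
  from collections import defaultdict

  def block_schedule(paths, r, b):
      # paths[i]: the list of edges on packet i's route.
      # Each packet picks a delay s in {1, ..., r} (measured in blocks) and is
      # then scheduled to be at the j-th edge of its route during block s + j.
      load = defaultdict(int)                  # (edge, block) -> number of packets
      for path in paths:
          s = random.randint(1, r)
          for j, edge in enumerate(path):
              load[(edge, s + j)] += 1
      worst = max(load.values(), default=0)    # the largest load over all edges and blocks
      return worst, worst <= b                 # the schedule respects the bound iff worst <= b

  # Example: 20 packets whose routes all share one congested edge "e0".
  paths = [["a%d" % i, "e0", "b%d" % i] for i in range(20)]
  print(block_schedule(paths, r=10, b=6))

Running this repeatedly for different values of r and b gives a feel for the trade-off analyzed next: a larger r makes overload unlikely but lengthens the schedule b(r + d).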
Analyzing the Algorithm
To give a bound on Pr[ℰ], it's useful to decompose it into a union of simpler bad events, so that we can apply the Union Bound. A natural set of bad events arises from considering each edge and each time block separately; if e is an edge, and t is a block between 1 and r + d, we let ℱet denote the event that more than b packets are required to be at e at the start of block t. Clearly, ℰ is the union of the events ℱet over all e and t. Moreover, if Net is a random variable equal to the number of packets scheduled to be at e at the start of block t, then ℱet is equivalent to the event [Net > b].

The next step in the analysis is to decompose the random variable Net into a sum of independent 0-1-valued random variables so that we can apply a Chernoff bound. This is naturally done by defining Xeti to be equal to 1 if packet i is required to be at edge e at the start of block t, and equal to 0 otherwise. Then Net = Σi Xeti; and for different values of i, the random variables Xeti are independent, since the packets are choosing independent delays. (Note that Xeti and Xe't'i, where the value of i is the same, would certainly not be independent; but our analysis does not require us to add random variables of this form together.) Notice that, of the r possible delays that packet i can choose, at most one will require it to be at e at block t; thus E[Xeti] ≤ 1/r. Moreover, at most c packets have paths that include e; and if i is not one of these packets, then clearly E[Xeti] = 0. Thus we have

  E[Net] = Σi E[Xeti] ≤ c/r.

We now have the setup for applying the Chernoff bound (13.42), since Net is a sum of the independent 0-1-valued random variables Xeti. Indeed, the quantities are sort of like what they were when we analyzed the problem of throwing m jobs at random onto n processors: in that case, each constituent random variable had expectation 1/n, the total expectation was m/n, and we needed m to be Ω(n log n) in order for each processor load to be close to its expectation with high probability. The appropriate analogy in the case at hand is for r to play the role of n, and c to play the role of m: This makes sense symbolically, in terms of the parameters; it also accords with the picture that the packets are like the jobs, and the different time blocks of a single edge are like the different processors that can receive the jobs. This suggests that if we want the number of packets destined for a particular edge in a particular block to be close to its expectation, we should have c = Ω(r log r).

This will work, except that we have to increase the logarithmic term a little to make sure that the Union Bound over all e and all t works out in the end. So let's set

  r = c / (q log(mN)),

where q is a constant that will be determined later.

Let's fix a choice of e and t and try to bound the probability that Net exceeds a constant times c/r. We define μ = c/r, and observe that E[Net] ≤ μ, so we are in a position to apply the Chernoff bound (13.42). We choose δ = 2, so that (1 + δ)μ = 3c/r = 3q log(mN), and we use this as the upper bound in the expression Pr[Net > (1 + δ)μ]. Now, applying (13.42), we have

  Pr[Net > (1 + δ)μ] < (e^δ / (1 + δ)^(1+δ))^μ = (e^2 / 27)^(q log(mN)) ≤ (mN)^(-z),

where z is a constant that can be made as large as we want by choosing the constant q appropriately.

We can see from this calculation that it's safe to set b = 3c/r; for, in this case, the event ℱet that Net > b will have very small probability for each choice of e and t. There are m different choices for e, and d + r different choices for t, where we observe that d + r ≤ d + c - 1 ≤ N. Thus we have

  Pr[ℰ] ≤ Σe Σt Pr[ℱet] ≤ mN · (mN)^(-z) = (mN)^(-(z-1)),

which can be made as small as we want by choosing z large enough.

Our choice of the parameters b and r, combined with (13.47), now implies the following.

(13.48) With high probability, the duration of the schedule for the packets is O(c + d log(mN)).

Proof. We have just argued that the probability of the bad event ℰ is very small, at most (mN)^(-(z-1)) for an arbitrarily large constant z. And provided that ℰ does not happen, (13.47) tells us that the duration of the schedule is bounded by

  b(r + d) = (3c/r)(r + d) = 3c + d · (3c/r) = 3c + d(3q log(mN)) = O(c + d log(mN)).
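As a quick arithmetic companion to the proof of (13.48), the sketch below (not from the text) just evaluates the parameter choices r = c/(q log(mN)) and b = 3c/r and the resulting duration bound 3c + 3qd log(mN). Here m and N are the quantities appearing in the log(mN) term above (they are defined earlier in the section, outside this excerpt), q is the constant controlling the failure probability, and all argument values in the example are made up for illustration.

  import math

  def schedule_parameters(c, d, m, N, q=2.0):
      # r = c / (q log(mN)), b = 3c/r; the duration bound is b(r + d) = 3c + d * (3c/r).
      log_mN = math.log(m * N)
      r = c / (q * log_mN)
      b = 3 * c / r                    # equals 3 * q * log(mN)
      return r, b, 3 * c + d * b

  print(schedule_parameters(c=200, d=30, m=500, N=400))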
13.12 Background: Some Basic Probability Definitions
For many, though certainly not all, applications of randomized algorithms, it is enough to work with probabilities defined over finite sets only; and this turns out to be much easier to think about than probabilities over arbitrary sets. So we begin by considering just this special case. We'll then end the section by revisiting all these notions in greater generality.

Finite Probability Spaces
We have an intuitive understanding of sentences like, "If a fair coin is flipped, the probability of 'heads' is 1/2." Or, "If a fair die is rolled, the probability of a '6' is 1/6." What we want to do first is to describe a mathematical framework in which we can discuss such statements precisely. The framework will work well for carefully circumscribed systems such as coin flips and rolls of dice; at the same time, we will avoid the lengthy and substantial philosophical issues raised in trying to model statements like, "The probability of rain tomorrow is 20 percent." Fortunately, most algorithmic settings are as carefully circumscribed as those of coins and dice, if perhaps somewhat larger and more complex.

To be able to compute probabilities, we introduce the notion of a finite probability space. (Recall that we're dealing with just the case of finite sets for now.) A finite probability space is defined by an underlying sample space Ω, which consists of the possible outcomes of the process under consideration. Each point i in the sample space also has a nonnegative probability mass p(i) ≥ 0; these probability masses need only satisfy the constraint that their total sum is 1; that is, Σi∈Ω p(i) = 1. We define an event ℰ to be any subset of Ω--an event is defined simply by the set of outcomes that constitute it--and we define the probability of the event to be the sum of the probability masses of all the points in ℰ. That is,

  Pr[ℰ] = Σi∈ℰ p(i).

In many situations that we'll consider, all points in the sample space have the same probability mass, and then the probability of an event ℰ is simply its size relative to the size of Ω; that is, in this special case, Pr[ℰ] = |ℰ|/|Ω|. We use Ω - ℰ to denote the complementary event; note that Pr[Ω - ℰ] = 1 - Pr[ℰ].

Thus the points in the sample space and their respective probability masses form a complete description of the system under consideration; it is the events--the subsets of the sample space--whose probabilities we are interested in computing. So to represent a single flip of a "fair" coin, we can define the sample space to be Ω = {heads, tails} and set p(heads) = p(tails) = 1/2. If we want to consider a biased coin in which "heads" is twice as likely as "tails," we can define the probability masses to be p(heads) = 2/3 and p(tails) = 1/3. A key thing to notice even in this simple example is that defining the probability masses is a part of defining the underlying problem; in setting up the problem, we are specifying whether the coin is fair or biased, not deriving this from some more basic data.

Here's a slightly more complex example, which we could call the Process Naming, or Identifier Selection, Problem. Suppose we have n processes in a distributed system, denoted p1, p2, ..., pn, and each of them chooses an identifier for itself uniformly at random from the space of all k-bit strings. Moreover, each process's choice happens concurrently with those of all the other processes, and so the outcomes of these choices are unaffected by one another. If we view each identifier as being chosen from the set {0, 1, 2, ..., 2^k - 1} (by considering the numerical value of the identifier as a number in binary notation), then the sample space Ω could be represented by the set of all n-tuples of integers, with each integer between 0 and 2^k - 1. The sample space would thus have (2^k)^n = 2^(kn) points, each with probability mass 2^(-kn).

Now suppose we are interested in the probability that processes p1 and p2 each choose the same name. This is an event ℰ, represented by the subset consisting of all n-tuples from Ω whose first two coordinates are the same. There are 2^(k(n-1)) such n-tuples: we can choose any value for coordinates 3 through n, then any value for coordinate 2, and then we have no freedom of choice in coordinate 1. Thus we have

  Pr[ℰ] = Σi∈ℰ p(i) = 2^(k(n-1)) · 2^(-kn) = 2^(-k).

This, of course, corresponds to the intuitive way one might work out the probability, which is to say that we can choose any identifier we want for process p2, after which there is only 1 choice out of 2^k for process p1 that will cause the names to agree. It's worth checking that this intuition is really just a compact description of the calculation above.

Conditional Probability and Independence
If we view the probability of an event ℰ, roughly, as the likelihood that ℰ is going to occur, then we may also want to ask about its probability given additional information. Thus, given another event ℱ of positive probability, we define the conditional probability of ℰ given ℱ as

  Pr[ℰ | ℱ] = Pr[ℰ ∩ ℱ] / Pr[ℱ].

This is the "right" definition intuitively, since it's performing the following calculation: Of the portion of the sample space that consists of ℱ (the event we "know" to have occurred), what fraction is occupied by ℰ?

One often uses conditional probabilities to analyze Pr[ℰ] for some complicated event ℰ, as follows. Suppose that the events ℱ1, ℱ2, ..., ℱk each have positive probability, and they partition the sample space; in other words, each outcome in the sample space belongs to exactly one of them, so Σj Pr[ℱj] = 1. Now suppose we know these values Pr[ℱj], and we are also able to determine Pr[ℰ | ℱj] for each j = 1, 2, ..., k. That is, we know what the probability of ℰ is if we assume that any one of the events ℱj has occurred. Then we can compute Pr[ℰ] by the following simple formula:

  Pr[ℰ] = Σj Pr[ℰ | ℱj] · Pr[ℱj].

To justify this formula, we can unwind the right-hand side as follows:

  Σj Pr[ℰ | ℱj] · Pr[ℱj] = Σj (Pr[ℰ ∩ ℱj] / Pr[ℱj]) · Pr[ℱj] = Σj Pr[ℰ ∩ ℱj] = Pr[ℰ].

Independent Events  Intuitively, we say that two events are independent if information about the outcome of one does not affect our estimate of the likelihood of the other. One way to make this concrete would be to declare events ℰ and ℱ independent if Pr[ℰ | ℱ] = Pr[ℰ], and Pr[ℱ | ℰ] = Pr[ℱ]. (We'll assume here that both have positive probability; otherwise the notion of independence is not very interesting in any case.) Actually, if one of these two equalities holds, then the other must hold, for the following reason: If Pr[ℰ | ℱ] = Pr[ℰ], then
  Pr[ℰ ∩ ℱ] / Pr[ℱ] = Pr[ℰ],

and hence Pr[ℰ ∩ ℱ] = Pr[ℰ] · Pr[ℱ], from which the other equality holds as well.

It turns out to be a little cleaner to adopt this equivalent formulation as our working definition of independence. Formally, we'll say that events ℰ and ℱ are independent if Pr[ℰ ∩ ℱ] = Pr[ℰ] · Pr[ℱ].

This product formulation leads to the following natural generalization. We say that a collection of events ℰ1, ℰ2, ..., ℰn is independent if, for every set of indices I ⊆ {1, 2, ..., n}, we have

  Pr[∩i∈I ℰi] = Πi∈I Pr[ℰi].

It's important to notice the following: To check if a large set of events is independent, it's not enough to check whether every pair of them is independent. For example, suppose we flip three independent fair coins: If ℰi denotes the event that the ith coin comes up heads, then the events ℰ1, ℰ2, ℰ3 are independent and each has probability 1/2. Now let A denote the event that coins 1 and 2 have the same value; let B denote the event that coins 2 and 3 have the same value; and let C denote the event that coins 1 and 3 have different values. It's easy to check that each of these events has probability 1/2, and the intersection of any two has probability 1/4. Thus every pair drawn from A, B, C is independent. But the set of all three events A, B, C is not independent, since

  Pr[A ∩ B ∩ C] = 0.
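A quick way to verify the claims in the three-coin example is to enumerate all eight outcomes. The short Python sketch below is not part of the text; the helper prob and the predicates A, B, C are ad hoc names for this illustration.

  from itertools import product
  from fractions import Fraction

  outcomes = list(product([0, 1], repeat=3))        # the 8 equally likely outcomes

  def prob(event):
      # Probability of an event, given as a predicate on an outcome (coin1, coin2, coin3).
      return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

  A = lambda w: w[0] == w[1]        # coins 1 and 2 have the same value
  B = lambda w: w[1] == w[2]        # coins 2 and 3 have the same value
  C = lambda w: w[0] != w[2]        # coins 1 and 3 have different values

  for X, Y in [(A, B), (B, C), (A, C)]:
      assert prob(lambda w: X(w) and Y(w)) == prob(X) * prob(Y)     # every pair is independent

  assert prob(lambda w: A(w) and B(w) and C(w)) == 0                # but not the whole collection
  print(prob(A), prob(B), prob(C))                                  # each is 1/2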
The Union Bound
Suppose we are given a set of events ℰ1, ℰ2, ..., ℰn, and we are interested in the probability that any of them happens; that is, we are interested in the probability Pr[∪i ℰi]. If the events are all pairwise disjoint from one another, then the probability mass of their union is comprised simply of the separate contributions from each event. In other words, we have the following fact.

(13.49) Suppose we have events ℰ1, ℰ2, ..., ℰn that are pairwise disjoint. Then

  Pr[∪i ℰi] = Σi Pr[ℰi].

In general, a set of events ℰ1, ℰ2, ..., ℰn may overlap in complex ways. In this case, the equality in (13.49) no longer holds; due to the overlaps among events, the probability mass of a point that is counted once on the left-hand side will be counted one or more times on the right-hand side. (See Figure 13.5.) This means that for a general set of events, the equality in (13.49) is relaxed to an inequality; and this is the content of the Union Bound. We have stated the Union Bound as (13.2), but we state it here again for comparison with (13.49).

Figure 13.5 The Union Bound: The probability of a union is maximized when the events have no overlap.

(13.50) (The Union Bound) Given events ℰ1, ℰ2, ..., ℰn, we have

  Pr[∪i ℰi] ≤ Σi Pr[ℰi].

Given its innocuous appearance, the Union Bound is a surprisingly powerful tool in the analysis of randomized algorithms. It draws its power mainly from the following ubiquitous style of analyzing randomized algorithms. Given a randomized algorithm designed to produce a correct result with high probability, we first tabulate a set of "bad events" ℰ1, ℰ2, ..., ℰn with the following property: if none of these bad events occurs, then the algorithm will indeed produce the correct answer. In other words, if ℱ denotes the event that the algorithm fails, then we have

  ℱ ⊆ ∪i ℰi.

But it's hard to compute the probability of this union, so we apply the Union Bound to conclude that

  Pr[ℱ] ≤ Pr[∪i ℰi] ≤ Σi Pr[ℰi].

Now, if in fact we have an algorithm that succeeds with very high probability, and if we've chosen our bad events carefully, then each of the probabilities Pr[ℰi] will be so small that even their sum--and hence our overestimate of the failure probability--will be small. This is the key: decomposing a highly complicated event, the failure of the algorithm, into a horde of simple events whose probabilities can be easily computed.

Here is a simple example to make the strategy discussed above more concrete. Recall the Process Naming Problem we discussed earlier in this section, in which each of a set of processes chooses a random identifier. Suppose that we have 1,000 processes, each choosing a 32-bit identifier, and we are concerned that two of them will end up choosing the same identifier. Can we argue that it is unlikely this will happen? To begin with, let's denote this event by ℱ. While it would not be overwhelmingly difficult to compute Pr[ℱ] exactly, it is much simpler to bound it as follows. The event ℱ is really a union of (1000 choose 2) "atomic" events; these are the events ℰij that processes pi and pj choose the same identifier. It is easy to verify that indeed, ℱ is the union of the events ℰij over all pairs i < j. Now, for any i ≠ j, we have Pr[ℰij] = 2^(-32), by the argument in one of our earlier examples. Applying the Union Bound, we have

  Pr[ℱ] ≤ Σi<j Pr[ℰij] = (1000 choose 2) · 2^(-32).

Now, (1000 choose 2) is at most half a million, and 2^32 is (a little bit) more than 4 billion, so this probability is at most 1/8000 = .000125.
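The bound just computed is easy to check numerically. The following few lines are an illustration, not from the text; the function name collision_bound is ours.

  from math import comb

  def collision_bound(n_processes, k_bits):
      # Union-bound estimate: (n choose 2) pairs, each colliding with probability 2^(-k).
      return comb(n_processes, 2) * 2.0 ** (-k_bits)

  print(comb(1000, 2))               # 499500, "at most half a million"
  print(collision_bound(1000, 32))   # about 0.000116, below the .000125 quoted above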
Infinite Sample Spaces
So far we've gotten by with finite probability spaces only. Several of the sections in this chapter, however, consider situations in which a random process can run for arbitrarily long, and so cannot be well described by a sample space of finite size. As a result, we pause here to develop the notion of a probability space more generally. This will be somewhat technical, and in part we are providing it simply for the sake of completeness: Although some of our applications require infinite sample spaces, none of them really exercises the full power of the formalism we describe here.

Once we move to infinite sample spaces, more care is needed in defining a probability function. We cannot simply give each point in the sample space Ω a probability mass and then compute the probability of every set by summing. Indeed, for reasons that we will not go into here, it is easy to get into trouble if one even allows every subset of Ω to be an event whose probability can be computed. Thus a general probability space has three components:

(i) The sample space Ω.
(ii) A collection ℰ of subsets of Ω; these are the only events on which we are allowed to compute probabilities.
(iii) A probability function Pr, which maps events in ℰ to real numbers in [0, 1].

The collection ℰ of allowable events can be any family of sets that satisfies the following basic closure properties: the empty set and the full sample space Ω both belong to ℰ; if a set A belongs to ℰ, then so does its complement Ω - A (closure under complement); and if sets A1, A2, A3, ... all belong to ℰ, then so does their union ∪i Ai (closure under countable union). The probability function Pr can be any function from ℰ to [0, 1] that satisfies the following basic consistency properties: Pr[∅] = 0, Pr[Ω] = 1, Pr[Ω - A] = 1 - Pr[A], and the Union Bound for disjoint events (13.49) should hold even for countable unions--if ℰ1, ℰ2, ℰ3, ... ∈ ℰ are all pairwise disjoint, then

  Pr[∪i ℰi] = Σi Pr[ℰi].

Notice how, since we are not building up Pr from the more basic notion of a probability mass anymore, (13.49) moves from being a theorem to simply a required property of Pr.

When an infinite sample space arises in our context, it's typically for the following reason: we have an algorithm that makes a sequence of random decisions, each one from a fixed finite set of possibilities; and since it may run for arbitrarily long, it may make an arbitrarily large number of decisions. Thus we consider sample spaces Ω constructed as follows. We start with a finite set of symbols X = {1, 2, ..., n}, and assign a weight w(i) to each symbol i ∈ X. We then define Ω to be the set of all infinite sequences of symbols from X (with repetitions allowed). So a typical element of Ω will look like (x1, x2, x3, ...), with each entry xi ∈ X.

The simplest type of event we will be concerned with is as follows: it is the event that a point ω ∈ Ω begins with a particular finite sequence of symbols. Thus, for a finite sequence σ = x1x2···xs of length s, we define the prefix event associated with σ to be the set of all sample points of Ω whose first s entries form the sequence σ. We denote this event by ℰσ, and we define its probability to be Pr[ℰσ] = w(x1)w(x2)···w(xs).

The following fact is in no sense easy to prove.

(13.51) There is a probability space (Ω, ℰ, Pr), satisfying the required closure and consistency properties, such that Ω is the sample space defined above, ℰσ ∈ ℰ for each finite sequence σ, and Pr[ℰσ] = w(x1)w(x2)···w(xs).

Once we have this fact, the closure of ℰ under complement and countable union, and the consistency of Pr with respect to these operations, allow us to compute probabilities of essentially any "reasonable" subset of Ω.

In our infinite sample space Ω, with events and probabilities defined as above, we encounter a phenomenon that does not naturally arise with finite sample spaces. Suppose the set X used to generate Ω is equal to {0, 1}, and w(0) = w(1) = 1/2. Let ℰ denote the set consisting of all sequences that contain at least one entry equal to 1. (Note that ℰ omits the "all-0" sequence.) We observe that ℰ is an event in our collection, since we can define σi to be the sequence of i - 1 0s followed by a 1, and observe that ℰ is the union of the prefix events ℰσi. Moreover, all the events ℰσi are pairwise disjoint, and so

  Pr[ℰ] = Σi Pr[ℰσi] = Σi 2^(-i) = 1.

Here, then, is the phenomenon: It's possible for an event to have probability 1 even when it's not equal to the whole sample space Ω. Similarly, Pr[Ω - ℰ] = 1 - Pr[ℰ] = 0, and so we see that it's possible for an event to have probability 0 even when it's not the empty set. There is nothing wrong with any of these results; in a sense, it's a necessary step if we want probabilities defined over infinite sets to make sense. It's simply that in such cases, we should be careful to distinguish between the notion that an event has probability 0 and the intuitive idea that the event "can't happen."

Solved Exercises

Solved Exercise 1
Suppose we have a collection of small, low-powered devices scattered around a building. The devices can exchange data over short distances by wireless communication, and we suppose for simplicity that each device has enough range to communicate with d other devices. Thus we can model the wireless connections among these devices as an undirected graph G = (V, E) in which each node is incident to exactly d edges.

Now we'd like to give some of the nodes a stronger uplink transmitter that they can use to send data back to a base station. Giving such a transmitter to every node would ensure that they can all send data like this, but we can achieve this while handing out fewer transmitters. Suppose that we find a subset S of the nodes with the property that every node in V - S is adjacent to a node in S. We call such a set S a dominating set, since it "dominates" all other nodes in the graph. If we give uplink transmitters only to the nodes in a dominating set S, we can still extract data from all nodes: Any node u not in S can choose a neighbor v in S, send its data to v, and have v relay the data back to the base station.

The issue is now to find a dominating set S of minimum possible size, since this will minimize the number of uplink transmitters we need. This is an NP-hard problem; in fact, proving this is the crux of Exercise 29 in Chapter 8. (It's also worth noting here the difference between dominating sets and vertex covers: in a dominating set, it is fine to have an edge (u, v) with neither u nor v in the set S as long as both u and v have neighbors in S. So, for example, a graph consisting of three nodes all connected by edges has a dominating set of size 1, but no vertex cover of size 1.)

Despite the NP-hardness, it's important in applications like this to find as small a dominating set as one can, even if it is not optimal. We will see here that a simple randomized strategy can be quite effective. Recall that in our graph G, each node is incident to exactly d edges. So clearly any dominating set will need to have size at least n/(d + 1), since each node we place in a dominating set can take care only of itself and its d neighbors. We want to show that a random selection of nodes will, in fact, get us quite close to this simple lower bound.

Specifically, show that for some constant c, a set of cn log n / (d + 1) nodes chosen uniformly at random from G will be a dominating set with high probability. (In other words, this completely random set is likely to form a dominating set that is only O(log n) times larger than our simple lower bound of n/(d + 1).)

Solution  Let k = cn log n / (d + 1), where we will choose the constant c later, once we have a better idea of what's going on. Let ℰ be the event that a random choice of k nodes is a dominating set for G. To make the analysis simpler, we will consider a model in which the nodes are selected one at a time, and the same node may be selected twice (if it happens to be picked twice by our sequence of random choices).

Now we want to show that if c (and hence k) is large enough, then Pr[ℰ] is close to 1. But ℰ is a very complicated-looking event, so we begin by breaking it down into much simpler events whose probabilities we can analyze more easily.

To start with, we say that a node w dominates a node v if w is a neighbor of v, or w = v. We say that a set S dominates a node v if some element of S dominates v. (These definitions let us say that a dominating set is simply a set of nodes that dominates every node in the graph.) Let D[v, t] denote the
event that the tth random node we choose dominates node v. The probability of this event can be determined quite easily: of the n nodes in the graph, we must choose v or one of its d neighbors, and so

  Pr[D[v, t]] = (d + 1)/n.

Let Dv denote the event that v is dominated by our random set, that is, the union of the events D[v, t] over all t. Since the k choices are independent, the probability that Dv fails is (1 - (d + 1)/n)^k ≤ e^(-k(d+1)/n) = e^(-c log n) = n^(-c). Now, ℰ fails to occur if and only if one of the events Dv fails to occur. Thus, by the Union Bound (13.2), we have

  Pr[ℰ fails] ≤ Σv∈V Pr[Dv fails] ≤ n · n^(-c) = n^(-c+1),

which can be made as small as we want by choosing the constant c large enough.
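For experimentation, here is a small Python sketch (not from the text) of the random selection strategy analyzed in this solved exercise. It assumes the graph is given as an adjacency-list dictionary; for simplicity it takes d to be the maximum degree, and all names are illustrative.

  import math
  import random

  def random_dominating_attempt(adj, c=2.0):
      # adj: node -> list of neighbors.  Pick k = ceil(c * n * ln(n) / (d + 1))
      # nodes with repetition and report whether they dominate every node.
      nodes = list(adj)
      n = len(nodes)
      d = max(len(neighbors) for neighbors in adj.values())
      k = max(1, math.ceil(c * n * math.log(n) / (d + 1)))
      chosen = {random.choice(nodes) for _ in range(k)}
      dominated = set(chosen)                      # every chosen node dominates itself...
      for w in chosen:
          dominated.update(adj[w])                 # ...and each of its neighbors
      return len(dominated) == n, len(chosen)

  # A 6-cycle: every node has degree d = 2.
  cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
  print(random_dominating_attempt(cycle))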
Solved Exercise 2

(a) Let c* denote the maximum possible number of equations that can be satisfied by an assignment of values to variables. Give a polynomial-time algorithm that produces an assignment satisfying at least (1/2)c* equations. If you want, your algorithm can be randomized; in this case, the expected number of equations it satisfies should be at least (1/2)c*. In either case, you should prove that your algorithm has the desired performance guarantee.

(b) Suppose we drop the condition that each equation must have exactly two variables; in other words, now each equation simply specifies that the sum of an arbitrary subset of the variables, mod 2, is equal to a particular value br.

Again let c* denote the maximum possible number of equations that can be satisfied by an assignment of values to variables, and give a polynomial-time algorithm that produces an assignment satisfying at least (1/2)c* equations. (As before, your algorithm can be randomized.) If you believe that your algorithm from part (a) achieves this guarantee here as well, you can state this and justify it with a proof of the performance guarantee for this more general case.

Solution  Let's recall the punch line of the simple randomized algorithm for MAX 3-SAT that we saw earlier in the chapter: If you're given a constraint satisfaction problem, assigning variables at random can be a surprisingly effective way to satisfy a constant fraction of all constraints.

We now try applying this principle to the problem here, beginning with part (a). Consider the algorithm that sets each variable independently and uniformly at random. How well does this random assignment do, in expectation? As usual, we will approach this question using linearity of expectation: If X is a random variable denoting the number of satisfied equations, we'll break X up into a sum of simpler random variables.

For some r between 1 and k, let the rth equation be

  (xi + xj) mod 2 = br.

Let Xr be a random variable equal to 1 if this equation is satisfied, and 0 otherwise. E[Xr] is the probability that equation r is satisfied. Of the four possible assignments to equation r, there are two that cause it to evaluate to 0 mod 2 (xi = xj = 0 and xi = xj = 1) and two that cause it to evaluate to 1 mod 2 (xi = 0, xj = 1 and xi = 1, xj = 0). Thus E[Xr] = 2/4 = 1/2.

Now, by linearity of expectation, we have E[X] = Σr E[Xr] = k/2. Since the maximum number of satisfiable equations c* must be at most k, we satisfy at least c*/2 in expectation. Thus, as in the case of MAX 3-SAT, a simple random assignment to the variables satisfies a constant fraction of all constraints.

For part (b), let's press our luck by trying the same algorithm. Again let Xr be a random variable equal to 1 if the rth equation is satisfied, and 0 otherwise; let X be the total number of satisfied equations; and let c* be the optimum.

We want to claim that E[Xr] = 1/2 as before, even when there can be an arbitrary number of variables in the rth equation; in other words, the probability that the equation takes the correct value mod 2 is exactly 1/2. We can't just write down all the cases the way we did for two variables per equation, so we will use an alternate argument.

In fact, there are two natural ways to prove that E[Xr] = 1/2. The first uses a trick that appeared in the proof of (13.25) in Section 13.6 on hashing: We consider assigning values arbitrarily to all variables but the last one in the equation, and then we randomly assign a value to the last variable x. Now, regardless of how we assign values to all other variables, there are two ways to assign a value to x, and it is easy to check that one of these ways will satisfy the equation and the other will not. Thus, regardless of the assignments to all variables other than x, the probability of setting x so as to satisfy the equation is exactly 1/2. Thus the probability the equation is satisfied by a random assignment is 1/2.

(As in the proof of (13.25), we can write this argument in terms of conditional probabilities. If ℰ is the event that the equation is satisfied, and ℱb is the event that the variables other than x receive a sequence of values b, then we have argued that Pr[ℰ | ℱb] = 1/2 for all b, and so Pr[ℰ] = Σb Pr[ℰ | ℱb] · Pr[ℱb] = (1/2) Σb Pr[ℱb] = 1/2.)

An alternate proof simply counts the number of ways for the rth equation to have an even sum, and the number of ways for it to have an odd sum. If we can show that these two numbers are equal, then the probability that a random assignment satisfies the rth equation is the probability it gives it a sum with the right even/odd parity, which is 1/2.

In fact, at a high level, this proof is essentially the same as the previous one, with the difference that we make the underlying counting problem explicit. Suppose that the rth equation has t terms; then there are 2^t possible assignments to the variables in this equation. We want to claim that 2^(t-1) assignments produce an even sum, and 2^(t-1) produce an odd sum, which will show that E[Xr] = 1/2. We prove this by induction on t. For t = 1, there are just two assignments, one of each parity; and for t = 2, we already proved this earlier by considering all 2^2 = 4 possible assignments. Now suppose the claim holds for an arbitrary value of t - 1. Then there are exactly 2^(t-1) ways to get an even sum with t variables, as follows:

o 2^(t-2) ways to get an even sum on the first t - 1 variables (by induction), followed by an assignment of 0 to the tth, plus
o 2^(t-2) ways to get an odd sum on the first t - 1 variables (by induction), followed by an assignment of 1 to the tth.

The remaining 2^(t-1) assignments give an odd sum, and this completes the induction step.

Once we have E[Xr] = 1/2, we conclude as in part (a): Linearity of expectation gives us E[X] = Σr E[Xr] = k/2 ≥ c*/2.

Exercises

3-Coloring is a yes/no question, but we can phrase it as an optimization problem as follows.

Suppose we are given a graph G = (V, E), and we want to color each node with one of three colors, even if we aren't necessarily able to give different colors to every pair of adjacent nodes. Rather, we say that an edge (u, v) is satisfied if the colors assigned to u and v are different.

Consider a 3-coloring that maximizes the number of satisfied edges, and let c* denote this number. Give a polynomial-time algorithm that produces a 3-coloring that satisfies at least (2/3)c* edges. If you want, your algorithm can be randomized; in this case, the expected number of edges it satisfies should be at least (2/3)c*.

Consider a county in which 100,000 people vote in an election. There are only two candidates on the ballot: a Democratic candidate (denoted D) and a Republican candidate (denoted R). As it happens, this county is heavily Democratic, so 80,000 people go to the polls with the intention of voting for D, and 20,000 go to the polls with the intention of voting for R.

However, the layout of the ballot is a little confusing, so each voter, independently and with probability 1/100, votes for the wrong candidate--that is, the one that he or she didn't intend to vote for. (Remember that in this election, there are only two candidates on the ballot.)

Let X denote the random variable equal to the number of votes received by the Democratic candidate D, when the voting is conducted with this process of error. Determine the expected value of X, and give an explanation of your derivation of this value.

In Section 13.1, we saw a simple distributed protocol to solve a particular contention-resolution problem. Here is another setting in which randomization can help with contention resolution, through the distributed construction of an independent set.

Suppose we have a system with n processes. Certain pairs of processes are in conflict, meaning that they both require access to a shared resource. In a given time interval, the goal is to schedule a large subset S of the processes to run--the rest will remain idle--so that no two conflicting processes are both in the scheduled set S. We'll call such a set S conflict-free.

One can picture this process in terms of a graph G = (V, E) with a node representing each process and an edge joining pairs of processes that are in conflict. It is easy to check that a set of processes S is conflict-free if and only if it forms an independent set in G. This suggests that finding a maximum-size conflict-free set S, for an arbitrary conflict graph G, will be difficult (since the general Independent Set Problem is reducible to this problem). Nevertheless, we can still look for heuristics that find a reasonably large conflict-free set. Moreover, we'd like a simple method for achieving this without centralized control: Each process should communicate with only a small number of other processes and then decide whether or not it should belong to the set S.

We will suppose for purposes of this question that each node has exactly d neighbors in the graph G. (That is, each process is in conflict with exactly d other processes.)

(a) Consider the following simple protocol.

  Each process Pi independently picks a random value xi; it sets xi to 1 with
  probability 1/2 and sets xi to 0 with probability 1/2. It then decides to enter
  the set S if and only if it chooses the value 1, and each of the processes
  with which it is in conflict chooses the value 0.

Prove that the set S resulting from the execution of this protocol is conflict-free. Also, give a formula for the expected size of S in terms of n (the number of processes) and d (the number of conflicts per process).

The choice of the probability 1/2 in the protocol above was fairly arbitrary, and it's not clear that it should give the best system performance. A more general specification of the protocol would replace the probability 1/2 by a parameter p between 0 and 1, as follows.

  Each process Pi independently picks a random value xi; it sets xi to 1
  with probability p and sets xi to 0 with probability 1 - p. It then decides
  to enter the set S if and only if it chooses the value 1, and each of the
  processes with which it is in conflict chooses the value 0.
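The protocol above is completely specified, so it is easy to experiment with; the sketch below is ours, not part of the exercise. It runs one round on a conflict graph given as adjacency lists and returns the resulting set S; the example graph and the choice of p are arbitrary.

  import random

  def run_protocol(adj, p=0.5):
      # adj: process -> list of conflicting processes.
      # Each process picks x = 1 with probability p; it enters S only if its own
      # value is 1 and every process it conflicts with picked 0.
      x = {v: 1 if random.random() < p else 0 for v in adj}
      return {v for v in adj if x[v] == 1 and all(x[u] == 0 for u in adj[v])}

  # Example: 12 processes, each in conflict with its two neighbors on a ring (d = 2).
  ring = {i: [(i - 1) % 12, (i + 1) % 12] for i in range(12)}
  sizes = [len(run_protocol(ring, p=0.5)) for _ in range(1000)]
  print(sum(sizes) / len(sizes))      # empirical average size of S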
Figure 13.6 Towns T1, T2, ..., Tn need to decide how to share the cost of the cable.
way. If we have two clauses such that one consists of just the term turned away. If the seller rejects the bid, buyer i departs and the bid
is withdrawn; and only then does the seller see any future buyers.
xi, and the other consists of just the negated term ~, then this is a
pretty direct contradiction. Suppose an item is offered for sale, and there are n buyers, each with
Assume that our instance has no such pair of "contacting a distinct bid. Suppose further that the buyers appear in a random order,
clauses"; that is, for no variable xg do we have both a clause C and that the seller knows the number n of buyers. We’d like to design
and a clause C’ = {Y~}. Modify the randomized procedure above to im- a strategy whereby the seller has a reasonable chance of accepting the
prove the approximation factor from 1/2 to at least .6. That is, change highest of the n bids. By a strategy, we mean a rule by which the seller
the algorithm so that the expected number of clauses satisfied by the decides whether to accept each presented bid, based only on the value of
process is at least .6k. n and the seciuence of bids seen so far.
For example, the seller could always accept the first bid presented.
(c) Give a randomized polynomial-time algorithm for the general MAX
SAT Problem, so that the expected number of clauses satisfied by the This results in the seller accepting the highest of the n bids with probabil-
ity only l/n, since it requires the highest bid to be the first one presented.
algorithm is at least a .6 fraction of the maximum possible.
(Note that, by the example in part (a), there are instances where Give a strategy under which the seller accepts the highest of the n bids
one cannot satisfy more than k/2 clauses; the point here is that with probability at least 1/4, regardless of the value of n. (For simplicity,
we’d still like an efficient algorithm that, in expectation, can satisfy you may assume that n is an even number.) Prove that your strategy
a .6 fraction of the maximum that can be satisfied by an optimal achieves this probabilistic guarantee.
assignment.)
10. Consider a very simple online auction system that works as follows. There
8. Let G = (V, E) be an undirected graph with n nodes and ra edges. For a
are n bidding agents; agent i has a bid b~, which is a positive natural
subset X __c V, we use G[X] to denote the subgraph induced on X--that is,
number. We will assume that all bids b~ are distinct from one another.
the graph whose node set is X and whose edge set consists of all edges
The bidding agents appear in an order chosen uniformly at random, each
of G for which both ends lie in X.
proposes its bid b~ in turn, and at all times the system maintains a variable
We are given a natural number k _< n and are interested in finding a b* equal to the highest bid seen so far. (Initially b* is set to 0.)
set of k nodes that induces a "dense" subgraph of G; we’ll phrase this
What is the expected number of times that b* is updated when this
concretely as follows. Give a polynomial-time algorithm that produces,
process is executed, as a function of the parameters in the problem?
for a given natural number k _< n, a set X _ V ink(k-l)
of k nodes with the property
edges.
that the induced subgraph G[X] has at least ~ Example. Suppose b1 = 20, b2 = 25, and b3 = 10, and the bidders arrive in
You may give either (a) a deterministic algorithm, or (b) a randomized the order 1, 3, 2. Then b* is updated for 1 and 2, but not for 3.
algorithm that has an expected running time that is polynomial, ahd that
11. Load balancing algorithms for parallel or distributed systems seek to
//~only outputs correct answers. spread out collections of computing jobs over multiple machines. In this
9./Suppose you’re designing strategies for selling items on a popular aucn. on
way, no one machine becomes a "hot spot." If some kind of central
J Web site. Unlike other auction sites, this one uses a one-pass auc~on, coordination is possible, then the load can potentially be spread out
in which each bid must be immediately (and irrevocably) accepted or
almost perfectly. But what if the jobs are coming from diverse sources
refused. Specifically, the site works as follows. that can’t coordinate? As we saw in Section 13.10, one option is to assign
¯ First a seller puts up an item for sale. them to machines at random and hope that this randomization wil! work
¯ Then buyers appear in sequence. to prevent imbalances. Clearly, this won’t generally work as well as a
¯ When buyer i appears, he or she makes a bid b~ > O. perfectly centralized solution, but it can be quite effective. Here we try
¯ The seller must decide immediately whether to accept the bid or not. analyzing some variations and extensions on the simple load balancing
If the seller accepts the bid, the item is sold and all future buyers are heuristic we considered in Section 13.10.
each job Jg, exactly n of the basic processes associated with y~ have been
Suppose you have k machines, and k jobs show up for processing. assigned to each of the two machines. An assignment of basic processes
Each job is assigned to one of the k machines independently at random to machines will be called nearly balanced if, for each job y~, no more
(with each machine equally likely). than -~n of the basic processes associated with J~ have been assigned to
(a) Let N(k) be the expected number of machines that do not receive any the same machine.
jobs, so that N(k)/k is the expected fraction of machines with nothing (a) Show that for arbitrarily large values of n, there exist sequences of
to do. What is the value of the lkntt lirnk-~ N(k)/k? Give a proof of jobs I1 ..... Jn for which no perfectly balanced assignment exists.
your answer. Suppose that n >_ 200. Give an algorithm that takes an arbitrary se-
(b) Suppose that machines are not able to queue up excess jobs, so if the quence o~ jobs J1 ..... Yn and produces a nearly balanced assignment
random assignment of jobs to machines sends more than one job to of basic processes to machines. Your algorithm may be randomized,
a machine M, then iV! will do the first of the jobs it receives and reject in which case its expected running time should be polynomial, and
the rest. Let R(k) be the expected number of rejected jobs; so it should always produce the correct answer.
is the expected fraction of rejected jobs. What is limk~oo R(k)/k? Give
a proof of your answer. 15. Suppose you are presented with a very large set $ of real numbers, and
(c) Now assume that machines have slightlylarger buffers; each machine you’d like to approximate the median of these numbers by sampling. You
M will do the first two jobs it receives, and reject anY additional jobs. may assume all the mtmbers in S are distinct. Let n = tS[; we ~ say that
Let R2(k) denote the expected number of rejected jobs under this .rule. a number x is an e-approxJmate median of S if at least (½ - e)n numbers
What is lirn~_~oo R2(k)/k? Give a proof of your answer. in S are less than x, and at least (½ - e)n numbers in S are greater than x.
Consider an algorithm that works as follows. You select a subset
12. Consider the following analogue of Karger’s algorithm for finding mini- S’ _~ S tmiformly at random, compute the median of S’, and return this
mum s-t cuts. We will contract edges iteratively using the following ran- as an approximate median of S. Show that there is an absolute constant
domlzed procedure. In a given iteration, let s and t denote the possibly c, independent of n, so that ff you apply this algorithm with a sample S’
contracted nodes that contain the original nodes s and t, respectively. To of size c, then with probability at least .99, the number returned will be
make sure that s and t do not get contracted, at each iteration we delete a (.05)-approximate median of S. (You may consider either the version of
any edges connecting s and t and select a random edge to contract among the algorithm that constructs S’ by sampling with replacement, so that an
the remaining edges. Give an example to show that the probability that element of S can be selected multiple times, or one without replacement.)
this method finds a minimum s-t cut can be exponentially small.
16. Consider the following (partially specified) method for transmitting a
13. Consider a balls-and-bins experiment with 2n balls but only two bins. message securely between a sender and a receiver. The message ~11 be
As usual, each ball independently selects one of the two bins, bqth bins represented as a string of bits. Let ~ = {0, 1}, and let Z* denote the set of
equally likely. The expected number of balls in each bin is n. In this all strings of 0 or more bits (e.g., 0, 00, 1110001 a ~*). The "empty string,"
problem, we explore the question of how big their difference is likely to with no bits, will be denoted x ~ ~*.
be. Let X1 and X2 denote the number of balls in the two bins, respectively.
(X1 and X2 are random variables.) Prove that for any e > 0 there is a The sender and receiver share a secret function f : ~* x ~ -+ ~. That
is, f takes a word and a bit, and returns a bit. When the receiver gets a
constant c > 0 such that the probability Pr IX1 - X2 > c4~ <_ e.
sequence of bits ~ ~ ~*, he or she runs the following, method to decipher
it.
14. Some people designing parallel physical simulations come to you with
the following problem. They have a set P of k basic processes and want to
assign each process to run on one of two machines, M1 and M2. They are Let a=ala2-..~n, where
then going to run a sequence of n jobs, Yl ..... Jn. Each job Ji is represented The goal is to produce an ~-bit deciphered message,
by a set Pi c__ P of exactly 2n basic processes which must be running
(each on its assigned machine)while the job is proc_ess.ed..A~, assi..gf~n~ e~nt Set /3
of basic processes to machines will be called perfectly oalancea it,
We wi~ be interested in the expected cost of a vertex cover selected by
n
this algorithm.
Set fli=f(fll fl2"’’fli-l,~i)
Eadfor (a) Is this algorithm a c-approximation algorithm for the Minimum
Output fl Weight Vertex Cover Problem for some constant c? Prove your an-
swer.
One could view this is as a type of "stream cipher with feedback." One Is this algorithm a c-approximation algorithm for the M~h~imum Cardi-
problem with this approach is that, if any bit a~ gets corrupted in trans- nality Vertex Cover Problem for some constant c? Prove your answer.
mission, it will corrupt the computed value of ~1 for all ] > i. (Hint; For an edge, let Pe denote the probability that edge e is
We consider the following problem. A sender S wants to transmit the selected as an uncovered edge in this algorithm. Can you express
Rk. With each the expected value of the solution in terms of these probabilities? To
one, he shares a different secret function f~. Thus he sends a different bound the value of an optimal solution in terms of the Pe probabilities,
encrypted message ~I0 to each receiver, so that a~0 decrypts to fl when try to bound the sum of the probabilities for the edges incident to a
the above algorithm is run with the function f(0. given vertex v, namely, Pe.)
e incident to u
Unfortunately, the communication channels are very noisy, so each of
the n bits in each of the k transmissions is independently corrupted (i.e.,
flipped to its complement) with probability 1/4. Thus no single receiver Notes and Further Reading
on his or her own is likely to be able to decrypt the message corrhctly.’
Show, however, that ff k is large enough as a function of n, then the k The use of randomization in algorithms is an active research area; the books
receivers can jointly reconstruct the plain-text message in the following by Motwani and Raghavan (1995) and Mitzenmacher and Upfal (2005) are
way. They get together, and without revealing any of the ~0 or the f~0, devoted to this topic. As the contents of this chapter make clear, the types
they interactively run an algorithm that will produce the correct fl with of probabilistic arguments used in the study of basic randomized algorithms
probability at least 9/!0. (How large do you need k to be inyour algorithm?) often have a discrete, combinatorial flavor; one can get background in this
style of probabilistic analysis from the book by Feller (1957).
17. Consider the following simple model of gambling in the presence of bad The use of randomization for contention resolution is common in many
odds. At the beginning, your net profit is 0. You play for a sequence of n systems and networking applications. Ethernet-style shared communication
rounds; and in each round, your net profit increases by 1 with probability media, for example, use randomized backoff protocols to reduce the number
1/3, and decreases by 1 with probability 2/3. of collisions among different senders; see the book by Bertsekas and Gallager
Show that the expected number of steps in which your net profit is (1992) for a discussion of this topic.
positive can be upper-bounded by an absolute constant, independent of The randomized algorithm for the Minimum-Cut Problem described in the
the value of n. text is due to Karger, and afier further optimizafions due to Karger and Stein
(1996), it has become one of the most efficient approaches to the minimum
18. In this problem, we will consider the following simple randomized algo- cut problem. A number of further extensions and applications of the algorithm
rithm for the Vertex Cover Algorithm. appear in Karger’s (1995) Ph.D. thesis.
The approximation algorithm for MAX 3-SAT is due to Johnson (!974), in
Start with S=~ a paper that contains a number of early approximation algorithms for NP-hard
While S is not a vertex cover, problems. The surprising punch line to that section--that every instance of 3-
Select au edge e not covered by S SAT has an assignment satisfying at least 7/8 of the clauses--is an example
Select one end of e at random (each end equally likely) of the probabilistic method, whereby a combinatorial structure with a desired
Add the selected node to $ property is shown to exist simply by arguing that a random structure has
Endwhile the property with positive probability. This has grown into a highly refined
technique in the area of combinatorics; the book by A_lon and Spencer (2000)
covers a wide range of its applications.
Hashing is a topic that remains the subject of extensive study, in both
theoretical and applied settings, and there are many variants of the basic
method. The approach we focus on in Section 13.6 is due to Carter and Wegman Epilogue: Algorithms That Run
(1979). The use of randomization for finding the closest pair of points in the
plane was originally proposed by Rabin (1976), in an influential early paper Forever
that exposed the power of randomization in many algorithmic settings. The
algorithm we describe in this chapter was developed by Golin et al. (1995).
The technique used there to bound the number of dictionary operations, in
which one sums the expected work over all stages of the random order, is
sometimes referred to as backwards analysis; this was originally proposed
by Chew (1985) for a related geometric problem, and a number of further
applications of backwards analysis are described in the survey by Seidel (1993).
The performance guarantee for the LRU caching algorithm is due to Sleator Every decade has its addictive puzzles; and if Rubik’s Cube stands out as the
and Tarjan (1985), and the bound for the Randomized Marking algorithm is
preeminent solita~e recreation of the early 1980s, then Tetris evokes a similar
due to Fiat, Karp, Luby, McGeoch, Sleator, and Young (1991). More generally,
nostalgia for the late eighties and early nineties. Rubik’s Cube and Tetris have a
the paper by Sleator and Tarian highlighted the notion of online algorithms, number of things in common--they share a highly mathematical flavor, based
which must process input without knowledge of the future; caching is one
on stylized geometric forms--but the differences between them are perhaps
of the fundamental applications that call for such algorithms. The book by more interesting.
Borodin and E1-Yaniv (1998) is devoted to the topic of online algorithms and
includes many further results on caching in particular. Rubik’s Cube is a game whose complexity is based on an enormous search
space; given a scrambled configuration of the Cube, you have to apply an
There are many ways to formulate bounds of the type in Section 13.9,
intricate sequence of operations to reach the ultimate goal. By contrast, Tetris--
showing that a sum of 0-1-valued independent random variables is unlikely to
deviate far from its mean. Results of this flavor are generally called Ctzeraoff in its pure form--has a much fuzzier definition of success; rather than aiming
for a particular endpoint, you’re faced with a basically infinite stream of events
bounds, or Chernoff-Hoeffding bounds, after the work of Chernoff (1952)
to be dealt with, and you have to react continuously so as to keep your head
and Hoeffding (1963). The books by A!on and Spencer (1992), Motwani and
above water.
Raghavan (1995), and Mitzenmacher and Upfa! (2005) discuss these kinds of
bounds in more detail and provide further applications. These novel features of Tetris parallel an analogous set of themes that has
The results for packet routing in terms of congestion and dilation are emerged in recent thinking about algorithms. Increasingly, we face settings in
due to Leighton, Maggs, and Rao (1994). Routing is another area in which which the standard view of algorithms--in which one begins with an input,
randomization can be effective at reducing contention and hot spots; the book runs for a finite number of steps, and produces an output--does not ready
apply. Rather, if we think about Internet touters that move packets while
by Leighton (1992) covers many further applications of this principle.
avoiding congestion, or decentralized file-sharing mechanisms that replicate
Notes on the Exercises Exercise 6 is based on a result of Benny Chor and
and distribute content to meet user demand, or machine learning routines
Madhu Sudan; Exercise 9 is a version of the Secretary Problem, whose popu- that form predictive models of concepts that change over time, then we are
larization is often credited to Martin Gardner. dealing with algorithms that effectively are designed to ran forever. Instead
of producing an eventual output, they succeed if they can keep up with an
environment that is in constant flux and continuously throws new tasks at
them. For such applications, we have shifted from the world of Rubik’s Cube
to the world of Tetris.
There are many settings in which we could explore this theme, and as our
final topic for the book we consider one of the most compelling: the design of
algorithms for high-speed packet switching on the Internet.
_____------ 02
~ The Problem
A packet traveling from a source to a destination on the Internet can be thought
of as traversing a path in a large graph whose nodes are switches and whose
edges are the cables that link switches together. Each packet p has a header
from which a switch can determine, when p arrives on an input lJ_nk, the output
link on which p needs to depart. The goal of a switch is thus to take streams of
packets arriving on its input links and move each packet, as quickly as possible,
to the particular output link on which it needs to depart. How quickly? In high- Figure E.1 A switch with n = 4 inputS and outputs. In one time step, packets p, q, and r
volume settings, it is possible for a packet to arrive on each input link once have arrived.
ever~ few tens of nanoseconds; if they aren’t offloaded to their respective
output links at a comparable rate, then traffic wil! back up and packets wil!
be dropped. So, in Figure E. 1, the given time step could end with packets p and q having
In order to think about the algorithms operating inside a switch, we model ’ departed on their output links, and with packet r sitting in the output buffer
the switch itself as follows. It has n input links I1 ..... In and n output links 03. (In discussing this example here and below, we’ll assume that q is favored
On. Packets arrive on the input links; a given packet p has an associated over r when decisions are made.) Under this model, the switch is basically
input/output type (I[p], O[p]) indicating that it has arrived at input link I[p] a "fricfionless" object through which packets pass unimpeded to their output
and needs to depart on output link O[p]. Time moves in discrete steps; in each buffer.
step, at most one new packet arrives on each input link, and at most one In reality, however, a packet that arrives on an input link must be copied
packet can depart on each output link. over to its appropriate output link, and this operation requires some processing
Consider the example in Figure E.1. In a single time step, the three packets that ties up both the input and output links for a few nanoseconds. So, rea!ly,
p, q, and r have arrived at an empty switch on input links I1, 13, and I4, constraints within the switch do pose some obstacles to the movement of
respectively. Packet p is destined for 01, packet q is destined for 03, and packet packets from inputs to outputs.
r is also destined for 03. Now there’s no problem sending packet p out on link The most restrictive model of these constraints, input/output queueing,
O1; but only one packet can depart on link 03, and so the switch has to resolve works as follows. We now have an input buffer for each input link I, as
the contention between q and r. How can it do this? well as an output buffer for each output link O. When each packet arrives, it
The simplest model of switch behavior is known as pure output queueing, immediately lands in its associated input buffer. In a single time step, a switch
and it’s essentially an idealized picture of how we wished a switch behaved. can read at most one packet from each input buffer and write at most one
In this model, all nodes that arrive in a given time step are placed in an output packet to each output buffer. So under input/output queueing, the example of
buffer associated with their output link, and one of the packets in each output Figure E.1 would work as follows. Each of p, q, and r would arrive in different
buffer actually gets to depart. More concretely, here’s the model of a single input buffers; the switch could then move p and q to their output buffers, but
time step. it could not move all three, since moving al! three would involve writing two
packets into the output buffer 03. Thus the first step would end with p and
One step trader pure output queueing: q having departed on their output links, and r sitting in the input buffer 14
Packets arrive on input links (rather than in the output buffer 03).
Each packet p of type (I~], 0~]) is moved to output buffer 0~] More generally, the restriction of limited reading and writing amounts to
At most one packet departs from each output buffer the following: If packets Pl ..... p~ are moved in a single time step from input
More generally, the restriction of limited reading and writing amounts to the following: If packets p1, ..., pk are moved in a single time step from input buffers to output buffers, then all their input buffers and all their output buffers must form a bipartite matching. Thus we can model a single time step as follows.

One step under input/output queueing:
  Packets arrive on input links and are placed in input buffers
  A set of packets whose types form a matching are moved to their associated output buffers
  At most one packet departs from each output buffer

The choice of which matching to move is left unspecified for now; this is a point that will become crucial later.

So under input/output queueing, the switch introduces some "friction" on the movement of packets, and this is an observable phenomenon: if we view the switch as a black box, and simply watch the sequence of departures on the output links, then we can see the difference between pure output queueing and input/output queueing. Consider an example whose first step is just like Figure E.1, and in whose second step a single packet s of type (I4, O4) arrives. Under pure output queueing, p and q would depart in the first step, and r and s would depart in the second step. Under input/output queueing, however, the sequence of events depicted in Figure E.2 occurs: At the end of the first step, r is still sitting in the input buffer I4, and so, at the end of the second step, one of r or s is still in the input buffer I4 and has not yet departed. This conflict between r and s is called head-of-line blocking, and it causes a switch with input/output queueing to exhibit inferior delay characteristics compared with pure output queueing.

Figure E.2 Parts (a) and (b) depict a two-step example in which head-of-line blocking occurs.
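As a quick check of this black-box difference, the following sketch (an illustration in the same toy representation as before, not code from the text) runs the two-step Figure E.2 scenario with one matching move per time step, using FIFO input buffers, which is precisely what produces head-of-line blocking.

  # Two-step run of the Figure E.2 scenario: p, q, r arrive in step 1,
  # and s of type (I4, O4) arrives in step 2.  One matching move per step.
  def io_queueing(steps):
      in_buf, out_buf, departures = {}, {}, []
      for arrivals in steps:
          for name, inp, out in arrivals:                  # arrivals land in input buffers
              in_buf.setdefault(inp, []).append((name, out))
          used_in, used_out = set(), set()
          for inp, buf in list(in_buf.items()):            # move one matching of head packets
              if buf and inp not in used_in and buf[0][1] not in used_out:
                  name, out = buf.pop(0)
                  used_in.add(inp); used_out.add(out)
                  out_buf.setdefault(out, []).append(name)
          departures.append([buf.pop(0) for buf in out_buf.values() if buf])
      return departures

  steps = [[("p", "I1", "O1"), ("q", "I3", "O3"), ("r", "I4", "O3")],
           [("s", "I4", "O4")]]
  print(io_queueing(steps))   # [['p', 'q'], ['r']]; s is still blocked behind r at I4

It reports departures [p, q] and then [r], with s still waiting behind r at I4, whereas pure output queueing would deliver [p, q] and then [r, s].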
Simulating a Switch with Pure Output Queueing  While pure output queueing would be nice to have, the arguments above indicate why it's not feasible to design a switch with this behavior: In a single time step (lasting only tens of nanoseconds), it would not generally be possible to move packets from each of n input links to a common output buffer.

But what if we were to take a switch that used input/output queueing and ran it somewhat faster, moving several matchings in a single time step instead of just one? Would it be possible to simulate a switch that used pure output queueing? By this we mean that the sequence of departures on the output links (viewing the switch as a black box) should be the same under the behavior of pure output queueing and the behavior of our sped-up input/output queueing algorithm.

It is not hard to see that a speed-up of n would suffice: If we could move n matchings in each time step, then even if every arriving packet needed to reach the same output buffer, we could move them all in the course of one step. But a speed-up of n is completely infeasible; and if we think about this worst-case example, we begin to worry that we might need a speed-up of n to make this work: after all, what if all the arriving packets really did need to go to the same output buffer?

The crux of this section is to show that a much more modest speed-up is sufficient. We'll describe a striking result of Chuang, Goel, McKeown, and Prabhakar (1999), showing that a switch using input/output queueing with a speed-up of 2 can simulate a switch that uses pure output queueing.
Intuitively, the result exploits the fact that the behavior of the switch at an internal level need not resemble the behavior under pure output queueing, provided that the sequence of output link departures is the same. (Hence, to continue the example in the previous paragraph, it's okay that we don't move all n arriving packets to a common output buffer in one time step; we can afford more time for this, since their departures on this common output link will be spread out over a long period of time anyway.)

Designing the Algorithm
Just to be precise, here's our model for a speed-up of 2.

One step under sped-up input/output queueing:
  Packets arrive on input links and are placed in input buffers
  A set of packets whose types form a matching are moved to their associated output buffers
  At most one packet departs from each output buffer
  A set of packets whose types form a matching are moved to their associated output buffers
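Read as code, one sped-up step consists of the four phases in the order just listed. The sketch below is a minimal illustration of that structure under our earlier toy representation; the names sped_up_step and choose_matching are placeholders we introduce here, since which matchings to move is exactly the question taken up next.

  # One time step with a speed-up of 2: arrivals, a matching move, departures,
  # a second matching move.  choose_matching stands in for the selection rule
  # developed in the text; it must return {input_link: (packet, output_link)}
  # with all output links distinct, so that the moved set is a matching.
  def sped_up_step(arrivals, in_buf, out_buf, choose_matching):
      for name, inp, out in arrivals:                      # packets land in input buffers
          in_buf.setdefault(inp, []).append((name, out))

      def move(matching):                                  # copy a matching across the switch
          for inp, (name, out) in matching.items():
              in_buf[inp].remove((name, out))
              out_buf.setdefault(out, []).append(name)

      move(choose_matching(in_buf, out_buf))               # first matching move
      # At most one packet departs from each output buffer; in the full algorithm
      # the departing packet is the one whose time to leave has come up, and
      # popping the front of each buffer is just a simplification here.
      departed = [buf.pop(0) for buf in out_buf.values() if buf]
      move(choose_matching(in_buf, out_buf))               # second matching move
      return departed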
In order to prove that this model can simulate pure output queueing, we need to resolve the crucial underspecified point in the model above: Which matchings should be moved in each step? The answer to this question will form the core of the result, and we build up to it through a sequence of intermediate steps. To begin with, we make one simple observation right away: If a packet of type (I, O) is part of a matching selected by the switch, then the switch will move the packet of this type that has the earliest time to leave.

Maintaining Input and Output Buffers  To decide which two matchings the switch should move in a given time step, we define some quantities that track the current state of the switch relative to pure output queueing. To begin with, for a packet p, we define its time to leave, TL(p), to be the time step in which it would depart on its output link from a switch that was running pure output queueing. The goal is to make sure that each packet p departs from our switch (running sped-up input/output queueing) in precisely the time step TL(p).

Conceptually, each input buffer is maintained as an ordered list; however, we retain the freedom to insert an arriving packet into the middle of this order, and to move a packet to its output buffer even when it is not yet at the front of the line. Despite this, the linear ordering of the buffer will form a useful progress measure. Each output buffer, by contrast, does not need to be ordered; when a packet's time to leave comes up, we simply let it depart. We can think of the whole setup as resembling a busy airport terminal, with the input buffers corresponding to check-in counters, the output buffers to the departure lounge, and the internals of the switch to a congested security checkpoint. The input buffers are stressful places: If you don't make it to the head of the line by the time your departure is announced, you could miss your time to leave; to mitigate this, there are airport personnel who are allowed to helpfully extract you from the middle of the line and hustle you through security. The output buffers, by way of contrast, are relaxing places: You sit around until your time to leave is announced, and then you just go. The goal is to get everyone through the congestion in the middle so that they depart on time.

One consequence of these observations is that we don't need to worry about packets that are already in output buffers; they'll just depart at the right time. Hence we refer to a packet p as unprocessed if it is still in its input buffer, and we define some further useful quantities for such packets. The input cushion IC(p) is the number of packets ordered in front of p in its input buffer. The output cushion OC(p) is the number of packets already in p's output buffer that have an earlier time to leave. Things are going well for an unprocessed packet p if OC(p) is significantly greater than IC(p); in this case, p is near the front of the line in its input buffer, and there are still a lot of packets before it in the output buffer. To capture this relationship, we define Slack(p) = OC(p) - IC(p), observing that large values of Slack(p) are good.

Here is our plan: We will move matchings through the switch so as to maintain the following two properties at all times.
  (i) Slack(p) >= 0 for all unprocessed packets p.
  (ii) In any step that begins with IC(p) = OC(p) = 0, packet p will be moved to its output buffer in the first matching.

We first claim that it is sufficient to maintain these two properties.

(E.1) If properties (i) and (ii) are maintained for all unprocessed packets at all times, then every packet p will depart at its time to leave TL(p).

Proof. If p is in its output buffer at the start of step TL(p), then it can clearly depart. Otherwise it must be in its input buffer. In this case, we have OC(p) = 0 at the start of the step. By property (i), we have Slack(p) = OC(p) - IC(p) >= 0, and hence IC(p) = 0. It then follows from property (ii) that p will be moved to the output buffer in the first matching of this step, and hence will depart in this step as well.

It turns out that property (ii) is easy to guarantee (and it will arise naturally from the solution below), so we focus on the tricky task of choosing matchings so as to maintain property (i).

Moving a Matching through a Switch  When a packet p first arrives on an input link, we insert it as far back in the input buffer as possible (potentially somewhere in the middle) consistent with the requirement Slack(p) >= 0. This makes sure property (i) is satisfied initially for p.
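Assuming TL(p) has been computed on the side (say, by simulating pure output queueing in parallel), the cushions and the arrival rule just described can be expressed compactly. The packet layout and helper names below are our own illustrative choices, not the book's.

  # Cushion bookkeeping; packets are dicts like {"name": "r", "out": "O3", "tl": 2},
  # where "tl" is the packet's time to leave under pure output queueing.
  def input_cushion(p, in_buf):            # IC(p): packets ordered in front of p
      return in_buf.index(p)

  def output_cushion(p, out_buf):          # OC(p): packets in p's output buffer with earlier TL
      return sum(1 for q in out_buf.get(p["out"], []) if q["tl"] < p["tl"])

  def slack(p, in_buf, out_buf):           # Slack(p) = OC(p) - IC(p)
      return output_cushion(p, out_buf) - input_cushion(p, in_buf)

  def insert_arrival(p, in_buf, out_buf):
      # Place p as far back as possible while keeping Slack(p) >= 0: since IC(p)
      # equals the insertion position, the deepest legal position is OC(p).
      pos = min(output_cushion(p, out_buf), len(in_buf))
      in_buf.insert(pos, p)

With this representation, inserting at position min(OC(p), len(buffer)) gives IC(p) <= OC(p) at the moment of arrival, which is exactly the initial guarantee of property (i) that the insertion rule is designed to provide.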
By moving two stable matchings in each time step, according to the preferences just defined, the switch is able to simulate the behavior of pure output queueing.

Overall, the algorithm makes for a surprising last-minute appearance by the topic with which we began the book: rather than matching men with women or applicants with employers, we find ourselves matching input links to output links in a high-speed Internet router.

This has been one glimpse into the issue of algorithms that run forever, keeping up with an infinite stream of new events. It is an intriguing topic, full of open directions and unresolved issues. But that is for another time, and another book; and, as for us, we are done.
References

B. Bollobás. Modern Graph Theory. Springer-Verlag, 1998.
A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, 1998.
A. Borodin, M. N. Nielsen, and C. Rackoff. (Incremental) priority algorithms. Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 752-761, 2002.
Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. International Conference on Computer Vision, pp. 377-384, 1999.
L. J. Carter and M. L. Wegman. Universal classes of hash functions. J. Computer and System Sciences 18:2 (1979), 143-154.
B. V. Cherkassky, A. V. Goldberg, and T. Radzik. Shortest paths algorithms: Theory and experimental evaluation. Proc. 5th ACM-SIAM Symposium on Discrete Algorithms, pp. 516-525, 1994.
H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23 (1952), 493-509.
L. P. Chew. Building Voronoi diagrams for convex polygons in linear expected time. Technical Report, Dept. of Math and Computer Science, Dartmouth College, Hanover, NH, 1985.
Y. J. Chu and T. H. Liu. On the shortest arborescence of a directed graph. Sci. Sinica 14 (1965), 1396-1400.
S.-T. Chuang, A. Goel, N. McKeown, and B. Prabhakar. Matching output queueing with a combined input output queued switch. IEEE J. on Selected Areas in Communications, 17:6 (1999), 1030-1039.
V. Chvátal. A greedy heuristic for the set covering problem. Mathematics of Operations Research, 4 (1979), 233-235.
R. Downey and M. Fellows. Parameterized Complexity. Springer-Verlag, 1999.
Z. Drezner (ed.). Facility Location. Springer-Verlag, 1995.
R. Duda, P. Hart, and D. Stork. Pattern Classification (2nd edition). Wiley, 2001.
M. E. Dyer and A. M. Frieze. A simple heuristic for the p-centre problem. Operations Research Letters, 3 (1985), 285-288.
J. Edmonds. Minimum partition of a matroid into independent subsets. J. Research of the National Bureau of Standards B, 69 (1965), 67-72.
J. Edmonds. Optimum branchings. J. Research of the National Bureau of Standards, 71B (1967), 233-240.
J. Edmonds. Matroids and the greedy algorithm. Math. Programming 1 (1971), 127-136.
J. Edmonds and R. M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the ACM 19:2 (1972), 248-264.
L. Euler. Solutio problematis ad geometriam situs pertinentis. Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), 128-140.
R. M. Fano. Transmission of Information. M.I.T. Press, 1949.
W. Feller. An Introduction to Probability Theory and Its Applications, Vol. 1. Wiley, 1957.
A. Fiat, R. M. Karp, M. Luby, L. A. McGeoch, D. D. Sleator, and N. E. Young. Competitive paging algorithms. J. Algorithms 12 (1991), 685-699.
R. W. Floyd. Algorithm 245 (TreeSort). Communications of the ACM, 7 (1964), 701.
L. R. Ford. Network Flow Theory. RAND Corporation Technical Report P-923, 1956.
L. R. Ford and D. R. Fulkerson. Flows in Networks. Princeton University Press, 1962.
D. Gale. The two-sided matching problem: Origin, development and current issues. International Game Theory Review, 3:2/3 (2001), 237-252.
D. Gale and L. Shapley. College admissions and the stability of marriage. American Mathematical Monthly 69 (1962), 9-15.
M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.
M. Garey, D. Johnson, G. Miller, and C. Papadimitriou. The complexity of coloring circular arcs and chords. SIAM J. Algebraic and Discrete Methods, 1:2 (June 1980), 216-227.
M. Ghallab, D. Nau, and P. Traverso. Automated Planning: Theory and Practice. Morgan Kaufmann, 2004.
M. X. Goemans and D. P. Williamson. The primal-dual method for approximation algorithms and its application to network design problems. In Approximation Algorithms for NP-Hard Problems, edited by D. S. Hochbaum. PWS Publishing, 1996.
A. Goldberg. Efficient Graph Algorithms for Sequential and Parallel Computers. Ph.D. thesis, MIT, 1986.
A. Goldberg. Network Optimization Library. http://www.avglab.com/andrew/soft.html.
A. Goldberg, É. Tardos, and R. E. Tarjan. Network flow algorithms. In Paths, Flows, and VLSI-Layout, edited by B. Korte et al. Springer-Verlag, 1990.
A. Goldberg and R. Tarjan. A new approach to the maximum flow problem. Proc. 18th ACM Symposium on Theory of Computing, pp. 136-146, 1986.
M. Golin, R. Raman, C. Schwarz, and M. Smid. Simple randomized algorithms for closest pair problems. Nordic J. Comput., 2 (1995), 3-27.
M. C. Golumbic. Algorithmic Graph Theory and Perfect Graphs. Academic Press, 1980.
R. L. Graham. Bounds for certain multiprocessing anomalies. Bell System Technical Journal 45 (1966), 1563-1581.
R. L. Graham. Bounds for multiprocessing timing anomalies. SIAM J. Applied Mathematics 17 (1969), 263-269.
R. L. Graham and P. Hell. On the history of the minimum spanning tree problem. Annals of the History of Computing, 7 (1985), 43-57.
M. Granovetter. Threshold models of collective behavior. American Journal of Sociology 83:6 (1978), 1420-1443.
D. Greig, B. Porteous, and A. Seheult. Exact maximum a posteriori estimation for binary images. J. Royal Statistical Society B, 51:2 (1989), 271-278.
D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
D. R. Gusfield and R. W. Irving. The Stable Marriage Problem: Structure and Algorithms. MIT Press, 1989.
L. A. Hall. Approximation algorithms for scheduling. In Approximation Algorithms for NP-Hard Problems, edited by D. S. Hochbaum. PWS Publishing, 1996.
P. Hall. On representation of subsets. J. London Mathematical Society 10 (1935), 26-30.
S. Haykin. Neural Networks: A Comprehensive Foundation (2nd ed.). Macmillan, 1999.
D. S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Communications of the ACM 18 (1975), 341-343.
D. S. Hochbaum. Approximation algorithms for the set covering and vertex cover problems. SIAM J. on Computing, 11:3 (1982), 555-556.
D. S. Hochbaum (ed.). Approximation Algorithms for NP-Hard Problems. PWS Publishing, 1996.
D. S. Hochbaum. Approximating covering and packing problems: set cover, vertex cover, independent set and related problems. In Approximation Algorithms for NP-Hard Problems, edited by D. S. Hochbaum. PWS Publishing, 1996.
D. S. Hochbaum and D. B. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research 10:2 (1985), 180-184.
D. S. Hochbaum and D. B. Shmoys. Using dual approximation algorithms for scheduling problems: Theoretical and practical results. Journal of the ACM 34 (1987), 144-162.
W. Hoeffding. Probability inequalities for sums of bounded random variables. J. American Statistical Association, 58 (1963), 13-30.
J. Hopfield. Neural networks and physical systems with emergent collective computational properties. Proc. National Academy of Sciences of the USA, 79 (1982), 2554-2588.
D. A. Huffman. A method for the construction of minimum-redundancy codes. Proc. IRE 40:9 (Sept. 1952), 1098-1101.
A. Jain and R. Dubes. Algorithms for Clustering Data. Prentice Hall, 1981.
T. R. Jensen and B. Toft. Graph Coloring Problems. Wiley Interscience, 1995.
D. S. Johnson. Approximation algorithms for combinatorial problems. J. of Computer and System Sciences, 9 (1974), 256-278.
M. Jordan (ed.). Learning in Graphical Models. MIT Press, 1998.
A. Karatsuba and Y. Ofman. Multiplication of multidigit numbers on automata. Soviet Physics Doklady, 7 (1962), 595-596.
D. Karger. Random Sampling in Graph Optimization Problems. Ph.D. thesis, Stanford University, 1995.
D. R. Karger and C. Stein. A new approach to the minimum cut problem. Journal of the ACM 43:4 (1996), 601-640.
N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:4 (1984), 373-396.
R. M. Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, edited by R. Miller and J. Thatcher, pp. 85-103. Plenum Press, 1972.
B. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 49:2 (1970), 291-307.
S. Keshav. An Engineering Approach to Computer Networking. Addison-Wesley, 1997.
L. Khachiyan. A polynomial algorithm in linear programming. Soviet Mathematics Doklady, 20:1 (1979), 191-194.
S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi. Optimization by simulated annealing. Science, 220:4598 (1983), 671-680.
J. Kleinberg. Approximation Algorithms for Disjoint Paths Problems. Ph.D. thesis, MIT, 1996.
J. Kleinberg and É. Tardos. Disjoint paths in densely embedded graphs. Proc. 36th IEEE Symposium on Foundations of Computer Science, pp. 52-61, 1995.
D. E. Knuth. The Art of Computer Programming, Vol. 1: Fundamental Algorithms (3rd edition). Addison-Wesley, 1997a.
D. E. Knuth. The Art of Computer Programming, Vol. 2: Seminumerical Algorithms (3rd edition). Addison-Wesley, 1997b.
D. E. Knuth. Stable Marriage and Its Relation to Other Combinatorial Problems. CRM Proceedings and Lecture Notes, vol. 10. American Mathematical Society, 1997c.
D. E. Knuth. The Art of Computer Programming, Vol. 3: Sorting and Searching (3rd edition). Addison-Wesley, 1998.
V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 26:2 (2004), 147-159.
D. König. Über Graphen und ihre Anwendung auf Determinantentheorie und Mengenlehre. Mathematische Annalen, 77 (1916), 453-465.
B. Korte, L. Lovász, H. J. Prömel, and A. Schrijver (eds.). Paths, Flows, and VLSI-Layout. Springer-Verlag, 1990.
E. Lawler. Combinatorial Optimization: Networks and Matroids. Dover, 2001.
E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys. The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley, 1985.
E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys. Sequencing and scheduling: Algorithms and complexity. In Handbooks in Operations Research and Management Science 4, edited by S. C. Graves, A. H. G. Rinnooy Kan, and P. H. Zipkin. Elsevier, 1993.
F. T. Leighton. Introduction to Parallel Algorithms and Architectures. Morgan Kaufmann, 1992.
F. T. Leighton, B. M. Maggs, and S. Rao. Packet routing and job-shop scheduling in O(congestion + dilation) steps. Combinatorica, 14:2 (1994), 167-186.
D. Lelewer and D. S. Hirschberg. Data compression. Computing Surveys 19:3 (1987), 261-297.
J. K. Lenstra, D. Shmoys, and É. Tardos. Approximation algorithms for scheduling unrelated parallel machines. Mathematical Programming, 46 (1990), 259-271.
L. Levin. Universal search problems (in Russian). Problemy Peredachi Informatsii, 9:3 (1973), 265-266. For a partial English translation, see B. A. Trakhtenbrot, A survey of Russian approaches to Perebor (brute-force search) algorithms, Annals of the History of Computing 6:4 (1984), 384-400.
L. Lovász. On the ratio of the optimal integral and fractional covers. Discrete Mathematics 13 (1975), 383-390.
S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Implementations. Wiley, 1990.
D. H. Mathews and M. Zuker. RNA secondary structure prediction. In Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics, edited by P. Clote. Wiley, 2004.
K. Mehlhorn and St. Näher. The LEDA Platform of Combinatorial and Geometric Computing. Cambridge University Press, 1999.
K. Menger. Zur allgemeinen Kurventheorie. Fundam. Math. 10 (1927), 96-115.
K. Menger. On the origin of the n-arc theorem. J. Graph Theory 5 (1981), 341-350.
N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. J. Chemical Physics 21 (1953), 1087-1092.
M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
D. Monderer and L. Shapley. Potential games. Games and Economic Behavior 14 (1996), 124-143.
R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
J. F. Nash, Jr. Equilibrium points in n-person games. Proc. National Academy of Sciences of the USA, 36 (1950), 48-49.
S. B. Needleman and C. D. Wunsch. J. Molecular Biology 48 (1970), 443-455.
G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Optimization. Wiley, 1988.
J. Nešetřil. A few remarks on the history of MST-problem. Archivum Mathematicum Brno, 33 (1997), 15-22.
M. Newborn. Kasparov versus Deep Blue: Computer Chess Comes of Age. Springer-Verlag, 1996.
R. Nowakowski (ed.). Games of No Chance. Cambridge University Press, 1998.
M. Osborne. An Introduction to Game Theory. Oxford University Press, 2003.
C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1995.
C. H. Papadimitriou. Algorithms, games, and the Internet. Proc. 33rd ACM Symposium on Theory of Computing, pp. 749-753, 2001.
S. Plotkin. Competitive routing in ATM networks. IEEE J. Selected Areas in Communications, 1995, pp. 1128-1136.
F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, 1985.
W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in C. Cambridge University Press, 1988.
M. O. Rabin. Probabilistic algorithms. In Algorithms and Complexity: New Directions and Recent Results, edited by J. Traub, pp. 21-59. Academic Press, 1976.
B. Reed. Tree width and tangles, a new measure of connectivity and some applications. In Surveys in Combinatorics, edited by R. Bailey. Cambridge University Press, 1997.
N. Robertson and P. D. Seymour. An outline of a disjoint paths algorithm. In Paths, Flows, and VLSI-Layout, edited by B. Korte et al. Springer-Verlag, 1990.
R. W. Rosenthal. The network equilibrium problem in integers. Networks 3 (1973), 53-59.
S. Ross. Introduction to Stochastic Dynamic Programming. Academic Press, 1983.
T. Roughgarden. Selfish Routing. Ph.D. thesis, Cornell University, 2002.
T. Roughgarden. Selfish Routing and the Price of Anarchy. MIT Press, 2004.
S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach (2nd edition). Prentice Hall, 2002.
D. Sankoff. The early introduction of dynamic programming into computational biology. Bioinformatics 16:1 (2000), 41-47.
J. E. Savage. Models of Computation. Addison-Wesley, 1998.
W. Savitch. Relationships between nondeterministic and deterministic tape complexities. J. Computer and System Sciences 4 (1970), 177-192.
T. Schaefer. On the complexity of some two-person perfect-information games. J. Computer and System Sciences 16:2 (April 1978), 185-225.
T. Schelling. Micromotives and Macrobehavior. Norton, 1978.
A. Schrijver. On the history of the transportation and maximum flow problems. Math. Programming 91 (2002), 437-445.
R. Seidel. Backwards analysis of randomized geometric algorithms. In New Trends in Discrete and Computational Geometry, edited by J. Pach, pp. 37-68. Springer-Verlag, 1993.
M. I. Shamos and D. Hoey. Closest-point problems. Proc. 16th IEEE Symposium on Foundations of Computer Science, pp. 151-162, 1975.
C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. University of Illinois Press, 1949.
M. Sipser. The history and status of the P versus NP question. Proc. 24th ACM Symposium on the Theory of Computing, pp. 603-618, 1992.
D. D. Sleator and R. E. Tarjan. Amortized efficiency of list update and paging rules. Communications of the ACM, 28:2 (1985), 202-208.
M. Smid. Closest-point problems in computational geometry. In Handbook of Computational Geometry, edited by J. Rüdiger Sack and J. Urrutia, pp. 877-935. Elsevier Science Publishers, B.V. North-Holland, 1999.
J. W. Stewart. BGP4: Inter-Domain Routing in the Internet. Addison-Wesley, 1998.
L. Stockmeyer and A. K. Chandra. Provably difficult combinatorial games. SIAM J. on Computing 8 (1979), 151-174.
L. Stockmeyer and A. Meyer. Word problems requiring exponential time. Proc. 5th Annual ACM Symposium on Theory of Computing, pp. 1-9, 1973.
É. Tardos. Network games. Proc. 36th ACM Symposium on Theory of Computing, pp. 341-342, 2004.
Classification via local search, NP-completeness, 487-490 Computational biology Conditional expectation, 724 randomization in, 782-784 ex
ChangeKey operation as optimization problem, 782 ex RNA Secondary Structure
681-682 Conditional probability, 771-772 Context-free grammars, 272
for heaps, 65 Circular-Arc Coloring Problem, 563 Prediction Problem, 272-273
algorithm analysis for, 687-689 Conditions, in planning problems, Contingency planning, 535
for Prim’s Algorithm, 150 algorithm for, 275-278
algorithm design for, 683-687 algorithm analysis for, 572 534, 538 Contraction Algorithm
for shortest paths, 141-142
notes, 706 algorithm design for, 566-571 notes, 335 Configurations analyzing, 716-718
Chao, T., 207 problem, 273-275
problem, 682-683 notes, 598 in Hopfield neural networks, 671, designing, 715-716
Character encoding. See Huffman
Clause gadgets, 483-484 problem, 563-566 sequence alignment. See Sequence 676, 700, 702-703 ex for number of global minimum
codes Graph Coloring Problem, 485-486, alignment
Clauses with Boolean variables, in planning problems, 538-539 cuts, 718-719
Character sets, 162 Computational complexity. See
459-460 499 Conflict graphs, 16 Control theory, 335
Characterizations
Cleary, J. G., 206 chromatic number in, 597 ex Computational .intractability; Contacts Convergence of probability functions,
notes, 529
Clock signals, 199 ex computational complexity of, Computational tractability in 3-SAT Problem, 461 711
in NP and co-NP, 496-497 486-487 Computational geometry
Clones ’R’ Us exercise, 309-311 ex contention resolution for, 782- Convolutions, 234
Charged particles, 247-248 ex closest pair of points, 226, 741
Close to optimal solutions, 599 notes, 529 784 ex algorithms for, 238-242
Check reconciliation, 430 ex
Closest-Pair algorithm, 230 NP-completeness, 487-490 notes, 249 in Interval Scheduling Problem, computing, 237-238
Cherkassky, Boris V., 336
Closest pair of points, 209, 225 for partitioning, 499 Computational intractability, 451-452 118 problem, 234-237
Chernoff, H., 794
algorithm for Combinatorial auctions, 511 ex Circuit Satisfiability Problem, Congestion Conway, J. H., 551
Chernoff bounds, 758-760 466-470
analyzing, 231 Combinatorial structure of spanning in Minimum Spanning Tree Cook, S. A., NP-completeness, 467,
for load balancing, 762
designing, 226-230 trees, 202-203 ex efficient certification in, 463-466 Problem, 150 529, 543
for packet routing, 767-769 of packet schedule paths, 765
Chernoff-Hoeffding bounds, 794 notes, 249 Common running times, 47-48 Graph Coloring Problem, 485-486 Cook reduction, 473
problem, 226 cubic, 52-53 computational complexity of, Conjunction with Boolean variables, Cooling schedule in simulated
Chess, 535 486-487
randomized approach, 741-742 linear, 48-50 459 annealing, 669-670
Chew, L. PI, 794
algorithm analysis for, 746-747 O(n log n), 50-51 notes, 529 Connected components, 82-83 Corner-to-comer paths for sequence
Children
algorithm design for, 742-746 O(nk), 53-54 NP-completeness, 487-490 Connected undirected graphs, 76-77 alignment, 284-285, 287-288
in heaps, 59-61
linear expected running time for, quadratic, 51-52 numerical problems, 490 Connectivity in graphs, 76-79 Cost function in local search, 663
in trees, 77
748-750 sublinear, 56 in scheduling, 493-494 breadth-first search for, 79-82 Cost-sharing
Chor, Benny, 794
notes, 794 Communication networks Subset Sum Problem, 491-495 connected components in, 82-83, for apartment expenses, 429-430 ex
Chromatic number. See Coloring
problem, 742 graphs as models of, 74-75 partitioning problems, 481-485 86-87, 94 for edges, 690
Problems
running time of, 51-52 switching in, 26-27 ex, 796-804 polynomial-time reductions, depth-first search for, 83-86 for Internet services, 690-700,
Chromosomes
Clustering, 157-158 Compatibility 452-454 directed graphs, 97-99 785-786ex
DNA, 279 Independent Set in, 454-456
formalizing, 158, 515-516 ex of configurations, 516-517 ex Conservation conditions Coniomb’s Law, 247-248 ex
in genome mapping, 521 ex,
greedy algorithms for of labelings and preflows, 358 Turing, 473 for flows, 339 Counting inversions, 222-223, 246 ex
787 ex
analyzing, 159-161 of prices and matchings, 408 Vertex Cover in, 456-459 for preflows, 357 Counting to infinity, 300-301
Chu, Y. J., 206
designing, 157-158 Compatible intervals, 116, 253 Satisfiability Problem, 459-463 Consistent check reconciliation, Coupon collecting example, 722-724
Chuang, S.-T., 799
notes, 206 Compatible requests, 13,116, 118-119 sequencing problems, 473-474 430 ex Course Management System (CMS),
Chvatal, V., 659 Hamiltonian Cycle Problem,
problem, 158 Competitive 3-SAT game, 544-547 Consistent k-coloring, 569 431-433 ex
Circuit Safisfiability Problem
CMS (Course Management System), Competitive Facility Location 474-479 Consistent metrics, 202 ex Cover, T., 206
in NP completeness, 466-470
431-433 ex Problem, 17 Hamiltonian Path Problem, Consistent truth assignment, 592 ex Coverage Expansion Problem,
relation to PSPACE-completeness,
Co-NP, 495-496 games in, 536-537 480-481 Constraint Satisfaction Problems 424-425 ex
543
Circular-Arc Coloring Problem, 563 for good characterization, 496-497 in PSPACE completeness, 544-547 Traveling Salesman Problem, in 3-SAT, 500 Covering problems, 455-456, 498
in PSPACE, 532-533 Compiler design, 486 474, 479 in Lecture Planning Problem, Covering radius in Center Selection
algorithms for
Coalition, 500-502 ex Complementary base-pairing in DNA, Computational tractability, 29-30 503 ex Problem, 607-608, 700-702 ex
analyzing, 572
Cobham, A., 70 273-275 efficiency in, 30-31 Constraints in Linear Programming Crew scheduling, 387
designing, 566-571
Coherence property, 575 Complementary events, 710 polynomial time, 32-35 Problem, 632-634 algorithm for
notes, 598
Cohesiveness of node sets, 444ex Complex plane, 239 worst-case running times, 31-32 Consumer preference patterns, 385 analyzing, 390-391
problem, 563-566
Collaborative filtering, 221-222 Complex roots of unity, 239 Compute-Opt algorithm, 255-256 Container packing, 651 ex designing, 389-390
Circulations
Collecting coupons example, 722-724 Component array, 152-153 Computer game-playing Contention resolution, 708-709 problem, 387-389
in Airline Scheduling Problem, 390
Collective human behaviors, Component Grouping Problem, chess, 55! algorithm for Crick, K, 273
with demands, 379-384, 414ex
522-524ex 494-495 PSPACE for, 535-536 analyzing, 709-714 Cross-examination in Lecture
with lower bounds, 382-384, 387,
Collisions in hashing, 736-737 Compression. See Data compression Computer vision, 226, 391,681 designing, 709 Planning Problem, 503 ex
414 ex
Coloring problems Computational steps in algorithms, Concatenating sequences, 308- notes, 793 Cryptosystem, 491
in survey design, 387
3-Coloring Problem 35-36 309 ex, 517 ex problem, 709 Cubic time, 52-53
Citation networks, 75
820 Index Index 821
Cushions in packet switching, 801 Deadlines in planning problems, 54t designing, 374-375 problem, 221-223 for Circular-Arc Coloring, 569-571
Cut Property minimizing lateness, 125-126 Descendants in trees, 77 extensions, 377-378 limitations of, 251 in interval scheduling, 14
characteristics of, 187-188 ex algorithm analysis for, 128-131 Determined variables, 591 ex greedy approximation, 625-627 median-finding, 727 over intervals, 272-273
in Minimum Spanning Tree algorithm design for, 126-128 DFS. See Depth-first search (DFS) greedy pricing, 628-630 algorithm analysis for, 730-731 algorithm for, 275-278
Problem, 146-149 algorithm extensions for, 131 Diagonal entries in matrices, 428 ex notes, 449, 659 algorithm design for, 728-730 problem, 273-275
Cuts. See Minimum cuts notes, 206 Diameter of networks, 109-110 ex NP-omplete version of, 527 ex problem, 727-728 for Knapsack Problem, 266-267,
Cycle Cover Problem, 528 ex in schedulable jobs, 334 ex Dictionaries problem, 374, 624-625 Mergesort Algorithm, 210-211 645, 648
Cycle Property in NP-complete scheduling hashing for, 734 for undirected graphs, 377-378, approaches to, 211-212 algorithm analysis for, 270-271
characteristics of, 187-188 ex problems, 493,500 data structure analysis for, 597ex substitutions in, 213-214 algorithm design for, 268-270
in Minimum Spanning Tree Decentralized algorithm for shortest 740-741 Disjunction with Boolear~. variables, unrolling recurrences in, 212-213 algorithm extension for, 271-272
Problem, 147-149 paths, 290-291 data structure design for, 459 Quicksort, 731-734 for Maximum-Weight Independent
Cytosine, 273 Decision-making data, 5!3-514ex 735-740 Disks in memory hierarchies, 132 related recurrences in, 220-221 Set Problem, 561-562
Decision problem problem, 734-735 Distance function sequence alignment notes, 335
D for efficient certification, 463 sequence alignment in, 278-279 in clustering, 158 algorithm analysis for, 282-284 in planning problems, 543
DAGs (directed acyclic graphs), vs. optimization version, 454 Diestel, R. for biological sequences, 279-280, algorithm design for, 281-282 principles of, 258-260
99-104 Decision variables in Weighted Vertex graph theory, 113 652 ex problem, 278-281 Segmented Least Squares Problem,
algorithm for, 101-104 Cover problem, 634 tree decomposition, 598 Distance vector protocols subproblems in, 215-220 261
problem, 100-!01 Decisive Subset Problem, 513-514 ex Differentiable functions, minimizing, description, 297-300 DNA, 273-275 algorithm analysis for, 266
topological ordering in, 104ex, Decomposition 202 ex, 519-520 ex problems with, 300-301 genome mapping, 521 ex algorithm design for, 264-266
107 ex path, 376 Dijkstra, Edsger W., 137, 206 Distances RNA. See RNA Secondary Structure problem, 261-264
Daily .Special Scheduling Problem, tree. See Tree decompositions Dijkstra’s Algorithm in breadth-first search, 80 Prediction Problem for sequence alignment. See
526 ex Deep Blue program in Minimum-Cost Perfect Matching in Center Selection Problem, sequence alignment for, 279 Sequence alignment
Das, Gautam, 207 in chess matches, 535 Problem, 408 606-607 Dobkin, David, 207 for shortest paths in graphs. See
Dashes in Morse code, 163 notes, 552 for paths, 137-!41,143,290, 298 for closest pair of points, 226, Doctors Without Weekends, Shortest Path Problem
Data compression, 162 Degrees Dilation of paths in packet schedules, 743-745 412-414 ex, 425-426 ex using tree decompositions, 580-584
greedy algorithms for of nodes, 88 765 between graph nodes, 77 Domain Decomposition Problem, Weighted Interval Scheduling
analyzing, 173-175 of polynomials, 40 Dinitz, A., 357 in Minimum Spanning Tree 529 ex Problem, 252
designing, 166-173 Delete lists in planning problems, Directed acyclic graphs (DAGs), Problem, 150 Dominating Set Problem algorithm design, 252-256
extensions, 175-177 534, 538 99-104 in networks, 109-110ex Minimum-Cost, 597ex memoized recursion, 256-257
notes, 206 Delete operation algorithm for, 101-104 in Traveling Salesman Problem, in wireless networks, 776-779 ex
problem, 162-166 for dictionaries, 735-736, 738 problem, 100-101 479 definition, 519 ex E
Data mining for heaps, 62, 64-65 topological ordering in, 101,104 ex, Distinct edge costs, 149 Dormant nodes in negative cycle Earliest Deadline First algorithm,
for event sequences, 190 ex for linked lists, 44-45 107 ex Distributed systems, 708 detection, 306 127-128
in Segmented Least Squares DeLillo, Don, 400 Directed Disjoint Paths Problem. See Diverse Subset Problem, 505 ex Dots in Morse code, 163 Edahiro, M., 207
Problem, 263 Demands Disjoint Paths Problem Divide-and-Conquer-Alignment Doubly linked lists, 44-45 Edge congestion, 150
for survey design, 385 in circulation, 379-384, 414ex Directed Edge-Disjoint Paths algorithm, 288-289 Douglas, Michael, 115 Edge costs
Data stream algorithms, 48 in survey design, 386 Problem, 374, 624-625 Divide-and-conquer approach, 209, Downey, R., 598 distinct, 149
Data structures Demers, A1, 450 Directed edges for graphs, 73 727 Downstream nodes in flow networks, in Minimum Spanning Tree
arrays, 43-44 Demographic groups, advertising Directed graphs, 73 closest pair of points, 225 429 ex Problem, 143
dictionaries, 734-735 policies for, 422-423 ex connectivity in, 97-99 algorithm analysis for, 231 Downstream points in sharing, 690
in graph traversal, 90-94 Dense subgraphs, 788 ex disjoint paths in, 373-377 algorithm design for, 226-230 communications networks, Edge-disjoint paths, 374-376,
for representing graphs, 87-89 Dependencies in directed acyclic representing, 97 convolutions, 234 26-27 ex 624-625
hashing, 736-741 graphs, 100 search algorithms for, 97 algorithms for, 238-242 Dreyfus, S., 336 Edge lengths in shortest paths, 137,
lists, 44-45 Dependency networks, graphs for, 76 strongly connected, 77, 98-99 computing, 237-238 Drezner, Z., 551,659 290
notes, 70 Depth World Wide Web as, 75 problem, 234-237 Droid Trader! game, 524ex Edge-separation property, 575-577
priority queues. See Priority queues of nodes, 167 Directed Hopfield networks, 672 integer multiplication, 231 Dubes, R., 206 Edges
queues, 90 of sets of intervals, 123-125, Discovering nodes, 92 algorithm analysis for, 233-234 Duda, R., 206 bottleneck, 192 ex
in Stable Matching Problem, 42-47 566-567 Discrete Fourier transforms, 240 algorithm design for, 232-233 Duration of packet schedules, 765 capacity of, 338
stacks, 90 Depth-first search (DFS), 83-86 Disjoint Paths Problem, 373-374, 624 problem, 231-232 Dyer, M. E., 659 in graphs, 13, 73-74
Union-Find, 151-157 for directed graphs, 97-98 algorithms for inversions in, 221 Dynamic programming, 251-252 in Minimum Spanning Tree
De Berg, M., 250 implementing, 92-94 analyzing, 375-377 algorithms for, 223-225 for approximation, 600 Problem, 142-150
Index 823
Index
Graphs (cont.) designing and analyzing, Hamiltonian Cycle Problem, 474 Highway billboard placement, with depth, 437-438ex Input buffers in packet switching,
queues and stacks for traversing, 133-136 description, 474-475 307-309 ex local search, 681-682 797-801
89-90 extensions, 136-137 NP-completeness of, 475-479 Hill-climbing algorithms, 703 ex problem, 392-393 Input cushions in packet switching,
representing, 87-89 pricing methods in Disjoint Paths Hamiltonian Path Problem, 480 Hirschberg Daniel S., 206 < tool design for, 436-438ex 801
shortest paths in. See Shortest Path Problem, 624 NP-completeness of, 480-481 Histograms with convolution, 237 Implicit labels, 248 ex Input/output queueing in packet
Problem analyzing, 626, 628-630 running time of, 596ex Hitting Set Problem Inapproximability, 660 switching, 797
topological ordering in, 101-104 designing, 625-626, 628 Hard problems. See Computational defined, 506-507 ex Independent events, 709-710, Insert operation
algorithm design and analysis problem, 624-625 intractability; NP-hard optimization version, 653 ex 771-772 for closest pair of points, 746-747
for, 101-104 Shortest-First, 649-651 ex problems set size in, 594ex Independent random variables, 758 for dictionaries, 734-736
in DAGs, 104 ex, 107 ex for shortest paths, 137 Harmonic numbers Ho, J., 207 Independent Set Problem, 16-!7, for heaps, 64
problem, 100-101 analyzing, 138-142 in card guessing, 722 Hochbaum, Dorit, 659-660 454 for linked lists, 44-45
trees. See Trees designing, 137-138 in Nash equilibrium, 695 Hoeffding, H., 794 3-SAT reduction to, 460-462 Instability in Stable Matching
Greedy-Balance algorithm, 601-602 Hart, P., 206 Hoey, D., 226 contention resolution with,
Greedy algorithms, 115-116 Problem, 4, 20-25 ex
for Appalachian Trail exercise, Greedy-Disjoint-Paths algorithm, 626 Hartmanis, J., 70 Hoffman, Alan, 449 782-784 ex Integer multiplication, 209, 231
183-185 ex Greedy-Paths-with-Capacity Hash functions, 736-737 Hopcroft, J., 70 with Interval Scheduling Problem, algorithm for
for approximation, 599 algorithm, 628-630 designing, 737-738 Hopfield neural networks, 671 16, 505 ex analyzing, 233-234
Center Selection Problem, Greedy-Set-Cover algorithm, 613-616 universal classes of, 738-740, algorithms for notes, 205 designing, 232-233
,606-612 Greig, D., 449 749-750 analyzing, 674-675 in O(nk) time, 53-54 notes, 250
load balancing, 600-606 Grid graphs Hash tables, 736-738, 760 designing, 672-673 in a path, 312-313 ex problem, 231-232
Set Cover Problem, 612-617 greedy algorithms for, 656-657 ex Hashing, 734 notes, 705 in polynomial-time reductions, Integer programming
Shortest-First, 649-651 ex local minima in, 248-249 ex for closest pair of points, 742, problem, 671-672 454-456 for approximation, 600, 634-636
for clustering for sequence alignment, 283-284 749-750 stable configurations in, 676, 700, running times of, 54-55 for load balancing, 638-639
analyzing, 159-161 Group decision-making data, data structures for 702-703 ex using tree decompositions, 580-584 for Vertex Cover Problem, 634
designing, 157-158 513-514ex analyzing, 740-741 Hospital resident assignments, relation to Vertex Cover, 455-456, Integer Programming Problem,
for data compression, 161-166 Growth order, asymptotic, 35-36 designing, 735-740 23-24 ex 619 633-635
analyzing, 173-175 in common tractions, 40-42 for load balancing, 760-761 Houses, floor plan ergonomics for, Independent sets Integer-valued circulations, 382
designing, 166-173 lower bounds, 37 notes, 794 416-417 ex for grid graphs, 657 ex Integer-valued flows, 351
extensions, 175-177 notes, 70 problem, 734-735 HSoDT! (High-Score-on-Droid- in packing problems, 498 Interference-free schedules, 105 ex
for Interval Scheduling Problem, properties of, 38-40 HayMn, S., 705 Trader! Problem), 525 ex strongly, 519 ex Interference in Nearby
14, 116 tight bounds, 37-38 Head-ofqine blocking in packet Hsu, Y., 207 in trees, 558-560 Electromagnetic Observation
analyzing, 118-121 upper bounds, 36-37 switching, 798-799 Huffman, David A., 170, 206 Indifferences in Stable Matching Problem, 512-513 ex
designing, 116-!18 Guanine, 273 Heads of edges, 73 Huffman codes, 116, 161 Problem, 24-25 ex Interior point methods in linear
extensions, 121-122 Guaranteed close to optimal Heap order, 59-61 greedy algorithms for Inequalities programming, 633
for Interval Coloring, 121-125 solutions, 599 Heapify-down operation, 62-64 analyzing, 173-175 linear Interleaving signals, 329 ex
limitations of, 251 Guessing cards Heapify-up operation, 60-62, 64 designing, 166-173 in Linear Programming Problem, Internal nodes in network models,
for minimizing lateness, 125-126 with memory, 721-722 Heaps, 58-60 extensions, 175-177 631 339
analyzing, 128-131 without memory, 721 operations for, 60-64 notes, 206 for load balancing, 638 Internet touters, 795
designing, 126-128 Gusfield, D. R. for priority queues, 64-65 problem, 162-166 for Vertex Cover Problem, 634 Internet routing
extensions, 131 sequence analysis, 335 for Dijkstra’s Algorithm, 1)t2 Human behaviors, 522-524 ex triangle, 203 ex, 334-335 ex notes, 336
for minimum-cost arborescences, stable matching, 28 for Prim’s Algorithm, 150 Hyperlinks in World Wide Web, 75 Infinite capacities in Project Selection shortest paths in, 297-301
177-179 Guthrie, Francis, 485 Heights of nodes, 358-359 Hypertext fiction, 509-510 ex Problem, 397 Internet services, cost-sharing for,
analyzing, 181-183 Guy, R. K., 551 Hell, P., 206 Infinite sample spaces, 774-776 690-700, 785-786 ex
Hidden surface removal, 248 ex I
designing, 179-181 Influence Maximization Problem, Interpolation of polynomials, in
for Minimum Spanning Tree H Hierarchical agglomerative Ibarra, Oscar H., 660 524 ex Fast Fourier Transform, 238,
Problem, 142-143 Haken, W., 490 clustering, 159 Identifier Selection Problem, 770 Information networks, graphs for, 75 241-242
analyzing, 144-149 Hall, L., 659-660 Hierarchical metrics, 201 ex Idle time in minimizing lateness, Information theory Intersection Interface Problem, 513 ex
designing, 143-144 Hall, P., 449 Hierarchies 128-129 for compression, 169 Interval Coloring Problem, 122-125,
extensions, 150-151 Hall’s Theorem, 372 memory, 131-132 Image Segmentation Problem, notes, 206 566
for NP-hard problems on trees, and Menger’s Theorem, 377 in trees, 78 391-392 Initial conditions in planning from Circular-Arc Coloring
558-560 notes, 449 High-Score-on-Droid-Trader! Problem algorithm for, 393-395 problems, 534, 538 Problem, 566-569
for optimal caching, 131-133 for NP and co-NP, 497 (HSoDT!), 525 ex
Index 827
Interval Coloring Problem (cont.) in Scheduling with Release Times total weights in, 657-658ex Least-Recently-Used (LRU) principle problem, 637-638 algorithm design for, 672-673
notes, 598 and Deadlines, 493 notes, 335, 529 in caching, 136-137, 751-752 notes, 659-660 local optima in, 671
Interval graphs, 205 Johnson, D. S. Knuth, Donald E., 70, 336 notes, 794 for Vertex Cover, 635-637 problem, 671-672
Interval Partitioning Problem, circular arc coloring, 529 recurrences, 249-250 Least squares, Segmented Least Linear Programming Problem, for Maximum-Cut Problem
122-125,566 MAX-SAT algorithm, 793 stable matching, 28 Squares Problem, 261 631-632 approximation, 676-679
Interval Scheduling Problem, 13-14, NP-completeness, 529 Kolmogorov, Vladimir, 449 algorithm for Linear space, sequence alignment in, Metropolis algorithm, 666-669
116 ....... ~ Set Cover algorithm, 659 K6nig, D., 372, 449 analyzing, 266 284 neighbor relations in, 663-664,
decision version of, 505 ex Jordan, M., 598 Korte, B., 659 designing, 264-266 algorithm design for, 285-288 679-681
greedy algorithlns for, 116 Joseph, Deborah, 207 Kruskal’s Algorithm, 143-144 notes, 335 problem, 284-285 notes, 660
for Interval Coloring, 121-125 Junction boxes in communications with clustering, 159-160 problem, 261-264 Linear time, 48-50 optimization problems, 662
analyzing, 118-121 networks, 26-27 ex data structures for Leaves and leaf nodes, in t~ees, 77, for closest pair of points, 748-750 connections to, 663-664
designing, 116-118 pointer-based, 154-155 559 graph search, 87 potential energy, 662-663
extensions, 121-122 K simple, 152-153 Lecture Planning Problem, 502-505 ex Linearity of expectation, 720-724 Vertex Cover Problem, 664-666
Multiple Interval Scheduling, 512 ex K-clustering, 158 improvements, 155-157 LEDA (Library of Efficient Algorithms Linked lists, 44-45 simulated annea~ng, 669-670
notes, 206 K-coloring, 563,569-570 optimality of, 146-147 and Datastructures), 71 Linked sets of nodes, 585-586 Locality of reference, 136, 751
for processors, 197 ex K-flip neighborhoods, 680 problem, 151-152 Lee, Lillian, 336 Lists Location problems, 606, 659
Shortest-First greedy algorithm for, K-L (Kernighan-IAn) heuristic, 681 valid execution of, 193 ex Leighton, F. T., 765, 794 adjacency, 87-89, 93 Logarithms in asymptotic bounds, 41
649-651 ex Kahng, A., 207 Kumar, Amit, 598 Lelewer, Debra, 206 merging, 48-50 Lombardi, Mark, 110 ex
Intervals, dynamic programming Karatsuba, A., 250 Lengths in Stable Matching Algorithm, Lookup operation
Karger, David, 715, 790ex, 793 L of edges and paths in shortest 42-45
over for closest pair of points, 748-749
algorithm for, 275-278 Karmarkar, Narendra, 633 Labeling Problem paths, 137, 290 Liu, T. H., 206 for dictionaries, 735-736, 738
problem, 273-275 Karp, R. M. via local search, 682-688 of paths in Disjoint Paths Problem, Llewellyn, Donna, 250 Loops, running time of, 51-53
Inventory problem, 333 ex augmenting paths, 357 notes, 706 627-628 Lo, Andrew, 336. Lovfisz, L., 659
Inverse Ackermann function, 157 NP-completeness, 529 Labels and labeling of strings, 463 Load balancing Low-Diameter Clustering Problem,
Inversions Randomized Marking algorithm, gap labeling, 445 ex Lenstra, J. K. greedy algorithm for, 600-606 515-516 ex
algorithms for counting, 223-225 794 image, 437-438 ex local search, 705 linear programming for, 637 Lower bounds
in ~g lateness, 128-129 Karl3 reduction, 473 in image segmentation, 393 rounding algorithm, 660 algorithm design and analysis asymptotic, 37
problem, 221-223 Kasparov, Garry, 535 in Preflow-Push Algorithm, scheduling, 206 for, 638-643 circulations with, 382-384, 387,
significant, 246 ex Kempe, D., 530 360-364, 445 ex Levin, L., 467, 529, 543 problem, 637-638 414 ex
Kernighan, B., 681,705 Landscape in local search, 662 Library of Efficient Algorithms and randomized algorithms for,
Investment simulation, 244-246 ex notes, 660
Irving, R. W., 28 Kernighan-IAn (K-L) heuristic, 681 connections to optimization, Datastructures (LEDA), 71 760-762 on optimum for Load Balancing
Ishikawa, Hiroshi, 450 Keshav, S., 336 663-664 Licenses, software, 185-187ex Local minima in local search, Problem, 602-603
Iterative-Compute-Opt algorithm, Keys notes, 705 LIFO (last-in, first-out) order, 90 248-249 e.x, 662, 665 Lowest common ancestors, 96
259 in heaps, 59-61 potential energy, 662-663 Light fixtures, ergonomics of, Local optima LRU (Least-Recently-Used) principle
Iterative procedure in priority queues, 57-58 Vertex Cover Problem, 664- 416-417 ex in Hopfield neural networks, 671 in caching, 136-137, 751-752
for dynamic programming, Khachiyan, Leonid, 632 666 Likelihood in image segmentation, in Labeling Problem, 682-689 notes, 794
258-260 Kim, Chul E., 660 Laptops on wireless networks, 393 in Maximum-Cut Problem, 677-678 Luby, M., 794
for Weighted Interval Scheduling Kirkpatrick, S., 669,705 427-428 ex Limits on approximability, 644 Local search, 661-662 Lund, C., 660
Problem, 252 Kleinberg, J., 659 Last-in, first-out (LIFO) order, 90 Lin, S., 681,705 best-response dynamics as, 690,
Knapsack algorithm, 266-267, Lateness, minimizing, 125-126 Line of best fit, 261-262 693-695 M
J 648-649 algorithms for Linear equations definitions and examples, M-Compute-Opt algorithm, 256-
Jagged funnels in local search, 663 Knapsack-Approx algorithm, 646-647 analyzing, 128-!31 rood 2, 779-782 ex 691-693 257
Jain, A., 206 Knapsack Problem, 266-267, 499 designing, 126-128 solving, 631 Nash equilibria in, 696-700 Maggs, B. M., 765, 794
Jars, stress-testing, 69-70 ex algorithms for extensions for, 131 Linear programming and rounding, problem, 690-691 Magnanti, Thomas L., 449-450
Jensen, T. R., 529, 598 analyzing, 270-271 notes, 206 630-631 questions, 695-696 Magnets, refrigerator, 507-508 ex
Jobs designing, 268-270 in schedulable jobs, 334ex for approximation, 600 classification via, 681-682 Main memory, 132
in Interval Scheduling, 116 extensions, 271-272 Lawler, E. L. general techniques, 631-633 algorithm analysis for, 687-689 MakeDictionary operation
in load balancing, 600, 637-638, approximations, 644 matroids, 207 Integer Programming Problem, algorithm design for, 683-687 for closest pair of points, 745-746
789-790 ex algorithm analysis in, 646-647 NP-completeness, 529 633-635 notes, 706 for hashing, 734
in Scheduling to Minimize algorithm design in, 645-646 scheduling, 206 for load balancing, 637 problem, 682-683 Makespans, 600-605, 654 ex
Lateness, 125-126 problem, 644-645 Layers in breadth-first search, 79-81 algorithm design and analysis Hopfield neural networks, 671 MakeUnionFind operation, 152-156
for, 638-643 algorithm analysis for, 674-675 Manber, Udi, 450
828 Index Index 829
Mapping genomes, 279, 521 ex, for disjoint paths, 376-377 Mehlhorn, K., 71 number of, 718-719 Motwani, R., 793-794 Negation with Boolean variables, 459
787 ex good characterizations via, 497 Memoization, 256 problem, 714-715 Multi-phase greedy algorithms, 177 Negative cycles, 301
Maps of routes for transportation with node capacities, 420-421 ex over subproblems, 258-260 in image segmentation, 393 analyzing, 181-183 algorithms for
networks, 74 Maximum 3-Dimensional Matching for Weighted Interval Scheduling Karger’s algorithm for, 790 ex designing, 179-181 designing and analyzing,
Margins in pretty-printing, 317-319 ex Problem, 656 ex Problem, 256-257 in local search, 684 problem, 177-179 302-304
Marketing, viral, 524 ex Maximum, computing in linear time, Memory hierarchies, 131-132 in Maximum-Flow Problem, 340 Multi-way choices in dynamic extensions, 304-307
Marking algorithms for randomized 48 Menger, K., 377, 449 in networks, 346 programming, 261 in Minimum-Cost Perfect Matching
caching, 750, 752-753 Maximum-Cut Problem in local Menger’s Theorem, 377 algorithm analysis for, 346-348 algorithm for Problem, 406
analyzing, 753-755 search, 676, 683 Merge-and-Count algorithm, 223-225 maximum flow with, 348-352 analyzing, 266 problem, 301-302
notes, 794 algorithms for Mergesort Algorithm, 210-211 notes, 793 designing, 264-266 relation to shortest paths, 291-294
randomized, 755-758 analyzing, 677-679 as example of general approach, in Project Selection Problem, problem, 261-264 Neighborhoods
Martello, S., 335,529 designing, 676-677 211-212 397-399 for shortest paths, 293 in Hopfield neural networks, 677
Matching, 337 for graph partitioning, 680-681 notes, 249 Minimum Spanning Tree Problem, Mniticast, 690 in Image Segmentation Problem,
3-Dimensional Matching Problem Maximum Disjoint Paths Problem, running times for, 50-51 116 Mniticommodity Flow Problem, 382 682
NP-completeness, 481-485 624 recurrences for, 212-214 greedy algorithms for ¯ Multigraphs in Contraction in local search, 663-664, 685-687
polynomial time in, 656ex greedy approximation algorithm Merging analyzing, 144-149 Algorithm, 715 in Maximum-Cut Problem, 680
problem, 481 for, 625-627 inversions in, 22!-225 designing, 143-144 Multiple Interval Scheduling, 512 ex Nemhauser, G. L., 206
4-Dimensional Matching Problem, pricing algorithm for, 628-630 sorted lists, 48-50 extensions, 150-151 Multiplication Nesetril, J., 206
507 ex problem, 624-625 Meta-search tools, 222 notes, 206 integer, 209, 231 Nested loops, running time of, 51-53
base-pair, 274 Maximum-Flow Problem Metropolis, N., 666, 705 problem, 142-143 algorithm analysis for, 233-234 Nesting arrangement for boxes,
in bipartite graphs. See Bipartite algorithm for Metropolis algorithm, 666-669 Minimum spanning trees algorithm design for, 232-233 434-435 ex
Matching Problem analyzing, 344-346 Meyer, A., 543, 551 for clustering, 157-159 notes, 250 Network design, in Minimum
in load balancing, 638 designing, 340-344 Miller, G., 598 membership in, 188 ex problem, 231-232 Spanning Tree Problem,
Minimum-Cost Perfect Matching extensions, 378-379 Minimum-altitude connected Minimum-weight Steiner trees, polynomials via convolution, 235, 142-143, 150
Problem, 405-406 circulations with demands, subgraphs, 199 ex 204 ex, 335ex 238-239 Network flow, 337-338
algorithm design and analysis 379-382 Minimum-bottleneck spanning trees, Minimum Weight Vertex Cover Multivariable Polynomial Airline Schedufing Problem, 387
for, 405-410 circulations with demands and 192 ex Problem, 793 ex Minimization Problem, algorithm analysis for, 390-391
economic interpretation of, lower bounds, 382-384 Minimum Cardinality Vertex Cover Mismatch costs, 280 520 ex algorithm design for, 389-390
410-411 with node capacities, 420-421 ex Problem, 793 ex Mismatches in sequences, 278-280 Mutual teachability, 98-99 problem, 387-389
notes, 449 notes, 448 Minimum-Cost Arborescence Mitzeumacher, M., 793-794 Mutually reachable nodes, 98-99 Baseball Elimination Problem, 400
in packet switching, 798, 801-803 problem, 338-340 Problem, 116, 177 Mobile computing, base stations for, algorithm design and analysis
in sequences, 278-280 Maximum Matching Problem. See greedy algorithms for 417-418 ex N for, 402-403
in Stable Matching Problem. See Bipartite Matching Problem analyzing, 181-183 Mobile robots, 104-106 ex N-node trees, 78 characterization in, 403-404
Stable Matching Problem Maximum spacing, clusterings of, designing, 179-181 Mobile wireless networks, 324-325 ex Nabokov, Vladimir, 107 ex problem, 400-401
Mathews, D. H., 335 158-159 problem, 177-179 Mod 2 linear equations, 779-782 ex Näher, S., 71 Bipartite Matching Problem
Matrices Maximum-Weight Independent Set Minimum-Cost Dominating Set Modified Quicksort algorithm, Nash, John, 692 Disjoint Paths Problem, 373-374
adjacency, 87-89 Problem Problem, 597ex 732-734 Nash equilibria Disjoint Paths Problem, 373-374
entries in, 428 ex using tree decompositions, 572, Minimum-Cost Path Problem. See Molecules definitions and examples, 691-693 algorithm analysis for, 375-377
in linear programming, 631-632 580-584 Shortest Path Problem closest pair of points in, 226 finding, 696-700 algorithm design for, 374-375
Matroids, 207 on trees, 560-562 Minimum-Cost Flow Problem, 449 entropy of, 547-550 ex notes, 706 algorithm extensions for,
MAX-3-SAT Maze-Solving Problem, 78-79 Minimum-Cost Perfect Matching Molecules problem, 690-691 377-378
algorithm design and analysis for, McGeoch, L. A., 794 Problem, 405-406 RNA, 273-274 questions, 695-696 problem, 374
725-726 McGuire, C. B., 706 algorithm design and analysis for, Monderer, D., 706 National Resident Matching Problem, good augmenting paths for, 352
good assignments for, 726-727 McKeown, N., 799 405-410 Monitoring networks, 423-424 ex 3, 23-24 ex algorithm analysis for, 354-356
notes, 793 Median-finding, 209,727 economic interpretation of, 410-411 Monotone formulas, 507 ex Natural brute-force algorithm, 31-32 algorithm design for, 352-354
problem, 724-725 algorithm for notes, 449 Monotone QSAT, 550 ex Natural disasters, 419 ex algorithm extensions for,
random assignment for, 725-726, analyzing, 730-731 Minimum cuts Monotone Satisfiability, 507 ex Nau, Dana, 552 356-357
787 ex designing, 728-730 in Baseball Elimination Problem, Morse, Samuel, 163 Near-trees, 200 ex finding, 412 ex
Max-Flow Min-Cut Theorem, approximation for, 791 ex 403-404 Morse code, 163 Nearby Electromagnetic Observation Image Segmentation Problem,
348-352 problem, 727-728 global, 714 Most favorable Nash equilibrium Problem, 512-513 ex 391-392
for Baseball Elimination Problem, Medical consulting firm, 412-414 ex, algorithm analysis for, 716-718 solutions, 694-695 Needleman, S., 279 algorithm for, 393-395
403 425-426 ex algorithm design for, 715-716
Objective function in Linear Output queueing in packet switching, Path Coloring Problem, 563-565 in linked lists, 44
Network flow (cont.) co-NP and asymmetry in, 495-497 796-797 Path decomposition, 376
Programming Problem, 632 in Union-Find data structure,
Image Segmentation efficient certification in, 463-466 Overlay networks, 784-785 ex Path Selection Problem, 508 ex
Graph Coloring, 485-490 Odd cycles and graph bipartiteness, 154-157
Problem (cont.) Overmars, M., 250 Path vector protocols, 301 Points, closest pairs of. See Closest
problem, 392-393 independent sets, 17 95
Off-center splitters in median-finding, Paths, 76-77 pair of points
Maximum-Flow Problem. See notes, 529,659 P augmenting. See Augmenting paths Politics, gerrymandering in,
numerical problems, 490-495 730
Maximum-Flow Problem P class: See Polynomial time disjoint. See Disjoint Paths Problem 331-332 ex
partitioning problems, 481-485 Offering prices in combinatorial
Preflow-Push Maximum-Flow Packet routing, 762-763 shortest. See Shortest Path Problem Polymer models, 547-550 ex
polynomial-time reductions, auctions, 511 ex
Algorithm, 357 algorithm for Patterns Polynomial Minimization Problem,
452-454 Ofman, Y., 250
algorithm analysis for, 361-365 analyzing, 767-769 in related recurrences, 221 520 ex
Independent Set in, 454-456 Omega notation
algorithm design for, 357-361 designing, 765-767 in unrolling recurrences, 213, 215, Polynomial space. See PSPACE
algorithm extensions for, 365 Turing, 473 in asymptotic order of growth,
37-38 notes, 794 218 Polynomial time, 34, 463-464
algorithm implementation for, Vertex Cover in, 456-459 problem, 763-765
exercise, 66 ex, 68 ex Pauses in Morse code, 163 approximation scheme, 644-645
365-367 proofs for, 470-473 Packet switching
On-line algorithms, 48 Peer-to-peer systems, 784-785 ex in asymptotic bounds, 40-41
Project Selection Problem, 396-399 Satisfiability Problem in, 459- algorithm for
for caching, 751 Peering relationships in as definition of efficiency, 32-35
Networks 463 analyzing, 803-804
for Interval Scheduling Problem, communication networks, 75 in efficient certification, 463
graphs as models of, 75-76 sequencing problems, 473-474
121 designing, 800-803 Perfect Assembly Problem, 521 ex notes, 70-71
neural. See Hopfield neural Hamiltonian Cycle Problem,
notes, 794 problem, 796-800 Perfect matching, 337 reductions, 452-454
networks 474-479
One-pass auction, 788-789 ex Packets, 763 in Bipartite Matching Problem, independent Set in, 454-456
routing in. See Routing in networks Hamiltonian Path Problem,
Open-Pit Mining Problem, 397 Packing problems, 456, 498 14-16, 371-373,404-405 Turing, 473
social, 75-76, 110-111 ex 480-481 Pairs of points, closest. See Closest
Operators in planning problems, 534, in Gale-Shapley algorithm, 8 Vertex Cover in, 456-459
wireless, 108-109 ex, 324-325 ex Traveling Salesman Problem,
538-540 pair of points in Stable Matching Problem, 4-5 Polynomial-time algorithm, 33
Newborn, M., 551-552 474, 479 Papadimitriou, Christos H.
Opportunity cycles, 324 ex Permutations Polynomially bounded numbers,
Nielsen, Morten N., 207 taxonomy of, 497-500
Optimal caching circular arc coloring, 529 of database tables, 439-440 ex subset sums with, 494-495
Node-Disjoint Paths Problem, 597 ex NP-hard problems, 553-554 complexity theory, 551
greedy algorithms for in sequencing problems, 474 Polynomials, recursive procedures
Node-separation property, 575-576 taxonomy of, 497-500
designing and analyzing, game theory, 706 Phases for marking algorithms, for, 240-241
Nodes on trees, 558 Parameterized complexity, 598 752-753 interpolation, 238, 241-242
Circular-Arc Coloring Problem. 133-136
in binary trees, 108 ex Parents in trees, 77 Picard, J., 450 multiplication, 235
See Circular-Arc Coloring extensions, 136-137
central, 429 ex Parsing algorithms for context-free Picnic exercise, 327 ex Porteous, B., 449
Problem notes, 206
degrees of, 88 grammars, 272 Pieces in tree decompositions, 574 Porting software, 433 ex
depth of, 167 decompositions. See Tree problem, 131-133
Optimal prefix codes, 165-166, Partial assignment, 591-594 ex Ping commands, 424 ex Potential functions
discovering, 92 decompositions Partial products in integer
170-173 Pixels in Nash equilibrium, 700
in graphs, 13, 73-74 greedy algorithm for, 558-560 multiplication, 232
Optimal radius in Center Selection compression of images, 176 notes, 706
for heaps, 59-60 Maximum-Weight Independent Partial substitution
Problem, 607-610 in image segmentation, 392-394 for push operations, 364
heights of, 358-359 Set Problem, 560-562 in sequence alignment recurrence,
Optimal schedules in minimizing in local search algorithm, 682 Prabhakar, B., 799 Precedence constraints in Project
linked sets of, 585-586 Vertex Cover Problem, 554-555 289 Placement costs, 323-324 ex Precedence constraints in Project
algorithm analysis for, 557 lateness, 128-131
local minimum, 248 ex in unrolling recurrences, 214, Planning Selection Problem, 396-397
algorithm design for, 555-557 Oral history study, 112 ex
in network models, 338-339 217-219, 243-244 ex contingency, 535 Precedence relations in directed
prices on, 407-410 Null pointers in linked lists, 44 Order of growth, asymptotic. See
Asymptotic order of growth Partial tree decomposition, 588-590 notes, 552 acyclic graphs, 100
in shortest paths, 137 Number Partitioning Problem, 518 ex Partitioning problems, 498-499
Ordered graphs, characteristics of, in PSPACE, 533-535, 538 Preference lists in Stable Matching
Nonadopters in human behaviors, Numerical problems, 490, 499 3-Dimensional Matching Problem, algorithm analysis for, 542-543
313 ex Problem, 4-5
523ex in scheduling, 493-494 481-485
Ordered pairs as representation of algorithm design for, 540-542 Preferences in Stable Matching
Noncrossing conditions in RNA Subset Sum Problem, 491-495 Graph Coloring Problem, 485-486 problem, 538-540
directed graph edges, 73 Problem, 4
base-pair matching, 274 Interval Partitioning Problem, Plot Fulfillment Problem, 510 ex Prefix codes, 164-165
O Ordering, topological, 102
Nondeterministic search, 464n 121-125, 566 Plotkin, S., 659 binary trees for, 166-169
O notation computing, 101
Nonsaturating push operations, local search for, 680-681 P = NP question, 465 optimal, 165-166, 170-173
363-364, 446 ex in asymptotic order of growth, in DAGs, 102, 104ex, 107ex
node deletions in, 102-104 Maximum Cut Problem, 676 Pointer-based structures for Prefix events in infinite sample
Norvig, P., 552 36-38
Orlin, James B., 449-450 notes, 705 Number Partitioning Problem, Pointer graphs in negative cycle Preflow-Push Maximum-Flow
Nowakowski, R., 551 exercise for, 65-66 ex Number Partitioning Problem, Pointer graphs in negative cycle Preflow-Push Maximum-Flow
NP and NP-completeness, 451-452, O(n^2) time, 51-52 Output buffers in packet switching,
518 ex detection algorithm, 304-306 Algorithm, 357
466 O(n^3) time, 52-53 796-801
Segmented Least Squares Problem, Pointers analyzing, 361-365
O(n^k) time, 53-54 Output cushions in packet switching,
Circuit Satisfiability Problem, 263-265 for heaps, 59-60 designing, 357-361
466-470 O(n log n) time, 50-51 801
Preflow-Push Maximum-Flow Probability mass, 769 designing, 536-537 randomization in, 782-784 ex subproblems in, 215-220 problem, 273-275
Algorithm (cont.) Probing nodes, 248 ex extensions, 538 divide-and-conquer approach, 209, in Weighted Interval Scheduling Robertson, N., 598
extensions, 365 Process Naming Problem, 770 monotone, 550 ex 727 Problem, 257 Robots, mobile, 104-106 ex
implementing, 365 Progress measures notes, 551 median-finding, 727-731 Recursive-Multiply algorithm, Rosenbluth, A. W., 666
notes, 449 for best-response dynamics, 697 in PSPACE completeness, 543-545 Quicksort, 731-734 233-234 Rosenbluth, M. N., 666
variants, 444-446 ex in Ford-Fulkerson Algorithm, Quadratic time, 51-52 global minimum cuts, 714 Recursive procedures Rooted trees
Preflows, 357-358 344-345 Quantification in PSPACE, 534-538 algorithm analysis for, 716-718 for depth-first search, 85, 92 arborescences as, 177-179
Preparata, F. P., 249 in Gale-Shapley algorithm, 7-8 Quantifiers in PSPACE completeness, algorithm design for, 715-716 for dynamic programming, for clock signals, 200 ex
Preprocessing for data structures, 43 in Hopfield neural networks, 674 544 number of, 718-719 259-260 description, 77-78
Prerequisite lists in planning Project Selection Problem, 396 Queue management policy, 763 problem, 714-715 for Weighted Interval Scheduling for prefix codes, 166
problems, 534, 538 algorithm for Queues hashing, 734 Problem, 252-256 rounding fractional solutions via,
Press, W. H., 250 analyzing, 398-399 for graph traversal, 89-90 data structure analysis for, Reduced costs of edges, 409 639-643
Pretty-printing, 317-319 ex designing, 397-398 for Huffman’s Algorithm, 175 740-741 Reduced schedules in optimal Roots of unity with convolution, 239
Price of stability problem, 396-397 in packet routing, 763 data structure design for, caching, 134-135 Rosenthal, R. W., 706
in Nash equilibrium, 698-699 Projections of database tables, in packet switching, 796-797 735-740 Reductions Ross, S., 335
notes, 706 439-440 ex priority. See Priority queues problem, 734-735 polynomial-time, 452-454 ROTC picnic exercise, 327 ex
Prices Proposed distances for closest pair of Quicksort, 731-734 for load balancing, 760-762 Turing, Cook, and Karp, 473 Roughgarden, T., 706
economic interpretation of, 410-411 points, 743-745 for MAX-3-SAT, 724-727 in PSPACE completeness, 546 Rounding
fair, 620-621 Protein molecules, 651-652 ex R notes, 793 transitivity of, 462-463 for Knapsack Problem, 645
in Minimum-Cost Perfect Matching Pseudo-code, 35-36 Rabin, M. O., 70, 794 for packet routing, 762-763 Reed, B., 598 in linear programming. See Linear
Problem, 407-410 Pseudo-knotting, 274 Rackoff, Charles, 207 algorithm analysis for, 767-769 Refrigerator magnets, 507-508 ex programming and rounding
Pricing (primal-dual) methods, 206 Pseudo-polynomial time Radio interference, 512-513 ex algorithm analysis for, 767-769 Register allocation, 486 Route maps for transportation
for approximation, 599-600 in augmenting paths, 356-357 Radzik, Tomasz, 336 notes, 794 Relabel operations in preflow, networks, 74
Disjoint Paths Problem, 624-630 efficiency of, 271 Raghavan, P., 793 problem, 763-765 360-364, 445 ex Routing in networks
Vertex Cover Problem, 618-623 in Knapsack Problem, 645 Random assignment probability. See Probability Release times, 137, 493, 500 Routing in networks
notes, 659 in Subset Sum Problem, 491 for linear equations mod 2, random variables and expectations Representative sets for protein game theory in, 690
Primal-dual methods. See Pricing PSPACE, 531-533 780-781 ex in, 719-724 molecules, 651-652 ex definitions and examples,
methods completeness in, 18, 543-547 for MAX-3-SAT problem, 725-726, Randomized caching, 750 Requests in interval schedu!ing, 691-693
Prim’s Algorithm for games, 535-538, 544-547 787 ex marking algorithms for, 752-753 13-14 and local search, 693-695
implementing, 149-150 planning problems in, 533-535, Random variables, 719-720 analyzing, 753-755 Residual graphs, 341-345 Nash equilibria in, 696-700
optimality, 146-147 538 with convolution, 237 notes, 794 in Minimum-Cost Perfect Matching problem, 690-691
for spanning trees, 143-144 algorithm analysis for, 542-543 expectation of, 719-720 randomized, 755-758 Problem, 405 questions, 695-696
Printing, 317-319 ex algorithm design for, 540-542 linearity of expectation, 720-724 notes, 794 for preflows, 358-359 Internet
Priority queues, 57-58 problem, 538-540 Randomized algorithms, 707-708 problem, 750-752 Resource allocation disjoint paths in, 624-625
for Dijkstra’s Algorithm, 141-142 298 787-788 ex, 792-793 ex Rao, S., 765, 794 in Center Selection, 606-607 notes, 336
heaps for. See Heaps Pull-based Bellman-Ford algorithm, 660, 724-727, 779-782 ex, Ranks in Stable Matching Problem, 4 in Bipartite Matching, 14-16 shortest paths in, 297-301
for Huffman’s Algorithm, 175 298 787-788 ex, 792-793 ex Rao, S., 765, 794 in Center Selection, 606-607 notes, 336
notes, 70 Pure output queueing in packet caching. See Randomized caching Ratliff, H., 450 in Interval Scheduling, 13-14, 116 packet, 762-763
for Prim’s Algorithm, 150 switching, 796 Chernoff bounds, 758-760 Rearrangeable matrices, 428 ex in Load Balancing, 600, 637 algorithm analysis for, 767-769
Priority values, 57-58 Push-based Bellman-Ford algorithm, closest pair of points, 741-742 Rebooting computers, 320-322 ex in Wavelength-Division algorithm design for, 765-767
Probabilistic method 298-299 closest pair of points, 741-742 Rebooting computers, 320-322 ex Multiplexing, 563-564 problem, 763-765
for MAX-3-SAT problem, 726 Push-Based-Shortest-Path algorithm, algorithm design for, 742-746 Recurrences and recurrence relations, Resource Reservation Problem, 506 ex Routing requests in Maximum
notes, 793 299 linear expected running time for, 209 Reusing space, 537-538, 541 Disjoint Paths Problem, 624
Probability, 707 Push operations in preflow, 360, 748-750 for divide-and-conquer algorithms, Reverse-Delete Algorithm, 144, RSA cryptosystem, 491
Chernoff bounds, 758-760 446 ex notes, 794 210-211 148-149 Rubik’s Cube
conditional, 771-772 Pushing flow in network models, 341 problem, 742 approaches to, 211-212 Rinnooy Kan, A. H. G., 206 as planning problem, 534
of events, 709-710, 769-770 contention resolution, 708-709 substitutions in, 213-214 Rising trends, 327-328 ex vs. Tetris, 795
probability spaces in algorithm analysis for, 709-714 unrolling recurrences in, RNA Secondary Structure Prediction Run forever, algorithms that
finite, 769-771 QSAT (Quantified 3-SAT), 535-536 algorithm design for, 709 212-213,244ex Problem, 272-273 description, 795-796
infinite, 774-776 algorithm for notes, 793 in sequence alignment, 285-286, algorithm for, 275-278 packet switching
Union Bound in, 772-774 analyzing, 537-538 problem, 709 289-290 notes, 335 algorithm analysis for, 803-804
Run forever, algorithms that (cont.) Daily Special Scheduling Problem, problem, 456-459,612-613 notes, 250 running times for, 50-51
Secondary structure, RNA. See RNA
packet switching (cont.) 526 ex Secondary Structure Prediction relation to Vertex Cover Problem, smoothing, 209, 236 substitutions in, 213-214
algorithm design for, 800-803 interference-free, 105 ex 618-620 Significant improvements in neighbor unrolling recurrences in, 212-213
Problem
problem, 796-800 interval. See Interval Scheduling Segmentation, image, 391-392 Set Packing Problem, 456, 498 labeling, 689 O(n log n) time, 50-51
Running times, 47-48 Problem Seymour, P. D., 598 Significant inversion, 246 ex priority queues for, 58
algorithm for, 393-395
cubic, 52-53 Knapsack Problem. See Knapsack Shamir, Ron, 113 Similarity between strings, 278-279 Quicksort, 731-734
local search in, 681-682
exercises, 65-69 ex Problem problem, 392-393 Shamos, M. I. Simple paths in graphs, 76 topological, 101-104, 104 ex, 107 ex
linear, 48-50 Load Balancing Problem. See Load tool design for, 436-438 ex closest pair of points, 226 Simplex method in linear Source conditions for preflows,
in Maximum-Flow Problem, Balancing Problem divide-and-conquer, 250 programming, 633 358-359
Segmented Least Squares Problem,
344-346 for minimizing lateness. See 261 Shannon, Claude E., 169-170, 206. Simulated annealing Source nodes, 338-339, 690
O(n^k), 53-54 Lateness, minimizing algorithm for Shannon-Fano codes, 169-170 notes, 705 Sources
O(n log n), 50-51 Multiple Interval Scheduling, analyzing, 266 Shapley, Lloyd, 1-3, 28, 706, 786 ex technique, 669-670 in circulation, 379-381
quadratic, 51-52 NP-completeness of, 512 ex Shapley value, 786 ex Single-flip neighborhood in Hopfield in Maximum-Flow Problems, 339
designing, 264-266
sublinear, 56 numerical problems in, 493-494, Sharing neural networks, 677 Space complexity, 531-532
notes, 335
worst-case, 31-32 500 problem, 261-264 apartment expenses, 429-430 ex Single-flip rule in Maximum-Cut Space-Efficient-Alignment algorithm,
Russell, S., 552 optimal caching edge costs, 690 Problem, 680 285-286
segments in, 263
greedy algorithm design and Internet service expenses, 690-700, Single-link clustering, 159, 206 Spacing of clusterings, 158-159
Seheult, A., 449
analysis for, 133-136 785-786ex Sink conditions for preflows, 358-359 Spanning Tree Problem. See
Seidel, R., 794
S-t connectivity, 78, 84 Shmoys, David B. Sink nodes in network models, Minimum Spanning Tree
greedy algorithm extensions for, Selection in median-finding, 728-730
S-t Disjoint Paths Problem, 374 136-137 greedy algorithm for Center 338-339 Sinks in circulation, 379-381 Problem
Self-avoiding walks, 547-550 ex
Sahni, Sartaj, 660 problem, 131-133 Selection, 659 Sinks in circulation, 379-381 Spanning trees
Self-enforcing processes, 1
Sample space, 769, 774-776 in packet routing. See Packet Separation for disjoint paths, 377 rounding algorithm for Knapsack, Sipser, Michael and arborescences. See Minimum-
Sankoff, D., 335 routing Separation penalty in image 660 polynomial time, 70 Cost Arborescence Problem
Satisfiability (SAT) Problem processors, 442-443 ex segmentation, 393,683 scheduling, 206 P = NP question, 529 combinatorial structure of,
3-SAT. See 3-SAT Problem shipping, 25-26 ex Shortest-First greedy algorithm, Six Degrees of Kevin Bacon game, 202-203 ex
Sequence alignment, 278, 280
NP completeness, 466-473 triathalons, 191 ex algorithms for 649-651 ex 448 ex Sparse graphs, 88
relation to PSPACE completeness, for weighted sums of completion Shortest Path Problem, 116, 137, 290 Skeletons of graphs, 517-518 ex Spell-checkers, 279
analyzing, 282-284
543 times, 194-195 ex designing, 281-282 bicriteria, 530 Skew, zero, 201 ex Spencer, J., 793-794
reductions and, 459-463 Schoning, Uwe, 598 distance vector protocols Slack Splitters
for biological sequences, 279-280,
Satisfiable clauses, 459 Schrijver, A., 449 652 ex description, 297-300 in minimizing lateness, 127 in median-finding, 728-730
Satisfying assignments with Boolean Schwartzkopf, O., 250 in linear space, 284 problems, 300-301 in packet switching, 801-802 in Qnicksort, 732
variables, 459 Search space, 32, 47-48 in linear space, 284 Galactic, 527 ex Sleator, D. D. in Quicksort, 732
Saturating push operations, 363-364, greedy algorithms for LRU, 137 Matching Problem, 23-24 ex
Search problem, 284-285
446 ex binary analyzing, 138-142 Randomized Marking algorithm, Stable configurations in Hopfield
notes, 335
Savage, John E., 551 in arrays, 44 problem, 278-281 designing, 137-138 794 neural networks, 671,676,
Savitch, W., 541, 552 in Center Selection Problem, 610 and Segmented Least Squares, with minimum spanning trees, Smid, Michiel, 249 700, 702-703 ex
Scaling behavior of polynomial time, sublinear time in, 56 189 ex Smoothing signals, 209, 236 Stable matching, 4-5
309-311 ex
DD breadth-first, 79-82 Sequencing problems, 473-474, 499 negative cycles in graphs, 301 Social networks Stable Matching Problem, 1, 802-803
Scaling Max-Flow Algorithm, for bipartiteness, 94-96 Hamiltonian Cycle Problem, algorithm design and analysis, as graphs, 75-76 algorithms for
353-356 for connectivity, 79-81 302-304 paths in, 110-111 ex analyzing, 7-9
474-479
Scaling parameter in augmenting for directed graphs, 97-98 problem, 301-302 Social optimum vs. Nash equilibria, designing, 5-6
Hamiltonian Path Problem,
paths, 353 implementing, 90-92 480-481 with negative edge lengths 692-693,699 extensions, 9-12
Scaling phase in Scaling Max-Flow in planning problems, 541 Traveling Salesman Problem, 474, designing and analyzing, Solitaire puzzles, 534 implementing, 45-47
Algorithm, 354-356 for shortest paths, 140 291-294 Sort-and-Count algorithm, 225 lists and arrays in, 42-45
479
Schaefer, Thomas, 552 brute-force, 31-32 Set Cover Problem, 456-459, 498, extensions, 294-297 Sorted-Balance algorithm, 605 exercises, 19-25 ex
Scheduling depth-first, 83-86 612 notes, 206, 335-336 Sorted lists, merging, 48-50 and Gale-Shapley algorithm, 8-9
Airline Scheduling Problem, 387 for connectivity, 83-86 approximation algorithm for problem, 290-291 Sorting notes, 28
algorithm analysis for, 390-391 for directed graphs, 97-98 Signals and signal processing for Load Balancing Problem, problem, 1-5
analyzing, 613-617
algorithm design for, 389-390 implementing, 92-94 designing, 613 clock, 199 ex 604-606 search space for, 32
problem, 387-389 in planning problems, 541 limits on approximability, 644 with convolution, 235-236 Mergesort Algorithm, 210-211 truthfulness in, 27-28 ex
carpool, 431 ex local. See Local search notes, 659 interleaving, 329 ex approaches to, 211-212 Stacks for graph traversal, 89-90
Stale items in randomized marking Strongly connected directed graphs, Survey Design Problem, 384-385 Theta in asymptotic order of growth, Traveling Salesman Problem, 499 Two-Label Image Segmentation,
algorithm, 756-757 77, 98-99 algorithm for 37-38 distance in, 474 391-392, 682
Star Wars series, 526-527 ex Strongly independent sets, 519 ex analyzing, 386-387 Thomas, J., 206 notes, 529
Start nodes in shortest paths, 137 Strongly polynomial algorithms, designing, 386 Thomassen, C., 598 NP-completeness of, 479 U
StartHeap operation, 64 356-357 problem, 385-386 Thresholds running times for, 55-56 Underspecified algorithms
State-flipping algorithm Subgraphs Suspicious Coalition Problem, approximation, 660 Traversal of graphs, 78-79 graph traversal, 83
in Hopfield neural networks, connected, 199 ex 500-502 ex in human behaviors, 523 ex breadth-first search for, 79-82 Ford-Fulkerson, 351-352
673-677 dense, 788 ex Swapping rows in matrices, 428 ex Thymine, 273 connected components via, 82-83, Gale-Shapley, 10
as local search, 683 Sublinear time, 56 Switched data streams, 26-27 ex Tight bounds, asymptotic, 37-38 86-87 Preflow-Push, 361
State flipping neighborhood in Image Subproblems Switching Tight nodes in pricing method, depth-first search for, 83-86 Undirected Edge-Disjoint Paths
Segmentation Problem, 682 in divide-and-conquer techniques, algorithm for 621 Traverso, Paolo, 552 Undirected Edge-Disioint Paths
Statistical mechanics, 663 215-220 analyzing, 803-804 Time-series data mining, 190 ex Traverso, Paolo, 552 Problem, 374
Staying ahead in greedy algorithms, in dynamic programming, 251, designing, 800-803 Time-stamps for transactions, algorithm for, 585-591 Undirected Feedback Set Problem,
115-116 258-260 in communications networks, 196-197ex dynamic programming using, 520ex
in Appalachian Trail exercise, in Mergesort Algorithm, 210 26-27 ex Time to leave in packet switching, 580-584 Undirected graphs, 74
184 ex with Quicksort, 733 problem, 796-800 800 notes, 598 disjoint paths in, 377-378
in Interval Scheduling Problem, for Weighted Interval Scheduling Switching time in Broadcast Time Time-varying edge costs, 202 ex problem, 584-585 disjoint paths in, 377-378
119-120 Problem, 254, 258-260 Problem, 528 ex Timing circuits, 200 ex properties in, 575-580 in image segmentation, 392
for shortest paths, 139 Subsequences, 190 ex Symbols, encoding. See Huffman Toft, B., 598 problem, 584-585 number of global minimum cuts
Stearns, R. E., 70 Subset Sum Problem, 266-267, 491, codes Top-down approach for data defining, 573-575, 578-579 in, 718-719
Steepness conditions for preflows, 499 Symmetry-breaking, randomization compression, 169-170 notes, 598 Unfairness in Gale-Shapley algorithm,
358-359 algorithms for for, 708-709 Topological ordering, 102 Trees, 77-78 9-10
Steiner trees, 204ex, 334-335ex, analyzing, 270-271 computing, 101 and arborescences. See Minimum- Uniform-depth case of Circular Arc
527ex designing, 268-270 T in DAGs, 102, 104 ex, 107 ex Cost Arborescence Problem Coloring, 566-567
Steps in algorithms, 35-36 extensions, 271-272 Tables, hash, 736-738, 760 Toth, P. binary Unimodal sequences, 242 ex
Stewart, John W., 336 hardness in, 493-494 Tails of edges, 73 Knapsack Problem, 335 nodes in, 108ex Union Bound, 709, 712-713
Stewart, Potter, 207 relation to Knapsack Problem, 645, Tardos, É. Subset Sum, 529 for prefix codes, 166-169 for contention resolution, 712-713
Stochastic dynamic programming, 648, 657-658 ex disjoint paths problem, 659 Tours in Traveling Salesman Problem, breadth-first search, 80-81 for load balancing, 761-762
335 NP-completeness of, 492-493 game theory, 706 474 depth-first search, 84-85 for packet routing, 767-768
Stockmeyer, L., 543, 551 with polynomially bounded network flow, 448 Tovey, Craig, 250 in Minimum Spanning Tree in probability, 772-774
Stocks numbers, 494-495 rounding algorithm, 660 Trace data for networked computers, Problem. See Minimum Union-Find data structure, 151-152
investment simulation, 244-246 ex Subsquares for closest pair of points, Target sequences, 309 111 ex Spanning Tree Problem improvements, 155-157
rising trends in, 327-328 ex 743-746 Tarjan, R. E. Tracing back in dynamic NP-hard problems on, 558 pointer-based, 154-157
Stopping points in Appalachian Trail Substitution graph traversal, 113 programming, 257 decompositions. See Tree simple, 152-153
exercise, 183-185 ex in sequence alignment, 289 LRU, 137 Trading in barter economies, decompositions Union operation, 152-154
Stopping signals for shortest paths, in unrolling recurrences, 213-214, online algorithms, 794 521-522 ex Maximum-Weight Independent Universal hash functions, 738-740,
297 217-219, 243-244 ex polynomial time, 70-71 Trading cycles, 324 ex Set Problem, 560-562 749-750
Stork, D., 206 Success events, 710-712 Preflow-Push Algorithm, 449 Traffic of possibilities, 557 Unrolling recurrences
Strategic Advertising Problem, Sudan, Madhu, 794 Taxonomy of NP-completeness, in Disjoint Paths Problem, 373 Tree-width. See Tree decompositions in Mergesort Algorithm, 212-213
508-509 ex Summing in unrolling recurrences, 497-500 in Minimum Spanning Tree Triangle inequality, 203 e.x, 334- subproblems in, 215-220
Stream ciphers with feedback, 792 ex 213,216-217 Telegraph, 163 Problem, 150 335 ex, 606 substitutions in, 213-214, 217-219
Stress-testing jars, 69-70 ex Sums of functions in asymptotic Teller, A. H., 666 in networks, 339, 625 Triangulated cycle graphs, 596-597 ex in unimodal sequence exercise,
Strings growth rates, 39-40 Teller, E., 666 Transactions Triathalon scheduling, 191 ex 244 ex
chromosome, 521 ex Supernodes Temperature in simulated annealing, approximate time-stamps for, Trick, Michael, 250 Unweighted case in Vertex Cover
concatenating, 308-309 ex, 517 ex in Contraction Algorithm, 669-670 196-197ex Truth assignments Problem, 618
encoding. See Huffman codes 715 Terminal nodes, 690 via shortest paths, 290 with Boolean variables, 459 Upfal, E., 793-794
length of, 463 in minimum-cost arborescences, Terminals in Steiner trees, 204 ex, Transitivity consistent, 592 ex Uplink transmitters, 776-777 ex
similarity between, 278-279 181 334-335 ex of asymptotic growth rates, 38-39 Truthfulness in Stable Matching Upper bounds, asymptotic, 36-37
Strong components in directed Supervisory committee exercise, Termination in Maximum-Flow of reductions, 462-463 Problem, 27-28 ex Upstream nodes in flow networks,
graphs, 99 196ex Problem, 344-346 Transmitters in wireless networks, Tucker, A., 598 429 ex
Strong instability in Stable Matching Supply in circulation, 379 Testing bipartiteness, 94-96 776-779 ex Turing, Alan, 551 Upstream points in communications
Problem, 24-25 ex Surface removal, hidden, 248 ex Tetris, 795 Transportation networks, graphs as Turing Award lecture, 70 networks, 26-27 ex
models of, 74 "Twelve Days of Christmas," 69 ex User-friendly houses, 416-417 ex
Using up All the Refrigerator Magnets Virtual places in hypertext fiction, of Steiner trees, 204 ex
Problem, 507-508 ex 509ex in Vertex Cover Problem, 618
Virus tracking, 111-112 ex Well-centered splitters
V VLSI chips, 200 ex in median-finding, 729-730
Valid execution of Kruskal’s Von Neumann, John, 249 in Quicksort, 732
algorithm, 193 ex Voting Width, tree, in tree decompositions.
Valid partners in Gale-Shapley expected value in, 782 ex See Tree decompositions
algorithm, 10-12 gerrymandering in, 331-332 ex Williams, J. W. J., 70
Valid stopping points in Appalachian Williams, Ryan, 552
Trail exercise, 183-184ex W Williamson, D. P., 659
Validation functions in barter Wagner, R., 336 Winner Determination for
economy, 522 ex Walks, self-avoiding, 547-550 ex Combinatorial Auctions
Values Wall Street, 115 problem, 511-512 ex
of flows in network models, 339 Water in shortest path problem, Winsten, C. B., 706
of keys in priority queues, 57-58 140-141 Wireless networks
Van Kreveld, M., 250 Waterman, M., 335 ad hoc, 435-436ex
Variable-length encoding schemes, Watson, J., 273 for laptops, 427-428 ex
163 Watts, D. J., 113 nodes in, 108-109 ex, 324-325 ex
Variables Wavelength assignment for wireless transmitters for, 776-779 ex
adding in dynamic programming, networks, 486 wavelength assignment for, 486,
266, 276 Wavelength-division multiplexing Witten, I. H., 206
Boolean, 459-460 (WDM), 563-564 Woo, Maverick, 530, 552
random, 719-720 Wayne, Kevin, 449 Word-of-mouth effects, 524 ex
with convolution, 237 Weak instability in Stable Matching Word processors, 317-319 ex
linearity of expectation, 720-724 Problem, 25 ex Word segmentation problem,
Vazirani, V. V., 659-660 Weaver, W., 206 316-318 ex
Vecchi, M. P., 669, 705 Wegman, M. L., 794 World Wide Web
Vectors, sums of, 234-235 Weighted Interval Scheduling advertising, 422-423 ex, 508-509 ex
Veksler, Olga, 449-450, 706 Problem, 14, 122, 252 diameter of, 109-110ex
Vertex Cover Problem, 498, 554-555 algorithms for as directed graph, 75
and Integer Programming Problem, designing, 252-256 meta-search tools on, 222
633-635 memoized recursion, 256-257 Worst-case analysis, 31-32
linear programming for. See Linear relation to billboard placement, Worst-case running times, 31-32
programming and rounding 309 ex Worst valid partners in Gale-Shapley
in local search, 664-666 subproblems in, 254, 258-260 algorithm, 11-12
notes, 659-660 Weighted sums of completion times, Wolsey, L. A., 206
optimal algorithms for 194-195 ex Wunsch, C., 279
analyzing, 557 Weighted Vertex Cover Problem, 618,
designing, 555-557 631 Y
in polynomial-time reductions, as generalization of Vertex Cover, Young, N. E., 794
454-459 633-635
pricing methods, 618 notes, 659-660 Z
algorithm analysis for, 622-623 Weights Zabih, Ramin D., 449-450, 706
algorithm design for, 620-622 of edges in Hopfield neural Zero skew, 201 ex
problem, 618-619 networks, 671 Zero-Weight-Cycle problem, 513 ex
problem, 555 in infinite sample spaces, 775 Zones
randomized approximation in Knapsack Problem, 267-272, in Competitive Facility Location
algorithm for, 792-793 ex 657-658ex Problem, 18
Vertices of graphs, 74 of nodes, 657ex in Evasive Path Problem, 510-511 ex
Viral marketing phenomenon, 524 ex in Set Cover Problem, 612 Zuker, M., 335