Coursera Algorithm Toolbox
Big-O
Lecture 1: Big Oh
https://www.youtube.com/watch?edufilter=NULL&v=sblr6SXgyLA
Introduction to Big O Notation and Time Complexity (Data
Structures & Algorithms #7)
https://www.youtube.com/watch?edufilter=NULL&v=D6xkbGLQesk
Welcome!
Welcome
Video Lecture: Welcome!
3 min
Reading: Companion MOOC Book
10 min
10 min
6 min
1h
10 min
Video Lecture: Solving the Maximum Pairwise Product Programming Challenge: Improving the Naive Solution, Testing, Debugging
13 min
8 min
7 min
10 min
10 min
5 questions
2h
10 min
Acknowledgements (Optional)
Reading: Acknowledgements
2 min
Algorithms are everywhere. Whether you are writing software, analyzing a genome, predicting traffic
jams, producing automatic movie recommendations, or just surfing the Internet, you're dealing with
algorithms. Every single branch of computer science uses algorithms, so a course on algorithms and
data structures is an essential part of any CS curriculum. >> It's important that the algorithms we use
are efficient, as users want to see search results in the blink of an eye even if they are searching
through trillions of web pages. A poorly thought-out algorithm could take literally centuries to process
all the web pages indexed by a search engine or all the Facebook posts. Thus, algorithmic improvements
are necessary to make these systems practical.
That's why tech companies always ask lots of algorithmic questions at interviews. >> In data
science problems, like ranking internet search results, predicting road accidents, and recommending
movies to users, advanced algorithms are used to achieve excellent search quality, high prediction
accuracy, and relevant recommendations. However, even for a simple machine learning
algorithm like linear regression, being able to process big data is usually a challenge. When advanced
algorithms such as deep neural networks are applied to huge data sets, they make extremely accurate
predictions, recently even starting to outperform humans in some areas of vision and speech
recognition; but getting those algorithms to run in hours instead of years on a large dataset is hard.
And performing experiments quickly is crucial in data science. >> Algorithms are everywhere. Each of
the trillions of cells in your body executes a complex and still poorly understood algorithm.
And algorithms are the key to solving important biomedical problems, such as which mutations
differentiate you from me and how they relate to diseases. In this specialization you will
learn the theory behind the algorithms, implement algorithms in the programming language of your
choice, and apply them to practical problems such as assembling a genome from millions of
tiny fragments, the largest jigsaw puzzle ever assembled by humans. >> To conclude, algorithms are
everywhere, and it is important to design them well and to implement them carefully. To turn you into a
pro in algorithm design, we will give you nearly 100 programming assignments in this class. Your
solutions will be checked automatically, and you will learn how to implement, test, and debug fast
algorithms that solve large and difficult problems in seconds. We look forward to seeing you in this class.
We know it will make you a better programmer.
>> Algorithms are everywhere. In fact, you just saw five algorithms solving the fundamental sorting
problem in computer science, and they all have different running times. While four of them are
about to finish, one will take much longer. In this specialization you'll implement all
of these algorithms and master the skill of answering both algorithmic and programming questions at
your next interview.
Companion MOOC Book
We invite you to use the following companion book for the specialization:
Alexander Kulikov and Pavel Pevzner. Learning Algorithms through Programming and Puzzle
Solving. 2018.
1. Basic knowledge of at least one programming language: Python, C++, Java, C#,
JavaScript, C, Haskell, Ruby, Rust, Scala.
We expect you to be able to implement programs that: 1) read data from the standard input (in
most cases, the input is a sequence of integers); 2) compute the result (in most cases, a few loops
are enough for this); 3) print the result to the standard output. For each programming challenge in
this course, we provide starter solutions in C++, Java, and Python. The best way to check whether
your programming skills are enough to go through problems in this course is to solve two
problems from the first week. If you are able to pass them (after reading our tutorials), then you
will definitely be able to pass the course.
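As a sanity check of this workflow, here is a minimal, hedged Python sketch of the typical structure of a solution: read a sequence of integers from standard input, compute something with a simple loop, and print the result. The function name and the particular computation (finding the maximum) are only illustrative placeholders, not part of the official starter files.

import sys

def solve(numbers):
    # Placeholder computation: one pass over the input, as most
    # challenges in this course require (here: the maximum element).
    best = numbers[0]
    for x in numbers[1:]:
        if x > best:
            best = x
    return best

if __name__ == "__main__":
    # Typical input format: the first token is n, followed by n integers.
    data = sys.stdin.read().split()
    n = int(data[0])
    numbers = [int(x) for x in data[1:1 + n]]
    print(solve(numbers))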
Test_your_solutions.pdf
This section contains the Maximum Pairwise Product Programming Challenge. You can go ahead
and solve this programming challenge. However, if you encounter any problems with it, please go
through the previous optional section, whose videos describe how to solve it and how to test and debug
your solutions in C++. It also contains several very useful techniques for testing and debugging your
solutions to programming challenges in general; a small illustrative sketch follows below. The next quiz
will help you determine whether you're ready to proceed with solving programming challenges, or
whether it is better to go through the previous optional section beforehand.
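For reference, here is a hedged sketch (not the official starter code) of a naive and a faster solution to the Maximum Pairwise Product problem, together with a small stress test of the kind described in the videos: it generates random inputs, runs both solutions, and stops as soon as their answers differ.

import random

def max_pairwise_product_naive(a):
    # Check all pairs: O(n^2), simple but slow for large n.
    best = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            best = max(best, a[i] * a[j])
    return best

def max_pairwise_product_fast(a):
    # Multiply the two largest elements: O(n log n) with sorting.
    b = sorted(a)
    return b[-1] * b[-2]

def stress_test(max_n=10, max_value=100, iterations=10000):
    # Compare the two solutions on many small random inputs and
    # report the first input on which they disagree.
    for _ in range(iterations):
        n = random.randint(2, max_n)
        a = [random.randint(0, max_value) for _ in range(n)]
        slow = max_pairwise_product_naive(a)
        fast = max_pairwise_product_fast(a)
        if slow != fast:
            print("Mismatch on", a, ":", slow, "!=", fast)
            return
    print("All tests passed")

stress_test()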
Solving Programming Challenges
TOTAL POINTS 5
1.Question 1
What will you typically need to implement yourself in the programming assignments if you program
in C++, Java or Python?
1 point
2.Question 2
Your program in C, C++ or Java thinks that the product of
the numbers 50000 and 50000 is equal to -1794967296. What is the
most probable reason?
1 point
Compiler error.
Integer overflow.
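To see where such a value comes from, here is a hedged illustration: native Python integers never overflow, so 32-bit two's-complement wraparound is emulated explicitly.

# 50000 * 50000 = 2,500,000,000, which does not fit into a signed 32-bit integer.
product = 50000 * 50000
# Emulate 32-bit two's-complement wraparound (Python ints themselves never overflow).
wrapped = (product + 2**31) % 2**32 - 2**31
print(product, wrapped)  # 2500000000 -1794967296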
3.Question 3
Which tests should you perform before submitting a solution to the programming assignment?
1 point
Test on the examples from the problem statement. Then make a few other small tests, solve them
manually and check that your program outputs the correct answer. Generate a big input and
launch your program to check that it works fast enough and doesn't consume too much memory.
Test for corner cases: smallest allowed values and largest allowed values of all input parameters,
equal numbers in the input, very long strings, etc. Then make a stress test. After all these tests
passed, submit the solution.
Just check that the answers for the examples from the problem statement are correct.
Test on the examples from the problem statement. Then make a few other small tests, solve them
manually and check that your program outputs the correct answer. After all these tests passed,
submit the solution.
4.Question 4
Where does the input data come from when you implement a stress test?
1 point
You generate valid input data as a part of the stress test implementation.
You download and use the tests we've prepared to check your solution to the problem.
5.Question 5
If you submit a solution of a programming assignment, but it does not pass some of the tests,
what feedback will you get from the system?
1 point
If it is one of the first few tests, you will see the input data, the answer of your program and the
correct answer. Otherwise, you will only see either that the answer of your program is wrong or
that your program is too slow or that your program uses too much memory.
You will only get the feedback that your program either passed or did not pass.
You will see the input data, the answer of your program, the correct answer, how long did your
program work and how much memory did it use for each of the tests.
Using PyCharm to solve programming challenges
If your primary programming language is Python, we encourage you to install the PyCharm Edu
IDE and try it (after installing, select Learn -> Start Coursera Assignment -> Algorithmic Toolbox).
The beautiful PyCharm IDE will allow you to code like a pro:
implement a solution, implement unit tests and stress tests for it, and run the tests in the
IDE;
use visual debugging tools;
use various smart features of the IDE: code inspections, autocompletion, refactoring;
when you are happy with your implementation, submit it to Coursera.
We hope that PyCharm Edu will make your learning process smooth and enjoyable! Please use
this forum thread to leave feedback, suggest improvements, and report bugs.
Week 2
Algorithmic Warm-up
In this module you will learn that programs based on efficient algorithms can solve the same
problem billions of times faster than programs based on naïve algorithms. You will learn how to
estimate the running time and memory of an algorithm without even implementing it. Armed with
this knowledge, you will be able to compare various algorithms, select the most efficient ones, and
finally implement them in our programming challenges!
Key Concepts
Estimate the running time of an algorithm
Practice implementing efficient solutions
Practice solving programming challenges
Implement programs that are several orders of magnitude faster than straightforward
programs
Why Study Algorithms?
7 min
Video Lecture: Coming Up
3 min
Fibonacci Numbers
Video Lecture: Problem Overview
3 min
Video Lecture: Naive Algorithm
5 min
Video Lecture: Efficient Algorithm
3 min
Reading: Resources
2 min
4 min
Video Lecture: Efficient Algorithm
5 min
Reading: Resources
2 min
Big-O Notation
Video Lecture: Computing Runtimes
10 min
Video Lecture: Asymptotic Notation
6 min
Video Lecture: Big-O Notation
6 min
Video Lecture: Using Big-O
10 min
1h
Reading: Resources
2 min
Practice Quiz: Logarithms
6 questions
Practice Quiz: Big-O
7 questions
2 questions
Course Overview
Video Lecture: Course Overview
10 min
Programming Assignment 2
2h 30m
Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.
Reading
Computing Fibonacci numbers: Section 0.2 of [DPV08]
Visualizations
Computing Fibonacci numbers by David Galles
To better appreciate the difference between polynomial-time and exponential-time algorithms, try
computing F₂₀ using this visualization. For this, enter "20" into the field and press
"Fibonacci Recursive". This calls a recursive algorithm that makes a huge number of recursive
calls; the call will take essentially forever even if you increase the visualization speed to the maximum.
Stop it by pressing "Skip Forward" and press "Fibonacci Table". This calls an iterative algorithm
that uses an array to compute Fibonacci numbers efficiently. The third button calls a recursive
algorithm with memoization. We will cover such algorithms in the Dynamic Programming module
later in this class.
(Note that the visualization uses a slightly different definition of Fibonacci numbers:
there, F₀ = F₁ = 1, while in the lecture, F₀ = 0, F₁ = 1. This, of course, has
no influence on the running time.)
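To compare the two approaches from the visualization in code, here is a hedged Python sketch of a naive recursive and an iterative (table-based) computation of Fibonacci numbers, using the lecture's convention F₀ = 0, F₁ = 1; the function names are just illustrative.

def fib_recursive(n):
    # Naive recursion: recomputes the same subproblems over and over,
    # so the number of calls grows exponentially with n.
    if n <= 1:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_table(n):
    # Iterative, table-based computation: a linear number of additions.
    if n <= 1:
        return n
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]

print(fib_table(20), fib_recursive(20))  # both print 6765, but the recursive
                                         # version already makes thousands of calls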
Advanced Reading
Properties of Fibonacci numbers: Exercises 0.2–0.4 in [DPV08]
References
[DPV08] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition).
McGraw-Hill Higher Education, 2008.
Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.
Reading
Greatest common divisor: Section 1.2.3 of [DPV08], Section 31.2 of [CLRS]
References
[DPV08] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition).
McGraw-Hill Higher Education, 2008.
[CLRS] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to
Algorithms (3rd Edition). MIT Press and McGraw-Hill, 2009.
Big-O Notation
Resources
Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.
Reading
Big-O notation and growth rate: Section 0.3 of [DPV08]
Big-O notation at Khan Academy
References
[DPV08] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition).
McGraw-Hill Higher Education, 2008.
Computing Runtimes
Asymptotic Notation
Hello, everybody. Welcome back. Today we're going to start talking about
asymptotic notation.
So here we're going to sort of just introduce this whole idea of asymptotic
notation and describe some of the advantages of using it.
So last time we ran into this really interesting problem that computing
runtimes is hard, in that if you really, really want to know how long a
particular program will take to run on a particular computer, it's a huge
mess. It depends on knowing all kinds of fine details about how the program
works. And all kinds of fine details about how the computer works, how fast
it is, what kind of system architecture it is. It's a huge mess. And we don't
want to go through this huge mess every single time we try to analyze an
algorithm. So, we need something that's maybe a little bit less precise but
much easier to work with, and we're going to talk about the basic idea
behind that.
And the basic idea is the following. That, there are lots of factors that have
an effect on the final runtime but, most of them will only change the runtimes
by a constant. If you're running on a computer that's a hundred times faster,
it will take one one-hundredth of the time, a constant multiple. If your system
architecture has multiplications that take three times as long as additions,
then if your program is heavy on multiplications instead of additions, it might
take three times as long, but it's only a factor of three. If your memory
hierarchy is arranged in a different way, you might have to do disk lookups
instead of RAM lookups. And those will be a lot slower, but only by a
constant multiple.
Of course there's a problem with this idea, if you look at it sort of by itself,
that if you have runtimes of one second or one hour or one year, these only
differ by constant multiples. A year is just something like 30 million seconds.
And so, if you don't care about factors of 30 million, you can't tell the
difference between a runtime of a second and a runtime of a year. How do
we get around this problem?
Well, there's a sort of weird solution to this. We're not going to actually
consider the runtimes of our programs on any particular input. We're going
to look at what are known as asymptotic runtimes. These ask, how does the
runtime scale with input size? As the input size n gets larger, does the
runtime scale proportionally to n, maybe proportionally to n squared? Is it
exponential in n? All these things are different. And in fact they're sort of so
different that as long as n is sufficiently large, the difference between n
runtime and n squared runtime is going to be worse than any constant
multiple.
If you've got a constant multiple of 1000, 1000n might be pretty bad with that
big number in front. But, when n becomes big, it's still better than n squared.
And so, by sort of only caring about what happens in this sort of long scale
behavior, we will be able to do this without seeing these constants, without
having to care about these details.
And in fact, this sort of asymptotic, large scale behavior is actually what you
care about a lot of the time, because you really want to know: what happens
when I run my program on very large inputs?
And these different sorts of scalings do make a very large difference on that.
So suppose that we have an algorithm whose runtime is roughly
proportional to n and we want to run it on a machine that runs at about a
gigahertz. How large an input can we handle such that we'll finish the
computation in a second?
Well, if it runs in time about n, you can handle inputs of size about a billion
before it takes more than a second.
If instead of n, it's n log n it's a little bit slower, you can only handle inputs
the size about 30 million. If it runs like n squared, it's a lot worse. You can
only handle inputs of size about 30,000 before it starts taking more than a
second.
If the inputs are of size 2 to the n, it's incredibly bad, you can only handle
inputs of size about 30 in a second. Inputs of size 50 already take two
weeks, inputs of size 100 you'll never ever finish. And so the difference
between n and n squared and 2 to the n is actually really, really significant.
It's often more significant than these factors of 5 or 100 that you're seeing
from other things.
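To summarize these estimates in one place (rough figures from this lecture, assuming on the order of 10⁹ simple operations per second):

Runtime   | Largest input handled in about one second
n         | ~1,000,000,000
n log n   | ~30,000,000
n²        | ~30,000
2ⁿ        | ~30 (size 50 already takes weeks; size 100 will never finish)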
Now just to give you another feel of sort of how these sort of different types
of runtimes behave, let's look at some sort of common times that you might
see. There's log n, which is much smaller than root n, which is much smaller
than n, which is much smaller than n log n, which is much smaller than n
squared, which is much smaller than 2 to the n. So, if we graph all of these,
you can see that these graphs sort of separate out from each other. If you
just look at them at small inputs, it's maybe a little bit hard to tell which ones
are bigger, there's a bit of jostling around between each other. But if we
extend the graph outwards a bit, it becomes much more clear. 2 to the n
starts after about 4 really taking off. Really just 2 to the n just shoots up
thereafter and becomes 20 or 30, it just leaves everyone else in the dust. N
squared keeps up a pretty sizable advantage though against everyone else.
N log n and n also are pretty well separated from the others. In this graph,
root n and log n seem to be roughly equal to each other, but if you kept
extending, if you let n get larger and larger, they'd very quickly differentiate
themselves. Square root of 1 million is about 1,000. Log of 1 million is about
20. And so really as you keep going out, very quickly the further out you go
the further separated these things become from each other, and that's really
the key idea behind sort of asymptotics. We don't care so much about the
constants, we care about what happens as your inputs get very large, how
do they scale.
So that's it for today. Come back next lecture. We'll talk about in sort of
detail what this actually means and how to actually get it to work. So until
next time.
Big-O Notation
Hello, everybody. Welcome back. Today, we're going to be talking about Big-O
notation, which is the specific, sort of asymptotic notation that we will be using most
frequently here. So, the idea here is we're going to introduce the meaning of Big-O
notation and describe some of its advantages and disadvantages. So to start with the
definition. The idea is that we want something that cares about what happens when
the inputs get very large and sort of only care about things up to constants. So we are
going to come up with a definition. If you've got two functions, f and g, we say f(n) is Big-O of
g(n) if there are two constants, capital N and little c, such that for all n at least N, f(n)
is at most c·g(n). And what this means is that, at least for sufficiently large inputs, f is
bounded above by some constant multiple of g.
Which is really sort of this idea that we had from before. Now, for example, 3n
squared plus 5n plus 2 is O of n squared, because if we take any n at least 1, 3n squared
plus 5n plus 2 is at most 3n squared plus 5n squared plus 2n squared, which is 10n squared,
some multiple of n squared. So, in particular, if you look at these two functions,
they really in some sense do have the same growth rate. If you look at the ratio
between them, sure it's large at first, it's 10 at n equals 1, but as n gets large it actually drops
down to about 3. And once you're putting in inputs at n equals 1000, n squared is a
million and 3n squared + 5n + 2 is a little bit more than 3 million.
So, they're not the same function. One of them is distinctly larger than the other, but
it's not larger by much, not by more than a factor of about three.
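Stated compactly (this is just a restatement of the definition and the worked example above, not new material):

f(n) = O(g(n)) \iff \exists\, c > 0,\ N \ge 0 \ \text{such that } f(n) \le c \cdot g(n) \ \text{for all } n \ge N.

3n^2 + 5n + 2 \;\le\; 3n^2 + 5n^2 + 2n^2 \;=\; 10 n^2 \ \text{for all } n \ge 1, \quad \text{so } 3n^2 + 5n + 2 = O(n^2) \ \ (c = 10,\ N = 1).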
Throughout this course, we're going to be using big-O notation to report, basically, all
of our algorithms' runtimes. And, this has a bunch of advantages for us.
The first thing that it does for us is it clarifies growth rate. As I've said before, often
what we care about is how our runtime scales with the input size. And this is sort
of an artifact of the fact that we often really care about what happens when we put
really, really, really big inputs into our algorithm. How big an input can we deal with before things
start breaking down?
And, if you gave me some sort of complicated expression in terms of the input, with
lots of terms, then it might be hard given two algorithms to really compare them. I
mean, which one's bigger would depend on exactly which inputs I'm using. It requires
some sort of annoying computation to determine where exactly one's better than the
other. But, if you look at things asymptotically what happens as n gets large? It often
becomes much more clear that, once n is very, very large, algorithm a is better than
algorithm b.
The second thing it does for us is that it cleans up notation. We can write O(n²),
instead of 3n² + 5n + 2. And that's a lot cleaner and much easier to work with. We can
write O(n) instead of n + log₂(n) + sin(n). We can write O(n log(n)) instead of 4n
log₂(n) + 7. And note that in the big O, we don't actually need to specify the base of
the logarithm that we use, because log₂(n), log₃(n), log₁₀(n), and log₇(n)
only differ by constant multiples. And up to constant multiples, this big O that
we have really doesn't care.
Another consequence of this is that because our notation is cleaner, because we have fewer lower
order terms to deal with, this actually makes the algebra that we have to do easier. It makes it
easier to manipulate big O expressions because they're not as messy.
And the final thing this does is that this big O notation really does solve these
problems we were talking about a couple of lectures ago. In order to compute
runtimes in terms of big O, we really don't need to know things like how fast the
computer is, or what the memory hierarchy looks like, or what compiler we used,
because, by and large, although these things will have a big impact on your final
runtime, that impact will generally only be a constant multiple. And if two things are
only off by a constant multiple, they've got the same big O.
Now, I should say that there's a warning. Big-O is incredibly useful, we are going to be
using it for basically everything in this course, but it does lose a lot of information
about your runtime. It forgets about any constant multiples. So, if you have two
algorithms, and one of them's a hundred times faster, they have the same Big-O.
But, in practice, if you want to make things fast, a factor of 100 is a big deal. Even a
factor of two is a big deal. And so, if you really want to make things fast, once you
have a good asymptotic runtime, you then want to look into the nitty-gritty details. Can
I save a factor of two here? Can I rearrange things to make things run a little bit
smoother? Can I make it interact better with the memory hierarchy? Can I do x, y and
z to make it faster by these constant factors that we didn't see beforehand? The
second thing that you should note along these lines is that big O is only asymptotic. In
some sense, all it tells you about are what happens when you put really, really, really,
really, really big inputs into the algorithm.
And, well, if you actually want to run your algorithm on a specific input, Big O doesn't
tell you anything about how long it takes in some sense. I mean, usually the constants
hidden by the big O are moderately small and therefore you have something useful.
But sometimes they're big. Sometimes an algorithm with worse big O runtime, that's
worse asymptotically on very large inputs, actually, for all practical sizes, is actually
beaten by some other algorithm. And there are cases of this where you find two
algorithms where a works better than b on really, really, really big inputs. But
sometimes really, really, really big means more than you could ever store in your
computer in the first place. And so, for any practical input you want to use algorithm b.
In any case, though, despite these warnings, big O is incredibly useful. We're going to
be using it throughout this course. And so, next lecture, we're going to be talking a
little bit about how to deal with big O expressions, how to manipulate them, how to
use them to compute runtimes, but once you have that we'll really be sort of ready to
do some algorithms.
In any case, that's all for this lecture, come back next time and we'll talk about that.
Using Big-O
Hello everybody. Welcome back. Today we're going to be talking about using Big-O
notation. So the basic idea here, we're going to be talking about how to manipulate
expressions involving Big-O and other asymptotic notations. And, in particular, we're
going to talk about how to use Big-O to compute algorithm runtimes in terms of this
notation.
So recall, we said that f(n) was Big-O of g(n) if, for all sufficiently large inputs, f(n) was
bounded above by some fixed constant times g(n). Which really says that f is
bounded above by some constant times g.
Now, we'd like to manipulate expressions; given an expression, we'd like to write it
in terms of Big-O in the simplest possible manner. So there are some common rules you
need to know.
The first rule is that multiplicative constants can be omitted. 7n cubed is O of n cubed.
n squared over 3 is O of n squared. The basic premise that we had when building this
idea was that we wanted to have something that ignores multiplicative constants.
The second thing to note is that if you have two powers of n, the one with the larger
exponent grows faster. So n grows asymptotically slower than n squared and is O of n squared,
and root n grows slower than n, so it's O of n.
What's more surprising is that if you have any polynomial and any exponential, the
exponential always grows faster. So n to the fifth is O of root two to the n. n to the 100
is O of 1.1 to the n. And this latter thing is something that should surprise you a little
bit. Because n to the 100 is a terrible runtime. Two to the 100 is already so big that
you really can't expect to do it ever. On the other hand, 1.1 to the n grows pretty
modestly. 1.1 to the 100 is a pretty reasonable-sized number.
On the other hand, what this really says is that once n gets large, maybe ten
thousand or so, 1.1 to the n eventually takes over and starts beating n to the 100. And it does
so by, in fact, quite a bit. But it doesn't really happen until n gets pretty huge.
In a similar vein, any power of log n grows slower than any power of n. So log n
cubed is O of root n. n log n is O of n squared.
Finally, if you have a sum of terms, the smaller terms in the sum can be omitted. So for n
squared plus n, n has a smaller rate of growth,
so this is O of n squared. For 2 to the n plus n to the 9th, n to the 9th has a smaller rate of
growth, so this is O(2 to the n). So, these are common rules for manipulating these
expressions. Basically these are the only ones that you'll need most of the time to
write anything in terms of Big-O that you need.
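A compact restatement of these rules in symbols (the particular constants, such as 1.1, are just the lecture's examples):

c \cdot f(n) = O(f(n)) \quad \text{(multiplicative constants can be omitted)}

n^a = O(n^b) \ \text{for } 0 < a \le b \quad \text{(larger powers grow faster)}

n^a = O(b^n) \ \text{for any } a > 0,\ b > 1 \quad \text{(any exponential beats any polynomial)}

(\log n)^a = O(n^b) \ \text{for any } a > 0,\ b > 0 \quad \text{(any polynomial beats any power of } \log n)

\text{if } f(n) = O(g(n)), \ \text{then } f(n) + g(n) = O(g(n)) \quad \text{(smaller terms can be omitted)}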
Okay, so let's see how this works in practice. If we actually want to compute runtimes
using Big-O notation. So let's look at this one algorithm again. So we created an
array. We set the 0th element to 0 and the first element to 1. We then went through
this loop, where we set each element to the sum of the previous two. And then
returned the last element of the array. Let's try computing this runtime in terms of Big-
O notation.
So, we're just going to run through it operation by operation and ask how long it takes.
First operation is we created an array, and let's for the moment ignore the memory
management issues, and assume that it's not too hard to allocate the memory. But let's
suppose that what your compiler does is we actually want to zero out all of these cells
in memory and that's going to take us a little bit of work. Because for every cell,
basically what we have to do, is we need to zero out that cell, we then need to
increment some counter to tell us which cell we're working on next and then maybe
we need to do a check to make sure that we're not at the end. If we are at the end, to
go to the next line. Now for every cell we have to do some amount of work. We have
to do something like do a write, and the comparison, and an increment. And it's not
entirely clear how many machine operations this is. But it's a constant amount of work
per cell in the array. If there are n plus 1 cells. This is O of n time, some constant
times n. Next we set the zeroth elements of the array of zero. And this might just be a
simple assignment. We might have to load a few things into registers or do some
pointer arithmetic, but no matter whether this is one machine operation or five or
seven, that's still going to be a constant number of machine operations, O(1).
Next we run through this loop, for i running from two to n, we run through it n minus
one times, that's O(n) times.
The main thing we do in the loop is we set the ith element of the array to the sum of
the i minus first and i minus second. Now the lookups and the store, those are all of
the sorts of things we had looked at, those should be O of 1. But the addition is a bit
worse. And normally additions are constant time. But these are large numbers.
Remember, the nth Fibonacci number has about n over 5 digits to it, they're very big,
and they often won't fit in the machine word.
Now if you think about what happens if you add two very big numbers together, how
long does that take? Well, you sort of add the tens digit and you carry, and you add
the hundreds digit and you carry, and add the thousands digit, you carry and so on
and so forth. And you sort of have to do work for each digits place.
And so the amount of work that you do should be proportional to the number of digits.
And in this case, the number of digits is proportional to n, so this should take O(n)
time to run that line of code.
Finally, we have a return step, which is pointer arithmetic, an array lookup, and
maybe popping the program stack. And it's not quite clear how much work that is,
but it's pretty clear that it's a constant amount of work, it doesn't become worse as n
gets larger. So, that's O of one time.
So, now we just have to add this all together: O of n, plus O of 1, plus O of 1, plus O of n
times through the loop times O of n work per pass through the loop, plus O of 1. Adding it all
together, the dominant term here, which is the only one we need, is the O
of n times O of n. That's O of n squared. So this algorithm runs in time O of n
squared.
Now, we don't know exactly what the constants are, but O of n squared means that if
we want to finish this in a second, you can probably handle inputs of size maybe
30,000. Now, depending on the computer that you had and the compiler and all of
these messy details, maybe you can only handle inputs of size 1,000 in a second.
Maybe you can handle inputs of size a million in a second. It's probably not going to
be as low as ten or as high as a billion but, I mean, 30,000's a good guess and well, it
takes work to get anything better than that.
And so, this doesn't give us an exact answer but it's pretty good.
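The algorithm analyzed above, as a hedged Python sketch with the per-step costs from the lecture attached as comments (the function name is only illustrative):

def fib_list(n):
    # Overall: the loop runs O(n) times and each addition costs O(n)
    # because the numbers have about n/5 digits, so the total is O(n^2).
    table = [0] * (n + 1)          # allocate and zero out n + 1 cells: O(n)
    table[0] = 0                   # O(1)
    if n >= 1:
        table[1] = 1               # O(1)
    for i in range(2, n + 1):      # executed O(n) times
        table[i] = table[i - 1] + table[i - 2]   # long-integer addition: O(n), not O(1)
    return table[n]                # O(1)

print(fib_list(10))  # 55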
If you want to say that f is bounded below by g, that it grows no slower than g, you
say that f(n) is Omega of g(n). And that says that for some constant c, f(n) is at least c
times g(n), for all large n.
Now instead of saying bounded above or bounded below, sometimes that you
actually want to say that they grow at the same rate.
And for that you'd say f is Big-Theta of g(n), which means that f is both Big-O of g
and Big-Omega of g, which says that, up to constants, f and g grow at the same
rate.
Finally, sometimes instead of saying that f grows no faster than g, you actually have
to say that it grows strictly slower than g, and for that you say f(n) is Little-o of g(n).
And that says that, not only is the ratio between f(n) and g(n) bounded above by some
constant, but actually this constant can be made as small as you like. In particular this
means that the ratio f(n) over g(n) goes to zero as n goes to infinity.
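For reference, the three notations just described, stated compactly next to Big-O:

f(n) = \Omega(g(n)) \iff \exists\, c > 0:\ f(n) \ge c \cdot g(n) \ \text{for all sufficiently large } n \quad \text{(f grows no slower than g)}

f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \ \text{and} \ f(n) = \Omega(g(n)) \quad \text{(same growth rate up to constants)}

f(n) = o(g(n)) \iff f(n)/g(n) \to 0 \ \text{as } n \to \infty \quad \text{(f grows strictly slower than g)}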
So, these are some other notations that you'll see now and then. You should keep
them in mind. They're useful. Big-O is the one that usually shows up, because we
actually want to bound our runtimes above. It's sort of the big, important thing, but
these guys are also useful.
So, to summarize the stuff on asymptotic notation. What it lets us do is ignore these
messy details in the runtime analysis that we saw before.
It produces very clean answers that tell us a lot about the asymptotic runtime of
things.
And these together make it very useful. It means we're going to be using it extensively
throughout the course. So you really ought to get used to it.
But, it does throw away a lot of practical useful information. So if you really want to
make your program fast, you need to look at more than just the Big-O runtime.
With this lecture, we basically finished the sort of introductory material that we need.
Next lecture I'll talk to you a little bit about sort of an overview of the rest of the course
and some of our philosophy for it. But after that, we'll really get into the meat of the
subject. We'll start talking about key important ways to design algorithms. So, I hope
you enjoy it.
YouTube Channels
CS Dojo
New Baghdad
Big-O Notation: Plots
The purpose of this notebook is to visualize the order of growth of some functions frequently used
in algorithm analysis. Note that this is an interactive notebook, meaning that besides just
running all the code below you may also play around with it. Try plugging in your favorite functions
and/or changing the ranges below and see what happens. Proceed by repeatedly clicking the Run
button. To start over, select Kernel -> Restart and Clear Output.
Definitions
We start by recalling the definitions. Consider two functions f(n) and g(n) that are
defined for all positive integers and take on non-negative real values. (Some functions frequently used
in algorithm design: log n, √n, n log n, n³, 2ⁿ.) We say
that f grows slower than g and write f ≺ g if f(n)/g(n) goes to 0 as n grows. We say
that f grows no faster than g and write f ⪯ g if there exists a constant c such
that f(n) ≤ c·g(n) for all n.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
Now, plotting a function is as easy as the following three lines of code. It shows the plot of a
function 7n²+6n+5 in the range 1 ≤ n ≤ 100. Note that the scale of the y-axis
adjusts nicely.
In [2]:
n = np.linspace(1, 100)
plt.plot(n, 7 * n * n + 6 * n + 5)
plt.show()
Now, let us add a function 20n to the previous example to visualize that 20n grows slower
than 7n²+6n+5.
In [3]:
n = np.linspace(1, 100)
plt.plot(n, 7 * n * n + 6 * n + 5, label="7n^2+6n+5")
plt.plot(n, 20 * n, label="20n")
plt.legend(loc='upper left')
plt.show()
Common rules
Before proceeding with visualizations, let's review the common rules of comparing the order of
growth of functions arising frequently in algorithm analysis.
n = np.linspace(1, 5)
plt.plot(n, 7 * n * n + 6 * n + 5, label="7n^2+6n+5")
plt.plot(n, 7 * n * n, label="7n^2")
plt.legend(loc='upper left')
plt.show()
n = np.linspace(1, 100)
plt.plot(n, 7 * n * n + 6 * n + 5, label="7n^2+6n+5")
plt.plot(n, 7 * n * n, label="7n^2")
plt.legend(loc='upper left')
plt.show()
We see that as n grows, the contribution of 6n+5 becomes more and more negligible.
Another way of justifying this is to plot the function (7n²+6n+5)/(7n²).
In [6]:
n = np.linspace(1, 100)
plt.plot(n, (7 * n * n + 6 * n + 5)/(7 * n * n))
plt.show()
n = np.linspace(1, 100)
plt.plot(n, (7 * n * n + 6 * n + 5)/(n * n))
plt.show()
Rule 2: Out of two polynomials, the one with the larger degree grows
faster
For constants a > b > 0, nᵃ grows faster than nᵇ. This, in particular, means
that nᵇ = O(nᵃ). To visualize it, let's plot n, n², and n³.
In [8]:
n = np.linspace(1, 10)
plt.plot(n, n, label="n")
plt.plot(n, n * n, label="n^2")
plt.plot(n, n * n * n, label="n^3")
plt.legend(loc='upper left')
plt.show()
Let's now see what happens on a bigger scale: instead of the range 1 ≤ n ≤ 10, consider the
range 1 ≤ n ≤ 100.
In [9]:
n = np.linspace(1, 100)
plt.plot(n, n, label="n")
plt.plot(n, n * n, label="n^2")
plt.plot(n, n * n * n, label="n^3")
plt.legend(loc='upper left')
plt.show()
n = np.linspace(1, 10)
plt.plot(n, n ** 4, label="n^4")
plt.plot(n, 2 ** n, label="2^n")
plt.legend(loc='upper left')
plt.show()
The plot reveals that in this range n⁴ is always greater than 2ⁿ. This, however, does not mean
that n⁴ grows faster than 2ⁿ! To see what actually happens, let's take a look at a larger
range, 1 ≤ n ≤ 20.
In [11]:
n = np.linspace(1, 20)
plt.plot(n, n ** 4, label="n^4")
plt.plot(n, 2 ** n, label="2^n")
plt.legend(loc='upper left')
plt.show()
n = np.linspace(1, 20)
plt.plot(n, n, label="n")
plt.plot(n, np.log(n), label="log n")
plt.legend(loc='upper left')
plt.show()
n = np.linspace(1, 100)
plt.plot(n, n ** .5, label="n^.5")
plt.plot(n, np.log(n) ** 3, label="(log n)^3")
plt.legend(loc='upper left')
plt.show()
This looks strange: it seems that (log n)³ grows faster than √n. Let's do the standard
trick: increase the range from [1, 100] to, say, [1, 1000000].
In [14]:
n = np.linspace(1, 10 ** 6)
plt.plot(n, n ** .5, label="n^.5")
plt.plot(n, np.log(n) ** 3, label="(log n)^3")
plt.legend(loc='upper left')
plt.show()
Surprisingly, the logarithmic function is still above the polynomial one! This shows that it is in fact
dangerous to decide which function grows faster just by looking at how they behave for some not
so large values of n. The rule "any polynomial grows faster than any polylogarithm" means
that eventually the polynomial function will become larger and larger than the polylogarithmic one. But the
rule does not specify for what value of n this happens for the first time.
To finally ensure that √n outperforms (log n)³ eventually, let's increase the range
to 10⁸.
In [15]:
n = np.linspace(1, 10 ** 8)
plt.plot(n, n ** .5, label="n^.5")
plt.plot(n, np.log(n) ** 3, label="(log n)^3")
plt.legend(loc='upper left')
plt.show()
Also, let's consider an even larger interval to make sure that these two functions don't switch back.
In [16]:
n = np.linspace(1, 10 ** 15)
plt.plot(n, n ** .5, label="n^.5")
plt.plot(n, np.log(n) ** 3, label="(log n)^3")
plt.legend(loc='upper left')
plt.show()
Exercise
As the final exercise, try to find the value of n where n^0.1 becomes larger
than (log n)^5.
In [17]:
n = np.linspace(1, 100)
plt.plot(n, n ** .1, label="n^.1")
plt.plot(n, np.log(n) ** 5, label="(log n)^5")
plt.legend(loc='upper left')
plt.show()
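The crossover is far too large to see on a plot (the values involved overflow ordinary floating-point numbers), so one hedged way to locate it is to compare the two functions through their logarithms. The sketch below is one possible approach, not part of the original notebook; log here means the natural logarithm, as in the plots above.

import math

# n^0.1 > (log n)^5 exactly when 0.1 * ln(n) > 5 * ln(ln(n)),
# i.e. when x > 50 * ln(x) for x = ln(n). Search over x instead of n
# to avoid overflowing floating-point numbers.
x = 2.0
while x <= 50 * math.log(x):
    x += 1.0
print("crossover near ln(n) =", x, "that is, n is roughly 10 **", round(x / math.log(10)))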
Course Overview
Hello everybody, welcome back to the Data Structures and Algorithm specialization and the
Algorithmic Toolbox course within it. This is the last lecture in the introductory unit and here we're
going to give sort of an overview of the course. And in particular, what we're going to do is we're
going to talk about sort of the philosophy of the course, and how it fits into the what we're going to
be teaching you within the rest of this course. So, there's a problem. Algorithm design is hard, and in
particular it's hard to teach. And by this I actually mean something pretty specific. Now, algorithms
solve many, many different problems. You can use them to find paths between locations on a map, or
find good matchings with some property, or identify images in a photograph. Many, many different
sort of unrelated sounding problems can all be solved by algorithms.
And because the sorts of things that an algorithm problem might ask you to do are so varied, there's
no unified technique that will allow you to solve all of them.
And this is different from what you see in a lot of classes, when you're learning linear algebra they talk
about how do you solve systems of linear equations. And they teach you some technique, like row
reduction, and then you're sort of done. You just sort of need to practice it, and you can solve any
system of linear equations. They give you a system of linear equations, you turn the crank on this row
reduction technology and out pops an answer.
For algorithms there isn't that sort of thing. There's no general procedure where I give you an
algorithms problem and you sort of plug it into this machine and turn a crank and out pops a good
algorithm for it. And this makes it hard to teach. If there was such a thing, we could just teach you,
here's this thing that you do. You do this, and you'll have a good algorithm for any problem you might
run into.
And it's harder than that. I mean, sometimes, in order to find a good algorithm, it requires that you
have a unique insight. You're working on some problem that no one's ever looked at before. In order
to find a good algorithm for it, you need to come up with some clever idea that no one else has ever
come up with before. This is why sort of algorithms are so well studied, why they're such an active
field of research. There are still so many different new things yet to be discovered there. And we
certainly can't teach you things that haven't been discovered yet. And we also can't teach you things
custom tailored to the problems that you are going to run into in your life as a programmer.
So since we can't teach you everything you need to know about how to solve all of your algorithm
problems, what can we teach you?
Well, there are sort of two things. One thing that we can definitely give you is practice designing
algorithms. We're going to have lots of homework problems with lots of things for you to work on,
and this will give you practice, how do you, given a problem you haven't seen before, come up with a
good algorithm for it? Once you have the algorithm, how do you implement it and make sure
everything works and runs reasonably well? That's something you can practice. And it turns out that
for the type of problems where they're sort of very general and can be many different things, I mean,
it's possible to solve a lot of them, and one of the ways to be able to solve them is practice.
But we're also going to do more. We're not just going to throw you in the deep end and say, try to
swim, try to program all of these algorithms. There is something useful.
We can't teach you a generic procedure that will solve any algorithms problem for you. But what we
can do is we can give you some common tools. Some very useful tools for algorithm design. And
especially in this first course in our specialization we're really going to focus on helping to build up
your algorithmic toolbox.
And in particular, this course is going to focus on three of the most common and most generally
applicable algorithmic design techniques.
The first of these is greedy algorithms. This is something where you're trying to construct some big
object, and the way you do it is you sort of make one decision in the most greedy, locally optimal way
you can.
And once you've made that decision you make another decision in the most greedy, locally optimal
way you can. And you just keep making these decisions one at a time until you have an answer. And
surprisingly somehow making these locally optimal decisions gives you a globally optimal solution.
And when this happens it gives you very clean algorithms and it's great.
That's the first thing we'll talk about. Next, we'll talk about divide and conquer, which is a technique
where you've got some big problem you're trying to solve. What you do is you break it into a bunch of
little pieces, you solve all the pieces, and then you put their answers together to solve the original
thing. Finally we'll talk about dynamic programming. This is a little bit more subtle of a technique. This
is what you get when you've got some sort of large problem, that has sort of a lot of, not sub-
problems, but sort of related problems to it. And this sort of whole family of related problems, their
solutions sort of depend on one another in a particular type of way.
And when you have it there's this great trick that you have, where you sort of start at the small
problems at the bottom of the pile. And you solve all of them. And you sort of keep track of all of your
answers. And you use the answers to the small problems, to build up to obtain answers to the larger
and larger problems.
So these are what we're going to talk about. Each of the techniques we're going to talk about, how
you recognize when it applies, how do you analyze it when it applies, and some practical techniques
about how to implement, how to use them. All that good stuff.
So there's one other thing before we let you go into the fun world of greedy algorithms that you
should keep in mind throughout this course, and that's that there are these, maybe, different levels of
algorithm design. There's sort of different levels of sophistication that go into it.
At sort of the very lowest level, or top of this slide, I guess, there is the naive algorithm. This is sort of
a thing where you take the definition of a problem and you turn it into an algorithm, and we saw this
for Fibonacci numbers and greatest common divisors. You sort of interpreted the definition of the
thing you wanted to compute as an algorithm, and you were done. Now, these things are often very
slow, as we saw. Often they look like in order to find the best way of doing something, we enumerate
all ways to do it, and then figure out which one's the best. On the other hand, these are slow, but it's
often a good idea to first come up with a naive algorithm, just make sure you have some algorithm
that works.
Sometimes this works well and often you can just be done with it. Other times, it's too slow, but at
least you made sure that you understood what problem you were working on and have something
that runs.
But after that, the next thing that you want to do, if this naive algorithm is too slow, is you try and
look at your tool box. You say, are there any standard techniques that I know that apply here? Maybe
there's a greedy algorithm that solves this problem, or maybe I have to use a dynamic program. But if
you can find one of these standard techniques that work, often that doesn't involve too much effort
on your part, and gives you something that works pretty well.
Now once you have something that works, you often want to optimize it. And there are lots of ways
to improve an existing algorithm. Reduce the runtime from n-cubed to n-squared or n-squared to n.
And to do this, there are just a whole bunch of things. Maybe sometimes you could just sort of
rearrange the order in which you do the operations to cut out some of the work that you do.
Sometimes you have to introduce a data structure to speed things up. There are a bunch of ways to
do this. We'll talk a little bit about how this works. And these three levels are things that you should
be comfortable with and able to apply pretty well by the end of this course.
However, sometimes these three are not enough.
Sometimes a naive algorithm is just too slow, the standard tools don't apply, there's nothing that you
can really optimize to improve things. Sometimes in order to get a workable algorithm, what you
need is magic. You need some unique insight that no one else has ever had before.
Play video starting at 8 minutes 32 seconds and follow transcript8:32
You need some sort of clever new idea, and there's only so much we can do to teach you how
to produce magic. We will show you some examples of algorithms that really did rely on clever ideas;
maybe you can't reproduce the thought process of how someone came up with a crazy idea that
just happens to make things work, but you should at least be able to appreciate the sort of thought that
goes into this sort of thing. In any case, it's something to keep in mind when
thinking about our problems, and what sort of things are expected of you.
Play video starting at 9 minutes 7 seconds and follow transcript9:07
In any case, that is basically it for the introductory segment. We've talked a lot about sort of why
algorithms are important and given you some examples. We've talked about asymptotic notation, but
now it's time to let you go to the rest of the course. The rest of the course will keep giving you
exercises to hone your skills, and each unit of this course will cover one of these major techniques.
After I leave you with the end of the introduction, Michael will pick up and talk to you about greedy
algorithms. Next off, Neil will talk to you about divide and conquer. Finally, Pavel will have a unit on
dynamic programming. Each of these, they will talk to you about where the technique applies, how to
analyze it, how to implement it, all that good stuff.
Play video starting at 9 minutes 51 seconds and follow transcript9:51
But this is where I leave you, I hope you enjoyed the introduction, and I will put you in Michael's very
capable hands to start learning about greedy algorithms starting in the next lecture. So, until then,
farewell.
Week 3
Algorithmic Toolbox
Greedy Algorithms
Discuss and ask questions about Week 3.
Greedy Algorithms
In this module you will learn about a seemingly naïve yet powerful class of algorithms called greedy
algorithms. After you learn the key idea behind greedy algorithms, you may feel that they
represent the algorithmic Swiss army knife that can be applied to solve nearly all programming
challenges in this course. But be warned: with a few exceptions that we will cover, this intuitive
idea rarely works in practice! For this reason, it is important to prove that a greedy algorithm
always produces an optimal solution before using it. At the end of this module, we will
test your intuition and taste for greedy algorithms by offering several programming challenges.
Key Concepts
Practice implementing greedy solutions
Build greedy algorithms
Create a program for changing money optimally
Create a program for maximizing the value of a loot
Create a program for maximizing the number of prize places in a competition
Introduction
1h
Video: LectureLargest Number
2 min
10 min
Video: LectureCar Fueling
7 min
9 min
Video: LectureMain Ingredients of Greedy Algorithms
2 min
3 questions
Grouping Children
6 min
5 min
5 min
Fractional Knapsack
Video: LectureLong Hike
6 min
6 min
2 min
Reading: Resources
2 min
3 questions
Programming Assignment 3
10 min
1h
3h
Survey
Survey
10 min
http://dm.compsciclub.ru/app/list
Interactive Puzzles
These interactive puzzles will help you develop your problem
solving skills. Try them before attempting to solve the coding
challenges described in the Learning Algorithms Through
Programming and Puzzle Solving textbook that powers our
online specialization Data Structures and Algorithms at
Coursera and our MicroMasters program at edX. Some of these puzzles will
also help you to solve problems in our Introduction to Discrete
Mathematics for Computer Science Specialization at Coursera.
Interactive Puzzle: Car Fueling
A car can travel at most 3 miles on a full tank. You want to make as few refills
as possible while getting from A to B. Select the gas stations where you would like
to refill and press the "Start journey" button.
(These interactive puzzles are taken from the Introduction to Discrete Mathematics for Computer
Science specialization. They are optional, but we strongly encourage you to solve them: they will
help you to "invent" the key algorithmic ideas on your own and will help you to solve the
programming challenges. Even if you fail to solve some puzzles, the time will not be lost as you
will better appreciate the beauty and power of the underlying ideas.)
Correct
We cannot refill outside of gas stations, we don't need to refill twice at the same gas station and
we don't consider the initial refill at A, so there can be at most n refills. In some cases, we will
need to refill at every gas station: for example, if the distance between A and the first gas station
is L and the distance between any two neighboring gas stations is also L.
Hi, in this video you will learn to implement the greedy algorithm from the previous video, and
analyze its running time. Here we have the pseudocode for this algorithm, and the procedure is called
MinRefills. And the main input to this procedure is the array x. From the problem statement, we know
that the positions of the gas stations are given by numbers from x1 to xn. And those are measured in
kilometers in terms of distance from A to the corresponding gas station along the path from A to B.
For convenience, we actually add to the array x the position of point A, which is x0 and is the smallest
value in the array, and the position of point B, which is xn plus 1 and is the largest value in the array x.
Play video starting at 49 seconds and follow transcript0:49
Along our route from A to B, we will visit some points. Of course we will start from point A. And then
we'll probably go to some gas station, refill there. And then go to another gas station and to another
gas station and then to another, and at some point we will get to point B, or point xn plus 1.
So we see that we only need to store the positions in the array x. We don't need to consider any
positions between the elements of array x. And so, we will store in the variable currentRefill,
Play video starting at 1 minute 24 seconds and follow transcript1:24
the position in the array x where we're currently standing. And we will initialize it with 0. Because we
start from point A, which is the same as x0, and has index 0 in the array x.
Play video starting at 1 minute 39 seconds and follow transcript1:39
And later currentRefill will store the index in the array x, where we're currently standing.
Play video starting at 1 minute 47 seconds and follow transcript1:47
We'll also store the answer to our problem in the variable numRefills.
Play video starting at 1 minute 53 seconds and follow transcript1:53
At each point in the execution of the algorithm, it will contain the number of refills we have already
made. And we initialize it with zero because the problem statement asks us to count the minimum
number of refills we need to do. Not counting the initial refill at point A. So when we are standing at
point A, we consider that we haven't made any refills yet.
Play video starting at 2 minutes 19 seconds and follow transcript2:19
Then the main external while loop goes.
Play video starting at 2 minutes 24 seconds and follow transcript2:24
And it goes on while we're still to the left from point B, because then we need to go right to reach our
destination B. And we check this condition with this inequality, that currentRefill is at most n. This
means that the position or index in the array x is at most n, and so we're to the left from point B
currently. In this case we still need to go to the right. And first we save our current position in the
array x in the variable lastRefill. This means that we made our lastRefill in the position currentRefill.
Play video starting at 2 minutes 59 seconds and follow transcript2:59
And now we need to go to the right from there, and either get to destination B or get to the rightmost
reachable gas station and refill there. And the next internal while loop does exactly that.
Play video starting at 3 minutes 14 seconds and follow transcript3:14
It gradually increases our currentRefill position in the array x until it reaches the rightmost point in the
array x which is reachable from the lastRefill position.
Play video starting at 3 minutes 26 seconds and follow transcript3:26
So first we check that currentRefill position is at most n because if it is n plus 1 already it means that
we reached our destination B. And there is no point increasing it further. If it's still to the left from B,
then we'll look at the next position to the right, x currentRefill plus 1.
Play video starting at 3 minutes 47 seconds and follow transcript3:47
We need to check whether it's reachable from the lastRefill position or not. First we can compute the
distance from the lastRefill position to the currentRefill plus one position by subtracting the values of
the array x. And if this distance is at most L, then it means that we can travel this distance with a full
tank, without any refills. And of course, at the lastRefill position, we could fill our tank up to the full
capacity. And then we'll be able to travel for L kilometers. So, this inequality checks if actually position
currentRefill plus 1 is reachable from the lastRefill position. If it is, we increase the value of
currentRefill and go on with our internal while loop. When we exit this internal while loop we're
already maybe in the point B, or we may be in some point which is the farthest reachable gas station.
Play video starting at 4 minutes 46 seconds and follow transcript4:46
Now we compare it with our lastRefill position. And if it turns out that it is the same, it means that we
couldn't go to the right. We don't have enough fuel even to get to the next gas station. And then, we
cannot return the minimum number of refills that we need to do on the way from A to B, because it is
impossible to get from A to B at all. And so we return this result IMPOSSIBLE. Otherwise, we moved at
least a bit to the right, and then we need to see. If we are already in the point B, we don't need to do
anything else. Otherwise, we need to refill there. So, we check that we're to the left from point B with
this inequality. And if it's true then we're at some gas station and we need to refuel. So we increase
the numRefills variable by one. And then we return to the start of our external while loop. And there
we again check if we're to the left from point B we need another iteration. And if currentRefill is
already n plus 1, then we've reached point B and we need to exit the external while loop. And in that
case, we just return the answer which is number of refills we've made so far.
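For concreteness, here is a minimal Python sketch of the MinRefills procedure just described; the function name and the example input are our own, while the variables mirror the lecture's currentRefill, lastRefill and numRefills.

```python
# A sketch of MinRefills in Python (names and example are illustrative).
# Assumptions: x is a list of positions with x[0] = A and x[n + 1] = B,
# given in increasing order; n is the number of gas stations; L is the
# maximum distance the car can travel on a full tank.
def min_refills(x, n, L):
    current_refill = 0   # index in x where we are currently standing
    num_refills = 0      # refills made so far (the refill at A is not counted)
    while current_refill <= n:
        last_refill = current_refill
        # Move to the rightmost position reachable from the last refill.
        while current_refill <= n and x[current_refill + 1] - x[last_refill] <= L:
            current_refill += 1
        if current_refill == last_refill:
            return "IMPOSSIBLE"   # cannot even reach the next gas station
        if current_refill <= n:
            num_refills += 1      # we stopped at a gas station, so we refuel
    return num_refills

# Example: A at 0 km, B at 950 km, stations at 200, 375, 550, 750, tank range 400 km.
print(min_refills([0, 200, 375, 550, 750, 950], 4, 400))  # prints 2
```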
Play video starting at 6 minutes 0 seconds and follow transcript6:00
We've implemented the greedy algorithm from the previous lecture. Now let's analyze its running
time. At first glance, it can seem that it works in n squared time, because we have the external
while loop, which can make n iterations, and the internal loop, which can make n iterations. So, n times n is n
squared, but actually we will show that it only makes O of n actions for the whole algorithm. To prove
that let's first look at the currentRefill variable.
Play video starting at 6 minutes 33 seconds and follow transcript6:33
We see that it only changes one by one here.
Play video starting at 6 minutes 40 seconds and follow transcript6:40
And it starts from zero. And what is the largest value that it can attain?
Play video starting at 6 minutes 48 seconds and follow transcript6:48
Of course, the largest value is n plus 1 because this is the largest index in the array x, and currentRefill
is an index in the array x. So, the variable currentRefill starts from zero and changes one by one. And the largest
value it can have is n plus 1. It means that it is increased at most n plus 1 times, which is Big-O of n. But
that's not all we do. Also, we increase variable numRefills here.
Play video starting at 7 minutes 20 seconds and follow transcript7:20
But we also increase it always one by one. It also starts from zero. And what is the largest number
that this variable can attain?
Play video starting at 7 minutes 29 seconds and follow transcript7:29
Well, of course, it is n, because we have n gas stations and there is no point in refueling twice at the
same gas station. So we can refuel at most n times. So the variable numRefills goes from 0 to n and
changes one by one. So it is only changed at most n times.
Play video starting at 7 minutes 49 seconds and follow transcript7:49
And so, it is also linear in terms of n. And so, we have at most n plus 1 iterations of the external while
loop. Everything but the internal while loop takes constant time there: this assignment, this if, and
this if with an assignment.
Play video starting at 8 minutes 6 seconds and follow transcript8:06
And the external and internal while loops combined also spend at most a linear number of iterations,
because they change the variable currentRefill, and it changes at most a linear number of times. So, all in all,
our algorithm works in Big-O of n time.
Play video starting at 8 minutes 25 seconds and follow transcript8:25
Let's go through this proof once again. The Lemma says that the running time of the whole algorithm
is big O of n. And we prove this by first noticing that the currentRefill variable changes only from zero
to at most n plus 1. And the change is always one by one. That the numRefills variable changes from
zero to at most n. And it also changes one by one. So, both these variables are changed Big-O of n
times. And everything else that happens takes constant time per iteration of the external loop, and
there are at most n plus 1 iterations of the external loop. Thus, our algorithm works in linear time. In
the next video, we will review what we've learned about greedy algorithms in general.
Hi. In this video, we will briefly review the main ingredients of greedy algorithms and the first of them
is reduction to a subproblem. Basically, when you have some problem, you make some first move and
thus reduce your problem to a similar problem which is smaller. For example, you have fewer
digits left or fewer gas stations left in front of you, and this similar but smaller problem is called a
subproblem. Another key ingredient is a safe move and the move is called safe if it is consistent with
some optimal solution. In other words, if there exists some optimal solution in which the first move is
the same as your move, then your move is called a safe move and not all first moves are actually safe.
For example, to go until there's no fuel is not a safe move in the problem about car fueling. And often,
greedy moves are also not safe, for example, to get to the closest gas station and refuel at it is not a
safe move while to get to the farthest gas station and refuel there is a safe move. Now the general
strategy of solving a problem goes like this. First, you analyze the problem and you come up with
some greedy choice and then the key thing is to prove that this greedy choice is a safe move and you
really have to prove it. Because, otherwise, you can come up with some greedy choice and then come
up with a greedy algorithm and then even implement it and test it and try to submit it in the system.
Only to learn that the algorithm is incorrect, because the first move is actually not a safe move and
there are cases in which this first move is not consistent with any optimal solution. And in that case,
we will have to invent a new solution and implement it from scratch. All the work you've done before
will be useless. So please prove your algorithms and prove that the first move is a safe move. When
you prove that, you reduce a problem to a subproblem. And hopefully, that is a similar problem, a
problem of the same kind, and then you start solving this subproblem the same way. You make your
greedy choice and you reduce it to subproblem, and you iterate until there are no problems left or
until your problem is so simple that you can just solve it right away. And in the next lessons, we will
apply greedy algorithms to solve more difficult problems.
Celebration Party Problem
Hi, in this lesson we will discuss the problem of
organizing children into groups.
And you will learn that if you use a naive algorithm to solve this problem,
it will work very,
very slowly, because the running time of this algorithm is exponential.
But later, in the next lesson, we will be able to improve the running time
significantly by coming up with a polynomial time algorithm.
Play video starting at 23 seconds and follow transcript0:23
Let's consider the following situation.
You've invited a lot of children to a celebration party, and
you want to entertain them and also teach them something in the process.
You are going to hire a few teachers and
divide the children into groups and assign a teacher to each of the groups;
this teacher will work with this group through the whole party.
Play video starting at 44 seconds and follow transcript0:44
But you know that for a teacher to work with a group of children efficiently,
the children of that group should be of relatively the same age.
More specifically, the ages of any two children in the same group
should differ by at most one year.
Play video starting at 1 minute 1 second and follow transcript1:01
Also, you want to minimize the number of groups.
Because you want to hire fewer teachers, and
spend the money on presents and other kinds of entertainment for the children.
So, you need to divide children into the minimum possible number of groups.
Such that the age of any two children in any group differs by at most one year.
Play video starting at 1 minute 23 seconds and follow transcript1:23
Now, let's look at the pseudo code for the naive algorithm that solves this problem.
Play video starting at 1 minute 29 seconds and follow transcript1:29
Basically, this algorithm will consider every possible partition
of the children into groups and find the partition which both
satisfies the property that the ages of the children
in any group should differ by at most one and contains the minimum number of groups.
Play video starting at 1 minute 49 seconds and follow transcript1:49
We start with assigning the initial value of the number of
groups to the answer m and this initial value is just the number of children.
Because we can always divide all the children into groups of one, and
then of course each group has only one child so the condition is satisfied.
Play video starting at 2 minutes 9 seconds and follow transcript2:09
Then we consider every possible partition of all children into groups.
The number of groups can be variable, so this is denoted by a number k,
and we have groups G1, G2 and up to Gk.
Play video starting at 2 minutes 24 seconds and follow transcript2:24
And then we have a partition,
we first need to check whether it's a good partition or not.
So, we have a variable good which we assigned to true initially,
because we think that maybe this partition will be good.
But then we need to check for each group whether it satisfies our condition or not.
So, we go in a for loop with the index i of the group from 1 to k,
and then we consider the particular group Gi, and
we need to determine whether all the children in this group differ by at most
1 year, or whether there are two children that differ by more.
Play video starting at 2 minutes 59 seconds and follow transcript2:59
To check that, it is sufficient to compare the youngest child with the oldest child.
If their ages differ by more than one, then the group is bad.
Otherwise, every two children differ by at most one year, so the group is good.
And so we go through all the groups in a for loop.
If at least one of the groups is bad,
then our variable good will contain value false by the end.
Otherwise, all the groups are good, and the variable good will contain value true.
So, after this for loop, we check the value of the variable good, and
if it's true, then we improve our answer,
or at least try to improve it,
by taking the minimum of its current value and
the number of groups in the current partition.
And so, by the end of the outer for loop which goes through all the partitions,
our variable m will contain the minimum possible number of groups in a partition
that satisfies all the conditions.
It is obvious that this algorithm works correctly
because it basically considers all the possible variants and
selects the best one from all the variants which satisfy our condition on the groups.
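To make this description concrete, here is a minimal Python sketch of the naive algorithm. The function names are ours, and the partitions are enumerated recursively rather than with the pseudocode's notation, but the logic is the same: try every partition, keep only those whose groups span at most one year, and take the smallest.

```python
# Naive grouping: try every partition of the children into groups.
def partitions(items):
    """Yield every partition of the list `items` as a list of groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing group in turn...
        for i, group in enumerate(smaller):
            yield smaller[:i] + [group + [first]] + smaller[i + 1:]
        # ...or start a new group containing only `first`.
        yield smaller + [[first]]

def min_groups_naive(ages):
    best = len(ages)  # one child per group always satisfies the condition
    for groups in partitions(list(ages)):
        if all(max(g) - min(g) <= 1 for g in groups):
            best = min(best, len(groups))
    return best

print(min_groups_naive([3.5, 3.9, 6.0, 6.5]))  # prints 2
```

Even this small sketch enumerates a number of partitions that grows exponentially with the number of children, which is exactly the behaviour the lower bound below makes precise.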
Play video starting at 4 minutes 14 seconds and follow transcript4:14
Now, let us estimate the running time of this algorithm.
Play video starting at 4 minutes 18 seconds and follow transcript4:18
And I state that the number of operations that this algorithm makes
is at least 2 to the power of n, where n is the number of children in C.
Actually, this algorithm works much slower and
makes many more operations than 2 to the power of n, but
we will just prove this lower bound to show that this algorithm is very slow.
Play video starting at 4 minutes 40 seconds and follow transcript4:40
To prove it, let's consider just the partitions of the children into two groups.
Of course there are many more partitions than that:
we can divide them into two, three, four, and so on,
many more groups.
But, let's just consider partitions in two groups and
prove that even the number of such partitions is already
at least two to the power of n.
Really, if C is a union of two groups,
G1 and G2, then basically we can make such a partition for
any G1 which is a subset of the set C of all children.
For any G1, just make
the group G2 contain all the children which are not in the first group.
And then all the children will be divided into these two groups.
So, now the size of the set of all children is n.
Play video starting at 5 minutes 35 seconds and follow transcript5:35
And if we want to compute the number of possible groups G1, we should
note that each element of the set, or
each child, can be either included in or excluded from the group G1.
So, there can be 2 to the power of n different
groups G1. And so there are at least 2 to the power of n
partitions of the set of all children in two groups.
and it means that our algorithm will do at
least 2 to the power of n operations because this considers every partition.
Among all the partitions, there are all the partitions into two groups.
Play video starting at 6 minutes 16 seconds and follow transcript6:16
So, how long will it actually work?
We see that the naive algorithm works in time Omega(2^n),
so it makes at least 2 to the power of n operations.
And for example, for just 50 children this is at least 2 to the power of 50,
the large number which is on the slide.
This is the number of operations that we would need to make, and I estimate that
on a regular computer this would take at least two weeks to compute,
even if this were exactly the number of operations needed.
So, it works really, really slowly.
But in the next lesson we will improve this significantly.
Hi, in this lesson you will learn how to solve the problem of organizing children into groups more
efficiently. Most specifically, we will come up with a polynomial time algorithm for this problem as
opposed to the exponential type algorithm from the previous lesson. But in order to do this, we first
need to do a very important thing that you should probably do every time before solving an
algorithmic problem. You should reformulate it in mathematical terms. For example, in this problem
we will consider points on the line instead of children. For example, if we have a child of age three
and a half years we will instead consider a point on the line with coordinate 3.5 and if we have
another child of age 6, we will instead consider a point with coordinate 6 on the line.
Play video starting at 51 seconds and follow transcript0:51
Now, what do groups of children correspond to? If we have a group of children, it consists of several
children and several points on the line correspond to this group and the fact that the age of any two
children in the group differs by at most one, means that there exists a segment of length one on this
line that contains all those points.
Play video starting at 1 minute 15 seconds and follow transcript1:15
Now the goal becomes to select the minimum possible number of segments of length one, such that
those segments cover all the points. Then, if we have such segments, we can just put all the points
from the same segment into the same group, and any two children in the group will differ by at most one
year.
Play video starting at 1 minute 40 seconds and follow transcript1:40
Now let's look at an example.
Play video starting at 1 minute 43 seconds and follow transcript1:43
We have a line with a few points on it and we want to cover all the points with segments of length
one. Here is one example of such covering. All the segments on the picture are of the same length and
we consider that this is length one of this line.
Play video starting at 2 minutes 0 seconds and follow transcript2:00
This is not the optimal solution because below there is another example of covering and we have only
three segments and they still cover all the points.
Play video starting at 2 minutes 11 seconds and follow transcript2:11
Now we want to find a way to find the minimum possible number of segments to cover all the points
in any configuration.
Play video starting at 2 minutes 19 seconds and follow transcript2:19
We want to do that using greedy algorithm and you probably remember from the previous lessons
that to come up with a greedy algorithm, you need to do a greedy choice and to prove that this
greedy choice is a safe move.
Play video starting at 2 minutes 33 seconds and follow transcript2:33
I state that in this problem, safe move is to cover the leftmost point with a segment of length one
which starts or has left end in this point.
Play video starting at 2 minutes 46 seconds and follow transcript2:46
To prove that this is really a safe move, we need to prove that there exists an optimal solution with
the minimum possible number of unit length segments such that one of the segments has its left end
in the leftmost point.
Play video starting at 3 minutes 4 seconds and follow transcript3:04
Let's prove that.
Play video starting at 3 minutes 7 seconds and follow transcript3:07
To do that, let's consider any optimal solution of a given problem with a given point.
Play video starting at 3 minutes 14 seconds and follow transcript3:14
Let's consider the leftmost point colored in green.
Play video starting at 3 minutes 20 seconds and follow transcript3:20
It is covered by some segment.
Play video starting at 3 minutes 23 seconds and follow transcript3:23
Colored in red.
Play video starting at 3 minutes 25 seconds and follow transcript3:25
Now, let's move this red segment to the right until its left end is at this leftmost point. I say that we
didn't miss any of the points in the process, because this green point is the leftmost point so there are
no points to the left from it and while we are moving the segment to the right, we didn't miss any of
the points.
Play video starting at 3 minutes 53 seconds and follow transcript3:53
It means that what we have now is still a correct covering because all of the points are still covered
and the number of segments in this covering is the same as in some optimal solution from which we
started and that means that it is also an optimal covering. So we have just found an optimal solution
in which there is a segment which starts in the leftmost point. So, we proved that covering the
leftmost point with a segment which starts in it is a safe move.
Play video starting at 4 minutes 28 seconds and follow transcript4:28
Now that we have a safe move, let's consider what happens after it. We have the leftmost point
covered, and also maybe some other points covered. So we don't need to consider these points
anymore. We are not interested in them and we need to cover all the other points with the minimum
possible number of unit length segments. So this is the same kind of problem which we started with,
so this is a subproblem.
Play video starting at 4 minutes 58 seconds and follow transcript4:58
Basically it means that we have a greedy algorithm. First, make a safe move: add a segment to the
solution with its left end at the leftmost point. Then remove all the points which are already
covered by this segment from the set, and if there are still points left, repeat
this process until there are no points left in the set.
Analysis and Implementation of the Efficient Algorithm
Correct
l = x1 = 5, r = x1 + 1 = 6. i will start from 1, then it will be incremented in
the line before the inner while loop, and then it will be incremented in the inner while loop
until xi becomes greater than r = 6, and that will happen for i = 5, because x4 = 6 ≤ 6
and x5 = 7 > 6.
5
is selected. This is correct.
6
Now let us consider the pseudocode that implements this algorithm. For the sake of simplicity we
assume that the points in the input are given in sorted order from smallest to largest. We'll start with
an empty set of segments denoted by R and we start with index i pointing at the first point which is
the leftmost because the points are sorted.
Play video starting at 25 seconds and follow transcript0:25
Now we go through the points, and we find the leftmost point. Currently i is pointing to the leftmost
point in the set. And at the start of the while loop i will always point to the leftmost point which is still
in the set. Now we cover it with the segment from l to r which has unit length, and the left end in the
point xi, so this is a segment from xi to xi+1. We add this segment to the solution set, and then we
need to remove all the points from the set which are already covered. Instead of removing the points, we
will just move the pointer to the right and forget about the points, which are to the left from the
pointer. So the next while loop, does exactly that. We know that for any i that is larger than the
current i, xi is to the right from the left end of the segment, because the points are sorted. So if xi is
also to the left from r, then it is covered by the segment.
Play video starting at 1 minute 34 seconds and follow transcript1:34
So we just go to the right, and to the right with pointer i. And while xi is less than or equal to r, we
know that the point is covered. And as soon as we find some xi which is bigger than r, it means that
this point is not covered and all the points further in the array are also not covered. So we stop. And
then we repeat again the iteration of the outer while loop. Or maybe our pointer i is already out of
the array of the input points. And then we stop and return R, which is the set of segments that we've
built in the process. Now let's prove that this algorithm works in linear time.
Play video starting at 2 minutes 22 seconds and follow transcript2:22
Indeed, index i changes just from 1 to n. And we always increase it by one. For each value of i, we add
at most one segment to the solution. So overall, we increase i at most n times and add at most n
segments to the solution.
Play video starting at 2 minutes 40 seconds and follow transcript2:40
And this leads to a solution which works in Big-O of n time. Now, we had an assumption that the
points in the input are already sorted. What if we drop this assumption? Then we will have to sort the
points first, and then apply our algorithm PointsCoverSorted.
Play video starting at 2 minutes 59 seconds and follow transcript2:59
Later in this module, you will learn how to sort points in time n log n.
Play video starting at 3 minutes 4 seconds and follow transcript3:04
Combining that with our procedure PointsCoverSorted will give you total running time of n log n.
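Here is a minimal Python sketch of this approach under the same assumptions; the function names are ours, the variables l, r and i mirror the pseudocode, and the points are sorted first to obtain the n log n total.

```python
def points_cover_sorted(x):
    """Greedy covering of sorted points by unit-length segments.
    x: point coordinates in non-decreasing order.
    Returns the list of chosen segments as (l, r) pairs."""
    segments = []
    i = 0
    while i < len(x):
        l, r = x[i], x[i] + 1            # segment starts at the leftmost uncovered point
        segments.append((l, r))
        i += 1
        while i < len(x) and x[i] <= r:  # skip every point this segment already covers
            i += 1
    return segments

def points_cover(x):
    return points_cover_sorted(sorted(x))  # sorting first gives O(n log n) overall

# Example: three unit segments suffice for these six points.
print(points_cover([5.8, 1.5, 7.3, 2.2, 1.9, 5.0]))
# prints [(1.5, 2.5), (5.0, 6.0), (7.3, 8.3)]
```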
Play video starting at 3 minutes 11 seconds and follow transcript3:11
Now let's look at our improvement.
Play video starting at 3 minutes 13 seconds and follow transcript3:13
We first implemented a straightforward solution, which worked in time at least 2 to the power of n.
Play video starting at 3 minutes 20 seconds and follow transcript3:20
And it worked very, very slowly. So that even for 50 children, we would have to spend at least 2
weeks of computation to group them.
Play video starting at 3 minutes 30 seconds and follow transcript3:30
Our new algorithm, however, works in n log n time. And that means that even if we had 10 million
children coming to a party, it would spend only a few seconds grouping them optimally into several
groups. So that's a huge improvement.
Play video starting at 3 minutes 46 seconds and follow transcript3:46
Now let's see how we arrived at this result. First, we invented a naive solution which worked in
exponential time. It was too slow for us so we wanted to improve it. But to improve it, the very first
important step was to reformulate it in mathematical terms.
Play video starting at 4 minutes 4 seconds and follow transcript4:04
And then we had an idea to solve it with a greedy algorithm. So, we had to find some greedy choice
and prove that it will be a safe move. In this case, the safe move turns out to be to add to the solution
a segment with its left end at the leftmost point. And then we prove that this is really a safe move. It is
very important to prove your solutions before even trying to implement them. Because otherwise, it
could turn out that you implemented the solution, tried to submit it, got wrong answer or some other
result, different from accepted. And then, you've made a few more changes, but it still didn't work.
And then, you understand that your solution was wrong, completely from the start. And then you
need a new solution, and you will have to implement it from scratch. And that means that you've
wasted all the time on implementation on the first wrong solution. To avoid that, you should always
prove your solution first.
Play video starting at 5 minutes 6 seconds and follow transcript5:06
So after we've proved the safe move, we basically got our greedy solution.
Play video starting at 5 minutes 11 seconds and follow transcript5:11
Which works in combination with a certain algorithm in time n log n.
Play video starting at 5 minutes 16 seconds and follow transcript5:16
Which is not only polynomial, but is very close to linear time, and works really fast in practice.
Fractional Knapsack
Long Hike
In the initial "Long Hike" problem, we were given food items, their total weights and energy values,
and we wanted to maximize the total energy value of fractions of food items that fit into the
knapsack of capacity W. In the new mathematical formulation, we are again given a knapsack of
capacity W and some items. We know the weights and values of the items, and we want to
maximize the total value of fractions of items that fit into the knapsack. What do the weights and
values in this mathematical formulation correspond to in the initial Long Hike problem?
Weights correspond to the weights of the food items and values correspond to the energy values
(calories).
Correct
Indeed, we maximize the total value, which is the total energy value in this case, with a restriction on the
total weight.
Weights correspond to the weights of the food items and values correspond to the energy values
(calories).
is selected. This is correct.
Weights correspond to the energy values (calories) and values correspond to the weights of the
food items.
You are given a knapsack of capacity 7 kg and 3 items. First item has value $20 and weight 4 kg,
second item has value $18 and weight 3 kg, third item has value $14 and weight 2 kg. What is the
maximum total value of the fractions of items that fit into the knapsack in this case?
38
40
42
Correct
Turns out that $42 is the optimal value! In this video you will learn an algorithm that solves the
problem optimally. If you apply that algorithm to this case, you will get total value $42.
42
is selected. This is correct.
43
Hello. In this lesson, you will learn an algorithm to determine which food items, and in which amounts,
you should take with you on a really long hike so as to maximize their total energy value.
Play video starting at 13 seconds and follow transcript0:13
So, you're planning a long hike. It will take a few days or maybe a few weeks, but you don't know
exactly how long it will take. So, to be safe, you need to take enough food with you. And you have a
knapsack which can fit up to 15 kilograms of food in it. And you've already bought some cheese, some
ham, some nuts, and maybe some other food items. You want to fit them all in the knapsack, so as to
maximize the amount of calories that you can get from them. Of course you can cut the cheese. You
can cut the ham. You can select only some of the nuts. And then fit all of that into your knapsack.
Play video starting at 49 seconds and follow transcript0:49
To solve this maximization problem, we again need to first reformulate it in mathematical terms. And
then it becomes an instance of a classical fractional knapsack problem, which goes like this. We have
n items with weights w1 through wn and values v1 though vn.
Play video starting at 1 minute 9 seconds and follow transcript1:09
And a bag of capacity big W. And we want to maximize the total value of fractions of items that fit
into this bag.
Play video starting at 1 minute 20 seconds and follow transcript1:20
In this case, weights are also weights in the real life and values are the energy values of the food items
that you've bought.
Play video starting at 1 minute 31 seconds and follow transcript1:31
So, here's an example, and we will denote by dollars the value of the item, and the weight just by
numbers. So, for example, the first item has value $20 and has weight 4, the second item has value
$18 and weight 3, and the third item has value $14 and weight 2. And we have a knapsack of capacity
7. There are a few ways in which we can fill this knapsack. For example, one of them is to put the whole
first item and the whole second item in the knapsack. Then the total value is the sum of the values of
the first item and the second, which is $38. We can improve on that. For example, take the whole first
item, the whole third item, and only one third of the second item for a total value of $40. We can do
even better by taking the whole third item, the whole second item, and only half of the first item, and
that will give us $42. And actually it turns out that this is the optimal thing to do.
Play video starting at 2 minutes 44 seconds and follow transcript2:44
So now we want to create a greedy algorithm that will solve this maximization problem, and we need
to get some greedy choice and make a safe move. And to do that, we have to look at the value per
unit of weight. So, for example for the first item, value per unit of weight is $5, for the second item,
it's $6 per unit, and for the third one it's $7 per unit. So although the first item is most valuable, the
third item has the maximum value per unit. And of course there is an intuition that we should
probably fit first the items with the maximum value per unit.
Play video starting at 3 minutes 32 seconds and follow transcript3:32
And really, the safe move is to first try to fit the item with the maximum value per unit. And there's a
lemma that says that there always exists some optimal solution to our problem that uses as much as
possible of an item with the maximum value per unit of weight. And what do we mean by as much as
possible? Well, either use the whole item, if it fits into the knapsack, or, if the capacity of the
knapsack is less than how much we have of this item, then just fill the whole knapsack only with this
item.
Play video starting at 4 minutes 9 seconds and follow transcript4:09
Let's prove that this is really a safe move.
Play video starting at 4 minutes 13 seconds and follow transcript4:13
We will prove it by looking at this example. So, first let's suppose we have some optimal solution,
Play video starting at 4 minutes 21 seconds and follow transcript4:21
and let's suppose that in this optimal solution, we don't use as much as possible of the best item with
the highest value per unit of weight. Then take some item which we used in this solution and separate
its usage into two parts, one part of the same size as how much we have of the best item, and the
second part is everything else. Then we can substitute the first part with the best item. So, for
example, in this case, we substitute half of the first item with the second item.
Play video starting at 5 minutes 3 seconds and follow transcript5:03
Of course, in this part, the total value will increase, because the value per unit of weight is better for
the best item than for the item currently used. And in the general case, this will also work. So, either
we will be able to replace some part of the item already used by the whole best item, or we can
replace the whole item that is already used by some part of the best item. And in any case, if we can
make such a substitution, of course the total value will increase, because the best item just has better
value per unit of weight, so for each unit of weight, we will have more value. So this gives us a greedy
algorithm to solve our problem.
Play video starting at 5 minutes 56 seconds and follow transcript5:56
What we'll do is: while the knapsack is still not full, we will make a greedy choice. We will choose the item
number i which has the maximum value of vi over wi, which is the value per unit of weight. And then,
if this item fits into the knapsack fully, then take all of this item. Otherwise, if there is only a little space left
in the knapsack, take just enough of this item to fill the knapsack to the end. And then in the end, we'll
return the total value of the items that we took and how much we took of each item.
1,3,2,4
1,2,3,4
Correct
For the first item, the value per unit of weight is 2/1 = 2, for the second it is 3/2 = 1.5, for the
third it is 4/3 = 1.333…, and for the fourth it is 5/4 = 1.25. Since 2 > 1.5 > 1.333… > 1.25, the
correct order is "1,2,3,4".
Hi.
In this lesson you will learn how to implement the Greedy Algorithm for
the Fractional Knapsack.
How to estimate its running time and how to improve its asymptotics.
Here is the description of the greedy algorithm from the previous lesson.
While knapsack is still not full, we select the best item left.
The one with the highest value per unit of weight.
And either fit all of this item in the knapsack or
if there is only a little space left in the knapsack, cut this item and
fit as much as you can in what's left in the knapsack, and
then repeat this process until the knapsack is full.
In the end return the total value of the items taken and the amounts taken.
We've proven that the selection of best item is a safe move.
Then after we've selected the best item what we've got left is a knapsack with
a capacity which is less, but the problem is the same: you have some items and
you have a knapsack of some capacity and you should fill it optimally so
as to maximize the total value of the items that fit.
So this greedy algorithm really works.
Now let's implement it.
Play video starting at 1 minute 15 seconds and follow transcript1:15
Here we have a procedure called Knapsack.
It starts by filling the array A of amounts of items taken with 0, and
Play video starting at 1 minute 25 seconds and follow transcript1:25
the total value we also initialize to 0 and then,
as we said on the slide, we repeat for n times the following iterations.
If the knapsack is already full, then in the variable W
we will have 0, because at the start the variable W holds the total
capacity of the knapsack, but each time we put something in the knapsack, we will
update W and decrease it by the amount of weight that we have already put in.
And so in the end, when the knapsack is full, W will be 0.
If W became 0, it means that we should just return the total value and
the amounts of the items that we took.
Play video starting at 2 minutes 7 seconds and follow transcript2:07
Otherwise we should select the best item.
The best item is an item which is still left, so wi is more than 0, and,
out of such items, it's the item with the maximum value per unit of weight,
so the one which maximizes vi over wi.
Play video starting at 2 minutes 25 seconds and follow transcript2:25
When we've selected i, we determine the amount a which we will take:
it is either the whole wi, the whole of this item, if it fits in the knapsack.
Otherwise, if the remaining capacity of the knapsack is less,
then we just fill it up to the end.
So a is the minimum of wi and big W.
Play video starting at 2 minutes 45 seconds and follow transcript2:45
After we select the amount, we just update all the variables.
So we update wi by decreasing it by a, because we already took a of this item.
We also increase the amount
A of i, corresponding to the item number i, by the value a.
And we also decrease the remaining capacity W by a,
because we have just put a units of item i into the knapsack.
Play video starting at 3 minutes 13 seconds and follow transcript3:13
We also increase the value V by this formula:
a multiplied by vi and divided by wi.
Why is that?
Because we took a of item number i, that is, a units, and
one unit brings us value equal to vi over wi.
So if we take a units, the total value brought by these units is
a multiplied by vi and divided by wi.
Play video starting at 3 minutes 42 seconds and follow transcript3:42
After we do n iterations, or maybe less, if the knapsack is full before
we do all n iterations, we'll return the total value and the amounts in the array.
Play video starting at 3 minutes 55 seconds and follow transcript3:55
Now the running time of this algorithm is Big-O of n squared.
Why is that?
Well, first we have the inner selection of best item, which works in linear time.
Because basically, we have to go through all the items to select the best one.
And we have the main loop, for loop, which is executed n times at most, maybe less.
So in each iteration we do some linear time computation and
we do this at most n times.
That means that the total running time is Big-O of n squared.
Play video starting at 4 minutes 34 seconds and follow transcript4:34
Now, we can improve on that, because if we sort the items by decreasing value of
vi over wi, then it will be easier to select the best item which is still left.
Let's look at this pseudo code.
Let's assume that we've already sorted the input items,
such that v1 over w1 is greater than or equal to v2 over w2, and that is greater than
or equal to the fraction for the next item, and so on up to vn over wn.
Play video starting at 5 minutes 9 seconds and follow transcript5:09
And we can start with the same array of amounts and
the same total value filled with zeroes.
But then we make a for loop for i going from 1 to n.
And on each iteration, i will be the best item which is still left.
So on the start of the iteration we check whether we still have some capacity
in the knapsack.
If it is already filled we just return the answer.
Otherwise we know that i is the best item
because we didn't consider it previously and it is the item with the maximum
value per unit out of those which we didn't consider before.
So we determine the amount of this item with the same formula and
we update the weights, the amounts, the capacity, and
the total value in the same way as we did in the previous pseudocode.
The only change is that we change the order in which we consider the items.
And this allows us to make each iteration in constant time instead of linear time.
So, this new algorithm now works in linear time, because it has at most n iterations,
and each iteration works at most in constant time.
So, if we apply first some sorting algorithm
to sort items by decreasing value of vi over wi.
And then apply this new knapsack procedure.
Total run time will be n log n, because sorting will work in n log n.
And the knapsack itself will work in linear time.
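As a minimal Python sketch of this faster version (the function name is ours; the lecture's pseudocode keeps separate arrays of values, weights and amounts, while this sketch only returns the total value), one can sort the items by value per unit of weight and sweep through them once:

```python
def fractional_knapsack(capacity, items):
    """items: list of (value, weight) pairs; capacity: knapsack capacity W.
    Returns the maximum total value of fractions of items that fit."""
    # Greedy order: best value per unit of weight first.
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    total_value = 0.0
    for value, weight in items:
        if capacity == 0:
            break
        a = min(weight, capacity)           # take as much of this item as fits
        total_value += a * value / weight   # each unit of weight brings value/weight
        capacity -= a
    return total_value

# The example from this lesson: capacity 7, items ($20, 4 kg), ($18, 3 kg), ($14, 2 kg).
print(fractional_knapsack(7, [(20, 4), (18, 3), (14, 2)]))  # prints 42.0
```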
Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.
Reading
Knapsack: Section 6.5 of [BB]
References
[BB] Gilles Brassard and Paul Bratley. Fundamentals of Algorithmics. Prentice-Hall, 1996.
Programming assignment 3
http://dm.compsciclub.ru/app/quiz-balls-in-boxes
http://dm.compsciclub.ru/app/quiz-activity-selection
http://dm.compsciclub.ru/app/quiz-touch-all-segments
Week 4
Algorithmic Toolbox
Week 4
Discuss and ask questions about Week 4.
Go to forum
Divide-and-Conquer
In this module you will learn about a powerful algorithmic technique called Divide and Conquer.
Based on this technique, you will see how to search huge databases millions of times faster than
using naïve linear search. You will even learn that the standard way to multiply numbers (that you
learned in grade school) is far from being the fastest! We will then apply the divide-and-
conquer technique to design two efficient algorithms (merge sort and quick sort) for sorting huge
lists, a problem that finds many applications in practice. Finally, we will show that these two
algorithms are optimal, that is, no algorithm can sort faster!
Key Concepts
Express the recurrence relation on the running time of an algorithm
Create a program for searching huge lists
Create a program for finding a majority element
Create a program for organizing a lottery
Introduction
10 min
Video: LectureIntro
3 min
Video: LectureLinear Search
7 min
Ungraded External Tool: Interactive Puzzle: Two Adjacent Cells of Opposite
Colors
10 min
Video: LectureBinary Search
7 min
8 min
Reading: Resources
10 min
4 questions
Polynomial Multiplication
6 min
7 min
6 min
Reading: Resources
5 min
3 questions
Master Theorem
4 min
9 min
Reading: Resources
10 min
1 question
Sorting Problem
Video: LectureProblem Overview
2 min
Video: LectureSelection Sort
8 min
Video: LectureMerge Sort
10 min
Video: LectureLower Bound for Comparison Based Sorting
12 min
7 min
Reading: Resources
5 min
Practice Quiz: Sorting
4 questions
Quick Sort
Video: LectureOverview
2 min
Video: LectureAlgorithm
9 min
Video: LectureRandom Pivot
13 min
15 min
Video: LectureEqual Elements
6 min
Video: LectureFinal Remarks
8 min
Reading: Resources
10 min
4 questions
Programming Assignment 4
3h
http://dm.compsciclub.ru/app/quiz-opposite-colors
Binary Search
For the "not found" case: based on the midpoint idea
Hi, so let's talk now about binary search.
Play video starting at 4 seconds and follow transcript0:04
A dictionary is a good example of an ordered list.
Okay, basically where every word is in order.
And that makes finding words much easier.
You can imagine how difficult it would be to search a dictionary
if the order of the words was random.
You'd have to just search through every single page, and in fact,
every word on every page.
It'd take quite a long time.
Play video starting at 25 seconds and follow transcript0:25
So let's look at the problem statement for searching in a sorted array.
So what we have coming in is, A, an array, along with a low and
upper bound that specify the bounds within the array in which to search.
What's important about the array is that it's in sorted order.
What we mean by that is, if we look at the element at any index i,
and then at the next element, this first element is no more than the next element.
We don't say less than because we want to allow for
arrays that have repeated elements.
Play video starting at 58 seconds and follow transcript0:58
So officially this is called a monotonic non-decreasing array.
Play video starting at 1 minute 4 seconds and follow transcript1:04
The other input is the key to look for.
Play video starting at 1 minute 7 seconds and follow transcript1:07
The output for this is an index such that the element at
that index in the array is equal to the key.
We say an element and not the element, just as we did in linear search,
because of the fact that there may be more than one element --
more than one element that matches, because there may be duplicates in the array.
Play video starting at 1 minute 28 seconds and follow transcript1:28
If we don't have a match,
instead of returning NOT_FOUND as we did in the linear search case,
we're going to actually return somewhat more useful information,
which is where in the array would you actually
insert the element if you wanted to insert it?
Or where would it have been, if it were there?
So what we're going to return is the greatest index,
such that A sub i is less than k.
That is, if the key is not in the array,
we're returning an index such that if you look at the element at that index,
it's less than the key but the next element is greater than the key.
Play video starting at 2 minutes 7 seconds and follow transcript2:07
And we do have to take account of the fact that what if every element in
the array is greater than the key?
In that case, we're going to go ahead and return low - 1.
Play video starting at 2 minutes 19 seconds and follow transcript2:19
So look at an example.
We've got this array with 7 elements in it, and the element 20 is repeated in it.
So if we search in this array for 2, we want to go ahead and return 0,
saying that every element in the array is larger than this.
If on the other hand, we look for 3, we're going to return 1.
If we look for 4, we're also going to be returning 1.
which really signifies between 1 and 2.
That is, it's bigger than 3 but it's less than 5.
Play video starting at 2 minutes 46 seconds and follow transcript2:46
If we search for 20, we return 4.
Or we might also return 5.
Either one of those is valid because 20 is present at each of those indexes.
And if we search for 60, we'll return 7.
But if we search for 70, we'll also return 7.
Play video starting at 3 minutes 2 seconds and follow transcript3:02
So let's look at our implementation of BinarySearch.
So we're going to write a recursive routine, taking in A, low, high and key,
just as we specified in the problem statement.
Play video starting at 3 minutes 12 seconds and follow transcript3:12
First our base case.
If we have an empty array, that is if high is less than low, so
no elements, then we're going to return low-1.
Play video starting at 3 minutes 22 seconds and follow transcript3:22
Otherwise, we're going to calculate the midpoint.
So we want something halfway between low and high.
So what we're going to do is figure the width, which is high - low,
cut it in half, so divide by 2, and then add that to low.
That might not be an integer
because of the fact that high - low divided by 2 may give us a fractional portion,
so we're going to take the floor of that.
Play video starting at 3 minutes 49 seconds and follow transcript3:49
For example, in the previous case, we had 1 to 7: 7 - 1 is 6,
divided by 2 is 3, plus our low of 1 is 4, so the midpoint would be 4.
We'll see an example of this shortly.
Play video starting at 4 minutes 1 second and follow transcript4:01
And now we check and see is the element at that midpoint equal to our key.
If so, we're done, we return it.
If not, the good news is of course,
we don't have to check all the other elements, we've ruled out half of them.
So if the key is less than the midpoint element,
then all the upper ones we can ignore.
So we're going to go ahead and now return the BinarySearch in A from low to mid- 1,
completely ignoring all the stuff over here.
Otherwise, the key is greater than the midpoint, and again,
we can throw away the lower stuff and go from midpoint + 1, all the way to high.
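Below is a minimal Python sketch of this recursive binary search, assuming a 0-indexed list rather than the 1-indexed array used in the lecture; the function name and the sample data are illustrative, not taken from the course materials.

```python
def binary_search(a, low, high, key):
    """Search the sorted list a[low..high] (inclusive) for key.

    Returns an index holding key if it is present; otherwise returns the
    greatest index i with a[i] < key (low - 1 if every element exceeds key).
    """
    if high < low:                                   # empty range: key is not here
        return low - 1
    mid = low + (high - low) // 2                    # floor of the midpoint
    if a[mid] == key:
        return mid
    elif key < a[mid]:
        return binary_search(a, low, mid - 1, key)   # discard the upper half
    else:
        return binary_search(a, mid + 1, high, key)  # discard the lower half

# Illustrative data: searching for 50 in an 11-element sorted list.
a = [3, 5, 8, 10, 12, 15, 18, 20, 20, 50, 60]
print(binary_search(a, 0, len(a) - 1, 50))  # -> 9 (index 10 in 1-based terms)
```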
Play video starting at 4 minutes 41 seconds and follow transcript4:41
Let's look at an example.
So let's say we're searching for the key 50 in this array with 11 elements.
So we'll do a binary search on this array, from 1 to 11, looking for 50.
Low is 1, high is 11.
We'll calculate the midpoint, the midpoint will be 11- 1 is 10,
divided by 2 is 5, add that to 1, the midpoint is 6.
And now we check and see is the midpoint element equal to 50?
Well, no.
The midpoint element is 15 and the element we are looking for,
the key we're looking for, is 50.
So we're going to go ahead and ignore the lower half of the array and
now call binary search again, with the low equal to 7, so one more than the midpoint.
Play video starting at 5 minutes 25 seconds and follow transcript5:25
So now we've got a smaller version of the problem.
We're looking for 50 within the elements 7 to 11, we'll calculate the midpoint.
11- 7 is 4 divided by 2 is 2, so
we'll add that to 7 to get a midpoint of 9.
We check, is the element at index 9 equal to our key?
The element at index 9 is 20, our key is 50, they're not equal.
Play video starting at 5 minutes 49 seconds and follow transcript5:49
However, 50 is greater than 20, so we're going to go ahead and
make a new recursive call with midpoint + 1, which is 10.
So, again, we do our binary search from 10 to 11.
We calculate the midpoint.
Play video starting at 6 minutes 4 seconds and follow transcript6:04
High - low, 11 - 10, is 1; divided by 2 is one-half; plus 10 is 10 and a half.
We take the floor of that and get 10, so our midpoint is 10.
And now we check.
Is the value at element 10 equal to our key?
Well the value at element 10 is 50, our key is 50 so yes.
We're going to go ahead and return that midpoint which is 10.
In summary then, what we've done is broken our problem into
non-overlapping subproblems of the same type.
We've recursively solved the subproblems.
And then we're going to combine the results of those subproblems.
We broke the problem into a problem of size half
(slightly less than half).
We recursively solved that single subproblem and
then we combined the result very simply just by returning the result.
Play video starting at 6 minutes 53 seconds and follow transcript6:53
In the next video, we're going to go ahead and look at the runtime for
binary search,
along with an iterative version.
And we'll get back to the dictionary translation problem that we
discussed earlier.
Play video starting at 7 minutes 9 seconds and follow transcript7:09
We'll see you shortly.
Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.
1 point
1023
10
11
2.Question 2
Can you use binary search to find number 8 in the array [1, 24, 25, 23, 17, 8, 9]?
1 point
3.Question 3
You have a sorted array with 1023 elements. You use binary search to determine whether number
239 is present in this array or not. How many elements of the array will you compare it with if
number 239 is not present in this array?
1 point
10
1023
4.Question 4
What is the maximum number of iterations a binary search will make to find some number in the
array [1, 2, 3, 5, 8, 13, 21, 34]?
1 point
Polynomial multiplications:
Polynomial Multiplication
6 min
7 min
6 min
Reading: Resources
5 min
3 questions
Problem Overview and Naïve Solution
In this lecture we're going to talk about a more complicated divide-and-conquer algorithm to solve
polynomial multiplication. So first we'll talk about what polynomial multiplication is. So polynomial
multiplication is basically just taking two polynomials and multiplying them together. It's used in a
variety of ways in computer science. Error correcting codes, if you want to multiply large integers
together. Right, so if you've got thousand digit integers and you want to multiply them together,
there's a quicker way than doing it the normal way you learned in elementary school. And that uses
the idea of multiplying polynomials. It is used for generating functions, and for convolution. Let's look
at an example. So let's say you have polynomial A, which is 3 x squared + 2x + 5, and polynomial B,
which is 5 x squared + x + 2. If you multiply them together you get 15 x to the fourth + 13 x cubed + 33
x squared + 9x + 10. Why is that? Well, let's look, for instance the 15 x to the fourth comes from
multiplying 3 x squared times 5 x squared, that's 15x to the fourth. The 10 comes from multiplying 5
by 2. The 13 x cubed comes from 3 x squared times x, which is 3 x cubed, plus 2x times 5 x squared,
which is 10 x cubed. For a total of 13 x cubed. So let's look at the problem statement. So we're going
to have two polynomials of degree n - 1, all right? a sub n-1 is the coefficient of x to the n-1, all the
way down to a0, which is the coefficient of the x to the 0 term, or the constant term.
Play video starting at 1 minute 41 seconds and follow transcript1:41
And then we similarly have a b polynomial as well.
Play video starting at 1 minute 45 seconds and follow transcript1:45
Now first you may wonder what happens if you actually want to multiply polynomials that don't
happen to have the same degree? What if you want to multiply a degree three polynomial times a
degree two polynomial? Right, where the degree is just the exponent of the highest term.
Play video starting at 2 minutes 4 seconds and follow transcript2:04
Well in that case, what you could do is just pad out the smaller polynomial, the lower degree
polynomial, to have zeros for its earlier coefficients. I'll give an example of that in just a second. And
then the product polynomial is the result that we want to come up with so that's a higher degree
polynomial, right? If our incoming polynomials, are degree n- 1, then we're going to get a term of the
x to the n- 1 in a, times x to the n- 1 in b, and that's going to give us an x to the 2n- 2 in the c term. So,
the c sub 2n-2 term, comes about from multiplying the a sub n-1 term and the b sub n-1 term. The c
sub 2n-3 term comes from the a sub n-1, b sub n-2, and a sub n-2, b sub n-1. So it's got two terms that
multiply together. The c sub 2n-4 term would have three terms that multiply together. And we have
more and more terms that get multiplied and summed together, and then fewer and fewer back
down. So c sub 2 has three pairs which get added together, c sub 1 has two pairs and c sub 0 has one
pair. So here's an example. This is actually the same example we had before. So n is three and all we
need, notice, are the coefficients. We don't actually need to have the x's written out. So 3, 2, and 5
means 3 x squared plus 2x plus 5. 5, 1, 2 means 5 x squared plus x plus 2.
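In symbols, the coefficients of the product C(x) = A(x)B(x) described above form a convolution of the two coefficient sequences; this is a standard identity, stated here for reference:

```latex
c_k \;=\; \sum_{\substack{i + j = k \\ 0 \le i,\, j \le n-1}} a_i\, b_j,
\qquad k = 0, 1, \ldots, 2n - 2 .
```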
Play video starting at 3 minutes 38 seconds and follow transcript3:38
What if B were only a degree one polynomial? It was just x plus 2. Well then we would set B equal 0,
1, 2. That is, B's x squared term is 0 x squared. So A(x) is this, B(x) is that. When you multiply them
together, we get the same result we got before. And now we just pluck off the coefficients here, so
the 15, the 13, the 33, the 9, and the 10. And that's our resulting answer: those coefficients. So let's
look at a naive algorithm to solve this.
Play video starting at 4 minutes 15 seconds and follow transcript4:15
The naive algorithm basically just says, well first off, let's create a product array. This is basically going
to be the C, the result, and it's going to be of highest degree 2n-2. So it's going to have 2n-1 terms all
the way from the 0 term up to the 2n-2 term. So we'll initialize it to 0, and then we'll have a nested for
loop. For i equals 0 to n-1, for j equals 0 to n-1. And at every time, what we'll do is we will calculate a
particular pair. So we'll calculate the A[i], B[j] pair, multiply them together and add them into the
appropriate product. Which is the appropriate product to put it in? It's the i + j case. As an example,
when i is 0 and j is 0, we calculate A at 0 times B at 0 and we add that to product at 0. So that says the
two zero-degree terms in A and B get multiplied together to the zero-degree term in C. At the other
extreme, if i is n-1 and j is n-1, we take A at n-1 times B at n-1 and we store that in product at
2n-2. As you can see, the intermediate values in product are going to have more terms added to them
than the edges.
Play video starting at 5 minutes 41 seconds and follow transcript5:41
And, of course, then we return the product. How long does this take? Well, this takes order n
squared. We've got one smaller for loop that goes from 0 to 2n-2, so that's order n,
and then a nested for loop, where i goes from 0 to n-1 and j goes from 0 to n-1; the outer loop
runs n times and the inner body runs n squared times in total. So our runtime is O(n squared).
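A minimal Python sketch of this naive O(n^2) algorithm, assuming both polynomials are given as coefficient lists of the same length n with the degree-0 coefficient first (pad the shorter one with zeros, as described above); the function name is illustrative.

```python
def multiply_naive(a, b):
    """Multiply two polynomials given as equal-length coefficient lists.

    a[i] and b[j] are the coefficients of x**i and x**j (lowest degree first).
    The product has 2n - 1 coefficients, for degrees 0 .. 2n - 2.  O(n^2) time.
    """
    n = len(a)
    product = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            product[i + j] += a[i] * b[j]   # x**i * x**j contributes to x**(i+j)
    return product

# The lecture's example, written lowest degree first:
# A(x) = 3x^2 + 2x + 5, B(x) = 5x^2 + x + 2.
print(multiply_naive([5, 2, 3], [2, 1, 5]))  # -> [10, 9, 33, 13, 15]
```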
Play video starting at 6 minutes 11 seconds and follow transcript6:11
In the next video, we're going to look at a divide and conquer algorithm to solve this problem.
Although, we'll see that it too will be somewhat naive.
Polynomial Multiplication
Submit your assignment
Master Theorem
4 min
9 min
Reading: Resources
10 min
1 question
Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.
Reading
Master Theorem: Section 2.2 of [DPV08]
References
[DPV] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition).
McGraw-Hill Higher Education. 2008.
Master Theorem
Submit your assignment
Sorting problem:
Problem Overview
Hello, and welcome to the sorting problem lesson.
Play video starting at 5 seconds and follow transcript0:05
As usual, we start with a problem I'll review.
Play video starting at 10 seconds and follow transcript0:10
So sorting is a fundamental computational problem. Your input in this problem consists of a sequence
of elements, and your goal is to output these elements in, for example, non-decreasing order.
Play video starting at 24 seconds and follow transcript0:24
The formal statement of this problem is as follows. You are given a finite sequence of elements. We
will usually denote the sequence by A throughout this lesson. And your goal is to output these same
elements in non-decreasing order.
Play video starting at 40 seconds and follow transcript0:40
Once again, sorting is an important computational task used in many efficient algorithms. For some
algorithms, it is important to process the given elements in non-decreasing order, going from
smaller ones to larger ones. In some other algorithms, just by sorting your input data, you gain the
possibility to perform your queries much more efficiently. A canonical example of such a situation is
the search problem. In this problem, you are given a finite sequence of elements, and your goal is to check
whether a particular element is present in your sequence. A simple way to solve this problem is, of
course, just to scan your input sequence from left to right and to check whether your element is
present in this sequence. This gives you a linear-time algorithm. And you know already that if your
input sequence is sorted, then you can do this much faster: basically in logarithmic time
in the size of your input sequence. So you first compare your element to the
middle element.
Play video starting at 1 minute 45 seconds and follow transcript1:45
If it is equal to your element, then you are done; if it is not, you continue with the left half of your sequence
or the right half of your sequence. So with a logarithmic number of comparisons, in the worst case, you
will be able to say whether your element is present in this sequence or not. So, if you are given a
sequence and you are expecting many such queries, that is, you expect to be asked to check whether a
given object is present or not for many such objects, then it just makes sense to first sort your input
data and only then perform all these queries. This will give you a much more efficient algorithm in
general. All right. And this is only a small example. We will see many other situations where sorting
your data first helps to perform queries much more efficiently. So in the subsequent videos of this
lesson, we will study many efficient sorting algorithms.
Selection Sort
In this video, we will study one of the simplest sorting algorithms, called selection sort.
Play video starting at 6 seconds and follow transcript0:06
Its main idea is quite simple: we just keep growing the sorted part of our array. So let me illustrate
it on a toy example. Assume we're given a sequence of length five, consisting of the integers eight,
four, two, five, and two. We start just by finding one of the minimum elements in this array; in this case it
is two. Now let's just do the following: let's just swap it with the first element of our array.
Play video starting at 37 seconds and follow transcript0:37
After swapping, two stays in its final position: two is the minimum value of our array and it is
already in its first position. Now let's do the following: let's just forget about this element. It is already
in its final position, and let's repeat the same procedure with the remaining part of our array. Namely,
we again first find the minimum value, which is again two. We swap it with the first element of the
remaining part and then we forget about this element as well. Again, we find the minimum value,
which is now four, and swap it with the first element of the remaining part.
And then, we just forget about the first three elements and we continue with the
remaining part. So once again, we just keep growing the sorted part of our array. In the end, what
we have, is that the whole array is sorted. The pseudocode shown here on the slide, directly
implements the idea of the selection sort algorithm that we just discussed.
Play video starting at 1 minute 40 seconds and follow transcript1:40
So here we have a loop where i ranges from 1 to n. Initially, i is equal to 1. Inside this loop, we
compute the index of a minimal value in the array within the range from i to n. We do this as
follows: we create a variable minIndex, which is initially equal to i. And then we go through all the
remaining elements inside this part, I mean through elements from i + 1 to n. And if we find a smaller
element, we update the variable minIndex. So at the end of this for loop, what we have is that
minIndex is the position of a minimal element inside the array from i to n.
Play video starting at 2 minutes 25 seconds and follow transcript2:25
Then we swap this element with the element A[i].
Play video starting at 2 minutes 30 seconds and follow transcript2:30
Namely, when i is equal to one, what we've done is found the minimal element in the whole array
and swapped it with the first element. So now, the first element of our array is in its final
position. Then on the second iteration of our loop, we do the same: we find the position of a minimum
value inside the remaining part of our array and put it in the second
place. On the third iteration we find the minimum value in the remaining part and put it in the third place, and
so on. So we keep growing the sorted part of our array. It would be useful to check the
online visualization to see how it goes, so let's do this.
Play video starting at 3 minutes 15 seconds and follow transcript3:15
This visualization shows how selection sort algorithm performs on a few different datasets. Namely on
the random datasets, on a sequence which is nearly sorted. Also on a sequence which is sorted in
reversed order. And on a sequence which contains just a few unique elements. So let's run this
algorithm and see what happens.
Play video starting at 3 minutes 44 seconds and follow transcript3:44
So you can see that indeed this algorithm just grows the sorted region, the sorted initial region of our
array. Another interesting property revealed by this visualization is the following: the
running time of this algorithm does not depend on the input data. It only depends on the size
of our initial sequence.
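A minimal Python sketch of the selection sort just described, using a 0-indexed list (the lecture's pseudocode is 1-indexed); the function name is illustrative.

```python
def selection_sort(a):
    """Sort the list a in place in O(n^2) time."""
    n = len(a)
    for i in range(n):
        # Find the position of a minimum element in a[i..n-1].
        min_index = i
        for j in range(i + 1, n):
            if a[j] < a[min_index]:
                min_index = j
        # Swap it into position i; a[0..i] is now sorted and final.
        a[i], a[min_index] = a[min_index], a[i]
    return a

print(selection_sort([8, 4, 2, 5, 2]))  # -> [2, 2, 4, 5, 8]
```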
Play video starting at 4 minutes 11 seconds and follow transcript4:11
The running time of our algorithm is quadratic, and this is not difficult to see, right? What
we have is two nested loops. In the outer loop, i ranges from 1 to n. In the inner loop, j ranges from i
plus 1 to n, to find a minimum inside the remaining part of our array. So in total we have a quadratic
number of iterations. At this point, however, we should ask ourselves whether our estimate of the
running time of the selection sort algorithm was too pessimistic. And this is what I mean by this. Recall
that we have two nested loops. In the outer loop, i ranges from 1 to n. In the inner loop, j ranges
from i + 1 to n. So when i is equal to 1, the number of iterations of the inner loop is n - 1. However,
when i is equal to 2, the number of iterations of the inner loop is n - 2, and so on. So when i increases,
the number of iterations of the inner loop decreases. So a more accurate estimate for the total
number of iterations of the inner loop would be the following: (n - 1) + (n - 2) + (n - 3) and so on.
Play video starting at 5 minutes 19 seconds and follow transcript5:19
So it is definitely less than n squared. However, we will show that it is still on the order of n squared;
namely, it is roughly equal to n squared divided by two.
Play video starting at 5 minutes 33 seconds and follow transcript5:33
The sum that we need to estimate is called an arithmetic series, and there is a known formula for
this sum. Namely, 1 + 2 + 3 + and so on up to n is equal to n(n+1)/2. And this is how we can prove
this formula.
Play video starting at 5 minutes 52 seconds and follow transcript5:52
Let's just write all our n integers in a row: 1, 2, and so on, n. Below them let's write the same set of
integers, but in the reverse order: n, then n minus 1, and so on, 2, and 1. Then what we get is a
table of size 2 by n, having n columns, and in each column the sum of the corresponding two integers
is equal to n plus 1. So in the first column we have 1 and n, in the second column we
have 2 and n minus 1, and so on, and in the last column we have n and 1. So the sum in each
column is equal to n plus one, and there are n columns. Which means that the sum of all the numbers in
our table is equal to n multiplied by n plus one. Since this table contains our sum, the sum of
the integers from 1 to n, twice, we conclude that the sum of all the numbers from 1 to n is equal to
n(n+1)/2. Another way to see why this formula is correct is to take a
rectangle of dimensions n by n plus 1, so its area is equal to n multiplied by n plus
one, and to cut it into two parts, as shown on the slide, such that the area of each of these two
parts is equal to 1 + 2 + and so on up to n. We're now ready to conclude. So we've just discussed the selection
sort algorithm. This algorithm is easy to implement, easy to analyze, and its running time is n
squared, where n is the size of the input sequence. It sorts the input array in place,
meaning that it requires almost no extra memory: all extra memory required by this
algorithm is only for storing indices, like i, j, and minIndex. There are many other quadratic algorithms,
like insertion sort and bubble sort. We're not going to cover them here; instead, in the next video
we will proceed to a faster sorting algorithm.
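For reference, the arithmetic-series bound used in this analysis can be written out as follows (a standard calculation):

```latex
(n-1) + (n-2) + \cdots + 2 + 1 \;=\; \frac{(n-1)\,n}{2} \;\approx\; \frac{n^2}{2},
\qquad\text{using}\qquad 1 + 2 + \cdots + n \;=\; \frac{n(n+1)}{2}.
```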
Merge Sort
Play video starting at 0 seconds and follow transcript0:00
In this video, we will study the so-called merge sort algorithm.
It is based on the divide and conquer technique,
whose main idea is the following.
To solve a given computational problem, you first split it into two or
more disjoint subproblems, then you solve each of these subproblems recursively.
And finally, you combine the results that you get from the recursive
calls to get the result for your initial subproblem.
And this is exactly what we're going to do in merge sort algorithm.
So let's show a toy example.
We're given an array of size eight, and we are going to sort it.
First, we just split this array into two halves of size four,
just the left half and the right half.
Then we make two recursive calls to sort both these parts.
These are the two resulting sorted arrays.
Now what remains to be done is to merge these two arrays into one,
these two arrays of size four into one array of size eight.
Well, let's think how this can be done.
First of all,
I claim that it is easy to find the minimal value in the resulting array.
Indeed, we know that the minimum value in this case in the first array is two, and
the minimum value in the second array is one.
Which means that the minimum value in the resulting merged array must be one.
So let's take one from the right array, put it in the resulting array and
forget about it.
It is already in its right place.
Play video starting at 1 minute 31 seconds and follow transcript1:31
What remains is an array of size four and
an array of size three that still need to be merged.
Well, again, it is easy to find the minimum value of
Play video starting at 1 minute 42 seconds and follow transcript1:42
the result of merging these two arrays.
In this case, it is two, because the minimum value in the array of size four
is two, and the minimum value in the arrays of size three is six.
So two is smaller than six, so we get two out of our left array,
put it into the resulting array after one, and press hit.
In the end, we get the following sorted array.
Again, the pseudocode of the merge sort algorithm directly implements this idea.
So this pseudocode takes an input array A of size n as an input.
And if n is equal to 1, then in this case, just nothing needs to be done,
we can just return the array A itself.
If n is greater than 1, on the other hand, then we split the array
A into two roughly equal parts and sort them recursively.
We call them B and C here.
Then the only thing that needs to be done is to merge these two sorted arrays.
So this is done in the procedure merge, which we will present on the next slide.
And finally, we just return the result of this merging procedure.
Play video starting at 2 minutes 55 seconds and follow transcript2:55
The pseudocode of the merging procedure is also straightforward.
It assumes that we are given two sorted arrays, B and C, of size p and
q respectively, and we would like to merge them into a sorted array of size p + q.
So the first thing we do is create an array D of size p + q.
It is initially empty.
Then we keep doing the following thing.
So what is the minimum value among all the values stored in the arrays B and C?
Well, it is easy to find.
We know that the first element in the array B is its smallest element, and
the first element in the array C is its smallest element.
So the smaller one among these two is the smallest
element inside the union of these two arrays.
So we just find the minimum of these first elements and move it from one of
these arrays to the resulting array D, and forget about this element completely.
Now what is left is essentially the same problem.
We're left with two sorted arrays, and we still need to merge them.
So we do it exactly the same.
We take the first two elements,
we compare them and move the smaller one to the resulting array.
And we keep doing this while both of these arrays are non-empty.
I mean, we need this to be able to take their first elements.
When one of them becomes empty,
we just copy the rest of the other array to the resulting array D.
Well, it is not difficult to see that this procedure is correct, and the running
time is O(p + q), namely, the size of the array B plus the size of the array C.
And this is just because we scan both of these arrays from left
to right in the run of this merging procedure.
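A minimal Python sketch of merge sort and the merging procedure just described; it returns a new sorted list rather than sorting in place, and the names are illustrative.

```python
def merge(b, c):
    """Merge sorted lists b and c into one sorted list in O(len(b) + len(c)) time."""
    d = []
    i = j = 0
    # While both lists still have unconsumed elements, move the smaller front element.
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            d.append(b[i]); i += 1
        else:
            d.append(c[j]); j += 1
    # One of the lists is exhausted; copy the rest of the other one.
    d.extend(b[i:])
    d.extend(c[j:])
    return d

def merge_sort(a):
    """Sort list a in O(n log n) time by splitting, recursing, and merging."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

print(merge_sort([7, 2, 5, 3, 7, 13, 1, 6]))  # -> [1, 2, 3, 5, 6, 7, 7, 13]
```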
This is how sorting our initial array of size eight
by the merge sort algorithm looks like.
So the merge sort algorithm first splits
the initial array of size eight into two arrays of size four.
Each of these arrays of size four in turn is split into two arrays of size two, and
each of them is split into two arrays of size one.
Then the merge procedure starts merging these arrays of size one into arrays
of size two, then these arrays of size two into arrays of size four.
And finally, it merges the two resulting arrays of size four
into the resulting array of size eight.
We are now going to prove that the running time of the merge sort algorithm,
on a sequence containing n elements, is big O of n log n.
Note that this is significantly faster than the quadratic selection sort algorithm.
For example, it is perfectly okay to sort the sequence of size 1 million, for
example, 10 to the 6th, on your laptop using merge sort algorithm.
While for the quadratic time selection sort algorithm,
sorting a sequence of size 10 to the 6th, 1 million, will take roughly
10 to the 12th operations, which is too much for modern computers.
Play video starting at 6 minutes 11 seconds and follow transcript6:11
Okay, so to prove this lemma, to prove the upper bound on the running
time of the merge sort algorithm, first note that merging two parts
of size n over 2 of our initial array takes linear time.
Namely, big O of n, because the left part has size n over 2 and
the right part has size n over 2.
And for merging, we basically just scan these parts from left to right.
So it takes just a linear amount of work to do this.
Which, in turn means, that if we denote by T of n the running time of our merge sort
algorithm, then it satisfies the following recurrence.
T(n) is at most 2T(n / 2) + big O(n).
Here 2T(n / 2) corresponds to the two recursive calls.
So we denote it by T(n), the running time of our algorithm on input of size n.
So when we sort two sequences of size n / 2,
we spend time twice T(n / 2).
So the big O of n term corresponds to what we do before we make recursive calls and
what we do after recursive calls.
So what we do before is just split the input array into two halves.
What we do after is merging the results of two arrays into one array of size n.
So it is not difficult to see that all of this can be done in linear time.
So we get this recurrence, and on the next slide,
we're going to show that this recurrence implies that the running
time of our algorithm is bounded from above by n log n.
To estimate the running time of this algorithm,
let's consider its recursion tree.
Namely, at the top of this tree, we have one array of size n.
So for this array of size n, we make two recursive calls for
arrays of size n over 2.
Each of these arrays of size n over 2 in turn is split into two
arrays of size n over 4.
So we get four arrays of size of n over 4 and so on.
So in this tree, we have log n levels.
Now let's estimate the work done at each of the levels of this tree separately,
namely, once again, to solve a problem of size n.
To sort an array of size n, we first prepare to make recursive calls.
In this case, we just split the array into two halves of size n over 2.
Then we do make recursive calls, and then we need to combine the results.
So all the work now inside recursive calls will be accounted for
on the lower levels of this tree.
So now what we are going to do is to account for
only the work done before the recursive calls and
after the recursive calls at each separate level.
And we know already that it takes linear time to do this.
I mean, if we have an array of size n,
it takes linear time to split it into two halves.
And then it takes linear time to combine the results of
recursive calls into one array.
So let's just denote this time by cn,
I mean let's denote the hidden constant inside big O by c.
Then what we can say is that on the top level we spend time cn.
Then on the next level, for each subarray,
we spend time c times n over 2, because the size of array is n over 2.
However, we have 2 arrays, so the total work that we do at this level is 2
multiplied by c, multiplied by n over 2, which is again just cn.
On the next level, we spend time 4 because we have 4 arrays multiplied by c,
multiplied by n over 4, because the size of the array is now n over 4.
This is a cn again, and so on.
So we have log n levels.
At each level, we do roughly cn operations.
So the total number of operations in our algorithm is cn log n,
which proves our lemma.
So again, what we've just proved is that the running time of
the merge sort algorithm is big O of n log n.
So in the next video, we will show that actually no algorithm,
no comparison based algorithms, to be completely formal, can sort a given
sequence of n elements asymptotically faster than in n log n time.
Which actually means that the merge sort algorithm is asymptotically optimal.
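The recursion-tree argument above can be summarized in one line, with c standing for the hidden constant:

```latex
T(n) \;\le\; 2\,T\!\left(\frac{n}{2}\right) + cn
\quad\Longrightarrow\quad
T(n) \;\le\; \underbrace{cn + cn + \cdots + cn}_{\approx\, \log_2 n \text{ levels}}
\;=\; O(n \log n).
```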
Lower Bound for Comparison Based Sorting
In the previous video, we proved that the running time of the merge sort
algorithm on a sequence consisting of n elements is big O of n log n.
In this video we will show that this bound is essentially optimal.
We will do this by showing that any correct sorting algorithm
that sorts n objects by comparing pairs of them
must make at least on the order of n log n operations in the worst case.
Once again we say that the sorting algorithm is comparison based
if it sorts the given objects just by comparing pairs of them.
We can imagine the following situation: we have n objects that look the same, for
example n balls, but have different weights.
And we also have a pan balance.
And this pan balance is our only way to compare pairs of these balls.
And our goal is to rearrange these balls, in order of increasing weights.
Play video starting at 1 minute 0 seconds and follow transcript1:00
So, for example, the two sorting algorithms that we have already considered,
namely the selection sort algorithm and the merge sort algorithm, are both
comparison based algorithms.
So for example, the selection sort algorithm at each iteration
finds the minimum value in the remaining part of the array.
And it does so exactly by comparing pairs of objects, right?
The merge sort algorithm is also a comparison based algorithm.
It first splits an array into halves.
And then it needs to merge the two resulting arrays.
And when merging the resulting arrays, it also uses comparisons, right?
So we take the first two elements of two sorted arrays.
We compare them, and based on this comparison we take one of these elements
out of one of those two arrays and put it to the result in the array.
So this is the formal statement that we're going to prove.
It says that any comparison based algorithm
that sorts n objects has running time
at least big Omega of n log n in the worst case.
In other words, we can say the following.
Assume that you have an algorithm that sorts n objects
by comparing pairs of them.
It can be the case that for some given sequences of n objects,
your algorithm performs fewer than n log n operations,
say, a linear number of operations.
However, it cannot be the case that your algorithm always sorts in time
asymptotically less than n log n.
Meaning that there must exist a sequence of objects on which your algorithm
Play video starting at 2 minutes 50 seconds and follow transcript2:50
will perform at least on the order of n log n comparisons to sort this sequence.
Any comparison based algorithm can be shown as a huge tree that contains all
possible sequences of comparisons that can be made by this algorithm.
For example, here on the slide,
we show a simple algorithm that sorts three objects.
So it starts by comparing a1 and a2.
If a1 happens to be smaller than a2,
then we proceed to comparing a2 and a3.
If a2 is smaller than a3, then we already know the permutation
of the input three objects in non-decreasing order.
Namely, we know that a1 is smaller than a2,
and we know that a2 is smaller than a3.
So we can just output the following permutation.
Play video starting at 3 minutes 49 seconds and follow transcript3:49
Right.
If on the other hand, a2 happened to be at least a3,
then at this point we already know that a2 is greater than a1.
And a2 is no smaller than a3.
So at this point, we know that a2 is the maximum element among our three elements.
However, we still need to compare a1 and a3, so we do this comparison,
and based on its result, we output either this permutation or this permutation.
Play video starting at 4 minutes 21 seconds and follow transcript4:21
Well, this was the case when a1 happened to be smaller than a2.
However we need also to consider the case when a1 happened to be at least a2.
Play video starting at 4 minutes 32 seconds and follow transcript4:32
So we proceed similarly in this case.
So this is just a toy example for an algorithm for
a comparison based algorithm comparing three objects, sorting three objects.
However, such a huge tree can be drawn for any comparison based sorting algorithm.
So at the root of this tree we have the first comparison.
And its children are labeled with the next
comparison that is made based on the result of the first comparison, and so on.
So each internal node is labeled with some comparison.
And each leaf is labeled with a permutation of the n input objects.
Play video starting at 5 minutes 12 seconds and follow transcript5:12
A simple but crucial for this argument observation is that in this tree,
we must have at least n factorial leaves.
And this is because we have n factorial different permutations of n input objects.
Where n factorial is defined to be the product of n, n minus one,
n minus two, and so on.
So why is that?
Why must we have every possible permutation as a leaf in our tree?
Well, this is just because it is possible that this permutation
is the right output of our algorithm.
So for example on our previous slide, on our toy example, we have three objects and
there are six possible permutations of these three objects, and
there are six leaves in our tree.
For example one of them is 213 and it says that the second element
is the smallest one, then goes the first element, and then goes the third element.
And indeed there are cases when this is the right answer.
Right?
So when the input data consists of three objects,
such that the second element is the smallest one,
the first one is the next one, and the third element is the largest one.
Right?
So once again, you have a huge tree which corresponds to a comparison based algorithm.
There must be at least n factorial leaves,
because each possible permutation must be present as a leaf in our tree.
Play video starting at 6 minutes 44 seconds and follow transcript6:44
So on the other hand the maximal number of
comparisons made by our algorithm corresponds to the depth of our tree.
So the depth is defined as the maximal number of edges on a path
from the root to some leaf of our tree.
So and this is exactly the maximal possible number of comparisons
which our algorithm makes.
So now we would like to show that d must be large in our case,
must be at least big Omega of n log n.
And we know already that our tree contains many, many leaves.
I mean, n factorial is a function that grows extremely fast.
Okay so, intuitively we would like to show that if a tree has many,
many leaves, then it has a large depth.
And at least intuitively this is clear.
If you have a tree of very small depth, then it must have just a few leaves, right?
But we know that it has many, many,
many leaves, in fact at least n factorial leaves.
To formally show this, we need the following estimate.
The depth of a binary tree is at least the binary logarithm of its number
of leaves, or equivalently, 2 to the depth is at least its number of leaves.
Well this can be proved formally, but let me just show you this informally.
Let's consider a tree, for example, of depth 1.
So in this case, d is equal to 1.
And it is clear that the maximal possible number of leaves
in a tree of depth 1 is equal to 2.
So now, let's try to understand what is the maximal possible
number of leaves in a tree of depth 2.
For example, this is a tree of depth 2.
This is another tree of depth 2; it has three leaves.
And this is a tree of depth 2 that has maximal possible number of leaves,
in this case it is 4.
It is 2 to the d indeed.
And intuitively it is clear that to have a tree of depth d that has
the maximal possible number of leaves,
we need to take a full binary tree of depth d, right?
And this tree has exactly 2 to the d leaves.
So the maximal number of leaves in a tree of depth d is 2 to the d.
Play video starting at 9 minutes 15 seconds and follow transcript9:15
Which proves that 2 to the d is at least l.
Play video starting at 9 minutes 20 seconds and follow transcript9:20
Okay, so the last step that we need to show is that if we have n factorial
leaves, then the depth of our tree is at least big Omega of n log n.
And we will show this on the next slide.
Play video starting at 9 minutes 34 seconds and follow transcript9:34
It remains to estimate log of n factorial.
We're going to show here that log of n factorial is at least c times n log n.
Which means, in other words, that log of n factorial is big Omega of n log n.
To do this, we express n factorial as the product of 1, 2, 3,
and so on up to n, and
then write the logarithm of the product of these numbers as the sum of their logarithms.
So, log of n factorial is equal to log of 1 plus log of 2 plus log of 3 and
so on plus log of n.
So, this is a sum of n terms.
Let's just throw away the first half of these n terms,
and leave only the second half.
So in this second half, we have n over two terms and
each of them is at least log of n over two, right?
Because these are logarithms of numbers which are at least n over two.
So we have n over two
terms, each of them at least log of n over 2.
This allows us to conclude that the whole sum is at least n over 2 times log of n over 2.
And this in turn is big Omega of n log n for a simple reason.
So log of n over 2 is equal to log n minus 1.
Well, this grows like log n, right?
Because log n is a growing function and one is a constant, so
log n minus one grows as log n.
And n over two grows as n, right?
So, up to constant factors, this is just n.
So, n over two times log of n over two grows like n log n.
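The estimate just described can be written compactly as follows (a standard calculation):

```latex
\log(n!) \;=\; \sum_{i=1}^{n} \log i
\;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log i
\;\ge\; \frac{n}{2}\,\log\frac{n}{2}
\;=\; \Omega(n \log n).
```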
Play video starting at 11 minutes 16 seconds and follow transcript11:16
Okay, so this concludes our proof, and
this concludes the proof of the fact that any comparison based
algorithm must make at least on the order of n log n operations in the worst case.
Once again, another conclusion is that the merge sort algorithm that we
considered in the previous lecture is asymptotically optimal.
In the next video we will see an algorithm that actually sorts
n given objects in time less than n log n.
Actually, in just linear time,
in time big O of n. However,
it will sort the n given objects knowing something about these objects.
It will only sort the given objects if the objects are small integers.
And it will sort them without actually comparing them to each other.
Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.
Reading
Merge sort and lower bound for comparison based sorting: Section 2.3 of [DPV08]
Visualizations
Comparison based sorting algorithms by David Galles
sorting-algorithms.com
References
[DPV] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition).
McGraw-Hill Higher Education. 2008.
Sorting
Submit your assignment
Quick Sort
Quick Sort
Video: LectureOverview
2 min
Video: LectureAlgorithm
9 min
Video: LectureRandom Pivot
13 min
15 min
Video: LectureEqual Elements
6 min
Video: LectureFinal Remarks
8 min
Reading: Resources
10 min
4 questions
Hello, and welcome to the next lesson in the Divide-and-Conquer module. This lesson is going to be
devoted to the quick sort algorithm, which is one of the most efficient and most commonly used
sorting algorithms in practice.
Play video starting at 14 seconds and follow transcript0:14
Well, as usual, we start with the overview of this algorithm. The algorithm is comparison based,
meaning that it sorts the n given elements by comparing pairs of them. Its running time is also
asymptotically n log n, but not in the worst case, as was the case with the merge sort algorithm, for example,
but in the average case. This is because this algorithm is randomized, so it uses random numbers to
sort the given n objects. Well, we will explain later in this lesson what this means. Finally, as I said
before, this algorithm is very efficient in practice and, at the same time, not so difficult to implement.
This is a toy example explaining the main idea of the quick sort algorithm. So given an array, in this
case of size 11, let's take its first element. In this case it is 6. And let's do the following. Let's rearrange
all the elements in this array such that the element 6 stays in its final position. All the elements that
go before it are actually at most 6. And all the elements that go after 6, after this element, are greater
than 6. Well, we will show that this can be done by a single scan of the initial array. This is how the
resulting array looks like. So once again, 6 stays in its final position. All the elements before it are at
most 6. All the elements after it are greater than 6. So we do not need to move 6 anymore. It is
already in its final position. So what remains to be done is to sort all the elements that go before 6
and all the elements that go after 6. And this can be done just with two recursive calls to the same
algorithm, to the quick sort algorithm. So we do this, and immediately after these two recursive calls,
we have a sorted array. Well, in the next video we will explain all the details of this algorithm.
In this video, we'll provide the full outline of the quick sort algorithm.
So, as you remember, the algorithm is recursive. For
this reason, we pass to this procedure the array A and
also two indices l and r, standing for left and right, and
this procedure sorts the subarray of A with indices from l to r.
Well, we first check whether l is at least r.
And if yes, then this means that the corresponding subarray contains at
most one element.
And this in turn means that nothing needs to be done, so we just return.
Play video starting at 38 seconds and follow transcript0:38
Otherwise we call the partition procedure with the same parameters.
It returns an index m between l and r.
So it rearranges all the elements inside
this sub array with the following property.
After the call to this procedure, A of m stays in its final position,
meaning that all the elements to the left of an element A of m are at most A of m.
And all the elements to the right are greater than A of m.
Well once again, after the call to the partition procedure,
A of m stays in its final position.
So what remains to be done is to sort all the elements that are at most A of m.
They stay to the left of A of m, and all the elements that stay to the right.
So we do this just by making two recursive calls.
So this is how the whole algorithm looks pictorially.
Again, we are given an array A, with two indices l and r,
and we are going to sort the subarray with indices from l to r.
So we first call the partition procedure with parameters A, l and r.
And it gives us an index m between l and r with the following property.
All elements to the left of m are at most the element A of m.
All the elements to the right are greater than A of m.
Then we make two recursive calls to sort the left part, with indices from
l to m - 1, and to sort the right part, with indices from m + 1 to r.
And immediately after these two recursive calls, we have a sorted array.
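A minimal Python sketch of this recursive outline, assuming a 0-indexed list; it relies on a partition function, a sketch of which follows the partition discussion below, and the names are illustrative.

```python
def quick_sort(a, l, r):
    """Sort the subarray a[l..r] (inclusive, 0-indexed) in place."""
    if l >= r:
        return                    # at most one element: nothing to do
    m = partition(a, l, r)        # a[m] is now in its final position
    quick_sort(a, l, m - 1)       # sort the elements that are at most a[m]
    quick_sort(a, m + 1, r)       # sort the elements that are greater than a[m]
```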
So before showing the actual pseudocode of the partition procedure,
we explain its main ideas again on a toy example.
First of all, we take the element A[l] and denote it by x.
This will be called our pivot element.
The pivot is exactly the element with respect to which
we're going to partition our subarray.
So x will be placed in its final position.
So our goal now is to rearrange all the elements inside our current sub array so
that x stays in its final position, all the elements to the left of x are
at most x, and all the elements to the right of x are greater than x.
So we will do this by gradually increasing the region of already discovered elements.
For this we will use a counter i, and we will maintain the following invariant.
So i will go from l + 1 to r, and
at each point in time, when we have already seen the elements up to position i,
we will keep two regions inside the region from l + 1 to i.
In the first region, from l + 1 to j, we will keep all elements that are at most x.
In the second, adjacent region, with indices from j +
1 to i, we will have all elements that are greater than x.
Let's see it for example.
Play video starting at 3 minutes 48 seconds and follow transcript3:48
So I assume that we are somewhere in the middle of this process.
In this case, x is equal to 6, and
we need to partition all the elements with respect to x.
We already have two subregions: in the red region,
we keep all elements that are at most x,
that is, at most 6. In the blue region we have all elements that
are greater than 6.
Play video starting at 4 minutes 14 seconds and follow transcript4:14
Okay, now we move i to the next position and we discover the element 9.
So this element is greater than 6, so
we just need to extend the second region, the blue region.
The region of elements is at greater than 6.
So in this case we just do nothing.
Play video starting at 4 minutes 33 seconds and follow transcript4:33
Well the next case is more interesting, we move i to the next position, and
we discover the element 4.
In this case, we need to somehow move this element to the red region,
to the region of elements which at most 6.
So to do this we just swoop it to currently
first element of the blue region, in this case was 9.
So if we do this 4 will be the last element of currently red region and
9 will go to the blue region.
So we do this and now, we increase also the just
to reflect the fact that our red region had just been extended.
Then we move on to the next element: we discover the element 7, which is
greater than 6, which means that we just extend the blue region.
Then we discover another element, which is 6.
6 is at most 6 (it is actually equal to 6), so
we need to move it to the red region.
Again, we swap it with the first element of the blue region and
then we extend the red region.
We increase j to reflect the fact that the red region has just been extended,
then we discover another element, which is at most 6.
We move it to the end of the red region.
Play video starting at 5 minutes 55 seconds and follow transcript5:55
And finally, what we also need to do in the very end is to move the pivot
element which is 6 in this case to its final position.
And its final position actually can easily be found in this case.
So we have red region and we have blue region.
In red region, all the elements are at most 6, and in blue region,
all the elements are greater than 6.
So we can just swap 6 with the last element of the red region.
In this case it is 1, so if we swap these two elements then you
can see that all the elements in the blue region are indeed greater than 6.
All the elements in the red region are at most 6.
So we are done with this partition procedure.
We are now ready to present the pseudocode of the partition procedure.
Let me recall what we are going to do: we place some element x,
which is called the pivot, into its final place, so that all the elements before
x are at most x and all the elements after x are greater than x.
As the pivot element in this procedure, we are going to use just the first
element of the corresponding subarray, so x is assigned A[l].
Play video starting at 7 minutes 14 seconds and follow transcript7:14
We are also going to maintain the following subregions.
First of all, we will gradually increase the region of discovered elements:
i goes from l + 1 to r, and inside this region of
discovered elements, we will maintain two subregions.
In the first region, with indices from l + 1 to j,
we will keep all of the elements that are at most x.
In the second region, with indices from j + 1 to i, we will keep all of the elements
that are greater than x, and we will gradually increase the value of i.
So assume that i has just been increased and
we have discovered a new element A[i].
If A[i] is greater than x, then the second
region, of elements that are greater than x,
is extended automatically and we do not need to do anything in this case.
However, if the newly discovered element is at most x,
then we need to move it to the first region.
So we do this as follows.
We just increase the value of j to indicate the fact that
the first region has just been extended, and then swap the elements
A[j] and A[i]. This way,
we maintain our invariant each time i is increased.
In the very end, when i reaches the value of r,
we also need to place our pivot element, which until now
stays at the beginning of the subarray, between our two regions.
For this we just swap the element A[l], which is our pivot, with the element A[j],
and we then return the value j as the index of our pivot element.
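For concreteness, here is a minimal Python sketch of the partition procedure just described; the names x, i, and j follow the lecture, and the subarray A[l..r] is rearranged in place.

def partition(A, l, r):
    x = A[l]                      # the pivot element
    j = l                         # A[l+1..j] holds elements <= x, A[j+1..i] holds elements > x
    for i in range(l + 1, r + 1):
        if A[i] <= x:
            j += 1
            A[i], A[j] = A[j], A[i]
    A[l], A[j] = A[j], A[l]       # place the pivot between the two regions
    return j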
Random Pivot
Now that the algorithm is presented, we need to estimate its running time.
For the Quicksort algorithm, the running time analysis is a little bit tricky.
So before stating and proving the theorem about its running time,
let's consider a few illustrative cases of how the partitioning can go.
First of all, let's consider a pathological case, when somehow,
it always happens that we select
the minimum value of our current subarray as our pivot element.
Well in this case, well let's see what happens with our current subarray.
Say of size n.
And we select its minimum value as a pivot.
And we partition the subarray with respect to it.
Since this is the minimum value,
its final position is just the first position in the resulting array, right?
Which means that we partition into two parts.
The first part is just empty, we have no elements smaller than our pivot.
And the second part, contains n- 1 elements,
Play video starting at 1 minute 1 second and follow transcript1:01
because all the remaining elements are greater than our current element.
Play video starting at 1 minute 6 seconds and follow transcript1:06
Okay, so if this happens at each call to the partition procedure,
then the running time of our Quicksort algorithm
satisfies the following relation: T(n) is equal to n plus T(n - 1).
The term n here corresponds to the running time of the partition procedure.
Well, it is actually big O of n,
but just to simplify, let's put n here.
Let me also remind you that if we have an array of size n,
then the partition procedure indeed works in time
big O of n, because it just scans the subarray.
So now let's see what is the solution for this recurrence relation.
Well, we can just unwind this recurrence relation term by term.
So we have n plus T of n minus 1.
Let's replace T of n minus 1 by n minus 1 plus T of n minus 2.
Then we replace T of n minus 2 by n minus 2 plus T of n minus 3.
And we keep doing so.
So what is left is the following sum, n + (n- 1) + (n- 2) and so on.
So what we know is this sum already, so this is arithmetic series.
And we know that it grows quadratically.
Which gives us something strange:
I mean our Quicksort algorithm works in quadratic time.
Which means that it is not quick, actually, right?
We'll resolve this issue later.
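For completeness, the unwinding can be written out as the standard arithmetic series:

T(n) = n + T(n-1)
     = n + (n-1) + T(n-2)
     = ...
     = n + (n-1) + ... + 2 + 1
     = n(n+1)/2, which is Theta(n^2).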
Now, let's consider a slightly different case.
Assume that somehow, we always partition into two parts,
such that one of them has size, for example, n- 5.
And the other one has the size four.
Well, I claim that even in this case the running time is quadratic.
First of all, note that both these cases correspond to very unbalanced partitions.
In the first case, we have two parts, one of size 0 and one of size n - 1.
In the second case, we have two parts,
one of size 4 and one of size n - 5.
So the sizes of the parts are very unbalanced;
they are very different.
Play video starting at 3 minutes 22 seconds and follow transcript3:22
Okay, so I claimed that in this case the running time,
is also going to be quadratic.
And this can also be shown, just be unwinding this recurrence relation.
So let's just throw away this T(4), and leave only T(n- 5).
Okay, so T(n) is equal to n plus T(n-5).
Let's replace T(n-5) with (n-5) + T(n-10).
Let's then replace T(n-10)
with (n-10) + T(n-15), and so on.
So this leaves us with the following sum.
N plus n minus five plus n minus ten and so on and
this is also an arithmetic progression.
The only difference with the previous arithmetic progression is that now,
we have step five.
The difference between neighbors is five, but not one.
Well, still, this arithmetic progression has a linear number of terms.
Which means that its sum grows quadratically,
with the only difference that the hidden constant inside the big O is
smaller than in the previous case.
Now let's consider another pathological case.
Assume that it somehow happens, for some unknown reason,
that at each call to the partition procedure,
it partitions the array into two parts of roughly equal size.
Play video starting at 4 minutes 49 seconds and follow transcript4:49
Well, in this case we can write the following recurrence relation
on the running time of our algorithm:
T(n) is equal to 2T(n/2) plus a linear term.
And we know this recurrence already.
It is exactly the recurrence that the running time of merge sort satisfies.
Right?
And we proved that in this case T(n) grows as n log n.
Let me remind you how we proved this.
We analyzed the corresponding recursion tree.
So, at the root of this tree we have one array of size n.
At the next level we have two arrays of size n over two,
at the next level we have four arrays of size n over four, and so on.
So the height of this tree is log base two of n; it is basically logarithmic.
At each level, the sum of the sizes of all arrays is equal to n.
So we have one array of size n at the top, two arrays of size n over two
at the next level, and four arrays of size n over four at the next level,
so the total size is still n, and so on.
At each level we spend a linear amount of work.
This is essential.
We spend a linear amount of work at each level, and we have a logarithmic number of
levels, which means we spend n log n time in total.
Play video starting at 6 minutes 3 seconds and follow transcript6:03
Okay, let's consider another, again very pathological case.
Assume that we always split an array of size n into two parts,
one of size n over 10
and one of size 9n over 10.
So in this case the recurrence is the following:
T(n) is equal to T(n over 10)
plus T(9n over 10) plus a linear term.
I claim that even in this case we can prove an n log n upper bound on
the running time, and this is how we can do this.
Well, let's again consider the recursion tree of the
algorithm.
In this case, it is not balanced.
Right?
Because when we go to the left branch,
we reduce the size of the current subproblem by a factor of 10.
And when we go to the right branch, we reduce the size of the current subproblem
only by a factor of 10 over 9.
Which means that in our tree, the length
of the leftmost branch is actually log base 10
of n, while the length of the rightmost branch is log base 10/9 of n.
Well, still, the height of this tree is logarithmic.
The difference with the previous case is that the base of the logarithm
is different, but it is still a constant:
it is log base 10/9 of n.
And the previous property is also true.
Play video starting at 7 minutes 37 seconds and follow transcript7:37
The sum of the sizes of all arrays at each level is still equal to n.
It is at most n, actually.
At the root we have one array of size n.
At the next level we have two arrays, one of size n/10,
and the other one is of size 9n/10.
Right? So the size is still n.
At the next level it is the same, and so on.
So we have a logarithmic number of levels, and at each level we spend a linear amount
of work which gives us an n log n upper bound once again.
Okay, all this analysis was only about pathological cases,
where we always split in a balanced way or in an unbalanced way.
In reality, when we run the Quicksort algorithm on some array,
some of the partitions are going to be balanced,
and some of the partitions are going to be unbalanced.
So we still do not know the actual running time
of the Quicksort algorithm.
We still need to determine this.
However, we already got an important message.
The running time of the Quicksort algorithm depends
on how balanced our partitions are.
What we know now is the following: if all our partitions are balanced,
then the running time is at most n log n,
while if all of our partitions are unbalanced,
then the running time is quadratic.
Play video starting at 9 minutes 1 second and follow transcript9:01
This means that we would like to have a way of selecting a pivot
element such that it always guarantees a balanced partition.
At the same time it is not clear at all how to do this.
How to guarantee that we can always quickly pick
a pivot element such that
the array is partitioned in a balanced way with respect to this pivot.
So instead of this we will use the following elegant solution:
let's just select the pivot element from the current subarray randomly.
To implement this solution, we do the following.
Before calling the partition procedure,
we just select a random index between l and r, and
we swap the element A[l] with this random element.
Okay, then we call partition, and then we proceed in the usual way.
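Putting this together with the partition procedure above gives the following minimal sketch of randomized Quicksort.

import random

def randomized_quick_sort(A, l, r):
    if l >= r:
        return
    k = random.randint(l, r)            # random pivot position between l and r
    A[l], A[k] = A[k], A[l]             # move the pivot to the front
    m = partition(A, l, r)              # partition with respect to the pivot
    randomized_quick_sort(A, l, m - 1)
    randomized_quick_sort(A, m + 1, r)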
Let me explain intuitively why selecting a random pivot is going to
help us to prove a good upper bound on the running time of the Quicksort algorithm.
Well, for this, consider an array A that we're going to partition
with respect to a random pivot, and consider its sorted version.
Assume for the moment that all the elements inside our array are different.
Play video starting at 10 minutes 20 seconds and follow transcript10:20
In the sorted version, consider the middle half of the elements,
that is, the n/2 elements that stay exactly in the middle.
Well an important property of all these elements is the following: for each
of these n/2 elements there are at least n/4 elements that are greater than them.
And at least n over four elements that are smaller.
Well this means that if we select any of these elements inside our array a,
then the partition with respect to this element is going to be balanced.
Right?
In both parts there will be at least n over four elements.
Well these n over two elements stay somewhere in the initial array.
So they stay in the middle in the sorted array and
they stay somewhere in the initial array.
It doesn't matter for us.
What is important for us is that there are at least n over two
elements with respect to which the partition is going to be balanced.
Which means that with probability one half we will have a balanced partition.
And this happens to be enough to prove the desired upper bound.
So we're going to show that the randomized Quicksort algorithm is actually very fast.
Well, first of all, it is fast in practice, and
we will also prove a theoretical upper bound on its running time.
Play video starting at 11 minutes 44 seconds and follow transcript11:44
This is a formal statement of an upper bound on the running time of the Quicksort
algorithm that we are going to prove in the next video.
So I assume that we are given an array A, and assume for
the moment that all the elements of this array are pairwise different.
Then the average running time of the Quicksort algorithm
on this array consisting of n elements, is big o of n log n.
While the worst case running time of this algorithm is n squared.
Play video starting at 12 minutes 11 seconds and follow transcript12:11
Well, let me explain the words "on average" here.
This means the following: for any fixed array,
if we are very unlucky with the random bits,
the running time could potentially be higher.
However, on average, where the average is over all possible random bits,
the running time of the QuickSort algorithm is n log n.
And this is true for any input.
So this theorem doesn't say that, for the Quicksort algorithm,
for some arrays the running time is large, for
some arrays the running time is low,
but on average the running time is good enough.
Instead, it says that for any fixed array, the average running time is O(n log n).
Okay, so we are going to prove this theorem in the next video.
Equal Elements
In this video we address the issue of equal elements in the input array. Recall that we proved the upper bound
on the running time of the randomized Quicksort algorithm under the assumption that all the elements inside the
given array are pairwise different.
Play video starting at 19 seconds and follow transcript0:19
And actually, we used this assumption essentially. Recall that we estimated the probability that
two elements, A'[i] and A'[j], are compared. And we argued that if any element between them is selected
as a pivot first, then they will never be compared. However, if they are
equal, so if A'[i] is equal to A'[j], this means that all the elements in this range
are equal. So if we select any element inside this range, in the middle of this range, as a pivot, then it
is not true that these two elements will go into different parts with respect to this element. This is just
because all of the elements inside this range are equal, which means that if we partition with respect
to this element, all of the elements in this range will be at most this element. So they all will be
in the left part with respect to this element, okay? So our analysis doesn't work for equal elements,
but let's see what happens in real life: what happens if we run the Quicksort algorithm on an array
in which there are equal elements. For this, let's use the following online visualization. This
visualization shows how different sorting algorithms perform on different
datasets. There are eight sorting algorithms here, and we are now interested in the QuickSort
algorithm. And there are four different types of datasets: the first dataset is just a random
sequence, the next one is a sorted sequence, the next one is a reversed sequence, and the next one,
which is most interesting to us at the moment, is a sequence that contains a few unique elements. So
let's see how the Quicksort algorithm performs on the last dataset. For this, let's just run all the
algorithms on all datasets and see what happens. You may notice that, for
example, some algorithms have already sorted everything, while Quicksort has just finished sorting the
last dataset. So Quicksort is not so fast on datasets that contain few unique elements, and
this is why. Just consider a dataset that consists of elements that are all equal to each other. So all
elements are equal to each other. This means that the partition procedure always
Play video starting at 3 minutes 5 seconds and follow transcript3:05
partitions the array with respect to an element x that is equal to all the other elements, right? And in this case, one of the parts,
namely the part of the elements that are greater than x, is just empty: it has size zero. And the other
part has size n minus one. So the recurrence on the running time of
our algorithm on such a dataset always satisfies the following relation: T(n) is equal to T(n
minus 1) plus a linear term plus T(0). We already know that this is an unbalanced partition, and that it
corresponds to quadratic running time. This means that the Quicksort algorithm spends quadratic time
on a very simple array: all the elements of this array are equal, which means
that the array is actually already sorted. To overcome this difficulty, we'll do the following. Instead of partitioning our array into two
regions, namely the regions that contain all elements that are at most x and all elements that are greater
than x, we are going to partition it into three parts. The corresponding partition procedure is usually
called three-way partition. Formally, it returns two indices, m1 and m2, such that all the elements
inside the region from m1 to m2 are equal to x, all the elements to the left of this region are smaller
than x, and all the elements to the right of this region are greater than x.
Play video starting at 4 minutes 46 seconds and follow transcript4:46
So this is how it looks pictorially. We have three regions. So, from l to m1 minus 1, we have all
elements that are smaller than x. In the region from m1 to m2, we have all elements that are equal to
x.
Play video starting at 5 minutes 3 seconds and follow transcript5:03
In the region from m2 plus 1 to r, we have all elements that are greater than x. This procedure
can actually be implemented in a similar way to the original partition procedure: it can be
implemented with a single scan of the array while maintaining three regions, or it can be implemented
with two scans, where we first split our array into regions that contain elements at most x or greater
than x, and then we split the first region into two parts.
Play video starting at 5 minutes 36 seconds and follow transcript5:36
Well, this is how the modified randomized quicksort algorithm is going to look.
Play video starting at 5 minutes 43 seconds and follow transcript5:43
So we just replace the call to the partition procedure by a call to the three-way partition procedure. Now we
have three regions, and the middle region is actually in its final place, so we do not touch it after the
partition procedure. We make two recursive calls, to the first region and to the last region. So, let's see
whether the resulting algorithm is indeed quick. And for this, let's use the same visualization. The
resulting algorithm is shown here in the last column. Let's, once again, run all the algorithms and see
what happens in the last column.
Play video starting at 6 minutes 20 seconds and follow transcript6:20
Well, we see that now the resulting Quicksort algorithm is indeed quick.
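Here is a minimal sketch of Quicksort with a three-way partition in the spirit of what was just described (a single scan that maintains the three regions); as before, a random pivot would be swapped into position l before partitioning.

def partition3(A, l, r):
    x = A[l]                          # the pivot element
    lt, i, gt = l, l, r               # A[l..lt-1] < x, A[lt..i-1] == x, A[gt+1..r] > x
    while i <= gt:
        if A[i] < x:
            A[i], A[lt] = A[lt], A[i]
            lt += 1
            i += 1
        elif A[i] > x:
            A[i], A[gt] = A[gt], A[i]
            gt -= 1
        else:
            i += 1
    return lt, gt                     # m1 and m2: boundaries of the region equal to x

def quick_sort3(A, l, r):
    if l >= r:
        return
    m1, m2 = partition3(A, l, r)
    quick_sort3(A, l, m1 - 1)
    quick_sort3(A, m2 + 1, r)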
In this last video of the Quicksort lesson, I would like to address two implementation issues. So the
first issue is about space complexity of the QuickSort algorithm. So on one hand, when sorting an
array with the Quicksort algorithm, we do not use any additional space: we just partition the array and
swap elements inside the array. On the other hand, the QuickSort algorithm is a recursive
algorithm, and when we make a recursive call, we store some information on the stack. Right? So on
one hand, it is possible to show that the average recursion depth is logarithmic, meaning that we
need only a logarithmic amount of additional space on average. On the other hand, there is a very nice and elegant trick that
allows us to re-implement the QuickSort algorithm such that its worst-case space complexity is at most
logarithmic.
Play video starting at 1 minute 3 seconds and follow transcript1:03
So for this, let's recall that the QuickSort algorithm consists of a call to the partition procedure and
then of two recursive calls.
Play video starting at 1 minute 14 seconds and follow transcript1:14
So the situation when a procedure ends with a recursive call is called tail recursion, and
there is a known way to eliminate such a recursive call. Namely, note that
in the second recursive call, we sort the right part of our array, that is, the part
from index m+1 to index r. Instead of making this recursive call, let's replace it with a while loop:
inside this while loop we call the partition procedure as shown on the slide, then we make a recursive
call for the left part, and instead of making the recursive call for the right part, we just update the
value of l to be equal to m+1. Then we go to the beginning of this while loop, and this essentially
mimics our recursive call.
Play video starting at 2 minutes 12 seconds and follow transcript2:12
So far so good. We've just realized that we can eliminate the last recursive call. At the same time let's
also realize the following thing.
Play video starting at 2 minutes 22 seconds and follow transcript2:22
In our QuickSort algorithm we first call the partition procedure, then we make two recursive calls. And
these two recursive calls are in a sense independent: it doesn't matter which comes first, right?
They do not depend on each other. This means that we can just as well eliminate the recursive call
for the first part. And this in turn means that we can always select which one to eliminate.
For us, it is better to eliminate the recursive call for the longer part. This is because, if we always make the
actual recursive call for the part which is shorter, then each recursive call is made for an array which is
at least twice shorter than the current one, right? And this in turn means that the depth of our
recursion will be at most logarithmic: the first recursive call is made for an array of size
at most n over 2, then at most n over 4, and so on. So the depth is logarithmic, which is good. And
this can be implemented as follows. We first call the partition procedure, which gives us a value of m. At
this point, we know the lengths of the two parts, and we just compare them. If, for example, the length of
the first part is shorter, then we make a recursive call for this part, and instead of making the recursive
call for the second part, we just update the value of l. In the other case, when the right part is shorter,
we make the recursive call for that part, and instead of making the recursive call for the left part, we just
update the value of r. Right? So overall this gives us an implementation of the QuickSort algorithm
that uses at most logarithmic additional space in the worst case.
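Here is a minimal sketch of this trick (it reuses the partition function from the earlier sketch; random pivot selection is omitted for brevity): we recurse only on the shorter part and handle the longer part in the loop, so the recursion depth is at most logarithmic.

def quick_sort_log_space(A, l, r):
    while l < r:
        m = partition(A, l, r)
        if m - l < r - m:                      # left part is shorter
            quick_sort_log_space(A, l, m - 1)  # recurse on the shorter (left) part
            l = m + 1                          # continue the loop on the right part
        else:                                  # right part is shorter (or equal)
            quick_sort_log_space(A, m + 1, r)  # recurse on the shorter (right) part
            r = m - 1                          # continue the loop on the left part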
The next implementation issue concerns the random bits used by our algorithm. Assume that we would like to have a
deterministic version of our randomized QuickSort. This is a reasonable thing to want, because in
practice we would like to have such a thing as reproducibility, which is, for example, essential for
debugging. So we would like our program to always produce the same output on the same dataset. And this is
why we would probably not like to use random numbers, okay? Then we can do the following. The
following algorithm is known as intro sort and is used in many practical implementations of QuickSort.
Instead of selecting the pivot element randomly, let's select it using, for example, the
following simple heuristic. Each time we are given a subarray and we need to partition it with
respect to some pivot, we select the pivot as follows: we take the
first element of the subarray, the last element, and the middle element, for example. Then we have
three elements, and we just compare them and select the median value of these.
Play video starting at 5 minutes 31 seconds and follow transcript5:31
And we use this element as our pivot element. This is a very simple heuristic, and it can be
implemented very efficiently: we need just three comparisons to select this median. And in many
cases this is enough for the QuickSort algorithm to work effectively. However, this is not quite what we
want: we are not happy with the statement that this algorithm works well in many
cases; we would like our algorithm to work well on every possible input.
Play video starting at 6 minutes 3 seconds and follow transcript6:03
Unfortunately, there are pathological cases on which this heuristic works badly. But we can
overcome this in the following way. While running our QuickSort algorithm, let's count
the current depth of our recursion tree. And at some point, when it exceeds some value c log n,
for some constant c, we just stop the current algorithm and switch to some other algorithm, for
example to the heap sort algorithm. This is another efficient algorithm, which is asymptotically as
good as MergeSort, I mean, it also runs in time O(n log n); however, QuickSort is usually faster in practice.
So, at this point, we switch to the heap sort algorithm.
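A rough sketch of this combined scheme (deterministic median-of-three pivot plus a depth limit with a heap sort fallback) might look as follows; the helper names and the constant in the depth limit are illustrative choices, not taken from the lecture slides.

import heapq
import math

def median_of_three(A, l, r):
    m = (l + r) // 2
    # index of the median among A[l], A[m], A[r]
    return sorted([(A[l], l), (A[m], m), (A[r], r)])[1][1]

def heap_sort_range(A, l, r):
    # heap-based sort of the slice A[l..r], used as the fallback
    A[l:r + 1] = heapq.nsmallest(r - l + 1, A[l:r + 1])

def intro_sort(A, l, r, depth_limit):
    if l >= r:
        return
    if depth_limit == 0:
        heap_sort_range(A, l, r)       # recursion got too deep: switch to heap sort
        return
    k = median_of_three(A, l, r)
    A[l], A[k] = A[k], A[l]            # move the chosen pivot to the front
    m = partition(A, l, r)
    intro_sort(A, l, m - 1, depth_limit - 1)
    intro_sort(A, m + 1, r, depth_limit - 1)

def sort(A):
    if A:
        intro_sort(A, 0, len(A) - 1, 2 * int(math.log2(len(A))) + 1)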
Play video starting at 6 minutes 55 seconds and follow transcript6:55
This means that for these pathologically bad instances for QuickSort with this simple heuristic of
selecting the pivot element, we still work in time n log n in the worst case. Because before we
exceeded the depth c log n, we spent time O(n log n), and after this, we stop this algorithm
immediately and run the heap sort algorithm, so overall we spend time O(n log n). So this gives us
an algorithm which in many cases performs like the QuickSort algorithm, and in any case, even in the
worst case, its running time is bounded above by O(n log n). So to conclude, the QuickSort algorithm is a
comparison based algorithm whose running time is big O of n log n in the average case, and big O of n
squared in the worst case.
Play video starting at 7 minutes 55 seconds and follow transcript7:55
What is important about this algorithm is that it is very efficient in practice. It is more efficient than the
MergeSort algorithm, for example. For this reason it is commonly used in practice, and for this reason
it is called QuickSort.
Resources
Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.
Reading
Quick sort: Chapter 7 of [CLRS]
Visualizations
sorting-algorithms.com
References
[CLRS] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to
Algorithms (3rd Edition). MIT Press and McGraw-Hill. 2009
http://dm.compsciclub.ru/app/quiz-local-maximum
Week 5
Algorithmic Toolbox
Week 5
Discuss and ask questions about Week 5.
Dynamic Programming 1
In this final module of the course you will learn about the powerful algorithmic technique for solving
many optimization problems called Dynamic Programming. It turned out that dynamic programming
can solve many problems that evade all attempts to solve them using greedy or divide-and-conquer
strategy. There are countless applications of dynamic programming in practice: from maximizing the
advertisement revenue of a TV station, to search for similar Internet pages, to gene finding (the
problem where biologists need to find the minimum number of mutations to transform one gene into
another). You will learn how the same idea helps to automatically make spelling corrections and to
show the differences between two versions of the same text.
Key Concepts
http://dm.compsciclub.ru/app/quiz-number-of-paths
Interactive Puzzle: Two Rocks Game
http://dm.compsciclub.ru/app/quiz-take-the-last-stone
Interactive Puzzle: Three Rocks Game
http://dm.compsciclub.ru/app/quiz-three-rocks-game
Change Problem
Before recording this lecture, I stopped by the coffee shop.
Play video starting at 4 seconds and follow transcript0:04
This cappuccino is good.
Play video starting at 8 seconds and follow transcript0:08
And as soon as I gave $5 to the cashier, she faced
an algorithmic problem of which coins to select to give me the change.
Play video starting at 19 seconds and follow transcript0:19
And cashiers all over the world use an algorithmic approach
called greedy algorithm to solve this problem.
Today we will learn how cashiers and computer scientists
use greedy algorithm for solving many practical problems.
Play video starting at 39 seconds and follow transcript0:39
So the change problem is finding the minimum number of coins
needed to make change.
Play video starting at 46 seconds and follow transcript0:46
More formally, the input to the problem is an integer money and positive integers
coin1, coin2, ..., coind that represent coin denominations.
For example, in the US, coin1 would be 1 cent,
coin2 would be 5 cents, then 10 cents, 25 cents, and 50 cents.
And the output is the minimum number of coins, with denominations coin1,
coin2, ..., coind, that changes money exactly.
Play video starting at 1 minute 17 seconds and follow transcript1:17
So today in the morning, when the cashier had to return me 40 cents,
she most likely used the following algorithm.
First, she found the largest coin denomination
that is smaller than 40 cents.
It will be 25 cents.
So she gave me 25, 15 cents left and
then the next challenge is how to change 15 cents.
Play video starting at 1 minute 39 seconds and follow transcript1:39
The next step is she probably found the largest coin smaller than 15 cents,
it is 10 cents.
She gave me 10 cents, and finally, she returned 5 cents.
As a result, she changed 40 cents as 25 plus 10 plus 5.
Do you think it's the minimum number of coins she could possibly return?
Play video starting at 2 minutes 2 seconds and follow transcript2:02
It is the minimal number of coins in the United States.
But if you travel to Tanzania, it won't be
the minimum number of coins, because there is a 20 cent coin in Tanzania.
And therefore this greedy approach to solving the change
problem will fail in Tanzania, because there is a better way to change 40 cents:
simply as 20 cents plus 20 cents, using the Tanzanian 20 cent coin.
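As a quick illustration, here is a minimal sketch of this greedy strategy; it is fast, but, as the Tanzanian example shows, it is not always optimal.

def greedy_change(money, denominations):
    # Repeatedly take the largest coin that still fits.
    coins = []
    for coin in sorted(denominations, reverse=True):
        while money >= coin:
            money -= coin
            coins.append(coin)
    return coins

# greedy_change(40, [1, 5, 10, 25, 50]) returns [25, 10, 5]: optimal in the US.
# greedy_change(40, [1, 5, 10, 20, 25, 50]) still returns [25, 10, 5],
# even though [20, 20] uses fewer coins.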
Play video starting at 2 minutes 32 seconds and follow transcript2:32
Since the greedy approach to solving the change problem failed,
let's try something different.
Let's try the recursive algorithm for solving the same problem.
Suppose we want to change 9 cents, and
our denominations are 1 cent, 5 cents, and 6 cents.
What would be the optimal way to change 9 cents?
Well, if we only knew the optimal ways to change 9 minus 6 cents,
9 minus 5 cents, and 9 minus 1 cents, then we would know
the optimal way to change 9 cents.
In other words, to change 9 cents,
we need to know how to change smaller numbers of cents,
in our case, 3 cents, 4 cents, and 8 cents.
And therefore, an approach to solving this problem would be
to use this recurrence to write the recursive program.
Play video starting at 3 minutes 39 seconds and follow transcript3:39
This idea is implemented in the program RecursiveChange.
Play video starting at 3 minutes 44 seconds and follow transcript3:44
To change money cents using coins coin1,
coin2, ..., coind, we do the following.
We first recursively call RecursiveChange with the amounts
money minus coin1, money minus coin2, ..., money minus coind,
and find the minimum number of coins among these d choices.
We add 1, because there is one more coin to add, and return this value.
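A minimal sketch of this RecursiveChange program is shown below; it returns the minimum number of coins (math.inf if the amount cannot be changed), but it recomputes the same subproblems over and over.

import math

def recursive_change(money, coins):
    if money == 0:
        return 0
    best = math.inf
    for coin in coins:
        if money >= coin:
            best = min(best, recursive_change(money - coin, coins) + 1)
    return best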
Play video starting at 4 minutes 17 seconds and follow transcript4:17
This looks like the right approach to solve the problem,
but let's check how fast the resulting program is.
Play video starting at 4 minutes 28 seconds and follow transcript4:28
So, when we're changing 76 cents, there are actually three choices.
We need to recursively call RecursiveChange for 70 cents,
71 cents, and 75 cents.
Play video starting at 4 minutes 42 seconds and follow transcript4:42
But for each of these values, we need once again to call three choices.
Play video starting at 4 minutes 49 seconds and follow transcript4:49
And we will continue growing this tree and
very quickly it will turn into a gigantic tree.
Play video starting at 4 minutes 58 seconds and follow transcript4:58
Let's check how many times we have already tried to change 70 cents.
Three times, and we only started expanding this tree.
In fact, if we continue further, we will see that there were four
times when we needed to compute RecursiveChange for 70.
Play video starting at 5 minutes 21 seconds and follow transcript5:21
How many times do you think we will need to run recursive calls
when we compute the minimal number of coins for 30 cents?
Play video starting at 5 minutes 32 seconds and follow transcript5:32
It turns out that we will need to call it trillions of times,
which means that our seemingly very elegant
RecursiveChange program will not finish before the end of your lifetime.
So, as simple as the change problem looks, neither the greedy approach (which can return a wrong answer)
nor the recursive approach (which is far too slow) solves it satisfactorily.
60 years ago, a brilliant mathematician, Richard Bellman, had a different idea.
Play video starting at 6 minutes 6 seconds and follow transcript6:06
Wouldn't it be nice to know all the answers for changing money minus
coin i by the time we need to compute an optimal way of changing money?
Play video starting at 6 minutes 18 seconds and follow transcript6:18
And instead of the time-consuming calls to RecursiveChange with
money minus coin i, which may need to be repeated trillions of times,
we would simply look up these values.
Play video starting at 6 minutes 32 seconds and follow transcript6:32
This idea resulted in dynamic programming approach that is applied in thousands
of diverse, practical applications in a myriad of different fields.
Play video starting at 6 minutes 45 seconds and follow transcript6:45
And the key idea of dynamic programming is to start filling this matrix,
not from the right to the left, as we did before in the recursive change, but
instead, from the left to the right.
So, we will first ask the trivial question,
what is the minimum number of coins needed to change 0 cents?
And, of course, it is 0.
What is the minimum number of coins to change 1 cent?
Obviously it is one, but we can compute this number by finding the
minimum number of coins to change 0 cents and adding one coin.
We will proceed in a similar fashion to compute the minimum number of
coins to change 2 cents, 3 cents, and 4 cents.
There is only one possibility to derive this number from the previous number.
And for 5 cents, actually, there are two possibilities, green and blue.
For the green one, we can derive it from 0 cents by adding one 5 cent coin,
and for the blue possibility, we can derive it from 4 cents by adding one penny.
Well, which possibility would you select?
Of course, the one that gives you the minimum number of coins for changing 5 cents.
Play video starting at 8 minutes 0 seconds and follow transcript8:00
We continue further and apply the same procedure to 6 cents: there are three possibilities, and once
again we select the optimal choice that corresponds to the minimum number of coins.
In this case, it is the solution for 0 cents plus one 6 cent coin.
We continue for 7 cents, continue for 8 cents, and finally,
very quickly, we actually found the correct answer for 9 cents.
We need four coins to change 9 cents.
And this results in the DPChange algorithm, which simply fills
in the table that I just showed you from left to right.
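A minimal sketch of this DPChange algorithm: it fills the table of optimal values for all amounts from 0 up to money.

import math

def dp_change(money, coins):
    min_coins = [0] + [math.inf] * money
    for m in range(1, money + 1):
        for coin in coins:
            if m >= coin:
                min_coins[m] = min(min_coins[m], min_coins[m - coin] + 1)
    return min_coins[money]

# For example, dp_change(9, [1, 5, 6]) returns 4, matching the answer in the lecture.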
Play video starting at 8 minutes 43 seconds and follow transcript8:43
DP change is the first dynamic programming algorithm that you saw in this course,
and there will be thousands more.
Play video starting at 8 minutes 52 seconds and follow transcript8:52
You may be wondering why is this algorithm called dynamic programming and
what does it have to do with programming?
Well, in fact, the word programming in
dynamic programming has nothing to do with computer programming.
Play video starting at 9 minutes 4 seconds and follow transcript9:04
Amazingly enough, dynamic programming is one of
the most practical algorithms computer scientists use.
But when Richard Bellman was developing this idea for
Air Force project he was working on, it looked completely impractical.
Play video starting at 9 minutes 22 seconds and follow transcript9:22
And he wanted to hide from the Secretary of Defense the fact that he was
really doing mathematics rather than working on the Air Force project.
Therefore he invented a name
that basically has nothing to do with what dynamic programming algorithms do.
In his own words, he said: what name could I choose?
I was interested in planning but planning is not a good word for various reasons.
I decided therefore to use the word programming, and
I wanted to get across the idea that this was dynamic.
It was something not even a Congressman could object to.
Change Money
Resources
Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.
Reading
Change problem: Section "An Introduction to Dynamic Programming: The Change Problem" of
[CP]
Visualizations
Making change by David Galles
References
[CP] Phillip Compeau, Pavel Pevzner. Bioinformatics Algorithms: An Active Learning Approach.
Active Learning Publishers. 2014.
String Comparison
The Alignment Game
Cystic Fibrosis is one of the most common genetic diseases in humans. Approximately one in 25
people carries a cystic fibrosis gene. And when both parents carry a faulty gene, there is a 25% chance
Play video starting at 23 seconds and follow transcript0:23
that their child will have cystic fibrosis.
Play video starting at 27 seconds and follow transcript0:27
In the early 1980s biologists started the hunt for cystic fibrosis genes, one of the first gene hunting
projects in the framework of the human genome project. 30 years ago biologists narrowed the search
for the cystic fibrosis gene to a million nucleotide-long region on chromosome 7. However, this region
contained many genes, and it was not clear which of them is responsible for cystic fibrosis. How
would you find which of these genes is the cause of cystic fibrosis?
Play video starting at 1 minute 8 seconds and follow transcript1:08
I'll give you a hint.
Play video starting at 1 minute 10 seconds and follow transcript1:10
Cystic fibrosis involves sweat secretion with abnormally high sodium levels.
Play video starting at 1 minute 18 seconds and follow transcript1:18
Well, this is a biological hint that does not help us to solve the challenge of finding something in this
one million nucleotide area that is responsible for cystic fibrosis. Let me give you hint number two. By
the time the cystic fibrosis hunt was on, biologists already knew the sequences of some genes
responsible for secretions. For example, ATP binding proteins act as transport channels responsible
for secretion.
Play video starting at 1 minute 53 seconds and follow transcript1:53
You still may be wondering how these two hints may help you to find the cystic fibrosis gene in the
one million nucleotide-long region found on chromosome 7. But here is my third hint.
Play video starting at 2 minutes 9 seconds and follow transcript2:09
Should we search for genes in this region that are similar to known genes responsible for secretion?
Biologists used this third hint and, bingo, they found that one of the genes in this region was similar to the
ATP binding proteins that act as transport channels responsible for secretion. To learn how biologists
find similarities between genes, we will first learn how to play a simple game called the alignment
game.
Play video starting at 2 minutes 47 seconds and follow transcript2:47
The alignment game is a single-person game. I give you two strings, and your goal is to remove symbols
from the strings in such a way that the number of points is maximized. I have to explain to you how
you get points for playing the alignment game. You can either remove the first symbol from both
strings, and in this case you get one point if they are the same symbol and no points if they are
different symbols; or you can remove the first symbol from just one of the strings, and in this case you also
don't get any points. So let's try to play this game. In the beginning it makes sense to remove the first
symbol from both strings; we get plus one. Then another pair of identical symbols, another plus one.
And now the symbols are different, so it doesn't make sense to remove them both, because we would get zero
points. Maybe we should only remove C from the second string, and after we've done this there is a
possibility to remove two Gs from both strings. We get another point, we continue, continue, continue,
and finally, after playing this game, we get a score of plus four. Do you think you can get a score of plus
five playing this game?
Play video starting at 4 minutes 15 seconds and follow transcript4:15
After playing this game, we have also constructed something that is called an alignment of the two strings.
An alignment of two strings is a two-row matrix such that the first row consists of the symbols of the first
string, in order, possibly interspersed with the space symbol, and the second row consists of the
symbols of the second string, once again possibly interspersed with the space symbol. After we have
constructed the alignment, we can classify the different columns of the alignment matrix as matches,
mismatches, insertions, or deletions. Insertions correspond to the case when we selected the symbol from the
second string only, and deletions correspond to the case when we selected the symbol from the first
string only. Moreover, we can score this alignment by giving a premium for every match (we will give a
premium of plus one) and a penalty for every mismatch and for every insertion and deletion, which we denote as
indels. In our case we will use a penalty of minus mu for mismatches and a penalty of minus sigma for
insertions and deletions, or indels. For example, in our case, if mu equals zero and sigma equals one,
then we get an alignment score equal to one. So we define the alignment score as the number of matches
minus mu times the number of mismatches minus sigma times the number of indels. The optimal alignment
problem is: given two strings, a mismatch penalty mu, and an indel penalty sigma, find an alignment of the two
strings maximizing the score. We will be particularly interested in one particular case of alignment.
We define a common subsequence as simply the matches in an alignment of two strings. In this case, the
common subsequence is represented by ATGT, and the longest common subsequence problem that
we will be interested in is the following: given two strings, we want to find the longest common
subsequence of these strings.
Play video starting at 6 minutes 44 seconds and follow transcript6:44
And of course, you have already recognized that to find the longest common subsequence, we simply
need to find a maximum score alignment with the parameters mu equal to zero and sigma equal to zero.
Play video starting at 7 minutes 0 seconds and follow transcript7:00
Another classical problem in computer science is the edit distance problem. Given two strings, find
the minimum number of elementary operations (insertions, deletions, or substitutions of symbols)
that transform one string into another. And of course, the minimum number of insertions, deletions,
and mismatches over all alignments of two strings equals the edit distance. For example, if you want
to find the edit distance between the strings "editing" and "distance", you can construct an optimal
alignment of these strings with appropriate scores. Here I show matches, mismatches, insertions, and
deletions. To see that the edit distance problem is equivalent to the alignment problem, let's
consider this alignment between "editing" and "distance", and let's compute the total number of symbols
in the two strings.
Play video starting at 8 minutes 0 seconds and follow transcript8:00
Obviously, the total number of symbols in the two strings is equal to twice the number of matches, plus twice the
number of mismatches, plus the number of insertions, plus the number of deletions. Let me rearrange
this expression; after I rewrite it, you will see that the first three terms correspond to the
alignment score, and the last three terms correspond to the edit distance.
Play video starting at 8 minutes 26 seconds and follow transcript8:26
Therefore, minimizing the edit distance is the same as maximizing the alignment score, which means that the
edit distance problem is just one particular version of the alignment problem.
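The edit distance itself can be computed with a dynamic programming algorithm in the same spirit as DPChange; the recurrence is not derived in this transcript (see Section 6.3 of [DPV] below), but a standard reference sketch looks as follows.

def edit_distance(a, b):
    n, m = len(a), len(b)
    # d[i][j] = edit distance between the first i symbols of a and the first j symbols of b
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + diff)  # match or substitution
    return d[n][m]

# For example, edit_distance("editing", "distance") returns 5.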
Edit Distance
Resources
Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.
Reading
Edit distance: Section 6.3 of [DPV]
Visualizations
Edit distance calculator by Peter Kleiweg
Longest common subsequence by David Galles (note the longest common subsequence problem
is a special case of the edit distance problem where we allow insertions and deletions only)
Advanced Reading
Chapter 5 "How Do We Compare Biological Sequences" of [CP]
Both sources explain, in particular, Hirschberg's algorithm, which allows computing an optimal
alignment (and not just its score!) of two strings of length n and m in quadratic
time O(nm) and only linear space O(m+n).
References
[DPV] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition).
McGraw-Hill Higher Education. 2008.
[CP] Phillip Compeau, Pavel Pevzner. Bioinformatics Algorithms: An Active Learning Approach.
Active Learning Publishers. 2014.
Additional Slides
Dynamic programming is probably the hardest part of the course. At the same time, it is definitely
one of the most important algorithmic techniques. Please see additional slides that discuss an
alternative perspective for dynamic programming algorithms: to get to a dynamic programming
solution, we start from the most naive brute force solution and then start optimizing it. The slides
also contain many pieces of Python code.
Programming assignment 5
http://dm.compsciclub.ru/app/quiz-primitive-calculator
Programming Assignment: Programming Assignment 5: Dynamic
Programming 1
Week 6
Algorithmic Toolbox
Week 6
Discuss this week's modules here.
Dynamic Programming 2
Key Concepts
Knapsack
In this video, we will design a dynamic programming solution for the Knapsack with repetitions
problem.
Play video starting at 8 seconds and follow transcript0:08
Recall that in this problem, we are given an unlimited quantity of each item.
Play video starting at 15 seconds and follow transcript0:15
This is a formal statement of the problem. We're given n items with weights w1, w2, and so on, up to wn,
and their values are v1, v2, and so on, up to vn.
Play video starting at 28 seconds and follow transcript0:28
By capital W we denote the total capacity, or the total weight, of the knapsack. And our goal is to select
a subset of items, where each item can be taken any number of times, such that the total weight is at
most capital W while the total value is as large as possible.
Play video starting at 48 seconds and follow transcript0:48
To come up with a dynamic programming algorithm, let's analyze the structure of an optimal solution.
For this, consider some subset of items, of total weight at most capital W, whose total value is
maximal, and let's consider some element i in it. Let's see what happens if we take this element out of
this solution. What remains is some subset of items whose total weight is at most capital W minus
wi. Right? So this is easy. What is crucial for us is that the total value of this remaining subset of items
must be optimal: it must be maximal among all subsets of items whose total weight is at most
capital W minus wi. Why is that? Well, assume that there is some other subset of items whose total
weight is at most capital W minus wi, but whose total value is higher. Let's then take the item i and
put it back into this subset of items. What we get, actually, is a solution to our initial problem of
higher value: its total weight is at most capital W, and its value is higher than the value of our
initial solution. But this contradicts the fact that we started with an optimal solution.
Play video starting at 2 minutes 14 seconds and follow transcript2:14
Such a trick is known as the cut-and-paste trick, and it is frequently used in designing dynamic
programming algorithms. So, let me repeat what we just proved. If we take an optimal solution for a
knapsack of total weight W and take some item i out of it, then what remains must be an optimal
solution for a knapsack of smaller weight.
Play video starting at 2 minutes 41 seconds and follow transcript2:41
So this suggests that we have a separate subproblem for each possible total weight from zero to
capital W. Namely, let's define value(w) as the optimal total value of items whose total weight is at
most w.
Play video starting at 2 minutes 59 seconds and follow transcript2:59
This allows us to express value(w) using the values for knapsacks of smaller weight. Namely, to get an
optimal solution for a knapsack of total weight w, we first take some smaller knapsack and an optimal
solution for it, and add an item i to it. First of all, to be able to add an item i and get a knapsack
of total weight at most w, we need this smaller knapsack to be of total weight at most w minus wi; also, when
adding the i-th item to it, we increase its value by vi; and the final thing is that we do not know which element
to add exactly. For this reason, we just go through all possible items i and select the
maximum of the following quantity: value(w minus wi) plus vi.
Play video starting at 4 minutes 2 seconds and follow transcript4:02
Having a recurrent formula for value of w as we just discussed, it is not so difficult to implement an
algorithm solving the knapsack problem with repetitions. Recall that we expressed the solution for a
knapsack, through solutions from knapsacks of smaller weight. This means that it makes sense to
solve our subproblems in the order of increasing weight. So we do this in the pseudocode. Initially we
set value of 0 to 0 just to reflect that fact
Play video starting at 4 minutes 36 seconds and follow transcript4:36
that the maximal possible total value of a Knapsack of weight 0, clearly equals 0. Then we go in a loop
from w=1 to W. And for each such w we just compute the corresponding maximum as follows. We go
through all items i such that wi is at most w.
Play video starting at 5 minutes 0 seconds and follow transcript5:00
And for each such item i, we see what happens if we take an optimal solution
Play video starting at 5 minutes 6 seconds and follow transcript5:06
for a knapsack of size w minus wi, and add the item i to it. Clearly, in this case the total value is
value(w minus wi) plus vi, and the total weight is at most w. So this is a feasible solution for a
knapsack of total weight w. So we check whether the resulting value is larger than what we currently
have, and if it is, we update value(w). In the end, we just return value(W). So this algorithm
is clearly correct because it just implements our recurrent formula, right? So in particular this loop
just computes the maximum from the previous slide. Now let's estimate the running time of this
algorithm. It is not difficult to see that the running time is
Play video starting at 6 minutes 7 seconds and follow transcript6:07
of n multiplied by capital W. Why is that? Well just because we have two nested loops here. So this is
the first loop, and this is the second loop. The first one has capital W on it, capital W iterations. And
the second one has n iterations. N iterations. What happens inside in the loop here it takes just
constant time.
Play video starting at 6 minutes 36 seconds and follow transcript6:36
We conclude this video by applying our algorithm to the example considered a few minutes before.
Play video starting at 6 minutes 44 seconds and follow transcript6:44
So in this case we are given four items and a knapsack of total capacity 10. We are going to compute
the optimal value for all knapsacks of total weight from zero to ten, which means that it makes
sense to store all these values just in an array, as shown here on the slide. Initially this array is filled
with zeros, and we're going to fill it in with values from left to right.
Play video starting at 7 minutes 14 seconds and follow transcript7:14
So the first non-obvious cell is two. So this is the first weight for which we can add any item. So in this
case we can actually say that to get a solution for knapsack of total weight two we can get a solution
for knapsack of total weight 0 and add the last element to it. This will also give us plus nine to the
value.
Play video starting at 7 minutes 46 seconds and follow transcript7:46
So this is the only possible solution for this cell, so we do not even need to compute the maximum. So
in this case, the value is equal to nine.
Play video starting at 7 minutes 55 seconds and follow transcript7:55
So what about value of three? So in this case, we already have a choice. We can either get an optimal
solution for total weight one, and add the fourth element to it, or we can get an optimal solution for a
knapsack of total weight zero and add the second element to it, whose value is 14. So among these
two values, the second choice is better. It gives us a solution of value 14, so we'll write it in this cell.
Play video starting at 8 minutes 33 seconds and follow transcript8:33
Now, for value of 4, there are already three choices. Let's consider them. First, we can take an
optimal solution for a knapsack of total weight two and add the last item to it. So this is plus 9. Or we can
take an optimal solution for a knapsack of total weight one and add the second item to it
Play video starting at 9 minutes 3 seconds and follow transcript9:03
so plus 14 or we can take an optimal solution for a knapsack of total weight 0 and add the third item.
Play video starting at 9 minutes 15 seconds and follow transcript9:15
This is plus 16. Right? So in this case, we need to select the maximum among 16, 14, and 9 plus 9,
which is 18. In this case, 18 is the maximum value. So we'll write it in this cell. So by continuing in the
same manner, we can fill in the whole array
Play video starting at 9 minutes 37 seconds and follow transcript9:37
and see that the last element is equal to 48. We thus derive that the optimal value for this knapsack
with repetitions problem is equal to 48. And also, let me remind you that this optimal value can be
obtained by taking one copy of the first item and two copies of the last item.
Play video starting at 10 minutes 7 seconds and follow transcript10:07
In the next video, we will learn how to solve this problem when repetitions are not allowed.
Knapsack without Repetitions
In this video we will be designing a dynamic programming solution for the Knapsack without Repetitions
problem. Recall that in this problem we're given a single copy of each item. This is, again, the formal statement
of the problem; we emphasize once again that we are not allowed to
take more than a single copy of each item. Well, we already know that our previous reasoning cannot
produce the right answer for our new problem, namely for the Knapsack without
repetitions problem. This is simply because, in our toy example, the optimal value for the Knapsack with
repetitions was 48, while the optimal value for the Knapsack without repetitions was 46. So this means
that if we just run our previous algorithm, it will produce an incorrect result. Still, it is important to
understand where our algorithm, or our reasoning more generally, fails for this problem.
Play video starting at 1 minute 8 seconds and follow transcript1:08
So once again, let's consider an optimal subset of items for a knapsack of total weight capital W. And
assume for the moment that we know that it contains the nth element.
Play video starting at 1 minute 21 seconds and follow transcript1:21
That is, the last item. We argue, similarly to the previous case, that if we take this item out of
the current knapsack, then what we get must be an optimal solution for a knapsack of smaller weight,
namely of total weight W - wn. So if we take this smaller solution and add the nth item to it, we
get an optimal solution for the initial knapsack of total weight W. Assume, however, that the optimal
solution for the smaller knapsack already contains the nth item. This means that we cannot add
another copy of the nth element to it, because then the resulting solution would contain two
copies of the nth element, which is now forbidden by the problem formulation. So this is why we need
to come up with a different notion of a subproblem. Still, let's take a closer look at our optimal
solution. It is not difficult to see that there are only two cases: either it contains the last item, or it
doesn't. Assume that it does, and let's again take this nth item out of our current
solution. What is left? First of all, it is some solution for a knapsack of total weight capital W - wn,
and it also uses only items from 1 to n - 1, because we just took out the nth item, right?
Play video starting at 3 minutes 1 second and follow transcript3:01
If, on the other hand, the initial optimal solution for the knapsack of total weight W does not contain
the nth item, well, then it contains only items from 1 to n minus 1. Right? Well this simple observation
will help us to get the right definition of a subproblem for this version of the knapsack problem.
Play video starting at 3 minutes 23 seconds and follow transcript3:23
Well, on the previous slide we argued as follows. Consider an optimal solution for a knapsack of total
weight capital W. There are two cases: either it contains the last item or it doesn't. If it
contains it, we can take it out and reduce the problem to a smaller knapsack that uses only items from one to
n minus one. On the other hand, if it doesn't contain the nth item, then we reduce it to another case
where the knapsack only uses items from 1 to n-1. In any case, we reduce the number of items, and in
the first case we also reduce the size of the knapsack, the total weight of the knapsack. We might
continue this process and express the solution for such subproblems through solutions for even smaller
subproblems. If we continue in the same fashion, what we get somewhere in the middle is a solution
for a knapsack of some weight that uses some first i items. Well, let's just use this as a definition of our
subproblem. Namely, for any w from 0 to W and for any i from 0 to n, let's denote by value(w, i)
the maximum value that can be achieved by using only items from 1 to i and whose total weight is
at most w. Then it is easy to express it through solutions for smaller such problems. Once again,
value(w, i) is an optimal
Play video starting at 5 minutes 12 seconds and follow transcript5:12
value of a subset of the first i items whose total weight is at most w. We know that in this optimal
subset, either the i-th item is contained or it is not. So there are two cases, and
we need to select the maximum over these two cases. In the first case, if we take the i-th item out, what
is left is an optimal solution for the following problem: we are allowed to use only the first i-1 items,
and the total weight should be no more than w-wi. So this is the first term under the maximum.
Play video starting at 5 minutes 51 seconds and follow transcript5:51
In the second case, if the i-th item is not used in an optimal solution, then we just know that the
optimal solution is the same as for the knapsack of total weight w using only the first i-1 items.
Play video starting at 6 minutes 7 seconds and follow transcript6:07
So we managed to express the solution for our problem through solutions for smaller subproblems.
And this is probably the most important thing in designing dynamic programming algorithms.
Play video starting at 6 minutes 21 seconds and follow transcript6:21
We now turn our recurrence formula into a dynamic programming algorithm. As usual, we start with
initialization: namely, we set all the values value(0, j) to 0 for all j, and all the values value(w, 0) to 0 for all w. Well,
this just expresses the fact that if we have no items, then the value is zero, and if we have a
knapsack of total weight zero, then the total value is also zero, of course. Now we need to
compute all the other values value(w, i).
Play video starting at 7 minutes 0 seconds and follow transcript7:00
Recall that we expressed value(w, i) through value(w - wi, i - 1) and value(w, i - 1).
Play video starting at 7 minutes 7 seconds and follow transcript7:07
This means that we always reduce the problem to one with a smaller number of items, namely i - 1.
This actually helps us to understand that it makes
sense to gradually increase the number of allowable items. And this is why we have in this
pseudocode an outer loop where i goes from 1 to n. When i is fixed, we compute all the values
value(w, i). For this, we go from w equal to 1 to capital W and do the following. Now i and w are
fixed, and we need to compute value(w, i). First, we check what the solution
is for the subproblem where we use a knapsack of the same weight w but only the first i-1
items.
Play video starting at 8 minutes 11 seconds and follow transcript8:11
This is implemented as follows. We first assign value(w, i) the value of value(w, i-1). Then we need to
check whether we can improve this value by using the i-th item. First of all, we can only do this if the
weight of the i-th item does not exceed the weight of the current knapsack, which is just w. If it
doesn't exceed it, we see what happens if we take an optimal value for the knapsack of total weight
w minus wi that is filled only by items from 1 to i minus 1, and add the i-th item to it. If this
gives a larger value than we currently have, we update value(w, i). In the end we just
return value(W, n), because this is the solution to our initial problem: it is the
solution for a knapsack of size capital W that is allowed to use all the n items, right? Now, it is clear that this
algorithm is correct just because it directly implements the recurrence formula that we already
discussed. So let's analyze its running time. It is not difficult to show, again, that its running time is
the same as before: it is again n multiplied by W. This is again just because we have two loops
here: the outer loop with n iterations and the inner loop with W iterations. And what is
going on inside only takes constant time.
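Here is a minimal Python sketch of the two-loop algorithm just analyzed (my own illustration; the function name and instance are not from the lecture). It builds the table value(w, i) for the toy example and runs in O(nW) time; the entry for all four items and capacity 10 is 46, as expected.

def knapsack_without_repetitions(W, weights, values):
    n = len(weights)
    # value[i][w]: the optimal value using only the first i items with capacity w
    value = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(1, W + 1):
            value[i][w] = value[i - 1][w]            # case 1: the i-th item is not used
            if weights[i - 1] <= w:                  # case 2: try using the i-th item
                candidate = value[i - 1][w - weights[i - 1]] + values[i - 1]
                value[i][w] = max(value[i][w], candidate)
    return value

table = knapsack_without_repetitions(10, [6, 3, 4, 2], [30, 14, 16, 9])
print(table[4][10])  # prints 46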
Play video starting at 10 minutes 3 seconds and follow transcript10:03
Now let's apply the algorithm that we've just designed to our toy example. Recall that we need to
store the values of all subproblems value(w, i), for all w from zero to ten and all i from zero to four in our
case. For these purposes, it is natural to use a two-dimensional table, or two-dimensional array. You
can see such a two-dimensional array on the slide, already filled in. Here the rows are indexed
by all possible values of i, and the columns are indexed by all possible values of w.
Right, we start by initializing the first row and the first column of this table with zeros. That is, we fill this
row with zeroes and we fill this column with zeroes also. Then we start filling in this table row by row.
That is, we first fill in this cell, then this cell, then this cell, then this cell, and so on. So we go like this:
we first fill in this row, then this row, then this row, and then this row. The
resulting value 46 is actually the answer to our initial problem. Now, to illustrate,
let me show you how some particular value in this table was
computed. For example, consider this cell.
Play video starting at 11 minutes 53 seconds and follow transcript11:53
So formally, this is value, value(10, 2). Which means that this is an optimal value of a knapsack of total
weight 10 that only uses the first two items. So assume that we don't know what to put here.
Play video starting at 12 minutes 18 seconds and follow transcript12:18
So we just need to compute it right now. So let's argue as we did before. So this is a knapsack of total
weight 10 that uses only the first two items. Well, we then say that the second item is either used or
not. So if it is not used, then this is the same as filling in the Knapsack of total weight ten just using the
first item. And we already know this value because it is in the previous row. So this is value 10, 1,
right? So the value in this case is 30. On the other hand, if the second item is used, then if we take it
out, what is left is an optimal solution for a knapsack of total weight 10 minus 3. Because 3 is the
weight of the second item, which means that it is an optimal solution for a knapsack of size 7, of total
weight 7, that is only allowed to use the first item. So if we add the second item to this
solution, we get 30 plus 14, which is much better than without using
the second item, right? So that's why we have 44 here.
Play video starting at 13 minutes 45 seconds and follow transcript13:45
This is also the reason we fill this matrix row by row: by the time we need to compute the
value of this cell, we have already computed the values of these two cells.
Play video starting at 14 minutes 1 second and follow transcript14:01
So that's why we fill our matrix exactly row by row.
Play video starting at 14 minutes 7 seconds and follow transcript14:07
Now let me use the same example to illustrate an important technique in dynamic programming.
Namely reconstructing an optimal solution.
Play video starting at 14 minutes 17 seconds and follow transcript14:17
By reconstructing an optimal solution in this particular problem, I mean finding not only the optimal
value for the knapsack of total weight 10, but also the subset of items that leads to this optimal value.
For this we first create a boolean array of size four. In this array, we will mark, for each item,
whether it is used in an optimal solution or not. Now what we're going to do is to backtrace the path
that led us to the optimal value, 46. In particular, let's try to understand how this value of 46 was
computed.
Play video starting at 15 minutes 2 seconds and follow transcript15:02
Well, first of all, 46 is formally value(10, 4), that is, the optimal value for a knapsack of total weight
ten using the first four items. We argued that the fourth item is either used or not. If it is not used,
then this value is the same as value(10, 3), which is shown here, that is, the value for a knapsack
of the same weight using the first three items. If, on the other hand, it is used, then what is left must
be an optimal solution for a knapsack of size 10 minus 2, which is 8, that also uses the first three items.
Well, this value is already computed, it is 30, so we need to compute the maximum of 30 plus 9
(because the value of the last item is 9) and 46. In this particular case the maximum is equal
to 46, which means that at this point we decided not to use the last item, right? So we put 0 into our
boolean array to indicate this, and we move to this cell.
Play video starting at 16 minutes 18 seconds and follow transcript16:18
Again, let's try to understand how this value was computed. It was computed as the maximum
of two numbers which depend on the following values. Either we do not use the third item, and then it
is the same as the value of this cell, or we use the third item. In this case, what remains is a
knapsack of total weight 6 using the first two items, and its value is 30.
Play video starting at 16 minutes 49 seconds and follow transcript16:49
Plus the value of the third item, which is 16. In this particular case, 30 plus 16 is larger than 44,
which means that this value of 46 was computed using this value. This, in turn, means that we
decided to use the third item. Let's mark it by putting 1 into our boolean array. Now we stay in this
cell and we try to understand how it was computed. It was computed as the maximum of this 30 and
this 0 plus fourteen. In this case, the first value is larger, so we move to this cell and we mark
that we decided not to use the second item. Okay, and finally, we realize that we arrived at this value
30 from the upper left corner. Right? So, this way we reconstructed the whole optimal
solution. Once again, we backtraced the path that led us to the optimal value.
Play video starting at 18 minutes 11 seconds and follow transcript18:11
What is shown here is that we decided to use the first item and the third item. So let's check
that it indeed gives us the optimal value of 46. So indeed if we compute the sum of the weight of the
first and the third item, it is 10. And while the total value is 30 plus 16 which is 46 indeed. And as I said
before this technique is usually used in dynamic programming algorithms to reconstruct the optimal
solution.
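Here is a minimal sketch of this backtracing step in Python (my own illustration; it reuses the knapsack_without_repetitions helper sketched earlier in this section and an equivalent check, comparing value(w, i) with value(w, i-1), instead of recomputing the maximum).

def reconstruct_items(W, weights, value):
    used = [False] * len(weights)            # the boolean array from the video
    w = W
    for i in range(len(weights), 0, -1):     # walk back from value[n][W]
        # if dropping the i-th item does not explain the value, the item was used
        if value[i][w] != value[i - 1][w]:
            used[i - 1] = True
            w -= weights[i - 1]
    return used

weights, values = [6, 3, 4, 2], [30, 14, 16, 9]
table = knapsack_without_repetitions(10, weights, values)   # defined in the earlier sketch
print(reconstruct_items(10, weights, table))  # [True, False, True, False]: items 1 and 3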
Final Remarks
Play video starting at 0 seconds and follow transcript0:00
We conclude this lesson with a few important remarks.
Play video starting at 4 seconds and follow transcript0:04
The first remark is about a trick called memoization.
Play video starting at 9 seconds and follow transcript0:09
Usually when designing a dynamic programming algorithm, you start
with analyzing the structure of an optimal solution for your computational problem.
You do this to come up with the right
definition of a sub-problem that will allow you to express the solution for
a sub-problem through solutions for smaller sub-sub-problems.
So, when you write down this recurrence relation, you can actually transform it
into either an iterative algorithm or a recursive algorithm.
The corresponding iterative algorithm just solves all subproblems,
going from smaller ones to larger ones.
And for this reason it is also sometimes called a bottom up algorithm.
On the other hand, the recursive algorithm to solve a sub-problem
makes recursive calls to smaller sub-sub-problems.
And for this reason it is sometimes called the top down approach.
Well, if you implement a recursive algorithm
straightforwardly, it might turn out to be very slow because it will recompute
Play video starting at 1 minute 15 seconds and follow transcript1:15
some values many, many times.
Like with Fibonacci numbers, for example.
However, there is a simple trick, and it is called memoization,
that allows you to avoid re-computing many times the same thing.
Namely, you can do the following: when solving a subproblem,
right after solving it you store its solution in a table, for example.
Play video starting at 1 minute 41 seconds and follow transcript1:41
And when you make a recursive call to solve some sub-problem, before
trying to solve it, you check in a table whether its solution is already stored.
And if its solution is already in the table which means that it was
already computed then you just return it immediately.
So this recursive call, turns out to be just a table look up.
So this is how a recursive algorithm with memoization works.
Let's see what a recursive algorithm with memoization for
the Knapsack problem looks like.
For simplicity, let's assume that we're talking about the Knapsack
with repetitions.
In this case, the subproblem for a knapsack of size w
is just the optimal value of a knapsack of total weight w.
We compute it as follows, by a recursive procedure.
First of all, we check whether its solution is already in a hash table.
We use a hash table to store pairs of objects:
for a weight w, we store value(w) if it is already computed.
If it is already in the table, we return it immediately; otherwise we just
compute it, and we make recursive calls to compute the values for
the subproblems on w minus wi, okay?
And when the value is computed, we just store it in our hash table.
So this way, we use memoization, storing results in the hash table,
to avoid recomputing the same thing once again.
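Here is a minimal Python sketch of this memoized (top-down) procedure for the knapsack with repetitions (my own illustration; the dictionary plays the role of the hash table described above, and the function name is mine).

def knapsack_value(W, weights, values, memo=None):
    # memo maps a weight w to the already computed value(w)
    if memo is None:
        memo = {}
    if W in memo:                  # solution already stored: just a table lookup
        return memo[W]
    best = 0
    for wi, vi in zip(weights, values):
        if wi <= W:                # try adding one more copy of this item
            best = max(best, knapsack_value(W - wi, weights, values, memo) + vi)
    memo[W] = best                 # store the solution right after computing it
    return best

print(knapsack_value(10, [6, 3, 4, 2], [30, 14, 16, 9]))  # prints 48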
So once again, an iterative algorithm solves all sub-problems
going from smaller ones to larger ones, right?
And eventually solves the initial problem.
On the other hand, the recursive algorithm goes as follows.
It starts from the initial problem and
makes recursive calls to smaller sub-subproblems, right?
So in some sense an iterative algorithm and a recursive algorithm are doing
the same job, especially if we need to solve the whole range of subproblems.
However, a recursive algorithm might turn out to be slightly slower because,
Play video starting at 4 minutes 3 seconds and follow transcript4:03
on one hand, it solves the same subproblems.
On the other hand, when making a recursive call you also need to
Play video starting at 4 minutes 12 seconds and follow transcript4:12
put the return address on the stack, for example.
So, the recursive algorithm has some overhead.
There are, however, cases when you do not need to solve all the subproblems, and
the Knapsack problem is a nice illustration of this situation.
So, imagine that we are given an input to the Knapsack problem where all
the weights of the n items, together with the total weight of the Knapsack,
are divisible by 100, for example.
Play video starting at 4 minutes 41 seconds and follow transcript4:41
This means that we are actually not interested in subproblems
where the weight of the knapsack is not divisible by 100. Why is that?
Well, just because for any subset of items, since all the weights
of the items are divisible by 100, their total weight is also divisible by 100.
So in this case an iterative algorithm will still solve the
whole range of subproblems,
while a recursive algorithm will make only those recursive calls
that are actually needed to compute the final solution.
So, it will make recursive calls only for subproblems
whose weights are divisible by 100.
The final remark of this lesson is about the running time.
If you remember, the running time of the algorithms that we designed in this
lesson was big-O of n multiplied by W.
And this running time looks polynomial; however, it is not.
And this is why: consider, for example, the following input.
Play video starting at 5 minutes 48 seconds and follow transcript5:48
Assume that the total weight of the knapsack is as shown on this slide.
This is a huge number, roughly ten to the 20,
that is, 20 digits in decimal representation.
At the same time, the input size is really tiny, just 20 digits, right?
So this is not gigabytes of data, just 20 digits but on this input
already our algorithm will need to perform roughly ten to the 20 operations.
This is really huge, for example we can't do this on our laptops,
and this is because to represent the value of W, we only need log W digits.
Play video starting at 6 minutes 30 seconds and follow transcript6:30
So, in case of the Knapsack problem,
our input is proportional not to n plus W, but to n plus log W.
Play video starting at 6 minutes 40 seconds and follow transcript6:40
Okay, and if you represent the running time in terms of n and log W,
then you get the following expression, n multiplied by 2 to the log W,
which means that our algorithm is in fact exponential time algorithm.
Play video starting at 6 minutes 56 seconds and follow transcript6:56
Put otherwise, it can only process inputs where W is not too large,
roughly less than 1 billion, for example.
Play video starting at 7 minutes 9 seconds and follow transcript7:09
Okay, and in fact, we believe that it is very difficult to construct an algorithm
that will solve this problem in polynomial time, in truly polynomial time.
In particular, we will learn later in this presentation that this problem
is considered to be so difficult that for solving the Knapsack problem for
example, in polynomial time, one gets $1 million.
Polynomial vs Pseudopolynomial
Many of you are surprised to learn that the running time O(nW) for the knapsack algorithm is
called pseudopolynomial rather than just polynomial. The catch is that the input size is proportional
to log W, rather than W.
To further illustrate this, consider the following two scenarios: in the first, the input is a sequence
of m numbers; in the second, the input is a single number m. They look similar, but there is a dramatic
difference. Assume that we have an algorithm that loops
for m iterations. Then, in the first case it is a polynomial time algorithm (in fact, even linear time),
whereas in the second case it is an exponential time algorithm. This is because we always
measure the running time in terms of the input size. In the first case the input size is proportional
to m, but in the second case it is proportional to logm. Indeed, a file containing just a number
“100000” occupies about 7 bytes on your disc while a file containing a sequence of 100000 zeroes
(separated by spaces) occupies about 200000 bytes (or 200 KB). Hence, in the first case the
running time of the algorithm is O(size), whereas in the second case the running time
is O(2^size).
Let’s also consider the same issue from a slightly different angle. Assume that we have a file
containing a single integer 74145970345617824751. If we treat it as a sequence
of m=20 digits, then an algorithm working in time O(m) will be extremely fast in practice. If, on
the other hand, we treat it as an integer m=74145970345617824751, then an algorithm
making m iterations will work for
74145970345617824751 / (10^9 ⋅ 60 ⋅ 60 ⋅ 24 ⋅ 365) ≈ 2351
years, assuming that the underlying machine performs 10^9 operations per second.
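A quick way to check this arithmetic (a tiny Python snippet of my own, not part of the reading):

m = 74145970345617824751
operations_per_second = 10**9
seconds_per_year = 60 * 60 * 24 * 365
print(m / (operations_per_second * seconds_per_year))  # roughly 2351 years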
Further reading: a question at stackoverflow.
Resources
Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.
Reading
Knapsack: Section 6.4 of [DPV08]
References
[DPV08] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition).
McGraw-Hill Higher Education. 2008.
Placing Parentheses
Hello, and welcome to the next lesson in the dynamic programming module. In this lesson, we will be
applying the dynamic programming technique to a wide range of problems where your goal is
to find an optimal order of something. We will illustrate this technique by solving the so-called placing
parentheses problem. In this problem, your input is an arithmetic expression consisting of
digits and arithmetic operations, and your goal is to find an order of applying these arithmetic
operations that maximizes the value of the expression. You specify this order by placing parentheses,
and that's why the problem is called placing parentheses.
Play video starting at 43 seconds and follow transcript0:43
As usual we start with problem overview.
Play video starting at 47 seconds and follow transcript0:47
Consider the following toy arithmetic expression: 1 + 2 - 3 x 4 - 5. In this case we have five digits and
four arithmetic operations, and we would like to find an order of applying these four arithmetic
operations that maximizes the value of this expression. When the order of operations is fixed, you do
the following. You take the first operation, take the two adjacent digits, and apply this
operation. For example, if the operation is multiplication, then the two digits are 3 and 4,
so you multiply 3 and 4, you get 12, and you just replace 3 times 4 by 12. You then take the next
operation, apply it, replace the two numbers and the arithmetic sign by the result, and
proceed in a similar fashion. In the end, you get a single number, and your goal is to find an
order that guarantees that this number is as large as possible.
Play video starting at 1 minute 53 seconds and follow transcript1:53
You can specify an order just by placing a set of parentheses in your expression. For example, if you
would like to apply all your four operations just from left to right, you place the parentheses as
follows. In this particular case, we compute the results as follows. So we first compute 1 + 2, this is 3.
We then subtract 3 from the results. This gives us 0. We then multiply the result by 4. This is still 0.
And finally, we subtract 5. So this gives us -5. And this is actually non-optimal, because for example,
there is a better order. In this case, we first multiply 3 and 4, this gives us 12. We then subtract 5, this
gives us 7. Then we compute the sum of 1 and 2, this gives us 3. Since the final operation is
subtraction, we subtract 7 from 3. This gives us -4. So in this case the order of applying operations was
the following. So we first compute the product of 3 and 4, so this is the first operation. We then
subtract 5. This is the second operation. We then compute the result of 1 + 2. So this plus is the third
operation, and this minus is the fourth operation, the last one.
Play video starting at 3 minutes 20 seconds and follow transcript3:20
It is not difficult to see that the optimal value in this case is equal to 6. And it can be obtained as
follows. You first subtract 5 from 4. This gives you -1. You then multiply it by 3, and you get -3. You
then compute the sum of the first two digits. This is 1 + 2, and that is equal to 3. Finally you subtract
-3 from 3. This is the same as 3 + 3, it is equal to 6. Well, you might find the result as follows, you just
go through all possible orders. Let's see how many different orders are there. Well, there are four
arithmetic operations in this case, so you can choose any of the four possible arithmetic operations to
be the first one. You can choose any of these three remaining operations to be the second one, and
you can select any of the two remaining operations to be the third one. And the last one is unique, it
is the only remaining operations. So, in total, there are 4 by 3 by 2 by 1 different orders. This is equal
to 24, and you can just enumerate all of them, write them down, compute an answer for each of
these orderings and select the maximal value. However, our method of going through all possible
orderings does not scale well. And this is why. Consider the toy example shown on the slide. In this
case we have six digits and five arithmetic operations. This example will require us to go through all
possible 120 orderings.
Play video starting at 5 minutes 5 seconds and follow transcript5:05
This is just because there are five operations, so any of the five of them can be the first one, any of the
remaining four of them can be the second one, and so on. So this is 5 times 4 times 3 times 2 times 1, which is
equal to 120. It is already not so easy to do this by hand, I mean, to go through all possible such
orderings. Well, this is not easy, but we can teach a computer to do this, right? So we can implement
an algorithm that goes through all possible orderings. However, in general, this algorithm will perform
roughly n factorial steps, where n is the number of arithmetic operations, for exactly the same reason.
If you have n arithmetic operations, then any of them can be the first one. Any of the remaining n
minus 1 operations can be the second one, and so on. So this is n times n minus 1, times n minus 2,
and so on. This equals n factorial, and n factorial is an extremely fast-growing function. For
example, 20 factorial already equals roughly 2 times 10 to the 18. This means that if you implement
such an algorithm, it will not be able to compute the maximum value of an expression consisting of
just 20 digits in a reasonable time, even in one year, not to mention one second. Which means, as
usual, that we need another algorithm, as you might well have guessed. We will use dynamic
programming to design a more efficient algorithm. In the meantime, you might want to check
your intuition by trying a few possible orderings on this small expression, using our
in-video quiz.
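For small expressions, the brute-force approach described above can be written in a few lines. Here is a sketch in Python (my own illustration, not the course's code; all names are mine) that tries all possible orders of applying the operations; on the toy expression 1 + 2 - 3 x 4 - 5 it finds the maximum value 6.

from itertools import permutations

def apply_op(op, a, b):
    if op == '+': return a + b
    if op == '-': return a - b
    return a * b  # op == '*'

def max_value_brute_force(digits, ops):
    best = None
    for order in permutations(range(len(ops))):    # all n! orders of the operations
        nums = list(digits)
        pending = list(range(len(ops)))            # original indices of operations not yet applied
        for idx in order:
            pos = pending.index(idx)               # current position of this operation
            nums[pos] = apply_op(ops[idx], nums[pos], nums[pos + 1])
            del nums[pos + 1]
            del pending[pos]
        best = nums[0] if best is None else max(best, nums[0])
    return best

print(max_value_brute_force([1, 2, 3, 4, 5], ['+', '-', '*', '-']))  # prints 6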
PRACTICE QUIZ • 10 MIN
Reconstructing a Solution
In this last video of this lesson, we show a method of reconstructing an actual
solution from two tables computed by our dynamic programming algorithm.
Play video starting at 13 seconds and follow transcript0:13
Okay, here on this slide we see two tables, m and capital M,
computed by our dynamic programming
algorithm, which contain the minimal and maximal values, respectively, for
all possible subexpressions of our initial expression.
Play video starting at 30 seconds and follow transcript0:30
Let me first put in this for
all the rows and columns of these two matrices,
as well as numbers for our initial digits.
Play video starting at 49 seconds and follow transcript0:49
Well, in particular, we see by reading the contents of this cell,
capital M of (1,6), that the maximal value of our initial expression is equal to 200,
and our goal is to unwind the whole solution, I mean,
the parenthesizing of the initial expression, from these two tables.
So our first goal on this way is to understand from which two
subexpressions of the initial expression the value 200 was computed.
Play video starting at 1 minute 22 seconds and follow transcript1:22
Well, let's see, when computing the maximal value for
subexpression (1,6), we tried all possible splittings
of the expression (1,6) into two subexpressions.
Well, let's just go through all of them.
The first possibility is to split it into two subexpressions (1,1),
which corresponds just to the first digit which
is just 5, and subexpression (2,6),
with a minus sign between them, right.
So for both these two subexpressions we already know minimal values and
maximal values.
Well, let me mark them.
So this is the minimal value for the subexpression (1,1).
This is the maximal value for subexpression (1,1).
For (2,6), this is the minimal value,
-195, and this is a maximal value, 75.
So we would like to maximize this subexpression
one minus subexpression two, which means that we would like the first subexpression
to be as large as possible and the second subexpression to be as small as possible.
Well, this means that we need to
try to take the maximal value of the first subexpression which is five and
the minimal value of the second subexpression which is -195.
Well, we see that in this case,
5 minus -195 is the same as 5 plus 195,
which equals exactly 200, right,
which allows us to conclude, actually,
that the value 200 can be obtained as follows.
So, we subtract the minimum value which is -195
over the second subexpression from 5, right.
So we restored the last operation in an optimal
parenthesizing of the initial expression.
However, we still need to find out how to obtain
-195 out of the second subexpression.
Well, let's do this.
Play video starting at 3 minutes 52 seconds and follow transcript3:52
Okay, so we need to find how the minimum value
of the subexpression (2,6) was obtained.
Play video starting at 4 minutes 2 seconds and follow transcript4:02
Well, there are several possible splittings, once again,
of the subexpression (2,6) into two smaller sub-subexpressions.
The first of them is to split (2,6) into (2,2),
which just corresponds to the digit 8 plus (3,6).
Well, in this case, we would like the value to be as small as possible and
our sign is plus in this case, which means that we would like the value of
subexpression (2,2) to be as small as possible and
the value of subexpression (3,6) also to be as small as possible.
And you already know these values, they are in our tables,
so the minimal value of subexpression (2,2) is 8,
while the minimum value of subexpression (3, 6) is minus 91, right.
So we see that the sum of these two values is not equal to -195,
right, which means that plus is not the last operation
in the optimal parenthesizing that gives the minimum
value of subexpression (2, 6), right.
So let's check the next one.
Another possibility to split the subexpression (2, 6) is the following.
We split it into subexpression (2,
3) times subexpression (4, 6), right.
So once again, we would like to find the minimum value of subexpression (2, 6).
Well, let's see just all possibilities.
The minimum value of subexpression (2, 3) is 15.
Play video starting at 5 minutes 52 seconds and follow transcript5:52
Its maximal value is also 15.
As to subexpression (4,6), its minimum value is -13.
Its maximal value is 5.
Play video starting at 6 minutes 4 seconds and follow transcript6:04
And we would like the product of these two values to be as small as possible.
Well, it is not difficult to see that if we take 15,
which is the minimum value of subexpression (2,3),
and multiply it by the minimum value of subexpression (4,6),
which is -13, then we get exactly -195.
And this, in turn, allows us to get -195
from the subexpression (2,6).
We can do it as follows.
We first compute the sum of 8 and 7.
This gives us 15.
And then we multiply it by the result of the second subexpression.
Play video starting at 6 minutes 50 seconds and follow transcript6:50
Well, now it remains to find out how to get -13 out of the subexpression (4,6),
but in this small example, it is already easy to get -13.
Well, we just first compute the sum of 8 and 9 and
then subtract it from 4, right.
So this way we reconstructed the whole solution, I mean,
an optimal parenthesizing, or an optimal ordering, of our arithmetic operations,
leading to this value, to this maximal value, 200.
Let's just check it once again that our parenthesizing leads to the value 200,
indeed.
So we first compute the sum of 8 and 9.
This gives us 17.
We then subtract 17 from 4.
And this gives us -13.
We then compute the sum of 8 and 7.
This gives us 15.
We multiply 15 by -13.
It gives us -195, and, finally,
we subtract this number from 5, and we get 200, indeed.
So we reconstructed the whole solution.
In general, I mean for an expression consisting of n digits and
n minus 1 operations, the corresponding algorithm makes roughly a quadratic number
of steps, because it needs to reconstruct n minus one operations, I mean,
an order of n minus one operations, going from the last one to the first one.
And for each operation, it potentially needs to go through all possible
splittings into two subexpressions, and this number is at most n.
So the running time is big-O of n times n, which is big-O of n squared.
And this technique is quite general.
It applies in many cases in dynamic programming algorithms.
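To make the backtracking concrete, here is a self-contained Python sketch (my own illustration; it assumes the standard min-and-max recurrence from the earlier videos of this lesson, which is only summarized in this one, and all function names are mine). It fills the two tables for the expression 5-8+7*4-8+9 and then unwinds a parenthesizing of maximal value, as in this video.

def apply_op(op, a, b):
    if op == '+': return a + b
    if op == '-': return a - b
    return a * b  # op == '*'

def min_max_tables(digits, ops):
    n = len(digits)
    m = [[0] * n for _ in range(n)]   # m[i][j]: minimal value of the subexpression i..j
    M = [[0] * n for _ in range(n)]   # M[i][j]: maximal value of the subexpression i..j
    for i in range(n):
        m[i][i] = M[i][i] = digits[i]
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            lo = hi = None
            for k in range(i, j):                 # split between positions k and k + 1
                for a in (m[i][k], M[i][k]):
                    for b in (m[k + 1][j], M[k + 1][j]):
                        v = apply_op(ops[k], a, b)
                        lo = v if lo is None else min(lo, v)
                        hi = v if hi is None else max(hi, v)
            m[i][j], M[i][j] = lo, hi
    return m, M

def reconstruct(digits, ops, m, M, i, j, target):
    # Return a parenthesized string for the subexpression i..j evaluating to target.
    if i == j:
        return str(digits[i])
    for k in range(i, j):                         # find a split that reproduces target
        for a in (m[i][k], M[i][k]):
            for b in (m[k + 1][j], M[k + 1][j]):
                if apply_op(ops[k], a, b) == target:
                    left = reconstruct(digits, ops, m, M, i, k, a)
                    right = reconstruct(digits, ops, m, M, k + 1, j, b)
                    return '(' + left + ops[k] + right + ')'

digits, ops = [5, 8, 7, 4, 8, 9], ['-', '+', '*', '-', '+']
m, M = min_max_tables(digits, ops)
print(M[0][5])                                        # prints 200
print(reconstruct(digits, ops, m, M, 0, 5, M[0][5]))  # e.g. (5-((8+7)*(4-(8+9))))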
Programming assignment