BHN Kuliah 14 - Techniques For The Design of Algorithms
Algorithmic Paradigms
Over the next few lectures we shall examine the following paradigms:
Divide-and-Conquer
Dynamic Programming
Greedy Algorithms
Backtracking and Graph Searching
Questions
What counts as a `basic operation'? Typically:
An assignment
A comparison between two variables
An arithmetic operation between two variables.
The worst-case input is that input for which the most basic operations are
performed.
Simple Example:
  n := 5;
  loop
    get(m);
    n := n - 1;
  until (m = 0 or n = 0)
Worst-case: 5 iterations
Usually we are not concerned with the number of steps for a single fixed
case but wish to estimate the running time in terms of the `input size'.
  get(n);
  loop
    get(m);
    n := n - 1;
  until (m = 0 or n = 0)
Worst-case: n iterations
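To make the counting concrete, here is a minimal Python sketch (the function and its inputs are illustrative, not part of the original notes) that simulates the loop and counts its iterations; the worst case, in which every m read is non-zero, gives exactly n iterations:

def loop_iterations(n, ms):
    # ms stands in for the successive values read by get(m)
    iterations = 0
    for m in ms:
        n = n - 1
        iterations = iterations + 1
        if m == 0 or n == 0:
            break
    return iterations

assert loop_iterations(5, [1, 1, 1, 1, 1]) == 5   # worst case: 5 iterations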
Sorting:
n = the number of items to be sorted.
Graph `searching':
n = the number of nodes in the graph or the number of edges in the graph.
Iteration: the cost of a loop is the number of iterations multiplied by the cost
of the loop body.
Conditional: the cost of an if statement is the cost of the test plus the cost of
the more expensive branch.
Asymptotic Performance
We are usually concerned with the growth in running time as the input size
increases, e.g.
n        P          Q           R
1        1          5           100
10       1024       500         1,000
100      2^100      50,000      10,000
1000     2^1000     5 million   100,000
If each is run on a machine that executes one million (10^6) operations per
second
n        P                     Q              R
1        1 microsec            5 microsec     100 microsec
10       ~1 millisec           0.5 millisec   1 millisec
100      ~4*10^16 years        0.05 secs      0.01 secs
1000     ~3*10^287 years       5 secs         0.1 secs
Thus, the growth of run-time in terms of n (2^n; n^2; n) is more significant
than the exact constant factors (1; 5; 100).
`O'-notation
Let f : N -> R and g : N -> R. Then f( n ) = O( g(n) ) means that there are
values, n0 and c, such that f( n ) <= c * g( n ) whenever n >= n0.
Run-time O( n^2 )
Examples
OM-notation
`O'-notation expresses an upper bound on run-time growth; to express lower
bounds, the corresponding OM-notation can be used.
Again let f : N -> R and g : N -> R. Then f( n ) = OM( g(n) ) means that there
are values, n0 and c, such that f( n ) >= c * g( n ) whenever n >= n0.
THETA-notation
f( n ) = THETA( g(n) ) means that both f( n ) = O( g(n) ) and f( n ) = OM( g(n) ).
If f(n) = O(g(n)) then:
f(n)+g(n) = O(g(n)) ; f(n)+g(n) = OM(g(n))
f(n)*g(n) = O(g(n)^2) ; f(n)*g(n) = OM(f(n)^2)
Examples
With f(n) = 10n and g(n) = 2n^2, so that f(n) = O(g(n)):
10n+2n^2 = O(n^2)
10n+2n^2 = OM(n^2)
(10n)*(2n^2) = O(n^4)
(10n)*(2n^2) = OM(n^2)
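To verify the first of these directly from the definition: take c = 3 and n0 = 10; whenever n >= 10 we have 10n <= n^2, and so 10n + 2n^2 <= n^2 + 2n^2 = 3n^2 = c * n^2. Likewise 10n + 2n^2 >= 2n^2 for every n >= 1, so taking c = 2 and n0 = 1 establishes 10n + 2n^2 = OM(n^2).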
Divide-and-Conquer
Suppose our original algorithm, alpha, with run-time T(alpha)( n ) = cn^2, is
used to carry out the `solves these sub-instances' step 2 of a new algorithm,
beta, which splits an instance into 3 sub-instances each of size n/2 and uses
dn steps to split the instance and combine the sub-solutions. Let T(beta)( n )
denote the run-time of beta. Then,
T(beta)( n ) = 3 T(alpha)( n/2 ) + dn
But T(alpha)( n/2 ) = c( n/2 )^2 = (cn^2)/4, so
T(beta)( n ) = 3(cn^2)/4 + dn
= (3/4)(cn^2) + dn
So if dn < (cn^2)/4 (i.e. d < cn/4) then beta is faster than alpha
In particular for all large enough n, (n > 4d/c = Constant), beta is faster than
alpha.
This realisation of beta improves upon alpha by just a constant factor. But if
the problem size, n, is large enough then
n > 4d/c
n/2 > 4d/c
...
n/2^i > 4d/c
which suggests that using beta instead of alpha for the `solves these' stage
repeatedly until the sub-sub-sub..sub-instances are of size n0 <= (4d/c) will
yield a still faster algorithm.
The run-time, T(gamma)( n ), of the resulting algorithm gamma satisfies:
T(gamma)( n ) = cn^2                       if n <= n0
              = 3 T(gamma)( n/2 ) + dn     otherwise
We shall show how relations of this form can be estimated later in the
course. With these methods it can be shown that T(gamma)( n ) = O( n^(log2 3) )
= O( n^1.59 ), which is asymptotically faster than the O( n^2 ) run-time of alpha.
The improvement that results from applying algorithm gamma is due to the
fact that it maximises the savings achieved by beta. The design of gamma
involves four choices:
(I) The threshold input size, n0, below which the problem size is not sub-divided.
(II) The size of sub-instances into which an instance is split.
(III) The number of such sub-instances.
(IV) The algorithm used to combine sub-solutions.
In (II) it is more usual to consider the ratio of initial problem size to sub-
instance size. In our example this was 2. The threshold in (I) is sometimes
called the (recursive) base value. In summary, the generic form of a divide-
and-conquer algorithm is:
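The pseudocode for this generic form does not survive in these notes; the following Python sketch (the helper parameters size, solve_directly, split and combine are illustrative names) captures the structure just described, together with a toy use of it:

def divide_and_conquer(instance, n0, size, solve_directly, split, combine):
    # (I) instances no larger than the base value n0 are solved directly
    if size(instance) <= n0:
        return solve_directly(instance)
    # (II)/(III) split the instance into sub-instances
    subs = split(instance)
    solutions = [divide_and_conquer(s, n0, size, solve_directly, split, combine)
                 for s in subs]
    # (IV) combine the sub-solutions into a solution for the whole instance
    return combine(solutions)

# Toy example: summing a list by repeatedly halving it.
total = divide_and_conquer(list(range(10)), 1, len, sum,
                           lambda xs: (xs[:len(xs)//2], xs[len(xs)//2:]), sum)
assert total == 45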
Recursive Procedures
Example 1:
Binary Search
Given a name and the value n, the problem is to find the number associated
with the name in a directory of n names.
We assume that any given input name actually does occur in the directory,
in order to make the exposition easier.
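The procedure itself is not reproduced in these notes; a minimal Python sketch (the directory representation is illustrative) of the divide-and-conquer lookup is:

def lookup(directory, name, lo, hi):
    # directory: a list of (name, number) pairs sorted by name;
    # as in the text, the given name is assumed to occur.
    mid = (lo + hi) // 2
    if directory[mid][0] == name:
        return directory[mid][1]
    elif directory[mid][0] < name:
        return lookup(directory, name, mid + 1, hi)   # discard the lower half
    else:
        return lookup(directory, name, lo, mid - 1)   # discard the upper half

book = [("ada", 4431), ("bob", 1298), ("eve", 7713)]
assert lookup(book, "eve", 0, len(book) - 1) == 7713

Each call discards half of the remaining directory, so T(n) = T(n/2) + c, giving O(log n) steps.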
Example 2
Closest Pair
Input: A set P = { p(1), p(2), ..., p(n) } of n points in the plane, where p(i)
has coordinates ( x(i), y(i) ).
Output: The distance between the two closest points of P.
Note: The distance DELTA( i, j ) between p(i) and p(j) is defined by the
expression:
DELTA( i, j ) = sqrt( (x(i) - x(j))^2 + (y(i) - y(j))^2 )
For each i, x(i) < = x(i+1), i.e. the points are ordered by increasing x from
left to right.
Consider drawing a vertical line (L) through the set of points P so that half
of the points in P lie to the left of L and half lie to the right of L.
Recursively find the distance DELTA(L) between the closest pair of points to
the left of L and the distance DELTA(R) between the closest pair to the right
of L, and let DELTA = minimum( DELTA(L), DELTA(R) ).
This last step is the `combine' stage of the Divide-and-Conquer algorithm.
If there are points p(l) and p(r) whose distance apart is less than DELTA then
it must be the case that
(1) The x-coordinates of p(l) and p(r) differ by at most DELTA.
(2) The y-coordinates of p(l) and p(r) differ by at most DELTA.
Call the set of points found in (1) and (2) P-strip, and sort the s points in this
set in order of increasing y-coordinate, letting ( q(1), q(2), ..., q(s) ) denote
the sorted set of points.
Then the combine stage of the algorithm consists of two nested for loops:
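The two loops are not reproduced in these notes; a Python sketch of how this stage might look (assuming delta = minimum(DELTA(L), DELTA(R)) and q is P-strip sorted by increasing y-coordinate) is:

from math import dist   # Euclidean distance, Python 3.8+

def combine(q, delta):
    best = delta
    for i in range(len(q)):
        for j in range(i + 1, len(q)):
            if q[j][1] - q[i][1] >= best:
                break          # every later q(j) is even further away in y
            best = min(best, dist(q[i], q[j]))
    return best

The early break is what keeps this stage fast: only points within best of q(i) in the y direction are ever compared against it.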
Example 3
Integer Multiplication
Note: The algorithm below works for any number base, e.g. binary, decimal,
hexadecimal, etc. We use decimal simply for convenience.
Each of x and y can be split into 2 (n/2)-digit numbers:
x = a * 10^(n/2) + b and y = c * 10^(n/2) + d
Thus the expression for the multiplication of x and y in terms of the numbers
a, b, c, and d tells us that:
x * y = (a*c) * 10^n + (a*d + b*c) * 10^(n/2) + b*d
Moderately difficult: How many steps does the resulting algorithm take to
multiply two n-digit numbers?
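As a hedged illustration (the function below is a sketch, not the notes' own code), the splitting scheme can be implemented in Python; it uses the well-known refinement in which a*d + b*c is recovered from the single product (a+b)*(c+d) minus a*c and b*d, so that only 3 sub-instances of size n/2 are multiplied, matching the recurrence analysed earlier:

def multiply(x, y, n):
    # x, y: non-negative integers of at most n digits; n a power of 2
    if n == 1:
        return x * y                          # single-digit base case
    half = 10 ** (n // 2)
    a, b = divmod(x, half)                    # x = a*10^(n/2) + b
    c, d = divmod(y, half)                    # y = c*10^(n/2) + d
    ac = multiply(a, c, n // 2)
    bd = multiply(b, d, n // 2)
    ad_plus_bc = multiply(a + b, c + d, n // 2) - ac - bd
    return ac * half * half + ad_plus_bc * half + bd

assert multiply(1234, 5678, 4) == 1234 * 5678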
Performance Analysis
It was observed earlier that one reason for examining algorithmic paradigms
was the fact that their running time could often be precisely determined.
In the case when alpha and beta are both constant (as in all the examples we
have given) there is a general method that can be used to solve such
recurrence relations in order to obtain an asymptotic bound for the running
time Tp( n ).
In general, for a recurrence of the form Tp( n ) = alpha Tp( n/beta ) + O( n^k ):
Tp( n ) = O( n^k )                  if alpha < beta^k
Tp( n ) = O( n^k log n )            if alpha = beta^k
Tp( n ) = O( n^(log_beta alpha) )   if alpha > beta^k
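Applying this to the earlier recurrence for gamma, T(gamma)( n ) = 3 T(gamma)( n/2 ) + dn, we have alpha = 3, beta = 2 and k = 1; since 3 > 2^1 the third case applies, giving T(gamma)( n ) = O( n^(log2 3) ), the bound quoted earlier.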
Dynamic Programming
This paradigm is typically used to design algorithms for solving Optimisation Problems.
Dynamic Programming is a
Bottom-Up Technique
in which the smallest sub-instances are explicitly solved first and the results
of these used to construct solutions to progressively larger sub-instances.
In contrast, Divide-and-Conquer is a
Top-Down Technique
which logically progresses from the initial instance down to the smallest
sub-instances via intermediate sub-instances. We can illustrate these points
by considering the problem of calculating the Binomial Coefficient, "n
choose k", i.e.
C( n, k ) = n! / ( k! (n-k)! )
which satisfies the relationship
C( n, k ) = C( n-1, k-1 ) + C( n-1, k ), with C( n, 0 ) = C( n, n ) = 1.
Using this relationship, a rather crude Divide-and-Conquer solution to the
problem of calculating the Binomial Coefficient `n choose k' would be:
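The pseudocode for this crude solution is not reproduced in these notes; a direct Python rendering of the relationship is:

def bc(n, k):
    # `n choose k' straight from the recurrence: simple, but it recomputes
    # the same small sub-instances an enormous number of times
    if k == 0 or k == n:
        return 1
    return bc(n - 1, k - 1) + bc(n - 1, k)

assert bc(5, 2) == 10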
It should be noted that since the coefficient `i choose j' requires only the
values of `i-1 choose j-1' and `i-1 choose j', computing the table entries in
the order of increasing i+j ensures that the table entries needed for `i choose
j' have already been calculated, i.e.
function bin_coeff( n : integer;
                    k : integer )
                    return integer is
  bc : table;      -- an (n+1) x (n+1) table whose entries are initially 0
  i, j : integer;
  sum : integer;
begin
  if k = 0 or k = n then
    return 1;                              -- trivial cases
  end if;
  for i in 0..n loop
    bc(i,0) := 1;                          -- base cases: `i choose 0' = 1
  end loop;
  bc(1,1) := 1;
  sum := 3; i := 2; j := 1;                -- first diagonal still to be filled: i+j = 3
  loop
    bc(i,j) := bc(i-1,j-1) + bc(i-1,j);
    exit when (i = n) and (j = k);
    if j + 1 <= i - 1 then
      i := i-1; j := j+1;                  -- next entry on the diagonal i+j = sum
    else
      sum := sum + 1;                      -- diagonal complete: start the next one
      if sum <= n then
        i := sum-1; j := 1;
      else
        i := n; j := sum-n;
      end if;
    end if;
  end loop;
  return bc(n,k);
end bin_coeff;
The fragment
  sum := sum + 1;
  if sum <= n then
    i := sum-1; j := 1;
  else
    i := n; j := sum-n;
  end if;
is invoked when all the table entries `i choose j', for which i+j equals the
current value of sum, have been found. The if statement increments the value
of sum and sets up the new values of i and j.
Now consider the differences between the two methods: The Divide-and-
Conquer approach recomputes values, such as "2 choose 1", a very large
number of times, particularly if n is large and k depends on n, i.e. k is not a
constant.
Despite the fact that the algorithm description is quite simple (it is just a
direct implementation of the relationship given) it is completely infeasible
as a practical algorithm.
The input is a directed graph G( V, E ) with nodes
V = { 1, 2 ,..., n }
and edges E a subset of VxV. Each edge in E has associated with it a non-
negative length. The problem is to compute the matrix D whose (i,j) entry is
the length of the shortest path from node i to node j. The algorithm builds a
sequence of matrices D0, D1, ..., Dn.
For each k (with 1 <= k <= n), the (i, j) entry of Dk, denoted Dk( i,j ), will
contain the length of the shortest path from node i to node j when only the
nodes
{ 1, 2, 3 ,..., k }
may be used as intermediate nodes on the path.
Obviously Dn = D.
D0( i, j ) = 0             if i = j
           = Length(i,j)   if (i,j) is in E
           = infinite      if (i,j) not in E (and i /= j)
Now, suppose we have constructed Dk, for some k < n. Consider a shortest
path from i to j whose intermediate nodes are drawn from
{ 1, 2, 3 ,..., k, k+1 }
If this path does not pass through node k+1 then its length is Dk( i, j ); if it
does, its length is Dk( i, k+1 ) + Dk( k+1, j ). Hence:
D(k+1)( i, j ) = minimum( Dk( i, j ), Dk( i, k+1 ) + Dk( k+1, j ) )
In the implementation below, L denotes the matrix of edge lengths for the set
of edges in the graph G( V, E ).
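The implementation itself does not survive in these notes; a Python sketch of it (using 0-based node numbers, unlike the 1-based numbering of the text) is:

INF = float("inf")

def all_shortest_paths(L, n):
    # L[i][j]: edge length, 0 if i == j, INF if (i,j) is not an edge
    D = [row[:] for row in L]                 # D0 is a copy of L
    for k in range(n):                        # allow node k as an intermediate node
        for i in range(n):
            for j in range(n):
                D[i][j] = min(D[i][j], D[i][k] + D[k][j])
    return D

L = [[0, 3, INF],
     [INF, 0, 1],
     [2, INF, 0]]
assert all_shortest_paths(L, 3)[0][2] == 4    # path 0 -> 1 -> 2 has length 3 + 1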
The running time is O( n^3 ): O( n ) steps are used to compute each of the n^2
matrix entries.
Greedy Algorithms
This is another approach that is often used to design algorithms for solving
Optimisation Problems, in which some quantity is to be Minimised or
Maximised. At each stage the candidate x chosen from C is the one for which
SELECT( x ) is largest
or SELECT( x ) is smallest
according to whether the problem is a maximisation or a minimisation.
All Greedy Algorithms have exactly the same general form. A Greedy
Algorithm for a particular problem is specified by describing the predicates
`solution' and `feasible'; and the selection function `select'.
Consequently, Greedy Algorithms are often very easy to design for
optimisation problems.
function solution( S : candidate_set ) return boolean;
function feasible( S : candidate_set ) return boolean;
function select( C : candidate_set ) return candidate;
--***************************************************
function greedy( C : candidate_set ) return candidate_set is
  x : candidate;
  S : candidate_set;
begin
  S := {};
  while not solution( S ) and C /= {} loop
    x := select( C );
    C := C - {x};
    if feasible( S union {x} ) then
      S := S union { x };
    end if;
  end loop;
  if solution( S ) then
    return S;
  else
    return {};               -- the empty set: no solution was found
  end if;
end greedy;
We shall consider two examples: the Minimal Spanning Tree problem and
the Integer Knapsack problem. For the first of these, the algorithm always
returns an optimal solution.
The input is a connected graph G( V,E ) in which each edge has a non-negative
length. The output is a spanning tree, T( V, F ), of G( V,E ) such that the total
edge length is minimal amongst all the possible spanning trees of G( V,E ).
Note: An n-node tree, T is a connected n-node graph with exactly n-1 edges.
T( V,F ) is a spanning tree of G( V,E ) if and only if T is a tree and the edges
in F are a subset of the edges in E.
The algorithm may be viewed as dividing the set of nodes, V, into n parts or
components:
An edge is added to the set S if and only if it joins two nodes which belong
to different components; if an edge is added to S then the two components
containing its endpoints are coalesced into a single component.
In this way, the algorithm stops when there is just a single component
{ 1, 2 ,..., n }
remaining.
Iteration Edge Components
0 - {1}; {2}; {3}; {4}; {5}; {6}; {7}
1 {1,2} {1,2}; {3}; {4}; {5}; {6}; {7}
2 {2,3} {1,2,3}; {4}; {5}; {6}; {7}
3 {4,5} {1,2,3}; {4,5}; {6}; {7}
4 {6,7} {1,2,3}; {4,5}; {6,7}
5 {1,4} {1,2,3,4,5}; {6,7}
6 {2,5} Not included (adds cycle)
7 {4,7} {1,2,3,4,5,6,7}
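A minimal Python sketch of this component-merging process (this is Kruskal's algorithm; the find helper tracking components is a standard implementation detail, not from the notes):

def minimal_spanning_tree(n, edges):
    # edges: list of (length, u, v) with nodes numbered 1..n
    parent = list(range(n + 1))               # every node starts in its own component

    def find(x):                              # the component containing x
        while parent[x] != x:
            x = parent[x]
        return x

    S = []
    for length, u, v in sorted(edges):        # consider edges by increasing length
        ru, rv = find(u), find(v)
        if ru != rv:                          # endpoints in different components
            parent[ru] = rv                   # coalesce the two components
            S.append((u, v))
    return S

assert minimal_spanning_tree(3, [(1, 1, 2), (2, 2, 3), (3, 1, 3)]) == [(1, 2), (2, 3)]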
Question: How do we know that the resulting set of edges form a Minimal
Spanning Tree?
Since T is a tree, the graph H formed by adding the (new) edge e = {p,q} to T
must contain a cycle that includes e.
Now since p is in W and q is not in W there must be some edge, e' = {p',q'}
in H which is also part of this cycle and is such that p' is in W and q' is not in
W.
Now, by the choice of e, we know that length( e ) <= length( e' ).
Removing the edge e' from H gives a new spanning tree of G(V,E) whose total
length is no greater than that of T.
T is a minimal spanning tree, so either e and e' have the same length or this
case cannot occur. It follows that there is a minimal spanning tree containing
F union {e} and hence this set of edges is promising as claimed.
Let C be the component in which p lies. By the inductive hypothesis the set
S is promising. The Inductive Step now follows by invoking the Lemma,
with W = Set of nodes in C and F = S.
Integer Knapsack
Input: A set U of items, each item u having a positive integer size s( u ) and
a value v( u ), together with
A capacity K.
Output: A subset B of U such that the sum over u in B of s(u) does not
exceed K and the sum over u in B of v(u) is maximised.
No fast algorithm guaranteed to solve this problem has yet been discovered.
The selection function chooses the item ui in C for which the ratio
v( ui ) / s( ui )
is maximal.
These yield the following greedy algorithm which approximately solves the
integer knapsack problem.
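A Python sketch of this greedy approximation (the item representation is illustrative):

def greedy_knapsack(items, K):
    # items: list of (value, size) pairs; returns a feasible, but not
    # necessarily optimal, subset B of the items
    B, used = [], 0
    # select candidates in decreasing order of value per unit of size
    for value, size in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if used + size <= K:                  # feasibility test: the item still fits
            B.append((value, size))
            used = used + size
    return B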
A very simple example shows that the method can fail to deliver an optimal
solution. Let
K = 110
Semantic Nets
Hypertext
...
These requirements must be realised subject to the constraint that the search
process respects the structure of the graph.
Any new node examined must be adjacent to some node that has
previously been visited.
backtracking
In searching such a graph two outcomes are possible at each stage: (1) the
current partial solution can be extended to an unvisited adjacent node; or
(2) it cannot (a `dead end'). If (2) occurs then the search process is `backed-up'
until a node is reached from which a solution might still be found.
Simple Example
Knight's Tour
The implicit graph in this problem has n^2 nodes, one for each position the
Knight may occupy. There is an edge between two of these
nodes if the corresponding positions are `one move apart'. The sub-graph
that defines a solution is a cycle which contains each node of the implicit
graph.
Of course it is not necessary to construct this graph explicitly in order to
solve the problem. The algorithm below recursively searches the graph,
labeling each square (i.e. node) in the order in which it is visited. In this
algorithm:
procedure knight( board : in out board_type;   -- board_type: n x n array of integer
                  x, y : integer;              -- the square just moved to
                  move : integer;              -- the number of this move
                  ok : in out boolean) is
  w, z : integer;
begin
  if move = n*n + 1 then
    ok := ( (x,y) = (1,1) );                   -- a complete tour must return to the start
  elsif x < 1 or x > n or y < 1 or y > n or board(x,y) /= 0 then
    ok := false;                               -- off the board, or square already visited
  else
    board(x,y) := move;                        -- label the square with its visit order
    for each square (w,z) one Knight's move from (x,y) loop
      knight( board, w, z, move+1, ok );
      exit when ok;
    end loop;
    if not ok then
      board(x,y) := 0;                         -- dead end: backtrack and unlabel the square
    end if;
  end if;
end knight;
Depth-First Search
The Knight's Tour algorithm organises the search of the implicit graph using
a depth-first approach.
The labelling of a search tree prescribes the order in which the nodes of G
are to be scanned.
procedure depth_first_search( G(V,E) : graph;
                              v : node;
                              lambda : in out integer;
                              T : in out search_tree) is
begin
  label(v) := lambda;                      -- mark v with its visit number
  lambda := lambda + 1;
  for each w such that {v,w} is in E loop
    if label( w ) = 0 then                 -- w has not yet been visited
      add the edge {v,w} to T;
      depth_first_search( G(V,E), w, lambda, T );
    end if;
  end loop;
end depth_first_search;
--*******************************
-- Main Program Section
--*******************************
begin
  for each w in V loop
    label( w ) := 0;                       -- initially no node has been visited
  end loop;
  lambda := 1;
  depth_first_search( G(V,E), v, lambda, T );   -- v is the chosen root node
end;
If G( V,E ) is a directed graph then it is possible that not all of the nodes of
the graph are reachable from a single root node. To deal with this the
algorithm is modified by changing the `Main Program Section' to
begin
for each w in V loop
label( w ) := 0;
end loop;
lambda := 1;
for each v in V loop
if label( v ) = 0 then
depth_first_search ( G(V,E), v, lambda, T );
end if;
end loop;
end;
The running time of both algorithms on input G( V,E ) is O( |V| + |E| ) since
each edge of the graph is examined only once.
It may be noted that the recursive form given is a rather inefficient method
of implementing depth-first search for explicit graphs.
Iterative versions exist, e.g. the method of Hopcroft and Tarjan, which is
based on a 19th-century algorithm discovered by Tremaux. Depth-first
search has a large number of applications in solving other graph-based
problems, for example (a sketch of one such application follows the list):
Topological Sorting
Connectivity Testing
Planarity Testing
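As an illustration of the first of these (a sketch, not code from the notes): topological sorting of a directed acyclic graph falls straight out of depth-first search, since a node can be appended to the ordering once all of its out-neighbours have been fully explored, and the reversed finishing sequence is then a topological order:

def topological_sort(G):
    # G: dict mapping each node to the list of its out-neighbours (a DAG)
    order, visited = [], set()

    def dfs(v):
        visited.add(v)
        for w in G[v]:
            if w not in visited:
                dfs(w)
        order.append(v)                       # v finishes after all its descendants

    for v in G:
        if v not in visited:
            dfs(v)
    return order[::-1]                        # reverse order of finishing times

G = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
assert topological_sort(G).index("a") == 0    # 'a' precedes everything it reaches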
In contrast, a breadth-first search visits the nodes level by level, maintaining
the set of nodes at the current distance from the root (CurrentLevel) and the
set at the next distance (NextLevel). Its central loop has the form:
while CurrentLevel /= {} loop
  for each v in CurrentLevel loop
    for each w in Unmarked neighbours of v loop
      if label( w ) = 0 then
        label( w ) := lambda;
        lambda := lambda + 1;
        NextLevel := NextLevel union { w };
      end if;
    end loop;
  end loop;
  CurrentLevel := NextLevel;
  NextLevel := {};
end loop;