Greedy Algorithms 3
Greedy Algorithms 3
Greedy Algorithms 3
1
Summary
Greedy algorithms for
• The union- find problem
• Min cost spanning trees (Prim’s algorithm and Kruskal’s
algorithm)
2
The Union-find problem
Given a set {1, 2, …., n} initially partitioned into n disjoint
subsets, one member per subset, we want to perform the
following operations:
• Find(x): return the name of the subset that x is in
• Union(x, y) : combine the two subsets that x and y are in
3
A Data structure for Union-
Find
Use a tree:
4
Set 11 is {9, 10, 11, 12}
11
Set 2 is {2} 9
10
2
12 7
3 8 6 5
Set 7 is {4, 5, 7} 5
Set 1 is {1, 3, 6, 8}
11
9
10
7
2 12
1
4
3 8 6 5
6
P 1 2 3 4 5 6 7 8 9 10 11 12
Implementing the
Operations
• find(x): Follow the chain of pointers starting at P[x] to
the root of the tree containing x. return the set element
stored there.
• union (x, y): Follow the chains of pointers starting at P[x]
and P[y] to the roots of the trees containing x and y,
respectively. Make the root of one tree point to the root
of the other.
7
Example
11
9
10
7
Union {4, 12} :
12
4 k
8
P 1 2 3 4 5 6 7 8 9 10 11 12
Analysis
Running time proportional to number of layers in the tree. But this may
be Ω (n ) for an n-node tree.
IMPROVEMENT
Make the root of the smaller tree point to the root of the bigger
9
tree.
10
More Analysis
Claim: If this algorithm is used, every tree with n nodes has at
most ⌊log n⌋ + 1 layers.
11
T was made by unioning together two trees with less than n
nodes each.
m
n-m m n-m
Then, by the induction hypothesis, T has
max { depth(T1.) + 1, depth(T2)}
= max {⌊log m⌋ + 2, ⌊log(n – m)⌋ + 1}
≤ max {⌊log n/2⌋ + 2, ⌊log n ⌋ + 1}
= ⌊log n⌋ + 1
Layers.
Implementing detail: store count of nodes in root of each tree. 12
Update during union in O(1) extra time.
Conclusion: union and find operations take time O(log n).
It is possible to do better using path compression. When
traversing a path from leaf to root, make all nodes point to root.
Gives better amortized performance.
13
Spanning Trees
A graph S = (V, T) is a spanning tree of an undirected graph G = (V, E) if:
• S is a tree, that is, S is connected and has no cycles.
• TE
1 2 1 2
4 7 3 4 7 3
6 5 6 5
1 2 1 2
4 7 3 4 7 3
6 5 14
6 5
Spanning Forests
A graph S = (V, T) is a spanning forest of an undirected graph G = (V, E)
if:
• S is a forest, that is, S has no cycles.
• T E 1 2
1 2
4 7 3
4 7 3
6 5
6 5
1 2 1 2
4 7 3 4 7 3
6 5 15
6 5
Min Cost Spanning Tree
Given a labeled, undirected, connected graph G, find a spanning tree
for G of minimum cost.
20
1 2 20
23 15 1 2
1 23 15
4 1
36 4
4 7 3 36
9 4 7 3
25 16 3 9
25 16 3
28
6 5 28
17 6 5
17
Cost = 23 + 1 + 4 + 9 + 3 + 17 = 57
16
The Muddy City Problem
The residents of Muddy City are too cheap to pave
their streets. (After all, who likes to pay
takes?)However, after several years of record
rainfall they are tired of getting muddy feet. They
are still too miserly to pave all of the streets, so
they want to pave only enough streets to ensure
that they can travel from every intersection to
every other intersection on a paved route, and
they want to spend as little money as possible
doing it. (The residents of Muddy City don’t mind 17
walking a long way to save money.)
Solution: Create a graph with a node for each intersection, and
an edge for each street. Each edge is labeled with the cost of
paving the corresponding street. The cheapest solution is a min-
cost spanning tree for the graph.
An Observation About Trees
Proof:
1. If there were more than one path between u and v, then
there would be a cycle, which contradicts the definition of a
tree.
18
u v
2. Consider adding an edge (u, v) ɇ T to T. There must already be a
path from u to v (since a tree is connected). Therefore, adding an
edge from u to v creates a cycle.
u v u v
This cycle must be unique, since id adding (u, v) creates two cycles, then
there must have been a cycle before, which contradicts the definition of
a tree.
u v
u v
u v 19
Key Fact
Here is the principle behind MCST algorithms:
e e
v v
u u
Vi Vi w x
d
22
Therefore, there must be an edge d = (w, x) such that w є Vi and x
ɇ Vi.
Let c(e) denote the cost of e and c(d) the cost of d. By hypothesis,
c(e) ≤ c(d).
23
Corollary
Claim: Let G =(V, E) be a labeled, undirected graph, and S = (V, T) a
spanning forest for G.
1. F := ф
for j := 1 to n do
Vj := { j }
F := F U {(Vj , ф)}
2. while there is > 1 tree in F do
3. Pick an arbitrary tree Si = (Vi, Ti) from F
4. Let e = (u, v) be a min cost edge, where u є Vi , v є Vi, j ≠ i 25
5. Vi := Vi U Vj; Ti := Ti U Tj U {e}
Remove tree (Vj, Tj) from forest F
Prim’s Algorithm : Overview
Take i =1 in every iteration of the algorithm.
We need to keep track of:
• The set of edges in the spanning forest, T1
• The set of vertices V1
• The edges that lead out of the tree (V1, T1) , in a data structure that
enables us to choose the one of smallest cost
26
27
Prim’s Algorithm
Q is a priority queue od edges, ordered on cost.
1. T1 := ф; V1 := {1}; Q := empty
2. for each w є V such that (1, w) є E do
3. insert (Q, (1, w))
4. while |V1| < n do
5. e := deletemin (Q)
6. Suppose e = (u, v), where u є V1.
7. if v ɇ V1 then
8. T1 := T1 U {e}
9. V1 := V1 U {v}
10. for each w є V such that (v, w) є E do
11. insert (Q, (v, w)) 28
For Data Structures
• E : adjacency list
• V1: linked list
• T1 : linked list
• Q : heap
Analysis
• Line 1:O(1)
• Line 3:O(log e)
• Line 2-3:O(n log e)
• Line 5:O(log e)
• Line 6:O(1)
• Line 7:O(1)
• Line 8:O(1)
• Line 9:O(1)
• Line 4-9: O(e log e)
29
• Line 11: O(log e)
Total : O (e log e) = O ( e log n)
• Lines 4,10-11: O(e log e)
Kruskal’s Algorithm: Overview
At every iteration of the algorithm, take e = (u, v) to be the unused edge of
smallest cost, and Vi and V j to be vertex sets of the forest such that u є Vi and
v є Vj.
(note: F is a partition of V)
• The set of edges in the spanning forest, T
• The unused edges, in a data structure that enables us to choose the one of
smallest cost
30
31
Kruskal’s Algorithm
T is the set of edges in the forest.
F is the set of vertex – sets in the forest.
1. T : = ф; F := ф
2. for each vertex v є V do F := F U {{v}}
3. Sort edges in E in ascending order of cost
4. while |F| > 1 do
5. (u, v) := the next edge in sorted order
6. if find(u) ≠ find(v) then
7. union(u, v)
8. T := T U {(u, v)}
Analysis
• Line 1:O(1)
• Line 2:O(n)
• Line 3:O(e log e)=O(e log n)
• Line 5:O(1)
• Line 6:O(log n)
• Line 7:O(log n)
• Line 8:O(1)
• Line 4-8: O(e log n)
33
Total: O(e log n)
Questions
Would Prim’s algorithm do this?
3 5 3
5
4 2 4
2 1 2 12
2 12 4 4 5
1 5
6 6
10 1 10
1 9 8 9 11
8 11
7 7
6 7 6 7
3 3
5 3 3
5
2 4 2 4
1 2 12 1 2 12
4 5 4 5
6 6
1 10 1 10
8 9 11 8 9 11 34
7 7
6 7 6 7
3 3
Would Kruskal’s algorithm do this?
5 3
5 3
2 4
2 4 1 2 12
2 12 4 5
1 4 5 6
6 1 10
1 10 8 9 11
8 9 11
7
7 6 7
6 7 3
3
5 3
5 3
2 4
1 2 12 2 4
4 5 1 2 12 4 5
6
1 10 6
9 1 10
8 11 9
8 11 35
7
6 7 7
3 6 7
3
Could a min-cost spanning tree algorithm do this?
3 5 3
5
4 2 4
2 1 2 12
2 12 4 4 5
1 5
6 6
10 1 10
1 9 8 9 11
8 11
7 7
6 7 6 7
3 3
5 3 3
5
2 4 2 4
1 2 12 1 2 12
4 5 4 5
6 6
1 10 1 10
8 9 11 8 9 11 36
7 7
6 7 6 7
3 3