DisjointSet Slide

Download as pdf or txt
Download as pdf or txt
You are on page 1of 19

DISJOINT SET DATA STRUCTURE

Partha P. Chakrabarti & Aritra Hazra


Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
Disjoint-Set Data Structures: Applications
Minimum Spanning Tree of Graph (G)
Algorithm MST_Kruskal ( G = (V,E) ) {
A = { };
for each v in V do MAKE-SET(v);
for each edge e = (u, v) in E ordered by
increasing weight(u, v) do {
if FIND-SET(u) ≠ FIND-SET(v) then {
A = A + {(u, v)};
UNION(FIND-SET(u), FIND-SET(v));
}
}
return A;
}
Disjoint-Set Data-Type and Operations
•  Primary Operations:
–  MAKE-SET(x): create a new set containing only element x
–  FIND-SET(x): return a canonical element in the set containing x
–  UNION(x, y): replace the sets containing x and y with their union

•  Performance parameters:
–  m = number of calls to FIND-SET and UNION operations
disjoint sets =
–  n = number of elements = number of calls to MAKE-SET
connected components
•  Application: Dynamic connectivity over initially empty graph
–  ADD-NODE(u): add node u (1 MAKE-SET operation)
–  ADD-EDGE(u, v): add an edge between nodes u and v (1 UNION operation)
–  IS-CONNECTED(u, v): is there a path between u and v ? (2 FIND-SET operations)
Disjoint-Set Operations: Implementation (1)
Linked List Implementation Set A: {c, h ,e, b}
•  MAKE-SET(x): O(1)
–  need to create only one
node created with
appropriate pointers Set B: {f, g, d}
•  FIND-SET(x): O(n)
–  need to traverse entire
linked list to find x
Set (A U B): {c, h, e, b, f, g, d}
•  UNION(x,y): O(n)
–  need to point back all
back-pointers of second
list to head of first list
Disjoint-Set Operations: Implementation (2)
•  Array Representation UNION(3,5) or
–  Represent each set as tree of elements UNION(8,6) or
UNION(8,7)
–  Allocate an array of parent[] of length n
–  parent[i]=j (parent of element i is j)

/ /
0 0 •  Analysis of Operations:
–  Total zeros in array = Disjoint-sets
X X
–  FIND-SET(x): O(n) worst-case
–  UNION(x,y): O(n) worst-case
•  UNION(FIND-SET(x), FIND-SET(y))
•  O(n) due to FIND-SET operation
Suppress self-loops for root for brevity! Solution: Smart Union-Find Algorithms !!
Smart Disjoint-Set Operations: Union-by-Size
•  Union-by-Size UNION(x,y) {
–  Maintain a tree size r ß FIND-SET(x);
(number of nodes) for s ß FIND-SET(y);
each root node if(r == s) return r;
–  Link root of smaller else if(size[r] > size[s]) {
tree to root of larger parent[s] ß r;
tree (break tries size[r] = size[r] + size[s];
arbitrarily) return r;
}
FIND-SET(x) { else { UNION(3,5)
while(x is not parent) parent[r] ß s;
x ß parent[x]; size[s] = size[r] + size[s];
return x; return s;
} MAKE-SET(x) {
}
parent[x] ß 0;
}
size[x] ß 1;
return x;
}
Analysis of Union-by-Size Heuristic (1)
Property: Using union-by-size, for every root node r, we have size[r] ≥ 2height(r)
Proof: [ by induction on number of links ]
–  Base case: singleton tree has size 1 and height 0
–  Inductive hypothesis: assume true after first i links

–  Tree rooted at r changes only when a smaller (or


equal) size tree rooted at s is linked into r

–  Case 1. [ height(r) > height(s) ]


sizeʹ[r] > size[r] ≥ 2height(r) = 2heightʹ(r)
–  Case 2. [ height(r) ≤ height(s) ]
sizeʹ[r] = size[r] + size[s] ≥ 2 size[s] ≥ 2 x 2height(s)
= 2height(s) + 1 = 2heightʹ(r)
Analysis of Union-by-Size Heuristic (2)
•  Theorem: Using union-by-size, any UNION or FIND-SET operation takes O(log2 n)
time in the worst case, where n is the number of elements
•  Proof:
–  The running time of each operation is bounded by the tree height
–  Using union-by-size, a tree with n nodes can have height at most log2 n
–  By the previous property, the height is ≤ ⎣log2 n⎦

•  The UNION operation takes O(1) time except for its two calls to FIND-SET
–  FIND-SET required to find out the set representative (which is the root)

•  m number of UNION and FIND-SET operations takes a total of O(m log2 n) time
Smart Disjoint-Set Operations: Union-by-Rank
•  Union-by-Rank rank = height
–  Maintain an integer
rank for each node, UNION(x,y) {
initially 0 r ß FIND-SET(x);
s ß FIND-SET(y);
–  Link root of smaller
if (r == s) return r;
rank to root of larger
else if (rank[r] ≥ rank[s]) {
rank; if tie, increase
rank of larger root by 1 parent[s] ß r;
if(rank[r] == rank[s])
FIND-SET(x) { rank[r] = rank[r] + 1; UNION(3,5)
while(x is not parent) return r;
x ß parent[x]; }
return x; else {
} MAKE-SET(x) {
parent[r] ß s;
parent[x] ß 0;
return s;
rank[x] ß 0;
}
return x;
}
}
Analysis of Union-by-Rank Heuristic (1)
Property-1: If x is not a root node, then rank[x] < rank[parent[x]]
Proof: A node of rank k is created only by linking two roots of rank k – 1.

Property-2: If x is not a root node, then rank[x] will never change again
Proof: Rank changes only for roots; a non-root never becomes a root.

Property-3: If parent[x] changes,


then rank[parent[x]] strictly
increases.
Proof: The parent can change
only for a root, so before linking
parent[x] = 0. After x is linked
using union-by-rank to new root
r we have rank[r] > rank[x].
Analysis of Union-by-Rank Heuristic (2)
Property-4: Any root node of rank k
has ≥ 2k nodes in its tree
Proof: [ by induction on k ]
•  Base case: true for k = 0
•  Inductive hypothesis: assume true
for k – 1
•  A node of rank k is created only
by linking two roots of rank k – 1
•  By inductive hypothesis, each of
two sub-tree has ≥ 2k – 1 nodes
=> resulting tree has ≥ 2k nodes

Property-5: The highest rank of a node is ≤ ⎣log2 n⎦


Proof: Immediately concluded from Property-1 and Property-4
Analysis of Union-by-Rank Heuristic (3)
Property-6: For any integer k ≥ 0, there are ≤ n / 2k nodes with rank k
Proof:
•  Any root node of rank k has ≥ 2k descendants. [by Property-4]
•  Any non-root node of rank k has ≥ 2k descendants because:
§  it had this property just before it became a non-root [by Property-4]
§  its rank does not change once it became a non-root [by Property-2]
§  its set of descendants does not change once it became a non-root
•  Different nodes of rank k cannot have common descendants [by Property-1]
Theorem: Using union-by-rank, any
UNION or FIND-SET operation takes
O(log2 n) time in the worst case, where
n is the number of elements.
Proof: The running time of UNION and
FIND-SET is bounded by the tree
height ≤ ⎣log2 n⎦ [by Property-5]
Smart Disjoint-Set Operations: Path Compression
•  When finding the root r of the tree containing x, change the parent pointer of all
nodes along the path to point directly to r
Path Compression: Example

FIND-SET(x) {
if(x is not parent)
parent[x] ß FIND-SET(parent[x]);
return x;
}
Properties of Union-by-Rank + Path Compression (1)
Property-0: The tree roots, node ranks, and elements within a tree are the same with or
without path compression.
Property-1: If x is not a root node, then rank[x] < rank[parent[x]]
Proof: Path compression can make x point to only an ancestor of parent[x]
Property-2: If x is not a root node, then rank[x] will never change again
Property-3: If parent[x] changes, then rank[parent[x]] strictly increases.
Proof: Path compression doesn’t change any ranks, but it can change parents
If parent[x] doesn’t change during a path compression the inequality continues to hold
if parent[x] changes, then rank[parent[x]] strictly increases
Property-4: Any root node of rank k has ≥ 2k nodes in its tree
Property-5: The highest rank of a node is ≤ ⎣log2 n⎦
Property-6: For any integer k ≥ 0, there are ≤ n / 2k nodes with rank k
Properties of Union-by-Rank + Path Compression (2)
•  Definitions: Rank Groups

–  log* n = 0, when n ≤ 1 i times


Inverse Ackerman

= MIN { i ≥ 0 | log2 log2 … log2 n ≤1}, when n ≥ 2


Function

–  log* n = 0, when n ≤ 1 Recursive Definition


= 1 + log* (log2 n), otherwise
–  Ackerman Function, F(j) = 1, when j = 0
= 2F(j-1), when j ≥ 1
Property-7: The largest
group number is ≤ log* Property-8: Number of nodes in a particular group g is
(log2 n) = log* n – 1 given by, ng < n/F(g)
Proof: Since largest Proof: ng < ΣF(g)r=F(g-1) +1 n/2r < 2n/2F(g-1)+1 = n/2F(g-1) = n/F(g)
possible rank is ⎣log2 n⎦, [ since, n/2r + n/2r+1 + n/2r+2 + … + n/2r+k
hence the result < (n/2r) Σ∞0 (1/2k) = 2n/2r ]
Analysis of Union-by-Rank with Path Compression (1)
x
•  Case-1: If v is root (= x), a child of root or if parent[v] is
in a different rank group; then we charge ONE unit of v
time to FIND-SET operation
•  Case-2: If v ≠ x, and both v and parent[u] are in the parent[u]
same group, then we charge ONE unit of time to node v
u
•  Observation-1: Ranks of nodes in a path from u to x
increases monotonically path-compression
w.r.t. FIND-SET(u)
–  After x is found to be the root, we do path
compression x
–  If later on, x becomes a child of another node and v
& x are in different groups, no more node charges u v
on v in later FIND-SET operations
Analysis of Union-by-Rank with Path Compression (2)
•  Observation-2: If a node v is in group g (g > 0), v can be moved and charged at most
[F(g) – F(g-1)] times before it acquires a parent in a higher group.

•  Complexity Analysis:
–  Time Complexity = (Number of nodes in group g) x (Movement charges across
groups) x (Movement charges with groups) = (n/F(g)) x (log* n) x [F(g) – F(g-1)]
≤ n log* n [ since, (n/F(g))x[F(g) – F(g-1)] ≤ n ]

•  Theorem: The time complexity required to process m UNION and FIND-SET


operations using union-by-rank with path-compression heuristic is O(m log* n) in
the worst case
–  which may be also said as O(m), as log*n ≤ 5 practically
(as otherwise n is more than the number of atoms in universe!!)
Thank you

You might also like