Path Compression

Download as pdf or txt
Download as pdf or txt
You are on page 1of 3

Algorithms Course Notes

Union-Find with Path Compression


Ian Parberry∗
Spring 2003

Path Compression After Path Compression

When traversing a path from a node to the root,


either:
• During a find operation.
• During a union operation.

Traverse the path once and remember the root.


Then traverse the path a second time and make all
nodes on the path point to the root.

The intuition is that this makes a shorter tree. 42

P[42]
Before Path Compression

Two Useful Functions

Define F (n) as follows:


F (0) = 1
F (n) = 2F (n−1) for all n > 0.

F grows extremely fast.

n F (n)
0 1
1 2
2 4
42 3 16
4 65536
5 2 × 1019728
P[42]
The function G is essentially the inverse of F . More
formally, G(n) is the smallest integer k such that
∗ Copyright c Ian Parberry, 1992–2003.
 F (k) ≥ n. G(n) ≤ 5 for all “practical” values of n.

1
With path compression, a series of Θ(n) union and If w is made a descendant of v at some time during
find operations can be carried out in time O(n · the execution of σ with path compression, then the
G(n)). That is, the algorithm takes O(G(n)) amor- rank of w is less than the rank of v.
tized time.
Proof: If w is made a descendant of v with path
We will measure time by counting one time unit for compression, then it’s also a descendant of v without
each node visited when traversing paths to the root path compression (just lower down). This means the
during union and find operations. rank of w is less than the rank of v.

The Rank of a Vertex Rank Groups

Let σ be a series of union and find operations.


Partition ranks into groups as follows: Rank r is in
The rank of a vertex v with respect to σ is defined rank group G(r).
to be the height of v in the resulting forest after σ
has been executed without path compression. The largest rank is log n, so the largest rank group
is G(log n) = G(n) − 1.
Reminder: the height of a vertex is the maximum
number of hops from a leaf. Book-Keeping Trick

Lemma 1
We will charge one unit of time each time we visit a
vertex. At the end of σ we will sum up all of these
If path compression is not used, then a tree of height units.
h will have at least 2h vertices.
(Note: Ignore the fact that we visit each vertex twice
Proof: This is the same argument that we made during path-compression.)
in the earlier lecture: the intuition is that making
the root of the smaller tree point to the root of the The trick is that we will use a funky book-keeping
larger tree makes short, wide trees that look a little scheme. We charge some of the units of time to
like complete binary trees at least in terms of their the vertex, and some to the operation that caused
height/size ratio. the visit. At the end we add up all the time units
charged to both operations and vertices.

Lemma 2
The Rules
There are at most n/2r vertices of rank r.
Let v be a vertex. Each time we pass through it we
Proof: By Lemma 1, each vertex of rank r has at
will charge as follows:
least 2r descendants. Since the descendants of ver-
tices at the same height must be distinct, having 1. If the parent of v is in a different rank group
more than n/2r vertices of rank r would give us more than v or v is the root, then charge one unit of
than 2r · n/2r = n vertices in total, which is impos- time to the operation.
sible.
2. If both v and its parent are in the same rank
(Note: The only way that two vertices can share group, then charge one unit of time to v.
descendants is if one is an ancestor of the other —
which would give them different heights.)

Corollary: No vertex can have rank larger than Charges Under Rule 1
log n.
By Lemma 3, vertices going up a path are monoton-
Lemma 3
ically increasing in rank. There are at most G(n)

2
different rank groups, 0 through G(n) − 1. There- total of n · G(n) by Rule 1.
fore we cross a rank group boundary at most G(n)
times. There are n vertices charged a total of n · G(n) by
Rule 2.
We charge one time unit to the operation each time
we cross a rank group boundary. So no operation is Grand total of O(n · G(n)) for an amortized time of
charged more than G(n) time units under Rule 1. O(G(n)) per operation.

Charges Under Rule 2

If Rule 2 applies, v will be made the child of a parent


with higher rank. If v is in rank group g, this can
happen at most F (g) − F (g − 1) times (the number
of ranks in rank group g) before it acquires a parent
in a higher rank group. All subsequent charges are
made under Rule 1 from this point.

Therefore each vertex in rank group g is charged


at most F (g) − F (g − 1) ≤ F (g). But how many
vertices are in rank group g? If we can find that, we
can compute the total charges to the vertices.

Let N (g) be the number of vertices in rank group


g > 0. By Lemma 2:
F (g)
 n
N (g) ≤
2r
r=F (g−1)+1
n 1 1
≤ · (1 + + + · · ·)
2F (g−1)+1 2 4
n

2F (g−1)
n

F (g)

So the total sum charged to the vertices is at most:


G(n)−1

F (g) · N (g)
g=0
G(n)−1

≤ F (g) · n/F (g)
g=0
G(n)−1

= n
g=0
= n · G(n).

Analysis

There are O(n) operations, each charged G(n) for a

You might also like