Lec3 Algo2021Spring
Divide and conquer is a common algorithm design technique that recursively divides a problem instance
into multiple sub-instances of the same problem, until the sub-instances are small enough to be solved
directly. The idea behind divide and conquer is similar to that of mathematical induction; in fact, mathematical
induction is normally applied to prove the correctness of a divide-and-conquer-based algorithm. Finding
clever ways to divide the problem instance and to merge the solutions of the sub-instances is usually
the key to making a divide-and-conquer-based algorithm run fast. In this lecture, we will consider some
classical problems that can typically be solved by divide and conquer.
1 Sorting an Array
Problem 1. Given an array a[1, . . . , n] of n integers, sort the array in ascending order.
We introduce a divide and conquer algorithm, the merge sort. Merge sort divides the array a[1, . . . , n] into
two arrays, a[1, . . . , ⌊n/2⌋] and a[⌊n/2⌋ + 1, . . . , n], sorts the two arrays recursively, and then merges the two
sorted arrays. The algorithm is described in Algorithm 1.
It remains to define the function Merge that merges two sorted arrays b and c. This can be done easily, as
described in Algorithm 2.
We first analyze the time complexity of computing the function Merge(b[1, . . . , m], c[1, . . . , n]). Notice
that exactly one element is added to a at each while-loop iteration, and each element in b and c is added
exactly once. It is easy to see that the time complexity here is O(m + n).
Algorithm 2 The Merge function
Merge(b[1, . . . , m], c[1, . . . , n]):  // b and c are sorted in ascending order
 1: initialize array a with length 0
 2: initialize i ← 1 and j ← 1
 3: while i ≤ m and j ≤ n:
 4:     if b[i] ≤ c[j]:
 5:         a.push_back(b[i])
 6:         i ← i + 1
 7:     else:
 8:         a.push_back(c[j])
 9:         j ← j + 1
10:     endif
11: endwhile
12: if i > m, append c[j, . . . , n] to the end of a
13: if j > n, append b[i, . . . , m] to the end of a
14: return a
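As an illustration, here is a short Python sketch of the merge sort following Algorithms 1 and 2; the function names and the use of 0-indexed Python lists (rather than the 1-indexed arrays in the pseudocode) are our own choices.

```python
def merge(b, c):
    """Merge two lists b and c, each sorted in ascending order, in O(len(b) + len(c)) time."""
    a = []
    i, j = 0, 0
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            a.append(b[i])
            i += 1
        else:
            a.append(c[j])
            j += 1
    a.extend(b[i:])  # at most one of these two extensions is non-empty
    a.extend(c[j:])
    return a


def merge_sort(a):
    """Sort the list a in ascending order in O(n log n) time."""
    n = len(a)
    if n <= 1:
        return a[:]
    mid = n // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))


print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]
```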
To analyze the time complexity of MergeSort, let T(n) be the complexity of MergeSort when the length
of the input array is n. To simplify the analysis, we assume n is an integer power of 2 (i.e., n = 2^k for some
k ∈ Z+) without loss of generality¹. The time complexity for sorting the first half of the array a[1, . . . , n/2]
is then T(n/2), and the time complexity for sorting the second half of the array a[n/2 + 1, . . . , n] is also T(n/2).
Finally, it takes O(n/2 + n/2) = O(n) time to merge the two arrays, as we have analyzed before. Therefore, we
obtain the following recurrence relation on T:

    T(n) = 2T(n/2) + O(n), where T(1) = 1.
¹ If n is not an integer power of 2, we can find the largest integer s in the array, add integers that are even larger than s
to the array to make the length of the array an integer power of 2, and then remove the newly added integers after sorting
the array. We leave it as an exercise to show that the time complexity does not change asymptotically under these operations.

To solve this recurrence relation, let c > 0 be a constant such that T(n) ≤ 2T(n/2) + cn. By iteratively applying
the recurrence relation, we have
    T(n) ≤ 2T(n/2) + cn
         ≤ 2(2T(n/4) + c · n/2) + cn = 4T(n/4) + 2cn
         ≤ 4(2T(n/8) + c · n/4) + 2cn = 8T(n/8) + 3cn
         ...
         ≤ n · T(1) + (log_2 n) · cn = O(n log n).
Putting everything together, we have completed the analysis of the time complexity of the merge sort.
Theorem 2. The time complexity for sorting an array of length n by the merge sort is O(n log n).
We have seen that the merge sort can sort an array in time O(n log n). Can we do better? In other words,
does there exist an algorithm that runs in time o(n log n)?
To answer this question, we first need to carefully define what computational model we are using. The
complexity-theoretic Church–Turing thesis posits that any problem solvable in polynomial time in a
"reasonable" computational model is also solvable in polynomial time on a Turing machine. However,
when it comes to using a specific polynomial to upper-bound the time complexity of a given algorithm,
the precise polynomial upper bounds can differ across computational models.
When analyzing the time complexity of the merge sort, we implicitly assumed that comparing the
values of two integers can be done in O(1) time, which is true in the RAM (Random Access Machine) model,
but unrealistic on a Turing machine. Other than the Turing machine model, a commonly used model that
is arguably closer to modern computers is the word RAM, where arithmetic operations such as additions
and multiplications can be done in O(1) time.
Under the word RAM model, the current fastest randomized sorting algorithm runs in time O(n√(log log n))
due to Han and Thorup [2002], and the current fastest deterministic sorting algorithm runs in time O(n log log n)
due to Han [2002].
However, if we are in a computational model where the only available binary operation between two
elements is comparison (for example, the elements can be non-numeric, but there is a total order defined),
then Θ(n log n) is indeed optimal.
Theorem 3. Suppose the only binary operation allowed is comparison. Any sorting algorithm requires making
Ω(n log n) comparisons.
Proof. Let K(n) be the maximum number of comparisons an arbitrary algorithm makes for an array of
length n. Since a comparison can only have 3 outcomes (greater than, equal to, or smaller than), the
number of possible outputs by the algorithm is at most 3^{K(n)}.
On the other hand, an array of length n can have n! different orders. Given the ordered set of indices
(1, . . . , n), each output of the algorithm corresponds to a permutation of (1, . . . , n). If the output is charac-
terized by the permutation, the algorithm must be able to produce n! different outputs.
Putting everything together, we have 3^{K(n)} ≥ n!, which implies K(n) = Ω(log n!) = Ω(n log n). We have applied the
inequality (n/2)^{n/2} ≤ n! ≤ n^n.²
2 Counting the Number of Inversions

Problem 4. Given an array a[1, . . . , n] that is a permutation of (1, . . . , n), count the number of inversions in
a, i.e., #{(i, j) | (i < j) ∧ (a[i] > a[j])}.
To apply the divide and conquer technique, we split the array into two sub-arrays, a[1, . . . , ⌊n/2⌋] and
a[⌊n/2⌋ + 1, . . . , n], as before. Suppose x_a is the number of inversions within a[1, . . . , ⌊n/2⌋] and x_b is the
number of inversions within a[⌊n/2⌋ + 1, . . . , n]. The total number of inversions is x_a + x_b plus the number
of those inversions (i, j) where index i comes from the sub-array a[1, . . . , ⌊n/2⌋] and index j comes from the
sub-array a[⌊n/2⌋ + 1, . . . , n]. The values of x_a and x_b are computed recursively, so it remains to compute
the number of inversions between the two sub-arrays.
Notice that the indices of elements in a[⌊n/2⌋ + 1, . . . , n] are higher than the indices of elements in a[1, . . . , ⌊n/2⌋].
We only need to find all pairs of integers, one from each sub-array, such that the integer from the sub-array
a[1, . . . , ⌊n/2⌋] is greater than the integer from the sub-array a[⌊n/2⌋ + 1, . . . , n]. This can easily be done if the
two sub-arrays are sorted, but how can we guarantee the two sub-arrays are sorted? The trick here is that
sorting and counting the number of inversions can be done simultaneously. The algorithm is presented in
Algorithm 3.
As before, let T(n) be the time complexity of Algorithm 3 when the array has length n, and we aim to
build a recurrence relation for T(n). We have seen that the computation of the function Merge at Line 15
requires time O(n). By a similar analysis, the computations from Line 4 to Line 14 require time O(n).
Putting everything together, we have

    T(n) = 2T(n/2) + O(n), where T(1) = 1.

We know that this gives us T(n) = O(n log n) by the computation in the previous section.
² The part n! ≤ n^n can be easily seen by term-wise comparisons of the n terms. The part (n/2)^{n/2} ≤ n! can be seen by
(n/2)^{n/2} ≤ (n/2 + 1)(n/2 + 2) · · · n ≤ n!.
Algorithm 3 Counting the number of inversions
CountInversions(a[1, . . . , n]):
 1: if n = 1, return (0, a)  // we make sure CountInversions outputs both the number of inversions and the sorted array
 2: (x_b, b) ← CountInversions(a[1, . . . , ⌊n/2⌋])
 3: (x_c, c) ← CountInversions(a[⌊n/2⌋ + 1, . . . , n])
 4: x ← x_b + x_c  // below we compute the number of inversions between b and c
 5: initialize i ← 1 and j ← 1
 6: while i ≤ ⌊n/2⌋ and j ≤ n − ⌊n/2⌋:
 7:     if b[i] ≤ c[j]:
 8:         i ← i + 1
 9:         x ← x + j − 1
10:     else:
11:         j ← j + 1
12:     endif
13: endwhile
14: if j > n − ⌊n/2⌋, update x ← x + (n − ⌊n/2⌋)(⌊n/2⌋ − i + 1)
15: a ← Merge(b, c)  // Merge is defined in Algorithm 2; this step ensures the sub-arrays are sorted
16: return (x, a)
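As an illustration, the following Python sketch implements the idea of Algorithm 3 using 0-indexed lists. For brevity it fuses the counting loop with the merge step, whereas Algorithm 3 counts first and then calls Merge; the resulting count is the same.

```python
def count_inversions(a):
    """Return (number of inversions in a, sorted copy of a)."""
    n = len(a)
    if n <= 1:
        return 0, a[:]
    mid = n // 2
    x_b, b = count_inversions(a[:mid])
    x_c, c = count_inversions(a[mid:])
    x = x_b + x_c                # inversions within each half
    i = j = 0
    merged = []
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            x += j               # c[0..j-1] are exactly the right-half elements smaller than b[i]
            merged.append(b[i])
            i += 1
        else:
            merged.append(c[j])
            j += 1
    x += len(c) * (len(b) - i)   # every remaining element of b exceeds every element of c
    merged.extend(b[i:])
    merged.extend(c[j:])
    return x, merged


print(count_inversions([2, 4, 1, 3, 5]))  # (3, [1, 2, 3, 4, 5])
```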
We ask the same question as before: can we do better? Under the word RAM model, the current state-of-
the-art is an algorithm that runs in time O(n√(log n)) by Chan and Pătraşcu [2010].
3 Master Theorem
When we apply the divide and conquer technique, we usually arrive at a recurrence relation of the form

    T(n) = aT(n/b) + O(n^d),

where a, b and d are constants independent of n. The master theorem below is a tool for solving recurrence
relations in this format.
Theorem 6 (Master Theorem). Given the recurrence relation T(n) = aT(n/b) + O(n^d), where a, b, d > 0 are
constants independent of n and the initial condition satisfies T(1) = O(1), we have

    T(n) = O(n^d)           if d > log_b a,
    T(n) = O(n^d log n)     if d = log_b a,
    T(n) = O(n^{log_b a})   if d < log_b a.
Proof. Let c be a constant satisfying T(n) ≤ aT(n/b) + cn^d. We have

    T(n) ≤ aT(n/b) + cn^d
         ≤ a(aT(n/b^2) + c(n/b)^d) + cn^d = a^2 T(n/b^2) + cn^d · (1 + a/b^d)
         ≤ a^2 (aT(n/b^3) + c(n/b^2)^d) + cn^d · (1 + a/b^d) = a^3 T(n/b^3) + cn^d · (1 + a/b^d + (a/b^d)^2)
         ...
         ≤ a^{log_b n} T(1) + cn^d · Σ_{i=0}^{log_b n − 1} (a/b^d)^i = O(n^{log_b a}) + cn^d · Σ_{i=0}^{log_b n − 1} (a/b^d)^i.

If d > log_b a, we have a/b^d < 1 and Σ_{i=0}^{log_b n − 1} (a/b^d)^i = O(1). Therefore, T(n) = O(n^d).

If d = log_b a, we have a/b^d = 1 and Σ_{i=0}^{log_b n − 1} (a/b^d)^i = log_b n. Therefore, T(n) ≤ n^d T(1) + cn^d · log_b n = O(n^d log n).

If d < log_b a, we have a/b^d > 1 and Σ_{i=0}^{log_b n − 1} (a/b^d)^i = O((a/b^d)^{log_b n}) = O(n^{log_b a} / n^d). Therefore, T(n) = O(n^{log_b a}).
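For instance, the recurrences appearing in this lecture are resolved by the master theorem as follows:

    T(n) = 2T(n/2) + O(n):   a = 2, b = 2, d = 1 = log_2 2,   so T(n) = O(n log n);
    T(n) = 4T(n/2) + O(n):   a = 4, b = 2, d = 1 < log_2 4,   so T(n) = O(n^2);
    T(n) = 7T(n/2) + O(n^2): a = 7, b = 2, d = 2 < log_2 7,   so T(n) = O(n^{log_2 7}) ≈ O(n^{2.81}).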
4 Integer Multiplication

In this section, we consider the problem of multiplying two n-bit integers a and b.
For simplicity, we assume n is an integer power of 2 (notice that we can always append zeros to the most
significant bits, without changing the values of a and b, until n is an integer power of 2).
To use the divide and conquer technique, we split each of a and b into two parts: the "left" part with the n/2
most significant bits, and the "right" part with the n/2 least significant bits. We write

    a = a_L × 2^{n/2} + a_R  and  b = b_L × 2^{n/2} + b_R,

and we have

    ab = a_L b_L × 2^n + (a_L b_R + a_R b_L) × 2^{n/2} + a_R b_R.
We aim to design a divide-and-conquer-based algorithm based on this equation. Notice that additions can
be done in linear time on a Turing machine. Multiplications by 2^n and 2^{n/2} can also be done in linear time,
as multiplication by an integer power of 2 only requires shifting all the bits of the integer to the left. The
four multiplications a_L b_L, a_L b_R, a_R b_L, a_R b_R can be computed recursively.
Based on these observations, we have a divide-and-conquer-based algorithm with the time complexity characterized by
T(n) = 4T(n/2) + O(n), which, by the master theorem, gives us T(n) = O(n^2). However, this is no better than
the naïve primary-school bit-wise integer multiplication method.
However, a trick from Carl Friedrich Gauss reduces the number of multiplications in each recursive step
from 4 to 3. We only compute the three multiplications a_L b_L, a_R b_R and (a_L + a_R)(b_L + b_R), and the coefficient
of 2^{n/2} can be computed using only additions/subtractions:

    a_L b_R + a_R b_L = (a_L + a_R)(b_L + b_R) − a_L b_L − a_R b_R.

The idea behind this is that multiplications are much more computationally expensive than additions,
so it is worthwhile to reduce the number of multiplications at the cost of increasing the number of addi-
tions/subtractions.
The divide and conquer algorithm obtained by applying Gauss's trick has time complexity characterized by the
recurrence relation T(n) = 3T(n/2) + O(n), which yields T(n) = O(n^{log_2 3}) ≈ O(n^{1.59}) by the master theorem.
The algorithm is formally described in Algorithm 4.
Theorem 8. The time complexity of Algorithm 4 is O(n^{log_2 3}) ≈ O(n^{1.59}).
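As an illustration of the recursion with Gauss's trick, here is a Python sketch. It is our own rendering and not necessarily identical to Algorithm 4: it uses Python's built-in arbitrary-precision integers and bit shifts in place of explicit bit arrays, and the base-case cutoff 16 is an arbitrary choice.

```python
def karatsuba(a, b):
    """Multiply non-negative integers a and b with three recursive multiplications per step."""
    if a < 16 or b < 16:                           # small base case: multiply directly
        return a * b
    n = max(a.bit_length(), b.bit_length())
    half = n // 2
    a_L, a_R = a >> half, a & ((1 << half) - 1)    # a = a_L * 2^half + a_R
    b_L, b_R = b >> half, b & ((1 << half) - 1)    # b = b_L * 2^half + b_R
    p1 = karatsuba(a_L, b_L)
    p2 = karatsuba(a_R, b_R)
    p3 = karatsuba(a_L + a_R, b_L + b_R)
    # Gauss's trick: a_L*b_R + a_R*b_L = p3 - p1 - p2
    return (p1 << (2 * half)) + ((p3 - p1 - p2) << half) + p2


print(karatsuba(1234567, 7654321))  # 9449772114007
```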
There exist algorithms that run faster. We will learn the Fast Fourier Transform (FFT) algorithm in the future,
which runs in time O*(n log n), where the * in the superscript indicates that o(log n) multiplicative
factors are omitted from n log n. Harvey and van der Hoeven [2021] improved this to O(n log n).
5 Matrix Multiplications
Problem 9. Given two matrices X, Y ∈ Z^{n×n}, compute their product XY.

The naïve algorithm to compute the product XY takes O(n^3) time, as we need to compute n^2 entries and
each entry takes O(n) time to compute. Can we do it faster by the divide and conquer technique?
Naturally, we can split each of X and Y into four sub-matrices of dimension n/2 × n/2 as follows:

    X = ( A  B )      and      Y = ( E  F )
        ( C  D )                    ( G  H ),

and we have

    XY = ( AE + BG    AF + BH )
         ( CE + DG    CF + DH ).
The addition of two matrices takes O(n^2) time, and the eight multiplications above are computed re-
cursively. This gives us a divide-and-conquer-based algorithm with time complexity characterized by
T(n) = 8T(n/2) + O(n^2). By the master theorem, this gives us T(n) = O(n^3), which is no faster than the naïve
algorithm.
Volker Strassen successfully reduced the number of multiplications in each recursive step from 8 to 7.
Although the idea of reducing the number of multiplications at the cost of more less-expensive addi-
tions/subtractions is similar to Gauss's trick in the previous section, Strassen's solution is much trickier:

    XY = ( P5 + P4 − P2 + P6    P1 + P2 )
         ( P3 + P4              P1 + P5 − P3 − P7 ),

where

    P1 ≜ A(F − H),       P2 ≜ (A + B)H,         P3 ≜ (C + D)E,
    P4 ≜ D(G − E),       P5 ≜ (A + D)(E + H),   P6 ≜ (B − D)(G + H),
    P7 ≜ (A − C)(E + F).
The running time of Strassen's algorithm is characterized by the recurrence relation

    T(n) = 7T(n/2) + O(n^2),

which, by the master theorem, yields T(n) = O(n^{log_2 7}) ≈ O(n^{2.81}).
Theorem 10. The time complexity of Strassen's algorithm is O(n^{log_2 7}) ≈ O(n^{2.81}).
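As an illustration, the following Python sketch (our own, using nested lists and assuming n is a power of 2; a practical implementation would switch to the naïve method below some cutoff size) computes the seven products P1, . . . , P7 and combines them as above.

```python
def add(X, Y, sign=1):
    """Entrywise X + sign*Y for square matrices stored as lists of lists."""
    n = len(X)
    return [[X[i][j] + sign * Y[i][j] for j in range(n)] for i in range(n)]


def strassen(X, Y):
    """Multiply two n x n matrices (n a power of 2) using 7 recursive multiplications."""
    n = len(X)
    if n == 1:
        return [[X[0][0] * Y[0][0]]]
    m = n // 2
    def split(M):   # return the four m x m blocks of M
        return ([row[:m] for row in M[:m]], [row[m:] for row in M[:m]],
                [row[:m] for row in M[m:]], [row[m:] for row in M[m:]])
    A, B, C, D = split(X)
    E, F, G, H = split(Y)
    P1 = strassen(A, add(F, H, -1))
    P2 = strassen(add(A, B), H)
    P3 = strassen(add(C, D), E)
    P4 = strassen(D, add(G, E, -1))
    P5 = strassen(add(A, D), add(E, H))
    P6 = strassen(add(B, D, -1), add(G, H))
    P7 = strassen(add(A, C, -1), add(E, F))
    top_left = add(add(P5, P4), add(P6, P2, -1))      # P5 + P4 - P2 + P6
    top_right = add(P1, P2)                           # P1 + P2
    bottom_left = add(P3, P4)                         # P3 + P4
    bottom_right = add(add(P1, P5), add(P3, P7), -1)  # P1 + P5 - P3 - P7
    return ([l + r for l, r in zip(top_left, top_right)] +
            [l + r for l, r in zip(bottom_left, bottom_right)])


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```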
Can we do better? This problem is so significant that theoretical computer scientists reserve a letter, ω, to
denote the exponent of n in the running time of the fastest matrix multiplication algorithm. As shown
just now, Strassen's algorithm implies ω ≤ log_2 7. On the other hand, we know ω ≥ 2, as it takes Θ(n^2) time
just to read the two input matrices. The current best algorithm, by Alman and Williams [2021], indicates
ω < 2.37286. It is conjectured by theoretical computer scientists that ω < 2 + ε for any ε > 0.
6 Finding the k-th Smallest Element

In this section, we consider the problem of finding the k-th smallest element of an array a[1, . . . , n].
We can surely sort the array a and then output a[k]. The time complexity of this approach is O(n log n), as shown
in Sect. 1. However, can we do better? It turns out that there exists an algorithm that runs in time O(n). Notice
that this is already the best we can do, as it takes Θ(n) time to read the input. Below, we present
a randomized algorithm based on the divide and conquer technique with expected running time O(n).
We assume for simplicity that all the elements in a are distinct. The algorithm is presented in Algorithm 5.
We leave it as an exercise to generalize Algorithm 5 to the scenario where the elements in a need not be
distinct.
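As an illustration, the following Python sketch gives a standard randomized divide-and-conquer selection (quickselect) matching the description above; it is our own rendering and not necessarily identical to Algorithm 5, and it assumes all elements are distinct.

```python
import random


def quickselect(a, k):
    """Return the k-th smallest element (1-indexed) of a list of distinct numbers.

    Expected running time is O(n): after partitioning around a random pivot we
    recurse into only one side.
    """
    pivot = random.choice(a)
    smaller = [x for x in a if x < pivot]
    larger = [x for x in a if x > pivot]
    if k <= len(smaller):
        return quickselect(smaller, k)
    if k == len(smaller) + 1:
        return pivot
    return quickselect(larger, k - len(smaller) - 1)


print(quickselect([9, 1, 7, 3, 5], 2))  # 3
```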
so

    E[T(n)] = O( Σ_{i=1}^{n} E[X_i] ).
For step (†), notice that a number chosen uniformly at random from the first half of {1, . . . , n} has expectation
n/4 and the number chosen uniformly at random from the second half has expectation 3n/4.
In general, we have E[X_{t+1} | X_t] ≤ (3/4)X_t. Taking expectations on both sides yields

    E[X_{t+1}] ≤ (3/4) E[X_t] ≤ · · · ≤ (3/4)^t · n.
Putting everything together,

    E[T(n)] = O( Σ_{i=1}^{n} E[X_i] ) = O( n · Σ_{i=0}^{n−1} (3/4)^i ) = O(n).
7 Finding the Closest Pair of Points

In this section, we consider the problem of finding the two closest points in a set S of n points
(x_1, y_1), . . . , (x_n, y_n) in the plane.
To begin applying the divide and conquer technique, we find a vertical pivotal line x = t such that half of
the points are on the left-hand side of x = t and half of the points are on the right. This can be done by
sorting the n points by their x-coordinates and taking t = (x_{⌊n/2⌋} + x_{⌊n/2⌋+1})/2. Notice that we only need to
sort the points once at the beginning; we do not need to do the sorting at every recursive step.
Let S_L be the set of all points on the left and S_R be the set of all points on the right. We can find the distance
between the two closest points in each of S_L and S_R recursively. Let h_L and h_R be the corresponding distances
for the two closest points in S_L and the two closest points in S_R respectively. The minimum distance is
then at most min{h_L, h_R}. However, we have not yet checked those pairs of points "across the boundary
x = t": there may be a point in S_L and a point in S_R whose distance is smaller than min{h_L, h_R}.
Figure 1: Illustration for S_L, S_R and the strip S_S.
Let h = min{h_L, h_R}. We only need to worry about those points that are within the vertical "strip" with
left boundary x = t − h and right boundary x = t + h. In other words, we only need to consider the set of
points {(x_i, y_i) | x_i ∈ (t − h, t + h)}. Let S_S = {(x_i, y_i) | x_i ∈ (t − h, t + h)} be the set of all points in this strip.
Figure 1 illustrates S_L, S_R and the strip S_S. For any point A that is outside this strip, the distance from
A to the line x = t is at least h, so the distance from A to any point on the other side of x = t is
at least h. Hence A cannot belong to a pair of points across the line x = t that has distance less than
h. Let h_S be the distance between the two closest points within the strip. Then the distance between the two closest
points in S is min{h_L, h_R, h_S}. It remains to calculate h_S.
A naïve way to calculate h_S is to compute the distance between every pair of points within the strip. This
takes time O(n^2), which is unacceptable. Instead, we will do the following.
Firstly, we sort the elements in S_S by their y-coordinates. In each iteration, we consider a point (x_i, y_i) ∈ S_S,
and we only need to consider those points in S_S whose y-coordinates are at least y_i (for each (x_j, y_j) ∈ S_S
with y_j < y_i, the pair of points (x_i, y_i), (x_j, y_j) has already been considered at the (x_j, y_j)-th iteration). In
addition, we only need to consider the points whose y-coordinates belong to the interval [y_i, y_i + h], as
the distance between (x_i, y_i) and a point with y-coordinate greater than y_i + h is more than h. Figure 2
shows that there can be at most 8 points in S_S whose y-coordinates belong to the interval [y_i, y_i + h].
Therefore, we only need to compute the distances between (x_i, y_i) and at most 7 other points. Thus, we
need to compute the distances between at most 7|S_S| = O(n) pairs of points in the strip.
Now, we put everything together and obtain Algorithm 6.
Figure 2: When we are at the (x_i, y_i)-th iteration, we only need to consider those points in the rectangle
ABFE. Notice that there can be at most four points in the square ABDC, because there can be at most one
point in each of the four small squares in ABDC (notice that any two points in a small square can have
distance at most (√2/2)h < h, and the distance between any two points in S_L is at least h_L ≥ h). Similarly,
there can be at most four points in the square CDFE. Thus, there can be at most 8 points in the rectangle
ABFE. Excluding (x_i, y_i), there can be at most 7 other points.
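As an illustration, here is a Python sketch of the closest-pair procedure described above (our own rendering, not necessarily identical to Algorithm 6). For simplicity it re-sorts the strip by y-coordinate at every level of the recursion, so this sketch runs in O(n log^2 n) time rather than the O(n log n) obtained after the optimization discussed in Section 7.1.

```python
import math


def closest_pair(points):
    """Return the smallest pairwise distance among a list of at least two points (x, y)."""
    return _closest(sorted(points))       # sort by x-coordinate once, at the beginning


def _closest(pts):
    n = len(pts)
    if n <= 3:                            # brute force on tiny instances
        return min(math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:])
    mid = n // 2
    t = (pts[mid - 1][0] + pts[mid][0]) / 2        # vertical pivotal line x = t
    h = min(_closest(pts[:mid]), _closest(pts[mid:]))
    # Points in the strip (t - h, t + h), sorted by y-coordinate.
    strip = sorted((p for p in pts if t - h < p[0] < t + h), key=lambda p: p[1])
    for i, p in enumerate(strip):
        for q in strip[i + 1:]:
            if q[1] - p[1] >= h:          # only points with y-coordinate within h of p can improve
                break
            h = min(h, math.dist(p, q))
    return h


print(closest_pair([(0, 0), (3, 4), (1, 1), (5, 5)]))  # 1.4142135623730951
```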
7.1 Time Complexity for Finding the Closest Pair of Points
To analyze the time complexity of Algorithm 6, note that the time required to execute the for-loop at Line 8 is O(n).
The time required to execute the sorting at Line 7 is O(n log n); however, this can be optimized to O(n), and
we leave this optimization as an exercise (see Exercise 15). Therefore, the time complexity is characterized
by

    T(n) = 2T(n/2) + O(n),

which yields T(n) = O(n log n) by the master theorem.
Theorem 14. The time complexity for Algorithm 6 is O(n log n).
Exercise 15. Optimize the execution of the sorting at Line 7 to make Algorithm 6 run in time O(n log n).

Can we do better here? A randomized algorithm that runs in time O(n) was given by Khuller and Matias [1995].
References
Josh Alman and Virginia Vassilevska Williams. A refined laser method and faster matrix multiplication.
In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 522–539. SIAM,
2021.

Timothy M. Chan and Mihai Pătraşcu. Counting inversions, offline orthogonal range counting, and related
problems. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages
161–173. SIAM, 2010.

Yijie Han and Mikkel Thorup. Sorting integers in O(n√(log log n)) expected time and linear space. In IEEE
Symposium on Foundations of Computer Science (FOCS'02), 2002.

Yijie Han. Deterministic sorting in O(n log log n) time and linear space. In Proceedings of the Thirty-Fourth
Annual ACM Symposium on Theory of Computing, pages 602–608, 2002.

David Harvey and Joris van der Hoeven. Integer multiplication in time O(n log n). Annals of Mathematics,
193(2):563–617, 2021.

Samir Khuller and Yossi Matias. A simple randomized sieve algorithm for the closest-pair problem. Infor-
mation and Computation, 118(1):34–37, 1995.