The Set-Covering Problem

Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

The Set-Covering Problem

Advanced Algorithms (CSE 794)


Instructor: Tamal K. Dey
Scribe: Rohit Prabhavalkar

May 29, 2009
1 Introduction
We describe here the set-covering problem and present an approximation algorithm for the problem based
on a greedy heuristic. We shall also prove that this algorithm has a logarithmic approximation ratio.
2 The Set-Covering Problem
An instance (X, F) of the set-covering problem consists of a nite set X and a set F of subsets of X. We
assume that each element of X appears in at least one of the subsets in F,
X =

SF
S (1)
A set S is said to cover each of its elements. The set-covering problem requires us to determine a
minimum sized subset of F that covers all the elements of X. In other words, given an instance (X, F), we
are required to nd a set C F such that,
X =

SC
S and |C| is minimal (2)
An instance of the set-covering problem is presented in Figure (1). As can be seen in the gure, the
minimum sized set cover for this instance is 3 (for example, S
2
, S
5
and S
6
). The set-covering problem has
many practical and useful applications. As an example, consider the problem of choosing a committee of
persons such that the members of the committee, as a group, possess all the skills required to complete a
particular task. Cost or other considerations may make it desirable to choose a committee of minimum size.
This problem may be modeled as follows. Assume that the task requires a set of skills that we denote by X.
Each person p who may potentially be selected to appear on the committee possesses some set S
p
X of
these skills. If we dene F =

p
S
p
, then it is clear that the minimal set cover for this instance (X, F) of
the set-covering problem corresponds exactly to the committee of the smallest size that can accomplish the
task.

Department of Computer and Information Sciences, The Ohio State University, Columbus, Ohio, USA.
S
3
S
1
S
4
S
2 S
6
S
5
Figure 1: An instance of the set-covering problem. Each dark circle represents an element of X. The set
F = {S
1
, S
2
, S
3
, S
4
, S
5
, S
6
}, with each of the boxes indicating a member of F. The minimal set cover is
given by, C = {S
2
, S
5
, S
6
}.
2.1 NP-Completeness of Set-Covering Problem
We note here that the vertex-cover problem is a special case of the set-covering problem. To see this, assume
that we have been given a graph G = (V, E) The vertex-cover problem requires us to determine a minimal
set of vertices V

V such that every edge e E touches atleast one vertex in V

. If we dene X = E and
F =

vV
E
v
, where E
v
is the set of all edges incident on a vertex v V . It is clear that a minimal set-
covering for this instance corresponds exactly to the minimal vertex-cover. Since the decision version of the
vertex-cover problem is known to be NP-complete, we conclude that the decision version of the set-covering
problem is also NP-complete.
3 A Greedy Approximation Algorithm for the Set-Covering Problem
GREEDY-SET-COVER(X, F)
1. U X;
2. C ;
3. while U =
4. select S F that maximizes |S U|;
5. U U S;
6. C C {S};
7. endwhile
8. return C;
A greedy algorithm for the set-covering problem is presented above. The algorithm maintains a set of
elements of X that are as yet uncovered in U. In each iteration of the while loop, the algorithm greedily
2
chooses the set S that maximally covers the elements of U, until all of the elements of X are covered. The
set C contains all the sets that have been chosen as part of the set cover at any point during the operation of
the algorithm. Note that by equation (1), it is clear that this procedure terminates with a valid set cover.
3.1 Analysis of GREEDY-SET-COVER
Since each iteration of the while loop in lines 3-7 adds a set S F to the cover C, and S must cover at
least one of the as yet uncovered elements of X, it is clear that the maximum number of iteration of the
while loop is at most min(|X|, |F|). The actual body of the loop in lines 4-6 can be implemented in time
O(|X||F|) and so the overall complexity of GREEDY-SET-COVERis O(|X||F| min(|X|, |F|))
1
3.2 Notation
In the following sections we establish the fact that the algorithm GREEDY-SET-COVER, introduced in
Section (3) is an approximation algorithm with a logarithmic approximation ratio. We use the following
notation, H(d) =

d
i=1
1/i and we dene H(0) = 0.
Theorem 1. GREEDY-SET-COVER is a polynomial-time (n)-approximation algorithm, where (n) =
H(max{|S| : S F})
Proof. We have already established that GREEDY-SET-COVER is a polynomial-time algorithm in section
(3.1). Let C

be the minimal set cover and C be the cover determined by the algorithm. We shall assign a
cost of 1 unit to each of the sets selected by the algorithm. Let S
i
be the i-th set chosen by the algorithm in
line 4. We divide the unit cost associated with the selection of this set equally amongst all the elements of
X that were covered for the rst time by the addition of the set S
i
to the set C. We denote this cost by c
x
for
each element x X. Note that any element of X is assigned a cost only once during the algorithm, when it
is covered by some set S for the rst time. When a set S
i
is added to the collection C by the algorithm, the
elements that have already been covered by the previously selected sets are given by S
1
S
2
S
i1
.
Therefore, the number of elements that are covered for the rst time is exactly |S
i
(S
1
S
2
S
i1
) |.
Since this cost is shared equally between all of these newly covered elements, if x is covered for the rst
time by S
i
then,
c
x
=
1
|S
i
(S
1
S
2
S
i1
) |
(3)
Since a cost of 1 was assigned for each set S that was added to the collection C in the algorithm and the
algorithm terminates when all elements of X have been covered, we must have,
|C| =

xX
c
x
(4)
To the optimal cover, we assign the cost,

SC

xS
c
x
(5)
Also, since C

is a cover, each x X appears in at least one of the sets S C

. Therefore, we may write

xX
c
x

SC

xS
c
x
(6)
1
Actually this bound can be improved further. The algorithm can be implemented in time that is linear in the size of the input.
For our purposes, however, it is sufcient to note that the algorithm runs in polynomial time.
3
From equations (4) and (6) we have,
|C|

SC

xS
c
x
(7)
We can now use the result stated below in Proposition (1) to bound the size of the cover returned by the
algorithm. Using Proposition (1) we can re-write 7 as,
|C|

SC

H (|S|)

SC

H (max{|S| : S F})
which completes the proof of Theorem 1.
All that remains, therefore, is to prove the following proposition,
Proposition 1. If S F, then

xS
c
x
H(|S|)
Proof. Let S F be an arbitrary set. We continue to use the same notation as was used previously in the
proof of Theorem 1. For each i = 1, 2, |C| let u
i
be the number of elements of S that remain uncovered
after S
1
, S
2
, , S
i
have been selected by the algorithm. Therefore,
u
i
= |S (S
1
S
2
S
i
)|
We specially dene u
0
= |S|. Let k be the smallest index such that each of the elements of S are covered
by atleast one of the sets S
1
, S
2
, , S
k
. Note that since the algorithm terminates with each element of X
being covered, and that S X, this index is well dened. Since u
i
represents the number of elements of S
that are as yet uncovered by S
1
, S
2
, , S
i
, we must have u
i1
u
i
. Also, by construction, (u
i1
u
i
)
elements of S are covered for the rst time by the set S
i
for i = 1, 2, , k. Therefore, using Equation (3)
we have,

xS
c
x
=
k

i=1
(u
i1
u
i
)
1
|S
i
(S
1
S
2
S
i1
) |
(8)
But S
i
was chosen in line 4 of the algorithm by the greedy heuristic, since it covered the maximal
number of the remaining uncovered elements of X. Therefore we must have,
|S
i
(S
1
S
2
S
i1
) | |S (S
1
S
2
S
i1
) | (9)
= u
i1
(10)
Note that if Equation (9) did not hold, then the algorithm should have chosen set S instead of S
i
at the i th
iteration, since S covers more uncovered elements than S
i
. The remainder of the proof consists of a series
of algebraic manipulations. From Equations (8) and (10) we have,
4

xS
c
x

k

i=1
(u
i1
u
i
)
1
u
i1
=
k

i=1
u
i1

j=u
i
+1
1
u
i1

i=1
u
i1

j=u
i
+1
1
j
(Since j u
i1
)
=
k

i=1

u
i1

j=1
1
j

u
i

j=1
1
j

=
k

i=1
(H(u
i1
) (u
i
))
= H(u
0
) H(u
k
) (Telescopic sum)
= H(|S|) H(0) (choice of k ensures u
k
= 0)
= H(|S|)
Hence, proved.
Finally, the following corollary establishes the fact that the algorithm GREEDY-SET-COVER has a
logarithmic approximation ratio.
Corollary 2. GREEDY-SET-COVER is a polynomial-time (ln |X| + 1)-approximation algorithm
Proof. We know that,
H (n) =
n

i=1
1
k
ln n + 1 (11)
Therefore, from Theorem 1 and Equation (11), GREEDY-SET-COVERis a polynomial-time algorithm with
approximation ratio (n) such that,
(n) = H(max{|S| : S F}) (12)
H(|X|) (S X |S| |X|)
ln |X| + 1 (From Equation (11))
This completes the proof.
4 Concluding Remarks
The algorithm GREEDY-SET-COVER described in Section (3) is an approximation algorithm to the set-
covering problem, with a logarithmic approximation ratio.
5
References
[1] T. Cormen, C. Leiserson, R. Rivest and C. Stein. Introduction to Algorithms (Second Edition). MIT
Press (2001).
[2] T. Dey. Lecture Notes: CSE 794A: Advanced Algorithms. Ohio State University. Spring 2009.
6

You might also like