Efficient Graph-Based Image Segmentation
D(C_1, C_2) = true if Dif(C_1, C_2) > MInt(C_1, C_2), false otherwise (3)
where the minimum internal difference, MInt, is defined as,

MInt(C_1, C_2) = min(Int(C_1) + τ(C_1), Int(C_2) + τ(C_2)). (4)
The threshold function τ controls the degree to which the difference between two components must be greater than their internal differences in order for there to be evidence of a boundary between them (D to be true).
For small components, Int(C) is not a good estimate of the local characteristics of the data. In the extreme case, when |C| = 1, Int(C) = 0. Therefore, we use a threshold function based on the size of the component,

τ(C) = k/|C| (5)

where |C| denotes the size of C, and k is some constant parameter. That is, for small components we require stronger evidence for a boundary. In practice k sets a scale of observation, in that a larger k causes a preference for larger components. Note, however, that k is not a minimum component size. Smaller components are allowed when there is a sufficiently large difference between neighboring components.
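The size-dependent threshold and the predicate D in Eqs. (3)–(5) amount to a few lines of arithmetic. The following sketch (helper names tau, mint and boundary_evidence are ours, not from the paper) assumes Int and |C| are already known for each component:

```python
def tau(size, k):
    # Threshold function of Eq. (5): larger for small components,
    # so small components need stronger evidence for a boundary.
    return k / size

def mint(int1, size1, int2, size2, k):
    # Minimum internal difference of Eq. (4).
    return min(int1 + tau(size1, k), int2 + tau(size2, k))

def boundary_evidence(dif, int1, size1, int2, size2, k):
    # Pairwise comparison predicate D of Eq. (3): true when the
    # difference between the components exceeds the minimum
    # internal difference (each Int adjusted by the threshold).
    return dif > mint(int1, size1, int2, size2, k)
```

For instance, with k = 300 two singleton components (Int = 0, |C| = 1) are only separated when the edge between them has weight greater than 300, matching the "stronger evidence for small components" behavior described above.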
Any non-negative function of a single component can be used for τ without changing the algorithmic results in Section 4. For instance, it is possible to have the segmentation method prefer components of certain shapes, by defining a τ which is large for components that do not fit some desired shape and small for ones that do. This would cause the segmentation algorithm
to aggressively merge components that are not of the
desired shape. Such a shape preference could be as
weak as preferring components that are not long and
thin (e.g., using a ratio of perimeter to area) or as strong
as preferring components that match a particular shape
model. Note that the result of this would not solely be components of the desired shape; however, for any two neighboring components one of them would be of the desired shape.
4. The Algorithm and its Properties
In this section we describe and analyze an algorithm for producing a segmentation using the decision criterion D introduced above. We will show that a segmentation produced by this algorithm obeys the properties of being neither too coarse nor too fine, according to the following definitions.
Definition 1. A segmentation S is too fine if there is some pair of regions C_1, C_2 ∈ S for which there is no evidence for a boundary between them.
In order to define the complementary notion of what it means for a segmentation to be too coarse (to have too few components), we first introduce the notion of a refinement of a segmentation. Given two segmentations S and T of the same base set, we say that T is a refinement of S when each component of T is contained in (or equal to) some component of S. In addition, we say that T is a proper refinement of S when T ≠ S. Note that if T is a proper refinement of S, then T can be obtained by splitting one or more regions of S. When T is a proper refinement of S we say that T is finer than S and that S is coarser than T.

Definition 2. A segmentation S is too coarse when there exists a proper refinement of S that is not too fine.
This captures the intuitive notion that if regions of a segmentation can be split and yield a segmentation where there is evidence for a boundary between all pairs of neighboring regions, then the initial segmentation has too few regions.
Two natural questions arise about segmentations that are neither too coarse nor too fine, namely whether or not one always exists, and if so whether or not it is unique. First we note that in general there can be more than one segmentation that is neither too coarse nor too fine, so such a segmentation is not unique. On the question of existence, there is always some segmentation that is both not too coarse and not too fine, as we now establish.

Property 1. For any (finite) graph G = (V, E) there exists some segmentation S that is neither too coarse nor too fine.
It is easy to see why this property holds. Consider the segmentation where all the elements are in a single component. Clearly this segmentation is not too fine, because there is only one component. If the segmentation is also not too coarse we are done. Otherwise, by the definition of what it means to be too coarse there is a proper refinement that is not too fine. Pick one of those refinements and keep repeating this procedure until we obtain a segmentation that is not too coarse. The procedure can only go on for n − 1 steps because whenever we pick a proper refinement we increase the number of components in the segmentation by at least one, and the finest segmentation we can get is the one where every element is in its own component.
We now turn to the segmentation algorithm, which is closely related to Kruskal's algorithm for constructing a minimum spanning tree of a graph (cf. Cormen et al., 1990). It can be implemented to run in O(m log m) time, where m is the number of edges in the graph.
Algorithm 1. Segmentation algorithm.
The input is a graph G = (V, E), with n vertices and m edges. The output is a segmentation of V into components S = (C_1, . . . , C_r).
0. Sort E into π = (o_1, . . . , o_m), by non-decreasing edge weight.
1. Start with a segmentation S^0, where each vertex v_i is in its own component.
2. Repeat step 3 for q = 1, . . . , m.
3. Construct S^q given S^{q−1} as follows. Let v_i and v_j denote the vertices connected by the q-th edge in the ordering, i.e., o_q = (v_i, v_j). If v_i and v_j are in disjoint components of S^{q−1} and w(o_q) is small compared to the internal difference of both those components, then merge the two components, otherwise do nothing. More formally, let C_i^{q−1} be the component of S^{q−1} containing v_i and C_j^{q−1} the component containing v_j. If C_i^{q−1} ≠ C_j^{q−1} and w(o_q) ≤ MInt(C_i^{q−1}, C_j^{q−1}) then S^q is obtained from S^{q−1} by merging C_i^{q−1} and C_j^{q−1}. Otherwise S^q = S^{q−1}.
4. Return S = S^m.
We now establish that a segmentation S produced by Algorithm 1 obeys the global properties of being neither too fine nor too coarse when using the region comparison predicate D defined in (3). That is, although the algorithm makes only greedy decisions it produces a segmentation that satisfies these global properties. Moreover, we show that any of the possible non-decreasing weight edge orderings that could be picked in Step 0 of the algorithm produce the same segmentation.
Lemma 1. In Step 3 of the algorithm, when considering edge o_q, if two distinct components are considered and not merged then one of these two components will be in the final segmentation. Let C_i^{q−1} and C_j^{q−1} denote the two components connected by edge o_q = (v_i, v_j) when this edge is considered by the algorithm. Then either C_i = C_i^{q−1} or C_j = C_j^{q−1}, where C_i is the component containing v_i and C_j is the component containing v_j in the final segmentation S.
Proof: There are two cases that would result in a merge not happening. Say that it is due to w(o_q) > Int(C_i^{q−1}) + τ(C_i^{q−1}). Since edges are considered in non-decreasing weight order, w(o_k) ≥ w(o_q) for all k ≥ q + 1. Thus no additional merges will happen to this component, i.e., C_i = C_i^{q−1}. The case for w(o_q) > Int(C_j^{q−1}) + τ(C_j^{q−1}) is analogous.
Note that Lemma 1 implies that the edge causing the merge of two components is exactly the minimum weight edge between the components. Thus the edges causing merges are exactly the edges that would be selected by Kruskal's algorithm for constructing the minimum spanning tree (MST) of each component.
Theorem 1. The segmentation S produced by Algorithm 1 is not too fine according to Definition 1, using the region comparison predicate D defined in (3).
Proof: By definition, in order for S to be too fine there is some pair of components for which D does not hold. There must be at least one edge between such a pair of components that was considered in Step 3 and did not cause a merge. Let o_q = (v_i, v_j) be the first such edge in the ordering. In this case the algorithm decided not to merge C_i^{q−1} with C_j^{q−1}, which implies w(o_q) > MInt(C_i^{q−1}, C_j^{q−1}). By Lemma 1 we know that either C_i = C_i^{q−1} or C_j = C_j^{q−1}. Either way we see that w(o_q) > MInt(C_i, C_j), implying D holds for C_i and C_j, which is a contradiction.
Theorem 2. The segmentation S produced by Algorithm 1 is not too coarse according to Definition 2, using the region comparison predicate D defined in (3).
Proof: In order for S to be too coarse there must be some proper refinement, T, that is not too fine. Consider the minimum weight edge e that is internal to a component C ∈ S but connects distinct components A, B ∈ T. Note that by the definition of refinement A ⊆ C and B ⊆ C.

Since T is not too fine, either w(e) > Int(A) + τ(A) or w(e) > Int(B) + τ(B). Without loss of generality, say the former is true. By construction any edge connecting A to another sub-component of C has weight at least as large as w(e), which is in turn larger than the maximum weight edge in MST(A, E) because w(e) > Int(A). Thus the algorithm, which considers edges in non-decreasing weight order, must consider all the edges in MST(A, E) before considering any edge from A to other parts of C. So the algorithm must have formed A before forming C, and in forming C it must have merged A with some other sub-component of C. The weight of the edge that caused this merge must be at least as large as w(e). However, the algorithm would not have merged A in this case because w(e) > Int(A) + τ(A), which is a contradiction because the algorithm did form C.
Theorem 3. The segmentation produced by
Algorithm 1 does not depend on which non-decreasing
weight order of the edges is used.
Proof: Any ordering can be changed into another one by only swapping adjacent elements. Thus it is sufficient to show that swapping the order of two adjacent edges of the same weight in the non-decreasing weight ordering does not change the result produced by Algorithm 1.
Let e_1 and e_2 be two edges of the same weight that are adjacent in some non-decreasing weight ordering. Clearly, if the two edges connect disjoint pairs of components (or exactly the same pair) when the first of them is considered by the algorithm, then the order in which the two are considered does not matter. The only case we need to check is when e_1 is between two components A and B and e_2 is between one of these components, say B, and some other component C.
Now we show that e_1 causes a merge when considered after e_2 exactly when it would cause a merge if considered before e_2. First, suppose that e_1 causes a merge when considered before e_2. This implies w(e_1) ≤ MInt(A, B). If e_2 were instead considered before e_1, either e_2 would not cause a merge and trivially e_1 would still cause a merge, or e_2 would cause a merge, in which case the new component B ∪ C would have Int(B ∪ C) = w(e_2) = w(e_1). So we know w(e_1) ≤ MInt(A, B ∪ C), which implies e_1 will still cause a merge. On the other hand, suppose that e_1 does not cause a merge when considered before e_2. This implies w(e_1) > MInt(A, B). Then either w(e_1) > Int(A) + τ(A), in which case this would still be true if e_2 were considered first (because e_2 does not touch A), or w(e_1) > Int(B) + τ(B). In this second case, if e_2 were considered first it could not cause a merge since w(e_2) = w(e_1) and so w(e_2) > MInt(B, C). Thus when considering e_1 after e_2 we still have w(e_1) > MInt(A, B) and e_1 does not cause a merge.
4.1. Implementation Issues and Running Time
Our implementation maintains the segmentation S us-
ing a disjoint-set forest with union by rank and path
compression (cf. Cormen et al., 1990). The running
time of the algorithm can be factored into two parts.
First, in Step 0, it is necessary to sort the weights into
non-decreasing order. For integer weights this can be
done in linear time using counting sort, and in general
it can be done in O(m log m) time using any one of
several sorting methods.
Steps 1–3 of the algorithm take O(m α(m)) time, where α is the very slow-growing inverse Ackermann function. In order to check whether two vertices are in the same component we use set-find on each vertex, and in order to merge two components we use set-union. Thus there are at most three disjoint-set operations per edge. The computation of MInt can be done in constant time per edge if we know Int and the size of each component. Maintaining Int for a component can be done in constant time for each merge, as the maximum weight edge in the MST of a component is simply the edge causing the merge. This is because Lemma 1 implies that the edge causing the merge is the minimum weight edge between the two components being merged. The size of a component after a merge is simply the sum of the sizes of the two components being merged.
5. Results for Grid Graphs
First we consider the case of monochrome (intensity)
images. Color images are handled as three separate
monochrome images, as discussed below. As in other
graph-based approaches to image segmentation (e.g.,
Shi and Malik, 1997; Wu and Leahy, 1993; Zahn, 1971)
we define an undirected graph G = (V, E), where each image pixel p_i has a corresponding vertex v_i ∈ V. The
edge set E is constructed by connecting pairs of pixels
that are neighbors in an 8-connected sense (any other
local neighborhood could be used). This yields a graph
where m = O(n), so the running time of the segmenta-
tion algorithm is O(n log n) for n image pixels. We use
an edge weight function based on the absolute intensity
difference between the pixels connected by an edge,
w(v_i, v_j) = |I(p_i) − I(p_j)|
where I(p_i) is the intensity of the pixel p_i. In general we use a Gaussian filter to smooth the image slightly before computing the edge weights, in order to compensate for digitization artifacts. We always use a Gaussian with σ = 0.8, which does not produce any visible change to the image but helps remove artifacts.
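For a grid graph, the edge construction described above is mechanical. A minimal sketch (pure Python over a list-of-lists intensity image; the Gaussian pre-smoothing is omitted, and the function name is ours):

```python
def grid_graph_edges(image):
    """Build the 8-connected grid graph for a 2-D intensity image
    (list of rows). Vertices are numbered in row-major order and each
    edge carries the absolute intensity difference of its endpoints."""
    h, w = len(image), len(image[0])
    # Four neighbor offsets suffice: each undirected edge is generated once.
    offsets = [(0, 1), (1, 0), (1, 1), (1, -1)]
    edges = []
    for y in range(h):
        for x in range(w):
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    weight = abs(image[y][x] - image[ny][nx])
                    edges.append((weight, y * w + x, ny * w + nx))
    return edges
```

Since each pixel contributes at most four new edges, m = O(n), which is what gives the O(n log n) overall running time mentioned above.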
For color images we run the algorithm three times, once for each of the red, green and blue color planes, and then intersect these three sets of components. Specifically, we put two neighboring pixels in the same component when they appear in the same component in all three of the color plane segmentations. Alternatively one could run the algorithm just once on a graph where the edge weights measure the distance between pixels in some color space; however, experimentally we obtained better results by intersecting the segmentations for each color plane in the manner just described.
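The per-plane intersection can be sketched by keying each pixel on its triple of component labels. This simplified version groups pixels whose label triples agree, ignoring the neighboring-pixel detail in the description above; intersect_segmentations is an illustrative name, not from the paper:

```python
def intersect_segmentations(label_r, label_g, label_b):
    """Intersect per-plane segmentations: pixels end up in the same
    component only if they share a component in all three plane
    segmentations. Inputs are flat per-pixel component ids."""
    combined = {}
    out = []
    for triple in zip(label_r, label_g, label_b):
        # Each distinct (r, g, b) component triple is a new component.
        out.append(combined.setdefault(triple, len(combined)))
    return out
```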
There is one runtime parameter for the algorithm, which is the value of k that is used to compute the threshold function τ. Recall we use the function τ(C) = k/|C| where |C| is the number of elements in C. Thus k effectively sets a scale of observation, in that a larger k causes a preference for larger components. We use two different parameter settings for the examples in this section (and throughout the paper), depending on the resolution of the image and the degree to which fine detail is important in the scene. For instance, in the 128 × 128 images of the COIL database of objects we use k = 150. In the 320 × 240 or larger images, such as the street scene and the baseball player, we use k = 300.
The first image in Fig. 2 shows a street scene. Note that there is considerable variation in the grassy slope leading up to the fence. It is this kind of variability that our algorithm is designed to handle (recall the high variability region in the synthetic example in Fig. 1). The second image shows the segmentation, where each region is assigned a random color. The six largest components found by the algorithm are: three of the grassy areas behind the fence, the grassy slope, the van, and the roadway. The missing part of the roadway at the lower left is a visibly distinct region in the color image from which this segmentation was computed (a spot due to an imaging artifact). Note that the van is also not uniform in color, due to specular reflections, but these are diffuse enough that they are treated as internal variation and incorporated into a single region.
The first image in Fig. 3 shows two baseball players (from Shi and Malik, 1997). As in the previous example, there is a grassy region with considerable variation. The uniforms of the players also have substantial variation due to folds in the cloth. The second image shows the segmentation. The six largest components found by the algorithm are: the back wall, the Mets emblem, a large grassy region (including part of the wall under the top player), each of the two players' uniforms, and a small grassy patch under the second player. The large grassy region includes part of the wall due to the relatively high variation in the region, and the fact that there is a long slow change in intensity (not strong evidence for a boundary) between the grass and the wall. This boundary is similar in magnitude to those within the player uniforms due to folds in the cloth.
Figure 4 shows the results of the algorithm for an image of an indoor scene, where both fine detail and larger structures are perceptually important. Note that
the segmentation preserves small regions such as the
name tags the people are wearing and things behind
the windows, while creating single larger regions for
high variability areas such as the air conditioning duct
near the top of the image, the clothing and the fur-
niture. This image also shows that sometimes small
boundary regions are found, for example at the edge
of the jacket or shirt. Such narrow regions occur be-
cause there is a one or two pixel wide area that is
halfway between the two neighboring regions in color
and intensity. This is common in any segmentation
method based on grid graphs. Such regions can be elim-
inated if desired, by removing long thin regions whose
color or intensity is close to the average of neighboring
regions.
Figure 5 shows three simple objects from the
Columbia COIL image database. Shown for each is the
largest region found by our algorithm that is not part
of the black background. Note that each of these ob-
jects has a substantial intensity gradient across the face
of the object, yet the regions are correctly segmented.
This illustrates another situation that the algorithm was designed to handle, slow changes in intensity due to lighting.
6. Results for Nearest Neighbor Graphs
One common approach to image segmentation is based on mapping each pixel to a point in some feature space, and then finding clusters of similar points (e.g., Comaniciu and Meer, 1997, 1999; Jain and Dubes, 1988). In this section we investigate using the graph-based segmentation algorithm from Section 4 in order to find such clusters of similar points. In this case, the graph G = (V, E) has a vertex corresponding to each feature point (each pixel) and there is an edge (v_i, v_j) connecting pairs of feature points v_i and v_j that are nearby in the feature space, rather than using neighboring pixels in the image grid. There are several possible ways of determining which feature points to connect by edges. We connect each point to a fixed number of nearest neighbors. Another possibility is to use all the neighbors within some fixed distance d. In any event, it is desirable to avoid considering all O(n^2) pairs of feature points.
The weight w(v_i, v_j) of an edge is the distance between the two corresponding points in feature space. For the experiments shown here we map each pixel to the feature point (x, y, r, g, b), where (x, y) is the location of the pixel in the image and (r, g, b) is the color value of the pixel. We use the L2 (Euclidean) distance between points as the edge weights, although other distance functions are possible.
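Building the nearest neighbor graph over such feature points can be sketched with a brute-force scan; the paper uses the ANN library instead, so this O(n^2) version (with a function name of our choosing) is only for illustration on small inputs:

```python
from math import dist  # Euclidean (L2) distance, Python 3.8+

def feature_graph_edges(points, num_neighbors):
    """Connect each feature point (e.g. an (x, y, r, g, b) tuple) to its
    nearest neighbors by L2 distance. A brute-force scan stands in for
    the approximate nearest neighbor search used in the paper."""
    edges = set()
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        by_dist = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in by_dist[:num_neighbors]:
            # Store each undirected edge once.
            edges.add((d, min(i, j), max(i, j)))
    return sorted(edges)
```

The resulting (weight, i, j) tuples can be fed directly to the segmentation algorithm of Section 4, exactly as grid-graph edges are.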
Figure 2. A street scene (320 × 240 color image), and the segmentation results produced by our algorithm (σ = 0.8, k = 300).

Figure 3. A baseball scene (432 × 294 grey image), and the segmentation results produced by our algorithm (σ = 0.8, k = 300).

Figure 4. An indoor scene (320 × 240 color image), and the segmentation results produced by our algorithm (σ = 0.8, k = 300).

The internal difference measure, Int(C), has a relatively simple underlying intuition for points in feature space. It specifies the minimum radius of dilation necessary to connect the set of feature points contained in C together into a single volume in feature space. Consider replacing each feature point by a ball with radius r. From the definition of the MST it can be seen that the union of these balls will form one single connected volume only when r ≥ Int(C)/2. The difference between components, Dif(C_1, C_2), also has a simple underlying intuition. It specifies the minimum radius of dilation necessary to connect at least one point of C_1 to a point of C_2. Our segmentation technique is thus closely related to the work of Comaniciu and Meer (1999), which similarly takes an approach to clustering based on dilating points in a parameter space (however they first use a novel transformation of the data that we do not perform, and then use a fixed dilation radius rather than the variable one that we use).
Figure 5. Three images from the COIL database, and the largest non-background component found in each image (128 × 128 color images; algorithm parameters σ = 0.8, k = 150).

Rather than constructing the complete graph, where all points are neighbors of one another, we find a small fixed number of neighbors for each point. This results in a graph with O(n) edges for n image pixels, and an overall running time of the segmentation method of O(n log n). There are many possible ways of picking a small fixed number of neighbors for each point. We use the ANN algorithm (Arya and Mount, 1993) to find the nearest neighbors for each point. This algorithm is quite fast in practice, given a 5-dimensional feature space with several hundred thousand points. The ANN method also allows the finding of approximate nearest neighbors, which runs more quickly than finding the actual nearest neighbors. For the examples reported here we use ten nearest neighbors of each pixel to generate the edges of the graph.
One of the key differences from the previous section, where the image grid was used to define the graph, is that the nearest neighbors in feature space capture more spatially non-local properties of the image. In the grid-graph case, all of the neighbors in the graph are neighbors in the image. Here, points can be far apart in the image and still be among a handful of nearest neighbors (if their color is highly similar and intervening image pixels are of dissimilar color). For instance, this can result in segmentations with regions that are disconnected in the image, which did not happen in the grid-graph case.

Figure 6. A synthetic image (40 × 32 grey image) and the segmentation using the nearest neighbor graph (σ = 0, k = 150).
Figure 6 shows a synthetic image from Perona and Freeman (1998) and Gdalyahu et al. (1999) and its segmentation, using k = 150 and with no smoothing (σ = 0). In this example the spatially disconnected regions do not reflect interesting scene structures, but we will see examples below which do.
For the remaining examples in this section, we use
k = 300 and σ = 0.8, as in the previous section.
First, we note that the nearest neighbor graph produces
similar results to the grid graph for images in which
the perceptually salient regions are spatially connected.
For instance, the street scene and baseball player scene
considered in the previous section yield very similar
segmentations using either the nearest neighbor graph
or the grid graph, as can be seen by comparing the
results in Fig. 7 with those in Figs. 2 and 3.
Figure 8 shows two additional examples using the nearest neighbor graph. These results are not possible to achieve with the grid graph approach because certain interesting regions are not spatially connected. The first example shows a flower garden, where the red flowers are spatially disjoint in the foreground of the image, and then merge together in the background. Most of these flowers are merged into a single region, which would not be possible with the grid-graph method. The second example in Fig. 8 shows the Eiffel tower at night. The bright yellow light forms a spatially disconnected region. These examples show that the segmentation method, coupled with the use of a nearest neighbor graph, can capture very high level properties of images, while preserving perceptually important region boundaries.
Figure 7. Segmentation of the street and baseball player scenes from the previous section, using the nearest neighbor graph rather than the grid graph (σ = 0.8, k = 300).
Figure 8. Segmentation using the nearest neighbor graph can capture spatially non-local regions (σ = 0.8, k = 300).
7. Summary and Conclusions
In this paper we have introduced a new method for image segmentation based on pairwise region comparison. We have shown that the notions of a segmentation being too coarse or too fine can be defined in terms of a function which measures the evidence for a boundary between pairs of regions. Our segmentation algorithm makes simple greedy decisions, and yet produces segmentations that obey the global properties of being not too coarse and not too fine using a particular region comparison function. The method runs in O(m log m) time for m graph edges and is also fast in practice, generally running in a fraction of a second.
The pairwise region comparison predicate we use considers the minimum weight edge between two regions in measuring the difference between them. Thus our algorithm will merge two regions even if there is a single low weight edge between them. This is not as much of a problem as it might first appear, in part because this edge weight is compared only to the minimum spanning tree edges of each component. For instance, the examples considered in Sections 5 and 6 illustrate that the method finds segmentations that capture many perceptually important aspects of complex imagery. Nonetheless, one can envision measures that require more than a single cheap connection before deciding that there is no evidence for a boundary between two regions. One natural way of addressing this issue is to use a quantile rather than the minimum edge weight. However, in this case finding a segmentation that is neither too coarse nor too fine is an NP-hard problem (as shown in the Appendix). Our algorithm is unique, in that it is both highly efficient and yet captures non-local properties of images.
We have illustrated our image segmentation algorithm with two different kinds of graphs. The first of these uses the image grid to define a local neighborhood between image pixels, and measures the difference in intensity (or color) between each pair of neighbors. The second of these maps the image pixels to points in a feature space that combines the (x, y) location and (r, g, b) color value. Edges in the graph connect points that are close together in this feature space. The algorithm yields good results using both kinds of graphs, but the latter type of graph captures more perceptually global aspects of the image.
Image segmentation remains a challenging problem; however, we are beginning to make substantial progress through the introduction of graph-based algorithms that both help refine our understanding of the problem and provide useful computational tools. The work reported here and the normalized cuts approach (Shi and Malik, 1997) are just a few illustrations of these recent advances.
Appendix: NP-Hardness of D with Quantiles

Intuitively the region comparison predicate D defined in Section 3.1 could be made more robust by changing the definition of the difference between two regions to reflect a quantile rather than the minimum weight edge between them. We show that with this modification the problem of finding a segmentation that is neither too coarse nor too fine becomes NP-hard.
The only difference between the new problem and the old one is the definition of the difference between two regions C_1, C_2 ∈ S in Eq. (2), which becomes

Dif(C_1, C_2) = Kth{ w(v_i, v_j) } (6)

where Kth selects the Kth quantile of its arguments (K should be between zero and one). For example with K = 0.5 the difference becomes the median edge weight between the two components. This quantile is computed over all edges (v_i, v_j) such that v_i ∈ C_1 and v_j ∈ C_2.
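The quantile version of Dif can be sketched as follows; the exact quantile convention (index rounding) is our choice, since the text only requires K between zero and one:

```python
from math import ceil

def quantile_dif(weights, K):
    """Kth-quantile difference of Eq. (6): `weights` are the weights of
    all edges between the two components, with 0 < K <= 1. Small K
    approaches the minimum-edge Dif of Eq. (2); K = 1 gives the
    maximum boundary edge weight."""
    ordered = sorted(weights)
    # Pick the smallest weight w such that at least a K fraction of
    # the boundary edges weigh w or less.
    return ordered[ceil(K * len(ordered)) - 1]
```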
We reduce the min ratio cut problem with uniform capacities and demands to the problem of finding a good segmentation. The min ratio cut problem with uniform capacities and demands is the following: we are given a graph G = (V, E) and another set of edges F. Each edge in E indicates a unit capacity between a pair of nodes and each edge in F indicates a unit demand between a pair of nodes. The value of a cut (A, B) is the ratio of the total capacity and the total demand between the sets A and B. So it is the ratio of the number of edges in E crossing the cut and the number of edges in F crossing the cut. Finding the value of the minimum ratio cut is an NP-hard problem (cf. Ausiello et al., to appear).
First we show how to transform an instance of this problem to one where the sets E and F are disjoint, without modifying the value of the minimum cut. For every edge (a, b) ∈ E ∩ F we create a new node ab, and exchange the edge in E with the edges (a, ab) and (b, ab). For a cut with a and b in the same side, it is always better to keep ab in that side too, and the value of the cut is the same as in the original graph. For a cut with a and b in different sides the node ab can be in either side and there will be one capacity and one demand edge crossing the cut, and the value of the cut is again the same as in the original graph.
Now we show how to decide if the modified instance of the min ratio cut problem has a cut with value at most v by solving a segmentation problem. Let c be the number of edges from E crossing a cut (A, B) and similarly d is the number of edges from F crossing (A, B). It is easy to show that the cut value is small exactly when the fraction of edges crossing the cut that come from F is large,

c/d ≤ v  ⟺  d/(c + d) ≥ 1/(v + 1). (7)
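The equivalence in (7) is elementary arithmetic: c ≤ vd and d(v + 1) ≥ c + d are both equivalent to dv ≥ c. A throwaway check, with hypothetical helper names of our own:

```python
def cut_value_small(c, d, v):
    # c/d <= v, written without dividing by d.
    return c <= v * d

def weight_one_fraction_large(c, d, v):
    # d/(c + d) >= 1/(v + 1), cross-multiplied.
    return d * (v + 1) >= c + d
```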
Define G′ = (V, E′) where E′ = E ∪ F. We let the edges from E have weight zero and the edges from F have weight one.

Lemma 2. The graph G has a cut with value at most v if and only if a segmentation of G′ that places all of V in a single component is too coarse.

If the single-component segmentation of G′ is too coarse, there is a proper refinement (C_1, . . . , C_l) that is not too fine, and in particular the quantile difference between C_1 and its complement C_2 ∪ · · · ∪ C_l is one. So the fraction of weight one edges between C_1 and C_2 ∪ · · · ∪ C_l is at least 1/(v + 1). Equation (7) implies that the value of the cut (C_1, C_2 ∪ · · · ∪ C_l) is at most v.
It is straightforward to see that the transformation of the min ratio cut problem to the problem of finding a segmentation presented above can be done in polynomial time. This is sufficient to show the hardness of the segmentation problem.

Theorem 4. The problem of finding a segmentation that is neither too coarse nor too fine using Dif as defined in Eq. (6) is NP-hard.
Acknowledgments
This work was supported in part by gifts from Intel,
Microsoft and Xerox corporations, in part by DARPA
under contract DAAL01-97-K-0104, and in part by
NSF Research Infrastructure award CDA-9703470.
We would like to thank Shree Nayar, Jianbo Shi and
Daphna Weinshall for use of their images. We would
also like to thank Jon Kleinberg, Eva Tardos and Dan
Ramras for discussions about the algorithm and the NP-hardness result.
References
Arya, S. and Mount, D.M. 1993. Approximate nearest neighbor searching. In Proc. 4th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 271–280.
Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., and Protasi, M. (to appear). Complexity and Approximation: Combinatorial Optimization Problems and their Approximability Properties. Springer-Verlag: Berlin.
Comaniciu, D. and Meer, P. 1997. Robust analysis of feature spaces: Color image segmentation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 750–755.
Comaniciu, D. and Meer, P. 1999. Mean shift analysis and applications. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1197–1203.
Cooper, M.C. 1998. The tractability of segmentation and scene analysis. International Journal of Computer Vision, 30(1):27–42.
Cormen, T.H., Leiserson, C.E., and Rivest, R.L. 1990. Introduction to Algorithms. The MIT Press: McGraw-Hill Book Company.
Felzenszwalb, P. and Huttenlocher, D. 1998. Image segmentation using local variation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 98–104.
Gdalyahu, Y., Weinshall, D., and Werman, M. 1999. Stochastic clustering by typical cuts. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2596–2601.
Jain, A.K. and Dubes, R.C. 1988. Algorithms for Clustering Data. Prentice Hall.
Jermyn, I. and Ishikawa, H. 2001. Globally optimal regions and boundaries as minimum ratio weight cycles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23:1075–1088.
Pavlidis, T. 1977. Structural Pattern Recognition. Springer-Verlag.
Perona, P. and Freeman, W. 1998. A factorization approach to grouping. In Proceedings of the European Conference on Computer Vision, pp. 655–670.
Ratan, A.L., Maron, O., Grimson, W.E.L., and Lozano-Perez, T. 1999. A framework for learning query concepts in image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 423–431.
Shi, J. and Malik, J. 1997. Normalized cuts and image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 731–737.
Urquhart, R. 1982. Graph theoretical clustering based on limited neighborhood sets. Pattern Recognition, 15(3):173–187.
Weiss, Y. 1999. Segmentation using eigenvectors: A unifying view. In Proceedings of the International Conference on Computer Vision, 2:975–982.
Wertheimer, M. 1938. Laws of organization in perceptual forms (partial translation). In A Sourcebook of Gestalt Psychology, W.B. Ellis (Ed.). Harcourt, Brace and Company, pp. 71–88.
Wu, Z. and Leahy, R. 1993. An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(11):1101–1113.
Zahn, C.T. 1971. Graph-theoretic methods for detecting and describing gestalt clusters. IEEE Transactions on Computing, 20:68–86.