Efficient Graph-Based Image Segmentation


International Journal of Computer Vision 59(2), 167–181, 2004

© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.


Efficient Graph-Based Image Segmentation
PEDRO F. FELZENSZWALB
Artificial Intelligence Lab, Massachusetts Institute of Technology
[email protected]
DANIEL P. HUTTENLOCHER
Computer Science Department, Cornell University
[email protected]
Received September 24, 1999; Revised August 26, 2003; Accepted September 17, 2003
Abstract. This paper addresses the problem of segmenting an image into regions. We define a predicate for
measuring the evidence for a boundary between two regions using a graph-based representation of the image. We
then develop an efficient segmentation algorithm based on this predicate, and show that although this algorithm
makes greedy decisions it produces segmentations that satisfy global properties. We apply the algorithm to image
segmentation using two different kinds of local neighborhoods in constructing the graph, and illustrate the results
with both real and synthetic images. The algorithm runs in time nearly linear in the number of graph edges and
is also fast in practice. An important characteristic of the method is its ability to preserve detail in low-variability
image regions while ignoring detail in high-variability regions.
Keywords: image segmentation, clustering, perceptual organization, graph algorithm
1. Introduction
The problems of image segmentation and grouping re-
main great challenges for computer vision. Since the
time of the Gestalt movement in psychology (e.g.,
Wertheimer, 1938), it has been known that perceptual
grouping plays a powerful role in human visual percep-
tion. A wide range of computational vision problems
could in principle make good use of segmented images,
were such segmentations reliably and efficiently com-
putable. For instance intermediate-level vision prob-
lems such as stereo and motion estimation require an
appropriate region of support for correspondence oper-
ations. Spatially non-uniformregions of support can be
identied using segmentation techniques. Higher-level
problems such as recognition and image indexing can
also make use of segmentation results in matching, to
address problems such as figure-ground separation and
recognition by parts.
Our goal is to develop computational approaches to
image segmentation that are broadly useful, much in
the way that other low-level techniques such as edge
detection are used in a wide range of computer vision
tasks. In order to achieve such broad utility, we believe
it is important that a segmentation method have the
following properties:
1. Capture perceptually important groupings or re-
gions, which often reflect global aspects of the im-
age. Two central issues are to provide precise char-
acterizations of what is perceptually important, and
to be able to specify what a given segmentation tech-
nique does. We believe that there should be pre-
cise definitions of the properties of a resulting seg-
mentation, in order to better understand the method
as well as to facilitate the comparison of different
approaches.
2. Be highly efficient, running in time nearly linear
in the number of image pixels. In order to be of
practical use, we believe that segmentation meth-
ods should run at speeds similar to edge detection or
other low-level visual processing techniques, mean-
ing nearly linear time and with low constant factors.
For example, a segmentation technique that runs at
several frames per second can be used in video pro-
cessing applications.
While the past few years have seen considerable
progress in eigenvector-based methods of image seg-
mentation (e.g., Shi and Malik, 1997; Weiss, 1999),
these methods are too slow to be practical for many
applications. In contrast, the method described in this
paper has been used in large-scale image database ap-
plications as described in Ratan et al. (1999). While
there are other approaches to image segmentation that
are highly efficient, these methods generally fail to cap-
ture perceptually important non-local properties of an
image as discussed below. The segmentation technique
developed here both captures certain perceptually im-
portant non-local image characteristics and is compu-
tationally efficient, running in O(n log n) time for n
image pixels and with low constant factors, and can run
in practice at video rates.
Figure 1. A synthetic image with three perceptually distinct regions, and the three largest regions found by our segmentation method (image
320 × 240 pixels; algorithm parameters σ = 0.8, k = 300, see Section 5 for an explanation of the parameters).
As with certain classical clustering methods
(Urquhart, 1982; Zahn, 1971), our method is based on
selecting edges from a graph, where each pixel corre-
sponds to a node in the graph, and certain neighboring
pixels are connected by undirected edges. Weights on
each edge measure the dissimilarity between pixels.
However, unlike the classical methods, our technique
adaptively adjusts the segmentation criterion based on
the degree of variability in neighboring regions of the
image. This results in a method that, while making
greedy decisions, can be shown to obey certain non-
obvious global properties. We also show that other
adaptive criteria, closely related to the one developed
here, result in problems that are computationally dif-
ficult (NP hard).
We now turn to a simple synthetic example illustrat-
ing some of the non-local image characteristics cap-
tured by our segmentation method. Consider the image
shown in the top left of Fig. 1. Most people will say
that this image has three distinct regions: a rectangular-
shaped intensity ramp in the left half, a constant in-
tensity region with a hole on the right half, and a
high-variability rectangular region inside the constant
region. This example illustrates some perceptually im-
portant properties that we believe should be captured by
a segmentation algorithm. First, widely varying intensi-
ties should not alone be judged as evidence for multiple
regions. Such wide variation in intensities occurs both
in the ramp on the left and in the high variability re-
gion on the right. Thus it is not adequate to assume
that regions have nearly constant or slowly varying
intensities.
A second perceptually important aspect of the exam-
ple in Fig. 1 is that the three regions cannot be obtained
using purely local decision criteria. This is because the
intensity difference across the boundary between the
ramp and the constant region is actually smaller than
many of the intensity differences within the high vari-
ability region. Thus, in order to segment such an image,
some kind of adaptive or non-local criterion must be
used.
The method that we introduce in Section 3.1 mea-
sures the evidence for a boundary between two regions
by comparing two quantities: one based on intensity
differences across the boundary, and the other based
on intensity differences between neighboring pixels
within each region. Intuitively, the intensity differences
across the boundary of two regions are perceptually
important if they are large relative to the intensity dif-
ferences inside at least one of the regions. We develop
a simple algorithm which computes segmentations us-
ing this idea. The remaining parts of Fig. 1 show the
three largest regions found by our algorithm. Although
this method makes greedy decisions, it produces re-
sults that capture certain global properties which are
derived below and whose consequences are illustrated
by the example in Fig. 1. The method also runs in a
small fraction of a second for the 320 × 240 image in
the example.
The organization of this paper is as follows. In the
next Section we discuss some related work, includ-
ing both classical formulations of segmentation and
recent graph-based methods. In Section 3 we consider
a particular graph-based formulation of the segmenta-
tion problem and define a pairwise region comparison
predicate. Then in Section 4 we present an algorithm
for efficiently segmenting an image using this predi-
cate, and derive some global properties that it obeys
even though it is a greedy algorithm. In Section 5 we
show results for a number of images using the image
grid to construct a graph-based representation of the
image data. Then in Section 6 we illustrate the method
using more general graphs, but where the number of
edges is still linear in the number of pixels. Using this
latter approach yields results that capture high-level
scene properties such as extracting a flower bed as a
single region, while still preserving fine detail in other
portions of the image. In the Appendix we show that a
straightforward generalization of the region compari-
son predicate presented in Section 3 makes the problem
of finding a good segmentation NP-hard.
2. Related Work
There is a large literature on segmentation and clus-
tering, dating back over 30 years, with applications in
many areas other than computer vision (cf. Jain and
Dubes, 1988). In this section we briefly consider some
of the related work that is most relevant to our approach:
early graph-based methods (e.g., Urquhart, 1982; Zahn,
1971), region merging techniques (e.g., Cooper, 1998;
Pavlidas, 1977), techniques based on mapping image
pixels to some feature space (e.g., Comaniciu and Meer,
1997, 1999) and more recent formulations in terms of
graph cuts (e.g., Shi and Malik, 1997; Wu and Leahy,
1993) and spectral methods (e.g., Weiss, 1999).
Graph-based image segmentation techniques gener-
ally represent the problem in terms of a graph G =
(V, E), where each node v_i ∈ V corresponds to a pixel
in the image, and an edge (v_i, v_j) ∈ E connects vertices
v_i and v_j. A weight is associated with each edge based
on some property of the pixels that it connects, such
on some property of the pixels that it connects, such
as their image intensities. Depending on the method,
there may or may not be an edge connecting each pair
of vertices. The earliest graph-based methods use fixed
thresholds and local measures in computing a segmen-
tation. The work of Zahn (1971) presents a segmen-
tation method based on the minimum spanning tree
(MST) of the graph. This method has been applied
both to point clustering and to image segmentation.
For image segmentation the edge weights in the graph
are based on the differences between pixel intensities,
whereas for point clustering the weights are based on
distances between points.
The segmentation criterion in Zahn's method is
to break MST edges with large weights. The inade-
quacy of simply breaking large edges, however, is il-
lustrated by the example in Fig. 1. As mentioned in
the introduction, differences between pixels within the
high variability region can be larger than those be-
tween the ramp and the constant region. Thus, de-
pending on the threshold, simply breaking large weight
edges would either result in the high variability re-
gion being split into multiple regions, or would merge
the ramp and the constant region together. The algo-
rithm proposed by Urquhart (1982) attempts to address
this shortcoming by normalizing the weight of an edge
using the smallest weight incident on the vertices touch-
ing that edge. When applied to image segmentation
problems, however, this is not enough to provide a rea-
sonable adaptive segmentation criterion. For example,
many pixels in the high variability region of Fig. 1 have
some neighbor that is highly similar.
Another early approach to image segmentation is
that of splitting and merging regions according to how
well each region fits some uniformity criterion (e.g.,
Cooper, 1998; Pavlidas, 1977). Generally these unifor-
mity criteria obey a subset property, such that when a
uniformity predicate U(A) is true for some region A
then U(B) is also true for any B ⊆ A. Usually such
criteria are aimed at finding either uniform intensity
or uniform gradient regions. No region uniformity cri-
terion that has been proposed to date could be used
to correctly segment the example in Fig. 1, due to the
high variation region. Either this region would be split
into pieces, or it would be merged with the surrounding
area.
A number of approaches to segmentation are based
on finding compact clusters in some feature space (cf.
Comaniciu and Meer, 1997; Jain and Dubes, 1988).
These approaches generally assume that the image is
piecewise constant, because searching for pixels that
are all close together in some feature space implicitly
requires that the pixels be alike (e.g., similar color).
A recent technique using feature space clustering
(Comaniciu and Meer, 1999) first transforms the data
by smoothing it in a way that preserves boundaries be-
tween regions. This smoothing operation has the over-
all effect of bringing points in a cluster closer together.
The method then finds clusters by dilating each point
with a hypersphere of some fixed radius, and finding
connected components of the dilated points. This tech-
nique for finding clusters does not require all the points
in a cluster to lie within any fixed distance. The tech-
nique is actually closely related to the region compar-
ison predicate that we introduce in Section 3.1, which
can be viewed as an adaptive way of selecting an ap-
propriate dilation radius. We return to this issue in
Section 6.
Finally we briey consider a class of segmentation
methods based on finding minimum cuts in a graph,
where the cut criterion is designed in order to mini-
mize the similarity between pixels that are being split.
Work by Wu and Leahy (1993) introduced such a cut
criterion, but it was biased toward finding small com-
ponents. This bias was addressed with the normalized
cut criterion developed by Shi and Malik (1997), which
takes into account self-similarity of regions. These cut-
based approaches to segmentation capture non-local
properties of the image, in contrast with the early
graph-based methods. However, they provide only a
characterization of each cut rather than of the final
segmentation.
The normalized cut criterion provides a significant
advance over the previous work in Wu and Leahy
(1993), both from a theoretical and practical point of
view (the resulting segmentations capture intuitively
salient parts of an image). However, the normalized
cut criterion also yields an NP-hard computational
problem. While Shi and Malik develop approxima-
tion methods for computing the minimum normalized
cut, the error in these approximations is not well un-
derstood. In practice these approximations are still
fairly hard to compute, limiting the method to rel-
atively small images or requiring computation times
of several minutes. Recently Weiss (1999) has shown
how the eigenvector-based approximations developed
by Shi and Malik relate to more standard spectral parti-
tioning methods on graphs. However, all such methods
are too slow for many practical applications.
An alternative to the graph cut approach is to look
for cycles in a graph embedded in the image plane. For
example in Jermyn and Ishikawa (2001) the quality of
each cycle is normalized in a way that is closely related
to the normalized cuts approach.
3. Graph-Based Segmentation
We take a graph-based approach to segmentation. Let
G = (V, E) be an undirected graph with vertices
v_i ∈ V, the set of elements to be segmented, and edges
(v_i, v_j) ∈ E corresponding to pairs of neighboring
vertices. Each edge (v_i, v_j) ∈ E has a corresponding
weight w(v_i, v_j), which is a non-negative measure of
the dissimilarity between neighboring elements v_i and
v_j. In the case of image segmentation, the elements in
V are pixels and the weight of an edge is some mea-
sure of the dissimilarity between the two pixels con-
nected by that edge (e.g., the difference in intensity,
color, motion, location or some other local attribute).
In Sections 5 and 6 we consider particular edge sets and
weight functions for image segmentation. However, the
formulation here is independent of these definitions.
In the graph-based approach, a segmentation S
is a partition of V into components such that each
component (or region) C ∈ S corresponds to a con-
nected component in a graph G' = (V, E'), where
E' ⊆ E. In other words, any segmentation is induced
by a subset of the edges in E. There are different ways
to measure the quality of a segmentation but in general
we want the elements in a component to be similar,
and elements in different components to be dissimi-
lar. This means that edges between two vertices in the
same component should have relatively low weights,
and edges between vertices in different components
should have higher weights.
3.1. Pairwise Region Comparison Predicate
In this section we define a predicate, D, for evaluating
whether or not there is evidence for a boundary be-
tween two components in a segmentation (two regions
of an image). This predicate is based on measuring the
dissimilarity between elements along the boundary of
the two components relative to a measure of the dissim-
ilarity among neighboring elements within each of the
two components. The resulting predicate compares the
inter-component differences to the within component
differences and is thereby adaptive with respect to the
local characteristics of the data.
We define the internal difference of a component
C ⊆ V to be the largest weight in the minimum span-
ning tree of the component, MST(C, E). That is,

Int(C) = max_{e ∈ MST(C,E)} w(e).   (1)
One intuition underlying this measure is that a given
component C only remains connected when edges of
weight at least Int(C) are considered.
We define the difference between two components
C_1, C_2 ⊆ V to be the minimum weight edge connect-
ing the two components. That is,

Dif(C_1, C_2) = min_{v_i ∈ C_1, v_j ∈ C_2, (v_i, v_j) ∈ E} w(v_i, v_j).   (2)
If there is no edge connecting C_1 and C_2 we let
Dif(C_1, C_2) = ∞. This measure of difference could
in principle be problematic, because it reflects only
the smallest edge weight between two components. In
practice we have found that the measure works quite
well in spite of this apparent limitation. Moreover,
changing the definition to use the median weight, or
some other quantile, in order to make it more robust to
outliers, makes the problem of finding a good segmen-
tation NP-hard, as discussed in the Appendix. Thus
a small change to the segmentation criterion vastly
changes the difficulty of the problem.
The region comparison predicate evaluates if there
is evidence for a boundary between a pair of compo-
nents by checking if the difference between the compo-
nents, Dif(C_1, C_2), is large relative to the internal dif-
ference within at least one of the components, Int(C_1)
and Int(C_2). A threshold function τ is used to control the
degree to which the difference between components
must be larger than the minimum internal difference. We
define the pairwise comparison predicate D(C_1, C_2) as

D(C_1, C_2) =
    true    if Dif(C_1, C_2) > MInt(C_1, C_2)
    false   otherwise   (3)

where the minimum internal difference, MInt, is de-
fined as,

MInt(C_1, C_2) = min(Int(C_1) + τ(C_1), Int(C_2) + τ(C_2)).   (4)
The threshold function τ controls the degree to which
the difference between two components must be greater
than their internal differences in order for there to be
evidence of a boundary between them (D to be true).
For small components, Int(C) is not a good estimate
of the local characteristics of the data. In the extreme
case, when |C| = 1, Int(C) = 0. Therefore, we use a
threshold function based on the size of the component,

τ(C) = k/|C|   (5)
where |C| denotes the size of C, and k is some constant
parameter. That is, for small components we require
stronger evidence for a boundary. In practice k sets a
scale of observation, in that a larger k causes a pref-
erence for larger components. Note, however, that k is
not a minimum component size. Smaller components
are allowed when there is a sufficiently large difference
between neighboring components.
Any non-negative function of a single component
can be used for τ without changing the algorithmic
results in Section 4. For instance, it is possible to have
the segmentation method prefer components of certain
shapes, by defining a τ which is large for components
that do not fit some desired shape and small for ones
that do. This would cause the segmentation algorithm
to aggressively merge components that are not of the
desired shape. Such a shape preference could be as
weak as preferring components that are not long and
thin (e.g., using a ratio of perimeter to area) or as strong
as preferring components that match a particular shape
model. Note that the result of this would not solely be
components of the desired shape, however for any two
neighboring components one of them would be of the
desired shape.
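To make the definitions in Eqs. (1)-(5) concrete, the following Python sketch evaluates the predicate D on a small hand-built example. It is an illustration only, not the authors' code; the component contents, the toy edge lists, the helper names, and the value of k are all invented for the example.

```python
def internal_difference(vertices, edges):
    """Int(C): the largest weight in a minimum spanning tree of the component
    (computed here with Kruskal's algorithm over the component's own edges)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    largest = 0.0
    for w, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        if ra != rb:                       # edge joins two trees: it is an MST edge
            parent[ra] = rb
            largest = max(largest, w)
    return largest


def boundary_predicate(c1, c2, edges_c1, edges_c2, crossing_edges, k):
    """D(C1, C2): true iff Dif(C1, C2) > MInt(C1, C2), with tau(C) = k/|C|."""
    dif = min((w for w, _, _ in crossing_edges), default=float("inf"))
    mint = min(internal_difference(c1, edges_c1) + k / len(c1),
               internal_difference(c2, edges_c2) + k / len(c2))
    return dif > mint


# Toy example: two 3-pixel components joined by a single weight-10 edge.
C1, C2 = {0, 1, 2}, {3, 4, 5}
E1 = [(2, 0, 1), (3, 1, 2)]     # (weight, u, v) edges inside C1 -> Int(C1) = 3
E2 = [(4, 3, 4), (1, 4, 5)]     # edges inside C2 -> Int(C2) = 4
cross = [(10, 2, 3)]            # the only edge between C1 and C2 -> Dif = 10
print(boundary_predicate(C1, C2, E1, E2, cross, k=9))   # True: 10 > min(3+3, 4+3) = 6
```

With a larger k the threshold τ(C) = k/|C| grows, and the same crossing edge would no longer constitute evidence for a boundary, consistent with k setting a scale of observation.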
4. The Algorithm and its Properties
In this section we describe and analyze an algorithm for
producing a segmentation using the decision criterion
D introduced above. We will show that a segmenta-
tion produced by this algorithm obeys the properties of
being neither too coarse nor too fine, according to the
following definitions.
Definition 1. A segmentation S is too fine if there is
some pair of regions C_1, C_2 ∈ S for which there is no
evidence for a boundary between them.
In order to define the complementary notion of what
it means for a segmentation to be too coarse (to have
too few components), we first introduce the notion of a
refinement of a segmentation. Given two segmentations
S and T of the same base set, we say that T is a refine-
ment of S when each component of T is contained in
(or equal to) some component of S. In addition, we say
that T is a proper refinement of S when T ≠ S. Note
that if T is a proper refinement of S, then T can be
obtained by splitting one or more regions of S. When
T is a proper refinement of S we say that T is finer than
S and that S is coarser than T.
Definition 2. A segmentation S is too coarse when
there exists a proper refinement of S that is not too
fine.
This captures the intuitive notion that if regions of a seg-
mentation can be split and yield a segmentation where
there is evidence for a boundary between all pairs of
neighboring regions, then the initial segmentation has
too few regions.
Two natural questions arise about segmentations that
are neither too coarse nor too fine, namely whether or
not one always exists, and if so whether or not it is
unique. First we note that in general there can be more
than one segmentation that is neither too coarse nor
too fine, so such a segmentation is not unique. On the
question of existence, there is always some segmenta-
tion that is both not too coarse and not too fine, as we
now establish.
Property 1. For any (finite) graph G = (V, E) there
exists some segmentation S that is neither too coarse
nor too fine.
It is easy to see why this property holds. Consider
the segmentation where all the elements are in a single
component. Clearly this segmentation is not too fine,
because there is only one component. If the segmenta-
tion is also not too coarse we are done. Otherwise, by
the definition of what it means to be too coarse there
is a proper refinement that is not too fine. Pick one of
those refinements and keep repeating this procedure
until we obtain a segmentation that is not too coarse.
The procedure can only go on for n − 1 steps because
whenever we pick a proper refinement we increase the
number of components in the segmentation by at least
one, and the finest segmentation we can get is the one
where every element is in its own component.
We now turn to the segmentation algorithm, which is
closely related to Kruskal's algorithm for constructing
a minimum spanning tree of a graph (cf. Cormen et al.,
1990). It can be implemented to run in O(m log m)
time, where m is the number of edges in the graph.
Algorithm 1. Segmentation algorithm.
The input is a graph G = (V, E), with n vertices
and m edges. The output is a segmentation of V into
components S = (C_1, ..., C_r).
0. Sort E into π = (o_1, ..., o_m), by non-decreasing
edge weight.
1. Start with a segmentation S^0, where each vertex v_i
is in its own component.
2. Repeat step 3 for q = 1, ..., m.
3. Construct S^q given S^{q-1} as follows. Let v_i and
v_j denote the vertices connected by the q-th edge
in the ordering, i.e., o_q = (v_i, v_j). If v_i and v_j
are in disjoint components of S^{q-1} and w(o_q) is
small compared to the internal difference of both
those components, then merge the two components,
otherwise do nothing. More formally, let C_i^{q-1} be
the component of S^{q-1} containing v_i and C_j^{q-1}
the component containing v_j. If C_i^{q-1} ≠ C_j^{q-1} and
w(o_q) ≤ MInt(C_i^{q-1}, C_j^{q-1}) then S^q is obtained
from S^{q-1} by merging C_i^{q-1} and C_j^{q-1}. Otherwise
S^q = S^{q-1}.
4. Return S = S^m.
We now establish that a segmentation S produced
by Algorithm 1 obeys the global properties of being
neither too fine nor too coarse when using the re-
gion comparison predicate D defined in (3). That is,
although the algorithm makes only greedy decisions
it produces a segmentation that satises these global
properties. Moreover, we show that any of the possi-
ble non-decreasing weight edge orderings that could
be picked in Step 0 of the algorithm produce the same
segmentation.
Lemma 1. In Step 3 of the algorithm, when consider-
ing edge o_q, if two distinct components are considered
and not merged then one of these two components will
be in the final segmentation. Let C_i^{q-1} and C_j^{q-1} denote
the two components connected by edge o_q = (v_i, v_j)
when this edge is considered by the algorithm. Then ei-
ther C_i = C_i^{q-1} or C_j = C_j^{q-1}, where C_i is the compo-
nent containing v_i and C_j is the component containing
v_j in the final segmentation S.
Proof: There are two cases that would result
in a merge not happening. Say that it is due to
w(o_q) > Int(C_i^{q-1}) + τ(C_i^{q-1}). Since edges are consid-
ered in non-decreasing weight order, w(o_k) ≥ w(o_q),
for all k ≥ q + 1. Thus no additional merges will
happen to this component, i.e., C_i = C_i^{q-1}. The case
for w(o_q) > Int(C_j^{q-1}) + τ(C_j^{q-1}) is analogous.
Note that Lemma 1 implies that the edge causing
the merge of two components is exactly the minimum
weight edge between the components. Thus the edges
causing merges are exactly the edges that would be
selected by Kruskal's algorithm for constructing the
minimum spanning tree (MST) of each component.
Theorem 1. The segmentation S produced by
Algorithm 1 is not too fine according to Definition 1,
using the region comparison predicate D defined in (3).
Proof: By definition, in order for S to be too fine
there is some pair of components for which D does not
hold. There must be at least one edge between such
a pair of components that was considered in Step 3
and did not cause a merge. Let o_q = (v_i, v_j) be the
first such edge in the ordering. In this case the algo-
rithm decided not to merge C_i^{q-1} with C_j^{q-1} which
implies w(o_q) > MInt(C_i^{q-1}, C_j^{q-1}). By Lemma 1 we
know that either C_i = C_i^{q-1} or C_j = C_j^{q-1}. Either way
we see that w(o_q) > MInt(C_i, C_j) implying D holds
for C_i and C_j, which is a contradiction.
Theorem 2. The segmentation S produced by
Algorithm 1 is not too coarse according to Definition 2,
using the region comparison predicate D defined in
(3).
Proof: In order for S to be too coarse there must be
some proper refinement, T, that is not too fine. Con-
sider the minimum weight edge e that is internal to a
component C ∈ S but connects distinct components
A, B ∈ T. Note that by the definition of refinement
A ⊆ C and B ⊆ C.
Since T is not too fine, either w(e) > Int(A) + τ(A)
or w(e) > Int(B) + τ(B). Without loss of generality, say
the former is true. By construction any edge connecting
A to another sub-component of C has weight at least as
large as w(e), which is in turn larger than the maximum
weight edge in MST(A, E) because w(e) > Int(A).
Thus the algorithm, which considers edges in non-
decreasing weight order, must consider all the edges
in MST(A, E) before considering any edge from A to
other parts of C. So the algorithm must have formed
A before forming C, and in forming C it must have
merged A with some other sub-component of C. The
weight of the edge that caused this merge must be at least
as large as w(e). However, the algorithm would not have
merged A in this case because w(e) > Int(A) + τ(A),
which is a contradiction because the algorithm did
form C.
Theorem 3. The segmentation produced by
Algorithm 1 does not depend on which non-decreasing
weight order of the edges is used.
Proof: Any ordering can be changed into another
one by only swapping adjacent elements. Thus it is
sufficient to show that swapping the order of two ad-
jacent edges of the same weight in the non-decreasing
weight ordering does not change the result produced by
Algorithm 1.
Let e_1 and e_2 be two edges of the same weight that
are adjacent in some non-decreasing weight ordering.
Clearly if when the algorithm considers the first of these
two edges they connect disjoint pairs of components
or exactly the same pair of components, then the or-
der in which the two are considered does not matter.
The only case we need to check is when e_1 is between
two components A and B and e_2 is between one of
these components, say B, and some other component
C.
Now we show that e_1 causes a merge when con-
sidered after e_2 exactly when it would cause a merge
if considered before e_2. First, suppose that e_1 causes
a merge when considered before e_2. This implies
w(e_1) ≤ MInt(A, B). If e_2 were instead considered
before e_1, either e_2 would not cause a merge and triv-
ially e_1 would still cause a merge, or e_2 would cause
a merge in which case the new component B ∪ C
would have Int(B ∪ C) = w(e_2) = w(e_1). So we
know w(e_1) ≤ MInt(A, B ∪ C) which implies e_1 will
still cause a merge. On the other hand, suppose that
e_1 does not cause a merge when considered before
e_2. This implies w(e_1) > MInt(A, B). Then either
w(e_1) > Int(A) + τ(A), in which case this would
still be true if e_2 were considered first (because e_2
does not touch A), or w(e_1) > Int(B) + τ(B). In this
second case, if e_2 were considered first it could not
cause a merge since w(e_2) = w(e_1) and so w(e_2) >
MInt(B, C). Thus when considering e_1 after e_2 we still
have w(e_1) > MInt(A, B) and e_1 does not cause a
merge.
4.1. Implementation Issues and Running Time
Our implementation maintains the segmentation S us-
ing a disjoint-set forest with union by rank and path
compression (cf. Cormen et al., 1990). The running
time of the algorithm can be factored into two parts.
First in Step 0, it is necessary to sort the weights into
non-decreasing order. For integer weights this can be
done in linear time using counting sort, and in general
it can be done in O(m log m) time using any one of
several sorting methods.
Steps 1–3 of the algorithm take O(m α(m)) time,
where α is the very slow-growing inverse Ackermann
function. In order to check whether two vertices are in
the same component we use set-find on each vertex, and
in order to merge two components we use set-union.
Thus there are at most three disjoint-set operations per
edge. The computation of MInt can be done in constant
time per edge if we know Int and the size of each com-
ponent. Maintaining Int for a component can be done in
constant time for each merge, as the maximum weight
edge in the MST of a component is simply the edge
causing the merge. This is because Lemma 1 implies
that the edge causing the merge is the minimum weight
edge between the two components being merged. The
size of a component after a merge is simply the sum of
the sizes of the two components being merged.
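As an illustration of the bookkeeping just described (again a sketch with invented names, not the authors' code), a disjoint-set forest with union by rank and path compression can carry |C| and Int(C) alongside each component root:

```python
class ComponentForest:
    """Disjoint-set forest storing, for each component root, its size |C|
    and its internal difference Int(C)."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.size = [1] * n
        self.internal = [0.0] * n

    def find(self, v):
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:          # path compression
            self.parent[v], v = root, self.parent[v]
        return root

    def union(self, a, b, w):
        """Merge the components rooted at a and b via an edge of weight w;
        returns the new root."""
        if self.rank[a] < self.rank[b]:        # union by rank: attach shorter tree
            a, b = b, a
        self.parent[b] = a
        if self.rank[a] == self.rank[b]:
            self.rank[a] += 1
        self.size[a] += self.size[b]
        # By Lemma 1 the merging edge is the minimum weight edge between the
        # two components, so it becomes the largest MST edge: Int is simply w.
        self.internal[a] = w
        return a
```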
5. Results for Grid Graphs
First we consider the case of monochrome (intensity)
images. Color images are handled as three separate
monochrome images, as discussed below. As in other
graph-based approaches to image segmentation (e.g.,
Shi and Malik, 1997; Wu and Leahy, 1993; Zahn, 1971)
we define an undirected graph G = (V, E), where each
image pixel p_i has a corresponding vertex v_i ∈ V. The
edge set E is constructed by connecting pairs of pixels
that are neighbors in an 8-connected sense (any other
local neighborhood could be used). This yields a graph
where m = O(n), so the running time of the segmenta-
tion algorithm is O(n log n) for n image pixels. We use
an edge weight function based on the absolute intensity
difference between the pixels connected by an edge,
w(v_i, v_j) = |I(p_i) - I(p_j)|

where I(p_i) is the intensity of the pixel p_i. In general we
use a Gaussian filter to smooth the image slightly before
computing the edge weights, in order to compensate for
digitization artifacts. We always use a Gaussian with
σ = 0.8, which does not produce any visible change
to the image but helps remove artifacts.
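A sketch of this grid-graph construction is given below; it assumes NumPy and SciPy are available (the paper itself does not name a library), uses scipy.ndimage.gaussian_filter for the σ = 0.8 smoothing, and the function name is invented for the example.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def grid_graph_edges(image, sigma=0.8):
    """8-connected grid edges with weights |I(p_i) - I(p_j)| computed on a
    slightly smoothed copy of a monochrome image (2-D array)."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    h, w = smoothed.shape
    offsets = [(0, 1), (1, 0), (1, 1), (1, -1)]   # right, down, and the two diagonals
    edges = []
    for y in range(h):
        for x in range(w):
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    weight = abs(smoothed[y, x] - smoothed[ny, nx])
                    edges.append((weight, y * w + x, ny * w + nx))
    return edges   # m = O(n) edges for n pixels
```

These edges can then be handed directly to a sorting-based procedure such as the sketch in Section 4.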
For color images we run the algorithm three times,
once for each of the red, green and blue color planes,
and then intersect these three sets of components.
Specifically, we put two neighboring pixels in the same
component when they appear in the same component in
all three of the color plane segmentations. Alternatively
one could run the algorithm just once on a graph where
the edge weights measure the distance between pixels
in some color space, however experimentally we ob-
tained better results by intersecting the segmentations
for each color plane in the manner just described.
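One possible reading of this intersection step is sketched below (hypothetical helper name; 4-connected adjacency is used for brevity even though the per-plane graphs are 8-connected): two neighboring pixels end up in the same final component only when their labels agree in all three color-plane segmentations.

```python
def intersect_segmentations(h, w, labels_r, labels_g, labels_b):
    """Each labels_* is a flat list mapping pixel index -> component label
    for one color plane; returns a combined labeling of the h*w pixels."""
    parent = list(range(h * w))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def same_everywhere(p, q):
        return (labels_r[p] == labels_r[q] and
                labels_g[p] == labels_g[q] and
                labels_b[p] == labels_b[q])

    for y in range(h):
        for x in range(w):
            p = y * w + x
            if x + 1 < w and same_everywhere(p, p + 1):    # right neighbor
                parent[find(p)] = find(p + 1)
            if y + 1 < h and same_everywhere(p, p + w):    # bottom neighbor
                parent[find(p)] = find(p + w)
    return [find(v) for v in range(h * w)]
```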
There is one runtime parameter for the algorithm,
which is the value of k that is used to compute
the threshold function τ. Recall we use the function
τ(C) = k/|C| where |C| is the number of elements
in C. Thus k effectively sets a scale of observation, in
that a larger k causes a preference for larger compo-
nents. We use two different parameter settings for the
examples in this section (and throughout the paper),
depending on the resolution of the image and the de-
gree to which fine detail is important in the scene. For
instance, in the 128 × 128 images of the COIL database
of objects we use k = 150. In the 320 × 240 or larger
images, such as the street scene and the baseball player,
we use k = 300.
The first image in Fig. 2 shows a street scene. Note
that there is considerable variation in the grassy slope
leading up to the fence. It is this kind of variability that
our algorithm is designed to handle (recall the high
variability region in the synthetic example in Fig. 1).
The second image shows the segmentation, where each
region is assigned a random color. The six largest com-
ponents found by the algorithm are: three of the grassy
areas behind the fence, the grassy slope, the van, and
the roadway. The missing part of the roadway at the
lower left is a visibly distinct region in the color image
from which this segmentation was computed (a spot
due to an imaging artifact). Note that the van is also
not uniform in color, due to specular reflections, but
these are diffuse enough that they are treated as inter-
nal variation and incorporated into a single region.
The first image in Fig. 3 shows two baseball players
(from Shi and Malik, 1997). As in the previous exam-
ple, there is a grassy region with considerable variation.
The uniforms of the players also have substantial vari-
ation due to folds in the cloth. The second image shows
the segmentation. The six largest components found by
the algorithm are: the back wall, the Mets emblem, a
large grassy region (including part of the wall under
the top player), each of the two players' uniforms, and
a small grassy patch under the second player. The large
grassy region includes part of the wall due to the rela-
tively high variation in the region, and the fact that there
is a long slow change in intensity (not strong evidence
for a boundary) between the grass and the wall. This
boundary is similar in magnitude to those within the
player uniforms due to folds in the cloth.
Figure 4 shows the results of the algorithm for an
image of an indoor scene, where both fine detail and
larger structures are perceptually important. Note that
the segmentation preserves small regions such as the
name tags the people are wearing and things behind
the windows, while creating single larger regions for
high variability areas such as the air conditioning duct
near the top of the image, the clothing and the fur-
niture. This image also shows that sometimes small
boundary regions are found, for example at the edge
of the jacket or shirt. Such narrow regions occur be-
cause there is a one or two pixel wide area that is
halfway between the two neighboring regions in color
and intensity. This is common in any segmentation
method based on grid graphs. Such regions can be elim-
inated if desired, by removing long thin regions whose
color or intensity is close to the average of neighboring
regions.
Figure 5 shows three simple objects from the
Columbia COIL image database. Shown for each is the
largest region found by our algorithm that is not part
of the black background. Note that each of these ob-
jects has a substantial intensity gradient across the face
of the object, yet the regions are correctly segmented.
This illustrates another situation that the algorithm was
designed to handle, slow changes in intensity due to
lighting.
6. Results for Nearest Neighbor Graphs
One common approach to image segmentation is based
on mapping each pixel to a point in some feature
space, and then finding clusters of similar points (e.g.,
Comaniciu and Meer, 1997, 1999; Jain and Dubes,
1988). In this section we investigate using the graph-
based segmentation algorithm from Section 4 in order
to find such clusters of similar points. In this case, the
graph G = (V, E) has a vertex corresponding to each
feature point (each pixel) and there is an edge (v_i, v_j)
connecting pairs of feature points v_i and v_j that are
nearby in the feature space, rather than using neighbor-
ing pixels in the image grid. There are several possible
ways of determining which feature points to connect
by edges. We connect each point to a fixed number of
nearest neighbors. Another possibility is to use all the
neighbors within some fixed distance d. In any event,
it is desirable to avoid considering all O(n^2) pairs of
feature points.
The weight w(v_i, v_j) of an edge is the distance be-
tween the two corresponding points in feature space.
For the experiments shown here we map each pixel
to the feature point (x, y, r, g, b), where (x, y) is the
location of the pixel in the image and (r, g, b) is the
color value of the pixel. We use the L_2 (Euclidean)
distance between points as the edge weights, although
other distance functions are possible.
The internal difference measure, Int(C), has a rela-
tively simple underlying intuition for points in feature
space. It specifies the minimum radius of dilation
necessary to connect the set of feature points con-
tained in C together into a single volume in feature
space. Consider replacing each feature point by a ball
with radius r. From the definition of the MST it can
be seen that the union of these balls will form one
Figure 2. A street scene (320 × 240 color image), and the segmentation results produced by our algorithm (σ = 0.8, k = 300).
Figure 3. A baseball scene (432 × 294 grey image), and the segmentation results produced by our algorithm (σ = 0.8, k = 300).
Figure 4. An indoor scene (image 320 × 240, color), and the segmentation results produced by our algorithm (σ = 0.8, k = 300).
single connected volume only when r ≥ Int(C)/2.
The difference between components, Dif(C_1, C_2), also
has a simple underlying intuition. It specifies the min-
imum radius of dilation necessary to connect at least
one point of C_1 to a point of C_2. Our segmentation tech-
. Our segmentation tech-
nique is thus closely related to the work of Comaniciu
and Meer (1999), which similarly takes an approach
to clustering based on dilating points in a parameter
space (however they first use a novel transformation
of the data that we do not perform, and then use a
fixed dilation radius rather than the variable one that we
use).
Rather than constructing the complete graph, where
all points are neighbors of one another, we find a
small fixed number of neighbors for each point. This
results in a graph with O(n) edges for n image
Figure 5. Three images from the COIL database, and the largest
non-background component found in each image (128 × 128 color
images; algorithm parameters σ = 0.8, k = 150).
pixels, and an overall running time of the segmenta-
tion method of O(n log n) time. There are many pos-
sible ways of picking a small fixed number of neigh-
bors for each point. We use the ANN algorithm (Arya
and Mount, 1993) to find the nearest neighbors for
each point. This algorithm is quite fast in practice,
given a 5-dimensional feature space with several hun-
dred thousand points. The ANN method also allows
the finding of approximate nearest neighbors, which
runs more quickly than finding the actual nearest neigh-
bors. For the examples reported here we use ten near-
est neighbors of each pixel to generate the edges of the
graph.
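The construction of this nearest neighbor graph can be sketched as follows; the paper uses the ANN library of Arya and Mount (1993), whereas this illustration substitutes SciPy's cKDTree and an invented function name, and it omits any feature scaling the original experiments may have applied.

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbor_edges(image_rgb, num_neighbors=10):
    """Connect each pixel to its nearest neighbors in (x, y, r, g, b) space,
    using the L2 distance between feature points as the edge weight."""
    h, w, _ = image_rgb.shape
    ys, xs = np.mgrid[0:h, 0:w]
    features = np.column_stack([xs.ravel(), ys.ravel(),
                                image_rgb.reshape(-1, 3)]).astype(float)
    tree = cKDTree(features)
    # query returns each point as its own nearest neighbor, so ask for one extra
    dists, idxs = tree.query(features, k=num_neighbors + 1)
    edges = []
    for p in range(features.shape[0]):
        for d, q in zip(dists[p, 1:], idxs[p, 1:]):
            edges.append((float(d), p, int(q)))
    return edges   # O(n) edges for n pixels
```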
One of the key differences from the previous section,
where the image grid was used to define the graph,
is that the nearest neighbors in feature space capture
more spatially non-local properties of the image. In the
grid-graph case, all of the neighbors in the graph are
neighbors in the image. Here, points can be far apart
in the image and still be among a handful of nearest
Figure 6. A synthetic image (40 × 32 grey image) and the segmentation using the nearest neighbor graph (σ = 0, k = 150).
neighbors (if their color is highly similar and interven-
ing image pixels are of dissimilar color). For instance,
this can result in segmentations with regions that are dis-
connected in the image, which did not happen in the
grid-graph case.
Figure 6 shows a synthetic image from Perona and
Freeman (1998) and Gdalyahu et al. (1999) and its
segmentation, using k = 150 and with no smoothing
(σ = 0). In this example the spatially disconnected re-
gions do not reect interesting scene structures, but we
will see examples below which do.
For the remaining examples in this section, we use
k = 300 and σ = 0.8, as in the previous section.
First, we note that the nearest neighbor graph produces
similar results to the grid graph for images in which
the perceptually salient regions are spatially connected.
For instance, the street scene and baseball player scene
considered in the previous section yield very similar
segmentations using either the nearest neighbor graph
or the grid graph, as can be seen by comparing the
results in Fig. 7 with those in Figs. 2 and 3.
Figure 8 shows two additional examples using the
nearest neighbor graph. These results are not possible
to achieve with the grid graph approach because certain
interesting regions are not spatially connected. The first
example shows a flower garden, where the red flowers
are spatially disjoint in the foreground of the image, and
then merge together in the background. Most of these
flowers are merged into a single region, which would
not be possible with the grid-graph method. The sec-
ond example in Fig. 8 shows the Eiffel tower at night.
The bright yellow light forms a spatially disconnected
region. These examples show that the segmentation
method, coupled with the use of a nearest neighbor
graph, can capture very high level properties of im-
ages, while preserving perceptually important region
boundaries.
Figure 7. Segmentation of the street and baseball player scenes from the previous section, using the nearest neighbor graph rather than the
grid graph (σ = 0.8, k = 300).
Figure 8. Segmentation using the nearest neighbor graph can capture spatially non-local regions (σ = 0.8, k = 300).
7. Summary and Conclusions
In this paper we have introduced a new method for
image segmentation based on pairwise region compar-
ison. We have shown that the notions of a segmentation
being too coarse or too fine can be defined in terms of a
function which measures the evidence for a boundary
between pairs of regions. Our segmentation algorithm
makes simple greedy decisions, and yet produces seg-
mentations that obey the global properties of being not
too coarse and not too fine using a particular region
comparison function. The method runs in O(m log m)
time for m graph edges and is also fast in practice,
generally running in a fraction of a second.
The pairwise region comparison predicate we use
considers the minimum weight edge between two re-
gions in measuring the difference between them. Thus
our algorithm will merge two regions even if there is
a single low weight edge between them. This is not
as much of a problem as it might first appear, in part
because this edge weight is compared only to the min-
imum spanning tree edges of each component. For in-
stance, the examples considered in Sections 5 and 6
illustrate that the method finds segmentations that cap-
ture many perceptually important aspects of complex
imagery. Nonetheless, one can envision measures that
require more than a single cheap connection before de-
ciding that there is no evidence for a boundary between
two regions. One natural way of addressing this issue is
to use a quantile rather than the minimum edge weight.
However, in this case finding a segmentation that is nei-
ther too coarse nor too fine is an NP-hard problem (as
shown in the Appendix). Our algorithm is unique, in
that it is both highly efficient and yet captures non-local
properties of images.
We have illustrated our image segmentation algo-
rithm with two different kinds of graphs. The first of
these uses the image grid to define a local neighborhood
between image pixels, and measures the difference in
intensity (or color) between each pair of neighbors.
The second of these maps the image pixels to points in
a feature space that combines the (x, y) location and
(r, g, b) color value. Edges in the graph connect points
that are close together in this feature space. The algo-
rithm yields good results using both kinds of graphs,
but the latter type of graph captures more perceptually
global aspects of the image.
Image segmentation remains a challenging problem,
however we are beginning to make substantial progress
through the introduction of graph-based algorithms
that both help refine our understanding of the problem
and provide useful computational tools. The work re-
ported here and the normalized cuts approach (Shi and
Malik, 1997) are just a few illustrations of these recent
advances.
Appendix: NP-Hardness of D with Quantiles
Intuitively the region comparison predicate D de-
fined in Section 3.1 could be made more robust by
changing the definition of the difference between two
regions to reflect a quantile rather than the mini-
mum weight edge between them. We show that with
this modification the problem of finding a segmenta-
tion that is neither too coarse nor too fine becomes
NP-hard.
The only difference between the new problem and
the old one is the definition of the difference between
two regions C_1, C_2 ∈ S in Eq. (2), which becomes

Dif(C_1, C_2) = Kth{ w(v_i, v_j) }   (6)

where Kth selects the Kth quantile of its arguments
(K should be between zero and one). For example
with K = 0.5 the difference becomes the median edge
weight between the two components. This quantile is
computed over all edges (v_i, v_j) such that v_i ∈ C_1 and
v_j ∈ C_2.
We reduce the min ratio cut problem with uniform
capacities and demands to the problem of finding a
good segmentation. The min ratio cut problem with
uniform capacities and demands is the following: we
are given a graph G = (V, E) and another set of edges
F. Each edge in E indicates a unit capacity between
a pair of nodes and each edge in F indicates a unit
demand between a pair of nodes. The value of a cut
(A, B) is the ratio of the total capacity and the total
demand between the sets A and B. So it is the ratio of
the number of edges in E crossing the cut and the number
of edges in F crossing the cut. Finding the value of the
minimum ratio cut is an NP-hard problem (cf. Ausiello
et al., to appear).
First we show how to transform an instance of this
problem to one where the sets E and F are disjoint,
without modifying the value of the minimum cut. For
every edge (a, b) ∈ E ∩ F we create a new node ab,
and exchange the edge in E with the edges (a, ab) and
(b, ab). For a cut with a and b in the same side, it is
always better to keep ab in that side too and the value
of the cut is the same as in the original graph. For a
cut with a and b in different sides the node ab can be
in either side and there will be one capacity and one
demand edge crossing the cut and the value of the cut is
again the same as in the original graph.
Now we show how to decide if the modified instance
of the min ratio cut problem has a cut with value at
most v by solving a segmentation problem. Let c be
the number of edges from E crossing a cut (A, B) and
similarly d is the number of edges from F crossing
(A, B). It is easy to show that the cut value is small
exactly when the fraction of edges crossing the cut that
come from F is large,

c/d ≤ v  ⇔  d/(c + d) ≥ 1/(v + 1)   (7)
Define G' = (V, E') where E' = E ∪ F. We let the
edges from E have weight zero and the edges from F
have weight one.
Lemma 2. The graph G has a cut with value at most
v if and only if a segmentation of G' is not one single
component, where Dif is defined in Eq. (6), K = 1 −
1/(v + 1) and τ(C) = 0 for all C.
Proof: First we show that if G has a cut (A, B) with
value at most v there exists C ⊆ A such that the seg-
mentation {C, C̄} is not too fine. We just need to find C
such that Int(C) = 0 and Dif(C, C̄) = 1. If G has a cut
(A, B) with value at most v, then Eq. (7) tells us that
d/(c + d) ≥ 1/(v + 1). Remember that there are d edges
of weight one and c edges of weight zero crossing the
cut. So the fraction of weight one edges crossing the
cut is at least 1/(v + 1). Look at the connected com-
ponents of A using only edges of weight zero. Clearly
Int(C) = 0 for all such components. Let C be the com-
ponent with the largest fraction of weight one edges going
to B. This fraction must be at least 1/(v + 1). More-
over, the fraction of weight one edges between C and
C̄ = V − C is at least as large since C̄ = B ∪ (C̄ ∩ A)
and there are only weight one edges between C and
C̄ ∩ A. This implies the fraction of weight zero edges
between C and C̄ is less than 1 − 1/(v + 1) = K. So
the Kth quantile weight between C and C̄ is one. Thus
Dif(C, C̄) = 1 and the segmentation S = {C, C̄} of
G' is not too fine. Hence the segmentation of G' as a
single component is too coarse.
If G' has a segmentation that is not a single
component, S = {C_1, ..., C_l}, then the Kth quantile
edge weight between every pair of components C_i and
C_j is one (or else the segmentation would be too fine).
Thus the Kth quantile edge weight between C_1 and
C̄_1 = C_2 ∪ ··· ∪ C_l is one. So the fraction of weight
one edges between C_1 and C̄_1 is at least 1/(v + 1).
Equation (7) implies that the value of the cut (C_1, C̄_1) is
at most v.
It is straightforward to see that the transformation of
the min ratio cut problem to the problem of finding a
segmentation presented above can be done in polyno-
mial time. This is sufficient to show the hardness of the
segmentation problem.
Theorem 4. The problem of finding a segmentation
that is neither too coarse nor too fine using Dif as
defined in Eq. (6) is NP-hard.
Acknowledgments
This work was supported in part by gifts from Intel,
Microsoft and Xerox corporations, in part by DARPA
under contract DAAL01-97-K-0104, and in part by
NSF Research Infrastructure award CDA-9703470.
We would like to thank Shree Nayar, Jianbo Shi and
Daphna Weinshall for use of their images. We would
also like to thank Jon Kleinberg, Eva Tardos and Dan
Ramras for discussions about the algorithm and the NP
hardness result.
References
Arya, S. and Mount, D.M. 1993. Approximate nearest neighbor
searching. In Proc. 4th Annual ACM-SIAM Symposium on Dis-
crete Algorithms, pp. 271–280.
Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti
Spaccamela, A., and Protasi, M. (to appear). Complexity and Ap-
proximation. Combinatorial Optimization Problems and their Ap-
proximability Properties. Springer-Verlag: Berlin.
Comaniciu, D. and Meer, P. 1997. Robust analysis of feature
spaces: Color image segmentation. In Proceedings of IEEE Con-
ference on Computer Vision and Pattern Recognition, pp. 750–755.
Comaniciu, D. and Meer, P. 1999. Mean shift analysis and appli-
cations. In Proceedings of IEEE Conference on Computer Vision
and Pattern Recognition, pp. 1197–1203.
Cooper, M.C. 1998. The tractability of segmentation and scene anal-
ysis. International Journal of Computer Vision, 30(1):27–42.
Cormen, T.H., Leiserson, C.E., and Rivest, R.L. 1990. Introduction
to Algorithms. The MIT Press: McGraw-Hill Book Company.
Felzenszwalb, P. and Huttenlocher, D. 1998. Image segmentation us-
ing local variation. In Proceedings of IEEE Conference on Com-
puter Vision and Pattern Recognition, pp. 98–104.
Gdalyahu, Y., Weinshall, D., and Werman, M. 1999. Stochastic clus-
tering by typical cuts. In Proceedings of IEEE Conference on Com-
puter Vision and Pattern Recognition, pp. 2596–2601.
Jain, A.K. and Dubes, R.C. 1988. Algorithms for Clustering Data.
Prentice Hall.
Jermyn, I. and Ishikawa, H. 2001. Globally optimal regions and
boundaries as minimum ratio weight cycles. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, 23:1075–1088.
Pavlidas, T. 1977. Structural Pattern Recognition. Springer-Verlag.
Perona, P. and Freeman, W. 1998. A factorization approach to group-
ing. In Proceedings of the European Conference on Computer
Vision, pp. 655–670.
Ratan, A.L., Maron, O., Grimson, W.E.L., and Lozano-Perez, T.
1999. A framework for learning query concepts in image clas-
sification. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 423–431.
Shi, J. and Malik, J. 1997. Normalized cuts and image segmentation.
In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 731–737.
Urquhart, R. 1982. Graph theoretical clustering based on limited
neighborhood sets. Pattern Recognition, 15(3):173–187.
Weiss, Y. 1999. Segmentation using eigenvectors: A unifying view. In
Proceedings of the International Conference on Computer Vision,
2:975–982.
Wertheimer, M. 1938. Laws of organization in perceptual forms (par-
tial translation). In A Sourcebook of Gestalt Psychology. W.B. Ellis
(Ed.). Harcourt, Brace and Company, pp. 71–88.
Wu, Z. and Leahy, R. 1993. An optimal graph theoretic approach to
data clustering: Theory and its application to image segmentation.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
11:1101–1113.
Zahn, C.T. 1971. Graph-theoretic methods for detecting and describ-
ing gestalt clusters. IEEE Transactions on Computing, 20:68–86.