What Energy Functions Can Be Minimized Via Graph Cuts?

Vladimir Kolmogorov and Ramin Zabih
Computer Science Department, Cornell University, Ithaca, NY 14853
[email protected], [email protected]
Abstract. Many problems in computer vision can be naturally phrased
in terms of energy minimization. In the last few years researchers have
developed a powerful class of energy minimization methods based on
graph cuts. These techniques construct a specialized graph, such that the
minimum cut on the graph also minimizes the energy. The minimum
cut in turn is efficiently computed by max-flow algorithms. Such
methods have been successfully applied to a number of important vision
problems, including image restoration, motion, stereo, voxel occupancy
and medical imaging. However, each graph construction to date has been
highly specific for a particular energy function. In this paper we address
a much broader problem, by characterizing the class of energy functions
that can be minimized by graph cuts, and by giving a general-purpose
construction that minimizes any energy function in this class. Our results
generalize several previous vision algorithms based on graph cuts, and
also show how to minimize an interesting new class of energy functions.
1 Introduction and summary of results
Many of the problems that arise in early vision can be naturally expressed in
terms of energy minimization. The computational task of minimizing the energy
is usually quite difficult, as it generally requires minimizing a non-convex
function in a space with thousands of dimensions. If the functions have a special
form they can be solved efficiently using dynamic programming [2]. However,
researchers typically use general purpose global optimization techniques such as
simulated annealing [3, 11], which is extremely slow in practice.
In the last few years, however, researchers have developed a new approach
based on graph cuts. The basic technique is to construct a specialized graph
for the energy function to be minimized, such that the minimum cut on the
graph in turn minimizes the energy. The minimum cut in turn can be computed
very efficiently by max-flow algorithms. These methods have been successfully
used for a wide variety of vision problems including image restoration [7, 8, 16,
13], stereo and motion [4, 7, 8, 15, 18, 21, 22], voxel occupancy [24] and medical
imaging [6, 5, 17]. The output of these algorithms is generally a solution with
some interesting theoretical quality guarantee. In some cases [7, 15, 16, 13, 21] it
is the global minimum, in other cases a local minimum in a strong sense [8] that is
within a known factor of the global minimum. The experimental results produced
by these algorithms are also quite good; for example, two recent evaluations of
stereo algorithms using real imagery with ground truth found that a graph cut
method gave the best overall performance [23, 25].
Minimizing an energy function via graph cuts, however, remains a technically
difficult problem. Each paper constructs its own graph specifically for its
individual energy function, and in some of these cases (especially [18, 8]) the
construction is fairly complex. One consequence is that researchers sometimes
use heuristic methods for optimization, even in situations where the exact global
minimum can be computed via graph cuts [14, 20, 9]. The goal of this paper is
to precisely characterize the class of energy functions that can be minimized via
graph cuts, and to give a general-purpose graph construction that minimizes any
energy function in this class. Our results provide a signicant generalization of
the energy minimization methods used in [46, 8, 13, 17, 24], and show how to
minimize an interesting new class of energy functions.
1.1 Summary of our results
In this paper we consider two classes of energy functions. Let $x_1, \ldots, x_n$, with $x_i \in \{0, 1\}$, be a set of binary-valued variables. We define the class $\mathcal{F}^2$ to be functions of the form
$$E(x_1, \ldots, x_n) = \sum_i E^i(x_i) + \sum_{i<j} E^{i,j}(x_i, x_j). \tag{1}$$
We define the class $\mathcal{F}^3$ to be functions of the form
$$E(x_1, \ldots, x_n) = \sum_i E^i(x_i) + \sum_{i<j} E^{i,j}(x_i, x_j) + \sum_{i<j<k} E^{i,j,k}(x_i, x_j, x_k). \tag{2}$$
Obviously, the class $\mathcal{F}^2$ is a strict subset of the class $\mathcal{F}^3$.
The main result in this paper is a precise characterization of the functions in
$\mathcal{F}^3$ that can be minimized using graph cuts, together with a graph construction
for minimizing such functions. Moreover, we give a necessary condition for all
other classes which must be met for a function to be minimized via graph cuts.
Note that in this paper we only consider binary-valued variables. Most of the
previous work with graph cuts cited above considers energy functions that involve
variables with more than 2 possible values. For example, the work on stereo, mo-
tion and image restoration described in [8] addresses the standard pixel-labeling
problem in early vision. In these pixel-labeling problems, the variables represent
individual pixels, and the possible values for an individual variable represent its
possible displacements or intensities. However, many of the graph cut methods
that handle multiple possible values actually consider a pair of labels at a time.
As a consequence, even though we only address binary-valued variables, our
results generalize the algorithms given in [4–6, 8, 13, 17, 24]. As an example, we will
show in section 4.1 how to use our results to solve the pixel-labeling problem,
even though the pixels have many possible labels.
We also identify an interesting class of energy functions that have not
yet been minimized using graph cuts. All of the previous work with graph cuts
involves a neighborhood system that is defined on pairs of pixels. In the language
of Markov Random Fields [11, 19], these methods consider first-order MRFs.
The associated energy functions lie in $\mathcal{F}^2$. Our results allow for the minimization
of energy functions in the larger class $\mathcal{F}^3$, and thus for neighborhood systems
involving triples of pixels.
1.2 Organization
The rest of the paper is organized as follows. In section 2 we give an overview of
graph cuts. In section 3 we formalize the problem that we want to solve. Section 4
contains our main theorem for the class of functions $\mathcal{F}^2$ and shows how it can be
used. Section 5 contains our main theorems for other classes. Detailed proofs of
our theorems, together with the graph constructions, are deferred to section 6.
A summary of the actual graph constructions is given in the appendix.
2 Overview of graph cuts
Suppose $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a directed graph with two special vertices (terminals),
namely the source $s$ and the sink $t$. An $s$-$t$-cut (or just a cut, as we will refer to
it later) $C = \{S, T\}$ is a partition of the vertices $\mathcal{V}$ into two disjoint sets $S$ and $T$,
such that $s \in S$ and $t \in T$. The cost of the cut is the sum of the costs of
all edges that go from $S$ to $T$:
$$c(S, T) = \sum_{u \in S,\, v \in T,\, (u,v) \in \mathcal{E}} c(u, v).$$
The minimum $s$-$t$-cut problem is to find a cut $C$ with the smallest cost. Due
to the theorem of Ford and Fulkerson [10], this is equivalent to computing the
maximum flow from the source to the sink. There are many algorithms which solve
this problem in polynomial time with small constants [1, 12].
It is convenient to denote a cut $C = \{S, T\}$ by a labeling $f$ mapping from the
set of nodes $\mathcal{V} \setminus \{s, t\}$ to $\{0, 1\}$, where $f(v) = 0$ means that $v \in S$, and
$f(v) = 1$ means that $v \in T$. We will use this notation later.
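To make the min-cut/max-flow equivalence concrete, here is a minimal stand-alone sketch (our own illustration, not from the paper) that computes the minimum $s$-$t$-cut cost via the Edmonds-Karp max-flow algorithm; the dict-of-dicts graph encoding is an assumption of this sketch.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: the max-flow value equals the minimum s-t-cut cost
    (Ford-Fulkerson theorem). `capacity[u][v]` is the capacity of edge (u, v)."""
    # Residual graph containing both edge directions.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left: flow is maximum
        # Recover the path and its bottleneck capacity.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        # Push flow: decrease forward residuals, increase reverse ones.
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

# The cheapest way to disconnect s from t here severs edges of total cost 4.
cap = {'s': {'a': 3, 'b': 2}, 'a': {'t': 2}, 'b': {'t': 3}, 't': {}}
print(max_flow(cap, 's', 't'))  # → 4
```

In the graph constructions below, only the cut cost matters; any polynomial max-flow routine can stand in for this one.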
3 Defining graph representability
Let us consider a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with terminals $s$ and $t$, so $\mathcal{V} = \{v_1, \ldots, v_n, s, t\}$.
Each cut on $\mathcal{G}$ has some cost; therefore, $\mathcal{G}$ represents the energy function mapping
from all cuts on $\mathcal{G}$ to the set of nonnegative real numbers. Any cut can be
described by $n$ binary variables $x_1, \ldots, x_n$ corresponding to nodes in $\mathcal{G}$ (excluding
the source and the sink): $x_i = 0$ when $v_i \in S$, and $x_i = 1$ when $v_i \in T$.
Therefore, the energy $E$ that $\mathcal{G}$ represents can be viewed as a function of $n$
binary variables: $E(x_1, \ldots, x_n)$ is equal to the cost of the cut defined by the
configuration $x_1, \ldots, x_n$ ($x_i \in \{0, 1\}$).
We can efficiently minimize $E$ by computing the minimum $s$-$t$-cut on $\mathcal{G}$. Thus
a natural question to ask is: what is the class of energy functions for which we
can construct a graph that represents it?
We can also generalize our construction. Above we used each node (except the
source and the sink) for encoding one binary variable. Instead we can specify a
subset $\mathcal{V}_0 = \{v_1, \ldots, v_k\} \subset \mathcal{V} \setminus \{s, t\}$ and introduce variables only for the nodes
in this set. Then there may be several cuts corresponding to a configuration
$x_1, \ldots, x_k$. If we define the energy $E(x_1, \ldots, x_k)$ as the minimum among the costs
of all such cuts, then the minimum $s$-$t$-cut on $\mathcal{G}$ will again yield the configuration
which minimizes $E$.
Finally, note that the configuration that minimizes $E$ will not change if we
add a constant to $E$.
We summarize the graph constructions that we allow in the following definition.
Definition 1. A function $E$ of $n$ binary variables is called graph-representable
if there exists a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with terminals $s$ and $t$ and a subset of nodes
$\mathcal{V}_0 = \{v_1, \ldots, v_n\} \subset \mathcal{V} \setminus \{s, t\}$ such that for any configuration $x_1, \ldots, x_n$ the value
of the energy $E(x_1, \ldots, x_n)$ is equal to a constant plus the cost of the minimum
$s$-$t$-cut among all cuts $C = \{S, T\}$ in which $v_i \in S$ if $x_i = 0$, and $v_i \in T$ if $x_i = 1$
($1 \le i \le n$). We say that $E$ is exactly represented by $\mathcal{G}, \mathcal{V}_0$ if this constant is
zero.
The following lemma is an obvious consequence of this definition.
Lemma 2. Suppose the energy function $E$ is graph-representable by a graph $\mathcal{G}$
and a subset $\mathcal{V}_0$. Then it is possible to find the exact minimum of $E$ in polynomial
time by computing the minimum $s$-$t$-cut on $\mathcal{G}$.
In this paper we will give a complete characterization of the classes $\mathcal{F}^2$ and
$\mathcal{F}^3$ in terms of graph representability, and show how to construct graphs for
minimizing graph-representable energies within these classes. Moreover, we will
give a necessary condition for all other classes which must be met for a function
to be graph-representable. Note that it would suffice to consider only the class
$\mathcal{F}^3$ since $\mathcal{F}^2 \subset \mathcal{F}^3$. However, the condition for $\mathcal{F}^2$ is simpler, so we will consider
it separately.
4 The class $\mathcal{F}^2$
Our main result for the class $\mathcal{F}^2$ is the following theorem.
Theorem 3. Let $E$ be a function of $n$ binary variables from the class $\mathcal{F}^2$, i.e.,
it can be written as the sum
$$E(x_1, \ldots, x_n) = \sum_i E^i(x_i) + \sum_{i<j} E^{i,j}(x_i, x_j).$$
Then $E$ is graph-representable if and only if each term $E^{i,j}$ satisfies the inequality
$$E^{i,j}(0, 0) + E^{i,j}(1, 1) \le E^{i,j}(0, 1) + E^{i,j}(1, 0).$$
4.1 Example: pixel-labeling via expansion moves
In this section we show how to apply this theorem to solve the pixel-labeling
problem. In this problem, we are given a set of pixels $\mathcal{P}$ and a set of labels $\mathcal{L}$.
The goal is to find a labeling $l$ (i.e., a mapping from the set of pixels to the set
of labels) which minimizes the energy
$$E(l) = \sum_{p \in \mathcal{P}} D_p(l_p) + \sum_{(p,q) \in \mathcal{N}} V_{p,q}(l_p, l_q),$$
where $\mathcal{N} \subset \mathcal{P} \times \mathcal{P}$ is a neighborhood system on pixels. Without loss of generality
we can assume that $\mathcal{N}$ contains only ordered pairs $(p, q)$ for which $p < q$ (since
we can combine two terms $V_{p,q}$ and $V_{q,p}$ into one term). We will show how our
method can be used to derive the expansion move algorithm developed in [8].
This problem is NP-hard if $|\mathcal{L}| > 2$ [8]. [8] gives an approximation algorithm
for minimizing this energy. A single step of this algorithm is an operation called
an $\alpha$-expansion. Suppose that we have some current configuration $l^0$, and we
are considering a label $\alpha \in \mathcal{L}$. During the $\alpha$-expansion operation a pixel $p$ is
allowed either to keep its old label $l^0_p$ or to switch to the new label $\alpha$: $l_p = l^0_p$ or
$l_p = \alpha$. The key step in the approximation algorithm presented in [8] is to find
the optimal expansion operation, i.e., the one that leads to the largest reduction
in the energy $E$. This step is repeated until there is no choice of $\alpha$ for which the
optimal expansion operation reduces the energy.
[8] constructs a graph which contains nodes corresponding to pixels in $\mathcal{P}$.
The following encoding is used: if $f(p) = 0$ (i.e., the node $p$ is in the source set)
then $l_p = l^0_p$; if $f(p) = 1$ (i.e., the node $p$ is in the sink set) then $l_p = \alpha$.
Note that the key technical step in this algorithm can be naturally expressed
as minimizing an energy function involving binary variables. The binary variables
correspond to pixels, and the energy we wish to minimize can be written formally
as
$$E(x_{p_1}, \ldots, x_{p_n}) = \sum_{p \in \mathcal{P}} D_p(l_p(x_p)) + \sum_{(p,q) \in \mathcal{N}} V_{p,q}(l_p(x_p), l_q(x_q)), \tag{3}$$
where for every pixel $p \in \mathcal{P}$
$$l_p(x_p) = \begin{cases} l^0_p, & x_p = 0 \\ \alpha, & x_p = 1. \end{cases}$$
We can demonstrate the power of our results by deriving an important restriction
on this algorithm. In order for the graph cut construction of [8] to
work, the function $V_{p,q}$ is required to be a metric. In their paper, it is not clear
whether this is an accidental property of the construction (i.e., they leave open
the possibility that a more clever graph cut construction may overcome this
restriction).
Using our results, we can easily show this is not the case. Specifically, by
theorem 3, the energy function given in equation 3 is graph-representable if and
only if each term $V_{p,q}$ satisfies the inequality
$$V_{p,q}(l_p(0), l_q(0)) + V_{p,q}(l_p(1), l_q(1)) \le V_{p,q}(l_p(0), l_q(1)) + V_{p,q}(l_p(1), l_q(0))$$
or
$$V_{p,q}(\beta, \gamma) + V_{p,q}(\alpha, \alpha) \le V_{p,q}(\beta, \alpha) + V_{p,q}(\alpha, \gamma),$$
where $\beta = l^0_p$, $\gamma = l^0_q$. If $V_{p,q}(\alpha, \alpha) = 0$, then this is the triangle inequality:
$$V_{p,q}(\beta, \gamma) \le V_{p,q}(\beta, \alpha) + V_{p,q}(\alpha, \gamma).$$
This is exactly the constraint on $V_{p,q}$ that was given in [8].
5 More general classes of energy functions
We begin with several definitions. Suppose we have a function $E$ of $n$ binary
variables. If we fix $m$ of these variables then we get a new function $E'$ of $n - m$
binary variables; we will call this function a projection of $E$. The notation for
projections is as follows.
Definition 4. Let $E(x_1, \ldots, x_n)$ be a function of $n$ binary variables, and let $I, J$
be a disjoint partition of the set of indices $\{1, \ldots, n\}$: $I = \{i(1), \ldots, i(m)\}$, $J = \{j(1), \ldots, j(n-m)\}$. Let $\alpha_{i(1)}, \ldots, \alpha_{i(m)}$ be some binary constants. A projection
$E' = E[x_{i(1)} = \alpha_{i(1)}, \ldots, x_{i(m)} = \alpha_{i(m)}]$ is a function of $n - m$ variables defined
by
$$E'(x_{j(1)}, \ldots, x_{j(n-m)}) = E(x_1, \ldots, x_n),$$
where $x_i = \alpha_i$ for $i \in I$. We say that we fix the variables $x_{i(1)}, \ldots, x_{i(m)}$.
Now we give a definition of regular functions.
Definition 5.
– All functions of one variable are regular.
– A function $E$ of two variables is called regular if $E(0,0) + E(1,1) \le E(0,1) + E(1,0)$.
– A function $E$ of more than two variables is called regular if all projections
of $E$ onto two variables are regular.
Now we are ready to formulate our main theorem for $\mathcal{F}^3$.
Theorem 6. Let $E$ be a function of $n$ binary variables from $\mathcal{F}^3$, i.e., it can be
written as the sum
$$E(x_1, \ldots, x_n) = \sum_i E^i(x_i) + \sum_{i<j} E^{i,j}(x_i, x_j) + \sum_{i<j<k} E^{i,j,k}(x_i, x_j, x_k).$$
Then $E$ is graph-representable if and only if $E$ is regular.
Finally, we give a necessary condition for all other classes.
Theorem 7. Let $E$ be a function of binary variables. If $E$ is not regular then
$E$ is not graph-representable.
6 Proofs
6.1 Basic lemmas
Definition 8. The functional $\delta$ is a mapping from the set of all functions
(of binary variables) to the set of real numbers, defined as follows. For
a function $E(x_1, \ldots, x_n)$,
$$\delta(E) = \sum_{x_1 \in \{0,1\}, \ldots, x_n \in \{0,1\}} \Big( \prod_{i=1}^{n} (-1)^{x_i} \Big)\, E(x_1, \ldots, x_n).$$
For example, for a function $E$ of two variables, $\delta(E) = E(0, 0) - E(0, 1) - E(1, 0) + E(1, 1)$. Note that a function $E$ of two variables is regular if and only
if $\delta(E) \le 0$.
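The functional above can be sketched directly. In this illustration (our own encoding; the energy is a callable on $n$ bits) we enumerate all $2^n$ configurations:

```python
from itertools import product

def delta(E, n):
    """delta(E) = sum over all configurations of (prod_i (-1)^{x_i}) * E(x)."""
    total = 0
    for xs in product((0, 1), repeat=n):
        sign = (-1) ** sum(xs)          # product of (-1)^{x_i}
        total += sign * E(*xs)
    return total

# For two variables: delta(E) = E(0,0) - E(0,1) - E(1,0) + E(1,1),
# so E is regular exactly when delta(E) <= 0.
E = lambda x1, x2: [[0, 1], [1, 0]][x1][x2]  # Potts-style term
print(delta(E, 2))  # → -2, hence regular
```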
It is trivial to check the following properties of $\delta$.
Lemma 9.
– $\delta$ is linear, i.e., for a scalar $c$ and two functions $E'$, $E''$ of $n$ variables, $\delta(E' + E'') = \delta(E') + \delta(E'')$ and $\delta(c \cdot E') = c \cdot \delta(E')$.
– If $E$ is a function of $n$ variables that does not depend on at least one of the
variables, then $\delta(E) = 0$.
The next two lemmas provide building blocks for constructing graphs for
complex functions.
Lemma 10. Let $I = \{1, \ldots, n\}$, and let $I' = \{i'(1), \ldots, i'(n')\} \subset I$ and
$I'' = \{i''(1), \ldots, i''(n'')\} \subset I$ be sets of indices. If the functions $E'(x_{i'(1)}, \ldots, x_{i'(n')})$
and $E''(x_{i''(1)}, \ldots, x_{i''(n'')})$ are graph-representable, then so is the function
$$E(x_1, \ldots, x_n) = E'(x_{i'(1)}, \ldots, x_{i'(n')}) + E''(x_{i''(1)}, \ldots, x_{i''(n'')}).$$
Proof. Let us assume for simplicity of notation that $E'$ and $E''$ are functions
of all $n$ variables: $E' = E'(x_1, \ldots, x_n)$, $E'' = E''(x_1, \ldots, x_n)$. By the definition
of graph representability, there exist constants $K'$, $K''$, graphs $\mathcal{G}' = (\mathcal{V}', \mathcal{E}')$,
$\mathcal{G}'' = (\mathcal{V}'', \mathcal{E}'')$ and the set $\mathcal{V}_0 = \{v_1, \ldots, v_n\}$, $\mathcal{V}_0 \subset \mathcal{V}' \setminus \{s, t\}$, $\mathcal{V}_0 \subset \mathcal{V}'' \setminus \{s, t\}$,
such that $E' + K'$ is exactly represented by $\mathcal{G}', \mathcal{V}_0$ and $E'' + K''$ is exactly
represented by $\mathcal{G}'', \mathcal{V}_0$. We can assume that the only common nodes of $\mathcal{G}'$ and
$\mathcal{G}''$ are $\mathcal{V}_0 \cup \{s, t\}$. Let us construct the graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ as the combined graph
of $\mathcal{G}'$ and $\mathcal{G}''$: $\mathcal{V} = \mathcal{V}' \cup \mathcal{V}''$, $\mathcal{E} = \mathcal{E}' \cup \mathcal{E}''$.
Let $\tilde{E}$ be the function that $\mathcal{G}, \mathcal{V}_0$ exactly represent. Let us prove that $\tilde{E} \equiv E + (K' + K'')$ (and, therefore, $E$ is graph-representable).
Consider a configuration $x_1, \ldots, x_n$. Let $C' = \{S', T'\}$ be the cut on $\mathcal{G}'$ with
the smallest cost among all cuts for which $v_i \in S'$ if $x_i = 0$, and $v_i \in T'$ if $x_i = 1$
($1 \le i \le n$). According to the definition of graph representability,
$$E'(x_1, \ldots, x_n) + K' = \sum_{u \in S',\, v \in T',\, (u,v) \in \mathcal{E}'} c(u, v).$$
Let $C'' = \{S'', T''\}$ be the cut on $\mathcal{G}''$ with the smallest cost among all cuts for
which $v_i \in S''$ if $x_i = 0$, and $v_i \in T''$ if $x_i = 1$ ($1 \le i \le n$). Similarly,
$$E''(x_1, \ldots, x_n) + K'' = \sum_{u \in S'',\, v \in T'',\, (u,v) \in \mathcal{E}''} c(u, v).$$
Let $S = S' \cup S''$, $T = T' \cup T''$. It is easy to check that $C = \{S, T\}$ is a cut on
$\mathcal{G}$. Thus,
$$\tilde{E}(x_1, \ldots, x_n) \le \sum_{u \in S,\, v \in T,\, (u,v) \in \mathcal{E}} c(u, v) = \sum_{u \in S',\, v \in T',\, (u,v) \in \mathcal{E}'} c(u, v) + \sum_{u \in S'',\, v \in T'',\, (u,v) \in \mathcal{E}''} c(u, v)$$
$$= (E'(x_1, \ldots, x_n) + K') + (E''(x_1, \ldots, x_n) + K'') = E(x_1, \ldots, x_n) + (K' + K'').$$
Now let $C = \{S, T\}$ be the cut on $\mathcal{G}$ with the smallest cost among all cuts for
which $v_i \in S$ if $x_i = 0$, and $v_i \in T$ if $x_i = 1$ ($1 \le i \le n$), and let $S' = S \cap \mathcal{V}'$,
$T' = T \cap \mathcal{V}'$, $S'' = S \cap \mathcal{V}''$, $T'' = T \cap \mathcal{V}''$. It is easy to see that $C' = \{S', T'\}$ and
$C'' = \{S'', T''\}$ are cuts on $\mathcal{G}'$ and $\mathcal{G}''$, respectively. According to the definition of
graph representability,
$$E(x_1, \ldots, x_n) + (K' + K'') = (E'(x_1, \ldots, x_n) + K') + (E''(x_1, \ldots, x_n) + K'')$$
$$\le \sum_{u \in S',\, v \in T',\, (u,v) \in \mathcal{E}'} c(u, v) + \sum_{u \in S'',\, v \in T'',\, (u,v) \in \mathcal{E}''} c(u, v) = \sum_{u \in S,\, v \in T,\, (u,v) \in \mathcal{E}} c(u, v) = \tilde{E}(x_1, \ldots, x_n). \qquad \square$$
Lemma 11. Suppose two functions $E$ and $E'$ of $n$ variables are such that, for
all $x_1, \ldots, x_n$,
$$E'(x_1, \ldots, x_n) = \begin{cases} E(x_1, \ldots, x_n), & x_k = 0 \\ E(x_1, \ldots, x_n) + C, & x_k = 1, \end{cases}$$
for some constants $k$ and $C$ ($1 \le k \le n$). Then
– $E'$ is graph-representable if and only if $E$ is graph-representable;
– $E'$ is regular if and only if $E$ is regular.
Proof. Let us introduce the following function $E_C$: for all $x_1, \ldots, x_n$,
$$E_C(x_1, \ldots, x_n) = \begin{cases} 0, & x_k = 0 \\ C, & x_k = 1. \end{cases}$$
If we show that $E_C$ is graph-representable for any $C$, then the first part of
the lemma will follow from lemma 10, since $E' \equiv E + E_C$ and $E \equiv E' + E_{-C}$.
It is easy to construct a graph which represents $E_C$. The set of nodes in this
graph will be $\{v_1, \ldots, v_n, s, t\}$ and the set of edges will include the only edge
$(s, v_k)$ with capacity $C$ (if $C \ge 0$), or the edge $(v_k, t)$ with capacity $-C$
(if $C < 0$). It is trivial to check that this graph exactly represents $E_C$ (in the
former case) or $E_C - C$ (in the latter case).
Now let us assume that one of the functions $E$ and $E'$ is regular, for example,
$E$. Consider a projection of $E'$ onto two variables:
$$E'[x_{i(1)} = \alpha_{i(1)}, \ldots, x_{i(m)} = \alpha_{i(m)}],$$
where $m = n - 2$ and $i(1), \ldots, i(m) \in \{1, \ldots, n\}$. We need to show that this
function is regular, i.e., that the functional $\delta$ of this function is nonpositive. Due
to the linearity of $\delta$ we can write
$$\delta(E'[x_{i(1)} = \alpha_{i(1)}, \ldots, x_{i(m)} = \alpha_{i(m)}]) = \delta(E[x_{i(1)} = \alpha_{i(1)}, \ldots, x_{i(m)} = \alpha_{i(m)}]) + \delta(E_C[x_{i(1)} = \alpha_{i(1)}, \ldots, x_{i(m)} = \alpha_{i(m)}]).$$
The first term is nonpositive by assumption, and the second term is 0 by lemma 9
(the projection of $E_C$ depends on at most one variable). $\qquad \square$
6.2 Proof of theorems 3 and 6: the constructive part
In this section we give the constructive part of the proof: given a regular
energy function from the class $\mathcal{F}^3$, we show how to construct a graph which
represents it. We do this in three steps. First we consider regular functions
of two variables, then regular functions of three variables, and finally regular
functions of the form given in theorem 6.
This will also prove the constructive part of theorem 3. Indeed, suppose
a function is from the class $\mathcal{F}^2$ and each term in the sum satisfies the condition
given in theorem 3 (i.e., is regular). Then each term is graph-representable (as
we will show in this section) and, hence, the function is graph-representable as
well, according to lemma 10.
The other direction of theorems 3 and 6, as well as theorem 7, will be
proven in section 6.3.
Functions of two variables. Let $E(x_1, x_2)$ be a function of two variables,
represented by a table
$$E = \begin{pmatrix} E(0,0) & E(0,1) \\ E(1,0) & E(1,1) \end{pmatrix}$$
Lemma 11 tells us that we can add a constant to any column or row without
affecting theorem 6. Thus, without loss of generality we can consider only
functions $E$ of the form
$$E = \begin{pmatrix} 0 & A \\ 0 & 0 \end{pmatrix}$$
(we subtracted a constant from the first row to make the upper left element
zero, then we subtracted a constant from the second row to make the bottom
left element zero, and finally we subtracted a constant from the second column to
make the bottom right element zero).
Here $\delta(E) = -A \le 0$, since we assumed that $E$ is regular; hence, $A$ is non-negative.
Now we can easily construct a graph $\mathcal{G}$ which represents this function. It will
have four vertices $\mathcal{V} = \{v_1, v_2, s, t\}$ and one edge $\mathcal{E} = \{(v_1, v_2)\}$ with the cost
$c(v_1, v_2) = A$. It is easy to see that $\mathcal{G}, \mathcal{V}_0 = \{v_1, v_2\}$ represent $E$, since the only
case when the edge $(v_1, v_2)$ is cut (yielding a cost $A$) is when $v_1 \in S$, $v_2 \in T$, i.e.,
when $x_1 = 0$, $x_2 = 1$.
Note that we did not introduce any additional nodes for representing binary
interactions of binary variables. This is in contrast to the construction in
[8], which added auxiliary nodes for representing the energies that we just considered.
Our construction yields a smaller graph and, thus, the minimum cut can
potentially be computed faster.
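The two-variable construction can be verified exhaustively. In this sketch (our own toy encoding, not from the paper) we enumerate the four constrained cuts of the single-edge graph and recover the table for $E$:

```python
def cut_cost(edges, S):
    """Cost of the cut: sum of capacities of edges going from S to its complement."""
    return sum(c for (u, v), c in edges.items() if u in S and v not in S)

# Graph for E = [[0, A], [0, 0]]: a single edge (v1, v2) of capacity A.
A = 5
edges = {('v1', 'v2'): A}
# Enumerate the four constrained cuts: v_i is placed in S iff x_i = 0.
for x1 in (0, 1):
    for x2 in (0, 1):
        S = {'s'} | {v for v, x in (('v1', x1), ('v2', x2)) if x == 0}
        print((x1, x2), cut_cost(edges, S))
# Only the configuration (0, 1) cuts the edge, yielding cost A.
```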
Functions of three variables. Now let us consider a regular function $E$ of
three variables. Let us represent it as a table
$$E = \begin{pmatrix} E(0,0,0) & E(0,0,1) \\ E(0,1,0) & E(0,1,1) \\ E(1,0,0) & E(1,0,1) \\ E(1,1,0) & E(1,1,1) \end{pmatrix}$$
(rows correspond to $(x_1, x_2)$, columns to $x_3$). Two cases are possible.
Case 1: $\delta(E) \ge 0$. We can apply the transformations described in lemma 11
and get the following function:
$$E = \begin{pmatrix} 0 & 0 \\ 0 & A_1 \\ 0 & A_2 \\ A_3 & A_0 \end{pmatrix}$$
It is easy to check that these transformations preserve the functional $\delta$. Hence,
$A = A_0 - (A_1 + A_2 + A_3) = -\delta(E) \le 0$. By applying the regularity constraint
to the projections $E[x_1 = 0]$, $E[x_2 = 0]$, $E[x_3 = 0]$ we also get $A_1 \le 0$, $A_2 \le 0$,
$A_3 \le 0$.
We can represent $E$ as the sum of functions
$$E = \begin{pmatrix} 0 & 0 \\ 0 & A_1 \\ 0 & 0 \\ 0 & A_1 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & A_2 \\ 0 & A_2 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ A_3 & A_3 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & A \end{pmatrix}$$
We need to show that all terms here are graph-representable; then lemma 10
will imply that $E$ is graph-representable as well.
The first three terms are regular functions depending only on two variables,
and thus are graph-representable, as was shown in the previous section. Let us
consider the last term.
The graph $\mathcal{G}$ that represents this term can be constructed as follows. The
set of nodes will contain one auxiliary node $u$: $\mathcal{V} = \{v_1, v_2, v_3, u, s, t\}$. The set
of edges will consist of the directed edges $\mathcal{E} = \{(v_1, u), (v_2, u), (v_3, u), (u, t)\}$ with
capacities $A' = -A \ge 0$. Let us prove that $\mathcal{G}, \mathcal{V}_0 = \{v_1, v_2, v_3\}$ exactly represent the
following function $E'(x_1, x_2, x_3) = E(x_1, x_2, x_3) + A'$:
$$E' = \begin{pmatrix} A' & A' \\ A' & A' \\ A' & A' \\ A' & 0 \end{pmatrix}$$
If $x_1 = x_2 = x_3 = 1$ then the cost of the minimum cut is 0 (the minimum cut
is $S = \{s\}$, $T = \{v_1, v_2, v_3, u, t\}$). Suppose at least one of the variables $x_1, x_2, x_3$
is 0; without loss of generality, we can assume that $x_1 = 0$, i.e., $v_1 \in S$. If $u \in S$
then the edge $(u, t)$ is cut; if $u \in T$, the edge $(v_1, u)$ is cut, yielding the cost $A'$.
Hence, the cost of the minimum cut is at least $A'$. However, if $u \in S$ the cost of
the cut is exactly $A'$, no matter where the nodes $v_1, v_2, v_3$ are. We proved that
$\mathcal{G}, \mathcal{V}_0$ exactly represent $E'$.
Case 2: $\delta(E) < 0$. This case is similar to case 1. We can transform the
energy to
$$E = \begin{pmatrix} A_0 & A_3 \\ A_2 & 0 \\ A_1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} A_1 & 0 \\ 0 & 0 \\ A_1 & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} A_2 & 0 \\ A_2 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} A_3 & A_3 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} A & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}$$
where $A = A_0 - (A_1 + A_2 + A_3) = \delta(E) < 0$ and $A_1 \le 0$, $A_2 \le 0$, $A_3 \le 0$ since $E$
is regular. The first three terms are regular functions of two variables, and the last
term can be represented by the graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ where $\mathcal{V} = \{v_1, v_2, v_3, u, s, t\}$
and $\mathcal{E} = \{(u, v_1), (u, v_2), (u, v_3), (s, u)\}$; the capacities of all edges are $-A > 0$.
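The Case 1 decomposition can be checked numerically. This sketch (our own illustration; the values of $A_1, A_2, A_3, A_0$ are arbitrary examples satisfying $A_1, A_2, A_3 \le 0$) verifies that the four terms sum back to the normalized energy on all eight configurations:

```python
from itertools import product

# Illustrative values: A1 = E(0,1,1), A2 = E(1,0,1), A3 = E(1,1,0),
# A0 = E(1,1,1), with zeros elsewhere in the normalized table.
A1, A2, A3, A0 = -1, -2, -1, -5
A = A0 - (A1 + A2 + A3)  # equals -delta(E); nonpositive in case 1

def E(x1, x2, x3):
    table = {(0, 1, 1): A1, (1, 0, 1): A2, (1, 1, 0): A3, (1, 1, 1): A0}
    return table.get((x1, x2, x3), 0)

terms = [
    lambda x1, x2, x3: A1 if (x2, x3) == (1, 1) else 0,   # two-variable term
    lambda x1, x2, x3: A2 if (x1, x3) == (1, 1) else 0,   # two-variable term
    lambda x1, x2, x3: A3 if (x1, x2) == (1, 1) else 0,   # two-variable term
    lambda x1, x2, x3: A if (x1, x2, x3) == (1, 1, 1) else 0,  # auxiliary-node term
]
assert all(sum(t(*x) for t in terms) == E(*x)
           for x in product((0, 1), repeat=3))
print("decomposition matches")
```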
Functions of many variables. Finally, let us consider a regular function $E$
which can be written as
$$E(x_1, \ldots, x_n) = \sum_{i<j<k} E^{i,j,k}(x_i, x_j, x_k),$$
where $i, j, k$ are indices from the set $\{1, \ldots, n\}$ (we omitted terms involving
functions of one and two variables, since they can be viewed as functions of three
variables).
Although $E$ is regular, each term in the sum need not necessarily be regular.
However, we can regroup the terms in the sum so that each term is regular
(and, thus, graph-representable). This can be done using the following lemma
and a trivial induction argument.
Definition 12. Let $E^{i,j,k}$ be a function of three variables. The functional $N(E^{i,j,k})$
is defined as the number of projections of $E^{i,j,k}$ onto two variables with a positive
value of the functional $\delta$.
Note that $N(E^{i,j,k}) = 0$ exactly when $E^{i,j,k}$ is regular.
Lemma 13. Suppose the function $E$ of $n$ variables can be written as
$$E(x_1, \ldots, x_n) = \sum_{i<j<k} E^{i,j,k}(x_i, x_j, x_k),$$
where some of the terms are not regular. Then it can be written as
$$E(x_1, \ldots, x_n) = \sum_{i<j<k} \tilde{E}^{i,j,k}(x_i, x_j, x_k),$$
where
$$\sum_{i<j<k} N(\tilde{E}^{i,j,k}) < \sum_{i<j<k} N(E^{i,j,k}).$$
Proof. For simplicity of notation, let us assume that the term $E^{1,2,3}$ is not
regular and that $\delta(E^{1,2,3}[x_3 = 0]) > 0$ or $\delta(E^{1,2,3}[x_3 = 1]) > 0$ (we can ensure this
by renaming indices). Let
$$C_k = \max_{\alpha_k \in \{0,1\}} \delta(E^{1,2,k}[x_k = \alpha_k]), \quad k \in \{4, \ldots, n\},$$
$$C_3 = -\sum_{k=4}^{n} C_k.$$
Now we will modify the terms $E^{1,2,3}, \ldots, E^{1,2,n}$ as follows:
$$\tilde{E}^{1,2,k} = E^{1,2,k} - R[C_k], \quad k \in \{3, \ldots, n\},$$
where $R[C]$ is the function of two variables $x_1$ and $x_2$ defined by the table
$$R[C] = \begin{pmatrix} 0 & 0 \\ 0 & C \end{pmatrix}$$
(the other terms are unchanged: $\tilde{E}^{i,j,k} = E^{i,j,k}$ for $(i, j) \ne (1, 2)$). We have
$$E(x_1, \ldots, x_n) = \sum_{i<j<k} \tilde{E}^{i,j,k}(x_i, x_j, x_k)$$
since $\sum_{k=3}^{n} C_k = 0$ and, therefore, $\sum_{k=3}^{n} R[C_k] \equiv 0$.
If we consider $R[C]$ as a function of $n$ variables and take a projection onto
two variables where the two variables that are not fixed are $x_i$ and $x_j$ ($i < j$),
then the functional $\delta$ will be $C$ if $(i, j) = (1, 2)$, and 0 otherwise, since in the
latter case the projection actually depends on at most one variable. Hence, the
only projections onto two variables that could have changed their value of the
functional $\delta$ are $\tilde{E}^{1,2,k}[x_3 = \alpha_3, \ldots, x_n = \alpha_n]$, $k \in \{3, \ldots, n\}$, if we treat
$\tilde{E}^{1,2,k}$ as functions of $n$ variables, or $\tilde{E}^{1,2,k}[x_k = \alpha_k]$, if we treat $\tilde{E}^{1,2,k}$ as functions of
three variables.
First let us consider the terms with $k \in \{4, \ldots, n\}$. We have $\delta(E^{1,2,k}[x_k = \alpha_k]) \le C_k$, thus
$$\delta(\tilde{E}^{1,2,k}[x_k = \alpha_k]) = \delta(E^{1,2,k}[x_k = \alpha_k]) - \delta(R[C_k][x_k = \alpha_k]) \le C_k - C_k = 0.$$
Therefore we did not introduce any nonregular projections for these terms.
Now let us consider the term $\delta(\tilde{E}^{1,2,3}[x_3 = \alpha_3])$. We can write
$$\delta(\tilde{E}^{1,2,3}[x_3 = \alpha_3]) = \delta(E^{1,2,3}[x_3 = \alpha_3]) - \delta(R[C_3][x_3 = \alpha_3]) = \delta(E^{1,2,3}[x_3 = \alpha_3]) + \sum_{k=4}^{n} C_k = \sum_{k=3}^{n} \delta(E^{1,2,k}[x_k = \alpha_k]),$$
where $\alpha_k = \arg\max_{\alpha \in \{0,1\}} \delta(E^{1,2,k}[x_k = \alpha])$ for $k \in \{4, \ldots, n\}$. The last expression
is just $\delta(E[x_3 = \alpha_3, \ldots, x_n = \alpha_n])$ and is nonpositive, since $E$ is regular by
assumption. Hence, the values $\delta(\tilde{E}^{1,2,3}[x_3 = 0])$ and $\delta(\tilde{E}^{1,2,3}[x_3 = 1])$ are both
nonpositive and, therefore, the number of nonregular projections has decreased. $\qquad \square$
6.3 Proof of theorem 7
In this section we prove a necessary condition for graph representability: if a
function of binary variables is graph-representable, then it is regular. This also
implies the corresponding directions of theorems 3 and 6. Note that theorem 3
needs a little bit of reasoning, as follows. Let us consider a graph-representable
function $E$ from the class $\mathcal{F}^2$:
$$E(x_1, \ldots, x_n) = \sum_i E^i(x_i) + \sum_{i<j} E^{i,j}(x_i, x_j).$$
$E$ is regular, as we will prove in this section. This means that the functional $\delta$ of
any projection of $E$ onto two variables is nonpositive. Let us consider a projection
where the two variables that are not fixed are $x_i$ and $x_j$. By lemma 9, the value
of the functional $\delta$ of this projection is equal to $\delta(E^{i,j})$ (all other terms yield
zero). Hence, all terms $E^{i,j}$ are regular, i.e., they satisfy the condition in
theorem 3.
Definition 14. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a graph, $v_1, \ldots, v_k$ be a subset of the nodes $\mathcal{V}$, and
$\alpha_1, \ldots, \alpha_k$ be binary constants whose values are from $\{0, 1\}$. We define the
graph $\mathcal{G}[x_1 = \alpha_1, \ldots, x_k = \alpha_k]$ as follows. Its nodes will be the same as in $\mathcal{G}$,
and its edges will be all edges of $\mathcal{G}$ plus additional edges corresponding to the nodes
$v_1, \ldots, v_k$: for a node $v_i$, we add the edge $(s, v_i)$ if $\alpha_i = 0$, or $(v_i, t)$ if $\alpha_i = 1$,
with an infinite capacity.
It should be obvious that these edges enforce the constraints $f(v_1) = \alpha_1, \ldots,$
$f(v_k) = \alpha_k$ in the minimum cut on $\mathcal{G}[x_1 = \alpha_1, \ldots, x_k = \alpha_k]$, i.e., if $\alpha_i = 0$ then
$v_i \in S$, and if $\alpha_i = 1$ then $v_i \in T$. (If, for example, $\alpha_i = 0$ and $v_i \in T$, then the
edge $(s, v_i)$ must be cut, yielding an infinite cost, so it would not be the minimum
cut.)
Now we can give a definition of graph representability which is equivalent to
definition 1. This new definition will be more convenient for the proof.
Definition 15. We say that the function $E$ of $n$ binary variables is exactly
represented by the graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and the set $\mathcal{V}_0 \subset \mathcal{V}$ if for any configuration
$\alpha_1, \ldots, \alpha_n$ the cost of the minimum cut on $\mathcal{G}[x_1 = \alpha_1, \ldots, x_n = \alpha_n]$ is
$E(\alpha_1, \ldots, \alpha_n)$.
Lemma 16. Any projection of a graph-representable function is graph-representable.
Proof. Let $E$ be a graph-representable function of $n$ variables, represented by the
graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and the set $\mathcal{V}_0$. Suppose that we fix the variables $x_{i(1)}, \ldots, x_{i(m)}$.
It is straightforward to check that the graph $\mathcal{G}[x_{i(1)} = \alpha_{i(1)}, \ldots, x_{i(m)} = \alpha_{i(m)}]$
and the set $\mathcal{V}'_0 = \mathcal{V}_0 \setminus \{v_{i(1)}, \ldots, v_{i(m)}\}$ represent the function $E' = E[x_{i(1)} = \alpha_{i(1)}, \ldots, x_{i(m)} = \alpha_{i(m)}]$. $\qquad \square$
This lemma implies that it suffices to prove theorem 7 only for energies of
two variables.
Let $E(x_1, x_2)$ be a graph-representable function of two variables. Let us represent
this function as a table:
$$E = \begin{pmatrix} E(0,0) & E(0,1) \\ E(1,0) & E(1,1) \end{pmatrix}$$
Lemma 11 tells us that we can add a constant to any column or row without
affecting theorem 7. Thus, without loss of generality we can consider only
functions $E$ of the form
$$E = \begin{pmatrix} 0 & 0 \\ 0 & A \end{pmatrix}$$
(we subtracted a constant from the first row to make the upper left element
zero, then we subtracted a constant from the second row to make the bottom
left element zero, and finally we subtracted a constant from the second column to
make the upper right element zero).
We need to show that E is regular, i.e. that (E) = A 0. Suppose this is
not true: A > 0.
Suppose the graph ( and the set 1
0
= v
1
, v
2
represent E. It means that
there is a constant K such that (, 1
0
exactly represent E

(x
1
, x
2
) = E(x
1
, x
2
) +
K:
E

=
K K
K K +A
The cost of the minimum s-t-cut on ( is K (since this cost is just the mini-
mum entry in the table for E

); hence, K 0. Thus the value of the maximum


ow from s to t in ( is K. Let (
0
be the residual graph obtained from ( after
pushing the ow K. Let E
0
(x
1
, x
2
) be the function exactly represented by (
0
, 1
0
.
By the definition of graph representability, E'(α_1, α_2) is equal to the value of the minimum cut (or maximum flow) on the graph G[x_1 = α_1, x_2 = α_2]. The following sequence of operations shows one possible way to push the maximum flow through this graph.

- First we take the original graph G and push the flow K; then we get the residual graph G^0. (This is equivalent to pushing flow through G[x_1 = α_1, x_2 = α_2] where we do not use the edges corresponding to the constraints x_1 = α_1 and x_2 = α_2.)
- Then we add the edges corresponding to these constraints; then we get the graph G^0[x_1 = α_1, x_2 = α_2].
- Finally we push the maximum flow possible through the graph G^0[x_1 = α_1, x_2 = α_2]; the amount of this flow is E^0(α_1, α_2) according to the definition of graph representability.
The total amount of flow pushed during all steps is K + E^0(α_1, α_2); thus,

    E'(α_1, α_2) = K + E^0(α_1, α_2)

or

    E(α_1, α_2) = E^0(α_1, α_2).

We have proved that E is exactly represented by G^0, V_0.
The value of the minimum cut/maximum flow on G^0 is 0 (it is the minimum entry in the table for E); thus, there is no augmenting path from s to t in G^0. However, if we add the edges (v_1, t) and (v_2, t) then there will be an augmenting path from s to t in G^0[x_1 = 1, x_2 = 1], since E(1,1) = A > 0. Hence, this augmenting path will contain at least one of these edges and, therefore, either v_1 or v_2 will be in the path. Let P be the part of this path going from the source until v_1 or v_2 is first encountered. Without loss of generality we can assume that it is v_1. Thus, P is an augmenting path from s to v_1 which does not contain the edges that we added, namely (v_1, t) and (v_2, t).
Finally, let us consider the graph G^0[x_1 = 1, x_2 = 0], which is obtained from G^0 by adding the edges (v_1, t) and (s, v_2) with infinite capacities. There is an augmenting path P, (v_1, t) from the source to the sink in this graph; hence, the minimum cut/maximum flow on it is greater than zero, i.e. E(1, 0) > 0. We get a contradiction. □
Appendix: Summary of graph constructions

We now summarize the graph constructions used for regular functions. The notation D(v, c) means that we add an edge (s, v) with the weight c if c > 0, or an edge (v, t) with the weight −c if c < 0.
Regular functions of one binary variable

Recall that all functions of one variable are regular. For a function E(x_1), we construct a graph G with three vertices V = {v_1, s, t}. There is a single edge D(v_1, E(1) − E(0)).
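The notation D(v, c) and this one-variable construction can be simulated directly: for each labeling x_1, the cost of the corresponding s-t cut should equal E(x_1) plus a fixed constant. The sketch below (hypothetical helper names, not code from the paper) represents edges as (from, to, weight) triples and evaluates cut costs by brute force, with x_1 = 0 placing v_1 on the source side and x_1 = 1 on the sink side.

```python
# Simulate the one-variable construction: a single edge D(v1, E(1) - E(0)).
# A sketch with hypothetical helper names, not code from the paper.

def D(v, c):
    """The paper's D(v, c): edge (s, v) with weight c if c > 0, else (v, t) with weight -c."""
    return [('s', v, c)] if c > 0 else [(v, 't', -c)] if c < 0 else []

def cut_cost(edges, side):
    """Sum of the weights of edges going from the source side (0) to the sink side (1)."""
    side = dict(side, s=0, t=1)          # s is always source-side, t always sink-side
    return sum(w for (a, b, w) in edges if side[a] == 0 and side[b] == 1)

E = {0: 5, 1: 2}                         # any function of one variable is regular
edges = D('v1', E[1] - E[0])             # here E(1) - E(0) = -3, so the edge is (v1, t, 3)
costs = [cut_cost(edges, {'v1': x1}) for x1 in (0, 1)]

# Cut costs represent E up to a single additive constant:
assert costs[1] - costs[0] == E[1] - E[0]
```

Only the difference E(1) − E(0) matters, which is why one edge suffices.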
Regular functions of two binary variables

We now show how to construct a graph G for a regular function E(x_1, x_2) of two variables. It will contain four vertices: V = {v_1, v_2, s, t}. The edges E are given below.

- D(v_1, E(1,0) − E(0,0));
- D(v_2, E(1,1) − E(1,0));
- (v_1, v_2) with the weight −P(E) = E(0,1) + E(1,0) − E(0,0) − E(1,1), which is nonnegative by regularity.
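Under the same brute-force simulation (hypothetical helper names; a sketch, not code from the paper), the two-variable construction can be verified exhaustively: for every labeling (x_1, x_2) the cut cost equals E(x_1, x_2) plus one fixed constant, so the minimum s-t cut yields the minimizer of E.

```python
# Verify the two-variable construction on a sample regular function.
# x_i = 0 puts v_i on the source side, x_i = 1 on the sink side.
from itertools import product

def D(v, c):
    return [('s', v, c)] if c > 0 else [(v, 't', -c)] if c < 0 else []

def cut_cost(edges, side):
    side = dict(side, s=0, t=1)
    return sum(w for (a, b, w) in edges if side[a] == 0 and side[b] == 1)

E = {(0, 0): 1, (0, 1): 4, (1, 0): 3, (1, 1): 2}   # regular: 1 + 2 <= 4 + 3
minusP = E[0, 1] + E[1, 0] - E[0, 0] - E[1, 1]     # -P(E) >= 0 by regularity

edges = (D('v1', E[1, 0] - E[0, 0]) +
         D('v2', E[1, 1] - E[1, 0]) +
         [('v1', 'v2', minusP)])

# Every labeling's cut cost differs from E by the same constant.
diffs = {cut_cost(edges, {'v1': x1, 'v2': x2}) - E[x1, x2]
         for x1, x2 in product((0, 1), repeat=2)}
assert len(diffs) == 1
```

With only two non-terminal vertices there are just four cuts, so the check is exact rather than sampled.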
Regular functions of three binary variables

We next show how to construct a graph G for a regular function E(x_1, x_2, x_3) of three variables. It will contain six vertices: V = {v_1, v_2, v_3, u, s, t}. Here

    P(E) = E(0,0,0) + E(0,1,1) + E(1,0,1) + E(1,1,0)
         − E(0,0,1) − E(0,1,0) − E(1,0,0) − E(1,1,1).

If P(E) ≥ 0 then the edges are

- D(v_1, E(1,0,1) − E(0,0,1));
- D(v_2, E(1,1,0) − E(1,0,0));
- D(v_3, E(0,1,1) − E(0,1,0));
- (v_2, v_3) with the weight −P(E[x_1 = 0]);
- (v_3, v_1) with the weight −P(E[x_2 = 0]);
- (v_1, v_2) with the weight −P(E[x_3 = 0]);
- (v_1, u), (v_2, u), (v_3, u), (u, t) with the weight P(E).
If P(E) < 0 then the edges are

- D(v_1, E(1,1,0) − E(0,1,0));
- D(v_2, E(0,1,1) − E(0,0,1));
- D(v_3, E(1,0,1) − E(1,0,0));
- (v_3, v_2) with the weight −P(E[x_1 = 1]);
- (v_1, v_3) with the weight −P(E[x_2 = 1]);
- (v_2, v_1) with the weight −P(E[x_3 = 1]);
- (u, v_1), (u, v_2), (u, v_3), (s, u) with the weight −P(E).
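The three-variable construction can be checked the same way, except that the auxiliary node u is unconstrained: for each labeling (x_1, x_2, x_3), the representing cost is the minimum cut cost over the two placements of u. The sketch below (hypothetical helper names, not code from the paper) verifies the P(E) ≥ 0 case on a sample function E(x) = f(x_1 + x_2 + x_3) whose pairwise projections are all regular.

```python
# Verify the three-variable construction (case P(E) >= 0) on E(x) = f(x1+x2+x3)
# with f = [0, 10, 15, 14]: every projection is regular and P(E) = 1 >= 0.
from itertools import product

def D(v, c):
    return [('s', v, c)] if c > 0 else [(v, 't', -c)] if c < 0 else []

def cut_cost(edges, side):
    side = dict(side, s=0, t=1)
    return sum(w for (a, b, w) in edges if side[a] == 0 and side[b] == 1)

f = [0, 10, 15, 14]
E = {x: f[sum(x)] for x in product((0, 1), repeat=3)}

def P(T):  # P of a 2x2 table {(x_i, x_j): value}
    return T[0, 0] + T[1, 1] - T[0, 1] - T[1, 0]

P3 = (E[0,0,0] + E[0,1,1] + E[1,0,1] + E[1,1,0]
      - E[0,0,1] - E[0,1,0] - E[1,0,0] - E[1,1,1])
assert P3 >= 0

pairs = list(product((0, 1), repeat=2))
edges = (D('v1', E[1,0,1] - E[0,0,1]) +
         D('v2', E[1,1,0] - E[1,0,0]) +
         D('v3', E[0,1,1] - E[0,1,0]) +
         [('v2', 'v3', -P({(a, b): E[0, a, b] for a, b in pairs})),   # -P(E[x1=0])
          ('v3', 'v1', -P({(a, b): E[a, 0, b] for a, b in pairs})),   # -P(E[x2=0])
          ('v1', 'v2', -P({(a, b): E[a, b, 0] for a, b in pairs})),   # -P(E[x3=0])
          ('v1', 'u', P3), ('v2', 'u', P3), ('v3', 'u', P3), ('u', 't', P3)])

def cost(x):  # u is free: take the cheaper of its two placements
    return min(cut_cost(edges, {'v1': x[0], 'v2': x[1], 'v3': x[2], 'u': y})
               for y in (0, 1))

# Every labeling's minimized cut cost differs from E by the same constant.
diffs = {cost(x) - E[x] for x in E}
assert len(diffs) == 1
```

Minimizing over u per labeling is exactly what the min cut does automatically, since u belongs to no fixed side.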
References
1. Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network Flows:
Theory, Algorithms, and Applications. Prentice Hall, 1993.
2. Amir Amini, Terry Weymouth, and Ramesh Jain. Using dynamic programming
for solving variational problems in vision. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 12(9):855–867, September 1990.
3. Stephen Barnard. Stochastic stereo matching over scale. International Journal of
Computer Vision, 3(1):17–32, 1989.
4. S. Birchfield and C. Tomasi. Multiway cut for stereo and motion with slanted
surfaces. In International Conference on Computer Vision, pages 489–495, 1999.
5. Yuri Boykov and Marie-Pierre Jolly. Interactive organ segmentation using graph
cuts. In Medical Image Computing and Computer-Assisted Intervention, pages
276–286, 2000.
6. Yuri Boykov and Marie-Pierre Jolly. Interactive graph cuts for optimal boundary
and region segmentation of objects in N-D images. In International Conference on
Computer Vision, pages I: 105–112, 2001.
7. Yuri Boykov, Olga Veksler, and Ramin Zabih. Markov Random Fields with efficient
approximations. In IEEE Conference on Computer Vision and Pattern Recognition,
pages 648–655, 1998.
8. Yuri Boykov, Olga Veksler, and Ramin Zabih. Fast approximate energy minimization
via graph cuts. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 23(11):1222–1239, November 2001.
9. C. Schellewald, J. Keuchel, and C. Schnörr. Image labeling and grouping by minimizing
linear functionals over cones. In International Workshop on Energy Minimization
Methods in Computer Vision and Pattern Recognition, 2001.
10. L. Ford and D. Fulkerson. Flows in Networks. Princeton University Press, 1962.
11. S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the
Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 6:721–741, 1984.
12. A. Goldberg and R. Tarjan. A new approach to the maximum flow problem.
Journal of the Association for Computing Machinery, 35(4):921–940, October 1988.
13. D. Greig, B. Porteous, and A. Seheult. Exact maximum a posteriori estimation for
binary images. Journal of the Royal Statistical Society, Series B, 51(2):271–279,
1989.
14. I. Herlin and G. Giraudon. Use of temporal information in a segmentation algo-
rithm of ultrasound images. In IEEE Conference on Computer Vision and Pattern
Recognition, 1993.
15. H. Ishikawa and D. Geiger. Occlusions, discontinuities, and epipolar lines in stereo.
In European Conference on Computer Vision, pages 232–248, 1998.
16. H. Ishikawa and D. Geiger. Segmentation by grouping junctions. In IEEE Con-
ference on Computer Vision and Pattern Recognition, pages 125–131, 1998.
17. Junmo Kim, John Fisher, Andy Tsai, Cindy Wible, Alan Willsky, and William Wells.
Incorporating spatial priors into an information theoretic approach for fMRI data
analysis. In Medical Image Computing and Computer-Assisted Intervention, pages
62–71, 2000.
18. Vladimir Kolmogorov and Ramin Zabih. Visual correspondence with occlusions
using graph cuts. In International Conference on Computer Vision, pages 508–515,
2001.
19. S. Li. Markov Random Field Modeling in Computer Vision. Springer-Verlag, 1995.
20. N. Paragios and V. Ramesh. An MRF-based approach for real-time subway mon-
itoring. In IEEE Conference on Computer Vision and Pattern Recognition, 2001.
21. S. Roy. Stereo without epipolar lines: A maximum-flow formulation. International
Journal of Computer Vision, 1(2):1–15, 1999.
22. S. Roy and I. Cox. A maximum-flow formulation of the n-camera stereo correspondence
problem. In International Conference on Computer Vision, 1998.
23. Daniel Scharstein and Richard Szeliski. A taxonomy and evaluation of dense two-
frame stereo correspondence algorithms. Technical Report 81, Microsoft Research,
2001. To appear in IJCV. An earlier version appears in CVPR 2001 Workshop on
Stereo Vision.
24. Dan Snow, Paul Viola, and Ramin Zabih. Exact voxel occupancy with graph cuts.
In IEEE Conference on Computer Vision and Pattern Recognition, pages 345–352,
2000.
25. Richard Szeliski and Ramin Zabih. An experimental comparison of stereo algorithms.
In B. Triggs, A. Zisserman, and R. Szeliski, editors, Vision Algorithms:
Theory and Practice, number 1883 in LNCS, pages 1–19, Corfu, Greece, September
1999. Springer-Verlag.