Balanced Queries: Divide and Conquer⋆
Dmitri Akatov1 and Georg Gottlob1,2
1 Oxford University Computing Laboratory, University of Oxford
2 Oxford Man Institute of Quantitative Finance, University of Oxford
{dmitri.akatov,georg.gottlob}@comlab.ox.ac.uk
Abstract. We define a new hypergraph decomposition method called
Balanced Decomposition and associate Balanced Width to hypergraphs
and queries. We compare this new method to other well-known decomposition methods, and analyze the complexity of finding balanced decompositions of bounded width and the complexity of answering queries
of bounded width. To this purpose we define a new complexity class,
allowing recursive divide and conquer type algorithms, as a resource-bounded class in the nondeterministic auxiliary stack automaton computation model, and show that finding decompositions of bounded balanced width is feasible in this new class, whereas answering queries of
bounded balanced width is complete for it.
1 Introduction
The aim of the study of hypergraph decompositions is to find tractable subclasses
of the Boolean Conjunctive Query (BCQ) evaluation problem in databases and
the Constraint Satisfaction Problem in AI. Both these problems are equivalent
and well known to be NP-complete [6,19] with the cyclicity of the hypergraphs
causing the state explosion. A hypergraph decomposition transforms a hypergraph into an acyclic structure (a labelled tree), reducing the complexity of
the associated problem. The tractability results of these problems rely on the
acyclicity of the tree on the one hand and on certain properties of its labels on the
other. Probably the most prominent decomposition method is the tree decomposition of [22], originally developed for graphs, but also applicable to hypergraphs.
[10,7,16,17] provide an overview of more recent decomposition methods including
(generalized) hypertree decompositions, spread cut decompositions and fractional
hypertree decompositions. An important notion in most decompositions is their
width, which often ensures tractability if it is independent of the hypergraph
under consideration. Thus the two main complexity-theoretic problems usually
considered are the following:
– Decomposition problem: What is the complexity of recognizing hypergraphs admitting a decomposition of fixed width?
⋆ Work funded by EPSRC Grant EP/G055114/1 “Constraint Satisfaction for Configuration: Logical Fundamentals, Algorithms and Complexity”. G. Gottlob would also like to acknowledge the Royal Society Wolfson Research Merit Award.
P. Hliněný and A. Kučera (Eds.): MFCS 2010, LNCS 6281, pp. 42–54, 2010.
© Springer-Verlag Berlin Heidelberg 2010
– BCQ evaluation problem: What is the complexity of evaluating BCQs
for the class of queries with a decomposition of fixed width?
A particularly nice property for queries with bounded tree and (generalized)
hypertree width, thus also including acyclic queries, is that the BCQ evaluation problem is not only tractable, but also complete for the complexity class
LOGCFL [12]. This complexity class lies very low within the NC-AC hierarchy
(and hence within P) between NC1 and AC1 and hence is highly parallelizable. In [11] Gottlob et al. present a parallel algorithm for the BCQ evaluation problem1 which is optimal under its complexity restrictions, and whose
running time does not depend on the shape of the hypertree. The problem of
recognizing hypergraphs of bounded hypertree width is also in LOGCFL [12],2
however, most efficient sequential algorithms, see e.g. [15], compute the decomposition node by node in a top-down manner, following the shape of the resulting tree, which for most hypergraphs is often deep (linear in the size of the
hypergraph) and narrow (branching factor of 1 for most nodes). This has negative effects on the parallelization of such hypertree computation algorithms,
which is easiest when the computation tree is balanced, indicating a division
of the problem into smaller independent subproblems which can be conquered
recursively.
While looking for better, more “balanced”, algorithms, we decided to analyze
hypertrees which are balanced a priori, but are not necessarily valid (generalized) hypertree decompositions. These balanced decompositions constitute an
entirely new hypergraph decomposition method in its own right, and hence deserve further analysis. In particular they possess beneficial properties for parallelization, they capture wider classes of hypergraphs than other known decomposition methods, and provide more insight into the structure of NP-complete
problems.
In section 3 we provide the formal definition of balanced decompositions and
compare them to generalized hypertree decompositions. In section 4 we characterize balanced decompositions game-theoretically by defining the Robber and
Sergeants Game for hypergraphs. To better understand the complexity of the
decomposition and the BCQ evaluation problems, we define a new complexity
class DC1 in section 5 by limiting resources of Nondeterministic auxiliary Stack
Automata [18], and identify its lower bounds as LOGCFL and GC(log^2 n, NL)
in the Guess-and-Check model and its upper bound as NTiSp(n^{O(1)}, log^2 n), the
space-bounded subclass of NP. In section 6 and section 7 we show that recognizing hypergraphs of bounded balanced width (BW) is feasible in DC1, while
the BCQ evaluation problem for the class of queries of bounded balanced
width is complete for DC1 . We conclude the paper in section 8.
Omitted proofs can be found in the full version of this paper [1].
1 Actually, the algorithm was developed for acyclic BCQs, but can easily be adapted to generalized hypertree decompositions.
2 Unfortunately recognizing hypergraphs of generalized hypertree width at least 3 is NP-complete [14].
2 Preliminaries
All sets in this paper are finite.
We assume the reader to be familiar with the standard formalizations of rooted
and ordered trees. We use T to denote a tree, its node set and its “child function”,
we write Tp for a subtree rooted at a node p, O(T ) for the “root” of T , and ⊑T
for the “ancestor relation”, with O(T ) ⊑T p for all other p ∈ T .
A hypergraph is a tuple H = (V (H), E(H)) where V (H) is a set called the
vertices of H and E(H) ⊆ P(V(H)) \ {∅} is a set called the edges or hyperedges
of H. Given a hypergraph H and R, S ⊆ E(H), we say Q ⊆ R \ S is an [S]-component of R iff either Q = {e} with e ⊆ ⋃S, or for any two edges in Q there
exists a sequence (an [S]-path) of edges in Q between them, such that every two
consecutive edges share some vertex not covered by S.
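The definition of [S]-components can be illustrated with a small sketch. It assumes edges are represented as Python frozensets of vertices and that components are taken to be maximal (the definition above does not spell out maximality); the function name `s_components` is hypothetical.

```python
def s_components(R, S):
    """Split R minus S into [S]-components (edges are frozensets of vertices)."""
    S = set(S)
    covered = set().union(*S)            # vertices covered by S
    rest = [e for e in R if e not in S]
    components, seen = [], set()
    for start in rest:
        if start in seen:
            continue
        # Grow the class of edges reachable from `start` via [S]-paths,
        # i.e. through vertices that are not covered by S.
        comp, todo = {start}, [start]
        seen.add(start)
        while todo:
            e = todo.pop()
            for f in rest:
                if f not in seen and (e & f) - covered:
                    seen.add(f)
                    comp.add(f)
                    todo.append(f)
        components.append(comp)
    return components
```

An edge that is fully covered by S shares no uncovered vertex with any other edge, so it ends up as a singleton component, matching the first case of the definition.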
Given a hypergraph H, a hypertree for H is a triple (T, χ, λ), where T is a
rooted tree, and χ and λ are labeling functions which associate to each vertex
p ∈ T two sets χ(p) ⊆ V(H) and λ(p) ⊆ E(H).
Given p ∈ T we define χ(Tp) = ⋃{χ(q) | q ∈ Tp} and λ(Tp) = ⋃{λ(q) | q ∈ Tp}.
The width of a hypertree is maxp∈T |λ(p)|.
A hypertree decomposition is a hypertree satisfying the following conditions:
1. For all e ∈ E(H), there exists p ∈ T such that e ⊆ χ(p),
2. for all v ∈ V(H), the set {p ∈ T | v ∈ χ(p)} induces a connected subtree of T,
3. for each p ∈ T, χ(p) ⊆ ⋃λ(p),
4. for each p ∈ T, (⋃λ(p)) ∩ χ(Tp) ⊆ χ(p).
A generalized hypertree decomposition is a hypertree only satisfying the first
three of these conditions. The width of a (generalized) hypertree decomposition
is the width of its hypertree. The (generalized) hypertree width (GHW / HW)
of a hypergraph H is the minimal width over all its (generalized) hypertree
decompositions [12].
The monotone Robber and Marshals Game and its equivalence with hypertree
decompositions is studied in [13] and [3].
A Database is a relational structure over a schema (signature). A Boolean
Conjunctive Query (BCQ) is also a relational structure (over the same schema)
containing no constants. We call every tuple occurring in some relation also an
atom, and the objects of the base set occurring in the query or an atom its
variables. We write atoms(q) for the set of atoms of a query q and var(a) for the
set of variables occurring in an object a (e.g. an atom or a query). We say that
a database D satisfies a BCQ q (D |= q) iff there exists a homomorphism from
q to D.3
The underlying hypergraph of a BCQ q is the hypergraph H(q), with V(H(q)) = var(q) and E(H(q)) = {var(a) | a ∈ atoms(q)}. A decomposition of a BCQ is
simply a decomposition of its underlying hypergraph.
3 An alternative definition of databases and BCQs can be found e.g. in [2].
3 Balanced Decompositions
Definition 1. Let H be a hypergraph. A hypercut decomposition of H is a
hypertree (T, χ, λ) for H which satisfies the following conditions:
1. For each e ∈ E(H) there exists p ∈ T such that e ∈ λ(p),
2. for each Y ∈ V(H), the set {p ∈ T | Y ∈ χ(p)} contains its ⊑T-meet,
3. for each vertex p ∈ T, χ(p) = ⋃λ(p).
A hypercut decomposition is a shallow decomposition if additionally depth(T ) ≤
log |E(H)| holds. A hypercut decomposition is a balanced decomposition if additionally |λ(Tq )| ≤ |λ(Tp )|/2 holds for all p ∈ T, q ∈ T (p). The width of a
shallow decomposition or a balanced decomposition is the width of its hypertree.
The shallow width, resp. balanced width, of H is the minimum width over all its
shallow, resp. balanced, decompositions. We write SW(H) for the shallow width
and BW(H) for the balanced width of H.
Notice that a hypercut decomposition is uniquely defined by T and λ alone,
while the χ-labels are only required for the second condition. We do not define
a hypercut width, since then every hypergraph would trivially have width one.
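As an illustration of Definition 1, the following sketch checks whether a labelled tree is a balanced decomposition. It rests on several assumptions: edges are frozensets of vertices, the tree is given as a hypothetical `children` dictionary plus a `root`, the χ-labels are derived from λ as in condition 3, and the ⊑T-meet of a set of nodes is read as its deepest common ancestor.

```python
def subtree_edges(children, lam, p):
    """All edges occurring in lambda-labels of the subtree rooted at p."""
    edges = set(lam[p])
    for q in children.get(p, []):
        edges |= subtree_edges(children, lam, q)
    return edges

def is_balanced_decomposition(edges, root, children, lam):
    nodes = list(lam)
    parent = {q: p for p in children for q in children[p]}

    def path_to_root(p):
        path = [p]
        while path[-1] != root:
            path.append(parent[path[-1]])
        return path

    # Condition 3: chi(p) is the union of the edges in lambda(p).
    chi = {p: set().union(*lam[p]) for p in nodes}
    # Condition 1: every edge occurs in some lambda-label.
    if not all(any(e in lam[p] for p in nodes) for e in edges):
        return False
    # Condition 2: the nodes containing a vertex include their meet.
    for y in set().union(*edges):
        holders = [p for p in nodes if y in chi[p]]
        common = set(path_to_root(holders[0]))
        for p in holders[1:]:
            common &= set(path_to_root(p))
        meet = max(common, key=lambda p: len(path_to_root(p)))
        if meet not in holders:
            return False
    # Balance: each child subtree holds at most half of its parent's edges.
    for p in nodes:
        size_p = len(subtree_edges(children, lam, p))
        for q in children.get(p, []):
            if 2 * len(subtree_edges(children, lam, q)) > size_p:
                return False
    return True
```

For a path of three edges, placing the middle edge at the root and the outer edges in two leaves passes the check, whereas arranging the same labels as a chain fails the balance condition.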
Example 1. The balanced and shallow widths of the following hypergraph are
two and we also show a balanced decomposition, which is also shallow (in the
decomposition we only present the λ-labels):
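[Figure: the example hypergraph, with vertices labelled A–O and edges labelled a–v, next to a balanced decomposition of it. The λ-labels legible in the decomposition are {g, p}, {j, k}, {n, o}, {e, f}, {l, m}, {q, r}, {h, i} and the edges a, b, s, t, c, d, u, v.]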
For any such “grid graph” of width k and length m, it is easy to construct
a balanced decomposition of width k. The generalized hypertree width of such
hypergraphs, however, is always k+1, and any decomposition tree will have depth
at least m (linear in the size of the hypergraph), and will generally contain a
long “chain” with no branching.
Proposition 1. Let H be a hypergraph. The following holds:
SW (H) ≤ BW (H) ≤ GHW (H) ≤ SW (H) log |E(H)| .
A shallow tree will have “good” branching at some point, however this branching
does not have to occur at every internal node, as in a (perfectly) balanced tree.
Hence also the distinction between balanced and shallow decompositions, which
also capture slightly different classes of hypergraphs. For instance, every graph
consisting of a single cycle has shallow width one, whereas its balanced width
is always two. However, for all complexity-theoretic purposes, the distinction
between balanced and shallow decompositions is not relevant, since our results
apply to both of them.
4 Robber and Sergeants
As with many other decomposition methods for hypergraphs it helps to visualize
a decomposition in terms of a two-player game between a Robber and some Law
Enforcement Entity [24,13,16]. We shall define the Robber & k Sergeants game
on a hypergraph H (R&Sk (H)). It resembles the Robber and Marshals game
[13], but has two important differences: The robber is positioned on edges rather
than vertices (an escape space hence becomes a set of edges rather than a set of
vertices). Also, the sergeants only have to cover any edge once and it remains
covered for the rest of the game (the robber can never go to that edge again).
Hence the game is by definition monotone (the escape space can never increase).
Let H be a hypergraph, let k be a positive integer, and let A ⊆ E(H) such
that A is connected, be the initial escape space. The Robber and k Sergeants
Game from A (R&Sk (A)) is played by two players - R (the robber) and S (the
sergeants). Player S announces moves by choosing a set S of up to k edges
of A. If S covers the whole of A, player S wins. Otherwise, player R chooses
an [S]-connected component of A, say B. They then proceed to play the game
R&Sk (B). If the game is shallow, then player R wins R&Sk (A) if he can sustain
play for more than log |A| moves. If the game is balanced, then player R wins if
from any escape space A and a sergeants’ move S he can select an [S]-component
B of A such that |B| > |A|/2. The game R&Sk (H) is the game R&Sk (E(H))
(on the full hypergraph). Player S has a winning strategy, if for any possible
move of player R, he can still win the game. This leads to the formal definition
of a winning strategy:
Definition 2. Let k be a positive integer, let H be a connected hypergraph. A
winning strategy for R&Sk (H) is a tuple (T, ρ, λ), where T is a rooted tree and
ρ, λ : T → P(E(H)) are labelling functions (escape space and sergeants’ moves,
respectively) such that the following conditions hold:
1. Initial Condition: ρ(O(T)) = E(H).
2. Boundedness: For all t ∈ T, 1 ≤ |λ(t)| ≤ k.
3. Completeness: For all s ∈ T, ρ(s) = λ(s) ∪ ⋃_{t ∈ T(s)} ρ(t).
4. Separation: For all s ∈ T and t ≠ u ∈ T(s), ρ(t) ∩ ρ(u) = ∅.
5. Connectedness: For all s ∈ T, t ∈ T(s), e ∈ ρ(s), f ∈ ρ(t), e is [λ(s)]-connected to f in ρ(s) iff e ∈ ρ(t).
A winning strategy in the shallow R&Sk game additionally satisfies depth(T ) ≤
log |E(H)|. A winning strategy in the balanced R&Sk game additionally satisfies
|ρ(t)| ≤ |ρ(s)|/2, for all s ∈ T, t ∈ T (s).
The separation and connectedness conditions say that escape space labels of the
children of a node s are distinct [λ(s)]-components of ρ(s). The completeness
condition says that we include all such components, and also that for each node
s we have λ(s) ⊆ ρ(s).
Lemma 1. Let H be a hypergraph. There exists a k-width shallow/balanced decomposition of H iff there exists a winning strategy in the shallow/balanced R&Sk
game on H.
5 The DC Hierarchy
An auxiliary pushdown automaton (AuxPDA) is a generalization of both the
Turing Machine (TM) and the Pushdown Automaton (PDA): it possesses both
a tape and a pushdown. Adding a pushdown makes these machines more powerful than Turing Machines, since they admit recursive algorithms, which push
“temporary variables” before a recursive call and pop them after the call returns. For instance, a nondeterministic TM using simultaneously logarithmic
space and polynomial time (in NTiSp(n^{O(1)}, log n), see e.g. [20]) can solve problems precisely in NL. A nondeterministic AuxPDA (NauxPDA) with the same
time and space bound on the worktape can precisely solve problems in LOGCFL,
which contains the latter complexity class. We do not usually limit the space
on the pushdown (the maximal pushdown height), however, Ruzzo showed that
problems in LOGCFL only require O(log^2 n) space on the pushdown. We write
NTiSpPh(T(n), S(n), H(n)) for the class of problems solvable by a NauxPDA
which is simultaneously bounded by time O(T(n)), worktape space O(S(n)) and
maximal pushdown height O(H(n)).
A stack acts like a pushdown for writing (pushing and popping), but like a
tape for reading (any cell can be read). Thus, Stack Automata (SA), introduced
by Ginsburg et al. [8], are more powerful than PDAs. Analogously to extending
TMs with a pushdown, Ibarra proposed to do the same with SAs [18], yielding
the model of the auxiliary stack automaton (AuxSA). AuxSAs allow recursive
algorithms, just like AuxPDAs, but these algorithms additionally have access
to all previously computed temporary variables (the accumulated temporary
variables) and are thus more powerful. We write NTiSpSh(T(n), S(n), H(n))
for the class of problems solvable by a nondeterministic AuxSA (NauxSA) which is
simultaneously bounded by time O(T(n)), worktape space O(S(n)) and maximal
stack height O(H(n)).
Since a pushdown can be simulated by a stack, and a stack can in turn be
simulated by a worktape we have the following relationship of complexity classes:
NTiSpPh(T(n), S(n), H(n))
⊆ NTiSpSh(T(n), S(n), H(n))
⊆ NTiSp(T(n), max(S(n), H(n))).
Definition 3. For integers k ≥ 0, let DCk = NTiSpSh(n^{O(1)}, log n, log^{k+1} n).
This new hierarchy of complexity classes lies between NL = DC0 and NP. The
name suggests on the one hand the way an algorithm in such a particular class
might work (Divide and Conquer through recursion), and on the other hand the
maximal depth of recursive calls (O(log^k n) for DCk, each time storing O(log n)
cells on the stack). A single function call then is an NL algorithm which additionally can access previously computed temporary variables. In this paper we will
only consider the class DC1, allowing a logarithmic number of recursive calls.
In particular, log n recursive calls exactly allow at each call to divide an input
of length n into at least two parts each at most half as big until the parts have
constant size. We have LOGCFL ⊆ DC1 ⊆ NTiSp(n^{O(1)}, log^2 n).
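The stack bound behind DC1 can be spelled out as a short calculation. This is only a sketch: it assumes that every recursive call halves the instance and stores at most c log n cells, where the constant c is introduced here purely for illustration.

```latex
\[
  \frac{n}{2^{d}} \le 1 \iff d \ge \log_2 n ,
  \qquad
  \underbrace{\log_2 n}_{\text{recursion depth}}
  \cdot
  \underbrace{c \log n}_{\text{cells per call}}
  = O(\log^{2} n) .
\]
```

So a logarithmic recursion depth with logarithmically many cells per call fits within the stack height bound in the definition of DC1.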
We can use SAs to simulate the Guess-and-Check model of [5], by nondeterministically guessing an “advice string” and placing it on the stack. In particular, we have GC(log^{k+1} n, NL) ⊆ DCk, where the former class is the class
of languages for which an advice string of length O(log^{k+1} n) can be guessed
such that the original input plus the advice string can be decided in NL. Note
that GC(log^2 n, NL) is not known or believed to be contained in P, which
strongly indicates that neither is DC1.
We get the following inclusion diagram of complexity classes:
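[Figure: inclusion diagram over the classes NL = DC0, LOGCFL, GC(log^2 n, NL), DC1, NC2, P, NTiSp(n^{O(1)}, log^2 n), DSpace(log^2 n) and NP. Among the inclusions it depicts are those stated in the text: NL = DC0 ⊆ LOGCFL ⊆ DC1 ⊆ NTiSp(n^{O(1)}, log^2 n) and GC(log^2 n, NL) ⊆ DC1.]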
6 Membership in DC1
Winning strategies in the R&Sk game give us an easy way to find balanced decompositions. Consider the algorithm k-robber-sergeants:
Algorithm 1. k-robber-sergeants
  input Hypergraph H
  fixed parameter k: Integer
  check-win(firstEdge(H), |E(H)|)
  accept

  procedure check-win(Edge r, Integer size)
    guess Edge[1 . . . k] sergs
    for each Edge f ∈ E(H) do
      if connected(r, f) then
        Integer n := count-connected-edges(f, sergs)
        if n > size/2 then reject
        else if n > 0 then check-win(f, n)
Here the function count-connected-edges(e, sergs) counts the number of edges
[S]-connected to e, where S is the set of all sergeant hyperedges already on the
stack plus sergs. This can obviously be done in NL, and even in L by using the
undirected st-connectivity algorithm of [21].
Theorem 1. Let k be a positive integer, and H a hypergraph. The algorithm
k-robber-sergeants accepts iff a balanced decomposition of H of width at most k
exists. Moreover this algorithm is in DC1 .
k-robber-sergeants repeats a lot of work, however, this affects neither its correctness nor its complexity bounds. A deterministic algorithm would of course
trade space for time, avoiding any redundancy. It is an easy exercise to adapt
k-robber-sergeants to check for shallow decompositions instead.
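A deterministic counterpart can be sketched as an exhaustive search for a winning strategy in the balanced game. This is not the algorithm of the paper: it replaces the nondeterministic guess by enumeration, runs in exponential time, and is meant only for tiny hypergraphs. It assumes that edges are frozensets of vertices and that connectivity is taken with respect to all sergeants placed so far, as in the description of count-connected-edges; the names `components`, `sergeants_win` and `balanced_width` are hypothetical.

```python
from itertools import combinations

def components(A, covered):
    """Classes of edges in A linked through vertices outside `covered`."""
    A, seen, comps = list(A), set(), []
    for start in A:
        if start in seen:
            continue
        comp, todo = {start}, [start]
        seen.add(start)
        while todo:
            e = todo.pop()
            for f in A:
                if f not in seen and (e & f) - covered:
                    seen.add(f)
                    comp.add(f)
                    todo.append(f)
        comps.append(frozenset(comp))
    return comps

def sergeants_win(A, k, covered=frozenset()):
    """True iff the sergeants win the balanced game from escape space A."""
    A = frozenset(A)
    for size in range(1, min(k, len(A)) + 1):
        for move in combinations(A, size):       # enumerate instead of guessing
            rest = A - set(move)
            now_covered = covered | frozenset().union(*move)
            parts = components(rest, now_covered)
            # Every escape space must be at most half as big and winnable.
            if all(2 * len(B) <= len(A) and sergeants_win(B, k, now_covered)
                   for B in parts):
                return True
    return False

def balanced_width(edges):
    """Smallest k for which the sergeants win (terminates at k = |edges|)."""
    k = 1
    while not sergeants_win(edges, k):
        k += 1
    return k
```

On a cycle with four edges this search needs two sergeants, in line with the remark that a single cycle has balanced width two, while a path with three edges needs only one.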
As for the bounded BW BCQ evaluation problem, consider the algorithm
k-hd-bcq:
Algorithm 2. k-hd-bcq
  fixed parameter k: Integer
  input Database d
  input Query q with a k-width hypercut decomposition
  satisfiable(root(q))
  accept

  procedure satisfiable(Node u)
    for each Atom a ∈ λ(u) do        // at most k repetitions
      guess Tuple t ∈ table(d, name(a))
      for each (Atom, Tuple) (b, s) on stack do
        if not compatible((a, t), (b, s)) then reject
      push (a, t)
    for each Node v ∈ children(u) do satisfiable(v)
    pop all (a, t) pairs which were pushed during current function call
We assume that the tree of the hypercut decomposition is an ordered tree,
and that we can access its root using the function root, and that given a node u,
we can iterate through children(u) using a single pointer. Given an atom a the
function name(a) returns the “schema” s of that atom, and we can use table(d,
s) to access the appropriate table in d. When we call satisfiable recursively, we
assume that the current temporary variables (in this case only u and v) are
placed on the stack, and popped after the recursive call returns. We push and
pop the atom a and tuple t explicitly at every iteration within one call, because
we need them on the stack before satisfiable is called recursively. The function
compatible((a, t), (b, u)) checks whether tuples t and u are compatible under the
schemas of a and b, respectively, i.e. whether all shared variables of a and b have
the same values in t and u.
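The control flow of k-hd-bcq can be mimicked deterministically by backtracking over the guesses. The sketch below is an illustration under assumptions, not the paper's machine model: the database is a dictionary from relation names to sets of tuples, an atom is a pair (name, variables), the decomposition is given by hypothetical `children` and `lam` dictionaries, and the stack is passed along as an immutable tuple so that returning from a call corresponds to popping.

```python
def compatible(a, t, b, s):
    """Shared variables of atoms a and b take equal values in tuples t and s."""
    binding = dict(zip(b[1], s))
    return all(binding.get(x, val) == val for x, val in zip(a[1], t))

def satisfiable(u, children, lam, db, stack=()):
    """Backtracking version of procedure satisfiable from Algorithm 2."""
    def extend(atoms, stack):
        if not atoms:
            # All atoms of lambda(u) are on the stack: recurse into children.
            return all(satisfiable(v, children, lam, db, stack)
                       for v in children.get(u, []))
        a, rest = atoms[0], atoms[1:]
        for t in db[a[0]]:                      # try every tuple instead of guessing
            candidate = stack + ((a, t),)
            # Checking against itself also rejects tuples that assign two
            # values to a variable repeated within the atom.
            if all(compatible(a, t, b, s) for b, s in candidate):
                if extend(rest, candidate):
                    return True
        return False
    return extend(tuple(lam[u]), stack)
```

For the path query E(X, Y) ∧ E(Y, Z) ∧ E(Z, W) with the middle atom at the root, the search succeeds exactly when the edge relation contains a walk of length three.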
Theorem 2. Fix a positive integer k. Given a database D, a boolean conjunctive
query Q with associated hypergraph H and a hypercut decomposition (T, χ, λ) of
H of width at most k, the algorithm k-hd-bcq accepts iff D |= Q. Moreover, the
algorithm operates in NTiSpSh(n^{O(1)}, log n, d log n), where n is the size of the
whole input and d = depth(T).
Corollary 1. For queries of fixed (bounded) shallow or balanced width the BCQ
evaluation problem is in DC1 .
7 DC1-Completeness
To show hardness for DC1 of the BCQ evaluation problem for queries of fixed
balanced width we will need some preliminary results about NauxSAs. For any
AuxSA, AuxPDA, PDA or SA we define the push-pop tree to be an ordered tree
representing the sequence of the non-read stack operations of the machine. The
root of the tree represents an empty pushdown/stack, any one node represents
some distinct state of the pushdown/stack, and a new child is added to that
node whenever the machine pushes while the pushdown/stack is in that state.
For example, if the sequence is Push, Pop, Push, Push, Pop, Push, Pop, Push,
Pop, Pop, then the push-pop tree looks like this:
        ∗
       / \
      ∗   ∗
         /|\
        ∗ ∗ ∗
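The construction of a push-pop tree from an operation sequence can be sketched as follows. The representation is an assumption made for illustration: each node is simply the list of its children, and the function name `push_pop_tree` is hypothetical.

```python
def push_pop_tree(ops):
    """Build the push-pop tree of a sequence of "Push"/"Pop" operations.

    Each node is represented as the list of its children; the root
    corresponds to the empty stack.
    """
    root = []
    path = [root]                  # nodes from the root to the current stack state
    for op in ops:
        if op == "Push":
            child = []
            path[-1].append(child)  # a push adds a new child to the current node
            path.append(child)
        elif op == "Pop":
            if len(path) == 1:
                raise ValueError("pop on an empty stack")
            path.pop()              # a pop returns to the parent node
    return root

example = ["Push", "Pop", "Push", "Push", "Pop", "Push", "Pop", "Push", "Pop", "Pop"]
tree = push_pop_tree(example)       # root with two children, the second having three
```

For the example sequence from the text, the result is a root with two children, the first a leaf and the second with three leaf children.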
Definition 4. We call a NauxSA M regular if it has the following properties:
1. The push-pop tree of M is a full binary tree.
2. After every push the entire contents of the stack is read.
3. Whenever M doesn’t push, it acts deterministically (the only nondeterministic steps are the pushes).
Lemma 2. Let M be a NauxSA with stack size O(log^2 n), tape size O(log n)
running in polynomial time and deciding the language L. Then there exists a
regular NauxSA M′ with the same stack and tape sizes running in polynomial time
deciding L. Moreover, there exists a log-space reduction from M to M′.
Theorem 3. Let M be a regular NauxSA running in polynomial time, logarithmic space and with a push-pop tree of height k = O(log n) and x a string. Then
there exists a database B and a Boolean Conjunctive Query Q with a Balanced
Decomposition of width 8 such that B |= Q iff M accepts x. Moreover there exists
a log-space reduction from (M, x) to (B, Q).
Proof. The database B has the tables I(C), A(C), D(C1 , C2 ) and for each i,
1 ≤ i ≤ k, the tables Ui (C1 , S, C2 ), Ri (C1 , S, C2 ) and Oi (C1 , C2 ).4 The table I
contains the initial configuration of M as the only tuple. The table A contains all
accepting configurations of M . A tuple (c, d) is in D iff M starting in configuration c (deterministically) reaches configuration d without performing any stack
operations, and the next operation of M would be a stack operation. A tuple
(c, d) is in Oi iff M starting in configuration c would next perform a pop and
end up in configuration d. A tuple (c, s, d) is in Ui iff M starting in configuration
c would next push s into the ith cell and end up in configuration d. Similarly a
tuple (c, s, d) is in Ri iff M starting in configuration c would next read s from
the ith cell and end up in configuration d. All these tables can be computed in
log-space.
4 For a schema, T(A, B, C) here indicates that we have a table called T which has
three attributes (with “types” A, B and C), such that every tuple in that table has
exactly three elements.
Now we build the query Q which corresponds to the run of M , contracting
multiple consecutive deterministic steps into one atom: Let T be a full binary
tree of depth k. For a node N let p^j(N), l(N) and r(N) denote the j-th ancestor,
left child and right child of N, respectively. For each N ∈ T \ {O(T)}, define the
query Q^R_N in the following way: if depth_T(N) = 1, then Q^R_N = R_1(X^2_N, S_N, X^3_N),
otherwise (here d(N) abbreviates depth_T(N))
  Q^R_N = R_{d(N)}(X^2_N, S_N, Y^{d(N)}_N) ∧ D(Y^{d(N)}_N, Z^{d(N)}_N)
        ∧ R_{d(N)−1}(Z^{d(N)}_N, S_{p(N)}, Y^{d(N)−1}_N) ∧ D(Y^{d(N)−1}_N, Z^{d(N)−1}_N)
        ∧ R_{d(N)−2}(Z^{d(N)−1}_N, S_{p^2(N)}, Y^{d(N)−2}_N)
        ...
        ∧ D(Y^2_N, Z^2_N) ∧ R_1(Z^2_N, S_{p^{d(N)−1}(N)}, X^3_N).
Q^R_N now encodes the actions of the machine which read and process the contents
of the stack, after it reached the node N in the push-pop tree. The variables S_N
represent the strings already pushed to the stack, and the variables Y^i_N represent
the intermediate configurations between successive reads. Note how this query
will be “attached” into the “simulation” of the machine through its first and
last variables (corresponding to the starting and finishing configurations of the
“stack processing”).
For each N ∈ T, define a query Q_N in the following way:
If N is the root, then
  Q_N = D(X^1_N, X^2_N) ∧ U_1(X^2_N, S_{l(N)}, X^1_{l(N)}) ∧ D(X^7_{l(N)}, X^3_N)
      ∧ U_1(X^3_N, S_{r(N)}, X^1_{r(N)}) ∧ D(X^7_{r(N)}, X^4_N),
if N is a leaf, then
  Q_N = D(X^1_N, X^2_N) ∧ Q^R_N ∧ D(X^3_N, X^4_N) ∧ O_{d(N)}(X^4_N, X^7_N),
otherwise
  Q_N = D(X^1_N, X^2_N) ∧ Q^R_N
      ∧ D(X^3_N, X^4_N) ∧ U_{d(N)+1}(X^4_N, S_{l(N)}, X^1_{l(N)})
      ∧ D(X^7_{l(N)}, X^5_N) ∧ U_{d(N)+1}(X^5_N, S_{r(N)}, X^1_{r(N)})
      ∧ D(X^7_{r(N)}, X^6_N) ∧ O_{d(N)}(X^6_N, X^7_N).
Q_N now encodes all the actions of the machine after it reached the node N in
the push-pop tree. The variables X^i_N represent the configurations of M while it
has S_N pushed in the d(N)th stack cell. First it reads and processes the stack
(unless N is the root and the stack is empty), then it pushes to the stack and
“passes control” to the first child, then it pushes to the stack again and passes
control to the second child (unless N is a leaf and it does not have children),
and finally pops the stack and passes control to its parent (unless N is the
root). This passing of control is achieved through sharing of variables, which
correspond to the according configurations of the machine at any such point in
the computation.
Between all steps accessing the stack we also introduce a deterministic step.
In case the machine does not need any deterministic steps, this can be encoded
in the table D by having a tuple with two equal elements.
Finally define
  Q = I(X^1_{O(T)}) ∧ (⋀_{N ∈ T} Q_N) ∧ A(X^4_{O(T)}).
Here we glue all partial queries together, and we also require the first configuration to be the initial configuration of M , and the last configuration to be an
accepting configuration. A valid instantiation of the variables in Q corresponds
to a successful run of M. Hence B |= Q iff M accepts x.
The hypergraph corresponding to Q has a hyperedge for every atom in the
query, since they are all different. In particular, every subquery Q^R_N will correspond to 2·depth_T(N) − 1 hyperedges. Additionally we have 5 hyperedges for the
root, 3 hyperedges for each leaf, 7 hyperedges for every internal node, and 2 more
hyperedges for the overall query. Altogether we get 4(k + 1)2^k − 1 hyperedges.
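The hyperedge count can be checked mechanically for small k by generating the atoms of Q and counting them. The sketch below assumes the reading of the construction in which the superscripted variables are X^1_N to X^7_N, Y^i_N and Z^i_N; nodes of the full binary tree are encoded as strings over "l" and "r" (the root being the empty string), and all function names are hypothetical.

```python
from itertools import product

def v(name, node, i=None):
    """A query variable such as X^i_N, Y^i_N, Z^i_N or S_N, as a hashable tuple."""
    return (name, node, i)

def read_query(N):
    """Atoms of Q^R_N for a non-root node N."""
    d = len(N)                                   # d(N) is the depth of N
    S = lambda j: v("S", N[:d - j])              # S of the j-th ancestor of N
    if d == 1:
        return [("R1", v("X", N, 2), S(0), v("X", N, 3))]
    atoms = [(f"R{d}", v("X", N, 2), S(0), v("Y", N, d))]
    for j in range(d, 1, -1):
        atoms.append(("D", v("Y", N, j), v("Z", N, j)))
        target = v("Y", N, j - 1) if j > 2 else v("X", N, 3)
        atoms.append((f"R{j - 1}", v("Z", N, j), S(d - j + 1), target))
    return atoms

def node_query(N, k):
    """Atoms of Q_N for root, leaf and internal nodes of a tree of depth k."""
    d, l, r = len(N), N + "l", N + "r"
    X = lambda i, M=N: v("X", M, i)
    if d == 0:
        return [("D", X(1), X(2)), ("U1", X(2), v("S", l), X(1, l)),
                ("D", X(7, l), X(3)), ("U1", X(3), v("S", r), X(1, r)),
                ("D", X(7, r), X(4))]
    atoms = [("D", X(1), X(2))] + read_query(N) + [("D", X(3), X(4))]
    if d == k:
        return atoms + [(f"O{d}", X(4), X(7))]
    return atoms + [(f"U{d + 1}", X(4), v("S", l), X(1, l)), ("D", X(7, l), X(5)),
                    (f"U{d + 1}", X(5), v("S", r), X(1, r)), ("D", X(7, r), X(6)),
                    (f"O{d}", X(6), X(7))]

def build_query(k):
    """All atoms of Q for a full binary push-pop tree of depth k >= 1."""
    nodes = ["".join(p) for d in range(k + 1) for p in product("lr", repeat=d)]
    atoms = [("I", v("X", "", 1)), ("A", v("X", "", 4))]
    for N in nodes:
        atoms += node_query(N, k)
    return atoms
```

Counting the atoms produced by `build_query(k)` for small k and comparing with 4(k + 1)2^k − 1 gives a quick sanity check of the closed form under these assumptions.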
We can build a balanced decomposition in the following way: For every Q^R_N it
is easy to build an incomplete binary tree such that each node is labelled with
exactly one atom (D or R) and that the tree is a balanced decomposition of Q^R_N
of width 1, since Q^R_N is acyclic (ignoring the variables S_N which will be present
in the final tree already). Call each of these trees R_N. Now let T′ be a tree which
is like T, and label every N ∈ T with Q_N without Q^R_N. Now “merge” each R_N
with the corresponding node N of T′ by adding the label of the root of R_N to
the label of N and attaching the rest of the subtree to N. Also, add two more
nodes as children of the root, one labelled with the I-atom and the other with
the A-atom, to produce the final (T′, λ). There will be at most 8 atoms in every
label.
It is obvious that (T′, λ) is balanced, since every node has two or four children
and is built absolutely symmetrically. □
Corollary 2. The BCQ evaluation problem for queries of bounded balanced
width is complete for DC1 .
8 Determinization, Parallelization and Future Work
A standard technique to make a nondeterministic algorithm deterministic is
a brute-force search of the computation tree, trying out all nondeterministic
choices. In many cases this increases the time requirement, however not always
the space requirement. For the class DC1 this is also the case, in particular
since it is a subclass of DSpace(log2 n). The deterministic algorithm works its
way along the push-pop tree, backtracking its steps whenever some “nondeterministic” choice leads to rejection. Notice, that once a node in the push-pop
tree has several descendants, we can split the work to different processors, since
the results for the subtasks do not depend on each other. The only downside
is the amount of work that needs to be done at every such node, since there
are O(n^{O(1) log n}) possibilities for the contents of the stack (at the deepest level).
Hence the algorithm, even with parallelization, remains superpolynomial. It is
however still quasipolynomial.
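The quasipolynomial bound follows from a short calculation, sketched here under the assumption that the stack holds at most c log^2 n bits for some constant c (a symbol introduced only for this illustration):

```latex
\[
  2^{\,c \log_2^{2} n}
  = \bigl(2^{\log_2 n}\bigr)^{c \log_2 n}
  = n^{\,c \log_2 n}
  = n^{O(\log n)} .
\]
```

This grows faster than any polynomial in n but slower than any function of the form 2^{n^ε} with ε > 0, which is what is meant by quasipolynomial.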
Future Work includes a better analysis of relations between resource-bounded
(N)AuxSAs and other models of computation, in order to relate the complexity
classes in the DC hierarchy to other known complexity classes, in particular those
presented in [23], [9] and [4]. Another direction of work is to establish whether
the problem of recognizing hypergraphs of bounded BW is complete for DC1
or whether it belongs to a lower complexity class. Finally, an important aspect
of future work is the implementation and testing of the parallelization methods
described above in practice.
References
1. http://benner.dbai.tuwien.ac.at/staff/gottlob/DAGG-MFCS10.pdf
2. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison Wesley,
Reading (November 1994)
3. Adler, I.: Marshals, monotone marshals, and hypertree-width. Journal of Graph
Theory 47(4), 275–296 (2004)
4. Beigel, R., Fu, B.: Molecular computing, bounded nondeterminism, and efficient
recursion. In: Proceedings of the 24th International Colloquium on Automata,
Languages, and Programming, vol. 25, pp. 816–826 (1998)
5. Cai, L., Chen, J.: On the amount of nondeterminism and the power of verifying.
SIAM Journal on Computing 26, 311–320 (1997)
6. Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in
relational data bases. In: STOC 1977: Proceedings of the ninth annual ACM symposium on Theory of computing, pp. 77–90. ACM, New York (1977)
7. Cohen, D., Jeavons, P., Gyssens, M.: A unified theory of structural tractability
for constraint satisfaction and spread cut decomposition. In: IJCAI 2005: Proceedings of the 19th international joint conference on Artificial intelligence, pp. 72–77.
Morgan Kaufmann Publishers Inc., San Francisco (2005)
8. Ginsburg, S., Greibach, S.A., Harrison, M.A.: Stack automata and compiling. J.
ACM 14(1), 172–201 (1967)
9. Goldsmith, J., Levy, M.A., Mundhenk, M.: Limited nondeterminism (1996)
10. Gottlob, G., Leone, N., Scarcello, F.: A comparison of structural csp decomposition
methods. Artificial Intelligence 124(2), 243–282 (2000)
11. Gottlob, G., Leone, N., Scarcello, F.: The complexity of acyclic conjunctive queries.
J. ACM 48(3), 431–498 (2001)
12. Gottlob, G., Leone, N., Scarcello, F.: Hypertree decompositions and tractable
queries. Journal of Computer and System Sciences 64(3), 579–627 (2002)
13. Gottlob, G., Leone, N., Scarcello, F.: Robbers, marshals, and guards: Game theoretic and logical characterizations of hypertree width. J. Comput. Syst. Sci. 66(4),
775–808 (2003)
14. Gottlob, G., Miklos, Z., Schwentick, T.: Generalized hypertree decompositions: NP-hardness and tractable variants. In: PODS 2007: Proceedings of the twenty-sixth
ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems,
pp. 13–22. ACM Press, New York (2007)
15. Gottlob, G., Samer, M.: A backtracking-based algorithm for hypertree decomposition. J. Exp. Algorithmics 13 (2009)
16. Grohe, M., Marx, D.: Constraint solving via fractional edge covers. In: SODA
2006: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete
algorithm, pp. 289–298. ACM, New York (2006)
17. Hlineny, P., Oum, S.-i., Seese, D., Gottlob, G.: Width parameters beyond tree-width and their applications. The Computer Journal 51(3), 326–362 (2007)
18. Ibarra, O.H.: Characterizations of some tape and time complexity classes of turing
machines in terms of multihead and auxiliary stack automata. Journal of Computer
and System Sciences 5(2), 88–117 (1971)
19. Mackworth, A.: Consistency in networks of relations. Artificial Intelligence 8(1),
99–118 (1977)
20. Monien, B., Sudborough, I.H.: Bandwidth constrained np-complete problems. Theoretical Computer Science 41, 141–167 (1985)
21. Reingold, O.: Undirected st-connectivity in log-space. In: STOC 2005: Proceedings
of the thirty-seventh annual ACM symposium on Theory of computing, pp. 376–385. ACM, New York (2005)
22. Robertson, N., Seymour, P.: Graph minors. ii. algorithmic aspects of tree-width.
Journal of Algorithms 7(3), 309–322 (1986)
23. Ruzzo, W.L.: Tree-size bounded alternation. Journal of Computer and System
Sciences 21(2), 218–235 (1980)
24. Seymour, P., Thomas, R.: Graph searching and a min-max theorem for tree-width.
Journal of Combinatorial Theory, Series B 58(1), 22–33 (1993)