Acta Informatica 21, 47-60 (1984)
© Springer-Verlag 1984
Optimal Multiway Search Trees for Variable Size Keys
Jayme Luiz Szwarcfiter
Universidade Federal do Rio de Janeiro, Núcleo de Computação Eletrônica,
Caixa Postal 2324, 20.001 - Rio de Janeiro - RJ, Brasil
Summary. This paper considers the construction of optimal search trees for
a sequence of n keys of varying sizes, under various cost measures. Constructing optimal search cost multiway trees is NP-hard, although it can be
done in pseudo-polynomial time $O(n^3 L)$ and space $O(n^2 L)$, where L is the
page size limit. An optimal space multiway search tree is obtained in $O(n^3)$
time and $O(n^2)$ space, while an optimal height tree is obtained in $O(n^2 \log^2 n)$ time and
$O(n)$ space, both having additionally minimal root sizes. The monotonicity
principle does not hold for the above cases. Finding optimal search cost
weak B-trees is NP-hard, but a weak B-tree of height 2 and minimal root
size can be constructed in $O(n \log n)$ time. In addition, if its root is restricted to contain M keys then a different algorithm is applied, having time
complexity $O(nM \log n)$. The latter solves a problem posed by McCreight.
1. Introduction
Multiway trees for variable size keys were discussed by Knuth [10] and
McCreight [11]. The present paper considers the construction of some optimal
trees of this kind. We show that the problem of constructing an optimal cost
(i.e. search cost) multiway search tree is NP-hard. However, it can be solved in
pseudo-polynomial time $O(n^3 L)$ and space $O(n^2 L)$, where L is the given page
size limit. Optimal space multiway search trees are constructed in $O(n^3)$ time
and $O(n^2)$ space, while the optimal height problem is solved in time
$O(n^2 \log^2 n)$ and space $O(n)$. These two algorithms are described by a common
formulation and find optimal trees with minimal root sizes. Next it is shown
that the problem of constructing an optimal cost weak B-tree is NP-hard,
although it also admits a pseudo-polynomial time solution. Weak B-trees are
similar to B-trees, except that the lower and upper page limits can be independent. We then describe an $O(n \log n)$ time algorithm for finding a weak
B-tree of height 2 and minimal root size. If additionally the root is restricted to
be formed by a given number M of keys then a different process is applied,
requiring $O(nM \log n)$ time and $O(n)$ space. This algorithm solves a problem
posed by McCreight in [11], where polynomial time algorithms were described
only for the cases M = 2 and 3.
As for fixed size keys, an algorithm to construct an optimal cost binary
search tree in $O(n^3)$ time was described by Gilbert and Moore [3]. Knuth [9]
presented a similar algorithm but additionally introduced gap weights and
proved a monotonicity principle that decreased the time complexity to $O(n^2)$.
Garey [2] included a height restriction in the trees. The space bound for these
algorithms is $O(n^2)$. Vaishnavi, Kriegel and Wood [12] and Gotlieb [4] described optimal cost multiway search tree algorithms, both having $O(n^3 t)$ time
and $O(n^2 t)$ space complexities, where t is the maximum number of keys per
node. Gotlieb [4] and Gotlieb and Wood [5] showed that the monotonicity
principle does not extend to multiway search trees. However, it does apply
when the gap weights are absent [4], which reduces the running time in this
case to $O(n^2 t)$. Finally, Itai [7] described a general technique for reducing the
factor t to log t in the above time complexities.
Basically, the monotonicity principle is the fact that the rightmost key in
the root of an optimal tree does not need to move left (right) when a new key
is added at the right (left) of the key sequence. This restricts the number of
candidates to consider for the root of the new optimal tree. Examples show
that this property does not hold for the present multiway search tree problems.
Unlike the fixed size key case [4], it fails also for the search cost criterion with
no gap weights.
2. Preliminaries
Let $E = (e_1, \ldots, e_n)$ be a sequence of elements called keys, each $e_i$ having a
positive integer size $s_i$ and an arbitrary finite value $y_i$, satisfying $y_i < y_{i+1}$,
$1 \le i < n$. A multiway search tree for E is an ordered rooted tree T such that
each key of E is assigned to exactly one node of T, while each node x keeps a
subset E(x) of keys satisfying:
(i) $E(x) = \emptyset \Leftrightarrow x$ is a leaf,
(ii) each non-leaf x has exactly $|E(x)| + 1$ sons, and
(iii) if y is the k-th son of x in the ordering of T and $e_j$ is an arbitrary key
of E(y) then exactly $k - 1$ among the keys $e_i \in E(x)$ satisfy $y_i < y_j$.
The $n + 1$ leaves of T are called gaps and denoted $g_0, \ldots, g_n$, respectively.
The size of a node x is the sum of the sizes of the keys of E(x). Letting L be an
integer, then T has page limit (or limit) L whenever $\mathrm{size}(x) \le L$, for any node x
of T. The level of x is the number of nodes in the path between x and the root
of T. The height of T is the maximum level among the non-leaf nodes and its
space is the total number of non-leaf nodes.
Suppose there are associated to E non-negative real key weights $p_1, \ldots, p_n$
and gap weights $q_0, \ldots, q_n$. Let $W = \sum_{1 \le i \le n} p_i + \sum_{0 \le j \le n} q_j$ and define $y_0 = -\infty$ and
$y_{n+1} = +\infty$. Then $p_i/W$ and $q_j/W$ are the probabilities that the search argument
has value $y_i$ and a value strictly between $y_j$ and $y_{j+1}$, respectively. The search
cost (or cost) of T is the sum
$$\sum_{1 \le i \le n} p_i \,\mathrm{level}(x(e_i)) + \sum_{0 \le j \le n} q_j \,(\mathrm{level}(g_j) - 1),$$
where $x(e_i)$ is the node of T containing key $e_i$.
Let $L_1, L_2$ be integers, $0 < L_1 < L_2$. A weak B-tree [6] of limits $(L_1, L_2)$ is a
multiway search tree T of limit $L_2$ such that:
(i) $\mathrm{size}(x) \ge L_1$, for any non-leaf $x \ne \mathrm{root}(T)$, and
(ii) all leaves of T have the same level.
A B-tree [1] is a weak B-tree with $L_1 = \lceil L_2/2 \rceil$.
If the sizes of the keys are fixed then consider $s_i = 1$, $1 \le i \le n$. For this case,
all known optimal cost algorithms employ dynamic programming using the
following decomposition. Let $\langle e_i, \ldots, e_j \rangle$ be the key sequence and T the corresponding optimal cost tree of limit M. The problem of finding T is decomposed into the two subproblems of finding optimal cost trees $T_L$ and $T_R$ of limits
M for $\langle e_i, \ldots, e_{k-1} \rangle$ and $\langle e_{k+1}, \ldots, e_j \rangle$, respectively. Suppose the root r of T
consists of m keys, $1 \le m \le M$ (Fig. 1).
Case 1. $m = 1$. Then $e_k$ is the key in r.
Case 2. $m > 1$. Then $e_k$ is the rightmost key in r and the root of $T_L$ is restricted
to $m - 1$ keys.
[Figure with two panels, Case 1 and Case 2, showing the tree T and the subtree $T_L$]
Fig. 1. The decomposition rule
The cost of T can be computed from $e_k$ and the costs of $T_L$ and $T_R$.
This decomposition was first used in [3] for binary trees (Case 1 only) and
generalized in [7].
3. Optimal Cost Multiway Search Trees
In this section we use optimal tree to mean an optimal cost multiway search
tree. The problem of constructing an optimal tree for a given key sequence is
shown to be NP-hard, but a pseudo-polynomial time algorithm is described.
The NP-completeness proof is a simple transformation from the partition
problem [8]. An instance for partition is a set A of elements, each one with a
non-negative integer value. The question is to decide whether A can be partitioned into two subsets both having the same sum of values of their elements.
Let $E = (e_1, \ldots, e_n)$ be a sequence of keys with sizes $s_i$, key weights $p_i$ and
gap weights $q_j$, $1 \le i \le n$ and $0 \le j \le n$. Let L and C be positive integers, $L \ge s_i$,
$1 \le i \le n$.
Theorem 1. Deciding whether there exists a multiway search tree for E having
limit L and cost $\le C$ is NP-complete. It remains so even if all gap weights are
zero and each key size equals the corresponding key weight.
Proof. Consider an arbitrary instance of the partition problem, namely a set
$A = \{a_1, \ldots, a_n\}$, where each $a_i$ has a non-negative integer value $v_i$. Denote
$\frac{1}{2} \sum_{a_j \in A} v_j$ by b. Define a key sequence $E = (e_1, \ldots, e_n)$ such that
$$s_i = p_i = v_i, \quad 1 \le i \le n,$$
and
$$q_j = 0, \quad 0 \le j \le n.$$
It follows that there exists a subset $A' \subseteq A$ satisfying $\sum_{a_j \in A'} v_j = b$ iff there exists a
multiway search tree for E, having limit b and cost $\le 3b$. Such a tree would
have height 2 and the subset of keys in the root would be in one-to-one
correspondence with $A'$. □
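The bound 3b in the proof can be checked with a short calculation. The following is only a sketch of that reasoning; the symbol r (the total weight of the keys placed in the root) is introduced here for illustration and does not appear in the paper. Since sizes equal weights and the limit is b, the root carries weight $r \le b$, while every other key lies at level 2 or deeper:

```latex
W = \sum_{1 \le i \le n} p_i = 2b, \qquad r \le b, \qquad
\mathrm{cost}(T) \;\ge\; 1 \cdot r + 2\,(2b - r) \;=\; 4b - r \;\ge\; 3b .
```

Equality holds exactly when $r = b$ and all remaining keys are at level 2, which is the height 2 tree whose root keys correspond to $A'$.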
However, an optimal tree can be constructed in pseudo-polynomial time.
Let $E = (e_1, \ldots, e_n)$ be a key sequence as above and L the limit of the desired
optimal tree T for E.
For $0 \le i < j \le n$ and $0 \le m \le L$ define
$$w(i,j) = \sum_{i < k \le j} p_k + \sum_{i \le k \le j} q_k,$$
and
$$\alpha(i,j,m) = \begin{cases} \infty, & \text{when } s_k > m \text{ for all } k,\ i < k \le j. \text{ Otherwise} \\ \text{the cost of the optimal tree of limit } L \text{ for } (e_{i+1}, \ldots, e_j) \\ \text{such that the root has size} \le m. \end{cases}$$
For $0 \le i \le n$ and $0 \le m \le L$ define
$$w(i,i) = 0,$$
and
$$\alpha(i,i,m) = q_i.$$
Clearly, $\alpha(0, n, L)$ is the cost of the desired optimal tree. Applying the decomposition rule of Sect. 2, let $e_k$ be the rightmost key in the root of T. Then $T_R$ is an
optimal tree of limit L. So is $T_L$, except that in Case 2 its root is restricted to
size at most $m - s_k$. Therefore $\alpha(0, n, L)$ can be obtained by the following
computation.
For $0 \le i < j \le n$ and $0 \le m \le L$, let
$$\alpha(i,j,m) = \begin{cases} \infty, & \text{when } s_k > m \text{ for all } k,\ i < k \le j. \text{ Otherwise} \\ \min\limits_{\substack{i < k \le j \\ s_k \le m}} \{ \min[\alpha(i, k-1, m - s_k),\ \alpha(i, k-1, L) + w(i, k-1)] \\ \qquad\qquad + p_k + \alpha(k, j, L) + w(k, j) \}. \end{cases}$$
The above algorithm requires $O(n^3 L)$ time and $O(n^2 L)$ space. The time
bound cannot be reduced by an application of the monotonicity principle. For
example, in Fig. 2, $e_2$ is the rightmost key in the root of the optimal tree for
the key sequence $\langle e_2, e_3 \rangle$. When adding $e_1$ at the left the rightmost key moves
right.
[Figure with two panels: the optimal cost tree of page limit 2 for $\langle e_2, e_3 \rangle$, and the optimal cost tree of page limit 2 for $\langle e_1, e_2, e_3 \rangle$]
Key sequence:
i      0   1   2   3
s_i    -   1   2   1
p_i    -   2   2   1
q_i    0   0   0   0
Fig. 2. Failure of the monotonicity principle for the cost criterion
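The recurrence for $\alpha(i,j,m)$ given above can be sketched directly as a memoized recursion. The following Python code is a minimal illustration, not the paper's own implementation: the function name `optimal_cost`, the list-based argument layout and the use of prefix sums for $w(i,j)$ are assumptions made here. It follows the two cases of the decomposition rule and is checked against the key sequence of Fig. 2.

```python
import math
from functools import lru_cache

def optimal_cost(sizes, p, q, L):
    """Cost alpha(0, n, L) of an optimal cost multiway search tree of limit L.

    sizes[k-1] and p[k-1] hold s_k and p_k for k = 1..n; q[j] holds q_j, j = 0..n.
    (Hypothetical helper written for illustration.)
    """
    n = len(sizes)
    s = [0] + list(sizes)
    pw = [0] + list(p)
    # Prefix sums so that w(i, j) is evaluated in constant time.
    P = [0] * (n + 1)
    Q = [0] * (n + 2)
    for k in range(1, n + 1):
        P[k] = P[k - 1] + pw[k]
    for j in range(n + 1):
        Q[j + 1] = Q[j] + q[j]

    def w(i, j):
        # w(i, i) = 0; otherwise p_{i+1..j} plus q_{i..j}, as defined in the text.
        return 0 if i == j else (P[j] - P[i]) + (Q[j + 1] - Q[i])

    @lru_cache(maxsize=None)
    def alpha(i, j, m):
        if i == j:
            return q[i]  # alpha(i, i, m) = q_i, as defined in the text
        best = math.inf  # stays infinite when no key fits in size m
        for k in range(i + 1, j + 1):  # e_k is the rightmost key of the root
            if s[k] > m:
                continue
            left = min(alpha(i, k - 1, m - s[k]),         # Case 2: roots merge
                       alpha(i, k - 1, L) + w(i, k - 1))  # Case 1: T_L is a son
            best = min(best, left + pw[k] + alpha(k, j, L) + w(k, j))
        return best

    return alpha(0, n, L)

# Data of Fig. 2 (page limit 2).
print(optimal_cost([2, 1], [2, 1], [0, 0, 0], 2))            # keys e2, e3
print(optimal_cost([1, 2, 1], [2, 2, 1], [0, 0, 0, 0], 2))   # keys e1, e2, e3
```

With the Fig. 2 data, the optimum for $\langle e_2, e_3 \rangle$ puts $e_2$ alone in the root, while the optimum for $\langle e_1, e_2, e_3 \rangle$ puts $e_1$ and $e_3$ in the root, which is the failure of monotonicity described in the text.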
4. Optimal Height or Space Multiway Search Tree
Throughout this section, an optimal tree means either an optimal height or
space multiway search tree, according to the desired minimization criterion.
An algorithm is described for finding an optimal tree for a given key
sequence and limit. Among the possible optimal trees, the algorithm chooses
one with minimal root size. It uses the decomposition of Sect. 2, as follows:
Let T be an optimal tree of limit L, having space S and height H. Suppose
T is space optimal. Then $T_R$ is space optimal and of limit L. Let $S'$ be the
space of $T_R$. In Case 1, $T_L$ is space optimal, has limit L and space $S - S' - 1$.
Similarly in Case 2, except that the root is restricted to size at most $L - s_k$ and
its space is $S - S'$. Suppose now T is height optimal. Then $T_L$ has height at
most $H - 1$ in Case 1 and at most H otherwise. The height of $T_R$ is always no
more than $H - 1$. Clearly, at least one of $T_L$ and $T_R$ is height optimal, but we
can restrict the search to the case in which both are.
A quasi multiway search tree of limit L is a multiway search tree of limit L,
except for the root whose size is unbounded. A multiway (or quasi multiway)
search tree has parameter z when its height or space is z, respectively according
to the case in consideration. Denote by Z the parameter of the optimal tree.
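Let $E = (e_1, \ldots, e_n)$, each key $e_i$ having size $s_i$. For given L and $z > 0$ define:
$$\alpha(i,j,z) = \begin{cases} 0, & \text{when } i > j. \text{ Otherwise} \\ \text{the minimal size of the root of a quasi multiway search tree} \\ \text{of limit } L \text{ and parameter} \le z \text{ for the key sequence } (e_i, \ldots, e_j). \end{cases}$$
When $z \le 0$ define $\alpha(i,j,z) = \infty$ for all i, j. Now let
$$\sigma(i,j,z) = \begin{cases} \infty, & \text{when } \alpha(i,j,z') > L \text{ for all } z',\ 1 \le z' \le z. \text{ Otherwise} \\ \min\limits_{1 \le z' \le z} \{ z' \mid \alpha(i,j,z') \le L \}. \end{cases} \qquad \text{(i)}$$
$$\beta(i,j,z) = \begin{cases} \infty, & \text{when } \sigma(i,j,z) = \infty. \text{ Otherwise} \\ 0. \end{cases} \qquad \text{(ii)}$$
In other words, $\sigma(i,j,z)$ equals the parameter of the optimal tree of limit L
for $(e_i, \ldots, e_j)$, provided it is $\le z$. In this case, $\beta(i,j,z) = 0$. If the parameter of
this tree is greater than z then $\sigma(i,j,z) = \beta(i,j,z) = \infty$.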
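The computation of $\alpha$ is as follows.
For $1 \le i \le j \le n$,
$$\alpha(i,j,1) = \sum_{i \le k \le j} s_k. \qquad \text{(iii)}$$
Then for $1 \le i \le j \le n$ and $z = 2, 3, \ldots$
$$\alpha(i,j,z) = \min_{i \le k \le j} \{ \min[\alpha(i, k-1, f(k,j,z)),\ \beta(i, k-1, f(k,j,z) - 1)] + s_k \}, \qquad \text{(iv)}$$
where
$$f(k,j,z) = \begin{cases} z - \beta(k+1, j, z-1) & \text{for height minimization} \\ z - \sigma(k+1, j, z-1) & \text{for space minimization.} \end{cases} \qquad \text{(v)}$$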
The process starts by computing (iii), i.e. the value of each $\alpha$ for $z = 1$. Then
it proceeds to (iv) for $z > 1$. It stops at the least z such that $\sigma(1, n, z) < \infty$. The
terminating z satisfies $z = Z$. Clearly, $\sigma(1, n, Z) = Z$ and $\alpha(1, n, Z)$ is the (minimal) root size of the final optimal tree. After each $\alpha$ is calculated, the corresponding $\sigma$ is evaluated by (i) and then $\beta$ by (ii). All computations are common
for both height and space minimization, except f. Each computation of $\alpha(i,j,z)$
by (iv) involves the evaluation by (v) of the $f(k,j,z)$ function, for $i \le k \le j$. Each
evaluation of $\sigma$, $\beta$ or f can be done in constant time, provided some previously
computed values were kept. Therefore the algorithm can be implemented in
$O(n^3 Z)$ time and $O(n^2 Z)$ space.
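For height minimization, the computation by (iii), (iv) and (v) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `min_height_tree` is hypothetical, memoization replaces explicit tables, and $\beta(i,j,z)$ is evaluated as "$\alpha(i,j,z) \le L$", which relies on $\alpha$ being non-increasing in z (a tree of parameter $\le z$ also has parameter $\le z + 1$). The sketch covers only the height case of f.

```python
import math
from functools import lru_cache

def min_height_tree(sizes, L):
    """Return (Z, root_size): optimal height and minimal root size for limit L."""
    n = len(sizes)
    if n == 0 or max(sizes) > L:
        raise ValueError("need at least one key, and every key must fit in a page")
    s = [0] + list(sizes)
    prefix = [0] * (n + 1)
    for k in range(1, n + 1):
        prefix[k] = prefix[k - 1] + s[k]

    @lru_cache(maxsize=None)
    def alpha(i, j, z):
        if z <= 0:
            return math.inf
        if i > j:
            return 0
        if z == 1:
            return prefix[j] - prefix[i - 1]        # equation (iii)
        best = math.inf
        for k in range(i, j + 1):                   # e_k: rightmost root key
            if beta(k + 1, j, z - 1) == math.inf:   # f(k, j, z) = -infinity
                continue
            # Here f(k, j, z) = z, so (iv) reads as follows.
            left = min(alpha(i, k - 1, z), beta(i, k - 1, z - 1))
            best = min(best, left + s[k])
        return best

    def beta(i, j, z):
        # Assumption: alpha is non-increasing in z, so sigma(i, j, z) is
        # finite exactly when alpha(i, j, z) <= L.
        return 0 if alpha(i, j, z) <= L else math.inf

    z = 1
    while alpha(1, n, z) > L:   # stop at the least z with sigma(1, n, z) finite
        z += 1
    return z, alpha(1, n, z)

# Key sequences of Fig. 3 (page limit 4).
print(min_height_tree([2, 2, 2, 2, 1], 4))
print(min_height_tree([2, 2, 2, 2, 2, 1], 4))
```

The loop always terminates because a tree with one key per node has height at most n.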
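Lemma 1.
$\sigma(i,j,z) = \infty \Rightarrow \sigma(i,j+1,z) = \infty$,
$\sigma(i,j,z) < \infty \Rightarrow \sigma(i,j,z+1) = \sigma(i,j,z)$, and
$\sigma(i,j,z) < \infty$ and $\sigma(i,j+1,z) = \infty \Rightarrow \sigma(i,j,z) = z$
and
$\sigma(i,j+1,z+1) = z+1$.
The proof is straightforward.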
By using Lemma 1 it is possible to improve the algorithm. Observe that
whenever $\sigma(i,j,z) = \infty$ there is no need to compute $\alpha(i,j',z)$ for $j' > j$, since
$\sigma(i,j',z) = \infty$. Also, when $\sigma(i,j,z) < \infty$ we can avoid the computation of $\alpha(i,j,z')$
for $z' > z$, since $\sigma(i,j,z') = \sigma(i,j,z)$. Finally, if $\sigma(i,j,z) < \infty$ and $\sigma(i,j+1,z) = \infty$,
necessarily $\alpha(i,j+1,z) > L$ and $\alpha(i,j+1,z+1) \le L$. Consequently, for each pair
i, j such that $1 \le i \le j \le n$, $\alpha$ does not need to be computed more than twice
(obtaining a value $\alpha > L$ at most once and $\alpha \le L$ exactly once).
The above observations lead to the following formulation.
Height or Space Minimization Algorithm
In the initial step, let $z = 0$ and for $1 \le i \le n$ define each key $e_i$ as unfinished and
$j(i) = i$. In the general step, if there are no unfinished keys the process terminates. Otherwise label as unlocked each still unfinished key, increase z by
one, perform the locking procedure below and repeat the general step.
Locking Procedure
Verify if there are still unlocked keys. If there are none, the procedure terminates.
Otherwise, choose arbitrarily an unlocked key $e_i$ and compute $\alpha(i, j(i), z)$ and
the corresponding $\sigma$ and $\beta$. Check whether $\sigma(i, j(i), z) < \infty$. In the affirmative
case, increase $j(i)$ by one and if $j(i)$ becomes $n + 1$ redefine $e_i$ as both locked and
finished. When $\sigma(i, j(i), z) = \infty$ just relabel $e_i$ as locked (but still unfinished). In
any case, repeat the locking procedure.
The new algorithm runs in $O(n^3)$ time and $O(n^2)$ space.
Now, let us restrict to height minimization, i.e. consider z as the height. The
following definitions are useful. For $z > 0$,
$\mathrm{right}_z(e_i) = \max k$, $i \le k \le n$, such that the optimal tree for $(e_i, \ldots, e_k)$
has height $\le z$.
$\mathrm{left}_z(e_j) = \min k$, $1 \le k \le j$, such that the optimal tree for $(e_k, \ldots, e_j)$
has height $\le z$.
The values of $\mathrm{right}_z$ and $\mathrm{left}_z$ are clearly not independent. It follows that
$\mathrm{left}_z(e_j) = \min k$,
$1 \le k \le j$, such that $\mathrm{right}_z(e_k) \ge j$.
Therefore, given $\mathrm{right}_z(e_i)$ for each i, $1 \le i \le n$, all the $\mathrm{left}_z$ values can be
computed in $O(n)$ time.
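One way to obtain all $\mathrm{left}_z$ values from the $\mathrm{right}_z$ values in linear time is a two-pointer scan. The sketch below is an illustration only: the function name `left_from_right` and the array layout are assumptions, and it relies on $\mathrm{right}_z(e_k)$ being non-decreasing in k and satisfying $\mathrm{right}_z(e_k) \ge k$, which the paper does not state explicitly at this point.

```python
def left_from_right(right):
    """Compute left[j] = min k with right[k] >= j, for j = 1..n.

    right[k] is right_z(e_k) for k = 1..n; index 0 is unused.
    Assumes right is non-decreasing and right[k] >= k (hypothetical helper).
    """
    n = len(right) - 1
    left = [0] * (n + 1)
    k = 1
    for j in range(1, n + 1):
        # Advance k until the optimal tree for (e_k, ..., e_j) fits in height z.
        while right[k] < j:
            k += 1
        left[j] = k
    return left

print(left_from_right([0, 2, 2, 4, 4]))
```

The pointer k only moves forward, so the scan performs O(n) steps in total.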
For $1 \le i \le k \le j \le n$ and height $z > 1$ define the candidate $Q_k = Q_k(i,j,z)$ with
value $*Q_k(i,j,z)$ given by
$$*Q_k(i,j,z) = \min \{ \alpha(i, k-1, f(k,j,z)),\ \beta(i, k-1, f(k,j,z) - 1) \} + s_k$$
and let
$$Q(i,j,z) = \{ Q_k(i,j,z) \mid *Q_k(i,j,z) < \infty,\ i \le k \le j \}.$$
Then (iv) can be rewritten as
$$\alpha(i,j,z) = \min \{ *Q_k \mid Q_k \in Q(i,j,z) \}.$$
In other words, the candidates are the operands of the minimization (iv)
and consequently $\alpha(i,j,z)$ equals the minimal value among the candidates
$Q_k(i,j,z)$.
Lemma 2. Let $1 \le i \le j < n$ and height $z > 1$. If $\beta(i,j,z) = 0$ then $Q(i,j+1,z)$ can be
constructed from $Q(i,j,z)$ as follows:
$$Q(i,j+1,z) = [Q(i,j,z) \cup \{ Q_{j+1}(i,j+1,z) \}] - EX(i,j,z),$$
where $EX(i,j,z) = \{ Q_k(i,j,z) \mid \beta(k+1, j+1, z-1) = \infty,\ i \le k \le j \}$.
Proof. Since $\beta(i,j,z) = 0$, $\alpha(i,j,z) \le L$ and therefore
$*Q_{j+1}(i,j+1,z) = \min \{ \alpha(i,j,z),\ \beta(i,j,z-1) \} + s_{j+1} < \infty$. Then $Q_{j+1}(i,j+1,z) \in Q(i,j+1,z)$. As for
the exclusions (the candidates of EX), if $\beta(k+1, j+1, z-1) = \infty$ then
$Q_k(i,j+1,z) \notin Q(i,j+1,z)$, $i \le k \le j$. When $\beta(k+1, j+1, z-1) = 0$ it follows that
$\beta(k+1, j, z-1) = 0$ and since $\beta(i, k-1, z) = 0$ we conclude that $*Q_k(i,j,z) = *Q_k(i,j+1,z)$.
The latter corresponds to the common candidates of $Q(i,j,z)$ and $Q(i,j+1,z)$. □
Lemma 3. Let $1 \le i \le j < n$ and height $z \ge 1$. If $\beta(i,j,z) = 0$ and $\beta(i,j+1,z) = \infty$
then
$$Q(i,j+1,z+1) = \{ Q_k \mid *Q_k = s_k,\ \mathrm{left}_z(e_{j+1}) \le k \le j+1 \}.$$
Proof. If $k < \mathrm{left}_z(e_{j+1})$ then $\beta(k+1, j+1, z) = \infty$, consequently $*Q_k = \infty$ and
$Q_k \notin Q(i,j+1,z+1)$. In addition, if $k > j+1$ then $Q_k$ is not a candidate of
$Q(i,j+1,z+1)$. Suppose now $\mathrm{left}_z(e_{j+1}) \le k \le j+1$. Because $\beta(i,j,z) = 0$, $\beta(i, k-1, z) = 0$.
Because $\beta(k+1, j+1, z) = 0$, $\beta(k+1, j, z) = 0$. Therefore
$*Q_k(i,j+1,z+1) = \min \{ \alpha(i, k-1, z+1),\ \beta(i, k-1, z) \} + s_k = s_k$ and $Q_k \in Q(i,j+1,z+1)$. □
A possibility for further improving the height minimization algorithm is to
use a priority queue to contain the sets $Q(i,j,z)$. The central point then
becomes updating the queue. We next describe a method for it.
A simple change in the locking procedure is to make the choice of the
unlocked key no longer arbitrary. Instead, we shall always select the unlocked
key $e_i$ with maximum i. Suppose such $e_i$ has been chosen and that $\alpha(i,j,z)$,
$j < n$, has been calculated obtaining $\beta(i,j,z) = 0$. We now follow the next computations.
Initially, since $\beta(i,j,z) = 0$, $e_i$ remains unlocked and is again chosen in the
locking procedure. We prepare the computation of $\alpha(i,j+1,z)$. Use Lemma 2
to obtain $Q(i,j+1,z)$ from $Q(i,j,z)$. This corresponds to inserting in the queue
the candidate of value $*Q_{j+1}(i,j+1,z) = \min \{ \alpha(i,j,z),\ \beta(i,j,z-1) \} + s_{j+1}$ and
removing from it the candidates of the set $EX(i,j,z)$. The latter is identified by
iteratively checking for $k = i, i+1, \ldots$ whether $\beta(k+1, j+1, z-1) = \infty$ and stopping
when this becomes false. As long as we keep choosing the same key $e_i$ in the
locking procedure, until $e_i$ becomes locked no candidate is removed more than
once from the priority queue. Therefore $O(n)$ deletions may occur until $e_i$ gets
locked again. This means $O(n^2 H)$ deletions overall, where H is the height of
the optimal tree, i.e. $O(n^2 \log^2 n)$ time. This is the dominant factor in the time
complexity of the algorithm. Note that H is $O(\log n)$.
If the above computation of $\alpha(i,j+1,z)$ results in $\beta(i,j+1,z) = 0$ and $j+1 < n$,
repeat the same argument and construct $Q(i,j+2,z)$ from $Q(i,j+1,z)$, and so
on. Otherwise, if $\beta(i,j+1,z) = \infty$ then $e_i$ becomes locked but still unfinished.
When $e_i$ is eventually unlocked again and chosen in a new computation of the
locking procedure, the value to be calculated is $\alpha(i,j+1,z+1)$. Lemma 3 indicates directly the contents of $Q(i,j+1,z+1)$. We then disregard the current
priority queue and construct a new one in $O(n)$ time, containing the values of
$Q(i,j+1,z+1)$. A key may become locked $O(H)$ times. Therefore $O(n^2 \log n)$
time is needed overall for these constructions. From Lemma 3 we observe that
the computation of $Q(i,j+1,z+1)$ depends on knowing $\mathrm{left}_z(e_{j+1})$. Since
$\beta(i,j,z) = 0$ and $\beta(i,j+1,z) = \infty$ we have $\mathrm{right}_z(e_i) = j$. This means that each time
the set of unfinished keys becomes locked (at the end of the locking procedure),
the corresponding $\mathrm{right}_z$ values are all known. At this moment, and as observed
before, we can compute all $\mathrm{left}_z$ values in $O(n)$ time, i.e. $O(n \log n)$ overall.
The remaining main operation is the actual minimization in the priority
queue to obtain the $\alpha$ values. This requires $O(n^2 \log n)$ time overall. The time
complexity is therefore $O(n^2 \log^2 n)$.
As for the space complexity, observe that the only $\alpha$ which we need to
remember is $\alpha(i,j,z)$ when updating the priority queue for $Q(i,j+1,z)$. When
this occurs, $\alpha(i,j,z)$ is precisely the last one calculated. Therefore constant space
suffices for the $\alpha$'s. Values of $\beta$ corresponding to $z - 1$ or z are needed in
general, when performing the computations for height z. But they can be easily
obtained respectively, either from the $\mathrm{left}_{z-1}$ or $\mathrm{right}_z$ values, which must be
then available and occupy $O(n)$ space. The priority queue contains at most $O(n)$
values. The space complexity is therefore $O(n)$.
The above strategy applies for $z > 1$. The case $z = 1$ should be done first and
consists of calculating $\mathrm{left}_1(e_i)$, using (iii).
The monotonicity principle cannot be applied to any height or space
minimization algorithm seeking minimal root size. See Fig. 3.
[Figure with two panels: the optimal height and space tree of page limit 4 for $(e_2, e_3, e_4, e_5, e_6)$, having minimal root size, and the optimal height and space tree of page limit 4 for $(e_1, e_2, e_3, e_4, e_5, e_6)$, having minimal root size]
Key sequence:
i      1   2   3   4   5   6
s_i    2   2   2   2   2   1
Fig. 3. Failure of the monotonicity principle for either height or space criterion
5. Optimal Cost Weak B-Trees
In this section we show that the problem of constructing an optimal cost weak
B-tree is NP-hard.
Denote by $E = (e_1, \ldots, e_n)$ a sequence of keys with sizes $s_i$, key weights $p_i$ and
gap weights $q_j$, $1 \le i \le n$ and $0 \le j \le n$. Let $L_1$, $L_2$ and C be positive integers,
$L_1 < L_2$.
Theorem 2. Deciding whether there exists a weak B-tree for E having limits
$(L_1, L_2)$ and cost $\le C$ is NP-complete. It remains so even if all gap weights are
zero and each key size equals the corresponding key weight.
Proof. Consider an arbitrary instance of the partition problem, namely a set
$A = \{a_1, \ldots, a_m\}$, each $a_i$ having a non-negative integer value $v_i$ associated with it. Let
$b = \frac{1}{2} \sum_{a_i \in A} v_i$ be an integer, otherwise the solution is trivial. Construct a key sequence $E = (e_1, \ldots, e_{2m+3})$ formed by four types of keys:
Type 1. key $e_1$, with $s_1 = p_1 = m(b+1)$;
Type 2. key $e_2$, with $s_2 = p_2 = m$;
Type 3. keys $e_3, e_5, \ldots, e_{2m+3}$, with $s_k = p_k = 1$, $k = 3, 5, \ldots, 2m+3$; and
Type 4. keys $e_4, e_6, \ldots, e_{2m+2}$, with $s_k = p_k = m\,v_{(k-2)/2}$, $k = 4, 6, \ldots, 2m+2$.
Let all $q_i = 0$, $0 \le i \le 2m+3$. It follows that there exists a subset $A' \subseteq A$ such
that $\sum_{a_i \in A'} v_i = b$ if and only if there exists a weak B-tree for E with limits
$(1, mb + m)$ and cost $\le 5mb + 5m + 2$. Such a tree would have height 2 and the type 4
keys of the root would be in one-to-one correspondence with $A'$. □
Again the NP-completeness is not strong. An optimal cost weak B-tree can
be found in pseudo-polynomial time by appropriately extending the algorithm
of Sect. 3.
6. Weak B-Trees of Height 2
Given a key sequence $E = (e_1, \ldots, e_n)$, each $e_i$ with size $s_i$, and given integers
$L_1, L_2$ with $0 < L_1 < L_2$, we first consider the problem of finding a weak B-tree T
for E having limits $(L_1, L_2)$, height 2 and minimal root size. We assume that
$\sum s_i > L_2$, otherwise there is no reason for a tree of height 2.
Observe that T can be determined just by finding the subset of keys which
forms the root. In order to compute this subset, we construct an acyclic
digraph G with vertex set $\{v_0, v_1, \ldots, v_{n+1}\}$. G has one directed edge $(v_i, v_j)$ and
a distance $d_{ij}$ for each i, j, $0 \le i < j \le n+1$. Each distance is defined as follows:
$$d_{ij} = \begin{cases} s_j, & \text{when } L_1 \le \sum_{i < k < j} s_k \le L_2. \text{ Otherwise} \\ \infty, \end{cases}$$
where $s_{n+1} = 0$.
Let the length of a path P in G be the sum of the distances of the edges of
P. It follows that the weak B-trees of limits $(L_1, L_2)$ and height 2 are in one-to-one correspondence with the $v_0$-$v_{n+1}$ paths in G of length $\le L_2$. Denote by $D_i$
the length $d_{ij} + \ldots + d_{k,n+1}$ of the shortest $v_i$-$v_{n+1}$ path $P_i$ in G. Compute the
shortest path $P_0$ of length $D_0$ from $v_0$ to $v_{n+1}$. If $D_0 > L_2$ the desired weak B-tree T cannot exist. Otherwise the root of T is formed by the keys of
$\{ e_j \mid v_j \in P_0 - \{v_0, v_{n+1}\} \}$.
A straightforward implementation of the above process gives an $O(n^2)$ time
and space algorithm for finding the tree T. However, it is possible to improve it
by taking advantage of the special distribution of the edge distances.
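The straightforward shortest path computation can be sketched as below. This is an illustration rather than the paper's code: the function name `height2_root` is hypothetical, the distances $d_{ij}$ are evaluated on the fly from prefix sums instead of being stored (so the sketch uses $O(n)$ space), and $D_{n+1} = 0$ is taken as the length of the empty path, which is an assumption of this sketch.

```python
import math

def height2_root(sizes, L1, L2):
    """Return (root_size, root_key_indices) for a weak B-tree of height 2
    with limits (L1, L2) and minimal root size, or None if none exists.
    Key indices are 1-based. (Hypothetical helper for illustration.)
    """
    n = len(sizes)
    s = [0] + list(sizes) + [0]          # s[n+1] = 0, as in the text
    prefix = [0] * (n + 1)               # prefix[t] = s_1 + ... + s_t
    for t in range(1, n + 1):
        prefix[t] = prefix[t - 1] + s[t]

    D = [math.inf] * (n + 2)             # D[i]: shortest v_i - v_{n+1} length
    nxt = [None] * (n + 2)               # successor of v_i on that path
    D[n + 1] = 0                         # assumption: empty path has length 0
    for i in range(n, -1, -1):
        for j in range(i + 1, n + 2):
            child = prefix[j - 1] - prefix[i]   # sum of s_k for i < k < j
            # Edge (v_i, v_j) has distance s_j when L1 <= child <= L2.
            if L1 <= child <= L2 and s[j] + D[j] < D[i]:
                D[i] = s[j] + D[j]
                nxt[i] = j

    if D[0] > L2:
        return None
    root, v = [], nxt[0]
    while v != n + 1:                    # interior vertices are the root keys
        root.append(v)
        v = nxt[v]
    return D[0], root

print(height2_root([2, 2, 2, 2, 1], 1, 4))
```

Each vertex on the path other than $v_0$ and $v_{n+1}$ is a root key, and the keys strictly between two consecutive path vertices form one son of the root.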
For $0 < k \le n+1$ define the candidate $Q_k$ with value $*Q_k = s_k + D_k$. For
$0 \le i \le n+1$, let
$$Q(i) = \{ Q_k \mid *Q_k < \infty \text{ and } d_{ik} < \infty,\ i < k \le n+1 \}.$$
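The shortest distances from $v_i$ to $v_{n+1}$ can be computed by
$$D_i = \begin{cases} \infty, & \text{if } Q(i) = \emptyset. \text{ Otherwise} \\ \min \{ *Q_k \mid Q_k \in Q(i) \} \end{cases} \qquad \text{(i)}$$
for $i = n+1, n, \ldots, 0$.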
We use a priority queue to contain the values of the candidates of $Q(i)$. The
point again is updating it after each iteration. The following functions are
useful. For $0 \le i \le n$, let
$$\sigma_1(i) = \begin{cases} n+2, & \text{when } \sum_{i < k < n+1} s_k < L_1. \text{ Otherwise} \\ \min \{ j \mid \sum_{i < k < j} s_k \ge L_1,\ i+1 \le j \le n+1 \} \end{cases} \qquad \text{(ii)}$$
$$\sigma_2(i) = \begin{cases} n+1, & \text{when } i = n. \text{ Otherwise} \\ \max \{ j \mid \sum_{i < k < j} s_k \le L_2,\ i+1 \le j \le n+1 \} \end{cases} \qquad \text{(iii)}$$
In other words, if the shortest distance $D_i$ is finite and $v_j$ is the vertex that
follows $v_i$ in $P_i$ then $i < \sigma_1(i) \le j \le \sigma_2(i)$. In this case, $*Q_j = D_i$. There is no
difficulty computing all $\sigma_1(i)$ and $\sigma_2(i)$, $i = n, n-1, \ldots, 0$, in $O(n)$ time.
The following lemma shows how to construct $Q(i-1)$ from $Q(i)$.
Lemma 4. If $0 < i \le n+1$ and $Q(n+1) = \emptyset$ then
$$Q(i-1) = [Q(i) \cup IN(i)] - EX(i),$$
where
$$IN(i) = \{ Q_k \mid *Q_k < \infty,\ \sigma_1(i-1) \le k \le \min \{ \sigma_1(i) - 1,\ \sigma_2(i-1) \} \}$$
and
$$EX(i) = \{ Q_k \mid *Q_k < \infty,\ \max \{ \sigma_1(i),\ \sigma_2(i-1) + 1 \} \le k \le \sigma_2(i) \}.$$
Proof. If $\sigma_1(i-1) > \sigma_2(i-1)$ then $\sigma_1(i-1) = n+2$ and $\sigma_1(i) > \sigma_2(i)$. Then
$Q(i) = IN(i) = EX(i) = \emptyset$ and consequently $Q(i-1) = \emptyset$. Otherwise $\sigma_1(i-1) \le \sigma_2(i-1)$
and suppose first $\sigma_1(i) > \sigma_2(i)$. Then $Q(i) = EX(i) = \emptyset$ and since $\sigma_1(i) = n+2$,
$\sigma_2(i-1) \le \sigma_1(i) - 1$ and consequently $Q(i-1)$ coincides with $IN(i)$. Finally consider
$\sigma_1(i-1) \le \sigma_2(i-1)$ and $\sigma_1(i) \le \sigma_2(i)$. Let $*Q_k < \infty$. If $k < \sigma_1(i-1)$ or $k > \sigma_2(i)$ then
$Q_k \notin Q(i)$ and $Q_k \notin Q(i-1)$. If $k < \sigma_1(i)$ then $Q_k \notin Q(i)$, but $Q_k \in Q(i-1)$ when
$\sigma_1(i-1) \le k \le \sigma_2(i-1)$. Therefore we include the candidates of $IN(i)$. If $\sigma_2(i-1) < k$
then $Q_k \notin Q(i-1)$, but $Q_k \in Q(i)$ when $\sigma_1(i) \le k \le \sigma_2(i)$. Therefore we exclude the
candidates of $EX(i)$. The remaining possibility is $\sigma_1(i) \le k \le \sigma_2(i-1)$. In this case
$Q_k \in Q(i)$ and $Q_k \in Q(i-1)$. The latter candidates remain unchanged. □
The algorithm then follows. Initially, define $Q(n+1) = \emptyset$ and compute $\sigma_1(i)$
and $\sigma_2(i)$ for each i, $i = n, n-1, \ldots, 0$, using (ii) and (iii) respectively. Subsequently, for $i = n+1, n, \ldots, 0$ compute $D_i$ using (i). The priority queue which
contains the candidates of $Q(i)$ is updated using Lemma 4, after each iteration i.
There are $O(n)$ minimizations, inclusions and exclusions in the process. We
need therefore $O(n \log n)$ time. The space complexity is $O(n)$.
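The priority queue scheme can be sketched with a binary heap. The code below is an illustration under stated assumptions and differs in detail from the procedure in the text: the function name `min_root_size_fast` is hypothetical, $\sigma_1(i)$ and $\sigma_2(i)$ are found by binary search on prefix sums rather than in $O(n)$ total time, candidates leaving the window are discarded lazily when they reach the top of the heap instead of being removed explicitly as in Lemma 4, and $D_{n+1} = 0$ is assumed for the empty path. The overall running time is still $O(n \log n)$.

```python
import heapq
import math
from bisect import bisect_left, bisect_right

def min_root_size_fast(sizes, L1, L2):
    """Minimal root size D_0 of a weak B-tree of height 2 with limits
    (L1, L2), or None if no such tree exists. (Hypothetical helper.)
    """
    n = len(sizes)
    s = [0] + list(sizes) + [0]            # s[n+1] = 0
    prefix = [0] * (n + 1)                 # prefix[t] = s_1 + ... + s_t
    for t in range(1, n + 1):
        prefix[t] = prefix[t - 1] + s[t]

    D = [math.inf] * (n + 2)
    D[n + 1] = 0                           # assumption: empty path
    heap = []                              # entries (*Q_k, k)
    lo = n + 2                             # smallest candidate index pushed so far
    for i in range(n, -1, -1):
        # sigma1(i): least j with sum_{i<k<j} s_k >= L1 (n + 2 if none).
        sigma1 = bisect_left(prefix, prefix[i] + L1, i, n + 1) + 1
        # sigma2(i): greatest j <= n + 1 with sum_{i<k<j} s_k <= L2.
        sigma2 = bisect_right(prefix, prefix[i] + L2, i, n + 1)
        # Inclusions: candidates entering at the low end of the window.
        while lo > sigma1:
            lo -= 1
            if D[lo] < math.inf:
                heapq.heappush(heap, (s[lo] + D[lo], lo))
        # Lazy exclusions: sigma2 never increases as i decreases, so a
        # candidate above sigma2(i) can be dropped for good.
        while heap and heap[0][1] > sigma2:
            heapq.heappop(heap)
        if heap:
            D[i] = heap[0][0]
    return D[0] if D[0] <= L2 else None

print(min_root_size_fast([2, 2, 2, 2, 2, 1], 1, 4))
```

Every candidate is pushed and popped at most once, which gives the $O(n)$ heap operations counted in the text.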
Now, consider the following problem. We wish to find a weak B-tree T as
in the above case, except that additionally its root is required to be formed by
a given number M of keys [11]. A solution can be constructed using the
following dynamic programming algorithm, based again on the decomposition
of Sect. 2. Let $e_k$ be the rightmost key in the root of T. Then $L_1 \le \sum_{k < i \le n} s_i \le L_2$. In
Case 1, $L_1 \le \sum_{1 \le i < k} s_i \le L_2$. In Case 2, $T_L$ is a weak B-tree for $(e_1, \ldots, e_{k-1})$ of
limits $(L_1, L_2)$, height 2, minimal root size and having $M - 1$ keys in the root.
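For $1 \le i \le j \le n$ and $m > 0$, define
$$\alpha(i,j,m) = \begin{cases} \text{the minimal size of the root of a weak B-tree for } (e_i, \ldots, e_j) \\ \text{of limits } (L_1, L_2), \text{ height 2 and having } m \text{ keys in the root;} \\ \infty, \text{ whenever the above tree does not exist.} \end{cases}$$
For $1 \le i \le j \le n$, define:
$$\alpha(i,j,0) = \begin{cases} 0, & \text{when } L_1 \le \sum_{i \le k \le j} s_k \le L_2 \\ \infty, & \text{otherwise.} \end{cases}$$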
The problem can be solved by computing $\alpha(1, n, M)$ using the following
equation.
For $2m + 1 \le j \le n$ and $1 \le m \le M$,
$$\alpha(1,j,m) = \min_{2m \le k \le j-1} \{ \alpha(1, k-1, m-1) + s_k + \alpha(k+1, j, 0) \}.$$
Observe that if $\alpha(1,j,m)$ is finite then necessarily $j \ge 2m+1$ and the rightmost
key $e_k$ in the root of the corresponding weak B-tree satisfies $2m \le k \le j-1$.
Using the functions $\sigma_1$ and $\sigma_2$ as defined above we can compute each
$\alpha(i,j,0)$ in constant time, as follows:
If $1 \le i \le j \le n$,
$$\alpha(i,j,0) = \begin{cases} 0, & \text{when } \sigma_1(i-1) \le j+1 \text{ and } \sigma_2(i-1) \ge j+1. \text{ Otherwise} \\ \infty. \end{cases}$$
Therefore we can compute $\alpha(1, n, M)$ in $O(n^2 M)$ time and $O(n)$ space. The
time bound can be improved as described below.
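The $O(n^2 M)$ computation can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `root_with_m_keys` is hypothetical, and $\alpha(i,j,0)$ is evaluated from prefix sums rather than through $\sigma_1$ and $\sigma_2$. Only the rows for $m - 1$ and $m$ are kept, which gives the $O(n)$ space bound.

```python
import math

def root_with_m_keys(sizes, L1, L2, M):
    """Minimal root size alpha(1, n, M) of a weak B-tree of height 2 with
    limits (L1, L2) whose root has exactly M >= 1 keys, or None if no such
    tree exists. (Hypothetical helper for illustration.)
    """
    n = len(sizes)
    s = [0] + list(sizes)
    prefix = [0] * (n + 1)
    for k in range(1, n + 1):
        prefix[k] = prefix[k - 1] + s[k]

    def alpha0(i, j):
        # alpha(i, j, 0): the keys e_i..e_j fit in one non-root node.
        return 0 if L1 <= prefix[j] - prefix[i - 1] <= L2 else math.inf

    # prev[j] = alpha(1, j, m - 1); index 0 is unused.
    prev = [math.inf] + [alpha0(1, j) for j in range(1, n + 1)]
    for m in range(1, M + 1):
        cur = [math.inf] * (n + 1)
        for j in range(2 * m + 1, n + 1):
            for k in range(2 * m, j):          # e_k: rightmost key of the root
                cand = prev[k - 1] + s[k] + alpha0(k + 1, j)
                if cand < cur[j]:
                    cur[j] = cand
        prev = cur
    return prev[n] if prev[n] <= L2 else None

print(root_with_m_keys([2, 2, 2, 2, 2, 1], 1, 4, 2))
```

For the six keys above with limits (1, 4), no single root key works, while two root keys give a root of size 4.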
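For $1 \le m \le M$ and $2m \le k$ define the candidate $Q_k$ with value
$$*Q_k = \alpha(1, k-1, m-1) + s_k.$$
For $1 \le j \le n$ and $1 \le m \le M$, let
$$Q(j,m) = \{ Q_k \mid *Q_k \le L_2 \text{ and } \alpha(k+1, j, 0) = 0,\ 2m \le k \le j-1 \}.$$
Then $\alpha(1,j,m)$ can be computed by
$$\alpha(1,j,m) = \begin{cases} \infty, & \text{if } Q(j,m) = \emptyset. \text{ Otherwise} \\ \min \{ *Q_k \mid Q_k \in Q(j,m) \}, \end{cases}$$
for $1 \le j \le n$ and $1 \le m \le M$.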
A priority queue is used to keep the sets $Q(j,m)$. For a fixed m, update the
queue as follows. If $j < 2m+1$ then $Q(j,m) = \emptyset$. Otherwise we construct
$Q(j+1,m)$ from $Q(j,m)$. Include each candidate $Q_k$ such that $*Q_k \le L_2$, $\alpha(k+1, j, 0) = \infty$
and $\alpha(k+1, j+1, 0) = 0$. Exclude each $Q_k \in Q(j,m)$ which satisfies $\alpha(k+1, j, 0) = 0$
and $\alpha(k+1, j+1, 0) = \infty$. Each candidate to be changed can be identified in
constant time, since the next possible inclusion and exclusion are $Q_{q+1}$ and $Q_p$,
respectively, where $q = \max \{ k \mid Q_k \in Q(j,m) \}$ and $p = \min \{ k \mid Q_k \in Q(j,m) \}$. For each
m, there can be $O(n)$ minimizations, inclusions and exclusions. The time and
space complexities are therefore $O(nM \log n)$ and $O(n)$, respectively.
7. Conclusions
The construction of optimal multiway search trees and optimal weak B-trees
for variable size keys has been considered. In particular, constructing optimal
cost trees is NP-hard in both cases, although both admit pseudo-polynomial
time algorithms. But in many applications the limits of the trees are orders of
magnitude less than the number of keys. Clearly, the algorithms are polynomial for this class of problems.
It might well be worthwhile to consider an alternative strategy for defining
weak B-trees of variable size keys, namely, to adopt as lower limit a given
number of keys, while maintaining the size as upper limit. The optimal cost
problem remains of course NP-hard, but the trees become easier to manipulate. For example, an optimal height weak B-tree can be found in polynomial time if this alternative is adopted. When controlling nodes only by
sizes, the construction of a weak B-tree of height 2 can also be carried out in
polynomial time as described in Sect. 6. The case when the height is $\ge 3$ would
bear further investigation.
Acknowledgements. To Ysmar V. Silva Fº for all the discussions and insightful remarks.
References
1. Bayer, R., McCreight, E.: Organization and maintenance of large ordered indexes. Acta Informat. 1, 173-189 (1971)
2. Garey, M.R.: Optimal binary search trees with restricted maximal depth. SIAM J. Comput. 2, 101-110 (1974)
3. Gilbert, E.N., Moore, E.F.: Variable-length binary encodings. Bell System Tech. J. 38, 933-968 (1959)
4. Gotlieb, L.: Optimal multiway search trees. SIAM J. Comput. 10, 422-433 (1981)
5. Gotlieb, L., Wood, D.: The construction of optimal multiway search trees and the monotonicity principles. Internat. J. Comput. Math. Sect. A 9, 17-24 (1981)
6. Huddleston, S., Mehlhorn, K.: A new data structure for representing sorted lists. Acta Informat. 17, 157-184 (1982)
7. Itai, A.: Optimal alphabetic trees. SIAM J. Comput. 5, 9-18 (1976)
8. Karp, R.M.: Reducibility among combinatorial problems. In: Complexity of Computer Computations. Miller, R.E., Thatcher, J.W. (eds.) New York: Plenum Press 1972
9. Knuth, D.E.: Optimum binary search trees. Acta Informat. 1, 14-25 (1971)
10. Knuth, D.E.: The Art of Computer Programming, Vol. 3: Sorting and searching. Reading (Mass.): Addison-Wesley 1973
11. McCreight, E.M.: Pagination of B*-trees with variable length records. Comm. ACM 20, 670-674 (1977)
12. Vaishnavi, V.K., Kriegel, H.P., Wood, D.: Optimum multiway search trees. Acta Informat. 14, 119-133 (1980)
Received March 22, 1983/November 8, 1983