
Growing nearly optimal binary search trees

1982, Information Processing Letters

Volume 14, number 3, INFORMATION PROCESSING LETTERS, 16 May 1982

James F. KORSH
Department of Computer Science, Temple University, Philadelphia, PA 19122, U.S.A.

Received 25 August 1981; revised version received 23 January 1982

Keywords: Binary search tree, greedy trees

0020-0190/82/0000-0000/$02.75 © 1982 North-Holland

1. Introduction

Let $K_1 < K_2 < \cdots < K_n$ represent n ordered keys to be stored at the internal nodes of a binary search tree. Let $Q_0$ represent keys less than $K_1$, $Q_n$ keys greater than $K_n$, and $Q_i$ keys strictly between $K_i$ and $K_{i+1}$. The probability that K is the key to be searched for is $\beta_i$ if $K = K_i$ and $\alpha_i$ if $K \in Q_i$. An optimal binary search tree T minimizes the expected number of comparisons, P(T), in the key search. P(T) is given by

$$P(T) = \sum_{1 \le j \le n} \beta_j (b_j + 1) + \sum_{0 \le i \le n} \alpha_i a_i ,$$

where $b_j$ is the level of $K_j$ in T and $a_i$ is the level of $Q_i$ in T. The Q's must be represented as external nodes in T, and an inorder traversal of T must access the nodes in the order $Q_0, K_1, Q_1, K_2, \ldots, K_n, Q_n$. The root is at level zero and the $\alpha$'s and $\beta$'s sum to 1. This is the model formulated in [5].

Optimal trees may be found in $O(n^2)$ time using $O(n^2)$ space [5], and a number of nearly optimal trees may be found in $O(n)$ time and space, while others may take $O(n \log n)$ time [2-4]. A nearly optimal tree, in this paper, must have a bounded relative error with respect to the optimal tree.

In this paper we compare these algorithms on the basis of their performance on 50 probability distributions with n = 200, taken from data generated by Gotlieb and Walker [1]. The results indicate that greedy trees [4] outperform, on the average and in the worst case, the other linear algorithms that generate nearly optimal trees, and do better than or almost as well as the trees produced by the n log n nearly optimal algorithms.

Greedy trees are grown from the bottom up.
All other algorithms considered here grow down from the top. Greedy trees are formed by finding the smallest triple, $\alpha_{i-1} + \beta_i + \alpha_i$, combining its corresponding keys to form a subtree with probability $\alpha_{i-1} + \beta_i + \alpha_i$, treating the subtree as a new external node, and iterating the construction. Weight balanced trees are grown by determining the root to minimize the difference in weight (probability) between its left and right subtrees and then applying the same construction to the left and right subtrees obtained. Min-max and entropy trees are grown similarly, but different criteria are used for selecting the root. The criterion for the min-max tree is to minimize the maximum of the left and right subtree weights. The criterion for the entropy tree is to maximize the entropy function of three variables: the left subtree weight, the root weight and the right subtree weight. The modified entropy tree is similar to the entropy tree, except that large $\beta$'s and $\alpha$'s are given special influence. The bisection tree is similar to the weight balanced tree, but gives special consideration to $\alpha$ values adjacent to the root.

The trees recommended by Gotlieb and Walker [1] are similar to weight balanced trees, except that nodes adjacent to what would be the weight balanced root may become the root. Also, whenever a subtree is to be constructed with fewer than $N_0$ internal nodes, it is constructed to be optimal by using the $O(N_0^2)$ optimal construction. This same idea is used in modifications of the greedy algorithm.
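The bottom-up greedy rule and the top-down weight balanced rule described above can be illustrated with a minimal sketch. It follows only the one-sentence descriptions given here, not the linear-time implementation of [4]. The function names, the nested-tuple tree representation, the leftmost tie-breaking and the naive quadratic scans are all assumptions made for illustration. Integer weights are used instead of normalized probabilities so that the arithmetic is exact; dividing the result by the total weight gives the expected number of comparisons.

```python
# Hypothetical helper names. Trees are nested tuples:
#   external node Q_i -> ("Q", i);  internal node K_j -> (left, j, right)

def expected_cost(tree, alpha, beta, level=0):
    """P(T): sum of beta_j*(b_j + 1) over keys plus alpha_i*a_i over gaps."""
    if len(tree) == 2:                      # external node ("Q", i)
        return alpha[tree[1]] * level
    left, j, right = tree
    return (beta[j - 1] * (level + 1)       # beta[j-1] is the weight of K_j
            + expected_cost(left, alpha, beta, level + 1)
            + expected_cost(right, alpha, beta, level + 1))


def greedy_tree(alpha, beta):
    """Bottom-up: repeatedly merge the smallest triple (naive O(n^2) scan)."""
    ext = [(a, ("Q", i)) for i, a in enumerate(alpha)]   # (weight, subtree)
    keys = [(b, j + 1) for j, b in enumerate(beta)]      # (weight, key index)
    while keys:
        # Triple i is ext[i] + keys[i] + ext[i+1]; ties go to the leftmost.
        i = min(range(len(keys)),
                key=lambda t: ext[t][0] + keys[t][0] + ext[t + 1][0])
        weight = ext[i][0] + keys[i][0] + ext[i + 1][0]
        subtree = (ext[i][1], keys[i][1], ext[i + 1][1])
        ext[i:i + 2] = [(weight, subtree)]  # subtree acts as a new external node
        del keys[i]
    return ext[0][1]


def weight_balanced_tree(alpha, beta, lo=0, hi=None):
    """Top-down: subtree over gaps Q_lo..Q_hi and keys K_{lo+1}..K_hi."""
    if hi is None:
        hi = len(beta)
    if lo == hi:
        return ("Q", lo)

    def weight(a, b):
        # Total weight of gaps Q_a..Q_b and keys K_{a+1}..K_b.
        return sum(alpha[a:b + 1]) + sum(beta[a:b])

    # Root K_r splits the range into (lo, r-1) on the left and (r, hi) on the right.
    r = min(range(lo + 1, hi + 1),
            key=lambda r: abs(weight(lo, r - 1) - weight(r, hi)))
    return (weight_balanced_tree(alpha, beta, lo, r - 1), r,
            weight_balanced_tree(alpha, beta, r, hi))


alpha = [1, 2, 1, 3]   # unnormalized weights of Q_0..Q_3
beta = [4, 1, 5]       # unnormalized weights of K_1..K_3
print(expected_cost(greedy_tree(alpha, beta), alpha, beta))           # 30
print(expected_cost(weight_balanced_tree(alpha, beta), alpha, beta))  # 33
```

On this small input the greedy rule first merges the triple around $K_2$ (weight 4), then the triple around $K_1$ (weight 9), and finally makes $K_3$ the root, while the weight balanced rule picks $K_2$ as the root. A single toy example says nothing about the relative quality of the two rules in general.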
When $N_0$ internal nodes are left after the greedy construction has generated $(N - N_0)$ subtrees, the optimal subtree is formed from these to complete the construction. The reader is referred to [1-6] for more detailed descriptions of these algorithms.

Known implementations of the entropy, modified entropy and Gotlieb-Walker (GW) algorithms take n log n time. The min-max, bisection, weight balanced and greedy algorithms may be implemented to take linear time. The two modified versions of the greedy algorithm with parameter $N_0$ are, respectively, the linear time implementation of [4] and an n log n implementation. The difference is that for the linear version, the $(N - N_0)$ subtrees will not necessarily be the $(N - N_0)$ smallest in weight, while for the n log n version they will be.

Table 1 a
Average path length for each algorithm on the 50 distributions (five groups of ten cases)

Group 1
Algorithm              1     2     3     4     5     6     7     8     9    10
Optimal             4.29  6.61  5.87  6.07  6.64  5.96  5.83  6.26  7.09  7.??
Greedy 25 n log n   4.31  6.65  5.93  6.17  6.69  6.03  5.85  6.28  7.11  7.45
Greedy 15 n log n   4.32  6.68  5.97  6.19  6.72  6.06  5.88  6.29  7.13  7.49
Greedy 25 n         4.32  6.68  5.97  6.20  6.71  6.09  5.88  6.33  7.18  7.49
GW weight balanced  4.36  6.68  5.94  6.16  6.72  6.04  5.83  6.37  7.27  7.37
Greedy 15 n         4.33  6.69  5.97  6.22  6.72  6.09  5.88  6.33  7.18  7.52
Modified entropy    4.32  6.63  5.94  6.17  6.69  6.01  5.91  6.41  7.34  7.39
Greedy n            4.33  6.73  5.97  6.23  6.88  6.11  5.89  6.35  7.18  7.53
Entropy             4.45  6.63  6.01  6.17  6.71  6.02  5.92  6.52  7.47  7.39
Min-max             4.55  7.02  6.11  6.32  6.79  6.31  6.14  6.72  7.45  7.51
Bisection           4.84  7.15  6.31  6.47  7.00  6.55  6.33  6.75  7.47  7.53
Weight balanced     4.93  7.09  6.12  6.42  6.82  6.59  6.33  6.85  7.44  7.52

Group 2
Algorithm              1     2     3     4     5     6     7     8     9    10
Optimal             4.63  7.26  6.25  6.40  7.02  6.54  6.45  6.58  7.12  7.70
Greedy 25 n log n   4.63  7.34  6.33  6.51  7.05  6.57  6.48  6.62  7.13  7.77
Greedy 15 n log n   4.63  7.35  6.35  6.56  7.08  6.60  6.48  6.63  7.14  7.79
Greedy 25 n         4.63  7.37  6.34  6.55  7.14  6.60  6.47  6.62  7.15  7.80
GW weight balanced  4.70  7.26  6.29  6.43  7.06  6.56  6.46  6.86  7.46  7.74
Greedy 15 n         4.63  7.37  6.36  6.55  7.13  6.60  6.48  6.65  7.15  7.80
Modified entropy    4.65  7.30  6.32  6.51  7.07  6.61  6.48  6.86  7.42  7.79
Greedy n            4.63  7.39  6.47  6.71  7.14  6.60  6.48  6.80  7.16  7.90
Entropy             4.77  7.30  6.33  6.51  7.09  6.61  6.50  6.95  7.50  7.79
Min-max             5.00  7.64  6.68  6.87  7.21  6.88  6.80  6.84  7.35  7.87
Bisection           5.30  7.62  6.75  6.97  7.28  7.02  6.88  6.98  7.73  7.94
Weight balanced     5.66  7.80  6.84  6.88  7.24  7.03  6.99  6.87  7.32  7.89

Group 3
Algorithm              1     2     3     4     5     6     7     8     9    10
Optimal             4.00  6.42  5.56  5.76  6.22  5.65  5.?9  6.09  7.01  6.97
Greedy 25 n log n   4.01  6.46  5.62  5.85  6.23  5.67  5.61  6.10  7.05  7.02
Greedy 15 n log n   4.01  6.46  5.62  5.85  6.25  5.67  5.63  6.11  7.07  7.02
Greedy 25 n         4.08  6.51  5.67  5.89  6.30  5.67  5.67  6.11  7.12  7.02
GW weight balanced  4.05  6.47  5.58  5.82  6.26  5.71  5.73  6.32  7.28  7.05
Greedy 15 n         4.08  6.51  5.67  5.89  6.30  5.67  5.68  6.11  7.12  7.02
Modified entropy    4.04  6.46  5.61  5.84  6.33  5.68  5.62  6.23  7.31  7.04
Greedy n            4.09  6.51  5.67  5.97  6.30  5.67  5.68  6.12  7.21  7.17
Entropy             4.08  6.49  5.69  5.85  6.40  5.75  5.65  6.38  7.38  7.05
Min-max             4.23  6.56  5.83  5.98  6.28  6.02  5.91  6.39  7.22  7.10
Bisection           4.76  6.68  5.93  6.15  7.24  6.23  6.12  6.45  7.31  7.14
Weight balanced     4.54  6.99  5.85  6.21  6.37  6.25  6.02  6.62  7.26  7.09

Table 1 a (continued)

Group 4
Algorithm              1     2     3     4     5     6     7     8     9    10
Optimal             5.04  7.42  6.55  6.65  7.27  6.55  6.75  6.73  7.11  7.87
Greedy 25 n log n   5.05  7.48  6.63  6.74  7.31  6.60  6.80  6.78  7.13  7.96
Greedy 15 n log n   5.07  7.50  6.63  6.76  7.31  6.61  6.82  6.78  7.15  8.00
Greedy 25 n         5.06  7.55  6.68  6.80  7.35  6.64  6.83  6.78  7.16  7.98
GW weight balanced  5.06  7.46  6.56  6.67  7.34  6.56  6.80  6.95  7.38  7.88
Greedy 15 n         5.06  7.55  6.68  6.80  7.39  6.70  6.84  6.78  7.16  8.00
Modified entropy    5.04  7.50  6.61  6.74  7.32  6.56  6.81  6.92  7.39  7.91
Greedy n            5.14  7.59  6.69  6.80  7.44  6.70  6.87  6.80  7.16  8.01
Entropy             5.08  7.50  6.61  6.74  7.35  6.59  6.82  7.05  7.50  7.91
Min-max             5.38  7.69  6.81  7.03  7.45  6.87  7.05  6.91  7.29  8.02
Bisection           5.61  7.68  7.01  7.13  7.50  6.99  7.08  7.03  7.61  8.06
Weight balanced     5.88  7.85  6.98  7.18  7.46  7.08  7.18  6.96  7.31  8.03

Group 5
Algorithm              1     2     3     4     5     6     7     8     9    10
Optimal             4.14  5.97  5.75  6.00  6.66  5.33  5.41  5.81  6.99  7.27
Greedy 25 n log n   4.15  5.99  5.80  6.05  6.69  5.35  5.45  5.84  7.04  7.31
Greedy 15 n log n   4.16  6.03  5.81  6.07  6.70  5.36  5.46  5.87  7.05  7.33
Greedy 25 n         4.20  6.06  5.86  6.10  6.71  5.37  5.48  5.87  7.08  7.34
GW weight balanced  4.18  6.05  5.78  6.05  6.81  5.37  5.43  5.89  7.20  7.36
Greedy 15 n         4.20  6.06  5.86  6.10  6.77  5.37  5.62  5.87  7.08  7.34
Modified entropy    4.18  6.02  5.84  6.10  6.72  5.41  5.48  5.92  7.17  7.32
Greedy n            4.29  6.08  6.03  6.17  6.79  5.51  5.62  5.88  7.12  7.36
Entropy             4.28  6.05  5.93  6.11  6.79  5.46  5.53  5.97  7.31  7.32
Min-max             4.32  6.19  5.95  6.44  6.87  5.61  5.90  6.05  7.22  7.42
Bisection           4.66  6.42  6.19  6.53  6.85  5.87  ?.99  6.44  7.45  7.49
Weight balanced     5.32  6.75  6.28  6.51  6.96  5.97  6.37  6.36  7.24  7.41

a The data for the GW weight balanced, entropy, modified entropy, min-max, weight balanced and optimal trees was obtained from [2]. The author was unable to reproduce the data of [2] for bisection trees and has used the results of his own implementation of the bisection algorithm in this case. (Also, the definition of average path length used for the tables is the same as in [1]; thus it is the definition given in the introduction here plus the sum of the $\alpha$'s after normalization.)

2. Results and discussion

The results obtained by applying the tree constructions to 50 probability distributions with n = 200 are summarized in Table 1. Table 2 gives the average and maximal relative errors over the 50 distributions. All the algorithms, except for the GW weight balanced algorithm, have been shown to be nearly optimal. It is clear that, in terms of average and worst case behavior, the greedy trees outperform all the other linear time tree growing algorithms. They are at least competitive with the n log n time tree growing algorithms. The linear version of the greedy tree growing algorithm with $N_0$ fixed is itself linear in time and compares even more favorably with the n log n algorithms. Finally, the n log n version of the greedy tree construction with fixed parameter $N_0$ yields average relative errors of less than 1% and a maximum error of less than 2%.
It appears that relative errors of a few percent may be obtained by the greedy algorithm in linear time, but to reduce the errors further, at reasonable cost in time, the n log n version is needed.

References

[1] C.C. Gotlieb and W.A. Walker, A top-down algorithm for constructing nearly optimal lexicographical trees, in: Graph Theory and Computing (Academic Press, New York, 1972).
[2] R. Güttler, K. Mehlhorn and W. Schneider, Binary search trees: average and worst case behavior, EIK 16 (1980) 1-3, 41-61.
[3] Y. Horibe and T. Nemetz, On the max-entropy rule for a binary search tree, Acta Inform. 12 (1979) 63-72.
[4] J. Korsh, Greedy binary search trees are nearly optimal, Inform. Process. Lett. 13(1) (1981) 16-19.
[5] D. Knuth, The Art of Computer Programming, Vol. 3 (Addison-Wesley, Reading, MA, 1973).
[6] K. Mehlhorn, Nearly optimal binary search trees, Acta Inform. 5 (1975) 287-295.
[7] K. Mehlhorn, A best possible bound for the weighted path length of binary search trees, SIAM J. Comput. 6(2) (1977) 235-239.