

Journal of the Franklin Institute 337 (2000) 97–103

Brief communication

Generating probabilities with a specified entropy

Peter F. Swaszek (Department of Electrical and Computer Engineering, Kelley Annex, University of Rhode Island, Kingston, RI 02881, USA) and Sidharth Wali (Texas Instruments Inc., Plano, TX 75023, USA; formerly with the University of Rhode Island)

Received 16 August 1999; received in revised form 2 February 2000. A version of this material appears in the Proceedings of the 1997 Johns Hopkins Conference on Information Sciences and Systems.

Abstract

This paper describes a method to randomly generate vectors of symbol probabilities so that the corresponding discrete memoryless source has a prescribed entropy. One application is to Monte Carlo simulation of the performance of noiseless variable length source coding. © 2000 The Franklin Institute. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Monte Carlo simulation; Entropy coding

1. Introduction

Of interest here is the problem of randomly selecting a vector $p = [p_1, p_2, \ldots, p_n]$ such that $p_j \ge 0$, $j = 1, 2, \ldots, n$, and $\sum_{j=1}^{n} p_j = 1$ (the range of $p$ is a dimension $n-1$ simplex); the vector $p$ can be thought of as the set of character probabilities for a randomly selected discrete memoryless source. For such sources the entropy

$$H_n(p) = -\sum_{j=1}^{n} p_j \log p_j$$

is often of interest. On the simplex the entropy varies from a maximum of $\log n$ (at the center of the simplex with each $p_j = 1/n$) to a minimum of zero at each of the $n$ vertices (where exactly one $p_j$ equals 1).

An algorithm for generating vectors $p$ is described in [1, p. 232] for which the resulting $p$ are uniformly distributed on the simplex. While the length of the vector can be specified, its entropy cannot. This is a major disadvantage, especially notable when used to simulate sources with larger values of $n$, in that the resulting entropy is usually close to $\log n$ (more precisely, as $n$ increases the mean of the entropy for a uniformly distributed $p$ grows toward $\log n$ and the variance goes to zero [2,3]; the average source tends to one with high entropy); hence, many of the vectors so generated are ones for which the entropy measure is not very interesting. For example, the corresponding entropy codes are nearly fixed length codes. While other ad hoc techniques for generating a random vector $p$ exist, they are limited to either controlling the entropy (such as setting $m$ of the probabilities equal to $a$, the remaining $n-m$ equal to $(1-ma)/(n-m)$, and then solving for $a$ to yield the desired entropy) or ranging over the entire simplex (such as generating a vector of positive random variables and then normalizing them to sum to unity). This paper presents an iterative algorithm for selecting $p$ so that the resulting source both has a fixed entropy and has as its range all vectors $p$ with the desired entropy. Unfortunately, while the algorithm's output does range over the complete set of sources with a given entropy, the distribution on this range is not uniform. Modifying the algorithm or constructing a different algorithm to yield a uniform distribution appears to be a very difficult task.
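The concentration described above is easy to see numerically. The following sketch (ours, not part of the paper; names are hypothetical) draws vectors uniformly on the simplex by normalizing i.i.d. exponential variates, one standard construction for uniform simplex sampling, and reports the sample mean and spread of the entropy: the mean tracks $\log_2 n$ to within a fraction of a bit while the spread shrinks as $n$ grows, so almost every generated vector corresponds to a high-entropy source.

```python
import numpy as np

rng = np.random.default_rng(1)

def simplex_uniform_entropy(n, trials=2000):
    """Sample `trials` probability vectors uniformly on the simplex
    (normalized i.i.d. exponentials) and return the sample mean and
    standard deviation of their entropies in bits."""
    x = rng.exponential(size=(trials, n))
    p = x / x.sum(axis=1, keepdims=True)
    h = -(p * np.log2(p)).sum(axis=1)
    return h.mean(), h.std()

for n in (4, 16, 64, 256):
    mean, std = simplex_uniform_entropy(n)
    print(f"n={n:4d}  log2(n)={np.log2(n):5.2f}  mean H={mean:5.2f}  std={std:.3f}")
```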
2. The algorithm

The algorithm, based on a straightforward decomposition of the entropy function, focuses on one of the probabilities and computes its range of allowable values that satisfies both the entropy constraint and the original restriction of being a probability. This range is then used to randomly select the probability. The entropy of the remaining (unknown) probabilities is then computed and the process is repeated (iteratively finding one of the probabilities) until only two probabilities remain. Solving a binary entropy expression yields these two values. Finally, a normalization step converts the chosen values to the desired $p$.

2.1. Algorithm development

Begin by assuming that there are $n \ge 3$ probabilities to find and that the desired entropy is $H_n$. Singling out the $n$th probability and renaming it $q_n$ ($p_n = q_n$) yields

$$H_n = -\sum_{j=1}^{n-1} p_j \log p_j - q_n \log q_n.$$

Selectively multiplying and dividing through by $1 - q_n$ (assuming that $q_n \neq 1$, i.e. $H_n \neq 0$) yields

$$H_n = -(1 - q_n) \sum_{j=1}^{n-1} \frac{p_j}{1 - q_n} \log \frac{p_j (1 - q_n)}{1 - q_n} - q_n \log q_n.$$

(Note that if $H_n = 0$ then select $p_1 = p_2 = \cdots = p_{n-1} = 0$, $p_n = 1$, and terminate the algorithm at this point.) Defining $r_j = p_j/(1 - q_n)$, $j = 1, 2, \ldots, n-1$, this last expression can be written as

$$H_n = -(1 - q_n) \sum_{j=1}^{n-1} r_j \log r_j - (1 - q_n) \log(1 - q_n) \sum_{j=1}^{n-1} r_j - q_n \log q_n.$$

Note that the $r_j$ (or the vector $r$) themselves form a full set of probabilities so that this last expression simplifies to

$$H_n = (1 - q_n) H_{n-1}(r) - (1 - q_n)\log(1 - q_n) - q_n \log q_n = (1 - q_n) H_{n-1}(r) + h_b(q_n) \tag{1}$$

in which $H_{n-1}(r)$ is the entropy of the $(n-1)$-ary vector $r$ and $h_b(\cdot)$ is the binary entropy function $h_b(x) = -x \log x - (1 - x)\log(1 - x)$.

The resulting expansion of $H_n$ in (1) has the obvious interpretation that the source entropy equals the binary entropy of choosing the $n$th symbol or not, plus the probability of not choosing it times the entropy of the remaining $n-1$ symbols. Rearranging terms yields

$$H_{n-1}(r) = \frac{H_n - h_b(q_n)}{1 - q_n}.$$

For convenience, define the function $L(H, q)$ as

$$L(H, q) = \frac{H - h_b(q)}{1 - q}. \tag{2}$$

While the probability vector $r$ is as yet unknown, it must satisfy $0 \le H_{n-1}(r) \le \log(n-1)$; hence, $q_n$ must satisfy

$$0 \le L(H_n, q_n) \le \log(n-1).$$

The result is two-fold:

1. First, a condition on the possible value of $q_n$ beyond the normal restriction of $0 \le q_n \le 1$ is obtained; hence, select $q_n$ to satisfy both $0 \le q_n \le 1$ and $0 \le L(H_n, q_n) \le \log(n-1)$. A simple approach, employed in the examples below, is to select $q_n$ uniformly from this range of allowable values.

2. Once $q_n$ is chosen, the remaining normalized probabilities, the $r_j$, need to be chosen so that they have entropy $L(H_n, q_n)$. Since this is the same problem as stated above, except that there are now $n-1$ variables and a different value of entropy ($H_{n-1} = L(H_n, q_n)$), the above step can be iterated. Note that the next iteration's entropy, $H_{n-1}(r)$, may be larger or smaller than $H_n$.

This recursive selection of one value and computation of the remaining entropy continues until only two variables remain, $r_1$ and $r_2$ ($= 1 - r_1$). The remaining entropy for these normalized variables, $H_2$ ($= L(H_3, q_3)$), satisfies the usual binary entropy function

$$H_2 = -r_1 \log r_1 - r_2 \log r_2. \tag{3}$$

To finish the selection of the $q_j$, take $q_1 = r_1$ and $q_2 = r_2 = 1 - q_1$ as one of the two (equivalent) solutions of (3).
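The two quantities that drive this recursion, $L(H,q)$ from (2) and the inversion of the binary entropy function in (3), are easy to compute numerically. The following sketch (ours, not part of the paper; the function names are hypothetical) uses base-2 logarithms and simple bisection, which suffices because $h_b$ is monotone on $[0, 1/2]$ and $L(H,\cdot)$ is monotone on either side of its single minimum.

```python
import math

def hb(x):
    """Binary entropy h_b(x) in bits, with hb(0) = hb(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def L(H, q):
    """Eq. (2): entropy left for the normalized remaining symbols."""
    return (H - hb(q)) / (1.0 - q)

def bisect_mono(f, target, lo, hi, increasing, iters=200):
    """Solve f(q) = target by bisection on [lo, hi], f monotone there."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example values that reappear (to rounding) in the worked example of
# Section 2.2 below: the next-stage entropy L(1.5, 0.4715) and the smaller
# solution of h_b(r) = 0.2751 (the other solution is one minus it).
print(round(L(1.5, 0.4715), 3), round(bisect_mono(hb, 0.2751, 0.0, 0.5, True), 3))
```

The same bisection routine can also be used to locate the interval endpoints of the allowable range of each $q_k$, as sketched after the example below.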
At this point the algorithm yields a set of scaled values $q_1$ through $q_n$ (scaled in that the vectors $r$ were scaled to sum to unity). To find $p$ the scaling constants must be removed. Recursing through the sequential definitions of the $r$ vectors, the relevant relations are

$$p_k = \begin{cases} q_n, & k = n,\\[2pt] q_k \left(1 - \sum_{j=k+1}^{n} p_j\right), & k = n-1, n-2, \ldots, 3, 2,\\[2pt] q_1 \left(1 - \sum_{j=3}^{n} p_j\right), & k = 1. \end{cases} \tag{4}$$

The range of selection of the individual values, each $q_k$, is limited by the function $L(H_k, q_k)$. Depending upon the relative values of the current entropy, $H_k$, and the number of remaining probabilities, $k$, the range of $q_k$ that satisfies both $0 \le L(H_k, q_k) \le \log(k-1)$ and $0 \le q_k \le 1$ has three distinct possible forms as shown in Fig. 1:

- $1 \le H_k \le \log(k-1)$: $q_k$ is restricted to the single interval $[0, a]$ where $a$ satisfies $L(H_k, a) = \log(k-1)$.
- $\log(k-1) < H_k$: $q_k$ is restricted to the single interval $[a, b]$ where the interval's endpoints satisfy $L(H_k, a) = L(H_k, b) = \log(k-1)$.
- $H_k < 1$: $q_k$ is restricted to the union of two intervals, $[0, a]$ and $[b, c]$, where the intervals' endpoints satisfy $L(H_k, a) = L(H_k, b) = 0$ and $L(H_k, c) = \log(k-1)$.

[Fig. 1. Possible realizations of the range of $q_k$: (a) $1 \le H_k \le \log(k-1)$, (b) $\log(k-1) < H_k$, and (c) $H_k < 1$.]

Note that for all values of $H$, the function $L(H, q)$ is continuous on the range $0 \le q \le 1$, takes the initial value $L = H$ for $q = 0$, exhibits one minimum at $\log q = -H$ (or $q = b^{-H}$ assuming logarithms to the base $b$), and goes to infinity as $q$ approaches unity; hence, simple root finding techniques can be applied to find the desired interval endpoints.

2.2. An example

As an example, let $n = 5$ and assume that the desired entropy is $H_5 = 1.5$ bits. The $q_k$ are generated as follows:

- Using $H_5 = 1.5$ and $k = 5$ yields that $q_5$ is restricted to $[0, 0.6942]$. The random selection of $q_5 = 0.4715$ yields the next stage entropy of $L(1.5, 0.4715) = 0.9506$ bits.
- Using $H_4 = 0.9506$ and $k = 4$ yields that $q_4$ is restricted to $[0, 0.3699] \cup [0.6301, 0.8239]$. The random selection of $q_4 = 0.7871$ yields the next stage entropy of $L(0.9506, 0.7871) = 0.9565$ bits.
- Using $H_3 = 0.9565$ and $k = 3$ yields that $q_3$ is restricted to $[0, 0.3777] \cup [0.6223, 0.7883]$. The random selection of $q_3 = 0.2085$ yields the next stage entropy of $L(0.9565, 0.2085) = 0.2751$ bits.
- Inverting the binary entropy function at $H_2 = 0.2751$ bits yields $q_1 = 0.0474$ and $q_2 = 0.9526$.

All that is left to specify $p$ is to remove the scalings:

$$\begin{aligned}
p_5 &= q_5 = 0.4715,\\
p_4 &= q_4 (1 - p_5) = 0.4160,\\
p_3 &= q_3 (1 - p_5 - p_4) = 0.0235,\\
p_2 &= q_2 (1 - p_5 - p_4 - p_3) = 0.0848,\\
p_1 &= q_1 (1 - p_5 - p_4 - p_3) = 0.0042,
\end{aligned}$$

resulting in a set of probabilities with entropy 1.5 bits.
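Pulling the pieces together, the following sketch (ours, not from the paper; all names are hypothetical) implements the full serial procedure: the allowable range of each $q_k$ is located with the bisection routine from the earlier sketch, a value is drawn uniformly over that range, and the scalings are removed per Eq. (4). It reuses `hb`, `L`, and `bisect_mono` defined above.

```python
import math
import random

# Assumes hb, L and bisect_mono from the earlier sketch are already defined.

def allowed_set(H, k):
    """Intervals of q satisfying 0 <= q <= 1 and 0 <= L(H, q) <= log2(k-1),
    i.e. the three cases of Fig. 1."""
    M = math.log2(k - 1)
    qmin = 2.0 ** (-H)                       # minimizer of L(H, .) (log2 q = -H)
    top = 1.0 - 1e-12                        # L blows up as q -> 1
    f = lambda q: L(H, q)
    if H < 1.0:                              # case (c): [0, a] U [b, c]
        a = bisect_mono(f, 0.0, 0.0, qmin, increasing=False)
        b = bisect_mono(f, 0.0, qmin, top, increasing=True)
        c = bisect_mono(f, M, qmin, top, increasing=True)
        return [(0.0, a), (b, c)]
    if H <= M:                               # case (a): [0, a]
        return [(0.0, bisect_mono(f, M, qmin, top, increasing=True))]
    a = bisect_mono(f, M, 0.0, qmin, increasing=False)    # case (b): [a, b]
    b = bisect_mono(f, M, qmin, top, increasing=True)
    return [(a, b)]

def generate_probabilities(n, H, rng=random):
    """Serial algorithm of Section 2 for n >= 3 and 0 < H < log2(n)."""
    q, Hk = {}, H
    for k in range(n, 2, -1):
        ivals = allowed_set(Hk, k)
        u = rng.uniform(0.0, sum(b - a for a, b in ivals))
        for a, b in ivals:                   # uniform draw over the union
            if u <= b - a:
                q[k] = a + u
                break
            u -= b - a
        Hk = max(0.0, L(Hk, q[k]))           # next stage entropy (guard round-off)
    q[1] = bisect_mono(hb, Hk, 0.0, 0.5, increasing=True)  # invert Eq. (3)
    q[2] = 1.0 - q[1]
    p = [0.0] * (n + 1)                      # remove the scalings, Eq. (4)
    tail = 0.0
    for k in range(n, 2, -1):
        p[k] = q[k] * (1.0 - tail)
        tail += p[k]
    p[2] = q[2] * (1.0 - tail)               # p_1 and p_2 share the same scale
    p[1] = q[1] * (1.0 - tail)
    return p[1:]

p = generate_probabilities(5, 1.5)
print([round(x, 4) for x in p], round(-sum(x * math.log2(x) for x in p), 4))
```

Each run reproduces the structure of the worked example above: the printed entropy is 1.5 bits (up to round-off), while the individual probabilities change from run to run since the $q_k$ are drawn at random.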
3. An application

Over 40 years ago Huffman [4] (see also [5]) introduced a procedure for designing variable length source codes which achieve performance close to Shannon's entropy bound. For individual codeword lengths $l_i$, the average length $\bar{L}$ ($= \sum_{i=1}^{n} l_i p_i$) of a "Huffman" code is always within one unit of the source entropy: $H_n \le \bar{L} < H_n + 1$. Since that time, many authors have considered tighter bounds on the average length. As an application, the algorithm developed above is employed to learn more about the performance of Huffman's algorithm by Monte Carlo simulation. Specifically, the average lengths of Huffman codes for randomly generated sources were compiled for different combinations of $n$ and $H_n$. As examples, Fig. 2 shows histograms of the average length for four different combinations. Note that the spread of each histogram is significantly less than the one bit suggested by the bound above.

[Fig. 2. Histograms (2000 trials) of $\bar{L}$ for various combinations of $n$ and $H_n$.]

4. Comments

While the technique for generating vectors $p$ is employed above in the example of Huffman codes, the same methods could be used to study other noiseless variable length codes, such as those with self-synchronizing properties for noisy channels.

As a final remark, note that a parallel or binary tree type approach to this problem, as compared to the serial approach above, is possible. Details appear in [6]. The complexity of the parallel approach is similar in that $n-1$ roots of the binary entropy function must be computed.

References

[1] G.S. Fishman, Monte Carlo: Concepts, Algorithms, and Applications, Springer, Berlin, 1996.
[2] L.L. Campbell, Averaging entropy, IEEE Trans. Inform. Theory IT-41 (1995) 338–339.
[3] C.G. Gunther, W.R. Schneider, Entropy as a function of alphabet size, Proceedings of the 1993 IEEE International Symposium on Information Theory, p. 70.
[4] D.A. Huffman, A method for the construction of minimum-redundancy codes, Proceedings of the IRE, Vol. 40, September 1952, pp. 1098–1101.
[5] R.G. Gallager, Variations on a theme by Huffman, IEEE Trans. Inform. Theory IT-24 (1978) 668–674.
[6] S. Wali, Generating discrete memoryless sources for testing variable length codes, M.S.E.E. thesis, Department of Electrical Engineering, University of Rhode Island, 1997.
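To make the experiment of Section 3 concrete, here is a short sketch (ours, not part of the paper) that computes Huffman codeword lengths with a binary heap and evaluates the average length $\bar{L}$ for the source of the worked example in Section 2.2. A Monte Carlo study in the spirit of Fig. 2 would repeat this over many vectors drawn by the generator sketched earlier and histogram the results.

```python
import heapq

def huffman_average_length(p):
    """Average codeword length of a binary Huffman code for probabilities p."""
    # Each heap entry is (probability, unique tiebreak id, list of leaf indices).
    heap = [(pi, i, [i]) for i, pi in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    uid = len(p)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        for i in leaves1 + leaves2:          # merged leaves gain one bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, uid, leaves1 + leaves2))
        uid += 1
    return sum(pi * li for pi, li in zip(p, lengths))

# Source from the worked example of Section 2.2 (entropy 1.5 bits).
p = [0.0042, 0.0848, 0.0235, 0.4160, 0.4715]
print(huffman_average_length(p))
```

For this particular source the printed value lies, as it must, between the entropy of 1.5 bits and 2.5 bits.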