
Entropy measures and some distribution approximations


M. M. Azzam and A. M. Awad
Department of Mathematics, University of Jordan, Amman, Jordan
Microelectronics Reliability, Vol. 36, No. 10, pp. 1569-1580, 1996. (Received for publication 30 August 1995)

Abstract -- Many information measures are suggested in the literature. Among these measures are the Shannon H_n(θ) and the Awad A_n(θ) entropies. In this work we suggest a new entropy measure, B_n(θ), which is based on the maximum likelihood function. These three entropies were calculated for the gamma distribution and its normal approximation, the binomial and its Poisson approximation, and the Poisson and its normal approximation. The relative losses in these three entropies are used as a criterion for the appropriateness of the approximation.

1. INTRODUCTION

Shannon [1] introduced a measure for the amount of information in a random variable X. This measure, known as "the Shannon entropy", is defined by

    H(θ) = -E log f(X; θ),

where f(x; θ) is the probability density function (p.d.f.) of the random variable X. Awad [2] extended the Shannon entropy to the A-entropy (known as the sup-entropy), which is defined by

    A(θ) = -E log [f(X; θ)/δ],

where δ = sup_x f(x; θ).

In this work we study the information embedded in a random sample of size n drawn from a distribution with p.d.f. f(x; θ), θ ∈ Ω. The Shannon entropy for a random sample of size n, H_n(θ), is given by

    H_n(θ) = -Σ_{i=1}^n E log f(X_i; θ).

The sup-entropy A_n(θ) is given by

    A_n(θ) = -Σ_{i=1}^n E log [f(X_i; θ)/δ],

where δ = sup_{x_i} f(x_i; θ). Since X_1, ..., X_n are i.i.d. f(x; θ), we have H_n(θ) = nH(θ) and A_n(θ) = nA(θ). It was shown in Ref. [2] that the A-entropy has some advantages over the H-entropy. We suggest a modification of the A-entropy as given below.

1.1. Definition
Let X = (X_1, X_2, ..., X_n) be a random sample from a population with p.d.f. f(x; θ), θ ∈ Ω. Let L(X; θ) = f(x_1, ..., x_n; θ), and let θ̂ be the maximum likelihood estimate (MLE) of θ. Then the B-entropy is defined by

    B_n(θ) = -E log [L(X; θ)/L(X; θ̂)],

provided that the MLE θ̂ exists and is unique.

The following theorem [3] will be used in the sequel.

1.2. Theorem
If X ~ Gamma(a, 1), then

    E(log X) = ∫_0^∞ log x · x^(a-1) e^(-x) / Γ(a) dx = Ψ(a),

where Ψ(a) is the digamma function.

1.3. Remark
For the computation of the digamma function one might use the following known results [4]:
(i) Ψ(1) = -γ ≈ -0.577215665;
(ii) Ψ(n) = -γ + Σ_{k=1}^{n-1} 1/k, if n is an integer, n ≥ 2.

1.4. Corollary
If X ~ Gamma(α, β), with p.d.f.

    f(x; θ) = [1/(Γ(α) β^α)] e^(-x/β) x^(α-1)   if x > 0 and zero otherwise, θ = (α, β),

then X/β ~ Gamma(α, 1) and so E(log X) = log β + Ψ(α).

Section 2 gives the values of the measures H_n(θ), A_n(θ) and B_n(θ) for the gamma distribution and the corresponding normal approximation. Section 3 gives the values of the same measures for the binomial distribution and its Poisson approximation. The values of the above-mentioned measures for the Poisson distribution and its normal approximation are given in Section 4. In Section 5 we give the absolute relative losses in these entropy measures due to approximation; concluding remarks are also given there.
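As a numerical aside (not part of the original paper), the recurrence in Remark 1.3 is easy to check in a few lines of Python against SciPy's digamma; the helper name psi_integer below is ours.

```python
# A minimal sketch (ours, not from the paper): digamma at positive integers
# via Remark 1.3, checked against scipy.special.digamma.
from scipy.special import digamma

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def psi_integer(n: int) -> float:
    """Psi(1) = -gamma; Psi(n) = -gamma + sum_{k=1}^{n-1} 1/k for integer n >= 2."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))

for n in (1, 2, 5, 20):
    print(n, psi_integer(n), digamma(n))  # the two values should agree closely
```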
2. NORMAL APPROXIMATION TO THE GAMMA DISTRIBUTION

The central limit theorem states that if X_1, X_2, ..., X_n is a random sample from a population with mean μ and finite variance σ², then √n(X̄ - μ)/σ converges in distribution to the standard normal distribution N(0, 1). That is, for large n, X̄ has approximately the normal distribution with mean μ and variance σ²/n.

In this section we assume that X_1, X_2, ..., X_n is a random sample from an exponential distribution with mean θ. We evaluate the Shannon entropy, the A-entropy and the B-entropy for the exact distribution of X̄ and for its normal approximation. We then select the values of the sample size n for which the relative loss in entropy due to the normal approximation is less than a given small number ε.

2.1. The entropies of the exact distribution
Let Y = X̄; then Y ~ Gamma(n, θ/n). The p.d.f. of Y is given by

    g(y; θ) = [1/(Γ(n)(θ/n)^n)] e^(-ny/θ) y^(n-1)   if y > 0 and zero otherwise.

Notice that

    δ = sup_y g(y; θ) = g((n - 1)θ/n; θ)   and   θ̂ = Y.

Then

    H_n(θ) = -E[-log Γ(n) - n log(θ/n) + (n - 1) log Y - nY/θ]
           = n + log Γ(n) - log n - (n - 1)Ψ(n) + log θ   (from Section 1.4),

    A_n(θ) = H_n(θ) + log δ = 1 + (n - 1){log(n - 1) - Ψ(n)},   n ≥ 2,

and

    B_n(θ) = -E[n log(Y/θ) - nY/θ + n] = n{log n - Ψ(n)}   (from Section 1.4).

Note that A_n(θ) and B_n(θ) are free of θ and depend only on the sample size n.
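The closed forms above lend themselves to direct numerical evaluation. The following Python sketch (our own illustration; the paper's numerical work was done in Mathematica 2.2.3) computes H_n(θ), A_n(θ) and B_n(θ) for the exact distribution of the sample mean; the function names are ours.

```python
# Sketch: exact-sample entropies for the mean of an exponential(theta) sample,
# using the closed forms of Section 2.1 above.
import numpy as np
from scipy.special import gammaln, digamma

def H_exact(n: int, theta: float) -> float:
    # H_n(theta) = n + log Gamma(n) - log n - (n-1) Psi(n) + log theta
    return n + gammaln(n) - np.log(n) - (n - 1) * digamma(n) + np.log(theta)

def A_exact(n: int) -> float:
    # A_n(theta) = 1 + (n-1){log(n-1) - Psi(n)}, n >= 2; free of theta
    return 1.0 + (n - 1) * (np.log(n - 1) - digamma(n))

def B_exact(n: int) -> float:
    # B_n(theta) = n{log n - Psi(n)}; free of theta
    return n * (np.log(n) - digamma(n))

for n in (5, 20, 100):
    print(n, H_exact(n, theta=2.0), A_exact(n), B_exact(n))
```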
2.2. The entropies of the normal approximation
By the central limit theorem, Y = X̄ is approximately N(θ, θ²/n), so the approximate p.d.f. of Y is given by

    h(y; θ) = (n/(2πθ²))^(1/2) exp{-n(y - θ)²/(2θ²)}   if y > 0 and zero otherwise.

Notice that

    δ = sup_y h(y; θ) = (n/(2πθ²))^(1/2).

It can be shown that the MLE of θ under this approximation is the solution of

    θ² + nȳθ - nȳ² = 0,

which is θ̂ = cȳ, where c = (n/2)(√(1 + 4/n) - 1), the positive root being taken since y > 0 a.s. The corresponding entropies of the approximate distribution are

    H*(θ) = 1/2 + log θ - (1/2) log(n/(2π)),

    A*(θ) = 1/2,

and

    B*(θ) = (1/2) log n - log c - E log V + 1/2 - n(1 - c)²/(2c²),

where V = √n Y/θ is, under the approximation, normally distributed with mean √n and variance 1. Note that A*(θ) and B*(θ) are free of θ.

[Figure 1. Relative loss versus sample size when the gamma distribution is approximated by the normal distribution; panels for θ = 0.5, 5, 10, 20, 40, 60, 80 and 100.]

[Figure 2. Probability density functions of the gamma distribution and its normal approximation for several sample sizes.]
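To illustrate the relative-loss criterion used in Section 5, a short sketch (ours, not the authors' Mathematica code) can compare the exact and approximate H- and A-entropies. We read the "absolute relative loss" as |H_n(θ) - H*(θ)|/H_n(θ), which is an assumption on our part, and apply the 10% threshold stated in Section 5.

```python
# Sketch: relative losses in the H- and A-entropies when the exact gamma
# distribution of the sample mean is replaced by its normal approximation.
import numpy as np
from scipy.special import gammaln, digamma

def H_exact(n, theta):
    return n + gammaln(n) - np.log(n) - (n - 1) * digamma(n) + np.log(theta)

def H_approx(n, theta):
    # H*(theta) = 1/2 + log(theta) - (1/2) log(n/(2*pi))
    return 0.5 + np.log(theta) - 0.5 * np.log(n / (2 * np.pi))

def A_exact(n):
    return 1.0 + (n - 1) * (np.log(n - 1) - digamma(n))

theta = 5.0
for n in (5, 20, 50, 100):
    rl_H = abs(H_exact(n, theta) - H_approx(n, theta)) / H_exact(n, theta)
    rl_A = abs(A_exact(n) - 0.5) / A_exact(n)  # A*(theta) = 1/2
    print(n, round(rl_H, 4), round(rl_A, 4),
          "acceptable" if max(rl_H, rl_A) < 0.10 else "too large")
```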
3. POISSON APPROXIMATION TO THE BINOMIAL DISTRIBUTION

Let X_1, X_2, ..., X_n be a random sample from a Bernoulli distribution with p.d.f.

    f(x; θ) = θ^x (1 - θ)^(1-x)   if x ∈ {0, 1} and zero otherwise.

Let Y = Σ_{i=1}^n X_i; notice that Y ~ Binomial(n, θ).

3.1. The entropies of the exact distribution
The p.d.f. of Y is given by

    g(y; θ) = C(n, y) θ^y (1 - θ)^(n-y)   if y ∈ {0, 1, ..., n} and zero otherwise.

Notice that θ̂ = Y/n. It can be shown that

    δ = sup_y g(y; θ) = g([θ(n + 1)]; θ),   n ≥ 2,

where [m] is the greatest integer less than or equal to m. Then

    H_n(θ) = E[log(Y!(n - Y)!)] - log n! - nθ log θ - n(1 - θ) log(1 - θ),

    A_n(θ) = H_n(θ) + log δ,

and

    B_n(θ) = E(Y log Y) + E((n - Y) log(n - Y)) - n{θ log θ + (1 - θ) log(1 - θ) + log n}.

3.2. The entropies for the Poisson approximation
Notice that if Y ~ Binomial(n, θ) and nθ → λ as n → ∞, then

    g(y; θ) = C(n, y) θ^y (1 - θ)^(n-y) → e^(-λ) λ^y / y!.

That is, for large n and small θ, Y is approximately Poisson(nθ). The approximate p.d.f. of Y is given by

    h(y; θ) = e^(-nθ) (nθ)^y / y!   if y ∈ {0, 1, 2, ...} and zero otherwise.

Notice that

    δ = sup_y h(y; θ) = h([nθ]; θ)   (n ≥ 2)   and   θ̂ = Y/n.

It can be shown that

    H*(θ) = E(log Y!) + nθ(1 - log(nθ)),

    A*(θ) = E(log Y!) - log([nθ]!) - (nθ - [nθ]) log(nθ),

and

    B*(θ) = E(Y log Y) - nθ log(nθ).

[Figure 3. Poisson approximation to the binomial distribution: relative losses in the three entropies versus n for θ between 0.01 and 0.9.]
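Since Y takes only the values 0, 1, ..., n, the expectations in Sections 3.1 and 3.2 can be evaluated as finite sums over the Binomial(n, θ) probabilities. The sketch below is ours; we take all expectations under the exact binomial law, which is an assumption where the paper is not explicit. It compares the exact H_n(θ) with the Poisson-approximation H*(θ) and reports the relative loss.

```python
# Sketch: exact binomial H-entropy versus its Poisson-approximation
# counterpart; expectations are finite sums over y = 0, ..., n.
import numpy as np
from scipy.special import gammaln
from scipy.stats import binom

def H_exact_binom(n, th):
    y = np.arange(n + 1)
    p = binom.pmf(y, n, th)
    e_log_fact = np.sum(p * (gammaln(y + 1) + gammaln(n - y + 1)))  # E[log(Y!(n-Y)!)]
    return (e_log_fact - gammaln(n + 1)
            - n * th * np.log(th) - n * (1 - th) * np.log(1 - th))

def H_poisson_approx(n, th):
    y = np.arange(n + 1)
    p = binom.pmf(y, n, th)
    # H*(theta) = E(log Y!) + n*theta*(1 - log(n*theta))
    return np.sum(p * gammaln(y + 1)) + n * th * (1 - np.log(n * th))

for n in (10, 50, 100):
    for th in (0.05, 0.10, 0.30):
        rl = abs(H_exact_binom(n, th) - H_poisson_approx(n, th)) / H_exact_binom(n, th)
        print(n, th, round(rl, 4))
```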
4. NORMAL APPROXIMATION TO THE POISSON DISTRIBUTION

Let X_1, X_2, ..., X_n be a random sample from a Poisson distribution with mean θ. Let Y = Σ_{i=1}^n X_i. Notice that Y has the Poisson distribution with mean nθ.

4.1. The entropies of the exact distribution
Notice that the entropies for the exact distribution are the same as those given in Section 3.2.

4.2. The entropies of the normal approximation
By the central limit theorem, (Y - nθ)/√(nθ) is approximately N(0, 1). That is, for large n, the approximate p.d.f. of Y is given by

    h(y; θ) = (2πnθ)^(-1/2) exp{-(y - nθ)²/(2nθ)}.

Notice that

    δ = sup_y h(y; θ) = (2πnθ)^(-1/2).

To find the MLE θ̂ we proceed as follows:

    log h(y; θ) = -(1/2) log(2πnθ) - (y - nθ)²/(2nθ),

    ∂/∂θ log h(y; θ) = -1/(2θ) - [-4n²θ(y - nθ) - 2n(y - nθ)²]/(4n²θ²),

and setting ∂/∂θ log h(y; θ) = 0 gives

    2n²θ² + 2nθ - 2y² = 0.

Hence

    θ̂ = [-1 + √(1 + 4y²)]/(2n).

The corresponding entropies for the approximate distribution are

    H*(θ) = 1/2 + (1/2) log(2πnθ),

    A*(θ) = 1/2,

and, using the identity (Y - nθ̂)²/(2nθ̂) = (1/2)√(1 + 4Y²) - Y,

    B*(θ) = (1/2) log θ + 1/2 - (1/2)E log[√(1 + 4Y²) - 1] + (1/2) log 2 + (1/2) log n - (1/2)E√(1 + 4Y²) + nθ.

[Figure 4. Relative losses when the Poisson distribution is approximated by the normal distribution: (a) Shannon entropy, (b) A-entropy, (c) B-entropy, plotted against the mean nθ.]

5. RELATIVE LOSS IN ENTROPIES DUE TO APPROXIMATIONS

To get an idea of the relative performance of the three entropies given in Section 1, we have used Mathematica 2.2.3 to calculate the relative losses in the three entropies H_n(θ), A_n(θ) and B_n(θ) incurred by approximating the gamma by the normal, the binomial by the Poisson, and the Poisson by the normal. We use these relative losses as a criterion to judge the appropriateness of the approximation; the approximation is considered acceptable if the relative loss does not exceed 10%.

Figure 1 gives the plots of the relative losses versus the sample size when the gamma distribution is approximated by the normal distribution. From this figure we observe that:

(1) The relative loss is decreasing in both n and θ for the three considered entropies.
(2) The relative loss in the H-entropy and the A-entropy is less than 10% for all n and all θ, while the relative loss in the B-entropy is less than 10% only for n ≥ 20.

These observations show that the B-entropy has an advantage over the A-entropy and the H-entropy, since it indicates that the normal approximation to the gamma is acceptable only for sufficiently large sample sizes. This is also clear from Fig. 2, which gives the plots of the probability density functions of the gamma and the normal distributions.

For the Poisson approximation to the binomial distribution it is observed from Fig. 3 that the relative loss in the H-entropy is less than 10% for θ < 0.3 and n = 10(10)100. However, the A-entropy gives a relative loss less than 10% only for θ ≤ 0.07, and the B-entropy gives a relative loss less than 10% only for θ ≤ 0.10.

Figure 4 gives the relative losses of the three entropies when the Poisson distribution is approximated by the normal distribution. We observe that the relative loss (which is a function of nθ) is less than 10% only for the H-entropy.

Acknowledgement -- This research has been supported by the University of Jordan.

REFERENCES

1. C. E. Shannon, A mathematical theory of communication, Bell System Tech. J. 27, 379-423 and 623-659 (1948).
2. A. Awad, A statistical information measure, Dirasat (Sci.) XIV(12), 7-20 (1987).
3. A. Awad, The Shannon entropy of generalized gamma and of related distributions, Proc. First Jordanian Mathematics Conf. (edited by Al-Zoubi), pp. 13-27. Yarmouk University, Jordan (1991).
4. I. S. Gradshteyn and I. M. Ryzhik, Tables of Integrals, Series and Products. Academic Press, London (1965).