No of Flips For First Head
Homework 1
Solutions
Text
Coin Flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips required. Find the entropy H(X) in bits, and find an efficient sequence of yes/no questions to determine X.
Solution
(a) The random variable X takes values in the set X = {1, 2, 3, . . .} and denotes the number of flips needed to get the first head, i.e. 1 plus the number of consecutive tails that appear before the first head.
Since the coin is fair, we have p("head") = p("tail") = 1/2 and hence (exploiting the independence of the coin flips):

\[
p(X = 1) = p(\text{head}) = \frac{1}{2}
\]
\[
p(X = 2) = p(\text{tail}) \cdot p(\text{head}) = \frac{1}{2} \cdot \frac{1}{2} = \left(\frac{1}{2}\right)^{2}
\]
\[
\vdots
\]
\[
p(X = n) = \underbrace{p(\text{tail}) \cdots p(\text{tail})}_{n-1 \text{ times}} \cdot p(\text{head}) = \left(\frac{1}{2}\right)^{n-1} \cdot \frac{1}{2} = \left(\frac{1}{2}\right)^{n}
\]

so the probability mass function is

\[
p_X(x) = \left(\frac{1}{2}\right)^{x}, \qquad x = 1, 2, 3, \ldots
\]
Once the distribution is known, H(X) can be computed from the definition:
\[
\begin{aligned}
H(X) &= -\sum_{x \in \mathcal{X}} p_X(x) \log_2 p_X(x) \\
&= -\sum_{x=1}^{\infty} \left(\tfrac{1}{2}\right)^{x} \log_2 \left(\tfrac{1}{2}\right)^{x} \\
&= -\sum_{x=0}^{\infty} \left(\tfrac{1}{2}\right)^{x} \log_2 \left(\tfrac{1}{2}\right)^{x} && \text{(the summed expression equals 0 for } x = 0\text{)} \\
&= -\sum_{x=0}^{\infty} x \left(\tfrac{1}{2}\right)^{x} \log_2 \tfrac{1}{2} && \text{(property of logarithms)} \\
&= \sum_{x=0}^{\infty} x \left(\tfrac{1}{2}\right)^{x} && \text{(since } -\log_2 \tfrac{1}{2} = 1\text{)} \\
&= \frac{1/2}{(1 - 1/2)^2} = 2 \text{ bits} && \text{exploiting } \sum_{x=0}^{\infty} x\, k^{x} = \frac{k}{(1-k)^2}
\end{aligned}
\]
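As a quick numerical sanity check, the series above can be summed directly in a few lines of Python (a minimal sketch; the truncation point is arbitrary, chosen large enough that the geometric tail is negligible):

```python
import math

# Entropy of X with p(X = x) = (1/2)**x for x = 1, 2, 3, ...
# Truncating at x = 200 is enough: the tail decays geometrically.
H = -sum((0.5 ** x) * math.log2(0.5 ** x) for x in range(1, 200))
print(H)  # ~2.0 bits, matching the closed-form result above
```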
(b) Since the most likely value for X is 1 (p(X = 1) = 1/2), the most efficient first question is "Is X = 1?"; the next question will be "Is X = 2?", and so on, until a positive answer is found.
If this strategy is used, the random variable Y representing the number of questions will have the same distribution as X, and its expected value will be:
\[
E[Y] = \sum_{y=1}^{\infty} y \left(\tfrac{1}{2}\right)^{y} = \frac{1/2}{(1 - 1/2)^2} = 2
\]
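The question strategy can also be checked by simulation (a small sketch, assuming a fair coin; the number of trials is arbitrary): flipping until the first head and asking "Is X = 1?", "Is X = 2?", . . . requires exactly X questions, so the empirical average should be close to 2.

```python
import random

def flips_until_head() -> int:
    """Number of flips of a fair coin until the first head (equivalently,
    the number of questions asked by the strategy above)."""
    flips = 1
    while random.random() >= 0.5:  # tail with probability 1/2
        flips += 1
    return flips

trials = 100_000
print(sum(flips_until_head() for _ in range(trials)) / trials)  # ~2
```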
Text
Entropy of functions of a random variable. Let X be a discrete random variable. Show that the entropy of a function of X is less than or equal to the entropy of X by justifying the following steps:
\[
H(X, g(X)) \overset{(a)}{=} H(X) + H(g(X) \mid X) \overset{(b)}{=} H(X) \tag{1}
\]
\[
H(X, g(X)) \overset{(c)}{=} H(g(X)) + H(X \mid g(X)) \overset{(d)}{\geq} H(g(X)) \tag{2}
\]
Thus, H(g(X)) ≤ H(X).
Solution
(a) This follows from the chain rule for entropy applied to the random variables X and g(X): H(X, Y) = H(X) + H(Y|X), so H(X, g(X)) = H(X) + H(g(X)|X).
(b) Intuitively, since g(X) depends only on X, once the value of X is known g(X) is completely specified, i.e. it is deterministic. The entropy of a deterministic quantity is 0, so H(g(X)|X) = 0 and H(X) + H(g(X)|X) = H(X).
(c) Again, this follows from the chain rule for entropy, in the form H(X, Y) = H(Y) + H(X|Y).
(d) Proving that H(g(X)) + H(X|g(X)) ≥ H(g(X)) amounts to proving that H(X|g(X)) ≥ 0: non-negativity is one of the properties of entropy, and it can be proved from the definition by noting that the logarithm of a probability (a quantity always less than or equal to 1) is non-positive. In particular, H(X|g(X)) = 0 if knowing the value of g(X) completely determines the value of X (for example, when g is injective); otherwise H(X|g(X)) > 0 (for example, when g maps two values of X that have positive probability to the same output).
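The inequality H(g(X)) ≤ H(X) can be illustrated numerically; the distribution and the functions g and h below are arbitrary choices made only for this sketch (a non-injective g gives a strict inequality, an injective h gives equality):

```python
import math
from collections import defaultdict

def entropy(pmf: dict) -> float:
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def pushforward(pmf: dict, g) -> dict:
    """Distribution of g(X) induced by the distribution of X."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)

p_X = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}  # arbitrary example distribution

g = lambda x: x % 2       # non-injective: merges outcomes
h = lambda x: 2 * x + 1   # injective: only relabels outcomes

print(entropy(p_X))                   # H(X)    = 1.75
print(entropy(pushforward(p_X, g)))   # H(g(X)) ~ 0.954 < H(X)
print(entropy(pushforward(p_X, h)))   # H(h(X)) = 1.75 = H(X)
```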
Text
Coin weighing. Suppose that one has n coins, among which there may or may not be one counterfeit
coin. If there is a counterfeit coin, it may be either heavier or lighter than the other coins. The
coins are to be weighed by a balance.
Find an upper bound on the number of coins n so that k weighings will find the counterfeit coin
(if any) and correctly declare it to be heavier or lighter.
Solution
Let X = [X1, X2, . . . , Xn] be a string of n symbols, one per coin, each taking values in the alphabet {−1, 0, 1}: symbol i is 1 if coin i is counterfeit and heavier than a regular coin, 0 if coin i is regular, and −1 if coin i is counterfeit and lighter. Since at most one coin may be counterfeit, X is either the all-zero string (all coins regular) or a string with a single 1 or a single −1 at exactly one position. Thus, the number of possible configurations for X is 2n + 1.
Under the hypothesis of a uniform distribution over these configurations, the entropy of X is

\[
H(X) = \log_2(2n + 1).
\]
Now let Z = [Z1, Z2, . . . , Zk] be a random variable representing the weighings; each Zi takes one of three possible values, indicating whether the weighing is balanced, the left arm is heavier, or the right arm is heavier. The entropy of each Zi is therefore upper-bounded by the logarithm of the number of values it can take, H(Zi) ≤ log2 3 for all i ∈ [1, k], and for Z (under the hypothesis of independent weighings):
\[
H(Z) = H(Z_1, Z_2, \ldots, Z_k) \overset{\text{(chain rule)}}{=} \sum_{i=1}^{k} H(Z_i \mid Z_{i-1}, \ldots, Z_1) \overset{\text{(indep.)}}{=} \sum_{i=1}^{k} H(Z_i) \leq k \log_2 3
\]
Since the k weighings must provide at least as much information as is contained in the configuration of X (i.e. they must suffice to find out which coin, if any, is counterfeit and whether it is heavier or lighter), we can write H(X) ≤ H(Z), hence

\[
\log_2(2n + 1) \leq k \log_2 3 \quad\Longrightarrow\quad 2n + 1 \leq 3^{k} \quad\Longrightarrow\quad n \leq \frac{3^{k} - 1}{2}.
\]
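A quick tabulation of the bound (a sketch based on the inequality just derived):

```python
# Maximum number of coins n that k three-outcome weighings can handle,
# from log2(2n + 1) <= k * log2(3), i.e. n <= (3**k - 1) / 2.
def max_coins(k: int) -> int:
    return (3 ** k - 1) // 2

for k in range(1, 6):
    print(k, max_coins(k))  # e.g. k = 3 weighings -> at most 13 coins
```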
Text
Let the pair of random variables (X, Y) have the joint distribution p(x, y) given by the following table:

              Y = 0    Y = 1
    X = 0      1/3      1/3
    X = 1       0       1/3
Find:
(a) H(X), H(Y ).
(b) H(X|Y ), H(Y |X).
(c) H(X, Y ).
(d) H(Y ) − H(Y |X).
(e) I(X; Y ).
(f) Draw a Venn diagram for the quantities in parts (a) through (e).
Solution:
(a) H(X), H(Y).
Computing the marginal distributions:

\[
p(x) = \left[\tfrac{2}{3}, \tfrac{1}{3}\right], \qquad p(y) = \left[\tfrac{1}{3}, \tfrac{2}{3}\right]
\]
\begin{align}
H(X) &= -\tfrac{2}{3} \log_2 \tfrac{2}{3} - \tfrac{1}{3} \log_2 \tfrac{1}{3} \tag{3}\\
H(X) &= 0.918 \text{ bits} \tag{4}\\
H(Y) &= -\tfrac{1}{3} \log_2 \tfrac{1}{3} - \tfrac{2}{3} \log_2 \tfrac{2}{3} \tag{5}\\
H(Y) &= 0.918 \text{ bits} \tag{6}
\end{align}
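These values can be cross-checked numerically from the joint table (a minimal sketch; the 2×2 list-of-lists layout is just one convenient encoding of the table):

```python
import math

def H(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution p(x, y): rows are x = 0, 1; columns are y = 0, 1.
p = [[1/3, 1/3],
     [0.0, 1/3]]

p_x = [sum(row) for row in p]                              # [2/3, 1/3]
p_y = [sum(p[x][y] for x in range(2)) for y in range(2)]   # [1/3, 2/3]
print(H(p_x), H(p_y))  # both ~0.918 bits
```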
(b) H(X|Y), H(Y|X).

\begin{align}
H(X \mid Y) &= \sum_{i=0}^{1} p(Y = i)\, H(X \mid Y = i) \tag{7}\\
H(X \mid Y) &= \tfrac{1}{3} H(X \mid Y = 0) + \tfrac{2}{3} H(X \mid Y = 1) \tag{8}\\
H(X \mid Y) &= \tfrac{1}{3} H(1, 0) + \tfrac{2}{3} H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) \tag{9}\\
H(X \mid Y) &= \tfrac{2}{3} \text{ bits} \tag{10}
\end{align}
\begin{align}
H(Y \mid X) &= \sum_{i=0}^{1} p(X = i)\, H(Y \mid X = i) \tag{11}\\
H(Y \mid X) &= \tfrac{2}{3} H(Y \mid X = 0) + \tfrac{1}{3} H(Y \mid X = 1) \tag{12}\\
H(Y \mid X) &= \tfrac{2}{3} H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{3} H(0, 1) \tag{13}\\
H(Y \mid X) &= \tfrac{2}{3} \text{ bits} \tag{14}
\end{align}
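The conditional entropies admit the same kind of numerical check (a self-contained sketch, repeating the helper from the previous snippet):

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [[1/3, 1/3],
     [0.0, 1/3]]  # joint distribution p(x, y)
p_x = [sum(row) for row in p]
p_y = [sum(p[x][y] for x in range(2)) for y in range(2)]

# H(X|Y) = sum_y p(y) H(X | Y = y), and symmetrically for H(Y|X).
H_X_given_Y = sum(p_y[y] * H([p[x][y] / p_y[y] for x in range(2)]) for y in range(2))
H_Y_given_X = sum(p_x[x] * H([p[x][y] / p_x[x] for y in range(2)]) for x in range(2))
print(H_X_given_Y, H_Y_given_X)  # both ~0.667 bits = 2/3
```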
(c) H(X, Y ).
\begin{align}
H(X, Y) &= -\sum_{x=0}^{1}\sum_{y=0}^{1} p(x, y) \log_2 p(x, y) \tag{15}\\
H(X, Y) &= -3 \cdot \tfrac{1}{3} \log_2 \tfrac{1}{3} \tag{16}\\
H(X, Y) &= \log_2 3 \approx 1.585 \text{ bits} \tag{17}
\end{align}
[Figure 3: H(X, Y)]
(d) H(Y) − H(Y|X).

\begin{align}
H(Y) - H(Y \mid X) &= 0.918 - \tfrac{2}{3} \tag{18}\\
&\approx 0.2516 \text{ bits} \tag{19}
\end{align}
(e) I(X; Y ).
\begin{align}
I(X; Y) &= \sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\, p(y)} \tag{20}\\
I(X; Y) &= \tfrac{1}{3} \log_2 \frac{1/3}{(2/3)(1/3)} + \tfrac{1}{3} \log_2 \frac{1/3}{(2/3)(2/3)} + \tfrac{1}{3} \log_2 \frac{1/3}{(1/3)(2/3)} \tag{21}\\
I(X; Y) &\approx 0.2516 \text{ bits} \tag{22}
\end{align}
which coincides with the result of part (d), as expected from the identity I(X; Y) = H(Y) − H(Y|X).
[Figure 5: I(X; Y)]
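Finally, parts (c) through (e) can be tied together numerically and checked against the identities I(X; Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y) (a minimal sketch over the same joint table):

```python
import math

def H(probs):
    return -sum(q * math.log2(q) for q in probs if q > 0)

p = [[1/3, 1/3],
     [0.0, 1/3]]  # joint distribution p(x, y)
p_x = [sum(row) for row in p]
p_y = [sum(p[x][y] for x in range(2)) for y in range(2)]

H_XY = H([p[x][y] for x in range(2) for y in range(2)])         # ~1.585 bits
I_XY = sum(p[x][y] * math.log2(p[x][y] / (p_x[x] * p_y[y]))
           for x in range(2) for y in range(2) if p[x][y] > 0)   # ~0.2516 bits
print(H_XY, I_XY)
print(H(p_x) + H(p_y) - H_XY)  # equals I(X; Y), consistent with parts (d) and (e)
```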
Text
Run-length coding. Let X1, X2, . . . , Xn be binary random variables and let R = (R1, R2, . . .) be the run lengths of the sequence, in order of occurrence. Compare H(X1, X2, . . . , Xn), H(R), and H(Xn, R). Show all equalities and inequalities, and bound all the differences.
Solution:
Let us assume that one random variable Xj (0 < j ≤ n) is known. Then, if R is also known, H(Xj, R) carries the same information about uncertainty as H(X1, X2, . . . , Xn), since the whole sequence can be completely recovered from the knowledge of Xj and R. For example, with X5 = 1 and the run lengths R = (3, 2, 2, 1, 2), position 5 falls in the second run, so the runs must alternate 0s, 1s, 0s, 1s, 0s, and the original sequence is recovered as X = 0001100100.
As for H(Xj): the distribution of Xj is unknown, so it can be assumed that P(Xj = 0) = p and P(Xj = 1) = 1 − p; then

\[
H(X_j) = -p \log_2 p - (1 - p) \log_2 (1 - p) \leq 1,
\]

with the maximum attained at p = 1/2, i.e. max H(Xj) = 1 bit.
Then:

\[
H(X_1, X_2, \ldots, X_n) = H(X_n, R) = H(R) + H(X_n \mid R) \leq H(R) + H(X_n) \leq H(R) + 1.
\]

Considering the results obtained in problem 2.4 (entropy of functions of a random variable), we can also write H(R) ≤ H(X1, X2, . . . , Xn), because R is a function of (X1, X2, . . . , Xn). Hence 0 ≤ H(X1, X2, . . . , Xn) − H(R) ≤ 1.
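The key claim, namely that the whole sequence can be rebuilt from a single symbol Xj together with R, can be demonstrated with a short sketch (the helper names below are illustrative, not from the text):

```python
def run_lengths(bits):
    """Run lengths of a binary sequence, in order of occurrence."""
    runs = [1]
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            runs[-1] += 1
        else:
            runs.append(1)
    return runs

def recover(j, xj, runs):
    """Rebuild the full sequence from X_j (1-indexed, value xj) and the run lengths."""
    boundaries, total = [], 0
    for r in runs:
        total += r
        boundaries.append(total)
    run_of_j = next(i for i, b in enumerate(boundaries) if j <= b)
    first_symbol = xj if run_of_j % 2 == 0 else 1 - xj  # runs alternate symbols
    bits = []
    for i, r in enumerate(runs):
        bits.extend([first_symbol if i % 2 == 0 else 1 - first_symbol] * r)
    return bits

x = [0, 0, 0, 1, 1, 0, 0, 1, 0, 0]   # the example sequence from the text
R = run_lengths(x)                    # -> [3, 2, 2, 1, 2]
print(R, recover(5, x[4], R) == x)    # True: (X_5, R) determines the sequence
```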