
ECE 562: Information Theory

Spring 2006

Lecture 1: January 26
Lecturer: Sergio D. Servetto Scribe: Roberto Torralbas

1.1 Shannon's Information Measures

Claude E. Shannon, in his paper "A Mathematical Theory of Communication," laid the foundation for the field of information theory. In it Shannon introduced two fundamental ideas about information: the first is that information is uncertainty, and the second is that information to be transmitted is digital. Shannon also proved a very important theorem known as the source coding theorem, which establishes entropy as the fundamental measure of information: entropy characterizes the minimum rate at which a source code can represent an information source essentially free of error. The information measures are entropy, conditional entropy, mutual information, and conditional mutual information. In the following sections these measures are introduced along with their properties; their physical meaning will be discussed in subsequent lectures.

1.2 Entropy

Definition 1.1. The entropy H(X) of a random variable X is defined by:

H(X) = -\sum_{x} p(x) \log_a p(x)    (1.1)

Entropy is the first measure of information: the average amount of uncertainty about the outcome of a random variable. In the above definition p(x) is the pmf of the random variable X. In subsequent lectures it will be shown that a, the base of the logarithm, is usually taken to be the size of the code alphabet. This definition can be written in a different form using the expectation of a random variable. Let g : \mathcal{X} \to \mathbb{R}, where X is a random variable on \mathcal{X} (here \mathcal{X} = \{1, \ldots, n\}). Then the expectation of Y = g(X) is:

E(Y) = E(g(X)) = \sum_{x} p(x) g(x)    (1.2)

Therefore we can rewrite the entropy as:

H(X) = -E(\log_a p(X))    (1.3)
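As a quick numerical illustration (not part of the original notes; the function name and example pmfs are our own choices), a short Python sketch of Definition 1.1 with a = 2:

```python
import math

def entropy(pmf, base=2):
    """H(X) = -sum_x p(x) log_a p(x); terms with p(x) = 0 contribute nothing."""
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

# A uniform pmf over four outcomes has entropy log2(4) = 2 bits.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # ≈ 2.0
print(entropy([0.5, 0.25, 0.125, 0.125]))  # ≈ 1.75
```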


Example 1.1. Take 0 ≤ p ≤ 1 and define the binary entropy function h_b(p) (the entropy of a random variable taking two values with probabilities p and 1 - p) as follows:

h_b(p) = \begin{cases} -p \log_2 p - (1 - p) \log_2 (1 - p), & 0 < p < 1 \\ 0, & \text{otherwise.} \end{cases}    (1.4)

At what value of p will the entropy reach a maximum?

Figure 1.1. Binary Entropy

Notice that uncertainty reflects how hard the outcome is to predict: h_b(p) reaches its maximum of 1 bit at p = 1/2, which corresponds to a fair coin toss.
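A small Python sketch (our own illustration; the function name and the grid search are assumptions, not part of the notes) that evaluates h_b(p) on a grid and confirms the maximum of 1 bit at p = 1/2:

```python
import math

def binary_entropy(p):
    """h_b(p) = -p log2 p - (1 - p) log2 (1 - p), with h_b(0) = h_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Evaluate on a grid of p values; the maximum of 1 bit is attained at p = 0.5.
grid = [i / 100 for i in range(101)]
p_star = max(grid, key=binary_entropy)
print(p_star, binary_entropy(p_star))  # 0.5 1.0
```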

1.3 The Joint Entropy

Definition 1.2. The joint entropy H(X, Y) of a pair of random variables X and Y is defined by

H(X, Y) = -\sum_{x, y} p(x, y) \log_a p(x, y) = -E(\log_a p(X, Y)).    (1.5)

The joint entropy measures the total amount of uncertainty contained in the pair of random variables (X, Y).
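As a hedged illustration of Definition 1.2 (the helper name `joint_entropy` and the dictionary encoding of the pmf are our own), the sketch below computes H(X, Y) for the joint pmf that reappears in Example 1.2 later in these notes:

```python
import math

def joint_entropy(joint_pmf, base=2):
    """H(X, Y) = -sum_{x,y} p(x, y) log_a p(x, y), for a dict {(x, y): prob}."""
    return -sum(p * math.log(p, base) for p in joint_pmf.values() if p > 0)

# Joint pmf from Example 1.2 below: p(0, a) = p(0, b) = 1/4, p(1, a) = 1/2.
pxy = {(0, 'a'): 0.25, (0, 'b'): 0.25, (1, 'a'): 0.5, (1, 'b'): 0.0}
print(joint_entropy(pxy))  # ≈ 1.5 bits
```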


1.4 The Conditional Entropy

Definition 1.3. The conditional entropy H(X | Y) of a pair of random variables X and Y is defined by

H(X | Y) = -\sum_{x, y} p(x, y) \log_a p(x | y) = -E(\log_a p(X | Y)).    (1.6)

Recall that p(x | y) = p(x, y) / p(y); then we have that:

H(X | Y) = -\sum_{x, y} p(x, y) \log_a p(x | y)
         = -\sum_{y} \sum_{x} p(y) p(x | y) \log_a p(x | y)
         = \sum_{y} p(y) \left( -\sum_{x} p(x | y) \log_a p(x | y) \right)
         = \sum_{y} p(y) H(X | Y = y)

In other words, H(X | Y) = E_Y(H(X | Y = y)).

Example 1.2. Given the table of joint probabilities p(x, y) below, calculate H(Y | X = 0), H(Y | X = 1), and H(Y | X).

x \ y    a      b
  0     1/4    1/4
  1     1/2     0

If X = 0 we are back to the binary entropy discussed previously, so H(Y | X = 0) = 1, since the probability of event a equals that of event b. Similarly, H(Y | X = 1) = 0, since there is no uncertainty: event b never occurs. Finally, for H(Y | X) we get:

H(Y | X) = P(X = 0) H(Y | X = 0) + P(X = 1) H(Y | X = 1) = (1/2)(1) + (1/2)(0) = 1/2

There are some special relations among the entropy, joint entropy, and conditional entropy of two random variables:

H(X, Y) = H(X) + H(Y | X) = H(Y) + H(X | Y)
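The following Python sketch (our own numerical check, not part of the notes; the helper `H` and the dictionary encoding are assumptions) reproduces the numbers in Example 1.2 and verifies the chain rule stated above and proved below:

```python
import math

def H(pmf):
    """Entropy in bits of a list of probabilities; zero entries are skipped."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Joint pmf p(x, y) from the table in Example 1.2.
pxy = {(0, 'a'): 0.25, (0, 'b'): 0.25, (1, 'a'): 0.5, (1, 'b'): 0.0}
px = {x: pxy[(x, 'a')] + pxy[(x, 'b')] for x in (0, 1)}  # marginal p(x)

# H(Y | X) = sum_x p(x) H(Y | X = x), computed from the conditional pmfs p(y | x).
HY_given_X = sum(px[x] * H([pxy[(x, y)] / px[x] for y in ('a', 'b')]) for x in (0, 1))

print(HY_given_X)                          # ≈ 0.5, matching the example
print(H(list(pxy.values())))               # H(X, Y) ≈ 1.5
print(H(list(px.values())) + HY_given_X)   # chain rule: H(X) + H(Y | X) ≈ 1.5
```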


Proof:

H(X, Y) = -E(\log_a p(X, Y))
        = -E(\log_a p(X) p(Y | X))
        = -E(\log_a p(X)) - E(\log_a p(Y | X))
        = H(X) + H(Y | X)

The identity H(X, Y) = H(Y) + H(X | Y) follows by the same argument with the roles of X and Y exchanged.

1.5 Mutual Information

Definition 1.4. The mutual information I(X, Y) of a pair of random variables X and Y is defined by:

I(X, Y) = \sum_{x, y} p(x, y) \log_a \frac{p(x, y)}{p(x) p(y)} = E\left(\log_a \frac{p(X, Y)}{p(X) p(Y)}\right).    (1.7)

- Notice that if X and Y are independent, then the mutual information between the two random variables is zero (I(X, Y) = 0 for X \perp Y).
- If P(X = Y) = 1, then I(X, Y) = H(X) = H(Y). Proof:

I(X, X) = E\left(\log_a \frac{p(X)}{p(X) p(X)}\right) = E\left(\log_a \frac{1}{p(X)}\right) = H(X)

Finally, I(X, Y) has the following properties:

I(X, Y) = H(X) - H(X | Y) = H(Y) - H(Y | X) = H(X) + H(Y) - H(X, Y)
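As one more hedged numerical check (our own addition, reusing the joint pmf of Example 1.2 and our illustrative helper `H`), the sketch below computes I(X, Y) directly from (1.7) and confirms it equals H(X) + H(Y) - H(X, Y):

```python
import math

def H(pmf):
    """Entropy in bits of a list of probabilities; zero entries are skipped."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Joint and marginal pmfs from Example 1.2 (zero-probability pairs omitted).
pxy = {(0, 'a'): 0.25, (0, 'b'): 0.25, (1, 'a'): 0.5}
px = {0: 0.5, 1: 0.5}
py = {'a': 0.75, 'b': 0.25}

# I(X, Y) = sum_{x,y} p(x, y) log2( p(x, y) / (p(x) p(y)) ), as in (1.7).
I = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

print(I)                                                                     # ≈ 0.311
print(H(list(px.values())) + H(list(py.values())) - H(list(pxy.values())))   # same value
```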


Figure 1.2. Venn Diagram Relating the Mutual Information, Entropy, and Conditional Entropy of Two Random Variables
