
ECE 562: Information Theory

Spring 2006

Lecture 1: January 26
Lecturer: Sergio D. Servetto Scribe: Roberto Torralbas

1.1 Shannon's Information Measures

Claude E. Shannon, in his paper "A Mathematical Theory of Communication," laid the foundation for the field of information theory. In it Shannon introduced two fundamental ideas about information: the first is that information is uncertainty, and the second is that information to be transmitted is digital. Shannon also proved a very important theorem known as the source coding theorem, which establishes entropy as the fundamental measure of information: entropy characterizes the minimum rate at which a source code can represent an information source essentially free of error. The information measures are entropy, conditional entropy, mutual information, and conditional mutual information. In the following sections these measures are introduced along with their properties; their physical meaning will be discussed in subsequent lectures.

1.2 Entropy

Definition 1.1. The entropy H(X) of a random variable X is defined by:

H(X) = -\sum_{x} p(x) \log_a p(x)    (1.1)

Entropy is the first measure of information: the average amount of uncertainty about the outcome of a random variable. In the above definition p(x) is the pmf of the random variable X. In subsequent lectures it will be shown that a, the base of the logarithm, is usually taken to be the size of the code alphabet. This definition can be written in a different form using the expectation of a random variable. Let g : \mathcal{X} \to \mathbb{R}, where X is a random variable on \mathcal{X} (here \mathcal{X} = \{1, \ldots, n\}). Then the expectation of Y = g(X) is:

E(Y) = E(g(X)) = \sum_{x} p(x) g(x)    (1.2)

Therefore we can rewrite the entropy as:

H(X) = -E(\log_a p(X))    (1.3)
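As a quick numerical illustration (not part of the original notes; the function name and example pmfs are our own choices), a short Python sketch of Definition 1.1 with a = 2:

```python
import math

def entropy(pmf, base=2):
    """H(X) = -sum_x p(x) log_a p(x); terms with p(x) = 0 contribute nothing."""
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

# A uniform pmf over four outcomes has entropy log2(4) = 2 bits.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # ≈ 2.0
print(entropy([0.5, 0.25, 0.125, 0.125]))  # ≈ 1.75
```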


Example 1.1. Take 0 ≤ p ≤ 1 and define the binary entropy function h_b(p) (the entropy of a random variable taking two values with probabilities p and 1 - p) as follows:

h_b(p) = \begin{cases} -p \log_2 p - (1 - p) \log_2 (1 - p), & 0 < p < 1 \\ 0, & \text{otherwise.} \end{cases}    (1.4)

At what value of p will the entropy reach a maximum?

Figure 1.1. Binary Entropy

Notice that uncertainty reflects how hard the outcome is to predict: h_b(p) reaches its maximum of 1 bit at p = 1/2, which corresponds to a fair coin toss.
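A small Python sketch (our own illustration; the function name and the grid search are assumptions, not part of the notes) that evaluates h_b(p) on a grid and confirms the maximum of 1 bit at p = 1/2:

```python
import math

def binary_entropy(p):
    """h_b(p) = -p log2 p - (1 - p) log2 (1 - p), with h_b(0) = h_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Evaluate on a grid of p values; the maximum of 1 bit is attained at p = 0.5.
grid = [i / 100 for i in range(101)]
p_star = max(grid, key=binary_entropy)
print(p_star, binary_entropy(p_star))  # 0.5 1.0
```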

1.3 The Joint Entropy

Definition 1.2. The joint entropy H(X, Y) of a pair of random variables X and Y is defined by

H(X, Y) = -\sum_{x, y} p(x, y) \log_a p(x, y) = -E(\log_a p(X, Y)).    (1.5)

The joint entropy measures the total amount of uncertainty contained in the pair of random variables (X, Y).
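As a hedged illustration of Definition 1.2 (the helper name `joint_entropy` and the dictionary encoding of the pmf are our own), the sketch below computes H(X, Y) for the joint pmf that reappears in Example 1.2 later in these notes:

```python
import math

def joint_entropy(joint_pmf, base=2):
    """H(X, Y) = -sum_{x,y} p(x, y) log_a p(x, y), for a dict {(x, y): prob}."""
    return -sum(p * math.log(p, base) for p in joint_pmf.values() if p > 0)

# Joint pmf from Example 1.2 below: p(0, a) = p(0, b) = 1/4, p(1, a) = 1/2.
pxy = {(0, 'a'): 0.25, (0, 'b'): 0.25, (1, 'a'): 0.5, (1, 'b'): 0.0}
print(joint_entropy(pxy))  # ≈ 1.5 bits
```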


1.4 The Conditional Entropy

Definition 1.3. The conditional entropy H(X | Y) of a pair of random variables X and Y is defined by

H(X | Y) = -\sum_{x, y} p(x, y) \log_a p(x | y) = -E(\log_a p(X | Y)).    (1.6)

Recall that p(x | y) = p(x, y) / p(y); then we have that:

H(X | Y) = -\sum_{x, y} p(x, y) \log_a p(x | y)
         = -\sum_{y} \sum_{x} p(y) p(x | y) \log_a p(x | y)
         = \sum_{y} p(y) \left( -\sum_{x} p(x | y) \log_a p(x | y) \right)
         = \sum_{y} p(y) H(X | Y = y)

In other words, H(X | Y) = E_Y(H(X | Y = y)).

Example 1.2. Given the table of joint probabilities p(x, y) below, calculate H(Y | X = 0), H(Y | X = 1), and H(Y | X).

x \ y    a      b
  0     1/4    1/4
  1     1/2     0

If X = 0 we are back to the binary entropy discussed previously, so H(Y | X = 0) = 1, since the probability of event a equals that of event b. Similarly, H(Y | X = 1) = 0, since there is no uncertainty: event b never occurs. Finally, for H(Y | X) we get:

H(Y | X) = P(X = 0) H(Y | X = 0) + P(X = 1) H(Y | X = 1) = (1/2)(1) + (1/2)(0) = 1/2

There are some special relations among the entropy, joint entropy, and conditional entropy of two random variables:

H(X, Y) = H(X) + H(Y | X) = H(Y) + H(X | Y)
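The following Python sketch (our own numerical check, not part of the notes; the helper `H` and the dictionary encoding are assumptions) reproduces the numbers in Example 1.2 and verifies the chain rule stated above and proved below:

```python
import math

def H(pmf):
    """Entropy in bits of a list of probabilities; zero entries are skipped."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Joint pmf p(x, y) from the table in Example 1.2.
pxy = {(0, 'a'): 0.25, (0, 'b'): 0.25, (1, 'a'): 0.5, (1, 'b'): 0.0}
px = {x: pxy[(x, 'a')] + pxy[(x, 'b')] for x in (0, 1)}  # marginal p(x)

# H(Y | X) = sum_x p(x) H(Y | X = x), computed from the conditional pmfs p(y | x).
HY_given_X = sum(px[x] * H([pxy[(x, y)] / px[x] for y in ('a', 'b')]) for x in (0, 1))

print(HY_given_X)                          # ≈ 0.5, matching the example
print(H(list(pxy.values())))               # H(X, Y) ≈ 1.5
print(H(list(px.values())) + HY_given_X)   # chain rule: H(X) + H(Y | X) ≈ 1.5
```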


Proof:

H(X, Y) = -E(\log_a p(X, Y))
        = -E(\log_a p(X) p(Y | X))
        = -E(\log_a p(X)) - E(\log_a p(Y | X))
        = H(X) + H(Y | X)

The identity H(X, Y) = H(Y) + H(X | Y) follows by the same argument with the roles of X and Y exchanged.

1.5 Mutual Information

Definition 1.4. The mutual information I(X, Y) of a pair of random variables X and Y is defined by:

I(X, Y) = \sum_{x, y} p(x, y) \log_a \frac{p(x, y)}{p(x) p(y)} = E\left(\log_a \frac{p(X, Y)}{p(X) p(Y)}\right).    (1.7)

- Notice that if X and Y are independent, then the mutual information between the two random variables is zero (I(X, Y) = 0 for X \perp Y).
- If P(X = Y) = 1, then I(X, Y) = H(X) = H(Y). Proof:

I(X, X) = E\left(\log_a \frac{p(X)}{p(X) p(X)}\right) = E\left(\log_a \frac{1}{p(X)}\right) = H(X)

Finally, I(X, Y) has the following properties:

I(X, Y) = H(X) - H(X | Y) = H(Y) - H(Y | X) = H(X) + H(Y) - H(X, Y)
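As one more hedged numerical check (our own addition, reusing the joint pmf of Example 1.2 and our illustrative helper `H`), the sketch below computes I(X, Y) directly from (1.7) and confirms it equals H(X) + H(Y) - H(X, Y):

```python
import math

def H(pmf):
    """Entropy in bits of a list of probabilities; zero entries are skipped."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Joint and marginal pmfs from Example 1.2 (zero-probability pairs omitted).
pxy = {(0, 'a'): 0.25, (0, 'b'): 0.25, (1, 'a'): 0.5}
px = {0: 0.5, 1: 0.5}
py = {'a': 0.75, 'b': 0.25}

# I(X, Y) = sum_{x,y} p(x, y) log2( p(x, y) / (p(x) p(y)) ), as in (1.7).
I = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

print(I)                                                                     # ≈ 0.311
print(H(list(px.values())) + H(list(py.values())) - H(list(pxy.values())))   # same value
```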


Figure 1.2. Venn Diagram Relating the Mutual Information, Entropy, and Conditional Entropy of Two Random Variables
