Summary, Feb 5, 2024
In blue: references to the book Elements of Information Theory by T. M. Cover and J. A. Thomas (2nd edition).
That is,
(PX × PY)(x, y) = PX(x) PY(y)
Then, the mutual information between X and Y is defined as
I(X; Y ) ≜ D(PXY ∥PX × PY )
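→ A minimal numerical sketch of this definition (assuming a made-up 2 × 3 joint PMF and base-2 logarithms; the array values are purely illustrative):
```python
import numpy as np

# Toy joint PMF P_XY on a 2 x 3 alphabet (rows: x, columns: y); values are illustrative.
P_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.05, 0.30]])

P_x = P_xy.sum(axis=1, keepdims=True)   # marginal P_X, shape (2, 1)
P_y = P_xy.sum(axis=0, keepdims=True)   # marginal P_Y, shape (1, 3)
P_prod = P_x * P_y                      # product PMF (P_X x P_Y)(x, y) = P_X(x) P_Y(y)

# I(X;Y) = D(P_XY || P_X x P_Y) = sum_{x,y} P_XY(x,y) * log2(P_XY(x,y) / (P_X(x) P_Y(y)))
mask = P_xy > 0                         # the convention 0 log 0 = 0
I_xy = np.sum(P_xy[mask] * np.log2(P_xy[mask] / P_prod[mask]))
print(f"I(X;Y) = {I_xy:.4f} bits")
```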
Definition 7 (Conditional mutual information). Let X, Y, and Z be three random variables with joint PMF PXYZ taking
values in X × Y × Z. Then, the conditional mutual information between X and Y given Z is defined as
I(X; Y |Z) ≜ H(X|Z) − H(X|Y, Z)
→ We have
I(X; Y |Z) = H(X|Z) − H(X|Y, Z)
= H(Y |Z) − H(Y |X, Z)
" #
PXY |Z (X, Y |Z)
= E log
PX|Z (X|Z)PY |Z (Y |Z)
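→ As a sanity check, the two expressions for I(X; Y|Z) above can be compared numerically on a randomly generated toy PMF (a sketch, assuming base-2 logarithms and the identity H(A|B) = H(A, B) − H(B)):
```python
import numpy as np
rng = np.random.default_rng(0)

# Toy joint PMF P_XYZ on a 2 x 2 x 2 alphabet, drawn at random (illustrative only).
P = rng.random((2, 2, 2))
P /= P.sum()

def H(p):
    """Entropy in bits of a PMF given as an array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Marginals needed below; the axes of P are ordered (x, y, z).
P_xz = P.sum(axis=1)
P_yz = P.sum(axis=0)
P_z = P.sum(axis=(0, 1))

# Definition: I(X;Y|Z) = H(X|Z) - H(X|Y,Z), using H(A|B) = H(A,B) - H(B).
I_def = (H(P_xz) - H(P_z)) - (H(P) - H(P_yz))

# Expectation form: E[ log( P_{XY|Z} / (P_{X|Z} P_{Y|Z}) ) ].
ratio = (P * P_z[None, None, :]) / (P_xz[:, None, :] * P_yz[None, :, :])
I_exp = np.sum(P * np.log2(ratio))

print(I_def, I_exp)   # the two values agree up to floating-point error
```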
Theorem 2 (Theorem 2.5.2—“Chain rule for mutual information”). Let X, Y, and Z be three random variables with joint
PMF PXYZ taking values in X × Y × Z. Then
I(X, Y ; Z) = I(X; Z) + I(Y ; Z|X) = I(Y ; Z) + I(X; Z|Y )
→ The chain rule for mutual information can be generalized to n + 1 random variables:
I(X1, . . . , Xn; Y) = Σ_{k=1}^n I(Xk; Y | X1, . . . , Xk−1)
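→ A small numerical check of the chain rule I(X1, X2; Y) = I(X1; Y) + I(X2; Y|X1) (a sketch on a randomly generated toy PMF for (X1, X2, Y); names and values are illustrative only):
```python
import numpy as np
rng = np.random.default_rng(1)

# Toy joint PMF P_{X1 X2 Y} on a 2 x 2 x 2 alphabet; axes ordered (x1, x2, y).
P = rng.random((2, 2, 2))
P /= P.sum()

def H(p):
    """Entropy in bits of a PMF given as an array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

P_x1 = P.sum(axis=(1, 2))
P_y = P.sum(axis=(0, 1))
P_x1x2 = P.sum(axis=2)
P_x1y = P.sum(axis=1)

# Left-hand side: I(X1, X2; Y), treating (X1, X2) as a single random variable.
lhs = H(P_x1x2) + H(P_y) - H(P)

# Right-hand side: I(X1; Y) + I(X2; Y | X1),
# with I(X2; Y | X1) = H(X2|X1) + H(Y|X1) - H(X2, Y|X1) and H(A|B) = H(A,B) - H(B).
I_x1_y = H(P_x1) + H(P_y) - H(P_x1y)
I_x2_y_given_x1 = (H(P_x1x2) - H(P_x1)) + (H(P_x1y) - H(P_x1)) - (H(P) - H(P_x1))
rhs = I_x1_y + I_x2_y_given_x1

print(lhs, rhs)   # equal up to floating-point error
```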
→ If f has a second derivative that is nonnegative everywhere, then f is convex. If the second derivative is positive,
then f is strictly convex.
→ If f has a second derivative that is nonpositive everywhere, then f is concave. If the second derivative is negative, then
f is strictly concave.
→ The notions of convexity and concavity also apply to functions that do not have a second derivative. (Theorem 2.6.1)
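→ A quick numerical illustration of the second-derivative test (a sketch, assuming f(x) = x log x on x > 0, whose second derivative 1/x is strictly positive there):
```python
import numpy as np

# f(x) = x * log(x) on x > 0; its second derivative is 1/x, which is strictly positive.
f = lambda x: x * np.log(x)

x = np.linspace(0.1, 5.0, 50)
h = 1e-4
second_derivative = (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2   # central finite difference

# All estimates are positive, consistent with f being strictly convex on (0, inf).
print(np.all(second_derivative > 0))
```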
Theorem 3 (Theorem 2.6.2—“Jensen’s inequality”). If f is a convex function and X is a random variable, then
E[f (X)] ≥ f (E[X])
Moreover, if f is strictly convex, then E[f (X)] = f (E[X]) implies that X = E[X] with probability one.
→ Jensen’s inequality implies that, if f is a concave function and X is a random variable, then
E[f (X)] ≤ f (E[X])
Moreover, if f is strictly concave, then E[f (X)] = f (E[X]) implies that X = E[X] with probability one.
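→ A small numerical check of Jensen's inequality for a convex and a concave choice of f (a sketch with a made-up PMF for X; all values are illustrative):
```python
import numpy as np
rng = np.random.default_rng(2)

# Toy PMF for X over a few values (illustrative only).
values = np.array([0.5, 1.0, 2.0, 4.0])
probs = rng.random(4)
probs /= probs.sum()

f = lambda x: x ** 2        # a convex function
g = lambda x: np.log(x)     # a concave function

E_X = np.sum(probs * values)

print(np.sum(probs * f(values)) >= f(E_X))   # True: E[f(X)] >= f(E[X]) for convex f
print(np.sum(probs * g(values)) <= g(E_X))   # True: E[g(X)] <= g(E[X]) for concave g
```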
Theorem 4 (Theorem 2.6.3—“Information inequality”). Let P and Q be two PMFs taking values in X. Then
D(P ∥Q) ≥ 0
with equality if, and only if, P(x) = Q(x) for all x ∈ X.
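→ A minimal numerical illustration of the information inequality (a sketch with two made-up PMFs on a common 5-symbol alphabet; base-2 logarithms):
```python
import numpy as np
rng = np.random.default_rng(3)

def D(p, q):
    """Relative entropy D(P||Q) in bits; assumes Q(x) > 0 wherever P(x) > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Two toy PMFs on the same 5-symbol alphabet (illustrative only).
P = rng.random(5); P /= P.sum()
Q = rng.random(5); Q /= Q.sum()

print(D(P, Q))   # strictly positive here, since P != Q
print(D(P, P))   # 0.0: equality holds exactly when P = Q
```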