Capacity, Mutual Information, and Coding For Finite-State Markov Channels
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 42, NO. 3, MAY 1996
Abstract—The Finite-State Markov Channel (FSMC) is a discrete time-varying channel whose variation is determined by a finite-state Markov process. These channels have memory due to the Markov channel variation. We obtain the FSMC capacity as a function of the conditional channel state probability. We also show that for i.i.d. channel inputs, this conditional probability converges weakly, and the channel's mutual information is then a closed-form continuous function of the input distribution. We next consider coding for FSMC's. In general, the complexity of maximum-likelihood decoding grows exponentially with the channel memory length. Therefore, in practice, interleaving and memoryless channel codes are used. This technique results in some performance loss relative to the inherent capacity of channels with memory. We propose a maximum-likelihood decision-feedback decoder with complexity that is independent of the channel memory. We calculate the capacity and cutoff rate of our technique, and show that it preserves the capacity of certain FSMC's. We also compare the performance of the decision-feedback decoder with that of interleaving and memoryless channel coding on a fading channel with 4PSK modulation.

Index Terms—Finite-state Markov channels, capacity, mutual information, decision-feedback maximum-likelihood decoding.

I. INTRODUCTION

THIS PAPER extends the capacity and coding results of Mushkin and Bar-David [1] for the Gilbert-Elliot channel to a more general time-varying channel model. The Gilbert-Elliot channel is a stationary two-state Markov chain, where each state is a binary-symmetric channel (BSC), as in Fig. 1. The transition probabilities between states are $g$ and $b$, respectively, and the crossover probabilities for the "good" and "bad" BSC's are $p_G$ and $p_B$, respectively, where $p_G < p_B$. Let $x_n \in \{0,1\}$, $y_n \in \{0,1\}$, and $z_n = x_n \oplus y_n$ denote, respectively, the channel input, channel output, and channel error on the $n$th transmission. In [1], the capacity of the Gilbert-Elliot channel is derived as

$$C = 1 - \lim_{n \to \infty} E[h(q_n)] \tag{1}$$

where $h$ is the entropy function, $q_n = p(z_n = 1 \mid z^{n-1})$, $q_n$ converges to $q_\infty$ in distribution, and $q_\infty$ is independent of the initial channel state.

In this paper we derive the capacity of a more general finite-state Markov channel, where the channel states are not necessarily BSC's. We model the channel as a Markov chain $S_n$ which takes values in a finite state space $C$ of memoryless channels with finite input and output alphabets. The conditional input/output probability is thus $p(y_n \mid x_n, S_n)$, where $x_n$ and $y_n$ denote the channel input and output, respectively. The channel transition probabilities are independent of the input, so our model does not include ISI channels. We refer to the channel model as a finite-state Markov channel (FSMC). If the transmitter and receiver have perfect state information, then the capacity of the FSMC is just the statistical average over all states of the corresponding channel capacity [2]. On the other hand, with no information about the channel state or its transition structure, capacity is reduced to that of the Arbitrarily Varying Channel [3]. We consider the intermediate case, where the channel transition structure of the FSMC is known.

The memory of the FSMC comes from the dependence of the current channel state on past inputs and outputs. As a result, the entropy in the channel output is a function of the channel state conditioned on all past outputs. Similarly, the conditional output entropy given the input is determined by the channel state probability conditioned on all past inputs and outputs. We use this fact to obtain a formula for channel capacity in terms of these conditional probabilities. Our formula can be computed recursively, which significantly reduces its computation complexity. We also show that when the channel inputs are i.i.d., these conditional state probabilities converge in distribution, and their limit distributions are continuous functions of the input distribution. Thus for any i.i.d. input distribution $\theta$, the mutual information of the FSMC is a closed-form continuous function of $\theta$. This continuity allows us to find $I_{\rm i.i.d.}$, the maximum mutual information relative to all i.i.d. input distributions, using straightforward maximization techniques. Since $I_{\rm i.i.d.} \le C$, our result provides a simple lower bound for the capacity of general FSMC's.

The Gilbert-Elliot channel has two features which facilitate its capacity analysis: its conditional entropy $H(Y^n \mid X^n)$ is independent of the input distribution, and it is a symmetric channel, so a uniform input distribution induces a uniform output distribution. We extend these properties to a general class of FSMC's and show that for this class, $I_{\rm i.i.d.}$ equals the channel capacity. This class includes channels varying between

Manuscript received February 18, 1994; revised September 15, 1995. This work was supported in part by an IBM graduate fellowship, and in part by the PATH program, Institute of Transportation Studies, University of California, Berkeley. The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Trondheim, Norway, June 1994.
A. J. Goldsmith is with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125 USA.
P. P. Varaiya is with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720 USA.
Publisher Item Identifier S 0018-9448(96)02935-5.

0018-9448/96$05.00 © 1996 IEEE
GOLDSMITH AND VARAIYA: CAPACITY, MUTUAL INFORMATION, AND CODING FOR MARKOV CHANNELS 869

Fig. 1. Gilbert-Elliot channel.
any finite number of BSC's, as well as quantized additive white noise (AWN) channels with symmetric PSK inputs and time-varying noise statistics or amplitude fading.

In principle, communication over a finite-state channel is possible at any rate below the channel capacity. However, good maximum-likelihood (ML) coding strategies for channels with memory are difficult to determine, and the decoder complexity grows exponentially with memory length. Thus a common strategy for channels with memory is to disperse the memory using an interleaver: if the span of the interleaver is long, then the cascade of the interleaver, channel, and deinterleaver can be considered memoryless, and coding techniques for memoryless channels may be used [4]. However, this cascaded channel has a lower inherent Shannon capacity than the original channel, since coding is restricted to memoryless channel codes.

The complexity of ML decoding can be reduced significantly without this capacity degradation by implementing a decision-feedback decoder, which consists of a recursive estimator for the channel state distribution conditioned on past inputs and outputs, followed by an ML decoder. We will see that the estimate $\pi_n = p(S_n \mid x^{n-1}, y^{n-1})$ is a sufficient statistic for the ML decoder input, given all past inputs and outputs. Thus the ML decoder operates on a memoryless system. The only additional complexity of this approach over the conventional method of interleaving and memoryless channel encoding is the recursive calculation of $\pi_n$. We will calculate the capacity penalty of the decision-feedback decoder for general FSMC's (ignoring error propagation), and show that this penalty vanishes for a certain class of FSMC's.

The most common example of an FSMC is a correlated fading channel. In [5], an FSMC model for Rayleigh fading is proposed, where the channel state varies over binary-symmetric channels with different crossover probabilities. Our recursive capacity formula is a generalization of the capacity found in [5], and we also prove the convergence of their recursive algorithm. Since capacity is generally unachievable for any practical coding scheme, the channel cutoff rate indicates the practical achievable information rate of a channel with coding. The cutoff rate for correlated fading channels with MPSK inputs, assuming channel state information at the receiver, was obtained in [6]; we obtain the same cutoff rate on this channel using decision-feedback decoding.

Most coding techniques for fading channels rely on built-in time diversity in the code to mitigate the fading effect. Code designs of this type can be found in [7]-[9] and the references therein. These codes use the same time-diversity idea as interleaving and memoryless channel encoding, except that the diversity is implemented with the code metric instead of the interleaver. Thus as with interleaving and memoryless channel encoding, channel correlation information is ignored with these coding schemes. Maximum-likelihood sequence estimation for fading channels without coding has been examined in [10], [11]. However, it is difficult to implement coding with these schemes due to the code delays. In our scheme, coding delays do not result in state decision delays, since the decisions are based on estimates of the coded bits. We can introduce coding in our decision-feedback scheme with a consequent increase in delay and complexity, as we will discuss in Section VI.

The remainder of the paper is organized as follows. In Section II we define the FSMC, and obtain some properties of the channel based on this definition. In Section III we derive a recursive relationship for the distribution of the channel state conditioned on past inputs and outputs, or on past outputs alone. We also show these conditional state distributions converge to limit distributions for i.i.d. channel inputs. In Section IV we obtain the capacity of the FSMC in terms of the conditional state distributions, and obtain a simple formula for $I_{\rm i.i.d.}$. Uniformly symmetric variable-noise FSMC's are defined in Section V. For this channel class (which includes the Gilbert-Elliot channel), capacity is achieved with uniform i.i.d. channel inputs. In Section VI we present the decision-feedback decoder, and obtain the capacity and cutoff rate penalties of the decision-feedback decoding scheme. These penalties vanish for uniformly symmetric variable-noise channels. Numerical results for the capacity and cutoff rate of a two-state variable-noise channel with 4PSK modulation and decision-feedback decoding are presented in Section VII.

II. CHANNEL MODEL

Let $S_n$ be the state at time $n$ of an irreducible, aperiodic, stationary Markov chain with state space $C = \{c_1, \cdots, c_K\}$. $S_n$ is positive recurrent and ergodic. The state space $C$ corresponds to $K$ different discrete memoryless channels (DMC's), with common finite input and output alphabets denoted by $X$ and $Y$, respectively. Let $P$ be the matrix of transition probabilities for $S$, so

$$P_{k,m} = p(S_{n+1} = c_m \mid S_n = c_k) \tag{2}$$

independent of $n$ by stationarity. We denote the input and output of the FSMC at time $n$ by $x_n$ and $y_n$, respectively, and we assume that the channel inputs are independent of its states. We will use the notation

$$x^n = (x_1, \cdots, x_n).$$
III. CONDITIONAL STATE DISTRIBUTION
The conditional channel state distribution is the key to
determining the capacity of the FSMC through a recursive
algorithm. It is also a sufficient statistic for the input given all
past inputs and outputs, thus allowing for the reduced complex-
ity of the decision-feedback decoder. In this section we show
that the state distribution conditioned on past input/output pairs
can be calculated using a recursive formula. A similar formula
is derived for the state distribution conditioned on past outputs
alone, under the assumption of independent channel inputs. We
also show that these state distributions converge weakly under
i.i.d. inputs, and the resulting limit distributions are continuous
functions of the input distribution.
We denote these conditional state distributions by the $K$-dimensional random vectors $\pi_n = (\pi_n(1), \cdots, \pi_n(K))$ and $\rho_n = (\rho_n(1), \cdots, \rho_n(K))$, where $\pi_n(k) = p(S_n = c_k \mid x^{n-1}, y^{n-1})$ and $\rho_n(k) = p(S_n = c_k \mid y^{n-1})$. For independent inputs, writing the one-step recursion as $\rho_{n+1} = f(y_n, \rho_n)$, the transition probabilities of $\rho_n$ are

$$p(\rho_{n+1} = \alpha \mid \rho_n = \rho) = \sum_{y_n \in Y :\, f(y_n, \rho) = \alpha} p(y_n \mid \rho_n = \rho).$$
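The recursive calculation described in this section (derived as (57)-(59) in Appendix I) can be sketched as follows. This is an illustrative implementation under assumed parameters: `emit[k, x, y]` stands for $p(y_n = y \mid S_n = c_k, x_n = x)$, and the two-state BSC numbers are hypothetical.

```python
import numpy as np

def update_pi(pi, x, y, P, emit):
    """One step of the conditional state-distribution recursion:
    Bayes update of p(S_n | past) with the new (x_n, y_n) pair,
    then propagation through the state transition matrix P."""
    post = emit[:, x, y] * pi        # p(y_n | S_n = c_k, x_n) p(S_n = c_k | past)
    post = post / post.sum()         # normalize: p(S_n = c_k | x^n, y^n)
    return post @ P                  # p(S_{n+1} = c_k | x^n, y^n)

# hypothetical two-state BSC example: emit[k, x, y] = p(y | S_n = c_k, x_n = x)
P = np.array([[0.9, 0.1], [0.1, 0.9]])
p_err = np.array([0.01, 0.3])
emit = np.zeros((2, 2, 2))
for k in range(2):
    for x in range(2):
        emit[k, x, x] = 1 - p_err[k]
        emit[k, x, 1 - x] = p_err[k]

pi = np.array([0.5, 0.5])                     # uninformative initial estimate
for x, y in [(0, 0), (1, 1), (0, 1)]:         # a short input/output record
    pi = update_pi(pi, x, y, P, emit)
print(pi)                                     # current belief about the state
```

Each update costs only $O(K^2)$ operations, which is the source of the reduced decoder complexity discussed above.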
lirn ~ [ f ( 7 r ~ =
) ] lim E [ ~ ( T ; ) ]
n-oo n-m
and
where
and
r K 1
and
P: = p ( S n I F1,
SO = ea).
This convergence allows us to obtain a closed-form solution Using this lemma in (19) and (20) and substituting into
for the mutual information under i.i.d. inputs. We also show (IS) yields the following theorem.
in Lemmas A2.3 and A2.5 of Appendix I1 that the limit Theorem 4.1: The capacity of the FSMC is given by
distributions for 7r and p are continuous functions of the input
distribution. 1
C = lim max -
Lemmas A2.6 and A2.7 of Appendix I1 show the surprising "-00 P(X-) n
result that 7rn and pn are not necessarily Markov chains when n r r K
the input distribution is Markov. Since the weak convergence
of 7rn and prl requires this Markov property, (15) and (16) are i=l L L k=l J
not valid for general Markov inputs.
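Theorem 4.1 can be turned into a simple Monte Carlo estimate of the i.i.d.-input mutual information for a concrete channel. The sketch below is our illustration, not the authors' procedure: it assumes a two-state BSC FSMC with uniform i.i.d. inputs, for which the states are output-symmetric so $H(Y_i \mid \rho_i) = 1$ bit, and it time-averages $H(Y_i \mid X_i, \pi_i) = h(\langle \pi_i, p_{\rm err} \rangle)$ along one long simulated trajectory, invoking the ergodic behavior of the $\pi_n$ chain.

```python
import numpy as np

rng = np.random.default_rng(1)

P = np.array([[0.9, 0.1], [0.1, 0.9]])   # assumed state transition matrix
p_err = np.array([0.01, 0.3])            # assumed BSC crossover per state

def h2(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

n = 50000
s = int(rng.integers(0, 2))              # true channel state
pi = np.array([0.5, 0.5])                # pi_n = p(S_n | x^{n-1}, y^{n-1})
acc = 0.0
for _ in range(n):
    # for a mixture of BSC's, H(Y_i | X_i, pi_i) = h2(<pi_i, p_err>)
    acc += h2(pi @ p_err)
    # one channel use, then the recursive pi update (the input cancels
    # out of the BSC error event, so only the error e is generated)
    e = int(rng.random() < p_err[s])
    like = np.where(e == 1, p_err, 1 - p_err)   # p(y_n | S_n = c_k, x_n)
    post = like * pi
    pi = (post / post.sum()) @ P
    s = int(rng.choice(2, p=P[s]))

# H(Y_i | rho_i) = 1 bit here (output-symmetric states, uniform inputs),
# so the sum in Theorem 4.1 reduces to this estimate of I_{i.i.d.} in bits:
print(1.0 - acc / n)
```

The estimate exceeds the memoryless-channel value $1 - h(\langle \bar\pi, p_{\rm err}\rangle)$ obtained by ignoring the state memory, illustrating the capacity advantage of tracking $\pi_n$.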
This formula is easier to calculate than Gallager's formula (17), since the $\pi_n$ terms can be computed recursively. The recursive calculation for $\rho_i$ requires independent inputs. However, for many channels of interest $H(Y_n \mid \rho_n)$ will be a constant independent of the input distribution (such channels are discussed in Section V). For these channels, the capacity calculation reduces to minimizing the second term in (23) relative to the input distribution, and the complexity of this minimization is greatly reduced when $\pi_i$ can be calculated easily.

Using Lemma 4.1, we can also express the capacity as
Lemma 4.2: When the channel inputs are stationary

If we fix $\theta \in P(X)$

Lemma 4.3: For i.i.d. input distributions, the following limits exist and are equal:

$$\lim_{n \to \infty} H(Y_n \mid X_n, X^{n-1}, Y^{n-1})$$

Proof: From (18)

$$H(Y^n \mid X^n) = \sum_{i=1}^{n} H(Y_i \mid X_i, Y^{i-1}, X^{i-1}) \tag{31}$$

by (20), and the terms of the summation are nonnegative and monotonically decreasing in $i$ by Lemma 4.2. Thus
V. UNIFORMLY SYMMETRIC VARIABLE-NOISE CHANNELS

In this section we define two classes of FSMC's: uniformly symmetric channels and variable-noise channels. The mutual information and capacity of these channel classes have additional properties which we outline in the lemmas below. Moreover, we will show in the next section that the decision-feedback decoder achieves capacity for uniformly symmetric variable-noise FSMC's.

Definition: For a DMC, let $M$ denote the matrix of input/output probabilities

$$M_{ij} = p(y = j \mid x = i), \qquad j \in Y,\ i \in X.$$

A discrete memoryless channel is output-symmetric if the rows of $M$ are permutations of each other, and the columns of $M$ are permutations of each other.²

Definition: An FSMC is uniformly symmetric if every channel $c_k \in C$ is output-symmetric.

The next lemma, proved in Appendix V, shows that for uniformly symmetric FSMC's, the conditional output entropy is maximized with uniform i.i.d. inputs.

Lemma 5.1: For uniformly symmetric FSMC's and any initial state $S_0 = c_i$, $H(Y_n \mid \rho_n)$, $H(Y_n \mid \rho_n^i)$, $H(Y_n \mid \pi_n)$, and $H(Y_n \mid \pi_n^i)$ are all maximized for a uniform and i.i.d. input distribution, and these maximum values equal $\log |Y|$.

Lemma 5.2: For uniformly symmetric variable-noise FSMC's and all $i$, $H(Y_n \mid X_n, \pi_n)$ and $H(Y_n \mid X_n, \pi_n^i)$ do not depend on the input distribution.

Definition: Let $X_n$ and $Y_n$ denote the input and output, respectively, of an FSMC. We say that an FSMC is a variable-noise channel if there exists a function $\phi$ such that for $Z_n = \phi(X_n, Y_n)$, $p(Z^n \mid X^n) = p(Z^n)$, and $Z^n$ is a sufficient statistic for $S^n$ (so $S^n$ is independent of $X^n$ and $Y^n$ given $Z^n$). Typically, $\phi$ is associated with an additive noise channel, as we discuss in more detail below.

If $Z^n$ is a sufficient statistic for $S^n$, then

$$\pi_n = p(S_n \mid x^{n-1}, y^{n-1}) = p(S_n \mid x^{n-1}, y^{n-1}, z^{n-1}) = p(S_n \mid z^{n-1}). \tag{36}$$

Using (36) and replacing the pairs $(X_n, Y_n)$ with $Z_n$ in the derivation of Appendix I, we can simplify the recursive calculation of $\pi_n$ to

$$\pi_{n+1} = \frac{\pi_n D(z_n) P}{\pi_n D(z_n) \mathbf{1}}$$

where $D(z_n)$ is a diagonal $K \times K$ matrix with $k$th diagonal term $p(z_n \mid S_n = c_k)$ and $\mathbf{1}$ is the all-ones column vector. The transition probabilities are also simplified: writing this recursion as $\pi_{n+1} = f(z_n, \pi_n)$,

$$p(\pi_{n+1} = \alpha \mid \pi_n = \pi) = \sum_{z_n \in Z :\, f(z_n, \pi) = \alpha} p(z_n \mid \pi_n = \pi).$$

Consider an FSMC where each $c_k \in C$ is an AWN channel with noise density $n_k$. If we let $Z = Y - X$, then it is easily shown that this is a variable-noise channel. However, such channels have an infinite output alphabet. In general, the output of an AWN channel is quantized to the nearest symbol in a finite output alphabet: we call this the quantized AWN (Q-AWN) channel.

If the Q-AWN channel has a symmetric multiphase input alphabet of constant amplitude and output phase quantization [4, p. 80], then it is easily checked that $p_k(y \mid x)$ depends only on $p_k(|y - x|)$, which in turn depends only on the noise density $n_k$. Thus it is a variable-noise channel.³ We show in Appendix VI that variable-noise Q-AWN channels with the same input and output alphabets are also uniformly symmetric. Uniformly symmetric variable-noise channels have the property that $I_{\rm i.i.d.}$ equals the channel capacity, as we show in the following theorem.

Theorem 5.1: Capacity of uniformly symmetric variable-noise channels is achieved with an input distribution that is uniform and i.i.d. The capacity is given by

$$C = \log |Y| - \int H(Y_n \mid X_n, \pi_n = \alpha)\, \mu(d\alpha) \tag{30}$$

where $\mu$ is the limiting distribution for $\pi_n$ under uniform i.i.d. inputs. Moreover, $C = \lim_{n \to \infty} C_n = \lim_{n \to \infty} C_n^i$ for all $i$, where $C_n$ increases with $n$ and $C_n^i$ decreases with $n$.

Proof: From Lemmas 5.1 and 5.2, $C_n$, $C_n^i$, and $C$ are all maximized with uniform i.i.d. inputs. With this input distribution

$$C_n = \log |Y| - H(Y_n \mid X_n, \pi_n)$$

and

$$C_n^i = \log |Y| - H(Y_n \mid X_n, \pi_n^i).$$

Applying Lemmas 4.2 and 4.3, we get that $H(Y_n \mid X_n, \pi_n)$ and $H(Y_n \mid X_n, \pi_n^i)$ converge to the same limit. Finally, under uniform i.i.d. inputs, by Lemma 4.1 and (32), applying Lemma 4.6 to $\lim_{n \to \infty} H(Y_n \mid X_n, \pi_n)$ completes the proof. □

The BSC is equivalent to a binary-input Q-AWN channel with binary quantization [4]. Thus an FSMC where $c_k$ indexes a set of BSC's with different crossover probabilities is a uniformly symmetric variable-noise channel. Therefore, both [1, Proposition 4] and the capacity formula obtained in [5] are corollaries of Theorem 5.1.

VI. DECISION-FEEDBACK DECODER

[Fig. 3: system block diagram — encoder, block interleaver, FSMC, decision-feedback decoder, deinterleaver.]

A block diagram for a system with decision-feedback decoding is depicted in Fig. 3. The system is composed of a conventional (block or convolutional) encoder for memoryless channels, block interleaver, FSMC, decision-feedback decoder, and deinterleaver. Fig. 4 outlines the decision-feedback decoder design, which consists of a channel state estimator followed by an ML decoder. We will show in this section that if we ignore error propagation, a system employing this decision-feedback decoding scheme on uniformly symmetric variable-noise channels is information-lossless: it has the same capacity as the original FSMC, given by (30) for i.i.d. uniform inputs. Moreover, we will see that the output of the state estimator is a sufficient statistic for the current output given all past inputs and outputs, which reduces the system of Fig. 3 to a discrete memoryless channel. Thus the ML input sequence is determined on a symbol-by-symbol basis, eliminating the complexity and delay of sequence decoders.

The interleaver works as follows. The output of the encoder is stored row by row in a $J \times L$ interleaver, and transmitted over the channel column by column. The deinterleaver performs the reverse operation. Because the effect of the initial channel state dies away, the received symbols within any row of the deinterleaver become independent as $J$ becomes infinite. However, the symbols within any column of the deinterleaver are received from consecutive channel uses, and are thus dependent. This dependence is called the latent channel memory, and the state estimator enables the ML decoder to make use of this memory.

Specifically, the state estimator uses the recursive relationship of (11) to estimate $\pi_n$. It will be shown below that the ML decoder operates on a memoryless system, and can therefore determine the ML input sequence on a per-symbol basis. The input to the ML decoder is the channel output $y_n$ and the state estimate $\hat{\pi}_n$, and its output is the $x_n$ which maximizes $\log p(y_n, \hat{\pi}_n \mid x_n)$, assuming equally likely input symbols.⁴ The soft-decision decoder uses conventional techniques (e.g., Viterbi decoding) with branch metrics

$$m(x_n) = \log p(y_n, \hat{\pi}_n \mid x_n). \tag{43}$$

We now evaluate the information, capacity, and cutoff rates of a system using the decision-feedback decoder, assuming $\hat{\pi}_n = \pi_n$ (i.e., ignoring error propagation). We will use the notation $y_{jl} = y_n$ to explicitly denote that $y_n$ is in the $j$th row and $l$th column of the deinterleaver. Similarly, $\pi_{jl} = \pi_n$ and $x_{jl} = x_n$ denote, respectively, the state estimate and interleaver input corresponding to $y_{jl}$. Assume now that the state estimator is reset every $J$ iterations so, for each $l$, the state estimator goes through $j$ recursions of (11) to calculate $\pi_{jl}$. By (12), this recursion induces a distribution $p(\pi_{jl})$ on $\pi_{jl}$ that depends only on $p(X^{j-1})$. Thus the system up to the output of the state estimator is equivalent to a set of parallel $\pi$-output channels, where the $\pi$-output channel is defined, for a given $j$, by the input $x_{jl}$, the output pair $(y_{jl}, \pi_{jl})$, and the input/output probability

$$p(y_{jl}, \pi_{jl} \mid x_{jl}) = \sum_k p_k(y_{jl} \mid x_{jl})\, \pi_{jl}(k)\, p(\pi_{jl}). \tag{44}$$

⁴If the $x_n$ are not equally likely, then $\log p(x_n)$ must be added to the decoder metric.
[Figure: two-state channel model and performance curves for n = 0, 5, 10, 15 (g = b = .1).]

later corrects this initial $\pi_n$ estimate if the decoded symbols differ from their initial estimates [18]. This method truncates the number of symbols affected by an incorrect decision, at a cost of increased complexity to recalculate and update the $\pi_n$ values. Finally, decision-feedback decoding can be done in parallel, where each parallel path corresponds to a different estimate of the received symbol. The number of parallel paths will grow exponentially in this case; however, we may be able to apply some of the methods outlined in [19] and [20] to reduce the number of paths sustained through the trellis.
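The recalculate-and-update strategy just described can be sketched by checkpointing $\pi_n$ and replaying the recursion when a tentative symbol decision is revised. This is an illustrative fragment with assumed two-state BSC parameters, not the construction of [18].

```python
import numpy as np

P = np.array([[0.9, 0.1], [0.1, 0.9]])   # assumed two-state BSC model
p_err = np.array([0.01, 0.3])

def step(pi, x, y):
    """pi_{n+1} from (pi_n, x_n, y_n) for the two-state BSC model."""
    like = np.where(x == y, 1 - p_err, p_err)   # p(y | S_n = c_k, x)
    post = like * pi
    return (post / post.sum()) @ P

def replay(pi_checkpoint, decisions, ys):
    """Recompute the state estimate from a checkpoint after revising
    past symbol decisions (the truncation/update step described above)."""
    pi = pi_checkpoint
    for x, y in zip(decisions, ys):
        pi = step(pi, x, y)
    return pi

pi0 = np.array([0.5, 0.5])
ys = [0, 1, 1, 0]
first = replay(pi0, [0, 1, 1, 0], ys)   # estimates from the tentative decisions
fixed = replay(pi0, [0, 1, 0, 0], ys)   # the third decision was later revised
print(first, fixed)                     # the revision changes the state estimate
```

Checkpointing bounds the replay cost by the checkpoint spacing, which is the complexity/accuracy trade-off mentioned above.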
Fig. 9. Decoder performance versus g. [Plot: capacity and cutoff rate curves for b = .1 and b = .9.]

noise channels with symmetric PSK inputs as well as channels which vary over a finite set of BSC's. For general FSMC's, we obtained the capacity and cutoff rate penalties of the decision-feedback decoding scheme.

We also presented numerical results for the performance of the decision-feedback decoder on a two-state variable-noise channel with 4PSK modulation. These results demonstrate the improvement over systems that use interleaving and memoryless channel encoding, and the improvement is most pronounced on quasistatic channels. This result is intuitive, since the longer the FSMC stays in a given state, the more accurately the state estimator will predict that state. Finally, we present results for the decoder performance relative to the average fade duration; as expected, the performance improves as the average fade duration decreases.
where we again use Bayes rule and the last equality follows from (5). Substituting (56) in the denominator of (55), and canceling the common terms $p(x_n \mid x^{n-1})$ and $p(x^{n-1}, y^{n-1})$, yields

$$p(S_n \mid x^n, y^n) = \frac{p(y_n \mid S_n, x_n)\, p(S_n \mid x^{n-1}, y^{n-1})}{\sum_{k \in K} p(y_n \mid S_n = c_k, x_n)\, p(S_n = c_k \mid x^{n-1}, y^{n-1})} \tag{57}$$

which, for a particular value of $S_n$, becomes

$$p(S_n = c_l \mid x^n, y^n) = \frac{p(y_n \mid S_n = c_l, x_n)\, p(S_n = c_l \mid x^{n-1}, y^{n-1})}{\sum_{k \in K} p(y_n \mid S_n = c_k, x_n)\, p(S_n = c_k \mid x^{n-1}, y^{n-1})}. \tag{58}$$

Finally, from (4),

$$p(S_{n+1} = c_l \mid x^n, y^n) = \sum_{j \in K} p(S_n = c_j \mid x^n, y^n)\, P_{j,l}. \tag{59}$$

Substituting (58) into (59) yields the desired result.

APPENDIX II

In this Appendix we show that for i.i.d. inputs, $\pi_n$ and $\rho_n$ are Markov chains that converge in distribution to a limit which is independent of the initial channel state, and that the resulting limit distributions are continuous functions of the input distribution $p(x)$. We also show that the Markov property does not hold for Markov inputs.

We begin by showing the Markov property for independent inputs.

Lemma A2.1: For independent inputs, $\pi_n$ is a Markov chain.

Proof:

$$\begin{aligned} p(\pi_{n+1} \mid \pi_n, \cdots, \pi_0) &= \sum_{x_n, y_n} p(\pi_{n+1} \mid \pi_n, \cdots, \pi_0, x_n, y_n)\, p(x_n, y_n \mid \pi_n, \cdots, \pi_0) \\ &= \sum_{x_n, y_n} p(\pi_{n+1} \mid \pi_n, x_n, y_n)\, p(x_n, y_n \mid \pi_n) \\ &= p(\pi_{n+1} \mid \pi_n) \end{aligned} \tag{60}$$

where the second equality follows from (11) and (6). Thus $\pi_n$ is Markov. A similar argument using (13) and (8) shows that $\rho_n$ is also Markov for independent inputs.

To obtain the weak convergence of $\pi_n$ and $\rho_n$, we also assume that the channel inputs are i.i.d., since we can then apply convergence results for partially observed Markov chains [21]. Consider the new stochastic process $U_n = (S_n, y_n, x_n)$ defined on the state space $U = C \times Y \times X$. Since $S_n$ is stationary and ergodic and $x_n$ is i.i.d., $U_n$ is stationary and ergodic. It is easily checked that $U_n$ is Markov.

Let $(S, y, x)_j$ denote the $j$th element of $U$, and $J = |U|$. To specify its individual components, we use the notation

$$(S(j), y(j), x(j)) = (S, y, x)_j.$$

The $J \times J$ probability transition matrix for $U$, $P_U$, is

$$P_U(k, j) = p[(S_{n+1}, y_{n+1}, x_{n+1}) = (S, y, x)_j \mid (S_n, y_n, x_n) = (S, y, x)_k] \tag{61}$$

independent of $n$. The initial distribution of $U$, $\pi_0^U$, is given by

$$p(S_0 = c_k,\ y_0 = y,\ x_0 = x) = \pi_0(k)\, p_k(y_0 \mid x_0)\, p(x_0). \tag{62}$$

Let $g_{y,x}: U \to Y \times X$ and $g_y: U \to Y$ be the projections

$$g_{y,x}(S_n, y_n, x_n) = (y_n, x_n)$$

and

$$g_y(S_n, y_n, x_n) = (y_n).$$

These projections form the new processes $W_n = g_{y,x}[U_n]$ and $V_n = g_y[U_n]$. We regard $W_n$ and $V_n$ as partial observations of the Markov chain $U_n$; the pairs $(U_n, W_n)$ and $(U_n, V_n)$ are referred to as partially observed Markov chains. The distribution of $U_n$ conditioned on $W^n$ and $V^n$, respectively, is

$$\pi_n^U = (\pi_n^U(1), \cdots, \pi_n^U(J))$$

and

$$\rho_n^U = (\rho_n^U(1), \cdots, \rho_n^U(J))$$

where

$$\pi_n^U(j) = p(U_n = (S, y, x)_j \mid W^n) \tag{63}$$

and

$$\rho_n^U(j) = p(U_n = (S, y, x)_j \mid V^n). \tag{64}$$

Note that

$$\pi_n^U(j) = p(S_n = S(j) \mid W^n)\, 1[x_n = x(j),\ y_n = y(j)] = \pi_n(k)\, 1[x_n = x(j),\ y_n = y(j)] \tag{65}$$
where S(jl = c k . Thus if n," converges in distribution, n, and note that for any y E Y and :E E X
must also converge in distribution. Similarly, p n converges in
distribution if does. M&, x) = N&)I(ic(j) = x). (70)
We will use the following definition for subrectangular To apply Theorem A2.1 to p t , we must find a sequence
matrices in the subsequent theorem. y1, . . . , y1 which yields a nonzero and subrectangular matrix
Definition: Let D = ( D r 3 )denote a d x d matrix. If for the product N(y1) . . . N ( y l ) .Consider the projection onto
Dz1,71 # 0 and Dz2,32# 0 implies that also DzlrJ2# 0 and y of the sequence ( y n , z,), 71 = 1,. . . , nr, from Assumption
DZLr3, # 0, then D is called a subrectangular matrix. 1. Let yn,n = 1,. . . , m denote this projection. Using (70)
We can now state the convergence theorem, due to Kaijser and the fact that all the elements of M are nonnegative, it
[21], for the distribution of a Markov chain conditioned on A
is easily shown that for M = M(yl,sl)...M(y,,z,) and
partial observations.

Theorem A2.1: Let U_n be a stationary and ergodic Markov chain with transition matrix P_U and state space U. Let g be a function with domain U and range Z. Define a new process Z_n = g(U_n). For z ∈ Z and U^(j) the jth element of U, define the matrix M(z) by

[M(z)]_{i,j} = p(U_n = U^(j), Z_n = z | U_{n−1} = U^(i)).

Suppose that P_U and g are such that there exists a finite sequence z_1, ..., z_m of elements in Z that yields a nonzero subrectangular matrix for the matrix product M(z_1) ··· M(z_m). Then p(U_n | Z^n) converges in distribution, and moreover the limit distribution is independent of the initial distribution of U.

We first apply this theorem to π_n^0.

Assumption 1: Assume that there exists a finite sequence (y_n, x_n), n = 1, ..., m, such that the matrix product M(y_1, x_1) ··· M(y_m, x_m) is nonzero and subrectangular, where

[M(y, x)]_{i,j} = p(S_n = c_j, y_n = y | S_{n−1} = c_i, x_n = x).

Then by Theorem A2.1, π_n^0 converges in distribution to a limit which is independent of the initial distribution. By (65), this implies that π_n also converges in distribution, and its limit distribution is independent of π_0. We thus get the following lemma, which was stated in (15).

Lemma A2.2: For any bounded continuous function f, the following limits exist and are equal for all i:

lim_{n→∞} E[f(π_n)] = lim_{n→∞} E[f(π_n^i)]. (68)

The subrectangularity condition on M is satisfied if for some input x ∈ X there exists a y ∈ Y such that p_k(y | x) > 0 for all k. It is also satisfied if all the elements of the matrix P are nonzero.

For the matrix products M = M(y_1) ··· M(y_n) and N = N(y_1) ··· N(y_n), if for any i and j, M_{i,j} is nonnegative, then N_{i,j} is nonnegative also. From this we deduce that if M is nonzero and subrectangular, then N must also be nonzero and subrectangular. We can now apply Theorem A2.1 to ρ_n^0, which yields the convergence in distribution of ρ_n^0, and thus of ρ_n. Moreover, the limit distributions of these random vectors are independent of their initial states. Thus we get the following result, which was stated in (16).

Lemma A2.4: For any bounded continuous function f, the following limits exist and are equal for all i:

lim_{n→∞} E[f(ρ_n)] = lim_{n→∞} E[f(ρ_n^i)].

From (11) and (12), the limit distribution of π_n is a function of the i.i.d. input distribution. Let P(X) denote the set of probability distributions on X. From (13) and (14), the limit distribution of ρ_n is also a function of the input distribution. The following lemma shows that the limit distribution of ρ_n is continuous on P(X).

Lemma A2.5: Let ν^θ denote the limit distribution of ρ_n as a function of the i.i.d. distribution θ ∈ P(X). Then ν^θ is a continuous function of θ, so θ_m → θ implies that ν^{θ_m} → ν^θ.

Proof of Lemmas A2.3 and A2.5: We must show that for all θ_m, θ ∈ P(X), if θ_m → θ, then ρ^{θ_m} → ρ^θ and ν^{θ_m} → ν^θ. We first show the convergence of ν^{θ_m}. From [12, p. 346], in order to show that ν^{θ_m} → ν^θ, it suffices to show that {ν^{θ_m}} is a tight sequence of probability measures, and that any subsequence of {ν^{θ_m}} which converges weakly converges to ν^θ.

Tightness of the sequence {ν^{θ_m}} follows from the fact that Δ is a compact set. Now suppose there is a subsequence ν^{θ_{m_k}} which converges weakly to ν̃. We must show that ν̃ = ν^θ, where ν^θ is the unique invariant distribution for ρ under the transformation (14) with input distribution p(x) = θ. Thus it suffices to show that for every bounded, continuous, real-valued function φ on Δ,

∫_Δ φ(α) ν̃(dα) = ∫_Y ∫_Δ φ(f^θ(y, ρ)) p^θ(y | ρ) ν̃(dρ). (72)
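Subrectangularity, the key hypothesis of Theorem A2.1 and Assumption 1, is easy to test numerically: a nonzero nonnegative matrix is subrectangular when its nonzero entries fill the full Cartesian product of its nonzero rows and nonzero columns. A minimal sketch (the helper name and example matrices are ours, not the paper's):

```python
import numpy as np

def is_subrectangular(M, tol=1e-12):
    """True if the nonzero entries of nonnegative M form a full
    rectangle: every (row with a nonzero) x (column with a nonzero)
    position is itself nonzero, and M is not the zero matrix."""
    nz = np.abs(M) > tol
    rows = np.where(nz.any(axis=1))[0]
    cols = np.where(nz.any(axis=0))[0]
    return len(rows) > 0 and all(nz[i, j] for i in rows for j in cols)

# The condition is checked on a product M(z1)...M(zm):
M1 = np.array([[0.5, 0.5], [0.0, 1.0]])
M2 = np.array([[1.0, 0.0], [1.0, 0.0]])
print(is_subrectangular(M1 @ M2))    # True: product is [[1, 0], [1, 0]]
print(is_subrectangular(np.eye(2)))  # False: nonzeros form a diagonal
```

The diagonal example shows why subrectangularity is stronger than mere positivity of some entries: a permutation-like pattern never satisfies it.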
Authorized licensed use limited to: National Chiao Tung Univ.. Downloaded on May 13,2020 at 18:01:10 UTC from IEEE Xplore. Restrictions apply.
880 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 42, NO. 3, MAY 1996
+ |∫_Y ∫_Δ φ(f^{θ_k}(y, ρ)) p^{θ_k}(y | ρ) ν^{θ_k}(dρ) − ∫_Y ∫_Δ φ(f^θ(y, ρ)) p^θ(y | ρ) ν̃(dρ)|. (80)

But for any fixed y and ρ, θ_k → θ implies that f^{θ_k}(y, ρ) → f^θ(y, ρ), since from (13), the numerator and denominator of f^{θ_k}(y, ρ) are continuous in θ.

p(π_3 = α_0 | π_2 = α_0, π_1 = α_1) = 1/3

while

p(π_3 = α_0 | π_2 = α_0) = 5/6.

So π_n is not a Markov process.

Lemma A2.7: In general, the Markov property does not hold for ρ_n under Markov inputs.
GOLDSMITH AND VARAIYA: CAPACITY, MUTUAL INFORMATION, AND CODING FOR MARKOV CHANNELS 881
Proof: We prove this using a counterexample similar to that of Lemma A2.6. Let the FSMC be as in Lemma A2.6 with the following change in the definition of the memoryless channels c_1, c_2, and c_3:

c_1: p_1(1 | 0) = p_1(1 | 1) = 1; otherwise p_1(y | x) = 0.
c_2: p_2(2 | 0) = p_2(2 | 1) = 1; otherwise p_2(y | x) = 0.
c_3: p_3(0 | 0) = p_3(2 | 0) = 1/2, p_3(0 | 1) = 1/4, p_3(2 | 1) = 3/4; otherwise p_3(y | x) = 0.

It is easily shown that the state space for the stochastic process {ρ_n}_{n=0}^∞ includes the points α_0 and α_1 defined in Lemma A2.6. Using the same Markov input distribution defined there, we have

p(ρ_3 = α_0 | ρ_2 = α_0, ρ_1 = α_1) = 5/36

while

p(ρ_3 = α_0 | ρ_2 = α_0) = 8/57.

So {ρ_n}_{n=0}^∞ is not a Markov process. □

APPENDIX III

In this Appendix, we prove Lemma 4.1. Consider first H(Y_n | X_n, X^{n−1}, Y^{n−1}). We have

H(Y_n | X_n, X^{n−1}, Y^{n−1})
= E[−log p(y_n | x_n, x^{n−1}, y^{n−1})]
= E[−log Σ_{k=1}^K p(y_n | x_n, S_n = c_k) p(S_n = c_k | x^{n−1}, y^{n−1})]
= E[−log Σ_{k=1}^K p_k(y_n | x_n) π_n(k)]
= E[−log p(y_n | x_n, π_n)]
= H(Y_n | X_n, π_n). (83)

The argument that H(Y_n | Y^{n−1}) = H(Y_n | ρ_n) is the same, with all the x terms removed and π_n replaced by ρ_n. □

APPENDIX IV

In this Appendix, we prove Lemmas 4.2–4.6.

Proof of Lemma 4.2: We first note that the conditional entropy H(W | V) = E[−log p(W | V)], where the log function is concave on [0, 1]. To show the first inequality in (25), let f denote any concave function. Then

Ef(p[y_n | x_n, x^{n−1}, y^{n−1}])
(a)= Ef(p[y_{n+1} | x_{n+1}, x_2^n, y_2^n])
(b)= Ef(E(p[y_{n+1} | x_{n+1}, x^n, y^n] | x_{n+1}, x_2^n, y_2^n))
(c)≤ EE(f(p[y_{n+1} | x_{n+1}, x^n, y^n]) | x_{n+1}, x_2^n, y_2^n)
(d)= Ef(p[y_{n+1} | x_{n+1}, x^n, y^n]) (84)

where a follows from the stationarity of the channel and the inputs, b and d follow from properties of conditional expectation [12], and c is a consequence of Jensen's inequality.

The second inequality in (25) results from the fact that conditioning on an additional random variable, in this case the initial state S_0, always reduces the entropy [14]. The proof of the third inequality in (25) is similar to that of the first, where a and d follow from properties of conditional expectation, b follows from (4) and (5), c follows from Jensen's inequality, and e follows from the channel and input stationarity. □

Proof of Lemma 4.3: From Lemma 4.1 we obtain (86); similarly we obtain (87), where π_n^i = π_n for some i. Applying (15) to (86) and (87) completes the proof. □

Proof of Lemma 4.4: The proof of this lemma is similar to that of Lemma 4.2 above. For the first inequality in (27), we have

Ef(p[y_n | y^{n−1}])
(a)= Ef(p[y_{n+1} | y_2^n])
(b)= Ef(E(p[y_{n+1} | y^n] | y_2^n))
(c)≤ EE(f(p[y_{n+1} | y^n]) | y_2^n)
(d)= Ef(p[y_{n+1} | y^n]) (88)

where a follows from the stationarity of the inputs and channel, b and d follow from properties of conditional expectation [12], and c is a consequence of Jensen's inequality.

The second inequality results from the fact that conditioning on an additional random variable reduces entropy. Finally, for the third inequality, we have

Ef(p[y_{n+1} | y^n, S_0])
(a)= Ef(E(p[y_{n+1} | y^n, S_1] | y^n, S_0))
(b)= Ef(E(p[y_{n+1} | y_2^n, S_1] | y^n, S_0))
(c)≤ EE(f(p[y_{n+1} | y_2^n, S_1]) | y^n, S_0)
(d)= Ef(p[y_{n+1} | y_2^n, S_1])
(e)= Ef(p[y_n | y^{n−1}, S_0]) (89)
where a and d follow from properties of conditional expectation, b follows from (6), c follows from Jensen's inequality, and e follows from the channel and input stationarity. □

Proof of Lemma 4.5: Following a similar argument as in the proof of Lemma 4.3, we obtain (90) and (91), where ρ_n^i = ρ_n for some i. Applying (16) to (90) and (91) completes the proof. □

Proof of Lemma 4.6: We first consider the limiting conditional entropy H(Y_n | ρ_n^θ) as n → ∞. Let ν_n^θ denote the distribution of ρ_n^θ and ν^θ denote the corresponding limit distribution. Also, let p^θ(y | ·) explicitly denote the dependence of the (conditional) output probability on θ. Then

lim_{n→∞} H(Y_n | ρ_n^θ) = ∫_Δ Σ_{y∈Y} −p^θ(y | ρ) log p^θ(y | ρ) ν^θ(dρ). (92)

The second and fourth equalities in (92) follow from the fact that ρ_n is a function of y^{n−1}. We also use this in the fifth equality to take expectations relative to ρ_n instead of y^{n−1}. The sixth equality follows from the definition of ν_n and the stationarity of the channel inputs. The last equality follows from the weak convergence of ρ_n and the fact that the entropy is continuous in ρ and is bounded by log |Y| [12, Theorem 25.8].

The limiting conditional entropy H(Y_n | X_n, π_n) is obtained with a similar argument. Let μ_n^θ denote the distribution of π_n^θ and μ^θ denote the corresponding limit distribution. Then

lim_{n→∞} H(Y_n | X_n, π_n^θ) = ∫_Δ Σ_{x∈X} Σ_{y∈Y} −θ(x) p(y | x, π) log p(y | x, π) μ^θ(dπ) (93)

where we use the fact that π_n is a function of x^{n−1} and y^{n−1}, and the last equality follows from the weak convergence of π_n^θ. □
and if M is even,

= log |Y| (95)

and similarly for any i. Since (95) only requires that p(x_n) be uniform for each n, an i.i.d. uniform input distribution achieves this maximum. Substituting π for ρ in the above argument yields the result for H(Y_n | π_n) and H(Y_n | π_n^i). □

Proof of Lemma 5.2: We consider only H(Y_n | X_n, π_n), since the same argument applies for H(Y_n | X_n, π_n^i). By the output symmetry of each c_k ∈ C, the sets of transition probabilities are permutations of each other. Thus P depends only on the value of z_{ij}; the rows of P are therefore permutations of each other, and so are the columns.

APPENDIX VII

We will show that the π-output channel is asymptotically memoryless as J → ∞. Indeed, since the FSMC is indecomposable and stationary,

lim_{J→∞} p(S_{n+J}, S_n) = lim_{J→∞} p(S_{n+J}) p(S_n) (100)

for any n, and thus also the corresponding limit (101). Therefore, since π_{jl} and π_{j(l−1)} are J iterations apart, π_{jl} and π_{j(l−1)} are asymptotically independent as J → ∞.

In order to show that the π-output channel is memoryless, we must show that for any j and L,

p(y^{jL}, π^{jL} | x^{jL}) = Π_{l=1}^{L} p(y_{jl}, π_{jl} | x_{jl}). (102)
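The limit (100) is ordinary Markov-chain mixing, and is easy to see numerically: for an indecomposable chain the rows of P^J all converge to the stationary distribution, so S_n and S_{n+J} decouple under deep interleaving. A toy check (the 2-state transition matrix is ours, not the paper's):

```python
import numpy as np

# Rows of P^J converge to the stationary distribution, so
# p(S_{n+J} | S_n) -> p(S_{n+J}) as in (100).
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
PJ = np.linalg.matrix_power(P, 50)
print(PJ)   # both rows are (0.75, 0.25), the stationary distribution
```

The rate of this decoupling is governed by the second-largest eigenvalue of P (here 0.6), which is why a finite but deep interleaver already makes the π-output channels nearly independent.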
Thus we need only show that the lth factor in the right-hand side of (103) equals p(y_{jl}, π_{jl} | x_{jl}) in the limit as J → ∞. This result is proved in the following lemma.

Lemma A7.1: For asymptotically large J,

p(y_{jl}, π_{jl} | x_{jl}, y^{j(l−1)}, π^{j(l−1)}, x^{j(l−1)}) = p(y_{jl}, π_{jl} | x_{jl}). (104)

APPENDIX VIII

The π-output channels are independent if

p(Y^J, π^J | X^J) = Π_{j=1}^{J} p(y_j, π_j | x_j).

This is shown in the following string of equalities:

p(Y^J, π^J | X^J)
= Π_{j=1}^{J} p(y_j, π_j | x_j, y^{j−1}, π^{j−1}, x^{j−1})
= Π_{j=1}^{J} p(y_j | π_j, x_j, y^{j−1}, π^{j−1}, x^{j−1}) p(π_j | x_j, y^{j−1}, π^{j−1}, x^{j−1})
= Π_{j=1}^{J} p(y_j | π_j, x_j) p(π_j | x_j, y^{j−1}, π^{j−1}, x^{j−1})
= Π_{j=1}^{J} p(y_j | π_j, x_j) p(π_j) (107)

where the third equality follows from (5) and the last equality follows from the fact that we ignore error propagation, so x^{j−1}, y^{j−1}, and π^{j−1} are all known constants at time j.

We now determine the average mutual information of the parallel π-output channels for a fixed input distribution p(X^J). The average mutual information of the parallel set is

I_J = (1/J) I(Y^J, π^J; X^J). (108)

From above, the parallel channels are independent, and each channel is memoryless with asymptotically deep interleaving. Thus we obtain (45) as follows:

(1/J) I(Y^J, π^J; X^J)
= (1/J) [H(Y^J, π^J) − H(Y^J, π^J | X^J)]
= (1/J) [H(Y^J | π^J) + H(π^J) − (H(Y^J | π^J, X^J) + H(π^J | X^J))]
= (1/J) [H(Y^J | π^J) − H(Y^J | π^J, X^J)]
= (1/J) Σ_{j=1}^{J} [H(Y_j | π_j) − H(Y_j | π_j, X_j)] (109)

where the second equality follows from (4) and (5), the third equality follows from (4) and (11), and the fourth equality follows from (101) in the asymptotic limit of deep interleaving, together with the fact that

H(Y_j | Y^{j−1}, π^J) = H(Y_j | ρ_j, π_j) = H(Y_j | π_j) (110)

since the π_j channels are memoryless and ρ_j = E_{x^{j−1}} π_j.

APPENDIX IX

In this Appendix we examine the cutoff rate for uniformly symmetric variable-noise channels. The first three lemmas show that for these channels, the maximizing distribution of (52) is uniform and i.i.d. We then determine that R_j, as given by (52), is monotonically increasing in j, and use this to get a simplified formula for R_df in terms of the limiting value of R_j.

Lemma A9.1: For all j, R_j depends only on p(x_j).

Proof: From the proof of Lemma 5.2, π_j is a function of Z^{j−1}, and is independent of X^{j−1}. So p(π_j) does not depend on the input distribution. The result then follows from the definition of R_j. □

Corollary: An independent input distribution achieves the maximum of R_df.

Lemma A9.2: For a fixed input distribution p(X^J), the J corresponding π-output channels are all symmetric [13, p. 94].

Proof: We must show that for any j ≤ J, the set of outputs for the jth π-output channel can be partitioned into subsets such that the corresponding submatrices of transition probabilities have rows which are permutations of each other and columns which are permutations of each other. We will call such a matrix row/column-permutable.

Let n_j ≤ |X|^j |Y|^j be the number of points δ ∈ Δ with p(π_j = δ) > 0, and let {δ_i}_{i=1}^{n_j} explicitly denote this set. Then we can partition the output into n_j sets, where the ith set consists of the pairs {(y, δ_i): y ∈ Y}. We want to show that the transition probability matrix associated with each of these output partitions is row/column-permutable, i.e., that for all i, 1 ≤ i ≤ n_j, the |X| × |Y| matrix

P^i = [p(y_j = y, π_j = δ_i | x_j = x)], x ∈ X, y ∈ Y (111)

has rows which are permutations of each other, and columns which are permutations of each other.
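The row/column-permutable property that Lemma A9.2 establishes can be checked mechanically: every row must contain the same multiset of entries, and likewise every column. A small sketch (the helper name and matrices are ours, not the paper's):

```python
import numpy as np

def row_col_permutable(M):
    """True if all rows of M are permutations of one another and all
    columns are permutations of one another (sorted-multiset test)."""
    rows = {tuple(sorted(r)) for r in M}
    cols = {tuple(sorted(c)) for c in M.T}
    return len(rows) == 1 and len(cols) == 1

print(row_col_permutable(np.array([[0.9, 0.1],
                                   [0.1, 0.9]])))   # True: BSC-like
print(row_col_permutable(np.array([[0.9, 0.1],
                                   [0.2, 0.8]])))   # False
```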
Since the FSMC is a variable-noise channel, there is a function f such that p_k(y | x) depends only on z = f(x, y) for all k, 1 ≤ k ≤ K. Therefore, if for some k′, p_{k′}(y | x) = p_{k′}(y′ | x′), then f(x, y) = f(x′, y′). But since z = f(x, y) is the same for all k, this implies that

P^i_{x,y} = [Σ_{k=1}^{K} p_k(y | x) δ_i(k)] p(π_j = δ_i),

since

p(y_j = y, π_j = δ_i | x_j = x) = Σ_{k=1}^{K} p_k(y_j = y | x_j = x) δ_i(k) p(π_j = δ_i). (115)

Thus P^i depends only on the value of z = f(x, y); its rows are therefore permutations of each other, and so are its columns. □

Lemma A9.3: For i.i.d. uniform inputs, R_j is monotonically increasing in j.

Proof: For uniform i.i.d. inputs, following an argument similar to that of Lemma 4.2, we have that R_j ≤ R_{j+1}, where a follows from stationarity and b follows from Jensen's inequality. □

Lemma A9.4: For uniformly symmetric variable-noise channels, a uniform i.i.d. input distribution maximizes R_df. Moreover, R_df is given in terms of the limiting value of R_j.
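As a concrete instance of the variable-noise property, take each channel state k to be a BSC with crossover probability e_k: then p_k(y | x) depends on (x, y) only through the noise symbol z = f(x, y) = x XOR y, and the resulting transition matrix is row/column-permutable in the sense of Lemma A9.2. A sketch with illustrative values of our choosing:

```python
import numpy as np

def pk(y, x, e):
    """BSC in one channel state: p_k(y | x) depends only on z = x ^ y."""
    return e if (x ^ y) == 1 else 1 - e

e = 0.2
M = np.array([[pk(y, x, e) for y in (0, 1)] for x in (0, 1)])
print(M)   # [[0.8, 0.2], [0.2, 0.8]]: rows are permutations of each
           # other, and so are the columns
```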