Random Processes: C 161 °alan V. Oppenheim and George C. Verghese, 2010
Random Processes: C 161 °alan V. Oppenheim and George C. Verghese, 2010
Random Processes: C 161 °alan V. Oppenheim and George C. Verghese, 2010
Random Processes
INTRODUCTION
Much of your background in signals and systems is assumed to have focused on the
eect of LTI systems on deterministic signals, developing tools for analyzing this
class of signals and systems, and using what you learned in order to understand
applications in communication (e.g., AM and FM modulation), control (e.g., sta
bility of feedback systems), and signal processing (e.g., ltering). It is important to
develop a comparable understanding and associated tools for treating the eect of
LTI systems on signals modeled as the outcome of probabilistic experiments, i.e.,
a class of signals referred to as random signals (alternatively referred to as random
processes or stochastic processes). Such signals play a central role in signal and
system design and analysis, and throughout the remainder of this text. In this
chapter we dene random processes via the associated ensemble of signals, and be
gin to explore their properties. In successive chapters we use random processes as
models for random or uncertain signals that arise in communication, control and
signal processing applications.
9.1 DEFINITION AND EXAMPLES OF A RANDOM PROCESS
In Section 7.3 we dened a random variable X as a function that maps each outcome
of a probabilistic experiment to a real number. In a similar manner, a real-valued
CT or DT random process, X(t) or X[n] respectively, is a function that maps
each outcome of a probabilistic experiment to a real CT or DT signal respectively,
termed the realization of the random process in that experiment. For any xed
time instant t = t
0
or n = n
0
, the quantities X(t
0
) and X[n
0
] are just random
variables. The collection of signals that can be produced by the random process is
referred to as the ensemble of signals in the random process.
EXAMPLE 9.1 Random Oscillators
As an example of a random process, imagine a warehouse containing N harmonic
oscillators, each producing a sinusoidal waveform of some specic amplitude, fre
quency, and phase, all of which may be dierent for the dierent oscillators. The
probabilistic experiment that results in the ensemble of signals consists of selecting
an oscillator according to some probability mass function (PMF) that assigns a
probability to each of the numbers from 1 to N, so that the ith oscillator is picked
c 161 Alan V. Oppenheim and George C. Verghese, 2010
162 Chapter 9 Random Processes
Amplitude
X(t; )
t
0
t
FIGURE 9.1 A random process.
with probability p
i
. Associated with each outcome of this experiment is a specic
sinusoidal waveform.
In Example 9.1, before an oscillator is chosen, there is uncertainty about what
the amplitude, frequency and phase of the outcome of the experiment will be.
Consequently, for this example, we might express the random process as
X(t) = A sin(t + )
where the amplitude A, frequency and phase are all random variables. The
value X(t
1
) at some specic time t
1
is also a random variable. In the context of
this experiment, knowing the PMF associated with each of the numbers 1 to N
involved in choosing an oscillator, as well as the specic amplitude, frequency and
phase of each oscillator, we could determine the probability distributions of any of
the underlying random variables A, , or X(t
1
) mentioned above.
Throughout this and later chapters, we will be considering many other examples of
random processes. What is important at this point, however, is to develop a good
mental picture of what a random process is. A random process is not just one signal
but rather an ensemble of signals, as illustrated schematically in Figure 9.2 below,
for which the outcome of the probabilistic experiment could be any of the four wave
forms indicated. Each waveform is deterministic, but the process is probabilistic
or random because it is not known a priori which waveform will be generated by
the probabilistic experiment. Consequently, prior to obtaining the outcome of the
probabilistic experiment, many aspects of the signal are unpredictable, since there
is uncertainty associated with which signal will be produced. After the experiment,
or a posteriori, the outcome is totally determined.
If we focus on the values that a random process X(t) can take at a particular
instant of time, say t
1
i.e., if we look down the entire ensemble at a xed time
what we have is a random variable, namely X(t
1
). If we focus on the ensemble of
values taken at an arbitrary collection of xed time instants t
1
< t
2
< < t
for
some arbitrary integer , we are dealing with a set of jointly distributed random
variables X(t
1
), X(t
2
), , X(t
)
for all and all t
1
, t
2
, , t
.
An important set of questions that arises as we work with random processes in later
chapters of this book is whether, by observing just part of the outcome of a random
process, we can determine the complete outcome. The answer will depend on the
details of the random process, but in general the answer is no. For some random
processes, having observed the outcome in a given time interval might provide
sucient information to know exactly which ensemble member was determined. In
other cases it would not be sucient. We will be exploring some of these aspects in
more detail later, but we conclude this section with two additional examples that
Alan V. Oppenheim and George C. Verghese, 2010 c
164 Chapter 9 Random Processes
further emphasize these points.
EXAMPLE 9.2 Ensemble of batteries
Consider a collection of N batteries, each providing one voltage out of a given nite
set of voltage values. The histogram of voltages (i.e., the number of batteries with
a given voltage) is given in Figure 9.3. The probabilistic experiment is to choose
Number of
Batteries
Voltage
FIGURE 9.3 Histogram of battery distribution for Example 9.2.
one of the batteries, with the probability of picking any specic one being
N
1
, i.e.,
they are all equally likely to be picked. A little reection should convince you that
if we multiply the histogram in Figure 9.3 by
N
1
, this normalized histogram will
represent (or approximate) the PMF for the battery voltage at the outcome of the
experiment. Since the battery voltage is a constant signal, this corresponds to a
random process, and in fact is similar to the oscillator example discussed earlier,
but with = 0 and = 0, so that only the amplitude is random.
For this example observation of X(t) at any one time is sucient information to
determine the outcome for all time.
EXAMPLE 9.3 Ensemble of coin tossers
Consider N people, each independently having written down a long random string
of ones and zeros, with each entry chosen independently of any other entry in their
string (similar to a sequence of independent coin tosses). The random process now
comprises this ensemble of strings. A realization of the process is obtained by
randomly selecting a person (and therefore one of the N strings of ones and zeros),
following which the specic ensemble member of the random process is totally
determined. The random process described in this example is often referred to as
Alan V. Oppenheim and George C. Verghese, 2010 c
Section 9.1 Denition and examples of a random process 165
the Bernoulli process because of the way in which the string of ones and zeros is
generated (by independent coin ips).
Now suppose that person shows you only the tenth entry in the string. Can you
determine (or predict) the eleventh entry from just that information? Because of
the manner in which the string was generated, the answer clearly is no. Similarly
if the entire past history up to the tenth entry was revealed to you, could you
determine the remaining sequence beyond the tenth? For this example, the answer
is again clearly no.
While the entire sequence has been determined by the nature of the experiment,
partial observation of a given ensemble member is in general not sucient to fully
specify that member.
Rather than looking at the nth entry of a single ensemble member, we can consider
the random variable corresponding to the values from the entire ensemble at the
nth entry. Looking down the ensemble at n = 10, for example, we would would see
ones and zeros with equal probability.
In the above discussion we indicated and emphasized that a random process can
be thought of as a family of jointly distributed random variables indexed by t or
n. Obviously it would in general be extremely dicult or impossible to represent a
random process this way. Fortunately, the most widely used random process models
have special structure that permits computation of such a statistical specication.
Also, particularly when we are processing our signals with linear systems, we often
design the processing or analyze the results by considering only the rst and second
moments of the process, namely the following functions:
Mean:
X
(t
i
) = E[X(t
i
)], (9.1)
Auto-correlation: R
XX
(t
i
, t
j
) = E[X(t
i
)X(t
j
)], and (9.2)
Auto-covariance: C
XX
(t
i
, t
j
) = E[(X(t
i
)
X
(t
i
))(X(t
j
)
X
(t
j
))]
= R
XX
(t
i
, t
j
)
X
(t
i
)
X
(t
j
). (9.3)
The word auto (which is sometime written without the hyphen, and sometimes
dropped altogether to simplify the terminology) here refers to the fact that both
samples in the correlation function or the covariance function come from the same
process; we shall shortly encounter an extension of this idea, where the samples are
taken from two dierent processes.
One case in which the rst and second moments actually suce to completely
specify the process is in the case of what is called a Gaussian process, dened
as a process whose samples are always jointly Gaussian (the generalization of the
bivariate Gaussian to many variables).
We can also consider multiple random processes, e.g., two processes, X(t) and Y (t).
For a full stochastic characterization of this, we need the PDFs of all possible com
binations of samples from X(t), Y (t). We say that X(t) and Y (t) are independent
if every set of samples from X(t) is independent of every set of samples from Y (t),
Alan V. Oppenheim and George C. Verghese, 2010 c
166 Chapter 9 Random Processes
so that the joint PDF factors as follows:
f
X(t1), ,X(t
k
),Y (t
), ,Y (t
)
(x
1
, , x
k
, y
1
, , y
)
1
= f
X(t1), ,X(t
k
)
(x
1
, , x
k
).f
Y (t
), ,Y (t
)
(y
1
, , y
) . (9.4)
1
If only rst and second moments are of interest, then in addition to the individual
rst and second moments of X(t) and Y (t) respectively, we need to consider the
cross-moment functions:
Cross-correlation: R
XY
(t
i
, t
j
) = E[X(t
i
)Y (t
j
)], and (9.5)
Cross-covariance: C
XY
(t
i
, t
j
) = E[(X(t
i
)
X
(t
i
))(Y (t
j
)
Y
(t
j
))]
= R
XY
(t
i
, t
j
)
X
(t
i
)
Y
(t
j
). (9.6)
If C
XY
(t
1
, t
2
) = 0 for all t
1
, t
2
, we say that the processes X(t) and Y (t) are uncor
related. Note again that the term uncorrelated in its common usage means that
the processes have zero covariance rather than zero correlation.
Note that everything we have said above can be carried over to the case of DT
random processes, except that now the sampling instants are restricted to be dis
crete time instants. In accordance with our convention of using square brackets
[ ] around the time argument for DT signals, we will write
X
[n] for the mean
of a random process X[ ] at time n; similarly, we will write R
XX
[n
i
, n
j
] for the
correlation function involving samples at times n
i
and n
j
; and so on.
9.2 STRICT-SENSE STATIONARITY
In general, we would expect that the joint PDFs associated with the random vari
ables obtained by sampling a random process at an arbitrary number k of arbitrary
times will be time-dependent, i.e., the joint PDF
f
X(t1), ,X(t
k
)
(x
1
, , x
k
)
will depend on the specic values of t
1
, , t
k
. If all the joint PDFs stay the same
under arbitrary time shifts, i.e., if
f
X(t1 ), ,X(t
k
)
(x
1
, , x
k
) = f
X(t1+ ), ,X(t
k
+ )
(x
1
, , x
k
) (9.7)
for arbitrary , then the random process is said to be strict-sense stationary (SSS).
Said another way, for a strict-sense stationary process, the statistics depend only
on the relative times at which the samples are taken, not on the absolute times.
EXAMPLE 9.4 Representing an i.i.d. process
Consider a DT random process whose values X[n] may be regarded as independently
chosen at each time n from a xed PDF f
X
(x), so the values are independent and
identically distributed, thereby yielding what is called an i.i.d. process. Such pro
cesses are widely used in modeling and simulation. For instance, if a particular
c Alan V. Oppenheim and George C. Verghese, 2010
Section 9.3 Wide-Sense Stationarity 167
DT communication channel corrupts a transmitted signal with added noise that
takes independent values at each time instant, but with characteristics that seem
unchanging over the time window of interest, then the noise may be well modeled
as an i.i.d. process. It is also easy to generate an i.i.d. process in a simulation envi
ronment, provided one can arrange a random-number generator to produce samples
from a specied PDF (and there are several good ways to do this). Processes with
more complicated dependence across time samples can then be obtained by ltering
or other operations on the i.i.d. process, as we shall see in the next chapter.
For such an i.i.d. process, we can write the joint PDF quite simply:
f
X[n1 ],X[n2], ,X[n
]
(x
1
, x
2
, , x
) = f
X
(x
1
)f
X
(x
2
) f
X
(x
) (9.8)
for any choice of and n
1
, , n
X
(t) =
X
(9.9)
R
XX
(t
1
, t
2
) = R
XX
(t
1
+ , t
2
+ ) for every
= R
XX
(t
1
t
2
, 0) . (9.10)
(Note that for a Gaussian process (i.e., a process whose samples are always jointly
Gaussian) WSS implies SSS, because jointly Gaussian variables are entirely deter
mined by the their joint rst and second moments.)
Two random processes X(t) and Y (t) are jointly WSS if their rst and second
moments (including the cross-covariance) are stationary. In this case we use the
notation R
XY
() to denote E[X(t + )Y (t)].
EXAMPLE 9.5 Random Oscillators Revisited
Consider again the harmonic oscillators as introduced in Example 9.1, i.e.
X(t; A, ) = A cos(
0
t + )
where A and are independent random variables, and now
0
is xed at some
known value.
If is actually xed at the constant value
0
, then every outcome is of the form
x(t) = A cos(
0
t +
0
), and it is straightforward to see that this process is not WSS
c Alan V. Oppenheim and George C. Verghese, 2010
,
168 Chapter 9 Random Processes
(and hence not SSS). For instance, if A has a nonzero mean value,
A
= 0, then the
expected value of the process, namely
A
cos(
0
t +
0
), is time varying. To argue
that the process is not WSS even when
A
= 0, we can examine the autocorrelation
function. Note that x(t) is xed at the value 0 for all values of t such that
0
t +
0
is an odd multiple of /2, and takes the values A half-way between such points;
the correlation between such samples taken /
0
apart in time can correspondingly
be 0 (in the former case) or E[A
2
] (in the latter). The process is thus not WSS.
On the other hand, if is distributed uniformly in [, ], then
_
1
X
(t) =
A
cos(
0
t + )d = 0 , (9.11)
2
C
XX
(t
1
, t
2
) = R
XX
(t
1
, t
2
)
= E[A
2
]E[cos(
0
t
1
+ ) cos(
0
t
2
+ )]
E[A
2
]
= cos(
0
(t
2
t
1
)) , (9.12)
2
so the process is WSS. It can also be shown to be SSS, though this is not totally
straightforward to show formally.
To simplify notation for a WSS process, we write the correlation function as
R
XX
(t
1
t
2
); the argument t
1
t
2
is referred to as the lag at which the corre
lation is computed. For the most part, the random processes that we treat will
be WSS processes. When considering just rst and second moments and not en
tire PDFs or CDFs, it will be less important to distinguish between the random
process X(t) and a specic realization x(t) of it so we shall go one step further
in simplifying notation, by using lower case letters to denote the random process
itself. We shall thus talk of the random process x(t), and in the case of a WSS
process denote its mean by
x
and its correlation function Ex(t + )x(t) by
R
xx
(). Correspondingly, for DT well refer to the random process x[n] and (in the
WSS case) denote its mean by
x
and its correlation function Ex[n + m]x[n] by
R
xx
[m].
9.3.1 Some Properties of WSS Correlation and Covariance Functions
It is easily shown that for real-valued WSS processes x(t) and y(t) the correlation
and covariance functions have the following symmetry properties:
R
xx
( ) = R
xx
( ) , C
xx
() = C
xx
( ) (9.13)
R
xy
( ) = R
yx
() , C
xy
() = C
yx
( ) . (9.14)
We see from (9.13) that the autocorrelation and autocovariance have even symme
try. Similar properties hold for DT WSS processes.
Another important property of correlation and covariance functions follows from
noting that the correlation coecient of two random variables has magnitude not
c Alan V. Oppenheim and George C. Verghese, 2010
Section 9.4 Summary of Denitions and Notation 169
exceeding 1. Applying this fact to the samples x(t) and x(t + ) of the random
process x( ) directly leads to the conclusion that
C
xx
(0) C
xx
( ) C
xx
(0) . (9.15)
In other words, the autocovariance function never exceeds in magnitude its value
at the origin. Adding
x
2
to each term above, we nd the following inequality holds
for correlation functions:
R
xx
(0) + 2
x
2
R
xx
() R
xx
(0) . (9.16)
In Chapter 10 we will demonstrate that correlation and covariance functions are
characterized by the property that their Fourier transforms are real and non
negative at all frequencies, because these transforms describe the frequency dis
tribution of the expected power in the random process. The above symmetry con
straints and bounds will then follow as natural consequences, but they are worth
highlighting here already.
9.4 SUMMARY OF DEFINITIONS AND NOTATION
In this section we summarize some of the denitions and notation we have previously
introduced. As in Section 9.3, we shall use lower case letters to denote random
processes, since we will only be dealing with expectations and not densities. Thus,
with x(t) and y(t) denoting (real) random processes, we summarize the following
denitions:
mean : (t)
(9.17)
x
= Ex(t)
autocorrelation : (t
1
, t
2
)
(9.18) R
xx
= Ex(t
1
)x(t
2
)
cross correlation : (t
1
, t
2
)
(9.19) R
xy
= Ex(t
1
)y(t
2
)
autocovariance : (t
1
, t
2
)
(t
1
)][x(t
2
)
x
(t
2
)] C
xx
= E[x(t
1
)
x
= R
xx
(t
1
, t
2
)
x
(t
1
)
x
(t
2
) (9.20)
cross covariance : (t
1
, t
2
)
(t
1
)][y(t
2
)
y
(t
2
)] C
xy
= E[x(t
1
)
x
= R
xy
(t
1
, t
2
)
x
(t
1
)
y
(t
2
) (9.21)
c Alan V. Oppenheim and George C. Verghese, 2010
170 Chapter 9 Random Processes
strict-sense stationary (SSS): all joint statistics for x(t
1
), x(t
2
), . . . , x(t
(T )
k
1 + (1)
k
(T )
k
P(a non-negative even # of sign changes) =
e
T
= e
T
k! 2 k!
k=0 k=0
k even
(9.33)
Using the identity
(T )
k
T
e =
k!
k=0
Alan V. Oppenheim and George C. Verghese, 2010 c
,
172 Chapter 9 Random Processes
equation (9.33) becomes
P(a non-negative even # of sign changes) = e
T
(e
T
+ e
T
)
2
1
= (1 + e
2T
) . (9.34)
2
Similarly, the probability of an odd number of sign changes in an interval of length
T is
1
(1 e
2T
). It follows that
2
P(X(t) = 1) = P(X(t) = 1 X(0) = 1)P(X(0) = 1) [
+ P(X(t) = 1[X(0) = 1)P(X(0) = 1)
1
= P(even # of sign changes in [0, t])
2
1
+ P(odd # of sign changes in [0, t])
2
1
_
1
_
1
_
1
_
1
(1 e
2t
) = (1 + e
2t
) + = . (9.35)
2 2 2 2 2
Note that because of Property I, the expression in the last line of Eqn. (9.35) is not
needed, since the line before that already allows us to conclude that the answer is
1
2
:
since the number of sign changes in any interval must be either even or odd, their
probabilities add up to 1, so P (X(t) = 1) =
1
2
. However, if Property 1 is relaxed to
allow P(X(0) = 1) = p
0
=
2
1
, then the above computation must be carried through
to the last line, and yields the result
(1 e
2t
) P(X(t) = 1) = p
0
(1 + e
2t
) +(1p
0
) =
_
1
_ _
1
_
1
_
1 + (2p
0
1)e
2t
_
.
2 2 2
(9.36)
Returning to the case where Property 1 holds, so P(X(t) = 1), we get
X
(t) = 0, and (9.37)
R
XX
(t
1
, t
2
) = E[X(t
1
)X(t
2
)]
= 1 P(X(t
1
) = X(t
2
)) + (1) P(X(t
1
) =, X(t
2
))
= e
2|t2t1|
. (9.38)
In other words, the process is exponentially correlated and WSS.
9.6 ERGODICITY
The concept of ergodicity is sophisticated and subtle, but the essential idea is de
scribed here. We typically observe the outcome of a random process (e.g., we record
a noise waveform) and want to characterize the statistics of the random process by
measurements on one ensemble member. For instance, we could consider the time-
average of the waveform to represent the mean value of the process (assuming this
c Alan V. Oppenheim and George C. Verghese, 2010
Section 9.7 Linear Estimation of Random Processes 173
mean is constant for all time). We could also construct histograms that represent
the fraction of time (rather than the probability-weighted fraction of the ensemble)
that the waveform lies in dierent amplitude bins, and this could be taken to reect
the probability density across the ensemble of the value obtained at a particular
sampling time. If the random process is such that the behavior of almost every par
ticular realization over time is representative of the behavior down the ensemble,
then the process is called ergodic.
A simple example of a process that is not ergodic is Example 9.2, an ensemble of
batteries. Clearly, for this example, the behavior of any realization is not represen
tative of the behavior down the ensemble.
Narrower notions of ergodicity may be dened. For example, if the time average
1
_
T
x) =
T 2T
T
x(t) dt (9.39) lim
almost always (i.e. for almost every realization or outcome) equals the ensemble
average
X
, then the process is termed ergodic in the mean. It can be shown,
for instance, that a WSS process with nite variance at each instant and with a
covariance function that approaches 0 for large lags is ergodic in the mean. Note
that a (nonstationary) process with time-varying mean cannot be ergodic in the
mean.
In our discussion of random processes, we will primarily be concerned with rst-
and second-order moments of random processes. While it is extremely dicult
to determine in general whether a random process is ergodic, there are criteria
(specied in terms of the moments of the process) that will establish ergodicity
in the mean and in the autocorrelation. Frequently, however, such ergodicity is
simply assumed for convenience, in the absence of evidence that the assumption
is not reasonable. Under this assumption, the mean and autocorrelation can be
obtained from time-averaging on a single ensemble member, through the following
equalities:
1
_
T
Ex(t) = lim x(t)dt (9.40)
T 2T
T
and
1
_
T
Ex(t)x(t + ) = lim x(t)x(t + )dt (9.41)
T 2T
T
A random process for which (9.40) and (9.41) are true is referred as second-order
ergodic.
9.7 LINEAR ESTIMATION OF RANDOM PROCESSES
A common class of problems in a variety of aspects of communication, control and
signal processing involves the estimation of one random process from observations
c Alan V. Oppenheim and George C. Verghese, 2010
174 Chapter 9 Random Processes
of another, or estimating (predicting) future values from the observation of past
values. For example, it is common in communication systems that the signal at the
receiver is a corrupted (e.g., noisy) version of the transmitted signal, and we would
like to estimate the transmitted signal from the received signal. Other examples
lie in predicting weather and nancial data from past observations. We will be
treating this general topic in much more detail in later chapters, but a rst look at
it here can be benecial in understanding random processes.
We shall rst consider a simple example of linear prediction of a random process,
then a more elaborate example of linear FIR ltering of a noise-corrupted process to
estimate the underlying random signal. We conclude the section with some further
discussion of the basic problem of linear estimation of one random variable from
measurements of another.
9.7.1 Linear Prediction
As a simple illustration of linear prediction, consider a discrete-time process x[n].
Knowing the value at time n
0
we may wish to predict what the value will be m
samples into the future, i.e. at time n
0
+ m. We limit the prediction strategy to a
linear one, i.e., with x[n
0
+ m] denoting the predicted value, we restrict x[n
0
+ m]
to be of the form
x[n
0
+ m] = ax[n
0
] + b (9.42)
and choose the prediction parameters a and b to minimize the expected value of
the square of the error, i.e., choose a and b to minimize
= E(x[n
0
+ m] x[n
0
+ m])
2
(9.43)
or
= E(x[n
0
+ m] ax[n
0
] b)
2
. (9.44)
To minimize we set to zero its partial derivative with respect to each of the two
parameters and solve for the parameter values. The resulting equations are
E(x[n
0
+ m] ax[n
0
] b)x[n
0
] = E(x[n
0
+ m] x[n
0
+ m])x[n
0
] = 0
(9.45a)
Ex[n
0
+ m] ax[n
0
] b = Ex[n
0
+ m] x[n
0
+ m] = 0 .
(9.45b)
Equation (9.45a) states that the error x[n
0
+ m] x[n
0
+ m] associated with the
optimal estimate is orthogonal to the available data x[n
0
]. Equation (9.45b) states
that the estimate is unbiased.
Carrying out the multiplications and expectations in the preceding equations results
in the following equations, which can be solved for the desired constants.
R
xx
[n
0
+ m, n
0
] aR
xx
[n
0
, n
0
] b
x
[n
0
] = 0 (9.46a)
x
[n
0
+ m] a
x
[n
0
] b = 0. (9.46b)
c Alan V. Oppenheim and George C. Verghese, 2010
Section 9.7 Linear Estimation of Random Processes 175
If we assume that the process is WSS so that R
xx
[n
0
+m, n
0
] = R
xx
[m], R
xx
[n
0
, n
0
] =
R
xx
[0], and also assume that it is zero mean, (
x
= 0), then equations (9.46) reduce
to
a = R
xx
[m]/R
xx
[0] (9.47)
b = 0 (9.48)
so that
R
xx
[m]
x[n
0
+ m] =
R
xx
[0]
x[n
0
]. (9.49)
If the process is not zero mean, then it is easy to see that
C
xx
[m]
x[n
0
+ m] =
x
+
C
xx
[0]
(x[n
0
]
x
) . (9.50)
An extension of this problem would consider how to do prediction when measure
ments of several past values are available. Rather than pursue this case, we illustrate
next what to do with several measurements in a slightly dierent setting.
9.7.2 Linear FIR Filtering
As another example, which we will treat in more generality in chapter 11 on Wiener
ltering, consider a discrete-time signal s[n] that has been corrupted by additive
noise d[n]. For example, s[n] might be a signal transmitted over a channel and d[n]
the noise introduced by the channel. The received signal r[n] is then
r[n] = s[n] + d[n]. (9.51)
Assume that both s[n] and d[n] are zero-mean random processes and are uncor
related. At the receiver we would like to process r[n] with a causal FIR (nite
impulse response) lter to estimate the transmitted signal s[n].
d[n]
s[n]
s[n]
r[n]
h[n]
FIGURE 9.5 Estimating the noise corrupted signal.
If h[n] is a causal FIR lter of length L, then
L1
s[n] =
h[k]r[n k]. (9.52)
k=0
c Alan V. Oppenheim and George C. Verghese, 2010
176 Chapter 9 Random Processes
We would like to determine the lter coecients h[k] to minimize the mean square
error between s[n] and s[n], i.e., minimize given by
= E(s[n] s[n])
2
L1
= E(s[n]
h[k]r[n k])
2
. (9.53)
k=0
To determine h, we set
h[m]
= 0 for each of the L values of m. Taking this
derivative, we get
= E2(s[n]
h[k]r[n k])r[n m]
h[m]
k
= E2(s[n] s[n])r[n m]
= 0 m = 0, 1, , L 1 (9.54)
which is the orthogonality condition we should be expecting: the error (s[n] s[n])
associated with the optimal estimate is orthogonal to the available data, r[n m].
Carrying out the multiplications in the above equations and taking expectations
results in
L1
h[k]R
rr
[m k] = R
sr
[m] , m = 0, 1, , L 1 (9.55)
k=0
Eqns. (9.55) constitute L equations that can be solved for the L parameters h[k].
With r[n] = s[n] + d[n], it is straightforward to show that R
sr
[m] = R
ss
[m] +
R
sd
[m] and since we assumed that s[n] and d[n] are uncorrelated, then R
sd
[m] = 0.
Similarly, R
rr
[m] = R
ss
[m] + R
dd
[m].
These results are also easily modied for the case where the processes no longer
have zero mean.
9.8 THE EFFECT OF LTI SYSTEMS ON WSS PROCESSES
Your prior background in signals and systems, and in the earlier chapters of these
notes, has characterized how LTI systems aect the input for deterministic signals.
We will see in later chapters how the correlation properties of a random process,
and the eects of LTI systems on these properties, play an important role in under
standing and designing systems for such tasks as ltering, signal detection, signal
estimation and system identication. We focus in this section on understanding
in the time domain how LTI systems shape the correlation properties of a random
process. In Chapter 10 we develop a parallel picture in the frequency domain, af
ter establishing that the frequency distribution of the expected power in a random
signal is described by the Fourier transform of the autocorrelation function.
Consider an LTI system whose input is a sample function of a WSS random process
x(t), i.e., a signal chosen by a probabilistic experiment from the ensemble that con
stitutes the random process x(t); more simply, we say that the input is the random
c Alan V. Oppenheim and George C. Verghese, 2010
Section 9.8 The Eect of LTI Systems on WSS Processes 177
process x(t). The WSS input is characterized by its mean and its autocovariance
or (equivalently) autocorrelation function.
Among other considerations, we are interested in knowing when the output process
y(t) i.e., the ensemble of signals obtained as responses to the signals in the input
ensemble will itself be WSS, and want to determine its mean and autocovariance
or autocorrelation functions, as well as its cross-correlation with the input process.
For an LTI system whose impulse response is h(t), the output y(t) is given by the
convolution
_
+
_
+
y(t) = h(v)x(t v)dv = x(v)h(t v)dv (9.56)
for any specic input x(t) for which the convolution is well-dened. The convolution
is well-dened if, for instance, the input x(t) is bounded and the system is bounded-
input bounded-output (BIBO) stable, i.e. h(t) is absolutely integrable. Figure 9.6
indicates what the two components of the integrand in the convolution integral may
look like.
x(v)
v
h(t - v)
t v
FIGURE 9.6 Illustration of the two terms in the integrand of Eqn. (9.56)
Rather than requiring that every sample function of our input process be bounded,
it will suce for our convolution computations below to assume that E[x
2
(t)] =
R
xx
(0) is nite. With this assumption, and also assuming that the system is BIBO
stable, we ensure that y(t) is a well-dened random process, and that the formal
manipulations we carry out below for instance, interchanging expectation and
convolution can all be justied more rigorously by methods that are beyond
our scope here. In fact, the results we obtain can also be applied, when properly
interpreted, to cases where the input process does not have a bounded second
moment, e.g., when x(t) is so-called CT white noise, for which R
xx
( ) = ( ). The
results can also be applied to a system that is not BIBO stable, as long as it has a
well-dened frequency response H(j), as in the case of an ideal lowpass lter, for
example.
We can use the convolution relationship (9.56) to deduce the rst- and second-
order properties of y(t). What we shall establish is that y(t) is itself WSS, and that
Alan V. Oppenheim and George C. Verghese, 2010 c
178 Chapter 9 Random Processes
x(t) and y(t) are in fact jointly WSS. We will also develop relationships for the
autocorrelation of the output and the cross-correlation between input and output.
First, consider the mean value of the output. Taking the expected value of both
sides of (9.56), we nd
__
+
_
E[y(t)] = E h(v)x(t v) dv
_
+
= h(v)E[x(t v)] dv
_
+
= h(v)
x
dv
_
+
=
x
h(v) dv
= H(j0)
x
=
y
. (9.57)
In other words, the mean of the output process is constant, and equals the mean of
the input scaled by the the DC gain of the system. This is also what the response
of the system would be if its input were held constant at the value
x
.
The preceding result and the linearity of the system also allow us to conclude that
applying the zero-mean WSS process x(t)
x
to the input of the stable LTI system
would result in the zero-mean process y(t)
y
at the output. This fact will be
useful below in converting results that are derived for correlation functions into
results that hold for covariance functions.
Next consider the cross-correlation between output and input:
__ _
+
_ _
Ey(t + )x(t) = E h(v)x(t + v)dv x(t)
_
+
= h(v)Ex(t + v)x(t)dv . (9.58)
Since x(t) is WSS, Ex(t + v)x(t) = R
xx
( v), so
_
+
Ey(t + )x(t) = h(v)R
xx
( v)dv
= h( ) R
xx
()
= R
yx
( ) . (9.59)
Note that the cross-correlation depends only on the lag between the sampling
instants of the output and input processes, not on both and the absolute time
location t. Also, this cross-correlation between the output and input is determinis
tically related to the autocorrelation of the input, and can be viewed as the signal
that would result if the system input were the autocorrelation function, as indicated
in Figure 9.7.
c Alan V. Oppenheim and George C. Verghese, 2010
Section 9.8 The Eect of LTI Systems on WSS Processes 179
R
yx
() R
xx
()
h()
FIGURE 9.7 Representation of Eqn. (9.59)
We can also conclude that
R
xy
() = R
yx
() = R
xx
() h() = R
xx
( ) h() , (9.60)
where the second equality follows from Eqn. (9.59) and the fact that time-reversing
the two functions in a convolution results in time-reversal of the result, while the
last equality follows from the symmetry Eqn. (9.13) of the autocorrelation function.
The above relations can also be expressed in terms of covariance functions, rather
than in terms of correlation functions. For this, simply consider the case where the
input to the system is the zero-mean WSS process x(t)
x
, with corresponding
zero-mean output y(t)
y
. Since the correlation function for x(t)
x
is the same
as the covariance function for x(t), i.e., since
R
xx ,xx
() = C
xx
() , (9.61)
the results above hold unchanged when every correlation function is replaced by
the corresponding covariance function. We therefore have, for instance, that
C
yx
() = h( ) C
xx
() (9.62)
Next we consider the autocorrelation of the output y(t):
__ _
+
_ _
Ey(t + )y(t) = E h(v)x(t + v)dv y(t)
_
+
= h(v) Ex(t + v)y(t) dv
. .
Rxy (v)
_
+
= h(v)R
xy
( v)dv
= h( ) R
xy
( )
= R
yy
() . (9.63)
Note that the autocorrelation of the output depends only on , and not on both
and t. Putting this together with the earlier results, we conclude that x(t) and
y(t) are jointly WSS, as claimed.
Alan V. Oppenheim and George C. Verghese, 2010 c
. .
. .
180 Chapter 9 Random Processes
The corresponding result for covariances is
C
yy
() = h() C
xy
( ) . (9.64)
Combining (9.63) with (9.60), we nd that
R
yy
( ) = R
xx
() h() h() = R
xx
( ) R
hh
() . (9.65)
h()h()=R
hh
( )
The function R
hh
() is typically referred to as the deterministic autocorrelation
function of h(t), and is given by
_
+
R
hh
( ) = h( ) h( ) = h(t + )h(t)dt . (9.66)
For the covariance function version of (9.65), we have
C
yy
( ) = C
xx
() h() h() = C
xx
() R
hh
() . (9.67)
h()h()=R
hh
( )
Note that the deterministic correlation function of h(t) is still what we use, even
when relating the covariances of the input and output. Only the means of the input
and output processes get adjusted in arriving at the present result; the impulse
response is untouched.
The correlation relations in Eqns. (9.59), (9.60), (9.63) and (9.65), as well as
their covariance counterparts, are very powerful, and we will make considerable
use of them. Of equal importance are their statements in the Fourier and Laplace
transform domains. Denoting the Fourier and Laplace transforms of the correlation
function R
xx
() by S
xx
(j) and S
xx
(s) respectively, and similarly for the other
correlation functions of interest, we have:
S
yx
(j) = S
xx
(j)H(j), S
yy
(j) = S
xx
(j)[H(j)[
2
,
S
yx
(s) = S
xx
(s)H(s), S
yy
(s) = S
xx
(s)H(s)H(s) . (9.68)
We can denote the Fourier and Laplace transforms of the covariance function C
xx
()
by D
xx
(j) and D
xx
(s) respectively, and similarly for the other covariance functions
of interest, and then write the same sorts of relationships as above.
Exactly parallel results hold in the DT case. Consider a stable discrete-time LTI
system whose impulse response is h[n] and whose input is the WSS random process
x[n]. Then, as in the continuous-time case, we can conclude that the output process
y[n] is jointly WSS with the input process x[n], and
y
=
x
h[n] (9.69)
R
yx
[m] = h[m] R
xx
[m] (9.70)
R
yy
[m] = R
xx
[m] R
hh
[m] , (9.71)
c Alan V. Oppenheim and George C. Verghese, 2010
Section 9.8 The Eect of LTI Systems on WSS Processes 181
where R
hh
[m] is the deterministic autocorrelation function of h[m], dened as
+
R
hh
[m] =
h[n + m]h[n] . (9.72)
n=
The corresponding Fourier and Z-transform statements of these relationships are:
y
= H(e
j0
)
x
, S
yx
(e
j
) = S
xx
(e
j
)H(e
j
) , S
yy
(e
j
) = S
xx
(e
j
)[H(e
j
)[
2
,
y
= H(1)
x
, S
yx
(z) = S
xx
(z)H(z) , S
yy
(z) = S
xx
(z)H(z)H(1/z).
(9.73)
All of these expressions can also be rewritten for covariances and their transforms.
The basic relationships that we have developed so far in this chapter are extremely
powerful. In Chapter 10 we will use these relationships to show that the Fourier
transform of the autocorrelation function describes how the expected power of a
WSS process is distributed in frequency. For this reason, the Fourier transform of
the autocorrelation function is termed the power spectral density (PSD) of the
process.
The relationships developed in this chapter are also very important in using random
processes to measure or identify the impulse response of an LTI system. For exam
ple, from (9.70), if the input x[n] to a DT LTI system is a WSS random process with
autocorrelation function R
xx
[m] = [m], then by measuring the cross-correlation
between the input and output we obtain a measurement of the system impulse re
sponse. It is easy to construct an input process with autocorrelation function [m],
for example an i.i.d. process that is equally likely to take the values +1 and 1 at
each time instant.
As another example, suppose the input x(t) to a CT LTI system is a random
telegraph wave, with changes in sign at times that correspond to the arrivals in a
Poisson process with rate , i.e.,
(T )
k
e
T
P(k switches in an interval of length T ) = . (9.74)
k!
Then, assuming x(0) takes the values 1 with equal probabilities, we can determine
that the process x(t) has zero mean and correlation function R
xx
( ) = e
2| |
, so
it is WSS (for t 0). If we determine the cross-correlation R
yx
() with the output
y(t) and then use the relation
R
yx
() = R
xx
() h() , (9.75)
we can obtain the system impulse response h(). For example, if S
yx
(s), S
xx
(s) and
H(s) denote the associated Laplace transforms, then
S
yx
(s)
H(s) = . (9.76)
S
xx
(s)
Note that S
xx
(s) is a rather well-behaved function of the complex variable s in this
case, whereas any particular sample function of the process x(t) would not have
such a well-behaved transform. The same comment applies to S
yx
(s).
c Alan V. Oppenheim and George C. Verghese, 2010
182 Chapter 9 Random Processes
As a third example, suppose that we know the autocorrelation function R
xx
[m]
of the input x[n] to a DT LTI system, but do not have access to x[n] and there
fore cannot determine the cross-correlation R
yx
[m] with the output y[n], but can
determine the output autocorrelation R
yy
[m]. For example, if
R
xx
[m] = [m] (9.77)
and we determine R
yy
[m] to be R
yy
[m] =
_
2
1
_
|m|
, then
_
1
_
|m|
R
yy
[m] = = R
hh
[m] = h[m] h[m]. (9.78)
2
Equivalently, H(z)H(z
1
) can be obtained from the Z-transform S
yy
(z) of R
yy
[m].
Additional assumptions or constraints, for instance on the stability and causality
of the system and its inverse, may allow one to recover H(z) from knowledge of
H(z)H(z
1
).
Alan V. Oppenheim and George C. Verghese, 2010 c
MIT OpenCourseWare
http://ocw.mit.edu
6.011 Introduction to Communication, Control, and Signal Processing
Spring 2010
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.