PTSP - Lecture Notes - 2019 Modified Syllabus
NOTES ON
Mrs. G.Ajitha
(Assistant professor)
MODULE-I
Introduction
It is remarkable that a science which began with the consideration of games of chance
should have become the most important object of human knowledge.
A brief history
Probability has an amazing history. A practical gambling problem faced by the French
nobleman Chevalier de Méré sparked the idea of probability in the mind of Blaise Pascal (1623-
1662), the famous French mathematician. Pascal's correspondence with Pierre de Fermat (1601-
1665), another French Mathematician in the form of seven letters in 1654 is regarded as the
genesis of probability. Early mathematicians like Jacob Bernoulli (1654-1705), Abraham de
Moivre (1667-1754), Thomas Bayes (1702-1761) and Pierre Simon De Laplace (1749-1827)
contributed to the development of probability. Laplace's Théorie Analytique des Probabilités
gave comprehensive tools to calculate probabilities based on the principles of permutations and
combinations. Laplace also said, "Probability theory is nothing but common sense reduced to
calculation."
Later mathematicians like Chebyshev (1821-1894), Markov (1856-1922), von Mises (1883-
1953), Norbert Wiener (1894-1964) and Kolmogorov (1903-1987) contributed to new
developments. Over the last four and a half centuries, probability has grown to be one of the
most essential mathematical tools applied in diverse fields like economics, commerce, physical
sciences, biological sciences and engineering. It is particularly important for solving practical
electrical-engineering problems in communication, signal processing and computers.
Notwithstanding the above developments, a precise definition of probability eluded the
mathematicians for centuries. Kolmogorov in 1933 gave the axiomatic definition of probability
and resolved the problem.
For example, thermal noise appearing in an electronic device is generated by the random
motion of electrons. We have deterministic models for weather prediction that take into account
the factors affecting weather, and we can locally predict the temperature or the rainfall of a place
on the basis of previous data. Probabilistic models are established from observation of a random
phenomenon. While probability is concerned with the analysis of a random phenomenon, statistics
helps in building such models from data.
Deterministic versus probabilistic models
A deterministic model can be used for a physical quantity and the process generating it
provided sufficient information is available about the initial state and the dynamics of the
process generating the physical quantity. For example,
We can determine the position of a particle moving under a constant force if we know
the initial position of the particle and the magnitude and the direction of the force.
Many of the physical quantities are random in the sense that these quantities cannot be
predicted with certainty and can be described in terms of probabilistic models only. For
example,
The outcome of the tossing of a coin cannot be predicted with certainty. Thus the
outcome of tossing a coin is random.
The number of ones and zeros in a packet of binary data arriving through a
communication channel cannot be precisely predicted and is therefore random.
The ubiquitous noise corrupting the signal during acquisition, storage and transmission
can be modelled only through statistical analysis.
A digital signal is defined at discrete points and also takes a discrete set of
values.
As an example, consider the case of an analog-to-digital (AD) converter. The input to the AD
converter is an analog signal while the output is a digital signal obtained by taking the samples
of the analog signal at periodic intervals of time and approximating the sampled values by a
discrete set of values.
Random Signal
Many of the signals encountered in practice behave randomly in part or as a whole in the
sense that they cannot be explicitly described by deterministic mathematical functions such as a
sinusoid or an exponential function. Randomness arises because of the random nature of the
generation mechanism. Sometimes, limited understanding of the signal dynamics also
necessitates the randomness assumption. In electrical engineering we encounter many signals
that are random in nature. Some examples of random signals are:
i. Radar signal: Signals are sent out and get reflected by targets. The reflected signals
are received and used to locate the target and target distance from the receiver. The
received signals are highly noisy and demand statistical techniques for processing.
ii. Sonar signal: Sound signals are sent out and then the echoes generated by some
targets are received back. The goal of processing the signal is to estimate the location of
the target.
iii. Speech signal: A time-varying voltage waveform is produced by the speaker speaking
over a microphone of a telephone. This signal can be modeled as a random signal.
A sample of the speech signal is shown in Figure 1.
iv. Biomedical signals: Signals produced by biomedical measuring devices like ECG,
EEG, etc., can display specific behavior of vital organs like heart and brain. Statistical
signal processing can predict changes in the waveform patterns of these signals to detect
abnormality. A sample of ECG signal is shown in Figure 2.
v. Communication signals: The signal received by a communication receiver is generally
corrupted by noise. The transmitted signal may be digital data such as video or speech, and
the channel may be electric conductors, optical fibre or free space. The signal is
modified by the channel and corrupted by unwanted disturbances in different stages,
collectively referred to as noise.
These signals can be described with the help of probability and other concepts in
statistics. Particularly the signal under observation is considered as a realization of a random
process or a stochastic process. The terms random processes, stochastic processes and random
signals are used synonymously.
Typical signal-processing operations performed on such signals include:
o Amplification
o Filtering
These operations are performed by passing the input signal to a system that performs the
processing. For example, filtering involves selectively emphasising certain frequency
components and attenuating others. In low-pass filtering illustrated in Fig.4, high-frequency
components are attenuated
A problem frequently encountered in signal processing is the estimation of the true value
of the signal from the received noisy data. Consider the received noisy signal given by
Simple frequency-selective filters cannot be applied here, because random noise cannot
be localized to any spectral band and does not have a specific spectral pattern. We have to
dissociate the noise from the signal in the probabilistic sense. Optimal filters like the
Wiener filter, adaptive filters and the Kalman filter deal with this problem.
In estimation, we try to find a value that is close enough to the transmitted signal. The
process is explained in Figure 6. Detection is a related process that decides the best choice out
of a finite number of possible values of the transmitted signal with minimum error probability.
In binary communication, for example, the receiver has to decide about 'zero' and 'one' on the
basis of the received waveform. Signal detection theory, also known as decision theory, is based
on hypothesis testing and other related techniques and widely applied in pattern classification,
target detection etc.
One of the major areas of application of probability theory is information theory and
coding. In 1948 Claude Shannon published the paper "A mathematical theory of
communication", which laid the foundation of modern digital communication. Following are
two remarkable results stated in simple language:
Digital data can be represented efficiently by assigning to each symbol a number of bits
determined by its probability of occurrence.
The data at a rate smaller than the channel capacity can be transmitted over a noisy
channel with arbitrarily small probability of error. The channel capacity again is
determined from the probabilistic descriptions of the signal and the noise.
Set
A set is a well defined collection of objects. These objects are called elements or
members of the set. Usually uppercase letters are used to denote sets.
Probability Concepts
2. Sample Space: The sample space S is the collection of all possible outcomes of a
random experiment. The elements of S are called sample points.
3. Event: An event A is a subset of the sample space S such that a probability can be assigned
to it.
Figure 1
Consider the following examples.
The possible outcomes are H (head) and T (tail). The associated sample space is S = {H, T}.
It is a finite sample space. The events associated with this sample space are {H}, {T}, the empty set and S itself.
The associated finite sample space is .Some events are
And so on.
We may have to toss the coin any number of times before a head is obtained. Thus the possible
outcomes are:
H, TH, TTH, TTTH, ...
How many outcomes are there? The outcomes are countable but infinite in number. The
countably infinite sample space is S = {H, TH, TTH, TTTH, ...}.
Definition of probability
Consider a random experiment with a finite number of outcomes. If all the outcomes of the
experiment are equally likely, the probability of an event A is defined by
P(A) = N_A / N
where N_A is the number of outcomes favourable to A and N is the total number of outcomes.
Example 6 A fair die is rolled once. What is the probability of getting a '6'?
Here N = 6 and N_A = 1, so P('6') = 1/6.
Example 7 A fair coin is tossed twice. What is the probability of getting two 'heads'?
Here S = {HH, HT, TH, TT} and A = {HH}.
Total number of outcomes is 4 and all four outcomes are equally likely, so P(A) = 1/4.
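Counting arguments of this kind are easy to check by direct enumeration. The short Python sketch below is an added illustration (not part of the original notes); it lists the sample space of two coin tosses and applies the classical definition P(A) = N_A / N.

    from itertools import product

    # Sample space of two tosses of a fair coin: all ordered pairs of H and T
    sample_space = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

    # Event A: both tosses show heads
    favourable = [s for s in sample_space if s == ("H", "H")]

    # Classical definition: P(A) = N_A / N
    p_two_heads = len(favourable) / len(sample_space)
    print(p_two_heads)   # 0.25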
Discussion
The classical definition is limited to a random experiment which has only a finite
number of outcomes. In many experiments like those in the above examples, the sample
space is finite and each outcome may be assumed 'equally likely.' In such cases, the
counting method can be used to compute probabilities of events.
Consider the experiment of tossing a fair coin until a 'head' appears. As we have
discussed earlier, there are countably infinite outcomes. Can you believe that all these
outcomes are equally likely?
The notion of equally likely is important here. Equally likely means equally probable.
Thus this definition presupposes that all outcomes occur with equal probability; the
definition is therefore circular, since it uses the very concept it sets out to define.
If an experiment is repeated n times under similar conditions and the event A occurs n_A times,
then P(A) = lim (n → ∞) n_A / n.
Example 8 Suppose a die is rolled 500 times. The following table shows the frequency of each
face.
We see that the relative frequencies are close to 1/6. How do we ascertain that these relative
frequencies settle down to a limit?
Discussion This definition is also inadequate from the theoretical point of view.
We cannot repeat an experiment an infinite number of times.
How do we ascertain that the above ratio will converge for all possible
sequences of outcomes of the experiment?
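The relative-frequency idea can also be illustrated by simulation. The following sketch is an added illustration (the seed and the number of rolls are arbitrary choices): it rolls a fair die 500 times, as in Example 8, and prints the relative frequency of each face, which comes out close to 1/6.

    import random
    from collections import Counter

    random.seed(1)
    n_rolls = 500                                  # same count as in Example 8
    rolls = [random.randint(1, 6) for _ in range(n_rolls)]

    counts = Counter(rolls)
    for face in range(1, 7):
        # relative frequency n_A / n of each face; close to 1/6 ~ 0.167
        print(face, counts[face] / n_rolls)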
Definition Let S be a sample space and F a sigma field defined over it. Let P be a
mapping from the sigma algebra F into the real line such that for each A in F there exists a
unique P(A). Clearly P is a set function, and it is called a probability if it satisfies the
following three axioms:
Axiom 1: P(A) ≥ 0 for every event A.
Axiom 2: P(S) = 1.
Axiom 3: If A1, A2, A3, ... are mutually exclusive (disjoint) events, then P(A1 ∪ A2 ∪ ...) = P(A1) + P(A2) + ...
Figure 2
Discussion
Any assignment of probability must satisfy the above three axioms.
If ,
This is a special case of axiom 3 and for a discrete sample space , this simpler version
may be considered as the axiom 3. We shall give a proof of this result below.
1.
Suppose,
Then
Therefore
3. where where
We have,
4. If
We have,
6. We can apply the properties of sets to establish the following result for
,
Consider a finite sample space S = {s1, s2, ..., sn}. Then the sigma algebra is defined by the power set
of S. For any elementary event {si}, we can assign a probability P(si) such that the probabilities add up to 1.
In a special case, when the outcomes are equi-probable, we can assign equal probability
p = 1/n to each elementary event.
Suppose A1, A2, ..., A6 represent the elementary events of the fair-die experiment. Thus A1 is the event of getting '1',
A2 is the event of getting '2' and so on.
Since all six disjoint events are equiprobable and their probabilities add to 1, we get P(Ai) = 1/6.
Example 10 Consider the experiment of tossing a fair coin until a head is obtained discussed in
Example 3. Here . Let us call
Suppose the sample space S is continuous and uncountable. Such a sample space arises
when the outcomes of an experiment are numbers. For example, such a sample space occurs
when the experiment consists of measuring the voltage, the current or the resistance. In such a
case, the sigma algebra consists of the Borel sets on the real line.
Then for
In many applications we have to deal with a finite sample space and the elementary
events formed by single elements of the set may be assumed equiprobable. In this case, we can
define the probability of the event A according to the classical definition discussed earlier:
where n_A is the number of elements favourable to A and n is the total number of elements in the
sample space S.
Thus calculation of probability involves finding the number of elements in the sample
space and in the event A. Combinatorial rules give us quick algebraic formulae to find the
number of elements in these sets. We briefly outline some of these rules:
1. Product rule: Suppose we have a set A with m distinct elements and a set B with n
distinct elements. Then the Cartesian product A × B contains mn distinct ordered pairs; in other words, a choice that can be made in m ways followed by a choice that can be made in n ways can be made in mn ways.
Example 1 A fair die is thrown twice. What is the probability that a 3 will appear at least
once.
Solution: The sample space corresponding to two throws of the die is illustrated in the
following table. Clearly, the sample space has 6 × 6 = 36 elements by the product rule. The
event corresponding to getting at least one 3 is highlighted and contains 11 elements. Therefore,
the required probability is 11/36.
Example 2 Birthday problem - Given a class of n students, what is the probability of two
students in the class having the same birthday? Plot this probability vs. the number of students and
be surprised!
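A sketch of the computation follows (assuming 365 equally likely birthdays and ignoring leap years): the probability that all n birthdays are distinct is the product (365/365)(364/365)...((365 - n + 1)/365), and the required probability is its complement.

    # Probability that at least two of n students share a birthday
    # (365 equally likely birthdays assumed, leap years ignored)
    def p_shared_birthday(n):
        p_all_distinct = 1.0
        for i in range(n):
            p_all_distinct *= (365 - i) / 365
        return 1.0 - p_all_distinct

    for n in (10, 23, 40, 60):
        print(n, round(p_shared_birthday(n), 4))
    # With only 23 students the probability already exceeds 0.5.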
Example 3 An urn contains 6 red balls, 5 green balls and 4 blue balls. 9 balls were picked at
random from the urn without replacement. What is the probability that out of the balls 4 are red,
3 are green and 2 are blue?
Solution :
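The worked solution is not reproduced in these notes. As a sketch, assuming every 9-ball subset of the 15 balls is equally likely when drawing without replacement, the probability is C(6,4) C(5,3) C(4,2) / C(15,9):

    from math import comb

    # 6 red, 5 green, 4 blue; draw 9 without replacement.
    # Favourable selections: choose 4 of 6 red, 3 of 5 green, 2 of 4 blue.
    favourable = comb(6, 4) * comb(5, 3) * comb(4, 2)
    total = comb(15, 9)                 # all equally likely 9-ball selections
    print(favourable / total)           # approximately 0.18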
Example 4 What is the probability that in a throw of 12 dice each face occurs twice.
Solution: The total number of elements in the sample space of the outcomes of a single
throw of 12 dice is
The number of favourable outcomes is the number of ways in which 12 dice can be
arranged in six groups of size 2 each – group 1 consisting of two dice each showing 1, group 2
consisting of two dice each showing 2 and so on.
Therefore, the total number of distinct favourable arrangements is 12!/(2!)^6, and the required probability is 12!/((2!)^6 · 6^12).
Consider the probability space (S, F, P). Let A and B be two events in F. We ask the
following question –
Given that A has occurred, what is the probability of B?
Let us consider the case of equiprobable events discussed earlier. Let sample points
be favourable for the joint event .
Figure 1
Clearly ,
This concept suggests how to define conditional probability. The probability of an event B under
the condition that another event A has occurred is called the conditional probability of B given
A and is defined by
P(B | A) = P(A ∩ B) / P(A), provided P(A) > 0.
We can similarly define the conditional probability of A given B , denoted by .
Example 2 A family has two children. It is known that at least one of the children is a girl.
What is the probability that both the children are girls?
Clearly, the sample space is {GG, GB, BG, BB} with equally likely outcomes. Given that at least one
child is a girl, the conditioning event is {GG, GB, BG}, so P(both girls | at least one girl) = 1/3.
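The answer can be verified by enumeration, as in the sketch below (an added illustration assuming the four birth orders are equally likely):

    from itertools import product

    # Equally likely families with two children, listed by birth order
    families = list(product("GB", repeat=2))          # GG, GB, BG, BB

    at_least_one_girl = [f for f in families if "G" in f]
    both_girls = [f for f in at_least_one_girl if f == ("G", "G")]

    # P(both girls | at least one girl) = P(A and B) / P(B)
    print(len(both_girls) / len(at_least_one_girl))   # 1/3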
In the following we show that the conditional probability satisfies the axioms of
probability.
By definition
Axiom 1:
Axiom 2 :
We have ,
Axiom 3 :
We have ,
Figure 2
If , then
We have ,
We have ,
Joint probability is not the same as conditional probability, though the two concepts are
often confused. Conditional probability assumes that one event has taken place or will take
place, and then asks for the probability of the other (A, given B). Joint probability does not have
such conditions; it simply asks for the chances of both happening (A and B). In a problem, to
help distinguish between the two, look for qualifiers that one event is conditional on the other
(conditional) or whether they will happen concurrently (joint).
Probability definitions can find their way into CFA exam questions. Naturally, there may also
be questions that test the ability to calculate joint probabilities. Such computations require use
of the multiplication rule, which states that the joint probability of A and B is the product of the
conditional probability of A given B, times the probability of B. In probability notation:
Given a conditional probability P(A | B) = 40%, and a probability of B = 60%, the joint
probability P(AB) = 0.6*0.4 or 24%, found by applying the multiplication rule.
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Moreover, the rule generalizes for more than two events provided they are all independent of
one another, so the joint probability of three events is P(ABC) = P(A) · P(B) · P(C), again
assuming independence.
Total Probability
If the events A1, A2, ..., An form a partition of the sample space S, then for any event B,
P(B) = P(B | A1) P(A1) + P(B | A2) P(A2) + ... + P(B | An) P(An).
Remark
(1) A decomposition of a set S into 2 or more disjoint nonempty subsets is called a partition
of S. The subsets A1, A2, ..., An form a partition of S if they are pairwise disjoint and their union is S.
(2) The theorem of total probability can be used to determine the probability of a complex
event in terms of related simpler events. This result will be used in Bayes' theorem, to be
discussed at the end of the lecture.
Example 3 Suppose a box contains 2 white and 3 black balls. Two balls are picked at
random without replacement.
Clearly and form a partition of the sample space corresponding to picking two
balls from the box.
This result is known as Bayes' theorem. The probability P(Ai) is called the a priori
probability and P(Ai | B) is called the a posteriori probability. Thus Bayes' theorem enables
us to determine the a posteriori probability from the observation that B has occurred.
This result is of practical importance and is the heart of Bayesian classification, Bayesian
estimation etc.
Example 6
In a binary communication system a zero and a one are transmitted with probabilities 0.6
and 0.4 respectively. Due to error in the communication system a zero becomes a one with
probability 0.1 and a one becomes a zero with probability 0.08. Determine the probability (i)
of receiving a one and (ii) that a one was transmitted when the received message is a one.
Given P(T0) = 0.6, P(T1) = 0.4, P(R1 | T0) = 0.1 and P(R0 | T1) = 0.08.
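A numerical sketch of the total-probability and Bayes computations for this example is given below; the variable names (p_t0, p_r1_given_t0, etc.) are introduced here only for illustration.

    # Prior probabilities of transmitting 0 and 1
    p_t0, p_t1 = 0.6, 0.4
    # Channel error probabilities
    p_r1_given_t0 = 0.10          # a transmitted 0 is received as 1
    p_r0_given_t1 = 0.08          # a transmitted 1 is received as 0
    p_r1_given_t1 = 1 - p_r0_given_t1

    # (i) Total probability of receiving a one
    p_r1 = p_t0 * p_r1_given_t0 + p_t1 * p_r1_given_t1
    # (ii) Bayes' theorem: probability a one was transmitted given a one is received
    p_t1_given_r1 = p_t1 * p_r1_given_t1 / p_r1

    print(p_r1)            # 0.428
    print(p_t1_given_r1)   # about 0.86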
Example 7: In an electronics laboratory, there are identically looking capacitors of three makes
in the ratio 2:3:4. It is known that 1% of one make and 1.5% of another are
defective. What percentage of capacitors in the laboratory are defective? If a capacitor picked
at random is found to be defective, what is the probability that it is of a particular make?
Let D be the event that the item is defective. Here we have to find the a posteriori probability of the make given that the capacitor is defective.
Here
Independent events
Two events are called independent if the probability of occurrence of one event does not
affect the probability of occurrence of the other. Thus the events A and B are independent if
P(B | A) = P(B) and P(A | B) = P(A),
where P(A) and P(B) are assumed to be non-zero.
Equivalently, A and B are independent if and only if P(A ∩ B) = P(A) P(B).
Two events A and B are called statistically dependent if they are not independent. Similarly, we
can define the independence of n events. The events A1, A2, ..., An are called independent if
and only if the probability of the intersection of every sub-collection of them equals the product of the corresponding probabilities.
Example 4 Consider the example of tossing a fair coin twice. The resulting sample space is
given by S = {HH, HT, TH, TT} and all the outcomes are equiprobable.
Let A be the event of getting 'tail' in the first toss and B be the
event of getting 'head' in the second toss. Then
A = {TH, TT} and B = {HH, TH}, so P(A) = P(B) = 1/2.
Again A ∩ B = {TH}, so that P(A ∩ B) = 1/4 = P(A) P(B), and A and B are independent.
Example 5 Consider the experiment of picking two balls at random discussed in the above
example.
In this case, the product of the individual probabilities does not equal the joint probability.
Therefore, P(A ∩ B) ≠ P(A) P(B), and A and B are dependent.
RANDOM VARIABLE
INTRODUCTION
In application of probabilities, we are often concerned with numerical values which are
random in nature. For example, we may consider the number of customers arriving at a service
station at a particular interval of time or the transmission time of a message in a communication
system. These random quantities may be considered as real-valued function on the sample
space. Such a real-valued function is called real random variable and plays an important role in
describing random data. We shall introduce the concept of random variables in the following
sections.
Random variable
A random variable associates the points in the sample space with real numbers.
Consider the probability space and function mapping the sample space
The function is called a random variable if the inverse image of all Borel sets
under is an event. Thus, if is a random variable, then
Figure: Random Variable
Observations:
The sample space S is the domain of X.
The range of X, denoted by R_X, is given by R_X = {X(s) : s ∈ S}.
Clearly R_X is a subset of the set of real numbers.
• The above definition of the random variable requires that the mapping X is such that
{s : X(s) ≤ x} is a valid event in S. If S is a discrete sample space, this requirement is met
by any mapping X : S → R. Thus any mapping defined on the discrete sample space is
a random variable.
Example 2 Consider the example of tossing a fair coin twice. The sample space is S={
HH,HT,TH,TT} and all four outcomes are equally likely. Then we can define a random variable
as follows
Here .
Example 3 Consider the sample space associated with the single toss of a fair die. The
sample space is given by S = {1, 2, 3, 4, 5, 6}.
If we define the random variable X that associates a real number equal to the number on
the face of the die, then X(s) = s and the range of X is {1, 2, 3, 4, 5, 6}.
Discrete, Continuous and Mixed-type Random Variables
Consider the Borel set (-∞, x], where x represents any real number. The equivalent
event X^(-1)((-∞, x]) = {s : X(s) ≤ x} is denoted as {X ≤ x}. The event {X ≤ x} can be taken
as a representative event in studying the probability description of a random variable X. Any
other event can be represented in terms of this event. For example,
{X > x} = complement of {X ≤ x}, {x1 < X ≤ x2} = {X ≤ x2} - {X ≤ x1}, and so on.
Figure 5
0 ≤ F_X(x) ≤ 1. This follows from the fact that F_X(x) is a probability and its value should lie between 0
and 1.
F_X(x) is a non-decreasing function of x. Thus, if x1 < x2, then F_X(x1) ≤ F_X(x2).
F_X(x) is right continuous.
.
We have ,
We can further establish the following results on probability of events on the real line:
Thus we have seen that given F_X(x), we can determine the probability of any
event involving values of the random variable X. Thus F_X(x) is a complete description
of the random variable X.
Find a) .
b) .
c) .
d) .
Solution:
Figure 6 shows the plot of FX(x).
Figure 6
The discrete random variable in this case is completely specified by the probability mass
function (pmf) .
Clearly,
• Suppose .Then
Example 1
Interpretation of the pdf: f_X(x) = dF_X(x)/dx, so that
f_X(x) ≥ 0.
This follows from the fact that F_X(x) is a non-decreasing function of x.
Figure 8 below illustrates the probability of an elementary interval in terms of the pdf.
Example 2 Consider the random variable with the distribution function
Remark: Using the Dirac delta function we can define the density function for a discrete
random variable.
Consider the random variable X defined by the probability mass function (pmf) p_X(xi), i = 1, 2, ...
Then the density function can be written in terms of the Dirac delta function as
f_X(x) = Σ_i p_X(xi) δ(x - xi).
Example 3
Consider the random variable defined with the distribution function given by,
where
Suppose denotes the countable subset of points on such that the random
variable is characterized by the probability mass function . Similarly, let
be a continuous subset of points on such that RV is characterized by the
probability density function .
Clearly the subsets and partition the set If , then .
Thus the probability of the event can be expressed as
Taking the derivative with respect to x , we get
where
Figure 10
The pdf is given by
where
and
Example 5
X is the random variable representing the life time of a device with the PDF for .
Define the following random variable
Find FY(y).
Solution:
Other distribution and density functions
In the following, we shall discuss a few commonly used discrete random variables. The
importance of these random variables will be highlighted.
Suppose X is a random variable that takes two values 0 and 1, with probability mass
function
P(X = 1) = p, 0 < p < 1,
and
P(X = 0) = 1 - p.
Such a random variable X is called a Bernoulli random variable, because it describes the
outcomes of a Bernoulli trial.
Figure 2
Remark
We can define the pdf of with the help of Dirac delta function. Thus
The Bernoulli RV is the simplest discrete RV. It can be used as the building block for
many discrete RVs.
For the Bernoulli RV, E[X^n] = 0^n · (1 - p) + 1^n · p = p for every n ≥ 1.
Thus all the moments of the Bernoulli RV have the same value of p.
Suppose X is a discrete random variable taking values from the set {0, 1, ..., n}. X is called a
binomial random variable with parameters n and p if
P(X = k) = nCk p^k (1 - p)^(n - k), k = 0, 1, ..., n,
where nCk = n! / (k! (n - k)!).
The sum of n independent identically distributed Bernoulli random variables is a
binomial random variable.
The binomial distribution is useful when there are two types of objects - good, bad;
correct, erroneous; healthy, diseased etc.
Example 3 In a binary communication system, the probability of bit error is 0.01. If a block of
8 bits are transmitted, find the probability that
Suppose is the random variable representing the number of bit errors in a block of 8
bits. Then
Therefore,
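The specific sub-questions of this example are not preserved in these notes. As an illustration of the binomial model X ~ B(n = 8, p = 0.01), the sketch below evaluates a few typical probabilities.

    from math import comb

    n, p = 8, 0.01                          # 8 bits, bit-error probability 0.01

    def binom_pmf(k):
        # P(X = k) for X ~ Binomial(n, p)
        return comb(n, k) * p**k * (1 - p)**(n - k)

    p_no_error = binom_pmf(0)               # about 0.9227
    p_at_least_one = 1 - p_no_error         # about 0.0773
    p_exactly_two = binom_pmf(2)            # about 0.0026
    print(p_no_error, p_at_least_one, p_exactly_two)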
The probability mass function for a binomial random variable with n = 6 and p =0.8 is
shown in the Figure 3 below.
Figure 3
Where
.
A discrete random variable X is called a Poisson random variable with the parameter λ > 0 if
P(X = k) = e^(-λ) λ^k / k!, k = 0, 1, 2, ...
Figure 2
i. no call is received
ii. exactly 5 calls are received
iii. More than 3 calls are received.
Solution: Let X be the random variable representing the number of calls received. Given
Where Therefore,
0.9897
The Poisson distribution is also used to approximate the binomial distribution when n is
very large and p is small.
Then
Thus the Poisson approximation can be used to compute binomial probabilities for large n. It
also makes the analysis of such probabilities easier. Typical examples are:
Example 4 Suppose there is an error probability of 0.01 per word in typing. What is the
probability that there will be more than 1 error in a page of 120 words?
Solution: Suppose X is the RV representing the number of errors per page of 120 words.
Where Therefore,
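A numerical sketch of this example follows, comparing the Poisson approximation (λ = np = 1.2) with the exact binomial value; the comparison itself is an added illustration.

    from math import exp, comb

    n, p = 120, 0.01
    lam = n * p                                    # Poisson parameter, 1.2

    # Poisson approximation: P(X > 1) = 1 - P(0) - P(1)
    p_more_than_one_poisson = 1 - exp(-lam) * (1 + lam)

    # Exact binomial value for comparison
    p_more_than_one_binom = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in (0, 1))

    print(round(p_more_than_one_poisson, 4))   # about 0.3374
    print(round(p_more_than_one_binom, 4))     # about 0.3377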
Figure 1
We use the notation X ~ U(a, b) to denote a random variable X uniformly distributed over the
interval (a, b).
Figure 2
Example 1
Figure 3 illustrates two normal variables with the same mean but different variances.
Figure 3
Substituting z = (x - µ_X)/σ_X, we get F_X(x) = Φ((x - µ_X)/σ_X),
where Φ(·) is the distribution function of the standard normal variable.
Thus F_X(x) can be computed from tabulated values of Φ(·). The table was very useful
in the pre-computer days.
These results follow from the symmetry of the Gaussian pdf. The function is tabulated and
the tabulated results are used to compute probability involving the Gaussian random variable.
Using the Error Function to compute Probabilities for Gaussian Random Variables
The function Φ(·) is closely related to the error function erf(x) and the complementary error
function erfc(x).
Note that Φ(x) = (1/2)[1 + erf(x/√2)], and equivalently 1 - Φ(x) = (1/2) erfc(x/√2).
Proof:
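Numerically, the relation between Φ(x) and the error function makes Gaussian probabilities easy to compute without tables. The sketch below is an added illustration for a standard normal variable (µ = 0, σ = 1 assumed).

    from math import erf, erfc, sqrt

    def phi(x):
        # Standard normal distribution function via the error function
        return 0.5 * (1 + erf(x / sqrt(2)))

    def q(x):
        # Q(x) = P(Z > x) via the complementary error function
        return 0.5 * erfc(x / sqrt(2))

    # Example values for a standard normal variable Z
    print(phi(1.0))           # about 0.8413
    print(q(2.0))             # about 0.0228
    print(phi(1) - phi(-1))   # P(-1 < Z < 1), about 0.6827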
Exponential Random Variable
Figure 1
Example 1
Figure 6
Similarly,
Relation between the Rayleigh Distribution and the Gaussian Distribution
We discussed conditional probability in an earlier lecture. For two events A and B with
, the conditional probability was defined as
Clearly, the conditional probability can be defined on events involving a random variable
X.
Consider the event {X ≤ x} and any event B involving the random variable X. The
conditional distribution function of X given B is defined as
F_X(x | B) = P({X ≤ x} ∩ B) / P(B), provided P(B) ≠ 0.
We can verify that F_X(x | B) satisfies all the properties of the distribution function.
Particularly,
F_X(-∞ | B) = 0 and F_X(∞ | B) = 1.
F_X(x | B) is a non-decreasing function of x.
All the properties of the pdf apply to the conditional pdf and we can easily show that
Then
And
Case 2:
and
and are plotted in the following figures.
Figure 1
For , .Therefore,
For , .Therefore,
Thus,
and . Then
where and
Remark
The expectation E[X] = ∫ x f_X(x) dx is defined provided the integral exists (converges absolutely).
E[X] is also called the mean or statistical average of the random variable X and is denoted
by µ_X.
Note that, for a discrete RV X with the probability mass function (pmf) p_X(xi),
the pdf is given by f_X(x) = Σ_i p_X(xi) δ(x - xi), so that E[X] = Σ_i xi p_X(xi).
Example 1
Then
Example 2
Then
Example 3 Let X be a continuous random variable with
f_X(x) = 1 / (π (1 + x^2)), -∞ < x < ∞.
Then the integral E[X] = ∫ x f_X(x) dx does not converge absolutely.
Hence E[X] does not exist. This density function is known as the Cauchy density function.
We shall illustrate the above result in the special case when y = g(x) is a one-to-one
and monotonically increasing function of x. In this case,
Figure 2
The following important properties of the expectation operation can be immediately derived:
(a) If c is a constant, E[c] = c.
Clearly
Mean-square value
The mean-square value of X is E[X^2].
Variance
The second central moment is called the variance.
For a random variable X with the pdf f_X(x) and mean µ_X, the variance of X is denoted by σ_X^2 and
defined as
σ_X^2 = E[(X - µ_X)^2] = ∫ (x - µ_X)^2 f_X(x) dx.
Example 4
Example 5
Find the variance of the random variable discussed in above example. As already computed
For example, consider two random variables with pmf as shown below. Note that
each of has zero mean.The variances are given by and implying that
has more spread about the mean.
Properties of variance
(1) σ_X^2 = E[X^2] - µ_X^2.
(2) If Y = cX, then σ_Y^2 = c^2 σ_X^2.
(3) If c is a constant, var(c) = 0.
We can define the nth moment and the nth central moment of a random variable X by the
following relations:
nth moment E[X^n] and nth central moment E[(X - µ_X)^n], n = 1, 2, ...
Note that
The mean µ_X = E[X] is the first moment and the mean-square value E[X^2] is the second
moment.
The first central moment is 0 and the variance is the second central
moment.
SKEWNESS
The third central moment E[(X - µ_X)^3] measures the lack of symmetry of the pdf of a random
variable. The normalized fourth central moment is called the kurtosis. If the peak of the pdf is sharper, then the
random variable has a higher kurtosis.
The mean and variance also give some quantitative information about the bounds of RVs.
Following inequalities are extremely useful in many practical problems.
Chebyshev Inequality
The standard deviation gives us an intuitive idea of how the random variable is distributed
about the mean. This idea is more precisely expressed in the remarkable Chebyshev inequality
stated below. For a random variable X with mean µ_X and variance σ_X^2,
P(|X - µ_X| ≥ ε) ≤ σ_X^2 / ε^2 for any ε > 0.
Proof:
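The proof is not reproduced here. The following sketch is an added illustration (the uniform distribution and the values of ε are arbitrary choices); it compares the Chebyshev bound σ²/ε² with the empirical probability P(|X - µ| ≥ ε).

    import random

    random.seed(0)
    n = 100_000
    xs = [random.uniform(0, 1) for _ in range(n)]   # X ~ U(0, 1): mean 0.5, variance 1/12

    mean, var = 0.5, 1 / 12
    for eps in (0.3, 0.4, 0.45):
        empirical = sum(abs(x - mean) >= eps for x in xs) / n
        chebyshev_bound = var / eps**2
        # The empirical probability never exceeds the Chebyshev bound
        print(eps, round(empirical, 4), round(chebyshev_bound, 4))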
MODULE-II
Characteristic function
The characteristic function of a random variable X is defined as φ_X(ω) = E[e^(jωX)] = ∫ e^(jωx) f_X(x) dx.
Example 1
Example 2
Suppose X is a random variable taking values from the discrete set {x1, x2, ...} with
corresponding probability mass function p_i for the value xi.
Then,
φ_X(ω) = Σ_i p_i e^(jωxi).
Thus,
TRANSFORMATION OF A RANDOM VARIABLE
Description:
Suppose we are given a random variable X with density fX(x). We apply a function g
to produce a random variable Y = g(X). We can think of X as the input to a black
box, and Y as the output.
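As a concrete sketch (the function g and the input distribution are chosen here only for illustration), take X uniform on (0, 1) and Y = g(X) = 2X + 3; for a monotone g, F_Y(y) = F_X(g⁻¹(y)), which the simulation below confirms.

    import random

    random.seed(0)

    # Input X ~ Uniform(0, 1); black-box transformation g
    def g(x):
        return 2 * x + 3            # Y = g(X) is then Uniform(3, 5)

    ys = [g(random.random()) for _ in range(100_000)]

    # For monotone g, F_Y(y) = F_X(g^(-1)(y)); here g^(-1)(y) = (y - 3) / 2
    y0 = 4.2
    empirical_cdf = sum(y <= y0 for y in ys) / len(ys)
    analytic_cdf = (y0 - 3) / 2     # F_X evaluated at g^(-1)(y0)
    print(empirical_cdf, analytic_cdf)   # both close to 0.6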
In this lecture, we extend the concepts of joint random variables to the case of multiple
random variables. A generalized analysis will be presented for random variables defined on
the same sample space.
We may define two or more random variables on the same sample space. Let X and Y be
two real random variables defined on the same probability space (S, F, P). The mapping
(X, Y) : S → R^2 such that s ↦ (X(s), Y(s)) for s in S is called a joint random variable.
Figure 1
Recall the definition of the distribution of a single random variable. The event {X ≤ x} was
used to define the probability distribution function F_X(x). Given F_X(x), we can find the
probability of any event involving the random variable. Similarly, for two random variables X
and Y, the event {X ≤ x, Y ≤ y} is considered as the representative event.
The probability P({X ≤ x, Y ≤ y}) is called the joint distribution function or the
joint cumulative distribution function (CDF) of the random variables X and Y and is denoted by
F_XY(x, y).
Figure 2
Properties of JPDF
1)
2)
3)
Note that
4)
6)
7)
To prove this
Similarly .
If X and Y are two discrete random variables defined on the same probability space
such that X takes values from the countable subset R_X and Y takes values from the
countable subset R_Y, then the joint random variable (X, Y) can take values from the countable
subset R_X × R_Y in the plane. The joint random variable (X, Y) is completely specified by the joint
probability mass function
p_XY(x, y) = P(X = x, Y = y), (x, y) in R_X × R_Y.
Given p_XY(x, y), we can determine other probabilities involving the random variables X and Y.
Remark
p_X(x) = Σ_y p_XY(x, y). This is because the events {Y = y}, y in R_Y, partition the sample space,
and similarly p_Y(y) = Σ_x p_XY(x, y).
These probability mass functions p_X(x) and p_Y(y), obtained from the joint probability
mass function, are called marginal probability mass functions.
Example 4 Consider the random variables and with the joint probability mass function as
tabulated in Table 1. The marginal probabilities and are as shown in the last
column and the last row respectively.
Table 1
If X and Y are two continuous random variables and their joint distribution function is
continuous in both x and y, then we can define the joint probability density function by
f_XY(x, y) = ∂²F_XY(x, y)/∂x ∂y, provided it exists.
Clearly f_XY(x, y) ≥ 0 and its integral over the whole plane is 1.
Example 6 The joint pdf of two random variables and are given by
• Find .
• Find .
• Find and .
• What is the probability ?
Conditional Distributions
We discussed the conditional CDF and conditional PDF of a random variable conditioned on
some events defined in terms of the same random variable. We observed that
and
We can define these quantities for two random variables. We start with the conditional
probability mass functions for two random variables.
Conditional Probability Mass and Density Functions
Suppose X and Y are two discrete jointly distributed random variables with the joint PMF p_XY(x, y). The
conditional PMF of Y given X = x is denoted by p_Y|X(y | x) and defined as
p_Y|X(y | x) = p_XY(x, y) / p_X(x), provided p_X(x) > 0.
Consider two continuous jointly distributed random variables X and Y with the joint probability
distribution function F_XY(x, y). We are interested in finding the conditional distribution function of
one of the random variables on the condition of a particular value of the other random variable.
We cannot define the conditional distribution function of the random variable X on the
condition of the event {Y = y} by the relation F_X|Y(x | y) = P(X ≤ x, Y = y) / P(Y = y),
because P(Y = y) = 0 for a continuous random variable Y.
Similarly we have
Example 2 X and Y are two jointly random variables with the joint pdf given by
find,
(a)
(b)
(c)
Solution:
Since
We get
Let and be two random variables characterized by the joint distribution function
We are often interested in finding out the probability density function of a function of two or
more RVs. Following are a few examples.
where is received signal which is the superposition of the message signal and the noise
.
Figure 1
Consider Figure 2
Figure 2
We have
Example 1
Suppose X and Y are independent random variables and each uniformly distributed over (a, b).
And are as shown in the figure below.
The PDF of is a triangular probability density function as shown in the figure.
and
Thus we can determine the mean and the variance of .
Can we guess about the probability distribution of ?
The central limit theorem (CLT) provides an answer to this question.
The CLT states that, under very general conditions, the suitably normalized sum of the random
variables converges in distribution to a Gaussian random variable as n → ∞. The conditions are:
We shall consider the first condition only. In this case, the central-limit theorem can be stated as
follows:
We give a less rigorous proof of the theorem with the help of the characteristic function.
Further we consider each of to have zero mean. Thus,
Clearly,
The characteristic function of is given by
We will show that as the characteristic function is of the form of the characteristic
function of a Gaussian random variable.
Expanding in power series
Substituting
Note also that each term in involves a ratio of a higher moment and a power of and
therefore,
which is the characteristic function of a Gaussian random variable with 0 mean and
variance .
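A simulation sketch of the theorem follows (the uniform summands, n and the number of trials are arbitrary choices): the normalized sum of n iid zero-mean random variables has an empirical distribution close to the standard normal.

    import random, statistics
    from math import sqrt, erf

    random.seed(0)
    n, trials = 30, 20_000
    sigma = sqrt(1 / 12)                       # std of Uniform(-0.5, 0.5)

    # Normalized sums Y = (X1 + ... + Xn) / (sigma * sqrt(n))
    sums = [sum(random.uniform(-0.5, 0.5) for _ in range(n)) / (sigma * sqrt(n))
            for _ in range(trials)]

    # Compare P(Y <= 1) with the standard normal value Phi(1) ~ 0.8413
    empirical = sum(s <= 1 for s in sums) / trials
    print(round(empirical, 3), round(0.5 * (1 + erf(1 / sqrt(2))), 3))
    print(round(statistics.mean(sums), 3), round(statistics.stdev(sums), 3))  # near 0 and 1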
MODULE III
Where
Note that
As is varied over the entire axis, the corresponding (non-overlapping) differential regions
in plane cover the entire plane.
Thus,
Proof:
Example 3
Remark
(1) We have earlier shown that expectation is a linear operator. We can generally write
E[aX + bY] = a E[X] + b E[Y].
Thus the expectation of a linear combination is the same linear combination of the expectations.
(2) If X and Y are independent random variables, then E[XY] = E[X] E[Y].
Joint Moments of Random Variables
Just like the moments of a random variable provide a summary description of the random
variable, so also the joint moments provide a summary description of two random variables. For
two continuous random variables X and Y, the joint moment of order (m, n) is defined as
E[X^m Y^n] = ∫∫ x^m y^n f_XY(x, y) dx dy,
where m and n are non-negative integers.
Remark
(1) If X and Y are discrete random variables, the joint moment of order (m, n) is
defined as E[X^m Y^n] = Σ_x Σ_y x^m y^n p_XY(x, y).
We will also show that To establish the relation, we prove the following result:
Non-negativity of the left-hand side implies that its minimum also must be nonnegative.
Thus
then
Note that independence implies uncorrelatedness, but uncorrelatedness does not in general imply
independence (except for jointly Gaussian random variables).
Note that the joint characteristic function is the same as the two-dimensional Fourier transform with the basis
function e^(j(ω1 x + ω2 y)) instead of e^(-j(ω1 x + ω2 y)).
If X and Y are discrete random variables, we can define the joint characteristic function in
terms of the joint probability mass function as follows:
φ_XY(ω1, ω2) = Σ_x Σ_y p_XY(x, y) e^(j(ω1 x + ω2 y)).
The joint characteristic function has properties similar to the properties of the characteristic
function of a single random variable. We can easily establish the following properties:
1.
2.
3. If and are independent random variables, then
4. We have,
Hence,
Example 2 The joint characteristic function of the jointly Gaussian random variables and
with the joint pdf
We can use the joint characteristic function to simplify the probabilistic analysis, as
illustrated below.
Many practically occurring random variables are modeled as jointly Gaussian random variables.
For example, noise samples at different instants in the communication system are modeled as
jointly Gaussian random variables.
Two random variables X and Y are called jointly Gaussian if their joint probability density
function is
f_XY(x, y) = (1 / (2π σ_X σ_Y √(1 - ρ²))) exp{ -(1 / (2(1 - ρ²))) [ (x - µ_X)²/σ_X² - 2ρ(x - µ_X)(y - µ_Y)/(σ_X σ_Y) + (y - µ_Y)²/σ_Y² ] }
with
means µ_X, µ_Y,
variances σ_X², σ_Y², and
correlation coefficient ρ = ρ_XY.
We denote the jointly Gaussian random variables X and Y with these parameters as
(X, Y) ~ N(µ_X, µ_Y, σ_X², σ_Y², ρ_XY).
The joint pdf has a bell shape centered at (µ_X, µ_Y) as shown in Figure 1 below. The
variances σ_X² and σ_Y² determine the spread of the pdf surface and ρ_XY determines the orientation
of the surface in the (x, y) plane.
(1) If and are jointly Gaussian, then and are both Gaussian.
We have
Similarly
(2) The converse of the above result is not true. If each of and is Gaussian, and are
not necessarily jointly Gaussian. Suppose
in this example is non-Gaussian and qualifies to be a joint pdf. Because,
And
Similarly,
(3) If X and Y are jointly Gaussian, then for any constants a and b, the random variable
Z = aX + bY is Gaussian with mean a µ_X + b µ_Y and variance a² σ_X² + b² σ_Y² + 2ab ρ_XY σ_X σ_Y.
(4) Two jointly Gaussian RVs X and Y are independent if and only if X and Y are
uncorrelated. Observe that if X and Y are uncorrelated, then ρ_XY = 0 and the joint pdf factors into the product of the marginal pdfs.
Example 1 Suppose X and Y are two jointly-Gaussian 0-mean random variables with variances
of 1 and 4 respectively and a covariance of 1. Find the joint PDF
We have
Suppose then
thus the linear transformation of two Gaussian random variables is a Gaussian random
variable.
Hence proved.
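A numerical sketch of Example 1 follows; it simply evaluates the bivariate Gaussian pdf with the stated parameters (ρ = cov/(σ_X σ_Y) = 1/2) at a sample point chosen for illustration.

    from math import pi, sqrt, exp

    # Example 1 parameters: zero means, variances 1 and 4, covariance 1
    mx, my = 0.0, 0.0
    sx, sy = 1.0, 2.0
    rho = 1.0 / (sx * sy)                 # correlation coefficient = 0.5

    def joint_gaussian_pdf(x, y):
        # Bivariate Gaussian density with the above parameters
        z = ((x - mx)**2 / sx**2
             - 2 * rho * (x - mx) * (y - my) / (sx * sy)
             + (y - my)**2 / sy**2)
        norm = 2 * pi * sx * sy * sqrt(1 - rho**2)
        return exp(-z / (2 * (1 - rho**2))) / norm

    print(joint_gaussian_pdf(0.0, 0.0))   # peak value, about 0.0919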
Univariate transformations
When working on the probability density function (pdf) of a random variable X, one
is often led to create a new variable Y defined as a function f(X) of the original variable X.
For example, if X ~ N(µ, σ²), then the new variable:
Y = f(X) = (X - µ)/σ
is N(0, 1).
It is also often the case that the quantity of interest is a function of another (random)
quantity whose distribution is known. Here are a few examples:
*Scaling: from degrees to radians, miles to kilometers, light-years to parsecs, degrees
Celsius to degrees Fahrenheit, linear to logarithmic scale, to the distribution of the
variance
* Laws of physics: what is the distribution of the kinetic energy of the molecules of a gas
if the distribution of the speed of the molecules is known ?
Multivariate transformations
The problem extends naturally to the case when several variables Yj are defined from
several variables Xi through a transformation y = h(x).
Here are some examples:
Polar coordinates
Let f(x, y) be the joint probability density function of the pair of r. v. {X, Y},
expressed in the Cartesian reference frame {x, y}. Any point (x, y) in the plane can also be
identified by its polar coordinates (r, θ). So any realization of the pair {X, Y} produces a
pair of values of r and θ, therefore defining two new r. v. R and Θ.
What is the joint probability density function of R and Θ? What are the (marginal)
distributions of R and of Θ?
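As a hedged illustration, take X and Y to be independent standard normal variables (an assumption made only for this example). Then R = √(X² + Y²) is Rayleigh distributed and Θ is uniform on (0, 2π), which the simulation below confirms.

    import random, statistics
    from math import sqrt, atan2, pi, exp

    random.seed(0)
    n = 100_000
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]

    rs = [sqrt(x * x + y * y) for x, y in zip(xs, ys)]
    thetas = [atan2(y, x) % (2 * pi) for x, y in zip(xs, ys)]

    # Rayleigh(1): P(R <= 1) = 1 - exp(-1/2) ~ 0.3935; Theta should be uniform on (0, 2*pi)
    print(round(sum(r <= 1 for r in rs) / n, 3), round(1 - exp(-0.5), 3))
    print(round(statistics.mean(thetas), 3), round(pi, 3))   # mean of Theta near pi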
Sampling distributions
Let f(x) be the pdf of the r. v. X. Let also Z1 = z1(x1, x2, ..., xn) be a statistic, e.g. the
sample mean. What is the pdf of Z1?
Z1 is a function of the n r. v. Xi (with n the sample size), that are iid with pdf f(x). If it
is possible to identify n - 1 other independent statistics Zi, i = 2, ..., n, then a transformation
Z = h(X) is defined, and g(z), the joint distribution of Z = {Z1, Z2, ..., Zn}, can be
calculated. The pdf of Z1 is then calculated as one of the marginal distributions of Z by
integrating g(z) over zi, i = 2, ..., n.
Integration limits
Calculations on joint distributions often involve multiple integrals whose
integration limits are themselves variables. An appropriate change of variables sometimes
allows changing all these variables but one into fixed integration limits, thus making the
calculation of the integrals much simpler.
Adding a constant: Y = X + b
Subtracting a constant: Y = X - b
Multiplying by a constant: Y = mX
Dividing by a constant: Y = X/m
Multiplying by a constant and adding a constant: Y = mX + b
Dividing by a constant and subtracting a constant: Y = X/m - b
Indeed, suppose (this is the notation for "the is the distribution density of ") and
. For any domain of the space we can
write
We make the change of variables
in the last integral.
(Linear transformation of
random variables)
For two independent standard normal variables (s.n.v.) and the combination
is distributed as .
A product of normal variables is not a normal variable. See the section on the chi-squared
distribution.
MODULE-IV
Recall that a random variable maps each sample point in the sample space to a point in
the real line. A random process maps each sample point to a waveform.
The value of a random process X(t) at any time t is a random variable and can be described from its
probabilistic model.
The state is the value taken by X(t) at a time t, and the set of all such states is called the
state space. A random process is discrete-state if the state space is finite or countable. It also
means that the corresponding sample space is also finite or countable. Otherwise, the random
process is called a continuous-state process.
First order and nth order probability density and distribution functions
As we have observed above, at a specific time t1, X(t1) is a random variable and can be
described by its probability distribution function F_X(x1; t1) = P(X(t1) ≤ x1). This distribution
function is called the first-order probability distribution function.
We can similarly define the first-order probability density function f_X(x1; t1) = ∂F_X(x1; t1)/∂x1.
We defined the moments of a random variable and joint moments of random variables. We can
define all the possible moments and joint moments of a random process .
Particularly, the following moments are important:
• the mean µ_X(t) = E[X(t)],
• the autocorrelation function R_XX(t1, t2) = E[X(t1) X(t2)], and
• the autocovariance function C_XX(t1, t2) = R_XX(t1, t2) - µ_X(t1) µ_X(t2).
Note that
The autocorrelation function and the autocovariance functions are widely used to characterize a
class of random process called the wide-sense stationary process.
The concept of stationarity plays an important role in solving practical problems involving
random processes. Just like time-invariance is an important characteristics of many
deterministic systems, stationarity describes certain time-invariant property of a class of random
processes. Stationarity also leads to frequency-domain description of a random process.
Thus, the joint distribution functions of any set of random variables X(t1), ..., X(tn) do
not depend on the placement of the origin of the time axis. This requirement is very strict.
Less strict forms of stationarity may be defined.
Particularly,
If the joint distributions up to order N are invariant under a shift of the time origin, then X(t) is
called Nth-order stationary.
If X(t) is stationary up to order 1,
Let us assume Then
As a consequence
If is stationary up to order 2
Put
Similarly,
Therefore, the autocorrelation function of a SSS process depends only on the time lag
We can also define the joint stationarity of two random processes. Two processes
X(t) and Y(t) are called jointly strict-sense stationary if their joint probability distributions
of any order are invariant under the translation of time. A complex random process
Z(t) = X(t) + jY(t) is called SSS if X(t) and Y(t) are jointly SSS.
This is because
Wide-sense stationary process
It is very difficult to test whether a process is SSS or not. A subclass of the SSS process, called
the wide-sense stationary (WSS) process, is extremely important from a practical point of view. A random
process X(t) is called WSS if its mean is constant, E[X(t)] = µ_X, and its autocorrelation depends only
on the time lag, E[X(t) X(t + τ)] = R_XX(τ).
Remark
(2) An SSS process is always WSS, but the converse is not always true.
This is the model of the carrier wave (sinusoid of fixed frequency) used to analyse the
noise performance of many receivers.
Note that
By applying the rule for the transformation of a random variable, we get
Note that
and
Such signals are called power signals. For a power signal x(t) the autocorrelation function is
defined as
R_x(τ) = lim (T → ∞) (1/2T) ∫ from -T to T of x(t) x(t + τ) dt.
It measures the similarity between a signal and its time-shifted version.
We see that the autocorrelation of the above periodic signal is also periodic and its maximum occurs when
τ = 0.
autocorrelation function of a WSS process. We shall discuss these properties next.
Where is the complex conjugate of For a discrete random sequence, we can define
the autocorrelation sequence similarly.
Because,
We have
4. R_XX(τ) is a positive semi-definite function in the sense that for any positive integer n and
real a1, a2, ..., an,
Σ_i Σ_j ai aj R_XX(ti - tj) ≥ 0.
Proof
Proof: Note that a real WSS random process X(t) is called mean-square periodic (MS
periodic) with a period Tp if E[(X(t + Tp) - X(t))²] = 0 for every t.
Again
If X(t) and Y(t) are two real jointly WSS random processes, their cross-correlation
function is independent of t and depends only on the time lag. We can write the cross-correlation
function as R_XY(τ) = E[X(t + τ) Y(t)].
We Have
Further,
Example 2
Consider a random process which is sum of two real jointly WSS random processes.
We have
Example 3
Suppose
Often we are interested in finding the various ensemble averages of a random process
by means of the corresponding time averages determined from a single realization of the random
process. For example, we can compute the time-averaged mean of a single realization x(t) of the random
process by the formula
<x(t)> = lim (T → ∞) (1/2T) ∫ from -T to T of x(t) dt,
which is constant for the selected realization. Note that <x(t)> represents the dc value of x(t).
The above definitions are in contrast to the corresponding ensemble average defined by
The following time averages are of particular interest
Note that, and are functions of random variables and are governed by
respective probability distributions. However, determination of these distribution functions is
difficult and we shall discuss the behaviour of these averages in terms of their mean and
variances. We shall further assume that the random processes and are WSS.
Let us consider the simplest case of the time averaged mean of a discrete-time WSS random
process given by
The mean of
Let us consider the time-averaged mean for the continuous case. We have
The above double integral is evaluated on the square area bounded by t1 = ±T and t2 = ±T. We
divide this square region into a sum of trapezoidal strips parallel to the line t1 = t2 (see Figure
1). Putting τ = t1 - t2 and noting that the differential area between τ and τ + dτ is
(2T - |τ|) dτ, the above double integral is converted to a single integral as follows:
Figure 1
Ergodicity Principle
If the time averages converge to the corresponding ensemble averages in the probabilistic sense,
then a time-average computed from a large realization can be used as the value for the
corresponding ensemble average. Such a principle is the ergodicity principle to be discussed
below:
and
therefore, the condition for ergodicity in mean is
Further,
Here
hence is mean ergodic.
Autocorrelation ergodicity
We consider Z(t) = X(t + τ) X(t) so that <Z(t)> estimates the autocorrelation at lag τ.
Then X(t) will be autocorrelation ergodic if Z(t) is mean ergodic.
Thus X(t) will be autocorrelation ergodic if
where
Simpler condition for autocorrelation ergodicity of a jointly Gaussian process can be found.
Example 2
For each realization, both the time-averaged mean and the time-averaged autocorrelation
function converge to the corresponding ensemble averages. Thus the random-phased sinusoid is
ergodic in both mean and autocorrelation.
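A simulation sketch of this ergodicity claim follows (the amplitude, frequency, sampling interval and lag are arbitrary choices): for a single realization of X(t) = A cos(ω₀t + Θ), the time-averaged mean is close to the ensemble mean 0 and the time-averaged autocorrelation at lag τ is close to (A²/2) cos(ω₀τ).

    import random
    from math import cos, pi

    random.seed(0)
    A, w0, dt = 2.0, 2 * pi * 5.0, 1e-3           # amplitude, angular frequency, sampling interval
    N = 200_000                                   # samples in one long realization (200 s)

    theta = random.uniform(0, 2 * pi)             # single realization: one random phase
    x = [A * cos(w0 * n * dt + theta) for n in range(N)]

    time_mean = sum(x) / N                        # ensemble mean is 0

    lag = 20                                      # lag of 20 samples, i.e. tau = 0.02 s
    time_acf = sum(x[n] * x[n + lag] for n in range(N - lag)) / (N - lag)
    ensemble_acf = (A**2 / 2) * cos(w0 * lag * dt)

    print(round(time_mean, 3))                    # near 0
    print(round(time_acf, 3), round(ensemble_acf, 3))   # both near 1.618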
Consider a continuous LTI system with impulse response h (t). Assume that the
system is always causal and stable. When a continuous time Random process X (t) is applied
on this system, the output response is also a continuous time random process Y (t). If the
random processes X and Y are discrete time signals, then the linear system is called a discrete
time system. In this unit we concentrate on the statistical and spectral characteristics of the
output random process Y (t).
System Response: Let a random process X (t) be applied to a continuous linear time
invariant system whose impulse response is h(t) as shown in below figure. Then the output
response Y (t) is also a random process. It can be expressed by the convolution integral, Y (t)
= h (t) *X (t)
[Block diagram: X(t) → h(t) → Y(t)]
Mean Value of Output Response: Consider that the random process X (t) is a wide-sense
stationary process. Then the mean of the output is E[Y(t)] = E[∫ h(τ) X(t - τ) dτ] = µ_X ∫ h(τ) dτ = µ_X H(0), a constant.
Autocorrelation Function of Output Response: The autocorrelation of Y (t) is
It is observed that the output autocorrelation function is a function of only τ. Hence the output
random process Y(t) is also WSS random process.
If the input X (t) is WSS random process, then the cross correlation function of input X (t) and
output Y(t) is
Spectral Characteristics of a System Response: Consider that the random process X (t) is a
WSS random process with the autocorrelation function RXX(τ) applied through an LTI system. It
is noted that the output response Y (t) is also WSS and the processes X (t) and Y (t) are
jointly WSS. We can obtain the power spectral characteristics of the output process Y(t) by taking
the Fourier transform of the correlation functions.
Power Density Spectrum of Response: Consider that a random process X (t) is applied on an
LTI system having a transfer function H(ω). The output response is Y (t). If the power
spectrum of the input process is SXX (ω), then the power spectrum of the output response is
given by SYY (ω) = |H(ω)|² SXX (ω).
Proof: Let RYY(τ) be the autocorrelation of the output response Y (t). Then the power spectrum
of the response is the Fourier transform of RYY(τ).
Therefore SYY (ω) = H*(ω) H(ω) SXX (ω) = H(-ω) H(ω) SXX (ω) = |H(ω)|² SXX (ω).
Similarly, we can prove that the cross power spectral density functions
are SXY (ω) = SXX (ω) H(ω) and SYX (ω) = SXX (ω) H(-ω).
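A small numerical sketch of the relation SYY(ω) = |H(ω)|² SXX(ω) follows (the FIR impulse response and lengths are arbitrary choices): white noise is passed through an LTI filter and the averaged periodogram of the output is compared with |H(ω)|² times that of the input.

    import numpy as np

    rng = np.random.default_rng(0)
    N, runs = 1024, 200
    h = np.array([0.5, 0.3, 0.2])                 # FIR impulse response chosen for illustration
    H = np.fft.rfft(h, N)                         # frequency response on the FFT grid

    sxx_est = np.zeros(N // 2 + 1)
    syy_est = np.zeros(N // 2 + 1)
    for _ in range(runs):
        x = rng.standard_normal(N)                # white, unit-variance input
        y = np.convolve(x, h)[:N]                 # output of the LTI system
        sxx_est += np.abs(np.fft.rfft(x))**2 / N  # periodogram of the input
        syy_est += np.abs(np.fft.rfft(y))**2 / N  # periodogram of the output
    sxx_est /= runs
    syy_est /= runs

    # S_YY(w) should match |H(w)|^2 * S_XX(w); print the ratio at a few frequencies
    for k in (10, 100, 300):
        print(round(syy_est[k] / (np.abs(H[k])**2 * sxx_est[k]), 2))   # close to 1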
Types of Random Processes: In practical situations, random process can be categorized into
different types depending on their frequency components. For example information bearing
signals such as audio, video and modulated waveforms etc., carry the information within a
specified frequency band.
125
The Important types of Random processes are;
(1). Low pass random processes: A random process X (t) is defined as a low pass random process if its power spectral
density SXX (ω) has significant components within the frequency band around ω = 0, as shown in the
figure below. For example, baseband signals such as speech, image and video are low pass
random processes.
(2). Band pass random processes: A random process X (t) is called a band pass process if its
power spectral density SXX (ω) has significant components within a band width W that does
not include ω = 0. But in practice, the spectrum may have a small amount of power at ω = 0, as
shown in the figure below. The spectral components outside the band W are very small and
can be neglected.
For example, modulated signals with carrier frequency ω0 and band width W are band pass
random processes. The noise transmitting over a communication channel can be modelled as a
band pass process.
(3).Band Limited random processes: A random process is said to be band limited if its
power spectrum components are zero outside the frequency band of width W that does not
include ω =0. The power density spectrum of the band limited band pass process is shown in
below figure.
(4).Narrow band random processes: A band limited random process is said to be a narrow
band process if the band width W is very small compared to the band centre frequency, i.e.
W<< ω0, where W=band width and ω0 is the frequency at which the power spectrum is
maximum. The power density spectrum of a narrow band process N(t) is shown in below
figure.
Representation of a narrow band process: For any arbitrary WSS random process N(t),
the quadrature form of the narrow band process can be represented as N(t) = X(t) cos ω0t -
Y(t) sin ω0t,
Where X(t) and Y(t) are respectively called the in-phase and quadrature phase components of
N(t). They can be expressed as
Properties of Band Limited Random Processes: Let N (t) be any band limited WSS random
process with zero mean value and a power spectral density SNN(ω). If the random process is
represented by
N (t) = X (t) cos ω0t - Y(t) sin ω0t, then some important properties of X (t) and Y (t) are given
below.
8. If N (t) is a Gaussian random process, then X (t) and Y (t) are jointly Gaussian.
9. The relationship between autocorrelation and power spectrum SNN (ω) is
10. If N (t) is zero mean Gaussian and its psd SNN(ω) is symmetric about ±ω0, then X (t) and Y (t)
are statistically independent.
MODULE-V
And
where the contribution to the average power at frequency ω represents the power spectral
density of X(t). As T → ∞, the left-hand side in the above expression represents the
average power of X(t).
Therefore, the PSD of the process is defined in the limiting sense by
We have
Figure 1
Note that the above integral is to be performed on a square region bounded by t1 = ±T and
t2 = ±T, as illustrated in Figure 1. Substitute τ = t1 - t2, so that τ = constant is a family of straight
lines parallel to t1 = t2. The differential area in terms of τ is given by the shaded area and is
equal to (2T - |τ|) dτ. The double integral is now replaced by a single integral in τ.
Therefore,
Figure 2
Figure 4
Properties of the PSD
The PSD, being the Fourier transform of the autocorrelation function, shares the properties of the Fourier transform.
Here we discuss important properties of the PSD.
Consider a random process Z(t) which is the sum of two real jointly WSS random processes X(t) and Y(t).
As we have seen earlier, R_ZZ(τ) = R_XX(τ) + R_YY(τ) + R_XY(τ) + R_YX(τ).
Thus we see that S_ZZ(ω) includes contributions from the Fourier transforms of the cross-
correlation functions R_XY(τ) and R_YX(τ).
These Fourier transforms represent cross power spectral densities.
Given two real jointly WSS random processes the cross power spectral
density (CPSD) is defined as
Proceeding in the same way as the derivation of the Wiener-Khinchin-Einstein theorem for the
WSS process, it
can be shown that
and
The cross-correlation function and the cross-power spectral density form a Fourier transform
pair and we can
write
and
The CPSD is a complex function of the frequency ω. Some properties of the CPSD of two
jointly WSS processes are listed below:
(1)
Note that
We have
where δ(ω) is the Dirac delta function.
Observe that
Similarly,
We have,
Wiener-Khinchin-Einstein theorem
The Wiener-Khinchin-Einstein theorem is also valid for discrete-time random processes. The
power spectral density S_XX(ω) of the WSS process {X[n]} is the discrete-time Fourier transform
of the autocorrelation sequence R_XX[m]:
S_XX(ω) = Σ_m R_XX[m] e^(-jωm).
Thus R_XX[m] and S_XX(ω) form a discrete-time Fourier transform pair. A generalized PSD can be
defined in terms of the z-transform of R_XX[m] as follows.
clearly,
Figure 1
Linear system
The system is called linear if the principle of superposition applies: the weighted sum of
inputs results in the weighted sum of the corresponding outputs. Thus for a linear system
Example 1 Consider the output of a differentiator, given by
Then,
It is easy to check that the differentiator in the above example is a linear time-invariant
system.
Figure 2
Recall that any function x(t) can be represented in terms of the Dirac delta function as follows:
x(t) = ∫ x(τ) δ(t - τ) dτ.
The output can then be written as a superposition of impulse responses, where h(t, τ) is the response at time t due to the impulse shifted to τ.
Figure 3 shows the input-output relationship of an LTI system in terms of the impulse response
and the frequency response.
Figure 3
Consider an LTI system with impulse response h (t). Suppose X(t) is a WSS process
input to the system. The output of the system is given by
Y(t) = ∫ h(τ) X(t - τ) dτ,
where we have assumed that the integral exists in the mean square sense.
The cross-correlation of the input {X(t)} and the output {Y(t)} is given by
Thus we see that R_XY is a function of the lag τ only. Therefore, X(t) and Y(t) are jointly
wide-sense stationary.
Thus the autocorrelation of the output process depends on the time-lag , i.e.,
Thus
The above analysis indicates that for an LTI system with a WSS input, the output is also WSS and the input and output are jointly WSS.
Using the properties of the Fourier transform, we get the power spectral density of the output
process given by S_YY(ω) = |H(ω)|² S_XX(ω).
Also note that
taking the Fourier transform of R_XY(τ), we get the cross power spectral density
given by S_XY(ω) = S_XX(ω) H(ω).
Figure 4
Example 3
A random voltage modeled by a white noise process with power spectral density
Therefore,
(a)
then ,
(2)
and
(3)
Note that
And
Again
and
Where and the integral is defined in the mean-square sense. See the illustration in
Figure 2.
Figure 2
and
The Hilbert transform of X(t) is generally denoted as X̂(t). Therefore, from (2) and (3) we
establish
and
The realization for the in phase and the quadrature phase components is shown in Figure 3
below.
Figure 3
From the above analysis, we can summarize the following expressions for the autocorrelation
functions
Where
Figure 4
Similarly,
Notice that the cross power spectral density is purely imaginary. Particularly, if
is locally symmetric about
Implying that
MODULE-V
Arbitrary Noise Source
Resistor Noise Sources in Series
A Two Port Network
Equivalently the above circuit can also be represented with two separate
inputs as
Practical Noisy Networks
Average Noise Temperatures
Noise Bandwidth
We know that
Narrowband Noise
Information Theory