Stochastics Notes


Random Experiments:

In the study of probability, any process of observation is referred to as an experiment. The results of
observation are called the outcomes of the experiment. An experiment is called a random experiment if its
outcome cannot be predicted with certainty. Typical examples of a random experiment are the roll of a die, the toss of a coin,
drawing a card from a deck, or selecting a message signal for transmission from among several messages.
Sample Space:
The set of all possible outcomes of a random experiment is called the sample space (or universal set),
and it is denoted by S. An element in S is called a sample point. Each outcome of a random experiment
corresponds to a sample point.
EXAMPLE 1
Find the sample space for the experiment of tossing a coin (a) once and (b) twice.
(a) There are two possible outcomes, heads or tails. Thus
𝑆 = {𝐻, 𝑇}
where H and T represent head and tail, respectively.
(b) There are four possible outcomes.
They are pairs of heads and tails. Thus
𝑆 = {𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇}
EXAMPLE 2
Find the sample space for the experiment of tossing a coin repeatedly and of counting the number of
tosses required until the first head appears.
Clearly all possible outcomes for this experiment are the terms of the sequence 1, 2, 3, . . . . Thus
𝑆 = {1, 2, 3, . . . }
Note that there are an infinite number of outcomes.
EXAMPLE 3
Find the sample space for the experiment of measuring (in hours) the lifetime of a transistor.
Clearly all possible outcomes are all nonnegative real numbers. That is,
𝑆 = {𝑡 ∶ 0 ≤ 𝑡 < ∞}
where t represents the lifetime of a transistor in hours.
Note that any particular experiment can often have many different sample spaces depending on the
observation of interest.
A sample space S is said to be discrete if it consists of a finite number of sample points (as in Example
1) or a countably infinite number of sample points (as in Example 2).
A set is called countable if its elements can be placed in a one-to-one correspondence with the positive
integers.
A sample space S is said to be continuous if the sample points constitute a continuum (as in Example 3).
Random Variables :
The main purpose of using a random variable is that we can define certain probability functions that
make it both convenient and easy to compute the probabilities of various events.
Definitions:
Consider a random experiment with sample space S. A random variable 𝑋(𝜁) is a single-valued real
function that assigns a real number called the value of 𝑋(𝜁) to each sample point 𝜁 of S. Often, we use a single
letter 𝑋 for this function in place of 𝑋(𝜁) and use r.v. to denote the random variable.
Note that the terminology used here is traditional. A random variable is not a variable at all in the usual
sense; rather, it is a function.
The sample space S is termed the domain of the r.v. X, and the collection of all numbers [values of
𝑋(𝜁)] is termed the range of the r.v. 𝑋. Thus the range of 𝑋 is a certain subset of the set of all real numbers
(Fig.1.1).
Note that two or more different sample points might give the same value of 𝑋(𝜁), but two different
numbers in the range cannot be assigned to the same sample point.

Fig.1.1. Random Variable X as a Function.


EXAMPLE 1
In the experiment of tossing a coin once, we might define the r.v. X as (Fig. 1.2)
𝑋(𝐻) = 1 𝑋(𝑇) = 0
Note that we could also define another r.v., say 𝑌 𝑜𝑟 𝑍, with
𝑌(𝐻) = 0, 𝑌(𝑇) = 1 𝑜𝑟 𝑍(𝐻) = 0, 𝑍(𝑇) = 0
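Since a r.v. is just a single-valued real function on S, it can be sketched in code as a plain mapping from sample points to real numbers. A minimal Python illustration of the three r.v.'s above (the dictionary representation is an illustrative choice, not from the text):

```python
# The r.v.'s X, Y, Z of the example as mappings from sample points to reals.
X = {"H": 1, "T": 0}
Y = {"H": 0, "T": 1}
Z = {"H": 0, "T": 0}   # a constant r.v.: both sample points map to 0

# The range of each r.v. is a subset of the real numbers.
print(set(X.values()), set(Y.values()), set(Z.values()))
```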
Events Defined by Random Variables:
If 𝑋 is a r.v. and 𝑥 is a fixed real number, we can define the event (𝑋 = 𝑥) as
(𝑋 = 𝑥) = {𝜁 ∶ 𝑋(𝜁) = 𝑥} ………… (1)
Similarly, for fixed numbers 𝑥, 𝑥1 , 𝑎𝑛𝑑 𝑥2 , we can define the following events:
(𝑋 ≤ 𝑥) = {𝜁 ∶ 𝑋(𝜁) ≤ 𝑥}
(𝑋 > 𝑥) = {𝜁 ∶ 𝑋(𝜁) > 𝑥} …………. (2)
(𝑥1 < 𝑋 ≤ 𝑥2 ) = {𝜁 ∶ 𝑥1 < 𝑋(𝜁) ≤ 𝑥2 }
Fig. 1.2 One random variable associated with coin tossing.
These events have probabilities that are denoted by
𝑃(𝑋 = 𝑥) = 𝑃{𝜁 ∶ 𝑋(𝜁) = 𝑥 }
𝑃(𝑋 ≤ 𝑥) = 𝑃{𝜁 ∶ 𝑋(𝜁) ≤ 𝑥 }
𝑃(𝑋 > 𝑥) = 𝑃{𝜁 ∶ 𝑋(𝜁) > 𝑥} …………. (3)
𝑃(𝑥1 < 𝑋 ≤ 𝑥2 ) = 𝑃{𝜁 ∶ 𝑥1 < 𝑋(𝜁) ≤ 𝑥2 }
EXAMPLE 2
In the experiment of tossing a fair coin three times, the sample space 𝑆1 consists of eight equally likely
sample points 𝑆1 = {𝐻𝐻𝐻, … … , 𝑇𝑇𝑇}. If 𝑋 is the r.v. giving the number of heads obtained, find (𝑎) 𝑃(𝑋 =
2); (𝑏) 𝑃(𝑋 < 2).
(a) Let 𝐴 ⊂ 𝑆1 be the event defined by 𝑋 = 2. Then, we have
𝐴 = (𝑋 = 2) = {𝜁 ∶ 𝑋(𝜁) = 2} = {𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝑇𝐻𝐻}
Since the sample points are equally likely, we have
𝑃(𝑋 = 2) = 𝑃(𝐴) = 3/8

(b) Let 𝐵 ⊂ 𝑆1 be the event defined by 𝑋 < 2. Then


𝐵 = (𝑋 < 2) = {𝜁 ∶ 𝑋(𝜁) < 2} = {𝐻𝑇𝑇, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇}
and 𝑃(𝑋 < 2) = 𝑃(𝐵) = 4/8 = 1/2
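Both probabilities can be checked by brute-force enumeration of 𝑆1. A short Python sketch (the variable names mirror the text):

```python
from itertools import product

# Enumerate the eight equally likely outcomes of three fair-coin tosses
# and compute P(X = 2) and P(X < 2), where X counts the heads.
S1 = [''.join(p) for p in product("HT", repeat=3)]   # HHH, HHT, ..., TTT
X = {zeta: zeta.count("H") for zeta in S1}           # the r.v. X as a mapping

A = [zeta for zeta in S1 if X[zeta] == 2]            # event (X = 2)
B = [zeta for zeta in S1 if X[zeta] < 2]             # event (X < 2)

print(len(A) / len(S1))  # -> 0.375 (= 3/8)
print(len(B) / len(S1))  # -> 0.5   (= 4/8)
```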
STOCHASTIC (or RANDOM) PROCESSES
INTRODUCTION
The theory of random processes was first developed in connection with the study of fluctuations and
noise in physical systems. A random process is the mathematical model of an empirical process whose
development is governed by probability laws. Random processes provide useful models for studies in
such diverse fields as statistical physics, communication and control, time series analysis, population growth,
and management sciences.
Consider some quantity in a teletraffic (or any) system
• It typically evolves in time randomly
– Example 1: the number of occupied channels in a telephone link at time t or at the arrival time of the 𝑛𝑡ℎ
customer.
– Example 2: the number of packets in the buffer of a statistical multiplexer at time t or at the arrival time of
the 𝑛𝑡ℎ customer
• This kind of evolution is described by a stochastic process – At any individual time t (or n) the system can
be described by a random variable – Thus, the stochastic process is a collection of random variables.
Definition:
A random process is a family of r.v.'s {𝑋(𝑡), 𝑡 ∈ 𝑇} defined on a given probability space, indexed by
the parameter t, where t varies over an index set T.
Recall that a random variable is a function defined on the sample space S. Thus, a random process
{𝑋(𝑡), 𝑡 ∈ 𝑇} is really a function of two arguments {𝑋(𝑡, 𝜁), 𝑡 ∈ 𝑇, 𝜁 ∈ 𝑆}. For a fixed 𝑡(= 𝑡𝑘 ), 𝑋(𝑡𝑘 , 𝜁) =
𝑋𝑘 (𝜁) is a r.v. denoted by 𝑋(𝑡𝑘 ), as 𝜁 varies over the sample space S. On the other hand, for a fixed sample
point 𝜁𝑖 ∈ 𝑆, 𝑋(𝑡, 𝜁𝑖 ) = 𝑋𝑖 (𝑡) is a single function of time t, called a sample function or a realization of the
process. The totality of all sample functions is called an ensemble.
Of course, if both 𝜁 and 𝑡 are fixed, 𝑋(𝑡𝑘 , 𝜁𝑖 ) is simply a real number. In the following we use the notation
𝑋(𝑡) to represent 𝑋(𝑡, 𝜁).
Thus a stochastic process (or random process) is a collection of random variables ordered by an index set.
☛ Example 1. Random variables 𝑋0 , 𝑋1, 𝑋2 , . .. form a stochastic process ordered by the discrete index
set {0, 1, 2, . . . }. Notation: {𝑋𝑛 ∶ 𝑛 = 0, 1, 2, . . . }.
☛ Example 2. Stochastic process {𝑌𝑡 ∶ 𝑡 ≥ 0} with continuous index set {𝑡 ∶ 𝑡 ≥ 0}. The indices
𝑛 and 𝑡 are often referred to as "time", so that 𝑋𝑛 is a discrete-time process and 𝑌𝑡 is a continuous-time
process.
Convention: the index set of a stochastic process is always infinite. The range (possible values) of
the random variables in a stochastic process is called the state space of the process. We consider both discrete-
state and continuous-state processes.
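As a concrete sketch, fixing the sample point (here, the random seed) picks out one realization of a discrete-time, discrete-state process; the coin-counting process below is an illustrative choice, not one from the text:

```python
import random

# {X_n : n = 0, 1, 2, ...}, where X_n is the number of heads in the first n
# tosses of a fair coin: a discrete-time, discrete-state process.
random.seed(1)  # one fixed sample point zeta -> one sample function

def realization(n_steps):
    x, path = 0, [0]
    for _ in range(n_steps):
        x += random.random() < 0.5   # add 1 for each head
        path.append(x)
    return path

path = realization(10)
print(path)  # one sample function (realization) of the process
```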
Further examples:
☛ Example 3. {𝑋𝑛 ∶ 𝑛 = 0, 1, 2, . . . }, where the state space of 𝑋𝑛 is {0, 1, 2, 3, 4} representing which
of four types of transactions a person submits to an on-line database service, and time n corresponds to the
number of transactions submitted.
☛ Example 4. {𝑋𝑛 ∶ 𝑛 = 0, 1, 2, . . . }, where the state space of 𝑋𝑛 is {1, 2} representing whether an
electronic component is acceptable or defective, and time n corresponds to the number of components
produced.
☛ Example 5. {𝑌𝑡 ∶ 𝑡 ≥ 0}, where the state space of 𝑌𝑡 is {0, 1, 2, . . . } representing the number of
accidents that have occurred at an intersection, and time t corresponds to weeks.
☛ Example 6.{𝑌𝑡 ∶ 𝑡 ≥ 0}, where the state space of 𝑌𝑡 is {0, 1, 2, . . . , s} representing the number of
copies of a software product in inventory, and time t corresponds to days.
☛ Example 7. {𝑌𝑡 ∶ 𝑡 ≥ 0}where the state space of 𝑌𝑡 is {0, 1, 2, . . . } representing the number of
cars parked in a parking garage at a shopping mall, and time t corresponds to hours.
Description of a Random Process:
In a random process{𝑋(𝑡), 𝑡 ∈ 𝑇}, the index set 𝑇 is called the parameter set of the random process.
The set of possible values of a single random variable 𝑋𝑛 or 𝑋(𝑡) of a stochastic process
{𝑋𝑛 , 𝑛 ≥ 1} or {𝑋(𝑡), 𝑡 ∈ 𝑇} is known as its state space. The state space is discrete if it contains a finite or
a denumerable infinity of points; otherwise, it is continuous.
If the index set 𝑇 of a random process is discrete, then the process is called a discrete-parameter (or
discrete-time) process. A discrete-parameter process is also called a random sequence and is denoted by
{𝑋𝑛 , 𝑛 = 1, 2, . . . }.
If 𝑇 is continuous, then we have a continuous parameter (or continuous-time) process.
If the state space 𝐸 of a random process is discrete, then the process is called a discrete-state process,
often referred to as a chain. In this case, the state space 𝐸 is often assumed to be {0, 1, 2, . . . }.
If the state space 𝐸 is continuous, then we have a continuous-state process.
A complex random process 𝑋(𝑡) is defined by
𝑋(𝑡) = 𝑋1 (𝑡) + 𝑗 𝑋2 (𝑡)
where
𝑋1 (𝑡) 𝑎𝑛𝑑 𝑋2 (𝑡) are (real) random processes and 𝑗 = √−1 .
Generally, all random processes are real random processes unless specified otherwise.
Classification of stochastic processes with examples
Suppose that 𝑋(𝑡) gives the number of incoming calls at a switchboard in an interval (0, 𝑡).
Here the state space of 𝑋(𝑡) is discrete though 𝑋(𝑡) is defined for a continuous range of time. We have
a process in continuous time having a discrete state space.
Suppose that 𝑋 (𝑡) represents the maximum temperature at a particular place in (0, 𝑡), then the set of
possible values of 𝑋 (𝑡) is continuous. Here we have a system in continuous time having a continuous state
space. So far we have assumed that the values taken by the r.v. 𝑋𝑛 (or 𝑋(𝑡)) are one-dimensional, but the
process {𝑋𝑛 } (or {𝑋(𝑡)}) may be multi-dimensional. Consider 𝑋(𝑡) = (𝑋1 (𝑡), 𝑋2 (𝑡)), where
𝑋1 represents the maximum and 𝑋2 the minimum temperature at a place in an interval of time (0, 𝑡). We have
here a two-dimensional stochastic process in continuous time having continuous state space. One can similarly
have multi-dimensional processes. One-dimensional processes can be classified, in general, into the
following four types of processes:
(i) Discrete Parameter (or time), discrete state space.
(ii) Discrete Parameter (or time), continuous state space.
(iii) Continuous Parameter (or time), discrete state space.
(iv) Continuous Parameter (or time), continuous state space.
1. Stochastic Processes with Discrete Parameter and State Spaces
☛ Example 8. A Brand-Switching Model for Consumer Behavior. Before introducing a new brand of
coffee, a manufacturer wants to study consumer behavior relative to the brands already available in the market.
Suppose there are three brands on sale, say A, B, C. The consumers either buy the same brand for a few
months or change their brands every now and then. There is also a strong possibility that when a superior
brand is introduced, some of the old brands will be left with only a few customers. Sample surveys are used
to gauge consumer behavior.
In such a survey, conducted over a period of time, suppose the estimates obtained for the consumer
brand-switching behavior are as follows: Of those who buy A in one month, during the next month 60%
buy A again, 30% switch to brand B, and 10% switch to brand C. For brands B and C these figures are: B to
A 50%, B to B 30%, B to C 20%; C to A 40%, C to B 40%, C to C 20%.
If we are interested in the number of people who buy a certain brand of coffee, then that number could
be represented as a stochastic process. The behavior of the consumer can also be considered a stochastic
process that can enter three different states A, B, C. Some of the questions that arise are: What is the expected
number of months that a consumer stays with one specific brand? What are the mean and variance of the
number using a particular brand after a certain number of months? Which is the product preferred most by the
customers in the long run?
Suppose, for instance, that consumer preferences are observed on a monthly basis. Then we have a
discrete-time, discrete-state stochastic process.
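Questions about long-run brand shares can be explored numerically. The sketch below builds the transition matrix from the survey figures (rows in the order A, B, C) and approximates the long-run distribution by repeatedly applying 𝜋 ← 𝜋P; the initial distribution and the iteration count are arbitrary illustrative choices:

```python
# Brand-switching chain over states A, B, C, with rows taken from the survey
# figures in the text. The long-run distribution is approximated by power
# iteration (pi <- pi P).
P = [[0.6, 0.3, 0.1],   # from A: 60% stay, 30% to B, 10% to C
     [0.5, 0.3, 0.2],   # from B
     [0.4, 0.4, 0.2]]   # from C

assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)  # stochastic matrix

pi = [1/3, 1/3, 1/3]    # arbitrary initial distribution
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

print([round(x, 3) for x in pi])  # -> [0.539, 0.315, 0.146]
```

Under these figures the long-run shares come out to roughly 54% for A, 31% for B, and 15% for C, so brand A is preferred in the long run.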
☛ Example 9. A Queueing Problem. A bus taking students back and forth between the dormitory
complex and the student union arrives at the student union several times during the day. The bus has a capacity
to seat K persons. If there are K or fewer waiting persons when the bus arrives, it departs with the available
number. If there are more than K waiting, the driver allows the first K to board, and the remaining persons
must wait for the next bus. The university administrators would like to know at each time period during the
day how many students are left behind when a bus departs.
The number waiting at the bus stop is a stochastic process dependent on the arrival process (e.g., with
some probability distribution). Some desirable characteristics of the system are: the reduction in the number
waiting at any time, minimizing the time a student has to wait for a ride, and minimum operational cost.
Consider the number of students waiting at the time of arrival of a bus when time is counted by the
number of times the bus arrives. Then we again have a discrete time, discrete-state stochastic process.
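A crude simulation of this process can be sketched as follows; the capacity K, the number of bus arrivals, and the distribution of arrivals between buses are all invented for illustration, since the text does not specify them:

```python
import random

# Sketch of Example 9: at each bus arrival, up to K waiting students board
# and the rest are left behind. Arrivals between buses are drawn from an
# assumed distribution (a uniform random integer), purely for illustration.
random.seed(42)
K = 30                       # bus capacity (illustrative)

waiting, left_behind = 0, []
for _ in range(20):          # 20 bus arrivals
    waiting += random.randint(0, 50)     # students arriving since last bus
    boarded = min(waiting, K)
    waiting -= boarded
    left_behind.append(waiting)          # X_n: number left when bus n departs

print(left_behind)           # one realization of the discrete-time process
```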
2. Stochastic Processes with Continuous Parameter and Discrete State Space
☛ Example 10. In Example 9 consider the number of students waiting for a bus at any time of day –
in this case the parameter space is continuous and we speak about a continuous-time, discrete state stochastic
process.
☛ Example 11. Population Growth. Consider the size of a population at a given time – we again have
a continuous-time, discrete-state stochastic process (the population is finite).
3. Stochastic Processes with Discrete Parameter and Continuous State Space
☛ Example 12. Stock Market. Consider the values of the Dow-Jones Index at the end of the nth week.
Then we have a discrete-time stochastic process with the continuous state space (0, ∞). The probabilistic
analysis of such a process is very helpful to the participants in making judicious decisions about buying and
selling stocks.
☛ Example 13. In Example 9 consider the waiting time of the n-th student arriving at a bus stop – in this
case we have a discrete-time, continuous-state stochastic process.
☛ Example 14. A Time-Sharing Computer System. Jobs of varied length come to a computing center
from various sources. The number of jobs arriving, as well as their length, can be said to follow certain
distributions. Under these conditions the number of jobs waiting at any time and the time a job has to spend
in the system can be represented by stochastic processes. Under a strictly first-come, first-served policy, there
is a good chance of a long job delaying a much more important shorter job over a long period of time. For the
efficient operation of the system, in addition to minimizing the number of jobs waiting and the total delay, it
may be necessary to adopt a different service policy. A round-robin policy in which the service is performed
on a single job only for a certain length of time, say 3 or 5 sec, and those jobs that need more service are put
back in the queue, is one of the common practices adopted under these conditions. Consider the accumulated
workload observed at specified points in time; this again gives a discrete-time, continuous-state process.
4. Stochastic Processes with Continuous Parameter and Continuous State Space
☛ Example 15. In Example 14 consider the waiting time of an arriving job until it gets into service, with
the arrival time of the job now the parameter.

MARKOV PROCESS
INTRODUCTION
Let {𝑋𝑛 , 𝑛 = 0, 1, 2, . . . } be a sequence of random variables (or a discrete parameter stochastic
process).
Let the possible outcomes of 𝑋𝑛 be j (j = 0, 1, 2, ...), where the number of outcomes may be finite (say,
m) or denumerable. The possible values of 𝑋𝑛 constitute a set S = {0, 1, 2, ...}, so that the process has the
state space S. Unless otherwise stated, by the state space of a Markov chain we shall imply a discrete state
space (having a finite or a countably infinite number of elements); it could be N = {0, 1, 2, ...} or some other
subset of the set of integers I = {..., −2, −1, 0, 1, 2, ...}.

MARKOV CHAINS (MC) {𝑿𝒏 , 𝒏 ≥ 𝟎}


Definition. The stochastic process {𝑋𝑛 , 𝑛 = 0, 1, 2, . . . } is called a Markov chain if, for 𝑗, 𝑘, 𝑗1 , . . . ,
𝑗𝑛−1 ∈ 𝑁 (or any subset of I),
𝑃{𝑋𝑛 = 𝑘 ⁄ 𝑋𝑛−1 = 𝑗, 𝑋𝑛−2 = 𝑗1 , … , 𝑋0 = 𝑗𝑛−1 } = 𝑃{𝑋𝑛 = 𝑘 ⁄ 𝑋𝑛−1 = 𝑗} = 𝑝𝑗𝑘 (say)
whenever the first member is defined.
The outcomes are called the states of the Markov chain; if 𝑋𝑛 has the outcome 𝑗 (𝑖. 𝑒. 𝑋𝑛 = 𝑗), the
process is said to be at state j at 𝑛𝑡ℎ trial. To a pair of states (𝑗, 𝑘) at the two successive trials (say, 𝑛𝑡ℎ and
(𝑛 + 1)𝑠𝑡 trials) there is an associated conditional probability 𝑝𝑗𝑘 . It is the probability of transition from the
state j at the 𝑛𝑡ℎ trial to the state k at the (𝑛 + 1)𝑠𝑡 trial. The transition probabilities 𝑝𝑗𝑘 are basic to the study
of the structure of the Markov chain.
Some other examples of applications that use Markov chains include:
• Models of physical processes
– Rainfall from day-to-day
– Neural networks
– Population dynamics
– Lineups, e.g. in grocery stores, computer servers, telephone call centers, etc.
– Chemical reactions
– Protein folding
– Baseball statistics
• Discretizing a continuous system
• Sampling from high-dimensional systems, e.g. Markov-Chain Monte-Carlo
• Data/network analysis
– clustering
– Speech recognition
– PageRank algorithm in Google’s search engine.
Finite MC
A finite Markov chain is a memoryless homogeneous discrete stochastic process with a finite number
of states.
Let M be a finite Markov chain with n states, 𝑉 = [𝑛] = {1, 2, . . . , 𝑛}.
Let 𝑝𝑖𝑗 denote the probability of transition from state i to state j, i. e., 𝑝𝑖𝑗 = 𝑃(𝑋𝑡+1 = 𝑗 |𝑋𝑡 = 𝑖).
(Note that this is a conditional probability: the question of 𝑖 → 𝑗 transition only arises if the system is in state
i, i. e. , 𝑋𝑡 = 𝑖.)
The finite Markov chain 𝑀 is characterized by the 𝑛 × 𝑛 transition matrix 𝑇 = (𝑝𝑖𝑗 ) (𝑖, 𝑗 ∈ [𝑛])
and an initial distribution 𝑞 = (𝑞1 , . . . , 𝑞𝑛 ) where 𝑞𝑖 = 𝑃(𝑋0 = 𝑖).
Time homogeneous M.C.
The Markov chain 𝑋(𝑡) is time-homogeneous if 𝑃(𝑋𝑛+1 = 𝑗|𝑋𝑛 = 𝑖) = 𝑃(𝑋1 = 𝑗|𝑋0 = 𝑖), i.e.
the transition probabilities do not depend on time n. If this is the case, we write 𝑝𝑖𝑗 = 𝑃(𝑋1 = 𝑗|𝑋0 = 𝑖)
for the probability to go from i to j in one step, and 𝑃 = (𝑝𝑖𝑗 ) for the transition matrix.
In other words,
The transition probability may or may not be independent of n. If the transition probability pjk is
independent of n, the Markov chain is said to be homogeneous (or to have stationary transition probabilities).
If it is dependent on n, the chain is said to be non-homogeneous. Generally we shall confine ourselves to
homogeneous chains.
One step transition probabilities
The transition probability pjk refers to the states (j, k) at two successive trials (say, 𝑛𝑡ℎ and (𝑛 + 1)𝑠𝑡 trials);
the transition is one-step and 𝑝𝑗𝑘 is called one-step (or unit step) transition probability. In the more general
case, we are concerned with the pair of states (𝑗, 𝑘) at two non successive trials, say, state j at the 𝑛𝑡ℎ trial and
state k at the (𝑛 + 𝑚)𝑡ℎ trial. The corresponding transition probability is then called the m-step transition
probability and is denoted by 𝑝𝑗𝑘^(𝑚), i.e.
𝑝𝑗𝑘^(𝑚) = 𝑃( 𝑋𝑛+𝑚 = 𝑘 ∕ 𝑋𝑛 = 𝑗 )

Transition probability matrix (t.p.m.) (or Matrix of Transition Probabilities)


The transition probabilities 𝑝𝑗𝑘 satisfy
𝑝𝑗𝑘 ≥ 0, ∑𝑘 𝑝𝑗𝑘 = 1 for all 𝑗.
These probabilities may be written in the matrix form

      ( 𝑝11 𝑝12 𝑝13 … )
𝑃 =   ( 𝑝21 𝑝22 𝑝23 … )
      ( 𝑝31 𝑝32 𝑝33 … )
      (  …   …   …  … )
This is called the transition probability matrix or matrix of transition probabilities (t.p.m.) of the
Markov chain. P is a stochastic matrix i.e. a square matrix with non-negative elements and unit row sums.
Example : A particle performs a random walk with absorbing barriers at 0 and 4. Whenever it is
at any position r (0 < r < 4), it moves to r + 1 with probability p or to (𝑟 − 1) with probability q, 𝑝 + 𝑞 =
1. But as soon as it reaches 0 or 4 it remains there. Let 𝑋𝑛 be the position of the particle after n moves.
The different states of 𝑋𝑛 are the different positions of the particle. {𝑋𝑛 } is a Markov chain whose unit-step
transition probabilities are given by
𝑃{𝑋𝑛 = 𝑟 + 1/𝑋𝑛−1 = 𝑟} = 𝑝
𝑃{𝑋𝑛 = 𝑟 − 1/𝑋𝑛−1 = 𝑟} = 𝑞, 0<r<4
And
𝑃{𝑋𝑛 = 0/𝑋𝑛−1 = 0} = 1
𝑃{𝑋𝑛 = 4/𝑋𝑛−1 = 4} = 1
The transition matrix is given by (rows indexed by the state of 𝑋𝑛−1 , columns by the state of 𝑋𝑛 , both
in the order 0, 1, 2, 3, 4):

      0  1  2  3  4
  0 [ 1  0  0  0  0 ]
  1 [ 𝑞  0  𝑝  0  0 ]
  2 [ 0  𝑞  0  𝑝  0 ]
  3 [ 0  0  𝑞  0  𝑝 ]
  4 [ 0  0  0  0  1 ]
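The same t.p.m. can be built and checked in code; p = q = 0.5 is an illustrative choice and tpm is a hypothetical helper name:

```python
# Transition matrix of the random walk on {0, 1, 2, 3, 4} with absorbing
# barriers at 0 and 4, for the illustrative values p = q = 0.5.
p, q = 0.5, 0.5

def tpm(p, q, n=5):
    P = [[0.0] * n for _ in range(n)]
    P[0][0] = 1.0                 # absorbed at 0
    P[n - 1][n - 1] = 1.0         # absorbed at 4
    for r in range(1, n - 1):     # interior states move to r-1 or r+1
        P[r][r - 1] = q
        P[r][r + 1] = p
    return P

P = tpm(p, q)
for row in P:
    print(row)
# Every row sums to 1, as required of a stochastic matrix.
```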
Probability Distribution :
EVALUATION OF PROBABILITY

A more general equation relating the transition probabilities, one that holds even in the
time-inhomogeneous case, is the Chapman-Kolmogorov equation.
CHAPMAN KOLMOGOROV EQUATION

𝑝𝑖𝑗^(𝑚+𝑛) = ∑𝑟 𝑝𝑖𝑟^(𝑚) 𝑝𝑟𝑗^(𝑛)

where 𝑖, 𝑗 and 𝑟 are states.


In other words,
𝑃(𝑋𝑚+𝑛 = 𝑗 ⁄ 𝑋0 = 𝑖) = ∑𝑘 𝑃(𝑋𝑚 = 𝑘 ⁄ 𝑋0 = 𝑖) 𝑃(𝑋𝑚+𝑛 = 𝑗 ⁄ 𝑋𝑚 = 𝑘)
Proof :
To prove

𝑝𝑖𝑗^(𝑚+𝑛) = ∑𝑟 𝑝𝑖𝑟^(𝑚) 𝑝𝑟𝑗^(𝑛)

Let's consider the (𝑖, 𝑗)𝑡ℎ entry of the matrix 𝑃^(𝑚+𝑛):

𝑝𝑖𝑗^(𝑚+𝑛) = 𝑃(𝑋𝑚+𝑛 = 𝑗 ⁄ 𝑋0 = 𝑖)
= ∑𝑟 𝑃(𝑋𝑚+𝑛 = 𝑗, 𝑋𝑚 = 𝑟 ⁄ 𝑋0 = 𝑖)
= ∑𝑟 𝑃(𝑋𝑚+𝑛 = 𝑗, 𝑋𝑚 = 𝑟, 𝑋0 = 𝑖) ∕ 𝑃(𝑋0 = 𝑖) ………… using 𝑃(𝐴 ⁄ 𝐵) = 𝑃(𝐴 ∩ 𝐵) ∕ 𝑃(𝐵)
= ∑𝑟 [𝑃(𝑋𝑚+𝑛 = 𝑗, 𝑋𝑚 = 𝑟, 𝑋0 = 𝑖) ∕ 𝑃(𝑋𝑚 = 𝑟, 𝑋0 = 𝑖)] [𝑃(𝑋𝑚 = 𝑟, 𝑋0 = 𝑖) ∕ 𝑃(𝑋0 = 𝑖)]
= ∑𝑟 𝑃(𝑋𝑚+𝑛 = 𝑗 ∕ 𝑋𝑚 = 𝑟, 𝑋0 = 𝑖) ⋅ 𝑃(𝑋𝑚 = 𝑟 ∕ 𝑋0 = 𝑖)
= ∑𝑟 𝑃(𝑋𝑚+𝑛 = 𝑗 ∕ 𝑋𝑚 = 𝑟) ⋅ 𝑃(𝑋𝑚 = 𝑟 ∕ 𝑋0 = 𝑖) ………… by the Markov property
= ∑𝑟 𝑝𝑟𝑗^(𝑛) 𝑝𝑖𝑟^(𝑚)
= ∑𝑟 𝑝𝑖𝑟^(𝑚) 𝑝𝑟𝑗^(𝑛)

which holds for all states 𝑖 and 𝑗 and all 𝑚, 𝑛 ≥ 0.
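The equation is easy to verify numerically for a concrete chain. The sketch below uses the 3 × 3 brand-switching matrix of Example 8 and checks that the entries of P^(m+n) and P^m P^n agree; matmul and matpow are small hypothetical helpers, not library functions:

```python
# Numerical check of the Chapman-Kolmogorov equation on the 3x3
# brand-switching matrix of Example 8.
P = [[0.6, 0.3, 0.1],
     [0.5, 0.3, 0.2],
     [0.4, 0.4, 0.2]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, k):
    R = A
    for _ in range(k - 1):
        R = matmul(R, A)
    return R

m, n = 2, 3
lhs = matpow(P, m + n)                    # p_ij^(m+n)
rhs = matmul(matpow(P, m), matpow(P, n))  # sum_r p_ir^(m) p_rj^(n)
print(all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
          for i in range(3) for j in range(3)))  # -> True
```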

Forward Kolmogorov Equation. (for a time-homogeneous, discrete-time Markov Chain)


𝛼^(𝑛+1) = 𝛼^(𝑛) 𝑃,
where 𝛼^(𝑛) is the row vector of state probabilities at time n. This equation shows how the probability
distribution evolves in time. The solution is clearly
𝛼^(𝑛) = 𝛼^(0) 𝑃^𝑛, which you could also show directly from the n-step transition probabilities.
Backward Kolmogorov Equation. (for a time-homogeneous, discrete-time Markov Chain)
𝑢^(𝑛+1) = 𝑃 𝑢^(𝑛), 𝑢^(0)(𝑖) = 𝑓(𝑖) ∀𝑖 ∈ 𝑆,
so that 𝑢^(𝑛)(𝑖) = 𝐸[𝑓(𝑋𝑛) ∕ 𝑋0 = 𝑖].
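Both recursions can be iterated directly. In the sketch below, the two-state matrix is an illustrative choice, not one from the text; 𝛼 is a row vector evolved forward, and u is evolved backward from 𝑢^(0)(𝑖) = 𝑓(𝑖), with f taken as the indicator of state 0:

```python
# Forward and backward Kolmogorov recursions for a time-homogeneous,
# discrete-time chain. The 2x2 transition matrix is illustrative only.
P = [[0.9, 0.1],
     [0.2, 0.8]]

def forward_step(alpha, P):    # alpha^{n+1} = alpha^n P (row vector x matrix)
    return [sum(alpha[i] * P[i][j] for i in range(len(alpha)))
            for j in range(len(P[0]))]

def backward_step(P, u):       # u^{n+1} = P u^n (matrix x column vector)
    return [sum(P[i][j] * u[j] for j in range(len(u))) for i in range(len(P))]

alpha = [1.0, 0.0]             # alpha^0: start in state 0 with probability 1
u = [1.0, 0.0]                 # u^0(i) = f(i), f = indicator of state 0

for _ in range(3):
    alpha = forward_step(alpha, P)   # alpha^n = alpha^0 P^n
    u = backward_step(P, u)          # u^n(i) = E[f(X_n) | X_0 = i]

print(alpha)  # distribution of X_3
print(u)      # u[i] = P(X_3 = 0 | X_0 = i)
```

With this choice of 𝛼^(0) and f, alpha[0] and u[0] both equal P(X_3 = 0 | X_0 = 0), which gives a quick consistency check between the two recursions.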
