Mathematical Expectation
Contents

o Expectation
o Variance, standard deviation
o Moments
o Conditional expectation, conditional variance, conditional moments
o Chebyshev's inequality
o Weak and Strong Laws of Large Numbers
o Other Measures of Central Tendency
o Other Measures of Dispersion

Topic 3: Mathematical Expectation
Expectation
Mathematical Expectation, or Expected Value, or briefly the Expectation, is a very important concept in probability and statistics.

For a discrete random variable X having the possible values x_1, ..., x_n, the expectation of X is defined as

E(X) = x_1 P(X = x_1) + ..... + x_n P(X = x_n) = Σ_{j=1}^{n} x_j P(X = x_j),

or equivalently, if P(X = x_j) = f(x_j), then

E(X) = Σ_{j=1}^{n} x_j f(x_j).
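As a quick numeric illustration (a Python sketch; the pmf and the helper name `expectation` are my own choices, not part of the text), the defining sum Σ x_j f(x_j) can be computed directly:

```python
# Illustrative sketch: expectation of a discrete random variable as the
# probability-weighted sum over its pmf, E(X) = sum_j x_j * f(x_j).

def expectation(pmf):
    """E(X) for a discrete pmf given as {value: probability}."""
    assert abs(sum(pmf.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: f(x) = 1/6 for x = 1, ..., 6.
die = {x: 1 / 6 for x in range(1, 7)}
print(expectation(die))  # ≈ 3.5
```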
Expectation
As a special case, when the probabilities are all equal, we have

E(X) = (1/n) Σ_{j=1}^{n} x_j,

which is called the arithmetic mean, or simply the mean, of x_1, ..., x_n.

For a continuous random variable X having density function f(x), the expectation of X is defined as

E(X) = ∫_{-∞}^{∞} x f(x) dx,

provided that the integral converges.
Expectation
The expectation of X is very often called the mean of X and is denoted by μ_X, or simply μ, when the particular random variable is understood.

The mean, or expectation, of X gives a single value that acts as a representative or average of the values of X, and for this reason it is often called a measure of central tendency.
Example
Suppose that a game is to be played with a single die assumed fair. In this game a player wins $20 if a 2 turns up, $40 if a 4 turns up; loses $30 if a 6 turns up; while the player neither wins nor loses if any other face turns up. Find the expected sum of money to be won.

Solution. Letting X denote the sum of money won, each face has probability 1/6, so

E(X) = 0(1/6) + 20(1/6) + 0(1/6) + 40(1/6) + 0(1/6) + (-30)(1/6) = 5.

It follows that the player can expect to win $5.
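The arithmetic above is easy to confirm numerically (a Python sketch; the dictionary `payoff` is my own encoding of the game):

```python
# Numeric check of the die example: payoff 20 on a 2, 40 on a 4,
# -30 on a 6, 0 otherwise; each face has probability 1/6.

payoff = {1: 0, 2: 20, 3: 0, 4: 40, 5: 0, 6: -30}

expected = sum(payoff[face] * (1 / 6) for face in range(1, 7))
print(expected)  # ≈ 5.0
```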
Functions of Random Variables

Let X be a discrete random variable with probability function f(x), and let Y = g(X). Then Y is also a discrete random variable, with probability function

h(y) = P(Y = y) = Σ_{x | g(x) = y} P(X = x) = Σ_{x | g(x) = y} f(x).

It follows that

Σ_i y_i h(y_i) = Σ_j g(x_j) f(x_j).

Therefore, E[g(X)] = Σ_j g(x_j) f(x_j) = Σ_x g(x) f(x).

Similarly, for a continuous random variable X with density function f(x),

E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx,

and for two random variables X and Y with joint density function f(x, y),

E[g(X, Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) f(x, y) dx dy.
Theorem on Expectation
Theorem 1:
a1) If c is any constant, then
E (c X ) = c E ( X ).
a2) If X and Y are any random variables, then
E ( X + Y ) = E ( X ) + E (Y ).
a3) If X and Y are independent random variables, then
E ( XY ) = E ( X ) E (Y ).
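Parts a2) and a3) of the theorem can be spot-checked on a pair of small pmfs (a Python sketch; the pmfs `fX`, `fY` and the product-form joint pmf are my own choices, using independence to factor f(x, y) = f_1(x) f_2(y)):

```python
# Numeric sanity check of Theorem 1: E(X+Y) = E(X) + E(Y) always,
# and E(XY) = E(X)E(Y) when X and Y are independent.

from itertools import product

fX = {0: 0.3, 1: 0.7}            # pmf of X
fY = {1: 0.5, 2: 0.25, 3: 0.25}  # pmf of Y

E = lambda pmf: sum(x * p for x, p in pmf.items())

EX, EY = E(fX), E(fY)
# Joint pmf of independent X, Y is the product of the marginals.
EXplusY = sum((x + y) * fX[x] * fY[y] for x, y in product(fX, fY))
EXY = sum(x * y * fX[x] * fY[y] for x, y in product(fX, fY))

print(abs(EXplusY - (EX + EY)) < 1e-12)  # True
print(abs(EXY - EX * EY) < 1e-12)        # True
```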
Variance and Standard Deviation

Another important quantity is the variance of X, defined by

σ_X² = Var(X) = E[(X - μ_X)²] = Σ_{j=1}^{n} (x_j - μ_X)² f(x_j)

for a discrete random variable X taking the values x_1, ..., x_n with probability function f(x), and by

σ_X² = E[(X - μ)²] = ∫_{-∞}^{∞} (x - μ)² f(x) dx

for a continuous random variable X with density function f(x). The positive square root σ_X of the variance is called the standard deviation.
Theorems on Variance
Theorem 2:
a1) σ_X² = E[(X - μ_X)²] = E(X²) - μ_X² = E(X²) - [E(X)]².
a2) If c is any constant, then Var(cX) = c² Var(X).
a3) The quantity E[(X - a)²] is a minimum when a = E(X).
a4) If X and Y are independent random variables,
Var(X ± Y) = Var(X) + Var(Y), or σ_{X±Y}² = σ_X² + σ_Y².
In words, the variance of a sum of independent variables equals
the sum of their variances.
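Identity a1) can be verified on a concrete pmf (a Python sketch; the pmf and helper names are my own):

```python
# Checking Theorem 2(a1): Var(X) = E(X^2) - [E(X)]^2.

def E(pmf, g=lambda x: x):
    """E[g(X)] for a discrete pmf {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

def var(pmf):
    mu = E(pmf)
    return E(pmf, lambda x: (x - mu) ** 2)   # definition E[(X - mu)^2]

fX = {1: 0.2, 2: 0.5, 3: 0.3}

lhs = var(fX)                                # direct definition
rhs = E(fX, lambda x: x * x) - E(fX) ** 2    # shortcut formula
print(abs(lhs - rhs) < 1e-12)  # True
```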
Moments
The rth moment of a random variable X about the mean μ, also called the rth central moment, is defined as

μ_r = E[(X - μ)^r], where r = 0, 1, 2, .....

It follows that μ_0 = 1, μ_1 = 0, and μ_2 = σ², i.e., the second central moment, or second moment about the mean, is the variance.
Moments
If X is a discrete random variable taking the values x_1, ..., x_n and having probability function f(x), then the rth central moment is

μ_r = E[(X - μ)^r] = Σ_{j=1}^{n} (x_j - μ)^r f(x_j),

while for a continuous random variable X with density function f(x),

μ_r = E[(X - μ)^r] = ∫_{-∞}^{∞} (x - μ)^r f(x) dx,

provided the integral converges.

The central moments can be expressed in terms of the moments about the origin, μ'_r = E(X^r); for example,

μ_3 = μ'_3 - 3μμ'_2 + 2μ³.
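The relation μ_3 = μ'_3 - 3μμ'_2 + 2μ³ can be checked numerically (a Python sketch; the pmf is my own):

```python
# Verifying mu_3 = mu'_3 - 3*mu*mu'_2 + 2*mu^3 on a small pmf.

f = {0: 0.5, 1: 0.3, 2: 0.2}

raw = lambda r: sum(x ** r * p for x, p in f.items())    # mu'_r = E(X^r)
mu = raw(1)
central3 = sum((x - mu) ** 3 * p for x, p in f.items())  # mu_3 directly

print(abs(central3 - (raw(3) - 3 * mu * raw(2) + 2 * mu ** 3)) < 1e-9)  # True
```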
Moment Generating Functions

The moment generating function of X is defined by M_X(t) = E(e^{tX}), that is, assuming convergence,

M_X(t) = Σ_x e^{tx} f(x)   (discrete variable),

M_X(t) = ∫_{-∞}^{∞} e^{tx} f(x) dx   (continuous variable).
Expanding e^{tX} in a Taylor series shows that

M_X(t) = Σ_{r=0}^{∞} μ'_r t^r / r!,

where

μ'_r = (d^r/dt^r) M_X(t) |_{t=0}

is the rth derivative of M_X(t) evaluated at t = 0.

Note: Since the coefficients in this expansion enable us to find the moments, the reason for the name moment generating function is apparent.
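The derivative relation can be illustrated numerically (a Python sketch; the pmf, the step size h, and the use of finite differences in place of exact derivatives are my own choices):

```python
# Sketch: recovering moments from the mgf M_X(t) = E(e^{tX}) of a
# discrete pmf. The rth derivative at t = 0 is approximated here by
# finite differences, so the results match E(X^r) only approximately.

import math

f = {0: 0.4, 1: 0.35, 2: 0.25}

def M(t):
    return sum(math.exp(t * x) * p for x, p in f.items())

h = 1e-5
mu1 = (M(h) - M(-h)) / (2 * h)            # ~ M'(0)  = E(X)
mu2 = (M(h) - 2 * M(0) + M(-h)) / h ** 2  # ~ M''(0) = E(X^2)

print(abs(mu1 - sum(x * p for x, p in f.items())) < 1e-5)      # True
print(abs(mu2 - sum(x * x * p for x, p in f.items())) < 1e-3)  # True
```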
Characteristic Functions
If we put t = iω, where i is the imaginary unit, in the moment generating function, we obtain an important function called the characteristic function. We denote it by

φ_X(ω) = M_X(iω) = E(e^{iωX}).

It follows that, assuming convergence,

φ_X(ω) = Σ_x e^{iωx} f(x)   (discrete variable),

φ_X(ω) = ∫_{-∞}^{∞} e^{iωx} f(x) dx   (continuous variable).
Characteristic Functions
Homework. Show that the Taylor series expansion of the characteristic function is

φ_X(ω) = Σ_{r=0}^{∞} μ'_r (iω)^r / r! = 1 + iμω + μ'_2 (iω)²/2! + .... + μ'_r (iω)^r/r! + .....,

where

μ'_r = (-i)^r (d^r/dω^r) φ_X(ω) |_{ω=0}.
The density function can be recovered from the characteristic function by the inversion formula

f(x) = (1/2π) ∫_{-∞}^{∞} e^{-iωx} φ_X(ω) dω   (continuous variable).
Variance for Joint Distributions

For two continuous random variables X and Y with joint density function f(x, y), the means, or expectations, are

μ_X = E(X) = ∫∫ x f(x, y) dx dy,   μ_Y = E(Y) = ∫∫ y f(x, y) dx dy,

and the variances are

σ_X² = E[(X - μ_X)²] = ∫∫ (x - μ_X)² f(x, y) dx dy,

σ_Y² = E[(Y - μ_Y)²] = ∫∫ (y - μ_Y)² f(x, y) dx dy.
Covariance
Another quantity that arises in the case of two variables X and Y is the
covariance defined by
σ_XY = Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)].

In terms of the joint density function f(x, y), we have

σ_XY = ∫∫ (x - μ_X)(y - μ_Y) f(x, y) dx dy.

For discrete random variables X and Y with joint probability function f(x, y),

μ_X = Σ_x Σ_y x f(x, y),   μ_Y = Σ_x Σ_y y f(x, y),

σ_XY = Σ_x Σ_y (x - μ_X)(y - μ_Y) f(x, y),

where the sums are taken over all the discrete values of X and Y.
Properties of covariance
Theorem 6:
a1) σ_XY = E(XY) - E(X)E(Y) = E(XY) - μ_X μ_Y.
a2) If X and Y are independent random variables, then
σ_XY = Cov(X, Y) = 0.
a3) Var(X ± Y) = Var(X) + Var(Y) ± 2 Cov(X, Y),
or σ_{X±Y}² = σ_X² + σ_Y² ± 2σ_XY.
a4) |σ_XY| ≤ σ_X σ_Y.
Correlation Coefficient
If X and Y are independent, then Cov(X, Y) = σ_XY = 0. On the other hand, if X and Y are completely dependent, e.g., when X = Y, then Cov(X, Y) = σ_XY = σ_X σ_Y. From this we are led to a measure of the dependence of the variables X and Y given by

ρ = σ_XY / (σ_X σ_Y).
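Both σ_XY and ρ can be computed directly from a joint pmf (a Python sketch; the joint pmf is my own, and σ_XY is obtained via the shortcut of Theorem 6(a1)):

```python
# Computing sigma_XY and rho from a small joint pmf f(x, y).

import math

f = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}  # joint pmf

EX = sum(x * p for (x, y), p in f.items())
EY = sum(y * p for (x, y), p in f.items())
EXY = sum(x * y * p for (x, y), p in f.items())

cov = EXY - EX * EY  # Theorem 6(a1): sigma_XY = E(XY) - E(X)E(Y)
sx = math.sqrt(sum((x - EX) ** 2 * p for (x, y), p in f.items()))
sy = math.sqrt(sum((y - EY) ** 2 * p for (x, y), p in f.items()))
rho = cov / (sx * sy)

print(abs(cov) <= sx * sy)  # True: Theorem 6(a4)
print(round(cov, 4), round(rho, 4))
```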
Conditional Expectation

Recall that the conditional density function of Y given X is f(y | x) = f(x, y) / f_1(x), where f_1(x) is the marginal density function of X. The conditional expectation of Y given X = x is defined as

E(Y | X = x) = ∫_{-∞}^{∞} y f(y | x) dy.

Property 1: If X and Y are independent, then E(Y | X = x) = E(Y).

Property 2: E(Y) = ∫_{-∞}^{∞} E(Y | X = x) f_1(x) dx.
Example
The average travel time to a distant city
is c hours by car or b hours by bus.
A woman cannot decide whether to drive
or take the bus, so she tosses a coin.
What is her expected travel time?
Example - Solution

Solution. Here we are dealing with the joint distribution of the outcome of the toss, X, and the travel time, Y, where Y = Y_car if X = 0 and Y = Y_bus if X = 1. Presumably, both Y_car and Y_bus are independent of X, so that by Property 1 above, E(Y | X = 0) = E(Y_car | X = 0) = E(Y_car) = c, and E(Y | X = 1) = E(Y_bus | X = 1) = E(Y_bus) = b. Then Property 2 (with the integral replaced by a sum) gives, for a fair coin,

E(Y) = E(Y | X = 0) P(X = 0) + E(Y | X = 1) P(X = 1) = (c + b)/2.
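The two-step computation above can be mirrored numerically (a Python sketch; the values of c and b are my own sample choices, not from the text):

```python
# Numeric version of the travel-time example: average E(Y | X = x)
# over the coin's distribution, as in Property 2 (sum form).

c, b = 3.0, 4.5          # car and bus travel times in hours (assumed values)
pX = {0: 0.5, 1: 0.5}    # fair coin: X = 0 -> car, X = 1 -> bus
EY_given = {0: c, 1: b}  # E(Y | X = x), using independence (Property 1)

EY = sum(EY_given[x] * p for x, p in pX.items())
print(EY == (c + b) / 2)  # True
```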
Conditional Variance

The conditional variance of Y given X is defined as

E[(Y - μ_2)² | X = x] = ∫_{-∞}^{∞} (y - μ_2)² f(y | x) dy,

where μ_2 = E(Y | X = x). We can also define the rth conditional moment of Y about any value a given X as

E[(Y - a)^r | X = x] = ∫_{-∞}^{∞} (y - a)^r f(y | x) dy.
Chebyshev's Inequality

An important theorem in probability and statistics that reveals a general property of discrete or continuous random variables having finite mean and variance is known under the name of Chebyshev's inequality.

Theorem 7 (Chebyshev's Inequality):
Suppose that X is a random variable (discrete or continuous) having mean μ and variance σ², which are finite. Then if ε is any positive number,

P(|X - μ| ≥ ε) ≤ σ²/ε²,   or, with ε = kσ,   P(|X - μ| ≥ kσ) ≤ 1/k².
Proof (continuous case). We have

σ² = E[(X - μ)²] = ∫_{-∞}^{∞} (x - μ)² f(x) dx

≥ ∫_{|x - μ| ≥ ε} (x - μ)² f(x) dx

≥ ε² ∫_{|x - μ| ≥ ε} f(x) dx = ε² P(|X - μ| ≥ ε).

Thus, P(|X - μ| ≥ ε) ≤ σ²/ε².

For example, taking k = 2, P(|X - μ| ≥ 2σ) ≤ 1/4: the probability that X differs from its mean by 2 standard deviations or more is at most 0.25.
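The bound can be compared with the exact tail probability for a concrete pmf (a Python sketch; the pmf is my own):

```python
# Checking Chebyshev's bound P(|X - mu| >= k*sigma) <= 1/k^2 on a pmf.

import math

f = {-2: 0.1, 0: 0.8, 2: 0.1}
mu = sum(x * p for x, p in f.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in f.items()))

k = 2.0
exact = sum(p for x, p in f.items() if abs(x - mu) >= k * sigma)
bound = 1 / k ** 2

print(exact <= bound)  # True: the exact tail probability obeys the bound
```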
Since S_n/n is the arithmetic mean of X_1, ..., X_n, this theorem states that the probability of the arithmetic mean S_n/n differing from its expected value μ by more than ε approaches zero as n → ∞.
This result is often called the strong law of large numbers and, by contrast, that of Theorem 8 is called the weak law of large numbers. When the "law of large numbers" is referred to without qualification, the weak law is implied.
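The weak law can be seen at work in a small simulation (a Python sketch; the die example, the seed, and the sample sizes are my own choices):

```python
# Simulation sketch of the weak law of large numbers: the sample mean
# S_n/n of fair-die rolls concentrates near mu = 3.5 as n grows.

import random

random.seed(0)

def sample_mean(n):
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in (10, 1000, 100000):
    print(n, sample_mean(n))  # means drift toward 3.5 as n increases
```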
Percentiles
It is often convenient to subdivide the area under a density curve by use of ordinates so that the area to the left of the ordinate is some percentage of the total unit area. The values corresponding to such areas are called percentile values, or briefly percentiles. Thus, for example, the area to the left of the ordinate at x_α in the figure above is α.

For instance, the area to the left of x_{0.10} would be 0.10, or 10%, and x_{0.10} would be called the 10th percentile (also called the first decile). The median would be the 50th percentile (or fifth decile).
Percentiles
Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous random variable X, denoted by η(p), is defined by p = F(η(p)). Thus η(0.75), the 75th percentile, is such that the area under the graph of f(x) to the left of η(0.75) is equal to 0.75.

Example. When we say that an individual's test score was at the 90th percentile of the population, we mean that 90% of all population scores were below that score and 10% were above. Similarly, the 30th percentile is the score that exceeds 30% of all scores and is exceeded by 70% of all scores.
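For a distribution with an invertible cdf, η(p) can be computed in closed form (a Python sketch; the exponential density and the rate `lam` are my own example, not from the text):

```python
# Sketch: the (100p)th percentile eta(p) solves p = F(eta(p)).
# For the exponential density f(x) = lam*exp(-lam*x) on x >= 0,
# F(x) = 1 - exp(-lam*x), so eta(p) = -ln(1 - p)/lam.

import math

lam = 2.0  # assumed rate parameter
F = lambda x: 1 - math.exp(-lam * x)
eta = lambda p: -math.log(1 - p) / lam

x75 = eta(0.75)                    # 75th percentile
print(abs(F(x75) - 0.75) < 1e-12)  # True: F(eta(p)) = p
```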
Recall that the mean, or expectation, of X is

E(X) = Σ_x x f(x)   (discrete variable),

E(X) = ∫_{-∞}^{∞} x f(x) dx   (continuous variable).
Skewness
Often a distribution is not symmetric about any value but instead has
one of its tails longer than the other. If the longer tail occurs to the right,
as in Fig. A, the distribution is said to be skewed to the right, while if
the longer tail occurs to the left, as in Fig. B, it is said to be skewed
to the left. Measures describing this asymmetry are called coefficients
of skewness, or briefly skewness. One such measure is given by:
α_3 = E[(X - μ)³]/σ³ = μ_3/σ³.
[Fig. A: distribution skewed to the right. Fig. B: distribution skewed to the left. Fig. C: distributions with different degrees of peakedness.]
Kurtosis
In some cases a distribution may have its values concentrated
near the mean so that the distribution has a large peak as indicated
by the solid curve of Fig. C. In other cases the distribution may
be relatively flat as in the dashed curve of Fig. C. Measures of
the degree of peakedness of a distribution are called coefficients
of kurtosis, or briefly kurtosis. A measure often used is given by
α_4 = E[(X - μ)⁴]/σ⁴ = μ_4/σ⁴ = E[(X - μ)⁴] / (E[(X - μ)²])².
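Both coefficients are straightforward to compute from a pmf (a Python sketch; the pmf is my own, chosen with a longer right tail so that α_3 > 0):

```python
# Computing the skewness alpha_3 and kurtosis alpha_4 of a small pmf
# from its central moments mu_r = E[(X - mu)^r].

import math

f = {0: 0.6, 1: 0.3, 2: 0.1}
mu = sum(x * p for x, p in f.items())

def central(r):
    return sum((x - mu) ** r * p for x, p in f.items())

sigma = math.sqrt(central(2))
alpha3 = central(3) / sigma ** 3  # skewness: mu_3 / sigma^3
alpha4 = central(4) / sigma ** 4  # kurtosis: mu_4 / sigma^4

print(alpha3 > 0)  # True: the longer right tail gives positive skewness
print(round(alpha3, 3), round(alpha4, 3))
```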