Mathematical Expectation


Topic 3: Mathematical Expectation


Course: Probability and Statistics
Lect. Dr Quang Hưng / Sonnet Nguyen

Contents
o Expectation
o Variance, standard deviation
o Moments
o Conditional expectation, conditional variance, conditional moments
o Chebyshev's inequality
o Weak and Strong Laws of Large Numbers
o Other Measures of Central Tendency
o Other Measures of Dispersion

Expectation
Mathematical Expectation, or Expected Value, or briefly the
Expectation, is a very important concept in probability and statistics.
For a discrete random variable X having the possible values x1, ..., xn,
the expectation of X is defined as

E(X) = x1 P(X = x1) + ... + xn P(X = xn) = Σ_{j=1}^{n} xj P(X = xj),

or equivalently, if P(X = xj) = f(xj), then E(X) = Σ_j xj f(xj),
where the last summation is taken over all appropriate values of xj.



Expectation
As a special case, when the probabilities are all equal we have

E(X) = (1/n) Σ_{j=1}^{n} xj,

which is called the arithmetic mean, or simply the mean, of
x1, x2, ..., xn.
For a continuous random variable X having density function f(x),
the expectation of X is defined as

E(X) = ∫_{-∞}^{∞} x f(x) dx,

provided that the integral converges absolutely.



Expectation
The expectation of X is very often called the mean of X and
is denoted by μ_X, or simply μ, when the particular random
variable is understood.
The mean, or expectation, of X gives a single value that acts
as a representative or average of the values of X, and for this
reason it is often called a measure of central tendency.


Example
Suppose that a game is to be played with a single die assumed fair.
In this game a player wins $20 if a 2 turns up and $40 if a 4 turns up,
loses $30 if a 6 turns up, and neither wins nor loses if any other face
turns up. Find the expected sum of money to be won.
Solution. Each face turns up with probability 1/6, so

E(X) = 0·(1/6) + 20·(1/6) + 0·(1/6) + 40·(1/6) + 0·(1/6) + (−30)·(1/6) = 5.

The expected sum of money to be won is $5.
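To sanity-check this result by simulation, here is a minimal Python sketch (assuming NumPy is available; the seed and sample size are arbitrary choices):

import numpy as np

rng = np.random.default_rng(seed=0)

# payoff[k] is the amount won when face k turns up (index 0 is unused)
payoff = np.array([0, 0, 20, 0, 40, 0, -30])

rolls = rng.integers(1, 7, size=1_000_000)  # one million fair-die rolls
print(payoff[rolls].mean())                 # ≈ 5.0, matching E(X) = 5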

Functions of Random Variables


Let X be a discrete random variable with probability function
f(x). Then Y = g(X) is also a discrete random variable, and
the probability function of Y is

h(y) = P(Y = y) = Σ_{x | g(x) = y} P(X = x) = Σ_{x | g(x) = y} f(x).

If X takes on the values x1, x2, ..., xn, and Y the values
y1, y2, ..., ym (m ≤ n), then

Σ_{i=1}^{m} yi h(yi) = Σ_{j=1}^{n} g(xj) f(xj).

Therefore, E[g(X)] = Σ_{j=1}^{n} g(xj) f(xj) = Σ_x g(x) f(x).


Functions of Random Variables


Similarly, if X is a continuous random variable having probability
density f(x), then it can be shown that

E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx.

Note that E[g(X)] does not involve the probability function or
the probability density function of Y = g(X).
Generalizations are easily made to functions of two or more
random variables. For example, if X and Y are two continuous
random variables having joint density function f(x, y), then the
expectation of g(X, Y) is given by

E[g(X, Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) f(x, y) dx dy.
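Expectations of the form E[g(X)] can be evaluated by numerical quadrature. A small sketch, assuming SciPy is available, taking g(x) = x² with X standard normal as an illustrative choice:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

g = lambda x: x**2   # the function whose expectation we want
f = norm.pdf         # density of X ~ N(0, 1)

# E[g(X)] = ∫ g(x) f(x) dx over the whole real line
value, abs_err = quad(lambda x: g(x) * f(x), -np.inf, np.inf)
print(value)  # ≈ 1.0, since E[X²] = Var(X) + [E(X)]² = 1 for N(0, 1)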


Theorem on Expectation
Theorem 1:
a1) If c is any constant, then E(cX) = c E(X).
a2) If X and Y are any random variables, then E(X + Y) = E(X) + E(Y).
a3) If X and Y are independent random variables, then E(XY) = E(X) E(Y).
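These properties are easy to check empirically. A minimal sketch, assuming NumPy; the distributions and parameters are illustrative:

import numpy as np

rng = np.random.default_rng(seed=1)
X = rng.exponential(scale=2.0, size=1_000_000)      # E(X) = 2
Y = rng.normal(loc=3.0, scale=1.0, size=1_000_000)  # E(Y) = 3, independent of X

print((5 * X).mean(), 5 * X.mean())         # a1: both ≈ 10
print((X + Y).mean(), X.mean() + Y.mean())  # a2: both ≈ 5
print((X * Y).mean(), X.mean() * Y.mean())  # a3: both ≈ 6, by independence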


The Variance and Standard Deviation

Another quantity of great importance is called
the variance and is defined by

Var(X) = E[(X − μ_X)²] = E[(X − E(X))²].

The variance is a nonnegative number.
The positive square root of the variance is called
the standard deviation and is given by

σ_X = √Var(X) = √E[(X − μ_X)²].

The Variance and Standard Deviation

If X is a discrete random variable taking the values x1, x2, ..., xn and
having probability function f(x), then the variance is given by

σ²_X = E[(X − μ_X)²] = Σ_{j=1}^{n} (xj − μ_X)² f(xj).

If X takes on an infinite number of values x1, x2, ..., then

σ²_X = Σ_{j=1}^{∞} (xj − μ_X)² f(xj), provided that the series converges.

If X is a continuous random variable having density function f(x),
then the variance is given by

σ²_X = E[(X − μ_X)²] = ∫_{-∞}^{∞} (x − μ_X)² f(x) dx,

provided that the integral converges.



What does the variance measure?


The variance (or the standard deviation) is a measure of the dispersion, or
scatter, of the values of the random variable about the mean μ. If the values
tend to be concentrated near the mean, the variance is small, while if the
values tend to be distributed far from the mean, the variance is large. The
situation is indicated graphically in the figure for the case of two
continuous distributions having the same mean μ.


Units of Variance and Standard Deviation

Note that if X has certain dimensions or units, such as cm,
then the variance of X has units cm² while the standard
deviation has the same unit as X, i.e., cm.
For this reason, the standard deviation is often used.
When no confusion can result, the standard deviation is
often denoted by σ instead of σ_X, and the variance in
such case is σ².


Theorems on Variance
Theorem 2:
a1) σ²_X = E[(X − μ_X)²] = E(X²) − μ²_X = E(X²) − [E(X)]².
a2) If c is any constant, then Var(cX) = c² Var(X).
a3) The quantity E[(X − a)²] is a minimum when a = μ = E(X).
a4) If X and Y are independent random variables,
Var(X ± Y) = Var(X) + Var(Y), or σ²_{X±Y} = σ²_X + σ²_Y.
In words, the variance of a sum of independent variables equals
the sum of their variances.
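A quick numerical check of a1), a2), and a4); again a sketch assuming NumPy, with illustrative distributions:

import numpy as np

rng = np.random.default_rng(seed=2)
X = rng.gamma(shape=2.0, scale=1.5, size=1_000_000)  # Var(X) = 2·1.5² = 4.5
Y = rng.uniform(0, 6, size=1_000_000)                # Var(Y) = 36/12 = 3, independent

print(X.var(), (X**2).mean() - X.mean()**2)  # a1: the shortcut formula
print((3 * X).var(), 9 * X.var())            # a2: Var(3X) = 9 Var(X)
print((X + Y).var(), X.var() + Y.var())      # a4: both ≈ 7.5, by independence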


Standardized Random Variables


Let X be a random variable with mean μ and standard deviation σ
(σ > 0). Then we can define an associated standardized random
variable given by

X* = (X − μ)/σ.

An important property of X* is that it has a mean of zero and a
variance of 1, which accounts for the name standardized, i.e.,
E(X*) = 0, Var(X*) = 1. The values of a standardized variable are
sometimes called standard scores, and X is then said to be expressed
in standard units (i.e., σ is taken as the unit in measuring X − μ).
Standardized variables are useful for comparing different distributions.
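For instance, a standardization sketch (assuming NumPy; the sample mean and standard deviation stand in for μ and σ):

import numpy as np

rng = np.random.default_rng(seed=3)
X = rng.normal(loc=50, scale=10, size=1_000_000)  # e.g., exam scores

X_star = (X - X.mean()) / X.std()  # X* = (X − μ)/σ

print(X_star.mean())  # ≈ 0
print(X_star.var())   # ≈ 1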

Moments
The rth moment of a random variable X about the
mean μ, also called the rth central moment, is
defined as μ_r = E[(X − μ)^r], where r = 0, 1, 2, ....
It follows that μ_0 = 1, μ_1 = 0, and μ_2 = σ², i.e., the
second central moment, or second moment about
the mean, is the variance.


Moments
If X is a discrete random variable taking the values x1, ..., xn
and having probability function f(x), then the rth moment is

μ_r = E[(X − μ)^r] = Σ_{j=1}^{n} (xj − μ)^r f(xj).

If X takes on an infinite number of values x1, x2, ..., then

μ_r = Σ_{j=1}^{∞} (xj − μ)^r f(xj), provided that the series converges.

If X is a continuous random variable having density function f(x), then

μ_r = E[(X − μ)^r] = ∫_{-∞}^{∞} (x − μ)^r f(x) dx, provided that the
integral converges.

Raw Moments, or Moments about the Origin

The rth moment of X about the origin, also called the rth raw
moment, is defined as μ'_r = E(X^r), where r = 0, 1, 2, ....
The relationship between these moments is given by

μ_r = μ'_r − C(r,1) μ'_{r−1} μ + ... + (−1)^j C(r,j) μ'_{r−j} μ^j + ... + (−1)^r μ'_0 μ^r,

where C(r,j) denotes the binomial coefficient. As special cases we have,
using μ'_1 = μ and μ'_0 = 1,

μ_2 = μ'_2 − μ²,
μ_3 = μ'_3 − 3μ μ'_2 + 2μ³,
μ_4 = μ'_4 − 4μ μ'_3 + 6μ² μ'_2 − 3μ⁴.
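These conversion formulas can be verified numerically; a sketch assuming NumPy, with the exponential distribution as an arbitrary test case:

import numpy as np

rng = np.random.default_rng(seed=4)
X = rng.exponential(scale=1.0, size=2_000_000)

mu = X.mean()
raw = [np.mean(X**r) for r in range(5)]             # μ'_0, ..., μ'_4
central = [np.mean((X - mu)**r) for r in range(5)]  # μ_0, ..., μ_4

# Direct central moments vs. the raw-moment formulas above:
print(central[2], raw[2] - mu**2)
print(central[3], raw[3] - 3*mu*raw[2] + 2*mu**3)
print(central[4], raw[4] - 4*mu*raw[3] + 6*mu**2*raw[2] - 3*mu**4)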



Moment Generating Functions


The moment generating function of X is defined by
M_X(t) = E(e^{tX}), that is, assuming convergence,

M_X(t) = Σ_x e^{tx} f(x)             (discrete variable),

M_X(t) = ∫_{-∞}^{∞} e^{tx} f(x) dx   (continuous variable).


Moment Generating Functions


Homework. Show that the Taylor series expansion of the
moment generating function is

M_X(t) = Σ_{r=0}^{∞} μ'_r t^r / r! = 1 + μ t + μ'_2 t²/2! + ... + μ'_r t^r/r! + ...,

where μ'_r = [d^r M_X(t) / dt^r]_{t=0} is the rth derivative of M_X(t)
evaluated at t = 0.
Note: Since the coefficients in this expansion enable us to
find the moments, the reason for the name moment generating
function is apparent.
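As a worked instance of this homework, one can differentiate an MGF symbolically. A sketch assuming SymPy, with the exponential distribution (rate λ = 3) as an illustrative choice; its MGF is λ/(λ − t) for t < λ:

import sympy as sp

t = sp.symbols('t')
lam = sp.Rational(3)
M = lam / (lam - t)  # MGF of an Exp(λ = 3) random variable

# μ'_r is the rth derivative of M evaluated at t = 0; for Exp(λ), μ'_r = r!/λ^r
for r in range(1, 4):
    print(r, sp.diff(M, t, r).subs(t, 0))  # prints 1/3, 2/9, 2/9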

Characteristic Functions
Putting t = iω, where i is the imaginary unit, in the moment
generating function, we obtain an important function called the
characteristic function. We denote this by

φ_X(ω) = M_X(iω) = E(e^{iωX}).

It follows that, assuming convergence,

φ_X(ω) = Σ_x e^{iωx} f(x)             (discrete variable),

φ_X(ω) = ∫_{-∞}^{∞} e^{iωx} f(x) dx   (continuous variable).

Since |e^{iωx}| = 1, the series and the integral always converge
absolutely.

Characteristic Functions
Homework. Show that the Taylor series expansion of the
characteristic function is

φ_X(ω) = Σ_{r=0}^{∞} μ'_r (iω)^r / r! = 1 + μ(iω) + μ'_2 (iω)²/2! + ... + μ'_r (iω)^r/r! + ...,

where μ'_r = (−i)^r [d^r φ_X(ω) / dω^r]_{ω=0}.

Theorems on Moment Generating Functions and Characteristic Functions

Theorem 3: If M_X(t) is the moment generating function
of the random variable X and a and b (b ≠ 0) are constants,
then the moment generating function of (X + a)/b is

M_{(X+a)/b}(t) = e^{at/b} M_X(t/b).

Similarly, for the characteristic function φ_X(ω),

φ_{(X+a)/b}(ω) = e^{aiω/b} φ_X(ω/b).

Theorems on Moment Generating Functions and Characteristic Functions

Theorem 4: If X and Y are independent random variables having
moment generating functions M_X(t) and M_Y(t), respectively, then

M_{X+Y}(t) = M_X(t) M_Y(t).

In words, the moment generating function of a sum of
independent random variables is equal to the product of their
moment generating functions.
Similarly, for the characteristic functions φ_X(ω) and φ_Y(ω) of two
independent random variables X and Y,

φ_{X+Y}(ω) = φ_X(ω) φ_Y(ω).
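Theorem 4 can be checked empirically by comparing the empirical MGF of a sum with the product of the individual empirical MGFs. A sketch assuming NumPy; X normal and Y exponential are illustrative, and t must be chosen where both MGFs exist:

import numpy as np

rng = np.random.default_rng(seed=5)
X = rng.normal(0, 1, size=1_000_000)
Y = rng.exponential(1.0, size=1_000_000)  # independent of X

t = 0.5  # need t < 1 for the exponential MGF to exist

lhs = np.exp(t * (X + Y)).mean()                   # empirical M_{X+Y}(t)
rhs = np.exp(t * X).mean() * np.exp(t * Y).mean()  # M_X(t) · M_Y(t)
print(lhs, rhs)  # both ≈ e^{t²/2}/(1 − t) ≈ 2.2663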

Theorems on Moment Generating Functions and Characteristic Functions

Theorem 5 (Uniqueness Theorem):
Suppose that X and Y are random variables having moment
generating functions M_X(t) and M_Y(t), respectively. Then X
and Y have the same probability distribution if and only if
M_X(t) = M_Y(t) identically.
Similarly, a probability distribution is uniquely determined by
its characteristic function. Thus, suppose that X and Y are
random variables having characteristic functions φ_X(ω) and
φ_Y(ω), respectively; then X and Y have the same probability
distribution if and only if φ_X(ω) = φ_Y(ω) identically.

Relation between the Density Function and the Characteristic Function

Note: An important reason for introducing the characteristic
function is that it represents the Fourier transform of the density
function f(x). From the theory of Fourier transforms, we can
easily determine the density function from the characteristic
function. In fact,

f(x) = (1/2π) ∫_{-∞}^{∞} e^{-iωx} φ_X(ω) dω,

which is often called an inversion formula, or inverse Fourier
transform.
Another reason for using the characteristic function is that it
always exists, whereas the moment generating function may not exist.

Variance for Joint Distributions


The results given above for one variable can be extended to two or
more variables. For example, if X and Y are two continuous random
variables having joint density function f(x, y), the means, or
expectations, of X and Y are

μ_X = E(X) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x f(x, y) dx dy,

μ_Y = E(Y) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} y f(x, y) dx dy,

and the variances are

σ²_X = E[(X − μ_X)²] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (x − μ_X)² f(x, y) dx dy,

σ²_Y = E[(Y − μ_Y)²] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (y − μ_Y)² f(x, y) dx dy.


Covariance
Another quantity that arises in the case of two variables X and Y is the
covariance, defined by

σ_XY = Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)].

In terms of the joint density function f(x, y), we have

σ_XY = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (x − μ_X)(y − μ_Y) f(x, y) dx dy.

Similar remarks can be made for two discrete random variables:

μ_X = Σ_x Σ_y x f(x, y),    μ_Y = Σ_x Σ_y y f(x, y),

σ_XY = Σ_x Σ_y (x − μ_X)(y − μ_Y) f(x, y),

where the sums are taken over all the discrete values of X and Y.

Properties of Covariance
Theorem 6:
a1) σ_XY = E(XY) − E(X) E(Y) = E(XY) − μ_X μ_Y.
a2) If X and Y are independent random variables, then
σ_XY = Cov(X, Y) = 0.
a3) Var(X ± Y) = Var(X) + Var(Y) ± 2 Cov(X, Y),
or σ²_{X±Y} = σ²_X + σ²_Y ± 2σ_XY.
a4) |σ_XY| ≤ σ_X σ_Y.


Correlation Coefficient
If X and Y are independent, then Cov(X, Y) = σ_XY = 0. On the other
hand, if X and Y are completely dependent, e.g., when X = Y, then
Cov(X, Y) = σ_XY = σ_X σ_Y. From this we are led to a measure of the
dependence of the variables X and Y given by

ρ = σ_XY / (σ_X σ_Y).

We call ρ the correlation coefficient, or coefficient of correlation.
From property a4) of covariance we see that −1 ≤ ρ ≤ 1. In the
case where ρ = 0 (i.e., zero covariance), we call the variables X
and Y uncorrelated. In such cases (ρ = 0) the variables may or may
not be independent.
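The last remark is worth a demonstration: zero correlation does not imply independence. A sketch assuming NumPy, with Y = X² for a symmetric X as the classic example:

import numpy as np

rng = np.random.default_rng(seed=6)
X = rng.uniform(-1, 1, size=1_000_000)
Y = X**2  # completely determined by X, yet uncorrelated with it

cov = np.mean((X - X.mean()) * (Y - Y.mean()))
rho = cov / (X.std() * Y.std())
print(cov, rho)  # both ≈ 0, since E(X³) = E(X) = 0 by symmetry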

Conditional Expectation, Variance, and Moments

If X and Y have joint density function f(x, y), then, as we have
seen in Topic 2, the conditional density function of Y given X is

f(y | x) = f(x, y) / f1(x),

where f1(x) is the marginal density function of X.
We can define the conditional expectation, or conditional
mean, of Y given X by

E(Y | X = x) = ∫_{-∞}^{∞} y f(y | x) dy,

where "X = x" is to be interpreted as x < X ≤ x + dx in the
continuous case.



Conditional Expectation, Variance, and Moments

Properties similar to (a1) and (a2) of Theorem 1 also hold for
conditional expectation. We note the following properties:
1. E(Y | X = x) = E(Y) when X and Y are independent.
2. E(Y) = ∫_{-∞}^{∞} E(Y | X = x) f1(x) dx.
It is often convenient to calculate expectations by use of
Property 2, rather than directly.


Example
The average travel time to a distant city
is c hours by car or b hours by bus.
A woman cannot decide whether to drive
or take the bus, so she tosses a coin.
What is her expected travel time?


Example - Solution
Solution. Here we are dealing with the joint distribution of the
outcome of the toss, X, and the travel time, Y, where
Y = Y_car if X = 0 and Y = Y_bus if X = 1.
Presumably, both Y_car and Y_bus are independent of X, so that by
Property 1 above: E(Y | X = 0) = E(Y_car | X = 0) = E(Y_car) = c, and
E(Y | X = 1) = E(Y_bus | X = 1) = E(Y_bus) = b.
Then Property 2 (with the integral replaced by a sum) gives, for a
fair coin,

E(Y) = E(Y | X = 0) P(X = 0) + E(Y | X = 1) P(X = 1) = (c + b)/2.
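A simulation of this example, sketched with NumPy; the concrete values c = 2 and b = 3 and the fluctuation model are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(seed=7)
c, b = 2.0, 3.0  # assumed average times: 2 h by car, 3 h by bus
n = 1_000_000

coin = rng.integers(0, 2, size=n)      # 0 = car, 1 = bus (fair coin)
time_car = rng.normal(c, 0.3, size=n)  # times fluctuating about their means
time_bus = rng.normal(b, 0.3, size=n)

travel = np.where(coin == 0, time_car, time_bus)
print(travel.mean())  # ≈ (c + b)/2 = 2.5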

Conditional Variance and Conditional Moments

We can define the conditional variance of Y given X by

E[(Y − μ_2)² | X = x] = ∫_{-∞}^{∞} (y − μ_2)² f(y | x) dy,

where μ_2 = E(Y | X = x).
We can also define the rth conditional moment of Y about any
value a given X as

E[(Y − a)^r | X = x] = ∫_{-∞}^{∞} (y − a)^r f(y | x) dy.

The usual theorems for variance and moments extend to
conditional variance and moments.

Chebyshev's Inequality
An important theorem in probability and statistics that reveals
a general property of discrete or continuous random variables
having finite mean and variance is known under the name of
Chebyshev's inequality.
Theorem 7 (Chebyshev's Inequality):
Suppose that X is a random variable (discrete or continuous)
having mean μ and variance σ², which are finite. Then if ε is
any positive number,

P(|X − μ| ≥ ε) ≤ σ²/ε²,  or, with ε = kσ,  P(|X − μ| ≥ kσ) ≤ 1/k².

Proof of Chebyshev's Inequality

We present the proof for a continuous r.v.; a proof for a
discrete r.v. is similar. If f(x) is the density function
of X, then

σ² = E[(X − μ)²] = ∫_{-∞}^{∞} (x − μ)² f(x) dx

≥ ∫_{|x−μ| ≥ ε} (x − μ)² f(x) dx

≥ ε² ∫_{|x−μ| ≥ ε} f(x) dx = ε² P(|X − μ| ≥ ε).

Thus, P(|X − μ| ≥ ε) ≤ σ²/ε².


Example of Chebyshev's Inequality

Example: Letting k = 2 in Chebyshev's inequality, we see that

P(|X − μ| ≥ 2σ) ≤ 1/2² = 0.25, or P(|X − μ| < 2σ) ≥ 0.75.

In words, the probability of X differing from its mean by
more than 2 standard deviations is less than or equal to 0.25;
equivalently, the probability that X will lie within 2 standard
deviations of its mean is greater than or equal to 0.75.
This is remarkable because we have not even specified the
probability distribution of X.
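The distribution-free character of the bound is easy to see empirically; a sketch assuming NumPy, with three arbitrary distributions:

import numpy as np

rng = np.random.default_rng(seed=8)
samples = {
    "exponential": rng.exponential(1.0, size=1_000_000),
    "uniform": rng.uniform(0, 1, size=1_000_000),
    "normal": rng.normal(0, 1, size=1_000_000),
}

k = 2.0
for name, x in samples.items():
    frac = np.mean(np.abs(x - x.mean()) >= k * x.std())
    print(name, frac, "<=", 1 / k**2)  # every fraction is ≤ 0.25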

Weak Law of Large Numbers

The following theorem, called the law of large numbers, is an
interesting consequence of Chebyshev's inequality.
Theorem 8 (Law of Large Numbers): Let X1, X2, ..., Xn be
mutually independent random variables (discrete or continuous),
each having finite mean μ and variance σ².
If Sn = X1 + X2 + ... + Xn for n = 1, 2, ..., then for any ε > 0,

lim_{n→∞} P(|Sn/n − μ| ≥ ε) = 0.

Since Sn/n is the arithmetic mean of X1, ..., Xn, this theorem
states that the probability of the arithmetic mean Sn/n differing
from its expected value μ by more than ε approaches zero as n → ∞.
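The theorem can be watched in action with fair-die rolls (μ = 3.5); a sketch assuming NumPy:

import numpy as np

rng = np.random.default_rng(seed=9)
mu = 3.5  # mean of a fair die

rolls = rng.integers(1, 7, size=100_000)
running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)  # Sn/n

for n in [10, 100, 10_000, 100_000]:
    print(n, abs(running_mean[n - 1] - mu))  # |Sn/n − μ| shrinks as n grows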


Weak and Strong Laws of Large Numbers

A stronger result, which we might expect to be true, is that
lim_{n→∞} Sn/n = μ, but this is actually false. However, we can prove
that lim_{n→∞} Sn/n = μ with probability one, i.e.,

P(lim_{n→∞} Sn/n = μ) = 1.

This result is often called the strong law of large numbers, and,
by contrast, that of Theorem 8 is called the weak law of large
numbers. When the "law of large numbers" is referred to
without qualification, the weak law is implied.

Other Measures of Central Tendency
As we have already seen, the mean, or expectation,
of a random variable X provides a measure of
central tendency for the values of a distribution.
Although the mean is used most often, two other measures
of central tendency are also employed. These are
the mode and the median.


Other Measures of Central Tendency: MODE
1. MODE. The mode of a discrete random variable is that value
which occurs most often or, in other words, has the greatest
probability of occurring.
Sometimes we have two, three, or more values that have
relatively large probabilities of occurrence. In such cases, we
say that the distribution is bimodal, trimodal, or multimodal,
respectively.
The mode of a continuous random variable X is the value
(or values) of X where the probability density function has a
relative maximum.

Other Measures of Central Tendency: MEDIAN

2. MEDIAN. The median is that value m for which

P(X < m) ≤ 1/2 and P(X > m) ≤ 1/2.

In the case of a continuous distribution we have
P(X < m) = 0.5 = P(X > m), or equivalently F(m) = 0.5,
and the median separates the density curve into two parts
having equal areas of 1/2 each. In the case of a discrete
distribution a unique median may not exist.

Percentiles
It is often convenient to subdivide the area under a density curve
by use of ordinates so that the area to the left of the ordinate is
some percentage of the total unit area. The values corresponding
to such areas are called percentile values, or briefly percentiles.
Thus, for example, the area to the left of the ordinate at x_α is α.
For instance, the area to the left of x_{0.10} would be 0.10, or 10%,
and x_{0.10} would be called the 10th percentile (also called the first
decile). The median would be the 50th percentile (or fifth decile).

Percentiles

For any (0,1), there exists a real number x such that S = ,


where S is the area under the curve from - to x . Formally,
= S = f ( x) dx = F ( x ).
x


Percentiles
Let α be a number between 0 and 1. The (100α)th percentile
of the distribution of a continuous random variable X, denoted
by η(α), is defined by α = F(η(α)).
Thus η(0.75), the 75th percentile, is such that the area under
the graph of f(x) to the left of η(0.75) is equal to 0.75.
Example. When we say that an individual's test score was at
the 90th percentile of the population, we mean that 90% of all
population scores were below that score and 10% were above.
Similarly, the 30th percentile is the score that exceeds 30% of all
scores and is exceeded by 70% of all scores.
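Percentiles are routine to compute; a sketch assuming NumPy and SciPy, for the standard normal distribution:

import numpy as np
from scipy.stats import norm

# 75th percentile of N(0, 1): the value eta with F(eta) = 0.75
eta = norm.ppf(0.75)  # inverse CDF
print(eta)            # ≈ 0.6745
print(norm.cdf(eta))  # 0.75, recovering α

# The empirical percentile of a large sample agrees
sample = np.random.default_rng(seed=10).normal(0, 1, size=1_000_000)
print(np.percentile(sample, 75))  # ≈ 0.6745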

Other Measures of Dispersion


Just as there are various measures of central tendency besides the mean,
there are various measures of dispersion or scatter of a random variable
besides the variance or standard deviation. Some of the most common are:
1. SEMI-INTERQUARTILE RANGE. If x_{0.25} and x_{0.75} represent the 25th
and 75th percentile values, the difference x_{0.75} − x_{0.25} is called the
interquartile range, and (x_{0.75} − x_{0.25})/2 is the semi-interquartile range.
2. MEAN DEVIATION. The mean deviation (M.D.) of a random variable
X is defined as the expectation of |X − μ|, i.e., assuming convergence,

M.D.(X) = E(|X − μ|) = Σ_x |x − μ| f(x)             (discrete variable),

M.D.(X) = E(|X − μ|) = ∫_{-∞}^{∞} |x − μ| f(x) dx   (continuous variable).


Skewness
Often a distribution is not symmetric about any value but instead has
one of its tails longer than the other. If the longer tail occurs to the right,
as in Fig. A, the distribution is said to be skewed to the right, while if
the longer tail occurs to the left, as in Fig. B, it is said to be skewed
to the left. Measures describing this asymmetry are called coefficients
of skewness, or briefly skewness. One such measure is given by

α_3 = E[(X − μ)³]/σ³ = μ_3/σ³.

The measure α_3 will be positive or negative according to whether the
distribution is skewed to the right or left, respectively. For a symmetric
distribution, α_3 = 0.

Skewness and Kurtosis

(Figures referenced in the text: Fig. A, a distribution skewed to the right;
Fig. B, a distribution skewed to the left; Fig. C, a peaked and a flat
distribution with the same mean.)


Kurtosis
In some cases a distribution may have its values concentrated
near the mean so that the distribution has a large peak, as indicated
by the solid curve of Fig. C. In other cases the distribution may
be relatively flat, as in the dashed curve of Fig. C. Measures of
the degree of peakedness of a distribution are called coefficients
of kurtosis, or briefly kurtosis. A measure often used is given by

α_4 = E[(X − μ)⁴]/σ⁴ = μ_4 / (E[(X − μ)²])².

This is usually compared with the normal curve (discussed in the
next topic), which has a coefficient of kurtosis equal to 3.
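Both coefficients are straightforward to estimate from data; a sketch assuming NumPy, with the normal and exponential distributions as illustrative cases:

import numpy as np

rng = np.random.default_rng(seed=11)

def skewness(x):  # α_3 = E[(X − μ)³]/σ³
    return np.mean((x - x.mean())**3) / x.std()**3

def kurtosis(x):  # α_4 = E[(X − μ)⁴]/σ⁴ (not excess kurtosis)
    return np.mean((x - x.mean())**4) / x.std()**4

normal = rng.normal(0, 1, size=1_000_000)
expo = rng.exponential(1.0, size=1_000_000)  # skewed to the right

print(skewness(normal), kurtosis(normal))  # ≈ 0 and ≈ 3
print(skewness(expo), kurtosis(expo))      # ≈ 2 and ≈ 9 (exact for Exp)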