Mathematical Expectation


Topic 3: Mathematical Expectation


Course: Probability and Statistics
Lect. Dr Quang Hưng / Sonnet Nguyen

Contents
o Expectation
o Variance, standard deviation
o Moments
o Conditional expectation, conditional variance, conditional moments
o Chebyshev's inequality
o Weak and Strong Laws of Large Numbers
o Other Measures of Central Tendency
o Other Measures of Dispersion

Expectation
Mathematical Expectation, or Expected Value, or briefly the
Expectation, is a very important concept in probability and statistics.
For a discrete random variable X having the possible values x1, ..., xn,
the expectation of X is defined as

E(X) = x1 P(X = x1) + ... + xn P(X = xn) = Σ_{j=1}^{n} xj P(X = xj),

or equivalently, if P(X = xj) = f(xj), then E(X) = Σ_j xj f(xj),
where the last summation is taken over all appropriate values of xj.



Expectation
As a special case, when the probabilities are all equal we have

E(X) = (1/n) Σ_{j=1}^{n} xj,

which is called the arithmetic mean, or simply the mean, of
x1, x2, ..., xn.
For a continuous random variable X having density function f(x),
the expectation of X is defined as

E(X) = ∫_{-∞}^{∞} x f(x) dx,

provided that the integral converges absolutely.



Expectation
The expectation of X is very often called the mean of X and
is denoted by μ_X, or simply μ, when the particular random
variable is understood.
The mean, or expectation, of X gives a single value that acts
as a representative or average of the values of X, and for this
reason it is often called a measure of central tendency.


Example
Suppose that a game is to be played with a single die assumed fair.
In this game a player wins $20 if a 2 turns up and $40 if a 4 turns up,
loses $30 if a 6 turns up, and neither wins nor loses if any other face
turns up. Find the expected sum of money to be won.
Solution. Each face turns up with probability 1/6, so

E(X) = 0·(1/6) + 20·(1/6) + 0·(1/6) + 40·(1/6) + 0·(1/6) + (−30)·(1/6) = 5.

The expected sum of money to be won is $5.
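To sanity-check this result by simulation, here is a minimal Python sketch (assuming NumPy is available; the seed and sample size are arbitrary choices):

import numpy as np

rng = np.random.default_rng(seed=0)

# payoff[k] is the amount won when face k turns up (index 0 is unused)
payoff = np.array([0, 0, 20, 0, 40, 0, -30])

rolls = rng.integers(1, 7, size=1_000_000)  # one million fair-die rolls
print(payoff[rolls].mean())                 # ≈ 5.0, matching E(X) = 5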

Functions of Random Variables


Let X be a discrete random variable with probability function
f(x). Then Y = g(X) is also a discrete random variable, and
the probability function of Y is

h(y) = P(Y = y) = Σ_{x | g(x) = y} P(X = x) = Σ_{x | g(x) = y} f(x).

If X takes on the values x1, x2, ..., xn, and Y the values
y1, y2, ..., ym (m ≤ n), then

Σ_{i=1}^{m} yi h(yi) = Σ_{j=1}^{n} g(xj) f(xj).

Therefore, E[g(X)] = Σ_{j=1}^{n} g(xj) f(xj) = Σ_x g(x) f(x).


Functions of Random Variables


Similarly, if X is a continuous random variable having probability
density f(x), then it can be shown that

E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx.

Note that E[g(X)] does not involve the probability function or
the probability density function of Y = g(X).
Generalizations are easily made to functions of two or more
random variables. For example, if X and Y are two continuous
random variables having joint density function f(x, y), then the
expectation of g(X, Y) is given by

E[g(X, Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) f(x, y) dx dy.
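Expectations of the form E[g(X)] can be evaluated by numerical quadrature. A small sketch, assuming SciPy is available, taking g(x) = x² with X standard normal as an illustrative choice:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

g = lambda x: x**2   # the function whose expectation we want
f = norm.pdf         # density of X ~ N(0, 1)

# E[g(X)] = ∫ g(x) f(x) dx over the whole real line
value, abs_err = quad(lambda x: g(x) * f(x), -np.inf, np.inf)
print(value)  # ≈ 1.0, since E[X²] = Var(X) + [E(X)]² = 1 for N(0, 1)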


Theorem on Expectation
Theorem 1:
a1) If c is any constant, then E(cX) = c E(X).
a2) If X and Y are any random variables, then E(X + Y) = E(X) + E(Y).
a3) If X and Y are independent random variables, then E(XY) = E(X) E(Y).
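These properties are easy to check empirically. A minimal sketch, assuming NumPy; the distributions and parameters are illustrative:

import numpy as np

rng = np.random.default_rng(seed=1)
X = rng.exponential(scale=2.0, size=1_000_000)      # E(X) = 2
Y = rng.normal(loc=3.0, scale=1.0, size=1_000_000)  # E(Y) = 3, independent of X

print((5 * X).mean(), 5 * X.mean())         # a1: both ≈ 10
print((X + Y).mean(), X.mean() + Y.mean())  # a2: both ≈ 5
print((X * Y).mean(), X.mean() * Y.mean())  # a3: both ≈ 6, by independence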


The Variance and Standard Deviation

Another quantity of great importance is called
the variance and is defined by

Var(X) = E[(X − μ_X)²] = E[(X − E(X))²].

The variance is a nonnegative number.
The positive square root of the variance is called
the standard deviation and is given by

σ_X = √Var(X) = √E[(X − μ_X)²].

The Variance and Standard Deviation

If X is a discrete random variable taking the values x1, x2, ..., xn and
having probability function f(x), then the variance is given by

σ²_X = E[(X − μ_X)²] = Σ_{j=1}^{n} (xj − μ_X)² f(xj).

If X takes on an infinite number of values x1, x2, ..., then

σ²_X = Σ_{j=1}^{∞} (xj − μ_X)² f(xj), provided that the series converges.

If X is a continuous random variable having density function f(x),
then the variance is given by

σ²_X = E[(X − μ_X)²] = ∫_{-∞}^{∞} (x − μ_X)² f(x) dx,

provided that the integral converges.



What does the variance measure?


The variance (or the standard deviation) is a measure of the dispersion, or
scatter, of the values of the random variable about the mean μ. If the values
tend to be concentrated near the mean, the variance is small, while if the
values tend to be distributed far from the mean, the variance is large. The
situation is indicated graphically in the figure for the case of two
continuous distributions having the same mean μ.


Units of Variance and Standard Deviation

Note that if X has certain dimensions or units, such as cm,
then the variance of X has units cm² while the standard
deviation has the same unit as X, i.e., cm.
For this reason, the standard deviation is often used.
When no confusion can result, the standard deviation is
often denoted by σ instead of σ_X, and the variance in
such case is σ².


Theorems on Variance
Theorem 2:
a1) σ²_X = E[(X − μ_X)²] = E(X²) − μ²_X = E(X²) − [E(X)]².
a2) If c is any constant, then Var(cX) = c² Var(X).
a3) The quantity E[(X − a)²] is a minimum when a = μ = E(X).
a4) If X and Y are independent random variables,
Var(X ± Y) = Var(X) + Var(Y), or σ²_{X±Y} = σ²_X + σ²_Y.
In words, the variance of a sum of independent variables equals
the sum of their variances.
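A quick numerical check of a1), a2), and a4); again a sketch assuming NumPy, with illustrative distributions:

import numpy as np

rng = np.random.default_rng(seed=2)
X = rng.gamma(shape=2.0, scale=1.5, size=1_000_000)  # Var(X) = 2·1.5² = 4.5
Y = rng.uniform(0, 6, size=1_000_000)                # Var(Y) = 36/12 = 3, independent

print(X.var(), (X**2).mean() - X.mean()**2)  # a1: the shortcut formula
print((3 * X).var(), 9 * X.var())            # a2: Var(3X) = 9 Var(X)
print((X + Y).var(), X.var() + Y.var())      # a4: both ≈ 7.5, by independence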


Standardized Random Variables


Let X be a random variable with mean μ and standard deviation σ
(σ > 0). Then we can define an associated standardized random
variable given by

X* = (X − μ)/σ.

An important property of X* is that it has a mean of zero and a
variance of 1, which accounts for the name standardized, i.e.,
E(X*) = 0, Var(X*) = 1. The values of a standardized variable are
sometimes called standard scores, and X is then said to be expressed
in standard units (i.e., σ is taken as the unit in measuring X − μ).
Standardized variables are useful for comparing different distributions.
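For instance, a standardization sketch (assuming NumPy; the sample mean and standard deviation stand in for μ and σ):

import numpy as np

rng = np.random.default_rng(seed=3)
X = rng.normal(loc=50, scale=10, size=1_000_000)  # e.g., exam scores

X_star = (X - X.mean()) / X.std()  # X* = (X − μ)/σ

print(X_star.mean())  # ≈ 0
print(X_star.var())   # ≈ 1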

Moments
The rth moment of a random variable X about the
mean μ, also called the rth central moment, is
defined as μ_r = E[(X − μ)^r], where r = 0, 1, 2, ....
It follows that μ_0 = 1, μ_1 = 0, and μ_2 = σ², i.e., the
second central moment, or second moment about
the mean, is the variance.


Moments
If X is a discrete random variable taking the values x1, ..., xn
and having probability function f(x), then the rth moment is

μ_r = E[(X − μ)^r] = Σ_{j=1}^{n} (xj − μ)^r f(xj).

If X takes on an infinite number of values x1, x2, ..., then

μ_r = Σ_{j=1}^{∞} (xj − μ)^r f(xj), provided that the series converges.

If X is a continuous random variable having density function f(x), then

μ_r = E[(X − μ)^r] = ∫_{-∞}^{∞} (x − μ)^r f(x) dx, provided that the
integral converges.

Raw Moments, or Moments about the Origin

The rth moment of X about the origin, also called the rth raw
moment, is defined as μ'_r = E(X^r), where r = 0, 1, 2, ....
The relationship between these moments is given by

μ_r = μ'_r − C(r,1) μ'_{r−1} μ + ... + (−1)^j C(r,j) μ'_{r−j} μ^j + ... + (−1)^r μ'_0 μ^r,

where C(r,j) denotes the binomial coefficient. As special cases we have,
using μ'_1 = μ and μ'_0 = 1,

μ_2 = μ'_2 − μ²,
μ_3 = μ'_3 − 3μ μ'_2 + 2μ³,
μ_4 = μ'_4 − 4μ μ'_3 + 6μ² μ'_2 − 3μ⁴.
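These conversion formulas can be verified numerically; a sketch assuming NumPy, with the exponential distribution as an arbitrary test case:

import numpy as np

rng = np.random.default_rng(seed=4)
X = rng.exponential(scale=1.0, size=2_000_000)

mu = X.mean()
raw = [np.mean(X**r) for r in range(5)]             # μ'_0, ..., μ'_4
central = [np.mean((X - mu)**r) for r in range(5)]  # μ_0, ..., μ_4

# Direct central moments vs. the raw-moment formulas above:
print(central[2], raw[2] - mu**2)
print(central[3], raw[3] - 3*mu*raw[2] + 2*mu**3)
print(central[4], raw[4] - 4*mu*raw[3] + 6*mu**2*raw[2] - 3*mu**4)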



Moment Generating Functions


The moment generating function of X is defined by
M_X(t) = E(e^{tX}), that is, assuming convergence,

M_X(t) = Σ_x e^{tx} f(x)             (discrete variable),

M_X(t) = ∫_{-∞}^{∞} e^{tx} f(x) dx   (continuous variable).


Moment Generating Functions


Homework. Show that the Taylor series expansion of the
moment generating function is

M_X(t) = Σ_{r=0}^{∞} μ'_r t^r / r! = 1 + μ t + μ'_2 t²/2! + ... + μ'_r t^r/r! + ...,

where μ'_r = [d^r M_X(t) / dt^r]_{t=0} is the rth derivative of M_X(t)
evaluated at t = 0.
Note: Since the coefficients in this expansion enable us to
find the moments, the reason for the name moment generating
function is apparent.
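As a worked instance of this homework, one can differentiate an MGF symbolically. A sketch assuming SymPy, with the exponential distribution (rate λ = 3) as an illustrative choice; its MGF is λ/(λ − t) for t < λ:

import sympy as sp

t = sp.symbols('t')
lam = sp.Rational(3)
M = lam / (lam - t)  # MGF of an Exp(λ = 3) random variable

# μ'_r is the rth derivative of M evaluated at t = 0; for Exp(λ), μ'_r = r!/λ^r
for r in range(1, 4):
    print(r, sp.diff(M, t, r).subs(t, 0))  # prints 1/3, 2/9, 2/9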

Characteristic Functions
Putting t = iω, where i is the imaginary unit, in the moment
generating function, we obtain an important function called the
characteristic function. We denote this by

φ_X(ω) = M_X(iω) = E(e^{iωX}).

It follows that, assuming convergence,

φ_X(ω) = Σ_x e^{iωx} f(x)             (discrete variable),

φ_X(ω) = ∫_{-∞}^{∞} e^{iωx} f(x) dx   (continuous variable).

Since |e^{iωx}| = 1, the series and the integral always converge
absolutely.

Characteristic Functions
Homework. Show that the Taylor series expansion of the
characteristic function is

φ_X(ω) = Σ_{r=0}^{∞} μ'_r (iω)^r / r! = 1 + μ(iω) + μ'_2 (iω)²/2! + ... + μ'_r (iω)^r/r! + ...,

where μ'_r = (−i)^r [d^r φ_X(ω) / dω^r]_{ω=0}.

Theorems on Moment Generating Functions and Characteristic Functions

Theorem 3: If M_X(t) is the moment generating function
of the random variable X and a and b (b ≠ 0) are constants,
then the moment generating function of (X + a)/b is

M_{(X+a)/b}(t) = e^{at/b} M_X(t/b).

Similarly, for the characteristic function φ_X(ω),

φ_{(X+a)/b}(ω) = e^{aiω/b} φ_X(ω/b).

Theorems on Moment Generating Functions and Characteristic Functions

Theorem 4: If X and Y are independent random variables having
moment generating functions M_X(t) and M_Y(t), respectively, then

M_{X+Y}(t) = M_X(t) M_Y(t).

In words, the moment generating function of a sum of
independent random variables is equal to the product of their
moment generating functions.
Similarly, for the characteristic functions φ_X(ω) and φ_Y(ω) of two
independent random variables X and Y,

φ_{X+Y}(ω) = φ_X(ω) φ_Y(ω).
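Theorem 4 can be checked empirically by comparing the empirical MGF of a sum with the product of the individual empirical MGFs. A sketch assuming NumPy; X normal and Y exponential are illustrative, and t must be chosen where both MGFs exist:

import numpy as np

rng = np.random.default_rng(seed=5)
X = rng.normal(0, 1, size=1_000_000)
Y = rng.exponential(1.0, size=1_000_000)  # independent of X

t = 0.5  # need t < 1 for the exponential MGF to exist

lhs = np.exp(t * (X + Y)).mean()                   # empirical M_{X+Y}(t)
rhs = np.exp(t * X).mean() * np.exp(t * Y).mean()  # M_X(t) · M_Y(t)
print(lhs, rhs)  # both ≈ e^{t²/2}/(1 − t) ≈ 2.2663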

Theorems on Moment Generating Functions and Characteristic Functions

Theorem 5 (Uniqueness Theorem):
Suppose that X and Y are random variables having moment
generating functions M_X(t) and M_Y(t), respectively. Then X
and Y have the same probability distribution if and only if
M_X(t) = M_Y(t) identically.
Similarly, a probability distribution is uniquely determined by
its characteristic function. Thus, suppose that X and Y are
random variables having characteristic functions φ_X(ω) and
φ_Y(ω), respectively; then X and Y have the same probability
distribution if and only if φ_X(ω) = φ_Y(ω) identically.

Relation between the Density Function and the Characteristic Function

Note: An important reason for introducing the characteristic
function is that it represents the Fourier transform of the density
function f(x). From the theory of Fourier transforms, we can
easily determine the density function from the characteristic
function. In fact,

f(x) = (1/2π) ∫_{-∞}^{∞} e^{-iωx} φ_X(ω) dω,

which is often called an inversion formula, or inverse Fourier
transform.
Another reason for using the characteristic function is that it
always exists, whereas the moment generating function may not exist.

Variance for Joint Distributions


The results given above for one variable can be extended to two or
more variables. For example, if X and Y are two continuous random
variables having joint density function f(x, y), the means, or
expectations, of X and Y are

μ_X = E(X) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x f(x, y) dx dy,

μ_Y = E(Y) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} y f(x, y) dx dy,

and the variances are

σ²_X = E[(X − μ_X)²] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (x − μ_X)² f(x, y) dx dy,

σ²_Y = E[(Y − μ_Y)²] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (y − μ_Y)² f(x, y) dx dy.


Covariance
Another quantity that arises in the case of two variables X and Y is the
covariance, defined by

σ_XY = Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)].

In terms of the joint density function f(x, y), we have

σ_XY = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (x − μ_X)(y − μ_Y) f(x, y) dx dy.

Similar remarks can be made for two discrete random variables:

μ_X = Σ_x Σ_y x f(x, y),    μ_Y = Σ_x Σ_y y f(x, y),

σ_XY = Σ_x Σ_y (x − μ_X)(y − μ_Y) f(x, y),

where the sums are taken over all the discrete values of X and Y.

Properties of Covariance
Theorem 6:
a1) σ_XY = E(XY) − E(X) E(Y) = E(XY) − μ_X μ_Y.
a2) If X and Y are independent random variables, then
σ_XY = Cov(X, Y) = 0.
a3) Var(X ± Y) = Var(X) + Var(Y) ± 2 Cov(X, Y),
or σ²_{X±Y} = σ²_X + σ²_Y ± 2σ_XY.
a4) |σ_XY| ≤ σ_X σ_Y.


Correlation Coefficient
If X and Y are independent, then Cov(X, Y) = σ_XY = 0. On the other
hand, if X and Y are completely dependent, e.g., when X = Y, then
Cov(X, Y) = σ_XY = σ_X σ_Y. From this we are led to a measure of the
dependence of the variables X and Y given by

ρ = σ_XY / (σ_X σ_Y).

We call ρ the correlation coefficient, or coefficient of correlation.
From property a4) of covariance we see that −1 ≤ ρ ≤ 1. In the
case where ρ = 0 (i.e., zero covariance), we call the variables X
and Y uncorrelated. In such cases (ρ = 0) the variables may or may
not be independent.
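The last remark is worth a demonstration: zero correlation does not imply independence. A sketch assuming NumPy, with Y = X² for a symmetric X as the classic example:

import numpy as np

rng = np.random.default_rng(seed=6)
X = rng.uniform(-1, 1, size=1_000_000)
Y = X**2  # completely determined by X, yet uncorrelated with it

cov = np.mean((X - X.mean()) * (Y - Y.mean()))
rho = cov / (X.std() * Y.std())
print(cov, rho)  # both ≈ 0, since E(X³) = E(X) = 0 by symmetry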

Conditional Expectation, Variance, and Moments

If X and Y have joint density function f(x, y), then, as we have
seen in Topic 2, the conditional density function of Y given X is

f(y | x) = f(x, y) / f1(x),

where f1(x) is the marginal density function of X.
We can define the conditional expectation, or conditional
mean, of Y given X by

E(Y | X = x) = ∫_{-∞}^{∞} y f(y | x) dy,

where "X = x" is to be interpreted as x < X ≤ x + dx in the
continuous case.



Conditional Expectation, Variance, and Moments

Properties similar to (a1) and (a2) of Theorem 1 also hold for
conditional expectation. We note the following properties:
1. E(Y | X = x) = E(Y) when X and Y are independent.
2. E(Y) = ∫_{-∞}^{∞} E(Y | X = x) f1(x) dx.
It is often convenient to calculate expectations by use of
Property 2, rather than directly.


Example
The average travel time to a distant city
is c hours by car or b hours by bus.
A woman cannot decide whether to drive
or take the bus, so she tosses a coin.
What is her expected travel time?


Example - Solution
Solution. Here we are dealing with the joint distribution of the
outcome of the toss, X, and the travel time, Y, where
Y = Y_car if X = 0 and Y = Y_bus if X = 1.
Presumably, both Y_car and Y_bus are independent of X, so that by
Property 1 above: E(Y | X = 0) = E(Y_car | X = 0) = E(Y_car) = c, and
E(Y | X = 1) = E(Y_bus | X = 1) = E(Y_bus) = b.
Then Property 2 (with the integral replaced by a sum) gives, for a
fair coin,

E(Y) = E(Y | X = 0) P(X = 0) + E(Y | X = 1) P(X = 1) = (c + b)/2.
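A simulation of this example, sketched with NumPy; the concrete values c = 2 and b = 3 and the fluctuation model are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(seed=7)
c, b = 2.0, 3.0  # assumed average times: 2 h by car, 3 h by bus
n = 1_000_000

coin = rng.integers(0, 2, size=n)      # 0 = car, 1 = bus (fair coin)
time_car = rng.normal(c, 0.3, size=n)  # times fluctuating about their means
time_bus = rng.normal(b, 0.3, size=n)

travel = np.where(coin == 0, time_car, time_bus)
print(travel.mean())  # ≈ (c + b)/2 = 2.5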

Conditional Variance and Conditional Moments

We can define the conditional variance of Y given X by

E[(Y − μ_2)² | X = x] = ∫_{-∞}^{∞} (y − μ_2)² f(y | x) dy,

where μ_2 = E(Y | X = x).
We can also define the rth conditional moment of Y about any
value a given X as

E[(Y − a)^r | X = x] = ∫_{-∞}^{∞} (y − a)^r f(y | x) dy.

The usual theorems for variance and moments extend to
conditional variance and moments.

Chebyshev's Inequality
An important theorem in probability and statistics that reveals
a general property of discrete or continuous random variables
having finite mean and variance is known under the name of
Chebyshev's inequality.
Theorem 7 (Chebyshev's Inequality):
Suppose that X is a random variable (discrete or continuous)
having mean μ and variance σ², which are finite. Then if ε is
any positive number,

P(|X − μ| ≥ ε) ≤ σ²/ε²,  or, with ε = kσ,  P(|X − μ| ≥ kσ) ≤ 1/k².

Proof of Chebyshev's Inequality

We present the proof for a continuous r.v.; a proof for a
discrete r.v. is similar. If f(x) is the density function
of X, then

σ² = E[(X − μ)²] = ∫_{-∞}^{∞} (x − μ)² f(x) dx

≥ ∫_{|x−μ| ≥ ε} (x − μ)² f(x) dx

≥ ε² ∫_{|x−μ| ≥ ε} f(x) dx = ε² P(|X − μ| ≥ ε).

Thus, P(|X − μ| ≥ ε) ≤ σ²/ε².


Example of Chebyshev's Inequality

Example: Letting k = 2 in Chebyshev's inequality, we see that

P(|X − μ| ≥ 2σ) ≤ 1/2² = 0.25, or P(|X − μ| < 2σ) ≥ 0.75.

In words, the probability of X differing from its mean by
more than 2 standard deviations is less than or equal to 0.25;
equivalently, the probability that X will lie within 2 standard
deviations of its mean is greater than or equal to 0.75.
This is remarkable because we have not even specified the
probability distribution of X.
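The distribution-free character of the bound is easy to see empirically; a sketch assuming NumPy, with three arbitrary distributions:

import numpy as np

rng = np.random.default_rng(seed=8)
samples = {
    "exponential": rng.exponential(1.0, size=1_000_000),
    "uniform": rng.uniform(0, 1, size=1_000_000),
    "normal": rng.normal(0, 1, size=1_000_000),
}

k = 2.0
for name, x in samples.items():
    frac = np.mean(np.abs(x - x.mean()) >= k * x.std())
    print(name, frac, "<=", 1 / k**2)  # every fraction is ≤ 0.25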

Weak Law of Large Numbers

The following theorem, called the law of large numbers, is an
interesting consequence of Chebyshev's inequality.
Theorem 8 (Law of Large Numbers): Let X1, X2, ..., Xn be
mutually independent random variables (discrete or continuous),
each having finite mean μ and variance σ².
If Sn = X1 + X2 + ... + Xn for n = 1, 2, ..., then for any ε > 0,

lim_{n→∞} P(|Sn/n − μ| ≥ ε) = 0.

Since Sn/n is the arithmetic mean of X1, ..., Xn, this theorem
states that the probability of the arithmetic mean Sn/n differing
from its expected value μ by more than ε approaches zero as n → ∞.
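The theorem can be watched in action with fair-die rolls (μ = 3.5); a sketch assuming NumPy:

import numpy as np

rng = np.random.default_rng(seed=9)
mu = 3.5  # mean of a fair die

rolls = rng.integers(1, 7, size=100_000)
running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)  # Sn/n

for n in [10, 100, 10_000, 100_000]:
    print(n, abs(running_mean[n - 1] - mu))  # |Sn/n − μ| shrinks as n grows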


Weak and Strong Laws of Large Numbers

A stronger result, which we might expect to be true, is that
lim_{n→∞} Sn/n = μ, but this is actually false. However, we can prove
that lim_{n→∞} Sn/n = μ with probability one, i.e.,

P(lim_{n→∞} Sn/n = μ) = 1.

This result is often called the strong law of large numbers, and,
by contrast, that of Theorem 8 is called the weak law of large
numbers. When the "law of large numbers" is referred to
without qualification, the weak law is implied.

Other Measures of Central Tendency
As we have already seen, the mean, or expectation,
of a random variable X provides a measure of
central tendency for the values of a distribution.
Although the mean is used most often, two other measures
of central tendency are also employed. These are
the mode and the median.


Other Measures of Central Tendency: MODE
1. MODE. The mode of a discrete random variable is that value
which occurs most often or, in other words, has the greatest
probability of occurring.
Sometimes we have two, three, or more values that have
relatively large probabilities of occurrence. In such cases, we
say that the distribution is bimodal, trimodal, or multimodal,
respectively.
The mode of a continuous random variable X is the value
(or values) of X where the probability density function has a
relative maximum.

Other Measures of Central Tendency: MEDIAN

2. MEDIAN. The median is that value m for which

P(X < m) ≤ 1/2 and P(X > m) ≤ 1/2.

In the case of a continuous distribution we have
P(X < m) = 0.5 = P(X > m), or equivalently F(m) = 0.5,
and the median separates the density curve into two parts
having equal areas of 1/2 each. In the case of a discrete
distribution a unique median may not exist.

Percentiles
It is often convenient to subdivide the area under a density curve
by use of ordinates so that the area to the left of the ordinate is
some percentage of the total unit area. The values corresponding
to such areas are called percentile values, or briefly percentiles.
Thus, for example, the area to the left of the ordinate at x_α is α.
For instance, the area to the left of x_{0.10} would be 0.10, or 10%,
and x_{0.10} would be called the 10th percentile (also called the first
decile). The median would be the 50th percentile (or fifth decile).

Percentiles

For any (0,1), there exists a real number x such that S = ,


where S is the area under the curve from - to x . Formally,
= S = f ( x) dx = F ( x ).
x


Percentiles
Let α be a number between 0 and 1. The (100α)th percentile
of the distribution of a continuous random variable X, denoted
by η(α), is defined by α = F(η(α)).
Thus η(0.75), the 75th percentile, is such that the area under
the graph of f(x) to the left of η(0.75) is equal to 0.75.
Example. When we say that an individual's test score was at
the 90th percentile of the population, we mean that 90% of all
population scores were below that score and 10% were above.
Similarly, the 30th percentile is the score that exceeds 30% of all
scores and is exceeded by 70% of all scores.
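Percentiles are routine to compute; a sketch assuming NumPy and SciPy, for the standard normal distribution:

import numpy as np
from scipy.stats import norm

# 75th percentile of N(0, 1): the value eta with F(eta) = 0.75
eta = norm.ppf(0.75)  # inverse CDF
print(eta)            # ≈ 0.6745
print(norm.cdf(eta))  # 0.75, recovering α

# The empirical percentile of a large sample agrees
sample = np.random.default_rng(seed=10).normal(0, 1, size=1_000_000)
print(np.percentile(sample, 75))  # ≈ 0.6745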

Other Measures of Dispersion


Just as there are various measures of central tendency besides the mean,
there are various measures of dispersion or scatter of a random variable
besides the variance or standard deviation. Some of the most common are:
1. SEMI-INTERQUARTILE RANGE. If x_{0.25} and x_{0.75} represent the 25th
and 75th percentile values, the difference x_{0.75} − x_{0.25} is called the
interquartile range, and (x_{0.75} − x_{0.25})/2 is the semi-interquartile range.
2. MEAN DEVIATION. The mean deviation (M.D.) of a random variable
X is defined as the expectation of |X − μ|, i.e., assuming convergence,

M.D.(X) = E(|X − μ|) = Σ_x |x − μ| f(x)             (discrete variable),

M.D.(X) = E(|X − μ|) = ∫_{-∞}^{∞} |x − μ| f(x) dx   (continuous variable).


Skewness
Often a distribution is not symmetric about any value but instead has
one of its tails longer than the other. If the longer tail occurs to the right,
as in Fig. A, the distribution is said to be skewed to the right, while if
the longer tail occurs to the left, as in Fig. B, it is said to be skewed
to the left. Measures describing this asymmetry are called coefficients
of skewness, or briefly skewness. One such measure is given by

α_3 = E[(X − μ)³]/σ³ = μ_3/σ³.

The measure α_3 will be positive or negative according to whether the
distribution is skewed to the right or left, respectively. For a symmetric
distribution, α_3 = 0.

Skewness and Kurtosis

(Figures referenced in the text: Fig. A, a distribution skewed to the right;
Fig. B, a distribution skewed to the left; Fig. C, a peaked and a flat
distribution with the same mean.)


Kurtosis
In some cases a distribution may have its values concentrated
near the mean so that the distribution has a large peak, as indicated
by the solid curve of Fig. C. In other cases the distribution may
be relatively flat, as in the dashed curve of Fig. C. Measures of
the degree of peakedness of a distribution are called coefficients
of kurtosis, or briefly kurtosis. A measure often used is given by

α_4 = E[(X − μ)⁴]/σ⁴ = μ_4 / (E[(X − μ)²])².

This is usually compared with the normal curve (discussed in the
next topic), which has a coefficient of kurtosis equal to 3.
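Both coefficients are straightforward to estimate from data; a sketch assuming NumPy, with the normal and exponential distributions as illustrative cases:

import numpy as np

rng = np.random.default_rng(seed=11)

def skewness(x):  # α_3 = E[(X − μ)³]/σ³
    return np.mean((x - x.mean())**3) / x.std()**3

def kurtosis(x):  # α_4 = E[(X − μ)⁴]/σ⁴ (not excess kurtosis)
    return np.mean((x - x.mean())**4) / x.std()**4

normal = rng.normal(0, 1, size=1_000_000)
expo = rng.exponential(1.0, size=1_000_000)  # skewed to the right

print(skewness(normal), kurtosis(normal))  # ≈ 0 and ≈ 3
print(skewness(expo), kurtosis(expo))      # ≈ 2 and ≈ 9 (exact for Exp)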