Lecture 3 PDF
The Normal Distribution
Using Statistics
Properties of the Normal Distribution
The Standard Normal Distribution
The Transformation of Normal Random Variables
The Inverse Transformation
The Normal Approximation of Binomial Distributions
Explain the significance of the standard normal distribution.
[Figure: three panels of a discrete probability distribution P(x) plotted against x]

The Normal Probability Distribution
The normal probability density function:
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ for $-\infty < x < \infty$

Properties of the Normal Distribution
• If several independent random variables are normally distributed then their sum will also be normally distributed.
• The mean of the sum will be the sum of all the individual means.
• The variance of the sum will be the sum of all the individual variances (by virtue of the independence).

• If X1, X2, …, Xn are independent normal random variables, then their sum S will also be normally distributed with
• E(S) = E(X1) + E(X2) + … + E(Xn)
• V(S) = V(X1) + V(X2) + … + V(Xn)
• Note: It is the variances that can be added above and not the standard deviations.
Example 4.1: Let X1, X2, and X3 be independent random variables that are normally distributed with means and variances as shown.

      Mean   Variance
X1    10     1
X2    20     2
X3    30     3

• If X1, X2, …, Xn are independent normal random variables, then the random variable Q defined as Q = a1X1 + a2X2 + … + anXn + b will also be normally distributed with
• E(Q) = a1E(X1) + a2E(X2) + … + anE(Xn) + b
• $V(Q) = a_1^2 V(X_1) + a_2^2 V(X_2) + \dots + a_n^2 V(X_n)$
• Note: It is the variances that can be added above and not the standard deviations.
E(Q) = 12 - 2(-5) + 3(8) - 4(10) + 5 = 11
$V(Q) = 4 + (-2)^2(2) + 3^2(5) + (-4)^2(1) = 73$
[Figure: normal probability density functions f(x), f(y), and f(z), with shaded areas P(25 ≤ X ≤ 35), P(47 ≤ Y ≤ 53), and P(-1 ≤ Z ≤ 1)]
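The arithmetic for E(Q) and V(Q) can be checked with a short sketch. The coefficients (1, -2, 3, -4), the constant b = 5, and the individual means and variances below are read off the worked lines for E(Q) and V(Q); the slide that defined Q is not in the text, so treat them as an inferred assumption. The function name is illustrative only.

```python
from math import sqrt

def linear_combination_moments(coeffs, means, variances, b=0.0):
    """Mean and variance of Q = a1*X1 + ... + an*Xn + b for independent X's."""
    mean_q = sum(a * m for a, m in zip(coeffs, means)) + b
    # Variances add with squared coefficients; the constant b adds no variance.
    var_q = sum(a ** 2 * v for a, v in zip(coeffs, variances))
    return mean_q, var_q

# Values inferred from the worked E(Q) and V(Q) lines (assumption).
coeffs = [1, -2, 3, -4]
means = [12, -5, 8, 10]
variances = [4, 2, 5, 1]

mean_q, var_q = linear_combination_moments(coeffs, means, variances, b=5)
sd_q = sqrt(var_q)  # the standard deviation comes from the variance, never by adding SDs
print(mean_q, var_q, round(sd_q, 3))
```

Note that the standard deviation of Q is obtained only at the end, from the summed variance.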
The Standard Normal Distribution
The standard normal random variable, Z, is the normal random variable with mean $\mu = 0$ and standard deviation $\sigma = 1$.
[Figure: Standard Normal Distribution, density f(z) against z from -5 to 5, with μ = 0 and σ = 1]

Finding Probabilities of the Standard Normal Distribution: P(0 ≤ Z ≤ 1.56)
Standard Normal Probabilities

z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
. . .
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
. . .
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

P(0 ≤ Z ≤ 1.56) = 0.4406 (row 1.5, column .06)
Find table area for 2.47:
P(0 < Z < 2.47) = .4932
P(Z < -2.47) = .5 - P(0 < Z < 2.47) = .5 - .4932 = 0.0068

z     ...  .06     .07     .08
2.3   ...  0.4909  0.4911  0.4913
2.4   ...  0.4931  0.4932  0.4934
2.5   ...  0.4948  0.4949  0.4951

1. Find table area for 2.00: F(2) = P(Z ≤ 2.00) = .5 + .4772 = .9772
2. Find table area for 1.00: F(1) = P(Z ≤ 1.00) = .5 + .3413 = .8413
3. P(1 ≤ Z ≤ 2.00) = P(Z ≤ 2.00) - P(Z ≤ 1.00) = .9772 - .8413 = .1359

z     .00     ...
0.9   0.3159  ...
1.0   0.3413  ...
1.1   0.3643  ...
. . .
1.9   0.4713  ...
2.0   0.4772  ...
2.1   0.4821  ...

[Figure: two standard normal densities f(z) against Z from -5 to 5, with the corresponding areas shaded]
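The table lookups above can be cross-checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `table_area` is hypothetical and simply mirrors how the table reports the area between 0 and z.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_area(z):
    """Area between 0 and z, the quantity printed in the table for z >= 0."""
    return Z.cdf(z) - 0.5

p_0_to_156 = table_area(1.56)          # P(0 <= Z <= 1.56)
p_0_to_247 = table_area(2.47)          # P(0 < Z < 2.47)
p_below_neg247 = 0.5 - p_0_to_247      # P(Z < -2.47), using symmetry
p_1_to_2 = Z.cdf(2.0) - Z.cdf(1.0)     # P(1 <= Z <= 2) = F(2) - F(1)

for name, value in [("P(0<=Z<=1.56)", p_0_to_156),
                    ("P(0<Z<2.47)", p_0_to_247),
                    ("P(Z<-2.47)", p_below_neg247),
                    ("P(1<=Z<=2)", p_1_to_2)]:
    print(f"{name} = {value:.4f}")
```

The computed values agree with the four-decimal table entries up to rounding.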
Finding Values of the Standard Normal Random Variable: P(0 ≤ Z ≤ z) = 0.40
To find z such that P(0 ≤ Z ≤ z) = .40:
1. Find a probability as close as possible to .40 in the table of standard normal probabilities.
2. Then determine the value of z from the corresponding row and column.

z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
. . .

P(0 ≤ Z ≤ 1.28) ≈ .40
Also, since P(Z ≤ 0) = .50, P(Z ≤ 1.28) ≈ .90
[Figure: Standard Normal Distribution, density f(z) against Z from -5 to 5; Area to the left of 0 = .50; Area = .40 (.3997); Z = 1.28]

99% Interval around the Mean
To have .99 in the center of the distribution, there should be (1/2)(1-.99) = (1/2)(.01) = .005 in each tail of the distribution, and (1/2)(.99) = .495 in each half of the .99 interval. That is:
$P(0 \le Z \le z_{.005}) = .495$
Look to the table of standard normal probabilities to find that:
$z_{.005} \approx 2.575$
P(-2.575 ≤ Z ≤ 2.575) = .99

z     ...  .04     .05     .06     .07     .08     .09
. . .
2.4   ...  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   ...  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   ...  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
. . .

[Figure: standard normal density f(z) against Z, marked at -z.005 = -2.575 and z.005 = 2.575; Total area in center = .99; Area in center left = .495; Area in right tail = .005; Area in left tail = .005]
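Reading the table "backwards" corresponds to evaluating the inverse of the cumulative distribution function. A minimal sketch, assuming the standard-library `statistics.NormalDist` (Python 3.8 or later), reproduces both lookups; small differences from the table come from its four-decimal rounding.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# P(0 <= Z <= z) = 0.40 is the same as P(Z <= z) = 0.50 + 0.40 = 0.90.
z_40 = Z.inv_cdf(0.5 + 0.40)

# Central area 0.99 leaves 0.005 in each tail, so P(Z <= z_.005) = 0.995.
z_005 = Z.inv_cdf(1 - 0.005)

# Check: the area between -z_.005 and +z_.005 should be 0.99.
central_area = Z.cdf(z_005) - Z.cdf(-z_005)

print(f"z for area .40 above 0: {z_40:.4f}")
print(f"z_.005: {z_005:.4f}")
print(f"central area: {central_area:.4f}")
```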
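Transformation
The transformation of X to Z: $Z = \frac{X - \mu_x}{\sigma_x}$
(1) Subtraction: $(X - \mu_x)$
(2) Division: $(X - \mu_x)/\sigma_x$
[Figure: normal density f(x) against X from 0 to 100 with σ = 10, and the corresponding standard normal density f(z)]

$P(100 \le X \le 180) = P\left(\frac{100 - 160}{30} \le Z \le \frac{180 - 160}{30}\right) = P(-2 \le Z \le .6666)$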
The Transformation of Normal Random Variables
$P(X \le 150) = P\left(Z \le \frac{150 - 127}{22}\right) = P(Z \le 1.045) = 0.5 + 0.3520 = 0.8520$

Using the Normal Transformation
The transformation of X to Z, where a and b are numbers:
$P(X < a) = P\left(Z < \frac{a - \mu}{\sigma}\right)$
$P(X > b) = P\left(Z > \frac{b - \mu}{\sigma}\right)$
$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$

z     ...  .07     .08     .09
. . .
1.1   ...  0.3790  0.3810  0.3830
1.2   ...  0.3980  0.3997  0.4015
1.3   ...  0.4147  0.4162  0.4177
. . .

[Figure: normal density f(x) against X, with x-axis marks at 80, 130, 139.36, and 180]
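The transformation can be sketched in a few lines. The first calculation uses the numbers from the worked line above (x = 150, mean 127, standard deviation 22). The second uses the values 100, 180, 160, and 30 that appear in the transformation example, read here as P(100 ≤ X ≤ 180) for a normal X with mean 160 and standard deviation 30; that reading is an assumption. The helper name `standardize` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def standardize(x, mu, sigma):
    """Transform a value of X into the corresponding z value."""
    return (x - mu) / sigma

# P(X <= 150) for mean 127 and standard deviation 22.
z_150 = standardize(150, 127, 22)
p_below_150 = Z.cdf(z_150)

# The same probability without standardizing first, as a consistency check.
p_direct = NormalDist(127, 22).cdf(150)

# P(100 <= X <= 180) for mean 160 and standard deviation 30 (assumed reading).
z_low = standardize(100, 160, 30)
z_high = standardize(180, 160, 30)
p_100_180 = Z.cdf(z_high) - Z.cdf(z_low)

print(f"z = {z_150:.3f}, P(X <= 150) = {p_below_150:.4f}")
print(f"z from {z_low:.4f} to {z_high:.4f}, P = {p_100_180:.4f}")
```

Standardizing first and using the distribution of X directly give the same probability, which is the point of the transformation.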
Normal Distribution: μ = 5.7, σ = 0.5
[Figure: normal density f(x) against X from 3.2 to 8.2 and standard normal density f(z) against z; Area = 0.49 between the mean and the cut-off, Area = 0.01 in the right tail, z.01 = 2.33]

z     ...  .02     .03     .04
. . .
2.2   ...  0.4868  0.4871  0.4875
2.3   ...  0.4898  0.4901  0.4904
2.4   ...  0.4922  0.4925  0.4927
. . .

$x_{.01} = \mu + z\sigma = 5.7 + (2.33)(0.5) = 6.865$

Normal Distribution: μ = 2450, σ = 400
[Figure: normal density f(x) against X from 1000 to 4000 and standard normal density f(z) against Z from -5 to 5; central area .9500 = .4750 + .4750, with .0250 in each tail, between z = -1.96 and z = 1.96]

z     ...  .05     .06     .07
. . .
1.8   ...  0.4678  0.4686  0.4693
1.9   ...  0.4744  0.4750  0.4756
2.0   ...  0.4798  0.4803  0.4808
. . .
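The inverse transformation x = μ + zσ can be sketched as follows. The first result uses the table value z = 2.33 exactly as in the worked line for μ = 5.7 and σ = 0.5; the others use `statistics.NormalDist.inv_cdf` in place of the table, so they differ slightly from table-based answers. Applying the ±1.96 cut-offs to the μ = 2450, σ = 400 distribution is my own completion of that example, not a result stated in the text.

```python
from statistics import NormalDist

def x_from_z(z, mu, sigma):
    """Inverse transformation: map a z value back to the scale of X."""
    return mu + z * sigma

# Table-based value: area 0.01 in the right tail, z = 2.33.
x_01_table = x_from_z(2.33, 5.7, 0.5)

# Same cut-off computed without table rounding.
x_01_exact = NormalDist(5.7, 0.5).inv_cdf(0.99)

# Central 95% of a normal with mean 2450 and standard deviation 400
# (assumed completion of the second example).
z_025 = NormalDist().inv_cdf(0.975)   # about 1.96
lower = x_from_z(-z_025, 2450, 400)
upper = x_from_z(z_025, 2450, 400)

print(f"x_.01 (table z): {x_01_table:.3f}")
print(f"x_.01 (exact z): {x_01_exact:.3f}")
print(f"central 95%: {lower:.1f} to {upper:.1f}")
```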
Finding Values of a Normal Random Variable, Given a Probability
Normal Distribution: μ = 2450, σ = 400
1. Draw pictures of the normal distribution in question and of the standard normal distribution.
2. Shade the area corresponding to the desired probability.
3. From the table of the standard normal distribution, find the z value or values.
4. Use the transformation from z to x to find the value or values of the original random variable.
[Figure: normal density f(x) and standard normal density f(z), with central area .9500 = .4750 + .4750]

The normal distribution with μ = 3.5 and σ = 1.323 is a close approximation to the binomial with n = 7 and p = 0.50, where $\mu = np$ and $\sigma = \sqrt{np(1-p)}$.
Normal Distribution: μ = 3.5, σ = 1.323: P(x < 4.5) = 0.7749
[Figure: Binomial Distribution: n = 7, p = 0.50, P(x) against X, beside the normal density f(x) against X]
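How close the approximation is can be checked directly. The sketch below assumes that P(x < 4.5) is the continuity-corrected normal counterpart of the binomial probability P(X ≤ 4); that pairing is an interpretation of the slide, not something it states. The function name `binomial_cdf` is illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for a binomial random variable."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

n, p = 7, 0.50
mu = n * p                       # mean of the binomial
sigma = sqrt(n * p * (1 - p))    # standard deviation of the binomial

exact = binomial_cdf(4, n, p)                 # exact P(X <= 4)
approx = NormalDist(mu, sigma).cdf(4.5)       # normal P(X < 4.5)

print(f"mu = {mu}, sigma = {sigma:.3f}")
print(f"exact binomial P(X <= 4) = {exact:.4f}")
print(f"normal approximation P(X < 4.5) = {approx:.4f}")
```

The normal value may differ from 0.7749 in the fourth decimal place, most likely because the slide's figure was obtained from the rounded table.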
Confidence interval

Using Statistics
• Consider the following statements:
$\bar{x} = 550$
• A single-valued estimate that conveys little information about the actual value of the population mean.
We are 99% confident that μ is in the interval [449,551]
• An interval estimate which locates the population mean within a narrow interval, with a high level of confidence.
We are 90% confident that μ is in the interval [400,700]
• An interval estimate which locates the population mean within a broader interval, with a lower level of confidence.

Confidence Interval or Interval Estimate
A confidence interval or interval estimate is a range or interval of numbers believed to include an unknown population parameter. Associated with the interval is a measure of the confidence we have that the interval does indeed contain the parameter of interest.
• A confidence interval or interval estimate has two components:
A range or interval of values
An associated level of confidence
A 95% Interval around the Population Mean
Sampling Distribution of the Mean
Approximately 95% of sample means can be expected to fall within the interval $\left[\mu - 1.96\frac{\sigma}{\sqrt{n}},\ \mu + 1.96\frac{\sigma}{\sqrt{n}}\right]$.
Conversely, about 2.5% can be expected to be above $\mu + 1.96\frac{\sigma}{\sqrt{n}}$ and 2.5% can be expected to be below $\mu - 1.96\frac{\sigma}{\sqrt{n}}$.
So 5% can be expected to fall outside the interval $\left[\mu - 1.96\frac{\sigma}{\sqrt{n}},\ \mu + 1.96\frac{\sigma}{\sqrt{n}}\right]$.
[Figure: sampling distribution of the mean, with 95% in the center and 2.5% in each tail; sample means marked along the axis: 2.5% fall below the interval, 95% fall within the interval, 2.5% fall above the interval]

The 95% Confidence Interval for μ
A 95% confidence interval for μ when σ is known and sampling is done from a normal population, or a large sample is used, is:
$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$
The quantity $1.96\frac{\sigma}{\sqrt{n}}$ is often called the margin of error or the sampling error.
For example, if: n = 25, σ = 20, $\bar{x}$ = 122
A 95% confidence interval:
$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 122 \pm 1.96\frac{20}{\sqrt{25}} = 122 \pm (1.96)(4) = 122 \pm 7.84 = [114.16,\ 129.84]$

$P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) = (1 - \alpha)$
(1 - α)100% Confidence Interval:
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

(1 - α)   α/2     $z_{\alpha/2}$
0.90      0.050   1.645
0.80      0.100   1.282

[Figure: standard normal density f(z) against Z from -5 to 5, marked at $-z_{\alpha/2}$ and $z_{\alpha/2}$]
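The interval formula can be sketched as a small function. It takes the critical value from `statistics.NormalDist.inv_cdf` rather than from the table, so results match the slide's numbers up to rounding of 1.96. The function names are illustrative, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """z value that leaves alpha/2 in each tail, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)   # margin of error
    return xbar - margin, xbar + margin

# The worked example: n = 25, sigma = 20, sample mean 122.
low, high = z_interval(122, 20, 25)
print(f"95% CI: [{low:.2f}, {high:.2f}]")

# Critical values for the two rows of the table.
z_90 = z_critical(0.90)
z_80 = z_critical(0.80)
print(f"z for 90%: {z_90:.3f}, z for 80%: {z_80:.3f}")
```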
The Level of Confidence and the Width of the Confidence Interval
When sampling from the same population, using a fixed sample size, the higher the confidence level, the wider the confidence interval.
[Figure: two standard normal distributions, f(z) against Z from -5 to 5]
80% Confidence Interval: $\bar{x} \pm 1.28\frac{\sigma}{\sqrt{n}}$
95% Confidence Interval: $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$

The Sample Size and the Width of the Confidence Interval
When sampling from the same population, using a fixed confidence level, the larger the sample size, n, the narrower the confidence interval.
[Figure: two sampling distributions of the mean, f(x); 95% Confidence Interval: n = 20 and 95% Confidence Interval: n = 40]
• Shrimmy is planning to invest heavily in the black tiger breed. As part of the decision, the company wants to estimate the average amount of black tiger shrimp a family of four would need per month. A random sample of n = 100 families is obtained, and in this sample the average amount of shrimp in pounds per month is 6.5 and the population standard deviation is known to be 3.2. Construct a 95% confidence interval for the average amount of shrimp consumed by the entire population of families of 4.

If the population standard deviation, σ, is not known, replace σ with the sample standard deviation, s. If the population is normal, the resulting statistic:
$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$
has a t distribution with (n - 1) degrees of freedom.
• The t is a family of bell-shaped and symmetric distributions, one for each number of degrees of freedom.
• The expected value of t is 0.
• For df > 2, the variance of t is df/(df-2). This is greater than 1, but approaches 1 as the number of degrees of freedom increases. The t is flatter and has fatter tails than does the standard normal.
• The t distribution approaches a standard normal as the number of degrees of freedom increases.
[Figure: densities of the standard normal, t with df = 20, and t with df = 10]
Confidence Intervals for μ when σ is Unknown - The t Distribution
A (1 - α)100% confidence interval for μ when σ is not known (assuming a normally distributed population) is given by:
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$
where $t_{\alpha/2}$ is the value of the t distribution with n-1 degrees of freedom that cuts off a tail area of α/2 to its right.

The t Distribution
df   t0.100  t0.050  t0.025  t0.010  t0.005
---  ------  ------  ------  ------  ------
1    3.078   6.314   12.706  31.821  63.657
2    1.886   2.920   4.303   6.965   9.925
3    1.638   2.353   3.182   4.541   5.841
4    1.533   2.132   2.776   3.747   4.604
5    1.476   2.015   2.571   3.365   4.032
6    1.440   1.943   2.447   3.143   3.707
. . .
9    1.383   1.833   2.262   2.821   3.250
10   1.372   1.812   2.228   2.764   3.169
11   1.363   1.796   2.201   2.718   3.106
12   1.356   1.782   2.179   2.681   3.055
. . .
14   1.345   1.761   2.145   2.624   2.977
15   1.341   1.753   2.131   2.602   2.947
16   1.337   1.746   2.120   2.583   2.921
17   1.333   1.740   2.110   2.567   2.898
18   1.330   1.734   2.101   2.552   2.878
19   1.328   1.729   2.093   2.539   2.861
20   1.325   1.725   2.086   2.528   2.845
21   1.323   1.721   2.080   2.518   2.831
22   1.321   1.717   2.074   2.508   2.819

[Figure: t Distribution: df = 10, density f(t) against t, marked at ±1.372 and ±2.228; Area = 0.025 in each tail beyond ±2.228]
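A t-based interval has the same shape as the z-based one, with the critical value taken from the t table. The sketch below hard-codes a few t0.025 entries copied from the table (df = 9, 10, 14, 20) because the Python standard library has no t distribution. The sample figures (mean 10.0, s = 2.0, n = 15) are hypothetical and serve only to show the mechanics.

```python
from math import sqrt

# A few t_{0.025} critical values copied from the t table, keyed by df.
T_025 = {9: 2.262, 10: 2.228, 14: 2.145, 20: 2.086}

def t_interval_95(xbar, s, n):
    """95% confidence interval for the mean when sigma is unknown.

    Assumes a normally distributed population and that df = n - 1
    is one of the entries stored in T_025.
    """
    t = T_025[n - 1]
    margin = t * s / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: n = 15 observations, mean 10.0, standard deviation 2.0.
low, high = t_interval_95(xbar=10.0, s=2.0, n=15)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

With df = 14 the critical value 2.145 is larger than 1.96, so the interval is wider than the corresponding z interval would be.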
Large Sample Confidence Intervals for the Population Mean
Example: An environmental scientist wants to estimate the average amount of NOx in a given region. A random sample of 100 data points gives x-bar = 357.60 ppm and s = 140.00 ppm. Give a 95% confidence interval for μ, the average amount of NOx in any sample taken.
$\bar{x} \pm z_{0.025}\frac{s}{\sqrt{n}} = 357.60 \pm 1.96\frac{140.00}{\sqrt{100}} = 357.60 \pm 27.44 = [330.16,\ 385.04]$
Exercise 1 Exercise 2
Large-Sample Confidence Intervals for the Population Proportion, p
Example
A marketing research firm wants to estimate the share that foreign companies have in the American market for certain products. A random sample of 100 consumers is obtained, and it is found that 34 people in the sample are users of foreign-made products; the rest are users of domestic products. Give a 95% confidence interval for the share of foreign products in this market.
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{100}} = 0.34 \pm (1.96)(0.04737) = 0.34 \pm 0.0928 = [0.2472,\ 0.4328]$
Thus, the firm may be 95% confident that foreign manufacturers control anywhere from 24.72% to 43.28% of the market.
Exercise 3
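The proportion interval from the example can be reproduced with a short sketch. The critical value comes from `statistics.NormalDist.inv_cdf`, so the endpoints agree with the slide up to rounding of 1.96; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# 34 of 100 sampled consumers use foreign-made products.
low, high = proportion_interval(34, 100)
print(f"95% CI: [{low:.4f}, {high:.4f}]")
```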
Confidence Intervals for the Population Variance: The Chi-Square ($\chi^2$) Distribution
• The sample variance, $s^2$, is an unbiased estimator of the population variance, $\sigma^2$.
The chi-square distribution is the probability distribution of the sum of several independent, squared standard normal random variables.
The variance of a chi-square random variable is equal to twice the number of degrees of freedom, ($V[\chi^2] = 2df$).
In sampling from a normal population, the random variable:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
has a chi-square distribution with (n - 1) degrees of freedom.

The Chi-Square ($\chi^2$) Distribution
The chi-square random variable cannot be negative.
The chi-square distribution approaches a normal distribution as the degrees of freedom increase.
[Figure: Chi-Square Distribution: df = 10, df = 30, df = 50; density f($\chi^2$) against $\chi^2$]
Example (continued)
Area in Right Tail
df   .995   .990   .975   .950   .900   .100   .050   .025   .010   .005
. . .
28   12.46  13.56  15.31  16.93  18.94  37.92  41.34  44.46  48.28  50.99
29   13.12  14.26  16.05  17.71  19.77  39.09  42.56  45.72  49.59  52.34
30   13.79  14.95  16.79  18.49  20.60  40.26  43.77  46.98  50.89  53.67

[Figure: Chi-Square Distribution: df = 29, density f($\chi^2$) against $\chi^2$ from 0 to 70; area 0.95 in the center and 0.025 in each tail; $\chi^2_{0.975} = 16.05$, $\chi^2_{0.025} = 45.72$]

Sample-Size Determination
Before determining the necessary sample size, three questions must be answered:
• How close do you want your sample estimate to be to the unknown parameter? (What is the desired bound, B?)
• What do you want the desired confidence level (1-α) to be so that the distance between your estimate and the parameter is less than or equal to B?
• What is your estimate of the variance (or standard deviation) of the population in question?

The sample size determines the bound of a statistic, since the standard error of a statistic shrinks as the sample size increases:
[Figure: sampling distributions of a statistic for Sample size = n and Sample size = 2n, each marked with the standard error of the statistic and the bound, B]
Minimum Sample Size: Mean and Proportion
Minimum required sample size in estimating the population mean, μ:
$n = \frac{z_{\alpha/2}^2 \sigma^2}{B^2}$
Bound of estimate:
$B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Minimum required sample size in estimating the population proportion, p:
$n = \frac{z_{\alpha/2}^2\, pq}{B^2}$

Example
A microbiologist wants to conduct an experiment to estimate the average amount of micro-organisms in the water of a popular river. He plans to determine the average amount of micro-organisms to within 120 µg/ml, with 95% confidence. From past records, an estimate of the population standard deviation is s = 400 µg/ml. What is the minimum required sample size?
$n = \frac{z_{\alpha/2}^2 \sigma^2}{B^2} = \frac{(1.96)^2 (400)^2}{120^2} = 42.684 \approx 43$
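Both sample-size formulas can be sketched as functions that round up to the next whole observation. The mean case uses the microbiologist example (σ estimated as 400, B = 120, 95% confidence). The proportion case uses hypothetical inputs (p = 0.5, B = 0.05) that do not come from the lecture. Function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

def z_half_alpha(confidence):
    """z value leaving (1 - confidence)/2 in the upper tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def min_n_mean(sigma, bound, confidence=0.95):
    """Minimum sample size to estimate a mean to within +/- bound."""
    z = z_half_alpha(confidence)
    return ceil(z ** 2 * sigma ** 2 / bound ** 2)

def min_n_proportion(p, bound, confidence=0.95):
    """Minimum sample size to estimate a proportion to within +/- bound."""
    z = z_half_alpha(confidence)
    return ceil(z ** 2 * p * (1 - p) / bound ** 2)

# Example from the text: sigma about 400 ug/ml, bound 120 ug/ml, 95% confidence.
n_mean = min_n_mean(400, 120)

# Hypothetical proportion case: guess p = 0.5, bound 0.05, 95% confidence.
n_prop = min_n_proportion(0.5, 0.05)

print(n_mean, n_prop)
```

Rounding up rather than to the nearest integer keeps the achieved bound at or below the requested B.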