CHAPTER 6 STATISTICS
NORMAL DISTRIBUTION
Importance of Normal distribution
❏ Physicians often rely on a knowledge of normal limits to classify
patients as healthy or otherwise.
❏ For example, a serum cholesterol level above 200 mg/dL is
widely regarded as indicating a significantly increased risk for
coronary heart disease.
❏ The normal distribution is the basis for the use of inferential
statistics.
❏ It is a symmetrical probability distribution in which most results are
located in the middle and few are spread out on both sides.
Importance of Normal distribution
❏ Examples:
✓ The body temperature of healthy humans.
✓ The heights and weights of adults.
✓ IQ and standardized test scores.
✓ Quality control test results.
✓ Errors in measurements.
❏ Why? The normal distribution is used to illustrate the shape and variability of data,
and normality is an important assumption when conducting statistical
analysis.
Normal distribution properties
✓ The wider the curve, the larger the standard deviation and the more variation
exists in the process.
✓ The normal curve is the graphical representation of the normal distribution.
✓ It is determined by the mean (μ) and the standard deviation (σ).
[Figure: normal curve plotted against X, centered at μ with spread σ]
Normal distribution properties
✓ The normal distribution helps in calculating probabilities for normally distributed populations.
✓ The probabilities are represented by the area under the normal curve.
✓ The total area under the curve is equal to 100% (or 1.00).
[Figure: normal curve f(X) plotted against X, centered at μ with spread σ; total probability = 100%]
Normal distribution properties
❏ Empirical Rule: for any normally distributed data:
• About 68% of the data fall within 1 standard deviation of the
mean (the area between µ−σ and µ+σ).
• About 95% of the data fall within 2 standard deviations of the
mean (the area between µ−2σ and µ+2σ).
• About 99.7% of the data fall within 3 standard deviations of
the mean (the area between µ−3σ and µ+3σ).
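The three percentages of the Empirical Rule can be checked numerically. The sketch below is only an illustration, assuming Python and its standard `math.erf`; the function name `fraction_within` is hypothetical and not part of the slides.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal population lying within k standard deviations of the mean."""
    # For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2)).
    # The result does not depend on mu or sigma.
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {fraction_within(k):.4f}")
```

The printed values are approximately 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.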
The Normal Distribution: as a mathematical function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
❏ Where f(x) is the height of the curve for a given value of x
This is a bell-shaped curve with different centers and
spreads depending on µ and σ.
Note constants:
π = 3.14159
e = 2.71828
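As a rough illustration of the density function, the following sketch evaluates f(x) directly and checks numerically that the area under the curve is close to 1. It assumes Python; the helper names `normal_pdf` and `area_under_curve`, the ±8σ integration range and the trapezoid rule are choices made for this example, not part of the slides.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu: float, sigma: float, steps: int = 10_000) -> float:
    """Trapezoid-rule approximation of the area between mu - 8*sigma and mu + 8*sigma."""
    lo, hi = mu - 8 * sigma, mu + 8 * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

# Example parameters (assumed for illustration): mu = 500, sigma = 100.
print(normal_pdf(500, 500, 100))   # peak height, 1 / (sigma * sqrt(2*pi))
print(area_under_curve(500, 100))  # very close to 1.0
```

Changing `mu` shifts the curve left or right, while changing `sigma` makes it wider or narrower; the total area stays at 1 in both cases.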
Standard normal distribution
❏ It is common practice to convert any normal distribution to the standardized form and
then use the standard normal table to find probabilities.
❏ The Standard Normal Distribution (Z distribution) is a way of standardizing the
normal distribution.
❏ It always has a mean of 0 and a standard deviation of 1.
❏ The total area under the curve is 1.
Standard normal distribution
✓ Any normally distributed data can be converted to the standardized form using the formula:
$$Z = \frac{x - \mu}{\sigma}$$
where:
✓ 'x' is the data point in question.
✓ 'Z' (or Z-score) is a measure of the number of standard deviations of that data point from the mean.
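The standardization formula translates directly into code. This is a minimal sketch in Python; the function name `z_score` is hypothetical, and the values µ = 500 and σ = 100 are the SAT parameters used in the examples of this chapter.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

print(z_score(650, 500, 100))  # 1.5  (one and a half SDs above the mean)
print(z_score(380, 500, 100))  # -1.2 (below the mean, so the sign is negative)
```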
Standard normal distribution
❏ You can then use this information to determine the
area under the normal distribution curve that is:
Example 1
Let us assume that the SAT scores for a given population are normally
distributed with μ = 500 and σ = 100.
What is the proportion of persons having SAT math scores between 500 and
650? Sketch a curve and shade the area you wish to find.
$$z_1 = \frac{x - \mu}{\sigma} = \frac{500 - 500}{100} = 0$$
$$z_2 = \frac{650 - 500}{100} = 1.5$$
By using Table A to find the area for z = 1.5, you will find the
answer to be 0.4332. Therefore, the proportion of persons having
SAT scores between 500 and 650 is about 43%.
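The table lookup can be cross-checked in software. The sketch below assumes Python 3.8+ and its standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# SAT scores modeled as normal with mu = 500 and sigma = 100 (from the example).
sat = NormalDist(mu=500, sigma=100)

# Area between 500 and 650 = P(X <= 650) - P(X <= 500).
p_between = sat.cdf(650) - sat.cdf(500)
print(round(p_between, 4))  # 0.4332
```

`cdf(x)` gives the area to the left of x, so an area between two scores is the difference of two `cdf` values.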
Example 2
Let us assume that the SAT scores for a given population are normally
distributed with μ = 500 and σ = 100.
What is the proportion of persons having SAT math scores greater than
650? Sketch a curve and shade the area you wish to find.
Because the total area under the curve to the right of z = 0 is 0.5 and the area
between z = 0 and z = 1.5 is 0.4332, by subtraction you will obtain the
area beyond z = 1.5, namely 0.5 − 0.4332 = 0.0668.
So, about 7% have SAT scores over 650.
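The same subtraction can be written as an upper-tail calculation. This is a sketch under the assumption that Python 3.8+ with `statistics.NormalDist` is available; the names are illustrative.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# Upper tail: everything to the right of 650 is 1 minus the area to its left.
p_above = 1 - sat.cdf(650)
print(round(p_above, 4))  # 0.0668
```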
Example 3
What is the proportion of persons with SAT scores between 380 and 620?
To find the proportion of scores between 380 and 620, you must find the area under the
normal curve between the Z values that correspond to SAT scores of 380 and 620.
To find this area from the table, first convert the raw scores of 380 and 620 to Z scores.
Using the equation Z = (x − μ)/σ, we find Z scores of −1.20 and +1.20. Notice that there is exactly the same
area between Z = 0 and Z = 1.20 as there is between Z = 0 and Z = −1.20,
namely 0.3849 (from the table). Adding these two areas gives us
a total area of 0.7698; that is, 77% of the students have math SAT scores between 380
and 620.
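The symmetry argument can be sketched in code by computing the area between Z = 0 and Z = 1.20 once and doubling it. The example assumes Python and `math.erf`; the helper `area_zero_to_z` is a hypothetical name chosen to mirror what the table reports.

```python
import math

def area_zero_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and |z| (what the table lists)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

mu, sigma = 500, 100
z_low = (380 - mu) / sigma   # -1.2
z_high = (620 - mu) / sigma  # +1.2

# Equal areas on both sides of the mean, so the two pieces are simply added.
total = area_zero_to_z(z_low) + area_zero_to_z(z_high)
print(round(area_zero_to_z(z_high), 4))  # 0.3849
print(round(total, 4))
```

The unrounded total is about 0.7699; the table-based 0.7698 differs in the last digit only because each table entry is rounded before the addition.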
Example 4
Let us assume that the SAT scores for a given population are normally
distributed with μ = 500 and σ = 100.
What is the proportion of persons having SAT math scores between 450 and
670? Sketch a curve and shade the area you wish to find.
$$z_1 = \frac{x - \mu}{\sigma} = \frac{450 - 500}{100} = -0.5$$
$$z_2 = \frac{670 - 500}{100} = 1.7$$
By using Table A to find the area for z = −0.5, you will find the
answer to be 0.1915, and that for z = 1.7 to be 0.4554. Therefore, the
proportion of persons having SAT scores between 450 and 670 is
0.1915 + 0.4554 = 0.6469, thus about 65%.
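When the interval is not symmetric about the mean, a small general-purpose helper is convenient. The sketch below assumes Python 3.8+ and `statistics.NormalDist`; `prob_between` is a hypothetical helper name, not something defined in the slides.

```python
from statistics import NormalDist

def prob_between(low: float, high: float, mu: float, sigma: float) -> float:
    """P(low < X < high) for X normally distributed with the given mu and sigma."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(high) - dist.cdf(low)

p = prob_between(450, 670, mu=500, sigma=100)
print(round(p, 4))  # 0.6469
```

The helper works for intervals on one side of the mean as well as for intervals that straddle it, so there is no need to decide whether table areas should be added or subtracted.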
Example 5
Let us assume that the SAT scores for a given population are normally
distributed with μ = 500 and σ = 100.
What is the Z value of the normal curve that marks the upper 10% (or 90th
percentile) of the area? Sketch a curve and shade the area you wish to find.
Example 6
Let us assume that the SAT scores for a given population are normally
distributed with μ = 500 and σ = 100.
What is the 90th percentile of the SAT scores? Sketch a curve and shade the
area you wish to find.
$$z = \frac{x - \mu}{\sigma}$$
$$1.28 = \frac{x - 500}{100}$$
Therefore, x = 628
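Going from an area back to a score is the inverse of the table lookup. The following sketch assumes Python 3.8+ and the `inv_cdf` method of `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 500, 100

# Z value with 90% of the area to its left (upper 10% to its right).
z_90 = NormalDist().inv_cdf(0.90)
print(round(z_90, 2))  # 1.28

# Convert the Z value back to the SAT scale: x = mu + z * sigma.
x_90 = mu + z_90 * sigma
print(round(x_90))  # 628
```

Rearranging Z = (x − μ)/σ gives x = μ + Zσ, which is the step used to reach 628.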
Example 7
Let us assume that the SAT scores for a given population are normally
distributed with μ = 500 and σ = 100.
What is the 90th percentile of the SAT scores? Sketch a curve and shade the
area you wish to find.
Extra exercises
3. What z scores correspond to the following areas under the normal curve?
a. Area of 0.05 to the right of +z
b. Area of 0.01 to the left of −z
c. Area of 0.05 beyond ±z
d. Area of 0.9 between ±z
4. Assume that the age at onset of disease X is distributed normally
with a mean of 50 years and a standard deviation of 12 years.
What is the probability that an individual gets the disease before
35 years old?