CHAPTER 6 STATISTICS


Chapter 6: THE NORMAL DISTRIBUTION
Importance of Normal distribution
❏ Physicians often rely on a knowledge of normal limits to classify
patients as healthy or otherwise.
❏ For example, a serum cholesterol level above 200 mg/dl is
widely regarded as indicating a significantly increased risk for
coronary heart disease.
❏ The normal distribution is the basis for the use of inferential
statistics.
❏ It is a symmetrical probability distribution in which most results are
located in the middle and few are spread out on both sides.

Importance of Normal distribution
❏ Examples:
✓ The body temperature of healthy humans.
✓ The heights and weights of adults.
✓ IQ and standardized test scores.
✓ Quality control test results.
✓ Errors in measurements.
❏ Why? The normal distribution is used to illustrate the shape and variability of the data,
and normality is an important assumption when conducting statistical
analysis.

Normal distribution properties
✓ The wider the curve, the larger the standard deviation and the more variation
exists in the process.
✓ The curve is the graphical representation of the normal distribution.
✓ It is determined by the mean and the standard deviation.

[Figure: normal curve, f(X) plotted against X, centered at μ with spread σ. Changing μ shifts the distribution left or right. Changing σ increases or decreases the spread.]
Normal distribution properties
✓ The normal curve helps in calculating probabilities for normally distributed populations.
✓ The probabilities are represented by the area under the normal curve.
✓ The total area under the curve is equal to 100% (or 1.00).

[Figure: normal curve, f(X) plotted against X, centered at μ with spread σ. Total probability (area under the curve) = 100%.]
Normal distribution properties
❏ Empirical Rule: For any normally distributed data:
• About 68% of the data fall within 1 standard deviation of the
mean (the area between µ−σ and µ+σ).
• About 95% of the data fall within 2 standard deviations of the
mean (the area between µ−2σ and µ+2σ).
• About 99.7% of the data fall within 3 standard deviations of
the mean (the area between µ−3σ and µ+3σ).

For a stable, normally distributed process,
99.73% of the values lie within ±3 standard
deviations of the mean.
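
The percentages in the Empirical Rule can be checked numerically. The sketch below is not part of the original slides; it relies on the standard identity that, for any normal variable, the probability of falling within k standard deviations of the mean is erf(k/√2). The helper name `prob_within` is made up for illustration.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard
    deviations of its mean (the same for every mu and sigma)."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545 and 0.9973
    print(f"within {k} SD: {prob_within(k):.4f}")
```

The rounded values 68%, 95% and 99.7% used on the slide are these figures to the nearest convenient percentage.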

Example

Suppose that the heights of a sample of men are normally distributed.

➢ The mean height is 178 cm and the standard deviation is 7 cm.
We can generalize that:
➢ About 68% of the population are between 171 cm and 185 cm (within 1 standard deviation of the mean).
➢ This is a generalization, but it holds if the data are normally
distributed.
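
As a quick check of the height example, the following sketch uses Python's standard-library `statistics.NormalDist` with the mean and standard deviation given above. The variable names are illustrative only.

```python
from statistics import NormalDist

# Heights assumed normal with mean 178 cm and SD 7 cm, as in the example
heights = NormalDist(mu=178, sigma=7)

# Share of men between 171 cm and 185 cm (one SD either side of the mean)
share = heights.cdf(185) - heights.cdf(171)
print(f"{share:.1%}")  # about 68%
```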

The Normal Distribution:
as a mathematical function
❏ Where f(x) is the height of the curve for a given value of x:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

This is a bell-shaped curve
with different centers and
spreads depending on µ and σ.
Note the constants:
π = 3.14159
e = 2.71828
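
To make the formula concrete, here is a minimal sketch that evaluates the density directly and compares it with the standard library's `NormalDist.pdf`. The function name `normal_pdf` and the test points are illustrative assumptions, not taken from the slides.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve at x, written out term by term."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coeff * math.exp(exponent)

# The curve peaks at the mean; for mu = 0, sigma = 1 the peak is about 0.3989
print(round(normal_pdf(0.0, 0.0, 1.0), 4))

# The hand-written formula agrees with the library implementation
print(math.isclose(normal_pdf(1.3, 0.0, 1.0), NormalDist(0, 1).pdf(1.3)))
```

Because x enters only through the squared term, the curve is symmetric about µ: `normal_pdf(mu + d, mu, sigma)` equals `normal_pdf(mu - d, mu, sigma)` for any d.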

Standard normal distribution

❏ It is common practice to convert any normal distribution to the standardized form and
then use the standard normal table to find probabilities.
❏ The Standard Normal Distribution (Z distribution) is a way of standardizing the
normal distribution.
❏ It always has a mean of 0 and a standard deviation of 1.
❏ The total area under the curve is 1.
Standard normal distribution
✓ Any normally distributed data can be converted to the standardized form using the formula:

$$Z = \frac{x - \mu}{\sigma}$$

where:
✓ x is the data point in question, μ is the mean, and σ is the standard deviation.
✓ Z (the Z-score) is the number of standard deviations by which that data point lies from the mean.
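
The standardization formula is a one-line computation. The sketch below is a minimal illustration; the function name `z_score` and the sample values (a score of 650 with mean 500 and standard deviation 100) are chosen for demonstration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

print(z_score(650, 500, 100))  # 1.5 standard deviations above the mean
print(z_score(450, 500, 100))  # -0.5, i.e. below the mean
```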
Standard normal distribution
❏ You can then use the Z-score to determine the
area under the normal distribution curve that is:

✓ To the right of your data point.
✓ To the left of the data point.
✓ Between two data points.
✓ Outside of two data points.

❏ A Z table is then used to find the probabilities associated
with the standard normal curve.
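
The four kinds of area listed above can all be derived from the cumulative distribution function of the standard normal curve. The following is a sketch using `statistics.NormalDist` in place of a printed Z table; the helper names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_left(z: float) -> float:
    """Area under the curve to the left of z."""
    return Z.cdf(z)

def area_right(z: float) -> float:
    """Area to the right of z (total area is 1)."""
    return 1.0 - Z.cdf(z)

def area_between(z1: float, z2: float) -> float:
    """Area between two z values, given in either order."""
    lo, hi = sorted((z1, z2))
    return Z.cdf(hi) - Z.cdf(lo)

def area_outside(z1: float, z2: float) -> float:
    """Area in the two tails outside the interval between z1 and z2."""
    return 1.0 - area_between(z1, z2)

print(round(area_between(-1.96, 1.96), 2))  # about 0.95
```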

Example 1

Let us assume that the SAT scores for a given population are normally
distributed with μ = 500 and σ = 100.
What is the proportion of persons having SAT math scores between 500 and
650? Sketch a curve and shade the area you wish to find.

$$z_1 = \frac{x - \mu}{\sigma} = \frac{500 - 500}{100} = 0$$

$$z_2 = \frac{650 - 500}{100} = 1.5$$

Using Table A to find the area for z = 1.5, you will find the
answer to be 0.4332. Therefore, the proportion of persons having
SAT scores between 500 and 650 is about 43%.
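
The table lookup can be reproduced in code. This sketch assumes, as the example suggests, that Table A lists the area between the mean (z = 0) and a given z; the helper `table_a_area` is a hypothetical stand-in for that table.

```python
from statistics import NormalDist

def table_a_area(z: float) -> float:
    """Area between the mean (z = 0) and z, which is what Table A appears to tabulate."""
    return NormalDist().cdf(abs(z)) - 0.5

mu, sigma = 500, 100
z2 = (650 - mu) / sigma            # 1.5
print(round(table_a_area(z2), 4))  # 0.4332
```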

Example 2

Let us assume that the SAT scores for a given population are normally
distributed with μ = 500 and σ = 100.
What is the proportion of persons having SAT math scores greater than
650? Sketch a curve and shade the area you wish to find.

Because the total area under the curve to the right of z = 0 is 0.5 and the area
between z = 0 and z = 1.5 is 0.4332, by subtraction you obtain the
area beyond z = 1.5, namely 0.5 − 0.4332 = 0.0668.
So, about 7% have SAT scores over 650.
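
The same tail area can be obtained without the table by subtracting the cumulative probability from 1. A minimal sketch, using the mean and standard deviation stated in the example:

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# Upper-tail area: everything to the right of a score of 650
p_above_650 = 1.0 - sat.cdf(650)
print(round(p_above_650, 4))  # 0.0668
```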

Example 3

What is the proportion of persons with SAT scores between 380 and 620?

To find the proportion of scores between 380 and 620, you must find the area under the
normal curve between the Z values that correspond to SAT scores of 380 and 620.
To find this area from the table, first convert the raw scores of 380 and 620 to Z scores.
Using the formula Z = (x − μ)/σ, we find Z scores of −1.20 and +1.20. Notice that there is exactly the same
area between Z = 0 and Z = 1.20 as there is between Z = 0 and Z = −1.20,
namely 0.3849 (from the table). Adding these two areas gives
a total area of 0.7698; that is, 77% of the students have math SAT scores between 380
and 620.
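
The symmetry argument can be verified numerically. The sketch below computes the two halves separately on the standard normal curve; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

below = Z.cdf(0) - Z.cdf(-1.20)   # area between Z = -1.20 and the mean
above = Z.cdf(1.20) - Z.cdf(0)    # area between the mean and Z = +1.20
total = below + above

print(round(below, 4), round(above, 4))  # both 0.3849
print(round(total, 4))                   # 0.7699
```

The computed total differs from the table-based 0.7698 in the last digit only because the table rounds each half to four decimals before the two are added; both round to 77%.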

Example 3

We should point out that a negative Z score means that the
corresponding raw score is lower than the mean. In this example,
a raw score of 380 corresponds to a Z score of −1.20. Notice that the
areas between the mean and ±Z (both labeled area A) are exactly the
same; the only difference is that the positive Z score represents the
area above the mean and the negative Z score represents the area
below the mean. A Z score of −1.20 means that a raw score of 380 is
1.20 standard deviations below the mean, and a Z score of 1.20 means
that a raw score of 620 is 1.20 standard deviations above the mean.

Example 4

Let us assume that the SAT scores for a given population are normally
distributed with μ = 500 and σ = 100.
What is the proportion of persons having SAT math scores between 450 and
670? Sketch a curve and shade the area you wish to find.

$$z_1 = \frac{x - \mu}{\sigma} = \frac{450 - 500}{100} = -0.5$$

$$z_2 = \frac{670 - 500}{100} = 1.7$$

Using Table A, the area between the mean and z = −0.5 is
0.1915, and the area between the mean and z = 1.7 is 0.4554. Therefore, the
proportion of persons having SAT scores between 450 and 670 is
0.1915 + 0.4554 = 0.6469, or about 65%.
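
When the two scores lie on opposite sides of the mean, the table areas are added; with a cumulative distribution function the same result comes from a single subtraction. A minimal sketch with the example's parameters:

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# Area between 450 (z = -0.5) and 670 (z = 1.7)
p_450_to_670 = sat.cdf(670) - sat.cdf(450)
print(round(p_450_to_670, 4))  # 0.6469
```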

Example 5

Let us assume that the SAT scores for a given population are normally
distributed with μ = 500 and σ = 100.
What is the Z value of the normal curve that marks the upper 10% (or 90th
percentile) of the area? Sketch a curve and shade the area you wish to find.

The desired Z score is the value corresponding to an area of 0.4
(0.5 − 0.1) between the mean and Z. In Table A, the value is found to be approximately
Z = 1.28.
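
Reading the table "backwards" corresponds to the inverse cumulative distribution function. The sketch below uses `NormalDist.inv_cdf` (available in Python 3.8 and later) as a stand-in for the reverse table lookup.

```python
from statistics import NormalDist

# Z value with 90% of the area to its left and 10% to its right
z_90 = NormalDist().inv_cdf(0.90)
print(round(z_90, 2))  # 1.28
```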

Example 6

Let us assume that the SAT scores for a given population are normally
distributed with μ = 500 and σ = 100.
What is the 90th percentile of the SAT scores? Sketch a curve and shade the
area you wish to find.

The desired Z score is the value corresponding to an area of 0.4
(0.5 − 0.1) between the mean and Z. In Table A, the value is found to be approximately
Z = 1.28. But what does this mean in terms of SAT scores?

$$z = \frac{x - \mu}{\sigma}$$

$$1.28 = \frac{x - 500}{100}$$

Therefore, x = 628.
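
Solving the Z formula for x gives x = μ + Zσ. The sketch below applies that rearrangement with the table value Z = 1.28 and, for comparison, computes the percentile directly; `raw_score` is a hypothetical helper name.

```python
from statistics import NormalDist

def raw_score(z: float, mu: float, sigma: float) -> float:
    """Invert Z = (x - mu) / sigma to recover the raw score x."""
    return mu + z * sigma

x_table = raw_score(1.28, 500, 100)            # using the rounded table value
x_exact = NormalDist(500, 100).inv_cdf(0.90)   # without rounding Z

print(round(x_table), round(x_exact, 1))
```

The direct computation gives a value slightly above 628 because 1.28 is itself a rounded Z score.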

Example 7

Let us assume that the SAT scores for a given population are normally
distributed with μ = 500 and σ = 100.
What is the 90th percentile of the SAT scores? Sketch a curve and shade the
area you wish to find.

If 1 million high school students took the SAT, how many would score at or above the
90th percentile?

An area of 0.1, or 10% of the scores, lies above z = 1.28. Therefore,
0.1 × 1,000,000 = 100,000 students.

Extra exercises

1. Find the area under the normal
curve that lies between the given
values of Z.
a. Z = 0 and Z = 2.37
b. Z = −1.85 and Z = 1.85
c. Z = −0.76 and Z = 1.13
d. Z = −2.77 and Z = −0.96
2. Determine the area to the right of
Z (left of −Z).
a. Z = 1.73
b. Z = 5
c. Z = 3 and Z = −3
Extra exercises
3. What Z scores correspond to the
following areas under the normal
curve?
a. Area of 0.05 to the right of +Z
b. Area of 0.01 to the left of −Z
c. Area of 0.05 beyond ±Z
d. Area of 0.9 between ±Z
4. Assume that the age at onset of
disease X is distributed normally
with a mean of 50 years and a
standard deviation of 12 years.
What is the probability that an
individual gets the disease before
35 years old?
