Applications of Normal Distribution
These are:
(i) To determine the percentage of cases (in a normal distribution)
within given limits or scores.
X = Raw Score
M = Mean of X Scores
This figure means that 3413 cases in 10,000, or 34.13 percent of the
entire area of the curve, lie between the mean and 1σ. Similarly, if
we have to find the percentage of the distribution between the mean
and 1.56 σ, say, we go down the x/σ column to 1.5, then across
horizontally to the column headed by .06, and note the entry 44.06.
This is the percentage of the total area that lies between the mean
and 1.56σ.
We have so far considered only distances measured in the positive
direction from the mean. For this we have taken into account only
the right half of the normal curve. Since the curve is symmetrical
about the mean, the entries in Table-A apply to distances measured
in the negative direction (to the left) as well as to those measured in
the positive direction.
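The table lookups and the symmetry argument above can be checked numerically. The following is a minimal sketch, assuming Python 3.8+ and its standard-library statistics.NormalDist class in place of the printed Table-A; the helper name area_mean_to_z is hypothetical and does not come from the text.

```python
from statistics import NormalDist

# Standard normal curve (mean 0, sigma 1), used here instead of Table-A.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def area_mean_to_z(z: float) -> float:
    """Percentage of the total area between the mean and z (in sigma units)."""
    # By symmetry, a distance of -z encloses the same area as +z.
    return (STANDARD_NORMAL.cdf(abs(z)) - 0.5) * 100.0

for z in (1.0, 1.56, -1.56):
    print(f"mean to {z:+.2f} sigma: {area_mean_to_z(z):.2f}%")
```

The values for 1σ and 1.56σ should agree with the table entries quoted above (34.13 and 44.06) to two decimal places.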
For practical purposes we take the curve to end at the points -3σ and
+3σ from the mean, since the normal curve never actually
meets the base line. The table of areas under the normal probability curve
shows that 4986.5 cases in 10,000 lie between the mean and the ordinate at +3σ.
Thus, 99.73 percent of the entire distribution lies within the
limits -3σ and +3σ. The remaining 0.27 percent of the distribution, beyond
±3σ, is considered negligible except where N is very
large.
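To see why the area beyond ±3σ matters only when N is very large, here is a small sketch, assuming Python's standard-library statistics.NormalDist. The function name expected_cases_beyond and the sample sizes are illustrative choices, not values from the text.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, sigma 1

within_3_sigma = z.cdf(3.0) - z.cdf(-3.0)  # proportion inside -3σ..+3σ
beyond_3_sigma = 1.0 - within_3_sigma      # proportion in both tails together

def expected_cases_beyond(n: int) -> float:
    """Expected number of cases out of n that fall beyond ±3σ."""
    return n * beyond_3_sigma

print(f"within ±3σ: {within_3_sigma * 100:.2f}%")
for n in (500, 10_000, 1_000_000):  # illustrative sample sizes
    print(n, round(expected_cases_beyond(n), 1))
```

With a few hundred cases, only about one case is expected beyond ±3σ; with a million cases, a few thousand are.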
Example 1:
Given a normal distribution of 500 scores with M = 40 and σ= 8,
what percentage of cases lie between 36 and 48?
Solution:
Z score for raw score 36: Z = (X - M)/σ = (36 - 40)/8 = -4/8,
or Z = -0.5σ.
Z score for raw score 48: Z = (X - M)/σ = (48 - 40)/8 = 8/8,
or Z = +1σ.
According to the table of areas under the N.P.C. (Table-A), the percentage
of cases that lie between the Mean and -0.5σ is 19.15. The percentage
of cases between the Mean and +1σ is 34.13. Therefore, total
percentage of cases that fall between the scores 36 and 48 is 19.15 +
34.13 = 53.28.
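The arithmetic of Example 1 can be reproduced as follows. This is a sketch assuming Python's standard-library statistics.NormalDist in place of Table-A; the variable names are illustrative.

```python
from statistics import NormalDist

# Example 1: N = 500 scores, M = 40, σ = 8.
scores = NormalDist(mu=40, sigma=8)

z_low = (36 - 40) / 8   # Z score of raw score 36
z_high = (48 - 40) / 8  # Z score of raw score 48

# Area between the two raw scores, as a proportion of the whole curve.
proportion = scores.cdf(48) - scores.cdf(36)
pct_between = proportion * 100
cases_between = 500 * proportion

print(z_low, z_high)
print(f"{pct_between:.2f}% of cases, about {cases_between:.0f} of 500 scores")
```

The percentage should match the table-based total of 19.15 + 34.13 = 53.28.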
Solution:
First we convert the raw score 60 to a Z score by using the formula
Z = (X - M)/σ, which gives Z = +2σ.
According to the table of areas under the N.P.C. (Table-A), the area of the
curve that lies between M and +2σ is 47.72%. The total percentage of
cases below the score 60 is 50 + 47.72 = 97.72%, or about 98%.
Thus, the percentile rank of a student who secured 60 marks in an
achievement test in the class is 98.
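The step from a Z score to a percentile rank can be sketched as below, assuming Python's standard-library statistics.NormalDist. The mean and standard deviation of this example are not shown in the text, so the sketch starts from the Z score of +2; the function name percentile_rank_from_z is hypothetical.

```python
from statistics import NormalDist

def percentile_rank_from_z(z: float) -> float:
    """Percentage of cases falling below a given Z score."""
    # cdf(z) already includes the 50% of cases below the mean.
    return NormalDist().cdf(z) * 100.0

pr = percentile_rank_from_z(2.0)  # Z score of raw score 60, taken from the text
print(f"{pr:.2f}% of cases lie below, percentile rank about {round(pr)}")
```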
Example 3:
Amit’s percentile rank in his mathematics class is 75. The
mean of the class in mathematics is 60 with a standard deviation of 10.
Find Amit’s marks in the mathematics achievement test.
Solution:
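By the definition of percentile rank, 75% of the scores fall below Amit’s
score. Since 50% of the scores fall below the mean, Amit’s position on
the N.P.C. scale is the point that has 25% of the scores between it and the mean, on the upper side.
According to the N.P.C. table, the σ score that has 25% of cases between it and the
Mean is +.67σ. Amit’s marks are therefore M + 0.67σ = 60 + 0.67 × 10 = 66.7, or about 67.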
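The reverse lookup in Example 3 (from a percentile rank back to a raw score) can be sketched as follows, assuming Python's standard-library statistics.NormalDist and its inv_cdf method. The function name score_at_percentile_rank is hypothetical.

```python
from statistics import NormalDist

def score_at_percentile_rank(pr: float, mean: float, sd: float) -> float:
    """Raw score below which pr percent of a normal distribution falls."""
    z = NormalDist().inv_cdf(pr / 100.0)  # σ score for the given percentile rank
    return mean + z * sd                  # X = M + Zσ

z_75 = NormalDist().inv_cdf(0.75)         # the table gives +.67σ
amit_marks = score_at_percentile_rank(75, mean=60, sd=10)

print(round(z_75, 2), round(amit_marks, 1))
```

Using the unrounded σ score gives a value slightly above the table-based 66.7; both round to 67 marks.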
Example 4:
Given a group of 500 college students who have been administered
a general mental ability test. The teacher wishes to classify the
group in five categories and assign them the grades A, B, C, D, E
according to ability. Assuming that general mental ability is normally
distributed in the population, calculate the number of students that
can be placed in groups A, B, C, D and E.
Solution:
We know that, for practical purposes, the Normal Curve extends from -3σ
to +3σ, that is, over a range of 6σ.
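The rest of this solution is not shown in the text. One common way to continue, offered here only as an assumption, is to divide the 6σ range into five equal intervals of 1.2σ each and read off the area of each interval. The sketch below uses Python's standard-library statistics.NormalDist; the equal-width split and all names in it are illustrative, not the author's stated method.

```python
from statistics import NormalDist

N_STUDENTS = 500
GRADES = ["E", "D", "C", "B", "A"]  # lowest to highest ability
z = NormalDist()                    # standard normal curve

# Assumption: each grade covers an equal share of the -3σ..+3σ range.
width = 6.0 / len(GRADES)           # 1.2σ per grade
edges = [-3.0 + i * width for i in range(len(GRADES) + 1)]

counts = {}
for grade, lo, hi in zip(GRADES, edges[:-1], edges[1:]):
    share = z.cdf(hi) - z.cdf(lo)   # proportion of cases in this interval
    counts[grade] = N_STUDENTS * share

for grade in reversed(GRADES):
    print(grade, round(counts[grade], 1))
```

Under this assumption the counts are symmetric about grade C and add up to slightly fewer than 500, because the small area beyond ±3σ is left out. How to round to whole students is a separate judgment call.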
Reasoning based on normal distributions is an important skill that is used throughout the rest of the
course. In this lecture, we will look at a few problems that illustrate what you can do with normal
distributions. One of the variables that we know does follow a normal distribution is the height of
people. For all these problems, we’re going to assume that women’s heights are normally
distributed with a mean of 65 inches and a standard deviation of 3 inches. In the textbook’s
notation, we can also state X ~ N(μ = 65, σ = 3).
1) What is the probability that a woman is between 64 inches and 69 inches tall (5’4” to 5’9”)?
Put another way, what fraction of women’s heights are in this range? Using the notation of
random variables, we would write this as P(64 < X < 69).
First, draw a horizontal axis and label it x, write the units (inches) below it, and draw a normal pdf
centered over the mean of 65 inches. Then mark and label 65 on the axis, mark and label 64 to the
left of it and 69 to the right of it, draw vertical lines from the 64 and the 69 to the curve and shade
the part between them, above the x-axis, and under the curve:
If you are using GeoGebra, then you will immediately see that the software tells you P(64 < X < 69)
= 0.5393. If you are using the calculator, then you need to find the normalcdf (not normalpdf)
function. Enter the number on the left where the shading begins, the number on the right where it
ends, the mean of the distribution, and its standard deviation, all separated by commas:
normalcdf(64, 69, 65, 3). You will get 0.539347. Round this to the nearest ten-thousandth (four places
after the decimal point), or equivalently to the nearest hundredth of a percent, and you come up
with the correct answer: 0.5393, or 53.93%.
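For readers without the calculator or GeoGebra, the same computation can be sketched in Python. This assumes the standard-library statistics.NormalDist class; the function normalcdf below is a hypothetical stand-in that mirrors the calculator's argument order, not the calculator's own code.

```python
from statistics import NormalDist

def normalcdf(lower: float, upper: float, mean: float, sd: float) -> float:
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    # Area to the left of upper minus area to the left of lower.
    return dist.cdf(upper) - dist.cdf(lower)

p_between = normalcdf(64, 69, 65, 3)  # P(64 < X < 69)
print(round(p_between, 4))
```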
In the last lecture, we mentioned that in the old days, everyone had to learn how to look up a
Z-table, the table that shows the relationship between area and Z-score for the standard normal. Then
how do GeoGebra and normalcdf do it? Well, it’s no magic. The software simply converts any
normal distribution to a standard normal, using the familiar relationship of Z-score:
z = (x - μ)/σ
It’s not necessary that you always convert all normal distributions to Z, but it’s useful to recognize
how it is handled by the software, since we will be doing the same later in inferential statistics.
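The conversion to the standard normal can be illustrated with a short sketch, assuming Python's standard-library statistics.NormalDist. It shows that standardizing the endpoints 64 and 69 first gives the same area as working with the original distribution. The helper z_score is a hypothetical name, and the sketch does not claim to show how any particular calculator is implemented.

```python
from statistics import NormalDist

MEAN, SD = 65, 3  # women's heights, in inches

def z_score(x: float, mean: float = MEAN, sd: float = SD) -> float:
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd

std_normal = NormalDist()       # mean 0, standard deviation 1
heights = NormalDist(MEAN, SD)

# Same area, computed two ways.
p_via_z = std_normal.cdf(z_score(69)) - std_normal.cdf(z_score(64))
p_direct = heights.cdf(69) - heights.cdf(64)

print(round(z_score(64), 2), round(z_score(69), 2))
print(round(p_via_z, 4), round(p_direct, 4))
```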
2) What is the probability that a woman is taller than 5 feet, 10 inches, or 70 inches? Put
another way, what fraction of women are taller than 70 inches? This would be written as P(X
> 70).
Start the same way as in Problem 1, but you have to mark and label only one number besides the
mean, the 70. Then shade to the right of the 70, because that’s where the taller heights are:
GeoGebra is fairly self-explanatory here. With the calculator, the only complication in using normalcdf
is that there is no number on the right where the shading ends. So put in a big one, and if you’re not
sure it’s big enough, put in a bigger one and see whether it changes your answer, at least to the nearest
ten-thousandth. normalcdf(70, 1000, 65, 3) = 0.04779, so the rounded answer is 0.0478, or 4.78%.
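The "put in a big number" trick can be compared with the exact upper-tail area. The sketch below assumes Python's standard-library statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=3)

# Exact upper tail: total area 1 minus the area to the left of 70.
p_exact_tail = 1.0 - heights.cdf(70)

# Calculator-style version: use a very large right endpoint instead.
p_big_upper = heights.cdf(1000) - heights.cdf(70)

print(round(p_exact_tail, 4), round(p_big_upper, 4))
```

Both versions agree far beyond the nearest ten-thousandth, because essentially no area lies to the right of 1000 inches.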
In the problems above, we found the probability that the random variable falls within a certain
range. Now we’re going to reverse the process. We’ll start with the probability of a certain range,
and then we’ll have to find the values of the random variable that determine that range. I’ll call
these values cut-offs. Problems of this kind are sometimes called “inverse probability” problems.
In these three problems, we’ll use the same situation as above: Women’s heights are normally
distributed with a mean of 65 inches and a standard deviation of 3 inches.
1) How short does a woman have to be to be in the shortest 10% of women? If we call this cut-
off c, this could be written as finding c such that P(X < c) = 0.10.
We’ll do the same kind of diagram as before, but this time we’ll label the known probability, 10%,
and we do this above the shaded area, definitely not on the x-axis, because it’s an area, not a height.
The hardest part of the diagram is deciding which side of the mean to put the c on and which side of
the c to shade.
You really have to think about it. In this case, since by definition 50% of women are shorter than the
mean, the cut-off for 10% has to be less than the mean.
The picture here shows how GeoGebra can be used to find the cut-off values: instead of
entering the cut-off values, you can enter 0.10 as the probability, and GeoGebra will solve for the
cut-off value (61.1553).
Using the calculator, you will need the invNorm function. Enter the proportion of
data under the normal curve to the left of the cut-off (always to the left, no matter which side of c the
shading is on), then the mean and standard deviation, separated by commas.
So in our example, we will do invNorm(0.10, 65, 3), which gives 61.1553, or, to the nearest inch, like the mean and
standard deviation, 61 inches. So about 10% of women are shorter than 61 inches. You can check
this using normalcdf, and you might as well use more digits of the cut-off than we rounded to, for greater
assurance that your check shows you got the right answer. You get normalcdf(0, 61.1553, 65, 3),
which comes to 0.0999997, or 10%.
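The cut-off and its check can be reproduced with a short sketch, assuming Python's standard-library statistics.NormalDist, whose inv_cdf method takes the area to the left just as invNorm does. The wrapper name inv_norm is hypothetical.

```python
from statistics import NormalDist

def inv_norm(area_left: float, mean: float, sd: float) -> float:
    """Cut-off c such that P(X < c) equals area_left."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(area_left)

cutoff = inv_norm(0.10, 65, 3)          # shortest 10% of women
check = NormalDist(65, 3).cdf(cutoff)   # should recover the 10% area

print(round(cutoff, 4), round(check, 4))
```

The check here uses the full area to the left of the cut-off rather than a lower limit of 0. The two agree to many decimal places, since 0 inches is far below the mean.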
2) How tall does a woman have to be to be in the tallest fourth of women? (What is the cut-off
for the tallest 25% of women?) If we call this height c, we want to find the value of c such
that P(X > c) = 0.25. Here’s the diagram:
In GeoGebra it’s quite simple: you will just have to switch from the left tail to the right tail.
On the calculator, when we use invNorm we must put in 0.75, because the calculator finds cut-offs
for areas to the left only: invNorm(0.75, 65, 3). Here 0.75 comes from the fact that the total area
must be equal to 1. Subtracting the area to the right, 0.25, leaves the area to the left of the
cut-off: 1 - 0.25 = 0.75.
Again, both GeoGebra and invNorm rely on the standard normal distribution to compute these values. To
see how this is done, you will first need to find the z cut-off value for the 25% area to the right:
P(Z > z) = 0.25 gives z = 0.67.
Then using the relationship between the Z score and X, we can solve for x as the unknown:
0.67 = (x - 65)/3
Using the algebra you have learned, you will find x = 3*0.67 + 65 = 67.0, which is how the software
arrived at the answer. You won’t have to do it this way every time, but it’s helpful to keep in mind,
since this relation is used later on in finding the margin of error for confidence intervals.
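The two steps above (find the z cut-off, then solve for x) can be sketched as follows, assuming Python's standard-library statistics.NormalDist. The variable names are illustrative, and the value 0.67 is the rounded table value used in the text.

```python
from statistics import NormalDist

MEAN, SD = 65, 3
area_right = 0.25
area_left = 1.0 - area_right              # inverse lookups use the left area

# Step 1: z cut-off on the standard normal curve.
z_cut = NormalDist().inv_cdf(area_left)

# Step 2: solve z = (x - mean) / sd for x.
cutoff = MEAN + z_cut * SD
cutoff_from_table = MEAN + 0.67 * SD      # same step with the rounded z value

print(round(z_cut, 4), round(cutoff, 2), round(cutoff_from_table, 2))
```

The unrounded z value gives a cut-off about a hundredth of an inch above the table-based one; both are about 67 inches.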
3) What if we’re interested in finding cut-offs for a middle group of women’s heights, say the
middle 40%? Obviously, we’re looking for two numbers here, one on either side of the
mean, at the same distance from the mean. Call them c1 and c2. Then we are looking for
these values so that P(c1 < X < c2) = 0.40.
You probably noticed that the normal calculator in GeoGebra can’t really find two cut-offs at
once; in fact, the figure above was drawn using a different tool. But c1 and c2 are not two
independent values, since they are equally far from 65, the mean. To use the normal calculator, we
must find out how much area is under the curve to the left of c1. Well, if 100% of the area is under the
entire curve, then what’s left over after taking away the middle 40% is 1 - 0.40 = 0.60, and since that
60% is split evenly between the two tails (the parts at the sides), that gives 30% for each tail. So c1 is
the number such that P(X < c1) = 0.30.
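The tail-splitting argument can be sketched in Python, assuming the standard-library statistics.NormalDist. The names c1 and c2 are used here for the lower and upper cut-offs. The upper cut-off is obtained from the area to its left, which is the lower tail plus the middle 40%.

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=3)

middle = 0.40
tail = (1.0 - middle) / 2          # 30% in each tail

c1 = heights.inv_cdf(tail)         # lower cut-off: 30% of area to its left
c2 = heights.inv_cdf(1.0 - tail)   # upper cut-off: 70% of area to its left

middle_area = heights.cdf(c2) - heights.cdf(c1)  # should recover 0.40
print(round(c1, 2), round(c2, 2), round(middle_area, 4))
```

The two cut-offs come out equally far from the mean of 65, as the symmetry argument predicts.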