Probability Distributions: Inferential Statistics AB
MODULE 2
Probability Distributions
Introduction
I. Learning Outcomes:
II. Pre-Assessment:
1. Which of the following probabilities can NOT be found using the binomial
distribution?
a. The probability that 3 out of 8 tosses of a coin will result in heads.
b. The probability that Susan will beat Shannon in two of their three tennis matches
c. The probability of rolling at least two 3’s and two 4’s in twelve rolls of a die
d. All of the above
2. A probability distribution for which there are just two possible outcomes with fixed
probabilities summing to one.
a. Poisson
b. Binomial
c. Normal
d. Uniform
3. Calculate 6!
a. 12
b. 36
c. 120
d. 720
INFERENTIAL STATISTICS AB
4. How many outcomes are possible if 3 new employees are to be selected from a
group of 5 applicants?
a. 10
b. 12
c. 15
d. 30
8. The number of arrivals of delivery trucks per hour at a loading station is an example
of which of the following processes?
a. Binomial
b. Hypergeometric
c. Poisson
d. Uniform
9. Suppose random variable X has a normal distribution with a mean of 98.2 and a
standard deviation of 3.2. Find P(X ≥ 99).
a. 0.0987
b. 0.4013
c. 0.5987
d. 0.9013
[Figure: Probability Distribution, divided into Discrete (Binomial Distribution, Poisson Distribution) and Continuous (Normal Distribution)]
The figure above classifies the three probability distributions (Binomial, Poisson, and
Normal) according to the type of numerical values they describe: discrete or continuous.
Before moving on to the next sections of this module, let us first check whether you still
remember the ideas of permutation and combination. Evaluate the following:
(a) $5!$    (c) $P(4,2)$    (e) $\binom{8}{3}$
(b) $\dfrac{6!}{2!\,4!}$    (d) $P(7,5)$    (f) $\binom{4}{0}$
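A hand evaluation of factorials, permutations, and combinations can be checked with Python's standard `math` module. The sketch below is only an illustration: the values n = 5 and r = 2 are assumed for the example and are deliberately not the exercise items, and `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math

# Illustrative values only (an assumption); these are not the exercise items.
n, r = 5, 2

factorial_n = math.factorial(n)   # n! = 5 * 4 * 3 * 2 * 1
permutations = math.perm(n, r)    # P(n, r) = n! / (n - r)!
combinations = math.comb(n, r)    # C(n, r) = n! / (r! * (n - r)!)

print(factorial_n, permutations, combinations)
```

The same three calls can be reused with other values of n and r to verify answers worked out by hand.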
Can you still recall the probability distributions that were discussed in your
previous years in college? List at least 5 of them and give their descriptions.
2.
3.
4.
5.
6.
QUESTION:
What is the relevance of the given activity to the topic that we are about to
discuss?
_________________________________________________________________
_________________________________________________________________
_________________________________________________________________
If p is the probability that an event will happen in any single trial (called the probability
of a success) and q = 1 − p is the probability that it will fail to happen in any single trial
(called the probability of a failure), then the probability that the event will happen exactly
X times in N trials (i.e., X successes and N − X failures will occur) is given by
$$ p(X) = \binom{N}{X} p^{X} q^{N-X} = \frac{N!}{X!\,(N-X)!}\, p^{X} q^{N-X} \tag{1} $$
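As a rough illustration of formula (1), the following sketch evaluates the binomial probabilities for every X from 0 to N. The function name `binomial_pmf` and the values N = 4 and p = 0.5 are assumptions chosen for the example; they do not come from the module.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials, following formula (1)."""
    q = 1.0 - p                      # probability of failure on a single trial
    return comb(n, x) * p**x * q**(n - x)

# Illustrative values (assumed): N = 4 trials with p = 0.5.
n_trials, p_success = 4, 0.5
probabilities = [binomial_pmf(x, n_trials, p_success) for x in range(n_trials + 1)]

print(probabilities)
print(sum(probabilities))            # the probabilities over X = 0..N add up to 1
```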
The discrete probability distribution (1) is often called the binomial distribution since for
X = 0, 1, 2, …, N it corresponds to successive terms of the binomial formula, or binomial
expansion,
$$ (q+p)^N = q^N + \binom{N}{1} q^{N-1} p + \binom{N}{2} q^{N-2} p^2 + \cdots + p^N \tag{2} $$
For example, with N = 4:
$$ (q+p)^4 = q^4 + \binom{4}{1} q^3 p + \binom{4}{2} q^2 p^2 + \binom{4}{3} q p^3 + p^4 = q^4 + 4q^3 p + 6q^2 p^2 + 4qp^3 + p^4 $$
Distribution (1) is also called the Bernoulli distribution after James Bernoulli, who
discovered it at the end of the seventeenth century. Some properties of the binomial
distribution are listed in Table 1.1.
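Mean: $\mu = Np$
Variance: $\sigma^2 = Npq$
Moment coefficient of skewness: $\alpha_3 = \dfrac{q-p}{\sqrt{Npq}}$
Moment coefficient of kurtosis: $\alpha_4 = 3 + \dfrac{1-6pq}{Npq}$
For N = 100 and p = q = 1/2, the standard deviation is $\sigma = \sqrt{Npq} = \sqrt{(100)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)} = 5$.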
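The properties listed for the binomial distribution can be evaluated numerically. The sketch below is a minimal illustration using N = 100 and p = 0.5, the values that appear in the standard-deviation example; the function name `binomial_summary` is an assumption made for this example.

```python
from math import sqrt

def binomial_summary(n, p):
    """Mean, variance, standard deviation, skewness and kurtosis of a binomial."""
    q = 1.0 - p
    variance = n * p * q
    return {
        "mean": n * p,
        "variance": variance,
        "sd": sqrt(variance),
        "skewness": (q - p) / sqrt(variance),        # moment coefficient of skewness
        "kurtosis": 3 + (1 - 6 * p * q) / variance,  # moment coefficient of kurtosis
    }

summary = binomial_summary(100, 0.5)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

With p = q the skewness is zero, and the kurtosis is already close to 3, the value for a normal distribution.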
THE NORMAL DISTRIBUTION
$$ Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}(X-\mu)^2/\sigma^2} \tag{3} $$
When the variable X is expressed in terms of standard units [z = (X − μ)/σ], equation (3) is
replaced by the so-called standard form
$$ Y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2} \tag{4} $$
In such a case we say that z is normally distributed with mean 0 and variance 1. Figure 7-1
is a graph of this standardized normal curve. It shows that the areas included between
z = −1 and +1, z = −2 and +2, and z = −3 and +3 are equal, respectively, to 68.27%,
95.45%, and 99.73% of the total area, which is 1. The table in Appendix II shows the
areas under this curve bounded by the ordinates at z = 0 and any positive value of z.
From this table the area between any two ordinates can be found by using the
symmetry of the curve about z = 0.
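The areas quoted above can be reproduced without a printed table. The sketch below is one possible approach, assuming the standard identity that expresses the normal cumulative distribution through the error function `math.erf`; the helper names `standard_normal_cdf` and `area_between` are illustrative and not part of the module.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def area_between(z_low, z_high):
    """Area under the standard normal curve between two ordinates."""
    return standard_normal_cdf(z_high) - standard_normal_cdf(z_low)

for k in (1, 2, 3):
    print(f"z = -{k} to +{k}: {100 * area_between(-k, k):.2f}% of the total area")
```

The area from z = 0 to a positive z, as tabulated in Appendix II, corresponds to `area_between(0, z)`.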
Some properties of the normal distribution given by equation (3) are listed in Table
1.2.
Variance: $\sigma^2$
Standard deviation: $\sigma$
If N is large and if neither p nor q is too close to zero, the binomial distribution can be
closely approximated by a normal distribution with standardized variable given by
$$ z = \frac{X - Np}{\sqrt{Npq}} $$
The approximation becomes better with increasing N, and in the limiting case it is
exact; this is shown in Tables 1.1 and 1.2, where it is clear that as N increases, the
skewness and kurtosis of the binomial distribution approach those of the normal
distribution. In practice the approximation is very good if both Np and Nq are greater
than 5.
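To see how close the normal approximation can be, the sketch below compares an exact binomial probability with its normal counterpart. The values N = 100, p = 0.5 and the cut-off X ≤ 55 are assumed for illustration. The half-unit continuity correction is also an assumption: it is a common refinement but is not mentioned in the text above.

```python
from math import comb, erf, sqrt

def binomial_cdf(x, n, p):
    """Exact P(X <= x) for a binomial distribution with n trials."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Illustrative values (assumed): Np = Nq = 50, both greater than 5.
n_trials, p_success = 100, 0.5
x = 55

mean = n_trials * p_success
sd = sqrt(n_trials * p_success * (1 - p_success))

exact = binomial_cdf(x, n_trials, p_success)
# Continuity correction (an assumption, not from the text): evaluate at x + 0.5.
approx = normal_cdf((x + 0.5 - mean) / sd)

print(f"exact binomial: {exact:.4f}")
print(f"normal approx.: {approx:.4f}")
```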
The discrete probability distribution
$$ p(X) = \frac{\lambda^{X} e^{-\lambda}}{X!}, \qquad X = 0, 1, 2, \ldots \tag{5} $$
where e = 2.71828… and λ is a given constant, is called the Poisson distribution after
Siméon-Denis Poisson, who discovered it in the early part of the nineteenth century.
The values of p(X) can be computed by using the table in Appendix I (which gives
values of $e^{-\lambda}$ for various values of λ) or by using logarithms.
APPENDIX I: Values of $e^{-\lambda}$ (0 < λ < 1)
Variance: $\sigma^2 = \lambda$
Standard deviation: $\sigma = \sqrt{\lambda}$
In the binomial distribution (1), if N is large while the probability p of the occurrence of
an event is close to 0, so that q = 1 − p is close to 1, the event is called a rare event. In
practice we shall consider an event to be rare if the number of trials is at least
50 (N ≥ 50) while Np is less than 5. In such a case the binomial distribution (1) is closely
approximated by the Poisson distribution (5) with λ = Np. This is indicated by comparing
Tables 1.1 and 1.3: by placing λ = Np, q ≈ 1, and p ≈ 0 in Table 1.1, we get the
results in Table 1.3.
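The rare-event rule can be checked numerically. The sketch below compares binomial and Poisson probabilities for an assumed case, N = 100 and p = 0.02, so that N ≥ 50 and λ = Np = 2 is less than 5. The function names and the chosen values are illustrative assumptions, not part of the module.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability of exactly x events when the mean is lam."""
    return lam**x * exp(-lam) / factorial(x)

# Illustrative rare-event case (assumed): N >= 50 and Np < 5.
n_trials, p_success = 100, 0.02
lam = n_trials * p_success

rows = [(x, binomial_pmf(x, n_trials, p_success), poisson_pmf(x, lam))
        for x in range(7)]
max_gap = max(abs(b - q) for _, b, q in rows)

for x, b, q in rows:
    print(f"X = {x}: binomial {b:.4f}, Poisson {q:.4f}")
print(f"largest difference: {max_gap:.4f}")
```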
Since there is a relation between the binomial and normal distributions, it follows that
there also is a relation between the Poisson and normal distributions. It can in fact be
shown that the Poisson distribution approaches a normal distribution with standardized
variable (X − λ)/√λ as λ increases indefinitely.
$$ \frac{N!}{X_1!\,X_2!\cdots X_k!}\; p_1^{X_1} p_2^{X_2} \cdots p_k^{X_k} \tag{6} $$
Figure 1. Distribution of birth weight in 3,226 newborn babies (data from O'Cathain et al. 2002)
To distinguish the use of the same word in normal range and Normal distribution we
have used a lower and upper case convention throughout.
The histogram of the sample data is an estimate of the population distribution of birth
weights in new born babies. This population distribution can be estimated by the
superimposed smooth `bell-shaped' curve or `Normal' distribution shown. We presume
that if we were able to look at the entire population of new born babies then the
distribution of birth weight would have exactly the Normal shape. We often infer, from
a sample whose histogram has the approximate Normal shape, that the population will
have exactly, or as near as makes no practical difference, that Normal shape.
The Normal distribution is completely described by two parameters μ and σ, where μ
represents the population mean, or centre of the distribution, and σ the population
standard deviation. It is symmetrically distributed around the mean. Populations with
small values of the standard deviation σ have a distribution concentrated close to the
centre μ; those with large standard deviation have a distribution widely spread along
the measurement axis. One mathematical property of the Normal distribution is that
exactly 95% of the distribution lies between
μ − (1.96 × σ) and μ + (1.96 × σ).
Changing the multiplier 1.96 to 2.58, exactly 99% of the Normal distribution lies in the
corresponding interval.
In practice the two parameters of the Normal distribution, μ and σ, must be estimated
from the sample data. For this purpose a random sample from the population is first
taken. The sample mean x̄ and the sample standard deviation, SD(x̄) = s, are then
calculated. If a sample is taken from such a Normal distribution, and provided the
sample is not too small, then approximately 95% of the sample lie within the interval:
x̄ − [1.96 × SD(x̄)] to x̄ + [1.96 × SD(x̄)]
This is calculated by merely replacing the population parameters μ and σ by the sample
estimates x̄ and s in the previous expression.
In appropriate circumstances this interval may estimate the reference interval for a
particular laboratory test which is then used for diagnostic purposes.
We can use the fact that our sample birth weight data appear Normally distributed to
calculate a reference range. We have already mentioned that about 95% of the
observations (from a Normal distribution) lie within ±1.96 SDs of the mean. So a
reference range for our sample of babies, using the values given in the histogram
above, is:
3.39 - [1.96 x 0.55] to 3.39 + [1.96 x 0.55]
2.31kg to 4.47kg
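The reference range arithmetic above can be reproduced in a few lines. This is a minimal sketch using the sample mean (3.39 kg) and standard deviation (0.55 kg) quoted in the text; the function name `reference_range` is an assumption made for illustration.

```python
def reference_range(mean, sd, multiplier=1.96):
    """Return the interval mean - multiplier*sd to mean + multiplier*sd."""
    half_width = multiplier * sd
    return mean - half_width, mean + half_width

# Sample mean and standard deviation of birth weight (kg) quoted in the text.
lower, upper = reference_range(3.39, 0.55)
print(f"{lower:.2f} kg to {upper:.2f} kg")
```

Passing `multiplier=2.58` instead gives the interval that covers about 99% of a Normal distribution.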
A baby's weight at birth is strongly associated with mortality risk during the first year
and, to a lesser degree, with developmental problems in childhood and the risk of
various diseases in adulthood. If the data are not Normally distributed then we can
base the normal reference range on the observed percentiles of the sample, i.e. 95% of
the observed data lie between the 2.5 and 97.5 percentiles. In this example, the
percentile-based reference range for our sample was calculated as 2.19kg to 4.43kg.
Most reference ranges are based on samples larger than 3500 people. Over many
years, and millions of births, the WHO has come up with a normal birth weight range
for newborn babies. These ranges represent results that are acceptable in newborn
babies and actually cover the middle 80% of the population distribution, i.e. the 10th to
90th centiles. Low birth weight babies are usually defined (by the WHO) as weighing
less than 2500g (the 10th centile) regardless of gestational age, and large birth weight
babies are defined as weighing above 4000g (the 90th centile). Hence the normal birth
weight range is around 2.5kg to 4kg. For our sample data, the 10th to 90th centile
range was similar, 2.75 to 4.03kg.
The Binomial Distribution
If a group of patients is given a new drug for the relief of a particular condition, then
the proportion p being successfully treated can be regarded as estimating the
population treatment success rate π.
The sample proportion p is analogous to the sample mean x̄, in that if we score zero
for those s patients who fail on treatment, and 1 for those r who succeed, then p = r/n,
where n = r + s is the total number of patients treated. Thus p also represents a mean.
Data which can take only a binary (0 or 1) response, such as treatment failure or
treatment success, follow the binomial distribution provided the underlying population
response rate does not change. The binomial probabilities are calculated from:
$$ P(r \text{ responses out of } n) = \frac{n!}{r!\,(n-r)!}\, \pi^{r} (1-\pi)^{n-r} $$
for successive values of r from 0 through to n. In the above, n! is read as “n factorial”
and r! as “r factorial”. For r = 4, r! = 4 × 3 × 2 × 1 = 24. Both 0! and 1! are taken as equal to
1. The shaded area marked in Figure 2 (below) corresponds to the above expression for
the binomial distribution calculated for each of r = 8, 9, ..., 20 and then added. This area
totals 0.1018. So the probability of eight or more responses out of 20 is 0.1018.
For a fixed sample size n the shape of the binomial distribution depends only on π.
Suppose n = 20 patients are to be treated, and it is known that on average a quarter,
or π = 0.25, will respond to this particular treatment. The number of responses actually
observed can only take integer values between 0 (no responses) and 20 (all respond).
The binomial distribution for this case is illustrated in Figure 2.
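The probability of eight or more responses quoted above can be checked by summing the binomial probabilities directly. The sketch below uses n = 20 and π = 0.25 from the text; the function and variable names are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(r, n, pi):
    """Probability of exactly r responses out of n patients."""
    return comb(n, r) * pi**r * (1 - pi)**(n - r)

n_patients, response_rate = 20, 0.25

# Add the probabilities for r = 8, 9, ..., 20 (the shaded area in Figure 2).
tail = sum(binomial_pmf(r, n_patients, response_rate)
           for r in range(8, n_patients + 1))

# The number of responses with the largest single probability.
most_likely = max(range(n_patients + 1),
                  key=lambda r: binomial_pmf(r, n_patients, response_rate))

print(f"P(8 or more responses) = {tail:.4f}")
print(f"most likely number of responses = {most_likely}")
```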
The distribution is not symmetric; it has a maximum at five responses, and the height of
the blocks corresponds to the probability of obtaining the particular number of
responses from the 20 patients yet to be treated. It should be noted that the expected
value for r, the number of successes yet to be observed if we treated n patients, is
nπ. The potential variation about this expectation is expressed by the corresponding
standard deviation:
$$ SD(r) = \sqrt{n\pi(1-\pi)} $$
Figure 2 also shows the Normal distribution arranged to have μ = nπ = 5 and
σ = √[nπ(1 − π)] = 1.94, superimposed on to a binomial distribution with π = 0.25 and
n = 20. The Normal distribution describes fairly precisely the binomial distribution in this
case. If n is small, however, or π close to 0 or 1, the disparity between the Normal and
binomial distributions with the same mean and standard deviation increases and the
Normal distribution can no longer be used to approximate the binomial distribution. In
such cases the probabilities generated by the binomial distribution itself must be used.
Now it is clear that the distribution of the number of donors takes integer values only,
thus the distribution is similar in this respect to the binomial. However, there is no
theoretical limit to the number of organ donors that could happen on a particular day.
Here the population is the UK population aged 15-69, over two years, which is over 82
million person years, so in this case each member can be thought to have a very small
probability of actually suffering an event, in this case being admitted to a hospital ICU
and placed on a ventilator with a life threatening condition.
The mean number of organ donors per day over the two year period is calculated as:
$$ r = \frac{1330}{365+365} = \frac{1330}{730} = 1.82 \text{ organ donations per day} $$
It should be noted that the expression for the mean is similar to that for π, except here
multiple data values are common; and so instead of writing each as a distinct figure in
the numerator they are first grouped and counted. For data arising from a Poisson
distribution the standard error, that is the standard deviation of r, is estimated by
SE(r) = √(r/n), where n is the total number of days (or an alternative time unit). Provided
the organ donation rate is not too low, a 95% confidence interval for the underlying
(true) organ donation rate λ can be calculated in the usual way:
r − [1.96 × SE(r)] to r + [1.96 × SE(r)]
With r = 1.82 and n = 730, SE(r) = √(1.82/730) ≈ 0.05, so the 95% confidence
interval for λ is 1.72 to 1.92 organ donations per day. Exact confidence intervals can be
calculated as described by Altman et al. (2000).
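The daily rate, its standard error, and the approximate 95% confidence interval can be reproduced from the totals given in the text (1330 donations over 730 days). The sketch below assumes the usual estimate plus or minus 1.96 standard errors; the variable names are illustrative.

```python
from math import sqrt

total_donations = 1330
n_days = 365 + 365

rate = total_donations / n_days          # mean number of donations per day
standard_error = sqrt(rate / n_days)     # SE(r) = sqrt(r / n) for Poisson data

# Approximate 95% confidence interval: estimate +/- 1.96 standard errors.
lower = rate - 1.96 * standard_error
upper = rate + 1.96 * standard_error

print(f"rate = {rate:.2f} per day, SE = {standard_error:.3f}")
print(f"95% CI: {lower:.2f} to {upper:.2f} organ donations per day")
```

This Normal-based interval is only an approximation; it is not the exact method the text attributes to Altman et al. (2000).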
The Poisson probabilities are calculated from:
$$ P(r \text{ responses}) = \frac{\lambda^{r}}{r!}\, e^{-\lambda} $$
…for successive values of r from 0 to infinity. Here e is the exponential constant
2.7182 … , and λ is the population rate which is estimated by r in the example above.
Example
Suppose that before the study of Wight et al. (2004) was conducted it was expected
that the number of organ donations per day was approximately two. Then assuming
λ = 2, we would anticipate the probability of 0 organ donations in a given day to be
$(2^0/0!)\,e^{-2} = e^{-2} = 0.135$. (Remember that $2^0$ and 0! are both equal to 1.) The probability of one
organ donation would be $(2^1/1!)\,e^{-2} = 2e^{-2} = 0.271$. Similarly the probability of two
organ donations per day is $(2^2/2!)\,e^{-2} = 2e^{-2} = 0.271$; and so on to give for three
donations 0.180, four donations 0.090, five donations 0.036, six donations 0.012, etc. If
the study is then to be conducted over 2 years (730 days), each of these probabilities is
multiplied by 730 to give the expected number of days during which 0, 1, 2, 3, etc.
donations will occur. These expectations are 98.8, 197.6, 197.6, 131.7, 65.9, 26.3, 8.8 days.
A comparison can then be made between what is expected and what is actually
observed.
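The worked example can be reproduced with a short script. The sketch below computes the Poisson probabilities for λ = 2 and multiplies each by 730 days, as described in the example; the function name `poisson_pmf` and the variable names are assumptions made for illustration.

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """Poisson probability of exactly r events when the rate is lam."""
    return lam**r * exp(-lam) / factorial(r)

lam = 2.0        # anticipated organ donations per day
n_days = 730     # two years of follow-up

probabilities = [poisson_pmf(r, lam) for r in range(7)]
expected_days = [n_days * p for p in probabilities]

for r, (p, days) in enumerate(zip(probabilities, expected_days)):
    print(f"{r} donations: probability {p:.3f}, expected days {days:.1f}")
```

The expected counts can then be set beside the observed number of days for each count of donations.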
1.1 Find (a) the mean and (b) the standard deviation for the distribution of
defective bolts in a total of 400.
3. The table below shows the number of days, f, in a 50-day period during which X
automobile accidents occurred in a city. Fit a Poisson distribution to the data.
VI. Post-Assessment:
1. Which of the following probabilities can NOT be found using the binomial
distribution?
a. The probability that 3 out of 8 tosses of a coin will result in heads.
b. The probability that Susan will beat Shannon in two of their three tennis matches
c. The probability of rolling at least two 3’s and two 4’s in twelve rolls of a die
d. All of the above
2. A probability distribution for which there are just two possible outcomes with fixed
probabilities summing to one.
a. Poisson
b. Binomial
c. Normal
d. Uniform
3. Calculate 6!
a. 12
b. 36
c. 120
d. 720
4. How many outcomes are possible if 3 new employees are to be selected from a
group of 5 applicants?
a. 10
b. 12
c. 15
d. 30
8. The number of arrivals of delivery trucks per hour at a loading station is an example
of which of the following processes?
a. Binomial
b. Hypergeometric
c. Poisson
d. Uniform
9. Suppose random variable X has a normal distribution with a mean of 98.2 and a
standard deviation of 3.2. Find P(X ≥ 99).
a. 0.0987
b. 0.4013
c. 0.5987
d. 0.9013
VII. References: