Slides 2 Statistics


Probability and Statistics-LCW

Marianopolis College
Hadi Bigdely
1
4.2. SPECIAL DISCRETE RANDOM VARIABLES’ DISTRIBUTIONS:
1. Bernoulli distribution:
A Bernoulli random variable X has only two outcomes, 0 and 1, with probability
distribution

p(1) = P(X = 1) = p and p(0) = P(X = 0) = 1 − p, where 0 < p < 1.

e.g., tossing a coin, winning or losing a game, · · · .

So E(X) = 0 · (1 − p) + 1 · p = p and V(X) = E(X²) − (E(X))² = p − p² = p(1 − p).
2
Example 4.21:
1. When p = 1/2 (e.g., for tossing a coin), we have p(0) = p(1) = 1/2.

2. When rolling a die, with outcome k (1 ≤ k ≤ 6), let

X = 1 if the roll resulted in a six,
X = 0 if the roll did not result in a six.

Then X is Bernoulli with p = P(X = 1) = 1/6.
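The die-six indicator can be written out in a few lines of Python (a minimal sketch; exact fractions are used only for readability):

```python
from fractions import Fraction

# Bernoulli indicator for "the roll resulted in a six" on a fair die:
# X(k) = 1 if k == 6, else 0, for equally likely outcomes k = 1, ..., 6.
p = Fraction(sum(1 for k in range(1, 7) if k == 6), 6)  # P(X = 1)
q = 1 - p                                               # P(X = 0)
mean = p            # E(X) = 0*q + 1*p = p
var = p * (1 - p)   # V(X) = p(1 - p)
print(p, q, mean, var)  # 1/6 5/6 1/6 5/36
```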

3
2. The Binomial Distribution:
Definition: A binomial experiment is an experiment with all the
following:
Let n be fixed.
Perform a Bernoulli (Failure and Success) trial n times in sequence.
Assume the individual trials are independent.
Assume the probability of a success, p, is the same on each trial.
Example: Toss a coin 6 times (n = 6). Assume success is facing Head.
Then any particular sequence of outcomes, such as HHTHTT,
occurs with probability (1/2)^6 (why?)

4
Definition (Binomial random variable):
The binomial random variable X associated with a binomial experiment consisting
of n trials is defined as
X = the number of successes among the n trials

Suppose, for example, that n = 3. Then there are eight possible outcomes for the
experiment:

SSS, SSF, SFS, SFF, FSS, FSF, FFS, FFF

From the definition of X, X(SSS) = 3, X(SSF) = 2, X(SFS) = 2, and so on.

Possible values for X in an n-trial experiment are 0, 1, 2, …, n. We will often write

X ~ Bin(n, p)

to indicate that X is a binomial rv based on n trials with success probability p.

5
Proposition: Consider a sequence of n independent success/failure
experiments, each of which yields success with probability p. Let X
denote the number of successes; then we have

X = X1 + X2 + · · · + Xn,

where each Xi is a Bernoulli random variable with success probability p. Then X is a
binomial rv and X ~ Bin(n, p).

Example 4.22: Consider a binomial experiment with n = 3. Then there
are eight possible outcomes for the experiment:

SSS, SSF, SFS, SFF, FSS, FSF, FFS, FFF

Let X be the corresponding binomial random variable. Then, for example,

P(X = 2) = P(SSF) + P(SFS) + P(FSS) = 3p²(1 − p).
6
Remark: Because the pmf of a binomial rv X depends on the two
parameters n and p, we denote the pmf by b(x; n, p).

Theorem: Let X ~ Bin(n, p) and let q = 1 − p. Then

b(x; n, p) = P(X = x) = C(n, x) p^x q^(n−x), x = 0, 1, 2, …, n.

Proof: This expression follows from the fact that any specific sequence with
exactly x successes (hence n − x failures) has a probability
p^x q^(n−x). There are C(n, x) such sequences.
Example 4.23: Toss a coin 12 times.
a) What is the probability that you get 3 Heads?
b) What is the probability of having at least 8 Heads?
c) What is the probability of having at most 1 Head?
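A sketch of the three computations with the pmf b(x; 12, 1/2) (answers shown rounded):

```python
from math import comb

n, p = 12, 0.5
def b(x):  # binomial pmf b(x; 12, 1/2)
    return comb(n, x) * p**x * (1 - p)**(n - x)

ans_a = b(3)                                # P(X = 3)
ans_b = sum(b(x) for x in range(8, n + 1))  # P(X >= 8)
ans_c = b(0) + b(1)                         # P(X <= 1)
print(round(ans_a, 4), round(ans_b, 4), round(ans_c, 4))  # 0.0537 0.1938 0.0032
```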
7
The Binomial mass and cumulative distribution functions for n = 12, p = 1/2:

p(x) = b(x; 12, 1/2) and F(x) = B(x; 12, 1/2)

Remark: For X ~ Bin(n, p), the cdf will be denoted by B(k; n, p) = P(X ≤ k) = Σ_{y=0}^{k} b(y; n, p), k = 0, 1, …, n.

8
Example 4.24: Rolling a die 12 times where
Success = Result is a six
Fail = Result is not a six

9
Example 4.25: Every day one of your classmates takes the tram twice
without any ticket. There is a 5% chance of meeting inspectors each time one
boards the tram. What is the probability that he is going to be caught at least
twice in a 30 day month?
Answer: Let X be the number of times he meets inspectors during the
month. X is a binomial random variable with n = 2 × 30 = 60, p = 0.05 and

P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = 1 − (0.95)^60 − 60(0.05)(0.95)^59 ≈ 0.808.
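A sketch of this computation, assuming the classmate boards twice a day for 30 days, so n = 60 trials (that reading of the setup is an assumption):

```python
n, p = 60, 0.05  # assumed: 2 boardings/day for 30 days, 5% per boarding

p0 = (1 - p)**n                # P(X = 0)
p1 = n * p * (1 - p)**(n - 1)  # P(X = 1)
caught_at_least_twice = 1 - p0 - p1
print(round(caught_at_least_twice, 3))  # 0.808
```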
10
Example 4.26: Your friend claims that he is able to distinguish Pepsi
from Coke. You give him a test. Out of 5 trials, he gets the right answer
4 times. What is the probability that he just got lucky?
Answer: Let X be the number of right answers. If he answers
completely randomly, then X is a binomial rv with n = 5 and p = 1/2. We
have
P(X ≥ 4) = b(4; 5, 1/2) + b(5; 5, 1/2) = 5/32 + 1/32 = 6/32
= 0.1875
This is not convincing evidence that your friend can distinguish the two
beverages.

11
Using Binomial Table:
Example 4.27:

12
13
Cumulative Binomial Probabilities Table:

14
Variance and Expected value of a Binomial rv:
Theorem: If X ~ Bin(n, p) then E(X) = np, and
V(X) = np(1 − p) = npq.
Proof: We know X = X1 + · · · + Xn where each Xi is a Bernoulli rv.
So E(X) = E(X1) + · · · + E(Xn) = p + · · · + p = np.
We do not prove the V(X) result.
Example 4.28:

15
Example 4.29: What is the probability of obtaining 45 or fewer heads in 100 tosses
of a fair coin?
Answer:
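The exact answer is a direct sum of the Bin(100, 1/2) pmf; a minimal sketch:

```python
from math import comb

n = 100
# P(X <= 45) for X ~ Bin(100, 1/2): sum the pmf directly.
exact = sum(comb(n, x) for x in range(46)) / 2**n
print(round(exact, 4))  # 0.1841
```

A normal approximation with continuity correction, Φ((45.5 − 50)/5) = Φ(−0.9), gives nearly the same value.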

16
3. The Uniform Discrete Distribution:
Definition: Let X be a discrete rv taking values 1, 2, …, m. Assume that all these values have
equal probability, i.e.

p(x) = P(X = x) = 1/m, x = 1, 2, …, m.

Then this distribution is called a Uniform discrete distribution or equally likely distribution
and we write X ~ Uniform(m).
Example 4.29: A random experiment where this distribution occurs is the choice
of an integer at random between 1 and 100, inclusive. Let X be the number chosen. Then
p(x) = 1/100 for x = 1, 2, …, 100.
Theorem: Let X ~ Uniform(m). Then

μ = E(X) = Σ x p(x) = (1/m) Σ_{x=1}^{m} x = (m + 1)/2

σ² = V(X) = (m + 1)(m − 1)/12 = (m² − 1)/12

Example 4.28: In rolling a fair die where X = “the upward face showing”,
then m = 6, µ = 7/2 = 3.5, and σ² = 35/12 ≈ 2.92.
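An exact-fraction check of the die values:

```python
from fractions import Fraction

m = 6
mean = sum(Fraction(x, m) for x in range(1, m + 1))      # E(X) = (m+1)/2
ex2 = sum(Fraction(x * x, m) for x in range(1, m + 1))   # E(X^2)
var = ex2 - mean**2                                      # V(X) = (m^2 - 1)/12
print(mean, var)  # 7/2 35/12
```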
17
4. The Geometric Distribution:
Consider yet again independent trials, each with success probability p. These trials
are performed until a success occurs.
Let X be the number of trials required; then X takes values 1, 2, 3, …
X is called a Geometric rv with Geometric probability distribution

P(X = k) = (1 − p)^(k−1) p, k = 1, 2, 3, …

Remark: Σ_{k=1}^{∞} (1 − p)^(k−1) p = 1 (Why?)
Theorem: Let X be a Geometric rv where the probability of success is p. Then

E(X) = 1/p and V(X) = (1 − p)/p².

Proof:

18
5. The Hypergeometric distribution:
Consider a box containing N balls of which M are white and
N − M are black. We take a simple random sample (i.e. without replacement) of size n
and
let X = the number of white balls in the sample. The Hypergeometric distribution
is the distribution of X under this sampling scheme, and for example

P(X = x) = C(M, x) C(N − M, n − x) / C(N, n).

Definition: Consider a finite population with N individuals. Each individual can be
characterized as a success (S) or a failure (F), and there are M successes in the
population (N − M failures). A sample of n individuals is selected without
replacement. Let X = the number of Successes.
Then X is a Hypergeometric rv and its pmf is called the hypergeometric distribution,
which is

h(x; n, M, N) = P(X = x) = C(M, x) C(N − M, n − x) / C(N, n)

for x, an integer, satisfying max(0, n − N + M) ≤ x ≤ min(n, M).
19
Remark:
• With a sample survey, Success or failure may correspond variously to
people who do or don’t have leukemia, people who do or don’t
smoke, people who do or don’t favor the death penalty, or people
who will or won’t vote for a particular political party.

• Here N is the size of the population, M is the number of individuals in
the population with the characteristic of interest, while X measures
the number with that characteristic in a sample of size n.

20
Example 4.29: Consider a room with 10 people inside, 4 of whom have flu.
We choose 7 persons from the room without replacement.
a) What is the probability that 2 of them have flu?
b) What is the probability that more than 2 of them have flu?
Answer: Let X = the number of people in the sample who have flu. Then X is
hypergeometric with N = 10, M = 4, n = 7.
a) P(X = 2) = C(4, 2) C(6, 5) / C(10, 7) = 36/120 = 0.3
b) P(X > 2) = P(X = 3) + P(X = 4) = C(4, 3) C(6, 4)/C(10, 7) + C(4, 4) C(6, 3)/C(10, 7) = 80/120 ≈ 0.667
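A sketch checking both answers with the hypergeometric pmf h(x; 7, 4, 10), using exact fractions:

```python
from fractions import Fraction
from math import comb

N, M, n = 10, 4, 7
def h(x):  # hypergeometric pmf h(x; n, M, N)
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

ans_a = h(2)          # P(X = 2)
ans_b = h(3) + h(4)   # P(X > 2); x is at most min(n, M) = 4
print(ans_a, ans_b)  # 3/10 2/3
```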

21
Example 4.30:

22
Theorem: The mean and variance of the hypergeometric rv X having
pmf h(x; n, M, N) are

E(X) = n · M/N and V(X) = ((N − n)/(N − 1)) · n · (M/N)(1 − M/N).

Note that M/N is the proportion of S’s in the population, and if we replace
it with p we have

E(X) = np and V(X) = ((N − n)/(N − 1)) · np(1 − p).

Remark: This shows that the means of the binomial and hypergeometric
rv’s are equal, whereas the variances of the two rv’s differ by the factor
(N − n)/(N − 1), often called the finite population correction factor.

23
Example 4.31: Suppose a company fleet of 20 cars contains 7 cars that do
not meet government exhaust emissions standards and are therefore
releasing excessive pollution. Moreover, suppose that a traffic policeman
randomly inspects 5 cars.
a) How many cars is he likely to find that exceed pollution control
standards?
b) What is the probability of 3 polluting cars?
Answer:
a) N = 20, M = 7, n = 5, X = # of cars in the sample that exceed pollution control
standards. E(X) = n · M/N = 5 × 7/20 = 1.75
b) P(X = 3) = h(3; 5, 7, 20) = C(7, 3) C(13, 2)/C(20, 5) = 2730/15504 ≈ 0.176

24
An Interesting Application:
• In industrial quality control, lots of size N are subjected to sampling
inspection. The defective items in the lot play the role of “S” elements
and their number M is typically unknown.

• A sample of size n is taken, and the number X of defective items in it is


determined. We know that X follows a hypergeometric distribution with
parameters N, M and n.

• Having observed X = x, we can estimate M by finding the value of M


which maximizes P (X = x); this is called the maximum likelihood
estimate of M.
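A sketch of the maximum likelihood idea with hypothetical numbers (N = 20, n = 5, and an observed x = 1 are invented for illustration): scan all feasible M and keep the value maximizing P(X = x).

```python
from math import comb

N, n, x = 20, 5, 1  # hypothetical lot size, sample size, observed defectives

def likelihood(M):
    """P(X = x) when X ~ Hypergeometric(n, M, N); 0 if x is infeasible for this M."""
    if x > M or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

M_hat = max(range(N + 1), key=likelihood)  # maximum likelihood estimate of M
print(M_hat)  # 4
```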

25
6. The Poisson distribution:
The Poisson random variable approximates the Binomial random
variable.
Definition: A discrete rv X, taking values 0, 1, 2, …, is said to be a
Poisson random variable with parameter λ > 0 if

P(X = x) = e^(−λ) λ^x / x!, x = 0, 1, 2, …

Remark: This is indeed a probability distribution as

Σ_{x=0}^{∞} e^(−λ) λ^x/x! = e^(−λ) Σ_{x=0}^{∞} λ^x/x! = e^(−λ) e^λ = 1,

since writing the Taylor series centered at 0 of e^λ, we see that e^λ = Σ_{x=0}^{∞} λ^x/x!.

26
Example 4.32:

27
Remark: Recall that for the Binomial random variable,
E(X) = np and V(X) = np(1 − p) ≈ np when p is small.
Indeed, for the Poisson random variable we will show that

E(X) = V(X) = λ.

Remark: The Poisson random variable
models the probability of x ”successes” in a given ”time” interval (short
period of time), when the average number of successes is λ.

• Note that this is used absolutely everywhere as many natural phenomena


are Poisson distributed: visits to a particular website, pulses of some sort
recorded by a counter, email messages sent to a particular address, accidents
in an industrial facility, or cosmic ray showers observed by astronomers at a
particular observatory.
28
Example 4.33: Suppose customers arrive at the rate of six per hour, so the
number X of arrivals in a one-hour period is Poisson with λ = 6.
The probability that exactly one customer arrives in a one-hour period is

P(X = 1) = e^(−6) 6¹/1! ≈ 0.0148,

and the probability that exactly two customers arrive is

P(X = 2) = e^(−6) 6²/2! ≈ 0.0446.

The probability that more than 2 customers arrive is

P(X > 2) = 1 − P(X = 0) − P(X = 1) − P(X = 2) ≈ 0.938.
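The three numbers can be checked directly from the Poisson pmf with λ = 6:

```python
from math import exp, factorial

lam = 6.0
def pois(x):  # Poisson pmf
    return exp(-lam) * lam**x / factorial(x)

p1 = pois(1)
p2 = pois(2)
more_than_2 = 1 - (pois(0) + p1 + p2)
print(round(p1, 4), round(p2, 4), round(more_than_2, 3))
```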
29
Proposition: Suppose that in the binomial pmf b(x; n, p) we let n → ∞
and p → 0 in such a way that np approaches a value λ > 0. Then
b(x; n, p) → e^(−λ) λ^x/x!.

Proof: We can assume p = λ/n; then

b(x; n, p) = (n!/(x!(n − x)!)) (λ/n)^x (1 − λ/n)^(n−x)
= (λ^x/x!) · (n(n − 1)···(n − x + 1)/n^x) · (1 − λ/n)^n / (1 − λ/n)^x
→ (λ^x/x!) · 1 · e^(−λ) / 1 = e^(−λ) λ^x/x! as n → ∞.
30
• Practically this means that if a large number of trials n are made
which have a success probability p such that np is moderate, then we
can approximate the total number of successes by a Poisson rv of
parameter λ, or

Bin(n, p) ≈ Poisson(λ)

when we take λ = np
(the average number of successes).
This approximation is “close to accurate” if n is large and p is small.

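The quality of the approximation can be checked numerically: fix λ = np = 2 and compare the binomial and Poisson pmfs as n grows (the specific n values here are illustrative choices):

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def pois_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

lam = 2.0
errs = {}
for n in (25, 100, 400):
    p = lam / n
    # worst-case pointwise difference between the two pmfs
    errs[n] = max(abs(binom_pmf(x, n, p) - pois_pmf(x, lam)) for x in range(21))
print(errs)  # the error shrinks as n grows and p shrinks
```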
31
Example 4.34:

32
Table 3.2 shows the Poisson distribution along with three
binomial distributions having the same mean, and Figure 3.8
plots the Poisson along with the first two binomial distributions.
The approximation is of limited use for the smallest n,
but of course the accuracy is better as n grows
and p shrinks.

33
Theorem: If X has a Poisson distribution with parameter λ, then

E(X) = V(X) = λ.

Proof:

Remark: Since b(x; n, p) → e^(−λ) λ^x/x! when n → ∞, p → 0, np → λ,
the mean and variance of a binomial variable should approach those of
a Poisson variable. These limits are np → λ and np(1 − p) → λ.

Remark: The cdf of a Poisson rv with parameter λ is shown by

F(x; λ) = P(X ≤ x) = Σ_{y=0}^{x} e^(−λ) λ^y/y!,

so P(X = x) = F(x; λ) − F(x − 1; λ).
34
Example 4.35: There are 50 misprints in a book which has 250 pages, and assume
the number of errors in a page follows a Poisson distribution of parameter
λ = 50/250 = 0.2.
a) Find the probability that page 100 has no misprints.
b) Find the probability that page 100 has 2 misprints.
Answer: If we let X be the r.v. denoting the number of misprints on page 100 (or
any other page), X is a Poisson r.v. with parameter λ = 0.2. So we have
a) P(X = 0) = e^(−0.2) ≈ 0.819
b) P(X = 2) = e^(−0.2) (0.2)²/2! ≈ 0.0164

35
Example 4.36: You have a 10 meter by 10 meter plot of land, which is divided
into a grid of 100 squares. You scatter 500 seeds on this plot. We assume that each
seed falls at random, so that it is equally likely to fall on any of the 100 squares.
Consider the square in the upper left hand corner.
a) What is the probability that exactly 4 seeds fall on it?
b) What is the probability that 0 seeds fall on it?
Answer: Think of this as dropping 500 seeds, one after the other, and recording
whether the seed falls into the upper left hand square or not. “Success” means
falling into the square, and that happens with probability 1/100 = .01. So, n = 500,
and p = 0.01. The expected number of successes is np = 5. By the Poisson
approximation, we have
a) P(X = 4) ≈ e^(−5) 5⁴/4! ≈ 0.175
b) P(X = 0) ≈ e^(−5) ≈ 0.0067
36
Example 4.37: The following compares the binomial (blue) and Poisson (red)
distributions with matching parameters λ = np.

37
4.2 Continuous Random Variables and Probability Distributions:
For the time being, we have only considered discrete random variables (rv) - set of
possible values is finite or countable - such as the number of arrivals in a given time
interval, the number of successful trials, the number of items with a given characteristic.

In many practical applications, we have to deal with random variables whose set of
possible values is uncountable.
Example 4.38:
• Waiting for a bus. The time (in minutes) which elapses between arriving at a bus
stop and a bus arriving can be modelled as a rv, taking values in [0, ∞).
• Share price. The value of one share of a specific stock at some given future time can be
modelled as a rv, taking values in [0, ∞).
• Weight. The weight of a randomly chosen individual can be modelled as a rv, taking
values in (0, ∞).
• Temperature. The temperature in Celsius at a given time can be modelled as a rv,
taking values in ℝ.

38
Definition: A continuous random variable is a random variable with both of the
following properties:
1. Its set of possible values consists either of all numbers in a single interval on the
number line [this interval can be (−∞, ∞)] or all numbers in a disjoint union of
such intervals (e.g., [0, 10] ∪ [20, 30]).
2. Any single value of the variable has zero probability, that is, P(X = c) = 0
for any possible value c.

39
Definition: Let X be a continuous rv. Then a probability distribution or probability
density function (pdf) of X is a function f(x) such that for any two numbers a and b with
a ≤ b,

P(a ≤ X ≤ b) = ∫_a^b f(x) dx.

That is, the probability that X takes on a value in the interval [a, b] is the area above this
interval and under the graph of the density function, as illustrated in Figure 4.2. The graph
of f(x) is often referred to as the density curve.

Remark: For f(x) to be a legitimate pdf, it must satisfy the following two conditions:
1) f(x) ≥ 0 for all x
2) ∫_{−∞}^{∞} f(x) dx = area under the entire graph of f(x) = 1
40
Remark:
• 𝑓 (𝑥) is NOT a probability, e.g. 𝑓 (1) is not the probability that
𝑋 = 1, it is a probability density.

• As 𝑓 (𝑥) is NOT a probability, it is NOT surprising to have 𝑓 (𝑥) > 1


for some values of 𝑥 in ℝ.

• For a continuous rv, the probability of any single value 𝑐 is zero; i.e.
for any 𝑐 in ℝ, we have 𝑃(𝑋 = 𝑐) = 𝑃(𝑐 ≤ 𝑋 ≤ 𝑐) = ∫_c^c 𝑓(𝑥) 𝑑𝑥 = 0

• 𝑃(𝑋 = 𝑐) = 0 implies that
𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) = 𝑃(𝑎 < 𝑋 ≤ 𝑏) = 𝑃(𝑎 ≤ 𝑋 < 𝑏) = 𝑃(𝑎 < 𝑋 < 𝑏)

• Keep track of the “range” of a density, such as the interval [0, 1] or


[0, +∞) when doing your calculations.

• Note that 𝑃(𝑋 < 𝑏) = ∫_{−∞}^{b} 𝑓(𝑥) 𝑑𝑥

• Also lim_{b→−∞} 𝑃(𝑋 < 𝑏) = 0 and lim_{b→+∞} 𝑃(𝑋 < 𝑏) = 1
41
Example 4.39:

42
Definition: Let f(x) be a probability density function (pdf) of the rv X. The
associated (cumulative) distribution function is

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y) dy.

Therefore we have F′(x) = f(x).

Remark: lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1.

43
Example 4.40:

44
45
Example 4.41: Assume you are told that the pdf of a rv, satisfies

where is an unknown constant. What is the probability that

46
Definition: A continuous rv X is said to have a uniform distribution on the
interval [A, B] if the pdf of X is

f(x; A, B) = 1/(B − A) for A ≤ x ≤ B, and f(x; A, B) = 0 otherwise.
Example 4.43:

47
48
Proposition (Using F(x) to Compute Probabilities):
Let X be a continuous rv with pdf f(x) and cdf F(x). Then for any number a,

P(X > a) = 1 − F(a),

and for any two numbers a and b with a < b,

P(a ≤ X ≤ b) = F(b) − F(a).
49
Example 4.44: Draw graphs of the distribution and density functions

Verify:
1-
2-
3-
4-

Example 4.45: Let X have a Uniform distribution on the interval [A, B]. Find the cdf F(x).

50
Definition: The median of a continuous distribution, denoted by μ̃, is
the 50th percentile, so μ̃ satisfies
F(μ̃) = 0.5.
That is, half the area under the density curve is to the left of μ̃ and half
is to the right of μ̃.

Remark: A continuous distribution whose pdf is symmetric—the graph


of the pdf to the left of some point is a mirror image of the graph to the
right of that point—has median equal to the point of symmetry.

51
Example 4.46: A group insurance policy covers the medical claims of the
employees of a small company. The value, V, of the claims made in one year is
described by
where X is a random variable with pdf

What is the probability that V exceeds 40,000 given that V exceeds 10,000?

52
Expected value of a continuous rv:
Definition: The expected or mean value of a continuous rv X with pdf f(x)
is

μ = E(X) = ∫_{−∞}^{∞} x · f(x) dx.
Example 4.47: In the previous example find

53
Definition: If X is a continuous rv with pdf f(x) and h(X) is any
function of X, then

E[h(X)] = ∫_{−∞}^{∞} h(x) · f(x) dx.

Definition: The variance of a continuous random variable X with pdf f(x)
and mean value μ is

σ² = V(X) = ∫_{−∞}^{∞} (x − μ)² · f(x) dx = E[(X − μ)²].

The standard deviation (SD) of X is σ = √V(X).

Remark: The variance and standard deviation give quantitative
measures of how much spread there is in the distribution or population
of values. Again σ is roughly the size of a typical deviation from μ.
54
Proposition: V(X) = E(X²) − [E(X)]².
Proof:

Example 4.48: Find the standard deviation in Example 4.46.

Remark: E(aX + b) = aE(X) + b and V(aX + b) = a²V(X).
Proof: Exercise

55
Uniform Density: As we had, the following is a uniform density function of X on the
interval [A, B]:

f(x; A, B) = 1/(B − A), A ≤ x ≤ B (and 0 otherwise).

Proposition: Let X be a uniform rv on [A, B]. Then

E(X) = (A + B)/2 and V(X) = (B − A)²/12.

Proof:

56
4.2.1: The Exponential Distribution:
The family of exponential distributions provides probability models that
are very widely used in engineering and science disciplines.
Definition: X is said to have an exponential distribution with parameter
λ > 0 if the pdf of X is

f(x; λ) = λe^(−λx) for x ≥ 0, and f(x; λ) = 0 otherwise.

Proposition: Let X have an exponential distribution with parameter λ. Then its cdf
function is

F(x; λ) = 1 − e^(−λx) for x ≥ 0, and F(x; λ) = 0 for x < 0.

Proof:
Proposition: Let X have an exponential distribution with parameter λ. Then

E(X) = 1/λ and V(X) = 1/λ².

Proof:
57
Example 4.49:

Example 4.50:

58
Remark: The exponential distribution is frequently used as a model for
the distribution of times between the occurrence of successive events,
such as customers arriving at a service facility or calls coming in to a
switchboard. The reason for this is that the exponential distribution is
closely related to the Poisson process.

Theorem: Suppose that the number of events occurring in any time
interval of length t has a Poisson distribution with parameter αt
(where α, the rate of the event process, is the expected number of
events occurring in 1 unit of time) and that numbers of occurrences in
non-overlapping intervals are independent of one another. Then the
distribution of elapsed time between the occurrence of two successive
events is exponential with parameter λ = α.
59
Why? The result is easily verified for the time T until the first event
occurs:

P(T > t) = P(no events in (0, t]) = e^(−αt)(αt)⁰/0! = e^(−αt),

so P(T ≤ t) = 1 − e^(−αt),
which is exactly the cdf of the exponential distribution.


Example 4.51: Suppose that calls are received at a 24-hour “suicide hotline”
according to a Poisson process with rate α = 0.5 call per day.
a) What is the probability that more than 2 days elapse between calls?
b) What is the expected time between successive calls?
Answer: The number of days X between successive calls has an exponential
distribution with parameter value λ = 0.5, so
a) P(X > 2) = 1 − F(2; 0.5) = e^(−(0.5)(2)) = e^(−1) ≈ 0.368
b) E(X) = 1/λ = 2 days.

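A sketch of the two computations, assuming a rate of λ = 0.5 call per day:

```python
from math import exp

lam = 0.5  # assumed rate: 0.5 call per day

p_gap_over_2 = exp(-lam * 2)  # a) P(X > 2) = 1 - F(2) = e^{-2*lam}
expected_gap = 1 / lam        # b) E(X) = 1/lam
print(round(p_gap_over_2, 3), expected_gap)  # 0.368 2.0
```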
60
61
Theorem:
a) Let X have a uniform distribution on [A, B]; then the median of X
is (A + B)/2.
b) Let X have an exponential distribution with parameter λ; then the
median of X is ln 2/λ.
Proof:

62
4.2.2: The Normal Distribution:
The normal distribution is the most important one in all of probability
and statistics. Many numerical populations have distributions that can
be fit very closely by an appropriate normal curve. Examples include
heights, weights, and other physical characteristics, measurement
errors in scientific experiments, anthropometric measurements on
fossils, reaction times in psychological experiments, measurements of
intelligence and aptitude, scores on various tests, and numerous
economic measures and indicators. In addition, even when individual
variables themselves are not normally distributed, sums and averages
of the variables will under suitable conditions have approximately a
normal distribution; this is the content of the Central Limit Theorem
discussed in the next chapter.

63
The Standard Normal distribution:
Definition: A continuous rv Z has the standard normal distribution if its pdf is

f(z; 0, 1) = (1/√(2π)) e^(−z²/2), −∞ < z < ∞.

Theorem: Let Z have the standard normal distribution. Then E(Z) = 0 and
V(Z) = 1.
Proof:

One can show (not easy) that ∫_{−∞}^{∞} e^(−z²/2) dz = √(2π).

Remark: Let Z have the standard normal distribution. Then we write Z ~ N(0, 1).

64
Definition: We denote the cdf of a standard normal variable by Φ,
so Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} f(y; 0, 1) dy. The graph of f(z; 0, 1) is called
the standard normal (or z) curve.

65
Theorem:

3) Let then

66
Example 4.52:

67
68
69
Definition: A continuous rv X has a normal distribution with parameters
μ and σ (σ > 0) if its pdf is

f(x; μ, σ) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)), −∞ < x < ∞,

and we write X ~ N(μ, σ²).
Remark: Let X ~ N(μ, σ²).
1- )
2-
3-
4-
70
Proof of 3) To show E(X) = μ, we show E(X − μ) = 0:

( Using a u-sub.)
Example:

71
When X ~ N(μ, σ²), then P(a ≤ X ≤ b) equals ∫_a^b f(x; μ, σ) dx;
however this integral is not simple. So probabilities involving X are
computed by “standardizing.”
The standardized variable is Z = (X − μ)/σ. Subtracting μ shifts the mean from
μ to zero, and then dividing by σ scales the variable so that the
standard deviation is 1 rather than σ.

Remark: If X ~ N(μ, σ²) then E(Z) = 0 and V(Z) = 1.


Proof:

72
Theorem: If X ~ N(μ, σ²), then

Z = (X − μ)/σ

has a standard normal distribution. Thus

P(a ≤ X ≤ b) = Φ((b − μ)/σ) − Φ((a − μ)/σ),

and

P(X ≤ b) = Φ((b − μ)/σ), P(X ≥ a) = 1 − Φ((a − μ)/σ).

Remark: The key idea of the proposition is that by standardizing, any
probability involving X can be expressed as a probability involving a
standard normal rv Z, so that the Φ table can be used.
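Φ has no elementary closed form, but it can be evaluated without a table via the error function, Φ(z) = (1 + erf(z/√2))/2, which makes the standardizing identity easy to use in code:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), computed by standardizing."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

print(round(normal_prob(-1.0, 1.0, 0.0, 1.0), 4))  # 0.6827
```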
73
Example 4.53:

74
75
Example 4.54: The breakdown voltage of a randomly chosen diode of a particular
type is known to be normally distributed. What is the probability that a diode’s
breakdown voltage is within 1 standard deviation of its mean value?
Answer: This question can be answered without knowing either μ or σ, as long as the
distribution is known to be normal:

P(μ − σ ≤ X ≤ μ + σ) = Φ(1) − Φ(−1) ≈ 0.6826.

Corollary: If the population distribution of a variable is (approximately) normal, then


1. Roughly 68% of the values are within 1 SD of the mean.
2. Roughly 95% of the values are within 2 SDs of the mean.
3. Roughly 99.7% of the values are within 3 SDs of the mean.

76
Example 4.55: Suppose .
Use the standard normal Table to determine:
• P( X ≤ −0.5 )
• P( X ≥ 0.5 )
• P( | X − μ | ≥ 0.5 )
• P( | X − μ | ≤ 0.5 )
Answer: This means

77
Approximation of a Binomial Distribution Using a Normal Curve:
The normal distribution is often used as an approximation to the distribution of
values in a discrete population. In such situations, extra care should be taken to
ensure that probabilities are computed accurately.

Example 4.56: IQ in a particular population (as measured by a standard test) is
known to be approximately normally distributed with μ = 100 and σ = 15. What is
the probability that a randomly selected individual has an IQ of at least 125?

Letting X = “the IQ of a randomly chosen person”, we wish P(X ≥ 125). The
temptation here is to standardize X ≥ 125 as in previous examples. However, the
IQ population distribution is actually discrete, since IQs are integer-valued.
78
So the normal curve is an approximation to a discrete probability histogram, as
pictured in the following Figure. The rectangles of the histogram are centered at
integers, so IQs of at least 125 correspond to rectangles beginning at 124.5, as
shaded in the Figure. Thus we really want the area under the approximating normal
curve to the right of 124.5. Standardizing this value gives P(Z ≥ 1.63) = .0516,
whereas standardizing 125 results in P(Z ≥ 1.67) = .0475. The difference is not
great, but the answer .0516 is more accurate. Similarly, P(X = 125) would be
approximated by the area between 124.5 and 125.5, since the area under the
normal curve above the single value 125 is zero.
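The effect of the correction can be reproduced numerically (μ = 100 and σ = 15 are assumed here; they are consistent with the z ≈ 1.63 quoted above):

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 100, 15  # assumed IQ parameters

without_correction = 1 - Phi((125 - mu) / sigma)    # standardize 125 directly
with_correction = 1 - Phi((124.5 - mu) / sigma)     # continuity correction: 124.5
print(round(without_correction, 4), round(with_correction, 4))
```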

79
Remark: The correction for discreteness of the underlying distribution in Example 4.56 is
often called a continuity correction.

80
81
Proposition: Let X be a binomial rv based on n trials with success
probability p. Then if the binomial probability histogram is not too
skewed, X has approximately a normal distribution with μ = np and
σ = √(np(1 − p)). In particular, for x = a possible value of X,

P(X ≤ x) = B(x; n, p) ≈ Φ((x + 0.5 − np)/√(np(1 − p))).

In practice, the approximation is adequate provided that both
np ≥ 10 and n(1 − p) ≥ 10, since there is then enough symmetry in the
underlying binomial distribution.

82
Example 4.57:

83
5. Sampling and Central Limit Theorem:
The observations in a single sample were denoted in Chapter 1 by x1, x2, …, xn.
Consider selecting two different samples of size n from the same population distribution.
The xi's in the second sample will virtually always differ at least a bit from those in the
first sample. For example, a first sample of cars of a particular type might result in one set of
fuel efficiencies, whereas a second sample may give
different values. Before we obtain data, there is uncertainty about
the value of each xi. Because of this uncertainty, before the data becomes available we
view each observation as a random variable and denote the sample by X1, X2, …, Xn.

5.1. Distribution of Linear Combinations:

Definition: Given a collection of n random variables X1, …, Xn and numerical constants
a1, …, an, the rv

Y = a1X1 + · · · + anXn

is called a linear combination of the Xi’s.

84
For example,
1) X1 − X2 is a linear combination of X1 and X2 with n = 2, a1 = 1,
and a2 = −1.
2) The sample mean (X1 + · · · + Xn)/n is a linear combination of X1, …, Xn.
Definition: Two random variables X and Y are said to be independent if for every pair
of x and y values

p(x, y) = pX(x) · pY(y) when X and Y are discrete
or
f(x, y) = fX(x) · fY(y) when X and Y are continuous.

If the equality is not satisfied for all (x, y), then X and Y are said to be dependent.

85
Example 5.1: The random variables X and Y with density function

are independent because one can show

86
Theorem 5.1: Given a collection of n random variables X1, …, Xn,
1) Whether or not the Xi are independent,

E(a1X1 + · · · + anXn) = a1E(X1) + · · · + anE(Xn).

2) If X1, …, Xn are independent,

V(a1X1 + · · · + anXn) = a1²V(X1) + · · · + an²V(Xn)

and

σ = √(a1²V(X1) + · · · + an²V(Xn)).

Example 5.2:

87
5.2 Sampling:
Sampling can consist of
• Gathering random data from a large population, for example,
− measuring the height of randomly selected adults
− measuring the starting salary of random CS graduates

• Recording the results of experiments , for example,


− measuring the breaking strength of randomly selected bolts
− measuring the lifetime of randomly selected light bulbs

88
Definition: A random sample from a population consists of independent, identically
distributed (all with the same distribution) random variables X1, X2, …, Xn.

Very important remark: Whenever we mention a random sample, this random


sample is defined on a population which originally has a distribution; therefore there
was a random variable (on this population) which has a mean μ and standard
deviation σ.

Definition: Let X1, …, Xn be a random sample (independent, identically
distributed). Then the sample mean is

X̄ = (X1 + · · · + Xn)/n

and the sample total is

T = X1 + · · · + Xn.
89
Theorem 5.0: Let X1, …, Xn be a random sample from a distribution
(population) with mean value μ and
standard deviation σ.
Then E(X̄) = μ, V(X̄) = σ²/n, and σ_X̄ = σ/√n.
Also E(T) = nμ, V(T) = nσ², and σ_T = σ√n.

Remark: Theorem 5.0 is important since it implies that the distribution of X̄ is centered
precisely at the mean of the population from which the sample has been
selected. Indeed this is true even though each sample can have a different mean.

Theorem 5.1: Let X1, …, Xn be a random sample from a normal distribution
(population) with mean μ and standard deviation σ. Then for any n,
a) X̄ is normally distributed with mean μ and standard deviation σ/√n.
b) T is normally distributed with mean nμ and standard deviation σ√n.
90
Example: The time that it takes a randomly selected rat of a certain
subspecies to find its way through a maze is a normally distributed rv with
μ = 1.5 min and σ = 0.35 min. Suppose five rats are selected. Let X1, …, X5 denote their
times in the maze. Assuming the Xi's are a random sample from this normal
distribution,
a) what is the probability that the total time T for the five is between 6 and 8 min?
b) Determine the probability that the sample average time X̄ is at most 2.0 min.
Answer: T is normally distributed with mean nμ = 5(1.5) = 7.5 and standard deviation
σ√n = 0.35√5 = 0.783.
X̄ is normally distributed with mean μ = 1.5 and standard
deviation σ/√n = 0.35/√5 = 0.1565.

a) P(6 ≤ T ≤ 8) = P((6 − 7.5)/0.783 ≤ Z ≤ (8 − 7.5)/0.783) = P(−1.92 ≤ Z ≤ 0.64)
= Φ(0.64) − Φ(−1.92) = 0.7115

b) P(X̄ ≤ 2) = P(Z ≤ (2 − 1.5)/0.1565) = P(Z ≤ 3.19) = Φ(3.19) = 0.9993
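The two probabilities can be verified with the erf-based cdf (μ = 1.5 and σ = 0.35 match the 7.5 and 0.783 used above):

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 1.5, 0.35, 5

mu_T, sd_T = n * mu, sigma * sqrt(n)                     # total time T
ans_a = Phi((8 - mu_T) / sd_T) - Phi((6 - mu_T) / sd_T)  # P(6 <= T <= 8)

sd_mean = sigma / sqrt(n)                                # sample mean Xbar
ans_b = Phi((2 - mu) / sd_mean)                          # P(Xbar <= 2)
print(round(ans_a, 4), round(ans_b, 4))
```

(The small difference from 0.7115 comes from the table's rounding of z to two decimals.)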
91
5.3 Central Limit Theorem:
Theorem (The Central Limit Theorem (CLT)):
Let X1, …, Xn be a random sample from a distribution with mean μ and
standard deviation σ. Then if n is sufficiently large,
1) X̄ has approximately a normal distribution with mean μ and σ_X̄ = σ/√n,
2) T also has approximately a normal distribution with mean nμ and σ_T = σ√n.
The larger the value of n, the better the approximation.

92
Rule of thumb: if n > 30, the CLT can generally be used.
Proposition: The CLT also applies to discrete random variables.
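A quick simulation illustrates the theorem: draw many samples from a deliberately non-normal population (exponential with mean 1) and compare an empirical probability for X̄ with the CLT's normal prediction. The numbers here (n = 40, 20000 replications, threshold 1.2) are arbitrary illustration choices:

```python
import random
from math import erf, sqrt

random.seed(1)

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

# Population: exponential with mean 1 (so mu = sigma = 1), which is skewed.
n, reps = 40, 20000
means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

empirical = sum(m <= 1.2 for m in means) / reps   # P(Xbar <= 1.2), simulated
predicted = Phi((1.2 - 1.0) / (1.0 / sqrt(n)))    # CLT: Xbar approx N(1, 1/n)
print(round(empirical, 3), round(predicted, 3))
```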

Example 5.3:

93
Example 5.4:

94
Definition: A statistic is a function of the random variables X1, …, Xn.
• Thus a statistic itself is a random variable.
Example:
1) The sample mean is a statistic.
2) The sample total is a statistic.
3) The sample variance is a statistic.
4) The sample standard deviation is a statistic.
5) The sample range, the difference between the largest and the
smallest observation, is a statistic.
.
.
.

95
6. Statistical inference:
6.1. Confidence Interval (CI):
Suppose that the mean μ of a population is unknown while
1. The population distribution is normal,
2. The mean x̄ of a sample is known,
3. The value of the population standard deviation σ is known.
How do we approximate μ?
Motivational example:
Let X1, …, Xn be a random sample from a normal distribution with mean μ and standard
deviation equal to σ. We know that X̄ is normally distributed with mean μ and
standard deviation σ/√n.

Standardizing, we have the standard normal variable

Z = (X̄ − μ)/(σ/√n).

96
Because the area under the standard normal curve between −1.96 and 1.96 is 0.95,

P(−1.96 < (X̄ − μ)/(σ/√n) < 1.96) = 0.95.

So μ lies in the interval (X̄ − 1.96σ/√n, X̄ + 1.96σ/√n) with probability 0.95.

97
Remark: The interval (X̄ − 1.96σ/√n, X̄ + 1.96σ/√n) is random because the two
endpoints of the interval involve a random variable. It is centered at the sample
mean X̄ and extends 1.96σ/√n to each side. The interval’s width is not random;
only the location of the interval (its midpoint X̄) is random (Following Figure). Now

P(X̄ − 1.96σ/√n < μ < X̄ + 1.96σ/√n) = 0.95

can be rephrased as
“the probability is 0.95 that the random interval (X̄ − 1.96σ/√n, X̄ + 1.96σ/√n)
includes the true value of μ.”
Before any experiment is performed and any data is gathered,
it is quite likely that μ will lie inside this interval.

98
Definition: Let X1, …, Xn be a random sample from a normal distribution with
mean μ and standard deviation σ (these are for the whole population).
Assume we observe x1, …, xn; we compute the observed
sample mean x̄ and then substitute it into

(x̄ − 1.96σ/√n, x̄ + 1.96σ/√n);

the resulting fixed interval is called a 95% confidence interval (CI) for μ. This CI
can be expressed either as

(x̄ − 1.96σ/√n, x̄ + 1.96σ/√n)

or

x̄ ± 1.96 · σ/√n.
99
Example 6.1:

100
Deriving the confidence interval (CI):
For a given α, we define the z_{α/2}-critical value as the number satisfying

1 − Φ(z_{α/2}) = α/2, i.e. Φ(z_{α/2}) = 1 − α/2,

where Φ is the cdf of the standard normal distribution.
We can show that P(−z_{α/2} ≤ Z ≤ z_{α/2}) = 1 − α.
Now let X1, …, Xn be a random sample from a normal distribution with
mean μ and standard deviation σ. Assume we observe x1, …, xn. Since

P(−z_{α/2} ≤ (X̄ − μ)/(σ/√n) ≤ z_{α/2}) = 1 − α,

isolating μ gives P(X̄ − z_{α/2}σ/√n ≤ μ ≤ X̄ + z_{α/2}σ/√n) = 1 − α, so
Theorem: A 100(1 − α)% confidence interval for the mean μ of a normal population
when the value of σ is known is given by

(x̄ − z_{α/2}σ/√n, x̄ + z_{α/2}σ/√n),

where x̄ is the sample mean and z_{α/2} is the critical value above. 101


Theorem: Let X1, …, Xn be a random sample from a population with
mean μ and standard deviation σ. By the CLT, if n is sufficiently large,

Z = (X̄ − μ)/(S/√n)

has approximately a standard normal distribution. So

x̄ ± z_{α/2} · s/√n

is a large-sample confidence interval for μ with confidence level approximately


100(1 − α)%,
where x̄ is the average of an observed sample and s is the standard deviation of
the sample.
E = z_{α/2} · s/√n is called the margin of error, so the interval is (x̄ − E, x̄ + E).

Remark: Note that in this theorem, originally the interval was


(x̄ − z_{α/2}σ/√n, x̄ + z_{α/2}σ/√n); however since n is large we replaced σ by s.
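The whole recipe fits in a short function. Since z_{α/2} is defined implicitly by Φ(z_{α/2}) = 1 − α/2, it can be found by bisection on the erf-based Φ (a sketch; the sample numbers in the final call are hypothetical):

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def z_crit(conf):
    """z_{alpha/2}: solve Phi(z) = 1 - alpha/2 by bisection."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if Phi(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci(xbar, s, n, conf=0.95):
    """Large-sample CI: xbar +/- z_{alpha/2} * s / sqrt(n)."""
    e = z_crit(conf) * s / sqrt(n)
    return (xbar - e, xbar + e)

print(round(z_crit(0.95), 3))   # 1.96
print(mean_ci(10.0, 2.0, 100))  # hypothetical xbar = 10, s = 2, n = 100
```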
102
Example 6.2: The production process for engine control housing units of a
particular type has recently been modified. Prior to this modification, historical
data had suggested that the distribution of hole diameters for bushings on the
housings was normal with a standard deviation of . It is believed that the
modification has not affected the shape of the distribution or the standard
deviation, but that the value of the mean diameter may have changed. A sample of
40 housing units is selected and hole diameter is determined for each one,
resulting in a sample mean diameter of 5.426 mm. Calculate a confidence interval
for true average hole diameter using a confidence level of 90%.
Answer: Now 1 − α = 0.90, so Φ(z_{α/2}) = 1 − α/2 = 0.95,
which is the cumulative z-curve area of z_{α/2}. Therefore using the table,
z_{α/2} = 1.645, so the 90% CI is

x̄ ± 1.645 · σ/√40 = 5.426 ± 1.645σ/√40.

103
A Confidence Interval for a Population Proportion:
Let p denote the proportion of “successes” in a population. How can
we approximate p using the proportion of successes in a sample ?
For example, assume 5 of 50 randomly selected computers need a
service, then what proportion of all the computers need a service?

104
Theorem: Let p denote the proportion of “successes” in a population (p is unknown).
A random sample of n individuals is to be selected, and X is the number of successes in
the sample.
(Provided that n is small compared to the population size, X can be regarded as a binomial
rv with E(X) = np and σ_X = √(npq), where q = 1 − p. Furthermore, if both np ≥ 10 and
nq ≥ 10, X has approximately a normal distribution.)
Assume p̂ = X/n is the proportion of “successes” in the sample. So p̂ is also a rv. We have
E(p̂) = p and σ_p̂ = √(pq/n). As we know, since X is approximately normally distributed,
p̂ is also approximately normal. By standardizing it we have

P(−z_{α/2} ≤ (p̂ − p)/√(pq/n) ≤ z_{α/2}) ≈ 1 − α.

Isolating p in the bracket, for large n, we obtain a confidence interval for p.
105
So to summarize,
Assume the population proportion p is unknown. Let p̂ be the sample
proportion in a large sample of size n. Then the confidence interval of
level 100(1 − α)% for p is

(p̂ − z_{α/2}√(p̂q̂/n), p̂ + z_{α/2}√(p̂q̂/n)), where q̂ = 1 − p̂.

Note that the interval can be written as

p̂ ± E,

where E = z_{α/2}√(p̂q̂/n) is called the margin of error.

106
Example 6.3: In a sample of 1000 randomly selected consumers who had opportunities to send in a
rebate claim form after purchasing a product, 250 of these people said they never did so (“Rebates:
Get What You Deserve,” Consumer Reports, May 2009: 7). Reasons cited for their behavior included
too many steps in the process, amount too small, missed deadline, fear of being placed on a mailing
list, lost receipt, and doubts about receiving the money. Calculate an upper confidence bound at the
95% confidence level for the true proportion of such consumers who never apply for a rebate.
Based on this bound, is there compelling evidence that the true proportion of such consumers is
smaller than 1/3? Explain your reasoning.
Answer: p̂ = 250/1000 = 0.25, so q̂ = 1 − p̂ = 0.75.
For an upper confidence bound we need z_α with 1 − α = 0.95: Φ(z_α) = 0.95, so using
the table for the standard normal, z_α = 1.645. The 95% upper confidence bound is
p̂ + z_α√(p̂q̂/n) = 0.25 + 1.645 · √(0.25 × 0.75/1000) ≈ 0.25 + 0.0225 = 0.273.
With 95% confidence, the true proportion is at most 0.273 < 1/3, so there is compelling
evidence that fewer than one-third of such consumers never apply for a rebate.

107
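The upper bound in Example 6.3 is easy to check numerically. A minimal sketch; the function name and the hard-coded z-value are illustrative choices, not from the slides:

```python
import math

# 95% upper confidence bound for a population proportion,
# following the rebate example (n = 1000, x = 250).
def upper_bound(x, n, z_alpha=1.645):  # z_0.05 = 1.645 for a 95% one-sided bound
    p_hat = x / n
    q_hat = 1 - p_hat
    return p_hat + z_alpha * math.sqrt(p_hat * q_hat / n)

ub = upper_bound(250, 1000)
print(round(ub, 4))  # → 0.2725
```

Since 0.2725 < 1/3, the bound supports the conclusion in the example.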
Definition:
1) The width of the CI of a population mean is w = 2z_{α/2} · σ/√n.

2) The width of the CI of a population proportion is w = 2z_{α/2} · √(p̂q̂/n).

Confidence Level and Sample Size:
For a desired width w of the mean’s CI, solving w = 2z_{α/2}σ/√n for n gives
n = (2z_{α/2}σ/w)², rounded up to the next integer.
Example 6.4: Extensive monitoring of a computer time-sharing system has suggested that
response time to a particular editing command is normally distributed with standard deviation 25
millisec. A new operating system has been installed, and we wish to estimate the true average
response time 𝜇 for the new environment. Assuming that response times are still normally
distributed with 𝜎 = 25 , what sample size is necessary to ensure that the resulting 95% CI has a
width of (at most) 10?
Answer: σ = 25, width = 10, n = ?
We need z_{α/2} with 1 − α = 0.95: Φ(z_{α/2}) = 1 − α/2 = 0.975, so z_{α/2} = 1.96.
10 = w = 2E = 2z_{α/2} · σ/√n = 2 × 1.96 × 25/√n, so √n = 2 × 1.96 × 25/10 = 9.8, so
n = 9.8² = 96.04. Since the width must be at most 10, round up: n = 97.
108
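The sample-size arithmetic in Example 6.4 can be scripted; a small sketch assuming the known-σ z-interval (the default z-value is the 95% choice):

```python
import math

# Smallest n so that a known-sigma CI has width at most w.
def sample_size(sigma, width, z_half=1.96):  # z_0.025 = 1.96 for 95% confidence
    return math.ceil((2 * z_half * sigma / width) ** 2)

print(sample_size(25, 10))  # → 97
```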
Interpreting a Confidence Interval:
Assume that a 95% confidence interval for a population mean μ is (79.3, 80.7). In
general, it is “incorrect” to write the statement
P(μ lies in (79.3, 80.7)) = 0.95.
A correct interpretation of “95% confidence” relies on the long-run relative frequency
interpretation of probability: to say that an event A has probability 0.95 is to say that if the
experiment on which A is defined is performed over and over again, in the long run A will
occur 95% of the time. Suppose we obtain another sample of typists’ preferred heights and
compute another 95% interval in Example 6.1. Then we consider repeating this for a third
sample, a fourth sample, a fifth sample, and so on. Let A be the event that

A = { the computed random interval ( X̄ − z_{α/2}σ/√n , X̄ + z_{α/2}σ/√n ) contains μ }.

Since P(A) = 0.95, in the long run 95% of our computed CIs will contain μ.

This is illustrated in the following figure, where the vertical line cuts the measurement axis
at the true (but unknown) value of μ. Notice that 7 of the 100 intervals shown fail to
contain μ; in the long run, only 5% of the intervals so constructed would fail to contain μ.

109
110
7. Tests of Hypothesis:
7.1. Test of Hypothesis based on a single sample:
A parameter can be estimated from sample data either by a single number or by an
entire interval of possible values (a confidence interval). Frequently, however, the
objective of an investigation is not to estimate a parameter but to decide which of
two contradictory claims about the parameter is correct. Methods for making this
comparison are part of the branch of statistical inference called hypothesis testing.

A statistical hypothesis, or just hypothesis, is a claim either about the value of a
single parameter (a population characteristic like μ, σ, or p), or about the values of
several parameters, or about the form of an entire probability distribution.

111
In any hypothesis-testing problem, there are two contradictory
hypotheses under consideration. For example:
1) The first claim is μ = μ₀ and the other is μ ≠ μ₀.
2) The first claim is p = p₀ and the other is p ≠ p₀.

In testing statistical hypotheses, the problem will be formulated so that


one of the claims is initially favored. This initially favored claim will not
be rejected in favor of the alternative claim unless sample evidence
contradicts it and provides strong support for the alternative claim.

112
Definition:
The null hypothesis, denoted by H₀, is
the claim that is initially assumed to be true (the “prior belief” claim).
The alternative hypothesis, denoted by Hₐ, is
the assertion that is contradictory to H₀ (the “researcher’s hypothesis”).

The null hypothesis will be rejected in favor of the alternative hypothesis
only if sample evidence suggests that H₀ is false. If the sample does not
strongly contradict H₀, we will continue to believe in the null hypothesis.
The two possible conclusions from a hypothesis-testing
analysis are then: reject H₀ or fail to reject H₀.
A test of hypotheses is a method for using sample data to decide whether
the null hypothesis should be rejected.
113
To summarize:
• Often we want to decide whether a prior hypothesis H₀ is True or False.
• To do so we gather data, i.e., a sample X₁, …, Xₙ.
• A typical hypothesis is that a random variable has a given mean μ₀.
• Based on the data we want to accept or reject the hypothesis H₀.
Example:
We consider ordering a large shipment of 50 watt light bulbs.
The manufacturer claims that :
• The lifetime of the bulbs has a normal distribution .
• The mean lifetime is μ = 1000 hours.
• The standard deviation is σ = 100 hours.
We want to test the hypothesis that μ = 1000.
We assume that :
• The lifetime of the bulbs has indeed a normal distribution .
• The standard deviation is indeed σ = 100 hours.
• We test the lifetime of a sample of 25 bulbs .

114
Remark: The alternative to the null hypothesis will look like one of
the following three:
1. Hₐ: μ > μ₀
2. Hₐ: μ < μ₀
3. Hₐ: μ ≠ μ₀
Definition: A test procedure is specified by the following:
1. A test statistic based on the parameter (we only cover μ and p) on which the decision
(reject H₀ or do not reject H₀) is to be based.
2. A rejection region, the set of all test statistic values (z values) for which H₀ will be
rejected.

115
Example: Suppose a cigarette manufacturer claims that the average
nicotine content μ of brand B cigarettes is (at most) 1.5 mg. It would be
unwise to reject the manufacturer’s claim without strong contradictory
evidence, so an appropriate problem formulation is to test
H₀: μ = 1.5 versus Hₐ: μ > 1.5.
Consider a decision rule based on analyzing a random sample of 32
cigarettes. Let X̄ denote the sample average nicotine content. If H₀ is true,
E(X̄) = μ = 1.5, whereas if H₀ is false, we expect X̄ to exceed 1.5. Strong
evidence against H₀ is provided by a value x̄ that considerably exceeds 1.5.
Thus we might use X̄ as a test statistic along with
a rejection region of the form x̄ ≥ c for a suitably chosen cutoff c.
116
Definition:
• A type I error consists of rejecting the null hypothesis H₀ when it is true.

• A type II error involves not rejecting H₀ when H₀ is false.

We define α = P(type I error) and β = P(type II error).
Example:
In the nicotine scenario,
A type I error consists of rejecting the manufacturer’s claim that μ = 1.5
when it is actually true (for example, μ = 1.5 but the observed x̄ happens to be
large, so we reject H₀).

A type II error consists of accepting the manufacturer’s claim although it is false (for
example, μ = 1.6 but the observed x̄ happens to be small, so we accept H₀).
117
Example 7.1:

118
Example 7.2:

119
Test of Hypothesis for Mean of a Normal Population:
Let X₁, …, Xₙ represent a random sample of size n from a normal
population. Then the sample mean X̄ has a normal distribution with
expected value μ and standard deviation σ/√n. The following
is a hypothesis test with significance level α:

Null hypothesis H₀: μ = μ₀;  test statistic z = (x̄ − μ₀)/(σ/√n).
Alternative hypothesis       Rejection region
Hₐ: μ > μ₀                   z ≥ z_α        (upper-tailed)
Hₐ: μ < μ₀                   z ≤ −z_α       (lower-tailed)
Hₐ: μ ≠ μ₀                   z ≥ z_{α/2} or z ≤ −z_{α/2}  (two-tailed)

where Φ(z_{α/2}) = 1 − α/2 and Φ(z_α) = 1 − α.
120
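A sketch of the two-tailed known-σ z test, using the light-bulb setup from earlier (μ₀ = 1000, σ = 100, n = 25); the sample mean of 960 is a hypothetical value chosen for illustration:

```python
import math

# Two-tailed z test for H0: mu = mu0 vs Ha: mu != mu0, sigma known.
def z_test(xbar, mu0, sigma, n, z_half=1.96):  # z_0.025 = 1.96 for alpha = 0.05
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return z, abs(z) >= z_half  # True means "reject H0"

z, reject = z_test(960, 1000, 100, 25)
print(round(z, 2), reject)  # → -2.0 True
```

Here |z| = 2.0 ≥ 1.96, so at the 5% level we would reject the claim μ = 1000.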
121
Example:

122
123
Test of Hypothesis for Mean for Large-Sample:
The previous test remains valid even if X₁, …, Xₙ represent a
random sample of size n from a population with any distribution, as
long as n is sufficiently large (n > 40), since X̄ will be approximately
normal by the CLT. Note that in this case we let the test statistic be

z = (x̄ − μ₀)/(s/√n),

since for large n the sample standard deviation s is close to σ (the
population standard deviation).

124
Example: A dynamic cone penetrometer (DCP) is used for measuring
material resistance to penetration (mm/blow) as a cone is driven into
pavement or subgrade. Suppose that for a particular application it is
required that the true average DCP value for a certain type of pavement
be less than 30. The pavement will not be used unless there is conclusive
evidence that the specification has been met (use a 5% significance level
when it is not mentioned). The following data was collected for DCP:
14.1 14.5 15.5 16.0 16.0 16.7 16.9 17.1 17.5 17.8
17.8 18.1 18.2 18.3 18.3 19.0 19.2 19.4 20.0 20.0
20.8 20.8 21.0 21.5 23.5 27.5 27.5 28.0 28.3 30.0
30.0 31.6 31.7 31.7 32.5 33.5 33.9 35.0 35.0 35.0
36.7 40.0 40.0 41.3 41.7 47.5 50.0 51.0 51.8 54.4
55.0 57.0
Should we use this pavement? 125
Answer: Since the number of sample elements n = 52 > 40 is large, by the CLT X̄ is
approximately normal with mean μ and standard deviation s/√n.
1. μ = true average DCP value
2. H₀: μ = 30
3. Hₐ: μ < 30 (so the pavement will not be used unless the null hypothesis is rejected)
4. Test statistic: z = (x̄ − 30)/(s/√n)
5. Since α = 0.05 and the test is lower-tailed, the rejection region is z ≤ −z_{0.05}.
Using the table, z_{0.05} = 1.645, so we reject H₀ if z ≤ −1.645.
6. From the data, x̄ = 28.76 and s ≈ 12.26, so
z = (28.76 − 30)/(12.26/√52) ≈ −0.73.
7. Since −0.73 > −1.645, H₀ cannot be rejected. So we should not use the
pavement. 126
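The numbers in steps 6 and 7 can be reproduced directly from the data table; a sketch using the sample standard deviation with divisor n − 1:

```python
import math

data = [14.1, 14.5, 15.5, 16.0, 16.0, 16.7, 16.9, 17.1, 17.5, 17.8,
        17.8, 18.1, 18.2, 18.3, 18.3, 19.0, 19.2, 19.4, 20.0, 20.0,
        20.8, 20.8, 21.0, 21.5, 23.5, 27.5, 27.5, 28.0, 28.3, 30.0,
        30.0, 31.6, 31.7, 31.7, 32.5, 33.5, 33.9, 35.0, 35.0, 35.0,
        36.7, 40.0, 40.0, 41.3, 41.7, 47.5, 50.0, 51.0, 51.8, 54.4,
        55.0, 57.0]
n = len(data)
xbar = sum(data) / n                                        # sample mean
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))  # sample std dev
z = (xbar - 30) / (s / math.sqrt(n))                         # test statistic
print(round(xbar, 2), round(z, 2))  # → 28.76 -0.73
```

Since z ≈ −0.73 is not in the rejection region z ≤ −1.645, H₀ is not rejected.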
Test of Hypothesis of proportion for Large-Sample:
Reminder: Let p denote the proportion of individuals or objects in a
population who have a specified property (e.g., cars with manual
transmissions or smokers who smoke a filter cigarette). If an individual or
object with the property is labeled a success (S), then p is the population
proportion of successes. Tests concerning p will be based on a random
sample of size n from the population. Provided that n is small relative to the
population size, X (the number of S’s in the sample) has a
binomial distribution. Furthermore, if n itself is large [np₀ ≥ 10 and
n(1 − p₀) ≥ 10], both X and p̂ = X/n are approximately normally distributed.
We know that E(p̂) = p and σ_p̂ = √(p(1 − p)/n). Note that p̂ = X/n is the
sample proportion. 127
The test for significance level α:

Null hypothesis H₀: p = p₀;  test statistic z = (p̂ − p₀)/√(p₀q₀/n), where q₀ = 1 − p₀.
Alternative hypothesis       Rejection region
Hₐ: p > p₀                   z ≥ z_α
Hₐ: p < p₀                   z ≤ −z_α
Hₐ: p ≠ p₀                   z ≥ z_{α/2} or z ≤ −z_{α/2}

where Φ(z_{α/2}) = 1 − α/2 and Φ(z_α) = 1 − α.
128
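A sketch of this test; the numbers below (n = 200 trials, 124 successes, p₀ = 0.5) are hypothetical, chosen only to illustrate an upper-tailed test at α = 0.05:

```python
import math

# Large-sample z test for H0: p = p0 vs Ha: p > p0.
def prop_z_test(x, n, p0, z_alpha=1.645):  # z_0.05 = 1.645 for alpha = 0.05
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, z >= z_alpha  # True means "reject H0"

z, reject = prop_z_test(124, 200, 0.5)
print(round(z, 2), reject)  # → 3.39 True
```

Note that the standard error in the denominator uses p₀, not p̂, because the statistic is computed under the assumption that H₀ is true.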
Example:

129
130
7.2. Test of Hypothesis based on two samples:
The inferences discussed in this section concern a difference between the
means of two different population distributions. An investigator might, for
example, wish to test hypotheses about the difference between true average
breaking strengths of two different types of corrugated fiberboard. One such
hypothesis would state that μ₁ − μ₂ = 0, that is, that μ₁ = μ₂. Alternatively, it may
be appropriate to estimate μ₁ − μ₂ by computing a 95% CI.
In this section we consider the following assumptions:

Basic assumptions:
1. X₁, …, X_m is a random sample from a distribution with mean μ₁ and variance σ₁².
2. Y₁, …, Y_n is a random sample from a distribution with mean μ₂ and variance σ₂².
3. The samples are independent of one another.
131
In this section we want to study the difference X̄ − Ȳ between the two sample means.

Theorem: With the previous assumptions, the expected value of X̄ − Ȳ
is μ₁ − μ₂. The standard deviation of X̄ − Ȳ is

σ_{X̄−Ȳ} = √(σ₁²/m + σ₂²/n).

Proof:
E(X̄ − Ȳ) = E(X̄) − E(Ȳ) = μ₁ − μ₂. Since X̄ and Ȳ are independent,
V(X̄ − Ȳ) = V(X̄) + V(Ȳ) = σ₁²/m + σ₂²/n,
and taking the square root gives the standard deviation.

132
Hypothesis Test for Difference of Two Means in a Normal Population:
Assume both population distributions are normal, σ₁ and σ₂ are known, and
1. X₁, …, X_m is a random sample from a distribution with mean μ₁ and variance σ₁².
2. Y₁, …, Y_n is a random sample from a distribution with mean μ₂ and variance σ₂².
3. The samples are independent of one another.
The test of hypothesis for μ₁ − μ₂ with significance level α is (note that Z is standard
normal):

Null hypothesis H₀: μ₁ − μ₂ = Δ₀;  test statistic z = (x̄ − ȳ − Δ₀)/√(σ₁²/m + σ₂²/n).
Alternative hypothesis       Rejection region
Hₐ: μ₁ − μ₂ > Δ₀             z ≥ z_α
Hₐ: μ₁ − μ₂ < Δ₀             z ≤ −z_α
Hₐ: μ₁ − μ₂ ≠ Δ₀             z ≥ z_{α/2} or z ≤ −z_{α/2}
133
Example:

134
Answer:

135
Hypothesis Test for Difference of Two Means in Large Samples:
The assumptions of normal population distributions and known values of σ₁ and σ₂ are
fortunately unnecessary when both sample sizes are sufficiently large (m > 40 and n > 40).
In this case, the Central Limit Theorem guarantees that X̄ − Ȳ has approximately a
normal distribution regardless of the underlying population distributions. Then we have:

Null hypothesis H₀: μ₁ − μ₂ = Δ₀;  test statistic z = (x̄ − ȳ − Δ₀)/√(s₁²/m + s₂²/n).
Alternative hypothesis       Rejection region
Hₐ: μ₁ − μ₂ > Δ₀             z ≥ z_α
Hₐ: μ₁ − μ₂ < Δ₀             z ≤ −z_α
Hₐ: μ₁ − μ₂ ≠ Δ₀             z ≥ z_{α/2} or z ≤ −z_{α/2}
136
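A sketch of the large-sample two-sample test; all numbers below (m = 45, n = 50, the means and standard deviations) are hypothetical, set up as an upper-tailed test of H₀: μ₁ − μ₂ = 0:

```python
import math

# Large-sample two-sample z test statistic for H0: mu1 - mu2 = delta0.
def two_sample_z(xbar, s1, m, ybar, s2, n, delta0=0.0):
    return (xbar - ybar - delta0) / math.sqrt(s1**2 / m + s2**2 / n)

z = two_sample_z(20.0, 3.0, 45, 18.5, 4.0, 50)
print(round(z, 2), z >= 1.645)  # → 2.08 True
```

With z ≈ 2.08 ≥ z_{0.05} = 1.645, this hypothetical data would lead to rejecting H₀ at the 5% level.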
Example:

137
138
8. Chi-Square Test for Goodness-of-Fit:
In this section, each observation in a sample is classified as belonging to one
of a finite number of categories (e.g., blood type could be one of the four
categories O, A, B, or AB). Let p_i denote the probability that any particular
observation belongs in category i (or the proportion of the population
belonging to category i). We then wish to test a null hypothesis that
completely specifies the values of all the p_i
(such as H₀: p₁ = 0.45, p₂ = 0.35, p₃ = 0.15, p₄ = 0.05 when there are four
categories).
The test statistic is based on how different the observed numbers in the
categories are from the corresponding expected numbers when H₀ is true.

139
A binomial experiment consists of a sequence of n independent trials in which each trial
can result in one of two possible outcomes: S (for success) and F (for failure).
Considering p = P(S) and 1 − p = P(F), we presented a large-sample z test for
testing H₀: p = p₀ (note that this can be written as H₀: p₁ = p₁₀, p₂ = p₂₀, where
p₁ = p, p₂ = 1 − p, p₁₀ = p₀, and p₂₀ = 1 − p₀).

Definition: A multinomial experiment generalizes a binomial experiment by allowing
each trial to result in one of k possible outcomes, where k > 2. Indeed, the multinomial
experiment has the following variables (note that the experiment is multinomial if n is
small compared to the population):
n = number of elements in a sample selected from a population
k = number of possible outcomes (number of categories)
p_i = the probability that a trial results in the ith category in the population (the
population proportion in the ith category)
140
Example: Suppose a store accepts three different types of credit cards. A
multinomial experiment would result from observing the type of credit card used (type
1, type 2, or type 3) by each of the next n = 100 customers who pay with a credit card. If
35 use type 1 and 54 use type 2 and the rest type 3, then n₁ = 35, n₂ = 54, n₃ = 11.
Note that we do not know p₁, p₂, p₃.

Note that if H₀: p₁ = p₂ = p₃ = 1/3, then H₀ is not true if at least one
of the p_i’s has a value different from that asserted by H₀ (in which case at least two
must be different, since they sum to 1).

Remark: Before the multinomial experiment is performed, the number of trials that will
result in category i (1 ≤ i ≤ k) is a random variable N_i, with observed value n_i such
that n₁ + n₂ + ⋯ + n_k = n.
In a multinomial experiment with k categories, each with probability p_i, and n elements
in the sample,
E(N_i) = np_i  for i = 1, …, k.
141
Assume that H₀: p₁ = p₁₀, …, p_k = p_k₀; then, if H₀ is true,
the expected cell counts are E(N_i) = np_i₀ for i = 1, …, k.
We usually consider a table where the expected values when H₀ is true
are displayed just below the observed values. The N_i’s and n_i’s are
usually referred to as observed cell counts, and np₁₀, np₂₀, …, np_k₀ are the
corresponding expected cell counts under H₀.

142
Chi-Squared Test:
Null hypothesis H₀: p₁ = p₁₀, …, p_k = p_k₀
Alternative hypothesis Hₐ: at least one p_i ≠ p_i₀
Test statistic: χ² = Σ_{i=1}^{k} (n_i − np_i₀)²/(np_i₀)
Rejection region: χ² ≥ χ²_{α, k−1}, which can be found using a table; k − 1 is the
degrees of freedom.


143
Motivation for this test: The n_i’s should all be reasonably close to the
corresponding np_i₀’s when H₀ is true. The test procedure involves
assessing the difference between the n_i’s and the np_i₀’s, with H₀ being
rejected when the difference is sufficiently large. It is natural to consider
Σ(n_i − np_i₀)². However, suppose np₁₀ = 100 and np₂₀ = 10. Then if n₁ = 95
and n₂ = 5, the two categories contribute the same squared
deviation (25) to the proposed measure. Yet n₁ is only 5% less than what
would be expected when H₀ is true, whereas n₂ is 50% less.
So we divide each (n_i − np_i₀)² by its expected count np_i₀, giving

χ² = Σ_{i=1}^{k} (n_i − np_i₀)²/(np_i₀).

Also, since n₁ + ⋯ + n_k = n, one of the n_i can be written in terms of the others,
so the degrees of freedom is k − 1. Note that α is the significance level
(upper-tail area of the chi-squared distribution with k − 1 degrees of freedom).
144
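Applying the statistic to the credit-card counts (35, 54, 11 out of n = 100) under H₀: p₁ = p₂ = p₃ = 1/3, a minimal sketch:

```python
# Chi-squared goodness-of-fit statistic for observed counts vs H0 proportions.
def chi_sq(observed, p0):
    n = sum(observed)
    return sum((ni - n * pi) ** 2 / (n * pi) for ni, pi in zip(observed, p0))

stat = chi_sq([35, 54, 11], [1/3, 1/3, 1/3])
print(round(stat, 2), stat >= 5.99)  # → 27.86 True; chi^2_{0.05, 2} ≈ 5.99
```

Since 27.86 far exceeds the critical value with k − 1 = 2 degrees of freedom, H₀ would be rejected: the three card types are clearly not equally likely.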
Example:

145
Solution:

146
147
9. Linear Regression and Correlation:
9.1. Correlation:
There are many situations in which the objective in studying the joint behavior of
two variables is to see whether they are related. In this section, we first develop
the sample correlation coefficient r as a measure of how strongly related two
variables x and y are in a sample.

Definition: Let X and Y be two random variables. We define the covariance of X
and Y by

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)].

We can show that Cov(X, Y) = E(XY) − μ_X μ_Y.

Remark: Note that the population correlation coefficient is ρ = Cov(X, Y)/(σ_X σ_Y).

148
Theorem: Given n numerical pairs (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), we have

S_xy = Σ(x_i − x̄)(y_i − ȳ) = Σ x_i y_i − (Σ x_i)(Σ y_i)/n.

Definition: Given n numerical pairs (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ),
we define the sample correlation coefficient as

r = S_xy / (√S_xx · √S_yy),

where S_xx = Σ(x_i − x̄)² and S_yy = Σ(y_i − ȳ)².
The coefficient of determination is defined as r².

149
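Computing r and r² from the computational formula above, on five hypothetical (x, y) pairs chosen so the sums work out cleanly:

```python
import math

# Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy),
# using the shortcut formulas Sxy = sum(xy) - sum(x)sum(y)/n, etc.
def correlation(xs, ys):
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return sxy / math.sqrt(sxx * syy)

r = correlation([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r, 2), round(r**2, 2))  # → 0.8 0.64
```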
150
151
Example:

152
9.2. Regression Line:
Much of mathematics is devoted to studying variables that are
deterministically related. Saying that x and y are related in this manner
means that once we are told the value of x, the value of y is completely
specified.
There are many variables x and y that would appear to be related to
one another, but not in a deterministic fashion. A familiar example is
given by variables
x=high school grade point average (GPA) and y= college GPA
The value of y cannot be determined just from knowledge of x, and two
different individuals could have the same x value but have very different
y values.
153
Regression analysis is the part of statistics that investigates the
relationship between two or more variables related in a nondeterministic
fashion. In this section, we generalize the deterministic linear relation to
a linear probabilistic relationship.
The simplest deterministic mathematical relationship between two
variables x and y is a linear relationship y = b₀ + b₁x.
x is called the independent, predictor, or explanatory variable.
y is called the dependent or response variable.

154
Consider a sample of data consisting of n observed pairs (x₁, y₁), …, (xₙ, yₙ).
We want to find an estimated regression line ŷ = b₀ + b₁x that best
approximates our sample data.
Such a line should have the property that

f(b₀, b₁) = Σ_{i=1}^{n} [y_i − (b₀ + b₁x_i)]²

is minimum (this is the sum of the squared errors).

So we should find the absolute minimum of f.
That is, the least squares estimates b₀ and b₁ are such that

f(b₀, b₁) ≤ f(β₀, β₁)

for any β₀ and β₁.

155
Using multivariable calculus and the least squares criterion, we obtain the
following:
Theorem: Consider a sample of data consisting of n observed pairs (x₁, y₁), …, (xₙ, yₙ).
Then the estimated regression line is ŷ = b₀ + b₁x, where

b₁ = S_xy / S_xx   and   b₀ = ȳ − b₁x̄.

156
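A least-squares fit following the theorem, on five hypothetical (x, y) pairs chosen so the sums work out cleanly:

```python
# Least squares estimates: b1 = Sxy/Sxx, b0 = ybar - b1*xbar.
def fit_line(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(b0, 2), round(b1, 2))  # → 0.6 0.8, i.e., y-hat = 0.6 + 0.8x
```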
Example:

Answer:
So we estimate that the expected change in true average cetane number
associated with a 1 g increase in iodine value equals the slope b₁, which is
negative here, i.e., a decrease of |b₁|.

157
158
