Statistics and Probability Finals Reviewer


LESSON 1: ESTIMATION OF THE MEAN

Recall that $z_{\alpha}$ is the point on the horizontal axis of the standard normal curve for which the area to its right is α, the significance level. If α represents the probability of NOT capturing the population mean μ, then 1 – α is the level of confidence that the interval contains the true population mean.

Significance Level α (1 – α)100% Confidence Level

0.01 (1 – 0.01)100% = 99%

0.05 (1 – 0.05)100% = 95%

0.10 (1 – 0.10)100% = 90%

We make use of $z_{\alpha/2}$ instead of $z_{\alpha}$. This means that $z_{\alpha/2}$ is the z-score that places an area of $\alpha/2$ in each tail of the normal curve.

For example, for a 95% confidence level, we have:

For 99% confidence level, we have:

FINALS mike jeffer cristobal domingo


These are the values of confidence levels that are commonly used.

Significance Level α    (1 – α)100% Confidence Level    α/2               $z_{\alpha/2}$
0.10                    (1 – 0.10)100% = 90%            0.05 or 5%        $z_{0.05} = 1.65$
0.05                    (1 – 0.05)100% = 95%            0.025 or 2.5%     $z_{0.025} = 1.96$
0.01                    (1 – 0.01)100% = 99%            0.005 or 0.5%     $z_{0.005} = 2.58$
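
The critical values in the table can be cross-checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the function name `z_critical` is only illustrative and is not part of the reviewer. The computed values (about 1.645, 1.960, and 2.576) round to the table's 1.65, 1.96, and 2.58.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Return z_{alpha/2} for a two-sided confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # inv_cdf returns the z-score with the given area to its LEFT, so an
    # area of alpha/2 in the right tail means 1 - alpha/2 on the left.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```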

Take a look at the table below. The same sample from the population of a particular study is used for each confidence level. The following confidence intervals are obtained.

Confidence Level    Confidence Interval    Length of Confidence Interval
90%                 20.8 < μ < 34.3        13.5
95%                 18.5 < μ < 35.7        17.2
99%                 15.8 < μ < 38.3        22.5

• This means that, at a 90% confidence level, the population mean is estimated to lie between 20.8 and 34.3 based on this sample. Likewise, at a 95% confidence level, it is estimated to lie between 18.5 and 35.7, and at a 99% confidence level, between 15.8 and 38.3. Notice that the interval becomes longer as the confidence level increases.
• The idea behind the interval estimate is that for large random samples (n ≥ 30) drawn from an infinite population, the sampling distribution of the mean is approximately a normal distribution, where $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
• Hence, with probability 1 – α, the mean of a large random sample will differ from the mean of an infinite population by at most $z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$.
• We now recall that the z-score of a sample mean is obtained by the formula
$$z_{\alpha/2} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
• By using $\bar{x}$ as an estimate of μ, the probability 1 – α is the probability that this estimate is “off” either way by at most E, where the largest value of $|\bar{x} - \mu|$ is the maximum error of estimate E.

Hence, in
$$z_{\alpha/2} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
we have
$$z_{\alpha/2} = \frac{E}{\sigma / \sqrt{n}}$$
By the multiplication property of equality, solving for E gives

Margin of error $E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
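
The margin-of-error formula is a one-line computation. The sketch below is illustrative only: the function name `margin_of_error` is an assumption, and the inputs are the values used in Examples 1 and 2 of this lesson, with the critical values taken from the table of commonly used confidence levels.

```python
import math


def margin_of_error(z: float, sigma: float, n: int) -> float:
    """E = z_{alpha/2} * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)


# Example 1: n = 200, sigma = 7.8, 95% confidence (z = 1.96)
e_milk = margin_of_error(1.96, 7.8, 200)
# Example 2: n = 40, s = 1.045 used in place of sigma, 99% confidence (z = 2.58)
e_hours = margin_of_error(2.58, 1.045, 40)

print(f"Example 1: E is about {e_milk:.2f} milk cartons")
print(f"Example 2: E is about {e_hours:.4f} hours")
```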


Example:
1. A group of statisticians wants to use the mean of a random sample of size n = 200 to estimate the average sales of milk cartons of a salesman in a month. Based on previous computations, σ = 7.8. What can they declare with 0.95 or 95% confidence about the maximum error of their estimate?
Solution:
Margin of error $E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
n = 200
σ = 7.8
confidence level = 95%
$z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$
• Using these values in E, we have:
$$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{7.8}{\sqrt{200}} \approx 1.08$$
• This means that the statisticians can declare with 95% confidence that their error will be at most 1.08 milk cartons.

2. Using the data on the number of hours of free time of grade 11 students presented at the start of the section, where the sample mean is 2.285, what can be said with 99% confidence about the maximum error of the sample mean, with a standard deviation of 1.045, as an estimate of the number of hours of free time of grade 11 students between classes?
Solution:
Margin of error $E = z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$ if n ≥ 30.
n = 40
s = 1.045
confidence level = 99%
$z_{\alpha/2} = z_{0.01/2} = z_{0.005} = 2.58$

Solving for E, we have
$$E = z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = 2.58 \cdot \frac{1.045}{\sqrt{40}} \approx 0.4263 \text{ hours}$$

This is another way of presenting a sample mean along with its error of estimate from the population mean. For large random samples from infinite populations, the sampling distribution of the mean is taken to be normal. This means that $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$; hence
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
gives a value of a random variable that follows the standard normal distribution. Since 1 – α is the probability that this random variable will take a value between $-z_{\alpha/2}$ and $z_{\alpha/2}$, we have
$$-z_{\alpha/2} < z < z_{\alpha/2}$$
Moreover,
$$-z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < z_{\alpha/2}$$
Isolating μ by using the addition and multiplication properties of inequality, we have the following:
$$\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Confidence Interval or z-interval

3. The principal of a large senior high school wants to estimate the average age of grade 11 students enrolled during summer. It is known from previous studies that the standard deviation is 2 years. A sample of 50 students shows the mean to be 17.4 years. Find the 95% confidence interval of the population mean.
Solution:
confidence level = 95%
$z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$
Sample mean $\bar{x}$ = 17.4
Population standard deviation σ = 2
Sample size n = 50
• Using the z-interval,
$$\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
$$17.4 - 1.96 \cdot \frac{2}{\sqrt{50}} < \mu < 17.4 + 1.96 \cdot \frac{2}{\sqrt{50}}$$
$$16.85 < \mu < 17.95$$
• Hence, with 95% confidence, the average age of the Grade 11 students enrolled in summer is 17.4 ± 0.55 years.
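
As a cross-check of the z-interval, the sketch below recomputes Example 3. The function name `z_interval` is hypothetical; the inputs (sample mean 17.4, σ = 2, n = 50, z = 1.96) come from the example.

```python
import math


def z_interval(mean: float, sigma: float, n: int, z: float) -> tuple[float, float]:
    """Return (lower, upper) = mean -/+ z * sigma / sqrt(n)."""
    e = z * sigma / math.sqrt(n)  # margin of error E
    return mean - e, mean + e


low, high = z_interval(17.4, 2, 50, 1.96)
print(f"{low:.2f} < mu < {high:.2f}")
```

The printed bounds agree with the interval 16.85 < μ < 17.95 obtained by hand.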

Determining the Minimum Sample Size for Finding a Confidence Interval for the Mean



• Determining the sample size can also be treated as statistical estimation. In any study, a researcher should ask the questions: How large a sample is required to form a reliable estimate? How close to the true mean should the estimate be? How confident do you want to be?
• To address those concerns, we can resort to the following formula for E.
$$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
• As we want to find the sample size n, we must isolate it on one side.
• Using algebraic manipulation, we have
$$E\sqrt{n} = z_{\alpha/2}\,\sigma$$
$$\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{E}$$
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
• So, this is the formula for the minimum sample size needed to find an interval estimate of the population mean. When the result is not a whole number, round it up to the next whole number.
Example: A researcher of a cardboard manufacturing company would like to know the estimated thickness of the cardboard the machine produces. How many cardboards should he measure if he wants to be 99% confident that the estimate is accurate to within 1 mm? A study shows that the standard deviation is 3 mm.
Solution:
A 99% confidence level indicates
α = 1 − 0.99
α = 0.01
α/2 = 0.005
$z_{0.005} = 2.58$

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{2.58(3)}{1}\right)^2 = (7.74)^2 = 59.91$$
n ≈ 60 (rounded up)

Hence, to be 99% confident that the estimate is within 1 mm of the true mean thickness of the cardboard, the researcher needs a sample size of 60 cardboards.
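
The sample-size formula, including the rounding-up step, can be sketched as follows. The function name `min_sample_size` is an illustrative assumption; the inputs are those of the cardboard example.

```python
import math


def min_sample_size(z: float, sigma: float, e: float) -> int:
    """n = (z * sigma / E)^2, rounded UP to the next whole number."""
    return math.ceil((z * sigma / e) ** 2)


# Cardboard example: 99% confidence (z = 2.58), sigma = 3 mm, E = 1 mm
print(min_sample_size(2.58, 3, 1))  # 60
```

Rounding up rather than to the nearest integer is the safer choice, since a smaller n would give a margin of error slightly larger than the one requested.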

The Student’s T-Distribution


• We have learned that the normal distribution, also known as the bell-shaped distribution, can be expressed in terms of its mean μ and standard deviation σ. In symbols, we have the following formula:
• $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

• This is what we have been using to construct a confidence interval. The formula assumes that the population standard deviation σ is known. However, in practice, this is seldom true. We do not often know the population standard deviation σ, so we cannot use its value in the formula. Instead, we use the sample standard deviation s to estimate it.
• When this happens, something fundamental changes: the sample standard deviation is a statistic, which means it varies from sample to sample. Hence, the resulting ratio no longer follows the standard normal distribution. It now follows a different distribution called the Student's t-distribution.
• $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
• with n – 1 degrees of freedom

• This distribution was developed and published by a statistician, W.S. Gosset, under the pen name "Student." This t-distribution is usually used when the sample size is small (n < 30).
• Recall that degrees of freedom (df) is a value that represents the number of data values that are free to vary. This means that for every sample size n, there is a corresponding t-distribution with n – 1 degrees of freedom.

Example: Find the t-value for a 90% confidence interval with n = 10.
• This means that the two shaded tail areas have a sum of 10% or 0.1. We know that these 2 shaded areas are symmetrical about the mean and thus are equal in area; hence,
• α = 0.1
• α/2 = 0.05
• This means that the area in each of the shaded regions is 0.05 or 5%. Thus, we want to find $t_{0.05}$ with 10 – 1 = 9 degrees of freedom. This is conventionally written as:
• $t_{0.05}(9)$
• We locate the t-value by finding the intersection of the row df = 9 and the column for an area of 0.10 in two tails (0.05 in one tail), which gives $t_{0.05}(9) = 1.833$.

Characteristics of the T-DISTRIBUTION



Example: Find the t-value of the region whose area is 0.025 with 12 degrees of freedom.
Solution:
• This means that we want to find t0.025(12).
• Since we are not locating a confidence level, we will consider the area in one tail.

• Therefore, $t_{0.025}(12) = 2.179$.

Example: Find the probability that the t-value is between -2.353 and 4.541 with 3 df.
Solution:
Using the shaded regions of the two distributions, P(t < 4.541) and P(t < -2.353), we have the following mathematical statement.
P(-2.353 < t < 4.541)
= P(t < 4.541) – P(t < -2.353)
= P(t < 4.541) – P(t > 2.353), by symmetry

• From the t-table, $t_{0.01}(3) = 4.541$. However, the value 0.01 is the α or the area to the right of t.
• So, the needed region is 1 – 0.01 = 0.99. Thus, P(t < 4.541) = 0.99 or 99%.
• Similarly, $t_{0.05}(3) = 2.353$, so P(t > 2.353) = 0.05.

P(-2.353 < t < 4.541) = 0.99 – 0.05
= 0.94 or 94% probability
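
The t-table lookups in the three examples above can be checked without a table. The sketch below is one possible approach, not the reviewer's method: it integrates the t density numerically with Simpson's rule and then searches for the critical value by bisection, using only the Python standard library. The function names `t_pdf`, `t_cdf`, and `t_critical`, the step count, and the search range 0 to 100 are all assumptions made for illustration.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 1000) -> float:
    """P(T < t): Simpson's rule on [0, |t|] plus symmetry about 0 (steps even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def t_critical(right_tail: float, df: int) -> float:
    """Bisection for the t-value with area `right_tail` (< 0.5) to its right."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > right_tail:
            lo = mid  # tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2


print(f"t_0.05(9)   is about {t_critical(0.05, 9):.3f}")
print(f"t_0.025(12) is about {t_critical(0.025, 12):.3f}")
print(f"P(-2.353 < t < 4.541), 3 df: {t_cdf(4.541, 3) - t_cdf(-2.353, 3):.2f}")
```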

Procedure in Finding the Confidence Coefficient for t using the t-Table


1. Use the column for the appropriate confidence level.
2. Use the row for the appropriate degrees of freedom.
3. The intersection of the appropriate column and row is the confidence coefficient.

When to use z-test or t-test?

LESSON 2: ESTIMATION OF STANDARD DEVIATIONS AND PROPORTIONS


• Aside from the maximum error when estimating the mean of a population, confidence in estimating the
standard deviation of a population is also important. For example, in making medicines, the standard
deviation and variance must be kept in check so that the patients receive the correct dosage in the course
of their treatment.



• We now introduce the distribution needed to calculate these confidence intervals. It is called the Chi-square distribution.
• Chi-Square Statistic $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
• The chi-square statistic $\chi^2$ (lowercase Greek letter chi, pronounced as “kigh square") is a value of a random variable having approximately the chi-square distribution.
• The chi-square distribution is similar to the t-distribution in that it makes use of degrees of freedom. Hence it is also a family of curves.

• This is an example of the chi-square distribution. It is not symmetric, unlike normal and student’s t-
distributions. As the number of degrees of freedom increases, the distribution becomes more symmetric.
Its domain consists only of nonnegative real numbers.

$\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are the critical values, which depend on the degrees of freedom (df) and are obtained from a Chi-Square table. The area between them is 1 – α.
• $\chi^2_{\alpha/2}$ is the critical value to the right, while $\chi^2_{1-\alpha/2}$ is the critical value to the left.

Example: Find the values of $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ for a 95% confidence level when n = 30.

df = n – 1
= 30 – 1 = 29

To find $\chi^2_{\alpha/2}$,
α = 1 – 0.95 = 0.05
α/2 = 0.05/2 = 0.025
$\chi^2_{\alpha/2} = 45.722$

To find $\chi^2_{1-\alpha/2}$,
1 – α/2 = 1 – 0.025 = 0.975
$\chi^2_{1-\alpha/2} = 16.047$

The probability that a random variable having a chi-square distribution will take on a value between $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ is 1 – α:
$$\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}$$
Substituting the expression for $\chi^2$ gives
$$\chi^2_{1-\alpha/2} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{\alpha/2}$$
and solving for $\sigma^2$ gives
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
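
The chi-square table values used in this lesson can also be approximated numerically. The following is only a sketch under stated assumptions: it integrates the chi-square density with Simpson's rule and finds the critical value by bisection on the range 0 to 200, which is adequate for the degrees of freedom used here (df of at least 3 is assumed so that the density is zero at x = 0). The function names are hypothetical.

```python
import math


def chi2_pdf(x: float, df: int) -> float:
    """Density of the chi-square distribution with df degrees of freedom."""
    coef = 1.0 / (2 ** (df / 2) * math.gamma(df / 2))
    return coef * x ** (df / 2 - 1) * math.exp(-x / 2)


def chi2_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(chi-square < x) by Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return total * h / 3


def chi2_critical(right_tail: float, df: int) -> float:
    """Bisection for the value with area `right_tail` to its right."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, df) > right_tail:
            lo = mid  # right-tail area still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2


# 95% confidence level with n = 30, so df = 29
print(f"right critical value: {chi2_critical(0.025, 29):.3f}")
print(f"left critical value:  {chi2_critical(0.975, 29):.3f}")
```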


Example: Construct a 99% confidence interval for σ, the true standard deviation of the number of heartbeats per minute of a fetus, given a sample standard deviation of 5 from a sample of 12 mothers.
STEP 1
df = n – 1
= 12 – 1 = 11
A 99% confidence interval would mean
α = 1 – 0.99 = 0.01

To find $\chi^2_{\alpha/2}$,
α/2 = 0.01/2 = 0.005
$\chi^2_{\alpha/2} = 26.757$

To find $\chi^2_{1-\alpha/2}$,
1 – 0.005 = 0.995
$\chi^2_{1-\alpha/2} = 2.603$

STEP 2
• To obtain our confidence interval for $\sigma^2$, we use the values obtained from the table in Step 1 in the formula
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
• We have s = 5, n = 12, $\chi^2_{\alpha/2} = 26.757$, $\chi^2_{1-\alpha/2} = 2.603$
• $\frac{(11)5^2}{26.757} < \sigma^2 < \frac{(11)5^2}{2.603}$
• $10.28 < \sigma^2 < 105.65$
• $3.21 < \sigma < 10.28$

This means that we are 99% confident that the actual population standard deviation of the heartbeats of a fetus is between 3.21 and 10.28 beats per minute.
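
The two steps of this example can be sketched in code. The function name `sigma_interval` is an assumption; the critical values 26.757 and 2.603 are the chi-square table values for 11 degrees of freedom found in Step 1 and are passed in rather than computed.

```python
import math


def sigma_interval(s: float, n: int, chi2_right: float, chi2_left: float) -> tuple[float, float]:
    """Confidence interval for sigma from the chi-square critical values.

    chi2_right is the critical value with area alpha/2 to its right;
    chi2_left is the critical value with area 1 - alpha/2 to its right.
    """
    df = n - 1
    var_low = df * s ** 2 / chi2_right   # lower bound for sigma^2
    var_high = df * s ** 2 / chi2_left   # upper bound for sigma^2
    return math.sqrt(var_low), math.sqrt(var_high)


low, high = sigma_interval(5, 12, 26.757, 2.603)
print(f"{low:.2f} < sigma < {high:.2f}")
```

Note that the larger critical value produces the lower bound, because it appears in the denominator.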



Population Proportions

We now consider count data. This refers to data acquired by counting rather than measuring.
For example, we want to find the number of persons who are claustrophobic, the number of defective products,
and number of items in good condition.
The information available for the estimation of a population proportion (percentage, or probability) is a sample proportion $\hat{p} = \frac{x}{n}$, where x is the number of times that an event occurred in n trials. A proportion represents a part of a whole. It can be written as a fraction, decimal, or percentage. For instance, the sample proportion of 35 students out of 450 who belong to an organization is 35/450 = 7/90 ≈ 0.078, or approximately 8%.
This means that about 8% of the students belong to an organization. Proportions can also mean probabilities. This means that there is about a 0.08 probability that a student selected at random belongs to an organization.

Proportions can be acquired from samples or populations. We now define the symbols:
p = population proportion
p̂ (read "p hat") = sample proportion

Formulas involving proportions:

$$\hat{p} = \frac{x}{n}$$
$$\hat{q} = \frac{n - x}{n}, \text{ which is also } 1 - \hat{p}$$
where x = number of sample units that have the desired characteristic
n = sample size

Note that $\hat{p}$ and $\hat{q}$ can be in decimals or fractions and have the following property:
$\hat{p} + \hat{q} = 1$
Moreover, they can also be expressed as percentages with the following property:
$\hat{p} + \hat{q} = 100\%$
In terms of the z-distribution, we have the following idea. Say we want to find the population proportion of a certain study with a 97% confidence interval: 97% of all sample proportions are in this interval, and 1.5% of all sample proportions are in each of the two tail areas outside it.

Example 4: A study finds that out of 300 students, 215 use social media sites daily. Identify $\hat{p}$ and $\hat{q}$.
Solution:
Given: n = 300
x = 215
$$\hat{p} = \frac{x}{n} = \frac{215}{300} \approx 0.72$$
This means that about 72% of those surveyed use social media sites daily.
The sample proportion is $\hat{p}$ = 72%.
Among those surveyed, the proportion who do not use social media sites daily is given by
$$\hat{q} = \frac{n - x}{n} = \frac{300 - 215}{300} \approx 0.28 \text{ or } 28\%$$
Similarly,
$\hat{q} = 1 - \hat{p}$ = 1 – 0.72 = 0.28 or 28%

This indicates that 28% of those surveyed do not use social media sites daily.
As with means, an estimate of a proportion based on a single point may not be reliable.
Hence, statisticians also use an interval estimate for a proportion.
We now have:
Margin of Error $E = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$ for a population proportion
For confidence intervals of population proportions, the sample should satisfy np ≥ 5 and nq ≥ 5.
Recall that
$$z = \frac{x - \mu}{\sigma}$$
With
$$\mu = np \text{ and } \sigma = \sqrt{np(1-p)}$$
we have
$$z = \frac{x - np}{\sqrt{np(1-p)}}$$
If we substitute z into the inequality
$$-z_{\alpha/2} < z < z_{\alpha/2}$$
and apply algebraic manipulation, we have
$$\frac{x}{n} - z_{\alpha/2} \cdot \sqrt{\frac{p(1-p)}{n}} < p < \frac{x}{n} + z_{\alpha/2} \cdot \sqrt{\frac{p(1-p)}{n}}$$
But $\hat{p}$ is $\frac{x}{n}$, and using $\hat{p}$ for p under the square root, we therefore have:
Large-sample confidence interval for p.
$$\hat{p} - z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Note that $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ is called the standard error of the proportion.
It is the estimated standard deviation of the sampling distribution of a sample proportion.
To link E and the confidence interval: $\hat{p} - E < p < \hat{p} + E$

Let us now use this in an example.

Example 5: In a random sample of 450 students, 45 are enrolled in summer classes. Estimate the population proportion of students taking classes in summer with a 95% confidence interval.

Solution:
We first identify $\hat{p}$ and $\hat{q}$.
$\hat{p} = \frac{x}{n} = \frac{45}{450}$ = 0.1 or 10%     $\hat{q}$ = 1 – 0.1 = 0.9 or 90%

Note that we want a 95% confidence interval. From the table of commonly used confidence levels, α = 0.05, α/2 = 0.025, and $z_{0.025} = 1.96$.

Substituting the given values and the value obtained from the table into the formula, we have
$$\hat{p} - z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$$0.1 - 1.96 \cdot \sqrt{\frac{0.1(0.9)}{450}} < p < 0.1 + 1.96 \cdot \sqrt{\frac{0.1(0.9)}{450}}$$
0.1 – 1.96(0.0141) < p < 0.1 + 1.96(0.0141)
0.1 – 0.0277 < p < 0.1 + 0.0277
0.0723 < p < 0.1277

Another way of taking our formula is the idea that the confidence interval for p is $\hat{p} \pm E$, where E is the maximum error of estimate:
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} = \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
That is,
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.1 \pm 1.96 \sqrt{\frac{0.1(0.9)}{450}}$$
= 0.1 ± 1.96(0.0141)
= 0.1 ± 0.0277
which gives 0.0723 < p < 0.1277.
The population proportion is between 7.23% and 12.77%.
This means that, with 95% confidence, about 7.23% to 12.77% of the students in the population are enrolled in summer classes.
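
The large-sample interval for a proportion can be sketched as follows, using x = 45 and n = 450 from the example above. The function name `proportion_interval` is an assumption for illustration; the check on the counts mirrors the np ≥ 5 and nq ≥ 5 condition stated earlier, applied to the sample values.

```python
import math


def proportion_interval(x: int, n: int, z: float) -> tuple[float, float]:
    """Large-sample confidence interval p_hat -/+ z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Large-sample condition, checked with the sample proportions.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("sample too small for the large-sample interval")
    e = z * math.sqrt(p_hat * q_hat / n)  # margin of error E
    return p_hat - e, p_hat + e


low, high = proportion_interval(45, 450, 1.96)
print(f"{low:.4f} < p < {high:.4f}")
```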

CHAPTER 5 LESSON 1: THE NULL AND ALTERNATIVE HYPOTHESIS


• A hypothesis test is a statistical tool or procedure that verifies a claim about a population. The two types of claims or educated guesses are the null hypothesis and the alternative hypothesis. In this chapter, you will learn how to make your decisions more accurate and reliable by using hypothesis testing to verify whether something you claim is true or false.

Null Hypothesis (H0)


• A null hypothesis is a claim or proposition about the population that can be disproved, rejected, or nullified, thus the word "null." It is the hypothesis that contains the condition of equality. Technically, it is the hypothesis that states no significant difference between two parameters.

• Consider the following scenario. "According to the ancient Greeks, the Earth was the center of the solar system, based on what is called the geocentric theory. But not all Greeks believed the geocentric claim: Aristarchus of Samos argued that the Sun is the center of the solar system, which is called the heliocentric theory."

• Based on these claims, astronomers such as Galileo Galilei conducted experiments. Galileo formulated the null hypothesis with the objective of voiding or nullifying it.

"H0: The Earth is the center of the solar system."

Alternative Hypothesis (HA)


An alternative hypothesis is a claim that negates the null hypothesis. Technically, it is the hypothesis that states
that there is a significant difference between the two parameters. Based on our previous example, Galileo had
his alternative hypothesis as:
“HA: The Earth is NOT the center of the solar system."

Some examples of null and alternative hypotheses that were done by scientists are the following:
a. H0 : The moon revolves around the earth.
HA : The moon DOES NOT revolve around the earth.
b. H0 : The earth is spherical.
HA : The earth is NOT spherical.
c. H0: The Shroud of Turin is the linen cloth that was used to wrap the body of Christ.
HA: The Shroud of Turin is fraud; it is NOT the linen cloth that was used to wrap the body of Christ.

When you set up hypotheses that involve a parameter, some mathematical symbols will be very useful. These are <, >, =, ≠, ≤, and ≥.

To determine whether a population mean μ is equal to some target value μ0, the hypothesis may be written in
symbols as follows.
H0 : μ = μ0; HA : μ < μ0, (left-tailed test)
H0 : μ = μ0; HA : μ > μ0, (right-tailed test)
H0 : μ = μ0; HA : μ ≠ μ0 (two-tailed test)

To determine whether one population mean μ1, is equal to another population mean μ2, the hypothesis may be
written in symbols as follows.
H0 : μ1 ≥ μ2; HA : μ1 < μ2
H0 : μ1 ≤ μ2; HA : μ1 > μ2
H0 : μ1 = μ2; HA : μ1 ≠ μ2
How to Set Up a Hypothesis Test: Null versus Alternative

When you set up a hypothesis test to nullify or invalidate a statistical claim, you must first know how to define
the null and alternative hypotheses. We know that the claim to be proven not true is the null hypothesis, so it is
necessary to have the alternative hypothesis to have an alternative claim once the null hypothesis is rejected.

Example 1: The claim that more than 43% of all Filipino males become bald at old age is a claim about the
proportion (parameter) of all Filipino males (population) who are bald at old age.
• H0: The proportion of Filipino males who are bald at old age is less than or equal to 43%.
• HA: The proportion of Filipino males who are bald at old age is more than 43%.

Example 2: A sample of 25 receipts from a burger stand has a mean (in pesos) of 5,000 and a standard deviation (in pesos) of 800. Use these values to test whether or not the mean sales of the burger stand are different from P5,000.
• It is assumed that the mean sales is either P5,000 or not P5,000, so the null and alternative hypotheses using the statistical shorthand notation are as follows.
H0 : μ = P5,000 (The mean sales is P5,000.)
HA : μ ≠ P5,000

Example 3:

The average time to make a ready-mix bibingka is five minutes. The statistical shorthand notation for the null
and alternative hypothesis in this case would be as follows:
H0 : μ = 5 (The population mean is 5 minutes.)
HA : μ ≠ 5
All null hypotheses include a condition of equality (=, ≤, or ≥). For the alternative hypothesis, there can be three
possibilities, which can be any of the following.
1. The population parameter is not equal to the claimed value. HA : μ ≠ 5. This is a two-tailed test.
2. The population parameter is greater than the claimed value. HA : μ > 5. This is a right-tailed test.
3. The population parameter is less than the claimed value. HA : μ < 5. This is a left-tailed test.

LESSON 2: THE CENTRAL LIMIT THEOREM


• The Central Limit Theorem states that as the sample size n becomes larger, the sampling distribution of the mean can be approximated closely with a normal distribution. This is true no matter what the shape of the population distribution is.
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
• In the given formula for z, probability problems can be solved with the following known values:
$\bar{x}$ is the mean of the random sample,
μ is the population mean,
σ is the population standard deviation, and
n is the sample size.

Example 1: There are 64 pawikan hatchlings in a marine sanctuary in Batangas which can creep their way to the
sea from the shore at an average speed of 0.025 meter per second with a standard deviation of 0.012 meter per
second. Assuming the variable is normally distributed and 16 pawikan hatchlings are chosen at random, what is
the probability that they have an average speed of
a. less than 0.03 meter per second?
b. greater than 0.02 meter per second?

Solution:
A. We wish to determine the probability that the 16 pawikan hatchlings chosen at random have an average speed of less than 0.03 meter per second. That is, $P(\bar{x} < 0.03)$. The given values are μ = 0.025, σ = 0.012, n = 16, and $\bar{x}$ = 0.03. By the CLT we have,
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{0.03 - 0.025}{0.012 / \sqrt{16}} = 1.67$$

Therefore, $P(\bar{x} < 0.03) = P(z < 1.67)$
= 0.5 + 0.4525
= 0.9525

Thus, there is a 95.25% chance that the 16 pawikan hatchlings chosen at random have an average speed less than 0.03 meter per second.

B. We wish to determine the probability that the average speed is greater than 0.02 meter per second, that is, $P(\bar{x} > 0.02)$. With the given values, we have the z-value as follows:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{0.02 - 0.025}{0.012 / \sqrt{16}} = -1.67$$
Therefore, $P(\bar{x} > 0.02) = P(z > -1.67)$
= 0.5 + 0.4525
= 0.9525
Thus, there is a 95.25% chance that the 16 hatchlings chosen at random have an average speed greater than 0.02 meter per second.

Example 2: In a vegetable show, there are 200 pieces of squash that weigh an average of 6 kilograms with a standard deviation of 4 kilograms. If 4 pieces of squash are chosen at random and assuming normality, what is the probability that they have an average weight between 4 kilograms and 12.5 kilograms?
Solution:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{4 - 6}{4 / \sqrt{4}} = -1$$
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{12.5 - 6}{4 / \sqrt{4}} = 3.25$$
$P(4 < \bar{x} < 12.5) = P(-1 < z < 3.25)$
= 0.3413 + 0.4994
= 0.8407

Thus, there is an 84.07% chance that the 4 pieces of squash chosen at random have an average weight between 4 kg and 12.5 kg.
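
Both CLT examples follow the same pattern: standardize the sample mean, then read an area under the standard normal curve. The sketch below reproduces them with Python's standard-library `statistics.NormalDist` in place of the z-table; the helper name `z_for_mean` is an assumption. Because the code does not round z to two decimals before the lookup, its results differ from the table-based answers in the third or fourth decimal place.

```python
import math
from statistics import NormalDist


def z_for_mean(xbar: float, mu: float, sigma: float, n: int) -> float:
    """z = (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / math.sqrt(n))


std_normal = NormalDist()  # mean 0, standard deviation 1

# Example 1: mu = 0.025, sigma = 0.012, n = 16
p_less = std_normal.cdf(z_for_mean(0.03, 0.025, 0.012, 16))
p_greater = 1 - std_normal.cdf(z_for_mean(0.02, 0.025, 0.012, 16))

# Example 2: mu = 6, sigma = 4, n = 4
p_between = (std_normal.cdf(z_for_mean(12.5, 6, 4, 4))
             - std_normal.cdf(z_for_mean(4, 6, 4, 4)))

print(f"P(xbar < 0.03)     is about {p_less:.3f}")
print(f"P(xbar > 0.02)     is about {p_greater:.3f}")
print(f"P(4 < xbar < 12.5) is about {p_between:.3f}")
```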

LESSON 3: TYPE I AND TYPE II ERRORS

Types of Statistical Errors


What are the possible errors that we can make when we test a hypothesis? An error here does not mean that we did the mathematics wrong or were careless in our calculations. It means that we either rejected the null hypothesis when it is actually true or did not reject the null hypothesis when it is actually false. We can tabulate the possible outcomes or decisions that we can make when we test a hypothesis.

                     H0 is actually
Decision             True              False
Reject H0            Type I error      Correct
Do not reject H0     Correct           Type II error

Because statistics may not accurately reflect the values of parameters, the decision one makes may sometimes not reflect reality. The table above can be restated as the following four possible outcomes.
a. Correct Decision: Rejecting the null hypothesis when in reality it is false.
b. Type I Error: Rejecting the null hypothesis when in reality it is true.
c. Type II Error: Not rejecting the null hypothesis when in reality it is false.
d. Correct Decision: Not rejecting the null hypothesis when in reality it is true.

It may help to think of giving a verdict in a criminal trial. You want to test the following:
H0: The defendant did not commit the crime.
HA: The defendant committed the crime.

• It is common practice that the defendant is presumed innocent, because the court wants to give him/her the benefit of the doubt. This is also why the defendant is called a suspect and is treated as innocent until proven guilty. The same procedure applies when we test a null hypothesis: it will only be rejected when we find sufficient empirical evidence against it.

• In such a case, the following errors may happen: (1) Convicting a person who, in reality, did not commit the crime. In other words, rejecting the null hypothesis when in reality it is true (Type I Error); (2) Acquitting a person who, in reality, committed the crime. In other words, not rejecting the null hypothesis when in fact it is false (Type II Error).

• No one wants to spend his life in prison for a crime he did not commit. That is why, as a society, we have decided to make the probability of committing Type I error very small by using the phrase "beyond reasonable doubt."

Probability of Type I and Type II Errors


• The probability of Type I error, given that H0 is true, is called the significance level α of the test. That is,
P(Type I error | H0 is true) = α.

• As a researcher, one gets to pick the value of α that is thought appropriate to the problem. The commonly used α values are 0.01, 0.05, and 0.10.

• The probability of Type II error is represented by β. The value of β depends on a number of factors such as the choice of α, the sample size, and the true value of the parameter.
• The power of a test is the probability of rejecting H0, given that it is false. That is,
Power = 1 – P(Type II error) = 1 – β.

Power depends on the same factors as β. Since we want to decrease our chance of committing Type I error, why not choose a very small value for α? Because, for a fixed sample size, making α smaller makes β larger, so the test becomes less likely to detect a false H0.

Identify the type of error being described by the following:


Example 1: It has been shown many times that on a certain personality test, age produces better results than
gender. However, the probability value for the data from your sample was 0.15, so you were unable to reject the
null hypothesis that age and gender produce the same results. What type of error did you make?
Solution: Type II error was committed. In this example, there is really a difference in the population between
age and gender, but you did not find a significant difference in your sample. Failing to reject a false null
hypothesis is a Type II error.

Example 2: In the population, there is no difference between boys and girls on a certain test. However, you
found a difference in your sample. The probability for the data was 0.04, so you rejected the null hypothesis.
What type of error did you make?
Solution: Type I error was committed. There is no difference in the population but you found a difference in
your sample. A Type I error occurs when a significance test results in the rejection of a true null hypothesis.

Let us illustrate an example to show how to solve the probabilities of committing Type I and Type II errors.

Example 3: Ryan claims that the class average score for a diagnostic exam in Algebra is less than 60. You gather a sample of 51 test papers and calculate the sample average to be 57, with a population standard deviation of 12. Suppose the teacher, Ms. Coronel, knows that the true average of the exam is 58. Calculate the probabilities of committing Type I and Type II errors when testing Ryan's claim at the significance level of 1%.

Solution:
Setting the H0 and HA we have,
H0 : μ = 60 (the class average score of the diagnostic exam in Algebra is equal to 60.)
HA : μ < 60 (the class average score of the diagnostic exam in Algebra is less than 60.)
This is a left-tailed test because HA is μ < 60, so the rejection region lies to the left of $-z_{\alpha} = -z_{0.01}$.

The other given values are the following:

n = 51 (sample size)
α = 0.01 (significance level)
$\bar{x}$ = 57 (sample average)
σ = 12 (population standard deviation)
μ = 60 (population average, assumed true under H0)

Based on the given values, the probability of Type I error is 0.01, or
α = P(Type I error | H0 is true) = 0.01.
We can now assume that the distribution of the sample means follows a normal distribution, since n = 51 (n ≥ 30), which is sufficiently large according to the Central Limit Theorem.
Recall the following levels of confidence and their corresponding critical values of z:

Confidence Level    Significance Level α    α/2      Critical value z (two-tailed)    Critical value z (one-tailed)
0.90                0.10                    0.05     1.65                             1.28
0.95                0.05                    0.025    1.96                             1.65
0.98                0.02                    0.01     2.33                             2.05
0.99                0.01                    0.005    2.58                             2.33

Based on the table of critical values, α = 0.01 in one tail corresponds to 2.33. That is, $-z_{\alpha} = -2.33$.
We can now solve for $\bar{x}$ of the original sampling distribution using the z-value formula.
$$-z_{\alpha} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
$$\bar{x} = \mu + (-z_{\alpha}) \frac{\sigma}{\sqrt{n}} = 60 + (-2.33) \frac{12}{\sqrt{51}} = 56.08$$

We now consider the distribution centered at the true mean: according to Ms. Coronel, the class average score is 58.

We now standardize 56.08 under this true distribution.
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{56.08 - 58}{12 / \sqrt{51}} = -1.14$$
Looking at the z-table, the area between z = -1.14 and z = 0 is 0.3729. The nonrejection region is everything to the right of -1.14, so

Area = 0.3729 + 0.50
Area = 0.8729

The probability of committing Type II error is 0.8729 or 87.29%, which is quite high. This implies that there is an 87.29% chance of not rejecting H0 even if it is false.
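
The two stages of Example 3 (find the critical sample mean under H0, then measure the nonrejection area under the true distribution) can be sketched as below. The function name `beta_left_tailed` and its parameter names are assumptions; the inputs are the values from the example, and `statistics.NormalDist` stands in for the z-table, so the result differs slightly from 0.8729 because z is not rounded to -1.14.

```python
import math
from statistics import NormalDist


def beta_left_tailed(mu0: float, mu_true: float, sigma: float,
                     n: int, z_alpha: float) -> tuple[float, float]:
    """Return (critical sample mean, P(Type II error)) for a left-tailed z-test."""
    se = sigma / math.sqrt(n)            # standard error of the mean
    x_crit = mu0 - z_alpha * se          # reject H0 when the sample mean is below this
    # Type II error: the sample mean falls at or above x_crit (H0 is not
    # rejected) although the population mean is really mu_true.
    beta = 1 - NormalDist(mu_true, se).cdf(x_crit)
    return x_crit, beta


x_crit, beta = beta_left_tailed(mu0=60, mu_true=58, sigma=12, n=51, z_alpha=2.33)
print(f"critical sample mean is about {x_crit:.2f}")
print(f"P(Type II error) is about {beta:.3f}, so power is about {1 - beta:.3f}")
```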

Example 4: Suppose we randomly sample 16 values from a normally distributed population, where σ = 8 but μ is unknown. Test the following at α = 0.05.
H0 : μ = 75
HA : μ ≠ 75

Solution:
The alternative hypothesis suggests that the case we are dealing with is a two-tailed test.
The corresponding z of $\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$ is ±1.96 based on the table.

Convert these to raw scores.
$$\bar{x} = \mu + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
$$\bar{x} = 75 + (-1.96)\left(\frac{8}{\sqrt{16}}\right) = 71.08 \qquad \bar{x} = 75 + (1.96)\left(\frac{8}{\sqrt{16}}\right) = 78.92$$

This suggests that we reject H0 if either one of the following is true.
$\bar{x}$ < 71.08 or $\bar{x}$ > 78.92

We now standardize 71.08 and 78.92 under the distribution with mean μ = 75.
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{71.08 - 75}{8 / \sqrt{16}} = -1.96$$
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{78.92 - 75}{8 / \sqrt{16}} = 1.96$$

Looking at the z-table, the area beyond -1.96 is 0.025 and the area beyond 1.96 is also 0.025. So, in our standardized normal distribution, the nonrejection region has

Area = 1 – (0.025 + 0.025) = 1 – 0.05 = 0.95

We have the following probabilities:
Probability to the left of the mean: 0.5 – 0.025 = 0.475
Probability to the right of the mean: 0.5 – 0.025 = 0.475
Therefore, the probability of not rejecting H0 is 47.5% + 47.5% = 95% when the sampling distribution is centered at 75. Because the true mean is unknown here, this 95% is the value that the probability of Type II error approaches as the true mean gets close to 75. For any specific true mean other than 75, standardize 71.08 and 78.92 using that mean instead, as in Example 3.
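
For a two-tailed test, the probability of not rejecting H0 is the area between the two critical sample means under whichever distribution is actually true. The sketch below shows how that area depends on the true mean. The function name `beta_two_tailed` is an assumption, and the true mean of 79 in the second call is a purely hypothetical value chosen for illustration; it does not appear in the example.

```python
import math
from statistics import NormalDist


def beta_two_tailed(mu0: float, mu_true: float, sigma: float,
                    n: int, z_half_alpha: float) -> tuple[float, float, float]:
    """Return (lower critical mean, upper critical mean, P(do not reject H0))."""
    se = sigma / math.sqrt(n)
    lower = mu0 - z_half_alpha * se
    upper = mu0 + z_half_alpha * se
    true_dist = NormalDist(mu_true, se)  # sampling distribution under the true mean
    return lower, upper, true_dist.cdf(upper) - true_dist.cdf(lower)


# Distribution centered at 75: reproduces the 0.95 area found in the example.
lower, upper, area_at_75 = beta_two_tailed(75, 75, 8, 16, 1.96)
# Hypothetical true mean of 79 (an assumed value, for illustration only).
_, _, beta_at_79 = beta_two_tailed(75, 79, 8, 16, 1.96)

print(f"reject H0 if the sample mean is below {lower:.2f} or above {upper:.2f}")
print(f"nonrejection area when the mean is 75: {area_at_75:.2f}")
print(f"Type II error probability if the true mean were 79: {beta_at_79:.2f}")
```

The farther the true mean lies from 75, the smaller the Type II error probability becomes, which is why a specific true mean is needed to state β.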

Z-TABLE

T-TABLE

CHI-SQUARE TABLE