STATFINALedit
Normal Curve
➢ The normal curve, also known as the Gaussian distribution or bell curve, is a probability distribution that describes a wide range of phenomena in nature and human behavior.
➢ It is a symmetrical, bell-shaped curve that is defined by two parameters: its mean (µ) and standard deviation (σ).
➢ The normal curve is also important in statistical inference, as it allows researchers to make probabilistic statements about the population based on sample data.

Steps In Finding Area Under Normal Curve
⦿ SKETCH the normal curve, locate the z-value and label it.
⦿ ANALYZE and shade the area you are looking for depending on the given condition.
⦿ LOCATE the given z-value on the z-table. Find the first two digits (whole number & tenths) on the left side of the z-table. Look for the remaining digit (hundredths) on the top of the z-table. The intersection of these two (I & II) will be the area from the mean to the z-value.
⦿ LABEL the shaded area.
⦿ State your ANSWER.

Example
Find the probability density function for the normal distribution where mean = 4 and standard deviation = 2 and x = 3.
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad f(3) = \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(3-4)^2}{2(2)^2}} \]
Therefore, the probability density function for the normal distribution is 0.17603.

Standard Normal Table / Z-Table
➢ Used to determine the area/percentage under consideration in a standard normal distribution.
➢ This z-table gives the area between the mean and z-values from 0 to 3.99.
➢ This particular z-table provides the area between the mean and the z-value.

A Normal Distribution Curve As A Probability Distribution Curve
A normal distribution curve can be used as a probability distribution curve for normally distributed variables.
For probabilities, a special notation is used. For example: to find the probability of any z value between 0 and 2.32, this probability is written as P(0 < z < 2.32).

The standard normal distribution curve can be used to solve a wide variety of practical problems. The only requirement is that the variable be normally or approximately normally distributed.
To solve problems by using the standard normal distribution, use the formula:
\[ z = \frac{X - \mu}{\sigma} \]

Example: 1. The mean number of hours an American worker spends on the computer is 3.1 hours per workday. Assume the standard deviation is 0.5 hour. Find the percentage of workers who spend less than 3.5 hours on the computer. Assume the variable is normally distributed.
Solution:
Step 1 – Draw the figure and represent the area.
Step 2 – Find the z value corresponding to X.
Step 3 – Find the area.

CHAPTER 9: SAMPLING DISTRIBUTION
SAMPLING DISTRIBUTION A sampling distribution is a probability distribution of a statistic obtained from a large number of samples drawn from a specific population. Each sample is a subset of the population.
Any quantity obtained from a sample for the purpose of estimating a population parameter is called a sample statistic or statistic.

A. PROBABILITY SAMPLING Probability sampling means that every member of the population has a chance of being selected. It is mainly used in quantitative research. If you want to produce results that are representative of the whole population, probability sampling techniques are the most valid choice.
There are four main types of probability samples.

1. Simple random sampling
In a simple random sample, every member of the population has an equal chance of being selected. Your sampling frame should include the whole population.
Random sampling can be done for relatively small populations by drawing lots or by using a table of random numbers.
To conduct this type of sampling, you can use tools like random number generators or other techniques that are based entirely on chance.

Example: You want to select a simple random sample from the 1000 employees of a social media marketing company. You assign a number to every employee in the company database from 1 to 1000, and use a random number generator to select 100 numbers.

2. Systematic sampling
Systematic sampling is similar to simple random sampling, but it is usually slightly easier to conduct. Every member of the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular intervals.

Example: All employees of the company are listed in alphabetical order. From the first 10 numbers, you randomly select a starting point: number 6. From number 6 onwards, every 10th person on the list is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100 people.

If you use this technique, it is important to make sure that there is no hidden pattern in the list that might skew the sample. For example, if the HR database groups employees by team, and team members are listed in order of seniority, there is a risk that your interval might skip over people in junior roles, resulting in a sample that is skewed towards senior employees.

3. Stratified sampling
Stratified sampling involves dividing the population into subpopulations that may differ in important ways. It allows you to draw more precise conclusions by ensuring that every subgroup is properly represented in the sample.
To use this sampling method, you divide the population into subgroups (called strata) based on the relevant characteristic (e.g., gender identity, age range, income bracket, job role).

Example: The company has 800 female employees and 200 male employees. You want to ensure that the sample reflects the gender balance of the company, so you sort the population into two strata based on gender. Then you use random sampling on each group, selecting 80 women and 20 men, which gives you a representative sample of 100 people.

4. Cluster sampling
Cluster sampling also involves dividing the population into subgroups, but each subgroup should have similar characteristics to the whole sample. Instead of sampling individuals from each subgroup, you randomly select entire subgroups.

A cluster sample is a simple random sample of clusters from the available clusters in the population. Clusters are groups of elements in the population of interest.
If it is practically possible, you might include every individual from each sampled cluster. If the clusters themselves are large, you can also sample individuals from within each cluster using one of the techniques above. This is called multistage sampling.

This method is good for dealing with large and dispersed populations, but there is more risk of error in the sample, as there could be substantial differences between clusters. It’s difficult to guarantee that the sampled clusters are really representative of the whole population.

Example: The company has offices in 10 cities across the country (all with roughly the same number of employees in similar roles). You don’t have the capacity to travel to every office to collect your data, so you use random sampling to select 3 offices – these are your clusters.

B. NON-PROBABILITY SAMPLING METHOD In a non-probability sample, individuals are selected based on non-random criteria, and not every individual has a chance of being included. This type of sample is easier and cheaper to access, but it has a higher risk of sampling bias. That means the inferences you can make about the population are weaker than with probability samples, and your conclusions may be more limited. If you use a non-probability sample, you should still aim to make it as representative of the population as possible.

Non-probability sampling techniques are often used in exploratory and qualitative research. In these types of research, the aim is not to test a hypothesis about a broad population, but to develop an initial understanding of a small or under-researched population.

These are the five main types of non-probability sample.

1. Convenience sampling
A convenience sample simply includes the individuals who happen to be most accessible to the researcher. This is an easy and inexpensive way to gather initial data, but there is no way to tell if the sample is representative of the population, so it can’t produce generalizable results. Convenience samples are at risk for both sampling bias and selection bias.

Advantages:
1. Collect data quickly
2. Inexpensive methodology
3. Easy to do research
4. Low cost
5. Readily available sample
6. Fewer rules to follow

Example: You are researching opinions about student support services in your university, so after each of your classes, you ask your fellow students to complete a survey on the topic. This is a convenient way to gather data, but as you only surveyed students taking the same classes as you at the same level, the sample is not representative of all the students at your university.

2. Voluntary response sampling
Similar to a convenience sample, a voluntary response sample is mainly based on ease of access. Instead of the researcher choosing participants and directly contacting them, people volunteer themselves (e.g. by responding to a public online survey).

Voluntary response samples are always at least somewhat biased, as some people will inherently be more likely to volunteer than others, leading to self-selection bias.

Example: You send out the survey to all students at your university and a lot of students decide to complete it. This can certainly give you some insight into the topic, but the people who responded are more likely to be those who have strong opinions about the student support services, so you can’t be sure that their opinions are representative of all students.

3. Purposive sampling
This type of sampling, also known as judgement sampling, involves the researcher using their expertise to select a sample that is most useful to the purposes of the research.

It is often used in qualitative research, where the researcher wants to gain detailed knowledge about a specific phenomenon rather than make statistical inferences, or where the population is very small and specific. An effective purposive sample must have clear criteria and rationale for inclusion. Always make sure to describe your inclusion and exclusion criteria and beware of observer bias affecting your arguments.

Example: You want to know more about the opinions and experiences of disabled students at your university, so you purposefully select a number of students with different support needs in order to gather a varied range of data on their experiences with student services.

4. Snowball sampling
If the population is hard to access, snowball sampling can be used to recruit participants via other participants. The number of people you have access to “snowballs” as you get in contact with more people. The downside here is also representativeness, as you have no way of knowing how representative your sample is due to the reliance on participants recruiting others. This can lead to sampling bias.

Example: You are researching experiences of homelessness in your city. Since there is no list of all homeless people in the city, probability sampling isn’t possible. You meet one person who agrees to participate in the research, and she puts you in contact with other homeless people that she knows in the area.

5. Quota sampling
Quota sampling relies on the non-random selection of a predetermined number or proportion of units. This is called a quota. You first divide the population into mutually exclusive subgroups (called strata) and then recruit sample units until you reach your quota. These units share specific characteristics, determined by you prior to forming your strata. The aim of quota sampling is to control what or who makes up your sample.

To use the formula, first figure out what you want your error of tolerance to be.
For example, you may be happy with a confidence level of 95 percent (giving a margin of error of 0.05), or you may require a tighter accuracy of a 98 percent confidence level (a margin of error of 0.02).

CHAPTER 10: ESTIMATION

STATISTICAL INFERENCE → is the process by which we infer population properties from sample properties.
Example: Suppose a college president wishes to estimate

In this case, we can say that the sample statistic is an unbiased estimate.

TWO TYPES OF ESTIMATORS

1. POINT ESTIMATOR ¬ A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value or point.
In an interval estimate, the parameter is specified as being between two values. For example, an interval estimate for the average of all students might be 26.9 < μ < 27.7.

The confidence interval is a specific interval estimate of a parameter determined by using data obtained from a sample and by using the specific confidence level of the estimate. Three common confidence intervals are used: the 90, the 95, and the 99% confidence intervals.

Steps In Finding The Confidence Interval
STEP #1: Determine the confidence coefficients or the critical values (zα/2)
STEP #2: Find the lower and upper confidence limits
STEP #3: Interpret the results.

Confidence Intervals For The Mean
\[ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
where:
• x̄ is the sample mean
• z α∕2 is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution
• n is the sample size
• σ is the population standard deviation

Example:
Find a 95% confidence interval for a population mean μ for n = 36, x̄ = 15.2, σ = 1.6.
Since the sample size is n = 36, the distribution of the sample mean x̄ is approximately normal with mean μ and standard error (σ/√n). The approximate 95% confidence interval is found as follows.
STEP #1: Determine the confidence coefficients or the critical values (zα/2)
*look at z-table: zα/2 = 1.96
STEP #2: Find the lower and upper confidence limits
\[ 15.2 \pm 1.96\left(\frac{1.6}{\sqrt{36}}\right) = 15.2 \pm 0.52 \]
STEP #3: Interpret the results
Hence, one can say with 95% confidence the true mean of the population is between 14.68 and 15.72.

EXAMPLE:
A researcher wishes to estimate the average amount of money a person spends on lottery tickets each month. A sample of 50 people who play the lottery found the mean
Hence, one can say with 95% confidence the true mean of the population is between 17.10 and 20.90 based on a sample of 50 people who play the lottery.

2. SMALL SAMPLES (n < 30) In this case we use the t distribution to obtain the confidence interval.
\[ \bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}} \]
• x̄ is the sample mean
• t α∕2 are values found in the t-table that correspond to the areas in the two tails of the curve, called the critical values
• Additional information: Degrees of freedom refer to the maximum number of logically independent values, which may vary in a data sample.
• n is the sample size
• s is the sample standard deviation

Steps In Finding The Confidence Interval
STEP #1: Determine the confidence coefficients or the critical values (tα/2)
STEP #2: Find the lower and upper confidence limits
STEP #3: Interpret the results

EXAMPLE: The Statistician of BISCAST wants to know the mean age of entering mathematics majors. He computed the mean age of 18 years and standard deviation of 1.4 years on a random sample of 25 entering Mathematics majors coming from a normally distributed population. With 99% confidence, find the interval estimate of the population mean.

STEP #1: Determine the confidence coefficients or the critical values (tα/2)
n = 25
df = n - 1 = 25 – 1 = 24
The critical value for this df is 2.797 (based on the t-table).

EXAMPLE #1:
The average weight of 20 dark chocolate bars selected from a normally distributed population is 180g with a standard deviation of 15g. Find the interval estimate using the 95% confidence interval.
STEP #1: Determine the confidence coefficients or the critical values (tα/2)
n = 20
df = n - 1 = 20 – 1 = 19
The critical value for this df is 2.093 (based on the t-table).
STEP #2: Find the lower and upper confidence limits
STEP #3: Interpret the results
Thus, we can say with 95% confidence that the interval between 172.98 and 187.02 contains the true mean weight of dark chocolate bars, based on a sample of 20 dark chocolate bars.

Confidence Intervals for Proportions
The confidence limits for the population proportion are given by
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

Sample Proportion
Suppose a random sample of 500 students at BISCAST is asked whether they agree or disagree with a statement given by the researcher, and the sample proportion who agree is 0.84.
This 0.84 is a point estimate for the population proportion. The sample proportion for failures is 1 - 0.84, which is 0.16.

It gives us an idea of what the population might be. However, it does not tell us that 84% of the population will agree with the statement.

To infer or generalize about the population, we construct an interval, that is, a range of values that we expect to capture the population parameter p at some confidence level. The confidence level represents the proportion of times we expect to capture the population parameter with repeated sampling.
The confidence interval for p is computed using the formula above.

Now, let’s estimate the population proportion with 85% confidence with the given n = 500 and p^ = 0.84.

SOLUTION:
Step 1: Calculate the Standard Error
Step 3: Find the tail area and look it up in the less-than z-table
Step 2: Now, using these values find the P1 and P2
Step 3: Now, substitute all the given values in the formulation and compute for the two different population mean
Step 4: Compute for the confidence interval
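
The interval estimates in this chapter can be checked numerically. The sketch below is a minimal illustration, not part of the original handout: the function names are hypothetical, the z critical value comes from Python's `statistics.NormalDist`, and the t critical value 2.093 is typed in from the t-table because the standard library has no t distribution. The results should agree with the worked intervals up to rounding in the last digit.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided z critical value for a confidence level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_interval(xbar, sd, n, critical):
    """Confidence limits xbar +/- critical * sd / sqrt(n)."""
    margin = critical * sd / sqrt(n)
    return xbar - margin, xbar + margin


def proportion_interval(p_hat, n, confidence):
    """Confidence limits p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Large sample: n = 36, sample mean 15.2, sigma = 1.6, 95% confidence
z_low, z_high = mean_interval(15.2, 1.6, 36, z_critical(0.95))

# Small sample: 20 chocolate bars, mean 180 g, s = 15 g,
# t = 2.093 read from the t-table for df = 19
t_low, t_high = mean_interval(180, 15, 20, 2.093)

# Proportion: n = 500, p-hat = 0.84, 85% confidence
p_low, p_high = proportion_interval(0.84, 500, 0.85)

print(f"z interval: {z_low:.2f} to {z_high:.2f}")
print(f"t interval: {t_low:.2f} to {t_high:.2f}")
print(f"proportion interval: {p_low:.4f} to {p_high:.4f}")
```

The same `mean_interval` helper serves both cases because the z and t intervals differ only in the critical value and in which standard deviation is supplied.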
a. Find the sum of each row and each column, and find the grand total, as shown.
b. For each cell, multiply the corresponding row sum by the column sum and divide by the grand total, to get the expected value:
\[ E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}} \]
c. For example, for C1,2, the expected value, denoted by E1,2, is (refer to the previous tables)
Step 4: Make a decision
The decision is to reject the null hypothesis since 26.67 > 5.991.
Step 5: Summarize the result
The conclusion is that there is enough evidence to support the claim that opinion is related to (dependent on) profession—that is, that the doctors and nurses differ in their opinions about the procedure.

Step 1: State the hypothesis
H0: There is no significant difference between the opinions of male and female toward the candidate.
H1: There is a significant difference between the opinions of male and female toward the candidate.
Step 2: Determine the critical value
degrees of freedom = (R - 1)(C - 1)
= (2 - 1)(2 - 1)
= (1)(1)
degrees of freedom = 1
level of significance = 1% or 0.01
Tabular value is 6.64
Step 3: Compute the test value using the chi-square test for homogeneity.
Computing for χ² we have:
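
The expected-value rule (row sum times column sum, divided by the grand total) and the chi-square test value can be sketched in a few lines. The observed counts below are hypothetical and chosen only for illustration; they are not the data from the examples above, and the function name is likewise an assumption.

```python
def chi_square_statistic(observed):
    """Return (expected table, chi-square value, degrees of freedom)."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)
    # Expected value of each cell: row sum * column sum / grand total
    expected = [[r * c / grand_total for c in col_sums] for r in row_sums]
    # Chi-square: sum of (O - E)^2 / E over every cell
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    return expected, chi2, df


# Hypothetical 2 x 2 table of counts (rows: two groups, columns: yes / no)
observed = [[30, 20], [10, 40]]
expected, chi2, df = chi_square_statistic(observed)

critical_value = 6.64  # tabular value for df = 1 at the 0.01 level
print(expected, round(chi2, 2), df, chi2 > critical_value)
```

With these made-up counts the test value exceeds the tabular value, so the null hypothesis would be rejected. With other counts the same comparison decides the outcome.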
I. OBJECTIVES
II. CONTENT
HYPOTHESIS
● A premise or claim that we want to test.
● A statistical hypothesis is a conjecture about a population parameter. This
conjecture may or may not be true.
● A claim or statement about a population parameter.
HYPOTHESIS TESTING
● Hypothesis testing in statistics is a way for you to test the results of a survey or
experiment to see if you have meaningful results. You’re basically testing whether
your results are valid by figuring out the odds that your results have happened by
chance.
● It is a decision – making process for evaluating claims about a population.
Null Hypothesis
● States that there is no difference between a parameter and specific value or that
there is no difference between two parameters.
● It shows no significant difference, no changes, nothing happened, no relationship
between two parameters.
● Currently accepted value for a parameter.
● It is the initial claim and represented by H0.
Alternative Hypothesis
● States a specific difference between a parameter and a specific value or states
that there is a difference between two parameters.
● It shows that there is significant difference, an effect, change, relationship
between a parameter and specific value.
● It involves the claim to be tested and it is also called a research hypothesis.
● It is contrary to the null hypothesis and represented by Ha or H1.
Example:
State the null and the alternative hypotheses for each conjecture.
1. It is believed that a candy machine makes chocolate bars that weigh 5 grams on
average. A worker claims that, after maintenance, the machine no longer makes
5-gram bars.
Solution:
H0 : μ = 5 grams
Ha : μ ≠ 5 grams
2. Doctors believe that teens sleep on average no longer than 10 hours
per day. A researcher believes that teens on average sleep longer.
Solution:
H0 : μ ≤ 10 hours
Ha : μ > 10 hours
3. A researcher feels that advertising on television will change the buying preferences
of young adults for a certain product. The researcher is not sure whether the sales
will increase or decrease. In the past, the mean sales was P500,000.
Solution:
H0 : μ = P500,000
Ha : μ ≠ P500,000
STATISTICAL TEST
● Uses the data obtained from a sample to make a decision whether or not the null
hypothesis should be rejected. The numerical value obtained from a statistical test
is called the test value.
● A statistical test provides a mechanism for making quantitative decisions about a
process or processes. The intent is to determine whether there is enough evidence
to "reject" a conjecture or hypothesis about the process. The conjecture is called
the null hypothesis.
Types of Errors
In the hypothesis testing situation, there are four possible outcomes. The four
possible outcomes are shown below:
                                      H0 is True          H0 is False
Reject Null Hypothesis (H0)           Type I Error        Correct Decision
Do Not Reject Null Hypothesis (H0)    Correct Decision    Type II Error
Remember:
● A type I error occurs if one rejects the null hypothesis when it is true.
● A type II error occurs if one does not reject the null hypothesis when it is false.
Examples:
1. Suppose the null hypothesis is: Ben’s used car is safe to drive. Which statement
represents a type I error and a type II error?
a. Ben thinks that his car may be safe when, in fact, it is not safe.
b. Ben thinks that his car may be safe when, in fact, it is safe.
c. Ben thinks that his car may not be safe when, in fact, it is not safe.
d. Ben thinks that his car may not be safe when, in fact, it is safe.
Answer:
➔ Letter d is the statement that represents type I error because it rejects the
null hypothesis even though it is true.
➔ Letter a is the statement that represents type II error because it does not
reject the null hypothesis when it is false.
2. In a criminal court case, the null hypothesis is that the defendant is presumed
innocent. Which statement represents a type I error and a type II error?
a. The jury believes that the defendant is guilty when, in fact, he is innocent.
b. The jury believes that the defendant is guilty when, in fact, he is not
innocent.
c. The jury believes that the defendant is not guilty when, in fact, he is not
innocent.
d. The jury believes that the defendant is not guilty when, in fact, he is
innocent.
Answer:
➔ Letter a is the statement that represents type I error because it rejects the
null hypothesis when it is true.
➔ Letter c is the statement that represents type II error because it does not
reject the null hypothesis when it is false.
Level of Confidence
● It is the percentage of times you expect to get close to the same estimate if you
run your experiment again or resample the population in the same way.
● The most common confidence levels are 90%, 95% and 99%.
Level of Significance
The level of significance is the maximum probability of committing a type I error.
This probability is symbolized by “α” (alpha). That is, P(type I error) = α. Researchers
generally agree on using three arbitrary significance levels: the 0.10, 0.05 and 0.01
levels. When α = 0.10, there is a 10% chance of rejecting a true null hypothesis; when
α = 0.05, there is a 5% chance of rejecting a true null hypothesis; and when α = 0.01,
there is a 1% chance of rejecting a true null hypothesis.
● It can also be expressed as α (alpha) = 1 – C, where C is the level of confidence.
For example:
The level of confidence is 95%
C = 0.95
α = 1 – 0.95
α = 0.05
Remember:
● The level of confidence and level of significance are related to each other since
α = 1 – C. C tells how sure you are of making the right decision, and α tells
the risk of not making the right decision. Some problems specify the level of
confidence and some specify the level of significance.
CRITICAL VALUES
● The critical value (s) separates the critical region from the non-critical region.
● The critical or rejection region is the range of values of the test value that
indicates that there is a significant difference and that the null hypothesis should
be rejected.
● The non-critical or non-rejection region is the range of values of the test value
that indicates that the null hypothesis should not be rejected.
● The critical value can be on the right side of the mean or on the left side of the
mean for a one-tailed test. Its location depends on the inequality sign of the
alternative hypothesis.
● A one-tailed test indicates that the null hypothesis should be rejected when the
test value is in the critical region on one side of the mean. A one-tailed test is
either right-tailed or left-tailed, depending on the direction of the inequality of the
alternative hypothesis.
● A two-tailed test, in statistics, is a method in which the critical area of a distribution
is two-sided and tests whether a sample is greater than or less than a certain range
of values. It is used in null-hypothesis testing and testing for statistical significance.
● A right-tailed test, also called the upper-tail test, is performed if the
population parameter is suspected to be greater than the value assumed in
the null hypothesis.
● A left-tailed test is used when the alternative hypothesis states that the true value
of the parameter specified in the null hypothesis is less than the null hypothesis
claims.
Decision Criteria:
● Reject the null hypothesis if test statistic > t critical value (right-tailed hypothesis
test).
● Reject the null hypothesis if test statistic < t critical value (left-tailed hypothesis
test).
● Reject the null hypothesis if the test statistic does not lie in the acceptance region
(two-tailed hypothesis test).
Example:
Find the critical value of the following.
1. The level of significance is 5%. Find the critical value of “Z” for a Right-Tailed Test.
Solution:
Step 1: α = 5% = 0.05
Step 2: 1 – α = 1 – 0.05 = 0.95
Step 3: Find 0.95 in the body of the z-table and add the row and column values you get.
1.6 + 0.05 = 1.65
Step 4: Find the 0.925 in the z-table and add the value you get.
1.4 + 0.04 = 1.44
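
The table lookups above can be cross-checked with the inverse of the standard normal cumulative distribution. This is a minimal sketch using Python's `statistics.NormalDist`. The helper names are hypothetical, and the printed z-table rounds to two decimals, so an area of 0.95 that falls between 1.64 and 1.65 may be read as either value.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def right_tailed_critical(alpha):
    """z value with area alpha to its right."""
    return std_normal.inv_cdf(1 - alpha)


def two_tailed_critical(alpha):
    """Positive z value with area alpha / 2 in each tail."""
    return std_normal.inv_cdf(1 - alpha / 2)


z_right_05 = right_tailed_critical(0.05)   # area 0.95 to the left
z_area_925 = std_normal.inv_cdf(0.925)     # area 0.925 to the left
z_two_05 = two_tailed_critical(0.05)

print(round(z_right_05, 3), round(z_area_925, 2), round(z_two_05, 2))
```

For a left-tailed test the critical value is the negative of the right-tailed one, since the standard normal curve is symmetric about 0.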
Example:
1. Measure participants' weight before and after the diet counseling course.
2. Measure the performance of 10 participants in a spelling test before and after
they undergo a new form of computerized teaching method to improve spelling.
3. Compute the difference of their scores from the pre-test and post-test.
Assumptions:
1. Your dependent variable should be measured on a continuous scale.
2. Related samples/group. This means that the subjects in the first group are also in
the second group.
3. No significant outliers in the two groups.
4. Approximately normally distributed.
Formulas:
To find t-value:
\[ t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} \]
where:
$\bar{D}$ = average of difference (D)
$\mu_D$ = difference between the means of the two variables = assume it is zero
$s_D$ = standard deviation of the difference
n = sample size
Requirements: D, $\bar{D}$ and $s_D$
To find D:
\[ D = x_1 - x_2 \]
where:
$x_1$ = first measurement (e.g., before)
$x_2$ = second measurement (e.g., after)
To find $\bar{D}$:
\[ \bar{D} = \frac{\Sigma D}{n} \]
where:
$\Sigma D$ = summation of difference
n = sample size
To find $s_D$:
\[ s_D = \sqrt{\frac{n\Sigma D^2 - (\Sigma D)^2}{n(n-1)}} \]
where:
n = sample size
$\Sigma D^2$ = summation of squared difference
$\Sigma D$ = summation of difference
Example 1:
A math teacher wishes to see whether a new program will reduce the number of
errors that the students make when solving worded problems. The data are shown here.
At α = 0.05, can it be concluded that the number of errors has been reduced?
Student 1 2 3 4 5 6
Errors Before 12 9 0 5 4 3
Errors After 9 6 1 3 2 3
Step 1: State the hypothesis.
H0: There is no significant difference between the average number of errors before and
after the new program.
H1: There is a significant difference between the average number of errors before and
after the new program.
Step 2: Find the type of test, degree of freedom, and critical value.
Type of Test: One-tailed Test (Right-tailed Test)
Student         1   2   3   4   5   6
Errors Before  12   9   0   5   4   3
Errors After    9   6   1   3   2   3
D               3   3  -1   2   2   0
D²              9   9   1   4   4   0
Given: n = 6
Find $\Sigma D$:
\[ \Sigma D = 3 + 3 + (-1) + 2 + 2 + 0 = 9 \]
Find $\Sigma D^2$:
\[ \Sigma D^2 = 9 + 9 + 1 + 4 + 4 + 0 = 27 \]
Find $\bar{D}$:
\[ \bar{D} = \frac{\Sigma D}{n} = \frac{9}{6} = \frac{3}{2} = 1.5 \]
Find $s_D$:
\[ s_D = \sqrt{\frac{n\Sigma D^2 - (\Sigma D)^2}{n(n-1)}} = \sqrt{\frac{(6)(27) - (9)^2}{(6)(6-1)}} = \sqrt{\frac{162 - 81}{(6)(5)}} = \sqrt{\frac{81}{30}} = \sqrt{2.7} = 1.64 \]
Find t-value:
\[ t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} = \frac{1.5 - 0}{1.64 / \sqrt{6}} = \frac{1.5}{1.64 / 2.45} = \frac{1.5}{0.67} = 2.24 \]
Example 2:
A teacher, after seeing the poor mathematics scores of the class, decides to
conduct special tutoring for the subject. The test was out of 10. She then compares the
before and after scores of the students. Assume α = 0.05. The aim is
to find whether the special tutoring was effective or not, and the results are as
follows:
Before 7 6 5 4 4 6 7 5 5 7
After 9 10 7 5 7 5 9 6 8 7
Step 2: Find the type of test, degree of freedom, and critical value.
Type of Test: Two-tailed Test
Before   7   6   5   4   4   6   7   5   5   7
After    9  10   7   5   7   5   9   6   8   7
D       -2  -4  -2  -1  -3   1  -2  -1  -3   0
D²       4  16   4   1   9   1   4   1   9   0
Given: n = 10
Find $\Sigma D^2$:
\[ \Sigma D^2 = 4 + 16 + 4 + 1 + 9 + 1 + 4 + 1 + 9 + 0 = 49 \]
Find $\bar{D}$:
\[ \bar{D} = \frac{\Sigma D}{n} = \frac{-17}{10} = -1.7 \]
Find $s_D$:
\[ s_D = \sqrt{\frac{n\Sigma D^2 - (\Sigma D)^2}{n(n-1)}} = \sqrt{\frac{(10)(49) - (-17)^2}{(10)(10-1)}} = \sqrt{\frac{490 - 289}{(10)(9)}} = \sqrt{\frac{201}{90}} = \sqrt{2.23} = 1.49 \]
Find t-value:
\[ t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} = \frac{-1.7 - 0}{1.49 / \sqrt{10}} = \frac{-1.7}{1.49 / 3.16} = \frac{-1.7}{0.47} = -3.62 \]
Step 4: Make the decision.
Reject H0, since -3.62 < -2.2622.
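
The hand computations in the two dependent-samples examples can be reproduced with a short function that follows the same formulas for D, the mean difference, and the standard deviation of the differences. This is a sketch with hypothetical names, not part of the original handout. Because it carries full precision instead of rounding each intermediate value, the second example comes out near -3.60 rather than -3.62. The decision against the critical value -2.2622 is the same.

```python
from math import sqrt


def paired_t(first, second, mu_d=0):
    """Return (mean difference, s_D, t) for paired samples, with D = first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sum_d = sum(diffs)
    sum_d2 = sum(d * d for d in diffs)
    d_bar = sum_d / n
    # s_D = sqrt((n * sum(D^2) - (sum D)^2) / (n * (n - 1)))
    s_d = sqrt((n * sum_d2 - sum_d ** 2) / (n * (n - 1)))
    t = (d_bar - mu_d) / (s_d / sqrt(n))
    return d_bar, s_d, t


# Example 1: errors before and after the new program
errors_before = [12, 9, 0, 5, 4, 3]
errors_after = [9, 6, 1, 3, 2, 3]
d_bar1, s_d1, t1 = paired_t(errors_before, errors_after)

# Example 2: scores before and after special tutoring
scores_before = [7, 6, 5, 4, 4, 6, 7, 5, 5, 7]
scores_after = [9, 10, 7, 5, 7, 5, 9, 6, 8, 7]
d_bar2, s_d2, t2 = paired_t(scores_before, scores_after)

print(f"Example 1: mean D = {d_bar1}, s_D = {s_d1:.2f}, t = {t1:.2f}")
print(f"Example 2: mean D = {d_bar2}, s_D = {s_d2:.2f}, t = {t2:.2f}")
print("Example 2 rejects H0:", t2 < -2.2622)
```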
Example:
● The effectiveness of two different diets on two different groups of individuals.
● Comparing the height of students in two different schools.
Caution!! The t-test can be used when the population standard deviations are not known
and the sample size is smaller (less than 30).
Assumptions:
1. Independence of the observations. Each subject should belong to only one
group. There is no relationship between the observations in each group.
2. No significant outliers in the two groups.
3. Normality. The data for each group should be approximately normally distributed.
4. Homogeneity of variances. The variance of the outcome variable should be
equal in each group.
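Formula for the t-test for testing the difference between two means (independent
samples). Variances are assumed to be unequal.
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]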
Example:
According to Nielsen Media Research, children ( ages 2-11 ) spend an average of
21 hours and 30 minutes watching television per week while teens (ages 12-17) spend
an average of 20 hours and 40 minutes . Based on the sample statistics obtained below,
is there sufficient evidence to conclude a difference in average television watching times
between the two groups? Use α= 0.05.
                    Children    Teens
Sample mean         22.45       18.50
Sample variance     16.4        18.2
Sample size         15          15
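
A minimal sketch of the test value for this example follows. It assumes the usual unequal-variance statistic, t = (x̄1 − x̄2) / sqrt(s1²/n1 + s2²/n2), with a hypothesized difference of zero. The degrees of freedom use the conservative "smaller sample size minus 1" convention and the critical value 2.145 is read from a standard t-table. Both are assumptions here, since the handout does not state which convention it follows.

```python
from math import sqrt


def unequal_variance_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic for two independent samples without pooling the variances."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


# Children: mean 22.45, variance 16.4, n = 15
# Teens:    mean 18.50, variance 18.2, n = 15
t_value = unequal_variance_t(22.45, 16.4, 15, 18.50, 18.2, 15)

df = min(15, 15) - 1   # assumed convention: smaller sample size minus 1
critical = 2.145       # two-tailed, alpha = 0.05, df = 14, from a standard t-table
reject_h0 = abs(t_value) > critical

print(f"t = {t_value:.2f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions the test value falls in the critical region, which would support a difference in average viewing time between the two groups.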
State only the null and the alternative hypotheses for each conjecture. (1 point
each)
a. An instructor feels that using a module will enhance the performance of his
students in clinical psychology. In the past, the average grade of the students was
75.
b. The school board claims that at least 60% of students bring a phone to school. A
teacher believes this number is too high and randomly samples 25 students to test
at a level of significance of 0.02.
c. A company has stated that their straw machine makes straws that are 4 mm in
diameter. A worker believes the machine no longer makes straws of this size and
samples 100 straws to perform a hypothesis test with 99% confidence.
C. Identification
1. Critical Value
a. Solve and illustrate the critical value of Z. Find the critical value of “Z” for
two-tailed test for alpha of 39%. (5 points)
2. t-test Dependent
Salary Wizard is an online tool that allows you to look up incomes for
specific jobs for cities in the United States. We looked up the 25th percentile for
income for six jobs in two cities: Boise, Idaho, and Los Angeles, California. The
data are below:
1 2 3 4 5 6
3. t-test Independent
IV. A statistics teacher wants to compare his two classes to see if they perform any differently
on the tests he gave that semester. Class A had 25 students with an average score of 70,
standard deviation 15. Class B had 20 students with an average score of 74, standard
deviation 25. Using alpha 0.05, did these two classes perform differently on the tests?
(15 points)
CHAPTER 12: ANALYSIS OF VARIANCE (ANOVA)
I. OBJECTIVES
II. CONTENT
ANOVA Notation
BETWEEN-GROUP VARIATION
WITHIN-GROUP VARIATION
\[ s_W^2 = \frac{\sum (n_i - 1)\, s_i^2}{\sum (n_i - 1)} \]
These terms are used to summarize the analysis of variance and are placed
in a summary table, as shown in Table 1.
Table 1. Analysis of Variance Summary Table
Source     Sum of squares   d.f.    Mean square             F
Between    SS_B             k − 1   MS_B = SS_B / (k − 1)   F = MS_B / MS_W
Within     SS_W             N − k   MS_W = SS_W / (N − k)
Total

SS_B = sum of squares between groups
SS_W = sum of squares within groups
● One-Way ANOVA
It has one independent variable (factor) with two or more levels and is
typically used when you want to test three or more groups to see if there
is a difference between their means.
● Two-Way ANOVA
It has two independent variables which can have multiple levels and
is used when you want to know how two independent variables, in
combination, affect a dependent variable.
\[ MS_B = \frac{\sum n_i (\bar{x}_i - \bar{x}_{GM})^2}{k - 1} \]
\[ MS_W = \frac{\sum (n_i - 1)\, s_i^2}{\sum (n_i - 1)} \]
“Post hoc” (Latin, meaning “after this”) refers to analyses carried out on the
results of your experimental data after the main test. Post hoc tests are often
based on a familywise error rate: the probability of at least one Type I error in
a set (family) of comparisons.
A post hoc test is conducted only when the p-value for the ANOVA is
statistically significant. If the p-value is not statistically significant, the
means of the groups are not significantly different from each other, so there is
no need to conduct a post hoc test to find out which groups differ.
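To illustrate why the familywise error rate matters, the sketch below computes it for all pairwise comparisons among three groups. It assumes the comparisons are independent, which is a simplification, and the function names are hypothetical. The per-comparison level α / m is the usual Bonferroni adjustment.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one Type I error in m independent tests."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-comparison significance level under the Bonferroni adjustment."""
    return alpha / m

k = 3                  # number of groups
m = k * (k - 1) // 2   # number of pairwise comparisons (3 for k = 3)

fwer = familywise_error_rate(0.05, m)
adjusted = bonferroni_alpha(0.05, m)
print(round(fwer, 4), round(adjusted, 4))
```

With three unadjusted comparisons at α = 0.05, the chance of at least one false rejection is well above 0.05, which is the motivation for testing each pair at the smaller adjusted level.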
BONFERRONI PROCEDURE
The specifics of the study will determine whether or not to apply the
Bonferroni correction. We can use the method in certain circumstances, such as
if:
Formula:
Where:
Even though you are comparing three or more means in this use of the F
test, variances are used in the test instead of means.
With the F test, two different estimates of the population variance are
made, namely, between-group variance, and within-group variance.
Between-Group Variance
➢ It is the first estimate that involves finding the variance of the means.
Within-Group Variance
➢ It is the second estimate made by computing the variance using all the
data and is not affected by differences in the means.
Solution:
H0: μ1 = μ2 = μ3 (claim)
H1: At least one mean is different from the others.
d.f.N. = k – 1 = 3 – 1 = 2
d.f.D. = N – k = 30 – 3 = 27
The critical value is 3.35 with α = 0.05
a.
Sample 1   Sample 2   Sample 3
7          4          6
9          3          1
5          6          3
8          2          5
6          7          3
8          5          4
6          5          6
10         4          5
7          1          7
4          3          3

x̄1 = 7       x̄2 = 4       x̄3 = 4.3
s1² = 3.33   s2² = 3.33   s3² = 3.34
\[ \bar{x}_{GM} = \frac{70 + 40 + 43}{30} = \frac{153}{30} = 5.1 \]
\[ MS_B = \frac{\sum n_i (\bar{x}_i - \bar{x}_{GM})^2}{k - 1} = \frac{10(7 - 5.1)^2 + 10(4 - 5.1)^2 + 10(4.3 - 5.1)^2}{3 - 1} = \frac{54.6}{2} = 27.3 \]
d. Find the within-group variance.
\[ MS_W = \frac{SS_W}{N - k} = \frac{\sum (n_i - 1)\, s_i^2}{\sum (n_i - 1)} = \frac{9(3.33) + 9(3.33) + 9(3.34)}{9 + 9 + 9} = \frac{90}{27} = 3.33 \]
\[ F = \frac{MS_B}{MS_W} = \frac{27.3}{3.33} = 8.198 \]
Since the F-test value is greater than the critical value (8.198 > 3.35),
reject the null hypothesis.
There is enough evidence to reject the null hypothesis and conclude that at
least one of the three population means differs significantly from the
others.
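The hand computation above can be checked with a short sketch that uses only the Python standard library. The function name `one_way_anova` is hypothetical. Because the code keeps the unrounded sample variances, it gives F ≈ 8.18 rather than the 8.198 obtained from the rounded values; the decision is the same.

```python
from statistics import mean, variance

# The three samples from the worked example, one list per column
groups = [
    [7, 9, 5, 8, 6, 8, 6, 10, 7, 4],
    [4, 3, 6, 2, 7, 5, 5, 4, 1, 3],
    [6, 1, 3, 5, 3, 4, 6, 5, 7, 3],
]

def one_way_anova(groups):
    """Return (MS_B, MS_W, F) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: n_i * (group mean - grand mean)^2
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: (n_i - 1) * sample variance
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within

ms_b, ms_w, f_value = one_way_anova(groups)
print(round(ms_b, 2), round(ms_w, 2), round(f_value, 2))
```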
Limitations of One-Way ANOVA
A one-way ANOVA tells us that at least two groups are different from each
other, but it won’t tell us which groups are different. If our test returns a significant
F-statistic, we may need to run a post hoc test to tell us exactly which groups differ
in means.
Step 1. Input your data into columns or rows in Excel. For example, if three groups
of students for music treatment are being tested, spread the data into three
columns.
Step 2. Click the “Data” tab and then click “Data Analysis.” If you don’t see Data
Analysis, load the ‘Data Analysis Toolpak’ add-in.
Step 3. Click “Anova: Single Factor” and then click “OK.”
Step 4. Type an input range into the Input Range box. For example, if the data is
in cells A1 to C10, type “A1:C10” into the box. Check the “Labels in the first row” box if
the data has column headers, and select the Rows radio button if the data is in rows.
Step 5. Select an output range. For example, click the “New Worksheet” radio
button.
Step 6. Choose an alpha level. For most hypothesis tests, 0.05 is standard.
Step 7. Click “OK.” The results from ANOVA will appear in the worksheet.
Another measure for ANOVA is the p-value. If the p-value is less than
the alpha level selected (which it is, in our case), we reject the Null Hypothesis.
Now to check which samples had different means, we will take the
Bonferroni approach and perform the post hoc test in Excel through the following
steps:
Step 8. Again, click on “Data Analysis” in the “Data” tab and select “t-Test: Two-
Sample Assuming Equal Variances,” and click “OK.”
Step 9. Input the range of the Class A column in the Variable 1 Range box and the
range of the Class B column in the Variable 2 Range box. Check the “Labels” box if
you have column headers in the first row.
Step 10. Select an output range. For example, click the “New Worksheet” radio
button.
Step 11. Perform the same steps (step 8 to step 10) for Columns of Class B –
Class C and Class A – Class C.
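The pairwise comparisons in Steps 8 to 11 can also be sketched in code. The example below reuses the three samples from the one-way ANOVA worked example (not the Class A, B, and C data) and computes the equal-variance t statistic for each pair. The group labels and the function name `pooled_t` are hypothetical. Each pair would then be judged at the Bonferroni-adjusted level 0.05 / 3 using a t table or the p-value reported by Excel.

```python
import math
from itertools import combinations
from statistics import mean, variance

samples = {
    "Group 1": [7, 9, 5, 8, 6, 8, 6, 10, 7, 4],
    "Group 2": [4, 3, 6, 2, 7, 5, 5, 4, 1, 3],
    "Group 3": [6, 1, 3, 5, 3, 4, 6, 5, 7, 3],
}

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_error

pairs = list(combinations(samples, 2))
adjusted_alpha = 0.05 / len(pairs)  # Bonferroni-adjusted level per pair
t_values = {(p, q): pooled_t(samples[p], samples[q]) for p, q in pairs}

for pair, t in t_values.items():
    print(pair, round(t, 3))
```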
ANOVA stands for analysis of variance and tests for differences in the
effects of independent variables on a dependent variable. A two-way ANOVA test
is a statistical test used to determine the effect of two nominal predictor variables
on a continuous outcome variable.
The two-way ANOVA summary table is set up as shown in the table below:
Correction term (C_x)
\[ C_x = \frac{(\sum x)^2}{\text{No. of observations}} \]
For Factor A
\[ SS_A = \frac{\sum (\sum X_A)^2}{\text{No. of observations in each level of A}} - C_x \]
For Factor B
\[ SS_B = \frac{\sum (\sum X_B)^2}{\text{No. of observations in each level of B}} - C_x \]
For SS_Total
\[ SS_{Total} = \sum X^2 - C_x \]
For Interaction
\[ SS_{A \times B} = SS_{Total} - SS_A - SS_B - SS_W \]
The assumptions for the two-way analysis of variance are basically the
same as those for the one-way ANOVA, except for sample size.
Assumption of Two-Way ANOVA
EXAMPLE:
A researcher wishes to see whether the type of gasoline used and the type of
automobile driven have any effect on gasoline consumption. Two types of gasoline,
regular and high-octane, will be used, and two types of automobiles, two-wheel-
and four-wheel-drive, will be used in each group. There will be two automobiles in
each group, for a total of eight automobiles used. Using a two-way analysis of
variance, the researcher will perform the following steps.
Solution:
Step 1: State the hypotheses. The hypotheses for the interaction are these:
H𝑜𝑜: There is no interaction effect between type of gasoline used and type
of automobile a person drives on gasoline consumption.
H1 : There is an interaction effect between type of gasoline used and type
of automobile a person drives on gasoline consumption.
Step 3: Complete the ANOVA summary table to get the test values.
\[ C_x = \frac{(\sum x)^2}{\text{No. of observations}} = \frac{(26.7 + 25.2 + 28.6 + 29.3 + 32.3 + 32.8 + 26.1 + 24.2)^2}{8} = \frac{(225.2)^2}{8} = 6339.38 \]
SUM OF SQUARES (SS)
For Factor A
\[ SS_A = \frac{\sum (\sum X_A)^2}{\text{No. of observations in each level of A}} - C_x = \frac{109.8^2 + 115.4^2}{4} - 6339.38 = 6343.30 - 6339.38 = 3.92 \]
For Factor B
\[ SS_B = \frac{\sum (\sum X_B)^2}{\text{No. of observations in each level of B}} - C_x = \frac{117^2 + 108.2^2}{4} - 6339.38 = 6349.06 - 6339.38 = 9.68 \]
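For SS_W
\[ SS_W = \sum (x - \bar{x})^2 \]
where \bar{x} is the mean of the cell (group) that contains x.
For SS_Total
\[ SS_{Total} = \sum X^2 - C_x = (26.7^2 + 25.2^2 + 28.6^2 + 29.3^2 + 32.3^2 + 32.8^2 + 26.1^2 + 24.2^2) - 6339.38 = 6410.36 - 6339.38 = 70.98 \]
For Interaction
\[ SS_{A \times B} = SS_{Total} - SS_A - SS_B - SS_W = 70.98 - 3.92 - 9.68 - 3.30 = 54.08 \]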
\[ MS_{A \times B} = \frac{SS_{A \times B}}{(a - 1)(b - 1)} = \frac{54.080}{(2 - 1)(2 - 1)} = 54.080 \]
\[ MS_W = \frac{SS_W}{ab(n - 1)} = \frac{3.300}{4} = 0.825 \]
F VALUE
\[ F_A = \frac{MS_A}{MS_W} = \frac{3.920}{0.825} = 4.752 \]
\[ F_B = \frac{MS_B}{MS_W} = \frac{9.680}{0.825} = 11.733 \]
\[ F_{A \times B} = \frac{MS_{A \times B}}{MS_W} = \frac{54.080}{0.825} = 65.552 \]
Since the test values $F_B = 11.733$ and $F_{A \times B} = 65.552$ are greater than the
critical value 7.71, the null hypotheses concerning the type of automobile driven and the
interaction effect should be rejected. Since the interaction effect is statistically
significant, no decision should be made about the automobile type without further
investigation.
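The sums of squares and F values above can be reproduced with the following sketch. The assignment of the eight observations to cells is an assumption inferred from the factor totals (109.8 and 115.4 for Factor A, 117 and 108.2 for Factor B) and from SS_W = 3.300. The level labels A1, A2, B1, B2 and the function name `two_way_anova` are hypothetical.

```python
# Two observations per cell; keys are (level of Factor A, level of Factor B)
cells = {
    ("A1", "B1"): [26.7, 25.2],
    ("A1", "B2"): [28.6, 29.3],
    ("A2", "B1"): [32.3, 32.8],
    ("A2", "B2"): [26.1, 24.2],
}

def two_way_anova(cells):
    """Return the F values for Factor A, Factor B, and the interaction."""
    values = [x for obs in cells.values() for x in obs]
    correction = sum(values) ** 2 / len(values)  # correction term C_x

    def ss_factor(index):
        # Sum of (level total)^2 / (observations in level), minus C_x
        totals, counts = {}, {}
        for key, obs in cells.items():
            level = key[index]
            totals[level] = totals.get(level, 0.0) + sum(obs)
            counts[level] = counts.get(level, 0) + len(obs)
        return sum(totals[lv] ** 2 / counts[lv] for lv in totals) - correction

    ss_a, ss_b = ss_factor(0), ss_factor(1)
    ss_total = sum(x ** 2 for x in values) - correction
    # Within-cell sum of squares around each cell mean
    ss_w = sum(sum((x - sum(obs) / len(obs)) ** 2 for x in obs)
               for obs in cells.values())
    ss_ab = ss_total - ss_a - ss_b - ss_w

    a = len({key[0] for key in cells})
    b = len({key[1] for key in cells})
    n = len(values) // (a * b)  # observations per cell
    ms_w = ss_w / (a * b * (n - 1))
    return {
        "F_A": (ss_a / (a - 1)) / ms_w,
        "F_B": (ss_b / (b - 1)) / ms_w,
        "F_AxB": (ss_ab / ((a - 1) * (b - 1))) / ms_w,
    }

f_values = two_way_anova(cells)
for name, f in f_values.items():
    print(name, round(f, 3))
```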
Step 1: Click the “Data” tab and then click “Data Analysis.” If you don’t see the
Data analysis option, install the Data Analysis Toolpak.
Step 2: Click “ANOVA two factor with replication” and then click “OK.” The two-
way ANOVA window will open.
Step 3: Type an Input Range into the Input Range box. For example, if your data
is in cells A1 to A25, type “A1:A25” into the Input Range box. Ensure you include
all of your data, including headers and group names.
Step 4: Type a number in the “Rows per sample” box. Rows per sample is actually
a bit misleading. What this is asking you is how many individuals are in each group.
For example, if you have 5 individuals in each age group, you would type “5” into
the Rows per Sample box.
Step 5: Select an Output Range. For example, click the “new worksheet” radio
button to display the data in a new worksheet.
Step 6: Select an alpha level. An alpha level of 0.05 (5 percent) works for
most tests.
Step 7: Click “OK” to run the two-way ANOVA. The data will be returned in your
specified output range.
Step 8: Read the results. To figure out if you are going to reject the null hypothesis
or not, you’ll basically be looking at two factors:
Note: Each factor is not limited to two levels when running a two-way ANOVA in Excel
2013. The same tool can be used for factors with three, four, five, or more levels.
❖ When there are two independent variables, the analysis of variance is called
a two-way ANOVA.
❖ The two-way ANOVA enables the researcher to test the effects of two
independent variables and a possible interaction effect on one dependent
variable.
The results for the two-way ANOVA test on our example look like this:
As you can see in the highlighted cells in the image above, the F-value for
sample and column, i.e., factor 1 (music) and factor 2 (age), respectively, are
higher than their F-critical values. This means that the factors significantly affect
the students’ results, and thus we can reject the null hypothesis for the factors.
Also, the F-value for the interaction effect is well below its F-critical value,
so we can conclude that music and age did not have a significant combined effect
on the population.
SUMMARY
A. IDENTIFICATION
8. It is the first estimate that involves finding the variance of the means.
10 6 5
12 8 9
9 3 12
15 0 8
13 2 4
2. A reputed marketing agency in India has three different training programs for its
salesmen. The three programs are Method – A, B, and C. To assess the success
of the programs, 4 salesmen from each of the programs were sent to the field.
Their performances in terms of sales are given in the following table. Test whether
there is a significant difference among methods and among salesmen.
            METHODS
SALESMAN    A    B    C
1           4    6    2
2           6    10   6
3           5    7    4
4           7    5    4
BICOL STATE COLLEGE OF APPLIED SCIENCES AND TECHNOLOGY
Penafrancia Ave., Penafrancia, Naga City
Academic School Year 2022 - 2023
1 12 5 12 11 40
2 5 14 14 11 44
3 5 6 12 11 34
4 10 8 13 13 44
5 15 20 5 5 45
Total 47 53 56 51 207
Test the significance of the difference between the observed frequencies and the expected
frequencies at 1% level of significance.
2. As a researcher, you want to examine whether the proportion of students who drive their own
cars is the same as the proportion who drive their parents’ cars at the two schools:
BISCAST and CBSUA. Use 0.05 as the level of significance.