Hypothesis and Hypothesis Testing
Definition of Hypotheses
Hypothesis Testing
Decision Errors
Decision Rules
Conclusion
References
Summary
Hypothesis testing is a pillar of sound research findings. This write-up explains the role of
a hypothesis, the steps in hypothesis testing, and its application in the course of a research exercise.
The world we live in is full of uncertainties. Scientifically, we cannot have 100 percent
confidence in an assumption, especially in social science research. Human
reasoning is complex, which is why we have to rule out chance as a plausible
explanation for the results of a research study. To this end,
statisticians have devised a means of drawing inferences from research findings through
hypothesis testing. The write-up also covers decision errors and the rules for interpreting
hypothesis test results. Procedures for hypothesis tests in regression analysis, the t-test, and the chi-square
goodness of fit test are also explained. Statistical software such as SPSS, STATA, and JMP
relieves us of the rigorous calculations presented in this text. The manual steps are shown to demonstrate
that results from such software are not magic. For large data sets, the manual method is not
efficient, which leads to the next line of action: practical analysis of data using
statistical software.
DEFINITION OF HYPOTHESIS
A hypothesis is a tentative assertion or a formal statement of theory (testable or refutable) that
shows how two or more variables are expected to relate to one another [1]. It could also be a
formal version of a speculation that is usually based on a theory. A statistical
hypothesis, in turn, is an assumption about a population parameter. This assumption may or may not be
true. Hypotheses are more specific than theories, and multiple hypotheses may relate to
one theory. Hypotheses result from the reasoning done in the conceptual framework.
Hypotheses can take the form of a simple proposition of an expected outcome, or can assert the
existence of a relationship. For instance, a simple proposition might be that one production
system based on a particular technology is more profitable than another production system based
on another technology.
On the other hand, a hypothesis of a relationship could be that in the demand for pork in Nigeria,
the per capita consumption of pork is affected by price of pork, the price of other substitutes
(meat or fish), per capita income, religious affiliation, and ethnic background.
1. Null hypothesis: The null hypothesis, denoted by H0, is usually the hypothesis that
sample observations result purely from chance. The null hypothesis always states that the
treatment has no effect (no change, no difference). According to the null hypothesis, the
population mean after treatment is the same as it was before treatment. The α-level
establishes a criterion, or "cut-off", for making a decision about the null hypothesis. The
alpha level also determines the risk of a Type I error.
Figure 1
The locations of the critical region boundaries for three different levels of significance: α
= .05, α = .01, and α = .001 are shown in Figure 1. The critical region consists of
outcomes that are very unlikely to occur if the null hypothesis is true. That is, the critical
region is defined by sample means that are almost impossible to obtain if the treatment
has no effect. This means that these samples have a probability (p) that is less than the
alpha level.
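As a rough illustration of where such boundaries fall, the sketch below computes two-tailed critical values for the three alpha levels. It assumes the test statistic follows a standard normal (z) distribution, which is an assumption made here for illustration because the figure itself is not reproduced; the names ALPHAS and critical_z are hypothetical.

```python
from statistics import NormalDist

# The three significance levels discussed in the text.
ALPHAS = (0.05, 0.01, 0.001)

def critical_z(alpha):
    """Two-tailed critical value: alpha/2 of the probability lies in each tail.

    Assumes a standard normal test statistic (an illustrative assumption).
    """
    return NormalDist().inv_cdf(1 - alpha / 2)

boundaries = {alpha: critical_z(alpha) for alpha in ALPHAS}
for alpha, z in boundaries.items():
    print(f"alpha = {alpha}: reject H0 if z < {-z:.2f} or z > {z:.2f}")
```

A smaller alpha pushes the boundaries further into the tails, so a more extreme sample result is needed before the null hypothesis is rejected.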
HYPOTHESIS TESTING
Hypothesis testing is a technique which helps to determine whether a specific treatment has an
effect on the individuals in a population [1, 2]. It is a formal procedure used by statisticians to
reject or fail to reject statistical hypotheses. The best way to determine whether a statistical hypothesis
is true would be to examine the entire population. Since that is often impractical, researchers
typically examine a random sample from the population. If sample data are not consistent with
the statistical hypothesis, the hypothesis is rejected.
For instance, suppose we wanted to determine whether a coin was fair and balanced. A null
hypothesis might be that half the flips would result in Heads and half, in Tails. The alternative
hypothesis might be that the number of Heads and Tails would be very different.
H0: P = 0.5
Ha: P ≠ 0.5
Suppose we flipped the coin 30 times, resulting in 20 Heads and 10 Tails. Given this result, we
might be inclined to doubt the null hypothesis. Whether the evidence is strong enough to reject it,
and to conclude that the coin is probably not fair and balanced, depends on the significance
level chosen for the test.
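One way to quantify how surprising 20 Heads in 30 flips would be under H0 is an exact binomial test. The sketch below is a minimal illustration using only the Python standard library; the function name fair_coin_p_value is hypothetical, and the counting shortcut it uses is valid only for the fair-coin null hypothesis (P = 0.5).

```python
from math import comb

def fair_coin_p_value(heads, flips):
    """Two-tailed exact P-value under H0: P = 0.5.

    Under a fair coin every sequence of flips has probability 1 / 2**flips,
    so the probability of k Heads is proportional to comb(flips, k).
    The P-value sums all outcomes that are no more likely than the observed one.
    """
    observed_ways = comb(flips, heads)
    extreme_ways = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if comb(flips, k) <= observed_ways
    )
    return extreme_ways / 2 ** flips

p_value = fair_coin_p_value(20, 30)
print(f"P-value = {p_value:.4f}")
```

The resulting P-value is a little under 0.10, so this sample would lead to rejection at a 0.10 significance level but not at 0.05.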
In another example,
The general goal of a hypothesis test is to rule out chance (sampling error) as a plausible
explanation for the results from a research study.
Figure 1
From Fig. 1, it is assumed that the parameter μ is known for the population before treatment. The
purpose of the experiment is to determine whether or not the treatment has an effect on the
population mean. If the individuals in the sample are noticeably different from the individuals in
the original population, we have evidence that the treatment has an effect.
However, it is also possible that the difference between the sample and the population is simply
sampling error.
Also, from Fig. 2, the entire population receives the treatment and then a sample is selected from
the treated population. In the actual research study, a sample is selected from the original
population and the treatment is administered to the sample. From either perspective, the result is
a treated sample that represents the treated population.
Figure 2
i. The difference between the sample and the population can be explained by sampling error
(there does not appear to be a treatment effect)
ii. The difference between the sample and the population is too large to be explained by sampling
error (there does appear to be a treatment effect).
Statisticians follow a formal process to determine whether to reject a null hypothesis, based on
sample data. This process, called hypothesis testing, consists of four steps [2].
a. State the hypotheses: This involves stating the null and alternative hypotheses. The
hypotheses are stated in such a way that they are mutually exclusive. That is, if one is
true, the other must be false.
b. Formulate an analysis plan: The analysis plan describes how to use sample data to
evaluate the null hypothesis. The evaluation often centers on a single test statistic.
c. Analyze sample data: Find the value of the test statistic (mean score, proportion, t
statistic, z-score, etc.) described in the analysis plan.
d. Interpret results: Apply the decision rule described in the analysis plan. If the value of
the test statistic is unlikely, based on the null hypothesis, reject the null hypothesis.
DECISION ERRORS
Two types of errors can result from a hypothesis test [1, 2, 3].
Type I error: A Type I error occurs when the researcher rejects a null hypothesis when it
is true. The probability of committing a Type I error is called the significance level. This
probability is also called alpha, and is often denoted by α.
Type II error: A Type II error occurs when the researcher fails to reject a null hypothesis
that is false. The probability of committing a Type II error is called Beta, and is often
denoted by β. The probability of not committing a Type II error is called the Power of the
test.
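The relationship between α, β, and power can be sketched numerically. The example below assumes a two-tailed z test with a known population standard deviation; the numbers (a null mean of 80, a true mean of 84, σ = 10, n = 25) and the function name z_test_power are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability that a two-tailed z test rejects H0: mu = mu0
    when the population mean is actually mu_true."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Under the true mean, the z statistic is centered at `shift` instead of 0.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail

alpha = 0.05
# When H0 is true, the rejection rate is the Type I error risk (alpha).
type_i_risk = z_test_power(80, 80, 10, 25, alpha)
# When H0 is false, the rejection rate is the power, and beta = 1 - power.
power = z_test_power(80, 84, 10, 25, alpha)
beta = 1 - power
print(f"Type I risk = {type_i_risk:.3f}, power = {power:.3f}, beta = {beta:.3f}")
```

With these hypothetical inputs the test detects the treatment effect only about half the time, which shows that a small α does not by itself guarantee a small β.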
DECISION RULES
The analysis plan includes decision rules for rejecting the null hypothesis. In practice,
statisticians describe these decision rules in two ways - with reference to a P-value or with
reference to a region of acceptance.
The set of values outside the region of acceptance is called the region of rejection. If the
test statistic falls within the region of rejection (for instance, Fig. 4), the null hypothesis is
rejected. In such cases, we say that the hypothesis has been rejected at the α level of
significance.
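The two ways of stating a decision rule lead to the same decision. The sketch below illustrates this for a two-tailed z test; the standard normal test statistic, the function name decide, and the example z values are assumptions made for illustration only.

```python
from statistics import NormalDist

def decide(z, alpha=0.05):
    """Apply both decision rules to a z statistic from a two-tailed test.

    Returns the P-value, the region of acceptance, and the decision
    (True = reject H0) reached under each rule.
    """
    std_normal = NormalDist()
    # Rule 1: compare the P-value with the significance level.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_p_value = p_value < alpha
    # Rule 2: check whether z falls outside the region of acceptance.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    region_of_acceptance = (-z_crit, z_crit)
    reject_by_region = abs(z) > z_crit
    return p_value, region_of_acceptance, reject_by_p_value, reject_by_region

for z in (2.3, 1.5):
    p, region, by_p, by_region = decide(z)
    print(f"z = {z}: P-value = {p:.4f}, reject by P-value = {by_p}, "
          f"reject by region = {by_region}")
```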
Figure 4
Y = Β0 + Β1X
Where Β0 is a constant, Β1 is the slope (also called the regression coefficient), X is the value of
the independent variable, and Y is the value of the dependent variable.
If we find that the slope of the regression line is significantly different from zero, we will
conclude that there is a significant relationship between the independent and dependent
variables.
If there is a significant linear relationship between the independent variable X and the dependent
variable Y, the slope will not be equal to zero.
H0: Β1 = 0
Ha: Β1 ≠ 0
The null hypothesis states that the slope is equal to zero, and the alternative hypothesis states that
the slope is not equal to zero.
The analysis plan describes how to use sample data to decide whether to reject the null hypothesis. The
plan should specify the following elements.
Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.
Test method. Use a linear regression test to determine whether the slope of the
regression line differs significantly from zero.
Many statistical software packages and some graphing calculators provide the standard error of
the slope as a regression analysis output. Table 1 shows a hypothetical output for the
following regression equation: y = 42 + 15x.
In Table 1, the standard error of the slope (shaded in gray) is equal to 20. In this example, the
standard error is referred to as "SE Coeff". However, other software packages might use a
different label for the standard error. It might be "StDev", "SE", "Std Dev", or something else.
Like the standard error, the slope of the regression line will be provided
by most statistics software packages. In the hypothetical output (Table 1), the slope is equal to
35.
In calculating the degrees of freedom (DF), for simple linear regression (one independent and
one dependent variable), DF is equal to:
DF = n - 2
Test statistic. The test statistic is a t statistic (t) defined by the following equation.
t = b1 / SE
where b1 is the slope of the sample regression line, and SE is the standard error of the
slope.
The P-value is the probability, assuming the null hypothesis is true, of observing a sample
statistic at least as extreme as the test statistic.
In interpreting the result in Table 1, if the sample findings are unlikely, given the null hypothesis,
the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to
the significance level, and rejecting the null hypothesis when the P-value is less than the
significance level.
To broaden our knowledge on hypothesis test for regression, let us consider a hypothetical
problem:
Problem
The Power Holding Company of Nigeria (PHCN) surveys 101 randomly selected customers in
Ekiti State. For each survey participant, the company collects the following: annual electric bill
(in naira) and home size (in square feet). Output from a regression analysis appears in Table 2.
Is there a significant linear relationship between annual bill and home size? Use a 0.05 level of
significance.
Solution
The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis
plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:
State the hypotheses: The first step is to state the null hypothesis and an alternative
hypothesis.
H0: Β1 = 0
Ha: Β1 ≠ 0
If the relationship between home size and electric bill is significant, the slope
will not equal zero.
Formulate an analysis plan: For this analysis, the significance level is 0.05. Using
sample data, we will conduct a linear regression t-test to determine whether the slope of
the regression line differs significantly from zero.
Analyze sample data: To apply the linear regression t-test to sample data, we require the
standard error of the slope, the slope of the regression line, the degrees of freedom, the t
statistic, and the P-value of the test statistic.
We get the slope (b1) and the standard error (SE) from the regression output.
b1 = 0.55 SE = 0.24
We compute the degrees of freedom and the t statistic using the following
equations.
DF = n - 2 = 101 - 2 = 99
t = b1 / SE = 0.55 / 0.24 = 2.29
Based on the t statistic and the degrees of freedom, we determine the P-value.
The P-value is the probability that a t statistic having 99 degrees of freedom is
more extreme than 2.29. Since this is a two-tailed test, "more extreme" means greater
than 2.29 or less than -2.29. We use the t Distribution Calculator to find P(t > 2.29) =
0.0121 and P(t < -2.29) = 0.0121. Therefore, the P-value is 0.0121 + 0.0121 or 0.0242.
Interpret results. Since the P-value (0.0242) is less than the significance level (0.05), we
reject the null hypothesis and conclude that there is a significant linear relationship
between annual bill and home size.
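The arithmetic in this solution can be checked with a short script. The sketch below uses only the Python standard library and approximates the t distribution's tail probability by numerical integration (Simpson's rule); the helper names t_pdf and t_sf are hypothetical stand-ins for the t Distribution Calculator, not part of any statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t), using Simpson's rule on [0, |t|].

    `steps` must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3  # the area from 0 to |t| is removed from one half
    return upper if t >= 0 else 1 - upper

# Values taken from the worked problem above.
b1, se, n = 0.55, 0.24, 101
alpha = 0.05

df = n - 2                             # degrees of freedom for simple regression
t_stat = b1 / se                       # test statistic for H0: slope = 0
p_value = 2 * t_sf(abs(t_stat), df)    # two-tailed P-value
reject = p_value < alpha

print(f"DF = {df}, t = {t_stat:.2f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

Because the script keeps the unrounded t statistic, the last digits of the P-value may differ slightly from the hand calculation, but the decision is the same.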
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.
Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis.
The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true,
the other must be false; and vice versa.
Table 3 shows three sets of null and alternative hypotheses. Each makes a statement about the
difference d between the mean of one population μ1 and the mean of another population μ2. (In
the table, the symbol ≠ means "not equal to".)
Table 3
Set   Null hypothesis   Alternative hypothesis   Number of tails
1     μ1 - μ2 = d       μ1 - μ2 ≠ d              2
2     μ1 - μ2 ≥ d       μ1 - μ2 < d              1
3     μ1 - μ2 ≤ d       μ1 - μ2 > d              1
The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on
either side of the sampling distribution would cause a researcher to reject the null hypothesis.
The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on
only one side of the sampling distribution would cause a researcher to reject the null hypothesis.
When the null hypothesis states that there is no difference between the two population means
(i.e., d = 0), the null and alternative hypothesis are often stated in the following form.
H0: μ1 = μ2
Ha: μ1 ≠ μ2
The analysis plan describes how to use sample data to decide whether to reject the null hypothesis. It
should specify the following elements.
Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.
Test method. Use the two-sample t-test to determine whether the difference between
means found in the sample is significantly different from the hypothesized difference
between means.
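Standard error. Compute the standard error (SE) of the sampling distribution.
SE = sqrt[ (s1^2/n1) + (s2^2/n2) ]
where s1 is the standard deviation of sample 1, s2 is the standard deviation of sample 2,
n1 is the size of sample 1, and n2 is the size of sample 2.
Degrees of freedom. The degrees of freedom (DF) is:
DF = (s1^2/n1 + s2^2/n2)^2 / { [ (s1^2 / n1)^2 / (n1 - 1) ] + [ (s2^2 / n2)^2 / (n2 - 1) ] }
If DF does not compute to an integer, round it off to the nearest whole number. Some
texts suggest that the degrees of freedom can be approximated by the smaller of n1 - 1 and
n2 - 1; but the above formula gives better results.
Test statistic. The test statistic is a t statistic (t) defined by the following equation.
t = [ (x1 - x2) - d ] / SE
where x1 is the mean of sample 1, x2 is the mean of sample 2, d is the hypothesized
difference between the population means, and SE is the standard error.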
The P-value is the probability, assuming the null hypothesis is true, of observing a sample
statistic at least as extreme as the test
statistic. Since the test statistic is a t statistic, use the t Distribution Calculator to assess
the probability associated with the t statistic, having the degrees of freedom computed
above.
Note: All the above steps are manual methods; statistical software performs these
computations automatically.
Interpret Results
If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting
the null hypothesis when the P-value is less than the significance level.
Now, let us use hypothetical examples to illustrate how to conduct a hypothesis test of a
difference between mean scores. The first problem involves a two-tailed test; the second
problem, a one-tailed test.
Within a school district, students were randomly assigned to one of two Math teachers - Mrs.
Similoluwa and Mrs. Juliet. After the assignment, Mrs. Similoluwa had 30 students, and Mrs.
Juliet had 25 students.
At the end of the year, each class took the same standardized test. Mrs. Similoluwa's students had
an average test score of 78, with a standard deviation of 10; and Mrs. Juliet’s students had an
average test score of 85, with a standard deviation of 15.
Test the hypothesis that Mrs. Similoluwa and Mrs. Juliet are equally effective teachers. Use a
0.10 level of significance. (Assume that student performance is approximately normal.)
Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an
analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps
below:
State the hypotheses. The first step is to state the null hypothesis and an alternative
hypothesis.
Null hypothesis: μ1 - μ2 = 0
Alternative hypothesis: μ1 - μ2 ≠ 0
Note that these hypotheses constitute a two-tailed test. The null hypothesis will be
rejected if the difference between sample means is too big or if it is too small.
Formulate an analysis plan. For this analysis, the significance level is 0.10. Using
sample data, we will conduct a two-sample t-test of the null hypothesis.
Analyze sample data. Using sample data, we compute the standard error (SE), degrees
of freedom (DF), and the t statistic (t).
SE = sqrt[ (s1^2/n1) + (s2^2/n2) ]
SE = sqrt[ (10^2/30) + (15^2/25) ] = sqrt(3.33 + 9) = sqrt(12.33) = 3.51
DF = (s1^2/n1 + s2^2/n2)^2 / { [ (s1^2 / n1)^2 / (n1 - 1) ] + [ (s2^2 / n2)^2 / (n2 - 1) ] }
DF = (10^2/30 + 15^2/25)^2 / { [ (10^2 / 30)^2 / (29) ] + [ (15^2 / 25)^2 / (24) ] }
DF = (3.33 + 9)^2 / { [ (3.33)^2 / (29) ] + [ (9)^2 / (24) ] } = 152.03 / (0.382 + 3.375) =
152.03/3.757 = 40.47
t = [ (x1 - x2) - d ] / SE = [ (78 - 85) - 0 ] / 3.51 = -7/3.51 = -1.99
where s1 is the standard deviation of sample 1, s2 is the standard deviation of sample 2,
n1 is the size of sample 1, n2 is the size of sample 2, x1 is the mean of sample 1, x2 is the
mean of sample 2, d is the hypothesized difference between the population means, and
SE is the standard error.
Since we have a two-tailed test, the P-value is the probability that a t statistic having 40
degrees of freedom is more extreme than -1.99; that is, less than -1.99 or greater than
1.99.
We use the t Distribution Calculator to find P (t < -1.99) = 0.027, and P (t > 1.99) =
0.027. Thus, the P-value = 0.027 + 0.027 = 0.054.
Interpret results. Since the P-value (0.054) is less than the significance level (0.10), we
reject the null hypothesis that the two teachers are equally effective.
The Acme Company has developed a new battery. The engineer in charge claims that the new
battery will operate continuously for at least 7 minutes longer than the old battery.
To test the claim, the company selects a simple random sample of 100 new batteries and 100 old
batteries. The old batteries run continuously for 190 minutes with a standard deviation of 20
minutes; the new batteries, 200 minutes with a standard deviation of 40 minutes.
Test the engineer's claim that the new batteries run at least 7 minutes longer than the old. Use a
0.05 level of significance. (Assume that there are no outliers in either sample.)
Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an
analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps
below:
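State the hypotheses. The first step is to state the null hypothesis and an alternative
hypothesis.
Null hypothesis: μ1 - μ2 ≥ 7
Alternative hypothesis: μ1 - μ2 < 7
Here μ1 is the mean running time of the new batteries and μ2 is that of the old batteries.
Note that these hypotheses constitute a one-tailed test. The null hypothesis will be
rejected if the difference between sample means is too small.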
Formulate an analysis plan. For this analysis, the significance level is 0.05. Using
sample data, we will conduct a two-sample t-test of the null hypothesis.
Analyze sample data. Using sample data, we compute the standard error (SE), degrees
of freedom (DF), and the t statistic (t).
SE = sqrt[ (s1^2/n1) + (s2^2/n2) ]
SE = sqrt[ (40^2/100) + (20^2/100) ] = sqrt(16 + 4) = 4.472
DF = (s1^2/n1 + s2^2/n2)^2 / { [ (s1^2 / n1)^2 / (n1 - 1) ] + [ (s2^2 / n2)^2 / (n2 - 1) ] }
DF = (40^2/100 + 20^2/100)^2 / { [ (40^2 / 100)^2 / (99) ] + [ (20^2 / 100)^2 / (99) ] }
DF = (20)^2 / { [ (16)^2 / (99) ] + [ (4)^2 / (99) ] } = 400 / (2.586 + 0.162) = 145.56
t = [ (x1 - x2) - d ] / SE = [(200 - 190) - 7] / 4.472 = 3/4.472 = 0.67
Here is the logic of the analysis: Given the alternative hypothesis (μ1 - μ2 < 7), we want to
know whether the observed difference in sample means is small enough (i.e., sufficiently
less than 7) to cause us to reject the null hypothesis.
The observed difference in sample means (10) produced a t statistic of 0.67. We use the t
Distribution Calculator to find P(t < 0.67) = 0.75.
This means we would expect to find an observed difference in sample means of 10 or less
in 75% of our samples, if the true difference were actually 7. Therefore, the P-value in
this analysis is 0.75.
Interpret results. Since the P-value (0.75) is greater than the significance level (0.05),
we cannot reject the null hypothesis.
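Both two-sample problems above can be checked with a short script. The sketch below uses only the Python standard library; the helpers t_pdf, t_sf, and two_sample_t are hypothetical names, and the tail probability is approximated by numerical integration (Simpson's rule) in place of the t Distribution Calculator.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution (df may be fractional)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t), using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3
    return upper if t >= 0 else 1 - upper

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, d=0.0):
    """Standard error, degrees of freedom, and t statistic, following
    the SE, DF, and t formulas used in the worked solutions."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t = ((mean1 - mean2) - d) / se
    return se, df, t

# Problem 1 (two-tailed): sample 1 = Mrs. Similoluwa, sample 2 = Mrs. Juliet.
se1, df1, t1 = two_sample_t(78, 10, 30, 85, 15, 25)
p1 = 2 * t_sf(abs(t1), df1)

# Problem 2 (one-tailed): sample 1 = new batteries, sample 2 = old batteries.
se2, df2, t2 = two_sample_t(200, 40, 100, 190, 20, 100, d=7)
p2 = 1 - t_sf(t2, df2)  # lower tail, P(T < t), matching Ha: mu1 - mu2 < 7

print(f"Problem 1: SE = {se1:.2f}, DF = {df1:.2f}, t = {t1:.2f}, P = {p1:.3f}")
print(f"Problem 2: SE = {se2:.3f}, DF = {df2:.2f}, t = {t2:.2f}, P = {p2:.2f}")
```

The script keeps unrounded intermediate values and the fractional DF, so its results may differ from the hand calculations in the last digit or two; the decisions at the stated significance levels are unchanged.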
The chi-square goodness of fit test is applied when you have one categorical variable from a single population. It is used to
determine whether sample data are consistent with a hypothesized distribution. For example,
suppose a company printed baseball cards. It claimed that 30% of its cards were rookies; 60%,
veterans; and 10%, All-Stars. We could gather a random sample of baseball cards and use a chi-
square goodness of fit test to see whether our sample distribution differed significantly from the
distribution claimed by the company.
The chi-square goodness of fit test is appropriate when the following conditions are met:
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.
Every hypothesis test requires the analyst to state a null hypothesis (H0) and an alternative
hypothesis (Ha) as usual. The hypotheses are stated in such a way that they are mutually
exclusive. That is, if one is true, the other must be false; and vice versa.
For a chi-square goodness of fit test, the hypotheses take the following form.
Typically, the null hypothesis (H0) specifies the proportion of observations at each level of the
categorical variable. The alternative hypothesis (Ha) is that at least one of the specified
proportions is not true.
The analysis plan describes how to use sample data to decide whether to reject the null hypothesis. The
plan should specify the following elements.
Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or
0.10; but any value between 0 and 1 can be used.
Test method. Use the chi-square goodness of fit test to determine whether observed
sample frequencies differ significantly from expected frequencies specified in the null
hypothesis.
Using sample data, find the degrees of freedom, expected frequency counts, test statistic, and the
P-value associated with the test statistic.
The degrees of freedom (DF) is equal to the number of levels (k) of the categorical
variable minus 1: DF = k - 1 .
The expected frequency counts at each level of the categorical variable are equal to the
sample size times the hypothesized proportion from the null hypothesis.
Ei = n * pi
where Ei is the expected frequency count for the ith level of the categorical variable, n is
the total sample size, and pi is the hypothesized proportion of observations in level i.
The test statistic is a chi-square random variable (Χ^2) defined by the following equation.
Χ^2 = Σ [ (Oi - Ei)^2 / Ei ]
where Oi is the observed frequency count for the ith level of the categorical variable, and
Ei is the expected frequency count for the ith level of the categorical variable.
The P-value is the probability, assuming the null hypothesis is true, of observing a sample
statistic at least as extreme as the test
statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution
Calculator to assess the probability associated with the test statistic.
Interpret Results
If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting
the null hypothesis when the P-value is less than the significance level.
Problem
Acme Toy Company prints baseball cards. The company claims that 30% of the cards are
rookies, 60% veterans, and 10% are All-Stars.
Suppose a random sample of 100 cards has 50 rookies, 45 veterans, and 5 All-Stars. Is this
consistent with Acme's claim? Use a 0.05 level of significance.
Solution
The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis
plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:
State the hypotheses. The first step is to state the null hypothesis and an alternative
hypothesis.
Null hypothesis: The proportion of rookies, veterans, and All-Stars is 30%, 60%
and 10%, respectively.
Alternative hypothesis: At least one of the proportions in the null hypothesis is
false.
Formulate an analysis plan. For this analysis, the significance level is 0.05. Using
sample data, we will conduct a chi-square goodness of fit test of the null hypothesis.
Analyze sample data. Applying the chi-square goodness of fit test to sample data, we
compute the degrees of freedom, the expected frequency counts, and the chi-square test
statistic. Based on the chi-square statistic and the degrees of freedom, we determine
the P-value.
DF = k - 1 = 3 - 1 = 2
Ei = n * pi
E1 = 100 * 0.30 = 30
E2 = 100 * 0.60 = 60
E3 = 100 * 0.10 = 10
Χ^2 = Σ [ (Oi - Ei)^2 / Ei ]
Χ^2 = [ (50 - 30)^2 / 30 ] + [ (45 - 60)^2 / 60 ] + [ (5 - 10)^2 / 10 ]
Χ^2 = (400 / 30) + (225 / 60) + (25 / 10) = 13.33 + 3.75 + 2.50 = 19.58
where DF is the degrees of freedom, k is the number of levels of the categorical variable,
n is the number of observations in the sample, Ei is the expected frequency count for level
i, Oi is the observed frequency count for level i, and Χ^2 is the chi-square test statistic.
The P-value is the probability that a chi-square statistic having 2 degrees of freedom is
more extreme than 19.58.
We use the Chi-Square Distribution Calculator to find P(Χ^2 > 19.58) = 0.0001.
Interpret results. Since the P-value (0.0001) is less than the significance level (0.05), we
reject the null hypothesis and conclude that the sample is not consistent with Acme's claim.
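The calculation above can be reproduced with a short script. The sketch below uses only the Python standard library; the function name chi2_sf_even_df is hypothetical, and it relies on a closed-form tail probability that holds only when the degrees of freedom are even, which covers this problem (DF = 2) but is not a general replacement for the Chi-Square Distribution Calculator.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square > x) for an even number of degrees of freedom.

    For even df the upper tail equals exp(-x/2) * sum_{j < df/2} (x/2)**j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only supports positive even df")
    half = x / 2
    term = 1.0   # the j = 0 term of the sum
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

# Values taken from the worked problem above.
observed = [50, 45, 5]           # rookies, veterans, All-Stars
claimed = [0.30, 0.60, 0.10]     # proportions stated in the null hypothesis
alpha = 0.05

n = sum(observed)
expected = [n * p for p in claimed]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf_even_df(chi2, df)
reject = p_value < alpha

print(f"DF = {df}, chi-square = {chi2:.2f}, reject H0: {reject}")
```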
Conclusion
References
[1] Johansen, S. (1991). Estimation and Hypothesis Testing of Cointegration Vectors in
Gaussian Vector Autoregressive Models. Econometrica, Vol. 59, No. 6, pp. 1551-1580.
[3] Massey, A. and Miller, S. (undated). Tests of Hypotheses Using Statistics. Mathematics
Department, Brown University, Providence, RI 02912.