UNIT 12 TESTING OF HYPOTHESIS


Structure
12.1 Introduction
Objectives

12.2 Null Hypothesis


12.3 Hypothesis Testing
12.3.1 Errors in Testing of Hypothesis
12.3.2 Steps Involved in Hypothesis Testing

12.4 Rejection Regions


12.5 Student’s t-Distribution
12.5.1 Two Tailed and One Tailed Tests
12.5.2 Hypothesis Tests about the Difference between Two Population Means
12.5.3 Test for Difference between Proportions

12.6 Chi-Square Test


12.7 Summary
12.8 Key Words
12.9 Answers to SAQs

12.1 INTRODUCTION
We often strongly believe some result to be true, yet after taking a sample we may find that the sample data do not wholly support it. The difference may be due to (i) the original belief being wrong, or (ii) the sample being slightly one-sided.
Tests are therefore needed to distinguish between the two possibilities. These tests tell us about the likely possibilities and reveal whether or not the difference can be due to chance alone. If the difference is not due to chance, it is significant, and such tests are therefore called tests of significance. The whole procedure is known as Testing of Hypothesis.
Setting up and testing hypotheses is an essential part of statistical inference. In order to
formulate such a test, usually some theory has been put forward, either because it is
believed to be true or because it is to be used as a basis for argument, but has not been
proved. For example, the hypothesis may be the claim that a new drug is better than the
current drug for treatment of a disease, diagnosed through a set of symptoms.
In each problem considered, the question of interest is simplified into two competing
claims/hypotheses between which we have a choice; the null hypothesis, denoted by H0,
against the alternative hypothesis, denoted by H1. These two competing claims /
hypotheses are not however treated on an equal basis; special consideration is given to
the null hypothesis. We have two common situations :
(i) The experiment has been carried out in an attempt to disprove or reject a
particular hypothesis, the null hypothesis; thus we give that one priority so it
cannot be rejected unless the evidence against it is sufficiently strong. For
example, null hypothesis H0: there is no difference in taste between coke
and diet coke, against the alternate hypothesis H1: there is a difference in the
tastes.
(ii) If one of the two hypotheses is ‘simpler’, we give it priority so that a more
‘complicated’ theory is not adopted unless there is sufficient evidence
against the simpler one. For example, it is ‘simpler’ to claim that there is no
difference in flavour between coke and diet coke than it is to say that there is
a difference.
The hypotheses are often statements about population parameters such as the expected value and variance. For example, H0 might be the statement that the expected height of ten-year-old boys in the Indian population is not different from that of ten-year-old girls. A hypothesis might also be a statement about the distributional form of a characteristic of interest; for example, that the height of ten-year-old boys is normally distributed within the Indian population.
Objectives
After studying this unit, you should be able to
• understand the basic concepts of Testing of Hypothesis,
• explain Null Hypothesis,
• differentiate Type-I and Type-II errors,
• apply student’s t-distribution,
• appreciate Chi-square test, and
• understand the use of common statistical tests.

12.2 NULL HYPOTHESIS


The null hypothesis, H0, represents a theory that has been put forward, either because it is
believed to be true or because it is to be used as a basis for argument, but has not been
proved. For example, in respect of a clinical trial of a new drug, the null hypothesis might
be that the new drug is no better, on average, than the current drug. We would write H0 :
there is no difference between the two drugs on average.
We give special consideration to the null hypothesis. This is due to the fact that the null
hypothesis relates to the statement being tested, whereas the alternative hypothesis relates
to the statement to be accepted if / when the null hypothesis is rejected.
The alternative hypothesis, H1, is a statement of what a statistical hypothesis test is set up
to establish. For example, in a clinical trial of a new drug, the alternative hypothesis
might be that the new drug has a different effect, on average, compared to that of the
current drug. We would write H1: the two drugs have different effects, on average. The
alternative hypothesis might also be that the new drug is better, on average, than the
current drug. In this case, we would write H1: the new drug is better than the current
drug, on average.
The final conclusion once the test has been carried out is always given in terms of the
null hypothesis. We either ‘reject H0 in favour of H1’ or ‘do not reject H0’; we never
conclude ‘reject H1’, or even ‘accept H1’.
If we conclude ‘do not reject H0’, this does not necessarily mean that the null hypothesis
is true. It only suggests that there is not sufficient evidence against H0 in favour of H1;
rejecting the null hypothesis then suggests that the alternative hypothesis may be true.

12.3 HYPOTHESIS TESTING


Hypothesis testing is a form of statistical inference that uses data from a sample to draw
conclusions about a population parameter or a population probability distribution. First, a
tentative assumption is made about the parameter or distribution. This assumption is
called the null hypothesis and is denoted by H0. An alternative hypothesis (denoted by
H1), which is the opposite of what is stated in the null hypothesis, is then defined. The
hypothesis-testing procedure involves using sample data to determine whether or not H0
can be rejected. If H0 is rejected, the statistical conclusion is that the alternative
hypothesis H1 is true.
A hypothesis is a statement supposed to be true till it is proved false. It may be based on
previous experience or may be derived theoretically. First, the statistician or investigator
forms a research hypothesis that is to be tested. Then she/he derives a statement which is
opposite to the research hypothesis (denoted as H0). The approach here is to set up an
assumption that there is no contradiction between the believed result and the sample
result, and that the difference can therefore be ascribed solely to chance. Such a
hypothesis is called a null hypothesis (H0). It is the null hypothesis that is actually
tested, not the research hypothesis. The object of the test is to see whether the null
hypothesis should be rejected or accepted.
If the null hypothesis is rejected, that is taken as evidence in favour of the research
hypothesis, which is called the alternative hypothesis (denoted by H1). In usual practice
we do not say that the research hypothesis has been “proved”, only that it has been
supported.
For example, assume that a radio station selects the music it plays based on the
assumption that the average age of its listening audience is 30 years. To determine
whether this assumption is valid, a hypothesis test could be conducted with the null
hypothesis as H0 : µ = 30 and the alternative hypothesis as H1: µ ≠ 30. Based on a sample
of individuals from the listening audience, the sample mean age, x , can be computed and
used to determine whether there is sufficient statistical evidence to reject H0.
Conceptually, a value of the sample mean that is “close” to 30 is consistent with the null
hypothesis, while a value of the sample mean that is “not close” to 30 provides support
for the alternative hypothesis. What is considered “close” and “not close” is determined
by using the sampling distribution of x .
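This use of the sampling distribution of the sample mean can be sketched numerically. The sample figures below (sample mean, standard deviation, and sample size) are assumed for illustration and are not taken from the text:

```python
import math

# Assumed figures (illustrative only): a sample of n listeners with a
# known population standard deviation sigma of listener ages.
mu0 = 30.0     # hypothesised mean age under H0
x_bar = 31.5   # assumed sample mean age
sigma = 5.0    # assumed population standard deviation
n = 100        # assumed sample size

# How many standard errors is x_bar away from mu0?
z = (x_bar - mu0) / (sigma / math.sqrt(n))
print(round(z, 2))  # 3.0 -- "not close" to 30: evidence against H0
```

A value of z near 0 would mean the sample mean is "close" to 30 relative to sampling variability; a large value such as 3 standard errors would not be.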
Ideally, the hypothesis-testing procedure leads to the acceptance of H0 when H0 is true
and the rejection of H0 when H0 is false. Unfortunately, since hypothesis tests are based
on sample information, the possibility of errors must be considered. A Type-I error
corresponds to rejecting H0 when H0 is actually true, and a Type-II error corresponds to
accepting H0 when H0 is false. The probability of making a Type-I error is denoted by α,
and the probability of making a Type-II error is denoted by β.
In using the hypothesis-testing procedure to determine if the null hypothesis should be
rejected, the person conducting the hypothesis test specifies the maximum allowable
probability of making a Type-I error, called the level of significance for the test.
Common choices for the level of significance are α = 0.05 and α = 0.01. Although most
applications of hypothesis testing control the probability of making a Type-I error, they
do not always control the probability of making a Type-II error. A graph known as an
operating-characteristic curve can be constructed to show how changes in the sample size
affect the probability of making a Type-II error.
A concept known as the p-value provides a convenient basis for drawing conclusions in
hypothesis-testing applications. The p-value is a measure of how likely the sample results
are, assuming the null hypothesis is true; the smaller the p-value, the stronger the
evidence against the null hypothesis. If the p-value is less than α, the null hypothesis can
be rejected; otherwise, the null hypothesis cannot be rejected. The p-value is often called
the observed level of significance for the test.
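For a two-tailed z test, the p-value needs nothing more than the standard normal CDF. A minimal sketch (the observed test statistic below is an assumed value):

```python
import math

def normal_cdf(x):
    # Standard normal CDF via the error function (no external libraries).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sided_p_value(z):
    # Probability, under H0, of a test statistic at least as extreme as |z|.
    return 2.0 * (1.0 - normal_cdf(abs(z)))

z_observed = 2.1                  # assumed observed test statistic
p = two_sided_p_value(z_observed)
print(round(p, 4))                # 0.0357
print(p < 0.05)                   # True: reject H0 at the 5% level
```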
A hypothesis test can be performed on parameters of one or more populations as well as
in a variety of other situations. In each instance, the process begins with the formulation
of null and alternative hypotheses about the population. In addition to the population
mean, hypothesis-testing procedures are available for population parameters such as
proportions, variances, standard deviations, and medians.
Hypothesis tests are also conducted in regression and correlation analysis to determine if
the regression relationship and the correlation coefficient are statistically significant. A
goodness-of-fit test refers to a hypothesis test in which the null hypothesis is that the
population has a specific probability distribution, such as a normal probability
distribution. Nonparametric statistical methods also involve a variety of hypothesis-
testing procedures.
For example, if it is assumed that the mean weight of the students of a college is 55 kg,
then the null hypothesis will be that the mean of the population is 55 kg, i.e.
H0 : µ = 55 kg (null hypothesis). The alternative hypothesis may be stated as
(i) H1 : µ ≠ 55 kg, (ii) H1 : µ > 55 kg, or (iii) H1 : µ < 55 kg.
Fixing the limits depends entirely upon the accuracy desired. Generally the limits are
fixed such that the probability that the difference will exceed them is 0.05 or 0.01.
These levels are known as the ‘levels of significance’ and are expressed as the 5% or
1% levels of significance. Rejection of the null hypothesis does not mean that the
hypothesis is disproved. It simply means that the sample value does not support the
hypothesis. Likewise, acceptance does not mean that the hypothesis is proved; it simply
means that it is supported.
Confidence Limits
The limits (or range) within which the hypothesis should lie with specified
probabilities are called the confidence limits or fiducial limits. It is customary to
take these limits as 5% or 1% levels of significance. If sample value lies between
the confidence limits, the hypothesis is accepted; if it does not, the hypothesis is
rejected at the specified level of significance.
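These limits can be computed directly. The sketch below reuses the sample summary that appears later in Example 12.3 of this unit (sample mean 55 kg, σ = 3, n = 400) with the usual 5%-level multiplier of 1.96:

```python
import math

# Sample summary as in Example 12.3 of this unit.
x_bar, sigma, n = 55.0, 3.0, 400

se = sigma / math.sqrt(n)          # standard error = 0.15
lower = x_bar - 1.96 * se          # confidence limits at the 5% level
upper = x_bar + 1.96 * se
print(round(lower, 3), round(upper, 3))  # 54.706 55.294
```

A hypothesised mean of 58 kg lies well outside these limits, so it would be rejected at the 5% level of significance.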
12.3.1 Errors in Testing of Hypothesis
In testing any hypothesis, we get only two results : either we accept or we reject it. We
do not know whether it is true or false. Hence four possibilities may arise.
(i) The hypothesis is true but test rejects it (Type-I error).
(ii) The hypothesis is false but test accepts it (Type-II error).
(iii) The hypothesis is true and test accepts it (correct decision).
(iv) The hypothesis is false and test rejects it (correct decision).
Type-I Error
In a hypothesis test, a Type-I error occurs when the null hypothesis is rejected
when it is in fact true; that is, H0 is wrongly rejected. For example, in a clinical
trial of a new drug, the null hypothesis might be that the new drug is no better, on
average, than the current drug; that is H0 : there is no difference between the two
drugs on average. A Type-I error would occur if we concluded that the two drugs
produced different effects when in fact there was no difference between them.
Table 12.1 gives a summary of possible results of any hypothesis test :
Table 12.1 : Type-I Error

                           Decision
               Reject H0           Don’t Reject H0
Truth    H0    Type-I Error        Right Decision
         H1    Right Decision      Type-II Error

A Type-I error is often considered to be more serious, and therefore more
important to avoid, than a Type-II error. The hypothesis test procedure is therefore
adjusted so that there is a guaranteed ‘low’ probability of rejecting the null
hypothesis wrongly; this probability is never 0. This probability of a Type-I error
can be precisely computed as,
P (Type-I error) = significance level = α
The exact probability of a Type-II error is generally unknown.
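The statement P (Type-I error) = α can be illustrated by simulation: if H0 is really true and we run a 5%-level test over and over, about 5% of the runs wrongly reject it. A minimal sketch (all figures below are assumed):

```python
import math
import random

random.seed(1)                       # reproducible illustration
mu0, sigma, n = 50.0, 10.0, 25       # H0 is true: the real mean IS mu0
trials, rejections = 20000, 0

for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    if abs(z) > 1.96:                # two-sided test at the 5% level
        rejections += 1

print(rejections / trials)           # close to alpha = 0.05
```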
If we do not reject the null hypothesis, it may still be false (a Type-II error), as the
sample may not be big enough to identify the falseness of the null hypothesis
(especially if the truth is very close to the hypothesis).
For any given set of data, Type-I and Type-II errors are inversely related; the
smaller the risk of one, the higher the risk of the other.
A Type-I error can also be referred to as an error of the first kind.
Type-II Error
In a hypothesis test, a Type-II error occurs when the null hypothesis, H0, is not
rejected when it is in fact false. For example, in a clinical trial of a new drug, the
null hypothesis might be that the new drug is no better, on average, than the
current drug; that is H0 : there is no difference between the two drugs on average.
A Type-II error would occur if it was concluded that the two drugs produced the
same effect, that is, there is no difference between the two drugs on average, when
in fact they produced different ones.
A Type-II error is frequently due to sample sizes being too small.
The probability of a Type-II error is symbolised by β and written :
P (Type-II error) = β (but is generally unknown).
A Type-II error can also be referred to as an error of the second kind.
Hypothesis testing refers to the process of using statistical analysis to determine if the
observed differences between two or more samples are due to random chance (as stated
in the null hypothesis) or to true differences in the samples (as stated in the alternate
hypothesis). A null hypothesis (H0) is a stated assumption that there is no difference in
parameters (mean, variance, DPMO) for two or more populations. The alternate
hypothesis (H1) is a statement that the observed difference or relationship between two
populations is real and not the result of chance or an error in sampling. Hypothesis
testing is the process of using a variety of statistical tools to analyze data and, ultimately,
to fail to reject or reject the null hypothesis. From a practical point of view, finding
statistical evidence that the null hypothesis is false allows you to reject the null
hypothesis and accept the alternate hypothesis.
Because of the difficulty involved in observing every individual in a population for
research purposes, researchers normally collect data from a sample and then use the
sample data to help answer questions about the population.
A hypothesis test is a statistical method that uses sample data to evaluate a hypothesis
about a population parameter.
Hypothesis testing is standard and follows a specific order :
(i) first, state a hypothesis about a population (a population parameter, e.g. mean
µ),
(ii) obtain a random sample from the population and find its mean x̄, and
(iii) compare the sample data with the hypothesis on a standard scale (the z or
standard normal distribution).
A hypothesis test is typically used in the context of a research study, i.e. a researcher
completes one round of a field investigation and then uses a hypothesis test to evaluate
the results. Depending on the type of research and the type of data, the details will differ
from one research situation to another.
Basic Experimental Situations for Hypothesis Testing
(i) It is assumed that the mean, µ, is known before treatment. The purpose of
the experiment is to determine whether or not the treatment has an effect on
the population mean, e.g. a researcher will like to find out whether increased
stimulation of infants has an effect on their weight. It is known from
national statistics that the mean weight, µ, of 2-year-old children is
13 kg. The distribution is normal with standard deviation σ = 2 kg.
(ii) To test the truth of the claim : a researcher may take 16 new born infants and
give their parents detailed instructions for giving these infants increased
handling and stimulations. At age 2, each of the 16 children will be weighed
and the mean weight for the sample will be computed.
(iii) The researcher may conclude that the increased handling and stimulation
had an effect on the weight of the children if there is a substantial difference
in the weights from the population mean.
12.3.2 Steps Involved in Hypothesis Testing
(i) State the null hypothesis and the alternative hypothesis. (Note : The goal of
inferential statistics is to make general statements about the population by
using sample data. Therefore in testing hypothesis, we make our predictions
about the population parameters).
(ii) Set the criteria for a decision.
(iii) Level of significance or alpha level for the hypothesis test : This is
represented by α which is the probability used to define the very unlikely
sample outcomes, if the null hypothesis is true. In hypothesis testing, the set
of potential samples is divided into those that are likely to be obtained and
those that are very unlikely if the hypothesis is true.

Figure 12.1 : Hypothesis Testing

(iv) Critical Region : The region composed of extreme samples values that are
very unlikely outcomes if the null hypothesis is true. The boundaries for the
critical region are determined by the alpha level. If sample data fall in the
critical region, the null hypothesis is rejected. The α-level you set affects the
outcome of the research.
(v) Collect data and compute the sample statistic using the formula

        z = (x̄ − µ) / σx̄

    where, x̄ = sample mean,
    µ = hypothesised population mean, and
    σx̄ = standard error between x̄ and µ, given by σx̄ = σ/√n.
(vi) Make a decision and write down the decision rule.
Z-Score Statistic
Z-score is called a test statistic. The purpose of a test statistic is to determine
whether the result of a research study (the obtained difference) is more than what
would be expected by chance alone.

        z = Obtained difference / Difference due to chance
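Steps (i)-(vi) above reduce to one short computation. The sketch below uses the infant-weight setting from earlier in this section (µ = 13 kg, σ = 2 kg, n = 16); the sample mean weight is an assumed figure:

```python
import math

def z_statistic(x_bar, mu, sigma, n):
    # Obtained difference divided by the difference expected by chance
    # (i.e. the standard error of the mean).
    return (x_bar - mu) / (sigma / math.sqrt(n))

# mu = 13 kg, sigma = 2 kg, n = 16 come from the text;
# x_bar = 14.2 kg is an assumed sample result.
z = z_statistic(x_bar=14.2, mu=13.0, sigma=2.0, n=16)
print(round(z, 2))  # 2.4
```

Since 2.4 exceeds 1.96, such a sample would fall in the 5%-level critical region, suggesting the treatment had an effect.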
Now suppose a manufacturer produces articles of good quality. A purchaser selects a
sample at random, and it so happens that the sample contains many defective articles,
leading the purchaser to reject the whole lot. The manufacturer then suffers a loss even
though he has produced articles of good quality. This Type-I error is therefore called the
“producer’s risk”.
On the other hand, if we accept the entire lot on the basis of a sample and the lot is not
really good, the consumer suffers a loss. This Type-II error is therefore called the
“consumer’s risk”.
In practical situations, other aspects are also considered while accepting or rejecting a
lot. The risks to both producer and consumer are compared, the acceptable Type-I and
Type-II error probabilities are fixed, and a decision is reached.

Figure 12.2 : Types of Error

In summary, we recommend the following procedure for formulating hypotheses and
stating conclusions.
Formulating Hypotheses and Stating Conclusions
(i) State the hypothesis as the alternative hypothesis H1.
(ii) The null hypothesis, H0, will be the opposite of H1 and will contain an
equality sign.
(iii) If the sample evidence supports the alternative hypothesis, the null
hypothesis will be rejected and the probability of having made an incorrect
decision (when in fact H0 is true) is α, a quantity that can be manipulated to
be as small as the researcher wishes.
(iv) If the sample does not provide sufficient evidence to support the alternative
hypothesis, then conclude that the null hypothesis cannot be rejected on the
basis of your sample. In this situation, you may wish to collect more
information about the phenomenon under study.
Example 12.1
The logic used in hypothesis testing has often been likened to that used in the
courtroom in which a defendant is on trial for committing a crime.
(i) Formulate appropriate null and alternative hypotheses for judging the guilt
or innocence of the defendant.
(ii) Interpret the Type-I and Type-II errors in this context.
(iii) If you were the defendant, would you want α to be small or large? Explain.
Solution
(i) Under a judicial system, a defendant is “innocent until proven guilty”. That
is, the burden of proof is not on the defendant to prove his or her innocence;
rather, the court must collect sufficient evidence to support the claim that the
defendant is guilty. Thus, the null and alternative hypotheses would be
H0 : Defendant is innocent
H1 : Defendant is guilty
(ii) The four possible outcomes are shown in Table 12.2. A Type-I error would
be to conclude that the defendant is guilty, when in fact he or she is
innocent; a Type-II error would be to conclude that the defendant is
innocent, when in fact he or she is guilty.
Table 12.2 : Conclusions and Consequences

                                       Decision of Court
                                Defendant is        Defendant is
                                Innocent            Guilty
True State    Defendant is
of Nature     innocent          Correct decision    Type-I error
              Defendant is
              guilty            Type-II error       Correct decision

(iii) Most people would probably agree that the Type-I error in this situation is
by far the more serious. Thus, we would want α, the probability of
committing a Type-I error, to be very small indeed.
A convention that is generally observed when formulating the null and alternative
hypotheses of any statistical test is to state H0 so that the possible error of
incorrectly rejecting H0 (Type-I error) is considered more serious than the possible
error of incorrectly failing to reject H0 (Type-II error). In many cases, the decision
as to which type of error is more serious is admittedly not as clear-cut as that of
Example 12.1; experience will help to minimize this potential difficulty.
Types of Errors for a Hypothesis Test
The goal of any hypothesis testing is to make a decision. In particular, we will
decide whether to reject the null hypothesis, H0, in favour of the alternative
hypothesis, H1. Although we would like always to be able to make a correct
decision, we must remember that the decision will be based on sample information,
and thus we are subject to make one of two types of error, as defined in Table 12.2.
The null hypothesis can be either true or false. Further, we will make a conclusion either
to reject or not to reject the null hypothesis. Thus, there are four possible situations that
may arise in testing a hypothesis as shown in Table 12.3.
Table 12.3 : Conclusions and Consequences for Testing a Hypothesis

                                           Conclusions
                                  Do Not Reject          Reject
“State of                         Null Hypothesis        Null Hypothesis
Nature”      Null Hypothesis
             True                 Correct conclusion     Type-I error
             Alternative
             Hypothesis True      Type-II error          Correct conclusion

The kind of error that can be made depends on the actual state of affairs (which, of
course, is unknown to the investigator). Note that we risk a Type-I error only if the null
hypothesis is rejected, and we risk a Type-II error only if the null hypothesis is not
rejected. Thus, we may make no error, or we may make either a Type-I error (with
probability α) or a Type-II error (with probability β), but not both. We do not know
which type of error corresponds to actuality, and so we would like to keep the
probabilities of both types of error small. There is an intuitively appealing relationship
between the probabilities of the two types of error : as α increases, β decreases;
similarly, as β increases, α decreases. The only way to reduce α and β simultaneously is
to increase the amount of information available in the sample, i.e. to increase the
sample size.
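The claim that enlarging the sample is the only way to shrink α and β together can be sketched for a one-sided test held at α = 0.05 (critical value 1.645); the true mean assumed under H1 and the other figures are illustrative:

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def type_ii_error(mu0, mu_true, sigma, n, z_alpha=1.645):
    # Beta for the one-sided test of H0: mu = mu0 against H1: mu > mu0
    # at alpha = 0.05. H0 survives when x_bar < mu0 + z_alpha * se;
    # beta is the probability of that event when the true mean is mu_true.
    se = sigma / math.sqrt(n)
    cutoff = mu0 + z_alpha * se
    return normal_cdf((cutoff - mu_true) / se)

# Assumed setting: mu0 = 40, true mean 42, sigma = 8.
for n in (10, 40, 160):
    print(n, round(type_ii_error(40.0, 42.0, 8.0, n), 3))
```

Here α is held fixed at 0.05 throughout, yet β falls steadily as n grows.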
You may note that we have carefully avoided stating a decision in terms of “accept the
null hypothesis H0”. Instead, if the sample does not provide enough evidence to support
the alternative hypothesis H1 we prefer a decision “not to reject H0”. This is because, if
we were to “accept H0”, the reliability of the conclusion would be measured by β, the
probability of Type-II error. However, the value of β is not constant, but depends on the
specific alternative value of the parameter and is difficult to compute in most testing
situations.

12.4 REJECTION REGIONS


In this section, we will describe how to arrive at a decision in a hypothesis-testing
situation. Recall that when making any type of statistical inference (of which hypothesis
testing is a special case), we collect information by obtaining a random sample from the
populations of interest. In all our applications, we will assume that the appropriate
sampling process has already been carried out.
Example 12.2
Suppose we want to test the hypotheses
H0 : µ = 72
H1 : µ > 72
What is the general format for carrying out a statistical test of hypothesis?
Solution
The first step is to obtain a random sample from the population of interest. The
information provided by this sample, in the form of a sample statistic, will help us
decide whether to reject the null hypothesis, or not. The sample statistic upon
which we base our decision is called the test statistic.
The second step is to determine a test statistic that is reasonable in the context of a
given hypothesis test. For this example, we are hypothesizing about the value of
the population mean µ. Since our best guess about the value of µ is the sample
mean x , it seems reasonable to use x as a test statistic. We will learn how to
choose the test statistic for other hypothesis-testing situations in the examples that
follow.
The third step is to specify the range of possible computed values of the test
statistic for which the null hypothesis will be rejected. That is, what specific values
of the test statistic will lead us to reject the null hypothesis in favour of the
alternative hypothesis? These specific values are known collectively as the
rejection region for the test. For this example, we would need to specify the values
of x̄ that would lead us to believe that H1 is true, i.e., that µ is greater than 72. We
will learn how to find an appropriate rejection region in later examples.
Once the rejection region has been specified, the fourth step is to use the data in
the sample to compute the value of the test statistic. Finally, we make our decision
by observing whether the computed value of the test statistic lies within the
rejection region. If, in fact, the computed value falls within the rejection region, we
will reject the null hypothesis; otherwise, we do not reject the null hypothesis.
Outline for Testing a Hypothesis
(i) Obtain a random sample from the population(s) of interest.
(ii) Determine a test statistic that is reasonable in the context of the given
hypothesis test.
(iii) Specify the rejection region, the range of possible computed values of the
test statistic for which the null hypothesis will be rejected.
(iv) Use the data in the sample to compute the value of the test statistic.
(v) Observe whether the computed value of the test statistic lies within the
rejection region. If so, reject the null hypothesis; otherwise, do not reject the
null hypothesis.
Recall that the null and alternative hypotheses will be stated in terms of specific
population parameters. Thus, in step 2 we decide on a test statistic that will provide
information about the target parameter.
Sampling of Variables
The sample mean x̄ is normally distributed with mean µ and standard deviation
σ/√n, where µ is the mean of the population and σ is the standard deviation of the
population. This result will help us in testing a hypothesis about the population
mean.
We test the hypothesis that the population mean µ = µ0, a specified value, when
σ, the S.D. of the population, is known.
We consider the null hypothesis
µ = µ0, i.e. µ − µ0 = 0.
Next, calculate

        | z | = | x̄ − µ0 | / (σ/√n)

(i) If | z | < 1.96, the difference is not significant at the 5% level and H0 is
accepted; otherwise it is rejected.
(ii) If | z | < 2.58, the difference is not significant at the 1% level and H0 is
accepted; otherwise it is rejected.
Note : We have assumed that σ is known. If however σ is not known, we take σ
to be equal to the S.D. of the sample.
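This decision rule (1.96 at the 5% level, 2.58 at the 1% level) can be written as a small helper function; a minimal sketch, shown here on the figures of Example 12.3 below:

```python
import math

def mean_z_test(x_bar, mu0, sigma, n):
    # Two-sided test of H0: mu = mu0 with known population S.D. sigma.
    z = abs(x_bar - mu0) / (sigma / math.sqrt(n))
    if z < 1.96:
        return z, "not significant at the 5% level: accept H0"
    if z < 2.58:
        return z, "significant at the 5% level but not at the 1% level"
    return z, "significant at the 1% level: reject H0"

# Figures of Example 12.3: x_bar = 55 kg, mu0 = 58 kg, sigma = 3, n = 400.
z, verdict = mean_z_test(55.0, 58.0, 3.0, 400)
print(round(z, 1), "-", verdict)  # 20.0 - significant at the 1% level: reject H0
```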
Example 12.3
A random sample of 400 male students has an average weight of 55 kg. Can we say
that the sample comes from a population with mean 58 kg and variance 9 kg²?
Solution
The null hypothesis H0 is that the sample comes from the given population.
In notations : H0 : µ = 58 kg and H1 : µ ≠ 58 kg.
Now

        | z | = | x̄ − µ | / (σ/√n)

Insert x̄ = sample mean = 55 kg, µ = population mean = 58 kg, n = 400 and
σ = population S.D. = 3. Therefore

        | z | = | 55 − 58 | / (3/√400) = 3/0.15 = 20 > 2.58

This value is highly significant. We reject H0 on the basis of this sample. The
sample, therefore, is not likely to be from the given population.
Example 12.4
A random sample of 400 tins of vegetable oil labeled “5 kg net weight” has a
mean net weight of 4.98 kg with standard deviation 0.22 kg. Do we reject the
hypothesis of a net weight of 5 kg per tin on the basis of this sample at the 1%
level of significance?
Solution
The null hypothesis H0 is that the net weight of each tin is 5 kg.
In notations H0 : µ = 5 kg.
Inserting x̄ = 4.98 kg, µ = 5 kg, σ = 0.22 kg and n = 400 in

        | z | = | x̄ − µ | / (σ/√n)

we get

        | z | = | 4.98 − 5 | / (0.22/√400) = 0.02/0.011 ≈ 1.82 < 2.58

Hence H0 cannot be rejected at the 1% level of significance; the sample is
consistent with the claimed net weight of 5 kg per tin.
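As a quick numerical check of the test statistic for these figures:

```python
import math

# Figures of Example 12.4: n = 400 tins, sample mean 4.98 kg,
# S.D. 0.22 kg, hypothesised net weight 5 kg.
z = abs(4.98 - 5.0) / (0.22 / math.sqrt(400))
print(round(z, 2))  # 1.82, below the 1% critical value of 2.58
```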
Application of Hypothesis Testing
In this section, we will present applications of the hypothesis-testing logic. Among
the population parameters to be considered are (µ1 − µ2), p, and (p1 − p2).
The concepts of a hypothesis test are the same for all these parameters; the null and
alternative hypotheses, test statistic, and rejection region all have the same general
form. However, the manner in which the test statistic is actually computed depends
on the parameter of interest. For example, we saw that the large-sample test
statistic for testing a hypothesis about a population mean µ is given by
        | z | = | x̄ − µ0 | / (σ/√n)

while the test statistic for testing a hypothesis about the parameter p is

        | z | = | p̂ − p0 | / √(p0 q0 / n)

where p̂ and p0 denote the sample proportion and the hypothesised proportion
respectively, and q0 = 1 − p0.
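A sketch of the proportion test statistic; the counts below are assumed for illustration:

```python
import math

def proportion_z_test(p_hat, p0, n):
    # Large-sample test of H0: p = p0, with q0 = 1 - p0.
    q0 = 1.0 - p0
    return (p_hat - p0) / math.sqrt(p0 * q0 / n)

# Assumed data: 230 successes out of n = 400 trials, against H0: p = 0.5.
z = proportion_z_test(p_hat=230 / 400, p0=0.5, n=400)
print(round(z, 2))  # 3.0
```

Since 3.0 > 2.58, such a sample would be significant even at the 1% level.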
The key to correctly diagnosing a hypothesis test is to determine first the
parameter of interest. In this section, we will present several examples illustrating
how to determine the parameter of interest. The following are the key words to
look for when conducting a hypothesis test about a population parameter.
Table 12.4 : Determining the Parameter of Interest

Parameter      Description
µ              Mean; average
(µ1 − µ2)      Difference in means or averages; mean difference;
               comparison of means or averages
p              Proportion; percentage; fraction; rate
(p1 − p2)      Difference in proportions, percentages, fractions, or
               rates; comparison of proportions, percentages,
               fractions, or rates
σ²             Variance; variation; precision
σ1²/σ2²        Ratio of variances; difference in variation;
               comparison of variances

Hypothesis Test about a Population Mean


Suppose that the last year all students at a certain university reported the number of
hours spent on their studies during a certain week; the average was
40 hours. This year we want to determine whether the mean time spent on studies
of all students at the university is in excess of 40 hours per week. That is, we will
test
H0 : µ = 40, H1 : µ > 40
where µ = Mean time spent on studies of all students at the university.
We are conducting this study in an attempt to gather support for H1; we hope that
the sample data will lead to the rejection of H0. Now, the point estimate of the
population mean µ is the sample mean x . Will the value of x that we obtain from
our sample be large enough for us to conclude that µ is greater than 40? In order to
answer this question, we need to perform each step of the hypothesis-testing
procedure.
Tests of Population Means using Large Samples
Table 12.5 contains the elements of a large-sample hypothesis test about a
population mean, µ. Note that for this case, the only assumption required for the
validity of the procedure is that the sample size is in fact large (n ≥ 30).
Table 12.5 : Large-sample Test of Hypothesis about a Population Mean
ONE-TAILED TEST                              TWO-TAILED TEST
H0 : µ = µ0                                  H0 : µ = µ0
H1 : µ > µ0 (or H1 : µ < µ0)                 H1 : µ ≠ µ0

Test Statistic :
z = ( x̄ − µ0 ) / σx̄ ≈ ( x̄ − µ0 ) / ( σ / √n )

Rejection Region for H0                      Rejection Region for H0
z > zα (or z < − zα)                         z < − zα/2 (or z > zα/2)

where zα is the z-value such that P(z > zα) = α; and zα/2 is the z-value such that P(z > zα/2) = α/2.
[Note : µ0 is our symbol for the particular numerical value specified for µ in the null hypothesis.]
Assumption : The sample size must be sufficiently large (say, n ≥ 30) so that the sampling
distribution of x̄ is approximately normal and that s provides a good approximation to σ.

Example 12.5
The mean time spent on studies of all students at a university last year was
40 hours per week. This year, a random sample of 35 students at the university was
drawn. The following summary statistics were computed:
x̄ = 42.1 hours; σ = 13.85 hours

Test the hypothesis that µ, the population mean time spent on studies per week, is
equal to 40 hours against the alternative that µ is larger than 40 hours. Use a
significance level of α = 0.05.
Solution
We have previously formulated the hypotheses as
H0 : µ = 40
H1 : µ > 40
Note that the sample size, n = 35, is sufficiently large so that the sampling
distribution of x̄ is approximately normal and that the sample standard deviation
provides a good approximation to σ. Since the required assumption is satisfied, we
may proceed with a large-sample test of hypothesis about µ.

Figure 12.3 : Rejection Region

Using a significance level of α = 0.05, we will reject the null hypothesis for this
one-tailed test if z > zα = z0.05 , i.e., if z > 1.645. This rejection region is shown
in Figure 12.3.
Computing the value of the test statistic, we obtain
z = ( x̄ − µ0 ) / ( σ / √n ) = ( 42.1 − 40 ) / ( 13.85 / √35 ) = 0.897
Since this value does not fall within the rejection region (Figure 12.3), we do not
reject H0. We say that there is insufficient evidence (at α = 0.05) to conclude that
the mean time spent on studies per week of all students at the university this year is
greater than 40 hours. We would need to take a larger sample before we could
detect whether µ > 40, if in fact this were the case.
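The large-sample z test carried out above can be sketched in Python. This is an illustrative sketch rather than part of the original text; the function name `z_stat` is our own.

```python
from math import sqrt

def z_stat(x_bar, mu0, sigma, n):
    """Large-sample test statistic z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / sqrt(n))

# Example 12.5: x-bar = 42.1, mu0 = 40, sigma = 13.85, n = 35
z = z_stat(42.1, 40, 13.85, 35)   # about 0.897
# One-tailed test at alpha = 0.05: reject H0 only if z > 1.645
reject_h0 = z > 1.645             # False: insufficient evidence
```

Since z = 0.897 does not exceed 1.645, the code reaches the same conclusion as the worked example.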
Example 12.6
A sugar refiner packs sugar into bags weighing, on average, 1 kilogram. Now the
setting of the machine tends to drift, i.e. the average weight of bags filled by the
machine sometimes increases, sometimes decreases. It is important to control the
average weight of bags of sugar. The refiner wishes to detect shifts in the mean
weight of bags as quickly as possible and reset the machine. In order to detect
shifts in the mean weight, he will periodically select 50 bags, weigh them, and
calculate the sample mean and standard deviation. The data of a periodical sample
is as follows :
x̄ = 1.03 kg, σ = 0.05 kg
Test whether the population mean µ is different from 1 kg at significance level
α = 0.01.
Solution
We formulate the following hypotheses :
H0 : µ = 1
H1 : µ ≠ 1
Since the sample size (50) exceeds 30, we may proceed with the large-sample test
about µ. Because shifts in µ in either direction are important, the test is
two-tailed.
At significance level α = 0.01, we will reject the null hypothesis for this two-tailed
test if
z < − zα/2 = − z0.005 or z > zα/2 = z0.005
i.e., if z < − 2.576 or z > 2.576.
The value of the test statistic is computed as follows :
z ≈ ( x̄ − µ0 ) / ( σ / √n ) = ( 1.03 − 1 ) / ( 0.05 / √50 ) = 4.243
Since this value is greater than the upper-tail critical value (2.576), we reject the
null hypothesis and accept the alternative hypothesis at the significance level of
1%. We would conclude that the overall mean weight was no longer 1 kg, and
would run a less than 1% chance of committing a Type-I error.
Example 12.7
When flipped 1000 times, a coin landed 515 times heads up. Does it support the
hypothesis that the coin is unbiased?
Solution
The null hypothesis is that the coin is unbiased.
In notation, Ho : P = Po, where Po = 0.5 and qo = 1 − Po = 0.5.
Now the sample proportion is p̂ = 515 / 1000 = 0.515
| z | = ( p̂ − p0 ) / √( p0 q0 / n ) = ( 0.515 − 0.5 ) / √( 0.5 × 0.5 / 1000 ) = 0.949 < 2.58 (or even 3)
We therefore do not reject the null hypothesis; the data are consistent with an unbiased coin.
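The proportion test used for the coin can be sketched in Python (a sketch with our own function name, not part of the original text):

```python
from math import sqrt

def prop_z(p_hat, p0, n):
    """Large-sample proportion statistic z = (p_hat - p0) / sqrt(p0*q0/n)."""
    q0 = 1 - p0
    return (p_hat - p0) / sqrt(p0 * q0 / n)

# Example 12.7: 515 heads in 1000 flips, H0 : P = 0.5
z = prop_z(515 / 1000, 0.5, 1000)          # about 0.949
consistent_with_fair_coin = abs(z) < 2.58  # True at the 1% level
```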
Example 12.8
While throwing 5 dice 40 times, a person got a success 25 times, where getting a 4
(on exactly one die) was called a success. Can we consider the difference between
the expected and observed values significant?
Solution
If we carefully examine the data, then the hypothesis can be stated that the dice
are unbiased.
In notation, Ho : P = Po, where P0 = 5C1 (1/6) (5/6)^4 = 0.4019
and q0 = 1 − P0 = 0.5981
i.e. Ho : P = 0.4019 and H1 : P ≠ 0.4019

The sample proportion p̂ = 25 / 40 = 0.625
| z | = ( p̂ − p0 ) / √( p0 q0 / n ) = ( 0.625 − 0.4019 ) / √( 0.4019 × 0.5981 / 40 ) = 2.88 > 2.58
Hence the hypothesis H0 is rejected at the 1% level of significance, i.e. the value
obtained is highly significant. The given data do not support H0. Thus the dice are
not unbiased.
Tests of Population Means using Small Samples
When the assumption required for a large-sample test of hypothesis about µ is
violated, we need a hypothesis-testing procedure that is appropriate for use with
small samples. If we use the methods of the large-sample test, we run into trouble
on two accounts. First, a small sample tends to underestimate the population
variance, so our test statistic will be wrong. Second, the standardized means of
small samples do not follow the normal distribution, so our critical values will be wrong.
We have learnt that the means of small samples have a t-distribution, and the
appropriate t-distribution will depend on the number of degrees of freedom in
estimating the population variance. If we use large samples to test a hypothesis,
then the critical values we use will depend upon the type of test (one or two tailed).
But if we use small samples, then the critical values will depend upon the degrees
of freedom as well as the type of test.
A hypothesis test about a population mean, µ, based on a small sample (n < 30)
consists of the elements listed in Table 12.6.
Table 12.6 : Small-sample Test of Hypothesis about a Population Mean
ONE-TAILED TEST                              TWO-TAILED TEST
H0 : µ = µ0                                  H0 : µ = µ0
H1 : µ > µ0 (or H1 : µ < µ0)                 H1 : µ ≠ µ0

Test Statistic
t = ( x̄ − µ0 ) / ( σ / √n )

Rejection Region                             Rejection Region
t > tα (or t < − tα)                         t < − tα/2 (or t > tα/2)

where the distribution of t is based on (n − 1) degrees of freedom; tα is the
t-value such that P (t > tα) = α; and tα/2 is the t-value such that
P (t > tα/2) = α/2.
Assumption: The relative frequency distribution of the population from which
the sample was selected is approximately normal.
As we noticed in the development of estimation procedures, when we are making
inferences based on small samples, more restrictive assumptions are required than
when making inferences from large samples. In particular, this hypothesis test
requires the assumption that the population from which the sample is selected is
approximately normal.
Notice that the test statistic given in Table 12.6 is a t statistic and is calculated
exactly as our approximation to the large-sample test statistic, z, given earlier in
this section. Therefore, just like z, the computed value of t indicates the direction
and approximate distance (in units of standard deviations) that the sample mean,
x̄, is from the hypothesized population mean, µ0.
Example 12.9
The expected lifetime of electric light bulbs produced by a given process was 1500
hours. To test a new batch a sample of 10 was taken which showed a mean lifetime
of 1410 hours. The standard deviation is 90 hours. Test the hypothesis that the
mean lifetime of the electric light bulbs has not changed, using a level of
significance of α = 0.05.
Solution
This question asks us to test that the mean has not changed, so we must employ a
two-tailed test :
H0 : µ = 1500
H1: µ ≠ 1500
Since we are restricted to a small sample, we must make the assumption that the
lifetimes of the electric light bulbs have a relative frequency distribution that is
approximately normal. Under this assumption, the test statistic will have a
t-distribution with (n − 1) = (10 − 1) = 9 degrees of freedom. The rejection rule is
then to reject the null hypothesis for values of t such that
t < − tα /2 or t > tα /2 with α /2 = 0.05/2 = 0.025.
From the t-table with 9 degrees of freedom, we find that
t0.025 = 2.262.
The value of test statistic is
t = ( x̄ − µ0 ) / ( σ / √n ) = ( 1410 − 1500 ) / ( 90 / √10 ) = − 3.1623
The computed value of the test statistic, t = − 3.1623, falls below the critical value
of − 2.262. We reject H0 and accept H1 at significance level of 0.05, and conclude
that there is some evidence to suggest that the mean lifetime of all light bulbs has
changed.
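The small-sample t test above can be sketched in Python. This is an illustrative sketch, not part of the original text; the function name `t_stat` is ours, and the critical value 2.262 is taken from the t-table quoted in the example.

```python
from math import sqrt

def t_stat(x_bar, mu0, s, n):
    """Small-sample statistic t = (x_bar - mu0) / (s / sqrt(n)), df = n - 1."""
    return (x_bar - mu0) / (s / sqrt(n))

# Example 12.9: x-bar = 1410, mu0 = 1500, s = 90, n = 10
t = t_stat(1410, 1500, 90, 10)   # about -3.1623
# Two-tailed test, df = 9, alpha = 0.05: critical value 2.262 (from the t-table)
reject_h0 = abs(t) > 2.262       # True: the mean lifetime has changed
```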
Testing the Difference between Means
If x̄1 and x̄2 denote the means of the samples drawn from the first and second
populations respectively, having means µ1 and µ2 and standard deviations σ1 and σ2,
and if the sizes of the samples are n1 and n2, then it can be proved that the
distribution of the difference between the means x̄1 − x̄2 is normal with mean
(µ1 − µ2) and standard deviation given by
σ = √( σ1²/n1 + σ2²/n2 )
Therefore, z = [ ( x̄1 − x̄2 ) − ( µ1 − µ2 ) ] / √( σ1²/n1 + σ2²/n2 )

Further, under the hypothesis Ho : µ1 = µ2, i.e. H0 : µ1 − µ2 = 0, we see that
z = ( x̄1 − x̄2 ) / √( σ1²/n1 + σ2²/n2 )
is the standard normal variate.
When the two samples belong to the same population, we have σ1 = σ2 = σ; then
z = ( x̄1 − x̄2 ) / [ σ √( 1/n1 + 1/n2 ) ]

Similarly, the confidence limits for (µ1 − µ2) at various levels of confidence are :
(i) ( x̄1 − x̄2 ) ± 1.96 S at the 95% level of confidence
(ii) ( x̄1 − x̄2 ) ± 2.58 S at the 99% level of confidence
(iii) ( x̄1 − x̄2 ) ± 3 S at the 99.73% level of confidence
Note 1 : Here S = S.E. = σ √( 1/n1 + 1/n2 ) for samples drawn from the same
population.
Note 2 : If the standard deviations of the two populations, i.e. σ1 and σ2, are
unknown, we use the sample standard deviations in their places.
Example 12.10
A group of 200 students has a mean height of 154 cm. Another group of
300 students has a mean height of 152 cm. Can these be from the same
population with a standard deviation of 5 cm?
Solution
Ho : µ1 = µ2, the samples are from the same population.
H1 : µ1 ≠ µ2; here x̄1 = 154 cm, x̄2 = 152 cm, σ = 5 cm, n1 = 200 and n2 = 300.
Now, | z | = ( x̄1 − x̄2 ) / [ σ √( 1/n1 + 1/n2 ) ] = ( 154 − 152 ) / [ 5 √( 1/200 + 1/300 ) ] = 4.38 > 3

i.e. the z-score is highly significant. Therefore, we reject Ho, i.e. it is not likely that
the two samples are from the same population.
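The two-sample comparison just made, with a common σ assumed for both groups, can be sketched in Python (an illustrative sketch; the function name is ours):

```python
from math import sqrt

def two_mean_z(x1_bar, x2_bar, sigma, n1, n2):
    """z for two sample means assumed drawn from one population with known sigma."""
    return (x1_bar - x2_bar) / (sigma * sqrt(1 / n1 + 1 / n2))

# Example 12.10: means 154 cm and 152 cm, sigma = 5 cm, n1 = 200, n2 = 300
z = two_mean_z(154, 152, 5, 200, 300)    # about 4.38
same_population_plausible = abs(z) <= 3  # False: highly significant
```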
Example 12.11
Suppose it is claimed that in a very large batch of components, about 10% of items
contain some form of defect. It is proposed to check whether this proportion has
increased, and this will be done by drawing randomly a sample of 150
components. In the sample, 20 are defectives. Does this evidence indicate that the
true proportion of defective components is significantly larger than 10%? Test at
significance level α = 0.05.

Solution
We wish to perform a large-sample test about a population proportion, p :
H0: p = 0.10 (i.e., no change in proportion of defectives)
H1: p > 0.10 (i.e., proportion of defectives has increased)
where p represents the true proportion of defects.
At significance level α = 0.05, the rejection region for this one-tailed test consists
of all values of z for which
z > z0.05 = 1.645
The test statistic requires the calculation of the sample proportion, p̂, of defects :
p̂ = ( Number of defective components in the sample ) / ( Number of sampled components )
= 20 / 150 = 0.133
Noting that q0 = 1 – p0 = 1 – 0.10 = 0.90, we obtain the following value of the test
statistic :
z = ( p̂ − p0 ) / √( p0 q0 / n ) = ( 0.133 − 0.10 ) / √( (0.10) (0.90) / 150 ) = 1.347
This value of z lies outside the rejection region; so we conclude that the
proportion defective in the sample is not significantly larger than 0.10. We have no
evidence to reject the null hypothesis that the proportion defective is 0.10 at the
5% level of significance. In accepting H0 we run the risk of a Type II error
(accepting H0 when, in fact, it is not true), whose probability β is not fixed by this test.
[Note that the interval
p̂ ± 2 √( p̂ q̂ / n ) = 0.133 ± 2 √( (0.133) (1 − 0.133) / 150 ) = 0.133 ± 0.056
does not contain 0 or 1. Thus, the sample size is large enough to guarantee the
validity of the hypothesis test.]
Although small-sample procedures are available for testing hypotheses about a
population proportion, the details are omitted from our discussion. It is our experience
that they are of limited utility, since most surveys of binomial populations performed in
practice use samples that are large enough to employ the techniques of this section.

12.5 STUDENT’S t-DISTRIBUTION


This concept was introduced by W. S. Gosset (1876 - 1937). He adopted the pen name
“student”. Therefore, the distribution is known as ‘student’s t-distribution’.
It is used to establish confidence limits and to test hypotheses when the population
variance is not known and the sample size is small (< 30).
If a random sample x1, x2, . . . , xn of n values be drawn from a normal population with
mean µ and standard deviation σ then the mean of sample
x̄ = Σ xi / n
Estimate of the Variance
Let σ² be the estimate of the variance of the sample; then σ² is given by
σ² = Σ( xi − x̄ )² / ( n − 1 ), with (n − 1) as denominator in place of 'n'.
(i) The statistic ‘t’ is defined as
t = | x̄ − µ | / ( σ / √n ) = ( | x̄ − µ | / σ ) √n
where x̄ = sample mean, µ = actual or hypothetical mean of the population,
n = sample size, and σ = standard deviation of the sample, given by
σ = √[ Σ( xi − x̄ )² / ( n − 1 ) ]
Note : ‘t’ is distributed as Student’s t-distribution with (n − 1) degrees of
freedom (df).
(ii) (a) The variable ‘t’ of the distribution ranges from minus infinity to plus
infinity.
(b) Like standard normal distribution, it is also symmetrical and has mean
zero.
(c) The variance of the t-distribution is greater than 1, but tends to 1 as the
df increases, i.e. as the sample size becomes large. Thus the variance of the
t-distribution approaches the variance of the normal distribution as the
sample size increases; for df = ν → ∞, the t-distribution matches
the normal distribution (Figure 12.4).

Figure 12.4

Also note that the t-distribution is lower at the mean and higher at the
tails than the normal distribution, i.e. the t-distribution has
proportionally greater area at its tails than the normal distribution.
(iii) (a) If | t | exceeds t0.05, the difference between x̄ and µ is significant at
the 0.05 level of significance.
(b) If | t | exceeds t0.01, the difference is said to be highly significant at
the 0.01 level of significance.
(c) If | t | < t0.05, we conclude that the difference between x̄ and µ is not
significant and the sample might have been drawn from a population
with mean = µ, i.e. the data are consistent with the hypothesis.
(iv) Fiducial limits of the population mean
For 95% : x̄ ± t0.05 ( σ / √n )
For 99% : x̄ ± t0.01 ( σ / √n )
Example 12.12
A random sample of 16 values from a normal population is found to have a mean
of 41.5 and standard deviation of 2.795. On this basis, is there any reason to reject
the hypothesis that the population mean µ = 43? Also find the confidence limits for
µ.
Solution
Here n = 16, df = n − 1 = 15, x̄ = 41.5, σ = 2.795 and µ = 43.

Now t = ( | x̄ − µ | / σ ) √15 = ( 1.5 × √15 ) / 2.795 = 2.078
From the t-table for 15 degrees of freedom, the critical value at the 0.05 level of
significance is t0.05 = 2.13. Since 2.078 < 2.13, the difference between x̄ and µ is not significant.
Now, null hypothesis : Ho : µ = 43 and
Alternative hypothesis : H1 : µ ≠ 43.
Thus there is no reason to reject Ho. To find the 95% confidence limits, we use
x̄ ± t0.05 ( σ / √n ) = 41.5 ± ( 2.795 / √16 ) × 2.13
= 41.5 ± (0.6988) (2.13)
= (40.011, 42.988)
Example 12.13
Ten individuals are chosen at random from the population and their heights (in
inches) are found to be 63, 63, 64, 65, 66, 69, 69, 70, 70, 71. Discuss the suggestion
that the mean height in the universe is 65 inches, given that for 9 degrees of freedom
the value of Student’s ‘t’ at the 0.05 level of significance is 2.262.
Solution
xi = 63, 63, 64, 65, 66, 69, 69, 70, 70, 71 and n = 10
∴ x̄ = Σ xi / n = 670 / 10 = 67
and σ = √[ Σ( xi − x̄ )² / ( n − 1 ) ] = √( 88 / 9 ) = 3.13 inches

The null hypothesis, H0 : µ = 65 inches
The alternative hypothesis, H1 : µ ≠ 65 inches
The df = n − 1 = 10 − 1 = 9
t = | x̄ − µ | √n / σ = ( 67 − 65 ) √10 / 3.13 = 2.02
But t0.05 at (df = 9) = 2.262
∴ t = 2.02 < 2.262
The difference is not significant at the 0.05 level of significance. Thus, Ho is
accepted and we conclude that the mean height is 65 inches.
Example 12.14
Nine items of a sample have the following values 45, 47, 52, 48, 47, 49, 53, 51, 50.
Does the mean of the 9 items differ significantly from the assumed population
mean of 47.5?
Given that for degree of freedom = 8, P = 0.945 for t = 1.8 and P = 0.953 for
t = 1.9.
Solution
Σ xi = 45 + 47 + 52 + 48 + 47 + 49 + 53 + 51 + 50 = 442
n=9
∴ x̄ = Σ xi / n = 442 / 9 = 49.11
Also σ = √[ Σ( xi − x̄ )² / ( n − 1 ) ] = √( 54.89 / 8 ) = 2.62
Let the null hypothesis H0 : µ = 47.5 and the alternative hypothesis H1 : µ ≠ 47.5.
Now t = | x̄ − µ | √n / σ = ( 49.11 − 47.5 ) √9 / 2.62 = 1.843
With the given data, we interpolate the value of P for t = 1.843.
For t = 1.9, P = 0.953;
for t = 1.8, P = 0.945.
A difference of 0.1 in t corresponds to a difference of 0.008 in P.
Therefore, for a difference in t of 0.043, the difference in P is 0.0034. Hence for
t = 1.843, P = 0.9484. Therefore the probability of getting a value of t > 1.843 is
(1 − 0.9484) = 0.0516, and it is greater than 0.05. Thus Ho is accepted, i.e. the mean
of the 9 items does not differ significantly from the assumed population mean.
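The interpolation step in the solution above can be sketched in Python (an illustrative sketch; the function name is ours):

```python
def interpolate_p(t, t_lo, p_lo, t_hi, p_hi):
    """Linear interpolation of the tabulated probability P at a given t."""
    return p_lo + (t - t_lo) * (p_hi - p_lo) / (t_hi - t_lo)

# Example 12.14: P = 0.945 at t = 1.8 and P = 0.953 at t = 1.9
p = interpolate_p(1.843, 1.8, 0.945, 1.9, 0.953)   # about 0.9484
tail_prob = 1 - p                                  # about 0.0516, > 0.05
```

Since the tail probability exceeds 0.05, H0 is accepted, matching the worked conclusion.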
Distribution of ‘t’ for Comparison of the Means of Two Independent Samples
Let x1i (i = 1, 2, 3, . . . , n1) and x2i (i = 1, 2, 3, . . . , n2) be two random
independent samples drawn from two normal populations with means µ1 and µ2
respectively but with the same variance σ². Let x̄1 and x̄2 be the sample means, and let
σ1² = Σ( x1i − x̄1 )² / ( n1 − 1 ) and σ2² = Σ( x2i − x̄2 )² / ( n2 − 1 )
Then the statistic t is given by
t = ( x̄1 − x̄2 ) / √[ σP² ( 1/n1 + 1/n2 ) ]
where σP² = [ (n1 − 1) σ1² + (n2 − 1) σ2² ] / ( n1 + n2 − 2 )
is called the pooled estimate of the population variance.
Example 12.15
Two types of drugs were used on 5 and 7 patients for reducing their weights in
Iswari’s ‘slim-beauty’ health club. Drug A was allopathic and drug B was herbal.
The decrease in weight after using the drugs for six months was as follows :
Drug A : 10 12 13 11 14
Drug B : 8 9 12 14 15 10 9
Is there a significant difference in the efficiency of the two drugs? If yes, which
drug should you buy?
Solution
Let the null hypothesis Ho : µ1 = µ2 or Ho : µ1 − µ2 = 0.
Alternative hypothesis H1 : µ1 ≠ µ2 or H1 : µ1 − µ2 ≠ 0
x1i    (x1i − x̄1)    (x1i − x̄1)²        x2i    (x2i − x̄2)    (x2i − x̄2)²
10     −2            4                   8      −3            9
12     0             0                   9      −2            4
13     1             1                   12     1             1
11     −1            1                   14     3             9
14     2             4                   15     4             16
                                         10     −1            1
                                         9      −2            4

Σ x1i = 60, Σ( x1i − x̄1 )² = 10 and Σ x2i = 77, Σ( x2i − x̄2 )² = 44

Now, x̄1 = Σ x1i / n1 = 60 / 5 = 12 and x̄2 = Σ x2i / n2 = 77 / 7 = 11
Also σP² = [ (n1 − 1) σ1² + (n2 − 1) σ2² ] / ( n1 + n2 − 2 ),
where σ1² = Σ( x1i − x̄1 )² / ( n1 − 1 ) = 10 / 4 = 2.5
and σ2² = Σ( x2i − x̄2 )² / ( n2 − 1 ) = 44 / 6 = 7.3
Therefore, σP² = [ (5 − 1) × 2.5 + (7 − 1) × 7.3 ] / ( 5 + 7 − 2 ) = ( 4 × 2.5 + 6 × 7.3 ) / 10 = 5.38
Then using the formula
t = [ ( x̄1 − x̄2 ) − ( µ1 − µ2 ) ] / √[ σP² ( 1/n1 + 1/n2 ) ], where µ1 − µ2 = 0 under H0,
we get
t = ( 12 − 11 ) / √[ 5.38 × ( 1/5 + 1/7 ) ] = 1 / √( 5.38 × 12/35 ) = 0.736
Now ν (df) = n1 + n2 − 2 = 10
For ν = 10, t0.05 = 2.228
Therefore, 0.736 < 2.228

Thus the null hypothesis is accepted. Hence there is no significant difference in the
efficiency of the two drugs. Since drug B is herbal and there is no difference in
efficiency between the two, with no side effects, we should buy the herbal drug.
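The pooled-variance t statistic used in this example can be sketched in Python (an illustrative sketch; the function name is ours):

```python
from math import sqrt

def pooled_t(x1_bar, x2_bar, s1_sq, s2_sq, n1, n2):
    """Pooled-variance t statistic with df = n1 + n2 - 2."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    return (x1_bar - x2_bar) / sqrt(sp_sq * (1 / n1 + 1 / n2))

# Example 12.15: means 12 and 11, variances 2.5 and 44/6, n1 = 5, n2 = 7
t = pooled_t(12, 11, 2.5, 44 / 6, 5, 7)   # about 0.735
reject_h0 = abs(t) > 2.228                # False at df = 10, alpha = 0.05
```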
Example 12.16
To test the effect of a fertilizer on rice production, 24 equal plots of a certain land
are selected. Half of them were treated with fertilizer leaving the rest untreated.
Other conditions were the same. The mean production of rice on untreated plots
was 4.8 quintals with standard deviation of 0.4 quintal, while the mean yield on the
treated plots was 5.1 quintals with a standard deviation of 0.36 quintal. Can we say
that there is significant improvement in the production of rice due to use of
fertilizer at 0.05 level of significance?
Solution
The null hypothesis H0 : µ1 = µ2 or H0 : µ1 − µ2 = 0
Alternative hypothesis H1 : µ1 ≠ µ2 or H1 : µ1 − µ2 ≠ 0
(or H1 : µ2 > µ1, i.e. the fertilizer improved the yield).
Given x̄1 = 4.8, n1 = 12, σ1 = 0.4, x̄2 = 5.1, n2 = 12, σ2 = 0.36

∴ σP² = [ (n1 − 1) σ1² + (n2 − 1) σ2² ] / ( n1 + n2 − 2 )
= [ (12 − 1) (0.4)² + (12 − 1) (0.36)² ] / ( 12 + 12 − 2 )
∴ σP² = 0.1448
Using the formula t = [ ( x̄1 − x̄2 ) − ( µ1 − µ2 ) ] / √[ σP² ( 1/n1 + 1/n2 ) ],
we get
t = ( 5.1 − 4.8 − 0 ) / √[ 0.1448 × ( 1/12 + 1/12 ) ] = 1.93
For ν (df) = 12 + 12 − 2 = 22, t0.05 = 2.07
Therefore, 1.93 < 2.07
Thus we accept Ho, i.e. there is no significant difference in rice production due to
the use of fertilizer.
12.5.1 Two Tailed and One Tailed Tests
While testing a hypothesis, we often talk of two-tailed tests and one-tailed tests. In the
previous tests the critical region lay along both the tails of the distributions. That is, we
did not want sample statistic (say mean) to be away from the population parameter (say
mean) in either direction. The test for such a hypothesis is non-directional or two-sided or
two-tailed. A two-tailed test of hypothesis will reject the null hypothesis Ho, if the sample
statistic is significantly higher than or lower than the hypothesized population parameter.
Thus in two-tailed test, the rejection (critical) region is located in both the tails.
For example, suppose you suspect that a particular 6th grader’s performance on a test in
Mathematics is not a true representative of the students who have appeared. The national
mean score in this test was found to be 75. The alternative (or research) hypothesis is :
H1 : µ ≠ 75 while the null hypothesis is : Ho : µ = 75.
Now our pre-determined probability level is 95%, i.e. a 5% level of significance for this
test. The rejection (or critical) region is thus 5%, i.e. 0.05. This rejection
region is divided between both tails of the distribution (Figure 12.5), i.e. 2.5% or 0.025
in the upper tail and 2.5% or 0.025 in the lower tail, since your hypothesis specifies only a
difference and not a direction. You will reject the null hypothesis if the
sample mean falls into the area beyond 1.96 S.E. Otherwise, if it falls within 1.96 S.E.
(corresponding to the central area of 0.475 on each side), you can accept the null hypothesis.

Figure 12.5

Suppose you want to reduce the risk of committing a Type-I error; then reduce the size of
the rejection region. If the hypothesis is tested at the 1%, i.e. 0.01, level of significance and
we consult the table of areas under the normal curve, we find that the acceptance region of
0.495 (one half of 0.99) corresponds to 2.58 S.E. from µH, i.e. from z-score = 0.

Figure 12.6

You will still reject the null hypothesis of no difference, if the class sample is either
much higher or much lower than our population mean of 75.
As distinguished from the two-tailed test, we can apply a directional, one-sided, i.e.
one-tailed, test when it is necessary to guard against deviations of x̄ (the sample mean)
in only one direction. A one-tailed test is so called because the rejection region is
located in only one tail, which may be either on the upper or the lower side of the
distribution, depending upon the alternative hypothesis (H1). For example, suppose we
want to test the hypothesis that the average income per household is Rs. 5000
against the alternative that the income is greater than Rs. 5000. We will place all the
α risk on the upper side of the theoretical sampling distribution and the test will be
one-tailed. On the other hand, if we test the hypothesis that the average income per
household is Rs. 5000 against H1 that the income is less than Rs. 5000, the α risk is on
the lower side of the distribution and the test is again one-sided.


Figure 12.7

Summing up, if the population’s specified mean is say µ0, then the null hypothesis
would be H0 : µ = µ0 and alternative (researcher’s) hypothesis could be either one
of
(i) H1 : µ ≠ µo (i.e. µ > µo or µ < µo).
(ii) H1 : µ > µo or
(iii) H1 : µ < µo
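The critical values used throughout (1.645, 1.96, 2.33, 2.58) can be recovered from the standard normal distribution. The sketch below uses Python's `statistics.NormalDist`; the function name is our own.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """Upper critical z value for a given significance level alpha."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

critical_z(0.05)                    # about 1.96  (two-tailed, 5%)
critical_z(0.05, two_tailed=False)  # about 1.645 (one-tailed, 5%)
critical_z(0.01)                    # about 2.576 (two-tailed, 1%)
```

For a one-tailed test the whole of α sits in one tail, so the critical value is smaller than in the two-tailed case at the same α.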
Example 12.17
Past records show that the mean marks of students taking statistics are 60 with
standard deviation of 15 marks. A new method of teaching is adopted and a
random sample of 64 students is chosen. After using the new method, the sample
gives the mean marks of 65. Is the new method better?
Solution
Here we are interested in knowing whether the marks increased on using the new
teaching method. Therefore, we use the one-tailed method :
The null hypothesis is : Ho : µ = 60
The alternative hypothesis is : H1 : µ > 60.
We have x̄ = 65, µ = 60, σ = 15 and n = 64; then
z = ( x̄ − µ ) / ( σ / √n ) = ( 65 − 60 ) / ( 15 / √64 ) = 2.66
Now suppose the researcher had predetermined the level of significance which is
0.01 or 1% for his decision. Then 2.66 > 2.33 (Here z-score is 2.33 for 0.01 level
on the upper-tail of distribution). Therefore, the observed value is highly
significant. That is, Ho is rejected and H1 is accepted. This means the new teaching
method is better.
Example 12.18
A manufacturer of an antibiotic claimed that his antibiotic was 90% effective in
curing a certain type of V. D. if used for a duration of 8 weeks. In a sample of
200 people who tried this, 160 people were cured. Determine whether his claim is
legitimate.
Solution
Let P = Probability for curing the V. D. by the use of the manufacturer’s antibiotic.
Setting two types of hypothesis as :
Null hypothesis : H0: P = 0.9 ⇒ claim is supported.
Researcher’s hypothesis : H1 : P < 0.9 ⇒ claim is rejected.


Figure 12.8
Now p̂ = proportion of successes in the given sample = 160 / 200 = 0.8
Now P = 0.9 ⇒ q = 0.1.
Therefore, √( Pq / n ) = √( (0.9) (0.1) / 200 ) = 0.021
Thus the corresponding z-score will be
z = ( p̂ − P ) / √( Pq / n ) = ( 0.8 − 0.9 ) / 0.021 = − 0.1 / 0.021 = − 4.71
which is much less than − 2.33. Thus by our decision rule H0 is rejected and H1 is
accepted stating that his claim is not legitimate and that the sample results are
highly significant (at 0.01 level of significance).
Test of Significance for Small Samples
So far we have discussed problems belonging to large samples. When a small
sample (size < 30) is considered, the above tests are inapplicable because the
assumptions we made for large sample tests, do not hold good for small samples.
In the case of small samples, it is not possible to assume (i) that the random sampling
distribution of a statistic is normal, and (ii) that the sample values are sufficiently close
to the population values to calculate the S.E. of the estimate.
Thus an entirely new approach is required to deal with problems of small samples.
But one should note that the methods and theory of small samples are applicable to
large samples but its converse is not true.
Degree of Freedom
By degree of freedom (df) we mean the number of classes to which the value
can be assigned arbitrarily or at will without voicing the restrictions or
limitations placed.
For example, suppose we are asked to choose any 4 numbers whose total is 50.
Clearly we are free to choose any 3 numbers, say 10, 23 and 7, but the
fourth number, 10, is then fixed since the total must be 50 [50 − (10 + 23 + 7) = 10].
Thus we are given one restriction; hence the freedom of selection of numbers is
4 − 1 = 3.
The degree of freedom is denoted by ν (nu) or df and is given by
ν = n − k, where n = number of classes and k = number of independent
constraints (or restrictions).
In general, for a Binomial distribution, ν = n − 1.
For a Poisson distribution, ν = n − 2 (since we use the total frequency and the
arithmetic mean).
For a normal distribution, ν = n − 3 (since we use the total frequency, mean and
standard deviation), etc.
12.5.2 Hypothesis Tests about the Difference between Two Population
Means
There are two brands of coffee, A and B. Suppose a consumer group wishes to determine
whether the mean price per kg of brand A exceeds the mean price per kg of
brand B. That is, the consumer group will test the null hypothesis H0: (µ1 − µ2) = 0
against the alternative (µ1 − µ2) > 0. The large-sample procedure described in Table 12.7
is applicable for testing a hypothesis about (µ1 − µ2), the difference between two
population means.
Table 12.7 : Large-sample Test of Hypothesis about (µ1 − µ2) Testing of Hypothesis

ONE-TAILED TEST                                   TWO-TAILED TEST
H0 : (µ1 − µ2) = D0                               H0 : (µ1 − µ2) = D0
H1 : (µ1 − µ2) > D0 (or H1 : (µ1 − µ2) < D0)      H1 : (µ1 − µ2) ≠ D0

Test Statistic
z = [ ( x̄1 − x̄2 ) − D0 ] / σ(x̄1 − x̄2) ≈ [ ( x̄1 − x̄2 ) − D0 ] / √( σ1²/n1 + σ2²/n2 )

Rejection Region                                  Rejection Region
z > zα (or z < − zα)                              z < − zα/2 or z > zα/2
[Note : In many practical applications, we wish to hypothesize that there is no difference
between the population means; in such cases, D0 = 0]
Assumptions:
(1) The sample sizes n1 and n2 are sufficiently large (n1 ≥ 30 and n2 ≥ 30).
(2) The samples are selected randomly and independently from the target populations.

Example 12.19
A consumer group selected independent random samples of super-markets
located throughout a country for the purpose of comparing the retail prices per kg
of coffee of brands A and B. The results of the investigation are summarised in
Table 12.8. Does this evidence indicate that the mean retail price per kg of brand A
coffee is significantly higher than the mean retail price per kg of brand B coffee?
Use a significance level of α = 0.01.
Table 12.8 : Coffee Prices

Brand A Brand B
n1 = 75 n2 = 64
x1 = Rs. 300 x2 = Rs. 295
σ1 = Rs.11 σ2 = Rs. 9

Solution
The consumer group wants to test the hypotheses
H0 : (µ1 − µ2) = 0 (i.e., no difference between mean retail prices)
H1 : (µ1 − µ2) > 0 (i.e., mean retail price per kg of brand A is higher than that of
brand B)
where, µ1 = Mean retail price per kg of brand A coffee at all
super-markets, and
µ2 = Mean retail price per kg of brand B coffee at all super-markets.
This one-tailed, large-sample test is based on a z statistic. Thus, we will reject H0 if
z > zα = z0.01. Since z0.01 = 2.33, the rejection region is given by z > 2.33
(Figure 12.9.)
We compute the test statistic as follows :
z = [ ( x̄1 − x̄2 ) − D0 ] / √( σ1²/n1 + σ2²/n2 ) = ( (300 − 295) − 0 ) / √( (11)²/75 + (9)²/64 ) = 2.947


Figure 12.9 : Rejection Region


Since this computed value of z = 2.947 lies in the rejection region, there is sufficient
evidence (at α = 0.01) to conclude that the mean retail price per kg of brand A coffee is
significantly higher than the mean retail price per kg of brand B coffee. The probability
of our having committed a Type-I error is α = 0.01.
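The computation of Example 12.19 can be sketched in a few lines of Python (an illustrative addition, not part of the original unit; the critical value 2.33 is z0.01 read from the normal table):

```python
import math

def two_sample_z(x1, x2, s1, s2, n1, n2, d0=0.0):
    """z statistic for H0: (mu1 - mu2) = d0 with large independent samples."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of (x1 - x2)
    return (x1 - x2 - d0) / se

# Brand A vs. brand B coffee prices (Table 12.8)
z = two_sample_z(300, 295, 11, 9, 75, 64)
print(round(z, 3))  # 2.947
if z > 2.33:        # z_0.01 = 2.33 for the one-tailed test
    print("reject H0: brand A's mean price is significantly higher")
```

The same function handles any large-sample test about (µ1 − µ2); only the hypothesized difference d0 and the critical value change.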
When the sample sizes n1 and n2 are inadequate to permit use of the large-sample
procedure of Example 12.19, some modifications must be made to perform a
small-sample test of hypothesis about the difference between two population means. The
test procedure is based on assumptions that are more restrictive than in the large-sample
case. The elements of the hypothesis test and required assumptions are listed
in Table 12.9.
Table 12.9 : Small-sample Test of Hypothesis about (µ1 − µ2)

ONE-TAILED TEST                                    TWO-TAILED TEST
H0 : (µ1 − µ2) = D0                                H0 : (µ1 − µ2) = D0
H1 : (µ1 − µ2) > D0 (or H1 : (µ1 − µ2) < D0)       H1 : (µ1 − µ2) ≠ D0

Test Statistic

t = (x̄1 − x̄2 − D0) / √( σp² (1/n1 + 1/n2) )

Rejection Region (one-tailed) : t > tα (or t < − tα)
Rejection Region (two-tailed) : t < − tα/2 or t > tα/2

where

σp² = [ (n1 − 1) σ1² + (n2 − 1) σ2² ] / (n1 + n2 − 2)

and the distribution of t is based on (n1 + n2 − 2) degrees of freedom.
Assumptions
(1) The populations from which the two samples are selected both have approximately normal
relative frequency distributions.
(2) The variances of the two populations are equal.
(3) The random samples are selected in an independent manner from the two populations.

Example 12.20
There was a research on the weights at birth of the children of urban and rural
women. The researcher suspects there is a significant difference between the mean
weights at birth of children of urban and rural women. To test this hypothesis, he
selects independent random samples of weights at birth of children of mothers
from each group, calculates the mean weights and standard deviations and
summarizes the results in Table 12.10. Test the researcher’s belief, using a significance
level of α = 0.02.
Table 12.10 : Weight at Birth Data

                Urban Mothers           Rural Mothers
                n1 = 15                 n2 = 14
                x̄1 = 3.5933 kg          x̄2 = 3.2029 kg
                σ1 = 0.3707 kg          σ2 = 0.4927 kg
Solution
The researcher wants to test the following hypothesis :
H0: (µ1 − µ2) = 0 (i.e., no difference between mean weights at birth)
H1: (µ1 − µ2) ≠ 0 (i.e., mean weights at birth of children of urban and rural
women are different),
where µ1 and µ2 are the true mean weights at birth of children of urban and
rural women, respectively.
Since the sample sizes for the study are small (n1 = 15, n2 = 14), the following
assumptions are required:
(i) The two populations of weights at birth of children both have approximately
normal distributions.
(ii) The variances of the populations of weights at birth of children for two
groups of mothers are equal.
(iii) The samples were independently and randomly selected.
If these three assumptions are valid, the test statistic will have a t-distribution with
(n1 + n2 − 2) = (15 + 14 − 2) = 27 degrees of freedom. With a significance level of
α = 0.02, the rejection region (Figure 12.10) is given by

t < − t0.01 = − 2.473 or t > t0.01 = 2.473

Figure 12.10 : Rejection Region


Since we have assumed that the two populations have equal variances (i.e. that
σ1² = σ2² = σ²), we need to compute an estimate of this common variance. Our
pooled estimate is given by

σp² = [ (n1 − 1) σ1² + (n2 − 1) σ2² ] / (n1 + n2 − 2) = [ (15 − 1) (0.3707)² + (14 − 1) (0.4927)² ] / (15 + 14 − 2) = 0.1881
Using this pooled sample variance in the computation of the test statistic, we obtain

t = (x̄1 − x̄2 − D0) / √( σp² (1/n1 + 1/n2) ) = (3.5933 − 3.2029 − 0) / √( 0.1881 (1/15 + 1/14) ) = 2.422
Now the computed value of t does not fall within the rejection region; thus, we fail
to reject the null hypothesis (at α = 0.02) and conclude that there is insufficient
evidence of a difference between the mean weights at birth of children of urban
and rural women.
In this example, we can see that the computed value of t is very close to the upper
boundary of the rejection region, which is specified by the significance level and the
degrees of freedom. How is the conclusion about the difference between the mean
weights at birth affected if the significance level is α = 0.05? We will answer the
question in the next example.
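The pooled-variance t test of Example 12.20 can be sketched the same way (an illustrative addition; the critical value 2.473 for 27 df at α = 0.02 is read from the t table rather than computed):

```python
import math

def pooled_t(x1, x2, s1, s2, n1, n2, d0=0.0):
    """Small-sample t statistic using the pooled variance (equal-variance assumption)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (x1 - x2 - d0) / se, n1 + n2 - 2  # statistic, degrees of freedom

# Birth-weight data of Table 12.10
t, df = pooled_t(3.5933, 3.2029, 0.3707, 0.4927, 15, 14)
print(round(t, 3), df)  # 2.422 27
# |t| < 2.473 (the tabled t_0.01 with 27 df), so H0 is not rejected at alpha = 0.02
```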
12.5.3 Test for Difference between Proportions
If two samples are drawn from different populations, we may be interested in finding out
whether the difference between the proportions of successes is significant or not. Let x1
and x2 be the numbers of items possessing the attribute A in random samples of sizes
n1 and n2 drawn from the two populations respectively. Then the sample proportions of
successes are P1 = x1/n1 and P2 = x2/n2.
Under the hypothesis that the proportions in the two populations are equal,

z = (P1 − P2) / √( P Q (1/n1 + 1/n2) )
In general, however, we do not know the populations’ common proportion of success. In such a
case, we can replace P by its best estimate, the pooled estimate of the actual proportion in
the population :

Pooled estimate P = (n1 P1 + n2 P2) / (n1 + n2) = (x1 + x2) / (n1 + n2), and Q = 1 − P.

Example 12.21
A machine produced 16 defective articles in a batch of 500. After overhauling, it
produced 3 defectives in a batch of 100. Has the machine improved?
Solution
Ho : P1 = P2, i.e. the machine has not improved after overhauling. H1 : P1 ≠ P2.

Now P1 = 16/500 = 0.032 and P2 = 3/100 = 0.030.

Pooled estimate of the actual proportion in the population is given by

P = (x1 + x2) / (n1 + n2) = (16 + 3) / (500 + 100) = 0.032

Q = 1 – P = 0.968


| z | = (P1 − P2) / √( P Q (1/n1 + 1/n2) ) = (0.032 − 0.030) / √( 0.032 × 0.968 × (1/500 + 1/100) )

| z | = 0.002/0.019 = 0.105 < 1.96 (at 5% level)

Since | z | < 1.96, Ho is not rejected, i.e. the machine has not improved significantly.
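Example 12.21 can be reproduced with a short function for the two-proportion z test (an illustrative sketch using the pooled-proportion formula given above; not part of the original unit):

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: P1 = P2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Defectives before and after overhauling (Example 12.21)
z = two_prop_z(16, 500, 3, 100)
print(round(abs(z), 3))  # about 0.104, well below 1.96, so H0 stands
```

The tiny difference from the hand value 0.105 comes from carrying the unrounded pooled proportion 19/600 through the computation.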
Example 12.22
There are 1000 students in a college, out of 20000 students in the whole university.
In a study, 200 students in the college and 1000 in the whole university were found
to be smokers. Is there a significant difference between the proportions of smokers
in the college and in the university?
Solution
Ho : P1 = P2, i.e. there is no significant difference between the proportions of
smokers in the college and in the university. H1 : P1 ≠ P2.
Proportion of smokers in the college, P1 = 200/1000 = 0.20
Proportion of smokers in the university, P2 = 1000/20000 = 0.05
Q2 = 1 − P2 = 0.95
Also n1 = 1000 and n1 + n2 = 20000, so n2 = 19000.

| z | = (P1 − P2) / √( (P2 Q2 / n1) × n2/(n1 + n2) ) = (0.20 − 0.05) / √( (0.05 × 0.95/1000) × (19000/20000) )

| z | = 0.15/0.0067 ≈ 22.3 > 2.58

Since the value is highly significant, the difference could not have arisen due to
sample fluctuations. Rejecting Ho, we conclude that there is a significant difference
between the proportions of smokers in the college and the university.
SAQ 1
(a) A stenographer claims that she can take dictation at the rate of 120 words
per minute. Can we reject her claim on the basis of 100 trials in which she
demonstrated a mean of 116 words with standard deviation of 15 words?
(b) An automatic machine was designed to pack exactly 2 kg of tea. A sample
of 100 packs was examined to test the machine. The average weight was
found to be 1.94 kg with standard deviation of 0.10 kg. Is the machine
working properly?
(c) Prior to the institution of a new safety program, the average number of on-
the-job accidents per day at a factory was 4.5. To determine if the safety
program has been effective in reducing the average number of accidents per
day, a random sample of 30 days is taken after the institution of the new
safety program and the number of accidents per day is recorded. The sample
mean and standard deviation were computed as follows :
x̄ = 3.7, σ = 1.3
(i) Is there sufficient evidence to conclude (at significance level 0.01)
that the average number of on-the-job accidents per day at the factory
has decreased since the institution of the safety program?
(ii) What is the practical interpretation of the test statistic computed in
part (i)?
(d) A patented medicine claimed that it is effective in curing 90% of the patients
suffering from malaria. From a sample of 200 patients using this medicine, it
was found that only 170 were cured. Determine whether the claim is right or
wrong (Take 1% level of significance).
(e) Random samples from two population gave the following results :
Population A Population B
Mean 490 500
SD 50 40
Size 300 300

Is the difference between the means significant?


(f) In two large populations, there are 30% and 25% fair haired people
respectively. Is the difference likely to be hidden in samples of 1200 and 900
respectively from the two populations?
(g) Two types of needles, the old type and the new type, were used for injection of
medical patients with a certain substance. The patients were allocated at
random to two groups, one to receive the injection from needles of the old
type, the other to receive the injection from needles of the new type. The
numbers of patients showing reactions to the injection are recorded below.
Does the information support the belief that the proportion of patients giving
reactions to needles of the old type is less than the corresponding proportion
of patients giving reactions to needles of the new type? Test at a significance
level of α = 0.01.
Data on the Patients’ Reactions

                                    Injected by Old Type Needles    Injected by New Type Needles
Number of sampled patients                      100                             100
Number in sample with reactions                  37                              56

(h) The breaking strengths of metal rods, produced by a certain company, have
mean as 820 kg and standard deviation as 50 kg. When a new manufacturing
process is adopted, it is claimed that the breaking strength can be improved.
A sample of 100 rods is tested and the results indicate a mean breaking
strength of 840 kg. Can we support this claim at a 1% level of significance?

SAQ 2
(a) A certain stimulus administered to each of 12 patients resulted in the
following increments in ‘Blood pressure’ 5, 2, 8, − 1, 3, 0, 6, − 2, 1, 5, 0, 4.
Can it be concluded that the stimulus will in general be accompanied by an
increase in blood pressure, given that for all df the value of t0.05 = 2.201?
(b) Two types of batteries are tested for their length of life and following results
are obtained.
                 No. of Sample (n)     Mean (x̄)       Variance
Battery A               10             500 hours        100
Battery B               10             560 hours        121
Is there a significant difference in the two batteries?


(c) The mean life time of a sample of 100 fluorescent light bulbs produced by a
company is computed to be 1570 hours with a standard deviation of
120 hours. The company claims that the average life of the bulbs produced
by it is 1600 hours. Using the level of significance of 0.05, is the claim
acceptable?
(d) A machinist is making engine parts with axle diameter of 0.70 inch. A
random sample of 10 parts shows mean diameter 0.742 inch with a standard
deviation of 0.04 inch. On the basis of this sample, would you say that the
work is inferior?
(e) The growth (in millimeters) in 15 days of the tumor induced in a mouse is
expected to be 4.0 millimeters. In order to test this contention a sample of
nine mice with induced tumor was observed for 15 days. The mean growth
in the sample was obtained to be 4.3 mm and the sample standard deviation
to be 1.2 mm. If the tumor growth can be assumed to follow a normal
distribution, test at 0.1 level of significance whether the contention is
correct.
(f) In a psychology class, a Professor read a report which noted that 30% of all
women are afraid of flying. A student was given the project to take a sample
and test if the report was correct. The student took a random sample of
150 women and found that 50 of these women were afraid of flying.
At α = 0.05, test if the sample conclusions are consistent with the
reported figures.
(g) The Dean of students wants to find out if there is any significant difference
in the mathematical ability of male and female students, as determined by
their achievement scores in Basic Skills test in mathematics. A random
sample of 50 female students and 100 male students was selected from all
the students who took the test in the autumn of 2004. Their results are
summarized as follows :
Male Students Female Students
Average score x1 = 75 x2 = 70
Standard deviation σ1 = 10 σ2 = 12
At 0.01 level of significance, test if there is a significant difference in the
average scores of male and female students.
(h) A sample of 8 students majoring in Economics was taken to test their IQ scores.
They were given a standardised test and their scores were recorded as
120, 116, 122, 125, 120, 115, 110, 132
Construct a 95% confidence interval for the true average IQ for all majoring
in Economics. Assume that the sample is from a Normal distribution.
(i) A soft drink vending machine is set to dispense 8 ounces per cup. If the
machine is tested 9 times yielding a mean cup fill of 8.2 ounces with a
standard deviation of 0.3 ounces, what can we conclude about the null
hypothesis of µ = 8 ounces against the alternate hypothesis of µ > 8 ounces
at α = 0.01.
(j) It is desired to test if there is any significant difference between average
ages of seniors at Kamla Nehru College and Gargi College. A random
sample of 10 seniors from Kamla Nehru revealed the average age to be
23 years with a standard deviation of 4 years. A similar random sample of
8 seniors from Gargi College revealed an average age of 26 years with a
standard deviation of 5 years. At 0.05 level of significance, is there a
difference between the average age of seniors at the two colleges?

12.6 CHI-SQUARE TEST


Tests like z-score and t are based on the assumption that the samples were drawn from
normally distributed populations or more accurately that the sample means were
normally distributed. As these tests require assumptions about the type of population or
parameters, these tests are known as ‘parametric tests’.
There are many situations in which it is impossible to make any rigid assumption about
the distribution of the population from which samples are drawn. This limitation led to
search for non-parametric tests. Chi-square (Read as Ki-square) test of independence and
goodness of fit is a prominent example of a non-parametric test. The chi-square (χ2) test
can be used to evaluate a relationship between two nominal or ordinal variables.
χ2 (chi-square) is a measure of the actual divergence of the observed and expected frequencies.
In sampling studies, we never expect that there will be a perfect coincidence between
actual and observed frequencies and the question that we have to tackle is about the
degree to which the difference between actual and observed frequencies can be ignored
as arising due to fluctuations of sampling. If there is no difference between actual and
observed frequencies then χ2 = 0. If there is a difference, then χ2 would be more than 0.
But the difference may also be due to sampling fluctuation, in which case the value of χ2
should be ignored in drawing the inference. Such values of χ2 under different conditions are
given in the form of tables and if the actual value is greater than the table value, it
indicates that the difference is not solely due to sample fluctuation and that there is some
other reason.
On the other hand, if the calculated χ2 is less than the table value, it indicates that the
difference may have arisen due to chance fluctuations and can be ignored. Thus χ2-test
enables us to find out the divergence between theory and fact or between expected and
actual frequencies.
If the calculated value of χ2 is very small compared to the table value, then the divergence
between the observed and expected frequencies is very small and the fit is good.
If the calculated value of χ2 is very large as compared to table value then divergence
between the expected and the observed frequencies is very big and the fit is poor.
We know that the degrees of freedom (df) are the number of values in a set of data that
are free to vary independently.
Suppose there is a 2 × 2 association table and the actual frequencies of the various classes
are as follows :

                 A          a         Total
     B       (AB) 22    (aB) 38        60
     b       (Ab) 8     (ab) 32        40
     Total     30         70          100

Now the formula for calculating the expected frequency of any class (cell) is

Expected frequency = (Row total for the row containing the cell × Column total for the column containing the cell) / Total number of observations

In notation : Expected frequency = (R × C) / N

For example, if we have two attributes A and B that are independent, then the expected
frequency of the class (cell) AB would be (30 × 60)/100 = 18.
Once the expected frequency of cell (AB) is decided the expected frequencies of
remaining three classes are automatically fixed.
Thus for class (aB) it would be 60 – 18 = 42
for class (Ab) it would be 30 – 18 = 12
for class (ab) it would be 70 – 42 = 28
This means that so far as a 2 × 2 association (contingency) table is concerned, there is
only 1 degree of freedom.
In such tables, the degrees of freedom are given by a formula n = (c – 1) (r – 1),
where c = Number of columns and r = Number of rows.
Thus in 2 × 2 table df = (2 – 1) (2 – 1) = 1
3 × 3 table df = (3 – 1) (3 – 1) = 4
4 × 4 table df = (4 – 1) (4 – 1) = 9 etc.
If the data is not in the form of a contingency table, but is a series of individual
observations or a discrete or continuous series, then df = n − 1, where n is the number of
frequencies or values of the number of independent individuals.

χ2 = Σ [ (Observed frequency − Expected frequency)² / Expected frequency ]

i.e.  χ2 = Σ [ (O − E)² / E ]
where O = Observed frequency and E = Expected frequency.

Example 12.23
The following table shows the number of people interviewed in each age group and
the number in each group estimated to have T. B.

Age Group       Nos. Interviewed        T. B. Cases
15 – 20                199                    1
20 – 25                300                    8
25 – 35               1128                   38
35 – 45               1375                   96
45 – 55               1089                  105
55 – 65                625                   56
65 – 75                155                   12
Total                 4871                  316

Do these figures justify the hypothesis that T. B. is equally prevalent in all age
groups?
Solution
If T.B. is equally prevalent in all age groups, then in each age group
(316/4871) × 100 = 6.5% of the people would suffer from it.
On this basis, the observed and expected frequencies would be as follows :

Age Group       Observed Cases     Expected Cases
15 – 20                1                 13
20 – 25                8                 19.5
25 – 35               38                 73
35 – 45               96                 89
45 – 55              105                 71
55 – 65               56                 40.5
65 – 75               12                 10

Using the formula, we get

χ2 = (1 − 13)²/13 + (8 − 19.5)²/19.5 + (38 − 73)²/73 + (96 − 89)²/89 + (105 − 71)²/71 + (56 − 40.5)²/40.5 + (12 − 10)²/10

Therefore, χ2 ≈ 57.8

Also df = (c − 1) (r − 1) = (2 − 1) (7 − 1) = 6

From the χ2 table, the value at the 0.05 level with 6 df is 12.59.

Since 57.8 > 12.59 at the 0.05 level of significance for 6 degrees of freedom, the
difference is significant and the hypothesis is not justified.
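The χ2 computation of Example 12.23 is easy to reproduce (an illustrative sketch; the expected counts are 6.5% of each age group, as in the solution's table, so small rounding differences from a fully unrounded computation are expected):

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [1, 8, 38, 96, 105, 56, 12]        # T.B. cases per age group
expected = [13, 19.5, 73, 89, 71, 40.5, 10]   # 6.5% of those interviewed
chi2 = chi_square(observed, expected)
print(round(chi2, 1))  # about 57.8, far above 12.59 (chi-square table, 6 df, alpha = 0.05)
```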
SAQ 3
12 dice were thrown 4096 times and a throw of 6 was reckoned as a success. The
observed frequencies were as given below :
Number of Successes     0      1      2     3     4     5    6    7    Total
Frequencies           447   1145   1181   796   380   115   24    8    4096

Find the value of χ2 on the hypothesis that the dice were unbiased and hence show
that the data is consistent with the hypothesis so far as the χ2 test is concerned.
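One way to approach SAQ 3 numerically (a sketch, not the unit's worked answer): under the hypothesis of unbiased dice, the number of sixes in a throw of 12 dice is binomial with p = 1/6, so the expected frequency of k successes is 4096 × P(X = k). The sketch below assumes the last class of the table collects throws with 7 or more successes, so that the expected frequencies sum to 4096:

```python
from math import comb

n_dice, p, n_throws = 12, 1 / 6, 4096
observed = [447, 1145, 1181, 796, 380, 115, 24, 8]

# Binomial expected frequencies for k = 0..6 sixes; the last class
# is taken here as "7 or more", so the expected frequencies total 4096.
expected = [n_throws * comb(n_dice, k) * p ** k * (1 - p) ** (n_dice - k)
            for k in range(7)]
expected.append(n_throws - sum(expected))

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # well below 14.07, the 5% chi-square value for 7 df
```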

12.7 SUMMARY
In this unit we have learnt the procedures for testing hypotheses about various population
parameters.
In many practical problems, statisticians are called upon to make decisions about a
statistical population on the basis of simple observations. In attempting to reach such
decision, it is necessary to make certain assumptions or guesses about the characteristics
of population, particularly about the probability distribution or the value of its
parameters. Such an assumption or statement about the population is called a Statistical
Hypothesis. The validity of a hypothesis will be tested by analysing the sample. The
procedure which enables us to decide whether a certain hypothesis is true or not, is called
Testing of Hypothesis.
A statistical test of significance involves two mutually exclusive and exhaustive
hypotheses, the null hypothesis (H0) and the alternative hypothesis (H1).
The null hypothesis specifies an expected value for the population. It usually takes the
form of “no effect” or “no difference”.
The alternative hypothesis denies the null hypothesis.
Statistical proof is both indirect and probabilistic. By rejecting the null hypothesis, we
assert the alternative hypothesis.
Rejection of the null hypothesis involves a judgement based on probability. If the
obtained result would have rarely occurred in the sampling distribution of the statistic,
we reject H0 and assert H1.
“Rarely” is, in turn, defined by probability. The 5 percent significance level means that
the result would be obtained by chance 5 percent of the time or less. Similarly, using
α = 0.01, H0 is rejected when the result would be obtained by chance 1 percent of time or
less.
Statistical proof is not absolute. Two types of error that may be made are Type-I or
Type α error and Type-II or Type β error.
A Type-I error consists of falsely rejecting H0; i.e. rejecting H0 when it is true. The
probability of this type of error is α.
A Type-II error occurs when we fail to reject H0, when in fact H0 is false.
Both the null and alternative hypothesis may either be non-directional or directional.
When non-directional, the critical region is found in both tails of the sampling
distribution. When directional, the critical region is only one-tailed.
Tests involving two sample means usually involve two different conditions, i.e. there are
two samples, which are drawn from two populations. The usual H0 is that the mean of the
first population equals the mean of the second population. Rejection of H0 permits us to
infer that the conditions produced different results.
Dependent tests of significance are used when the measurements are paired in some way.
This may be accomplished by using before-after measures on the same persons or objects
or by matching them on some known basis.
Hypothesis testing is an inferential process, which means that it uses limited information
as the basis for reaching a general conclusion. A sample provides only limited or
incomplete information about the whole population. This means we could make incorrect
conclusions.
Hypothesis testing involving two sample proportions is conceptually similar to the test of
significance of the difference between means. H0 is typically that the proportion of the
first population equals the proportion of the second population. Rejection of H0 permits
us to infer differences in the populations of the variable of interest.
Chi-square (χ2) test of independence and goodness of fit is a prominent example of a
non-parametric test. The chi-square test (χ2) can be used to evaluate a relationship
between two nominal or ordinal variables. Finally, we also discussed the Chi-square
test.

12.8 KEY WORDS


Statistical Hypothesis : Any statement or assertion about a statistical
population or the values of its parameters is called
a statistical hypothesis.
Test of Hypothesis : A test of hypothesis is a procedure which specifies
a set of ‘rules for decision’ whether to “accept” or
“reject” the hypothesis under consideration (i.e.
null hypothesis).
Null Hypothesis : A statistical hypothesis which is set up (i.e.
assumed) and whose validity is tested for possible
rejection on the basis of sample observations is
called a Null Hypothesis. It is denoted by H0.
Alternative Hypothesis : A statistical hypothesis which differs from the null
hypothesis is called an Alternative Hypothesis,
and is denoted by H1. The alternative hypothesis is
not tested, but its acceptance (rejection) depends
on the rejection (acceptance) of the null
hypothesis.
Critical Region : The set of values of the test statistic which lead to
rejection of the null hypothesis is called critical
region of the test.
Level of Significance : The maximum probability with which a true null
hypothesis is rejected is known as the level of
significance of the test and is denoted by α.
Type-I Error : A Type-I error is the error of rejecting the null
hypothesis when it is true. The probability of
committing a Type-I error is usually denoted by α.
Type-II Error : A Type-II error is the error of accepting the null
hypothesis when it is false. The probability of
making a Type-II error is usually denoted by β.

12.9 ANSWERS TO SAQs


SAQ 1
(a) The hypothesis to be tested is that her claim is valid.
In notations Ho : µ = 120 and H1 : µ ≠ 120
Substituting x̄ = 116, σ = 15, n = 100 in

| z | = | x̄ − µ | / (σ/√n), we get

| z | = | 116 − 120 | / (15/√100) = 2.67
The difference is significant at both the 5% and 1% levels of significance, i.e.
the z-score 2.67 is highly significant. Hence Ho is rejected, i.e. her
claim is to be rejected.
If somebody is interested in the number of trials for which, with the same
figures, her claim would not have been rejected, he can proceed as follows :

| z | = | 116 − 120 | / (15/√n) = 4√n/15 ≤ 1.96

i.e. √n ≤ 1.96 × 15/4 = 7.35

i.e. n ≤ (7.35)² = 54.02

i.e. n = 54 trials.
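The back-calculation above can be checked numerically (an illustrative sketch; 1.96 and 2.58 are the usual 5% and 1% two-tailed critical values):

```python
import math

def z_stat(xbar, mu, sigma, n):
    """One-sample z statistic for a mean with known (or large-sample) sigma."""
    return (xbar - mu) / (sigma / math.sqrt(n))

# |z| = 2.67 exceeds 2.58, so the claim is rejected even at the 1% level
assert abs(z_stat(116, 120, 15, 100)) > 2.58

# Largest n for which |z| <= 1.96 with the same sample figures
n_max = math.floor((1.96 * 15 / 4) ** 2)
print(n_max)  # 54
```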
(b) The null hypothesis to be tested is that the machine is working properly.
In notations H0 : µ = 2 kg and H1 : µ ≠ 2 kg.
Substituting x̄ = 1.94, σ = 0.10, n = 100, we get

| z | = | x̄ − µ | / (σ/√n) = | 1.94 − 2 | / (0.10/√100) = 6.0 > 2.58
The z-score is highly significant, hence, we reject Ho on the basis of this
sample, i.e. the machine is not working properly.
(c) (i) In order to determine whether the safety program was effective, we
will conduct a large-sample test of
H0 : µ = 4.5 (i.e., no change in average number of on-the-job
accidents per day)
H1 : µ < 4.5 (i.e., average number of on-the-job accidents per day has
decreased)
where µ represents the average number of on-the-job accidents per
day at the factory after institution of the new safety program. For a
significance level of α = 0.01, we will reject the null hypotheses if
z < − z0.01 = − 2.33
The computed value of the test statistic is

z = (x̄ − µ0) / (σ/√n) = (3.7 − 4.5) / (1.3/√30) = − 3.37
Since this value does fall within the rejection region, there is
sufficient evidence (at α = 0.01) to conclude that the average number
of on-the-job accidents per day at the factory has decreased since the
institution of the safety program. It appears that the safety program
was effective in reducing the average number of accidents per day.
(ii) If the null hypothesis is true, µ = 4.5. Recall that for large samples,
the sampling distribution of x̄ is approximately normal, with mean
µx̄ = µ and standard deviation σx̄ = σ/√n. Then the z-score for x̄,
under the assumption that H0 is true, is given by

z = (x̄ − 4.5) / (σ/√n)

Figure 12.11 : Location of Rejection Region


You can see that the test statistic computed above is simply the
z-score for the sample mean x , if in fact µ = 4.5. A calculated z-score
of − 3.37 indicates that the value of x computed from the sample
falls a distance of 3.37 standard deviations below the hypothesized
mean of µ = 4.5. Of course, we would not expect to observe a z-score
this extreme if in fact µ = 4.5.
(d) The null hypothesis is that the claim is quite right, i.e.
H0 : P = P0 where P0 = 90% = 0.9 and H1 : P ≠ 0.9.
Also qo = 1 – P0 = 0.1 and n = 200.
The sample proportion p̂ = 170/200 = 0.85

| z | = | p̂ − p0 | / √( p0 q0/n ) = | 0.85 − 0.9 | / √( 0.9 × 0.1/200 ) = 2.36 < 2.58
The null hypothesis Ho is not rejected at the 1% level of significance, and the
claim is justified.
(e) H0 : µ1 = µ2, i.e. the difference between the two means is not significant;
H1 : µ1 ≠ µ2.
We have x1 = 490, x2 = 500, σ1 = 50, σ2 = 40, n1 = 300 and n2 = 300.

| z | = | x̄1 − x̄2 | / √( σ1²/n1 + σ2²/n2 ) = | 490 − 500 | / √( (50)²/300 + (40)²/300 )

= 10 / √(4100/300) = 10/3.70 = 2.71 > 1.96
Therefore, the null hypothesis H0 is rejected, i.e. the difference between the
two means is significant.
(f) P1 = 30% = 0.30 and P2 = 25% = 0.25, q1 = 0.70 and q2 = 0.75, n1 = 1200
and n2 = 900.

| z | = (P1 − P2) / √( P1 q1/n1 + P2 q2/n2 ) = (0.30 − 0.25) / √( (0.3)(0.7)/1200 + (0.25)(0.75)/900 )

= 0.05/0.0196 ≈ 2.55
Therefore, | z | > 1.96, (i.e. at 5% level of significance). Hence, it is unlikely
that the real difference will be hidden.
Note : At times you may be interested in comparing the proportion of persons
possessing an attribute in a sample with the proportion given by the
population. In that case use :

| z | = (P1 − P2) / √( (P2 q2 / n1) × n2/(n1 + n2) )

where P2 = Population proportion, q2 = 1 − P2,
n1 = Number of observations in the sample,
n1 + n2 = Size of the population, and
n2 = (Size of population − n1).
(g) We wish to perform a test of
H0 : (p1 − p2) = 0
H1 : (p1 − p2) < 0
where
p1 = Proportion of patients giving reactions to needles of the old type.
p2 = Proportion of patients giving reactions to needles of the new type.
For this large-sample, one-tailed test, the null hypothesis will be rejected if
z < − z0.01 = − 2.33
The sample proportions p1 and p2 are computed for substitution into the
formula for the test statistic:
p̂1 = Sample proportion of patients giving reactions with needles of the old
type = 37/100 = 0.37

p̂2 = Sample proportion of patients giving reactions with needles of the
new type = 56/100 = 0.56
Hence, qˆ1 = 1 − pˆ 1 = 1 − 0.37 = 0.63 and qˆ 2 = 1 − pˆ 2 = 1 − 0.56 = 0.44
Since D0 = 0 for this test of hypothesis, the test statistic is given by

z = (p̂1 − p̂2 − D0) / √( p̂ q̂ (1/n1 + 1/n2) )
where

p̂ = (Total number of patients giving reactions with needles of both types) / (Total number of patients sampled) = (37 + 56)/(100 + 100) = 0.465,

and q̂ = 1 − p̂ = 0.535
Then we have

z = (0.37 − 0.56 − 0) / √( (0.465) (0.535) (1/100 + 1/100) ) = − 2.69
This value falls below the critical value of − 2.33. Thus, at α = 0.01, we
reject the null hypothesis; there is sufficient evidence to conclude that the
proportion of patients giving reactions to needles of the old type is
significantly less than the corresponding proportion of patients giving
reactions to needles of the new type, i.e. p1 < p2.
The inference derived from the test in SAQ 1(g) is valid only if the sample
sizes, n1 and n2, are sufficiently large to guarantee that the intervals

p̂1 ± 2 √( p̂1 q̂1 / n1 ) and p̂2 ± 2 √( p̂2 q̂2 / n2 )

do not contain 0 and 1. This requirement is satisfied for SAQ 1(g) :

p̂1 ± 2 √( p̂1 q̂1 / n1 ) = 0.37 ± 2 √( (0.37)(0.63)/100 ) = 0.37 ± 0.097, or (0.273, 0.467)

p̂2 ± 2 √( p̂2 q̂2 / n2 ) = 0.56 ± 2 √( (0.56)(0.44)/100 ) = 0.56 ± 0.099, or (0.461, 0.659)

Figure 12.12 : Rejection Region
(h) Setting up the two hypotheses as :

Null hypothesis : Ho : µ = 820 kg ⇒ there is no change.
Researcher’s hypothesis : H1 : µ > 820 kg ⇒ Breaking strength is improved.
A one-tailed test at the 0.01 level of significance is used for the decision rule.
Now µ = 820, x̄ = 840, n = 100 and σ = 50, then

z = (x̄ − µ) / (σ/√n) = (840 − 820) / (50/√100) = 20/5 = 4.00
Now the z-score 4.00 > 2.33, i.e. the null hypothesis Ho is rejected and the
researcher’s hypothesis is accepted: we support the claim that the
breaking strength is improved by the new manufacturing process.

Figure 12.13

Note : In practice, one should adopt a one-tailed test only when there is enough
reason to expect that the difference will be in a specified direction. A two-tailed
test is more conservative than a one-tailed test, since it requires a more extreme
test statistic to reject the null hypothesis.

SAQ 2
(a) x̄ = Σ xi / n = 31/12 = 2.583 ≈ 2.6

Also σ = √( Σ (xi − x̄)² / (n − 1) ) = √( 104.92/(12 − 1) ) = 3.08

The null hypothesis Ho : µ = 0, i.e. assuming that the stimulus will not be
accompanied by an increase in blood pressure (or the mean increase in blood
pressure for the population is zero).

Now t = ( | x̄ − µ | / σ ) √n = ( (2.6 − 0)/3.08 ) √12 = 2.924
The table value, t0.05, n = 11 = 2.201
Therefore, 2.924 > 2.201.
Thus the null hypothesis Ho is rejected, i.e. we find that our assumption is
wrong and we say that as a result of the stimulus the blood pressure will
increase.
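The one-sample t computation of part (a) can be reproduced directly from the raw increments (an illustrative sketch; using the unrounded mean 31/12 gives t ≈ 2.90 rather than the 2.924 obtained with the rounded mean 2.6, and the conclusion is unchanged):

```python
import math
from statistics import mean, stdev  # stdev uses the (n - 1) divisor

increments = [5, 2, 8, -1, 3, 0, 6, -2, 1, 5, 0, 4]
xbar, s, n = mean(increments), stdev(increments), len(increments)
t = (xbar - 0) / (s / math.sqrt(n))  # one-sample t against mu = 0
print(round(xbar, 3), round(s, 2), round(t, 2))  # 2.583 3.09 2.9
# t exceeds t_{0.05, 11} = 2.201, so H0 is rejected
```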
(b) The null hypothesis Ho : µ1 = µ2, or Ho : µ1 − µ2 = 0, i.e. there is no
significant difference between the two batteries.
Alternative hypothesis H1 : µ1 ≠ µ2, or H1 : µ1 − µ2 ≠ 0.
Now n1 = 10, σ1 = √100 = 10, x̄1 = 500, n2 = 10, σ2 = √121 = 11
and x̄2 = 560.

Thus σ²P = [(n1 − 1) σ1² + (n2 − 1) σ2²] / (n1 + n2 − 2)

= [(10 − 1) (10)² + (10 − 1) (11)²] / (10 + 10 − 2)

= 110.5

Using the formula t = (x̄1 − x̄2 − (µ1 − µ2)) / √[σ²P (1/n1 + 1/n2)], where µ1 − µ2 = 0,

we get t = (500 − 560) / √[110.5 (1/10 + 1/10)] = − 12.76

The degrees of freedom v (df) = 10 + 10 − 2 = 18.
For v = 18, t0.05 = 2.1.
Therefore, | t | = 12.76 > 2.1 (much higher).
Thus the difference is highly significant (rejection of Ho)
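The pooled-variance two-sample t computation in (b) can be sketched as follows (pooled_t is our own helper name):

```python
import math

def pooled_t(x1, x2, s1, s2, n1, n2):
    """Two-sample t with pooled variance (equal population variances assumed)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sp2

t, sp2 = pooled_t(500, 560, 10, 11, 10, 10)
print(round(sp2, 1))     # → 110.5
print(round(abs(t), 2))  # → 12.76, far beyond t0.05(18 df) = 2.1
```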
(c) x̄ = 1570, µ = 1600, σ = 120, n = 100

Now, t = (x̄ − µ)/(σ/√n) = (1570 − 1600)/(120/√100) = − 2.5

At the 0.05 level of significance, the critical value is 1.96.
Since | t | = 2.5 > 1.96, the claim is to be rejected.
(d) x̄ = 0.742, µ = 0.70, σ = 0.04, n = 10

Now, t = (x̄ − µ)/(σ/√(n − 1)) = (0.742 − 0.70)/(0.04/√(10 − 1)) = 3.15

At the 0.05 level of significance, the critical value is t = 2.262.
Since the calculated t > 2.262, the claim is to be rejected.
(e) x̄ = 4.3, µ = 4, σ = 1.2, n = 9

Now, t = (x̄ − µ)/(σ/√n) = (4.3 − 4)/(1.2/√9) = 0.75

From the table, t(n − 1, α/2) = t(8, 0.05) = 1.86.
Since the calculated | t | < 1.86, we have no reason to reject H0.
(f) Clearly, it is a two-tailed test, so that
H0 : π = 0.30, and
H1 : π ≠ 0.30

Now, z = (p − π)/σp

where σp = √[π (1 − π)/n] = √[(0.3 × 0.7)/150] = 0.037

and p = 50/150 = 0.33

then z = (0.33 − 0.3)/0.037 = 0.81

Since the z value of 0.81 is less than the critical value of z at α = 0.05,
which is 1.96, we cannot reject the null hypothesis.
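The proportion test in (f) can be verified numerically. A sketch (prop_z is our own name; p is rounded to 0.33 first, as in the text, so the statistic comes out close to the text's 0.81):

```python
import math

def prop_z(p_hat, pi0, n):
    # z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    return (p_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)

z = prop_z(0.33, 0.30, 150)
print(round(z, 1))    # → 0.8 (the text's 0.81 rounds sigma_p to 0.037 first)
print(abs(z) < 1.96)  # → True: cannot reject H0
```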
(g) It is a two-tailed test, so that
H0 : µ1 = µ2
H1 : µ1 ≠ µ2
Now, we have n1 = 100, x̄1 = 75, σ1 = 10, n2 = 50, x̄2 = 70, σ2 = 12 and α = 0.01.

z = (x̄1 − x̄2)/√(σ1²/n1 + σ2²/n2) = (75 − 70)/√(10²/100 + 12²/50) = 2.54

Since this value of z is less than the critical value of z at α = 0.01 for a
two-tailed test, which is 2.58, we cannot reject the null hypothesis.
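The two-sample z statistic in (g) can be sketched as follows (two_sample_z is our own helper; population standard deviations are taken as known):

```python
import math

def two_sample_z(x1, x2, s1, s2, n1, n2):
    # z = (x1 - x2) / sqrt(s1^2/n1 + s2^2/n2)
    return (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

z = two_sample_z(75, 70, 10, 12, 100, 50)
print(round(z, 2))  # → 2.54
print(z < 2.58)     # → True: cannot reject H0 at alpha = 0.01 (two-tailed)
```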
(h) As the sample size is small, a t-distribution represents the data more
closely than the z-distribution.
Mean x̄ = 120

and σ = √[Σ (x − x̄)²/(n − 1)] = 6.69

The value of t at α/2 = 0.025 and (n − 1) = 7 degrees of freedom is 2.365.
Then, the confidence limits are x̄ ± t σ/√n.

Figure 12.14

x1 = x̄ − t σ/√n = 120 − (2.365) (6.69)/√8 = 114.42
and x2 = x̄ + t σ/√n = 125.58
Hence 114.42 ≤ µ ≤ 125.58
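The confidence limits in (h) can be reproduced as follows (t_ci is our own helper name; the second decimal differs slightly from the text, which rounds t to 2.36 at one step):

```python
import math

def t_ci(xbar, s, n, t_crit):
    """Confidence interval xbar +/- t * s / sqrt(n)."""
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half

lo, hi = t_ci(120, 6.69, 8, 2.365)
print(round(lo, 1), round(hi, 1))  # → 114.4 125.6
```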
(i) Since the sample size is small, we use the t-test.

Now, t = (x̄ − µ)/(σ/√n)

With x̄ = 8.2, µ = 8, σ = 0.3, n = 9, we get

t = (8.2 − 8)/(0.3/√9) = 2

The critical value of t from the table at α = 0.01 for a one-tail test and df
(degrees of freedom) = 8 is 2.896.
Since our calculated value of t is less than the critical value of t, we cannot
reject the null hypothesis.
(j) H0 : µ1 = µ2
H1 : µ1 ≠ µ2

Now, t = (x̄1 − x̄2)/√[σ²p (1/n1 + 1/n2)]

where σ²p = [(n1 − 1) σ1² + (n2 − 1) σ2²]/(n1 + n2 − 2)

= [(10 − 1) 4² + (8 − 1) 5²]/(10 + 8 − 2) = 19.94

and t = (23 − 26)/√[19.94 (1/10 + 1/8)] = − 1.41

The critical value of t from the table at α = 0.05 for a two-tail test and
df = (n1 + n2 − 2) = 16 is 2.12. Since the numerical value of our calculated
t is less than the critical value of t, we cannot reject the null hypothesis.
SAQ 3
On the hypothesis of unbiased dice, the theoretical frequencies in 4096 throws are
the terms in the binomial expansion of 4096 (5/6 + 1/6)^12 and are given below :

Number of Successes :   0     1     2     3    4    5    6   7   Total
Frequencies         :  459  1102  1212  808  364  116  27   8   4096

Using the formula :

χ² = (O1 − E1)²/E1 + (O2 − E2)²/E2 + (O3 − E3)²/E3 + … + (O8 − E8)²/E8

we get χ² = (447 − 459)²/459 + (1145 − 1102)²/1102 + … + (8 − 8)²/8 = 5.811
The number of classes is 8, and since the totals of the observed and theoretical
frequencies agree, n (df) = 8 − 1 = 7. From the table,
χ²0.05 (n = 7) = 14.07, i.e. the calculated value of χ² is not significant and thus the
observed frequency distribution is consistent with the hypothesis.
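The χ² sum can be sketched in Python (chi_square is our own helper). Only three of the eight observed counts (447, 1145 and 8) are quoted in the worked solution, so the sketch shows the contribution of just those classes; the text's full eight-class sum is 5.811:

```python
def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over the classes."""
    return sum((o - e)**2 / e for o, e in zip(observed, expected))

# Contributions of the three classes whose observed counts the text quotes.
partial = chi_square([447, 1145, 8], [459, 1102, 8])
print(round(partial, 3))  # → 1.992
```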

FURTHER READING
Grant, E. L. and Leavenworth, R. S. (1980), Statistical Quality Control, 5th Ed., McGraw
Hill, New York.
Grewal, B. S. (2004), Higher Engineering Mathematics, Khanna Publishers, New Delhi.
Kreyszig, Erwin (2002), Advanced Engineering Mathematics, John Wiley and Sons, Inc.
Runyon, Richard P. and Haber, Audrey, Business Statistics, Richard D. Irwin, Inc.
Stroud, K. A. (2002), Engineering Mathematics, Macmillan Press Ltd.
Stuart, A. and Ord, J. K., Kendall's Advanced Theory of Statistics, Volumes 1 and 2,
5th Ed., Arnold, Kent, UK.
Walpole, Ronald E. and Myers, Raymond H. (2002), Probability and Statistics for
Engineers and Scientists, Macmillan Publishing Company, New York.
