Hypothesis Testing Study Material
5.1 Population: The group (of size N) of units, items, individuals, observations or objects
whose characteristics are the subject of a statistical investigation is known as the
population. The population is the totality of all relevant data.
OR
It is the collection of a specified group of similar objects, individuals or entities that share
some common observable characteristics.
The population may be finite or infinite, depending on whether its size N is finite or infinite.
It may be real or hypothetical.
5.2 Sample: A part or portion of units selected from population on the basis of some definite
norm is called a sample.
➢ The size of the sample (the number of units it contains) is denoted by n (where n < N).
When n ≥ 30, the sample is said to be large; when n < 30, the sample is said to be small.
➢ For example, if the students of GTU form the population, the students of SNPIT form a sample.
5.3 Sampling: The process of drawing samples from a given population is known as sampling.
➢ There are many methods of sampling – Random sampling, Simple random sampling,
Stratified sampling, Systematic Sampling, Purposive sampling.
5.3.1 Random Sampling: If each unit of the population has the same chance of being selected in
the sample, then the sampling is said to be random sampling.
➢ In random sampling with replacement, each unit of the population may be chosen more
than once since the unit is replaced in the population. Whereas, in random sampling
without replacement, each unit of the population can be chosen only once since the unit is
not replaced in the population.
➢ Thus, sampling is treated as sampling from an infinite population when the population
itself is infinite, or when a finite population is sampled with replacement.
➢ Likewise, sampling is treated as sampling from a finite population when a finite
population is sampled without replacement.
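The difference between the two schemes can be illustrated with a minimal Python sketch. The population of ten numbered units, the sample size and the seed are hypothetical values chosen only for illustration.

```python
import random

population = list(range(1, 11))  # hypothetical finite population, N = 10
rng = random.Random(42)          # fixed seed so the run is repeatable
n = 4                            # sample size

# With replacement: a drawn unit goes back, so it may be drawn again.
with_replacement = rng.choices(population, k=n)

# Without replacement: a drawn unit is not returned, so it appears at most once.
without_replacement = rng.sample(population, k=n)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Here `random.choices` may repeat a unit, whereas `random.sample` never does.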
5.4 Population parameters and sample statistics: Statistical measures or constants obtained
from the population, such as the population mean μ and the population variance σ², are known
as population parameters or simply parameters. A parameter is based on all units of the
population and is denoted by θ.
Statistical measures obtained from sample observations, such as the sample mean x̄ and the
sample variance s², are known as sample statistics or simply statistics.
The following table gives the notation used for the different statistical measures as
parameters and as statistics.
Statistical measure     Population parameter     Sample statistic
Size                    N                        n
Mean                    μ                        x̄
Standard Deviation      σ                        s
Variance                σ²                       s²
Proportion              P                        p
Correlation             ρ                        r
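As a small illustration of this notation, the sketch below computes a few of these measures with Python's standard `statistics` module. The data are hypothetical, and the sample variance is assumed to use the divisor n − 1 (the convention of `statistics.variance`); some texts divide by n instead.

```python
import statistics

population = [12, 15, 11, 18, 14, 16, 13, 17]  # hypothetical population, N = 8
sample = [15, 11, 16, 13]                      # hypothetical sample, n = 4

# Parameters: computed from every unit of the population.
mu = statistics.mean(population)               # population mean
sigma2 = statistics.pvariance(population)      # population variance (divisor N)

# Statistics: computed from the sample only.
x_bar = statistics.mean(sample)                # sample mean
s2 = statistics.variance(sample)               # sample variance (divisor n - 1, an assumption)

print(f"mu = {mu}, sigma^2 = {sigma2}")
print(f"x_bar = {x_bar}, s^2 = {s2:.4f}")
```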
Note that a statistic varies from sample to sample, but the parameter remains constant.
This variation in the value of a statistic is known as sampling fluctuation. Since a statistic
is a random variable, it must have a probability distribution.
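Sampling fluctuation can be seen in a short simulation. This is only a sketch: the population is generated artificially (1000 values from a normal distribution with mean 50 and standard deviation 10), and the sample size 30 and the seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for a repeatable run

# Hypothetical population of N = 1000 values.
population = [rng.gauss(50, 10) for _ in range(1000)]
mu = statistics.mean(population)  # the parameter: one fixed number

# The statistic: recomputed for each of five samples of size n = 30.
sample_means = [statistics.mean(rng.sample(population, 30)) for _ in range(5)]

print(f"population mean: {mu:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:   {m:.2f}")
```

The population mean is computed once and does not change, while each new sample gives a slightly different sample mean.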
5.5 Sampling Distribution: Given a finite population of size N, draw all possible samples each
of the same size n. Then the total number of all possible samples each of the same size n is
given by
$$ {}^{N}C_{n} = \binom{N}{n} = \frac{N!}{(N-n)!\,n!} = k \ \text{(say)} $$
For each of these samples, compute a statistic t (for example, the mean or the variance).
The value of t will vary from sample to sample, as shown below.
Sample Number     1      2      3      ......    k
Statistic t       t_1    t_2    t_3    ......    t_k
The aggregate of these values of 𝒕, together with their relative frequencies or probabilities
with which they occur, constitutes the sampling distribution of the statistic 𝑡.
If the statistic 𝑡 used is the sample mean, then the distribution is called the sampling
distribution of mean.
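For a very small population the construction above can be carried out exhaustively. The sketch below assumes a hypothetical population of five values and samples of size 2 drawn without replacement; it lists every possible sample, computes each sample mean, and tabulates the relative frequencies.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

population = [2, 4, 6, 8, 10]  # hypothetical population, N = 5
n = 2                          # sample size

# All possible samples of size n (without replacement).
samples = list(combinations(population, n))
k = len(samples)
assert k == comb(len(population), n)  # k = N! / ((N - n)! n!)

# Statistic t = sample mean, computed for every sample.
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each value of t with its relative frequency.
counts = Counter(means)
sampling_distribution = {m: Fraction(c, k) for m, c in sorted(counts.items())}

for m, p in sampling_distribution.items():
    print(f"mean = {m}, probability = {p}")

expected_mean = sum(m * p for m, p in sampling_distribution.items())
population_mean = Fraction(sum(population), len(population))
print("mean of sample means:", expected_mean, "| population mean:", population_mean)
```

In this example the mean of the sampling distribution of the mean coincides with the population mean.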
5.6 Statistical Inference: The theory of statistical inference (also known as Decision theory)
consists of those methods by which one makes inferences or generalizations about a
population based on the information provided by samples selected from the population.
It has two parts:
(a) Estimation: Population parameters are estimated on the basis of sample information.
For example, a manufacturer is interested in estimating the average life of his product,
the proportion of defective items in his lot, the average demand for his product etc.
(b) Hypothesis Testing: Here, a hypothetical statement is made about the population
parameter, and it is tested at a certain level of significance, on the basis of sample
information, to decide whether the hypothesis should be accepted or rejected.
For example, consider a housewife who wants to find out whether brand A detergent cleans
brighter than brand B detergent. She may start with the hypothesis that brand A is better
and, after proper testing, accept or reject that hypothesis.
5.7.1 Statistical Hypothesis: It is an assumption or statement (which may or may not be true)
concerning one or more populations. For example,
(I) The average marks of students of IT classes and CE classes of GTU are the same.
(II) Population mean μ = 30.
5.7.2 Testing of Hypothesis: The truth or falsity of a statistical hypothesis can be known
with certainty only if we examine the entire population, which is impractical in most cases.
Instead, we take a random sample from the population and use the information contained
in the sample to decide whether the hypothesis is likely to be true or false.
Step 1: State the Null Hypothesis (H0) and Alternative Hypothesis (H1)
➢ Null Hypothesis (H0): It is the statistical hypothesis that is actually tested for
acceptance or rejection. The capital letter H stands for hypothesis, and the subscript
zero implies no difference between the sample statistic and the population parameter
value. The null hypothesis is usually considered true until it is proved false on the
basis of sample data. A null hypothesis is always expressed in the form of a
mathematical statement.
• 𝑯𝟎 : 𝝁 = 𝝁𝟎 𝑯𝟏 : 𝝁 ≠ 𝝁𝟎
• 𝑯𝟎 : 𝝁 ≥ 𝝁𝟎 𝑯𝟏 : 𝝁 < 𝝁𝟎
• 𝑯𝟎 : 𝝁 ≤ 𝝁𝟎 𝑯𝟏 : 𝝁 > 𝝁𝟎
The level of significance is denoted by α and is specified before the samples are drawn. It
is the probability of rejecting the null hypothesis H0 when it is true, i.e. the maximum
probability with which we are willing to risk rejecting a hypothesis that is in fact true.
In practice, a level of significance of α = 0.05 or α = 0.01 is commonly used, although
other values are also used. The smaller value α = 0.01 is chosen when stronger evidence is
required before H0 is rejected.
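The meaning of α as the probability of rejecting a true H0 can be checked by simulation. The following is a sketch under stated assumptions, not a prescribed procedure: it assumes a two-tailed z-test of H0: μ = μ0 with a known population standard deviation, and the values of μ0, σ, n and the number of trials are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

rng = random.Random(1)                         # fixed seed for a repeatable run
mu0, sigma, n, alpha = 30.0, 5.0, 25, 0.05     # hypothetical values
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value of z

def rejects_h0(sample):
    """Return True if the two-tailed z-test rejects H0: mu = mu0."""
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    return abs(z) > z_crit

# Draw many samples from a population in which H0 is actually true.
trials = 20000
rejections = sum(
    rejects_h0([rng.gauss(mu0, sigma) for _ in range(n)]) for _ in range(trials)
)
type_i_rate = rejections / trials
print(f"alpha = {alpha}, observed rejection rate under H0 = {type_i_rate:.3f}")
```

Because every sample is drawn with H0 true, each rejection is a wrong rejection, and the observed rate should be close to α.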
The degrees of freedom v are given by
v = n − k, v > 0
where n is the sample size and k is the number of independent constraints imposed on
the observations in the sample.
For example, suppose we are asked to select any five observations. There is no
restriction on the selection, so we are free to select any five observations. Hence,
the degrees of freedom are
v = n − k = 5 − 0 = 5
Now consider another situation. Suppose we are asked to select 5 observations whose
sum is 100. We are free to choose any four observations, but the 5th observation is
then fixed by the requirement that the sum be 100. Hence, the degrees of freedom for
the selection are
v = n − k = 5 − 1 = 4
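The two situations above can be reproduced in a few lines of Python. The helper name `degrees_of_freedom` and the four freely chosen values are hypothetical, used only for illustration.

```python
def degrees_of_freedom(n, k):
    """Return v = n - k, where k is the number of independent constraints."""
    v = n - k
    if v <= 0:
        raise ValueError("degrees of freedom must be positive")
    return v

# No constraint: all five observations are free.
print(degrees_of_freedom(5, 0))

# One constraint (sum = 100): four observations are free, the fifth is forced.
free_choices = [10, 25, 30, 15]    # any four values may be chosen
fifth = 100 - sum(free_choices)    # determined by the constraint
observations = free_choices + [fifth]
print(observations, "sum =", sum(observations))
print(degrees_of_freedom(5, 1))
```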
In any test of hypothesis, a test statistic t* calculated from the sample data is used.
Consider the area under the probability curve of the sampling distribution of the test
statistic t*, which follows some known distribution. The area under the probability curve
is divided into a critical (rejection) region and an acceptance region.
The area of the critical region is the level of significance α of the test.
• It should be noted that the critical region always lies in the tails of the distribution.
Depending on the nature of the alternative hypothesis, the critical region may lie in one
tail or in both tails, and accordingly we have one-tailed or two-tailed tests.
Whether a test of a hypothesis is one-tailed or two-tailed is determined by the alternative
hypothesis H1 as follows:
(1) If H1 carries the > sign, the test is known as a right one-tailed test. The
critical region lies in the right tail.
(2) If H1 carries the < sign, the test is known as a left one-tailed test. At
the level of α, the critical region is shown in the given figure II.
(3) If H1 carries the ≠ sign, the test is known as a two-tailed test. The
critical region is split between the two tails.
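The three cases can be sketched in Python for a test statistic that follows the standard normal distribution. That distributional choice, the function names and the strings used to encode the sign in H1 are assumptions made for illustration; for the two-tailed case the area α is assumed to be split equally, α/2 in each tail.

```python
from statistics import NormalDist

def critical_region(alternative, alpha):
    """Return (lower, upper) cut-offs of the critical region for a z statistic.

    None means there is no cut-off on that side.
    """
    z = NormalDist()  # standard normal distribution
    if alternative == ">":       # right one-tailed test
        return (None, z.inv_cdf(1 - alpha))
    if alternative == "<":       # left one-tailed test
        return (z.inv_cdf(alpha), None)
    if alternative == "!=":      # two-tailed test, alpha / 2 in each tail
        half = z.inv_cdf(1 - alpha / 2)
        return (-half, half)
    raise ValueError("alternative must be '>', '<' or '!='")

def in_critical_region(z_value, alternative, alpha):
    """Return True if z_value falls in the critical region."""
    lower, upper = critical_region(alternative, alpha)
    return (lower is not None and z_value < lower) or (
        upper is not None and z_value > upper
    )

for alt in (">", "<", "!="):
    print(alt, critical_region(alt, 0.05))
```

The same observed value can lead to different decisions: z = 1.8 falls in the critical region of the right one-tailed test at α = 0.05 but not in that of the two-tailed test.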
Test Statistics:
The statistical decision to accept or reject the null hypothesis is made on the basis of a
test statistic, chosen depending upon the probability distribution followed by it. The main
test statistics are z, χ², t, F etc.
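Error in testing:
The decision is made on the basis of a random sample from the population, so there is
scope for committing errors. The errors are of two types:
• Type I Error indicates H0 is true but it is rejected by the test (Rejection Error)
• Type II Error indicates H0 is false but it is accepted by the test (Acceptance Error)
The probability of a Type I error is the level of significance α, and the probability of a
Type II error is denoted by β.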
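As a numerical illustration of β, the sketch below computes the probability of a Type II error for one specific setting. It assumes a right one-tailed z-test of H0: μ ≤ μ0 with known σ, and a particular true mean μ1 > μ0; the function name and all numerical values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """Return beta for a right-tailed z-test when the true mean is mu1.

    beta = P(test statistic does not exceed the critical value | mu = mu1).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)               # H0 is rejected when z > z_crit
    shift = (mu1 - mu0) / (sigma / sqrt(n))     # mean of the z statistic when mu = mu1
    return z.cdf(z_crit - shift)

beta = type_ii_error(mu0=30, mu1=32, sigma=5, n=25, alpha=0.05)
print(f"beta = {beta:.3f}")
```

With α held fixed, increasing the sample size n makes β smaller.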
lot after testing a sample from population, Type I error rejects a lot when it is good and
accept a wrong hypothesis than to reject a correct one. That is, Type II errors are most
4. Determine the degree of freedom if required (This step is required for t – test,
between the test statistic and critical value. The decision will depend upon
whether the computed value of the test criterion in step V falls under the