Hypothesis Testing Study Material


[APPLIED STATISTICS: HYPOTHESIS TESTING] [3130006]

Concept of Sampling and Testing of Hypothesis

5.1 Population [or universe]:

The group (of size N) of units, items, individuals, observations or objects forming the
subject matter of a statistical investigation of their various characteristics is known as the
population. The population is the totality of all relevant data.
OR
It is the collection of a specified group of similar objects, individuals or entities that have
some common observable characteristics.

The population may be finite or infinite depending on size N being finite or infinite. It may
be real or Hypothetical.

➢ For many inquiries and statistical investigations, complete enumeration of the entire
population is not feasible, affordable or practicable. Therefore, a finite subset of the
population is selected by some scientific method with a view to estimating the population
characteristics. This subset is known as a sample.

5.2 Sample: A part or portion of units selected from population on the basis of some definite
norm is called a sample.

➢ The size of sample (number of units contained in sample) is denoted by n (where, n < N).
When, 𝑛 ≥ 30, the sample size is said to be large and when 𝑛 < 30, the sample size is said
to be small.

➢ For example, if the students of GTU are the population, the students of SNPIT are a sample.

5.3 Sampling: The process of drawing samples from a given population is known as sampling.

➢ There are many methods of sampling – Random sampling, Simple random sampling,
Stratified sampling, Systematic Sampling, Purposive sampling.

5.3.1 Random Sampling: If each unit of the population has the same chance of being selected in
the sample, then the sampling is said to be random sampling.
➢ In random sampling with replacement, each unit of the population may be chosen more
than once since the unit is replaced in the population. Whereas, in random sampling
without replacement, each unit of the population can be chosen only once since the unit is
not replaced in the population.

1 SNPIT & RC, Umrakh | Dr. Mansi Zaveri



➢ Thus, we say that sampling is done from an infinite population when the sample is drawn from
an infinite population, or from a finite population with replacement.
➢ Similarly, we say that sampling is done from a finite population when the sample is drawn from
a finite population without replacement.
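
The difference between the two schemes can be seen in a short Python sketch. The population of ten labelled units, the sample size and the seed are illustrative assumptions, not values from these notes.

```python
import random

# Hypothetical population of N = 10 labelled units (illustrative only).
population = list(range(1, 11))
n = 4  # sample size, n < N

rng = random.Random(42)  # fixed seed so the run is repeatable

# With replacement: each drawn unit is returned, so a unit may repeat.
with_replacement = rng.choices(population, k=n)

# Without replacement: a drawn unit is not returned, so all units are distinct.
without_replacement = rng.sample(population, n)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

In the second sample no unit can appear twice, while in the first a repeat is possible but not guaranteed.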

5.4 Population parameters and sample statistics: Statistical measures or constants obtained
from the population, such as the population mean 𝝁, population variance 𝝈𝟐 etc., are known as
population parameters or simply parameters. They are based on all units of the population.
A population parameter is denoted by 𝜽.

Statistical measures obtained from sample observations, such as the sample mean 𝒙̅, sample
variance 𝒔𝟐 etc., are known as sample statistics or simply statistics.

The following table indicates the notations used for different statistical measures for
parameters and statistics.

Statistical Measure     Parameter (𝜽)     Statistic (𝒕)
Size                    𝑵                 𝒏
Mean                    𝝁                 𝒙̅
Standard Deviation      𝝈                 𝒔
Variance                𝝈𝟐                𝒔𝟐
Proportion              𝑷                 𝒑
Correlation             𝝆                 𝒓
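
As a rough illustration of the parameter and statistic columns, the Python sketch below computes both for a small made-up population. The marks, the seed and the use of the divisor n − 1 for the sample variance are assumptions for this example; these notes do not state which divisor is intended.

```python
import random
import statistics

# Hypothetical finite population of N = 8 marks (illustrative values).
population = [52, 61, 47, 70, 66, 58, 49, 73]
N = len(population)

# Parameters: computed from every unit of the population.
mu = statistics.mean(population)           # population mean
sigma2 = statistics.pvariance(population)  # population variance (divisor N)

# Statistics: computed from a random sample of size n < N.
rng = random.Random(0)
sample = rng.sample(population, 4)
n = len(sample)
x_bar = statistics.mean(sample)            # sample mean
s2 = statistics.variance(sample)           # sample variance (divisor n - 1, assumed)

print(f"parameters: N = {N}, mu = {mu}, sigma^2 = {sigma2}")
print(f"statistics: n = {n}, x_bar = {x_bar}, s^2 = {s2}")
```

Changing the seed changes the sample and therefore the statistics, while the parameters stay fixed.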


Note that a statistic varies from sample to sample, but the parameter remains constant.
This variation in the value of a statistic is known as sampling fluctuation. Since a statistic is a
random variable, it must have a probability distribution.

5.5 Sampling Distribution: Given a finite population of size N, draw all possible samples, each
of the same size n. The total number of such samples is
given by

$$ {}^{N}C_{n} = \binom{N}{n} = \frac{N!}{(N-n)!\,n!} = k \ \text{(say)} $$

For each of these samples, compute a statistic 𝒕 [for example, mean, variance, etc.] then t
will vary from sample to sample as shown below.

Sample Number    1     2     3     ......    k
Statistic 𝒕      𝑡1    𝑡2    𝑡3    ......    𝑡𝑘

The aggregate of these values of 𝒕, together with their relative frequencies or probabilities
with which they occur, constitutes the sampling distribution of the statistic 𝑡.

If the statistic 𝑡 used is the sample mean, then the distribution is called the sampling
distribution of mean.
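
The construction above can be sketched in Python for a very small case. The population of five values and the sample size n = 2 are illustrative assumptions; the samples are drawn without replacement, so there are k = NCn of them.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical population of N = 5 values (illustrative only).
population = [2, 4, 6, 8, 10]
N, n = len(population), 2

# All possible samples of size n, and their number k = N! / ((N - n)! n!).
samples = list(combinations(population, n))
k = comb(N, n)

# Statistic t = sample mean, computed for every sample.
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each distinct value of t with its relative frequency.
counts = Counter(means)
sampling_distribution = {m: Fraction(c, k) for m, c in sorted(counts.items())}

for m, p in sampling_distribution.items():
    print(f"mean = {float(m):4.1f}   probability = {p}")

# Average of the sample means, weighted by their probabilities.
mean_of_means = sum(m * p for m, p in sampling_distribution.items())
print("mean of sample means:", mean_of_means)
```

For this population the mean of the sample means comes out equal to the population mean, which is a useful check on the enumeration.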

5.6 Statistical Inference: The theory of statistical inference (also known as Decision theory)
consists of those methods by which one makes inferences or generalizations about a
population based on the information provided by samples selected from the population.
It has two main branches:

(a) Estimation: In which population parameters are estimated on the basis of sample
information.
For example, a manufacturer is interested in estimating the average life of his product,
proportion of defective items in his lot, average demand of his product etc.
(b) Hypothesis Testing: Here, some hypothetical statement is made about the population
parameter, and it is tested at a certain level of significance to decide, on the basis of
sample information, whether the hypothesis should be accepted or rejected.


For example, consider a case in which a housewife is interested in finding out whether brand
A detergent cleans brighter than brand B detergent. She may start with the hypothesis
that brand A is better and, after proper testing, accept or reject the hypothesis.

5.7 Statistical Hypothesis and Its Testing:

5.7.1 Statistical Hypothesis: It is an assumption or statement [ which may or may not be true]
concerning one or more populations. For example,

(I) The average marks of students of IT classes and CE classes of GTU are same.
(II) Population mean μ = 30.

5.7.2 Testing of Hypothesis: The truth or falsity of a statistical hypothesis can be known
with certainty only if we examine the entire population, which is impractical in most cases.
Instead, we take a random sample from the population and use the information contained
in the sample to decide whether the hypothesis is likely to be true or false.

5.8 General Procedure of hypothesis Testing:

Step 1: State the Null Hypothesis (H0) and Alternative Hypothesis (H1)

➢ Null Hypothesis (H0): It is the statistical hypothesis which is actually tested for
acceptance or rejection. The capital letter H stands for Hypothesis and the subscript ‘zero’
implies no difference between the sample statistic and the population parameter value. The Null
Hypothesis is usually considered true until it is shown to be false on the basis of sample
data. A Null Hypothesis is always expressed in the form of a mathematical statement.

➢ Alternative Hypothesis (H1): Any hypothesis which is complementary to the Null
Hypothesis is called an Alternative Hypothesis.

The rejection of the Null Hypothesis leads to acceptance of the Alternative Hypothesis. The
following are examples of Null Hypothesis (H0) and Alternative Hypothesis (H1).

• 𝑯𝟎 : 𝝁 = 𝝁𝟎 𝑯𝟏 : 𝝁 ≠ 𝝁𝟎

• 𝑯𝟎 : 𝝁 ≥ 𝝁𝟎 𝑯𝟏 : 𝝁 < 𝝁𝟎

• 𝑯𝟎 : 𝝁 ≤ 𝝁𝟎 𝑯𝟏 : 𝝁 > 𝝁𝟎

Step 2: (i) State the Level of significance 𝜶 (alpha)


The level of significance is denoted by 𝜶, and is specified before the samples are drawn. It
is the probability of rejecting the Null hypothesis 𝑯𝟎 when it is in fact true. In other words,
the maximum probability with which we are willing to risk rejecting a true hypothesis
is called the level of significance. In practice, a level of significance of 𝜶 = 𝟎.𝟎𝟏 or
𝜶 = 𝟎.𝟎𝟓 is commonly used, although other values are also used. 𝜶 = 𝟎.𝟎𝟏 is used when a
smaller risk of wrongly rejecting 𝑯𝟎 is required.

➢ 𝜶 is also expressed as a percentage. Choosing 𝜶 = 𝟓% in designing a test of hypothesis
means that there are about 5 chances out of 100 that the Null hypothesis is rejected when
it is true, i.e. when 𝑯𝟎 is true we make the right decision about 95% of the time.
➢ If 𝑯𝟎 is rejected in this case, we say that the hypothesis has been rejected at the 5% level
of significance, which means that the probability of wrongly rejecting a true hypothesis is 0.05.

(ii) Level of Confidence (𝟏 − 𝜶): The level of confidence is complementary
to the level of significance. It is (𝟏 − 𝜶). If the level of significance is 1%, then
the level of confidence is 99%.
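
The meaning of 𝜶 as the chance of rejecting a true Null hypothesis can be checked by simulation. The sketch below is only an illustration under stated assumptions: it uses a two-tailed z-test with known standard deviation, z = (x̄ − μ0) / (σ / √n), and the values of μ0, σ, n, the number of trials and the seed are all made up for the example.

```python
import random
from statistics import NormalDist, mean

alpha = 0.05
mu0, sigma, n = 30.0, 5.0, 36  # hypothetical values; sigma is assumed known
trials = 2000

# Two-tailed critical value: H0 is rejected when |z| exceeds it.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

rng = random.Random(1)  # fixed seed so the run is repeatable
rejections = 0
for _ in range(trials):
    # H0 is true here: every sample really comes from a population with mean mu0.
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    if abs(z) > z_crit:
        rejections += 1

type1_rate = rejections / trials
print(f"share of true H0 rejected: {type1_rate:.3f} (alpha = {alpha})")
print(f"share of true H0 accepted: {1 - type1_rate:.3f} (1 - alpha = {1 - alpha})")
```

The observed share of rejections should be close to 𝜶, and the share of acceptances close to the level of confidence 1 − 𝜶, with some simulation noise.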

(iii) Degrees of Freedom: The degrees of freedom is the number of independent observations in
the sample. The number of independent observations is different for different
statistics. The degrees of freedom is denoted by 𝑣 and defined by

𝑣 = 𝑛 − 𝑘, 𝑣 > 0
where n is the sample size and k is the number of independent constraints imposed on
the observations in the sample.

For example, suppose that we are asked to select any five observations.
There is no restriction on the selection, so we are free to
select any five observations. Hence, the degrees of freedom is

𝑣 =𝑛−𝑘 =5−0= 5

Let us take another situation. Suppose we are asked to select 5 observations whose
sum is 100. Then we are free to choose any four observations, but the 5th observation is
automatically determined by the requirement that the sum be 100. Hence, the degrees of
freedom for the selection is:
𝑣 = 𝑛 − 𝑘 = 5 − 1 = 4
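
Both situations can be written out as a small Python sketch. The helper name degrees_of_freedom and the four freely chosen values are hypothetical, introduced only for this illustration.

```python
def degrees_of_freedom(n: int, k: int) -> int:
    """Return v = n - k, where k is the number of independent constraints."""
    v = n - k
    if v <= 0:
        raise ValueError("degrees of freedom must be positive (v > 0)")
    return v

# Case 1: five observations, no constraint on the selection.
print("unconstrained:", degrees_of_freedom(5, 0))

# Case 2: five observations whose sum must be 100 (one constraint).
free_choices = [10, 25, 30, 15]  # any four values, chosen freely (hypothetical)
fifth = 100 - sum(free_choices)  # the fifth value is then forced
print("fifth observation:", fifth)
print("sum constrained:", degrees_of_freedom(5, 1))
```

Whatever four values are chosen, the fifth is fixed by the constraint, which is why only four observations count as independent.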

Step 3: (i) Critical Region:

In any test of hypothesis, a test statistic t* calculated from the sample data is used
to accept or reject the Null hypothesis.

Consider the area under the probability curve of the sampling distribution of the test
statistic t*, which follows some known distribution. The area under the probability curve
is divided into two regions (by the predetermined level of significance):

(1) The region of rejection (critical region, or region of significance),
where the Null hypothesis 𝐻0 is rejected.

(2) The region of acceptance,
where the Null hypothesis 𝐻0 is accepted.

The area of the critical region is the level of significance of the test, 𝛼.

• It should be noted that the critical region always lies in the tails of the distribution.
Depending on the nature of the Alternative Hypothesis, the critical region may lie in one tail
or in both tails, and accordingly we have one tailed or two tailed tests.

One tailed or two tailed tests

Whether a test of hypothesis is one tailed or two tailed is determined by the alternative
hypothesis 𝐻1 as follows:

(1) If 𝐻1 carries the > sign, then the test is known as a right one tailed test. At
the level of 𝜶, the critical region is the right tail of area 𝜶, as shown in figure I.

(2) If 𝐻1 carries the < sign, then the test is known as a left one tailed test. At
the level of 𝜶, the critical region is the left tail of area 𝜶, as shown in figure II.

(3) If 𝐻1 carries the ≠ sign, then the test is known as a two tailed test. At the
level of 𝜶, the critical region consists of both tails, each of area 𝜶/2, as shown in figure III.
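
For a test statistic that follows the standard normal distribution, the boundaries of these three critical regions can be computed as in the Python sketch below. The function name critical_values and the tail labels are hypothetical, and the choice of the z distribution is an assumption made for the example; other test statistics would use their own distributions.

```python
from statistics import NormalDist

def critical_values(alpha: float, tail: str) -> tuple:
    """Critical value(s) of a z-test for the given tail type."""
    z = NormalDist()  # standard normal distribution
    if tail == "right":   # H1 carries '>': reject when z > c
        return (z.inv_cdf(1 - alpha),)
    if tail == "left":    # H1 carries '<': reject when z < c
        return (z.inv_cdf(alpha),)
    if tail == "two":     # H1 carries '!=': reject when z < -c or z > c
        c = z.inv_cdf(1 - alpha / 2)
        return (-c, c)
    raise ValueError("tail must be 'right', 'left' or 'two'")

for tail in ("right", "left", "two"):
    values = ", ".join(f"{c:.3f}" for c in critical_values(0.05, tail))
    print(f"{tail:>5}-tailed, alpha = 0.05: {values}")
```

At 𝜶 = 0.05 this gives approximately 1.645 for the right tailed test, −1.645 for the left tailed test, and ±1.960 for the two tailed test.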

Test Statistics:

The statistical decision to accept or reject the null hypothesis is made on the basis of a
statistic, called the test statistic, which is computed using a particular formula
depending upon the probability distribution it follows. The main test statistics
are 𝑧, 𝜒², 𝑡, 𝐹 etc.

Error in testing:

As the statistical decision to accept or reject the null hypothesis is based on a
random sample from the population, there is scope for committing errors. The
following situations arise while testing the Null hypothesis.

Nature of Null Hypothesis     Accept 𝑯𝟎            Reject 𝑯𝟎
𝑯𝟎 is True                    Correct Decision     Type I Error
𝑯𝟎 is False                   Type II Error        Correct Decision

• Type I Error: 𝑯𝟎 is true but it is rejected by the test (Rejection Error).
• Type II Error: 𝑯𝟎 is false but it is accepted by the test (Acceptance Error).
• The probability of a Type I error is denoted by 𝜶 and the probability of a Type II error is
denoted by 𝜷.

For example, in Statistical Quality Control, while deciding on acceptance or rejection of a
lot after testing a sample from it, a Type I error rejects a lot when it is good and
a Type II error accepts a lot when it is bad.

In most decision-making problems in business and social science, it is riskier to
accept a wrong hypothesis than to reject a correct one. That is, Type II errors are usually
more serious than Type I errors.
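
To make 𝜶 and 𝜷 concrete, the Python sketch below computes both for a right one tailed z-test. Everything numeric here is an assumption for illustration: the hypothesised mean μ0, the specific alternative mean μ1 under which 𝜷 is evaluated, the known standard deviation σ and the sample size n. 𝜷 can only be computed once such a specific alternative value is fixed.

```python
from statistics import NormalDist

alpha = 0.05
mu0, mu1 = 50.0, 52.0  # H0: mu = mu0; beta is evaluated at the alternative mu = mu1
sigma, n = 6.0, 36     # hypothetical known standard deviation and sample size
se = sigma / n ** 0.5  # standard error of the sample mean

# Right one tailed test: reject H0 when the sample mean exceeds x_crit.
x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# Type I error: H0 is true (mean mu0) but the sample mean lands in the critical region.
alpha_check = 1 - NormalDist(mu0, se).cdf(x_crit)

# Type II error: H0 is false (mean mu1) but the sample mean lands in the acceptance region.
beta = NormalDist(mu1, se).cdf(x_crit)

print(f"critical sample mean  = {x_crit:.3f}")
print(f"P(Type I error)       = {alpha_check:.3f}")
print(f"P(Type II error) beta = {beta:.3f}")
```

With these made-up values 𝜷 is roughly 0.36, far larger than 𝜶, which shows that fixing a small 𝜶 does not by itself keep the Type II error small.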


Steps to Test Hypothesis:

1. Formulate the Null Hypothesis 𝑯𝟎.
2. Formulate the Alternative Hypothesis 𝑯𝟏.
3. Choose the level of significance 𝜶.
4. Determine the degrees of freedom if required (this step is required for the t-test,
χ²-test and F-test).
5. Determine the critical region.
6. Compute the test statistic.
7. Decision: Accept or reject the Null Hypothesis depending on the relation
between the test statistic and the critical value. The decision depends upon
whether the value of the test statistic computed in step 6 falls in the
region of rejection or the region of acceptance.
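
The seven steps can be traced in a short Python sketch. This is a minimal illustration, not a prescribed method: it assumes a two tailed z-test with known population standard deviation, z = (x̄ − μ0) / (σ / √n), and the sample summaries n, x̄ and σ are made-up values. The hypothesised mean μ0 = 30 echoes the example hypothesis in 5.7.1.

```python
from statistics import NormalDist

# Steps 1 and 2: H0: mu = 30 against H1: mu != 30 (two tailed test).
mu0 = 30.0

# Step 3: level of significance.
alpha = 0.05

# Step 4: degrees of freedom are not needed for a z-test.

# Step 5: critical region is |z| > z_crit.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 6: test statistic from hypothetical sample summaries (sigma assumed known).
n, x_bar, sigma = 36, 31.5, 4.0
z = (x_bar - mu0) / (sigma / n ** 0.5)

# Step 7: decision, by comparing the test statistic with the critical value.
decision = "reject H0" if abs(z) > z_crit else "accept H0"
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, decision: {decision}")
```

Here the computed statistic falls in the region of rejection, so 𝑯𝟎 is rejected at the 5% level of significance; a sample mean closer to 30 would have led to acceptance.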
