MOD 5 (Hypothesis Testing)





Decision Science Area
CMS Business School
Jain (Deemed-to-be University)
AGENDA-Session 26
• Introduction to Hypothesis
• Population and Sample
• Sample Statistic and Population Parameter
• Parametric and Non-Parametric Tests
• Hypothesis
• Formulation of Null & Alternate Hypothesis
• Level of Significance
• Tail of the Test
• We all regularly come across advertisements from organizations across
verticals about their products.
• Some are straightforward, some try to look down upon their competitors,
and some talk about their own strengths.
• These strengths are, most of the time, presented in the form of claims.
For example, an automobile company may make a claim about the mileage
of its vehicle.
• Such claims can be put to a test to ascertain whether they hold.
• In statistics, such a claim is referred to as a proposition, and the test
as a test of hypothesis.

• “Hypothesis may be defined as a proposition or a set of
propositions set forth as an explanation for the occurrence of
some specified group of phenomena either asserted merely as a
provisional conjecture to guide some investigation in the light of
established facts” (Kothari, 1988).
• A research hypothesis is quite often a predictive statement, which
is capable of being tested using scientific methods and which relates
an independent variable to one or more dependent variables.
Population and Sample

• Population:
• Populations can be the complete set of all similar items that exist. For
example, the population of a country includes all people currently within
that country. It’s a finite but potentially large list of members.
• However, a population can also be a theoretical construct that is potentially
infinite in size. For example, quality improvement analysts often consider
all current and future output from a manufacturing line to be part of a
population.
• A population is the entire group that you want to draw conclusions about.
Population and Sample

• Sample:
• Consists of one or more observations drawn from the population.
• A sample is the specific group that you will collect data from.
• The size of the sample is always less than the total size of the population.
• Samples can be drawn from the population using various methods
classified under Probability and Non-Probability Sampling Methods.
• Probability Sampling Methods include – Random Sampling, Systematic
Sampling, Stratified Sampling and Cluster Sampling.
• Non-Probability Sampling Methods include – Judgmental Sampling,
Convenience Sampling, Quota Sampling and Snowball Sampling.
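Three of the probability sampling methods listed above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the population of 100 member IDs, the two strata names and the function names are hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Random sampling: every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=0):
    """Systematic sampling: random start, then every k-th member."""
    k = len(population) // n                  # sampling interval
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=0):
    """Stratified sampling: draw the same fraction at random from each stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population of 100 member IDs, split into two made-up strata.
population = list(range(1, 101))
strata = {"north": population[:60], "south": population[60:]}

print(simple_random_sample(population, 10))
print(systematic_sample(population, 10))
print(stratified_sample(strata, 0.1))
```

Fixed seeds are used only so that the output is repeatable; in a real study the draw would not be seeded this way.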
Sample Statistic and Population Parameter
• Sample Statistic: A statistic is any summary number, like an
average or percentage, that describes the sample. If you collect a
sample and calculate the mean and standard deviation, these are
sample statistics.
• Population Parameter: A parameter is any summary number,
like an average or percentage, that describes the entire population.
Because you can almost never measure an entire population, you
usually don’t know the real value of a parameter. In fact,
parameter values are nearly always unknowable. While we don’t
know the value, it definitely exists.
• For example, the average height of adult women in India is a parameter
that has an exact value—we just don’t know what it is!
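The difference between a parameter and a statistic can be seen in a small simulation. This is a sketch only: the "population" here is generated artificially, and the mean of 152 cm and standard deviation of 6 cm are made-up values chosen for illustration, not real data.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 100,000 simulated heights in cm (values are made up).
population = [rng.gauss(152.0, 6.0) for _ in range(100_000)]

# Parameter: describes the whole population; in practice it is usually unknown.
population_mean = statistics.fmean(population)

# Statistics: computed from the sample we actually collected.
sample = rng.sample(population, 200)
sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
print(f"sample std dev (statistic):  {sample_sd:.2f}")
```

Only in a simulation can the parameter be computed directly; with real data we would have the two sample statistics and nothing more.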
Parametric and Non-Parametric Tests

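• Parametric Tests: In these tests, the population parameters are
known or assumed (typically, the data are assumed to follow a specific
distribution, such as the normal distribution).
• Non-Parametric Tests: In these tests, the population parameters
are unknown and no specific distribution is assumed.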
Introduction - Hypothesis

• A hypothesis is a proposition that we want to verify.
• A hypothesis test is a process that uses a sample statistic to test a
claim about the value of a population parameter.
• A hypothesis is an educated guess about something in the world
around you.
• It should be testable, either by experiment or observation.
A Hypothesis is…..
• A statement
• An assumption
• A claim
• An educated guess
• A tentative point of view
• A proposition not yet tested
• A preliminary explanation
• A preliminary postulate

• All of these can be subjected to verification or testing.
Hypothesis – Some Examples

• Sales of SUVs depend on the features offered.
• High school students who take private tuitions fare better than
those who do not.
• The outcome of three different methods of manufacturing a ball is
the same.
• The efficiency of four different methods of teaching statistics is the
same.
Hypothesis – Characteristics

• A hypothesis must be precise and clear. If it is not, then the
inferences drawn on its basis would not be reliable.
• A hypothesis must be capable of being put to test.
• A hypothesis must state the relationship between two variables, in
the case of relational hypotheses.
• A hypothesis must be specific and limited in scope. This is
because a simpler hypothesis generally would be easier to test for
the researcher.
Hypothesis – Characteristics

• A hypothesis must be stated in the simplest language, so that it is
understood by all concerned.
• A hypothesis must be consistent with, and derived from, most
known facts.
• A hypothesis must be amenable to testing within a stipulated
or reasonable period of time.
Hypothesis – Purpose

• Guides/gives direction to the study/investigation
• Defines facts that are relevant and irrelevant
• Suggests which form of research design is likely to be most
appropriate
• Provides a framework for organizing the conclusions
• Limits the research to a specific area
• Offers explanations for the relationships between those variables
that can be empirically tested
Steps in Testing Hypothesis

• Step 1: Formulation of Null (H0) and Alternate Hypothesis (H1)
• Step 2: Setting up the level of significance for the test, stating the tail
of the test, the critical value and the degrees of freedom (except for the ‘z’
test)
• Step 3: Calculating the test statistic (z0, t0, χ²0, F0)
• Step 4: Locating the test statistic and accepting/rejecting the Null
Hypothesis
• Step 5: Drawing inference
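The five steps can be traced in a short sketch of a two-tailed, one-sample ‘z’ test. All numbers are hypothetical: a claimed mileage of 20 km per litre, a population standard deviation of 2.5 that is assumed to be known, and a sample of 40 vehicles averaging 19.2. The variable names are illustrative as well.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs (not from the slides).
mu0 = 20.0     # claimed population mean mileage, km per litre
sigma = 2.5    # population standard deviation, assumed known
n = 40         # sample size (large sample, so a z test)
x_bar = 19.2   # sample mean

# Step 1: H0: mu = mu0 against H1: mu != mu0 (two-tailed).

# Step 2: level of significance and critical value (no degrees of freedom for z).
alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# Step 3: test statistic.
z0 = (x_bar - mu0) / (sigma / sqrt(n))

# Step 4: locate the statistic relative to the critical value.
reject_h0 = abs(z0) > z_critical

# Step 5: inference about the population.
print(f"z0 = {z0:.3f}, critical value = {z_critical:.3f}")
print("Reject H0" if reject_h0 else "Accept H0")
```

With these made-up numbers z0 is about -2.02, which lies beyond the critical value of about 1.96 in absolute terms, so H0 is rejected at the 0.05 level.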
Formulation of Null & Alternate Hypothesis
• The null hypothesis (H0) always predicts that
• There are no differences between the groups being studied, or
• There is no relationship between the variables being studied
• The null hypothesis is essentially the "devil's advocate" position.
That is, it assumes that whatever you are trying to prove did not
happen.
Formulation of Null & Alternate Hypothesis
• By contrast, the alternate hypothesis (H1) always predicts that
there will be a difference between the groups/variables being
studied.
• The alternate hypothesis states the opposite of the null hypothesis and is
usually the hypothesis you are trying to prove.
Formulation of Null & Alternate Hypothesis
• H0: The outcome of three different methods of manufacturing a
ball is the same.
• H1: The outcome of three different methods of manufacturing a
ball is not the same.
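In symbols, this pair of hypotheses can be written as follows. The notation is an assumption added for illustration: μ1, μ2 and μ3 stand for the mean outcome of each of the three manufacturing methods.

```latex
H_0:\ \mu_1 = \mu_2 = \mu_3
\qquad
H_1:\ \text{not all of } \mu_1,\ \mu_2,\ \mu_3 \text{ are equal}
```

Note that H1 does not claim that all three means differ, only that at least one differs from the others.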
Level of Significance

• To understand the level of significance, we need to understand errors
in hypothesis testing.
• Let us take a scenario: You need to buy a mobile phone. You
have two options in front of you:
• You could buy at a company outlet or from an authorised seller
• You could buy in the black market (without warranty/without
invoice, at a lower price – especially if you are buying an expensive one)
Level of Significance

• These situations would lead to four decisions by you:

• You buy from a company outlet/authorised seller
• You do not buy from a company outlet/authorised seller
• You buy from black market
• You do not buy from black market
• Out of these, which are the right decisions and which are wrong?
Level of Significance

• You buy from a company outlet/authorised seller (RIGHT)
• You do not buy from a company outlet/authorised seller (WRONG)
• You buy from the black market (WRONG)
• You do not buy from the black market (RIGHT)
Level of Significance
Now, these situations are the same as:
• You buy from a company outlet/authorised seller (RIGHT)
• H0 is right and you accept it
• You do not buy from a company outlet/authorised seller (WRONG)
• H0 is right and you reject it
• You buy from the black market (WRONG)
• H0 is wrong and you accept it
• You do not buy from the black market (RIGHT)
• H0 is wrong and you reject it.
Level of Significance

• This brings us to the errors in hypothesis testing:
• H0 is TRUE and you accept it – NO ERROR
• H0 is TRUE and you reject it – REJECTION ERROR (TYPE 1 ERROR)
• H0 is FALSE and you accept it – ACCEPTANCE ERROR (TYPE 2 ERROR)
• H0 is FALSE and you reject it – NO ERROR
Level of Significance

                Accept H0          Reject H0
H0 is TRUE      Right Decision     TYPE I ERROR
H0 is FALSE     TYPE II ERROR      Right Decision
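The four cells of the table above can also be expressed as a small lookup function. This is a minimal sketch; the function name and its two boolean arguments are hypothetical and chosen only for illustration.

```python
def classify_decision(h0_is_true: bool, h0_rejected: bool) -> str:
    """Return the outcome of a test decision, following the four cells of the table."""
    if h0_is_true and h0_rejected:
        return "Type I error"       # a true H0 was rejected
    if not h0_is_true and not h0_rejected:
        return "Type II error"      # a false H0 was accepted
    return "Right decision"

for truth in (True, False):
    for rejected in (False, True):
        outcome = classify_decision(truth, rejected)
        print(f"H0 true = {truth}, H0 rejected = {rejected}: {outcome}")
```

In practice we never know whether H0 is true, so the function is useful only for reasoning about the possible outcomes, not for checking a real test.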
Level of Significance

• The probability of a Type 1 error is the level of significance. It is
represented by the symbol α (ALPHA).
• We usually set one of three levels of significance to test the
hypothesis, depending on the precision we need – 0.01, 0.05 or 0.1.
• How do we translate this?
Level of Significance
• At 0.01 level of significance, we are giving ourselves a 1% chance
of committing a Type 1 error
• At 0.05 level of significance, we are giving ourselves a 5% chance
of committing a Type 1 error
• At 0.1 level of significance, we are giving ourselves a 10% chance
of committing a Type 1 error

• We can also understand this using the confidence level, which is
given by 1-α. For example, if α = 0.01, then confidence level = 1-0.01
= 0.99 or 99%.
Level of Significance

• At 0.01 level of significance or 0.99 confidence level (99%), if H0 is
true and we repeat this test 100 times, we expect to get the same
(correct) result about 99 times – HIGH PRECISION
• At 0.05 level of significance or 0.95 confidence level (95%), if H0 is
true and we repeat this test 100 times, we expect to get the same
(correct) result about 95 times
• At 0.1 level of significance or 0.90 confidence level (90%), if H0 is
true and we repeat this test 100 times, we expect to get the same
(correct) result about 90 times – LOW PRECISION.
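The reading of α as a long-run error rate can be checked by simulation. The sketch below repeatedly draws samples from a population for which H0 is true by construction and counts how often a two-tailed ‘z’ test rejects it. The population mean of 50, standard deviation of 10, sample size of 36 and the function name are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type1_error_rate(alpha, trials=5000, n=36, seed=1):
    """Fraction of two-tailed z tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    mu0, sigma = 50.0, 10.0                       # hypothetical true mean and known sigma
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # The sample is drawn with mean mu0, so H0 holds by construction.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z0 = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z0) > z_critical:
            rejections += 1                       # a Type 1 error
    return rejections / trials

for alpha in (0.01, 0.05, 0.10):
    rate = type1_error_rate(alpha)
    print(f"alpha = {alpha:.2f}: observed rejection rate = {rate:.3f}")
```

The observed rates should land close to 0.01, 0.05 and 0.10 respectively, though not exactly on them, because the simulation itself is subject to sampling error.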
Tail of the Test

• A test that is conducted to show whether the mean of the sample
is significantly greater than or significantly less than the mean of
a population is considered a two-tailed test.
• When the test is set up to show that the sample mean would be
higher or lower than the population mean, but not both, it is referred
to as a one-tailed test.
• Whether a test is one- or two-tailed is determined by whether the total area
of ‘α’ is placed in one tail or divided equally between the two tails.
Tail of the Test

• A one-tailed test is a statistical hypothesis test set up to show
that the sample mean would be higher or lower than the
population mean, but not both.
• When using a one-tailed test, the analyst is testing for the
possibility of the relationship in one direction of interest, and
completely disregarding the possibility of a relationship in
another direction.
• This can be better explained by using the concept of the rejection
region.
Tail of the Test

• A rejection region or critical region is defined as the part of the
sample space which corresponds to the rejection of the null
hypothesis being tested.
• It is the range of values for which the null hypothesis is not
accepted.
• A critical value separates the rejection region from the
non-rejection region.
• The nature of the rejection region depends on the type of
alternative hypothesis and the size of the rejection region
depends on the magnitude of the level of significance.
Tail of the Test

[Figure: rejection regions for a right-tailed test, a left-tailed test and a two-tailed test]
Tail of the Test

• An easy way to identify the tail of the test:
• When there is ≠ in H1, the test is a two-tailed test
• When there is > in H1, the test is a right-tailed test
• When there is < in H1, the test is a left-tailed test
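The identification rule above, combined with the way α is placed in the tails, determines the critical values of a ‘z’ test. The following is a minimal sketch using the standard normal distribution from Python's standard library; the function name and the tail labels are hypothetical.

```python
from statistics import NormalDist

def z_critical_values(alpha, tail):
    """Critical value(s) of a z test for tail in {'two', 'right', 'left'}."""
    z = NormalDist()                      # standard normal distribution
    if tail == "two":                     # "≠" in H1: alpha split equally between both tails
        c = z.inv_cdf(1 - alpha / 2)
        return (-c, c)
    if tail == "right":                   # ">" in H1: all of alpha in the right tail
        return (z.inv_cdf(1 - alpha),)
    if tail == "left":                    # "<" in H1: all of alpha in the left tail
        return (z.inv_cdf(alpha),)
    raise ValueError("tail must be 'two', 'right' or 'left'")

for tail in ("two", "right", "left"):
    values = ", ".join(f"{v:.3f}" for v in z_critical_values(0.05, tail))
    print(f"{tail}-tailed, alpha = 0.05: {values}")
```

At α = 0.05 this gives roughly ±1.960 for the two-tailed test, 1.645 for the right-tailed test and -1.645 for the left-tailed test, matching the usual table values.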
Tail of the Test and Critical Value

• The tail of the test is important for us to know where our acceptance
and rejection regions are located.
• The critical value (also called the table value) separates the
acceptance region from the rejection region.
Degrees of Freedom

• In simple words, we define degrees of freedom as the number of
freely available samples (without any obstacles) from which the
data can be collected.
• Degrees of freedom refers to the maximum number of logically
independent values, that is, values that have the freedom to
vary, in the data sample.
• Other than the ‘z’ test, degrees of freedom apply in all the other
hypothesis tests.
• In fact, the critical value of such a test depends upon the
level of significance and the degrees of freedom.
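The idea of values that are free to vary can be illustrated with deviations from a sample mean. Because the deviations always add up to zero, once n - 1 of them are known the last one is fixed, which is why a one-sample ‘t’ test has n - 1 degrees of freedom. The five observations below are hypothetical.

```python
from statistics import fmean

sample = [12.0, 15.0, 9.0, 14.0, 10.0]   # hypothetical observations
mean = fmean(sample)
deviations = [x - mean for x in sample]

# Deviations from the sample mean always sum to zero, so the last deviation
# can be recovered from the other n - 1 and is not free to vary.
last_from_others = -sum(deviations[:-1])

n = len(sample)
degrees_of_freedom = n - 1                # for a one-sample t test

print(f"sum of deviations: {sum(deviations):.1f}")
print(f"last deviation: {deviations[-1]:.1f}, recovered from the others: {last_from_others:.1f}")
print(f"degrees of freedom: {degrees_of_freedom}")
```

Other tests count degrees of freedom differently, so n - 1 should be read as the one-sample case only.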
Calculating the Test Statistic

• This is the step where the test statistic is calculated using the
appropriate formula.
• The formula varies based on which test is being conducted – whether
it is a ‘z’ test, ‘t’ test, Chi-Square test or ANOVA.
• A ‘z’ test, also known as large sample test, is conducted when the
sample size (n) is greater than 30.
• A ‘t’ test, also known as small sample test, is conducted when the
sample size (n) is less than or equal to 30.
Calculating the Test Statistic

• In the case of large samples, when we are testing the significance
of a statistic, the concept of standard error is used.
• It measures only sampling errors.
• Sampling errors arise because a population parameter is estimated
from a sample, which does not include all the information in the
population.
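For a sample mean, the standard error is σ/√n, where σ is the population standard deviation and n is the sample size. The sketch below compares this value with the spread actually observed across many simulated samples. The population mean of 100, standard deviation of 15 and sample size of 50 are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(7)
mu, sigma, n = 100.0, 15.0, 50            # hypothetical population mean, sigma, sample size

# Theoretical standard error of the sample mean.
standard_error = sigma / sqrt(n)

# Empirical check: the spread of the means of many independent samples.
sample_means = [
    fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(4000)
]
observed_spread = pstdev(sample_means)

print(f"sigma / sqrt(n):         {standard_error:.3f}")
print(f"std dev of sample means: {observed_spread:.3f}")
```

The two numbers should be close, which shows that the standard error describes how much a sample mean varies from one sample to the next purely because of sampling.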
Locating the Test Statistic and Accepting/Rejecting Null Hypothesis
• After calculating the test statistic, we need to check whether it lies
in the acceptance region or the rejection region.
• For that we use a simple rule: if |z0| ≤ |ze|, where ‘ze’ is the critical
value of a ‘z’ test, then accept the Null Hypothesis. Otherwise, reject
the Null Hypothesis and accept the Alternate Hypothesis.
• The rule remains the same for all the hypothesis tests, irrespective
of whether it is a ‘z’, ‘t’, Chi-Square or ANOVA test.
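The decision rule can be written as a small helper that works for any of the tests, as long as the calculated statistic and the critical value are supplied. This is a sketch only; the function name and the example values are hypothetical.

```python
def decide(test_statistic: float, critical_value: float) -> str:
    """Apply the rule: |test statistic| <= |critical value| means accept H0."""
    if abs(test_statistic) <= abs(critical_value):
        return "Accept H0"
    return "Reject H0 and accept H1"

# Hypothetical calculated z0 values against the two-tailed critical value at alpha = 0.05.
print(decide(1.20, 1.96))
print(decide(-2.45, 1.96))
```

The first call falls in the acceptance region and the second in the rejection region.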
Drawing Inference

• Based on whether the Null Hypothesis is accepted or rejected, there is
a need to draw an inference or conclusion about the test that has
been conducted and apply it to the population from which the
sample was drawn.
• This is what is meant by generalizing to the population, by testing
the sample.
