NOTES THREE Dms 201


DMS 201: BUSINESS STATISTICS I

HYPOTHESIS TESTING

Definition
- A hypothesis is a claim or an opinion about an item or issue. It
therefore has to be tested statistically in order to establish whether
or not it is correct.
- When testing a hypothesis, one must fully understand the two basic
hypotheses to be tested, namely:
i. The null hypothesis (H0)
ii. The alternative hypothesis (H1)

The null hypothesis


This is the hypothesis being tested, i.e. the belief about a certain
characteristic. For example, the Kenya Bureau of Standards (KBS) may
visit a sugar-making company with the intention of confirming that the
2 kg bags of sugar produced actually weigh 2 kg and not less. They
conduct a hypothesis test with the null hypothesis being
H0: each bag weighs 2 kg. The test will set out to confirm or refute this.

The alternative hypothesis


While formulating a null hypothesis we also consider the fact that the
belief might be found to be untrue hence we will reject it. We therefore
formulate an alternative hypothesis which is a contradiction to the null
hypothesis, thus when we reject the null hypothesis we accept the
alternative hypothesis.
In our example the alternative hypothesis would be
H1: each bag does not weigh 2 kg

Acceptance and rejection regions


The set of all possible values which a test statistic may assume is
divided into two regions: values consistent with the null hypothesis
(the acceptance region) and values that lead to the rejection of the
null hypothesis (the rejection region or critical region).
The values which separate the rejection region from the acceptance
region are called critical values.

Type I and type II errors


While testing a hypothesis (H0) and deciding whether to accept or reject
it, there are four possible outcomes.
a) Acceptance of a true hypothesis (correct decision) – accepting the null
hypothesis when it happens to be the correct decision. Note that
statistics does not give absolute information; its conclusion could
be wrong, although the probability of its being right is high.
b) Rejection of a false hypothesis (correct decision).
c) Rejection of a true hypothesis – (incorrect decision) – this is called
type I error, with probability = α.
d) Acceptance of a false hypothesis – (incorrect decision) – this is called
type II error, with probability = β.

Levels of significance
A level of significance is a probability value which is used when
conducting tests of hypothesis. It is the probability of making an
incorrect decision by rejecting a null hypothesis that is in fact true
(a type I error, α). The probabilities used are usually very small, e.g.
1% or 5%.


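[Figure: normal curve for a left-tailed test. At the 1% level the area
between 0 and the critical value is 0.4900, leaving 1% (0.01) in the tail
as the provision for error. At the 5% level the area is 0.4500, leaving
5% (0.05) in the critical region to the left of the critical value
Z = –1.65.]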

NB: If the standardized value of the sample mean is less than –1.65 we
reject the null hypothesis (H0) and accept the alternative hypothesis (H1),
but if it is more than –1.65 we accept the null hypothesis and reject the
alternative hypothesis.

The above sketch and level of significance apply when the sample mean is
less than the population mean.

The following is used when sample mean > population mean

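[Figure: normal curve for a right-tailed test at the 5% level. The
acceptance region lies to the left of the critical value Z = 1.65; the
critical (rejection) region of area 5% = 0.05 lies to its right.]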

NB: If the standardized value of the sample mean is less than 1.65, we
accept the null hypothesis and reject the alternative. If it is greater
than 1.65, we reject the null hypothesis and accept the alternative
hypothesis.
The above sketch is normally used when the sample mean is greater than
the population mean.

[Figure: normal curve for a two-tailed test at the 1% level. The null
hypothesis is accepted (alternative rejected) between the critical values
–2.58 and +2.58, an area of 0.495 on each side of 0. The null hypothesis
is rejected (alternative accepted) in each tail, of area 0.5% = 0.005.]

NB: If the standardized value of the sample mean is between –2.58 and
+2.58, accept the null hypothesis; otherwise reject it and accept the
alternative hypothesis.

TWO TAILED TESTS


A two tailed test is normally used in statistical work (tests of
significance), e.g. when a complaint lodged by a client is about a product
not meeting certain specifications, i.e. the item will generate a
complaint if its measurements are below the lower tolerance limit or
above the upper tolerance limit.

[Figure: two-tailed test. The region of acceptance for H0 lies between
the tolerance limits 15 cm and 17½ cm, with a critical region in each
tail.]

NB: The null hypothesis is rejected if the sample mean lies beyond the
tolerance limits (15 cm and 17½ cm).

ONE TAILED TEST
This is a test where the alternative hypothesis (H1) is only concerned
with one of the tails of the distribution, e.g. testing a business
complaint that is about the measurements of an item being shorter than
required.
E.g. a manufacturer of a given brand of bread may state that the average
weight of the bread is 500 gms but if a consumer takes a sample and
weighs each of the pieces of bread and happens to have a mean of 450
gms he will definitely complain about the bread which is underweight.
The statistical analysis to be done will concentrate on the left tail of the
normal distribution in which one will have to establish whether 450 gms
being less than 500g is statistically significant. Such a test therefore is
referred to as one tailed test.


On the other hand, the test may concentrate on the right-hand tail of the
normal distribution. When this happens, the major complaint is likely to
do with oversize items bought. The test is known as one tailed because
the focus is on one end of the normal distribution.

Number of standard errors (critical values)

Level of significance    Two tailed test    One tailed test
5%                       1.96               1.65
1%                       2.58               2.33
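
The critical values in the table can be checked against the standard normal distribution. The following is a minimal sketch using Python's standard library; the helper name `critical_value` is introduced here for illustration only. The unrounded one tailed value at 5% is about 1.645, which the table rounds to 1.65.

```python
from statistics import NormalDist


def critical_value(alpha, two_tailed):
    """Return the positive critical Z value for significance level alpha."""
    # A two tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.05, 0.01):
    print(
        f"alpha = {alpha}: "
        f"two tailed {critical_value(alpha, True):.3f}, "
        f"one tailed {critical_value(alpha, False):.3f}"
    )
```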

HYPOTHESIS TESTING PROCEDURE


Whenever a business complaint comes up there is a recommended
procedure for conducting a statistical test. The purpose of such a test is
to establish whether the null hypothesis or the alternative hypothesis is
to be accepted.
The following are steps normally adopted
1. Statement of the null and alternative hypothesis
2. Statement of the level of significance to be used.
3. Statement about the test statistic i.e. what is to be tested e.g. the
sample mean, sample proportion, difference between sample
means or sample proportions
4. Type of test whether two tailed or one tailed.
5. Statement on critical values using the appropriate level of
significance
6. Standardizing the test statistic
7. Conclusion showing whether to accept or reject the null hypothesis

STANDARD HYPOTHESIS TESTS


In principle, we can test the significance of any statistic related to any
probability distribution. However, we will be interested in a few standard
cases. The sample statistics mean, proportion and variance are related
to the normal, t, F and chi squared distributions.
Thus:
1. Normal test
Tests a sample mean ($\bar{x}$) against a population mean (µ) (where the
sample size n > 30 and the population variance σ² is known), and a sample
proportion P (where np > 5 and nq > 5, since in this case the normal
distribution can be used to approximate the binomial distribution).

2. t test
Tests a sample mean ($\bar{x}$) against a population mean, especially
where the population variance is unknown and n < 30.

3. Variance ratio test or F test


It is used to compare population variances and it is used with samples
of any size drawn from normal populations.

4. Chi squared test


It can be used to test the association between attributes or the
goodness of fit of an observed frequency distribution to a standard
distribution

Example 1
A certain NGO carried out a survey in a certain community in order to
establish the average age at which girls are married. The results of the
survey indicated that the mean marriage age for girls is 19 years.
In order to establish the validity of this mean marital age, a sample of
50 women was interviewed, and their average age at marriage was 16 years,
with a standard deviation of 2.1 years.
The sample data indicate that the marital age is less than 19 years. Is
this conclusion true or not?

Required
Conduct a statistical test to establish whether the sample supports the
conclusion that the marriage age is less than 19 years. Use a level of
significance of 5%.

Solution
1. Null hypothesis
H0: μ (mean marital age) = 19 years
Alternative hypothesis H1: μ (mean marital age) < 19 years
2. The level of significance is 5%
3. The test statistic is the sample mean age, $\bar{x}$ = 16 years
4. The critical value of the one tailed test (one tailed because the
alternative hypothesis is a one-sided inequality, <) at 5% level of
significance is –1.65

[Figure: left-tailed test. The rejection region lies to the left of the
critical value –1.65; the acceptance region lies to its right.]

5. The standardized value of the sample mean is

$Z = \frac{\bar{x} - \mu}{S_{\bar{x}}}$ where $S_{\bar{x}} = \frac{S}{\sqrt{n}}$

Where $\bar{x}$ = sample mean
µ = population mean
S = sample standard deviation
n = sample size
Z = standard value (as per computation)
The standard value Z must fall within the acceptance region for us
to accept the null hypothesis. Thus it must be > –1.65; otherwise
we reject the null hypothesis and accept the alternative hypothesis.

$Z = \frac{16 - 19}{2.1 / \sqrt{50}} = -10.1$

6. Since –10.1 < -1.65, we reject the null hypothesis but accept the
alternative hypothesis at 5% level of significance i.e. the marriage
age in this community is significantly lower than 19 years
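
The computation in step 5 can be reproduced with a short script. This is a minimal sketch; the function name `z_statistic` is an illustrative choice, and the critical value –1.65 is taken from the table of critical values rather than computed.

```python
from math import sqrt


def z_statistic(sample_mean, pop_mean, sd, n):
    """Standardize a sample mean using the standard error sd / sqrt(n)."""
    standard_error = sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error


z = z_statistic(sample_mean=16, pop_mean=19, sd=2.1, n=50)
critical = -1.65  # one tailed critical value at the 5% level
reject_h0 = z < critical  # left-tailed test: reject when Z falls below the critical value

print(f"Z = {z:.1f}, reject H0: {reject_h0}")
```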

Example 2
A foreign company which manufactures electric bulbs has assured its
customers that the lifespan of the bulbs is 28 months with a standard
deviation of 4 months.
Recently the company embarked on quality improvement research for
their product. After the research, using new technology, a sample of 70
bulbs was tested and they gave a mean lifespan of 30.2 months.
Does this justify the research undertaken? Use 1% level of significance to
conduct a statistical test in order to answer the above question.
Testing procedure
1. Null hypothesis H0: µ = 28
Alternative hypothesis H1: µ > 28
2. The level of significance is 1% (one tailed test)
3. The test statistic is the sample mean lifespan, $\bar{x}$ = 30.2 months
4. The critical value of the one tailed test at 1% level of significance is
+2.33

[Figure: right-tailed test at the 1% level. The area between 0 and the
critical value 2.33 is 0.4900; the rejection region of 1% = 0.01 lies to
its right.]
5. The standardized value of the sample mean is

$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{30.2 - 28}{4 / \sqrt{70}} = 4.6$

6. Since 4.6 > 2.33, we reject the null hypothesis and accept the
alternative hypothesis at 1% level of significance, i.e. the new
sample mean lifespan is significantly higher than the population mean.
Therefore the research undertaken was worthwhile and justified.

Example 3
A construction firm has placed an order for a consignment of wires with
a mean length of 10.5 metres and a standard deviation of 1.7 m.
The company which produces the wires delivered 90 wires, which had a
mean length of 9.2 m. The construction company rejected the
consignment on the grounds that the wires were different from the order
placed.

Required
Conduct a statistical test to indicate whether or not you support
the action taken by the construction company at 5% level of significance.

Solution
Null hypothesis H0: µ = 10.5 m
Alternative hypothesis H1: µ ≠ 10.5 m
The level of significance is 5%
The test statistic is the sample mean, $\bar{x}$ = 9.2 m
The critical values of the two tailed test at 5% level of significance are
±1.96.

[Figure: two-tailed test with critical values –1.96 and +1.96.]

The standardized value of the test statistic is

$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{9.2 - 10.5}{1.7 / \sqrt{90}} = -7.25$

Since –7.25 < –1.96, we reject the null hypothesis and accept the
alternative hypothesis at 5% level of significance, i.e. the sample mean
is statistically different from the mean length ordered by the
construction company. We therefore support the action taken by the
construction company.
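
Examples 2 and 3 differ only in the direction of the test, so both can be handled by one routine. The sketch below is illustrative, not a prescribed method: the function `z_test` and its `tail` argument are names assumed here, and the critical values are computed from the standard normal distribution instead of being read from tables.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, pop_mean, sd, n, alpha, tail):
    """Return (Z, critical value, reject H0?) for tail = 'left', 'right' or 'two'."""
    z = (sample_mean - pop_mean) / (sd / sqrt(n))
    if tail == "two":
        critical = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "right":
        critical = NormalDist().inv_cdf(1 - alpha)
        reject = z > critical
    else:  # left-tailed
        critical = NormalDist().inv_cdf(alpha)
        reject = z < critical
    return z, critical, reject


# Example 2: bulbs, right-tailed test at the 1% level.
bulbs = z_test(30.2, 28, 4, 70, alpha=0.01, tail="right")
# Example 3: wires, two tailed test at the 5% level.
wires = z_test(9.2, 10.5, 1.7, 90, alpha=0.05, tail="two")

print(f"Bulbs: Z = {bulbs[0]:.2f}, reject H0: {bulbs[2]}")
print(f"Wires: Z = {wires[0]:.2f}, reject H0: {wires[2]}")
```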

TESTING THE DIFFERENCE BETWEEN TWO SAMPLE MEANS (LARGE SAMPLES)
A large sample is defined as one which contains 30 or more items (n ≥ 30),
where n is the sample size.
In business, those involved are constantly observant about the standards
or specifications of the items which they sell. For example, a trader may
receive a batch of items at one time and another batch at a later time,
and conclude that the two samples differ in certain specifications, e.g.
mean weight, mean lifespan or mean length. It may then become necessary
to establish whether the observed differences are statistically
significant or not. If the differences are statistically significant, they
must be explained, i.e. there are known causes. If they are not
statistically significant, the differences observed have no known causes
and are mainly due to chance.
If the differences are established to be statistically significant, it
implies that the complaints which necessitated the test are justified.
Let X1 and X2 be any two samples whose sizes are n1 and n2, with means
$\bar{x}_1$ and $\bar{x}_2$ and standard deviations S1 and S2 respectively.
In order to test the difference between the two sample means, we apply
the following formula:

$Z = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}}$ where $S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$

Example 1
An agronomist was interested in the effect of a particular fertilizer on
yield. He planted maize on 50 equal pieces of land and the mean harvest
obtained later was 60 bags per plot with a standard deviation of 1.5 bags.
The crops grew under natural conditions without the soil being treated
with any fertilizer. The same agronomist carried out an alternative
experiment where he picked 60 plots in the same area and planted the
same maize crop, but a fertilizer was applied on these plots. After the
harvest it was established that the mean harvest was 63 bags per plot
with a standard deviation of 1.3 bags.

Required
Conduct a statistical test in order to establish whether there was a
significant difference between the mean harvest under the two types of
field conditions. Use 5% level of significance.

Solution

H0 : µ1 = µ2

H1 : µ1 ≠ µ2

The critical values of the two tailed test at 5% level of significance
are ±1.96.
The standardized value of the difference between sample means is given
by Z where

$Z = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}}$ where $S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$

$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{1.5^2}{50} + \frac{1.3^2}{60}} = 0.27$

$Z = \frac{60 - 63}{0.27} = -11.11$

[Figure: two-tailed test with critical values –1.96 and +1.96.]

Since –11.11 < –1.96, we reject the null hypothesis and accept the
alternative hypothesis at 5% level of significance, i.e. the difference
between the sample mean harvests is statistically significant. This
implies that the fertilizer had a positive effect on the harvest of maize.
Note: You don’t have to illustrate your solution with a diagram.
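
The fertilizer example can be checked numerically. In this minimal sketch the function name `two_sample_z` is illustrative. Without rounding the standard error to 0.27, the statistic comes out at about –11.09 rather than –11.11; the conclusion is unchanged.

```python
from math import sqrt


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Return (Z, standard error) for the difference between two large-sample means."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / se, se


# Sample 1: unfertilized plots; sample 2: fertilized plots.
z, se = two_sample_z(60, 1.5, 50, 63, 1.3, 60)
reject_h0 = abs(z) > 1.96  # two tailed test at the 5% level

print(f"standard error = {se:.4f}, Z = {z:.2f}, reject H0: {reject_h0}")
```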

Example 2
An observation was made about the reading abilities of males and females.
The observation led to a conclusion that females are faster readers than
males. The observation was based on the times taken by both females
and males when reading out a list of names during graduation
ceremonies.
In order to investigate the observation and the consequent conclusion,
a sample of 200 men were given lists to read. On average each man took
63 seconds with a standard deviation of 4 seconds.
A sample of 250 women was also taken and asked to read the same list
of names. It was found that they took on average 62 seconds with a
standard deviation of 1 second.

Required
By conducting a statistical hypothesis test at 1% level of significance,
establish whether or not the sample data obtained support the earlier
observation.

Solution
H0: µ1 = µ2
H1: µ1 ≠ µ2
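The critical values of the two tailed test at 1% level of significance
are ±2.58.

$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$

$Z = \frac{63 - 62}{\sqrt{\frac{4^2}{200} + \frac{1^2}{250}}} = 3.45$

[Figure: two-tailed test with critical values –2.58 and +2.58. The
computed value +3.45 falls in the right-hand rejection region.]

Since 3.45 > 2.58, we reject the null hypothesis and accept the
alternative hypothesis at 1% level of significance, i.e. there is a
significant difference between the reading speeds of males and
females. Since the women's mean time is the lower one, the sample
supports the observation that females are faster readers.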

TEST OF HYPOTHESIS ON PROPORTIONS


This follows a similar method to the one for means except that the
standard error used in this case is

$S_p = \sqrt{\frac{pq}{n}}$

The Z score is calculated as $Z = \frac{P - \pi}{S_p}$

Where P = proportion found in the sample
q = 1 – P
π = the hypothetical proportion

Example
A member of parliament (MP) claims that in his constituency only 50% of
the total youth population lacks university education. A local media
company wanted to ascertain that claim thus they conducted a survey
taking a sample of 400 youths, of these 54% lacked university education.

Required:
At 5% level of significance confirm if the MP’s claim is wrong.
Solution.
Note: This is a two tailed test since we wish to test whether the
proportion is different (≠) from the claimed value, and not against a
specific directional alternative, e.g. < (less than) or > (more than).

H0 : π = 50% of all youth in the constituency.
H1 : π ≠ 50% of all youth in the constituency.

$S_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.54 \times 0.46}{400}} = 0.025$

$Z = \frac{0.54 - 0.50}{0.025} = 1.6$

At 5% level of significance for a two tailed test the critical value is
1.96.
Since the calculated Z value < the tabulated value (1.96),
i.e. 1.6 < 1.96, we accept the null hypothesis.
Thus the sample does not provide sufficient evidence that the MP’s
claim is wrong.
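
The proportion test above can be sketched in a few lines. The function name `proportion_z` and the flag `use_claimed_for_se` are assumptions made for this illustration. Which proportion goes into the standard error is also treated as an assumption: the sample proportion (0.54) and the hypothesized proportion (0.50) both give a standard error of about 0.025 and Z of about 1.6 here.

```python
from math import sqrt


def proportion_z(p_sample, p_claimed, n, use_claimed_for_se=False):
    """Return (Z, standard error) for a one-sample test of a proportion."""
    # The standard error can be built from the sample or the claimed proportion.
    base = p_claimed if use_claimed_for_se else p_sample
    se = sqrt(base * (1 - base) / n)
    return (p_sample - p_claimed) / se, se


z_sample, se_sample = proportion_z(0.54, 0.50, 400)
z_claimed, se_claimed = proportion_z(0.54, 0.50, 400, use_claimed_for_se=True)

for label, z in (("sample-based", z_sample), ("claim-based", z_claimed)):
    print(f"{label}: Z = {z:.2f}, reject H0 at 5%: {abs(z) > 1.96}")
```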

HYPOTHESIS TESTING OF THE DIFFERENCE BETWEEN PROPORTIONS

Example
Ken industrial manufacturers have produced a perfume known as
“fianchetto.” In order to test its popularity in the market, the
manufacturer carried out a random survey in Back rank city where 10,000
consumers were interviewed, of whom 7,200 showed preference. The
manufacturer also moved to Rook town where he interviewed
12,000 consumers, of whom 10,000 showed preference for the
product.

Required
Design a statistical test and hence use it to advise the manufacturer
regarding the difference in the proportions, at 5% level of significance.

Solution
H0 : π1 = π2
H1 : π1 ≠ π2

The critical value for this two tailed test at 5% level of significance =
±1.96.

Now $Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{S_{P_1 - P_2}}$

But since the null hypothesis is π1 = π2, the second part of the numerator
disappears, i.e.
π1 – π2 = 0, which will always be the case under this null hypothesis.

Then $Z = \frac{P_1 - P_2}{S_{P_1 - P_2}}$

Where:
                                    Sample 1       Sample 2
Sample size                         n1 = 10,000    n2 = 12,000
Sample proportion of success        P1 = 0.72      P2 = 0.83
Population proportion of success    π1             π2

Now $S_{P_1 - P_2} = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

Where $p = \frac{P_1 n_1 + P_2 n_2}{n_1 + n_2}$

And q = 1 – p
In our case

$p = \frac{7200 + 10000}{10000 + 12000} = \frac{17200}{22000} = 0.78$

q = 0.22

$S_{P_1 - P_2} = \sqrt{0.78 \times 0.22 \left(\frac{1}{10000} + \frac{1}{12000}\right)} = 0.0056$

$Z = \frac{0.72 - 0.83}{0.0056} = -19.6$

Since –19.6 < –1.96, we reject the null hypothesis and accept the
alternative. The difference between the proportions is statistically
significant. This implies that the perfume is much more popular in
Rook town than in Back rank city.

HYPOTHESIS TESTING ABOUT THE DIFFERENCE BETWEEN TWO PROPORTIONS
This is used to test the difference between the proportions of a given
attribute found in two random samples.
The null hypothesis is that there is no difference between the population
proportions. It means the two samples are from the same population.
Hence
H0 : π1 = π2
The best estimate of the standard error of the difference of P1 and P2 is
given by pooling the samples and finding the pooled sample proportion
(p), thus

$p = \frac{P_1 n_1 + P_2 n_2}{n_1 + n_2}$

Standard error of difference between proportions

$S_{P_1 - P_2} = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where q = 1 – p

And $Z = \frac{P_1 - P_2}{S_{P_1 - P_2}}$

Example
In a random sample of 100 persons taken from village A, 60 are found to
be consuming tea. In another sample of 200 persons taken from a village
B, 100 persons are found to be consuming tea. Do the data reveal
significant difference between the two villages so far as the habit of
taking tea is concerned?

Solution
Let us take the hypothesis that there is no significant difference between
the two villages as far as the habit of taking tea is concerned, i.e.
π1 = π2
We are given
P1 = 0.6; n1 = 100
P2 = 0.5; n2 = 200

The appropriate statistic to be used here is given by

$p = \frac{P_1 n_1 + P_2 n_2}{n_1 + n_2} = \frac{(0.6)(100) + (0.5)(200)}{100 + 200} = 0.53$

q = 1 – 0.53
= 0.47

$S_{P_1 - P_2} = \sqrt{0.53 \times 0.47 \left(\frac{1}{100} + \frac{1}{200}\right)} = 0.0611$

$Z = \frac{0.6 - 0.5}{0.0611} = 1.64$

Since the computed value of Z is less than the critical value of Z = 1.96
at 5% level of significance, we accept the hypothesis and
conclude that there is no significant difference in the habit of taking tea
in the two villages A and B.
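
The pooled-proportion calculation for the two villages can be reproduced as follows. This is a sketch under the pooling described above; the function name `two_proportion_z` is illustrative, and it works from the raw counts (60 of 100 and 100 of 200) rather than from rounded proportions.

```python
from math import sqrt


def two_proportion_z(successes1, n1, successes2, n2):
    """Return (Z, pooled proportion, standard error) for two sample proportions."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, pooled, se


z, pooled, se = two_proportion_z(60, 100, 100, 200)
reject_h0 = abs(z) > 1.96  # two tailed test at the 5% level

print(f"pooled p = {pooled:.2f}, Z = {z:.2f}, reject H0: {reject_h0}")
```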

t distribution (Student’s t distribution) tests of hypothesis (test
for small samples n < 30)
For small samples (n < 30), the method used in hypothesis testing is
similar to the one for large samples, except that t values from the t
distribution at a given number of degrees of freedom v are used instead
of the Z score. The standard error statistic used is also different.
Note that v = n – 1 for a single sample and v = n1 + n2 – 2 where two
samples are involved.

a) Test of hypothesis about the population mean


When the population standard deviation is not known and is estimated by
the sample standard deviation (S), the t statistic is defined as

$t = \frac{\bar{x} - \mu}{S_{\bar{x}}}$ where $S_{\bar{x}} = \frac{S}{\sqrt{n}}$

This follows the Student’s t distribution with (n – 1) d.f., where
$\bar{x}$ = sample mean
μ = hypothesized population mean
n = sample size
and S is the standard deviation of the sample calculated by the formula

$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ for n < 30

If the absolute value of the calculated t exceeds the table value of t at
a specified level of significance, the null hypothesis is rejected.

Example
Ten oil tins are taken at random from an automatic filling machine. The
mean weight of the tins is 15.8 kg and the standard deviation is 0.5kg.
Does the sample mean differ significantly from the intended weight of
16kgs. Use 5% level of significance.

Solution
Given that n = 10; $\bar{x}$ = 15.8; S = 0.50; μ = 16; v = 9

H0 : μ = 16
H1 : μ ≠ 16

$S_{\bar{x}} = \frac{0.5}{\sqrt{10}} = 0.16$

$t = \frac{15.8 - 16}{0.16} = -1.25$

The table value of t for 9 d.f. at 5% level of significance is 2.26. The
absolute value of the computed t (1.25) is smaller than the table value
of t. Therefore the difference is insignificant and the null hypothesis
is accepted.
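
The oil tin example can be reproduced as a minimal sketch. The function name `t_statistic` is illustrative, and the table value 2.26 is taken from the notes because the Python standard library has no t distribution. Without rounding the standard error to 0.16, t is about –1.26 rather than –1.25; the decision is the same.

```python
from math import sqrt


def t_statistic(sample_mean, pop_mean, s, n):
    """One-sample t statistic using the standard error s / sqrt(n)."""
    return (sample_mean - pop_mean) / (s / sqrt(n))


t = t_statistic(sample_mean=15.8, pop_mean=16, s=0.5, n=10)
table_t = 2.26  # two tailed table value for 9 d.f. at the 5% level
reject_h0 = abs(t) > table_t

print(f"t = {t:.3f}, reject H0: {reject_h0}")
```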

b) Test of hypothesis about the difference between two means


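The t test can be used under two assumptions when testing hypotheses
concerning the difference between two means: that the two populations
are normally distributed (or nearly so), and that the standard deviations
of the two are the same or at any rate not significantly different.

The appropriate test statistic to be used is

$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}}$ at n1 + n2 – 2 d.f.

The standard deviation is obtained by pooling the two sample standard
deviations as shown below.

$S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$

Where S1 and S2 are the standard deviations for samples 1 and 2
respectively.

Now $S_{\bar{x}_1} = \frac{S_p}{\sqrt{n_1}}$ and $S_{\bar{x}_2} = \frac{S_p}{\sqrt{n_2}}$, so that $S_{\bar{x}_1 - \bar{x}_2} = \sqrt{S_{\bar{x}_1}^2 + S_{\bar{x}_2}^2}$

Alternatively $S_{\bar{x}_1 - \bar{x}_2} = S_p \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$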

Example
Two different types of drugs, A and B, were tried on certain patients for
increasing weight. 5 persons were given drug A and 7 persons were
given drug B. The increase in weight (in pounds) is given below.

Drug A    8   12   13    9    3
Drug B   10    8   12   15    6    8   11

Do the two drugs differ significantly with regard to their effect in
increasing weight? (Given that v= 10; t0.05 = 2.23)

Solution
H0 : μ1 = μ2
H1 : μ1 ≠ μ2

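$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}}$

Calculate $\bar{x}_1$, $\bar{x}_2$ and S

X1    (X1 – x̄1)   (X1 – x̄1)²   X2    (X2 – x̄2)   (X2 – x̄2)²
8     -1          1            10    0           0
12    +3          9            8     -2          4
13    +4          16           12    +2          4
9     0           0            15    +5          25
3     -6          36           6     -4          16
                               8     -2          4
                               11    +1          1
ΣX1 = 45, Σ(X1 – x̄1) = 0, Σ(X1 – x̄1)² = 62
ΣX2 = 70, Σ(X2 – x̄2) = 0, Σ(X2 – x̄2)² = 54

$\bar{x}_1 = \frac{\sum X_1}{n_1} = \frac{45}{5} = 9$ and $\bar{x}_2 = \frac{\sum X_2}{n_2} = \frac{70}{7} = 10$

$S_1 = \sqrt{\frac{62}{4}} = 3.94$ and $S_2 = \sqrt{\frac{54}{6}} = 3$

$S_p = \sqrt{\frac{4(3.94)^2 + 6(3)^2}{5 + 7 - 2}} = \sqrt{\frac{62 + 54}{10}} = 3.406$

$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{3.406^2}{5} + \frac{3.406^2}{7}}$ or $3.406 \sqrt{\frac{5 + 7}{5 \times 7}} = 1.99$

$t = \frac{9 - 10}{1.99} = -0.50$

Now t0.05 (at v = 10) = 2.23 > 0.50

Thus we accept the null hypothesis.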


Hence there is no significant difference in the efficacy of the two drugs in
the matter of increasing weight
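
The pooled t computation for the two drugs can be reproduced directly from the raw weight gains. This is a minimal sketch: the drug A values used are those in the working table (8, 12, 13, 9, 3, which sum to 45), and the table value 2.23 is taken from the question rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

drug_a = [8, 12, 13, 9, 3]
drug_b = [10, 8, 12, 15, 6, 8, 11]

n1, n2 = len(drug_a), len(drug_b)
# stdev() uses the n - 1 divisor, matching the sample standard deviation S.
s1, s2 = stdev(drug_a), stdev(drug_b)

# Pooled standard deviation and standard error of the difference in means.
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
se = sp * sqrt(1 / n1 + 1 / n2)
t = (mean(drug_a) - mean(drug_b)) / se

table_t = 2.23  # given in the question for v = 10 at the 5% level
reject_h0 = abs(t) > table_t

print(f"Sp = {sp:.3f}, standard error = {se:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```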
Example

Two salesmen A and B are working in a certain district. From a survey
conducted by the head office, the following results were obtained. State
whether there is any significant difference in the average sales between
the two salesmen at 5% level of significance.

                             A      B
No. of sales                 20     18
Average sales in shs         170    205
Standard deviation in shs    20     25

Solution
H0 : μ1 = μ2
H1 : μ1 ≠ μ2
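Where

$S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$

$S_{\bar{x}_1 - \bar{x}_2} = S_p \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$

Where: $\bar{x}_1$ = 170, $\bar{x}_2$ = 205, n1 = 20, n2 = 18, S1 = 20, S2 = 25, v = 36

$S_p = \sqrt{\frac{19(20)^2 + 17(25)^2}{20 + 18 - 2}} = 22.5$

$S_{\bar{x}_1 - \bar{x}_2} = 22.5 \sqrt{\frac{20 + 18}{20 \times 18}} = 7.31$

$t = \frac{170 - 205}{7.31} = -4.79$

t0.05(36) = 1.96 (since d.f. > 30 we use the normal tables)

For 36 d.f. (d.f. > 30) the t distribution is approximately the same as
the normal distribution, so the table value at 5% level of significance
is 1.96. Since the absolute value of the computed t (4.79) is more than
the table value, we reject the null hypothesis. Thus, we conclude that
there is a significant difference in the average sales between the two
salesmen.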
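
When only summary statistics are available, as for the two salesmen, the same pooled calculation can be written as a function of the means, standard deviations and sample sizes. The function name `pooled_t` is an illustrative choice, and the normal-table value 1.96 follows the reasoning given above for d.f. > 30.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, pooled standard deviation, standard error) from summary statistics."""
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se = sp * sqrt((n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / se, sp, se


t, sp, se = pooled_t(170, 20, 20, 205, 25, 18)
reject_h0 = abs(t) > 1.96  # two tailed test at the 5% level, normal approximation

print(f"Sp = {sp:.1f}, standard error = {se:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```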

Testing the hypothesis equality of two variances


The test for equality of two population variances is based on the
variances in two independently selected random samples drawn from two
normal populations.
Under the null hypothesis H0: $\sigma_1^2 = \sigma_2^2$

$F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$. Now under H0: $\sigma_1^2 = \sigma_2^2$ it follows that

$F = \frac{S_1^2}{S_2^2}$, which is the test statistic.

This follows the F distribution with v1 and v2 degrees of freedom. The
larger sample variance is placed in the numerator and the smaller one in
the denominator.
If the computed value of F exceeds the table value of F, we reject the null
hypothesis, i.e. the alternative hypothesis is accepted.

Example
In one sample of 10 observations the sum of the squares of the deviations
of the sample values from the sample mean was 120, and in the other
sample of 12 observations it was 314. Test whether the difference is
significant at 5% level of significance.

Solution
Given that n1 = 10, n2 = 12, $\sum (x_1 - \bar{x}_1)^2 = 120$,
$\sum (x_2 - \bar{x}_2)^2 = 314$

Let us take the null hypothesis that the two samples are drawn from the
same normal population of equal variance
H0: $\sigma_1^2 = \sigma_2^2$
H1: $\sigma_1^2 \neq \sigma_2^2$

Applying the F test, i.e.

$F = \frac{S_1^2}{S_2^2} = \frac{120 / 9}{314 / 11} = \frac{13.33}{28.55}$

Since the numerator should be greater than the denominator,

$F = \frac{28.55}{13.33} = 2.14$

The table value of F at 5% level of significance for v1 = 11 and v2 = 9
is 3.11.
Since the calculated value of F is less than the table value, we accept the
hypothesis. The samples may have been drawn from two populations
having the same variance.
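
The variance ratio in this example can be computed from the sums of squared deviations. This is a sketch only: the function name `variance_ratio` is illustrative, and the table value 3.11 is taken from the notes because the Python standard library has no F distribution.

```python
def variance_ratio(ss1, n1, ss2, n2):
    """Return (F, numerator d.f., denominator d.f.) with the larger variance on top."""
    var1 = ss1 / (n1 - 1)
    var2 = ss2 / (n2 - 1)
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1


f, df_num, df_den = variance_ratio(ss1=120, n1=10, ss2=314, n2=12)
table_f = 3.11  # table value quoted in the notes at the 5% level
reject_h0 = f > table_f

print(f"F = {f:.2f} with ({df_num}, {df_den}) d.f., reject H0: {reject_h0}")
```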

Chi square hypothesis tests (Non-parametric test) (χ²)

They include amongst others
i. Test for goodness of fit
ii. Test for independence of attributes
iii. Test of homogeneity
iv. Test for population variance

The Chi square test (χ²) is used when comparing an actual (observed)
distribution with a hypothesized, or expected, distribution.

It is given by $\chi^2 = \sum \frac{(O - E)^2}{E}$ where O = observed frequency
E = expected frequency
The computed value of χ² is compared with the tabulated value of χ² for a
given significance level and degrees of freedom.
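
As an illustration of the formula, the sketch below applies a goodness of fit test to a die rolled 60 times. The observed counts are hypothetical values invented for this illustration and do not come from the notes. The table value 11.07 is the standard χ² value for 5 degrees of freedom at the 5% level.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for the six faces of a die rolled 60 times.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6  # a fair die gives 60 / 6 = 10 per face

chi2 = chi_square(observed, expected)
table_value = 11.07  # chi square table value for 6 - 1 = 5 d.f. at the 5% level
reject_h0 = chi2 > table_value

print(f"chi square = {chi2:.2f}, reject H0: {reject_h0}")
```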

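SUMMARY OF FORMULAE IN HYPOTHESIS TESTING

(a) Hypothesis testing of mean

For n > 30

$Z = \frac{\bar{x} - \mu}{S_{\bar{x}}}$ where $S_{\bar{x}} = \frac{S}{\sqrt{n}}$, at the α level of significance.

For n < 30

$t = \frac{\bar{x} - \mu}{S_{\bar{x}}}$ where $S_{\bar{x}} = \frac{S}{\sqrt{n}}$, at n – 1 d.f. and the α level of significance

(b) Difference between means

For n > 30

$Z = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}}$

Where $S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$

At the α level of significance

For n < 30

$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}}$ at n1 + n2 – 2 d.f.

where $S_{\bar{x}_1 - \bar{x}_2} = S_p \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$

and $S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$

(c) Hypothesis testing of proportions

$Z = \frac{P - \pi}{S_p}$

Where: $S_p = \sqrt{\frac{pq}{n}}$
P = proportion found in sample
q = 1 – P
π = hypothetical proportion

(d) Difference between proportions

$Z = \frac{P_1 - P_2}{S_{P_1 - P_2}}$

Where: $S_{P_1 - P_2} = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

$p = \frac{P_1 n_1 + P_2 n_2}{n_1 + n_2}$

q = 1 – p

(e) Chi-square test

$\chi^2 = \sum \frac{(O - E)^2}{E}$

Where O = observed frequency
E = expected frequency

(f) F – test (variance ratio test)

$F = \frac{S_1^2}{S_2^2}$

Here the bigger of the two sample variances makes the numerator.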
