Unit 3
Contents
- Hypothesis tests for the difference between two population means: matched pairs
- Hypothesis tests for the difference between two population means: independent samples
  - Two normal populations with equal (unknown) variances
  - Two normal populations with known variances
  - Two nonnormal populations with unknown variances and large samples
  - Two Bernoulli populations
- Hypothesis tests for the ratio of two population variances: independent samples
Chapter 3. Comparing two populations
Learning goals
At the end of this chapter you should be able to:
- Perform a test of hypothesis for the difference between two population means and for the ratio of two population variances
- Construct confidence intervals for the difference/ratio
- Distinguish situations where a test based on matched pairs is suitable from those where a test based on independent samples is
- Calculate the power of a test and the probability of Type II Error
References
- Newbold, P. Statistics for Business and Economics
  - Chapter 9 (9.6-9.9)
- Ross, S.
  - Chapter 10
Introduction
product i                 1    2    3    4    5    6    7    8    9   10
high-recall x_i         137  135   83  125   47   46  114  157   57  144
low-recall y_i           53  114   81   86   34   66   89  113   88  111
diff. d_i = x_i - y_i    84   21    2   39   13  -20   25   44  -31   33
Tests for the difference between two means: matched pairs
Example: cont.
Population:
D = difference between high- and low-recall, $D \sim N(\mu_X - \mu_Y, \sigma_D^2)$
SRS: $n = 10$
Sample: $\bar d = \frac{210}{10} = 21$, $s_d^2 = \frac{14202 - 10(21)^2}{10 - 1} = 1088$
Objective: test
$H_0: \mu_X - \mu_Y \le 0$ against $H_1: \mu_X - \mu_Y > 0$, with $D_0 = 0$
(Upper-tail test)
Test statistic: $T = \frac{\bar D - D_0}{s_D/\sqrt{n}} \sim t_{n-1}$
Observed test statistic:
$D_0 = 0$, $n = 10$, $\bar d = 21$, $s_d = \sqrt{1088} = 32.98$
$$t = \frac{\bar d - D_0}{s_d/\sqrt{n}} = \frac{21}{32.98/\sqrt{10}} = 2.014$$
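The arithmetic of the matched-pairs example can be checked from the ten product scores. The following is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the original slides.

```python
from math import sqrt
from statistics import mean, stdev

# Data from the table in the Introduction
high_recall = [137, 135, 83, 125, 47, 46, 114, 157, 57, 144]
low_recall = [53, 114, 81, 86, 34, 66, 89, 113, 88, 111]

# Paired differences d_i = x_i - y_i
d = [x - y for x, y in zip(high_recall, low_recall)]

n = len(d)
d_bar = mean(d)   # sample mean of the differences
s_d = stdev(d)    # sample quasi-standard deviation (divides by n - 1)
D0 = 0            # hypothesised difference under H0

# Observed value of T = (D_bar - D0) / (s_D / sqrt(n))
t_obs = (d_bar - D0) / (s_d / sqrt(n))

print(f"d_bar = {d_bar}, s_d = {s_d:.2f}, t = {t_obs:.3f}")
```

The printed values should agree, up to rounding, with the mean difference of 21, the standard deviation of about 32.98 and the observed statistic of about 2.01 reported in the example.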
[Figure: $t_{n-1}$ density]
Conclusion: The sample data gave enough evidence to support the claim
that, on average, brain activity is higher for the high-recall group than
for the low-recall group. If, in fact, the mean brain activity were the
same for the two groups, the probability of finding a sample result as
extreme as or more extreme than the one actually obtained would be
between 0.025 and 0.05 (which is rather low).
Example: cont. in Excel: go to the Data menu, open the Data Analysis
submenu, and choose the function t-Test: Paired Two Sample for Means.
[Figure: Excel output. Columns A and B hold the data; the observed test
statistic and p-value are highlighted in yellow.]
Two-tail test for the difference between two means via CI: matched pairs
Since the value 0 belongs to this interval, we cannot reject the null
hypothesis of equality of the two population means at the $\alpha = 0.05$
significance level.
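The interval itself is not reproduced in the text. As a hedged sketch, a 95% confidence interval for the mean difference can be rebuilt from the matched-pairs summary values ($\bar d = 21$, $s_d^2 = 1088$, $n = 10$). The critical value $t_{9;0.025} = 2.262$ is taken from a standard t table and is an assumption, since it does not appear in the slides.

```python
from math import sqrt

# Summary values from the matched-pairs example
d_bar = 21.0
s_d = sqrt(1088)
n = 10

# Assumption: critical value t_{9; 0.025} read from a standard t table
t_crit = 2.262

# Two-sided 95% CI: d_bar +/- t_crit * s_d / sqrt(n)
half_width = t_crit * s_d / sqrt(n)
ci = (d_bar - half_width, d_bar + half_width)

# The two-tail test at alpha = 0.05 rejects H0: mu_X - mu_Y = 0
# only if 0 lies outside the interval
contains_zero = ci[0] <= 0 <= ci[1]

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), contains 0: {contains_zero}")
```

Under these assumptions the interval runs from roughly -2.6 to 44.6, so it contains 0, which matches the statement that the two-tail test does not reject at the 5% level.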
Tests for the difference between two means: independent normal samples, population variances equal
- Let X be a population with mean $\mu_X$ and variance $\sigma_X^2$ and Y be a population with mean $\mu_Y$ and variance $\sigma_Y^2$, both normally distributed with unknown but equal population variances $\sigma^2 = \sigma_X^2 = \sigma_Y^2$.
- Suppose we have a random sample of $n_1$ observations from X and an independent random sample of $n_2$ observations from Y.
- In a two-tail test $H_0: \mu_X - \mu_Y = D_0$ against $H_1: \mu_X - \mu_Y \ne D_0$:
  - The test statistic is
    $$T = \frac{\bar X - \bar Y - D_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim_{H_0} t_{n_1+n_2-2}$$
Population 1: $X \sim N(\mu_X, \sigma_X^2)$, SRS: $n_1 = 4$
Population 2: $Y \sim N(\mu_Y, \sigma_Y^2)$, SRS: $n_2 = 4$
$H_0: \mu_X - \mu_Y = 0$ against $H_1: \mu_X - \mu_Y > 0$, with $D_0 = 0$
(Upper-tail test)
Test statistic: $T = \frac{\bar X - \bar Y}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim_{H_0} t_{n_1+n_2-2}$
Observed test statistic:
$D_0 = 0$, $n_1 = 4$, $n_2 = 4$, $\bar x = 78.0$, $s_x = 24.4$, $\bar y = 63.5$, $s_y = 20.2$
$$s_p^2 = \frac{(n_1 - 1)s_x^2 + (n_2 - 1)s_y^2}{n_1 + n_2 - 2} = \frac{(4 - 1)24.4^2 + (4 - 1)20.2^2}{4 + 4 - 2} = 501.7$$
$$t = \frac{78.0 - 63.5}{22.4\sqrt{1/4 + 1/4}} = 0.915$$
Rejection region: $RR_{0.1} = \{t : t > t_{6;0.1}\}$, with $t_{6;0.1} = 1.440$
Since $t = 0.915 \notin RR_{0.1}$, we cannot reject the null hypothesis
at a 10% level.
Conclusion: The sample data did not contain strong evidence suggesting
that, on average, more ideas will be generated by groups with moderators.
However, for such small sample sizes, we cannot expect great power in the
test, so quite large differences in the population means would be needed
to reject the null hypothesis at low significance levels.
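The pooled-variance calculation in this example can be reproduced from the summary statistics alone. This is a small sketch with illustrative variable names; the critical value 1.440 is the $t_{6;0.1}$ quoted in the example.

```python
from math import sqrt

# Summary statistics from the example
n1, n2 = 4, 4
x_bar, s_x = 78.0, 24.4
y_bar, s_y = 63.5, 20.2
D0 = 0.0

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * s_x**2 + (n2 - 1) * s_y**2) / (n1 + n2 - 2)
sp = sqrt(sp2)

# Observed value of T = (x_bar - y_bar - D0) / (sp * sqrt(1/n1 + 1/n2))
t_obs = (x_bar - y_bar - D0) / (sp * sqrt(1 / n1 + 1 / n2))

# Upper-tail rejection region at the 10% level, t_{6; 0.1} from the example
t_crit = 1.440
reject_h0 = t_obs > t_crit

print(f"sp^2 = {sp2:.1f}, t = {t_obs:.3f}, reject H0: {reject_h0}")
```

The statistic falls well short of the critical value, in line with the conclusion that the null hypothesis is not rejected at the 10% level.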
Two-tail test for the difference between two means via CI: independent normal samples, population variances equal
Since the value 0 belongs to this interval, we cannot reject the null
hypothesis of equality of the two population means at the $\alpha = 0.01$
significance level.
Tests for the difference between two means: independent large samples or two normal populations with known variances
- Let X be a population with mean $\mu_X$ and variance $\sigma_X^2$ and Y be a population with mean $\mu_Y$ and variance $\sigma_Y^2$.
- Suppose we have a random sample of $n_1$ observations from X and an independent random sample of $n_2$ observations from Y, and:
  - either both $n_1$ and $n_2$ are large and $\sigma_X^2$ and $\sigma_Y^2$ are unknown,
  - or X and Y are normally distributed and $\sigma_X^2$ and $\sigma_Y^2$ are known.
- In a two-tail test $H_0: \mu_X - \mu_Y = D_0$ against $H_1: \mu_X - \mu_Y \ne D_0$:
  - The test statistic is:
    - either (large samples, unknown variances)
      $$Z = \frac{\bar X - \bar Y - D_0}{\sqrt{\frac{s_X^2}{n_1} + \frac{s_Y^2}{n_2}}} \sim_{H_0,\ \text{approx.}} N(0, 1)$$
    - or (normal populations, known variances)
      $$Z = \frac{\bar X - \bar Y - D_0}{\sqrt{\frac{\sigma_X^2}{n_1} + \frac{\sigma_Y^2}{n_2}}} \sim_{H_0} N(0, 1)$$
  - The rejection region is (at significance level $\alpha$):
    $$RR_\alpha = \{z : z < -z_{\alpha/2} \text{ or } z > z_{\alpha/2}\}$$
Tests for the difference between two means: independent large samples or two normal populations with known variances
Example: 9.7 (Newbold) A survey of practicing certified public accountants on
attitudes to women in the profession was carried out. Survey respondents were
asked to react on a scale from one (strongly disagree) to five (strongly agree)
to the statement: "Women in public accounting are given the same job
assignments as men." For a sample of 186 male accountants, the mean
response was 4.059 and the sample quasi-standard deviation was 0.839. For an
independent random sample of 172 female accountants, the mean response was
3.680 and the sample quasi-standard deviation was 0.966. Test the null
hypothesis ($\alpha = 0.0001$) that the two population means are equal against the
alternative that the true mean is higher for male accountants.
Population 1: X = response of a male accountant, $X \sim (\mu_X, \sigma_X^2)$
Population 2: Y = response of a female accountant, $Y \sim (\mu_Y, \sigma_Y^2)$
$$z = \frac{4.059 - 3.680}{\sqrt{0.839^2/186 + 0.966^2/172}} = 3.95$$
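The observed statistic for Example 9.7 and the upper-tail critical value at $\alpha = 0.0001$ can be computed with the standard library. The sketch below uses `statistics.NormalDist` for the normal quantile; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from Example 9.7
n1, x_bar, s_x = 186, 4.059, 0.839   # male accountants
n2, y_bar, s_y = 172, 3.680, 0.966   # female accountants
alpha = 0.0001
D0 = 0.0

# Large-sample statistic with estimated variances
z_obs = (x_bar - y_bar - D0) / sqrt(s_x**2 / n1 + s_y**2 / n2)

# Upper-tail critical value z_alpha of the standard normal distribution
z_crit = NormalDist().inv_cdf(1 - alpha)

# Upper-tail test: reject H0 when z_obs exceeds z_alpha
reject_h0 = z_obs > z_crit

print(f"z = {z_obs:.2f}, z_alpha = {z_crit:.2f}, reject H0: {reject_h0}")
```

Because the alternative is one-sided (higher mean for male accountants), the comparison uses $z_\alpha$ rather than the two-tail value $z_{\alpha/2}$.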
Tests for the difference between two means: independent large samples or two normal populations with known variances
Since the value 0 does not belong to this interval, we can reject the
null hypothesis of equality of the two population means at the
$\alpha = 0.05$ significance level.
Tests for the difference between two proportions: independent large samples
- Let $X \sim \text{Bernoulli}(p_X)$ and let $Y \sim \text{Bernoulli}(p_Y)$, where $p_X$ and $p_Y$ are two population proportions of individuals with a characteristic of interest.
- Suppose we have a random sample of $n_1$ observations from X and an independent random sample of $n_2$ observations from Y, and that both $n_1$ and $n_2$ are large.
- In a two-tail test $H_0: p_X = p_Y\ (= p_0)$ against $H_1: p_X \ne p_Y$:
  - The test statistic is:
    $$Z = \frac{\hat p_X - \hat p_Y}{\sqrt{\hat p_0 (1 - \hat p_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim_{H_0,\ \text{approx.}} N(0, 1),$$
    where
    $$\hat p_0 = \frac{n_1 \hat p_X + n_2 \hat p_Y}{n_1 + n_2}$$
  - The rejection region is (at significance level $\alpha$):
    $$RR_\alpha = \{z : z < -z_{\alpha/2} \text{ or } z > z_{\alpha/2}\}$$
Example: 9.9 (Newbold) In market research, when populations of individuals or households are surveyed by mail questionnaires, it is
important to achieve as high a response rate as possible. One way to improve response might be to include in the questionnaire an initial
inducement question, intended to increase the respondent's interest in completing the questionnaire. Questionnaires containing an
inducement question on the importance of recreation facilities in a city were sent to a sample of 250 households, yielding 101 responses.
Otherwise identical questionnaires, but without the inducement question, were sent to an independent random sample of 250 households,
producing 75 responses. Test the null hypothesis that the two population proportions of responses would be the same against the
alternative that the response rate would be higher when the inducement question is included.
Population 1: X = 1 if a person completes the questionnaire with the inducement question, and 0 otherwise; $X \sim \text{Bernoulli}(p_X)$
Sample: $\hat p_x = \frac{101}{250} = 0.404$
Population 2: Y = 1 if a person completes the questionnaire without the inducement question, and 0 otherwise; $Y \sim \text{Bernoulli}(p_Y)$
Sample: $\hat p_y = \frac{75}{250} = 0.300$
Test statistic:
$$Z = \frac{\hat p_X - \hat p_Y}{\sqrt{\hat p_0 (1 - \hat p_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim_{H_0,\ \text{approx.}} N(0, 1)$$
Observed test statistic:
$n_1 = 250$, $n_2 = 250$, $\hat p_x = 0.404$, $\hat p_y = 0.300$
$$\hat p_0 = \frac{n_1 \hat p_x + n_2 \hat p_y}{n_1 + n_2} = \frac{250(0.404) + 250(0.300)}{250 + 250} = 0.352$$
Since the p-value is very small, the null hypothesis can be rejected
at any significance level greater than 0.0075.
Conclusion: The sample data contained very strong evidence suggesting
that a higher response rate will be achieved when an inducement question
is included than when it is not.
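The pooled proportion, the observed statistic and the upper-tail p-value for Example 9.9 can be checked as follows. This is a sketch using `statistics.NormalDist` from the standard library, with illustrative variable names.

```python
from math import sqrt
from statistics import NormalDist

# Counts from Example 9.9
n1, responses_1 = 250, 101   # with inducement question
n2, responses_2 = 250, 75    # without inducement question

p_x = responses_1 / n1
p_y = responses_2 / n2

# Pooled estimate of the common proportion under H0: p_X = p_Y
p0 = (n1 * p_x + n2 * p_y) / (n1 + n2)

# Observed value of the large-sample Z statistic
z_obs = (p_x - p_y) / sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))

# Upper-tail p-value, since H1 is p_X > p_Y
p_value = 1 - NormalDist().cdf(z_obs)

print(f"p0 = {p0:.3f}, z = {z_obs:.2f}, p-value = {p_value:.4f}")
```

The p-value obtained this way should round to the threshold of 0.0075 mentioned in the example.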
Tests for the difference between two proportions: independent large samples
Since the value 0 does not belong to this interval, we can reject the
null hypothesis of equality of the two population proportions at the
$\alpha = 0.05$ significance level.
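The interval referred to here is not reproduced in the text. As a sketch, a 95% confidence interval for $p_X - p_Y$ can be computed from the counts of Example 9.9. It assumes the usual unpooled standard error for the interval, a common choice that the slides do not confirm.

```python
from math import sqrt
from statistics import NormalDist

# Sample proportions from Example 9.9
n1, n2 = 250, 250
p_x, p_y = 101 / 250, 75 / 250
alpha = 0.05

# Assumption: unpooled standard error of p_x - p_y for the interval
se = sqrt(p_x * (1 - p_x) / n1 + p_y * (1 - p_y) / n2)

# Two-sided critical value z_{alpha/2}
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

diff = p_x - p_y
ci = (diff - z_crit * se, diff + z_crit * se)

# The two-tail test rejects H0: p_X - p_Y = 0 when 0 lies outside the CI
contains_zero = ci[0] <= 0 <= ci[1]

print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f}), contains 0: {contains_zero}")
```

Under this assumption the interval lies entirely above 0, which agrees with rejecting equality of the two proportions at the 5% level.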
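Tests for the ratio of variances: normal samples
For independent standard normal random variables $X_1, \dots, X_n$ and $Y_1, \dots, Y_m$,
$$F = \frac{\frac{1}{n}\sum_{i=1}^{n} X_i^2}{\frac{1}{m}\sum_{i=1}^{m} Y_i^2}$$
follows an $F_{n,m}$ distribution with n and m degrees of freedom. We can
view it as a ratio of two normalized chi-square rvs. This is where the
result from the previous page comes from:
$$\frac{\frac{1}{n_1 - 1}\overbrace{\frac{(n_1 - 1)s_X^2}{\sigma_X^2}}^{\chi^2_{n_1 - 1}}}{\frac{1}{n_2 - 1}\frac{(n_2 - 1)s_Y^2}{\sigma_Y^2}} = \frac{s_X^2/\sigma_X^2}{s_Y^2/\sigma_Y^2} =_{H_0} \frac{s_X^2}{s_Y^2} \sim F_{n_1 - 1, n_2 - 1}$$
[Figure: F density curves for (df1, df2) = (30, 30), (10, 15), (8, 8) and (5, 3)]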
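To make the variance-ratio statistic concrete, the sketch below computes $F = s_X^2 / s_Y^2$ and its degrees of freedom for two samples. The helper name `variance_ratio` and the two small data sets are hypothetical and chosen only for illustration; they are not taken from the slides. Deciding whether to reject still requires a critical value from an F table, which is not computed here.

```python
from statistics import variance


def variance_ratio(x, y):
    """Return (F, df1, df2) with F = s_X^2 / s_Y^2 (hypothetical helper).

    Under H0: sigma_X^2 = sigma_Y^2 and normal populations, F follows
    an F distribution with len(x) - 1 and len(y) - 1 degrees of freedom.
    """
    f_obs = variance(x) / variance(y)   # quasi-variances (divide by n - 1)
    return f_obs, len(x) - 1, len(y) - 1


# Hypothetical samples, for illustration only
x = [1, 2, 3, 4, 5]   # quasi-variance 2.5
y = [2, 4, 6, 8]      # quasi-variance 20/3

f_obs, df1, df2 = variance_ratio(x, y)
print(f"F = {f_obs:.3f} with ({df1}, {df2}) degrees of freedom")
```

The observed ratio would then be compared with the appropriate quantiles of $F_{n_1 - 1, n_2 - 1}$.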
Population 1: $X \sim N(\mu_X, \sigma_X^2)$, SRS: $n_1 = 17$
Population 2: $Y \sim N(\mu_Y, \sigma_Y^2)$, SRS: $n_2 = 11$
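Null hypothesis | Assumptions | Test statistic and its distribution under $H_0$
$\mu_X - \mu_Y = D_0$ | Normal differences; matched pairs | $\frac{\bar D - D_0}{s_D/\sqrt{n}} \sim t_{n-1}$
$\mu_X - \mu_Y = D_0$ | Normal pops.; equal common var. | $\frac{\bar X - \bar Y - D_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim_{H_0} t_{n_1+n_2-2}$
$\mu_X - \mu_Y = D_0$ | Normal pops.; known vars. | $\frac{\bar X - \bar Y - D_0}{\sqrt{\frac{\sigma_X^2}{n_1} + \frac{\sigma_Y^2}{n_2}}} \sim_{H_0} N(0, 1)$
$\mu_X - \mu_Y = D_0$ | Nonnormal pops.; unknown vars.; large samples | $\frac{\bar X - \bar Y - D_0}{\sqrt{\frac{s_X^2}{n_1} + \frac{s_Y^2}{n_2}}} \sim_{H_0,\ \text{approx.}} N(0, 1)$
$p_X - p_Y = 0$ | Bernoulli pops.; large samples | $\frac{\hat p_X - \hat p_Y}{\sqrt{\hat p_0(1 - \hat p_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim_{H_0,\ \text{approx.}} N(0, 1)$
$\sigma_X^2/\sigma_Y^2 = 1$ | Normal pops. | $\frac{s_X^2}{s_Y^2} \sim_{H_0} F_{n_1-1, n_2-1}$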