Atwo Sampling Problem

Two Sample Testing Problems and Solutions
As we have stressed several times in class, often our approach in statistics is comparative;
that is, we want to compare the parameters in two different populations. Here, in particular,
we will be interested in testing for the equality of means of two different populations. When
we do so, the null hypothesis simply states that the means are equal, but does not specify
what they are equal to, so that the null values of the parameters are unknown.
Suppose we have two populations, where the variable of interest is assumed to be normally
distributed. We indicate these distributions as $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. We have
independent samples $X_{11}, X_{12}, \ldots, X_{1n_1}$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$, respectively, from the two
populations; $n_1$ and $n_2$ are the two sample sizes. We want to test for the equality of the two
means. The null hypothesis $H_0: \mu_1 = \mu_2$ is equivalent to $\mu_1 - \mu_2 = 0$, and since $\mu_1 - \mu_2$
is the mean of $\bar{X}_1 - \bar{X}_2$, the latter is a good starting point.
Note that
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right). \tag{1}$$
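As a brief supporting step (standard facts about independent normal samples, not spelled out in these notes), the mean and variance of the difference of sample means can be checked as follows:

$$\begin{aligned}
E(\bar{X}_1 - \bar{X}_2) &= E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2, \\
\mathrm{Var}(\bar{X}_1 - \bar{X}_2) &= \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.
\end{aligned}$$

The variances add because the two samples are independent, and a difference of independent normal variables is again normal, which gives the distribution of $\bar{X}_1 - \bar{X}_2$.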
So the null hypothesis $H_0: \mu_1 = \mu_2$ states that this random variable has mean zero. Under
the null, therefore,
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1). \tag{2}$$
However, the statistic in Equation (2) cannot be used on real-life data, as $\sigma_1^2$ and $\sigma_2^2$ are
unknown. Unlike the one sample case, simply replacing them by the sample variances does
not automatically produce a random variable that has a t distribution by definition
(except in very special cases). In the following we will look at several different methods,
both exact and approximate, to handle the testing problem for the difference of means, under
different situations.
1. Pooled t-test: Suppose it is possible to make the assumption that the two variances
($\sigma_1^2$ and $\sigma_2^2$) are equal (although unknown). Of course this assumption has to be
justified. In practice, such an assumption may be based on previous information, or
made after doing a test of equality of variances.
So now we have independent samples $X_{11}, X_{12}, \ldots, X_{1n_1}$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$,
respectively, from the two populations where the variable of interest has a $N(\mu_1, \sigma_1^2)$ and
$N(\mu_2, \sigma_2^2)$ distribution, respectively. But now we are assuming, in addition, $\sigma_1^2 = \sigma_2^2$.
Let $\sigma^2$ represent this common, unknown value. In this case one can determine an exact
t-test for testing the null $H_0: \mu_1 = \mu_2$.
Let $s_1^2$ and $s_2^2$ be the sample variances for the two samples. Let us note that
$$\frac{(n_1 - 1)s_1^2}{\sigma^2} \sim \chi^2(n_1 - 1) \quad \text{and} \quad \frac{(n_2 - 1)s_2^2}{\sigma^2} \sim \chi^2(n_2 - 1) \tag{3}$$
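under the assumption of equality of variances. These two variables are independent, because
the samples are independent. Therefore, by the reproductive property of chi-squares,
$$\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 2). \tag{4}$$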
So if we define
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$
then
$$\frac{(n_1 + n_2 - 2)s_p^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 2). \tag{5}$$
We will refer to $s_p^2$ as the pooled estimate of the variance. Now, under the null, with
the equality of variances assumption,
$$\frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0, 1).$$
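We will show that replacing $\sigma$ by $s_p$ (the square root of the pooled estimate of the
variance) produces a random variable having a t distribution with $(n_1 + n_2 - 2)$ degrees
of freedom. Now
$$\begin{aligned}
\frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
&= \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \Bigg/ \frac{s_p}{\sigma} \\
&= \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \Bigg/ \sqrt{\frac{(n_1 + n_2 - 2)s_p^2}{\sigma^2} \Big/ (n_1 + n_2 - 2)} \\
&= \frac{[\text{a } N(0, 1) \text{ variable}]}{\sqrt{\dfrac{[\text{a } \chi^2(n_1 + n_2 - 2) \text{ variable}]}{n_1 + n_2 - 2}}}.
\end{aligned}$$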
As the numerator and the denominator are independent (in normal samples the mean
and the variance are independent), we have, from the definition of the t distribution,
$$\frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$$
under the null.
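The distributional claim for the pooled statistic can be checked empirically. The following is a minimal simulation sketch, not part of the original notes: it draws two normal samples with equal means and equal variances, computes the pooled statistic, and counts how often it falls outside ±2.13145, the 0.975 quantile of t(15) quoted in Example 1. The function names, the sample sizes (7 and 10), the seed and the number of replications are all illustrative assumptions.

```python
import math
import random


def pooled_t_stat(x, y):
    """Pooled two-sample t statistic computed from raw data."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    # (n - 1) * s^2 for each sample is its sum of squared deviations.
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    sp = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))


def rejection_rate(n1=7, n2=10, mu=0.0, sigma=1.0,
                   crit=2.13145, reps=20000, seed=1):
    """Fraction of simulated null data sets with |t| > crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n1)]
        y = [rng.gauss(mu, sigma) for _ in range(n2)]
        if abs(pooled_t_stat(x, y)) > crit:
            rejections += 1
    return rejections / reps


rate = rejection_rate()
print(f"empirical rejection rate under H0: {rate:.4f}")
```

If the statistic really follows t(15) under the null, the printed rate should be close to the nominal 5% level, up to simulation error.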
Example 1. The strength of concrete depends, to some extent, on the method used
for drying. Two different drying methods showed the following results for
independently tested specimens (measurements in psi):

Method I        Method II
n1 = 7          n2 = 10
x̄1 = 3250       x̄2 = 3240
s1 = 210        s2 = 190

Do the data provide enough evidence to conclude that the mean strength is different
for the two methods?
Solution: We will perform the test under the assumption of equality of variances. In
this case the pooled variance is
$$s_p^2 = \frac{6 \times 210^2 + 9 \times 190^2}{7 + 10 - 2} = \frac{264600 + 324900}{15} = 39300.$$
Thus $s_p = \sqrt{39300} = 198.2423$. We assume that the strength in the i-th population
is distributed as $N(\mu_i, \sigma_i^2)$, $i = 1, 2$. Our null hypothesis is $H_0: \mu_1 = \mu_2$, to be tested
against $H_1: \mu_1 \neq \mu_2$. As we have assumed that the variances are equal, our test statistic
is
$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{3250 - 3240}{198.2423\sqrt{\dfrac{1}{7} + \dfrac{1}{10}}} = 0.1024.$$
Under the null, the statistic has a t(15) distribution. At the 5% level of significance, we
will reject when the statistic is greater than $t_{0.975}(15) = 2.13145$, or is smaller than
$-2.13145$. (The test is two-sided, as the problem only asks whether the mean strength
is “different”.) Since the observed statistic 0.1024 lies within these limits, we
cannot reject the null hypothesis. The p-value is approximately 0.92, which is far larger
than 0.05. Thus, as far as the sample data are concerned, there is not enough evidence
to claim that the mean strengths for the two drying methods are different.
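The pooled calculation can be reproduced from summary statistics alone. The sketch below is illustrative rather than part of the original notes: `pooled_t` is a hypothetical helper name, the inputs are the figures tabulated in Example 1, and the critical value is the $t_{0.975}(15)$ quantile quoted in the text rather than one computed in code.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t statistic, degrees of freedom, pooled sd) from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    sp = math.sqrt(sp2)
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df, sp


# Summary statistics as tabulated in Example 1 (Method I, then Method II).
t, df, sp = pooled_t(7, 3250, 210, 10, 3240, 190)

T_CRIT = 2.13145  # t_{0.975}(15), the value quoted in the text
reject = abs(t) > T_CRIT  # two-sided test at the 5% level
print(f"sp = {sp:.4f}, t = {t:.4f}, df = {df}, reject H0: {reject}")
```

Working from summary statistics keeps the sketch faithful to the example, where the raw measurements are not given.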
2. Large sample Z-test: For the pooled-t test, the assumption of equality of variances
is essential. If such an assumption is not appropriate, performing an exact test for
the equality of means is difficult. However, if the sample sizes are large (as a rule of
thumb, if both sample sizes are equal to or larger than 30) we can do an approximate
Z test without assuming equality of variances.
Estimators are called “consistent” if they converge to the true value of the parameter
as the sample size increases. (This is a rather vague way of putting it, but you will
be exposed to the more rigorous definitions and applications in later courses,
particularly your probability courses.) As $s_1^2$ and $s_2^2$ are consistent estimators for $\sigma_1^2$ and $\sigma_2^2$
respectively, it is generally expected that
$$\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \quad \text{and} \quad \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
would be close in large samples. For large samples, therefore, we will use
$$\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
as an approximate Z-statistic for testing the equality of means.
Example 2. The Chaplin Social Insight test is a psychological test designed to
measure how accurately the subject appraises other people. The possible scores on the
test range from 0 to 41. During the development of the Chaplin test, it was given to
several different groups of people. Here are the results for male and female college students
majoring in liberal arts.

Group   Sex      n     x̄       s
1       Male     133   25.34   5.05
2       Female   162   24.94   5.44

Do these data support the contention that males and females differ in average social
insight?
Solution: The sample sizes are large. We assume that social insight is distributed
as $N(\mu_i, \sigma_i^2)$ in the i-th population, $i = 1, 2$. (Note that this assumption is not so
important here, as the sample means would approximately follow normal distributions by
the central limit theorem.) Our null hypothesis is $H_0: \mu_1 = \mu_2$, to be tested against
$H_1: \mu_1 \neq \mu_2$. As the sample sizes are large, the test statistic
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{6}$$
has an approximate $N(0, 1)$ distribution under the null. For the given data
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{25.34 - 24.94}{\sqrt{\dfrac{5.05^2}{133} + \dfrac{5.44^2}{162}}} = 0.6537. \tag{7}$$
At the 5% level of significance, we reject when the observed Z is greater than 1.96, or smaller
than $-1.96$. As the observed statistic does not cross these limits, we cannot reject the
null hypothesis. At the 5% level of significance, there is not enough evidence in the given data to
claim that males and females differ in terms of average social insight. The p-value of the test
is 0.5133.
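The large sample calculation in Example 2 can be sketched with the Python standard library. This is an illustrative sketch, not part of the original notes; `two_sample_z` is a hypothetical helper name, and the normal tail probability comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_sample_z(n1, mean1, sd1, n2, mean2, sd2):
    """Approximate Z statistic and two-sided p-value from summary statistics."""
    # Standard error with each sample variance plugged in separately (no pooling).
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Summary statistics from Example 2 (group 1 = male, group 2 = female).
z, p_value = two_sample_z(133, 25.34, 5.05, 162, 24.94, 5.44)
print(f"Z = {z:.4f}, two-sided p-value = {p_value:.4f}")
```

The sketch relies on the same large-sample approximation as the text, so it is only appropriate when both sample sizes are reasonably large.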
3. Welch’s t-test: When testing for the equality of means, we will sometimes come
across a situation where the setup is appropriate for neither a pooled t-test nor a
large sample Z-test. For example, when the sample sizes (or at least one sample size)
are smaller than 30 and the equality of variances appears to be an unrealistic
assumption, we can do neither of the above two tests.
In such situations we perform Welch’s t-test, which has the same statistic
$$\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{8}$$
as the large sample Z-test. But now this is a t-statistic which can be shown to have
an approximate t-distribution with degrees of freedom
$$df = \frac{\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)^2}{\dfrac{\left(\sigma_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(\sigma_2^2 / n_2\right)^2}{n_2 - 1}}. \tag{9}$$
This is a function of $\sigma_1^2$ and $\sigma_2^2$, which have to be replaced by $s_1^2$ and $s_2^2$ in practice. Also,
the expression may not be an integer, so the value of df will have to be rounded
before use.
Example 3. The following table gives the sample size, the mean and the standard
deviation based on two independent samples from two different populations. Perform
a test for the equality of the two means at the 5% level of significance.

Group   n    x̄       s
1       12   19.04   4.61
2       9    23.99   4.32
Solution: We assume that the random variable of interest is normally distributed in
the two populations, with the mean and the variance being $(\mu_i, \sigma_i^2)$, $i = 1, 2$. The
null hypothesis is $H_0: \mu_1 = \mu_2$, to be tested against $H_1: \mu_1 \neq \mu_2$. Suppose that no
information is available which indicates that the equality of variances is a reasonable
assumption. We will perform a Welch’s t-test for these data.
The test statistic is
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{19.04 - 23.99}{\sqrt{\dfrac{4.61^2}{12} + \dfrac{4.32^2}{9}}} = -2.5245.$$
Substituting $s_1^2$ and $s_2^2$ into Equation (9) gives df ≈ 17.97, so the rounded df is 18. The
corresponding p-value is 0.0212. It is smaller than
the 5% level of significance, and we reject the null. There is enough evidence in the data
to suggest that the null hypothesis is false.
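The steps of Example 3 can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions, not the notes' own method of computation: `welch_t` and `t_two_sided_p` are hypothetical helper names, the sample variances are plugged into Equation (9) as the text prescribes, and the t tail probability is obtained by numerically integrating the t density with Simpson's rule because the standard library has no t distribution function.

```python
import math


def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Welch statistic and the plug-in degrees of freedom from Equation (9)."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value, 1 - 2 * integral of the t density over [0, |t|].

    Uses composite Simpson's rule; `steps` must be even.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * total * h / 3


# Summary statistics from Example 3.
t, df = welch_t(12, 19.04, 4.61, 9, 23.99, 4.32)
df_rounded = round(df)
p_value = t_two_sided_p(t, df_rounded)
print(f"t = {t:.4f}, df = {df:.2f} (rounded {df_rounded}), p-value = {p_value:.4f}")
```

Rounding to the nearest integer follows the wording of the notes; statistical software often uses the fractional df directly, which would change the p-value slightly.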
4. Paired t-test: A paired t-test is also, technically, a test involving two samples. But a
paired t-test is fundamentally different from the three tests we have described earlier
in these notes. In a paired t-test, the two samples are dependent unlike the previous
two sample testing cases, all of which require the two samples to be independent.
In a paired t-test we essentially have n pairs of observations, the observations within
each pair being dependent, and if we write the two vectors (the first observations of
each pair and the second observations of each pair) separately, it gives the impression
of there being two different samples as in the three previous cases. In truth, it is one,
paired, sample.
Even then, one may want to test whether the mean of the first observation in the pair is
different from the mean of the second observation in the pair. But one cannot proceed
like a pooled t-test, large sample Z-test or Welch’s t-test, as the samples are dependent.
Instead, here the experimenter takes the difference between the first element of the pair
and the second element of the pair. Suppose that the null hypothesis is that the means
of the two elements of the pair are the same. Then, under the null, the mean of the
differenced vector is zero.
Note that this differencing essentially creates a single sample (of differences) and one
can then proceed to perform a one sample t-test for zero mean with these data.
Example 4. Are any physiological indicators associated with schizophrenia? Early
studies, based largely on postmortem analysis, suggest that the sizes of certain areas
of the brain may be different in persons afflicted with schizophrenia than in others.
Confounding variables in these studies, however, clouded the issue considerably. In a
1990 article, researchers reported the results of a study that controlled for genetic and
socioeconomic differences by examining 15 pairs of monozygotic twins, where one of
the twins was schizophrenic and the other was not. The twins were located through an
intensive search throughout Canada and the United States. (Data from R. L. Suddath
et al., “Anatomical Abnormalities in the Brains of Monozygotic Twins Discordant for
Schizophrenia”, New England Journal of Medicine, 322(12) (1990): 789-93).
The researchers used magnetic resonance imaging to measure the volumes (in cm³) of
several regions and sub-regions inside the twins’ brains. The numbers presented here
are based on the summary statistics from one of the sub-regions, the left hippocampus.
Can the observed difference be attributed to chance?
Pair # Unaffected Affected Difference
1 1.94 1.27 0.67
2 1.44 1.63 −0.19
3 1.56 1.47 0.09
4 1.58 1.39 0.19
5 2.06 1.93 0.13
6 1.66 1.26 0.40
7 1.75 1.71 0.04
8 1.77 1.67 0.10
9 1.78 1.28 0.50
10 1.92 1.85 0.07
11 1.25 1.02 0.23
12 1.93 1.34 0.59
13 2.04 2.02 0.02
14 1.62 1.59 0.03
15 2.08 1.97 0.11
Solution: Here, if one looks at the data, there is one sample of unaffected individuals,
and one sample of affected (schizophrenic) individuals. This may make it look like two
different samples. But the observations have been obtained as pairs (from twins), and
observations within each pair are dependent as they are obtained from monozygotic
twins.
We want to test H0 : µ1 = µ2 against the alternative H1 : µ1 > µ2 , where µ1 is the
mean volume of the left hippocampus for healthy individuals, while µ2 is the same for
schizophrenic individuals. However, since the data are paired, we do not need to make
individual normality assumptions about the two populations. Rather, we will simply
assume that the difference (healthy − schizophrenic) of the volumes between the twins
has a normal distribution with mean $\mu_d$ and variance $\sigma_d^2$. If the null hypothesis is true,
then this mean $\mu_d$ is zero.
If we denote the differences as $d_1, d_2, \ldots, d_n$, the null hypothesis may be rewritten as
$H_0: \mu_d = 0$. The test statistic may be written as
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{\sqrt{n}\,\bar{d}}{s_d}$$
where $\bar{d}$ is the mean of the differences, $s_d$ is the standard deviation of the differences
and $\mu_d = 0$ is mandated by the null hypothesis. In this example the test statistic is
$$t = \frac{\sqrt{n}\,\bar{d}}{s_d} = \frac{\sqrt{15}\,(0.1987)}{0.2383} = 3.229.$$
Under the null, this statistic has a t distribution with 14 degrees of freedom. The test is
one-sided. Comparing with the quantile $t_{0.95}(14) = 1.7613$, we see that our observed value is far larger
than this. So the null hypothesis is rejected. It does appear that there is enough
evidence to indicate that the volume of the left hippocampus of affected (schizophrenic)
individuals is smaller, and the observed difference is unlikely to be due to chance alone. (The
p-value of the test is 0.003).
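The paired calculation can be reproduced from the tabulated data. The sketch below is illustrative and not part of the original notes: the variable names are assumptions, the two lists are the Unaffected and Affected columns of the table in pair order, and the critical value is the $t_{0.95}(14)$ quantile quoted in the text.

```python
import math
from statistics import mean, stdev

# Left hippocampus volumes (cm^3) for the 15 twin pairs, in pair order.
unaffected = [1.94, 1.44, 1.56, 1.58, 2.06, 1.66, 1.75, 1.77,
              1.78, 1.92, 1.25, 1.93, 2.04, 1.62, 2.08]
affected = [1.27, 1.63, 1.47, 1.39, 1.93, 1.26, 1.71, 1.67,
            1.28, 1.85, 1.02, 1.34, 2.02, 1.59, 1.97]

# Differencing turns the paired data into a single sample.
diffs = [u - a for u, a in zip(unaffected, affected)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation (divisor n - 1)

# One-sample t statistic for H0: mu_d = 0.
t = math.sqrt(n) * d_bar / s_d

T_CRIT = 1.7613  # t_{0.95}(14), the value quoted in the text
reject = t > T_CRIT  # one-sided test at the 5% level
print(f"n = {n}, d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, t = {t:.3f}, reject H0: {reject}")
```

Because the test is applied to the differences, only the pairing matters; the two columns are never treated as independent samples.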
