A Two Sampling Problem
As we have stressed several times in class, often our approach in statistics is comparative;
that is, we want to compare the parameters in two different populations. Here, in particular,
we will be interested in testing for the equality of means of two different populations. When
we do so, the null hypothesis simply states that the means are equal, but does not specify
what they are equal to, so that the null values of the parameters are unknown.
Suppose we have two populations, where the variable of interest is assumed to be normally
distributed. We indicate these distributions as $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. We have
independent samples $X_{11}, X_{12}, \ldots, X_{1n_1}$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$, respectively, from the two
populations; $n_1$ and $n_2$ are the two sample sizes. We want to test for the equality of the two
means. The null hypothesis $H_0: \mu_1 = \mu_2$ essentially indicates $\mu_1 - \mu_2 = 0$, and since $\mu_1 - \mu_2$
is the mean of $\bar{X}_1 - \bar{X}_2$, the latter is a good starting point.
Note that
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right). \tag{1}$$
So the null hypothesis $H_0: \mu_1 = \mu_2$ states that this random variable has mean zero. Under
the null, therefore,
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1). \tag{2}$$
However, the statistic in Equation (2) cannot be used on real-life data, as $\sigma_1^2$ and $\sigma_2^2$ are
unknown. Unlike the one-sample case, simply replacing them by the sample variances does
not automatically produce a random variable that has a t distribution by definition
(except in very special cases). In the following we will look at several different methods,
both exact and approximate, to handle the testing problem for difference of means, under
different situations.
1. Pooled t-test: Suppose it is possible to make the assumption that the two variances
($\sigma_1^2$ and $\sigma_2^2$) are equal (although unknown). Of course this assumption has to be
justified. In practice, such an assumption may be made from previous information, or
after doing a test of equality of variances.
So now we have independent samples $X_{11}, X_{12}, \ldots, X_{1n_1}$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$,
respectively, from the two populations, where the variable of interest has a $N(\mu_1, \sigma_1^2)$ and
a $N(\mu_2, \sigma_2^2)$ distribution, respectively. But now we are assuming, in addition, $\sigma_1^2 = \sigma_2^2$.
Let $\sigma^2$ represent this common, unknown value. In this case one can determine an exact
t-test for testing the null $H_0: \mu_1 = \mu_2$.
Let $s_1^2$ and $s_2^2$ be the sample variances for the two samples. Let us note that
$$\frac{(n_1 - 1)s_1^2}{\sigma^2} \sim \chi^2(n_1 - 1) \quad \text{and} \quad \frac{(n_2 - 1)s_2^2}{\sigma^2} \sim \chi^2(n_2 - 1) \tag{3}$$
under the assumption of equality of variances. Therefore, as the two samples are
independent, by the reproductive property of chi-squares,
$$\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 2). \tag{4}$$
So if we define
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$
then
$$\frac{(n_1 + n_2 - 2)s_p^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 2). \tag{5}$$
We will refer to $s_p^2$ as the pooled estimate of the variance. Now, under the null, with
the equality of variances assumption,
$$\frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0, 1).$$
We will show that replacing $\sigma$ by $s_p$ (the square root of the pooled estimate of the
variance) produces a random variable having a t distribution with $(n_1 + n_2 - 2)$ degrees
of freedom. Now
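$$\begin{aligned}
\frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
&= \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \Bigg/ \frac{s_p}{\sigma} \\
&= \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \Bigg/ \sqrt{\frac{(n_1 + n_2 - 2)s_p^2}{\sigma^2} \bigg/ (n_1 + n_2 - 2)} \\
&= \frac{[\text{a } N(0, 1) \text{ variable}]}{\sqrt{\dfrac{[\text{a } \chi^2(n_1 + n_2 - 2) \text{ variable}]}{n_1 + n_2 - 2}}}.
\end{aligned}$$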
As the numerator and the denominator are independent (in normal samples the mean
and the variance are independent), we have, from the definition,
$$\frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2).$$
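As an informal check of this result, the following minimal Python sketch simulates the pooled statistic under the null with equal variances and counts how often it falls outside ±2.13145, the 0.975 quantile of t(15) quoted in Example 1. The sample sizes (7 and 10), the standard normal populations, the number of replications and the seed are illustrative assumptions, and the names `mean_and_variance`, `pooled_t` and `rejection_rate` are hypothetical. If the statistic really is t(15), the printed rate should be close to 0.05.

```python
import math
import random


def mean_and_variance(x):
    """Return the sample mean and the sample variance (divisor n - 1)."""
    n = len(x)
    m = sum(x) / n
    v = sum((xi - m) ** 2 for xi in x) / (n - 1)
    return m, v


def pooled_t(x1, x2):
    """Pooled two-sample t statistic for H0: mu1 = mu2 (equal variances assumed)."""
    n1, n2 = len(x1), len(x2)
    m1, v1 = mean_and_variance(x1)
    m2, v2 = mean_and_variance(x2)
    sp_sq = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    return (m1 - m2) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))


def rejection_rate(n1=7, n2=10, crit=2.13145, reps=20000, seed=1):
    """Fraction of simulated null data sets with |t| > crit.

    All defaults are assumptions chosen for illustration only.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Both samples come from N(0, 1), so H0 and equal variances hold.
        x1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        x2 = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        if abs(pooled_t(x1, x2)) > crit:
            hits += 1
    return hits / reps


rate = rejection_rate()
print(f"two-sided rejection rate under H0: {rate:.4f}")
```

A simulation of this kind does not prove the distributional result; it only shows that the observed rejection rate is in line with the nominal 5% level.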
Example 1. The strength of concrete depends, to some extent, on the method used
for drying. Two different drying methods showed the following results for independently
tested specimens (measurements in psi):

        Method I    Method II
n       7           10
x̄       3250        3240
s       210         190

Do the data provide enough evidence to conclude that the mean strength is different
for the two methods?
Solution: We will perform the test under the assumption of equality of variances. In
this case the pooled variance is
$$s_p^2 = \frac{6 \times 210^2 + 9 \times 190^2}{7 + 10 - 2} = 39300.$$
Thus $s_p = \sqrt{39300} = 198.2423$. We assume that the strength in the $i$-th population
is distributed as $N(\mu_i, \sigma_i^2)$, $i = 1, 2$. Our null hypothesis is $H_0: \mu_1 = \mu_2$, to be tested
against $H_1: \mu_1 \neq \mu_2$. As we have assumed that the variances are equal, our test statistic
is
$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{3250 - 3240}{198.2423\sqrt{\frac{1}{7} + \frac{1}{10}}} = 0.1024.$$
Under the null, the statistic has a t(15) distribution. At the 5% level of significance, we
will reject when the statistic is greater than $t_{0.975}(15) = 2.13145$, or is smaller than
$-2.13145$. (The test is two-sided, as the problem only asks whether the mean strength
is “different”.) Since the observed statistic 0.1024 does not go outside these limits, we
cannot reject the null hypothesis. The p-value is 0.9198, which is much larger than
0.05. Thus, as far as the sample data are concerned, there is not enough evidence
to claim that the mean strengths for the two drying methods are different.
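The calculation in Example 1 can be reproduced from the summary statistics alone. The sketch below is one way to do this in Python; the function name `pooled_t_from_summary` is an assumption made for illustration, and the critical value is the one quoted in the solution.

```python
import math


def pooled_t_from_summary(n1, xbar1, s1, n2, xbar2, s2):
    """Pooled variance, t statistic and degrees of freedom from summary data.

    Assumes independent normal samples with a common (unknown) variance.
    """
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    se = math.sqrt(sp_sq) * math.sqrt(1 / n1 + 1 / n2)   # standard error
    return sp_sq, (xbar1 - xbar2) / se, df


# Summary statistics from the table in Example 1 (measurements in psi).
sp_sq, t_stat, df = pooled_t_from_summary(7, 3250, 210, 10, 3240, 190)

T_CRIT = 2.13145  # t_{0.975}(15), as quoted in the solution
reject = abs(t_stat) > T_CRIT

print(f"pooled variance = {sp_sq:.1f}, t = {t_stat:.4f}, df = {df}")
print("reject H0 at the 5% level" if reject else "cannot reject H0 at the 5% level")
```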
2. Large sample Z-test: For the pooled-t test, the assumption of equality of variances
is essential. If such an assumption is not appropriate, performing an exact test for
the equality of means is difficult. However, if the sample sizes are large (as a rule of
thumb, if both sample sizes are equal to or larger than 30) we can do an approximate
Z test without assuming equality of variances.
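Estimators are called “consistent” if they converge to the true value of the parameter
as the sample size increases. (This is a rather vague way of putting it, but you will
be exposed to the more rigorous definitions and applications in later courses,
particularly your probability courses.) As $s_1^2$ and $s_2^2$ are consistent estimators for $\sigma_1^2$ and $\sigma_2^2$
respectively, it is generally expected that
$$\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \quad \text{and} \quad \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
would be close in large samples. For large samples, therefore, we will use
$$\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
as the test statistic and refer it to the $N(0, 1)$ distribution.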
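Example 2. The following table gives the sample size, the mean and the standard
deviation of a measure of social insight for a sample of males and a sample of females.

Group   Sex      n     x̄       s
1       Male     133   25.34   5.05
2       Female   162   24.94   5.44
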
Do these data support the contention that males and females differ in average social
insight?
Solution: The sample sizes are large. We assume that social insight is distributed
as $N(\mu_i, \sigma_i^2)$ in the $i$-th population, $i = 1, 2$. (Note that this assumption is not so
important here, as the means would approximately follow normal distributions by
the central limit theorem.) Our null hypothesis is $H_0: \mu_1 = \mu_2$, to be tested against
$H_1: \mu_1 \neq \mu_2$. As the sample sizes are large, the test statistic
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \tag{6}$$
has an approximate $N(0, 1)$ distribution under the null.
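A minimal Python sketch of the large sample Z-test applied to the summary table for this example follows. The function name `large_sample_z` and the 5% significance level are assumptions (the level follows the other examples in these notes); the normal quantile and tail probability come from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def large_sample_z(n1, xbar1, s1, n2, xbar2, s2):
    """Large sample Z statistic for H0: mu1 = mu2, variances not assumed equal."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # estimated standard error
    return (xbar1 - xbar2) / se


# Summary statistics from the table (group 1: males, group 2: females).
z = large_sample_z(133, 25.34, 5.05, 162, 24.94, 5.44)

std_normal = NormalDist()               # N(0, 1)
z_crit = std_normal.inv_cdf(0.975)      # two-sided 5% critical value (assumed level)
p_value = 2 * (1 - std_normal.cdf(abs(z)))

print(f"Z = {z:.4f}, critical value = {z_crit:.4f}, p-value = {p_value:.4f}")
print("reject H0" if abs(z) > z_crit else "cannot reject H0")
```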
3. Welch’s t-test: When testing for the equality of means, sometimes we will come
across a situation where the setup is appropriate for neither a pooled t-test nor a
large sample Z-test. For example, when the sample sizes (or at least one sample size)
are smaller than 30 and when the equality of variances appears to be an unrealistic
assumption, we can do neither of the above two tests.
In such situations we perform Welch’s t-test, which has the same statistic
$$\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \tag{8}$$
as the large sample Z-test. But now this is a t-statistic which can be shown to have
an approximate t-distribution with degrees of freedom
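$$df = \frac{\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)^2}{\frac{\left(\sigma_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(\sigma_2^2/n_2\right)^2}{n_2 - 1}}. \tag{9}$$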
This is a function of $\sigma_1^2$ and $\sigma_2^2$, which have to be replaced by $s_1^2$ and $s_2^2$ in practice. Also
the expression may not be an integer, so the value of df will have to be rounded
before use.
Example 3. The following table gives the sample size, the mean and the standard
deviation based on two independent samples from two different populations. Perform
a test for the equality of the two means at 5% level of significance.
Group n x̄ s
1 12 19.04 4.61
2 9 23.99 4.32
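Statistic (8) and the degrees of freedom in (9), with the sample variances substituted for the unknown variances, can be computed for this table as sketched below. The function name `welch_t` is hypothetical, rounding df down is one conservative convention (the notes only say "rounded"), and the critical value 2.110 for the 0.975 quantile of t(17) is taken from a standard t-table rather than from these notes.

```python
import math


def welch_t(n1, xbar1, s1, n2, xbar2, s2):
    """Welch's t statistic and its approximate degrees of freedom."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    t = (xbar1 - xbar2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Summary statistics from the table in Example 3.
t_stat, df = welch_t(12, 19.04, 4.61, 9, 23.99, 4.32)
df_used = math.floor(df)  # assumption: round down to be conservative

T_CRIT_17 = 2.110  # 0.975 quantile of t(17) from a standard table (not from the notes)

print(f"t = {t_stat:.4f}, df = {df:.2f}, df used = {df_used}")
print("reject H0 at the 5% level" if abs(t_stat) > T_CRIT_17
      else "cannot reject H0 at the 5% level")
```

Rounding to the nearest integer instead would give 18 degrees of freedom here, with a slightly smaller critical value, so the decision does not depend on the rounding convention in this example.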
4. Paired t-test: A paired t-test is also, technically, a test involving two samples. But a
paired t-test is fundamentally different from the three tests we have described earlier
in these notes. In a paired t-test, the two samples are dependent, unlike the previous
two-sample testing cases, all of which require the two samples to be independent.
In a paired t-test we essentially have n pairs of observations, the observations within
each pair being dependent, and if we write the two vectors (the first observations of
each pair and the second observations of each pair) separately, it gives the impression
of there being two different samples as in the three previous cases. In truth, it is one,
paired, sample.
Even then, one may want to test whether the mean of the first observation in the pair is
different from the mean of the second observation in the pair. But one cannot proceed
like a pooled t-test, large sample Z-test or Welch’s t-test, as the samples are dependent.
Instead, here the experimenter takes the difference between the first element of the pair
and the second element of the pair. Suppose that the null hypothesis is that the means
of the two elements of the pair are the same. Then, under the null, the mean of the
differenced vector is zero.
Note that this differencing essentially creates a single sample (of differences) and one
can then proceed to perform a one sample t-test for zero mean with these data.
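Example 4. The following table gives the volume of the left hippocampus for 15 pairs
of monozygotic twins, where one twin in each pair is unaffected and the other is
affected by schizophrenia. Do the data indicate that the mean volume is smaller for
the affected twins?

Pair # Unaffected Affected Difference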
1 1.94 1.27 0.67
2 1.44 1.63 −0.19
3 1.56 1.47 0.09
4 1.58 1.39 0.19
5 2.06 1.93 0.13
6 1.66 1.26 0.40
7 1.75 1.71 0.04
8 1.77 1.67 0.10
9 1.78 1.28 0.50
10 1.92 1.85 0.07
11 1.25 1.02 0.23
12 1.93 1.34 0.59
13 2.04 2.02 0.02
14 1.62 1.59 0.03
15 2.08 1.97 0.11
Solution: Here, if one looks at the data, there is one sample of unaffected individuals,
and one sample of affected (schizophrenic) individuals. This may make it look like two
different samples. But the observations have been obtained as pairs (from twins), and
observations within each pair are dependent as they are obtained from monozygotic
twins.
We want to test $H_0: \mu_1 = \mu_2$ against the alternative $H_1: \mu_1 > \mu_2$, where $\mu_1$ is the
mean volume of the left hippocampus for healthy individuals, while $\mu_2$ is the same for
schizophrenic individuals. However, since the data are paired, we do not need to make
individual normality assumptions about the two populations. Rather, we will simply
assume that the difference (healthy − schizophrenic) of the volumes between the twins
has a normal distribution with mean $\mu_d$ and variance $\sigma_d^2$. If the null hypothesis is true,
then this mean $\mu_d$ is zero.
If we denote the differences as $d_1, d_2, \ldots, d_n$, the null hypothesis may be rewritten as
$H_0: \mu_d = 0$. The test statistic may be written as
$$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} = \frac{\sqrt{n}\,\bar{d}}{s_d}$$
where $\bar{d}$ is the mean of the differences, $s_d$ is the standard deviation of the differences
and $\mu_d = 0$ is mandated by the null hypothesis. In this example the test statistic is
$$t = \frac{\sqrt{n}\,\bar{d}}{s_d} = \frac{\sqrt{15}\,(0.1987)}{0.2383} = 3.229.$$
Under the null, this statistic has a t distribution with 14 degrees of freedom. The test
is one-sided. Comparing with the quantile $t_{0.95}(14) = 1.7613$, we see that our observed
value is far larger than this. So the null hypothesis is rejected. It does appear that there
is enough evidence to indicate that the volume of the left hippocampus of affected
(schizophrenic) individuals is smaller, and the observed difference cannot be attributed
to chance. (The p-value of the test is 0.003.)
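The paired calculation can be checked directly from the table. The sketch below recomputes the differences, the statistic and the one-sided p-value. The helper names `t_pdf` and `t_upper_tail` are hypothetical, and since the Python standard library has no t distribution, the tail probability is approximated by Simpson's rule applied to the t density (an implementation choice for this sketch, not something taken from the notes).

```python
import math
import statistics

# Left hippocampus volumes from the table, one entry per twin pair.
unaffected = [1.94, 1.44, 1.56, 1.58, 2.06, 1.66, 1.75, 1.77,
              1.78, 1.92, 1.25, 1.93, 2.04, 1.62, 2.08]
affected = [1.27, 1.63, 1.47, 1.39, 1.93, 1.26, 1.71, 1.67,
            1.28, 1.85, 1.02, 1.34, 2.02, 1.59, 1.97]


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) as 0.5 minus the Simpson's-rule area over [0, t].

    steps must be even; intended for t >= 0.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Differences (unaffected - affected) turn the paired data into one sample.
d = [u - a for u, a in zip(unaffected, affected)]
n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)               # sample standard deviation (divisor n - 1)
t_stat = math.sqrt(n) * d_bar / s_d     # one sample t statistic for H0: mu_d = 0
p_value = t_upper_tail(t_stat, n - 1)   # one-sided, H1: mu_d > 0

print(f"mean diff = {d_bar:.4f}, sd = {s_d:.4f}, t = {t_stat:.3f}, p = {p_value:.4f}")
```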