Lecture 8: Student's T-Tests: MATH E-156: Mathematical Statistics March 25, 2019

Lecture 8: Student’s t-Tests
MATH E-156: Mathematical Statistics

March 25, 2019
1 Introduction
So far in this course, all of our work on hypothesis testing and interval es-
timation has suffered from a deep flaw. Whenever we’ve worked with the
normal distribution, we’ve had to assume that the population variance is
known to the investigator. Of course this is wildly implausible – how could
it be that the researcher knows the population variance with complete confi-
dence, but doesn’t know the population mean? We did this in order to make
the mathematics simple, so that we could concentrate on the arguments that
justified the fundamental procedures of hypothesis testing. But now we need
to drop this assumption, and learn how to handle the real-world case where
the population variance is not known to the experimenter.
2 Student’s t Distribution
In much of our work, we’ve used an expression where we take the difference
between the sample mean and the population mean, and then divide this by
the standard deviation of the sampling distribution of the sample mean:
X −µ
q
Var[X]
We also know that the variance of the sampling distribution of the sample
mean is:
σ2
Var[ X ] =
n
1
This result holds for all distributions (at least those for which the variance
σ 2 is defined), and does not require any assumption of normality. If we
substitute this value into the first equation, we obtain the expression:
X −µ
Z = p
σ 2 /n
If X is normally distributed, then we can write this random variable Z as a

linear transformation of X:
X −µ
Z = p
σ 2 /n
1 µ
= p ·X − p
2
σ /n σ 2 /n
Any such linear transformation of a normal random variable will itself be

normally distributed. The expected value of Z is:
" #
X −µ
E[Z] = E p
σ 2 /n
E[ X ] − µ
= p
σ 2 /n
µ−µ
= p
σ 2 /n
= 0
2
Similarly, the variance of Z is:
" #
X −µ
Var[ Z ] = Var p
σ 2 /n
" #
1 µ
= Var p ·X − p
σ 2 /n σ 2 /n
" #
1
= Var p ·X
σ 2 /n
!2
1
= p · Var X
σ 2 /n
1
= · σ 2 /n
σ 2 /n
= 1
So Z is a normal random variable with expected value µ = 0 and variance

σ 2 = 1. But this just means that Z is a standard normal random variable,
by definition.
Notice the σ 2 in the denominator – that’s the population variance, so

we can only use this formula when the population variance is known. What
happens if we don’t know the population variance? In that case, we’ll have
to estimate it, and an obvious candidate for the estimator for the population
variance is the sample variance S 2 :
Pn
2 (Xi − X)2
S = i=1
n−1
The sample variance S 2 has some very nice properties: in particular, we
showed that, because we are dividing through by n − 1, it is an unbiased
estimator for the population variance. We also mentioned that, for a normally
distributed distribution, this estimator is efficient, which means that it has
the lowest mean squared error possible. So an attractive approach to dealing
with an unknown population variance is to use the sample variance S 2 to
3
estimate this quantity, and then to use S 2 in place of s2 in our expression:
X −µ X −µ
p −→ p
σ 2 /n S 2 /n
Other than this one substitution, everything else stays the same.
While this substitution is unquestionably a natural choice for handling

the problem of an unknown population variance, it creates some new dif-
ficulties for us. Before, when the population variance was known, the only
random variable in our expression was X, and the expression was just a linear
transformation of this random variable. We have nice theorems that enable
us to work with the expectation and variance of a linear transformation of a
random variable, so everything was straightforward. Once we replace σ 2 by
the estimator S 2 , this is no longer the case, and in fact we have a ratio of
two random variables. For this situation, we don’t have nice theorems about
the expectation and variance, so our old strategy doesn’t work anymore.
2.1 The Independence of X and S 2

An extremely important result for our derivation of the t distribution is that,
if X is normally distributed, then the sample mean random variable X and
the sample variance random variable S 2 are independent. In fact, the normal
distribution is the only distribution that has this property. Unfortunately,
it is difficult to prove this result, and it requires tools from linear algebra
that are not in the formal prerequisites for this course, so I’m not going to
present a formal derivation. Instead, we’ll use an alternative approach to
understanding this result, and hope to develop some intuition about what it
means.
To investigate this claim, we use the method of computer simulation. The

idea here is to use the computer to simulate a number of random samples from
a distribution, calculate the observed sample mean and sample variance, and
then plot these values on a 2-dimensional scatterplot. We can then visually
inspect the resulting image and look for indications of independence or non-
independence. We’ll start with distributions for which the sample mean and
sample variance are not independent, and then finally consider the normal
distribution and see how it’s different.
4
We’ll start with the exponential distribution. In this simulation, we draw
1,000 samples of size n = 10 from an exponential distribution with scale
parameter λ = 1, so that the density function is:
fX (x) = e−x , x > 0
For each sample we calculate the sample mean and the sample variance, and
then draw a point such that the x coordinate is the value of the sample mean
and the y coordinate is the value of the sample variance. The resulting graph
looks like this: Here you can clearly see that the distribution of the sample
Figure 1: Sample mean vs. sample variance for exponential distribution
variance depends on the value of the sample mean:

• For samples that have a small value for the observed sample mean, the
sample variance is also very small, but for samples that have a larger
value for the observed sample mean the sample variance tends to also
be larger.
• For samples that have a smaller value for the observed sample mean
the sample variance has small variability, but for samples with a larger
5
value for the observed sample mean the sample variance has greater
variability.
Clearly, for the exponential distribution, the sample mean random variable
X and the sample variance random variable S 2 are not independent.
Now let’s consider the 2-parameter Pareto distribution. Again, we use

the same approach as before: we draw a random sample of size n = 10,
calculate the sample mean and sample variance, and plot a point with these
values as the x and y coordinates, respectively. The only difference is that
now we are sampling from a 2-parameter Pareto distribution instead of an
exponential distribution. For this simulation, I chose to use a 2-parameter
Pareto distribution with shape parameter α = 3 and scale parameter θ = 10,
so that the density function is:
3 · θ3
fX (x) =
(x + 10)4
Here the graph is very similar to the graph for the exponential distribution:
Here you can clearly see that the distribution of the sample variance depends
on the value of the sample mean:
• For samples that have a small value for the observed sample mean, the
sample variance is also very small, but for samples that have a larger
value for the observed sample mean the sample variance tends to also
be larger.
• For samples that have a smaller value for the observed sample mean
the sample variance has small variability, but for samples with a larger
value for the observed sample mean the sample variance has greater
variability.
Just as with the exponential distribution, the sample mean random variable
X and the sample variance random variable S 2 are not independent for this
2-parameter Pareto distribution.
Now we’ll consider the uniform distribution. With this distribution it is

a little harder to see the failure of independence, mainly because there are
very few sample means that were observed at the endpoints:
6
Figure 2: Sample mean vs. sample variance for 2-parameter Pareto distribu-
tion
After all this, let’s finally take a look at the normal distribution. This
simulation is based on drawing 1,000 simulated samples of size n = 10 from
a normal distribution:
It’s important to emphasize that running simulations and looking at the

result graphic images is not a rigorous proof of the result.
3 The Sampling Distribution for the Sample

Variance
Recall the sample variance, defined as:
Pn 2
2 i=1 (Xi − X)
S =
n−1
7
Figure 3: Sample mean vs. sample variance for uniform distribution
In lecture 4, on point estimation, we saw that the expectation of the sample

variance S 2 is the population variance σ 2 :
E[S 2 ] = σ 2
In other words, S 2 is an unbiased estimator for the population variance. This
is an important result, and it’s true for any distribution, not just a normal
distribution. However, it only tells us about the expectation of S 2 , and we
can’t say anything in general about the sampling distribution of this statistic.
Now we’re going to derive the complete specification for the sample variance
when sampling from a normal distribution.
To derive this result, we’ll start out with a little algebra. First, note that:
Xn Xn Xn
(Xi − X) = Xi − X
i=1 i=1 i=1
= n·X −n·X
= 0
8
Figure 4: Sample mean vs. sample variance for normal distribution
Now we’re going to use a sneaky algebraic trick. We’ll start with an sum
over squared terms, and inside the squared terms we’ll add and subtract the
sample mean. Since we’re both adding and subtracting, this won’t affect the
sum, and then when we expand the expression we’ll use the little result we
9
just proved to get rid of one of the terms:
n
X n
X
2
(Xi − µ) = ((Xi − X) + (X − µ))2
i=1 i=1
n
X n
X
= (Xi − X)2 + 2 · (Xi − X) · (X − µ)
i=1 i=1
n
X
+ (X − µ)2
i=1
n
X n
X
2
= (Xi − X) + 2 · (X − µ) · (Xi − X)
i=1 i=1
+ n · (X − µ)2
n
X
= (Xi − X)2 + 2 · (X − µ) · 0
i=1
+ n · (X − µ)2
n
X
= (Xi − X)2 + n · (X − µ)2
i=1
So, we have obtained:

n
X n
X
2
(Xi − µ) = (Xi − X)2 + n · (X − µ)2
i=1 i=1
Let’s divide everything by σ 2 :

n 2 n 2 2
X Xi − µ X Xi − X X −µ
= + n·
i=1
σ i=1
σ σ
We can re-write the first term on the right-hand side:

n 2
X Xi − X (n − 1) · S 2
=
i=1
σ σ2
10
We can re-write the second term on the right-hand side:
2 2
X −µ X −µ
n· = √
σ σ/ n
Thus, we have:
n 2 2
(n − 1) · S 2

X Xi − µ X −µ
= + √
i=1
σ σ2 σ/ n
To make things simpler, let’s re-label these expressions:

n 2
X Xi − µ
U =
i=1
σ
(n − 1) · S 2
V =
σ2
2
X −µ
W = √
σ/ n
Thus, our equation is:
n 2 2
(n − 1) · S 2

X Xi − µ X −µ
= 2
+ √
σ σ σ/ n
|i=1
| {z } | {z }
{z } V W
U
Now what can we say about the distribution of these random variables
U , V , and W ? For U , note that each Xi is normally distributed with ex-
pected value µ and variance σ 2 , thus the expression inside the paretheses is
a standard normal, and since U consists of the sum of the squares of these
independent standard normal random variables, we can conclude that U is a
chi-squared random variance with n degrees of freedom:
U ∼ χ2 (n)
For W , the expression inside the parentheses is a standard normal random
variable, and since W is the square of this standard normal random variable,
it has a chi-squared distribution with 1 degree of freedom:
W ∼ χ2 (1)
11
We don’t know yet anything about the distribution of V , but we can say one
thing: since V is a function of the sample variance S 2 , and W is a function of
the sample mean X, and we know that for a normal population the sample
variance and the sample mean are independent, then it must be the case that
V and W are independent.
At this point, we’ve shown that U is chi-squared with n degrees of free-

dom, W is chi-squared with 1 degree of freedom, and V and W are indepen-
dent. Thus, U is the sum of two independent random variables, hence the
moment-generating function for U is the product of the moment-generating
functions for V and W :
MU (t) = MV (t) · MW (t)
The moment-generating function for U is:

1
MU (t) =
(1 − 2t)n/2
The moment-generating function for W is:

1
MW (t) =
(1 − 2t)1/2
Thus, the moment-generating function equation is:

1 1
n/2
= MV (t) ·
(1 − 2t) (1 − 2t)1/2
Thus, it follows that

1
MV (t) =
(1 − 2t)(n−1)/2
But this is the moment-generating function for a chi-squared random variable
with n − 1 degrees of freedom. Thus, we have shown that V is a chi-squared
random variable with n − 1 degrees of freedom, or:
(n − 1) · S 2
∼ χ2 (n − 1)
σ2
12
4 The t Distribution
Suppose we have random sample S = {X1 , X2 , . . . , Xn } from a normally
distributed population with mean µ and variance σ 2 . In this section we want
to determine the probability density function for the T statistic:
X −µ
T = r
S2
n
There are really two steps in this process:
• First, we want to define the distribution of the T statistic in a more
convenient form.
• Second, we want to use this new definition to calculate fT (t), the den-
sity function for the random variable T .
In practice, when we want to perform calculations with this probability distri-
bution, we invariably will use some form of statistical or numerical software,
so you can actually get away from learning all the details of this section. But
I encourage you to try to grasp at least the overall strategy, as the derivation
uses many of the tools that we’ve developed so far.
4.1 Defining the t distribution

Suppose we have random sample S = {X1 , X2 , . . . , Xn } from a normally
distributed population with mean µ and variance σ 2 . Our main goal in this
lecture is to determine the probability density function for the T statistic:
X −µ
T = r
S2
n
√
Our first step will be to divide the numerator and denominator by σ/ n,
the (unknown) standard error of the sample mean:
X −µ
√
X −µ σ/ n
r = r
S2 S2
n σ2
13
Next, in the denominator we’ll rewrite the quotient S 2 /σ 2 by multiplying
and dividing by (n − 1):
r s
S2 (n − 1) · S 2
= (n − 1)
σ2 σ2
So when we’re all done with all of these manipulations, we have:
X −µ
√
X −µ σ/ n
T = p = s
S 2 /n

(n − 1) · S 2
(n − 1)
σ2
At first, this looks utterly pointless, taking a relatively simple expression and
making it vastly more complicated. But in fact, we have done the contrary,
and this is an extraordinary result. First, notice that the numerator is just
a standard normal random variable, which we’ll denote as Z:
X −µ
Z = √
σ/ n
Notice that Z is really just a function of the sample mean random variable
X. Next, for the denominator, let’s define the random variable W :
(n − 1) · S 2
W =
σ2
We know from the previous section that W is a chi-squared random variable
with n − 1 degrees of freedom. Notice that W is just a function of the sample
variance random variable S 2 . Thus, we can now write our T statistic as:
Z
T = r
W
n−1
The numerator is a function of the sample mean random variable X, and
the denominator is a function of the sample variance random variable S 2 .
For a normally distributed population, we’ve seen that these two random
variables are independent. Thus, the numerator of T and the denominator
of T are independent as well. So we can now formally define the probability
distribution for the T statistic:
14
Let Z be a standard normal random variable, let W be a chi-
squared random variable with ν degrees of freedom, and let Z
and W be independent. Then a random variable T has the t
distribution if:
Z
T = p
W/ν
4.2 Calculating the density function for the t distribu-

tion
The density for the t distribution is:

ν+1
Γ −(ν+1)/2
t2

2
fT (t) = √ ν · 1 +
πν · Γ ν
2
This might look a little complicated, but in fact you never have to evaluate
this, and you’ll always use numerical software to do any calculations with
this.
5 Inference for One Normal Mean

That was a lot of math, and in the end we had one seemingly simple result:
if we draw a random sample of size n, denoted S = {X1 , X2 , . . . , Xn }, from a
normal distribution with expected value E[X] = µ and variance Var[X] = σ 2 ,
then the statistic
X −µ
T =p
S 2 /n
has a t distribution with n − 1 degrees of freedom.
In the previous lecture, we developed three methods for testing a null

hypothesis H0 :
• In the first method, we could calculate a rejection region, which was

the region where an observed test statistic would lead us to reject the
null hypothesis.
15
• In the second method, we could calculate a p-value, and if the p-value
is less that the pre-specified significance level, we would reject the null
hypothesis.
• Finally, in the third method, we could construct a confidence interval

for the parameter of interest, and if the confidence interval did not
contain the parameter value of the null hypothesis we would reject the
null hypothesis.
We also argued that all of these approaches are equivalent, in that for every
sample we will reject the null hypothesis using one of the methods if and
only if we would reject the null hypothesis using all of the methods.
In the hypothesis testing lecture, I emphasized that we assumed a known

variance because it made the math a little easier, and it enabled us to concen-
trate on the logic of the hypothesis testing. We’ve now done pretty much all
of the hard mathematical work to relax the assumption of a known variance,
and you should be able to see that we’re basically using the same approach
as before, except with a few alterations to handle the fact that we have to
estimate the variance. Let’s return to these three methods, and see how to
make the necessary adjustments.
5.1 Method 1: Rejection Region

In the first method, we calculate a rejection region, which is the set of values
of the test statistic that will lead us to reject the null hypothesis; thus if
we observe a value of the test statistic in the rejection region, we will reject
the null hypothesis, while if the observed test is not in the rejection regions
we will fail to reject the null hypothesis. Let’s suppose that the null and
alternative hypotheses are:
H0 : µ = µ0
HA : µ 6= µ0
Here, µ0 is a constant, and in any particular problem this will be replaced

by a specific numerical value. Notice here that we are doing a two-sided
test, so that we will be willing to reject the null hypothesis either because
the true population is larger than µ0 , or smaller than µ0 . Remember that n
denotes the sample size. To construct the rejection region for this test, we
16
assume that the null hypothesis is true and then under this assumption find
a probability statement of this form:

Pr L ≤ X ≤ U = 1 − α
As before, L and U are specific values that will insure that the probability
statement is correct. Unfortunately, we don’t have any results about X that
can directly give us this the values for L and R. However, we do know
something about the test statistic T . Let’s set up a probability interval for
this statistic: !
X −µ
Pr V ≤ p ≤W =1−α
S 2 /n
We know from our previous work that the test statistic in this probability
statement follows a t distribution with n − 1 degrees of freedom. Let QT (n −
1, p) denote the pth quantile of a t distribution with n − 1 degrees of freedom,
so that:
Pr(T ≤ QT (p, n − 1)) = p
Let’s choose for V and W the values:
V = QT (α/2, n − 1)
W = QT (1 − α/2, n − 1)
Notice that V will, by definition, cut of an area of α/2 in the lower tail, while
W will cut off a value of α/2 in the upper tail. Then we have:
!
X −µ
Pr tq (α/2, n − 1) ≤ p ≤ QT (1 − α/2, n − 1) = 1 − α
S 2 /n
Now we do some algebra, and we obtain:
r r !
S 2 S2
Pr µ + QT (α/2, n − 1) · ≤ X ≤ µ + QT (1 − α/2, n − 1) · = 1−α
n n
At this point, we have solved the problem of determining the values of L and
U:
r
S2
L = µ + QT (α/2, n − 1) ·
n
r
S2
U = µ + QT (1 − α/2, n − 1) ·
n
17
Let’s see an example of how this works. Let’s suppose that we draw
a sample of size 8 and observe a sample mean of x = 47.2 and a sample
variance of s2 = 27. As usual, we will perform our test at a significance level
of α = 0.05. Since the sample size is n = 8, then the degrees of freedom
will be df = 8 − 1 = 7. We want to perform a two-sided test for the null
hypothesis that µ = 53. Thus, our null and alternative hypotheses are:
H0 : µ = 53
HA : µ 6= 53
The quantile values are:
QT (0.025, 7) = −2.36462
QT (0.975, 7) = 2.36462
Now we have everything we need to calculate the rejection region for this
test:
r
s2
L = µ0 + QT (0.025, 7) ·
n
r
27
= 53 + (−2.36462) ·
8
= 48.65591
r
s2
U = µ0 + QT (0.975, 7) ·
n
r
27
= 53 + (+2.36462) ·
8
= 57.34409
Thus the rejection region is (48.65591, 57.34409). Since the observed test
statistic is x = 47.2, this is outside the rejection region, and we reject the
null hypothesis and consider this to be strong evidence against the null.
18
5.2 Method 2: The p-Value Method
With the p-value approach, we want to calculate the probability of obtaining
a test statistic that is as extreme or more extreme than what was actually
observed, given the assumption that the null hypothesis is true. For a one-
sample test, we can use the T statistic:
X −µ
T =p
S 2 /n
We know that, if the null hypothesis is true, this test statistic will follow a
t distribution with n − 1 degrees of freedom. Thus, for the p-value method,
we calculate the observed value of this test statistic, denoted t, and then
calculate the area in the tails cut off by t and −t (for a two-sided test). If
this area is less than the pre-specified significance level, then we should reject
the null hypothesis.
Let’s use the values from our previous example. We have:

x − µ0
t = q
s2
n
47.2 − 53
= q
27
8
= −3.15712
Now we calculate the area under the tails cut off by T = −3.15712 and
T = +3.15712 for a t distribution with df = n − 1 = 7 degrees of freedom:
p = P r(T ≤ −3.15712) + Pr(T ≥ 3.15712)
= 0.00800 + 0.00800
= 0.01599
Thus, since the p-value is less than the pre-specified significance level of
α = 0.05, we again consider this data to be strong evidence against the null
hypothesis, and we reject the null.
19
5.3 Method 3: Confidence Intervals
Finally, we can use confidence intervals to perform a null hypothesis. We
start with the probability interval for the T statistic:
!
X −µ
Pr QT (α/2, n − 1) ≤ p ≤ QT (1 − α/2, n − 1) = 1 − α
S 2 /n
With a little bit of algebra, we can re-arrange this to:
Pr (L∗ ≤ µ ≤ U ∗ ) = 1 − α
Here the values of L and U are:

r
S2
L∗ = X + QT (α/2, n − 1) ·
n
r
S2
U ∗ = X + QT (1 − α/2, n − 1) ·
n
We then check to see if the value of µ under the null hypothesis is contained
in the interval (L∗ , U ∗ ); if it is not, then we reject the null hypothesis.
Let’s return to our example. The relevant values are:
x = 47.2
s2 = 27
n = 8
α = 0.05
QT (α/2, n − 1) = −2.36462
QT (1 − α/2, n − 1) = +2.36462
Then the lower limit of the confidence interval is:

r
∗ s2
L = x + QT (α/2, n − 1) ·
n
r
27
= 47.2 − 2.36462 ·
8
= 42.85591
20
The upper limit of the confidence interval is:
r
s2
U ∗ = x + QT (1 − α/2, n − 1) ·
n
r
27
= 47.2 + 2.36462 ·
8
= 51.54409
Thus, the 95% confidence interval is (42.9, 51.5), and since this does not
contain the null value µ0 = 53, we once again view this data as providing
strong evidence against the null hypothesis and we reject the null.
6 Inference for Two Normal Means

6.1 The Sampling Distribution of the Two-Sample Dif-
ference of Sample Means D = X + Y
Now let’s consider the problem of comparing two independent populations.
First, we have a population with expected value E[X] = µX and variance
Var[X] = σ 2 , and we draw a random sample of size nX : SX = {X1 , X2 , . . . , XnX }.
Next, we have a population with expected value E[Y ] = µY and variance
Var[Y ] = σ 2 , and we draw a random sample of size nY : SY = {Y1 , Y2 , . . . , YnY }.
Since the populations are independent, it follows that X and Y are indepen-
dent. Notice that the two populations can potentially have different means,
but they must have the same common variance σ 2 . Both population means
µX and µY as well as the common variance σ 2 are unknown parameters.
One of the most natural questions to ask about such a model is whether
the population means µX and µY are equal. To investigate this, consider the
test statistic D, the difference of the two sample means:
D =X −Y
I’m going to call this statistic D the “two-sample difference of sample means”,
and you should be aware that no one else does this. Then the expected value
21
of D is:
E[D] = E[X − Y ]
= E[X − Y ]
= E[X] − E[Y ]
= µX − µY
The variance of D is:
Var[D] = Var[X − Y ]
= Var[X] − Var[Y ]
σ2 σ2
= +
nX nY

2 1 1
= σ · +
nX nY
Now consider this random variable:

(X − Y ) − E[X − Y ]
Z = q
Var[X − Y ]
(X − Y ) − (µX − µY )
= s
2
1 1
σ · +
nX nY
Since the populations for X and Y are normally distributed, then X and
Y are normally distributed, and thus the random variable Z is a standard
normal random variable.
How can we estimate the variance of D? In principle we could just use

2
SX ,the sample variance from the X population alone, because this will be
22
an unbiased estimator of the common variance σ 2 . However, if we did this,
we would be ignoring the data from Y , and that doesn’t seem like such a
great idea. Likewise, we could just use SY , the sample variance for the Y
population alone, and as before this will be an unbiased estimator for the
common variance σ 2 . But this is again an inefficient procedure, because we
are not incorporating the information from the X population. Instead, the
2
best thing to do is to take some sort of weighted average of SX and SY2 ,
where the weights add up to 1, because this will be an unbiased estimator
that incorporates the data from both X and Y :

a 2 b 2 a 2 b
· E SY2

E · SX + · SY = · E SX +
a+b a+b a+b a+b
It turns out that it is very nice to select a and b to be the degrees of freedom
2
of SX and SY2 , respectively:
a = nX − 1
b = nY − 1
a + b = (nX − 1) + (nY − 1)
= nX + nY − 2
When we use these weights for a and b, the resulting estimator is called the
pooled estimator of variance, and is denoted Sp2 :
nX − 1 nY − 1
Sp2 = 2
· SX + · S2
nX + nY − 2 nX + nY − 2 Y
What is the distribution of this pooled estimator of the variance Sp2 ? Let’s
clear fractions by multiplying by n + m − 2 and divide by σ 2 , so that we have:
(nX + nY − 2) · Sp2 2
(nX − 1) · SX (nY − 1) · SY2
= +
σ2 σ2 σ2
Let’s look at the two terms on the right-hand side of this equation. By our
basic result on the sampling distribution of the sample variance for a normal
23
population, the first term is a chi-squared random variable with nX − 1
degrees of freedom:
2
(nX − 1) · SX
∼ χ2 (nX − 1)
σ2
Similarly, the second term on the right-hand side is a chi-squared random
variable with nY − 1 degrees of freedom:
(nY − 1) · SY2
∼ χ2 (nY − 1)
σ2
Since the populations X and Y are independent, these two random variables
will be independent, and therefore their sum will be a chi-squared random
variable with (nX − 1) + (xY − 1) = nX + nY − 2 degrees of freedom:
2
(nX − 1) · SX (nY − 1) · SY2
+ ∼ χ2 (nX + nY − 2)
σ2 σ2
But this sum of random variables can also be expressed in terms of the pooled
variance estimator Sp2 , so we have:
(nX + nY − 2) · Sp2
∼ χ2 (nX + nY − 2)
σ2
Now it’s time for the grand finale! Let’s review what we’ve done so far.
First, we showed that the random variable Z was a standard normal random
variable:
(X − Y ) − (µX − µY )
Z = s ∼ N (0, 1)
1 1
σ2 · +
nX nY
Next, we found a random variable involving the pooled sampled variance
that followed a chi-squared distribution with nX + nY − 2 degrees of freedom:
(nX + nY − 2) · Sp2
∼ χ2 (nX + nY − 2)
σ2
Note that these two random variables are independent, because Z depends
only on the sample means X and Y , and the pooled sample variance depends
24
2
only on the sample variances SX and SY2 , and we know that for normally dis-
tributed populations the sample mean and sample variance are independent.
Now recall that the t distribution is defined as the ratio of two independent
random variables, with the numerator a standard normal random variable
Z and the denominator the square root of a chi-squared random variable X
divided by its degrees of freedom:
Z
T = p
X/n
Substituting, we have:
 
 
 (X − Y ) − (µX − µY ) 
 s 
 
 1 1 
σ2 · +
nX nY
T = v
u (nX + nY − 2) · Sp2
u
u
t σ2
nX + nY − 2
This looks scary, but some simplification is possible: in the denominator, we
can cancel the term nX + nY − 2, and we can cancel a σ 2 from both the
numerator and denominator. We end up with:
(X − Y ) − (µX − µY )
T = s
2
1 1
Sp · +
nX nY
Let’s finish this subsection by reviewing how we calculate this statistic:

• First, we draw a random sample of size nX from the X population, and
we draw a random sample of size nY from the Y population.
2
• We compute X, Y , SX , and SY2 .
• We calculate the pooled variance Sp2 :

nX − 1 nY − 1
Sp2 = 2
· SX + · SY2
nX + nY − 2 nX + nY − 2
25
• Finally, we calculate the T statistic:
(X − Y ) − (µX − µY )
T = s
2
1 1
Sp · +
nX nY
This test statistic will have a t distribution with nX + nY − 2 degrees
of freedom.
6.2 Hypothesis Testing: The Rejection Region Ap-

proach
Now that we have the sampling distribution for D, the two-sample difference
of sample means, we can start to make inferences from data. There are many
tests that we could perform with this model, but by far the most common is
to test if the two means µX and µY are equal. Unless we have strong prior
information about these populations, we will perform a two-sided hypothesis
test, and thus the null and alternative hypotheses are:
H0 : µX = µY
HA : µX 6= µY
Note that, when the null hypothesis is true, the two population means are
equal, so that µX − µY = 0. Thus, under the null, the sampling distribution
of the two-sample difference of sample means is:
(X − Y ) − (µX − µY )
T = s
2
1 1
Sp · +
nX nY
(X − Y ) − (0)
= =s
2
1 1
Sp · +
nX nY
X −Y
= =s
1 1
Sp2 · +
nX nY
26
Now we can construct a rejection region for a pre-specified significance
level α. We start with the probability interval statement:
 
 
 X −Y 
QT (α/2, nX + nY − 2) ≤ s
Pr  ≤ Q (1 − α/2, n + n − 2)
T X Y
 = 1−α

 2
1 1 
Sp · +
nX nY
If we multiply through by the denominator of the central term, we end up
with a probability statement of the form:
Pr(L ≤ X − Y ≤ U ) = 1 − α
In this statement, we have:
s
1 1
L = Sp2 · + · QT (α/2, nX + nY − 2)
nX nY
s
1 1
U = Sp2 · + · QT (1 − α/2, nX + nY − 2)
nX nY
Where do we get these t quantiles from? In R, we use the function qt, so for
example to obtain the q = 2.5% quantile for a t distribution with 17 degrees
of freedom, we would use the function:
qt(0.025, 17)
In Excel, we would use the formula:
= T.INV(0.025, 17)
Using either platform, you will end up with the value t0.025,17 = −2.160369.
Let’s see an example. Suppose we conduct an experiment, and we observe
these values:
nX = 9
nY = 6
x = 45
y = 37
2
sX = 43
s2Y = 46
27
Now we can calculate the pooled variance estimate s2P :
9−1 6−1
s2P = · 43 + · 45
9+6−2 9+6−2
= 44.15385
The t quantiles are:
t0.025,13 = −2.160369
t0.975,13 = +2.160369
Then L is:
s
1 1
L = Sp2 · + · QT (α/2, nX + nY − 2)
nX nY
s
1 1
= 44.15385 · + · (−2.160369)
9 6
= −7.56591
For U we have:
s
1 1
U = Sp2 · + · QT (1 − α/2, nX + nY − 2)
nX nY
s
1 1
= 44.15385 · + · (+2.160369)
9 6
= +7.56591
In fact, under the null hypothesis, the t distribution is symmetric with respect
to 0, hence L will be negative, and U will be positive, and they will both have
the same magnitude, so once you know one of L or U you can immediately
write down the other. To perform the hypothesis test, we calculate the
28
observed value of the test statistic:
d = x−y
= 45 − 37
= 8
Now we can make our inference: the observed value of the test statistic is
greater than U , so it lies in the rejection region, and we conclude that this
data provides strong evidence against the null hypothesis that the population
means µX and µY are equal, or equivalently that this data provides strong
evidence that the population mean µX is greater than the population mean
µY .
6.3 Hypothesis Testing: The p-Value Approach

Using the p-value approach, we calculate the probability of obtaining a test
statistic that is as extreme or more extreme than what was actually observed,
given that the null hypothesis is true. We’ve just derived the sampling dis-
tribution of the T statistic under the null:
X −Y
T = s
2
1 1
Sp · +
nX nY
Under the null hypothesis, this test statistic will follow a t distribution with
nX + nY − 2 degrees of freedom. Thus, for the p-value method, we calculate
the observed value of this test statistic, denoted t, and then calculate the
area in the tails cut off by t and −t (for a two-sided test). If this area is less
than the pre-specified significance level, then we reject the null hypothesis.
Let’s go back to our previous example to see how this works. In this case,
29
the observed test statistic is:
x−y
t = s
1 1
s2p · +
nX nY
45 − 37
= s
1 1
44.15385 · +
9 6
= 2.28432
Now we need to calculate the upper tail cut off by the value t = 2.28432,
as well as the lower tail cut off by t = −2.28432, using a t distribution with
nX + nY − 2 = 9 + 6 − 2 = 13 degrees of freedom:
p = Pr(T ≤ −2.28432) + Pr(T ≥ 2.28432)
= 0.01990 + 0.01990
= 0.03980
Thus, the p-value is p = 0.03980, and since this is less than the pre-specified
significance level of α = 0.05, we reject the null hypothesis, and again con-
clude that this data provides strong evidence that the population mean µX
is greater than the population mean µY .
6.4 Hypothesis Testing: The Confidence Interval Ap-

proach
Finally, we can perform a hypothesis test using confidence intervals. Again,
we have a pre-specified significance level α, and then we have the probability
interval statement:
 
 
 (X − Y ) − (µ X − µ Y ) 
QT (α/2, nX + nY − 2) ≤ s
Pr  ≤ QT (1 − α/2, nX + nY − 2) = 1−α


 1 1 
Sp2 · +
nX nY
30
As usual, we do some algebraic manipulation to obtain a probability interval
statement of the form:
Pr(L ≤ µX − µY ≤ U ) = 1 − α
Here we have:
s
1 1
L = (X − Y ) − Sp2 · + · QT (α/2, nX + nY − 2)
nX nY
s
1 1
U = (X − Y ) + Sp2 · + · QT (1 − α/2, nX + nY − 2)
nX nY
Let’s go back to our example one last time. We calculate the lower limit
L:
s
1 1
L = (X − Y ) − Sp2 · + · QT (α/2, nX + nY − 2)
nX nY
s
1 1
= (45 − 37) − 44.15385 · + · 2.16037
9 6
= 0.43409
Similarly, we can calculate U :

s
1 1
U = (X − Y ) − Sp2 · + · QT (1 − α/2, nX + nY − 2)
nX nY
s
1 1
= (45 − 37) + 44.15385 · + · 2.16037
9 6
= 15.56591
We can check these calculations by making sure that the midpoint of the
interval is x − y = 8:
0.43409 + 15.56591
=8
2
31
So our 95% confidence interval for µX − µY , the difference of the true pop-
ulation means of X and Y respectively, is (0.43409, 15.56591). Since this
does not contain the value 0, this indicates that 0 is not a plausible value
for µX − µY , or equivalently it is implausible that µX = µY , and we reject
the null hypothesis. For the third time, we conclude that this data provides
strong evidence that the population mean µX is greater than the population
mean µY .
32

Lecture 8: Student's T-Tests: MATH E-156: Mathematical Statistics March 25, 2019

Uploaded by

Copyright:

Available Formats

Lecture 8: Student's T-Tests: MATH E-156: Mathematical Statistics March 25, 2019

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lecture 8: Student's T-Tests: MATH E-156: Mathematical Statistics March 25, 2019

Uploaded by

Copyright:

Available Formats

Lecture 8: Student’s t-Tests

MATH E-156: Mathematical Statistics

If X is normally distributed, then we can write this random variable Z as a

Any such linear transformation of a normal random variable will itself be

So Z is a normal random variable with expected value µ = 0 and variance

Notice the σ 2 in the denominator – that’s the population variance, so

While this substitution is unquestionably a natural choice for handling

2.1 The Independence of X and S 2

To investigate this claim, we use the method of computer simulation. The

fX (x) = e−x , x > 0

Figure 1: Sample mean vs. sample variance for exponential distribution

variance depends on the value of the sample mean:

Now let’s consider the 2-parameter Pareto distribution. Again, we use

Now we’ll consider the uniform distribution. With this distribution it is

It’s important to emphasize that running simulations and looking at the

3 The Sampling Distribution for the Sample

In lecture 4, on point estimation, we saw that the expectation of the sample

So, we have obtained:

Let’s divide everything by σ 2 :

We can re-write the first term on the right-hand side:

To make things simpler, let’s re-label these expressions:

At this point, we’ve shown that U is chi-squared with n degrees of free-

MU (t) = MV (t) · MW (t)

The moment-generating function for U is:

The moment-generating function for W is:

Thus, the moment-generating function equation is:

Thus, it follows that

4.1 Defining the t distribution

4.2 Calculating the density function for the t distribu-

5 Inference for One Normal Mean

In the previous lecture, we developed three methods for testing a null

• In the first method, we could calculate a rejection region, which was

• Finally, in the third method, we could construct a confidence interval

In the hypothesis testing lecture, I emphasized that we assumed a known

5.1 Method 1: Rejection Region

Here, µ0 is a constant, and in any particular problem this will be replaced

The quantile values are:

Let’s use the values from our previous example. We have:

p = P r(T ≤ −3.15712) + Pr(T ≥ 3.15712)

Here the values of L and U are:

Let’s return to our example. The relevant values are:

Then the lower limit of the confidence interval is:

6 Inference for Two Normal Means

The variance of D is:

Now consider this random variable:

How can we estimate the variance of D? In principle we could just use

Let’s finish this subsection by reviewing how we calculate this statistic:

• We calculate the pooled variance Sp2 :

6.2 Hypothesis Testing: The Rejection Region Ap-

The t quantiles are:

6.3 Hypothesis Testing: The p-Value Approach

6.4 Hypothesis Testing: The Confidence Interval Ap-

Similarly, we can calculate U :

You might also like