The One-Sample T-Test: Department of Biostatistics


Department of Biostatistics

The one-sample T-test


The one-sample t-test is used to determine if a sample comes from a population with a specific mean (the
hypothesized mean). In the one-sample t-test, the population mean is known or specified in advance. We
draw a random sample from the population and then make a statistical decision as to whether or not the
sample mean differs from that population mean.

Example: A consumer group is investigating a producer of diet meals to examine whether its prepackaged meals actually
contain the advertised 6 ounces of protein in each package. Based on the following data, is there any evidence that
the meals do not contain the advertised amount of protein? The goal is to test whether there is a significant
difference between the mean protein content and the company's claim. Run the appropriate test at a 5% level of
significance.

5.1 4.9 6.0 5.1 5.7 5.5 4.9 6.1 6.0 5.8
5.2 4.8 4.7 4.2 4.9 5.5 5.6 5.8 6.0 6.1

First, we need to translate the question into statements that state the null hypothesis and the alternative hypothesis.

Null Hypothesis: In the population of interest, the average protein content in the prepackaged
diet meals equals the claimed mean of 6.0 ounces of protein.

-vs-

Alternative Hypothesis (two-sided): The average protein content in the prepackaged diet meals is
different from the claimed mean of 6.0 ounces of protein.

The exact form of the research hypothesis depends on our belief about the parameter of interest and whether
it has possibly increased, decreased or is different from the null value. The research hypothesis should be set
up before any data are collected.

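Once we have our formal hypotheses, we can then perform a statistical test. Here, we will be using the one-sample
t-test.
To determine if the difference between your sample mean (calculated mean) and the hypothesized
mean is statistically significant, you need to compute a test statistic, called a t-value:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesized mean, $s$ is the sample standard deviation,
and $n$ is the sample size.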



1. The numerator of the t-value, ($\bar{x} - \mu_0$), can be seen as a measure of the strength of the signal (the
difference between the mean of your sample and the hypothesized mean of the population).
2. The denominator, $s / \sqrt{n}$, measures the noise in our data (the standard error of the mean, which
quantifies the precision of the sample mean). As n becomes larger, the denominator becomes smaller
for a fixed standard deviation, meaning that the noise gets smaller.
3. The t-value compares the strength of the signal to the noise in the data. If the observed mean is far
away from our hypothesized mean, then the signal will be large. A t-statistic that is very large in absolute
value signifies that the signal is large, the standard error of the mean is very small, or both, and is strong
evidence that the null hypothesis should be rejected.
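The signal-to-noise reading of the t-value can be illustrated with a short Python sketch. It uses the 20 protein measurements from the example and only the standard library; the variable names (signal, noise, t_value) are chosen here for illustration and do not come from the handout. The final loop holds the standard deviation fixed and varies a hypothetical sample size to show the noise shrinking.

```python
import math
import statistics

# Protein content (ounces) of the 20 sampled meals, copied from the example.
protein = [5.1, 4.9, 6.0, 5.1, 5.7, 5.5, 4.9, 6.1, 6.0, 5.8,
           5.2, 4.8, 4.7, 4.2, 4.9, 5.5, 5.6, 5.8, 6.0, 6.1]
mu0 = 6.0  # hypothesized (advertised) mean, in ounces

n = len(protein)
x_bar = statistics.mean(protein)   # sample mean
s = statistics.stdev(protein)      # sample standard deviation (n - 1 denominator)

signal = x_bar - mu0               # numerator of the t-value
noise = s / math.sqrt(n)           # denominator: standard error of the mean
t_value = signal / noise

print(f"signal = {signal:.3f}, noise = {noise:.3f}, t = {t_value:.2f}")

# Hypothetical sample sizes (not from the handout): with s held fixed,
# the standard error shrinks as n grows.
for hypothetical_n in (20, 80, 320):
    print(hypothetical_n, round(s / math.sqrt(hypothetical_n), 4))
```

For these data the t-value comes out at about -4.9: the sample mean lies almost five standard errors below the advertised amount.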

Assumptions for the one-sample t-test:


1. The variable of interest is measured on the interval or ratio scale (that is, the variable is continuous).
a. Interval variable: the difference between two numbers is meaningful (e.g. temperature).
b. Ratio variable: the difference between two numbers is meaningful and there is a clear definition of
0 (e.g. weight).
2. The observations in your data are independent.
3. There should be no significant outliers (they influence the sample mean and sample standard
deviation).
4. The variable should be normally distributed.

Checking the assumptions of the one-sample t-test


1. Variable should be interval or ratio
a. The variable of interest is protein content, measured in ounces. If one package has 6 ounces of protein
and another package has 5 ounces of protein, can we estimate the difference in protein content? Can we say
one package contains twice as much protein as another package? Can a package contain 0 ounces of protein?
2. The observations in the data are independent.
a. Is the amount of protein in package 1 dependent on the amount of protein in package 2?
There is no relationship between the packages.
3. There should be no significant outliers (there are tests, such as Grubbs' test, that can be used to check
this assumption).
a. The amount of protein in the sample ranges from 4.2 ounces to 6.1 ounces, which seems to be fine.
4. The variable should be normally distributed (loosely speaking, the data should be symmetric, i.e. mean =
median = mode). There are formal tests of normality, such as the Kolmogorov-Smirnov test, that can be
used to check this assumption. In this example, the mean is 5.4, the median is 5.5, and the modes are 4.9
and 6.0.
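The descriptive checks quoted above (range, mean, median, modes) can be reproduced with a few lines of Python. This is only an informal sketch using the standard statistics module; the largest standardized distance from the mean is added here as a rough outlier screen and is an assumption of this sketch, not a substitute for Grubbs' test or a formal normality test.

```python
import statistics

# Protein content (ounces) of the 20 sampled meals, copied from the example.
protein = [5.1, 4.9, 6.0, 5.1, 5.7, 5.5, 4.9, 6.1, 6.0, 5.8,
           5.2, 4.8, 4.7, 4.2, 4.9, 5.5, 5.6, 5.8, 6.0, 6.1]

low, high = min(protein), max(protein)       # range of the sample
mean_value = statistics.mean(protein)
median_value = statistics.median(protein)
modes = statistics.multimode(protein)        # every value tied for most frequent

# Informal outlier screen (an assumption of this sketch): how many sample
# standard deviations the most extreme observation lies from the mean.
s = statistics.stdev(protein)
largest_z = max(abs(x - mean_value) / s for x in protein)

print(f"range: {low} to {high}")
print(f"mean = {mean_value:.1f}, median = {median_value:.1f}, modes = {modes}")
print(f"largest standardized distance = {largest_z:.2f}")
```

A mean and median this close together, and no observation much more than two standard deviations from the mean, are consistent with the informal judgement in the text that the data look acceptable.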

Here are the steps that are required to perform a one-sample t-test in Excel:

Step 0: Make sure your data set is clean and in column format


Step 1: Compute Mean Age: $\bar{x} = \frac{\sum x_i}{n}$

Step 2: Compute Standard deviation for Age: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

Step 3: Compute the Test Statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

Step 4: Compute the degrees of freedom: DF = n-1



Step 5: Compute the P-value using the formula TDIST(ABS(T-value), DF, TAILS), where TAILS = 2 for a two-tailed (two-sided) test

The final result should look like this. In this example, we reject the null hypothesis. We have enough evidence to say
that the mean protein content in the prepackaged meals is different from the advertised amount.
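For readers working outside Excel, Steps 1 to 5 can be sketched in Python using only the standard library. The helper names t_pdf and two_tailed_p are hypothetical and written for this sketch; two_tailed_p approximates what TDIST(ABS(T-value), DF, 2) returns by integrating the t density numerically with Simpson's rule, which is an assumption of this sketch and not how Excel computes it.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_value, df, intervals=2000):
    """Two-tailed p-value by Simpson's rule; intervals must be even."""
    upper = abs(t_value)
    if upper == 0.0:
        return 1.0
    h = upper / intervals
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total   # area between -|t| and +|t|
    return max(0.0, 1.0 - central_area)  # what remains is the two tails


# Protein content (ounces) of the 20 sampled meals, copied from the example.
protein = [5.1, 4.9, 6.0, 5.1, 5.7, 5.5, 4.9, 6.1, 6.0, 5.8,
           5.2, 4.8, 4.7, 4.2, 4.9, 5.5, 5.6, 5.8, 6.0, 6.1]
mu0 = 6.0     # hypothesized mean (advertised ounces of protein)
alpha = 0.05  # 5% level of significance

n = len(protein)
x_bar = statistics.mean(protein)               # Step 1: mean
s = statistics.stdev(protein)                  # Step 2: standard deviation
t_value = (x_bar - mu0) / (s / math.sqrt(n))   # Step 3: test statistic
df = n - 1                                     # Step 4: degrees of freedom
p_value = two_tailed_p(t_value, df)            # Step 5: two-tailed p-value

print(f"mean = {x_bar:.3f}, sd = {s:.3f}, t = {t_value:.2f}, df = {df}")
print(f"p-value = {p_value:.5f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With these data the p-value falls well below 0.05, which agrees with the decision to reject the null hypothesis.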

Here is an example to try your skills.

Twelve subjects with diagnosed hypertension were randomly selected for this study. The ages at which they were
diagnosed were recorded and are listed below. Based on the data, is there any evidence that the mean age at
diagnosis is not equal to 45.0 years?

Age at Diagnosis of Hypertension (years):

32.8 40.0 41.0 42.0 45.5 47.0 48.5 50.0 51.0 52.0 54.0 59.2

The p-value is 0.3723, so we fail to reject the null hypothesis. There is not enough evidence to suggest that the mean
age at diagnosis is different from 45.0 years.
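For readers who want to check their work, the arithmetic behind this result can be sketched as follows. The intermediate values are rounded and were recomputed from the twelve ages listed above; they are not taken from the original worksheet. The p-value itself is the one reported in the text.

```latex
\begin{align*}
n &= 12, \qquad \mu_0 = 45.0 \\
\bar{x} &= \frac{\sum x_i}{n} = \frac{563.0}{12} \approx 46.92 \\
s &= \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \approx \sqrt{\frac{563.90}{11}} \approx 7.16 \\
t &= \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \approx \frac{46.92 - 45.0}{7.16 / \sqrt{12}} \approx 0.93 \\
DF &= n - 1 = 11
\end{align*}
```

A t-value below 1 with 11 degrees of freedom is a weak signal relative to the noise, which is consistent with the large p-value and the decision not to reject the null hypothesis.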
