T-Test
The t-test is the most basic inferential statistic. A t-test is a statistical test that is used
to compare the means of two groups. It is often used in hypothesis testing to determine
whether a process or treatment actually has an effect on the population of interest, or
whether two groups are different from one another (Bevans, 2020). The distribution of
continuous data can often be closely approximated by the normal distribution (Ugoni,
1995). The need for the t distribution stems from the fact that the standard deviation must be estimated from the sample, which introduces extra variability into the problem (Dawson-Saunders & Trapp, 2009).
T-Test Assumptions
1. The first assumption made regarding t-tests concerns the scale of measurement. The
assumption for a t-test is that the scale of measurement applied to the data collected
follows a continuous or ordinal scale, such as the scores for an IQ test.
2. The second assumption made is that of a simple random sample, that the data is
collected from a representative, randomly selected portion of the total population.
3. The third assumption is that the data, when plotted, follow a normal, bell-shaped distribution curve.
A t-test can only be used when comparing the means of two groups (a.k.a. pairwise
comparison). If you want to compare more than two groups, or if you want to do
multiple pairwise comparisons, use an ANOVA test or a post-hoc test (Dawson-
Saunders & Trapp, 2009).
The t-test is a parametric test of difference, meaning that it makes the same
assumptions about your data as other parametric tests. The t-test assumes your data:
are independent;
are (approximately) normally distributed;
have a similar amount of variance within each group being compared (a.k.a. homogeneity of variance).
If your data do not fit these assumptions, you can try a nonparametric alternative to the t-test, such as the Wilcoxon signed-rank test for non-normally distributed paired data or the Mann-Whitney U test for independent samples.
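As a rough illustration of this workflow, the sketch below (Python with scipy, using hypothetical simulated data) checks normality and homogeneity of variance before choosing between a t-test and a nonparametric fallback. The Levene variance test and the specific data values are assumptions added here, not taken from the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group1 = rng.normal(loc=50, scale=10, size=30)  # hypothetical treatment scores
group2 = rng.normal(loc=55, scale=10, size=30)  # hypothetical control scores

# Approximate normality of each group (Shapiro-Wilk).
_, p_norm1 = stats.shapiro(group1)
_, p_norm2 = stats.shapiro(group2)

# Homogeneity of variance (Levene's test; an assumption check not named above).
_, p_var = stats.levene(group1, group2)

if p_norm1 > 0.05 and p_norm2 > 0.05:
    # Normality is plausible: Student's t-test, or Welch's if variances differ.
    t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=p_var > 0.05)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
else:
    # Nonparametric fallback for independent, non-normal samples.
    u_stat, p_value = stats.mannwhitneyu(group1, group2)
    print(f"U = {u_stat:.3f}, p = {p_value:.4f}")
```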
Types of T-Tests
The one-sample t test is concerned with making inferences regarding a population mean. For example, suppose you were interested in testing the hypothesis that the average ESR for polymyalgia rheumatica is 95 mm/hr. To show this, you would need to randomly select 'n' (say 100) people with polymyalgia rheumatica (Dawson-Saunders & Trapp, 2009). From this sample we obtain two statistics: the sample mean (x̄) and the sample standard deviation (s).
The one-sample or univariate t-test starts with four numbers: an assumed value for the population mean under the null hypothesis (µ), the sample mean, an estimate of the spread of the sampling distribution of the mean (the standard error), and the degrees of freedom (df), plus the assumption of normality. From these, you (or SPSS) can calculate the t-statistic and then look up the p-value for that t-statistic at the df that you have. If the p-value is less than .05, you reject the null hypothesis. The population mean under the null hypothesis does not require any calculation; it is usually set by theory.
It is rarely the case that we know the population standard deviation, and we usually need to estimate it with the sample standard deviation,

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}.$$

When testing a hypothesis, we always assume the hypothesis is correct. We then want to know the probability of our observed sample mean (x̄), or something more extreme, occurring.
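A minimal sketch of this one-sample procedure in scipy, using simulated ESR readings as a stand-in for the hypothetical sample of 100 patients (the data values and seed are assumptions, not real measurements):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
esr = rng.normal(loc=93, scale=12, size=100)  # simulated ESR readings (mm/hr)

# Null hypothesis: the population mean ESR is 95 mm/hr.
t_stat, p_value = stats.ttest_1samp(esr, popmean=95)

print(f"sample mean = {esr.mean():.1f}, s = {esr.std(ddof=1):.1f}")
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.4f}")
# Reject the null hypothesis if p < 0.05.
```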
If we assume that the underlying distributions which the two samples were taken from
are both normally distributed, then the distribution of each of the means will also be
normally distributed as discussed before. It can be shown that the difference between
2 normally distributed variables will also have a normal distribution.
By calculating the number of standard errors the sample mean lies from the hypothesised mean,

$$t^* = \frac{\bar{x} - \mu}{s / \sqrt{n}},$$

we are able to obtain the probability P(X̄ > x̄) by comparing t* to the appropriate t distribution. Having multiplied this probability by 2, we have then calculated the two-sided p-value.
Common practice is to reject the hypothesis when the p-value is less than 0.05, and not to reject it when the p-value is greater than 0.05.
There is only one assumption of the univariate t-test: the data are normal. This can and should be tested prior to running the t-test, using, e.g., the Shapiro-Wilk test.
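A brief sketch of that normality check with scipy's Shapiro-Wilk test (the sample here is hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=0, scale=1, size=50)  # hypothetical data

w_stat, p = stats.shapiro(sample)
print(f"W = {w_stat:.3f}, p = {p:.4f}")
# p > 0.05: no evidence against normality, so the t-test is defensible.
```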
Usually we want to compare the means of two groups; for example, the mean of a treatment group and the mean of a control group for polymyalgia rheumatica. The hypothesis tested here is the null hypothesis, i.e. 'nothing happens', or that the means in the two groups are equal to each other (Zar, 1984). If we denote the mean of the treatment group by µ₁ and the mean of the control group by µ₂, then the hypothesis that we want to test is

$$\mu_1 - \mu_2 = 0.$$
The study design would be to take a random sample of n₁ people who have the treatment and a random sample of n₂ people who act as controls, and compare the difference between the sample means with the hypothesised mean difference of zero.
The p-value can then be derived using the same method as with the one-sample t test. That is, calculate the number of SEs the difference in sample means lies from the hypothesised mean difference, and compare this t statistic to the appropriate t distribution.
In addition to requiring continuous data and approximate normality, as described above, the two-sample t-test makes the following assumptions (a code sketch follows the list):
3. The variances of the two populations are equal. (If not, the Aspin-Welch unequal-variance test is used.)
4. The two samples are independent. There is no relationship between the individuals
in one sample as compared to the other (as there is in the paired t-test).
5. Both samples are simple random samples from their respective populations. Each individual in the population has an equal probability of being selected in the sample.
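A sketch of this two-sample comparison in scipy, with hypothetical treatment and control samples; passing equal_var=False invokes the Welch (Aspin-Welch) unequal-variance variant mentioned in assumption 3:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
treatment = rng.normal(loc=80, scale=15, size=40)  # hypothetical data
control = rng.normal(loc=90, scale=15, size=45)

# Student's t-test: assumes equal population variances (assumption 3).
t_eq, p_eq = stats.ttest_ind(treatment, control, equal_var=True)

# Welch's t-test: used when the equal-variance assumption fails.
t_w, p_w = stats.ttest_ind(treatment, control, equal_var=False)

print(f"Student: t = {t_eq:.3f}, p = {p_eq:.4f}")
print(f"Welch:   t = {t_w:.3f}, p = {p_w:.4f}")
```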
When choosing a t-test, you will need to consider two things: whether the groups being compared come from a single population or two different populations, and whether you want to test the difference in a specific direction (Ugoni, 1995). Each case is sketched in code after the list below.
If the groups come from a single population (e.g. measuring before and after an
experimental treatment), perform a paired t-test.
If the groups come from two different populations (e.g. two different species,
or people from two separate cities), perform a two-sample t-
test (a.k.a. independent t-test).
If there is one group being compared against a standard value (e.g. comparing
the acidity of a liquid to a neutral pH of 7), perform a one-sample t-test.
If you only care whether the two populations are different from one another,
perform a two-tailed t-test.
If you want to know whether one population mean is greater than or less than
the other, perform a one-tailed t-test.
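A sketch mapping these choices onto scipy calls, with hypothetical data throughout; the alternative keyword (available in scipy 1.6+) selects one- versus two-tailed tests:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
before = rng.normal(100, 10, 25)       # hypothetical pre-treatment scores
after = before + rng.normal(2, 5, 25)  # same subjects, re-measured
city_a = rng.normal(100, 10, 30)       # two separate populations
city_b = rng.normal(103, 10, 30)
acidity = rng.normal(6.8, 0.3, 20)     # one group vs a standard value

# Single population measured twice -> paired t-test.
print(stats.ttest_rel(before, after))

# Two different populations -> two-sample (independent) t-test.
print(stats.ttest_ind(city_a, city_b))

# One group against a standard value (neutral pH of 7) -> one-sample t-test.
print(stats.ttest_1samp(acidity, popmean=7))

# Directional hypothesis (is city_a's mean less than city_b's?) -> one-tailed.
print(stats.ttest_ind(city_a, city_b, alternative="less"))
```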
Performing a t-test
The t-test estimates the true difference between two group means using the ratio of the
difference in group means over the pooled standard error of both groups. You can
calculate it manually using a formula, or use statistical analysis software.
T-test formula
The formula for the two-sample t-test (a.k.a. the Student's t-test) is shown below.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

In this formula, t is the t-value, x̄₁ and x̄₂ are the means of the two groups being compared, s² is the pooled variance of the two groups, and n₁ and n₂ are the numbers of observations in each of the groups.
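A sketch verifying this formula by hand against scipy, using hypothetical samples; s² is the pooled variance computed from both groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
g1 = rng.normal(52, 8, 30)  # hypothetical group 1
g2 = rng.normal(48, 8, 35)  # hypothetical group 2
n1, n2 = len(g1), len(g2)

# Pooled variance s^2 (assumes equal population variances).
s2 = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)

# t = (mean1 - mean2) / sqrt(s^2 * (1/n1 + 1/n2))
t_manual = (g1.mean() - g2.mean()) / np.sqrt(s2 * (1 / n1 + 1 / n2))

t_scipy, _ = stats.ttest_ind(g1, g2, equal_var=True)
print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}")  # these agree
```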
A larger t-value shows that the difference between group means is greater than the
pooled standard error, indicating a more significant difference between the groups.
You can compare your calculated t-value against the values in a critical value chart to
determine whether your t-value is greater than what would be expected by chance. If
so, you can reject the null hypothesis and conclude that the two groups are in fact
different.
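A sketch of that critical-value comparison using scipy's t distribution in place of a printed chart; the t-value and degrees of freedom below are hypothetical:

```python
from scipy import stats

t_value = 2.31   # hypothetical calculated t-statistic
df = 28          # hypothetical degrees of freedom, e.g. n1 + n2 - 2
alpha = 0.05

# Two-tailed critical value: the table entry leaving alpha/2 in each tail.
t_crit = stats.t.ppf(1 - alpha / 2, df)
print(f"critical value = {t_crit:.3f}")

if abs(t_value) > t_crit:
    print("Reject the null hypothesis: the groups differ.")
else:
    print("Fail to reject the null hypothesis.")
```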
Calculating a t-test requires three key data values. They include the difference
between the mean values from each data set (called the mean difference), the standard
deviation of each group, and the number of data values of each group. The outcome of
the t-test produces the t-value. This calculated t-value is then compared against a value
obtained from a critical value table (called the T-Distribution Table). This comparison
helps to determine the effect of chance alone on the difference, and whether the
difference is outside that chance range. The t-test questions whether the difference between the groups represents a true difference in the study or merely a meaningless random difference (Ugoni, 1993).
T-Distribution Tables
The T-Distribution Table is available in one-tailed and two-tailed formats. The former is used for assessing cases which have a fixed value or range with a clear direction (positive or negative); for instance, the probability of the output value remaining below -3, or of getting more than seven when rolling a pair of dice. The latter is used for range-bound analysis, such as asking whether the coordinates fall between -2 and +2. The calculations can be performed with standard software programs that support the necessary statistical functions, like those found in MS Excel.
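In the same spirit, a sketch of computing one- and two-tailed tail probabilities from the t distribution with scipy rather than Excel; the t-value and df are hypothetical:

```python
from scipy import stats

t_value = 2.0  # hypothetical t-statistic
df = 20        # hypothetical degrees of freedom

# One-tailed probability: P(T > t).
p_one = stats.t.sf(t_value, df)

# Two-tailed probability: P(|T| > t), the one-tailed value doubled.
p_two = 2 * stats.t.sf(abs(t_value), df)

print(f"one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```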
The t-test produces two values as its output: t-value and degrees of freedom. The t-
value is a ratio of the difference between the mean of the two sample sets and the
variation that exists within the sample sets. While the numerator value (the difference
between the mean of the two sample sets) is straightforward to calculate, the
denominator (the variation that exists within the sample sets) can become a bit
complicated depending upon the type of data values involved. The denominator of the
ratio is a measurement of the dispersion or variability. Higher values of the t-value, also called the t-score, indicate that a large difference exists between the two sample sets; the smaller the t-value, the more similar the two sample sets are.
Conclusion
The t-test remains the most basic inferential statistic for comparing means: provided its assumptions of continuous data, random sampling, normality, independence, and (for the Student's version) equal variances hold, comparing the calculated t-value to the appropriate t distribution gives a clear decision about whether an observed difference is real or attributable to chance.
References
Ugoni A. On the Subject of Hypothesis Testing. COMSIG Review 1993; 2(2): 45-48.
Zar J. H. Biostatistical Analysis. 2nd ed. Englewood Cliffs, New Jersey: Prentice-Hall; 1984: 97-101.