T-Test: What It Is With Multiple Formulas and When To Use Them

From Video:
The t-test is a statistical test that compares the means of two data populations. It is used to see whether two sets of
data are significantly different from one another, with a null hypothesis serving as the baseline for the comparison.
The data sets usually follow a normal distribution, and their variances are unknown and assumed to be equal.
The t-test was devised by William Sealy Gosset around 1908 while he was an employee of the Guinness Brewery in
Ireland. Gosset used the t-test to monitor the quality of the stout brewed by Guinness. A t-test could be used to
compare the returns from two portfolios managed under two different investment strategies. In this analysis, the null
hypothesis could be that the mean returns of the two portfolios do not differ. The test then uses the t-statistic and
the t-distribution to determine a p-value (probability) that is used to decide whether to reject the null hypothesis.
The t-test is one of several types of statistical tests used for hypothesis testing.
--
What Is a T-Test?
A t-test is an inferential statistic used to determine if there is a significant difference between the means of two groups
and how they are related. T-tests are used when the data sets follow a normal distribution and have unknown variances,
like the data set recorded from flipping a coin 100 times.
The t-test is a test used for hypothesis testing in statistics and uses the t-statistic, the t-distribution values, and the
degrees of freedom to determine statistical significance.
--
Key Takeaways
A t-test is an inferential statistic used to determine if there is a statistically significant difference between the means of
two variables.
The t-test is a test used for hypothesis testing in statistics.
Calculating a t-test requires three fundamental data values including the difference between the mean values from each
data set, the standard deviation of each group, and the number of data values.
T-tests can be dependent or independent.
--
Understanding the T-Test
A t-test compares the average values of two data sets and determines if they came from the same population. For
example, a sample of students from class A and a sample of students from class B would not likely have the same
mean and standard deviation. Similarly, samples taken from a placebo-fed control group and those taken from a
drug-prescribed group should have a slightly different mean and standard deviation.
Mathematically, the t-test takes a sample from each of the two sets and establishes the problem statement. It assumes
a null hypothesis that the two means are equal.
Using the formulas, a t-value is calculated and compared against the standard (critical) values. The null hypothesis is
then rejected or not rejected accordingly. If the null hypothesis is rejected, it indicates that the observed difference
is strong and is probably not due to chance.
The t-test is just one of many tests used for this purpose. Statisticians use additional tests other than the t-test to
examine more variables and larger sample sizes. For a large sample size, statisticians use a z-test. Other testing options
include the chi-square test and the F-test.
--
Using a T-Test
Consider that a drug manufacturer tests a new medicine. Following standard procedure, the drug is given to one group
of patients and a placebo to another group called the control group. The placebo is a substance with no therapeutic
value and serves as a benchmark to measure how the other group, administered the actual drug, responds.
After the drug trial, the members of the placebo-fed control group reported an increase in average life expectancy of
three years, while the members of the group who were prescribed the new drug reported an increase in average life
expectancy of four years.
Initial observation indicates that the drug is working. However, it is also possible that the observation may be due to
chance. A t-test can be used to determine if the results are correct and applicable to the entire population.
Four assumptions are made while using a t-test: the data collected must follow a continuous or ordinal scale, such as
the scores for an IQ test; the data is collected from a randomly selected portion of the total population; the data will
result in a normal distribution with a bell-shaped curve; and the variance is equal or homogeneous, meaning that the
standard deviations of the groups are equal.
--
T-Test Formula
Calculating a t-test requires three fundamental data values. They include the difference between the mean values from
each data set, or the mean difference, the standard deviation of each group, and the number of data values of each
group.
This comparison helps to determine the effect of chance on the difference, and whether the difference is outside that
chance range. The t-test questions whether the difference between the groups represents a true difference in the study
or merely a random difference.
The t-test produces two values as its output: t-value and degrees of freedom. The t-value, or t-score, is a ratio of the
difference between the mean of the two sample sets and the variation that exists within the sample sets.
The numerator value is the difference between the mean of the two sample sets. The denominator is the variation that
exists within the sample sets and is a measurement of the dispersion or variability.
This calculated t-value is then compared against a value obtained from a critical value table called the T-distribution
table. Higher values of the t-score indicate that a large difference exists between the two sample sets. The smaller the
t-value, the more similarity exists between the two sample sets.
Degrees of freedom refer to the values in a study that have the freedom to vary and are essential for assessing the
importance and the validity of the null hypothesis. The computation of these values usually depends upon the number
of data records available in the sample set.
--
T-Score
A large t-score, or t-value, indicates that the groups are different while a small t-score indicates that the groups are
similar.
--
Paired Sample T-Test
The correlated t-test, or paired t-test, is a dependent type of test and is performed when the samples consist of matched
pairs of similar units, or when there are cases of repeated measures. For example, there may be instances where the
same patients are repeatedly tested before and after receiving a particular treatment. Each patient is being used as a
control sample against themselves.
This method also applies to cases where the samples are related or have matching characteristics, like a comparative
analysis involving children, parents, or siblings.
The formula for computing the t-value and degrees of freedom for a paired t-test is:
T = (mean1 − mean2) / (s(diff) / √n)
where:
mean1 and mean2=The average values of each of the sample sets
s(diff)=The standard deviation of the differences of the paired data values
n=The sample size (the number of paired differences)
n−1=The degrees of freedom
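As a minimal sketch of the paired calculation, the function below divides the mean of the paired differences (which equals mean1 − mean2) by s(diff)/√n and returns n − 1 as the degrees of freedom. The function name `paired_t_test` and the before/after readings are hypothetical and serve only as an illustration.

```python
import math
from statistics import mean, stdev


def paired_t_test(first, second):
    """Return (t_value, degrees_of_freedom) for two paired samples."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    n = len(first)
    if n < 2:
        raise ValueError("at least two pairs are required")
    # Differences of the paired data values; their mean equals mean1 - mean2.
    diffs = [a - b for a, b in zip(first, second)]
    s_diff = stdev(diffs)  # sample standard deviation (divides by n - 1)
    if s_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t_value = mean(diffs) / (s_diff / math.sqrt(n))
    return t_value, n - 1


# Hypothetical readings for the same four patients before and after a treatment.
before = [10.0, 12.0, 9.0, 11.0]
after = [8.0, 11.0, 7.0, 8.0]
t_value, df = paired_t_test(before, after)
print(f"t = {t_value:.3f}, degrees of freedom = {df}")
```

Because each patient acts as their own control, only the differences enter the calculation, which is why the two lists must line up pair by pair.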
--
Equal Variance or Pooled T-Test
The equal variance t-test is an independent t-test and is used when the number of samples in each group is the same, or
the variance of the two data sets is similar.
The formula used for calculating t-value and degrees of freedom for equal variance t-test is:
T-value = (mean1 − mean2) / √( [((n1 − 1) × var1 + (n2 − 1) × var2) / (n1 + n2 − 2)] × [1/n1 + 1/n2] )
where:
mean1 and mean2=Average values of each of the sample sets
var1 and var2=Variance of each of the sample sets
n1 and n2=Number of records in each sample set
and,
Degrees of Freedom=n1+n2−2
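The pooled calculation can be sketched as follows, assuming the standard pooled-variance form in which the two sample variances are weighted by n1 − 1 and n2 − 1. The function name `pooled_t_test` and the two small groups are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_test(sample1, sample2):
    """Return (t_value, degrees_of_freedom) for an equal variance t-test."""
    n1, n2 = len(sample1), len(sample2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two records")
    var1, var2 = variance(sample1), variance(sample2)  # sample variances
    df = n1 + n2 - 2
    # Pooled variance: the two variances weighted by their own degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_value = (mean(sample1) - mean(sample2)) / std_err
    return t_value, df


# Hypothetical data: two groups of equal size with the same spread.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t_value, df = pooled_t_test(group_a, group_b)
print(f"t = {t_value:.3f}, degrees of freedom = {df}")
```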

--
Unequal Variance T-Test
The unequal variance t-test is an independent t-test and is used when the number of samples in each group is different,
and the variance of the two data sets is also different. This test is also called Welch's t-test.
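The formula used for calculating t-value and degrees of freedom for an unequal variance t-test is:
T-value = (mean1 − mean2) / √(var1/n1 + var2/n2)
where:
mean1 and mean2=Average values of each of the sample sets
var1 and var2=Variance of each of the sample sets
n1 and n2=Number of records in each sample set
and,
Degrees of Freedom = (var1/n1 + var2/n2)² / [ (var1/n1)² / (n1 − 1) + (var2/n2)² / (n2 − 1) ]
where:
var1 and var2=Variance of each of the sample sets
n1 and n2=Number of records in each sample set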
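A minimal sketch of Welch's calculation is shown below, assuming the usual Welch-Satterthwaite approximation for the degrees of freedom. The function name `welch_t_test` and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t_test(sample1, sample2):
    """Return (t_value, degrees_of_freedom) for an unequal variance t-test."""
    n1, n2 = len(sample1), len(sample2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two records")
    # Each group's variance divided by its own sample size.
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    t_value = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom; usually not a whole number.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_value, df


# Hypothetical data: different sample sizes and clearly different spreads.
small_group = [1, 2, 3, 4, 5]
large_group = [2, 4, 6, 8, 10, 12]
t_value, df = welch_t_test(small_group, large_group)
print(f"t = {t_value:.3f}, degrees of freedom = {df:.2f} (table lookup uses {math.floor(df)})")
```

The fractional degrees of freedom are rounded down before a table lookup, which gives a slightly more conservative critical value.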
--
Which T-Test to Use?
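Which t-test to use depends on the characteristics of the sample sets. The key items to consider include the similarity
of the sample records, the number of data records in each sample set, and the variance of each sample set. If the
samples are matched pairs or repeated measures on the same units, the paired t-test applies. If the samples are
independent and either the number of records in each set is the same or the variances are similar, the equal variance
t-test applies. Otherwise, the unequal variance t-test applies.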
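The selection criteria given in the Paired Sample, Equal Variance, and Unequal Variance sections can be sketched as a small decision function. The function name `choose_t_test` and its three boolean parameters are hypothetical, and the order of the checks is an assumption based on those section descriptions.

```python
def choose_t_test(matched_pairs, same_sample_size, similar_variance):
    """Pick a t-test from three yes/no characteristics of the sample sets."""
    if matched_pairs:
        # Matched pairs or repeated measures on the same units.
        return "paired t-test"
    if same_sample_size or similar_variance:
        # Independent samples with equal sizes or similar variances.
        return "equal variance (pooled) t-test"
    # Independent samples with different sizes and different variances.
    return "unequal variance (Welch's) t-test"


# Same patients measured before and after a treatment.
print(choose_t_test(True, True, True))
# Independent groups of 10 and 20 records with different variances.
print(choose_t_test(False, False, False))
```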
--
Example of an Unequal Variance T-Test
Assume that the diagonal measurement of paintings received in an art gallery is taken. One group of samples includes 10
paintings, while the other includes 20 paintings. The data sets, with the corresponding mean and variance values, are as
follows:
Though the mean of Set 2 is higher than that of Set 1, we cannot conclude that the population corresponding to Set 2
has a higher mean than the population corresponding to Set 1.
Is the difference from 19.4 to 21.6 due to chance alone, or do differences exist in the overall populations of all the
paintings received in the art gallery? We establish the problem by assuming the null hypothesis that the mean is the
same between the two sample sets and conduct a t-test to test if the hypothesis is plausible.
Since the number of data records is different (n1 = 10 and n2 = 20) and the variance is also different, the t-value and
degrees of freedom are computed for the above data set using the formula mentioned in the Unequal Variance T-Test
section.
The t-value is -2.24787. Since the sign can be ignored when comparing against the table value in a two-tailed test,
the absolute value of 2.24787 is used.
The degrees of freedom value is 24.38 and is reduced to 24, owing to the formula definition, which requires rounding
the value down to the nearest integer.
One can specify a level of probability (alpha level, or level of significance) as the criterion for rejecting the null
hypothesis. In most cases, a 5% value can be assumed.
Using the degrees of freedom value of 24 and a 5% level of significance, a look at the t-value distribution table gives a
value of 2.064. Comparing this value against the computed value of 2.248 indicates that the calculated t-value is greater
than the table value at a significance level of 5%. Therefore, the null hypothesis that there is no difference between
means can be rejected. The two populations appear to differ intrinsically, and the difference is unlikely to be due to
chance.
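The final comparison in this example can be sketched in a few lines. The t-value, the degrees of freedom, and the table value of 2.064 are taken from the text above; the variable names are only illustrative.

```python
import math

t_value = -2.24787      # computed t-value reported in the example
df_raw = 24.38          # degrees of freedom reported in the example
critical_value = 2.064  # table value for 24 degrees of freedom at the 5% level

# Round the degrees of freedom down before looking up the table.
df = math.floor(df_raw)

# Only the magnitude of t matters for this comparison, so the sign is dropped.
reject_null = abs(t_value) > critical_value

print(f"df = {df}, |t| = {abs(t_value):.3f}, reject null hypothesis: {reject_null}")
```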
--
How Is the T-Distribution Table Used?
The T-Distribution Table is available in one-tail and two-tails formats. The former is used for assessing cases that have a
fixed value or range with a clear direction, either positive or negative. For instance, what is the probability of the output
value remaining below -3, or getting more than seven when rolling a pair of dice? The latter is used for range-bound
analysis, such as asking if the coordinates fall between -2 and +2.
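As a rough illustration of how one-tail and two-tail table entries relate, the sketch below integrates the t-distribution density numerically with Simpson's rule, using only the standard library. The function names `t_pdf` and `upper_tail_probability` are hypothetical, and a statistics library would normally be used instead. The inputs 2.064 and 24 are the table value and degrees of freedom from the art gallery example.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_probability(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    if t < 0:
        raise ValueError("t must be non-negative; use abs(t) for a negative t-value")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # The distribution is symmetric, so half of the probability lies above zero.
    return 0.5 - area_zero_to_t


one_tailed = upper_tail_probability(2.064, 24)
two_tailed = 2 * one_tailed  # both tails together
print(f"one-tailed: {one_tailed:.4f}, two-tailed: {two_tailed:.4f}")
```

The two-tailed probability comes out close to 0.05, which matches the 5% level at which the table lists 2.064 for 24 degrees of freedom; the one-tailed probability for the same t-value is half of that.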
--
What Is an Independent T-Test?
The samples for an independent t-test are selected independently of each other, so the data sets in the two groups
don't refer to the same values. They may include a group of 100 randomly selected, unrelated patients split into two
groups of 50 patients each. One of the groups becomes the control group and is administered a placebo, while the other
group receives a prescribed treatment. This constitutes two independent sample groups that are unpaired and unrelated
to each other.
--
What Does a T-Test Explain and How Are They Used?
A t-test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to
determine whether a process or treatment has an effect on the population of interest, or whether two groups are
different from one another.
