NYA Lab 2B Scientific Method Statistics W2022


LABORATORY 2B

DATA ANALYSIS: DESCRIPTIVE & INFERENTIAL STATISTICS


I strongly recommend that you read & re-read this lab, as it is probably the first time you have ever ventured into
this land of numbers gone mad.
Note that this is lab exam material and you will need this to complete your CE Lab report on allelopathy.

AIM
When you have completed this lab, you will be expected to:
• understand the difference between descriptive and inferential statistics
• apply inferential statistics (focus on t-tests) appropriately to data.

1. INTRODUCTION
Researchers gather data to describe and learn about large populations. Unfortunately, most populations are too
big for us to measure any one variable for every member of the population. For example, it would be practically
impossible to measure the resting heart rate of all teenagers in Canada. Instead, we collect information from a
sample of the population. Sampling means we take a relatively small number of measurements that (hopefully)
represents the entire population.

Figure 1. A sample is a smaller group of members of a population selected at random to represent the population.
Put another way, a sample is a subset of the population.
You can calculate, for example, the mean height of a sample of people from our Biology class, and from that you
may infer the mean height of the entire population (e.g., of all Dawson students).

2. EXPERIMENTAL DESIGN
2.1 The importance of sample size

Figure 2. The importance of sample size. In this example, you can see that a single sample cannot represent a
population, but a sample size of 10 (n=10), in this case, comes close (but it isn’t identical).

The bigger your sample size, the more likely your sample will be similar to the population you are sampling from.

Let’s look at what can happen if your sample size is small and doesn’t represent the population. If you were to test
the effect of exercise on heart rate by measuring the heart rate of 10 people at rest, then again after 1 minute of
skipping rope, you would find that the heart rate obviously increased, but what you’re interested in is the degree to
which the heart rate increased. I’ve made up 2 possible data sets to compare below.
General Biology I, S. Kunicki, Dawson College Page 1 of 13
Table 1. Increase in heart rate (bpm) in two made-up groups of 10 subjects each (5 males, 5 females).
Group 1 Group 2
5 35
3 39
6 28
4 33
7 36
7 37
6 32
5 28
8 31
4 35
Average 6 33

If, just by chance, the 10 people you chose were in really good shape (group 1), you would conclude that heart rate
doesn’t increase very much in response to exercise. If, on the other hand, just by chance, you chose 10 people
who were couch-potatoes (group 2), you would conclude that heart rate increases greatly. Neither of these groups
would be representative of the population at large.

In the experiments you will perform this semester, you will be using sample sizes that are, scientifically speaking,
far too small (for example, 4 samples). This is done for practical reasons (cost and space), but you will see below
that no one would publish results from a study that has a sample size of only 4.
In order to have confidence that your survey results are representative, it is critically important that you have a
large number of randomly selected participants in each group you survey. So, what exactly is "a large number"?
For a 95% confidence level (which means that there is only a 5% chance that your sample result differs from the
true population value by more than the margin of error), a good estimate of the margin of error (or confidence
interval) is given by 1/√n, where n is the number of participants or sample size (Niles, 2006).
The following table shows this estimate of the margin of error for sample sizes ranging from 10 to 10,000. For more
advanced students with an interest in statistics, the Creative Research Systems website (Creative Research
Systems, 2003) has a more exact formula, along with a sample size calculator that you can use. For most
purposes, though, the 1/√n approach is sufficient. Source of table:
http://www.sciencebuddies.org/science-fair-projects/project_ideas/Soc_participants.shtml

You can quickly see from the table that results obtained from a survey with only 10 random participants are not
reliable. The margin of error in this case is roughly 32%. This means that if you found, for example, that 6 out of
your 10 participants (60%) had a fear of heights, then the actual proportion of the population with a fear of heights
could vary by ±32%. In other words, the actual proportion could be as low as 28% (60 - 32) and as high as 92%
(60 + 32). With a range that large, your small sample isn't saying anything.
If you increase the sample size to 100 people, your margin of error falls to 10%. Now if 60% of the participants
reported a fear of heights, there would be a 95% probability that between 50 and 70% of the total population have a
fear of heights. Now you're getting somewhere. If you want to narrow the margin of error to about ±5%, you have to
survey 400 to 500 randomly selected participants (1/√400 = 5%; 1/√500 ≈ 4.5%). The bottom line is you need to
sample a lot of people before you can start having any confidence in your results.
I don’t want you to memorize the formula. I do want you to see the impact of the sample size on the confidence
interval, i.e., how confident you can be that the results from your sample are representative of the population.
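
To see the impact of sample size for yourself, here is a minimal Python sketch of the 1/√n estimate described above. The function name margin_of_error and the particular sample sizes are my own choices for illustration; they are not taken from the source table.

```python
import math

def margin_of_error(n):
    """Approximate 95% margin of error for a sample of size n, using 1/sqrt(n)."""
    return 1 / math.sqrt(n)

# Sample sizes picked for illustration, spanning 10 to 10,000.
for n in (10, 100, 400, 500, 1000, 10000):
    percent = margin_of_error(n) * 100
    print(f"n = {n:>6}: margin of error is about +/-{percent:.1f}%")
```

Notice how slowly the margin shrinks: going from 100 to 10,000 participants (100 times more work) only narrows it from 10% to 1%.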

2.2 What are replicates?


Science relies heavily on replicate measurements, both biological and technical replicates. Biological replicates
are parallel measurements of biologically distinct samples that capture random biological variation. Technical
replicates are repeated measurements of the same sample. The point of a technical replicate is to establish the
variability (experimental error) of the analysis technique
(the protocol and/or equipment you used and/or human
error).

Figure 3. Biological vs technical replicates.


QUESTION: Identify which of the drawings represents 3 biological replicates and which represents 3 technical
replicates.

Knowing the inherent variability, as determined from the technical replicates and from the biological replicates, is
necessary if one is to decide whether observed differences between two groups of organisms (control vs
experimental groups) are simply random or represent a “true” biological difference induced by the independent
variable.

QUESTION: If you were to test 100 students (before and after exercise) on 3 separate days, the 100 students are
______________________ replicates, and the repeated measurements of the heart rates on days 1, 2, and 3 are
_______________________ replicates. With the technical replicates, you could determine if the heart rate
increased to the same degree for each individual.

QUESTIONS:
1. Consider the allelopathy lab (Lab 2A); how many replicates are there in the control group?
2. Were these biological or technical replicates?
3. What information does a biological replicate give you?
4. What information does a technical replicate give you?

2.3 The importance of having a control group


A properly designed experiment is one in which there are two conditions: the control and the test
(experimental). The goal is to have the 2 conditions be identical, except for the independent variable. The
subjects (samples) can be the same for both conditions. For example, you measure the heart rate of the same
100 students before (control condition) and after exercise (test condition). Alternatively, you can have 2 different
groups of subjects (test and control groups). In this case, the subjects are randomly assigned to the groups. You
might measure the heart rate in a control group (no exercise) of 100 students and in a test group (exercise) of 100
different students (for a total of 200 students). In this second example, you would need to control certain variables;
for example, age (you wouldn't want the average age of the control group to be 15 and that of the test group 95).
In this case, age would be a control variable.
In the example of the same 100 students tested before and after exercise, the number of biological replicates is
____ while the number of technical replicates is _____.



3. ANALYZING YOUR DATA USING STATISTICS
There are two types of statistics:
1. Descriptive statistics provide simple summaries about the sample and the dependent variable measured. For
example, the average student grade in General Biology I at Dawson last year was 74%.

2. Inferential statistics is used to make judgments of the probability that an observed difference between groups
is due to the independent variable, or if it is one that might have happened by chance in this study. A conclusion
using inferential statistics might be “Students who studied Biology at Dawson did significantly better on
standardized Biology tests than those who studied at CEGEP Mediocre.”
Inferential statistics are used to determine how confident you are that the results (in this example grades on the
standardized test) were caused by the independent variable (the CEGEP you went to), and not by chance (ex
what if by chance, the students who went to Dawson had higher IQs than those who went to the other CEGEP and
so scored higher because of their IQ, not because of the classes they took at Dawson?). You will see below
(section 3.2) how this is determined.

3.1 DESCRIPTIVE STATISTICS


Usually, researchers collect a lot of data in an experiment. It is very difficult to find "meaning" in this raw data. For
example, suppose you measured the heart rate of the same 200 students at rest compared to after 5 minutes of
skipping rope on 3 separate days (ie 200 biological replicates, 3 technical replicates); there would be 400 values for
heart rate per day (200 before, 200 after skipping rope) for a total of 1200 data points; this data set would be the
raw data.

Descriptive stats are used to summarize data so we can get an overview of the results. We can present the data
in tables and graphs and report the mean values as well as the variability observed.
Below is shown a small sample size (n) of 18 students (n=18), 9 males and 9 females. By graphing the raw data,
you can see a trend; for both females and males, the heart rate increased after skipping rope for 5 minutes.
However, just by looking at the graph, you can't tell what the mean heart rate before and after exercise was.

Figure 4. The effect of skipping rope for 5 minutes on the heart rate in 18 Dawson students; 9 females and 9 males.
Subjects 1-9 were females and 10-18 were males. The mean age for both males and females was 18 years.

• To be able to say how much the heart rate increased after exercise, you will need to use descriptive
statistics.
• To determine if the increase was significant, you would need to do inferential statistics.

By the way, a control variable is missing here. It is well known that the heart of people who exercise regularly
pumps blood more effectively and so the heart rate of such people doesn’t increase as much when exercising, as
compared to that of people who never exercise. If you wanted to compare, for example, the increase in heart rate
in males as compared to females, you would need to know beforehand how often each person exercised. Imagine
that you hypothesized that the increase in heart rate would be less for males than for females and, just by chance,
the females you chose were all athletes and the males were all couch-potatoes (see also Table 1, page 2). Your
results would probably suggest that the heart rate is less affected by exercise in females than in males.

It is very important to try to identify as many control variables as you can when designing your experiments.
Note that the control variable is not the same thing as a control group. For example, if you wanted to know the
effect of exercise on weight loss, you would have a control group that didn’t exercise, and an experimental group
that did. An example of a control variable would be diet (you wouldn’t want subjects in one group eating 4,000
Calories/day while those in the other group eating only 1,000 Calories/day).
3.1.1 Measures of Central Tendency
The most likely “pattern” revealed by examining a data set is the central tendency of the values. The most common
measures of central tendency are the mean (average), median, and mode. The mean (x̄) is the arithmetic
average of a group of measurements. It is the sum of all the values divided by the number of values (n). In the
case of exercise above, the mean beats per minute (bpm) at rest was 1290/18 = 71.7 ≈ 72 bpm and was
1491/18 = 82.8 ≈ 83 bpm after exercise.
The mean is the most common measure of central tendency, but the median and mode are sometimes useful
because they are less sensitive to extreme values. [FYI: The median is the middle value of a group of
measurements that have been ranked from highest to lowest or lowest to highest. The mode is the value that
appears most often in the data set.] Note that for this course, we will not be using either the median or mode.
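
FYI, a short Python sketch shows why the median and mode are less sensitive to extreme values than the mean. The six resting heart rates below are hypothetical numbers I made up for illustration (one subject has an unusually high value); they are not data from this lab.

```python
import statistics

# Hypothetical resting heart rates (bpm); 120 is a deliberately extreme value.
heart_rates = [68, 70, 70, 72, 74, 120]

mean_hr = statistics.mean(heart_rates)      # sum of values / number of values
median_hr = statistics.median(heart_rates)  # middle of the ranked values
mode_hr = statistics.mode(heart_rates)      # most frequent value

print("mean:", mean_hr, "median:", median_hr, "mode:", mode_hr)
```

The single extreme value pulls the mean up to 79 bpm, while the median (71) and mode (70) stay close to what most subjects measured.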

3.1.2 Variation within a Data Set


Below we have two samples of fish taken from 2 different lakes. Table 1 shows us that while calculating the mean
length of the fish gives us a general idea about the size of fish in the lake, it doesn't describe the variation in the
length of the fish. Even though the mean is the same for both lakes, the fish lengths from lake 2 have considerably
more variation. Variation is best quantified by range, variance, and standard deviation.
This example shows you the importance of determining not only the mean, but the amount of variation too.
Table 1. Raw data for the length of fish (in mm) collected from 2 different lakes. Sample size n = 5 for each lake.
                  Lake 1           Lake 2
                  25               10
                  28               18
                  30               20
                  32               35
                  35               67
Mean              30               30
Range             10 (25 to 35)    57 (10 to 67)
Sample size, n    5                5

The range is the difference between the smallest and the largest values of the data set—the wider the range the
greater the variation. The range of the fish length from lake 1 is 35-25 = 10 mm; the range of fish length from lake 2
is 67-10= 57mm. The mean length of the fish is the same for both sites, but the ranges indicate much more
variation in the fish from lake 2.
Variance measures how data values vary about the mean. Variance is much more informative than the range.
In order to determine the variance (see Table 2), you first calculate the mean. Second, calculate the deviation of
each value from the mean (see column 3 in Table 2). Third, square each deviation (column 4). Fourth, sum the
squared deviations. Lastly, divide the sum of squared deviations by the number of data points minus one to
calculate the variance (s²): s² = Σ(xi - x̄)² / (n - 1) (see step 5 below).
Notice in the equation that the sum of squared deviations is divided by n - 1 rather than n; this small correction
matters most when the sample is small. A larger sample does not by itself make the variance smaller (the sum of
squared deviations grows along with n), but it is likely to be more representative of the population, so the variance
you calculate is a more reliable estimate (see Figure 2, page 1).
While variance is a good measure of the dispersion of values about the mean, a second and more commonly cited
measure of variation is the standard deviation. The standard deviation (s) equals the square root of the variance:
s = √(s²)
Table 2. Calculation of the variance of the fish length (in mm) data from Lake 1:
Length of fish from Lake 1 (xi)    Mean (x̄)    Deviation from the mean (xi - x̄)    Deviation² (xi - x̄)²
25                                 30          -5                                  25
28                                 30          -2                                  4
30                                 30          0                                   0
32                                 30          2                                   4
35                                 30          5                                   25
Step 4: Sum of the squared deviations = 58
Step 5: Variance = 58/(5-1) = 14.5 mm²
Step 6: Standard deviation = √14.5 = 3.81 ≈ 4 mm
Table 3. Calculation of the variance of the fish length (in mm) data from Lake 2:
Length of fish from Lake 2 (xi)    Mean (x̄)    Deviation from the mean (xi - x̄)    Deviation² (xi - x̄)²
10                                 30          -20                                 400
18                                 30          -12                                 144
20                                 30          -10                                 100
35                                 30          5                                   25
67                                 30          37                                  1369
Sum of the squared deviations = 2038
Variance = 2038/(5-1) = 509.5 mm²; Standard deviation = √509.5 = 22.57 ≈ 23 mm

The length of the fish can be reported as the mean ± standard deviation; 30 ± 4 mm for lake 1 and 30 ± 23 mm for
lake 2.
The mean and measures of variability (variance and standard deviation) are examples of descriptive statistics.
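
If you want to check your hand calculations, the steps above can be sketched in Python as follows. The function name describe is my own; the data are the Lake 1 and Lake 2 lengths as listed in Tables 2 and 3. The last lines cross-check the result against Python's built-in statistics module, which also divides by n - 1.

```python
import math
import statistics

def describe(lengths):
    """Follow the handout's steps and return (mean, sum of squares, variance, SD)."""
    n = len(lengths)
    mean = sum(lengths) / n                       # Step 1: mean
    deviations = [x - mean for x in lengths]      # Step 2: deviation of each value
    squared = [d ** 2 for d in deviations]        # Step 3: square each deviation
    sum_sq = sum(squared)                         # Step 4: sum of squared deviations
    variance = sum_sq / (n - 1)                   # Step 5: divide by n - 1
    sd = math.sqrt(variance)                      # Step 6: square root of the variance
    return mean, sum_sq, variance, sd

lake1 = [25, 28, 30, 32, 35]   # lengths in mm, from Table 2
lake2 = [10, 18, 20, 35, 67]   # lengths in mm, from Table 3

for name, data in (("Lake 1", lake1), ("Lake 2", lake2)):
    mean, sum_sq, variance, sd = describe(data)
    print(f"{name}: mean = {mean:.0f} mm, variance = {variance:.1f} mm^2, SD = {sd:.2f} mm")
    # Cross-check against the standard library's sample variance.
    assert math.isclose(variance, statistics.variance(data))
```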

If we think of a graphical example, the variability in a sample might look something like this:

Figure 5. The average grade of General Bio I students. The grades of three classes of NYA students were recorded.
While the average of all 3 classes was 74%, the standard deviations differed: class 1 = 15, class 2 = 35 and
class 3 = 5. The grades are on the x-axis, the number (frequency) of students with each grade is on the y-axis.
The more “spread out” the data are, the higher the variability.

Going back to our fish example, we would see that the standard deviation (represented by the error bar in the
graph) is much greater for lake 2.
Figure 6. Comparison of mean fish lengths (mm) in 2 different lakes. For both lakes, the sample size, n, was five.
The error bars represent standard deviation. [Bar chart: x-axis, Lakes (Lake 1, Lake 2); y-axis, Average fish
size (mm).]

QUESTION: Why is it necessary, but not sufficient, to present the average value obtained? Consider the fictional
person who drowned crossing a river that had a mean depth of 0.5 meters; what did this person fail to consider?

What is the importance of replication in an experiment? Discuss variability.

A) Biological replicates

B) Technical replicates



Answer A) Biological replicates help you to capture random biological variation; ex. humans don’t all respond identically to
exercise in terms of increase in heart rate (some increase a little, some a lot).
B) Technical replicates help identify variation in technique. They help you to evaluate the precision and reproducibility of an
assay, to determine if the observed effect can be reliably measured.

Calculate the variance and standard deviation of the following.

Heart rate before exercise (bpm) (xi)    Mean (x̄)    Deviation from the mean (xi - x̄)    Deviation² (xi - x̄)²
68
74
72
70
Sum of the squared deviations =
Variance =
Standard deviation =

The resting heart rate can be reported as the mean ± standard deviation: ______ ± ______

Use Excel to calculate the mean ± standard deviation.

%CV (SD/mean x 100)=

CALCULATE THE VARIANCE AND STANDARD DEVIATION OF YOUR DATA ON ALLELOPATHY

Number of germinated seeds in WATER (xi)    Mean (x̄)    Deviation from the mean (xi - x̄)    Deviation² (xi - x̄)²

Sum of the squared deviations = ______________

Variance =

Standard deviation =

%CV (SD/mean x 100)=

Number of germinated seeds in Onion extract (xi)    Mean (x̄)    Deviation from the mean (xi - x̄)    Deviation² (xi - x̄)²

Sum of the squared deviations = _______________


Variance =

Standard deviation =



%CV (SD/mean x 100)=
3.2 Inferential statistics
In the exercise example in Figure 4, it's obvious that the heart rate increased following exercise; but did it increase
significantly following exercise? How would you find out? You would use an inferential statistical test. The goal
of inferential statistics is to make inferences (conclusions) about a population on the basis of data collected
from samples. It's an example of inductive reasoning, where we observe a few individuals, our sample, and make
conclusions about all individuals, our population. By looking at a sample size of only 18 Dawson students, we may
want to draw a conclusion about the effect of exercise on all 11,000 Dawson students or maybe on all humans.

3.2.1 Hypothesis testing using inferential statistics


When comparing the means for two groups of data you need to decide if the differences are due to the
treatment/independent variable or are due to the randomness of the data collection process, in other words, to
chance. To test for this possibility, you would generally do the following:

• Write out your hypotheses (null and alternate), collect your data, analyze your data using the appropriate
statistical test, and draw your conclusion. For this course, we will only be looking at the t-test.

For example, you may be interested in determining whether the ability to recall words is reduced when studying
with music on, as compared to when studying in silence, or whether physical exercise is associated with reduced
amounts of stress hormones. In both cases you would probably find that the sample means that you are comparing
are different. The question is are they significantly different?

3.2.1.1 The hypothesis


A statistical test involves two hypotheses:
1. The null (no effect) hypothesis (H0) of the test. The H0 states that the difference in the dependent variable
between the two treatment groups is not statistically significant. In other words, your independent variable
has no effect on your dependent variable.
2. The alternative (effect) hypothesis (HA) of the test. The HA states that the difference in the dependent
variable between the two groups is statistically significant.

Influence of age on blood pressure (bp)


1) There will be no difference in the bp of young vs old people.
2) Young people will have a different blood pressure than older people.
3) Young people will have a lower blood pressure than older people.
4) Young people will have a higher blood pressure than older people.

The first hypothesis (1) is called the null hypothesis because it predicts that there will be no difference; it is
also the hypothesis that researchers want to nullify (reject). Researchers
themselves do not usually predict that there will be no difference. Rather, they make one of the alternative
hypotheses (see numbers 2,3,4 above) and collect data in an attempt to REJECT the null hypothesis. If you are
successful in rejecting the null hypothesis, you can conclude that the data supports the alternative hypothesis.

There are also two types of alternative hypotheses in the above examples. Hypotheses (3) and (4) are
directional hypotheses. In both cases, you are specifying which group (young or old) will have the larger sample
mean. Hypothesis (2) is a non-directional hypothesis. You are only saying that there will be a difference; you are
not specifying whether the difference is positive or negative. For example, you are not specifying whether the
blood pressure of young people will be higher or lower. A non-directional hypothesis is said to be two-tailed; a
directional hypothesis is said to be one-tailed.

You can only test one alternative hypothesis with a data set. Therefore, you must state only one alternative
hypothesis.

State whether the following hypotheses are 1-tailed (directional) or 2-tailed (non-directional):

Caffeine increases heart rate. ___________________________



[Write a null hypothesis regarding caffeine: _____________________________________________________]
Fertilizer affects the growth rate of corn. _____________________

Smoking increases the risk of lung cancer. _________________

Gravitation will greatly influence the growth of roots. ________

Mice fed vitamin C will grow larger than mice fed a placebo. __________

Make up an example of a 1-tailed hypothesis: ____________________________________________

Make up an example of a 2-tailed hypothesis: ____________________________________________

Is your alternative hypothesis regarding the onion extract one or two-tailed? _____________________

3.2.2 The significance level (aka the "oops factor")


Statistical tests examine the probability of obtaining a difference at least as large as the one you observed
between groups IF the H0 WERE TRUE. This probability is calculated as a p-value that ranges from 0 (0%) to
1 (100%). In order to carry out a statistical test, you must decide on a significance level to define what you mean
by “significant”. Your significance level is essentially your threshold for what you consider not very likely.

Traditionally, scientists have chosen a significance level of 0.05 (p<0.05), which means that, if the independent
variable truly had no effect, there would be less than a 5% probability (p<0.05) of seeing a difference this large
between the experimental and control groups simply by chance (sample not representative of the population,
unidentified confounding variables, experimental errors…). When the consequence of a mistaken conclusion is
great, scientists may choose a lower significance level, for example p<0.01 or even p<0.001. On the other hand, if
a drug company is testing a drug that may be used for terminally ill people, for whom there is no other drug option,
a significance level of 0.1 or higher may be considered acceptable. The smaller the calculated p-value, the higher
the significance.

The lower the p-value, the less likely it is that you would obtain this difference by chance, and so you don’t
believe the H0 (you reject it) and you accept the HA.
For this course, you will use a significance level of 0.05 (p<0.05), i.e., there is less than a 5% chance (1 out of 20
times) of getting these results when the H0 is true, so you judge the H0 too unlikely to keep.
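
To make "the probability of getting these results when the H0 is true" more concrete, here is a small simulation sketch in Python. Everything in it is hypothetical and chosen only for illustration: a population with a mean resting heart rate of 72 bpm and a standard deviation of 8 bpm, two groups of 9 subjects, and an observed difference of 11 bpm. It repeatedly draws both groups from the SAME population (so the H0 is true by construction) and counts how often chance alone produces a difference at least that large. This is not how a t-test calculates its p-value; it only illustrates what the p-value means.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 72, 8   # hypothetical population of resting heart rates (bpm)
N = 9                      # hypothetical number of subjects per group
OBSERVED_DIFF = 11         # hypothetical observed difference between group means (bpm)
TRIALS = 10_000

def chance_difference():
    """Draw two groups from the same population (H0 true); return the difference in means."""
    group_a = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    group_b = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    return statistics.mean(group_a) - statistics.mean(group_b)

# Count the trials where chance alone gives a difference at least as large as observed.
hits = sum(abs(chance_difference()) >= OBSERVED_DIFF for _ in range(TRIALS))
p_estimate = hits / TRIALS
print(f"Estimated probability under H0: {p_estimate:.4f}")
```

If the estimated probability comes out below 0.05, a difference of that size would be considered too unlikely under the H0, and you would reject the H0.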

It's possible that you perform a study (ex. test whether 50 people given drug X lose more weight than 50 people
given a placebo) where you find a significant difference, but then you see that several other researchers repeated
the study and failed to find a significant difference. There are a number of possible explanations for such results.

One possible explanation is that your sample size was too small and so was not representative of the
population. In the case of the weight loss study, it’s possible that the test group (got the drug) of patients you
randomly selected just happened to respond to the drug better than, say, 95% of people (again, the smaller
the sample size, the more likely this can occur). It’s entirely possible that the subjects (people) in the other
researchers' studies were simply less sensitive to the drug than were your subjects. Or maybe the subjects in your
test group were eating better or doing exercise; these are variables that you didn’t control for in your experimental
design. While you do try to control for all the variables that you can think of, it can happen that there are
variables that you were unaware of, and so you didn’t control for them.
Moral of the story: no matter how well an experiment is designed, it should never be considered as definitive: you
can draw conclusions based on your data, but you can’t say you have proven something. When you’re writing a
lab report, please make sure you never write “we proved…”; it’s very bad for my blood pressure, and for your
grade.

QUESTION: What does a p=0.1 mean? _________________________________________________



______________________________________________________________________________________

Based on the above, for your experiment on allelopathy (Lab 2A), you will want a p value that is _____________
in order to conclude that the onion extract significantly inhibited the germination of the radish seeds.

3.3 The t-test


The t-test is used to determine whether the difference between the means of two groups is statistically
significant. Figure 7 shows the distributions for the experimental (treatment group on the right) and control
(left) groups in 3 different studies. The dotted lines show where the mean is for each group. While the difference
between the means is the same in all three studies (ie the drug seems to have improved memory in each of the 3
patient groups), the three sets of graphs don't look the same -- they represent very different situations. The top
example shows a case with medium variability in the memory scores within each group. The second situation
shows a high variability in the memory scores within each group, and the third shows a case with low variability
in the memory scores within each group. Clearly, we would conclude that the experimental group and control group
appear most different in the low-variability situation. Why? Because there is relatively little overlap between the two
bell-shaped curves. In the high variability case, the group difference appears least striking because the two bell-
shaped distributions overlap so much. In fact, many of the control subjects had results that were better than the test
subjects.

Figure 7. The effect of drug D on memory scores in 3 different studies. The test subjects (received drug D) are
shown on the right, the control group (received a placebo) is on the left. Values on the x-axis go from low score
on the left to high score on the right. Values on the y-axis are the number of subjects with that score.

QUESTION: If you were the president of a drug company, which of the 3 results above would you prefer? Why?

This leads us to a very important conclusion: when we are looking at the differences between scores for two
groups, we have to look not only at the difference in the means, but also in the variability within each group.
A t-test considers 2 factors to judge whether a difference between 2 groups is significant or not:
1. The size of the difference in the measured variable between treatment groups (difference in their means).
2. The amount of variation in the measured variable WITHIN each treatment group.
The LARGER the difference and the SMALLER the variability of the data within each group, the more likely that
the difference is statistically significant.

To the right is one of the equations used in t-tests. Don’t memorize this. I include it so
you can see that t-tests evaluate not only the difference in the means (X1 vs X2), but
also the variability (S² is variance) in your control and test groups.
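
To see both factors at work, here is a Python sketch using one common unpaired form of the t statistic, t = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2). This form is an assumption on my part and may not be identical to the equation pictured. The three sets of memory scores are hypothetical numbers made up to mimic Figure 7: in every study the treated group scores exactly 10 points higher than the control group, and only the variability within the groups changes.

```python
import math
import statistics

def t_statistic(group1, group2):
    """Unpaired t statistic: (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)  # n - 1 form
    return (mean1 - mean2) / math.sqrt(var1 / len(group1) + var2 / len(group2))

# Hypothetical control-group memory scores with increasing spread.
controls = {
    "low variability": [48, 49, 50, 51, 52],
    "medium variability": [40, 45, 50, 55, 60],
    "high variability": [10, 30, 50, 70, 90],
}

for label, control in controls.items():
    treated = [score + 10 for score in control]  # same 10-point difference every time
    print(f"{label}: t = {t_statistic(treated, control):.1f}")
```

The difference in means never changes, yet t drops from 10.0 to 2.0 to 0.5 as the variability grows. The larger the t value, the smaller the p-value, so only the low-variability studies are likely to give a significant result.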

3.3.1 Unpaired (independent) vs paired (dependent) t-tests


The unpaired t-test compares two independent groups, e.g., diabetic patients versus non-diabetics.
The terms independent (unpaired) t-test and dependent (paired) t-test are not to be confused with independent &
dependent variables.

The paired t-test compares 2 sets of data that are related to each other, because each value in one set is matched
with a value in the other. Usually this happens when measurements are taken “Before and After” a certain treatment
or event; essentially, you test the same samples under two different conditions. So, the samples are not
‘independent’ samples.
The paired t-test reduces inter-subject variability (because it makes comparisons within the same subject), and
thus is theoretically more powerful than the unpaired t-test.
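
Here is a Python sketch of why pairing can be more powerful. The before and after heart rates are hypothetical values for 5 subjects, made up for illustration, and the two formulas are the usual textbook forms (an assumption, since this handout does not give them): the paired t statistic is the mean of the differences divided by (SD of the differences / √n), while the unpaired one compares the two group means using each group's variance.

```python
import math
import statistics

# Hypothetical heart rates (bpm) for the same 5 subjects, before and after exercise.
before = [60, 70, 80, 90, 100]
after = [66, 75, 87, 95, 107]

def paired_t(first, second):
    """Paired t statistic: mean of the within-subject differences / (their SD / sqrt(n))."""
    diffs = [b - a for a, b in zip(first, second)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

def unpaired_t(first, second):
    """Unpaired t statistic: difference in means / sqrt(s1^2/n1 + s2^2/n2)."""
    diff = statistics.mean(second) - statistics.mean(first)
    spread = statistics.variance(first) / len(first) + statistics.variance(second) / len(second)
    return diff / math.sqrt(spread)

print(f"paired t   = {paired_t(before, after):.2f}")
print(f"unpaired t = {unpaired_t(before, after):.2f}")
```

Every subject's heart rate rose by 5 to 7 bpm, so the paired t is large (about 13.4). Treated as two unrelated groups, the same numbers give a t of only about 0.6, because the big differences between subjects (60 to 100 bpm at rest) swamp the 6 bpm average increase.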



QUESTION 1. Which t-test (unpaired or paired) would you use to compare:

A. the heart rate of Dawson students (Figure 4, p4): _______________________________

B. the fish from the 2 lakes (Table 1, p5): ____________________________________

C. the effect of onion extract on radish seed germination: _____________________________

QUESTION 2. Describe a protocol to test the following hypothesis: Hypothesis: the ability to recall words is
reduced when studying with music on, as compared to when studying in silence. A) First describe a protocol that
would be unpaired and then B) re-write the protocol for a paired t-test.
A) Unpaired t-test:

B) Paired t-test:

3.3.2 The t-test Procedure


1. State the null and alternative hypotheses
a. Indicate if your alternative hypothesis is 1-tailed or 2-tailed.
2. Decide if your test is unpaired or paired.
3. List the data of the control and experimental groups.
4. Calculate the mean of each group.
5. Calculate the standard deviation of each group.
6. Determine your p value using Excel.
7. State your conclusion.

INTERPRETING THE RESULTS OF A STATISTICAL TEST

• If the p-value is ≥ 0.05, then you fail to reject your null hypothesis (the H0): the difference IS NOT
statistically significant.

• If the p-value is < 0.05, then you reject the H0 in favour of your alternative hypothesis (the HA) and
conclude that the difference IS statistically significant.
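
The decision rule above can be written as a few lines of Python. This is only a sketch: the function name
interpret is an assumption, and the 0.05 threshold is the one used in this lab.

```python
ALPHA = 0.05  # significance threshold used in this lab


def interpret(p_value, alpha=ALPHA):
    """Apply the lab's decision rule to a p value."""
    if p_value < alpha:
        return "reject H0: the difference IS statistically significant"
    return "fail to reject H0: the difference IS NOT statistically significant"


print(interpret(0.0216))
print(interpret(0.30))
```

Note that a p value of exactly 0.05 falls on the "fail to reject" side of this rule.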

Some Important Notes on Statistical Tests


Statistical tests are useful tools for analyzing data and drawing conclusions, but they are not themselves the
result of an experiment. Many students mistakenly think that finding a statistically significant difference
between two treatment groups means that the experiment was a success and that their experimental hypothesis is
100% supported.

For instance:
1. A difference between treatment groups might be significant but small and scientifically boring.
2. A statistical test does not look at how you carried out the experiment and cannot identify a false positive result
caused by an unwanted variable. You may get a p value of 0.01 with a sample size of 5, which would tell you
very little.
3. A statistical test can’t look at other limitations of the experiment in testing a hypothesis.

It is still the responsibility of the researcher to give meaning to a statistical difference and to draw
appropriate conclusions from an experiment.

QUESTION. For each of the 2 experiments, write out the hypotheses (null & alternative), indicate whether the
(alternative) hypothesis is 1-tailed or 2-tailed, and whether the experiment is unpaired or paired. Given the p value,
draw a conclusion.

A) Samples of male Leopard frogs (Rana pipiens) were selected from 2 different ponds, and their body lengths
were measured to the nearest mm. The data were used to determine whether there is a statistically significant
difference in body length in the male frogs from the 2 populations. The p value was found to be 0.04.

Null hypothesis:

Alternative hypothesis:

1-tailed or 2-tailed? Unpaired or paired?

Conclusion:

B) There is some support for the idea that eating anchovies before a race will make a person run faster. To test this
hypothesis, Dr. B. Joggerson gathered 3 professional sprinters. He had each sprinter run the 100 m dash and
recorded their times. After an hour break, Dr. Joggerson then had each sprinter eat 5 extra-salty anchovies and
run the dash again, and he recorded their times. The following data were collected, and the p value was found
to be 0.05.

Sprinter   Sprint time without anchovies (sec)   Sprint time with anchovies (sec)
1          13.4                                  12.2
2          12.2                                  12.0
3          12.5                                  12.1

Null hypothesis:

Alternative hypothesis:

1-tailed or 2-tailed? Unpaired or paired?

Conclusion:

3.3.2.1 Using Excel to calculate your p value


In the figure below, you see that you can use Excel to find the actual probability (the p value) for your data.
In the example below, a paired, 1-tailed t-test found that a decrease in ambient temperature resulted in a
significant decrease in heart rate; the p value was 0.0216. This means that if temperature really had no effect
(i.e. if the H0 were true), there would be only a 2.16% chance (p of 0.0216 x 100) of observing a decrease in
heart rate at least as large as this one by chance alone. The p value therefore gives you a precise measure of
how surprising your results would be if the independent variable had no effect.

You can also use Excel to make sure that you calculated
your mean and standard deviation correctly.

Step 1. Input your data in 2 columns.


Step 2. Go to the formulas tab: notice the Insert function option on the top left

Step 3. A) Choose the cell where you want your p value to appear (in the example above, cell C10 was chosen).
B) Click on Insert Function and choose the following function: T.TEST
C) Click OK

Step 4. The following window will appear:

Array 1: is your first data set. In this example, beats/minute at 25°C
Array 2: is your second data set. In this example, beats/minute at 10°C

Tails: Choose 1 if the alternative hypothesis is 1-tailed;
Choose 2 if the alternative hypothesis is 2-tailed.
In this example, the hypothesis was that a decrease in temperature would cause a decrease in heart rate, so
1-tailed.

Type: Choose 1 if the experiment was paired;
Choose 2 if the experiment was unpaired. In this experiment, the same Daphnia were tested at room and
ice temperatures, so it's a paired t-test.

When you click on OK, you see your p value (0.021594).


Use these data to practice on Excel; you should get the same p value.
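
If you want to cross-check Excel, the four entries of the T.TEST window can be mimicked in Python. The sketch
below is an illustration under stated assumptions, not Excel's actual implementation: the names t_test, t_pdf
and upper_tail are made up, the heart rates are hypothetical (they are not the data in the figure), the tail
area is found by simple numerical integration, and only Type 1 (paired) and Type 2 (unpaired, equal variances)
are handled.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # math.gamma overflows for very large df; fine for small lab samples.
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > |t|), using Simpson's rule on the density from 0 to |t| (steps must be even)."""
    t = abs(t)
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_test(array1, array2, tails, type_):
    """p value for a t-test; arguments follow the order of the T.TEST window."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    if type_ == 1:  # paired: work on the within-subject differences
        if len(array1) != len(array2):
            raise ValueError("paired data need equal-length arrays")
        diffs = [a - b for a, b in zip(array1, array2)]
        df = len(diffs) - 1
        t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
    elif type_ == 2:  # unpaired, pooled (equal) variance
        n1, n2 = len(array1), len(array2)
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * statistics.variance(array1)
                  + (n2 - 1) * statistics.variance(array2)) / df
        t = (statistics.mean(array1) - statistics.mean(array2)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        raise ValueError("this sketch handles only type 1 (paired) and type 2 (unpaired)")
    # The 1-tailed value ignores direction, so check separately that the
    # difference goes the way your alternative hypothesis predicted.
    return tails * upper_tail(t, df)


# Hypothetical heart rates (beats/minute) for the same 5 Daphnia.
warm = [190, 204, 182, 210, 196]
cold = [176, 195, 180, 188, 184]
print("paired, 1-tailed p =", round(t_test(warm, cold, 1, 1), 4))
```

For the same data, the 2-tailed p value is simply twice the 1-tailed value.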

You are now ready to analyze your Allelopathy results (Lab 2A, which you haven’t done yet😊).
