Biostatistics Lecture - Basic Concepts & ANOVA (2020)


Basic Concepts

Experimental Unit: This is the unit of experimental material to which a treatment is applied, e.g. a plant, an area of ground containing many plants, a pot in a greenhouse, a single animal, several animals, or an entire herd.
Plot: This is used synonymously with experimental unit, usually in
experiments involving plants.
Variable: A measurable characteristic of an experimental unit. e.g.
weight, height, yield etc. This can be discrete (discontinuous) i.e.
assuming only specific values, or it can be continuous and assume any
value between certain limits.
Variates: These are individual measurements of a variable
Basic Concepts Cont’d.
• Treatment: This is the material that is to be tested in the experiment
e.g. crop varieties, animal diet(s) etc.
• Replication: When a treatment is applied to more than one
experimental unit we have replication. Two experimental units
treated alike constitute 2 replications or replicates
• Sample: This is a set of measurements that constitutes part of a population. Information is obtained from a sample and inferences are made about the population; it is therefore important that the sample be representative of the population. To obtain a representative sample, the principle of randomness is used.
Basic Concepts Cont’d.
• Random Sample: This is one in which any individual is as likely to be
included as any other
• Population: A population is a set of measurements or counts of a
single variable taken on all the units specified to be in the
population.
• Usually a population is very large such that it is not possible to know
all the measurements or counts in it. Therefore, it is not possible to
know the actual arithmetic mean of the population, its standard
deviation or its variance.
• These Numerical Characteristics (mean, variance, std. deviation)
describing the population are referred to as Parameters.
Basic Concepts Cont’d
• Since we do not know all the observations in a population, we DO NOT KNOW the actual value of, for instance, the population mean (a parameter, which is a fixed value).
• This parameter, the arithmetic Mean, can therefore only be
estimated.
• To do this, representative samples are drawn from the population.
• Since these samples are small in size, all the measurements or counts
in each of them can be made and known, thus their means, variances,
std. deviations etc. can be determined or calculated.
Important Symbols used in Biostatistics
• These numerical values (mean, variance, std. deviation) of the
samples are called STATISTICS and they estimate the respective
population parameters, since they came from that population,
hence they are also called ESTIMATES.
Symbols commonly used in Biostatistics
Quantity              Population    Sample
Mean                  µ             ȳ
Variance              σ²            s²
Standard deviation    σ             s
Summation             ∑             ∑
Statistical Notation, Means & Std. Deviations

• Population mean: $\mu = \frac{Y_1 + Y_2 + Y_3 + \dots + Y_N}{N} = \frac{\sum Y}{N}$, where N = number of observations or counts in the population
• Sample mean: $\bar{y} = \frac{\sum Y}{r}$, where r = number of variates in the sample
Deviations: Variances & Std. Deviations
• Often we wish to denote the difference between a variate (Y) and a mean (ȳ). Such deviates are often represented by an italicised lowercase y or x. Thus $y = Y - \bar{y}$.
• The most common measure of dispersion, and the best for most
purposes, is the standard deviation and its square, the variance.
• Population variance: $\sigma^2 = \frac{\sum (Y - \mu)^2}{N}$

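• The best estimate of σ² from a sample of r variates, particularly a small sample (where r < 60), is defined as

$s^2 = \frac{\sum (Y - \bar{y})^2}{r - 1}$

• The standard deviation is the square root of the variance: $\sigma = \sqrt{\sigma^2}$ for the population and $s = \sqrt{s^2}$ for a sample.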
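The definitions above can be checked with a short Python sketch. The helper names and the four data values are hypothetical, chosen only for illustration; the code simply applies $\bar{y} = \sum Y / r$, the deviates $y = Y - \bar{y}$, the N divisor for σ² and the r − 1 divisor for s².

```python
import math

# Illustrative sketch: helper names below are hypothetical, not from the lecture.

def mean(values):
    """Arithmetic mean: sum of the variates divided by their number."""
    return sum(values) / len(values)

def deviates(values):
    """Deviates y = Y - ybar for each variate; they always sum to zero."""
    ybar = mean(values)
    return [y - ybar for y in values]

def population_variance(values):
    """sigma^2 = sum((Y - mu)^2) / N; valid only if `values` is the whole population."""
    return sum(d ** 2 for d in deviates(values)) / len(values)

def sample_variance(values):
    """s^2 = sum((Y - ybar)^2) / (r - 1), the estimate of sigma^2 from a sample."""
    return sum(d ** 2 for d in deviates(values)) / (len(values) - 1)

def sample_std_dev(values):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [4, 7, 5, 8]  # hypothetical sample of r = 4 variates
print(mean(data))                 # 6.0
print(deviates(data))             # [-2.0, 1.0, -1.0, 2.0]
print(population_variance(data))  # 2.5 (divides by 4)
print(sample_variance(data))      # about 3.333 (divides by r - 1 = 3)
print(round(sample_std_dev(data), 4))
```

Dividing by r − 1 instead of r makes s² slightly larger, which compensates for measuring deviations from the sample mean ȳ rather than from the unknown µ.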
ANALYSIS OF VARIANCE (ANOVA)
• Yields (tons/ha) of sorghum varieties A & B from plots to which the varieties were randomly assigned:
___________________________________________________________________
                        Replications
Variety      I     II    III    IV     V      Total    Mean
___________________________________________________________________
A           19     14     15    17    20        85      17
B           23     19     19    21    18       100      20
Grand total                                    185
Grand mean (mean of means)                             18.5
___________________________________________________________________
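The totals and means in the table can be reproduced with a few lines of Python. Holding the replicate yields in a dictionary keyed by variety is just one convenient (assumed) layout, not part of the original analysis.

```python
# Illustrative sketch: replicate yields (tons/ha), replications I-V, from the table.
yields = {
    "A": [19, 14, 15, 17, 20],
    "B": [23, 19, 19, 21, 18],
}

totals = {variety: sum(ys) for variety, ys in yields.items()}
means = {variety: totals[variety] / len(ys) for variety, ys in yields.items()}
grand_total = sum(totals.values())
# Grand mean taken as the mean of the variety means, as in the table.
# With equal replication this equals grand_total / (number of plots).
grand_mean = sum(means.values()) / len(means)

for variety in yields:
    print(f"{variety}: total = {totals[variety]}, mean = {means[variety]}")
print(f"Grand total = {grand_total}, grand mean = {grand_mean}")
```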
ANALYSIS OF VARIANCE (ANOVA)
• The two varieties A & B could be coming from the same population or
from two different populations.
• If they are coming from the same population, the two means (17 &
20) of these varieties estimate the same population mean which is
not known because we do not know all the variates (measurements
or counts) in that population for us to calculate that mean.
• To compute the variability called experimental error, we compute the variance of each sample ($s_A^2$ and $s_B^2$) and assume that both estimate a common variance (σ²) of the population they come from, which we do not know.
ANALYSIS OF VARIANCE (ANOVA)

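• $s_A^2 = \frac{\sum (Y - \bar{y})^2}{r - 1}$, where r = number of observations (variates) in that sample

$s_A^2 = \frac{(19-17)^2 + \dots + (20-17)^2}{5-1} = \frac{26}{4} = 6.5$

• $s_B^2 = \frac{\sum (Y - \bar{y})^2}{r - 1} = \frac{(23-20)^2 + \dots + (18-20)^2}{5-1} = \frac{16}{4} = 4.0$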
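Python's standard-library `statistics.variance` divides by r − 1, so it offers a quick check of the hand calculations for $s_A^2$ and $s_B^2$. The variable names are illustrative only.

```python
from statistics import variance

# Illustrative sketch: variable names are hypothetical.
variety_a = [19, 14, 15, 17, 20]
variety_b = [23, 19, 19, 21, 18]

# statistics.variance divides the sum of squared deviations by (r - 1),
# matching the sample-variance formula used above.
s2_a = variance(variety_a)
s2_b = variance(variety_b)

print(f"s_A^2 = {s2_a:.2f}")  # s_A^2 = 6.50
print(f"s_B^2 = {s2_b:.2f}")  # s_B^2 = 4.00
```

By contrast, `statistics.pvariance` divides by N and corresponds to the population-variance formula.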
ANOVA
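• If the sample variances $s_A^2$ and $s_B^2$ are pooled, we get a common variance, designated $S_w^2$, which is an estimate of σ² based on the variability within the samples.
• $S_w^2 = \frac{6.5 + 4.0}{2} = 5.25$. This is the pooled variance (a simple average is appropriate here because both samples have the same number of variates, r = 5).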
• Assuming the null hypothesis that these two samples are random samples drawn from the same population, so that $\bar{y}_A$ and $\bar{y}_B$ both estimate the same population mean (µ), we estimate the variance of means ($\sigma_{\bar{y}}^2$) from the means of samples A & B:
• $S_{\bar{y}}^2 = \frac{(17-18.5)^2 + (20-18.5)^2}{2-1} = 4.5$
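Both quantities on this slide can be reproduced in a short sketch: the pooled within-sample variance $S_w^2$ and the variance of the two sample means $S_{\bar{y}}^2$. The variable names are hypothetical, and the simple average for pooling assumes equal sample sizes, as in this example.

```python
from statistics import variance

# Illustrative sketch: variable names are hypothetical.
variety_a = [19, 14, 15, 17, 20]
variety_b = [23, 19, 19, 21, 18]
r = len(variety_a)  # both samples have r = 5 variates

# Within-sample estimate of sigma^2: simple average of the two sample
# variances (valid here because the samples are the same size).
sw2 = (variance(variety_a) + variance(variety_b)) / 2

# Variance of the sample means about their grand mean, with
# (number of means - 1) = 1 in the denominator.
means = [sum(variety_a) / r, sum(variety_b) / r]
grand_mean = sum(means) / len(means)
sybar2 = sum((m - grand_mean) ** 2 for m in means) / (len(means) - 1)

print(f"S_w^2 = {sw2:.2f}")        # S_w^2 = 5.25
print(f"S_ybar^2 = {sybar2:.2f}")  # S_ybar^2 = 4.50
```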
ANOVA
• We again estimate σ² using the relationship $S_{\bar{y}}^2 = s^2/r$ and solving for $s^2$.
• Remember, r is the number of variates on which each sample mean is based. This estimate of σ² is designated $S_b^2$.
• $S_b^2 = r S_{\bar{y}}^2 = 5(4.5) = 22.5$. This is the variance estimate based on the variability between means.

• We now have two estimates of σ²: $S_w^2$, based on the variability within each sample, and $S_b^2$, based on the variability between the samples.
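The step from $S_{\bar{y}}^2$ to $S_b^2$ relies on the variance of a mean of r variates being σ²/r. A small simulation makes this concrete: draw many samples of r = 5 from a normal population with a known σ², then compare r times the variance of their means with σ². The population values (µ = 18.5, σ = 2.3), the seed and the helper name are hypothetical choices for the sketch, not values from the experiment.

```python
import random
import statistics

# Illustrative sketch: helper name and population values are hypothetical.

def r_times_variance_of_means(mu, sigma, r, n_samples, seed=0):
    """Draw n_samples samples of size r from Normal(mu, sigma) and return
    r * (variance of the sample means), which should be close to sigma**2."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(r))
        for _ in range(n_samples)
    ]
    return r * statistics.variance(means)

sigma = 2.3  # hypothetical population standard deviation (sigma^2 = 5.29)
estimate = r_times_variance_of_means(mu=18.5, sigma=sigma, r=5, n_samples=20000)
print(f"r * var(means) = {estimate:.2f}, sigma^2 = {sigma ** 2:.2f}")
```

The two printed values should agree closely, which is why multiplying the variance of means by r gives a second estimate of σ².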
Statistical Hypothesis & Tests of
Significance
• The statistical procedure for comparing 2 or more treatment means uses an assumption called the null hypothesis (Ho), which states that there are no real differences between the treatments being compared (i.e. their population means are equal).
• The alternative hypothesis (Ha) states that there are real differences between the treatments.
• An F test is a ratio between two variances and is used to determine
whether two independent estimates of variance can be assumed to be
estimates of the same variance.
• $F = \frac{s^2 \text{ calculated from sample means}}{s^2 \text{ calculated by pooling sample variances}} = \frac{S_b^2}{S_w^2}$
F test
• From our data, the Calculated F value = 22.5/5.25 = 4.29. This
value is compared with the value at 5% & 1% level of
significance from the F-tables with the degrees of freedom (d.f)
for the numerator, n-1, (i.e. 2-1 = 1) (where n is the number of
samples) and for the denominator n(r-1) = 2(5-1)= 2(4) = 8,
(where r is the number of variates in each sample)

• Thus the calculated F value, 4.29, is compared with the tabular value with 1 d.f. for the numerator and 8 d.f. for the denominator, which is 5.32 at the 5% level of significance and 11.26 at the 1% level of significance.
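The F test on this slide can be written as a small function. The helper name and dictionary layout are hypothetical; the critical values are the tabular ones quoted above (a library routine such as SciPy's `scipy.stats.f.ppf` could compute them, but the sketch avoids that dependency).

```python
# Illustrative sketch: helper name is hypothetical.

def f_test(between_var, within_var, n_samples, r, f_table):
    """Compute F = S_b^2 / S_w^2 and compare it with tabular critical values.

    f_table maps a significance level to the tabular F value for
    (n_samples - 1, n_samples * (r - 1)) degrees of freedom.
    """
    f_calc = between_var / within_var
    df_num = n_samples - 1
    df_den = n_samples * (r - 1)
    # Ho is rejected at a given level only if F calculated exceeds the tabular F.
    decisions = {level: f_calc > f_crit for level, f_crit in f_table.items()}
    return f_calc, df_num, df_den, decisions

# Tabular F values for 1 and 8 d.f., as quoted in the text.
F_TABLE_1_8 = {"5%": 5.32, "1%": 11.26}

f_calc, df_num, df_den, reject = f_test(22.5, 5.25, n_samples=2, r=5, f_table=F_TABLE_1_8)
print(f"F = {f_calc:.2f} with ({df_num}, {df_den}) d.f.")  # F = 4.29 with (1, 8) d.f.
print(reject)  # {'5%': False, '1%': False}
```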
CONCLUSIONS
• Since 4.29 is less than both 5.32 and 11.26, we conclude that the two varieties (samples) are not significantly different from each other in yield. Because their means do not differ significantly, the data are consistent with both samples having come from the same population. We therefore accept the null hypothesis (Ho) and reject the alternative hypothesis (Ha).
ANOVA Cont’d.
• The data in our example consists of only 2 treatments (variety A & B). If
there were more treatments to be compared, and if each treatment
had a high number of observations (variates), calculation of the
individual variances for use in calculating F-value would be tedious and
more time consuming.
• An easier way of handling the data is by use of an ANOVA table which
will be illustrated under each Experimental Design.
• The principal difference among experimental designs is the way in
which experimental units are grouped or classified.
• In all designs, experimental units are classified by treatments, but in
some they are further classified into blocks, rows, columns, main plots
etc.
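The ANOVA-table approach replaces individual sample variances with sums of squares (SS), degrees of freedom (df) and mean squares (MS = SS/df). Below is a minimal sketch for the simplest case, where experimental units are classified by treatments only; the function name and output layout are hypothetical, and designs with blocks, rows or main plots need additional sources of variation. Applied to the sorghum data, the treatment and error mean squares equal $S_b^2 = 22.5$ and $S_w^2 = 5.25$.

```python
# Illustrative sketch: one-way classification only; function name is hypothetical.

def one_way_anova(groups):
    """ANOVA table for treatments classified one way.
    `groups` maps treatment name -> list of variates."""
    all_obs = [y for ys in groups.values() for y in ys]
    n_obs = len(all_obs)
    k = len(groups)
    grand_mean = sum(all_obs) / n_obs

    # Treatment SS: r_i * (treatment mean - grand mean)^2, summed over treatments
    ss_trt = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values())
    # Error SS: squared deviations of each variate from its own treatment mean
    ss_err = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)
    ss_tot = sum((y - grand_mean) ** 2 for y in all_obs)

    df_trt, df_err = k - 1, n_obs - k
    ms_trt, ms_err = ss_trt / df_trt, ss_err / df_err
    return {
        "Treatments": (df_trt, ss_trt, ms_trt),
        "Error": (df_err, ss_err, ms_err),
        "Total": (n_obs - 1, ss_tot, None),
        "F": ms_trt / ms_err,
    }

table = one_way_anova({
    "A": [19, 14, 15, 17, 20],
    "B": [23, 19, 19, 21, 18],
})
for source in ("Treatments", "Error", "Total"):
    df, ss, ms = table[source]
    ms_text = f"{ms:8.2f}" if ms is not None else ""
    print(f"{source:<12}{df:>4}{ss:>9.2f}{ms_text}")
print(f"F = {table['F']:.2f}")
```

Note that the treatment and error SS add up to the total SS, and the same code handles any number of treatments.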
ANOVA Cont’d.
• The analysis of variance uses the means of these groupings, called sources of variation, to estimate mean squares.
• A mean square estimating the dispersion among plot measurements
resulting from random causes is also calculated – it is called
experimental error.
• In the absence of real differences among the means of treatments, blocks, or other sources of variation, these mean squares will, on average, be equal.
• Only rarely will one mean square deviate greatly from another by
chance alone.
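The claim that, without real treatment differences, the mean squares agree on average can be illustrated by simulation. The sketch draws every treatment from the same normal population and averages the between-treatment and within-treatment (error) mean squares over many simulated experiments. The number of treatments (k = 3), µ, σ, the seed and the helper names are all hypothetical choices.

```python
import random

# Illustrative sketch: helper names and population values are hypothetical.

def mean_squares(groups):
    """Return (MS between treatments, MS within = experimental error)
    for a list of equally sized groups."""
    k, r = len(groups), len(groups[0])
    means = [sum(g) / r for g in groups]
    grand = sum(means) / k
    ms_between = r * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (r - 1))
    return ms_between, ms_within

def average_mean_squares(k, r, mu, sigma, n_trials, seed=1):
    """Simulate experiments in which every treatment is drawn from the SAME
    normal population (no real treatment differences) and average the MS."""
    rng = random.Random(seed)
    sum_between = sum_within = 0.0
    for _ in range(n_trials):
        groups = [[rng.gauss(mu, sigma) for _ in range(r)] for _ in range(k)]
        ms_b, ms_w = mean_squares(groups)
        sum_between += ms_b
        sum_within += ms_w
    return sum_between / n_trials, sum_within / n_trials

sigma = 2.3  # hypothetical population standard deviation (sigma^2 = 5.29)
avg_b, avg_w = average_mean_squares(k=3, r=5, mu=18.5, sigma=sigma, n_trials=20000)
print(f"average MS between = {avg_b:.2f}, average MS within = {avg_w:.2f}")
```

Both averages land near σ² = 5.29, even though individual experiments can give quite different pairs of mean squares.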
ANOVA Cont’d.
• When an F test indicates that the mean square from one of the
sources of variation is significantly greater than the mean square
resulting from random effects, we say that there are real differences
among the means of that particular source of variation.
• However, remember – there is always a definite chance that we will
be wrong in such a conclusion. It is up to the experimenter to select
the odds at which it is believed there are real effects.
• This brings us to the concept of TYPE I & TYPE II ERROR
TYPE I AND TYPE II ERROR
• If the null hypothesis (Ho) is true, i.e. $\mu_A = \mu_B$, then at the 5% level of significance approximately 5 samples out of 100 will, by chance alone, have means that are significantly different ($\bar{y}_A \neq \bar{y}_B$).
• If we actually obtain such a sample, we may make one of two decisions:
i) that Ho is true (i.e. $\mu_A = \mu_B$) and that the sample we obtained just happened to be one of those in the tail of the distribution; or
ii) that drawing such a sample is too improbable an event to justify acceptance of Ho, i.e. that the hypothesis $\mu_A = \mu_B$ is not true.
TYPE I AND TYPE II ERROR Cont’d.
• Either of these decisions may be correct, depending upon the truth of the matter. If the hypothesis is correct, the 1st decision (to accept Ho) is correct. If we decide to reject Ho under these circumstances, i.e. when Ho is actually true, we commit the error of rejecting a true Ho. This is called a Type I error.

• On the other hand, if actually $\mu_A \neq \mu_B$, the 1st decision (to accept Ho) is an error, a so-called Type II error, which is the acceptance of a false Ho.
• Finally, if Ho is not true and we decide to reject it, then again we make
the correct decision.
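The two kinds of error can be illustrated by simulation. The sketch repeats the two-variety experiment (r = 5 plots each) many times, applies the F test with the tabular 5% value of 5.32 for 1 and 8 d.f., and counts how often Ho is rejected. The common mean of 18.5, σ = 2.3, the "Ho false" means of 17 and 20, the seeds and the helper names are hypothetical, so the Type II rate printed depends entirely on those assumptions.

```python
import random

# Illustrative sketch: helper names and population values are hypothetical.
F_TABLE_5PCT_1_8 = 5.32  # tabular F at 5% for 1 and 8 d.f.

def f_ratio(a, b):
    """F = S_b^2 / S_w^2 for two equally sized samples."""
    r = len(a)
    mean_a, mean_b = sum(a) / r, sum(b) / r
    grand = (mean_a + mean_b) / 2
    sb2 = r * ((mean_a - grand) ** 2 + (mean_b - grand) ** 2) / (2 - 1)
    sw2 = (sum((y - mean_a) ** 2 for y in a) + sum((y - mean_b) ** 2 for y in b)) / (2 * (r - 1))
    return sb2 / sw2

def rejection_rate(mu_a, mu_b, sigma, r=5, n_trials=20000, seed=2):
    """Fraction of simulated experiments in which Ho (mu_a == mu_b) is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_trials):
        a = [rng.gauss(mu_a, sigma) for _ in range(r)]
        b = [rng.gauss(mu_b, sigma) for _ in range(r)]
        if f_ratio(a, b) > F_TABLE_5PCT_1_8:
            rejected += 1
    return rejected / n_trials

# Ho true: both varieties share one population mean, so every rejection is a Type I error.
type_i_rate = rejection_rate(mu_a=18.5, mu_b=18.5, sigma=2.3)
# Ho false (hypothetical true means 17 and 20): every acceptance is a Type II error.
type_ii_rate = 1 - rejection_rate(mu_a=17.0, mu_b=20.0, sigma=2.3, seed=3)

print(f"Type I error rate  ~ {type_i_rate:.3f}")  # close to 0.05
print(f"Type II error rate ~ {type_ii_rate:.3f}")
```

The Type I rate is fixed by the chosen significance level, whereas the Type II rate shrinks as the true difference or the number of replicates grows.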
TYPE I AND TYPE II ERROR Cont’d.
• Thus there are two kinds of correct decisions, accepting a true Ho and rejecting a false Ho, and two kinds of errors: Type I, rejecting a true Ho, and Type II, accepting a false Ho.

• The relationship between hypotheses and decisions can be summarized as follows:
TYPE I AND TYPE II ERROR Cont’d.

                          DECISION ON THE NULL HYPOTHESIS
NULL HYPOTHESIS           Accepted              Rejected
True                      Correct decision      Type I error
False                     Type II error         Correct decision
