biometry


Wollo University

Department of Statistics

Biometry Course Notes for Biotechnology Students
By: Tilaye M.
1. Introduction
• Researchers need to understand the basic principles of
experimental design and basic terminologies of Statistics
in order to perform quality research through the application
of Statistical principles and methods.

• The goal of every research activity is to make good decisions and
scientific generalizations using data.

• Data are the raw material, while information is what you get when
you organize and analyze your data.
Definition and Description of terminologies
• Statistics is the science that deals with data and the methods used to handle them.

• Statistics can be defined as the science of extracting information from data.

• It is the development and application of theory and methods for the
collection (design), analysis (applying statistical techniques and
methods) and interpretation of observed information from planned or
unplanned experiments under uncertainty.

• Technological developments have demanded methodology for the
efficient extraction of reliable statistics from complex databases.
• Sample - a subset of observations selected by a specific procedure,
i.e. a part (subset) of a population. E.g. a sample of diseased
patients.
• Population - the complete set of observations about
which information is desired. E.g. the population of all patients
afflicted with a disease.
• Parameter: a numerical measure or a value used to describe a
population.
• Statistic: a numerical measure or value used to describe a sample.
• Data set - a collection of data
• Raw data - the initial measurements that form the basis of analyses
• Biostatistics is the application of statistical methods to the analysis of
biological data.
• Biological data come from fields such as agriculture and health.
• Biometry/biometrics - the application of mathematical and
statistical methods to the collection, analysis, and interpretation of
biological data.

• Biometry refers to the development and application of statistical
methods for planned biological experiments.
Examples:
• Determine which of three fertilizer compounds produces the highest
yield.
• Determine which of two drugs is more effective for controlling a
certain disease in humans.
• Determine whether an activity such as smoking causes a response
such as lung cancer.
Data and variables
• The Oxford English Dictionary (1971) defined data (singular
datum) as facts, especially numerical facts, collected together for
reference or information.

• Data are at the center of experimental and observational studies.

• Data are the material with which statisticians work.

• They are records of measurements, counts or observations.

• Examples of data are records of the weights of calves, the milk yield in
a lactation of a group of cows, male or female sex, and blue or green
eye color.
Data and Variable
• A variable is any observable characteristic or event that can vary.
• Equivalently, a variable is anything that can assume different
observable values.
• Alternatively, a set of observations on a particular
character is termed a variable, e.g. weight, milk yield,
sex, and eye color.
• Data are the values of a variable, for example, a
weight of 200 kg, a daily milk yield of 20 kg, male, or
blue eyes.
Types of variables
• Variables can be classified as quantitative (numerical) or
qualitative (attributive, categorical, or classification) variables.
1. Quantitative variables
• Have values expressed as numbers, and the differences between
values have numerical meaning.
• A quantitative variable can be further classified as continuous or
discrete.
• A continuous variable can in turn be classified as an interval or
ratio variable, based on its measurement scale.
A. Continuous (measurement) variables - values are determined
using some kind of measuring scale, e.g. length, weight, volume,
area, density, time, etc.
• A continuous variable can assume any value within an interval.
• Its scale of measurement can be either the interval or the ratio scale.
• Interval variables – continuous variables that have no true zero
value, e.g. temperature (0°C = 32°F, so zero does not mean "no temperature").
• Ratio variables – continuous variables that have a true zero
value, e.g. milk yield of a cow.
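The practical difference between the two scales can be checked numerically. The following is a minimal Python sketch (the function name and the sample values 20 and 10 are illustrative assumptions, not part of the notes): on an interval scale the ratio of two values changes with the unit of measurement, whereas on a ratio scale it does not.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


# Interval scale: two temperatures expressed in both units.
warm_c, cool_c = 20.0, 10.0
warm_f = celsius_to_fahrenheit(warm_c)
cool_f = celsius_to_fahrenheit(cool_c)

ratio_c = warm_c / cool_c  # ratio computed on the Celsius scale
ratio_f = warm_f / cool_f  # ratio computed on the Fahrenheit scale

# Ratio scale: two milk yields expressed in kilograms and in grams.
yield_a_kg, yield_b_kg = 20.0, 10.0
ratio_kg = yield_a_kg / yield_b_kg
ratio_g = (yield_a_kg * 1000) / (yield_b_kg * 1000)

print("temperature ratios:", ratio_c, ratio_f)  # the two ratios differ
print("milk yield ratios:", ratio_kg, ratio_g)  # the two ratios agree
```

Because the Celsius and Fahrenheit zeros are arbitrary, "twice as hot" has no fixed meaning, while "twice the milk yield" holds in any unit.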
B. Discrete variables – can assume a finite or countably infinite number of values.

• Values are determined by counting, e.g. family size, litter size,
microbial count, egg production, tiller count, number of teats on a
cow, etc.
Qualitative variables
• These are variables described by the qualities they possess. They
are expressed in categories or groups.
• They cannot be measured numerically.
A. Ordinal (ranked) variables - an ordinal variable has categories
that can be ranked or ordered.
• It is expressed by relative differences, e.g. high, medium, low, etc.

• We can assign numbers (scores) to ordinal categories, e.g. 1 = low,
2 = medium and 3 = high, but differences among those numbers
have no numerical meaning.

• E.g. the difference between scores 1 and 2 (low and medium) does not have
the same meaning as the difference between 2 and 3 (medium and
high).
B. Nominal variables
• A nominal variable has categories that cannot be ranked. No
category is more valuable than another, e.g. color, gender,
marital status, etc.
• Individuals can be assigned to different groups that have
no order and no meaningful mathematical or statistical
difference.
• Reading assignment: Discuss in detail measurement
scales and the types of scales of measurement, with practical
examples.
The need for biometry and experimental design
• In agricultural, medical and other biological applications, the most
common objective is the comparison of two or more treatments.
• The premise of statistical inference is that we attempt to control and
assess the uncertainty of the inferences we make about the population of interest
based on the observation of samples.
• In a survey study, the researcher has no control over the response variable, but
a designed experimental study allows the researcher to control variation in
the response variable.
• Biometry is mainly concerned with designed experimental studies (design of experiments, DOE).
• Thus, the application of biometry and experimental design is useful for
the comparison of means and the control of variation through randomization,
replication and blocking.
The basic objective of experimental design is to
construct an experiment that allows for a valid
estimate of the variance of the observations.

Experimental design is the basis for most of the
ideas of designing scientific investigations.

Experimental design is useful in conducting
planned experimental research.
In a designed experimental study the primary
interest is:
– to investigate the relative performance of
certain factors.
• The interest would be in answering questions
such as:
– Are the three methods for treating the disease
different? And if so, by how much?
– Is the new teaching method significantly different
from the old method?
The Role of Statistics (Biometry) in Experimental Design
• Statistics has a significant role in experimental design.
• Project planning phase
– What is to be measured?
– How large is the likely variation?
– What are the influential factors?
• Experimental design phase
– Control known sources of variation
– Allow estimation of the size of the uncontrolled variation
– Permit an investigation of suitable models
• Statistical analysis and result interpretation phase
– Make inferences on design factors
– Guide subsequent designs
– Suggest more appropriate models
– Interpret the results properly
– Draw concluding remarks
Basic Principles of Experimental Design
The researcher has control over the factors
to be tested and the form of the data to be collected.
• He/she sets up the experiment and observes the
outcome.
Experiment:
– is a planned inquiry set up to obtain new facts, to
confirm or disprove results from a previous
experiment, or to verify certain biological phenomena.
Objectives
– The objectives must be clearly stated as questions
to be answered, hypotheses to be tested, and
effects to be estimated.

– It is necessary to classify the objectives as
major or minor, since certain experimental
designs give greater precision for some treatment
comparisons than others.
Components of Experimental Design
• The following are components that any researcher must clearly state
when conducting a designed experimental study:
– Treatment structure
– Design structure
– Experimental unit
– Randomization
– Replications
– Assumptions and hypothesis
• Treatment: the procedure whose effect is to be measured and
compared with other treatments.
– E.g. a standard ration, a spraying schedule, a temperature-humidity
combination, etc.
• A set of treatments,
– e.g. sources of fertilizer such as DAP, CAN, TSP, manure, etc.
• One-way treatment structure,
– e.g. nitrogen levels, dairy meal levels, etc.
• Two-way treatment structure,
– e.g. plant population and different hybrids.
• Higher-order treatment structure, etc.
• The interest is to estimate effects, compare effects, predict, etc.
• Experimental unit: the unit of experimental material to which one
application of the treatment is made.

• It is the smallest unit of material to which the treatment is applied,
– e.g. an animal, 5 pigs in a pen, a half-leaf.

• Experimental design: from a statistician's point of view, a design is a plan
for obtaining and using experimental material to allow comparisons
among treatments.

• More specifically, it is a plan for applying the treatments to experimental
units in such a way that the experimental units are alike except for the
treatments.
Replication:
• When a treatment appears more than once in an
experiment, it is said to be replicated.

• Replication is necessary to provide an estimate of
experimental error, which is required for tests of
significance.

– Replication provides a means of computing
experimental error.

• Without replication there is no basis for comparison, because
experimental error cannot be estimated.

• The number of replications is determined by:

– the extent to which the standard error must be reduced,
– which is in turn determined by the size of the treatment difference
that the experiment should detect, and
– the amount of precision envisaged in the experiment.


Randomization:
• It is a process in which treatments are allocated to
experimental units at random in order to eliminate
human bias and any other systematic influence.

• It is done to ensure that we have valid (unbiased) estimates of:

– experimental error,
– treatment means, and
– differences among the treatment means, etc.

• More importantly, it provides insurance against the possibility that
the model for analysis is invalid.
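As an illustration of random allocation, the following is a minimal Python sketch, not a prescribed procedure. The function name `randomize_crd`, the treatment labels, the unit numbering and the seed are all assumptions made for the example. Each treatment is repeated the required number of times and the labels are shuffled over the experimental units.

```python
import random


def randomize_crd(treatments, replications, seed=None):
    """Randomly allocate each treatment to `replications` experimental units.

    Returns a dict mapping unit number (1, 2, ...) to a treatment label.
    """
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    labels = [t for t in treatments for _ in range(replications)]
    rng.shuffle(labels)  # random order removes any systematic allocation
    return {unit: t for unit, t in enumerate(labels, start=1)}


# Hypothetical experiment: 3 treatments, 4 replications, 12 units.
layout = randomize_crd(["A", "B", "C"], replications=4, seed=1)
for unit, treatment in layout.items():
    print(f"unit {unit:2d} -> treatment {treatment}")
```

Fixing the seed is a convenience for record keeping; the allocation itself is still produced by the random number generator rather than by the experimenter's choice.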
Design structure
• Involves techniques for controlling known variation among the
experimental units.

– Thus, experimental units are grouped into homogeneous groups,
referred to as blocks, such that variation within the groups is a minimum
and variation between them is a maximum.

• The following are examples of design structures:

– Completely randomized design (CRD)
– Randomized complete block design (RCBD)
– Latin square design
– Cross-over design
– Incomplete block design, etc.
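In a randomized complete block design every block contains each treatment once, and the randomization is carried out separately within each block. The following minimal Python sketch shows one way such a layout could be generated; the function name `randomize_rcbd`, the treatment labels and the seed are illustrative assumptions.

```python
import random


def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return a dict mapping block number to a random order of all treatments."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)  # every treatment appears once per block
        rng.shuffle(order)        # independent randomization within the block
        layout[block] = order
    return layout


# Hypothetical experiment: 4 treatments in 3 blocks.
rcbd_layout = randomize_rcbd(["T1", "T2", "T3", "T4"], n_blocks=3, seed=7)
for block, order in rcbd_layout.items():
    print(f"block {block}: {order}")
```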
Assumptions

• The design structure and the treatment structure do
not interact.

• The random errors of the observed values are independent and
identically normally distributed with a constant
variance.
Hypothesis
• In experimental design, the hypothesis to be
tested should be clearly stated,
• for example, testing the equality of treatment means.
• The level of significance at which the equality of the
treatment means will be judged should also be stated.
• Once the hypothesis is framed, the next step is to
design a procedure for its verification.
• This is the experimental procedure, which usually
consists of four phases:
1) Selecting the appropriate materials to test.
2) Specifying the characters to measure.
3) Selecting the procedure to measure those characters.
4) Specifying the procedure to determine whether the
measurements made support the hypothesis.
• In general, the first two phases are fairly easy for a
subject matter specialist to specify.
• For example, for a maize breeder, the test materials would
probably be the native and the newly developed varieties.

• The characters to be measured would probably be disease
infection, grain yield, and other agronomic characters.

• On the other hand, the procedures regarding how the
measurements are to be made and how these measurements
can be used to prove or disprove a hypothesis depend
heavily on techniques developed by statisticians.

These two tasks constitute much of what is
generally termed the design of experiments, which
has three essential components:
• Estimate of error
• Control of error
• Proper interpretation of results

Usually, due to chance and other factors, error
exists in any experimental study.
Estimate of the error
• The term estimate is usually used to denote the actual numerical
value one might calculate in practice,

• while an estimator is a quantity describing the sample that is used to
guess the value of the corresponding population parameter.
• For example, the sample mean ($\bar{y}$) is an estimator of the population mean
($\mu$).
• Hence, the error is the discrepancy between the population value and
the sample value.

• In a designed experimental study, experimental error occurs and
can be minimized by randomization and replication.
• Precision is a property of the random variables or statistics and not
of the observed values of those variables or statistics.

• Precision, sensitivity, or amount of information is measured as the
reciprocal of the variance of a mean.
• That is, $\text{Information} = \dfrac{1}{\operatorname{var}(\bar{y})} = \dfrac{n}{\sigma^2}$, where $\operatorname{var}(\bar{y})$ denotes the
variance of the sample mean $\bar{y}$.
• An observed effect is said to be sufficiently precise if the standard
deviation of the statistic that measures the effect is suitably small.
• As the variance of y, denoted by $\sigma^2$, increases, the amount
of information decreases (less precise).

• Similarly, as n increases, the amount of information
increases (more precise).

• Blocking, repeated tests, replication, and adjustment for
covariates can all increase precision in the estimation of
factor effects and help to control the error.
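The dependence of information on n and on the variance can be seen in a small numerical sketch. The Python below simply evaluates Information = n / σ²; the function name and the numbers used are illustrative assumptions.

```python
def information(n: int, sigma2: float) -> float:
    """Amount of information about the mean: 1 / var(ybar) = n / sigma^2."""
    if n <= 0 or sigma2 <= 0:
        raise ValueError("n and sigma2 must be positive")
    return n / sigma2


base = information(n=10, sigma2=4.0)            # reference case
more_units = information(n=20, sigma2=4.0)      # doubling n doubles the information
more_variance = information(n=10, sigma2=8.0)   # doubling the variance halves it

print(base, more_units, more_variance)
```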
Control of error
• The ability to detect existing differences among treatments increases as
the size of the experimental error decreases.
• A good experiment incorporates all possible means of minimizing the
experimental error.
• Experimental error: variation caused by extraneous factors which are beyond the control of
the experimenter (errors of experimental observation and measurement, and
natural factors).
• The commonly used techniques for controlling experimental error in agricultural
research are:
– Blocking
– Proper plot technique
– Data analysis
Blocking
• Blocking is an important component in almost all
experimental designs.
• In field experiments, substantial variation
within an experimental field can be expected;
through blocking, variation among blocks can be
measured and removed from the experimental error.
• A significant reduction in experimental error is
usually achieved with the use of proper blocking.
Proper Plot Technique

For almost all types of experiment, it is absolutely
essential that all factors other than those
considered as treatments be maintained uniformly
for all experimental units.

Hence, appropriate plot techniques should be
implemented.
Data Analysis
• In cases where blocking alone may not be able to achieve
adequate control of experimental error, a proper
choice of data analysis can help greatly.

• Covariance analysis is most commonly used for this
purpose.

• The analysis of covariance can reduce the variability
among experimental units by adjusting their values to a
common value of the covariates.
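One common way to carry out this adjustment, assuming a single covariate x and a common within-group slope, is to compute adjusted treatment means as ȳ_i − b(x̄_i − x̄), where b is the pooled within-group regression slope and x̄ is the overall covariate mean. The Python sketch below is only an illustration under those assumptions; the function name and the data (two hypothetical diets, with initial weight as covariate) are made up for the example.

```python
from statistics import mean


def adjusted_means(groups):
    """Covariate-adjusted group means.

    `groups` maps a group name to a list of (x, y) pairs, where x is the
    covariate and y is the response.
    """
    grand_x = mean(x for obs in groups.values() for x, _ in obs)

    sxy = 0.0  # pooled within-group sum of cross-products
    sxx = 0.0  # pooled within-group sum of squares of x
    group_stats = {}
    for name, obs in groups.items():
        xbar = mean(x for x, _ in obs)
        ybar = mean(y for _, y in obs)
        group_stats[name] = (xbar, ybar)
        sxy += sum((x - xbar) * (y - ybar) for x, y in obs)
        sxx += sum((x - xbar) ** 2 for x, _ in obs)

    slope = sxy / sxx  # common within-group slope b
    return {
        name: ybar - slope * (xbar - grand_x)
        for name, (xbar, ybar) in group_stats.items()
    }


# Hypothetical data: (initial weight, weight gain) for two diets.
data = {
    "diet_1": [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],
    "diet_2": [(3.0, 8.0), (4.0, 10.0), (5.0, 12.0)],
}
adjusted = adjusted_means(data)
print(adjusted)
```

In this made-up data set the raw means differ by 5 units, but most of that gap comes from the groups starting at different covariate values; after adjustment to a common covariate value the difference is much smaller.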
Proper Interpretation of Results
• An important feature of the design of experiments
is its ability to uniformly maintain all
environmental factors that are not a part of the
treatments being evaluated.
• The applicability and generalization of the
experimental results require professional
interpretation.
Chapter two

Analysis of variance
Analysis of Variance (ANOVA)
• ANOVA (Analysis of Variance) is a statistical tool for testing the
homogeneity of different groups based on their differences.

• ANOVA is the method of analyzing the variance in a set of data
and partitioning that variance according to the sources of
the variation.

• ANOVA is based on the principle that the total amount of
variation in a set of data can be divided into two types: the
amount that can be attributed to chance and the amount that is
due to specific causes.
• ANOVA is used to test for differences among more than two
population means by analyzing the variation within each
of the samples relative to the variation between the samples.

• It is an extension of the comparison of two independent population means.
• When performing ANOVA, the following assumptions are made:
• The samples are drawn randomly from normal populations.
• The populations are independent, and the population variances
are assumed to be equal (homogeneous).

• All factors other than those being tested are controlled.

One-Way ANOVA
• One-way ANOVA is a short-cut method where a single factor is
considered, and its effect on the samples is observed.

• This method is performed when the means of the samples and/or
the mean of the sample means are non-integer values.

• In one-way ANOVA, at least three groups are analyzed using the F-test,
since a t-test can be used to determine the difference between two
groups.

• One drawback of one-way ANOVA is that it cannot tell which specific
groups are different from each other; it can only tell that at least two
groups are different.


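The ANOVA Function
• The ANOVA function (model) can be expressed as

$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where

• $y_{ij}$ = the actual observations (data values), i.e. the $j$th observation in group $i$,

• $\mu$ = the grand or overall mean,

• $\tau_i$ = the effect of group $i$, so that the mean of group $i$ is $\mu_i = \mu + \tau_i$, and

• $\varepsilon_{ij}$ = the random errors, which have zero mean and
constant variance $\sigma^2$.
• Under the null hypothesis all $\tau_i = 0$ and the model reduces to $y_{ij} = \mu + \varepsilon_{ij}$.
• ANOVA compares the variation between sample means to
the variation between data points within each group.
Possible Hypothesis
• $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ vs. $H_1: \mu_i \neq \mu_j$ for at least one pair $i \neq j$,
$i, j = 1, 2, \ldots, k$.

• For this hypothesis the test statistic is an F statistic, the
ratio of the two mean squares (MSG and MSE), with (k-1, N-k) degrees
of freedom,

• i.e. $F_{cal} = \dfrac{MSG}{MSE}$, where $MSG$ is the mean square between groups
and $MSE$ is the mean square of error.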


Data layout
• Suppose we have k populations with n observations each; then the
data can be presented as follows.

Population 1    Population 2    …    Population k
$y_{11}$        $y_{21}$        …    $y_{k1}$
$y_{12}$        $y_{22}$        …    $y_{k2}$
⋮               ⋮                    ⋮
$y_{1n}$        $y_{2n}$        …    $y_{kn}$

This layout applies when the samples taken from each
population are of equal size (a balanced design).
This may not always be the case.
Components of ANOVA Table summary
Source of variation (SV)    Degrees of freedom (DF)    Sum of squares (SS)    Mean square (MS)    Calculated F-value
Between groups              k-1                        SSG                    MSG                 F = MSG/MSE
Within groups (error)       N-k                        SSE                    MSE
Total                       N-1                        SST

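Where N = the total number of observations in all the populations/groups and $n_i$ = the number of observations in group $i$:

$SSG = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$,

$SSE = \sum_{i=1}^{k} (n_i - 1) S_i^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$,

$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 = SSG + SSE$,

$MSG = \dfrac{SSG}{k-1}$, $MSE = \dfrac{SSE}{N-k}$, and F is as seen in the table.

The group means, group variances and grand mean are

$\bar{y}_i = \dfrac{\sum_{j=1}^{n_i} y_{ij}}{n_i}$, $S_i^2 = \dfrac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{n_i - 1}$

and $\bar{y} = \dfrac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}}{N} = \dfrac{\sum_{i=1}^{k} n_i \bar{y}_i}{N}$.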
The hypothesis testing procedure in ANOVA

• The usual procedure can be followed:

• State the null and alternative hypotheses.
• Choose the level of significance and find the
tabulated (critical) F-value. Usually α = 1%, 5% or 10%.
• Determine the test statistic: in order to determine the
F-value, you have to compute all the values of the ANOVA table.
• State the decision rule.
• Interpret the results and draw a conclusion.
Practical examples.
1. Complete the following missing values in the ANOVA
table.
Sources of variation    DF    Sum of squares    Mean square    F
Between groups          2     2668.8            ?              ?
Error                   ?     ?                 84.6
Total                   16    3852.9
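The missing entries follow from the relations DF(total) = DF(between) + DF(error), SST = SSG + SSE, MS = SS/DF and F = MSG/MSE. A minimal Python sketch of the arithmetic is given below (the variable names are illustrative assumptions); the mean square of error recomputed from the completed row also serves as a consistency check against the 84.6 given in the table.

```python
# Values given in the table
df_between = 2
df_total = 16
ss_between = 2668.8
ss_total = 3852.9
ms_error_given = 84.6

# Missing values obtained from the ANOVA identities
df_error = df_total - df_between        # error degrees of freedom
ss_error = ss_total - ss_between        # error sum of squares
ms_between = ss_between / df_between    # mean square between groups
ms_error_check = ss_error / df_error    # should agree with 84.6 after rounding
f_value = ms_between / ms_error_given   # calculated F

print(f"DF error   = {df_error}")
print(f"SS error   = {ss_error:.1f}")
print(f"MS between = {ms_between:.1f}")
print(f"MS error   = {ms_error_check:.1f} (table gives {ms_error_given})")
print(f"F          = {f_value:.2f}")
```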
Example 2
• To test whether the mean time needed to mix a batch of material is the
same for machines produced by three manufacturers, the following
data on the time (in minutes) needed to mix the material were
obtained.
Manufacturer
A     B     C
20    28    20
26    26    19
24    31    23
22    28    21

• Test the equality of the mean times needed to mix the material at a
significance level of α = 5%.
Solution
• Using the formal procedure on the data above (group means 23, 28.25 and 20.75; grand mean 24), we get the following ANOVA
table.
SV                DF    SS       MS       F
Between groups    2     118.5    59.25    12.85
Within groups     9     41.5     4.61
Total             11    160.0

• The critical value is $F_{\alpha}(k-1, N-k) = F_{0.05}(2, 9) = 4.26$.

• At the 5% significance level, the means are not all equal since 4.26 < 12.85,
which shows a significant difference in the mean time needed to mix the material.
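The calculation can be checked with a short program. The following is a minimal Python sketch of the one-way ANOVA sums of squares, applied to the mixing-time data as printed in Example 2. The function name `one_way_anova` and the dictionary keys are illustrative assumptions, and the critical value 4.26 is the tabulated one quoted above rather than something the code computes.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group (error) sums of squares
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    msg = ssg / (k - 1)
    mse = sse / (n_total - k)
    return {
        "df_between": k - 1,
        "df_error": n_total - k,
        "SSG": ssg,
        "SSE": sse,
        "SST": ssg + sse,
        "MSG": msg,
        "MSE": mse,
        "F": msg / mse,
    }


mixing_times = [
    [20, 26, 24, 22],  # manufacturer A
    [28, 26, 31, 28],  # manufacturer B
    [20, 19, 23, 21],  # manufacturer C
]
result = one_way_anova(mixing_times)

F_CRITICAL = 4.26  # tabulated F(0.05; 2, 9), taken from the text
reject_h0 = result["F"] > F_CRITICAL

for name, value in result.items():
    print(f"{name}: {value:.2f}")
print("Reject H0 at the 5% level:", reject_h0)
```

The same function accepts groups of unequal size, since each group contributes its own $n_i$ to the sums.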
Exercise
• Use the following data from three populations (groups) to answer the
questions below.

Group A    62    60    50    48    47
Group B    60    60    58    53    49
Group C    59    49    49    47    42

1) Compute the total sum of squares, the error sum of squares, the between-group
sum of squares, the mean square between groups, the mean square of error, and
the F value.
2) Present your results in an ANOVA table.
3) Is there enough evidence at the 5% significance level to suggest that the
three means differ significantly?
Reading assignment
• If you find a significant difference of means
between groups in the ANOVA hypothesis testing
procedure, what further procedure would you use to see
between which groups the difference exists?

• What does a multiple comparison test mean?
Discuss in detail with your friends.
