biometry

Wollo University
Department of Statistics
Biometry Course Notes for Biotechnology Students
By: Tilaye M.
1. Introduction
• Researchers need to understand the basic principles of experimental design and the basic terminology of statistics in order to perform quality research through the application of statistical principles and methods.
• The goal of every research activity is to make good decisions and scientific generalizations using data.
• Data are the raw material, while information is what you get when you organize and analyze your data.
Definitions and Description of Terminology
 Statistics is the science that deals with data and the methods used to handle them.
 Statistics can be defined as the science of extracting information from data.
 It is the development and application of theory and methods for the collection (design), analysis (applying statistical techniques and methods) and interpretation of observed information from planned or unplanned experiments under uncertainty.
 Technological developments have demanded methodology for the efficient extraction of reliable statistics from complex databases.
 Sample - a subset of observations selected by a specific procedure, i.e. a part (subset) of a population. E.g. a sample of diseased patients.
 Population - the complete set of observations about which information is desired. E.g. the population of all patients afflicted with a disease.
• Parameter: a numerical measure or a value used to describe a
population.
• Statistic: a numerical measure or value used to describe a sample.
• Data set - a collection of data
• Raw data - the initial measurements that form the basis of analyses
• Biostatistics is the application of statistical methods to the analysis of biological data.
• Biological data come from fields such as agriculture, health, etc.
• Biometry/biometrics - the application of mathematical and statistical methods to the collection, analysis, and interpretation of biological data.
• Biometry refers to the development and application of statistical methods for planned biological experiments.
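To make the parameter/statistic distinction concrete, the short Python sketch below treats a small list of plot yields as a complete population (the numbers are hypothetical, not data from these notes), computes the parameter μ from all of it, and computes the statistic ȳ from a simple random sample. In a real study μ is unknown and only ȳ is available.

```python
import random
import statistics

# Hypothetical population: yields (kg per plot) of every plot in a small field.
# These numbers are illustrative only; they are not data from these notes.
population = [42, 38, 45, 50, 47, 39, 44, 41, 48, 46, 43, 40]

# Parameter: a numerical measure describing the whole population.
mu = statistics.mean(population)

# Statistic: the same measure computed on a sample chosen by a specific
# procedure (here, simple random sampling without replacement).
rng = random.Random(1)              # fixed seed so the sample is reproducible
sample = rng.sample(population, 4)
y_bar = statistics.mean(sample)

print(f"parameter mu = {mu:.2f}")
print(f"sample = {sample}, statistic y_bar = {y_bar:.2f}")
```

Drawing a different sample (another seed) changes ȳ but never μ, which is exactly why a statistic is used to estimate an unknown parameter.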
Examples:
 Determine which of three fertilizer compounds produces the highest yield.
 Determine which of two drugs is more effective for controlling a certain disease in humans.
 Determine whether an activity such as smoking causes a response such as lung cancer.
Data and variables
 The Oxford English Dictionary (1971) defined data (singular
datum) as facts, especially numerical facts, collected together for
reference or information.
 Data are at the center of experimental and observational studies.
 Data are the material with which statisticians work.
 They are records of measurement, counts or observations.
 Examples of data are records of weights of calves, milk yield in a lactation of a group of cows, male or female sex, and blue or green eye color.
Data and Variable
 A variable is any observable characteristic that can assume different values.
 Alternatively, a set of observations on a particular character is termed a variable, e.g. weight, milk yield, sex, and eye color.
 Data are the values of a variable, for example, a
weight of 200 kg, a daily milk yield of 20 kg, male, or
blue eyes.
Types of variables
• Variables can be classified as quantitative (numerical) or qualitative (attributive, categorical, or classification) variables.
1. Quantitative variables
• Have values expressed as numbers and the differences between
values have numerical meaning.
• A quantitative variable can also be classified as continuous or discrete.
• A continuous variable can in turn be classified as an interval or a ratio variable based on its measurement scale.
Cont…
A. Continuous (measurement) variables - values are determined
using some kind of measuring scale, e.g. length, weight, volume,
area, density, time, etc.
 It can assume any value within an interval.
 Its scale of measurement can be either an interval or a ratio scale.
• Interval variables – continuous variables that have no true zero value, e.g. temperature (0°C = 32°F, so zero is an arbitrary point, not an absence of temperature).
• Ratio variables – continuous variables that have a true zero value, e.g. milk yield of a cow.
B. Discrete variables – can assume a finite or countably infinite number of values.
 Values are determined by counting, e.g. family size, litter size, microbial count, egg production, tiller count, number of teats on a cow, etc.
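The practical difference between interval and ratio scales shows up when values are compared as ratios. The Python sketch below uses illustrative values and the standard conversions (°F = °C × 9/5 + 32 and 1 kg ≈ 2.20462 lb): the ratio of two temperatures changes with the unit because the zero is arbitrary, while the ratio of two milk yields does not.

```python
# Interval vs ratio scale: ratios of values depend on the unit on an interval
# scale (arbitrary zero) but not on a ratio scale (true zero).
# The temperatures and milk yields below are illustrative values only.

def celsius_to_fahrenheit(c):
    """Interval scale: the conversion shifts the zero point."""
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    """Ratio scale: the conversion only rescales, so zero stays zero."""
    return kg * 2.20462

# 20 °C vs 10 °C: the ratio is 2 in Celsius but 68/50 = 1.36 in Fahrenheit,
# so "twice as hot" is not a meaningful statement.
temp_ratio_c = 20 / 10
temp_ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# 20 kg vs 10 kg of milk: the ratio is 2 in any unit of mass.
milk_ratio_kg = 20 / 10
milk_ratio_lb = kg_to_lb(20) / kg_to_lb(10)

print(temp_ratio_c, round(temp_ratio_f, 2), milk_ratio_kg, round(milk_ratio_lb, 2))
```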
2. Qualitative variables
 These are variables described by the qualities they possess. They are expressed in categories or groups.
 They cannot be measured numerically.
A. Ordinal (ranked) variables - an ordinal variable has categories
that can be ranked or ordered.
 It is expressed by relative differences, e.g. high, medium, low, etc.
 We can assign numbers (scores) to ordinal categories, e.g. 1 = low, 2 = medium, and 3 = high, but differences among those numbers have no numerical meaning.
 E.g. the difference between scores 1 and 2 (low and medium) does not necessarily have the same meaning as the difference between 2 and 3 (medium and high).
B. Nominal variables
• A nominal variable has categories that cannot be ranked. No category is more valuable than another, e.g. color, gender, marital status, etc.
• Individuals can be assigned to different groups that have no order and no meaningful mathematical or statistical difference.
• Reading assignment: Discuss in detail measurement scales and the types of measurement scales, with practical examples.
The need for biometry and experimental design
 In agricultural, medical and other biological applications, the most common objective is the comparison of two or more treatments.
 The premise of statistical inference is that we attempt to control and assess the uncertainty of the inferences we make about the population of interest based on observations of samples.
 In a survey study, the researcher has no control over the response variable, whereas a designed experimental study allows the researcher to control variation in the response variable.
 Biometry is mainly concerned with experimental studies (design of experiments, DOE).
 Thus, biometry and experimental design are useful for comparing means and for controlling variation through randomization, replication and blocking.
The basic objective of experimental design is to
construct an experiment that allows for a valid
estimate of the variance of the observations.
Experimental design is the basis for most of the ideas of designing scientific investigations.
Experimental design is useful in conducting planned experimental research.
In a designed experimental study the primary interest is:
– to investigate the relative performance of certain factors.
• The interest would be in answering questions such as:
– Are the three methods for treating the disease different? If so, by how much?
– Is the new teaching method significantly different from the old method?
The Role of Statistics / Biometry/ in Experimental design
• Statistics has a significant role in experimental design.
 Project Planning Phase
– What is to be measured?
– How large is the likely variation?
– What are the influential factors?
 Experimental Design Phase
– Control known sources of variation
– Allow estimation of the size of the uncontrolled variation
– Permit an investigation of suitable models
 Statistical Analysis and Result Interpretation Phase
• Make inferences on design factors
• Guide subsequent designs
• Suggest more appropriate models
• Interpret the results properly
• Draw conclusions
Basic Principles of Experimental Design
The researcher has control over the factors to be tested and the form of data to be collected.
 He/she sets up the experiment and observes the outcome.
Experiment:
– is a planned inquiry set up to obtain new facts, confirm or disprove results from a previous experiment, or verify certain biological phenomena.
Objectives
– The objectives must be clearly stated as questions to be answered, hypotheses to be tested, and effects to be estimated.
– It is necessary to classify the objectives as major or minor, since certain experimental designs give greater precision for some treatment comparisons than for others.
Components of experimental Design
 The following are components that any researcher must clearly state
when conducting a designed experimental study.

 Treatment structure
 Design structure
 Experimental unit
 Randomization
 Replications
 Assumptions and Hypothesis
 Treatment: the procedure whose effect is to be measured and compared with other treatments.
– E.g. a standard ration, a spraying schedule, a temperature-humidity
combination, etc.
Cont…
 A set of treatments:
– e.g. sources of fertilizer such as DAP, CAN, TSP, manure, etc.
 One-way treatment structure:
– e.g. nitrogen levels, dairy meal levels, etc.
 Two-way treatment structure:
– e.g. plant population and different hybrids.
 Higher-order treatment structures, etc.
 The interest is to estimate effects, compare effects, predict, etc.
Cont..
• Experimental unit: the unit of experimental material to which one
application of the treatment is applied.
• It is the smallest unit of material to which the treatment is applied.
– e.g. an animal, 5 pigs in a pen, a half-leaf.
• Experimental design: from a statistician's point of view, a design is a plan for obtaining and using experimental material to allow comparisons among treatments.
• More specifically, it is a plan for applying the treatments to experimental units in such a way that the experimental units are alike except for the treatments.
Replication:
• When a treatment appears more than once in an experiment, it is said to be replicated.
• Replication is necessary to provide an estimate of experimental error, which is required for tests of significance.
– Replication provides the means of computing experimental error.
• Without replication there is no basis for comparison.
• The number of replications is determined by:
– the extent to which the standard error must be reduced,
– which is in turn determined by the size of the treatment difference that the experiment should detect, and
– the amount of precision envisaged in the experiment.
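The link between replication and the standard error can be made concrete with the standard relation SE(ȳ_i) = σ/√r for a treatment mean based on r replicates. The Python sketch below, assuming the error standard deviation σ is known from a previous trial (the value 12 kg/plot and the target standard errors are hypothetical), finds the smallest r that brings the standard error down to a target value. It does not cover the detectable-difference (power) calculation mentioned in the notes.

```python
import math

def replications_for_se(sigma, target_se):
    """Smallest number of replications r such that the standard error of a
    treatment mean, sigma / sqrt(r), is at most target_se."""
    return math.ceil((sigma / target_se) ** 2)

# Hypothetical error standard deviation (e.g. from a previous trial), in kg/plot
sigma = 12.0

for target in (6.0, 4.0, 3.0):
    r = replications_for_se(sigma, target)
    print(f"target SE = {target}: r = {r}, achieved SE = {sigma / math.sqrt(r):.2f}")
```

Halving the target standard error multiplies the required number of replications by four, which is why gains in precision from replication alone become expensive.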

Randomization:
• It is a process in which treatments are allocated to experimental units at random in order to eliminate human bias and any other systematic influence.
• It is done to ensure that we have a valid (unbiased) estimate of:
– experimental error,
– treatment means, and
– differences among the treatment means, etc.
 More importantly, it provides insurance against the possibility that the model for analysis is not valid.
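Random allocation of treatments can be sketched in a few lines of Python. The function below (its name and the fixed seed are hypothetical choices for illustration) builds a completely randomized allocation: each treatment label is repeated r times, the full list is shuffled, and the i-th label goes to experimental unit i. The fertilizer sources come from the notes; three replications are assumed.

```python
import random

def randomize_crd(treatments, replications, seed=None):
    """Completely randomized allocation: each treatment label is repeated
    `replications` times, the labels are shuffled, and label i goes to unit i."""
    labels = [t for t in treatments for _ in range(replications)]
    rng = random.Random(seed)
    rng.shuffle(labels)
    # Map experimental unit number (1, 2, ...) to its randomly assigned treatment
    return {unit: label for unit, label in enumerate(labels, start=1)}

# Fertilizer sources from the notes; 3 replications and the seed are hypothetical
layout = randomize_crd(["DAP", "CAN", "TSP", "Manure"], replications=3, seed=2024)
for unit, treatment in layout.items():
    print(f"unit {unit:2d}: {treatment}")
```

Passing a seed makes the layout reproducible for record keeping; omitting it gives a fresh randomization each time.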
Design structure
 Involves techniques for controlling known variation among the experimental units.
– Thus, experimental units are grouped into homogeneous groups referred to as blocks, such that variation within the groups is a minimum and variation between them is a maximum.
 The following are examples of design structures:
– Completely randomized design (CRD)
– Randomized complete block design (RCBD)
– Latin square design
– Cross-over design
– Incomplete block design, etc.
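In a randomized complete block design, randomization is restricted: every block receives each treatment exactly once, and the order of treatments is randomized separately within each block, unlike a completely randomized design where treatments are shuffled over all units at once. A minimal Python sketch, assuming hypothetical nitrogen levels N0 to N3 and three blocks:

```python
import random

def randomize_rcbd(treatments, n_blocks, seed=None):
    """RCBD allocation: every block receives each treatment exactly once, and
    the order of treatments is randomized independently within each block."""
    rng = random.Random(seed)
    plan = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)   # fresh copy for each block
        rng.shuffle(order)         # independent randomization within the block
        plan[block] = order        # position in list = plot number in block
    return plan

# Hypothetical nitrogen levels N0..N3 laid out in 3 blocks
plan = randomize_rcbd(["N0", "N1", "N2", "N3"], n_blocks=3, seed=7)
for block, order in plan.items():
    print(f"Block {block}: {' '.join(order)}")
```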
Assumptions
• The design structure and treatment structure do
not interact.
• The observed values are independently and
identically distributed normal with a constant
variance.
Hypothesis
• In experimental design, the hypotheses to be tested should be clearly stated.
• For example, testing the equality of treatment means.
• The level of significance used to decide between equality and inequality of the treatment means should be stated.
• Once the hypothesis is framed, the next step is to design a procedure for its verification.
 This is the experimental procedure, which usually consists of four phases:
1) Selecting the appropriate materials to test.
2) Specifying the characters to measure.
3) Selecting the procedure to measure those characters.
4) Specifying the procedure to determine whether the measurements made support the hypothesis.
 In general, the first two phases are fairly easy for a
subject matter specialist to specify.
• For example, for a maize breeder, the test materials would probably be the native and the newly developed varieties.
• The characters to be measured would probably be disease infection, grain yield, and other agronomic characters.
• On the other hand, the procedures regarding how the measurements are to be made, and how these measurements can be used to prove or disprove a hypothesis, depend heavily on techniques developed by statisticians.

These two tasks constitute much of what is generally termed the design of experiments, which has three essential components:
 Estimate of error
 Control of error
 Proper interpretation of results
Usually, due to chance and other factors, error exists in any experimental study.
Estimate of the error
 The term Estimate is usually used to denote the actual numerical
values one might calculate in practice.
 In contrast, an estimator is a quantity computed from the sample that is used to estimate (guess) the value of the corresponding population parameter.
 For example, the sample mean (ȳ) is an estimator of the population mean (μ).
 Hence, the error is the discrepancy between the population value and the sample value.
 In a designed experimental study, experimental error always occurs; replication reduces its effect on treatment means and randomization ensures it is validly estimated.
 Precision: is a property of the random variables or statistics and not
of the observed values of those variables or statistics.
 Precision, sensitivity, or amount of information is measured as the reciprocal of the variance of a mean.
 That is, $\text{Information} = \dfrac{1}{\operatorname{var}(\bar{y})} = \dfrac{n}{\sigma^2}$, where $\operatorname{var}(\bar{y}) = \sigma^2/n$ denotes the variance of the sample mean $\bar{y}$.
 An observed effect is said to be sufficiently precise if the standard
deviation of the statistic that measures the effect is suitably small.
 As the variance σ² of the observations y increases, the amount of information decreases (less precise).
 Similarly, as “n” increases, the amount of information increases (more precise).
 Blocking, repeat tests, replication, and adjustment for covariates can all increase precision in the estimation of factor effects and help to control the error.
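The information formula 1/var(ȳ) = n/σ² can be checked numerically. The Python sketch below uses hypothetical values of σ² and n to show both directions stated above: quadrupling σ² divides the information by four, while quadrupling n multiplies it by four.

```python
def information(sigma2, n):
    """Amount of information about a mean: 1 / var(y_bar), with var(y_bar) = sigma2 / n."""
    var_y_bar = sigma2 / n
    return 1 / var_y_bar

# Hypothetical variances and sample sizes
for sigma2 in (4.0, 16.0):
    for n in (4, 16):
        print(f"sigma^2 = {sigma2:5.1f}, n = {n:2d}: "
              f"var(y_bar) = {sigma2 / n:.2f}, information = {information(sigma2, n):.2f}")
```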
Control of error
 The ability to detect existing differences among treatments increases as the size of the experimental error decreases.
 A good experiment incorporates all possible means of minimizing the experimental error.
 Experimental error: variation caused by extraneous factors that are beyond the control of the experimenter (errors of observation and measurement, and natural factors).
 The commonly used techniques for controlling experimental error in agricultural
research are:
 Blocking
 Proper plot technique
 Data analysis
Blocking
 Blocking is an important component in almost all experimental designs.
 In field experiments, where substantial variation within an experimental field can be expected, blocking allows the variation among blocks to be measured and removed from the experimental error.
 A significant reduction in experimental error is usually achieved with the use of proper blocking.
Proper Plot Technique
For almost all types of experiment, it is absolutely essential that all factors other than those considered as treatments be maintained uniformly for all experimental units.
Hence, appropriate plot techniques should be implemented.
Data Analysis
 In cases where blocking alone may not achieve adequate control of experimental error, a proper choice of data analysis can help greatly.
 Covariance analysis is most commonly used for this purpose.
 The analysis of covariance can reduce the variability among experimental units by adjusting their values to a common value of the covariate.
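The adjustment to a common covariate value can be sketched for the simplest case: one-way analysis of covariance with a single covariate x, where the adjusted treatment mean is ȳ_i − b(x̄_i − x̄) and b is the pooled within-treatment regression slope. The Python sketch below computes only the adjusted means, not the full covariance F-test, and the data (initial and final weights of animals on two rations) are made up for illustration.

```python
def ancova_adjusted_means(groups):
    """One-way analysis of covariance with a single covariate x.

    groups maps a treatment name to (x_values, y_values). Returns the pooled
    within-treatment regression slope b and the adjusted means
    y_bar_i - b * (x_bar_i - x_bar), i.e. each treatment mean moved to the
    common (overall) covariate value x_bar.
    """
    def mean(values):
        return sum(values) / len(values)

    x_bar = mean([x for xs, _ in groups.values() for x in xs])
    sxy = sxx = 0.0
    group_means = {}
    for name, (xs, ys) in groups.items():
        xm, ym = mean(xs), mean(ys)
        group_means[name] = (xm, ym)
        # within-treatment cross-products and sums of squares, pooled
        sxy += sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
        sxx += sum((x - xm) ** 2 for x in xs)
    b = sxy / sxx
    adjusted = {name: ym - b * (xm - x_bar) for name, (xm, ym) in group_means.items()}
    return b, adjusted

# Hypothetical data: x = initial weight, y = final weight (kg) on two rations
data = {
    "Ration A": ([10, 12, 14], [20, 23, 26]),
    "Ration B": ([14, 16, 18], [27, 30, 33]),
}
b, adjusted = ancova_adjusted_means(data)
print(f"pooled slope b = {b}")
for name, value in adjusted.items():
    print(f"{name}: adjusted mean = {value}")
```

With these made-up numbers the raw means are 23 and 30 kg, but the adjusted means are 26 and 27 kg: most of the apparent ration difference came from ration B receiving heavier animals at the start.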
Proper Interpretation of Results
 An important feature of the design of experiments is its ability to maintain uniformly all environmental factors that are not part of the treatments being evaluated.
 The applicability and generalization of experimental results require professional interpretation.
Chapter two
Analysis of variance
Analysis of Variance (ANOVA)
• ANOVA (Analysis of Variance) is a statistical tool for testing whether the means of different groups are equal, based on the differences among them.
• ANOVA analyzes the variation in a set of data by dividing the total variation into components according to the sources of that variation.
• ANOVA is based on the principle that the total variation in a set of data can be divided into two types: the amount that can be attributed to chance and the amount that is due to specific causes.
ANOVA (cont'd)
• ANOVA is used to test for differences among more than two population means by analyzing the variation within each of the samples relative to the variation between the samples.
• It is an extension of the comparison of two independent population means.
 While performing ANOVA, the following assumptions are made:
 The samples are drawn randomly from normal populations.
 The populations are independent, and their variances are assumed to be equal (homogeneous).
 All factors other than those being tested are controlled.

One-Way ANOVA
 One-way ANOVA is used when a single factor is considered and its effect on the samples is observed.
• A short-cut computational method (working with group totals and the grand total instead of deviations from the means) is convenient when the sample means and/or the mean of the sample means are non-integer values.
• In one-way ANOVA, three or more groups are usually compared using an F-test, since a t-test can be used to determine the difference between two groups.
• One drawback of one-way ANOVA is that it cannot tell which specific groups differ from each other; it can only tell that at least two groups differ.

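The ANOVA Model
• The one-way ANOVA model can be expressed as
$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \dots, k,\; j = 1, \dots, n_i,$$
where
 $y_{ij}$ is the $j$-th observation (data value) in the $i$-th group,
 $\mu$ is the grand or overall mean,
 $\tau_i$ is the effect of the $i$-th group, so that the $i$-th group mean is $\mu_i = \mu + \tau_i$, and
 $\varepsilon_{ij}$ are the random errors, which have zero mean and constant variance $\sigma^2$.
 ANOVA compares the variation between sample means to the variation between data points within each group.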
Possible Hypothesis
 $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ vs. $H_1: \mu_i \neq \mu_j$ for at least one pair $i \neq j$, where $i, j = 1, 2, \dots, k$.
 The test statistic for this hypothesis is an F-statistic, the ratio of the two mean squares (MSG and MSE), with $(k-1, N-k)$ degrees of freedom:
$$F_{cal} = \frac{MSG}{MSE},$$
where MSG is the mean square between groups and MSE is the mean square of error.

Data layout
• Suppose we have k populations (groups) with n observations each; the data can then be presented as follows.
$$
\begin{array}{cccc}
\text{Population 1} & \text{Population 2} & \cdots & \text{Population } k \\
y_{11} & y_{21} & \cdots & y_{k1} \\
y_{12} & y_{22} & \cdots & y_{k2} \\
\vdots & \vdots & & \vdots \\
y_{1n} & y_{2n} & \cdots & y_{kn}
\end{array}
$$
Here the data structure assumes that equal (balanced) samples are taken from each population, but this may not always be the case.
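Components of the ANOVA table (summary)
Source of variation (SV)   DF    Sum of squares (SS)   Mean square (MS)   Calculated F-value
Between groups             k-1   SSG                   MSG                F = MSG/MSE
Within groups (error)      N-k   SSE                   MSE
Total                      N-1   SST

where N = total number of observations in all the populations (groups), and

$$SSG = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{k} (n_i - 1) S_i^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2,$$

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 = SSG + SSE,$$

$$MSG = \frac{SSG}{k-1}, \qquad MSE = \frac{SSE}{N-k}, \qquad F = \frac{MSG}{MSE},$$

$$\bar{y}_i = \frac{\sum_{j=1}^{n_i} y_{ij}}{n_i}, \qquad S_i^2 = \frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{n_i - 1}, \qquad \bar{y} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}}{N} = \frac{\sum_{i=1}^{k} n_i \bar{y}_i}{N}.$$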
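The definitional formulas for the ANOVA table translate directly into code. The Python sketch below (the function name and the small unbalanced data set are hypothetical, chosen only to exercise the formulas) computes the degrees of freedom, sums of squares, mean squares and F, and checks the identity SST = SSG + SSE.

```python
def one_way_anova(groups):
    """One-way ANOVA table entries from the definitional formulas.

    groups is a list of k lists of observations (sizes n_i may differ).
    """
    k = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    group_means = [sum(g) / len(g) for g in groups]           # y_bar_i
    grand_mean = sum(sum(g) for g in groups) / N              # y_bar
    ssg = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(n, group_means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    sst = sum((y - grand_mean) ** 2 for g in groups for y in g)
    msg = ssg / (k - 1)
    mse = sse / (N - k)
    return {
        "df": (k - 1, N - k, N - 1),   # between, within (error), total
        "SSG": ssg, "SSE": sse, "SST": sst,
        "MSG": msg, "MSE": mse, "F": msg / mse,
    }

# Hypothetical unbalanced data (n_i = 3, 4, 2) just to exercise the formulas
table = one_way_anova([[5, 7, 6], [8, 9, 7, 10], [4, 5]])
for key, value in table.items():
    print(key, value)
# SST should equal SSG + SSE up to floating-point rounding
print(abs(table["SST"] - (table["SSG"] + table["SSE"])) < 1e-9)
```

The calculated F is then compared with the tabulated value F_α(k−1, N−k); this sketch does not compute critical values or p-values.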
The hypothesis testing procedure in ANOVA
 The usual procedure can be followed:
• State the null and alternative hypotheses.
• Determine the level of significance and find the tabulated F-value, $F_\alpha(k-1, N-k)$. Usually α = 1%, 5% or 10%.
• Determine the test statistic: to obtain the calculated F-value, compute all entries of the ANOVA table.
• State the decision rule: reject $H_0$ if the calculated F-value exceeds the tabulated F-value.
• Interpret the results and draw a conclusion.
Practical examples.
Example 1: Complete the missing values in the following ANOVA table.
Source of variation   DF    Sum of squares   Mean square   F
Between groups        2     2668.8           ?             ?
Error                 ?     ?                84.6
Total                 16    3852.9
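The missing entries follow from the ANOVA table identities: DF(error) = DF(total) − DF(between), SSE = SST − SSG, MS = SS/DF and F = MSG/MSE. The Python sketch below applies them to the Example 1 values and uses the given MSE (84.6) for F; the last printed number is a consistency check that SSE/DF(error) reproduces that MSE.

```python
# Known entries from the Example 1 table
df_between, ss_between = 2, 2668.8
df_total, ss_total = 16, 3852.9
ms_error = 84.6

df_error = df_total - df_between          # 14
ss_error = ss_total - ss_between          # 1184.1 (up to float rounding)
ms_between = ss_between / df_between      # 1334.4
f_value = ms_between / ms_error           # about 15.77

# Consistency check: SSE / df_error should reproduce the given MSE (84.6)
print(df_error, round(ss_error, 1), round(ms_between, 1), round(f_value, 2),
      round(ss_error / df_error, 1))
```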
Example 2
 To test whether the mean time needed to mix a batch of material is the same for machines produced by three manufacturers, the following data on the time (in minutes) needed to mix the material were obtained.
Manufacturer
A    B    C
20   28   20
26   26   19
24   31   23
22   28   21
 Test the equality of the mean mixing times at α = 5%.
Solution
 Using the formal procedure, we obtain the following ANOVA table.
SV               DF    SS       MS       F
Between groups   2     118.5    59.25    12.85
Within groups    9     41.5     4.61
Total            11    160.0
 The tabulated value is $F_{\alpha}(k-1, N-k) = F_{0.05}(2, 9) = 4.26$.
 Since the calculated F (12.85) exceeds 4.26, we reject $H_0$ at the 5% significance level: the means are not all equal, which shows a significant difference in mean mixing time among the three manufacturers.
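As a cross-check on the Example 2 solution, the Python sketch below recomputes the sums of squares with the short-cut (correction-factor) formulas, which give the same results as the definitional ones: CF = G²/N, SST = Σy² − CF, SSG = Σ T_i²/n_i − CF and SSE = SST − SSG, where G is the grand total and T_i the group totals. The critical value 4.26 is the tabulated F_0.05(2, 9) quoted in the solution; the code does not compute it.

```python
# Example 2 data: mixing time (minutes) by manufacturer
data = {
    "A": [20, 26, 24, 22],
    "B": [28, 26, 31, 28],
    "C": [20, 19, 23, 21],
}

k = len(data)
N = sum(len(v) for v in data.values())
G = sum(sum(v) for v in data.values())                 # grand total
cf = G ** 2 / N                                        # correction factor G^2/N
sst = sum(y ** 2 for v in data.values() for y in v) - cf
ssg = sum(sum(v) ** 2 / len(v) for v in data.values()) - cf
sse = sst - ssg
msg, mse = ssg / (k - 1), sse / (N - k)
f_cal = msg / mse
f_tab = 4.26            # F_0.05(2, 9), taken from an F table as in the notes

print(f"SSG={ssg:.2f} SSE={sse:.2f} SST={sst:.2f} MSG={msg:.2f} MSE={mse:.2f} F={f_cal:.2f}")
print("Reject H0" if f_cal > f_tab else "Do not reject H0")
```

For these data the script prints SSG=118.50 SSE=41.50 SST=160.00 MSG=59.25 MSE=4.61 F=12.85 followed by "Reject H0".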
Exercise
 Use the following data from three populations (groups) to answer the questions below.
Group A   62   60   50   48   47
Group B   60   60   58   53   49
Group C   59   49   49   47   42
1) Compute the total sum of squares, error sum of squares, between-group sum of squares, mean square between groups, mean square of error, and F-value.
2) Present your results in an ANOVA table.
3) Is there enough evidence at the 5% significance level to suggest that the three means differ significantly?
Reading assignment
• If you find a significant difference in means between groups in the ANOVA hypothesis-testing procedure, what further procedure would you use to determine which groups differ?
• What does a multiple comparison test mean?
Discuss in detail with your friends.
