Variance Component Estimation & Best Linear Unbiased Prediction (BLUP)
1. Introduction
Variance components are commonly used in formulating appropriate designs, establishing
quality control procedures, and, in statistical genetics, in estimating heritabilities and genetic
correlations. Traditionally the estimators used most often have been the analysis of variance
(ANOVA) estimators, obtained by equating observed and expected mean squares from an
analysis of variance and solving the resulting equations. If the data are balanced, the ANOVA
estimators have many appealing properties. In unbalanced situations these properties rarely
hold, which creates a number of problems in arriving at correct decisions; and in practice
variance components are mostly estimated from unbalanced data. For unbalanced data, two
general classes of estimators have attracted considerable interest: maximum likelihood and
restricted maximum likelihood (ML and REML), and minimum norm and minimum variance
quadratic unbiased estimation (MINQUE and MIVQUE). The links between these methods are
also important. In addition to estimation problems in the unbalanced case, the notion of robust
estimation, which guards against the influence of outliers and departures from the underlying
statistical assumptions, is also of interest.
The classical least squares model contains only one random element, the random error; all other
effects are assumed to be fixed constants. For this class of models, the assumption of
independence of the \(\varepsilon_i\) implies independence of the \(y_i\); that is, if
\(\mathrm{Var}(\varepsilon) = I\sigma^2\), then \(\mathrm{Var}(y) = I\sigma^2\) also. Such models are
called fixed effects models, or more simply fixed models.
There are situations where there is more than one random term. The classical variance
components problem, in which the purpose is to estimate components of variance rather than
specific treatment effects, is one example. In these cases the "treatment effects" are assumed
to be a random sample from a population of such effects, and the goal of the study is to estimate
the variance among these effects in the population. The individual effects that happen to be
observed in the study are not of any particular interest except for the information they provide
on the variance component. Models in which all effects are assumed to be random are called
random models. Models that contain both fixed and random effects are called mixed models.
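For reference in what follows, the mixed model and its variance structure can be written (a standard formulation, with u the vector of random effects) as
\[
y = X\beta + Zu + e, \qquad u \sim (0, G), \quad e \sim (0, R), \quad \mathrm{Cov}(u, e') = 0,
\]
so that
\[
\mathrm{Var}(y) = V = ZGZ' + R .
\]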
Method I
In Method I the quadratics used are analogous to the sums of squares used for balanced data,
the analogy being such that certain sums of squares in balanced data become, for unbalanced
data, quadratic forms that are not non-negative definite, and so are not sums of squares.
Thus, for example, for the 2-way cross classification with n observations per cell, the sum of
squares for the first factor is
\[
\sum_{i=1}^{a} bn(\bar y_{i..} - \bar y_{...})^2 = \sum_{i=1}^{a} bn\,\bar y_{i..}^{\,2} - abn\,\bar y_{...}^{\,2}.
\]
This method is easy to compute, even for large data sets, and for random models it yields
unbiased estimators. It cannot be used for mixed models: it can only be adapted to a mixed
model by altering that model and treating the fixed effects either as non-existent or as
random, in which case the estimated variance components for the true random effects will be
biased.
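As an illustration, a minimal sketch of ANOVA (Method I-type) estimation for the balanced one-way random model \(y_{ij} = \mu + a_i + e_{ij}\), equating mean squares to their expectations \(E(\mathrm{MSA}) = \sigma_e^2 + n\sigma_a^2\) and \(E(\mathrm{MSE}) = \sigma_e^2\); the data values here are hypothetical:

proc iml;
   /* hypothetical balanced one-way data: a = 3 groups, n = 3 records each */
   ym = {5.1 5.4 4.9,
         6.2 6.0 6.4,
         4.3 4.5 4.1};                    /* rows = groups                      */
   a  = nrow(ym);  n = ncol(ym);
   gm  = ym[:];                           /* grand mean                         */
   rm  = ym[,:];                          /* group (row) means                  */
   ssa = n # ssq(rm - gm);                /* between-group sum of squares       */
   sse = ssq(ym - rm*j(1,n,1));           /* within-group sum of squares        */
   msa = ssa/(a-1);  mse = sse/(a#(n-1));
   s2e = mse;                             /* equate MSE to sigma_e^2            */
   s2a = (msa - mse)/n;                   /* equate MSA to sigma_e^2+n sigma_a^2 */
   print s2a s2e;
quit;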
Method II
This is designed to capitalize on the easy computability of Method I and to broaden its use by
removing the limitation of Method I that it can not be used for mixed models. The method has
two parts. First make the temporary assumption that the random effects are fixed and for the
model
\[
y = X\beta + Zu + e
\]
estimate the fixed effects by least squares; the data adjusted for these estimates then follow a
model in which \(\hat\mu\) differs from \(\mu\) and K is known, and can thus easily be analysed
by Method I.
Method II is relatively easy to compute, especially when the number of fixed effects is not too
large. Although it can be used for a wide variety of mixed models, it cannot be used for
mixed models that have interactions between fixed and random factors, whether those
interactions are defined as random effects (the usual case) or as fixed effects.
Method III
This uses sums of squares that arise in fitting an overparameterised model and submodels
thereof. It can be used for any mixed model and yields unbiased estimators. Although
the method uses sums of squares that are known (at least in some cases) to be useful in certain
fixed effects models, no analytic evidence is available that these sums of squares have optimal
properties for estimating variances. The main disadvantage of this method is that, through its
confinement to sums of squares for fitting overparameterised models, there is the problem of
too many sums of squares being available. For example, consider the 2-way crossed
classification overparameterised model with equation
\[
y_{ijk} = \mu + \alpha_i + \beta_j + \tau_{ij} + e_{ijk} .
\]
Suppose all effects are random. There are then four variance components to estimate:
\(\sigma_\alpha^2\), \(\sigma_\beta^2\), \(\sigma_\tau^2\) and \(\sigma_e^2\). But for that model there are five
different sums of squares, \(R(\alpha \mid \mu)\), \(R(\beta \mid \mu, \alpha)\), \(R(\beta \mid \mu)\),
\(R(\alpha \mid \mu, \beta)\) and \(R(\tau \mid \mu, \alpha, \beta)\), as well as SSE, which can be used.
From these, at least three sets suggest themselves as possible candidates for Method III estimation:
(a) \(R(\alpha \mid \mu)\), \(R(\beta \mid \mu, \alpha)\), \(R(\tau \mid \mu, \alpha, \beta)\), SSE
(b) \(R(\beta \mid \mu)\), \(R(\alpha \mid \mu, \beta)\), \(R(\tau \mid \mu, \alpha, \beta)\), SSE
(c) \(R(\alpha \mid \mu, \beta)\), \(R(\beta \mid \mu, \alpha)\), \(R(\tau \mid \mu, \alpha, \beta)\), SSE
All three sets yield the same estimators of \(\sigma_\tau^2\) and \(\sigma_e^2\). Two different estimators
each of \(\sigma_\alpha^2\) and \(\sigma_\beta^2\) arise, and it is difficult to conclude which set of
estimators is to be preferred.
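In SAS, Method III-type (fitting-constants) estimates can be obtained from Type I (sequential) sums of squares, whose values depend on the order in which effects are fitted. A sketch, assuming a data set ab with variables a, b and y, of runs corresponding to sets (a) and (b):

proc varcomp method=type1 data=ab;   /* set (a): alpha fitted before beta */
   class a b;
   model y = a b a*b;
run;

proc varcomp method=type1 data=ab;   /* set (b): beta fitted before alpha */
   class a b;
   model y = b a a*b;
run;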
4. ML (Maximum Likelihood)
The method is based on maximizing the likelihood function. For the mixed model, under
the assumption of normality of the error terms and random effects, we have
\[
y = X\beta + Zu + e \sim N(X\beta, V)
\]
with
\[
V = \sum_{i=1}^{r} \sigma_i^2 Z_i Z_i' + \sigma_e^2 I_N = \sum_{i=0}^{r} \sigma_i^2 Z_i Z_i' ,
\]
where \(Z_0 = I_N\) and \(\sigma_0^2 = \sigma_e^2\).
The ML equations can be written in a variety of ways and solved iteratively. Despite the
numerical difficulties involved in solving these equations to obtain ML estimators of
variance components, the method is preferred over the ANOVA method: it is
well defined, and the resulting estimators have attractive, well-known large-sample properties;
they are asymptotically normally distributed and their sampling variances are known.
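For reference, the log-likelihood maximized here is the usual multivariate normal one (a standard expression, not spelled out in the notes above):
\[
l(\beta, \sigma_0^2, \ldots, \sigma_r^2; y) = -\tfrac{1}{2}\left[ N \log 2\pi + \log|V| + (y - X\beta)' V^{-1} (y - X\beta) \right],
\]
maximized subject to the constraint that the variance components keep \(V\) positive definite.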
5. REML (Restricted Maximum Likelihood)
REML maximizes the likelihood of error contrasts, linear combinations of the observations
whose expectation is zero, rather than of the data themselves. The computational problems in
obtaining solutions are the same as for the ML method. The REML estimation procedure does
not, however, include estimating β. On the other hand, the REML equations with balanced data
provide solutions that are identical to the ANOVA estimators, which are unbiased and have
attractive minimum variance properties. In this sense REML is said to take account of the
degrees of freedom involved in estimating the fixed effects, whereas ML estimators do not.
The easiest example of this is the case of a simple sample of n observations
from a \(N(\mu, \sigma^2)\) distribution. The two estimators of \(\sigma^2\) are
\[
\hat\sigma^2_{\mathrm{ML}} = \sum_{i} (x_i - \bar x)^2 / n
\qquad \text{and} \qquad
\hat\sigma^2_{\mathrm{REML}} = \sum_{i} (x_i - \bar x)^2 / (n-1).
\]
The REML estimator has taken account of the one degree of freedom required for estimating µ,
whereas the ML estimator has not. The REML estimator is also unbiased, but the ML
estimator is not. In the general case of unbalanced data, neither the ML nor the REML
estimator is unbiased.
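Both estimators are available in SAS; a sketch, using the example data set a with variables a, b and y defined later in these notes, with a fixed and b and a*b random:

proc mixed data=a method=ml;      /* ML estimates of the variance components */
   class a b;
   model y = a;                   /* a is fixed                              */
   random b a*b;                  /* b and a*b are random                    */
run;

proc mixed data=a method=reml;    /* REML (the PROC MIXED default)           */
   class a b;
   model y = a;
   random b a*b;
run;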
1. Newton-Raphson has a faster rate of convergence (quadratic near the maximum) than the
EM algorithm, whose convergence is linear and can be slow; the EM updates are
illustrated in the sketch following this list.
2. The robustness of the methods to their starting values (the ability to converge given poor
starting values) is the reverse of the rate of convergence: the EM algorithm is better
than Newton-Raphson.
3. The EM algorithm automatically takes care of inequality constraints imposed by the
parameter space. Other algorithms need specialized programming to incorporate
constraints.
4. Newton-Raphson and scoring generate an estimated asymptotic variance-covariance
matrix for the estimates as part of their calculations. At the end of the EM iterations,
special programming (perhaps a single step of Newton-Raphson) needs to be
employed to calculate asymptotic standard errors.
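A minimal sketch of the ML-EM iteration for the balanced one-way random model \(y_{ij} = \mu + a_i + e_{ij}\), treating the \(a_i\) as missing data; the data values and starting values are hypothetical:

proc iml;
   /* hypothetical balanced one-way data: q groups, n records per group */
   ym = {5.1 5.4 4.9,
         6.2 6.0 6.4,
         4.3 4.5 4.1};
   q = nrow(ym);  n = ncol(ym);  ntot = q#n;
   mu = ym[:];  s2a = 1;  s2e = 1;               /* starting values        */
   do iter = 1 to 200;
      lam  = n#s2a/(n#s2a + s2e);                /* shrinkage factor       */
      v    = s2a#s2e/(n#s2a + s2e);              /* Var(a_i | y)           */
      ahat = lam # (ym[,:] - mu);                /* E-step: E(a_i | y)     */
      res  = ym - (mu + ahat)*j(1,n,1);          /* y_ij - mu - ahat_i     */
      s2a  = (ssq(ahat) + q#v)/q;                /* M-step updates         */
      s2e  = (ssq(res) + ntot#v)/ntot;
      mu   = ym[:] - ahat[:];
   end;
   print mu s2a s2e;
quit;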
Only one MODEL statement is allowed. The BY, CLASS and MODEL statements are
described after the PROC VARCOMP statements.
BY statement
BY variables ;
A BY statement can be used with PROC VARCOMP to obtain separate analyses on
observations in groups determined by the BY variables.
CLASS statement
The CLASS statement specifies the classification variables to be used in the analysis.
MODEL statement
MODEL dependents = effects < / option > ;
The MODEL statement gives the dependent variables and independent effects. If more than
one dependent is specified, a separate analysis is performed for each one. Only one MODEL
statement is allowed, and only one option is available in it:
FIXED = n
Tells VARCOMP that the first n effects in the MODEL statement are fixed effects; the
remaining effects are assumed to be random. By default, PROC VARCOMP
assumes that all effects are random. For example, in the model Y = A|B with
FIXED=1, A is fixed while B and A x B are considered random effects.
Example: In this example, A & B are classification variables and Y is the dependent variable.
A is declared fixed, and B and A x B are random.
data a;
input a b y;
cards;
1 1 237
1 1 254
1 1 246
1 2 178
1 2 179
2 1 208
2 1 178
2 1 187
2 2 146
2 2 145
2 2 141
3 1 186
3 1 183
3 2 142
3 2 125
3 2 136
;
proc varcomp method=type1;
   class a b;
   model y = a|b / fixed=1;
run;
Exercise: The data given below are the first-month milk yields of 28 daughters of 4 sires in 3
herds. Estimate the variance components for these data (a possible SAS analysis is sketched
below).
Herd Sire Daughter Milk Yield
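A sketch of one way to analyse the exercise data, assuming they have been read into a data set milk with variables herd, sire and yield (names hypothetical), with herd fixed and sire random:

proc varcomp method=reml data=milk;
   class herd sire;
   model yield = herd sire / fixed=1;   /* herd fixed, sire random */
run;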
(d) Neither first nor second moments are known, and it is still desired to use linear
prediction methods.
(i) We never really know the parameters, but we may have good prior estimates of them, and
the predictor will be
(1) BP when we have good estimates of all parameters,
(2) BLP when we have good estimates of the first and second moments,
(3) BLUP when we have good estimates of the second central moments.
(ii) If we have no prior estimates of either first or second moments, we need to estimate them
from the same data that are used for prediction.
In practical situations most problems are of the type in which we assume that the variance-
covariance matrix of the random variables is known and, further, that the records follow a
mixed model. Two methods have been used most frequently. In the first, a regular least squares
solution is obtained by treating all random variables, except an error vector with variance Iσ2,
as fixed; the predictor is then a linear function of the least squares solution. In the second
method, estimates of the fixed effects of the model are obtained by some method, possibly by
regular least squares as in the first method; the data are adjusted for the fixed effects, and
selection index methods are then applied to the adjusted data as though the adjustment had
been made with known values of the fixed effects.
Henderson (1963) suggested a combination of these methods and described a mixed model
method which yields simultaneously the best linear unbiased estimators of estimable linear
functions of the fixed elements of the model and the best linear unbiased predictors of the
random elements of the model.
No assumptions are made concerning the distribution of the random variables; however, G and
R are assumed known without error and to be non-singular.
The predictand is \(K'\beta + M'u\) and the predictor is a linear function \(L'y\) of the records.
The variance-covariance matrix of the prediction errors is the matrix of interest, since we wish
to minimize each individual diagonal element. To do this we define this matrix algebraically,
as sketched below.
To ensure that the predictor is unbiased, i.e., has the same expected value as the predictand, we
add a Lagrange multiplier term to the variance-covariance matrix of prediction errors prior to
minimizing the function.
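In standard notation (a sketch consistent with the derivatives that follow), the prediction error variance is
\[
\mathrm{Var}(L'y - K'\beta - M'u) = L'VL - L'ZGM - M'GZ'L + M'GM, \qquad V = ZGZ' + R,
\]
so that the function to be minimized is \(F = L'VL - 2L'ZGM + M'GM + (L'X - K')\gamma\).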
We know that
\[
E(K'\beta + M'u) = K'\beta
\]
and
\[
E(L'y) = L'X\beta .
\]
Thus, in order that \(L'X\beta = K'\beta\) for all possible vectors \(\beta\), the condition
\(L'X - K' = 0\) must hold. Hence the Lagrange multiplier term becomes \((L'X - K')\gamma\).
The function F is differentiated with respect to the unknowns, L and \(\gamma\), and the
derivatives are equated to zero (null matrices):
\[
\frac{\partial F}{\partial L'} = 2VL - 2ZGM + X\gamma = 0
\]
and
\[
\frac{\partial F}{\partial \gamma} = L'X - K' = 0 .
\]
Note that the second derivative provides the condition which must hold in order for the
prediction to be unbiased.
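In the standard development (a sketch of the steps between the derivatives above and the equations below), one writes \(V = ZGZ' + R\) and sets
\[
S = G(Z'L - M), \qquad \theta = \gamma/2,
\]
so that the first derivative equation becomes \(RL = -ZS - X\theta\); the blocks \(C_{11}, C_{12}, C_{22}\) below denote a generalized inverse of the mixed model coefficient matrix
\[
\begin{bmatrix} Z'R^{-1}Z + G^{-1} & Z'R^{-1}X \\ X'R^{-1}Z & X'R^{-1}X \end{bmatrix}.
\]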
Then
\[
\begin{bmatrix} S \\ \theta \end{bmatrix}
= \begin{bmatrix} C_{11} & C_{12} \\ C_{12}' & C_{22} \end{bmatrix}
\begin{bmatrix} -M \\ -K \end{bmatrix}
\]
and
\[
RL = -ZS - X\theta
= \begin{bmatrix} Z & X \end{bmatrix}
\begin{bmatrix} C_{11} & C_{12} \\ C_{12}' & C_{22} \end{bmatrix}
\begin{bmatrix} M \\ K \end{bmatrix},
\qquad
L = R^{-1} \begin{bmatrix} Z & X \end{bmatrix}
\begin{bmatrix} C_{11} & C_{12} \\ C_{12}' & C_{22} \end{bmatrix}
\begin{bmatrix} M \\ K \end{bmatrix}
\]
and
\[
L'y = \begin{bmatrix} M' & K' \end{bmatrix}
\begin{bmatrix} C_{11} & C_{12} \\ C_{12}' & C_{22} \end{bmatrix}
\begin{bmatrix} Z'R^{-1}y \\ X'R^{-1}y \end{bmatrix}
= M'\hat u + K'\hat\beta
= \begin{bmatrix} M' & K' \end{bmatrix}
\begin{bmatrix} \hat u \\ \hat\beta \end{bmatrix}
\]
where
\[
\begin{bmatrix} \hat u \\ \hat\beta \end{bmatrix}
= \begin{bmatrix} Z'R^{-1}Z + G^{-1} & Z'R^{-1}X \\ X'R^{-1}Z & X'R^{-1}X \end{bmatrix}^{-}
\begin{bmatrix} Z'R^{-1}y \\ X'R^{-1}y \end{bmatrix}
\]
or
\[
\begin{bmatrix} \hat\beta \\ \hat u \end{bmatrix}
= \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}^{-}
\begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}.
\]
These equations are commonly referred to as Henderson's mixed model equations, and they
provide predictors with the smallest prediction error variances among all linear unbiased
predictors. A numerical sketch of setting up and solving them is given below.
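A minimal sketch in SAS/IML for a hypothetical data set of 6 records, with an overall mean as the only fixed effect and 3 random sire effects, assuming \(G = I\sigma_u^2\) and \(R = I\sigma_e^2\) with known (hypothetical) values:

proc iml;
   y   = {10, 12, 14, 11, 13, 15};         /* hypothetical records            */
   X   = j(6, 1, 1);                       /* fixed part: overall mean only   */
   Z   = design({1, 1, 2, 2, 3, 3});       /* incidence of the 3 sires        */
   s2u = 0.5;  s2e = 2.0;                  /* assumed variance components     */
   Gi  = inv(I(3) # s2u);                  /* inverse of G = I sigma_u^2      */
   Ri  = I(6)/s2e;                         /* inverse of R = I sigma_e^2      */
   lhs = (X`*Ri*X || X`*Ri*Z) //
         (Z`*Ri*X || Z`*Ri*Z + Gi);        /* Henderson's coefficient matrix  */
   rhs = (X`*Ri*y) // (Z`*Ri*y);
   sol = ginv(lhs)*rhs;                    /* [beta-hat; u-hat]               */
   print sol;
quit;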
This methodology can be extended to various situations, such as the individual animal model
and models for related sires.
The individual animal model can be written as
\[
y_i = x_i'\beta + z_i'u + a_i + e_i ,
\]
where β represents fixed effects, with \(x_i\) relating the record on the i-th animal to this vector;
u represents random effects other than breeding values, with \(z_i\) relating this vector to \(y_i\);
\(a_i\) is the additive genetic value of the i-th animal; and
\(e_i\) is a random error associated with the individual record.
If a represents only those animals with records, \(Z_a = I\); otherwise it is an identity matrix
with the rows deleted that correspond to animals without records. Further,
\[
\mathrm{Var}(u) = G, \qquad \mathrm{Var}(a) = A\sigma_a^2, \qquad \mathrm{Var}(e) = R \ (\text{usually } I\sigma_e^2),
\]
\[
\mathrm{Cov}(u, a') = 0, \qquad \mathrm{Cov}(u, e') = 0, \qquad \mathrm{Cov}(a, e') = 0 .
\]
If \(Z_a = I\), the system simplifies to
\[
\begin{bmatrix}
X'R^{-1}X & X'R^{-1}Z & X'R^{-1} \\
Z'R^{-1}X & Z'R^{-1}Z + G^{-1} & Z'R^{-1} \\
R^{-1}X & R^{-1}Z & R^{-1} + A^{-1}/\sigma_a^2
\end{bmatrix}
\begin{bmatrix} \beta^{o} \\ \hat u \\ \hat a \end{bmatrix}
=
\begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \\ R^{-1}y \end{bmatrix}.
\]
If \(R = I\sigma_e^2\), it further simplifies to
\[
\begin{bmatrix}
X'X & X'Z & X' \\
Z'X & Z'Z + G^{-1}\sigma_e^2 & Z' \\
X & Z & I + A^{-1}\sigma_e^2/\sigma_a^2
\end{bmatrix}
\begin{bmatrix} \beta^{o} \\ \hat u \\ \hat a \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z'y \\ y \end{bmatrix}.
\]
If the number of animals is large, one should, of course, use Henderson's (1976) method for
computing \(A^{-1}\). Since this method requires using a "base" population of non-inbred,
unrelated animals, some of these probably do not have records. Also, we may wish to evaluate
some progeny that have not yet made a record. Both of these circumstances will result in
\(Z_a \ne I\), but \(\hat a\) will contain predicted breeding values of the animals without records.
Similarly, the sire model can be written as
\[
y_{ij} = x_{ij}'\beta + s_i + z_{ij}'u + e_{ij} ,
\]
where β represents fixed effects, with \(x_{ij}\) relating the j-th progeny of the i-th sire to
these effects;
\(s_i\) represents the sire effect on the progeny record;
u represents other random factors, with \(z_{ij}\) relating these to the ij-th progeny record; and
\(e_{ij}\) is a random "error".
[Pedigree diagram for the illustration: base sire S0; sires S1 and S2 in the next generation;
S3, S4, S5 and S6 below them.]
There are no progeny on S0. Each sire has two progeny in each of the two contemporary
groups; the contemporary groups differ by 100 kg, and the calves by each sire within a
contemporary group differ by 100 kg. There are a total of 24 calves. Results obtained under
different models are as under:
Model                           S1     S2     S3     S4     S5     S6
Sires with progeny data        0.00  -3.64   3.64   0.00   1.82  -1.82
Groups + sires with progeny    1.82  -1.82   2.73  -0.91   0.91  -2.73
(under the second model, the two estimated group effects are 0.00 and 7.50)
The changes in the sire evaluations are best described by the changes in rank shown in Table 2.
Based on progeny data alone, no distinction can be made between sires one and four. The
addition of groups changes the ranks of one, four and six. The addition of relationships creates
more rank changes. Clearly the relationships are the leading contributing factor to the correct
ranking, since three is the best bull under all models, and he is the son of one, while one is the
son of the base sire.
The addition of the relationship matrix does two things for the prediction procedure. 1) It
provides RELATIONSHIP TIES among the animals in different contemporary groups.
Relationship ties do the same thing as a reference sire having progeny in many different
contemporary groups; this is an important aspect of including \(A^{-1}\). 2) It also gives
predictions that include the parental half-sib information that is available. The lower the
heritability of the trait, the more important this aspect of including the relationship inverse
becomes. This second aspect is equivalent to the selection index approach, which combines
sources of information into one predicted value.
References
Dempfle, L. (1977). "Comparison of several sire evaluation methods in dairy cattle breeding,"
Livestock Production Science, pp. 129-139.
Goldberger, A.S. (1962). "Best linear unbiased prediction in the generalised linear regression
model," JASA, 57, pp. 369-375.
Harville, D.A. (1976). "Extension of the Gauss-Markov theorem to include the estimation of
random effects," Ann. Statist., 4, pp. 384-395.
Harville, D.A. (1990). "BLUP (Best Linear Unbiased Prediction) and beyond," in Advances in
Statistical Methods for Genetic Improvement of Livestock (D. Gianola and K.
Hammond, eds.), pp. 239-276. Springer, New York.
Henderson, C.R. (1963). "Selection index and expected genetic advance," in Statistical Genetics
and Plant Breeding, pp. 141-163. Nat. Acad. Sci., Nat. Res. Council Publication 982,
Washington, DC.
Henderson, C.R. (1975). "Best linear unbiased estimation and prediction under a selection
model," Biometrics, 31, pp. 423-447.
Henderson, C.R. (1976). "A simple method for computing the inverse of a numerator
relationship matrix used in prediction of breeding values," Biometrics, 32, pp. 69-83.
Henderson, C.R. (1984). Applications of Linear Models in Animal Breeding. University of
Guelph.
Lindley, D.V. and Smith, A.F.M. (1972). "Bayes estimates for the linear model" (with
discussion), JRSS, Ser. B, 34, pp. 1-41.
Robinson, G.K. (1991). "That BLUP is a good thing: The estimation of random effects,"
Statistical Science, 6(1), pp. 15-51.