Variance Component Estimation & Best Linear Unbiased Prediction (BLUP)
1. Introduction
Variance components are commonly used in formulating appropriate designs, establishing
quality control procedures, and, in statistical genetics, in estimating heritabilities and genetic
correlations. Traditionally the estimators used most often have been the analysis of variance
(ANOVA) estimators, obtained by equating observed and expected mean squares from an
analysis of variance and solving the resulting equations. If the data are balanced, the ANOVA
estimators have many appealing properties. In unbalanced situations these properties rarely
hold, which creates a number of problems in arriving at correct decisions; and in practice
variance components are mostly estimated from unbalanced data. For unbalanced data, two
general classes of estimators have attracted considerable interest: maximum likelihood and
restricted maximum likelihood (ML and REML), and minimum norm and minimum variance
quadratic unbiased estimation (MINQUE and MIVQUE). The links between these methods are
also important. In addition to estimation problems in the unbalanced case, the notion of robust
estimation, which guards against the influence of outliers and departures from the underlying
statistical assumptions, is also of interest.
The classical least squares model contains only one random element, the random error; all other
effects are assumed to be fixed constants. For this class of models, the assumption of
independence of the \(\varepsilon_i\) implies independence of the \(y_i\); that is, if
\(\mathrm{Var}(\varepsilon) = I\sigma^2\), then \(\mathrm{Var}(y) = I\sigma^2\) also. Such models are
called fixed effects models, or more simply fixed models.
There are situations where there is more than one random term. The classical variance
components problem, in which the purpose is to estimate components of variance rather than
specific treatment effects, is one example. In these cases the "treatment effects" are assumed
to be a random sample from a population of such effects, and the goal of the study is to estimate
the variance among these effects in the population. The individual effects that happen to be
observed in the study are not of any particular interest except for the information they provide
on the variance component. Models in which all effects are assumed to be random are called
random models. Models that contain both fixed and random effects are called mixed models.
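For reference in what follows, the mixed model and its variance structure can be written (a standard formulation, with u the vector of random effects) as
\[
y = X\beta + Zu + e, \qquad u \sim (0, G), \quad e \sim (0, R), \quad \mathrm{Cov}(u, e') = 0,
\]
so that
\[
\mathrm{Var}(y) = V = ZGZ' + R .
\]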
Method I
In Method I the quadratics used are analogous to the sums of squares used for balanced data,
the analogy being such that certain sums of squares in balanced data become, for unbalanced
data, quadratic forms that are not non-negative definite, and so are not sums of squares.
Thus, for example, for the 2-way cross classification with n observations per cell, the sum of
squares for the first factor is
\[
\sum_{i=1}^{a} bn(\bar y_{i..} - \bar y_{...})^2 = \sum_{i=1}^{a} bn\,\bar y_{i..}^{\,2} - abn\,\bar y_{...}^{\,2}.
\]
This method is easy to compute, even for large data sets, and for random models it yields
unbiased estimators. It cannot be used for mixed models: it can only be adapted to a mixed
model by altering that model and treating the fixed effects either as non-existent or as
random, in which case the estimated variance components for the true random effects will be
biased.
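As an illustration, a minimal sketch of ANOVA (Method I-type) estimation for the balanced one-way random model \(y_{ij} = \mu + a_i + e_{ij}\), equating mean squares to their expectations \(E(\mathrm{MSA}) = \sigma_e^2 + n\sigma_a^2\) and \(E(\mathrm{MSE}) = \sigma_e^2\); the data values here are hypothetical:

proc iml;
   /* hypothetical balanced one-way data: a = 3 groups, n = 3 records each */
   ym = {5.1 5.4 4.9,
         6.2 6.0 6.4,
         4.3 4.5 4.1};                    /* rows = groups                      */
   a  = nrow(ym);  n = ncol(ym);
   gm  = ym[:];                           /* grand mean                         */
   rm  = ym[,:];                          /* group (row) means                  */
   ssa = n # ssq(rm - gm);                /* between-group sum of squares       */
   sse = ssq(ym - rm*j(1,n,1));           /* within-group sum of squares        */
   msa = ssa/(a-1);  mse = sse/(a#(n-1));
   s2e = mse;                             /* equate MSE to sigma_e^2            */
   s2a = (msa - mse)/n;                   /* equate MSA to sigma_e^2+n sigma_a^2 */
   print s2a s2e;
quit;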
Method II
This is designed to capitalize on the easy computability of Method I and to broaden its use by
removing the limitation of Method I that it can not be used for mixed models. The method has
two parts. First make the temporary assumption that the random effects are fixed and for the
model
\[
y = X\beta + Zu + e
\]
estimate the fixed effects by least squares; the data adjusted for these estimates then follow a
model in which \(\hat\mu\) differs from \(\mu\) and K is known, and can thus easily be analysed
by Method I.
Method II is relatively easy to compute, especially when the number of fixed effects is not too
large. Although it can be used for a wide variety of mixed models, it cannot be used for
mixed models that have interactions between fixed and random factors, whether those
interactions are defined as random effects (the usual case) or as fixed effects.
Method III
This uses sums of squares that arise in fitting an overparameterised model and submodels
thereof. It can be used for any mixed model and yields unbiased estimators. Although
the method uses sums of squares that are known (at least in some cases) to be useful in certain
fixed effects models, no analytic evidence is available that these sums of squares have optimal
properties for estimating variances. The main disadvantage of this method is that, through its
confinement to sums of squares for fitting overparameterised models, there is the problem of
too many sums of squares being available. For example, consider the 2-way crossed
classification overparameterised model with equation
\[
y_{ijk} = \mu + \alpha_i + \beta_j + \tau_{ij} + e_{ijk} .
\]
Suppose all effects are random. There are then four variance components to estimate:
\(\sigma_\alpha^2\), \(\sigma_\beta^2\), \(\sigma_\tau^2\) and \(\sigma_e^2\). But for that model there are five
different sums of squares, \(R(\alpha \mid \mu)\), \(R(\beta \mid \mu, \alpha)\), \(R(\beta \mid \mu)\),
\(R(\alpha \mid \mu, \beta)\) and \(R(\tau \mid \mu, \alpha, \beta)\), as well as SSE, which can be used.
From these, at least three sets suggest themselves as possible candidates for Method III estimation:
(a) \(R(\alpha \mid \mu)\), \(R(\beta \mid \mu, \alpha)\), \(R(\tau \mid \mu, \alpha, \beta)\), SSE
(b) \(R(\beta \mid \mu)\), \(R(\alpha \mid \mu, \beta)\), \(R(\tau \mid \mu, \alpha, \beta)\), SSE
(c) \(R(\alpha \mid \mu, \beta)\), \(R(\beta \mid \mu, \alpha)\), \(R(\tau \mid \mu, \alpha, \beta)\), SSE
All three sets yield the same estimators of \(\sigma_\tau^2\) and \(\sigma_e^2\). Two different estimators
each of \(\sigma_\alpha^2\) and \(\sigma_\beta^2\) arise, and it is difficult to conclude which set of
estimators is to be preferred.
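In SAS, Method III-type (fitting-constants) estimates can be obtained from Type I (sequential) sums of squares, whose values depend on the order in which effects are fitted. A sketch, assuming a data set ab with variables a, b and y, of runs corresponding to sets (a) and (b):

proc varcomp method=type1 data=ab;   /* set (a): alpha fitted before beta */
   class a b;
   model y = a b a*b;
run;

proc varcomp method=type1 data=ab;   /* set (b): beta fitted before alpha */
   class a b;
   model y = b a a*b;
run;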
4. ML (Maximum Likelihood)
The method is based on maximizing the likelihood function. For the mixed model, under
the assumption of normality of the error terms and random effects, we have
\[
y = X\beta + Zu + e \sim N(X\beta, V)
\]
with
\[
V = \sum_{i=1}^{r} \sigma_i^2 Z_i Z_i' + \sigma_e^2 I_N = \sum_{i=0}^{r} \sigma_i^2 Z_i Z_i' ,
\]
where \(Z_0 = I_N\) and \(\sigma_0^2 = \sigma_e^2\).
The ML equations can be written in a variety of ways and solved iteratively. Despite the
numerical difficulties involved in solving these equations to obtain ML estimators of
variance components, the method is preferred over the ANOVA method: it is
well defined, and the resulting estimators have attractive, well-known large-sample properties;
they are asymptotically normally distributed and their sampling variances are known.
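For reference, the log-likelihood maximized here is the usual multivariate normal one (a standard expression, not spelled out in the notes above):
\[
l(\beta, \sigma_0^2, \ldots, \sigma_r^2; y) = -\tfrac{1}{2}\left[ N \log 2\pi + \log|V| + (y - X\beta)' V^{-1} (y - X\beta) \right],
\]
maximized subject to the constraint that the variance components keep \(V\) positive definite.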
5. REML (Restricted Maximum Likelihood)
REML maximizes the likelihood of error contrasts, linear combinations of the observations
whose expectation is zero, rather than of the data themselves. The computational problems in
obtaining solutions are the same as for the ML method. The REML estimation procedure does
not, however, include estimating β. On the other hand, the REML equations with balanced data
provide solutions that are identical to the ANOVA estimators, which are unbiased and have
attractive minimum variance properties. In this sense REML is said to take account of the
degrees of freedom involved in estimating the fixed effects, whereas ML estimators do not.
The easiest example of this is the case of a simple sample of n observations
from a \(N(\mu, \sigma^2)\) distribution. The two estimators of \(\sigma^2\) are
\[
\hat\sigma^2_{\mathrm{ML}} = \sum_{i} (x_i - \bar x)^2 / n
\qquad \text{and} \qquad
\hat\sigma^2_{\mathrm{REML}} = \sum_{i} (x_i - \bar x)^2 / (n-1).
\]
The REML estimator has taken account of the one degree of freedom required for estimating µ,
whereas the ML estimator has not. The REML estimator is also unbiased, but the ML
estimator is not. In the general case of unbalanced data, neither the ML nor the REML
estimator is unbiased.
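Both estimators are available in SAS; a sketch, using the example data set a with variables a, b and y defined later in these notes, with a fixed and b and a*b random:

proc mixed data=a method=ml;      /* ML estimates of the variance components */
   class a b;
   model y = a;                   /* a is fixed                              */
   random b a*b;                  /* b and a*b are random                    */
run;

proc mixed data=a method=reml;    /* REML (the PROC MIXED default)           */
   class a b;
   model y = a;
   random b a*b;
run;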
1. Newton-Raphson has a faster rate of convergence (quadratic near the maximum) than the
EM algorithm, whose convergence is linear and can be slow; the EM updates are
illustrated in the sketch following this list.
2. The robustness of the methods to their starting values (the ability to converge given poor
starting values) is the reverse of the rate of convergence: the EM algorithm is better
than Newton-Raphson.
3. The EM algorithm automatically takes care of inequality constraints imposed by the
parameter space. Other algorithms need specialized programming to incorporate
constraints.
4. Newton-Raphson and scoring generate an estimated asymptotic variance-covariance
matrix for the estimates as part of their calculations. At the end of the EM iterations,
special programming (perhaps a single step of Newton-Raphson) needs to be
employed to calculate asymptotic standard errors.
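A minimal sketch of the ML-EM iteration for the balanced one-way random model \(y_{ij} = \mu + a_i + e_{ij}\), treating the \(a_i\) as missing data; the data values and starting values are hypothetical:

proc iml;
   /* hypothetical balanced one-way data: q groups, n records per group */
   ym = {5.1 5.4 4.9,
         6.2 6.0 6.4,
         4.3 4.5 4.1};
   q = nrow(ym);  n = ncol(ym);  ntot = q#n;
   mu = ym[:];  s2a = 1;  s2e = 1;               /* starting values        */
   do iter = 1 to 200;
      lam  = n#s2a/(n#s2a + s2e);                /* shrinkage factor       */
      v    = s2a#s2e/(n#s2a + s2e);              /* Var(a_i | y)           */
      ahat = lam # (ym[,:] - mu);                /* E-step: E(a_i | y)     */
      res  = ym - (mu + ahat)*j(1,n,1);          /* y_ij - mu - ahat_i     */
      s2a  = (ssq(ahat) + q#v)/q;                /* M-step updates         */
      s2e  = (ssq(res) + ntot#v)/ntot;
      mu   = ym[:] - ahat[:];
   end;
   print mu s2a s2e;
quit;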
Only one MODEL statement is allowed. The BY, CLASS and MODEL statements are
described after the PROC VARCOMP statements.
BY statement
BY variables ;
A BY statement can be used with PROC VARCOMP to obtain separate analyses on
observations in groups determined by the BY variables.
CLASS statement
The CLASS statement specifies the classification variables to be used in the analysis.
MODEL statement
MODEL dependents = effects < / option > ;
The MODEL statement gives the dependent variables and independent effects. If more than
one dependent is specified, a separate analysis is performed for each one. Only one MODEL
statement is allowed, and only one option is available in it:
FIXED = n
Tells VARCOMP that the first n effects in the MODEL statement are fixed effects; the
remaining effects are assumed to be random. By default, PROC VARCOMP
assumes that all effects are random. For example, in the model Y = A|B with
FIXED=1, A is fixed while B and A x B are considered random effects.
Example: In this example, A & B are classification variables and Y is the dependent variable.
A is declared fixed, and B and A x B are random.
data a;
input a b y;
cards;
1 1 237
1 1 254
1 1 246
1 2 178
1 2 179
2 1 208
2 1 178
2 1 187
2 2 146
2 2 145
2 2 141
3 1 186
3 1 183
3 2 142
3 2 125
3 2 136
;
proc varcomp method=type1;
   class a b;
   model y = a|b / fixed=1;
run;
Exercise: The data given below are the first-month milk yields of 28 daughters of 4 sires in 3
herds. Estimate the variance components for these data (a possible SAS analysis is sketched
below).
Herd Sire Daughter Milk Yield
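A sketch of one way to analyse the exercise data, assuming they have been read into a data set milk with variables herd, sire and yield (names hypothetical), with herd fixed and sire random:

proc varcomp method=reml data=milk;
   class herd sire;
   model yield = herd sire / fixed=1;   /* herd fixed, sire random */
run;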
(d) Neither first nor second moments are known, and it is still desired to use linear
prediction methods.
(i) We never really know the parameters, but we may have good prior estimates of them, and
the predictor will be
(1) BP when we have good estimates of all parameters,
(2) BLP when we have good estimates of the first and second moments,
(3) BLUP when we have good estimates of the second central moments.
(ii) If we have no prior estimates of either first or second moments, we need to estimate them
from the same data that are used for prediction.
In practical situations most problems are of the type in which we assume that the variance-
covariance matrix of the random variables is known and, further, that the records follow a
mixed model. Two methods have been used most frequently. In the first, a regular least squares
solution is obtained by treating all random variables, except an error vector with variance Iσ2,
as fixed; the predictor is then a linear function of the least squares solution. In the second
method, estimates of the fixed effects of the model are obtained by some method, possibly by
regular least squares as in the first method; the data are adjusted for the fixed effects, and
selection index methods are then applied to the adjusted data as though the adjustment had
been made with known values of the fixed effects.
Henderson (1963) suggested a combination of these methods and described a mixed model
method which yields simultaneously the best linear unbiased estimators of estimable linear
functions of the fixed elements of the model and the best linear unbiased predictors of the
random elements of the model.
No assumptions are made concerning the distribution of the random variables; however, G and
R are assumed known without error and to be non-singular.
The predictand is \(K'\beta + M'u\) and the predictor is a linear function \(L'y\) of the records.
The variance-covariance matrix of the prediction errors is the matrix of interest, since we wish
to minimize each individual diagonal element. To do this we define this matrix algebraically,
as sketched below.
To ensure that the predictor is unbiased, i.e., has the same expected value as the predictand, we
add a Lagrange multiplier term to the variance-covariance matrix of prediction errors prior to
minimizing the function.
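In standard notation (a sketch consistent with the derivatives that follow), the prediction error variance is
\[
\mathrm{Var}(L'y - K'\beta - M'u) = L'VL - L'ZGM - M'GZ'L + M'GM, \qquad V = ZGZ' + R,
\]
so that the function to be minimized is \(F = L'VL - 2L'ZGM + M'GM + (L'X - K')\gamma\).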
We know that
\[
E(K'\beta + M'u) = K'\beta
\]
and
\[
E(L'y) = L'X\beta .
\]
Thus, in order that \(L'X\beta = K'\beta\) for all possible vectors \(\beta\), the condition
\(L'X - K' = 0\) must hold. Hence the Lagrange multiplier term becomes \((L'X - K')\gamma\).
The function F is differentiated with respect to the unknowns, L and \(\gamma\), and the
derivatives are equated to zero (null matrices):
\[
\frac{\partial F}{\partial L'} = 2VL - 2ZGM + X\gamma = 0
\]
and
\[
\frac{\partial F}{\partial \gamma} = L'X - K' = 0 .
\]
Note that the second derivative provides the condition which must hold in order for the
prediction to be unbiased.
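In the standard development (a sketch of the steps between the derivatives above and the equations below), one writes \(V = ZGZ' + R\) and sets
\[
S = G(Z'L - M), \qquad \theta = \gamma/2,
\]
so that the first derivative equation becomes \(RL = -ZS - X\theta\); the blocks \(C_{11}, C_{12}, C_{22}\) below denote a generalized inverse of the mixed model coefficient matrix
\[
\begin{bmatrix} Z'R^{-1}Z + G^{-1} & Z'R^{-1}X \\ X'R^{-1}Z & X'R^{-1}X \end{bmatrix}.
\]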
Then
\[
\begin{bmatrix} S \\ \theta \end{bmatrix}
= \begin{bmatrix} C_{11} & C_{12} \\ C_{12}' & C_{22} \end{bmatrix}
\begin{bmatrix} -M \\ -K \end{bmatrix}
\]
and
\[
RL = -ZS - X\theta
= \begin{bmatrix} Z & X \end{bmatrix}
\begin{bmatrix} C_{11} & C_{12} \\ C_{12}' & C_{22} \end{bmatrix}
\begin{bmatrix} M \\ K \end{bmatrix},
\qquad
L = R^{-1} \begin{bmatrix} Z & X \end{bmatrix}
\begin{bmatrix} C_{11} & C_{12} \\ C_{12}' & C_{22} \end{bmatrix}
\begin{bmatrix} M \\ K \end{bmatrix}
\]
and
\[
L'y = \begin{bmatrix} M' & K' \end{bmatrix}
\begin{bmatrix} C_{11} & C_{12} \\ C_{12}' & C_{22} \end{bmatrix}
\begin{bmatrix} Z'R^{-1}y \\ X'R^{-1}y \end{bmatrix}
= M'\hat u + K'\hat\beta
= \begin{bmatrix} M' & K' \end{bmatrix}
\begin{bmatrix} \hat u \\ \hat\beta \end{bmatrix}
\]
where
\[
\begin{bmatrix} \hat u \\ \hat\beta \end{bmatrix}
= \begin{bmatrix} Z'R^{-1}Z + G^{-1} & Z'R^{-1}X \\ X'R^{-1}Z & X'R^{-1}X \end{bmatrix}^{-}
\begin{bmatrix} Z'R^{-1}y \\ X'R^{-1}y \end{bmatrix}
\]
or
\[
\begin{bmatrix} \hat\beta \\ \hat u \end{bmatrix}
= \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}^{-}
\begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}.
\]
These equations are commonly referred to as Henderson's mixed model equations, and they
provide predictors with the smallest prediction error variances among all linear unbiased
predictors. A numerical sketch of setting up and solving them is given below.
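A minimal sketch in SAS/IML for a hypothetical data set of 6 records, with an overall mean as the only fixed effect and 3 random sire effects, assuming \(G = I\sigma_u^2\) and \(R = I\sigma_e^2\) with known (hypothetical) values:

proc iml;
   y   = {10, 12, 14, 11, 13, 15};         /* hypothetical records            */
   X   = j(6, 1, 1);                       /* fixed part: overall mean only   */
   Z   = design({1, 1, 2, 2, 3, 3});       /* incidence of the 3 sires        */
   s2u = 0.5;  s2e = 2.0;                  /* assumed variance components     */
   Gi  = inv(I(3) # s2u);                  /* inverse of G = I sigma_u^2      */
   Ri  = I(6)/s2e;                         /* inverse of R = I sigma_e^2      */
   lhs = (X`*Ri*X || X`*Ri*Z) //
         (Z`*Ri*X || Z`*Ri*Z + Gi);        /* Henderson's coefficient matrix  */
   rhs = (X`*Ri*y) // (Z`*Ri*y);
   sol = ginv(lhs)*rhs;                    /* [beta-hat; u-hat]               */
   print sol;
quit;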
This methodology can be extended to various situations, such as the individual animal model
and models for related sires.
The individual animal model can be written as
\[
y_i = x_i'\beta + z_i'u + a_i + e_i ,
\]
where β represents fixed effects, with \(x_i\) relating the record on the i-th animal to this vector;
u represents random effects other than breeding values, with \(z_i\) relating this vector to \(y_i\);
\(a_i\) is the additive genetic value of the i-th animal; and
\(e_i\) is a random error associated with the individual record.
If a represents only those animals with records, \(Z_a = I\); otherwise it is an identity matrix
with the rows deleted that correspond to animals without records. Further,
\[
\mathrm{Var}(u) = G, \qquad \mathrm{Var}(a) = A\sigma_a^2, \qquad \mathrm{Var}(e) = R \ (\text{usually } I\sigma_e^2),
\]
\[
\mathrm{Cov}(u, a') = 0, \qquad \mathrm{Cov}(u, e') = 0, \qquad \mathrm{Cov}(a, e') = 0 .
\]
If \(Z_a = I\), the system simplifies to
\[
\begin{bmatrix}
X'R^{-1}X & X'R^{-1}Z & X'R^{-1} \\
Z'R^{-1}X & Z'R^{-1}Z + G^{-1} & Z'R^{-1} \\
R^{-1}X & R^{-1}Z & R^{-1} + A^{-1}/\sigma_a^2
\end{bmatrix}
\begin{bmatrix} \beta^{o} \\ \hat u \\ \hat a \end{bmatrix}
=
\begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \\ R^{-1}y \end{bmatrix}.
\]
If \(R = I\sigma_e^2\), it further simplifies to
\[
\begin{bmatrix}
X'X & X'Z & X' \\
Z'X & Z'Z + G^{-1}\sigma_e^2 & Z' \\
X & Z & I + A^{-1}\sigma_e^2/\sigma_a^2
\end{bmatrix}
\begin{bmatrix} \beta^{o} \\ \hat u \\ \hat a \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z'y \\ y \end{bmatrix}.
\]
If the number of animals is large, one should, of course, use Henderson's (1976) method for
computing \(A^{-1}\). Since this method requires using a "base" population of non-inbred,
unrelated animals, some of these probably do not have records. Also, we may wish to evaluate
some progeny that have not yet made a record. Both of these circumstances will result in
\(Z_a \ne I\), but \(\hat a\) will contain predicted breeding values of the animals without records.
Similarly, the sire model can be written as
\[
y_{ij} = x_{ij}'\beta + s_i + z_{ij}'u + e_{ij} ,
\]
where β represents fixed effects, with \(x_{ij}\) relating the j-th progeny of the i-th sire to
these effects;
\(s_i\) represents the sire effect on the progeny record;
u represents other random factors, with \(z_{ij}\) relating these to the ij-th progeny record; and
\(e_{ij}\) is a random "error".
[Pedigree diagram for the illustration: base sire S0; sires S1 and S2 in the next generation;
S3, S4, S5 and S6 below them.]
There are no progeny on S0. Each sire has two progeny in each of the two contemporary
groups; the contemporary groups differ by 100 kg, and the calves by each sire within a
contemporary group differ by 100 kg. There are a total of 24 calves. Results obtained under
different models are as under:
Model                           S1     S2     S3     S4     S5     S6
Sires with progeny data        0.00  -3.64   3.64   0.00   1.82  -1.82
Groups + sires with progeny    1.82  -1.82   2.73  -0.91   0.91  -2.73
(under the second model, the two estimated group effects are 0.00 and 7.50)
The changes in the sire evaluations are best described by the changes in rank shown in Table 2.
Based on progeny data alone, no distinction can be made between sires one and four. The
addition of groups changes the ranks of one, four and six. The addition of relationships creates
more rank changes. Clearly the relationships are the leading contributing factor to the correct
ranking, since three is the best bull under all models, and he is the son of one, while one is the
son of the base sire.
The addition of the relationship matrix does two things for the prediction procedure. 1) It
provides RELATIONSHIP TIES among the animals in different contemporary groups.
Relationship ties do the same thing as a reference sire having progeny in many different
contemporary groups; this is an important aspect of including \(A^{-1}\). 2) It also gives
predictions that include the parental half-sib information that is available. The lower the
heritability of the trait, the more important this aspect of including the relationship inverse
becomes. This second aspect is equivalent to the selection index approach, which combines
sources of information into one predicted value.
References
Dempfle, L. (1977). "Comparison of several sire evaluation methods in dairy cattle breeding,"
Livestock Production Science, pp. 129-139.
Goldberger, A.S. (1962). "Best linear unbiased prediction in the generalised linear regression
model," JASA, 57, pp. 369-375.
Harville, D.A. (1976). "Extension of the Gauss-Markov theorem to include the estimation of
random effects," Ann. Statist., 4, pp. 384-395.
Harville, D.A. (1990). "BLUP (Best Linear Unbiased Prediction) and beyond," in Advances in
Statistical Methods for Genetic Improvement of Livestock (D. Gianola and K.
Hammond, eds.), pp. 239-276. Springer, New York.
Henderson, C.R. (1963). "Selection index and expected genetic advance," in Statistical Genetics
and Plant Breeding, pp. 141-163. Nat. Acad. Sci., Nat. Res. Council Publication 982,
Washington, DC.
Henderson, C.R. (1975). "Best linear unbiased estimation and prediction under a selection
model," Biometrics, 31, pp. 423-447.
Henderson, C.R. (1976). "A simple method for computing the inverse of a numerator
relationship matrix used in prediction of breeding values," Biometrics, 32, pp. 69-83.
Henderson, C.R. (1984). Applications of Linear Models in Animal Breeding. University of
Guelph.
Lindley, D.V. and Smith, A.F.M. (1972). "Bayes estimates for the linear model" (with
discussion), JRSS, Ser. B, 34, pp. 1-41.
Robinson, G.K. (1991). "That BLUP is a good thing: The estimation of random effects,"
Statistical Science, 6(1), pp. 15-51.