This LaTeX document is available as postscript or as Adobe PDF.
Method 1 was derived by Dr. C. R. Henderson in 1949 as a result of his Ph.D. thesis work at Iowa State University. The method was not published until 1953, along with Methods 2 and 3, in one of Henderson's landmark papers in Biometrics. Method 1 applies the analysis of variance (AOV) procedure for balanced data to unbalanced data, and is applicable only to models in which every factor other than the mean is random. Method 1 is theoretically derived to be unbiased and translation invariant, but only under a model of completely random mating and no selection. It is probably the easiest of all methods to calculate.
Definition of Quadratic Forms
Three types of quadratic forms are defined for Henderson's Method 1.
They are
Each of the Q-matrices in Method 1 has the following properties:
Expectations of Quadratic Forms
The expectations of the Method 1 quadratic forms are easy to derive.
Recall that
Now concentrate on
for each of the three types of quadratic
forms. For the Total Sum of Squares,
Obtaining Estimates
Let
be the vector of quadratic forms, i.e.
Traditionally with balanced AOV tables, the sum of squares due to the
mean was usually subtracted from the other sum of squares. Similarly,
find any matrix
such that
and has full row rank, then
An Example
Consider the case of milk yields by daughters of dairy bulls in several different herds. Let s, the number of random factors in the model, be two (herds and sires); thus there are three components of variance to estimate. Data for analysis appear in the table below. Milk yields are given in percentage units relative to a breed mean for cows of a certain age and month of calving. The total sum of squares is 390,729.
Herd | Sire | Daughter Milk Yields | Subclass Total (n)
1 | 1 | 157, 160, 138 | 455 (3)
1 | 2 | 96, 110, 115, 120 | 441 (4)
1 | 3 | 82, 65 | 147 (2)
1 | 4 | 120, 130, 110 | 360 (3)
2 | 1 | 140, 142, 145 | 427 (3)
2 | 2 | 122, 117, 98 | 337 (3)
2 | 3 | 70, 94 | 164 (2)
3 | 2 | 112, 125, 105 | 342 (3)
3 | 3 | 100, 92 | 192 (2)
3 | 4 | 116, 129, 133 | 378 (3)
Totals | | Number | Sum
By Sire | 1 | 6 | 882
 | 2 | 10 | 1120
 | 3 | 6 | 503
 | 4 | 6 | 738
By Herd | 1 | 12 | 1403
 | 2 | 8 | 928
 | 3 | 8 | 912
Overall | | 28 | 3243
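Method 1's quadratic forms for this example reduce to uncorrected sums of squares of subclass totals. A minimal Python sketch, using only the herd and sire totals tabled above, computes the mean, herd, and sire sums of squares:

```python
# Uncorrected sums of squares for Henderson's Method 1,
# computed from the herd and sire totals in the table above.
herd = {1: (12, 1403), 2: (8, 928), 3: (8, 912)}   # herd: (n, sum)
sire = {1: (6, 882), 2: (10, 1120), 3: (6, 503), 4: (6, 738)}
N, y_total = 28, 3243

mss = y_total ** 2 / N                           # SS due to the mean
hss = sum(s ** 2 / n for n, s in herd.values())  # SS among herd totals
sss = sum(s ** 2 / n for n, s in sire.values())  # SS among sire totals
tss = 390729                                     # total SS, given in the text
```

Each sum of squares exceeds the mean sum of squares (3243^2/28 = 375,608.89), as it must, and all are below the total sum of squares of 390,729.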
Quadratic Forms
Expectations
Solving for Estimates
Sampling Variances of Estimates
Let TSS, MSS, HSS, and SSS represent the quadratic forms for the total sum of squares, the mean, herds, and sires, respectively. The coefficients of the unknown parameters for computing the variances and covariances of the corresponding quadratic forms are given in the following table.
TSS | 56. | 112. | 112. | 544. | 328. | 416. |
TSS,MSS | 2. | 38.8571 | 29.7143 | 196.5714 | 288. | 117.7143 |
TSS,HSS | 6. | 112. | 34.6667 | 544. | 328. | 132.6667 |
TSS,SSS | 8. | 45.6 | 112. | 224.5333 | 328. | 416. |
MSS | 2. | 38.8571 | 29.7143 | 188.7347 | 288.6531 | 110.3673 |
MSS,HSS | 2. | 38.8571 | 29.7143 | 196.5714 | 288. | 110.3810 |
MSS,SSS | 2. | 38.8571 | 29.7143 | 188.8762 | 288. | 117.7143 |
HSS | 6. | 112. | 34.6667 | 544. | 328. | 112.9514 |
HSS,SSS | 2.4111 | 45.6 | 34.6667 | 224.5333 | 328. | 132.6667 |
SSS | 8. | 45.6 | 112. | 193.5867 | 328. | 416. |
Now assume the true components of variance have the following values:
and
Post-multiply the table of coefficients above by the vector whose elements correspond to the columns of the table, and store the results in matrix form as
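The post-multiplication just described is an ordinary matrix-vector product. A sketch with numpy follows, with the coefficient rows transcribed from the table above; the parameter vector `v` is purely hypothetical (the true values assumed in the notes are not reproduced here), so the resulting variances and covariances are illustrative only:

```python
import numpy as np

# Coefficient rows, in the order:
# TSS; TSS,MSS; TSS,HSS; TSS,SSS; MSS; MSS,HSS; MSS,SSS; HSS; HSS,SSS; SSS
K = np.array([
    [56.,    112.,    112.,    544.,     328.,     416.],
    [2.,     38.8571, 29.7143, 196.5714, 288.,     117.7143],
    [6.,     112.,    34.6667, 544.,     328.,     132.6667],
    [8.,     45.6,    112.,    224.5333, 328.,     416.],
    [2.,     38.8571, 29.7143, 188.7347, 288.6531, 110.3673],
    [2.,     38.8571, 29.7143, 196.5714, 288.,     110.3810],
    [2.,     38.8571, 29.7143, 188.8762, 288.,     117.7143],
    [6.,     112.,    34.6667, 544.,     328.,     112.9514],
    [2.4111, 45.6,    34.6667, 224.5333, 328.,     132.6667],
    [8.,     45.6,    112.,    193.5867, 328.,     416.],
])

v = np.ones(6)    # hypothetical parameter products for the 6 columns
var_cov = K @ v   # variances/covariances of the 10 quadratic forms
```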
Least Squares Equations
The quadratic forms (except for TSS) and their expectations
can also be obtained from Ordinary Least Squares equations. For the
example data, the OLS equations are
Problem
Apply Method 1 to the following data using a model with
three random factors. Compute the sampling variances of the
estimates.
The main drawback of Method 1 was that only completely random models could be handled. Methods 2 and 3 accommodate mixed models. Method 2 is an unbiased, translation invariant procedure, and can be used with mixed models provided that interactions between fixed and random factors, or nesting of random factors within fixed factors, do not exist in the model.
Method 2 requires the least squares (LS) equations for all factors in the model. A solution vector is obtained by taking a generalized inverse of the coefficient matrix of the LS equations. Because there is an infinite number of possible solution vectors to the LS equations, Searle (1968) maintained for many years that Method 2 was not uniquely defined; that is, different solution vectors could lead to different estimates of the variance components. This was later proven to be incorrect: Method 2 is unique if Henderson's directions are followed exactly (Henderson, Searle, and Schaeffer 1974).
The Calculations
Henderson's Method 2 involves the following steps:
Theoretical Development
Let
A generalized inverse is obtained by setting the rows and columns
corresponding to
and
to zero and
inverting the remaining elements.
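The zero-out construction of a generalized inverse described above can be sketched directly; the small singular coefficient matrix `A` below is hypothetical, chosen so that its third row is a linear combination of the others:

```python
import numpy as np

def ginverse_zero(A, drop):
    """Generalized inverse of A obtained by zeroing the rows and
    columns listed in `drop` and inverting the remaining submatrix."""
    keep = [i for i in range(A.shape[0]) if i not in set(drop)]
    G = np.zeros_like(A, dtype=float)
    G[np.ix_(keep, keep)] = np.linalg.inv(A[np.ix_(keep, keep)])
    return G

# Hypothetical singular matrix: row 0 equals row 1 plus row 2.
A = np.array([[4., 2., 2.],
              [2., 2., 0.],
              [2., 0., 2.]])
G = ginverse_zero(A, drop=[2])
# G satisfies A G A = A, the defining property of a generalized inverse.
```

The check `A @ G @ A == A` holds precisely because the dropped row and column are linearly dependent on the ones kept.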
Let
To apply Method 1 to
requires that the model for
be completely random except for a common mean effect.
The proof of this is given by Henderson et al. (1974). The model
equation for
is
Consequently,
The coefficient of
in the expectation of
is
An Example
Below are the LS equations for a two-way classification model with
one fixed factor and one random factor. Let factor A be the fixed
factor and let factor B be random. The total sum of squares of the
observations was 1,979.
Now adjust
for the estimates of the fixed effects to
give
Suppose that
and
were set to zero rather than
and
in obtaining solutions to the OLS equations.
The reader should verify the following results.
Problems
Method 2 does not allow models to contain interactions between fixed and random factors, or nesting of random factors within fixed factors. Henderson's Method 3 is capable of handling general mixed models, but is computationally more demanding than Methods 1 or 2. Method 3 has been called the 'fitting constants' method because it uses reductions in sums of squares due to fitting submodels of the full model. The method is unbiased and translation invariant, but is not uniquely defined in the sense that more reductions can be computed than are necessary to estimate the variance components.
Method 3 has been shown in some cases to provide more accurate estimates of variance components than Methods 1 or 2. In special design and model situations, Method 3 is equivalent to Methods 1 or 2. The reason that Method 3 gives greater accuracy seems to be due to fitting submodels such that two or more random factors are simultaneously considered rather than singly as would be the case in Methods 1 or 2.
The Calculations
For a general model with s random factors there are 2^{s} possible submodels and reductions that could be computed, but only s + 2 reductions are needed to estimate the unknown variances. Specific guidelines for choosing the best (most accurate) set of s + 2 reductions out of the 2^{s} possible are not available. The following rules are recommended to uniquely define Method 3 in all situations and, hopefully, to provide the most accurate set of reductions.
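A reduction R(submodel) is the inner product of any solution vector for that submodel's LS equations with its right-hand sides, and differences of reductions give the Method 3 sums of squares. A minimal sketch; the data and design matrices below are hypothetical:

```python
import numpy as np

def reduction(W, y):
    """R(model) = b' W' y, where b is any solution to (W'W) b = W'y.
    The value is invariant to the choice of solution b."""
    b = np.linalg.lstsq(W, y, rcond=None)[0]
    return float(b @ (W.T @ y))

# Hypothetical data: 6 observations, a mean and a 2-level factor A.
y = np.array([5., 7., 6., 9., 8., 10.])
mu = np.ones((6, 1))
A = np.array([[1., 0.], [1., 0.], [1., 0.],
              [0., 1.], [0., 1.], [0., 1.]])

r_mu = reduction(mu, y)                     # R(mean)
r_full = reduction(np.hstack([mu, A]), y)   # R(mean, A)
ss_a = r_full - r_mu                        # SS for A adjusted for the mean
```

Here `np.linalg.lstsq` returns the minimum-norm solution for the rank-deficient full model; any other solution would give the same reduction.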
The necessary quadratic forms are
a)
,
the total sum of squares,
b) the reduction due to the full model, where the least
squares equations are
To illustrate,
assume a model with three random factors A, B, and C,
then the following reductions should be used.
Notice that each reduction
contains all of the fixed factors in the model.
Suppose that in a two random factor model, factor D is a fixed factor,
but that there is an interaction between D and random factor A, but not
with random factor B. Then use
Expectations of Reductions
Let
An Example
Consider a three factor model with factors A, B, and C where A is a
fixed factor and B and C are random factors, and the LS equations are as
given below:
Problems
Sire | Dam | Sex | Age at Weaning (d) | Feed Intake (mouthfuls)
1 | 1 | 1 | 185 | 775 |
1 | 1 | 1 | 185 | 690 |
1 | 2 | 2 | 190 | 618 |
1 | 2 | 1 | 190 | 631 |
1 | 3 | 2 | 200 | 615 |
1 | 3 | 1 | 200 | 704 |
1 | 4 | 2 | 210 | 573 |
2 | 5 | 1 | 215 | 662 |
2 | 5 | 1 | 215 | 600 |
2 | 6 | 1 | 170 | 526 |
2 | 6 | 2 | 170 | 485 |
2 | 6 | 2 | 170 | 581 |
2 | 7 | 2 | 175 | 494 |
3 | 8 | 1 | 206 | 660 |
3 | 8 | 2 | 206 | 511 |
3 | 9 | 2 | 180 | 500 |
3 | 9 | 1 | 180 | 450 |
3 | 10 | 2 | 194 | 470 |
3 | 10 | 1 | 194 | 505 |
3 | 10 | 1 | 194 | 516 |
3 | 10 | 2 | 194 | 448 |
Assume the model is
Estimate the variance components by Henderson's Method 3. You might want to take deviations of age at weaning from the overall average weaning age of 200 d to avoid rounding errors.
Absorption of Fixed Effects
The absorption of all fixed effects in the LS equations
into the equations for the random factors
essentially adjusts the right hand sides of the random effects
for the fixed effects,
and consequently any quadratic forms involving the
absorbed right hand sides are translation invariant.
Numerous methods may therefore be defined by using quadratic
forms of the absorbed right hand sides.
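Absorption can be sketched directly with the projection matrix M = I - X (X'X)^- X': the absorbed right-hand sides are r = Z'My, which are adjusted for the fixed effects. The small design matrices below are hypothetical:

```python
import numpy as np

def absorb(X, Z, y):
    """Absorb fixed effects X: return the absorbed coefficient
    matrix Z'MZ and absorbed right-hand sides r = Z'My, where
    M = I - X (X'X)^- X' is the absorption (projection) matrix."""
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    return Z.T @ M @ Z, Z.T @ M @ y

# Hypothetical: one fixed factor (2 levels), one random factor (2 levels).
X = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
Z = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])
y = np.array([10., 12., 9., 13.])

ZMZ, r = absorb(X, Z, y)
# r is translation invariant: adding constants to y within a
# fixed-effect class (i.e. adding X t for any t) leaves r unchanged.
```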
The LS equations are
Quadratic Forms Of Absorbed RHS
Let
The variance-covariance matrix of
is
Element of Model | |||
C_{1} | 404.685950 | 6.640496 | 17.863636 |
C_{2} | 1,068.809917 | 12.400826 | 28.772727 |
C_{3} | 1,636.016529 | 34.016529 | 35.454545 |
C_{4} | 2,309.239669 | 28.739669 | 42.590909 |
C_{5} | 801.561984 | 67.243802 | 24.954545 |
D_{1} | 13.057851 | 2,573.921488 | 42.590909 |
D_{2} | 70.198347 | 2,664.925620 | 43.454545 |
D_{3} | 28.611570 | 731.838843 | 23.318182 |
D_{4} | 37.173554 | 863.537190 | 25.272727 |
MIVQUE-0
The simplest set of quadratic forms of r that could be used
are
and
Assume that
= 668,160.62 for this example, with r(M) = 200 - 3 = 197 degrees of freedom, then
The above procedure is known as MIVQUE-0, which stands for Minimum Variance Quadratic Unbiased Estimation using zeros as prior values for all variance components except the residual variance. Calling this procedure MIVQUE is misleading because MIVQUE-0 estimates have been shown to have very large sampling variances. However, because the method is very easy to apply, MIVQUE-0 estimates are frequently used as starting values for other methods of estimation. The estimates from MIVQUE-0 are unbiased and translation invariant, but are not minimum variance.
Quadratics Using Diagonals Of LS Equations
Many methods could be described by simply changing the definition
of .
For example,
define
as a diagonal matrix with
corresponding elements equal to
the diagonals of
and the rest to zero.
For the example data, let
Henderson's Method 4
In 1980 Henderson presented a particular set of matrices for the preceding methodology, which was formally described in his book (1984). This method has been given various names such as Henderson's New Method, Diagonal MIVQUE, and Henderson's Method 4. Dempfle et al. (1983) showed that the sampling variances of Method 4 estimates were similar to and sometimes better than those of Method 3 estimates.
Method 4 uses a specific set of quadratic forms. The motivation for these particular forms is that they more closely approximate the quadratic forms used with MIVQUE (which will be discussed later). The specific quadratic forms are, for the example data:
Problem
Below are parts of the OLS equations for a model with five factors.
The first three factors are fixed and the last two factors are
random (each with 6 levels). The total sum of squares was
3,143,224 based on 41 observations.
Estimate the variances using MIVQUE-0, Henderson's Method 4, and another set of quadratic forms of your own choice.
During the late 1960s, statisticians began to develop methods of variance component estimation with more desirable properties than simply unbiasedness and translation invariance. Two researchers, working independently, derived minimum variance quadratic unbiased estimators (MIVQUE) in 1970. C. R. Rao published four papers in his newly formed journal on MIVQUE for normally distributed data. Rao also derived a method called MINQUE (minimum norm quadratic unbiased estimation) for cases where the data do not have a normal distribution; under normality, minimizing the Euclidean norm, as Rao does, is equivalent to minimizing the variance of the quadratic form, which gives MIVQUE. To minimize the variance of quadratic forms, the true unknown variances must be known.
At the same time, L. R. LaMotte, at the University of Kentucky, also derived MIVQUE, in bulletins that were not formally published in a statistical journal. Henderson (1973) took the MIVQUE method and derived easier computing formulas based upon solutions to the mixed model equations (MME) and upon elements of the generalized inverse of the coefficient matrix of the MME.
MIVQUE assumes that the variance-covariance matrix of the observations is known, and the variances of the quadratic forms are minimized for that matrix. Since it is not known in reality, some prior information is utilized, and consequently the variance of the quadratic forms is minimized only for the priors used. MIVQUE is unbiased and translation invariant, and if the priors equal the true values, then it also has minimum variance. Thus, in practice MIVQUE is generally not minimum variance, but is only as good as the prior values that are used.
Quadratic Forms
Assume that
has a multivariate normal distribution with mean
vector ,
and variance-covariance matrix .
Let
be the
prior information about
for i = 0 to s,
then let
be our prior
which is
To simplify these quadratic forms notice that
Thus, under the basic
linear model for variance component estimation the necessary quadratic
forms for MIVQUE are
Expectations
The MME are
MIVQUE requires the inverse elements of the MME in order to calculate the expectations of the quadratic forms. In most animal breeding applications, obtaining an inverse of the coefficient matrix could be impossible due to the large order of the equations.
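The MME for a model with one random factor have the familiar partitioned structure [X'X, X'Z; Z'X, Z'Z + I k], with k the ratio of residual to random-factor variance. A sketch of their construction and solution, with hypothetical data:

```python
import numpy as np

def mme(X, Z, y, ratio):
    """Build and solve Henderson's mixed model equations for one
    random factor with var(u) = I * sigma_u^2 and
    ratio k = sigma_e^2 / sigma_u^2."""
    q = Z.shape[1]
    C = np.block([[X.T @ X, X.T @ Z],
                  [Z.T @ X, Z.T @ Z + ratio * np.eye(q)]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.lstsq(C, rhs, rcond=None)[0]
    return C, rhs, sol

# Hypothetical example: mean as the only fixed effect, 3 random levels.
X = np.ones((6, 1))
Z = np.kron(np.eye(3), np.ones((2, 1)))
y = np.array([8., 10., 6., 7., 11., 12.])
C, rhs, sol = mme(X, Z, y, ratio=10.0)
```

With a mean as the only fixed effect, subtracting the fixed equation from the sum of the random equations shows the random solutions must sum to zero, which is a convenient check.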
Example
Assume a model with one fixed factor and one random factor. The
least squares equations are given below, and
= 770.
Problem
Below are the OLS equations for some data for a model with
two random factors (B and C) and one fixed factor (A).
The total number of
observations was 200, and the total sum of squares was 1,000,000.
Assume that the prior variance ratios were
Restricted maximum likelihood (REML) was first suggested by Thompson (1962), but was described formally by Patterson and Thompson (1971).
Derivation of REML from MIVQUE
From MIVQUE, the expectations of quadratic forms of solutions to the mixed model equations are as follows:
Likewise,
An Example
Below are the mixed model equations for an example with three factors, two of which are random.
To demonstrate EM REML,
let
and
be the
starting values of the ratios for factors A and
B, respectively. The solution vector is
Derivation of REML from the likelihood function and a description of derivative free REML are given in the regular notes for the course 10-638 Animal Models.
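The EM REML iteration sketched in the example above alternates two steps: solve the MME at the current variance ratio, then update the variances using the standard EM REML formulas sigma_u^2 = (u'u + sigma_e^2 tr(C^uu)) / q and sigma_e^2 = (y'y - sol'rhs) / (n - r(X)), where C^uu is the random-effects block of the inverted MME coefficient matrix. A minimal sketch for one random factor, with hypothetical data:

```python
import numpy as np

def em_reml(X, Z, y, s2u, s2e, rounds=50):
    """EM REML for one random factor: repeatedly solve the MME at the
    current ratio, then update the variance components. A sketch only."""
    n, p = X.shape
    q = Z.shape[1]
    rX = np.linalg.matrix_rank(X)
    for _ in range(rounds):
        k = s2e / s2u
        C = np.block([[X.T @ X, X.T @ Z],
                      [Z.T @ X, Z.T @ Z + k * np.eye(q)]])
        rhs = np.concatenate([X.T @ y, Z.T @ y])
        Cinv = np.linalg.pinv(C)
        sol = Cinv @ rhs
        u = sol[p:]
        s2e = float(y @ y - sol @ rhs) / (n - rX)
        s2u = float(u @ u + s2e * np.trace(Cinv[p:, p:])) / q
    return s2u, s2e

# Hypothetical data: mean plus 3 random levels, 2 observations each.
X = np.ones((6, 1))
Z = np.kron(np.eye(3), np.ones((2, 1)))
y = np.array([8., 10., 6., 7., 11., 12.])
s2u, s2e = em_reml(X, Z, y, s2u=1.0, s2e=1.0)
```

Because the updates involve sums of squares and a positive trace term, EM REML estimates stay inside the parameter space, which is the biased behavior noted later in these notes.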
Hartley and Rao (1967) described the maximum likelihood approach for the estimation of variance components.
Derivation
Assume that
is normally distributed with mean
and
variance-covariance matrix
and follows the basic linear
mixed model. Except for a constant, the log likelihood function of
is
Example With One Random Factor
The computations for ML become fairly simple with only one random
factor in the model. In that case,
is a diagonal matrix and its
elements can be readily computed. To illustrate, consider the least
squares equations below for a model with fixed herd effects, H, and
random sire effects, S, in an analysis of milking speed observations.
One Random Factor Nested Within Another
In a two random factor model we have
To illustrate ML for this type of model, consider the previous milking speed example expanded to include cow effects (i.e. daughters within sires in the table below). Also, some cows are allowed to have more than one observation.
Number of | Total | |||
Cow | Sire | Herd | Observations | 2-min yield(kg) |
A | 1 | 1 | 2 | 9.9 |
B | 1 | 1 | 3 | 20.0 |
C | 2 | 1 | 2 | 9.5 |
D | 2 | 1 | 2 | 8.8 |
E | 2 | 1 | 1 | 5.6 |
F | 3 | 1 | 1 | 4.7 |
G | 4 | 1 | 3 | 12.4 |
H | 4 | 1 | 2 | 8.9 |
I | 1 | 2 | 1 | 5.5 |
J | 1 | 2 | 2 | 10.1 |
K | 1 | 2 | 1 | 2.2 |
L | 2 | 2 | 2 | 10.2 |
M | 2 | 2 | 2 | 10.3 |
N | 2 | 2 | 1 | 5.6 |
O | 3 | 2 | 2 | 8.6 |
P | 3 | 2 | 2 | 10.2 |
Q | 4 | 2 | 2 | 7.8 |
R | 4 | 2 | 2 | 8.1 |
S | 4 | 2 | 2 | 8.2 |
T | 4 | 2 | 1 | 4.3 |
U | 5 | 2 | 2 | 7.0 |
V | 5 | 2 | 2 | 8.6 |
The EM Algorithm
EM stands for Expectation Maximization. From
Searle, Casella, and McCulloch (1992) the following explanation
is given.
The procedure alternates between calculating conditional expected values
and maximizing simplified likelihoods. The actual data
are
called the incomplete data in the EM algorithm, and the complete
data are considered to be
and the unobservable random
effects,
.
If we knew the realized values of the
unobservable random effects then their variance would be the average
of their squared values, i.e.,
The steps of the EM algorithm are as follows:
The EM algorithm can be applied to REML estimation as well. Just
replace
with
and replace with
and note that
In the process of making up an exam question for students in my Variance Component Estimation course in 1986 I discovered a quadratic form whose pseudo expectation (following the method of deriving REML from MIVQUE expectations of quadratic forms) did not involve the inverse elements of the mixed model equations. The estimators were also translation invariant because they were based on the absorbed right hand sides of the mixed model equations. The biggest advantage of these estimators was their computational ease. Unfortunately, with more than one random factor in the model, the 'quadratic forms' can be shown to be bilinear forms and hence not always guaranteed to be positive definite. The method, similar to most other methods, was found to give severely biased estimates when data were selected, as in typical animal breeding situations (Ouweltjes et al. 1988).
Van Raden (1986) expanded the Schaeffer (1986) quadratic forms for the case with relationships among animals, which could also be used in the basic linear model for variance components.
Derivation
Absorb the fixed effects into the
equations for the random factors in the mixed model equations. The
absorbed right hand side for the i^{th} random factor is then
The usual expectation of
is
Van Raden's Method
Van Raden (1986) proposed the following quadratic form,
,
where
and
,
then the pseudo expectation is
An Example
Assume a model with two random factors and let . As a priori values let and . The rank of the fixed effects matrix, , is 4 and the total number of observations is 18. The first random factor has 3 levels and the second random factor has 6 levels. The absorbed coefficient matrix is
The absorbed right hand sides, solutions, and are
Factor | |||
1 | 128.8214 | 9.6695 | 10.2182 |
155.3571 | 9.0836 | 12.5000 | |
-284.1786 | -18.7531 | -22.5412 | |
2 | 6.8905 | .0750 | .6745 |
.5000 | 1.5094 | .0526 | |
174.2500 | 14.8447 | 17.8718 | |
-157.6786 | -12.1682 | -16.4127 | |
49.7976 | 5.1919 | 4.4738 | |
-73.6786 | -9.4528 | -7.6692 |
Problem
Below are data following a model with one fixed factor (A) and two random factors (T and R). Apply EM REML, DFREML, ML, I-MIVQUE (if the estimates stay positive), the pseudo-expectation method, and Van Raden's method to these data and compare results.
A_{i} | T_{j} | R_{k} | y_{ijkl} | A_{i} | T_{j} | R_{k} | y_{ijkl} | A_{i} | T_{j} | R_{k} | y_{ijkl} |
1 | 1 | 1 | 50 | 2 | 1 | 1 | 65 | 3 | 1 | 1 | 80 |
1 | 1 | 1 | 54 | 2 | 1 | 2 | 76 | 3 | 1 | 1 | 83 |
1 | 1 | 2 | 59 | 2 | 1 | 2 | 73 | 3 | 1 | 1 | 78 |
1 | 1 | 2 | 61 | 2 | 1 | 2 | 75 | 3 | 1 | 2 | 95 |
1 | 1 | 3 | 68 | 2 | 1 | 3 | 84 | 3 | 1 | 3 | 102 |
1 | 1 | 4 | 79 | 2 | 1 | 3 | 87 | 3 | 1 | 3 | 97 |
1 | 1 | 4 | 82 | 2 | 1 | 4 | 98 | 3 | 1 | 4 | 109 |
1 | 2 | 1 | 41 | 2 | 1 | 4 | 93 | 3 | 1 | 4 | 112 |
1 | 2 | 2 | 52 | 2 | 2 | 1 | 53 | 3 | 2 | 2 | 81 |
1 | 2 | 3 | 64 | 2 | 2 | 3 | 77 | 3 | 2 | 2 | 79 |
1 | 2 | 4 | 69 | 2 | 2 | 3 | 74 | 3 | 2 | 4 | 101 |
1 | 2 | 4 | 73 | 2 | 2 | 4 | 86 | 3 | 3 | 1 | 63 |
1 | 2 | 4 | 70 | 2 | 3 | 2 | 57 | 3 | 3 | 1 | 57 |
1 | 3 | 3 | 47 | 2 | 3 | 2 | 54 | 3 | 3 | 3 | 75 |
1 | 3 | 3 | 46 | 2 | 3 | 2 | 51 | 3 | 3 | 3 | 84 |
1 | 3 | 4 | 62 | 2 | 3 | 3 | 66 | 3 | 3 | 4 | 93 |
1 | 3 | 4 | 57 | 3 | 3 | 4 | 86 | ||||
1 | 3 | 4 | 66 |
This method was first reported through the Animal Geneticist's Discussion Group in 1993. A paper was subsequently published in the Journal of Animal Science, 1994, 72:2247-2253, by A. Reverter, B. L. Golden, R. M. Bourdon, and J. S. Brinks. The method came about as a result of a study on the change in genetic evaluations as animals acquire more progeny, as in routine beef cattle evaluations. The method is based on the linear regression of recent solutions (based on all the data) on previous solutions (based on a subset of all data). The expected value of the regression equals 1 regardless of the distribution of observations or predictions. If the wrong ratio of variance components is used in the mixed model equations, then the computed value of the regression deviates from its expected value of one; if the computed value is greater than 1, then the h^{2} that was used was too low. The method is iterative, in that the variance ratio is changed until the regression equals one.
As with the pseudo expectation method, Method R is ad hoc in nature and the properties of the method can only be established through simulation or empirical evidence. Schenkel (1998) has shown that Method R also suffers from bias due to non-random mating and when pedigree files are not complete, as does nearly every method.
Development
To simplify presentation assume only one random factor in the model.
Denote the data and model for subset j by
The j^{th} subset is constructed by randomly choosing one half of the data. This should probably be a random half of the contemporary groups, so that animals within a contemporary group stay together. The idea of Method R is to obtain a number of subsets from the same complete data set, say 10 or more, and to estimate the variances from each subset. The end result is the average of the estimates, together with their SD, which gives an idea of the variability of the estimates due to sample size.
Now consider the i^{th} element of
from the two
analyses above.
An Example
Illustrating Method R really requires a fairly large data set, in order to achieve convergence. This example merely shows the calculations that should be followed, but the example does not converge to a good final estimate. Below are data on progeny of ten sires, for a model with just a mean and random (unrelated) sire and residual effects. The left half of the table represents the data from one subset and the right half represents the complete data.
Sire | Subset 1 | Complete Data | ||
Number | n_{i} | y_{i.} | n_{i} | y_{i.} |
1 | 85 | 11,462 | 213 | 29,218 |
2 | 54 | 6,819 | 182 | 23,745 |
3 | 112 | 15,153 | 121 | 16,456 |
4 | 90 | 11,115 | 191 | 23,637 |
5 | 144 | 16,625 | 307 | 35,666 |
6 | 192 | 26,203 | 298 | 40,720 |
7 | 136 | 17,957 | 237 | 30,944 |
8 | 83 | 10,151 | 226 | 27,899 |
9 | 174 | 20,700 | 338 | 39,865 |
10 | 161 | 21,346 | 274 | 36,483 |
The solutions for sires within each data set are given below, for a variance ratio of 10.
Sire | ||
Number | Subset 1 | Complete |
1 | 6.3132 | 8.2960 |
2 | -1.2768 | 1.8758 |
3 | 6.8886 | 6.9384 |
4 | -3.8619 | -4.4988 |
5 | -11.5384 | -11.9239 |
6 | 8.2531 | 7.8913 |
7 | 3.9549 | 1.9931 |
8 | -4.8995 | -4.8277 |
9 | -8.3459 | -10.2414 |
10 | 4.5125 | 4.4973 |
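The Method R regression can be computed from the solutions tabled above. Assuming the sires are unrelated (A = I), the regression of the complete-data solutions on the subset solutions reduces to a ratio of a sum of cross products to a sum of squares:

```python
# Sire solutions from the table above (variance ratio of 10).
subset = [6.3132, -1.2768, 6.8886, -3.8619, -11.5384,
          8.2531, 3.9549, -4.8995, -8.3459, 4.5125]
complete = [8.2960, 1.8758, 6.9384, -4.4988, -11.9239,
            7.8913, 1.9931, -4.8277, -10.2414, 4.4973]

# Regression of complete-data solutions on subset solutions,
# with A = I because the sires are unrelated.
num = sum(s * c for s, c in zip(subset, complete))
den = sum(s * s for s in subset)
r = num / den   # expected value is 1 at the true variance ratio
```

Here r comes out slightly above 1, so by the rule stated earlier the h^{2} implied by the ratio of 10 was too low, and the ratio would be decreased before the next round.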
From these solutions, then
The calculations need to be repeated for other subsets of the same data set, and the Method R estimate is then the average of the estimates from the subsets; hence a large amount of computing must be done. However, the only requirement is a program that computes genetic evaluations for a particular model, and these are readily available. Computationally, Method R is very attractive, and for randomly mating populations it will give estimates close to the true values. One question is the number of subsets that should be chosen; this depends on the size of the complete data set and the time needed to solve the equations. The more subsets one has, the better the average of the estimates should be. The authors claim that, in their experience, the estimates are similar to REML estimates, but the proof is rather loose in a technical sense. The covariances and variances in the regression utilize the same quadratic forms and involve the same inverse elements of the mixed model equations as in EM REML. Method R may be best suited to obtaining an initial prior value for use in Bayesian or REML methodology, rather than serving as a stand-alone method, because of its problems with incomplete pedigrees and non-random matings, and because its properties cannot be proven except empirically.
These notes are terribly outdated now, with the use of DFREML and Bayesian methods, but they may be of interest to those who work with problems of convergence of estimation systems.
Iterative methods of variance component estimation such as REML and ML are known to converge very slowly towards a final solution via the EM algorithm. For large data sets, the rate of convergence can affect computing costs substantially. Rates of convergence seem to be very data dependent and even specific to particular traits within a data set. In REML, iterative MIVQUE is known to converge faster than the EM algorithm. Convergence is also adversely affected by the number of random factors in the model, and there is always the question of whether the system will converge at all. Hence the researcher needs to know (a) whether the system will converge, and (b) if so, whether the rate of convergence can be improved.
Below are least squares equations for a model with two fixed
factors and one random factor
Iteration | |||||
Number(i) | |||||
1 | 9138.59 | 659.45 | 13.85793 | -1.14207 | ---- |
2 | 9131.43 | 709.03 | 12.87875 | -0.97918 | 1.16635 |
3 | 9124.82 | 757.03 | 12.05338 | -0.82537 | 1.18635 |
4 | 9118.87 | 802.14 | 11.36816 | -0.68522 | 1.20453 |
5 | 9113.66 | 843.32 | 10.80684 | -0.56132 | 1.22073 |
6 | 9109.19 | 879.92 | 10.35226 | -0.45458 | 1.23481 |
7 | 9105.44 | 911.67 | 9.98767 | -0.36459 | 1.24683 |
8 | 9102.33 | 938.62 | 9.69760 | -0.29007 | 1.25690 |
9 | 9099.80 | 961.08 | 9.46833 | -0.22927 | 1.26519 |
10 | 9097.76 | 979.51 | 9.28809 | -0.18024 | 1.27203 |
20 | 9090.85 | 1044.20 | 8.70603 | -0.01422 | 1.29466 |
30 | 9090.30 | 1049.49 | 8.66166 | -0.00106 | 1.29245 |
40 | 9090.26 | 1049.88 | 8.65838 | -0.00008 | 1.29750 |
50 | 9090.26 | 1049.91 | 8.65813 |
Iteration | |||||
Number(i) | |||||
1 | 9145.94 | 622.81 | 14.68508 | -0.31492 | ---- |
2 | 9143.95 | 635.96 | 14.37814 | -0.30694 | 1.02600 |
3 | 9141.94 | 649.31 | 14.07958 | -0.29856 | 1.02807 |
4 | 9139.94 | 662.81 | 13.78974 | -0.28984 | 1.03009 |
5 | 9137.94 | 676.44 | 13.50875 | -0.28079 | 1.03223 |
6 | 9135.95 | 690.16 | 13.23748 | -0.27147 | 1.03433 |
7 | 9133.97 | 703.94 | 12.97557 | -0.26191 | 1.03650 |
8 | 9132.01 | 717.73 | 12.72341 | -0.25216 | 1.03867 |
9 | 9130.08 | 731.51 | 12.48112 | -0.24229 | 1.04074 |
10 | 9128.17 | 745.23 | 12.24881 | -0.23231 | 1.04296 |
20 | 9111.59 | 870.89 | 10.46233 | ||
30 | 9100.81 | 958.84 | 9.49152 | ||
40 | 9095.07 | 1007.74 | 9.02521 | ||
50 | 9092.60 | 1027.79 | 8.8468 |
The final estimates were , , and .
Convergence Curves
The following results are due mainly to Dr. Ignacy Misztal. A plot of the estimated variance ratio against iteration number would show a nonlinear relationship. The curve would appear to be asymptoting to a final value if the system was converging, as in the example. This curve may be characterized by a model, and therefore, the asymptotic value could be predicted if the parameters of that model were known. Let
For MIVQUE let
Common Intercept Approach
The Common Intercept Approach (CIA) was suggested in 1979 by Dr. Lynn Johnson (then a postdoctoral research associate with Dr. Burnside) and has been derived from the nonlinear model of the previous section. The purpose of CIA was to reduce the number of iterations needed to attain convergence in any iterative procedure. In addition, CIA can be used to determine if an iterative procedure will converge for a given data set.
Let
represent one starting value and let
represent a different starting value. The corresponding
changes in value after one iteration are
and
.
Then
Using the example data and REML, from Table 2, let
Thus, two iterations with different starting values are needed to obtain a prediction of the converged value. The closer the two starting values are to the actual converged estimate, the better CIA will work.
For example, using REML, with
and
then
If the estimate of the converged value from CIA is negative, this indicates that convergence may not be possible. Select new starting ratios that are closer to each other and retry the CIA. If the estimate of the converged value remains negative, then the system will likely not converge, and use of an unbiased, non-iterative procedure on the same data would likely produce negative estimates of one or more variance components. In any event, the negative value from CIA should not be used as the next prior value in the iterative procedure.
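The description above amounts to linear extrapolation to a common fixed point: from two starting values and their one-iteration changes, predict the value at which the change would be zero. The formula below is a standard fixed-point extrapolation consistent with that description, not copied from the notes; the change of -0.31492 from a start of 15 is the REML value from Table 2, while the second starting value and its change are hypothetical:

```python
def cia(x1, d1, x2, d2):
    """Common Intercept Approach sketch: x1 and x2 are two starting
    ratios, d1 and d2 the changes after one iteration from each.
    Linearly extrapolate to the point where the change is zero."""
    return x1 - d1 * (x2 - x1) / (d2 - d1)

# Real: start 15 changed by -0.31492 (Table 2).
# Hypothetical: start 12 changed by -0.10000.
est = cia(15.0, -0.31492, 12.0, -0.10000)
```

A negative `est` corresponds to the failure diagnosis discussed above: the extrapolated common intercept falls outside the parameter space.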
Aitken's Approximation
Another method of determining the converged value is
Aitken's approximation given by the following formula:
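The formula itself is not reproduced in these notes; the standard Aitken delta-squared form, applied here to every tenth REML iterate of the variance ratio from Table 2, is:

```python
def aitken(x1, x2, x3):
    """Aitken delta-squared extrapolation from three successive
    iterates of a (roughly) linearly converging sequence."""
    return x3 - (x3 - x2) ** 2 / (x3 - 2 * x2 + x1)

# REML iterates 10, 20, and 30 of the variance ratio from Table 2.
est = aitken(12.24881, 10.46233, 9.49152)
```

The extrapolated value of about 8.34 is already close to the converged ratio of 8.658 reported after 50 rounds, far ahead of the raw iterate at round 30.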
Relaxation Factors
Relaxation factors attempt to speed convergence.
The success of relaxation factors depends upon the magnitude of the
factor used. If the factor is too large, then changes between
iterations may be too disruptive for the system and convergence may not
be achieved any faster, if at all. If the factor is too small, then
the increased rate of convergence may not be noticeable. If
is the relaxation factor, then the appropriate starting ratio for
the (i+1) iterate is not
,
but is
Usually a factor between 0 and 1 is used. Let in the example, then in REML = 14.68508 and = 15, and therefore = 14.37016. Thus, only one iteration was skipped, and overall probably half the iterations would be needed. Care needs to be used in choosing a value of because it could be dependent on the data structure and the observation values themselves. Thus, using for all situations may not give the best results. Some test runs with different values may be needed.
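The worked numbers above correspond to pushing the new estimate further in the direction of the latest change by a fraction gamma of that change; the factor elided in the text appears to be 1, since 14.68508 + (14.68508 - 15) = 14.37016. A sketch, assuming that form of the update:

```python
def relax(previous, estimate, gamma):
    """Relaxed update: step past the current estimate by a fraction
    gamma of the latest change (assumed form of the update rule)."""
    return estimate + gamma * (estimate - previous)

# Values from the REML example: start 15, first iterate 14.68508.
new_ratio = relax(15.0, 14.68508, 1.0)   # 14.37016, as in the text
```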
The linear model that has been assumed for variance component estimation to this point has variance-covariance matrices of all random factors of the form , where c is the variance to be estimated, and covariances between any two random factors were always null. With animal models and maternal genetic effects, these assumptions are not met, and some of the previous methods cannot be used. Methods applicable to more general models are MIVQUE, REML, ML, Method R and Bayesian methods.
Covariances Between Two Random Factors
Assume the following model and example problem where
Depending on the method of estimation applied to ,
the quadratic
forms will likely contain
in their expectations. If
then
y | Level of b | Level of | Level of |
145 | 1 | 1 | 2 |
84 | 1 | 2 | 4 |
83 | 1 | 3 | 4 |
75 | 1 | 4 | 5 |
122 | 1 | 1 | 5 |
100 | 2 | 2 | 3 |
172 | 2 | 3 | 2 |
102 | 2 | 4 | 2 |
158 | 2 | 1 | 4 |
156 | 2 | 2 | 5 |
84 | 2 | 3 | 5 |
133 | 3 | 4 | 3 |
139 | 3 | 1 | 3 |
173 | 3 | 2 | 3 |
117 | 3 | 3 | 4 |
187 | 3 | 4 | 2 |
150 | 3 | 1 | 2 |
160 | 3 | 2 | 5 |
MIVQUE
Let
REML
REML is a biased procedure because the estimates are kept within the allowable parameter space. Hence, the estimated correlation between the two random vectors should be between -1 and +1. REML uses the same quadratic forms as MIVQUE, but also
The course notes have descriptions of methods of variance component estimation for maternal effects and multiple trait models, and will not be reproduced here.
Larry Schaeffer