
Henderson Method 1
L. R. Schaeffer, April 27, 1999

Method 1 was derived by Dr. C. R. Henderson in 1949 as a result of his Ph.D. thesis work at Iowa State University. The method was not published until 1953, along with Methods 2 and 3, in one of Henderson's landmark papers in Biometrics. Method 1 applies the analysis of variance (AOV) procedure for balanced data to unbalanced data. Method 1 is applicable only to models that are completely random, i.e., models in which every factor other than the overall mean is random. Method 1 is theoretically derived to be unbiased and translation invariant, but only under a model of completely random mating and no selection. Method 1 is probably the easiest of all methods to calculate.

Three types of quadratic forms are defined for Henderson's Method 1. They are the total sum of squares, $y'y$; the sum of squares due to the mean, $N^{-1}y'\mathbf{1}\mathbf{1}'y$; and a sum of squares
$$y'Z_i(Z_i'Z_i)^{-1}Z_i'y$$
for each random factor, for i = 1 to s, where s is the number of random factors in the model.

Note that $Z_i'Z_i$ is a diagonal matrix whose elements are the numbers of observations in each level of the ith random factor, and that the elements of $Z_i'y$ are the sums of observations within each level of the ith random factor. Let $n_{ik.}$ represent the kth diagonal element of $Z_i'Z_i$; then
$$y'Z_i(Z_i'Z_i)^{-1}Z_i'y = \sum_{k=1}^{q_i} y_{ik.}^2/n_{ik.},$$
where $q_i$ is the number of levels of the ith random factor.

Each of the Q-matrices in Method 1 has the following properties:

1.
$Q_m\mathbf{1} = \mathbf{1}$
for m = 1 to s+2. That is, each $Q$-matrix has elements which add to 1 within each row, and
2.
$Q_mQ_m = Q_m$
for m = 1 to s+2. That is, each $Q$-matrix is idempotent.
The first property is necessary for Method 1 to give translation invariant estimates of the variance components, and the second property allows the expectations of the quadratic forms to simplify to easy computing formulas.
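These two properties are easy to check numerically. The sketch below uses an invented one-factor design (illustrative only, not the dairy example analyzed later), builds the Q-matrix for a random factor and for the mean, and verifies both properties with NumPy:

```python
import numpy as np

# Invented one-factor design: 3 levels, N = 6 observations.
Z = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
N = Z.shape[0]

# Q-matrix for the factor's sum of squares: Z (Z'Z)^{-1} Z'.
Q_factor = Z @ np.linalg.inv(Z.T @ Z) @ Z.T   # Z'Z is diagonal (level counts)

# Q-matrix for the sum of squares due to the mean: (1/N) 11'.
Q_mean = np.full((N, N), 1.0 / N)

for Q in (Q_factor, Q_mean):
    assert np.allclose(Q.sum(axis=1), 1.0)    # property 1: rows sum to 1
    assert np.allclose(Q @ Q, Q)              # property 2: idempotent
```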

The expectations of the Method 1 quadratic forms are easy to derive. Recall that
$$E(y'Qy) = tr(QV) + b'X'QXb.$$
For Method 1 we have $Xb = \mathbf{1}\mu$, so that
$$b'X'QXb = \mu^2\,\mathbf{1}'Q\mathbf{1},$$
and because $Q\mathbf{1} = \mathbf{1}$ for all of the Method 1 quadratic forms, then
$$\mu^2\,\mathbf{1}'Q\mathbf{1} = N\mu^2.$$

Hence, all quadratic forms in Method 1 have $N\mu^2$ as part of their expectations. Consequently, when differences are taken among these quadratic forms, $N\mu^2$ is eliminated, and the estimators are translation invariant.

Now concentrate on $tr(QV)$ for each of the three types of quadratic forms, where $V = \sigma^2_e I + \sum_{i=1}^{s} Z_iZ_i'\sigma^2_i$. For the Total Sum of Squares, $Q = I$ and
$$tr(V) = N\sigma^2_e + \sum_{i=1}^{s} tr(Z_i'Z_i)\,\sigma^2_i.$$
The trace of $Z_i'Z_i$ is the sum of the $n_{ik.}$ over the $q_i$ levels of factor $i$, with $\sum_k n_{ik.} = N$. Hence, the expectation of $y'y$ in Method 1 is always
$$E(y'y) = N\sigma^2_e + N\sum_{i=1}^{s}\sigma^2_i + N\mu^2.$$

For the Sum of Squares due to the Mean, $Q = N^{-1}\mathbf{1}\mathbf{1}'$ and
$$tr(QV) = \sigma^2_e + \sum_{i=1}^{s}\Big(\sum_{k=1}^{q_i} n_{ik.}^2/N\Big)\sigma^2_i.$$
Thus, the expectation of the sum of squares due to the mean in Method 1 is always
$$E(N^{-1}y'\mathbf{1}\mathbf{1}'y) = \sigma^2_e + \sum_{i=1}^{s}\Big(\sum_{k=1}^{q_i} n_{ik.}^2/N\Big)\sigma^2_i + N\mu^2.$$

For the Sum of Squares for each random factor,
$$tr(Q_iV) = q_i\sigma^2_e + \sum_{j=1}^{s}\Big(\sum_{k=1}^{q_i}\sum_{h=1}^{q_j} n_{kh}^2/n_{ik.}\Big)\sigma^2_j,$$
where $n_{kh}$ is the number of observations within the $kh$th subclass between factors $i$ and $j$. When $i$ equals $j$ in the summations, then $n_{kh} = n_{ik.}$ and the entire coefficient simplifies to $N$. The coefficient of $\sigma^2_e$ simplifies to $q_i$.

Obtaining Estimates

Let be the vector of quadratic forms, i.e.

and let = , then

where a typical element of is for the jth quadratic form and the ith component.

Traditionally with balanced AOV tables, the sum of squares due to the mean was usually subtracted from the other sums of squares. Similarly, find any matrix such that and has full row rank, then

To estimate then equate to its expectation so that

However, we could have obtained estimates without using by
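However the quadratic forms are chosen, the computational step is the same: equate the vector of quadratic forms to its expectations and solve the resulting linear system. A minimal sketch, with an invented expectation matrix and invented values of the quadratic forms (not taken from the example below):

```python
import numpy as np

# Invented 3x3 expectation matrix F (rows: quadratic forms; columns:
# coefficients of the residual and two factor variances) and invented
# observed values q of the quadratic forms.
F = np.array([[10.0, 4.0, 6.0],
              [ 3.0, 8.0, 2.0],
              [ 2.0, 1.0, 9.0]])
q = np.array([120.0, 75.0, 60.0])

# Method-of-moments step: set q = F sigma and solve for sigma.
sigma_hat = np.linalg.solve(F, q)
```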

An Example

Consider the case of milk yields by daughters of dairy bulls in several different herds. Let s, the number of random factors in the model, be two (herds and sires); thus, there are three components of variance to estimate. Data for analysis appear in the table below. Milk yields are given in percentage units relative to a breed mean for cows of a certain age and month of calving. The total sum of squares is 390,729.

Example dairy data for Method 1 analysis.
 Herd  Sire  Daughter Milk Yields    Subclass total (n)
  1     1    157, 160, 138           455  (3)
  1     2    96, 110, 115, 120       441  (4)
  1     3    82, 65                  147  (2)
  1     4    120, 130, 110           360  (3)
  2     1    140, 142, 145           427  (3)
  2     2    122, 117, 98            337  (3)
  2     3    70, 94                  164  (2)
  3     2    112, 125, 105           342  (3)
  3     3    100, 92                 192  (2)
  3     4    116, 129, 133           378  (3)

 Totals     Level   Number    Sum
 By Sire      1       6        882
              2      10       1120
              3       6        503
              4       6        738
 By Herd      1      12       1403
              2       8        928
              3       8        912
 Overall             28       3243
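The Method 1 quadratic forms for this example can be verified directly from the raw records. The sketch below (the dictionary layout is my own; the numbers are the records from the table above) computes the total, mean, herd, and sire sums of squares:

```python
# (herd, sire) -> daughter milk yields, from the table above.
yields = {
    (1, 1): [157, 160, 138], (1, 2): [96, 110, 115, 120],
    (1, 3): [82, 65],        (1, 4): [120, 130, 110],
    (2, 1): [140, 142, 145], (2, 2): [122, 117, 98],
    (2, 3): [70, 94],
    (3, 2): [112, 125, 105], (3, 3): [100, 92],
    (3, 4): [116, 129, 133],
}

all_y = [y for ys in yields.values() for y in ys]
N = len(all_y)                          # 28 observations
TSS = sum(y * y for y in all_y)         # total sum of squares = 390729
MSS = sum(all_y) ** 2 / N               # sum of squares due to the mean

def factor_ss(index):
    """Sum over levels of (level total)^2 / (level count)."""
    totals = {}
    for key, ys in yields.items():
        level = key[index]
        t, n = totals.get(level, (0, 0))
        totals[level] = (t + sum(ys), n + len(ys))
    return sum(t * t / n for t, n in totals.values())

HSS = factor_ss(0)   # herds
SSS = factor_ss(1)   # sires
```

The computed total sum of squares is 390,729, agreeing with the value quoted above.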

Expectations

Solving for Estimates

Premultiply both sides of the above by where

Ignoring the column for which becomes all zeros, then we have

The resulting estimates are

Sampling Variances of Estimates

Let TSS, MSS, HSS, and SSS represent the quadratic forms for the total sum of squares, mean, herds, and sires, respectively, then the coefficients of the unknown parameters for computing the variances and covariances of the corresponding quadratic forms are in the following table.

 TSS       56.      112.      112.      544.      328.      416.
 TSS,MSS    2.      38.8571   29.7143   196.5714  288.      117.7143
 TSS,HSS    6.      112.      34.6667   544.      328.      132.6667
 TSS,SSS    8.      45.6      112.      224.5333  328.      416.
 MSS        2.      38.8571   29.7143   188.7347  288.6531  110.3673
 MSS,HSS    2.      38.8571   29.7143   196.5714  288.      110.3810
 MSS,SSS    2.      38.8571   29.7143   188.8762  288.      117.7143
 HSS        6.      112.      34.6667   544.      328.      112.9514
 HSS,SSS    2.4111  45.6      34.6667   224.5333  328.      132.6667
 SSS        8.      45.6      112.      193.5867  328.      416.
Each line above would also have a function, , associated with it which will be identical in each term (variance or covariance). These terms have been ignored since they will disappear when differences between pairs of quadratic forms are taken.

Now assume the true components of variance have the following values; and Post-multiply the table of coefficients above by the vector which correspond to the columns of the table above, and store the results in matrix form as

Using the same as before for taking differences in quadratic forms, then

and if we let

then and

The variance of is then 2.2956 provided the true variances are , , and . The covariance between and is -.3218, etc. For this small example, the variances of the estimates are quite large. Another set of true values could be used to derive if desired. Few researchers ever compute the sampling variances of their estimates due to the complexity and number of the traces that must be computed.

Least Squares Equations

The quadratic forms (except for TSS) and their expectations can also be obtained from Ordinary Least Squares equations. For the example data, the OLS equations are

The first step is to calculate the variance-covariance matrix of the right hand sides. Note that, for herds,

so that

and similarly for sires,

so that

Then

The -matrices for are diagonal matrices. For the mean sum of squares, MSS,

and for the herd and sire sum of squares,

and

The expectations of these quadratic forms are given by

Then you need to add the total sum of squares and its expectation and proceed as before.

Problem

Apply Method 1 to the following data using a model with three random factors. Compute the sampling variances of the estimates.

Henderson Method 2
L. R. Schaeffer, April 27, 1999

The main drawback of Method 1 was that only completely random models could be handled. Methods 2 and 3 accommodate mixed models. Method 2 is an unbiased, translation invariant procedure, and can be used with mixed models provided that interactions between fixed and random factors, or nesting of random factors within fixed factors, do not exist in the model.

Method 2 requires the least squares (LS) equations for all factors in the model. A solution vector is obtained by taking a generalized inverse of the coefficient matrix of the LS equations. Because there are an infinite number of possible solution vectors to the LS equations, Searle (1968) maintained for many years that Method 2 was not uniquely defined; that is, different solution vectors could lead to different estimates of the variance components. This was later proven to be incorrect: Method 2 is unique if Henderson's directions are followed exactly (Henderson, Searle, and Schaeffer, 1974).

The Calculations

Henderson's Method 2 involves the following steps:

Step 1.
Construct LS equations for all factors (fixed and random) in the model. Let and , then solve

Step 2.
Adjust for from the OLS equations in step 1. Let

Step 3.
Apply Method 1 to and adjust the coefficients of in the expectations of all quadratic forms to allow for the change in from .

Theoretical Development

Let

such that the following rank requirements are met,

This implies that the fewest possible rows and columns of Z are set to zero in obtaining a generalized inverse of the OLS coefficient matrix. Then

A generalized inverse is obtained by setting the rows and columns corresponding to and to zero and inverting the remaining elements. Let

then

The solutions are

An unbiased estimate of the residual variance must be obtained, such as

Each observation must now be adjusted for , as

Each observation does not need to be adjusted separately. Note that

for the jth factor in the model, and that and are available from the LS equations. Hence, only the right hand sides of the LS equations need be adjusted.
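This identity can be checked numerically. In the sketch below (design matrices, effect values, and the stand-in least squares solution are all invented), adjusting every observation and adjusting only the right hand sides give the same result:

```python
import numpy as np

# Invented design: 2 fixed-effect columns, one random factor with 3 levels.
rng = np.random.default_rng(1)
N = 12
X = np.column_stack([np.ones(N), rng.integers(0, 2, N)]).astype(float)
Z = np.zeros((N, 3))
Z[np.arange(N), rng.integers(0, 3, N)] = 1.0
y = X @ np.array([10.0, 2.0]) + Z @ rng.normal(0, 1, 3) + rng.normal(0, 1, N)

# Stand-in for the least squares solution of the fixed effects.
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Adjusting every observation for the fixed effects ...
rhs_slow = Z.T @ (y - X @ b_hat)
# ... gives the same right hand sides as adjusting Z'y directly:
rhs_fast = Z.T @ y - (Z.T @ X) @ b_hat
assert np.allclose(rhs_slow, rhs_fast)
```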

To apply Method 1 to requires that the model for be completely random except for a common mean effect. The proof of this is given by Henderson et al. (1974). The model equation for is

where

Fortunately,

So that the model for is

A proof concerning TXb is also given in Henderson et al. (1974), and this is the key to showing that Method 2 gives unique estimates of variance components regardless of the generalized inverse used to solve the OLS equations.

Consequently,

The variance-covariance matrix of is slightly different from that of the original observation vector . This difference causes changes to the expectations of the Method 1 quadratic forms where is involved.

The coefficient of in the expectation of is

Thus, we have the usual coefficient plus an additional term. Similarly, the coefficient of in the expectation of where is the design matrix of the jth random factor of the model, is

The coefficients of the other variance components in the expectations are the same as in the usual Method 1.

An Example

Below are the LS equations for a two-way classification model with one fixed factor and one random factor. Let factor A be the fixed factor and let factor B be random. The total sum of squares of the observations was 1,979.

Using the partitioning given in the previous section, then let

and similarly for the design matrices X and Z. The solutions are

The solutions times their corresponding right hand side elements gives

and an estimate of the residual variance component is

This estimate of is the same for any generalized inverse of .

Now adjust for the estimates of the fixed effects to give

where is the design matrix for factor B. The expectations of the next two quadratic forms are

where

In this example,

The resulting equations to solve are

Let

then which gives

giving estimates of , and .

Suppose that and were set to zero rather than and in obtaining solutions to the OLS equations. The reader should verify the following results.

Problems

1.
Below are the OLS equations for a 2-way model with factors A (4 levels) and B (5 levels). Consider factor A as fixed and factor B as random. The total sum of squares of the observations was 43,000. Estimate the variance components by Method 2.

2.
Prove that . (Not Easy)

3.
Method 2 has often been done incorrectly in the literature from 1953 to 1970. The researchers would estimate the fixed effects as

and used this estimate to adjust the observation vector, . Derive the correct expectations of and where

and s=1.

Henderson Method 3
L. R. Schaeffer, April 27, 1999

Method 2 does not allow models to contain interactions between fixed and random factors, or nesting of random factors within fixed factors. Henderson's Method 3 is capable of handling general mixed models, but computationally the method is more demanding than Methods 1 or 2. Method 3 has been called the 'fitting constants' method because reductions in sums of squares due to fitting submodels of the full model are used. The method is unbiased and translation invariant, but it is not uniquely defined in the sense that more reductions can be computed than are necessary to estimate the variance components.

Method 3 has been shown in some cases to provide more accurate estimates of variance components than Methods 1 or 2. In special design and model situations, Method 3 is equivalent to Methods 1 or 2. The reason that Method 3 gives greater accuracy seems to be due to fitting submodels such that two or more random factors are simultaneously considered rather than singly as would be the case in Methods 1 or 2.

The Calculations

For a general model with s random factors there are $2^s$ possible submodels and reductions that could be computed, but only s + 2 reductions are needed to estimate the unknown variances. Specific guidelines for choosing the best (most accurate) set of s + 2 reductions out of the $2^s$ possibilities are not available. The following rules are recommended to uniquely define Method 3 in all situations and to hopefully provide the most accurate set of reductions. The necessary quadratic forms are a) $y'y$, the total sum of squares; b) the reduction due to the full model, where the least squares equations are

then

as in Method 2, and c) reductions due to omitting one random factor at a time from the model. Let

with omitted from the model and let be the corresponding design matrix with omitted. Then the reduction due to this submodel is

There are s such reductions with one random factor omitted at a time.
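A small numerical sketch of these reductions, with an invented design (one fixed factor A, one random factor B): it computes R(full) and the reduction omitting B, and checks the trace results used in the expectations (the trace of the idempotent projection equals the rank of the design, and the coefficient of the variance of a factor included in the model simplifies to N):

```python
import numpy as np

# Invented design: fixed factor A (2 levels), random factor B (3 levels).
rng = np.random.default_rng(2)
N = 15
A = np.zeros((N, 2)); A[np.arange(N), rng.integers(0, 2, N)] = 1.0
B = np.zeros((N, 3)); B[np.arange(N), rng.integers(0, 3, N)] = 1.0
y = rng.normal(50, 5, N)

def reduction(W, y):
    """R(model with design W) = sol' W'y, using a generalized inverse."""
    sol = np.linalg.pinv(W.T @ W) @ (W.T @ y)
    return sol @ (W.T @ y)

W_full = np.hstack([A, B])
R_full = reduction(W_full, y)    # reduction for the full model
R_sub = reduction(A, y)          # reduction omitting random factor B

# Projection onto the full-model column space: W (W'W)^- W'.
P = W_full @ np.linalg.pinv(W_full.T @ W_full) @ W_full.T
rank_W = np.linalg.matrix_rank(W_full)
# Idempotent matrix: trace equals rank, so the coefficient of the
# residual variance in E[R(full)] is r(W).
assert np.isclose(np.trace(P), rank_W)

# Coefficient of factor B's variance in E[R(full)]: tr(Z_B' P Z_B),
# which simplifies to N because B is part of the full model.
coef_B = np.trace(B.T @ P @ B)
assert np.isclose(coef_B, N)
```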

To illustrate, assume a model with three random factors A, B, and C, then the following reductions should be used. Notice that each reduction contains all of the fixed factors in the model.

Now assume a model with two random factors and interaction, then use

Note that when interactions are in the model then

and therefore some slight differences in the choice of reductions are necessary when interactions are present in the model.

Suppose that in a two random factor model, factor D is a fixed factor, but that there is an interaction between D and random factor A, but not with random factor B. Then use

In this case

and

Expectations of Reductions

Let

The expectation is then

From the theory of generalized inverses,

and consequently,

and

Also, the coefficient of is

but we know that is idempotent and the trace of an idempotent matrix is equal to the rank of the matrix, so that . Therefore,

An Example

Consider a three factor model with factors A, B, and C where A is a fixed factor and B and C are random factors, and the LS equations are as given below:

The reductions and their expectations are

Summarizing the above,

Taking differences from eliminates the term from all quadratic forms and gives

and the solutions are , , and . Notice the zeros in the above expectation matrix as a result of taking differences from . This helps to reduce the sampling variances of the estimated variance components over other possible sets of reductions that could be used in Method 3.

Problems

1.
Below are feed consumption data on progeny of 3 sires and 10 dams over a ten-day period after weaning. Age at weaning is known to influence feed consumption during this period.

Table of feed consumption data.
 Sire  Dam  Sex  Age at Weaning (d)  Feed Intake (mouthfuls)
  1     1    1         185                   775
  1     1    1         185                   690
  1     2    2         190                   618
  1     2    1         190                   631
  1     3    2         200                   615
  1     3    1         200                   704
  1     4    2         210                   573
  2     5    1         215                   662
  2     5    1         215                   600
  2     6    1         170                   526
  2     6    2         170                   485
  2     6    2         170                   581
  2     7    2         175                   494
  3     8    1         206                   660
  3     8    2         206                   511
  3     9    2         180                   500
  3     9    1         180                   450
  3    10    2         194                   470
  3    10    1         194                   505
  3    10    1         194                   516
  3    10    2         194                   448

Assume the model is

where yijkl is feed intake, is an overall mean, b(Age) is a covariate of feed intake on age at weaning, Xi is a sex of offspring effect (fixed), Sj is a random sire effect, Djk is a random dam effect, and eijkl is a random residual effect.

Estimate the variance components by Henderson's Method 3. You might want to take deviations of age at weaning from the overall average weaning age of 200 d to avoid rounding errors.

2.
Suppose we have data to be analyzed by the following model

where the fixed factors of the model are and Ai. Write all of the possible reductions for Method 3 that could be used with this model, and mark those that you would use to estimate the variances.

Other Simple Unbiased Methods
L. R. Schaeffer, April 27, 1999

Absorption of Fixed Effects

The absorption of all fixed effects in the LS equations into the equations for the random factors essentially adjusts the right hand sides of the random effects for the fixed effects, and consequently any quadratic forms involving the absorbed right hand sides are translation invariant. Numerous methods may therefore be defined by using quadratic forms of the absorbed right hand sides. The LS equations are

Let

and partition Z into , then the absorbed random effects equations are

To illustrate various methods, suppose we have the following LS equations for a model with two random factors (C and D) and two fixed factors (A and B).

Absorbing the fixed effects gives (each element is multiplied by 11)

Let

The variance-covariance matrix of is

Let , then a quadratic form of has expectation

These expectations are easy to calculate when Q is a diagonal matrix, in which case only the diagonals of and are needed, and these are multiplied times the diagonal elements of . Below are the diagonal elements of the matrices needed to compute the expectations of quadratic forms of the absorbed right hand sides for the example data.
 Element
  C1      404.685950       6.640496     17.863636
  C2    1,068.809917      12.400826     28.772727
  C3    1,636.016529      34.016529     35.454545
  C4    2,309.239669      28.739669     42.590909
  C5      801.561984      67.243802     24.954545
  D1       13.057851   2,573.921488     42.590909
  D2       70.198347   2,664.925620     43.454545
  D3       28.611570     731.838843     23.318182
  D4       37.173554     863.537190     25.272727

MIVQUE-0

The simplest set of quadratic forms of r that could be used are and

where diag indicates a diagonal matrix whose diagonals are the elements within the parentheses.

Assume that the residual quadratic form equals 668,160.62 for this example, with r(M) = 200 - 3 = 197 degrees of freedom; then

The resulting equations are

which gives , and

The above procedure is known as MIVQUE-0; the name stands for Minimum Variance Quadratic Unbiased Estimation using zeros as prior values for all of the variance components except the residual. Calling this procedure MIVQUE is very misleading because the estimates from MIVQUE-0 have been shown to have very large sampling variances. However, because the method is very easy to apply, MIVQUE-0 estimates are frequently used as starting values for other methods of estimation. The estimates from MIVQUE-0 are unbiased and translation invariant, but they are not minimum variance.
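The MIVQUE-0 recipe can be sketched end to end on simulated data (all designs and true values below are invented; this is not the example above):

```python
import numpy as np

# Simulated data: 2 fixed-effect columns, two random factors.
rng = np.random.default_rng(3)
N = 40
X = np.column_stack([np.ones(N), rng.integers(0, 2, N)]).astype(float)
Z1 = np.zeros((N, 4)); Z1[np.arange(N), rng.integers(0, 4, N)] = 1.0
Z2 = np.zeros((N, 5)); Z2[np.arange(N), rng.integers(0, 5, N)] = 1.0
y = (X @ np.array([20.0, 3.0])
     + Z1 @ rng.normal(0, np.sqrt(2.0), 4)
     + Z2 @ rng.normal(0, 1.0, 5)
     + rng.normal(0, 2.0, N))

# Absorb the fixed effects: M = I - X (X'X)^- X'.
M = np.eye(N) - X @ np.linalg.pinv(X.T @ X) @ X.T
Z = [Z1, Z2]
r = [Zi.T @ M @ y for Zi in Z]                 # absorbed right hand sides

# MIVQUE-0 quadratic forms: y'My and r_i'r_i.
q = np.array([y @ M @ y] + [ri @ ri for ri in r])

# Expectations: E[y'My] = (N - r(X)) s_e + sum_j tr(Z_j'M Z_j) s_j,
# E[r_i'r_i]  = tr(Z_i'M Z_i) s_e + sum_j tr[(Z_i'M Z_j)(Z_i'M Z_j)'] s_j.
s = len(Z)
F = np.zeros((s + 1, s + 1))
F[0, 0] = N - np.linalg.matrix_rank(X)
for j in range(s):
    F[0, j + 1] = np.trace(Z[j].T @ M @ Z[j])
for i in range(s):
    F[i + 1, 0] = np.trace(Z[i].T @ M @ Z[i])  # M is idempotent
    for j in range(s):
        T = Z[i].T @ M @ Z[j]
        F[i + 1, j + 1] = np.trace(T @ T.T)

# Equate quadratic forms to expectations and solve.
sigma_hat = np.linalg.solve(F, q)
```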

Quadratics Using Diagonals Of LS Equations

Many methods could be described by simply changing the definition of . For example, define as a diagonal matrix with corresponding elements equal to the diagonals of and the rest to zero. For the example data, let

The expectations are

The resulting estimates using the above quadratics with are

Henderson's Method 4

In 1980 Henderson presented a particular set of matrices for the preceding methodology, which was formally described in his book (1984). This method has been given various names such as Henderson's New Method, Diagonal MIVQUE, and Henderson's Method 4. Dempfle et al. (1983) showed that the sampling variances of Method 4 estimates were similar to and sometimes better than those of Method 3 estimates.

Method 4 uses a specific set of quadratic forms. The motivation for these particular forms is that they more closely approximate the quadratic forms used with MIVQUE (which will be discussed later). The specific quadratic forms are, for the example data:

= diagonals of , plus zeros for the rows corresponding to other random factors, where kC = a guess of the value of , and
= diagonals of , plus zeros for the rows corresponding to other random factors, where kD = a guess of the value of .
Assume that kC = 16 and that kD = 2, then

Completing the calculations,

The estimates, using and the above quadratics, are

In all of the methods in this section, can be replaced by any unbiased estimate of

Problem

Below are parts of the OLS equations for a model with five factors. The first three factors are fixed and the last two factors are random (each with 6 levels). The total sum of squares was 3,143,224 based on 41 observations.

and the right hand sides are

Estimate the variances using MIVQUE-0, Henderson's Method 4, and another set of quadratic forms of your own choice.

MIVQUE
L. R. Schaeffer, April 27, 1999

During the late 1960s, statisticians began to develop methods of variance component estimation that had more desirable properties than simply unbiasedness and translation invariance. Two researchers, working independently, derived minimum variance quadratic unbiased estimators (MIVQUE) in 1970. C. R. Rao published four papers in his newly formed journal on MIVQUE for normally distributed data. Rao also derived a method called MINQUE (minimum norm quadratic unbiased estimation) for cases when the data do not have a normal distribution; under normality, however, minimizing the Euclidean norm, as Rao does, is equivalent to minimizing the variance of the quadratic form, which gives MIVQUE. In order to minimize the variance of quadratic forms, the true unknown variances must be known.

At the same time, L. R. LaMotte, at the University of Kentucky, also derived MIVQUE, in bulletins which were not formally published in a statistical journal. Henderson (1973) took the MIVQUE method and derived easier computing formulas based upon solutions to the mixed model equations (MME) and upon the elements of the generalized inverse of the coefficient matrix of the MME.

MIVQUE assumes that , the variance-covariance matrix of , is known. The variances of the quadratic forms are minimized for this . Since is not known in reality, some prior information about is utilized, and consequently the variance of quadratic forms is only minimized for this prior . MIVQUE is unbiased and translation invariant and if the prior is the same as the true , then it also has minimum variance. Thus, MIVQUE in general, is not minimum variance in practice, but is only as good as the prior values that are used.

Assume that has a multivariate normal distribution with mean vector , and variance-covariance matrix . Let be the prior information about for i = 0 to s, then let be our prior which is

A projection matrix is constructed as

A projection matrix is simply a way of absorbing the fixed effects and at the same time accounting for information about . Note that and that is idempotent. The MIVQUE quadratic forms are then

Note that these are quadratic forms of and since = 0, then the quadratic forms are translation invariant. Also,

To simplify these quadratic forms notice that

Therefore,

Henderson (1973) noticed that

where is the assumed variance covariance matrix of (in this case , and is BLUP of and the solution of from the MME, then

Similarly, Henderson showed that

where

Thus, under the basic linear model for variance component estimation the necessary quadratic forms for MIVQUE are

Expectations

The MME are

and let a generalized inverse of the coefficient matrix be represented as

Now let , the right hand sides of the MME, then

Also,

and,

The expectation of is

This expectation can be simplified by noting that (for having full rank)

then, for example, for i not equal to j,

The expected value of can be written as

In the same way,

and

Then

but

so that

MIVQUE requires the inverse elements of the MME in order to calculate the expectations of the quadratic forms. In most animal breeding applications, obtaining an inverse of the coefficient matrix could be impossible due to the large order of the equations.
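The properties of the projection matrix that make the MIVQUE quadratic forms translation invariant are easy to verify numerically. A sketch with an invented design and assumed prior variances:

```python
import numpy as np

# Invented design: 2 fixed-effect columns, one random factor (3 levels).
rng = np.random.default_rng(4)
N = 10
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Z = np.zeros((N, 3)); Z[np.arange(N), rng.integers(0, 3, N)] = 1.0
# Assumed prior variances: residual 1.0, factor 0.5.
V = np.eye(N) + 0.5 * (Z @ Z.T)

Vi = np.linalg.inv(V)
P = Vi - Vi @ X @ np.linalg.pinv(X.T @ Vi @ X) @ X.T @ Vi

# PX = 0, so quadratic forms in Py are translation invariant.
assert np.allclose(P @ X, 0)
# P V P = P: P is a projection with respect to the prior V.
assert np.allclose(P @ V @ P, P)

y = rng.normal(size=N)
q_factor = y @ P @ Z @ Z.T @ P @ y   # MIVQUE quadratic form for the factor
q_resid = y @ P @ P @ y              # form associated with the residual
```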

Example

Assume a model with one fixed factor and one random factor. The least squares equations are given below, and = 770.

Let factor D be random and assume that , then to form the MME add 2 to the diagonals of the equations for factor D. The variance-covariance matrix of the right hand sides is

where , the LS equations, and , that is

With then

and the solutions to the mixed model equations are

where

Thus,

The expectations of these quadratic forms are as follows:

also note that

Equate the quadratic forms to their expectations and solve for the estimates of variance components.

Problem

Below are the OLS equations for some data for a model with two random factors (B and C) and one fixed factor (A). The total number of observations was 200, and the total sum of squares was 1,000,000. Assume that the prior variance ratios were

Apply MIVQUE and one of the previous methods to this example data.

REML
L. R. Schaeffer, April 28, 1999

Restricted maximum likelihood (REML) was first suggested by Thompson (1962), but was described formally by Patterson and Thompson (1971). The main features of REML are:

1.
must have a multivariate normal distribution.
2.
The method is translation invariant.
3.
The estimator is restricted to the allowable parameter space.
4.
REML is a biased procedure.
5.
The exact same quadratic forms as MIVQUE are utilized.
There are several methods of obtaining REML estimators. One method is to take the MIVQUE quadratic forms and derive new expectations assuming that the prior values are equal to the true parameter values. REML estimates can also be obtained by iterating on the MIVQUE quadratic forms and their expectations: the MIVQUE estimates are used to form new prior values of the variance ratios, and MIVQUE is applied over and over until the estimates and the prior values are equal. The final estimates are REML if the variance estimates remain positive. If any of the estimates become negative, then the iteration process must be stopped, and REML estimates cannot be obtained in that manner.

Derivation of REML from MIVQUE

From MIVQUE, the expectation of quadratic forms of solutions to mixed model equations were as follows:

where and and are assumed prior values of and . To derive REML assume that and substitute into the above expectation to obtain

which gives

Notice that the first term is a sum of squares and therefore always positive, and that the trace term is always positive because the diagonals involved represent the variances of prediction error. Hence the estimator is always positive, i.e., within the parameter space.

Likewise,

under the assumption that Then

The above pseudo expectations give formulas for and that are known as the EM algorithm. EM stands for Expectation Maximization, a procedure described by Dempster et al. (1977). REML estimates are obtained by using and to form new ratios, , then the new ratios are used as priors to calculate a new , until the ratio used as a prior and the estimated ratio are equal. With the REML EM algorithm,
1.
The user must iterate until convergence.
2.
Convergence is not guaranteed.
3.
The final estimate may not maximize the likelihood function over the entire range of parameter values. This would be convergence to a local maximum rather than a global maximum.
4.
Convergence will normally occur if the number of observations is large.
5.
With few observations and several random factors there may be problems in reaching convergence.
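The EM REML iteration for a model with one fixed and one random factor can be sketched as follows (simulated data; the design, true values, and the fixed iteration count are my own choices):

```python
import numpy as np

# Simulated data: one fixed factor (2 levels), one random factor (6 levels).
rng = np.random.default_rng(5)
N, nf, nu = 60, 2, 6
X = np.zeros((N, nf)); X[np.arange(N), rng.integers(0, nf, N)] = 1.0
Z = np.zeros((N, nu)); Z[np.arange(N), rng.integers(0, nu, N)] = 1.0
y = X @ np.array([30.0, 35.0]) + Z @ rng.normal(0, 2.0, nu) + rng.normal(0, 3.0, N)

var_u, var_e = 1.0, 1.0                     # starting values
for _ in range(200):                        # fixed iteration count for brevity
    k = var_e / var_u                       # variance ratio for the MME
    # Mixed model equations: add k to the diagonals of the Z'Z block.
    C = np.block([[X.T @ X, X.T @ Z],
                  [Z.T @ X, Z.T @ Z + k * np.eye(nu)]])
    Ci = np.linalg.pinv(C)
    sol = Ci @ np.concatenate([X.T @ y, Z.T @ y])
    b, u = sol[:nf], sol[nf:]
    Cuu = Ci[nf:, nf:]                      # prediction-error block
    # EM updates (the pseudo expectations above):
    var_u = (u @ u + var_e * np.trace(Cuu)) / nu
    var_e = (y @ y - b @ (X.T @ y) - u @ (Z.T @ y)) / (N - np.linalg.matrix_rank(X))
```

Both updates stay positive by construction, which is why EM REML estimates remain inside the parameter space.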

An Example

Below are the mixed model equations for an example with three factors, two of which are random.

The total sum of squares is .

To demonstrate EM REML, let and be the starting values of the ratios for factors A and B, respectively. The solution vector is

Then

and from the inverse of the coefficient matrix,

which give rise to the following estimates,

New ratios are formed as

and

and these are used to form the mixed model equations again, new solutions and traces are calculated, and so on, until the estimated ratios and the prior values of the ratios are equal. The estimates in this example converge to

and

Derivation of REML from the likelihood function and a description of derivative free REML are given in the regular notes for the course 10-638 Animal Models.

Maximum Likelihood
L. R. Schaeffer, April 28, 1999

Hartley and Rao (1967) described the maximum likelihood approach for the estimation of variance components.

1.
ML uses the log likelihood function of .
2.
ML uses the exact same quadratic forms as REML and MIVQUE.
3.
ML is translation invariant, but biased.
Henderson (1973) derived ML estimates from the mixed model equations. ML is a more biased method than REML, especially for the estimator of the residual variance. Estimates of the residual variance are generally smaller with ML than with REML due to the difference in degrees of freedom; REML was proposed in order to account for the loss of degrees of freedom in ML from estimating the fixed effects. The main disadvantages of ML are the same as for REML: the procedure is iterative and convergence is not guaranteed. An advantage of ML over REML is that in some models the traces that need to be calculated are easier than those needed for REML (Schaeffer 1976).

Derivation

Assume that is normally distributed with mean and variance-covariance matrix and follows the basic linear mixed model. Except for a constant, the log likelihood function of is

The derivatives of with respect to and to for are

Equating the derivatives to zero gives

Recall that , where is the projection matrix, and that , then the latter equation becomes

Notice that the right-hand side of the above equation is the same as in the REML and MIVQUE derivations. However, the difference among the methods is in the left-hand side of this equation. For REML, we have and for MIVQUE it is , and for ML it is . For ML, the left-hand side can be manipulated as

and if

then

If can be partitioned into submatrices for each random factor, then

and

then

Finally,

Combining results gives

for , and for i=0 gives

There are two major differences in these formulas from those for REML, namely,
1.
in ML versus in REML in formula.
2.
N in ML versus in REML as denominator of
in formula.
Note that

and

ML, therefore, does not account for the degrees of freedom needed to estimate , as reflected by and by N in the above formulas.
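The denominator difference is easiest to see in a fixed-effects-only model, where both estimators of the residual variance have closed forms. A sketch on simulated data:

```python
import numpy as np

# Simulated fixed-effects regression (all values invented).
rng = np.random.default_rng(6)
N = 20
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([5.0, 1.5]) + rng.normal(0, 2.0, N)

b = np.linalg.lstsq(X, y, rcond=None)[0]
rss = float(np.sum((y - X @ b) ** 2))
rX = np.linalg.matrix_rank(X)

var_ml = rss / N            # ML divides by N: biased downward
var_reml = rss / (N - rX)   # REML accounts for estimating b

assert var_ml < var_reml    # the ML estimate is always the smaller one
```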

Example With One Random Factor

The computations for ML become fairly simple with only one random factor in the model. In that case, is a diagonal matrix and its elements can be readily computed. To illustrate, consider the least squares equations below for a model with fixed herd effects, H, and random sire effects, S, in an analysis of milking speed observations.

Also let and , then the solutions are

The following quantities are easily calculated.

In this example, , and therefore must be multiplied by in the computation of . This gives a new variance ratio of . These steps would be repeated many times, until the estimates converge to the values that maximize the likelihood function. ML calculations for a model with only one random factor are very simple even when the random factor has many levels.

One Random Factor Nested Within Another

In a two random factor model we have

and both and are needed by ML. However, if factor 2 is nested within factor 1, then we know that the columns of can be formed from adding together appropriate columns of . Thus, can be written entirely in terms of , and the matrix that describes which columns of to add together to give . Call this matrix where . Now can be rewritten as

and by using partitioned matrix techniques, then

where the i subscript is summing over factor 1 and the j subscript is summing over levels of factor 2, and nij is the number of observations for a particular subclass. Likewise,

Both traces can be computed readily by proper sorting of the data files by levels of factor 2 within levels of factor 1. Schaeffer (1976) gives the details for these calculations.

To illustrate ML for this type of model, consider the previous milking speed example expanded to include cow effects (i.e. daughters within sires in the table below). Also, some cows are allowed to have more than one observation.

Table 2.2 Example 2-minute milk yield data on dairy cows
 Cow  Sire  Herd  Number of Observations  Total 2-min yield (kg)
  A    1     1             2                      9.9
  B    1     1             3                     20.0
  C    2     1             2                      9.5
  D    2     1             2                      8.8
  E    2     1             1                      5.6
  F    3     1             1                      4.7
  G    4     1             3                     12.4
  H    4     1             2                      8.9
  I    1     2             1                      5.5
  J    1     2             2                     10.1
  K    1     2             1                      2.2
  L    2     2             2                     10.2
  M    2     2             2                     10.3
  N    2     2             1                      5.6
  O    3     2             2                      8.6
  P    3     2             2                     10.2
  Q    4     2             2                      7.8
  R    4     2             2                      8.1
  S    4     2             2                      8.2
  T    4     2             1                      4.3
  U    5     2             2                      7.0
  V    5     2             2                      8.6
The least squares equations for this example are of order 29. Assume that and that and for sires and cows, respectively. To compute the traces that are needed, for each sire we have the following:

Now for each sire, compute

Summing gives

The solutions for herds and sires from the mixed model equations were

Solutions for cows are not shown. Other calculations gave

The new variance ratios were then and . These new ratios would be used to begin another iteration.

The EM Algorithm

EM stands for Expectation-Maximization. The following explanation is from Searle, Casella, and McCulloch (1992). The procedure alternates between calculating conditional expected values and maximizing simplified likelihoods. The actual data are called the incomplete data in the EM algorithm, and the complete data are considered to be the observations together with the unobservable random effects. If we knew the realized values of the unobservable random effects, then their variance would be the average of their squared values, i.e.,

However, in real life we do not know the realized values of the random effects.

The steps of the EM algorithm are as follows:

Step 0.
Decide on starting values for the variances and set m= 0.
Step 1.(E-step)
Calculate the conditional expectation of the sufficient statistics, conditional on the incomplete data.

Step 2.(M-step)
Maximize the likelihood of the complete data,

Step 3.
If convergence is reached, set , otherwise increase m by one and return to Step 1.
This is equivalent to constructing and solving the mixed model equations with a given set of variances, , and then

The EM algorithm can be applied to REML estimation as well. Just replace with and replace with and note that

Then the conditional expectation in the E-step becomes

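The EM steps above can be sketched for a model with one fixed and one random factor. Since the update formulas are not reproduced in these notes, the textbook EM REML updates are assumed: the random-factor variance is updated from û'û plus a trace term from the inverse of the MME, and the residual variance from the residual sum of squares divided by n - r(X). The simulated data and all numerical settings are illustrative only.

```python
import numpy as np

def em_reml(y, X, Z, var_u=1.0, var_e=1.0, rounds=500):
    """EM REML for y = Xb + Zu + e with Var(u) = var_u*I, Var(e) = var_e*I."""
    n, p = X.shape
    q = Z.shape[1]
    rank_x = np.linalg.matrix_rank(X)
    for _ in range(rounds):
        lam = var_e / var_u
        # Mixed model equations for the current variances
        C = np.block([[X.T @ X, X.T @ Z],
                      [Z.T @ X, Z.T @ Z + lam * np.eye(q)]])
        rhs = np.concatenate([X.T @ y, Z.T @ y])
        Cinv = np.linalg.pinv(C)            # generalized inverse of the MME
        sol = Cinv @ rhs
        b, u = sol[:p], sol[p:]
        # M-step with the E-step conditional expectations plugged in
        var_u = (u @ u + var_e * np.trace(Cinv[p:, p:])) / q
        var_e = (y @ y - b @ (X.T @ y) - u @ (Z.T @ y)) / (n - rank_x)
    return var_u, var_e

# Illustrative simulated data: 40 levels of the random factor, 4 records each,
# true variances 4 (random factor) and 9 (residual).
rng = np.random.default_rng(1)
q, reps = 40, 4
Z = np.kron(np.eye(q), np.ones((reps, 1)))
X = np.ones((q * reps, 1))                  # overall mean only
y = 10.0 + Z @ rng.normal(0, 2, q) + rng.normal(0, 3, q * reps)
vu, ve = em_reml(y, X, Z)
```

The updates keep both variances positive, which is one reason EM REML is popular despite its slow convergence.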
Pseudo Expectation Method
L. R. Schaeffer, April 28, 1999

While making up an exam question for students in my Variance Component Estimation course in 1986, I discovered a quadratic form whose pseudo expectation (following the method of deriving REML from MIVQUE expectations of quadratic forms) did not involve the inverse elements of the mixed model equations. The estimators were also translation invariant because they were based on the absorbed right hand sides of the mixed model equations. The biggest advantage of these estimators was their computational ease. Unfortunately, with more than one random factor in the model, the 'quadratic forms' can be shown to be bilinear forms and hence are not guaranteed to be positive. Like most other methods, this one was found to give severely biased estimates when data were selected, as in typical animal breeding situations (Ouweltjes et al. 1988).

Van Raden (1986) expanded the Schaeffer (1986) quadratic forms for the case with relationships among animals, which could also be used in the basic linear model for variance components.

Derivation

Absorb the fixed effects into the equations for the random factors in the mixed model equations. The absorbed right hand side for the ith random factor is then

where . The expected value is and the variance-covariance matrix is

The solution vector for the random factors is

The latter form is the same as in REML and has pseudo expectation equal to

The usual expectation of is

which, when the 's are taken as ratios of the true values, reduces to

which gives

The above quadratic form is always positive when there is only one random factor in the model, and therefore the estimator of is always positive. However, with more than one random factor in the model, the quadratic forms are really bilinear forms and as such can take on negative values. Also, does not depend on and therefore is a constant that needs to be computed only once. In addition, only the diagonals of need to be computed. This can be readily accomplished for most models.

Van Raden (1986) proposed the following quadratic form, , where and , then the pseudo expectation is

which gives

Because is diagonal, then is easily obtained, but now depends on and must be re-computed each iteration. Let be the kth diagonal of then

which is the sum of values that are sometimes used as approximate accuracies of elements of . Hence will always be less than or equal to qi. Recall that Henderson's Method 4 uses as the quadratic form, but is likely closer to than is or .

An Example

Assume a model with two random factors and let . As a priori values let and . The rank of the fixed effects matrix, , is 4 and the total number of observations is 18. The first random factor has 3 levels and the second random factor has 6 levels. The absorbed coefficient matrix is

The absorbed right hand sides, solutions, and are

 Factor  Level  Absorbed r.h.s.  Solution  (third quantity listed above)
   1       1        128.8214      9.6695       10.2182
   1       2        155.3571      9.0836       12.5000
   1       3       -284.1786    -18.7531      -22.5412
   2       1          6.8905       .0750         .6745
   2       2           .5000      1.5094         .0526
   2       3        174.2500     14.8447       17.8718
   2       4       -157.6786    -12.1682      -16.4127
   2       5         49.7976      5.1919        4.4738
   2       6        -73.6786     -9.4528       -7.6692
and . The following can be calculated from the above:

The estimates are

Problem

Below are data following a model with one fixed factor (A) and two random factors (T and R). Apply EM REML, DFREML, ML, I-MIVQUE (if the estimates stay positive), the pseudo-expectation method, and Van Raden's method to these data and compare results.

 Ai Tj Rk yijkl   Ai Tj Rk yijkl   Ai Tj Rk yijkl
  1  1  1    50    2  1  1    65    3  1  1    80
  1  1  1    54    2  1  2    76    3  1  1    83
  1  1  2    59    2  1  2    73    3  1  1    78
  1  1  2    61    2  1  2    75    3  1  2    95
  1  1  3    68    2  1  3    84    3  1  3   102
  1  1  4    79    2  1  3    87    3  1  3    97
  1  1  4    82    2  1  4    98    3  1  4   109
  1  2  1    41    2  1  4    93    3  1  4   112
  1  2  2    52    2  2  1    53    3  2  2    81
  1  2  3    64    2  2  3    77    3  2  2    79
  1  2  4    69    2  2  3    74    3  2  4   101
  1  2  4    73    2  2  4    86    3  3  1    63
  1  2  4    70    2  3  2    57    3  3  1    57
  1  3  3    47    2  3  2    54    3  3  3    75
  1  3  3    46    2  3  2    51    3  3  3    84
  1  3  4    62    2  3  3    66    3  3  4    93
  1  3  4    57                     3  3  4    86
  1  3  4    66

Method R
L. R. Schaeffer, April 28, 1999

This method was first reported through the Animal Geneticist's Discussion Group in 1993. A paper was subsequently published in the Journal of Animal Science, 1994, 72:2247-2253, by A. Reverter, B. L. Golden, R. M. Bourdon, and J. S. Brinks. The method came about as a result of a study on the change in genetic evaluations as animals acquire more progeny, as in routine beef cattle evaluations. The method is based on the linear regression of recent solutions (based on all of the data) on previous solutions (based on a subset of all the data). The expected value of the regression equals 1 regardless of the distribution of observations or predictions. If the wrong ratio of variance components is used in the mixed model equations, then the computed value of the regression deviates from the expected value of one. If the computed value is greater than 1, then the h2 used was too low. The method is iterative, in that the variance ratio is changed until the regression equals one.

As with the pseudo expectation method, Method R is ad hoc in nature and the properties of the method can only be established through simulation or empirical evidence. Schenkel (1998) has shown that Method R also suffers from bias due to non-random mating and when pedigree files are not complete, as does nearly every method.

Development

To simplify presentation assume only one random factor in the model. Denote the data and model for subset j by

and denote the data and model for all data by the subscript (t). Construct and solve MME for and .

for k=j, and for k=t. Note that the same variance ratio, , is used for both data sets. If an animal model is assumed where , then the dimension of must be the same for both the subset and the total data sets. That is, all animals must be evaluated from each subset and the complete data. Note that contains all of the data in and more.

The jth subset is constructed by randomly choosing one half of the data. This should probably be a random half of the contemporary groups, so that animals in a contemporary group stay together. The idea of Method R is to obtain a number of subsets from the same complete data set, say 10 or more, and to estimate the variances from each subset. The end result is the average of the 10 estimates, along with their standard deviation (SD), which gives an idea of the variability of the estimates due to sample size.
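The subset construction can be sketched as follows, sampling half of the contemporary groups so that groups stay intact; the record layout and group labels are hypothetical.

```python
import random

def half_data_subset(records, group_of, seed=None):
    """Keep all records from a random half of the contemporary groups."""
    rng = random.Random(seed)
    groups = sorted({group_of(r) for r in records})
    kept = set(rng.sample(groups, len(groups) // 2))
    return [r for r in records if group_of(r) in kept]

# Hypothetical records: (herd, sire, observation), 10 herds with 3 records each
records = [(h, s, 100 + h + s) for h in range(10) for s in range(3)]
subset = half_data_subset(records, group_of=lambda r: r[0], seed=42)
```

Sampling whole groups rather than individual records keeps each contemporary group either fully in or fully out of the subset.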

Now consider the ith element of from the two analyses above.

because the change, , from subset j to complete data t should be uncorrelated with the solution in subset j. The regression of on is then

which should equal one.

An Example

Illustrating Method R properly requires a fairly large data set in order to achieve convergence. This example merely shows the calculations that should be followed; it does not converge to a good final estimate. Below are data on progeny of ten sires, for a model with just a mean and random (unrelated) sire and residual effects. The left half of the table represents the data from one subset and the right half represents the complete data.

 Sire    Subset 1         Complete Data
 Number  ni     yi.       ni     yi.
   1      85   11,462     213   29,218
   2      54    6,819     182   23,745
   3     112   15,153     121   16,456
   4      90   11,115     191   23,637
   5     144   16,625     307   35,666
   6     192   26,203     298   40,720
   7     136   17,957     237   30,944
   8      83   10,151     226   27,899
   9     174   20,700     338   39,865
  10     161   21,346     274   36,483
The total sum of squares of the observations in Subset 1 was 20,715,797, and for the complete data it was 39,977,646.

The solutions for sires within each data set are given below, for a variance ratio of 10.

 Sire Number   Subset 1    Complete
     1           6.3132      8.2960
     2          -1.2768      1.8758
     3           6.8886      6.9384
     4          -3.8619     -4.4988
     5         -11.5384    -11.9239
     6           8.2531      7.8913
     7           3.9549      1.9931
     8          -4.8995     -4.8277
     9          -8.3459    -10.2414
    10           4.5125      4.4973

From these solutions, then

which suggests that the ratio for this sample of data should be lower than 10. To get the new ratio, divide 10 by the computed regression to give 9.55183. Use the new ratio to construct the MME again for the subset and the complete data sets. Continue until the regression equals one. The authors suggest that a binary search algorithm could be faster. Start with heritability equal to .5 and, depending on the value of the regression, go to heritability of .75 or .25. Continue reducing the interval by one half until the regression is sufficiently close to one.
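Using the sire solutions tabulated above, the regression (through the origin, of the complete-data solutions on the subset solutions) and the updated ratio can be computed directly; this reproduces the 9.55183 quoted in the text to within rounding.

```python
# Sire solutions from the table, computed with a variance ratio of 10.
subset = [6.3132, -1.2768, 6.8886, -3.8619, -11.5384,
          8.2531, 3.9549, -4.8995, -8.3459, 4.5125]
complete = [8.2960, 1.8758, 6.9384, -4.4988, -11.9239,
            7.8913, 1.9931, -4.8277, -10.2414, 4.4973]

# Regression of complete-data solutions on subset solutions (through the origin).
b = sum(c * s for c, s in zip(complete, subset)) / sum(s * s for s in subset)

# b > 1 means the ratio of 10 was too high; divide to get the next ratio.
new_ratio = 10.0 / b
```

Each round therefore needs only the two solution vectors, not any inverse elements of the MME.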

The calculations need to be repeated for other subsets of the same data set. The Method R estimate is then the average of the estimates from each subset. Hence there is a lot of computing to be done. However, the only requirement is a program that computes genetic evaluations for a particular model, and these are readily available. Computationally, Method R is very attractive, and for randomly mating populations Method R will give estimates close to the true values. One question is the number of subsets that should be chosen. This depends on the size of the complete data set and the time needed to solve the equations. The more subsets one has, the better the average of the estimates should be. The authors claim that, in their experience, the estimates are similar to REML estimates, but the proof is rather loose in a technical sense. The covariances and variances in the regression utilize the same quadratic forms and involve the same inverse elements of the mixed model equations as in EM REML. Method R could be a method for obtaining an initial prior value for use in Bayesian or REML methodology, rather than a stand-alone method, because of its problems with incomplete pedigrees and non-random matings, and because its properties cannot be proven except empirically.

Convergence of Iterative Methods
L. R. Schaeffer, April 28, 1999

These notes are terribly outdated now, with the use of DFREML and Bayesian methods, but they may be of interest to those that work with problems of convergence of estimation systems.

Iterative methods of variance component estimation such as REML and ML are known to converge very slowly towards a final solution via the EM algorithm. For large data sets, the rate of convergence can affect computing costs substantially. Rates of convergence seem to be very data dependent and even specific to particular traits within a data set. In REML, iterative MIVQUE is known to converge faster than the REML EM algorithm. Convergence is also adversely affected by the number of random factors in the model. Then there is always the question of whether the system will converge at all. Hence the researcher needs to know (a) whether the system will converge, and (b) whether the rate of convergence can be improved if it does.

Below are least squares equations for a model with two fixed factors and one random factor

Let factor C be the random factor and assume that . Let the best guess of the ratio of residual to factor C variances be 15. Both iterative MIVQUE and the REML EM algorithm were applied to these data, and the results by iteration are in Tables 1 and 2. Both algorithms were slow to converge, but iterative MIVQUE was faster than REML EM.

Table 1. Results of iterative MIVQUE.
 Iteration   Residual   Factor C   Variance    Change in   Ratio of
 Number (i)  variance   variance   ratio       ratio       changes
     1       9138.59     659.45    13.85793    -1.14207      ----
     2       9131.43     709.03    12.87875    -0.97918     1.16635
     3       9124.82     757.03    12.05338    -0.82537     1.18635
     4       9118.87     802.14    11.36816    -0.68522     1.20453
     5       9113.66     843.32    10.80684    -0.56132     1.22073
     6       9109.19     879.92    10.35226    -0.45458     1.23481
     7       9105.44     911.67     9.98767    -0.36459     1.24683
     8       9102.33     938.62     9.69760    -0.29007     1.25690
     9       9099.80     961.08     9.46833    -0.22927     1.26519
    10       9097.76     979.51     9.28809    -0.18024     1.27203
    20       9090.85    1044.20     8.70603    -0.01422     1.29466
    30       9090.30    1049.49     8.66166    -0.00106     1.29245
    40       9090.26    1049.88     8.65838    -0.00008     1.29750
    50       9090.26    1049.91     8.65813

Table 2. Results for REML.
 Iteration   Residual   Factor C   Variance    Change in   Ratio of
 Number (i)  variance   variance   ratio       ratio       changes
     1       9145.94     622.81    14.68508    -0.31492      ----
     2       9143.95     635.96    14.37814    -0.30694     1.02600
     3       9141.94     649.31    14.07958    -0.29856     1.02807
     4       9139.94     662.81    13.78974    -0.28984     1.03009
     5       9137.94     676.44    13.50875    -0.28079     1.03223
     6       9135.95     690.16    13.23748    -0.27147     1.03433
     7       9133.97     703.94    12.97557    -0.26191     1.03650
     8       9132.01     717.73    12.72341    -0.25216     1.03867
     9       9130.08     731.51    12.48112    -0.24229     1.04074
    10       9128.17     745.23    12.24881    -0.23231     1.04296
    20       9111.59     870.89    10.46233
    30       9100.81     958.84     9.49152
    40       9095.07    1007.74     9.02521
    50       9092.60    1027.79     8.8468

The final estimates were , , and .

Convergence Curves

The following results are due mainly to Dr. Ignacy Misztal. A plot of the estimated variance ratio against iteration number would show a nonlinear relationship. The curve would appear to approach an asymptote if the system were converging, as in the example. This curve may be characterized by a model, and therefore the asymptotic value could be predicted if the parameters of that model were known. Let

λ_0 = initial starting value of the variance ratio,
λ_i = value of the variance ratio after the ith iteration,
Δ_i = λ_i - λ_{i-1}, the change in the variance ratio after one iteration, and
r_i = Δ_{i-1}/Δ_i, the ratio of changes in consecutive iterations.
Assume r_i = r, a constant for all iterations. For REML this assumption is reasonable since r_i varied only from 1.026 to 1.043 over the first ten iterations (Table 2), but for iterative MIVQUE the values ranged from 1.17 to 1.272 in the first ten iterations (Table 1). If a constant value r can be assumed, then

Δ_{i+1} = Δ_i/r, Δ_{i+2} = Δ_i/r^2, and so on.

Summing the remaining changes gives the predicted converged value

λ_∞ = λ_i + Δ_i(1/r + 1/r^2 + ...) = λ_i + Δ_i/(r - 1),

provided r > 1. For REML in Table 2, taking λ_1 = 14.68508, Δ_1 = -0.31492, and an r value near those of the early iterations, then

λ_∞ = 5.68.

Obviously, 5.68 is greatly different from the converged value of 8.65, and this is due to the fact that r is not constant over the iterations. However, if r had been 1.05218, then λ_∞ would equal 8.65. Hence a small difference in r can make a big difference in the predicted ratio.
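A minimal sketch of the extrapolation, assuming a constant ratio of consecutive changes; the inputs are the REML values quoted above (starting ratio 15, first-round value 14.68508) together with the r value of 1.05218 that reproduces the converged estimate.

```python
def predict_converged(x0, x1, r):
    """Predict the asymptote assuming changes shrink by a constant factor r > 1."""
    delta = x1 - x0
    # remaining change: delta*(1/r + 1/r**2 + ...) = delta/(r - 1)
    return x1 + delta / (r - 1)

pred = predict_converged(15.0, 14.68508, 1.05218)   # ~8.65
```

The prediction is a simple geometric-series sum, which is why it is so sensitive to the assumed value of r.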

For MIVQUE, the corresponding calculation uses the values from Table 1.

Common Intercept Approach

The Common Intercept Approach (CIA) was suggested in 1979 by Dr. Lynn Johnson (then a postdoctoral research associate with Dr. Burnside) and can be derived from the nonlinear model of the previous section. The purpose of CIA is to reduce the number of iterations needed to attain convergence in any iterative procedure. In addition, CIA can be used to determine whether an iterative procedure will converge for a given data set.

Let λ_0 represent one starting value and let λ_0* represent a different starting value. The corresponding changes in value after one iteration are Δ = λ_1 - λ_0 and Δ* = λ_1* - λ_0*. Under the model of the previous section, in which the changes shrink by a constant factor r, each starting value leads to the same converged value:

λ_∞ = λ_0 + Δ(1 + 1/r + 1/r^2 + ...) = λ_0 + Δ r/(r - 1),

and likewise λ_∞ = λ_0* + Δ* r/(r - 1). Assuming that r is the same for both starting values, equating these two expressions gives

r/(r - 1) = (λ_0* - λ_0)/(Δ - Δ*),

which can be re-arranged to give

λ_∞ = λ_0 + Δ(λ_0* - λ_0)/(Δ - Δ*).

Using the example data and REML, from Table 2, let

then CIA gives

With MIVQUE, let

then

Thus, two iterations with different starting values are needed to obtain a prediction of the converged value. The closer the two starting values are to the actual converged estimate, the better CIA will work. For example, using REML, with and , then

With MIVQUE, the result was 8.66014.
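CIA can be sketched on a synthetic iteration map, since the actual starting values used in the example are not reproduced above. If one round of iteration behaves like the affine map x → a + bx, the CIA formula recovers the fixed point a/(1 - b) exactly from one iteration at each of two starting values; the map constants below are arbitrary.

```python
def cia(x0, d0, x0s, d0s):
    """Common Intercept Approach from two starts x0, x0s with one-iteration changes d0, d0s."""
    return x0 + d0 * (x0s - x0) / (d0 - d0s)

# Hypothetical affine iteration map with fixed point 1.3 / (1 - 0.85)
a, b = 1.3, 0.85
def step(x):
    return a + b * x

x0, x0s = 15.0, 12.0                 # two different starting ratios
d0, d0s = step(x0) - x0, step(x0s) - x0s
est = cia(x0, d0, x0s, d0s)          # recovers a / (1 - b) exactly
```

For a real estimation system the map is only approximately affine near convergence, which is why CIA works best when both starting values are close to the converged estimate.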

If the estimate of the converged value from CIA is negative, then this indicates that convergence may not be possible. Select new starting ratios that are closer to each other and retry the CIA. If the estimate of the converged value remains negative, then the system will likely not converge. Use of an unbiased, non-iterative procedure on the same data will likely produce negative estimates of one or more variance components. In any event, the negative value should not be used as the next prior value in the iterative procedure.

Aitken's Approximation

Another method of determining the converged value is Aitken's approximation, given by the following formula:

λ̂_∞ = λ_{i+2} - (λ_{i+2} - λ_{i+1})^2 / (λ_{i+2} - 2λ_{i+1} + λ_i).

To illustrate, take the values from Table 1 for iterations 6, 7, and 8; then

λ̂_∞ = 9.69760 - (9.69760 - 9.98767)^2 / (9.69760 - 2(9.98767) + 10.35226) = 8.57

for MIVQUE, and the corresponding values from Table 2 give

λ̂_∞ = 12.72341 - (12.72341 - 12.97557)^2 / (12.72341 - 2(12.97557) + 13.23748) = 6.20

for REML.
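The calculation can be sketched with the standard Aitken delta-squared formula (assumed here, since the displayed formula is not reproduced in these notes), applied to iterations 6 to 8 of each table:

```python
def aitken(x0, x1, x2):
    """Aitken's delta-squared extrapolation from three consecutive iterates."""
    return x2 - (x2 - x1) ** 2 / (x2 - 2 * x1 + x0)

mivque = aitken(10.35226, 9.98767, 9.69760)   # Table 1, iterations 6-8
reml = aitken(13.23748, 12.97557, 12.72341)   # Table 2, iterations 6-8
```

Note that the denominator is a difference of nearly equal changes, so the extrapolation can be numerically delicate when the iteration is slow.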

Relaxation Factors

Relaxation factors attempt to speed convergence. The success of relaxation depends upon the magnitude of the factor used. If the factor is too large, then changes between iterations may be too disruptive for the system and convergence may not be achieved any faster, if at all. If the factor is too small, then the increase in the rate of convergence may not be noticeable. If w is the relaxation factor, then the appropriate starting ratio for the (i+1) iterate is not λ_{i+1}, but

λ_{i+1} + w(λ_{i+1} - λ_i).

Usually a factor between 0 and 1 is used. Let w = 1 in the example; then in REML λ_1 = 14.68508 and λ_0 = 15, and therefore the adjusted starting ratio is 14.37016. Thus, one iteration was effectively skipped, and overall probably half the iterations would be needed. Care is needed in choosing a value of w because the best value could depend on the data structure and the observation values themselves. Thus, using the same w for all situations may not give the best results. Some test runs with different values of w may be needed.
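A sketch of the relaxation step, using the reconstructed update x_i + (1 + w)(x_{i+1} - x_i); with w = 1 this reproduces the 14.37016 quoted for the REML example.

```python
def relaxed_start(x_i, x_i1, w):
    """Over-relaxed starting value for the next round: x_i + (1 + w)*(x_i1 - x_i)."""
    return x_i + (1.0 + w) * (x_i1 - x_i)

new_start = relaxed_start(15.0, 14.68508, w=1.0)   # 14.37016 for the REML example
```

With w = 0 the update reduces to the ordinary iterate, and larger w extrapolates further in the direction of the last change.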

Covariances and General Models
L. R. Schaeffer, April 28, 1999

The linear model that has been assumed for variance component estimation to this point has variance-covariance matrices of all random factors of the form , where c is the variance to be estimated, and covariances between any two random factors were always null. With animal models and maternal genetic effects, these assumptions are not met, and some of the previous methods cannot be used. Methods applicable to more general models are MIVQUE, REML, ML, Method R and Bayesian methods.

Covariances Between Two Random Factors

Assume the following model and example problem where

and the data are as in the table below. The total sum of squares was . Also, , and letting and have the same number of elements in the same sequence, then

This covariance changes the structure of the variance-covariance matrix of to be

Depending on the method of estimation applied to , the quadratic forms will likely contain in their expectations. If then

Table: Example data for two random factors with covariance.
 y Level of b Level of Level of 145 1 1 2 84 1 2 4 83 1 3 4 75 1 4 5 122 1 1 5 100 2 2 3 172 2 3 2 102 2 4 2 158 2 1 4 156 2 2 5 84 2 3 5 133 3 4 3 139 3 1 3 173 3 2 3 117 3 3 4 187 3 4 2 150 3 1 2 160 3 2 5
The least squares equations, and , are

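The altered variance structure can be sketched in code. With u = [u1; u2] and Var(u) = G0 ⊗ I for a 2 x 2 matrix G0 holding the two variances and their covariance, the random block of the mixed model equations gets the residual variance times (G0 inverse ⊗ I) added to it. The incidence matrices and component values below are hypothetical, not those of the example.

```python
import numpy as np

def mme_with_covariance(X, Z1, Z2, g11, g22, g12, ve):
    """Coefficient matrix of the MME for y = Xb + Z1*u1 + Z2*u2 + e, cov(u1,u2) = g12*I."""
    q = Z1.shape[1]
    assert Z2.shape[1] == q              # same number of levels, same sequence
    Z = np.hstack([Z1, Z2])
    G0 = np.array([[g11, g12],
                   [g12, g22]])          # variances and covariance of the two factors
    # Random-effects block is Z'Z plus ve * (inverse of G0, expanded over levels)
    C = np.block([[X.T @ X, X.T @ Z],
                  [Z.T @ X, Z.T @ Z + ve * np.kron(np.linalg.inv(G0), np.eye(q))]])
    return C

# Hypothetical tiny example: 3 fixed levels, 5 levels for each random factor
rng = np.random.default_rng(0)
n, q = 18, 5
X = np.eye(3)[rng.integers(0, 3, n)]
Z1 = np.eye(q)[rng.integers(0, q, n)]
Z2 = np.eye(q)[rng.integers(0, q, n)]
C = mme_with_covariance(X, Z1, Z2, g11=4.0, g22=2.0, g12=1.5, ve=8.0)
```

When g12 = 0 the added term is block diagonal and the usual separate variance ratios are recovered.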
MIVQUE

Let

then the solutions to the mixed model equations become

The quadratic forms and expectations were

and

The estimates were

The correlation between the two random vectors was greater than one, indicating that unbiased procedures, in general, may not perform well in estimating the covariance between factors.

REML

REML is a biased procedure because the estimates are kept within the allowable parameter space. Hence, the estimated correlation between the two random vectors should be between -1 and +1. REML uses the same quadratic forms as MIVQUE, but also

The estimates were

The estimated correlation between the two random vectors with REML was .7462. Of course, the procedure must be iterated many more times before a final solution is obtained, and this could lead to a correlation that converges towards unity.

Descriptions of variance component estimation for maternal effects models and multiple-trait models are given in the course notes and will not be reproduced here.


Larry Schaeffer
1999-04-28