This LaTeX document is available as postscript or as Adobe PDF.

L. R. Schaeffer, March 1999

The statistical area of variance component estimation (VCE) has seen numerous changes, improvements, and advancements in the last 50 years. A complete history of the evolution of VCE methods would be very interesting, especially if it included comments about the discoverers of the methods and about how the various methods came into existence. Some of that history is included in the supplemental notes.

Because of this evolution, teaching VCE methods to animal breeders can be very cumbersome if all of the historical developments are covered in great detail. On one hand, an historical perspective is needed so that history does not repeat itself, and it provides a good overview of methods that do not work correctly in animal breeding situations. On the other hand, a complete historical coverage of VCE methods would require an entire course of its own, and in the end only a couple of methods would be of immediate importance to a future researcher. Thus, in this course only the simple basics of VCE will be reviewed, and the details will be given for two methods: REML (restricted maximum likelihood) and the Bayesian approach using Gibbs sampling as the computational tool for obtaining Bayesian estimates. Another current method, known as Method R, will not be covered and is, in general, not advocated for use in animal breeding (my opinion, at least), because Flavio Schenkel has shown that Method R can give very biased estimates of variances when relationship matrices are not complete and selection has been practiced (which is the usual situation in animal breeding).

**Quadratic Forms**

By definition, variances are positive, non-zero quantities. Variances are used to derive heritabilities, repeatabilities, prediction error variances or reliabilities of genetic evaluations. They assist in design of experiments to determine the necessary sample size to detect significant differences. They are useful in predicting expected genetic change. Thus, in order to evaluate livestock, animal breeders must firstly estimate the genetic variances and covariances to be used.

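Variances are commonly estimated from *quadratic forms*, which
are simply weighted sums of squares of the observations. The different
methods of VCE that exist merely define how those quadratic forms
are to be calculated and what to do with them after you have
calculated them.
A quadratic form is a scalar
quantity of the form $\mathbf{y}'\mathbf{Q}\mathbf{y}$,
where **Q** is assumed to be
symmetric. If **V** = $Var(\mathbf{y})$
and $E(\mathbf{y}) = \mathbf{X}\mathbf{b}$,
then

$$E(\mathbf{y}'\mathbf{Q}\mathbf{y}) = tr(\mathbf{Q}\mathbf{V}) + \mathbf{b}'\mathbf{X}'\mathbf{Q}\mathbf{X}\mathbf{b}.$$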
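As a quick numerical check of the expectation $E(\mathbf{y}'\mathbf{Q}\mathbf{y}) = tr(\mathbf{Q}\mathbf{V}) + \mathbf{b}'\mathbf{X}'\mathbf{Q}\mathbf{X}\mathbf{b}$, the following Python sketch computes a quadratic form and its expected value. All numbers (`y`, `Q`, `V`, and the mean vector `mu`, which stands for $\mathbf{X}\mathbf{b}$) are made up for illustration and are not taken from these notes.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def trace(a):
    """Sum of the diagonal elements of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def quad_form(y, q):
    """Return the scalar y'Qy."""
    n = len(y)
    return sum(y[i] * q[i][j] * y[j] for i in range(n) for j in range(n))


def expected_quad_form(q, v, mu):
    """E(y'Qy) = tr(QV) + mu'Q mu, where mu = Xb is the mean of y."""
    return trace(matmul(q, v)) + quad_form(mu, q)


# Hypothetical values, chosen only to illustrate the calculation.
y = [2.0, 5.0, 3.0]
Q = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]   # symmetric weights
V = [[3.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 3.0]]       # assumed Var(y)
mu = [1.0, 1.0, 1.0]                                          # assumed Xb

w = quad_form(y, Q)                        # observed value of the quadratic form
expected_w = expected_quad_form(Q, V, mu)  # its expectation under the assumptions

print(w, expected_w)
```

The observed value and the expectation generally differ for a single data vector; the expectation is what the form would average to over repeated samples.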

- If $\mathbf{y}$ is not null, then $\mathbf{y}'\mathbf{Q}\mathbf{y}$ is greater than zero for all $\mathbf{y}$ if $\mathbf{Q}$ is positive definite (pd). A matrix is positive definite if all of its eigenvalues are greater than zero.
- If $\mathbf{y}'\mathbf{Q}\mathbf{y}$ is greater than or equal to zero for all $\mathbf{y}$, then $\mathbf{Q}$ is positive semi-definite (psd). A matrix is psd if all of its eigenvalues are greater than or equal to zero.
- Non-negative definite (nnd) matrices include all pd and all psd matrices.
- All other symmetric matrices have at least one negative eigenvalue, and are either negative definite, negative semi-definite, or indefinite.
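- A quadratic form will have a Chi-squared distribution if $\mathbf{y}$ is normally distributed and if **QVQV** = **QV**, i.e. **QV** must be idempotent.
- Two quadratic forms, $\mathbf{y}'\mathbf{Q}\mathbf{y}$ and $\mathbf{y}'\mathbf{P}\mathbf{y}$, are independent if their covariance is zero and if **y** is normally distributed. The covariance is

  $$Cov(\mathbf{y}'\mathbf{Q}\mathbf{y}, \mathbf{y}'\mathbf{P}\mathbf{y}) = 2\,tr(\mathbf{Q}\mathbf{V}\mathbf{P}\mathbf{V}) + 4\,\mathbf{b}'\mathbf{X}'\mathbf{Q}\mathbf{V}\mathbf{P}\mathbf{X}\mathbf{b},$$

  which would be zero if **QVP** = **0**.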
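The idempotency and independence conditions above can be checked on a familiar case. The sketch below assumes three observations with $\mathbf{V} = \mathbf{I}$ (an assumption made only for this illustration), takes `Q` as the centering matrix, so that $\mathbf{y}'\mathbf{Q}\mathbf{y}$ is the corrected sum of squares, and `P` as the averaging matrix, so that $\mathbf{y}'\mathbf{P}\mathbf{y}$ is the correction for the mean. Exact fractions are used to avoid rounding problems in the equality checks.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def trace(a):
    """Sum of the diagonal elements of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


n = 3
identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
P = [[Fraction(1, n)] * n for _ in range(n)]                            # J/n
Q = [[identity[i][j] - P[i][j] for j in range(n)] for i in range(n)]    # I - J/n
V = identity                                                            # assumed Var(y) = I

# Chi-squared condition: QVQV = QV, i.e. QV is idempotent.
QV = matmul(Q, V)
qv_is_idempotent = matmul(QV, QV) == QV
degrees_of_freedom = trace(QV)   # rank of an idempotent matrix equals its trace

# Independence condition: QVP = 0 makes both covariance terms vanish.
QVP = matmul(QV, P)
qvp_is_null = all(x == 0 for row in QVP for x in row)
covariance = 2 * trace(matmul(QV, matmul(P, V)))   # 2 tr(QVPV)

print(qv_is_idempotent, degrees_of_freedom, qvp_is_null, covariance)
```

Under these assumptions the corrected sum of squares has a Chi-squared distribution with two degrees of freedom and is independent of the correction for the mean, which is the usual textbook result.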

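Consider the linear model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e}$. Partition $\mathbf{u}$ into $s$ factors,

$$\mathbf{u}' = (\mathbf{u}_1' \;\; \mathbf{u}_2' \;\; \cdots \;\; \mathbf{u}_s').$$

Define

$$\mathbf{G} = Var(\mathbf{u}) \quad \mbox{and} \quad \mathbf{R} = Var(\mathbf{e}).$$

Then

$$\mathbf{V} = Var(\mathbf{y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R},$$

and if $\mathbf{Z}$ is partitioned corresponding to $\mathbf{u}$, as $\mathbf{Z} = [\, \mathbf{Z}_1 \;\; \mathbf{Z}_2 \;\; \cdots \;\; \mathbf{Z}_s \,]$, with $Var(\mathbf{u}_i) = \mathbf{I}\sigma_i^2$ and $\mathbf{R} = \mathbf{I}\sigma_0^2$, then

$$\mathbf{V} = \sum_{i=1}^{s} \mathbf{Z}_i\mathbf{Z}_i'\sigma_i^2 + \mathbf{I}\sigma_0^2 = \sum_{i=0}^{s} \mathbf{V}_i\sigma_i^2,$$

where $\mathbf{V}_0 = \mathbf{I}$ and $\mathbf{V}_i = \mathbf{Z}_i\mathbf{Z}_i'$.

This simple model is characterized by the assumed structures of $\mathbf{G}$ and $\mathbf{R}$, and hence the resulting simplification of $\mathbf{V}$. The vector $\mathbf{u}$ is assumed to be random and not influenced by selection on any of the random variables of the model.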

**Variance of Quadratic Forms**

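The variance of a quadratic form is given by

$$Var(\mathbf{y}'\mathbf{Q}\mathbf{y}) = 2\,tr(\mathbf{Q}\mathbf{V}\mathbf{Q}\mathbf{V}) + 4\,\mathbf{b}'\mathbf{X}'\mathbf{Q}\mathbf{V}\mathbf{Q}\mathbf{X}\mathbf{b}.$$

Only translation invariant quadratic forms are typically considered in variance component estimation, which means $\mathbf{Q}\mathbf{X} = \mathbf{0}$ and hence $\mathbf{b}'\mathbf{X}'\mathbf{Q}\mathbf{V}\mathbf{Q}\mathbf{X}\mathbf{b} = 0$. Thus, only $2\,tr(\mathbf{Q}\mathbf{V}\mathbf{Q}\mathbf{V})$ needs to be calculated. Remember that $\mathbf{V}$ can be written as a sum over the variance components, $\mathbf{V} = \sum_i \mathbf{V}_i\sigma_i^2$, where each $\mathbf{V}_i$ is a known matrix.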

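For example, if $\mathbf{V} = \mathbf{V}_0\sigma_0^2 + \mathbf{V}_1\sigma_1^2$, then

$$2\,tr(\mathbf{Q}\mathbf{V}\mathbf{Q}\mathbf{V}) = 2\,tr(\mathbf{Q}\mathbf{V}_0\mathbf{Q}\mathbf{V}_0)\sigma_0^4 + 4\,tr(\mathbf{Q}\mathbf{V}_0\mathbf{Q}\mathbf{V}_1)\sigma_0^2\sigma_1^2 + 2\,tr(\mathbf{Q}\mathbf{V}_1\mathbf{Q}\mathbf{V}_1)\sigma_1^4.$$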

The exact sampling variances require the true, unknown components of variance. One can also note that the magnitude of the sampling variances depends on

1. the magnitude of the individual components,
2. the matrix **Q**, which depends on the method of estimation and the model, and
3. the structure and amount of the data through **X** and **Z**.
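The dependence of the sampling variance on the components can be made concrete with a small sketch. Assuming a translation invariant form (so the variance is $2\,tr(\mathbf{Q}\mathbf{V}\mathbf{Q}\mathbf{V})$) and a $\mathbf{V}$ that is a weighted sum of known matrices, the code below computes the variance directly and again as a sum over products of components. The matrices `V0`, `V1`, `Q` and the component values in `sigmas` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def trace(a):
    """Sum of the diagonal elements of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def var_direct(q, v):
    """2 tr(QVQV): variance of y'Qy when QX = 0 and y is normal."""
    qv = matmul(q, v)
    return 2 * trace(matmul(qv, qv))


def var_by_components(q, v_parts, sigmas):
    """Sum over i and j of 2 tr(Q V_i Q V_j) * sigma_i^2 * sigma_j^2."""
    total = 0
    for v_i, s_i in zip(v_parts, sigmas):
        qv_i = matmul(q, v_i)
        for v_j, s_j in zip(v_parts, sigmas):
            total += 2 * trace(matmul(qv_i, matmul(q, v_j))) * s_i * s_j
    return total


# Hypothetical inputs: three observations, residual plus one random factor.
V0 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]         # residual structure (identity)
V1 = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]         # ZZ' with records 1 and 2 in one group
Q = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]    # rows sum to zero, so Q1 = 0
sigmas = [4, 1]                                # assumed true variance components

V = [[sigmas[0] * V0[i][j] + sigmas[1] * V1[i][j] for j in range(3)] for i in range(3)]

direct = var_direct(Q, V)
expanded = var_by_components(Q, [V0, V1], sigmas)
print(direct, expanded)
```

Both routes give the same number; the expanded route shows how each product of components contributes through its own trace coefficient.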

**Small Example**

The best way to describe unbiased methods of estimation is to
give a small example with only three observations. Let

Then

and

and .

In this example, there are 3 unknown variances to be estimated,
and consequently, at least three quadratic forms are needed in
order to estimate the variances.
The **Q**-matrices are the 'weights' of the observations in the
quadratic forms. These matrices differ depending on the method of
estimation that is chosen. Below are three arbitrary **Q**-matrices
that were chosen such that $\mathbf{Q}_k\mathbf{X} = \mathbf{0}$.
They do
not necessarily correspond to any known method of estimation, but
are for illustration of the calculations.
Let

The values of the quadratic forms are

For example,

The expectations of the quadratic forms are

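Unbiased methods of estimation require that the values of the
quadratic forms be equated to their corresponding expectations,
which gives a system of equations to be solved, such as
$\mathbf{F}\boldsymbol{\sigma} = \mathbf{w}$, where $\mathbf{w}$ contains the values of the quadratic forms, $\boldsymbol{\sigma}$ the unknown variances, and $\mathbf{F}$ the coefficients of the variances in the expectations.
In this case, the equations
would be

which gives the solution as $\hat{\boldsymbol{\sigma}} = \mathbf{F}^{-1}\mathbf{w}$, or

Normally, the variance-covariance matrix of the estimates, commonly
known as the sampling variances of the estimates, was never
actually computed during the days of unbiased methods due to its
computational complexity. Even with today's computers the
calculation can still be very challenging and is usually impossible.
For small examples, the calculations can be easily demonstrated.
In this case,

$$Var(\hat{\boldsymbol{\sigma}}) = \mathbf{F}^{-1}\, Var(\mathbf{w})\, \mathbf{F}^{-1\prime},$$

a function of the variance-covariance matrix of the quadratic forms. Note that $Var(\mathbf{w})$ is a 3x3 matrix in this example. The (1,1) element is the variance of $w_1$, which is

$$Var(w_1) = 2\,tr(\mathbf{Q}_1\mathbf{V}\mathbf{Q}_1\mathbf{V}).$$

The (1,2) element is the covariance between the first and second quadratic forms,

$$Cov(w_1, w_2) = 2\,tr(\mathbf{Q}_1\mathbf{V}\mathbf{Q}_2\mathbf{V}),$$

and similarly for the other terms. All of the results are summarized in the table below.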

| Forms | | | | | | |
|---|---|---|---|---|---|---|
| Var(w_{1}) | 20 | 16 | 16 | 8 | 0 | 8 |
| Cov(w_{1},w_{2}) | 14 | 24 | 8 | 16 | 0 | 8 |
| Cov(w_{1},w_{3}) | 24 | 24 | 24 | 16 | 0 | 16 |
| Var(w_{2}) | 20 | 48 | 16 | 32 | 16 | 8 |
| Cov(w_{2},w_{3}) | 24 | 48 | 24 | 32 | 16 | 16 |
| Var(w_{3}) | 36 | 48 | 48 | 32 | 16 | 32 |
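The whole procedure (compute the quadratic forms, equate them to their expectations, solve, then derive the sampling variances) can be sketched in a few lines of Python. Everything below is hypothetical: the data `y`, the matrices in `V_parts` and `Q_forms`, and the assumed true components are made up and are not the values of the example above, and the names `F`, `sigma_hat`, and `var_sigma` are illustrative. The sketch assumes $\mathbf{Q}_k\mathbf{X} = \mathbf{0}$ for a model with only an overall mean, so the coefficient of component $i$ in the expectation of form $k$ is $tr(\mathbf{Q}_k\mathbf{V}_i)$.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def trace(a):
    """Sum of the diagonal elements of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def quad_form(y, q):
    """Return the scalar y'Qy."""
    n = len(y)
    return sum(y[i] * q[i][j] * y[j] for i in range(n) for j in range(n))


def inverse(a):
    """Invert a small nonsingular matrix by Gauss-Jordan elimination (exact)."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [x / piv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical data and model pieces (three records, overall mean only).
y = [2, 5, 3]
V_parts = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],   # V_0: residual
    [[1, 1, 0], [1, 1, 0], [0, 0, 1]],   # V_1: records 1 and 2 share a level
    [[1, 0, 0], [0, 1, 1], [0, 1, 1]],   # V_2: records 2 and 3 share a level
]
Q_forms = [                               # each Q has zero row sums, so QX = 0
    [[1, -1, 0], [-1, 1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 1, -1], [0, -1, 1]],
    [[0, 1, -1], [1, -2, 1], [-1, 1, 0]],
]

# Values of the quadratic forms and coefficients of their expectations.
w = [quad_form(y, q) for q in Q_forms]
F = [[trace(matmul(q, v)) for v in V_parts] for q in Q_forms]

# Solve F sigma = w.
F_inv = inverse(F)
sigma_hat = [sum(F_inv[i][k] * w[k] for k in range(3)) for i in range(3)]

# Sampling variances need the true components; these are assumed values.
true_sigma = [6, 2, 1]
V = [[sum(s * v[i][j] for s, v in zip(true_sigma, V_parts)) for j in range(3)]
     for i in range(3)]
var_w = [[2 * trace(matmul(matmul(qk, V), matmul(ql, V))) for ql in Q_forms]
         for qk in Q_forms]
F_inv_t = [list(col) for col in zip(*F_inv)]
var_sigma = matmul(matmul(F_inv, var_w), F_inv_t)

print(w, sigma_hat)
```

With these made-up numbers two of the three estimates come out negative, which illustrates how unbiased quadratic estimators can fall outside the parameter space with very little data.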

To get numeric values for these variances, the true components need
to be known. Assume that the true values are
,
,
and
,
then the variance
of *w*_{1} is

The complete variance-covariance matrix of the quadratic forms is

The variance-covariance matrix of the estimated variances (assuming the above true values) would be

**Variance of Heritability**

Often estimates of ratios of functions of the variances are needed
for animal breeding work, such as heritabilities, repeatabilities,
and variance ratios. Let such a ratio be denoted as *a*/*c* where

and

(NOTE: the negative estimate for was set to zero before calculating

From Osborne and Patterson (1952) and Rao (1968), an approximation to
the variance of the ratio is given by

$$Var(a/c) \approx \frac{c^2\,Var(a) - 2ac\,Cov(a,c) + a^2\,Var(c)}{c^4}.$$

Now note that

Then

This result is very large, but could be expected from only 3 observations. Thus, the estimated ratio has a standard deviation of 1.5933.
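The first-order approximation to the variance of a ratio, $Var(a/c) \approx [c^2 Var(a) - 2ac\,Cov(a,c) + a^2 Var(c)]/c^4$, is easy to code as a small helper. The function name `ratio_variance` and all input values below are hypothetical; they are not the numbers from the example above.

```python
import math


def ratio_variance(a, c, var_a, var_c, cov_ac):
    """First-order (Taylor series) approximation to Var(a/c)."""
    return (c ** 2 * var_a - 2 * a * c * cov_ac + a ** 2 * var_c) / c ** 4


# Hypothetical estimates and sampling (co)variances of numerator and denominator.
a, c = 2.0, 10.0
var_a, var_c, cov_ac = 4.0, 9.0, 3.0

var_ratio = ratio_variance(a, c, var_a, var_c, cov_ac)
sd_ratio = math.sqrt(var_ratio)
print(a / c, var_ratio, sd_ratio)
```

A positive covariance between numerator and denominator reduces the variance of the ratio, which is common when the numerator is itself part of the denominator.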

Another approximation method assumes that the denominator has been
estimated fairly accurately, so that it is considered to be a
constant. Then,

$$Var(a/c) \approx Var(a)/c^2.$$

For the example problem, this gives

which is slightly larger than the previous approximation. The second approximation would not be suitable for a ratio of the residual variance to the variance of one of the other components. Suppose , and , then , and

with the first method, and

with the second method. The first method is probably more realistic in this situation, but both are very large.
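The two approximations can be compared side by side. In the sketch below the first method is the first-order expansion using the variances and covariance of both $a$ and $c$, and the second treats the denominator as a constant. The function names and input values are hypothetical and chosen only to show the mechanics.

```python
def ratio_variance_full(a, c, var_a, var_c, cov_ac):
    """First method: uses the sampling variation of numerator and denominator."""
    return (c ** 2 * var_a - 2 * a * c * cov_ac + a ** 2 * var_c) / c ** 4


def ratio_variance_fixed_denominator(c, var_a):
    """Second method: treats the denominator c as a known constant."""
    return var_a / c ** 2


# Hypothetical estimates and sampling (co)variances.
a, c = 2.0, 10.0
var_a, var_c, cov_ac = 4.0, 9.0, 3.0

first = ratio_variance_full(a, c, var_a, var_c, cov_ac)
second = ratio_variance_fixed_denominator(c, var_a)

# The two differ by (a^2 Var(c) - 2ac Cov(a,c)) / c^4.
difference = first - second
print(first, second, difference)
```

The second method is larger whenever $2ac\,Cov(a,c)$ exceeds $a^2 Var(c)$, as it does with these inputs; with a negative covariance the ordering reverses.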

**Properties of Estimators**

Methods of estimation of variance components differ in the properties of their estimates. Due to the complexity of estimating non-zero scalar quantities from quadratic forms, there is no method that can possibly include all of the desirable properties that animal breeders would like to have. Below is a description of the properties that animal breeders would like to see in a VCE method.

1. **Translation Invariance.** A necessary and sufficient condition for a quadratic form, $\mathbf{y}'\mathbf{Q}\mathbf{y}$, to be translation invariant is that $\mathbf{Q}\mathbf{X} = \mathbf{0}$, and then $E(\mathbf{y}'\mathbf{Q}\mathbf{y}) = tr(\mathbf{Q}\mathbf{V})$. The function of the fixed effects, $\mathbf{b}'\mathbf{X}'\mathbf{Q}\mathbf{X}\mathbf{b}$, disappears from the expectation of $\mathbf{y}'\mathbf{Q}\mathbf{y}$, and the fixed effects have no influence on the quadratic form.
2. **Within The Parameter Space.** Because components of variance are by definition positive, estimators that yield positive estimates are desirable. With multiple trait models, the estimated variance-covariance matrices should be positive definite. In general, variances and covariances should be within the allowable parameter space. Using estimates that are outside of the parameter space can cause problems in evaluating animals, and thus could reduce expected genetic gains.
3. **Nearly Unbiased.** Unbiased methods of estimation were initially used by statisticians, but it was widely recognized that the estimates could fall outside of the permissible parameter space. Thus, it was necessary to relax the requirement of unbiasedness. The concept now is that bias is permitted, but as the number of observations increases toward infinity, the estimator should become *nearly unbiased*, with its expected value approaching the true value.
4. **Minimum Mean Squared Error.** The error of estimation is the difference between the estimate and the true value. Squared errors averaged over repeated conceptual samples from the same population, of the same size and structure, give the mean squared error (MSE). A method of estimation that provides estimates with minimum MSE is said to be *best*. Biased estimators may have smaller MSE than unbiased estimators.
5. **Freedom From Selection Bias.** Selection exists in many forms. In animal breeding the most serious form of 'selection' is that we do not have complete pedigrees on all animals back to the same base generation. Hence we cannot account for the selection of parents which gave rise to a particular individual. Because of a lack of adequate pedigree information, estimates of variance components can be severely biased. Selection also determines which individuals survive to produce more records or give additional offspring. Selection reduces genetic variation (i.e. the Bulmer effect), and usually estimates of the variances prior to selection are required. All methods of VCE seem to be affected by selection bias, but some methods more than others.
6. **Computational Feasibility.** Methods vary greatly in their computing demands. Complicated models also lead to high computing costs even for some simple methods. Animal models are computing time intensive because data files in animal breeding are generally large. Data may need to be limited or randomly sampled from the entire data file in order to estimate genetic parameters. Computers, however, are becoming larger in terms of random access memory (RAM), and are becoming faster at computations. Factors that were very limiting 5 years ago are no longer limiting. As computers improve, animal breeders try to estimate more parameters from more data and from more complex linear models.

Generally, researchers tend to use software that is readily available to them. Sometimes the software does not cover the precise model that the researcher would like to employ, or there may be a limit to the amount of data that can be included. I have found it best to write specific software for a specific model and dataset, but not all researchers have the time or experience to do this. Also, when writing one's own software there is always the possibility of errors in the code, which may or may not be detected.
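The statement under Minimum Mean Squared Error, that a biased estimator may have smaller MSE than an unbiased one, can be illustrated with a small simulation. The sketch below is not a VCE method from these notes; it uses the simplest possible case, a single variance estimated from a normal sample, and compares the unbiased divisor $n-1$ with the biased divisor $n+1$. The sample size, number of replicates, and seed are arbitrary choices.

```python
import random


def simulate_mse(n, true_var, divisor, replicates, seed):
    """Average squared error of SS/divisor as an estimator of true_var."""
    rng = random.Random(seed)
    sd = true_var ** 0.5
    total = 0.0
    for _ in range(replicates):
        sample = [rng.gauss(0.0, sd) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)   # corrected sum of squares
        estimate = ss / divisor
        total += (estimate - true_var) ** 2
    return total / replicates


n = 5
true_var = 1.0
replicates = 20000

# The same seed is used so both estimators see identical samples.
mse_unbiased = simulate_mse(n, true_var, n - 1, replicates, seed=1)
mse_biased = simulate_mse(n, true_var, n + 1, replicates, seed=1)

print(mse_unbiased, mse_biased)
```

For normal data the theoretical values are $2\sigma^4/(n-1)$ and $2\sigma^4/(n+1)$, so the biased estimator should show the smaller MSE, at the price of underestimating the variance on average.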
