
This LaTeX document is available as postscript or as Adobe PDF.

Basic Variance Component Estimation
L. R. Schaeffer, March 1999

The statistical area of methods of variance component estimation (VCE) has seen numerous changes, improvements, and advancements in the last 50 years. A complete history of the evolution of VCE methods would be very interesting, especially if it included comments about the discoverers of the methods and about how the various methods came into existence. Some of that history is included in the supplemental notes.

Because of this evolution of methods, teaching VCE to animal breeders can be very cumbersome if all of the historical developments are covered in great detail. On one hand, an historical perspective is needed so that history does not repeat itself, and it provides a good overview of methods that do not work correctly in animal breeding situations. On the other hand, a complete historical coverage of VCE methods would require an entire course of its own, and in the end only a couple of methods would be of immediate importance to a future researcher. Thus, in this course only the simple basics of VCE will be reviewed, and the details of two methods will be given: REML (restricted maximum likelihood) and the Bayesian approach, using Gibbs sampling as the computational tool for obtaining Bayesian estimates. Another current method, known as Method R, will not be covered and is, in general, not advocated for use in animal breeding (my opinion, at least), because Flavio Schenkel has shown that Method R can give very biased estimates of variances when relationship matrices are not complete and selection has been practiced (which is the usual situation in animal breeding).

Quadratic Forms

By definition, variances are positive, non-zero quantities. Variances are used to derive heritabilities, repeatabilities, and prediction error variances or reliabilities of genetic evaluations. They assist in the design of experiments to determine the sample size necessary to detect significant differences. They are useful in predicting expected genetic change. Thus, in order to evaluate livestock, animal breeders must first estimate the genetic variances and covariances to be used.

Variances are commonly estimated from quadratic forms, which are simply weighted sums of squares of the observations. The different methods of VCE that exist merely define how those quadratic forms are to be calculated and what to do with them after you have calculated them. A quadratic form is a scalar quantity of the form ${\bf y}'{\bf Qy}$, where ${\bf Q}$ is assumed to be symmetric. If ${\bf V} = Var({\bf y})$ and $E({\bf y}) = {\bf Xb}$, then

\begin{displaymath}E({\bf y}'{\bf Qy}) = tr {\bf QV} +
{\bf b}'{\bf X}'{\bf QXb} \end{displaymath}

\begin{displaymath}Var({\bf y}'{\bf Qy}) = 2 tr {\bf QVQV} +
4{\bf b}'{\bf X}'{\bf QVQXb} \end{displaymath}
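These two moments can be checked numerically. The following is a minimal sketch (not from the original notes): the helper functions are my own, and the small matrices ${\bf Q}$, ${\bf V}$, ${\bf X}$, and ${\bf b}$ are hypothetical choices made only for illustration.

```python
# A minimal sketch (pure Python) of the two moments of a quadratic form y'Qy,
# where E(y) = Xb and Var(y) = V:
#   E(y'Qy)   = tr(QV)   + b'X'QXb
#   Var(y'Qy) = 2tr(QVQV) + 4 b'X'QVQXb
# The matrices below are hypothetical, chosen only for illustration.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def quadratic_form_moments(Q, V, X, b):
    """Return (E(y'Qy), Var(y'Qy)); X and b are column-matrix shaped."""
    QV = matmul(Q, V)
    Xb = matmul(X, b)
    QXb = matmul(Q, Xb)
    bXQXb = sum(Xb[i][0] * QXb[i][0] for i in range(len(Xb)))
    QVQXb = matmul(QV, QXb)
    bXQVQXb = sum(Xb[i][0] * QVQXb[i][0] for i in range(len(Xb)))
    return (trace(QV) + bXQXb,
            2 * trace(matmul(QV, QV)) + 4 * bXQVQXb)

# Two observations, one overall mean; here QX = 0, so the b terms vanish.
Q = [[1, -1], [-1, 1]]
V = [[5, 1], [1, 5]]
X = [[1], [1]]
b = [[10]]
print(quadratic_form_moments(Q, V, X, b))   # -> (8, 128)
```

With this translation invariant ${\bf Q}$ the expectation reduces to $tr{\bf QV} = 8$ and the variance to $2tr{\bf QVQV} = 128$.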

The early methods of VCE concentrated on unbiased estimation, as a carryover from unbiased estimation of linear functions of ${\bf y}$. Prior to 1970, a very simplistic linear model was used for purposes of deriving estimates of variance components. This is an easy model for presenting some basic concepts about quadratic forms, their expectations, and their sampling variances. Let

\begin{eqnarray*}{\bf y} & = & {\bf Xb} \ + \ {\bf Zu} \ + \ {\bf e}, \\
E({\bf u}) & = & {\bf0}, \\
\mbox{ and} \ \ E({\bf e}) & = & {\bf0}.
\end{eqnarray*}
Partition ${\bf u}$ into $s$ factors as

\begin{displaymath}{\bf u}' = ( {\bf u}'_{1} \ \ {\bf u}'_{2} \ \ldots \
{\bf u}'_{s} \ ). \end{displaymath}


\begin{displaymath}{\bf G} = Var({\bf u}) = Var \left( \begin{array}{c}
{\bf u}_{1} \\ {\bf u}_{2} \\ \vdots \\ {\bf u}_{s}
\end{array} \right) = \left( \begin{array}{cccc}
{\bf I}{\sigma}_{1}^{2} & {\bf0} & \ldots & {\bf0} \\
{\bf0} & {\bf I}{\sigma}_{2}^{2} & \ldots & {\bf0} \\
\vdots & \vdots & \ddots & \vdots \\
{\bf0} & {\bf0} & \ldots &{\bf I}{\sigma}_{s}^{2} \\
\end{array} \right) \end{displaymath}

and ${\bf R} = Var({\bf e}) = {\bf I} \sigma_{0}^{2}$. Then

\begin{displaymath}Var({\bf y}) = {\bf V} = {\bf ZGZ}' + {\bf R}, \end{displaymath}

and if ${\bf Z}$ is partitioned corresponding to ${\bf u}$, as

\begin{eqnarray*}{\bf Z} & = & [ \ {\bf Z}_{1} \ \ {\bf Z}_{2} \ \ldots \ {\bf Z}_{s} \ ], \ \mbox{then} \\
{\bf V} & = & \sum_{i=1}^{s} {\bf Z}_{i}{\bf Z}'_{i}\sigma_{i}^{2} \ + \ {\bf I}\sigma_{0}^{2} \\
& = & \sum_{i=0}^{s} {\bf V}_{i} \sigma_{i}^{2},
\end{eqnarray*}

where ${\bf V}_{0} = {\bf I}$ and ${\bf V}_{i} = {\bf Z}_{i}{\bf Z}'_{i}$ for $i = 1, \ldots, s$.
This simple model is characterized by the assumed structures of ${\bf G}$ and ${\bf R}$, and hence the resulting simplification of ${\bf V}$. The vector ${\bf y}$ is assumed to be random and not influenced by selection on any of the random variables of the model.
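The structure of ${\bf V}$ can be assembled directly from the incidence matrices. A sketch follows; the function names are my own, and the ${\bf Z}$ matrices and components are chosen to agree with the three-observation example used later in these notes.

```python
# Sketch: assemble V = ZGZ' + R = sum_i V_i sigma_i^2 for the simple model,
# with V_0 = I and V_i = Z_i Z_i'.  The Z matrices and components below are
# chosen to agree with the three-observation example in these notes.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def build_V(Z_list, sigma2):
    """sigma2[0] is the residual variance; sigma2[i] pairs with Z_list[i-1]."""
    n = len(Z_list[0])
    # start with V_0 * sigma_0^2 = I * residual variance
    V = [[sigma2[0] if i == j else 0.0 for j in range(n)] for i in range(n)]
    for Z, s2 in zip(Z_list, sigma2[1:]):
        ZZt = matmul(Z, transpose(Z))          # V_i = Z_i Z_i'
        for i in range(n):
            for j in range(n):
                V[i][j] += s2 * ZZt[i][j]
    return V

Z1 = [[1, 0], [1, 0], [0, 1]]    # two levels of the first random factor
Z2 = [[1, 0], [0, 1], [0, 1]]    # two levels of the second random factor
V = build_V([Z1, Z2], [250.0, 10.0, 80.0])
print(V)   # -> [[340.0, 10.0, 0.0], [10.0, 340.0, 80.0], [0.0, 80.0, 340.0]]
```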

Variance of Quadratic Forms

The variance of a quadratic form is given by

\begin{displaymath}Var({\bf y}'{\bf Qy}) = 2tr{\bf QVQV} +
4{\bf b}'{\bf X}'{\bf QVQXb}. \end{displaymath}

Only translation invariant quadratic forms are typically considered in variance component estimation, which means that ${\bf b}'{\bf X}'{\bf QVQXb} = 0$. Thus, only $2tr{\bf QVQV}$ needs to be calculated. Remember that ${\bf V}$ can be written as the sum of $s+1$ matrices, ${\bf V}_{i} \sigma_{i}^{2}$, then

\begin{eqnarray*}tr{\bf QVQV} & = & tr \ {\bf Q} \sum_{i=0}^{s}
{\bf V}_{i} \sigma_{i}^{2} \ {\bf Q} \sum_{j=0}^{s}
{\bf V}_{j} \sigma_{j}^{2} \\
& = & \sum_{i=0}^{s} \sum_{j=0}^{s} tr \ {\bf QV}_{i}{\bf QV}_{j} \
{\sigma}_{i}^{2} \ {\sigma}_{j}^{2}.
\end{eqnarray*}

For example, if s = 2, then

\begin{eqnarray*}tr{\bf QVQV} & = & tr{\bf QV}_{0}{\bf QV}_{0}
\sigma_{0}^{4} \ + \ 2 \ tr{\bf QV}_{0}{\bf QV}_{1} \sigma_{0}^{2}\sigma_{1}^{2} \\
& & + \ 2 \ tr{\bf QV}_{0}{\bf QV}_{2} \sigma_{0}^{2}\sigma_{2}^{2}
\ + \ tr{\bf QV}_{1}{\bf QV}_{1} \sigma_{1}^{4} \\
& & + \ 2 \ tr{\bf QV}_{1}{\bf QV}_{2} \sigma_{1}^{2}\sigma_{2}^{2}
\ + \ tr{\bf QV}_{2}{\bf QV}_{2} \sigma_{2}^{4}.
\end{eqnarray*}

The exact sampling variances require the true, unknown components of variance. One can also note that the magnitude of the sampling variances depends on:
1. the magnitude of the individual components,
2. the matrix ${\bf Q}$, which depends on the method of estimation and the model, and
3. the structure and amount of the data through ${\bf X}$ and ${\bf Z}$.
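As a numeric check on the trace expansion, the sketch below evaluates $2tr{\bf QVQV}$ both directly and through the double sum. The helper functions are my own; the matrices are those of the three-observation example introduced next, using one of its ${\bf Q}$-matrices and the true components assumed later in these notes.

```python
# Sketch: the sampling variance 2 tr(QVQV) computed two ways -- directly from
# V = sum_i V_i sigma_i^2, and through the expansion
# sum_i sum_j tr(Q V_i Q V_j) sigma_i^2 sigma_j^2.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

V0 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
V1 = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
V2 = [[1, 0, 0], [0, 1, 1], [0, 1, 1]]
Q1 = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
Vs = [V0, V1, V2]
sig2 = [250.0, 10.0, 80.0]           # true components assumed later

# direct: build V, then 2 tr(QVQV)
V = [[sum(s * M[i][j] for s, M in zip(sig2, Vs)) for j in range(3)]
     for i in range(3)]
QV = matmul(Q1, V)
direct = 2 * trace(matmul(QV, QV))

# expansion: accumulate over all pairs (i, j)
expanded = 2 * sum(trace(matmul(matmul(Q1, Vs[i]), matmul(Q1, Vs[j])))
                   * sig2[i] * sig2[j]
                   for i in range(3) for j in range(3))
print(direct, expanded)   # -> 1662000.0 1662000.0
```

Both routes give the same number, which is the $Var(w_{1}) = 1,662,000$ obtained later in the worked example.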

Small Example

The best way to describe unbiased methods of estimation is to give a small example with only three observations. Let

\begin{eqnarray*}{\bf X} & = &
\left( \begin{array}{c}
1 \\ 1 \\ 1
\end{array} \right), \ \
{\bf Z}_{1} \ = \
\left( \begin{array}{cc}
1 & 0 \\ 1 & 0 \\ 0 & 1
\end{array} \right), \ \
{\bf Z}_{2} \ = \
\left( \begin{array}{cc}
1 & 0 \\ 0 & 1 \\ 0 & 1
\end{array} \right), \ \
{\bf y} \ = \
\left( \begin{array}{l}
29 \\ 53 \\ 44
\end{array} \right),
\end{eqnarray*}

\begin{displaymath}{\bf V}_{1} = {\bf Z}_{1}{\bf Z}'_{1} =
\left( \begin{array}{lll}
1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1
\end{array} \right), \end{displaymath}


\begin{displaymath}{\bf V}_{2} = {\bf Z}_{2}{\bf Z}'_{2} =
\left( \begin{array}{lll}
1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1
\end{array} \right), \end{displaymath}

and \( {\bf V}_{0} = {\bf I} \).

In this example, there are 3 unknown variances to be estimated, and consequently, at least three quadratic forms are needed in order to estimate the variances. The ${\bf Q}$-matrices are the 'weights' of the observations in the quadratic forms. These matrices differ depending on the method of estimation that is chosen. Below are three arbitrary ${\bf Q}$-matrices that were chosen such that ${\bf Q}_{k}{\bf X} = {\bf0}$. They do not necessarily correspond to any known method of estimation, but are for illustration of the calculations. Let

\begin{eqnarray*}{\bf Q}_{1} & = & \left( \begin{array}{rrr}
1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{array} \right), \ \
{\bf Q}_{2} \ = \ \left( \begin{array}{rrr}
1 & 0 & -1 \\ 0 & 1 & -1 \\ -1 & -1 & 2 \end{array} \right), \\
{\bf Q}_{3} & = & \left( \begin{array}{rrr}
2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{array} \right).
\end{eqnarray*}

The values of the quadratic forms are

\begin{eqnarray*}{\bf y}'{\bf Q}_{1}{\bf y} & = & 657, \\
{\bf y}'{\bf Q}_{2}{\bf y} & = & 306, \\
{\bf y}'{\bf Q}_{3}{\bf y} & = & 882.
\end{eqnarray*}

For example,

\begin{displaymath}{\bf y}'{\bf Q}_{1}{\bf y} = \left( \begin{array}{ccc}
29 & 53 & 44 \end{array} \right)
\left( \begin{array}{rrr}
1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{array} \right)
\left( \begin{array}{r} 29 \\ 53 \\ 44 \end{array} \right) = 657. \end{displaymath}
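All three values can be reproduced with a few lines of code. This is a sketch; the function name is mine, and the ${\bf Q}$-matrices are those used in this example.

```python
# Sketch: compute the three quadratic forms y'Q_k y of the example directly.
y = [29, 53, 44]
Q1 = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
Q2 = [[1, 0, -1], [0, 1, -1], [-1, -1, 2]]
Q3 = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]

def quad_form(y, Q):
    """y'Qy written as a double sum over the elements of Q."""
    n = len(y)
    return sum(y[i] * Q[i][j] * y[j] for i in range(n) for j in range(n))

print([quad_form(y, Q) for Q in (Q1, Q2, Q3)])   # -> [657, 306, 882]
```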

The expectations of the quadratic forms are

\begin{eqnarray*}E({\bf y}'{\bf Q}_{1}{\bf y})
& = & tr{\bf Q}_{1}{\bf V}_{0}\sigma_{0}^{2} \ + \
tr{\bf Q}_{1}{\bf V}_{1}\sigma_{1}^{2} \ + \
tr{\bf Q}_{1}{\bf V}_{2}\sigma_{2}^{2} \\
& = & 4 \sigma_{0}^{2} \ + \ 2 \sigma_{1}^{2} \ + \ 2 \sigma_{2}^{2}, \\
E({\bf y}'{\bf Q}_{2}{\bf y}) & = & 4 \sigma_{0}^{2} \ + \
4 \sigma_{1}^{2} \ + \ 2 \sigma_{2}^{2}, \\
E({\bf y}'{\bf Q}_{3}{\bf y}) & = & 6 \sigma_{0}^{2} \ + \
4 \sigma_{1}^{2} \ + \ 4 \sigma_{2}^{2} .
\end{eqnarray*}

Unbiased methods of estimation require that the values of the quadratic forms be equated to their corresponding expectations, which gives a system of equations to be solved, such as \( {\bf F \sigma} = {\bf w} \). In this case, the equations would be

\begin{displaymath}\left( \begin{array}{rrr}
4 & 2 & 2 \\ 4 & 4 & 2 \\ 6 & 4 & 4 \end{array} \right)
\left( \begin{array}{c} \sigma_{0}^{2} \\ \sigma_{1}^{2} \\
\sigma_{2}^{2} \end{array} \right)
= \left( \begin{array}{r}
657 \\ 306 \\ 882 \end{array} \right), \end{displaymath}

which gives the solution as \( \hat{ {\bf\sigma}} = {\bf F}^{-1}{\bf w} \), or

\begin{displaymath}\left( \begin{array}{c} \hat{\sigma}_{0}^{2} \\
\hat{\sigma}_{1}^{2} \\ \hat{\sigma}_{2}^{2} \end{array} \right)
= \left( \begin{array}{r} 216.0 \\ -175.5 \\ 72.0
\end{array} \right). \end{displaymath}
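The small system can be solved exactly. The solver below is a plain Gauss-Jordan elimination written for this sketch (not from the notes), using rational arithmetic to avoid rounding.

```python
# Sketch: solve F sigma = w for the example with exact rational arithmetic.
from fractions import Fraction

F = [[4, 2, 2], [4, 4, 2], [6, 4, 4]]
w = [657, 306, 882]

def solve3(F, w):
    """Gauss-Jordan elimination on the augmented matrix [F | w]."""
    A = [[Fraction(F[i][j]) for j in range(3)] + [Fraction(w[i])]
         for i in range(3)]
    for col in range(3):
        piv = next(r for r in range(col, 3) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]       # scale pivot row to 1
        for r in range(3):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [float(A[i][3]) for i in range(3)]

print(solve3(F, w))   # -> [216.0, -175.5, 72.0]
```

Note that the unbiased solution contains a negative estimate, $\hat{\sigma}_{1}^{2} = -175.5$, which is outside the parameter space.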

Normally, the variance-covariance matrix of the estimates, commonly known as the sampling variances of the estimates, was never actually computed during the days of unbiased methods due to the computational complexity. Even with today's computers, the calculations can be very challenging and are often impossible for large data sets. For small examples, however, the calculations are easily demonstrated. In this case,

\begin{displaymath}Var({\bf F}^{-1}{\bf w}) = {\bf F}^{-1}Var({\bf w}){\bf F}^{-1'}, \end{displaymath}

a function of the variance-covariance matrix of the quadratic forms. Note that $Var({\bf w})$ is a $3 \times 3$ matrix in this example. The (1,1) element is the variance of ${\bf y}'{\bf Q}_{1}{\bf y}$, which is

\begin{eqnarray*}Var({\bf y}'{\bf Q}_{1}{\bf y}) & = & 2tr \ {\bf Q}_{1}{\bf VQ}_{1}{\bf V} \\
& = & 20 \sigma_{0}^{4} \ + \ 16 \sigma_{0}^{2} \sigma_{1}^{2} \ + \
16 \sigma_{0}^{2} \sigma_{2}^{2} \\
& & + \ 8 \sigma_{1}^{4} \ + \
0 \sigma_{1}^{2} \sigma_{2}^{2} \ + \ 8 \sigma_{2}^{4}.
\end{eqnarray*}

The (1,2) element is the covariance between the first and second quadratic forms,

\begin{displaymath}Cov({\bf y}'{\bf Q}_{1}{\bf y}, {\bf y}'
{\bf Q}_{2}{\bf y}) = 2tr {\bf Q}_{1}{\bf VQ}_{2}{\bf V}, \end{displaymath}

and similarly for the other terms. All of the results are summarized in the table below.

\begin{displaymath}\begin{array}{l|rrrrrr}
\mbox{Forms} & \sigma_{0}^{4} & \sigma_{0}^{2} \sigma_{1}^{2} &
\sigma_{0}^{2} \sigma_{2}^{2} & \sigma_{1}^{4} &
\sigma_{1}^{2} \sigma_{2}^{2} & \sigma_{2}^{4} \\ \hline
Var(w_{1}) & 20 & 16 & 16 & 8 & 0 & 8 \\
Cov(w_{1},w_{2}) & 14 & 24 & 8 & 16 & 0 & 8 \\
Cov(w_{1},w_{3}) & 24 & 24 & 24 & 16 & 0 & 16 \\
Var(w_{2}) & 20 & 48 & 16 & 32 & 16 & 8 \\
Cov(w_{2},w_{3}) & 24 & 48 & 24 & 32 & 16 & 16 \\
Var(w_{3}) & 36 & 48 & 48 & 32 & 16 & 32
\end{array}\end{displaymath}

To get numeric values for these variances, the true components need to be known. Assume that the true values are $\sigma^{2}_{0} = 250$, $\sigma_{1}^{2} = 10$, and $\sigma_{2}^{2} = 80$, then the variance of w1 is

\begin{eqnarray*}Var(w_{1}) & = & 20(250)^{2}+16(250)(10)+16(250)(80) \\
& & + \ 8(10)^{2} + 0(10)(80) + 8(80)^{2} \\
& = & 1,662,000.
\end{eqnarray*}

The complete variance-covariance matrix of the quadratic forms is

\begin{displaymath}Var \left( \begin{array}{c} w_{1} \\ w_{2} \\ w_{3} \end{array} \right)
= \left( \begin{array}{rrr}
1,662,000 & 1,147,800 & 2,144,000 \\
1,147,800 & 1,757,200 & 2,218,400 \\
2,144,000 & 2,218,400 & 3,550,800 \end{array} \right). \end{displaymath}

The variance-covariance matrix of the estimated variances (assuming the above true values) would be

\begin{eqnarray*}Var( \hat{{\bf\sigma}}) & = & {\bf F}^{-1} Var({\bf w}){\bf F}^{-1'} \\
& = & \left( \begin{array}{rrr}
405,700 & -275,700 & -240,700 \\
-275,700 & 280,900 & 141,950 \\
-240,700 & 141,950 & 293,500 \end{array} \right) = {\bf C}.
\end{eqnarray*}
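These calculations can be replayed in a few lines. In this sketch the coefficient rows come from the table above, the true components are those assumed earlier, and ${\bf F}^{-1}$ is entered exactly for the example's ${\bf F}$; the helper names are my own.

```python
# Sketch: Var(w) from the coefficient table, then C = F^{-1} Var(w) F^{-1}'.
from fractions import Fraction

# rows: coefficients of [s0^4, s0^2 s1^2, s0^2 s2^2, s1^4, s1^2 s2^2, s2^4]
coeff = {
    (0, 0): [20, 16, 16, 8, 0, 8],     # Var(w1)
    (0, 1): [14, 24, 8, 16, 0, 8],     # Cov(w1,w2)
    (0, 2): [24, 24, 24, 16, 0, 16],   # Cov(w1,w3)
    (1, 1): [20, 48, 16, 32, 16, 8],   # Var(w2)
    (1, 2): [24, 48, 24, 32, 16, 16],  # Cov(w2,w3)
    (2, 2): [36, 48, 48, 32, 16, 32],  # Var(w3)
}
s0, s1, s2 = 250, 10, 80               # assumed true components
products = [s0*s0, s0*s1, s0*s2, s1*s1, s1*s2, s2*s2]

Vw = [[0] * 3 for _ in range(3)]
for (i, j), row in coeff.items():
    Vw[i][j] = Vw[j][i] = sum(c * p for c, p in zip(row, products))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# exact inverse of F = [[4,2,2],[4,4,2],[6,4,4]]
Finv = [[Fraction(1), Fraction(0), Fraction(-1, 2)],
        [Fraction(-1, 2), Fraction(1, 2), Fraction(0)],
        [Fraction(-1), Fraction(-1, 2), Fraction(1)]]
FinvT = [list(col) for col in zip(*Finv)]

C = matmul(matmul(Finv, Vw), FinvT)
C_float = [[float(x) for x in row] for row in C]
print(Vw[0][0], C_float[0][0], C_float[2][2])   # -> 1662000 405700.0 293500.0
```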

Variance of Heritability

Often estimates of ratios of functions of the variances are needed for animal breeding work, such as heritabilities, repeatabilities, and variance ratios. Let such a ratio be denoted as a/c where

\begin{displaymath}a = \hat{ \sigma_{2}^{2}} = ( 0 \ \ 0 \ \ 1 )\hat{{\bf\sigma}} =
72. \end{displaymath}


\begin{displaymath}c = \hat{ \sigma_{0}^{2}} + \hat{ \sigma_{1}^{2}} +
\hat{ \sigma_{2}^{2}} = ( 1 \ \ 1 \ \ 1 )\hat{{\bf\sigma}} =
288. \end{displaymath}

(NOTE: the negative estimate for $\hat{\sigma}_{1}^{2}$ was set to zero before calculating c.)

From Osborne and Patterson (1952) and Rao (1968) an approximation to the variance of the ratio is given by

\begin{displaymath}Var(a/c) = ( c^{2}Var(a) + a^{2}Var(c) - 2ac \ Cov(a,c) ) / c^{4}. \end{displaymath}

Now note that

\begin{eqnarray*}Var(a) & = & ( 0 \ \ 0 \ \ 1 ){\bf C}( 0 \ \ 0 \ \ 1 )' = 293,500, \\
Var(c) & = & ( 1 \ \ 1 \ \ 1 ){\bf C}( 1 \ \ 1 \ \ 1 )' = 231,200, \\
Cov(a,c) & = & ( 0 \ \ 0 \ \ 1 ){\bf C}( 1 \ \ 1 \ \ 1 )' = 194,750.
\end{eqnarray*}


\begin{eqnarray*}Var(a/c) & = & [ (288)^{2}(293,500) + (72)^{2}(231,200) \\
& & - 2(72)(288)(194,750)] / (288)^{4} \\
& = & 2.53876.
\end{eqnarray*}

This result is very large, but could be expected from only 3 observations. Thus, $(a/c) \ = \ .25$ with a standard deviation of 1.5933.

Another approximation method assumes that the denominator has been estimated fairly accurately, so that it is considered to be a constant. Then,

\begin{displaymath}Var(a/c) \cong Var(a)/ c^{2}. \end{displaymath}

For the example problem, this gives

\begin{displaymath}Var(a/c) \cong 293,500 / (288)^{2} = 3.53853, \end{displaymath}

which is noticeably larger than the previous approximation. The second approximation would not be suitable for a ratio of the residual variance to the variance of one of the other components. Suppose $a \ = \ \hat{\sigma}_{0}^{2} \ = \ 216$, and $c \ = \ \hat{\sigma}_{2}^{2} \ = \ 72$, then $(a/c) \ = \ 3.0$, and

\begin{eqnarray*}Var(a/c) & = & [(72)^{2}(405,700) + (216)^{2}(293,500) \\
& & - 2 (72)(216)(-240,700) ] / (72)^{4} \\
& = & 866.3966,
\end{eqnarray*}

with the first method, and

\begin{displaymath}Var(a/c) = 405,700 / (72)^{2} \ = \ 78.26, \end{displaymath}

with the second method. The first method is probably more realistic in this situation, but both are very large.
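Both approximations are easy to wrap as small functions. A sketch follows; the function names are mine, and the inputs are the numbers from this example.

```python
# Sketch: the two approximations to Var(a/c) applied to the example numbers.

def var_ratio(a, c, var_a, var_c, cov_ac):
    """Osborne and Patterson (1952) / Rao (1968) approximation."""
    return (c**2 * var_a + a**2 * var_c - 2 * a * c * cov_ac) / c**4

def var_ratio_fixed_denom(a, c, var_a):
    """Cruder approximation that treats the denominator c as a constant."""
    return var_a / c**2

# heritability-type ratio: a = 72, c = 288
v1 = var_ratio(72, 288, 293500, 231200, 194750)
v2 = var_ratio_fixed_denom(72, 288, 293500)
print(round(v1, 5), round(v2, 5))   # -> 2.53876 3.53853

# residual-to-component ratio: a = 216, c = 72
v3 = var_ratio(216, 72, 405700, 293500, -240700)
print(round(v3, 4))   # -> 866.3966
```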

Properties of Estimators

Methods of estimation of variance components differ in the properties of their estimates. Due to the complexity of estimating non-zero scalar quantities from quadratic forms, there is no method that can possibly include all of the desirable properties that animal breeders would like to have. Below is a description of the properties that animal breeders would like to see in a VCE method.

Translation Invariance. A necessary and sufficient condition for a quadratic form, ${\bf y}'{\bf Qy}$, to be translation invariant is that ${\bf QX} = {\bf0}$, and then $E({\bf y}'{\bf Qy}) = tr {\bf QV}$. The function of the fixed effects, ${\bf b}$, disappears from the expectation of ${\bf y}'{\bf Qy}$, and the fixed effects have no influence on the quadratic form.

Within The Parameter Space. Because components of variance are defined to be positive, estimators that yield positive estimates are desirable. With multiple trait models, the estimated variance-covariance matrices should be positive definite; that is, all variances and covariances should fall within the allowable parameter space. Using estimates that are outside of the parameter space can cause problems in evaluating animals, and thus could reduce expected genetic gains.

Nearly Unbiased. Unbiased methods of estimation were initially used by statisticians, but it was widely recognized that the estimates could fall outside of the permissible parameter space. Thus, it was necessary to relax the requirement of unbiasedness. The concept now is that bias is permitted, but as the number of observations grows toward infinity, the estimator should approach the true value; that is, the estimator should be nearly unbiased in large samples.

Minimum Mean Squared Error. The error of estimation is the difference between the estimate and the true value. Squared errors averaged over repeated conceptual samples from the same population, of the same size and structure, give the mean squared error (MSE). A method of estimation that provides estimates with minimum MSE is said to be best. Biased estimators may have smaller MSE than unbiased estimators.

Freedom From Selection Bias. Selection exists in many forms. In animal breeding, the most serious form of 'selection' is that we do not have complete pedigrees on all animals back to the same base generation. Hence we cannot account for the selection of parents which gave rise to a particular individual. Because of a lack of adequate pedigree information, estimates of variance components can be severely biased. Selection also determines which individuals survive to produce more records or give additional offspring. Selection reduces genetic variation (i.e., the Bulmer effect), and usually estimates of the variances prior to selection are required. All methods of VCE seem to be affected by selection bias, but some methods more than others.

Computational Feasibility. Methods vary greatly in their computing demands. Complicated models also lead to high computing costs even for some simple methods. Animal models are computing time intensive because data files in animal breeding are generally large. Data may need to be limited or randomly sampled from the entire data file in order to estimate genetic parameters. Computers, however, are becoming larger in terms of random access memory (RAM), and are becoming faster at computations. Factors that were very limiting 5 years ago are no longer limiting. As computers improve, animal breeders try to estimate more parameters from more data and from more complex linear models.

Generally, researchers tend to use software that is readily available to them. Sometimes the software does not cover the precise model that the researcher would like to employ, or there may be a limit to the amount of data that can be included. I have found it best to write specific software for a specific model and dataset, but not all researchers have the time or experience to do this. Also, when writing one's own software, there is always the possibility of errors in the code, which may or may not be detected.


Larry Schaeffer