This LaTeX document is available as postscript or asAdobe PDF.
All biological creatures grow and perform over their lifetime. Traits that are measured at various times during that life are known as longitudinal data. Examples are body weights, body lengths, milk production, feed intake, fat deposition, and egg production. On a biological basis there could be different genes that turn on or turn off as an animal ages causing changes in physiology and performance. Also, an animal's age can be recorded in years, months, weeks, days, hours, minutes, or seconds, so that, in effect, there could be a continuum or continuous range of points in time when an animal could be observed for a trait. These traits have also been called infinitely dimensional traits.
Take body weight as an example, as given in the table below on gilts.
Animal | Days on Test | ||||||
10 | 20 | 30 | 40 | 50 | 100 | ||
1 | 42 | 53 | 60 | 72 | 83 | 140 | |
2 | 30 | 50 | 58 | 68 | 76 | 122 | |
3 | 38 | 44 | 51 | 60 | 70 | 106 | |
SD | 1.6 | 3.7 | 3.9 | 5.0 | 5.3 | 13.9 |
The differences among the three animals increase with days on test as the gilts become heavier. As the mean weight increases, so also the standard deviation of weights increases. The weights over time could be modeled as a mean plus covariates of days on test and days on test squared. Depending on the species and trait, perhaps a cubic or spline function would fit the data better. The point is that the means can be fit by a linear model with a certain number of parameters.
Covariance Functions are a way of modeling the variances and covariances of the weights over time. This model could be different from the model for the actual weights. A random regression model essentially combines both functions into one model. Covariance functions can be written as random regressions where the covariates are standardized expressions of time that go from -1 to +1.
Multiple Trait Approach
The data presented in the previous table have typically been analyzed
such that the weights at each day on test are different traits.
If t is the day on test, i.e. 10, 20, 30, 40, 50, or 100, then
a model for any one of them could be
t_{i} | 10 | 20 | 30 | 40 | 50 | 100 |
q_{i} | -1 | -.778 | -.556 | -.333 | -.111 | +1 |
The quantity h_{ij} is an element of a matrix ,
and
is an element of another matrix, .
The function,
,
is known as a Legendre polynomial.
To calculate the Legendre polynomials, define
By using and there is no need for the use of Legendre polynomials, but rather just functions of the standardized time values. Why Legendre polynomials? Because they are defined over the range of -1 to +1, and they are orthogonal. However, there are other kinds of orthogonal polynomials defined over the same range. The Legendre polynomials are probably the easiest to calculate.
Either
or
can be used to calculate the covariance
between any two days on test between 10 and 100 days. Suppose we
want the covariance between days 20 and 70. Using
we need
a row of
for day 70, and the one for day 20. These are
Using
we need rows of ,
which can be obtained by
Reduced Orders of Fit
Although the order of
in the previous example was six and
polynomials of standardized ages to the fifth power were used to
derive covariance functions, it could be that only squared or cubed ages
are needed to adequately describe the elements of .
Thus,
we want to perform a reduced order of fit. That is, in
To illustrate, let m=3, then
What order of reduced fit is sufficient to explain the variances and covariances in ? Kirkpatrick et al.(1990) suggest looking at the eigenvalues of the matrix from a full rank fit. Below are the values. The sum of all the eigenvalues was 3,865.6681, and also shown is the percentage of that total. Also included are the eigenvalues of and their percentages.
Eigenvalue | Percentage | Eigenvalue | Percentage |
3826.5161 | .9899 | 87538.3830 | .9859 |
22.6809 | .0059 | 1147.1364 | .0129 |
11.2688 | .0029 | 56.6293 | .0006 |
3.8368 | .0010 | 26.9076 | .0003 |
1.3291 | .0003 | 18.1450 | .0002 |
.0364 | .0000 | .1774 | .0000 |
Both matrices indicate that the majority of change in elements in
is explained by a constant, and perhaps a little by
a linear increment. Both suggest that a quadratic function of
the polynomials is probably sufficient. Is there a way to
statistically test the reduced orders of fit to determine which
is sufficient? A goodness of fit statistic is
where
The test statistic,
,
has a Chi-square distribution with
k(k+1)/2 - m(m+1)/2
degrees of freedom.
In the example with m=3,
Is a fit of order 3 poorer than a fit of order 5? An F-statistic
is possible by taking the difference in the goodness of fit
statistics, divided by an estimate of the residual variance.
The residual variance is estimated from a fit of order k-1 or
in this case of order 5. The goodness of fit statistic for
order 5 was 7.8490 with 21-15=6 degrees of freedom. Hence the
residual variance is
CF to RRM
How are covariance functions and a random regression
model related to each other? If we know the lower and upper age
range for a trait, t_{min} and t_{max}, then we know that the
genetic variance for any particular age, i, within the age range is
given by the covariance function,
and implies that there is a different true genetic value of an
animal for every age within the age range. For the i^{th} age of
animal j,
the true genetic value is
Now notice that
is the same for animal j regardless
of the age at which the observation is taken. Using this definition
we can write the model equation for an observation taken at age ion animal j as
Notice that the residual effects may also have a different variance
for each age. It is logical to assume that e_{ji}can be split into parts, i.e.,
e_{ji} & = & p_{ji} + te,
To write an overall model for several observations per animal at
different ages, then
Depending on the trait, there could be other fixed and random factors in the model. For example, contemporary group effects would be defined as those animals that were housed in the same location and measured on the same day by the same person. There could also be other effects like diet effects that could influence a trait on a given day which might differ from animal to animal and from one measurement to the next for the same animal.
The fixed age means in the model do not assume any particular shape of the trait as it changes with age. However, the age range may be large and thus, there would be a large number of discrete age group effects in the model, and possibly there may not be enough data to estimate the means accurately. Therefore, ages may need to be grouped to give fewer age groups overall. Alternatively, one might assume that a mathematical function of age fits the age means, and therefore only a few parameters need to be estimated. If possible, it is likely better to not assume any mathematical function and to have as many age groups as practically possible.
Simulation of Data
Below are the data structure and pedigrees of four dairy cows. Given is the age at which they were observed for stature during four visits to this one herd.
Age at Classification | ||||||
Cow | Sire | Dam | Visit 1 | Visit 2 | Visit 3 | Visit 4 |
1 | 7 | 5 | 22 | 34 | 47 | |
2 | 7 | 6 | 30 | 42 | 55 | 66 |
3 | 8 | 5 | 28 | 40 | ||
4 | 8 | 1 | 20 | 33 | 44 |
The model equation will be
The simulation process begins by
Animal | a_{0} | a_{1} | a_{2} |
5 | -6.08 | .2225 | -.002131 |
6 | 4.54 | -.0822 | .000439 |
7 | 19.35 | -.5466 | .005114 |
8 | -4.99 | .2624 | -.002695 |
For animal 1 who has parents 7 and 5, first average the random
regression coefficients of the parents,
Animal | a_{0} | a_{1} | a_{2} |
1 | -5.15 | .3249 | -.003547 |
2 | 13.47 | -.1286 | .000704 |
3 | -6.25 | .2839 | -.002687 |
4 | -6.84 | .2415 | -.002291 |
Animal | p_{0} | p_{1} | p_{2} |
1 | 3.67 | -.1480 | .001328 |
2 | 7.59 | -.2352 | .002204 |
3 | -2.97 | -.0034 | .000386 |
4 | 3.26 | -.0935 | .000878 |
24 | 22.06 | 2.5233 | .2811 | 1.0567 | -2.2206 |
44 | 27.50 | 2.5233 | 10.2456 | 2.5176 | 1.1294 |
24 | 26.26 | 2.5233 | -.4074 | -2.7626 | -1.3047 |
36 | 29.74 | 2.1462 | 1.7963 | .1732 | 1.7420 |
47 | 33.26 | 2.1462 | 9.3107 | 1.5995 | 1.0237 |
42 | 32.50 | 2.1462 | .8068 | -2.4884 | 8.6908 |
20 | 20.50 | 2.1462 | -2.9264 | 1.7412 | -1.1633 |
39 | 34.81 | -2.0483 | 2.2850 | -.3524 | 4.2048 |
41 | 36.25 | -2.0483 | 8.5266 | 1.3211 | -3.0580 |
34 | 29.21 | -2.0483 | -1.3654 | 1.1306 | 6.8494 |
44 | 36.14 | -2.4207 | 8.0490 | 1.6674 | .9080 |
28 | 33.94 | -2.4207 | -.6494 | .8458 | -3.3007 |
MME
The mixed model equations that need to be constructed to provide
estimated breeding values are as follows;
The entire MME can not be presented, but parts of the MME are given below.
and are composed of the following four blocks of order 3, for the four animals with records;
The right hand sides of the MME are
The solutions to MME are
Animal | a_{0} | a_{1} | a_{2} |
1 | -1.720812 | .039069 | -.000334 |
2 | 9.908001 | -.286581 | .002556 |
3 | -8.723928 | .245155 | -.002166 |
4 | -2.972874 | .108809 | -.001022 |
5 | -5.215457 | .139228 | -.001218 |
6 | 5.243107 | -.150764 | .001344 |
7 | 4.086682 | -.120872 | .001080 |
8 | -4.114334 | .132408 | -.001206 |
Similarly, the solutions for the animal permanent environmental random regression coefficients can be given in tabular form.
Animal | p_{0} | p_{1} | p_{2} |
1 | -.761270 | .012384 | -.000093 |
2 | 3.536084 | -.101685 | .000906 |
3 | -2.737505 | .073743 | -.000643 |
4 | -.037309 | .015559 | -.000170 |
From the table of additive genetic solutions, it becomes a problem to
decide how to rank the animals. If animals are ranked on the basis of
a_{0}, then animal 2 would be the highest (if that was desirable).
If ranked on the basis of a_{1}, then animal 3 would be the highest,
and if ranked on the basis of a_{2}, then animal 2 would be the
highest. To properly rank the animals we need to calculate an EBV
for stature at different ages, and then combine these with the
appropriate economic weights. Suppose we calculate EBVs for 24, 36,
and 48 mo of age. Suppose also that the economic weights were
2, 1, and .5, respectively, for the three EBVs, so that a Total
Economic Value can be calculated as
Animal | EBV(24) | EBV(36) | EBV(48) | TEV |
1 | -.98 | -.75 | -.61 | -3.00 |
2 | 4.50 | 2.90 | 2.04 | 12.93 |
3 | -4.09 | -2.71 | -1.95 | -11.85 |
4 | -.95 | -.38 | -.10 | -2.33 |
5 | -2.58 | -1.78 | -1.34 | -7.60 |
6 | 2.40 | 1.56 | 1.10 | 6.91 |
7 | 1.81 | 1.13 | .77 | 5.14 |
8 | -1.63 | -.91 | -.54 | -4.44 |
The animal with the highest TEV was animal 2. It is interesting to compare animals 1 and 8. Animal 8 was lower than animal 1 for EBV(24) and EBV(36), but not for EBV(48), so that it is evident that growth patterns in these two animals are genetically different. This is a strength of a random regression model analysis, in that different patterns of growth at different ages can be spotted readily. Rankings of animals change with age. Thus, it is possible to change the pattern of growth to one that is desirable.
REML
Estimation of variances and covariances by EM REML is a little more
complicated than usual in this model. The residual variance is
still estimated as
Let the additive genetic random regression coefficients in a previous
table be called the matrix
of order 8 by 3, and let the
table of permanent environmental random regression coefficients
be called
of order 4 by 3. The inverse of the MME can be
represented as
For the permanent environmental variance-covariance matrix,
is of order 12 with 3 rows and columns for each
of the four animals with records, then
As usual, the calculations are iterative and must be repeated until convergence is reached. All results are about twice as large as the original values, possibly due to the particular sample that was generated.
Bayesian Approach - Gibbs Sampling
The conditional distributions from the joint posterior distribution for the solutions to the MME are all normal distributions, and each is sampled separately. Nothing is different from the previous models in this respect. The conditional distributions for
CALL WISHRT(L,TI,ndf)where ndf is v_{a}+q,
This LaTeX document is available as postscript or asAdobe PDF.
Larry Schaeffer