This LaTeX document is available as postscript or as Adobe PDF.

Two or More Loci

1. Joint Equilibrium

After one generation of random mating, the alleles at any single locus will be in Hardy-Weinberg Equilibrium. However, when two or more loci are considered jointly, they may not be in joint equilibrium.

Other terms have been used in place of joint equilibrium, such as gametic phase equilibrium and linkage equilibrium. Linkage equilibrium has nothing to do with whether the genes are physically linked, and is therefore probably a poor choice of term. Try to use joint equilibrium.

Consider two loci, each with two alleles. Let pA=0.2 be the frequency of the A1 allele at the first locus, and let pB=0.3 be the frequency of the B1 allele at the second locus, so that qA=0.8 and qB=0.7. The possible genotypes, their expected frequencies, and their genotypic values are shown in the table below.

Possible     Frequency, f_i                          Genotypic
Genotype                                             Value, v_i
A1A1B1B1     p_A^2 p_B^2               = .0036       a_A + a_B
A1A1B1B2     p_A^2 (2 p_B q_B)         = .0168       a_A + d_B
A1A1B2B2     p_A^2 q_B^2               = .0196       a_A - a_B
A1A2B1B1     (2 p_A q_A) p_B^2         = .0288       d_A + a_B
A1A2B1B2     (2 p_A q_A)(2 p_B q_B)    = .1344       d_A + d_B
A1A2B2B2     (2 p_A q_A) q_B^2         = .1568       d_A - a_B
A2A2B1B1     q_A^2 p_B^2               = .0576       -a_A + a_B
A2A2B1B2     q_A^2 (2 p_B q_B)         = .2688       -a_A + d_B
A2A2B2B2     q_A^2 q_B^2               = .3136       -a_A - a_B
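
The frequencies in the table can be reproduced with a short script. This is only a sketch: the function names are made up for illustration, and it assumes joint equilibrium, so that each two-locus frequency is the product of the two single-locus Hardy-Weinberg frequencies.

```python
from itertools import product

def locus_genotype_freqs(p):
    """Hardy-Weinberg genotype frequencies for one locus with allele-1 frequency p."""
    q = 1.0 - p
    return {"11": p * p, "12": 2.0 * p * q, "22": q * q}

def two_locus_genotype_freqs(p_a, p_b):
    """Joint genotype frequencies, assuming joint equilibrium (independent loci)."""
    fa = locus_genotype_freqs(p_a)
    fb = locus_genotype_freqs(p_b)
    return {(ga, gb): fa[ga] * fb[gb] for ga, gb in product(fa, fb)}

# Allele frequencies used in the text: pA = 0.2, pB = 0.3.
freqs = two_locus_genotype_freqs(0.2, 0.3)
for (ga, gb), f in freqs.items():
    print(f"A{ga[0]}A{ga[1]}B{gb[0]}B{gb[1]}  {f:.4f}")
print("total", round(sum(freqs.values()), 4))
```

The nine frequencies sum to one, which is a quick check that no genotype class has been left out.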

The genotypic values assume that there is no epistasis, i.e. no gene interactions between these two loci. However, epistasis could exist and each genotype could have a potentially different epistatic effect. Epistasis will be ignored for a little while in these notes.

The genetic mean, ignoring epistasis, is

\begin{displaymath}\mu_{G} = \sum_{i=1}^{9} f_{i}v_{i} = \mu_{G_{A}} + \mu_{G_{B}}, \end{displaymath}

and the variance is

\begin{eqnarray*}\sigma^{2}_{G} & = & \sum_{i=1}^{9}f_{i}v_{i}^{2} - \mu_{G}^{2} \\
& = & \sigma^{2}_{G_{A}} + \sigma^{2}_{G_{B}} + 2 Cov(G_{A},G_{B}),
\end{eqnarray*}

where Cov(GA,GB)=0 if the population is in joint equilibrium. If the population is not in joint equilibrium, then

\begin{displaymath}Cov(G_{A},G_{B}) = 2 \alpha_{A} \alpha_{B} D + 4d_{A}d_{B}D^{2}, \end{displaymath}

where D is the amount of disequilibrium (discussed in the next section).
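
As a numerical check of the mean and variance expressions above, the following sketch builds the nine genotypes under joint equilibrium and confirms that the mean is the sum of the single-locus means and that the covariance term vanishes. The values of a and d are hypothetical and chosen only for illustration; the allele frequencies are those used in the text.

```python
def locus_table(p, a, d):
    """(frequency, value) pairs for genotypes 11, 12, 22 at one locus."""
    q = 1.0 - p
    return [(p * p, a), (2.0 * p * q, d), (q * q, -a)]

def mean_var(table):
    """Mean and variance of a discrete distribution given (freq, value) pairs."""
    mean = sum(f * v for f, v in table)
    var = sum(f * v * v for f, v in table) - mean ** 2
    return mean, var

# pA = 0.2 and pB = 0.3 as in the text; a and d are hypothetical values.
locus_a = locus_table(0.2, a=4.0, d=1.0)
locus_b = locus_table(0.3, a=2.0, d=-0.5)

# Nine joint genotypes: frequencies multiply (joint equilibrium),
# values add (no epistasis).
joint = [(fa * fb, va + vb) for fa, va in locus_a for fb, vb in locus_b]

mu_a, var_a = mean_var(locus_a)
mu_b, var_b = mean_var(locus_b)
mu_g, var_g = mean_var(joint)

# Solve the variance identity for the covariance term.
cov_ab = (var_g - var_a - var_b) / 2.0
print("means:", mu_g, mu_a + mu_b)
print("variances:", var_g, var_a + var_b)
print("covariance:", cov_ab)
```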

The gametic frequencies from a population in joint equilibrium are shown below. Because the equilibrium is defined in terms of gametes, it is also called gametic phase equilibrium.

Possible   Expected
Gametes    Frequencies
A1 B1      p_A p_B
A1 B2      p_A q_B
A2 B1      q_A p_B
A2 B2      q_A q_B

In the single locus situation with a small population, drift carried the population towards the A1A1 genotype in some strains with frequency pA and towards the A2A2 genotype in other strains with frequency qA. With two loci, strains will move towards one of the four gamete types with the frequencies shown in the table above.

2. Disequilibrium

Joint disequilibrium can be generated easily by selection on the population. Suppose we select parents that have either genotype A1A1B1B1, with frequency p=0.3, or genotype A2A2B2B2, with frequency q=0.7. If these parents are allowed to mate randomly, the expected progeny genotypes and their frequencies are as follows:

Parent 1    Parent 2    Offspring   Frequency
A1A1B1B1    A1A1B1B1    A1A1B1B1    p^2 = 0.09
A1A1B1B1    A2A2B2B2    A1A2B1B2    2pq = 0.42
A2A2B2B2    A2A2B2B2    A2A2B2B2    q^2 = 0.49

With two loci, each with two alleles, there would be nine possible genotypes in the offspring generation if it were in joint equilibrium. However, there are only three genotypes in this offspring generation. Note that at the A-locus there is Hardy-Weinberg equilibrium, and at the B-locus there is Hardy-Weinberg equilibrium. However, when both loci are considered jointly, the progeny generation is not in joint equilibrium. The parent population was created by selection, by removing all but two genotypes. Thus, selection has caused the joint disequilibrium. Mixing populations of animals with different gene frequencies will also cause joint disequilibrium, and in small populations, chance can affect gametic frequencies and thereby cause joint disequilibrium. Joint disequilibrium can change genotypic variance.

2.1 Measuring Disequilibrium

To measure the amount of disequilibrium, we compare the gametic frequencies expected if the parent population were in joint equilibrium with the actual gametic frequencies, as in the table below.

Gametes   Frequency under     Actual       Difference
          Joint Equilibrium   Frequency
A1 B1     p_A p_B             r = p         D = r - p_A p_B
A1 B2     p_A q_B             s = 0        -D = s - p_A q_B
A2 B1     q_A p_B             t = 0        -D = t - q_A p_B
A2 B2     q_A q_B             u = q         D = u - q_A q_B

So, if pA=pB=p= 0.3, then

D = ru - st = pq = 0.21.
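
The two ways of obtaining D used above (the cross-product ru - st, and the deviation of each actual gametic frequency from its joint-equilibrium expectation) can be compared in a few lines. The function names here are illustrative only; the gametic frequencies are those of the selected parents in the example.

```python
def disequilibrium(r, s, t, u):
    """D = ru - st for gamete frequencies r (A1B1), s (A1B2), t (A2B1), u (A2B2)."""
    return r * u - s * t

def deviations(r, s, t, u):
    """Actual minus joint-equilibrium gametic frequencies, in the order r, s, t, u."""
    p_a = r + s          # frequency of allele A1
    p_b = r + t          # frequency of allele B1
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    return (r - p_a * p_b, s - p_a * q_b, t - q_a * p_b, u - q_a * q_b)

# Selected parents: only A1B1 (0.3) and A2B2 (0.7) gametes exist.
parents = (0.3, 0.0, 0.0, 0.7)
d_parent = disequilibrium(*parents)
print("D =", round(d_parent, 4))
print("deviations =", [round(x, 4) for x in deviations(*parents)])
```

The deviations come out as D, -D, -D, D, matching the last column of the table.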

Disequilibrium can be positive or negative. Disequilibrium should be computed between each pair of loci in the genome, and there could be a range of disequilibrium values. The average of the absolute values of D over all pairs of loci could be used as an overall measure of disequilibrium.

Now consider the progeny generation that resulted from the random mating of the selected parents. We need to determine the gametic frequencies of these offspring, as in the table below.

Parent      Frequency   Gametes
Genotype                A1 B1    A1 B2    A2 B1    A2 B2
A1A1B1B1    .09         .090
A1A2B1B2    .42         .105     .105     .105     .105
A2A2B2B2    .49                                    .490
Actual                  .195     .105     .105     .595
Expected                .090     .210     .210     .490
Difference              .105    -.105    -.105     .105

D = ru - st = (.195)(.595)-(.105)(.105) = .105.

Note that D is equal to .105, which is one half the D of the parent generation.
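
The gametic table for the progeny can be rebuilt as follows. The sketch assumes, as in this example, that every double heterozygote received one A1B1 and one A2B2 gamete, and it uses a recombination rate c, with c = 0.5 for unlinked loci (linkage is treated in the next section). The function names are hypothetical.

```python
def progeny_gametes(p, c=0.5):
    """Gametic frequencies (r, s, t, u) produced by the offspring of parents that
    are A1A1B1B1 with frequency p and A2A2B2B2 with frequency 1 - p.

    Assumes all double heterozygotes are A1B1/A2B2, so recombinant gametes
    (A1B2 and A2B1) arise only with recombination rate c.
    """
    q = 1.0 - p
    f_11, f_het, f_22 = p * p, 2.0 * p * q, q * q
    r = f_11 + f_het * (1.0 - c) / 2.0   # A1 B1
    s = f_het * c / 2.0                  # A1 B2
    t = f_het * c / 2.0                  # A2 B1
    u = f_22 + f_het * (1.0 - c) / 2.0   # A2 B2
    return r, s, t, u

def disequilibrium(r, s, t, u):
    """D = ru - st."""
    return r * u - s * t

r, s, t, u = progeny_gametes(0.3)        # unlinked loci
d_progeny = disequilibrium(r, s, t, u)
print([round(x, 3) for x in (r, s, t, u)], round(d_progeny, 3))
```

With p = 0.3 and c = 0.5 this gives the "Actual" row of the table and D = 0.105, half of the parental value of 0.21.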

2.2 Linked Loci

If we allow for physical linkage between two loci, and let c represent the recombination rate, then we need to determine how A1B1 gametes can be produced and at what frequency. From a nonrecombinant genotype A1B1/- -, the A1B1 gamete is produced with frequency r(1-c). From a recombinant genotype A1 - /- B1, the A1B1 gamete is produced with frequency pApBc. The total frequency of A1B1 gametes is therefore

\begin{displaymath}r_{1} \ = \ r(1-c) \ + \ p_{A}p_{B}c. \end{displaymath}

The disequilibrium in the progeny generation would be

\begin{eqnarray*}D_{1} & = & r_{1} - p_{A}p_{B} \\
& = & r(1-c)+p_{A}p_{B}c - p_{A}p_{B} \\
& = & r(1-c) - p_{A}p_{B}(1-c) \\
& = & (r-p_{A}p_{B})(1-c) \\
& = & D(1-c).
\end{eqnarray*}

Therefore, each generation of random mating multiplies the disequilibrium by one minus the recombination rate. Using recursion, in general,

\begin{displaymath}D_{t} \ = \ D(1-c)^{t} \end{displaymath}

after t generations. When loci are not physically linked, c=0.5 and disequilibrium is halved with each generation of random mating, as we saw in the previous section. When c < 0.5, disequilibrium dissipates more slowly. The median equilibrium time is the number of generations needed to reduce the disequilibrium by one-half, and is equal to

\begin{displaymath}t \ = \ \log(0.5) / \log(1-c). \end{displaymath}
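
To get a feel for how strongly linkage slows the decay, the recursion and the median equilibrium time can be evaluated for a few recombination rates. The rates 0.1 and 0.01 and the function names are illustrative choices, not values from these notes; the starting D of 0.21 is taken from the earlier example.

```python
import math

def d_after(d0, c, t):
    """Disequilibrium after t generations of random mating: D_t = D (1 - c)^t."""
    return d0 * (1.0 - c) ** t

def median_equilibrium_time(c):
    """Generations needed to halve D: t = log(0.5) / log(1 - c)."""
    return math.log(0.5) / math.log(1.0 - c)

for c in (0.5, 0.1, 0.01):
    decay = [round(d_after(0.21, c, t), 4) for t in range(4)]
    print(f"c={c}: halving time {median_equilibrium_time(c):.1f} generations, D_t = {decay}")
```

For unlinked loci the halving time is exactly one generation; for tightly linked loci it runs to tens of generations.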

3. Population Parameters

Below are the frequencies for genotypes at two loci and their genotypic values.

Genotype   Freq.        Value     Genotype   Freq.        Value
A1A1       p_A^2        a_A       B1B1       p_B^2        a_B
A1A2       2 p_A q_A    d_A       B1B2       2 p_B q_B    d_B
A2A2       q_A^2        -a_A      B2B2       q_B^2        -a_B

The genetic means for each locus are the same as in the single-locus situation,

\begin{eqnarray*}\mu_{G_{A}} & = & a_{A}(p_{A} - q_{A}) + 2p_{A}q_{A}d_{A} \\
\mu_{G_{B}} & = & a_{B}(p_{B} - q_{B}) + 2p_{B}q_{B}d_{B}.
\end{eqnarray*}

Assuming that there is no epistasis (i.e. interaction between loci), then

\begin{eqnarray*}\mu_{G} & = & \mu_{G_{A}} \ + \ \mu_{G_{B}} \\
& = & \sum_{i=1}^{n} a_{i}(p_{i}-q_{i}) + 2 \sum_{i=1}^{n} p_{i}q_{i}d_{i},
\end{eqnarray*}

where n is the number of loci, in general.

Likewise, the genotypic variances for each locus are

\begin{eqnarray*}\sigma^{2}_{G_{A}} & = & 2p_{A}q_{A} \alpha^{2}_{A} + (2p_{A}q_{A}d_{A})^{2} \\
\sigma^{2}_{G_{B}} & = & 2p_{B}q_{B} \alpha^{2}_{B} + (2p_{B}q_{B}d_{B})^{2},
\end{eqnarray*}

and in total for both loci,

\begin{displaymath}\sigma^{2}_{G} \ = \ \sigma^{2}_{G_{A}} \ + \
\sigma^{2}_{G_{B}} \ + \ 2 \ Cov(G_{A},G_{B}). \end{displaymath}

When the loci are in joint equilibrium, then

\begin{displaymath}Cov(G_{A},G_{B}) \ = \ 0, \end{displaymath}

otherwise, with joint disequilibrium,

\begin{displaymath}Cov(G_{A},G_{B}) \ = \ 2 \alpha_{A} \alpha_{B} D
+ 4 d_{A} d_{B} D^{2}. \end{displaymath}

Recall that

\begin{displaymath}\alpha_{i} \ = \ [a_{i} + d_{i}(q_{i} - p_{i}) ]. \end{displaymath}

With joint disequilibrium the genotypes are not independent, and so there is a non-zero covariance between genotypic values at different loci.

Several simplifications occur if dA and dB can be assumed to be zero. Then

\begin{eqnarray*}\mu_{G_{i}} & = & a_{i}(p_{i}-q_{i}) \\
\sigma^{2}_{G_{i}} & = & 2p_{i}q_{i}a^{2}_{i} \\
Cov(G_{A},G_{B}) & = & 2a_{A}a_{B}D.
\end{eqnarray*}
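
The population parameters of this section can be collected into a few small functions. This is a sketch under the no-epistasis assumption; the function names and the numerical values of a, d and D passed in below are hypothetical, chosen only to exercise the formulas.

```python
def alpha(a, d, p):
    """Average effect of gene substitution: alpha = a + d (q - p)."""
    q = 1.0 - p
    return a + d * (q - p)

def locus_mean(a, d, p):
    """Single-locus genetic mean: a (p - q) + 2 p q d."""
    q = 1.0 - p
    return a * (p - q) + 2.0 * p * q * d

def locus_variance(a, d, p):
    """Single-locus genotypic variance: 2 p q alpha^2 + (2 p q d)^2."""
    q = 1.0 - p
    return 2.0 * p * q * alpha(a, d, p) ** 2 + (2.0 * p * q * d) ** 2

def covariance(a_a, d_a, p_a, a_b, d_b, p_b, dis):
    """Cov(G_A, G_B) = 2 alpha_A alpha_B D + 4 d_A d_B D^2."""
    return (2.0 * alpha(a_a, d_a, p_a) * alpha(a_b, d_b, p_b) * dis
            + 4.0 * d_a * d_b * dis ** 2)

def total_variance(a_a, d_a, p_a, a_b, d_b, p_b, dis):
    """Total genotypic variance over both loci, including the covariance term."""
    return (locus_variance(a_a, d_a, p_a) + locus_variance(a_b, d_b, p_b)
            + 2.0 * covariance(a_a, d_a, p_a, a_b, d_b, p_b, dis))

# Hypothetical effects; pA = 0.2 and pB = 0.3 as in the first section.
print(total_variance(4.0, 1.0, 0.2, 2.0, -0.5, 0.3, dis=0.0))
print(total_variance(4.0, 1.0, 0.2, 2.0, -0.5, 0.3, dis=0.05))
# With no dominance the covariance reduces to 2 a_A a_B D.
print(covariance(4.0, 0.0, 0.2, 2.0, 0.0, 0.3, dis=0.05))
```

Setting D to zero recovers the joint-equilibrium case, and setting both d values to zero recovers the simplified expressions above.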

4. Epistasis

Below is a table of genotypic values of nine genotypes that are possible from two loci, each with two alleles.

                    A1A1                    A1A2                      A2A2
                    p_A^2                   2 p_A q_A                 q_A^2
B1B1   p_B^2        a_A+a_B+4k_1            d_A+a_B+2k_2+2k_1         a_B-a_A
B1B2   2 p_B q_B    a_A+d_B+k_2+2k_1        d_A+d_B+k_1+k_2+k_3       d_B-a_A
B2B2   q_B^2        a_A-a_B                 d_A-a_B                   -a_A-a_B

The values of k1, k2, and k3 are all zero when there are no interactions between the two loci. An additive by additive interaction could exist whenever A1 and B1 occur together. Thus, in the genotype A1A1B1B1 there could be four such interactions. If $k_{1} \neq 0$, then the genotypic value of that genotype would be different from the sum of the genotypic values at each locus.

Another possible interaction is an additive by dominance interaction. For example, whenever A1A2 genotype at the A-locus occurs with the B1 allele, then the genotypic value of the A1A2B1 - genotype would be altered from the usual sum of genotypic values at each locus. In this case, k2 would be different from zero.

The third and last possible interaction between two loci is the dominance by dominance interaction which occurs only when A1A2 is with B1B2, or the double heterozygote. Then k3 would be different from zero.
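
The table of genotypic values can be written as a small lookup, which makes it easy to see how each k term shifts a genotype away from the simple sum over loci. The counts of k1, k2 and k3 below are copied from the table as read here; the function names and the numerical values of a, d and k are hypothetical.

```python
# Number of k1, k2, k3 terms added to each genotype, as read from the table.
# Genotypes not listed receive no epistatic term.
EPISTATIC_TERMS = {
    ("A1A1", "B1B1"): (4, 0, 0),
    ("A1A2", "B1B1"): (2, 2, 0),
    ("A1A1", "B1B2"): (2, 1, 0),
    ("A1A2", "B1B2"): (1, 1, 1),
}

def locus_value(genotype, a, d):
    """Single-locus genotypic value: a for 11, d for 12, -a for 22."""
    first, second = genotype[:2], genotype[2:]
    if first != second:
        return d
    return a if first.endswith("1") else -a

def genotypic_value(ga, gb, a_a, d_a, a_b, d_b, k=(0.0, 0.0, 0.0)):
    """Sum of the two single-locus values plus any epistatic terms."""
    counts = EPISTATIC_TERMS.get((ga, gb), (0, 0, 0))
    epistatic = sum(n * ki for n, ki in zip(counts, k))
    return locus_value(ga, a_a, d_a) + locus_value(gb, a_b, d_b) + epistatic

# Hypothetical effects: aA = 4, dA = 1, aB = 2, dB = -0.5.
print(genotypic_value("A1A1", "B1B1", 4.0, 1.0, 2.0, -0.5))
print(genotypic_value("A1A1", "B1B1", 4.0, 1.0, 2.0, -0.5, k=(0.5, 0.0, 0.0)))
print(genotypic_value("A1A2", "B1B2", 4.0, 1.0, 2.0, -0.5, k=(0.5, 0.25, 1.0)))
```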

Determining means and variances in the presence of epistasis requires assuming that one or more of these types of gene interactions exist. This gets extremely messy and complex when extended to more than two loci, because more types of gene interactions are possible. Perhaps at a third locus, the presence of C1 might nullify or turn off the effects of any genotypes at the A and B loci. Also, if loci have more than two alleles, then derivation of simple formulas becomes impossible for epistatic situations.

5. Many Loci

Assume a large, random mating population. Assume that the number of loci affecting a particular trait approaches $\infty$. Assume that there is no epistasis (gene interactions) among loci. The total genotypic value of an individual is then

\begin{displaymath}G \ = \ \sum_{i=1}^{n} \ G_{i}, \end{displaymath}

and the genotypic value of each locus is

\begin{eqnarray*}G_{i} & = & A_{i} \ + \ D_{i} \\
& = & a_{i}(p_{i}-q_{i}) \ + \ 2p_{i}q_{i}d_{i},
\end{eqnarray*}

where Ai refers to the total additive genetic value and Di refers to the total dominance genetic value. The distribution of genotypes, considering all loci together, is given by

\begin{displaymath}\prod_{i=1}^{n} (p_{i} \ + \ q_{i})^{2}. \end{displaymath}

The Central Limit Theorem can be applied to show that G asymptotically approaches a Normal distribution, and so do $\sum A_{i}$ and $\sum D_{i}$, as long as the loci are statistically independent (the disequilibrium value is 0 for all pairs of loci) and there is no epistasis. Animal breeders assume normality in nearly all cases, and with it the assumptions implied above. If n is small, then departures from normality can be large.
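
The effect of the number of loci on normality can be illustrated by computing the exact distribution of G for a small number of loci and watching its skewness shrink as n grows. This sketch makes simplifying assumptions that are not in the notes: all loci are identical, purely additive (a = 1, d = 0), independent, and have allele frequency 0.2. It is an illustration of the trend, not a proof of the theorem.

```python
from collections import defaultdict

def locus_dist(p, a=1, d=0):
    """Distribution {value: frequency} of the genotypic value at one locus."""
    q = 1.0 - p
    dist = defaultdict(float)
    for value, freq in ((a, p * p), (d, 2.0 * p * q), (-a, q * q)):
        dist[value] += freq
    return dict(dist)

def convolve(d1, d2):
    """Distribution of the sum of two independent discrete variables."""
    out = defaultdict(float)
    for v1, f1 in d1.items():
        for v2, f2 in d2.items():
            out[v1 + v2] += f1 * f2
    return dict(out)

def total_dist(n, p):
    """Exact distribution of G summed over n identical, independent loci."""
    total = {0: 1.0}
    for _ in range(n):
        total = convolve(total, locus_dist(p))
    return total

def skewness(dist):
    """Third standardized moment; zero for a symmetric (e.g. Normal) distribution."""
    mean = sum(v * f for v, f in dist.items())
    var = sum(f * (v - mean) ** 2 for v, f in dist.items())
    third = sum(f * (v - mean) ** 3 for v, f in dist.items())
    return third / var ** 1.5

skew = {n: skewness(total_dist(n, 0.2)) for n in (1, 4, 16, 64)}
for n, value in skew.items():
    print(n, round(value, 4))
```

Under these assumptions the skewness falls in proportion to one over the square root of n, so a trait controlled by only a handful of loci can still be noticeably non-Normal.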

If the assumptions of the previous section are true, then phenotypes can be described by a linear model as follows:

\begin{displaymath}y_{i} \ = \ \mu \ + \ A_{i} \ + \ D_{i} \ + \ e_{i}, \end{displaymath}

assuming one record per animal, and where Ai and Di are the additive and dominance genetic values of animal i, and $\mu$ is the population mean, and ei is a random environmental effect peculiar to that animal and record. The random environmental effect is also assumed to be normally distributed, so that yi, the phenotypes, also follow a normal distribution. Also,

\begin{eqnarray*}E(y_{i}) & = & \mu \\
E(A_{i}) & = & 0 \\
E(D_{i}) & = & 0 \\
E(e_{i}) & = & 0 \\
Cov(A_{i},D_{i}) & = & 0 \\
Cov(A_{i},e_{i}) & = & 0 \\
Cov(D_{i},e_{i}) & = & 0.
\end{eqnarray*}

In matrix notation, phenotypes of all animals can be collectively described as

\begin{displaymath}{\bf y} \ = \ \mu {\bf 1} \ + \ {\bf Ia} \ + \ {\bf Id} + {\bf e}, \end{displaymath}

where ${\bf a}$ is the vector of additive genetic values of all animals, ${\bf d}$ is the vector of dominance genetic values of all animals, ${\bf e}$ is the vector of environmental effects for all records, ${\bf 1}$ is a vector of the same length as ${\bf a}$, ${\bf d}$, and ${\bf e}$, with all elements equal to one, and $\mu$ is the population mean. The variances and covariances can also be described in matrix notation as follows:

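\begin{displaymath}Var \left( \begin{array}{c} {\bf a} \\ {\bf d} \\ {\bf e}
\end{array} \right) \ = \ \left( \begin{array}{ccc}
{\bf A}\sigma^{2}_{a} & {\bf 0} & {\bf 0} \\
{\bf 0} & {\bf D}\sigma^{2}_{d} & {\bf 0} \\
{\bf 0} & {\bf 0} & {\bf I}\sigma^{2}_{e}
\end{array} \right), \end{displaymath}

where ${\bf A}$ is the matrix of additive genetic relationships among animals, ${\bf D}$ is the matrix of dominance genetic relationships among animals, ${\bf I}$ is an identity matrix, and $\sigma^{2}_{a}$, $\sigma^{2}_{d}$, and $\sigma^{2}_{e}$ are the additive genetic, dominance genetic, and environmental variances, respectively. Construction of ${\bf A}$ and ${\bf D}$ will be covered later.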

The assumptions that go along with this model are

An infinite number of loci, i.e. infinitesimal model,
Random mating population, i.e. no selection,
All loci in joint equilibrium,
A large population, i.e. no inbreeding.

Many animal breeders do not fully understand these assumptions, and the model is often applied to situations where the underlying assumptions are known to be invalid.

Larry Schaeffer