
This LaTeX document is available as postscript or as Adobe PDF.

Statistics Background
L. R. Schaeffer, March 1999

Random Variables

A random variable is a real-valued function whose domain is a defined sample space. A random variable is designated by a capital letter, say Y, and the value of Y, which depends on the outcome of the experiment, is denoted by a small letter, say y. Here the sample space is described by the range of values that can be assigned to Y.

Random variables can be either discrete or continuous. A discrete random variable can assume only a finite or countably infinite number of distinct values, such as zero or one for example. A continuous random variable can assume any value within the range of the sample space.

Discrete Random Variables

In the discrete case, the probability that Y takes the value y is defined as the sum of the probabilities of all sample points that are assigned the value y. That is,

P(Y=y) = p(y).

The probability distribution of Y lists the probabilities for each value of y. Suppose Y can take on four values with the following probabilities:

y p(y)
0 1/8
1 1/4
2 1/4
3 3/8

Any other values of y are assumed to have p(y)=0, and the sum of the probabilities is 1. The expected value of a discrete random variable is defined as

\begin{displaymath}E(Y) = \sum_{y} y \ p(y). \end{displaymath}

For the example above,

E(Y) = ( 0 (1/8) + 1 (1/4) + 2 (1/4) + 3 (3/8) ) = 1.875.

Similarly, the expected value of a function of Y, say g(Y), is given by

\begin{displaymath}E(g(Y)) = \sum_{y} g(y) \ p(y). \end{displaymath}

Suppose $g(Y) = Y^{2}$, then

\begin{displaymath}E(Y^{2}) = 0 (1/8) + 1 (1/4) + 4 (1/4) + 9 (3/8) = 4.625. \end{displaymath}

The variance of a discrete random variable Y is

\begin{displaymath}Var(Y) = E[(Y - E(Y))^{2}] = E(Y^{2}) - [E(Y)]^{2}. \end{displaymath}

For the example,

\begin{eqnarray*}Var(Y) & = & (-1.875)^{2}(1/8) + (-.875)^{2}(1/4) + (.125)^{2}(1/4) + (1.125)^{2}(3/8) \\
& = & 4.625 \ - \ (1.875)^{2} \\
& = & 1.109375
\end{eqnarray*}
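
The arithmetic of this example can be checked with a short script. This is a minimal sketch using exact fractions; the helper name `expect` is an assumption made for illustration and does not come from the notes.

```python
from fractions import Fraction

# Probability distribution from the table above: value y -> p(y).
p = {0: Fraction(1, 8), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(3, 8)}

def expect(g, dist):
    """E(g(Y)) = sum over y of g(y) p(y) for a discrete distribution."""
    return sum(g(y) * prob for y, prob in dist.items())

total = sum(p.values())                             # probabilities must sum to 1
mean = expect(lambda y: y, p)                       # E(Y)
second_moment = expect(lambda y: y * y, p)          # E(Y^2)
var_shortcut = second_moment - mean ** 2            # E(Y^2) - [E(Y)]^2
var_direct = expect(lambda y: (y - mean) ** 2, p)   # E[(Y - E(Y))^2]

print(float(mean), float(second_moment), float(var_direct))
# prints: 1.875 4.625 1.109375
```

Both forms of the variance give the same value, 71/64 = 1.109375.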

Binomial Distribution

A common discrete distribution is the binomial distribution. A binomial event has only two possible outcomes: success or failure, zero or one, heads or tails, diseased or not diseased, and so on. The probability of a success is q and the probability of a failure is 1-q. Trials, a succession of binomial events, are assumed to be independent. The random variable Y is the number of successes. The probability distribution is given by

\begin{displaymath}p(y) = \left( \begin{array}{c} n \\ y \end{array} \right)
q^{y} (1-q)^{n-y}, \end{displaymath}

for $y=0, \ 1, \ 2, \ ... , n$ and $0 \leq \ q \ \leq 1$. The number of trials is n. The expected value and variance of the binomial distribution are

\begin{eqnarray*}E(Y) & = & n \ q \\
Var(Y) & = & n \ q \ (1-q) .
\end{eqnarray*}
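
As a numerical check, the mean and variance can be computed directly from the probability distribution and compared with n q and n q (1-q). The values n = 10 and q = 0.3 below are illustrative assumptions, not values from the notes.

```python
from math import comb

def binomial_pmf(y, n, q):
    """p(y) = C(n, y) q^y (1 - q)^(n - y)."""
    return comb(n, y) * q ** y * (1 - q) ** (n - y)

n, q = 10, 0.3   # illustrative values only
probs = [binomial_pmf(y, n, q) for y in range(n + 1)]

total = sum(probs)                                              # should be 1
mean = sum(y * p for y, p in enumerate(probs))                  # should equal n q
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))  # n q (1 - q)

print(total, mean, n * q)
print(variance, n * q * (1 - q))
```

Up to floating-point rounding, the sums agree with the closed-form results.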

Poisson Distribution

A Poisson probability distribution provides a good model for the number Y of rare events that occur in a given space, time, volume, or any other dimension, where $\lambda$ is the average value of Y. An example in animal breeding might be the number of quality embryos produced by a cow during superovulation, which can range from 0 to 20 (or more). The Poisson probability distribution is given by

\begin{displaymath}p(y) = \frac{\lambda^{y}}{y!} \exp(- \lambda), \end{displaymath}

for $y = 0, \ 1, \ 2, \ ... $ and $\lambda > 0.$ Also,

\begin{eqnarray*}E(Y) & = & \lambda \\
Var(Y) & = & \lambda .
\end{eqnarray*}
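
The equality of the mean and variance can be illustrated numerically. The sketch below truncates the infinite sum at an assumed upper limit (60 terms), which is more than enough for the illustrative value $\lambda = 3$; both choices are assumptions made for the example.

```python
from math import exp, factorial

def poisson_pmf(y, lam):
    """p(y) = lam^y / y! * exp(-lam)."""
    return lam ** y / factorial(y) * exp(-lam)

lam = 3.0      # illustrative average number of events
upper = 60     # truncation point; the omitted tail is negligible for lam = 3

probs = [poisson_pmf(y, lam) for y in range(upper)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))

print(total, mean, variance)   # each close to 1, lam, lam respectively
```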

General Results


If Y represents a random variable from some defined population, then the expectation of Y is denoted by

\begin{displaymath}E(Y) = \mu \end{displaymath}

where $E(\cdot)$ means expected value. Now let ${\bf Y_{1}}$ and ${\bf Y_{2}}$ represent vectors of random variables (or random vector variables), let k represent a scalar constant, and let ${\bf K}$ be a matrix of constants, then
\(E(kY) = k \: E(Y) = k \mu. \)
\( E({\bf Y_{1}}) = {\bf\mu} = \left( \begin{array}{c}
\mu_{1} \\ \mu_{2} \\ \vdots \\ \mu_{n} \end{array} \right). \)
\( E(k \: {\bf Y_{1}}) = k \: {\bf\mu}. \)
\( E({\bf KY_{1}}) = {\bf K \mu}. \)
\( E({\bf Y_{1}+Y_{2}}) = E({\bf Y_{1}})+E({\bf Y_{2}}) =
{\bf\mu_{1}}+{\bf\mu_{2}}, \) and if ${\bf\mu_{1}} =
{\bf\mu_{2}}$ then \( E({\bf Y_{1}+Y_{2}}) = 2{\bf\mu}. \)

The mean of a population is also known as the first moment of the distribution. The exact form of the distribution for a random variable will determine the form of the estimator of the mean and of other parameters of the distribution.

Variance-Covariance Matrices

The variance of a scalar random variable, Y, is defined as

\begin{displaymath}Var(Y) = E(Y^{2}) - E(Y)E(Y) = E[(Y-E(Y))^{2}] \end{displaymath}

and is commonly represented as $\sigma_{Y}^{2}$. With two scalar random variables, say $Y_{1}$ and $Y_{2}$, the covariance between them is defined as

\begin{displaymath}Cov(Y_{1},Y_{2}) = E(Y_{1}Y_{2}) - E(Y_{1})E(Y_{2}) \end{displaymath}

and is represented as $\sigma_{Y_{1}Y_{2}}$. These definitions can be extended to a vector of random variables, ${\bf Y}$, to give a variance-covariance matrix of order equal to the length of ${\bf Y}$.

\begin{displaymath}Var({\bf Y}) = E({\bf YY'}) - E({\bf Y})E({\bf Y'})\end{displaymath}

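\begin{displaymath}\mbox{ = } \left( \begin{array}{cccc}
\sigma_{Y_{1}}^{2} & \sigma_{Y_{1}Y_{2}} & \cdots & \sigma_{Y_{1}Y_{n}} \\
\sigma_{Y_{1}Y_{2}} & \sigma_{Y_{2}}^{2} & \cdots & \sigma_{Y_{2}Y_{n}} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{Y_{1}Y_{n}} & \sigma_{Y_{2}Y_{n}} & \cdots & \sigma_{Y_{n}}^{2}
\end{array} \right) \mbox{ = } {\bf V}. \end{displaymath}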

A variance-covariance (VCV) matrix of a random vector contains variances on the diagonal and covariances on the off-diagonals. A VCV matrix is square and symmetric, and is always positive definite or positive semi-definite. Another commonly used name for a VCV matrix is dispersion matrix.

Let ${\bf A}$ be a matrix of constants conformable for multiplication with the vector ${\bf Y}$, then

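\begin{eqnarray*}Var({\bf AY}) & = & E({\bf AYY'A'})-E({\bf AY})E({\bf Y'A'}) \\
& = & {\bf A}E({\bf YY'}){\bf A'} - {\bf A}E({\bf Y})E({\bf Y'}){\bf A'} \\
& = & {\bf A}\left( E({\bf YY'}) - E({\bf Y})E({\bf Y'}) \right){\bf A'} \\
& = & {\bf A}Var({\bf Y}){\bf A'} \mbox{ = } {\bf AVA'}.
\end{eqnarray*}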

If we have two sets of functions of ${\bf Y}$, say ${\bf AY}$ and ${\bf BY}$, then

\begin{displaymath}Cov({\bf AY,BY}) \ = \ {\bf AVB'}. \end{displaymath}

Similarly, if we have two functions of different random vectors, say ${\bf AY}$ and ${\bf MZ}$, and \( Cov({\bf Y,Z}) = {\bf C} \), then

\begin{displaymath}Cov({\bf AY,MZ}) \ = \ {\bf ACM'}. \end{displaymath}
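
A small numerical illustration of the rules ${\bf AVA'}$ and ${\bf AVB'}$ follows. The matrices ${\bf V}$, ${\bf A}$ and ${\bf B}$ are made-up values chosen so the results can be verified by hand, and the helpers `matmul` and `transpose` are assumptions written for this sketch.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

# Hypothetical VCV matrix: Var(Y1) = 4, Var(Y2) = 9, Cov(Y1, Y2) = 1.
V = [[4, 1],
     [1, 9]]
A = [[1, 1],     # rows give Y1 + Y2 and Y1 - Y2
     [1, -1]]
B = [[2, 0]]     # single row giving 2 Y1

var_AY = matmul(matmul(A, V), transpose(A))      # AVA'
cov_AY_BY = matmul(matmul(A, V), transpose(B))   # AVB'

print(var_AY)      # [[15, -5], [-5, 11]]
print(cov_AY_BY)   # [[10], [6]]
```

By hand, Var(Y1 + Y2) = 4 + 9 + 2(1) = 15 and Var(Y1 - Y2) = 4 + 9 - 2(1) = 11, which match the diagonal of the first result.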

Continuous Distributions

Consider measuring the amount of milk given by a dairy cow at a particular milking. Even if a machine of perfect accuracy were used, the amount of milk would be a unique point on a continuum of possible values, such as 32.35769842... kg of milk. It is mathematically impossible to assign a nonzero probability to each of the infinitely many points in the continuum. Thus, a different method of describing the probability distribution of a continuous random variable must be used. The total probability over the continuum is still 1. The cumulative distribution function of a random variable is

\begin{displaymath}F(y) = P(Y \leq y), \end{displaymath}

for $ - \infty < y < \infty$. As y approaches $- \infty$, F(y) approaches 0, and as y approaches $\infty$, F(y) approaches 1. Also, F(y) is a nondecreasing function of y: if a < b, then $F(a) \leq F(b)$.

If F(y) is the cumulative distribution function of Y, then the probability density function of Y is given by

\begin{displaymath}f(y) = \frac{d\,F(y)}{dy} = F^{\prime} (y), \end{displaymath}

wherever the derivative exists. For any probability density function f(y), $f(y) \geq 0$ and

\begin{displaymath}\int_{- \infty}^{\infty} f(y)\,dy = 1. \end{displaymath}

The cumulative distribution function is recovered from the density by

\begin{displaymath}F(y) = \int_{- \infty}^{y} f(t)\,dt. \end{displaymath}

The expected value of a continuous random variable Y is

\begin{displaymath}E(Y) = \int_{- \infty}^{\infty} y\,f(y)\,dy \end{displaymath}

provided that the integral exists. If g(Y) is a function of Y, then

\begin{displaymath}E(g(Y)) = \int_{- \infty}^{\infty} g(y)\,f(y)\,dy \end{displaymath}

provided that the integral exists. Finally,

\begin{displaymath}Var(Y) = E(Y^{2}) - [E(Y)]^{2}. \end{displaymath}

The Uniform Distribution

The basis for the majority of random number generators is a uniform distribution. A random variable Y has a continuous uniform probability distribution on the interval $(\theta_{1},
\theta_{2})$ if and only if the density function of Y is

\begin{displaymath}f(y) = \left\{ \begin{array}{ll}
\frac{1}{\theta_{2} - \theta_{1}}, & \theta_{1} \leq y \leq \theta_{2} \\
0, & \mbox{elsewhere.}
\end{array} \right. \end{displaymath}

The parameters of the density function are $\theta_{1}$ and $\theta_{2}$. Also,

\begin{eqnarray*}E(Y) & = & ( \theta_{1} + \theta_{2})/2 \\
Var(Y) & = & ( \theta_{2} - \theta_{1})^{2} / 12.
\end{eqnarray*}
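
The mean and variance of the uniform distribution can be checked by applying the integral definitions of E(Y) and E(g(Y)) numerically. The sketch below uses a simple midpoint rule; the interval (2, 5) and the helper `integrate` are assumptions made for illustration.

```python
def uniform_pdf(y, t1, t2):
    """Density 1 / (t2 - t1) on [t1, t2] and 0 elsewhere."""
    return 1.0 / (t2 - t1) if t1 <= y <= t2 else 0.0

def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

t1, t2 = 2.0, 5.0   # illustrative interval

area = integrate(lambda y: uniform_pdf(y, t1, t2), t1, t2)
mean = integrate(lambda y: y * uniform_pdf(y, t1, t2), t1, t2)
second_moment = integrate(lambda y: y * y * uniform_pdf(y, t1, t2), t1, t2)
variance = second_moment - mean ** 2   # E(Y^2) - [E(Y)]^2

print(area, mean, (t1 + t2) / 2)
print(variance, (t2 - t1) ** 2 / 12)
```

For this interval the closed forms give a mean of 3.5 and a variance of 0.75, and the numerical integrals agree to within the accuracy of the midpoint rule.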

The Normal Distribution

A random variable Y has a normal probability distribution if and only if

\begin{displaymath}f(y) = ( \sigma (2\pi)^{.5})^{-1} \exp(-.5(y-\mu)^{2}
\sigma^{-2} ) \end{displaymath}

for $-\infty < y < +\infty$, where $\sigma^{2}$ is the variance of Y and $\mu$ is the expected value of Y.
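
The density can be coded directly from the formula and compared with the implementation in Python's standard library (`statistics.NormalDist`). The values $\mu = 30$ and $\sigma = 4$ are illustrative assumptions, and the integration range of ten standard deviations either side of the mean is an arbitrary but generous choice.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(y, mu, sigma):
    """f(y) = (sigma (2 pi)^.5)^-1 exp(-.5 (y - mu)^2 / sigma^2)."""
    return 1.0 / (sigma * sqrt(2 * pi)) * exp(-0.5 * (y - mu) ** 2 / sigma ** 2)

def integrate(g, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

mu, sigma = 30.0, 4.0                      # illustrative parameters
lo, hi = mu - 10 * sigma, mu + 10 * sigma  # tails beyond this are negligible

# Largest difference from the standard-library density at a few points.
max_diff = max(abs(normal_pdf(y, mu, sigma) - NormalDist(mu, sigma).pdf(y))
               for y in (20.0, 30.0, 41.5))

area = integrate(lambda y: normal_pdf(y, mu, sigma), lo, hi)
mean = integrate(lambda y: y * normal_pdf(y, mu, sigma), lo, hi)
variance = integrate(lambda y: (y - mean) ** 2 * normal_pdf(y, mu, sigma), lo, hi)

print(max_diff, area, mean, variance)
```

The area is close to 1, and the numerically integrated mean and variance are close to $\mu$ and $\sigma^{2}$.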

For the random vector, ${\bf Y}$, the multivariate normal density function is

\begin{displaymath}f({\bf y}) \mbox{ = } (2\pi)^{-.5n} \mid {\bf V} \mid^{-.5}
\exp(-.5({\bf y-\mu})'{\bf V}^{-1}({\bf y-\mu})) \end{displaymath}

denoted as ${\bf y} \sim N({\bf\mu,V})$ where ${\bf V}$ is the variance-covariance matrix of ${\bf Y}$. Note that the determinant of ${\bf V}$ must be positive, otherwise the density function is undefined.
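
For n = 2 the multivariate density can be written out without any matrix library, with the determinant of ${\bf V}$ raised to the power -.5. The sketch below is limited to the 2 x 2 case as a simplifying assumption; when ${\bf V}$ is diagonal the result should equal the product of two univariate normal densities, which gives a simple check.

```python
from math import exp, pi, sqrt

def normal_pdf(y, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return exp(-0.5 * (y - mu) ** 2 / var) / sqrt(2 * pi * var)

def bivariate_normal_pdf(y, mu, V):
    """Multivariate normal density for n = 2 with a 2 x 2 matrix V."""
    (a, b), (c, d) = V
    det = a * d - b * c
    if det <= 0:
        raise ValueError("determinant of V must be positive")
    # Inverse of a 2 x 2 matrix written out explicitly.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dev = [y[0] - mu[0], y[1] - mu[1]]
    quad = sum(dev[i] * inv[i][j] * dev[j] for i in range(2) for j in range(2))
    n = 2
    return (2 * pi) ** (-0.5 * n) * det ** -0.5 * exp(-0.5 * quad)

# Hypothetical values: uncorrelated variables with variances 4 and 9.
mu = [1.0, 2.0]
V_diag = [[4.0, 0.0], [0.0, 9.0]]
y = [0.5, 3.0]

joint = bivariate_normal_pdf(y, mu, V_diag)
product = normal_pdf(y[0], mu[0], 4.0) * normal_pdf(y[1], mu[1], 9.0)
print(joint, product)   # the two values agree up to rounding
```

A singular matrix such as [[1, 1], [1, 1]] has determinant zero, so the function raises an error rather than returning a density.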

Chi-Square Distribution.

The chi-square distribution is used in hypothesis testing, for example in contingency tables. It is also used as a prior distribution for variances in Bayesian analyses. Chi-square variables are components of the following two distributions.

The t-distribution.

The t-distribution is based on the ratio of two independent random variables. The first is from a univariate normal distribution, and the second is from a central chi-square distribution. Let \( x \sim N(0,1) \) and \( u \sim \chi_{n}^{2} \) with x and u being independent, then

\begin{displaymath}\frac{x}{(u/n)^{.5}} \sim t_{n}. \end{displaymath}

The mean of a t-distribution is the mean of the x variable, which is 0 (for n > 1), and the variance is n/(n-2) (for n > 2), where n is the degrees of freedom of the distribution.
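
The construction of a t variable can be imitated by simulation. In the sketch below the chi-square variable is generated as a sum of n squared independent N(0,1) draws, which is a standard fact not stated in these notes; the seed, the sample size, and n = 10 are arbitrary assumptions.

```python
import random

def chi_square(df, rng):
    """Central chi-square draw with df degrees of freedom,
    built as a sum of df squared independent N(0,1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def t_draw(n, rng):
    """x / (u/n)^.5 with x ~ N(0,1) and u ~ chi-square(n), independent."""
    x = rng.gauss(0.0, 1.0)
    u = chi_square(n, rng)
    return x / (u / n) ** 0.5

rng = random.Random(1999)   # fixed seed so the run is repeatable
n = 10
draws = [t_draw(n, rng) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print(sample_mean, sample_var, n / (n - 2))
```

With this many draws the sample mean should be near 0 and the sample variance near n/(n-2) = 1.25, subject to sampling error.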

The F-distribution.

The central F-distribution is based on the ratio of two independent central chi-square variables. Let $u \sim \chi_{n}^{2}$ and $w \sim \chi_{m}^{2}$ with u and w being independent, then

\begin{displaymath}\frac{(u/n)}{(w/m)} \sim F_{n,m}. \end{displaymath}

The mean of the F-distribution is m/(m-2) for m > 2, and the variance, for m > 4, is

\begin{displaymath}\frac{2m^{2}(n+m-2)}{n(m-2)^{2}(m-4)}. \end{displaymath}
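
A simulation along the same lines illustrates the mean and variance of a central F variable. As before, each chi-square variable is generated as a sum of squared N(0,1) draws (a standard fact, not taken from these notes), and n = 4, m = 12, the seed, and the sample size are arbitrary assumptions.

```python
import random

def chi_square(df, rng):
    """Central chi-square draw as a sum of df squared N(0,1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def f_draw(n, m, rng):
    """(u/n) / (w/m) with independent u ~ chi-square(n), w ~ chi-square(m)."""
    u = chi_square(n, rng)
    w = chi_square(m, rng)
    return (u / n) / (w / m)

rng = random.Random(42)   # fixed seed so the run is repeatable
n, m = 4, 12
draws = [f_draw(n, m, rng) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

theory_mean = m / (m - 2)
theory_var = 2 * m ** 2 * (n + m - 2) / (n * (m - 2) ** 2 * (m - 4))

print(sample_mean, theory_mean)
print(sample_var, theory_var)
```

For these degrees of freedom the formulas give a mean of 1.2 and a variance of 1.26. The sample variance converges slowly because the F-distribution has a heavy right tail.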

Tables of F-values have been constructed for various probability levels as criteria to test if the numerator chi-square variable has a noncentral chi-square distribution. If the calculated F-value is greater than the value in the tables, then u is implied to have a noncentral chi-square distribution, otherwise we assume that u has a central chi-square distribution.

The square of a t-distribution variable gives a variable that has an F-distribution with 1 and n degrees of freedom.

Noncentral F-distributions exist depending on whether the numerator or denominator variables have noncentral chi-square distributions. Tables for noncentral F-distributions generally do not exist because of the difficulty in predicting the noncentrality parameters. However, using random chi-square generators it is possible to numerically calculate an expected noncentral F value for specific situations. When both the numerator and denominator chi-square variables are from noncentral distributions, then their ratio follows a doubly noncentral F-distribution.

Quadratic and Bilinear Forms

Quadratic Forms

A quadratic form is a weighted sum of squares and cross-products of the elements of a vector. The general form is ${\bf y'Qy}$, where ${\bf y}$ is a vector of random variables, and ${\bf Q}$ is a regulator matrix. The regulator matrix can take on various forms and values depending on the situation. Usually ${\bf Q}$ is a symmetric matrix. Examples of different ${\bf Q}$ matrices are as follows:

\({\bf Q = I}\), then \({\bf y'Qy = y'y} \) which is a total sum of squares of the elements in ${\bf y}$.
\({\bf Q = J}(1/n)\), then \({\bf y'Qy = y'Jy}(1/n)\) where n is the length of ${\bf y}$. Note that ${\bf J=11'}$, so that \({\bf y'Jy = (y'1)(1'y)}\) and ${\bf (1'y)}$ is the sum of the elements in ${\bf y}$.
\({\bf Q} = \left({\bf I - J}(1/n) \right)/(n-1) \), then ${\bf y'Qy}$ gives the variance of the elements in ${\bf y}$.
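
The three regulator matrices can be tried on a small vector. The data vector below is made up for illustration, and `quad_form` is a helper written for this sketch; the last quadratic form is compared with `statistics.variance` from the Python standard library.

```python
from fractions import Fraction
from statistics import variance

def quad_form(y, Q):
    """y'Qy for a vector y stored as a flat list."""
    return sum(y[i] * Q[i][j] * y[j]
               for i in range(len(y)) for j in range(len(y)))

y = [2, 4, 4, 6]   # illustrative observations
n = len(y)

I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
J = [[Fraction(1) for _ in range(n)] for _ in range(n)]

Q_total = I                                                    # Q = I
Q_mean = [[J[i][j] / n for j in range(n)] for i in range(n)]   # Q = J(1/n)
Q_var = [[(I[i][j] - J[i][j] / n) / (n - 1) for j in range(n)]
         for i in range(n)]                                    # (I - J(1/n))/(n-1)

total_ss = quad_form(y, Q_total)   # sum of squares: 72
mean_ss = quad_form(y, Q_mean)     # (sum of y)^2 / n: 64
sample_var = quad_form(y, Q_var)   # (72 - 64) / 3 = 8/3

print(total_ss, mean_ss, sample_var, variance(y))
```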

The expected value of a quadratic form is

\begin{displaymath}E({\bf y'Qy}) = E(tr({\bf y'Qy})) = E(tr({\bf Qyy'})) = tr({\bf Q}
E({\bf yy'})). \end{displaymath}


Recall that

\begin{displaymath}Var({\bf y}) = E({\bf yy'})-E({\bf y})E({\bf y'}) \end{displaymath}

so that

\begin{displaymath}E({\bf yy'}) = Var({\bf y})+E({\bf y})E({\bf y'}), \end{displaymath}

and therefore

\begin{displaymath}E({\bf y'Qy}) = tr({\bf Q}(Var({\bf y})+E({\bf y})E({\bf y'}))).\end{displaymath}

If we let \(Var({\bf y}) = {\bf V} \mbox{ and } E({\bf y}) = {\bf\mu} \), then

\begin{eqnarray*}E({\bf y'Qy}) & = & tr({\bf Q}({\bf V + \mu \mu'})) \\
& = & tr({\bf QV}) + tr({\bf Q\mu \mu'}) \\
& = & tr({\bf QV}) + {\bf\mu' Q \mu}.
\end{eqnarray*}
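
As a worked illustration, take n independent observations with a common mean and common variance, so that ${\bf V} = {\bf I}\sigma^{2}$ and ${\bf \mu} = {\bf 1}\mu$. These values, and the helper functions, are assumptions made for the sketch. With the regulator matrix of the sample variance, the expectation reduces to $\sigma^{2}$; with ${\bf Q = I}$ it is $n\sigma^{2} + n\mu^{2}$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def quad_form(v, Q):
    """v'Qv for a vector v stored as a flat list."""
    return sum(v[i] * Q[i][j] * v[j]
               for i in range(len(v)) for j in range(len(v)))

n, sigma2, mean = 4, 9, 5   # hypothetical values
V = [[sigma2 if i == j else 0 for j in range(n)] for i in range(n)]
mu = [mean] * n

I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
# Regulator matrix of the sample variance: (I - J/n) / (n - 1).
Q_var = [[(I[i][j] - Fraction(1, n)) / (n - 1) for j in range(n)]
         for i in range(n)]

# E(y'Qy) = tr(QV) + mu'Q mu
expected_sample_var = trace(matmul(Q_var, V)) + quad_form(mu, Q_var)
expected_total_ss = trace(matmul(I, V)) + quad_form(mu, I)

print(expected_sample_var)   # 9, i.e. sigma^2
print(expected_total_ss)     # 136 = 4(9) + 4(25)
```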

The expectation of a quadratic form does not depend on the distribution of ${\bf y}$. However, the usual expression for the variance of a quadratic form assumes that ${\bf y}$ follows a multivariate normal distribution. Without showing the derivation, the variance of a quadratic form with symmetric ${\bf Q}$, assuming ${\bf y}$ has a multivariate normal distribution, is

\begin{displaymath}Var({\bf y'Qy}) = 2 tr({\bf QVQV}) + 4 {\bf\mu'QVQ\mu}. \end{displaymath}

The quadratic form, ${\bf y'Qy}$, has a chi-square distribution if

\begin{displaymath}tr({\bf QVQV}) = tr({\bf QV}), \mbox{ and }
{\bf\mu'QVQ\mu} = {\bf\mu'Q\mu}, \end{displaymath}

or the single condition that ${\bf QV}$ is idempotent. Then if

\begin{displaymath}m = tr({\bf QV}) \mbox{ and } \lambda = .5{\bf\mu'Q\mu}, \end{displaymath}

the expected value of ${\bf y'Qy}$ is $m+2\lambda$ and the variance is $2m+8\lambda$, which are the usual results for a noncentral chi-square variable.
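
The idempotency condition and the resulting moments can be verified for a specific case. The sketch assumes ${\bf V} = {\bf I}\sigma^{2}$ with $\sigma^{2} = 2$, ${\bf Q} = ({\bf I - J}(1/n))/\sigma^{2}$, and a made-up mean vector; the helper functions are also assumptions for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def quad_form(v, Q):
    """v'Qv for a vector v stored as a flat list."""
    return sum(v[i] * Q[i][j] * v[j]
               for i in range(len(v)) for j in range(len(v)))

n, sigma2 = 4, 2
mu = [1, 2, 3, 4]   # hypothetical mean vector
V = [[Fraction(sigma2 if i == j else 0) for j in range(n)] for i in range(n)]
Q = [[(Fraction(int(i == j)) - Fraction(1, n)) / sigma2 for j in range(n)]
     for i in range(n)]

QV = matmul(Q, V)
is_idempotent = matmul(QV, QV) == QV      # single condition for chi-square

m = trace(QV)                              # degrees of freedom
lam = Fraction(1, 2) * quad_form(mu, Q)    # noncentrality parameter

# General formulas for the mean and variance of y'Qy under normality.
mean_general = trace(QV) + quad_form(mu, Q)
var_general = 2 * trace(matmul(QV, QV)) + 4 * quad_form(mu, matmul(QV, Q))

print(is_idempotent, m, lam)               # True 3 5/4
print(mean_general, m + 2 * lam)           # both 11/2
print(var_general, 2 * m + 8 * lam)        # both 16
```

Because ${\bf QV}$ is idempotent here, the general expressions collapse to $m+2\lambda$ and $2m+8\lambda$.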

The covariance between two quadratic forms, say ${\bf y'Qy}$ and ${\bf y'Py}$, is

\begin{displaymath}Cov({\bf y'Qy,y'Py}) = 2tr({\bf QVPV})+4{\bf\mu'QVP\mu}. \end{displaymath}

The covariance is zero if ${\bf QVP=0}$, in which case the two quadratic forms are said to be independent.

Bilinear Forms

A bilinear form is represented as ${\bf x'By}$, where ${\bf x}$ and ${\bf y}$ are two different random vectors, possibly of different lengths, and ${\bf B}$ is the regulator matrix. If \( E({\bf x}) =
{\bf\alpha} \) and \(E({\bf y}) = {\bf\mu} \) with \( Cov({\bf x,y}) = {\bf C} \), then

\begin{displaymath}E({\bf x'By}) = tr({\bf BC'})+{\bf\alpha'B\mu}, \end{displaymath}

and if \({\bf V}=Var({\bf y}) \mbox{ and } {\bf T}=Var({\bf x}) \), then

\begin{eqnarray*}Var({\bf x'By}) & = & tr({\bf BC'BC'})+tr({\bf BVB'T})+
{\bf\alpha'BVB'\alpha} \\
& & +{\bf\mu'B'TB\mu}+2{\bf\alpha'BC'B\mu}.
\end{eqnarray*}

Usually, the lengths of ${\bf x}$ and ${\bf y}$ are equal and ${\bf B}$ is symmetric. If ${\bf B}$ is also positive definite, then ${\bf x'By}$ is a sum of cross-products. Bilinear forms occur in the estimation of covariances. Note that a bilinear form may be written as a quadratic form as follows: Let

\begin{displaymath}{\bf z'} = \left( \begin{array}{cc}
{\bf x'} & {\bf y'} \end{array} \right), \end{displaymath}

then

\begin{displaymath}{\bf x'By} = \left( \begin{array}{cc}
{\bf x'} & {\bf y'} \end{array} \right)
\left( \begin{array}{cc} {\bf 0} & .5{\bf B} \\ .5{\bf B'} & {\bf 0} \end{array} \right)
\left( \begin{array}{c} {\bf x} \\ {\bf y} \end{array} \right), \end{displaymath}

and

\begin{displaymath}Var({\bf z}) = \left( \begin{array}{cc}
{\bf T} & {\bf C} \\ {\bf C'} & {\bf V} \end{array} \right). \end{displaymath}
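
The equivalence between a bilinear form and a quadratic form in the stacked vector ${\bf z}$ can be checked numerically. The sketch below assumes one symmetric choice for the regulator matrix of the quadratic form, with zero diagonal blocks and off-diagonal blocks $.5{\bf B}$ and $.5{\bf B'}$; the vectors and the matrix ${\bf B}$ are made-up values.

```python
def bilinear_form(x, B, y):
    """x'By for vectors stored as flat lists and B as a list of rows."""
    return sum(x[i] * B[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def quad_form(z, M):
    """z'Mz for a vector z stored as a flat list."""
    return sum(z[i] * M[i][j] * z[j]
               for i in range(len(z)) for j in range(len(z)))

# Hypothetical vectors of different lengths and a 2 x 3 regulator matrix.
x = [1, 2]
y = [3, 4, 5]
B = [[1, 0, 2],
     [-1, 3, 1]]

p, q = len(x), len(y)
z = x + y   # stacked vector (x' y')'

# Symmetric (p+q) x (p+q) matrix with blocks [0, B/2; B'/2, 0].
M = [[0.0] * (p + q) for _ in range(p + q)]
for i in range(p):
    for j in range(q):
        M[i][p + j] = B[i][j] / 2   # upper-right block: .5 B
        M[p + j][i] = B[i][j] / 2   # lower-left block: .5 B'

direct = bilinear_form(x, B, y)
as_quadratic = quad_form(z, M)
print(direct, as_quadratic)   # 41 41.0
```

Writing the bilinear form this way means the results for quadratic forms, such as the expectation $tr({\bf QV}) + {\bf\mu'Q\mu}$, can be applied with ${\bf z}$ and its variance-covariance matrix in place of ${\bf y}$ and ${\bf V}$.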


Larry Schaeffer