# Maximum Likelihood Estimation (MLE)

## MLE for twin data

How does all of this apply to twins and the kind of complex, quantitative traits that we wish to study?

The fundamental principles of maximum likelihood still apply in exactly the same manner as for the coin-tossing experiment. What changes are the data we measure and the form of the probability model that describes those data.

In the case of coin tossing, we observed two items of data: n, the total number of tosses, and h, the number of heads. For twins, in the most basic case, we would collect three pieces of information for each twin pair:
• a trait measure for twin 1
• a trait measure for twin 2
• whether they are identical or not (MZ vs DZ)
For the coin tossing, we used a binomial distribution to model the data. For quantitative traits, we would typically assume that our observations come from a normally distributed population (the familiar bell-shaped curve). As the unit of analysis is a twin pair (i.e. two variables rather than one), we need to use the bivariate form of the normal distribution, which specifically describes the distribution of pairs of scores.

Finally, in the coin-tossing experiment we had one parameter in our model, representing the probability of obtaining a head. In the case of twins, we would generally have three parameters (four if we include a means model, see below):
• a : proportion of variance attributable to additive genetic variation
• c : proportion of variance attributable to shared environmental variation
• e : proportion of variance attributable to nonshared environmental variation
• m : trait mean
Traditionally, we would say that the binomial distribution takes two parameters: n, the total number of trials, and p, the probability of success. A random variable X that has a binomial distribution is written:
```
X ~ B(n, p)
```
and we are interested in P(X=x), that is, the probability that the random variable X takes the specific value x. In our coin-tossing example, h, the observed number of heads, is equivalent to x. Recall that

$$P(X = h) = \binom{n}{h}\, p^{h} (1 - p)^{n - h}$$

Similarly, the normal distribution has two parameters. These parameters are in terms of the mean and variance of the distribution rather than probabilities of success and numbers of trials.

We shall call these μ, the trait mean, and σ, the trait standard deviation (so that the variance is σ²). A random variable X that has a normal distribution is written:
```
X ~ N(μ, σ²)
```
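The standard formula for the bivariate normal distribution, which plays the role of P(X=x) as a probability density, is

$$f(\mathbf{x}) = (2\pi)^{-p/2}\, |\Sigma|^{-1/2} \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \mu)^{\top} \Sigma^{-1} (\mathbf{x} - \mu) \right)$$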

Exactly what the component terms of this formula represent is not important; in any case, it is beyond the scope of this tutorial. The important point to note is that the normal probability function is determined by only two parameters (although these parameters are actually matrices):
• μ : a vector of means (two means in the bivariate case)
• Σ : the covariance matrix (a two-by-two matrix in the bivariate case)
(Each pair's trait scores are in the vector x and p represents the number of variables, i.e. 2 in the bivariate case.)
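To make the density described above concrete, here is a minimal Python sketch for the bivariate case (p = 2). The function name `bivariate_normal_logpdf` and the use of plain nested lists for the two-by-two covariance matrix are assumptions made for illustration. It returns the log of the density, since log-likelihoods are what get summed in model fitting.

```python
import math

def bivariate_normal_logpdf(x, mu, sigma):
    """Log-density of a bivariate normal at the pair of scores x.

    x, mu: sequences of length 2; sigma: 2x2 covariance matrix (nested lists).
    """
    p = 2  # number of variables in the bivariate case
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    if s11 <= 0.0 or det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)' Sigma^{-1} (x - mu), via the closed-form 2x2 inverse.
    quad = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return -0.5 * (p * math.log(2.0 * math.pi) + math.log(det) + quad)

# One twin pair with scores (0.43, 1.32), zero means, unit variances
# and a covariance of 0.7 (illustrative values only).
log_density = bivariate_normal_logpdf([0.43, 1.32], [0.0, 0.0],
                                      [[1.0, 0.7], [0.7, 1.0]])
print(log_density, math.exp(log_density))
```

Only the mean vector and the covariance matrix enter the calculation, which is the point made in the text: everything else is fixed by the form of the normal distribution.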

But we said that the coin tossing model only had one parameter, and that the model we fit to twin data would have 3 or 4 parameters? This is the distinction between parameters of a probability distribution and model parameters. Most model fitting involves some kind of re-parameterisation, but there is a direct correspondence between the two types of parameters.

The following table gives the relationships:
```
Binomial Probability Model            Coin Tossing Model

N (number of trials)          ---->   N (observed data)
P (probability of success)    ---->   P (estimated parameter)

Normal Probability Model              Twin Design Model

μ (mean vector)               ---->   m (estimated or fixed parameter)
Σ (covariance matrix)         ---->   a, c, e (estimated or fixed parameters)
```
In the case of the coin tossing experiment, there was a one-to-one correspondence between the parameters of the binomial probability function and the underlying model. That is, p the probability of 'success' in the binomial model is very directly equivalent to p the probability of getting heads in our model.

In the case of fitting a normal distribution to twin data, parameters can refer either to the direct parameters of the normal distribution (the mean vector and covariance matrix) or to the parameters of the underlying genetic model (the proportion of trait variation attributable to additive genetic variation, etc.).

Model-fitting for twin data proceeds by specifying the mean vector and covariance matrix of the normal distribution in terms of the genetic parameters of interest. As we shall see in the next section, this is done according to basic biometrical assumptions and allows us to estimate quantities of interest, provided we have collected suitably informative data.

## Now we are ready to fit models to twin data

As mentioned elsewhere in this course, twin analysis essentially models the covariation between identical and non-identical twins. The comparison of an MZ twin correlation with a DZ twin correlation allows us to estimate the effects of additive genetic influences, shared environmental influences and nonshared environmental influences.

Specifically, we are re-parameterising the twin covariance structure in terms of the parameters a, c and e (as mentioned above). The covariance matrix for a sample of twin pairs contains three unique values:
• the variance of twin 1
• the variance of twin 2
• the covariance between twin 1 and twin 2
According to biometrical theory, the trait variance can be decomposed into independent components of variance, and the trait covariance, conditional on twin zygosity, can also be expressed in terms of these components of variance.

• Trait variance = a + c + e
• MZ covariance = a + c
• DZ covariance = 0.5a + c
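A minimal Python sketch of these three expectations, together with their algebraic inversion, may make the bookkeeping concrete. The function names `expected_moments` and `solve_ace` are hypothetical. Solving the equations directly is only a rough check on the algebra, not the maximum likelihood fit described in this tutorial.

```python
def expected_moments(a, c, e, zygosity):
    """Return (variance, covariance) implied by the variance components.

    zygosity is "MZ" or "DZ"; a, c and e are components of variance.
    """
    if zygosity not in ("MZ", "DZ"):
        raise ValueError("zygosity must be 'MZ' or 'DZ'")
    genetic_sharing = 1.0 if zygosity == "MZ" else 0.5
    return a + c + e, genetic_sharing * a + c

def solve_ace(variance, cov_mz, cov_dz):
    """Invert the three expectations to recover (a, c, e)."""
    a = 2.0 * (cov_mz - cov_dz)   # MZ cov - DZ cov = 0.5a
    c = 2.0 * cov_dz - cov_mz     # MZ cov - a
    e = variance - cov_mz         # variance - (a + c)
    return a, c, e

# Round trip with illustrative values a = 0.5, c = 0.2, e = 0.3.
var_mz, cov_mz = expected_moments(0.5, 0.2, 0.3, "MZ")
var_dz, cov_dz = expected_moments(0.5, 0.2, 0.3, "DZ")
recovered = solve_ace(var_mz, cov_mz, cov_dz)
print(recovered)  # recovers (0.5, 0.2, 0.3) up to floating-point rounding
```

The round trip shows why both zygosity groups are needed: with three unknowns, the variance and a single covariance would leave a and c confounded.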
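We can therefore write the trait covariance matrices for MZ and DZ twins in terms of these three components of variance. For MZ twins

$$\Sigma_{MZ} = \begin{pmatrix} a + c + e & a + c \\ a + c & a + c + e \end{pmatrix}$$

whilst for DZ twins

$$\Sigma_{DZ} = \begin{pmatrix} a + c + e & 0.5a + c \\ 0.5a + c & a + c + e \end{pmatrix}$$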

### The Means Model

Because the twin design is primarily an analysis of individual differences, we are typically only interested in the components of variance, that is, in modelling the twin covariance structure. The normal distribution nevertheless requires a means model. We could either let all four means (i.e. twin 1 and twin 2, for MZ and for DZ twins) be estimated independently, or we could constrain all four means to be estimated at the same value. The latter option would be the typical choice: provided the means are not significantly different from each other, this gives a more powerful test because fewer parameters are being estimated. (Note: if the means are different in a standard twin design, this may well be indicative of some problem in ascertainment or data management.)

### Raw data versus Summary Statistics

We can either formulate models in terms of the raw unit of observation or it may be possible to model certain summary statistics instead. In the coin tossing example, the summary statistics were the total number of tosses and the number of heads. These two summary statistics contained all of the information relevant to the problem - that is, given these summary statistics it was not important that we knew the actual sequence of heads and tails.

In a similar way, the mean vector and covariance matrix are said to be sufficient summary statistics in the sense that, under the assumption of normality, we gain nothing by analysing the raw data (i.e. all actual scores for each twin pair) if we know what the mean vectors and covariance matrices are for all MZ and all DZ pairs.
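The sufficiency claim can be checked numerically. The sketch below, in plain Python, computes the bivariate normal log-likelihood of a small simulated sample twice: once from the raw pairs and once from only the sample size, mean vector and covariance matrix. The function names and the simulated data are hypothetical, and the sample covariance matrix is assumed to use the divisor n (rather than n - 1) so that the two calculations agree exactly.

```python
import math
import random

def raw_loglik(pairs, mu, sigma):
    """Sum of bivariate normal log-densities over every pair of raw scores."""
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 * s12
    total = 0.0
    for y1, y2 in pairs:
        d1, d2 = y1 - mu[0], y2 - mu[1]
        quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
        total += -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
    return total

def summary_loglik(n, mean, cov, mu, sigma):
    """Same log-likelihood from n, the sample mean vector and the sample
    covariance matrix (divisor n)."""
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 * s12
    # trace(Sigma^{-1} S), with S the sample covariance matrix
    trace = (s22 * cov[0][0] - 2.0 * s12 * cov[0][1] + s11 * cov[1][1]) / det
    d1, d2 = mean[0] - mu[0], mean[1] - mu[1]
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return -0.5 * n * (2.0 * math.log(2.0 * math.pi) + math.log(det) + trace + quad)

# Simulated pairs that share a common component (illustrative data only).
rng = random.Random(0)
pairs = []
for _ in range(200):
    shared = rng.gauss(0.0, 1.0)
    pairs.append((shared + rng.gauss(0.0, 0.7), shared + rng.gauss(0.0, 0.7)))

n = len(pairs)
mean = [sum(pair[k] for pair in pairs) / n for k in range(2)]
cov = [[sum((pair[i] - mean[i]) * (pair[j] - mean[j]) for pair in pairs) / n
        for j in range(2)] for i in range(2)]

# Any candidate parameter values will do; these are arbitrary.
mu, sigma = [0.1, 0.1], [[1.5, 1.0], [1.0, 1.5]]
ll_raw = raw_loglik(pairs, mu, sigma)
ll_summary = summary_loglik(n, mean, cov, mu, sigma)
print(ll_raw, ll_summary)
```

The two values agree up to floating-point rounding for any choice of `mu` and `sigma`, so an optimiser working from the summary statistics sees the same likelihood surface as one working from the raw scores.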

Indeed, it is common practice to ignore the means model and only analyse the covariance matrices for twins. Model-fitting to summary statistics instead of raw data has a slightly more complicated form, but one that essentially allows computational shortcuts. These shortcuts were more or less essential in the 1960s and 1970s when MLE techniques were first being implemented. Nowadays, the analysis of raw data poses no computational problem.

From the point of view of using model-fitting software such as Mx, it makes little or no difference whether the model is fitted to raw data or to summary statistics. The main difference is simply in how the information is entered into the program:
```
Raw Data

Input                         Output (estimated parameters)

Twin1   Twin2   Zyg           a
-0.23   -0.41   1             c
 0.43    1.32   1             e
-0.47    0.76   2             m
 1.23    0.65   2
-1.62   -0.44   1
 ...     ...    .

Covariance Matrices

Input                         Output (estimated parameters)

MZ   1.32                     a
     0.87   1.28              c
                              e
DZ   1.29
     0.54   1.35
```
However, analysing raw data does have certain advantages:
• outliers can be easily detected
• covariates can be easily incorporated
• missing values can be dealt with efficiently
• more complex gene-by-environment interaction models can be implemented easily
For basic twin ACE models, fitting to covariance matrices will be sufficient.

## MLE for twin data

For the purpose of understanding MLE in the context of analysing twin data, it is more transparent to think in terms of the analysis of raw data. Model-fitting proceeds in the standard way:
1. select starting values for the parameters (a, c, e, m)
2. evaluate the log-likelihood for each twin pair using the bivariate normal probability distribution and the zygosity-specific model of the twin covariance
3. sum the log-likelihoods over all twin pairs in the sample
4. optimise (maximise) the sample log-likelihood with respect to the model parameters
5. the output is then the estimated values of the model parameters and the maximised log-likelihood
6. the likelihood ratio test can be used to compare the full model, which estimates all the parameters, with submodels that constrain one or more of the parameters to be zero
7. select the most parsimonious model that explains the data
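The steps above can be sketched end to end in plain Python. Everything here is illustrative: the data are simulated, the zygosity coding (1 = MZ, 2 = DZ) is an assumption, and a deliberately crude compass search stands in for the optimiser that dedicated software would use. The comparison treats the likelihood ratio statistic as chi-square with 1 degree of freedom, which is the usual textbook approximation.

```python
import math
import random

def pair_loglik(y1, y2, mean, var, cov):
    """Bivariate normal log-density for one pair (equal means and variances)."""
    det = var * var - cov * cov
    d1, d2 = y1 - mean, y2 - mean
    quad = (var * d1 * d1 - 2.0 * cov * d1 * d2 + var * d2 * d2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def sample_loglik(params, pairs):
    """Steps 2-3: sum the log-likelihood over all pairs (zyg 1 = MZ, 2 = DZ)."""
    a, c, e, m = params
    if a < 0.0 or c < 0.0 or e <= 0.0:
        return -math.inf  # reject inadmissible variance components
    var = a + c + e
    total = 0.0
    for y1, y2, zyg in pairs:
        cov = (a if zyg == 1 else 0.5 * a) + c
        total += pair_loglik(y1, y2, m, var, cov)
    return total

def compass_search(f, start, step=0.25, tol=1e-4):
    """Step 4: crude derivative-free maximiser, for illustration only."""
    best = list(start)
    best_val = f(best)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                val = f(trial)
                if val > best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2.0  # refine the search once no neighbour is better
    return best, best_val

def simulate_pairs(n_pairs, zyg, a, c, e, m, rng):
    """Hypothetical data generator: twin pairs simulated under an ACE model."""
    share = 1.0 if zyg == 1 else 0.5  # genetic correlation between twins
    pairs = []
    for _ in range(n_pairs):
        g_common = rng.gauss(0.0, 1.0)
        env_shared = rng.gauss(0.0, 1.0)
        scores = []
        for _twin in range(2):
            g = (math.sqrt(share) * g_common
                 + math.sqrt(1.0 - share) * rng.gauss(0.0, 1.0))
            scores.append(m + math.sqrt(a) * g + math.sqrt(c) * env_shared
                          + math.sqrt(e) * rng.gauss(0.0, 1.0))
        pairs.append((scores[0], scores[1], zyg))
    return pairs

rng = random.Random(1)
pairs = (simulate_pairs(200, 1, 0.5, 0.2, 0.3, 0.0, rng)
         + simulate_pairs(200, 2, 0.5, 0.2, 0.3, 0.0, rng))

# Steps 1-5: starting values, then maximise the sample log-likelihood (ACE).
start = [0.3, 0.3, 0.3, 0.0]
ace, ll_ace = compass_search(lambda p: sample_loglik(p, pairs), start)

# Step 6: refit with c fixed at zero (AE submodel) and compare.
ae, ll_ae = compass_search(
    lambda p: sample_loglik([p[0], 0.0, p[1], p[2]], pairs), [0.3, 0.3, 0.0])
lrt = 2.0 * (ll_ace - ll_ae)
# Upper tail of a chi-square with 1 df; max() guards against tiny negative
# values caused by optimiser error.
p_value = math.erfc(math.sqrt(max(lrt, 0.0) / 2.0))

print("ACE estimates (a, c, e, m):", [round(v, 3) for v in ace])
print("AE estimates (a, e, m):", [round(v, 3) for v in ae])
print("LRT statistic:", round(lrt, 3), "p-value:", round(p_value, 4))
```

A small p-value would argue for keeping c in the model (step 7); otherwise the more parsimonious AE model would be preferred. In practice a library optimiser such as `scipy.optimize.minimize`, applied to the negative log-likelihood, could replace `compass_search`.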