Univariate Twin AnalysisS.PurcellIntroductionIn this tutorial we shall take a look at a dataset containing scores for 500 MZ twins and 500 DZ twins on a single measure. The aim of these particular analyses is to determine whether or not there is any evidence for a genetic effect on the trait. The dataset can be downloaded here. Depending on your browser, you might have to right-click and choose 'Save As...' or 'Save Target As...'. Alternatively, you might be able to left-click on the link to view the raw data in your browser window and then copy-and-paste the dataset into WordPad or some other text-editor and save it from there onto your computer. The dataset is in comma-separated value format so it can be read by almost any program, including Excel. Each row of the dataset represents information on one twin pair. It contains four variables:
In genetic analyses, we are often more concerned with the relationships in scores between related individuals. An ideal way to get a sense of the relationship between two continuously-distributed measures is to draw a scatter-plot of the two variables. We can use the following command to plot the scores of one twin against the scores of the other twin.
We can quantify this association between scores with a correlation coefficient. Type the command as shown to the right for Stata to calculate the correlation between twins for this trait. Stata displays the following correlation matrix: the correlation is 0.4241. This will undoubtably be significantly greater than zero (the command pwcorr t0_1 t0_2, o sig can be used to also give p-values for the correlations). So far, all the graphs and statistics have not differentiated between MZ and DZ twins. The aim of this analysis is to compare twin similarity between MZ and DZ twins, so we need to analyse twins separately by their zygosity. Type the command sort zyg to sort the dataset by this variable. It is now possible to stratify analysis by twin zygosity. That is, rather than giving the twin correlation for all twins, Stata will output the twin correlation for MZ twins and DZ twins separately. Looking first at whether there are any differences in means or standard deviations between MZ and DZ twins, use the command as shown to the right. The output of this command, shown left, is now broken down for MZ and DZ twins: we can see that there is no big difference between MZ and DZ twins in terms of their trait means or standard deviations. (Formally, we would test these statistically, but an 'eye-ball test' will suffice for present purposes.) Now enter the following command, shown to the right, in order to obtain the twin correlations for MZ and DZ twins separately: these quantities are fundamental to the twin analysis we are about to perform. We see that the MZ correlation (0.5231) is larger than the DZ correlation (0.3204). So the correlation we obtained earlier for the whole sample, ignoring twin zygosity, effectively represented an average between these two figures. Looking at the scatter-plots by zygosity we can see this difference in correlations: the left scatter-plot is for MZ twins, the right one for DZ twins. We can see a slightly stronger relationship for MZ twins.
Covariances and Model-fittingThese estimates of heritability are based on the comparison between correlations. As mentioned in the Appendix, it is often preferable to analyse variances and covariances instead, for a number of reasons. We can obtain the variance-covariance matrices for MZ and DZ twins from Stata by adding the cov option to the correlation command as shown to the right. As we saw earlier, there are no major differences between four estimates of variance. We can apply model-fitting methods to these variance-covariance matrices and compare the results to the simple analysis with correlations.
We can use the Model-fitting 2 module from the main BGIM site. Clicking on the WWW symbol shown below enters the module designed to perform model-fitting analysis on univariate twin data.
The results of this analysis are shown below: The figures represent components of variance. From the full ACE model we obtain a similar estimate of heritability as from the correlational analysis. The total variance is 2.7124 + 1.122 + 3.6693 = 7.4937. Therefore the proportion of variance due to additive genetic effects, or heritability, is 2.7124/7.4937 = 0.36. In other words, the model-fitting analysis estimates that 36% of the variation in the population is due to genetic effects. This can be regarded as a more accurate estimate than the 40% obtained from analysing correlations. The reason for the difference is that the model-fitting analysis is taking into account the small differences amongst the four variances. We also see that the best-fitting model is in fact the AE model: that is, the effect of shared environment is not significant. A model with only additive genetic and nonshared environmental effects is shown on the second line (again, go to the module tutorial for a full explanation of all the figures in this table). The standard estimates from the AE model are given below: the impact of genes is estimated much higher now:
Further exploration: SimulationThe datafile we used in the analysis above was not collected from real individuals measuring any particular trait: it was simulated, using the SIMULATOR module at this website.
Further exploration: Using Mx to perform the analysisTry installing Mx on your computer and run the basic ACE script to perform the same analyses. The basic ACE script can be found here. You will need to edit the script just to enter the covariance matrices, as follows:Group2: MZ twin pairs Data NInput_vars=2 NObservations=500 CMatrix 7.949 4.088 7.682and for DZ twins Group3: DZ twin pairs Data NInput_vars=2 NObservations=500 CMatrix 7.012 2.315 7.447Looking at the output, you will see the relevant parameters A, C and E representing the components of variance. These will correspond very closely to the values obtained from the model-fitting module above. In this case, MATRIX A This is a computed FULL matrix of order 1 by 1 [=X*X'] 1 1 2.7139 MATRIX C This is a computed FULL matrix of order 1 by 1 [=Y*Y'] 1 1 1.1213 MATRIX E This is a computed FULL matrix of order 1 by 1 [=Z*Z'] 1 1 3.6684The script also contains a standardisation group, so the proportions of variance are also given (T, U and V correspond to A, C and E respectively). MATRIX T This is a computed FULL matrix of order 1 by 1 [=D~*A*D~] 1 1 0.3617 MATRIX U This is a computed FULL matrix of order 1 by 1 [=D~*C*D~] 1 1 0.1494 MATRIX V This is a computed FULL matrix of order 1 by 1 [=D~*E*D~] 1 1 0.4889This gives the same heritability estimate, 36%. The fit statistics are also important: Your model has 3 estimated parameters and 6 Observed statistics Chi-squared fit of model >>>>>>> 1.984 Degrees of freedom >>>>>>>>>>>>> 3 Probability >>>>>>>>>>>>>>>>>>>> 0.576The non-significant fit (p=0.576) represents that the model accounts for the observed data well. The script can be modified to test an AE model- remove the free keyword after the Y matrix from the first group: Begin Matrices; X Lower nvar nvar free ! genetic structure Y Lower nvar nvar ! shared environment Z Lower nvar nvar free ! non-shared path co-efficients H Full 1 1 End Matrices;As can be seen this gives similar results to the model-fitting module above: MATRIX T This is a computed FULL matrix of order 1 by 1 [=D~*A*D~] 1 1 0.5252 MATRIX U This is a computed FULL matrix of order 1 by 1 [=D~*C*D~] 1 1 0.0000 MATRIX V This is a computed FULL matrix of order 1 by 1 [=D~*E*D~] 1 1 0.4748Try using Mx to test the CE model; try analysing other univariate twin covariance matrices. Site created by S.Purcell, last updated 20.05.2007 |