Introduction to Mx

S.Purcell


Univariate ACE script
Univariate ACE output
Example multivariate ACE script
Example multivariate ACE output

 

 

 

Mx model-fitting software by Mike Neale

Mx is a freely-available model-fitting program, very popular in the field of behavioural genetics. It is similar to commercial programs such as LISREL and EQS. It offers a Graphical User Interface (MxGUI), which allows path diagrams to be drawn describing the model to be fit to the data. Alternatively, scripts can be written to specify the model: this tutorial looks at the structure of Mx scripts and output.

Mx is available for free downloading, supporting various platforms from the Mx homepage.

General Structure of an Mx Script

An Mx script comprises of one of more groups. Mx is primarily structured this way because it was designed for the analysis of genetically informative data, where different types of relatives can be thought of as forming distinct groups. In an Mx script, one might find one group for MZ twins, one for DZ same-sex twins, say, and possibly one for DZ opposite-sex twins.

The groups in an Mx script do not always map onto groups of data from different types of relatives, however. Certain groups have special functions such as specifying the model, performing calculations or imposing constraints upon the analysis.

The basic structure of a group is:
  1. TITLE
  2. Indicate GROUP TYPE : data / calculation / constraint
  3. Read and select any observed data, supply labels
  4. MATRICES : declare at least one matrix
  5. Specify numbers and parameters, starting values, equality constraints
  6. MODEL : define matrix formula: covariance / means / threshold / compute
  7. Request fit functions, statistical output and optimisation options, other options
  8. END command

First script : Univariate ACE twin model

Below we will walk through a script designed fit the ACE model to twin data. The basic function is to take two covariance matrices (one for MZ twins, one for DZ twins) and to estimate the proportion of variance attributable to additive genetic effects, to shared environmental effects and nonshared environmental effects for the trait. Click here to see the whole script.
! Mx ACE script for twin data
We have used the exclamation mark character to comment-out this line, so it does not get processed by Mx. It is good practice to put as many comments or remarks in scripts as possible, as they will help you and others navigate scripts.
#define nvar 1

The #define command is a form of macro that assigns the value 1, in this case, to the variable named nvar. Where ever nvar appears in the script subsequently, it will be replaced by the value 1, as determined here. In this case, the label nvar represents the number of variables to be analysed per twin (which, for a univariate script, is 1). This command allows the user to easily make lots of changes in a script by just changing the first #define line. We will see how this is used when we consider modifying the script for multivariate analysis. This command always occurs before the first group.
Group1: Defines Matrices
Calc NGroups=4
The first line here is the title of the group: we have named it Group1: Defines Matrices - a descriptive name is generally a good idea. The second line tells Mx what type of group this is. Because the keyword Calc (or Calculation) appears on this line, Mx does not expect to read any data for this group. The NGroups command is needed to tell Mx how many groups there are in the entire script (so there are four in this script, including this group). The function of this first group is to set up the basic parameters that will be used in the modelling of twin data.

	Begin Matrices;
	        ! path coefficients
		X Lower nvar nvar free    ! additive genetic 
		Y Lower nvar nvar free    ! shared environmental 
		Z Lower nvar nvar free    ! non-shared environmental

The Begin Matrices; (note that Mx can often be very fussy about punctuation - the semi-colon is not optional) defines three matrices, labelled X, Y and Z. As we shall see, these correspond to the path coefficients for additive genetic, shared environmental and nonshared environmental twin model parameters. They correspond to these entities because of the way in which the twin covariance matrices are described in terms of them (as we will see below). There is nothing 'fixed' about them at this stage, however: here they are just any three matrices.

The Lower keyword tells Mx that these matrices are lower triangular matrices - that is, the only 'free' elements in these matrices are those below and including the diagonal. In this case, the matrices are only 1 by 1 (specified by the nvar macro) - so we can think of X, Y and Z as simply being three numbers rather than matrices.

The free keyword tells Mx that the maximum-likelihood model-fitting should attempt to estimate best-fitting values for the free elements in these matrices, rather than assuming the values to be fixed at either zero (by default) or some other user-specified value (which Mx would assume if the free keyword was missing, or replaced with the term fixed).
	
		H Full 1 1

Here we define one other matrix, H. As we shall see, this is a single number (as it is a 1 by 1 matrix, so even though the matrix is full, it is identical to a lower matrix in this case. As we shall see, H is assigned to be the constant 0.5. This value is used in the specification of the DZ covariance structure, below. In Mx, everything must be in a matrix - it is not possible, for example, to specify that a matrix is multiplied by a constant number (e.g 10*A). Always one matrix must operate on another matrix, where one or both matrices might have been predefined to represent constants (i.e. rather than estimated or fixed parameters) previously (e.g. T*A).


	End Matrices;

This command pairs up with the Begin Matrices; command, indicating that we have defined all the matrices in this group.

	Matrix H .5

Here we assign a value of 0.5 to the matrix H. Because it is a 1 by 1 matrix, it is unambiguous which element should be assigned as 0.5 (i.e. there is only one element). If H had more than one element, however, it might be necessary to specify which element(s) are being assigned to.

	Begin Algebra;

We have defined our initial matrices in the section above. Those matrices will contain the free elements that are to be estimated in the model fitting process. Mx also allows us to calculate other matrices that are algebraic functions of the other matrices. These can be used, as we shall see, to make the specification of the model easier, or to make the output for interpretable (e.g. when we standardise the output, as in the final group in this script).
	        ! covariance matrices
		A= X*X';   ! genetic 
		C= Y*Y';   ! shared environmental
		E= Z*Z';   ! nonshared environmental
Here we calculate three matrices that are essentially the squares of the original three matrices - that is, a matrix post-multiplied by its transpose. (Note that this formulation also gives the matrix-equivalent of squaring when the matrices are larger than 1 by 1.) If the first three variables represented the path coefficients, these can be thought of as the variance components (in the univariate case: they would be the components of covariance matrices in the multivariate case, as the script indicates). That is, in path diagram terms, we have traced up the path, round the variance of the latent variable (which is conceptually fixed to 1, and so need not appear in the script) and back down the path. Multiplying the values of the paths traced, gives us the path coefficient squared as the contribution to variance from that associated latent variable. Below we will write out the model for the covariance structure of MZ and DZ twins directly in terms of the variance components - that is, directly in terms of the matrices A, C and E.

	End Algebra;
This is required to let Mx know that we have finished calculating new matrices.


	Start 0   X 1 1 to X nvar nvar 	
	Start 0   Y 1 1 to Y nvar nvar
	Start 1.5 Z 1 1 to Z nvar nvar

For the original matrices, whose free elements will be estimated during model-fitting, we have to supply starting values. For univariate ACE twin models, it is usually enough to simply start the genetic and shared environmental paths at zero, and set the nonshared environmental path to the trait standard deviation. Obviously, the better the starting values, the quicker model-fitting will be. Note the use of the to keyword - if nvar were in fact different from 1, then all the elements between X 1 1 (that is, the top left element of X) and X nvar nvar (that is, this bottom right) would be initialised at the set value. This can be especially useful in the multivariate case.

End
This command signals that it is the end of this, the first group.


Group2: MZ twin pairs
Data NInput_vars=2 NObservations=1500
After the title line of the second group, we see a Data keyword in place of Calc. Mx therefore assumes that we will be supplying some data of some sort for this group to model. We need to tell Mx here about the structure of the data - the number of input variables (in the univariate case we will have one variable on both twins - so we have two input variables) and the number of observations (i.e. MZ twin pairs in this case).

     CMatrix 
	2.12
	1.67	2.08
The CMatrix command tells Mx that we are supplying the data in the form of a covariance matrix. Unless otherwise told, Mx will assume that this covariance matrix will be a lower diagonal matrix - the element above the diagonal is intentionally missing, as it is necessarily equal to the corresponding below diagonal element - i.e. 1.67.

Mx can read data in other forms - notably as raw data (in which case the keyword RE file="myfile.raw" is used, where the file is a space-delimited ASCII file with no header or non-numerical characters) or as a correlation matrix (in which case KMatrix does the job).

     Labels trait1 trait2 
This command assigns text labels to the input variables. The labels are assigned to the input variables in the order in which they are read in: here the first variable is called trait1 (note that this could have been anything, such as anxscr01) - this is the trait score of the first twin. So the second twin is labelled trait2. Note that whether or not a twin is 'first' or 'second' is solely reflecting their position in the variance- covariance matrix.

     Matrices= Group 1
We could have defined matrices in this group using the Begin Matrices; but instead we can specify that the matrices defined in a previous group will be the matrices also for this group.

     Covariances A + C + E  |    A + C  _
                   A + C    |  A + C + E /
This statement contains the core of the model. The terms after the Covariance statement represent an algebraic description of the twin covariance matrix matrix. The operators | and _ are used to adhere matrices together to make larger matrices. We can think of the matrices A, C and E, representing the variance components, as the building blocks with which we describe what we expect the MZ covariance matrix to look like.

The top left element, that is all the terms up to the first | define the trait variance for twin 1: simply the sum of all the components of variance, additive genetic, shared and nonshared environmental.

The next element (which is the top right element in the covariance matrix) represents the covariance between twin 1 and twin 2. For MZ twins, this is equal to the full complement of additive genetic and shared environmental variation, by definition. By symmetry of covariance matrices, the bottom left element is clearly the same. The bottom right element, the variance for twin 2, is also the same as the variance for twin 1. Although this need not be the case, given that we are not modelling correlation matrices, the most parsimonious model has no reason to believe that all 'twin 1s' should be fundamentally different to 'twin 2s'. The / symbol is required to tell Mx that we have finished specifying the model.

     Option RS
This command tells Mx to print out the residual covariance matrix for this group in the output - that is, the difference between the observed MZ covariance matrix and the one predicted by the model. Other optional features can be specified here, with keywords other than RS after Option.

End
The is the end of the MZ data group.


Group3: DZ twin pairs
Data NInput_vars=2 NObservations=2000
Similar to the above data group, we must read in the data for DZ twins and supply a model for DZ twins.

     CMatrix 
	1.98
	1.25	2.06
This is the observed covariance matrix for DZ twins. Note that the covariance is lower than the MZ covariance, whilst the variances are approximately equal. This is pleasing, as these are intrinsic assumptions of the ACE model, and interpreting the data would be difficult if this did not hold.

     Labels trait1 trait2 
As above.

     Matrices= Group 1
As above. This ensures that the components of variance estimated for MZ twins are the same as those for DZ twins. This is clearly what we want - the aetiology of a trait should be the same whether or not an individual is an MZ or DZ twin. (In fact, if we did not equate these values across MZ and DZ groups, the model would not be identified, and would not be able to estimate anything useful.) The difference between MZ and DZ twins is reflected in the different specification of the model (see below) - we use the same building blocks, however.

     Covariances A + C + E  |   H@A + C  _
                  H@A + C   |  A + C + E /
The critical difference is simply that biometrical theory predicts that only half of the variation due to the additive effects of genes will be shared between DZ twins. This is why we multiply the additive component of variance by 0.5 (i.e. H)) in the covariance element. Note the use of the Kronecker product style of multiplication (@) which facilitates multivariate analysis.

     Option RS

End



Group4: Standardised solution
Calculation
This group does not read any data (and so does not contain a data statement on the second line). The function of this group, as the title, implies is to provide a standardised solution. That is, we have been fitting the models to the covariance structures of MZ and DZ twins, and so the parameters represent the components of variance (A, C and E or the corresponding path coefficients X, Y and Z). This group will calculate the proportions of variance that are attributable to additive genetic, shared and nonshared environment effects. Although this is trivial in the univariate case (it could easily be calculated by hand) this is not the case in the multivariate case, so it is useful to include a group such as this.

	Matrices = group 1
Again, we need to tell Mx which set of matrices will be used for this group.
	
	Begin Algebra;
We tell Mx that we are about to calculate some new matrices from existing ones. The details of the procedure we are about to see below are not in themselves important, but we shall go through them just to get a sense of how Mx can be used.

	  ! phenotypic covariance matrix  
	  P = A + C + E;
First we calculate the phenotypic covariance matrix. In the univariate case, this is simply a 1 by 1 matrix - the variance of the trait. In the multivariate case, the off-diagonal elements would represent the phenotypic (i.e. within individual) covariances between the different measures.

	  ! diagonal matrix of phenotypic
	  !  standard deviations
	  D = \sqrt(\v2d(\d2v(P)));

We use some of Mx's matrix functions here to extract the diagonal elements of the phenotypic covariance matrix and put them in a vector (\d2v). We then convert this vector back into a matrix (\v2d). These two operations have essentially had the effect of making all the off-diagonal elements zero, but preserving the diagonal elements (i.e. the variances). We then take the square root of these elements (\sqrt) in order to get the standard deviations.

	  ! Phenotypically standardised 
	  !  covariance matrices :
	  ! phenotypically standardised

	  T = D~ * A * D~;   ! additive genetic
	  U = D~ * C * D~;   ! shared environmental
	  V = D~ * E * D~;   ! nonshared environmental

These statements essentially standardise the three variance component matrices. Pre- and post- multiplying by the inverse of the diagonal matrix containing the standard deviations (D~) has this desired effects. Again, the details are not important, but if you want to work through the matrix algebra you will see that it is really quite simple. This is the beauty of matrix notation - that lots of calculations can be specified in just a few lines. These new matrices will contain the proportions of variance along the diagonal elements, i.e. in this case, matrix T would represent the heritability.

	  ! Correlation matrices 
	  !  for aetiological factors 
	  !  i.e. not phenotypic correlations

	  G = \stnd(T);      ! additive genetic 
	  S = \stnd(U);      ! shared environmental
	  N = \stnd(V);      ! nonshared environmental

If we standardised these matrices (so that we will have 1's along the diagonal) we obtain the genetic correlation matrix, the shared environmental correlation matrix and the nonshared environmental correlation matrix. Here, in the multivariate case, the off-diagonal elements will represent the overlap of genetic and environmental factors in the aetiologies of the traits (in the univariate case, these matrices will simply be each a 1 by 1 matrix containing a 1, and so meaningless).

	End Algebra;
We have finished now...

End

...and so we tell this to Mx.

Mx Output for the Univariate ACE model

Click here to see the whole output, generated from running the above univariate ACE model script. The output can be broken down into several main sections:
  • The input script is given at the top of the output
  • Parameter specifications : for each group, all the matrices defined are listed, and whether elements of these matrices are free to be estimated or fixed is also given.
  • Parameter estimates : for each group, the best-fit parameter estimates are given for all free elements. Note that this often causes repetition in the output, as in the case of a twin model, the parameters in the MZ data group will be the same as the parameters in the DZ data group.
  • Model summary : at the end of the output, gives the model fit and other important statistics.
Mx output files do tend to be quite long, so we will not cover the entire output here: rather, we will focus on some of the most important parts.
 The following MX script lines were read for group  1
 
 #DEFINE NVAR 1
 GROUP1: DEFINES MATRICES
 DATA CALC NGROUPS=4
  BEGIN MATRICES;
   X LOWER NVAR NVAR FREE    ! GENETIC STRUCTURE
   Y LOWER NVAR NVAR FREE    ! SHARED ENVIRONMENT
   Z LOWER NVAR NVAR FREE    ! NON-SHARED PATH CO-EFFICIENTS

   ... 

As mentioned, the output begins with a copy of the script that has been run.
    G = \STND(T);
  How am I supposed to take the square root of  0.?
  Diagonal elements are:
  0.  0.
  Matrix is
   0.0000E+00
  Matrix not fully standardized
You may notice certain warnings or error messages: these are not necessarily important. In this case, for example, Mx is merely warning us that the T matrix contains zero before the model-fitting process begins.
  PARAMETER SPECIFICATIONS
  
  GROUP NUMBER: 1
  
Group1: Defines Matrices
Next, we see the parameter specifications for the different groups listed in order.
  MATRIX A
 This is a computed FULL matrix of order    1 by    1
  It has no free parameters specified
For instance, under Group 1, we see the A matrix listed. The output confirms that it is a full, 1 by 1 matrix, which has no free parameters. This is because it is calculated from the matrix X, which contains the free parameter.
  MATRIX X
 This is a LOWER TRIANGULAR matrix of order    1 by    1
    1
 1  1
As mentioned above, the matrix X contains the free parameter representing the path coefficient for additive genetic effects. The matrix A equals X*X'. Of the three 1s, the top and leftmost numbers represent the row and column numbers of the matrix X. The remaining 1 is the number assigned to this free parameter.
  MATRIX Y
 This is a LOWER TRIANGULAR matrix of order    1 by    1
    1
 1  2
  
  MATRIX Z
 This is a LOWER TRIANGULAR matrix of order    1 by    1
    1
 1  3
Likewise, we see the 2nd and 3rd parameters being assigned here (shared environment and nonshared environment).
  GROUP NUMBER: 2
  
Group2: MZ twin pairs
Next, the parameter specification for the second group is given. Because the script used the Matrices=Group1 command for this group, we will see the same information.
  MATRIX X
 This is a LOWER TRIANGULAR matrix of order    1 by    1
    1
 1  1
For example, in the second group (i.e. the data group for MZ twins) we see that the free element of matrix X is still assigned parameter number 1. (Note that, although different groups can use the same letters to represent different matrices, the parameter numbers specified in the output apply to the entire script: in this way, we know that we have constrained these parameters to be equal across groups.)
  GROUP NUMBER: 3
  
Group3: DZ twin pairs
The third group, the DZ twin data group.
  MATRIX X
 This is a LOWER TRIANGULAR matrix of order    1 by    1
    1
 1  1
This group also uses the same set of matrices.
  GROUP NUMBER: 2
  
Group2: MZ twin pairs
The information so far is useful for checking that the script has been properly constructed, but they do not constitute output as such. It is the best-fit parameter estimates that are of the greatest interest as well as the goodness-of-fit statistic for the model: these are given next, again for each group in the script.

We shall look at the values in Group 4: although these will be identical to the other groups, Group 4, the standardisation group, will also contain the standardised estimates.
  OBSERVED COVARIANCE MATRIX
             TRAIT1     TRAIT2
 TRAIT1      2.1200
 TRAIT2      1.6700     2.0800
Before looking at the best-fit parameter estimates from the standardised group, it is worth noticing the output from the two data groups, 2 and 3. Mx will print out the observed trait covariance matrix, in this case for MZ twins. That is, this should be identical to what was entered in the script.
  EXPECTED COVARIANCE MATRIX
             TRAIT1     TRAIT2
 TRAIT1      2.0512
 TRAIT2      1.6225     2.0512
Additionally, the expected covariance for MZ twins will be displayed: this is based on the best-fit parameter estimates. This gives a very good impression as to how the model is performing, i.e. how well it accounts for the observed data.
  RESIDUAL MATRIX
          TRAIT1  TRAIT2
 TRAIT1   0.0688
 TRAIT2   0.0475  0.0288
A particularly useful representation is the residual covariance matrix - that is, the difference between the observed and expected covariance matrices.
  GROUP NUMBER: 3
  
Group3: DZ twin pairs
These statistics are also given for DZ twins:
  OBSERVED COVARIANCE MATRIX
             TRAIT1     TRAIT2
 TRAIT1      1.9800
 TRAIT2      1.2500     2.0600
The observed statistics, as entered in the script.
  EXPECTED COVARIANCE MATRIX
             TRAIT1     TRAIT2
 TRAIT1      2.0512
 TRAIT2      1.2780     2.0512
The model-based expected statistics for DZ twins (i.e. note how the expected variances are the same as for MZ twins).
  RESIDUAL MATRIX
          TRAIT1  TRAIT2
 TRAIT1  -0.0712
 TRAIT2  -0.0280  0.0088
And the discrepancy between the expected and observed.
  GROUP NUMBER: 4
  
Group4: Standardised solution 
The expected covariance matrices presented above are based on the best-fit parameter estimates for the model, as mentioned. We shall review these here. Mx lists matrices in alphabetical order, although we shall go through the matrices in a more logical order here.
  MATRIX X
 This is a LOWER TRIANGULAR matrix of order    1 by    1
          1
 1   0.8301
The three basic free parameters of this model are the elements in the matrices X, Y and Z. These are given here at their best-fit values. That is, the additive genetic path coefficient is 0.8301;
  MATRIX Y
 This is a LOWER TRIANGULAR matrix of order    1 by    1
          1
 1   0.9662
The shared environmental path coefficient is 0.9962;
  MATRIX Z
 This is a LOWER TRIANGULAR matrix of order    1 by    1
          1
 1   0.6547
The nonshared environmental path coefficient is 0.6547.
  MATRIX A
 This is a computed FULL matrix of order    1 by    1
  [=X*X']
          1
 1   0.6890
Rather than presenting results in terms of path coefficients, it is more standard to talk in terms of proportions of variance. The matrices A, C and E represent the components of variance, and are simply the squares of the path coefficients.
  MATRIX C
 This is a computed FULL matrix of order    1 by    1
  [=Y*Y']
          1
 1   0.9335

  MATRIX E
 This is a computed FULL matrix of order    1 by    1
  [=Z*Z']
          1
 1   0.4287
Note how for calculated variables, the formula is given below the description of the matrix type (e.g. [=Z*Z']).
  MATRIX H
 This is a FULL matrix of order    1 by    1
          1
 1   0.5000
Not all matrices contain parameters to be estimated, or functions of those parameters. The matrix H is set to the constant 0.5, used in the specification of the DZ covariance: Mx prints this for us in any case.
  MATRIX P
 This is a computed FULL matrix of order    1 by    1
  [=A+C+E]
             1
 1      2.0512
As mentioned, we are typically interested in proportions of variance. Here we calculate the total variance, P, by summing A, C and E.
  MATRIX D
 This is a computed FULL matrix of order    1 by    1
  [=\SQRT(\V2D(\D2V(P)))]
             1
 1      1.4322

We take the square root of this to give the standard deviation.
  MATRIX T
 This is a computed FULL matrix of order    1 by    1
  [=D~*A*D~]
          1
 1   0.3359
And we use this to standardise our estimates (remember, the script is written in a general, multivariate form, so the expression looks more complicated than merely what has been described here). The matrix T therefore indicates that additive genetic effects account for just under 34% of the variance in the trait.
  MATRIX U
 This is a computed FULL matrix of order    1 by    1
  [=D~*C*D~]
          1
 1   0.4551
Likewise, shared environmental influences account for just under 46%;
  MATRIX V
 This is a computed FULL matrix of order    1 by    1
  [=D~*E*D~]
          1
 1   0.2090
And nonshared environmental influences account for just under 21%
  MATRIX G
 This is a computed FULL matrix of order    1 by    1
  [=\STND(T)]
          1
 1   1.0000
The matrices G, S and N would represent the genetic and environmental correlations in the multivariate case: in the univariate case, these will always be 1-by-1 unit matrices, so we shall ignore here.
 Your model has    3 estimated parameters and      6 Observed statistics
  
 Chi-squared fit of model >>>>>>>     2.491
 Degrees of freedom >>>>>>>>>>>>>         3
 Probability >>>>>>>>>>>>>>>>>>>>     0.477
 Akaike's Information Criterion >    -3.509
 RMSEA >>>>>>>>>>>>>>>>>>>>>>>>>>     0.003
Last but not least, the goodness-of-fit statistics for the model are given. The output tells us that we had 6 observed statistics (i.e. four variances and two covariances from the MZ and DZ covariance matrices) and estimated 3 parameters (i.e. additive genetic, shared environmental and nonshared environmental components of variance). This gives 6-3=3 degrees of freedom, as also represented above. The chi-squared values for these data is 2.491, which is non-significant (p=0.447). This indicates that the observed values do not significantly depart from the expected values: the model fits. The other lines represent other fit statistics that will not be reviewed here.

Multivariate ACE Model Script

Click here to see the whole script.

Output from Multivariate ACE Model Script

Click here to see the whole output.

 

Site created by S.Purcell, last updated 20.05.2007