SKUDRIVER provides an interface for you to perform a grid search of likelihood space, fitting a mixture of up to three distributions to data under the mixed genetic model. It was designed for the purposes of genetic epidemiology, but can be used to examine any continuous data for the presence of multiple normal distributions.
SKUDRIVER is a Windows program, so double-click on it in Windows Explorer to run it.
SKUDRIVER takes as input a user-specified range of starting values for each of the following variables:
Each of the possible starting values for the parameters is used to perform a maximum likelihood estimation using the program SKUMIX (MacLean CJ et al., Biometrics 32:695, 1976). In this way a grid search of the likelihood surface is conducted, minimizing problems of singularities or local maxima.
Displacement (T) is defined as the difference between the two extreme (homozygote) genotypic means. Thus the three genotypic means are at U, U+DT and U+T. The proportions of the population within each of the three distributions are (1-Q)^2 + FQ(1-Q), 2Q(1-Q)(1-F) and Q^2 + FQ(1-Q). Since the input parameters are specified as either ‘fixed’ or ‘estimated’ in SKUDRIVER, the user may constrain the model to a single distribution by fixing the value of T as zero, or may specify a two-distribution model by fixing the value of D as zero (or one).
One of the important features of SKUMIX is the facility to specify starting values P and R of a deskewing power transformation of the form
y = (R/P) [ (x/R + 1)^P - 1 ]
where R is chosen such that every x/R + 1 is positive in the sample and P is optimized as part of the maximum likelihood estimation. This allows for the more conservative approach of assessing the fit of multiple distributions only after skewness has been removed, since skewness in itself may lead to the mistaken conclusion that more than one distribution is present. Significant skewness may be tested for by a likelihood ratio test comparing a model in which P is fixed to a value of 1 with a corresponding model in which P is not constrained. NB SKUMIX requires that the input data are standardized.
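The transformation and the skewness test described above can be sketched in Python as follows. This is an illustration only and is not part of SKUMIX or SKUDRIVER: the function names power_transform and skewness_lrt are hypothetical, the two log likelihoods are assumed to come from separate runs (P free and P fixed at 1), and the test statistic is assumed to follow a chi-square distribution with one degree of freedom, which is the usual large-sample result when a single parameter is constrained.

```python
import math


def power_transform(x, p, r):
    """Apply y = (R/P) * ((x/R + 1)**P - 1) to a single value x.

    Hypothetical helper: R must make x/R + 1 positive, and P = 0
    is rejected here rather than handled as a limiting case.
    """
    base = x / r + 1.0
    if base <= 0.0:
        raise ValueError("x/R + 1 must be positive for every observation")
    if p == 0:
        raise ValueError("P = 0 is not supported in this sketch")
    return (r / p) * (base ** p - 1.0)


def skewness_lrt(loglik_p_free, loglik_p_fixed):
    """Likelihood ratio test of P = 1 against P unconstrained.

    Takes natural-log likelihoods (an assumption about how the values
    are obtained) and returns the statistic and its p-value.
    For 1 degree of freedom the chi-square tail is erfc(sqrt(stat / 2)).
    """
    stat = max(2.0 * (loglik_p_free - loglik_p_fixed), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# With P = 1 the transformation leaves the data unchanged.
print(power_transform(0.5, 1.0, 6.0))

# Illustrative (made-up) log likelihoods for the two models.
print(skewness_lrt(-100.0, -101.92))
```

Setting P to 1 reduces the transformation to y = x, which is why the model with P fixed at 1 serves as the "no skewness" reference in the comparison.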
Start by typing in the name of the input data file you are using. The default is DATA.DAT. Next specify the data format in terms of a FORTRAN format specifier. The default is F11.5, i.e. 11 characters wide (including the decimal point) with 5 digits after the decimal point. Then specify the starting values of the parameters of the model, and whether they are to be estimated or are fixed. Note that starting values of 0 for the power transform variable P will be ignored, as they cause problems for SKUMIX. Let the mouse cursor hover over the ‘Run SKUMIX’ button briefly before clicking it, and you will see a message indicating how many times SKUMIX is to be run. You may wish to break your analysis into separate parts if your system cannot cope with large numbers of runs.
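To get a feel for how quickly the number of runs grows, and to check that a data file matches the F11.5 format, a short Python sketch may help. It does not reproduce SKUDRIVER's internals: the starting values below are invented, the helper names build_grid and format_f11_5 are hypothetical, and it assumes that one run is made for every combination of starting values, which the program's own run-count message should confirm or correct.

```python
from itertools import product

# Invented starting values for illustration; the parameter letters
# follow the text, the numbers do not come from any real analysis.
starts = {
    "U": [0.0],
    "T": [0.0, 1.0, 2.0],
    "D": [0.0, 0.5, 1.0],
    "Q": [0.1, 0.3, 0.5],
    "P": [0.0, 0.5, 1.0],
}


def build_grid(starts):
    """Return every combination of starting values as a list of dicts.

    Starting values of 0 for P are dropped first, mirroring the note
    that SKUDRIVER ignores them.
    """
    cleaned = dict(starts)
    cleaned["P"] = [p for p in starts["P"] if p != 0]
    names = list(cleaned)
    combos = product(*(cleaned[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def format_f11_5(value):
    """Format one value as a FORTRAN F11.5 field (11 wide, 5 decimals)."""
    text = f"{value:11.5f}"
    if len(text) > 11:
        raise ValueError("value does not fit in an F11.5 field")
    return text


grid = build_grid(starts)
print(len(grid))  # 1 * 3 * 3 * 3 * 2 = 54 combinations

print(repr(format_f11_5(1.5)))
```

Under the stated assumption, adding one more starting value to any parameter multiplies the total, so splitting a large grid into separate analyses is often the practical choice.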
After the SKUMIX runs are over, SKUDRIVER will report the best FINAL F value: this is the (2*log likelihood + constant) value used for comparisons between models. SKUDRIVER will also refer you to a file called BEST.OUT, which contains details of the model parameters that give the maximum likelihood. NB several sets of input parameters may converge to the same maximum likelihood model: if so, they are all reported in BEST.OUT. You may calculate how many cases are in each of the output distributions using the formulae given above, or use an Excel spreadsheet designed to make this task easier.
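As an alternative to the spreadsheet, the calculation of cases per distribution can be sketched in a few lines of Python. The function names are hypothetical and the parameter values in the usage lines are invented; in practice Q, F, U, T and D would be read from BEST.OUT by hand, since this sketch makes no assumption about that file's layout.

```python
def genotype_proportions(q, f=0.0):
    """Proportions of the population in the three distributions.

    Implements (1-Q)^2 + FQ(1-Q), 2Q(1-Q)(1-F) and Q^2 + FQ(1-Q).
    """
    shift = f * q * (1.0 - q)
    return (
        (1.0 - q) ** 2 + shift,
        2.0 * q * (1.0 - q) * (1.0 - f),
        q ** 2 + shift,
    )


def genotype_means(u, t, d):
    """Means of the three distributions: U, U + D*T and U + T."""
    return (u, u + d * t, u + t)


def expected_counts(n_cases, q, f=0.0):
    """Expected number of cases in each distribution for a sample of n_cases."""
    return tuple(n_cases * p for p in genotype_proportions(q, f))


# Invented example values, not results from a real data set.
print(genotype_proportions(0.2))
print(genotype_means(-0.5, 2.0, 0.5))
print(expected_counts(500, 0.2))
```

The three proportions always sum to one, so the expected counts sum to the sample size; this is a quick check that the values were transcribed correctly.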
Permission to publish SKUMIX on this site has kindly been granted by Charles J MacLean.