Behavioural Genetic Interactive Modules
Overview
Covariance is a fundamental statistic that describes the relationship between two characteristics (height and weight, for example). This module shows how this measure of association is calculated and how it relates to the concept of variance.
Tutorial
This module, covariance.exe, is very similar to the previous module, which illustrated the calculation of variance. This is unsurprising, for statistically and conceptually covariance is very similar to variance: it is a measure of co-variation between two traits.
This first panel controls the input of data into the module, and is similar to that of the variance module except that we are now dealing with pairs of scores rather than individual scores. Each observation represents, say, one individual and the pair of scores represents two measures for each individual, X and Y. The module will calculate the covariance between X and Y.
As before, the module tabulates scores, this time in pairs. In this case, for example, the 15th individual in the sample has scored 6 on measure X and 45 on measure Y.
The univariate statistics such as the sum and mean for X and Y are calculated and displayed as before. In this case we see, for example, that measure X has a sum of 1374 and a mean of 52.85 in the sample of the 26 individuals entered into the module.
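As a rough illustration of these univariate summaries, the Python sketch below computes the count, sum and mean of each measure from a list of (X, Y) pairs. The five pairs and the helper name `summarise` are invented for illustration; they are not the module's data or code.

```python
# Hypothetical (X, Y) pairs, one per individual; not the module's data.
pairs = [(55, 96), (6, 45), (48, 52), (71, 30), (60, 62)]

def summarise(scores):
    """Return the count, sum and mean of a list of scores."""
    n = len(scores)
    total = sum(scores)
    return n, total, total / n

# Split the pairs into one list per measure.
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

print("X:", summarise(xs))
print("Y:", summarise(ys))

# The figures quoted in the text are consistent with each other:
# a sum of 1374 over 26 individuals rounds to a mean of 52.85.
print(round(1374 / 26, 2))
```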
For bivariate data, as well as plotting a bar chart for each measure, we can also draw a scatter-plot that gives a visual representation of the association between the two measures. Each point represents an individual: the horizontal axis shows their score on X and the vertical axis their score on Y. The short lines plotted on the axes mark 1 and 2 standard deviations away from the mean.
From the scatter-plot, we can see that the two variables do not seem to be strongly related. If anything, there appears to be a slight negative association, in that individuals who score higher on X seem to, on average, score slightly lower on Y. The covariance statistic we are about to calculate will quantify this relationship more precisely.
For both X and Y we calculate the deviations from the mean, and square these deviations. These are used in the calculation of variance. The fifth column, however, holds the quantity that is most relevant to covariance: the cross-product of the deviation from the mean for X and the deviation from the mean for Y for each individual. For example, in the top line here we see that this individual has scored 2.15 units above the mean on X and 43.5 units above the mean on Y. The cross-product is simply the product of these two deviations, 93.69. (Multiplying the rounded values shown, 2.15 by 43.5, gives about 93.5; the displayed figure was presumably computed from the unrounded deviations.)
Note how some of these cross-products are negative, unlike the squared deviations from the mean. This reflects the fact that covariance is not just a measure of the strength of an association, but also indicates its direction. In this context, a negative cross-product means that the individual scored higher than average on one measure but lower than average on the other.
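The columns described above can be sketched in a few lines of Python. The four pairs of scores are hypothetical, chosen so that the arithmetic is easy to follow by hand; the variable names are likewise only illustrative.

```python
# Hypothetical paired scores; not the module's data.
xs = [2, 4, 6, 8]
ys = [9, 3, 7, 1]

mean_x = sum(xs) / len(xs)  # 5.0
mean_y = sum(ys) / len(ys)  # 5.0

# Deviations from the mean for each measure.
dev_x = [x - mean_x for x in xs]
dev_y = [y - mean_y for y in ys]

# Squared deviations (used for the variances) are never negative.
sq_dev_x = [d * d for d in dev_x]
sq_dev_y = [d * d for d in dev_y]

# Cross-products (used for the covariance) can take either sign:
# negative when one score is above its mean and the other below.
cross = [dx * dy for dx, dy in zip(dev_x, dev_y)]

for row in zip(xs, ys, dev_x, sq_dev_x, dev_y, sq_dev_y, cross):
    print(row)

print("sum of squares X:", sum(sq_dev_x))
print("sum of squares Y:", sum(sq_dev_y))
print("sum of cross-products:", sum(cross))
```

In this made-up sample the first and last individuals are far from the mean in opposite directions on the two measures, so their large negative cross-products outweigh the small positive ones and the sum is negative.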
In a similar manner, these squared deviations are summed, as are the cross-products. Note how, in this instance, the sum of cross-products is actually negative.
For both X and Y we can use the sum of squared deviations from the mean to calculate the variance of each measure, as in the previous module. This picture shows the variance calculated for X.
Here we calculate the covariance: whereas the variance of one measure is the average squared deviation from the mean, the covariance between two measures is the average cross-product of deviations from the mean for the two measures. In this case, the covariance is -308.6, confirming the impression given by the scatter-plot that X and Y appear to be negatively associated.
So what does -308.6 mean? Is this a strong negative association? Is this meaningfully different from a covariance of zero, implying no association? As you might expect, in order to answer such questions, we have to take into consideration the individual variances of each measure. That is, if the variances of X and Y were both around 10,000 then the absolute magnitude of the covariance, only in the hundreds, would mean that the covariation between X and Y was relatively small compared to the variation in the measures that was not shared between them. If the variances were of the order of four or five hundred, however, then the majority of the variance would appear to be shared.
This kind of calculation is precisely what a correlation does: the correlation coefficient will be introduced in the next module.
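A minimal Python sketch of the averaging step, and of the comparison with the variances, might look as follows. Several things here are assumptions: the text does not say whether the module divides by N or by N - 1, so the divisor is left as a `ddof` parameter; the function names are invented; the sample data are hypothetical; and the variances of 10,000 and 450 are simply the two scenarios mentioned above, not values from the module.

```python
import math

def covariance(xs, ys, ddof=0):
    """Average cross-product of deviations from the mean.

    ddof=0 divides by N, ddof=1 divides by N - 1. Which one the
    module uses is not stated here, so it is left as a parameter.
    """
    n = len(xs)
    if n != len(ys) or n - ddof <= 0:
        raise ValueError("need equal-length samples with enough observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_sum / (n - ddof)

def variance(xs, ddof=0):
    """Variance is the covariance of a measure with itself."""
    return covariance(xs, xs, ddof)

def relative_size(cov, var_x, var_y):
    """Covariance scaled by the square root of the product of the variances."""
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired scores; not the module's data.
xs = [2, 4, 6, 8]
ys = [9, 3, 7, 1]
print(covariance(xs, ys), variance(xs), variance(ys))

# The same covariance of -308.6 under the two scenarios in the text.
print(relative_size(-308.6, 10_000, 10_000))  # small relative to the variances
print(relative_size(-308.6, 450, 450))        # a large share of the variance
```

Writing `variance` as `covariance(xs, xs)` makes the kinship between the two statistics explicit: squaring a deviation is just taking its cross-product with itself.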
Finally, the standardised scores for both X and Y are given: these are calculated in an identical manner to the variance module.
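The standardisation step is not spelled out here, so the sketch below assumes the usual definition: each score's deviation from the mean divided by the standard deviation. The function name `standardise`, the `ddof` divisor parameter and the sample scores are all illustrative assumptions rather than the module's own implementation.

```python
import math

def standardise(scores, ddof=0):
    """Return standardised scores: (score - mean) / standard deviation.

    ddof=0 uses a divisor of N for the variance, ddof=1 uses N - 1;
    the module's choice is not stated, so this is an assumption.
    """
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - ddof)
    sd = math.sqrt(var)
    if sd == 0:
        raise ValueError("all scores are identical; cannot standardise")
    return [(s - mean) / sd for s in scores]

# Hypothetical scores; not the module's data.
xs = [2, 4, 6, 8]
z = standardise(xs)
print([round(v, 3) for v in z])
```

With the divisor set to N, the standardised scores have a mean of zero and a variance of one, which is what makes scores on differently scaled measures comparable.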
Please refer to the Appendix for further discussion of what covariances represent and how they are used.
Site created by S.Purcell, last updated 20.05.2007