The statistics of cross-validation residuals


Appendix III

Minimum variance weights minimise the expected value of Rfree

First we show that least-squares refinement using a weight matrix tex2html_wrap_inline1450 that is the inverse of the VCM of the observations minimises the VCMs of both the refined parameters tex2html_wrap_inline1312 and the free residuals tex2html_wrap_inline1636.

Consider a function tex2html_wrap_inline1638 of the refined least-squares parameters that is linear to within a first-order Taylor approximation.

displaymath865

Now consider two column matrices tex2html_wrap_inline1640 and tex2html_wrap_inline1642, defined below, which are both unbiased estimates of tex2html_wrap_inline1638.

   eqnarray872

where

   eqnarray886

Here tex2html_wrap_inline1646 is any weight matrix and tex2html_wrap_inline1648. From equations (27) and (28), and from the definitions of tex2html_wrap_inline1650 and tex2html_wrap_inline1652,

  equation904

We wish to show that the VCM of tex2html_wrap_inline1640 is smaller than that of tex2html_wrap_inline1642. Because tex2html_wrap_inline1640 and tex2html_wrap_inline1642 are unbiased estimators, tex2html_wrap_inline1664, and thus the VCM of tex2html_wrap_inline1642 can be expressed as

  eqnarray911

The last two terms of the above equation are transposes of each other, and each is a zero matrix, as shown by the following analysis, which uses equations (25), (26), (27) and (29).

eqnarray946

Hence from equation (30)

displaymath973

Since the VCMs are positive definite

displaymath987
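The argument above has the form of the standard Gauss-Markov proof. A compact sketch in generic notation follows; the symbols (y, V, A, B, H, W') are assumptions chosen for illustration and need not match this appendix's own notation or equation numbering.

```latex
% Assumed notation: observations y with VCM V, design matrix A,
% linearised function f = B x of the parameters x, arbitrary weight matrix W'.
\begin{align*}
\hat{f} &= B H y,  & H  &= (A^{T} V^{-1} A)^{-1} A^{T} V^{-1}, \\
f'      &= B H' y, & H' &= (A^{T} W' A)^{-1} A^{T} W', \\
H A &= H' A = I && \text{(both estimates are unbiased)}, \\
(H' - H)\, V H^{T} &= (H' A - H A)\,(A^{T} V^{-1} A)^{-1} = 0, \\
\operatorname{VCM}(f') &= \operatorname{VCM}(\hat{f}) + B\,(H' - H)\, V\, (H' - H)^{T} B^{T}.
\end{align*}
```

The last term is a quadratic form in V and so cannot be negative definite, which is what makes the estimate computed with the inverse-VCM weights the one with the smaller VCM.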

Thus the VCM of tex2html_wrap_inline1640, which is calculated with tex2html_wrap_inline1450, is smaller than the VCM of tex2html_wrap_inline1642, which is calculated with another weight matrix tex2html_wrap_inline1646. Making the substitution tex2html_wrap_inline1684 and setting tex2html_wrap_inline1686 to a unit matrix, this analysis shows that by using the weight matrix tex2html_wrap_inline1450 we minimise the variance of tex2html_wrap_inline1312. Substituting tex2html_wrap_inline1692 and tex2html_wrap_inline1694, the same analysis shows that tex2html_wrap_inline1450 also minimises the VCM of tex2html_wrap_inline1700. From equation (18), the VCM of the residuals associated with the excluded observations, tex2html_wrap_inline1326, is the sum of the constant matrix tex2html_wrap_inline1360 and the VCM of tex2html_wrap_inline1700. Hence tex2html_wrap_inline1326 and its trace are also minimised by choosing tex2html_wrap_inline1450 as the weight matrix.
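The result can be checked numerically on a toy problem. The sketch below is not taken from the paper: it assumes a straight-line model with two parameters (intercept and slope), independent observations with known variances, and unit weights as the "other" weight matrix. All function and variable names are hypothetical.

```python
import random


def mat2_inv(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def mat2_mul(p, q):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def param_vcm(t, var, w):
    """VCM of the fitted (intercept, slope) for diagonal weights w.

    Uses the sandwich formula N^-1 M N^-1 with N = A^T W A and
    M = A^T W V W A, where row i of A is (1, t[i]) and V = diag(var).
    """
    n = [[0.0, 0.0], [0.0, 0.0]]
    m = [[0.0, 0.0], [0.0, 0.0]]
    for ti, vi, wi in zip(t, var, w):
        row = (1.0, ti)
        for j in range(2):
            for k in range(2):
                n[j][k] += wi * row[j] * row[k]
                m[j][k] += wi * wi * vi * row[j] * row[k]
    n_inv = mat2_inv(tuple(map(tuple, n)))
    return mat2_mul(mat2_mul(n_inv, tuple(map(tuple, m))), n_inv)


def predicted_variance(c, t_free):
    """Variance b^T C b of the fitted value at an excluded point t_free."""
    b = (1.0, t_free)
    return sum(b[j] * c[j][k] * b[k] for j in range(2) for k in range(2))


rng = random.Random(0)
t = [float(i) for i in range(12)]
var = [rng.uniform(0.5, 5.0) for _ in t]   # assumed observation variances

c_opt = param_vcm(t, var, [1.0 / v for v in var])   # inverse-variance weights
c_other = param_vcm(t, var, [1.0] * len(t))         # unit weights

# The difference of the two VCMs should be positive semi-definite; for a
# symmetric 2x2 matrix that means non-negative trace and determinant.
diff = [[c_other[j][k] - c_opt[j][k] for k in range(2)] for j in range(2)]
diff_trace = diff[0][0] + diff[1][1]
diff_det = diff[0][0] * diff[1][1] - diff[0][1] * diff[1][0]

pred_opt = predicted_variance(c_opt, 5.5)
pred_other = predicted_variance(c_other, 5.5)

print("trace and determinant of the difference:", diff_trace, diff_det)
print("predicted-value variance (optimal, unit):", pred_opt, pred_other)
```

The variance of the predicted value at the excluded point plays the role of the term that is added to the constant matrix in the free-residual VCM, so a smaller value there means a smaller free-residual variance.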

The trace of tex2html_wrap_inline1326 is the expected value of the unweighted sum of squared residuals

displaymath1007

where the summation is taken over the p reflections in the test set. Using the normal approximation in equation (22), we can say that the sum of absolute differences

displaymath1017

and hence tex2html_wrap_inline1138 are approximately minimised by choosing tex2html_wrap_inline1450 as the least-squares weight matrix.
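As a rough illustration of this last step, the sketch below uses the simplest possible model, a single parameter estimated as a weighted mean, and compares the expected sum of squared free residuals and the approximate expected sum of absolute free residuals for two weighting schemes. It assumes independent observations, takes the constant matrix to be the diagonal matrix of test-set observation variances, and uses the half-normal identity E|r| = sigma * sqrt(2/pi) for zero-mean normal residuals; whether that identity is exactly the form of equation (22) is an assumption. The numbers and names are hypothetical.

```python
import math


def mean_variance(var, w):
    """Variance of the weighted mean sum(w*y)/sum(w) for independent y."""
    return sum(wi * wi * vi for wi, vi in zip(w, var)) / sum(w) ** 2


def free_residual_summary(var_free, var_est):
    """Return (expected sum of squares, approx. expected sum of |residual|).

    Each free residual is assumed to have variance var_free[j] + var_est,
    i.e. the observation variance plus the variance of the predicted value.
    """
    variances = [vf + var_est for vf in var_free]
    sum_sq = sum(variances)  # trace of the free-residual VCM
    sum_abs = sum(math.sqrt(2.0 * s / math.pi) for s in variances)
    return sum_sq, sum_abs


var_work = [0.5, 1.0, 2.0, 4.0, 8.0]   # working-set variances (made up)
var_free = [1.0, 3.0]                  # test-set variances (made up)

v_opt = mean_variance(var_work, [1.0 / v for v in var_work])
v_unit = mean_variance(var_work, [1.0] * len(var_work))

sq_opt, abs_opt = free_residual_summary(var_free, v_opt)
sq_unit, abs_unit = free_residual_summary(var_free, v_unit)

print("expected sum of squares (optimal, unit):", sq_opt, sq_unit)
print("approx. expected sum of |r| (optimal, unit):", abs_opt, abs_unit)
```

Because every diagonal element of the free-residual VCM shrinks when the inverse-variance weights are used, both summaries decrease together in this toy case.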

