The statistics of cross-validation residuals - Theory

The algebra for the derivation of the expected value of the free residual is presented in Appendix I. There it is first shown that the second moment matrix of the residuals corresponding to a test set of observations excluded from least-squares refinement is equal to the sum of the variance-covariance matrix (VCM) of the omitted observations and the VCM of the corresponding quantities calculated from the parameter estimates at the convergence of a refinement. The expected value of the sum of squared residuals associated with the excluded observations is then obtained by taking matrix traces.
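In the linear approximation, these two steps can be sketched as follows. The notation is introduced here for illustration and need not match that of Appendix I: $\boldsymbol{\Delta}_t$ is the vector of residuals of the p test observations, $\mathbf{V}_t$ their VCM, $\mathbf{W}_t = \mathbf{V}_t^{-1}$ the weight matrix on an absolute scale, $\mathbf{G}$ the matrix of derivatives of the corresponding calculated quantities with respect to the parameters, and $\mathbf{M}^{-1}$ the VCM of the parameter estimates. Assuming errors of zero mean, and test-set errors uncorrelated with the errors of the observations used in refinement, the cross terms vanish and

$$\langle \boldsymbol{\Delta}_t \boldsymbol{\Delta}_t^{T} \rangle = \mathbf{V}_t + \mathbf{G}\mathbf{M}^{-1}\mathbf{G}^{T}$$

$$\langle \boldsymbol{\Delta}_t^{T}\mathbf{W}_t\boldsymbol{\Delta}_t \rangle = \operatorname{tr}\left(\mathbf{W}_t \langle \boldsymbol{\Delta}_t \boldsymbol{\Delta}_t^{T} \rangle\right) = p + \operatorname{tr}\left(\mathbf{W}_t\mathbf{G}\mathbf{M}^{-1}\mathbf{G}^{T}\right)$$

since $\operatorname{tr}(\mathbf{W}_t\mathbf{V}_t) = \operatorname{tr}(\mathbf{I}_p) = p$. For uncorrelated test observations $\mathbf{W}_t$ is diagonal, and the last trace is the sum, over the test set, of each weight multiplied by the variance of the corresponding calculated quantity.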
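The theory then draws on results from an earlier paper (Tickle, Laskowski & Moss, 1997), where it is shown that, when the weighting is on an absolute scale, the expected value of the sum of a subset of s weighted squared residuals at the convergence of a least-squares refinement is

$$\left\langle \sum_{i=1}^{s} w_i \Delta_i^2 \right\rangle = s - \sum_{i=1}^{s} w_i \sigma^2_{\mathrm{c},i} \tag{1}$$

where the angle brackets denote statistical expectation, $w_i$ and $\Delta_i$ are the weight and residual of the $i$th observation, and $\sigma^2_{\mathrm{c},i}$ is the variance of the corresponding calculated quantity, obtained from the VCM of the parameter estimates.

When the above summation is over all n observations (reflections and restraints), the familiar result for a refinement of m parameters is obtained,

$$\left\langle \sum_{i=1}^{n} w_i \Delta_i^2 \right\rangle = n - m$$

and hence

$$\sum_{i=1}^{n} w_i \sigma^2_{\mathrm{c},i} = m. \tag{2}$$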
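Equations (1) and (2) can be checked numerically in the linear case. The following is a minimal Monte Carlo sketch, not the authors' code: it assumes a purely linear model with uncorrelated Gaussian errors and absolute weights $w_i = 1/\sigma_i^2$, and the design matrix, problem sizes and function names (`invert`, `quad`, `simulate`) are hypothetical. The term $w_i \sigma^2_{\mathrm{c},i}$ (weight times the variance of the $i$th calculated quantity) is computed as $w_i \mathbf{a}_i^{T}\mathbf{M}^{-1}\mathbf{a}_i$, where $\mathbf{a}_i$ is the $i$th row of the design matrix and $\mathbf{M}$ the normal matrix.

```python
import random


def invert(mat):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(mat)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(mat)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def quad(a, Minv):
    """Return the quadratic form a^T Minv a."""
    return sum(a[j] * Minv[j][l] * a[l] for j in range(len(a)) for l in range(len(a)))


def simulate(n=40, m=3, s=10, trials=4000, seed=1):
    """Average weighted squared residuals of a hypothetical linear least-squares fit.

    Model: y_i = a_i . x + e_i, e_i ~ N(0, sigma_i^2), absolute weights w_i = 1/sigma_i^2.
    """
    rng = random.Random(seed)
    A = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(n)]
    sigma = [rng.uniform(0.5, 2.0) for _ in range(n)]
    w = [1.0 / sd ** 2 for sd in sigma]
    x_true = [1.0] * m
    # Normal matrix M = A^T W A; with absolute weights its inverse is the VCM of the estimates.
    M = [[sum(wi * a[j] * a[l] for a, wi in zip(A, w)) for l in range(m)] for j in range(m)]
    Minv = invert(M)
    # w_i * sigma_c,i^2 for every observation (diagonal of the hat matrix).
    lev = [wi * quad(a, Minv) for a, wi in zip(A, w)]
    total_sum = subset_sum = 0.0
    for _ in range(trials):
        y = [sum(aj * xj for aj, xj in zip(a, x_true)) + rng.gauss(0.0, sd) for a, sd in zip(A, sigma)]
        rhs = [sum(wi * a[j] * yi for a, wi, yi in zip(A, w, y)) for j in range(m)]
        x_hat = [sum(Minv[j][l] * rhs[l] for l in range(m)) for j in range(m)]
        d = [wi * (yi - sum(aj * xj for aj, xj in zip(a, x_hat))) ** 2 for a, wi, yi in zip(A, w, y)]
        total_sum += sum(d)
        subset_sum += sum(d[:s])
    return {
        "mean_all": total_sum / trials,      # compare with n - m
        "sum_lev": sum(lev),                 # equals m up to rounding
        "mean_subset": subset_sum / trials,  # compare with expected_subset
        "expected_subset": s - sum(lev[:s]),
    }
```

With the default arguments, `simulate()` should return `sum_lev` equal to m = 3 up to rounding error, `mean_all` close to n - m = 37, and `mean_subset` close to `expected_subset`; the Monte Carlo agreement tightens as `trials` grows.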
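Equation (21) in Appendix I shows that the expected value of the sum of weighted squared residuals associated with the p excluded observations in the test set is

$$\left\langle \sum_{j=1}^{p} w_j \Delta_j^2 \right\rangle = p + \sum_{j=1}^{p} w_j \sigma^2_{\mathrm{c},j} \tag{3}$$

where $\sigma^2_{\mathrm{c},j}$ is the variance of the quantity calculated for the $j$th test observation from the parameter estimates. It should be noted that the derivation of this equation does not assume that the test set has been randomly selected from reciprocal space.

We now consider the f structure-amplitude observations included in the refinement (the working set). From equation (1), the expected value of the sum of weighted squared residuals associated with these observations at convergence is

$$\left\langle \sum_{i=1}^{f} w_i \Delta_i^2 \right\rangle = f - \sum_{i=1}^{f} w_i \sigma^2_{\mathrm{c},i} \tag{4}$$

The similarities between equations (3) and (4) will be noted: the same kind of term, the weighted variance of the calculated quantities, is added to the number of observations in the test-set expression and subtracted from it in the working-set expression. In the linear approximation, both expressions for the expected value are independent of the observations (structure amplitudes and restraints); they depend only on the weights and on the VCM of the calculated quantities.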
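The test-set and working-set expressions can be compared in the same way. The sketch below is again hypothetical and illustrative only: an unrestrained linear model with absolute weights, refined against the working set alone, with the same hand-written helpers (`invert`, `quad`) and a hypothetical driver `compare_test_and_working`. The test set is chosen deliberately non-randomly (the p observations with the largest first design coordinate), in line with the remark that the test-set result does not rely on random selection. The expected values are computed from the design matrix and weights only, before any observations are simulated.

```python
import random


def invert(mat):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(mat)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(mat)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def quad(a, Minv):
    """Return the quadratic form a^T Minv a."""
    return sum(a[j] * Minv[j][l] * a[l] for j in range(len(a)) for l in range(len(a)))


def compare_test_and_working(n_total=50, m=3, p=10, trials=4000, seed=2):
    """Monte Carlo check of expected test-set and working-set weighted residuals."""
    rng = random.Random(seed)
    A = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(n_total)]
    sigma = [rng.uniform(0.5, 2.0) for _ in range(n_total)]
    w = [1.0 / sd ** 2 for sd in sigma]
    x_true = [0.5] * m
    # Non-random test set: the p rows with the largest first design coordinate.
    order = sorted(range(n_total), key=lambda i: A[i][0], reverse=True)
    test_set, work_set = order[:p], order[p:]
    # Refinement (normal matrix) uses the working set only.
    M = [[sum(w[i] * A[i][j] * A[i][l] for i in work_set) for l in range(m)] for j in range(m)]
    Minv = invert(M)
    # Expected values depend only on the design and the weights, not on y.
    expected_test = p + sum(w[i] * quad(A[i], Minv) for i in test_set)
    expected_work = len(work_set) - sum(w[i] * quad(A[i], Minv) for i in work_set)
    sum_test = sum_work = 0.0
    for _ in range(trials):
        y = [sum(aj * xj for aj, xj in zip(A[i], x_true)) + rng.gauss(0.0, sigma[i])
             for i in range(n_total)]
        rhs = [sum(w[i] * A[i][j] * y[i] for i in work_set) for j in range(m)]
        x_hat = [sum(Minv[j][l] * rhs[l] for l in range(m)) for j in range(m)]
        d = [w[i] * (y[i] - sum(aj * xj for aj, xj in zip(A[i], x_hat))) ** 2
             for i in range(n_total)]
        sum_test += sum(d[i] for i in test_set)
        sum_work += sum(d[i] for i in work_set)
    return {
        "expected_test": expected_test,
        "mean_test": sum_test / trials,
        "expected_work": expected_work,
        "mean_work": sum_work / trials,
    }
```

With the defaults, `mean_test` should lie close to `expected_test`, which exceeds p = 10, and `mean_work` close to `expected_work`, which equals f - m = 37 up to rounding because the refinement is unrestrained.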
Using these results, it is possible to obtain estimates of the ratio of $R_{\mathrm{free}}$ to $R$ for models with only random, uncorrelated errors. This ratio is estimated first for unrestrained refinement and then for refinement with geometrical restraints. These $R_{\mathrm{free}}/R$ ratios are the starting point for understanding the ratios observed when systematic errors are present.
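For the unrestrained case, a rough estimate of this kind follows from equations (3) and (4). The sketch below rests on two assumptions introduced here rather than taken from the text: that the ratio of R-factors can be approximated by the square root of the ratio of mean weighted squared residuals, and that the test observations are statistically similar to the working set, so that the test-set average of $w\sigma^2_{\mathrm{c}}$ (weight times variance of a calculated quantity) equals its working-set average. With no restraints, the refined observations are just the f working reflections, so the sum of $w_i\sigma^2_{\mathrm{c},i}$ over them equals the number of parameters m, and

$$\frac{1}{f}\left\langle \sum_{i=1}^{f} w_i \Delta_i^2 \right\rangle = 1 - \frac{m}{f}, \qquad \frac{1}{p}\left\langle \sum_{j=1}^{p} w_j \Delta_j^2 \right\rangle \approx 1 + \frac{m}{f}$$

$$\frac{R_{\mathrm{free}}}{R} \approx \left(\frac{1 + m/f}{1 - m/f}\right)^{1/2} = \left(\frac{f + m}{f - m}\right)^{1/2}$$

As a purely hypothetical example, f = 10 000 working reflections and m = 4000 parameters give $(14\,000/6000)^{1/2} \approx 1.53$.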