The statistics of cross-validation residuals - Theory


The expected value of the free residual

The derivation of the expected value of the free residual is given in Appendix I. There it is first shown that the second-moment matrix of the residuals for a test set of observations excluded from least-squares refinement equals the sum of the variance-covariance matrix (VCM) of the omitted observations and the VCM of the corresponding quantities calculated from the parameter estimates at convergence of the refinement. The expected value of the sum of squared weighted residuals for the excluded observations then follows by taking the trace of the weighted second-moment matrix.
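The matrix identity can be checked numerically. The sketch below is a minimal Monte Carlo illustration, not the authors' procedure. It assumes a linearized model y = A x + e with independent Gaussian errors, absolute-scale weights w = 1/sigma², and a randomly generated derivative matrix `A`; the sizes `n`, `m`, `p` and all variable names are hypothetical. It refines against the working set only and compares the empirical second-moment matrix of the test-set residuals with VCM(observations) + A_t H⁻¹ A_tᵀ, where `H` is the working-set normal matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 60, 6, 5                         # hypothetical: observations, parameters, test-set size
A = rng.normal(size=(n, m))                # derivative (design) matrix of the linearized model
sigma = rng.uniform(0.5, 2.0, size=n)      # standard uncertainties of the observations
w = 1.0 / sigma**2                         # weights on an absolute scale
x_true = rng.normal(size=m)                # true parameters (do not affect the result)

test = rng.choice(n, size=p, replace=False)     # test set, excluded from refinement
work = np.setdiff1d(np.arange(n), test)         # working set
A_w, w_w, A_t, w_t = A[work], w[work], A[test], w[test]
H_inv = np.linalg.inv(A_w.T @ (w_w[:, None] * A_w))   # VCM of the parameter estimates

# Predicted second-moment matrix: VCM(omitted observations) + VCM(calculated values)
predicted = np.diag(sigma[test] ** 2) + A_t @ H_inv @ A_t.T

# Monte Carlo: refine against the working set only, then form test-set residuals
trials = 20000
Y = A @ x_true + rng.normal(size=(trials, n)) * sigma   # one simulated data set per row
X_hat = (Y[:, work] * w_w) @ A_w @ H_inv                 # weighted least-squares estimates
D = Y[:, test] - X_hat @ A_t.T                           # test-set residuals
empirical = D.T @ D / trials                             # empirical second-moment matrix

# Weighted trace of the second-moment matrix = expected sum of squared weighted residuals
expected_sum = np.trace(np.diag(w_t) @ predicted)
observed_sum = np.mean(np.sum(w_t * D**2, axis=1))
print(f"expected {expected_sum:.3f}, simulated {observed_sum:.3f}")
```

The two printed values should agree to within Monte Carlo error; the agreement holds element by element for the matrix, not only for its trace.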

The theory also draws on a result from an earlier paper (Tickle, Laskowski & Moss, 1997): when the weighting is on an absolute scale, the expected value of the sum of any subset of s squared weighted residuals at the convergence of a least-squares refinement is

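$$\left\langle \sum_{i=1}^{s} w_i \Delta_i^{2} \right\rangle = s - \sum_{i=1}^{s} w_i\, \sigma^{2}\!\left(y_i^{\rm calc}\right) \tag{1}$$

where the angle brackets denote statistical expectation, $\Delta_i = y_i^{\rm obs} - y_i^{\rm calc}$ is the residual of the $i$th observation, $w_i = 1/\sigma^{2}(y_i^{\rm obs})$ is its weight on an absolute scale, and $\sigma^{2}(y_i^{\rm calc})$ is the variance of the corresponding quantity calculated from the parameter estimates (in the linear approximation).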

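When the summation extends over all n observations (reflections and restraints), the weighted variances of the calculated quantities sum to the number of refined parameters, m:

$$\sum_{i=1}^{n} w_i\, \sigma^{2}\!\left(y_i^{\rm calc}\right) = m \tag{2}$$

and hence

$$\left\langle \sum_{i=1}^{n} w_i \Delta_i^{2} \right\rangle = n - m .$$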
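Equations (1) and (2) can be illustrated with a similar simulation. In this hypothetical sketch (not the authors' code), restraints are modelled simply as extra rows of the derivative matrix, each tying two parameters together, with random errors in their targets, so the refinement treats reflections and restraints alike as observations. The array `leverage` holds the weighted variance of each calculated value; its sum over all n rows equals m exactly, while the Monte Carlo means approach n − m for the full set and s minus the summed leverages for a subset.

```python
import numpy as np

rng = np.random.default_rng(2)
n_refl, n_restr, m = 40, 10, 8             # hypothetical counts
A_refl = rng.normal(size=(n_refl, m))      # linearized reflection derivatives
A_restr = np.zeros((n_restr, m))           # hypothetical restraints: x_j - x_k = target
for r in range(n_restr):
    j, k = rng.choice(m, size=2, replace=False)
    A_restr[r, j], A_restr[r, k] = 1.0, -1.0
A = np.vstack([A_refl, A_restr])           # restraints enter as extra observation rows
n = A.shape[0]
sigma = rng.uniform(0.5, 2.0, size=n)
w = 1.0 / sigma**2                         # absolute-scale weights

H_inv = np.linalg.inv(A.T @ (w[:, None] * A))
var_calc = np.einsum("ij,jk,ik->i", A, H_inv, A)   # variance of each calculated value
leverage = w * var_calc                             # weighted variances; they sum to m

# Monte Carlo: random errors in both reflections and restraint targets
trials = 20000
x_true = rng.normal(size=m)
Y = A @ x_true + rng.normal(size=(trials, n)) * sigma
X_hat = (Y * w) @ A @ H_inv
wr2 = w * (Y - X_hat @ A.T) ** 2            # squared weighted residuals at convergence

subset = np.arange(n_refl)                  # any subset of s observations; here the reflections
all_mean = wr2.sum(axis=1).mean()
sub_mean = wr2[:, subset].sum(axis=1).mean()
sub_pred = subset.size - leverage[subset].sum()
print(f"sum of leverages {leverage.sum():.6f} (m = {m})")
print(f"all n: {all_mean:.2f} vs n - m = {n - m}")
print(f"subset: {sub_mean:.2f} vs {sub_pred:.2f}")
```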

Equation (21) of Appendix I shows that the expected value of the residual associated with the p excluded observations of the test set is given by

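$$\left\langle \sum_{i=1}^{p} w_i \Delta_i^{2} \right\rangle = p + \sum_{i=1}^{p} w_i\, \sigma^{2}\!\left(y_i^{\rm calc}\right) \tag{3}$$

where both sums run over the test set and each $\sigma^{2}(y_i^{\rm calc})$ is the variance of the value calculated for an excluded observation from the parameters refined against the working set.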

Note that the derivation of this equation does not assume that the test set was selected at random from reciprocal space.
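Because equation (3) does not rely on random selection, it should hold even for a deliberately biased test set. The sketch below uses the same hypothetical linear, Gaussian-error model as a numerical check (it is not the authors' code). It takes as test set the p observations with the largest value in one column of the derivative matrix, loosely mimicking a test set drawn from one region of reciprocal space, so the refinement must extrapolate to them.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 80, 6, 8                          # hypothetical sizes
A = rng.normal(size=(n, m))
A[:, 0] = np.linspace(0.2, 3.0, n)          # hypothetical "resolution-like" derivative column
sigma = np.linspace(0.5, 2.0, n)            # uncertainties grow along the same axis
w = 1.0 / sigma**2

# Deliberately non-random test set: the p rows with the largest first-column value
order = np.argsort(A[:, 0])
test, work = order[-p:], order[:-p]

H_inv = np.linalg.inv(A[work].T @ (w[work][:, None] * A[work]))
var_calc = np.einsum("ij,jk,ik->i", A[test], H_inv, A[test])
free_pred = p + np.sum(w[test] * var_calc)          # equation (3)

trials = 20000
x_true = rng.normal(size=m)
Y = A @ x_true + rng.normal(size=(trials, n)) * sigma
X_hat = (Y[:, work] * w[work]) @ A[work] @ H_inv
free = np.sum(w[test] * (Y[:, test] - X_hat @ A[test].T) ** 2, axis=1)
print(f"equation (3): {free_pred:.3f}, simulated mean: {free.mean():.3f}")
```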

We now consider the f structure-amplitude observations included in the refinement (the working set). From equation (1), the expected value of the residual associated with these observations at convergence is

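$$\left\langle \sum_{i=1}^{f} w_i \Delta_i^{2} \right\rangle = f - \sum_{i=1}^{f} w_i\, \sigma^{2}\!\left(y_i^{\rm calc}\right) \tag{4}$$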

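Equations (3) and (4) have the same form: each is the number of terms in the sum plus (for the free residual) or minus (for the working residual) a sum of weighted variances of calculated quantities. In the linear approximation, both expected values are independent of the observed values themselves (structure amplitudes and restraints).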
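Equations (3) and (4) can be compared side by side. In the sketch below, the hypothetical helper `expected_residuals` evaluates both from the derivative matrix and weights alone, and the hypothetical `simulate` refines the working set for two very different sets of true parameters. Under the same linear, Gaussian-error assumptions as before, the Monte Carlo means should match the predictions in both cases, illustrating that the expectations do not depend on the observed values. With no restraints, the working-set terms sum to m, so equation (4) reduces to f − m.

```python
import numpy as np

def expected_residuals(A, w, work, test):
    """Equations (3) and (4) from the derivative matrix and weights alone."""
    H_inv = np.linalg.inv(A[work].T @ (w[work][:, None] * A[work]))
    lev = w * np.einsum("ij,jk,ik->i", A, H_inv, A)   # weighted variance of each calculated value
    free = test.size + lev[test].sum()                 # equation (3)
    working = work.size - lev[work].sum()              # equation (4); equals f - m here
    return free, working, H_inv

def simulate(A, w, work, test, x_true, H_inv, rng, trials=10000):
    """Monte Carlo means of the free and working residuals for a given true model."""
    Y = A @ x_true + rng.normal(size=(trials, w.size)) / np.sqrt(w)
    X_hat = (Y[:, work] * w[work]) @ A[work] @ H_inv
    wr2 = w * (Y - X_hat @ A.T) ** 2
    return wr2[:, test].sum(axis=1).mean(), wr2[:, work].sum(axis=1).mean()

rng = np.random.default_rng(4)
n, m = 100, 10                               # hypothetical sizes, no restraints
A = rng.normal(size=(n, m))
w = 1.0 / rng.uniform(0.5, 2.0, size=n) ** 2
test = rng.choice(n, size=10, replace=False)
work = np.setdiff1d(np.arange(n), test)

free_pred, work_pred, H_inv = expected_residuals(A, w, work, test)
results = []
for x_true in (np.zeros(m), rng.normal(scale=50.0, size=m)):   # two very different "structures"
    free_mc, work_mc = simulate(A, w, work, test, x_true, H_inv, rng)
    results.append((free_mc, work_mc))
    print(f"free {free_mc:.2f} vs {free_pred:.2f}; working {work_mc:.2f} vs {work_pred:.2f}")
```

Both simulated pairs should match the same predictions even though the true parameters, and hence the simulated observations, differ greatly.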

These results make it possible to estimate the ratio of $R_{\rm free}$ to $R$ for models whose only errors are random and uncorrelated. The $R_{\rm free}$ ratio is estimated first for unrestrained refinement and then for refinement with geometrical restraints. These ratios are the starting point for understanding $R_{\rm free}$ ratios when systematic errors are present.

