The statistics of cross-validation residuals - Discussion
The estimated Rfree ratios derived in this paper are the values
that should be achievable at the end of a structure refinement when
only random uncorrelated errors exist in data and model provided that
the observations have been properly weighted (see below).

A larger Rfree ratio than that predicted by these formulae may
indicate that parameter shifts have taken place which have minimised
the residual without significantly improving the model. This may arise
when errors in the model are sufficiently large for the refinement to
descend into a false minimum.

A smaller Rfree ratio than that predicted by these formulae may
indicate that the refinement has not reached convergence since the
initial value of the ratio immediately after the division of the data
into a working set and a test set will be approximately
unity. Interpretation requires care, since a wrong model which has not
been fully refined against the data may produce the same ratio as a
fully refined correct model.
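
The behaviour described above can be illustrated with a toy analogue. The sketch below is not the refinement procedure or the formulae of this paper: it assumes a straight-line model fitted by ordinary least squares, Gaussian uncorrelated errors, and a ratio of root-mean-square residuals (test set over working set) in place of the crystallographic residuals. All names (`fit_line`, `sum_sq`, `residual_ratios`) and all numerical settings are hypothetical choices made for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def sum_sq(xs, ys, a, b):
    """Sum of squared residuals of the line (a, b) over the given points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def residual_ratios(trials=2000, n_work=20, n_test=5, sigma=1.0, seed=0):
    """Return (ratio before fitting, ratio after fitting), pooled over trials.

    Each ratio is RMS(test residuals) / RMS(working residuals).
    """
    rng = random.Random(seed)
    start_work = start_test = fit_work = fit_test = 0.0
    for _ in range(trials):
        # Assumed "true" model y = 1 + 2x with random uncorrelated errors.
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n_work + n_test)]
        ys = [1.0 + 2.0 * x + rng.gauss(0.0, sigma) for x in xs]
        # The points are i.i.d., so slicing is a random division of the data.
        xw, yw = xs[:n_work], ys[:n_work]
        xt, yt = xs[n_work:], ys[n_work:]
        # Unrefined starting model: deliberately wrong, fitted to neither set.
        start_work += sum_sq(xw, yw, 0.5, 1.5)
        start_test += sum_sq(xt, yt, 0.5, 1.5)
        # Refined model: parameters fitted against the working set only.
        a, b = fit_line(xw, yw)
        fit_work += sum_sq(xw, yw, a, b)
        fit_test += sum_sq(xt, yt, a, b)

    def rms_ratio(test_sum, work_sum):
        # The number of trials cancels, so only the set sizes are needed.
        return math.sqrt((test_sum / n_test) / (work_sum / n_work))

    return rms_ratio(start_test, start_work), rms_ratio(fit_test, fit_work)


initial_ratio, refined_ratio = residual_ratios()
print(f"ratio before fitting: {initial_ratio:.3f}")
print(f"ratio after fitting:  {refined_ratio:.3f}")
```

Under these assumptions the ratio before fitting should be close to unity, because neither set has influenced the model, and the ratio after fitting should lie somewhat above unity, because only the working-set residuals have been minimised. The sketch also shows why a single ratio cannot by itself identify a correct model: the unrefined wrong model and a partially refined one can pass through the same values.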

In macromolecular refinement, model error is usually the major
contributor to R and Rfree at the end of the
refinement. Paucity of diffraction data means that thermal and static
disorder in the more mobile parts of the molecule cannot be accurately
modelled. These model errors may cause random perturbations in the
structure amplitude residuals which are indistinguishable from random
experimental errors. The Rfree ratio will not be affected by the
magnitude of these errors provided that the latter have independent
effects on the included and excluded residuals.
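
The independence of the ratio from the error magnitude can be checked in a toy setting. The sketch below is an assumption-laden analogue, not the derivation of this paper: it uses a straight-line least-squares fit, a ratio of root-mean-square residuals (test set over working set), and the same pattern of unit random deviates scaled by different values of `sigma`. The names `fit_line`, `rms_residual` and `ratio_for_sigma` are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rms_residual(xs, ys, a, b):
    """Root-mean-square residual of the line (a, b) over the given points."""
    total = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))


def ratio_for_sigma(sigma, n_work=40, n_test=10, seed=1):
    """Test/working RMS residual ratio after fitting to the working set."""
    # Same seed for every call: identical x values and unit deviates,
    # so only the magnitude of the errors changes between calls.
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_work + n_test)]
    unit = [rng.gauss(0.0, 1.0) for _ in xs]
    ys = [1.0 + 2.0 * x + sigma * e for x, e in zip(xs, unit)]
    xw, yw = xs[:n_work], ys[:n_work]
    xt, yt = xs[n_work:], ys[n_work:]
    a, b = fit_line(xw, yw)  # fitted against the working set only
    return rms_residual(xt, yt, a, b) / rms_residual(xw, yw, a, b)


ratios = [ratio_for_sigma(s) for s in (0.1, 1.0, 10.0)]
print(ratios)
```

Because a linear least-squares fit makes both sets of residuals scale in proportion to the errors, the three ratios agree to within rounding. The test-set and working-set errors here are independent by construction; the sketch does not cover errors that affect both sets in a correlated way.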