The statistics of cross-validation residuals - Theory


Error assumptions

The expected or estimated values of residuals and R-factors will be derived on the assumption that the weights used in the structure refinement correctly reflect the errors. These include not only experimental errors in measuring X-ray intensities but also errors in the functional form of the structure-factor model that produce random, uncorrelated perturbations in the residuals. Such model errors, which may arise from complicated atomic disorder, are an important source of random error in protein structures and are the reason why R-factors of refined macromolecular structures are usually higher than those of their small-molecule counterparts. The derivation of the statistics in this paper assumes that these random errors have been correctly accounted for in the weighting of the X-ray data and of any restraints used in refinement.
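As a hedged illustration of why the weights must cover model error as well as experimental error, the Python sketch below assumes Gaussian, independent experimental and model errors that add in variance, and weights of the form w = 1/σ². The function name `mean_weighted_square` and its parameter values are hypothetical, chosen only for the demonstration. With weights built from the total variance, the mean weighted squared residual should come out close to 1; with weights built from the experimental error alone, it is inflated by roughly (σ_exp² + σ_model²)/σ_exp².

```python
import random
import statistics


def mean_weighted_square(n=20000, sigma_exp=1.0, sigma_model=2.0, seed=1):
    """Simulate residuals and return the mean of w * r**2 for two weighting schemes.

    Assumption (illustrative only): each residual is the sum of an independent
    Gaussian experimental error and an independent Gaussian model error.
    """
    rng = random.Random(seed)
    residuals = [rng.gauss(0.0, sigma_exp) + rng.gauss(0.0, sigma_model)
                 for _ in range(n)]
    # Weights that reflect the total error (experimental + model)
    w_total = 1.0 / (sigma_exp ** 2 + sigma_model ** 2)
    # Weights that reflect the experimental error only
    w_exp = 1.0 / sigma_exp ** 2
    total = statistics.fmean(w_total * r * r for r in residuals)
    exp_only = statistics.fmean(w_exp * r * r for r in residuals)
    return total, exp_only


total, exp_only = mean_weighted_square()
```

With the default values (σ_exp = 1, σ_model = 2), `total` should be near 1 and `exp_only` near 5, because weights based on the experimental error alone ignore four fifths of the variance.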

Some model errors, such as the absence of a bulk-solvent correction, lead to correlated errors in reciprocal space. In the theory presented in this paper, such correlation could be accommodated if refinement used a weight matrix with off-diagonal terms, but in practice computational difficulties preclude the use of such matrices in macromolecular refinement. All expressions derived in this paper that use a diagonal weight matrix therefore assume that correlated errors are absent. Model errors such as missing or misplaced atoms are likewise assumed to be absent. On the other hand, no assumption is made about whether the reflection dataset is complete.
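The difference between a diagonal and a full weight matrix can be sketched with a weighted residual of the form rᵀWr. The sketch below assumes, as in generalized least squares, that the full weight matrix is the inverse of the error covariance matrix; the function names are hypothetical. The diagonal form needs n terms, whereas the full form needs n² terms, so for 10⁵ reflections a full matrix would have 10¹⁰ entries (about 80 GB in double precision), which illustrates the computational difficulty mentioned above.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def weighted_residual_full(r: Sequence[float], W: Matrix) -> float:
    """r^T W r with a full weight matrix: n * n terms."""
    n = len(r)
    return sum(r[i] * W[i][j] * r[j] for i in range(n) for j in range(n))


def weighted_residual_diag(r: Sequence[float], w: Sequence[float]) -> float:
    """sum_i w_i * r_i**2 with a diagonal weight matrix: n terms."""
    return sum(wi * ri * ri for wi, ri in zip(w, r))


def inverse_2x2(C: Matrix) -> list:
    """Inverse of a 2x2 matrix, used here to form W = C^-1 (assumption)."""
    (a, b), (c, d) = C
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Two residuals whose errors have unit variance and correlation 0.5
r = [1.0, 1.0]
C = [[1.0, 0.5], [0.5, 1.0]]
full = weighted_residual_full(r, inverse_2x2(C))  # accounts for the correlation
diag = weighted_residual_diag(r, [1.0, 1.0])      # ignores the correlation
```

Here `full` is about 1.333 and `diag` is 2.0, so treating correlated errors as independent overstates the weighted residual in this toy case. When the off-diagonal terms are zero, the two functions agree.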

For the model errors in the free reflections to be uncorrelated with those in the reflections used in refinement, no reflection in one set may be related to a reflection in the other set by crystallographic or non-crystallographic symmetry. Likewise, no reflection in the free set may be related to one in the main set by pseudosymmetry. Care is needed when selecting reflections from datasets in which Bijvoet pairs have been kept separate: the two members of a pair differ only through the anomalous-scattering contribution, so their errors are strongly correlated and both should be placed in the same set. Another case requiring care arises when domains or molecules in the asymmetric unit are related by a non-crystallographic axis lying along a rational direction in the crystal lattice (e.g. the pseudo-dyads in rhombohedral insulin).
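One way to respect these constraints is to assign free flags to whole equivalence classes of reflections rather than to individual reflections. The Python sketch below is a minimal illustration, not the procedure of any particular program. It assumes the caller supplies the symmetry operators as 3×3 integer matrices of finite order that act on Miller indices as row vectors, and it optionally folds Friedel/Bijvoet mates (h → −h) into the same class. All function names are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Sequence, Set, Tuple

HKL = Tuple[int, int, int]
Op = Sequence[Sequence[int]]  # 3x3 integer matrix acting on (h, k, l) as a row vector


def apply_op(op: Op, hkl: HKL) -> HKL:
    """Return the indices h' = h R for a row vector h and matrix R."""
    h, k, l = hkl
    return tuple(h * op[0][j] + k * op[1][j] + l * op[2][j] for j in range(3))


def orbit(hkl: HKL, ops: List[Op], friedel: bool = True) -> Set[HKL]:
    """All indices reachable from hkl under ops (and -hkl if friedel is True)."""
    seen = {hkl}
    stack = [hkl]
    while stack:
        cur = stack.pop()
        images = [apply_op(op, cur) for op in ops]
        if friedel:
            # Keep Bijvoet/Friedel mates in the same equivalence class
            images.append((-cur[0], -cur[1], -cur[2]))
        for img in images:
            if img not in seen:
                seen.add(img)
                stack.append(img)
    return seen


def canonical(hkl: HKL, ops: List[Op], friedel: bool = True) -> HKL:
    """Lexicographically largest member of the orbit, used as the class key."""
    return max(orbit(hkl, ops, friedel))


def assign_free_set(reflections: Iterable[HKL], ops: List[Op],
                    fraction: float = 0.05, seed: int = 0,
                    friedel: bool = True) -> Dict[HKL, bool]:
    """Flag whole equivalence classes as free (True) or main set (False)."""
    classes: Dict[HKL, List[HKL]] = {}
    for hkl in reflections:
        classes.setdefault(canonical(hkl, ops, friedel), []).append(hkl)
    keys = sorted(classes)
    if not keys:
        return {}
    rng = random.Random(seed)
    n_free = max(1, round(fraction * len(keys)))
    free_keys = set(rng.sample(keys, n_free))
    return {hkl: key in free_keys for key, members in classes.items() for hkl in members}


# Example: point group 2 with unique axis b (identity and a two-fold about b)
ops_2 = [((1, 0, 0), (0, 1, 0), (0, 0, 1)), ((-1, 0, 0), (0, 1, 0), (0, 0, -1))]
refl = [(1, 2, 3), (-1, 2, -3), (-1, -2, -3), (1, -2, 3), (2, 0, 0)]
flags = assign_free_set(refl, ops_2, fraction=0.5)
```

In the example, the four reflections related to (1, 2, 3) by the two-fold and by Friedel symmetry always receive the same flag. Non-crystallographic or pseudosymmetry operators that map integer indices onto integer indices could be appended to `ops` in the same way; operators that do not would need a tolerance-based matching step that is not shown here.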

