The statistics of cross-validation residuals

The last five years have seen a large increase in the use of cross-validation in the refinement of macromolecular structures against X-ray data. In this technique a test set of reflections is set aside from the working set, and the progress of the refinement is monitored by calculating a free R-factor, which is based only on the excluded reflections. This paper gives estimates of the ratio of the free R-factor to the R-factor calculated from the working set, for both unrestrained and restrained refinement. It is assumed that both the X-ray and restraint observations have been correctly weighted and that there is no correlation of errors between the test and working sets. It is also shown that the least-squares weights that minimise the variances of the refined parameters also approximately minimise the free R-factor. The estimated free R-factor ratios are compared with those reported for structures in the Protein Data Bank.
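As an illustration of the quantities discussed here, the following minimal Python sketch sets aside a test set of reflections and computes the working R-factor, the free R-factor, and their ratio. It assumes the conventional unweighted definition R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|; the function names, the 10% test fraction, and the random selection scheme are illustrative assumptions and are not taken from the paper.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional unweighted R-factor: sum(||Fobs| - |Fcalc||) / sum(|Fobs|)."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_reflections(n_reflections, test_fraction=0.1, seed=0):
    """Randomly assign reflection indices to a working set and a test set.

    The 10% default is an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_test = max(1, int(round(n_reflections * test_fraction)))
    test_idx = sorted(indices[:n_test])
    work_idx = sorted(indices[n_test:])
    return work_idx, test_idx


def free_r_ratio(f_obs, f_calc, work_idx, test_idx):
    """Return (R_work, R_free, R_free / R_work) for the given index sets."""
    r_work = r_factor([f_obs[i] for i in work_idx], [f_calc[i] for i in work_idx])
    r_free = r_factor([f_obs[i] for i in test_idx], [f_calc[i] for i in test_idx])
    return r_work, r_free, r_free / r_work
```

In an actual refinement the test reflections must be excluded from the target function throughout; the sketch only shows how the two residuals are evaluated once the calculated amplitudes are available. The ratio is undefined when the working R-factor is zero.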