The statistics of cross-validation residuals
One of the problems in macromolecular crystallography is that the crystallographer cannot always be sure that an apparently fully refined structure is free from large systematic errors. The agreement between the model of the molecular structure and the X-ray diffraction data from which it has been derived is measured by the crystallographic R-factor, but it is well known that structures with acceptable values of this parameter can have significant errors (Brändén & Jones, 1990; Kleywegt & Jones, 1995a). The R-factor is susceptible to manipulation by leaving out weak data or by overfitting the data with too many parameters, and so is not a completely reliable guide to accuracy. In small-molecule crystallography, where the number of X-ray intensity observations usually exceeds the number of parameters in the model by at least an order of magnitude, the R-factor is a more reliable guide to both accuracy and precision.
In 1992 Brünger introduced the idea of the free R-factor, $R_{\mathrm{free}}$
(Brünger, 1992, 1993), based on the standard statistical modelling technique of
jack-knifing or cross-validatory residuals (McCullagh & Nelder,
1983). $R_{\mathrm{free}}$ is the same as the conventional R-factor, but
based on a test set consisting of a small percentage (usually
5-10%) of reflections excluded from a structure refinement. The
remaining reflections included in the refinement are known as the
working set. The $R_{\mathrm{free}}$ value, unlike the R-factor, cannot be
driven down by refining a false model because the reflections on which
it is based are excluded from this process. $R_{\mathrm{free}}$ is
expected to decrease only during the course of a successful refinement.
Consequently, a high value of this statistic and a concomitant low
value of R may indicate an inaccurate model. The procedure assumes
that the reflections removed for the cross-validation test have been
randomly selected and have errors uncorrelated with those that remain
in the set used in the refinement. This assumption may be partly
invalidated by the presence of non-crystallographic symmetry. Ideally,
the refinement should be repeated several times removing
non-overlapping sets of reflections each time.
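
The bookkeeping described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular refinement program. It assumes the usual definition of the residual, R = Σ|F_obs − F_calc| / Σ F_obs over structure-factor amplitudes, with the scale factor between observed and calculated amplitudes omitted for brevity. The function names and the toy amplitudes are hypothetical. No refinement is performed here, so the working-set and test-set residuals differ only by sampling noise; the sketch shows only how the reflections are divided into non-overlapping test sets and how each residual is evaluated.

```python
import random


def r_factor(f_obs, f_calc):
    """Residual R = sum(|Fobs - Fcalc|) / sum(Fobs); scale factor omitted (assumption)."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(abs(o) for o in f_obs)


def disjoint_test_sets(n_reflections, n_sets, seed=0):
    """Shuffle the reflection indices and deal them into non-overlapping test sets."""
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[k::n_sets]) for k in range(n_sets)]


# Hypothetical toy amplitudes; real values would come from the data and the model.
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(200)]
f_calc = [f * rng.uniform(0.8, 1.2) for f in f_obs]

# 20 non-overlapping sets of 10 reflections: each one is a 5% test set.
test_sets = disjoint_test_sets(len(f_obs), n_sets=20)

for test in test_sets[:3]:
    held_out = set(test)
    work = [i for i in range(len(f_obs)) if i not in held_out]
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in test], [f_calc[i] for i in test])
    print(f"working-set R = {r_work:.3f}, test-set R_free = {r_free:.3f}")
```

In an actual cross-validation run the model would be refined against each working set in turn before the corresponding test-set residual is evaluated.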
$R_{\mathrm{free}}$ is highly correlated with the phase accuracy of the atomic
model (Brünger, 1992, 1993) and can detect various types of errors
in the structure including phase errors and partial mistracing of the
structure. It has also been used in evaluating different refinement
protocols, such as the optimization of the weights used during
refinement. It is particularly useful in preventing the overfitting of
data (Kleywegt & Brünger, 1996).
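
The way a cross-validation residual exposes overfitting can be illustrated with a toy curve-fitting analogue. This is not a crystallographic refinement, and every name and number in it is hypothetical. Noisy observations of a straight line are fitted twice: once with a two-parameter least-squares line, and once with an interpolating polynomial that has as many parameters as working-set observations. The residual is computed in the same form as the R-factor, on the fitted (working) points and on held-out (test) points.

```python
import random


def r_factor(observed, calculated):
    """Residual of the same form as the R-factor: sum(|obs - calc|) / sum(|obs|)."""
    return sum(abs(o - c) for o, c in zip(observed, calculated)) / sum(abs(o) for o in observed)


def fit_line(xs, ys):
    """Two-parameter least-squares straight line; returns a callable model."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: y_mean + slope * (x - x_mean)


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every working point: one parameter per observation."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return model


rng = random.Random(0)


def noisy(x):
    """Hypothetical 'true' model 50 + 2x with unit Gaussian noise added."""
    return 50.0 + 2.0 * x + rng.gauss(0.0, 1.0)


x_work = [float(i) for i in range(20)]   # working set
x_test = [i + 0.5 for i in range(19)]    # test set, never used in fitting
y_work = [noisy(x) for x in x_work]
y_test = [noisy(x) for x in x_test]

results = {}
for name, fit in (("line", fit_line), ("interpolant", fit_interpolant)):
    model = fit(x_work, y_work)
    r_work = r_factor(y_work, [model(x) for x in x_work])
    r_free = r_factor(y_test, [model(x) for x in x_test])
    results[name] = (r_work, r_free)
    print(f"{name:12s} working residual = {r_work:.4f}, test residual = {r_free:.4f}")
```

The interpolant reproduces the working points exactly, so its working residual is zero, yet its test residual is far worse than that of the two-parameter line. In this analogue, only the test residual reveals that the extra parameters were fitted to noise.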
Kleywegt & Jones (1995a, b) have shown that with low-resolution data
it is possible to completely mistrace a structure, deliberately tracing it
backwards through the density, and still achieve an acceptable
R-factor. $R_{\mathrm{free}}$, on the other hand, could not be duped so easily,
and remained at a high value, close to that expected for a random set of
scatterers, throughout the refinement.
$R_{\mathrm{free}}$ is thus a valuable guide to the progress of
refinement, particularly for low-resolution data, and its use and
publication are widely encouraged. A recent review (Kleywegt & Brünger,
1996) indicated that use of the measure is becoming more widespread:
it was reported in 44% of articles describing macromolecular X-ray
structures.
However, the usefulness of $R_{\mathrm{free}}$ is limited because what counts as
an "acceptable" value is often not evident. One would expect
$R_{\mathrm{free}}$ to always be higher than R even when there are no
systematic errors in the model structure, but it is not clear how much
higher it should be. At present we merely have a number of rules of
thumb (Kleywegt & Brünger, 1996).
Cruickshank has estimated the expected value of the free R-factor (EFRF)
as a function of the number of observations, the number of
parameters, and the conventional R-factor, R (Dodson, Kleywegt &
Wilson, 1996). Bacchi, Lamzin & Wilson (1996) use this estimate in an
extension of the self-validation Hamilton test to assess the significance
of any observed drop in $R_{\mathrm{free}}$ during refinement.
The need for more understanding of the behaviour of $R_{\mathrm{free}}$ was
highlighted by Dodson, Kleywegt & Wilson (1996). In spite of the
enthusiasm for its use, actual applications of $R_{\mathrm{free}}$ have
remained somewhat subjective without an understanding of its
statistical basis. For example, if non-crystallographic symmetry (NCS)
constraints are relaxed during a structure refinement, how much should
$R_{\mathrm{free}}$ rise during subsequent refinement if the restrained model
is correct? Without understanding how $R_{\mathrm{free}}$ varies as a function
of the number of restraints and/or number of parameters it is only
possible to make rather subjective judgements.
This paper begins to answer these questions by deriving the expected
value of the free residual, from which estimates of both $R_{\mathrm{free}}$ and the
ratio of $R_{\mathrm{free}}$ to R are calculated.