previous section  Discussion  The statistics of cross-validation residuals - Discussion


Observed Rfree ratios from the Protein Data Bank

We have examined $R_{\mathrm{free}}$ ratios for crystal structures in the Protein Data Bank (Bernstein et al., 1977) as of 1 June 1997. Figure 1 shows a plot of the $R_{\mathrm{free}}$ ratio as a function of $N_{\mathrm{atoms}}/f$, where $N_{\mathrm{atoms}}$ is the number of atoms included in the refinement and $f$ is the number of reflections used, for the 725 macromolecular structures for which all these values are reported. The points are colour-coded according to resolution range.

We write the $R_{\mathrm{free}}$ ratio as $y = R_{\mathrm{free}}/R$ and the number of atoms per reflection as $x = N_{\mathrm{atoms}}/f$. Values of $y$ range from about 0.8 to 1.8. By substituting tex2html_wrap_inline1526 into equation (15), we have

\[ y = \sqrt{\frac{1 + ax}{1 - ax}} \tag{16} \]

Figure 1 shows the curves corresponding to equation (16) for different values of a.
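
As a rough illustration of how such curves can be generated, the sketch below evaluates the ratio for a few values of a. It assumes that equation (16) has the form y = sqrt((1 + ax)/(1 - ax)), which is the form consistent with the linear relation z = ax used later in this section; the function name and the sample values of x and a are hypothetical, not taken from the survey.

```python
import math


def expected_ratio(x, a):
    """Assumed form of equation (16): y = sqrt((1 + a*x) / (1 - a*x)).

    x is the number of atoms per reflection and a is the slope parameter.
    """
    ax = a * x
    if not 0.0 <= ax < 1.0:
        raise ValueError("a*x must lie in [0, 1) for the ratio to be defined")
    return math.sqrt((1.0 + ax) / (1.0 - ax))


# Hypothetical sample points along the x axis, for illustration only.
for a in (1, 2, 4):
    curve = [round(expected_ratio(x, a), 3) for x in (0.02, 0.05, 0.10)]
    print(f"a = {a}: {curve}")
```

Under this assumed form the ratio is 1 at x = 0 and rises without bound as ax approaches 1, which is why the guard on ax is included.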

In order to make the comparison between experimental and theoretical values easier, a function of y was sought which is a linear function of x. By squaring and rearranging the terms in equation (16) we arrive at

\[ z = ax \]

where

\[ z = \frac{y^2 - 1}{y^2 + 1} \]
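
The linearising transform can be checked numerically. This is a minimal sketch assuming z = (y^2 - 1)/(y^2 + 1), which is what squaring and rearranging y = sqrt((1 + ax)/(1 - ax)) gives; the function name and the test values are hypothetical.

```python
import math


def linearise(y):
    """Map an R_free ratio y to z = (y^2 - 1) / (y^2 + 1) (assumed form)."""
    y2 = y * y
    return (y2 - 1.0) / (y2 + 1.0)


# Round trip: build y from chosen a and x, then confirm that z recovers a*x.
a, x = 2, 0.1
y = math.sqrt((1.0 + a * x) / (1.0 - a * x))
print(linearise(y), a * x)
```

With this transform, the observed range of y from about 0.8 to 1.8 corresponds to z between roughly -0.22 and 0.53.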

Figure 2 shows a plot of z against x where the points are colour-coded as in Figure 1. The coloured straight lines in Figure 2 are least-squares lines fitted to the data points in the particular resolution range represented by points of the same colour. For example, the pink triangular points represent data between 3 and 4 Å resolution and the pink line is the least-squares line through the pink triangular points. The pecked black lines emanating from the origin in Figure 2 are plots of z=ax for the same values of a as shown in Figure 1.
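
The per-range fitting can be sketched as follows. This assumes an ordinary least-squares line with a free intercept for each resolution range; the text does not say whether the fits were constrained or weighted, and the bin labels and data points here are synthetic, for illustration only.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least-squares fit of z = slope * x + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        raise ValueError("all x values are identical")
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in points)
    slope = sxz / sxx
    return slope, mean_z - slope * mean_x


def fit_by_bin(records):
    """Group (bin_label, x, z) records by bin and fit one line per bin."""
    groups = defaultdict(list)
    for label, x, z in records:
        groups[label].append((x, z))
    return {label: fit_line(pts) for label, pts in groups.items()}


# Synthetic records: (resolution bin label, x, z).
records = [
    ("3.0-4.0", 0.05, 0.05), ("3.0-4.0", 0.10, 0.10), ("3.0-4.0", 0.20, 0.20),
    ("2.0-2.5", 0.05, 0.11), ("2.0-2.5", 0.10, 0.21), ("2.0-2.5", 0.20, 0.41),
]
fits = fit_by_bin(records)
print(fits)
```

Leaving the intercept free makes it possible to see how close each fitted line passes to the origin, which is the property the z=ax lines would predict.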

We requested information from some of the authors whose structures were outliers in Figure 2. It became apparent that very unusual $R_{\mathrm{free}}$ ratios are normally not the result of careful refinement protocols. The coloured lines were therefore plotted ignoring the points in the darker regions outside the sector bounded by the black lines of slopes 0.5 and 10. The choice of these slopes as cutoffs was somewhat arbitrary, but the removal of these outliers caused the coloured lines to pass nearer to the origin.
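
The sector cutoff can be expressed as a simple filter applied before fitting. The sketch below assumes that a point is kept when 0.5x <= z <= 10x with x > 0; whether the boundary lines themselves count as inside is an assumption, and the sample points are synthetic.

```python
def in_sector(x, z, low_slope=0.5, high_slope=10.0):
    """Return True if (x, z) lies between the lines z = low_slope*x and z = high_slope*x."""
    if x <= 0.0:
        return False
    return low_slope * x <= z <= high_slope * x


def drop_outliers(points, low_slope=0.5, high_slope=10.0):
    """Keep only the (x, z) points inside the sector (boundaries included by assumption)."""
    return [(x, z) for x, z in points if in_sector(x, z, low_slope, high_slope)]


# Synthetic points: one inside the sector, one below it, one above it, one with z < 0.
sample = [(0.10, 0.20), (0.10, 0.01), (0.05, 0.60), (0.08, -0.05)]
kept = drop_outliers(sample)
print(kept)
```

Note that any structure with a ratio below 1 has negative z and therefore falls outside the sector under this rule.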

The plots of z=ax represent refinement regimes with different numbers of parameters per atom. The gradients of the lines (a) increase with the number of parameters per atom. In the absence of relevant information in the Data Bank, it was assumed that in restrained refinements, tex2html_wrap_inline1548 and tex2html_wrap_inline1550 . These estimates ignore temperature factor restraints (if any) because our survey of the latter revealed widely different restraint protocols. Using these values, for restrained refinements

displaymath491

It can be seen that the z=2x line (isotropic temperature factors) passes through the constellations of orange crosses and pink triangular points, representing structures between 2.5 and 1.5 Å resolution, and is close to the green pecked line (2.5-2.0 Å data). Similarly, the z=x line (overall temperature factor) lies close to the pink line which is fitted to the 4 to 3 Å data. Even in the absence of details of restraint procedures, the z=ax lines can be seen to pass through areas of the plot where the particular refinement regime is most relevant. The large spread of values about the straight lines is unlikely to be solely a statistical effect and may well say something about the quality of the refinements.

Comparison of the lines z=2x and z=4x, which differ only in respect of restraints, shows how restraints lower the $R_{\mathrm{free}}$ ratio. Non-crystallographic symmetry (NCS, see Introduction) might give rise to lower than predicted $R_{\mathrm{free}}$ ratios. However, a check on structures in our plots which exhibit NCS did not reveal any obvious systematic effects.
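
To give a sense of the size of the restraint effect, take a hypothetical structure with x = 0.1 (one atom per ten reflections) and assume that equation (16) has the form y = sqrt((1 + ax)/(1 - ax)). The two regimes then give

\[ y_{a=2} = \sqrt{\frac{1 + 0.2}{1 - 0.2}} = \sqrt{1.5} \approx 1.22, \qquad y_{a=4} = \sqrt{\frac{1 + 0.4}{1 - 0.4}} \approx \sqrt{2.33} \approx 1.53 \]

so that, under these assumptions, the restrained line predicts a ratio about 0.3 lower than the unrestrained one at the same x.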

