Abstract
A simple method is presented to assess the information that is provided
by distance constraints for pairs of residues in proteins. The probability
that the distance d_ij between the Cα atoms
of residues i and j lies within a given range is computed
for all N(N-1)/2 pairs in a molecule of N residues, and a
quantity H is defined in terms of these probabilities; H
is a measure of the ambiguity in the computed conformation of the molecule
(consistent with the given distance constraints) and is related to the
root-mean-square deviation of the computed conformation from the native
one. The quantity H is used to determine the number, kind, and quality
of the distance constraints required to define the conformation of a protein
within given limits of error, using the 58-residue molecule bovine pancreatic
trypsin inhibitor as an illustration. For example, to obtain the computed
conformation with a root-mean-square deviation of less than 2 Å from the
native conformation, the values of d_ij of more than ~80
pairs (half of them with 5 ≤ |i-j| ≤ 20 and the other half with
21 ≤ |i-j| ≤ 57) must be known exactly, or of more than ~150
pairs (half of them with 5 ≤ |i-j| ≤ 20 and the other half with
21 ≤ |i-j| ≤ 57) must be known with an error no greater than ~2
Å; alternatively, the same root-mean-square deviation of less than 2 Å
from the native structure can be achieved by the computed conformation
if more than ~160 pairs are chosen so that 20 Å is assigned as the lower
limit for half of these d_ij's (for those pairs in the
native protein that are separated by at least 20 Å) and 10 Å is assigned as the
upper limit for the other half of these d_ij's (for
those pairs in the native protein that are separated by at most 10 Å). In all
of the above examples, all values of d_{i,i+1} were
fixed at 3.8 Å, and all values of d_{i,i+2} were confined
to the range 4.5-7.2 Å (the minimum and maximum possible values for a polypeptide
chain). We also examined the kind of constraints (in terms of their
distance both along the chain and through space) that are most effective
in obtaining a small root-mean-square deviation. For a given number of constraints,
information about pairs with large |i-j| or small d_ij
is more effective in determining the conformation than is information
about pairs with small |i-j| or large d_ij.
It is found, however, that information that includes both small and large
|i-j| or both small and large d_ij is the most
effective.
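
To put the constraint counts quoted above in context, the following minimal sketch enumerates all N(N-1)/2 residue pairs of a 58-residue chain and groups them by their separation |i-j| along the chain. It is only an illustration of the bookkeeping, not the authors' method: the function names and class labels are hypothetical, and the quantity H is not computed because its definition is not given here.

```python
from itertools import combinations

N_BPTI = 58  # chain length of bovine pancreatic trypsin inhibitor, per the abstract


def separation_class(i, j, n):
    """Label a residue pair by |i - j|. The labels are hypothetical names."""
    sep = abs(i - j)
    if sep == 1:
        return "fixed_3.8"       # d_{i,i+1} fixed at 3.8 angstroms
    if sep == 2:
        return "range_4.5_7.2"   # d_{i,i+2} confined to 4.5-7.2 angstroms
    if 5 <= sep <= 20:
        return "sep_5_20"        # first class of sampled pairs
    if 21 <= sep <= n - 1:
        return "sep_21_plus"     # second class (21 to 57 when n = 58)
    return "other"               # |i - j| of 3 or 4: in neither sampled class


def pair_counts(n=N_BPTI):
    """Count the residue pairs of an n-residue chain in each separation class."""
    counts = {}
    for i, j in combinations(range(1, n + 1), 2):
        label = separation_class(i, j, n)
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    counts = pair_counts()
    total = sum(counts.values())
    print("total pairs:", total)
    for label, count in sorted(counts.items()):
        print(f"{label}: {count}")
    # Fraction of all pairs covered by 80 exactly known distances.
    print("fraction for 80 constraints:", round(80 / total, 3))
```

For N = 58 this gives 1653 pairs in total, of which 728 have 5 ≤ |i-j| ≤ 20 and 703 have 21 ≤ |i-j| ≤ 57, so ~80 exactly known distances amount to roughly 5% of all pairs.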