On the use of distance constraints to fold a protein
Wako, H. & Scheraga, H.A.
Macromolecules (1981) 14, 961-969.

Abstract 
A simple method is presented to assess the information provided by distance constraints on pairs of residues in proteins. The probability that the distance d_ij between the Cα atoms of residues i and j lies within a given range is computed for all N(N-1)/2 pairs in a molecule of N residues, and a quantity H is defined in terms of these probabilities. H is a measure of the ambiguity in the computed conformation of the molecule (the conformation consistent with the given distance constraints) and is related to the root-mean-square deviation of the computed conformation from the native one. Using the 58-residue molecule bovine pancreatic trypsin inhibitor (BPTI) as an illustration, H is used to determine the number, kind, and quality of the distance constraints required to define the conformation of a protein within given limits of error. For example, a computed conformation with a root-mean-square deviation of less than 2 Å from the native conformation is obtained if the values of d_ij for more than about 80 pairs (half with 5 ≤ |i-j| ≤ 20 and half with 21 ≤ |i-j| ≤ 57) are known exactly, or if those for more than about 150 pairs (with the same split) are known with an error no greater than about 2 Å. Alternatively, the same root-mean-square deviation of less than 2 Å can be achieved if more than about 160 pairs are chosen so that 20 Å is assigned as the lower limit of d_ij for half of them (pairs separated by more than 20 Å in the native protein) and 10 Å is assigned as the upper limit for the other half (pairs separated by less than 10 Å in the native protein). In all of these examples, every d_{i,i+1} was fixed at 3.8 Å, and every d_{i,i+2} was confined to the range 4.5-7.2 Å (the minimum and maximum possible values for a polypeptide chain).
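To make the pair bookkeeping concrete, the sketch below enumerates the N(N-1)/2 pairs for a 58-residue chain such as BPTI, counts the two sequence-separation bands used in the examples, and draws a constraint set split evenly between them. The random choice of pairs and the helper names (`pairs_by_separation`, `choose_constraint_pairs`) are illustrative assumptions, not the authors' selection procedure.

```python
import random


def pairs_by_separation(n, lo, hi):
    """All residue pairs (i, j), i < j, residues numbered 1..n, with lo <= |i - j| <= hi."""
    return [(i, j)
            for i in range(1, n + 1)
            for j in range(i + 1, n + 1)
            if lo <= j - i <= hi]


def choose_constraint_pairs(n, n_pairs, seed=0):
    """Pick n_pairs pairs: half with 5 <= |i-j| <= 20, the rest with 21 <= |i-j| <= n-1.

    Random sampling is a hypothetical stand-in for however the pairs are actually chosen.
    """
    rng = random.Random(seed)
    medium = pairs_by_separation(n, 5, 20)
    long_range = pairs_by_separation(n, 21, n - 1)
    half = n_pairs // 2
    return rng.sample(medium, half) + rng.sample(long_range, n_pairs - half)


N = 58  # residues in BPTI
all_pairs = pairs_by_separation(N, 1, N - 1)
print(len(all_pairs))                        # 1653 = 58 * 57 / 2
print(len(pairs_by_separation(N, 5, 20)))    # 728 pairs in the 5..20 band
print(len(pairs_by_separation(N, 21, 57)))   # 703 pairs in the 21..57 band
selected = choose_constraint_pairs(N, 80)
print(len(selected))                         # 80
```

For N = 58 the long-range band 21 ≤ |i-j| ≤ 57 runs all the way to N - 1, so the two bands together cover every pair with |i-j| ≥ 5.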
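The abstract does not say how the probabilities are obtained. One simple way to estimate the probability that d_ij falls within a range, shown here as a substitute technique rather than the authors' method, is Monte Carlo sampling of random Cα traces that satisfy only the local constraints (d_{i,i+1} = 3.8 Å, 4.5 ≤ d_{i,i+2} ≤ 7.2 Å). Sampling d_{i,i+2} and the dihedral angle uniformly, and ignoring excluded volume, are assumptions of this sketch; the function names are hypothetical.

```python
import math
import random

BOND = 3.8                    # fixed d(i,i+1) in angstroms
D13_MIN, D13_MAX = 4.5, 7.2   # allowed range of d(i,i+2) in angstroms


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    length = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / length, a[1] / length, a[2] / length)


def _angle_from_d13(d13):
    # Law of cosines: two sides of length BOND enclosing the virtual bond angle.
    return math.acos(1.0 - d13 * d13 / (2.0 * BOND * BOND))


def random_chain(n, rng):
    """Random C-alpha trace of n >= 3 residues; local constraints only, no excluded volume."""
    theta = _angle_from_d13(rng.uniform(D13_MIN, D13_MAX))
    coords = [(0.0, 0.0, 0.0), (BOND, 0.0, 0.0),
              (BOND - BOND * math.cos(theta), BOND * math.sin(theta), 0.0)]
    while len(coords) < n:
        a, b, c = coords[-3], coords[-2], coords[-1]
        theta = _angle_from_d13(rng.uniform(D13_MIN, D13_MAX))
        phi = rng.uniform(-math.pi, math.pi)  # uniform dihedral (assumption)
        bc = _unit(_sub(c, b))
        nrm = _unit(_cross(_sub(b, a), bc))
        m = _cross(nrm, bc)
        # NeRF placement: bond length BOND, angle theta at c, dihedral phi about b-c.
        x = -BOND * math.cos(theta)
        y = BOND * math.sin(theta) * math.cos(phi)
        z = BOND * math.sin(theta) * math.sin(phi)
        coords.append(tuple(c[k] + x * bc[k] + y * m[k] + z * nrm[k] for k in range(3)))
    return coords


def prob_in_range(n, i, j, lo, hi, samples=1000, seed=0):
    """Monte Carlo estimate of P(lo <= d_ij <= hi); i and j are 0-based indices."""
    rng = random.Random(seed)
    hits = sum(lo <= math.dist(xyz[i], xyz[j]) <= hi
               for xyz in (random_chain(n, rng) for _ in range(samples)))
    return hits / samples


# Estimated probability that the ends of a 58-residue trace lie within 20 angstroms.
print(prob_in_range(58, 0, 57, 0.0, 20.0))
```

Without long-range constraints these probabilities are broad for large |i-j|; adding constraints would narrow them, which is the kind of change H is meant to quantify.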
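The abstract defines H only as a quantity built from these probabilities. A minimal sketch, assuming H is a Shannon-type entropy summed over pairs (with each pair's distance range split into bins), shows the intended behaviour: H is zero when every distance is pinned down and grows as the distances become more ambiguous. The entropy form and the name `ambiguity_H` are assumptions for illustration; the paper's exact definition may differ.

```python
import math


def ambiguity_H(prob_table):
    """Sum over residue pairs of the Shannon entropy of each pair's bin probabilities.

    prob_table maps a pair (i, j) to a list of probabilities over distance bins.
    This entropy form is an assumed stand-in for the paper's H.
    """
    total = 0.0
    for pair, probs in prob_table.items():
        if abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError(f"probabilities for {pair} do not sum to 1")
        # Empty bins contribute nothing (p ln p -> 0 as p -> 0).
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total


# Every distance pinned to one bin: no ambiguity.
known = {(1, 10): [0.0, 1.0, 0.0], (3, 40): [1.0, 0.0, 0.0]}
# Distances spread over several bins: positive ambiguity.
vague = {(1, 10): [0.25, 0.5, 0.25], (3, 40): [1 / 3, 1 / 3, 1 / 3]}
print(ambiguity_H(known))  # 0.0
print(ambiguity_H(vague))
```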
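The constraint types in the examples (exact distances, distances known within an error, lower limits such as 20 Å, upper limits such as 10 Å, and the fixed local chain constraints) can all be written as intervals [lower, upper] on d_ij. The sketch below, with hypothetical names, encodes them that way, checks a set of Cα coordinates against them, and uses the law of cosines to convert the 4.5-7.2 Å range of d_{i,i+2} into the implied Cα virtual bond angle (about 72.6° to 142.7°).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DistanceConstraint:
    i: int          # 0-based residue indices
    j: int
    lower: float    # angstroms
    upper: float    # angstroms; math.inf if unbounded


def exact(i, j, d):
    return DistanceConstraint(i, j, d, d)


def with_error(i, j, d, err):
    return DistanceConstraint(i, j, max(0.0, d - err), d + err)


def at_least(i, j, d):
    return DistanceConstraint(i, j, d, math.inf)


def at_most(i, j, d):
    return DistanceConstraint(i, j, 0.0, d)


def chain_constraints(n):
    """Local constraints applied in all of the examples: d(i,i+1) = 3.8, 4.5 <= d(i,i+2) <= 7.2."""
    cons = [exact(i, i + 1, 3.8) for i in range(n - 1)]
    cons += [DistanceConstraint(i, i + 2, 4.5, 7.2) for i in range(n - 2)]
    return cons


def violations(coords, constraints, tol=1e-6):
    """Return the constraints that coords (a list of (x, y, z)) fail to satisfy."""
    bad = []
    for c in constraints:
        d = math.dist(coords[c.i], coords[c.j])
        if d < c.lower - tol or d > c.upper + tol:
            bad.append(c)
    return bad


def virtual_angle_deg(d13, bond=3.8):
    """C-alpha virtual bond angle implied by d(i,i+2), via the law of cosines."""
    return math.degrees(math.acos(1.0 - d13 ** 2 / (2.0 * bond ** 2)))


print(virtual_angle_deg(4.5), virtual_angle_deg(7.2))
# A fully extended trace has d(i,i+2) = 7.6 A, so it breaks every i,i+2 constraint.
straight = [(3.8 * k, 0.0, 0.0) for k in range(5)]
print(len(violations(straight, chain_constraints(5))))
```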
We also examined which kinds of constraints (in terms of separation both along the chain and through space) are most effective in achieving a small root-mean-square deviation. For a given number of constraints, information about pairs with large |i-j| or small d_ij determines the conformation more effectively than information about pairs with small |i-j| or large d_ij. It was found, however, that information including both small and large |i-j|, or both small and large d_ij, is the most effective.
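One way to act on the finding that mixtures of short- and long-range information work best is to select constraints round-robin across categories of sequence separation and native distance. In this sketch the cut-offs (|i-j| ≤ 20 versus > 20, d_ij ≤ 10 Å versus > 10 Å) are borrowed from the bands and limits in the examples above but are otherwise assumptions, as are the function names; the abstract does not describe a selection algorithm.

```python
import random
from collections import defaultdict


def category(i, j, d, seq_cut=20, dist_cut=10.0):
    """Label a pair by sequence separation and native distance (cut-offs are assumptions)."""
    seq = "short-seq" if abs(i - j) <= seq_cut else "long-seq"
    space = "short-dist" if d <= dist_cut else "long-dist"
    return seq, space


def mixed_selection(native_pairs, n_pairs, seed=0, **cuts):
    """Draw up to n_pairs pairs spread as evenly as possible over the four categories.

    native_pairs: iterable of (i, j, d_native) tuples, d_native in angstroms.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for i, j, d in native_pairs:
        buckets[category(i, j, d, **cuts)].append((i, j, d))
    for bucket in buckets.values():
        rng.shuffle(bucket)
    keys = sorted(buckets)
    chosen = []
    # Round-robin over categories so that no single kind of information dominates.
    while len(chosen) < n_pairs and any(buckets[k] for k in keys):
        for k in keys:
            if buckets[k] and len(chosen) < n_pairs:
                chosen.append(buckets[k].pop())
    return chosen


# Toy input (not BPTI data): ten pairs in each of the four categories.
data = ([(i, i + 5, 6.0) for i in range(10)] + [(i, i + 5, 25.0) for i in range(10)]
        + [(i, i + 30, 6.0) for i in range(10)] + [(i, i + 30, 25.0) for i in range(10)])
picked = mixed_selection(data, 8)
print([category(i, j, d) for i, j, d in picked])
```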
