RESEARCH COMMUNICATION
a Department of Medicine and the Cancer Center, University of California, San Diego, La Jolla, California 92093-0058, USA
b San Diego Supercomputer Center, La Jolla, California 92093, USA
| ABSTRACT |
|---|
Key Words: protein model; loop generation; structural homologue; glycogen phosphorylase
| INTRODUCTION |
|---|
Obtaining structural information about hMSH2 is of interest for several reasons. First, mutations in hMSH2 account for approximately half of the mutations found in families with HNPCC (15–26). Second, because hMSH2 is involved in the initial recognition of the mismatch, its function is essential to the MMR process (4, 5, 27–29). Although the 3-dimensional (3D) structures of bacterial, yeast, and mammalian proteins have not been determined, the locations of the ATP binding and helix-turn-helix domains have been identified in the human homologue from amino acid sequence homology studies (8, 27–29). Inasmuch as protein structure is more conserved than protein sequence (30, 31), we have undertaken a study based on the hypothesis that information from structurally homologous proteins can be used to predict the 3D structure of hMSH2 by computer modeling and threading. We show here that by modeling the functionally identified areas of hMSH2 against proteins with similar domains and known 3D structure, and by highlighting the sites of the mutations found in HNPCC families, the distribution of the regions affected by these mutations may be visually recognized.
| METHODS |
|---|
Identification and alignment of structurally conserved regions
The multiple sequence alignment is returned automatically in the PredictProtein report and is built up in two steps (40). In sweep 1, sequences are aligned consecutively to the search sequence by a standard dynamic programming method. After each sequence has been added, a profile is compiled and used to align the next sequence. In sweep 2, after all sequences with significant structural homology have been picked from SWISSPROT (http://expasy.hcuge.ch/sprot/sprot-top.html), the profile is recompiled and the dynamic programming algorithm starts once again to align the sequences consecutively, this time using the conservation profile as derived after completion of sweep 1. The output consists of structurally homologous proteins with regions automatically aligned to hMSH2. In addition, the known and the predicted secondary structures of the PDB proteins and hMSH2 are shown. With this information, we manually highlighted areas of predicted secondary structure in hMSH2 that were identical to the known structural homologues: regions where PredictProtein predicted a helix in hMSH2 were highlighted if this same region was also a helix in the known structural homologue.
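The highlighting step was done by hand, but its rule can be illustrated in a few lines. The following is a minimal sketch, assuming the predicted and known secondary structures are available as two aligned strings of equal length in which `H` marks a helix; the function name `matching_helix_regions` and the toy strings are hypothetical and are not part of PredictProtein.

```python
def matching_helix_regions(predicted, known, min_len=1):
    """Return (start, end) pairs (0-based, end exclusive) for runs where
    both aligned strings carry the helix code 'H' at the same position."""
    if len(predicted) != len(known):
        raise ValueError("aligned strings must have equal length")
    regions = []
    start = None
    for i, (p, k) in enumerate(zip(predicted, known)):
        both_helix = p == "H" and k == "H"
        if both_helix and start is None:
            start = i  # a shared helix run begins here
        elif not both_helix and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(predicted) - start >= min_len:
        regions.append((start, len(predicted)))
    return regions


# Toy data only: H = helix, E = strand, - = other or gap.
predicted_hmsh2 = "--HHHHHH--EEE--HHHH-"
known_homologue = "-HHHHHH---EEE---HHHH"
print(matching_helix_regions(predicted_hmsh2, known_homologue))
```

Only positions that are helical in both strings are reported, which mirrors the conservative choice described above: a predicted helix is trusted only where the experimentally determined structure agrees.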
Assignment of coordinates
InsightII, a molecular modeling program from Biosym/Molecular Simulations (San Diego, Calif.), was used in combination with the downloaded hMSH2 sequence. The PDB files and images of the three best-fitting structural homologues of hMSH2 identified by PredictProtein were downloaded and individually aligned manually to hMSH2, according to the alignment suggested by PredictProtein. Boxes were created around the sequences that PredictProtein found in hMSH2 to be structurally homologous to the known protein. Each box was frozen and assigned coordinates based on the known reference protein. These coordinates were first transformed into the same coordinate frame as the hMSH2 model before being copied onto the model. All coordinates were transferred if the side chains of the reference and model proteins were at the same corresponding locations along the sequence of the structurally conserved region. However, if these locations differed, only the backbone coordinates were transferred and the side chain atoms were automatically replaced to preserve the hMSH2 model protein's residue types. These replaced residues were first aligned to the backbone of the original residue; the dihedral angles in common with the residue being replaced were also aligned. This allowed the conformation of the reference side chain to be preserved as much as possible.
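The transfer rule described above can be summarized as a small decision function. This is a sketch under stated assumptions, not the InsightII implementation: we read "side chains at the same corresponding locations" as "identical residue type at the aligned position", and the dictionary layout, the `BACKBONE_ATOMS` set, and the function name `transfer_coordinates` are all hypothetical.

```python
# Backbone atom names as used in PDB files.
BACKBONE_ATOMS = {"N", "CA", "C", "O"}


def transfer_coordinates(ref_residue, model_residue_type):
    """Copy coordinates from a reference residue onto a model residue.

    ref_residue: {"type": str, "atoms": {atom_name: (x, y, z)}}
    Returns the model residue plus a flag telling whether the side chain
    still has to be rebuilt for the model's own residue type.
    """
    if ref_residue["type"] == model_residue_type:
        # Same residue type: every atom can be copied as is.
        atoms = dict(ref_residue["atoms"])
        needs_side_chain = False
    else:
        # Different residue type: keep the backbone only; the side chain
        # must be replaced so the model keeps its own residue type.
        atoms = {name: xyz for name, xyz in ref_residue["atoms"].items()
                 if name in BACKBONE_ATOMS}
        needs_side_chain = True
    return {"type": model_residue_type, "atoms": atoms,
            "needs_side_chain": needs_side_chain}


# Toy reference residue with made-up coordinates.
ref_ala = {"type": "ALA",
           "atoms": {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0),
                     "C": (2.0, 1.4, 0.0), "O": (1.3, 2.4, 0.0),
                     "CB": (2.0, -0.8, 1.2)}}
print(transfer_coordinates(ref_ala, "GLY"))
```

The sketch stops at flagging the side chain for rebuilding; aligning the new side chain to the old backbone and shared dihedral angles, as the text describes, is left to the modeling program.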
Loop generation
Since only fragments of the hMSH2 protein had structural homology to the known proteins, and gaps existed between boxes, loops had to be generated. This was done using the method described by Shenkin et al. (41). Briefly, a conformational search with random settings of φ and ψ angles was made in order to build a peptide backbone chain connecting two conserved peptide segments. A set of six distances was defined using two atoms in the start residue of the loop at the amino-terminal as well as two atoms at the carboxyl-terminal stop residue of the loop. These distances must meet certain criteria for the loop to be acceptably closed. The loop generation command in InsightII uses a linearized Lagrange multiplier method to minimize differences between the desired distances and their current values. After a series of iterations, and provided the distance between the ends of the loop is not too great for an extended chain of the specified number of residues to span, the loop is closed. Finally, the geometry at the base of the loop is checked for proper chirality and steric overlap violations, accepting those conformations that close the loop. The following parameters were used to generate loops: convergence, 0.05; internal overlap, 0.8; external overlap, 0.8; closure iterations, 1000; scale torsions, 60.00; pro-torsion, trans. InsightII suggests 10 possibilities whereby the boxes might be connected, of which the loop with the lowest root mean square (RMS) value was chosen. All loops chosen had an RMS value of less than 2 Å and most were under 1 Å. The 'best fit' was defined as the lowest RMS distance value as calculated from:
[equation image not reproduced]
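The RMS equation itself is not reproduced in this copy of the text. As an illustration only, the sketch below uses the standard root mean square, the square root of the mean of the squared deviations, and picks the candidate loop with the lowest value. Which deviations InsightII actually feeds into its RMS value is not stated here, so the six-number lists, the candidate labels, and the function names `rms` and `best_loop` are assumptions.

```python
import math


def rms(deviations):
    """Standard root mean square of a sequence of deviations (in Å)."""
    if not deviations:
        raise ValueError("need at least one deviation")
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def best_loop(candidates):
    """Return the label of the candidate with the lowest RMS value.

    candidates: dict mapping a loop label to its list of deviations.
    """
    return min(candidates, key=lambda label: rms(candidates[label]))


# Made-up deviations for three hypothetical candidate loops.
candidates = {
    "loop_1": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    "loop_2": [0.1, 0.2, 0.1, 0.2, 0.1, 0.2],
    "loop_3": [1.5, 0.1, 0.1, 0.1, 0.1, 0.1],
}
chosen = best_loop(candidates)
print(chosen, round(rms(candidates[chosen]), 3))
```

Because deviations are squared before averaging, a single large deviation (as in `loop_3`) is penalized more heavily than several small ones, which is why an RMS criterion favors loops that close evenly.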
Structure check
To assess the geometric correctness of the theoretical structure, the 'ProStat/Struct_Check' function of the InsightII program was used. This command checks the protein-specific bond lengths, angles, and torsions of the theoretical hMSH2 protein models against a database derived from accurate small-molecule crystallographic studies. The parameters checked included phi-psi angles, chi1 dihedral angles, chi2 dihedral angles, proline phi, helix phi, chi3 S-S bridges, omega dihedral angles, CA virtual torsion, CA-N-C-CB, and Kabsch and Sander main chain H-bond energy. This process not only assessed the geometric correctness of the proposed structures, but also focused attention on problem areas in the structure (42).
Identification of important regions and sites of mutation
From the literature, we compiled a list of mutations that result in base substitutions in the hMSH2 protein and its bacterial homologue, MutS, and identified the ATP binding domain region as well as the helix-turn-helix domain (8, 16–18, 21, 22, 24, 27–29, 43–51). These regions were then highlighted in the theoretically threaded models of hMSH2, as were exons 5 and 15, which are deleted in several cases of HNPCC (22, 26, 46).
Confirmation of structural homologues
We used the THREADER2 program from Jones et al. (52) as a secondary check of the structural homologues to hMSH2 found by PredictProtein. The threading program applies double dynamic programming and statistical potential energy functions to fit sequences directly onto the backbone coordinates of known protein structures in full 3D space. This technique makes use of a dynamic programming-based algorithm (53, 54) capable of optimizing pairwise interactions by using a standard sequence alignment method to optimize the threading of the sequence to a series of putative structures and ranking the models according to total energy scores. The program is available from the author and can be downloaded from http://globin.bio.warwick.ac.uk/~jones/threader.html. In addition, we used the alignment information provided by THREADER2 for hMSH2 and glycogen phosphorylase to generate a theoretical model using InsightII, as described above.
RESULTS
Identification of structurally homologous proteins
The PredictProtein program identifies the 20 closest structural homologues from prediction-based threading and provides a z score for each. The z score is derived from the final alignment score minus the alignment score averaged over a background distribution of alignments, divided by the standard deviation for that distribution. This score is highly dependent on the similarity of characteristics such as alignment length, compositions of secondary structure, and accessibility of amino acids between the protein of known 3D structure and the protein of interest. The higher the z score, the higher the probability that the first hit is correct. In a recent test of this technique, a z score of >4.5 was associated with an 88% probability that the first hit was a correct one; a z score of >3.5 was associated with a 75% probability that the first hit was correct (36). Z scores vary depending on the number of folds in the fold library, but the estimated confidence of a prediction suggested by the z score has been shown to correlate well with the actual degree of correspondence between a theoretical model and its experimentally determined protein structure (39).
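The z score definition given above translates directly into code. The following is a minimal sketch: the background scores are made-up numbers, the function name `z_score` is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is an assumption, since the text does not say which one PredictProtein uses.

```python
import statistics


def z_score(alignment_score, background_scores):
    """(score - mean of background) / standard deviation of background."""
    mean = statistics.mean(background_scores)
    sd = statistics.pstdev(background_scores)  # assumption: population SD
    if sd == 0:
        raise ValueError("background distribution has zero spread")
    return (alignment_score - mean) / sd


# Made-up background alignment scores with mean 5 and standard deviation 2.
background = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(15, background))
```

A score of 15 against this background lies five standard deviations above the mean, so the z score is 5.0, comparable in size to the values reported for the best hits in this study.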
Among the 20 best structural homologues of hMSH2 identified by prediction-based threading using the PredictProtein program, three had a z score of 4 or greater, predicting a >80% probability that these are true structural homologues. These are glycogen phosphorylase (gpb), a 70 kDa soluble lytic transglycosylase (sly), and ribonucleotide reductase protein R1 (rlr).
Table 1 summarizes the z scores of these putative structural homologues when their amino acid sequences were threaded against each other using the PredictProtein program. As shown, 100% structural homology resulted in a z score of 16.48 for gpb, 13.88 for sly, and 13.70 for rlr. The z scores that resulted from threading the predicted hMSH2 secondary structure to gpb, sly, and rlr were all between 3.9 and 5, the highest being 4.97 against gpb. When the protein-specific bond lengths, angles, and torsions in the theoretically modeled hMSH2 protein were analyzed using the Pro-Stat Structure Check command of the InsightII program, the values for percent of phi-psi core region occupancy were 41.7 for gpb, 59.1 for sly, and 49.8 for rlr. These scores indicate that the phi-psi angles are within the Ramachandran plot-favored regions (>90%), and are consistent with the conclusion that the theoretical models have folds similar to hMSH2.
Identification of important regions and sites of mutation
The steps of identifying and aligning structurally conserved regions, assigning coordinates to these regions, and generation and assignment of coordinates to loops were undertaken sequentially for each of the three structural homologues identified.
Figures 1a–c show the complete hMSH2 structures threaded against gpb, sly, and rlr, respectively, along with the gaps filled in with loop generation. In addition, Fig. 2 provides ribbon plots of these same models in the same configurations.
We sought to identify the location of ATP binding and the helix-turn-helix domains and point mutations found in HNPCC family kindreds as well as in bacteria in the theoretically threaded models of hMSH2. Table 2 summarizes the known mutations in bacteria and human kindreds, including which amino acid residues are affected and the changes that occur (17, 47, 49–51). The bacterial MutS sequence was aligned to the hMSH2 sequence using the ALIGN Query at http://genome.eerie.fr/bin/align-guess.cgi. ALIGN produces an optimal global alignment between two protein or DNA sequences by using a modification of the algorithm described by Myers and Miller (55) utilizing the PAM120 matrix. The locations of the MutS mutations were associated with the analogous residues in hMSH2 and then highlighted in the hMSH2 models, using the corresponding location information for hMSH2. Although not all mutations can be seen in the projection of each model, it appears that each mutated residue is exposed to the outside surface of each protein in the model. In addition, MutS and hMSH2 point mutations appear to be clustered in similar spans of amino acid residues near the carboxyl-terminal, helix-turn-helix, and ATP binding domains, with two residues in bacteria having close proximity to exon 5 near the amino terminus. This suggests that the sequences are functionally important since mutations in these regions appear to alter the protein sufficiently to disable MMR. Not only do the ATP and helix-turn-helix domains appear to be in close proximity to each other (especially in the hMSH2 protein modeled after gpb), but they also appear to be exposed on the outside surface of the protein. One would expect such external exposure considering the ATP-dependent binding of hMSH2 to mismatches. Also, if the helix-turn-helix domain is to play any role in DNA structure-specific recognition, this region must also be exposed on the surface of the protein, as seen in our models.
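Associating a MutS mutation with its analogous hMSH2 residue amounts to walking the two rows of the global alignment and counting residues on each side. The sketch below illustrates that bookkeeping only; it assumes the alignment is already available as two gapped strings of equal length (for example, from ALIGN output), and the function name `map_positions` and the short toy rows are hypothetical, not real MutS or hMSH2 sequence.

```python
def map_positions(aligned_a, aligned_b, gap="-"):
    """Map 1-based residue numbers of sequence A onto sequence B.

    aligned_a, aligned_b: the two rows of a pairwise alignment, with gaps.
    Residues of A that face a gap in B have no entry in the mapping.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("alignment rows must have equal length")
    mapping = {}
    pos_a = pos_b = 0
    for a, b in zip(aligned_a, aligned_b):
        if a != gap:
            pos_a += 1  # advance the residue counter of sequence A
        if b != gap:
            pos_b += 1  # advance the residue counter of sequence B
        if a != gap and b != gap:
            mapping[pos_a] = pos_b
    return mapping


# Toy alignment rows (made-up sequences).
row_muts = "MK-TLGA"
row_msh2 = "MAVQ-GA"
positions = map_positions(row_muts, row_msh2)
print(positions)
print(positions.get(4))  # residue 4 of the first row faces a gap
```

A mutation at a position that faces a gap has no analogous residue and cannot be placed on the model, which is why the lookup returns `None` rather than a neighboring position.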
Confirmation of structural homologues
The structural homologues to hMSH2 found by PredictProtein were confirmed with the use of the THREADER2 program. The first possible homologue found by THREADER2, based on the z score for the pairwise energies filtered for the set of proteins with a reasonable proportion of the sequence and structure matched, was gpb, as with PredictProtein. The z score given by THREADER2 was 5.02, indicating a very significant match and suggesting that gpb is probably a true structural homologue of hMSH2. We found that the known mutations in the bacterial MutS and human MSH2 were exposed on the surface of the protein modeled on information generated by the THREADER2 program, as they were in the theoretical models generated from information given by PredictProtein. The other two putative structural homologues found by PredictProtein, sly and rlr, are not in the current library of THREADER2 against which the hMSH2 sequence was compared and thus could not have been identified as homologues by this program.
DISCUSSION
The MMR system is responsible for recognizing and correcting DNA mismatches, and hMSH2 plays a central role because of its DNA and ATP binding functions (4, 5, 27–29). The functional importance of hMSH2 in this process is documented by the fact that hMSH2 mutations are found in a large percentage of all families with HNPCC (16, 17, 19–24, 26, 43). Current estimates indicate that mutations in hMSH2 account for 50% of these kindreds; mutations in hMLH1 account for 30%, and mutations in hPMS1 and hPMS2 for 5% each (56). Structural information about hMSH2 is thus of great interest because it should provide clues about how particular mutations disable its function.
The prediction-based threading approach used in this study was productive in identifying three proteins with z scores high enough to suggest that they are true structural homologues of hMSH2. As should be the case, the models suggest that the ATP binding domain and helix-turn-helix domain are exposed on the outside of the protein. In addition, the amino acid sequences coding for exons 5 and 15, which are often deleted in cases of HNPCC, span a large area on the outside of all three predicted structures. Since mutational information on human kindreds is still limited, we mapped known mutations of humans and bacteria onto the models in an effort to identify functionally important regions. As is apparent from the projections shown in Fig. 1 and from rotations performed on the computer, MutS and hMSH2 mutations both appear to be clustered in similar vicinities in the theoretical models of hMSH2: the major site is within the ATP binding domain and near the carboxyl-terminal end, with a smaller number occurring near the region coding for exon 5 and the amino-terminal domain. All point mutations also appear to affect amino acids that are exposed on the outside surface of the protein. The distribution of the residues at risk for mutations that have phenotypic consequences indicates that structural changes in the ATP binding pocket can effectively disable hMSH2 function. Likewise, the distribution of mutations suggests that the amino- and carboxyl-terminal domains are important for function and may play a central role in essential protein–protein interactions. Others have shown, functionally, that the carboxyl-terminal region is important in the binding to mismatched oligonucleotides (28, 29). In our theoretical models (especially the one modeled against gpb, which has the highest degree of structural homology), a majority of the highlighted mutations are clustered in the same topological region. Similarly, although other groups have suggested that the helix-turn-helix domain is unlikely to have a role as a DNA recognition domain, the close proximity of this region to known mutations in the human MSH2 and bacterial MutS proteins, as well as to the ATP binding domain, and its overlap in the coding region for exon 15 suggest that this region may in fact also be important in protein–protein interactions (8).
The difference between the theoretical 3D structure of hMSH2 based on the 70 kDa soluble lytic transglycosylase structure and the other two models is most likely an artifact and the result of a difference in the length of alignment. The 70 kDa soluble lytic transglycosylase is only 618 amino acids long and starts aligning with hMSH2 at amino acid 201, whereas gpb and rlr both start aligning almost immediately with the hMSH2 amino acid sequence. By deleting the first 200 amino acids of the theoretical 3D hMSH2 structure threaded to the 823 amino acids of gpb and the first 175 amino acids of the 738 amino acid rlr, we found a similar donut-shaped groove, much like the structure predicted by threading the hMSH2 structure to sly (pictures not shown). Considering the information about the clustered locations of known mutations and the absence of any known mutations in the core of the modeled proteins, this would suggest that a groove or similar structure in hMSH2 would not play a major role, if mutated, in the loss of hMSH2 function in DNA mismatch repair. It is possible that the best fit 3D structure of hMSH2 is a combination of structurally conserved regions of gpb, sly, and rlr, and that combining areas of structural homology between these three putative structural homologues of hMSH2 rather than using loop generation to fill in gaps might improve the structural prediction. However, since these theoretical models are based on known protein folds, the gap regions with unknown protein folds will be similar in all theoretical models. Hence, it was not feasible with the current protein database of information to combine the coordinates from each theoretical model, and so this was not undertaken as part of this study.
It is clear that the putative structural homologues to hMSH2 found in this study are, among themselves, different in function and overall structure, although altogether they have characteristics similar to the modeled hMSH2. Glycogen phosphorylase catalyzes glycogen breakdown and plays a central role in the regulation of glycogen metabolism (57). The 70 kDa soluble lytic transglycosylase cleaves the β-1,4-glycosidic bonds of peptidoglycan to produce small 1,6-anhydromuropeptides (58–60). Ribonucleotide reductase R1 catalyzes de novo formation of deoxyribonucleotides and is a key enzyme in DNA synthesis (61). In general, all three proteins are made up of three domains. Glycogen phosphorylase has an amino-terminal domain consisting of 320 residues, a central domain of 160 residues, and a carboxyl-terminal domain of 360 residues with alternating α and β structures overall (62). The 70 kDa soluble lytic transglycosylase, which is very rich in α helices with 63% of residues in an α helix, has an amino-terminal domain of 360 residues and 22 α helices and a linker domain of 79 residues and 4 α helices, which form an asymmetric donut shape. A globular carboxyl-terminal domain of 161 residues and 9 α helices sits atop these two domains (63). The ribonucleotide reductase protein R1 looks much like the side view of a left hand with fingers at a right angle, with a helical amino-terminal of 220 residues, an α/β barrel of 480 residues, and an αβαβ domain of 70 residues (61). The active sites of glycogen phosphorylase and ribonucleotide reductase protein R1 are both buried in a deep cavity or cleft either at points of domain interactions or between two domains (61, 62). The active site of the 70 kDa soluble lytic transglycosylase, on the other hand, similar to that of hMSH2, is found in the carboxyl-terminal region (63). Glycogen phosphorylase and ribonucleotide reductase R1 also have allosteric binding sites that are not found in the soluble lytic transglycosylase and are not known to exist in hMSH2 (61–63). Like glycogen phosphorylase and ribonucleotide reductase R1, the predicted secondary structure of hMSH2 has a high incidence of alternating α and β structures (61, 62). However, most of these regions in hMSH2 are long stretches of α helices and short segments of β sheets. Thus, like the 70 kDa soluble lytic transglycosylase, hMSH2 also has a majority of residues in α helices. Of 940 residues in hMSH2, 629 were assigned a predicted secondary structure; of these 629, 504 residues are in α helices.
Several lines of evidence support the reliability of the threaded theoretical models of hMSH2. First, comparable structural features are present in all three models. Second, mutations known to disable hMSH2 function are mapped to sites on the surface of the predicted structures. Third, the validity of the prediction-based threading technique we used has been tested and compared to other threading methods that have also proved to be reliable (39). The first test compared the results of the prediction-based threading method used here to those of the potential-based threading method that utilizes the THREADER program published by Jones and co-workers (52). Twelve examples were tested with the potential-based method. For all 12 cases, the first hits were identified as the correct homologue. When using the prediction-based threading method, Rost et al. (39) also found 100% accuracy in identifying the correct homologues of these same proteins. In another test, Russell et al. (64) evaluated a different version of the prediction-based technique on 11 different proteins and compared their results to those of the same potential-based method of Jones et al. (52). They reported a first hit accuracy of 37–45% for their technique and a 9–19% accuracy when using Jones' THREADER program. When Rost et al. (39) analyzed this same set of 11 proteins using their prediction-based threading technique, they succeeded in getting 78% correct first hits. In a third study, however, Rost and co-workers analyzed 11 proteins that were used in the first Asilomar meeting for the evaluation of prediction methods (65, 66) and found that their method managed to correctly detect only 4 of 11 cases, whereas the THREADER technique detected 5 of 9 correct matches (39, 67).
To further validate the identification of putative structural homologues made when using the prediction-based threading approach, we also used the THREADER2 program to search for structural homologues of hMSH2 (52). The closest homologue found by THREADER2, based on z scores of pairwise energies, was gpb with a z score of 5.02, as was also found by PredictProtein, with a z score of 4.97 (32–36). We found, too, that the known mutations in bacterial MutS and human MSH2 are exposed on the exterior of the hMSH2 model created by InsightII from the hMSH2-gpb alignment generated by the THREADER2 program. The other two proteins that had been identified by the prediction-based threading approach, sly and rlr, are not currently in the library of proteins used in THREADER2 to compare the target sequences. However, on the basis of the structural similarities discussed above between gpb, sly, and rlr among themselves and between them and hMSH2, sly and rlr would most likely also be found by THREADER2 had they been available for comparison. According to the author of the THREADER2 program, the z score for solvation energy is, along with the z score based on pairwise energies, an important parameter on which predictions should be based. However, the z score for solvation can be used only when comparing monomeric proteins. Since hMSH2 becomes part of a multiprotein complex, the solvation energy z score is unlikely to be useful in predicting structural homologues; the solvation energy z score for the comparison of hMSH2 and gpb actually yielded a negative value. The fact that the potential-based threading method also identified gpb as a putative structural homologue of hMSH2 provides further evidence that the proteins identified by the prediction-based threading method are true structural homologues of hMSH2.
Though desirable, energy minimization and molecular dynamics could not be performed on these theoretical models due to software limitations. Generally, it is required that all residues be assigned coordinates. For rlr, for example, a region of 40 residues of hMSH2 was inserted into the rlr sequence and thus did not overlap any residues of rlr. The InsightII program is incapable of assigning coordinates to a loop flex region of more than 37 residues. Hence, without overlapping residues to guide assignments, the coordinates necessary for any refinement, minimization, or molecular dynamics steps could not be generated for this region of hMSH2. Similarly, the InsightII program cannot assign loop coordinates to flex regions of fewer than three residues. This was the problem with gpb, where two residues of hMSH2 were inserted into gpb with no overlapping sequences. Theoretically, arbitrary coordinates could be assigned to these residues; however, there is a high probability that these arbitrarily assigned coordinates would be wrong, and therefore minimization, although it would occur, would be done on an incorrect model with erroneous coordinates.
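The two loop-length limits described above (no more than 37 residues, no fewer than three) can be checked before modeling by listing the gaps between aligned boxes. The sketch below is illustrative only: the box boundaries are made-up 1-based inclusive residue ranges chosen to reproduce a 2-residue and a 40-residue gap like those mentioned for gpb and rlr, and the names `classify_gaps`, `MIN_LOOP`, and `MAX_LOOP` are hypothetical.

```python
MIN_LOOP = 3   # flex regions of fewer than three residues cannot be built
MAX_LOOP = 37  # flex regions of more than 37 residues cannot be built


def classify_gaps(boxes):
    """List the gaps between consecutive boxes and flag unusable lengths.

    boxes: iterable of (start, end) residue ranges, 1-based and inclusive.
    Returns tuples (gap_start, gap_end, length, status).
    """
    ordered = sorted(boxes)
    results = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        length = next_start - prev_end - 1
        if length <= 0:
            continue  # boxes touch or overlap: no loop needed
        if length < MIN_LOOP:
            status = "too short"
        elif length > MAX_LOOP:
            status = "too long"
        else:
            status = "ok"
        results.append((prev_end + 1, next_start - 1, length, status))
    return results


# Made-up boxes leaving gaps of 2, 10, and 40 residues.
boxes = [(1, 50), (53, 120), (131, 200), (241, 300)]
for gap in classify_gaps(boxes):
    print(gap)
```

Gaps flagged as too short or too long are the ones for which no coordinates could be generated, and they are therefore the regions that blocked minimization and molecular dynamics in this study.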
The models resulting from the prediction-based threading of hMSH2 to proteins with known structure, although still hypothetical, provide insight into the way this protein is likely to function. hMSH2 seems to be a globular molecule that does not contain a DNA binding groove, yet hMSH2 has been reported to bind to DNA mismatches even without the assistance of the other MMR proteins (68). The models suggest that the surface made up by the carboxyl-terminal domain is likely to be essential, and thus that mutational analysis focused on this region may prove fruitful in further elucidating the key amino acids involved in the mismatch recognition process.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
1 Correspondence: 9500 Gilman Drive 0058, La Jolla, CA 92093-0058, USA. E-mail: mdelasal{at}sdcc14.ucsd.edu
3 Abbreviations: PDB, Protein Data Bank; gpb, glycogen phosphorylase; sly, 70 kDa soluble lytic transglycosylase; rlr, ribonucleotide reductase protein R1; 3D, 3-dimensional; MMR, DNA mismatch repair; HNPCC, hereditary nonpolyposis colon carcinoma; RMS, root mean square.
Received for publication September 15, 1997. Accepted for publication January 5, 1998.
| REFERENCES |
|---|