FASEB J.
(The FASEB Journal. 1998;12:653-663.)
© 1998 FASEB


RESEARCH COMMUNICATION

Prediction-based threading of the hMSH2 DNA mismatch repair protein

Maida M. de las Alas (a,1,2), Robertus A. M. de Bruin (b,2), Lynn Ten Eyck (a), Gerrit Los (a), and Stephen B. Howell (a)

a Department of Medicine and the Cancer Center, University of California, San Diego, La Jolla, California 92093–0058, USA
b San Diego Supercomputer Center, La Jolla, California 92093, USA


   ABSTRACT
Mutations in the genes whose products participate in DNA mismatch repair underlie the increased risk of cancer in families with hereditary nonpolyposis colon carcinoma. Mutations in hMSH2 account for approximately 50% of the mutations found in these families. We sought to predict the 3-dimensional structure of hMSH2 by identifying structural homologues using prediction-based threading and by computer modeling using information from these putative structurally related proteins. Prediction-based threading identified three candidate structural homologues: glycogen phosphorylase (gpb), a 70 kDa soluble lytic transglycosylase, and ribonucleotide reductase protein R1. An independent approach utilizing a potential-based threading program also identified gpb as a structural homologue. The models based on the structures of these proteins suggest that the ATP binding domain and helix-turn-helix domain are exposed on the outside of the protein. All known bacterial MutS and hMSH2 mutations appear to be clustered in similar vicinities in the theoretical models of hMSH2; the major site is within the ATP binding domain and near the carboxyl-terminal end, whereas a smaller number map to the region coding for exon 5 and the amino-terminal domain. All point mutations also appear to affect amino acids that are exposed on the outside surface of the protein.—de las Alas, M. M., de Bruin, R. A. M., Ten Eyck, L., Los, G., Howell, S. B. Prediction-based threading of the hMSH2 DNA mismatch repair protein. FASEB J. 12, 653–663 (1998)


Key Words: protein model • loop generation • structural homologue • glycogen phosphorylase


   INTRODUCTION
THE DNA MISMATCH repair (MMR) system is believed to be responsible for the recognition and correction of mismatches in DNA; it plays important roles in postreplication repair and in the processing of recombination heteroduplexes that contain mismatched base pairs. Mutations in the MMR genes underlie the increased risk of cancer in families with hereditary nonpolyposis colon carcinoma (HNPCC) (1). Individuals at risk in these families have a germline mutation in one of four genes known to be involved in DNA mismatch repair: hMSH2, hMLH1, hPMS1, and hPMS2 (2). These proteins show a high degree of amino acid sequence similarity to proteins in the bacterial and yeast DNA mismatch repair systems, indicating extensive conservation throughout evolution and suggesting a common function (1). The MMR system is well characterized in bacteria and yeast; by analogy to these systems and on the basis of studies in mammalian cells, it is thought that MMR is initiated through the binding of a heterodimer of hMSH2 with either hMSH3 or hMSH6 to the site of the mismatch (3–13). A heterodimer consisting of hMLH1 and hPMS2 then joins the other two proteins to complete the MMR complex (2, 14).

Obtaining structural information about hMSH2 is of interest for several reasons. First, mutations in hMSH2 account for approximately half of the mutations found in families with HNPCC (15–26). Second, because hMSH2 is involved in the initial recognition of the mismatch, its function is essential to the MMR process (4, 5, 27–29). Although the 3-dimensional (3D) structures of the bacterial, yeast, and mammalian proteins have not been determined, the locations of the ATP binding and helix-turn-helix domains have been identified in the human homologue from amino acid sequence homology studies (8, 27–29). Inasmuch as protein structure is more conserved than protein sequence (30, 31), we have undertaken a study based on the hypothesis that information from structurally homologous proteins can be used to predict the 3D structure of hMSH2 by computer modeling and threading. We show here that by modeling the functionally identified areas of hMSH2 against proteins with similar domains and known 3D structure, and by highlighting the sites of the mutations found in HNPCC families, the distribution of the regions affected by these mutations may be visually recognized.


   METHODS
Identification of structurally homologous proteins
In searching for structural homologues for hMSH2, the complete amino acid sequence of hMSH2 was submitted to PredictProtein (http://www.embl-heidelberg.de/predictprotein). This server returns a multiple sequence alignment and predictions of secondary structure, residue solvent accessibility, and the location of transmembrane helices (32–36). Using this information, the predicted secondary structure of hMSH2 was then threaded against proteins in the Protein Data Bank (PDB). The PredictProtein program detects remote homologues (0–25% sequence identity) by a novel prediction-based threading method (37–39). To recognize folds by threading, the PredictProtein program evaluates the amino acid sequence of a protein and determines how well it fits into the 3D configuration of proteins whose structures are known. The goal is to detect similar motifs of secondary structure and accessibility between a sequence of unknown structure and a known fold. Proteins with known 3D structure and the highest degree of structural homology to hMSH2 were identified by PredictProtein, which also provided summary information on these proteins from the server via e-mail.

Identification and alignment of structurally conserved regions
The multiple sequence alignment function in PredictProtein is automatically returned in the report from PredictProtein and is built up in two steps (40). In sweep 1, sequences are aligned consecutively to the search sequence by a standard dynamic programming method. After each sequence has been added, a profile is compiled and used to align the next sequence. In sweep 2, after all sequences with significant structural homology have been picked from SWISSPROT (http://expasy.hcuge.ch/sprot/sprot-top.html), the profile is recompiled and the dynamic programming algorithm starts once again to align the sequences consecutively, this time using the conservation profile as derived after completion of sweep 1. The output consists of structurally homologous proteins with regions automatically aligned to hMSH2. In addition, the known and the predicted secondary structures of the PDB proteins and hMSH2 are shown. With this information, we manually highlighted areas of predicted secondary structure in hMSH2 that were identical to the known structural homologues: regions where PredictProtein predicted a helix in hMSH2 were highlighted if this same region was also a helix in the known structural homologue.
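The manual highlighting step described above can be illustrated with a minimal sketch. It assumes that the predicted and known secondary-structure strings are already aligned column by column and that 'H' marks a helix; the function name, the one-letter codes, and the toy strings are illustrative assumptions, not the PredictProtein output format.

```python
def matching_helix_regions(predicted, known, min_len=1):
    """Return (start, end) pairs (0-based, end exclusive) for runs of
    columns where both aligned secondary-structure strings show 'H'."""
    if len(predicted) != len(known):
        raise ValueError("aligned strings must have equal length")
    regions = []
    start = None
    for i, (p, k) in enumerate(zip(predicted, known)):
        both_helix = p == "H" and k == "H"
        if both_helix and start is None:
            start = i                      # a shared helix run begins
        elif not both_helix and start is not None:
            if i - start >= min_len:
                regions.append((start, i))  # the run ended at column i
            start = None
    if start is not None and len(predicted) - start >= min_len:
        regions.append((start, len(predicted)))
    return regions


# Hypothetical aligned strings: H = helix, E = strand, - = other or gap
predicted_hmsh2 = "HHHH-EEHHH"
known_homologue = "-HHHHEEHH-"
regions = matching_helix_regions(predicted_hmsh2, known_homologue)
```

Each returned pair marks a stretch that would be a candidate for a conserved "box" in the later modeling steps.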

Assignment of coordinates
InsightII, a molecular modeling program from Biosym/Molecular Simulations (San Diego, Calif.), was used in combination with the downloaded hMSH2 sequence. The PDB files and images of the three best-fitting structural homologues of hMSH2 identified by PredictProtein were downloaded and individually aligned manually to hMSH2, according to the alignment suggested by PredictProtein. Boxes were created around the sequences that PredictProtein found in hMSH2 to be structurally homologous to the known protein. Each box was frozen and assigned coordinates based on the known reference protein. These coordinates were first transformed into the same coordinate frame as the hMSH2 model before being copied onto the model. All coordinates were transferred if the side chains of the reference and model proteins were at the same corresponding locations along the sequence of the structurally conserved region. However, if these locations differed, only the backbone coordinates were transferred and the side chain atoms were automatically replaced to preserve the hMSH2 model protein's residue types. These replaced residues were first aligned to the backbone of the original residue; the dihedral angles in common with the residue being replaced were also aligned. This allowed the conformation of the reference side chain to be preserved as much as possible.
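The transfer rule described above (copy everything when the residue types match, copy only the backbone otherwise) can be sketched as follows. This is not InsightII code; the function name, the dictionary layout, and the choice of N, CA, C, and O as the backbone atom set are assumptions made for illustration.

```python
# Assumed backbone atom names (PDB-style naming)
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def transfer_coordinates(model_residue, reference_residue, reference_atoms):
    """Choose which reference atoms to copy onto one aligned model residue.

    reference_atoms maps atom name -> (x, y, z), already expressed in the
    coordinate frame of the model.
    """
    if model_residue == reference_residue:
        # Same residue type at this position: copy backbone and side chain.
        return dict(reference_atoms)
    # Different residue type: copy the backbone only; the side chain would
    # then be rebuilt for the model's own residue type in a separate step.
    return {name: xyz for name, xyz in reference_atoms.items()
            if name in BACKBONE_ATOMS}


# Hypothetical leucine from a reference structure
leu_atoms = {
    "N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0),
    "O": (1.3, 2.4, 0.0), "CB": (2.0, -0.8, 1.2),
}
same_type = transfer_coordinates("LEU", "LEU", leu_atoms)
different_type = transfer_coordinates("ALA", "LEU", leu_atoms)
```

The side-chain rebuilding and dihedral alignment mentioned in the text are deliberately left out of this sketch.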

Loop generation
Since only fragments of the hMSH2 protein had structural homology to the known proteins, and gaps existed between boxes, loops had to be generated. This was done using the method described by Shenkin et al. (41). Briefly, a conformational search with random settings of φ and ψ angles was made in order to build a peptide backbone chain connecting two conserved peptide segments. A set of six distances was defined using two atoms in the start residue of the loop at the amino-terminal end as well as two atoms at the carboxyl-terminal stop residue of the loop. These distances must meet certain criteria for the loop to be acceptably closed. The loop generation command in InsightII uses a linearized Lagrange multiplier method to minimize differences between the desired distances and their current values. After a series of iterations, and provided the distance between the ends of the loop is not too great for an extended chain of the specified number of residues to span, the loop is closed. Finally, the geometry at the base of the loop is checked for proper chirality and steric overlap violations, accepting those conformations that close the loop. The following parameters were used to generate loops: convergence, 0.05; internal overlap, 0.8; external overlap, 0.8; closure iterations, 1000; scale torsions, 60.00; pro-torsion, trans. InsightII suggests 10 possibilities whereby the boxes might be connected, of which the loop with the lowest root mean square (RMS) value was chosen. All loops chosen had an RMS value of less than 2 Å and most were under 1 Å. The 'best fit' was defined as the lowest RMS distance value as calculated from:

where N is the total number of preflex residues plus postflex residues and x, y, z are the coordinates of the alpha carbons of these residues. Depending on the size of the loop fragment to be generated and on whether the unknown protein has a deletion or an insertion relative to the known protein, this step can be slow and memory-intensive (42).
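The RMS expression itself is not reproduced in this copy of the text. A standard form consistent with the definitions given, offered as an assumption rather than as the exact formula used by InsightII, is

$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(x_i - x_i')^2 + (y_i - y_i')^2 + (z_i - z_i')^2\right]}$$

where the primed and unprimed coordinates (an assumed notation) belong to the two sets of alpha carbons being compared. The short sketch below computes this quantity and picks the candidate loop with the lowest value; the function names and the numbers are hypothetical.

```python
import math


def ca_rms(coords_a, coords_b):
    """RMS distance between two equally long lists of (x, y, z) alpha-carbon
    coordinates, paired by position."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def best_loop(rms_by_candidate):
    """Return the name of the candidate loop with the lowest RMS value."""
    return min(rms_by_candidate, key=rms_by_candidate.get)


# Hypothetical anchor-residue coordinates (in Å)
rms_value = ca_rms([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 2, 0)])

# Hypothetical RMS values for three of the suggested loops
chosen = best_loop({"loop_1": 1.8, "loop_2": 0.7, "loop_3": 1.2})
```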

Structure check
To assess the geometric correctness of the theoretical structure, the 'ProStat/Struct_Check' function of the InsightII program was used. This command checks the protein-specific bond lengths, angles, and torsions of the theoretical hMSH2 protein models against a database derived from accurate small-molecule crystallographic studies. The parameters checked included phi-psi angles, chi1 dihedral angles, chi2 dihedral angles, proline phi, helix phi, chi3 S-S bridges, omega dihedral angles, CA virtual torsion, and CA-N-C-CB and Kabsch and Sander main chain H-bond energy. This process not only assessed the geometric correctness of the proposed structures, but also focused attention on problem areas in the structure (42).

Identification of important regions and sites of mutation
From the literature, we compiled a list of mutations that result in base substitutions in the hMSH2 protein and its bacterial homologue, MutS, and identified the ATP binding domain region as well as the helix-turn-helix domain (8, 16–18, 21, 22, 24, 27–29, 43–51). These regions were then highlighted in the theoretically threaded models of hMSH2, as were exons 5 and 15, which are deleted in several cases of HNPCC (22, 26, 46).

Confirmation of structural homologues
We used the THREADER2 program from Jones et al. (52) as a secondary check of the structural homologues to hMSH2 found by PredictProtein. The threading program applies double dynamic programming and statistical potential energy functions to fit sequences directly onto the backbone coordinates of known protein structures in full 3D space. This technique makes use of a dynamic-based algorithm (53, 54) capable of optimizing pairwise interactions by using a standard sequence alignment method to optimize the threading of the sequence to a series of putative structures and ranking the models according to total energy scores. This program is available from the author and can be downloaded from the WWW URL: http://globin.bio.warwick.ac.uk/~jones/threader.html. In addition, we used the alignment information provided by THREADER2 for hMSH2 and glycogen phosphorylase to generate a theoretical model using InsightII, as described above.

RESULTS
Identification of structurally homologous proteins
The PredictProtein program identifies the 20 closest structural homologues from prediction-based threading and provides a z score for each. The z score is derived from the final alignment score minus the alignment score averaged over a background distribution of alignments, divided by the standard deviation for that distribution. This score is highly dependent on the similarity of characteristics such as alignment length, compositions of secondary structure, and accessibility of amino acids between the protein of known 3D structure and the protein of interest. The higher the z score, the higher the probability that the first hit is correct. In a recent test of this technique, a z score of >4.5 was associated with an 88% probability that the first hit was a correct one; a z score of >3.5 was associated with a 75% probability that the first hit was correct (36). Z scores vary depending on the number of folds in the fold library, but the estimated confidence of a prediction suggested by the z score has been shown to correlate well with the actual degree of correspondence between a theoretical model and its experimentally determined protein structure (39).
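The z score definition above can be written out as a short sketch. It is an illustration rather than PredictProtein's implementation: the function names and the background scores are hypothetical, and the use of the population standard deviation is an assumption. The two confidence levels are the ones quoted in the text.

```python
from statistics import mean, pstdev


def z_score(alignment_score, background_scores):
    """(final score - mean of background scores) / standard deviation."""
    mu = mean(background_scores)
    sigma = pstdev(background_scores)  # population SD, an assumption
    if sigma == 0:
        raise ValueError("background distribution has zero spread")
    return (alignment_score - mu) / sigma


def first_hit_confidence(z):
    """Probability that the first hit is correct, using only the two
    thresholds quoted in the text; None below the lower threshold."""
    if z > 4.5:
        return 0.88
    if z > 3.5:
        return 0.75
    return None


# Hypothetical background alignment scores (mean 5, standard deviation 2)
background = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_score(15, background)
```

With these thresholds, the hMSH2-gpb score of 4.97 reported in this section falls in the higher confidence band.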

Among the 20 best structural homologues of hMSH2 identified by prediction-based threading using the PredictProtein program, three had a z score of 4 or greater, predicting a >80% probability that these are true structural homologues. These are glycogen phosphorylase (gpb), a 70 kDa soluble lytic transglycosylase (sly), and ribonucleotide reductase protein R1 (rlr). Table 1 summarizes the z scores of these putative structural homologues when their amino acid sequences were threaded against each other using the PredictProtein program. As shown, 100% structural homology resulted in a z score of 16.48 for gpb, 13.88 for sly, and 13.70 for rlr. The z scores that resulted from threading the predicted hMSH2 secondary structure to gpb, sly, and rlr were all between 3.9 and 5, the highest being 4.97 against gpb. When the protein-specific bond lengths, angles, and torsions in the theoretically modeled hMSH2 protein were analyzed using the Pro-Stat Structure Check command of the InsightII program, the values for percent of phi-psi core region occupancy were 41.7 for gpb, 59.1 for sly, and 49.8 for rlr. These scores indicate that the phi-psi angles are within the Ramachandran plot-favored regions (>90%), and are consistent with the conclusion that the theoretical models have folds similar to hMSH2.


Table 1. Summary of z scores for comparisons of hMSH2, gpb, sly and rlr

Identification of important regions and sites of mutation
The steps of identifying and aligning structurally conserved regions, assigning coordinates to these regions, and generation and assignment of coordinates to loops were undertaken sequentially for each of the three structural homologues identified. Figures 1A–C show the complete hMSH2 structures threaded against gpb, sly, and rlr, respectively, along with the gaps filled in with loop generation. In addition, Fig. 2 provides ribbon plots of these same models in the same configurations.





Figure 1. hMSH2 theoretical model threaded against putative structural homologs: hMSH2 threaded against A) glycogen phosphorylase; B) 70 kDa soluble lytic transglycosylase; C) ribonucleotide reductase protein R1. The ATP binding domain is shown in green. Exons 5 and 15 are light blue and purple, respectively. Point mutations leading to base substitutions in hMSH2 are highlighted in gray. Point mutations in MutS, corresponding to the hMSH2 residue, are colored yellow. Orange reflects point mutations found in MutS that are in the ATP binding domain. Royal blue is a point mutation found both in the human and bacterial homologue of MutS as well as in the ATP binding domain. The helix-turn-helix domain is light purple.





Figure 2. Ribbon plots of the models generated by threading hMSH2 against putative structural homologues: hMSH2 threaded against A) glycogen phosphorylase; B) 70 kDa soluble lytic transglycosylase; C) ribonucleotide reductase protein R1. Color coding of domains and mutations is the same as for Fig. 1.

We sought to identify the location of ATP binding and the helix-turn-helix domains and point mutations found in HNPCC family kindreds as well as in bacteria in the theoretically threaded models of hMSH2. Table 2 summarizes the known mutations in bacteria and human kindreds, including which amino acid residues are affected and the changes that occur (17, 47, 49–51). The bacterial MutS sequence was aligned to the hMSH2 sequence using the ALIGN Query at http://genome.eerie.fr/bin/align-guess.cgi. ALIGN produces an optimal global alignment between two protein or DNA sequences by using a modification of the algorithm described by Myers and Miller (55) utilizing the PAM120 matrix. The locations of the MutS mutations were associated with the analogous residues in hMSH2 and then highlighted in the hMSH2 models, using the corresponding location information for hMSH2. Although not all mutations can be seen in the projection of each model, it appears that each mutated residue is exposed to the outside surface of each protein in the model. In addition, MutS and hMSH2 point mutations appear to be clustered in similar spans of amino acid residues near the carboxyl-terminal, helix-turn-helix, and ATP binding domains, with two residues in bacteria having close proximity to exon 5 near the amino terminus. This suggests that the sequences are functionally important since mutations in these regions appear to alter the protein sufficiently to disable MMR. Not only do the ATP and helix-turn-helix domains appear to be in close proximity to each other (especially in the hMSH2 protein modeled after gpb), but they also appear to be exposed on the outside surface of the protein. One would expect such external exposure considering the ATP-dependent binding of hMSH2 to mismatches. Also, if the helix-turn-helix domain is to play any role in DNA structure-specific recognition, this region must also be exposed on the surface of the protein, as seen in our models.
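Associating a MutS mutation with its analogous hMSH2 residue amounts to walking along the gapped pairwise alignment and counting residues in each sequence. A minimal sketch of that bookkeeping follows; the function name and the toy aligned sequences are hypothetical, and the code does not reproduce the ALIGN program itself.

```python
def map_residue(aligned_query, aligned_target, query_pos):
    """Map a 1-based residue number in the query sequence (e.g. MutS) to
    the 1-based residue number in the target (e.g. hMSH2) using a gapped
    pairwise alignment in which '-' marks a gap.

    Returns None when the query residue is aligned to a gap.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    q = t = 0  # residues seen so far in query and target
    for a, b in zip(aligned_query, aligned_target):
        if a != "-":
            q += 1
        if b != "-":
            t += 1
        if a != "-" and q == query_pos:
            return t if b != "-" else None
    raise IndexError("query_pos is outside the aligned query sequence")


# Hypothetical toy alignment, not real MutS or hMSH2 sequence
query_aln = "MK-TLV"
target_aln = "MKATL-"
mapped = map_residue(query_aln, target_aln, 3)    # query T3
unmapped = map_residue(query_aln, target_aln, 5)  # query V5 faces a gap
```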


Table 2. Summary of known mutations

Confirmation of structural homologues
The structural homologues to hMSH2 found by PredictProtein were confirmed with the use of the THREADER2 program. The first possible homologue found by THREADER2, based on the z score for the pairwise energies filtered for the set of proteins with a reasonable proportion of the sequence and structure matched, was gpb, as with PredictProtein. The z score given by THREADER2 was 5.02, indicating a very significant match and suggesting that gpb is probably a true structural homologue of hMSH2. We found that the known mutations in the bacterial MutS and human MSH2 were exposed on the surface of the protein modeled on information generated by the THREADER2 program, as they were in the theoretical models generated from information given by PredictProtein. The other two putative structural homologues found by PredictProtein, sly and rlr, are not in the current library of THREADER2 against which the hMSH2 sequence was compared and thus could not have been identified as homologues by this program.

DISCUSSION
The MMR system is responsible for recognizing and correcting DNA mismatches, and hMSH2 plays a central role because of its DNA and ATP binding functions (4, 5, 27–29). The functional importance of hMSH2 in this process is documented by the fact that hMSH2 mutations are found in a large percentage of all families with HNPCC (16, 17, 19–24, 26, 43). Current estimates indicate that mutations in hMSH2 account for 50% of these kindreds; mutations in hMLH1 account for 30%, and mutations in hPMS1 and hPMS2 for 5% each (56). Structural information about hMSH2 is thus of great interest because it should provide clues about how particular mutations disable its function.

The prediction-based threading approach used in this study was productive in identifying three proteins with z scores high enough to suggest that they are true structural homologues of hMSH2. As should be the case, the models suggest that the ATP binding domain and helix-turn-helix domain are exposed on the outside of the protein. In addition, the amino acid sequences coding for exons 5 and 15, which are often deleted in cases of HNPCC, span a large area on the outside of all three predicted structures. Since mutational information on human kindreds is still limited, we mapped known mutations of humans and bacteria onto the models in an effort to identify functionally important regions. As is apparent from the projections shown in Fig. 1 and from rotations performed on the computer, MutS and hMSH2 mutations both appear to be clustered in similar vicinities in the theoretical models of hMSH2: the major site is within the ATP binding domain and near the carboxyl-terminal end, with a smaller number occurring near the region coding for exon 5 and the amino-terminal domain. All point mutations also appear to affect amino acids that are exposed on the outside surface of the protein. The distribution of the residues at risk for mutations that have phenotypic consequences indicates that structural changes in the ATP binding pocket can effectively disable hMSH2 function. Likewise, the distribution of mutations suggests that the amino- and carboxyl-terminal domains are important for function and may play a central role in essential protein–protein interactions. Others have shown, functionally, that the carboxyl-terminal region is important in the binding to mismatched oligonucleotides (28, 29). In our theoretical models (especially the one modeled against gpb, which has the highest degree of structural homology), a majority of the highlighted mutations are clustered in the same topological region.
Similarly, although other groups have suggested that the helix-turn-helix domain is unlikely to have a role as a DNA recognition domain, the close proximity of this region to known mutations in the human MSH2 and bacterial MutS proteins, as well as to the ATP binding domain, and its overlap in the coding region for exon 15 suggest that this region may in fact also be important in protein–protein interactions (8).

The difference between the theoretical 3D structure of hMSH2 based on the 70 kDa soluble lytic transglycosylase structure and the other two models is most likely an artifact and the result of a difference in the length of alignment. The 70 kDa soluble lytic transglycosylase is only 618 amino acids long and starts aligning with hMSH2 at amino acid 201, whereas gpb and rlr both start aligning almost immediately with the hMSH2 amino acid sequence. By deleting the first 200 amino acids of the theoretical 3D hMSH2 structure threaded to the 823 amino acids of gpb and the first 175 amino acids of the 738 amino acid rlr, we found a similar donut-shaped groove, much like the structure predicted by threading the hMSH2 structure to sly (pictures not shown). Considering the information about the clustered locations of known mutations and the absence of any known mutations in the core of the modeled proteins, this would suggest that a groove or similar structure in hMSH2 would not play a major role, if mutated, in the loss of hMSH2 function in DNA mismatch repair. It is possible that the best fit 3D structure of hMSH2 is a combination of structurally conserved regions of gpb, sly, and rlr, and that combining areas of structural homology between these three putative structural homologues of hMSH2 rather than using loop generation to fill in gaps might improve the structural prediction. However, since these theoretical models are based on known protein folds, the gap regions with unknown protein folds will be similar in all theoretical models. Hence, it was not feasible with the current protein database of information to combine the coordinates from each theoretical model, and so this was not undertaken as part of this study.

It is clear that the putative structural homologues to hMSH2 found in this study are, among themselves, different in function and overall structure, although altogether they have characteristics similar to the modeled hMSH2. Glycogen phosphorylase catalyzes glycogen breakdown and plays a central role in the regulation of glycogen metabolism (57). The 70 kDa soluble lytic transglycosylase cleaves the β-1,4-glycosidic bonds of peptidoglycan to produce small 1,6-anhydromuropeptides (58–60). Ribonucleotide reductase R1 catalyzes de novo formation of deoxyribonucleotides and is a key enzyme in DNA synthesis (61). In general, all three proteins are made up of three domains. Glycogen phosphorylase has an amino-terminal domain consisting of 320 residues, a central domain of 160 residues, and a carboxyl-terminal domain of 360 residues with alternating α and β structures overall (62). The 70 kDa soluble lytic transglycosylase, which is very rich in α helices with 63% of residues in an α helix, has an amino-terminal domain of 360 residues and 22 α helices and a linker domain of 79 residues and 4 α helices, which form an asymmetric donut shape. A globular carboxyl-terminal domain of 161 residues and 9 α helices sits atop these two domains (63). The ribonucleotide reductase protein R1 looks much like the side view of a left hand with fingers at a right angle, with a helical amino-terminal domain of 220 residues, an α/β barrel of 480 residues, and an αβααβ domain of 70 residues (61). The active sites of glycogen phosphorylase and ribonucleotide reductase protein R1 are both buried in a deep cavity or cleft either at points of domain interactions or between two domains (61, 62). The active site of the 70 kDa soluble lytic transglycosylase, on the other hand, similar to that of hMSH2, is found in the carboxyl-terminal region (63).
Glycogen phosphorylase and ribonucleotide reductase R1 also have allosteric binding sites that are not found in the soluble lytic transglycosylase and are not known to exist in hMSH2 (61–63). Like glycogen phosphorylase and ribonucleotide reductase R1, the predicted secondary structure of hMSH2 has a high incidence of alternating α and β structures (61, 62). However, most of these regions in hMSH2 are long stretches of α helices and short segments of β sheets. Thus, like the 70 kDa soluble lytic transglycosylase, hMSH2 also has a majority of residues in α helices. Of 940 residues in hMSH2, 629 were assigned a predicted secondary structure; of these 629, 504 residues are in α helices.
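As a quick check on these counts (simple arithmetic on the numbers quoted above, not an additional result), the corresponding fractions are:

```latex
\frac{629}{940} \approx 0.669, \qquad
\frac{504}{629} \approx 0.801, \qquad
\frac{504}{940} \approx 0.536
```

That is, about two-thirds of the residues received a secondary-structure assignment, about 80% of the assigned residues are predicted to be helical, and predicted helix covers roughly 54% of the full sequence, compared with the 63% helical content cited above for the 70 kDa soluble lytic transglycosylase.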

Several lines of evidence support the reliability of the threaded theoretical models of hMSH2. First, comparable structural features are present in all three models. Second, mutations known to disable hMSH2 function are mapped to sites on the surface of the predicted structures. Third, the validity of the prediction-based threading technique we used has been tested and compared to other threading methods that have also proved to be reliable (39). The first test compared the results of the prediction-based threading method used here to that of the potential-based threading method that utilizes the THREADER program published by Jones and co-workers (52). Twelve examples were tested with the potential-based method. For all 12 cases, the first hits were identified as the correct homologue. When using the prediction-based threading method, Rost et al. (39) also found 100% accuracy in identifying the correct homologues of these same proteins. In another test, Russell et al. (64) evaluated a different version of the prediction-based technique on 11 different proteins and compared their results to that of the same potential-based method of Jones et al. (52). They reported a first hit accuracy of 37–45% for their technique and a 9–19% accuracy when using Jones' THREADER program. When Rost et al. (39) analyzed this same set of 11 proteins using their prediction-based threading technique, they succeeded in getting 78% correct first hits. In a third study, however, Rost and co-workers analyzed 11 proteins that were used in the first Asilomar meeting for the evaluation of prediction methods (65, 66) and found that their method managed to correctly detect only 4 of 11 cases, whereas the THREADER technique detected 5 of 9 correct matches (39, 67).

To further validate the identification of putative structural homologues made when using the prediction-based threading approach, we also used the THREADER2 program to search for structural homologues of hMSH2 (52). The closest homologue found by THREADER2, based on z scores of pairwise energies, was gpb with a z score of 5.02, as was also found by PredictProtein, with a z score of 4.97 (32–36). We found, too, that the known mutations in bacterial MutS and human MSH2 are exposed on the exterior of the hMSH2 model created by InsightII from the hMSH2-gpb alignment generated by the THREADER2 program. The other two proteins that had been identified by the prediction-based threading approach, sly and rlr, are not currently in the library of proteins used in THREADER2 to compare the target sequences. However, on the basis of the structural similarities discussed above between gpb, sly, and rlr among themselves and between them and hMSH2, sly and rlr would most likely also be found by THREADER2 had they been available for comparison. According to the author of the THREADER2 program, the z score for solvation energy is, along with the z score based on pairwise energies, an important parameter on which predictions should be based. However, the z score for solvation can be used only when comparing monomeric proteins. Since hMSH2 becomes part of a multiprotein complex, the solvation energy z score is unlikely to be useful in predicting structural homologues; the solvation energy z score for the comparison of hMSH2 and gpb actually yielded a negative value. The fact that the potential-based threading method also identified gpb as a putative structural homologue of hMSH2 provides further evidence that the proteins identified by the prediction-based threading method are true structural homologues of hMSH2.

Though desirable, energy minimization and molecular dynamics could not be performed on these theoretical models due to software limitations. Generally, it is required that all residues be assigned coordinates. For rlr, for example, a region of 40 residues of hMSH2 was inserted into the rlr sequence and thus did not overlap any residues of rlr. The InsightII program is incapable of assigning coordinates to a loop flex region of more than 37 residues. Hence, without overlapping residues to guide assignments, the coordinates necessary for any refinement, minimization, or molecular dynamics steps could not be generated for this region of hMSH2. Similarly, the InsightII program cannot assign loop coordinates to flex regions of fewer than three residues. This was the problem with gpb, where two residues of hMSH2 were inserted into gpb with no overlapping sequences. Theoretically, arbitrary coordinates could be assigned to these residues; however, there is a high probability that these arbitrarily assigned coordinates would be wrong, and therefore minimization, although it would occur, would be done on an incorrect model with erroneous coordinates.
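
The two loop-length limits described above can be summarized as a simple range check. This is only an illustration of the constraint as stated in the text (insertions of fewer than 3 or more than 37 residues could not be assigned loop coordinates); it is not InsightII code, and the names are hypothetical.

```python
# Limits as stated in the text for assigning loop coordinates.
MIN_LOOP_RESIDUES = 3
MAX_LOOP_RESIDUES = 37


def loop_can_be_built(insert_length: int) -> bool:
    """True if an unaligned insertion of this length falls within the stated limits."""
    return MIN_LOOP_RESIDUES <= insert_length <= MAX_LOOP_RESIDUES


# Insertions mentioned in the text: 40 residues in the rlr-based model,
# 2 residues in the gpb-based model.
for template, length in [("rlr", 40), ("gpb", 2)]:
    status = "within limits" if loop_can_be_built(length) else "outside limits"
    print(f"{template}: {length}-residue insertion is {status}")
```

Both insertions fall outside the permitted range, one above it and one below it, which is why neither model could be carried through refinement.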

The models resulting from the prediction-based threading of hMSH2 to proteins with known structure, although still hypothetical, provide insight into the way this protein is likely to function. hMSH2 seems to be a globular molecule that does not contain a DNA binding groove, yet hMSH2 has been reported to bind to DNA mismatches even without the assistance of the other MMR proteins (68). The models suggest that the surface made up by the carboxyl-terminal domain is likely to be essential, and thus that mutational analysis focused on this region may prove fruitful in further elucidating the key amino acids involved in the mismatch recognition process.


   ACKNOWLEDGMENTS
 
The authors thank Dr. Jim Briggs for assistance in using the modeling software. We also thank Drs. David Jones and Burkhard Rost for their helpful advice. This work was conducted in part by the Clayton Foundation for Research-California Division. S.B.H. and G.L. are Clayton Foundation investigators. Contributions to this work by M.M.D. are in partial fulfillment of Ph.D. requirements in the Department of Biomedical Sciences.


   FOOTNOTES
 
2 Both authors contributed equally to this work.

1 Correspondence: 9500 Gilman Drive 0058, La Jolla, CA 92093–0058, USA. E-mail, mdelasal{at}sdcc14.ucsd.edu

3 Abbreviations: PDB, Protein Data Bank; gpb, glycogen phosphorylase; sly, 70 kDa soluble lytic transglycosylase; rlr, ribonucleotide reductase protein R1; 3D, 3-dimensional; MMR, DNA mismatch repair; HNPCC, hereditary nonpolyposis colon carcinoma; RMS, root mean square.

Received for publication September 15, 1997. Accepted for publication January 5, 1998.


   REFERENCES

  1. Kolodner, R. (1996) Biochemistry and genetics of eukaryotic mismatch repair. Genes & Dev. 10, 1433–1442
  2. Rhyu, M. S. (1996) Molecular mechanisms underlying hereditary nonpolyposis colorectal carcinoma. J. Natl. Cancer Inst. 88, 240–251
  3. Papadopoulos, N., Nicolaides, N. C., Liu, B., Parsons, R., Lengauer, C., Palombo, F., D'Arrigo, A., Markowitz, S., Willson, J. K. V., Kinzler, K. W., Jiricny, J., and Vogelstein, B. (1995) Mutations of GTBP in genetically unstable cells. Science 268, 1915–1917
  4. Fishel, R., Ewel, A., and Lescoe, M. K. (1994) Purified human MSH2 protein binds to DNA containing mismatched nucleotides. Cancer Res. 54, 5539–5542
  5. Fishel, R., Ewel, A., Lee, S., Lescoe, M. K., and Griffith, J. (1994) Binding of mismatch microsatellite DNA sequences by the human MSH2 protein. Science 266, 1403–1405
  6. Drummond, J. T., Li, G.-M., Longley, M. J., and Modrich, P. (1995) Isolation of an hMSH2-p160 heterodimer that restores DNA mismatch repair to tumor cells. Science 268, 1909–1912
  7. Alani, E., Chi, H. W., and Kolodner, R. (1995) The Saccharomyces cerevisiae MSH2 protein specifically binds to duplex oligonucleotides containing mismatched DNA base pairs and insertions. Genes & Dev. 9, 234–247
  8. Alani, E., Sokolsky, T., Studamire, B., Miret, J. J., and Lahue, R. S. (1997) Genetic and biochemical analysis of Msh2p-Msh6p: role of ATP hydrolysis and Msh2p-Msh6p subunit interactions in mismatch base pair recognition. Mol. Cell. Biol. 17, 2436–2447
  9. Palombo, F., Gallinari, P., Iaccarino, I., Lettieri, T., Hughes, M., D'Arrigo, A., Truong, O., Hsuan, J. J., and Jiricny, J. (1995) GTBP, a 160-kilodalton protein essential for mismatch binding activity in human cells. Science 268, 1912–1914
  10. Palombo, F., Iaccarino, I., Nakajima, E., Ikejima, M., Shimada, T., and Jiricny, J. (1996) hMutSbeta, a heterodimer of hMSH2 and hMSH3, binds to insertion/deletion loops in DNA. Curr. Biol. 6, 1181–1184
  11. Habraken, Y., Sung, P., Prakash, L., and Prakash, S. (1996) Binding of insertion/deletion DNA mismatches by the heterodimer of yeast mismatch repair proteins MSH2 and MSH3. Curr. Biol. 6, 1186–1187
  12. Iaccarino, I., Palombo, F., Drummond, J., Totty, N. F., Hsuan, J. J., Modrich, P., and Jiricny, J. (1996) MSH6, a Saccharomyces cerevisiae protein that binds to mismatches as a heterodimer with MSH2. Curr. Biol. 6, 484–486
  13. Acharya, S., Wilson, T., Gradia, S., Kan, M. F., Guerrette, S., Marsischky, G. T., Kolodner, R., and Fishel, R. (1996) hMSH2 forms specific mispair-binding complexes with hMSH3 and hMSH6. Proc. Natl. Acad. Sci. USA 93, 13629–13634
  14. Li, G.-M., and Modrich, P. (1995) Restoration of mismatch repair to nuclear extracts of colorectal tumor cells by a heterodimer of MutL homologs. Proc. Natl. Acad. Sci. USA 92, 1950–1954
  15. Weber, T. K., Conlon, W., Petrelli, N. J., Rodriguez-Bigas, M., Keitz, B., Pazik, J., Farrell, C., O'Malley, L., Oshalim, M., Abdo, M., Anderson, G., Stoler, D., and Yandell, D. (1997) Genomic DNA-based hMSH2 and hMLH1 mutation screening in 32 eastern United States nonpolyposis colorectal cancer pedigrees. Cancer Res. 57, 3798–3803
  16. Fishel, R., Lescoe, M. K., Rao, M. R. S., Copeland, N. G., Jenkins, N. A., Garber, J., Kane, M., and Kolodner, R. (1993) The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer. Cell 75, 1027–1038
  17. Leach, F. S., Nicolaides, N. C., Papadopoulos, N., Liu, B., Jen, J., Parsons, R., Peltomaki, P., Sistonen, P., Aaltonen, L. A., Nystrom-Lahti, M., Guan, X.-Y., Zhang, J., Meltzer, P. S., Yu, J.-W., Kao, F.-T., Chen, D. J., Cerosaletti, K. M., Fournier, R. E. K., Todd, S., Lewis, T., Leach, R. J., Naylor, S. L., Weissenbach, J., Mecklin, J.-P., Jarvinen, H., Petersen, G. M., Hamilton, S. R., Green, J., Jass, J., Watson, P., Lynch, H. T., Trent, J. M., de la Chapelle, A., Kinzler, K. W., and Vogelstein, B. (1993) Mutations of a MutS homolog in hereditary nonpolyposis colorectal cancer. Cell 75, 1215–1225
  18. Buerstedde, J.-M., Alday, P., Torhorst, J., Weber, W., Müller, H., and Scott, R. (1995) Detection of new mutations in six out of 10 Swiss HNPCC families by genomic sequencing of the hMSH2 and hMLH1 genes. J. Med. Genet. 32, 909–912
  19. Hall, N. R., Taylor, G. R., Finan, P. J., Kolodner, R. D., Bodmer, W. F., Cottrell, S. E., Frayling, I., and Bishop, D. T. (1994) Intron splice acceptor site sequence variation in the hereditary non-polyposis colorectal cancer gene hMSH2. Eur. J. Cancer Res. 30A, 1550–1552
  20. Nystrom-Lahti, M., Parsons, R., Sistonen, P., Pylkkanen, L., Aaltonen, L. A., Leach, F. S., Hamilton, S. R., Watson, P., Bronson, E., Fusaro, R., Cavalieri, J., Lynch, J., Lanspa, S., Smyrk, T., Lynch, P., Drouhard, T., Kinzler, K. W., Vogelstein, B., Lynch, H.-T., de la Chapelle, A., and Peltomaki, P. (1994) Mismatch repair genes on chromosomes 2p and 3p account for a major share of hereditary nonpolyposis colorectal cancer families evaluable by linkage. Am. J. Human Genet. 55, 659–665
  21. Wijnen, J., Vasen, H., Khan, P. M., Menko, F. H., van der Klift, H., van den Broek, M., van Leeuwen-Cornelisse, I., Nagengast, F., Meijers-Heijboer, E. J., Lindhout, D., Griffioen, G., Cats, A., Kleibeuker, J., Varesco, L., Bertario, L., Bisgaard, M.-L., Mohr, J., Kolodner, R., and Fodde, R. (1995) Seven new mutations in hMSH2, an HNPCC gene, identified by denaturing gradient-gel electrophoresis. Am. J. Human Genet. 56, 1060–1066
  22. Liu, B., Parsons, R. E., Hamilton, S. R., Petersen, G. M., Lynch, H. T., Watson, P., Markowitz, S., Willson, J. K. V., Green, J., de la Chapelle, A., Kinzler, K. W., and Vogelstein, B. (1994) hMSH2 mutations in hereditary nonpolyposis colorectal cancer kindreds. Cancer Res. 54, 4590–4594
  23. Liu, B., Parsons, R., Papadopoulos, N., Nicolaides, N. C., Lynch, H. T., Watson, P., Jass, J. R., Dunlop, M., Wyllie, A., Peltomaki, P., de la Chapelle, A., Hamilton, S. R., Vogelstein, B., and Kinzler, K. W. (1996) Analysis of mismatch repair genes in hereditary non-polyposis colorectal cancer patients. Nature Med. 2, 169–174
  24. Nystrom-Lahti, M., Wu, Y., Moisio, A.-L., Hofstra, R. M. W., Osinga, J., Mecklin, J.-P., Jarvinen, H. J., Leisti, J., Buys, C. H. C. M., de la Chapelle, A., and Peltomaki, P. (1996) DNA mismatch repair gene mutations in 55 kindreds with verified or putative hereditary non-polyposis colorectal cancer. Human Mol. Genet. 5, 763–769
  25. Froggatt, K. J., Koch, J., Davies, R., Evans, D. G., Clamp, A., Quarrell, O. W., Weissenbach, J., Hodgson, S. V., Ponder, B. A., Barton, D. E., et al. (1995) Genetic linkage analysis in hereditary non-polyposis colon cancer syndrome. J. Med. Genet. 32, 352–357
  26. de la Chapelle, A., and Peltomaki, P. (1995) Genetics of hereditary colon cancer. Annu. Rev. Genet. 29, 329–348
  27. Whitehouse, A., Parmar, R., Deeble, J., Taylor, G. R., Phillips, S. E. V., Meredith, D. M., and Markham, A. F. (1996) Mutational analysis of the nucleotide binding domain of the mismatch repair enzyme hMSH-2. Biochem. Biophys. Res. Commun. 229, 147–153
  28. Whitehouse, A., Deeble, J., Taylor, G. R., Guillou, P. J., Phillips, S. E. V., Meredith, D. M., and Markham, A. F. (1997) Mapping the minimal domain of hMSH-2 sufficient for binding mismatched oligonucleotides. Biochem. Biophys. Res. Commun. 232, 10–13
  29. Whitehouse, A., Taylor, G. R., Deeble, J., Phillips, S. E. V., Meredith, D. M., and Markham, A. F. (1996) A carboxy terminal domain of the hMSH-2 gene product is sufficient for binding specific mismatched oligonucleotides. Biochem. Biophys. Res. Commun. 225, 289–295
  30. Chothia, C., and Lesk, A. M. (1986) The relation between the divergence of sequence and structure in proteins. EMBO J. 5, 823–836
  31. Lesk, A. M. (1991) Protein Architecture: A Practical Approach, Oxford University Press, Oxford
  32. Rost, B., and Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232, 584–599
  33. Rost, B., and Sander, C. (1994) Combining evolutionary information and neural networks to predict protein secondary structure. Proteins 19, 55–77
  34. Rost, B., and Sander, C. (1994) Conservation and prediction of solvent accessibility in protein families. Proteins 20, 216–226
  35. Rost, B., Fariselli, P., and Casadio, R. (1996) Prediction of helical transmembrane segments at 95% accuracy. Protein Sci. 7, 1704–1718
  36. Rost, B. (1996) PHD: predicting one-dimensional protein structure by profile based neural networks. Methods Enzymol. 266, 525–539
  37. Rost, B. (1995) Fitting 1D predictions into 3D structures. In Protein Folds. A Distance-based Approach (Bohr, H., and Brunak, S., eds) pp. 132–151, CRC Press, Boca Raton, Florida
  38. Rost, B. (1995) TOPITS: threading one-dimensional predictions into three-dimensional structures. In The Third International Conference on Intelligent Systems for Molecular Biology (ISMB) (Rawlings, C., Clark, D., Altman, R., Hunter, L., Lengauer, T., and Wodak, S., eds) pp. 314–321, AAAI Press, Menlo Park, California
  39. Rost, B., Schneider, R., and Sander, C. (1997) Protein fold recognition by prediction-based threading. J. Mol. Biol. 270, 471–480
  40. Sander, C., and Schneider, R. (1991) Database of homology-derived structures and the structural meaning of sequence alignment. Proteins 9, 56–68
  41. Shenkin, P. S., Yarmush, D. L., Fine, R. M., Wang, H. J., and Levinthal, C. (1987) Predicting antibody hypervariable loop conformation. I. Ensembles of random conformations for ringlike structures. Biopolymers 26, 2053–2085
  42. Biosym/MSI (1995) Homology User Guide, Biosym/MSI, San Diego, California
  43. Fishel, R., and Kolodner, R. D. (1995) Identification of mismatch repair genes and their role in the development of cancer. Curr. Opin. Genet. Dev. 5, 382–395
  44. Lazar, V., Grandjouan, S., Bognel, C., Couturier, D., Rougier, P., Bellet, D., and Bressac-de Paillerets, B. (1994) Accumulation of multiple mutations in tumor suppressor genes during colorectal tumorigenesis in HNPCC patients. Human Mol. Genet. 3, 2257–2260
  45. Kolodner, R. D., Hall, N. R., Lipford, J., Kane, M. R. F., Rao, M. R. S., Morrison, P., Wirth, L., Finan, P., Burm, J., Chapman, P., Earabino, C., Merchant, E., and Bishop, D. T. (1994) Structure of the human MSH2 locus and analysis of two Muir-Torre kindreds for MSH2 mutations. Genomics 24, 516–526
  46. Chung, D. C., and Rustgi, A. K. (1995) DNA mismatch repair and cancer. Gastroenterology 109, 1685–1699
  47. Borresen, A.-L., Lothe, R. A., Meling, G. I., Lystad, S., Lipford, J., Kane, M. F., Rognum, T. O., and Kolodner, R. D. (1995) Somatic mutations in the hMSH2 gene in microsatellite unstable colorectal carcinomas. Human Mol. Genet. 4, 2065–2072
  48. Cama, A., Genuardi, M., Guanti, G., Radice, P., and Varesco, L. (1996) Molecular genetics of hereditary non-polyposis colorectal cancer (HNPCC). Tumori 82, 122–135
  49. Orth, K., Hung, J., Gazdar, A., Bowcock, A., Mathis, J. M., and Sambrook, J. (1994) Genetic instability in human ovarian cancer cell lines. Proc. Natl. Acad. Sci. USA 91, 9495–9499
  50. Thibodeau, S. N., French, A. J., Roche, P. C., Cunningham, J. M., Tester, D. J., Lindor, N. M., Moslein, G., Baker, S. M., Liskay, R. M., and Burgart, L. J. (1996) Altered expression of hMSH2 and hMLH1 in tumors with microsatellite instability and genetic alterations in mismatch repair genes. Cancer Res. 56, 4836–4840
  51. Wu, T. H., and Marinus, M. G. (1994) Dominant negative mutator mutations in the mutS gene of E. coli. J. Bacteriol. 176, 5393–5400
  52. Jones, D. T., Taylor, W. R., and Thornton, J. M. (1992) A new approach to protein fold recognition. Nature (London) 358, 86–89
  53. Orengo, C. A., and Taylor, W. R. (1990) A rapid method of protein structure alignment. J. Theor. Biol. 147, 517–551
  54. Taylor, W. R., and Orengo, C. A. (1989) Protein structure alignment. J. Mol. Biol. 208, 1–22
  55. Myers, E., and Miller, W. (1988) Optimal alignments in linear space. CABIOS 4, 11–17
  56. Umar, A., and Kunkel, T. A. (1996) DNA-replication fidelity, mismatch repair and genome instability in cancer cells. Eur. J. Biochem. 238, 297–307
  57. Dombradi, V. (1981) Structural aspects of the catalytic and regulatory function of glycogen phosphorylase. Int. J. Biochem. 13, 125–139
  58. Engel, H., Kazemier, B., and Keck, W. J. (1991) Murein-metabolizing enzymes from Escherichia coli: sequence analysis and controlled overexpression of the slt gene, which encodes the soluble lytic transglycosylase. J. Bacteriol. 173, 6773–6782
  59. Holtje, J.-V., Mirelman, D., Sharon, N., and Schwarz, U. (1975) Novel type of murein transglycosylase in Escherichia coli. J. Bacteriol. 124, 1067–1076
  60. Keck, W., Wientjes, F. B., and Schwarz, U. (1985) Comparison of two hydrolytic murein transglycosylases of Escherichia coli. Eur. J. Biochem. 148, 493–497
  61. Uhlin, U., and Eklund, H. (1994) Structure of ribonucleotide reductase protein R1. Nature (London) 370, 533–539
  62. Blake, C. C. F. (1979) Structure and control of phosphorylase. Nature (London) 280, 448
  63. Thunnissen, A.-M. W. H., Dijkstra, A. J., Kalk, K. H., Rozeboom, H. J., Engel, H., Keck, W., and Dijkstra, B. W. (1994) Doughnut-shaped structure of a bacterial muramidase revealed by X-ray crystallography. Nature (London) 367, 750–753
  64. Russell, R. B., Copley, R. R., and Barton, G. J. (1996) Protein fold recognition by mapping predicted secondary structures. J. Mol. Biol. 259, 349–365
  65. Lemer, C. M.-R., Rooman, M. J., and Wodak, S. J. (1995) Protein structure prediction by threading methods: evaluation of current techniques. Proteins 23, 337–355
  66. Moult, J., Judson, R., Fidelis, K., and Pedersen, J. T. (1995) Large scale experiment to assess protein structure prediction methods. Proteins 23, ii–iv
  67. Jones, D. T., Miller, R. T., and Thornton, J. M. (1995) Successful protein fold recognition by optimal sequence threading validated by rigorous blind testing. Proteins 23, 384–397
  68. Mello, J. A., Acharya, S., Fishel, R., and Essigmann, J. M. (1996) The mismatch repair protein hMSH2 binds selectively to DNA adducts of the anticancer drug cisplatin. Chem. Biol. 3, 579–589



