Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA
1Correspondence: GeneFormatics, Inc., 5830 Oberlin Dr., Ste. 200, San Diego, CA 92121, USA. E-mail: jacque{at}geneformatics.com
| ABSTRACT |
|---|
Key Words: functional genomics; function prediction; structural genomics; structure-based function annotation; fuzzy functional form
| INTRODUCTION |
|---|
To rectify both shortcomings, we have developed a structure-based
function identification method (2, 3, 4). We use a geometric
and residue-based descriptor of the active site associated with a given
biochemical function (termed a `fuzzy functional form' or FFF).
If a protein structure satisfies this descriptor, then the protein is
predicted to have the function of interest. Thus, we do not require any
information about the evolutionary relationship of the protein of
interest to other proteins that have the same function. Rather, the
biochemical function is established based on the physical and chemical
properties of the active site itself. Another advantage of using a
structural model is that, in principle, one could address all the
levels of function described above; in practice, however, only some
levels have so far been addressed. Nevertheless, even at the present
state of the art, if one finds two active sites ascribed to different
biochemical functions to be in close spatial proximity, and if one of
the functions has been suggested experimentally to exert control over
the other, then knowledge of their spatial proximity would make a very
strong circumstantial case for such control. When the relevant residues
are far apart in sequence, no sequence-based approach can provide this
type of information. On the other hand, a clear disadvantage of our
approach is that the protein's structure needs to be known to apply
the structural descriptors. However, we have shown that the descriptors
can be applied not only to high-resolution X-ray and nuclear magnetic
resonance structures, but also to inexact models produced by
state-of-the-art threading, homology modeling, and ab initio
folding programs (2).
We have previously built a FFF for the thiol-disulfide oxidoreductase
active site of the glutaredoxin/thioredoxin protein family (2, 3).
Application of this FFF to a set of high-resolution
structures from the Brookhaven Protein Data Bank (5)
yielded all known glutaredoxins, thioredoxins, and disulfide isomerase
proteins. To our surprise, it also yielded the protein 1fjm (6),
a member of the serine/threonine phosphatase protein
family. Subsequent analysis presented here shows that this protein
might indeed have a disulfide oxidoreductase active site, suggesting a
mechanism of redox control for these phosphatases. Identification of a
putative regulatory active site in a well-studied protein whose
structure was solved over three years ago demonstrates the
multi-faceted nature of function in proteins and emphasizes the need
for structural descriptors of protein function at all levels.
| MATERIALS AND METHODS |
|---|
Sequence variability
Sequence variability was calculated using a program kindly
provided by Dr. Peter Shenkin. This program takes the Pileup alignments
as input and calculates the residue entropy and variability for each
aligned residue in the sequence, as described previously (9).
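As a rough illustration of a per-column entropy calculation, the following sketch computes the Shannon entropy of each aligned position. It assumes a plain Shannon entropy in bits with gaps ignored; the program actually used here is not described in this text, and the function name and toy alignment below are hypothetical.

```python
import math
from collections import Counter

def column_entropy(alignment, col, gap="-"):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [seq[col] for seq in alignment if seq[col] != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Toy alignment for illustration only (not data from this study).
toy_alignment = ["CCDP", "CCEP", "TCDP", "CCDP"]
entropies = [column_entropy(toy_alignment, i) for i in range(4)]
print(entropies)
```

In this toy case the invariant columns score zero and the columns with a single substitution score higher, which is the sense in which low entropy marks a well-conserved position.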
Cluster analysis
The 70 serine/threonine phosphatases, identified from the
Psi-BLAST search described above, were aligned using the program
Pileup. From the multiple alignment, a pairwise distance table was
created using the program Distances. The Growtree program was used to
create the tree figure from the Distances table. All programs are found
in the Wisconsin GCG sequence analysis software package, v9.1. Default
parameters were used.
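The idea of a pairwise distance table derived from a multiple alignment can be sketched as follows. This is a minimal sketch that assumes an uncorrected distance (the fraction of differing residues over ungapped positions); it is not the algorithm of the GCG Distances program, whose default correction is not described here, and the sequence names and sequences are hypothetical.

```python
def p_distance(a, b, gap="-"):
    """Fraction of differing residues over positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 1.0  # no comparable positions: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_table(alignment):
    """Build a symmetric table of pairwise distances keyed by (name, name)."""
    names = list(alignment)
    return {(p, q): p_distance(alignment[p], alignment[q]) for p in names for q in names}

# Toy aligned sequences for illustration only.
toy = {"seqA": "ACDP-K", "seqB": "ACEPLK", "seqC": "TVEFLK"}
table = distance_table(toy)
print(table[("seqA", "seqB")], table[("seqA", "seqC")], table[("seqB", "seqC")])
```

A tree-building step would then take such a table as input and join the closest sequences first.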
Construction of a FFF for the thiol-disulfide oxidoreductase
active site
By design, FFFs are meant to be `fuzzy' so that they can be
applied to the inexact structures produced by current protein modeling
techniques. Thus, they are built using only the alpha carbon
coordinates of residues that have been shown to be important. In the
thiol-disulfide oxidoreductases, two cysteine residues are apparently
critical for the complete functioning of the protein (10, 11, 12, 13).
A cis-proline that is close in space,
but not in sequence, is also structurally conserved in this diverse
family (14). An example of the active site in
Escherichia coli thioredoxin (2trx; ref 15) is shown in Fig. 1A.
The FFF for the thiol-disulfide oxidoreductase active site
of the glutaredoxin/thioredoxin family was built from the relative
alpha carbon positions of these three active site residues (2, 3).
The FFF also contains residues located on either side of
these active site residues to create a structural motif of nine
residues (4). Use of these adjacent residues to describe
the active site geometry specifies the location of the beta carbon of
each of the three active site residues (16). The FFF is
described by the average distances between the alpha carbons of these
nine residues plus or minus a small variance.
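A distance-based descriptor of this kind can be sketched as a set of alpha carbon pair distances, each with a mean and a tolerance. The sketch below is a simplification under stated assumptions: it uses only the three active site residues rather than the nine-residue motif, and the distances, tolerances, coordinates, and names are hypothetical placeholders, not the published FFF parameters.

```python
import math

# Hypothetical descriptor: (residue role pair) -> (mean CA-CA distance, tolerance), in angstroms.
# These numbers are illustrative placeholders, not the published FFF values.
TOY_DESCRIPTOR = {
    ("cys1", "cys2"): (5.5, 0.5),
    ("cys1", "pro"): (8.0, 1.0),
    ("cys2", "pro"): (6.0, 1.0),
}

def matches_descriptor(ca_coords, descriptor):
    """Return True if every listed alpha carbon pair distance is within mean +/- tolerance."""
    for (role_a, role_b), (mean, tol) in descriptor.items():
        observed = math.dist(ca_coords[role_a], ca_coords[role_b])
        if abs(observed - mean) > tol:
            return False
    return True

# Hypothetical alpha carbon coordinates for two candidate residue triplets.
candidate = {"cys1": (0.0, 0.0, 0.0), "cys2": (5.4, 0.0, 0.0), "pro": (5.4, 6.0, 0.0)}
distorted = {"cys1": (0.0, 0.0, 0.0), "cys2": (5.4, 0.0, 0.0), "pro": (5.4, 12.0, 0.0)}

print(matches_descriptor(candidate, TOY_DESCRIPTOR))
print(matches_descriptor(distorted, TOY_DESCRIPTOR))
```

Because only inter-residue distances are tested, a check of this form needs no prior superposition of structures, and the tolerance term is what allows it to be applied to inexact models.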
| RESULTS |
|---|
Identification of a potential thiol-disulfide oxidoreductase active
site in the serine/threonine phosphatase-1 subfamily
The thiol-disulfide oxidoreductase active site in the
glutaredoxin/thioredoxin family exhibits some identifying
characteristics (Fig. 1A). The two active site cysteines are
found at the amino terminus of an α-helix. The side chains of the two
cysteines are found on one face of the helix and lie parallel to the
conserved proline side chain. The proline is found in an upside-down
V-shaped structure, where the proline is at the vertex of the V. In the
eight structures belonging to the glutaredoxin/thioredoxin family in
our database of 1501 proteins, the proline is always found in the
cis configuration. None of these characteristics was encoded
into the FFF, which is a simple structural descriptor that only uses
distances between alpha carbons to define the active site.
Analysis of the putative thiol-disulfide oxidoreductase active site
residues identified by the FFF in 1fjm (6) shows many of
these same characteristics (Fig. 1B). The two putative
active site cysteines in 1fjm are found at a helix terminus. The
proline is at the vertex of an upside-down V and the proline is in the
cis configuration. The root mean square difference between
the backbone atoms of the proline and the two residues on either side
of it in 1fjm and 2trx (E. coli thioredoxin) is 0.58 Å.
There are two distinct structural differences between the potential
redox site in serine/threonine phosphatase-1 (1fjm) and the actual
disulfide oxidoreductase active site in thioredoxin (2trx). The first
is that the putative active site in 1fjm is found at the carboxyl
terminus of a helix, whereas the sites in the glutaredoxin/thioredoxin
family are found at a helix amino terminus (Fig. 1). Although the helix
dipole has been suggested to be important for biological activity (13),
this contention is disputed (10). Of
course, the specific biological activity could be somewhat different in
the serine/threonine phosphatases. The second difference between the
active site structures is in the orientation of the helix relative to
the proline. If the vertex of the V-shaped proline structure is
oriented so that it is pointing up, then the cysteine side chains hang
down from one face of the helix to lie parallel to the proline side
chain in the glutaredoxin/thioredoxin structures (Fig. 1A).
In 1fjm, on the other hand, the cysteines point up from the face of the
helix to lie next to the proline side chain when the vertex of the
proline V is pointing up (Fig. 1B). It should be noted that the
cysteine side chains lie in virtually identical positions relative to
the cis-proline, despite the orientation of the helix.
However, the change in orientation causes the cysteines to be at the
protein surface in the serine/threonine phosphatase-1, whereas they are
somewhat more buried in the glutaredoxin and thioredoxin structures.
Structural analysis provides some evidence that the site identified in
the serine/threonine phosphatase-1 1fjm might indeed be a
thiol-disulfide oxidoreductase active site. Note that although the 1fjm
structure has been solved (6), the serine/threonine
phosphatase-1 subfamily has been well studied (reviewed in refs
17, 18, 19, 20), and reactive sulfhydryl groups are known to be
present (21, 22), the actual disulfide oxidoreductase site
in the PP1 subfamily has not been previously identified.
Analysis of the serine/threonine phosphatase sequences
To further investigate the putative redox site in 1fjm, we
analyzed the serine/threonine phosphatase sequences. This family of
proteins is divided into four subfamilies: PP1, of which 1fjm is a
member; PP2A; PP2B (of which calcineurin is a member); and PP2C
(19). PP2C appears to be unrelated in sequence to PP1,
PP2A, and PP2B, although members of this subfamily catalyze a similar
reaction (19) and structural comparison suggests a similar
catalytic reaction mechanism (23). Members of subfamilies
PP1, PP2A, and PP2B exhibit significant sequence similarity (Fig. 2).
On average, there is 49% sequence identity between the catalytic
domains of PP1 and PP2A sequences. Likewise, there is an average 40%
sequence identity between catalytic domains of PP1 and PP2B sequences
(6). Although the sequences of PP1, PP2A, and PP2B are
very similar, the proteins in these subfamilies differ in their
substrate specificities and interactions with regulatory molecules.
To analyze the serine/threonine phosphatase subfamilies, we first used
the PP1A HUMAN sequence (human serine/threonine
phosphatase-1) as the probe sequence in a Psi-BLAST search (8)
of the SwissProt sequence database (7).
This search revealed 31 sequences that are explicitly annotated as
serine/threonine phosphatase-1 proteins. (To limit the search,
sequences annotated as possible or putative PP1s were not considered.)
A multiple sequence alignment of this subfamily shows that the putative
redox active site cysteines and proline are almost invariant in this
subfamily (Fig. 2, Table 1). This initial analysis showed that one of the 31 sequences (P20604)
contained none of the three residues. (Subsequent analysis shows that
this sequence is unlikely to be a PP1 sequence; see below.) Of the
remaining 30 sequences, the second cysteine and the proline are
invariant. The first cysteine is conserved in 26 sequences and is
replaced by a threonine in three sequences and by a serine in one
sequence. A residue conservation analysis (9) shows that
these three residues are better conserved than many residues in the
family overall (Fig. 3A). Such strong conservation suggests that these residues
might indeed be important for function or structure of the proteins in
this subfamily.
We also performed two other Psi-BLAST searches on the SwissProt
database, using human serine/threonine phosphatase-2A
(PP2A HUMAN) and a serine/threonine phosphatase-2B
(PP2B HUMAN) as the probe sequences. These searches
found 23 sequences explicitly annotated as serine/threonine
phosphatase-2A and 16 sequences annotated as serine/threonine
phosphatase-2B. (Again, proteins labeled as putative or `possible'
were not considered.) Multiple alignments were created for each of the
three sequence sets, and these alignments show that the putative
oxidoreductase active site cysteines and proline are not conserved in
the PP2A or PP2B families (Fig. 2). Residue conservation analysis shows
that the proline is poorly conserved in the PP2A subfamily and both
cysteines are poorly conserved in the PP2B subfamily (Fig. 3).
The above observations suggested some covariation among the
putative redox active site cysteines and proline. Therefore, we
performed a cluster analysis of the multiple sequence alignment created
from the 70 PP1, PP2A, and PP2B sequences and created a tree diagram
from the pairwise distances. As expected, the PP1, PP2A, and PP2B
subfamilies are clearly separated in the tree (Fig. 4); however, the tree
exhibited some interesting results. First, the one
sequence (P20604) labeled as a PP1, but which contained neither of
the cysteines nor the proline, does not belong in the PP1 subfamily, but
rather belongs in the PP2A subfamily (red bar in Fig. 4). This strongly
suggests that this sequence is incorrectly annotated in the SwissProt
database. The second interesting result is that two of the four
PP1 sequences that lack the first cysteine, both of which contain a
threonine at that position, fall between the PP1 and PP2A subfamilies
(cyan bars in Fig. 4). The two other sequences that contain a serine or threonine
residue instead of the first cysteine are found as a subfamily of the
PP1 subfamily (Fig. 4). Thus, the second cysteine and the proline are
invariant in 30 out of 30 PP1 sequences. The first cysteine is
conserved in 26 out of 30 sequences, but two of the four sequences in
which it is not a cysteine lie between the PP1 and PP2A subfamilies in
the cluster analysis.
This strong residue conservation in the PP1 subfamily contrasts with
the sequences in the other subfamilies, where the identities of the
residues that are sequentially homologous to the cysteines and proline
are not conserved (Table 1). In the PP2A subfamily, the second cysteine
is replaced by an invariant tyrosine. The first cysteine is replaced by
leucine, valine, or isoleucine, and the proline is replaced by either
isoleucine, valine, leucine, alanine, lysine, or phenylalanine. In the
PP2B subfamily, the position homologous to the proline is conserved as
a phenylalanine, but the first cysteine is replaced by serine,
threonine, or alanine whereas the second cysteine is replaced by
asparagine, valine, cysteine, alanine, or serine. The strong
conservation of the two cysteines and the proline in the PP1 subfamily,
but not in the PP2A or PP2B subfamilies, strongly supports our
contention that these residues are functionally important.
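The comparison across subfamilies amounts to tallying, for one alignment column at a time, which residues occur in each subfamily. A minimal sketch of such a tally follows; the subfamily labels mirror those used here, but the three-residue toy sequences and the function name are hypothetical and do not reproduce the data in Table 1.

```python
from collections import Counter, defaultdict

def residues_by_group(records, col):
    """Count the residues found at one alignment column, separately for each subfamily."""
    tally = defaultdict(Counter)
    for group, seq in records:
        tally[group][seq[col]] += 1
    return {group: dict(counts) for group, counts in tally.items()}

# Toy records: (subfamily label, residues at the three positions of interest).
toy_records = [
    ("PP1", "CCP"), ("PP1", "CCP"), ("PP1", "TCP"),
    ("PP2A", "LYI"), ("PP2A", "VYV"),
    ("PP2B", "SNF"),
]
first_cys = residues_by_group(toy_records, 0)
second_cys = residues_by_group(toy_records, 1)
print(first_cys)
print(second_cys)
```

A column that yields a single residue type within one subfamily but a different or variable residue in the others is the pattern described above for the two cysteines and the proline.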
Proximity of putative disulfide oxidoreductase active site to the
phosphatase active site
If the putative disulfide oxidoreductase active site is a location
for activation or inactivation of the phosphatase, one would expect the
redox site to be located somewhere near the phosphatase active site.
This is indeed found to be the case. This fact points out the power of
a structure-based approach to function prediction. Even if a
sequence-based method had found the disulfide oxidoreductase active
site (and none did), one could not tell if the two functions were
independent or interdependent based on sequence alone. The active site
residues in 1fjm are found to lie in a groove of the protein (6).
The inhibitor microcystin, which was cocrystallized
with the phosphatase, lies along this groove, across the phosphatase
active site residues (Fig. 1C). The alpha carbon of the
histidine in the phosphatase active site is an average of 14 Å from
the three alpha carbons of the putative redox active site, which is
about half the length of the groove along which the microcystin lies
(Fig. 1C). Thus, the putative disulfide oxidoreductase
active site is in a location where it could potentially affect the
phosphatase active site of the protein.
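The proximity measure quoted above is an average of alpha carbon to alpha carbon distances. As a small worked sketch, the function below averages the Euclidean distance from one reference atom to a set of other atoms; the coordinates are hypothetical values chosen so that the toy result is 14 Å, and they are not taken from the 1fjm coordinate file.

```python
import math

def mean_distance(reference, others):
    """Average Euclidean distance (same units as the input) from one point to several others."""
    return sum(math.dist(reference, point) for point in others) / len(others)

# Hypothetical alpha carbon coordinates in angstroms, for illustration only.
his_ca = (0.0, 0.0, 0.0)
redox_site_cas = [(14.0, 0.0, 0.0), (0.0, 13.0, 0.0), (0.0, 0.0, 15.0)]

average = mean_distance(his_ca, redox_site_cas)
print(average)
```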
Analysis of other residues possibly involved in the disulfide
exchange reaction
We then looked in the vicinity of the putative disulfide
oxidoreductase active site for other residues that might be involved in
the disulfide exchange mechanism. It has been reported that a buried
aspartic acid residue (Asp 26 in E. coli thioredoxin) is
responsible for decreasing the pKa of the active
site cysteines, thus increasing the rate of the disulfide
oxidoreductase reaction in vivo (10). Assuming
a similar mechanism might be at work here, we looked for aspartic acid
and glutamic acid residues near the cysteines in the serine/threonine
phosphatase-1 structure. We found two: Asp 154 and Glu 44. Asp 154 is
adjacent to Cys 155, one of the putative active site cysteines, in the
helix. This residue is well conserved and is either an aspartic acid or
a glutamic acid in the 30 PP1 sequences analyzed (Figs. 2 and 4). The
residue that aligns with Asp 154 in the PP2A subfamily is aspartic acid
in 20 out of 24 sequences and glutamic acid in two; the other two are
Asn and Gln. In the PP2B subfamily in this position are glutamic acid,
aspartic acid, histidine, glutamine, and arginine. Thus, the negative
charge is invariant in the PP1 subfamily; it is conserved in the PP2A
subfamily but is not conserved in the PP2B subfamily. Glu 44 is found
in a helix that packs against and is parallel to the helix containing
the potential active site cysteines. This residue is not as well conserved
in any of the subfamilies (Fig. 2). The conservation of the negative
charge on residue 154 suggests it might be involved in the disulfide
exchange mechanism.
Positively charged side chains could also work to stabilize the
negative thiolate anion that is thought to form during the disulfide
exchange reaction. We found several arginine residues in close
structural proximity to the cysteines, including Arg 43, whose side
chain lies over the cysteine side chains, and Arg 191, which is
adjacent to the cis-Pro 192. Arg 191 is strictly conserved
in all the serine/threonine phosphatase subfamilies. The positive
charge at residue 43 is conserved in the PP1 subfamily: it is an
arginine in 21 sequences and a lysine in 8 sequences, but in the PP2A
subfamily it can be arginine, lysine, valine, or glutamine. In the PP2B
subfamily, there are no arginines, but alanine, threonine, or glycine
can be found at this position (Fig. 2).
In the 1fjm crystal structure, there is another cysteine, Cys 39,
which lies in a position to potentially react with the putative redox
Cys 155. This residue is usually a cysteine in PP1 sequences, but is
replaced by a valine in two sequences: P23733 and P23734. These two
sequences are the ones found between the PP1 and PP2A subfamilies in
the cluster analysis (Fig. 4). In these two sequences, the first
cysteine is also replaced by a threonine. This residue covariation does
suggest that all three cysteines might be involved in the redox
reaction in the serine/threonine phosphatases. Cys 39 is conserved
in the PP2A subfamily, except for one protein, where it is replaced
by valine. In the PP2B subfamily, the residues found in this
position are isoleucine, leucine, and valine.
| DISCUSSION |
|---|
Biological significance of a redox site in the serine/threonine
phosphatase-1 subfamily
Serine/threonine protein phosphatases catalyze the removal of a
phosphate group from a serine or threonine residue. They are found in
all eukaryotic cell types and play a central role in the control of
many cellular processes, including control of the cell cycle, metabolic
regulation, and growth factor signaling pathways (reviewed in ref
24). Because of this central role, they are also subject
to complex regulatory mechanisms (24).
Control of serine/threonine protein phosphatases and other
phosphatases by redox mechanisms has been hypothesized and in some
cases demonstrated (17, 22, 25, 26, 27); however, the
mechanisms are complex and not well understood. Serine/threonine
protein phosphatases-1 and -2A have been shown to be inactivated by a
variety of thiol group reagents, although PP1 is more rapidly
inactivated by several of these compounds (21). The
in vivo significance of these observations is not
understood. A redox-sensitive protein phosphatase is inactivated by
tumor necrosis factor/interleukin-1 signal transduction (22).
More recent evidence has suggested that the
particular redox-sensitive phosphatase in this case comes from the PP2A
subfamily (28).
Given the multiplicity of pathways that the serine/threonine
phosphatases regulate, there probably are multiple control mechanisms
that act under many different specific conditions. Because PP1 proteins
are more rapidly inactivated by some sulfhydryl reagents (21),
and given the structural similarity of the site
described here to the disulfide oxidoreductase active sites in the
glutaredoxin/thioredoxin protein family, the lack of conservation of
this site in the PP2A or PP2B subfamilies, and the proximity of the putative
disulfide reductase active site to the phosphatase active
site, we propose that a specific pathway exists for redox regulation
of the PP1 subfamily that involves this site.
Implications for functional genomics and function annotation
This prediction, if true, has some significant implications for
the burgeoning field of functional genomics. First of all, by using
structural descriptors of active sites, we do not require that proteins
having the same function be evolutionarily related. Indeed, in this
method such information is not necessary. In the specific case of the
serine/threonine phosphatases, we have identified a putative disulfide
reductase active site using a descriptor based on proteins having no
apparent evolutionary relationship whatsoever to this family.
Furthermore, this study suggests that proteins with very similar
sequences can gain additional functional sites during evolution. This
has been shown in other proteins (see, for example, ref 29),
but has rarely been predicted in advance of
experiment. Thus, a protein can have more than one `function', and
identification of a sequence as a serine/threonine phosphatase, or even
as a serine/threonine phosphatase-1, is not enough to identify the
full, multilevel biological function of the protein. The most useful
annotation methods will explicitly identify all levels of function,
including catalytic activities, substrate specificities, binding of
cofactors, interaction with regulatory proteins, or interaction with
other macromolecules. Structural descriptors of active sites, rather
than linear sequence motifs, are especially well suited to this
task.
| FOOTNOTES |
|---|
Received for publication January 27, 1999. Accepted for publication July 29, 1999.
| REFERENCES |
|---|
/ß hydrolase family. Folding Design 4,535-548