* Department of Chemistry and Biochemistry
Center for Theoretical Biological Physics, La Jolla, California, USA; and
Howard Hughes Medical Institute and Department of Pharmacology, University of California, San Diego
1Correspondence: 9500 Gilman Dr., Mail Code 0371, La Jolla, CA 92093-0371, USA. E-mail: pwolynes{at}chem.ucsd.edu
Key Words: conformational switch; transcription factor; effective charge; multi-phosphorylation; contact-map PCA
| INTRODUCTION |
|---|
In this paper, we address the energetic and structural consequences of one of the most important forms of reversible modification: phosphorylation. In phosphopeptides, the polar side-chain hydrogens of serine, threonine, tyrosine, and histidine can be replaced by phosphoryl groups. Phosphorylation occurs frequently in the regulation of signal transduction and, owing to advances in mass spectrometry, is increasingly being detected throughout the genome (1). Direct structural modifications not only may affect protein-protein interactions but can also allow conformational switches to be constructed from single-domain proteins (e.g., see refs 2-6). Using the NFAT regulatory domain as an example, we show that these changes can not only modify the secondary structure of the protein locally but also globally alter its tertiary structure, thus allowing phosphorylation to reset the switch.
A phosphorylation target routinely has multiple phosphorylation sites: the maximum number known so far is close to 20 (7). Extensive multi-phosphorylation (8) of the protein NFAT (nuclear factor of activated T cells) (9) modulates its action as a transcription factor that activates T cells (10, 11). NFAT turns on the transcription machinery, which eventually yields several cytokines that help the immune cell mature and then proliferate. Inactive NFAT is located in the cytoplasm; when activated, NFAT is transported into the nucleus to initiate transcription. About 13 specific dephosphorylation events are required to signal the transport of NFAT into the nucleus (9). Dephosphorylation of the cytoplasmic form of NFAT by calcineurin unmasks a nuclear localization signal (NLS), which in turn becomes attached to a shuttle protein going into the nucleus. This dephosphorylation may also mask a nuclear export sequence (NES) at the same time, but so far there is no clear evidence that NFAT has an NES. Conversely, when the nuclear form of NFAT is phosphorylated in the nucleus, the NLS becomes masked (and possibly an NES unmasked), and NFAT is then exported from the nucleus. The structural details of how phosphorylation and dephosphorylation may act as conformational switches, as proposed in refs 9 and 10, are still lacking (12).
To address these structural issues, we use two levels of modeling. We first use all-atom models (13) to examine the structural changes upon phosphorylation that are short-ranged in sequence; this reveals to what extent the secondary structure of phosphorylation-prone peptides changes. We then use a coarse-grained, united-residue model based on associative memory hamiltonian (AMH) methods (14, 15) to predict the global effects of (de)phosphorylation on protein tertiary structure. These changes are largely driven by electrostatic and solvation effects.
Among the many proteins in the NFAT family, we study murine NFAT1, which has several aliases: NFATc2, NFATp, and NFAT. It is a multi-domain protein with 928 amino acids in total. The part relevant to this study is the regulatory domain, shown in Fig. 1. We assume that the other domains (the AT domain, the Rel-homology DNA-binding domain, and the C-terminal domain) are not involved in the regulation. The regulatory domain contains several serine-rich regions (SRRs) and Ser-Pro (SP) motif sequences. All 13 changeable serines are found either in SP motifs (nine of them) or in SRR-1 (four of them).
| MODELS |
|---|
These fragments were simulated using a constrained Brownian dynamics algorithm with an adaptive time step and implicit solvation, as implemented in the software UHBD (16) (using the dynamical equations in ref 1). The simulation uses the Amber force field to provide the systematic forces (17). Two implicit polar solvation schemes were studied: one based on a generalized Born (GB) solvation energy (18) and the other based on a distance-dependent dielectric (DDD) solvation (13). Comparing the results obtained with DDD and GB solvation gives an idea of the robustness of our conclusions. Because of the unique nature of proline backbone conformations, to study the SP motif we also paid attention to the ω (omega) angle of the peptide TSPI.
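To make the contrast between the two solvation schemes concrete, here is a minimal Python sketch of both electrostatic treatments. It assumes the common choice ε(r) = 4r for the distance-dependent dielectric and the Still-type generalized Born pairwise function with precomputed Born radii; the exact forms used in refs 13 and 18 are not given here, and all function names are hypothetical.

```python
import numpy as np

# Coulomb's constant in kcal*Angstrom/(mol*e^2) (approximately 332.06).
COULOMB = 332.06

def coulomb_ddd(qi, qj, r, slope=4.0):
    """Pair energy (kcal/mol) with an assumed distance-dependent dielectric eps(r) = slope * r."""
    return COULOMB * qi * qj / (slope * r * r)

def coulomb_const(qi, qj, r, eps=80.0):
    """Pair energy (kcal/mol) with a uniform dielectric constant eps."""
    return COULOMB * qi * qj / (eps * r)

def gb_polar_energy(q, xyz, radii, eps_in=1.0, eps_out=80.0):
    """Still-type generalized Born polarization energy (kcal/mol) for fixed Born radii.

    q: charges (e); xyz: (n, 3) coordinates (Angstrom); radii: Born radii (Angstrom).
    """
    q = np.asarray(q, dtype=float)
    xyz = np.asarray(xyz, dtype=float)
    radii = np.asarray(radii, dtype=float)
    r2 = np.sum((xyz[:, None, :] - xyz[None, :, :]) ** 2, axis=-1)
    rr = radii[:, None] * radii[None, :]
    f_gb = np.sqrt(r2 + rr * np.exp(-r2 / (4.0 * rr)))
    # Full i, j double sum with a -1/2 prefactor: Born self terms plus each pair once.
    return -0.5 * (1.0 / eps_in - 1.0 / eps_out) * COULOMB * np.sum(np.outer(q, q) / f_gb)
```

With ε(r) = 4r, the screening matches that of a uniform dielectric of 80 exactly at r = 20 Å and is weaker at shorter distances; for a single isolated ion, `gb_polar_energy` reduces to the Born solvation energy.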
There are no standard force field parameters for doubly deprotonated phosphoserine. Thus, to obtain a force field model, the atomic charges of phosphoserine were determined using the Amber charge-fitting method (RESP) (19) at the HF/6-31G* level of theory. The bonded interactions were taken from comparable standard parameters of Cornell et al. (17). After building initial models of each peptide, we simulated their dynamics at 350 K for at least 3 µs.
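For readers unfamiliar with Brownian dynamics, the core update is the Ermak-McCammon step: a deterministic drift proportional to the force plus Gaussian noise of variance 2DΔt. The following is a minimal Python sketch under stated assumptions (no bond constraints, no adaptive time step, no hydrodynamic interactions, hypothetical names and units); it is not UHBD's integrator.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant, kcal/(mol K)

def bd_step(x, force, diffusion, dt, temperature, rng):
    """One Ermak-McCammon step without hydrodynamic interactions or constraints.

    x: positions (Angstrom); force(x): kcal/(mol Angstrom); diffusion: Angstrom^2/ps; dt: ps.
    """
    kt = KB_KCAL * temperature
    drift = (diffusion * dt / kt) * force(x)
    noise = rng.normal(0.0, np.sqrt(2.0 * diffusion * dt), size=np.shape(x))
    return x + drift + noise

# Usage: independent particles in a harmonic well U = 0.5 * k * x^2 (k = 1 kcal/mol/A^2)
# should equilibrate to var(x) = kT / k at 350 K.
rng = np.random.default_rng(0)
x = np.zeros(10000)
for _ in range(4000):
    x = bd_step(x, lambda y: -1.0 * y, diffusion=0.1, dt=0.01, temperature=350.0, rng=rng)
print(x.var(), KB_KCAL * 350.0)
```

The harmonic-well check is a quick way to confirm that drift and noise are balanced correctly: if the noise amplitude or the 1/kT factor were wrong, the sampled variance would not match kT/k.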
The landscape of an all-atom model with effective solvation terms can be adequately sampled for small peptides. It is difficult, however, to sample well the landscape of a complete protein comprising hundreds of residues. Thus, to study tertiary structural changes, we need a simple polymer model that contains the essential chemical physics of the effect of phosphorylation on inter-residue interactions yet retains the minimal necessities of protein stereochemistry. We adopt an effective energy model with these virtues that has been studied by Wolynes and collaborators (14, 15). This so-called associative memory hamiltonian model (14) uses explicit dynamical variables for the locations of the Cα, O, and Cβ atoms of each residue (glycines have no Cβ). Implicit locations for the C and N atoms of the backbone follow from the geometrical constraints of the peptide backbone. To study phosphorylation, we modify the coarse-grained model using insight from the all-atom simulations and physical intuition. The AMH includes many protein structural details but is still a simplified model whose contact interactions involve only a few flavors. To extend it beyond the modeling of proteins without post-translational modifications, we model the phosphorylated serine straightforwardly as an extra "hypercharged" flavor.
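As an illustration of how implicit backbone atoms can follow from the explicit ones, this sketch places N and C as fixed linear combinations of neighboring Cα and O positions. The coefficients are those reported for AWSEM, a later descendant of the AMH; whether the AMH version used here used identical values is an assumption, and the function name is hypothetical.

```python
import numpy as np

# Coefficients as reported for AWSEM (assumed representative of the AMH here).
N_COEF = (0.48318, 0.70328, -0.18643)  # weights on Ca[i-1], Ca[i], O[i-1] for N[i]
C_COEF = (0.44365, 0.23520, 0.32115)   # weights on Ca[i], Ca[i+1], O[i] for C[i]

def implicit_backbone(ca, o):
    """Place N for residues 1..n-1 and C for residues 0..n-2 from Ca and O coordinates.

    ca, o: (n, 3) arrays of explicit Ca and O positions.
    """
    ca = np.asarray(ca, dtype=float)
    o = np.asarray(o, dtype=float)
    # ca[:-1] is residue i-1 (for N) or i (for C); ca[1:] is the next residue.
    n_atoms = N_COEF[0] * ca[:-1] + N_COEF[1] * ca[1:] + N_COEF[2] * o[:-1]
    c_atoms = C_COEF[0] * ca[:-1] + C_COEF[1] * ca[1:] + C_COEF[2] * o[:-1]
    return n_atoms, c_atoms
```

Because each set of weights sums to (almost exactly) 1, the reconstructed atoms move rigidly with the explicit ones under translation, as geometric constraints of a planar peptide unit require.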
The AMH, like many other coarse-grained protein models, splits the physical interactions into two parts, $H = H_{\mathrm{backbone}} + H_{\mathrm{contact}}$: backbone interactions and contact interactions. The first part accounts for the effects of generic polymeric properties. We could potentially use different Ramachandran terms, which assign the preferred backbone torsional angles φ (C-N-Cα-C) and ψ (N-Cα-C-N), based on all-atom simulation results. This Ramachandran term is important for inducing the protein to form the correct secondary structure. The original AMH uses a residue-independent Ramachandran potential $U_R(\phi, \psi)$, written as a sum of preferred Gaussian wells. In the present incarnation of the AMH, we treat both glycine and proline specifically with their own torsional angle potentials.
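A residue-independent Ramachandran term of this kind can be sketched as minus a weighted sum of wells in (φ, ψ). The functional form below (a periodic, von Mises-type well that behaves like a Gaussian near its center) and all well parameters are assumptions for illustration, not the AMH parameterization.

```python
import math

# Hypothetical wells: (weight, phi0, psi0, kappa); angles in degrees, values illustrative.
WELLS = [
    (1.0, -60.0, -45.0, 3.0),   # right-handed alpha-helix basin
    (1.0, -120.0, 130.0, 3.0),  # extended beta-strand basin
]

def rama_energy(phi, psi, wells=WELLS, lam=1.0):
    """Residue-independent Ramachandran term: minus a weighted sum of periodic wells.

    Each well is exp(-kappa * [(1 - cos dphi) + (1 - cos dpsi)]), which is
    Gaussian-like near its center and periodic with a 360-degree period.
    """
    total = 0.0
    for weight, phi0, psi0, kappa in wells:
        dphi = 1.0 - math.cos(math.radians(phi - phi0))
        dpsi = 1.0 - math.cos(math.radians(psi - psi0))
        total += weight * math.exp(-kappa * (dphi + dpsi))
    return -lam * total
```

Using cosines of the angle differences keeps the energy periodic, so φ = -60° and φ = 300° are treated identically; residue-specific glycine and proline terms would simply use their own well lists.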
For interactions between nonbonded residues, the AMH has a summation of pairwise contact terms (short, medium, and long range): $H_c = H_s + H_m + H_l$. There is a strength parameter of the form $\gamma(P_i, P_j, P_k, P_l)$ for each short- and medium-range (in sequence) contact, where $P_k$ and $P_l$ refer to residues from "memory proteins" that contain possibly matching supersecondary structure motifs (indices $i, j$ label target residues and $k, l$ memory residues identified by the alignments), and a simple three-well contact term $\gamma(P_i, P_j, n)$ for long-range contacts. In the usual AMH model, residues are designated by a code encoding four flavors: acidic, basic, polar, and hydrophobic. Since there is little database information on how a phosphoserine interacts with these four flavors, we used physical intuition to build a fifth flavor for phosphoserine, based on the idea that it is a "hypercharged" glutamate. A somewhat ad hoc parameter $m$ controls the value of the hypercharge. More specifically, assuming $i$ is a phosphoserine residue, the new flavor has a normal charge when any interacting residue ($j$, $k$, or $l$) is hydrophobic; the hypercharge is $m$ when it encounters any other polar residue; and the hypercharge is $m^2$ for all interactions between a phosphoserine and other charged or hypercharged residues. For example, when $m = 1.2$, the long-range contact energy between Zer (phosphoserine) and Thr is 1.2-fold the energy between Glu and Thr, i.e., $\gamma_{\text{Zer-Thr}} = 1.2\,\gamma_{\text{Glu-Thr}}$. Similarly, $\gamma_{\text{Zer-Zer}}/\gamma_{\text{Glu-Glu}} = 1.2^4 \approx 2.07$, $\gamma_{\text{Zer-Lys}}/\gamma_{\text{Glu-Lys}} = 1.2^2 = 1.44$, and $\gamma_{\text{Zer-Phe}}/\gamma_{\text{Glu-Phe}} = 1$.
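The scaling rule can be written as a small function. This Python sketch assumes that each phosphoserine in a pair contributes its own factor (1 for a hydrophobic partner, m for a polar partner, m² for a charged or hypercharged partner), which reproduces the example ratios; the flavor labels and function names are hypothetical.

```python
FLAVORS = {"acidic", "basic", "polar", "hydrophobic", "hypercharged"}
CHARGED = {"acidic", "basic", "hypercharged"}

def _one_side(partner, m):
    """Factor contributed by one phosphoserine, given the flavor of its partner."""
    if partner == "hydrophobic":
        return 1.0
    if partner in CHARGED:
        return m * m
    return m  # polar partner

def hypercharge_scale(flavor_a, flavor_b, m):
    """Ratio gamma(pair with Zer) / gamma(same pair with Glu in place of each Zer)."""
    for f in (flavor_a, flavor_b):
        if f not in FLAVORS:
            raise ValueError(f"unknown flavor: {f}")
    scale = 1.0
    if flavor_a == "hypercharged":
        scale *= _one_side(flavor_b, m)
    if flavor_b == "hypercharged":
        scale *= _one_side(flavor_a, m)
    return scale

print(hypercharge_scale("hypercharged", "polar", 1.2))
```

Under this reading, a Zer-Zer pair picks up m² from each side (m⁴ overall), and setting m = 1 leaves every γ equal to its glutamate value, i.e., the all-glutamate mutant.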
In actual runs, we tested multiple values of $m$. Setting $m = 1$ recovers the case of mutants in which the selected serines are changed to glutamates. Similarly, we modify the initial $10 \times 3$ long-range $\gamma$'s. We also studied other ways of setting the phosphoserine parameters, such as $\gamma(\text{phosphoserine}) = \gamma(\text{acidic}) + m\,[\gamma(\text{acidic}) - \gamma(\text{polar})]$. This assignment gives results similar to those of the first setup; the qualitative results are not sensitive to these admittedly ad hoc modifications of the usual AMH model. Below we report the results using the first assignment of interactions.

Using AMH dynamics, we simulate the 170-residue chain (residues 163 to 332: YRE... PDPT) that covers most of the regulatory domain and contains all 13 changeable serines. The nuclear form of the protein carries charges of $(3+9)\times(-e)$ (from E and D), $(3+10)\times(+e)$ (from K and R), and one permanent phosphoserine ($-2e$). Thus, the nuclear form is nearly neutral, with a total charge of $-1e$, if we assume all the histidines are singly protonated. When 13 more serines are converted to phosphoserines in the cytoplasmic form, the molecule becomes highly charged, with a total charge of $-27e$. This form can be called a strongly charged polyampholyte. For convenience, we will call the cytoplasmic form of this 170-mer "nfatZ," while the nuclear form, having only one phosphorylated serine, will be called "nfatS."
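The net-charge bookkeeping is simple enough to check directly. This sketch uses only the residue counts given in the text; the function name is hypothetical and, as in the text, histidines are treated as neutral (singly protonated).

```python
def net_charge(n_acidic, n_basic, n_phosphoserine):
    """Net charge in units of e: Asp/Glu -1, Lys/Arg +1, doubly deprotonated pSer -2.

    Histidines are not counted, i.e., treated as neutral (singly protonated).
    """
    return -n_acidic + n_basic - 2 * n_phosphoserine

# Counts quoted in the text for residues 163-332.
nfatS = net_charge(3 + 9, 3 + 10, 1)       # nuclear form: one permanent phosphoserine
nfatZ = net_charge(3 + 9, 3 + 10, 1 + 13)  # cytoplasmic form: 13 additional phosphoserines
print(nfatS, nfatZ)  # -1 -27
```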
Following the usual AMH methodology, memory sets were obtained using a sequence-structure alignment (20). Each memory set is composed of the top 50 proteins ranked by Z score (15). Interestingly, although no structural homologues of this domain are known, the top aligned structures found by this search are 1ALU, 1A7M, 1LKI, etc. These are all cytokines belonging to the four-helical cytokine superfamily (by SCOP classification). One of the high-scoring proteins, 1ALU (human interleukin-6), is directly related to NFAT activation. Even with a database of several thousand proteins containing both pure α and α/β proteins, these same cytokine-like structures still come to the top as the best alignments by sequence-structure matching, as judged by Z scores.
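A minimal sketch of ranking candidate memory structures by Z score and keeping the top 50 follows. It assumes each candidate's Z score is computed from its alignment score relative to a background distribution (for example, scores of shuffled sequences), with higher scores being better; the actual scoring of the sequence-structure alignment in ref 20 is not specified here, and the names are hypothetical.

```python
import statistics

def z_score(score, background):
    """Z = (score - mean(background)) / stdev(background); higher is more significant."""
    mu = statistics.mean(background)
    sd = statistics.stdev(background)
    return (score - mu) / sd

def select_memories(candidates, n=50):
    """Return the IDs of the n candidates with the highest Z scores.

    candidates: dict mapping structure ID -> (alignment score, list of background scores).
    """
    ranked = sorted(candidates, key=lambda k: z_score(*candidates[k]), reverse=True)
    return ranked[:n]
```

Ranking by Z score rather than raw score compensates for scores that scale with protein length or composition, which is why unrelated but well-matching folds (here the four-helical cytokines) can rise to the top.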
| RESULTS AND DISCUSSIONS |
|---|
Our study indicates that, for the three nonproline cases, phosphorylation stabilizes α-helical conformations. This conclusion was also reached by an earlier study focusing on other peptides (13). One difference between the present and earlier results is the magnitude of the effect. The earlier study suggested that the maximum probability density moved completely from β-strand conformations to α-helical ones (13). With GB or DDD solvation, we predict more subtle effects of phosphorylation (around 0.5-1.5 kcal/mol). Several factors may cause the difference between the studies: different sequences, force fields, etc. However, both studies reach the same qualitative conclusion. As an example, Fig. 2 shows the changes in the Ramachandran plot of the peptide TSPI (DDD solvation).
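To put free-energy shifts of 0.5-1.5 kcal/mol in perspective, this sketch converts a free-energy preference into an equilibrium population ratio at the 350 K simulation temperature. It is the generic Boltzmann relation, not a step of the analysis described here; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def population_ratio(delta_g, temperature=350.0):
    """Equilibrium ratio p_A / p_B when state A is favored by delta_g = G_B - G_A (kcal/mol)."""
    return math.exp(delta_g / (R_KCAL * temperature))

for dg in (0.5, 1.0, 1.5):
    print(dg, round(population_ratio(dg), 2))
```

At 350 K, 0.5 and 1.5 kcal/mol correspond to population ratios of roughly 2 and 8.6, which is a real but partial shift rather than a complete switch between basins.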
We now focus on the effect of phosphorylation on proline isomerization. The double difference $\Delta\Delta G = (G_{z,c} - G_{z,t}) - (G_{s,c} - G_{s,t})$ measures the change in the relative stability of the cis and trans forms caused by phosphorylation. Our calculations give $E_{s,\mathrm{cis}} - E_{s,\mathrm{trans}} = 0.74 \pm 0.1$ kcal/mol and $E_{z,\mathrm{cis}} - E_{z,\mathrm{trans}} = 0.20 \pm 0.1$ kcal/mol. The corresponding entropy changes of the ω-angle degree of freedom are small, $\Delta S_s \approx \Delta S_z \approx 0.16\,k_B$, meaning that the cis ω bond is slightly more flexible whether or not it is phosphorylated. Our calculations give a trans:cis ratio of about 8.2:1 for the unphosphorylated species and 2.3:1 for the phosphorylated species at 350 K. Phosphorylation thus increases the cis population only modestly, by ~2-fold. The trend favoring the cis isomer more for phosphoSer-Pro is consistent with NMR results on phosphoThr-Pro (3) and phosphoSer-Pro (21). Clearly the effect is very subtle.
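The double difference can also be estimated directly from the quoted population ratios, using $G_c - G_t = k_BT\ln([t]/[c])$. This short Python check (function names hypothetical) uses only the trans:cis ratios at 350 K, not the separate energy and entropy terms.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def cis_minus_trans_g(trans_to_cis, temperature=350.0):
    """G_cis - G_trans (kcal/mol) from a population ratio [trans]/[cis]."""
    return R_KCAL * temperature * math.log(trans_to_cis)

def double_difference(ratio_z, ratio_s, temperature=350.0):
    """ddG = (G_zc - G_zt) - (G_sc - G_st); negative means phosphorylation favors cis."""
    return cis_minus_trans_g(ratio_z, temperature) - cis_minus_trans_g(ratio_s, temperature)

# Trans:cis ratios quoted in the text: 2.3:1 (phosphorylated), 8.2:1 (unphosphorylated).
ddg = double_difference(2.3, 8.2)
print(round(ddg, 3))
```

For these ratios the result is about -0.88 kcal/mol, i.e., phosphorylation stabilizes cis relative to trans by less than 1 kcal/mol.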
Based on these peptide calculations, phosphorylation does change secondary structure preferences, but not dramatically: phosphoserine slightly favors the more helical form. A proline-rich sequence can also form the left-handed polyproline type II (PPII) helical structure (22), at (φ, ψ) ≈ (−75°, +145°). Comparing the serine-proline peptide TZPI with the three nonproline cases we simulated suggests increased stability of the PPII region for the SP motif. The effects of phosphorylation of the SP motif on PPII stability are not conclusive, however: the GB result shows a slight increase of PPII population with phosphorylation, whereas the DDD results show the opposite trend.
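Basin populations such as the PPII fraction are typically estimated by binning sampled (φ, ψ) pairs. The rectangular boundaries below are hypothetical and chosen only to illustrate the bookkeeping; published basin definitions vary, and the function names are assumptions.

```python
from collections import Counter

# Hypothetical basins in degrees: (phi_min, phi_max, psi_min, psi_max); phi is half-open.
BASINS = {
    "alpha": (-160.0, -20.0, -120.0, 50.0),
    "PPII": (-100.0, -50.0, 100.0, 180.0),
    "beta": (-180.0, -100.0, 90.0, 180.0),
}

def classify(phi, psi):
    """Assign one (phi, psi) sample to a basin name, or 'other'."""
    for name, (p0, p1, s0, s1) in BASINS.items():
        if p0 <= phi < p1 and s0 <= psi <= s1:
            return name
    return "other"

def populations(samples):
    """Fraction of (phi, psi) samples falling in each basin."""
    counts = Counter(classify(phi, psi) for phi, psi in samples)
    total = len(samples)
    return {name: counts[name] / total for name in list(BASINS) + ["other"]}
```

Comparing `populations` for phosphorylated and unphosphorylated trajectories gives the basin shifts discussed above; because PPII and β basins are adjacent, the conclusions can depend on where the boundary is drawn.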
Tertiary structure of nfatS vs. nfatZ
To predict changes at the tertiary level, we performed 50 AMH annealing runs for nfatS and 100 runs for nfatZ (20 for each of m = 1, 1.1, 1.2, 1.3, and 1.4). The predicted folds for the cytoplasmic and nuclear forms are quite distinct. A striking feature of the ensemble of nfatS structures is that nearly all of the 13 changeable serines are physically close to one another. In nfatS, the NLS region lies far from these serines and remains highly exposed. Although nfatS is probably quite flexible and lacks a unique stable tertiary packing, the 25 annealing runs that began from random starting structures show a clear, well-defined consensus secondary structure. The N-terminal region (163-190) is almost always largely disordered; occasionally one finds a short β-strand at 182-184 or small helical segments here and there. Most runs show the region from 190 to 212 as the first large helical segment (light blue). This is followed by the second helix comprising residues 217-240 (dark green), the third helix (248-276, light green), the short fourth helix (285-292, yellow), and finally the C-terminal helix (298-325, red).
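One simple way to obtain a consensus secondary structure across annealing runs is a per-residue majority vote over secondary-structure strings, followed by reading off the helical segments. This is an illustrative sketch, not necessarily the procedure used here; the H/E/C labels assume a DSSP-like three-state assignment, the function names are hypothetical, and ties are broken by first occurrence (the documented behavior of `Counter.most_common`).

```python
from collections import Counter

def consensus(ss_strings):
    """Per-residue majority vote over equal-length secondary-structure strings."""
    length = len(ss_strings[0])
    return "".join(
        Counter(s[i] for s in ss_strings).most_common(1)[0][0] for i in range(length)
    )

def segments(ss, label="H", first_residue=1):
    """Inclusive (start, end) residue numbers of consecutive runs of `label`."""
    runs, start = [], None
    for i, c in enumerate(ss + "#"):  # sentinel closes a run at the chain end
        if c == label and start is None:
            start = i
        elif c != label and start is not None:
            runs.append((start + first_residue, i - 1 + first_residue))
            start = None
    return runs
```

For this fragment, `first_residue=163` would map string positions back to the protein numbering used above (residues 163-332).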
Sampled nfatS and nfatZ conformations are shown in Fig. 3 and Fig. 4, respectively. Our AMH results for nfatZ (the cytoplasmic form) indicate a more stable and rigid ensemble of folded helical bundles. One noticeable difference from nfatS is the increased ordering of the helical segment at the N terminus. More significant is the change in the spatial distribution of the 13 now-phosphorylated serines: the phosphoserines are no longer found close together but are instead spread out over the whole protein surface. This is consistent with the increased electrostatic repulsion expected between them once they are phosphorylated.
Do the nfatZ conformations protect the NLS region better than the nfatS configurations? The answer appears to be yes. Occasionally we find structures that completely hide the NLS through direct contacts with the phosphorylated residues 170, 173, 176, 179, and 182 in the SRR-1 motif (e.g., Fig. 4e). This probably arises from the electrostatic attraction between the positive charges of the NLS (sequence Lys-Arg-Arg) and the negatively charged phosphoserines. In most other sampled configurations, although there may be no direct contact with phosphoserines, the NLS remains hidden by newly formed tight tertiary contacts between helices.
The annealed structures of the nuclear form nfatS compose an ensemble with many globular forms. They are more flexible and remain open around the NLS region. For the nuclear form one finds no tight helical bundles masking the NLS region.
To test the robustness of this structural ensemble, we simulated nfatZ with m = 1 to 1.4. The hamiltonian with m = 1 corresponds to modeling each phosphorylated residue as a singly charged glutamate. This is basically a normally encoded protein in which the 13 serines are replaced with glutamates; we call this extensive mutant "nfatE." We saw little change among the nfatZ structures found with m = 1.0 to m = 1.2, but for m = 1.4 the structures begin to pack less well than for m = 1.
To quantify these observations, we studied the statistics of the contact maps formed by these structures. A contact between two residues is assumed to be formed if any of their heavy atoms are closer than 5 Å. We count only non-neighboring contacts, i.e., those for which the residue indices differ by more than four. Note that we apply principal component analysis (PCA) (23) to the binary contact degrees of freedom from a combined set of nfatS and nfatZ snapshots; this is not the usual PCA based on Cartesian coordinates. To facilitate the analyses, we further coarse-grained the contacts by grouping neighboring residues into blocks of four, i.e., a coarse-grained contact matrix is calculated for each structure, with each of its 43 × (43 − 1)/2 = 903 independent elements being either 0 or 1. The first thing to note is that, by these definitions, the mean contact number of the NLS region increases by 80% for the nfatZ form compared with the nfatS form. We then calculated and diagonalized the 903 × 903 covariance matrix of these degrees of freedom.
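The contact definition and coarse-graining can be sketched as follows. The 5 Å heavy-atom cutoff and the |i − j| > 4 rule come from the text; the rule that a block pair counts as a contact if any residue pair inside it is in contact is an assumption, as are the function names. For the 170-residue chain, blocks of four give 43 blocks and 903 upper-triangular features.

```python
import numpy as np

def residue_contacts(heavy_atoms, cutoff=5.0, min_sep=5):
    """Binary residue contact map: any heavy-atom pair closer than `cutoff` (Angstrom),
    counting only residue pairs with |i - j| >= min_sep (i.e., more than four apart).

    heavy_atoms: list of (n_atoms_i, 3) arrays, one per residue.
    """
    n = len(heavy_atoms)
    contact = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + min_sep, n):
            d = np.linalg.norm(heavy_atoms[i][:, None, :] - heavy_atoms[j][None, :, :], axis=-1)
            if d.min() < cutoff:
                contact[i, j] = contact[j, i] = True
    return contact

def coarse_features(contact, block=4):
    """Coarse-grain into blocks of `block` residues (assumed rule: a block pair is 1 if any
    member pair is in contact) and return the strictly upper-triangular entries as 0/1."""
    n = contact.shape[0]
    nb = -(-n // block)  # ceiling division: 170 residues -> 43 blocks
    cg = np.zeros((nb, nb), dtype=np.int8)
    for bi in range(nb):
        for bj in range(bi + 1, nb):
            sub = contact[bi * block:(bi + 1) * block, bj * block:(bj + 1) * block]
            cg[bi, bj] = int(sub.any())
    return cg[np.triu_indices(nb, k=1)]
```

Stacking one feature vector per snapshot gives the data matrix whose 903 × 903 covariance matrix is diagonalized in the PCA step.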
Although the top principal components are not highly dominant (when their eigenvalues are compared with the trace), projecting the annealed structures onto the first two PCs reveals a clear difference between nfatS and nfatZ, as shown in Fig. 5. The corresponding two PCs, as well as the mean contacts for the S and Z forms, are shown in Fig. 6. Note that in Fig. 5 most nfatZ structures have a negative PC1 and most nfatS structures a positive PC1. Combining this with the fact that PC1 in Fig. 6 has strong blue (negative PC1) helix-bundle contacts (res. no. 80-105) in the middle, we conclude that upon phosphorylation stronger helix-helix contacts form that cover the NLS (res. no. 91-93 in the internal index).
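Given one binary contact vector per snapshot, the PCA step amounts to diagonalizing the feature covariance matrix and projecting the snapshots onto its leading eigenvectors. In this NumPy sketch (names hypothetical), `evals[0] / evals.sum()` is the eigenvalue-over-trace measure of dominance; the sign of each eigenvector is arbitrary, so "negative PC1" is meaningful only relative to a fixed choice of eigenvector.

```python
import numpy as np

def contact_pca(features, n_components=2):
    """PCA of binary contact features (rows = snapshots) via the covariance matrix.

    Returns (projections, components, eigenvalues), eigenvalues in descending order.
    """
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=0)
    cov = np.cov(centered, rowvar=False)  # (n_features, n_features), e.g. 903 x 903
    evals, evecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    comps = evecs[:, :n_components]
    return centered @ comps, comps, evals
```

Because the snapshots of both forms are pooled before centering, a PC that separates nfatS from nfatZ along its sign directly reflects the contacts that differ most between the two ensembles.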
Besides using the usual AMH potential with simple pairwise contact energies, we predicted structures using the enhanced associative memory model containing a water-mediated potential (AMW) (24). This model is a version of the AMH reparameterized to deal mostly with water-mediated interactions between hydrophobic residues. It is difficult to see how to scale the phosphorylated residue interaction in the AMW, since it already uses a 20-letter interaction code instead of a 4-letter one. Therefore, the AMW was used only for the glutamate mutant nfatE. The results of the nfatE-AMW runs are somewhat similar to those of nfatE-AMH: we still observe stiffened helix formation relative to the initially soft and less structured nfatS. One important difference is that nfatE-AMW is not packed as well as nfatE-AMH. This might indicate that stable folded structures of nfatZ (if they exist) do not follow the usual pattern of having hydrophobic cores. Instead, the structures may resemble the polyelectrolyte folding of systems such as tRNA.
The results reported use the AMH appropriate for helical proteins (25). We also investigated the possibilities present in the α/β model (26), but no β-sheet signals were found when aligning to a memory database with both α and β proteins. Based on this, and on the fact that the top aligned structures are helical proteins, we doubt there will be extensive β-sheets in the partially structured ensemble of NFAT.
Proline-rich sequences may form PPII structures. This left-handed helix has a circular dichroism (CD) signature (27) that could contaminate part of the α-helix pattern. This may be why no helical content was detected in the earlier CD study (12). It is also possible that, in the cell (or under the CD experimental conditions), the NFAT regulatory domain needs partners to form completely stable folded structures.
Overall, it appears more certain that the cytoplasmic form needs a well-defined structure to function, for example to hide the NLS until dephosphorylation occurs and NFAT enters the nucleus. On the other hand, since it is not clear whether an NES exists, the nuclear form may not need a single folded structure, so long as other domains of the protein, such as the DNA-binding domain, can function. Our results do show a more flexible globular protein for the nuclear form; the nuclear form may, however, be totally unfolded in vivo.
| CONCLUSIONS |
|---|
All-atom studies of peptides quantify the effects of phosphorylation on backbone conformations. Based on the GB and DDD results, phosphorylation slightly favors right-handed helical conformations, and the DDD results show more distinct changes than the GB results. We also studied the effect of phosphorylation on the relative stability of cis/trans conformations: phosphorylation slightly increases the cis fraction.
The more important effects of phosphorylation are nonlocal changes in the tertiary structure that arise from the increased charge. Our main hypothesis is that phosphoserine effectively acts as a supercharged glutamate and thus alters the effect of normal serine on tertiary interactions. As a consequence, our calculations show that, collectively, the 13 phosphoserines rearrange to hide the nuclear localization signal in the cytoplasmic form in two ways: 1) there are direct contacts between the positively charged NLS and the negatively charged phosphoserines in the SRR-1 region, and/or 2) phosphorylating the regulatory domain of NFAT shifts the conformation from a regular globular protein structure to a more rigid helical bundle form, which significantly changes the accessibility of the NLS region (i.e., hides the NLS between two helices).
Received for publication December 20, 2004. Accepted for publication April 27, 2005.