FJ
EXPRESS SUMMARY ARTICLE The full-length version of this article is also available, published online July 24, 2001 as doi:10.1096/fj.00-0889fje.
* Department of Microbiology, Biozentrum of the University, Basel, Switzerland;
Institute of Microbiology, Czech Academy of Sciences, Prague, Czech Republic;
Department of Biophysical Chemistry, Biozentrum of the University, Basel, Switzerland; and
Collegium Basilea, Institute of Advanced Study, Basel, Switzerland
2 Correspondence: Hochstrasse 51, CH-4053 Basel, Switzerland. E-mail: J.Ramsden@unibas.ch
SPECIFIC AIM
Having previously demonstrated that the simplified canonical law (scl), which relates protein synthesis rates to the rank of those rates in an ordered list, characterizes the quantitative state of gene expression in a prokaryote, we extend our work to monitoring the development of a bacterial culture, using protein expression data obtained by 2-dimensional (2D) electrophoresis. We had previously examined such data obtained from arbitrarily chosen epochs for cultures of several different microorganisms; here, our aim was to examine how the two characteristic parameters of the scl evolve during the natural development of a culture of a single organism, Streptomyces coelicolor.
PRINCIPAL FINDINGS
1. The scl can be fitted to the distribution of protein synthesis
rates throughout development
Protein synthesis rates were measured in S. coelicolor
in liquid culture (data from C. J. Thompson's laboratory at the
Biozentrum, Basel University). Samples were pulse radiolabeled with
³⁵S-met/cys at successive epochs, and the
proteins extracted and separated on 2D gels according to isoelectric
point (3 < pI < 8) and molecular mass
(15,000 < $M_r$ < 90,000). Autoradiographs
of the gels were scanned digitally to determine integrated spot
densities $I_r$, from which the normalized
rates of synthesis $p_r$ were obtained:

$$p_r = \frac{I_r}{\sum_{r'} I_{r'}} \qquad (1)$$
The $p_r$ were then ranked in decreasing order of
rate of synthesis. According to our previous work, these rates can be
predicted from their ranks (r) by using the scl:

$$p_r = P\,(r + \rho)^{-1/\theta} \qquad (2)$$

where θ and ρ are the parameters of the distribution.
P is simply a normalizing coefficient, chosen to ensure that

$$\sum_{r} p_r = 1 \qquad (3)$$

θ and ρ were
allowed to vary while fitting Eq. 2 to the data.
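The normalization, ranking, and fitting steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scl is taken to have the form p_r = P (r + ρ)^(-1/θ), the fit is a plain grid search minimizing the squared error between log rates (the fitting procedure actually used is not specified here), and the spot densities are synthetic values generated from known parameters rather than measured data. The function names are hypothetical.

```python
import math

def normalized_rates(densities):
    """Divide each integrated spot density by the total, then rank in decreasing order."""
    total = sum(densities)
    return sorted((d / total for d in densities), reverse=True)

def scl(n, theta, rho):
    """Assumed scl form p_r = P * (r + rho) ** (-1/theta); P makes the rates sum to 1."""
    raw = [(r + rho) ** (-1.0 / theta) for r in range(1, n + 1)]
    norm = sum(raw)
    return [x / norm for x in raw]

def fit_scl(rates, thetas, rhos):
    """Grid search for the (theta, rho) pair minimizing squared error between log rates.

    All rates must be strictly positive (spots with zero density must be excluded).
    """
    best = None
    for theta in thetas:
        for rho in rhos:
            model = scl(len(rates), theta, rho)
            err = sum((math.log(p) - math.log(m)) ** 2 for p, m in zip(rates, model))
            if best is None or err < best[0]:
                best = (err, theta, rho)
    return best[1], best[2]

# Synthetic spot densities generated from known parameters (not measured data).
true_theta, true_rho = 0.8, 3.0
densities = [1000.0 * p for p in scl(200, true_theta, true_rho)]

rates = normalized_rates(densities)
thetas = [0.5 + 0.05 * i for i in range(11)]  # candidate theta values, 0.50 to 1.00
rhos = [0.5 * i for i in range(13)]           # candidate rho values, 0.0 to 6.0
theta_hat, rho_hat = fit_scl(rates, thetas, rhos)
print(theta_hat, rho_hat)
```

Fitting in log space gives the weakly expressed proteins in the tail of the ranked list as much weight as the few dominant ones; a fit on the untransformed rates would be dominated by the top ranks. Which choice is appropriate depends on the noise in the spot densities.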
2. The growth hiatus associated with change in nutrient utilization
is accompanied by marked changes in the theta parameter (which
characterizes the extent to which the genome is used) and in the rho
parameter (which characterizes the extent of functional redundancy
among the expressed genes)
θ began at a fairly steady, moderately high value,
suddenly jumped up to a peak of almost 1, plunged to a deep minimum,
and then regained its previous value (as the culture became senescent,
θ slowly declined). ρ, on the other hand, remained fairly steady
during growth except for a sudden jump to a high peak, just when θ
sank to a trough, and almost equally abruptly fell back to its original
value (Fig. 2). These dramatic changes in θ and ρ coincided precisely with the
exhaustion of the initially exploited nutrient (maltose), when the
organism had to readapt to a different carbon source (glutamate).
3. The scl is a novel and revealing concept for exploiting
the explosion of data generated by advances in proteomics. Its
parameters provide a very compact description of the quantitative state
of the proteome
Two-dimensional gel electrophoresis has now advanced sufficiently
to allow the simultaneous quantification of almost all the expressed
proteins in a cell (the proteome), opening up extensive new
possibilities (in principle) for understanding the organization of
metabolism. Fitting the scl to a ranked list of protein synthesis rates
allows the state of gene expression to be characterized by just two
parameters with clear biological significance, and allows the
parameters of the distribution of synthesis rates to be interpreted
according to global features of gene expression.
CONCLUSIONS AND SIGNIFICANCE
The ultimate utility of the scl concept depends on being
able to assign biological significance to the parameters, and to
understand the underlying reason for the remarkably good fits of the
protein synthesis rates to this law. A vital clue came from comparing
the abrupt changes in θ and ρ with simultaneous measurements of the
overall growth rate of the culture, combined with principal component
analysis (PCA) of the expression pattern determined from the 2D
electrophoretograms.
For PCA, the spots were ranked according to image quality, and the best
128 spots (i.e., approximately 10% of the total) selected as a minimal
representative set. Each epoch is thus a point in 128D space, the
coordinates being given by the spot densities. To apply PCA, a much
smaller set (the principal components) of orthogonal linear
combinations of these original variables was found, which together incorporate
all the variance of the original data. The first three principal
component axes alone incorporate 56% of the variance of the whole data
set.
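The PCA step can be illustrated with a minimal NumPy sketch based on the singular value decomposition. The data here are random numbers standing in for measured spot densities, and the number of epochs (15) is a hypothetical value chosen for illustration; only the 128 spots per epoch are taken from the text. The function name `pca` is likewise an assumption, not the software used for the analysis.

```python
import numpy as np

def pca(data):
    """PCA via SVD. data has shape (n_epochs, n_spots): one row of spot densities per epoch."""
    centered = data - data.mean(axis=0)        # subtract the mean density of each spot
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                             # coordinates of each epoch on the principal axes
    explained = s**2 / np.sum(s**2)            # fraction of total variance borne by each axis
    return scores, vt, explained

rng = np.random.default_rng(0)
n_epochs, n_spots = 15, 128                    # 15 epochs is an assumed, illustrative count
data = rng.random((n_epochs, n_spots))         # random stand-in for measured densities

scores, axes, explained = pca(data)
print("variance on first three axes:", explained[:3].sum())
```

Because the data are centered, n epochs can span at most n - 1 directions, so a handful of components carries all the variance even though each epoch is a point in 128-dimensional space. The scores in the first column are the values along PC1 that would be followed from epoch to epoch.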
Four distinct consecutive growth phases could be identified. The values of the first principal axis (PC1, bearing 26% of the variance of the whole database) remain fairly constant during each phase, whereas the passages from one phase to the next one are accompanied by abrupt and marked changes. These four phases are:
1) Standard exponential growth;
2) The transition phase (cessation of growth), associated with exhaustion of the nutrition source exploited initially. On the second and third principal axes (PC2 and PC3), this phase is demarcated by points 8 and 9 (32-36 h growth), which exhibit exceptionally high values, whereas the values of the other points remain close to 0 and therefore have no significant influence on the variance borne by PC2. The extraordinarily high values at points 8 and 9 reflect marked changes in the distribution of proteins synthesized during this phase of development;
3) The second (post-transition) phase of exponential growth;
4) Onset of the stationary phase; growth of the culture is arrested.
Comparison of the plain growth rate data, together with the far more
sophisticated PCA, with the scl parameters clearly indicates
a close correlation between the peaks and troughs in the evolution of
θ and ρ and the growth phases. The transition phase associated with
the exhaustion of nutrient is anticipated by the peak in θ, and the
transition phase proper corresponds to the trough in θ and the peak
in ρ.
Our application of the scl (Eq. 2)
is a generalization of a
result originally derived in the context of communication theory. It
was shown that when a message is transmitted word by word, the scl is
the distribution of word usage frequency that minimizes the overall
cost of transmitting a message containing a given quantity of
information.
Distributions of word usage in many different languages, ranging
from Chinese and Latin to English, have been shown to follow the scl.
One of the motivations for the derivation of the scl was a large body
of prior data showing that word usage distributions could be
approximated by a simplified form of Eq. 2, known as Zipf's
law, in which θ is always 1 and ρ is always 0. On the one hand, it
is remarkable that a power law was found for so many languages,
suggesting some common fundamental mechanism; on the other hand, there
were clearly significant deviations between Zipf's law and the
data, which were then eliminated by allowing θ and ρ to take on
different values for each data set.
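The relation between Zipf's law and the scl can be written out explicitly. The lines below assume that the scl has the form p_r = P (r + ρ)^(-1/θ), with p_r the frequency of the item of rank r and P the normalizing coefficient; the symbol N for the number of ranked items is introduced here for illustration only.

```latex
% Assumed form of the scl
p_r = P\,(r + \rho)^{-1/\theta}

% Zipf's law as the special case \theta = 1, \rho = 0
p_r = \frac{P}{r}, \qquad P = \Bigl(\sum_{r=1}^{N} \frac{1}{r}\Bigr)^{-1}

% Logarithmic form of the scl
\log p_r = \log P - \frac{1}{\theta}\,\log(r + \rho)
```

On a log-log plot of frequency against rank, Zipf's law is therefore a straight line of slope -1. Allowing θ to differ from 1 changes the slope to -1/θ, and a nonzero ρ bends the line at low ranks, which is where two extra parameters can absorb deviations of the kind described above.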
The parallel between communication and the cell is that the genome is communicating information to the organism (i.e., specifying form and function) protein by protein while always operating under the overall constraint of minimizing energy consumption.
By analogy with the familiar thermodynamic temperature, θ is called
the informational temperature. It is a measure of resource
utilization: low values correspond to a state in which the most
strongly expressed proteins are almost the only ones present;
conversely, a value of 1 corresponds to maximum exploitation of the
available genetic resources.
ρ characterizes the degree of functional redundancy prevailing among
the most abundantly expressed proteins. High ρ corresponds to high
diversity in the sense of implying an absence of functional redundancy
(note that redundancy often plays a key role in maintaining the
stability of natural processes). This is exactly what would be expected
under the emergency condition in which the culture, having almost
starved itself to extinction, is suddenly able to start metabolizing a
hitherto unexploited food resource. There is an immediate and
imperative need to survive, and only those proteins that are absolutely
essential (and not already available) will be synthesized. There are
not enough nutritional resources to allow the cell the shrewd
luxury of creating backup pathways.
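How the two parameters shape the ranked distribution can be checked numerically. The sketch below again assumes the scl form p_r = P (r + ρ)^(-1/θ); the number of proteins (1000) and the parameter values are hypothetical, chosen only to make the contrast visible. It computes the share of total synthesis taken by the ten highest-ranked proteins for a low and a high θ, and the ratio between the first- and second-ranked rates for a low and a high ρ.

```python
def scl(n, theta, rho):
    """Assumed scl form p_r = P * (r + rho) ** (-1/theta), normalized to sum to 1."""
    raw = [(r + rho) ** (-1.0 / theta) for r in range(1, n + 1)]
    norm = sum(raw)
    return [x / norm for x in raw]

def top_share(rates, k):
    """Fraction of total synthesis accounted for by the k highest-ranked proteins."""
    return sum(rates[:k])

def head_ratio(rates):
    """Ratio of the rank-1 rate to the rank-2 rate."""
    return rates[0] / rates[1]

n = 1000  # hypothetical number of resolved proteins

# Effect of theta at fixed rho: concentration of synthesis in the top ranks.
share_low_theta = top_share(scl(n, theta=0.3, rho=0.0), 10)
share_high_theta = top_share(scl(n, theta=1.0, rho=0.0), 10)

# Effect of rho at fixed theta: how sharply the top rank stands out from the next.
ratio_low_rho = head_ratio(scl(n, theta=1.0, rho=0.0))
ratio_high_rho = head_ratio(scl(n, theta=1.0, rho=10.0))

print("top-10 share, theta=0.3 vs 1.0:", share_low_theta, share_high_theta)
print("p_1/p_2, rho=0 vs 10:", ratio_low_rho, ratio_high_rho)
```

With a low θ nearly all synthesis falls on the first few ranks, consistent with the description of θ as a measure of resource utilization. A high ρ makes the most strongly expressed proteins more nearly equal in rate; the biological reading of that flattening in terms of redundancy is the interpretation given in the text, not something the arithmetic itself establishes.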
The scl concept is rooted in a minimization (of energy) condition
operating under a constraint of transmission of a certain amount of
information in discrete (word) units. Our generalization of words to
include proteins is not merely fortuitous, but is actually a deep
analogy. In particular, the discrete (gene by gene, protein by protein)
way the genome encodes and transmits its message to its corporeal host
stands in sharp contrast to Shannon's approach to the transmission of
messages in which an entire message is optimized en bloc. A priori,
this seems improbable for cell metabolism: it would result in a far too
rigid response and be hopelessly inflexible in dealing with the
vicissitudes of a naturally fluctuating environment. We ascertained
that the Shannon entropy H of the protein distribution
$$H = -\sum_{r} p_r \log p_r \qquad (4)$$
Characterization of an emergency mode
After a new source of nutrition has been identified along with the
necessary metabolic machinery, θ drops abruptly, indicating strong
concentration of effort onto relatively few proteins. The cell is now
in emergency mode: a new food source has just been identified, and the
cell's utmost priority is to get its metabolism running again as fast
as it can before it starves. At the same time, ρ rises to
unprecedentedly high values, indicating the absence of redundant
backup metabolic pathways. Hence, the organism briefly lives in a
highly fragile state, during which it may be expected to be very
vulnerable to any sudden stress of a different nature from the
nutritional one it is in the process of overcoming. The new stress
would require yet another set of different enzymes for its
neutralization that the organism momentarily lacks the resources to
provide.
Our arguments are summarized in Fig. 3. Chip technology and
quantitative proteomics have advanced to the point where they can
provide a quantitative, time-resolved picture of gene expression
for large numbers of genes working in parallel. The scl condenses the
information about overall gene expression into a small number of
variables by which the state of the organism can be characterized.
The relative amounts of protein resolved on 2D electrophoretograms are fingerprints of protein biosynthesis (i.e., the quantitative state of gene expression) at a given epoch in development.
We show that the distribution of protein synthesis rates always follows the canonical law and the parameters depend only on the state of gene expression, independent of other variables such as cell type and type of organism. The parameters of the canonical law represent state variables of gene expression analogous to those known from thermodynamics.
Mathematical approaches using 2D electrophoresis data are, or are becoming, an integral part of the technology and make it possible to interpret the data provided beyond mere enumeration of all the expressed genes.
FOOTNOTES
1 To read the full text of this article, go to
http://www.fasebj.org/cgi/doi/10.1096/fj.00-0889fje; to cite this
article, use FASEB J. (July 24, 2001)
10.1096/fj.00-0889fje