
* Department of Human Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY 10021, USA; and
Department of Biochemistry, Obafemi Awolowo University, Ile-Ife, Nigeria
¹Correspondence: Department of Human Genetics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021, USA. E-mail: l-luzzatto{at}ski.mskcc.org
| ABSTRACT |
|---|
|
|
|---|
Key Words: G6PD deficiency; G6PD variants; housekeeping genes; human mutants; evolution
| INTRODUCTION |
|---|
|
|
|---|
| MATERIALS AND METHODS |
|---|
|
|
|---|
Alignment and sequence analysis
The sequences were aligned using the program PIMA 1.4 (12)
(Human Genome Center, Baylor College of Medicine,
Houston, Tex.: http://dot.imgen.bcm.tmc.edu:9331/multi-align.html). The
alignment was then refined manually. The percentages of identity and
similarity were calculated from the alignment using the program GeneDoc
(13). For aa similarity, we adopted the criteria of
Dayhoff's PAM 250 matrix (14): 1) DEQHN, 2) FY, 3) KR, 4) SAT, and
5) LIVM. For physicochemical grouping, we adopted the GeneDoc
criteria, in which aa are classified into twelve groups based on three
successive hierarchical levels: the first level is based on size, the
second on electrical charge for polar aa, and the third on
aromaticity for nonpolar aa (15, 16).
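The similarity criteria above can be illustrated with a short Python sketch. It is not the GeneDoc implementation: the function names, the gap rule, and the toy alignment are assumptions made for illustration, and only the five residue groups come from the text.

```python
# Similarity groups quoted in the text (Dayhoff PAM 250 criteria).
SIMILARITY_GROUPS = ["DEQHN", "FY", "KR", "SAT", "LIVM"]


def classify_column(column):
    """Return 'identical', 'similar', or 'variable' for one alignment column.

    `column` is a string holding one residue per aligned sequence.
    Gap handling is an assumption of this sketch: any gap ('-') makes
    the column 'variable'.
    """
    residues = set(column.upper())
    if "-" in residues:
        return "variable"
    if len(residues) == 1:
        return "identical"
    if any(residues <= set(group) for group in SIMILARITY_GROUPS):
        return "similar"
    return "variable"


def percent_identity(seq_a, seq_b):
    """Percent of aligned, non-gap positions at which two sequences match."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy alignment (hypothetical, not real G6PD sequences).
toy_alignment = ["GKDLV", "GRELI", "GKDFM"]
columns = ["".join(col) for col in zip(*toy_alignment)]
labels = [classify_column(col) for col in columns]
identity_ab = percent_identity(toy_alignment[0], toy_alignment[1])
```

In this toy case the column `LLF` is labeled variable because L and F do not share any of the five groups. The denominator used for percent identity (aligned non-gap positions) is one common convention and may differ from the one GeneDoc applies.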
Phylogenetic tree
The evolutionary tree of G6PD was derived from the alignment of
the 52 sequences. Distances between sequences were calculated from the
alignment by using PROTDIST (according to Dayhoff's PAM 250 matrix;
ref 14); the tree was then generated by using KITSCH and
drawn by using TREEVIEW (17). The programs PROTDIST and
KITSCH are part of the PHYLIP 3.57 package (18).
Amino acid solvent accessibility
The 3-dimensional structure of human G6PD was previously modeled
on the crystal structure of L. mesenteroides G6PD (19).
Specifically, residues 27-512 of human G6PD are
aligned to residues 1-486 of L. mesenteroides. The
coordinates of this model were used to calculate the solvent
accessibility (SA) of individual aa residues in the monomeric structure
of human G6PD using the Swiss-PdbViewer program (20)
(http://www.expasy.ch/spdbv/mainpage.htm). Amino acids with SA <
10% are regarded as buried and amino acids with SA ≥ 10% are
regarded as exposed (21).
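The buried/exposed rule can be written as a one-line classifier. The sketch below assumes that SA values are already available as percentages per residue; the residue numbers and SA values are hypothetical and are not taken from the G6PD model.

```python
BURIED_THRESHOLD = 10.0  # percent SA; threshold stated in the text


def classify_accessibility(sa_percent):
    """Label a residue 'buried' (SA < 10%) or 'exposed' (SA >= 10%)."""
    return "buried" if sa_percent < BURIED_THRESHOLD else "exposed"


# Hypothetical per-residue SA values (residue number -> percent SA).
toy_sa = {38: 2.5, 39: 10.0, 40: 47.3}
sa_labels = {pos: classify_accessibility(sa) for pos, sa in toy_sa.items()}
```

A residue sitting exactly at 10% falls in the exposed class, which follows from the two complementary thresholds.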
Human mutations
One hundred twenty-two mutations or combinations of mutations of
human G6PD have recently been tabulated (22): 114 variants
have missense mutations; of these, 106 have a single mutation, 7 have
two mutations, and 1 has three mutations. Six variants have in-frame
deletions (four single aa deletions, one 2 aa deletion, and one 8 aa
deletion). One variant has a nonsense mutation and one has a splice
site mutation. We have made use of only 118 variants from this database
(Fig. 4) because no clinical data are available on the other four. Of
these 118 variants, 3 do not affect enzyme activity; all others entail
enzyme deficiency: of these, 57 are associated with chronic
nonspherocytic hemolytic anemia (WHO class I, severe phenotype) and 58
are associated with the risk of acute hemolytic anemia (WHO classes II
and III, mild phenotype). In studying the relationship between human
mutations and evolutionary conservation, we have confined the analysis
to the 99 single missense mutations and the 4 single amino acid
deletions.
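As a consistency check, the tallies quoted above can be added up in a few lines of Python. All numbers are taken from the text; the variable names are illustrative only.

```python
# Counts quoted in the text for the tabulated database (22).
missense = {"single": 106, "double": 7, "triple": 1}
deletions = {"1 aa": 4, "2 aa": 1, "8 aa": 1}
nonsense, splicing = 1, 1

total_missense = sum(missense.values())
total_deletions = sum(deletions.values())
total_variants = total_missense + total_deletions + nonsense + splicing

# Variants with clinical data, split by phenotype as stated in the text.
no_effect, class_i, class_ii_iii = 3, 57, 58
variants_used = no_effect + class_i + class_ii_iii
variants_excluded = total_variants - variants_used
```

The sums reproduce the 114 missense variants, 6 in-frame deletions, 122 tabulated variants, and 118 variants with clinical data, leaving the 4 that were excluded.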
|
| RESULTS |
|---|
|
|
|---|
94% identity (by
comparison, homology among mammalian hemoglobins is of the order of
70%); when these are compared with those of lower vertebrates,
invertebrates, and microorganisms, it is not surprising that the degree
of identity decreases but remains higher than 20% (Table 1).
|
|
In our retrieval of G6PD sequences, it was of special interest that no coding sequence recognizable as G6PD could be found in seven microorganisms whose genomes have been fully sequenced. Four of these (Archaeoglobus fulgidus, Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Pyrococcus horikoshii) are Archaea, which grow in an oxygen-poor habitat. The other three are Eubacteria with a small and defective genome, two of which (Mycoplasma genitalium and M. pneumoniae) are cell surface parasites, whereas one (Rickettsia prowazekii) is an obligate intracellular parasite that might be able to capitalize on the G6PD activity of the host cell.
The homology between G6PD sequences from Eubacteria and
Eukaryota clearly supports a common evolutionary origin of
G6PD throughout living organisms. Unlike tissue-specific genes such as
globins, a housekeeping gene such as G6PD allows us to outline
evolutionary relationships for most living organisms, the only
exception so far being the Archaea. A dendrogram based on
economy principles is consonant with taxonomy (Fig. 2), and some
specific points are noteworthy. 1) Metazoa,
fungi, and plants all separate early on from bacteria. 2)
Vertebrates separate cleanly from invertebrates. 3) The
chloroplast G6PD sequences of the plants separate early from the
cytosolic G6PD species of the plants and from G6PD of fungi and
metazoa. 4) Rather than being nearer metazoa, the G6PD
sequences of fungi seem to fall between chloroplast G6PD and cytosolic
G6PD.
|
Sequence conservation and 3-dimensional structure
In all of the 52 known G6PD species, there are 25 identical
aa and 56 similar aa. The total number of aa is 515 in human and
between 425 and 604 in most species. If we apply broader criteria of
similarity, whereby aa are classified into 12 groups sharing some
physicochemical properties, there are in fact 143 conserved aa. The
overall percent homology among G6PD sequences is of course only a rough
measure of evolutionary conservation, because homology is not uniform
throughout the sequence. The degree of homology tends to decrease
toward both the NH2 terminus and the carboxyl
terminus, perhaps because these regions are allowed greater flexibility
without prejudice to the overall conformation of the molecule.
Conserved positions are organized in blocks; we have identified 12 by
visual inspection (Fig. 1). Although we cannot yet fully explain the
significance of each conserved block, we can offer a rationale in most
cases. Block I is the NADP binding region (residues 38-47 of human
G6PD); it has a characteristic dinucleotide pocket (GXXGXX), flanked on
either side by additional conserved hydrophobic amino acids (e.g., IIM
35-37, P 50 and P 62, L 55, and L 61) and a KKK motif. Block IV
comprises the catalytic site, which surrounds a completely conserved
lysine; in human G6PD this is K 205, previously shown to be essential
for enzyme activity though not for substrate binding (25).
To understand the significance of other regions of high conservation,
we must consider the conformation of the protein. The crystal structure
has been fully solved for L. mesenteroides G6PD (26)
and human G6PD (27). The latter agrees
well with a previously published model (19); therefore, it
is reasonable to presume that the 3-dimensional structure is
essentially conserved. With reference to the human G6PD model
(19), we find that the conserved blocks III and V are
facing the active center (see above). Block X contributes much of the
subunit interface. Blocks II, VII, VIII, IX, and XI are part of the
hydrophobic core. The carboxyl-terminal ends of block IV and block XI
also contribute to the subunit interface. Blocks VI and XII include
amphipathic helices. To test the notion that surface accessibility
is significantly related to evolutionary conservation, we consider four
groups of aa based on similarity conservation (fully conserved, 100%;
highly conserved, 75-99%; moderately conserved, 50-74%; poorly
conserved, <50%). We find that buried aa residues are
over-represented and that exposed aa residues are under-represented
among those with a similarity higher than 75% (Fig. 3)
(χ² = 50; P<0.001).
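The grouping and the test statistic described above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are invented here, the treatment of non-integer percentages at the group boundaries is an assumption, and the contingency table is hypothetical because the underlying counts are not given in the text. The sketch therefore does not reproduce the reported χ² value.

```python
def conservation_group(similarity_percent):
    """Assign one of the four similarity conservation groups from the text.

    Boundaries for non-integer percentages (e.g. 74.5) are an assumption.
    """
    if similarity_percent >= 100:
        return "fully"
    if similarity_percent >= 75:
        return "highly"
    if similarity_percent >= 50:
        return "moderately"
    return "poorly"


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table.

    `table` is a list of rows of observed counts. Row or column totals
    of zero are not handled in this sketch.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts: rows = buried, exposed;
# columns = similarity >= 75%, similarity < 75%.
toy_table = [[20, 10], [10, 20]]
stat, dof = chi_square(toy_table)
```

Converting the statistic to a P value requires the chi-square distribution with `dof` degrees of freedom, which a statistics library or a printed table can supply.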
|
Mutations in human G6PD
We next analyzed the known mutations of human G6PD
(microevolution) in relation to the macroevolutionary conservation of
the protein sequence (Fig. 4). The majority of mutations causing G6PD
deficiency are missense, and
obvious null mutations (early nonsense mutations, mutations
destroying the reading frame, mutations in the substrate binding or
coenzyme binding regions) have never been observed. In addition, the
mutations in human G6PD are spread throughout the sequence. Only one
discrete cluster emerges, in the aa range 380-410, which corresponds to
the subunit interface in the enzymatically active G6PD dimer (19).
To assess how human mutations associated with G6PD deficiency relate in
general to the evolutionary history of the G6PD sequence, we refer
again to the four similarity conservation groups defined above and see
that a distinct pattern emerges (Table 2; Fig. 5). Fully conserved
amino acids and poorly conserved amino acids are
under-represented in G6PD-deficient mutants. By contrast, highly and
moderately conserved amino acids are over-represented in G6PD-deficient
mutants. This skewed distribution of mutations among the different
amino acid conservation groups is statistically significant
(χ² = 9.36; P<0.03).
|
|
In 84% (83/99) of cases, regardless of the degree of evolutionary conservation, aa replacements in human mutants do not respect similarity; in 68% (67/99) of cases, the aa introduced in the mutants are not found in any nonhuman G6PD (if they were, they probably would not cause G6PD deficiency). In support of this notion, we noticed that in 9 of the 16 cases (56%) where a human mutation does respect similarity, the mutated residue is normal in another species; but of the 83 cases where the human mutation does not respect similarity, there are only 23 (28%) where the mutated residue is normal in another species.
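The percentages in this paragraph follow directly from the quoted counts, as the short Python check below shows. The counts are those given in the text; the helper name `percent` is illustrative.

```python
def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


single_missense = 99
not_similar = 83                          # replacements that do not respect similarity
similar = single_missense - not_similar   # replacements that do respect similarity

pct_not_similar = percent(not_similar, single_missense)
pct_absent_elsewhere = percent(67, single_missense)
pct_similar_normal_elsewhere = percent(9, similar)
pct_not_similar_normal_elsewhere = percent(23, not_similar)
```

The two subgroups (16 and 83) add up to the 99 single missense mutations, so the 56% and 28% figures are computed on complementary sets.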
| DISCUSSION |
|---|
|
|
|---|
G6PD is not ubiquitous
Knowledge of the complete genome sequence of several
microorganisms has been one of the most significant advances in
genomics research in the past 5 years. A remarkable implication is that
for the first time it is possible to ask directly not only what genes
can be found in an organism, but also what genes are lacking. To our
surprise, we found no recognizable G6PD sequence in seven
microorganisms whose genomes had been fully sequenced. Three of
the microorganisms that lack G6PD are Eubacteria with small
and defective genomes and a parasitic habit (Mycoplasma
genitalium, M. pneumoniae, and Rickettsia prowazekii).
We surmise that all three might be able to capitalize on the G6PD
activity of the host cell. The other four microorganisms that lack G6PD
(Archaeoglobus fulgidus, Methanococcus
jannaschii, Methanobacterium thermoautotrophicum,
Pyrococcus horikoshii) are Archaea, which grow in
an oxygen-poor habitat; we surmise that they do not need G6PD because
they do not need to defend themselves against oxidative stress. These
evolutionary findings, while showing that life without G6PD is
possible, provide independent confirmation for the notion derived from
targeting the G6PD gene in mouse embryonic stem cells, namely, that in
the organisms that do have G6PD, its only indispensable function
consists not in pentose synthesis but rather in supplying reductive
potential in the form of NADPH (28). NADPH in turn serves
as a defense against oxidative stress as well as for the synthesis of
nitric oxide (29, 30).
Phylogenesis
The phylogenetic tree we obtained for G6PD clearly supports a
common evolutionary origin throughout living organisms and is consonant
with taxonomy (Fig. 2). Our findings on the evolution of the G6PD gene
are reminiscent of those that have been reported with other
housekeeping genes, most of which are ancient in evolutionary history
(31): namely, the degree of homology bears a broadly
inverse correlation to evolutionary distances, and certain critical
regions are nearly completely conserved. P. falciparum seems
to be at an evolutionary dead-end, as is often the case with parasitic
protozoa. The absence of G6PD in all the Archaea that have
been fully sequenced is intriguing, but not unique. Eight hundred
sixty-four clusters of orthologous groups (COG) have been defined on
the basis of consistent patterns of sequence similarities when
comparing protein sequences encoded in eight complete genomes from six
major phylogenetic lineages (32). Each COG consists of
individual proteins or groups of paralogs from at least three lineages.
Indeed, 26% of COG show the same phylogenetic pattern as G6PD
(presence in Bacteria and Eukarya, absence in
Archaea) (33). This phylogenetic pattern
suggests that the G6PD gene was lost in the common ancestor of
Archaea. Alternatively, the common ancestor of
Archaea and Eukarya did not have G6PD, and
Eukarya later acquired the G6PD gene by horizontal transfer from
Bacteria.
Evolutionary conservation and 3-dimensional structure
In general, one would expect in an enzyme that is a globular
protein that the active center and the hydrophobic core would show a
high degree of conservation (34), and we have validated
this principle in the case of G6PD: 64.2% of buried (but only 35.8%
of exposed) amino acid residues are highly or fully conserved (Fig. 1,
Fig. 3; P<0.001). In addition, we note that in a homodimer
the subunit interface is structurally special because each amino acid
within that region is represented twice, in symmetrical positions, on
the contact surface. As a result, any amino acid replacement within
this region will produce two nearby changes in the structure of the
protein. In fact, we have found that the subunit interface probably
accounts for the majority of the conserved blocks in the G6PD sequence
on a long-range evolutionary scale, suggesting that constraints against
sequence change are greater than average in that region.
Genetic variation in human G6PD
Comparing the distribution of mutations in a human gene with its
evolutionary history is a powerful tool for pinpointing the function of
domains and even of individual aa within a protein. To do this, it is
useful to consider the range of potential phenotypic (clinical)
consequences of different types of mutations in a housekeeping gene
such as G6PD. 1) Null mutations have never been observed. We
have obtained mice heterozygous for G6PD deficiency from G6PD null
embryonic stem cells (28), but no hemizygous
G6PD-deficient mice; we have recently found that the condition is
lethal at an early stage in embryonic development (35).
2) Mutations that drastically compromise substrate binding,
coenzyme binding, or the catalytic mechanism would be expected to
affect G6PD function in all cells to approximately the same extent, and
they ought to cluster in the respective regions of the sequence; no
such clusters are seen (Fig. 4). We presume that mutations with such
drastic effects may frequently be lethal. 3) Mutations that
cause instability of the G6PD protein molecule would be expected to
produce marked deficiency of G6PD in red cells (much more than in other
cells), because these cells have a long life span after they have lost
the capacity for protein synthesis (36). The majority of known
human G6PD-deficient variants fall into this category. 4)
The only cluster of mutations emerges between aa 380 and 410, and
corresponds to the subunit interface in the enzymatically active G6PD
dimer (19). Moreover, nearly all of the mutations in this
cluster cause a severe phenotype (class I); indeed, 34% of all class I
mutations fall within this 6% of the entire sequence. This highlights
the critical role of precisely fitting noncovalent interactions for the
formation and stability of the dimeric molecule.
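The "6% of the entire sequence" figure can be checked from numbers given in the paper: the cluster spans aa 380 to 410, and human G6PD has 515 aa. A short Python check, with illustrative variable names:

```python
cluster_start, cluster_end = 380, 410   # aa range of the mutation cluster
human_g6pd_length = 515                 # total aa in human G6PD, as stated in the paper

cluster_residues = cluster_end - cluster_start + 1   # inclusive range
cluster_percent = 100 * cluster_residues / human_g6pd_length
```

The inclusive range covers 31 residues, which is about 6% of the 515-residue sequence.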
The distribution of mutations in relationship to the evolutionary
history of the G6PD sequence shows a definite and peculiar pattern:
more than two-thirds of the aa replacements that cause G6PD deficiency
in humans are in highly and moderately conserved aa, whereas relatively
few are in fully or poorly conserved aa (Fig. 5, P<0.03).
Fully conserved amino acids are under-represented in G6PD-deficient
mutants, presumably because of a higher probability that their
replacement may be lethal. Poorly conserved amino acids are also
under-represented, presumably because in many cases their replacement
may not cause G6PD deficiency and therefore such mutations may go
undetected. In fact, of the 19 mutants in this group, 7 are in the
subunit interface region (see above) and 3 are in-frame deletions
rather than missense mutations; of the remainder, none has a severe
phenotype, confirming that replacements of these nonconserved amino
acids are generally well tolerated. By contrast, highly and moderately
conserved amino acids are over-represented in G6PD-deficient mutants;
indeed, two-thirds of these mutants affect such amino acid residues,
as though the consequence of their replacement is often serious enough
to cause instability but not sufficiently serious to be lethal. This
finding is a novel modification of the notion that mutations associated
with genetic defects tend to affect the amino acid residues that are
most conserved.
Human polymorphic mutations vs. human sporadic mutations
Extensive databases on human disease genes in several cases
comprise 100 or more mutations (37). However, G6PD is the
only case of a housekeeping gene in which many (about one-half; see
Table 2) of the known mutations are both potentially pathogenic and
polymorphic² as a result of malaria selection (38, 39).
Indeed, each of the mutant alleles concerned constitutes an example of
balanced polymorphism, whereby the deleterious phenotypic consequences
of the mutation are balanced by the resistance that it confers against
Plasmodium falciparum. By contrast, we can presume that each
of the sporadic G6PD mutants has remained sporadic because its
phenotypic consequences are too deleterious to be balanced by the
resistance it might confer against Plasmodium falciparum. If
we analyze separately the severe sporadic mutations and the mild
polymorphic mutations against the backdrop of macroevolution, we find
that the former are slightly more frequent in the 76-99% conservation
bracket and the latter are more frequent in the 50-75% conservation
bracket (see Table 2). However, with the number of mutants known thus
far, the difference is not yet significant.
| CONCLUSION |
|---|
|
|
|---|
| ACKNOWLEDGMENTS |
|---|
Note added in proof: In a publication that appeared after this paper was submitted, Cheng et al. [J. Biomed. Sci. (1999) 6, 106-114] have reported an independent analysis of human mutations in relation to amino acid conservation among 23 G6PD sequences from different organisms; some of their conclusions are in good agreement with ours.
| FOOTNOTES |
|---|
and the ß globin genes are known, although they are not as numerous as the G6PD mutants.
Received for publication February 28, 1999. Revised for publication September 16, 1999.
| REFERENCES |
|---|
|
|
|---|