
* Department of Microbiology, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, New Jersey 07103, USA; and
PharmaSeq, Inc., Monmouth Junction, New Jersey 08852, USA
1Correspondence: Department of Microbiology, New Jersey Medical School, University of Medicine & Dentistry of New Jersey, 185 South Orange Ave., Newark, NJ 07103, USA. E-mail: egoldman{at}umdnj.edu
| ABSTRACT |
|---|
Key Words: E. coli; protein synthesis; recoding; programmed translational frameshifts; readthrough of UGA codons
| INTRODUCTION |
|---|
In previous work, a large number of sequences from a random peptide
library were isolated by phage display technology during drug discovery
protocols and determined to contain non-open reading frame (non-ORF)
and frameshifted sequences, which were nevertheless apparently being
expressed (4, 5). However, it was unclear whether
expression of these sequences was a rare event.
The peptide library consisted of ~40 residue-long random peptides
encoded by NNK codons (N=A,C,G or T; K=G or T) of a synthetic gene on a
phagemid. The peptide sequences were flanked by the FLAG and the
epitope tag (E-tag) linear epitopes, and were fused to the minor coat
protein of filamentous phage (encoded by gene III).
The diversity of the library was 1.5 × 10^10
clones. Phage were prepared in an amber suppressor host; thus, UAG stop
codons were translated as glutamine. On occasion, seemingly as a result
of chemical modification in synthetic oligonucleotides used, deletions,
insertions, or point mutations were observed in the genes from phage
display. Such sequence changes at times resulted in TGA or TAA stop
codons.
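The NNK design described above can be checked by enumeration. The following is a minimal sketch, assuming the standard genetic code; it lists the codons permitted by NNK and shows that UAG (TAG) is the only stop codon the design itself allows, so that TGA or TAA can arise only through the sequence changes mentioned in the text. All variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: N = any base at positions 1 and 2, K = G or T at position 3.
nnk_codons = ["".join(c) for c in product(BASES, BASES, "GT")]

# Stop codons and amino acids reachable within the NNK scheme.
nnk_stops = sorted(c for c in nnk_codons if CODON_TABLE[c] == "*")
nnk_amino_acids = {CODON_TABLE[c] for c in nnk_codons} - {"*"}

print(len(nnk_codons))       # number of distinct NNK codons
print(nnk_stops)             # stop codons allowed by NNK
print(len(nnk_amino_acids))  # amino acids covered by NNK
```

In an amber suppressor host the single permitted stop (TAG) is read as glutamine, which is why an unmutated NNK insert is expected to be a continuous ORF.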
Among ~150 different clones that were sequenced and shown to express
peptides specific for different receptors (5), three
general types of DNA sequences were obtained. What was originally
expected was an ORF corresponding to the full length of the peptide and
(in this library) the E-tag that follows. This class of sequence was
observed only in ~50% of all sequences identified in biopanning as
binding to a target (4). Two other types of sequences,
qualitatively very different, were also observed. The second type of
sequence contained a nonsuppressed stop codon (i.e., TAA or TGA) within
the nucleic acid sequence encoding the peptide. The third type of
sequence contained a nonsuppressed stop codon and, in a different
reading frame, the E-tag sequence (4). In a few instances,
clones did not have a 0-frame stop codon but encoded the E-tag in
a different reading frame.
One of the sequences, named H10, was chosen for further study
(5). H10 was selected by phage display as producing a
peptide that bound to GHBP (growth hormone binding protein). The
sequence exhibited two in-frame TGA stop codons upstream of the
sequence encoding the E-tag epitope (and the fusion to M13 gene
III protein) in the same reading frame. A secondary peptide
library on filamentous phage, designed to average ~4 mutations per
sequence, was obtained by a doped mutagenesis procedure and again
subjected to biopanning against the GHBP target. A number of additional
clones were thus obtained, most of which exhibited frameshifts, both +1
and -1, in the placement of the E-tag sequence relative to the
translation start, as well as retention of the 5' proximal TGA stop
codon of parental H10. Because the 3' proximal third of the sequence
yielded almost no mutations after phage display selection, it was
possible to infer the amino acid sequence (and hence the reading frame)
of the portion of the peptide that bound to GHBP; this was proved by
independent synthesis of the putative peptide fragment, which did in
fact display the appropriate binding properties (5).
In the work described here, the H10 sequence and several of the
secondary derivatives have been subcloned into a reporter system and
tested for expression in E. coli. The results show that this
sequence facilitates high-frequency readthrough (or bypass) of UGA stop
codons, and with minor modifications, high-frequency +1 or -1
frameshifting. In most of the tested clones, expression was obtained in
two of the three reading frames, although it is possible that one of
these frames represents reinitiation at an internal AUG (6, 7).
We believe this phenomenon to be translational in origin,
because two of the clones in which the stop codons were altered to
sense codons primarily displayed translation only in the zero frame.
| MATERIALS AND METHODS |
|---|
Δ[lac-pro], supE, thi/F' lacIQ ZΔM15, proA+B+); this strain (8)
The vector into which the H10 derivatives were cloned was pJC27
(9), also supplied by Jim Curran, and used previously in
this laboratory (10, 11). This vector contains a
chloramphenicol resistance gene, a p15A origin of replication, and a
pseudo-wild type lacZ gene under control of the
lacUV5 promoter, with a HindIII site at
nucleotides 12–17 and a BamHI site at nucleotides 32–37
(where nucleotide 1 is the A of the initiating ATG codon of
lacZ).
Plasmids carrying the H10 sequence or its derivatives (5)
were obtained from DGI Biotechnologies (Edison, N.J.).
PCR amplification and cloning procedures
The following oligonucleotides were used to polymerase chain
reaction (PCR) amplify the H10 and related sequences from the host
phagemid, pCANTAB5E:
WM11.1: CCCAACAAGCTTCGACTACAAAGAC.
WM11.2 (frame 0): AACCAAGGATCCCCGGCGCACCTGC (25 nt).
WM11.3 (frame +1): AACCAAGGATCCACCGGCGCACCTGC (26 nt).
WM11.4 (frame -1): AACCAAGGATCCCGGCGCACCTGC (24 nt).
Control oligo: CGACTACAAAGACGCGGCCGCAGGTGCGCCGG.
To obtain a DNA fragment for the fusion to the β-galactosidase gene in the 0 reading frame, oligonucleotides WM11.1 (5'-end primer) and WM11.2 (3'-end primer) were used. Similarly, to obtain gene fusions in either the +1 or the -1 reading frame, oligonucleotides WM11.1 and either WM11.3 or WM11.4, respectively, were used.
The 5'-end primer facilitated introduction into the PCR product, after the 4th codon of lacZ, the FLAG sequence (GACTACAAAGAC), at the same time adding a HindIII (AAGCTT) site. The 3'-end primers added a part of the E-tag sequence (GCAGGTGCGCCG) in front of the lacZ gene in all three reading frames (relative to the E-tag frame), and at the same time introduced a BamHI (GGATCC) site. The control oligonucleotide was used to construct pJC27 derivatives that do not carry any H10 sequences for the purpose of establishing the expression levels without the H10 insertions.
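The relationship between the three 3'-end primers and the three reading frames can be made explicit with a short check. This is a sketch only: the primer, FLAG, E-tag, and restriction-site sequences are copied from the text above, while the helper names (`reverse_complement`, `spacer_length`) are illustrative. It measures how many nucleotides separate the BamHI site from the complement of the partial E-tag sequence in each 3'-end primer.

```python
FLAG = "GACTACAAAGAC"        # FLAG-encoding sequence quoted in the text
ETAG_PART = "GCAGGTGCGCCG"   # partial E-tag sequence quoted in the text
HINDIII, BAMHI = "AAGCTT", "GGATCC"

PRIMERS = {
    "WM11.1": "CCCAACAAGCTTCGACTACAAAGAC",
    "WM11.2": "AACCAAGGATCCCCGGCGCACCTGC",
    "WM11.3": "AACCAAGGATCCACCGGCGCACCTGC",
    "WM11.4": "AACCAAGGATCCCGGCGCACCTGC",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def spacer_length(primer: str) -> int:
    """Count nucleotides between the BamHI site and the E-tag complement."""
    start = primer.index(BAMHI) + len(BAMHI)
    end = primer.index(reverse_complement(ETAG_PART), start)
    return end - start


# Spacer length of each 3'-end primer relative to the frame-0 primer.
reference = spacer_length(PRIMERS["WM11.2"])
frame_offsets = {
    name: spacer_length(PRIMERS[name]) - reference
    for name in ("WM11.2", "WM11.3", "WM11.4")
}
print(frame_offsets)
```

The one-nucleotide difference in spacer length is what places the reporter in the 0, +1, or -1 frame relative to the E-tag, in line with the primer lengths of 25, 26, and 24 nt given above.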
The PCR products obtained were digested with HindIII and BamHI restriction enzymes, purified by gel electrophoresis, and ligated into precut (with HindIII and BamHI) plasmid pJC27.
All sequences used in this work were verified by DNA sequencing carried out by the New Jersey Medical School Molecular Resource Facility or by DGI Biotechnologies.
Site-directed mutagenesis
Codon 13 (TGA) of H10, 221, and 210 sequences was mutated
to TTA in a PCR reaction involving one of WM11.2, WM11.3, or WM11.4
primers and one of the following three primers:
H108T: CCCAACAAGCTTCGACTACAAAGACTTTCCGTTAGTGTGTTGGAGGGCG.
2218T: CCCAACAAGCTTCGACTACAAAGACTTTCCGTTAGTGTCTTGGAGGGGG.
2108T: CCCAACAAGCTTCGACTACAAAGACTTTCCGTTAGTGTGTTCGTGGCGC.
As a result, nine mutated variants were obtained. The PCR products were digested with HindIII and BamHI restriction enzymes, purified by gel electrophoresis, and ligated into precut (with HindIII and BamHI) plasmid pJC27.
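As a quick consistency check on the mutagenic primers, the sketch below locates the FLAG-encoding sequence in each primer and reads the third codon that follows it, which corresponds to codon 13 of the reporter construct. Primer sequences are copied from the list above; the function and variable names are illustrative. It also enumerates the primer pairings that give the nine variants.

```python
from itertools import product

FLAG = "GACTACAAAGAC"  # FLAG-encoding sequence quoted in the text

MUTAGENIC_PRIMERS = {
    "H108T": "CCCAACAAGCTTCGACTACAAAGACTTTCCGTTAGTGTGTTGGAGGGCG",
    "2218T": "CCCAACAAGCTTCGACTACAAAGACTTTCCGTTAGTGTCTTGGAGGGGG",
    "2108T": "CCCAACAAGCTTCGACTACAAAGACTTTCCGTTAGTGTGTTCGTGGCGC",
}
REVERSE_PRIMERS = ["WM11.2", "WM11.3", "WM11.4"]


def downstream_of_flag(primer: str) -> str:
    """Return the primer sequence that follows the FLAG-encoding sequence."""
    return primer[primer.index(FLAG) + len(FLAG):]


# Third codon after FLAG (codon 13) and the 8th nucleotide after FLAG.
codon_13 = {name: downstream_of_flag(p)[6:9] for name, p in MUTAGENIC_PRIMERS.items()}
eighth_nt = {name: downstream_of_flag(p)[7] for name, p in MUTAGENIC_PRIMERS.items()}

# Each mutagenic primer is paired with each 3'-end primer (three frames).
variants = list(product(MUTAGENIC_PRIMERS, REVERSE_PRIMERS))

print(codon_13)
print(len(variants))
```

Each primer carries TTA where the parental sequences carry TGA, and the changed base is the 8th nucleotide after the FLAG-encoding sequence.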
| RESULTS |
|---|
Three control derivatives of vector pJC27 were constructed
(Fig. 2) containing the sequences encoding the FLAG (AspTyrLysAsp) and partial
E-tag (AlaAlaAlaGlyAlaPro) epitopes, with the fusion to the reporter
following in each of the three reading frames. These controls
allowed us to monitor expression levels from the pJC27 plasmid itself,
without any H10-series inserts. In pJC27.1, β-galactosidase follows
the FLAG and partial E-tag sequences in the 0-frame (this construct
shows expression comparable to pJC27), whereas in pJC27.2 the reporter
is in the +1-frame relative to the translation start, and in pJC27.3,
the reporter is in the -1-frame. After 1 h of
isopropyl-β-D-thiogalactopyranoside induction, expression of
β-galactosidase in pJC27.1 was high, whereas pJC27.2 expressed the
reporter at 3% of that level, and pJC27.3 showed the same level of activity as
the lac deletion background, strain MY411 (Table 1, lines 1–4).
Bypass or readthrough of two UGA codons in clone H10
We constructed several clones containing the H10 sequence and its
derivatives 117, 221, and 210 (5) in plasmid pJC27 in
different reading frames with respect to β-galactosidase (Fig. 3).
The H10 sequence has two 0-frame TGAs at codons 13 and 34 in the
reporter construct, where codon 1 is the initiating ATG (Fig. 3). In
the original isolations of the segments, the partial E-tag epitope
(AAAGAP) delineates the reading frame of the fused M13 protein needed
for the phage display selection. This frame must be accessed at least
by codon 36 of the H10 sequence, because phage binding depends on
expression of the amino acid sequence LGCYFVAGVVACV (ref. 5; see Fig. 3).
Expression of β-galactosidase was high, 19% of control pJC27.1,
when this sequence was joined to the reporter in the 0-frame in clone
H10.1. There was no expression above background when this sequence was
joined to the reporter in the +1-frame (H10.2), but even higher
synthesis, 43% of the control, when the sequence was joined to the
reporter in the -1-frame (H10.3) (Table 1, lines 5–7).
One of the phage display-selected derivatives from H10, clone
117, had six nucleotide changes (Fig. 3), including changes in the two
TGA codons (to TGC [Cys] at codon 13 and AGA [Arg] at codon 34),
as well as a change from TGG (Trp) to TAG (amber, suppressed as Gln) at
codon 35. This derivative showed reporter synthesis essentially only in
the 0-frame (117.1), albeit at a reduced level of 37% compared to
pJC27.1, probably reflecting the efficiency of suppression of the UAG
at codon 35 (Table 1, lines 8–10). When normalized to the sum of
expression in all three frames, synthesis in the 0-frame was >96%.
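The normalization used here (expression in one frame divided by the summed expression in all three frames) can be sketched as follows. The example uses the H10 percentages quoted in the text (19% in the 0-frame, 43% in the -1-frame); setting the +1-frame value to zero is an assumption standing in for "no expression above background", not a value from Table 1. The function name is illustrative.

```python
def normalize_frames(expression: dict) -> dict:
    """Express each frame as a percentage of the sum over all frames."""
    total = sum(expression.values())
    if total <= 0:
        raise ValueError("total expression must be positive")
    return {frame: 100.0 * value / total for frame, value in expression.items()}


# H10 values as % of control pJC27.1; the +1 value of 0.0 is an assumption.
h10 = {"0": 19.0, "+1": 0.0, "-1": 43.0}
h10_normalized = normalize_frames(h10)

# Scaling every frame by the same factor leaves the normalized values unchanged.
h10_halved = {frame: value / 2 for frame, value in h10.items()}
h10_halved_normalized = normalize_frames(h10_halved)

print({frame: round(value, 1) for frame, value in h10_normalized.items()})
```

Because the measure is relative, a uniform drop in absolute expression across frames does not alter the normalized distribution; only a change in the balance between frames does.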
+1 and -1 frameshifting in clones 221 and 210
In the isolates from the secondary library of H10, clone 221
had the E-tag in the +1-frame relative to the translation start as a
result of insertion of a C residue into codon 31. This clone also had
five other nucleotide substitutions (Fig. 3). Because of the insertion,
the TGA at codon 34 was no longer in the 0-frame; however, the codon 13
TGA was still present. Significant expression of the reporter, almost
10% of pJC27.1, was obtained when β-galactosidase was joined in either
the +1 (221.1) or the 0 (221.3) frame relative to the translation
start (Table 1, lines 11–13).
In clone 210, the E-tag was in the -1-frame relative to the
translation start as a result of deletion of an A residue in codon 17.
This clone also had seven other nucleotide substitutions (Fig. 3).
Because of the deletion, the TGA at codon 34 was no longer in the
0-frame, but the TGA at codon 13 was still present. Significant
expression of the reporter, ~10% of pJC27.1, was obtained when
β-galactosidase was joined in the -1-frame (210.1), and ~20% when
the reporter was joined in the +1-frame (210.3), relative to the
translation start (Table 1, lines 14–16).
A graphical representation of the relative expression levels obtained
in all three reading frames for H10, 117, 221, and 210 is shown in
Fig. 4.
Site-directed mutants that eliminate the UGA at codon 13
We constructed a number of site-directed mutants of our H10 series
sequences in order to test possible involvement of the first TGA stop
codon in the unusual expression patterns. Clones H10, 221, and 210 all
contain a TGA stop at codon 13 in our reporter constructs. The G of the
TGA was changed to T (making a TTA leucine codon) by PCR (see Materials
and Methods). These derivatives are designated H108T, 2218T, and
2108T, respectively (reflecting the G→T transversion at the 8th
nucleotide after the FLAG-encoding sequence). As
with all the other sequences, these mutants were also prepared with the
reporter in all three reading frames. The sequences of the
site-directed mutants of 221 and 210 are shown in Fig. 5
(the parental H10 sequence is in Fig. 3; note that the TGA at codon 34
is still present in the 8T mutants of H10).
Abolishing the first TGA (of two) in H10 diminished expression of the
reporter by about half in both the 0-frame (H108T.1) and the
-1-frame (H108T.3), but expression was still substantial and the
normalized (i.e., relative) expression of each frame was unchanged
(Table 1, lines 17–19).
By contrast, abolishing the TGA at codon 13 of clone 221 essentially
prevented the +1 (2218T.1) frameshift while greatly increasing
0-frame (2218T.3) expression (Table 1, lines 20–22).
For clone 210, there was no reduction of β-galactosidase expression
by changing the codon 13 TGA to sense for the -1 (2108T.1) or the +1
(2108T.3) connections to the reporter. Further, there was only modest
expression for the construct where the reporter was joined in the
0-frame (2108T.2), even though no 0-frame stop codon was present
(Table 1, lines 23–25). These results were so unexpected that the
plasmids were resequenced; even though the sequences were verified,
construction of the 2108T series was also independently repeated.
Nevertheless, the same patterns of expression were obtained.
A graphical representation of the changes in relative expression
levels obtained in all three reading frames for the site-directed
mutants is shown in Fig. 6.
During the PCR cloning, one of the isolated colonies from the
construction of 2108T.3 was found to have a deletion of a T in codon
8 (2nd codon of the sequence encoding the FLAG epitope). This changed
the connection to the reporter into the 0 reading frame. Expression of
this variant, called 2108T.3(Δ-9T) (Fig. 5), was substantially
higher (Table 1, line 26) than the other 0-frame derivative of 210
(Table 1, line 24).
| DISCUSSION |
|---|
Potential RNA secondary structures
Subjecting the entire H10 sequence to an RNA folding program
results in a highly ordered structure with many long-range
interactions. However, this is not particularly specific: a scrambled
version of H10 with the same nucleotide composition also gave a highly
ordered structure that even showed some resemblance to the structure
generated for the native H10 sequence. A more informative approach
appeared to be to generate a structure based on the 51 nucleotides
encompassing the UGA at codon 13. This suggested a stem-loop with a
stability of about -5 kcal immediately downstream of the UGA
(Fig. 7), which is similar to the configuration of several known recoding
elements (e.g., 26–31). This putative stem-loop is
preserved in all three non-ORF constructs, H10, 221, and 210. Two
alternate stem-loops were suggested by the folding program for 221, but
structure a for 221 in Fig. 7 conformed to the structures obtained
for H10 and 210. This stem-loop in 221 showed the same primary and
secondary structure as H10 except for a compensating GC to CG base pair
in the stem. The stem-loop in 210 had the identical stem as in H10, but
two base changes and a one base deletion in the loop. The existence and
potential function of these putative structures remain to be
established.
Expression in two of the three reading frames
H10, 221, and 210 all showed reporter expression not only in the
E-tag frame (designated by the .1 suffix), consistent with the
phage display selections, but also in the frame -1 to the E-tag frame
(designated by the .3 suffix). Expression in this frame would not
have been detected in the phage display selections. The FLAG epitope at
the NH2 terminus of the phage-displayed peptides
is only recognized by antibody when it is positioned precisely at the
amino-terminal end. In our constructs, the FLAG sequence is after codon
6 in the fusion protein, and therefore cannot be assayed
immunologically.
Inspection of the H10 sequence reveals that the A of the codon 34
TGA is followed by TG, and that 6 nt upstream is the sequence GGAGG
(Fig. 3); in other words, the -1-frame harbors a potential AUG
initiation codon with an excellent Shine-Dalgarno sequence appropriately spaced
just upstream. This nucleotide sequence is preserved in both 221 and
210 (see Fig. 5), in both cases in the frame -1 to the E-tag frame. In
the case of clone 117, one of the mutations changed the G of the ATG to
an A (Fig. 3), severely reducing or eliminating this potential start
site. In the cases of 2218T.3 and 2108T.3(Δ-9T), both of which
show reporter expression in the 0-frame, initial and internal ATGs are
in the same reading frame. Thus, it seems plausible that expression in
frames -1 to the E-tag frame results from initiation or reinitiation
at the internal AUG.
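The kind of inspection described above, looking for an ATG preceded at a suitable distance by a Shine-Dalgarno-like GGAGG, can be sketched as a simple scan. Everything in this example is illustrative: the toy sequence is hypothetical (it is not the H10 sequence, which appears only in Fig. 3), and the 4 to 10 nt spacing window is an assumed search range rather than a value taken from this work.

```python
import re


def find_sd_aug(seq: str, sd: str = "GGAGG", min_gap: int = 4, max_gap: int = 10):
    """Find ATG codons lying min_gap..max_gap nt downstream of an SD motif.

    Returns (sd_start, atg_start, frame) tuples, where frame is the ATG
    position modulo 3 relative to the first nucleotide of seq.
    """
    hits = []
    # A lookahead pattern reports overlapping occurrences of the motif.
    for match in re.finditer(f"(?={re.escape(sd)})", seq):
        sd_end = match.start() + len(sd)
        for gap in range(min_gap, max_gap + 1):
            atg_start = sd_end + gap
            if seq[atg_start:atg_start + 3] == "ATG":
                hits.append((match.start(), atg_start, atg_start % 3))
    return hits


# Hypothetical toy sequence: a 0-frame TGA whose A begins an out-of-frame ATG,
# with GGAGG ending 6 nt upstream of that ATG.
toy = "ATGGCTTTTGGAGGCACTTGATGGCAGCA"
print(find_sd_aug(toy))
```

In the toy sequence the internal ATG overlaps a 0-frame TGA and sits in frame 2 (equivalent to -1) relative to the first ATG, which mirrors the arrangement the text describes for codon 34.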
This suggestion is further strengthened by other experiments in which
H10.1, H10.3, 221.1, 221.3, 210.1, and 210.3 were all placed
individually in an su- lac deletion host that does not
contain a lac repressor allele. In these conditions,
transcription from the lac promoter is constitutive, but
amber codons are not suppressed. Nevertheless, all of these constructs
did express β-galactosidase at about the same relative levels
compared to control pJC27.1 as is observed in Table 1 (data not
shown). Since there are in-frame TAG stops upstream of the putative
downstream ATG start in both H10.3 and 210.3, but none after the ATG,
these results also favor downstream initiation as the explanation for
expression in this frame.
By contrast, expression in the E-tag frame cannot be accounted
for by a downstream initiation. If this were the case, the start would
have to be within the reporter sequence; therefore, expression would
have been observed regardless of the reading frame in which the H10
fusions were attached. However, expression in the frame +1 to the E-tag
(designated by the .2 suffix) was always insignificant after the
H10 fusions (Table 1).
Further, we have made a preliminary attempt to determine the
amino-terminal amino acid sequence of some of the H10 fusions.
Sonicated lysates of induced cultures, in the presence of a mixture of
protease inhibitors, were passed over an APTG
(4-aminophenyl-β-D-thiogalactopyranoside) agarose (Boehringer
Mannheim, Mannheim, Germany) column (an affinity column for
β-galactosidase), and the eluate showed a significant degree of
purification of β-galactosidase as judged by staining sodium dodecyl
sulfate-polyacrylamide gels. The β-galactosidase fusion protein was
transferred to ProBlott PVDF-type membrane (Applied Biosystems, Foster
City, Calif.) by Western blotting and microsequencing was performed by
the New Jersey Medical School Molecular Resource Facility. The protein
isolated from cells with clone 221.1 (which shifts +1 to express
reporter) gave an amino-terminal sequence of VVACVAAAGAPG, which
matches exactly a sequence starting five residues upstream of the
partial E-tag through the residue following (see Figs. 3 and 5), in the
E-tag frame. This most likely represents a proteolytic product from
which the true NH2 terminus has been cleaved.
Recombinant hybrid proteins derived from fusions to β-galactosidase
are known to be highly unstable and subject to proteolytic degradation
(32). Although this result does not help us determine the
start or shift sites, it does demonstrate that a protein of about the
molecular weight of β-galactosidase that purifies on a
β-galactosidase affinity column expresses E-tag in a frame +1
relative to the ATG start in the DNA sequence. This is strong evidence
against a downstream initiation event as being responsible for
expression, at least for this +1 shifting clone.
Weak expression of one of the background control vectors
The small but significant level of expression of control vector
pJC27.2 (Table 1, line 2) was consistent even though the reporter was
in the +1-frame relative to the translation start. Since, in general,
expression of our fusion constructs with the reporter in the +1-frame
relative to the E-tag frame (designated by the .2 suffix) was not
significantly higher than the lac deletion host (Table 1),
this observation was of little concern for interpreting our
experiments.
One possible speculation about the origin of this weak frameshifted
expression comes from inspection of the pJC27.2 sequence (Fig. 2):
codons 19 and 20 in the 0-frame are CCC CGA. The CGA (Arg) codon is one
of the rarest in E. coli (33), and in fact is
used so infrequently that cells can tolerate missense suppressors of
this codon at the astonishingly high level of 8% efficiency
(34). Thus, it is possible that the CGA codon in the A
site of the ribosome facilitates a prolonged vacancy of this site,
allowing the peptidyl-tRNA^Pro in the P site to
slide to the right one base (still paired with a CCC triplet), shifting
to the reporter frame. This may be similar to the recoding sequence CCC
UGA, which shifts to the +1 frame at a frequency of 24%
(35). It is known in other systems that the extent of
vacancy of the A site facilitates +1 frameshifts (10, 11, 19, 20, 36, 37).
The window of opportunity for the +1 shift in pJC27.2 is
not extensive: the ribosome will encounter a UGA at codon 28 or a UAA
at codon 38 in the 0-frame.
| CONCLUDING REMARKS |
|---|
Experiments are under way to determine the amino acid sequence of
these constructs so as to determine where the frameshifts and/or hops
occur. Once we obtain these results, we will be in a position to
identify the minimal sequence required for the recoding, and we will be
able to use site-directed mutagenesis to identify essential nucleotides
and/or structures involved in the recoding. Pending those results, we
can only speculate as to what features in this unusual nucleotide
sequence lead to such striking expression patterns. The bias in the
design of the initial random peptide library, which was 67% G+T, may
be responsible for a greater degree of structure in the message than
usual, since G-U base pairs are permissible in RNA structures. Nascent
peptides are also known in some instances to affect ribosome movement
(3, 38). Whatever the mechanism turns out to be, this
unusual sequence has underscored again the remarkable flexibility of
the biosynthetic (most likely, translational) apparatus in implementing
gene expression.
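The 67% G+T figure follows directly from the NNK design, as the short calculation below illustrates. It is a sketch that assumes equimolar incorporation of the permitted bases at each codon position, which is an idealization of real oligonucleotide synthesis; names are illustrative.

```python
from fractions import Fraction

# Bases allowed at each NNK codon position (N = A/C/G/T, K = G/T).
NNK_POSITIONS = ["ACGT", "ACGT", "GT"]


def expected_fraction(bases_of_interest: str, positions=NNK_POSITIONS) -> Fraction:
    """Expected fraction of the given bases, assuming equimolar mixtures."""
    per_position = [
        Fraction(sum(base in bases_of_interest for base in allowed), len(allowed))
        for allowed in positions
    ]
    return sum(per_position) / len(positions)


gt_fraction = expected_fraction("GT")
print(gt_fraction, round(float(gt_fraction) * 100))
```

Positions 1 and 2 each contribute one half and position 3 contributes one, giving (1/2 + 1/2 + 1)/3 = 2/3, or about 67% G+T.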