FASEB J.
Published as doi: 10.1096/fj.06-7330com.
(The FASEB Journal. 2007;21:851-865.)
© 2007 FASEB

Nna1-like proteins are active metallocarboxypeptidases of a new and diverse M14 subfamily

Monica Rodriguez de la Vega*, Rafael G. Sevilla†, Antoni Hermoso*, Julia Lorenzo*, Sebastian Tanco*, Amalia Diez†, Lloyd D. Fricker‡, José M. Bautista†,1 and Francesc X. Avilés*,1

* Institut de Biotecnologia i de Biomedicina, Universitat Autonoma de Barcelona, Bellaterra (Barcelona), Spain;

† Department of Biochemistry and Molecular Biology IV, Universidad Complutense de Madrid, Facultad de Veterinaria, Ciudad Universitaria, Madrid, Spain; and

‡ Department of Molecular Pharmacology, Albert Einstein College of Medicine, Bronx, New York, USA

1Correspondence: F.X.A., Institut de Biotecnologia i de Biomedicina and Department Bioquimica i Biol. Mol., Universitat Autonoma de Barcelona, 08193 Bellaterra (Barcelona), Spain; E-mail: FrancescXavier.Aviles@uab.es; or J.M.B., Department of Biochemistry and Molecular Biology IV, Universidad Complutense de Madrid, Facultad de Veterinaria, Ciudad Universitaria, 28040 Madrid, Spain; E-mail: jmbau@vet.ucm.es


   ABSTRACT
Nna1 has some sequence similarity to metallocarboxypeptidases, but the biochemical characterization of Nna1 has not previously been reported. In this work we performed a detailed genomic scan and found >100 Nna1 homologues in bacteria, Protista, and Animalia, including several paralogs in most eukaryotic species. Phylogenetic analysis of the Nna1-like sequences demonstrates a major divergence between Nna1-like peptidases and the previously known metallocarboxypeptidase subfamilies: M14A, M14B, and M14C. Conformational modeling of representative Nna1-like proteins from a variety of species indicates an unusually open active site, a property that might facilitate its action on a wide variety of peptide and protein substrates. To test this, we expressed a recombinant form of one of the Nna1-like peptidases from Caenorhabditis elegans and demonstrated that this protein is a fully functional metallocarboxypeptidase that cleaves a range of C-terminal amino acids from synthetic peptides. The enzymatic activity is activated by ATP/ADP and salt-inactivated, and is preferentially inhibited by Z-Glu-Tyr dipeptide, which is without precedent in metallocarboxypeptidases and resembles tubulin carboxypeptidase functioning; this hypothesis is strongly reinforced by the results depicted in Kalinina et al., published as an accompanying paper in this journal (1). Our findings demonstrate that the M14 family of metallocarboxypeptidases is more complex and diverse than expected, and that Nna1-like peptidases are functional variants of such enzymes, representing a novel subfamily (we propose the name M14D) that contributes substantially to such diversity.

Rodriguez de la Vega, M., Sevilla, R. G., Hermoso, A., Lorenzo, J., Tanco, S., Diez, A., Fricker, L. D., Bautista, J. M., Avilés, F. X. Nna1-like proteins are active metallocarboxypeptidases of a new and diverse M14 subfamily.


Key Words: peptidase classification • CCP • tubulin processing • degradome • Tubulinyl-Tyr carboxypeptidase


   INTRODUCTION
METALLOCARBOXYPEPTIDASES (MCPs) are exopeptidases that catalyze the hydrolysis of C-terminal amino acids from their substrates. They are found in genomes from phyla of all five biological kingdoms and belong to Clan MC in the MEROPS database (2), which contains only one peptidase family, M14. The active site of M14 peptidases contains an essential catalytic zinc atom per molecule, which is penta-coordinated in a slightly distorted tetrahedral manner to two His, one Glu, and a water molecule (3). One of the histidines and the glutamate occur in the motif His-Xaa-Xaa-Glu-Xbb; the third zinc ligand is 103–143 residues C-terminal to this motif. Based on sequence conservation within the motifs containing the zinc ligands and around the catalytic residues, family M14 is grouped into three subfamilies: M14A, M14B, and M14C. Such divergence is confirmed by phylogenetic analyses. In subfamily A, the zinc ligands occur within the motifs His-Xaa-Arg-Glu-Xbb, in which Xaa is Ser or Ala, and Xbb is Trp or His; and Xaa-His-Xbb-Tyr-Ser-Xcc, in which Xaa is a hydrophobic residue, Xbb is Ser or Thr, and Xcc is Gln or Glu. In subfamily B, the motifs are His-Gly-Xaa-Glu-Xbb, in which Xaa is Asp or Asn and Xbb is uncharged; and Xaa-His-Gly-Gly-Xaa-Xbb, in which Xaa is any small amino acid, and Xbb is hydrophobic or Arg (4).

Members of the M14A subfamily contain a catalytic domain ~300 residues in length preceded by a prosegment of 90–100 residues at the N terminus. The precursor of the enzymes is either completely inactive or has greatly reduced enzyme activity relative to the form with the prosegment removed. Members of the M14B subfamily are not produced as inactive proenzymes; instead of the prosegment, these proteins contain a transthyretin-like subdomain at the C terminus of the catalytic domain. Members of this subfamily contain other domains and even repeats of the carboxypeptidase domain (3). The M14C subfamily comprises the bacterial orthologs of D-glutamyl-(L)-meso-diaminopimelate peptidase I (5).

Most M14A and M14B peptidases function either within the secretory pathway or after secretion from the cell (3). A cytosolic metallocarboxypeptidase functions in the removal of Tyr from the C terminus of α-tubulin (tubCP). Despite much effort, there is no sequence information for tubCP, and the gene (or genes) corresponding to this activity have not previously been reported (6). Two members of the M14B subfamily may be expressed in the cytosol and nucleus: the adipocyte enhancer binding protein-1 and CPD-N, but neither functions as the tubCP because both are reported to be specific for C-terminal basic residues (7, 8). Recently, a novel gene transcript related to MCPs was identified as being up-regulated in spinal cord of mice subjected to sciatic nerve transection or crush injury. After the initial rise following nerve crush, transcript levels decline in affected motor neurons, with a time course coincident with target reinnervation. This transcript was named Nna1 (nervous system nuclear protein induced by axotomy) (9) and was identified as the gene mutated in the classical recessive mouse mutant Purkinje cell degeneration (pcd) (10). Nna1 is also known as ATP/GTP binding protein (AGTPBP-1), and human related genes have been named ATP/GTP binding-like proteins: AGTPBP1, AGBL2, AGBL3, AGBL4, and AGBL5 [Human Genome Organization (HUGO) Gene Nomenclature Committee: http://www.gene.ucl.ac.uk/nomenclature/; human, mouse and rat degradomes: http://www.uniovi.es/degradome/]. Previous reports did not demonstrate carboxypeptidase activity for Nna1; however, a recent study found that some amino acids thought to be important for carboxypeptidase-like activity (based on structural modeling) are necessary for the biological function of this protein in Purkinje cells (11).

The large number of putative unclassified MCP sequences prompted us to perform a detailed scan of genomic, cDNA, and protein-based databases, giving rise to a structurally driven classification analysis for sequences carrying the M14 signature. This was completed with an extensive phylogenetic analysis to demonstrate that Nna1-like proteins represent a new M14 subfamily. The cDNA encoding a representative member of this new metallocarboxypeptidase group was expressed in bacteria and the enzyme properties were studied; these results show that Nna1-like proteins are active metallocarboxypeptidases. We also collected evidence about the ability of such new enzymes to act on tubulin, a fundamental protein of the cytoskeleton and an important pharmaceutical target. Overall, our study provides the necessary groundwork for further studies aimed at understanding the multiple physiological and pathological processes catalyzed by Nna1-like proteins, and stresses the importance of this new subfamily of metallocarboxypeptidases. A new name, cytosolic carboxypeptidases (CCPs), is proposed for members of this subfamily.


   MATERIALS AND METHODS
Database searches
The Protein Information Resource (PIR) nonredundant reference database (12) was scanned using the HMMER program (13) to detect metallocarboxypeptidase sequences using the peptidase_M14 PFAM model (14). Those sequences that matched the Hidden Markov Model profiles with an e-value of <1 × 10⁻¹⁰ were grouped as a candidate dataset. All sequences shorter than 230 aa were discarded. All candidate sequences in the dataset were aligned to one another using the blastp option of the blastall program in the basic local alignment search tool (BLAST) 2.2.x (15) with default parameters. BLAST scores for each sequence analysis were used to generate a lower triangular matrix of all members in the dataset. This matrix was used to generate a clustering tree with the UPGMA algorithm of the PHYLIP neighbor program (16). The novel putative metallocarboxypeptidase subfamily had already been defined as a clustered separate block of sequences at considerable BLAST-derived distance from well-known classic carboxypeptidases. JOY (17) structural alignment information used in the modeling procedure (explained below) was taken into account during this iterative process, which was stopped when no more promising sequences could be added to the working set.
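
The two selection criteria stated above (the e-value cutoff and the minimum sequence length) can be illustrated with a minimal sketch in Python. This is not the pipeline used in this work: the `Hit` record, its fields, and the `filter_candidates` function are hypothetical names introduced only for illustration, the example hits are invented, and the profile-HMM matches are assumed to have been parsed already.

```python
from typing import List, NamedTuple

class Hit(NamedTuple):
    """One profile-HMM match (hypothetical record; field names are assumptions)."""
    seq_id: str
    evalue: float
    length: int  # sequence length in amino acids

E_VALUE_CUTOFF = 1e-10  # matches must have an e-value below this threshold
MIN_LENGTH = 230        # sequences shorter than 230 aa are discarded

def filter_candidates(hits: List[Hit]) -> List[Hit]:
    """Keep hits with e-value < 1e-10 and a length of at least 230 aa."""
    return [h for h in hits
            if h.evalue < E_VALUE_CUTOFF and h.length >= MIN_LENGTH]

# Invented example hits, for illustration only.
hits = [
    Hit("seqA", 3e-25, 412),  # passes both criteria
    Hit("seqB", 5e-8, 350),   # fails the e-value cutoff
    Hit("seqC", 1e-40, 180),  # too short
]
kept = filter_candidates(hits)
print([h.seq_id for h in kept])
```

Applying the length filter after the e-value filter or before it gives the same candidate set, since the two conditions are independent.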

To expand the analysis, in those organisms where information from current genome projects could be extracted, working sequences were mapped to chromosomal locations using available annotation and/or by searching all of the currently sequenced genomes with BLAST and SSAHA (18) applications in ENSEMBL (19). Sequences with an incomplete carboxypeptidase domain or ones that appeared to represent alternative splicing variants were sorted out in order to reduce and refine the working set of proteins.

Comparative conformational modeling
Caenorhabditis elegans gene EEED8.6 encodes the C. elegans AGBL4 homologue (ceAGBL4). The amino acid sequence of ceAGBL4 was taken from the C. elegans ORFeome cloning project database (http://worfdb.dfci.harvard.edu/). The amino acid sequences and 3-dimensional (3-D) structures of pancreatic CPA from Bos taurus (2ctc) and Sus scrofa (1pca), pancreatic carboxypeptidase B (CPB) from Sus scrofa (1nsa), and CPT from Thermoactinomyces vulgaris (1obr) were obtained from the Protein Data Bank (www.rcsb.org/pdb/) and used as templates in comparative modeling.

Secondary structure predictions of ceAGBL4 were obtained using PSIPRED (20) and its carboxypeptidase domain (CP domain) limits were defined. The ceAGBL4 CP domain sequence was aligned to templates using BLAST and FUGUE (21). Key residues involved in substrate binding and catalysis were annotated with a secondary structure. Models were generated using MODELLER 8v1 (22, 23) and later evaluated for correctness of stereochemistry, energy distribution, and fold assessment quality with PROCHECK (24), VERIFY3D (25) and JOY. The process of modeling and manual realignment using GeneDoc (26) was repeated until models with good geometry and conformation were obtained. Root mean square (r.m.s.) deviation calculations of the modeled structures with respect to the crystallographic ones were obtained using COMPARER (27). PyMOL (28) was used for representation and visual inspection of models. All residues are numbered according to mature bovine carboxypeptidase A1.

Phylogenetic analyses
A total of 151 protein sequences belonging to M14 family from 46 different species representing the major lineages from Eubacteria and a wide range of eukaryotes were aligned with ClustalX (29) in order to infer the evolutionary history of these proteins (Supplemental Table A). Manual arrangement to correct disruptive inserts in the alignment was carried out using GeneDoc. These corrections respected all conserved motifs and functional domains of M14 family. The definitive maximum length alignment for the complete phylogenetic analysis ranged from 370 sites in the case of the described metallocarboxypeptidase subfamilies M14A, M14B, and M14C to 565 sites for the novel putative carboxypeptidase unclassified sequences. In this alignment, 531 sites were "parsimony informative" (Supplemental Fig. A). A manual correction of the alignment was required especially for the sequences from Trichomonads, Kinetoplastids (Trypanosomatidae), Apicomplexans, Rhodopirellula baltica, and Idiomarina loihiensis, since they were largely divergent from the rest of the taxa analyzed.

Visual exploration of the multiple alignments detected eight incomplete sequences, which were excluded from the general phylogenetic analysis shown in Fig. 4 and supplemental Table B. A minimal length alignment (Supplemental Fig. B) was used to include these eight sequences in the independent analyses performed with those clearly differentiated clades from the global analysis, defined as M14D1, M14D2, M14D3, and M14D4; each was analyzed against an out-group formed by two sequences belonging to the direct ancestor group (see Fig. 5).


Figure 1. Diverse domain organization of Nna1-like proteins. Conserved N-terminal domain (Nt) and carboxypeptidase domain (CP) are represented as blue boxes and green tubes, respectively. Other conserved motifs among subgroups are represented as colored boxes, with numbers indicating the PfamB family. Bacterial signal peptide is represented as an orange box labeled Sp. Conserved motifs located at the N-terminal domain and active site residues of the peptidase unit are indicated.


Figure 2. A) Structural model of C. elegans AGBL4 carboxypeptidase domain obtained by comparative modeling exhibits the characteristic topology of well-known zinc carboxypeptidases. Zn atom is represented as a yellow sphere. B) Superimposition of ceAGBL4 CP domain model (orange) and bovine pancreatic CPA (green) (PDB code: 2ctc). Overall topology is coincident (r.m.s.d value: 0.571). Main differences have been found in LoopA (α4-α5), LoopB (β8-β9), LoopC (β2-β3), and LoopD (α5-β5). C) Representation of the ceAGBL4 model active site. Zn ligands, catalytic residues, and substrate binding residues are represented as sticks. Zn ligands are shown in green, catalytic residues in orange, S1' site in blue, and specificity pocket residues in red. Zinc atom (Zn) and water molecule (W) are represented as yellow and red spheres, respectively. Bovine CPA numbering has been used.


Figure 3. Sequence Logos for first (A) and second (B) zinc binding motifs in M14 peptidase subfamilies. The representation was generated using Weblogo (http://weblogo.berkeley.edu/logo.cgi). The height of each amino acid indicates the relative frequency of that amino acid at that position.


Figure 4. Phylogeny of M14 family. Reconciled tree of metallocarboxypeptidases obtained from the definitive alignment (Supplemental Fig. A) by the Minimum Evolution method under the Poisson Correction model. Numbered nodes were specifically analyzed for consistency by Bootstrap and Interior Branch Test; values are given in supplemental Table B. Abbreviated protein names, including sequence codes, are given in supplemental Table A. AGTPBP1, AGBL2, AGBL3, AGBL4, and AGBL5 group those homologues to the human Nna1-related genes named according to the HUGO Gene Nomenclature Committee: http://www.gene.ucl.ac.uk/nomenclature/; and the human, mouse, and rat degradomes: http://www.uniovi.es/degradome/; Tryp1 and Tryp2 group Clade 1 and Clade 2, respectively, from the family Trypanosomatidae; Trich groups T. vaginalis paralogs; Rho, Rhodopirellula baltica; Idi, Idiomarina loihiensis.


Figure 5. Phylogeny of the four groups of M14D subfamily. Trees derived from partial analyses within the novel subfamily M14D, using an out-group formed by two sequences belonging to the direct ancestor group. The topologies of M14D subfamily groups were obtained from the minimal length alignment (Supplemental Fig. B) by IBT (1000 replicates), NJ method under JTT model. Abbreviated protein names, including sequence codes, are given in supplemental Table A. AGTPBP1, AGBL2, AGBL3, AGBL4, and AGBL5 respectively group those homologues to the human Nna1-related genes named according to the HUGO Gene Nomenclature Committee: http://www.gene.ucl.ac.uk/nomenclature/; and the human, mouse and rat degradomes: http://www.uniovi.es/degradome/; Tryp1 and Tryp2 group Clade 1 and Clade 2, respectively, from the family Trypanosomatidae; Trich groups T. vaginalis paralogs. Proposed classification of eukaryotic Nna1-like genes into groups M14D2 to M14D4 follows tree topologies. M14D2 is the most ancestral eukaryotic group comprising most protista (except Tryp2). Although relative average distances within M14D2 clade are the largest and their basal definition is not always well supported, polyphyly in M14D2 seems to be due only to the atypical Trich group. Thus, the other groups in M14D2, including Plasmodium, Tryp1, and vertebrates AGBL4 and AGBL5, have a common ancestor, which justifies including them within a common clade. In addition, M14D1, M14D3, and M14D4 are always well defined at basal nodes, justifying, by exclusion, that M14D2 is a group by itself.

Phylogenetic reconstructions were performed using MEGA3 (30). Branch lengths of the inferred phylogenies were estimated by Neighbor-Joining (31) and Minimum Evolution methods (30) under the Poisson Correction (30) and JTT substitution models (32). Before selection of best-fit models of protein evolution to use in the analyses, the sequence alignment was tested using Prottest (33). Bootstrap (34) and the Interior Branch Test (30) (1000 replicates) were calculated for each tree topology obtained by each method and evolutionary model used. Gaps and missing data were handled by computing the distance between each pair of sequences ignoring the gaps in the pairwise comparison (distance matrix provided as Supplemental Table C). Tree plotting was performed using the MEGA3 Tree Explorer.
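
As an illustration of the distance calculation described above, the following Python sketch computes a Poisson-corrected distance, d = -ln(1 - p), where p is the proportion of differing sites, with gapped positions ignored pair by pair. It is a simplified sketch and not the MEGA3 implementation; the function names, the set of characters treated as gaps or missing data, and the toy sequences are assumptions made for the example.

```python
import math

GAP_CHARS = {"-", "?"}  # assumption: characters treated as gap or missing data

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping sites gapped in either sequence."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # pairwise deletion: ignore this site for this pair only
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared

def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected amino acid distance, d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("distance undefined when all compared sites differ")
    return -math.log(1.0 - p)

# Toy aligned fragments (invented): 6 comparable sites, 1 difference.
seq1 = "ACDE-FGH"
seq2 = "ACDQKF-H"
print(round(p_distance(seq1, seq2), 4))        # 0.1667
print(round(poisson_distance(seq1, seq2), 4))  # 0.1823
```

Because gapped sites are dropped per pair rather than across the whole alignment, each pairwise distance may be based on a different number of sites.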

Expression of cDNA encoding the C. elegans AGBL4
Gateway-compatible open reading frame (ORF) coding for the Nna1-like peptidase ceAGBL4 (EEED8.6) derived from the C. elegans ORFeome collection version 1.1 (35) was purchased from OpenBiosystems (http://www.openbiosystems.com) as a stock culture of Escherichia coli HT115(DE3) containing the ORF clone in the pL4440-Dest RNAi feeding vector. Plasmids were amplified, purified using GFX Micro Plasmid Prep Kit (Amersham Biosciences, Arlington Heights, IL, USA), and digested with EcoRV (Roche, Nutley, NJ, USA) to isolate the inserted DNA fragment. The insert was amplified by polymerase chain reaction (PCR) using the following gene-specific primers that contained 5' ligation-independent cloning (LIC) (36) compatible ends: forward 5'-GGTATTGAGGGTCGCATGGGCTATGTTGGAAATGTCAGCTATCCT-3'; reverse 5'-AGAGGAGAGTTAGAGCCTTAAGTTTTCTGGGCGCGGGCGGGTA-3'. Ligation was carried out using the pET30 Xa/LIC vector (Novagen, Madison, WI, USA) as recommended by the manufacturer, and the resulting recombinant plasmid was transformed into E. coli NovaBlue GigaSingles. Clones were sequence verified.

Escherichia coli BL21-Gold(DE3) cells were transformed for protein expression. ceAGBL4 was expressed as a fusion protein with a thrombin-cleavable His-tag and a factor Xa-cleavable S-tag (epitope tag composed of a 15 residue peptide derived from pancreatic ribonuclease A) at the N terminus. Protein expression was induced with 0.1 mM isopropyl-β-D-thiogalactopyranoside for 5 h at 28°C. Cells were harvested by centrifugation at 5000 g for 10 min at 4°C and resuspended using 5 ml of BugBuster (Novagen) solution per gram of cell paste supplemented with 1 µM 4-(2-aminoethyl)benzenesulphonyl fluoride (Sigma, St. Louis, MO, USA), lysozyme, and benzonase (Novagen) according to the manufacturer’s recommendations. Lysates were clarified by centrifugation at 20,000 g at 4°C for 30 min. BL21 cells expressing glutathione S-transferase were used as expression control. Soluble fractions from control and ceAGBL4-expressing cells were analyzed by Western blot. Western blot analysis was carried out using polyvinylidene fluoride membranes in 5% methanol at 4°C with His-tag mouse antibody or a specific metallocarboxypeptidase rabbit antibody produced against the peptide CYNGFDLNRQWSNPIGY located close to the active site of the enzyme.

Enzymatic characterization
CPA-like substrates furyl-acryloyl-L-phenylalanyl-L-phenylalanine (faFF) (37), N-(4-methoxyphenylazoformyl)-L-phenylalanine (AzoF) (38), and CPB-like substrates furyl-acryloyl-L-alanine-L-lysine (faAK) (39) and anisylazoformyl-L-arginine (AzoR) (40), were purchased from Sigma and Bachem (Torrance, CA, USA). Carboxypeptidase O (CPO)-like substrate N-benzoyl-L-alanyl-L-glutamic acid (BzAE) was kindly provided by Dr. Michael Edge (AstraZeneca, Wilmington, DE, USA). Adenosine triphosphate (ATP), adenosine diphosphate (ADP), adenosine monophosphate (AMP), dimethyl sulfoxide (DMSO), ethylenediaminetetraacetic acid (EDTA), 1,10-phenanthroline (o-phenanthroline), benzylsuccinic acid, Z-Gly-Tyr, and Z-Glu-Tyr were purchased from Sigma.

Soluble fractions of control or ceAGBL4-expressing cell extracts were assayed against specific carboxypeptidase substrates AzoF, faFF, AzoR, faAK, and BzAE. Because of the limited solubility of substrates in water, dimethyl sulfoxide (DMSO) was used as cosolvent. Final concentration of DMSO in the assay buffer was <2%. Enzymatic reactions were developed in 1 ml final volume of an aqueous buffer containing 10 mM Tris pH 7.5. The final concentrations of substrates in the reaction were 0.1 mM for AzoR or AzoF, 0.2 mM for faFF and faAK, and 0.75 mM for BzAE. The rate of hydrolysis of each substrate was continuously monitored spectrophotometrically at 25°C by measuring the absorbance at 350 nm (AzoR and AzoF), 330 nm (faFF and faAK), or 254 nm (BzAE) for 20 min after the addition of substrate. Inhibitory or activation kinetic assays were performed using AzoF as substrate. Final concentrations of inhibitors/activators used were as follows: EDTA 10 mM, o-phenanthroline 10 mM, benzylsuccinic acid 10 mM, ZnCl2 from 10⁻⁸ M to 10⁻³ M, ATP, ADP, or AMP 2 mM and Z-dipeptides 15 mM.
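
The text above does not specify how rates were extracted from the continuous absorbance traces. One common approach, shown here only as a hedged sketch, is to fit a straight line to absorbance versus time and to subtract the slope obtained with the control extract. The function name `slope` and all numerical readings below are invented for illustration and are not data from this study.

```python
from typing import Sequence

def slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired readings")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx

# Synthetic readings over a 20 min assay (illustrative values only).
minutes = [0, 5, 10, 15, 20]
sample_abs = [0.800, 0.750, 0.700, 0.650, 0.600]   # extract with enzyme
control_abs = [0.800, 0.795, 0.790, 0.785, 0.780]  # control extract

# Net rate in absorbance units per minute, corrected for the control.
net_rate = slope(minutes, sample_abs) - slope(minutes, control_abs)
print(round(net_rate, 4))
```

Converting such a slope to a molar rate would additionally require the differential extinction coefficient of each substrate at its monitoring wavelength, which is not given here.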


   RESULTS AND DISCUSSION
In the present study, we searched within the currently available genomes, transcriptomes, and protein sequence databases in order to profile the Nna1 gene family in various taxa. We found more than 100 Nna1 homologues in bacteria, Protista, and Animalia (Table 1), but not in Archaea, Fungi, or Plantae. Exhaustive sequence studies with all the putative Nna1-like sequences revealed distinctive architecture in domains despite sharing a characteristic zinc carboxypeptidase signature confirmed by both Pfam (http://www.sanger.ac.uk/Software/Pfam/) and MEROPS (http://merops.sanger.ac.uk/) databases. Nna1-like genes that did not appear to conserve the peptidase unit were excluded throughout the study; for example, mouse and human Nna1-like gene products that had sequence similarity only to the N-terminal portion of Nna1 were not further considered in this study, and so only five mouse and human Nna1-like gene products were detected by this analysis. In a more thorough analysis of the mouse genome, this partial sequence in the mouse database was extended using PCR and found to contain further Nna1-like sequence similarity over the entire peptidase domain; thus there are a total of six Nna1-related genes in the mouse genome, and presumably in the human genome as well (1).


Table 1. Organisms shown to contain Nna1-like sequences after sequence data mining

Domain architecture and sequence analysis
Nna1-like proteins range from 400 to 2000 residues in length and comprise a common metallocarboxypeptidase domain of ~300 residues. From the full-length sequence alignments, an N-terminal conserved domain was also identified that is ~150 residues in length and contains three highly conserved motifs: F[E,D]SGNL at the N terminus, W[F,Y][Y,H,N]Y 60 residues downstream, and [F,Y]P[F,Y][S,T]Y at the C terminus, right before the peptidase domain (Fig. 1 and Supplemental Fig. A).
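
The three conserved motifs listed above translate directly into regular expressions, which offers a quick way to locate them in a candidate sequence. The Python sketch below is illustrative only: the dictionary keys, the `find_motifs` function, and the toy sequence are hypothetical, and the sketch does not check the spacing between motifs.

```python
import re
from typing import Dict, List

# Motifs from the text, with comma-separated alternatives written as classes.
MOTIFS = {
    "F[E,D]SGNL": re.compile(r"F[ED]SGNL"),
    "W[F,Y][Y,H,N]Y": re.compile(r"W[FY][YHN]Y"),
    "[F,Y]P[F,Y][S,T]Y": re.compile(r"[FY]P[FY][ST]Y"),
}

def find_motifs(sequence: str) -> Dict[str, List[int]]:
    """Return 0-based start positions of each motif in a protein sequence."""
    return {name: [m.start() for m in pattern.finditer(sequence)]
            for name, pattern in MOTIFS.items()}

# Invented toy sequence containing one copy of each motif.
toy = "MKFESGNLAAAWFYYGGGYPYTYHPGES"
print(find_motifs(toy))
```

Note that `finditer` reports non-overlapping matches only, which is adequate for motifs of this length in a single domain.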

Nna1-like genes from the different phyla are highly diverse (Fig. 1). Some bacterial Nna1-like proteins display a signal peptide at their N terminus, which suggests they may be secreted to their environment. In contrast, eukaryotic Nna1-like proteins lack this signal peptide, which correlates with the experimentally demonstrated intracellular localization of Nna1 in cultured murine neurons (1, 9).

We found three different organizations in protist eukaryotes (Fig. 1). Species from the genus Plasmodium have only one large Nna1-like gene (>1000 residues), with long N- and C-terminal extensions and widespread insertions of up to 300 residues long that were not conserved, probably caused by the particular nucleotide composition bias shown in this genus (41). Trichomonas vaginalis shows 11 paralogous genes encoding Nna1-like peptidases from 500 to 600 residues, all with N- and C-terminal extensions <100 residues in length. On the other hand, Trypanosomatids encode three paralogous genes of 800-1200 residues, depending on the size of the N-terminal extensions.

Nna1-like genes of multicellular eukaryotes also display several organization patterns. The AGTPBP1 homologues are the largest proteins, reaching up to 1200 residues in length. They present the CP domain located at the C terminus and wide N-terminal extensions with highly conserved motifs, among them the PfamB families (conserved protein motifs with unknown function) PB052974, PB063034, and PB018501 (Pfam 20.0; PfamB cluster information is given as supplemental data). AGBL2 and AGBL3 homologues are similar in sequence, the main differences being found in Cys distribution. Their size varies from 400 to 1000 residues due to the presence or absence of C-terminal extensions. The presence of two conserved motifs, PB012725 and PB009161, at the N and C terminus, respectively, also defines these closely related AGBL genes. AGBL4 homologues are relatively small due to the lack of N- or C-terminal extensions and the absence of other non-Nna1-like conserved domains. These AGBL4 genes are unique in their active site, with two conserved Arg residues close to position 255 instead of the single Arg found in the rest of the Nna1-like peptidases (Supplemental Fig. B). AGBL5 homologues extend between 700 and 800 residues, with a large CP domain centered in the sequence and an additional highly conserved domain, the proline-rich PB028944 in the C-terminal side of the CP domain. Two other features distinguish AGBL5 homologues: first, substitution of the motif FESGNL located at the beginning of the N-terminal domain, by FDSGNL; second, the presence of two main insertions in the CP domain just before the zinc-anchoring motifs: the shortest is 20 residues in length (PB017213 and PB036922) and is located in the β2-β3 loop; the largest consists of an 80-residue-long insertion (PB019879) located in the α5-β5 loop (Fig. 1 and Supplemental Fig. A).

M14A carboxypeptidases have a conserved 100 residue-long proregion located at the N terminus of the peptidase domain. This proregion folds in a globular-independent unit called the activation domain, which is necessary for the folding of the adjacent peptidase domain (42). There is no amino acid sequence similarity between M14A prodomain and the conserved N-terminal domain of Nna1-like peptidases and, based on secondary structure predictions and fold recognition analyses using FUGUE and GenTHREADER (43, 44), we found no clear structural relations between them. N-terminal domain sequences were submitted to PSI-BLAST (45) analysis, but no significant sequence homologues were found other than Nna1-like species. Searches using Pfam HMMs (hidden Markov models) showed clear links to PB003298, PB030815, PB029118, PB017470, PB011640 and PB003904 domains. The PfamB families identified above only comprise Nna1-like proteins, confirming earlier PSI-BLAST results.

Although it was not possible to assign a function to the N-terminal domain of Nna1-like peptidases by sequence homology or by fold recognition analysis, we suggest that this domain might act as a folding domain because it is conserved among Nna1-like peptidases, it is specific for these proteins and does not appear to exist in proteins that do not contain an adjacent CP domain. Furthermore, all known M14 metallocarboxypeptidases have a domain adjacent to the catalytic one that seems to help in folding M14A peptidases (42) at the N terminus and M14B peptidases at the C terminus (3). The N-terminal domain might also act as a regulatory domain (discussed later) or as a binding domain, as is the case of the N-terminal extensions found in M14C peptidases (46).

ceAGBL4 CP domain model
To examine whether Nna1-like peptidases conserve the typical secondary structures and the spatial conformation of the active site of known MCPs, we built a 3-D model of ceAGBL4 CP domain. Based on secondary structure predictions, the ceAGBL4 CP domain was defined from residue 131 to 415 and the conserved N-terminal domain motif YPYTY was located right before the first α helix of the peptidase unit. An initial alignment of templates and ceAGBL4 CP domain sequence was achieved using fold recognition software FUGUE. After >10 rounds of modeling and manual realignment, the Ramachandran plot for the final model showed 87.4% residues in most favored regions, 12.6% in allowed regions, and 0.0% in disallowed regions. The PROCHECK overall G-factor was –0.19 and the VERIFY3D report indicated that no poor areas were present in the model.

The ceAGBL4 CP domain model shows a fold containing an α/β/α sandwich structure with an antiparallel β-sheet of eight strands (Fig. 2A). The main chain atoms of the ceAGBL4 CP domain model and templates can be superimposed fairly well using COMPARER, displaying closest topological similarity to pancreatic MCPs. The overall r.m.s. deviation values calculated between the enzyme moieties of the model and the templates denote that all regular secondary structures of the model lie in regions topologically equivalent to porcine CPA (0.679), bovine CPA (0.571), and porcine CPB (0.567). A major divergence was observed between the model and the CPT from T. vulgaris, with an r.m.s. deviation value of 1.574.
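
For readers unfamiliar with the measure, an r.m.s. deviation between two sets of equivalent atoms is the square root of the mean squared distance between paired coordinates. The Python sketch below shows only this final formula on already superimposed coordinates; it does not reproduce the structural alignment or superposition performed by COMPARER, and the function name and coordinates are invented for illustration.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root mean square deviation between paired, pre-superimposed coordinates."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(a, b):
        # squared Euclidean distance between the two equivalent atoms
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(a))

# Invented coordinates: every atom is displaced by exactly 1 unit along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, template))  # 1.0
```

The value depends on which atoms are paired and on how the structures are superimposed, so figures from different programs are not directly interchangeable.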

The main differences between the model and templates were found in the lengths of the loops (Fig. 2B). At the active site entrance, the largest loop (α4-α5) of pancreatic carboxypeptidases has been reduced to half the number of residues in Nna1 homologues (LoopA), but the β8-α9 loop is a few residues longer in ceAGBL4 (LoopB) and other Nna1-like peptidases. LoopB might interact with natural substrates of these enzymes or confer resistance against natural protein inhibitors, as is the case for the Helicoverpa zea CPB (47). However, the ceAGBL4 CP domain model clearly has a more accessible active site when compared with pancreatic carboxypeptidases, suggesting that the Nna1-like enzymes are capable of hydrolyzing bulky substrates like compact proteins, which do not have an extended C-terminal tail. At the opposite side of the peptidase domain, the β2-β3 loop (LoopC) and the α5-β5 loop (LoopD) are both longer in ceAGBL4 than in M14A peptidases. Most of the insertions found in Nna1-like sequences are located in those loops, suggesting they could play a role in the biological function of the native protein.

The predicted active site of ceAGBL4, the residues involved in the coordination of the Zn atom, and the series of conserved key residues that form the different active center subsites have essentially the same conformation as that described in other well-characterized metallocarboxypeptidases (Fig. 2C).

Active site analysis
Nna1-like peptidases conserve the Zn ligands and the catalytic residues of metallocarboxypeptidases (Fig. 2C), but the motifs where they occur differ from the ones defined for the M14A, M14B, or M14C subfamilies (Supplemental Fig. A). In the M14 family, the Zn atom is held in place by penta-coordination with two His residues (His69 and His196), one Glu residue (Glu72), and a water molecule (the numbering system corresponds to bovine CPA and will be used throughout). Zinc ligands are located in two zinc-anchoring motifs. The first Zn-binding motif of Nna1-like peptidases contains a highly conserved proline adjacent to His69, in contrast to known MCPs, which contain an alanine or glycine in this position (Fig. 3A). A new consensus pattern for the first Zn-anchoring motif in Nna1-like proteins can be defined as His-Pro-Gly-Glu-[Ser,Thr]. The second Zn-anchoring motif of Nna1-like peptidases also differs from those found in the other M14 subfamilies (Fig. 3B). It can be defined as [aliphatic,aromatic]-His-[Gly,Ala,Ser]-His-[Ser,Ala] for eukaryotic Nna1-like peptidases and [aliphatic]-His-Gly-Asp-Glu for prokaryotic ones.
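
The consensus patterns above can be written as regular expressions over one-letter amino acid codes. The following is a minimal sketch, not the search procedure used in this work; the residue classes chosen for "aliphatic" (A, V, L, I) and "aromatic" (F, W, Y), the name `find_motifs`, and the toy sequence are assumptions made for illustration.

```python
import re

# Assumed residue classes; the text does not enumerate them.
ALIPHATIC = "AVLI"
AROMATIC = "FWY"

MOTIFS = {
    # His-Pro-Gly-Glu-[Ser,Thr]
    "zn1": re.compile(r"HPGE[ST]"),
    # [aliphatic,aromatic]-His-[Gly,Ala,Ser]-His-[Ser,Ala]
    "zn2_eukaryotic": re.compile(rf"[{ALIPHATIC}{AROMATIC}]H[GAS]H[SA]"),
    # [aliphatic]-His-Gly-Asp-Glu
    "zn2_prokaryotic": re.compile(rf"[{ALIPHATIC}]HGDE"),
}

def find_motifs(sequence):
    """Return {motif_name: [(start, matched_text), ...]} with 0-based starts."""
    seq = sequence.upper()
    return {
        name: [(m.start(), m.group()) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }

# Made-up toy sequence, not a real protein.
toy = "MKTHPGESAAALHGHSQQ"
hits = find_motifs(toy)
print(hits)
```

Because the motifs are short, such patterns will also match unrelated proteins by chance, so a hit is only meaningful in the context of an alignment to the CP domain.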

In addition to zinc ligands, other important MCP residues involved in catalysis or in substrate binding are conserved in Nna1-like peptidases (see Fig. 2C); these include Arg127, which helps to stabilize the oxyanion hole in the S1 site; Glu270, which is the general base for catalysis (48); and Asn144 and Arg145 at the S1' site, which bind to the C-terminal carboxylate group of the substrate. The Arg71 residue at the S2 binding site is substituted by an Asn residue in the M14B or M14C subfamilies. In contrast, Nna1-like peptidases lack basic or amide polar residues at this position (Supplemental Fig. A).

The role of Tyr248 at the S1' site of MCPs has been the subject of much debate over the years. From the X-ray crystal structures of CPA in complex with Gly-Tyr, it was proposed that Tyr248 plays a role as a proton donor (49). A subsequent high-resolution X-ray crystallographic study of the complex failed to confirm this proposition and demonstrated that the Tyr phenolic hydroxyl instead forms a hydrogen bond with the terminal carboxylate of the substrate (50); a kinetic study performed with the Y248F mutant of rat CPA showed that Tyr248 was not required for the catalytic process (51). In contrast, when this study was repeated with bovine CPA and its Y248F and Y248A mutants, it was shown that Tyr248 is essential for the catalytic activity of bovine CPA and that its aromatic ring plays a significant role in it (52). Nna1-like peptidases have either a Tyr or Phe at position 248, preserving the aromatic ring (Fig. 2 and Supplemental Fig. A). Based on this substitution, T. Wang et al. proposed that the substrate(s) for Nna1 may have structural characteristics that set them apart from those of other members of the M14 family (11); the controversial results regarding the role of Tyr248 in the enzymatic reaction indicate that this is an important issue to be investigated.

In M14A members, residues Ser194, Ile243, Ser253, Ile255, and Thr268 define the substrate specificity pocket. Predictions of individual specificities are based largely on the conformational and space-filling effects of the amino acids in these positions (53). In Nna1-like peptidases, CPA1 Ser194 has been replaced by a highly conserved Asp residue (this position is adjacent to the second zinc binding motif; Fig. 3B) and CPA1 Ile243 has been replaced by a Glu residue in ceAGBL4. According to the ceAGBL4 model (Fig. 2C), the acidic side chains of Asp194 and Glu243 point into the active site cleft, suggesting a possible catalytic role (discussed below). The major contributor to the substrate specificity of M14A peptidases is residue 255, which interacts with the side chain of the substrate's C terminus (54). In M14B peptidases, this role is carried out by a highly conserved Asp in position 207 (55). M14A, M14C, and Nna1-like peptidases have a conserved Gly residue at position 207; therefore, we can be relatively confident that residue 255 is responsible for the specificity of Nna1-like peptidases. Given that there is no sequence similarity with the M14A peptidases in this region, it is not straightforward to accurately predict this position in Nna1-like peptidases. Based on secondary structure predictions and fold recognition results, there are three optional residue types that could occupy position 255: an aliphatic/hydrophobic residue, a conserved Arg residue, or an Ala, Ser, or Gly residue. The first possibility restricts the substrate specificity of Nna1-like peptidases to CPA-like; the second option restricts it to CPO-like activity, and the third confers less restrictive substrate specificity. The presence of a small uncharged residue at position 255 is thought to confer broad specificity to H. armigera CPA1 (haCPA1), which has Ser at this position and hydrolyzes CPA-like and CPB-like substrates (56).
haCPA1 also has Ser194 replaced by an Asp residue. The role of Asp194 should be further investigated to confirm whether the presence of such an acidic residue at this position could be implicated in the CPB-like specificity of haCPA1 and Nna1-like peptidases (see below for kinetic results). The Glu243 substitution in ceAGBL4 could also contribute to the CPB-like activity of this enzyme. As shown below, C. elegans AGBL4 displays CPA-like and CPB-like activities; those results were also obtained for human AGBL3 peptidase (not shown). Thus, based on our substrate specificity results and the sequence similarities with haCPA1 in the specificity pocket residues, it can be suggested that Nna1-like peptidases have Ala, Ser, or Gly at position 255 (Supplemental Fig. A). Nevertheless, until the 3-D structures of these new peptidases are determined, the exact residues that contribute to substrate specificity remain unknown.
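
The three possibilities discussed above for position 255 amount to a simple lookup from residue type to expected specificity. The sketch below encodes only that reasoning; it is not a validated predictor. The function name `predicted_specificity` and the residues assigned to the "aliphatic/hydrophobic" class are assumptions (Ala is kept in the small-residue class, as in the text).

```python
# Residue classes are assumptions for this sketch; the text names only
# "aliphatic/hydrophobic", "Arg", and "Ala, Ser, or Gly".
SMALL_UNCHARGED = set("ASG")
HYDROPHOBIC = set("VLIMFWY")

def predicted_specificity(residue_255):
    """Map the one-letter residue at position 255 (bovine CPA numbering)
    to the specificity class suggested in the text."""
    r = residue_255.upper()
    if r in SMALL_UNCHARGED:
        return "broad"       # less restrictive, as proposed for haCPA1 (Ser255)
    if r == "R":
        return "CPO-like"
    if r in HYDROPHOBIC:
        return "CPA-like"    # e.g., Ile255 in M14A members
    return "not covered"     # case not discussed in the text

for residue in "SRID":
    print(residue, predicted_specificity(residue))
```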

In summary, Nna1-like peptidases conserve the zinc ligands as well as the key catalytic and substrate binding residues of the M14 family, but the motifs where they occur are distinct from the other M14 subfamilies. Some Nna1-like peptidases are classified into the M14B subfamily in the present MEROPS classification; however, on the basis of well-defined and different sequence conservations at the active site motifs, we propose that Nna1-like peptidases constitute a new M14 subfamily, which we have tentatively named M14D. The divergence of M14D peptidases has been demonstrated by phylogenetic studies.

Distinct and early separation of carboxypeptidases and Nna1-like proteins
Amino acid sequences from the M14 peptidase family produced a final alignment of 565 positions whose positional identity was difficult to establish for certain sites. Large gaps due to unique sequences from some organisms were excluded from the analysis because of ambiguity. In the alignment, 14 positions were invariant and 531 were parsimony informative. Amino acid composition was homogeneous across most sequences according to a 5% χ² test. Nevertheless, 14 sequences showed an amino acid composition bias of >10% variation: eight from bacteria (G. kaustophilus, B. sphaericus, P. putida, P. aeruginosa, Z. mobilis, Azoarcus EbN1, R. solanacearum, B. pseudomallei) and six from Protozoa, mostly from the genus Plasmodium (Paramecium tetraurelia, Plasmodium falciparum, P. yoelii yoelii, P. berghei, P. chabaudi, P. gallinaceum).

The molecular phylogeny of the M14 carboxypeptidase family and the homologous Nna1-like proteins was reconstructed from their amino acid sequences by using the Minimum Evolution method with the Poisson Correction model (Fig. 4). Phylogenetic inference with Neighbor Joining and Maximum Parsimony produced similar and congruent trees with respect to the four main groups, M14A, M14B, M14C, and M14D (Fig. 4 and Fig. 5). All analyses and methods yielded highly concordant topologies, with only slight differences in Bootstrap support for a few basal nodes within the new M14D subfamily members corresponding to sequences from R. baltica, I. loihiensis, T. vaginalis (11 sequences), and those from the family Trypanosomatidae, which distributed into two different clades (8+4 sequences). We can draw several interesting evolutionary trends from a comparison of Bootstrap and Interior Branch Test supports on those nodes (Supplemental Table B) and from the main topologies. First, from a common ancestor there is a distinct and early separation of Nna1-like proteins from the classical carboxypeptidases belonging to the M14A, B, and C subfamilies, with 100% Bootstrap support in all trees. This split is a clear example of how the addition of new functional signatures to a gene causes functional divergence, leading to a new protein function in a group of organisms. This partition supports classifying Nna1-like sequences into a new subfamily, which we propose to name M14D.
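
For readers unfamiliar with the Poisson Correction model mentioned above, the distance between two aligned protein sequences is d = -ln(1 - p), where p is the proportion of sites at which the two sequences differ. The sketch below shows that calculation only; it is not the phylogenetic pipeline used here. The function names, the toy sequences, and the choice to skip any column containing a gap in either sequence are assumptions.

```python
import math

def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap (assumed treatment)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared

def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)

# Toy aligned sequences (hypothetical).
print(poisson_distance("ACDEFGHIKL", "ACDEFGHIKV"))
```

The correction grows faster than p itself, which compensates for multiple substitutions at the same site in distantly related sequences.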

Second, gene duplication has accompanied the specialized evolution of Nna1-like peptidases. Thus, the earliest duplication found is in I. loihiensis, which has two paralogous genes clustered separately (one is the most ancestral to all Nna1-like genes and the second is within the bacterial grouping M14D1, which also contains two recent copies from N. meningitidis). The derivation of three main subgroups of eukaryotic Nna1-like genes (M14D2, M14D3, and M14D4) reflects several duplication events that are not restricted to vertebrates but also extend to protozoan species (Fig. 5). Trypanosomatid Nna1-like genes cluster into two main groups, M14D2 and M14D3, revealing two duplication events with great divergence between them. Remarkably, Trichomonas vaginalis shows the most abundant gene duplication, with 11 copies in its genome from two main duplication events clustering together as sister groups within the M14D2 subfamily. The most striking exception, a whole group in which the evolution of Nna1-like genes does not proceed through gene duplication, is the genus Plasmodium, where a single gene copy was present in the genome of each of the seven species analyzed. According to phylogenetic clustering, gene duplications took place before the divergence of vertebrates (57), since genes from Arthropoda are also distributed along the three Nna1-like eukaryotic groups (M14D2, M14D3, and M14D4). This observation agrees with the fact that many early chordate gene families were formed or expanded by large-scale DNA duplications (58). In the AGBL nomenclature, which refers to the ability to bind ATP/GTP (AGTPBP1, AGBL2, AGBL3, AGBL4, and AGBL5), the numbering of these M14D peptidase paralogous genes in humans and mice does not match the phylogenetic distribution (Fig. 5). Hence, the ancient M14D2 cluster includes AGBL5 and AGBL4, while the most recent M14D4 group encompasses AGBL2 and AGBL3.
Although this alternative nomenclature can be applied to most multicellular eukaryotes, some exceptions are found (rat has only AGTPBP1, AGBL2, and AGBL3 and mouse has the entire set). Moreover, since not all organisms might contain the full set of paralogous AGBL genes (see Table 1), the new gene names require revision, especially in species that have retained only one gene copy (such as those belonging to the genus Plasmodium: M14D2) or that have multiple paralogous genes, as in trypanosomatids, which contain two M14D2 genes (M14D2a and M14D2b) and a single M14D3 gene.

Third, Nna1-like proteins have evolved from a common ancestor shared in two bacteria, R. baltica and I. loihiensis. The sequences from these two species show a sister-group relationship with both classical carboxypeptidases and Nna1-like peptidases (Fig. 4). It should be mentioned that only a few bacterial genomes contain either previously classified metallocarboxypeptidases or Nna1-like proteins. Bacterial carboxypeptidases from the subfamily M14C are restricted to two families from Firmicutes and one from Actinobacteria (see footnote 2). Nna1-like genes are spread along nine families from Proteobacteria and Planctomycetes. All of them are Gram-negative and flagellated bacteria. The mosaic pattern of bacterial phylogenetic distribution is most readily explained by vertical inheritance and selective loss in many bacterial lineages, with specific retention in metazoans and some protozoans.

Fourth, the simplest explanation for the absence of Nna1-like proteins in plants and fungi is that a common mutual ancestor already lacked an Nna1-like gene, and thus the function executed by the Nna1-like proteins is either not required in these organisms or is covered by another gene family. The protozoan and metazoan lineages of Nna1-like genes were assembled from unique components (N-terminal domain) not found in any other gene family, and thus an alternative Nna1-like function in plants and fungi could have evolved by assembling components from other origins, like the different lineages of assembly of the hedgehog genes (59), which in turn did not reveal their identity through a data mining search scheme. Bacterial, protozoan, and metazoan Nna1-like homologues are virtually identical in domain organization and belong to the same M14D subfamily, suggesting that the N-terminal domain is an integral part of the protein that cannot be separated from the carboxypeptidase domain during gene duplication; this association precedes the eubacterial/parazoan split, the earliest divergence among existing organisms containing Nna1-like proteins. This observation is consistent with the proposal that the conserved N-terminal domain functions in the folding of the carboxypeptidase domain, as described above.

Nevertheless, from a statistical point of view, 400 bacterial genomes have been completed and 638 are in progress. Of them, only 14 contain Nna1-like genes (<1.5%). Since genomes from Archaea (29 complete and 28 in progress), Fungi (9 and 56), and Plantae (2 and 34) are notably less abundant in databases than those from bacteria, we cannot discard the possibility that Nna1-like genes may later be found in those other kingdoms. Thus, the low occurrence of Nna1-like genes in bacterial genomes could also be explained by the relatively low rate of horizontal transfer described from eukaryotes (60), which has been associated with pathogenicity (61).
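
The "<1.5%" figure quoted above can be checked with a short calculation, assuming it counts both completed and in-progress bacterial genomes in the denominator (which is how the sentence reads). The helper name `percent_with_gene` is hypothetical.

```python
def percent_with_gene(n_with_gene, n_genomes):
    """Percentage of genomes that contain at least one gene of interest."""
    return 100.0 * n_with_gene / n_genomes

completed, in_progress, with_nna1 = 400, 638, 14   # counts quoted in the text
total_bacterial = completed + in_progress
fraction_pct = percent_with_gene(with_nna1, total_bacterial)
print(total_bacterial, round(fraction_pct, 2))
```

With 1038 genomes in total, the 14 Nna1-like-containing genomes come to roughly 1.35%, consistent with the stated bound; counting completed genomes only would give 3.5% instead.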

Cloning, expression, and enzymatic characterization
Nna1-like peptidases are putative metallocarboxypeptidases and unassigned peptidases in the widely used MEROPS classification because such annotation is based entirely on their theoretical similarity to biochemically characterized MCPs. Therefore, we set out to examine whether Nna1-like peptidases are metal-dependent carboxypeptidases. Accordingly, we proceeded to the recombinant expression and enzymatic characterization of C. elegans AGBL4.

ceAGBL4 has 443 amino acids and includes the carboxypeptidase unit as well as the conserved N-terminal domain. It was expressed in E. coli and detected in supernatant and pellets of cell lysates by Western blot analysis using anti-His tag antibody. Kinetic analyses for soluble ceAGBL4 were performed using five different chromogenic carboxypeptidase substrates: AzoF and faFF for CPA-like specificity, AzoR and faAK for CPB-like specificity, and BzAE for CPO-like specificity (see Materials and Methods for acronyms and details). Contrary to standard procedures for pancreatic MCPs, enzymatic reactions were assayed in 10 mM TrisCl pH 7.5 without NaCl because the solubility and activity of the enzyme appear to be negatively affected by intermediate or high salt concentration. From such an analysis, it became clear that ceAGBL4 peptidase is an active carboxypeptidase that displays CPA-like and CPB-like specificities in a much less strict fashion than the pancreatic-like enzymes (Table 2). ceAGBL4 peptidase had no detectable activity toward C-terminal Glu residues, indicating the absence of CPO-like activity (Table 2).


Table 2. Substrate specificity of ceAGBL4. (–): < 0.1% of substrate cleaved for 20 min (footnote a)

Irreversible inhibitors of cysteine peptidases (E-64), aspartic peptidases (pepstatin), or serine peptidases (Pefabloc®) did not inhibit the carboxypeptidase activity of ceAGBL4. On the other hand, ceAGBL4 is dramatically inhibited by chelating agents such as o-phenanthroline or EDTA, consistent with all other MCPs. Low concentrations of the divalent Zn cation (10⁻⁸ to 10⁻⁶ M) activate ceAGBL4, whereas concentrations above 10⁻⁶ M have an inhibiting effect on enzymatic activity. The activity of ceAGBL4 is completely inhibited by the CPA inhibitor benzylsuccinic acid at 10 mM (Table 2). The dipeptides Z-Gly-Tyr and Z-Glu-Tyr (15 mM) showed different inhibitory effects on bovine CPA and ceAGBL4: whereas pancreatic CPA is inhibited to the same extent by both Z-dipeptides, ceAGBL4 is inhibited more strongly by Z-Glu-Tyr than by Z-Gly-Tyr (Table 2).

Remarkably, enzyme activity was affected by ATP or ADP and, to a lesser extent, by AMP. Nucleotides have an activating effect on enzymatic activity at 2 mM final concentration, with an improvement of 22% for AMP, 134% for ADP, and 68% for ATP (Table 2). As ceAGBL4 was also activated by ADP, the activation is presumably not mediated by phosphorylation through protein kinases. This was unexpected because ceAGBL4 peptidase does not display the ATP/GTP binding motif of the P-loop type found in mouse Nna1 (9). This motif has a high probability of chance matches in biological databases, so its presence in a protein sequence does not necessarily guarantee the function with which it is associated. Although the ATP/GTP binding P-loop type motif is only conserved in AGTPBP1 homologues (M14D3), experimental evidence indicates that ceAGBL4 (Table 2) and human AGBL3 peptidase (data not shown) are activated by ATP/ADP.

Because the secondary structure prediction and fold recognition studies could not determine whether the N-terminal domains of Nna1-like peptidases form proregions that assist with folding, interact with substrates, and/or inhibit enzyme activity, we tested whether the active ceAGBL4 protein represented the full-length protein or the product that would result from an ATP/ADP-dependent cleavage between the conserved N-terminal domain and the CP domain. The enzyme preparation was incubated with ADP for 30 min and Western blot analysis was performed using the metallocarboxypeptidase antibody. After activation with ADP, the 56 kDa protein was detected, corresponding to the predicted molecular mass of the full-length protein. No protein was detected at 36 kDa, the theoretical molecular mass of the peptidase domain without the N-terminal domain attached (data not shown), indicating that the protein was activated without cleavage.

Fold recognition studies suggested that the N-terminal domain of Nna1-like peptidases might fold as an independent unit with a different fold than the M14A peptidase prodomain; the conserved N-terminal folding domain might also act as a regulatory domain by covering the active site entrance, analogous to the prodomain of the M14A peptidases. In M14A peptidases, both domains are linked by a connecting segment where the cleavage of the prodomain occurs. The ATP/GTP binding signature present in M14D3 peptidases is located 20 residues upstream of the YPYTY motif, which is located right before the first α helix of the CP domain. This would suggest that nucleotide binding occurs in the connecting segment between the conserved N-terminal domain and the CP domain of M14D3 peptidases. There is a clear activation process mediated by nucleotides, but how this occurs is unclear. It is conceivable that the binding of nucleotides might induce a conformational change that moves the N-terminal domain so that the enzyme becomes activated.

There is a link between tubulinyl-Tyr carboxypeptidase (EC 3.4.17.17) and Nna1-like proteins
As shown above, the computer-based modeling of ceAGBL4 indicates that the active site entrance is much wider and more open than in the previously studied carboxypeptidases. Because this AGBL4 of C. elegans is representative of the entire M14D subfamily with respect to the gaps and inserts required for alignment with the other subfamilies, it is likely that the other Nna1-like proteins will have generally similar structures. This, together with the absence of an N-terminal signal peptide in the eukaryotic members of this new protein subfamily, implies that they function in the cleavage of cytosolic proteins. The question then arises, what is the natural substrate for Nna1 and the Nna1-related enzymes? The only cytosolic protein known to undergo C-terminal processing is the α subunit of tubulin. This protein is initially produced with a C-terminal Tyr, which is removed by the tubulinyl-Tyr carboxypeptidase (tubCP) and reattached by a specific tubulin ligase. Although the ligase has been identified (62), tubCP has not, despite a great deal of effort since it was first described in 1977 (63).

It has been reported that tubCP activity is increased by ADP, inhibited by intermediate to high concentrations of NaCl, and shows strong specificity for Z-dipeptides that contain, at the carboxyl end, Tyr or Phe linked to glutamic acid (64). Such an enzyme is thought to prefer polymerized over dimeric tubulin as substrate and performs tubulin detyrosination better on microtubules (65, 66, 67). In proliferating cells, tubCP activity is not detectable, but when cells undergo differentiation, the enzyme is activated (65). In the case of nervous cells, tubCP activity seems to be restricted to differentiating neurons whose processes are rich in detyrosinated tubulin, indicating that tubCP is involved in the growth of processes (65, 68).

Several observations from the present study are consistent with the possibility that Nna1 and/or related enzymes function as tubCP. First, cytosolic localization of Nna1 and Nna1-related enzymes fits with this proposed role. Second, the modeling suggests the ability to remove C-terminal Tyr residues from proteins based on both a consideration of the substrate binding pocket (which can accommodate a Tyr) and the loops surrounding the active site pocket (discussed above). Third, the direct demonstration that one of the Nna1-like proteins has enzymatic activity toward small peptides that contain C-terminal hydrophobic groups implies that this enzyme is capable of cleaving C-terminal Tyr. Fourth, the nucleotide dependence (ADP/ATP) of the activity of that recombinant enzyme is congruent with its potential involvement in a pathway regulated energetically, as is the transformation of tubulin in cytoskeleton remodeling, and with the reported activation of tubulin carboxypeptidase by ADP (64) . Fifth, the higher inhibitory capability of the dipeptide Z-Glu-Tyr vs. Z-Gly-Tyr, and the negative effects of moderate or high salt concentrations on the activity of such enzyme, match the properties of the previously reported tubulin Tyr carboxypeptidase and contrast with other well-known metallocarboxypeptidases from different subfamilies.

Besides the similarities between the Nna1-like family and the tubulinyl-Tyr carboxypeptidase, additional evidence favors this hypothesis. For example, Nna1 mRNA is up-regulated after axotomy, with a marked increase during the reinnervation process (9). Some forms of α- and β-tubulin are also increased in growing or regenerating axons (69); tubCP activity is regulated relative to its substrate, Tyr-tubulin, which is increased during neurite outgrowth (70) and after axotomy (71). During development, Nna1 expression was restricted to differentiating, but not proliferating, neural populations (9), sharing the expression pattern of tubulin carboxypeptidase.

An important related issue in such a comparison is the expected number of tubulin carboxypeptidases and Nna1-like variants. How many forms could be cross-assigned, and which ones are identical? Assuming the hypothesis of potential identities (either partial or total), it is worth mentioning that our genomic scanning has revealed some organisms with only one Nna1-related gene, corresponding to the only metallocarboxypeptidase gene they have. Those organisms have α-tubulin and active tubulin ligase, which are presumably essential for cellular structuring and remodeling. In these organisms, we suggest that the single Nna1-like form present is the tubulinyl-Tyr carboxypeptidase. In organisms with multiple Nna1-like enzymes (there are from one to six Nna1-related genes in different species), determination of the enzyme(s) involved in tubulin processing will require further investigation. The biological significance of tubulin would easily justify the occurrence of various alternative processing forms, although some organisms might survive with a single one or might develop substitutive mechanisms. Perhaps this would be the case for fungi and plants, which apparently lack Nna1-like enzymes, if the equivalence with tubulin carboxypeptidase(s) is finally proved. It is important to mention that the tyrosination cycle is highly conserved among eukaryotes and has been found in most cells where it has been searched for, with the exception of the fission yeast S. pombe (72). It was demonstrated that in Saccharomyces cerevisiae there is no turnover of the α-tubulin C terminus and that the tubulin ligase activity is absent despite the presence of a gene with significant homology to tubulin-tyrosine ligase in other organisms (73, 74). The lack of Nna1-like peptidases in fungi is consistent with the lack of the detyrosinylation cycle of α-tubulin in those organisms.

Direct experimental analyses to substantiate the above hypothesis have been performed or are in advanced stages in our groups. First, preliminary evidence collected by us (M. Rodriguez de la Vega and S. Tanco et al., unpublished results) indicates that ceAGBL4 and hAGBL3 specifically trim tubulin at the C terminus in vitro, showing a greater capability on polymerized tubulin than on its monomeric-dimeric species. Second, mice lacking Nna1 (pcd mice) were found to accumulate the Tyr form of α-tubulin, and have a deficiency of the detyrosinylated form in certain cell types (1). In this study, a preferential (but not exclusive) localization of the distinct Nna1-like proteins is found at the cytoplasm of several different cell types investigated. Because of this, we propose to rename them CCP proteins, for cytosolic carboxypeptidases (first "C" for cytosolic and the "CP" for carboxypeptidase).

Overall, the present study indicates that Nna1-like peptidases represent a new M14 subfamily (M14D); they are ADP/ATP dependent and have the potential to act on protein substrates, particularly in the processing of tubulin as Tubulinyl-Tyr carboxypeptidase. Additional experimental studies are required to further substantiate this hypothesis, which should have great biological and biomedical significance.


   ACKNOWLEDGMENTS
 
We are greatly indebted to Prof. Tom Blundell and Dr. Ricardo Nuñez-Miguel (Cambridge University, Cambridge, UK) and Prof. Andrej Sali (UCSF, San Francisco, CA, USA) for the training in modeling studies for two authors of this paper (M.R.V. and A.H.), as well as for collaborative related work to be published. This work has been supported by grants from the Spanish Ministry of Education and Science (BIO2003–07179 to J.M.B and BIO2004–05879 and GEN2004-20642-C09-05 to F.X.A) and from the Centre de Referencia en Biotecnologia (Generalitat de Catalunya). M.R.V. acknowledges a doctoral fellowship from the Universitat Autònoma de Barcelona.


   FOOTNOTES
 
2 Once this phylogenetic analysis was completed (May 2006), two sequences from Clostridium and one from Symbiobacterium were added to the MEROPS database under subfamily M14C, and consequently are not included in the trees.

Received for publication September 25, 2006. Accepted for publication October 25, 2006.


   REFERENCES

  1. Kalinina, E., Biswas, R., Berezniuk, I., Hermoso, A., Aviles, F. X., Fricker, L. D. (2007) A novel subfamily of mouse cytosolic carboxypeptidases. FASEB J. 20In press
  2. Rawlings, N. D., Morton, F. R., Barrett, A. J. (2006) MEROPS: the peptidase database. Nucleic Acids Res. 34,D270-D272[Abstract/Free Full Text]
  3. Vendrell, J., Aviles, F. X., Fricker, L. D. (2004) Metallocarboxypeptidases. Messerschmidt, A. Bode, W. Cygler, M. eds. Handbook of Metallopeptidases ,167-192 John Wiley & Sons Chichester, UK.
  4. Barrett, A. J., Rawlings, N. D., Woessner, J. F. (1999) Introduction: clan MC containing metallocarboxypeptidases. Barrett, A. J. Rawlings, N. D. Woessner, J. F. eds. Handbook of Proteolytic Enzymes Academic San Diego, CA. CDROM, chapter 450
  5. Hourdou, M. L., Guinand, M., Vacheron, M. J., Michel, G., Denoroy, L., Duez, C., Englebert, S., Joris, B., Weber, G., Ghuysen, J. M. (1993) Characterization of the sporulation-related gamma-D-glutamyl-(L)meso-diaminopimelic-acid-hydrolysing peptidase I of Bacillus sphaericus NCTC 9602 as a member of the metallo(zinc) carboxypeptidase A family. Modular design of the protein. Biochem. J. 292,563-570
  6. Webster, D. R. (2004) Tubulinyl-Tyr carboxypeptidase. Barrett, A. J. Rawlings, N. D. Woessner, J. F. eds. Handbook of Proteolytic Enzymes ,2111-2113 Elsevier London. UK.
  7. He, G. P., Muise, A., Li, A. W., Ro, H. S. (1995) A eukaryotic transcriptional repressor with carboxypeptidase activity. Nature 378,92-96
  8. Too, C. K., Vickaryous, N., Boudreau, R. T., Sangster, S. M. (2001) Identification and nuclear localization of a novel prolactin and cytokine-responsive carboxypeptidase D. Endocrinology 142,1357-1367
  9. Harris, A., Morgan, J. I., Pecot, M., Soumare, A., Osborne, A., Soares, H. D. (2000) Regenerating motor neurons express Nna1, a novel ATP/GTP-binding protein related to zinc carboxypeptidases. Mol. Cell. Neurosci. 16,578-596
  10. Fernandez-Gonzalez, A., La Spada, A. R., Treadaway, J., Higdon, J. C., Harris, B. S., Sidman, R. L., Morgan, J. I., Zuo, J. (2002) Purkinje cell degeneration (pcd) phenotypes caused by mutations in the axotomy-induced gene, Nna1. Science 295,1904-1906
  11. Wang, T., Parris, J., Li, L., Morgan, J. I. (2006) The carboxypeptidase-like substrate-binding site in Nna1 is essential for the rescue of the Purkinje cell degeneration (pcd) phenotype. Mol. Cell. Neurosci. 33,200-213
  12. Wu, C. H., Yeh, L. S., Huang, H., Arminski, L., Castro-Alvear, J., Chen, Y., Hu, Z., Kourtesis, P., Ledley, R. S., Suzek, B. E., et al (2003) The Protein Information Resource. Nucleic Acids Res. 31,345-347
  13. Durbin, R., Eddy, S., Krogh, A., Mitchison, G. (1998) Hidden Markov models. Durbin, R. Eddy, S. Krogh, A. eds. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids Cambridge University Press Cambridge, UK.
  14. Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E. L., et al (2004) The Pfam protein families database. Nucleic Acids Res. 32,D138-D141
  15. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215,403-410
  16. Felsenstein, J. (2005) PHYLIP (Phylogeny Inference Package) Free package from the University of Washington, Department of Genome Sciences Seattle, WA.
  17. Mizuguchi, K., Deane, C. M., Blundell, T. L., Johnson, M. S., Overington, J. P. (1998) JOY: protein sequence-structure representation and analysis. Bioinformatics 14,617-623
  18. Ning, Z., Cox, A. J., Mullikin, J. C. (2001) SSAHA: a fast search method for large DNA databases. Genome Res. 11,1725-1729
  19. Birney, E., Andrews, D., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T., et al (2006) Ensembl 2006. Nucleic Acids Res. 34,D556-D561
  20. McGuffin, L. J., Bryson, K., Jones, D. T. (2000) The PSIPRED protein structure prediction server. Bioinformatics 16,404-405
  21. Shi, J., Blundell, T. L., Mizuguchi, K. (2001) FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol. 310,243-257
  22. Sali, A., Blundell, T. L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234,779-815
  23. Fiser, A., Sali, A. (2003) Modeller: generation and refinement of homology-based protein structure models. Methods Enzymol. 374,461-491
  24. Laskowski, R. A., Rullmann, J. A., MacArthur, M. W., Kaptein, R., Thornton, J. M. (1996) AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR 8,477-486
  25. Eisenberg, D., Luthy, R., Bowie, J. U. (1997) VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol. 277,396-404
  26. Nicholas, K. B., Nicholas, H. B., Jr., Deerfield, D. W. (1997) GeneDoc: analysis and visualization of genetic variation. EMBNEW.NEWS 4,14
  27. Sali, A., Blundell, T. L. (1990) Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. J. Mol. Biol. 212,403-428
  28. DeLano, W. L. (2002) The PyMOL Molecular Graphics System DeLano Scientific San Carlos, CA. (http://www.pymol.org)
  29. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., Higgins, D. G. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25,4876-4882
  30. Kumar, S., Tamura, K., Nei, M. (2004) MEGA3: Integrated software for molecular e