FASEB J.
FJ EXPRESS SUMMARY ARTICLE
The full-length version of this article is also available, published online April 22, 2003, as doi:10.1096/fj.02-1052fje.
(The FASEB Journal. 2003;17:1141-1143.)
© 2003 FASEB

Allergenicity prediction by protein sequence1

MICHAEL B. STADLER2 and BEDA M. STADLER

Institute of Immunology, University of Bern, Switzerland

2Correspondence: Institute of Immunology, Sahlihaus 2, Inselspital, CH-3010 Bern, Switzerland. E-mail: michael.stadler{at}insel.ch

SPECIFIC AIM

We critically reviewed the performance of the sequence-based allergenicity prediction in the current guidelines proposed by the Food and Agriculture Organization (FAO) and the World Health Organization (WHO). We propose a new strategy based on sequence motifs identified from a new allergen database.

PRINCIPAL FINDINGS

1. Construction of an allergen sequence database
As allergens do not share common structural characteristics, the use of sequence similarity in allergenicity evaluation is highly dependent on a database of allergens serving as reference. Although most sequences of allergenic proteins are known and publicly available, no single database exists that contains all of these sequences. We therefore extracted all accession numbers in published allergen lists and downloaded the corresponding sequences from the public sequence databases (Swiss-Prot, PIR, and GenBank). We wrote a script that performs this task automatically, which allows frequent database updates and avoids the error-prone and time-consuming process of downloading the sequences manually. The allergen database used in this study was generated on February 11, 2002, and contained 779 nonredundant protein sequences, including translated allergen genes and 99 generated sequence variants that were described in database annotation.
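As an illustration of the nonredundancy step only, the following Python sketch parses FASTA-formatted records and keeps one accession per distinct sequence. It is not the authors' script: the function names are hypothetical, exact sequence identity is assumed as the redundancy criterion (the summary does not state which criterion was used), and no download from Swiss-Prot, PIR, or GenBank is attempted.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence) pairs from FASTA-formatted text.

    The accession is taken to be the first whitespace-delimited token of
    each header line (an assumption made for this sketch).
    """
    accession, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if accession is not None:
                yield accession, "".join(chunks)
            accession, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if accession is not None:
        yield accession, "".join(chunks)


def nonredundant(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep the first accession seen for each distinct sequence.

    Assumption: two entries are redundant only if their sequences are
    exactly identical.
    """
    first_accession: Dict[str, str] = {}
    for accession, sequence in records:
        first_accession.setdefault(sequence, accession)
    return {acc: seq for seq, acc in first_accession.items()}
```

Given records pooled from several source databases, `nonredundant(parse_fasta(text))` would return a mapping from accession to sequence with exact duplicates collapsed.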

2. Current allergenicity prediction has a low precision
According to the FAO/WHO guidelines for allergenicity evaluation of foods derived from biotechnology, a query protein is potentially allergenic if it has either an identity of at least six contiguous amino acids (identity length n=6) or >35% sequence similarity over a window of 80 amino acids when compared with known allergens. We performed an evaluation according to these guidelines for all proteins in the Swiss-Prot, rice, human trGEN (automatic translation of the human genome), and allergen databases, rating a query as allergenic if either of the two criteria was fulfilled (Table 1). Almost all allergens in our database were predicted correctly. However, using a value of 6 for the identity length n as proposed by FAO/WHO, more than two-thirds of Swiss-Prot proteins were also rated as allergens, and this figure was hardly reduced if known allergens were removed from Swiss-Prot before analysis. Similarly high percentages of predicted allergens were found for rice and human trGEN sequences. Based on current clinical observations by allergologists, such high percentages of predicted allergens in Swiss-Prot, or even trGEN, cannot reflect the numbers of true allergens. If prediction was performed using higher values for the identity length n, the stringency of the method increased drastically, although the percentages of predicted allergens (2% for trGEN, ~7% for other databases) were still higher than the expected percentage of real allergens (~0.4% for Swiss-Prot, based on the Swiss-Prot allergen index). We therefore sought a new approach to quantify potential cross-reactivity of a query sequence with a known allergen.
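The two FAO/WHO criteria can be sketched in Python as follows. This is a simplified illustration, not the procedure used in the study: the window criterion is approximated here by ungapped identity over aligned 80-residue segments, whereas the guideline evaluation relies on local alignment tools such as FASTA or BLASTP. All function names are hypothetical.

```python
from typing import Iterable


def shares_identical_stretch(query: str, allergen: str, n: int = 6) -> bool:
    """True if the two sequences share at least n contiguous identical residues."""
    if len(query) < n or len(allergen) < n:
        return False
    allergen_words = {allergen[i:i + n] for i in range(len(allergen) - n + 1)}
    return any(query[i:i + n] in allergen_words
               for i in range(len(query) - n + 1))


def max_window_identity(query: str, allergen: str, window: int = 80) -> float:
    """Highest fraction of identical residues over any ungapped window.

    Every diagonal of the query-versus-allergen comparison is scanned with a
    sliding window. Returns 0.0 if no diagonal is at least `window` long.
    """
    best = 0
    for d in range(-(len(query) - 1), len(allergen)):
        i0, j0 = (-d, 0) if d < 0 else (0, d)
        length = min(len(query) - i0, len(allergen) - j0)
        if length < window:
            continue
        matches = [query[i0 + k] == allergen[j0 + k] for k in range(length)]
        count = sum(matches[:window])
        best = max(best, count)
        for k in range(window, length):
            # Slide the window by one position along the diagonal.
            count += matches[k] - matches[k - window]
            best = max(best, count)
    return best / window


def fao_who_flag(query: str, allergens: Iterable[str], n: int = 6,
                 window: int = 80, threshold: float = 0.35) -> bool:
    """Rate a query as potentially allergenic if either criterion is met."""
    return any(
        shares_identical_stretch(query, a, n)
        or max_window_identity(query, a, window) > threshold
        for a in allergens
    )
```

Raising `n` above 6 makes the first criterion more stringent, which corresponds to the higher identity lengths evaluated in Table 1.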


Table 1. Allergen prediction according to FAO/WHO guidelines

3. Most allergens can be matched by only 52 allergen motifs
To generate a minimal set of sequence motifs representing allergens for use in allergenicity prediction, an iterative motif discovery was performed. Starting with all 779 sequences in the allergen database, the following procedure was repeated until no more statistically relevant motifs were identified. The MEME motif discovery tool was used to identify the most relevant motif contained in the allergen sequences. The motif was converted into a generalized profile, and its scoring parameters were scaled on a randomized version of Swiss-Prot. Matching allergen sequences were removed from the allergen database, and the remaining sequences were submitted to the next iteration of motif discovery.
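The control flow of this iteration can be sketched in Python as below. The sketch shows the loop only: the `discover` and `matches` callables are hypothetical placeholders for MEME motif discovery and profile matching, and the toy `most_shared_kmer` function (the k-mer present in the most sequences) merely stands in for them so that the example runs. Conversion to a generalized profile and scaling on randomized Swiss-Prot are not modeled.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple


def iterative_motif_discovery(
    sequences: Sequence[str],
    discover: Callable[[List[str]], Optional[str]],
    matches: Callable[[str, str], bool],
    max_rounds: int = 1000,
) -> Tuple[List[str], List[str]]:
    """Repeat: find the best motif, remove matching sequences, continue.

    Stops when `discover` returns None (no relevant motif left) or when a
    motif matches nothing. Returns (motifs, unmatched sequences).
    """
    remaining = list(sequences)
    motifs: List[str] = []
    for _ in range(max_rounds):
        motif = discover(remaining)
        if motif is None:
            break
        unmatched = [s for s in remaining if not matches(motif, s)]
        if len(unmatched) == len(remaining):
            break  # guard: a motif that matches nothing would loop forever
        motifs.append(motif)
        remaining = unmatched
    return motifs, remaining


def most_shared_kmer(seqs: List[str], k: int = 4,
                     min_support: int = 2) -> Optional[str]:
    """Toy stand-in for motif discovery: the k-mer found in the most sequences.

    Returns None if no k-mer occurs in at least `min_support` sequences.
    """
    counts: Counter = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    if not counts:
        return None
    kmer, support = counts.most_common(1)[0]
    return kmer if support >= min_support else None
```

Calling `iterative_motif_discovery(seqs, most_shared_kmer, lambda m, s: m in s)` returns the discovered motifs together with the sequences left unmatched, which mirrors the split into motif-matched and unmatched allergens described in the text.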

Only 52 statistically relevant allergen motifs were identified in the allergen database. 644 of 779 allergen sequences were matched by one or several of these motifs. Of the 135 sequences that did not match an allergen motif, 78 corresponded to partial allergen sequences and therefore could not be optimally aligned to an allergen motif; the remaining 57 were assumed to represent relatively unique allergens. We decided not to generate potentially unrepresentative motifs for each of these 135 allergen sequences. Nevertheless, the 135 sequences were included in motif-based allergenicity prediction (see below).

4. Motif-based allergenicity prediction has an increased performance compared with the FAO/WHO method
In motif-based allergenicity prediction, a query protein sequence was first scanned using the 52 allergen motifs. If no matching motif was found, a second analysis step was performed: the query sequence was aligned to the allergen sequences not matching one of the allergen motifs (135 sequences). The query was rated allergenic if it either matched an allergen motif or scored better than an E value of 10⁻⁸ in the pairwise sequence alignment step. Prediction was evaluated and compared with the FAO/WHO method by performing allergenicity prediction for sequences in Swiss-Prot (Table 2) and a synthetic test database. Consistent results were obtained for predictions in both databases, namely, that both the FAO/WHO and the motif-based methods were highly sensitive. However, as expected from the high number of predicted allergens, the FAO/WHO method was found to produce many false positives. Only ~1 in 200 predicted allergens was a true allergen when using the FAO/WHO method. With ~1 true allergen in 10 predicted allergens, the precision of the motif-based method was ~20-fold higher (Table 2).
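The two-step decision rule and the precision comparison can be written out as a short Python sketch. The function names are hypothetical, the motif scan and the pairwise alignment are assumed to have been run elsewhere, and "scored better than" is read here as an E value below the cutoff.

```python
from typing import Optional

E_VALUE_CUTOFF = 1e-8  # cutoff for the pairwise alignment step, from the text


def rate_allergenic(matches_motif: bool,
                    best_e_value: Optional[float]) -> bool:
    """Two-step rule: motif match first, pairwise alignment as fallback.

    `best_e_value` is the best E value of the query against the allergens
    not covered by any motif, or None if no alignment was reported.
    """
    if matches_motif:
        return True
    return best_e_value is not None and best_e_value < E_VALUE_CUTOFF


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted allergens that are true allergens."""
    return true_positives / (true_positives + false_positives)


# Approximate figures from the text: ~1 true allergen per 200 predictions
# (FAO/WHO) versus ~1 per 10 predictions (motif-based).
fao_who_precision = precision(1, 199)
motif_precision = precision(1, 9)
fold_improvement = motif_precision / fao_who_precision  # roughly 20-fold
```

Because both methods were highly sensitive, the difference between them lies almost entirely in the false positives that enter the denominator of `precision`.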


Table 2. Comparison of FAO/WHO and motif-based allergen predictions for Swiss-Prot proteins

CONCLUSIONS AND SIGNIFICANCE

Although the scientific community agrees on including sequence similarity in evaluation of allergenicity of foods derived from biotechnology, no consensus has been reached on how to perform similarity testing. In the current study we demonstrate that the allergen prediction method proposed by FAO/WHO has a very low precision, predicting the majority of Swiss-Prot and rice proteins as allergens. With such a high level of noise, the method is unable to discriminate between nonallergens and allergens. Moreover, the method could lead to a general overestimation of allergenic potential and thus require disproportionately high efforts in clinical risk assessment. We showed that an increased value for the identity length parameter drastically reduced the number of false positives, although the number of predicted allergens was still higher than the expected number of true allergens based on clinical observation. We assume that it may not be possible to define an absolute threshold for immunological cross-reactivity in terms of percent identical residues. Ideally, identities at surface-exposed positions that can be part of IgE-binding epitopes should receive an increased weight in allergenicity prediction. However, local sequence alignment tools such as BLASTP or FASTA, as used in FAO/WHO allergenicity prediction, are not designed for this purpose. Thus, we propose a new method based on allergen motifs for allergenicity prediction, providing the flexibility to apply position-specific scoring schemes, define individual cross-reactivity threshold scores for different allergen families, and include spatial information in sequence analysis.
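To make the idea of position-specific weighting concrete, the following Python sketch scores identity between two pre-aligned segments of equal length, with higher weights at positions assumed to be surface exposed, and compares the score with a per-family threshold. Everything here is hypothetical: the weights, the family names, and the threshold values are invented for illustration and do not come from the study.

```python
from typing import Dict, Sequence


def weighted_identity(query: str, reference: str,
                      weights: Sequence[float]) -> float:
    """Weighted fraction of identical positions in two aligned segments.

    Each position contributes its weight if the residues are identical.
    The result is normalized by the sum of all weights.
    """
    if not (len(query) == len(reference) == len(weights)):
        raise ValueError("query, reference and weights must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    matched = sum(w for q, r, w in zip(query, reference, weights) if q == r)
    return matched / total


def exceeds_family_threshold(score: float, family: str,
                             thresholds: Dict[str, float]) -> bool:
    """Compare a score with the threshold defined for one allergen family."""
    return score >= thresholds[family]


# Hypothetical example: positions 3 and 4 are treated as surface exposed and
# weighted three times as heavily as the other positions.
example_weights = [1.0, 1.0, 1.0, 3.0, 3.0, 1.0]
example_thresholds = {"family_A": 0.5, "family_B": 0.3}  # invented values
```

With these example weights, mismatches at the two exposed positions lower the score far more than mismatches elsewhere, which is the behavior a percent-identity threshold cannot express.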

The allergen motifs we used were identified and scaled according to an automated protocol. We have shown that allergenicity prediction based on these motifs is possible with high sensitivity and greatly improved precision compared with the current method. Because of the improved signal-to-noise ratio of motif-based allergenicity prediction, searching sequence databases for new potential allergens becomes feasible. Nevertheless, manual inspection, such as realigning motif-containing allergens and constructing optimized motifs, eventually focusing on surface-accessible residues, has the potential to further increase prediction performance.

It cannot be ruled out that a protein without sequence similarity to known allergens might nevertheless cause an allergic reaction. Further investigation of sequence–structure relationship is needed to improve our understanding of sequence similarity leading to immunological cross-reactivity. Yet allergenicity prediction based on protein sequence provides an important tool to identify potential cross-reactivity with known allergens. In the future, increasing numbers of allergenic proteins will be identified, resulting in a more comprehensive set of allergen motifs and probably eliminating the need to perform pairwise alignments in motif-based allergenicity prediction. Our motif-based approach permits more flexible allergenicity prediction and provides a reasonable tool in risk assessment to identify transgenes that require further investigation by other techniques.



Figure 1. Schematic diagram.

FOOTNOTES

1 To read the full text of this article, go to http://www.fasebj.org/cgi/doi/10.1096/fj.02-1052fje; to cite this article, use FASEB J. (April 22, 2003) 10.1096/fj.02-1052fje





