FJ
EXPRESS SUMMARY ARTICLE The full-length version of this article is also available, published online July 1, 2004 as doi:10.1096/fj.04-1797fje.



* Center for Bioinformatics, Saarland University, Saarbrücken, Germany; and
Institute of Human Genetics, Saarland University, Homburg/Saar, Germany
3 Correspondence: Department for Simulation of Biological Systems, Eberhard Karls University, Sand 14, D-72076 Tübingen, Germany. E-mail: oliver.kohlbacher{at}uni-tuebingen.de
SPECIFIC AIMS
The primary aim of this study was to create a flexible and extensible system for the integrative analysis of heterogeneous cancer-related data. This type of analysis can correlate data sets from different sources and of different types and is thus ideally suited to answer questions about multicausal diseases such as cancer. Using this approach we addressed some fundamental questions of tumor immunology.
PRINCIPAL FINDINGS
1. Integration of cancer-related data from different fields
Cancer-related databases usually focus on specific fields of research (e.g., cancer genetics or cancer immunology), whereas the complexity of cancer genesis requires an integrated analysis of heterogeneous data from several sources. Here we present Cancer Associated Proteins (CAP), a novel analysis system for cancer-related data. CAP integrates data from our own experiments and multiple external databases, including the SEREX database, RefSeq, LocusLink, SWISS-PROT, Cancer GeneticsWeb, and NCI60. The data are augmented with functional annotations based on predictions. CAP also offers tools for statistical analysis across data sets. We demonstrate how differing types of data can be integrated and analyzed successfully. There is no single straightforward approach to modeling biological data because such data are heterogeneous by nature. The CAP data model is designed to accommodate and integrate diverse types of data. It was modeled in the Unified Modeling Language (UML), which yields a well-defined data model that can easily be extended to include new data types. CAP is freely accessible on our website at http://www.bioinf.uni-sb.de/CAP/.
2. Correlation of autoimmune responses and genetic alterations
We analyzed a set of genes identified in SEREX experiments for genetic alterations. Cancer GeneticsWeb (CGW) contains information about genes that are altered in various cancer types. Out of 723 genes identified in SEREX experiments, we found 17 genes and two splice variants in CGW. This first step was done without regard to the cancer types in which the genes were identified. Our next step was to look for genes that are found in the same cancer type in both SEREX experiments and CGW. A total of seven genes were identified, including two genes carrying specific mutations or polymorphisms, TP53 and GSTT1 (Glutathione S-transferase Theta 1). TP53 has previously been found to cause immune responses in primary colon carcinoma and in breast carcinoma, both known to carry TP53 mutations. Mutations in TP53 have also been found in a large number of other tumor types in patients who do not have antibodies against TP53. GSTT1 antibody responses occur in patients with breast cancer. This tumor is associated with specific GSTT1 polymorphisms. However, these types of polymorphisms also occur in other tumors, including head and neck cancer, without an antibody response. The other genes are NME2/NME1 (protein NM23B/A expressed in nonmetastatic cells 2/1), HSPCA (heat shock 90 kDa protein 1), Ki-67 (MKI67), and MIF (macrophage migration inhibitory factor). NME1 and NME2 have been reported as immunogenic and overexpressed in malignant colon carcinoma, HSPCA in renal cell carcinoma, and Ki-67 and MIF in melanoma. From the available data, we see no evidence that genetic alterations, such as mutations or polymorphisms, cause immune responses in cancer.
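The two-step comparison described above (first by gene alone, then by gene and cancer type together) can be sketched in a few lines of Python. This is only an illustrative sketch, not the CAP implementation: the function names and the record layout are assumptions, and the records are toy data (with placeholder genes `GENE_A`, `GENE_B`, `GENE_C`), not the actual SEREX or CGW contents.

```python
from collections import defaultdict

def index_by_gene(records):
    """Map gene symbol -> set of cancer types, from (gene, cancer_type) pairs."""
    index = defaultdict(set)
    for gene, cancer_type in records:
        index[gene].add(cancer_type)
    return dict(index)

def shared_genes(serex, cgw):
    """Step 1: genes present in both sources, regardless of cancer type."""
    return set(serex) & set(cgw)

def shared_in_same_cancer(serex, cgw):
    """Step 2: genes reported for at least one common cancer type."""
    result = {}
    for gene in shared_genes(serex, cgw):
        common_types = serex[gene] & cgw[gene]
        if common_types:
            result[gene] = common_types
    return result

# Toy records (hypothetical, for illustration only).
serex_records = [
    ("TP53", "colon carcinoma"),
    ("TP53", "breast carcinoma"),
    ("GSTT1", "breast carcinoma"),
    ("GENE_A", "melanoma"),
    ("GENE_B", "glioma"),
]
cgw_records = [
    ("TP53", "colon carcinoma"),
    ("TP53", "lung carcinoma"),
    ("GSTT1", "breast carcinoma"),
    ("GENE_A", "renal cell carcinoma"),
    ("GENE_C", "melanoma"),
]

serex = index_by_gene(serex_records)
cgw = index_by_gene(cgw_records)

step1 = shared_genes(serex, cgw)           # gene-level overlap
step2 = shared_in_same_cancer(serex, cgw)  # overlap within the same cancer type
print(sorted(step1))
print({gene: sorted(types) for gene, types in sorted(step2.items())})
```

In this toy example `GENE_A` survives the first step but drops out in the second, because the two sources report it for different cancer types. The same narrowing takes the real data from 17 genes (plus two splice variants) to seven.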
3. Correlation of autoimmune responses and gene expression
To analyze expression levels, we used the gene expression profile data of the NCI60 microarray project (http://genome-www.stanford.edu/nci60/). In this project, cDNA microarrays are used to explore the variation in expression of 8000 genes across 60 cancer cell lines. Genes that show at least a 2-fold increase in expression levels are considered to be overexpressed. We also require all genes to have measured expression levels in at least 4 of the 60 cell lines. This results in a set of 319 genes. Independent of cancer type, we found 277 (87%) of the genes to be overexpressed in at least one cell line. Of these 277 genes, 69 were found to be overexpressed in at least 10% of all evaluated cell lines. The 60 cell lines in the NCI60 data can be grouped into a number of cancer types (e.g., melanoma, breast cancer, or colon cancer). Expression levels for genes found in the same cancer type in both SEREX experiments and the NCI60 data were extracted. The criterion for selection was that at least three tumor-specific cell lines show overexpression, giving a total of 13 genes. The genes and SEREX-related information are presented in Table 1. In terms of variations in expression, we see indications that overexpression contributes to the antibody responses against tumor antigens. The majority of the 319 genes are actually found to be overexpressed in the NCI60 data set.
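The selection criteria in this section can be written down as a small set of filters. The Python sketch below is illustrative only and is not taken from CAP. It assumes that expression values are fold-change ratios with `None` marking a missing measurement, and that "evaluated cell lines" means the lines in which a given gene was actually measured. All names and the toy profiles (genes `G1` to `G4`, twelve made-up cell lines) are hypothetical.

```python
FOLD_THRESHOLD = 2.0  # "at least a 2-fold increase" counts as overexpressed
MIN_MEASURED = 4      # gene must be measured in at least 4 cell lines
MIN_FRACTION = 0.10   # overexpressed in at least 10% of evaluated lines
MIN_TYPE_LINES = 3    # at least three cell lines of the matching cancer type

def measured_lines(profile):
    """Cell lines with an actual measurement (None marks a missing value)."""
    return {line for line, ratio in profile.items() if ratio is not None}

def overexpressed_lines(profile):
    """Cell lines whose expression ratio reaches the fold threshold."""
    return {line for line in measured_lines(profile)
            if profile[line] >= FOLD_THRESHOLD}

def is_eligible(profile):
    """True if the gene was measured in enough cell lines to be analyzed."""
    return len(measured_lines(profile)) >= MIN_MEASURED

def is_frequently_overexpressed(profile):
    """True if the overexpressed share of measured lines reaches MIN_FRACTION."""
    n_measured = len(measured_lines(profile))
    if n_measured == 0:
        return False
    return len(overexpressed_lines(profile)) / n_measured >= MIN_FRACTION

def is_overexpressed_in_type(profile, line_types, cancer_type):
    """True if enough cell lines of the given cancer type show overexpression."""
    hits = [line for line in overexpressed_lines(profile)
            if line_types.get(line) == cancer_type]
    return len(hits) >= MIN_TYPE_LINES

# Toy data: 12 hypothetical cell lines, four per cancer type.
lines = [f"line{i:02d}" for i in range(12)]
line_types = {line: ("melanoma", "breast", "colon")[i // 4]
              for i, line in enumerate(lines)}

def make_profile(default, overrides):
    """Build a gene profile with one default value and a few exceptions."""
    profile = {line: default for line in lines}
    profile.update(overrides)
    return profile

expression = {
    "G1": make_profile(1.0, {"line00": 3.0, "line01": 2.4, "line02": 2.0}),
    "G2": make_profile(1.0, {"line05": 2.5}),
    "G3": make_profile(None, {"line00": 4.0, "line01": 4.0, "line02": 4.0}),
    "G4": make_profile(1.0, {}),
}

eligible = {g: p for g, p in expression.items() if is_eligible(p)}
anywhere = sorted(g for g, p in eligible.items() if overexpressed_lines(p))
frequent = sorted(g for g, p in eligible.items()
                  if is_frequently_overexpressed(p))
in_melanoma = sorted(g for g, p in eligible.items()
                     if is_overexpressed_in_type(p, line_types, "melanoma"))
print(sorted(eligible), anywhere, frequent, in_melanoma)
```

Here `G3` is excluded because it has only three measurements, `G2` is overexpressed in a single line (1 of 12, below 10%), and only `G1` meets the three-lines-per-cancer-type criterion. As a check on the reported figures, 277 of 319 genes is 277/319 ≈ 0.868, which rounds to the 87% given above.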
CONCLUSIONS AND SIGNIFICANCE
It has been hypothesized that immunogenic antigens might stem from genes that are altered by tumor-specific mutations or that have a changed expression profile in a certain tumor. We have used CAP to merge information from the fields of genetics and immunology. Our preliminary results suggest that mutations are not a significant contributor to raising an antibody response against tumor antigens, whereas overexpression seems to play a more important role. For a schematic illustration of the data sources used in the analysis, see Fig. 1.
It may be misleading to turn these findings into general rules concerning immune responses in cancer. Rather, we show that CAP makes this kind of analysis possible by integrating different sources of data. Statistics on specific data sets might help to understand the mechanisms behind certain cancer types. There are many reports on the correlations between cancers and chromosomal aberrations. One example is the changed expression patterns of genes in ovarian carcinomas, where genes in the 3p25.5-3p21.31 region show reduced expression and genes from 3q13.33-3q28 show increased expression. CAP can be a useful tool in the identification of such chromosomal regions. Many cancer types have disrupted protein and signaling pathways. An example is the retinoblastoma protein pathway. Analysis of protein function and subcellular location are important steps in the identification of such pathways. CAP provides tools both for finding protein functional families associated with certain cancers and for analyzing protein subcellular location for sets of sequences.
The CAP data model and analysis system bridges fields of cancer research that are largely separated (e.g., cancer genetics and cancer immunology). We present a system to combine heterogeneous data from diverse focus areas of cancer research. The data model in CAP can easily be extended to incorporate new data types. We also believe that the need for analysis systems like CAP will continue to grow as new data are accumulated.
FOOTNOTES
To read the full text of this article, go to http://www.fasebj.org/cgi/doi/10.1096/fj.04-1797fje;
1 These authors contributed equally to this work.
2 Present address: Department for Simulation of Biological Systems, Eberhard Karls University, Sand 14, D-72076 Tübingen, Germany.