FJ EXPRESS SUMMARY ARTICLE
The full-length version of this article is also available, published online December 3, 2002 as doi:10.1096/fj.02-0351fje.



Division of Pulmonary and Critical Care, Department of Medicine, Brigham and Women's Hospital at Harvard Medical School, Boston, Massachusetts, USA; Department of Obstetrics and Gynecology, Division of Biostatistics, and Departments of Pathology and Immunology, and Cell Biology and Physiology, Washington University School of Medicine, St. Louis, Missouri, USA
2 Correspondence: Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, 75 Francis St., Boston, MA 02115, USA. E-mail: tmariani{at}rics.bwh.harvard.edu
SPECIFIC AIM
The use of expression microarrays to determine bona fide changes in gene expression between experimental paradigms is confounded by noise due to variability in measurement. We have attempted to assess the inherent technical variability of commercial high-density oligonucleotide (Affymetrix) microarrays, focusing on the variability related to the common processes post-RNA sample isolation.
PRINCIPAL FINDINGS
1. Technical variability has a significant intensity-specific bias
We have focused on technical variability and surmised that precision, reflecting the common processes of target hybridization, scanning, and analysis, depends on signal intensity. To examine the influence of expression microarray signal intensity on signal precision, we performed a series of replicate experiments with the Affymetrix U95 chip set analyzed with MAS 4.0. Using five replicates for each of the three experimental conditions, we defined an "outlier" as a gene (probe set) with poor replicate reproducibility (> 2-fold change in expression in at least 7 of the 10 possible pairwise comparisons) within a condition. All comparisons were made using replicate measures from parallel hybridization of an identical labeled target. The percentage of outliers was 7.9–9.7% of all probe sets (depending on the treatment condition) and 6.9–10.8% for all conditions, depending on the individual chip in the set analyzed. We then examined how outlier occurrence was distributed across defined intensity ranges. The outlier rate (number of outliers relative to the total number of genes "expressed" in a given intensity range) was significantly biased toward low signal intensity values. Similar results were obtained when the data were reanalyzed with MAS 5.0.
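The outlier definition above can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, and it assumes each replicate intensity is strictly positive (MAS 4.0 average difference values can be zero or negative and would need to be floored or filtered first).

```python
from itertools import combinations


def count_discordant_pairs(replicates, fold=2.0):
    """Count replicate pairs whose intensities differ by more than `fold`.

    `replicates` holds the signal intensities of one probe set across the
    replicate hybridizations of a single condition. Values are assumed to be
    strictly positive (an assumption of this sketch, not of the article).
    """
    if any(value <= 0 for value in replicates):
        raise ValueError("intensities must be strictly positive")
    discordant = 0
    for a, b in combinations(replicates, 2):
        if max(a, b) / min(a, b) > fold:
            discordant += 1
    return discordant


def is_outlier(replicates, fold=2.0, min_pairs=7):
    """Apply the '> 2-fold in at least 7 of 10 pairwise comparisons' rule."""
    return count_discordant_pairs(replicates, fold) >= min_pairs


if __name__ == "__main__":
    # Five made-up replicate intensities for two hypothetical probe sets.
    print(is_outlier([100, 110, 95, 105, 102]))
    print(is_outlier([50, 400, 60, 900, 2000]))
```

With five replicates there are exactly 10 pairwise comparisons, so the default `min_pairs=7` matches the rule stated in the text; a different replicate count would need a different cutoff.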
2. Simple, static fold change thresholds are too stringent at high intensities and not stringent enough at low intensities
To further explore the relationship between variability and intensity, we plotted the variability in fold change within replicates relative to intensity (Fig. 1). These data demonstrate that 1) at low intensity, replicate variability is much higher than commonly used thresholds (2- to 3-fold): for chip A, 12% of genes had a mean fold change > 2.0 at a SADV of < 1000; 2) at high intensity, replicate variability is lower than commonly used thresholds: for chip A, 100% of genes had a mean fold change of < 2.0 at a SADV of > 30,000; and 3) variability as measured by replicate fold change is somewhat "chip" specific.
3. Technical variability can be described and accommodated by a "variable fold change" threshold
We devised a method to use intensity-specific variability in order to determine significant changes in gene expression. The standard deviation of replicate intensities was calculated and plotted relative to mean intensity, giving a measure of intensity-specific variance (Fig. 2A). A LOESS curve was fit to these data and used to estimate the variance at any given intensity. This strategy was initially applied to each chip (U95A-E) independently, as the variability for each chip was distinct (Fig. 1), then to the entire data set, generating a chip-independent estimate. The LOESS curve enabled us to assign confidence (P value) to an intensity-specific fold change, allowing calculation of a variable fold change threshold for any baseline intensity (SADV) at any given P value. We defined a P value at an arbitrary confidence level and constructed a function that demonstrates the dependence of significant fold changes on absolute baseline intensity. The data in Fig. 2B depict fold change thresholds as a function of absolute intensity with a given P value of 0.05 and 0.01. Additional analyses of the variability and a "differential expression" calculator based on our variable fold change threshold are available as supplementary material (http://lungtranscriptome.bwh.harvard.edu).
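The general idea of an intensity-dependent threshold can be sketched as follows. This is a simplified illustration under stated assumptions, not the published procedure: the smoother is a bare local linear fit with tricube weights rather than a full LOESS implementation, the threshold formula assumes normally distributed noise and that both measurements share the standard deviation predicted at the baseline intensity, and all function names and the synthetic data are hypothetical.

```python
import math
from statistics import NormalDist


def tricube(u):
    """Tricube kernel weight for a scaled distance u (zero at and beyond 1)."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_predict(x0, xs, ys, frac=0.3):
    """Local linear estimate of y at x0 from the nearest `frac` of the points."""
    k = min(len(xs), max(3, math.ceil(frac * len(xs))))
    bandwidth = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        w = tricube((x - x0) / bandwidth)
        dx = x - x0  # centering on x0 makes the fitted intercept the prediction
        sw += w
        swx += w * dx
        swy += w * y
        swxx += w * dx * dx
        swxy += w * dx * y
    if sw == 0.0:
        raise ValueError("no points received weight; check for duplicated x values")
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        return swy / sw  # degenerate neighborhood: fall back to a weighted mean
    slope = (sw * swxy - swx * swy) / denom
    return (swy - slope * swx) / sw


def fold_change_threshold(baseline, mean_intensity, replicate_sd, p=0.05, frac=0.3):
    """Smallest fold change over `baseline` treated as significant at level p.

    Assumption of this sketch: the difference of two measurements has standard
    deviation sqrt(2) * sd(baseline), and a two-sided normal quantile is used.
    """
    if baseline <= 0:
        raise ValueError("baseline intensity must be positive")
    sd = max(loess_predict(baseline, mean_intensity, replicate_sd, frac), 0.0)
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return 1.0 + z * math.sqrt(2.0) * sd / baseline


if __name__ == "__main__":
    # Synthetic, made-up variability profile: SD grows more slowly than the mean.
    means = [100.0 * i for i in range(1, 301)]
    sds = [50.0 + 0.05 * m for m in means]
    for baseline in (200.0, 1000.0, 30000.0):
        print(baseline, round(fold_change_threshold(baseline, means, sds), 2))
```

Because the standard deviation in the synthetic profile grows more slowly than the mean, the resulting threshold is higher at low baseline intensity and lower at high intensity, which is the qualitative behavior the text describes.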
CONCLUSIONS AND SIGNIFICANCE
The use of expression microarrays is widespread, but determining the meaningful portion of the abundant data generated from this technology when applied to a discriminate task (e.g., determining differences in two samples) is difficult. Diverse sources of variability affect the reliability of expression microarray data and limit the validity of conclusions drawn from these experiments. In this study, we focused on the technical components downstream from sample isolation and labeling, which are inherent in the protocol and technology and should be universally consistent. Variability derived from other sources (Fig. 3) is not defined by this study. We have quantified the technical variability of oligonucleotide-based microarrays and applied these data to increase confidence in determining bona fide changes in gene expression. We found that the variability was intensity specific. We have developed a "variable fold change" method that accommodates non-uniform variability at different intensities to improve prediction of changes in gene expression. Testing this method indicates that it removes intensity-specific bias and results in a 5- to 10-fold reduction in the number of false-positive changes when compared with a static fold change threshold (2.0-fold). An interesting finding of this study is the distinct distribution of variability between individual chips in the U95 set, as shown in Fig. 1. These distinctions are evident in Fig. 2A as well, but do not result in notable differences in the variable fold change thresholds in Fig. 2B.
We have applied LOESS fitting to derive a realistic prediction of the variability of a particular observed intensity level. Predicted values of standard deviation are used to construct a score statistic of relative difference in gene expression. The assumption of normality of data is implicit and essential only in calculating P values from the defined t statistic. When normality is violated, P values lose their classical definition of probability, but the calculated values of the statistic can be used to rank order genes that are most likely to be differentially expressed. Although probe-specific contributions to variability (or probe-dependent variability) likely exist, they should not negate our intensity-specific approach.
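The ranking use of such a statistic can be sketched as below. The exact form of the statistic is not given in this summary, so the formula here is an assumption: the difference between two intensities divided by the combined predicted standard deviation at each intensity, with a normal approximation for the P value. The function names, the `sd_at` callable, and the example numbers are all hypothetical.

```python
import math
from statistics import NormalDist


def score(baseline, experimental, sd_at):
    """Difference in intensity scaled by the predicted variability.

    `sd_at` is any callable returning the predicted standard deviation at a
    given intensity (for example, a curve fitted to replicate data).
    """
    pooled = math.sqrt(sd_at(baseline) ** 2 + sd_at(experimental) ** 2)
    return (experimental - baseline) / pooled


def two_sided_p(stat):
    """Two-sided P value under a normal approximation (an assumption here)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(stat)))


def rank_genes(measurements, sd_at):
    """Order genes by decreasing absolute score.

    `measurements` is an iterable of (name, baseline, experimental) tuples.
    The ordering remains usable even when normality does not hold.
    """
    scored = [(name, score(b, e, sd_at)) for name, b, e in measurements]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    def predicted_sd(intensity):
        # Made-up variability model used only for this demonstration.
        return 50.0 + 0.05 * intensity

    genes = [
        ("gene_low", 200.0, 500.0),         # 2.5-fold change at low intensity
        ("gene_high", 20000.0, 30000.0),    # 1.5-fold change at high intensity
    ]
    for name, stat in rank_genes(genes, predicted_sd):
        print(name, round(stat, 2), two_sided_p(stat))
```

In this made-up example the 1.5-fold change at high intensity outranks the 2.5-fold change at low intensity, which shows how an intensity-aware score can reorder genes relative to raw fold change.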
The relative similarity in the variable fold change thresholds between chips and conditions suggests that these thresholds may be widely applicable to expression data generated from other Affymetrix or similar high-density oligonucleotide-based microarrays. We suggest that this approach can be widely incorporated into microarray data analysis methods to improve prediction of significant changes in gene expression in oligonucleotide microarray experiments and to reduce false leads, even in the absence of replicates. Although we believe this represents a valuable incorporation and application of technical variability in determining differential expression, additional improved noise models are clearly warranted. Proof that this method is biologically relevant, as evidenced by consistency with results from classical measures of gene expression, is the subject of current investigation.
FOOTNOTES
1 To read the full text of this article, go to http://www.fasebj.org/cgi/doi/10.1096/fj.02-0351fje; to cite this article, use FASEB J. (December 3, 2002) 10.1096/fj.02-0351fje