FASEB J.
FJ EXPRESS SUMMARY ARTICLE
The full-length version of this article is also available, published online December 3, 2002 as doi:10.1096/fj.02-0351fje.
(The FASEB Journal. 2003;17:321-323.)
© 2003 FASEB

A variable fold change threshold determines significance for expression microarrays1

THOMAS J. MARIANI2, VIKRAM BUDHRAJA*, BRIGHAM H. MECHAM, C. CHARLES GU†, MARK A. WATSON‡ and YOEL SADOVSKY*,§

Division of Pulmonary and Critical Care, Department of Medicine, Brigham and Women’s Hospital at Harvard Medical School, Boston, Massachusetts, USA; Department of *Obstetrics and Gynecology, †Division of Biostatistics, and Departments of ‡Pathology and Immunology, and §Cell Biology and Physiology, Washington University School of Medicine, St. Louis, Missouri, USA

2Correspondence: Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis St., Boston, MA 02115, USA. E-mail: tmariani{at}rics.bwh.harvard.edu

SPECIFIC AIM

The use of expression microarrays to determine bona fide changes in gene expression between experimental paradigms is confounded by noise due to variability in measurement. We have attempted to assess the inherent technical variability of commercial high-density oligonucleotide (Affymetrix) microarrays, focusing on the variability related to the common processes post-RNA sample isolation.

PRINCIPAL FINDINGS

1. Technical variability has a significant intensity-specific bias
We focused on technical variability and surmised that precision, reflecting the common processes of target hybridization, scanning, and analysis, depends on signal intensity. To examine the influence of expression microarray signal intensity on signal precision, we performed a series of replicate experiments with the Affymetrix U95 chip set analyzed with MAS 4.0. Using five replicates for each of the three experimental conditions, we defined an "outlier" as a gene (probe set) with poor replicate reproducibility (>2-fold change in expression in at least 7 of the 10 possible pairwise comparisons) within a condition. All comparisons were made using replicate measures from parallel hybridization of an identical labeled target. Outliers made up 7.9–9.7% of all probe sets, depending on the treatment condition, and 6.9–10.8% across all conditions, depending on the individual chip in the set analyzed. We then investigated how outlier occurrence was distributed across defined intensity ranges. The outlier rate (number of outliers relative to the total number of genes "expressed" in a given intensity range) was significantly biased toward low signal intensity values. Similar results were obtained when the data were reanalyzed with MAS 5.0.
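The outlier rule described above can be illustrated with a minimal Python sketch. It assumes strictly positive intensities and uses a plain ratio of the larger to the smaller value as the fold change, which is a simplification and not the Affymetrix fold change algorithm; the function names and the toy intensities are hypothetical.

```python
from itertools import combinations


def fold_change(a, b):
    """Symmetric fold change (>= 1) between two positive intensities.

    Assumption: a simple ratio stands in for the Affymetrix calculation.
    """
    return max(a, b) / min(a, b)


def is_outlier(replicates, fc_cutoff=2.0, min_pairs=7):
    """Flag a probe set whose replicates disagree too often.

    With five replicates there are 10 pairwise comparisons; the probe set
    is an outlier when at least `min_pairs` of them exceed `fc_cutoff`.
    """
    exceeding = sum(
        1 for a, b in combinations(replicates, 2) if fold_change(a, b) > fc_cutoff
    )
    return exceeding >= min_pairs


# Toy replicate intensities (made up for illustration only).
noisy_probe_set = [100.0, 100.0, 300.0, 300.0, 900.0]
stable_probe_set = [100.0, 110.0, 105.0, 98.0, 102.0]
print(is_outlier(noisy_probe_set), is_outlier(stable_probe_set))
```

In the first toy probe set, 8 of the 10 pairs differ by more than 2-fold, so it is flagged; in the second, none do.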

2. Simple, static fold change thresholds are too stringent at high intensities and not stringent enough at low intensities
To further explore the relationship between variability and intensity, we plotted the variability in fold change within replicates relative to intensity (Fig. 1). These data demonstrate that 1) at low intensity, replicate variability is much higher than commonly used thresholds (2- to 3-fold): for chip A, 12% of genes had a mean fold change >2.0 at a SADV of <1000; 2) at high intensity, replicate variability is lower than commonly used thresholds: for chip A, 100% of genes had a mean fold change of <2.0 at a SADV of >30,000; and 3) variability as measured by replicate fold change is somewhat "chip" specific.



Figure 1. Intensity-specific variability of replicate fold change. The absolute value of the mean fold change for each probe set within replicates was calculated using all 10 possible pairwise comparisons for each of the three conditions, then graphed for each gene as a function of gene-specific mean intensity. We calculated a true fold change by applying Affymetrix’ fold change algorithm without adding or subtracting 1 from the result. Three data points are plotted for each probe set, representing the mean fold change within each of the three experimental conditions: control, troglitazone-treated, and GW7845-treated. Data for each of the 5 chips in the U95 chip set are presented individually, revealing chip-specific variability in replicate fold change.
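As a rough illustration of the quantity plotted in Fig. 1, the Python sketch below computes the mean replicate fold change over all pairwise comparisons and the fraction of probe sets in an intensity range that exceed a static threshold. It assumes positive intensities and a simple ratio for fold change (not the Affymetrix algorithm mentioned in the caption); all names and numbers are hypothetical.

```python
from itertools import combinations
from statistics import mean


def mean_replicate_fold_change(replicates):
    """Mean of the pairwise fold changes (each >= 1) among replicates."""
    return mean(max(a, b) / min(a, b) for a, b in combinations(replicates, 2))


def fraction_exceeding(probe_sets, fc_threshold, lo, hi):
    """Fraction of probe sets with mean intensity in [lo, hi) whose mean
    replicate fold change is above `fc_threshold`; None if the bin is empty."""
    in_bin = [r for r in probe_sets if lo <= mean(r) < hi]
    if not in_bin:
        return None
    over = sum(1 for r in in_bin if mean_replicate_fold_change(r) > fc_threshold)
    return over / len(in_bin)


# Toy data (made up): two low-intensity probe sets, one high-intensity one.
toy_probe_sets = [
    [100.0, 300.0, 100.0, 300.0, 100.0],
    [500.0, 510.0, 490.0, 505.0, 495.0],
    [40000.0, 41000.0, 39000.0, 40500.0, 39500.0],
]
print(fraction_exceeding(toy_probe_sets, 2.0, 0.0, 1000.0))
print(fraction_exceeding(toy_probe_sets, 2.0, 30000.0, float("inf")))
```

Applied to real replicate data, the same two calls would give the kind of bin-wise percentages quoted in the text for chip A.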

3. Technical variability can be described and accommodated by a "variable fold change" threshold
We devised a method that uses intensity-specific variability to determine significant changes in gene expression. The standard deviation of replicate intensities was calculated and plotted relative to mean intensity, giving a measure of intensity-specific variance (Fig. 2A). A LOESS curve was fit to these data and used to estimate the variance at any given intensity. This strategy was initially applied to each chip (U95A-E) independently, as the variability for each chip was distinct (Fig. 1), then to the entire data set, generating a chip-independent estimate. The LOESS curve enabled us to assign confidence (P value) to an intensity-specific fold change, allowing calculation of a variable fold change threshold for any baseline intensity (SADV) at any given P value. We fixed the P value at an arbitrary confidence level and constructed a function that shows how the fold change required for significance depends on absolute baseline intensity. Fig. 2B depicts fold change thresholds as a function of absolute intensity at P values of 0.05 and 0.01. Additional analyses of the variability and a "differential expression" calculator based on our variable fold change threshold are available as supplementary material (http://lungtranscriptome.bwh.harvard.edu).



Figure 2. Intensity-specific signal variance (A) and estimated variable fold change thresholds (B). The variance in intensity of replicates, quantified using standard deviation, was plotted relative to the mean intensity of each probe set for each condition. A LOESS curve was fit to the data and used to calculate the intensity-specific variance (A). LOESS curves were generated for each individual chip in the set to accommodate potential inter-chip variability. Using the LOESS curves, we calculated the change in intensity, then fold change necessary to provide 95% and 99% confidence that intensity values are significantly different (B). Only values for positive SADVs are presented.
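The general idea of a variable fold change threshold can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the smoother is a bare-bones local linear fit with tricube weights standing in for LOESS, the required intensity change is assumed to be a two-sided normal quantile times the predicted standard deviation, and the function names and toy data are hypothetical.

```python
import math
from statistics import NormalDist


def loess_predict(xs, ys, x0, span=0.5):
    """Local linear estimate of y at x0 using tricube weights over the
    nearest `span` fraction of points (a simplified LOESS stand-in)."""
    n = len(xs)
    k = min(n, max(2, math.ceil(span * n)))
    h = sorted(abs(x - x0) for x in xs)[k - 1]  # bandwidth: k-th nearest distance

    def tricube(d):
        if h == 0:
            return 1.0 if d == 0 else 0.0
        u = d / h
        return (1 - u ** 3) ** 3 if u < 1 else 0.0

    w = [tricube(abs(x - x0)) for x in xs]
    sw = sum(w)
    if sw == 0:  # degenerate neighbourhood: fall back to the global mean
        return sum(ys) / n
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    if sxx == 0:
        return my
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def variable_fold_change_threshold(baseline, sd_at_baseline, p_value=0.05):
    """Fold change needed at `baseline` for two-sided significance.

    Assumption: required change = z(1 - p/2) * predicted SD.
    """
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return (baseline + z * sd_at_baseline) / baseline


# Toy curve (made up): SD grows more slowly than intensity.
mean_intensity = [100.0, 200.0, 500.0, 1000.0, 2000.0, 5000.0,
                  10000.0, 20000.0, 30000.0, 50000.0]
replicate_sd = [0.1 * x + 50.0 for x in mean_intensity]

for baseline in (200.0, 1000.0, 30000.0):
    sd = loess_predict(mean_intensity, replicate_sd, baseline)
    print(baseline,
          round(variable_fold_change_threshold(baseline, sd, 0.05), 3),
          round(variable_fold_change_threshold(baseline, sd, 0.01), 3))
```

With this toy curve the required fold change falls as baseline intensity rises and is larger at P = 0.01 than at P = 0.05, which is the qualitative behaviour Fig. 2B describes. For real data, an established LOESS implementation would be preferable to this hand-rolled smoother.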

CONCLUSIONS AND SIGNIFICANCE

The use of expression microarrays is widespread, but identifying the meaningful portion of the abundant data this technology generates when it is applied to a discrimination task (e.g., determining differences between two samples) is difficult. Diverse sources of variability affect the reliability of expression microarray data and limit the validity of conclusions drawn from these experiments. In this study, we focused on the technical components downstream from sample isolation and labeling, which are inherent in the protocol and technology and should be universally consistent. Variability derived from other sources (Fig. 3) is not defined by this study. We have quantified the technical variability of oligonucleotide-based microarrays and applied these data to increase confidence in determining bona fide changes in gene expression. We found that the variability was intensity specific. We have developed a "variable fold change" method that accommodates non-uniform variability at different intensities to improve prediction of changes in gene expression. Testing this method indicates that it removes intensity-specific bias and results in a 5- to 10-fold reduction in the number of false-positive changes when compared with a static fold change threshold (2.0-fold). An interesting finding of this study is the distinct distribution of variability between individual chips in the U95 set, as shown in Fig. 1. These distinctions are evident in Fig. 2A as well, but do not result in notable differences in the variable fold change thresholds in Fig. 2B.



Figure 3. Schematic diagram. Steps in the generation of oligonucleotide-based expression microarray data capable of introducing experimental noise are depicted. Standard analysis techniques lead to a defined misclassification rate for differential expression (false-positive rate). The steps controlled by our replicate data set are shaded. We determined the variability from 5 replicates each for 3 conditions querying 60,000 genes (180,000 measures × 5 replicates = 900,000 data points). Assessment of our replicate data set reveals an intensity-specific bias that can be accommodated by a "variable fold change" threshold for any specified level of confidence (P value). Use of our method reduces the false-positive rate and removes intensity-specific bias.

We have applied LOESS fitting to derive a realistic prediction of the variability of a particular observed intensity level. Predicted values of standard deviation are used to construct a score statistic of relative difference in gene expression. The assumption of normality of data is implicit and essential only in calculating P values from the defined t statistic. When normality is violated, P values lose their classical definition of probability, but the calculated values of the statistic can be used to rank order genes that are most likely to be differentially expressed. Although probe-specific contributions to variability (or probe-dependent variability) likely exist, they should not negate our intensity-specific approach.
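To make the role of the score statistic concrete, here is a minimal Python sketch. The exact form of the statistic is not given in this summary, so the sketch assumes a score equal to the intensity difference divided by the predicted standard deviation at the baseline, with a two-sided P value taken from the standard normal distribution; gene names, intensities, and standard deviations are made up.

```python
from statistics import NormalDist


def score_statistic(baseline, experiment, predicted_sd):
    """Assumed score: intensity difference in units of predicted SD."""
    return (experiment - baseline) / predicted_sd


def two_sided_p(score):
    """Two-sided P value; meaningful only if normality holds."""
    return 2 * (1 - NormalDist().cdf(abs(score)))


def rank_genes(measurements):
    """Order gene names by decreasing |score|.

    `measurements` maps name -> (baseline, experiment, predicted_sd).
    The ranking stays usable even when P values cannot be trusted.
    """
    return sorted(
        measurements,
        key=lambda g: abs(score_statistic(*measurements[g])),
        reverse=True,
    )


# Hypothetical genes: a 2-fold change at low intensity, a 1.5-fold change
# at high intensity, and a 1.1-fold change at moderate intensity.
toy_measurements = {
    "geneA": (200.0, 400.0, 70.0),
    "geneB": (30000.0, 45000.0, 3050.0),
    "geneC": (1000.0, 1100.0, 150.0),
}
for gene in rank_genes(toy_measurements):
    s = score_statistic(*toy_measurements[gene])
    print(gene, round(s, 2), round(two_sided_p(s), 4))
```

In this toy example the 1.5-fold change at high intensity ranks above the 2-fold change at low intensity, because the predicted standard deviation is proportionally smaller at high intensity.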

The relative similarity in the variable fold change thresholds between chips and conditions suggests that these thresholds may be widely applicable to expression data generated from other Affymetrix or similar high-density oligonucleotide-based microarrays. We suggest that this approach can be widely incorporated into microarray data analysis methods to improve prediction of significant changes in gene expression in oligonucleotide microarray experiments and to reduce false leads, even in the absence of replicates. Although we believe this represents a valuable application of technical variability to determining differential expression, improved noise models are clearly warranted. Proof that this method is biologically relevant, as evidenced by consistency with results from classical measures of gene expression, is the subject of current investigation.

FOOTNOTES

1 To read the full text of this article, go to http://www.fasebj.org/cgi/doi/10.1096/fj.02-0351fje; to cite this article, use FASEB J. (December 3, 2002) 10.1096/fj.02-0351fje





