| Recommendations for the Clinical Analysis of Microarray Data |
| (0) Try the J5 test with the jackknife to increase true positives when finding differentially expressed genes. Our simulation results and complete replicated experiments show that, with this method, the overlap among the genes found to be differentially expressed in independently produced and analyzed data sets is much higher than with the second-place method. We are working to describe these results in a paper to be submitted for publication. |
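This page does not define the J5 statistic or say exactly how the jackknife is applied. The following is therefore only a minimal sketch, and it rests on two assumptions.

* The J5 score for gene $i$ is the difference between its class means divided by the average absolute mean difference over all $m$ genes: $J5_i = (\bar{A}_i - \bar{B}_i) / \big(\tfrac{1}{m}\sum_{j} |\bar{A}_j - \bar{B}_j|\big)$.
* "With the jackknife" means dropping one sample at a time and keeping only the genes that pass in every replicate.

The cutoff of 5 and the function names are hypothetical.

```python
import numpy as np

def j5_scores(a, b):
    """Assumed J5 statistic for every gene.

    a, b: arrays of shape (genes, samples) for classes A and B.
    Each gene's mean difference is divided by the average absolute
    mean difference over all genes.
    """
    diff = a.mean(axis=1) - b.mean(axis=1)
    return diff / np.mean(np.abs(diff))

def j5_jackknife(a, b, threshold=5.0):
    """Hypothetical leave-one-sample-out wrapper: a gene is reported only
    if |J5| exceeds `threshold` on the full data and in every replicate
    with one sample removed (from either class)."""
    keep = np.abs(j5_scores(a, b)) > threshold
    for k in range(a.shape[1]):
        keep &= np.abs(j5_scores(np.delete(a, k, axis=1), b)) > threshold
    for k in range(b.shape[1]):
        keep &= np.abs(j5_scores(a, np.delete(b, k, axis=1))) > threshold
    return np.flatnonzero(keep)

# Simulated example: 100 genes, 4 arrays per class; only gene 0 truly differs.
rng = np.random.default_rng(0)
class_a = rng.normal(100.0, 1.0, size=(100, 4))
class_b = rng.normal(100.0, 1.0, size=(100, 4))
class_a[0] += 50.0
hits = j5_jackknife(class_a, class_b)
print(hits)
```

Requiring a gene to survive every leave-one-out replicate guards against calls driven by a single unusual sample.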
| (1) Involve a statistician, or someone with experience in experimental design, before the experimental design is set. At the University of Pittsburgh, I strongly recommend contacting my expert colleagues in the Biostatistics Division of UPCI. Some important tips to avoid disaster: use a balanced design whenever possible, make direct comparisons (avoid wasteful reference designs), avoid confounding, and randomize processing order to avoid unanticipated sources of variability that might lead to confounding. |
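A randomized, balanced processing schedule can be produced with the standard library alone. This is a minimal sketch, assuming a two-class design with equal group sizes and hypothetical sample IDs. Each processing batch receives equal numbers of each class. Both the run order within batches and the order of the batches are shuffled, using a recorded seed, so that class is not confounded with batch or run order.

```python
import random

def randomized_schedule(class_a, class_b, batch_size, seed):
    """Split two equal-sized classes into balanced batches (half of each
    batch from each class) and randomize the processing order."""
    if len(class_a) != len(class_b) or batch_size % 2:
        raise ValueError("expects a balanced design and an even batch size")
    rng = random.Random(seed)  # record the seed so the schedule is reproducible
    a, b = class_a[:], class_b[:]
    rng.shuffle(a)
    rng.shuffle(b)
    half = batch_size // 2
    batches = []
    for start in range(0, len(a), half):
        batch = a[start:start + half] + b[start:start + half]
        rng.shuffle(batch)  # randomize run order within the batch
        batches.append(batch)
    rng.shuffle(batches)  # randomize the order in which batches are processed
    return batches

tumors = [f"T{i}" for i in range(1, 9)]
normals = [f"N{i}" for i in range(1, 9)]
schedule = randomized_schedule(tumors, normals, batch_size=4, seed=42)
for batch in schedule:
    print(batch)
```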
| (2) Adopt a flexible, ‘for now’ approach to analysis. As we add further methods for normalization, additional tests for differentially expressed genes, and additional classification algorithms, we will label the combination of approaches that maximizes PER* as a J-method. The current best J-method is the J5 method = GMA + the J5 test + ALCED (no transformation, no background subtraction). Update: With only one or two sources of array-specific bias, the J5 method works best with a combined transformation: Trimmed Mean for multiplicative error and GMA for additive error. With more than one or two array-specific multiplicative or additive insults, the best normalization approach appears to be median mean + GMA, in that order. |
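The page does not spell out these normalizations. This is a minimal sketch that assumes two things.

* Trimmed-mean normalization rescales each array so that its trimmed mean equals the average trimmed mean across arrays. This corrects multiplicative bias.
* GMA, taken here to mean a global mean adjustment, then shifts each array so that its mean equals the grand mean. This corrects additive bias.

The 10% trim fraction and the function names are hypothetical.

```python
import numpy as np

def trimmed_mean(x, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    s = np.sort(x)
    k = int(len(s) * trim)
    return s[k:len(s) - k].mean()

def trimmed_mean_scale(data, trim=0.1):
    """Multiplicative correction: rescale each array (column) so that its
    trimmed mean equals the average trimmed mean over all arrays."""
    tm = np.array([trimmed_mean(data[:, j], trim) for j in range(data.shape[1])])
    return data * (tm.mean() / tm)

def gma(data):
    """Additive correction (assumed global mean adjustment): shift each
    array so that its mean equals the grand mean of the whole matrix."""
    return data - data.mean(axis=0) + data.mean()

# Three arrays measuring the same genes with different multiplicative
# and additive array-specific biases.
rng = np.random.default_rng(1)
true = rng.uniform(100, 1000, size=(500, 1))
raw = np.hstack([true * 1.0, true * 1.5 + 20.0, true * 0.8 - 10.0])
normalized = gma(trimmed_mean_scale(raw))
print(raw.mean(axis=0).round(1), normalized.mean(axis=0).round(1))
```

Scaling is applied first and the additive shift second, following the order in which the update names the two corrections.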
| (3) Always examine the distribution of the data prior to analysis. Demons abound in microarray experiments; all samples should be treated the same way (e.g., always EtOH precipitated, regardless of RNA quality). Comparing the frequency distributions of the individual arrays will sometimes reveal samples with data quality so poor that, if undetected, they will compromise the entire analysis. Always look at the between-array COV prior to normalization (see below). |
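A quick numerical companion to plotting the per-array frequency distributions is sketched below for a genes × arrays matrix. Several choices here are illustrative, not the page's own criteria: the summary columns, reading "between-array COV" as the coefficient of variation of the array means, and the 3-MAD outlier flag.

```python
import numpy as np

def array_summaries(data):
    """Mean and quartiles of each array (column) of a genes x arrays matrix."""
    q1, median, q3 = np.percentile(data, [25, 50, 75], axis=0)
    return {"mean": data.mean(axis=0), "q1": q1, "median": median, "q3": q3}

def between_array_cov(data):
    """Assumed reading of 'between-array COV': the coefficient of variation
    (SD / mean) of the array means, computed before any normalization."""
    means = data.mean(axis=0)
    return means.std(ddof=1) / means.mean()

def flag_odd_arrays(data, k=3.0):
    """Hypothetical screen: flag arrays whose median is more than k median
    absolute deviations away from the median of all array medians."""
    medians = np.median(data, axis=0)
    dev = np.abs(medians - np.median(medians))
    return np.flatnonzero(dev > k * np.median(dev))

# Six simulated arrays; array 4 is given a global loss of intensity.
rng = np.random.default_rng(5)
data = rng.lognormal(mean=6.0, sigma=1.0, size=(1000, 6))
data[:, 4] *= 0.2
summary = array_summaries(data)
print(np.round(summary["median"]), round(between_array_cov(data), 3), flag_odd_arrays(data))
```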
| (4) "NEVER" analyze log-transformed intensity values or ratios. This always leads to a loss of information. All methods for finding differentially expressed genes examined to date are hindered by log-transformed data; these methods are more robust to violations of the normality assumption than they are to the effects of log transformation. No research to date has demonstrated that log transformation improves downstream analyses of microarray data. Log transformation may help with visualization.
Update 8/06/03: In rare cases, log transformation appears to be absolutely essential. Please take a look at the Hedenfalk BRCA1 vs. BRCA2 data set among the GEDA tool sample data sets; it is the clearest empirical counterexample to this recommendation. Nevertheless, we remain skeptical of calls to log-transform routinely to linearize the data. We therefore recommend log transformation conditionally: use it if it obviously helps. We are currently examining a conditional log transformation strategy that would be applied only to genes that fail a test of normality. |
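The conditional strategy in the update could be prototyped gene by gene. The page does not say which normality test it uses, so this minimal sketch substitutes a simple screen. It computes the absolute sample skewness of each gene's values and log-transforms only the genes above a hypothetical cutoff of 1.0. A formal normality test (for example Shapiro-Wilk) could replace the screen.

```python
import numpy as np

def skewness(x):
    """Sample skewness (biased moment estimator)."""
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def conditional_log(data, cutoff=1.0):
    """Log2-transform only the genes (rows) whose |skewness| exceeds cutoff;
    all other genes stay on the original intensity scale.
    Assumes strictly positive intensities."""
    out = np.array(data, dtype=float)
    transformed = []
    for i, row in enumerate(out):
        if abs(skewness(row)) > cutoff:
            out[i] = np.log2(row)
            transformed.append(i)
    return out, transformed

genes = np.array([
    [100.0, 102.0, 98.0, 101.0, 99.0, 100.0],   # roughly symmetric
    [10.0, 12.0, 11.0, 9.0, 13.0, 400.0],       # strongly right-skewed
])
result, logged = conditional_log(genes)
print(logged)
```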
| (5) NEVER use or analyze ratios. Surprising as it may be, this always leads to a substantial loss of information. The alternative is to measure and analyze the difference in expression (expression in sample A – expression in sample B). The t-test and the J5 test are far superior to all ratio-based methods we have examined to date. Ratios such as the so-called fold change do not provide reliable information on genes whose expression levels differ reproducibly but only slightly between or among classes. For this reason, a growing number of analysis experts recommend abandoning the fold change completely. |
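Here is a small illustration of the point about slight but reproducible differences, using made-up values for two genes with four samples per class. A t statistic computed from raw expression differences (Welch form) favors a consistent 5% shift over a noisy twofold one. The fold change ranks the same two genes the other way. The numbers are hypothetical and chosen only to make the contrast visible.

```python
import statistics as st

def welch_t(a, b):
    """Welch t statistic on raw expression values: the difference of class
    means divided by its standard error (no ratios, no logs)."""
    se = (st.variance(a) / len(a) + st.variance(b) / len(b)) ** 0.5
    return (st.mean(a) - st.mean(b)) / se

def fold_change(a, b):
    """Ratio of class means, the quantity this recommendation argues against."""
    return st.mean(a) / st.mean(b)

genes = {
    "slight_but_reproducible": ([105, 106, 104, 105], [100, 101, 99, 100]),
    "large_but_noisy": ([50, 400, 80, 270], [100, 90, 110, 100]),
}
for name, (a, b) in genes.items():
    print(f"{name}: t = {welch_t(a, b):.2f}, fold change = {fold_change(a, b):.2f}")
```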
| (6) NEVER use 1 − Pearson’s correlation as a pairwise distance for agglomerative clustering in class prediction. This distance, while popular, ranks 4th of the 8 distances examined to date. It is highly sensitive to false positives (genes that appear statistically different but in fact are not), and the effect is to produce deceptively accurate-looking sample classifications (e.g., perfect separation of tumors and normals). We recommend Euclidean distance instead (it ranked 1st of the 8 studied with simulations to date). |
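One property that separates the two distances is easy to show. 1 − Pearson's r ignores the overall level and scale of a profile, whereas Euclidean distance does not. This minimal sketch uses made-up sample profiles and NumPy only. It illustrates that difference and does not reproduce the simulation ranking cited above.

```python
import numpy as np

def one_minus_pearson(x, y):
    """1 - Pearson correlation between two expression profiles."""
    return 1.0 - np.corrcoef(x, y)[0, 1]

def euclidean(x, y):
    """Straight-line distance between two expression profiles."""
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

s1 = np.array([200.0, 400.0, 300.0, 500.0])
s2 = 3.0 * s1 + 100.0                         # same shape, very different level and scale
s3 = np.array([210.0, 390.0, 310.0, 480.0])   # close to s1 in absolute terms

print(one_minus_pearson(s1, s2), euclidean(s1, s2))
print(one_minus_pearson(s1, s3), euclidean(s1, s3))
```

Under 1 − r, s2 is the nearer neighbor of s1. Under Euclidean distance, s3 is.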
| (7) Do not use housekeeping gene normalization, or any other normalization method that uses a reference set of genes. The reason is simple: each gene expression measurement on each array is made with error. Regardless of how these measurements are used (i.e., which correction is applied), error propagation ensures that gene-subset methods will tend to add noise to the data. Using simulations, we have determined that this is true for the housekeeping gene approach, but we have yet to evaluate the other proposed reference gene set methods (rank invariant set approach, iterative rank invariant set approach, self-consistent method). If you suspect linear among-array bias, use GMA instead (see recommendation [8]). |
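The error-propagation argument can be checked with a few lines of Monte Carlo. This minimal sketch makes three assumptions.

* Measurement error is independent and Gaussian.
* Both the target gene and a single housekeeping gene have a hypothetical 10% coefficient of variation.
* There is no true array-to-array bias to correct.

Under these conditions, dividing by the housekeeping measurement raises the target's coefficient of variation from about 0.10 to about 0.14.

```python
import numpy as np

rng = np.random.default_rng(7)
n_arrays = 100_000
cv = 0.10                      # hypothetical per-measurement error (10% CV)

target_true, hk_true = 500.0, 1000.0
# Measured values on each simulated array: truth plus independent error.
target = target_true * (1 + cv * rng.standard_normal(n_arrays))
hk = hk_true * (1 + cv * rng.standard_normal(n_arrays))

# Housekeeping normalization: divide by the housekeeping gene and
# rescale to its true level.
normalized = target / hk * hk_true

def coef_var(x):
    """Coefficient of variation: SD divided by mean."""
    return x.std() / x.mean()

print(f"CV before: {coef_var(target):.3f}, after: {coef_var(normalized):.3f}")
```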
| (8) Do not use the z-transformation to normalize. All methods for finding differentially expressed genes examined to date, except the t-test, are hindered by z-transformed data. |
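For reference, the per-array z-transformation replaces each intensity with its standardized value within the array. The definition below assumes the usual form. Here $x_{gj}$ is the intensity of gene $g$ on array $j$, and $\bar{x}_{\cdot j}$ and $s_j$ are that array's mean and standard deviation over all $G$ genes:

$$
z_{gj} = \frac{x_{gj} - \bar{x}_{\cdot j}}{s_j}, \qquad
\bar{x}_{\cdot j} = \frac{1}{G}\sum_{g=1}^{G} x_{gj}, \qquad
s_j = \sqrt{\frac{1}{G-1}\sum_{g=1}^{G}\left(x_{gj} - \bar{x}_{\cdot j}\right)^2}
$$

After this step every array has mean 0 and standard deviation 1. Any genuine difference in overall level or spread between arrays is therefore removed along with the bias.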
There is no net improvement in finding differentially expressed genes, or in obtaining accurate rather than misleading classifications, unless systematic error exists among the arrays. No criteria yet exist for determining whether linear array-specific biases (multiplicative, or both multiplicative and additive) exist in a given data set. Update: With only one or two sources of array-specific bias, the J5 method works best with a combined transformation: Trimmed Mean for multiplicative error and GMA for additive error. With more than one or two array-specific multiplicative or additive insults, the best normalization approach appears to be median mean + GMA, in that order. |
| (10) Never use the nonparametric Mann-Whitney U test, n-fold thresholding, or the t-test for correlated data. These methods are all inferior to the univariate t-test and the J5 test. |
| (11) When using the J5 test, never perform background subtraction or 'above threshold' filtering. All of our studies to date demonstrate that background subtraction is a source of noise. This is because, like the intensity ($I$), the background ($B$) is measured with error. Remember that $(I + e_I) - (B + e_B) = (I - B) + (e_I - e_B)$. The error terms do not cancel; if they are independent with variances $\sigma_I^2$ and $\sigma_B^2$, the background-subtracted value has error variance $\sigma_I^2 + \sigma_B^2$. On a related matter, filtering genes 'above background' precludes calculating precise averages among measurements 'below' background. Noisy (high-error) 'low expressors' are unlikely to be significant under most tests anyway, so why not allow the tests to evaluate all of the genes? |
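The variance argument can be checked numerically. This minimal sketch assumes independent Gaussian errors with hypothetical standard deviations of 30 for the spot intensity and 20 for the local background. The background-subtracted values should then carry an error variance of about $30^2 + 20^2 = 1300$, compared with $900$ for the unsubtracted intensity.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
I_true, B_true = 800.0, 150.0      # hypothetical true spot and background levels
sd_I, sd_B = 30.0, 20.0            # hypothetical measurement-error SDs

I_obs = I_true + rng.normal(0.0, sd_I, n)
B_obs = B_true + rng.normal(0.0, sd_B, n)
subtracted = I_obs - B_obs

print(f"var(I_obs)         = {I_obs.var():.0f}  (theory {sd_I**2:.0f})")
print(f"var(I_obs - B_obs) = {subtracted.var():.0f}  (theory {sd_I**2 + sd_B**2:.0f})")
```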
| (12) How does one determine whether linear multiplicative bias exists in a data set? How well do other statistical normalization approaches (e.g., variance stabilization, locally weighted regression (lowess)) work in the face of the nonlinear bias(es) they are designed to correct? What are the costs of applying them when there is no need to? What is the optimal strategy for approaching the problem of normalization? |
| (13) How do permutation tests affect the PER* of a method? |
| (14) How do corrections for multiple testing affect the PER* of a method? |
| (15) How do other tests for differentially expressed genes rank? (E.g., adaptive sign test, Delta-h, diagnostic metric, discriminative weighting, ideal discriminator method, local Bayesian error test, log-odds tests, neighborhood analysis, SAM, empirical Bayes analysis, percentile band method, maximum likelihood methods, Z-ratio score) |
| (16) How do other classification algorithms rank? (E.g., BTSVQ, CAST, decision tree classification, deterministic annealing, gene shaving, k-means clustering, Kohonen clustering, logistic discrimination, multidimensional scaling, normalized cuts, neighbor joining, nearest neighbor, partitioning around medoids, principal components analysis, quadratic discriminant analysis, self-organizing maps, weighted voting) |
| (17) How well do machine-learning approaches that simultaneously find biomarkers and classes compare to the two-step methods? (E.g., fuzzy Kohonen networks, fuzzy k-means clustering, fuzzy C-means, fuzzy ART, stepwise discriminant classification analysis, correspondence analysis, neural networks, support vector machines) |
| (18) Under what conditions is background correction safe? |
| (19) What is the best way to reduce false positives? |
| (20) What are the effects of combining sets of differentially expressed genes found under various combinations of tests? |
| (21) What is the best way to computationally validate class prediction results? |
| (22) How well do competing methods for estimating the false discovery rate work? |
Last updated April 29, 2004.