
GEDA References for Microarray Analysis


I. Data Format/Data Storage

II. Experimental Design

III. Steps for Data Handling/Normalization/Transformation

IV. Tests for Differentially Expressed Genes

V. Clustering (Supervised and Unsupervised, Gene and Sample)/ Classification

VI. Machine Learning

VII. Neural Networks/AI

VIII. Computational Validation

IX. Evaluation/Comparisons

X. Functional Interpretation

XI. Integrating diverse genomic and proteomic data sources

XII. Other Software

===========================================================================

I. Data Format/Data Storage

  1. Brazma A, et al.: One Stop Shopping for Microarrays: Is a universal, public DNA microarray database a realistic goal? Nature 2000, 403:699-700.[Full text]
  2. Spellman P, et al.: Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biology, 23 August 2002. [Full Text]

II. Experimental Design

  1. Black M.A. and RW Doerge. 2002. Calculation of the minimum number of replicate spots required for detection of significant gene expression fold change in microarray experiments. Bioinformatics 18(12):1609-1616.[PubMed]
  2. Churchill GA. 2002. Fundamentals of experimental design for cDNA microarrays. Nat Genet 32 Suppl 2,490-5.[Pubmed]
  3. Dash A, Maine IP, Varambally S, Shen R, Chinnaiyan AM, Rubin MA. 2002. Changes in differential gene expression because of warm ischemia time of radical prostatectomy specimens. Am J Pathol 161,1743-8.[Pubmed]
  4. Dobbin K, Simon R. 2002. Comparison of microarray designs for class comparison and class discovery. Bioinformatics 18(11):1438-45.[Pubmed]
  5. Emptage MR, Hudson-Curtis B, Sen K. 2003. Treatment of microarray experiments as split-plot designs. J Biopharm Stat. 2003 May;13(2):159-78.[Pubmed]
  6. Lee M-LT, Whitmore GA, Yukhananov RY. Analysis of unbalanced microarray data. Journal of Data Science 2003, 1:103-121. [JDS Full Text]
  7. Herwig R, Aanstad P, Clark M, Lehrach H. 2001. Statistical evaluation of differential expression on cDNA nylon arrays with replicated experiments. Nucleic Acids Res 29,E117.[Pubmed]
  8. Huang J, Qi R, Quackenbush J, Dauway E, Lazaridis E, Yeatman T. (2001) Effects of ischemia on gene expression. J Surg Res 99,222-7.[Pubmed]
  9. Kendziorski, C.M., M.A. Newton, H. Lan, and M.N. Gould. 2003. On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine, to appear.[Pubmed]
  10. Kerr MK. Experimental design to make the most of microarray studies. Methods Mol Biol. 2003;224:137-47.[Pubmed]
  11. Kerr MK, Churchill GA. 2001. Statistical design and the analysis of gene expression microarray data. Genet Res. 77,123-8.[Pubmed]
  12. Kerr MK, Churchill GA. 2001. Experimental design for gene expression microarrays. Biostatistics 2:183-201.[Pubmed]
  13. Lee ML, Kuo FC, Whitmore GA, Sklar J. 2000. Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci U S A 97,9834-9.[Pubmed]
  14. Lee ML, Whitmore GA. 2002. Power and sample size for DNA microarray studies. Stat Med 21,3543-70.[Pubmed]
  15. Li C, Wong WH. 2001. Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol 2(8):research0032.[Pubmed]
  16. Liang M, Briggs AG, Rute E, Greene AS, Cowley AW Jr. 2003. Quantitative assessment of the importance of dye switching and biological replication in cDNA microarray studies. Physiol Genomics. 2003 Jun 10 [Epub ahead of print].[Pubmed]
  17. Lönnstedt I, Speed T. 2002. Replicated microarray data. Statistica Sinica 12(1):31-46.[Pubmed]
  18. Pan W, Lin J, Le CT. How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol 2002;3(5):research0022[Pubmed]
  19. Peng X, Wood CL, Blalock EM, Chen KC, Landfield PW, Stromberg AJ. 2003. Statistical implications of pooling RNA samples for microarray experiments. BMC Bioinformatics 4(1):26 [Pubmed]
  20. Wang J, Nygaard V, Smith-Sorensen B, Hovig E, Myklebost O. (2002) MArray: analysing single, replicated or reversed microarray experiments. Bioinformatics 18,1139-40.[Pubmed]
  21. Simon RM, Dobbin K. Experimental design of DNA microarray experiments. Biotechniques. 2003 Mar;Suppl:16-21.[Pubmed]
  22. Simon R, Radmacher MD, Dobbin K, McShane LM. 2003. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst. 2003 Jan 1;95(1):14-8.[Pubmed]
  23. Yang YH, Speed T. 2002. Design issues for cDNA microarray experiments. Nat Rev Genet 3,579-88.[Pubmed]
  24. Wang Y, Wang X, Guo SW, Ghosh S. 2002. Conditions to ensure competitive hybridization in two-color microarray: a theoretical and experimental analysis. Biotechniques 32,1342-6.[Pubmed]
  25. Wrobel G, Schlingemann J, Hummerich L, Kramer H, Lichter P, Hahn M. 2003. Optimization of high-density cDNA-microarray protocols by 'design of experiments'. Nucleic Acids Res.31(12):e67. [Pubmed]

III. Steps for Data Handling/Normalization/Transformation

  1. Bilban M, Buehler LK, Head S, Desoye G, Quaranta V. 2002. Normalizing DNA microarray data. Curr Issues Mol Biol 4,57-64.[Pubmed]
  2. Bolstad BM, Irizarry RA, Astrand M, Speed TP. 2003. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19:185-93.[Pubmed]
  3. Cheadle C, Vawter MP, Freed WJ, Becker KG. 2003. Analysis of microarray data using z score transformation. J Mol Diagn. May;5(2):73-81.[Pubmed]
  4. Chen YJ, Kodell R, Sistare F, Thompson KL, Morris S, Chen JJ. Normalization methods for analysis of microarray gene-expression data. J Biopharm Stat. 2003 Feb;13(1):57-74.[Pubmed]
  5. Colantuoni C, Henry G, Zeger S, Pevsner J. 2002a. Local mean normalization of microarray element signal intensities across an array surface: quality control and correction of spatially systematic artifacts. Biotechniques 32,1316-20.[Pubmed]
  6. Colantuoni C, Henry G, Zeger S, Pevsner J. 2002b. SNOMAD (Standardization and NOrmalization of MicroArray Data): web-accessible gene expression data analysis. Bioinformatics 18,1540-1541.[Pubmed]
  7. Durbin BP, Hardin JS, Hawkins DM, Rocke DM. (2002) A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics 18 Suppl 1,S105-10.[Pubmed]
  8. Edwards D. Non-linear normalization and background correction in one-channel cDNA microarray studies. Bioinformatics. 2003 May 1;19(7):825-33.[Pubmed]
  9. Hill AA, Brown EL, Whitley MZ, Tucker-Kellogg G, Hunter CP, Slonim DK. 2001. Evaluation of normalization procedures for oligonucleotide array data based on spiked cRNA controls. Genome Biol 2,RESEARCH0055 [Pubmed]
  10. Hoffmann R, Seidl T, Dugas M. 2002. Profound effect of normalization on detection of differentially expressed genes in oligonucleotide microarray data analysis. Genome Biol 3,RESEARCH0033[Pubmed]
  11. Hsiao, LL, RV Jensen, T Yoshida, KE Clark, JE Blumenstock, SR Gullans. 2002. Correcting for signal saturation errors in the analysis of microarray data. Biotechniques 32(2): 330-2, 4, 6.[Pubmed]
  12. Huber W, Von Heydebreck A, Sultmann H, Poustka A, Vingron M. (2002) Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18 Suppl 1,S96-S104.[Pubmed]
  13. Irizarry, RA, B Hobbs, F Collin, YD Beazer-Barclay, KJ Antonellis, U Scherf, TP Speed. 2003. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4(2): 249-64.[Pubmed]
  14. Kepler TB, Crosby L, Morgan KT. (2002) Normalization and analysis of DNA microarray data by self-consistency and local regression. Genome Biol 3,research0037.1-0037.12.[Pubmed]
  15. Kim JH, Shin DM, Lee YS. (2002) Effect of local background intensities in the normalization of cDNA microarray data with a skewed expression profiles. Exp Mol Med 34,224-32.[Pubmed]
  16. Kroll TC, Wolfl S. 2002. Ranking: a closer look on globalisation methods for normalisation of gene expression arrays. Nucleic Acids Res. 2002 Jun 1;30(11):e50.[Pubmed]
  17. Park T, Yi SG, Kang SH, Lee S, Lee YS, Simon R. 2003. Evaluation of normalization methods for microarray data. BMC Bioinformatics. 2003 Sep 2 [Epub ahead of print]. [Pubmed]
  18. Qian J, Kluger Y, Yu H, Gerstein M. 2003. Identification and correction of spurious correlations in microarray data. Biotechniques 35:42-48.
  19. Quackenbush J. (2002) Microarray data normalization and transformation. Nat Genet 32 Suppl,496-501.[Pubmed]
  20. Rocke DM, Durbin B. 2003. Approximate variance-stabilizing transformations for gene-expression microarray data. Bioinformatics. May 22;19(8):966-72. [Pubmed]
  21. Rudi K, Treimo J, Moen B, Rud I, Vegarud G. 2002. Internal controls for normalizing DNA arrays. Biotechniques. 2002 Sep;33(3):496, 498, 500 passim. [Pubmed]
  22. Schageman, JJ, M. Basit, TD Gallardo, HR Garner and RV Shohet. 2002. MarcC-V, a spreadsheet-based tool for analysis, normalization, and visualization of single cDNA microarray experiments. Biotechniques 32,338-340, 342, 344.[Pubmed]
  23. Schadt EE, Li C, Ellis B, Wong WH. Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data. J Cell Biochem Suppl. 2001;Suppl 37:120-5.[Pubmed]
  24. Schuchhardt J, Beule D, Malik A, Wolski E, Eickhoff H, Lehrach H, Herzel H. Normalization strategies for cDNA microarrays. Nucleic Acids Research 2000; 28(10):e47. [Pubmed]
  25. Shmulevich I, Zhang W. 2002. Binary analysis and optimization-based normalization of gene expression data. Bioinformatics. 2002 Apr;18(4):555-65. [Pubmed]
  26. Tseng GC, Oh MK, Rohlin L, Liao JC, Wong WH. (2001) Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res. 29,2549-57.[Pubmed]
  27. Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F. (2002) Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 18;3(7),37[Pubmed]
  28. Wang Y, Lu J, Lee R, Gu Z, Clarke R. 2002. Iterative normalization of cDNA microarray data. IEEE Trans Inf Technol Biomed. Mar;6(1):29-37. [Pubmed]
  29. Workman C, Jensen LJ, Jarmer H, Berka R, Gautier L, Nielsen HB, Saxild HH, Nielsen C, Brunak S, Knudsen S. 2002. A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biol 3,research0048[Pubmed]
  30. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP. (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res 15;30(4),e15.[Pubmed]
  31. Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL. 2001. Model-based clustering and data transformations for gene expression data. Bioinformatics 17:977-987.[Pubmed]
  32. Cope, LM, RA Irizarry, HA Jaffee, Z Wu, TP Speed. 2003. A Benchmark for Affymetrix GeneChip Expression Measures. Bioinformatics 1(1): 1-10.[Pubmed]
  33. Eisen, MB, PT Spellman, PO Brown, D Botstein. 1998. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95(25): 14863-8.[Pubmed]

IV. Tests for Differentially Expressed Genes

1.      General Overview/Comparisons

    1. Baggerly KA, Coombes KR, Hess KR, Stivers DN, Abruzzo LV, Zhang W. 2001. Identifying differentially expressed genes in cDNA microarray experiments. J Comput Biol 8,639-59.[Pubmed]
    2. Broberg P. Ranking genes with respect to differential expression. Genome Biol 2002 Aug 5;3(9):preprint0007[Pubmed]
    3. Kooperberg C, Sipione S, LeBlanc M, Strand AD, Cattaneo E, Olson JM. 2002. Evaluating test statistics to select interesting genes in microarray experiments. Hum Mol Genet 11,2223-32.[Pubmed]
    4. Storey JD, Tibshirani R. 2003. Statistical methods for identifying differentially expressed genes in DNA microarrays. Methods Mol Biol. 224:149-57. [Pubmed]
    5. Zhang, M.Q. 1999. Large-scale gene expression data analysis: a new challenge to computational biologists. Genome Res. 9: 681-688[Pubmed]

2.      Least Squares Methods

    1. Bushel PR, Hamadeh HK, Bennett L, Green J, Ableson A, Misener S, Afshari CA, Paules RS. 2002. Computational selection of distinct class- and subclass-specific gene expression signatures. J Biomed Inform. 2002 Jun;35(3):160-70. [Pubmed]
    2. Cui X, Churchill GA. 2003. Statistical tests for differential expression in cDNA microarray experiments.  Genome Biol. 2003;4(4):210. Epub 2003 Mar 17. [Pubmed]
    3. Draghici S, Kulaeva O, Hoff B, Petrov A, Shams S, Tainsky MA. Noise sampling method: an ANOVA approach allowing robust selection of differentially regulated genes measured by DNA microarrays. Bioinformatics. 2003 Jul 22;19(11):1348-59. [Pubmed]
    4. Welford SM, Gregg J, Chen E, Garrison D, Sorensen PH, Denny CT, Nelson SF. 1998. Detection of differentially expressed genes in primary tumor tissues using representational differences analysis coupled to microarray hybridization. Nucleic Acids Res 26, 3059-65.[Pubmed]
    5. Thomas JG, Olson JM, Tapscott SJ, Zhao LP. (2001) An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res 11,1227-36.[Pubmed]
    6. Yang IV, Chen E, Hasseman JP, Liang W, Frank BC, Wang S, Sharov V, Saeed AI, White J, Li J, Lee NH, Yeatman TJ, Quackenbush J. (2002) Within the fold: assessing differential expression measures and reproducibility in microarray assays. Genome Biol 3:research0062.[Pubmed]

3.      Nonparametric Methods

    1. Pan W. 2003.  On the use of permutation in and the performance of a class of nonparametric methods to detect differential gene expression. Bioinformatics. 2003 Jul 22;19(11):1333-40.   [Pubmed]
    2. Huang X, Pan W. 2002. Comparing three methods for variance estimation with duplicated high density oligonucleotide arrays. Funct Integr Genomics. 2002 Aug;2(3):126-33. Epub 2002 Jul 24.  [Pubmed]
    3. Park PJ, Pagano M, Bonetti M. 2001. A nonparametric scoring algorithm for identifying informative genes from microarray data. Pac Symp Biocomput:52-63.[Pubmed]
    4. Troyanskaya OG, Garber ME, Brown PO, Botstein D, Altman RB. 2002. Nonparametric methods for identifying differentially expressed genes in microarray data. Bioinformatics 2002 Nov;18(11):1454-61.[Pubmed]
    5. Li C, Wong WH. 2001 Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci U S A. 98(1):31-6.[Pubmed]

4.      False Discovery Rate Estimation

    1. Efron B, Tibshirani R. Empirical Bayes methods and false discovery rates for microarrays. Genet Epidemiol 2002 Jun;23(1):70-86[Pubmed]
    2. Storey, J. (2002) A direct approach to false discovery rates. J. Roy. Stat. Soc. Ser. B, 64:479-498.[Pubmed]
    3. Reiner A, Yekutieli D, Benjamini Y. 2003. Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics Feb;19(3):368-75.[Pubmed]
    4. Tusher VG, Tibshirani R, Chu G. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98,5116-21.[Pubmed]
    5. Westfall PH, Young SS. 1989. P-value adjustments for multiple tests in multivariate binomial models. Journal of the American Statistical Association, 84, 780-786. [Pubmed - no entry]

5.  Multivariate Analysis

  1. Peterson LE. 2003. Partitioning large-sample microarray-based gene expression profiles using principal components analysis. Comput Methods Programs Biomed. 2003 Feb;70(2):107-19. [Pubmed] 

Model-Based Analysis

6.      Likelihood Models

    1. Ideker, T., Thorsson, V., Siegel, A.F., and Hood, L.E. 2000. Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. Journal of Computational Biology 7: 805-817. [Pubmed]

7.      Bayesian Models

    1. Baldi P, Long AD. 2001. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 17,509-19.[Pubmed]
    2. Broet P, Richardson S, Radvanyi F. Bayesian hierarchical model for identifying changes in gene expression from microarray experiments. J Comput Biol. 2002;9(4):671-83. [Pubmed]
    3. Domingos, P. and Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103--130.[Citeseer]
    4. Friedman N, Linial M, Nachman I, Pe'er D. Using Bayesian networks to analyze expression data J Comput Biol. 2000;7(3-4):601-20.[Pubmed]
    5. Ibrahim, J.G., Chen, M.H., and Gray, R.J. Bayesian models for gene expression with DNA microarray data. Journal of the American Statistical Association 97: 88-99, 2002.[Pubmed]
    6. Kendziorski, C.M., M.A. Newton, H. Lan, and M.N. Gould. 2003. On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine, to appear.[Pubmed]
    7. Lee KE, Sha N, Dougherty ER, Vannucci M, Mallick BK. Gene selection: a Bayesian variable selection approach. Bioinformatics. 2003 Jan;19(1):90-7.[Pubmed]
    8. Townsend JP, Hartl DL. Bayesian analysis of gene expression levels: statistical quantification of relative mRNA level across multiple strains or treatments. Genome Biol 2002;3(12):RESEARCH0071[Pubmed]
    9. Theilhaber J, Bushnell S, Jackson A, Fuchs R. 2001. Bayesian estimation of fold-changes in the analysis of gene expression: the PFOLD algorithm. J Comput Biol 8:585-614.[Pubmed]
    10. Li Y, Campbell C, Tipping M. 2002. Bayesian automatic relevance determination algorithms for classifying gene expression data. Bioinformatics 18:1332-9. [Pubmed]

8.      Other Models

  1. Zhou X, Wang X, Dougherty ER. 2003. Binarization of microarray data on the basis of a mixture model. Mol Cancer Ther. 2003 Jul;2(7):679-84. [Pubmed]
  2. Kato M, Tsunoda T, Takagi T. 2000. Inferring genetic networks from DNA microarray data by multiple regression analysis. Genome Inform Ser Workshop Genome Inform. 11:118-28 [Pubmed] 
  3. Li, C WH Wong 2001. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci U S A 98(1): 31-6.[Pubmed]

V. Clustering (Supervised and Unsupervised, Gene and Sample)/ Classification

  1. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ. 1999. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA. 96: 6745-6750.[Pubmed]

  2. Ball, G. and Hall, D. A clustering technique for summarizing multivariate data. Behavioral Science 12 (1967), 153-155[Pubmed]

  3. Bagirov AM, Ferguson B, Ivkovic S, Saunders G, Yearwood J. 2003. New algorithms for multi-class cancer diagnosis using tumor gene expression signatures. Bioinformatics. 2003 Sep 22;19(14):1800-7. [Pubmed]

  4. Ben-Dor A, Shamir R, Yakhini Z. 1999. Clustering gene expression patterns. Journal of Computational Biology, 6(3/4):281-297.[Pubmed]

  5. Cherepinsky V, Feng J, Rejali M, Mishra B. Shrinkage-based similarity metric for cluster analysis of microarray data.  Proc Natl Acad Sci U S A. 2003 Aug 19;100(17):9668-73. Epub 2003 Aug 05. [Pubmed]

  6.  Dougherty ER, Barrera J, Brun M, Kim S, Cesar RM, Chen Y, Bittner M, Trent JM. 2002. Inference from clustering with application to gene-expression microarrays. J Comput Biol. 9,105-26.[Pubmed]

  7. Dudoit S and J. Fridlyand (2002). A prediction-based resampling method to estimate the number of clusters in a dataset. Genome Biology , Vol. 3, No. 7, p. 0036.1 -- 0036.21.[Pubmed]

  8. Dudoit S., J. Fridlyand, and T. P. Speed (2002a). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, Vol. 97, No. 457, p. 77--87.[Pubmed - no entry]

  9. Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA., 95,14863-14868.[Pubmed]

  10. Gasch AP, Eisen MB. 2002. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol. 3,RESEARCH0059.[Pubmed]

  11. Hartuv E, Schmitt AO, Lange J, Meier-Ewert S, Lehrach H, Shamir R. An algorithm for clustering cDNA fingerprints. Genomics 2000 Jun 15;66(3):249-56[Pubmed]

  12. Jain AK, Dubes RC. 1988. Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice-Hall.

  13. Kluger Y, Basri R, Chang JT, Gerstein M. Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res. 2003 Apr;13(4):703-16.[Pubmed]

  14. Lee Y, Lee CK. Classification of multiple cancer types by multicategory support vector machines using gene expression data.  Bioinformatics. 2003 Jun 12;19(9):1132-9. [Pubmed]

  15. Li H, Hong F. 2001. Cluster-Rasch models for microarray gene expression data. Genome Biol. 2001;2(8):RESEARCH0031. Epub 2001 Jul 31 [Pubmed]

  16. McConnell P, Johnson K, Lin S. 2002. Applications of Tree-Maps to hierarchical biological data. Bioinformatics. 2002 Sep;18(9):1278-9.[Pubmed]
  17. McLachlan GJ, Bean RW, Peel D. 2002. A mixture model-based approach to the clustering of microarray expression data. Bioinformatics. 2002 Mar;18(3):413-22. [Pubmed]
  18. McShane LM, Radmacher MD, Freidlin B, Yu R, Li MC, Simon R. 2002. Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data. Bioinformatics. 2002 Nov;18(11):1462-9. [Pubmed]
  19. Medvedovic M, Sivaganesan S. Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics. 2002 Sep;18(9):1194-206.[Pubmed]
  20. Milligan, G. W. and M. C. Cooper (1986). A study of the comparability of external criteria for hierarchical cluster analysis. Multivariate Behavioral Research 21, 441--458.[Pubmed no entry]
  21. Nguyen DV, Rocke DM. Tumor classification by partial least squares using microarray gene expression data. Bioinformatics. 2002 Jan;18(1):39-50. [Pubmed]
  22. Radmacher MD, McShane LM, Simon R. 2002. A paradigm for class prediction using gene expression profiles. J Comput Biol. 2002;9(3):505-11.[Pubmed]
  23. Romualdi C, Campanaro S, Campagna D, Celegato B, Cannata N, Toppo S, Valle G, Lanfranchi G. 2003. Pattern recognition in gene expression profiling using DNA array: a comparative study of different statistical methods applied to cancer classification. Hum Mol Genet. 2003 Apr 15;12(8):823-36. [Pubmed]
  24. Sawa T, Ohno-Machado L. A neural network-based similarity index for clustering DNA microarray data. Comput Biol Med. 2003 Jan;33(1):1-15.[Pubmed]
  25. Sharan R., and Shamir R. 2000. CLICK: A clustering algorithm with applications to gene expression analysis. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), pages 307-316.[Pubmed - no entry]
  26. Simon R, Radmacher MD, Dobbin K, McShane LM. 2003. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst. 2003 Jan 1;95(1):14-8. [Pubmed]
  27. Sultan M, Wigle DA, Cumbaa CA, Maziarz M, Glasgow J, Tsao MS, Jurisica I. 2002. Binary tree-structured vector quantization approach to clustering and visualizing microarray data. Bioinformatics 18 Suppl 1,S111-S119. [Pubmed]
  28. Tibshirani R, Hastie T, Narasimhan B, Chu G. 2002. Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS 99:6567-6572. [Pubmed]
  29. Tsao, ECK, JC Bezdek and NR Pal. 1994. Fuzzy Kohonen clustering networks. Pattern Recognition 27,757-764.[Pubmed no entry]
  30. Valentini G. Gene expression data analysis of human lymphoma using support vector machines and output coding ensembles. Artif Intell Med. 2002 Nov;26(3):281-304. [Pubmed]
  31. Wuju L, Momiao X. 2002. Tclass: tumor classification system based on gene expression profile. Bioinformatics. 2002 Feb;18(2):325-6. [Pubmed]
  32. Xing EP, Karp RM. 2001. CLIFF: Clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts. In Proceedings of the GCB.[Pubmed]
  33. Yeung KY, Ruzzo WL. 2001a. Principal component analysis for clustering gene expression data. Bioinformatics 17: 763-774.[Pubmed]
  34. Yeung KY, Haynor DR, Ruzzo WL. 2001. Validating clustering for gene expression data. Bioinformatics. 2001 Apr;17(4):309-18. [Pubmed]
  35. Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL. 2001. Model-based clustering and data transformations for gene expression data. Bioinformatics 17:977-987.[Pubmed]
  36. Zhang H, Yu CY, Singer B, Xiong M. 2001. Recursive partitioning for tumor classification with gene expression microarray data. Proc Natl Acad Sci U S A. 2001 Jun 5;98(12):6730-5. Epub 2001 May 29. [Pubmed]
  37. Zhang K, Zhao H. Assessing reliability of gene clusters from gene expression data. Funct Integr Genomics. 2000 Nov;1(3):156-73. [Pubmed]

Multivariate Analysis

1.      Alter, O., P.O. Brown and D. Botstein. 2000. Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA, 97,10101-10106.[Pubmed]

2.      Bicciato S, Luchini A, Di Bello C. PCA disjoint models for multiclass cancer analysis using gene expression data. Bioinformatics. 2003 Mar 22;19(5):571-8.[Pubmed]

3.      Culhane AC, Perriere G, Considine EC, Cotter TG, Higgins DG. Between-group analysis of microarray data. Bioinformatics. 2002 Dec;18(12):1600-8.[Pubmed]

4.      Fellenberg, K, NC Hauser, B Brors, A Neutzner, JD Hoheisel and M Vingron. 2001. Correspondence analysis applied to microarray data. Proc. Natl. Acad. Sci. USA., 98,10781-10786.[Pubmed]

5.      Ghosh D. 2002. Resampling methods for variance estimation of singular value decomposition analyses from microarray experiments. Funct Integr Genomics Aug;2(3):92-7[Pubmed]

6.      Ghosh D. 2002. Singular value decomposition regression models for classification of tumors from microarray experiments. Pac Symp Biocomput,18-29.[Pubmed]

7.      Kerr MK, Martin M, Churchill GA. 2000. Analysis of variance for gene expression microarray data. J Comput Biol 7, 819-37.[Pubmed]

8.      Wall ME, Dyck PA, Brettin TS. 2001 SVDMAN--singular value decomposition analysis of microarray data. Bioinformatics 2001 Jun;17(6):566-8.[Pubmed]


VI. Machine Learning

  1. Brown, M, W. Grundy, D. Lin, N. Cristianini, C. Sugnet, T. Furey, M. Ares Jr, and D. Haussler. 2000. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. 97:262-267.[Pubmed]

  2. Furey, T.S., N. Cristianini, N. Duffy, D.W. Bednarski, M. Schummer and D. Haussler. 2000. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16, 906-914.[Pubmed]

  3. Kohavi R, John GH. 1997. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273-324.[Pubmed no entry]

  4. Koller D and M Sahami. 1996. Toward optimal feature selection. In International Conference on Machine Learning, pages 284-292.[Pubmed no entry]

  5. Lee Y, Lee CK. Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics. 2003 Jun 12;19(9):1132-9.[Pubmed]

  6. Lyons-Weiler J, Patel S, Bhattacharya S. (2003) A classification-based machine learning approach for the analysis of genome-wide expression data. Genome Res Mar;13(3):503-12[Pubmed]

  7. Xing E P., M I. Jordan, and R M. Karp 2001a. Feature selection for high-dimensional genomic microarray data. In Proc. 18th International Conf. on Machine Learning, pages 601-608, Morgan Kaufmann, San Francisco, CA.

  8. Xiong M, Li W, Zhao J, Jin L, Boerwinkle E. 2001. Feature (gene) selection in gene expression-based tumor classification. Mol Genet Metab 73,239-47.[Pubmed]

  9. Xiong M, X Fang, J Zhao. 2001. Biomarker identification by feature wrappers. Genome Research, 11:1878-1887.[Pubmed]

  10. Quinlan, R. 1993. C4.5: programs for machine learning. Morgan Kaufmann.[Link]

  11. Quinlan, R. 1996. Improved use of continuous attributes in C4.5. JAIR 4,77-90[JAIR]

  12. Rougemont J, Hingamp P. DNA microarray data and contextual analysis of correlation graphs. BMC Bioinformatics. 2003 Apr 29;4(1):15.[Pubmed]

Regression Trees

1.      Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Belmont (CA): Wadsworth International Group; 1984.[Amazon.com]

Bagging

1.      Breiman L. 1996. Bagging Predictors. Machine Learning 24(2): 123-140.[Pubmed no entry]

2.      Dudoit S, Fridlyand J. 2003. Bagging to improve the accuracy of a clustering procedure. Bioinformatics. 2003 Jun 12;19(9):1090-9. [Pubmed]

3.      Hastie T, Tibshirani R, Friedman JH. The Elements of Statistical Learning. (Springer) [Amazon.com]

Voting Methods/Boosting

  1. Bijlani R, Cheng Y, Pearce DA, Brooks AI, Ogihara M. 2003. Prediction of biologically significant components from microarray data: Independently Consistent Expression Discriminator (ICED). Bioinformatics. 2003 Jan;19(1):62-70.[Pubmed]

  2. Dettling M, Buhlmann P. Boosting for tumor classification with gene expression data. Bioinformatics. 2003 Jun 12;19(9):1061-9. [Pubmed]

  3. Dudoit S, Fridlyand J. Bagging to improve the accuracy of a clustering procedure. Bioinformatics. 2003 Jun 12;19(9):1090-9. [Pubmed]

  4. Schapire R.E., Y. Freund, P. Bartlett, W.S. Lee. 1998. Boosting the Margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, vol.26, pp. 1651-1686.[Pubmed no entry]

Jackknife to Reduce False Positives

1.      Lyons-Weiler J, Patel S, Bhattacharya S. (2003) A classification-based machine learning approach for the analysis of genome-wide expression data. Genome Res Mar;13(3):503-12[Pubmed]

VII. Neural Networks/AI

  1. Ando T, Suguro M, Hanai T, Kobayashi T, Honda H, Seto M. 2002. Fuzzy neural network applied to gene expression profiling for predicting the prognosis of diffuse large B-cell lymphoma. Jpn J Cancer Res. 93,1207-12.[Pubmed]

  2. Azuaje F. 2001. A computational neural approach to support the discovery of gene function and classes of cancer. IEEE Trans Biomed Eng. 48,332-9.[Pubmed]

  3. Bicciato S, Pandin M, Didone G, Di Bello C. Pattern identification and classification in gene expression data using an autoassociative neural network model. Biotechnol Bioeng. 2003 Mar 5;81(5):594-606.[Pubmed]

  4. Bishop, C. M. 1995. Neural Networks for Pattern Recognition. Oxford University Press.[Amazon.com]

  5. Bishop, C. M. 1999. Bayesian PCA. In M. S. Kearns, S. A. Solla, and D. A. Cohn (Eds.), Advances in Neural Information Processing Systems, Volume 11, pp. 382-388. MIT Press [Amazon.com]

  6. Deutsch JM. Evolutionary algorithms for finding optimal gene sets in microarray prediction. Bioinformatics. 2003 Jan;19(1):45-52.[Pubmed]

  7. Dettling M, Buhlmann P. Supervised clustering of genes. Genome Biol. 2002;3(12):RESEARCH0069. Epub 2002 Nov 25.[Pubmed]

  8. Gasch AP, Eisen MB. 2002. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol. 3,RESEARCH0059.[Pubmed]

  9. Huntsberger T.L. and Ajjimarangsee P. 1992. Parallel self-organising feature maps for unsupervised pattern recognition. In: Bezdek J.C. and Pal N.R., Editors, Fuzzy Models for Pattern Recognition, pp. 483-495. IEEE Press, New York. [Amazon.com]

  10. Jordan M. 1995. Why the logistic function? A tutorial discussion on probabilities and neural networks. TR 9503, Computational Cognitive Science, MIT. [CiteSeer]

  11. Jornsten R, Yu B. 2003. Simultaneous gene clustering and subset selection for sample classification via MDL. Bioinformatics. 2003 Jun 12;19(9):1100-9.[Pubmed]

  12. Khan J, Wei JS, Ringnér M, Saal LH, Ladanyi M, Westermann F, et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 2001;7:673-9.[Pubmed]

  13. Kohonen T, Somervuo P. How to make large self-organizing maps for nonvectorial data. Neural Netw 2002 Oct-Nov;15(8-9):945-52[Pubmed]

  14. Kohonen T. 1982. Self-organized formation of topologically correct feature maps. Biol. Cybern. 43,59-69. [Pubmed]

  15. Neal R. M. 1996. Bayesian Learning for Neural Networks, volume 118 of Lecture Notes in Statistics. Springer. [Amazon.com]

  16. Mateos A, Dopazo J, Jansen R, Tu Y, Gerstein M, Stolovitzky G. 2002. Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons. Genome Res. Nov;12(11):1703-15. [Pubmed]

  17. O'Neill MC, Song L. Neural network analysis of lymphoma microarray data: prognosis and diagnosis near-perfect. BMC Bioinformatics. 2003 Apr 10;4(1):13. [Pubmed]

  18. Ringner M, Peterson C. Microarray-based cancer diagnosis with artificial neural networks. Biotechniques. 2003 Mar;Suppl:30-5.[Pubmed]

  19. Ripley BD. Pattern recognition and neural networks. Cambridge (U.K.): Cambridge University Press; 1996.[Amazon.com]

  20. Sawa T, Ohno-Machado L. A neural network-based similarity index for clustering DNA microarray data. Comput Biol Med. 2003 Jan;33(1):1-15. [Pubmed]

  21. Selaru FM, Xu Y, Yin J, Zou T, Liu TC, Mori Y, Abraham JM, Sato F, Wang S, Twigg C, Olaru A, Shustova V, Leytin A, Hytiroglou P, Shibata D, Harpaz N, Meltzer SJ. 2002. Artificial neural networks distinguish among subtypes of neoplastic colorectal lesions. Gastroenterology 3,606-13. [Pubmed]

  22. Tipping, M.E. and C.M. Bishop. 1999. Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2):443-482. [Pubmed]

  23. Tomida S, Hanai T, Honda H, Kobayashi T. 2002. Analysis of expression profile using fuzzy adaptive resonance theory. Bioinformatics 18,1073-83.[Pubmed]

  24. Tsao, ECK, JC Bezdek and NR Pal. 1994. Fuzzy Kohonen clustering networks. Pattern Recognition 27,757-764.[Pubmed no entry]

VIII. Computational Validation

  1. Dudoit S., J. Fridlyand, and T. P. Speed. 2002a. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, Vol. 97, No. 457, pp. 77-87. [Pubmed]

  2. Efron B, Tibshirani R. Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc 1997;92:548-60. [Pubmed]

  3. Landgrebe J, Wurst W, Welzl G. Permutation-validated principal components analysis of microarray data. Genome Biol. 2002;3(4):RESEARCH0019. Epub 2002 Mar 22. [Pubmed]

  4. Felsenstein, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39: 783-791. [Pubmed]

IX. Evaluation/Comparisons

  1. Dudoit S., J. Fridlyand, and T. P. Speed. 2002a. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, Vol. 97, No. 457, pp. 77-87. [Pubmed]

  2. Lemon WJ, Palatini JJT, Krahe R, Wright FA. 2002. Theoretical and experimental comparisons of gene expression indexes for oligonucleotide arrays. Bioinformatics, Vol 18, 1470-1476. [Pubmed]

  3. Kooperberg C, Sipione S, LeBlanc M, Strand AD, Cattaneo E, Olson JM. 2002. Evaluating test statistics to select interesting genes in microarray experiments. Hum Mol Genet 11,2223-32. [Pubmed]

  4. Pan W. 2002. A Comparative Review of Statistical Methods for Discovering Differentially Expressed Genes in Replicated Microarray Experiments. Bioinformatics, 18, 546-554. [Pubmed]

  5. Powell DA, Anderson LM, Cheng RY, Alvord WG. 2002. Robustness of the Chen-Dougherty-Bittner procedure against non-normality and heterogeneity in the coefficient of variation. J Biomed Opt. 2002 Oct;7(4):650-60. [Pubmed]

Regulatory Networks

  1. Kato M, Tsunoda T, Takagi T. 2000. Inferring genetic networks from DNA microarray data by multiple regression analysis. Genome Inform Ser Workshop Genome Inform. 11:118-28. [Pubmed]

  2. Peterson LE. Partitioning large-sample microarray-based gene expression profiles using principal components analysis. Comput Methods Programs Biomed. 2003 Feb;70(2):107-19. [Pubmed]

  3. Bagirov AM, Ferguson B, Ivkovic S, Saunders G, Yearwood J. New algorithms for multi-class cancer diagnosis using tumor gene expression signatures. Bioinformatics. 2003 Sep 22;19(14):1800-7. [Pubmed]

  4. Yoo C, Cooper GF. Discovery of gene-regulation pathways using local causal search. Proc AMIA Symp. 2002;:914-8. [Pubmed]

X. Functional Interpretation

  1. Draghici S, Khatri P, Martins RP, Ostermeier GC, Krawetz SA. Global functional profiling of gene expression.  Genomics. 2003 Feb;81(2):98-104. [Pubmed]
  2. Masys DR, Welsh JB, Fink JL, Gribskov M, Klacansky I, Corbeil J. Use of keyword hierarchies to interpret gene expression patterns. Bioinformatics. 2001 Apr;17(4):319-26. [Pubmed]
  3. Fink JL, Drewes S, Patel H, Welsh JB, Masys DR, Corbeil J, Gribskov M. HAPI: a microarray data analysis system. Bioinformatics. 2003 Jul 22;19(11):1443-5. [Pubmed]

XI. Integrating diverse genomic and proteomic data sources

  1. The Gene Ontology Consortium. 2000. Gene Ontology: tool for the unification of biology. Nature Genetics 25: 25-29.[Pubmed]
  2. Fickett JW, Wasserman WW. 2000. Discovery and modeling of transcriptional regulatory regions. Curr Opin Biotechnol. 2000 Feb;11(1):19-24. [Pubmed]
  3. Suzuki S, Moore DH 2nd, Ginzinger DG, Godfrey TE, Barclay J, Powell B, Pinkel D, Zaloudek C, Lu K, Mills G, Berchuck A, Gray JW. 2000. An approach to analysis of large-scale correlations between genome changes and clinical endpoints in ovarian cancer. Cancer Res. 2000 Oct 1;60(19):5382-5.[Pubmed]
  4. Walhout AJ, Reboul J, Shtanko O, Bertin N, Vaglio P, Ge H, Lee H, Doucette-Stamm L, Gunsalus KC, Schetter AJ, Morton DG, Kemphues KJ, Reinke V, Kim SK, Piano F, Vidal M. 2002. Integrating interactome, phenome, and transcriptome mapping data for the C. elegans germline. Curr Biol. 2002 Nov 19;12(22):1952-8.[Pubmed]
  5. Zhou X, Kao MC, Wong WH. 2002. Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci U S A. 2002 Oct 1;99(20):12783-8. Epub 2002 Aug 26.[Pubmed]

XII. Other Software

At the NCI

  1. Gene Expression Data Portal (http://gedp.nci.nih.gov/dc/index.jsp)
  2. BRB-ArrayTools (http://linus.nci.nih.gov/BRB-ArrayTools.html) [Pubmed]
  3. GOMiner
  4. CIM Maker (Color-Coded Image Map)
  5. Gene Expression Data Analysis Workbench

Elsewhere

  1. Bioconductor (http://www.bioconductor.org/) Dudoit S, Gentleman RC, Quackenbush J. 2003. Open source software for the analysis of microarray data. Biotechniques. 2003 Mar;Suppl:45-51. [Pubmed]
  2. OntoTools (http://vortex.cs.wayne.edu/projects.htm) [Pubmed]
  3. OncoMine (http://www.oncomine.org/) [Pubmed]
  4. TIGR MeV (http://www.tigr.org/software/tm4/index.html) [Pubmed]
  5. See a list of other software tools at the Stanford Microarray Database page.

last updated 11/06/03 by JLW