Characterization of lung tumor subtypes through gene expression cluster validity assessment
RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications, Volume 40 (2006) no. 2, pp. 163-176.

The problem of assessing the reliability of clusters of patients identified by clustering algorithms is crucial for estimating the significance of disease subclasses detectable at the bio-molecular level and, more generally, for supporting the bio-medical discovery of patterns in gene expression data. In this paper we present an experimental analysis of the reliability of clusters discovered in lung tumor patients using DNA microarray data. In particular, we investigate whether subclasses of lung adenocarcinoma can be detected with high reliability at the bio-molecular level. To this end we apply cluster validity measures based on random projections, recently proposed by Bertoni and coworkers. The results show that at least two subclasses of lung adenocarcinoma can be detected with relatively high reliability, confirming and extending previous findings reported in the literature.
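The abstract names the random-projection validity measures of Bertoni and coworkers [4-6] but does not spell them out. The following minimal Python sketch illustrates only the general idea. It clusters the patients once in the original space, re-clusters many randomly projected copies of the data, and scores each original cluster by how often its members stay together. Every concrete choice here is an assumption made for illustration: the Gaussian projection matrix, plain k-means with farthest-first initialisation, the projected dimension, the number of projections, and the function names (`gaussian_projection`, `kmeans`, `stability_scores`) are all hypothetical. None of them is taken from the paper or from the authors' software [30].

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_projection(data, target_dim, rng):
    """Map each row of `data` (n x d) to `target_dim` coordinates with a random
    Gaussian matrix scaled by 1/sqrt(target_dim), so squared norms are preserved
    in expectation (assumed projection type, for illustration only)."""
    d = len(data[0])
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(target_dim) for _ in range(target_dim)]
            for _ in range(d)]
    return [[sum(x[j] * proj[j][c] for j in range(d)) for c in range(target_dim)]
            for x in data]


def kmeans(data, k, rng, iters=50):
    """Lloyd's k-means with farthest-first initialisation; returns one label per row."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        # Next initial center: the point farthest from all centers chosen so far.
        far = max(data, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(data)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in data]
        # Move each non-empty center to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def stability_scores(data, k, n_projections=20, target_dim=2, seed=0):
    """Hypothetical per-cluster score: for each cluster of the reference partition
    (k-means in the original space), the average fraction of projected clusterings
    in which pairs of its members land in the same cluster. Clusters with fewer
    than two members are skipped."""
    rng = random.Random(seed)
    n = len(data)
    reference = kmeans(data, k, rng)
    together = [[0] * n for _ in range(n)]  # co-membership counts for pairs i < j
    for _ in range(n_projections):
        labels = kmeans(gaussian_projection(data, target_dim, rng), k, rng)
        for i in range(n):
            for j in range(i + 1, n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    scores = {}
    for c in range(k):
        members = [i for i in range(n) if reference[i] == c]
        pairs = [(i, j) for i in members for j in members if i < j]
        if pairs:
            scores[c] = sum(together[i][j] for i, j in pairs) / (len(pairs) * n_projections)
    return scores


if __name__ == "__main__":
    # Two well-separated synthetic groups in 10 dimensions (illustrative data only).
    gen = random.Random(1)
    group_a = [[gen.gauss(0.0, 0.1) for _ in range(10)] for _ in range(15)]
    group_b = [[20.0 + gen.gauss(0.0, 0.1) for _ in range(10)] for _ in range(15)]
    print(stability_scores(group_a + group_b, k=2))
```

A score near 1 means the cluster is preserved under the random distortions of the data. The Johnson-Lindenstrauss lemma [21] is what makes low-distortion random projections possible, but the projected dimension used here is arbitrary rather than derived from it. The measures actually applied in the paper, and their parameters, should be taken from the works of Bertoni and coworkers [4-6], not from this sketch.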

DOI: 10.1051/ita:2006011
Classification: 62H30, 62P10, 92C50
Keywords: cluster validity, clustering algorithms, bio-molecular taxonomy of tumors, DNA microarray data analysis
@article{ITA_2006__40_2_163_0,
     author = {Valentini, Giorgio and Ruffino, Francesca},
     title = {Characterization of lung tumor subtypes through gene expression cluster validity assessment},
     journal = {RAIRO - Theoretical Informatics and Applications - Informatique Th\'eorique et Applications},
     pages = {163--176},
     publisher = {EDP-Sciences},
     volume = {40},
     number = {2},
     year = {2006},
     doi = {10.1051/ita:2006011},
     mrnumber = {2252634},
     zbl = {1108.62122},
     language = {en},
     url = {http://www.numdam.org/articles/10.1051/ita:2006011/}
}
Valentini, Giorgio; Ruffino, Francesca. Characterization of lung tumor subtypes through gene expression cluster validity assessment. RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications, Volume 40 (2006) no. 2, pp. 163-176. doi : 10.1051/ita:2006011. http://www.numdam.org/articles/10.1051/ita:2006011/

[1] A. Alizadeh, D.T. Ross, C.M. Perou and M. Van De Rijn, Towards a novel classification of human malignancies based on gene expression. J. Pathol. 195 (2001) 41-52.

[2] R. Anbazhagan et al., Classification of small cell lung cancer and pulmonary carcinoid by gene expression profiles. Cancer Research 59 (1999) 5119-5122.

[3] F. Azuaje, A cluster validity framework for genome expression data. Bioinformatics 18 (2002) 319-320.

[4] A. Bertoni, R. Folgieri, F. Ruffino and G. Valentini, Assessment of clusters reliability for high dimensional genomic data, in BITS 2005, Bioinformatics Italian Society Meeting, Milano Italy (2005).

[5] A. Bertoni and G. Valentini, Random projections for assessing gene expression cluster stability, in IJCNN 2005, The IEEE-INNS International Joint Conference on Neural Networks, Montreal (2005).

[6] A. Bertoni and G. Valentini, Randomized maps for assessing the reliability of patients clusters in DNA microarray data analyses. Artif. Intell. Med. (in press)

[7] J.C. Bezdek and N.R. Pal, Some new indexes of cluster validity. IEEE Trans. Systems, Man and Cybernetics Part B 28 (1998) 301-315.

[8] A. Bhattacharjee, W.G. Richards, J. Staunton, C. Li, S. Monti, P. Vasa, C. Ladd, J. Beheshti, R. Bueno, M. Gillette, M. Loda, G. Weber, E.J. Mark, E.S. Lander, W. Wong, B.E. Johnson, T.R. Golub, D.J. Sugarbaker and M. Meyerson, Classification of human lung carcinoma by mRNA expression profiling reveals distinct adenocarcinoma subclasses. PNAS 98 (2001) 13790-13795.

[9] N. Bolshakova, F. Azuaje and P. Cunningham, An integrated tool for microarray data clustering and cluster validity assessment. Bioinformatics 21 (2005) 451-455.

[10] O.S. Breathnach et al., Clinical features of patients with stage IIIB and IV bronchioloalveolar carcinoma of the lung. Cancer 86 (1999) 1165-1173.

[11] P. Cheeseman and J. Stutz, Bayesian classification (AutoClass): Theory and results, in Advances in Knowledge Discovery and Data Mining, edited by U. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, MIT Press, Cambridge, MA 2 (1996) 153-180.

[12] J.J. Chen, R. Delongchamp, C. Tsai, H. Hsueh, F. Sistare, K. Thompson, V. Desai and J. Fuscoe, Analysis of variance components in gene expression data. Bioinformatics 20 (2004) 1436-1446.

[13] D.L. Davies and D.W. Bouldin, A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence 1 (1979) 224-227.

[14] S. Dudoit and J. Fridlyand, A prediction-based method for estimating the number of clusters in a dataset. Genome Biology 3 (2002) 1-21.

[15] S. Dudoit and J. Fridlyand, Bagging to improve the accuracy of a clustering procedure. Bioinformatics 19 (2003) 1090-1099.

[16] J. Dunn, Well separated clusters and optimal fuzzy partitions. J. Cybernetics 4 (1974) 95-104.

[17] M.E. Garber et al., Diversity of gene expression in adenocarcinoma of the lung. PNAS 98 (2001) 13784-13789.

[18] J.A. Hartigan and M.A. Wong, A k-means clustering algorithm. Appl. Stat. 28 (1979) 100-108.

[19] T.K. Ho, The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 832-844.

[20] A.K. Jain, M.N. Murty and P.J. Flynn, Data Clustering: a Review. ACM Computing Surveys 31 (1999) 264-323.

[21] W.B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, in Conference in modern analysis and probability, Contemporary Mathematics. Amer. Math. Soc. 26 (1984) 189-206.

[22] L. Kaufman and P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990).

[23] M.K. Kerr and G.A. Churchill, Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments. PNAS 98 (2001) 8961-8965.

[24] B. King, Step-wise clustering procedures. J. Am. Stat. Assoc. 62 (1967) 86-101.

[25] L.M. McShane, D. Radmacher, B. Freidlin, R. Yu, M.C. Li and R. Simon, Method for assessing reproducibility of clustering patterns observed in analyses of microarray data. Bioinformatics 18 (2002) 1462-1469.

[26] S. Monti, P. Tamayo, J. Mesirov and T. Golub, Consensus Clustering: A Resampling-based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning 52 (2003) 91-118.

[27] P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comp. App. Math. 20 (1987) 53-65.

[28] M. Smolkin and D. Ghosh, Cluster stability scores for microarray data in cancer studies. BMC Bioinformatics 4 (2003) 36.

[29] J.B. Sorensen, F.R. Hirsch, A. Gazdar and J.E. Olsen, Interobserver variability in histopathologic subtyping and grading of pulmonary adenocarcinoma. Cancer 71 (1993) 2971-2976.

[30] G. Valentini, Clusterv: a tool for assessing the reliability of clusters discovered in DNA microarray data. Bioinformatics 22 (2006) 369-370.

[31] J.H. Ward, Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58 (1963) 236-244.
