Special issue: mixture analysis
A mixture model to characterize genomic alterations of tumors
[Un modèle de mélange pour caractériser les altérations génomiques tumorales]
Journal de la société française de statistique, Volume 160 (2019) no. 1, pp. 130-148.

Characterizing genomic copy number alterations (CNA) in cancer is of major importance for developing personalized medicine. Single nucleotide polymorphism (SNP) arrays are still used to measure CNA profiles. Among the methods for SNP-array analysis, the Genome Alteration Print (GAP) of Popova et al., which relies on a preliminary segmentation of SNP-array profiles, uses a deterministic approach to infer the absolute copy number profile. We develop a probabilistic model for GAP and define a Gaussian mixture model whose centers are constrained to belong to a frame that depends on unknown parameters such as the proportion of normal tissue. Estimation is performed with an expectation-maximization (EM) algorithm, which recovers the parameters characterizing the genomic alterations as well as the most probable copy number change of each segment and the unknown proportion of normal tissue. We propose to deduce the tumor ploidy from a penalized model selection criterion. Our model is tested on simulated and real data.

Characterizing copy number alterations in the genome is of major importance for developing personalized medicine in oncology. SNP (single nucleotide polymorphism) arrays, a variant of DNA microarrays, are still used to measure copy number alteration profiles. Among the methods for analyzing SNP profiles, the GAP (Genome Alteration Print) method of Popova et al., based on a preliminary segmentation of SNP-array profiles, uses a deterministic approach to determine the absolute copy number profile. We develop a probabilistic model for the GAP method and define a Gaussian mixture model whose centers are constrained to belong to a lattice that depends on unknown parameters such as the proportion of tumor tissue in the sample. Estimation is carried out with an EM (expectation-maximization) algorithm, which gives access not only to the parameters but also to the most probable altered copy number on each segment and to the unknown tumor proportion. We propose to deduce the tumor ploidy using a penalized model selection criterion. Our model is tested on simulated data and applied to an example of colon cancer data.
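The constrained-mixture idea summarized in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes one-dimensional data (one mean copy number value per segment), a shared variance, and centers of the form 2p + (1 - p)c, where p is the proportion of normal tissue and c is the tumor copy number of a segment. The function name fit_constrained_gmm and all of its parameters are hypothetical.

```python
import numpy as np


def fit_constrained_gmm(x, max_copy=4, p_init=0.5, sigma_init=0.1, n_iter=100):
    """Toy EM for a 1-D Gaussian mixture whose centers all depend on one parameter p.

    Assumed (illustrative) constraint: center_c = 2*p + (1 - p)*c, i.e. normal
    cells carry 2 copies and tumor cells carry c copies, c = 0..max_copy.
    """
    x = np.asarray(x, dtype=float)
    copies = np.arange(max_copy + 1)                  # candidate tumor copy numbers
    weights = np.full(copies.size, 1.0 / copies.size)
    p, sigma = float(p_init), float(sigma_init)
    slope = 2.0 - copies                              # derivative of each center w.r.t. p

    for _ in range(n_iter):
        # E-step: responsibilities, computed from log-densities for stability.
        centers = 2.0 * p + (1.0 - p) * copies
        log_r = np.log(weights + 1e-300) - 0.5 * ((x[:, None] - centers) / sigma) ** 2
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: mixing weights, then p by weighted least squares
        # (centers are linear in p), then the shared standard deviation.
        weights = resp.mean(axis=0)
        num = (resp * (x[:, None] - copies) * slope).sum()
        den = (resp * slope ** 2).sum()
        if den > 1e-12:                               # p is undefined if only c = 2 is used
            p = float(np.clip(num / den, 0.0, 0.95))
        centers = 2.0 * p + (1.0 - p) * copies
        sq_err = (resp * (x[:, None] - centers) ** 2).sum()
        sigma = max(float(np.sqrt(sq_err / x.size)), 1e-6)

    labels = copies[resp.argmax(axis=1)]              # most probable copy number per segment
    return p, sigma, weights, labels
```

Because every center moves with p, the M-step estimates a single p from all segments at once instead of one free mean per component. In this toy version the result depends on the starting values p_init and sigma_init, so in practice one would try several starting points and compare the fits.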

Keywords: mixture model, EM algorithm, BIC criterion, slope heuristics, cancer, GAP method, SNP-array
@article{JSFS_2019__160_1_130_0,
     author = {Keribin, Christine and Liu, Yi and Popova, Tatiana and Rozenholc, Yves},
     title = {A mixture model to characterize genomic alterations of tumors},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {130--148},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {160},
     number = {1},
     year = {2019},
     zbl = {1417.62316},
     mrnumber = {3928543},
     language = {en}
}
TY  - JOUR
AU  - Keribin, Christine
AU  - Liu, Yi
AU  - Popova, Tatiana
AU  - Rozenholc, Yves
TI  - A mixture model to characterize genomic alterations of tumors
JO  - Journal de la société française de statistique
PY  - 2019
DA  - 2019///
SP  - 130
EP  - 148
VL  - 160
IS  - 1
PB  - Société française de statistique
LA  - en
ID  - JSFS_2019__160_1_130_0
ER  - 
Keribin, Christine; Liu, Yi; Popova, Tatiana; Rozenholc, Yves. A mixture model to characterize genomic alterations of tumors. Journal de la société française de statistique, Volume 160 (2019) no. 1, pp. 130-148.

[1] Arlot, Sylvain. Minimal penalties and the slope heuristics: a survey, arXiv preprint arXiv:1901.07277 (2019) | MR 4021408 | Zbl 1437.62121

[2] Baudry, Jean-Patrick. Sélection de modèle pour la classification non supervisée. Choix du nombre de classes (2009) (Ph.D. thesis)

[3] Birgé, Lucien; Massart, Pascal. Gaussian model selection, Journal of the European Mathematical Society, Volume 3 (2001) no. 3, pp. 203-268 | MR 1848946 | Zbl 1037.62001

[4] Birgé, Lucien; Massart, Pascal. Minimal penalties for Gaussian model selection, Probability Theory and Related Fields, Volume 138 (2007) no. 1-2, pp. 33-73 | MR 2288064 | Zbl 1112.62082

[5] Baudry, Jean-Patrick; Maugis, Cathy; Michel, Bertrand. Slope heuristics: overview and implementation, Statistics and Computing, Volume 22 (2012) no. 2, pp. 455-470 | MR 2865029 | Zbl 1322.62007

[6] Carr, Steven M; Marshall, H Dawn; Duggan, Ana T; Flynn, Sarah MC; Johnstone, Kimberley A; Pope, Angela M; Wilkerson, Corinne D. Phylogeographic genomics of mitochondrial DNA: highly-resolved patterns of intraspecific evolution and a multi-species, microarray-based DNA sequencing strategy for biodiversity studies, Comparative Biochemistry and Physiology Part D: Genomics and Proteomics, Volume 3 (2008) no. 1, pp. 1-11

[7] Comte, Fabienne; Rozenholc, Yves. A new algorithm for fixed design regression and denoising, Annals of the Institute of Statistical Mathematics, Volume 56 (2004) no. 3, pp. 449-473 | MR 2095013 | Zbl 1057.62030

[8] Dempster, Arthur P; Laird, Nan M; Rubin, Donald B. Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological) (1977), pp. 1-38 | MR 501537 | Zbl 0364.62022

[9] Keribin, Christine. Consistent estimation of the order of mixture models, Sankhyā: The Indian Journal of Statistics, Series A (2000), pp. 49-66 | MR 1769735 | Zbl 1081.62516

[10] Li, Ao; Liu, Zongzhi; Lezon-Geyda, Kimberly; Sarkar, Sudipa; Lannin, Donald; Schulz, Vincent; Krop, Ian; Winer, Eric; Harris, Lyndsay; Tuck, David. GPHMM: an integrated hidden Markov model for identification of copy number alteration and loss of heterozygosity in complex tumor samples using whole genome SNP arrays, Nucleic Acids Research, Volume 39 (2011) no. 12, pp. 4928-4941

[11] Mosén-Ansorena, David; Aransay, Ana M; Rodríguez-Ezpeleta, Naiara. Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data, BMC Bioinformatics, Volume 13 (2012) no. 1

[12] Popova, Tatiana; Manié, Elodie; Stoppa-Lyonnet, Dominique; Rigaill, Guillem; Barillot, Emmanuel; Stern, Marc Henri. Genome Alteration Print (GAP): a tool to visualize and mine complex cancer genomic profiles obtained by SNP arrays, Genome Biology, Volume 10 (2009) no. 11, p. R128

[13] Pinkel, Daniel; Segraves, Richard; Sudar, Damir; Clark, Steven; Poole, Ian; Kowbel, David; Collins, Colin; Kuo, Wen-Lin; Chen, Chira; Zhai, Ye. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays, Nature Genetics, Volume 20 (1998) no. 2, pp. 207-211

[14] Staaf, Johan; Lindgren, David; Vallon-Christersson, Johan; Isaksson, Anders; Goransson, Hanna; Juliusson, Gunnar; Rosenquist, Richard; Hoglund, Mattias; Borg, Ake; Ringner, Markus. Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays, Genome Biology, Volume 9 (2008) no. 9

[15] Solinas-Toldo, Sabina; Lampel, Stefan; Stilgenbauer, Stephan; Nickolenko, Jeremy; Benner, Axel; Döhner, Hartmut; Cremer, Thomas; Lichter, Peter. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances, Genes, Chromosomes and Cancer, Volume 20 (1997) no. 4, pp. 399-407

[16] Sun, Wei; Wright, Fred A; Tang, Zhengzheng; Nordgard, Silje H; Van Loo, Peter; Yu, Tianwei; Kristensen, Vessela N; Perou, Charles M. Integrated study of copy number states and genotype calls using high-density SNP arrays, Nucleic Acids Research, Volume 37 (2009) no. 16, pp. 5365-5377

[17] Van Loo, Peter; Nordgard, Silje H; Lingjærde, Ole Christian; Russnes, Hege G; Rye, Inga H; Sun, Wei; Weigman, Victor J; Marynen, Peter; Zetterberg, Anders; Naume, Bjørn. Allele-specific copy number analysis of tumors, Proceedings of the National Academy of Sciences, Volume 107 (2010) no. 39, pp. 16910-16915

[18] Yau, Christopher; Mouradov, Dmitri; Jorissen, Robert N; Colella, Stefano; Mirza, Ghazala; Steers, Graham; Harris, Adrian; Ragoussis, Jiannis; Sieber, Oliver; Holmes, Christopher C. A statistical approach for detecting genomic aberrations in heterogeneous tumor samples from single nucleotide polymorphism genotyping data, Genome Biology, Volume 11 (2010) no. 9, p. R92