Characterizing genomic copy number alterations (CNAs) in cancer is of major importance for developing personalized medicine. Single nucleotide polymorphism (SNP) arrays are still used to measure CNA profiles. Among the methods for SNP-array analysis, the Genome Alteration Print (GAP) of Popova et al., which relies on a preliminary segmentation of SNP-array profiles, uses a deterministic approach to infer the absolute copy number profile. We develop a probabilistic model for GAP and define a Gaussian mixture model whose centers are constrained to lie on a lattice that depends on unknown parameters such as the proportion of normal tissue. Estimation is performed with an expectation-maximization (EM) algorithm, which recovers the parameters characterizing the genomic alterations as well as the most probable copy number change of each segment and the unknown proportion of normal tissue. We propose to deduce the tumor ploidy from a penalized model selection criterion. Our model is tested on simulated and real data.
Characterizing copy number alterations in the genome is of capital importance for developing personalized medicine in oncology. SNP (Single Nucleotide Polymorphism) arrays, a variant of DNA microarrays, are still used to measure copy number alteration profiles. Among the methods for analyzing SNP profiles, the GAP (Genome Alteration Print) method of Popova et al., based on a preliminary segmentation of profiles from SNP arrays, uses a deterministic approach to determine the absolute copy number profile. We develop a probabilistic model for the GAP method and define a Gaussian mixture model whose centers are constrained to belong to a lattice depending on unknown parameters such as the proportion of tumor tissue in the sample. Estimation is carried out with an EM (expectation-maximization) algorithm, which gives access not only to the parameters but also to the most probable altered copy number on each segment and to the unknown tumor proportion. We propose to deduce the tumor ploidy using a penalized model selection criterion. Our model is tested on simulated data and applied to an example of colon cancer data.
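To make the idea of a mixture with constrained centers concrete, here is a minimal Python sketch. It is not the authors' GAP model: it assumes a simplified one-dimensional setting in which the centers lie on a lattice mu_k = a + b*k with unknown offset a and spacing b, and a single variance shared by all components. The function name `em_lattice_mixture`, its parameters, and the simulated data are all hypothetical, chosen only for illustration.

```python
import numpy as np


def em_lattice_mixture(x, n_comp, a0, b0, n_iter=200):
    """EM for a 1-D Gaussian mixture with centers mu_k = a + b*k (illustrative only).

    Assumptions (not from the paper): shared variance, centers on a regular
    lattice indexed by k = 0..n_comp-1, starting values a0 and b0 supplied.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    k = np.arange(n_comp)
    a, b = float(a0), float(b0)
    weights = np.full(n_comp, 1.0 / n_comp)
    sigma2 = np.var(x) / n_comp  # crude starting value for the shared variance

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        # The Gaussian normalizing constant is shared, so it cancels out.
        mu = a + b * k
        log_r = np.log(weights + 1e-300) - 0.5 * (x[:, None] - mu) ** 2 / sigma2
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step for the mixing proportions.
        n_k = r.sum(axis=0)
        weights = n_k / n

        # M-step for (a, b): weighted least squares of x on the lattice index k,
        # with weights r[i, k]. This is what enforces the lattice constraint.
        k_bar = (n_k * k).sum() / n
        x_bar = x.mean()  # each row of r sums to 1
        s_kk = (n_k * (k - k_bar) ** 2).sum()
        s_kx = (r * (x[:, None] - x_bar) * (k - k_bar)).sum()
        b = s_kx / s_kk
        a = x_bar - b * k_bar

        # M-step for the shared variance, using the updated centers.
        sigma2 = (r * (x[:, None] - (a + b * k)) ** 2).sum() / n

    labels = r.argmax(axis=1)  # most probable component for each point
    return a, b, sigma2, weights, labels


# Simulated data (hypothetical): three lattice points 0.5, 1.5, 2.5.
rng = np.random.default_rng(0)
true_comp = rng.integers(0, 3, size=600)
x = 0.5 + 1.0 * true_comp + rng.normal(0.0, 0.05, size=600)

a_hat, b_hat, sigma2_hat, w_hat, labels = em_lattice_mixture(x, 3, a0=0.3, b0=1.2)
print(a_hat, b_hat, sigma2_hat)
```

In this sketch the constraint is linear in (a, b), so the M-step has a closed form. A lattice that depends nonlinearly on its parameters, as with a normal-tissue proportion, would generally need a numerical optimization step instead. The number of components is fixed here, whereas the abstract selects the model with a penalized criterion.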
@article{JSFS_2019__160_1_130_0,
  author = {Keribin, Christine and Liu, Yi and Popova, Tatiana and Rozenholc, Yves},
  title = {A mixture model to characterize genomic alterations of tumors},
  journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
  pages = {130--148},
  publisher = {Soci\'et\'e fran\c{c}aise de statistique},
  volume = {160},
  number = {1},
  year = {2019},
  zbl = {1417.62316},
  mrnumber = {3928543},
  language = {en},
  url = {http://www.numdam.org/item/JSFS_2019__160_1_130_0/}
}
TY  - JOUR
AU  - Keribin, Christine
AU  - Liu, Yi
AU  - Popova, Tatiana
AU  - Rozenholc, Yves
TI  - A mixture model to characterize genomic alterations of tumors
JO  - Journal de la société française de statistique
PY  - 2019
DA  - 2019///
SP  - 130
EP  - 148
VL  - 160
IS  - 1
PB  - Société française de statistique
UR  - http://www.numdam.org/item/JSFS_2019__160_1_130_0/
UR  - https://zbmath.org/?q=an%3A1417.62316
UR  - https://www.ams.org/mathscinet-getitem?mr=3928543
LA  - en
ID  - JSFS_2019__160_1_130_0
ER  -
Keribin, Christine; Liu, Yi; Popova, Tatiana; Rozenholc, Yves. A mixture model to characterize genomic alterations of tumors. Journal de la société française de statistique, Tome 160 (2019) no. 1, pp. 130-148. http://www.numdam.org/item/JSFS_2019__160_1_130_0/
[1] Minimal penalties and the slope heuristics: a survey, arXiv preprint arXiv:1901.07277 (2019) | MR 4021408 | Zbl 1437.62121
[2] Sélection de modèle pour la classification non supervisée. Choix du nombre de classes. (2009) (Ph. D. Thesis)
[3] Gaussian model selection, Journal of the European Mathematical Society, Volume 3 (2001) no. 3, pp. 203-268 | MR 1848946 | Zbl 1037.62001
[4] Minimal penalties for Gaussian model selection, Probability Theory and Related Fields, Volume 138 (2007) no. 1-2, pp. 33-73 | MR 2288064 | Zbl 1112.62082
[5] Slope heuristics: overview and implementation, Statistics and Computing, Volume 22 (2012) no. 2, pp. 455-470 | MR 2865029 | Zbl 1322.62007
[6] Phylogeographic genomics of mitochondrial DNA: highly-resolved patterns of intraspecific evolution and a multi-species, microarray-based DNA sequencing strategy for biodiversity studies, Comparative Biochemistry and Physiology Part D: Genomics and Proteomics, Volume 3 (2008) no. 1, pp. 1-11
[7] A new algorithm for fixed design regression and denoising, Annals of the Institute of Statistical Mathematics, Volume 56 (2004) no. 3, pp. 449-473 | MR 2095013 | Zbl 1057.62030
[8] Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological) (1977), pp. 1-38 | MR 501537 | Zbl 0364.62022
[9] Consistent estimation of the order of mixture models, Sankhyā: The Indian Journal of Statistics, Series A (2000), pp. 49-66 | MR 1769735 | Zbl 1081.62516
[10] GPHMM: an integrated hidden Markov model for identification of copy number alteration and loss of heterozygosity in complex tumor samples using whole genome SNP arrays, Nucleic Acids Research, Volume 39 (2011) no. 12, pp. 4928-4941
[11] Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data, BMC Bioinformatics, Volume 13 (2012) no. 1
[12] Genome Alteration Print (GAP): a tool to visualize and mine complex cancer genomic profiles obtained by SNP arrays, Genome Biology, Volume 10 (2009) no. 11, p. R128
[13] High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays, Nature Genetics, Volume 20 (1998) no. 2, pp. 207-211
[14] Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays, Genome Biology, Volume 9 (2008) no. 9 http://genomebiology.com/2008/9/9/R136
[15] Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances, Genes, Chromosomes and Cancer, Volume 20 (1997) no. 4, pp. 399-407
[16] Integrated study of copy number states and genotype calls using high-density SNP arrays, Nucleic Acids Research, Volume 37 (2009) no. 16, pp. 5365-5377
[17] Allele-specific copy number analysis of tumors, Proceedings of the National Academy of Sciences, Volume 107 (2010) no. 39, pp. 16910-16915
[18] A statistical approach for detecting genomic aberrations in heterogeneous tumor samples from single nucleotide polymorphism genotyping data, Genome Biology, Volume 11 (2010) no. 9, p. R92