Special issue: mixture analysis
Testing for univariate two-component Gaussian mixture in practice
[How can a Gaussian mixture be identified in practice? A comparative study of tests]
Journal de la société française de statistique, Tome 160 (2019) no. 1, pp. 86-113.

After a general presentation of the mixture problem, with the aim of determining the number of components, we focus more specifically on univariate Gaussian mixtures. An abundant literature has been devoted to this field, but procedures for putting the theoretical results into practice, and comparative studies of the various procedures, are sorely lacking. We aim to contribute in this direction in order to facilitate applications. To test a hypothesis of homogeneity against a two-component mixture hypothesis, we retain two broad families of tests: likelihood ratio tests (LRT) and EM tests. In particular, we propose for the LRT a plug-in approach for certain parameters that are assumed known in the asymptotic theory, which makes these tests usable in practice. For the four mixture cases considered here, we provide critical values and compare the performance of these tests in terms of power. We illustrate their implementation on real data on the time between ovulation and lambing in ewes, as part of a project in the Région Centre.

We consider the theory and applications of univariate Gaussian mixtures, and in particular the problem of testing the null hypothesis of homogeneity (one component) against a two-component alternative. Several approaches have been proposed in the literature over the last decades. We focus on two techniques: one based on the likelihood ratio test (LRT), and one, often called the EM-test, based on estimating the mixture parameters with a specific adaptation of the well-known EM algorithm. In particular, we propose a novel methodology that allows the LRT to be applied in actual situations, by plugging in estimates of parameters that are assumed known in the asymptotic setup. We aim to provide useful comparisons between the different techniques, together with guidelines that enable practitioners to use theoretical advances when analyzing actual data of realistic sample sizes. Finally, we illustrate these methods in an application to real data corresponding to the number of days between two events concerning ovarian response and lambing for ewes.
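To make the testing problem concrete, the following is a minimal Python sketch of a likelihood ratio statistic for one Gaussian component against a two-component Gaussian mixture. It is not the plug-in LRT or the EM-test studied in the article: the mixture is fitted with a plain EM algorithm, and the statistic is calibrated by a parametric bootstrap under the fitted null model, a swapped-in technique chosen only because it needs no asymptotic theory. All function names, the quartile-based starting values, and the lower bound on the component standard deviations (`sd_floor_ratio`) are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mean_sd(data):
    """Maximum-likelihood mean and standard deviation of a sample."""
    n = len(data)
    mu = sum(data) / n
    return mu, math.sqrt(sum((x - mu) ** 2 for x in data) / n)


def loglik_h0(data):
    """Maximized log-likelihood of a single Gaussian (homogeneity)."""
    _, sd = mean_sd(data)
    return -0.5 * len(data) * (math.log(2.0 * math.pi * sd * sd) + 1.0)


def loglik_h1(data, n_iter=200, sd_floor_ratio=0.05):
    """Log-likelihood reached by EM for a two-component Gaussian mixture.

    Assumptions of this sketch: starting means at the sample quartiles, and
    component standard deviations bounded below to avoid the degenerate
    solutions of the unconstrained likelihood.
    """
    n = len(data)
    srt = sorted(data)
    _, sd = mean_sd(data)
    floor = sd_floor_ratio * sd
    p, mu1, mu2, s1, s2 = 0.5, srt[n // 4], srt[(3 * n) // 4], sd, sd
    for _ in range(n_iter):
        # E-step: posterior probability that each point comes from component 1
        resp = []
        for x in data:
            a = p * normal_pdf(x, mu1, s1)
            b = (1.0 - p) * normal_pdf(x, mu2, s2)
            resp.append(a / (a + b) if a + b > 0.0 else 0.5)
        w1 = sum(resp)
        w2 = n - w1
        if min(w1, w2) < 1e-8:  # one component has vanished: stop updating
            break
        # M-step: weighted proportion, means and standard deviations
        p = w1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / w1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / w2
        v1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / w1
        v2 = sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / w2
        s1, s2 = max(floor, math.sqrt(v1)), max(floor, math.sqrt(v2))
    return sum(
        math.log(max(1e-300, p * normal_pdf(x, mu1, s1)
                     + (1.0 - p) * normal_pdf(x, mu2, s2)))
        for x in data
    )


def lrt_statistic(data):
    """Twice the log-likelihood gain of the mixture fit, clamped at zero
    because EM may stop at a local solution slightly below the null fit."""
    return max(0.0, 2.0 * (loglik_h1(data) - loglik_h0(data)))


def bootstrap_p_value(data, n_boot=99, seed=0):
    """Parametric bootstrap: simulate from the fitted single Gaussian and
    count how often the simulated statistic reaches the observed one."""
    rng = random.Random(seed)
    mu, sd = mean_sd(data)
    observed = lrt_statistic(data)
    exceed = 0
    for _ in range(n_boot):
        sim = [rng.gauss(mu, sd) for _ in data]
        if lrt_statistic(sim) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (n_boot + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = ([rng.gauss(0.0, 1.0) for _ in range(50)]
              + [rng.gauss(6.0, 1.0) for _ in range(50)])
    stat, p_val = bootstrap_p_value(sample, n_boot=49)
    print(f"LRT statistic = {stat:.2f}, bootstrap p-value = {p_val:.3f}")
```

The bootstrap is used here because the usual chi-square approximation for the likelihood ratio does not hold in this non-regular problem. Its cost grows with `n_boot` and the sample size, which is one practical motivation for tabulated critical values.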

Keywords: Mixture model, Likelihood ratio test, EM test, Gaussian process
@article{JSFS_2019__160_1_86_0,
     author = {Chauveau, Didier and Garel, Bernard and Mercier, Sabine},
     title = {Testing for univariate two-component {Gaussian} mixture in practice},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {86--113},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {160},
     number = {1},
     year = {2019},
     zbl = {1417.62033},
     mrnumber = {3928541},
     language = {en},
     url = {http://www.numdam.org/item/JSFS_2019__160_1_86_0/}
}
TY  - JOUR
AU  - Chauveau, Didier
AU  - Garel, Bernard
AU  - Mercier, Sabine
TI  - Testing for univariate two-component Gaussian mixture in practice
JO  - Journal de la société française de statistique
PY  - 2019
DA  - 2019///
SP  - 86
EP  - 113
VL  - 160
IS  - 1
PB  - Société française de statistique
UR  - http://www.numdam.org/item/JSFS_2019__160_1_86_0/
UR  - https://zbmath.org/?q=an%3A1417.62033
UR  - https://www.ams.org/mathscinet-getitem?mr=3928541
LA  - en
ID  - JSFS_2019__160_1_86_0
ER  - 
Chauveau, Didier; Garel, Bernard; Mercier, Sabine. Testing for univariate two-component Gaussian mixture in practice. Journal de la société française de statistique, Tome 160 (2019) no. 1, pp. 86-113. http://www.numdam.org/item/JSFS_2019__160_1_86_0/
