A non asymptotic penalized criterion for gaussian mixture model selection
ESAIM: Probability and Statistics, Volume 15 (2011), pp. 41-68.

Specific Gaussian mixtures are considered to solve variable selection and clustering problems simultaneously. A non-asymptotic penalized criterion is proposed to choose the number of mixture components and the relevant variable subset. Because the Kullback-Leibler contrast associated with Gaussian mixtures is nonlinear, a general model selection theorem for maximum likelihood estimation, proposed by Massart [Concentration inequalities and model selection, Springer, Berlin (2007); lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003], is used to obtain the form of the penalty function. This theorem requires controlling the bracketing entropy of Gaussian mixture families. Both the ordered and non-ordered variable selection cases are addressed in this paper.
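
As a reading aid, the generic shape of a penalized maximum likelihood criterion can be sketched as follows. The notation (observations X_1, ..., X_n, model collection \mathcal{M}, model S_m indexed by m = (K, v), penalty pen) is assumed here for illustration and is not taken from the abstract. The paper's actual penalty, derived from the bracketing entropy, is not reproduced.

```latex
% Generic penalized maximum likelihood selection (illustrative notation only).
% K: number of mixture components; v: selected variable subset; m = (K, v).
% S_m: family of Gaussian mixture densities associated with m.
\hat{s}_m = \operatorname*{arg\,min}_{t \in S_m} \; -\frac{1}{n} \sum_{i=1}^{n} \ln t(X_i),
\qquad
\hat{m} = \operatorname*{arg\,min}_{m \in \mathcal{M}}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n} \ln \hat{s}_m(X_i) + \operatorname{pen}(m) \right\}.
```

With D_m denoting the number of free parameters of S_m (again an assumed symbol), the choice pen(m) = D_m / n corresponds to AIC [1] and pen(m) = D_m \ln n / (2n) to BIC [29], both on this normalized scale.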

DOI: https://doi.org/10.1051/ps/2009004
Classification: 62H30, 62G07
Keywords: model-based clustering, variable selection, penalized likelihood criterion, bracketing entropy
@article{PS_2011__15__41_0,
     author = {Maugis, Cathy and Michel, Bertrand},
     title = {A non asymptotic penalized criterion for gaussian mixture model selection},
     journal = {ESAIM: Probability and Statistics},
     pages = {41--68},
     publisher = {EDP-Sciences},
     volume = {15},
     year = {2011},
     doi = {10.1051/ps/2009004},
     mrnumber = {2870505},
     language = {en},
     url = {http://www.numdam.org/articles/10.1051/ps/2009004/}
}
Maugis, Cathy; Michel, Bertrand. A non asymptotic penalized criterion for gaussian mixture model selection. ESAIM: Probability and Statistics, Volume 15 (2011), pp. 41-68. doi: 10.1051/ps/2009004. http://www.numdam.org/articles/10.1051/ps/2009004/

[1] H. Akaike, Information theory and an extension of the maximum likelihood principle, in Second International Symposium on Information Theory (Tsahkadsor, 1971), Akadémiai Kiadó, Budapest (1973) 267-281. | MR 483125 | Zbl 0283.62006

[2] S. Arlot and P. Massart, Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res. (2008) (to appear).

[3] J.D. Banfield and A.E. Raftery, Model-based Gaussian and non-Gaussian clustering. Biometrics 49 (1993) 803-821. | MR 1243494 | Zbl 0794.62034

[4] A. Barron, L. Birgé and P. Massart, Risk bounds for model selection via penalization. Prob. Th. Rel. Fields 113 (1999) 301-413. | MR 1679028 | Zbl 0946.62036

[5] J.-P. Baudry, Clustering through model selection criteria. Poster session at One Day Statistical Workshop in Lisieux, June (2007). http://www.math.u-psud.fr/~baudry

[6] C. Biernacki, G. Celeux and G. Govaert, Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 719-725.

[7] C. Biernacki, G. Celeux, G. Govaert and F. Langrognet, Model-based cluster and discriminant analysis with the mixmod software. Comput. Stat. Data Anal. 51 (2006) 587-600. | MR 2297473 | Zbl 1157.62431

[8] L. Birgé and P. Massart, Gaussian model selection. J. Eur. Math. Soc. 3 (2001) 203-268. | MR 1848946 | Zbl 1037.62001

[9] L. Birgé and P. Massart, A generalized Cp criterion for Gaussian model selection. Prépublication n° 647, Universités de Paris 6 et Paris 7 (2001).

[10] L. Birgé and P. Massart, Minimal penalties for Gaussian model selection. Prob. Th. Rel. Fields 138 (2007) 33-73. | MR 2288064 | Zbl 1112.62082

[11] L. Birgé and P. Massart, From model selection to adaptive estimation, in Festschrift for Lucien Le Cam. Springer, New York (1997) 55-87. | MR 1462939 | Zbl 0920.62042

[12] C. Bouveyron, S. Girard and C. Schmid, High-Dimensional Data Clustering. Comput. Stat. Data Anal. 52 (2007) 502-519. | MR 2409998

[13] K.P. Burnham and D.R. Anderson, Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer-Verlag, New York, 2nd edition (2002). | MR 1919620 | Zbl 1005.62007

[14] G. Castellan, Modified Akaike's criterion for histogram density estimation. Technical report, Université Paris-Sud 11 (1999).

[15] G. Castellan, Density estimation via exponential model selection. IEEE Trans. Inf. Theory 49 (2003) 2052-2060. | MR 2004713 | Zbl 1288.62054

[16] G. Celeux and G. Govaert, Gaussian parsimonious clustering models. Pattern Recogn. 28 (1995) 781-793.

[17] A.P. Dempster, N.M. Laird and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc., Ser. B 39 (1977) 1-38. | MR 501537 | Zbl 0364.62022

[18] C.R. Genovese and L. Wasserman, Rates of convergence for the Gaussian mixture sieve. Ann. Stat. 28 (2000) 1105-1127. | MR 1810921 | Zbl 1105.62333

[19] S. Ghosal and A.W. Van Der Vaart, Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Ann. Stat. 29 (2001) 1233-1263. | MR 1873329 | Zbl 1043.62025

[20] C. Keribin, Consistent estimation of the order of mixture models. Sankhyā. The Indian Journal of Statistics. Series A 62 (2000) 49-66. | MR 1769735 | Zbl 1081.62516

[21] M.H. Law, M.A.T. Figueiredo and A.K. Jain, Simultaneous feature selection and clustering using mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 1154-1166.

[22] E. Lebarbier, Detecting multiple change-points in the mean of Gaussian process by model selection. Signal Proc. 85 (2005) 717-736. | Zbl 1148.94403

[23] V. Lepez, Potentiel de réserves d'un bassin pétrolier: modélisation et estimation. Ph.D. thesis, Université Paris-Sud 11 (2002).

[24] P. Massart, Concentration inequalities and model selection. Springer, Berlin (2007). Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23 (2003). | MR 2319879 | Zbl 1170.60006

[25] C. Maugis, Sélection de variables pour la classification non supervisée par mélanges gaussiens. Applications à l'étude de données transcriptomes. Ph.D. thesis, Université Paris-Sud 11 (2008).

[26] C. Maugis, G. Celeux and M.-L. Martin-Magniette, Variable Selection for Clustering with Gaussian Mixture Models. Biometrics (2008) (to appear). | MR 2649842 | Zbl 1172.62021

[27] C. Maugis and B. Michel, Slope heuristics for variable selection and clustering via Gaussian mixtures. Technical Report 6550, INRIA (2008).

[28] A.E. Raftery and N. Dean, Variable Selection for Model-Based Clustering. J. Am. Stat. Assoc. 101 (2006) 168-178. | MR 2268036 | Zbl 1118.62339

[29] G. Schwarz, Estimating the dimension of a model. Ann. Stat. 6 (1978) 461-464. | MR 468014 | Zbl 0379.62005

[30] D. Serre, Matrices. Springer-Verlag, New York (2002). | MR 1923507 | Zbl 1011.15001

[31] M. Talagrand, Concentration of measure and isoperimetric inequalities in product spaces. Publ. Math., Inst. Hautes Étud. Sci. 81 (1995) 73-205. | MR 1361756 | Zbl 0864.60013

[32] M. Talagrand, New concentration inequalities in product spaces. Invent. Math. 126 (1996) 505-563. | MR 1419006 | Zbl 0893.60001

[33] F. Villers, Tests et sélection de modèles pour l'analyse de données protéomiques et transcriptomiques. Ph.D. thesis, Université Paris-Sud 11 (2007).