Special issue: mixture analysis
Regularized Maximum Likelihood Estimation and Feature Selection in Mixtures-of-Experts Models
[Estimation par maximum de vraisemblance régularisé et sélection de variables dans les modèles de mélanges d’experts]
Journal de la société française de statistique, Volume 160 (2019) no. 1, pp. 57-85.

Mixtures of experts (MoE) are effective models for modeling heterogeneous data in many statistical learning problems, including regression, clustering and classification. Generally fitted by maximum likelihood via the EM algorithm, they remain difficult to apply to high-dimensional problems in that setting. We consider the problem of estimation and feature selection in mixture-of-experts models, and propose a regularized maximum likelihood estimation approach that encourages sparse solutions for heterogeneous regression data models with a potentially large number of predictors. Unlike state-of-the-art methods for mixtures of experts, the proposed regularization method does not rely on an approximated penalty and does not require thresholding to recover the sparse solution. Sparse parameter estimation relies on regularizing the maximum likelihood estimator for both the experts and the gating functions, implemented by two versions of a hybrid EM algorithm. The M-step of the algorithm, carried out by coordinate ascent or by an MM algorithm, avoids matrix inversion in the updates, which makes scaling the algorithm up promising. An experimental study demonstrates the good performance of the proposed approach.
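
In symbols, the objective just described can be sketched as follows (the notation is ours, assuming the standard MoE setting with softmax gating and Gaussian experts; it is an illustration rather than the paper's exact formulation): the regularized MLE maximizes an l1-penalized log-likelihood of the form

PL(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k(x_i; w)\, \phi\big(y_i;\, \beta_{k0} + x_i^\top \beta_k,\, \sigma_k^2\big) \;-\; \sum_{k=1}^{K} \lambda_k \lVert \beta_k \rVert_1 \;-\; \sum_{k=1}^{K-1} \gamma_k \lVert w_k \rVert_1,

where \pi_k(x_i; w) = \exp(w_{k0} + x_i^\top w_k) \big/ \sum_{l=1}^{K} \exp(w_{l0} + x_i^\top w_l) is the softmax gating network (with w_{K0} = 0 and w_K = 0 for identifiability) and \phi(\cdot\,; \mu, \sigma^2) is the Gaussian density. Because the penalized objective is maximized as-is, rather than through a smooth approximation of the penalty, estimated coefficients can be exactly zero, which is why no post-hoc thresholding is needed.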

Mixtures of experts (MoE) are successful models for modeling heterogeneous data in many statistical learning problems, including regression, clustering and classification. They are generally fitted by maximum likelihood estimation via the well-known EM algorithm, but their application to high-dimensional problems remains challenging in that framework. We consider the problem of fitting and feature selection in MoE models, and propose a regularized maximum likelihood estimation approach that encourages sparse solutions for heterogeneous regression data models with potentially high-dimensional predictors. Unlike state-of-the-art regularized MLE approaches for MoE, the proposed modeling does not require an approximation of the penalty function. We develop two hybrid EM algorithms: an Expectation-Majorization-Maximization (EM/MM) algorithm, and an EM algorithm with a coordinate ascent M-step. The proposed algorithms automatically yield sparse solutions without thresholding, and avoid matrix inversion by relying on univariate parameter updates. An experimental study shows the good performance of the algorithms in terms of recovering the actual sparse solutions, parameter estimation, and clustering of heterogeneous regression data.
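
To make the univariate, inversion-free updates concrete, here is a minimal Python sketch of one coordinate-ascent sweep for the lasso-penalized, responsibility-weighted Gaussian regression of a single expert (the function and variable names are ours and hypothetical; this illustrates the technique, not the authors' implementation):

import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator: sign(z) * max(|z| - t, 0).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def coordinate_sweep_expert_k(X, y, tau_k, beta_k, beta_k0, sigma2_k, lam):
    # One coordinate-ascent sweep over the coefficients beta_k of expert k,
    # maximizing sum_i tau_k[i] * log N(y_i; beta_k0 + x_i' beta_k, sigma2_k)
    # minus lam * ||beta_k||_1. Every update is univariate and closed-form,
    # so no matrix is ever inverted.
    n, p = X.shape
    for j in range(p):
        # Partial residual with coordinate j removed from the current fit.
        r_j = y - beta_k0 - X @ beta_k + X[:, j] * beta_k[j]
        # The 1/(2*sigma2) factor in the Gaussian log-likelihood scales
        # the effective penalty level by sigma2_k.
        num = soft_threshold(np.sum(tau_k * X[:, j] * r_j), lam * sigma2_k)
        den = np.sum(tau_k * X[:, j] ** 2)
        beta_k[j] = num / den
    # Unpenalized intercept, updated given the current coefficients.
    beta_k0 = np.sum(tau_k * (y - X @ beta_k)) / np.sum(tau_k)
    return beta_k, beta_k0

Each sweep costs O(np) per expert, and the soft-thresholding step can set coefficients exactly to zero, which is where the sparsity of the solution comes from; cycling such sweeps inside the M-step, with an analogous MM or coordinate scheme for the gating parameters, corresponds to the two hybrid EM variants described above.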

Keywords: Mixtures of experts, Model-based clustering, Feature selection, Regularization, EM algorithm, Coordinate ascent, MM algorithm, High-dimensional data
@article{JSFS_2019__160_1_57_0,
     author = {Chamroukhi, Faicel and Huynh, Bao-Tuyen},
     title = {Regularized {Maximum} {Likelihood} {Estimation} and {Feature} {Selection} in {Mixtures-of-Experts} {Models}},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {57--85},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {160},
     number = {1},
     year = {2019},
     zbl = {1417.62170},
     mrnumber = {3928540},
     language = {en},
     url = {http://www.numdam.org/item/JSFS_2019__160_1_57_0/}
}
TY  - JOUR
AU  - Chamroukhi, Faicel
AU  - Huynh, Bao-Tuyen
TI  - Regularized Maximum Likelihood Estimation and Feature Selection in Mixtures-of-Experts Models
JO  - Journal de la société française de statistique
PY  - 2019
DA  - 2019///
SP  - 57
EP  - 85
VL  - 160
IS  - 1
PB  - Société française de statistique
UR  - http://www.numdam.org/item/JSFS_2019__160_1_57_0/
UR  - https://zbmath.org/?q=an%3A1417.62170
UR  - https://www.ams.org/mathscinet-getitem?mr=3928540
LA  - en
ID  - JSFS_2019__160_1_57_0
ER  - 
Chamroukhi, Faicel; Huynh, Bao-Tuyen. Regularized Maximum Likelihood Estimation and Feature Selection in Mixtures-of-Experts Models. Journal de la société française de statistique, Volume 160 (2019) no. 1, pp. 57-85. http://www.numdam.org/item/JSFS_2019__160_1_57_0/
