An ℓ1-oracle inequality for the Lasso in finite mixture Gaussian regression models
ESAIM: Probability and Statistics, Volume 17 (2013), pp. 650-671.

We consider a finite mixture of Gaussian regression models for high-dimensional heterogeneous data, where the number of covariates may be much larger than the sample size. We propose to estimate the unknown conditional mixture density by an ℓ1-penalized maximum likelihood estimator. We provide an ℓ1-oracle inequality satisfied by this Lasso estimator with respect to the Kullback-Leibler loss. In particular, we give a condition on the regularization parameter of the Lasso under which such an oracle inequality holds. Our aim is twofold: to extend the ℓ1-oracle inequality established by Massart and Meynet [12] in the homogeneous Gaussian linear regression case, and to present a result complementary to Städler et al. [18] by studying the Lasso for its ℓ1-regularization properties rather than considering it as a variable selection procedure. Our oracle inequality is deduced from a finite mixture Gaussian regression model selection theorem for ℓ1-penalized maximum likelihood conditional density estimation, which is inspired by Vapnik's method of structural risk minimization [23] and by the theory of model selection for maximum likelihood estimators developed by Massart [11].
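For orientation, the estimator under study has the following schematic form (a sketch with generic notation: the sample $(x_i, y_i)_{1 \le i \le n}$, the conditional mixture densities $s_\psi$ and the exact weighting of the penalty are placeholders standing in for the paper's definitions, not a reproduction of them):
$$
\hat{s}^{\,\mathrm{Lasso}}(\lambda) \;\in\; \operatorname*{arg\,min}_{s_\psi}\left\{ -\frac{1}{n}\sum_{i=1}^{n} \ln s_\psi(y_i \mid x_i) \;+\; \lambda \,\|\psi\|_1 \right\},
$$
where $s_\psi$ ranges over conditional densities of finite mixtures of Gaussian regressions with parameter vector $\psi$ (mixture proportions, regression coefficients and variances), $\|\psi\|_1$ denotes the $\ell_1$-norm of the regression coefficients, and $\lambda > 0$ is the regularization parameter. The $\ell_1$-oracle inequality then controls the Kullback-Leibler risk of $\hat{s}^{\,\mathrm{Lasso}}(\lambda)$ by an approximation term over $\ell_1$-balls plus a remainder, provided $\lambda$ is chosen large enough; the precise condition on $\lambda$ is the one stated in the paper.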

DOI: https://doi.org/10.1051/ps/2012016
Classification: 62G08, 62H30
Keywords: finite mixture of Gaussian regressions model, Lasso, ℓ1-oracle inequalities, model selection by penalization, ℓ1-balls
@article{PS_2013__17__650_0,
     author = {Meynet, Caroline},
     title = {An $\ell _1$-oracle inequality for the Lasso in finite mixture gaussian regression models},
     journal = {ESAIM: Probability and Statistics},
     pages = {650--671},
     publisher = {EDP-Sciences},
     volume = {17},
     year = {2013},
     doi = {10.1051/ps/2012016},
     language = {en},
     url = {http://www.numdam.org/articles/10.1051/ps/2012016/}
}
Meynet, Caroline. An $\ell _1$-oracle inequality for the Lasso in finite mixture Gaussian regression models. ESAIM: Probability and Statistics, Volume 17 (2013), pp. 650-671. doi: 10.1051/ps/2012016. http://www.numdam.org/articles/10.1051/ps/2012016/

[1] P.L. Bartlett, S. Mendelson and J. Neeman, ℓ1-regularized linear regression: persistence and oracle inequalities. Probab. Theory Relat. Fields. Springer (2011).

[2] J.P. Baudry, Sélection de Modèle pour la Classification Non Supervisée. Choix du Nombre de Classes. Ph.D. thesis, Université Paris-Sud 11, France (2009).

[3] P.J. Bickel, Y. Ritov and A.B. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 37 (2009) 1705-1732. | MR 2533469 | Zbl 1173.62022

[4] S. Boucheron, G. Lugosi and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press (2013). | MR 3185193 | Zbl 1279.60005

[5] P. Bühlmann and S. van de Geer, On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 (2009) 1360-1392. | MR 2576316

[6] E. Candès and T. Tao, The Dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35 (2007) 2313-2351. | MR 2382644 | Zbl 1139.62019

[7] S. Cohen and E. Le Pennec, Conditional Density Estimation by Penalized Likelihood Model Selection and Applications, RR-7596. INRIA (2011).

[8] B. Efron, T. Hastie, I. Johnstone and R. Tibshirani, Least Angle Regression. Ann. Stat. 32 (2004) 407-499. | MR 2060166 | Zbl 1091.62054

[9] M. Hebiri, Quelques questions de sélection de variables autour de l'estimateur Lasso. Ph.D. Thesis, Université Paris Diderot, Paris 7, France (2009).

[10] C. Huang, G.H.L. Cheang and A.R. Barron, Risk of penalized least squares, greedy selection and ℓ1-penalization for flexible function libraries. Submitted to the Annals of Statistics (2008). | MR 2711791

[11] P. Massart, Concentration Inequalities and Model Selection. École d'Été de Probabilités de Saint-Flour 2003. Lect. Notes Math., vol. 1896. Springer, Berlin-Heidelberg (2007). | MR 2319879 | Zbl 1170.60006

[12] P. Massart and C. Meynet, The Lasso as an ℓ1-ball model selection procedure. Electron. J. Stat. 5 (2011) 669-687. | MR 2820635 | Zbl 1274.62468

[13] C. Maugis and B. Michel, A non asymptotic penalized criterion for Gaussian mixture model selection. ESAIM: PS 15 (2011) 41-68. | Numdam | MR 2870505

[14] G. McLachlan and D. Peel, Finite Mixture Models. Wiley, New York (2000). | MR 1789474 | Zbl 0963.62061

[15] N. Meinshausen and B. Yu, Lasso-type recovery of sparse representations for high-dimensional data. Ann. Stat. 37 (2009) 246-270. | MR 2488351 | Zbl 1155.62050

[16] R.A. Redner and H.F. Walker, Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26 (1984) 195-239. | MR 738930 | Zbl 0536.62021

[17] P. Rigollet and A. Tsybakov, Exponential screening and optimal rates of sparse estimation. Ann. Stat. 39 (2011) 731-771. | MR 2816337 | Zbl 1215.62043

[18] N. Städler, P. Bühlmann and S. van de Geer, ℓ1-penalization for mixture regression models. Test 19 (2010) 209-256. | Zbl 1203.62128

[19] R. Tibshirani, Regression shrinkage and selection via the Lasso. J. Roy. Stat. Soc. Ser. B 58 (1996) 267-288. | MR 1379242 | Zbl 0850.62538

[20] M.R. Osborne, B. Presnell and B.A. Turlach, On the Lasso and its dual. J. Comput. Graph. Stat. 9 (2000) 319-337. | MR 1822089

[21] M.R. Osborne, B. Presnell and B.A. Turlach, A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 (2000) 389-404. | MR 1773265 | Zbl 0962.65036

[22] A. van der Vaart and J. Wellner, Weak Convergence and Empirical Processes. Springer, New York (1996). | MR 1385671 | Zbl 0862.60002

[23] V.N. Vapnik, Estimation of Dependences Based on Empirical Data. Springer, New York (1982). | MR 672244 | Zbl 0499.62005

[24] V.N. Vapnik, Statistical Learning Theory. Wiley, New York (1998). | MR 1641250 | Zbl 0935.62007

[25] P. Zhao and B. Yu, On model selection consistency of Lasso. J. Mach. Learn. Res. 7 (2006) 2541-2563. | MR 2274449 | Zbl 1222.62008