Minimal penalties and the slope heuristics: a survey
[Pénalités minimales et heuristique de pente]
Journal de la société française de statistique, Tome 160 (2019) no. 3, pp. 1-106.

Birgé and Massart proposed the slope heuristics in 2001 as a way to choose optimally, from the data, an unknown multiplicative constant in front of a penalty. It is built upon the notion of minimal penalty, and it has since been generalized into a family of “minimal-penalty algorithms”. This article reviews the theoretical results obtained for such algorithms, with a self-contained proof in the simplest framework, precise proof ideas for further generalizations, and a few new results. Explicit connections are made with residual-variance estimators (including an original contribution on this topic, showing that for this task the slope heuristics performs almost as well as a residual-based estimator with the best model choice) and with classical algorithms such as the L-curve or elbow heuristics, Mallows’ Cp, and Akaike’s FPE. Practical issues are also addressed, including two new practical definitions of minimal-penalty algorithms, which are compared on synthetic data to previously proposed definitions. Finally, several conjectures and open problems are suggested as future research directions.
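As a schematic illustration of the mechanism summarized in the abstract, here is one standard way the slope heuristics is often written in a penalized least-squares setting; the notation (observations Y, models indexed by m with dimension D_m, least-squares estimators, sample size n, penalty proportional to the dimension) is illustrative and not taken from the paper itself, whose precise assumptions and constants are those discussed in the text. For each constant C ≥ 0, one selects

\[
\widehat{m}(C) \in \operatorname*{arg\,min}_{m \in \mathcal{M}} \left\{ \frac{1}{n} \lVert Y - \widehat{s}_m \rVert^2 + C \, \frac{D_m}{n} \right\} .
\]

The minimal-penalty phenomenon is that the selected dimension D_{\widehat{m}(C)} remains very large for C below a critical value \widehat{C}_{\min} and drops sharply just above it; the slope heuristics then prescribes the final choice

\[
\widehat{m} = \widehat{m}\bigl( 2 \, \widehat{C}_{\min} \bigr) ,
\]

so that, in the homoscedastic Gaussian case, \widehat{C}_{\min} estimates the residual variance and the penalty 2 \widehat{C}_{\min} D_m / n recovers a data-driven version of Mallows’ Cp.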

Keywords: model selection, estimator selection, penalization, slope heuristics, minimal penalty, residual-variance estimation, L-curve heuristics, elbow heuristics, scree test, overpenalization
@article{JSFS_2019__160_3_1_0,
     author = {Arlot, Sylvain},
     title = {Minimal penalties and the slope heuristics: a survey},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {1--106},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {160},
     number = {3},
     year = {2019},
     mrnumber = {4021408},
     zbl = {1437.62121},
     language = {en},
     url = {http://www.numdam.org/item/JSFS_2019__160_3_1_0/}
}
Arlot, Sylvain. Minimal penalties and the slope heuristics: a survey. Journal de la société française de statistique, Tome 160 (2019) no. 3, pp. 1-106. http://www.numdam.org/item/JSFS_2019__160_3_1_0/

[1] Arlot, Sylvain; Baudry, Jean-Patrick Sélection de modèles, 2002. In French. Master 1 report, ENS Paris. Available at https://www.math.u-psud.fr/~arlot/papers/02selection_modeles.pdf. Advisor: Yannick Baraud. Report about the paper “Gaussian model selection” by L. Birgé & P. Massart, JEMS 3(3):203–268, 2001.

[2] Arlot, Sylvain; Bach, Francis Data-driven calibration of linear estimators with minimal penalties, Advances in Neural Information Processing Systems 22 (Bengio, Y.; Schuurmans, D.; Lafferty, J.; Williams, C. K. I.; Culotta, A., eds.) (2009), pp. 46-54

[3] Arlot, Sylvain; Bach, Francis Data-driven calibration of linear estimators with minimal penalties, 2011 | arXiv

[4] Arlot, Sylvain; Celisse, Alain A survey of cross-validation procedures for model selection, Statist. Surv., Volume 4 (2010), pp. 40-79 | DOI | MR | Zbl

[5] Arlot, Sylvain; Celisse, Alain; Harchaoui, Zaïd A Kernel Multiple Change-point Algorithm via Model Selection, J. Mach. Learn. Res. (2019) (To appear. Preliminary version available at arXiv:1202.3878) | MR | Zbl

[6] Akakpo, Nathalie Estimating a discrete distribution via histogram selection, ESAIM: Probability and Statistics, Volume 15 (2011), pp. 1-29 | DOI | Numdam | MR | Zbl

[7] Akaike, Hirotugu Fitting autoregressive models for prediction, Ann. Inst. Statist. Math., Volume 21 (1969), pp. 243-247 | MR | Zbl

[8] Akaike, Hirotugu Statistical predictor identification, Ann. Inst. Statist. Math., Volume 22 (1970), pp. 203-217 | MR | Zbl

[9] Akaike, Hirotugu Information theory and an extension of the maximum likelihood principle, Second International Symposium on Information Theory (Tsahkadsor, 1971), Akadémiai Kiadó, Budapest, 1973, pp. 267-281 | MR | Zbl

[10] Allen, David M. The relationship between variable selection and data augmentation and a method for prediction, Technometrics, Volume 16 (1974), pp. 125-127 | MR | Zbl

[11] Arlot, Sylvain; Massart, Pascal Data-driven calibration of penalties for least-squares regression, J. Mach. Learn. Res., Volume 10 (2009), p. 245-279 (electronic) http://www.jmlr.org/papers/volume10/arlot09a/arlot09a.pdf

[12] Arlot, Sylvain Resampling and Model Selection, University Paris-Sud 11, December (2007) http://tel.archives-ouvertes.fr/tel-00198803/en/ (Ph. D. Thesis Available at https://tel.archives-ouvertes.fr/tel-00198803v1 )

[13] Arlot, Sylvain Model selection by resampling penalization, Electron. J. Stat., Volume 3 (2009), p. 557-624 (electronic) | DOI | MR | Zbl

[14] Andresen, Andreas; Spokoiny, Vladimir Critical dimension in profile semiparametric estimation, Electron. J. Statist., Volume 8 (2014) no. 2, pp. 3077-3125 | DOI | MR | Zbl

[15] Burnham, Kenneth P.; Anderson, David R. Model Selection and Multimodel Inference, Springer-Verlag, New York, 2002, xxvi+488 pages (A practical information-theoretic approach) | MR | Zbl

[16] Baraud, Yannick Model selection for regression on a fixed design, Probab. Theory Related Fields, Volume 117 (2000) no. 4, pp. 467-493 | MR | Zbl

[17] Baraud, Yannick Estimator selection with respect to Hellinger-type risks, Probab. Theory Related Fields, Volume 151 (2011) no. 1-2, pp. 353-401 | DOI | MR | Zbl

[18] Baudry, Jean-Patrick Model selection for clustering. Choosing the number of classes, University Paris-Sud, December (2009) http://tel.archives-ouvertes.fr/tel-00461550/ (Ph. D. Thesis Available at https://tel.archives-ouvertes.fr/tel-00461550v1 )

[19] Baudry, Jean-Patrick Estimation and model selection for model-based clustering with the conditional classification likelihood, Electron. J. Statist., Volume 9 (2015) no. 1, pp. 1041-1077 | DOI | MR | Zbl

[20] Bartlett, Peter L.; Bousquet, Olivier; Mendelson, Shahar Local Rademacher complexities, Ann. Statist., Volume 33 (2005) no. 4, pp. 1497-1537 | MR | Zbl

[21] Bouveyron, Charles; Côme, Etienne; Jacques, Julien The discriminative functional mixture model for a comparative analysis of bike sharing systems, Ann. Appl. Stat., Volume 9 (2015) no. 4, pp. 1726-1760 | DOI | MR | Zbl

[22] Belloni, Alexandre; Chernozhukov, Victor; Wang, Lie Pivotal estimation via square-root Lasso in nonparametric regression, Ann. Statist., Volume 42 (2014) no. 2, pp. 757-788 | DOI | MR | Zbl

[23] Buckley, Michael J.; Eagleson, Geoffrey K. A graphical method for estimating the residual variance in nonparametric regression, Biometrika, Volume 76 (1989) no. 2, pp. 203-210 | DOI | MR | Zbl

[24] Bellec, Pierre Optimistic lower bounds for convex regularized least-squares, 2017 (arXiv:1703.01332v3)

[25] Bellec, Pierre The noise barrier and the large signal bias of the Lasso and other convex estimators, 2018 (arXiv:1804.01230v4)

[26] Biau, Gérard; Fischer, Aurélie Parameter Selection for Principal Curves, IEEE Transactions on Information Theory, Volume 58 (2012) no. 3, pp. 1924-1939 | DOI | MR | Zbl

[27] Bouveyron, Charles; Fauvel, Mathieu; Girard, Stéphane Kernel discriminant analysis and clustering with parsimonious Gaussian process models, Statistics and Computing, Volume 25 (2015) no. 6, pp. 1143-1162 | DOI | MR | Zbl

[28] Bardet, Jean-Marc; Guenaizi, Abdellatif Semi-parametric detection of multiple changes in long-range dependent processes, 2018 (arXiv:1801.02515v2) | MR | Zbl

[29] Baraud, Yannick; Giraud, Christophe; Huet, Sylvie Gaussian model selection with an unknown variance, Ann. Statist., Volume 37 (2009) no. 2, pp. 630-672 | DOI | MR | Zbl

[30] Baraud, Yannick; Giraud, Christophe; Huet, Sylvie Estimator selection in the Gaussian setting, Ann. Inst. Henri Poincaré Probab. Stat., Volume 50 (2014) no. 3, pp. 1092-1119 | DOI | Numdam | MR | Zbl

[31] Bar-Hen, Avner; Gey, Servane; Poggi, Jean-Michel Spatial CART Classification Trees, 2018 (Available at https://hal.archives-ouvertes.fr/hal-01837065v1 ) | HAL

[32] Bardet, Jean-Marc; Kengne, William Chakry; Wintenberger, Olivier Multiple breaks detection in general causal time series using penalized quasi-likelihood, Electron. J. Stat., Volume 6 (2012), p. 435-477 (electronic) | DOI | Zbl

[33] Brown, Lawrence D.; Levine, Michael Variance estimation in nonparametric regression via the difference sequence method, Ann. Statist., Volume 35 (2007) no. 5, pp. 2219-2232 | DOI | MR | Zbl

[34] Bertin, Karine; Le Pennec, Erwann; Rivoirard, Vincent Adaptive Dantzig density estimation, Ann. Inst. H. Poincaré Probab. Statist., Volume 47 (2011) no. 1, pp. 43-74 | DOI | Numdam | Zbl

[35] Bertin, Karine; Lacour, Claire; Rivoirard, Vincent Adaptive pointwise estimation of conditional density function, Ann. Inst. Henri Poincaré Probab. Stat., Volume 52 (2016) no. 2, pp. 939-980 | DOI | Zbl

[36] Birgé, Lucien; Massart, Pascal A generalized Cp criterion for Gaussian model selection (2001) (Technical report, Prépublication 647, 39 pages. Available at http://massart.pascal.free.fr/Site/publications_files/Cp.pdf )

[37] Birgé, Lucien; Massart, Pascal Gaussian model selection, J. Eur. Math. Soc. (JEMS), Volume 3 (2001) no. 3, pp. 203-268 | MR | Zbl

[38] Blanchard, Gilles; Massart, Pascal Discussion: “Local Rademacher complexities and oracle inequalities in risk minimization” [Ann. Statist. 34 (2006), no. 6, 2593–2656] by V. Koltchinskii, Ann. Statist., Volume 34 (2006) no. 6, pp. 2664-2671 | MR

[39] Bartlett, Peter L.; Mendelson, Shahar Empirical minimization, Probability Theory and Related Fields, Volume 135 (2006) no. 3, pp. 311-334 | Zbl

[40] Birgé, Lucien; Massart, Pascal Minimal penalties for Gaussian model selection, Probab. Theory Related Fields, Volume 138 (2007) no. 1-2, pp. 33-73 | MR | Zbl

[41] Boucheron, Stéphane; Massart, Pascal A high dimensional Wilks phenomenon, Probab. Theory Related Fields, Volume 150 (2011) no. 3-4, pp. 405-433 | DOI | Zbl

[42] Birgé, Lucien; Massart, Pascal From model selection to adaptive estimation, Festschrift for Lucien Le Cam, Springer, New York, 1997, pp. 55-87 | MR | Zbl

[43] Baudry, Jean-Patrick; Maugis, Cathy; Michel, Bertrand Slope heuristics: overview and implementation, Statistics and Computing, Volume 22 (2012) no. 2, pp. 455-470 | Zbl

[44] Bitseki Penda, S. Valère; Roche, Angelina Local bandwidth selection for kernel density estimation in bifurcating Markov chain model, 2017 (arXiv:1706.07034v1)

[45] Breiman, Leo; Spector, Philip Submodel Selection and Evaluation in Regression. The X-Random Case, International Statistical Review, Volume 60 (1992) no. 3, pp. 291-319

[46] Bontemps, Dominique; Toussile, Wilson Clustering and variable selection for categorical multivariate data, Electron. J. Stat., Volume 7 (2013), pp. 2344-2371 | DOI | Zbl

[47] Bellec, Pierre; Tsybakov, Alexandre Bounds on the Prediction Error of Penalized Least Squares Estimators with Convex Penalty, Modern Problems of Stochastic Analysis and Statistics (Panov, Vladimir, ed.), Springer International Publishing, Cham (2017), pp. 315-333 | Zbl

[48] Castellan, Gwenaëlle Modified Akaike’s criterion for histogram density estimation (1999) no. 1999-61 (Technical report. Available at https://www.math.u-psud.fr/~biblio/pub/1999/abs/ppo1999_61.html )

[49] Cattell, Raymond B. The scree test for the number of factors, Multivariate Behav. Res., Volume 1 (1966) no. 2, pp. 245-276

[50] Carter, Christopher K.; Eagleson, Geoffrey K. A comparison of variance estimators in nonparametric regression, J. Roy. Statist. Soc. Ser. B, Volume 54 (1992) no. 3, pp. 773-780 http://links.jstor.org/sici?sici=0035-9246(1992)54:3<773:ACOVEI>2.0.CO;2-7&origin=MSN | MR

[51] Cao, Yun; Golubev, Yuri On oracle inequalities related to smoothing splines, Math. Methods Statist., Volume 15 (2006) no. 4, p. 398-414 (2007) | MR

[52] Castellanos, Justina Longina; Gómez, Susana; Guerra, Valia The triangle method for finding the corner of the L-curve, Appl. Numer. Math., Volume 43 (2002) no. 4, pp. 359-373 | DOI | MR | Zbl

[53] Chen, Xi; Guntuboyina, Adityanand; Zhang, Yuchen A note on the approximate admissibility of regularized estimators in the Gaussian sequence model, Electron. J. Statist., Volume 11 (2017) no. 2, pp. 4746-4768 | DOI | MR | Zbl

[54] Chagny, Gaëlle Penalization versus Goldenshluger-Lepski strategies in warped bases regression, ESAIM Probab. Stat., Volume 17 (2013), pp. 328-358 | DOI | Zbl

[55] Chatterjee, Sourav A new perspective on least squares under convex constraint, Ann. Statist., Volume 42 (2014) no. 6, pp. 2340-2381 | DOI | Zbl

[56] Chatterjee, Sourav High dimensional regression and matrix estimation without tuning parameters, 2015 | arXiv

[57] Cohen, Serge X.; Le Pennec, Erwann Unsupervised Segmentation of Spectral Images with a Spatialized Gaussian Mixture Model and Model Selection, Oil Gas Sci. Technol. – Rev. IFP Energies nouvelles, Volume 69 (2014) no. 2, pp. 245-259 | DOI

[58] Caillerie, Claire; Michel, Bertrand Model selection for simplicial approximation, Foundations of Computational Mathematics, Volume 11 (2011) no. 6, pp. 707-731 | DOI | Zbl

[59] Connault, Pierre Calibration d’algorithmes de type Lasso et analyse statistique de données métallurgiques en aéronautique, Université Paris-Sud (2011) (Ph. D. Thesis)

[60] Comte, Fabienne; Rozenholc, Yves A new algorithm for fixed design regression and denoising, Ann. Inst. Statist. Math., Volume 56 (2004) no. 3, pp. 449-473 | MR | Zbl

[61] Cattell, Raymond B.; Vogelmann, S. A comprehensive trial of the scree and K.G. criteria for determining the number of factors, Multivariate Behav. Res., Volume 12 (1977) no. 3, pp. 289-325 | DOI

[62] Craven, Peter; Wahba, Grace Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation, Numer. Math., Volume 31 (1978) no. 4, pp. 377-403 | MR | Zbl

[63] Devijver, Émilie Joint rank and variable selection for parsimonious estimation in a high-dimensional finite mixture regression model, Journal of Multivariate Analysis, Volume 157 (2017), pp. 1-13 | DOI | Zbl

[64] Devijver, Émilie Model-based regression clustering for high-dimensional data: application to functional data, Adv. Data Analysis and Classification, Volume 11 (2017) no. 2, pp. 243-279 | DOI | Zbl

[65] Devijver, Émilie; Gallopin, Mélina Block-diagonal covariance selection for high-dimensional Gaussian graphical models, Journal of the American Statistical Association (2018), pp. 306-314 | DOI | Zbl

[66] Devijver, Émilie; Gallopin, Mélina; Perthame, Emeline Nonlinear network-based quantitative trait prediction from transcriptomic data, 2017 (arXiv:1701.07899v5)

[67] Devijver, Émilie; Goude, Yannig; Poggi, Jean-Michel Clustering electricity consumers using high-dimensional regression mixture models, Applied Stochastic Models in Business and Industry (2019), pp. 1-19 | DOI

[68] Donoho, David L.; Johnstone, Iain M.; Kerkyacharian, Gérard; Picard, Dominique Wavelet shrinkage: asymptopia?, J. Roy. Statist. Soc. Ser. B, Volume 57 (1995) no. 2, pp. 301-369 http://links.jstor.org/sici?sici=0035-9246(1995)57:2<301:WSA>2.0.CO;2-S&origin=MSN (With discussion and a reply by the authors) | MR | Zbl

[69] Dossal, Charles; Kachour, Maher; Fadili, Jalal M.; Peyré, Gabriel; Chesneau, Christophe The degrees of freedom of the lasso for general design matrix, Statistica Sinica, Volume 23 (2013) no. 2, pp. 809-828 http://www.jstor.org/stable/24310363 | Zbl

[70] Derman, Esther; Le Pennec, Erwan Clustering and Model Selection via Penalized Likelihood for Different-sized Categorical Data Vectors, 2017 (arXiv:1709.02294v1)

[71] Durot, Cécile; Lebarbier, Émilie; Tocquet, Anne-Sophie Estimating the joint distribution of independent categorical variables via model selection, Bernoulli, Volume 15 (2009) no. 2, pp. 475-507 | DOI | Zbl

[72] Dette, Holger; Munk, Axel; Wagner, Thorsten Estimating the variance in nonparametric regression—what is a reasonable choice?, J. R. Stat. Soc. Ser. B Stat. Methodol., Volume 60 (1998) no. 4, pp. 751-764 | DOI | MR | Zbl

[73] Du, Jichang; Schick, Anton A covariate-matched estimator of the error variance in nonparametric regression, J. Nonparametr. Stat., Volume 21 (2009) no. 3, pp. 263-285 | DOI | MR | Zbl

[74] Efron, Bradley The estimation of prediction error: covariance penalties and cross-validation, J. Amer. Statist. Assoc., Volume 99 (2004) no. 467, pp. 619-642 (With comments and a rejoinder by the author) | MR | Zbl

[75] Efron, Bradley How biased is the apparent error rate of a prediction rule?, J. Amer. Statist. Assoc., Volume 81 (1986) no. 394, pp. 461-470 | MR | Zbl

[76] Engl, Heinz W.; Grever, Wilhelm Using the L-curve for determining optimal regularization parameters, Numer. Math., Volume 69 (1994) no. 1, pp. 25-31 | DOI | MR | Zbl

[77] Frontier, Serge Étude de la décroissance des valeurs propres dans une analyse en composantes principales: Comparaison avec le modèle du bâton brisé, Journal of Experimental Marine Biology and Ecology, Volume 25 (1976) no. 1, pp. 67-75 | DOI

[78] Godichon-Baggioni, Antoine; Maugis-Rabusseau, Cathy; Rau, Andrea Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data, Journal of Applied Statistics, Volume 46 (2019) no. 1, pp. 47-65 | DOI

[79] Gavish, Matan; Donoho, David L. The Optimal Hard Threshold for Singular Values is 4/√3, IEEE Trans. Inform. Theory, Volume 60 (2014) no. 8, pp. 5040-5053 | DOI | Zbl

[80] Gendre, Xavier Simultaneous estimation of the mean and the variance in heteroscedastic Gaussian regression, Electron. J. Stat., Volume 2 (2008), pp. 1345-1372 | DOI | MR | Zbl

[81] Giraud, Christophe; Huet, Sylvie; Verzelen, Nicolas High-dimensional regression with unknown variance, Statist. Sci., Volume 27 (2012) no. 4, pp. 500-518 | DOI | MR | Zbl

[82] Giraud, Christophe Estimation of Gaussian graphs by model selection, Electron. J. Stat., Volume 2 (2008), p. 542-563 (electronic) | DOI | Zbl

[83] Giraud, Christophe Low rank multivariate regression, Electron. J. Stat., Volume 5 (2011), pp. 775-799 | DOI | MR | Zbl

[84] Gey, Servane; Lebarbier, Émilie Using CART to Detect Multiple Change Points in the Mean for large samples (2008) no. 12 (Technical report Available at https://hal.archives-ouvertes.fr/hal-00327146v1 )

[85] Garivier, Aurélien; Lerasle, Matthieu Oracle approach and slope heuristic in context tree estimation, 2011 (arXiv:1111.2191v1)

[86] Goldenshluger, Alexander; Lepski, Oleg Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality, Ann. Statist., Volume 39 (2011) no. 3, pp. 1608-1632 | DOI | MR | Zbl

[87] Giacobino, Caroline; Sardy, Sylvain; Diaz-Rodriguez, Jairo; Hengartner, Nick Quantile universal threshold, Electron. J. Statist., Volume 11 (2017) no. 2, pp. 4701-4722 | DOI | Zbl

[88] Gassiat, Elisabeth; Van Handel, Ramon Consistent order estimation and minimal penalties, IEEE Trans. Inform. Theory, Volume 59 (2013) no. 2, pp. 1115-1128 | DOI | Zbl

[89] Grodzevich, Oleg; Wolkowicz, Henry Regularization using a parameterized trust region subproblem, Math. Program., Volume 116 (2009) no. 1-2, pp. 193-220 | Zbl

[90] Hansen, Per Christian Analysis of discrete ill-posed problems by means of the L-curve, SIAM Rev., Volume 34 (1992) no. 4, pp. 561-580 | DOI | MR | Zbl

[91] Hanke, Martin Limitations of the L-curve method in ill-posed problems, BIT, Volume 36 (1996) no. 2, pp. 287-301 | DOI | MR | Zbl

[92] Horn, John L.; Engstrom, Robert Cattell’s Scree Test In Relation To Bartlett’s Chi-Square Test And Other Observations On The Number Of Factors Problem, Multivariate Behavioral Research, Volume 14 (1979) no. 3, pp. 283-300 | DOI

[93] Hansen, Per Christian; Jensen, Toke Koldborg; Rodriguez, Giuseppe An adaptive pruning algorithm for the discrete L-curve criterion, J. Comput. Appl. Math., Volume 198 (2007) no. 2, pp. 483-492 | DOI | MR | Zbl

[94] Hall, Peter; Kay, Jim W.; Titterington, Donald Michael Asymptotically optimal difference-based estimation of variance in nonparametric regression, Biometrika, Volume 77 (1990) no. 3, pp. 521-528 | DOI | MR | Zbl

[95] Heng, Yi; Lu, Shuai; Mhamdi, Adel; Pereverzev, Sergei V. Model functions in the modified L-curve method—case study: the heat flux reconstruction in pool boiling, Inverse Problems, Volume 26 (2010) no. 5, 13 pages | DOI | MR | Zbl

[96] Hall, Peter; Marron, James Stephen On variance estimation in nonparametric regression, Biometrika, Volume 77 (1990) no. 2, pp. 415-419 | DOI | MR | Zbl

[97] Hansen, Per Christian; O’Leary, Dianne Prost The use of the L-curve in the regularization of discrete ill-posed problems, SIAM J. Sci. Comput., Volume 14 (1993) no. 6, pp. 1487-1503 | DOI | MR | Zbl

[98] Horn, John L. A rationale and test for the number of factors in factor analysis, Psychometrika, Volume 30 (1965) no. 2, pp. 179-185 | DOI | Zbl

[99] Jackson, Donald A. Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches, Ecology, Volume 74 (1993) no. 8

[100] Koltchinskii, Vladimir Rademacher penalties and structural risk minimization, IEEE Trans. Inform. Theory, Volume 47 (2001) no. 5, pp. 1902-1914 | MR | Zbl

[101] Koltchinskii, Vladimir Local Rademacher complexities and oracle inequalities in risk minimization, Ann. Statist., Volume 34 (2006) no. 6, pp. 2593-2656 | MR | Zbl

[102] Lavielle, Marc Using penalized contrasts for the change-point problem, Signal Proces., Volume 85 (2005) no. 8, pp. 1501-1510 | Zbl

[103] Liitiäinen, Elia; Corona, Francesco; Lendasse, Amaury Residual variance estimation using a nearest neighbor statistic, J. Multivariate Anal., Volume 101 (2010) no. 4, pp. 811-823 | DOI | MR | Zbl

[104] Lebarbier, Émilie Quelques approches pour la détection de ruptures à horizon fini, Université Paris-Sud, July (2002) http://www.theses.fr/2002PA112141 (Ph. D. Thesis)

[105] Lebarbier, Émilie Detecting multiple change-points in the mean of a Gaussian process by model selection, Signal Proces., Volume 85 (2005), pp. 717-736 | Zbl

[106] Lehéricy, Luc State-by-state Minimax Adaptive Estimation for Nonparametric Hidden Markov Models, Journal of Machine Learning Research, Volume 19 (2018) no. 39, pp. 1-46 http://jmlr.org/papers/v19/17-345.html | Zbl

[107] Lerasle, Matthieu Rééchantillonnage et sélection de modèles optimale pour l’estimation de la densité de variables indépendantes ou mélangeantes, INSA de Toulouse, June (2009) (Ph. D. Thesis Available at http://lerasle.perso.math.cnrs.fr/docs/these.pdf )

[108] Lerasle, Matthieu Optimal model selection for stationary data under various mixing conditions, Ann. Statist., Volume 39 (2011) no. 4, pp. 1852-1877 | DOI | Zbl

[109] Lerasle, Matthieu Optimal model selection in density estimation, Ann. Inst. Henri Poincaré Probab. Stat., Volume 48 (2012) no. 3, pp. 884-908 | DOI | Numdam | MR | Zbl

[110] Letué, Frédérique Modèle de Cox: estimation par sélection de modèle et modèle de chocs bivarié, Université Paris-Sud (2000) (Ph. D. Thesis Available at http://www-ljk.imag.fr/membres/Frederique.Letue/These3.pdf )

[111] Li, Ker-Chau From Stein’s unbiased risk estimates to the method of generalized cross validation, Ann. Statist., Volume 13 (1985) no. 4, pp. 1352-1377 | MR | Zbl

[112] Li, Ker-Chau Asymptotic optimality of CL and generalized cross-validation in ridge regression with application to spline smoothing, Ann. Statist., Volume 14 (1986) no. 3, pp. 1101-1112 | DOI | MR | Zbl

[113] Li, Ker-Chau Asymptotic optimality for Cp, CL, cross-validation and generalized cross-validation: discrete index set, Ann. Statist., Volume 15 (1987) no. 3, pp. 958-975 | MR | Zbl

[114] Lacour, Claire; Massart, Pascal Minimal penalty for Goldenshluger-Lepski method, Stochastic Processes and their Applications, Volume 126 (2016) no. 12, pp. 3774-3789 (In Memoriam: Evarist Giné) | DOI | Zbl

[115] Lacour, Claire; Massart, Pascal; Rivoirard, Vincent Estimator Selection: a New Method with Applications to Kernel Density Estimation, Sankhya A, Volume 79 (2017) no. 2, pp. 298-335 | DOI | Zbl

[116] Lerasle, Matthieu; Magalhães, Nelo; Reynaud-Bouret, Patricia Optimal kernel selection for density estimation, High Dimensional Probability VII: The Cargese Volume (Progress in Probability), Volume 71, Springer, 2016, pp. 425-460 (Preliminary version available at arXiv:1511.02112) | DOI | Zbl

[117] Lozano, Fernando Model selection using Rademacher penalization, Proceedings of the 2nd ICSC Symp. on Neural Computation (NC2000). Berlin, Germany, ICSC Academic Press (2000)

[118] Lerasle, Matthieu; Takahashi, Daniel Yasumasa An Oracle Approach for Interaction Neighborhood Estimation in Random Fields, Electron. J. Stat., Volume 5 (2011), p. 534-571 (electronic) | DOI | Zbl

[119] Lerasle, Matthieu; Takahashi, Daniel Yasumasa Sharp oracle inequalities and slope heuristic for specification probabilities estimation in discrete random fields, Bernoulli, Volume 22 (2016) no. 1, pp. 325-344 | DOI | Zbl

[120] Liitiäinen, Elia; Verleysen, Michel; Corona, Francesco; Lendasse, Amaury Residual variance estimation in machine learning, Neurocomputing, Volume 72 (2009) no. 16, pp. 3692-3703 (special issue: Financial Engineering, Computational and Ambient Intelligence, IWANN 2007) | DOI

[121] Lung-Yut-Fong, Alexandre; Lévy-Leduc, Céline; Cappé, Olivier Homogeneity and change-point detection tests for multivariate data using rank statistics, Journal de la SFdS, Volume 156 (2015) no. 4, pp. 133-162 | Numdam | Zbl

[122] Magalhães, Nelo Cross-Validation and Penalization for Density Estimation, Université Paris Sud - Paris XI, May (2015) http://tel.archives-ouvertes.fr/tel-01164581 (Ph. D. Thesis Available at https://tel.archives-ouvertes.fr/tel-01164581v1 )

[123] Mallows, Colin L. Some comments on Cp, Technometrics, Volume 15 (1973), pp. 661-675 | Zbl

[124] Massart, Pascal A non-asymptotic theory for model selection, European Congress of Mathematics, Eur. Math. Soc., Zürich, 2005, pp. 309-323 | MR | Zbl

[125] Massart, Pascal Concentration Inequalities and Model Selection, Lecture Notes in Mathematics, 1896, Springer, Berlin, 2007, xiv+337 pages (Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23, 2003, With a foreword by Jean Picard) | MR | Zbl

[126] Massart, Pascal Sélection de modèles: de la théorie à la pratique, Journal de la SFdS, Volume 149 (2008) no. 4, pp. 5-28 | Zbl

[127] Mendelson, Shahar Learning without concentration for general loss functions, Probab. Theory Related Fields, Volume 171 (2018) no. 1-2, pp. 459-502 | DOI | MR | Zbl

[128] Muro, Alan; van de Geer, Sara Concentration behavior of the penalized least squares estimator, Statistica Neerlandica, Volume 72 (2018) no. 2, pp. 109-125 | DOI

[129] Michel, Bertrand Modélisation de la production d’hydrocarbures dans un bassin pétrolier, Université Paris-Sud, December (2008) http://tel.archives-ouvertes.fr/tel-00345753/ (Ph. D. Thesis Available at http://tel.archives-ouvertes.fr/tel-00345753v1 )

[130] Miller, Keith Least squares methods for ill-posed problems with a prescribed bound., SIAM J. Math. Anal., Volume 1 (1970), pp. 52-74 | MR | Zbl

[131] Maugis, Cathy; Michel, Bertrand A non asymptotic penalized criterion for gaussian mixture model selection, ESAIM Probab. Stat., Volume 15 (2011), pp. 41-68 | DOI | Numdam | MR | Zbl

[132] Maugis, Cathy; Michel, Bertrand Data-driven penalty calibration: A case study for Gaussian model selection, ESAIM Probab. Stat., Volume 15 (2011), pp. 320-339 | DOI | Numdam | MR | Zbl

[133] Matias, Catherine; Miele, Vincent Statistical clustering of temporal networks through a dynamic stochastic block model, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Volume 79 (2017) no. 4, pp. 1119-1141 | DOI | MR | Zbl

[134] Meynet, Caroline; Maugis-Rabusseau, Cathy A sparse variable selection procedure in model-based clustering, 2012 (Available at https://hal.inria.fr/hal-00734316v1 )

[135] Massart, Pascal; Nédélec, Élodie Risk bounds for statistical learning, Ann. Statist., Volume 34 (2006) no. 5, pp. 2326-2366 | MR | Zbl

[136] Müller, Ursula U.; Schick, Anton; Wefelmeyer, Wolfgang Estimating the error variance in nonparametric regression by a covariate-matched U-statistic, Statistics, Volume 37 (2003) no. 3, pp. 179-188 | DOI | MR | Zbl

[137] Mammen, Enno; Tsybakov, Alexandre B. Smooth discrimination analysis, Ann. Statist., Volume 27 (1999) no. 6, pp. 1808-1829 | MR | Zbl

[138] Navarro, Fabien; Saumard, Adrien Slope heuristics and V-fold model selection in heteroscedastic regression using strongly localized bases, ESAIM Probab. Stat., Volume 21 (2017), pp. 412-451 | DOI | MR | Zbl

[139] Oueslati, Abdullah; Lopez, Olivier A proportional hazards regression model with change-points in the baseline function, Lifetime Data Analysis, Volume 19 (2013) no. 1, pp. 59-78 | DOI | MR | Zbl

[140] Reynaud-Bouret, Patricia; Rivoirard, Vincent Near optimal thresholding estimation of a Poisson intensity on the real line, Electron. J. Stat., Volume 4 (2010), p. 172-238 (electronic) | DOI | MR | Zbl

[141] Reynaud-Bouret, Patricia; Rivoirard, Vincent; Tuleau-Malot, Christine Adaptive density estimation: a curse of support?, J. Statist. Plann. Inference, Volume 141 (2011) no. 1, pp. 115-139 | DOI | MR | Zbl

[142] Reynaud-Bouret, Patricia; Schbath, Sophie Adaptive estimation for Hawkes processes; application to genome analysis, Ann. Statist., Volume 38 (2010) no. 5, pp. 2781-2822 | DOI | MR | Zbl

[143] Regińska, Teresa A regularization parameter in discrete ill-posed problems, SIAM J. Sci. Comput., Volume 17 (1996) no. 3, pp. 740-749 | DOI | MR | Zbl

[144] Rice, John Bandwidth choice for nonparametric regression, Ann. Statist., Volume 12 (1984) no. 4, pp. 1215-1230 | DOI | MR | Zbl

[145] Rau, Andrea; Maugis-Rabusseau, Cathy; Martin-Magniette, Marie-Laure; Celeux, Gilles Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models, Bioinformatics, Volume 31 (2015) no. 9, pp. 1420-1427 | DOI

[146] Roche, Angelina Statistical modeling for functional data: non-asymptotic approaches and adaptive methods, Université Montpellier II - Sciences et Techniques du Languedoc, July (2014) (Ph. D. Thesis Available at https://tel.archives-ouvertes.fr/tel-01023919v1 )

[147] Rozenholc, Yves Statistical Base Jumping: A simple and fully data-driven answer to penalized model selection (2012) (Séminaire de Statistique du MAP5, February 3rd)

[148] Ramosaj, Burim; Pauly, Markus Consistent estimation of residual variance with random forest Out-Of-Bag errors, Statistics & Probability Letters, Volume 151 (2019), pp. 49-57 | DOI | MR | Zbl

[149] Reichel, Lothar; Rodriguez, Giuseppe Old and new parameter choice rules for discrete ill-posed problems, Numerical Algorithms, Volume 63 (2013) no. 1, pp. 65-87 | DOI | MR | Zbl

[150] Reid, Stephen; Tibshirani, Robert; Friedman, Jerome A study of error variance estimation in Lasso regression, Statist. Sinica, Volume 26 (2016) no. 1, pp. 35-67 | MR | Zbl

[151] Solnon, Matthieu; Arlot, Sylvain; Bach, Francis Multi-task Regression using Minimal Penalties, J. Mach. Learn. Res., Volume 13 (2012), p. 2773-2812 (electronic) http://jmlr.csail.mit.edu/papers/v13/solnon12a.html | MR | Zbl

[152] Saumard, Adrien Convergence in sup-norm of least-squares estimators in regression with random design and nonparametric heteroscedastic noise (2010) (Available at http://hal.archives-ouvertes.fr/hal-00528539v2 )

[153] Saumard, Adrien Estimation par Minimum de Contraste Régulier et Heuristique de Pente en Sélection de Modèles, Université de Rennes 1, October (2010) http://tel.archives-ouvertes.fr/tel-00569372/fr/ (Ph. D. Thesis Available at http://tel.archives-ouvertes.fr/tel-00569372v1 )

[154] Saumard, Adrien Nonasymptotic quasi-optimality of AIC and the slope heuristics in maximum likelihood estimation of density using histogram models (2010) (Available at https://hal.archives-ouvertes.fr/hal-00512310v1 )

[155] Saumard, Adrien Optimal upper and lower bounds for the true and empirical excess risks in heteroscedastic least-squares regression, Electron. J. Stat., Volume 6 (2012), pp. 579-655 | DOI | MR | Zbl

[156] Saumard, Adrien Optimal model selection in heteroscedastic regression using piecewise polynomial functions, Electron. J. Stat., Volume 7 (2013), pp. 1184-1223 | DOI | MR | Zbl

[157] Saumard, Adrien A concentration inequality for the excess risk in least-squares regression with random design and heteroscedastic noise, 2017 (arXiv:1702.05063v2)

[158] Schwarz, Gideon Estimating the dimension of a model, Ann. Statist., Volume 6 (1978) no. 2, pp. 461-464 | MR | Zbl

[159] Sugar, Catherine A.; James, Gareth M. Finding the number of clusters in a dataset: an information-theoretic approach, J. Amer. Statist. Assoc., Volume 98 (2003) no. 463, pp. 750-763 | DOI | MR | Zbl

[160] Saumard, Adrien; Navarro, Fabien Finite sample improvement of Akaike’s Information Criterion, 2018 (arXiv:1803.02078v4)

[161] Solnon, Matthieu Apprentissage statistique multi-tâches, Université Pierre et Marie Curie - Paris VI, November (2013) http://hal.inria.fr/tel-00911498 (Ph. D. Thesis Available at https://hal.inria.fr/tel-00911498v1)

[162] Sorba, Olivier Minimal penalties for model selection, Université Paris-Saclay, February (2017) (Ph. D. Thesis Available at https://tel.archives-ouvertes.fr/tel-01515957v1 )

[163] Spokoiny, Vladimir Variance estimation for high-dimensional regression models, J. Multivariate Anal., Volume 82 (2002) no. 1, pp. 111-133 | DOI | MR | Zbl

[164] Spokoiny, Vladimir Parametric estimation. Finite sample theory, Ann. Statist., Volume 40 (2012) no. 6, pp. 2877-2909 | DOI | MR | Zbl

[165] Spokoiny, Vladimir Penalized maximum likelihood estimation and effective dimension, Ann. Inst. Henri Poincaré Probab. Stat., Volume 53 (2017) no. 1, pp. 389-429 | DOI | MR | Zbl

[166] Stein, Charles M. Estimation of the mean of a multivariate normal distribution, Ann. Statist., Volume 9 (1981) no. 6, pp. 1135-1151 | MR | Zbl

[167] Stone, Mervyn Cross-validatory choice and assessment of statistical predictions, J. Roy. Statist. Soc. Ser. B, Volume 36 (1974), pp. 111-147 (With discussion by G. A. Barnard, A. C. Atkinson, L. K. Chan, A. P. Dawid, F. Downton, J. Dickey, A. G. Baker, O. Barndorff-Nielsen, D. R. Cox, S. Geisser, D. Hinkley, R. R. Hocking, and A. S. Young, and with a reply by the authors) | MR | Zbl

[168] Tong, Tiejun; Ma, Yanyuan; Wang, Yuedong Optimal variance estimation without estimating the mean function, Bernoulli, Volume 19 (2013) no. 5A, pp. 1839-1854 | DOI | MR | Zbl

[169] Tibshirani, Ryan J.; Taylor, Jonathan Degrees of freedom in lasso problems, Ann. Statist., Volume 40 (2012) no. 2, pp. 1198-1232 | DOI | MR | Zbl

[170] Tibshirani, Robert; Walther, Guenther; Hastie, Trevor Estimating the number of clusters in a data set via the gap statistic, J. R. Stat. Soc. Ser. B Stat. Methodol., Volume 63 (2001) no. 2, pp. 411-423 | DOI | MR | Zbl

[171] van Handel, Ramon On the minimal penalty for Markov order estimation, Probability Theory and Related Fields, Volume 150 (2011) no. 3, pp. 709-738 | DOI | MR | Zbl

[172] van de Geer, Sara; Wainwright, Martin J. On concentration for (regularized) empirical risk minimization, Sankhya A, Volume 79 (2017) no. 2, pp. 159-200 | DOI | MR | Zbl

[173] Vaiter, Samuel; Deledalle, Charles; Peyré, Gabriel; Fadili, Jalal M.; Dossal, Charles The Degrees of Freedom of the Group Lasso, International Conference on Machine Learning Workshop (ICML), Edinburgh, United Kingdom (2012) (Available at https://hal.archives-ouvertes.fr/hal-00695292.) | HAL

[174] Verzelen, Nicolas Data-driven neighborhood selection of a Gaussian field, Comput. Statist. Data Anal., Volume 54 (2010) no. 5, pp. 1355-1371 | DOI | MR | Zbl

[175] van Erven, Tim; Grünwald, Peter D.; de Rooij, Steven Catching up faster by switching sooner: a predictive approach to adaptive estimation with an application to the AIC-BIC dilemma, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Volume 74 (2012) no. 3, pp. 361-417 | DOI | MR | Zbl

[176] Varet, Suzanne; Lacour, Claire; Massart, Pascal; Rivoirard, Vincent Numerical performance of Penalized Comparison to Overfitting for multivariate kernel density estimation (2019) (Technical report arXiv:1902.01075v1)

[177] Vogel, Curt R. Non-convergence of the L-curve regularization parameter selection method, Inverse Problems, Volume 12 (1996) no. 4, pp. 535-547 | DOI | MR | Zbl

[178] Wahba, Grace A survey of some smoothing problems and the method of generalized cross-validation for solving them, Applications of statistics (Proc. Sympos., Wright State Univ., Dayton, Ohio, 1976), North-Holland, Amsterdam, 1977, pp. 507-523 | MR | Zbl

[179] Wilks, Samuel Stanley The large-sample distribution of the likelihood ratio for testing composite hypotheses, Ann. Math. Statistics, Volume 9 (1938), pp. 60-62 | JFM | Zbl

[180] Yang, Yuhong Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation, Biometrika, Volume 92 (2005) no. 4, pp. 937-950 | MR | Zbl

[181] Zwald, Laurent Statistical performances of learning algorithm: Kernel Projection Machine and Kernel Principal Component Analysis, Université Paris Sud, November (2005) http://tel.archives-ouvertes.fr/tel-00012011/fr/ (Ph. D. Thesis Available at http://tel.archives-ouvertes.fr/tel-00012011v1 )