Model selection via testing: an alternative to (penalized) maximum likelihood estimators
Annales de l'I.H.P. Probabilités et statistiques, Volume 42 (2006) no. 3, p. 273-325
@article{AIHPB_2006__42_3_273_0,
     author = {Birg\'e, Lucien},
     title = {Model selection via testing: an alternative to (penalized) maximum likelihood estimators},
     journal = {Annales de l'I.H.P. Probabilit\'es et statistiques},
     publisher = {Elsevier},
     volume = {42},
     number = {3},
     year = {2006},
     pages = {273--325},
     doi = {10.1016/j.anihpb.2005.04.004},
     zbl = {05024238},
     mrnumber = {2219712},
     language = {en},
     url = {http://www.numdam.org/item/AIHPB_2006__42_3_273_0}
}
Birgé, Lucien. Model selection via testing: an alternative to (penalized) maximum likelihood estimators. Annales de l'I.H.P. Probabilités et statistiques, Volume 42 (2006) no. 3, pp. 273-325. doi: 10.1016/j.anihpb.2005.04.004. http://www.numdam.org/item/AIHPB_2006__42_3_273_0/

[1] P. Assouad, Deux remarques sur l'estimation, C. R. Acad. Sci. Paris, Sér. I Math. 296 (1983) 1021-1024. | MR 777600 | Zbl 0568.62003

[2] J.-Y. Audibert, Théorie statistique de l'apprentissage : une approche PAC-bayésienne, Thèse de doctorat, Laboratoire de Probabilités et Modèles Aléatoires, Université Paris VI, Paris, 2004.

[3] Y. Baraud, Model selection for regression on a random design, ESAIM Probab. Statist. 6 (2002) 127-146. | Numdam | MR 1918295 | Zbl 1059.62038

[4] A.R. Barron, Complexity regularization with applications to artificial neural networks, in: Roussas G. (Ed.), Nonparametric Functional Estimation, Kluwer, Dordrecht, 1991, pp. 561-576. | MR 1154352 | Zbl 0739.62001

[5] A.R. Barron, L. Birgé, P. Massart, Risk bounds for model selection via penalization, Probab. Theory Related Fields 113 (1999) 301-413. | MR 1679028 | Zbl 0946.62036

[6] A.R. Barron, T.M. Cover, Minimum complexity density estimation, IEEE Trans. Inform. Theory 37 (1991) 1034-1054. | MR 1111806 | Zbl 0743.62003

[7] J. Beirlant, L. Györfi, On the asymptotic normality of the $L_2$-error in partitioning regression estimation, J. Statist. Plann. Inference 71 (1998) 93-107. | MR 1651863 | Zbl 0961.62030

[8] L. Birgé, Approximation dans les espaces métriques et théorie de l'estimation, Z. Wahrscheinlichkeitstheorie Verw. Gebiete 65 (1983) 181-237. | MR 722129 | Zbl 0506.62026

[9] L. Birgé, Sur un théorème de minimax et son application aux tests, Probab. Math. Statist. 3 (1984) 259-282. | MR 764150 | Zbl 0571.62036

[10] L. Birgé, Stabilité et instabilité du risque minimax pour des variables indépendantes équidistribuées, Ann. Inst. H. Poincaré Sect. B 20 (1984) 201-223. | Numdam | MR 762855 | Zbl 0542.62018

[11] L. Birgé, On estimating a density using Hellinger distance and some other strange facts, Probab. Theory Related Fields 71 (1986) 271-291. | MR 816706 | Zbl 0561.62029

[12] L. Birgé, Model selection for Gaussian regression with random design, Bernoulli 10 (2004) 1039-1051. | MR 2108042 | Zbl 1064.62030

[13] L. Birgé, P. Massart, Rates of convergence for minimum contrast estimators, Probab. Theory Related Fields 97 (1993) 113-150. | MR 1240719 | Zbl 0805.62037

[14] L. Birgé, P. Massart, From model selection to adaptive estimation, in: Pollard D., Torgersen E., Yang G. (Eds.), Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, Springer-Verlag, New York, 1997, pp. 55-87. | MR 1462939 | Zbl 0920.62042

[15] L. Birgé, P. Massart, Minimum contrast estimators on sieves: exponential bounds and rates of convergence, Bernoulli 4 (1998) 329-375. | MR 1653272 | Zbl 0954.62033

[16] L. Birgé, P. Massart, An adaptive compression algorithm in Besov spaces, Constr. Approx. 16 (2000) 1-36. | MR 1848840 | Zbl 1004.41006

[17] L. Birgé, P. Massart, Gaussian model selection, J. Eur. Math. Soc. 3 (2001) 203-268. | MR 1848946 | Zbl 1037.62001

[18] M.S. Birman, M.Z. Solomjak, Piecewise-polynomial approximation of functions of the classes $W_p^{\alpha}$, Mat. Sb. 73 (1967) 295-317. | MR 217487 | Zbl 0173.16001

[19] L.D. Brown, M.G. Low, Asymptotic equivalence of nonparametric regression and white noise, Ann. Statist. 24 (1996) 2384-2398. | MR 1425958 | Zbl 0867.62022

[20] F. Bunea, A.B. Tsybakov, M.H. Wegkamp, Aggregation for regression learning, Technical report 948, Laboratoire de Probabilités, Université Paris VI, 2004, http://www.proba.jussieu.fr/mathdoc/preprints/index.html#2004.

[21] G. Castellan, Modified Akaike's criterion for histogram density estimation, Technical report 99.61, Université Paris-Sud, Orsay, 1999, http://www.math.u-psud.fr/~biblio/pub/1999/.

[22] G. Castellan, Sélection d'histogrammes à l'aide d'un critère de type Akaike, C. R. Acad. Sci. Paris 330 (2000) 729-732. | MR 1763919 | Zbl 0969.62023

[23] O. Catoni, The mixture approach to universal model selection, Technical report LMENS-97-22, Ecole Normale Supérieure, Paris, 1997, http://www.dma.ens.fr/edition/publis/1997/titre97.html. | Zbl 0928.62033

[24] O. Catoni, Statistical learning theory and stochastic optimization, in: Picard J. (Ed.), Lectures on Probability Theory and Statistics, Ecole d'Eté de Probabilités de Saint-Flour XXXI - 2001, Lecture Notes in Math., vol. 1851, Springer-Verlag, Berlin, 2004. | MR 2163920 | Zbl 1076.93002

[25] H. Chernoff, A measure of asymptotic efficiency of tests of a hypothesis based on a sum of observations, Ann. Math. Statist. 23 (1952) 493-507. | MR 57518 | Zbl 0048.11804

[26] R.A. DeVore, G. Kerkyacharian, D. Picard, V. Temlyakov, Mathematical methods for supervised learning, Technical report 0422, IMI, University of South Carolina, Columbia, 2004, http://www.math.sc.edu/imip/preprints/04.html.

[27] R.A. DeVore, G.G. Lorentz, Constructive Approximation, Springer-Verlag, Berlin, 1993. | MR 1261635 | Zbl 0797.41016

[28] L. Devroye, G. Lugosi, Combinatorial Methods in Density Estimation, Springer-Verlag, New York, 2001. | MR 1843146 | Zbl 0964.62025

[29] D.L. Donoho, I.M. Johnstone, G. Kerkyacharian, D. Picard, Density estimation by wavelet thresholding, Ann. Statist. 24 (1996) 508-539. | MR 1394974 | Zbl 0860.62032

[30] D.L. Donoho, R.C. Liu, B. MacGibbon, Minimax risk over hyperrectangles, and implications, Ann. Statist. 18 (1990) 1416-1437. | MR 1062717 | Zbl 0705.62018

[31] P.P.B. Eggermont, V.N. LaRiccia, Maximum Penalized Likelihood Estimation, vol. I: Density Estimation, Springer, New York, 2001. | MR 1837879 | Zbl 0984.62026

[32] P. Groeneboom, Some current developments in density estimation, in: de Bakker J.W., Hazewinkel M., Lenstra J.K. (Eds.), Mathematics and Computer Science, CWI Monograph, vol. 1, Elsevier, Amsterdam, 1986, pp. 163-192. | MR 873578 | Zbl 0593.62030

[33] L. Györfi, M. Kohler, A. Kryżak, H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer, New York, 2002. | Zbl 1021.62024

[34] P.J. Huber, A robust version of the probability ratio test, Ann. Math. Statist. 36 (1965) 1753-1758. | MR 185747 | Zbl 0137.12702

[35] P.J. Huber, Robust Statistics, John Wiley, New York, 1981. | MR 606374 | Zbl 0536.62025

[36] I.M. Johnstone, Chi-square oracle inequalities, in: de Gunst M.C.M., Klaassen C.A.J., van der Vaart A.W. (Eds.), State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet, Lecture Notes Monograph Ser., vol. 36, Institute of Mathematical Statistics, 2001, pp. 399-418. | MR 1836572

[37] A. Juditsky, A.S. Nemirovski, Functional aggregation for nonparametric estimation, Ann. Statist. 28 (2000) 681-712. | MR 1792783 | Zbl 1105.62338

[38] G. Kerkyacharian, D. Picard, Thresholding algorithms, maxisets and well-concentrated bases, Test 9 (2000) 283-344. | MR 1821645 | Zbl 1107.62323

[39] A.N. Kolmogorov, V.M. Tikhomirov, ε-entropy and ε-capacity of sets in function spaces, Amer. Math. Soc. Transl. (2) 17 (1961) 277-364. | Zbl 0133.06703

[40] B. Laurent, P. Massart, Adaptive estimation of a quadratic functional by model selection, Ann. Statist. 28 (2000) 1302-1338. | MR 1805785 | Zbl 1105.62328

[41] L.M. Le Cam, On the assumptions used to prove asymptotic normality of maximum likelihood estimates, Ann. Math. Statist. 41 (1970) 802-828. | MR 267676 | Zbl 0246.62039

[42] L.M. Le Cam, Limits of experiments, in: Proc. 6th Berkeley Symp. on Math. Stat. and Prob. I, 1972, pp. 245-261. | MR 415819 | Zbl 0271.62004

[43] L.M. Le Cam, Convergence of estimates under dimensionality restrictions, Ann. Statist. 1 (1973) 38-53. | MR 334381 | Zbl 0255.62006

[44] L.M. Le Cam, On local and global properties in the theory of asymptotic normality of experiments, in: Puri M. (Ed.), Stochastic Processes and Related Topics, vol. 1, Academic Press, New York, 1975, pp. 13-54. | MR 395005 | Zbl 0389.62011

[45] L.M. Le Cam, Asymptotic Methods in Statistical Decision Theory, Springer-Verlag, New York, 1986. | MR 856411 | Zbl 0605.62002

[46] L.M. Le Cam, Maximum likelihood: an introduction, Inter. Statist. Rev. 58 (1990) 153-171. | Zbl 0715.62045

[47] L.M. Le Cam, Metric dimension and statistical estimation, CRM Proc. and Lecture Notes 11 (1997) 303-311. | MR 1479680 | Zbl 0942.62035

[48] G.G. Lorentz, Approximation of Functions, Holt, Rinehart, Winston, New York, 1966. | MR 213785 | Zbl 0153.38901

[49] G.G. Lorentz, M. von Golitschek, Y. Makovoz, Constructive Approximation, Advanced Problems, Springer, Berlin, 1996. | MR 1393437 | Zbl 0910.41001

[50] A.S. Nemirovski, Topics in non-parametric statistics, in: Bernard P. (Ed.), Lectures on Probability Theory and Statistics, Ecole d'Eté de Probabilités de Saint-Flour XXVIII - 1998, Lecture Notes in Math., vol. 1738, Springer-Verlag, Berlin, 2000, pp. 85-297. | MR 1775640 | Zbl 0998.62033

[51] M. Nussbaum, Asymptotic equivalence of density estimation and Gaussian white noise, Ann. Statist. 24 (1996) 2399-2430. | MR 1425959 | Zbl 0867.62035

[52] A. Pinkus, n-widths in Approximation Theory, Springer-Verlag, Berlin, 1985. | MR 774404 | Zbl 0551.41001

[53] M.S. Pinsker, Optimal filtration of square-integrable signals in Gaussian noise, Problems Inform. Transmission 16 (1980) 120-133. | MR 624591 | Zbl 0452.94003

[54] X. Shen, W.H. Wong, Convergence rates of sieve estimates, Ann. Statist. 22 (1994) 580-615. | MR 1292531 | Zbl 0805.62008

[55] B.W. Silverman, On the estimation of a probability density function by the maximum penalized likelihood method, Ann. Statist. 10 (1982) 795-810. | MR 663433 | Zbl 0492.62034

[56] A.B. Tsybakov, Optimal rates of aggregation, in: Proceedings of 16th Annual Conference on Learning Theory (COLT) and 7th Annual Workshop on Kernel Machines, Lecture Notes in Artificial Intelligence, vol. 2777, Springer-Verlag, Berlin, 2003, pp. 303-313.

[57] S. van de Geer, Estimating a regression function, Ann. Statist. 18 (1990) 907-924. | MR 1056343 | Zbl 0709.62040

[58] S. van de Geer, Hellinger-consistency of certain nonparametric maximum likelihood estimates, Ann. Statist. 21 (1993) 14-44. | MR 1212164 | Zbl 0779.62033

[59] S. van de Geer, Empirical Processes in M-Estimation, Cambridge University Press, Cambridge, 2000. | MR 1739079

[60] A.W. van der Vaart, Asymptotic Statistics, Cambridge University Press, Cambridge, 1998. | MR 1652247 | Zbl 0910.62001

[61] G. Wahba, Spline Models for Observational Data, SIAM, Philadelphia, PA, 1990. | MR 1045442 | Zbl 0813.62001

[62] A. Wald, Note on the consistency of the maximum likelihood estimate, Ann. Math. Statist. 20 (1949) 595-601. | MR 32169 | Zbl 0034.22902

[63] M.H. Wegkamp, Model selection in nonparametric regression, Ann. Statist. 31 (2003) 252-273. | MR 1962506 | Zbl 1019.62037

[64] W.H. Wong, X. Shen, Probability inequalities for likelihood ratios and convergence rates of sieve MLEs, Ann. Statist. 23 (1995) 339-362. | MR 1332570 | Zbl 0829.62002

[65] Y. Yang, Minimax optimal density estimation, Ph.D. dissertation, Dept. of Statistics, Yale University, New Haven, 1996.

[66] Y. Yang, Mixing strategies for density estimation, Ann. Statist. 28 (2000) 75-87. | MR 1762904 | Zbl 1106.62322

[67] Y. Yang, Combining different procedures for adaptive regression, J. Multivariate Anal. 74 (2000) 135-161. | MR 1790617 | Zbl 0964.62032

[68] Y. Yang, Adaptive regression by mixing, J. Amer. Statist. Assoc. 96 (2001) 574-588. | MR 1946426 | Zbl 1018.62033

[69] Y. Yang, How accurate can any regression procedure be?, Technical report, Iowa State University, Ames, 2001, http://www.public.iastate.edu/yyang/papers/index.html.

[70] Y. Yang, Aggregating regression procedures to improve performance, Bernoulli 10 (2004) 25-47. | MR 2044592 | Zbl 1040.62030

[71] Y. Yang, A.R. Barron, An asymptotic property of model selection criteria, IEEE Trans. Inform. Theory 44 (1998) 95-116. | MR 1486651 | Zbl 0949.62041

[72] Y. Yang, A.R. Barron, Information-theoretic determination of minimax rates of convergence, Ann. Statist. 27 (1999) 1564-1599. | MR 1742500 | Zbl 0978.62008

[73] Y.G. Yatracos, Rates of convergence of minimum distance estimates and Kolmogorov's entropy, Ann. Statist. 13 (1985) 768-774. | MR 790571 | Zbl 0576.62057

[74] B. Yu, Assouad, Fano and Le Cam, in: Pollard D., Torgersen E., Yang G. (Eds.), Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, Springer-Verlag, New York, 1997, pp. 423-435. | MR 1462963 | Zbl 0896.62032