On the optimality of the empirical risk minimization procedure for the convex aggregation problem
Annales de l'I.H.P. Probabilités et statistiques, Volume 49 (2013) no. 1, p. 288-306

We study the performance of empirical risk minimization (ERM), with respect to the quadratic risk, in the context of convex aggregation, in which one wants to construct a procedure whose risk is as close as possible to the best function in the convex hull of an arbitrary finite class F. We show that ERM performed in the convex hull of F is an optimal aggregation procedure for the convex aggregation problem. We also show that if this procedure is used for the problem of model selection aggregation, in which one wants to mimic the performance of the best function in F itself, then its rate is the same as the one achieved for the convex aggregation problem, and thus is far from optimal. These results are obtained in deviation and are sharp up to logarithmic factors.

Nous étudions les performances de la procédure de minimisation du risque empirique, par rapport au risque quadratique, pour le problème d’agrégation convexe. Dans ce problème, on souhaite construire des procédures dont le risque est aussi proche que possible du risque du meilleur élément dans l’enveloppe convexe d’une classe finie F de fonctions. Nous prouvons que la procédure obtenue par minimisation du risque empirique sur la coque convexe de F est une procédure optimale pour le problème d’aggrégation convexe. Nous prouvons aussi que si cette procédure est utilisée pour le problème d’agrégation en sélection de modèle, pour lequel on souhaite imiter le meilleur dans F, alors le résidu d’agrégation est le même que celui obtenue pour le problème d’agrégation convexe. Cette procédure est donc loin d’être optimale pour le problème d’agrégation en sélection de modèle. Ces résultats sont obtenus en déviation et sont optimaux à des facteurs logarithmiques prés.

DOI : https://doi.org/10.1214/11-AIHP458
Classification:  62G08
Keywords: learning theory, aggregation theory, empirical process theory
@article{AIHPB_2013__49_1_288_0,
     author = {Lecu\'e, Guillaume and Mendelson, Shahar},
     title = {On the optimality of the empirical risk minimization procedure for the convex aggregation problem},
     journal = {Annales de l'I.H.P. Probabilit\'es et statistiques},
     publisher = {Gauthier-Villars},
     volume = {49},
     number = {1},
     year = {2013},
     pages = {288-306},
     doi = {10.1214/11-AIHP458},
     zbl = {1259.62038},
     mrnumber = {3060158},
     language = {en},
     url = {http://www.numdam.org/item/AIHPB_2013__49_1_288_0}
}
Lecué, Guillaume; Mendelson, Shahar. On the optimality of the empirical risk minimization procedure for the convex aggregation problem. Annales de l'I.H.P. Probabilités et statistiques, Volume 49 (2013) no. 1, pp. 288-306. doi : 10.1214/11-AIHP458. http://www.numdam.org/item/AIHPB_2013__49_1_288_0/

[1] J.-Y. Audibert. Aggregated estimators and empirical complexity for least square regression. Ann. Inst. Henri Poincaré Probab. Stat. 40 (2004) 685-736. | Numdam | MR 2096215 | Zbl 1052.62037

[2] J.-Y. Audibert. Progressive mixture rules are deviation suboptimal. In Advances in Neural Information Processing Systems (NIPS), 2007.

[3] J.-Y. Audibert. Fast learning rates in statistical inference through aggregation. Ann. Statist. 37 (2009) 1591-1646. | MR 2533466 | Zbl 05582005 | Zbl pre05582005

[4] P. L. Bartlett and S. Mendelson. Empirical minimization. Probab. Theory Related Fields 135 (2006) 311-334. | MR 2240689 | Zbl 1142.62348

[5] P. L. Bartlett, S. Mendelson and J. Neeman. 1 -regularized linear regression: Persistence and oracle inequalities. Probab. Theory Related Fields 154 (2012) 193-224. | MR 2981422 | Zbl 06125014 | Zbl pre06125014

[6] L. Birgé. Model selection via testing: An alternative to (penalized) maximum likelihood estimators. Ann. Inst. Henri Poincaré Probab. Stat. 42 (2006) 273-325. | Numdam | MR 2219712 | Zbl 1333.62094 | Zbl pre05024238

[7] O. Bousquet, V. Koltchinskii and D. Panchenko. Some local measures of complexity of convex hulls and generalization bounds. In Computational Learning Theory (Sydney, 2002) 59-73. Lecture Notes in Comput. Sci. 2375. Springer, Berlin, 2002. | MR 2040405 | Zbl 1050.68055

[8] F. Bunea and A. Nobel. Sequential procedures for aggregating arbitrary estimators of a conditional mean. IEEE Trans. Inform. Theory 54 (2008) 1725-1735. | MR 2450298

[9] F. Bunea, A. B. Tsybakov and M. H. Wegkamp. Aggregation for Gaussian regression. Ann. Statist. 35 (2007) 1674-1697. | MR 2351101 | Zbl 1209.62065

[10] A. S. Dalalyan and A. B. Tsybakov. Aggregation by exponential weighting and sharp oracle inequalities. In Learning Theory 97-111. Lecture Notes in Comput. Sci. 4539. Springer, Berlin, 2007. | MR 2397581 | Zbl 1203.62063

[11] M. Emery, A. Nemirovski and D. Voiculescu. Lectures on Probability Theory and Statistics. Lecture Notes in Math. 1738. Springer, Berlin, 2000. | MR 1775638 | Zbl 0941.00026

[12] A. Juditsky and A. Nemirovski. Functional aggregation for nonparametric regression. Ann. Statist. 28 (2000) 681-712. | MR 1792783 | Zbl 1105.62338

[13] B. Klartag. 5n Minkowski symmetrizations suffice to arrive at an approximate Euclidean ball. Ann. of Math. (2) 156 (2002) 947-960. | MR 1954241 | Zbl 1028.52002

[14] V. Koltchinskii. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: Ecole d'Été de Probabilités de Saint-Flour XXXVIII-2008. Lecture Notes in Math. Springer, Berlin, 2011. | MR 2829871 | Zbl 1223.91002

[15] V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist. 34 (2006) 2593-2656. | MR 2329442 | Zbl 1118.62065

[16] G. Lecué and S. Mendelson. Aggregation via empirical risk minimization. Probab. Theory Related Fields 145 (2009) 591-613. | MR 2529440 | Zbl 1206.62094

[17] G. Lecué and S. Mendelson. General non-exact oracle inequalities in the unbounded case. Preprint.

[18] M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)] 23. Springer, Berlin, 1991. | MR 1102015 | Zbl 0748.60004

[19] S. Mendelson. Obtaining fast error rates in nonconvex situations. J. Complexity 24 (2008) 380-397. | MR 2426759 | Zbl pre05303163

[20] S. Mendelson. Empirical processes with a bounded ψ 1 diameter. Geom. Funct. Anal. 20 (2010) 988-1027. | MR 2729283 | Zbl 1204.60042

[21] S. Mendelson, A. Pajor and N. Tomczak-Jaegermann. Reconstruction and sub-Gaussian operators in asymptotic geometric analysis. Geom. Funct. Anal. 17 (2007) 1248-1282. | MR 2373017 | Zbl 1163.46008

[22] V. V. Petrov. Limit Theorems of Probability Theory. Oxford Studies in Probability 4. Oxford Univ. Press, New York, 1995. | MR 1353441 | Zbl 0826.60001

[23] G. Pisier. The Volume of Convex Bodies and Banach Space Geometry. Cambridge Tracts in Math. 94. Cambridge Univ. Press, Cambridge, 1989. | MR 1036275 | Zbl 0698.46008

[24] A. B. Tsybakov. Optimal rate of aggregation. In Computational Learning Theory and Kernel Machines (COLT-2003) 303-313. Lecture Notes in Artificial Intelligence 2777. Springer, Heidelberg, 2003. | Zbl 1208.62073

[25] A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 (2004) 135-166. | MR 2051002 | Zbl 1105.62353

[26] A. W. Van Der Vaart and Jon A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Ser. Statist. Springer, New York, 1996. | MR 1385671 | Zbl 0862.60002

[27] Y. Yang. Mixing strategies for density estimation. Ann. Statist. 28 (2000) 75-87. | MR 1762904 | Zbl 1106.62322

[28] Y. Yang. Aggregating regression procedures to improve performance. Bernoulli 10 (2004) 25-47. | MR 2044592 | Zbl 1040.62030