On the optimality of sample-based estimates of the expectation of the empirical minimizer
ESAIM: Probability and Statistics, Tome 14 (2010), pp. 315-337.

We study sample-based estimates of the expectation of the function produced by the empirical minimization algorithm. We investigate the extent to which one can estimate the rate of convergence of the empirical minimizer in a data dependent manner. We establish three main results. First, we provide an algorithm that upper bounds the expectation of the empirical minimizer in a completely data-dependent manner. This bound is based on a structural result due to Bartlett and Mendelson, which relates expectations to sample averages. Second, we show that these structural upper bounds can be loose, compared to previous bounds. In particular, we demonstrate a class for which the expectation of the empirical minimizer decreases as O(1/n) for sample size n, although the upper bound based on structural properties is Ω(1). Third, we show that this looseness of the bound is inevitable: we present an example that shows that a sharp bound cannot be universally recovered from empirical data.

DOI : https://doi.org/10.1051/ps:2008036
Classification : 62G08,  68Q32
Mots clés : error bounds; empirical minimization; data-dependent complexity
@article{PS_2010__14__315_0,
author = {Bartlett, Peter L. and Mendelson, Shahar and Philips, Petra},
title = {On the optimality of sample-based estimates of the expectation of the empirical minimizer},
journal = {ESAIM: Probability and Statistics},
pages = {315--337},
publisher = {EDP-Sciences},
volume = {14},
year = {2010},
doi = {10.1051/ps:2008036},
mrnumber = {2779487},
language = {en},
url = {http://www.numdam.org/articles/10.1051/ps:2008036/}
}
TY  - JOUR
AU  - Bartlett, Peter L.
AU  - Mendelson, Shahar
AU  - Philips, Petra
TI  - On the optimality of sample-based estimates of the expectation of the empirical minimizer
JO  - ESAIM: Probability and Statistics
PY  - 2010
DA  - 2010///
SP  - 315
EP  - 337
VL  - 14
PB  - EDP-Sciences
UR  - http://www.numdam.org/articles/10.1051/ps:2008036/
UR  - https://www.ams.org/mathscinet-getitem?mr=2779487
UR  - https://doi.org/10.1051/ps:2008036
DO  - 10.1051/ps:2008036
LA  - en
ID  - PS_2010__14__315_0
ER  - 
Bartlett, Peter L.; Mendelson, Shahar; Philips, Petra. On the optimality of sample-based estimates of the expectation of the empirical minimizer. ESAIM: Probability and Statistics, Tome 14 (2010), pp. 315-337. doi : 10.1051/ps:2008036. http://www.numdam.org/articles/10.1051/ps:2008036/

[1] P.L. Bartlett and S. Mendelson, Empirical minimization. Probab. Theory Relat. Fields 135 (2006) 311-334. | Zbl 1142.62348

[2] P.L. Bartlett and M.H. Wegkamp, Classification with a reject option using a hinge loss. J. Machine Learn. Res. 9 (2008) 1823-1840. | Zbl 1151.62302

[3] P.L. Bartlett, O. Bousquet and S. Mendelson, Local Rademacher Complexities. Ann. Statist. 33 (2005) 1497-1537. | Zbl 1083.62034

[4] P.L. Bartlett, M.I. Jordan and J.D. Mcauliffe, Convexity, classification, and risk bounds. J. Am. Statist. Assoc. 101 (2006) 138-156. | Zbl 1118.62330

[5] G. Blanchard, G. Lugosi and N. Vayatis, On the rate of convergence of regularized boosting classifiers. J. Mach. Learn. Res. 4 (2003) 861-894. | Zbl 1083.68109

[6] S. Boucheron, G. Lugosi and P. Massart, Concentration inequalities using the entropy method. Ann. Probab. 31 (2003) 1583-1614. | Zbl 1051.60020

[7] O. Bousquet, Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms. Ph.D. thesis, École Polytechnique, 2002.

[8] R.M. Dudley, Uniform Central Limit Theorems, Cambridge University Press (1999). | Zbl 1139.60016

[9] D. Haussler, Sphere Packing Numbers for Subsets of the Boolean n-cube with Bounded Vapnik-Chervonenkis Dimension. J. Combin. Theory Ser. A 69 (1995) 217-232. | Zbl 0818.60005

[10] T. Klein, Une inégalité de concentration gauche pour les processus empiriques. C. R. Math. Acad. Sci. Paris 334 (2002) 501-504. | Zbl 1003.60024

[11] V. Koltchinskii, Local Rademacher Complexities and Oracle Inequalities in Risk Minimization. Ann. Statist. 34 (2006). | Zbl 1118.62065

[12] V. Koltchinskii and D. Panchenko, Rademacher processes and bounding the risk of function learning. High Dimensional Probability, Vol. II (2000) 443-459. | Zbl 1106.68385

[13] M. Ledoux, The Concentration of Measure Phenomenon, volume 89 of Mathematical Surveys and Monographs. American Mathematical Society (2001). | Zbl 0995.60002

[14] W.S. Lee, P.L. Bartlett and R.C. Williamson, The Importance of Convexity in Learning with Squared Loss. IEEE Trans. Informa. Theory 44 (1998) 1974-1980. | Zbl 0935.68091

[15] G. Lugosi and N. Vayatis, On the Bayes-risk consistency of regularized boosting methods (with discussion), Ann. Statist. 32 (2004) 30-55. | Zbl 1105.62319

[16] G. Lugosi and M. Wegkamp, Complexity regularization via localized random penalties. Ann. Statist. 32 (2004) 1679-1697. | Zbl 1045.62060

[17] P. Massart, The constants in Talagrand's concentration inequality for empirical processes. Ann. Probab. 28 (2000) 863-884. | Zbl 1140.60310

[18] P. Massart, Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse Math. IX (2000) 245-303. | Numdam | Zbl 0986.62002

[19] P. Massart and E. Nédélec, Risk bounds for statistical learning. Ann. Statist. 34 (2006) 2326-2366. | Zbl 1108.62007

[20] S. Mendelson, Improving the sample complexity using global data. IEEE Trans. Inform. Theory 48 (2002) 1977-1991. | Zbl 1061.68128

[21] S. Mendelson, A few notes on Statistical Learning Theory. In Proc. of the Machine Learning Summer School, Canberra 2002, S. Mendelson and A. J. Smola (Eds.), LNCS 2600. Springer (2003). | Zbl 1019.68093

[22] E. Rio, Inégalités de concentration pour les processus empiriques de classes de parties. Probab. Theory Relat. Fields 119 (2001) 163-175. | Zbl 0976.60033

[23] M. Rudelson and R. Vershynin, Combinatorics of random processes and sections of convex bodies. Ann. Math. 164 (2006) 603-648. | Zbl 1114.60009

[24] M. Talagrand, Sharper Bounds for Gaussian and Empirical Processes. Ann. Probab. 22 (1994) 20-76. | Zbl 0798.60051

[25] M. Talagrand, New concentration inequalities in product spaces. Inventiones Mathematicae 126 (1996) 505-563. | Zbl 0893.60001

[26] B. Tarigan and S.A. Van De Geer, Adaptivity of support vector machines with ${\ell }_{1}$ penalty. Technical Report MI 2004-14, University of Leiden (2004).

[27] A. Tsybakov, Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 (2004) 135-166. | Zbl 1105.62353

[28] S.A. Van De Geer, A new approach to least squares estimation, with applications. Ann. Statist. 15 (1987) 587-602. | Zbl 0625.62046

[29] S.A. Van De Geer, Empirical Processes in M-Estimation, Cambridge University Press (2000). | Zbl 1179.62073

[30] A. Van Der Vaart and J. Wellner, Weak Convergence and Empirical Processes. Springer (1996). | Zbl 0862.60002

[31] V.N. Vapnik and A.Y. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 (1971) 264-280. | Zbl 0247.60005

Cité par Sources :