Density estimation with quadratic loss : a confidence intervals method
ESAIM: Probability and Statistics, Tome 12 (2008), pp. 438-463.

We propose a feature selection method for density estimation with quadratic loss. The method relies on the study of one-dimensional approximation models and on confidence regions for the density built from these models. It is quite general and covers cases of interest such as the detection of relevant wavelet coefficients or the selection of support vectors in SVMs. In the general case, we prove that every selected feature actually improves the performance of the estimator. When the features are defined by wavelets, we prove that the method is adaptive and near minimax (up to a logarithmic term) over certain Besov spaces. We conclude with simulations suggesting that the adaptation result should extend to other features.
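The selection principle above — keep a feature only when the data give real evidence that its coefficient is nonzero — can be illustrated with a minimal Python sketch. This is not the paper's actual confidence-interval procedure: the universal-style threshold, the Haar basis, the level `j_max`, and the Beta(2,5) test density are all illustrative assumptions standing in for the method it describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_mother(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return (np.where((t >= 0) & (t < 0.5), 1.0, 0.0)
            - np.where((t >= 0.5) & (t < 1.0), 1.0, 0.0))

def psi(j, k, x):
    """L2-normalised Haar wavelet psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)."""
    return 2.0 ** (j / 2) * haar_mother(2.0 ** j * np.asarray(x) - k)

def thresholded_density(sample, j_max=5):
    """Estimate a density on [0, 1): keep a wavelet feature only when its
    empirical coefficient clears a threshold (a crude stand-in for the
    confidence-region test described in the abstract)."""
    n = len(sample)
    lam = np.sqrt(2.0 * np.log(n) / n)          # universal-style threshold (assumption)
    kept = {}
    for j in range(j_max + 1):
        for k in range(2 ** j):
            b = psi(j, k, sample).mean()        # empirical coefficient <f, psi_{j,k}>
            if abs(b) > lam:
                kept[(j, k)] = b
    def f_hat(x):
        out = np.ones_like(x, dtype=float)      # father-wavelet part: Uniform[0, 1)
        for (j, k), b in kept.items():
            out = out + b * psi(j, k, x)
        return out
    return f_hat, kept

# usage: sample from a density supported on [0, 1) and estimate it
sample = rng.beta(2.0, 5.0, size=2000)
f_hat, kept = thresholded_density(sample)
grid = np.linspace(0.0, 1.0, 256, endpoint=False)
est = f_hat(grid)
```

Because every Haar wavelet integrates to zero, the estimate automatically integrates to one whatever subset of features is kept; only the father-wavelet term carries mass, which is one reason wavelet bases are convenient for density estimation.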

DOI : https://doi.org/10.1051/ps:2007050
Classification: 62G07, 62G15, 62G20, 68T05, 68Q32
Keywords: density estimation, support vector machines, kernel algorithms, thresholding methods, wavelets
@article{PS_2008__12__438_0,
     author = {Alquier, Pierre},
     title = {Density estimation with quadratic loss : a confidence intervals method},
     journal = {ESAIM: Probability and Statistics},
     pages = {438--463},
     publisher = {EDP-Sciences},
     volume = {12},
     year = {2008},
     doi = {10.1051/ps:2007050},
     mrnumber = {2437718},
     language = {en},
     url = {http://www.numdam.org/articles/10.1051/ps:2007050/}
}
TY  - JOUR
AU  - Alquier, Pierre
TI  - Density estimation with quadratic loss : a confidence intervals method
JO  - ESAIM: Probability and Statistics
PY  - 2008
DA  - 2008///
SP  - 438
EP  - 463
VL  - 12
PB  - EDP-Sciences
UR  - http://www.numdam.org/articles/10.1051/ps:2007050/
UR  - https://www.ams.org/mathscinet-getitem?mr=2437718
UR  - https://doi.org/10.1051/ps:2007050
DO  - 10.1051/ps:2007050
LA  - en
ID  - PS_2008__12__438_0
ER  - 
Alquier, Pierre. Density estimation with quadratic loss : a confidence intervals method. ESAIM: Probability and Statistics, Tome 12 (2008), pp. 438-463. doi : 10.1051/ps:2007050. http://www.numdam.org/articles/10.1051/ps:2007050/

[1] H. Akaike, A new look at the statistical model identification. IEEE Trans. Autom. Control 19 (1974) 716-723. | MR 423716 | Zbl 0314.62039

[2] P. Alquier, Iterative Feature Selection In Least Square Regression Estimation. Ann. Inst. H. Poincaré B: Probab. Statist. 44 (2008) 47-88. | Numdam | MR 2451571

[3] A. Barron, A. Cohen, W. Dahmen and R. DeVore, Adaptive Approximation and Learning by Greedy Algorithms, preprint (2006). | MR 2387964 | Zbl 1138.62019

[4] G. Blanchard, P. Massart, R. Vert and L. Zwald, Kernel Projection Machine: A New Tool for Pattern Recognition. Proceedings of NIPS (2004).

[5] B.E. Boser, I.M. Guyon and V.N. Vapnik, A training algorithm for optimal margin classifiers, in Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, D. Haussler (ed.), ACM Press (1992) 144-152.

[6] T.T. Cai and L.D. Brown, Wavelet Estimation for Samples with Random Uniform Design. Stat. Probab. Lett. 42 (1999) 313-321. | MR 1688134 | Zbl 0940.62037

[7] O. Catoni, Statistical learning theory and stochastic optimization, Lecture Notes, Saint-Flour Summer School on Probability Theory (2001), Springer. | MR 2163920 | Zbl 1076.93002

[8] O. Catoni, PAC-Bayesian Inductive and Transductive Learning, manuscript (2006).

[9] O. Catoni, A PAC-Bayesian approach to adaptive classification, preprint Laboratoire de Probabilités et Modèles Aléatoires (2003).

[10] A. Cohen, Wavelet methods in numerical analysis, in Handbook of numerical analysis, Vol. VII, North-Holland, Amsterdam (2000) 417-711. | MR 1804747 | Zbl 0976.65124

[11] I. Daubechies, Ten Lectures on Wavelets. SIAM, Philadelphia (1992). | MR 1162107 | Zbl 0776.42018

[12] D.L. Donoho and I.M. Johnstone, Ideal Spatial Adaptation by Wavelets. Biometrika 81 (1994) 425-455. | MR 1311089 | Zbl 0815.62019

[13] D.L. Donoho, I.M. Johnstone, G. Kerkyacharian and D. Picard, Density Estimation by Wavelet Thresholding. Ann. Statist. 24 (1996) 508-539. | MR 1394974 | Zbl 0860.62032

[14] I.J. Good and R.A. Gaskins, Nonparametric roughness penalties for probability densities. Biometrika 58 (1971) 255-277. | MR 319314 | Zbl 0221.62012

[15] W. Härdle, G. Kerkyacharian, D. Picard and A.B. Tsybakov, Wavelets, Approximations and Statistical Applications. Lecture Notes in Statistics, Springer (1998). | MR 1618204 | Zbl 0899.62002

[16] J.S. Marron and M.P. Wand, Exact Mean Integrated Squared Error. Ann. Statist. 20 (1992) 712-736. | MR 1165589 | Zbl 0746.62040

[17] D. Panchenko, Symmetrization Approach to Concentration Inequalities for Empirical Processes. Ann. Probab. 31 (2003) 2068-2081. | MR 2016612 | Zbl 1042.60008

[18] R Development Core Team, R: A Language And Environment For Statistical Computing, R Foundation For Statistical Computing, Vienna, Austria, 2004. URL http://www.R-project.org.

[19] G. Rätsch, C. Schäfer, B. Schölkopf and S. Sonnenburg, Large Scale Multiple Kernel Learning. J. Machine Learning Research 7 (2006) 1531-1565. | MR 2274416

[20] J. Rissanen, Modeling by shortest data description. Automatica 14 (1978) 465-471. | Zbl 0418.93079

[21] M. Seeger, PAC-Bayesian Generalization Error Bounds for Gaussian Process Classification. J. Machine Learning Research 3 (2002) 233-269. | MR 1971338 | Zbl 1088.68745

[22] M. Tipping, The Relevance Vector Machine, in Advances in Neural Information Processing Systems, San Mateo, CA (2000). Morgan Kaufmann.

[23] A.B. Tsybakov, Introduction à l'estimation non-paramétrique. Mathématiques et Applications, Springer (2004). | MR 2013911 | Zbl 1029.62034

[24] V.N. Vapnik, The nature of statistical learning theory. Springer Verlag (1998). | MR 1367965 | Zbl 0833.62008

[25] Zhao Zhang, Su Zhang, Chen-xi Zhang and Ya-zhu Chen, SVM for density estimation and application to medical image segmentation. J. Zhejiang Univ. Sci. B 7 (2006).
