Histogram selection in non gaussian regression
ESAIM: Probability and Statistics, Volume 13 (2009), pp. 70-86.

We deal with the problem of choosing a piecewise constant estimator of a regression function $s$ mapping $𝒳$ into $ℝ$. We consider a non-Gaussian regression framework with deterministic design points, and we adopt the non-asymptotic approach of model selection via penalization developed by Birgé and Massart. Given a collection of partitions of $𝒳$, possibly of exponential complexity, and the corresponding collection of piecewise constant estimators, we propose a penalized least squares criterion that selects a partition whose associated estimator performs approximately as well as the best one, in the sense that its quadratic risk is close to the infimum of the risks. The risk bound we provide is non-asymptotic.
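The selection step described in the abstract can be pictured with the minimal sketch below. It assumes $𝒳 = [0, 1]$, partitions encoded as arrays of bin edges, and a penalty that depends only on the number of cells. The function names `fit_histogram` and `select_partition`, the noise model, and the penalty shape are illustrative assumptions, not the criterion derived in the paper.

```python
import numpy as np

def fit_histogram(x, y, edges):
    """Least squares piecewise constant fit on the partition defined by `edges`.

    Returns the fitted values at the design points and the number of
    non-empty cells (the dimension of the fitted model).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Cell index of each design point; interior edges act as cut points.
    cells = np.searchsorted(edges[1:-1], x, side="right")
    fitted = np.empty_like(y)
    used = np.unique(cells)
    for c in used:
        mask = cells == c
        fitted[mask] = y[mask].mean()  # the cell mean minimizes squared error
    return fitted, len(used)

def select_partition(x, y, partitions, penalty):
    """Return the index of the partition minimizing empirical risk + penalty."""
    n = len(y)
    scores = []
    for edges in partitions:
        fitted, dim = fit_histogram(x, y, edges)
        empirical_risk = np.mean((np.asarray(y, dtype=float) - fitted) ** 2)
        scores.append(empirical_risk + penalty(dim, n))
    return int(np.argmin(scores)), scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    x = (np.arange(n) + 0.5) / n                 # deterministic design points
    s = np.where(x < 0.5, 0.0, 1.0)              # hypothetical step function
    y = s + rng.laplace(scale=0.1, size=n)       # non-Gaussian (Laplace) noise
    partitions = [np.linspace(0.0, 1.0, d + 1) for d in (1, 2, 4, 8, 16)]
    # Assumed penalty: proportional to the number of cells over n.
    best, scores = select_partition(x, y, partitions, lambda d, n: 0.04 * d / n)
    print("selected partition index:", best)
```

The penalty here grows linearly in the number of cells, which is a common choice for small collections of partitions; a collection of exponential complexity would typically call for a heavier penalty, and the constant `0.04` is a placeholder that would have to be calibrated.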

DOI : https://doi.org/10.1051/ps:2008002
Classification: 62G08, 62G05
Keywords: CART, change-point detection, deviation inequalities, model selection, oracle inequalities, regression
@article{PS_2009__13__70_0,
author = {Sauv\'e, Marie},
title = {Histogram selection in non gaussian regression},
journal = {ESAIM: Probability and Statistics},
pages = {70--86},
publisher = {EDP-Sciences},
volume = {13},
year = {2009},
doi = {10.1051/ps:2008002},
mrnumber = {2502024},
language = {en},
url = {http://www.numdam.org/articles/10.1051/ps:2008002/}
}
TY  - JOUR
AU  - Sauvé, Marie
TI  - Histogram selection in non gaussian regression
JO  - ESAIM: Probability and Statistics
PY  - 2009
DA  - 2009///
SP  - 70
EP  - 86
VL  - 13
PB  - EDP-Sciences
UR  - http://www.numdam.org/articles/10.1051/ps:2008002/
UR  - https://www.ams.org/mathscinet-getitem?mr=2502024
UR  - https://doi.org/10.1051/ps:2008002
DO  - 10.1051/ps:2008002
LA  - en
ID  - PS_2009__13__70_0
ER  - 
Sauvé, Marie. Histogram selection in non gaussian regression. ESAIM: Probability and Statistics, Volume 13 (2009), pp. 70-86. doi: 10.1051/ps:2008002. http://www.numdam.org/articles/10.1051/ps:2008002/

[1] Y. Baraud, Model selection for regression on a fixed design. Probab. Theory Related Fields 117 (2000) 467-493. | MR 1777129 | Zbl 0997.62027

[2] Y. Baraud, F. Comte and G. Viennet, Model selection for (auto-)regression with dependent data. ESAIM: PS 5 (2001) 33-49. | EuDML 104278 | Numdam | MR 1845321 | Zbl 0990.62035

[3] L. Birgé and P. Massart, Gaussian model selection. J. Eur. Math. Soc. 3 (2001) 203-268. | EuDML 277724 | MR 1848946 | Zbl 1037.62001

[4] L. Birgé and P. Massart, Minimal penalties for gaussian model selection. To be published in Probab. Theory Related Fields (2005). | MR 2288064 | Zbl 1112.62082

[5] L. Birgé and Y. Rozenholc, How many bins should be put in a regular histogram. ESAIM: PS 10 (2006) 24-45. | EuDML 249764 | Numdam | MR 2197101 | Zbl 1136.62329

[6] O. Bousquet, Concentration Inequalities for Sub-Additive Functions Using the Entropy Method. Stochastic Inequalities and Applications 56 (2003) 213-247. | MR 2073435 | Zbl 1037.60015

[7] L. Breiman, J. Friedman, R. Olshen and C. Stone, Classification And Regression Trees. Chapman & Hall (1984). | MR 726392 | Zbl 0541.62042

[8] G. Castellan, Modified Akaike's criterion for histogram density estimation. C.R. Acad. Sci. Paris Sér. I Math. 330 (2000) 729-732. | MR 1763919 | Zbl 0969.62023

[9] O. Catoni, Universal aggregation rules with sharp oracle inequalities. Ann. Stat. (1999) 1-37.

[10] E. Lebarbier, Quelques approches pour la détection de ruptures à horizon fini. Ph.D. thesis, Université Paris XI Orsay (2002).

[11] G. Lugosi and A. Nobel, Consistency of data-driven histogram methods for density estimation and classification. Ann. Stat. 24 (1996) 687-706. | MR 1394983 | Zbl 0859.62040

[12] C.L. Mallows, Some comments on $C_p$. Technometrics 15 (1973) 661-675. | Zbl 0269.62061

[13] P. Massart, Notes de Saint-Flour. Lecture Notes to be published (2003).

[14] A. Nobel, Histogram regression estimation using data-dependent partitions. Ann. Stat. 24 (1996) 1084-1105. | MR 1401839 | Zbl 0862.62038

[15] M. Sauvé, Sélection de modèles en régression non gaussienne. Applications à la sélection de variables et aux tests de survie accélérés. Ph.D. thesis, Université Paris XI Orsay (2006).

[16] M. Sauvé and C. Tuleau, Variable selection through CART. Research Report 5912, INRIA (2006).
