Recursive bias estimation for multivariate regression smoothers
ESAIM: Probability and Statistics, Volume 18 (2014), pp. 483-502.

This paper presents a simple, practical, fully nonparametric multivariate smoothing procedure that adapts to the underlying smoothness of the true regression function. Our estimator is easily computed by successive application of existing base smoothers, such as thin-plate spline or kernel smoothers, without the need to select an optimal smoothing parameter. The resulting smoother has better out-of-sample predictive capabilities than the underlying base smoother, or than competing structurally constrained models (MARS, GAM), for small dimension (3 ≤ d ≤ 7) and moderate sample size (n ≤ 1000). Moreover, our estimator remains useful when d > 10 and, to our knowledge, no other adaptive fully nonparametric regression estimator is available that does not impose a structural constraint such as additivity. On a real example, the Boston Housing Data, our method reduces the out-of-sample prediction error by 20%. An R package, ibr, available on CRAN, implements the proposed multivariate nonparametric method.
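The idea of "successive application of a base smoother" can be illustrated with a minimal sketch. It is not the ibr package's API and not necessarily the authors' exact estimator: it assumes a Gaussian Nadaraya-Watson kernel smoother as the base smoother, a fixed bandwidth, and a fixed number of passes (the paper's keywords point to data-driven stopping rules, which are not reproduced here). All function names are hypothetical. Each pass smooths the current residuals to estimate the remaining bias and adds that estimate to the fit, so after k passes the fit equals [I - (I - S)^k] y, where S is the smoother matrix.

```python
import math
import random


def gaussian_smoother_matrix(xs, bandwidth):
    """Row-normalised Gaussian kernel (Nadaraya-Watson) smoother matrix S.

    xs is a list of d-dimensional points (tuples); each row of S sums to 1.
    """
    rows = []
    for xi in xs:
        weights = [
            math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(xi, xj)) / bandwidth ** 2)
            for xj in xs
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def apply_smoother(S, v):
    """Matrix-vector product S v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def iterative_bias_correction(S, y, n_iter):
    """Fitted values after n_iter smoothing passes.

    Pass 1 gives the plain base smoother S y. Every further pass smooths the
    current residuals (an estimate of the bias) and adds the result to the fit.
    """
    fit = [0.0] * len(y)
    for _ in range(n_iter):
        residuals = [yi - fi for yi, fi in zip(y, fit)]
        correction = apply_smoother(S, residuals)
        fit = [fi + ci for fi, ci in zip(fit, correction)]
    return fit


# Illustrative synthetic data in dimension d = 2 (an assumption, not from the paper).
random.seed(0)
xs = [(random.random(), random.random()) for _ in range(40)]
y = [math.sin(2 * math.pi * x1) + x2 ** 2 + random.gauss(0.0, 0.1) for x1, x2 in xs]

# A deliberately large bandwidth: the base smoother oversmooths on its own.
S = gaussian_smoother_matrix(xs, bandwidth=0.5)
base_fit = iterative_bias_correction(S, y, n_iter=1)        # plain smoother
corrected_fit = iterative_bias_correction(S, y, n_iter=20)  # 20 passes
```

In this sketch the number of passes plays the role of the smoothing parameter: starting from an oversmoothing base smoother, more passes give a less biased and more variable fit, so in practice the number of passes would be chosen by a criterion such as cross-validation rather than fixed in advance.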

DOI: https://doi.org/10.1051/ps/2013046
Classification: 62G07, 62G20
Keywords: nonparametric regression, smoother, kernel, thin-plate splines, stopping rules
@article{PS_2014__18__483_0,
     author = {Cornillon, Pierre-Andr\'e and Hengartner, N. W. and Matzner-L{\o}ber, E.},
     title = {Recursive bias estimation for multivariate regression smoothers},
     journal = {ESAIM: Probability and Statistics},
     pages = {483--502},
     publisher = {EDP-Sciences},
     volume = {18},
     year = {2014},
     doi = {10.1051/ps/2013046},
     language = {en},
     url = {http://www.numdam.org/articles/10.1051/ps/2013046/}
}
Cornillon, Pierre-André; Hengartner, N. W.; Matzner-Løber, E. Recursive bias estimation for multivariate regression smoothers. ESAIM: Probability and Statistics, Volume 18 (2014), pp. 483-502. doi: 10.1051/ps/2013046. http://www.numdam.org/articles/10.1051/ps/2013046/
