Partition-based conditional density estimation
ESAIM: Probability and Statistics, Volume 17 (2013), pp. 672-697.

We propose a general partition-based strategy to estimate conditional density with candidate densities that are piecewise constant with respect to the covariate. Capitalizing on a general penalized maximum likelihood model selection result, we prove, on two specific examples, that the penalty of each model can be chosen roughly proportional to its dimension. We first study a classical strategy in which the densities are chosen piecewise polynomial with respect to the variable. We then consider Gaussian mixture models with mixing proportions that vary according to the covariate but with common mixture components. This model proves to be interesting for an unsupervised segmentation application that was our original motivation for this work.
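
As an illustration only, the selection principle described in the abstract can be sketched in a few lines of Python. The sketch is not the authors' procedure. It assumes data on the unit square, regular partitions of the covariate axis, histogram densities in the variable (the degree-zero case of piecewise polynomials), and a hypothetical penalty constant `kappa`. The function names `fit_histogram_model` and `select_model` are made up for this example.

```python
import math


def fit_histogram_model(xs, ys, n_cells, n_bins):
    """Maximum likelihood fit of a conditional histogram on [0, 1) x [0, 1).

    Assumption: the covariate axis is split into n_cells equal intervals and,
    within each cell, the density of y is a histogram with n_bins equal bins.
    Returns (log_likelihood, dimension).
    """
    counts = [[0] * n_bins for _ in range(n_cells)]
    for x, y in zip(xs, ys):
        k = min(int(x * n_cells), n_cells - 1)  # covariate cell index
        j = min(int(y * n_bins), n_bins - 1)    # bin index for the variable
        counts[k][j] += 1
    log_lik = 0.0
    for row in counts:
        n_k = sum(row)
        for n_kj in row:
            if n_kj > 0:
                # MLE density on a bin: (n_kj / n_k) / bin_width, bin_width = 1 / n_bins
                log_lik += n_kj * math.log(n_bins * n_kj / n_k)
    # One free probability vector (n_bins - 1 parameters) per covariate cell.
    dimension = n_cells * (n_bins - 1)
    return log_lik, dimension


def select_model(xs, ys, candidates, kappa):
    """Return the (n_cells, n_bins) pair minimizing -log_lik + kappa * dimension.

    kappa is a hypothetical constant: the penalty is simply taken
    proportional to the model dimension.
    """
    best = None
    for n_cells, n_bins in candidates:
        log_lik, dim = fit_histogram_model(xs, ys, n_cells, n_bins)
        criterion = -log_lik + kappa * dim
        if best is None or criterion < best[0]:
            best = (criterion, n_cells, n_bins)
    return best[1], best[2]
```

With a small `kappa` the criterion favors finer partitions, and with a large `kappa` it falls back to the coarsest model. In practice the constant would have to be calibrated, which this sketch does not attempt.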

DOI: https://doi.org/10.1051/ps/2012017
Classification: 62G08
Keywords: conditional density estimation, partition, penalized likelihood, piecewise polynomial density, Gaussian mixture model
@article{PS_2013__17__672_0,
     author = {Cohen, S. X. and Le Pennec, E.},
     title = {Partition-based conditional density estimation},
     journal = {ESAIM: Probability and Statistics},
     pages = {672--697},
     publisher = {EDP-Sciences},
     volume = {17},
     year = {2013},
     doi = {10.1051/ps/2012017},
     zbl = {1284.62250},
     mrnumber = {3126157},
     language = {en},
     url = {http://www.numdam.org/articles/10.1051/ps/2012017/}
}
Cohen, S. X.; Le Pennec, E. Partition-based conditional density estimation. ESAIM: Probability and Statistics, Volume 17 (2013), pp. 672-697. doi: 10.1051/ps/2012017. http://www.numdam.org/articles/10.1051/ps/2012017/
