Adaptive Dantzig density estimation
Annales de l'I.H.P. Probabilités et statistiques, Volume 47 (2011) no. 1, p. 43-74

The aim of this paper is to build an estimate of an unknown density as a linear combination of functions of a dictionary. Inspired by Candès and Tao's approach, we propose a minimization of the 1-norm of the coefficients in the linear combination under an adaptive Dantzig constraint coming from sharp concentration inequalities. This allows us to consider a wide class of dictionaries. Under local or global structure assumptions, oracle inequalities are derived. These theoretical results are transposed to the adaptive Lasso estimate naturally associated with our Dantzig procedure. Then, the issue of calibrating these procedures is studied from both theoretical and practical points of view. Finally, a numerical study shows the significant improvement obtained by our procedures when compared with other classical procedures.
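
The minimization described above can be written out as follows. This is a sketch under assumed notation that the abstract itself does not fix: $X_1,\dots,X_n$ are the observations drawn from the unknown density $f_0$, $(\varphi_m)_{m=1,\dots,M}$ is the dictionary, $f_\lambda=\sum_{m}\lambda_m\varphi_m$ is a candidate estimate, and $\eta_m>0$ stands for the data-driven threshold supplied by the concentration inequalities, whose exact form is given in the paper and is not reproduced here.

\[
\hat\beta_m=\frac{1}{n}\sum_{i=1}^{n}\varphi_m(X_i),
\qquad
G_{m,m'}=\int\varphi_m(x)\,\varphi_{m'}(x)\,dx,
\]
\[
\hat\lambda^{D}\in\arg\min_{\lambda\in\mathbb{R}^{M}}\|\lambda\|_{1}
\quad\text{subject to}\quad
\bigl|(G\lambda)_m-\hat\beta_m\bigr|\le\eta_m,\qquad m=1,\dots,M.
\]

Since $\hat\beta_m$ is an unbiased estimate of $\int\varphi_m f_0$ and $(G\lambda)_m=\int\varphi_m f_\lambda$, the constraint keeps only those $\lambda$ whose inner products with the dictionary are compatible with the data up to the tolerance $\eta_m$. In the same assumed notation, a weighted Lasso criterion with the same thresholds reads

\[
\hat\lambda^{L}\in\arg\min_{\lambda\in\mathbb{R}^{M}}
\Bigl\{\tfrac12\,\lambda^{\top}G\lambda-\lambda^{\top}\hat\beta+\sum_{m=1}^{M}\eta_m|\lambda_m|\Bigr\},
\]

and its first-order optimality conditions give $|(G\hat\lambda^{L})_m-\hat\beta_m|\le\eta_m$ for every $m$, so any minimizer satisfies the Dantzig constraint. This is one way to read the phrase "naturally associated".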

DOI: https://doi.org/10.1214/09-AIHP351
Classification: 62G07, 62G05, 62G20
Keywords: calibration, concentration inequalities, Dantzig estimate, density estimation, dictionary, Lasso estimate, oracle inequalities, sparsity
@article{AIHPB_2011__47_1_43_0,
     author = {Bertin, K. and Le Pennec, E. and Rivoirard, V.},
     title = {Adaptive Dantzig density estimation},
     journal = {Annales de l'I.H.P. Probabilit\'es et statistiques},
     publisher = {Gauthier-Villars},
     volume = {47},
     number = {1},
     year = {2011},
     pages = {43-74},
     doi = {10.1214/09-AIHP351},
     zbl = {1207.62077},
     mrnumber = {2779396},
     language = {en},
     url = {http://www.numdam.org/item/AIHPB_2011__47_1_43_0}
}
Bertin, K.; Le Pennec, E.; Rivoirard, V. Adaptive Dantzig density estimation. Annales de l'I.H.P. Probabilités et statistiques, Volume 47 (2011) no. 1, pp. 43-74. doi: 10.1214/09-AIHP351. http://www.numdam.org/item/AIHPB_2011__47_1_43_0/

[1] S. Arlot and P. Massart. Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res. 10 (2009) 245-279.

[2] M. S. Asif and J. Romberg. Dantzig selector homotopy with dynamic measurements. In Proceedings of SPIE Computational Imaging VII 7246 (2009) 72460E.

[3] P. Bickel, Y. Ritov and A. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 (2009) 1705-1732. | MR 2533469 | Zbl 1173.62022

[4] L. Birgé. Model selection for density estimation with L2-loss, 2008. Available at arXiv 0808.1416.

[5] L. Birgé and P. Massart. Minimal penalties for Gaussian model selection. Probab. Theory Related Fields 138 (2007) 33-73. | MR 2288064 | Zbl 1112.62082

[6] F. Bunea, A. Tsybakov and M. Wegkamp. Aggregation and sparsity via ℓ1 penalized least squares. In Learning Theory 379-391. Lecture Notes in Comput. Sci. 4005. Springer, Berlin, 2006. | MR 2280619 | Zbl 1143.62319

[7] F. Bunea, A. Tsybakov and M. Wegkamp. Sparse density estimation with ℓ1 penalties. In Learning Theory 530-543. Lecture Notes in Comput. Sci. 4539. Springer, Berlin, 2007. | Zbl 1203.62053

[8] F. Bunea, A. Tsybakov and M. Wegkamp. Sparsity oracle inequalities for the LASSO. Electron. J. Statist. 1 (2007) 169-194. | MR 2312149 | Zbl 1146.62028

[9] F. Bunea, A. Tsybakov and M. Wegkamp. Aggregation for Gaussian regression. Ann. Statist. 35 (2007) 1674-1697. | MR 2351101 | Zbl 1209.62065

[10] F. Bunea, A. Tsybakov, M. Wegkamp and A. Barbu. Spades and mixture models. Ann. Statist. (2010). To appear. Available at arXiv 0901.2044. | MR 2676897 | Zbl 1198.62025

[11] F. Bunea. Consistent selection via the Lasso for high dimensional approximating regression models. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of J. K. Ghosh 122-137. Inst. Math. Stat. Collect. 3. IMS, Beachwood, OH, 2008. | MR 2459221

[12] E. Candès and Y. Plan. Near-ideal model selection by ℓ1 minimization. Ann. Statist. 37 (2009) 2145-2177. | MR 2543688 | Zbl 1173.62053

[13] E. Candès and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 (2007) 2313-2351. | MR 2382644 | Zbl 1139.62019

[14] S. Chen, D. Donoho and M. Saunders. Atomic decomposition by basis pursuit. SIAM Rev. 43 (2001) 129-159. | MR 1854649 | Zbl 0979.94010

[15] D. Donoho, M. Elad and V. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 (2006) 6-18. | MR 2237332

[16] D. Donoho and I. Johnstone. Ideal spatial adaptation via wavelet shrinkage. Biometrika 81 (1994) 425-455. | MR 1311089 | Zbl 0815.62019

[17] B. Efron, T. Hastie, I. Johnstone and R. Tibshirani. Least angle regression. Ann. Statist. 32 (2004) 407-499. | MR 2060166 | Zbl 1091.62054

[18] A. Juditsky and S. Lambert-Lacroix. On minimax density estimation on ℝ. Bernoulli 10 (2004) 187-220. | MR 2046772 | Zbl 1076.62037

[19] K. Knight and W. Fu. Asymptotics for Lasso-type estimators. Ann. Statist. 28 (2000) 1356-1378. | MR 1805787 | Zbl 1105.62357

[20] K. Lounici. Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Statist. 2 (2008) 90-102. | MR 2386087 | Zbl pre05274636

[21] P. Massart. Concentration inequalities and model selection. Lecture Notes in Math. 1896. Springer, Berlin, 2007. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003. | MR 2319879 | Zbl 1170.60006

[22] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 (2006) 1436-1462. | MR 2278363 | Zbl 1113.62082

[23] N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 (2009) 246-270. | MR 2488351 | Zbl 1155.62050

[24] M. Osborne, B. Presnell and B. Turlach. On the Lasso and its dual. J. Comput. Graph. Statist. 9 (2000) 319-337. | MR 1822089

[25] M. Osborne, B. Presnell and B. Turlach. A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 (2000) 389-404. | MR 1773265 | Zbl 0962.65036

[26] P. Reynaud-Bouret and V. Rivoirard. Near optimal thresholding estimation of a Poisson intensity on the real line. Electron. J. Statist. 4 (2010) 172-238. | MR 2645482

[27] P. Reynaud-Bouret, V. Rivoirard and C. Tuleau. Adaptive density estimation: A curse of support? 2009. Available at arXiv 0907.1794. | Zbl 1197.62033

[28] R. Tibshirani. Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 (1996) 267-288. | MR 1379242 | Zbl 0850.62538

[29] S. Van De Geer. High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 (2008) 614-645. | MR 2396809 | Zbl 1138.62323

[30] B. Yu and P. Zhao. On model selection consistency of Lasso estimators. J. Mach. Learn. Res. 7 (2006) 2541-2567. | MR 2274449

[31] C. Zhang and J. Huang. The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist. 36 (2008) 1567-1594. | MR 2435448 | Zbl 1142.62044

[32] H. Zou. The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 (2006) 1418-1429. | MR 2279469 | Zbl 1171.62326