How many bins should be put in a regular histogram
ESAIM: Probability and Statistics, Volume 10 (2006), pp. 24-45.

Given an $n$-sample from some unknown density $f$ on $\left[0,1\right]$, it is easy to construct a histogram of the data based on some given partition of $\left[0,1\right]$, but not much is known about an optimal choice of the partition, especially when the data set is not large, even if one restricts attention to partitions into intervals of equal length. Existing methods are either rules of thumb or based on asymptotic considerations, and they often involve some smoothness properties of $f$. Our purpose in this paper is to give an automatic, easy-to-program and efficient method to choose the number of bins of the partition from the data. It is based on bounds on the risk of penalized maximum likelihood estimators due to Castellan and on heavy simulations, which allowed us to optimize the form of the penalty function. These simulations show that the method works quite well for sample sizes as small as 25.
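As an illustration of what such a penalized maximum likelihood choice can look like in practice, here is a minimal Python sketch. It rests on assumptions not stated in the abstract: the penalty $D - 1 + (\log D)^{2.5}$ and the search range $1 \le D \le n/\log n$ are the forms commonly attributed to this paper, and the function names `penalized_log_likelihood` and `choose_num_bins` are hypothetical. The data are assumed to lie in $[0,1]$.

```python
import math
import random


def penalized_log_likelihood(data, num_bins):
    """Log-likelihood of the regular histogram with num_bins bins on [0, 1],
    minus a penalty. The penalty form is an assumption (see lead-in)."""
    n = len(data)
    counts = [0] * num_bins
    for x in data:
        # Clamp the index so that x == 1.0 falls in the last bin.
        j = min(int(x * num_bins), num_bins - 1)
        counts[j] += 1
    # Histogram density on bin j is num_bins * counts[j] / n;
    # empty bins contribute nothing to the log-likelihood.
    log_lik = sum(c * math.log(num_bins * c / n) for c in counts if c > 0)
    # Assumed penalty: D - 1 + (log D)^2.5.
    penalty = num_bins - 1 + math.log(num_bins) ** 2.5
    return log_lik - penalty


def choose_num_bins(data):
    """Return the number of bins maximizing the penalized log-likelihood
    over 1 <= D <= n / log(n) (assumed search range)."""
    n = len(data)
    if n < 2:
        return 1
    max_bins = max(1, int(n / math.log(n)))
    return max(range(1, max_bins + 1),
               key=lambda d: penalized_log_likelihood(data, d))


# Usage: a small sample of size 25 drawn uniformly on [0, 1].
random.seed(0)
sample = [random.random() for _ in range(25)]
print(choose_num_bins(sample))
```

The search is a plain exhaustive scan over the candidate values of $D$, which is cheap because at most $n/\log n$ histograms have to be evaluated.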

DOI: https://doi.org/10.1051/ps:2006001
Classification: 62E25, 62G05
Keywords: regular histograms, density estimation, penalized maximum likelihood, model selection
@article{PS_2006__10__24_0,
author = {Birg{\'e}, Lucien and Rozenholc, Yves},
title = {How many bins should be put in a regular histogram},
journal = {ESAIM: Probability and Statistics},
pages = {24--45},
publisher = {EDP-Sciences},
volume = {10},
year = {2006},
doi = {10.1051/ps:2006001},
zbl = {1136.62329},
mrnumber = {2197101},
language = {en},
url = {http://www.numdam.org/articles/10.1051/ps:2006001/}
}
Birgé, Lucien; Rozenholc, Yves. How many bins should be put in a regular histogram. ESAIM: Probability and Statistics, Volume 10 (2006), pp. 24-45. doi: 10.1051/ps:2006001. http://www.numdam.org/articles/10.1051/ps:2006001/

[1] H. Akaike, A new look at the statistical model identification. IEEE Trans. Automatic Control 19 (1974) 716-723. | Zbl 0314.62039

[2] A.R. Barron, L. Birgé and P. Massart, Risk bounds for model selection via penalization. Probab. Theory Relat. Fields 113 (1999) 301-415. | Zbl 0946.62036

[3] L. Birgé and P. Massart, From model selection to adaptive estimation, in Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, D. Pollard, E. Torgersen and G. Yang, Eds., Springer-Verlag, New York (1997) 55-87. | Zbl 0920.62042

[4] L. Birgé and P. Massart, Gaussian model selection. J. Eur. Math. Soc. 3 (2001) 203-268. | Zbl 1037.62001

[5] G. Castellan, Modified Akaike's criterion for histogram density estimation. Technical Report. Université Paris-Sud, Orsay (1999).

[6] G. Castellan, Sélection d'histogrammes à l'aide d'un critère de type Akaike. CRAS 330 (2000) 729-732. | Zbl 0969.62023

[7] J. Daly, The construction of optimal histograms. Commun. Stat., Theory Methods 17 (1988) 2921-2931. | Zbl 0696.62175

[8] L. Devroye, A Course in Density Estimation. Birkhäuser, Boston (1987). | MR 891874 | Zbl 0617.62043

[9] L. Devroye and L. Györfi, Nonparametric Density Estimation: The ${L}_{1}$ View. John Wiley, New York (1985). | MR 780746 | Zbl 0546.62015

[10] L. Devroye and G. Lugosi, Combinatorial Methods in Density Estimation. Springer-Verlag, New York (2001). | MR 1843146 | Zbl 0964.62025

[11] D. Freedman and P. Diaconis, On the histogram as a density estimator: ${L}_{2}$ theory. Z. Wahrscheinlichkeitstheor. Verw. Geb. 57 (1981) 453-476. | Zbl 0449.62033

[12] P. Hall, Akaike's information criterion and Kullback-Leibler loss for histogram density estimation. Probab. Theory Relat. Fields 85 (1990) 449-467. | Zbl 0675.62027

[13] P. Hall and E.J. Hannan, On stochastic complexity and nonparametric density estimation. Biometrika 75 (1988) 705-714. | Zbl 0661.62025

[14] K. He and G. Meeden, Selecting the number of bins in a histogram: A decision theoretic approach. J. Stat. Plann. Inference 61 (1997) 49-59. | Zbl 0879.62002

[15] D.R.M. Herrick, G.P. Nason and B.W. Silverman, Some new methods for wavelet density estimation. Sankhya, Series A 63 (2001) 394-411.

[16] M.C. Jones, On two recent papers of Y. Kanazawa. Statist. Probab. Lett. 24 (1995) 269-271. | Zbl 0835.62047

[17] Y. Kanazawa, Hellinger distance and Akaike's information criterion for the histogram. Statist. Probab. Lett. 17 (1993) 293-298. | Zbl 0779.62041

[18] L.M. Le Cam, Asymptotic Methods in Statistical Decision Theory. Springer-Verlag, New York (1986). | MR 856411 | Zbl 0605.62002

[19] L.M. Le Cam and G.L. Yang, Asymptotics in Statistics: Some Basic Concepts. Second Edition. Springer-Verlag, New York (2000). | MR 1784901 | Zbl 0952.62002

[20] J. Rissanen, Stochastic complexity and the MDL principle. Econ. Rev. 6 (1987) 85-102. | Zbl 0718.62008

[21] M. Rudemo, Empirical choice of histograms and kernel density estimators. Scand. J. Statist. 9 (1982) 65-78. | Zbl 0501.62028

[22] D.W. Scott, On optimal and data-based histograms. Biometrika 66 (1979) 605-610. | Zbl 0417.62031

[23] H.A. Sturges, The choice of a class interval. J. Am. Stat. Assoc. 21 (1926) 65-66.

[24] C.C. Taylor, Akaike's information criterion and the histogram. Biometrika 74 (1987) 636-639. | Zbl 0628.62032

[25] G.R. Terrell, The maximal smoothing principle in density estimation. J. Am. Stat. Assoc. 85 (1990) 470-477.

[26] M.P. Wand, Data-based choice of histogram bin width. Am. Statistician 51 (1997) 59-64.
