Even for a well-trained statistician the construction of a histogram for a given real-valued data set is a difficult problem. It is even more difficult to construct a fully automatic procedure which specifies the number and widths of the bins in a satisfactory manner for a wide range of data sets. In this paper we compare several histogram construction procedures by means of a simulation study. The study includes plug-in methods, cross-validation, penalized maximum likelihood and the taut string procedure. Their performance on different test beds is measured by their ability to identify the peaks of an underlying density as well as by Hellinger distance.

Keywords: regular histogram, model selection, penalized likelihood, taut string
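
The Hellinger criterion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation code. It assumes a regular (equal-width) histogram on the sample range, a standard normal test density, the convention $H^2(f,g) = \tfrac{1}{2}\int (\sqrt{f}-\sqrt{g})^2$, and a midpoint-rule integral. All function names are hypothetical.

```python
import math
import random

def regular_histogram(data, k):
    """Return (edges, heights) of a k-bin equal-width histogram density on [min, max]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        j = min(int((x - lo) / width), k - 1)  # the sample maximum goes into the last bin
        counts[j] += 1
    n = len(data)
    heights = [c / (n * width) for c in counts]  # normalise so the area is 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, heights

def hist_density(x, edges, heights):
    """Evaluate the histogram density at x (zero outside the sample range)."""
    lo, hi = edges[0], edges[-1]
    if x < lo or x > hi:
        return 0.0
    k = len(heights)
    width = (hi - lo) / k
    return heights[min(int((x - lo) / width), k - 1)]

def normal_pdf(x):
    """Standard normal density, used here as an assumed test bed."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def hellinger(f, g, a, b, m=20000):
    """Midpoint-rule approximation of sqrt(0.5 * integral of (sqrt f - sqrt g)^2) on [a, b]."""
    h = (b - a) / m
    total = 0.0
    for i in range(m):
        x = a + (i + 0.5) * h
        total += (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2
    return math.sqrt(0.5 * total * h)

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]
for k in (2, 10, 100):
    edges, heights = regular_histogram(sample, k)
    d = hellinger(lambda x: hist_density(x, edges, heights), normal_pdf, -8.0, 8.0)
    print(k, round(d, 3))
```

Varying `k` in the loop shows how the number of bins affects the distance to the underlying density, which is the quantity an automatic procedure has to balance against spurious peaks.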

@article{PS_2009__13__181_0,
  author = {Davies, Laurie and Gather, Ursula and Nordman, Dan and Weinert, Henrike},
  title = {A comparison of automatic histogram constructions},
  journal = {ESAIM: Probability and Statistics},
  pages = {181--196},
  publisher = {EDP-Sciences},
  volume = {13},
  year = {2009},
  doi = {10.1051/ps:2008005},
  mrnumber = {2518545},
  language = {en},
  url = {http://www.numdam.org/articles/10.1051/ps:2008005/}
}

Davies, Laurie; Gather, Ursula; Nordman, Dan; Weinert, Henrike. A comparison of automatic histogram constructions. ESAIM: Probability and Statistics, Volume 13 (2009), pp. 181-196. doi : 10.1051/ps:2008005. http://www.numdam.org/articles/10.1051/ps:2008005/
