Assessing the number of clusters of a statistical population is one of the essential issues of unsupervised learning. Given independent observations drawn from an unknown multivariate probability density , we propose a new approach to estimate the number of connected components, or clusters, of the -level set . The basic idea is to form a rough skeleton of the set using any preliminary estimator of , and to count the number of connected components of the resulting graph. Under mild analytic conditions on , and using tools from differential geometry, we establish the consistency of our method.
Keywords: cluster analysis, connected component, level set, graph, tubular neighborhood
@article{PS_2007__11__272_0,
author = {Biau, G\'erard and Cadre, Beno{\^\i}t and Pelletier, Bruno},
title = {A graph-based estimator of the number of clusters},
journal = {ESAIM: Probability and Statistics},
pages = {272--280},
year = {2007},
publisher = {EDP Sciences},
volume = {11},
doi = {10.1051/ps:2007019},
mrnumber = {2320821},
zbl = {1187.62114},
language = {en},
url = {https://www.numdam.org/articles/10.1051/ps:2007019/}
}
TY - JOUR AU - Biau, Gérard AU - Cadre, Benoît AU - Pelletier, Bruno TI - A graph-based estimator of the number of clusters JO - ESAIM: Probability and Statistics PY - 2007 SP - 272 EP - 280 VL - 11 PB - EDP Sciences UR - https://www.numdam.org/articles/10.1051/ps:2007019/ DO - 10.1051/ps:2007019 LA - en ID - PS_2007__11__272_0 ER -
%0 Journal Article %A Biau, Gérard %A Cadre, Benoît %A Pelletier, Bruno %T A graph-based estimator of the number of clusters %J ESAIM: Probability and Statistics %D 2007 %P 272-280 %V 11 %I EDP Sciences %U https://www.numdam.org/articles/10.1051/ps:2007019/ %R 10.1051/ps:2007019 %G en %F PS_2007__11__272_0
Biau, Gérard; Cadre, Benoît; Pelletier, Bruno. A graph-based estimator of the number of clusters. ESAIM: Probability and Statistics, Tome 11 (2007), pp. 272-280. doi: 10.1051/ps:2007019
[1] , Topology and Geometry, Springer-Verlag, New York, Graduate Texts in Mathematics 139 (1993). | Zbl | MR
[2] ,, and, Connectivity of the mutual -nearest neighbor graph in clustering and outlier detection. Statist. Probab. Lett. 35 (1997) 33-42. | Zbl
[3] , Kernel estimation of density level sets. J. Multivariate Anal. 97 (2006) 999-1023. | Zbl
[4] , Riemannian Geometry: A Modern Introduction. Cambridge University Press, Cambridge (1993). | Zbl | MR
[5] , and, Introduction to Algorithms. The MIT Press, Cambridge (1990). | Zbl | MR
[6] , and, Estimating the number of clusters. Canad. J. Statist. 28 (2000) 367-382. | Zbl
[7] , and, Cluster analysis: a further approach based on density estimation. Comput. Statist. Data Anal. 36 (2001) 441-459. | Zbl
[8] and, Detection of abnormal behavior via nonparametric estimation of the support. SIAM J. Appl. Math. 38 (1980) 480-488. | Zbl
[9] , and, Pattern Classification, 2nd edition. Wiley-Interscience, New York (2000). | Zbl | MR
[10] ,, and, A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag, New York (2002). | Zbl | MR
[11] , Clustering Algorithms. John Wiley, New York (1975). | Zbl | MR
[12] , and, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2001). | Zbl | MR
[13] and, Foundations of Differential Geometry, Vol. I & II, 2nd edition. Wiley, New York (1996). | Zbl | MR
[14] and, Towards a statistical theory of clustering. PASCAL Workshop on Statistics and Optimization of Clustering (2005).
[15] , A strong law for the longest edge of the minimal spanning tree. Ann. Probab. 27 (1999) 246-260. | Zbl
[16] , Measuring mass concentrations and estimating density contour clusters-an excess mass approach. Ann. Statist. 23 (1995) 855-881. | Zbl
[17] , Nonparametric Functional Estimation. Academic Press, Orlando (1983). | Zbl | MR
[18] , On nonparametric estimation of density level sets. Ann. Statist. 25 (1997) 948-969. | Zbl
Cité par Sources :






