Let $X$ be a random element in a metric space $(\mathcal{F},d)$, and let $Y$ be a random variable with value $0$ or $1$. $Y$ is called the class, or the label, of $X$. Let ${({X}_{i},{Y}_{i})}_{1\le i\le n}$ be an observed i.i.d. sample having the same law as $(X,Y)$. The problem of classification is to predict the label of a new random element $X$. The $k$-nearest neighbor classifier is the simple following rule: look at the $k$ nearest neighbors of $X$ in the trial sample and choose $0$ or $1$ for its label according to the majority vote. When $(\mathcal{F},d)=({\mathbb{R}}^{d},||.|\left|\right)$, Stone (1977) proved in 1977 the universal consistency of this classifier: its probability of error converges to the Bayes error, whatever the distribution of $(X,Y)$. We show in this paper that this result is no longer valid in general metric spaces. However, if $(\mathcal{F},d)$ is separable and if some regularity condition is assumed, then the $k$-nearest neighbor classifier is weakly consistent.

Keywords: classification, consistency, non parametric statistics

@article{PS_2006__10__340_0, author = {C\'erou, Fr\'ed\'eric and Guyader, Arnaud}, title = {Nearest neighbor classification in infinite dimension}, journal = {ESAIM: Probability and Statistics}, pages = {340--355}, publisher = {EDP-Sciences}, volume = {10}, year = {2006}, doi = {10.1051/ps:2006014}, mrnumber = {2247925}, language = {en}, url = {http://www.numdam.org/articles/10.1051/ps:2006014/} }

TY - JOUR AU - Cérou, Frédéric AU - Guyader, Arnaud TI - Nearest neighbor classification in infinite dimension JO - ESAIM: Probability and Statistics PY - 2006 SP - 340 EP - 355 VL - 10 PB - EDP-Sciences UR - http://www.numdam.org/articles/10.1051/ps:2006014/ DO - 10.1051/ps:2006014 LA - en ID - PS_2006__10__340_0 ER -

Cérou, Frédéric; Guyader, Arnaud. Nearest neighbor classification in infinite dimension. ESAIM: Probability and Statistics, Volume 10 (2006), pp. 340-355. doi : 10.1051/ps:2006014. http://www.numdam.org/articles/10.1051/ps:2006014/

[1] On the kernel rule for function classification. submitted (2003). | Zbl

, and ,[2] On the kernel rule for function classification. IEEE Trans. Inform. Theory, to appear (2005). | MR

, and ,[3] Nearest neighbor pattern classification. IEEE Trans. Inform. Theory IT-13 (1967) 21-27. | Zbl

and ,[4] Nonparametric regression estimation when the regressor takes its values in a metric space, submitted (2001). | Zbl

and ,[5] On the almost everywhere convergence of nonparametric regression function estimates. Ann. Statist. 9 (1981) 1310-1319. | Zbl

,[6] On the strong universal consistency of nearest neighbor regression function estimates. Ann. Statist. 22 (1994) 1371-1385. | Zbl

, , and ,[7] A probabilistic theory of pattern recognition 31, Applications of Mathematics (New York). Springer-Verlag, New York (1996). | MR | Zbl

, and ,[8] Measure theory and fine properties of functions. Studies in Advanced Mathematics. CRC Press, Boca Raton, FL (1992). | MR | Zbl

and ,[9] Geometric measure theory. Die Grundlehren der mathematischen Wissenschaften, Band 153. Springer-Verlag New York Inc., New York (1969). | MR | Zbl

,[10] Gaussian measures and the density theorem. Comment. Math. Univ. Carolin. 22 (1981) 181-193. | Zbl

,[11] Dimension of metrics and differentiation of measures, in General topology and its relations to modern analysis and algebra, V (Prague, 1981), Sigma Ser. Pure Math., Heldermann, Berlin 3 (1983) 565-568. | Zbl

,[12] Differentiation of measures on Hilbert spaces, in Measure theory, Oberwolfach 1981 (Oberwolfach, 1981), Springer, Berlin. Lect. Notes Math. 945 (1982) 194-207. | Zbl

and ,[13] Consistent nonparametric regression. Ann. Statist. 5 (1977) 595-645. With discussion and a reply by the author. | Zbl

,*Cited by Sources: *