We consider a problem of multiclass classification, where the training sample $$ is generated from the model ℙ(Y = m|X = x) = η$$(x), 1 ≤ m ≤ M, and η1(x), …, η$$(x) are unknown α-Holder continuous functions. Given a test point X, our goal is to predict its label. A widely used k-nearest-neighbors classifier constructs estimates of η1(X), …, η$$(X) and uses a plug-in rule for the prediction. However, it requires a proper choice of the smoothing parameter k, which may become tricky in some situations. We fix several integers n1, …, n$$, compute corresponding n$$-nearest-neighbor estimates for each m and each n$$ and apply an aggregation procedure. We study an algorithm, which constructs a convex combination of these estimates such that the aggregated estimate behaves approximately as well as an oracle choice. We also provide a non-asymptotic analysis of the procedure, prove its adaptation to the unknown smoothness parameter α and to the margin and establish rates of convergence under mild assumptions.
Accepté le :
Première publication :
Publié le :
DOI : 10.1051/ps/2019021
Keywords: Multiclass classification, k nearest neighbors, adaptive procedures
@article{PS_2020__24_1_69_0,
author = {Puchkin, Nikita and Spokoiny, Vladimir},
title = {An adaptive multiclass nearest neighbor classifier},
journal = {ESAIM: Probability and Statistics},
pages = {69--99},
year = {2020},
publisher = {EDP Sciences},
volume = {24},
doi = {10.1051/ps/2019021},
mrnumber = {4069295},
zbl = {1440.62250},
language = {en},
url = {https://www.numdam.org/articles/10.1051/ps/2019021/}
}
TY - JOUR AU - Puchkin, Nikita AU - Spokoiny, Vladimir TI - An adaptive multiclass nearest neighbor classifier JO - ESAIM: Probability and Statistics PY - 2020 SP - 69 EP - 99 VL - 24 PB - EDP Sciences UR - https://www.numdam.org/articles/10.1051/ps/2019021/ DO - 10.1051/ps/2019021 LA - en ID - PS_2020__24_1_69_0 ER -
Puchkin, Nikita; Spokoiny, Vladimir. An adaptive multiclass nearest neighbor classifier. ESAIM: Probability and Statistics, Tome 24 (2020), pp. 69-99. doi: 10.1051/ps/2019021
[1] , Selective sampling algorithms for cost-sensitive multiclass prediction, in Proceedings of the 30th International Conference on Machine Learning, edited by and , Vol. 28 of Proceedings of Machine Learning Research. Atlanta, Georgia, USA, (2013) 1220–1228.
[2] and , Corporate credit rating using multiclass classification models with order information. World Acad. Sci. Eng. Technol. 60 (2011) 95–100.
[3] , and , Reducing multiclass to binary: a unifying approach for margin classifiers. J. Mach. Learn. Res. 1 (2001) 113–141. | MR | Zbl
[4] and , Fast learning rates for plug-in classifiers. Ann. Stat. 35 (2007) 608–633. | MR | Zbl
[5] and , Spatial aggregation of local likelihood estimates with applications to classification. Ann. Stat. 35 (2007) 2287–2311. | MR | Zbl | DOI
[6] , , and , Optimal exponential bounds for aggregation of estimators for the Kullback-Leibler loss. Electron. J. Stat. 11 (2017) 2258–2294. | MR | Zbl | DOI
[7] , and , Local nearest neighbour classification with applications to semi-supervisedlearning (2017), . | arXiv
[8] and , Rates of convergence for nearest neighbor classification, in Vol. 2 of Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS’14, Cambridge, MA, USA (2014) 3437–3445.
[9] and , On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2 (2002) 265–292. | Zbl
[10] , and , Deviation optimal learning using greedy Q-aggregation. Ann. Stat. 40 (2012) 1878–1905. | MR | Zbl
[11] , and . , Multiclass learning approaches: A theoretical comparison with implications, in Advancesin Neural Information Processing Systems 25, edited by , , , and , Curran Associates Inc. (2012), 485–493.
[12] and Taniskidou UCI machine learning repository, 2017.
[13] and , Solving multiclass learning problems via error-correcting output codes. J. Artif. Int. Res. 2 (1995) 263–286. | Zbl
[14] and , Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17 (2001) 349–358.
[15] , , , and , in Theory and Applications of Models of Computation, Vol. 9076 of Lecture Notes in Computer Sciences. Springer, Cham (2015) 375–387. | MR
[16] , and , Rate of convergence of k-nearest-neighbor classification rule. J. Mach. Learn. Res., 18 (2017) 16. | MR | Zbl
[17] , and , Classification in general finite dimensional spaces with the k-nearest neighbor rule. Ann. Stat. 44 (2016) 982–1009. | MR | Zbl
[18] , and , Application of support vector machines to speech recognition. IEEE Trans. Signal Process. 52 (2004) 2348–2355.
[19] , and , Learning by mirror averaging. Ann. Stat. 36 (2008) 2183–2206. | MR | Zbl
[20] , , and . Face verification via error correcting output codes. Image Vis. Comput. 21 (2003) 1163–1169. | DOI
[21] , Optimal rates of aggregation in classification under low noise assumption. Bernoulli 13 (2007) 1000–1022. | MR | Zbl | DOI
[22] , Empirical risk minimization is optimal for the convex aggregation problem. Bernoulli 19 (2013) 2153–2166. | MR | Zbl | DOI
[23] and , Optimal learning with Q-aggregation. Ann. Stat. 42 (2014) 211–224. | MR | Zbl | DOI
[24] and , Smooth discrimination analysis. Ann. Stat. 27 (1999) 1808–1829. | MR | Zbl | DOI
[25] and . The multi-armed bandit problem with covariates. Ann. Stat. 41 (2013) 693–721. | MR | Zbl | DOI
[26] and , In defense of one-vs-all classification. J. Mach. Learn. Res. 5 (2003/04) 101–141. | MR | Zbl
[27] , Kullback-Leibler aggregation and misspecified generalized linear models. Ann. Stat. 40 (2012) 639–665. | MR | Zbl | DOI
[28] and , Sparse estimation by exponential weighting. Stat. Sci. 27 (2012) 558–575. | MR | Zbl | DOI
[29] , and , Shifting, one-inclusion mistake bounds and tight multiclass expected risk bounds, in Advances in Neural Information Processing Systems 19, edited by , , and , MIT Press, Cambridge (2007) 1193–1200.
[30] , Optimal weighted nearest neighbour classifiers. Ann. Stat. 40 (2012) 2733–2763. | MR | Zbl | DOI
[31] and , Parameter tuning in pointwise adaptation using a propagation approach. Ann. Stat. 37 (2009) 2783–2807. | MR | Zbl | DOI
[32] , , and , Class prediction by nearest shrunken centroids, with applications to dna microarrays. Stat. Sci. 18 (2003) 02. | MR | Zbl | DOI
[33] , Optimal Rates of Aggregation, Springer, Berlin (2003) 303–313. | Zbl
[34] , , and , Recursive aggregation of estimators by the mirror descent method with averaging. Problemy Peredachi Informatsii 41 (2005) 78–96. | Zbl
Cité par Sources :
Financial support by the Russian Academic Excellence Project 5-100 and by the German Research Foundation (DFG) through the Collaborative Research Center 1294 is gratefully acknowledged.





