A method to enrich experimental datasets by means of numerical simulations in view of classification tasks
ESAIM: Mathematical Modelling and Numerical Analysis, Volume 55 (2021) no. 5, pp. 2259-2291

Classification tasks are frequent in many applications in science and engineering, and a wide variety of statistical learning methods exist to deal with them. However, in many industrial applications, few samples are available to train and construct a classifier, which degrades classification performance. In this work, we consider the case in which some a priori information on the system is available in the form of a mathematical model. In particular, a set of numerical simulations of the system can be integrated into the experimental dataset. The main question we address is how to integrate them systematically in order to improve classification performance. The proposed method is based on Nearest Neighbours and on the notion of Hausdorff distance between sets. Some theoretical results and several numerical studies are presented.
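The two ingredients named in the abstract, the Hausdorff distance between sets and a nearest-neighbour rule, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the Euclidean metric, and the toy point sets are all assumptions made for illustration only.

```python
import numpy as np

def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).max()

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def nearest_neighbour_label(x, points, labels):
    """1-nearest-neighbour rule: label of the closest stored point."""
    d = np.linalg.norm(points - x, axis=1)
    return labels[int(np.argmin(d))]

# Hypothetical toy data: two experimental samples and three simulated ones.
experimental = np.array([[0.0, 0.0], [1.0, 0.0]])
experimental_labels = np.array([0, 1])
simulated = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])

# The simulated point (4, 0) lies 3 units from the closest experimental
# point, so the two sets are at Hausdorff distance 3.
print(hausdorff(experimental, simulated))
print(nearest_neighbour_label(np.array([0.9, 0.1]), experimental, experimental_labels))
```

In this toy setting, a large Hausdorff distance signals that some simulated points lie far from every experimental sample. How such a quantity is actually used to select simulations is specified in the article itself, not in this sketch.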

DOI: 10.1051/m2an/2021060
Classification: 60B10, 68T05
Keywords: Classification, Hausdorff distance, nearest neighbors
@article{M2AN_2021__55_5_2259_0,
     author = {Lombardi, Damiano and Raphel, Fabien},
     title = {A method to enrich experimental datasets by means of numerical simulations in view of classification tasks},
     journal = {ESAIM: Mathematical Modelling and Numerical Analysis},
     pages = {2259--2291},
     year = {2021},
     publisher = {EDP-Sciences},
     volume = {55},
     number = {5},
     doi = {10.1051/m2an/2021060},
     mrnumber = {4323407},
     zbl = {07477245},
     language = {en},
     url = {https://www.numdam.org/articles/10.1051/m2an/2021060/}
}
Lombardi, Damiano; Raphel, Fabien. A method to enrich experimental datasets by means of numerical simulations in view of classification tasks. ESAIM: Mathematical Modelling and Numerical Analysis, Volume 55 (2021) no. 5, pp. 2259-2291. doi: 10.1051/m2an/2021060
