We improve recently introduced consensus-based optimization method, proposed in [R. Pinnau, C. Totzeck, O. Tse, S. Martin, Math. Models Methods Appl. Sci. 27 (2017) 183–204], which is a gradient-free optimization method for general non-convex functions. We first replace the isotropic geometric Brownian motion by the component-wise one, thus removing the dimensionality dependence of the drift rate, making the method more competitive for high dimensional optimization problems. Secondly, we utilize the random mini-batch ideas to reduce the computational cost of calculating the weighted average which the individual particles tend to relax toward. For its mean-field limit – a nonlinear Fokker-Planck equation – we prove, in both time continuous and semi-discrete settings, that the convergence of the method, which is exponential in time, is guaranteed with parameter constraints independent of the dimensionality. We also conduct numerical tests to high dimensional problems to check the success rate of the method.
Keywords: Global optimization, high dimensional optimization, consensus-based optimization, random batch method
@article{COCV_2021__27_S1_A6_0,
author = {Carrillo, Jos\'e A. and Jin, Shi and Li, Lei and Zhu, Yuhua},
title = {A consensus-based global optimization method for high dimensional machine learning problems},
journal = {ESAIM: Control, Optimisation and Calculus of Variations},
year = {2021},
publisher = {EDP-Sciences},
volume = {27},
doi = {10.1051/cocv/2020046},
mrnumber = {4222159},
language = {en},
url = {https://www.numdam.org/articles/10.1051/cocv/2020046/}
}
TY - JOUR AU - Carrillo, José A. AU - Jin, Shi AU - Li, Lei AU - Zhu, Yuhua TI - A consensus-based global optimization method for high dimensional machine learning problems JO - ESAIM: Control, Optimisation and Calculus of Variations PY - 2021 VL - 27 PB - EDP-Sciences UR - https://www.numdam.org/articles/10.1051/cocv/2020046/ DO - 10.1051/cocv/2020046 LA - en ID - COCV_2021__27_S1_A6_0 ER -
%0 Journal Article %A Carrillo, José A. %A Jin, Shi %A Li, Lei %A Zhu, Yuhua %T A consensus-based global optimization method for high dimensional machine learning problems %J ESAIM: Control, Optimisation and Calculus of Variations %D 2021 %V 27 %I EDP-Sciences %U https://www.numdam.org/articles/10.1051/cocv/2020046/ %R 10.1051/cocv/2020046 %G en %F COCV_2021__27_S1_A6_0
Carrillo, José A.; Jin, Shi; Li, Lei; Zhu, Yuhua. A consensus-based global optimization method for high dimensional machine learning problems. ESAIM: Control, Optimisation and Calculus of Variations, Tome 27 (2021), article no. S5. doi: 10.1051/cocv/2020046
[1] and , Binary interaction algorithms for the simulation of flocking and swarming dynamics. Multisc. Model. Simul. 11 (2013) 1–29. | MR | Zbl | DOI
[2] and , Introduction to multidisciplinary design optimization. In Vol. 222 of Aeronautics & Astronautics. Standford University (2012).
[3] , and , From the microscale to collective crowd dynamics. Multis. Model. Simul. 11 (2013) 943–963. | MR | Zbl | DOI
[4] and , Advanced Mathematical Methods for Scientists and Engineers. International Series in Pure and Applied Mathematics. McGraw-Hill (1978). | MR | Zbl
[5] , and , Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5 (1994) 157–166. | DOI
[6] , , and , Contagion shocks in one dimension. J. Stat. Phys. 158 (2015) 647–664. | MR | DOI
[7] , and , Stochastic mean-field limit: non-Lipschitz forces and swarming. Math. Models Methods Appl. Sci. 21 (2011) 2179–2210. | MR | Zbl | DOI
[8] , Online learning and stochastic approximations. On-line Learn. Neural Netw. 17 (1998) 142. | Zbl
[9] , Convex optimization: algorithms and complexity. Found. Trends® Mach. Learn. 8 (2015) 231–357. | DOI
[10] , and , The derivation of swarming models: mean-field limit and Wasserstein distances. Collective dynamics from bacteria to crowds, volume 553 of CISM Courses and Lectures. Springer, Vienna (2014) 1–46. | MR
[11] , , and , An analytical framework for consensus-based global optimization method. Math. Models Methods Appl. Sci. 28 (2018) 1037–1066. | MR | DOI
[12] , , and , Asymptotic flocking dynamics for the kinetic Cucker-Smale model. SIAM J. Math. Anal. 42 (2010) 218–236. | MR | Zbl | DOI
[13] , , and , Particle, kinetic, and hydrodynamic models of swarming. In Mathematical modeling of collective behavior in socio-economic and life sciences, Modelling and Simulation in Materials Science and Engineering. Birkhäuser Boston, Inc., Boston, MA (2010) 297–336. | MR | Zbl
[14] , and , Particle based gPC methods for mean-field models of swarming with uncertainty. Commun. Comput. Phys. 25 (2018) 508–531. | MR
[15] and , On the mathematics of emergence. Jpn. J. Math. 2 (2007) 197–227. | MR | Zbl | DOI
[16] and , Towards theoretical understanding of large batch training in stochastic gradient descent. Preprint (2018). | arXiv
[17] and , Vol. 38 of Large deviations techniques and applications. Springer Science & Business Media (2009). | MR | Zbl
[18] and , Particle swarm optimization. In Proc. of IEEE International Conference on Neural Networks 4 (1995) 1942–1948.
[19] , and , Convergence of a first-order consensus-based global optimization algorithm. Preprint (2019). | arXiv | MR
[20] and , From particle to kinetic and hydrodynamic descriptions of flocking. Kinetic Related Models 1 (2008) 415–435. | MR | Zbl | DOI
[21] , Which neural net architectures give rise to exploding and vanishing gradients? In Adv. Neural Inf. Process. Syst. (2018) 582–591.
[22] and , -particles approximation of the Vlasov equations with singular potential. Arch. Ratl. Mech. Anal. 183 (2007) 489–524. | MR | Zbl | DOI
[23] , Genetic algorithms. Sci. Am. 267 (1992) 66–73. | DOI
[24] , and , Asymptotics of the spectral gap with applications to the theory of simulated annealing. J. Funct. Anal. 83 (1989) 333–347. | MR | Zbl | DOI
[25] , A theorem on the asymptotic behavior of a multiple integral. Duke Math. J. 15 (1948) 623–632. | MR | Zbl
[26] and , Large-time behavior of perturbed diffusion Markov processes with applications to the second eigenvalue problem for Fokker-Planck operators and simulated annealing. Acta Appl. Math. 19 (1990) 253–295. | MR | Zbl | DOI
[27] and , Simple upper and lower bounds for the multivariate Laplace approximation. J. Approx. Theory 186 (2014) 1–11. | MR | Zbl | DOI
[28] , A review of the mean field limits for Vlasov equations. Kinetic Related Models 7 (2014) 661–711. | MR | DOI
[29] and , Quantitative estimates of propagation of chaos for stochastic systems with kernels. Invent. Math. 214 (2018) 523–591. | MR | DOI
[30] , , , , , and , Three factors influencing minima in SGD. Preprint (2017). | arXiv
[31] , and , Random batch methods (RBM) for interacting particle systems. J. Comput. Phys. 400 (2020) 108877. | MR | DOI
[32] , Swarm intelligence, handbook of nature-inspired and innovative computing. Springer (2006) 187–219. | DOI
[33] , , , and , On large-batch training for deep learning: generalization gap and sharp minima. In International Conference on Learning Representations (2017).
[34] , and , Optimization by simulated annealing. Science 220 (1983) 671–680. | MR | Zbl | DOI
[35] , , , and , Emergent behaviour in multi-particle systems with non-local interactions [Editorial]. J. Phys. D 260 (2013) 1–4. | MR | DOI
[36] and , Error bounds for multidimensional Laplace approximation. J. Approx. Theory 37 (1983) 372–390. | MR | Zbl | DOI
[37] and , Simulated annealing: theory and applications. D. Reidel Publishing Co., Dordrecht (1987) 37. | MR | Zbl
[38] , and , Bad global minima exist and SGD can reach them. Preprint (2019). | arXiv
[39] , Applied asymptotic analysis. American Mathematical Society (2006). | MR | Zbl
[40] and , Heterophilious dynamics enhances consensus. SIAM Rev. 56 (2014) 577–621. | MR | Zbl | DOI
[41] and , A simplex method for function minimization. Comput. J. 7 (1965) 308–313. | MR | Zbl | DOI
[42] , , and , A consensus-based model for global optimization and its mean-field limit. Math. Models Methods Appl. Sci. 27 (2017) 183–204. | MR | DOI
[43] and , A stochastic approximation method. Ann. Math. Stat. (1951) 400–407. | MR | Zbl | DOI
[44] , Kinetic models of opinion formation. Commun. Math. Sci. 4 (2006) 481–496. | MR | Zbl | DOI
[45] , , and , A numerical comparison of consensus-based global optimization to other particle-based global optimization schemes. Proc. Appl. Math. Mech. 18 (2018) e201800291. | DOI
Cité par Sources :





