A consensus-based global optimization method for high dimensional machine learning problems
ESAIM: Control, Optimisation and Calculus of Variations, Tome 27 (2021), article no. S5

We improve recently introduced consensus-based optimization method, proposed in [R. Pinnau, C. Totzeck, O. Tse, S. Martin, Math. Models Methods Appl. Sci. 27 (2017) 183–204], which is a gradient-free optimization method for general non-convex functions. We first replace the isotropic geometric Brownian motion by the component-wise one, thus removing the dimensionality dependence of the drift rate, making the method more competitive for high dimensional optimization problems. Secondly, we utilize the random mini-batch ideas to reduce the computational cost of calculating the weighted average which the individual particles tend to relax toward. For its mean-field limit – a nonlinear Fokker-Planck equation – we prove, in both time continuous and semi-discrete settings, that the convergence of the method, which is exponential in time, is guaranteed with parameter constraints independent of the dimensionality. We also conduct numerical tests to high dimensional problems to check the success rate of the method.

DOI : 10.1051/cocv/2020046
Classification : 60H35, 70F10
Keywords: Global optimization, high dimensional optimization, consensus-based optimization, random batch method
@article{COCV_2021__27_S1_A6_0,
     author = {Carrillo, Jos\'e A. and Jin, Shi and Li, Lei and Zhu, Yuhua},
     title = {A consensus-based global optimization method for high dimensional machine learning problems},
     journal = {ESAIM: Control, Optimisation and Calculus of Variations},
     year = {2021},
     publisher = {EDP-Sciences},
     volume = {27},
     doi = {10.1051/cocv/2020046},
     mrnumber = {4222159},
     language = {en},
     url = {https://www.numdam.org/articles/10.1051/cocv/2020046/}
}
TY  - JOUR
AU  - Carrillo, José A.
AU  - Jin, Shi
AU  - Li, Lei
AU  - Zhu, Yuhua
TI  - A consensus-based global optimization method for high dimensional machine learning problems
JO  - ESAIM: Control, Optimisation and Calculus of Variations
PY  - 2021
VL  - 27
PB  - EDP-Sciences
UR  - https://www.numdam.org/articles/10.1051/cocv/2020046/
DO  - 10.1051/cocv/2020046
LA  - en
ID  - COCV_2021__27_S1_A6_0
ER  - 
%0 Journal Article
%A Carrillo, José A.
%A Jin, Shi
%A Li, Lei
%A Zhu, Yuhua
%T A consensus-based global optimization method for high dimensional machine learning problems
%J ESAIM: Control, Optimisation and Calculus of Variations
%D 2021
%V 27
%I EDP-Sciences
%U https://www.numdam.org/articles/10.1051/cocv/2020046/
%R 10.1051/cocv/2020046
%G en
%F COCV_2021__27_S1_A6_0
Carrillo, José A.; Jin, Shi; Li, Lei; Zhu, Yuhua. A consensus-based global optimization method for high dimensional machine learning problems. ESAIM: Control, Optimisation and Calculus of Variations, Tome 27 (2021), article no. S5. doi: 10.1051/cocv/2020046

[1] G. Albi and L. Pareschi, Binary interaction algorithms for the simulation of flocking and swarming dynamics. Multisc. Model. Simul. 11 (2013) 1–29. | MR | Zbl | DOI

[2] J. J. Alonso and J. Hicken, Introduction to multidisciplinary design optimization. In Vol. 222 of Aeronautics & Astronautics. Standford University (2012).

[3] N. Bellomo, A. Bellouquid and D. Knopoff, From the microscale to collective crowd dynamics. Multis. Model. Simul. 11 (2013) 943–963. | MR | Zbl | DOI

[4] C. M. Bender and S. A. Orszag, Advanced Mathematical Methods for Scientists and Engineers. International Series in Pure and Applied Mathematics. McGraw-Hill (1978). | MR | Zbl

[5] Y. Bengio, P. Simard and P. Frasconi, Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5 (1994) 157–166. | DOI

[6] A. L. Bertozzi, J. Rosado, M. B. Short and L. Wang, Contagion shocks in one dimension. J. Stat. Phys. 158 (2015) 647–664. | MR | DOI

[7] F. Bolley, J. A. Cañizo and J. A. Carrillo, Stochastic mean-field limit: non-Lipschitz forces and swarming. Math. Models Methods Appl. Sci. 21 (2011) 2179–2210. | MR | Zbl | DOI

[8] L. Bottou, Online learning and stochastic approximations. On-line Learn. Neural Netw. 17 (1998) 142. | Zbl

[9] S. Bubeck, Convex optimization: algorithms and complexity. Found. Trends® Mach. Learn. 8 (2015) 231–357. | DOI

[10] J. A. Carrillo, Y.-P. Choi and M. Hauray, The derivation of swarming models: mean-field limit and Wasserstein distances. Collective dynamics from bacteria to crowds, volume 553 of CISM Courses and Lectures. Springer, Vienna (2014) 1–46. | MR

[11] J. A. Carrillo, Y.-P. Choi, C. Totzeck and O. Tse, An analytical framework for consensus-based global optimization method. Math. Models Methods Appl. Sci. 28 (2018) 1037–1066. | MR | DOI

[12] J. A. Carrillo, M. Fornasier, J. Rosado and G. Toscani, Asymptotic flocking dynamics for the kinetic Cucker-Smale model. SIAM J. Math. Anal. 42 (2010) 218–236. | MR | Zbl | DOI

[13] J. A. Carrillo, M. Fornasier, G. Toscani and F. Vecil, Particle, kinetic, and hydrodynamic models of swarming. In Mathematical modeling of collective behavior in socio-economic and life sciences, Modelling and Simulation in Materials Science and Engineering. Birkhäuser Boston, Inc., Boston, MA (2010) 297–336. | MR | Zbl

[14] J. A. Carrillo, L. Pareschi and M. Zanella, Particle based gPC methods for mean-field models of swarming with uncertainty. Commun. Comput. Phys. 25 (2018) 508–531. | MR

[15] F. Cucker and S. Smale, On the mathematics of emergence. Jpn. J. Math. 2 (2007) 197–227. | MR | Zbl | DOI

[16] X. Dai and Y. Zhu, Towards theoretical understanding of large batch training in stochastic gradient descent. Preprint (2018). | arXiv

[17] A. Dembo and O. Zeitouni, Vol. 38 of Large deviations techniques and applications. Springer Science & Business Media (2009). | MR | Zbl

[18] R. Eberhart and J. Kennedy, Particle swarm optimization. In Proc. of IEEE International Conference on Neural Networks 4 (1995) 1942–1948.

[19] S.-Y. Ha, S. Jin and D. Kim, Convergence of a first-order consensus-based global optimization algorithm. Preprint (2019). | arXiv | MR

[20] S.-Y. Ha and E. Tadmor, From particle to kinetic and hydrodynamic descriptions of flocking. Kinetic Related Models 1 (2008) 415–435. | MR | Zbl | DOI

[21] B. Hanin, Which neural net architectures give rise to exploding and vanishing gradients? In Adv. Neural Inf. Process. Syst. (2018) 582–591.

[22] M. Hauray and P.-E. Jabin, N -particles approximation of the Vlasov equations with singular potential. Arch. Ratl. Mech. Anal. 183 (2007) 489–524. | MR | Zbl | DOI

[23] J. H. Holland, Genetic algorithms. Sci. Am. 267 (1992) 66–73. | DOI

[24] R. A. Holley, S. Kusuoka and D. W. Stroock, Asymptotics of the spectral gap with applications to the theory of simulated annealing. J. Funct. Anal. 83 (1989) 333–347. | MR | Zbl | DOI

[25] L. C. Hsu, A theorem on the asymptotic behavior of a multiple integral. Duke Math. J. 15 (1948) 623–632. | MR | Zbl

[26] C.-R. Hwang and S.-J. Sheu, Large-time behavior of perturbed diffusion Markov processes with applications to the second eigenvalue problem for Fokker-Planck operators and simulated annealing. Acta Appl. Math. 19 (1990) 253–295. | MR | Zbl | DOI

[27] T. Inglot and P. Majerski, Simple upper and lower bounds for the multivariate Laplace approximation. J. Approx. Theory 186 (2014) 1–11. | MR | Zbl | DOI

[28] P.-E. Jabin, A review of the mean field limits for Vlasov equations. Kinetic Related Models 7 (2014) 661–711. | MR | DOI

[29] P.-E. Jabin and Z. Wang, Quantitative estimates of propagation of chaos for stochastic systems with W - 1 , kernels. Invent. Math. 214 (2018) 523–591. | MR | DOI

[30] S. Jastrzebski, Z. Kenton, D. Arpit, N. Ballas, A. Fischer, Y. Bengio and A. Storkey, Three factors influencing minima in SGD. Preprint (2017). | arXiv

[31] S. Jin, L. Li and J.-G. Liu, Random batch methods (RBM) for interacting particle systems. J. Comput. Phys. 400 (2020) 108877. | MR | DOI

[32] J. Kennedy, Swarm intelligence, handbook of nature-inspired and innovative computing. Springer (2006) 187–219. | DOI

[33] N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy and P. T. P. Tang, On large-batch training for deep learning: generalization gap and sharp minima. In International Conference on Learning Representations (2017).

[34] S. Kirkpatrick, C. Daniel Gelatt and M. P. Vecchi, Optimization by simulated annealing. Science 220 (1983) 671–680. | MR | Zbl | DOI

[35] T. Kolokolnikov, J. A. Carrillo, A. Bertozzi, R. Fetecau and M. Lewis, Emergent behaviour in multi-particle systems with non-local interactions [Editorial]. J. Phys. D 260 (2013) 1–4. | MR | DOI

[36] J. P. Mcclure and R. Wong, Error bounds for multidimensional Laplace approximation. J. Approx. Theory 37 (1983) 372–390. | MR | Zbl | DOI

[37] P. J. M. Van Laarhoven and E. H. L. Aarts, Simulated annealing: theory and applications. D. Reidel Publishing Co., Dordrecht (1987) 37. | MR | Zbl

[38] S. Liu, D. Papailiopoulos and D. Achlioptas, Bad global minima exist and SGD can reach them. Preprint (2019). | arXiv

[39] P. D. Miller, Applied asymptotic analysis. American Mathematical Society (2006). | MR | Zbl

[40] S. Motsch and E. Tadmor, Heterophilious dynamics enhances consensus. SIAM Rev. 56 (2014) 577–621. | MR | Zbl | DOI

[41] J. A. Nelder and R. Mead, A simplex method for function minimization. Comput. J. 7 (1965) 308–313. | MR | Zbl | DOI

[42] R. Pinnau, C. Totzeck, O. Tse and S. Martin, A consensus-based model for global optimization and its mean-field limit. Math. Models Methods Appl. Sci. 27 (2017) 183–204. | MR | DOI

[43] H. Robbins and S. Monro, A stochastic approximation method. Ann. Math. Stat. (1951) 400–407. | MR | Zbl | DOI

[44] G. Toscani, Kinetic models of opinion formation. Commun. Math. Sci. 4 (2006) 481–496. | MR | Zbl | DOI

[45] C. Totzeck, R. Pinnau, S. Blauth and S. Schotthófer, A numerical comparison of consensus-based global optimization to other particle-based global optimization schemes. Proc. Appl. Math. Mech. 18 (2018) e201800291. | DOI

Cité par Sources :