The backtrack Hölder gradient method with application to min-max and min-min problems
Open Journal of Mathematical Optimization, Volume 4 (2023), article no. 8, 17 p.

We present a new algorithm to solve min-max or min-min problems outside the convex world. We use rigidity assumptions, ubiquitous in learning, which make our method, the backtrack Hölder algorithm, applicable to many optimization problems. Our approach exploits hidden regularity properties and allows us, in particular, to devise a simple algorithm of ridge type. An original feature of our method is its automatic step size adaptation, which departs from the usual overly cautious backtracking methods. In a general framework, we provide theoretical convergence guarantees and rates. We apply our findings to simple Generative Adversarial Network (GAN) problems, obtaining promising numerical results. It is worth mentioning that a byproduct of our approach is a simple recipe for general Hölderian backtracking optimization.
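To illustrate the kind of step size adaptation the abstract alludes to, below is a minimal Python sketch, not the authors' backtrack Hölder algorithm: an Armijo-type backtracking gradient method that re-inflates its trial step at every iteration, so steps can grow as well as shrink, and that needs no knowledge of a Lipschitz constant, which is what makes plain backtracking viable when the gradient is only Hölder continuous. The function backtrack_gradient and all parameter names are hypothetical.

import numpy as np

def backtrack_gradient(f, grad, x0, t0=1.0, growth=2.0, shrink=0.5,
                       armijo=1e-4, max_iter=500, tol=1e-8):
    """Gradient descent with adaptive Armijo backtracking (illustrative sketch).

    f, grad : objective and its gradient (gradient only assumed Hölder continuous)
    t0      : initial trial step size
    growth  : factor re-inflating the trial step at each iteration
    shrink  : factor reducing the step while the sufficient-decrease test fails
    """
    x, t = np.asarray(x0, dtype=float), t0
    for _ in range(max_iter):
        g = grad(x)
        gnorm2 = g @ g
        if gnorm2 <= tol ** 2:
            break
        t *= growth  # optimistic restart: first try a larger step than last time
        # Shrink until the Armijo sufficient-decrease condition holds; no
        # Lipschitz or Hölder constant is required.
        while f(x - t * g) > f(x) - armijo * t * gnorm2:
            t *= shrink
        x = x - t * g
    return x

# Toy usage: f(x) = ||x||^(3/2) has a 1/2-Hölder (non-Lipschitz) gradient
# around the origin.
f = lambda x: np.linalg.norm(x) ** 1.5
grad = lambda x: 1.5 * np.linalg.norm(x) ** (-0.5) * x if np.linalg.norm(x) > 0 else 0.0 * x
print(backtrack_gradient(f, grad, np.array([2.0, -1.0])))

Re-inflating the step at each iteration is what keeps the scheme from being overly cautious: a single small accepted step early on does not freeze the step size for the rest of the run.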

DOI: 10.5802/ojmo.24
Keywords: Hölder gradient, backtracking line search, min-max optimization, ridge method, semi-algebraic optimization

Bolte, Jérôme 1; Glaudin, Lilian 2; Pauwels, Edouard 3; Serrurier, Mathieu 4

1 Toulouse School of Economics, Université Toulouse Capitole, Toulouse, France
2 ANITI, University of Toulouse, Toulouse, France
3 Toulouse School of Economics, Institut Universitaire de France, Toulouse, France
4 Université Paul-Sabatier, IRIT, Toulouse, France
License: CC-BY 4.0
Copyright: The authors retain their rights
@article{OJMO_2023__4__A8_0,
     author = {Bolte, J\'er\^ome and Glaudin, Lilian and Pauwels, Edouard and Serrurier, Mathieu},
     title = {The backtrack {H\"older} gradient method with application to min-max and min-min problems},
     journal = {Open Journal of Mathematical Optimization},
     eid = {8},
     pages = {1--17},
     year = {2023},
     publisher = {Universit\'e de Montpellier},
     volume = {4},
     doi = {10.5802/ojmo.24},
     language = {en},
     url = {https://www.numdam.org/articles/10.5802/ojmo.24/}
}
Bolte, Jérôme; Glaudin, Lilian; Pauwels, Edouard; Serrurier, Mathieu. The backtrack Hölder gradient method with application to min-max and min-min problems. Open Journal of Mathematical Optimization, Volume 4 (2023), article no. 8, 17 p. doi: 10.5802/ojmo.24
