Tight analyses for subgradient descent I: Lower bounds
Open Journal of Mathematical Optimization, Volume 5 (2024), article no. 4, 17 p.

Consider the problem of minimizing functions that are Lipschitz and convex, but not necessarily differentiable. We construct a function from this class for which the T-th iterate of subgradient descent has error Ω(log(T)/√T). This matches a known upper bound of O(log(T)/√T). We prove analogous results for functions that are additionally strongly convex: there exists such a function for which the T-th iterate of subgradient descent has error Ω(log(T)/T), matching a known upper bound of O(log(T)/T). These results resolve a question posed by Shamir (2012).
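
For orientation, the sketch below runs plain subgradient descent with the standard step size η_t = 1/√t on the simple 1-Lipschitz convex function f(x) = |x|. The function, starting point, and step-size constant are illustrative choices only; this is not the lower-bound construction from the paper, on which the T-th iterate is Ω(log(T)/√T) suboptimal, but it shows the algorithm whose final-iterate error the paper analyzes.

import math

def subgradient_descent(T, x0=1.5):
    """Run T steps of subgradient descent on f(x) = |x| and return the final (T-th) iterate."""
    x = x0
    for t in range(1, T + 1):
        g = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)  # a subgradient of |x| at x
        eta = 1.0 / math.sqrt(t)                         # standard step size for Lipschitz convex objectives
        x = x - eta * g
    return x

if __name__ == "__main__":
    # The minimizer is x* = 0 with f(x*) = 0, so the final-iterate error is simply |x_T|.
    for T in (10, 100, 1000, 10000):
        print(f"T = {T:6d}   f(x_T) - f(x*) = {abs(subgradient_descent(T)):.4f}")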

DOI: 10.5802/ojmo.31
Keywords: Gradient descent, First-order methods, Lower bounds

Harvey, Nicholas J. A. ¹; Liaw, Chris ²; Randhawa, Sikander ¹

1 University of British Columbia
2 Google Research
License: CC-BY 4.0
Copyright: The authors retain their rights
@article{OJMO_2024__5__A4_0,
     author = {Harvey, Nicholas J. A. and Liaw, Chris and Randhawa, Sikander},
     title = {Tight analyses for subgradient descent {I:} {Lower} bounds},
     journal = {Open Journal of Mathematical Optimization},
     eid = {4},
     pages = {1--17},
     year = {2024},
     publisher = {Universit\'e de Montpellier},
     volume = {5},
     doi = {10.5802/ojmo.31},
     mrnumber = {4779171},
     zbl = {07939164},
     language = {en},
     url = {https://www.numdam.org/articles/10.5802/ojmo.31/}
}
Harvey, Nicholas J. A.; Liaw, Chris; Randhawa, Sikander. Tight analyses for subgradient descent I: Lower bounds. Open Journal of Mathematical Optimization, Volume 5 (2024), article no. 4, 17 p. doi: 10.5802/ojmo.31

[1] Barbu, Viorel; Precupanu, Teodor Convexity and optimization in Banach spaces, Springer, 2012

[2] Bauschke, Heinz H. Projection Algorithms and Monotone Operators, Ph. D. Thesis, Simon Fraser University (1996)

[3] Bauschke, Heinz H.; Combettes, Patrick L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2017

[4] Hazan, Elad Introduction to Online Convex Optimization, Found. Trends Optim., Volume 2 (2015) no. 3-4

[5] Hazan, Elad; Agarwal, Amit; Kale, Satyen Logarithmic regret algorithms for online convex optimization, Mach. Learn., Volume 69 (2007) no. 2-3, pp. 169-192

[6] Hazan, Elad; Kale, Satyen Beyond the regret minimization barrier: optimal algorithms for stochastic strongly-convex optimization, J. Mach. Learn. Res., Volume 15 (2014) no. 1, pp. 2489-2512

[7] Hiriart-Urruty, Jean-Baptiste; Lemaréchal, Claude Convex Analysis and Minimization Algorithms I, Springer, 1996

[8] Jain, Prateek; Nagaraj, Dheeraj; Netrapalli, Praneeth Making the Last Iterate of SGD Information Theoretically Optimal, Conference on Learning Theory (2019), pp. 1752-1755

[9] Lacoste-Julien, Simon; Schmidt, Mark W.; Bach, Francis R. A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method (2012) (arXiv preprint)

[10] Nemirovsky, Arkadi S.; Yudin, David B. Problem complexity and method efficiency in optimization, John Wiley & Sons, 1983

[11] Nesterov, Yurii; Shikhman, Vladimir Quasi-monotone Subgradient Methods for Nonsmooth Convex Minimization, J. Optim. Theory Appl., Volume 165 (2015) no. 3, pp. 917-940

[12] Polyak, Boris T.; Juditsky, Anatoli B. Acceleration of stochastic approximation by averaging, SIAM J. Control Optim., Volume 30 (1992) no. 4, pp. 838-855

[13] Rakhlin, Alexander; Shamir, Ohad; Sridharan, Karthik Making gradient descent optimal for strongly convex stochastic optimization, Proceedings of the 29th International Conference on Machine Learning (2012)

[14] Rockafellar, R. Tyrrell Convex Analysis, Princeton University Press, 1970

[15] Ruppert, David Efficient estimations from a slowly convergent Robbins-Monro process (1988) (Technical report)

[16] Shamir, Ohad Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?, Proceedings of the 25th Annual Conference on Learning Theory (Proceedings of Machine Learning Research), Volume 23, 2012, pp. 47.1-47.3

[17] Shamir, Ohad; Zhang, Tong Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes, Proceedings of the 30th International Conference on Machine Learning (Proceedings of Machine Learning Research), Volume 28, 2013, pp. 71-79
