Screening for a Reweighted Penalized Conditional Gradient Method
Open Journal of Mathematical Optimization, Volume 3 (2022), article no. 3, 35 p.

The conditional gradient method (CGM) is widely used in large-scale sparse convex optimization, offering a low per-iteration computational cost for structured sparse regularizers and a greedy approach to collecting nonzeros. We explore the sparsity-acquiring properties of a general penalized CGM (P-CGM) for convex regularizers and a reweighted penalized CGM (RP-CGM) for nonconvex regularizers, replacing the usual convex constraints with gauge-inspired penalties. This generalization does not increase the per-iteration complexity noticeably. Without assuming bounded iterates or using line search, we show O(1/t) convergence of the gap of each subproblem, which measures distance to a stationary point. We couple this with a screening rule that is safe in the convex case, converging to the true support at a rate O(1/δ₀²), where δ₀ measures how close the problem is to degeneracy. In the nonconvex case, the screening rule converges to the true support in a finite number of iterations but is not necessarily safe in the intermediate iterates. In our experiments, we verify the consistency of the method and adjust the aggressiveness of the screening rule by tuning the concavity of the regularizer.
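As a concrete illustration of the two ingredients described above, the following is a minimal Python sketch, not the paper's P-CGM itself: a generalized conditional gradient step for the lasso objective 0.5*||A x - b||^2 + lam*||x||_1 over an explicit ball ||x||_1 <= rho (in the spirit of the generalized CGM of [10, 36]), combined with a gap-based safe screening test modeled on the GAP safe rules of [28, 51]. The radius rho, the open-loop step size 2/(t+2), and the stopping tolerance are illustrative assumptions; the paper itself replaces the explicit constraint with a gauge-inspired penalty.

import numpy as np

def pcgm_lasso(A, b, lam, rho, iters=1000, tol=1e-8):
    """Sketch: generalized conditional gradient for
    0.5*||A x - b||^2 + lam*||x||_1 over the ball ||x||_1 <= rho,
    with a GAP-safe sphere test. Assumes rho is large enough that
    the ball contains a lasso solution (so the screening stays safe)."""
    d = A.shape[1]
    x = np.zeros(d)
    alive = np.ones(d, dtype=bool)        # coordinates not yet screened out
    col_norms = np.linalg.norm(A, axis=0)
    for t in range(iters):
        r = A @ x - b                     # residual
        g = A.T @ r                       # gradient of the smooth term
        # GAP-safe screening: rescale the residual into a dual-feasible
        # point, then eliminate any coordinate whose dual constraint is
        # inactive on the whole safe sphere of radius sqrt(2*gap)/lam.
        theta = -r / max(lam, np.abs(g).max())
        primal = 0.5 * (r @ r) + lam * np.abs(x).sum()
        dual = 0.5 * (b @ b) - 0.5 * np.sum((lam * theta - b) ** 2)
        sphere = np.sqrt(max(2.0 * (primal - dual), 0.0)) / lam
        alive &= np.abs(A.T @ theta) + sphere * col_norms >= 1.0
        x[~alive] = 0.0                   # screened coordinates are zero
        # Penalized linear minimization oracle: the minimizer of
        # <g, s> + lam*||s||_1 over ||s||_1 <= rho is either the zero
        # atom (the penalty dominates) or +/- rho on one coordinate.
        scores = np.where(alive, np.abs(g), 0.0)
        j = int(np.argmax(scores))
        s = np.zeros(d)
        if scores[j] > lam:
            s[j] = -rho * np.sign(g[j])
        # Subproblem gap: nonnegative, zero exactly at a solution.
        gap = g @ (x - s) + lam * (np.abs(x).sum() - np.abs(s).sum())
        if gap <= tol:
            break
        x = x + (2.0 / (t + 2.0)) * (s - x)   # open-loop step, no line search
    return x, alive

Each iteration costs two matrix-vector products plus an argmax over the surviving coordinates, consistent with the abstract's claim that the penalized variant adds little per-iteration overhead; the subproblem gap computed above is the nonnegative certificate whose O(1/t) decay the abstract describes.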

DOI : 10.5802/ojmo.14
Keywords: Dual screening, conditional gradient method, atomic sparsity, reweighted optimization
Sun, Yifan 1 ; Bach, Francis 2

1 Stony Brook University - Department of Computer Science, Stony Brook, New York, USA
2 INRIA - Département d’Informatique de l’École Normale Supérieure, PSL Research University, Paris, France
@article{OJMO_2022__3__A3_0,
     author = {Sun, Yifan and Bach, Francis},
     title = {Screening for a {Reweighted} {Penalized} {Conditional} {Gradient} {Method}},
     journal = {Open Journal of Mathematical Optimization},
     eid = {3},
     pages = {1--35},
     publisher = {Universit\'e de Montpellier},
     volume = {3},
     year = {2022},
     doi = {10.5802/ojmo.14},
     language = {en},
     url = {http://www.numdam.org/articles/10.5802/ojmo.14/}
}
Sun, Yifan; Bach, Francis. Screening for a Reweighted Penalized Conditional Gradient Method. Open Journal of Mathematical Optimization, Volume 3 (2022), article no. 3, 35 p. doi: 10.5802/ojmo.14. http://www.numdam.org/articles/10.5802/ojmo.14/

[1] Bach, Francis Structured sparsity-inducing norms through submodular functions, Advances in Neural Information Processing Systems (2010), pp. 118-126

[2] Bach, Francis Duality between subgradient and conditional gradient methods, SIAM J. Optim., Volume 25 (2015) no. 1, pp. 115-129 | DOI | MR | Zbl

[3] Bauschke, Heinz H.; Bolte, Jérôme; Teboulle, Marc A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications, Math. Oper. Res., Volume 42 (2017) no. 2, pp. 330-348 | DOI | MR | Zbl

[4] Bauschke, Heinz H.; Combettes, Patrick L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 408, Springer, 2011 | DOI

[5] Berrada, Leonard; Zisserman, Andrew; Kumar, M. Pawan Deep Frank-Wolfe for neural network optimization (2018) (https://arxiv.org/abs/1811.07591)

[6] Bondell, Howard D.; Reich, Brian J. Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR, Biometrics, Volume 64 (2008) no. 1, pp. 115-123 | DOI | MR | Zbl

[7] Bonnefoy, Antoine; Emiya, Valentin; Ralaivola, Liva; Gribonval, Rémi Dynamic screening: Accelerating first-order algorithms for the LASSO and group-LASSO, IEEE Trans. Signal Process., Volume 63 (2015) no. 19, pp. 5121-5132 | DOI | MR | Zbl

[8] Borwein, Jonathan; Lewis, Adrian S. Convex Analysis and Nonlinear Optimization: Theory and Examples, Springer, 2010

[9] Bredies, Kristian; Lorenz, Dirk A. Iterated hard shrinkage for minimization problems with sparsity constraints, SIAM J. Sci. Comput., Volume 30 (2008) no. 2, pp. 657-683 | DOI | MR | Zbl

[10] Bredies, Kristian; Lorenz, Dirk A.; Maass, Peter A generalized conditional gradient method and its connection to an iterative shrinkage method, Comput. Optim. Appl., Volume 42 (2009) no. 2, pp. 173-193 | DOI | MR | Zbl

[11] Burke, James V.; Moré, Jorge J. On the identification of active constraints, SIAM J. Numer. Anal., Volume 25 (1988) no. 5, pp. 1197-1211 | DOI | MR | Zbl

[12] Burke, James V. On the identification of active constraints II: The nonconvex case, SIAM J. Numer. Anal., Volume 27 (1990) no. 4, pp. 1081-1102 | DOI | MR | Zbl

[13] Candès, Emmanuel; Romberg, Justin Robust signal recovery from incomplete observations, 2006 International Conference on Image Processing, IEEE (2006), pp. 1281-1284 | DOI

[14] Candès, Emmanuel; Tao, Terence Decoding by linear programming, IEEE Trans. Inf. Theory, Volume 51 (2005) no. 12, pp. 4203-4215 | DOI | MR | Zbl

[15] Chandrasekaran, Venkat; Recht, Benjamin; Parrilo, Pablo A.; Willsky, Alan S. The convex geometry of linear inverse problems, Found. Comput. Math., Volume 12 (2012) no. 6, pp. 805-849 | DOI | MR | Zbl

[16] Chari, Visesh; Lacoste-Julien, Simon; Laptev, Ivan; Sivic, Josef On pairwise costs for network flow multi-object tracking, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), pp. 5537-5545

[17] Chen, Xiaojun; Zhou, Weijun Convergence of reweighted ℓ1 minimization algorithms and unique solution of truncated ℓp minimization, Department of Applied Mathematics, The Hong Kong Polytechnic University, 2010

[18] Clarke, Frank H. Generalized gradients and applications, Trans. Am. Math. Soc., Volume 205 (1975), pp. 247-262 | DOI | MR | Zbl

[19] Clarke, Frank H. Nonsmooth analysis and optimization, Proceedings of the International Congress of Mathematicians, Volume 5 (1983), pp. 847-853

[20] Clarkson, Kenneth L. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm, ACM Trans. Algorithms, Volume 6 (2010) no. 4, p. 63 | MR | Zbl

[21] Daubechies, Ingrid; DeVore, Ronald; Fornasier, Massimo; Güntürk, C. Sinan Iteratively reweighted least squares minimization for sparse recovery, Commun. Pure Appl. Math., Volume 63 (2010) no. 1, pp. 1-38 | DOI | MR | Zbl

[22] Donoho, David L. Compressed sensing, IEEE Trans. Inf. Theory, Volume 52 (2006) no. 4, pp. 1289-1306 | DOI | MR | Zbl

[23] Dudik, Miroslav; Harchaoui, Zaid; Malick, Jérôme Lifted coordinate descent for learning with trace-norm regularization, Artificial Intelligence and Statistics (2012), pp. 327-336

[24] Dunn, Joseph C.; Harshbarger, S. Conditional gradient algorithms with open loop step size rules, J. Math. Anal. Appl., Volume 62 (1978) no. 2, pp. 432-444 | DOI | MR | Zbl

[25] El Ghaoui, Laurent; Viallon, Vivian; Rabbani, Tarek Safe feature elimination for the Lasso and sparse supervised learning problems, Pac. J. Optim., Volume 8 (2012) no. 4, pp. 667-698 | Zbl

[26] Ene, Alina; Vladu, Adrian Improved Convergence for ℓ1 and ℓ∞ Regression via Iteratively Reweighted Least Squares, International Conference on Machine Learning (2019), pp. 1794-1801

[27] Fan, Zhenan; Jeong, Halyun; Sun, Yifan; Friedlander, Michael Atomic Decomposition via Polar Alignment: The Geometry of Structured Optimization, Found. Trends Optim., Volume 3 (2020) no. 4, pp. 280-366

[28] Fercoq, Olivier; Gramfort, Alexandre; Salmon, Joseph Mind the duality gap: safer rules for the lasso, International Conference on Machine Learning, PMLR (2015), pp. 333-342

[29] Frank, Marguerite; Wolfe, Philip An algorithm for quadratic programming, Nav. Res. Logist. Q., Volume 3 (1956) no. 1-2, pp. 95-110 | DOI | MR

[30] Freund, Robert M. Dual gauge programs, with applications to quadratic programming and the minimum-norm problem, Math. Program., Volume 38 (1987) no. 1, pp. 47-67 | DOI | MR | Zbl

[31] Freund, Robert M.; Grigas, Paul; Mazumder, Rahul An Extended Frank–Wolfe Method with “In-Face” Directions, and Its Application to Low-Rank Matrix Completion, SIAM J. Optim., Volume 27 (2017) no. 1, pp. 319-346 | DOI | MR | Zbl

[32] Friedlander, Michael; Macedo, Ives; Pong, Ting Kei Gauge optimization and duality, SIAM J. Optim., Volume 24 (2014) no. 4, pp. 1999-2022 | DOI | MR | Zbl

[33] Gong, Pinghua; Zhang, Changshui; Lu, Zhaosong; Huang, Jianhua; Ye, Jieping A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems, International Conference on Machine Learning (2013), pp. 37-45

[34] Guélat, Jacques; Marcotte, Patrice Some comments on Wolfe’s ‘away step’, Math. Program., Volume 35 (1986) no. 1, pp. 110-119 | DOI | MR | Zbl

[35] Guyon, Isabelle; Gunn, Steve; Ben-Hur, Asa; Dror, Gideon Result analysis of the NIPS 2003 feature selection challenge, Adv. Neural Inf. Process. Syst., Volume 17 (2004)

[36] Harchaoui, Zaid; Juditsky, Anatoli; Nemirovski, Arkadi Conditional gradient algorithms for norm-regularized smooth convex optimization, Math. Program., Volume 152 (2015) no. 1-2, pp. 75-112 | DOI | MR | Zbl

[37] Hare, Warren Identifying active manifolds in regularization problems, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer, 2011, pp. 261-271 | DOI | Zbl

[38] Hazan, Elad Sparse approximate solutions to semidefinite programs, Latin American Symposium on Theoretical Informatics, Springer (2008), pp. 306-316 | Zbl

[39] Jaggi, Martin Revisiting Frank-Wolfe: Projection-free sparse convex optimization, ICML (2013), pp. 427-435

[40] Johnson, Tyler B.; Guestrin, Carlos Stingy CD: safely avoiding wasteful updates in coordinate descent, Proceedings of the 34th International Conference on Machine Learning-Volume 70 (2017), pp. 1752-1760

[41] Krishnan, Rahul G.; Lacoste-Julien, Simon; Sontag, David Barrier Frank-Wolfe for marginal inference, Advances in Neural Information Processing Systems (2015), pp. 532-540

[42] Lacoste-Julien, Simon; Jaggi, Martin On the global linear convergence of Frank-Wolfe optimization variants, Advances in Neural Information Processing Systems, 2015, pp. 496-504

[43] Lacoste-Julien, Simon; Jaggi, Martin; Schmidt, Mark; Pletscher, Patrick Block-coordinate Frank-Wolfe optimization for structural SVMs, International Conference on Machine Learning, PMLR (2013), pp. 53-61

[44] Lacoste-Julien, Simon; Lindsten, Fredrik; Bach, Francis Sequential kernel herding: Frank-Wolfe optimization for particle filtering, Artificial Intelligence and Statistics, PMLR (2015), pp. 544-552

[45] Lewis, Adrian S.; Wright, Stephen J. Identifying activity, SIAM J. Optim., Volume 21 (2011) no. 2, pp. 597-614 | DOI | MR | Zbl

[46] Liu, Jun; Zhao, Zheng; Wang, Jie; Ye, Jieping Safe screening with variational inequalities and its application to lasso, International Conference on Machine Learning, PMLR (2014), pp. 289-297

[47] Lu, Haihao; Freund, Robert M.; Nesterov, Yurii Relatively smooth convex optimization by first-order methods, and applications, SIAM J. Optim., Volume 28 (2018) no. 1, pp. 333-354 | MR | Zbl

[48] Mairal, Julien; Bach, Francis; Ponce, Jean Sparse modeling for image and vision processing (2014) (https://arxiv.org/abs/1411.3230)

[49] Malti, Abed; Herzet, Cédric Safe screening tests for Lasso based on firmly non-expansiveness, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016), pp. 4732-4736 | DOI

[50] Mu, Cun; Zhang, Yuqian; Wright, John; Goldfarb, Donald Scalable robust matrix recovery: Frank–Wolfe meets proximal methods, SIAM J. Sci. Comput., Volume 38 (2016) no. 5, p. A3291-A3317 | MR | Zbl

[51] Ndiaye, Eugene; Fercoq, Olivier; Gramfort, Alexandre; Salmon, Joseph GAP safe screening rules for sparse multi-task and multi-class models, Advances in Neural Information Processing Systems (2015), pp. 811-819

[52] Nesterov, Yurii Introductory Lectures on Convex Optimization: A Basic Course, Springer, 2013

[53] Nutini, Julie; Schmidt, Mark; Hare, Warren “Active-set complexity” of proximal gradient: How long does it take to find the sparsity pattern?, Optim. Lett., Volume 13 (2019) no. 4, pp. 645-655 | DOI | MR | Zbl

[54] Nutini, Julie; Schmidt, Mark; Laradji, Issam; Friedlander, Michael; Koepke, Hoyt Coordinate descent converges faster with the Gauss-Southwell rule than random selection, International Conference on Machine Learning (2015), pp. 1632-1641

[55] Obozinski, Guillaume; Jacob, Laurent; Vert, Jean-Philippe Group lasso with overlaps: the latent group lasso approach (2011) (https://arxiv.org/abs/1110.0413)

[56] Ochs, Peter; Dosovitskiy, Alexey; Brox, Thomas; Pock, Thomas On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision, SIAM J. Imaging Sci., Volume 8 (2015) no. 1, pp. 331-372 | DOI | MR | Zbl

[57] Ping, Wei; Liu, Qiang; Ihler, Alexander T. Learning infinite RBMs with Frank-Wolfe, Advances in Neural Information Processing Systems (2016), pp. 3063-3071

[58] Rakotomamonjy, Alain; Gasso, Gilles; Salmon, Joseph Screening rules for lasso with non-convex sparse regularizers, International Conference on Machine Learning, PMLR (2019), pp. 5341-5350

[59] Rao, Nikhil; Shah, Parikshit; Wright, Stephen J. Forward-backward greedy algorithms for atomic norm regularization, IEEE Trans. Signal Process., Volume 63 (2015) no. 21, pp. 5798-5811 | MR | Zbl

[60] Rockafellar, R. Tyrrell Convex Analysis, 28, Princeton University Press, 1970 | DOI

[61] Sener, Ozan; Koltun, Vladlen Multi-task learning as multi-objective optimization, Advances in Neural Information Processing Systems (2018), pp. 527-538

[62] Tewari, Ambuj; Ravikumar, Pradeep K.; Dhillon, Inderjit S. Greedy algorithms for structurally constrained high dimensional problems, Advances in Neural Information Processing Systems (2011), pp. 882-890

[63] Tibshirani, Robert Regression shrinkage and selection via the Lasso, J. R. Stat. Soc., Ser. B, Stat. Methodol., Volume 58 (1996) no. 1, pp. 267-288 | MR | Zbl

[64] Vinyes, Marina; Obozinski, Guillaume Fast column generation for atomic norm regularization, Proceedings of International Conference on Artificial Intelligence and Statistics (2017)

[65] Von Hohenbalken, Balder Simplicial decomposition in nonlinear programming algorithms, Math. Program., Volume 13 (1977) no. 1, pp. 49-68 | DOI | MR | Zbl

[66] Wang, Jie; Zhou, Jiayu; Wonka, Peter; Ye, Jieping LASSO screening rules via dual polytope projection, Adv. Neural Inf. Process. Syst. (2013), pp. 1070-1078

[67] Wolke, R.; Schwetlick, H. Iteratively reweighted least squares: algorithms, convergence analysis, and numerical comparisons, SIAM J. Sci. Stat. Comput., Volume 9 (1988) no. 5, pp. 907-921 | DOI | MR | Zbl

[68] Wright, Stephen J.; Nowak, Robert D.; Figueiredo, Mário A. T. Sparse reconstruction by separable approximation, IEEE Trans. Signal Process., Volume 57 (2009) no. 7, pp. 2479-2493 | DOI | MR

[69] Yu, Yaoliang; Zhang, Xinhua; Schuurmans, Dale Generalized conditional gradient for sparse estimation, J. Mach. Learn. Res., Volume 18 (2017) no. 1, pp. 5279-5324

[70] Yurtsever, Alp; Udell, Madeleine; Tropp, Joel; Cevher, Volkan Sketchy decisions: Convex low-rank matrix optimization with optimal storage, Artificial intelligence and statistics, PMLR (2017), pp. 1188-1196

[71] Zeng, Xiangrong; Figueiredo, Mário A. T. The Ordered Weighted ℓ1 Norm: Atomic Formulation, Projections, and Algorithms (2014) (https://arxiv.org/abs/1409.4271)

[72] Zhou, Song; Gupta, Swati; Udell, Madeleine Limited memory Kelley’s method converges for composite convex and submodular objectives, Advances in Neural Information Processing Systems, 2018, pp. 4414-4424
