Semiglobal optimal feedback stabilization of autonomous systems via deep neural network approximation
ESAIM: Control, Optimisation and Calculus of Variations, Volume 27 (2021), article no. 16

A learning approach to optimal feedback gains for nonlinear continuous-time control systems is proposed and analysed. The goal is to establish a rigorous framework for computing approximations to optimal feedback gains by means of neural networks. The approach rests on two main ingredients: first, an optimal control formulation over an ensemble of trajectories, with the ‘control’ variables given by the feedback gain functions; second, the approximation of these feedback functions by realizations of neural networks. Based on universal approximation properties, we prove the existence and convergence of optimal stabilizing neural network feedback controllers.
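The following is a minimal, purely illustrative sketch of this idea, not the algorithm analysed in the paper: a small network F_theta parametrizes the feedback law, and its weights are trained by minimizing a finite-horizon quadratic cost averaged over an ensemble of randomly sampled initial states, with the closed-loop dynamics discretized by explicit Euler. The dynamics f, the network architecture, and all numerical parameters below are assumptions chosen only for illustration.

```python
# Illustrative sketch (assumptions throughout): learn a neural-network feedback
# law u = F_theta(x) by minimizing an ensemble-averaged finite-horizon cost
# for a simple 2-d nonlinear system, discretized with explicit Euler.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical nonlinear dynamics x' = f(x) + B u (not taken from the paper)
A = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
B = torch.tensor([[0.0], [1.0]])

def f(x):                         # x: (batch, 2)
    return x @ A.T + 0.1 * x**3   # linear part plus a cubic nonlinearity

# Feedback gain function F_theta: R^2 -> R, realized by a small network
feedback = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                         nn.Linear(32, 32), nn.Tanh(),
                         nn.Linear(32, 1))

dt, steps, beta = 0.05, 100, 0.1              # horizon T = steps*dt, control weight
opt = torch.optim.Adam(feedback.parameters(), lr=1e-3)

for it in range(1000):
    # Ensemble of initial states sampled from the region to be stabilized
    x = 2.0 * (torch.rand(256, 2) - 0.5)
    cost = 0.0
    for _ in range(steps):                    # explicit Euler rollout of the closed loop
        u = feedback(x)
        cost = cost + dt * (x.pow(2).sum(dim=1) + beta * u.pow(2).sum(dim=1)).mean()
        x = x + dt * (f(x) + u @ B.T)
    opt.zero_grad()
    cost.backward()                           # differentiate through the whole rollout
    opt.step()
    if it % 100 == 0:
        print(f"iter {it:4d}  ensemble cost {cost.item():.4f}")
```

The key design choice mirrored here is that the optimization variable is the feedback function itself (through its network weights), evaluated along an ensemble of trajectories, rather than an open-loop control for a single initial condition.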

DOI: 10.1051/cocv/2021009
Classification: 49J15, 49N35, 68Q32, 93B52, 93D15
Keywords: Optimal feedback stabilization, neural networks, Hamilton-Jacobi-Bellman equation, reinforcement learning
@article{COCV_2021__27_1_A18_0,
     author = {Kunisch, Karl and Walter, Daniel},
     title = {Semiglobal optimal feedback stabilization of autonomous systems via deep neural network approximation},
     journal = {ESAIM: Control, Optimisation and Calculus of Variations},
     year = {2021},
     publisher = {EDP-Sciences},
     volume = {27},
     doi = {10.1051/cocv/2021009},
     language = {en},
     url = {https://www.numdam.org/articles/10.1051/cocv/2021009/}
}
TY  - JOUR
AU  - Kunisch, Karl
AU  - Walter, Daniel
TI  - Semiglobal optimal feedback stabilization of autonomous systems via deep neural network approximation
JO  - ESAIM: Control, Optimisation and Calculus of Variations
PY  - 2021
VL  - 27
PB  - EDP-Sciences
UR  - https://www.numdam.org/articles/10.1051/cocv/2021009/
DO  - 10.1051/cocv/2021009
LA  - en
ID  - COCV_2021__27_1_A18_0
ER  - 
%0 Journal Article
%A Kunisch, Karl
%A Walter, Daniel
%T Semiglobal optimal feedback stabilization of autonomous systems via deep neural network approximation
%J ESAIM: Control, Optimisation and Calculus of Variations
%D 2021
%V 27
%I EDP-Sciences
%U https://www.numdam.org/articles/10.1051/cocv/2021009/
%R 10.1051/cocv/2021009
%G en
%F COCV_2021__27_1_A18_0
Kunisch, Karl; Walter, Daniel. Semiglobal optimal feedback stabilization of autonomous systems via deep neural network approximation. ESAIM: Control, Optimisation and Calculus of Variations, Volume 27 (2021), article no. 16. doi: 10.1051/cocv/2021009

[1] R. Bellman, Adaptive control processes: A guided tour. (A RAND Corporation Research Study). Princeton University Press, XVI, Princeton, N.J. (1961).

[2] D. Bertsekas, Multiagent rollout algorithms and reinforcement learning (2019).

[3] D. Bertsekas, Reinforcement Learning and Optimal Control. Athena Scientific (2019).

[4] T. Breiten, K. Kunisch and L. Pfeiffer, Infinite-horizon bilinear optimal control problems: Sensitivity analysis and polynomial feedback laws. SIAM J. Control Optim. 56 (2018) 3184–3214.

[5] T. Breiten, K. Kunisch and L. Pfeiffer, Numerical study of polynomial feedback laws for a bilinear control problem. Math. Control Relat. Fields 8 (2018) 557–582.

[6] T. Breiten, K. Kunisch and L. Pfeiffer, Feedback stabilization of the two-dimensional Navier-Stokes equations by value function approximation. Tech. rep., University of Graz (2019). Preprint, arXiv.

[7] E. Casas and K. Kunisch, Stabilization by sparse controls for a class of semilinear parabolic equations. SIAM J. Control Optim. 55 (2017) 512–532.

[8] Y. T. Chow, W. Li, S. Osher and W. Yin, Algorithm for Hamilton-Jacobi equations in density space via a generalized Hopf formula (2018).

[9] E. Corominas and F. Sunyer Balaguer, Conditions for an infinitely differentiable function to be a polynomial. Revista Mat. Hisp.-Amer. 14 (1954) 26–43.

[10] R. Curtain and H. Zwart, An Introduction to Infinite-Dimensional Linear Systems Theory. Springer-Verlag (2005).

[11] J. Diestel and J. J. Uhl, Jr., Vector measures. With a foreword by B. J. Pettis, Mathematical Surveys, No. 15. American Mathematical Society, Providence, R.I. (1977).

[12] S. Dolgov, D. Kalise and K. Kunisch, Tensor decomposition for high-dimensional Hamilton-Jacobi-Bellman equations. To appear in: SIAM J. Sci. Comput. (2019).

[13] W. F. Donoghue, Jr., Distributions and Fourier transforms. Vol. 32 of Pure and Applied Mathematics. Academic Press, New York (1969).

[14] R. E. Edwards, Functional Analysis. Theory and Applications. Holt, Rinehart and Winston, New York (1965).

[15] M. Falcone and R. Ferretti, Semi-Lagrangian approximation schemes for linear and Hamilton-Jacobi equations. Society for Industrial and Applied Mathematics SIAM, Philadelphia, PA (2014).

[16] W. H. Fleming and H. M. Soner, Controlled Markov processes and viscosity solutions. Vol. 25 of Stochastic Modelling and Applied Probability. Springer, New York, second ed. (2006).

[17] J. Garcke and A. Kröner, Suboptimal feedback control of PDEs by solving HJB equations on adaptive sparse grids. J. Sci. Comput. 70 (2017) 1–28.

[18] S. Garreis and M. Ulbrich, Constrained optimization with low-rank tensors and applications to parametric problems with PDEs. SIAM J. Sci. Comput. 39 (2017) A25–A54.

[19] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition. Preprint, arXiv (2015).

[20] K. Hornik, Multilayer feedforward networks are universal approximators. Neural Networks 2 (1989) 359–366.

[21] D. Kalise and K. Kunisch, Polynomial approximation of high-dimensional Hamilton-Jacobi-Bellman equations and applications to feedback control of semilinear parabolic PDEs. SIAM J. Sci. Comput. 40 (2018) A629–A652.

[22] D. Kalise, K. Kunisch and Z. Rao, eds., Hamilton-Jacobi-Bellman equations. Vol. 21 of Radon Series on Computational and Applied Mathematics. De Gruyter, Berlin (2018).

[23] D. P. Kouri, M. Heinkenschloss, D. Ridzal and B. G. Van Bloemen Waanders, A trust-region algorithm with adaptive stochastic collocation for PDE optimization under uncertainty. SIAM J. Sci. Comput. 35 (2013) A1847–A1879.

[24] M. Leshno, V. Y. Lin, A. Pinkus and S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks 6 (1993) 861–867.

[25] V. Y. Lin and A. Pinkus, Fundamentality of ridge functions. J. Approx. Theory 75 (1993) 295–311.

[26] J. Lions and E. Magenes, Non-homogeneous Boundary Value Problems and Applications. Vol. I/II. Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen. Springer-Verlag, Berlin (1972).

[27] P.-L. Lions and J.-C. Rochet, Hopf formula and multitime Hamilton-Jacobi equations. Proc. Am. Math. Soc. 96 (1986) 79–84.

[28] T. Nakamura-Zimmerer, Q. Gong and W. Kang, Adaptive deep learning for high-dimensional Hamilton-Jacobi-Bellman equations (2019).

[29] T. Osa, J. Pajarinen, G. Neumann, J. A. Bagnell, P. Abbeel and J. Peters, An algorithmic perspective on imitation learning. Found. Trends Robotics 7 (2018) 1–179.

[30] J. Peters and S. Schaal, Reinforcement learning of motor skills with policy gradients. Neural Networks 21 (2008) 682–697.

[31] A. Pinkus, Approximation theory of the MLP model in neural networks. Acta Numerica 8 (1999) 143–195.

[32] S. P. Ponomarëv, Submersions and pre-images of sets of measure zero. Sibirsk. Mat. Zh. 28 (1987) 199–210.

[33] B. Recht, A tour of reinforcement learning: The view from continuous control. Annu. Rev. Control Robotics Auton. Syst. 2 (2018) 253–279.

[34] H. L. Royden, Real analysis. The Macmillan Co., New York; Collier-Macmillan Ltd., London (1963).

[35] R. S. Sutton and A. G. Barto, Reinforcement learning: an introduction. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, second ed. (2018).

[36] L. Thevenet, J.-M. Buchot and J.-P. Raymond, Nonlinear feedback stabilization of a two-dimensional Burgers equation. ESAIM: COCV 16 (2010) 929–955.

[37] F. Trèves, Topological vector spaces, distributions and kernels. Academic Press, New York-London (1967).

[38] K. Vamvoudakis, F. Lewis and S. S. Ge, Neural networks in feedback control systems. Mechanical Engineers’ Handbook: Instrumentation, Systems, Controls, and MEMS. Wiley (2015).

[39] A. Van Der Schaft, L2-gain and passivity techniques in nonlinear control. Vol. 218 of Lecture Notes in Control and Information Sciences. Springer-Verlag London, Ltd., London (1996).

[40] E. Weinan and B. Yu, The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6 (2018) 1–12.

Research supported by the ERC Advanced Grant 668998 (OCLOC) under the EU's H2020 research program.