A learning approach to optimal feedback gains for nonlinear continuous-time control systems is proposed and analysed. The goal is to establish a rigorous framework for computing approximations to optimal feedback gains by means of neural networks. The approach rests on two main ingredients: first, an optimal control formulation involving an ensemble of trajectories in which the 'control' variables are the feedback gain functions; second, an approximation of these feedback functions by realizations of neural networks. Based on universal approximation properties, we prove the existence and convergence of optimal stabilizing neural network feedback controllers.
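To fix ideas, the ensemble formulation sketched in the abstract can be written schematically as follows; the symbols f, B, ℓ, α, Y₀ and F_θ below are illustrative placeholders rather than the paper's exact notation. A feedback law F_θ, realized by a neural network with parameters θ, is sought that minimizes the cost averaged over an ensemble of initial states:

\[
\min_{\theta}\; \frac{1}{|Y_0|}\sum_{x\in Y_0}\int_0^\infty \Big(\ell\big(y_x(t)\big)+\frac{\alpha}{2}\,\big|F_\theta\big(y_x(t)\big)\big|^2\Big)\,\mathrm{d}t
\quad\text{subject to}\quad \dot y_x = f(y_x)+B\,F_\theta(y_x),\quad y_x(0)=x.
\]

In this way the feedback gain itself, rather than an open-loop control, plays the role of the optimization variable.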
Keywords: Optimal feedback stabilization, neural networks, Hamilton-Jacobi-Bellman equation, reinforcement learning
@article{COCV_2021__27_1_A18_0,
author = {Kunisch, Karl and Walter, Daniel},
title = {Semiglobal optimal feedback stabilization of autonomous systems via deep neural network approximation},
journal = {ESAIM: Control, Optimisation and Calculus of Variations},
year = {2021},
publisher = {EDP-Sciences},
volume = {27},
doi = {10.1051/cocv/2021009},
language = {en},
url = {https://www.numdam.org/articles/10.1051/cocv/2021009/}
}
Kunisch, Karl; Walter, Daniel. Semiglobal optimal feedback stabilization of autonomous systems via deep neural network approximation. ESAIM: Control, Optimisation and Calculus of Variations, Volume 27 (2021), article no. 16. doi: 10.1051/cocv/2021009
Research supported by the ERC Advanced Grant 668998 (OCLOC) under the EU's H2020 research programme.