A policy iteration algorithm for zero-sum stochastic games with mean payoff

Cochet-Terrasson, Jean; Gaubert, Stéphane

doi:10.1016/j.crma.2006.07.011

Game Theory/Complex Analysis

A policy iteration algorithm for zero-sum stochastic games with mean payoff
[Algorithme d'itération sur les politiques pour les jeux stochastiques à somme nulle avec gain moyen]

Présenté par : Pierre-Louis Lions

Cochet-Terrasson, Jean ¹ ; Gaubert, Stéphane ²

¹ CGA, 14, rue Saint Dominique, 75007 Paris, France
² INRIA, domaine de Voluceau, B.P. 105, 78153 Le Chesnay cedex, France

Comptes Rendus. Mathématique, Tome 343 (2006) no. 5, pp. 377-382.

Résumé
Abstract

Nous donnons un algorithme d'itération sur les politiques pour résoudre les jeux stochastiques à somme nulle, avec espaces d'état et d'action finis, en information parfaite, lorsque la valeur du jeu est définie en termes de gain moyen par tour. Cet algorithme ne demande pas que les chaînes de Markov déterminées par les stratégies des deux joueurs soient irréductibles. Il repose sur un analogue discret non-linéaire de la notion de réduite d'une fonction surharmonique.

We give a policy iteration algorithm to solve zero-sum stochastic games with finite state and action spaces and perfect information, when the value is defined in terms of the mean payoff per turn. This algorithm does not require any irreducibility assumption on the Markov chains determined by the strategies of the players. It is based on a discrete nonlinear analogue of the notion of reduction of a super-harmonic function.

Reçu le : 2006-07-03
Accepté le : 2006-07-12
Publié le : 2006-08-10

DOI : 10.1016/j.crma.2006.07.011

Affiliations des auteurs :

Cochet-Terrasson, Jean ¹ ; Gaubert, Stéphane ²

¹ CGA, 14, rue Saint Dominique, 75007 Paris, France
² INRIA, domaine de Voluceau, B.P. 105, 78153 Le Chesnay cedex, France

@article{CRMATH_2006__343_5_377_0,
     author = {Cochet-Terrasson, Jean and Gaubert, St\'ephane},
     title = {A policy iteration algorithm for zero-sum stochastic games with mean payoff},
     journal = {Comptes Rendus. Math\'ematique},
     pages = {377--382},
     publisher = {Elsevier},
     volume = {343},
     number = {5},
     year = {2006},
     doi = {10.1016/j.crma.2006.07.011},
     language = {en},
     url = {http://www.numdam.org/articles/10.1016/j.crma.2006.07.011/}
}

TY  - JOUR
AU  - Cochet-Terrasson, Jean
AU  - Gaubert, Stéphane
TI  - A policy iteration algorithm for zero-sum stochastic games with mean payoff
JO  - Comptes Rendus. Mathématique
PY  - 2006
SP  - 377
EP  - 382
VL  - 343
IS  - 5
PB  - Elsevier
UR  - http://www.numdam.org/articles/10.1016/j.crma.2006.07.011/
DO  - 10.1016/j.crma.2006.07.011
LA  - en
ID  - CRMATH_2006__343_5_377_0
ER  -

%0 Journal Article
%A Cochet-Terrasson, Jean
%A Gaubert, Stéphane
%T A policy iteration algorithm for zero-sum stochastic games with mean payoff
%J Comptes Rendus. Mathématique
%D 2006
%P 377-382
%V 343
%N 5
%I Elsevier
%U http://www.numdam.org/articles/10.1016/j.crma.2006.07.011/
%R 10.1016/j.crma.2006.07.011
%G en
%F CRMATH_2006__343_5_377_0

Cochet-Terrasson, Jean; Gaubert, Stéphane. A policy iteration algorithm for zero-sum stochastic games with mean payoff. Comptes Rendus. Mathématique, Tome 343 (2006) no. 5, pp. 377-382. doi : 10.1016/j.crma.2006.07.011. http://www.numdam.org/articles/10.1016/j.crma.2006.07.011/

Bibliographie
Cité par

[1] Akian, M.; Gaubert, S. Spectral theorem for convex monotone homogeneous maps, and ergodic control, Nonlinear Anal., Volume 52 (2003) no. 2, pp. 637-679

[2] H. Bjorklund, S. Sandberg, S. Vorobyov, A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games, Technical Report 05, DIMACS, 2004

[3] J. Cochet-Terrasson, Algorithmes d'itération sur les politiques pour les applications monotones contractantes, Thèse, École des Mines de Paris, 2001

[4] Cochet-Terrasson, J.; Gaubert, S.; Gunawardena, J. A constructive fixed point theorem for min–max functions, Dynamics and Stability of Systems, Volume 14 (1999) no. 4, pp. 407-433

[5] Condon, A. The complexity of stochastic games, Inform. Comput., Volume 96 (1992) no. 2, pp. 203-224

[6] Denardo, E.V.; Fox, B.L. Multichain Markov renewal programs, SIAM J. Appl. Math., Volume 16 (1968), pp. 468-487

[7] Gaubert, S.; Gunawardena, J. The duality theorem for min–max functions, C. R. Acad. Sci. Paris, Ser. I, Volume 326 (1998) no. 1, pp. 43-48

[8] Gaubert, S.; Gunawardena, J. The Perron–Frobenius theorem for homogeneous, monotone functions, Trans. Amer. Math. Soc., Volume 356 (2004) no. 12, pp. 4931-4950

[9] Hoffman, A.J.; Karp, R.M. On nonterminating stochastic games, Management Sci., Volume 12 (1966) no. 5, pp. 359-370

[10] Howard, R. Dynamic Programming and Markov Processes, Wiley, 1960

[11] M. Jurdziński, M. Paterson, U. Zwick, A deterministic subexponential algorithm for solving parity games, in: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA 2006), January 2006

[12] Kohlberg, E. Invariant half-lines of nonexpansive piecewise-linear transformations, Math. Oper. Res., Volume 5 (1980) no. 3, pp. 366-372

[13] Lazarus, A.J.; Loeb, D.E.; Propp, J.G.; Stromquist, W.R.; Ullman, D.H. Combinatorial games under auction play, Games Econom. Behav., Volume 27 (1999) no. 2, pp. 229-264

[14] Peres, Y.; Schramm, O.; Sheffield, S.; Wilson, D. Tug-of-war and the infinity Laplacian, 2006 (arXiv:) | arXiv

Cité par Sources :