Abstract
We consider finite Markov decision processes with undiscounted total effective payoff. We show that there exist uniformly optimal pure and stationary strategies that can be computed by solving a polynomial number of linear programs. This implies that in a two-player zero-sum stochastic game with perfect information and with total effective payoff there exists a stationary best response to any stationary strategy of the opponent. From this, we derive the existence of a uniformly optimal pure and stationary saddle point. Finally we show that mean payoff can be viewed as a special case of total payoff.
Original language | British English |
---|---|
Pages (from-to) | 1-29 |
Number of pages | 29 |
Journal | Annals of Operations Research |
DOIs | |
State | Accepted/In press - 28 May 2018 |
Keywords
- Equilibrium computation
- Linear programming
- Markov decision processes
- Optimal stationary strategies
- Stochastic games