Article

Trading Performance for Stability in Markov Decision Processes

Authors:

Krishnendu Chatterjee,

Vojtech Forejt,

Antonin Kucera

LICS '13: Proceedings of the 2013 28th Annual ACM/IEEE Symposium on Logic in Computer Science

Pages 331 - 340

https://doi.org/10.1109/LICS.2013.39

Published: 25 June 2013

Abstract

We study the complexity of central controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize both the expected mean-payoff performance of the system and its stability. We argue that the basic theoretical notion of expressing stability in terms of the variance of the mean-payoff (called global variance in our paper) is not always sufficient, since it ignores possible instabilities on respective runs. For this reason we propose alternative definitions of stability, which we call local and hybrid variance, and which express how rewards on each run deviate from the run's own mean-payoff and from the expected mean-payoff, respectively. We show that a strategy ensuring both the expected mean-payoff and the variance below given bounds requires randomization and memory, under all the above semantics of variance. We then look at the problem of determining whether there is such a strategy. For the global variance, we show that the problem is in PSPACE, and that the answer can be approximated in pseudo-polynomial time. For the hybrid variance, the analogous decision problem is in NP, and a polynomial-time approximating algorithm also exists. For the local variance, we show that the decision problem is in NP. Since the overall performance can be traded for stability (and vice versa), we also present algorithms for approximating the associated Pareto curve in all three cases. Finally, we study a special case of the decision problems, where we require a given expected mean-payoff together with zero variance. Here we show that the problems can all be solved in polynomial time.
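The abstract contrasts three notions of variance. As a rough illustration only, the sketch below estimates each one from a finite sample of equally likely run prefixes. The paper works with long-run (limit) averages over infinite runs of an MDP under a strategy; this sketch assumes those limits can be approximated by plain averages over a finite prefix, and the function names and equal-weight sampling are hypothetical choices made here, not the paper's algorithms.

```python
from statistics import fmean  # Python 3.8+; raises StatisticsError on empty input


def mean_payoff(run):
    """Average reward of a finite run prefix (a stand-in for the run's mean-payoff)."""
    return fmean(run)


def global_variance(runs):
    """Population variance of the per-run mean-payoffs around the expected mean-payoff."""
    means = [mean_payoff(r) for r in runs]
    expected = fmean(means)
    return fmean((m - expected) ** 2 for m in means)


def local_variance(runs):
    """Average, over runs, of how far each reward deviates from that run's own mean-payoff."""
    per_run = []
    for r in runs:
        m = mean_payoff(r)
        per_run.append(fmean((x - m) ** 2 for x in r))
    return fmean(per_run)


def hybrid_variance(runs):
    """Average, over runs, of how far each reward deviates from the expected mean-payoff."""
    expected = fmean(mean_payoff(r) for r in runs)
    return fmean(fmean((x - expected) ** 2 for x in r) for r in runs)


if __name__ == "__main__":
    # Two hypothetical, equally likely run prefixes, both with mean-payoff 1.
    runs = [[0, 2, 0, 2], [1, 1, 1, 1]]
    print(global_variance(runs), local_variance(runs), hybrid_variance(runs))  # 0.0 0.5 0.5
```

In this example both runs have mean-payoff 1, so the global variance is 0 even though the first run oscillates between 0 and 2; the local and hybrid variances (both 0.5) expose that per-run instability, which is the shortcoming of global variance that the abstract points out.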

Cited By

Křetínský J, Meggendorfer T (2018). Conditional Value-at-Risk for Reachability and Mean Payoff in Markov Decision Processes. Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, pp. 609-618. DOI: 10.1145/3209108.3209176. Online publication date: 9-Jul-2018.
https://dl.acm.org/doi/10.1145/3209108.3209176

Brázdil T, Chatterjee K, Forejt V, Kučera A (2017). Trading performance for stability in Markov decision processes. Journal of Computer and System Sciences, 84:C, pp. 144-170. DOI: 10.1016/j.jcss.2016.09.009. Online publication date: 1-Mar-2017.
https://dl.acm.org/doi/10.1016/j.jcss.2016.09.009

Chatterjee K, Komarkova Z, Kretinsky J (2015). Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes. Proceedings of the 2015 30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pp. 244-256. DOI: 10.1109/LICS.2015.32. Online publication date: 6-Jul-2015.
https://dl.acm.org/doi/10.1109/LICS.2015.32

Chatterjee K, Henzinger M (2014). Efficient and Dynamic Algorithms for Alternating Büchi Games and Maximal End-Component Decomposition. Journal of the ACM, 61:3, pp. 1-40. DOI: 10.1145/2597631. Online publication date: 2-Jun-2014.
https://dl.acm.org/doi/10.1145/2597631

Index Terms

Theory of computation

Information

Published In

LICS '13: Proceedings of the 2013 28th Annual ACM/IEEE Symposium on Logic in Computer Science

June 2013

597 pages

ISBN:9780769550206

Sponsors

SIGACT: ACM Special Interest Group on Algorithms and Computation Theory

Publisher

IEEE Computer Society

United States

Publication History

Published: 25 June 2013

Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate 215 of 622 submissions, 35%

Bibliometrics

Total citations: 4
Total downloads: 42
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 22 Oct 2024.
