DOI: 10.1109/LICS.2013.39
Article

Trading Performance for Stability in Markov Decision Processes

Published: 25 June 2013

Abstract

We study the complexity of central controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize both the expected mean-payoff performance of the system and its stability. We argue that the basic theoretical notion of expressing the stability in terms of the variance of the mean-payoff (called global variance in our paper) is not always sufficient, since it ignores possible instabilities on respective runs. For this reason we propose alternative definitions of stability, which we call local and hybrid variance, and which express how rewards on each run deviate from the run's own mean-payoff and from the expected mean-payoff, respectively. We show that a strategy ensuring both the expected mean-payoff and the variance below given bounds requires randomization and memory, under all the above semantics of variance. We then look at the problem of determining whether there is such a strategy. For the global variance, we show that the problem is in PSPACE, and that the answer can be approximated in pseudo-polynomial time. For the hybrid variance, the analogous decision problem is in NP, and a polynomial-time approximating algorithm also exists. For local variance, we show that the decision problem is in NP. Since the overall performance can be traded for stability (and vice versa), we also present algorithms for approximating the associated Pareto curve in all three cases. Finally, we study a special case of the decision problems, where we require a given expected mean-payoff together with zero variance. Here we show that the problems can all be solved in polynomial time.
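
To make the three notions of variance concrete, the following is a minimal illustrative sketch, not the paper's formal definitions (which are stated over infinite runs). It assumes each run is approximated by a finite reward sequence, that all sequences have the same length, and that every run is equally likely. The function name `variances` and the sample data are hypothetical.

```python
from statistics import fmean

def variances(runs):
    """Empirical (expected mean payoff, global, local, hybrid variance).

    Assumption: `runs` is a list of equal-length, equally likely finite
    reward sequences standing in for the infinite runs of the paper.
    """
    # Mean payoff of each individual run.
    means = [fmean(r) for r in runs]
    # Expected mean payoff across runs.
    expected = fmean(means)
    # Global: how far each run's mean payoff is from the expected mean payoff.
    global_var = fmean([(m - expected) ** 2 for m in means])
    # Local: how far rewards on a run are from that run's own mean payoff.
    local_var = fmean(
        [fmean([(x - m) ** 2 for x in r]) for r, m in zip(runs, means)]
    )
    # Hybrid: how far rewards on a run are from the expected mean payoff.
    hybrid_var = fmean(
        [fmean([(x - expected) ** 2 for x in r]) for r in runs]
    )
    return expected, global_var, local_var, hybrid_var

# One steady run and one oscillating run with the same mean payoff.
print(variances([[1, 1, 1, 1], [0, 2, 0, 2]]))  # (1.0, 0.0, 0.5, 0.5)
# Two steady runs with different mean payoffs.
print(variances([[0, 0, 0, 0], [2, 2, 2, 2]]))  # (1.0, 1.0, 0.0, 1.0)
```

In the first example the global variance is zero even though one run oscillates, which is the kind of instability the abstract says global variance ignores. The local and hybrid values pick it up.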


Cited By

  • (2018) Conditional Value-at-Risk for Reachability and Mean Payoff in Markov Decision Processes. Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, pages 609-618. DOI: 10.1145/3209108.3209176. Online publication date: 9-Jul-2018
  • (2017) Trading performance for stability in Markov decision processes. Journal of Computer and System Sciences 84:C, pages 144-170. DOI: 10.1016/j.jcss.2016.09.009. Online publication date: 1-Mar-2017
  • (2015) Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes. Proceedings of the 2015 30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 244-256. DOI: 10.1109/LICS.2015.32. Online publication date: 6-Jul-2015
  • (2014) Efficient and Dynamic Algorithms for Alternating Büchi Games and Maximal End-Component Decomposition. Journal of the ACM 61:3, pages 1-40. DOI: 10.1145/2597631. Online publication date: 2-Jun-2014

    Published In

    LICS '13: Proceedings of the 2013 28th Annual ACM/IEEE Symposium on Logic in Computer Science
    June 2013
    597 pages
    ISBN:9780769550206

    Publisher

    IEEE Computer Society

    United States

    Author Tags

    1. Markov decision processes
    2. mean payoff

    Qualifiers

    • Article

    Acceptance Rates

    Overall Acceptance Rate 215 of 622 submissions, 35%
