
Markov Decision Processes with Sample Path Constraints: The Communicating Case

Published: 01 October 1989

Abstract

We consider time-average Markov decision processes (MDPs) that accumulate a reward and a cost at each decision epoch. A policy meets the sample-path constraint if the time-average cost is below a specified value with probability one. The optimization problem is to maximize the expected average reward over all policies that meet the sample-path constraint. The sample-path constraint is compared with the more commonly studied constraint of requiring the average expected cost to be less than a specified value. Although the two criteria are equivalent for certain classes of MDPs, their feasible and optimal policies differ for many nontrivial problems. In general, optimal or nearly optimal stationary policies need not exist when the expected average-cost constraint is employed. Assuming that a policy exists that meets the sample-path constraint, we establish that nearly optimal stationary policies exist for communicating MDPs. A parametric linear programming algorithm is given to construct nearly optimal stationary policies. The discussion relies on well-known results from the theory of stochastic processes and linear programming. The techniques lead to simple proofs of the existence of optimal and nearly optimal stationary policies for unichain and deterministic MDPs, respectively.
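The relationship between the two criteria hinges on ergodicity: under a fixed stationary policy on a communicating chain, the time-average cost along a sample path converges with probability one to the stationary expected cost, so for such policies the sample-path and expected-average-cost constraints coincide. A minimal pure-Python sketch of this convergence (the two-state MDP, its transition probabilities, the randomized policy, and the cost bound α = 0.4 are all invented for illustration, not taken from the paper):

```python
import random

# Hypothetical 2-state, 2-action MDP (invented for illustration).
# The per-step cost is simply the action taken: c(s, a) = a.
# To keep the stationary analysis short, the next state depends only
# on the action: P(next = 1 | a) = 0.1 if a == 0 else 0.9.
P_NEXT1 = {0: 0.1, 1: 0.9}
# Stationary randomized policy: probability of choosing action 1 in each state.
PI1 = {0: 0.2, 1: 0.6}

def step(state, rng):
    """One decision epoch: sample an action, incur its cost, transition."""
    a = 1 if rng.random() < PI1[state] else 0
    s_next = 1 if rng.random() < P_NEXT1[a] else 0
    return s_next, a  # cost of the epoch = a

def time_average_cost(n_steps, seed=0):
    """Empirical time-average cost along a single sample path."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(n_steps):
        state, cost = step(state, rng)
        total += cost
    return total / n_steps

def stationary_cost():
    """Exact long-run average cost via the 2-state balance equation."""
    # q[s] = P(move to state 1 | currently in s) under the policy.
    q = {s: PI1[s] * P_NEXT1[1] + (1 - PI1[s]) * P_NEXT1[0] for s in (0, 1)}
    rho1 = q[0] / (1 - q[1] + q[0])          # stationary mass on state 1
    return (1 - rho1) * PI1[0] + rho1 * PI1[1]  # E[action] = E[cost]

if __name__ == "__main__":
    print(f"stationary expected cost:        {stationary_cost():.4f}")
    print(f"sample-path average (1e5 steps): {time_average_cost(100_000):.4f}")
```

Here the stationary expected cost works out to roughly 0.353, and the empirical sample-path average settles near it, so a stationary policy whose stationary cost lies below the (hypothetical) bound α = 0.4 also meets the sample-path constraint along almost every trajectory. The paper's contribution concerns the harder question of what can be achieved when attention is not restricted to a single fixed stationary policy.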




Published In

Operations Research, Volume 37, Issue 5 (October 1989), 163 pages.
Publisher: INFORMS, Linthicum, MD, United States.


    Author Tags

    1. Markov finite state: time-average case
    2. Markov processes: sample-path constraints
    3. dynamic programming
    4. probability



Cited By

• (2023) Safety-Gymnasium. Proceedings of the 37th International Conference on Neural Information Processing Systems, 18964-18993. 10.5555/3666122.3666953. Online: 10-Dec-2023.
• (2021) Constrained Multiagent Markov Decision Processes. Journal of Artificial Intelligence Research 70, 955-1001. 10.1613/jair.1.12233. Online: 1-May-2021.
• (2020) First order constrained optimization in policy space. Proceedings of the 34th International Conference on Neural Information Processing Systems, 15338-15349. 10.5555/3495724.3497010. Online: 6-Dec-2020.
• (2014) Multi-period marketing-mix optimization with response spike forecasting. IBM Journal of Research and Development 58:5-6, 1:1-1:13. 10.1147/JRD.2014.2337131. Online: 1-Sep-2014.
• (2012) Cost-sensitive exploration in Bayesian reinforcement learning. Proceedings of the 25th International Conference on Neural Information Processing Systems, Volume 2, 3068-3076. 10.5555/2999325.2999477. Online: 3-Dec-2012.
• (2012) Time-consistency of optimization problems. Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 1945-1951. 10.5555/2900929.2901003. Online: 22-Jul-2012.
• (2011) Point-based value iteration for constrained POMDPs. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume 3, 1968-1974. 10.5555/2283696.2283730. Online: 16-Jul-2011.
• (2007) Semi-Markov decision processes. Probability in the Engineering and Informational Sciences 21:4, 635-657. 10.1017/S026996480700037X. Online: 1-Oct-2007.
• (2006) Resource allocation among agents with MDP-induced preferences. Journal of Artificial Intelligence Research 27:1, 505-549. 10.5555/1622572.1622587. Online: 1-Dec-2006.
• (2006) A note on two-person zero-sum communicating stochastic games. Operations Research Letters 34:4, 412-420. 10.1016/j.orl.2005.07.008. Online: 1-Jul-2006.
