DOI: 10.5555/1995456.1995518
WSC Conference Proceedings · Research article

Reinforcement learning for model building and variance-penalized control

Published: 13 December 2009 Publication History

Abstract

Reinforcement learning (RL) is a simulation-based technique for solving Markov decision problems or processes (MDPs). It is especially useful when the transition probabilities of the MDP are hard to obtain or when the number of states in the problem is too large. In this paper, we present a new model-based RL algorithm that builds the transition-probability model without generating the transition probabilities themselves; the existing literature on model-based RL attempts to compute these probabilities explicitly. We also present a variance-penalized Bellman equation and an RL algorithm that uses it to solve a variance-penalized MDP. We conclude with numerical experiments on these algorithms.
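The variance-penalized objective described in the abstract trades mean reward against reward variance (in the spirit of Filar, Kallenberg, and Lee [6] and Markowitz [12]). As a rough illustration only — not the paper's algorithm — the sketch below runs a Q-learning-style update on a toy two-state MDP in which each immediate reward is penalized by a weight times its squared deviation from a running per-state-action mean. The toy MDP, the penalty form, and all parameter values are assumptions made for this example.

```python
import random

random.seed(0)

N_STATES, N_ACTIONS = 2, 2
LAM = 1.0      # variance-penalty weight (illustrative choice)
ALPHA = 0.1    # learning rate
GAMMA = 0.95   # discount factor
EPS = 0.1      # exploration rate

def step(state, action):
    """Toy 2-state MDP: action 1 has a higher mean reward but much higher variance."""
    reward = 1.0 if action == 0 else random.choice([0.0, 3.0])
    return random.randrange(N_STATES), reward

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
mean_r = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # running reward means

state = 0
for _ in range(20000):
    if random.random() < EPS:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    nxt, r = step(state, action)
    # Track a running mean of the immediate reward for this state-action pair.
    mean_r[state][action] += ALPHA * (r - mean_r[state][action])
    # Penalize the squared deviation from that mean: one common surrogate
    # for a variance penalty in risk-sensitive RL.
    penalized = r - LAM * (r - mean_r[state][action]) ** 2
    Q[state][action] += ALPHA * (penalized + GAMMA * max(Q[nxt]) - Q[state][action])
    state = nxt

# With this penalty weight the greedy policy prefers the safe action (0),
# even though the risky action (1) has the higher mean reward.
greedy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(greedy)
```

Without the penalty (LAM = 0), the risky action's higher mean (1.5 versus 1.0) would win; the penalty flips that preference, which is the qualitative behavior a variance-penalized MDP formulation is after.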

References

[1]
Baird, L. C. 1995. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning.
[2]
Barto, A., R. Sutton, and C. Anderson. 1983. Neuronlike elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13:835--846.
[3]
Bertsekas, D., and J. Tsitsiklis. 1996. Neuro-dynamic programming. Belmont, MA, USA: Athena Scientific.
[4]
Borkar, V. 2002. Q-learning for risk-sensitive control. Mathematics of Operations Research 27(2):294--311.
[5]
Borkar, V. S. 1997. Stochastic approximation with two-time scales. Systems and Control Letters 29:291--294.
[6]
Filar, J., L. Kallenberg, and H. Lee. 1989. Variance-penalized Markov decision processes. Mathematics of Operations Research 14(1):147--161.
[7]
Geibel, P., and F. Wysotzki. 2005. Risk-sensitive reinforcement learning applied to control under constraints. Journal of Artificial Intelligence Research 24:81--108.
[8]
Gosavi, A. 2003. Simulation-based optimization: Parametric optimization techniques and reinforcement learning. Boston, MA: Kluwer Academic.
[9]
Gosavi, A. 2006. A risk-sensitive approach to total productive maintenance. Automatica 42:1321--1330.
[10]
Gosavi, A. 2007. Adaptive critics for airline revenue management. In Proceedings of 18th Annual Conference of the Production and Operations Management Society, Dallas, TX.
[11]
Gosavi, A., and S. Meyn. 2009. A dynamic programming algorithm for variance-penalized Markov decision process. Working Paper, Missouri University of Science and Technology and University of Illinois.
[12]
Markowitz, H. 1952. Portfolio selection. Journal of Finance 7(1):77--91.
[13]
Rummery, G., and M. Niranjan. 1994. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166. Engineering Department, Cambridge University.
[14]
Sato, M., and S. Kobayashi. 2001. Average-reward reinforcement learning for variance-penalized Markov decision problems. In Proceedings of the 18th International Conference on Machine Learning, 473--480. Morgan Kaufmann.
[15]
Sutton, R. 1996. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems 8. Cambridge, MA: MIT Press.
[16]
Sutton, R., and A. G. Barto. 1998. Reinforcement learning: An introduction. Cambridge, MA, USA: The MIT Press.
[17]
Tadepalli, P., and D. Ok. 1998. Model-based average reward reinforcement learning algorithms. Artificial Intelligence 100:177--224.
[18]
Tsitsiklis, J. N., and B. Van Roy. 1997. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42(5):674--690.
[19]
Watkins, C. 1989. Learning from delayed rewards. Ph.D. thesis, King's College, Cambridge, England.
[20]
Werbos, P. 1990. A menu of designs for reinforcement learning over time. In Neural Networks for Control, 67--95. MIT Press, MA.
[21]
Werbos, P. J. 1974, May. Beyond regression: New tools for prediction and analysis of behavioral sciences. Ph.D. thesis, Harvard University, Cambridge, MA, USA.
[22]
Werbos, P. J. 1987. Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research. IEEE Transactions on Systems, Man, and Cybernetics 17:7--20.
[23]
Williams, R. 1988. On the use of backpropagation in associative reinforcement learning. In Proceedings of the International Conference on Neural Networks, San Diego, CA.
[24]
Witten, I. 1977. An adaptive optimal controller for discrete time Markov environments. Information and Control 34:286--295.

Cited By

  • (2018) Solving Markov decision processes with downside risk adjustment. International Journal of Automation and Computing 13(3):235--245. DOI: 10.1007/s11633-016-1005-3. Online publication date: 17-Dec-2018.
  • (2015) A comprehensive survey on safe reinforcement learning. The Journal of Machine Learning Research 16(1):1437--1480. DOI: 10.5555/2789272.2886795. Online publication date: 1-Jan-2015.
  • (2011) Stochastic policy search for variance-penalized semi-Markov control. In Proceedings of the Winter Simulation Conference, 2865--2876. DOI: 10.5555/2431518.2431858. Online publication date: 11-Dec-2011.


Published In

WSC '09: Winter Simulation Conference
December 2009, 3211 pages
ISBN: 9781424457717

Publisher

Winter Simulation Conference



Conference

WSC '09: Winter Simulation Conference
December 13--16, 2009
Austin, Texas

Acceptance Rates

WSC '09 paper acceptance rate: 137 of 256 submissions (54%)
Overall acceptance rate: 3,413 of 5,075 submissions (67%)


