DOI: 10.5555/1619410.1619497
Article

Improving action selection in MDP's via knowledge transfer

Published: 09 July 2005

Abstract

Temporal-difference reinforcement learning (RL) has been successfully applied in several domains with large state sets. Large action sets, however, have received considerably less attention. This paper demonstrates the use of knowledge transfer between related tasks to accelerate learning with large action sets. We introduce action transfer, a technique that extracts the actions from the (near-)optimal solution to the first task and uses them in place of the full action set when learning any subsequent tasks. When optimal actions make up a small fraction of the domain's action set, action transfer can substantially reduce the number of actions and thus the complexity of the problem. However, action transfer between dissimilar tasks can be detrimental. To address this difficulty, we contribute randomized task perturbation (RTP), an enhancement to action transfer that makes it robust to unrepresentative source tasks. We motivate RTP action transfer with a detailed theoretical analysis featuring a formalism of related tasks and a bound on the suboptimality of action transfer. The empirical results in this paper show the potential of RTP action transfer to substantially expand the applicability of RL to problems with large action sets.
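
The following is a minimal, illustrative sketch of plain action transfer as described above (not the authors' implementation, and without the RTP enhancement): a source task is solved with tabular Q-learning, the actions used by its greedy policy are extracted, and only those actions are offered when learning a target task. The environment interface (reset/step) and names such as source_env, target_env, full_actions, and source_states are hypothetical.

import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    # Tabular Q-learning restricted to a given action set; returns the learned Q-table.
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection over the available actions.
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda x: Q[(s, x)]))
            s2, r, done = env.step(a)
            best_next = max(Q[(s2, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

def transferred_actions(Q, states, actions):
    # Action transfer: keep only the actions chosen by the source task's greedy policy.
    return {max(actions, key=lambda x: Q[(s, x)]) for s in states}

# Hypothetical usage: when optimal actions are a small fraction of the full set,
# the transferred set is much smaller, so the target task is learned over far
# fewer actions per state.
# Q_src = q_learning(source_env, full_actions)
# small_set = list(transferred_actions(Q_src, source_states, full_actions))
# Q_tgt = q_learning(target_env, small_set)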


Published In

AAAI'05: Proceedings of the 20th national conference on Artificial intelligence - Volume 2
July 2005
1035 pages
ISBN: 1-57735-236-X

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Cited By

  • (2019) A survey on transfer learning for multiagent reinforcement learning systems. Journal of Artificial Intelligence Research 64(1):645-703. DOI: 10.1613/jair.1.11396
  • (2011) Knowledge Transfer between Automated Planners. AI Magazine 32(2):79-94. DOI: 10.1609/aimag.v32i2.2334
  • (2011) Stochastic abstract policies for knowledge transfer in robotic navigation tasks. Proceedings of the 10th Mexican International Conference on Advances in Artificial Intelligence - Volume Part I, 454-465. DOI: 10.1007/978-3-642-25324-9_39
  • (2010) Improving space representation in multiagent learning via tile coding. Proceedings of the 20th Brazilian Conference on Advances in Artificial Intelligence, 153-162. DOI: 10.5555/1929622.1929641
  • (2010) Using spatial hints to improve policy reuse in a reinforcement learning agent. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, 317-324. DOI: 10.5555/1838206.1838251
  • (2009) Exploring compact reinforcement-learning representations with linear regression. Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 591-598. DOI: 10.5555/1795114.1795183
  • (2009) Transfer Learning for Reinforcement Learning Domains: A Survey. The Journal of Machine Learning Research 10:1633-1685. DOI: 10.5555/1577069.1755839
  • (2007) Learning to plan using harmonic analysis of diffusion models. Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling, 224-231. DOI: 10.5555/3037176.3037206
  • (2006) Reusing and building a policy library. Proceedings of the Sixteenth International Conference on Automated Planning and Scheduling, 378-381. DOI: 10.5555/3037104.3037159
  • (2006) Probabilistic policy reuse in a reinforcement learning agent. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, 720-727. DOI: 10.1145/1160633.1160762