DOI: 10.5555/1018410.1018798

Best-Response Multiagent Learning in Non-Stationary Environments

Published: 19 July 2004

Abstract

This paper investigates a relatively new direction in multiagent reinforcement learning. Most multiagent learning techniques focus on Nash equilibria as elements of both the learning algorithm and its evaluation criteria. In contrast, we propose a multiagent learning algorithm that is optimal in the sense of finding a best-response policy rather than reaching an equilibrium. We present the first learning algorithm that is provably optimal against restricted classes of non-stationary opponents. The algorithm infers an accurate model of the opponent's non-stationary strategy, and simultaneously creates a best-response policy against that strategy. Our learning algorithm works within the very general framework of n-player, general-sum stochastic games, and learns both the game structure and its associated optimal policy.
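The abstract outlines a two-part recipe: infer a model of the opponent's non-stationary strategy, and simultaneously compute a best response against that model. As a rough illustration only (the paper's actual algorithm is not reproduced on this page), the Python sketch below pairs a sliding-window opponent model with joint-action Q-learning; the class names, the window size, and all hyperparameters are illustrative assumptions.

```python
import random
from collections import defaultdict, deque

class SlidingWindowOpponentModel:
    """Empirical estimate of P(opponent action | state) over the most recent
    `window` observations, so data gathered under an outdated opponent
    strategy ages out as the opponent's behavior drifts. Illustrative only."""

    def __init__(self, window=200):
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, state, opp_action):
        self.history[state].append(opp_action)

    def predict(self, state, opp_actions):
        obs = list(self.history[state])
        if not obs:  # uniform prior before any observations
            return {a: 1.0 / len(opp_actions) for a in opp_actions}
        return {a: obs.count(a) / len(obs) for a in opp_actions}


class BestResponseLearner:
    """Joint-action Q-learner: Q is indexed by (state, my action, opponent
    action), and the opponent model converts Q into expected values so the
    agent can best-respond to the estimated (drifting) strategy."""

    def __init__(self, my_actions, opp_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.my_actions = my_actions
        self.opp_actions = opp_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = defaultdict(float)  # (state, my_action, opp_action) -> value
        self.model = SlidingWindowOpponentModel()

    def _expected_value(self, state, my_action):
        # Expected Q-value of my_action under the current opponent model.
        pred = self.model.predict(state, self.opp_actions)
        return sum(p * self.Q[(state, my_action, o)] for o, p in pred.items())

    def act(self, state):
        if random.random() < self.epsilon:  # epsilon-greedy exploration
            return random.choice(self.my_actions)
        return max(self.my_actions, key=lambda a: self._expected_value(state, a))

    def update(self, state, my_action, opp_action, reward, next_state):
        # Refresh the opponent model, then back up against the best
        # expected continuation value under that model.
        self.model.observe(state, opp_action)
        best_next = max(self._expected_value(next_state, a) for a in self.my_actions)
        td_target = reward + self.gamma * best_next
        key = (state, my_action, opp_action)
        self.Q[key] += self.alpha * (td_target - self.Q[key])
```

Under these assumptions, an opponent that switches strategies is tracked implicitly: once the window fills with post-switch observations, the model, and therefore the greedy best response, reflects the new strategy.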



Published In

AAMAS '04: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2
July 2004, 464 pages
ISBN: 1581138644
Publisher: IEEE Computer Society, United States


Qualifiers

  • Article

Conference

AAMAS04

Acceptance Rates

Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%


Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 1

Reflects downloads up to 22 Oct 2024


Cited By

  • (2019) Reaching Cooperation using Emerging Empathy and Counter-empathy. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pages 746-753. DOI: 10.5555/3306127.3331764. Online publication date: 8-May-2019.
  • (2017) An exploration strategy for non-stationary opponents. Autonomous Agents and Multi-Agent Systems, 31(5):971-1002. DOI: 10.1007/s10458-016-9347-3. Online publication date: 1-Sep-2017.
  • (2013) Performance of distributed multi-agent multi-state reinforcement spectrum management using different exploration schemes. Expert Systems with Applications: An International Journal, 40(10):4115-4126. DOI: 10.1016/j.eswa.2013.01.035. Online publication date: 1-Aug-2013.
  • (2007) Dynamically learning sources of trust information. Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1-8. DOI: 10.1145/1329125.1329325. Online publication date: 14-May-2007.
  • (2006) Learning trust strategies in reputation exchange networks. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1241-1248. DOI: 10.1145/1160633.1160857. Online publication date: 8-May-2006.
  • (2006) Multi-agent reinforcement learning algorithm to handle beliefs of other agents' policies and embedded beliefs. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 789-791. DOI: 10.1145/1160633.1160772. Online publication date: 8-May-2006.
  • (2006) Learning from induced changes in opponent (re)actions in multi-agent games. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 728-735. DOI: 10.1145/1160633.1160763. Online publication date: 8-May-2006.
  • (2006) Learning to negotiate optimally in non-stationary environments. Proceedings of the 10th International Conference on Cooperative Information Agents, pages 288-300. DOI: 10.1007/11839354_21. Online publication date: 11-Sep-2006.
  • (2005) Cooperative Multi-Agent Learning. Autonomous Agents and Multi-Agent Systems, 11(3):387-434. DOI: 10.1007/s10458-005-2631-2. Online publication date: 1-Nov-2005.
  • (2005) Unifying convergence and no-regret in multiagent learning. Proceedings of the First International Conference on Learning and Adaption in Multi-Agent Systems, pages 100-114. DOI: 10.1007/11691839_5. Online publication date: 25-Jul-2005.
