DOI: 10.5555/1018410.1018798

Best-Response Multiagent Learning in Non-Stationary Environments

Published: 19 July 2004

Abstract

This paper investigates a relatively new direction in multiagent reinforcement learning. Most multiagent learning techniques focus on Nash equilibria as elements of both the learning algorithm and its evaluation criteria. In contrast, we propose a multiagent learning algorithm that is optimal in the sense of finding a best-response policy rather than reaching an equilibrium. We present the first learning algorithm that is provably optimal against restricted classes of non-stationary opponents. The algorithm infers an accurate model of the opponent's non-stationary strategy, and simultaneously creates a best-response policy against that strategy. Our learning algorithm works within the very general framework of n-player, general-sum stochastic games, and learns both the game structure and its associated optimal policy.
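The abstract outlines a two-part recipe: infer a model of the opponent's non-stationary strategy, and simultaneously compute a best response against that model. As a rough illustration only (the paper's actual algorithm is not reproduced on this page), the Python sketch below pairs a sliding-window opponent model with joint-action Q-learning; the class names, the window size, and all hyperparameters are illustrative assumptions.

```python
import random
from collections import defaultdict, deque

class SlidingWindowOpponentModel:
    """Empirical estimate of P(opponent action | state) over the most recent
    `window` observations, so data gathered under an outdated opponent
    strategy ages out as the opponent's behavior drifts. Illustrative only."""

    def __init__(self, window=200):
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, state, opp_action):
        self.history[state].append(opp_action)

    def predict(self, state, opp_actions):
        obs = list(self.history[state])
        if not obs:  # uniform prior before any observations
            return {a: 1.0 / len(opp_actions) for a in opp_actions}
        return {a: obs.count(a) / len(obs) for a in opp_actions}


class BestResponseLearner:
    """Joint-action Q-learner: Q is indexed by (state, my action, opponent
    action), and the opponent model converts Q into expected values so the
    agent can best-respond to the estimated (drifting) strategy."""

    def __init__(self, my_actions, opp_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.my_actions = my_actions
        self.opp_actions = opp_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = defaultdict(float)  # (state, my_action, opp_action) -> value
        self.model = SlidingWindowOpponentModel()

    def _expected_value(self, state, my_action):
        # Expected Q-value of my_action under the current opponent model.
        pred = self.model.predict(state, self.opp_actions)
        return sum(p * self.Q[(state, my_action, o)] for o, p in pred.items())

    def act(self, state):
        if random.random() < self.epsilon:  # epsilon-greedy exploration
            return random.choice(self.my_actions)
        return max(self.my_actions, key=lambda a: self._expected_value(state, a))

    def update(self, state, my_action, opp_action, reward, next_state):
        # Refresh the opponent model, then back up against the best
        # expected continuation value under that model.
        self.model.observe(state, opp_action)
        best_next = max(self._expected_value(next_state, a) for a in self.my_actions)
        td_target = reward + self.gamma * best_next
        key = (state, my_action, opp_action)
        self.Q[key] += self.alpha * (td_target - self.Q[key])
```

Under these assumptions, an opponent that switches strategies is tracked implicitly: once the window fills with post-switch observations, the model, and therefore the greedy best response, reflects the new strategy.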



Published In

AAMAS '04: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2
July 2004, 464 pages
ISBN: 1581138644
Publisher: IEEE Computer Society, United States


Qualifiers

  • Article

Conference

AAMAS04

Acceptance Rates

Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%


Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 1

Reflects downloads up to 22 Oct 2024


Cited By

  • (2019) Reaching Cooperation using Emerging Empathy and Counter-empathy. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pages 746-753. DOI: 10.5555/3306127.3331764. Online publication date: 8-May-2019.
  • (2017) An exploration strategy for non-stationary opponents. Autonomous Agents and Multi-Agent Systems, 31(5):971-1002. DOI: 10.1007/s10458-016-9347-3. Online publication date: 1-Sep-2017.
  • (2013) Performance of distributed multi-agent multi-state reinforcement spectrum management using different exploration schemes. Expert Systems with Applications: An International Journal, 40(10):4115-4126. DOI: 10.1016/j.eswa.2013.01.035. Online publication date: 1-Aug-2013.
  • (2007) Dynamically learning sources of trust information. Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1-8. DOI: 10.1145/1329125.1329325. Online publication date: 14-May-2007.
  • (2006) Learning trust strategies in reputation exchange networks. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1241-1248. DOI: 10.1145/1160633.1160857. Online publication date: 8-May-2006.
  • (2006) Multi-agent reinforcement learning algorithm to handle beliefs of other agents' policies and embedded beliefs. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 789-791. DOI: 10.1145/1160633.1160772. Online publication date: 8-May-2006.
  • (2006) Learning from induced changes in opponent (re)actions in multi-agent games. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 728-735. DOI: 10.1145/1160633.1160763. Online publication date: 8-May-2006.
  • (2006) Learning to negotiate optimally in non-stationary environments. Proceedings of the 10th International Conference on Cooperative Information Agents, pages 288-300. DOI: 10.1007/11839354_21. Online publication date: 11-Sep-2006.
  • (2005) Cooperative Multi-Agent Learning. Autonomous Agents and Multi-Agent Systems, 11(3):387-434. DOI: 10.1007/s10458-005-2631-2. Online publication date: 1-Nov-2005.
  • (2005) Unifying convergence and no-regret in multiagent learning. Proceedings of the First International Conference on Learning and Adaption in Multi-Agent Systems, pages 100-114. DOI: 10.1007/11691839_5. Online publication date: 25-Jul-2005.
