DOI: 10.1145/3383313.3412233

Learning to Collaborate in Multi-Module Recommendation via Multi-Agent Reinforcement Learning without Communication

Published: 22 September 2020

Abstract

With the rise of online e-commerce platforms, an increasing number of customers prefer to shop online. To sell more products, these platforms introduce various modules that recommend items with different properties, such as items with large discounts. A web page often consists of several independent modules whose ranking policies are decided by different teams and optimized individually without cooperation, which can lead to competition between modules; as a result, the global policy of the whole page can be sub-optimal. In this paper, we propose a novel multi-agent cooperative reinforcement learning approach under the restriction that the modules cannot communicate with each other. Our contributions are three-fold. First, inspired by the game-theoretic solution concept of correlated equilibrium, we design a signal network that promotes cooperation among all modules by generating a signal (vector) for each module. Second, we propose an entropy-regularized version of the signal network to coordinate the agents’ exploration of the optimal global policy. Third, experiments on real-world e-commerce data demonstrate that our algorithm outperforms the baselines.
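
To make the abstract's mechanism concrete, the following is a minimal sketch, not the authors' implementation: the network shapes, the tanh and softmax parameterizations, and all dimensions are illustrative assumptions. It shows how a centralized signal network can hand each module a private signal vector, so every module ranks its items from its own observation and its signal alone (no inter-module communication), in the spirit of the correlation device behind a correlated equilibrium; the entropy of each module's ranking distribution is the kind of term the entropy-regularized variant would add to the training objective.

```python
import numpy as np

# Hypothetical sketch (assumed shapes and parameterizations, not the paper's code).
rng = np.random.default_rng(0)

def signal_network(page_state, weights, n_modules, signal_dim):
    """Centralized signal network: maps the shared page-level state to one
    private signal vector per module (the 'correlation device')."""
    z = np.tanh(page_state @ weights)            # shared nonlinear projection
    return z.reshape(n_modules, signal_dim)      # split into per-module signals

def module_policy(local_obs, signal, policy_weights):
    """A module's ranking policy: scores its candidate items from its own
    observation plus its signal only -- no communication with other modules."""
    x = np.concatenate([local_obs, signal])
    scores = x @ policy_weights                  # one score per candidate item
    e = np.exp(scores - scores.max())
    return e / e.sum()                           # softmax ranking distribution

def entropy(p):
    """Policy entropy; an entropy-regularized objective would add this
    (suitably weighted) to coordinate exploration."""
    return float(-np.sum(p * np.log(p + 1e-12)))

# Toy dimensions (assumptions for illustration only).
state_dim, obs_dim, signal_dim, n_modules, n_items = 16, 8, 4, 3, 5
W_sig = rng.normal(size=(state_dim, n_modules * signal_dim))
W_pol = [rng.normal(size=(obs_dim + signal_dim, n_items)) for _ in range(n_modules)]

page_state = rng.normal(size=state_dim)
signals = signal_network(page_state, W_sig, n_modules, signal_dim)
for m in range(n_modules):
    dist = module_policy(rng.normal(size=obs_dim), signals[m], W_pol[m])
    print(f"module {m}: item distribution {np.round(dist, 3)}, entropy {entropy(dist):.3f}")
```

Under these assumptions, the signal network and the per-module policies would be trained jointly (for example, with policy gradients) against the page-level reward plus the entropy bonus; at serving time each module only needs its own observation and its signal, so no inter-module communication is required.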



Published In

RecSys '20: Proceedings of the 14th ACM Conference on Recommender Systems
September 2020
796 pages
ISBN: 9781450375832
DOI: 10.1145/3383313

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tag

  1. Reinforcement learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

RecSys '20: Fourteenth ACM Conference on Recommender Systems
September 22 - 26, 2020
Virtual Event, Brazil

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 101
  • Downloads (Last 6 weeks): 35
Reflects downloads up to 16 Oct 2024


Citations

Cited By

  • (2024) Non-Stationary Transformer Architecture: A Versatile Framework for Recommendation Systems. Electronics 13:11 (2075). DOI: 10.3390/electronics13112075. Online publication date: 27-May-2024.
  • (2024) On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems. ACM Transactions on Information Systems 42:6 (1-26). DOI: 10.1145/3661996. Online publication date: 19-Aug-2024.
  • (2024) A Survey on Reinforcement Learning for Recommender Systems. IEEE Transactions on Neural Networks and Learning Systems 35:10 (13164-13184). DOI: 10.1109/TNNLS.2023.3280161. Online publication date: Oct-2024.
  • (2024) A Tutorial-Generating Method for Autonomous Online Learning. IEEE Transactions on Learning Technologies 17 (1558-1567). DOI: 10.1109/TLT.2024.3390593. Online publication date: 2024.
  • (2024) Towards Knowledge-Aware and Deep Reinforced Cross-Domain Recommendation Over Collaborative Knowledge Graph. IEEE Transactions on Knowledge and Data Engineering 36:11 (7171-7187). DOI: 10.1109/TKDE.2024.3391268. Online publication date: Nov-2024.
  • (2023) Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2179-2183). DOI: 10.1145/3539618.3592022. Online publication date: 19-Jul-2023.
  • (2023) A Bird's-eye View of Reranking: From List Level to Page Level. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining (1075-1083). DOI: 10.1145/3539597.3570399. Online publication date: 27-Feb-2023.
  • (2023) Leveraging Long Short-Term User Preference in Conversational Recommendation via Multi-agent Reinforcement Learning. IEEE Transactions on Knowledge and Data Engineering 35:11 (11541-11555). DOI: 10.1109/TKDE.2022.3225109. Online publication date: 1-Nov-2023.
  • (2023) Learning From Atypical Behavior: Temporary Interest Aware Recommendation Based on Reinforcement Learning. IEEE Transactions on Knowledge and Data Engineering 35:10 (9824-9835). DOI: 10.1109/TKDE.2022.3144292. Online publication date: 1-Oct-2023.
  • (2023) Online POI Recommendation: Learning Dynamic Geo-Human Interactions in Streams. IEEE Transactions on Big Data 9:3 (832-844). DOI: 10.1109/TBDATA.2022.3215134. Online publication date: 1-Jun-2023.
