DOI: 10.1145/3383313.3412233

Learning to Collaborate in Multi-Module Recommendation via Multi-Agent Reinforcement Learning without Communication

Published: 22 September 2020

Abstract

With the rise of online e-commerce platforms, an increasing number of customers prefer to shop online. To sell more products, these platforms introduce various modules that recommend items with different properties, such as items with large discounts. A web page often consists of several independent modules whose ranking policies are decided by different teams and optimized individually without cooperation, which can lead to competition between modules; as a result, the global policy of the whole page can be sub-optimal. In this paper, we propose a novel multi-agent cooperative reinforcement learning approach under the restriction that the modules cannot communicate with each other. Our contributions are three-fold. First, inspired by the game-theoretic solution concept of correlated equilibrium, we design a signal network that promotes cooperation among all modules by generating a signal (vector) for each module. Second, we propose an entropy-regularized version of the signal network to coordinate the agents’ exploration of the optimal global policy. Third, experiments on real-world e-commerce data demonstrate that our algorithm outperforms the baselines.
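
To make the abstract's mechanism concrete, the following is a minimal sketch, not the authors' implementation: the network shapes, the tanh and softmax parameterizations, and all dimensions are illustrative assumptions. It shows how a centralized signal network can hand each module a private signal vector, so every module ranks its items from its own observation and its signal alone (no inter-module communication), in the spirit of the correlation device behind a correlated equilibrium; the entropy of each module's ranking distribution is the kind of term the entropy-regularized variant would add to the training objective.

```python
import numpy as np

# Hypothetical sketch (assumed shapes and parameterizations, not the paper's code).
rng = np.random.default_rng(0)

def signal_network(page_state, weights, n_modules, signal_dim):
    """Centralized signal network: maps the shared page-level state to one
    private signal vector per module (the 'correlation device')."""
    z = np.tanh(page_state @ weights)            # shared nonlinear projection
    return z.reshape(n_modules, signal_dim)      # split into per-module signals

def module_policy(local_obs, signal, policy_weights):
    """A module's ranking policy: scores its candidate items from its own
    observation plus its signal only -- no communication with other modules."""
    x = np.concatenate([local_obs, signal])
    scores = x @ policy_weights                  # one score per candidate item
    e = np.exp(scores - scores.max())
    return e / e.sum()                           # softmax ranking distribution

def entropy(p):
    """Policy entropy; an entropy-regularized objective would add this
    (suitably weighted) to coordinate exploration."""
    return float(-np.sum(p * np.log(p + 1e-12)))

# Toy dimensions (assumptions for illustration only).
state_dim, obs_dim, signal_dim, n_modules, n_items = 16, 8, 4, 3, 5
W_sig = rng.normal(size=(state_dim, n_modules * signal_dim))
W_pol = [rng.normal(size=(obs_dim + signal_dim, n_items)) for _ in range(n_modules)]

page_state = rng.normal(size=state_dim)
signals = signal_network(page_state, W_sig, n_modules, signal_dim)
for m in range(n_modules):
    dist = module_policy(rng.normal(size=obs_dim), signals[m], W_pol[m])
    print(f"module {m}: item distribution {np.round(dist, 3)}, entropy {entropy(dist):.3f}")
```

Under these assumptions, the signal network and the per-module policies would be trained jointly (for example, with policy gradients) against the page-level reward plus the entropy bonus; at serving time each module only needs its own observation and its signal, so no inter-module communication is required.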



Published In

RecSys '20: Proceedings of the 14th ACM Conference on Recommender Systems
September 2020
796 pages
ISBN: 9781450375832
DOI: 10.1145/3383313

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tag

  1. Reinforcement learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

RecSys '20: Fourteenth ACM Conference on Recommender Systems
September 22 - 26, 2020
Virtual Event, Brazil

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 101
  • Downloads (Last 6 weeks): 35
Reflects downloads up to 16 Oct 2024


Citations

Cited By

  • (2024) Non-Stationary Transformer Architecture: A Versatile Framework for Recommendation Systems. Electronics 13:11 (2075). DOI: 10.3390/electronics13112075. Online publication date: 27-May-2024.
  • (2024) On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems. ACM Transactions on Information Systems 42:6 (1-26). DOI: 10.1145/3661996. Online publication date: 19-Aug-2024.
  • (2024) A Survey on Reinforcement Learning for Recommender Systems. IEEE Transactions on Neural Networks and Learning Systems 35:10 (13164-13184). DOI: 10.1109/TNNLS.2023.3280161. Online publication date: Oct-2024.
  • (2024) A Tutorial-Generating Method for Autonomous Online Learning. IEEE Transactions on Learning Technologies 17 (1558-1567). DOI: 10.1109/TLT.2024.3390593. Online publication date: 2024.
  • (2024) Towards Knowledge-Aware and Deep Reinforced Cross-Domain Recommendation Over Collaborative Knowledge Graph. IEEE Transactions on Knowledge and Data Engineering 36:11 (7171-7187). DOI: 10.1109/TKDE.2024.3391268. Online publication date: Nov-2024.
  • (2023) Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2179-2183). DOI: 10.1145/3539618.3592022. Online publication date: 19-Jul-2023.
  • (2023) A Bird's-eye View of Reranking: From List Level to Page Level. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining (1075-1083). DOI: 10.1145/3539597.3570399. Online publication date: 27-Feb-2023.
  • (2023) Leveraging Long Short-Term User Preference in Conversational Recommendation via Multi-agent Reinforcement Learning. IEEE Transactions on Knowledge and Data Engineering 35:11 (11541-11555). DOI: 10.1109/TKDE.2022.3225109. Online publication date: 1-Nov-2023.
  • (2023) Learning From Atypical Behavior: Temporary Interest Aware Recommendation Based on Reinforcement Learning. IEEE Transactions on Knowledge and Data Engineering 35:10 (9824-9835). DOI: 10.1109/TKDE.2022.3144292. Online publication date: 1-Oct-2023.
  • (2023) Online POI Recommendation: Learning Dynamic Geo-Human Interactions in Streams. IEEE Transactions on Big Data 9:3 (832-844). DOI: 10.1109/TBDATA.2022.3215134. Online publication date: 1-Jun-2023.
