So This Is How AlphaGo Works: A Detailed Guide to Multi-Agent Reinforcement Learning (Part 8)
4. Summary
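Multi-agent reinforcement learning (MARL) is an important research direction that combines reinforcement learning with multi-agent learning. It studies sequential decision-making by multiple agents. This article organizes the theory and algorithms of MARL by the type of relationship among the agents (fully cooperative, fully competitive, and mixed) and lists some related work on the application side. Going forward, MARL research still has many open problems at both the theoretical and the applied level, including completing and refining the theoretical framework, the reproducibility of methods, the training of model parameters and its computational cost, and the safety and robustness of models [15].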
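The three relationship types can be illustrated with a minimal sketch on two-agent matrix games. It assumes a common formalization that this section does not itself state: "fully cooperative" means both agents receive identical rewards, "fully competitive" means the rewards sum to zero, and everything else is "mixed". The function name `classify_game` and the example payoff matrices are hypothetical and chosen only for illustration.

```python
import numpy as np


def classify_game(r1, r2, tol=1e-9):
    """Classify a two-agent matrix game by how the agents' rewards relate.

    r1[i, j] and r2[i, j] are the rewards of agent 1 and agent 2 when
    agent 1 plays action i and agent 2 plays action j.
    Assumption: identical rewards = fully cooperative,
    zero-sum rewards = fully competitive, otherwise mixed.
    """
    r1 = np.asarray(r1, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    if r1.shape != r2.shape:
        raise ValueError("reward matrices must have the same shape")
    if np.allclose(r1, r2, atol=tol):
        return "fully cooperative"
    if np.allclose(r1 + r2, 0.0, atol=tol):
        return "fully competitive"
    return "mixed"


# Coordination game: both agents are rewarded for choosing the same action.
coordination = np.array([[1, 0], [0, 1]])

# Matching pennies: whatever agent 1 wins, agent 2 loses.
pennies = np.array([[1, -1], [-1, 1]])

# Prisoner's dilemma: rewards are neither identical nor zero-sum.
pd_agent1 = np.array([[-1, -3], [0, -2]])
pd_agent2 = pd_agent1.T

print(classify_game(coordination, coordination))
print(classify_game(pennies, -pennies))
print(classify_game(pd_agent1, pd_agent2))
```

In sequential settings the same distinction applies to the reward functions of a Markov game rather than to a single payoff matrix. The one-shot matrix case is used here only because it is the smallest example that shows all three categories.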
References:
[1] Sutton R S, Barto A G. Reinforcement learning: An introduction[M]. MIT press, 2018.
[2] Zhang K, Yang Z, Başar T. Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms[J]. 2019.
[3] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multi-agent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, Mar. 2008.
[4] Littman M L. Markov games as a framework for multi-agent reinforcement learning[C]. International Conference on Machine Learning, 1994: 157-163.
[5] Hu J, Wellman M P. Nash Q-learning for general-sum stochastic games[J]. Journal of Machine Learning Research, 2003, 4(Nov): 1039-1069.
[6] Caroline Claus and Craig Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pp. 746–752, 1998.
[7] S. Kapetanakis and D. Kudenko. Reinforcement learning of coordination in cooperative multi-agent systems. American Association for Artificial Intelligence, pp. 326-331, 2002.
[8] Yang Y, Luo R, Li M, et al. Mean Field Multi-Agent Reinforcement Learning[C]. International Conference on Machine Learning, 2018: 5567-5576.
[9] Lowe R, Wu Y, Tamar A, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments[C]. Neural Information Processing Systems, 2017: 6379-6390.
[10] Foerster J, Farquhar G, Afouras T, et al. Counterfactual Multi-Agent Policy Gradients[J]. arXiv: Artificial Intelligence, 2017.
[11] Sunehag P, Lever G, Gruslys A, et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning[J]. arXiv: Artificial Intelligence, 2017.
[12] Rashid T, Samvelyan M, De Witt C S, et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning[J]. arXiv: Learning, 2018.
[13] OpenAI Five, OpenAI, 2018.
[14] Vinyals, O., Babuschkin, I., Czarnecki, W.M. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
[15] P. Long, T. Fan, X. Liao, W. Liu, H. Zhang and J. Pan, "Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning," 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, 2018, pp. 6252-6259, doi: 10.1109/ICRA.2018.8461113.
[16] Y. F. Chen, M. Everett, M. Liu and J. P. How, "Socially aware motion planning with deep reinforcement learning," 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, 2017, pp. 1343-1350, doi: 10.1109/IROS.2017.8202312.
[17] Hernandez-Leal P, Kartal B, Taylor M E. A survey and critique of multiagent deep reinforcement learning[J]. Autonomous Agents & Multi Agent Systems, 2019(2).