Unified Mirror Descent: Towards a Big Unification of Decision Making

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: decision-making problems, reinforcement learning, mirror descent, zero-order optimization
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: This work presents the first attempt to investigate all types of decision-making problems under a single RL algorithmic framework.
Abstract: Decision-making problems, encompassing single-agent, cooperative multi-agent, competitive multi-agent, and mixed cooperative-competitive cases, are ubiquitous in real-world applications. Over the past several decades, substantial theoretical and algorithmic advances have been made in these fields. Nevertheless, these fields have largely evolved independently, raising a fundamental question: Can we develop a single algorithm that effectively tackles all of these scenarios? In this work, we explore this question by introducing a unified approach to all types of decision-making scenarios. First, we propose a unified mirror descent (UMD) algorithm that integrates multiple base policy update rules. Specifically, at each iteration, an agent's new policy is computed by weighting the base policies obtained from the different update rules. One advantage of UMD is that only minimal modifications are required to integrate a new policy update rule. Second, since the evaluation metric of the resulting policy is non-differentiable with respect to the weights of the base policies, we propose a simple yet effective zero-order method to optimize these weights. Finally, we conduct extensive experiments on 24 benchmark environments, showing that in over 87\% (21/24) of the games UMD performs better than or on par with the base policies, demonstrating its potential to serve as a unified approach for various decision-making problems. To our knowledge, this is the first attempt to comprehensively study all types of decision-making problems within a single algorithmic framework.
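The abstract describes two ingredients: combining base policies via learned weights, and tuning those weights with a zero-order method because the evaluation metric is non-differentiable. The sketch below is only a minimal illustration of that general idea under simplifying assumptions; the function names, the two-sided perturbation estimator, and the toy metric are our own stand-ins, not the paper's actual UMD procedure.

```python
import numpy as np

def combine_policies(base_policies, weights):
    """Convex combination of base policies (each row is a distribution over actions)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # project onto the simplex
    return np.tensordot(weights, base_policies, axes=1)

def zero_order_step(weights, evaluate, sigma=0.1, lr=0.05, n_samples=8, rng=None):
    """One zero-order step on a black-box metric: perturb the weights,
    score the perturbed mixtures, and move along the estimated ascent direction."""
    rng = rng or np.random.default_rng()
    grad_est = np.zeros_like(weights)
    for _ in range(n_samples):
        eps = rng.normal(size=weights.shape)
        score_plus = evaluate(weights + sigma * eps)
        score_minus = evaluate(weights - sigma * eps)
        grad_est += (score_plus - score_minus) / (2.0 * sigma) * eps
    grad_est /= n_samples
    return np.clip(weights + lr * grad_est, 1e-6, None)   # keep weights positive

# Toy usage (hypothetical): three base "policies" over four actions and a
# non-differentiable stand-in metric (rounded L1 distance to a target policy).
base = np.array([[0.70, 0.10, 0.10, 0.10],
                 [0.10, 0.70, 0.10, 0.10],
                 [0.25, 0.25, 0.25, 0.25]])
target = np.array([0.40, 0.40, 0.10, 0.10])

def evaluate(w):
    mixed = combine_policies(base, w)
    return -np.round(np.abs(mixed - target).sum(), 2)

w = np.ones(3)
for _ in range(200):
    w = zero_order_step(w, evaluate)
print(combine_policies(base, w))
```

In the actual algorithm the base policies would come from different policy update rules at each iteration and the metric would be the policy's performance in the environment; the zero-order estimator shown here simply stands in for any derivative-free weight optimizer.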
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5314