Unifying Diverse Decision-Making Scenarios with Learned Discrete Actions

Yazhe Niu; Yuan Pu; Yun Chen; Chunyu Xuan; Zhenjie Yang; Yu Liu; Hongsheng Li

19 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Reinforcement Learning, DeepRL, Representation Learning, Action Discretization

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: We propose GADM, a novel algorithm that learns unified and compact discrete latent actions for different RL environments; it is the first decoupling paradigm capable of both online and offline RL training.

Abstract: Designing effective action spaces for complex environments is a fundamental and challenging problem in reinforcement learning (RL). Although various action shaping and representation learning methods have been proposed to address specific action spaces and decision-making requirements (e.g., action constraints), these methods are typically customized to fixed scenarios and require extensive domain knowledge. In this paper, we introduce a general framework that can apply any common RL algorithm to a class of discrete latent actions learned from data. This framework unifies a wide range of action spaces, including those with continuous, hybrid, or constrained actions. Specifically, we propose a novel algorithm, General Action Discretization Model (GADM), that adaptively discretizes raw actions to construct unified and compact latent action spaces. GADM also predicts confidence scores for the latent actions, which help mitigate the instability of parallel optimization in online RL settings and serve as an implicit constraint in offline RL settings. Quantitative experiments and visualization results demonstrate that our proposed framework can match or outperform various approaches specifically designed for different environments.
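
Illustrative sketch (not from the submission): the abstract does not specify how GADM learns its latent actions or its confidence scores, so the Python below only illustrates the general idea of mapping raw continuous actions to a small discrete set and masking low-confidence choices. It swaps in plain k-means for the learned discretization and uses empirical code frequency as a stand-in for predicted confidence. All function names, the threshold, and the toy data are hypothetical.

```python
import math
import random


def nearest(codebook, action):
    """Return the index of the codeword closest to `action` (Euclidean)."""
    return min(range(len(codebook)), key=lambda k: math.dist(codebook[k], action))


def fit_codebook(actions, num_codes, iters=20, seed=0):
    """Fit codewords with Lloyd's k-means (a stand-in, NOT the GADM model)."""
    rng = random.Random(seed)
    codebook = [list(a) for a in rng.sample(actions, num_codes)]
    for _ in range(iters):
        buckets = [[] for _ in range(num_codes)]
        for a in actions:
            buckets[nearest(codebook, a)].append(a)
        for k, bucket in enumerate(buckets):
            if bucket:  # keep the old codeword if nothing was assigned to it
                codebook[k] = [sum(dim) / len(bucket) for dim in zip(*bucket)]
    return codebook


def confidence_scores(codebook, actions):
    """Assumed proxy for confidence: share of dataset actions mapped to each code."""
    counts = [0] * len(codebook)
    for a in actions:
        counts[nearest(codebook, a)] += 1
    return [c / len(actions) for c in counts]


def masked_argmax(q_values, confidence, threshold):
    """Pick the highest-valued latent action whose confidence passes the threshold."""
    allowed = [k for k, c in enumerate(confidence) if c >= threshold]
    if not allowed:  # fall back to every latent action if all are masked out
        allowed = list(range(len(q_values)))
    return max(allowed, key=lambda k: q_values[k])


# Toy usage: 2-D continuous actions drawn around two modes.
rng = random.Random(1)
data = [(rng.gauss(-1, 0.1), rng.gauss(0, 0.1)) for _ in range(50)]
data += [(rng.gauss(1, 0.1), rng.gauss(0, 0.1)) for _ in range(50)]
codebook = fit_codebook(data, num_codes=2)
conf = confidence_scores(codebook, data)
latent = masked_argmax([0.3, 0.7], conf, threshold=0.1)  # discrete choice
raw_action = codebook[latent]  # decode the latent action back to a raw action
```

Any discrete-action RL algorithm could in principle supply the `q_values` here. The masking step mirrors, in a simplified way, the abstract's description of confidence scores acting as an implicit constraint in offline settings.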

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 1787
