Keywords: reinforcement learning, discrete action space, continuous control, bipedal locomotion
TL;DR: A discrete reinforcement learning model achieves performance comparable to continuous models on continuous control tasks.
Abstract: Solving continuous reinforcement learning (RL) tasks typically requires models with continuous action spaces, as discrete models face challenges such as the curse of dimensionality. Inspired by discrete control signals in control systems, such as pulse-width modulation, we investigate whether RL models with discrete action spaces can match the performance of continuous models on continuous tasks. In this paper, we propose an RL model with a discrete action space, comprising a discrete actor that outputs action distributions and twin discrete critics that estimate value distributions. We also develop a training method and an exploration strategy for this model. The model solves BipedalWalkerHardcore-v3, a continuous robot control task in a complex environment, achieving a higher score than state-of-the-art baselines and comparable results across various other control tasks.
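To make the described actor-critic architecture concrete, below is a minimal PyTorch sketch of a discrete actor and twin discrete critics. The abstract does not specify the discretization scheme, so the per-dimension binning of actions, the bin and atom counts, and the layer sizes are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the architecture described in the abstract.
# Assumptions (not stated in the abstract): each continuous action
# dimension is discretized into `n_bins` bins, and the actor factorizes
# the action distribution per dimension to sidestep the curse of
# dimensionality; all sizes are illustrative.
import torch
import torch.nn as nn


class DiscreteActor(nn.Module):
    """Outputs an independent categorical distribution per action dimension."""

    def __init__(self, obs_dim, act_dim, n_bins=11, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim * n_bins),
        )
        self.act_dim, self.n_bins = act_dim, n_bins
        # Map sampled bin indices back to continuous actions in [-1, 1].
        self.register_buffer("bins", torch.linspace(-1.0, 1.0, n_bins))

    def forward(self, obs):
        logits = self.body(obs).view(-1, self.act_dim, self.n_bins)
        dist = torch.distributions.Categorical(logits=logits)
        idx = dist.sample()  # (batch, act_dim) bin indices
        return self.bins[idx], dist.log_prob(idx).sum(-1)


class TwinDiscreteCritics(nn.Module):
    """Two critics, each predicting a categorical value distribution
    over `n_atoms` fixed value supports, as in distributional RL."""

    def __init__(self, obs_dim, act_dim, n_atoms=51, hidden=256):
        super().__init__()

        def make_critic():
            return nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_atoms),
            )

        self.q1, self.q2 = make_critic(), make_critic()

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        # Each head returns a probability distribution over value atoms.
        return self.q1(x).softmax(-1), self.q2(x).softmax(-1)
```

In this sketch the actor's output space grows linearly in the number of action dimensions (act_dim × n_bins logits) rather than exponentially (n_bins^act_dim joint actions), which is one plausible way a discrete model could remain tractable on continuous control tasks.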
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10630