Action Dimension Coordination via Centralised Critics for Continuous Control

ICLR 2026 Conference Submission 19732 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Online Reinforcement Learning, Offline Reinforcement Learning
Abstract: Continuous control tasks with large action spaces often demand coordination across action dimensions. Recent work has shown that factorising the action space enables deep Q-learning to tackle high-dimensional continuous control problems by leveraging value decomposition methods adapted from multi-agent reinforcement learning (MARL). However, these approaches treat action dimensions independently, which can result in sub-optimal policies when coordination is required. To overcome this, we propose a general framework that adapts centralised training with decentralised execution (CTDE) to single-agent continuous control with factorised action spaces. Our key insight is to reinterpret action dimensions as cooperative "agents" and enable them to exchange information via a centralised critic during training, leading to coordinated policies that can be executed in a decentralised manner at test time. We instantiate this framework with two algorithms, DAC-AC and DAC-DDPG, and evaluate them on 13 DeepMind Control Suite tasks, demonstrating that incorporating centralised critics improves both sample efficiency and asymptotic performance on a wide range of tasks. Using these two algorithms, we further show that our framework seamlessly integrates with existing offline RL methods, achieving state-of-the-art performance across multiple benchmarks.
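The sketch below illustrates the general idea described in the abstract, not the authors' actual DAC-AC or DAC-DDPG implementation: each action dimension gets its own small actor ("agent") that conditions only on the observation, while a centralised critic conditions on the observation and the full joint action during training, so its gradients coordinate the per-dimension actors. All network sizes, names, and the DDPG-style actor loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DimensionActor(nn.Module):
    """Policy for a single action dimension (one cooperative 'agent')."""
    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),  # one scalar action in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

class CentralisedCritic(nn.Module):
    """Q(s, a_1, ..., a_D): sees the full joint action during training,
    so backpropagation couples all per-dimension actors."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, joint_action):
        return self.net(torch.cat([obs, joint_action], dim=-1))

# Hypothetical dimensions for illustration only.
obs_dim, act_dim = 24, 6
actors = nn.ModuleList(DimensionActor(obs_dim) for _ in range(act_dim))
critic = CentralisedCritic(obs_dim, act_dim)

obs = torch.randn(32, obs_dim)                               # batch of observations
joint_action = torch.cat([a(obs) for a in actors], dim=-1)   # decentralised actors form the joint action
actor_loss = -critic(obs, joint_action).mean()               # DDPG-style deterministic policy gradient
actor_loss.backward()                                        # centralised critic coordinates all dimensions
```

At test time only the per-dimension actors are needed, so execution remains decentralised exactly as in CTDE.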
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 19732