Sample-Efficient Reinforcement Learning with Action Chunking

Published: 12 Jun 2025 · Last Modified: 21 Jun 2025 · EXAIT@ICML 2025 Poster · CC BY 4.0
Track: Robotics
Keywords: Reinforcement learning, exploration, action chunking
TL;DR: We propose Q-LAC, a simple, effective offline-to-online RL method that uses action chunking to improve value propagation and exploration via temporally coherent actions.
Abstract: We present Q-learning with action chunking (Q-LAC), a simple yet strong actor-critic RL algorithm for offline-to-online RL. Our method addresses two common shortcomings of existing actor-critic RL methods in this setting: (1) slow 1-step temporal-difference (TD) backups, and (2) temporally incoherent actions for exploration. The former slows down value backup, leading to inefficient value learning and sample inefficiency. The latter reduces the quality of online data collection, compounding the sample inefficiency further. Our key idea is to use temporally extended actions: the policy predicts a sequence of actions for a fixed horizon and executes them one by one, open loop, and we run the RL update directly in this extended action space with a behavioral constraint. RL training in the temporally extended action space speeds up the TD backup by ``skipping'' over time steps, while the behavioral constraint and the open-loop execution ensure the temporal coherence of the actions. On a range of long-horizon, sparse-reward manipulation tasks, our method exhibits strong offline performance and online sample efficiency, outperforming prior methods that operate in the original action space as well as skill-based methods.
Serve As Reviewer: ~Qiyang_Li1, ~Zhiyuan_Zhou2
Submission Number: 22
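
The following is a minimal, illustrative sketch (not the authors' implementation) of the two ideas the abstract describes: a critic defined over action chunks so the TD backup skips h environment steps at a time, and an actor that predicts whole chunks under a behavior-cloning constraint to keep actions temporally coherent. All names (ChunkCritic, ChunkActor, chunked_td_loss, actor_loss, bc_weight) and network sizes are assumptions made for this sketch.

```python
# Hypothetical sketch of chunked Q-learning updates, assuming PyTorch.
import torch
import torch.nn as nn


class ChunkCritic(nn.Module):
    """Q(s, a_{t:t+h}): critic over a state and a flattened action chunk."""
    def __init__(self, obs_dim, act_dim, h, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim * h, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act_chunk):
        return self.net(torch.cat([obs, act_chunk.flatten(1)], dim=-1))


class ChunkActor(nn.Module):
    """pi(a_{t:t+h} | s): actor that outputs an entire action chunk."""
    def __init__(self, obs_dim, act_dim, h, hidden=256):
        super().__init__()
        self.h, self.act_dim = h, act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim * h), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs).view(-1, self.h, self.act_dim)


def chunked_td_loss(critic, target_critic, actor, batch, gamma, h):
    """TD backup over an h-step chunk, so each update skips h time steps.

    Assumed batch contents: obs (B, obs_dim), act_chunk (B, h, act_dim),
    rewards (B, h), next_obs (B, obs_dim) taken h steps later, done (B, 1).
    """
    rewards = batch["rewards"]
    discounts = gamma ** torch.arange(h, dtype=rewards.dtype, device=rewards.device)
    chunk_return = (rewards * discounts).sum(dim=-1, keepdim=True)
    with torch.no_grad():
        next_chunk = actor(batch["next_obs"])
        target_q = target_critic(batch["next_obs"], next_chunk)
        target = chunk_return + (gamma ** h) * (1.0 - batch["done"]) * target_q
    q = critic(batch["obs"], batch["act_chunk"])
    return ((q - target) ** 2).mean()


def actor_loss(critic, actor, batch, bc_weight=1.0):
    """Maximize Q of the predicted chunk, with a behavior-cloning term toward
    the dataset chunk acting as the behavioral constraint."""
    pred_chunk = actor(batch["obs"])
    q = critic(batch["obs"], pred_chunk)
    bc = ((pred_chunk - batch["act_chunk"]) ** 2).mean()
    return -q.mean() + bc_weight * bc
```

At execution time, the sketch would sample one chunk from the actor and play its h actions back open loop before re-querying the policy, which is what gives the temporally coherent exploration the abstract refers to.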