Keywords: Dexterous Grasping, Reinforcement Learning, Action Chunking, Manipulation
TL;DR: We propose Action Chunking Proximal Policy Optimization, the first action chunking reinforcement learning method designed for high-degree-of-freedom environments.
Abstract: Universal dexterous grasping across diverse objects is a crucial step towards human-like manipulation.
In order to handle the high degrees of freedom (DoF) of dexterous hands, state-of-the-art universal dexterous grasping methods adopt online reinforcement learning (RL) algorithms such as Proximal Policy Optimization (PPO) to learn action policies.
Although PPO is a common choice, its vanilla version often leads to insufficient exploration and slow policy improvement, requiring additional training augmentation to achieve high performance.
While action chunking is a promising strategy to improve exploration through temporally coherent actions, prior RL algorithms that integrate action chunking rely on Q-functions defined over entire action chunks, which scale poorly to the high-DoF action spaces of dexterous hands.
To address this, we reformulate the PPO objective over action chunks and use a standard state-value function as the critic, yielding \emph{Action Chunking Proximal Policy Optimization} (ACPPO).
ACPPO retains the simplicity of PPO while encouraging temporally coherent exploration and avoiding the curse of dimensionality.
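For intuition, a chunk-level objective of this kind plausibly takes the standard PPO clipped-surrogate form, with the probability ratio computed over a length-$k$ action chunk rather than a single action (the chunk notation and length $k$ here are illustrative assumptions, not the paper's exact formulation):
$$
L(\theta) = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta\!\left(a_{t:t+k-1}\mid s_t\right)}{\pi_{\theta_{\mathrm{old}}}\!\left(a_{t:t+k-1}\mid s_t\right)},
$$
where the advantage estimate $\hat{A}_t$ comes from a standard state-value critic $V_\phi(s_t)$, so no Q-function over the high-dimensional chunk is required.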
Evaluated on the DexGraspNet dataset, ACPPO outperforms all prior PPO-based methods, achieving a 95.4\% success rate with $2.3\times$ faster training and without any auxiliary learning mechanisms.
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 18970