Keywords: Dexterous Grasping, Reinforcement Learning, Action Chunking, Manipulation
TL;DR: We propose Action Chunking Proximal Policy Optimization, the first action chunking reinforcement learning method designed for high-degree-of-freedom environments.
Abstract: Universal dexterous grasping across diverse objects is a crucial step towards human-like manipulation.
In order to handle the high degrees of freedom (DoF) of dexterous hands, state-of-the-art universal dexterous grasping methods adopt online reinforcement learning (RL) algorithms such as Proximal Policy Optimization (PPO) to learn action policies.
Although PPO is a common choice, its vanilla version often leads to insufficient exploration and slow policy improvement, requiring additional training augmentation to achieve high performance.
While action chunking is a promising strategy to improve exploration through temporally coherent actions, prior RL algorithms that integrate action chunking rely on Q-functions defined over entire action chunks, which scale poorly to the high-DoF action spaces of dexterous hands.
To address this, we reformulate the PPO objective over action chunks and use a standard state-value function as the critic, yielding \emph{Action Chunking Proximal Policy Optimization} (ACPPO).
ACPPO retains the simplicity of PPO while encouraging temporally coherent exploration and avoiding the curse of dimensionality.
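For intuition, a chunk-level objective of this kind plausibly takes the standard PPO clipped-surrogate form, with the probability ratio computed over a length-$k$ action chunk rather than a single action (the chunk notation and length $k$ here are illustrative assumptions, not the paper's exact formulation):
$$
L(\theta) = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta\!\left(a_{t:t+k-1}\mid s_t\right)}{\pi_{\theta_{\mathrm{old}}}\!\left(a_{t:t+k-1}\mid s_t\right)},
$$
where the advantage estimate $\hat{A}_t$ comes from a standard state-value critic $V_\phi(s_t)$, so no Q-function over the high-dimensional chunk is required.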
Evaluated on the DexGraspNet dataset, ACPPO outperforms all prior PPO-based methods, achieving a 95.4\% success rate with $2.3\times$ faster training and without any auxiliary learning mechanisms.
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 18970