Keywords: Residual Learning, Robotic Assembly, Combining BC and RL
TL;DR: We combine a frozen, chunked behavior cloning (BC) model with a closed-loop residual policy trained via reinforcement learning (RL) to achieve precise and reactive manipulation, outperforming standard BC and RL methods on high-precision tasks.
Abstract: Recent advances in behavior cloning (BC), such as action chunking and diffusion, have led to impressive progress. However, imitation alone remains insufficient for tasks requiring reliable and precise movements, such as aligning and inserting objects. Our central insight is that chunked BC policies function as trajectory planners, enabling long-horizon tasks, but because they execute action chunks open-loop, they lack the fine-grained reactivity necessary for reliable execution. Further, we find that the performance of BC policies saturates despite increasing amounts of data. We present a simple yet effective method, ResiP (\textit{Resi}dual for \textit{P}recise Manipulation), that sidesteps these challenges by augmenting a frozen, chunked BC model with a fully closed-loop residual policy trained with reinforcement learning (RL). The residual policy is trained via on-policy RL, addressing distribution shift and reactivity without altering the BC trajectory planner. Evaluation on high-precision manipulation tasks demonstrates strong performance of ResiP over BC methods and direct RL fine-tuning.
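To make the residual formulation concrete, the following is a minimal sketch of how a frozen, chunked BC planner could be combined with a per-step residual correction at inference time. The class and method names (`bc_planner.predict_chunk`, `residual_policy.predict`) are hypothetical interfaces for illustration, not the paper's implementation.

```python
class ResidualPolicyWrapper:
    """Sketch: frozen chunked BC planner + closed-loop residual correction.
    Interfaces and shapes are illustrative assumptions, not the authors' API."""

    def __init__(self, bc_planner, residual_policy, chunk_size=16):
        self.bc_planner = bc_planner            # frozen; predicts an action chunk
        self.residual_policy = residual_policy  # trained with on-policy RL
        self.chunk_size = chunk_size
        self._chunk = None
        self._step_in_chunk = 0

    def act(self, obs):
        # Re-plan an open-loop chunk once the previous one is exhausted.
        if self._chunk is None or self._step_in_chunk >= self.chunk_size:
            self._chunk = self.bc_planner.predict_chunk(obs)  # (chunk_size, action_dim)
            self._step_in_chunk = 0

        base_action = self._chunk[self._step_in_chunk]
        self._step_in_chunk += 1

        # Closed-loop residual: conditioned on the current observation and the
        # base action, it outputs a small corrective delta at every control step,
        # while the BC planner itself stays frozen.
        delta = self.residual_policy.predict(obs, base_action)
        return base_action + delta
```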
Submission Number: 10