Keywords: Robotics, reinforcement learning, sim-to-real
Abstract: Imitation learning has been an effective tool for bootstrapping sequential decision-making behavior, showing surprisingly strong results as methods are scaled up to high-dimensional, dexterous problems in robotics. These "behavior cloning" methods have been further bolstered by the integration of generative modeling techniques such as diffusion modeling or flow matching for training expressive multimodal behavior policies. However, these pretrained models do not always generalize perfectly, and they require finetuning to maximize deployment-time performance. This finetuning procedure must retain the strengths of pretraining for exploration while quickly correcting for local inaccuracies in model performance. In this work, we propose an efficient reinforcement learning (RL) framework for fast adaptation of pretrained generative policies. Specifically, our proposed methodology, residual flow steering, instantiates an efficient RL technique that quickly adapts a pretrained flow-matching model by optimizing a policy that jointly selects a latent noise distribution and a residual action to steer it. Doing so allows policies to perform both local (residual actions) and global (latent noise) exploration, enabling data-efficient adaptation. We demonstrate that this technique is effective for dexterous manipulation problems, serving both as a tool to pretrain behaviors in simulation and to efficiently finetune them in the real world.
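To make the abstract's description concrete, here is a minimal PyTorch-style sketch of the steering idea as described: a frozen pretrained flow-matching policy whose initial latent noise and output action are both modulated by a small RL-trained head. The `flow_policy(obs, z)` interface, network sizes, and the residual scale are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class ResidualFlowSteering(nn.Module):
    """Sketch: steer a frozen flow-matching policy via latent noise + residual action."""

    def __init__(self, flow_policy: nn.Module, obs_dim: int, act_dim: int, latent_dim: int):
        super().__init__()
        self.flow_policy = flow_policy  # pretrained flow-matching policy, kept frozen
        for p in self.flow_policy.parameters():
            p.requires_grad_(False)
        # RL-trained head: predicts latent-noise mean/log-std (global exploration)
        # and a bounded residual action (local correction).
        self.steer = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * latent_dim + act_dim),
        )
        self.latent_dim = latent_dim

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        out = self.steer(obs)
        mu = out[..., : self.latent_dim]
        log_std = out[..., self.latent_dim : 2 * self.latent_dim]
        residual = 0.1 * torch.tanh(out[..., 2 * self.latent_dim :])  # small, bounded local correction
        z = mu + log_std.exp() * torch.randn_like(mu)  # reparameterized steered latent noise
        base_action = self.flow_policy(obs, z)  # integrate the frozen flow from the steered noise
        return base_action + residual
```

Only the steering head's parameters would be updated by the RL objective, which matches the abstract's framing of fast adaptation around a fixed pretrained model.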
Primary Area: applications to robotics, autonomy, planning
Submission Number: 22113