Keywords: self-improvement, generative models, visual planning
TL;DR: We present the Self-Adapting Improvement Loop (SAIL), which bootstraps a high-performance video model for solving novel robotic tasks from cheap, suboptimal data through self-improvement.
Abstract: Video generative models have recently been applied to robotic settings as visual planners. However, such visual planning models are typically trained on in-domain expert data, which can be expensive to collect. Recent work has shown that an in-domain video model trained on suboptimal data can instead be composed with a video model trained on internet-scale data to produce a performant video planner that generates high-quality trajectories during interaction with the environment. In this work, we investigate whether using these improved trajectories to update the in-domain model in a virtuous cycle can yield further downstream robotic task performance over multiple iterations. We present the Self-Adapting Improvement Loop (SAIL), in which an in-domain model initially trained only on suboptimal demonstration data is iteratively adapted to the trajectories it synthesizes when used as an adapted visual planner, without any reward annotation or heuristic data filtering. We apply SAIL to a large suite of MetaWorld tasks unseen during initial in-domain training and find that improvements continue to emerge over multiple iterations, demonstrating a way to iteratively bootstrap a high-performance video model for solving novel robotic tasks from cheap, suboptimal data through self-improvement.
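To make the loop described in the abstract concrete, below is a minimal sketch of one plausible SAIL-style iteration. All helper names here (`train_video_model`, `compose_planner`, `rollout`) are hypothetical placeholders, not the authors' actual implementation or API; the sketch only mirrors the structure stated above: train on suboptimal data, compose with an internet-scale model, collect rollouts, and re-adapt without rewards or filtering.

```python
# Hypothetical sketch of the SAIL loop; helper functions are
# illustrative placeholders, not the paper's actual code.

def sail(suboptimal_demos, internet_scale_model, env,
         num_iterations=5, num_rollouts=100):
    """Iteratively adapt an in-domain video model to its own
    planner-generated trajectories (no rewards, no filtering)."""
    # Initial training on cheap, suboptimal demonstrations only.
    in_domain_model = train_video_model(suboptimal_demos)

    for _ in range(num_iterations):
        # Compose the in-domain model with the internet-scale video
        # model to obtain an adapted visual planner.
        planner = compose_planner(in_domain_model, internet_scale_model)

        # Execute the planner in the environment; keep every
        # synthesized trajectory, with no reward annotation or
        # heuristic filtering.
        trajectories = [rollout(planner, env) for _ in range(num_rollouts)]

        # Adapt the in-domain model to its own improved rollouts,
        # closing the self-improvement loop.
        in_domain_model = train_video_model(trajectories,
                                            init=in_domain_model)

    return in_domain_model
```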
Submission Number: 43