TL;DR: A non-myopic method for zero-shot imitation from arbitrary offline data.
Abstract: Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent's immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy-matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate on complex, continuous benchmarks. The code is available at https://github.com/martius-lab/zilot.
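For a concrete picture of the occupancy-matching objective, the following is a minimal sketch, not the authors' implementation: it treats -V(s, g) from a goal-conditioned value function as the ground cost of an entropy-regularized optimal-transport problem between states imagined by a world model and the goals extracted from the demonstration. All names (`value_fn`, `rollout_states`, `demo_goals`) are illustrative placeholders, and the Sinkhorn solver is a generic textbook version.

```python
# Illustrative sketch (not the authors' code): score an imagined rollout by
# the optimal-transport distance between its state occupancy and the
# demonstration's goal occupancy, with a goal-conditioned value function
# supplying the ground cost.
import numpy as np

def sinkhorn(C: np.ndarray, eps: float = 0.1, n_iters: int = 200) -> np.ndarray:
    """Entropy-regularized OT plan between two uniform empirical measures."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):             # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan

def occupancy_matching_cost(rollout_states, demo_goals, value_fn, eps=0.1):
    """OT cost between an imagined rollout and a demonstration's goals.

    Assumptions (placeholders, not the paper's exact interfaces):
    - rollout_states: states predicted by a learned world model
    - demo_goals:     goals read off the single test-time demonstration
    - value_fn(s, g): goal-conditioned value; with per-step reward -1,
                      -V(s, g) estimates the steps from s to g, so it can
                      serve as an (asymmetric) ground distance.
    """
    C = np.array([[-value_fn(s, g) for g in demo_goals] for s in rollout_states])
    plan = sinkhorn(C, eps)
    return float((plan * C).sum())       # cost a planner would minimize
```

Under these assumptions, a planner would pick the candidate action sequence whose imagined rollout minimizes this cost. That is what makes the behavior non-myopic: every goal in the demonstration, not just the next one, influences the current action.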
Lay Summary: A task for a robot (or agent) can be specified as a list of goals it should achieve in order. Most current methods make the robot go after each goal one at a time. But this can make the robot short-sighted: reaching the next goal as fast as possible might sacrifice any chance of reaching future goals. For example, if a robot arm should move an object first to position A and then to B, a fast way to achieve A might be to throw the object there. But this action could also make the object roll out of the arm's reach after landing at position A, making it impossible to then move it to position B.
Our approach helps robots (or agents) look at the big picture. Instead of just chasing after the next goal, the robot naturally plans its actions with future goals in mind, so that it can successfully follow the entire goal sequence from start to finish—even if the goals are specified only roughly. We show that this helps simulated robots perform more complex tasks, which is a step towards more reliable and flexible robots in the real world.
Link To Code: https://github.com/martius-lab/zilot
Primary Area: Reinforcement Learning
Keywords: Imitation Learning, Deep Reinforcement Learning, Optimal Transport
Submission Number: 10028