Keywords: Long Horizon Task Execution, Goal Conditioned Policy Learning, Imitation Learning, Learning from Demonstration, Robot Manipulation
TL;DR: Learning to predict subgoal attainment aids imitation of long-horizon tasks.
Abstract: Imitation-based policy training for long-horizon manipulation tasks involving multi-step object interactions is often susceptible to compounding action errors. Contemporary methods discover semantic subgoals embedded within the overall task, decomposing it into tractable, shorter-horizon goal-conditioned policy learning problems. However, policy deployment requires iteratively estimating $\textit{which}$ subgoal is being pursued and $\textit{when}$ it is achieved. We observe that conventional $\textit{heuristic}$ approaches (ad hoc, threshold-based) are brittle, particularly for long-horizon imitation, since pursuing an incorrect subgoal can lead the robot policy into out-of-distribution states. In this work, we introduce two policy architectures for modeling subgoal transitions within a policy learning loop for long-horizon tasks. The first model autoregressively predicts the likelihood of the next subgoal transition, while the second uses cross-attention (via a transformer-based architecture) and implicitly models smooth, continuous transitions. We evaluate our models on $25$ simulated tasks in Franka Kitchen, $6$ real-world table-top tasks, and $18$ simulated tasks on a new corpus, Franka-Long Horizon Tasks (LHT), focused on tasks with rich object interactions over long episode lengths. Experimental results show significant improvements in learning efficacy, task success rates, and generalization to out-of-distribution settings, extending the horizon lengths over which manipulation tasks can be imitated $\textit{from long to long(er)}$.
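The sketch below is only an illustration of the general idea described in the abstract, not the authors' implementation: a cross-attention module that scores $\textit{which}$ candidate subgoal the current observation is pursuing and $\textit{when}$ a transition is likely. All dimensions, layer choices, and the `SubgoalTransitionScorer` name are assumptions.

```python
# Illustrative sketch (hypothetical, not the paper's architecture):
# cross-attention between the current observation (query) and candidate
# subgoal embeddings (keys/values) to estimate subgoal identity and
# transition likelihood.
import torch
import torch.nn as nn


class SubgoalTransitionScorer(nn.Module):
    def __init__(self, obs_dim: int, goal_dim: int, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, d_model)    # embed current observation
        self.goal_proj = nn.Linear(goal_dim, d_model)  # embed candidate subgoals
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.subgoal_head = nn.Linear(d_model, 1)      # logit per subgoal ("which")
        self.transition_head = nn.Sequential(          # attainment probability ("when")
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1), nn.Sigmoid()
        )

    def forward(self, obs: torch.Tensor, subgoals: torch.Tensor):
        """
        obs:      (B, obs_dim)      current observation features
        subgoals: (B, K, goal_dim)  K candidate subgoal embeddings
        returns:  (B, K) subgoal logits, (B,) transition probability
        """
        q = self.obs_proj(obs).unsqueeze(1)                   # (B, 1, d_model) query
        kv = self.goal_proj(subgoals)                         # (B, K, d_model) keys/values
        attended, _ = self.cross_attn(q, kv, kv)              # (B, 1, d_model)
        subgoal_logits = self.subgoal_head(kv).squeeze(-1)    # score each subgoal
        transition_prob = self.transition_head(attended.squeeze(1)).squeeze(-1)
        return subgoal_logits, transition_prob


if __name__ == "__main__":
    model = SubgoalTransitionScorer(obs_dim=39, goal_dim=32)
    obs = torch.randn(8, 39)
    subgoals = torch.randn(8, 5, 32)
    logits, p_transition = model(obs, subgoals)
    print(logits.shape, p_transition.shape)  # torch.Size([8, 5]) torch.Size([8])
```

At deployment, such a module could replace ad hoc distance thresholds: the policy switches to the next subgoal only when the predicted transition probability crosses a learned, rather than hand-tuned, decision boundary.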
Supplementary Material: zip
Spotlight: zip
Submission Number: 543