Keywords: Long-Horizon Planning, Large-Scale Settings, Imitation Learning
Abstract: Planning in large-scale settings with several objects and high-level tasks is a challenging problem. Unlike myopic planning, handling complex application domains requires an agent to reason about the salient aspects of the environment conditioned on the task specification. Recent approaches leverage dense object embeddings derived from commonsense knowledge bases, together with neural inference, to predict plans for a given goal specification and environment state. We present TANGO, a neural model that learns abstractions from human demonstrations in order to scale to domains with complex inter-object interactions. TANGO encodes the world state using a graph neural network with ConceptNet embeddings and applies goal-conditioned attention to decode symbolic actions to be executed by an embodied agent. A combination of teacher-forced and end-to-end training enables TANGO to outperform a state-of-the-art baseline in both small- and large-scale settings, increasing the goal-reaching rate by a factor of 2.2 to 5.8.