Abstract: Robotic agents that perform domestic chores from natural language directives must learn the complex task of navigating an environment and interacting with the objects in it. To address such composite tasks, we propose a hierarchical modular approach for learning agents that navigate and manipulate objects in a divide-and-conquer manner, suited to the diverse nature of the tasks involved. Specifically, our policy operates at three levels of hierarchy. First, a high-level policy composition controller (PCC) infers from the language instructions a sequence of subgoals to be executed. Second, a master policy discriminatively controls the agent's navigation by alternating between a navigation policy and various independent interaction policies. Finally, the appropriate interaction policy infers manipulation actions along with the corresponding object masks. Our hierarchical agent, named HACR (Hierarchical Approach for Compositional Reasoning), generates a short, human-interpretable sequence of sub-objectives, which leads to efficient interaction with the environment, and achieves state-of-the-art performance on the challenging ALFRED benchmark.
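To make the three-level control flow in the abstract concrete, the following is a minimal, hypothetical Python sketch. Every name in it (`policy_composition_controller`, `master_policy`, `INTERACTION_POLICIES`, the subgoal labels, and the `mask:` placeholders) is an illustrative assumption. Keyword rules and fixed action lists stand in for the learned components, so this only shows how the levels hand work to one another; it is not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representations, chosen only for illustration.
Subgoal = Tuple[str, str]           # (subgoal type, target object)
Action = Tuple[str, Optional[str]]  # (low-level action, placeholder mask id or None)


def policy_composition_controller(steps: List[str]) -> List[Subgoal]:
    """Level 1 stand-in for the PCC: map each instruction step to a subgoal.
    Toy keyword rules replace the learned model here."""
    verb_to_subgoal = {"go": "Goto", "pick": "PickUp", "put": "Put", "open": "Open"}
    subgoals = []
    for step in steps:
        words = step.lower().rstrip(".").split()
        # First word picks the subgoal type; last word is taken as the target object.
        subgoals.append((verb_to_subgoal.get(words[0], "Goto"), words[-1]))
    return subgoals


def navigation_policy(target: str) -> List[Action]:
    """Toy navigation policy: ignores the target and returns fixed moves without masks."""
    return [("RotateLeft", None), ("MoveAhead", None)]


def make_interaction_policy(action_name: str) -> Callable[[str], List[Action]]:
    """Level 3: one independent policy per interaction type; each emits an
    action paired with a placeholder mask for the target object."""
    def policy(target: str) -> List[Action]:
        return [(action_name, f"mask:{target}")]
    return policy


INTERACTION_POLICIES: Dict[str, Callable[[str], List[Action]]] = {
    "PickUp": make_interaction_policy("PickupObject"),
    "Put": make_interaction_policy("PutObject"),
    "Open": make_interaction_policy("OpenObject"),
}


def master_policy(subgoals: List[Subgoal]) -> List[Action]:
    """Level 2: alternate between navigation and the matching interaction policy."""
    actions: List[Action] = []
    for kind, target in subgoals:
        if kind == "Goto":
            actions.extend(navigation_policy(target))
        else:
            actions.extend(INTERACTION_POLICIES[kind](target))
    return actions


steps = ["Go to the counter", "Pick up the apple", "Go to the fridge", "Put it in the fridge."]
plan = policy_composition_controller(steps)
print(plan)
print(master_policy(plan))
```

In this sketch, keeping one small policy per interaction type behind a dictionary dispatch reflects the divide-and-conquer idea: supporting another interaction type only means registering one more entry in `INTERACTION_POLICIES`.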
One-sentence Summary: We present a hierarchical approach for interactive instruction following by compositional reasoning.
Supplementary Material: zip