Abstract: To successfully apply learning-based approaches to long-horizon sequential decision-making tasks, a human teacher must be able to specify the task in a way that provides appropriate guidance to the learner. The two most prominent policy learning paradigms, reinforcement learning (RL) and imitation learning, both require considerable human effort to specify a long-horizon task, either through dense reward engineering or through many demonstrations that follow an approach feasible for the learner. We propose the \textit{illustrated landmark graph (ILG)} as a form of task specification that exposes opportunities for the learner to customize its approach to its unique capabilities without the need for reward engineering, and allows the human teacher to intuitively provide intermediate guidance without the need for full-length demonstrations. Each source-to-sink path in the ILG represents a way to complete the task, and each vertex along a path represents an intermediate \textit{landmark}. To communicate the meaning of a landmark to the learner, the teacher provides \textit{illustrative observations} drawn from states within the landmark. We further propose ILG-Learn, an approach that interleaves planning over the ILG, policy learning, and active querying of the human teacher to guide the learner. Our experimental evaluation shows that ILG-Learn learns policies that successfully complete a block stacking task and a 2D navigation task, while approaches that receive specifications in the form of final goal observations (RCE) or full-length demonstrations (behavior cloning) fail. Additionally, we show that a multi-path ILG allows ILG-Learn to adapt to the capabilities of a learner with limited perception.
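As a concrete illustration of the structure the abstract describes, an ILG can be modeled as a directed graph whose vertices are landmarks (each carrying teacher-provided illustrative observations) and whose source-to-sink paths enumerate alternative ways to complete the task. The following is a minimal hedged sketch; all class, field, and landmark names here are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Landmark:
    """A landmark vertex; `observations` holds the teacher's
    illustrative observations drawn from states within it (assumed representation)."""
    name: str
    observations: list = field(default_factory=list)

@dataclass
class ILG:
    """Hypothetical illustrated landmark graph: a DAG from a source
    landmark to a sink (goal) landmark."""
    landmarks: dict  # name -> Landmark
    edges: dict      # name -> list of successor landmark names
    source: str
    sink: str

    def paths(self):
        """Enumerate every source-to-sink path; each path is one
        way to complete the task."""
        stack = [(self.source, [self.source])]
        found = []
        while stack:
            node, path = stack.pop()
            if node == self.sink:
                found.append(path)
            for nxt in self.edges.get(node, []):
                stack.append((nxt, path + [nxt]))
        return found

# A multi-path ILG with two alternative routes to the goal,
# mirroring how a learner could pick the route matching its capabilities.
g = ILG(
    landmarks={n: Landmark(n) for n in ["start", "pick", "push", "goal"]},
    edges={"start": ["pick", "push"], "pick": ["goal"], "push": ["goal"]},
    source="start",
    sink="goal",
)
assert sorted(g.paths()) == [["start", "pick", "goal"], ["start", "push", "goal"]]
```

A planner such as ILG-Learn could then score these candidate paths (e.g., by estimated subtask success) and query the teacher for illustrative observations along the chosen one.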
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Marlos_C._Machado1
Submission Number: 3715