Keywords: interactive task learning, language models, agents
TL;DR: We explore the Interactive Task Learning capabilities of small-to-medium-sized LLMs on compositional symbolic tasks.
Abstract: Large Language Models (LLMs) can perform tasks specified in natural language,
making them accessible to users regardless of technical background. However,
specifying tasks within a single, static prompt is often both difficult and suboptimal.
Interactive Task Learning (ITL)—a goal for autonomous agents—proposes
to address this challenge through multi-turn interactions: teachers provide a task
description and (optionally) a demonstration, agents attempt the task while asking
clarifying questions, and teachers offer feedback. Despite ITL’s promise, systematic
evaluation of LLMs’ interactive learning capabilities remains limited. We introduce
the ListOps Domain, a novel testbed for evaluating models’ ability to learn
compositional symbolic tasks through ITL. We evaluate small-to-medium-sized LLMs (4 to 32 billion parameters) and find that a limited form of teacher feedback, consisting only of reminders about broken rules rather than explicit identification or correction of errors, enhances generalization. Using this feedback, we compare models’ ITL and Few-Shot Learning (FSL) capabilities and find that ITL frequently outperforms FSL, especially for more powerful models. We conclude with a
discussion of limitations and recommendations for advancing ITL research.
Submission Number: 143
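For context, the abstract references ListOps, a benchmark of nested symbolic expressions. Below is a minimal sketch of an evaluator for ListOps-style expressions, assuming the standard MIN, MAX, MED, and SM (sum modulo 10) operators over single digits; the exact operator set and syntax of the paper's ListOps Domain are not specified in the abstract and may differ.

```python
# Minimal sketch of a ListOps-style evaluator (assumption: standard
# MIN / MAX / MED / SM operators over digits 0-9; the paper's
# ListOps Domain may use a different operator set or syntax).
from statistics import median


def evaluate(expr: str) -> int:
    """Evaluate a nested ListOps expression such as '[MAX 2 [MIN 3 4] 5]'."""
    tokens = expr.replace("[", " [ ").replace("]", " ] ").split()
    value, rest = _eval_tokens(tokens)
    assert not rest, "unconsumed tokens"
    return value


def _eval_tokens(tokens):
    head, *rest = tokens
    if head != "[":                     # bare operand: a single digit
        return int(head), rest
    op, *rest = rest                    # operator name follows '['
    args = []
    while rest[0] != "]":
        arg, rest = _eval_tokens(rest)  # recurse on nested sub-expressions
        args.append(arg)
    rest = rest[1:]                     # consume the closing ']'
    if op == "MIN":
        return min(args), rest
    if op == "MAX":
        return max(args), rest
    if op == "MED":
        return int(median(args)), rest
    if op == "SM":                      # sum modulo 10
        return sum(args) % 10, rest
    raise ValueError(f"unknown operator: {op}")


print(evaluate("[MAX 2 [MIN 3 4] 5]"))  # -> 5
print(evaluate("[SM 4 7 9]"))           # -> 0
```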