Abstract: Modern AI systems such as self-driving cars and game-playing agents achieve superhuman
performance, but they often lack human-like generalization, interpretability, and
interoperability with human users. This paper introduces *Policy Learning with a Language
Bottleneck* (PLLB), a framework enabling AI agents to generate linguistic rules that capture
the high-level strategies underlying rewarding behaviors. PLLB alternates between a *rule
generation* step guided by language models, and an *update* step where agents learn new
policies guided by rules. Crucially, PLLB enables this kind of language-guided learning
even when a natural language rule is insufficient to completely describe the target policy.
Across five diverse tasks, including a two-player signaling game, maze navigation, image
reconstruction, and robot grasp planning, we show that PLLB learns more interpretable
and generalizable behaviors than standard policy learning methods. In three additional
human subject studies, we show that the learned rules significantly improve human
task performance, enabling more effective human-AI coordination.
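The alternation the abstract describes can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: `generate_rule` stands in for the language-model-based rule-generation step, `update_policy` for the rule-guided policy update, and the toy "policy" is a lookup table for a signaling-style task.

```python
# Hypothetical sketch of the PLLB loop: alternate between generating a
# linguistic rule from rewarding episodes and updating the policy to
# follow that rule. All names and structures here are illustrative.

def generate_rule(episodes):
    # Stand-in for the LLM rule-generation step: summarize high-reward
    # episodes into a short linguistic rule. Here we simply pick the
    # action that appeared most often in rewarding episodes.
    counts = {}
    for state, action, reward in episodes:
        if reward > 0:
            counts[action] = counts.get(action, 0) + 1
    best = max(counts, key=counts.get)
    return f"prefer action {best}"

def update_policy(policy, rule):
    # Stand-in for the update step: bias the policy toward the rule.
    action = rule.rsplit(" ", 1)[-1]
    policy["preferred"] = action
    return policy

policy = {"preferred": None}
episodes = [("s0", "red", 1), ("s1", "red", 1), ("s2", "blue", 0)]
for _ in range(3):  # alternate the two PLLB steps
    rule = generate_rule(episodes)
    policy = update_policy(policy, rule)

print(rule)                 # prefer action red
print(policy["preferred"])  # red
```

Even in this toy form, the rule is a human-readable artifact: it can guide further agent learning and, as in the paper's human-subject studies, be shown directly to people.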
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Erin_J_Talvitie1
Submission Number: 6178