Training a Generally Curious Agent

Published: 08 Mar 2025 · Last Modified: 14 Apr 2025 · SSI-FM Poster · License: CC BY 4.0
Keywords: LLM Agent, Synthetic Data, Multi-turn Fine-tuning
TL;DR: A method for training on synthetic data to improve LLMs' sequential decision-making capabilities.
Abstract: Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information gathering. In this paper, we present **PAPRIKA**, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. By training on synthetic interaction data from different tasks that require diverse strategies, **PAPRIKA** teaches models to explore and adapt their behavior based on environment feedback in context, without gradient updates. Experimental results show that models fine-tuned with **PAPRIKA** can effectively transfer their learned decision-making capabilities to entirely unseen tasks without additional training. We also introduce a curriculum learning algorithm that improves **PAPRIKA**'s sample efficiency. These results suggest a promising path towards AI systems that can autonomously solve novel sequential decision-making problems that require interaction with the external world.
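The abstract mentions a curriculum learning component but does not specify its selection criterion or the task suite. Below is a minimal, self-contained sketch of one plausible reading: a sampler that prioritizes tasks by a "learning progress" heuristic (change in recent success rate), with an exploration floor so no task is starved. The task names, the `CurriculumSampler` class, and the progress heuristic are illustrative assumptions, not the paper's actual algorithm.

```python
import random
from collections import defaultdict

# Hypothetical task names; the abstract does not list PAPRIKA's actual task suite.
TASKS = ["twenty_questions", "wordle", "bandit_best_arm"]


class CurriculumSampler:
    """Toy curriculum: prefer tasks whose recent success rate is changing fastest
    (a common 'learning progress' heuristic; the paper's criterion may differ)."""

    def __init__(self, tasks, window=20, eps=0.05):
        self.tasks = tasks
        self.window = window
        self.eps = eps                    # exploration floor so every task keeps some probability
        self.history = defaultdict(list)  # task -> recent success flags (0/1)

    def update(self, task, success):
        h = self.history[task]
        h.append(int(success))
        if len(h) > self.window:
            h.pop(0)

    def _progress(self, task):
        h = self.history[task]
        if len(h) < 4:
            return 1.0                    # barely explored tasks get high priority
        half = len(h) // 2
        # Absolute change in success rate between the older and newer halves of the window.
        return abs(sum(h[half:]) / (len(h) - half) - sum(h[:half]) / half)

    def sample(self):
        scores = [self._progress(t) + self.eps for t in self.tasks]
        total = sum(scores)
        return random.choices(self.tasks, weights=[s / total for s in scores])[0]


if __name__ == "__main__":
    sampler = CurriculumSampler(TASKS)
    for step in range(200):
        task = sampler.sample()
        # Placeholder for rolling out a multi-turn interaction and scoring it;
        # a real pipeline would run the LLM against the task environment here.
        success = random.random() < 0.5
        sampler.update(task, success)
    print({t: round(sampler._progress(t), 3) for t in TASKS})
```

In a full pipeline, the placeholder rollout would be replaced by generating synthetic multi-turn trajectories on the sampled task and adding the successful (or preferred) ones to the fine-tuning dataset; the sampler then concentrates data collection on tasks where the model is currently improving.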
Submission Number: 55