Training a Generally Curious Agent

Fahim Tajwar; Yiding Jiang; Abitha Thankaraj; Sumaita Sadia Rahman; J Zico Kolter; Jeff Schneider; Ruslan Salakhutdinov

Published: 08 Mar 2025, Last Modified: 14 Apr 2025, Venue: SSI-FM Poster, License: CC BY 4.0

Keywords: LLM Agent, Synthetic Data, Multiturn finetuning

TL;DR: Method for training on synthetic data to improve LLMs' sequential decision making capabilities

Abstract: Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information gathering. In this paper, we present **PAPRIKA**, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. By training on synthetic interaction data from different tasks that require diverse strategies, **PAPRIKA** teaches models to explore and adapt their behavior in context, based on environment feedback and without gradient updates. Experimental results show that models fine-tuned with **PAPRIKA** can effectively transfer their learned decision-making capabilities to entirely unseen tasks without additional training. We also introduce a curriculum learning algorithm that improves **PAPRIKA**'s sample efficiency. These results suggest a promising path towards AI systems that can autonomously solve novel sequential decision-making problems that require interactions with the external world.

Submission Number: 55
