Keywords: LLM Agent, Reinforcement Learning, Active Learning
Abstract: Recent advances in Large Language Models (LLMs) have created new opportunities for their application in interactive environments. However, these agentic tasks present significant challenges due to the complexity of long and specialized interaction trajectories that are underrepresented in standard training distributions. While Reinforcement Learning (RL) post-training offers a promising approach to mitigate the need for extensive human-annotated data, it faces fundamental limitations in exploration efficiency when applied to LLMs. In this paper, we introduce a novel framework that synergistically combines RL post-training with Active Learning (AL) for LLM agents. By choosing informative tasks through a reward-based filter and diversity-based selection criteria, our approach enables models not only to refine their capabilities through autonomous exploration but also to strategically request expert demonstrations for challenging scenarios, thereby extending their exploration boundaries. We demonstrate the efficacy of this method on the AppWorld benchmark, showing significant performance improvements with minimal expert demonstrations. We further study how to adapt our framework to different annotation budgets and examine the factors that affect final performance. Our method highlights the potential of efficiently integrating human resources within RL pipelines to enhance LLM agents' capabilities in complex interactive environments.
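To make the selection step in the abstract concrete, below is a minimal sketch of what a reward-based filter followed by diversity-based selection could look like. All names, the reward threshold, the embedding-based representation of tasks, and the greedy farthest-point heuristic are assumptions for illustration only, not the paper's actual implementation.

```python
# Illustrative sketch (assumptions only): filter tasks the current policy still
# fails at, then pick a diverse subset of them to request expert demonstrations.
import numpy as np


def select_tasks_for_demonstration(task_embeddings, mean_rewards,
                                   reward_threshold=0.2, budget=8):
    """Pick task indices to send for expert demonstration.

    task_embeddings: (N, d) array of task feature vectors (assumed available).
    mean_rewards:    (N,) average rollout reward per task under the current policy.
    reward_threshold: tasks already solved above this reward are skipped.
    budget:          number of expert demonstrations we can afford.
    """
    # 1) Reward-based filter: keep only tasks the agent still struggles with.
    hard_idx = np.where(mean_rewards < reward_threshold)[0]
    if len(hard_idx) <= budget:
        return hard_idx.tolist()

    # 2) Diversity-based selection: greedy farthest-point sampling over the
    #    embeddings of the remaining hard tasks, so the requested demonstrations
    #    cover distinct regions of the task space.
    emb = task_embeddings[hard_idx]
    selected = [0]  # start from an arbitrary hard task
    dists = np.linalg.norm(emb - emb[0], axis=1)
    for _ in range(budget - 1):
        nxt = int(np.argmax(dists))  # hard task farthest from the selected set
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(emb - emb[nxt], axis=1))
    return hard_idx[selected].tolist()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(100, 16))  # stand-in task representations
    rewards = rng.uniform(size=100)          # stand-in per-task success rates
    print(select_tasks_for_demonstration(embeddings, rewards))
```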
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 23822