Iterative Improvements Based on Ground Truth: Building LLM Agents in the Era of Experience Inspired by Games AI

TMLR Paper4897 Authors

20 May 2025 (modified: 30 May 2025) · Under review for TMLR · CC BY 4.0
Abstract: LLM agents have attracted much attention recently. However, how to build successful LLM agents, especially with respect to autonomy and optimality, remains an open problem. We present a perspective paper, with a brief survey, on building LLM agents through iterative improvements based on ground truth in the era of experience, inspired by the successes of games AI. We propose AgentZero, Agent$\mu$, and Agent$\infty$, agent frameworks with perfect, learned, and no world models, following AlphaZero, MuZero, and a model-free method such as DQN, respectively. We propose leveraging domain knowledge for data collection, architecture design, and algorithm design, and we propose decision-time planning and meta reinforcement learning at both the pre- and post-training stages. We present case studies on building agents for games, maths, or coding, with approximate simulators, facts, and/or a human in the loop.
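The distinction between perfect, learned, and no world models can be made concrete with a toy example. The sketch below is illustrative only and is not the authors' AgentZero, Agent$\mu$, or Agent$\infty$: it assumes a hypothetical six-state chain environment, uses depth-limited exhaustive search as a stand-in for decision-time planning (rather than MCTS), and uses tabular Q-learning on a replayed batch as a stand-in for DQN. All names (`true_step`, `plan`, `learn_model`, `q_learning`, `collect`) are hypothetical.

```python
import random

N_STATES = 6          # hypothetical chain: states 0..5, state 5 is the terminal goal
ACTIONS = (-1, +1)    # move left or move right

def true_step(state, action):
    """Ground-truth simulator (the 'perfect world model'): returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def plan(step_fn, state, depth, gamma=0.9):
    """Decision-time planning: depth-limited exhaustive search over a given model.
    Returns (best_value, best_action)."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for a in ACTIONS:
        nxt, r, done = step_fn(state, a)
        value = r if done else r + gamma * plan(step_fn, nxt, depth - 1, gamma)[0]
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action

def collect(n, seed=0):
    """Gather random experience (s, a, s', r, done) from the true environment."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = rng.randrange(N_STATES - 1)       # sample a non-terminal start state
        a = rng.choice(ACTIONS)
        data.append((s, a) + true_step(s, a))
    return data

def learn_model(data):
    """'Learned world model': a lookup table fitted to observed transitions.
    Unseen (s, a) pairs are assumed to leave the state unchanged with zero reward."""
    table = {(s, a): (s2, r, d) for s, a, s2, r, d in data}
    def model_step(s, a):
        return table.get((s, a), (s, 0.0, False))
    return model_step

def q_learning(data, sweeps=50, alpha=0.5, gamma=0.9):
    """'No world model': tabular Q-learning over the replayed batch; returns a greedy policy."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s, a, s2, r, d in data:
            target = r if d else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
    return lambda s: max(ACTIONS, key=lambda a: q[(s, a)])

if __name__ == "__main__":
    data = collect(200)
    model_step = learn_model(data)
    policy = q_learning(data)
    for s in range(N_STATES - 1):
        print(s,
              plan(true_step, s, depth=5)[1],   # planning with the perfect model
              plan(model_step, s, depth=5)[1],  # planning with the learned model
              policy(s))                        # acting from learned values only
```

In this toy setting all three agents choose to move right toward the goal; the design point is where ground truth enters. The first agent queries the true simulator at decision time, the second plans with a model fitted from experience (and inherits its errors on unseen transitions), and the third never builds a model at all, distilling experience directly into action values.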
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Erin_J_Talvitie1
Submission Number: 4897