Iterative Improvements Based on Ground Truth: Building LLM Agents in the Era of Experience Inspired by Games AI

20 May 2025 (modified: 13 Aug 2025) · Rejected by TMLR · CC BY 4.0

Abstract: LLM agents have attracted much attention recently. However, how to build successful LLM agents, especially with respect to autonomy and optimality, remains an open problem. We present a perspective paper, with a brief survey, on building LLM agents through iterative improvements based on ground truth, in the era of experience, inspired by the successes of games AI. We propose AgentZero, Agent$\mu$, and Agent$\infty$: agent frameworks with perfect, learned, and no world models, following AlphaZero, MuZero, and a model-free method such as DQN, respectively. We propose leveraging domain knowledge for data collection, architecture design, and algorithm design, and we propose decision-time planning and meta reinforcement learning at both the pre- and post-training stages. We present case studies on building agents for games, maths, or coding, with approximate simulators, facts, and/or human-in-the-loop.

Submission Length: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Erin_J_Talvitie1

Submission Number: 4897
