Building Learning Context For Autonomous Agents Through Generative Optimization

ICLR 2026 Conference Submission 21387 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Agent Learning, Large Language Model Optimizer, Reinforcement Learning
TL;DR: We show how to construct context-specific learning graphs that allow LLM agents to learn desirable behaviors
Abstract: Building intelligent agents that learn involves designing systems that can evolve their behavior based on experience. While early approaches to large language model (LLM) agent learning relied mostly on structured memory and in-context learning, they often led to behavioral instability, poor interpretability, and difficulty of control. Recent successes in generative optimization, where an LLM is used as an optimizer, have shown the possibility of creating autonomous software agents. By separating the behavior logic (workflow) from how that logic is updated (optimizer), the agent designer can exert more control over the agent. In this work, we show the surprising fact that the agent learning problem is \textit{under-specified} under the generative optimization framework: if we want an agent to learn the right behavior, we must set up the right context to induce that behavior. We investigate three types of software engineering problems spanning data science, computer security, game playing, and question answering, and show that the original generative optimization framework learns robustly in only one of the three settings. To address this issue, we propose constructing a meta-graph through templates that supplies the right learning context to an LLM optimizer. With this addition, we demonstrate that defining the right learning context enables agents to discover behaviors aligned with the designer's objectives. In particular, we show the first known result of using generative optimizers to learn executable programs that play Atari games, where the resulting agents achieve performance comparable to deep reinforcement learning while requiring 50%–90% less training time.
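To make the workflow/optimizer separation the abstract describes concrete, below is a minimal, hypothetical sketch of a generative-optimization loop. All names here (Workflow, llm_propose_update, the toy squaring task, the context string) are illustrative assumptions, not the paper's actual API; in a real system, llm_propose_update would prompt an LLM with the current program, its score, and the learning context, then parse a revised program from the response.

```python
# Minimal sketch (assumed names, not the paper's API): the agent's behavior
# logic (the "workflow") is kept separate from the LLM-based optimizer that
# rewrites it, and a learning context tells the optimizer what "good" means.

from dataclasses import dataclass


@dataclass
class Workflow:
    """The agent's behavior logic: here, just an executable code string."""
    code: str


def evaluate(workflow: Workflow, inputs: list) -> float:
    """Score the workflow on a toy task: act(x) should return x squared."""
    env = {}
    exec(workflow.code, env)  # defines env["act"]
    preds = [env["act"](x) for x in inputs]
    return sum(p == x * x for p, x in zip(preds, inputs)) / len(inputs)


def llm_propose_update(workflow: Workflow, score: float, context: str) -> Workflow:
    """Stand-in for the LLM optimizer call. A real implementation would send
    the code, score, and context to an LLM; here we hard-code the fix so the
    sketch runs without an API key."""
    if score < 1.0:
        return Workflow(code="def act(x):\n    return x * x")
    return workflow


# Without this context, the learning problem is under-specified: the
# optimizer has no way to know which behavior the designer wants.
context = "Reward is 1 when act(x) returns the square of x."
wf = Workflow(code="def act(x):\n    return x + 1")  # initial (wrong) behavior

for step in range(3):
    score = evaluate(wf, inputs=[1, 2, 3])
    print(f"step {step}: score={score:.2f}")
    if score == 1.0:
        break
    wf = llm_propose_update(wf, score, context)
```

The design point of the sketch is the interface boundary: the optimizer only ever sees the workflow's code, its score, and the supplied context, so changing the context (rather than the optimizer) is what steers which behavior gets learned.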
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21387