TL;DR: This paper argues that general agents should be evaluated locally, and proposes a structural certification framework that converts local performance bounds into tight, conservative guarantees on the reliability of the agent’s internal world model.
Abstract: In the big-world regime, agents cannot be universally capable, and their ability is inevitably specialized across the pieces of a world model. Consequently, standard uniform guarantees fail to distinguish an agent's understanding of critical bottlenecks from irrelevant failures. We first formalize this limitation by proving that *general agents are not universal*, which renders standard worst-case analysis uninformative. To overcome this, we introduce **structural certification**, a transition-local framework that maps bounded goal-conditioned performance to entry-wise guarantees on the agent's internal world model. Our main contribution is constructive. We provide algorithms that filter specific transitions using deep compositional goals, and we prove that a general agent on these goals has a structural world model with an $\mathcal{O}(1/n)+\mathcal{O}(\delta)$ error bound. Conversely, this bound is tight in the small-$\delta$ regime, whose existence is explicitly guaranteed by our certification. These results enable the certifiable deployment of general agents by localizing the specific transitions where long-horizon planning is reliable.
Lay Summary: Modern AI agents are increasingly asked to complete long and complex tasks. However, an agent does not need to, and cannot, understand every part of the world perfectly. In many scenarios, success depends on a few critical steps, such as choosing the right item, entering the correct page, or opening a key door. This paper studies how to identify the parts of the world that an agent understands well enough to rely on. We first show that expecting one agent to be uniformly reliable on all possible goals is impossible in complex environments. We then propose a certification method that tests an agent on carefully designed tasks and uses its behavior to verify whether it has an accurate internal model of specific important transitions. For the certified transitions, we prove that the agent's implied predictions closely match the real dynamics. This provides a practical way to map where an agent can be trusted, supporting safe deployment and reliable long-horizon planning for general agents.
Primary Area: Theory->Reinforcement Learning and Planning
Keywords: Reinforcement Learning, World Models, General Agents
Submission Number: 34421