Environment Exploration as a Scaling Paradigm for Interactive Agents
Keywords: LLM Agents, Environment Exploration, Scaling
Abstract: Recent advances in Large Language Models have been driven largely by scaling model parameters, training data, and inference-time computation. Although these approaches have substantially improved the reasoning ability and overall performance of large models, interactive agents built on top of them face more fundamental challenges in complex environments. In particular, such agents must (1) generalize to unfamiliar environments that differ significantly from those encountered during training, (2) become fully aware of the complex structures, functionalities, and procedures embedded in those environments, and (3) adapt efficiently at inference time in order to deliver a reliable and responsive user experience. Thus, current scaling paradigms are not yet robust or generalizable enough to support interactive agents across the broad range of real-world applications and the effectively unbounded set of environments in which they must operate.
We propose Environment Exploration as a new scaling paradigm for interactive agents, in which agents undergo a pre-deployment exploration stage to automatically discover the contextual structures, interaction patterns, and task-relevant workflows embedded in an environment, without requiring concrete user queries or human supervision. Rather than adapting only after deployment, agents use this stage to proactively familiarize themselves with the environment, identify its functional organization, and accumulate procedural knowledge that would otherwise be costly to acquire at inference time. This process serves as a form of self-preparation: agents either condense the acquired experience into reusable skill sets or collect training data to further adapt themselves to the specific environment. By introducing this additional stage, Environment Exploration reduces human effort in domain-specific training, enables generalization across multiple heterogeneous environments, and ultimately supports more efficient, reliable, and robust task execution at inference time.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 178