Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Published: 28 Sept 2025, Last Modified: 09 Oct 2025 · SEA @ NeurIPS 2025 Oral · CC BY 4.0
Keywords: LLM Agent, Web Agent, Reasoning
TL;DR: We propose to scale the number of interaction steps for agents as a new axis of test-time scaling and develop a curriculum-based online RL algorithm for training agents to scale interaction.
Abstract: Test-time scaling in agentic tasks often relies on generating long reasoning traces ("think" more) before acting, but this does not allow agents to acquire new information from the environment or adapt behavior over time. In this work, we propose scaling test-time interaction, an untapped dimension for test-time scaling that increases the agent's interaction horizon to enable rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout. To demonstrate the promise of this scaling dimension, we situate our study in the domain of web agents. We first show that even prompting-based interaction scaling can improve task success on web benchmarks non-trivially. Building on this, we introduce TTI, a curriculum-based online reinforcement learning (RL) approach that trains agents by adaptively adjusting their interaction lengths during rollout. Using a Gemma 3 12B model, TTI sets a new state-of-the-art among open-source agents trained on public data on WebVoyager and WebArena.
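The abstract's core idea, adaptively growing the interaction budget during RL training, can be illustrated with a minimal sketch. This is not the paper's TTI algorithm; the schedule, function names, and parameters (`base_horizon`, `growth`, `threshold`) are all illustrative assumptions: the horizon budget is extended whenever the agent's recent success rate at the current budget crosses a threshold.

```python
def horizon_curriculum(success_rates, base_horizon=10, max_horizon=30,
                       growth=5, threshold=0.5):
    """Hypothetical curriculum over the interaction horizon.

    success_rates: per-training-phase success rates measured at the
    current horizon. Each time the agent clears `threshold`, the
    allowed number of interaction steps grows by `growth`, capped at
    `max_horizon` -- a stand-in for "adaptively adjusting interaction
    lengths during rollout".
    """
    horizon = base_horizon
    for rate in success_rates:
        if rate >= threshold and horizon < max_horizon:
            horizon += growth  # agent is ready: grant a longer rollout
    return min(horizon, max_horizon)


# Example: the agent succeeds in phases 1, 2, and 4, so the budget
# grows three times from 10 to 25 steps.
print(horizon_curriculum([0.6, 0.7, 0.2, 0.8]))
```

In a real online RL loop, the returned horizon would cap the number of environment steps per rollout, letting early training stay cheap while later training unlocks exploration and backtracking over longer trajectories.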
Archival Option: The authors of this submission do *not* want it to appear in the archival proceedings.
Submission Number: 10