Keywords: LLM-based agent, learn to explore, reinforcement learning
TL;DR: We propose a framework that enables agents to explore an environment's implicit rules at test time, and train a specialized thinker to improve their performance.
Abstract: With the continuous advancement of Large Language Models (LLMs), intelligent agents are becoming increasingly vital. However, these agents often fail in environments governed by implicit rules—hidden constraints that cannot be observed directly and must be inferred through interaction.
This causes agents to fall into repetitive trial-and-error loops, ultimately leading to task failure.
To address this challenge, we propose **Test-Time Exploration (TTExplore)**, a framework in which a thinker component analyzes interaction history to infer these implicit rules and guide an actor. Because training a thinker is difficult under sparse task rewards, we introduce a novel training pipeline for stable reinforcement learning that incorporates techniques such as task decomposition and difficulty filtering. Using this pipeline, we train a specialized 7B model, **Exp-Thinker**. Evaluated on five text-based embodied tasks, TTExplore with our trained Exp-Thinker significantly improves baseline agent scores by an average of $14$-$19$ points, demonstrating the effectiveness of explicitly reasoning about implicit rules.
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 9201