Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search

ACL ARR 2026 January Submission6085 Authors

05 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | License: CC BY 4.0

Keywords: Large Language Model, Search Agent, Reinforcement Learning, Hierarchical Experience Knowledge

Abstract: Reinforcement learning (RL) has become an effective approach for advancing the reasoning capabilities of large language models (LLMs) through the strategic integration of external search engines. However, current RL-based search agents often rely on a process of stochastic exploration guided by carefully crafted outcome rewards, leading to inefficient reasoning trajectories and unstable training. To address these issues, we propose a novel framework, Hierarchical Experience (HiExp), to enhance the performance and training stability of search agents. Specifically, we extract empirical knowledge through contrastive analysis and a multi-level clustering mechanism, transforming raw reasoning trajectories into hierarchical experience knowledge. By leveraging experience-aligned training, we effectively regularize stochastic exploration, evolving it into a strategic and experience-driven search process. Extensive evaluations on multiple complex agentic search and mathematical reasoning benchmarks demonstrate that our approach not only achieves substantial performance gains but also exhibits strong cross-task and cross-algorithm generalization.

Paper Type: Long

Research Area: AI/LLM Agents

Research Area Keywords: Generation, NLP Applications

Contribution Types: NLP engineering experiment, Position papers

Languages Studied: English

Submission Number: 6085
