MaskSearch: Towards Scalable Agentic Pre-Training for Search-Enhanced Reasoning

04 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Search Agent, Retrieval-Augmented Language Model, Reasoning, Agentic Model
TL;DR: We propose MaskSearch, a two-stage framework that first pre-trains search agents on a Retrieval-Augmented Mask Prediction (RAMP) task, then adapts them to downstream tasks like multi-hop QA.
Abstract: Retrieval-Augmented Large Language Models (LLMs) excel at knowledge-intensive tasks but struggle in complex scenarios due to passive retrieval. While search agents empower LLMs to actively use tools for reasoning, existing training-based methods remain constrained by task-specific data. We therefore propose MaskSearch, a two-stage training framework that bridges foundation models and search agents through a novel Retrieval-Augmented Mask Prediction (RAMP) task. In the pre-training stage, models learn to recover masked spans via multi-step search and reasoning, which endows them with foundational agentic capabilities that are further improved by post-training on downstream tasks. We train with either Supervised Fine-tuning (SFT) or Reinforcement Learning (RL). For SFT, we combine multi-agent trajectory synthesis with iterative self-evolution distillation to construct training data. For RL, we employ DAPO with a hybrid reward system consisting of answer and format rewards. Additionally, we introduce a curriculum learning strategy based on the number of masked spans. We evaluate the effectiveness of our framework on open-domain multi-hop question answering. Extensive experiments demonstrate that MaskSearch effectively equips LLMs with transferable agentic abilities, advancing the development of search agents.
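To make the RAMP task and the hybrid reward concrete, here is a minimal illustrative sketch (not the paper's released code): the mask-token format, span selection, and reward weighting below are all assumptions for illustration only.

```python
def make_ramp_example(passage: str, spans_to_mask: list[str]) -> dict:
    """Build a RAMP-style example: mask salient spans so the model must
    recover them via multi-step search and reasoning.
    The [mask_i] token format is a hypothetical choice."""
    masked = passage
    for i, span in enumerate(spans_to_mask):
        masked = masked.replace(span, f"[mask_{i}]")
    return {
        "input": masked,                  # masked passage shown to the model
        "targets": spans_to_mask,         # spans to recover via search
        "num_masks": len(spans_to_mask),  # usable for curriculum ordering
    }

def hybrid_reward(pred: str, gold: str, format_ok: bool) -> float:
    """Hybrid reward = answer reward + format reward.
    Exact-match scoring and the 0.1 weight are assumptions."""
    answer_r = 1.0 if pred.strip().lower() == gold.strip().lower() else 0.0
    format_r = 1.0 if format_ok else 0.0
    return answer_r + 0.1 * format_r

# Curriculum ordering: train on examples with fewer masks first.
examples = [
    make_ramp_example("MaskSearch is a two-stage framework.", ["two-stage"]),
    make_ramp_example("RAMP recovers masked spans via search and reasoning.",
                      ["masked spans", "reasoning"]),
]
curriculum = sorted(examples, key=lambda e: e["num_masks"])
```

Sorting by `num_masks` mirrors the curriculum strategy described in the abstract, where training difficulty scales with the number of masked spans per example.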
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2113