Meta-Researcher: Empowering Planning and Reflection Mechanisms in Large Reasoning Models for Advanced Deep Research Abilities
Keywords: Deep Research, Large Reasoning Models, Metacognitive Capabilities
TL;DR: We propose Meta-Researcher, an end-to-end reinforcement learning method that turns the uncontrollable task planning and reflection processes of LRMs into controllable ones, thereby enhancing their decision-making and metacognitive abilities.
Abstract: Deep research significantly reduces the time and cost of information gathering for researchers by collecting and integrating vast amounts of data. However, the planning and reflection phases of its reasoning are uncontrollable, which leads to errors or gaps in information collection and makes it difficult to ensure timely reflection that corrects and supplements the collected information. As a result, deep research performs suboptimally on complex tasks that require extensive data gathering. To address this limitation, we propose Meta-Researcher, an end-to-end reinforcement learning-based deep research method designed to equip large reasoning models (LRMs) and non-reasoning models with metacognitive capabilities for autonomously executing the research process of "Task Planning - Information Gathering - Process Reflection - Problem Solving", thereby effectively tackling complex problems that require multiple rounds of information collection and reasoning. First, our approach standardizes LRM outputs so that planning and reflection appear as explicit, controllable processes rather than remaining implicit within the reasoning, thus ensuring that LRMs demonstrate metacognitive abilities in practice. Second, we perform end-to-end optimization with Group Relative Policy Optimization (GRPO) to enhance the active decision-making capabilities of LRMs while strengthening the metacognitive process. Extensive experiments on two tasks, closed-ended question answering and open-ended topic research, demonstrate that Meta-Researcher significantly outperforms existing deep search methods, deep research methods, and proprietary systems. Our approach enhances the reliability and applicability of LRMs in complex task scenarios, offering a new paradigm for developing intelligent agents with autonomous research capabilities.
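For readers unfamiliar with GRPO, the group-relative idea it is named for can be sketched as follows. This is a minimal illustration of the general technique, not the authors' training code: the function name `group_relative_advantages`, the `eps` constant, the choice of population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout relative to the other rollouts sampled for the same prompt.

    Hypothetical helper: each reward is centered on the group mean and scaled
    by the group standard deviation, so no learned value function is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four rollouts of one question (made-up values).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above their group's average receive a positive advantage and those below receive a negative one; when every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.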
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 295