Keywords: Post-training, Information Control, Search-Augmented Reasoning, Reinforcement Learning
TL;DR: DeepControl is a framework for adaptive information control in search-augmented reasoning, using information utility to efficiently regulate retrieval.
Abstract: Search-augmented reasoning agents interleave multi-step reasoning with external information retrieval, but uncontrolled retrieval often leads to redundant evidence, context saturation, and unstable learning.
Existing approaches typically rely on outcome-based reinforcement learning (RL), where sparse, delayed rewards provide limited guidance for regulating when, how much, and at what granularity information should be acquired.
We propose DeepControl, a framework for adaptive information control grounded in a formal notion of information utility, which quantifies the state-dependent marginal value of retrieved evidence for ongoing reasoning.
Building on this utility, we introduce retrieval continuation and granularity control mechanisms that selectively decide whether retrieval should proceed and which parts of hierarchical information to expand.
An annealed control strategy further enables the agent to internalize effective information acquisition behaviors during training.
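The paper does not give the exact form of the utility function or the control rule here, but the idea of utility-gated retrieval continuation can be sketched as follows. This is a toy illustration, not the authors' implementation: `information_utility` below is a hypothetical token-novelty proxy for the marginal value of a snippet, and the fixed `threshold` stands in for the annealed control signal described above.

```python
# Toy sketch of utility-gated retrieval continuation (hypothetical,
# not DeepControl's actual formulation). Each candidate snippet is
# scored by its marginal novelty relative to already-gathered evidence;
# retrieval stops once that utility drops below a threshold.

def information_utility(prior_evidence, candidate):
    """Proxy for marginal utility: fraction of candidate tokens
    not already present in the accumulated evidence."""
    seen = set(" ".join(prior_evidence).split())
    tokens = candidate.split()
    if not tokens:
        return 0.0
    novel = sum(1 for t in tokens if t not in seen)
    return novel / len(tokens)

def retrieve_with_control(candidates, threshold=0.5):
    """Retrieval continuation: keep accepting snippets while each
    one's marginal utility exceeds the threshold, then stop."""
    evidence = []
    for snippet in candidates:
        if information_utility(evidence, snippet) < threshold:
            break
        evidence.append(snippet)
    return evidence
```

In the paper's annealed strategy, the gating would be relaxed over training so the policy internalizes when to stop retrieving; here the threshold is simply a constant for illustration.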
Extensive experiments across seven benchmarks demonstrate that our method consistently outperforms strong outcome-based RL baselines, as well as retrieval-free and retrieval-based reasoning methods that lack explicit information control.
In particular, compared with Search-R1, a strong outcome-based RL baseline, our approach improves average performance by +9.4 and +8.6 points on Qwen2.5-7B and Qwen2.5-3B, respectively.
Beyond performance, our analysis reveals how information utility evolves with retrieval depth and training scale, shedding light on efficiency–performance trade-offs in large-scale post-training for search-augmented reasoning agents.
An anonymous project repository is available at https://drive.google.com/drive/folders/1y6gbnpFdVPwBsqNEqaxdZKr0sC3JUTFV?usp=sharing
Submission Number: 26