SuperIntelligent Retrieval Agent: The Next Frontier of Information Retrieval

Published: 03 Mar 2026, Last Modified: 25 Apr 2026
Venue: ICLR 2026 Workshop MemAgents Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Retrieval Agent, Information Retrieval, Sparse Retrieval, Domain Pre-Training
TL;DR: We introduce SIRA, a retrieval-centric agent that tightly integrates LLM reasoning with fine-grained control over search constraints, enabling expert-level, efficient, and scalable knowledge retrieval far beyond traditional black-box search agents.
Abstract: Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most treat retrieval as a black box: they can rewrite queries and react to returned snippets, but they cannot directly steer retrieval (e.g., enforce constraints, weight keywords, or decompose queries) or exploit what is already likely known in the domain. In practice, these agents act like a domain newcomer: they rely on generic textbook knowledge, which leads to inefficient exploratory querying. Our approach instead behaves like an expert, first anticipating the likely answer and then using that expectation to plan and precisely control retrieval. We introduce SIRA (SuperIntelligent Retrieval Agent), a retrieval-centric agent pretrained on the target knowledge base and equipped with explicit retrieval control knobs such as constraint selection, keyword weighting, and query decomposition. SIRA follows a scalable two-stage framework that bridges retrieval and LLM inference. First, a domain-pretrained LLM produces an expected response, an expert-like sketch of what the correct answer should contain. Second, SIRA converts this expectation into a retrieval plan (keywords, constraints, and sub-queries) and executes controllable sparse retrieval that preserves fine-grained control throughout execution. This design avoids high-latency, memory-intensive vector search while tightly coupling retrieval decisions with next-token generation. Across BEIR and downstream question-answering benchmarks, SIRA consistently outperforms strong dense retrievers, SPLADE, and state-of-the-art agentic baselines, pointing to a practical path toward expert-level, controllable, and scalable retrieval-augmented agents.
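The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the expected response is a hard-coded string standing in for the output of a domain-pretrained LLM, the plan builder (`plan_from_expectation`), the `RetrievalPlan` structure, the stopword list, and the toy corpus are all hypothetical, and plain BM25 with per-keyword weights is assumed as the sparse retriever. Query decomposition is omitted for brevity.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stopword list, used only for this illustration.
STOPWORDS = {"the", "a", "is", "for", "which", "and", "of", "likely"}


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


@dataclass
class RetrievalPlan:
    # Hypothetical plan: weighted keywords plus hard "must include" constraints.
    keyword_weights: dict
    must_include: list = field(default_factory=list)


def plan_from_expectation(expected_response, boosts=None, must_include=None):
    """Stage 2a (assumed): turn an expected answer into weighted keywords."""
    counts = Counter(t for t in tokenize(expected_response) if t not in STOPWORDS)
    boosts = boosts or {}
    weights = {t: c * boosts.get(t, 1.0) for t, c in counts.items()}
    return RetrievalPlan(weights, list(must_include or []))


class WeightedBM25:
    """Stage 2b (assumed): BM25 scoring with per-keyword weights and term filters."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avg_len = sum(len(t) for t in self.tokens) / len(docs)
        self.df = Counter(term for tf in self.tf for term in tf)
        self.n, self.k1, self.b = len(docs), k1, b

    def idf(self, term):
        df = self.df[term]
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def search(self, plan, top_k=3):
        scored = []
        for i, tf in enumerate(self.tf):
            # Hard constraint: skip documents missing any required term.
            if any(term not in tf for term in plan.must_include):
                continue
            norm = self.k1 * (1 - self.b + self.b * len(self.tokens[i]) / self.avg_len)
            score = sum(
                w * self.idf(t) * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                for t, w in plan.keyword_weights.items()
                if t in tf
            )
            if score > 0:
                scored.append((i, score))
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:top_k]


# Toy corpus (illustrative only).
docs = [
    "Metformin is a first-line treatment for type 2 diabetes and lowers glucose.",
    "Insulin therapy is used for type 1 diabetes.",
    "Aspirin reduces the risk of heart attack.",
]

# Stage 1 stand-in: in SIRA this would come from the domain-pretrained LLM.
expected = "The first-line drug for type 2 diabetes is likely metformin, which lowers blood glucose."

plan = plan_from_expectation(expected, boosts={"metformin": 2.0}, must_include=["diabetes"])
results = WeightedBM25(docs).search(plan)
ranked_ids = [doc_id for doc_id, _ in results]
print(ranked_ids)
```

In this sketch the `must_include` constraint removes the aspirin document before scoring, and the boosted keyword pushes the metformin document to the top. Because scoring only touches term statistics, no vector index is needed, which is the property the abstract attributes to sparse retrieval.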
Submission Number: 93