When Enough is Enough: Rank-Aware Early Termination for Vector Search

Published: 19 Mar 2026, Last Modified: 20 May 2026, Venue: MLSys 2026, License: CC BY 4.0
Keywords: Vector Search, Approximate Nearest Neighbor, Retrieval-Augmented Generation
Abstract: Graph-based vector search underpins modern LLM applications such as retrieval-augmented generation (RAG), but its efficiency is increasingly constrained by disk I/O. Existing systems continue searching long after they have found the top-ranked results, which are the most valuable to downstream applications. We present Terminus, a rank-aware early termination mechanism that dynamically aligns I/O spending with application utility. Terminus models per-I/O search utility using a rank-weighted function and terminates once recent I/Os yield negligible utility gains. Because it terminates adaptively on rank-aware signals, Terminus improves recovery of the top-ranked results that matter most for downstream tasks and achieves a better performance–accuracy trade-off. It delivers up to 1.4× higher throughput than existing early termination schemes at the same accuracy target, and up to 3.2× higher throughput than a baseline without early termination, with minimal impact on RAG accuracy.
Topics: Agentic Systems: Systems optimizations for agentic AI applications
Submission Number: 9
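
The abstract describes the stopping rule only at a high level: score each I/O with a rank-weighted utility and stop once recent I/Os add almost nothing. The following is a minimal sketch of that idea, not Terminus's actual implementation. The class name `RankAwareTerminator`, the DCG-style logarithmic weight, the sliding window of recent I/Os, and the threshold `epsilon` are all assumptions made for illustration; the abstract does not specify any of them.

```python
import bisect
import math
from collections import deque


def rank_weight(rank: int) -> float:
    """Assumed DCG-style weight: rank 0 counts most, lower ranks count less."""
    return 1.0 / math.log2(rank + 2)


class RankAwareTerminator:
    """Hypothetical stopping rule based on rank-weighted gains per I/O."""

    def __init__(self, k: int, window: int, epsilon: float):
        self.k = k                          # size of the result list
        self.window = window                # number of recent I/Os inspected
        self.epsilon = epsilon              # "negligible gain" threshold
        self.top = []                       # current top-k distances, ascending
        self.gains = deque(maxlen=window)   # utility gain of each recent I/O

    def record_io(self, distances) -> bool:
        """Feed the candidate distances read by one I/O.

        Returns True when the search should terminate.
        """
        gain = 0.0
        for d in distances:
            rank = bisect.bisect_left(self.top, d)
            if rank < self.k:
                # The candidate enters the top-k at position `rank`.
                self.top.insert(rank, d)
                del self.top[self.k:]       # keep only the best k
                gain += rank_weight(rank)
        self.gains.append(gain)
        # Stop only after a full window of I/Os with negligible total gain.
        return len(self.gains) == self.window and sum(self.gains) <= self.epsilon
```

In a disk-based graph search, `record_io` would be called once per node read, and the traversal loop would break when it returns True. Under this weighting, a candidate that displaces the rank-0 result keeps the search alive far more than one that only enters at the tail of the list, which is one way to express the rank-aware intuition from the abstract.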