Keywords: multi-vector search, late interaction, vector databases, near-neighbor search
TL;DR: We present a novel extension of classical stability theory to the multi-vector setting and prove that the popular Chamfer distance metric preserves stability.
Abstract: Modern vector databases enable efficient retrieval over high-dimensional neural embeddings, powering applications from web search to retrieval-augmented generation. However, classical theory predicts such tasks should suffer from \emph{the curse of dimensionality}, where distances between points become nearly indistinguishable, thereby crippling efficient nearest-neighbor search. We revisit this paradox through the lens of \emph{stability}, the property that small perturbations to a query do not radically alter its nearest neighbors. Building on foundational results, we extend stability theory to multi-vector search, where we prove that the popular Chamfer distance metric preserves single-vector stability, while average pooling aggregation may destroy it. Across synthetic and real datasets, our experimental results match our theoretical predictions, offering concrete guidance for model and system design to circumvent the curse of dimensionality in multi-vector settings.
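The abstract contrasts two ways of scoring a multi-vector query against a multi-vector document: Chamfer-style matching (each query vector is paired with its closest document vector) versus average pooling (each set is collapsed to its mean before comparison). The sketch below, using Euclidean distance and NumPy, is an illustrative assumption about the exact formulation, not the paper's definition:

```python
import numpy as np

def chamfer_distance(Q, D):
    """One-sided Chamfer distance between vector sets Q and D:
    each row of Q is matched to its nearest row of D (Euclidean),
    and the nearest-match distances are summed."""
    # Pairwise distances via broadcasting: shape (|Q|, |D|)
    dists = np.linalg.norm(Q[:, None, :] - D[None, :, :], axis=-1)
    return dists.min(axis=1).sum()

def avg_pool_distance(Q, D):
    """Average-pooling baseline: collapse each set to its mean
    vector, then compare the two single vectors."""
    return np.linalg.norm(Q.mean(axis=0) - D.mean(axis=0))
```

Note that pooling can map two geometrically distinct sets to the same mean vector, which is one intuition for why it may destroy the per-vector stability that Chamfer matching preserves.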
Submission Number: 3