Abstract: Dense Passage Retrieval (DPR) typically relies on Euclidean or cosine distance to measure query-passage relevance in embedding space. These distances are effective when embeddings lie on a linear manifold, but our experiments across DPR benchmarks suggest that embeddings often lie on lower-dimensional, non-linear manifolds, especially in out-of-distribution (OOD) settings, where such distances fail to capture semantic similarity. To address this limitation, we propose a *manifold-aware* distance metric for DPR (**MA-DPR**) that models the intrinsic manifold structure of passages using a nearest-neighbor graph and measures the distance between a query and a passage by their shortest path in this graph. We show that MA-DPR outperforms Euclidean and cosine distance by up to 26% on OOD passage retrieval while maintaining performance on in-distribution data across various embedding models, with only a small increase in query inference time. Empirical evidence suggests that manifold-aware distance allows DPR to leverage context from related neighboring passages, making it effective even in the absence of direct semantic overlap. In addition, it can be extended to a wide range of dense embedding and DPR tasks, offering practical utility across diverse retrieval scenarios.
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: Dense Passage Retrieval, Manifold, Distance Metrics, Graph Search
Languages Studied: English
Submission Number: 5173
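
The graph-based distance described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`knn_graph`, `manifold_distances`), the Euclidean edge weights, the symmetric graph, the way the query is attached to its `k` nearest passages, and the toy U-shaped data are all assumptions made for illustration. The sketch builds a nearest-neighbor graph over passage embeddings and ranks passages by shortest-path length from the query, using Dijkstra's algorithm.

```python
import heapq
import numpy as np


def knn_graph(passages, k):
    """Symmetric k-nearest-neighbor graph with Euclidean edge weights (assumed)."""
    n = len(passages)
    diffs = passages[:, None, :] - passages[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        neighbors = [int(j) for j in np.argsort(dists[i]) if j != i][:k]
        for j in neighbors:
            w = float(dists[i, j])
            graph[i][j] = w
            graph[j][i] = w
    return graph


def manifold_distances(query, passages, graph, k):
    """Shortest-path distance from the query to every passage.

    Assumption: the query enters the graph through its k nearest passages.
    """
    q_dists = np.linalg.norm(passages - query, axis=1)
    dist = np.full(len(passages), np.inf)
    heap = []
    for j in np.argsort(q_dists)[:k]:
        j = int(j)
        dist[j] = float(q_dists[j])
        heapq.heappush(heap, (float(dist[j]), j))
    # Dijkstra's algorithm over the passage graph
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# Toy 2-D "embeddings" on a U-shaped curve: bottom arm, turn, top arm.
passages = np.array([
    [0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.5, 0.0], [2.0, 0.0],
    [2.35, 0.6],
    [2.0, 1.2], [1.5, 1.2], [1.0, 1.2], [0.5, 1.2], [0.0, 1.2],
])
query = np.array([0.0, 0.1])  # sits near the start of the bottom arm
k = 2

graph = knn_graph(passages, k)
euclid_rank = [int(i) for i in np.argsort(np.linalg.norm(passages - query, axis=1))]
manifold_rank = [int(i) for i in np.argsort(manifold_distances(query, passages, graph, k))]

print("Euclidean ranking:", euclid_rank)
print("Manifold ranking: ", manifold_rank)
```

In this toy setup, passage 10 lies at the far end of the curve but is close to the query in straight-line terms, so Euclidean distance ranks it fourth, whereas the shortest-path distance ranks it last. For a realistic corpus, an approximate nearest-neighbor index and a sparse-graph shortest-path routine would likely replace the dense distance matrix used here.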