Reranker Optimization via Geodesic Distances on k-NN Manifolds

Published: 10 May 2026, Last Modified: 10 May 2026. Accepted by TMLR. License: CC BY 4.0
Abstract: Current neural reranking approaches for retrieval-augmented generation (RAG) rely on cross-encoders or large language models (LLMs), requiring substantial computational resources and exhibiting latencies of 3–5 seconds per query. We propose Maniscope, a geometric reranking method that computes geodesic distances on k-nearest neighbor (k-NN) manifolds constructed over retrieved document candidates. This approach combines global cosine similarity with local manifold geometry to capture neighborhood coherence within the candidate set that global pairwise similarity alone cannot model. Evaluated on 15 BEIR benchmark datasets (~25,000 queries spanning scientific, biomedical, financial, web search, and fact-verification domains), Maniscope achieves 0.9806 average NDCG@10, ranking best on 13 of 15 datasets and outperforming HNSW (0.9673) and three established graph-diffusion baselines (0.7326–0.7630) at 13 ms average latency, 1.8× faster than HNSW (23.7 ms). The algorithm runs in O(ND + M²D + Mk log k) time with M ≪ N. Code and data are released as open source.
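The core idea in the abstract (blend global cosine similarity with geodesic distances on a k-NN graph over the top-M candidates, shortest paths computed from a cosine anchor) can be sketched as follows. This is a minimal illustrative implementation, not the paper's released code: the function name `geodesic_rerank`, the edge weighting (cosine distance), and the geodesic-to-similarity normalization are assumptions.

```python
import heapq
import numpy as np

def geodesic_rerank(query_emb, doc_embs, k=5, alpha=0.5):
    """Rerank M candidates by blending global cosine similarity with
    geodesic proximity on a k-NN graph over the candidate set.
    Illustrative sketch only, not the reference implementation."""
    # Normalize so dot products are cosine similarities.
    q = query_emb / np.linalg.norm(query_emb)
    D = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    M = D.shape[0]

    cos_q = D @ q                # global query-document similarity
    pair_dist = 1.0 - D @ D.T    # cosine distance as edge weight

    # Symmetric k-NN graph over the candidates.
    adj = [[] for _ in range(M)]
    for i in range(M):
        for j in np.argsort(pair_dist[i])[1:k + 1]:  # skip self at index 0
            adj[i].append((int(j), pair_dist[i, j]))
            adj[int(j)].append((i, pair_dist[i, j]))

    # Dijkstra from the cosine anchor (highest-cosine candidate).
    anchor = int(np.argmax(cos_q))
    geo = np.full(M, np.inf)
    geo[anchor] = 0.0
    heap = [(0.0, anchor)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > geo[u]:
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < geo[v]:
                geo[v] = nd
                heapq.heappush(heap, (nd, v))

    # Map geodesic distance to a similarity in [0, 1];
    # unreachable nodes fall back to pure cosine ranking.
    reach = np.isfinite(geo)
    geo_sim = np.zeros(M)
    if reach.any():
        geo_sim[reach] = 1.0 - geo[reach] / (geo[reach].max() + 1e-9)

    score = alpha * cos_q + (1.0 - alpha) * geo_sim
    return np.argsort(-score)    # best-first ordering of candidate indices
```

Note the blend: `alpha = 1` recovers plain cosine reranking, `alpha = 0` ranks purely by manifold proximity to the anchor.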
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:

All four reviewer concerns have been fully addressed and are reflected in the revised paper.

## 1. Positioning and Novelty

- Removed the false claim that geodesic distances on k-NN manifolds "haven't been applied to retrieval tasks." A dedicated **"Manifold-Based Retrieval"** subsection situates the contribution as a RAG-specific architecture with sub-15 ms latency engineering.
- Removed misused citations (Arora et al. 2018; Ethayarajh 2019). Section 1 rewritten around local neighborhood coherence.
- Added Discussion paragraph **"The performance gap is not incremental"**: graph-diffusion baselines score 0.7326–0.7630 average NDCG@10; Maniscope scores 0.9806, a gap of $+0.218$–$0.248$ (29–34% relative error reduction over the best prior method).
- Added Discussion paragraphs on: (a) graph-based reranking as an overlooked direction (a decade-long gap between Zhou 2003 and Dampanaboina 2025); (b) the technical motivation for Dijkstra from a cosine anchor vs. Laplacian diffusion; (c) practical deployability (no fine-tuning, no labeled data, no GPU).

## 2. Baselines

- **Three new graph-diffusion baselines** implemented and evaluated across all 15 datasets (Table 2): Manifold Ranking (Zhou et al., NIPS 2003), Diffusion-Aided RAG (Dampanaboina et al., 2025), and Donoser & Bischof PSP (CVPR 2013).
- Wang et al. (2012) added to Related Work.
- BGE-M3 corrected from "cross-encoder" to "multi-functional bi-encoder" throughout.
- HNSW is applied as a **reranker** over the same top-$M$ candidate set as Maniscope; all latency figures are pure reranking overhead, with first-stage retrieval excluded uniformly.

## 3. Dataset Coverage

- Evaluation expanded from **8 to 15 BEIR datasets** (~25,000 queries, 5 task types) plus the AorB disambiguation benchmark. Maniscope ranks best on **13 of 15 datasets**.

## 4. Metrics

- All experiments re-run with **NDCG@10, MRR@10, and P@10** using the `ir_measures` harness (the BEIR standard). Saturation artefacts at @3 resolved. Formal metric definitions added to Section 4.3.
- NFCorpus MRR bolding error corrected; annotations are now derived programmatically.
- The term "flat k-NN graph" removed; replaced with "single-layer k-NN graph."

## 5. Hyperparameter Sensitivity (Section 5.3, Table 4, Figure 1, Appendix Table 5)

- **$k$-sweep** ($k \in \{3,5,7,9,11,13,15\}$, $\alpha=0.5$): performance plateaus at $k=9$ and is stable across datasets.
- **$\alpha$-sweep** ($\alpha \in \{0.0, 0.25, 0.5, 0.75, 1.0\}$, $k=5$): $\alpha=0.5$ is robustly optimal; performance is stable within $\alpha \in [0, 0.5]$ (delta $<0.005$ NDCG@10).
- **$M$-sweep** ($M \in \{10, 50, 100, 200, 500\}$): latency scales sub-linearly, confirming the $O(kM\log M)$ bound (Appendix Table 5).
- **Figure 1**: NDCG@10 contour heatmap over $\alpha \times k$. The default ($k=5$, $\alpha=0.5$) lies in a robust high-performance region, not a narrow optimum.

## 6. Qualitative Analysis (Figure 2)

- Two-panel UMAP projection of top-$M$ candidates from NFCorpus. **Panel A** ("veggie chicken"): a tight semantic cluster; geodesic reranking exploits the density. **Panel B** ("Cholesterol and Lower Back Pain"): isolated relevant documents, illustrating the failure mode when the manifold assumption does not hold.

## 7. Limitations and Future Work (Section 6)

- Added an **"Anchor sensitivity"** paragraph: the failure mode is explained; $\alpha > 0$ partially mitigates it; a full ablation is committed as future work.
- Added a **"First-stage recall dependency"** paragraph: increasing $M$ from 10 to 100 costs only 144 ms (Appendix Table 5), vs. cross-encoders scaling as $O(M)$ forward passes.
- Added a note on embedding quality: the impact of higher-dimensional embeddings on manifold structure is an open question for future work.

## 8. Reproducibility

- Code and data released as open source: https://github.com/digital-duck/maniscope, installable via `pip install maniscope`.
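The $\alpha$-sweep described above interpolates between pure cosine scoring ($\alpha=1$) and a pure manifold term ($\alpha=0$) and evaluates each setting by NDCG@10. A toy sketch of how such a sweep might be run; the local `ndcg_at_10` helper, the per-document signal arrays, and the relevance judgments are all hypothetical stand-ins for the paper's `ir_measures`-based harness.

```python
import numpy as np

def ndcg_at_10(ranked_ids, rel):
    """NDCG@10 for one query; rel maps doc id -> graded relevance."""
    gains = [rel.get(d, 0) for d in ranked_ids[:10]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(rel.values(), reverse=True)[:10]
    idcg = sum(g / np.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def blended_scores(cos_sim, geo_sim, alpha):
    """Maniscope-style blend of global cosine and manifold similarity."""
    return alpha * cos_sim + (1.0 - alpha) * geo_sim

# Toy sweep: fixed per-document signals for a single query.
cos_sim = np.array([0.9, 0.8, 0.7, 0.6])   # global cosine similarities
geo_sim = np.array([0.2, 0.9, 0.8, 0.1])   # manifold-based similarities
rel = {1: 2, 2: 1}                          # doc ids 1 and 2 are relevant

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    order = np.argsort(-blended_scores(cos_sim, geo_sim, alpha)).tolist()
    print(f"alpha={alpha:.2f}  ranking={order}  "
          f"NDCG@10={ndcg_at_10(order, rel):.4f}")
```

In this toy example the manifold term promotes documents 1 and 2, so small $\alpha$ values rank the relevant documents first; the actual sweep in the paper aggregates NDCG@10 over all queries per dataset.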
Code: https://github.com/digital-duck/maniscope
Assigned Action Editor: ~Ankit_Singh_Rawat1
Submission Number: 7197