Keywords: Retrieval-Augmented Generation, Reranking, Preference Optimization, Direct Preference Optimization, Multi-round Learning, Continual Learning, Learning from Feedback, Large Language Models
Abstract: Retrieval-Augmented Generation (RAG) is essential for grounding large language models in external knowledge, where effective reranking is critical because irrelevant or noisy retrieved documents can substantially impair downstream reasoning. In realistic deployments, retrieval pipelines continuously evolve, producing new query–document candidates over time and naturally giving rise to a multi-round optimization problem for rerankers. However, effectively leveraging multi-round feedback is challenging: while an initial labeled dataset may be available, newly retrieved data in later rounds is typically unlabeled or lacks reliable human feedback, making sustained improvement difficult.
In this work, we study continuous reranker evolution under two realistic supervision settings and propose EVO-Reranker, a unified framework with a variant for each. Under the full-feedback setting, where preference supervision is available at each round, we propose ReplayDPO, which stabilizes multi-round preference optimization by replaying historical preferences, thereby mitigating catastrophic forgetting. Under the more practical no-feedback setting, where only initial supervision is provided, we introduce CautiousDPO, which constructs reliable preference signals from unlabeled data via confidence-aware filtering and multi-model consensus.
Extensive experiments on six benchmarks covering factual verification, multi-hop reasoning, and domain-specific retrieval demonstrate that ReplayDPO consistently improves stability and performance under full feedback, while CautiousDPO enables reliable self-evolution without expert supervision and substantially narrows the gap to full-feedback training. Together, these results show that EVO-Reranker supports continuous reranker evolution in both feedback-rich and feedback-limited scenarios.
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: re-ranking, preference optimization, continual learning
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2604