Keywords: Retrieval-Augmented Generation, Reranking, Preference Optimization, Direct Preference Optimization, Multi-round Learning, Continual Learning, Learning from Feedback, Large Language Models
Abstract: Retrieval-Augmented Generation (RAG) is essential for grounding large language models in external knowledge, where effective reranking is critical because irrelevant or noisy retrieved documents can substantially impair downstream reasoning. In realistic deployments, retrieval pipelines continuously evolve, producing new query–document candidates over time and naturally giving rise to a multi-round optimization problem for rerankers. However, effectively leveraging multi-round feedback is challenging: while an initial labeled dataset may be available, newly retrieved data in later rounds is typically unlabeled or lacks reliable human feedback, making sustained improvement difficult.
In this work, we study continuous reranker evolution under two realistic supervision settings and propose EVO-Reranker, a unified framework with a variant for each. Under the full-feedback setting, where preference supervision is available at each round, we propose ReplayDPO, which stabilizes multi-round preference optimization by replaying historical preferences, thereby mitigating catastrophic forgetting. Under the more practical no-feedback setting, where only initial supervision is provided, we introduce CautiousDPO, which constructs reliable preference signals from unlabeled data via confidence-aware filtering and multi-model consensus.
Extensive experiments on six benchmarks covering factual verification, multi-hop reasoning, and domain-specific retrieval demonstrate that ReplayDPO consistently improves stability and performance under full feedback, while CautiousDPO enables reliable self-evolution without expert supervision and substantially narrows the gap to full-feedback training. Together, these results show that EVO-Reranker supports continuous reranker evolution in both feedback-rich and feedback-limited scenarios.
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: re-ranking, preference optimization, continual learning
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2604