Keywords: Ranking tasks, LLM reasoning, Candidate elimination
TL;DR: IRanker is a reasoning-based ranking model that uses RL and iterative decoding to unify diverse ranking tasks, achieving state-of-the-art results and strong zero-shot generalization.
Abstract: Large language models (LLMs) have recently shown strong reasoning abilities in domains like mathematics, coding, and scientific problem-solving, yet their potential for ranking tasks, with prime examples including retrieval, recommender systems, and LLM routing, remains underexplored. Ranking requires complex reasoning across heterogeneous candidates, but existing LLM-based rankers are often domain-specific, tied to fixed backbones, and lack iterative refinement, limiting their ability to fully exploit LLMs’ reasoning potential. To address these challenges, we propose R1-Ranker, a reasoning-incentive framework built on reinforcement learning, with two complementary designs: DRanker, which generates the full ranking in one shot, and IRanker, which decomposes ranking into an iterative elimination process with step-wise rewards to encourage deeper reasoning. We evaluate unified R1-Rankers on nine datasets spanning recommendation, routing, and passage ranking, and show that IRanker-3B consistently achieves state-of-the-art performance, surpasses larger 7B models on some tasks, and yields a 15.7% average relative improvement. Ablation and generalization experiments further confirm the critical roles of reinforcement learning and iterative reasoning: IRanker-3B improves zero-shot performance by over 9% on out-of-domain tasks, and its reasoning traces boost other LLMs by up to 22.87%. These results demonstrate that unifying diverse ranking tasks with a single reasoning-driven foundation model is both effective and essential for advancing LLM reasoning in ranking scenarios.
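To make the iterative elimination idea concrete, below is a minimal sketch of elimination-based decoding. It is illustrative only: the `pick_worst` callable is a hypothetical stand-in for the LLM judgment at each step, and the paper's actual prompt format and step-wise RL rewards (which apply during training) are not modeled here.

```python
from typing import Callable, List

def iterative_elimination_rank(
    query: str,
    candidates: List[str],
    pick_worst: Callable[[str, List[str]], int],
) -> List[str]:
    """Rank candidates by repeatedly eliminating the least relevant one.

    `pick_worst` stands in for an LLM call that reasons over the remaining
    pool and returns the index of the candidate to drop (a hypothetical
    interface, not the authors' implementation).
    """
    pool = list(candidates)
    eliminated: List[str] = []
    while len(pool) > 1:
        idx = pick_worst(query, pool)     # one reasoning step per elimination
        eliminated.append(pool.pop(idx))  # remove the worst remaining candidate
    eliminated.append(pool.pop())         # last survivor is ranked first
    return eliminated[::-1]               # best-to-worst order

# Toy usage with keyword overlap as a stand-in for the LLM's judgment.
def toy_pick_worst(query: str, pool: List[str]) -> int:
    overlap = lambda c: len(set(query.lower().split()) & set(c.lower().split()))
    return min(range(len(pool)), key=lambda i: overlap(pool[i]))

print(iterative_elimination_rank(
    "fast gpu inference",
    ["slow cpu baseline", "gpu inference kernel", "fast gpu serving stack"],
    toy_pick_worst,
))
```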
Primary Area: foundation or frontier models, including LLMs
Submission Number: 20163