Abstract: Large language models (LLMs) have recently shown strong reasoning abilities in domains
like mathematics, coding, and scientific problem-solving, yet their potential for ranking tasks
such as retrieval, recommendation, and LLM routing remains
underexplored. Ranking requires complex reasoning across heterogeneous candidates, but
existing LLM-based rankers are often domain-specific, tied to fixed backbones, and lack
iterative refinement, limiting their ability to fully exploit LLMs’ reasoning potential. To
address these challenges, we propose ThinkRanker, a reasoning-incentive framework built on
reinforcement learning, with two complementary designs: DRanker, which generates full rankings
in one shot, and IRanker, which decomposes ranking into an iterative elimination process
with step-wise rewards to encourage deeper reasoning. We evaluate unified ThinkRankers
on nine datasets spanning recommendation, routing, and passage ranking, showing that
IRanker-3B consistently achieves state-of-the-art performance, surpasses larger 7B models on
some tasks, and yields a 15.7% average relative improvement. Ablation and generalization
experiments further confirm the critical role of reinforcement learning and iterative reasoning,
with IRanker-3B improving zero-shot performance by over 9% on out-of-domain tasks and
reasoning traces boosting other LLMs by up to 22.87%. These results demonstrate that
unifying diverse ranking tasks with a single reasoning-driven foundation model is both
effective and essential for advancing LLM reasoning in ranking scenarios.
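
The contrast between one-shot and iterative-elimination ranking can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `pick_worst` and `score` callables (stand-ins for an LLM call), and the choice to remove the least relevant candidate at each step are all assumptions made for illustration, and the step-wise reward and reinforcement learning components are not modeled.

```python
from typing import Callable, List, Sequence


def one_shot_rank(
    candidates: Sequence[str],
    score: Callable[[str], float],
) -> List[str]:
    """DRanker-style sketch: produce the full ranking in a single pass.

    `score` is a hypothetical stand-in for the model's relevance judgment.
    """
    return sorted(candidates, key=score, reverse=True)


def iterative_elimination_rank(
    candidates: Sequence[str],
    pick_worst: Callable[[List[str]], str],
) -> List[str]:
    """IRanker-style sketch: repeatedly remove one candidate from the pool.

    `pick_worst` is a hypothetical stand-in for one model step that names
    the least relevant candidate among those still in the pool (assumed).
    """
    pool = list(candidates)
    eliminated: List[str] = []
    while len(pool) > 1:
        worst = pick_worst(pool)
        if worst not in pool:
            raise ValueError(f"{worst!r} is not in the current pool")
        pool.remove(worst)
        eliminated.append(worst)
    # The survivor ranks first; earlier eliminations rank lower.
    return pool + eliminated[::-1]


# Toy usage with fixed relevance scores in place of a model.
toy_scores = {"doc_a": 0.9, "doc_b": 0.2, "doc_c": 0.5}
ranking_one_shot = one_shot_rank(list(toy_scores), toy_scores.get)
ranking_iterative = iterative_elimination_rank(
    list(toy_scores),
    lambda pool: min(pool, key=toy_scores.get),
)
```

With a fixed scoring function the two procedures agree. The iterative form differs in that each step is a smaller decision over a shrinking pool, which is where a per-step reward could be attached.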
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Han-Jia_Ye1
Submission Number: 9101