Abstract: Large Language Models (LLMs) have revolutionized artificial intelligence with capabilities in reasoning, coding, and communication, driving innovation across industries. Their true potential, however, depends on effective alignment to ensure correct, trustworthy, and ethical behavior, addressing challenges such as misinformation, hallucinations, bias, and misuse. While existing Reinforcement Learning (RL)-based alignment methods are notoriously complex, direct optimization approaches offer a simpler alternative.
In this work, we introduce a novel direct optimization approach for LLM alignment by drawing on established Information Retrieval (IR) principles. We present a systematic framework that bridges LLM alignment and IR methodologies, mapping LLM generation and reward models to IR's retriever-reranker paradigm. Building on this foundation, we propose LLM Alignment as Retriever Preference Optimization (LarPO), a new alignment method that enhances overall alignment quality. Extensive experiments validate LarPO's effectiveness, with averaged improvements of 38.9% and 13.7% on AlpacaEval2 and MixEval-Hard, respectively. Our work opens new avenues for advancing LLM alignment by integrating IR foundations, offering a promising direction for future research.
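To make the retriever-reranker analogy concrete, the minimal sketch below shows one plausible way a retriever-style contrastive objective could be applied to preference pairs: the prompt plays the role of the query, the chosen response acts as the positive document, and the rejected response as a negative, scored by the policy's length-normalized log-likelihood. This is an illustrative assumption about the general idea, not the paper's actual LarPO objective; the function names, length normalization, and temperature are hypothetical.

```python
# Illustrative sketch only (not the LarPO loss from the paper): an InfoNCE-style
# contrastive preference loss in which responses are scored like retrieved documents.
import torch


def sequence_logprob(logits: torch.Tensor, labels: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Length-normalized log-likelihood of each response, used as a 'relevance score'.

    logits: (batch, seq_len, vocab); labels, mask: (batch, seq_len)
    """
    mask = mask.float()
    logps = torch.log_softmax(logits, dim=-1)
    token_logps = logps.gather(-1, labels.unsqueeze(-1)).squeeze(-1)  # (batch, seq_len)
    return (token_logps * mask).sum(-1) / mask.sum(-1).clamp(min=1)


def contrastive_preference_loss(score_chosen: torch.Tensor,
                                score_rejected: torch.Tensor,
                                temperature: float = 0.05) -> torch.Tensor:
    """Treat the chosen response as the positive 'document' and the rejected one
    as a negative, then apply an InfoNCE-style contrastive loss over the scores."""
    scores = torch.stack([score_chosen, score_rejected], dim=-1) / temperature  # (batch, 2)
    return -torch.log_softmax(scores, dim=-1)[:, 0].mean()


if __name__ == "__main__":
    # Toy usage with random tensors in place of policy outputs on a preference pair.
    B, T, V = 2, 8, 100
    logits_chosen, logits_rejected = torch.randn(B, T, V), torch.randn(B, T, V)
    labels, mask = torch.randint(0, V, (B, T)), torch.ones(B, T)
    loss = contrastive_preference_loss(
        sequence_logprob(logits_chosen, labels, mask),
        sequence_logprob(logits_rejected, labels, mask),
    )
    print(loss.item())
```

In a retrieval setting, the same loss would typically be extended with additional in-batch negatives; how (or whether) LarPO does this is left to the full paper.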
Lay Summary: Large language models (LLMs) like ChatGPT have transformed how we interact with technology — they can reason, write code, and hold conversations. But ensuring these models behave reliably and ethically is a major challenge. Without proper safeguards, they can produce misinformation, show bias, or behave unpredictably.
Traditionally, aligning these models to behave well involves complicated training techniques. In our work, we introduce a much simpler and more effective way to align LLMs by borrowing ideas from how search engines work.
Just like a search engine finds and ranks relevant information, our method trains LLMs to “rank” better responses — rewarding the ones that are more helpful or accurate. We call this approach LarPO, short for LLM Alignment as Retriever Preference Optimization.
Our results show that this strategy significantly improves how well the models perform on challenging tasks. By connecting language model training with ideas from information retrieval, we offer a new and practical path toward safer and more trustworthy AI systems.
Primary Area: Deep Learning->Large Language Models
Keywords: information retrieval, LLM alignment
Submission Number: 3942