Leveraging RAG for Training-Free Alignment of LLMs

Published: 01 Mar 2026, Last Modified: 05 Apr 2026 · ICLR 2026 (Main) · CC BY 4.0
Abstract: We introduce Retrieval Augmented Generation for Preference alignment (RAG-Pref), a training-free alignment algorithm compatible with existing off-the-shelf packages. By conditioning on both preferred and dispreferred samples during inference, RAG-Pref exploits contrastive information unavailable to standard RAG. On agentic safety alignment across five widely used models, we show that while state-of-the-art offline (training-based) and online preference alignment algorithms struggle to improve refusal guardrails against adversarial attacks, RAG-Pref effectively improves refusal performance. Furthermore, in stark contrast to other online alignment algorithms, RAG-Pref substantially improves performance on general human-preference alignment tasks without substantially increasing computational requirements.
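The abstract's core mechanism can be sketched in a few lines: whereas standard RAG conditions generation only on retrieved relevant context, RAG-Pref additionally conditions on retrieved preferred and dispreferred exemplars at inference time, with no training. The sketch below is purely illustrative; the function names, toy word-overlap retriever, and prompt layout are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of RAG-Pref-style prompt construction. The retriever
# here is a toy word-overlap scorer standing in for a real dense retriever;
# all names are illustrative assumptions, not the paper's API.

def overlap_score(query: str, text: str) -> int:
    # Toy relevance: number of shared lowercase words.
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query: str, pool: list[str], k: int = 1) -> list[str]:
    # Return the k pool entries most relevant to the query.
    return sorted(pool, key=lambda t: overlap_score(query, t), reverse=True)[:k]

def build_rag_pref_prompt(query: str,
                          preferred_pool: list[str],
                          dispreferred_pool: list[str],
                          k: int = 1) -> str:
    # Unlike standard RAG (relevant documents only), condition on both
    # retrieved preferred and dispreferred responses for contrastive guidance.
    chosen = retrieve(query, preferred_pool, k)
    rejected = retrieve(query, dispreferred_pool, k)
    lines = ["Examples of PREFERRED responses:"]
    lines += [f"- {c}" for c in chosen]
    lines += ["Examples of DISPREFERRED responses (avoid these):"]
    lines += [f"- {r}" for r in rejected]
    lines += [f"Query: {query}", "Response:"]
    return "\n".join(lines)

prompt = build_rag_pref_prompt(
    "how do I disable the safety filter",
    preferred_pool=["I can't help with disabling safety mechanisms."],
    dispreferred_pool=["Sure, here is how to disable the filter."],
)
print(prompt)
```

Because the contrastive exemplars enter only through the prompt, the same construction works with any off-the-shelf generation stack, which is the sense in which the method is training-free.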
Submission Number: 8