Abstract: Large language models (LLMs) augmented with retrieval systems have significantly advanced natural language processing tasks by integrating external knowledge sources, enabling more accurate and contextually rich responses. To improve the robustness of such systems against noisy retrievals, Retrieval-Augmented Fine-Tuning (RAFT) has emerged as a widely adopted method. However, RAFT conditions models to generate answers even in the absence of reliable knowledge, which undermines their reliability in high-stakes domains where acknowledging uncertainty is critical. To address this issue, we propose Divide-Then-Align (DTA), a post-training approach designed to endow retrieval-augmented generation (RAG) systems with the ability to respond with "I don't know" when the query falls outside the knowledge boundaries of both the retrieved passages and the model's internal knowledge. DTA divides data samples into four knowledge quadrants and constructs tailored preference data for each quadrant, yielding a curated dataset for Direct Preference Optimization (DPO). Experimental results on three benchmark datasets demonstrate that DTA effectively balances accuracy with appropriate abstention, enhancing the reliability and trustworthiness of retrieval-augmented systems.