Enhancing RAG with Active Learning on Conversation Records: Reject Incapables and Answer Capables

18 Sept 2025 (modified: 22 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Large Language Models, Retrieval-Augmented Generation, Budget-friendly Annotation
TL;DR: Mining informative model conversation data for fine-tuning LLMs through active learning.
Abstract: Retrieval-augmented generation (RAG) is a key technique for leveraging external knowledge and enhancing the factual accuracy of large language models (LLMs). However, RAG still faces challenges in ensuring fully reliable responses in all scenarios. Addressing this requires identifying samples that tend to lead to unreliable outputs or that guide LLMs toward factually correct responses, which experts then annotate to build high-quality datasets for refining LLMs. Yet such datasets are increasingly scarce, making their creation challenging. This paper proposes using the vast amount of conversations generated by widespread LLM usage to build these datasets, with the goal of training LLMs to appropriately reject queries outside their capabilities while providing accurate responses to manageable ones. Since having experts annotate all conversation records is impractical, we introduce AL4RAG, a framework that uses active learning to select the most suitable conversation samples for annotation, thereby optimizing model performance within a limited annotation budget. Moreover, because traditional active learning methods rely on distance metrics ill-suited to RAG, we develop a novel sample distance measurement for RAG active learning. Extensive experiments show that our method consistently outperforms baselines across multiple metrics.
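The abstract's selection step (pick the most informative conversation records under a fixed annotation budget) can be sketched with a generic diversity-based active-learning strategy. The sketch below uses k-center greedy selection over Euclidean distance on embedding vectors; this is an illustrative stand-in, not AL4RAG's novel RAG-specific distance metric, and all names and data here are hypothetical.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_center_greedy(embeddings, budget):
    """Select `budget` indices so the chosen points cover the pool:
    each new pick is the point farthest from the current selection."""
    selected = [0]  # seed with the first sample
    min_dist = [euclidean(e, embeddings[0]) for e in embeddings]
    while len(selected) < budget:
        # Farthest-first: take the point with the largest distance
        # to its nearest already-selected point.
        nxt = max(range(len(embeddings)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], euclidean(e, embeddings[nxt]))
    return selected

# Toy conversation-record embeddings (2-D for illustration only)
pool = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (-4.0, 3.0)]
print(k_center_greedy(pool, 3))  # picks well-spread samples: [0, 3, 4]
```

The selected indices would then be sent to expert annotators; swapping `euclidean` for a RAG-aware distance (as the paper proposes) changes which samples are prioritized without altering the greedy loop.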
Supplementary Material: zip
Primary Area: generative models
Submission Number: 11961