Abstract: With the rapid development of large language models (LLMs), Retrieval-Augmented Generation (RAG), which incorporates external knowledge, has become a widely adopted approach for alleviating the knowledge bottlenecks of LLMs and mitigating hallucinations. However, the existing RAG paradigm inevitably suffers from flawed information introduced during retrieval, which diminishes the reliability and correctness of the generated outputs. In this paper, we propose Credibility-Aware Generation (CAG), a universally applicable framework designed to address the issue of flawed information in RAG. At its core, CAG aims to equip models with the ability to discern and process information according to its credibility. To this end, we propose a novel data transformation framework that generates data based on credibility, thereby effectively endowing models with CAG capability. To assess models' CAG capabilities, we construct a comprehensive benchmark covering three critical real-world scenarios. Experimental results demonstrate that our models can understand and utilize credibility, significantly outperform other retrieval-augmented models, and effectively resist the impact of noisy documents while maintaining robust performance.
Paper Type: long
Research Area: Question Answering
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English