CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity

CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity

ACL ARR 2024 December Submission1615 Authors

16 Dec 2024 (modified: 05 Feb 2025)ACL ARR 2024 December SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Abstract: Retrieval-Augmented Generation (RAG) aims to enhance large language models (LLMs) to generate more accurate and reliable answers with the help of the retrieved context from external knowledge source. Despite the advancements, the evaluation of these systems remains a crucial research area due to the following issues: (1) Limited data diversity of knowledge sources and query types; (2) Difficult to locate problems in the RAG pipeline; (3) Unable to evaluate retrieval performance effectively. To tackle these challenges, we propose a Comprehensive Full-chain Evaluation (CoFE-RAG) framework to facilitate thorough evaluation across the entire RAG pipeline, including chunking, retrieval, reranking, and generation. To effectively evaluate the first three phases, we introduce multi-granularity keywords, including coarse-grained and fine-grained keywords, to assess the retrieved context instead of relying on the annotation of golden chunks. Moreover, we release a holistic benchmark dataset tailored for diverse data scenarios covering a wide range of document formats and query types. We demonstrate the utility of the CoFE-RAG framework by conducting experiments to evaluate each stage of RAG systems.

Paper Type: Long

Research Area: Information Retrieval and Text Mining

Research Area Keywords: Retrieval-Augmented Generation

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis

Languages Studied: English, Chinese

Submission Number: 1615

Loading