RAG Approach Enhanced by Category Classification with BERT

Published: 11 Sept 2024, Last Modified: 11 Sept 2024, 2024 KDD Cup CRAG Workshop, CC BY-NC 4.0
Keywords: LLM, RAG, BERT
Abstract: We are honored to announce that our team secured first place in the False-Premise category of Task 3 in the Meta Comprehensive Retrieval-Augmented Generation (CRAG) Challenge, part of the KDD Cup 2024 [1]. This competition addresses the critical issue of hallucination in Large Language Models (LLMs) by leveraging Retrieval-Augmented Generation (RAG) systems. Despite advances in LLMs, their accuracy in answering questions about slow-changing and fast-changing facts remains below 15%, and even for stable facts the accuracy is below 35% for less popular entities [1]. The CRAG Benchmark evaluates RAG systems across five domains and eight question types, providing a rigorous framework for assessing their performance. The challenge comprises three tasks: Web-Based Retrieval Summarization, Knowledge Graph and Web Augmentation, and End-to-End RAG, each designed to progressively increase the complexity and capability required of RAG systems. Evaluation combines automated and human assessment, with a focus on response quality and conciseness. Participants are required to use Llama models [2] and to adhere to specific hardware and resource constraints. Our approach consists of three major components. First, we classified the attributes of questions using BERT; this allowed us to handle relatively difficult questions by responding with "IDK" (I don't know), avoiding hallucinated answers on questions the system is unlikely to get right. Second, we implemented filtering techniques that let us use the same architecture across all three tasks, which made experimentation efficient. Finally, after generating answers with the LLM, we added a refinement stage that revises the responses; this mechanism significantly reduced hallucinations in the LLM's answers. As a result, although our overall ranking across all tasks was not outstanding, we secured first place in the False-Premise category of Task 3.
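To make the first component concrete, below is a minimal sketch of how BERT-based question classification with "IDK" routing could look. This is not the team's released code: the checkpoint path, the label set (taken from CRAG's eight question types), the choice of "hard" categories, and the `rag_pipeline` callable are all illustrative assumptions.

```python
# Sketch: route questions through a fine-tuned BERT classifier; answer "IDK"
# for categories the RAG system handles poorly, otherwise run the full pipeline.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINT = "bert-base-uncased"  # assumption: replace with a checkpoint fine-tuned on CRAG question types
LABELS = ["simple", "simple_w_condition", "comparison", "aggregation",
          "multi-hop", "set", "post-processing", "false_premise"]  # CRAG's eight question types
HARD_LABELS = {"multi-hop", "post-processing"}  # assumption: categories routed to "IDK"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=len(LABELS))
model.eval()

def classify_question(question: str) -> str:
    """Predict the question's category with the BERT classifier."""
    inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

def answer(question: str, rag_pipeline) -> str:
    """Abstain with 'IDK' on hard categories; otherwise run the RAG pipeline."""
    if classify_question(question) in HARD_LABELS:
        return "IDK"  # abstaining avoids hallucination penalties on hard questions
    return rag_pipeline(question)  # hypothetical callable: retrieval + Llama generation + refinement
```

Under CRAG-style scoring, where a hallucinated answer is penalized more than an honest "IDK", this kind of selective abstention can raise the overall score even though it forgoes some answerable questions.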
Submission Number: 14
