Abstract: Recently, large language models (LLMs) and retrieval-augmented generation (RAG) have demonstrated remarkable performance on question answering (QA) tasks. However, whether RAG can replace traditional supervised methods based on knowledge bases (KBs) remains underexplored. The main difficulty is that, in existing multi-source knowledge retrieval datasets, the information in KBs and text is not equivalent and therefore cannot be directly compared. To bridge this gap, we propose a Trace-then-Synthesize framework that synthesizes the necessary knowledge from KBs into a text corpus. With this method, we construct a dataset whose KB and corpus contain equivalent information. Compared to existing datasets, ours addresses the weaknesses of RAG datasets, such as their small number of questions and black-box reasoning processes, while offering broader applicability than traditional complex QA datasets. Through extensive experiments, we demonstrate the strengths and limitations of existing QA methods and showcase the utility of this dataset for QA tasks.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: Retrieval Augmented Generation, Knowledge Base Question Answering, Large Language Model, Synthetic Data
Languages Studied: English
Submission Number: 7949