Ko-LongRAG: A Korean Long-Context RAG Benchmark Built with a Retrieval-Free Approach

ACL ARR 2025 May Submission6871 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: The rapid advancement of large language models (LLMs) has significantly enhanced long-context Retrieval-Augmented Generation (RAG), yet existing benchmarks focus primarily on English. This leaves low-resource languages without comprehensive evaluation frameworks, limiting their progress in retrieval-based tasks. To bridge this gap, we introduce Ko-LongRAG, the first Korean long-context RAG benchmark. Unlike conventional benchmarks that depend on external retrievers, Ko-LongRAG adopts a retrieval-free approach designed around Specialized Content Knowledge (SCK), enabling controlled, high-quality QA pair generation without extensive retrieval infrastructure. Our evaluation shows that the o1 model achieves the highest performance among proprietary models, while EXAONE 3.5 leads among open-source models. Further analyses confirm Ko-LongRAG as a reliable benchmark for assessing Korean long-context RAG capabilities and highlight its potential for advancing multilingual RAG research. The dataset and source code will be released publicly.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, NLP Applications, Question Answering
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: Korean
Submission Number: 6871