Abstract: The rapid advancement of large language models (LLMs) has significantly enhanced long-context Retrieval-Augmented Generation (RAG), yet existing benchmarks focus primarily on English.
This leaves low-resource languages without comprehensive evaluation frameworks and hinders their progress on retrieval-based tasks.
To bridge this gap, we introduce Ko-LongRAG, the first Korean long-context RAG benchmark.
Unlike conventional benchmarks that depend on external retrievers, Ko-LongRAG adopts a retrieval-free approach designed around Specialized Content Knowledge (SCK), enabling controlled, high-quality QA pair generation without requiring extensive retrieval infrastructure.
By clustering domain-specific documents and generating intra-cluster question-answer pairs, Ko-LongRAG effectively simulates retrieval-based reasoning while maintaining high contextual fidelity.
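As a rough illustration of how such retrieval-free construction could work, the sketch below clusters documents and builds long-context QA examples within each cluster. This is not the authors' pipeline: the function names, the TF-IDF + k-means clustering choice, and the caller-supplied generate_qa hook (a stand-in for an LLM prompting step) are all assumptions made for illustration.

```python
# Minimal sketch (not the released Ko-LongRAG code): cluster domain documents,
# then generate QA pairs grounded in each cluster's concatenated long context.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_documents(docs, n_clusters=5):
    """Group domain-specific documents into topical clusters (TF-IDF + k-means,
    one plausible choice; the paper does not specify this method)."""
    vectors = TfidfVectorizer().fit_transform(docs)
    labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(vectors)
    clusters = {}
    for doc, label in zip(docs, labels):
        clusters.setdefault(label, []).append(doc)
    return clusters

def build_long_context_qa(clusters, generate_qa):
    """For each cluster, concatenate its documents into one long context and
    generate intra-cluster QA pairs, so no external retriever is needed.
    `generate_qa` is a hypothetical callable, e.g. an LLM prompt that returns
    (question, answer) tuples grounded in the given context."""
    examples = []
    for docs in clusters.values():
        context = "\n\n".join(docs)  # long context assembled from one cluster
        for question, answer in generate_qa(context):
            examples.append({"context": context,
                             "question": question,
                             "answer": answer})
    return examples
```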
Our evaluation shows that the o1 model achieves the highest performance among proprietary models, while EXAONE 3.5 leads among open-source models.
Additional analyses confirm that Ko-LongRAG is a reliable benchmark for assessing Korean long-context RAG capabilities and highlight its potential to advance multilingual RAG research. The dataset and source code will be released publicly.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, NLP Applications, Question Answering
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: Korean
Submission Number: 6871