Abstract: The rapid advancement of large language models (LLMs) has significantly enhanced long-context Retrieval-Augmented Generation (RAG), yet existing benchmarks focus primarily on English.
This leaves low-resource languages without comprehensive evaluation frameworks and hinders their progress on retrieval-based tasks.
To bridge this gap, we introduce Ko-LongRAG, the first Korean long-context RAG benchmark.
Unlike conventional benchmarks that depend on external retrievers, Ko-LongRAG adopts a retrieval-free approach designed around Specialized Content Knowledge (SCK), enabling controlled, high-quality QA pair generation without requiring extensive retrieval infrastructure.
By clustering domain-specific documents and generating intra-cluster question-answer pairs, Ko-LongRAG effectively simulates retrieval-based reasoning while maintaining high contextual fidelity.
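As a rough illustration of how such retrieval-free construction could work, the sketch below clusters documents and builds long-context QA examples within each cluster. This is not the authors' pipeline: the function names, the TF-IDF + k-means clustering choice, and the caller-supplied generate_qa hook (a stand-in for an LLM prompting step) are all assumptions made for illustration.

```python
# Minimal sketch (not the released Ko-LongRAG code): cluster domain documents,
# then generate QA pairs grounded in each cluster's concatenated long context.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_documents(docs, n_clusters=5):
    """Group domain-specific documents into topical clusters (TF-IDF + k-means,
    one plausible choice; the paper does not specify this method)."""
    vectors = TfidfVectorizer().fit_transform(docs)
    labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(vectors)
    clusters = {}
    for doc, label in zip(docs, labels):
        clusters.setdefault(label, []).append(doc)
    return clusters

def build_long_context_qa(clusters, generate_qa):
    """For each cluster, concatenate its documents into one long context and
    generate intra-cluster QA pairs, so no external retriever is needed.
    `generate_qa` is a hypothetical callable, e.g. an LLM prompt that returns
    (question, answer) tuples grounded in the given context."""
    examples = []
    for docs in clusters.values():
        context = "\n\n".join(docs)  # long context assembled from one cluster
        for question, answer in generate_qa(context):
            examples.append({"context": context,
                             "question": question,
                             "answer": answer})
    return examples
```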
Our evaluation shows that the o1 model achieves the highest performance among proprietary models, while EXAONE 3.5 leads among open-source models.
Additional analyses confirm that Ko-LongRAG is a reliable benchmark for assessing Korean long-context RAG capabilities and highlight its potential to advance multilingual RAG research. The dataset and source code will be released publicly.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, NLP Applications, Question Answering
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: Korean
Submission Number: 6871