KRETA: A Benchmark for Korean Reading and Reasoning in Text-Rich VQA Attuned to Diverse Visual Contexts

ACL ARR 2025 May Submission 4462 Authors

20 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Understanding and reasoning over text within visual contexts poses a significant challenge for Vision-Language Models (VLMs), given the complexity and diversity of real-world scenarios. To address this challenge, text-rich Visual Question Answering (VQA) datasets and evaluation benchmarks have emerged for high-resource languages such as English. However, a critical gap remains: comprehensive, high-quality benchmarks are lacking for low-resource languages such as Korean, which hinders reliable model development and comparison. To bridge this gap, we introduce KRETA, a benchmark for Korean Reading and rEasoning in Text-rich VQA Attuned to diverse visual contexts. KRETA enables in-depth evaluation of both visual text understanding and reasoning capabilities, and supports multifaceted assessment across 15 domains and 26 image types. We also introduce a semi-automated VQA generation pipeline optimized for text-rich settings, which combines refined stepwise image decomposition with a rigorous seven-metric evaluation protocol to ensure data quality. We hope the pipeline can be adapted to other languages to accelerate multilingual VLM research. The code and dataset for KRETA are available at [anonymous.4open.science](https://anonymous.4open.science/r/KRETA-90D9/README.md).
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodality and Language Grounding to Vision, Robotics and Beyond, Multilinguality and Language Diversity, Resources and Evaluation
Contribution Types: Reproduction study, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources
Languages Studied: Korean
Submission Number: 4462