KoTextVQA: A Benchmark for Understanding and Reasoning in Korean Text-Rich Visual Question Answering
Abstract: In real-world scenarios, text in images conveys essential information, appearing in documents, everyday scenes, and digital displays.
Accurately interpreting text and its visual context is a key objective for Vision-Language Models (VLMs), driving advancements in text-rich Visual Question Answering (VQA) datasets and benchmarks. However, low-resource languages remain underexplored and lack appropriate benchmarks for real-world applications. Without such benchmarks as milestones, systematic evaluation becomes difficult, slowing iterative improvements in model performance and the refinement of fine-tuning strategies. To address this, we introduce KoTextVQA, a Korean text-rich VQA benchmark for comprehensive VLM evaluation. KoTextVQA enables multifaceted assessment across diverse image types and domains, while also supporting in-depth analysis through comparisons between visual understanding (System 1) and reasoning (System 2) abilities. Additionally, we release an automated VQA generation pipeline that leverages language models well trained on resource-rich languages to efficiently construct benchmarks, facilitating scalable and high-quality dataset creation. While our benchmark is designed for Korean, the proposed methodology is highly adaptable and can be extended to other languages, enabling broader multilingual VLM research.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodality and Language Grounding to Vision, Robotics and Beyond, Multilingualism and Cross-Lingual NLP, Question Answering
Contribution Types: Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: Korean, English
Submission Number: 4485