KMMMU: A Korean Massive Multi-discipline Multimodal Understanding Benchmark

ACL ARR 2026 January Submission10836 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · License: CC BY 4.0
Keywords: Korean benchmark, multimodal evaluation, expert-level reasoning
Abstract: The evaluation of expert-level multimodal capability remains constrained by a reliance on English-centric corpora and translated approximations that fail to capture the nuances of localized professional reasoning. To address this limitation, we introduce KMMMU which is a native Korean benchmark constructed exclusively from high-stakes professional and academic examinations. The dataset comprises 3,466 verified image-text pairs spanning diverse disciplines and task types. These questions necessitate the interpretation of information-dense visual inputs including technical diagrams and administrative tables where visual evidence is indispensable for the solution. While scaling and inference-time compute improve logical reasoning, our results suggest that they do not fully mitigate limitations in structural perception or cultural grounding. The observed disparity between processing text-rich documents and abstract diagrams points to ongoing challenges in structural visual reasoning. We hope that KMMMU serves as a useful diagnostic resource to address these fine-grained visual and institutional blind spots in future multimodal systems.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, NLP datasets, evaluation methodologies
Contribution Types: Model analysis & interpretability
Languages Studied: Korean
Submission Number: 10836