POLYCHARTQA: Benchmarking Large Vision-Language Models with Multilingual Chart Question Answering

ACL ARR 2025 July Submission1063 Authors

29 Jul 2025 (modified: 04 Sept 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Charts are a universally adopted medium for data communication, yet existing chart understanding benchmarks are overwhelmingly English-centric, limiting their accessibility and relevance to global audiences. To address this limitation, we introduce $\textbf{PolyChartQA}$, the first large-scale multilingual benchmark for chart question answering, comprising 22,606 charts and 26,151 QA pairs across 10 diverse languages. PolyChartQA is built via a novel pipeline that enables scalable multilingual chart generation through data translation and code reuse, incorporating state-of-the-art LLM-based translation and rigorous quality control. Using PolyChartQA, we systematically evaluate state-of-the-art LVLMs on multilingual chart understanding and reveal a significant performance gap between English and other languages, particularly low-resource ones, exposing a critical shortcoming of current models. To mitigate this gap, we introduce a large-scale training dataset, $\textbf{PolyChart-Instruct}$, containing 751,363 multilingual chart QA pairs across 131,515 chart images. Fine-tuning Qwen2.5-VL with PolyChart-Instruct yields average performance gains of up to 14 points. Together, our benchmark and datasets provide a foundation for developing globally inclusive vision-language models capable of understanding charts across diverse linguistic contexts.
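To make the "data translation and code reuse" idea in the abstract concrete, below is a minimal, hypothetical Python sketch (not the authors' implementation): only the textual fields of a chart's underlying data are translated, while the plotting code and numeric values are reused unchanged to render the same chart in another language. The `translate` stub, the `render_chart` helper, and the demo translation table are illustrative assumptions standing in for the LLM-based translation and quality-control steps described in the paper.

```python
# Hypothetical sketch of multilingual chart generation via data translation + code reuse.
import matplotlib.pyplot as plt

def translate(texts, target_lang):
    """Placeholder for LLM-based translation with quality control (assumption)."""
    demo = {"zh": {"Monthly Sales": "月度销售额", "Units": "数量",
                   "Jan": "一月", "Feb": "二月", "Mar": "三月"}}
    table = demo.get(target_lang, {})
    return [table.get(t, t) for t in texts]

def render_chart(title, ylabel, categories, values, out_path):
    """Shared plotting code, reused unchanged for every language."""
    fig, ax = plt.subplots()
    ax.bar(categories, values)
    ax.set_title(title)
    ax.set_ylabel(ylabel)
    fig.savefig(out_path)
    plt.close(fig)

data = {"title": "Monthly Sales", "ylabel": "Units",
        "categories": ["Jan", "Feb", "Mar"], "values": [120, 95, 140]}

# English original.
render_chart(data["title"], data["ylabel"], data["categories"],
             data["values"], "chart_en.png")

# Chinese version: translate text fields only; numbers and plotting code stay fixed.
title_zh, ylabel_zh = translate([data["title"], data["ylabel"]], "zh")
cats_zh = translate(data["categories"], "zh")
render_chart(title_zh, ylabel_zh, cats_zh, data["values"], "chart_zh.png")
```

A full pipeline would additionally need language-appropriate fonts (e.g., CJK glyphs for Chinese, right-to-left handling for Arabic and Urdu) and the rigorous translation quality control mentioned in the abstract, none of which this sketch attempts.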
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: multimodality, vision question answering, cross-modal information extraction
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English, Chinese, Russian, French, Spanish, Japanese, Arabic, Urdu, Hindi, Bengali
Submission Number: 1063