Abstract: Charts are widely used around the world to interpret and communicate information. However, existing chart understanding benchmarks are predominantly English-centric, limiting their accessibility and applicability to global audiences.
In this paper, we present $\textbf{PolyChartQA}$, the first large-scale multilingual chart question answering benchmark, covering 22,606 charts and 26,151 question-answer pairs in 10 diverse languages.
PolyChartQA leverages a decoupled pipeline that separates chart data from rendering code, allowing multilingual charts to be flexibly generated by simply translating the data and reusing the code.
We adopt state-of-the-art LLM-based translation and enforce rigorous quality control in the pipeline to ensure the consistency of the generated multilingual charts.
PolyChartQA enables systematic evaluation of multilingual chart understanding. Experiments across both open- and closed-source large vision-language models (LVLMs) reveal a significant performance gap between English and other languages, especially low-resource ones with non-Latin scripts. This benchmark lays a foundation for developing globally inclusive vision-language models.
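To illustrate the decoupled design described above, the following is a minimal Python sketch in which the chart data lives in a plain specification and the rendering code is shared across languages; only the textual fields are translated before re-rendering. The `translate`, `localize_spec`, and `render` helpers and the matplotlib renderer are illustrative assumptions, not the paper's actual pipeline, which additionally applies LLM-based translation and quality control.

```python
"""Minimal sketch of a data/code-decoupled multilingual chart pipeline (illustrative)."""
import matplotlib.pyplot as plt

# Chart specification: the data is kept separate from any rendering logic.
chart_spec = {
    "title": "Quarterly Sales by Category",
    "x_labels": ["Electronics", "Clothing", "Food"],
    "y_label": "Sales (millions)",
    "values": [12.4, 15.1, 18.9],
}

def translate(text: str, target_lang: str) -> str:
    """Hypothetical stand-in for an LLM-based translation call with quality control."""
    # In a real pipeline this would call a translation model; here we only tag the string.
    return f"[{target_lang}] {text}"

def localize_spec(spec: dict, target_lang: str) -> dict:
    """Translate only the textual fields; the numeric data is reused unchanged."""
    return {
        **spec,
        "title": translate(spec["title"], target_lang),
        "y_label": translate(spec["y_label"], target_lang),
        "x_labels": [translate(x, target_lang) for x in spec["x_labels"]],
    }

def render(spec: dict, path: str) -> None:
    """Shared rendering code, reused verbatim for every language."""
    fig, ax = plt.subplots()
    ax.bar(spec["x_labels"], spec["values"])
    ax.set_title(spec["title"])
    ax.set_ylabel(spec["y_label"])
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)

if __name__ == "__main__":
    render(chart_spec, "chart_en.png")                        # English original
    render(localize_spec(chart_spec, "hi"), "chart_hi.png")   # e.g., a Hindi variant
```

Because the renderer never changes, any translation inconsistency is confined to the data specification, which is what makes rigorous quality control over the translated fields sufficient to keep the multilingual charts consistent.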
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: vision question answering, cross-modal information extraction, multimodality
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English, Chinese, Russian, French, Spanish, Japanese, Arabic, Urdu, Hindi, Bengali
Submission Number: 5248