ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation

ICLR 2026 Conference Submission14759 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · Readers: Everyone · License: CC BY 4.0
Keywords: Infographic Chart, Chart Understanding, Code Generation, Chart Generation, Dataset
TL;DR: ChartGalaxy is a million-scale dataset with 1,701,356 programmatically generated and 61,833 real infographic charts, covering 75 chart types, 440 chart variations, and 68 layout templates, to support automatic infographic chart understanding and generation.
Abstract: Infographic charts are a powerful medium for communicating abstract data by combining visual elements (e.g., charts, images) with textual information. However, their visual and structural richness poses challenges for large vision-language models (LVLMs), which are typically trained on plain charts. To bridge this gap, we introduce ChartGalaxy, a million-scale dataset designed to advance the understanding and generation of infographic charts. The dataset is constructed through an inductive process that identifies 75 chart types, 440 chart variations, and 68 layout templates from real infographic charts and then uses them to programmatically generate synthetic ones. We showcase the utility of this dataset through: 1) improving infographic chart understanding via fine-tuning, 2) benchmarking code generation for infographic charts, and 3) enabling example-based infographic chart generation. By capturing the visual and structural complexity of real designs, ChartGalaxy provides a useful resource for enhancing multimodal reasoning and generation in LVLMs.
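The abstract describes synthesis as recombining chart variations and layout templates that were induced from real infographics. The sketch below illustrates that combinatorial idea only, under stated assumptions: the `ChartVariation` and `LayoutTemplate` classes, their fields, the `tables_by_type` mapping, and the uniform random sampling are all hypothetical. This is not ChartGalaxy's actual pipeline, which renders chart images rather than emitting plain specification dicts.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ChartVariation:
    chart_type: str  # a chart type, e.g. "bar" (hypothetical label)
    name: str        # hypothetical variation identifier


@dataclass(frozen=True)
class LayoutTemplate:
    name: str
    regions: tuple[str, ...]  # hypothetical layout regions, e.g. ("title", "chart", "image")


def synthesize(variations, templates, tables_by_type, n, seed=0):
    """Sample n synthetic chart specs from (variation, template, data) combinations.

    tables_by_type maps a chart type to data tables assumed compatible with it.
    """
    rng = random.Random(seed)  # fixed seed so sampling is reproducible
    # Keep only (variation, template) pairs whose chart type has compatible data.
    combos = [
        (v, t)
        for v, t in itertools.product(variations, templates)
        if tables_by_type.get(v.chart_type)
    ]
    if not combos:
        raise ValueError("no chart variation has compatible data")
    specs = []
    for _ in range(n):
        variation, template = rng.choice(combos)
        table = rng.choice(tables_by_type[variation.chart_type])
        specs.append({
            "chart_type": variation.chart_type,
            "variation": variation.name,
            "layout": template.name,
            "regions": list(template.regions),
            "data": table,
        })
    return specs
```

For example, with a bar variation and a pie variation but data tables only for bar charts, every sampled spec is a bar chart placed in one of the given layouts. Filtering combinations by data compatibility is one simple design choice; a real pipeline would likely also check that the data fits the template's regions.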
Primary Area: datasets and benchmarks
Submission Number: 14759