A Large-Scale Diverse and Complex Dataset for Enhancing Chart-to-Code Generation

ACL ARR 2025 February Submission7311 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Chart2Code has recently received significant attention in the multimodal community due to its potential to reduce the burden of visualization and promote a more detailed understanding of charts. However, existing Chart2Code-related training datasets suffer from at least one of the following issues: (1) limited scale, (2) limited type coverage, and (3) inadequate complexity. To address these challenges, we seek more diverse sources that better align with real-world user distributions, construct a data synthesis pipeline, and create a large-scale Chart2Code training dataset. Experimental results demonstrate that, even with fewer parameters, the model finetuned on our dataset achieves state-of-the-art performance among open-source models on multiple Chart2Code benchmarks.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 7311