Abstract: Chart2Code has recently received significant attention in the multimodal community because of its potential to reduce the burden of visualization and promote a more detailed understanding of charts. However, existing Chart2Code training datasets suffer from at least
one of the following issues: (1) limited scale, (2) limited type coverage, and (3) inadequate complexity. To address these challenges, we
seek more diverse sources that better align with real-world user distributions, construct a data synthesis pipeline, and create
a large-scale Chart2Code training dataset. Experimental results demonstrate that, even with fewer parameters, the model fine-tuned
on our dataset achieves state-of-the-art performance among open-source models on multiple Chart2Code benchmarks.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 7311