NovaChart: A Large-scale Dataset towards Chart Understanding and Generation of Multimodal Large Language Models

Published: 20 Jul 2024 · Last Modified: 21 Jul 2024 · MM 2024 Oral · CC BY 4.0
Abstract: Multimodal Large Language Models (MLLMs) have shown significant potential for chart understanding and generation, but they remain far from the effectiveness required in practical applications. This gap is likely due to limitations of the chart data used for training: existing chart datasets suffer from a scarcity of chart types, limited task coverage, and insufficient scalability, making them unable to effectively enhance the chart-related capabilities of MLLMs. To tackle these obstacles, we construct NovaChart, a large-scale dataset for chart understanding and generation with MLLMs. NovaChart contains 47K high-resolution chart images and 856K chart-related instructions, covering 18 different chart types and 15 distinct chart understanding and generation tasks. To build NovaChart, we propose a data generation engine for metadata curation, chart visualization, and instruction formulation. The chart metadata in NovaChart provides detailed annotations, i.e., the data points, visual elements, source data, and visualization code of every chart. This additional information gives NovaChart considerable scalability, as it facilitates extending the chart instruction data to a larger scale and greater diversity. We use NovaChart to train several open-source MLLMs. Experimental results demonstrate that NovaChart strengthens the capabilities of MLLMs on 15 chart understanding and generation tasks by a large margin (35.47\%-619.47\%), bringing them a step closer to smart chart assistants. Our dataset is available at https://github.com/Elucidator-V/NovaChart.
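To illustrate the kind of annotation the abstract describes, below is a minimal sketch in Python of what a single chart metadata record and a derived instruction pair might look like. All field names, file paths, and values here are hypothetical assumptions for illustration only, not NovaChart's actual schema:

    # Hypothetical sketch of a NovaChart-style metadata record; field
    # names and values are illustrative assumptions, not the dataset's
    # actual schema.
    record = {
        "chart_type": "bar",  # one of the 18 supported chart types
        "source_data": {"Q1": 120, "Q2": 150, "Q3": 90},
        "data_points": [  # per-element annotations
            {"label": "Q1", "value": 120},
            {"label": "Q2", "value": 150},
            {"label": "Q3", "value": 90},
        ],
        "visual_elements": {
            "title": "Quarterly Sales",
            "x_label": "Quarter",
            "y_label": "Units",
        },
        "visualization_code": (
            "import matplotlib.pyplot as plt\n"
            "plt.bar(['Q1', 'Q2', 'Q3'], [120, 150, 90])\n"
            "plt.title('Quarterly Sales')\n"
        ),
    }

    # Such records can be expanded into instruction-response pairs,
    # e.g. a chart-to-code generation instruction:
    instruction = {
        "image": "charts/000001.png",  # hypothetical path
        "question": "Write Python code that reproduces this chart.",
        "answer": record["visualization_code"],
    }

Because each record carries the source data and visualization code alongside the rendered image, new instruction types (e.g., data extraction, question answering, or chart editing) can be derived from the same metadata, which is the scalability property the abstract refers to.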
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Generation] Multimedia Foundation Models, [Experience] Multimedia Applications, [Engagement] Summarization, Analytics, and Storytelling
Relevance To Conference: This paper introduces a large-scale dataset for chart understanding and generation that enhances the ability of existing multimodal large language models (MLLMs) to comprehend chart image content and follow user instructions more effectively. The dataset aims to improve models' performance on downstream chart-related tasks, thereby aiding the development of multimodal intelligent assistants with practical value.
Supplementary Material: zip
Submission Number: 1924