Charting the Uncharted: Building and Analyzing a Multifaceted Chart Question Answering Dataset for Complex Logical Reasoning Process

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, Venue: PRCV (5) 2024, License: CC BY-SA 4.0
Abstract: Charts, as a vital part of visualization language, are omnipresent in the real world. Understanding charts is crucial for unveiling implicit data insights. The evolution of large-scale models has marked significant milestones in chart comprehension. However, comprehending multiple charts jointly remains challenging, owing both to the complexity of multi-chart reasoning and to the difficulty of constructing datasets that involve multiple charts. In this study, we introduce DGE, a logic-based multi-chart question-answering dataset generation engine that, from simple data input alone, generates diverse joint charts and questions with complex logic. It employs logical templates to guide question generation, which keeps the engine scalable. Leveraging DGE, we propose MCQA, the first large-scale dataset for joint reasoning question answering over multiple charts. It includes 22,860 chart pairs and 100,331 complex questions, each annotated with an inference process. Finally, we evaluate several baselines on the MCQA dataset, establishing a research foundation for the chart question answering community. The MCQA dataset is available on GitHub (https://github.com/ICALK-CVU/MCQA).