Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning
Abstract: Recent large language models (LLMs) have advanced table understanding, but they rely on converting tables into text sequences. While multimodal large language models (MLLMs) enable direct visual processing of tables, their effectiveness on scientific tables is limited by fixed input image resolutions and insufficient numerical reasoning capabilities.
To address these challenges, we present MMSci, a comprehensive dataset for scientific table understanding and reasoning. MMSci consists of three key components: (1) MMSci-Pre, a domain-specific dataset of 52K scientific table structure recognition samples, (2) MMSci-Ins, an instruction-tuning dataset with 12K samples across three table-based tasks, and (3) MMSci-Eval, a benchmark of 3,114 test samples specifically designed to evaluate numerical reasoning capabilities.
Based on MMSci, we develop a table-based MLLM framework with dynamic input image resolutions. Extensive experiments demonstrate that our domain-specific approach, using only 52K scientific table images, outperforms training on 150K general-domain tables, highlighting the importance of data quality over quantity. Our proposed framework yields significant improvements in both general table understanding and numerical reasoning, and generalizes well to held-out datasets. Our code and data are publicly available at https://anonymous.4open.science/r/MMSci_Table-F278/.
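As a rough illustration of the dynamic input image resolutions mentioned in the abstract, the sketch below (not the paper's implementation; the 448-pixel tile size, the tile budget, and the function name are assumptions for illustration only) splits a high-resolution table image into one low-resolution global view plus a grid of high-resolution tiles, so that dense table cells are not lost to aggressive downscaling before the vision encoder.

from PIL import Image

def dynamic_resolution_views(path, tile=448, max_tiles=12):
    # Hypothetical tile-based dynamic-resolution preprocessing; not the MMSci code.
    img = Image.open(path).convert("RGB")
    w, h = img.size
    views = [img.resize((tile, tile))]  # low-resolution global view of the whole table
    # Pick a grid that roughly preserves the aspect ratio within the tile budget.
    cols, rows = max(1, round(w / tile)), max(1, round(h / tile))
    while cols * rows > max_tiles:
        if cols >= rows:
            cols -= 1
        else:
            rows -= 1
    # Resize so the image splits evenly, then crop each high-resolution tile.
    resized = img.resize((cols * tile, rows * tile))
    for r in range(rows):
        for c in range(cols):
            views.append(resized.crop((c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)))
    return views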
Paper Type: Long
Research Area: Syntax: Tagging, Chunking and Parsing
Research Area Keywords: multimodality, cross-modal information extraction, vision question answering, cross-modal content generation, cross-modal application
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 354