RITT: A Retrieval-Assisted Framework with Image and Text Table Representations for Table Question Answering

Published: 05 Jun 2025, Last Modified: 05 Jun 2025. Venue: TRL@ACL 2025. License: CC BY 4.0
Keywords: table question answering, relevant cells retrieval, table representation
Abstract: Tables can be represented either as text or as images. Previous work on table question answering (TQA) typically relies on only one representation, neglecting the potential benefits of combining both. In this work, we explore integrating textual and visual table representations using multi-modal large language models (MLLMs) for TQA. Specifically, we propose RITT, a retrieval-assisted framework that first identifies the part of a table most relevant to a given question, then dynamically selects the optimal table representation based on the question type. Experiments on four TQA benchmarks demonstrate that our framework outperforms the baseline MLLMs by an average of 13 Exact Match points and surpasses two text-only state-of-the-art TQA methods, highlighting the benefits of leveraging both textual and visual table representations.
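The abstract describes a two-stage pipeline: retrieve the table region relevant to the question, then route the question to a text or image representation. A minimal sketch of that control flow is below; the function names, the token-overlap retrieval, and the keyword-based question-type heuristic are all illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of a RITT-style two-stage pipeline.
# Stage 1: retrieve question-relevant cells; stage 2: pick a representation.
# Both stages here are toy stand-ins, not the paper's models.

def retrieve_relevant_cells(table, question, top_k=4):
    """Rank cells by naive token overlap with the question
    (a stand-in for the paper's relevant-cell retrieval step)."""
    q_tokens = set(question.lower().split())
    scored = []
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            overlap = len(q_tokens & set(str(cell).lower().split()))
            scored.append((overlap, r, c, cell))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(r, c, cell) for _, r, c, cell in scored[:top_k]]

def select_representation(question):
    """Toy question-type routing: layout-oriented questions get the
    image representation, others the text one (an assumed heuristic)."""
    visual_cues = {"column", "row", "position", "highlighted"}
    if set(question.lower().split()) & visual_cues:
        return "image"
    return "text"

table = [["Player", "Goals"], ["Messi", "30"], ["Ronaldo", "28"]]
question = "How many goals did Messi score?"
cells = retrieve_relevant_cells(table, question, top_k=2)
rep = select_representation(question)
```

In the actual framework, both stages would be driven by learned components rather than keyword heuristics; the sketch only mirrors the retrieve-then-route structure stated in the abstract.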
Include In Proceedings: Yes
Submission Number: 9