RITT: A Retrieval-Assisted Framework with Image and Text Table Representations for Table Question Answering

Published: 05 Jun 2025, Last Modified: 05 Jun 2025. Venue: TRL@ACL 2025. License: CC BY 4.0
Keywords: table question answering, relevant cells retrieval, table representation
Abstract: Tables can be represented either as text or as images. Previous work on table question answering (TQA) typically relies on only one representation, neglecting the potential benefits of combining both. In this work, we explore integrating textual and visual table representations using multi-modal large language models (MLLMs) for TQA. Specifically, we propose RITT, a retrieval-assisted framework that first identifies the part of a table most relevant to a given question, then dynamically selects the optimal table representation based on the question type. Experiments on four TQA benchmarks demonstrate that our framework outperforms the baseline MLLMs by an average of 13 Exact Match points and surpasses two text-only state-of-the-art TQA methods, highlighting the benefits of leveraging both textual and visual table representations.
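The abstract describes a two-stage pipeline: retrieve the table region relevant to the question, then route the question to a text or image representation. A minimal sketch of that control flow is below; the function names, the token-overlap retrieval, and the keyword-based question-type heuristic are all illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of a RITT-style two-stage pipeline.
# Stage 1: retrieve question-relevant cells; stage 2: pick a representation.
# Both stages here are toy stand-ins, not the paper's models.

def retrieve_relevant_cells(table, question, top_k=4):
    """Rank cells by naive token overlap with the question
    (a stand-in for the paper's relevant-cell retrieval step)."""
    q_tokens = set(question.lower().split())
    scored = []
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            overlap = len(q_tokens & set(str(cell).lower().split()))
            scored.append((overlap, r, c, cell))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(r, c, cell) for _, r, c, cell in scored[:top_k]]

def select_representation(question):
    """Toy question-type routing: layout-oriented questions get the
    image representation, others the text one (an assumed heuristic)."""
    visual_cues = {"column", "row", "position", "highlighted"}
    if set(question.lower().split()) & visual_cues:
        return "image"
    return "text"

table = [["Player", "Goals"], ["Messi", "30"], ["Ronaldo", "28"]]
question = "How many goals did Messi score?"
cells = retrieve_relevant_cells(table, question, top_k=2)
rep = select_representation(question)
```

In the actual framework, both stages would be driven by learned components rather than keyword heuristics; the sketch only mirrors the retrieve-then-route structure stated in the abstract.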
Include In Proceedings: Yes
Submission Number: 9