TRH2TQA: Table Recognition with Hierarchical Relationships to Table Question-Answering on Business Table Images

Pongsakorn Jirachanchaisiri, Nam Tuan Ly, Atsuhiro Takasu

Published: 01 Jan 2025, Last Modified: 11 Nov 2025, Venue: WACV 2025, License: CC BY-SA 4.0

Abstract: Despite advances in visual question answering, challenges persist with documents such as financial reports, which are often organized into complicated tabular structures and require complex numerical computations. An alternative, pipeline-driven approach splits the task into table recognition (TR) and table question-answering (TQA). Recent advances in TR support this approach with better accuracy and interpretability. However, real-world tables are usually hierarchical. They pose additional challenges due to merged cells and indents, necessitating a specific approach for hierarchical relationship extraction. In this paper, we propose TRH2TQA (Table Recognition with Hierarchical Relationships to Table Question-Answering) for business table images. It consists of three modules that operate on table images with question-answer pairs. First, the TR module extracts the structure and textual content of a table image into HTML format. Second, post-structure extraction identifies headers and hierarchical relationships using the predicted column spans and bounding boxes. Finally, this information is combined with the natural language question in the TQA module, whose decoder generates the answer. In extensive experiments, TRH2TQA shows superior question-answering performance on the VQAonBD 2023 dataset.
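The second module is described only at a high level, so the following is a minimal sketch of one way indentation could be turned into parent-child relationships, not the authors' implementation. It assumes each row label comes with the left edge (`x0`) of its predicted bounding box and that a larger `x0` means deeper nesting. The names `RowLabel`, `infer_parents`, and the tolerance `tol` are hypothetical and introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowLabel:
    text: str   # recognized text of the row's first cell
    x0: float   # left edge of the cell's predicted bounding box (assumed input)


def infer_parents(rows: List[RowLabel], tol: float = 2.0) -> List[Optional[int]]:
    """Return, for each row, the index of its parent row (None if top-level).

    Assumption: a row is a child of the nearest preceding row whose label
    starts more than `tol` pixels further to the left.
    """
    parents: List[Optional[int]] = []
    stack: List[int] = []  # indices of currently open ancestor rows
    for i, row in enumerate(rows):
        # Close every ancestor that is indented as much as, or more than, this row.
        while stack and rows[stack[-1]].x0 >= row.x0 - tol:
            stack.pop()
        parents.append(stack[-1] if stack else None)
        stack.append(i)
    return parents


# Illustrative data only; not taken from the VQAonBD 2023 dataset.
rows = [
    RowLabel("Assets", 10),
    RowLabel("Current assets", 20),
    RowLabel("Cash", 30),
    RowLabel("Inventory", 30),
    RowLabel("Non-current assets", 20),
    RowLabel("Liabilities", 10),
]
parents = infer_parents(rows)
```

In this sketch, "Cash" and "Inventory" are attached to "Current assets", which in turn is attached to "Assets". Header detection from column spans, which the abstract also mentions, is not covered here.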

External IDs: dblp:conf/wacv/Jirachanchaisiri25