An End-to-End Local Attention Based Model for Table Recognition

Published: 01 Jan 2023 · Last Modified: 04 Mar 2025 · ICDAR (2) 2023 · CC BY-SA 4.0
Abstract: With the rapid development of deep learning, and of the Transformer in particular, many Transformer-based methods have been studied and shown to be very effective for table recognition. However, Transformer-based models usually struggle to process large tables because of the limitations of their global attention mechanism. In this paper, we propose a local attention mechanism that addresses this limitation, and we present an end-to-end local attention-based model for recognizing both table structure and table cell content from a table image. The proposed model consists of four main components: 1) an encoder for feature extraction and 2) three decoders for the three sub-tasks of the table recognition problem. In the experiments, we evaluate the performance of the proposed model and the effectiveness of the local attention mechanism on two large-scale datasets: PubTabNet and FinTabNet. The results show that the proposed model outperforms state-of-the-art methods on all benchmark datasets. Furthermore, we demonstrate the effectiveness of the local attention mechanism for table recognition, especially for large tables.
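The abstract does not specify the exact form of the proposed local attention. As a rough, hedged illustration of the general idea of windowed (local) self-attention — not the authors' formulation — the following sketch restricts each query position to keys within a fixed window, so the cost of attention no longer grows with the full sequence length of a large table's token sequence:

```python
import numpy as np

def local_attention(q, k, v, window):
    """Sliding-window (local) self-attention sketch.

    Each query position attends only to key positions within
    `window` steps on either side; everything else is masked out.
    q, k, v: arrays of shape (seq_len, d). Returns (seq_len, d).
    NOTE: illustrative only; the paper's actual mechanism may differ.
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)            # (seq_len, seq_len) logits
    idx = np.arange(seq_len)
    # True where |i - j| > window, i.e. outside the local window.
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the unmasked positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With a window at least as wide as the sequence, this reduces to ordinary global softmax attention; shrinking the window trades global context for tractability on long sequences such as large tables.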