PyramidTabNet: Transformer-Based Table Recognition in Image-Based Documents

Muhammad Umer, Muhammad Ahmed Mohsin, Adnan Ul-Hasan, Faisal Shafait

Published: 2023, Last Modified: 05 Jul 2025, ICDAR (5) 2023, CC BY-SA 4.0

Abstract: Table detection and structure recognition are important components of document analysis systems. Deep learning-based transformer models have recently demonstrated significant success in various computer vision and document analysis tasks. In this paper, we introduce PyramidTabNet (PTN), a method that builds upon the Convolution-less Pyramid Vision Transformer to detect tables in document images. Furthermore, we present a generative augmentation technique for tabular images to train the architecture effectively. The proposed augmentation process consists of three steps, namely clustering, fusion, and patching, which together generate new document images containing tables. Our proposed pipeline demonstrates significant performance improvements for table detection on several standard datasets. Additionally, it achieves performance comparable to state-of-the-art methods on structure recognition tasks.
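The abstract names the three augmentation steps but does not specify them. The following is a hypothetical sketch of how a clustering, fusion, and patching pipeline could look, not the authors' procedure. It assumes that table crops are clustered by width with a simple 1-D k-means, that two crops from the same cluster are fused by stacking them vertically, and that the fused result is pasted onto a blank page, which yields a new bounding-box label. Images are plain lists of grayscale rows (255 = white). All function names (`kmeans_1d`, `cluster_by_width`, `fuse_vertical`, `patch`) are invented for illustration.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values, 255 = white
BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive


def kmeans_1d(values: List[int], k: int, iters: int = 20) -> List[int]:
    """Plain 1-D k-means; returns a cluster label for each value."""
    svals = sorted(values)
    # Initial centroids spread evenly across the sorted values.
    centroids = [float(svals[(i * (len(svals) - 1)) // max(k - 1, 1)]) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def cluster_by_width(tables: List[Image], k: int) -> List[List[Image]]:
    """Step 1 (assumed): group table crops of similar width."""
    labels = kmeans_1d([len(t[0]) for t in tables], k)
    clusters: Dict[int, List[Image]] = {}
    for table, lab in zip(tables, labels):
        clusters.setdefault(lab, []).append(table)
    return list(clusters.values())


def pad_width(img: Image, width: int, fill: int = 255) -> Image:
    """Right-pad every row with white pixels up to `width`."""
    return [row + [fill] * (width - len(row)) for row in img]


def fuse_vertical(a: Image, b: Image) -> Image:
    """Step 2 (assumed): stack two crops into one taller table image."""
    w = max(len(a[0]), len(b[0]))
    return pad_width(a, w) + pad_width(b, w)


def patch(page: Image, table: Image, top: int, left: int) -> Tuple[Image, BBox]:
    """Step 3 (assumed): paste the table onto a copy of the page, return the label."""
    h, w = len(table), len(table[0])
    if top + h > len(page) or left + w > len(page[0]):
        raise ValueError("table does not fit at this position")
    out = [row[:] for row in page]  # leave the input page untouched
    for r in range(h):
        out[top + r][left:left + w] = table[r]
    return out, (left, top, left + w, top + h)


def solid(h: int, w: int, value: int) -> Image:
    return [[value] * w for _ in range(h)]


# Usage: four dark "tables" of widths 4, 5, 10, 11 fall into two width clusters.
tables = [solid(2, 4, 0), solid(3, 5, 0), solid(2, 10, 0), solid(4, 11, 0)]
clusters = cluster_by_width(tables, k=2)
fused = fuse_vertical(*clusters[0][:2])  # 5 rows x 5 columns
page = solid(20, 30, 255)
new_page, bbox = patch(page, fused, top=3, left=7)
print(bbox)
```

Clustering before fusion is what keeps the stacked tables roughly aligned; fusing crops of very different widths would leave large padded margins that look unlike real tables.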