ClusterTabNet: Supervised Clustering Method for Table Detection and Table Structure Recognition

Marek Polewczyk, Marco Spinaci

Published: 01 Jan 2024, Last Modified: 16 May 2025ICDAR (5) 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Table detection and recognition consists of locating tables within a given document and identifying the exact location of its pieces, such as rows, columns, and headers. We present a novel deep-learning-based method to cluster words in documents which we apply to detect and recognize tables given the OCR output. We interpret table structure bottom-up as a graph of relations between pairs of words (belonging to the same row, column, header, as well as to the same table) and use a transformer encoder model to predict its adjacency matrix. We demonstrate the performance of our method on the PubTables-1M dataset introduced in [17] as well as PubTabNet and FinTabNet datasets. Compared to the current state-of-the-art detection methods such as DETR [1] and Faster R-CNN [15], our method achieves similar or better accuracy, while requiring a significantly smaller model. The code is released at https://github.com/SAP-samples/clustertabnet.