Decompose Document Image Using Integer Linear Programming

Dashan Gao, Yizhou Wang, Haitham A. Hindi, Minh Do

Published: 2007, Last Modified: 13 Nov 2024, ICDAR 2007, CC BY-SA 4.0

Abstract: Document decomposition is a basic but crucial step for many document-related applications. This paper proposes a novel approach to decomposing document images into zones. It first generates overlapping zone hypotheses based on generic visual features. Then, each candidate zone is evaluated quantitatively by a learned generative zone model. We formulate the zone inference problem as a constrained optimization problem, so as to select an optimal set of non-overlapping zones that cover a given document image. The experimental results demonstrate that the proposed method is very robust to document structure variation and noise.
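The zone selection step described in the abstract can be illustrated with a small 0-1 program. The sketch below is not the authors' implementation: it assumes the page is split into elementary cells (for example, words), that each candidate zone is a set of cell indices, and that a lower cost means a better score from the zone model. The function name `select_zones`, the toy zones, and the costs are all hypothetical, and exhaustive enumeration of the binary variables stands in for a real integer linear programming solver, which is only practical for tiny instances.

```python
from itertools import product


def select_zones(num_cells, zones, costs):
    """Minimize sum(costs[i] * x[i]) over binary x such that every cell
    is covered by exactly one selected zone (full cover, no overlap).

    Brute-force stand-in for an ILP solver; assumes few candidate zones.
    """
    best_cost, best_pick = None, None
    for x in product((0, 1), repeat=len(zones)):
        # Count how many selected zones contain each cell.
        cover = [0] * num_cells
        for xi, zone in zip(x, zones):
            if xi:
                for cell in zone:
                    cover[cell] += 1
        # Feasible only if each cell is covered exactly once.
        if any(count != 1 for count in cover):
            continue
        total = sum(c for xi, c in zip(x, costs) if xi)
        if best_cost is None or total < best_cost:
            best_cost = total
            best_pick = [i for i, xi in enumerate(x) if xi]
    return best_pick, best_cost


# Hypothetical toy page: 4 cells, 5 overlapping zone hypotheses.
zones = [{0, 1}, {2, 3}, {0, 1, 2}, {3}, {0, 1, 2, 3}]
costs = [1.0, 1.5, 2.0, 1.0, 4.0]  # made-up zone model costs
picked, cost = select_zones(4, zones, costs)
print(picked, cost)
```

With these made-up numbers, the three feasible partitions are zones 0 and 1, zones 2 and 3, and zone 4 alone; the first has the lowest total cost. A practical system would hand the same objective and equality constraints to an ILP solver instead of enumerating all assignments.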