Abstract: Document decomposition is a basic but crucial step for many document-related applications. This paper proposes a novel approach to decomposing document images into zones. The approach first generates overlapping zone hypotheses based on generic visual features. Each candidate zone is then evaluated quantitatively by a learned generative zone model. We formulate zone inference as a constrained optimization problem, so as to select an optimal set of non-overlapping zones that cover a given document image. The experimental results demonstrate that the proposed method is robust to variation in document structure and to noise.
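
The selection step can be illustrated with a toy sketch. It is not the paper's formulation: it assumes each candidate zone is an axis-aligned rectangle with a single numeric score (a hypothetical stand-in for the learned zone model's output), it maximizes the total score subject only to pairwise non-overlap, and it omits the coverage constraint. The names `Zone`, `overlaps`, and `select_zones` are invented for illustration, and the exhaustive search is only practical for a handful of candidates.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Zone(NamedTuple):
    """Axis-aligned candidate zone with corners (x0, y0) and (x1, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int
    score: float  # assumed stand-in for the learned zone model's score


def overlaps(a: Zone, b: Zone) -> bool:
    """Return True if the two rectangles share interior area.

    Rectangles that only touch along an edge do not count as overlapping.
    """
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def select_zones(candidates: List[Zone]) -> Tuple[float, List[Zone]]:
    """Exhaustively pick the non-overlapping subset with the highest total score."""
    best_score: float = 0.0
    best_subset: List[Zone] = []
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            # Constraint: no two selected zones may overlap.
            if any(overlaps(a, b) for a, b in combinations(subset, 2)):
                continue
            total = sum(zone.score for zone in subset)
            if total > best_score:
                best_score, best_subset = total, list(subset)
    return best_score, best_subset


# Overlapping hypotheses: one large zone competes with its two halves.
candidates = [
    Zone(0, 0, 10, 10, 5.0),   # large zone spanning the left column
    Zone(0, 0, 10, 5, 3.0),    # top half of the left column
    Zone(0, 5, 10, 10, 3.0),   # bottom half of the left column
    Zone(10, 0, 20, 10, 2.0),  # right column
]
best_score, best_zones = select_zones(candidates)
```

In this example the two half-zones together outscore the large zone that contains them, so the selected set is the two halves plus the right column. A realistic system would replace the brute-force loop with an integer programming or similar solver and add the coverage constraint.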