Abstract: Highlights
• Gestalt patterns are characterized in text regions to describe text homogeneity.
• A probabilistic hierarchical model is formulated to evaluate text homogeneity.
• A Bayesian cue integration model is proposed to compute homogeneity probability.
• Document segmentation using probabilistic homogeneity outperforms state-of-the-art.