Abstract: Scientific illustrations help readers understand the core content of a paper. Many of them are complex flowcharts composed of multiple subregions, each corresponding to a meaningful module or processing stage and appearing as a visually coherent region with a uniform background color or a distinct boundary. However, segmenting these subregions and aligning them with their corresponding text descriptions has been overlooked. We propose a self-prompted segmentation of scientific illustrations (SSSI) framework, which automatically uses the geometric relationships between text and subregions to generate point and bounding-box prompts suitable for the Segment Anything Model (SAM). SSSI employs a two-stage pipeline. In the first stage, SAM-based pre-segmentation generates bounding boxes that are then aligned with text bounding boxes to produce box prompt regions; Optical Character Recognition (OCR) is used in this stage to identify the text bounding boxes and to eliminate visual interference caused by text. In the second stage, a dynamic point sampling algorithm samples points as prompts and iteratively refines the subregion masks, achieving high-quality subregion extraction. Experimental results show that the framework achieves a mean average precision (mAP) of 68.0 and an AP@50 of 78.1.
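The first-stage idea of aligning text bounding boxes with pre-segmented region boxes can be illustrated with a minimal sketch. This is not the SSSI algorithm itself, whose alignment rule the abstract does not specify; it assumes a simple heuristic in which each text box is assigned to the smallest candidate region box that contains the text box's center. The names `match_text_to_region` and `build_box_prompts` and the `(x_min, y_min, x_max, y_max)` box format are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed box format (hypothetical): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def contains_point(box: Box, x: float, y: float) -> bool:
    """True if the point (x, y) lies inside or on the border of the box."""
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def match_text_to_region(text_box: Box, region_boxes: List[Box]) -> Optional[int]:
    """Index of the smallest region box containing the text box center.

    Returns None when no region contains the center. This containment rule
    is an illustrative assumption, not the rule used in the paper.
    """
    cx = (text_box[0] + text_box[2]) / 2.0
    cy = (text_box[1] + text_box[3]) / 2.0
    candidates = [
        i for i, region in enumerate(region_boxes)
        if contains_point(region, cx, cy)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda i: box_area(region_boxes[i]))


def build_box_prompts(
    text_boxes: List[Box], region_boxes: List[Box]
) -> Dict[int, List[int]]:
    """Map each matched region index to the indices of its text boxes."""
    prompts: Dict[int, List[int]] = {}
    for t_idx, text_box in enumerate(text_boxes):
        r_idx = match_text_to_region(text_box, region_boxes)
        if r_idx is not None:
            prompts.setdefault(r_idx, []).append(t_idx)
    return prompts
```

Choosing the smallest containing region keeps a label inside a nested module from being attached to the enclosing outer block. In a fuller pipeline, the matched region boxes would then be passed to a segmentation model as box prompts.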
External IDs: dblp:conf/icdar/WangZZHZL25