Local projection-based character segmentation method for historical Chinese documents

Published: 01 Jan 2013, Last Modified: 08 Mar 2025, DRR 2013, CC BY-SA 4.0
Abstract: Digitization of historical Chinese documents involves two key technologies: character segmentation and character recognition. This paper focuses on developing a character segmentation algorithm. As a preprocessing step, we combine several effective measures to remove noise from a historical Chinese document image. After binarization, a new character segmentation algorithm segments individual characters based on projections of a cost image in local windows. The cost image is constructed from stroke bounding boxes and a skeleton image extracted from the binarized image. We evaluate the proposed algorithm by the degree of matching between the character bounding boxes in the segmentation results and those in the ground-truth data, and achieve a recall rate of 74.3% on a test set, which shows the effectiveness of the proposed algorithm.
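The idea of projecting a cost image inside a local window can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the cost image is computed from stroke bounding boxes and the skeleton, so the cost values below are toy numbers. The function names `row_projection` and `find_cuts`, the zero threshold, and the choice of projecting along rows (assuming vertically written text, with the window spanning one text column) are all assumptions made for illustration.

```python
from typing import List, Sequence


def row_projection(cost: Sequence[Sequence[float]], left: int, right: int) -> List[float]:
    """Sum the cost of each row over columns left..right-1 (the local window)."""
    return [sum(row[left:right]) for row in cost]


def find_cuts(profile: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Return the middle row of each interior run where the projection <= threshold.

    Runs that touch the top or bottom edge of the window are treated as
    margins, not gaps between characters, and are ignored (an assumption).
    """
    cuts: List[int] = []
    start = None  # first row of the current low-cost run, if any
    for i, value in enumerate(profile):
        if value <= threshold:
            if start is None:
                start = i
        else:
            if start is not None and start > 0:
                cuts.append((start + i - 1) // 2)
            start = None
    return cuts


# Toy cost image (7 rows x 4 columns); the values are made up for the example.
toy_cost = [
    [0, 0, 0, 0],
    [1, 2, 1, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [2, 1, 1, 0],
    [0, 1, 0, 0],
]

profile = row_projection(toy_cost, left=0, right=3)
print(profile)             # [0, 4, 3, 0, 0, 4, 1]
print(find_cuts(profile))  # [3]
```

In this toy case, the two low-cost rows between the stroke groups yield a single cut candidate at row 3. Restricting the projection to a local window, rather than the full image width, keeps strokes in neighbouring text columns from masking the gap.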