SPS-CG: Shape, Pronunciation, and Semantic Joint Modeling for Chinese Character Generation

Published: 2025, Last Modified: 04 Nov 2025, ICDAR (2) 2025, CC BY-SA 4.0
Abstract: Recent advances in scene text image generation have attracted considerable attention. However, generating complex text with multiple attributes remains a challenging task. Such complexities include non-planar scenes, skewed camera perspectives, and structurally intricate Chinese characters. To address these challenges, we propose a method for Chinese character generation based on joint modeling of shape, pronunciation, and semantics (SPS-CG). Because shape, pronunciation, and semantics are intrinsically interconnected in the evolution of Chinese characters, jointly modeling these multi-dimensional features enables the model to generate more accurate and realistic images. Additionally, we design specialized data synthesis methods for various types of complex text to enhance the diversity of the training data. The proposed approach employs a two-stage training strategy and supports both text erasing and text generation in natural scenes. Experimental results indicate that our method improves image generation accuracy and naturalness compared to existing approaches.