ArtGlyphDiffuser: Text-driven artistic glyph generation via Style-to-CLIP Projection and Multi-Level Controlled diffusion

Published: 01 Jan 2026, Last Modified: 07 Nov 2025. Pattern Recognit. 2026. License: CC BY-SA 4.0
Abstract: Highlights
• We propose ArtGlyphDiffuser, a Stable-Diffusion-based one-shot text-driven artistic glyph image generation model that achieves state-of-the-art performance in generating characters of various shapes and styles.
• To fuse cross-modal information from text and images, we introduce a Style-to-CLIP Projection module that maps the reference style image into the CLIP embedding space already used by the SD model.
• We propose a controllable generation strategy built on a Multi-Level Controlled block that seamlessly injects information from multiple scales into the denoising process of the UNet, strengthening the extraction and generation of complex features.
• We fine-tune the model with a Coarse-Grained Context-Consistent Loss and a Randomly Masked Style strategy, and demonstrate superior performance over existing methods in generating artistic glyph images across various shapes and styles.
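Since only the highlights are available here, the PyTorch sketch below is a rough illustration of the two architectural components they name: a projection that maps a reference-style image embedding into the CLIP conditioning space of the SD model, and a control block that produces multi-scale features for injection into the UNet denoising path. All class names, dimensions (image_dim, clip_dim, num_tokens, unet_channels), and layer choices are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class StyleToCLIPProjection(nn.Module):
    """Sketch: project a global reference-image embedding into a short
    sequence of pseudo text tokens in the CLIP conditioning space of SD."""

    def __init__(self, image_dim=1024, clip_dim=768, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.clip_dim = clip_dim
        self.proj = nn.Sequential(
            nn.Linear(image_dim, clip_dim * num_tokens),
            nn.GELU(),
            nn.Linear(clip_dim * num_tokens, clip_dim * num_tokens),
        )
        self.norm = nn.LayerNorm(clip_dim)

    def forward(self, image_embed):                   # (B, image_dim)
        tokens = self.proj(image_embed)               # (B, clip_dim * num_tokens)
        tokens = tokens.view(-1, self.num_tokens, self.clip_dim)
        return self.norm(tokens)                      # (B, num_tokens, clip_dim)


class MultiLevelControlBlock(nn.Module):
    """Sketch: encode a glyph/style condition image at several spatial scales,
    yielding one control feature map per UNet resolution level."""

    def __init__(self, cond_channels=3, unet_channels=(320, 640, 1280)):
        super().__init__()
        stages, in_ch = [], cond_channels
        for out_ch in unet_channels:
            stages.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),  # downsample
                nn.SiLU(),
                nn.Conv2d(out_ch, out_ch, 3, padding=1),
            ))
            in_ch = out_ch
        self.stages = nn.ModuleList(stages)

    def forward(self, cond_image):                    # (B, cond_channels, H, W)
        feats, x = [], cond_image
        for stage in self.stages:
            x = stage(x)
            feats.append(x)                           # one feature per scale
        return feats
```

In a pipeline of this kind, the projected style tokens would typically be concatenated with the CLIP text embeddings for cross-attention conditioning, and each control feature would be added residually to the UNet block of matching resolution; the actual fusion scheme in ArtGlyphDiffuser may differ from this sketch.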