Abstract: Highlights

• We propose TransFont, a pure transformer-based model for few-shot font generation, and empirically show that ViTs outperform CNNs at glyph image generation thanks to their ability in shape recognition.

• We propose a glyph self-attention module that mitigates the quadratic computational and memory complexity of the self-attention mechanism on pixel-level glyph image generation, introducing several new techniques: multi-head multiple sampling, yz-axis convolution, and approximate relative position bias.

• We present TransFont as a simple but effective transformer baseline for future research, demonstrating its superiority over existing CNN-based font generation models on two challenging Chinese font libraries, FounderType and SinoType. TransFont generates glyph images with more accurate font styles and stroke details.
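To make the quadratic-complexity point concrete, the following is a minimal NumPy sketch of generic scaled dot-product self-attention with an additive relative position bias. This is only an illustration of the bottleneck the glyph self-attention module targets, not the TransFont implementation; all names, shapes, and the random weights are illustrative assumptions.

```python
import numpy as np

def self_attention_with_bias(x, w_q, w_k, w_v, rel_bias):
    """Generic single-head self-attention with an additive relative
    position bias. NOT the TransFont glyph self-attention module;
    it only illustrates the O(n^2) score matrix that such a module
    must mitigate for pixel-level glyph images."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    # The n x n score matrix is where compute and memory grow
    # quadratically with the number of tokens (image patches) n.
    scores = q @ k.T / np.sqrt(d) + rel_bias
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 16, 8                                   # n tokens, embed dim d
x = rng.standard_normal((n, d))
w = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
bias = rng.standard_normal((n, n)) * 0.01      # learned in practice
out = self_attention_with_bias(x, *w, bias)
print(out.shape)  # (16, 8) -- but the intermediate scores were (16, 16)
```

For a 64x64 glyph image attended at pixel level, n = 4096 and the score matrix has ~16.8 million entries per head, which motivates sampling- and approximation-based variants like those listed above.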