Abstract: Highlights

• We propose TransFont, a pure transformer-based model for few-shot font generation, and empirically show that ViTs outperform CNNs at glyph image generation thanks to their ability in shape recognition.

• We propose a glyph self-attention module that mitigates the quadratic computational and memory complexity of the self-attention mechanism on pixel-level glyph image generation, introducing several new techniques: multi-head multiple sampling, yz-axis convolution, and approximate relative position bias.

• We present TransFont as a simple but effective transformer baseline for future research, demonstrating its superiority over existing CNN-based font generation models on two challenging Chinese font libraries, FounderType and SinoType. TransFont generates glyph images with more accurate font styles and stroke details.
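To make the quadratic-complexity point concrete, the following is a minimal NumPy sketch of generic scaled dot-product self-attention with an additive relative position bias. This is only an illustration of the bottleneck the glyph self-attention module targets, not the TransFont implementation; all names, shapes, and the random weights are illustrative assumptions.

```python
import numpy as np

def self_attention_with_bias(x, w_q, w_k, w_v, rel_bias):
    """Generic single-head self-attention with an additive relative
    position bias. NOT the TransFont glyph self-attention module;
    it only illustrates the O(n^2) score matrix that such a module
    must mitigate for pixel-level glyph images."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    # The n x n score matrix is where compute and memory grow
    # quadratically with the number of tokens (image patches) n.
    scores = q @ k.T / np.sqrt(d) + rel_bias
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 16, 8                                   # n tokens, embed dim d
x = rng.standard_normal((n, d))
w = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
bias = rng.standard_normal((n, n)) * 0.01      # learned in practice
out = self_attention_with_bias(x, *w, bias)
print(out.shape)  # (16, 8) -- but the intermediate scores were (16, 16)
```

For a 64x64 glyph image attended at pixel level, n = 4096 and the score matrix has ~16.8 million entries per head, which motivates sampling- and approximation-based variants like those listed above.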