Abstract: Highlights
• A bimodal learning strategy is proposed to generate vector images from glyph images.
• The word-image and sequence modalities are mapped to a discrete space and aligned.
• Ideas from sliding-window attention and RevNet are incorporated into the Transformer.
• Vector images are generated from raster images by cross-modal model distillation.
• Complex vector font synthesis is achieved, which has significant application value.