Highlights
• A Vision-Language Pre-training paradigm for Graph-based handwritten mathematical expression recognition (VLPG) is proposed.
• VLPG pre-trains the model through a localization pretext task and a language modeling task.
• A graph-structure-aware attention module is proposed to enhance the transformer decoder for graph-based HMER.
• Superior performance is achieved on benchmark HMER datasets.