Highlights
• A Vision-Language Pre-training paradigm for Graph-based handwritten mathematical expression recognition (VLPG) is proposed.
• VLPG pre-trains the model through a localization pretext task and a language modeling task.
• A graph-structure-aware attention module is proposed to enhance the transformer decoder for graph-based HMER.
• Superior performance is achieved on benchmark HMER datasets.