Vision-language pre-training for graph-based handwritten mathematical expression recognition

Published: 01 Jan 2025, Last Modified: 11 Apr 2025, Pattern Recognit. 2025, CC BY-SA 4.0
Abstract: Highlights
• A Vision-Language Pre-training paradigm for Graph-based handwritten mathematical expression recognition (VLPG) is proposed.
• VLPG pre-trains the model through a localization pretext task and a language modeling task.
• A graph-structure-aware attention module is proposed to enhance the transformer decoder for graph-based handwritten mathematical expression recognition (HMER).
• Superior performance has been achieved on HMER benchmark datasets.