Hierarchical visual-semantic interaction for scene text recognition

Liang Diao, Xin Tang, Jun Wang, Guotong Xie, Junlin Hu

Published: 2024, Last Modified: 01 Nov 2024, Inf. Fusion 2024, CC BY-SA 4.0

Abstract: Highlights
• We present a hierarchical visual-semantic interaction (HVSI) method that exploits multi-scale interaction between visual and semantic features.
• The proposed HVSI method is an end-to-end framework and does not depend on any pre-trained language model.
• A visual-semantic alignment module is proposed to alleviate the score gap between visual and semantic features.
• Extensive experiments on multiple benchmarks show that HVSI achieves state-of-the-art or competitive performance.