A New Bottom-Up Path Augmentation Attention Network for Script Identification in Scene Images

Zhi Pan, Yaowei Yang, Kurban Ubul, Alimjan Aysa

Published: 2024, Last Modified: 12 Jun 2025, ICDAR (5) 2024, License: CC BY-SA 4.0

Abstract: Script identification is an important component of multilingual OCR systems and plays a key role in their stability and accuracy. The greatest challenge of the task is that similar scripts share a large set of identical or similar characters, which makes script identification a fine-grained classification problem. Scene text script identification adds further challenges, such as complex backgrounds, varied text styles, arbitrary aspect ratios, and diverse noise. In this paper, we design a Feature Intensification module that reduces the interference of redundant features and intensifies the feature representation through implicit cross-channel interaction and information fusion. To better adapt to sequential text, we propose an improved Bottom-up Path Augmentation Structure that captures long-range dependencies and fuses multi-scale feature maps more effectively. Moreover, by combining channel grouping with an attention mechanism, the network focuses more accurately on the text and on each word in an image. In the classification layer, we use a fully convolutional classifier to generate channel-level classifications, which are then aggregated by a global pooling layer to improve classification efficiency. We evaluate the proposed method on four benchmark datasets, and the experimental results demonstrate the effectiveness of each component. The method outperforms competitive models, achieving accuracies of 89.65%, 96.11%, 98.88%, and 97.20% on RRC-MLT 2017, SIW-13, CVSI-2015, and MLe2e, respectively.
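The abstract does not give the exact layers of the classifier. The following is a minimal sketch in pure Python, assuming the "fully convolutional classifier" is a 1×1 convolution that maps the final feature map to one score map per script class, and that the global pooling is average pooling. The toy shapes, weights, and all function names (`conv1x1`, `global_avg_pool`, `fcn_classifier_head`) are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence

# Hypothetical layout: a feature map is a list of channels, each an H x W grid.
FeatureMap = List[List[List[float]]]


def conv1x1(features: FeatureMap, weights: Sequence[Sequence[float]],
            bias: Sequence[float]) -> FeatureMap:
    """1x1 convolution: output channel k at pixel (i, j) is
    bias[k] + sum_c weights[k][c] * features[c][i][j]."""
    in_ch = len(features)
    h, w = len(features[0]), len(features[0][0])
    out: FeatureMap = []
    for k, w_k in enumerate(weights):
        if len(w_k) != in_ch:
            raise ValueError("each weight row needs one entry per input channel")
        out.append([[bias[k] + sum(w_k[c] * features[c][i][j] for c in range(in_ch))
                     for j in range(w)]
                    for i in range(h)])
    return out


def global_avg_pool(features: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions (one scalar per channel)."""
    return [sum(sum(row) for row in plane) / (len(plane) * len(plane[0]))
            for plane in features]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fcn_classifier_head(features: FeatureMap, weights, bias) -> List[float]:
    """Hypothetical head: per-class score maps via a 1x1 conv, then global average pooling."""
    class_maps = conv1x1(features, weights, bias)  # one spatial score map per script class
    return global_avg_pool(class_maps)              # one logit per script class


if __name__ == "__main__":
    # Toy 2-channel, 2x2 feature map and 3 hypothetical script classes.
    feats = [[[1.0, 2.0], [3.0, 4.0]],
             [[0.0, 0.0], [0.0, 4.0]]]
    W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    b = [0.0, 0.0, 0.0]
    logits = fcn_classifier_head(feats, W, b)
    probs = softmax(logits)
    print(logits)                   # [2.5, 1.0, 1.75]
    print(probs.index(max(probs)))  # 0
```

Because both the 1×1 convolution and average pooling are linear, this head yields the same logits as pooling first and then applying a fully connected layer with the same weights; the convolutional form additionally keeps a spatial score map for each class before pooling.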